updated kotlin-bench with results from the llama 4 maverick release. we'll update again with llama 4 behemoth when it becomes available.
the results were okay: not amazing, but not horrible either. it performs on par with gpt-4o, which is nice because it's an open-source model.
keep in mind that this benchmark is hard to pass, which is why it only gets 3% correct.
https://firebender.com/leaderboard