GPT-5 Results on Kotlin Bench (kotlinlang #feed)

# feed

Kevin

08/13/2025, 6:33 PM

GPT-5 Results on Kotlin Bench https://firebender.com/leaderboard

🚫 3

🦜 2

🫡 1

🔥 1

K 12

Sergey Y.

08/14/2025, 9:02 AM

Always wondered how to read these results. Does 30% mean the model succeeded in only 30% of the tasks? If so, that’s pretty low.

👍 3

Edgar Avuzi

08/14/2025, 3:24 PM

The numbers don’t matter as long as the chart is beautiful 😂

😁 2

☝️ 1

Gat Tag

08/14/2025, 6:26 PM

From my understanding, most of these benchmarks introduce tasks that we know no LLMs are close to solving, so that there is plenty of headroom in evaluation. Five models all getting 100% on a benchmark is not very informative: all that tells me is that the benchmark is not capable of measuring their performance, at which point it is just a validation test, not a benchmark. So you want a test that is at their limit, where performance is low.

yes black 2

💯 2
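
To make the point above concrete, here is a minimal Kotlin sketch. It assumes the score is simply the fraction of tasks a model resolves (the thread suggests this but does not confirm how Kotlin Bench scores). The function names `passRate` and `spread`, the model names, and all outcomes are hypothetical and made up for illustration; none of it is real leaderboard data.

```kotlin
// Assumption: a task counts as resolved when the model's patch passes that task's tests.
// true = resolved, false = not resolved.
fun passRate(results: List<Boolean>): Double {
    require(results.isNotEmpty()) { "need at least one task" }
    return results.count { it }.toDouble() / results.size
}

// Largest gap between any two models' scores.
// A spread of 0.0 means the benchmark cannot tell the models apart.
fun spread(scores: Collection<Double>): Double =
    (scores.maxOrNull() ?: 0.0) - (scores.minOrNull() ?: 0.0)

fun main() {
    // Made-up outcomes on a hard benchmark: low scores, but the models are separated.
    val hardBenchmark = mapOf(
        "model-a" to listOf(true, false, false, true, false, false, false, true, false, false),
        "model-b" to listOf(true, false, false, false, false, false, false, false, false, false),
    )
    // Made-up outcomes on a saturated benchmark: every model resolves every task.
    val saturatedBenchmark = mapOf(
        "model-a" to List(10) { true },
        "model-b" to List(10) { true },
    )

    for ((name, bench) in listOf("hard" to hardBenchmark, "saturated" to saturatedBenchmark)) {
        val scores = bench.mapValues { passRate(it.value) }
        println("$name: $scores, spread = ${spread(scores.values)}")
    }
}
```

Under these assumptions, the hard benchmark gives low absolute numbers but still ranks the two models, while the saturated one gives both a perfect score and a spread of zero, so it says nothing about which is better.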

Gat Tag

08/14/2025, 6:27 PM

As they get better, the benchmark will be made more difficult. (More likely a new benchmark will be made)

Kevin

08/14/2025, 6:50 PM

exactly @Gat Tag, was going to respond sooner. also, we will likely need to make the benchmark harder. if AI can resolve even 30% of PRs on very well-maintained repos, that's already too high imo

Kevin

08/14/2025, 6:51 PM

will have more announcements on this soon, for a kotlin-bench v2 where tasks are much harder

Kevin

08/14/2025, 6:51 PM

and also measuring on different classes of tasks, not just blind PRs + tests
