Kotlin bench result - Instacart principal android ...
# android
k
Kotlin bench result - Instacart principal Android engineer (Kaushik Gopal) discussing Claude 4 Opus/Sonnet's ability to write Kotlin code https://x.com/kaushikgopal/status/1926022918829461830
c
Links should go to #C0BJ0GTE2
f
Do you get paid if you do 26% of the work?
And the problem with AI tests is that you can't reproduce the results, as they are non-deterministic. Also, you can never know whether they made real progress or just put some of the answers to your specific test directly into the model. Remember that we are in a capitalist system, and therefore you have to make more profit every year or you die...
k
Yep, this is forcing model providers to make LLM improvements faster and faster - there's been a 13% increase in the last few months in tasks completed for this benchmark. Results are not deterministic, but you can run a sample n times to get a probability/confidence interval.
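For anyone who wants to try the "run it n times" idea, here is a rough sketch of how a pass rate and its confidence interval could be computed. It uses the Wilson score interval, which is one common choice and not necessarily what the benchmark authors use. The function name `wilson_interval` and the counts in the usage line are made up for illustration.

```python
import math


def wilson_interval(successes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate observed over repeated runs.

    z = 1.96 corresponds to roughly 95% confidence (an assumption; pick your own).
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= successes <= runs:
        raise ValueError("successes must be between 0 and runs")

    p_hat = successes / runs                    # observed pass rate
    denom = 1 + z * z / runs
    centre = (p_hat + z * z / (2 * runs)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / runs + z * z / (4 * runs * runs)) / denom
    )
    # Clamp to [0, 1] to guard against tiny floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts: the same task attempted 100 times, passing 26 times.
low, high = wilson_interval(successes=26, runs=100)
print(f"pass rate 0.26, ~95% interval: [{low:.3f}, {high:.3f}]")
```

More runs narrow the interval, so comparing two models comes down to whether their intervals overlap, not whether a single run passed.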