I've written a basic JMH benchmark to confirm it. I'm not well versed in JMH, so I may have done something wrong, but it showed the same effect: without JMH I also saw a difference as large as 35x, while JMH consistently reported the non-plain calls as roughly 8x slower. I'll try to look at the bytecode later. It seems strange to me. Thanks for trying it.