# advent-of-code
j
What kind of machine are you running this on? Because my implementation is essentially the same as yours but won't go below ca. 40 ms
j
What are you using to bench?
j
On the JVM I use `measureNanoTime` in a loop running my solver 25 times in a row, see https://github.com/jorispz/aoc-2018/blob/master/src/commonMain/kotlin/Runner.kt
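The timing approach described here (a plain loop around the solver, 25 runs, one measurement per run) could look roughly like this in Java. This is a sketch and not the contents of `Runner.kt`: `TimingLoop`, `time` and the dummy workload are hypothetical names. On the JVM, Kotlin's `measureNanoTime` is a thin wrapper around `System.nanoTime`, which is what the sketch calls directly.

```java
import java.util.function.Supplier;

// Hypothetical harness; names are illustrative and not taken from Runner.kt.
final class TimingLoop {
    /** Runs the solver `runs` times and returns each run's wall time in nanoseconds. */
    static long[] time(Supplier<?> solver, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            Object result = solver.get();
            nanos[i] = System.nanoTime() - start;
            // Touch the result so the call cannot be discarded as dead code.
            if (result == null) throw new IllegalStateException("solver returned null");
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Dummy workload standing in for a real solver.
        long[] nanos = time(() -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) sum += i % 7;
            return sum;
        }, 25);
        for (int i = 0; i < nanos.length; i++) {
            System.out.printf("run %2d: %.3f ms%n", i + 1, nanos[i] / 1e6);
        }
    }
}
```

Keeping every sample instead of a single average makes it possible to compare the first runs against the later ones. A harness like JMH handles warm-up iterations separately, so numbers from the two approaches are not directly comparable.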
j
Using JMH for mine. I can't fathom what kind of magic HotSpot is doing to be faster than native Rust
Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
I even tried storing the entire table on the stack and doing unsafe (non-bounds-checked) access, and it was still slower
I'm guessing some SIMD black magic. Just a guess though, I'm not at all familiar with JVM internals
j
Me neither. But in general, the JIT can perform runtime optimizations that an ahead-of-time compiler just can't do. It's magic
It's very clear in my daily cross-platform measurements: the JVM and JS always get dramatically faster after an iteration or two, while native stays constant across iterations
j
I always enjoy seeing your cross-platform runtime analysis 😁
j
Thanks! I just checked, I am running on an Intel Core i7-7700HQ @ 2.80 GHz
Oh! It just occurs to me part 2 can easily be run in parallel! Hold my beer!
Running the search for the solution for part 2 in parallel using `async` brings the best time on the JVM down from 39 to 13 ms
j
Nice, looking forward to seeing it. I’ve shamefully not yet written any coroutines
j
Here's a quick version on the JVM. You can replace `Dispatchers.Default` (which uses all cores) with `singleThreadedDispatcher` on line 26 to force it to use only one core. Unfortunately, I haven't been able to get coroutines running on native, so I can't use it in my general repo for now
j
Unrelated, but I think the difference in perf between your impl and mine is the `Triple`s, which are boxed: a new heap allocation every time a new best is found
Also, `summedAreaTable` could be inlined
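To make the allocation point concrete, here is a minimal Java sketch of a summed-area-table search that keeps the running best in primitive locals and allocates a result only once at the end. It is not either poster's code: `MaxSquare`, `best` and the square input grid are assumptions for illustration, and whether allocation really explains the gap is only a guess.

```java
// Illustrative sketch; class and method names are hypothetical.
final class MaxSquare {
    /** sat[y][x] = sum of the grid cells in rows [0, y) and columns [0, x). */
    static int[][] summedAreaTable(int[][] grid) {
        int n = grid.length; // assumes a square n x n grid
        int[][] sat = new int[n + 1][n + 1];
        for (int y = 1; y <= n; y++) {
            for (int x = 1; x <= n; x++) {
                sat[y][x] = grid[y - 1][x - 1]
                        + sat[y - 1][x] + sat[y][x - 1] - sat[y - 1][x - 1];
            }
        }
        return sat;
    }

    /** Returns {x, y, size} (0-based top-left corner) of the square with the largest sum. */
    static int[] best(int[][] grid) {
        int n = grid.length;
        int[][] sat = summedAreaTable(grid);
        // The running best lives in primitive locals, so a new best costs no allocation.
        int bestSum = Integer.MIN_VALUE, bestX = 0, bestY = 0, bestSize = 0;
        for (int size = 1; size <= n; size++) {
            for (int y = 0; y + size <= n; y++) {
                for (int x = 0; x + size <= n; x++) {
                    // Sum of any square from four table lookups.
                    int sum = sat[y + size][x + size] - sat[y][x + size]
                            - sat[y + size][x] + sat[y][x];
                    if (sum > bestSum) {
                        bestSum = sum;
                        bestX = x;
                        bestY = y;
                        bestSize = size;
                    }
                }
            }
        }
        return new int[] {bestX, bestY, bestSize}; // single allocation at the end
    }
}
```

The table turns each square's sum into four lookups, so the inner loop does constant work per candidate regardless of square size.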
@Joris PZ Got Rust down to 4 ms
problem11::part_2        ... bench:  10,120,317 ns/iter (+/- 750,638)
problem11::part_2_par    ... bench:   4,053,446 ns/iter (+/- 237,492)
Same technique: data parallelization
j
Wow that's fast, cool!
j
HOLY WOW. JVM still wins
2.7 ms!
j
😮
Did you use the coroutine implementation? Or something else?
j
Java 8 parallel stream
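For readers who have not used it, a parallel-stream version of the search might look like the sketch below: each square size is an independent task, and the per-size winners are reduced to one overall best. This is not the poster's actual code. `ParallelMaxSquare` and its methods are hypothetical, and the input is assumed to be a summed-area table padded with a zero row and column, so that `sat[y][x]` holds the sum of rows `[0, y)` and columns `[0, x)`.

```java
import java.util.Comparator;
import java.util.stream.IntStream;

// Illustrative sketch; class and method names are hypothetical.
final class ParallelMaxSquare {
    /** sat is an (n+1) x (n+1) padded summed-area table. Returns {sum, x, y, size}. */
    static int[] best(int[][] sat) {
        int n = sat.length - 1;
        return IntStream.rangeClosed(1, n)
                .parallel() // sizes are searched concurrently on the common fork-join pool
                .mapToObj(size -> bestForSize(sat, n, size))
                .max(Comparator.comparingInt((int[] r) -> r[0]))
                .orElseThrow(() -> new IllegalArgumentException("empty table"));
    }

    /** Best square of one fixed size; only reads sat, so it is safe to run in parallel. */
    static int[] bestForSize(int[][] sat, int n, int size) {
        int bestSum = Integer.MIN_VALUE, bestX = 0, bestY = 0;
        for (int y = 0; y + size <= n; y++) {
            for (int x = 0; x + size <= n; x++) {
                int sum = sat[y + size][x + size] - sat[y][x + size]
                        - sat[y + size][x] + sat[y][x];
                if (sum > bestSum) {
                    bestSum = sum;
                    bestX = x;
                    bestY = y;
                }
            }
        }
        return new int[] {bestSum, bestX, bestY, size};
    }
}
```

Splitting by size is the simplest partition but an uneven one, since small sizes have far more candidate positions than large ones. The table is shared read-only, so no synchronization is needed.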
j
Ah yes
j
At this point I'm convinced HotSpot is memoizing it xD
(sarcasm)