kotlinlang #advent-of-code

Joris PZ

12/11/2018, 5:32 PM

What kind of machine are you running this on? Because my implementation is essentially the same as yours but won't go below ca. 40 ms

joelpedraza

12/11/2018, 5:45 PM

What are you using to bench?

Joris PZ

12/11/2018, 5:51 PM

On the JVM I use measureNanoTime in a loop running my solver 25 times in a row, see https://github.com/jorispz/aoc-2018/blob/master/src/commonMain/kotlin/Runner.kt
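
A minimal sketch of the timing loop described above, not the code from the linked Runner.kt. The names `benchmark` and `BenchResult` are hypothetical. Keeping every run's timing, rather than a single number, makes it possible to compare the first (cold) run with the best (warmed-up) one.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical holder for the per-run timings. lastResult keeps the solver's
// output reachable, which makes it less likely that the JIT discards the work.
data class BenchResult(val timingsNs: List<Long>, val lastResult: Any?) {
    val firstNs: Long get() = timingsNs.first()
    val bestNs: Long get() = timingsNs.sorted().first()
}

// Runs `solver` `runs` times in a row and records each run's duration in nanoseconds.
fun benchmark(runs: Int = 25, solver: () -> Any?): BenchResult {
    require(runs > 0) { "runs must be positive" }
    var last: Any? = null
    val timings = List(runs) { measureNanoTime { last = solver() } }
    return BenchResult(timings, last)
}
```

A call such as `benchmark { solve(input) }`, where `solve` stands in for the actual solver, returns all 25 timings. A loop like this does not control for JIT warm-up or dead-code elimination the way JMH does, so its numbers are only roughly comparable with JMH results.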

joelpedraza

12/11/2018, 5:53 PM

Using JMH for mine. I can't fathom what kind of magic HotSpot is doing for it to be faster than native Rust

joelpedraza

12/11/2018, 5:56 PM

Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz

joelpedraza

12/11/2018, 5:59 PM

I even tried storing the entire table on the stack and doing unsafe (not bounds-checked) access, and it was still slower

joelpedraza

12/11/2018, 6:00 PM

I’m guessing some SIMD black magic. Just a guess though, I'm not at all familiar with JVM internals

Joris PZ

12/11/2018, 6:03 PM

Me neither. But in general, the JIT can perform runtime optimizations that an ahead-of-time compiler just can't do. It's magic

Joris PZ

12/11/2018, 6:05 PM

It's very clear in my daily cross-platform measurements: the JVM and JS always get dramatically faster after an iteration or two, while native stays constant across iterations

joelpedraza

12/11/2018, 6:06 PM

I always enjoy seeing your cross platform runtime analysis 😁

Joris PZ

12/11/2018, 6:07 PM

Thanks! I just checked, I am running on an Intel Core i7-7700HQ @ 2.80 GHz

Joris PZ

12/11/2018, 6:08 PM

Oh! It just occurs to me part 2 can easily be run in parallel! Hold my beer!

Joris PZ

12/11/2018, 6:19 PM

Running the search for the solution for part 2 in parallel using async brings the best time on the JVM down from 39 to 13 ms

joelpedraza

12/11/2018, 6:27 PM

Nice, looking forward to seeing it. I’ve shamefully not yet written any coroutines

Joris PZ

12/11/2018, 6:30 PM

Here's a quick version on the JVM. You can replace Dispatchers.Default (which uses all cores) with singleThreadedDispatcher on line 26 to force it to use only one core. Unfortunately, I haven't been able to get coroutines running on native, so I can't use it in my general repo for now

Foo.kt
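
The attachment is not reproduced in this log. As a rough sketch of the idea only, assuming the search can be split into independent candidates (for example, one per square size), each candidate can be evaluated in its own `async` coroutine and the results compared afterwards. `parallelSearch`, `Best`, and `evaluate` are hypothetical names, and the code needs the kotlinx-coroutines-core library on the classpath.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking

// Hypothetical result type: which candidate won and with what score.
data class Best(val candidate: Int, val score: Long)

// Starts one coroutine per candidate on the given dispatcher, waits for all of
// them, and keeps the candidate with the highest score.
fun parallelSearch(
    candidates: IntRange,
    dispatcher: CoroutineDispatcher = Dispatchers.Default,
    evaluate: (Int) -> Long
): Best = runBlocking {
    candidates
        .map { c -> async(dispatcher) { Best(c, evaluate(c)) } }
        .awaitAll()
        .maxByOrNull { it.score } ?: error("no candidates")
}
```

Passing a dispatcher backed by a single thread instead of `Dispatchers.Default` runs the same code on one core, which is the comparison suggested in the message above.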

joelpedraza

12/11/2018, 6:47 PM

Unrelated, but I think the difference in perf between your impl and mine is the Triple instances, which are boxed: a new heap allocation every time a new best is found

joelpedraza

12/11/2018, 6:47 PM

Also, summedAreaTable could be inlined
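
To illustrate the allocation point, here is a sketch that tracks the running best in primitive local variables instead of creating a new Triple on every improvement. It is neither participant's actual code. It assumes a square grid and a standard summed-area table with a zero border, and the layout of the returned array is made up for this example. Whether this accounts for the timing difference was not measured in the thread.

```kotlin
// Builds a summed-area table with a zero border:
// sat[y][x] holds the sum of grid rows 0 until y and columns 0 until x.
fun summedAreaTable(grid: Array<IntArray>): Array<IntArray> {
    val n = grid.size
    val sat = Array(n + 1) { IntArray(n + 1) }
    for (y in 1..n) for (x in 1..n) {
        sat[y][x] = grid[y - 1][x - 1] + sat[y - 1][x] + sat[y][x - 1] - sat[y - 1][x - 1]
    }
    return sat
}

// Finds the square of any size with the largest sum.
// Returns [x, y, size, sum] with 0-based coordinates (hypothetical layout).
fun bestSquare(grid: Array<IntArray>): IntArray {
    val n = grid.size
    val sat = summedAreaTable(grid)
    // Primitive locals: updating them allocates nothing, unlike Triple(x, y, size).
    var bestX = 0
    var bestY = 0
    var bestSize = 0
    var bestSum = Int.MIN_VALUE
    for (size in 1..n) for (y in 0..n - size) for (x in 0..n - size) {
        val sum = sat[y + size][x + size] - sat[y][x + size] - sat[y + size][x] + sat[y][x]
        if (sum > bestSum) {
            bestSum = sum; bestX = x; bestY = y; bestSize = size
        }
    }
    return intArrayOf(bestX, bestY, bestSize, bestSum)
}
```

Only one small array is allocated, at the end, no matter how often the best candidate changes during the search.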

joelpedraza

12/11/2018, 7:07 PM

@Joris PZ Got Rust down to 4 ms

problem11::part_2        ... bench:  10,120,317 ns/iter (+/- 750,638)
problem11::part_2_par    ... bench:   4,053,446 ns/iter (+/- 237,492)

joelpedraza

12/11/2018, 7:08 PM

Same technique, data parallelization

Joris PZ

12/11/2018, 7:29 PM

Wow that's fast, cool!

joelpedraza

12/11/2018, 7:30 PM

HOLY WOW. JVM still wins

joelpedraza

12/11/2018, 7:30 PM

2.7 ms!

Joris PZ

12/11/2018, 7:31 PM

😮

Joris PZ

12/11/2018, 7:31 PM

Did you use the coroutine implementation? Or something else?

joelpedraza

12/11/2018, 7:31 PM

Java 8 parallel stream
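
The parallel-stream code itself was not posted. As a sketch only, the same candidate search can be written in Kotlin with a Java 8 parallel stream, which runs on the common fork-join pool. `parallelBest` and `score` are hypothetical names.

```kotlin
import java.util.stream.IntStream

// Scores every candidate in [from, to] on a parallel stream and returns the
// candidate with the highest score. Each candidate is scored exactly once.
fun parallelBest(from: Int, to: Int, score: (Int) -> Long): Int =
    IntStream.rangeClosed(from, to)
        .parallel()
        .mapToObj { c -> c to score(c) }
        .max(compareBy<Pair<Int, Long>> { it.second })
        .map { it.first }
        .orElseThrow { NoSuchElementException("empty range") }
```

Unlike the coroutine approach, this needs nothing beyond the JDK.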

Joris PZ

12/11/2018, 7:32 PM

Ah yes

joelpedraza

12/11/2018, 7:32 PM

At this point I’m convinced HotSpot is memoizing it xD

😄 2

joelpedraza

12/11/2018, 7:33 PM

(sarcasm)
