I've found a weird performance inconsistency with my code on kotlinlang #kotlin-native

Krystian

01/13/2023, 7:55 PM

I've found a weird performance inconsistency with my code between Linux and Windows. I'm working on a Raylib (game dev framework) binding for K/N and I've created a bunnymark test. What makes no sense to me is that on my main PC (32GB RAM, Ryzen 5 5600X, RX5600XT) I can render at most 6500 entities on Windows before FPS drops to 30, while on my Linux laptop (16GB RAM, 11th gen i7 with Iris iGPU) I reach 124200 entities before hitting 30 FPS. Is there anything that could be causing this? Does K/N on Windows use a different memory allocator or something else that could explain it?

kevin.cianfarini

01/13/2023, 8:38 PM

have you isolated K/N from the equation? It’s possible the underlying APIs on Windows could perform differently than on Linux. Maybe try writing a small C repro that confirms this isn’t the case?

Krystian

01/13/2023, 9:20 PM

I've just run the executable on a live Linux USB and it managed to draw 130700 entities with the K/N binding. I've also run the pure C code and it performs perfectly fine and renders correct results.

Krystian

01/13/2023, 9:21 PM

So in the end there is something VERY weird going on with K/N on Windows when it comes to performance.

Krystian

01/13/2023, 9:21 PM

Also both K/N and C benchmarks run on the same lib, no difference there.

Adam S

01/13/2023, 9:23 PM

it sounds like you have a solid basis for making a ticket - such an example means performance could be tested and improved https://youtrack.jetbrains.com/newIssue

kevin.cianfarini

01/13/2023, 9:23 PM

Interesting. I’ve not had to do this, but maybe try conjuring some flame graphs to see what functions are acting slowly? I wonder if it’s in cinterop land or something else

kevin.cianfarini

01/13/2023, 9:24 PM

Could even do simpler timings of methods, `println` them, and compare between Windows and Linux.

kevin.cianfarini

01/13/2023, 9:26 PM

Also I think you’d have to go in with more information on what components might be slow (cinterop generated code? allocations? function calls? etc etc) to make a ticket.

Krystian

01/13/2023, 9:28 PM

Yeah, I will probably have to raise a ticket, albeit I'm still rather new to all of this and generating such info might be a tad difficult for me. Worth mentioning: although I'm writing the binding as a wrapper (it hides the ugly parts of K/N, like running allocs everywhere), there is no performance difference between the binding and writing it in a pure K/N style.

kevin.cianfarini

01/13/2023, 9:33 PM

I suggest you do something like this.

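import kotlin.time.DurationUnit
import kotlin.time.measureTimedValue

// Before Kotlin 1.9, measureTimedValue also needs @OptIn(kotlin.time.ExperimentalTime::class).
fun <T> traceTime(functionName: String, block: () -> T): T {
  val timedResult = measureTimedValue(block)
  // toDouble keeps fractional milliseconds; the older inMilliseconds property is deprecated.
  println("$functionName took ${timedResult.duration.toDouble(DurationUnit.MILLISECONDS)} ms.")
  return timedResult.value // hand the block's result back to the caller
}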

val functionResult = traceTime("someRaylibFunction") { someRaylibFunction() }

kevin.cianfarini

01/13/2023, 9:33 PM

That would measure the amount of time it takes to perform `block` and print the result to stdout while allowing your program to do its thing. Peppering this around your codebase could allow for fairly easy comparisons of Windows and Linux.

Krystian

01/13/2023, 9:35 PM

Thank you, I will do that and create some data with this.

Krystian

01/13/2023, 10:12 PM

There is only one function that I would really suspect of causing this: the one that draws a texture loaded into VRAM (GPU), which internally ofc calls OpenGL. Measuring it, I don't see anything abnormal. On Linux it's around 5.36E-4 ms up to a max of 6.02E-4 ms, while on Windows it's 5.0E-4 ms (dipping down to 4 not so often) and max 7/9. It also occasionally slows to 0.001028 ms on Linux and 0.001 ms on Windows, so pretty much the same. The small problem here is that I have no access to the internal OpenGL calls, so I can't really investigate those.

kevin.cianfarini

01/13/2023, 10:20 PM

The internal opengl calls are all implemented in C and K/N wouldn’t impact their performance at all

kevin.cianfarini

01/13/2023, 10:21 PM

If a single call to OpenGL is taking slightly different amounts of time on Windows and Linux, what happens when you make 10 million OpenGL calls on each platform?
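One way to try the many-calls idea is to time a whole batch and divide, which keeps timer and `println` overhead out of the per-call number. This is only a minimal sketch, assuming Kotlin 1.9+ for the stable `kotlin.time` API; `averageCallTime` and the counter workload are hypothetical placeholders, and the real binding call would go in their place.

```kotlin
import kotlin.time.Duration
import kotlin.time.measureTime

// Hypothetical helper: times `iterations` calls of `block` as one batch
// and returns the average duration per call.
fun averageCallTime(iterations: Int, block: () -> Unit): Duration {
    require(iterations > 0) { "iterations must be positive" }
    val total = measureTime {
        repeat(iterations) { block() }
    }
    return total / iterations
}

fun main() {
    var counter = 0L
    // Placeholder workload; replace with the binding call under test.
    val average = averageCallTime(10_000_000) { counter++ }
    println("average per call: $average (counter=$counter)")
}
```

Running the same binary on both platforms and comparing the two averages should show whether a small per-call gap adds up at scale.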

kevin.cianfarini

01/13/2023, 10:22 PM

I think the best way to test this would be to measure times and run your original program that renders 124,000 entities to the screen

Krystian

01/13/2023, 10:25 PM

That does make it slightly difficult as calling the function via traceTime causes the FPS to tank down almost instantly to 10s and 7s and there isn't much difference in time it took to call
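Since printing on every call is probably what tanks the frame rate, one option is to record durations silently and print a single summary at the end. The following is a rough sketch, not part of the binding: `TimingStats` and the empty lambda are hypothetical, and it assumes Kotlin 1.9+ for the stable `TimeSource` API.

```kotlin
import kotlin.time.Duration
import kotlin.time.TimeSource

// Hypothetical accumulator: records durations without printing,
// so no I/O happens inside the render loop.
class TimingStats(val name: String) {
    var count = 0
        private set
    var total = Duration.ZERO
        private set
    var max = Duration.ZERO
        private set

    // Runs `block`, records how long it took, and returns its result.
    fun <T> measure(block: () -> T): T {
        val mark = TimeSource.Monotonic.markNow()
        val result = block()
        record(mark.elapsedNow())
        return result
    }

    fun record(elapsed: Duration) {
        count++
        total += elapsed
        if (elapsed > max) max = elapsed
    }

    fun summary(): String {
        val average = if (count > 0) total / count else Duration.ZERO
        return "$name: calls=$count avg=$average max=$max total=$total"
    }
}

fun main() {
    val drawStats = TimingStats("drawTexture")
    repeat(100_000) {
        drawStats.measure { /* placeholder for the real draw call */ }
    }
    println(drawStats.summary())
}
```

The timer itself still costs something per call, so the numbers are best read as a comparison between platforms rather than as absolute values.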

Krystian

01/13/2023, 10:33 PM

Okay, there is one function that does not match Linux at all: the one that returns the X and Y position of the mouse cursor. On Linux we get 1.42E-4 ms, while on Windows it's min 3.0E-4 ms, max 6.0E-4 ms. Removing that function from the Windows version gave an additional 600 entities, but it does not affect the pure C test or the Linux K/N one.

Krystian

01/13/2023, 11:19 PM

I have also found this https://kotlinlang.slack.com/archives/C3SGXARS6/p1619690840268400?thread_ts=1619349974.244300&cid=C3SGXARS6 which seems to talk about exactly the same problem: abysmal performance on Windows compared to Linux/macOS.

Krystian

01/13/2023, 11:38 PM

The `-femulated-tls` flag was meant to be removed since 1.6 (https://youtrack.jetbrains.com/issue/KT-47605) but it's still present in konan.properties, so I wonder if this is causing all the performance issues on Windows?
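For anyone wanting to check which targets carry the flag, a small sketch like the one below can scan the text of a properties file. Everything here is illustrative: `propertiesMentioning` is a hypothetical helper, the sample keys are made up and may not match real konan.properties entries, and reading the actual file is left out because file APIs differ between JVM and Native.

```kotlin
// Hypothetical helper: returns the property keys whose values mention `flag`.
// Joins the backslash line continuations used in .properties files first.
fun propertiesMentioning(text: String, flag: String): List<String> {
    val logicalLines = mutableListOf<String>()
    val current = StringBuilder()
    for (raw in text.lines()) {
        val line = raw.trim()
        if (line.endsWith("\\")) {
            // Continued on the next physical line: drop the backslash and keep going.
            current.append(line.dropLast(1)).append(' ')
        } else {
            current.append(line)
            logicalLines += current.toString()
            current.clear()
        }
    }
    if (current.isNotEmpty()) logicalLines += current.toString()
    return logicalLines
        .filter { !it.startsWith("#") && flag in it.substringAfter('=', "") }
        .map { it.substringBefore('=').trim() }
}

fun main() {
    // Made-up sample; real key names in konan.properties may differ.
    val sample = """
        clangFlags.example_windows = -O2 \
            -femulated-tls
        clangFlags.example_linux = -O2
    """.trimIndent()
    println(propertiesMentioning(sample, "-femulated-tls"))
}
```

With the made-up sample above, only the Windows-style key is reported.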

kevin.cianfarini

01/13/2023, 11:40 PM

Are you able to clarify @sergey.bogolepov?

Krystian

01/13/2023, 11:41 PM

It would also explain why performance is much better on Linux and macOS, as the `-femulated-tls` flag is not present there, which is confirmed in the post I've linked earlier.

kevin.cianfarini

01/13/2023, 11:42 PM

Yeah, nice sleuthing

Krystian

01/13/2023, 11:45 PM

Much better than living in hell trying to comprehend why things just don't work in the code 🫢

sergey.bogolepov

01/14/2023, 7:26 AM

Yep, `-femulated-tls` is most likely the cause. Unfortunately, it can’t be dropped because the toolchain we use is compiled with that flag. It can probably be solved by updating the toolchain, but Windows-specific performance problems are out of our focus at the moment.

sergey.bogolepov

01/14/2023, 7:28 AM

BTW

"`-femulated-tls` flag was meant to be removed since 1.6"

No, it was not. The issue is purely about LLD, not that flag.

Krystian

01/14/2023, 10:09 AM

Ah, my apologies then, I misread the comment on the ticket in that case.
