# kotlin-native
a
Performance question: the following code takes ~47 secs on K/N with Kotlin 1.3.31 and ~29 secs with Kotlin 1.3.41 (Linux X64 target). And just 1.3 sec on JVM. Both on the same machine. I'm wondering, what is the reason?
var map: Map<Int, Int> = emptyMap()
repeat(10000) {
    // += on a read-only Map calls plus(), which builds a new map on every iteration
    map += it to it * 2
}
d
Interesting, K/N got faster.
JVM has years of optimisations working its magic. Native is currently not very performant around object-graph-related stuff. As you can see, it's getting better.
k
There have been performance improvements around memory management in 1.3.4x. Running a profiler would be interesting to see what’s taking time. How many times are you running that loop, btw?
o
This code has plenty of allocations, and the JVM handles them pretty quickly. As with any microbenchmark, it just shows a corner case of behavior where the JVM is faster. Here, many of the allocations come from the boxed nature of generic containers.
m
Out of curiosity I ran the tests on my Linux machine:
Kotlin 1.3.41:
JVM - 1101 ms
Native - 20403 ms
C++14, clang:
6.34732 ms
so Kotlin/Native is about 3200x slower than C++ 😞 and about 20x slower than JVM
a
Would you mind sharing your C++ snippet?
m
@Arkadii Ivanov sure:
#include <iostream>
#include <map>
#include <chrono>

int main() {
    auto start = std::chrono::steady_clock::now();

    std::map<int, int> map;

    for (int i = 0; i < 10000; i++)
        map.emplace(std::make_pair(i * 2, i));

    auto end = std::chrono::steady_clock::now();
    auto diff = end - start;
    std::cout << std::chrono::duration<double, std::milli>(diff).count() << " ms\n";

    return 0;
}
o
C++ is irrelevant here. It uses compile-time specialization, technically you can make it way faster by creating literal collections, but not sure what it measures ;).
a
I'm not really familiar with C++, but the docs say that emplace has O(log N) complexity. To match the Kotlin code, you should allocate a new map on each iteration and copy all the elements from the previous map plus the new element.
@kpgalligan I ran only once per test. So it's just single execution takes that much time.
s
that is because of JVM, kotlin sadly is bound to the mistakes of Java, everything is boxed here, so it hurts performance
g
Not everything is boxed, this example is boxed as hell
Yes, it allocates a lot and copies a lot, but that can be easily avoided. Not really sure what is measured here
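A minimal sketch of the difference being discussed here, assuming the goal is simply to end up with the same entries. The function names are made up for illustration; only the standard library calls are real. `plus` on a read-only `Map` returns a new map with the old entries and the new pair, so the first loop copies on every iteration and the total number of copied entries grows roughly quadratically with `n`, while the second loop mutates one map in place:

```kotlin
// Hypothetical helper names, chosen for this sketch only.

// Copy-on-add: each += calls Map.plus(pair), which returns a new map
// holding every existing entry plus the new one.
fun buildWithPlus(n: Int): Map<Int, Int> {
    var map: Map<Int, Int> = emptyMap()
    repeat(n) {
        map += it to it * 2
    }
    return map
}

// In-place: a single mutable map, one insertion per iteration.
fun buildWithPut(n: Int): Map<Int, Int> {
    val map = mutableMapOf<Int, Int>()
    repeat(n) {
        map[it] = it * 2
    }
    return map
}

fun main() {
    // Both approaches produce equal maps; only the amount of work differs.
    println(buildWithPlus(1_000) == buildWithPut(1_000))
}
```

Both versions still box the `Int` keys and values, so this only removes the copying, not the boxing.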
a
You can avoid allocations in this particular example, but you can't avoid allocations in general. Even Google says that it's almost free to allocate small short-lived objects in the modern ART. Plus, Kotlin as a language encourages you to allocate, e.g. to add a key-value pair to an immutable map, you create a Pair and pass it to the 'plus' method. And it's thrown away right after the method executes.
g
Google says that about ART, which is very different from HotSpot and optimized for such use cases
a
Of course, but allocations are what we are doing normally. Why is there no 'plus' method that accepts just a key and a value?
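For illustration, such an overload could be written as an extension function. This is a hypothetical sketch, not part of the Kotlin standard library. It skips the intermediate `Pair`, but it still copies the whole map on every call, so it would probably not change the picture much for the loop above:

```kotlin
// Hypothetical extension, NOT part of the Kotlin standard library.
// Copies the receiver into a new map and adds one entry without creating a Pair.
fun <K, V> Map<K, V>.plus(key: K, value: V): Map<K, V> {
    val result = LinkedHashMap<K, V>(this)
    result[key] = value
    return result
}

fun main() {
    var map: Map<Int, Int> = emptyMap()
    repeat(3) {
        // Two-argument call, so it resolves to the extension above
        // rather than to the stdlib plus(pair).
        map = map.plus(it, it * 2)
    }
    println(map)
}
```

Because an operator `plus` can take only one argument, this version cannot be used with `+` or `+=`; it has to be called by name.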
k
I personally find the overloaded collection operators to be terrible. They’re confusing. When I first read this I thought the += was just doing a .put, so a 10k loop would make no sense at 29 seconds.
g
"Of course but allocations are what we are doing normally"
This example makes no sense to me, and I’m not sure how you would do this normally
a
Yep, also the "+=" operator on e.g. a MutableList declared with var is flagged as ambiguous, because it could resolve to either the List.plus or the MutableList.plusAssign operator, but that's a different story
@gildor I believe this example shows the slowness of allocations in K/N in general. Perhaps it can be replaced with just allocating 100000 Pairs. A 20x performance difference is huge to me. Our iOS team is also very concerned about it, and it prevents Kotlin Multiplatform from spreading in our company. Hope it will be improved soon.
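A rough sketch of what such a Pair-only microbenchmark might look like, assuming `kotlin.system.measureTimeMillis` for timing. The function name and the checksum are illustrative additions. No timings are claimed here, and on the JVM the JIT may optimize some of these allocations away, so results should be read with care:

```kotlin
import kotlin.system.measureTimeMillis

// Hypothetical helper: allocates `count` Pair objects and returns a checksum
// so the allocations are not trivially unused.
fun allocatePairs(count: Int): Long {
    var checksum = 0L
    repeat(count) {
        val pair = it to it * 2
        checksum += pair.second
    }
    return checksum
}

fun main() {
    var checksum = 0L
    val elapsed = measureTimeMillis {
        checksum = allocatePairs(100_000)
    }
    println("allocated 100000 pairs in $elapsed ms (checksum $checksum)")
}
```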
g
Yes, K/N allocations are slow. I’m not talking about K/N, I’m talking about JVM
"Perhaps it can be replaced with just allocating 100000 Pairs"
It can be replaced by allocating a map plus an entry per pair, or by a specialized collection that doesn’t box ints
"20x performance difference is huge to me"
A completely different C++ example compared with JVM? Or 20 years of JVM with a tracing GC (which is by definition a more efficient way to manage memory than manual alloc or ARC) vs a not-even-released K/N?
a
I'm not blaming K/N, I just brought an issue to the table. That C++ example is not relevant since it uses a mutable map. BTW, inserting into a mutable map 10000 times on my machine takes 9 ms for KN vs 5 ms for JVM. Similar numbers to C++. Waiting for the KN stable release 😀
k
There are multiple ways to look at this. I think we get a lot of comparisons to the JVM because that’s easy, but if we’re talking about “Swift”, you need to compare to Swift (or Objc, or whatever). Here’s a raw allocation comparison…
import kotlin.system.getTimeMillis // Kotlin/Native

data class TestClass(val s: String, val i: Int)

fun runAlloc() {
    val start = getTimeMillis()
    repeat(repeatCount()) {
        val t = TestClass("arst", it)
    }
    println("runAlloc took: ${getTimeMillis() - start}")
}

fun repeatCount() = 100_000_000
That’s Kotlin
That’s the Swift class
class TestClassSwift {
    let s:String
    let i:Int
    
    init(s:String, i:Int) {
        self.s = s
        self.i = i
    }
}
Here’s the Swift loop
let start = TestClassKt.tm()
let repeats = Int(TestClassKt.repeatCount())
for n in 0..<repeats {
    autoreleasepool {
        let t = TestClassSwift(s: "arst", i: n)
    }
}

let total = TestClassKt.tm() - start
print("Swift loop: \(total)")
The result
runAlloc took: 10975
Swift loop: 28604
Is that a fair comparison? I don’t know. Depends what you’re trying to do. However, you could say “Kotlin Native is 2.6x faster than Swift!” but you’d have a lot more work to do to prove it I’d think.
That’s not to say KN shouldn’t improve optimizations, but still, you get what I’m saying.
s
nvm
a
Is it on the same device?
k
Sqlite access is about 3-5x over Android because there’s no JNI (presumably that’s the reason, anyway).
That’s emulator
But they’re run right after each other
TestClassKt.runAlloc()

let start = TestClassKt.tm()
let repeats = Int(TestClassKt.repeatCount())
for n in 0..<repeats {
    autoreleasepool {
        let t = TestClassSwift(s: "arst", i: n)
    }
}

let total = TestClassKt.tm() - start
print("Swift loop: \(total)")
Now, in Swift that should be a struct (probably), but we’re getting back to apples !apples
Oh, you mean sqlite?
a
I'm gonna raise this with our iOS team 🙂
k
Definitely not same device for sqlite 😉
s
you aren't comparing the same thing
k
I am quite sure they’ll push back with something, but that’s kind of the point of benchmarks. You can get them to say things.
Who is “you”? Me?
s
yes
k
Explain
s
kotlin and swift code you showed are different
k
They are different languages.
Please fix
what would you change?
And, before you get into it, understand that my point is to show that this is always a slanted argument, not to defend the honor of Kotlin Native or whatever
You can get rid of autorelease, but that’s problematic
Updated.
TestClassKt.runAlloc()

let start = TestClassKt.tm()
let repeats = Int(TestClassKt.repeatCount())
for n in 0..<repeats {
    let t = TestClassSwift(s: "arst", i: n)
}

let total = TestClassKt.tm() - start
print("Swift loop: \(total)")
No autoreleasepool. Results.
runAlloc took: 10540
Swift loop: 10685
I’ve also updated the Kotlin
fun runAlloc() {
    val start = getTimeMillis()
    val repeats = repeatCount()
    for (i in 0 until repeats) {
        val t = TestClass("arst", i)
    }
    println("runAlloc took: ${getTimeMillis() - start}")
}
Almost the exact same time. Structs, however, are faster, but in theory they’re just living on the stack?
About 1.5s vs 10s to do that loop with a struct and not a class
However, if you were putting a struct in a map (or whatever) you’d be allocating its memory and would be in a worse world.
a
Structs are not the same, for sure. Wondering about feedback from our iOS team))
k
I’ve found that if the “other” team wants to find a showstopper, they will. Lack of structs is definitely a non-trivial topic, but comparing performance is tricky. In general app dev, though, these kinds of raw “performance” comparisons rarely translate into the actual production performance problems. If you’re CPU bound regularly for network/db calls, you have other issues.
s
well it does, not just for CPU but also for memory usage. Many phones are low-end, with limited RAM; if you store a lot of data in your app, struct vs class makes a huge difference
k
Waiting for you to explain the test differences
s
and games too, it matters a lot, which I think has a bigger market share than traditional apps
I told you already, it was different in 2 things: your autorelease pool, and calling a foreign class, which isn't present in your Kotlin example
k
Did you explain that?
s
yes i told you it was different 😛
k
I think I explained that
data class TestClass(val s:String, val i:Int)
I figured that was obvious
a
The concern of our iOS team was not about the lack of structs, but about performance itself, like allocating, executing etc. If that's not the case anymore then I will raise the multiplatform question again.
k
Anyway (@sksk), the point is I didn’t say “performance doesn’t matter”, I said comparing Kotlin Native to the JVM is problematic, and comparing to Swift is more interesting, and appropriate. I don’t think we should avoid the discussions of KN performance, but it’s really easy to pick a few things that are going to do poorly and focus on them. Swift class creation seems to be exactly the same as KN, which is actually a little worrying (seems a little too perfect). I would love structs, but that’ll be a while.
s
I agree 100% with you, comparing with JVM makes no sense, Swift is a better candidate for comparison
and as you showed, heap allocations seem the same, a bit better with KN
u
Waiting for K/N stable too..
k
“Waiting for K/N stable too..“? Curious. How are we defining stable?
u
JB to officially release it from Beta 😉
k
Beta is just a word, but fair.