So just out of curiosity I wanted to test how long...
# announcements
m
So just out of curiosity I wanted to test how long a simple
QuickSort
implementation takes vs. Kotlin's own
List<T>.sorted()
. While I was on it, I also tested variations of my implementation as an extension function and without explicitly storing values in the function. And I noticed they are very different in how fast they are processed. What is causing this? See thread ->
So I had four different options to sort a random list of integers: 1. Kotlin's own
myList.sorted()
2. My basic implementation
Copy code
fun quickSort(input: List<Int>): List<Int> {
    if (input.size < 2) return input
    val pivot = input[input.size / 2]
    val smaller = input.filter { it < pivot }
    val equal = input.filter { it == pivot }
    val bigger = input.filter { it > pivot }
    return quickSort(smaller) + equal + quickSort(bigger)
}
3. The same as an extension
Copy code
fun List<Int>.quickSortExtension(): List<Int> {
    if (this.size < 2) return this
    val pivot = this[this.size / 2]
    val smaller = this.filter { it < pivot }
    val equal = this.filter { it == pivot }
    val bigger = this.filter { it > pivot }
    return smaller.quickSortExtension() + equal + bigger.quickSortExtension()
}
4. Without explicitly storing the vals
Copy code
fun quickSort2(input: List<Int>): List<Int> {
    if (input.size < 2) return input
    val pivot = input[input.size / 2]
    return quickSort2(
    input.filter { it < pivot }) + 
    input.filter { it == pivot } + 
    quickSort2(input.filter { it > pivot })
}
Now what I found curious was that they take extremely differnet times amounts to complete. I measured them in a very simple way:
Copy code
val result = measureNanoTime {
    [myQuickSortFunction(myRandomArray)]
}/1000 // returns microseconds
So for an array with 1000 elements the average times in microseconds are like this: 1. Kotlin: 1800 2. No-val QuickSort: 3400 3. Basic QuickSort: 3600 4. Extension QuickSort: 13000 While the difference between no.2 and 3 might be negligible, it is still there - where does it come from? And how can it be explained that the extension function is so much slower - it takes more than 3 times as long? And what does the stdlib
.sorted()
do different? Now to make this even more interesting (to me at least 🙂 ), when testing the same methods with an array that just contains 10 items, the results are very different: 1. Basic QuickSort: 85 2. No-val QuickSort: 95 3. Kotlin: 650 4. Extension QuickSort: 6500 What causes these differences? Especially, why is Kotlin's standard sorting so much slower for small arrays than it is for larger ones? And what causes the extension function to be so much slower? Does that mean there is a general performance impact when using extensions vs. "regular" functions?
a
well, this is interesting
n
Keep in mind that your implementations are not actually quick sort
Quick sort is in-place
👍 2
it's hard to say what exactly is safe to conclude from these benchmarks, if you want to show that extensions are slower I would try to just measure their overhead without anything else
And then for a recursive function
You should probably also provide the full code you used to do the benchmarks. It's incredibly easy to make mistakes when micro benchmarking
☝️ 1
Not that I personally could spot any said mistakes for a JVM languy but I'm sure other people here could
p
The above code is very inefficient in that every
filter()
and
+
creates a new list. Another difference is that Kotlin's
sorted()
uses Java's Timsort, which can be significantly faster than QuickSort. To compare the performance of sorting algorithms, consider sorting arrays (see Kotlin's
Array.sort()
). Always use JMH for JVM microbenchmarks.
d
Some sorting algorithms are faster on small inputs and worse with larger or vice versa and may contain overhead for setting up. Some are better when the data is fairly sorted already but less well if they’re random. Hybrid algorithms tend to cover this nuance. The majority of the time, trying to hyper optimize something like sort is meaningless in the context of your code; consider Amdahl’s law. Or the saying that premature optimization is the root of all evil (Donald Knuth). I wouldn’t worry about the speed of each implementation too much until you know it’s a bottleneck. And lastly, if you know what you need for the input you’re expecting (or if you know the data is largely pre sorted versus random), it’s better to use the algorithm you need. The default implementation Kotlin provides A) may change and B) is suitable when you don’t care exactly how it is implemented (I believe the contract is only that it’s stable but I could be wrong). Consider the same for listOf - this is currently an array list (and may change). If you know you’ll be inserting a lot of items in the middle of your list, you might choose to explicitly use a linked list. If you know you’ll be doing a lot of random accesses, you may choose to use an array list instead (explicitly since Kotlin can change its current implementation at any time)
m
Thanks for enlightening me! Of course I am aware that neither my implementations nor my measuring are rather sophisticated...performance does not play a big role in my day to day so I usually don't benchmark things. I just did this out of interest and apparently didn't put much thought into it. Not planning to optimize anything actually, just looking around. Still curious though about the huge difference the extension function produces...
p
Best write a JMH benchmark to see if the difference is real.
e
right, JMH or other benchmarking systems are designed to handle JVM behavior, such as warmup, GC, and blackholing (so that the optimizer doesn't just strip out all your code)
Kotlin's
.sorted()
copies to an Array, which can be accessed via bytecode insructions rather than List's `get()`/`set()`, uses Java's built-in sort on that (which is likely timsort under the covers), and copies back to a List
which is 2 * n element copies while your implementations are approximately n^2 log n element copies
timsort is like a stable in-place mergesort variant with heuristics that optimizes for minimizing comparison operations on inputs that are already somewhat structured... in the end it's O(n log n) like other comparison-based sorting methods, but it's effectively O(n) when you're just re-sorting a pre-sorted list after inserting a single element, or other common scenarios
note that you will get different timings again if you sort IntArray instead of Array<Int>, as that uses yet a different algorithm under the covers, since there's no need to use a stable sort on value types
(not to mention the difference between boxed and unboxed integers... not something that matters in many cases, but it can certainly show up in microbenchmarks)
p
I did a not very exact benchmark using JMH and all it really shows is that primitive arrays are really important if you care about speed
Copy code
MyBenchmark.quickSort2                 thrpt   25     2787.005 ±     66.497  ops/s
MyBenchmark.quickSortExtension         thrpt   25     2848.910 ±     11.635  ops/s
MyBenchmark.quickSortNonExtension      thrpt   25     3112.069 ±     45.936  ops/s
MyBenchmark.sortBoxedArrayInPlace      thrpt   25   511593.816 ±  12580.323  ops/s
MyBenchmark.sortBoxedArrayToArray      thrpt   25    11702.525 ±    112.049  ops/s
MyBenchmark.sortBoxedArrayToList       thrpt   25    12085.770 ±    360.113  ops/s
MyBenchmark.sortListInPlace            thrpt   25  1140047.824 ±  13258.572  ops/s
MyBenchmark.sortListToList             thrpt   25    12124.497 ±    165.510  ops/s
MyBenchmark.sortPrimitiveArrayInPlace  thrpt   25  4523362.961 ± 176819.205  ops/s
MyBenchmark.sortPrimitiveArrayToArray  thrpt   25    86727.839 ±   1426.855  ops/s
MyBenchmark.sortPrimitiveArrayToList   thrpt   25    10678.672 ±    120.942  ops/s
e
I don't think you're testing quite right; sort will mutate the list in-place, and you're sorting the same mutated array/list over and over
(but yes, in a straight shootout of sorting integers, primitive arrays make a big difference)
p
Ah, that's a good point. JMH does re-initialize the list a few times, but not per-operation. I don't know if I care enough to rerun, but that is definitely throwing things off
e
it won't affect the trivial quicksort implementations, but timsort will perform a single pass check and bail out early 🙂
p
updated benchmark with fresh data per call