So just out of curiosity I wanted to test how long a simple kotlinlang #announcements

So just out of curiosity I wanted to test how long...

Michael Pohl

07/12/2021, 6:29 PM

So just out of curiosity I wanted to test how long a simple

QuickSort

implementation takes vs. Kotlin's own

List<T>.sorted()

. While I was on it, I also tested variations of my implementation as an extension function and without explicitly storing values in the function. And I noticed they are very different in how fast they are processed. What is causing this? See thread ->

Michael Pohl

07/12/2021, 6:31 PM

So I had four different options to sort a random list of integers: 1. Kotlin's own

myList.sorted()

2. My basic implementation

Copy code

fun quickSort(input: List<Int>): List<Int> {
    if (input.size < 2) return input
    val pivot = input[input.size / 2]
    val smaller = input.filter { it < pivot }
    val equal = input.filter { it == pivot }
    val bigger = input.filter { it > pivot }
    return quickSort(smaller) + equal + quickSort(bigger)
}

3. The same as an extension

Copy code

fun List<Int>.quickSortExtension(): List<Int> {
    if (this.size < 2) return this
    val pivot = this[this.size / 2]
    val smaller = this.filter { it < pivot }
    val equal = this.filter { it == pivot }
    val bigger = this.filter { it > pivot }
    return smaller.quickSortExtension() + equal + bigger.quickSortExtension()
}

4. Without explicitly storing the vals

Copy code

fun quickSort2(input: List<Int>): List<Int> {
    if (input.size < 2) return input
    val pivot = input[input.size / 2]
    return quickSort2(
    input.filter { it < pivot }) + 
    input.filter { it == pivot } + 
    quickSort2(input.filter { it > pivot })
}

Now what I found curious was that they take extremely differnet times amounts to complete. I measured them in a very simple way:

Copy code

val result = measureNanoTime {
    [myQuickSortFunction(myRandomArray)]
}/1000 // returns microseconds

So for an array with 1000 elements the average times in microseconds are like this: 1. Kotlin: 1800 2. No-val QuickSort: 3400 3. Basic QuickSort: 3600 4. Extension QuickSort: 13000 While the difference between no.2 and 3 might be negligible, it is still there - where does it come from? And how can it be explained that the extension function is so much slower - it takes more than 3 times as long? And what does the stdlib

.sorted()

do different? Now to make this even more interesting (to me at least 🙂 ), when testing the same methods with an array that just contains 10 items, the results are very different: 1. Basic QuickSort: 85 2. No-val QuickSort: 95 3. Kotlin: 650 4. Extension QuickSort: 6500 What causes these differences? Especially, why is Kotlin's standard sorting so much slower for small arrays than it is for larger ones? And what causes the extension function to be so much slower? Does that mean there is a general performance impact when using extensions vs. "regular" functions?

andylamax

07/12/2021, 9:04 PM

well, this is interesting

Nir

07/13/2021, 12:09 AM

Keep in mind that your implementations are not actually quick sort

Nir

07/13/2021, 12:09 AM

Quick sort is in-place

👍 2

Nir

07/13/2021, 12:11 AM

it's hard to say what exactly is safe to conclude from these benchmarks, if you want to show that extensions are slower I would try to just measure their overhead without anything else

Nir

07/13/2021, 12:11 AM

And then for a recursive function

Nir

07/13/2021, 12:15 AM

You should probably also provide the full code you used to do the benchmarks. It's incredibly easy to make mistakes when micro benchmarking

☝️ 1

Nir

07/13/2021, 12:15 AM

Not that I personally could spot any said mistakes for a JVM languy but I'm sure other people here could

pniederw

07/13/2021, 12:24 AM

The above code is very inefficient in that every

filter()

and

creates a new list. Another difference is that Kotlin's

sorted()

uses Java's Timsort, which can be significantly faster than QuickSort. To compare the performance of sorting algorithms, consider sorting arrays (see Kotlin's

Array.sort()

). Always use JMH for JVM microbenchmarks.

Dave K

07/13/2021, 1:32 AM

Some sorting algorithms are faster on small inputs and worse with larger or vice versa and may contain overhead for setting up. Some are better when the data is fairly sorted already but less well if they’re random. Hybrid algorithms tend to cover this nuance. The majority of the time, trying to hyper optimize something like sort is meaningless in the context of your code; consider Amdahl’s law. Or the saying that premature optimization is the root of all evil (Donald Knuth). I wouldn’t worry about the speed of each implementation too much until you know it’s a bottleneck. And lastly, if you know what you need for the input you’re expecting (or if you know the data is largely pre sorted versus random), it’s better to use the algorithm you need. The default implementation Kotlin provides A) may change and B) is suitable when you don’t care exactly how it is implemented (I believe the contract is only that it’s stable but I could be wrong). Consider the same for listOf - this is currently an array list (and may change). If you know you’ll be inserting a lot of items in the middle of your list, you might choose to explicitly use a linked list. If you know you’ll be doing a lot of random accesses, you may choose to use an array list instead (explicitly since Kotlin can change its current implementation at any time)

Michael Pohl

07/13/2021, 5:03 AM

Thanks for enlightening me! Of course I am aware that neither my implementations nor my measuring are rather sophisticated...performance does not play a big role in my day to day so I usually don't benchmark things. I just did this out of interest and apparently didn't put much thought into it. Not planning to optimize anything actually, just looking around. Still curious though about the huge difference the extension function produces...

pniederw

07/13/2021, 5:26 AM

Best write a JMH benchmark to see if the difference is real.

ephemient

07/13/2021, 5:43 AM

right, JMH or other benchmarking systems are designed to handle JVM behavior, such as warmup, GC, and blackholing (so that the optimizer doesn't just strip out all your code)

ephemient

07/13/2021, 5:45 AM

Kotlin's

.sorted()

copies to an Array, which can be accessed via bytecode insructions rather than List's `get()`/`set()`, uses Java's built-in sort on that (which is likely timsort under the covers), and copies back to a List

ephemient

07/13/2021, 5:46 AM

which is 2 * n element copies while your implementations are approximately n^2 log n element copies

ephemient

07/13/2021, 5:53 AM

timsort is like a stable in-place mergesort variant with heuristics that optimizes for minimizing comparison operations on inputs that are already somewhat structured... in the end it's O(n log n) like other comparison-based sorting methods, but it's effectively O(n) when you're just re-sorting a pre-sorted list after inserting a single element, or other common scenarios

ephemient

07/13/2021, 5:55 AM

note that you will get different timings again if you sort IntArray instead of Array<Int>, as that uses yet a different algorithm under the covers, since there's no need to use a stable sort on value types

ephemient

07/13/2021, 5:57 AM

(not to mention the difference between boxed and unboxed integers... not something that matters in many cases, but it can certainly show up in microbenchmarks)

Paul Griffith

07/13/2021, 8:45 PM

I did a not very exact benchmark using JMH and all it really shows is that primitive arrays are really important if you care about speed

benchmark.kt

Paul Griffith

07/13/2021, 8:45 PM

Copy code

MyBenchmark.quickSort2                 thrpt   25     2787.005 ±     66.497  ops/s
MyBenchmark.quickSortExtension         thrpt   25     2848.910 ±     11.635  ops/s
MyBenchmark.quickSortNonExtension      thrpt   25     3112.069 ±     45.936  ops/s
MyBenchmark.sortBoxedArrayInPlace      thrpt   25   511593.816 ±  12580.323  ops/s
MyBenchmark.sortBoxedArrayToArray      thrpt   25    11702.525 ±    112.049  ops/s
MyBenchmark.sortBoxedArrayToList       thrpt   25    12085.770 ±    360.113  ops/s
MyBenchmark.sortListInPlace            thrpt   25  1140047.824 ±  13258.572  ops/s
MyBenchmark.sortListToList             thrpt   25    12124.497 ±    165.510  ops/s
MyBenchmark.sortPrimitiveArrayInPlace  thrpt   25  4523362.961 ± 176819.205  ops/s
MyBenchmark.sortPrimitiveArrayToArray  thrpt   25    86727.839 ±   1426.855  ops/s
MyBenchmark.sortPrimitiveArrayToList   thrpt   25    10678.672 ±    120.942  ops/s

ephemient

07/13/2021, 8:59 PM

I don't think you're testing quite right; sort will mutate the list in-place, and you're sorting the same mutated array/list over and over

ephemient

07/13/2021, 9:00 PM

(but yes, in a straight shootout of sorting integers, primitive arrays make a big difference)

Paul Griffith

07/13/2021, 9:02 PM

Ah, that's a good point. JMH does re-initialize the list a few times, but not per-operation. I don't know if I care enough to rerun, but that is definitely throwing things off

ephemient

07/13/2021, 9:04 PM

it won't affect the trivial quicksort implementations, but timsort will perform a single pass check and bail out early 🙂

Paul Griffith

07/13/2021, 11:11 PM

updated benchmark with fresh data per call

Untitled

3 Views

Open in Slack

Previous Next