# announcements
m
In what cases will use of a Sequence be significantly slower than use of an Iterable, given Sequence's current implementation, in which they cannot be iterated over multiple times?
c
When you run your code and it feels slow
Don’t optimize for performance too early; just write clean, readable code and address performance bottlenecks once they actually start causing issues
m
Sorry, I should clarify: this is in the context of a claim I'm trying to refute, that it always makes sense to use a Sequence instead of an Iterable for operations like map, filter, etc. due to copying costs. I'm trying to think of a case in which the overhead of lazy evaluation would surpass the copying costs. I guess just collection size would be of concern? I should be very clear that I'm not trying to micro-optimize or anything like that; I'm purely trying to further investigate the merits of each and where they have shortfalls.
c
A sequence is better when you don’t necessarily know the size before iteration (which may be infinite), or when you're dynamically generating the values. It’s also better for iterating through something like the lines of a giant file, to keep memory usage low. If what you’re trying to iterate is already fully in memory, it’s usually easier to just manage/iterate it as a List. Both APIs support the same set of operators, and over a single pass the performance between the two likely won’t vary too much. But the List implementations of the operators each create a copy of the list with the transformed values, while the Sequence does not, so if you have a lot of operators that need to be chained, you’ll have lower memory usage and faster processing with the Sequence. But even still, unless you’re talking thousands of elements or dozens of operators, the practical difference will likely be pretty minimal, and some of the Sequence operators collect the values into an intermediary list anyway, making things a bit muddier. So the main deciding factor should primarily be semantic: is the full extent of iteration known and in-memory, or dynamic?
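A minimal sketch of the copying point above (function names are my own): the eager List chain materializes an intermediate list after every operator, while the Sequence chain streams each element through all operators and allocates one list at the end.

```kotlin
// Eager chain: allocates a full intermediate list after map and another after filter.
fun eagerChain(data: List<Int>): List<Int> =
    data.map { it * 2 }
        .filter { it % 3 == 0 }
        .take(5)

// Lazy chain: no intermediate collections; elements flow through the
// operators one at a time, and only toList() allocates.
fun lazyChain(data: List<Int>): List<Int> =
    data.asSequence()
        .map { it * 2 }
        .filter { it % 3 == 0 }
        .take(5)
        .toList()

fun main() {
    val data = (1..1_000).toList()
    check(eagerChain(data) == lazyChain(data))
    println(eagerChain(data))  // [6, 12, 18, 24, 30]
}
```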
e
List operations can be inlined, while Sequence operations call through lambda objects for each step, which also affects performance (and not always in obvious ways). This also means you can return@outer and break out of List operations early, while you can't with Sequence.
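A small sketch of that inlining point (the function names are hypothetical): `map` on an Iterable is an inline function, so a bare `return` inside its lambda is a non-local return that exits the enclosing function early.

```kotlin
// Allowed: map on a List is inline, so `return` here exits firstDoubledOver itself.
fun firstDoubledOver(xs: List<Int>, limit: Int): Int {
    xs.map {
        val d = it * 2
        if (d > limit) return d  // non-local return out of the whole function
        d
    }
    return -1
}

// The same `return` inside Sequence.map would not compile, because the lambda
// is captured into an object; the early exit is expressed with a terminal
// operator instead.
fun firstDoubledOverSeq(xs: List<Int>, limit: Int): Int =
    xs.asSequence().map { it * 2 }.firstOrNull { it > limit } ?: -1
```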
n
imho it's still kind of unfortunate that Iterable operations are pretty much the default in practice. If the collection is small, sure, Iterable may be faster due to inlining, but it's rarely a big deal in most cases. If the container is big, though, then Iterable instead of Sequence rapidly becomes ridiculous and can easily be an issue, in terms of memory usage as well as perf, I'd guess.
👍 1
I end up sprinkling `asSequence` in so many places in my code, kills the elegance of things a bit
e
if it fits in nursery generation, I would usually expect inlined list operations creating new write-once objects to perform better than the indirect calls involved in sequence. but it's hard to claim wins one way or another as it depends a lot on other factors
so unless it's a performance issue, go with whatever is more natural - list if the input data is already in memory, sequence if it's not or potentially unbounded
(skip asSequence unless you measure it)
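To illustrate the "potentially unbounded" case above: `generateSequence` yields an infinite stream that only a lazy Sequence can represent; an eager List equivalent could never be materialized. (The function name is my own, for illustration.)

```kotlin
// Infinite lazy stream: 1, 2, 4, 8, ... never terminates on its own.
fun firstPowerOfTwoOver(limit: Long): Long =
    generateSequence(1L) { it * 2 }
        .first { it > limit }  // the terminal operator stops the generation

fun main() {
    println(firstPowerOfTwoOver(1_000))  // 1024
}
```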
👀 1
n
it's a nice benchmark, would be interesting to see if it's changed
doesn't cover memory usage at all though
and if you don't use sequences by default, then you still have to remember to go out of your way to use them for things like `find`. but good points, nonetheless, I admit I found the results surprising.
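A quick sketch of the `find` point: `find` itself short-circuits either way, but on a List any `map` before it runs eagerly over every element, while the `asSequence` version stops at the first match. (The counter is just instrumentation for the demo; the function names are my own.)

```kotlin
// Eager: all 100 elements are squared before find even runs.
fun eagerCalls(): Int {
    var calls = 0
    (1..100).toList().map { calls++; it * it }.find { it > 50 }
    return calls
}

// Lazy: stops at the 8th element, since 8 * 8 = 64 is the first square > 50.
fun lazyCalls(): Int {
    var calls = 0
    (1..100).asSequence().map { calls++; it * it }.find { it > 50 }
    return calls
}
```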
q
I wrote a small memory benchmark: https://gist.github.com/Quantum64/1e99643f743d20ac602e0275c793964d My results show that even for a very small data type (Integer in my case), the Iterable chain allocated a significantly greater amount of memory. Even though Iterable beats Sequence on long chains (by less than the above article shows, in my testing), it looks like most or all of that performance gain will be negated by the extra GC work.
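For reference, one way to probe per-chain allocation is the HotSpot-specific thread MX bean; this is a minimal sketch and not the method the linked gist uses, and `allocated` is a helper name I made up. It assumes a HotSpot JVM whose bean implements `com.sun.management.ThreadMXBean`.

```kotlin
import java.lang.management.ManagementFactory

// Measures bytes allocated on the current thread while running `block`.
// HotSpot-specific: relies on com.sun.management.ThreadMXBean.
fun allocated(block: () -> Unit): Long {
    val bean = ManagementFactory.getThreadMXBean() as com.sun.management.ThreadMXBean
    val id = Thread.currentThread().id
    val before = bean.getThreadAllocatedBytes(id)
    block()
    return bean.getThreadAllocatedBytes(id) - before
}

fun main() {
    val data = (1..100_000).toList()
    // Eager chain: intermediate lists plus boxing.
    val listBytes = allocated { data.map { it + 1 }.filter { it % 2 == 0 }.sum() }
    // Lazy chain: wrapper/lambda objects only, no intermediate lists.
    val seqBytes = allocated { data.asSequence().map { it + 1 }.filter { it % 2 == 0 }.sum() }
    println("list: $listBytes B, sequence: $seqBytes B")
}
```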
n
Thanks for writing here!
Pretty interesting stuff, still impressive that the performance is comparable
the increased memory usage makes sense; imho sequences are still a better default for a language (Kotlin's the first I've used where eager is the default for things like map, filter, etc)
e
as far as my experience goes, iterable is the default in more languages with these transformation methods (Perl, Ruby, Swift, Scala… well, debatable there with builders)
even with more memory usage, GC is negligible as long as you're just using the bump allocator and not performing mutations on objects that have survived a previous GC
I think the default is the right way around for the language, because sequence can't early-return and list interoperates better
may be different if Kotlin had decided to extend Iterator instead of creating an equivalent-but-separate Sequence, but I can see good reasons not to as well
n
Python, Rust, C++ ranges
perl/ruby are older but I'm surprised about Swift/Scala, TIL
e
as far as I'm aware, Perl made their decision so that
`map { print "1 $_\n" } map { print "2 $_\n" } @array`
doesn't interleave. there are user libraries that can override this (of course). dunno about why in Ruby, but it probably followed Perl's lead; it does have enumerators though (which allow for lazy iteration). I believe Swift's collection transformations are eager to avoid dealing with lifetime issues; it does have sequences like Kotlin. Scala has changed their collections design more times than I can be bothered to learn 🤷
n
hah
q
GC certainly is not negligible, especially in pause-sensitive environments. It's just something you need to consider when comparing benchmarks.
e
nursery (bump allocator, tracing collection bounded by card table) can be faster than allocating and deallocating on the heap; it only gets slower if you have too many roots, survivors to promote, or mutations in the old generation that force a larger collection. if all you have is transient immutable data, it's as fast as an arena allocator, while handling other workloads as well. G1GC is soft-realtime even with STW eden collections. of course you'll run into trouble if you're allocating faster than you can collect, but that either means the program isn't in a steady state (and it'll have to reach one, eventually…) or can be tuned better