# announcements
m
In what cases will use of a Sequence be significantly slower than use of an Iterable, given Sequence's current implementation, in which they cannot be iterated over multiple times?
c
When you run your code and it feels slow
Don’t optimize for performance too early; just write clean, readable code and address performance bottlenecks once they actually start causing issues
m
Sorry, I should clarify: this is in the context of a claim I'm trying to refute, that it always makes sense to use a Sequence instead of an Iterable for operations like map, filter, etc. due to copying costs. I'm trying to think of a case in which the overhead of lazy evaluation would surpass the copying costs. I guess just collection size would be of concern? I should be very clear that I'm not trying to micro-optimize or anything like that; I'm purely trying to further investigate the merits of each and where they have shortfalls.
c
A sequence is better when you don’t necessarily know the size before iteration (which may be infinite), or when you're dynamically generating the values. It’s also better for iterating through something like the lines of a giant file, to keep memory usage low. If what you’re trying to iterate is already fully in memory, it’s usually easier to just manage/iterate it as a List. Both APIs support the same set of operators, and over a single pass the performance between the two likely won’t vary too much. But the List implementations of the operators each create a copy of the list with the transformed values, while the Sequence does not, so if you have a lot of operators that need to be chained, you’ll have lower memory usage and faster processing with the Sequence. But even still, unless you’re talking thousands of elements or dozens of operators, the practical difference will likely be pretty minimal, and some of the Sequence operators collect the values into an intermediary list anyway, making things a bit muddier. So the main deciding factor should primarily be semantic: is the full extent of iteration known and in-memory, or dynamic?
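A minimal sketch of the copying point above (function names are my own): the eager List chain materializes an intermediate list after every operator, while the Sequence chain streams each element through all operators and allocates one list at the end.

```kotlin
// Eager chain: allocates a full intermediate list after map and another after filter.
fun eagerChain(data: List<Int>): List<Int> =
    data.map { it * 2 }
        .filter { it % 3 == 0 }
        .take(5)

// Lazy chain: no intermediate collections; elements flow through the
// operators one at a time, and only toList() allocates.
fun lazyChain(data: List<Int>): List<Int> =
    data.asSequence()
        .map { it * 2 }
        .filter { it % 3 == 0 }
        .take(5)
        .toList()

fun main() {
    val data = (1..1_000).toList()
    check(eagerChain(data) == lazyChain(data))
    println(eagerChain(data))  // [6, 12, 18, 24, 30]
}
```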
e
List operations can be inlined, while Sequence operations call through lambda objects for each step, which also affects performance (and not always in obvious ways). This also means you can return@outer and break out of List operations early, while you can't with Sequence.
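A small sketch of that inlining point (the function names are hypothetical): `map` on an Iterable is an inline function, so a bare `return` inside its lambda is a non-local return that exits the enclosing function early.

```kotlin
// Allowed: map on a List is inline, so `return` here exits firstDoubledOver itself.
fun firstDoubledOver(xs: List<Int>, limit: Int): Int {
    xs.map {
        val d = it * 2
        if (d > limit) return d  // non-local return out of the whole function
        d
    }
    return -1
}

// The same `return` inside Sequence.map would not compile, because the lambda
// is captured into an object; the early exit is expressed with a terminal
// operator instead.
fun firstDoubledOverSeq(xs: List<Int>, limit: Int): Int =
    xs.asSequence().map { it * 2 }.firstOrNull { it > limit } ?: -1
```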
n
imho it's still kind of unfortunate that Iterable operations are pretty much the default in practice. If the collection is small, sure, Iterable may be faster due to inlining, but it's rarely a big deal in most cases. If the container is big, though, then Iterable instead of Sequence rapidly becomes ridiculous and can easily be an issue, in terms of memory usage as well as perf, I'd guess.
👍 1
I end up sprinkling `asSequence` in so many places in my code, kills the elegance of things a bit
e
if it fits in nursery generation, I would usually expect inlined list operations creating new write-once objects to perform better than the indirect calls involved in sequence. but it's hard to claim wins one way or another as it depends a lot on other factors
so unless it's a performance issue, go with whatever is more natural - list if the input data is already in memory, sequence if it's not or potentially unbounded
(skip asSequence unless you measure it)
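To illustrate the "potentially unbounded" case above: `generateSequence` yields an infinite stream that only a lazy Sequence can represent; an eager List equivalent could never be materialized. (The function name is my own, for illustration.)

```kotlin
// Infinite lazy stream: 1, 2, 4, 8, ... never terminates on its own.
fun firstPowerOfTwoOver(limit: Long): Long =
    generateSequence(1L) { it * 2 }
        .first { it > limit }  // the terminal operator stops the generation

fun main() {
    println(firstPowerOfTwoOver(1_000))  // 1024
}
```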
👀 1
n
it's a nice benchmark, would be interesting to see if it's changed
doesn't cover memory usage at all though
and if you don't use sequences by default, then you still have to remember to go out of your way to use them for things like `find`. but good points, nonetheless, I admit I found the results surprising.
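A quick sketch of the `find` point: `find` itself short-circuits either way, but on a List any `map` before it runs eagerly over every element, while the `asSequence` version stops at the first match. (The counter is just instrumentation for the demo; the function names are my own.)

```kotlin
// Eager: all 100 elements are squared before find even runs.
fun eagerCalls(): Int {
    var calls = 0
    (1..100).toList().map { calls++; it * it }.find { it > 50 }
    return calls
}

// Lazy: stops at the 8th element, since 8 * 8 = 64 is the first square > 50.
fun lazyCalls(): Int {
    var calls = 0
    (1..100).asSequence().map { calls++; it * it }.find { it > 50 }
    return calls
}
```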
q
I wrote a small memory benchmark: https://gist.github.com/Quantum64/1e99643f743d20ac602e0275c793964d My results show that even for a very small data type (Integer in my case), the Iterable chain allocated a significantly greater amount of memory. Even though Iterable beats Sequence on long chains (by less than the above article shows, in my testing), it looks like most or all of that performance gain will be negated by the extra GC work.
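For reference, one way to probe per-chain allocation is the HotSpot-specific thread MX bean; this is a minimal sketch and not the method the linked gist uses, and `allocated` is a helper name I made up. It assumes a HotSpot JVM whose bean implements `com.sun.management.ThreadMXBean`.

```kotlin
import java.lang.management.ManagementFactory

// Measures bytes allocated on the current thread while running `block`.
// HotSpot-specific: relies on com.sun.management.ThreadMXBean.
fun allocated(block: () -> Unit): Long {
    val bean = ManagementFactory.getThreadMXBean() as com.sun.management.ThreadMXBean
    val id = Thread.currentThread().id
    val before = bean.getThreadAllocatedBytes(id)
    block()
    return bean.getThreadAllocatedBytes(id) - before
}

fun main() {
    val data = (1..100_000).toList()
    // Eager chain: intermediate lists plus boxing.
    val listBytes = allocated { data.map { it + 1 }.filter { it % 2 == 0 }.sum() }
    // Lazy chain: wrapper/lambda objects only, no intermediate lists.
    val seqBytes = allocated { data.asSequence().map { it + 1 }.filter { it % 2 == 0 }.sum() }
    println("list: $listBytes B, sequence: $seqBytes B")
}
```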
n
Thanks for writing here!
Pretty interesting stuff, still impressive that the performance is comparable
the increased memory usage makes sense; imho sequences are still a better default for a language (Kotlin's the first I've used where eager is the default for things like map, filter, etc)
e
as far as my experience goes, iterable is the default in more languages with these transformation methods (Perl, Ruby, Swift, Scala… well, debatable there with builders)
even with more memory usage, GC is negligible as long as you're just using the bump allocator and not performing mutations on objects that have survived a previous GC
I think the default is the right way around for the language, because sequence can't early-return and list interoperates better
may be different if Kotlin had decided to extend Iterator instead of creating an equivalent-but-separate Sequence, but I can see good reasons not to as well
n
Python, Rust, C++ ranges
perl/ruby are older but I'm surprised about Swift/Scala, TIL
e
as far as I'm aware, Perl made their decision so that
`map { print "1 $_\n" } map { print "2 $_\n" } @array`
doesn't interleave. there are user libraries that can override this (of course). dunno about why in Ruby, but it probably followed Perl's lead; it does have enumerators though (which allow for lazy iteration). I believe Swift's collection transformations are eager to avoid dealing with lifetime issues; it does have sequences like Kotlin. Scala has changed their collections design more times than I can be bothered to learn 🤷
n
hah
q
GC certainly is not negligible, especially in pause-sensitive environments. It's just something you need to consider when comparing benchmarks.
e
nursery (bump allocator, tracing collection bounded by card table) can be faster than allocating and deallocating on the heap; it only gets slower if you have too many roots, survivors to promote, or mutations in the old generation that force a larger collection. if all you have is transient immutable data, it's as fast as an arena allocator, while handling other workloads as well. G1GC is soft-realtime even with STW eden collections. of course you'll run into trouble if you're allocating faster than you can collect, but that either means the program isn't in a steady state (and it'll have to reach one, eventually…) or can be tuned better