# getting-started
l
What is the best terminal operator for a sequence if I don't care about the generated sequence (I only use some side effects which happened during sequence computation)? I used `generateSequence { /*...*/ }.forEach { _ -> }`
e
I'd use `.count()`
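Here is a small sketch of why `.count()` works as a "just run it" terminal operator. Building the pipeline runs nothing, and `count()` pulls every element through. `numbersLoggingTo` and its `log` list are hypothetical names used only for illustration.

```kotlin
// Hypothetical helper: a finite sequence 1..5 that logs each element as it is produced.
fun numbersLoggingTo(log: MutableList<Int>): Sequence<Int> =
    generateSequence(1) { if (it < 5) it + 1 else null } // 1, 2, 3, 4, 5, then stops
        .onEach { log += it }                            // side effect, runs only when pulled

fun main() {
    val log = mutableListOf<Int>()
    val numbers = numbersLoggingTo(log)
    println(log)             // [] (nothing has run yet)
    println(numbers.count()) // 5 (count() pulls every element through)
    println(log)             // [1, 2, 3, 4, 5]
}
```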
c
> I only use some side effects which happened during sequence computation
Do you even need a sequence at all, then? It seems like you should either apply those side effects during a `.forEach` block rather than in the generator, or else drop the sequence altogether and just run a normal loop
e
But that's a good point; if your side effects are attached with `.onEach {}`, then just do those in `.forEach {}` instead
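A minimal sketch of that suggestion: the same side effect, once in `onEach` (drained by `count()`) and once moved into the terminal `forEach`. Both run it once per element. The function names and the `* 10` side effect are hypothetical.

```kotlin
// Two placements of the same hypothetical side effect; both run it once per element.
fun viaOnEach(source: Sequence<Int>, log: MutableList<Int>) {
    source.onEach { log += it * 10 }.count() // side effect in onEach, count() only drains
}

fun viaForEach(source: Sequence<Int>, log: MutableList<Int>) {
    source.forEach { log += it * 10 }        // side effect lives in the terminal operation
}
```

The `forEach` version says what it means: the side effect is the point, so it belongs in the terminal operation.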
l
Good points, maybe I should just explain my use case: I have a long string, where I have kind of a 4-character substring "window" which I move through the string. I'm looking for the first occurrence where the 4-character window has all unique characters. So I have an external variable which keeps the current position of the window, which is incremented inside the sequence generator. The code is here: https://github.com/lukaszkalnik/advent-of-code-2022-day-06/blob/master/src/main/kotlin/Main.kt#L17
I agree that probably in this case a loop would be more suitable. I just really wanted to implement it in a more declarative way.
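For comparison, a plain loop that avoids an external mutable position variable might look like this. It is only a sketch (not the code in the linked repo). `firstDistinctWindowLoop` is a hypothetical name, and it returns the start index of the first all-distinct window, or -1 if there is none.

```kotlin
// Plain-loop sketch: scan every window start and return the first all-distinct one.
fun firstDistinctWindowLoop(message: String, windowSize: Int = 4): Int {
    for (start in 0..message.length - windowSize) { // empty range if message is too short
        val window = message.substring(start, start + windowSize)
        if (window.toSet().size == windowSize) return start
    }
    return -1
}
```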
e
Shouldn't you just use `.indexOfFirst { }` to find the index instead of using an external variable?
l
Yes, but then at some point the sequence has to be converted to an `Iterable`.
If the source string is very long, it will keep it all in memory
e
`.indexOfFirst` is available on `Sequence`, and `.windowed` shouldn't keep anything more in memory than necessary
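One way to convince yourself that `windowed` and `indexOfFirst` on a `Sequence` are lazy is that they even work on an infinite source. Only roughly one window's worth of elements is buffered at a time. The function name and the "sum over a limit" condition below are hypothetical, chosen just to make the result easy to check.

```kotlin
// Sketch: windowed() + indexOfFirst() on an infinite Sequence still terminate,
// because elements are pulled one at a time until the predicate matches.
fun firstWindowWithSumOver(limit: Int, windowSize: Int = 3): Int =
    generateSequence(0) { it + 1 }       // 0, 1, 2, ... (infinite)
        .windowed(windowSize)            // [0, 1, 2], [1, 2, 3], ...
        .indexOfFirst { it.sum() > limit }
```

With `windowSize = 3`, window `i` sums to `3i + 3`, so `firstWindowWithSumOver(30)` returns 10.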
l
Ah, that sounds amazing! That's probably exactly what I wanted. Thank you for your great help!
This looks much better:
```kotlin
val windowSize = 4

// `message` is the puzzle input String, read elsewhere in the program
val result = message.asSequence()
    .windowed(windowSize)
    .map { chunk -> chunk.toSet().size == windowSize }
    .indexOfFirst { found -> found }
```
c
(Tangential, but that's exactly why the terminal operator of Flow is called `collect` and the lambda is optional. That's not really an expected pattern with sequences, though.)
l
What do you mean?
c
Sequences are supposed to be pure, you're not really expected to have a sequence that has side effects before the terminal operator. Flows can be much more long-lived (e.g. a flow of events that lives for the entire application), so it's more OK to have something like
```kotlin
someFlow()
  // …
  .onEach { … }
  // …
  .collect()
```
l
Great, thanks for the explanation!
e
Instead of `.map { ... }.indexOfFirst { it }`, just `.indexOfFirst { ... }`
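A sketch of that simplification, folding the `map { }` predicate directly into `indexOfFirst { }`. `firstMarkerIndex` is a hypothetical name, and the result is the start index of the first window (or -1).

```kotlin
// The predicate from map { } moves straight into indexOfFirst { }.
fun firstMarkerIndex(message: String, windowSize: Int = 4): Int =
    message.asSequence()
        .windowed(windowSize)
        .indexOfFirst { chunk -> chunk.toSet().size == windowSize }
```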
c
A few more notes about Sequences that may help you understand their use case better: A Sequence is effectively a series of Flow-like operators that work on Iterables. Sequences (specifically the `sequence { }` builder) use the `suspend` mechanism of coroutines, but they are not normal `kotlinx.coroutines`; you cannot do things like `launch { }` within a sequence scope, and the whole Sequence pipeline runs synchronously. It basically does the same job as `for (x in y)`, but with a different syntax that allows for more fluent and readable processing of the elements in `y`.

Furthermore, converting something that’s already in memory to a Sequence doesn’t automatically make it more efficient. You’d need the source itself to emit values lazily in a `sequence { }` or `generateSequence { }` block to actually process a large data set lazily with a roughly constant amount of memory. In your case, a String is already fully in memory, so windowing it and processing it as a Sequence wouldn’t make it any more efficient than a normal `for` loop, even if it’s really large, since the whole String is already loaded. What you’d need to do is read the “window” from the source file, rather than reading a line into memory and windowing that.

And there’s a difference between the use cases of `sequence { }` and `generateSequence { }`.
`generateSequence { }` is more for generating a “mathematical” sequence, where each element is strictly a function of the one before it (`generateSequence(7) { it + 2 }` produces `[7, 9, 11, …]`). It basically handles the iteration internally, and you just provide a function to derive one value from the previous one. If you’re looking to do more standard iteration, then `sequence { }` is the one you should be using instead; it lets you yield values directly to the sequence from within the block, using whatever iteration logic you need. But in both cases, all the logic needed should be contained within the sequence’s lambda, not modifying values from outside it. As Ivan mentioned, they should be pure functions in the Functional Programming sense.
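A short sketch contrasting the two builders. `generateSequence` derives each element from the previous one. `sequence { }` runs arbitrary iteration logic and `yield`s values, with its state kept inside the lambda. `oddsFrom7` and `windowsOf` are hypothetical names for illustration.

```kotlin
// generateSequence: each element is a pure function of the previous one.
fun oddsFrom7(): Sequence<Int> = generateSequence(7) { it + 2 }

// sequence { }: custom iteration logic; yields every `size`-character window of `text`.
// The loop state (`start`) lives inside the builder, not in an outside variable.
fun windowsOf(text: String, size: Int): Sequence<String> = sequence {
    var start = 0
    while (start + size <= text.length) {
        yield(text.substring(start, start + size))
        start++
    }
}

fun main() {
    println(oddsFrom7().take(3).toList())       // [7, 9, 11]
    println(windowsOf("abcdef", 4).toList())    // [abcd, bcde, cdef]
}
```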
c
One more note: Sequences do not know when they are finished (Flows do); there is no equivalent of Flow's `.onCompletion` on Sequence. This matters because a Sequence cannot free resources when it's done, whereas a Flow can. For example, when using #kmongo, iterating over a request with a Sequence forces the entire request to be loaded into memory so the cursor can be closed. Meanwhile, iterating with a Flow is truly lazy (only the requested values are read, and the cursor is closed when a terminal operation finishes)
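Because a Sequence has no completion hook, one common workaround is to scope the resource around the terminal operation. The Kotlin stdlib's `Reader.useLines { }` closes the reader when the block returns, even if the sequence was only partly consumed. The sequence must be used up inside the block, since returning `lines` itself would hand back a sequence over a closed reader. `TrackingReader` and `firstLineStartingWith` are hypothetical helpers, included only to make the closing observable.

```kotlin
import java.io.Reader
import java.io.StringReader

// Hypothetical reader that records whether close() was called.
class TrackingReader(text: String) : StringReader(text) {
    var closed = false
        private set

    override fun close() {
        closed = true
        super.close()
    }
}

// useLines { } closes the reader when the block returns, even though
// firstOrNull stops iterating as soon as it finds a match.
fun firstLineStartingWith(reader: Reader, prefix: String): String? =
    reader.useLines { lines -> lines.firstOrNull { it.startsWith(prefix) } }
```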
l
Yes, it would definitely be interesting to implement windowed reading from the file as well. I suppose using the okio library I could set up a buffer, read into it, and then process the data with a sequence. I didn't want to use a `for` loop here because I wanted to solve the task in a more declarative/functional way. And I wanted to learn something about sequences. And thanks to all of you helpful people, I learned even more than I anticipated. 🙂 Thank you again for being so helpful.
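Here is a sketch of the "read the window from the source" idea. It uses plain `java.io.Reader` rather than okio (a swapped-in choice, not what was proposed above). Characters are read lazily one at a time, so the whole input never has to be held in memory. `charsAsSequence` and `firstMarkerIndexIn` are hypothetical helpers. For a real file you could pass something like `File(path).reader()` instead of a `StringReader`.

```kotlin
import java.io.Reader
import java.io.StringReader

// Hypothetical helper: expose a Reader as a lazy Sequence<Char>.
// read() is only called when a downstream operator asks for the next element.
fun Reader.charsAsSequence(): Sequence<Char> = sequence {
    while (true) {
        val code = this@charsAsSequence.read() // -1 signals end of stream
        if (code == -1) break
        yield(code.toChar())
    }
}

// Find the first all-distinct window while reading, then close the reader.
fun firstMarkerIndexIn(reader: Reader, windowSize: Int = 4): Int =
    reader.buffered().use { input ->
        input.charsAsSequence()
            .windowed(windowSize)
            .indexOfFirst { window -> window.toSet().size == windowSize }
    }

fun main() {
    println(firstMarkerIndexIn(StringReader("mjqjpqmgbljsphdztnvjfqwrcgsmlb"))) // prints 3
}
```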
BTW I just discovered `String.windowedSequence()`, which is again one less call in my solution 🙂
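With `windowedSequence`, which is defined on `CharSequence` and so works on `String`, each window arrives as a `String`. That means the `asSequence().windowed()` pair collapses into one call. `startOfFirstDistinctWindow` is a hypothetical name for this sketch.

```kotlin
// windowedSequence() yields each window as a String, lazily.
fun startOfFirstDistinctWindow(message: String, windowSize: Int = 4): Int =
    message.windowedSequence(windowSize)
        .indexOfFirst { window -> window.toSet().size == windowSize }
```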
e
(You could also do something like
```kotlin
fun <T> Iterable<T>.allDistinct(): Boolean = all(mutableSetOf<T>()::add)
fun CharSequence.allDistinct(): Boolean = all(mutableSetOf<Char>()::add)
```
etc. instead of checking `.size`, but that's sorta beside the point here)
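A sketch of how the `allDistinct()` extension from the thread could plug into the window search. `MutableSet.add` returns `false` for an element that is already present, so `all` stops at the first repeat. `firstDistinctWindow` is a hypothetical name.

```kotlin
// From the thread: all() short-circuits as soon as add() reports a duplicate.
fun CharSequence.allDistinct(): Boolean = all(mutableSetOf<Char>()::add)

fun firstDistinctWindow(message: String, windowSize: Int = 4): Int =
    message.windowedSequence(windowSize).indexOfFirst { it.allDistinct() }
```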
l
Yes, that is actually even faster (`O(n)` instead of `O(n log n)` when just converting to a HashSet).
e
`.toSet()` is O(n), so where's the O(n log n) from? I just brought it up because it can stop as soon as it finds a single duplicate and doesn't require knowing the size (e.g. it works on `Iterable`)
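To illustrate both properties (early exit, and not needing a size), here is a hypothetical `Sequence` overload of the same idea run against an infinite source. It stops after the first repeated element, while a `toSet().size` check could never finish on this input. `cycling` is a made-up example source.

```kotlin
// Hypothetical Sequence overload: all() stops at the first element add() rejects.
fun <T> Sequence<T>.allDistinct(): Boolean = all(mutableSetOf<T>()::add)

// Infinite source cycling 0, 1, 2, 3, 4, 0, 1, ...
fun cycling(): Sequence<Int> = generateSequence(0) { (it + 1) % 5 }
```

Here `cycling().allDistinct()` returns `false` after pulling six elements (the five distinct values plus the first repeat).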