What is the best terminal operator for a sequence if I don't kotlinlang #getting-started

Lukasz Kalnik

01/31/2023, 8:47 PM

What is the best terminal operator for a sequence if I don't care about the generated sequence (I only use some side effects which happened during sequence computation)? I used

generateSequence { /*...*/ }.forEach { _ -> }

ephemient

01/31/2023, 8:53 PM

I'd use

.count()

Casey Brooks

01/31/2023, 8:53 PM

I only use some side effects which happened during sequence computation

Do you even need a sequence at all, then? It seems like you should either apply those side effects during a

.forEach

block rather than in the generator, or else drop the sequence altogether and just run a normal loop

ephemient

01/31/2023, 8:54 PM

but that's a good point; if your side effects are attached with

.onEach {}

, then just do those in

.forEach {}

instead
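A minimal sketch of the two styles discussed above, assuming a side effect that simply records each element. The function names `collectWithOnEach` and `collectWithForEach` are hypothetical, chosen only for illustration.

```kotlin
// Side effect attached with onEach; a terminal operator such as count()
// is still needed to drive the lazy sequence.
fun collectWithOnEach(source: Sequence<Int>): List<Int> {
    val seen = mutableListOf<Int>()
    source.onEach { seen.add(it) }.count()
    return seen
}

// Same side effect, performed directly in the terminal forEach.
fun collectWithForEach(source: Sequence<Int>): List<Int> {
    val seen = mutableListOf<Int>()
    source.forEach { seen.add(it) }
    return seen
}

fun main() {
    println(collectWithOnEach(sequenceOf(1, 2, 3)))  // [1, 2, 3]
    println(collectWithForEach(sequenceOf(1, 2, 3))) // [1, 2, 3]
}
```

Both produce the same result; the second makes it explicit that the side effect is the purpose of the traversal.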

Lukasz Kalnik

02/01/2023, 8:03 AM

Good points, maybe I should just explain my use case: I have a long string, where I have kind of a 4-character substring "window" which I move through the string. I'm looking for the first occurrence where the 4-character window has all unique characters. So I have an external variable which keeps the current position of the window, which is incremented inside the sequence generator. The code is here: https://github.com/lukaszkalnik/advent-of-code-2022-day-06/blob/master/src/main/kotlin/Main.kt#L17

Lukasz Kalnik

02/01/2023, 8:05 AM

I agree that probably in this case a loop would be more suitable. I just really wanted to implement it in a more declarative way.

ephemient

02/01/2023, 8:13 AM

shouldn't you just use

.indexOfFirst { }

to find the index instead of using an external variable?

Lukasz Kalnik

02/01/2023, 8:16 AM

Yes, but then at some point the sequence has to be converted to an

Iterable

Lukasz Kalnik

02/01/2023, 8:16 AM

If the source string is very long, it will keep it all in memory

ephemient

02/01/2023, 8:17 AM

.indexOfFirst

is available on Sequence,

.windowed

shouldn't keep anything more in memory than necessary

Lukasz Kalnik

02/01/2023, 8:22 AM

Ah, that sounds amazing! That's probably exactly what I wanted. Thank you for your great help!

Lukasz Kalnik

02/01/2023, 8:37 AM

This looks much better:

val windowSize = 4

val result = message.asSequence()
    .windowed(windowSize)
    .map { chunk -> chunk.toSet().size == windowSize }
    .indexOfFirst { found -> found }

CLOVIS

02/01/2023, 9:03 AM

(tangential, but that's exactly why the terminal operator of Flow is called

collect

and the lambda is optional—but that's not really an expected pattern with sequences)

Lukasz Kalnik

02/01/2023, 9:15 AM

What do you mean?

CLOVIS

02/01/2023, 2:28 PM

Sequences are supposed to be pure, you're not really expected to have a sequence that has side effects before the terminal operator. Flows can be much more long-lived (e.g. a flow of events that lives for the entire application), so it's more OK to have something like

someFlow()
  // …
  .onEach { … }
  // …
  .collect()

Lukasz Kalnik

02/01/2023, 2:31 PM

Great, thanks for the explanation!

ephemient

02/01/2023, 2:48 PM

instead of

.map { ... }.indexOfFirst { it }

, just

.indexOfFirst { ... }

✅ 1
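Applied to the window search from earlier in the thread, the suggestion might look like the sketch below. The function name `firstDistinctWindow` and the sample input are assumptions for illustration, not taken from the linked repository.

```kotlin
// Index of the first window whose characters are all different, or -1 if none.
// The predicate goes straight into indexOfFirst, so no intermediate map is needed.
fun firstDistinctWindow(message: String, windowSize: Int = 4): Int =
    message.asSequence()
        .windowed(windowSize)
        .indexOfFirst { window -> window.toSet().size == windowSize }

fun main() {
    println(firstDistinctWindow("mjqjpqmgbljsphdztnvjfqwrcgsmlb")) // 3
    println(firstDistinctWindow("aaaa"))                           // -1
}
```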

Casey Brooks

02/01/2023, 4:16 PM

A few more notes about Sequences that may help you understand their use-case better: A Sequence is effectively a series of Flow-like operators that work on Iterables. Sequences use the

suspend

mechanism of coroutines, but are not normal `kotlinx.coroutines`; you cannot do things like

launch { }

within a sequence scope, and the whole Sequence pipeline runs synchronously. It basically does the same job as

for(x in y)

, but with a different syntax that allows for more fluent and readable processing of the elements in

y

. Furthermore, converting something that’s already in memory to a Sequence doesn’t automatically make it more efficient. You’d need the source itself to emit values lazily in a

sequence { }

or

generateSequence { }

block to actually process a large data set lazily with a roughly-constant amount of memory. In your case, a String is already fully in-memory, so windowing it and processing it as a Sequence wouldn’t really make it process more efficiently than a normal

for

loop if it’s really large, since the whole String is already loaded into memory. What you’d need to do is read the “window” from the source file, rather than reading a line into memory and windowing that. And there’s a difference between the use-case of

sequence { }

and

generateSequence { }

.

generateSequence { }

is more for generating a “mathematical” sequence, where each element is strictly a function of the one before it (

generateSequence(7) { it + 2 }

produces

[7, 9, 11, …]

). It basically handles the iteration internally, and you just provide a function to derive one value from the previous one. If you’re looking to do more standard iteration, then

sequence { }

is the one you should be using instead, which lets you directly yield values to the sequence from within the block using whatever iteration logic you need. But in both cases, all the logic needed should be contained within the sequence’s lambda, not modifying values from outside it. As Ivan mentioned, they should be pure functions in the Functional Programming sense.

🚫 1

🙏 1

👍 1
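A rough sketch of the difference between the two builders described above. The names `oddsFrom` and `windowStarts` are hypothetical; the point is only that `generateSequence` derives each value from the previous one, while `sequence { }` keeps whatever iteration state it needs inside the block.

```kotlin
// generateSequence: each element is a function of the one before it.
fun oddsFrom(seed: Int): Sequence<Int> = generateSequence(seed) { it + 2 }

// sequence { }: arbitrary iteration logic, with the state kept local to the block
// instead of in an external variable.
fun windowStarts(text: String, size: Int): Sequence<String> = sequence {
    var start = 0
    while (start + size <= text.length) {
        yield(text.substring(start, start + size))
        start++
    }
}

fun main() {
    println(oddsFrom(7).take(3).toList())      // [7, 9, 11]
    println(windowStarts("abcde", 4).toList()) // [abcd, bcde]
}
```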

CLOVIS

02/01/2023, 4:20 PM

One more note: Sequences do not know when they are closed (Flows do); there is no

.onComplete

on Sequence. It's important because it means a Sequence cannot free resources when it's done, flows can. E.g. when using #kmongo, iterating over a request with a Sequence forces the entire request to be loaded in memory so the cursor can be closed. Meanwhile, iterating with a flow is truly lazy (only the requested values are read, and the cursor is closed when a terminal operation finishes)

🙏 1
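The point about resources can be illustrated with a small sketch. `FakeCursor` and `rows` are hypothetical stand-ins, not part of KMongo or any real driver: a `finally` block inside `sequence { }` runs only if the consumer iterates to the end, so a sequence that is abandoned early never gets the chance to clean up.

```kotlin
// Hypothetical stand-in for a resource such as a database cursor.
class FakeCursor {
    var closed = false
        private set

    fun close() {
        closed = true
    }
}

fun rows(cursor: FakeCursor): Sequence<Int> = sequence {
    try {
        yield(1)
        yield(2)
        yield(3)
    } finally {
        // Reached only when the sequence is iterated to completion.
        cursor.close()
    }
}

fun main() {
    val abandoned = FakeCursor()
    rows(abandoned).first()    // consumer stops after the first element
    println(abandoned.closed)  // false

    val drained = FakeCursor()
    rows(drained).toList()     // consumer reads everything
    println(drained.closed)    // true
}
```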

Lukasz Kalnik

02/01/2023, 7:05 PM

Yes, it would be definitely interesting to implement windowed reading from the file as well. I suppose using the okio library I could set some buffer and read to it, and then process the data with a sequence. I didn't want to use a

for

loop here, because I wanted to solve the task in a more declarative/functional way. And I wanted to learn something about sequences. And thanks to all of you helpful people, I learned even more than I anticipated. 🙂 Thank you again for being so helpful.

Lukasz Kalnik

02/01/2023, 8:00 PM

BTW I just discovered

String.windowedSequence()

which is again one less call in my solution 🙂

ephemient

02/01/2023, 8:08 PM

(you could also do something like

fun <T> Iterable<T>.allDistinct(): Boolean = all(mutableSetOf<T>()::add)
fun CharSequence.allDistinct(): Boolean = all(mutableSetOf<Char>()::add)

etc. instead of checking

.size

, but that's sorta beside the point here)

Lukasz Kalnik

02/02/2023, 7:46 AM

Yes, that is actually even faster (

O(n)

instead of

O(nlogn)

when just converting to a HashSet).

ephemient

02/02/2023, 9:43 PM

.toSet()

is O(n), where's the O(nlogn) from? I just brought it up as it can stop as soon as it finds a single duplicate and doesn't require knowing the size (e.g. it works on

Iterable

)
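Putting the later suggestions together, a sketch of the search using `String.windowedSequence()` and the `allDistinct()` helper from above. The wrapper name `firstMarkerIndex` and the sample inputs are assumptions for illustration.

```kotlin
// From the thread: `add` returns false on the first repeated character,
// so `all` stops as soon as a duplicate is found.
fun CharSequence.allDistinct(): Boolean = all(mutableSetOf<Char>()::add)

// Index of the first window with no repeated characters, or -1 if there is none.
fun firstMarkerIndex(message: String, windowSize: Int = 4): Int =
    message.windowedSequence(windowSize).indexOfFirst { it.allDistinct() }

fun main() {
    println(firstMarkerIndex("mjqjpqmgbljsphdztnvjfqwrcgsmlb")) // 3
    println(firstMarkerIndex("abab"))                           // -1
}
```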
