# coroutines
m
Not sure if this is the right place to ask, but is there a reason why flows have an imperative interface for generating them, but no imperative interface for consuming them, like you have with channels or iterators? Or did I just not find it? I understand the benefits in readability etc. of using flow combinators where possible, but it seems to me there are certain parsing-type problems which are just not easily / readably expressible with the current flow combinators. Here https://pl.kotl.in/G07SkQwGw is a playground where I've implemented something which preserves the cold-by-default behaviour of flows while allowing you to consume and transform flows imperatively. Code sample:
```kotlin
// f1 is a flow of somewhat random numbers, no deeper meaning
val f1: Flow<Int> = flowOf(1, 1, 2, 3, 5, 8, 3, 1, 4, 5, 9, 4, 3, 7, 0, 7, 7, 4, 1, 5)
/*
Let's imagine that we are told f1 consists of "packets", which start with a length field followed by as many
numbers as indicated in the length field. We want to transform f1 into a flow where each element is the sum of
the numbers in the "payload" of one packet.
*/
runBlocking {
    println("packet sums:")
    f1.transformImperatively<Int, Int> {
        // hasNext() suspends until the upstream has produced a value or completed
        while (hasNext()) {
            emit(sum(next()))
        }
    }.collect(::println)
}
```
code for sum:
```kotlin
// Consumes up to `count` elements from the iterator and returns their sum;
// stops early if the upstream flow ends mid-packet.
suspend fun FlowIterator<Int>.sum(count: Int): Int {
    var s = 0
    for (i in 1..count) {
        if (!hasNext()) break
        s += next()
    }
    return s
}
```
It's not finished, more a proof of concept. (In particular, I haven't had time yet to try and understand flow cancellation, or the context preservation properties expected of flows, so don't expect it to behave correctly in that regard yet. Though I would very much expect it to be possible to make it behave correctly in that sense as well.) I would be happy about any kind of pointers to discussions about similar things, reasons why something like this isn't or shouldn't be in kotlinx.coroutines, reasons why it might break down later on, etc.
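For contrast, the same packet-sum logic written against stock kotlinx.coroutines builders needs a hand-rolled state machine inside the lambda; `packetSums` is just an illustrative name, not something from the playground:

```kotlin
import kotlinx.coroutines.flow.*

// Illustrative only: the explicit state machine the combinator-style version
// needs. remaining == -1 means "expecting a length field next".
fun Flow<Int>.packetSums(): Flow<Int> = flow {
    var remaining = -1
    var sum = 0
    collect { n ->
        if (remaining == -1) {
            remaining = n
            sum = 0
            if (remaining == 0) { emit(0); remaining = -1 }
        } else {
            sum += n
            if (--remaining == 0) { emit(sum); remaining = -1 }
        }
    }
}
```

This produces the same packet sums as the imperative version above (modulo the handling of a truncated final packet), but the control flow is inverted, which is the readability cost being described.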
e
we already have
```kotlin
val flow: Flow<T> = TODO()
val channel = flow.produceIn(this@coroutineScope)
val iterator = channel.iterator()
iterator.hasNext()
iterator.next()
```
m
Somehow I didn't see that. If I had stumbled across it earlier I might not have bothered, but now that I have, I'd argue that that is different from the thing I have here, because it starts producing values from the flow immediately, not on demand.
e
when should it start if not immediately?
your transform just needs to be wrapped in a `flow { }`
m
I.e. here the flow coroutine starts when you call flow.produceIn and the value will typically already be waiting when you call hasNext() or next(). Whereas the variant in the playground will only start computing the next value when you call hasNext() or next().
From this github discussion: https://github.com/Kotlin/kotlinx.coroutines/issues/254 I get the impression that wanting this subtle difference in behaviour was the original motivation for introducing flows even if channels already existed.
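The difference is easy to observe; a minimal sketch (the flow and the prints are purely for demonstration):

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

fun main() = runBlocking {
    val upstream = flow {
        println("computing 1"); emit(1)
        println("computing 2"); emit(2)
    }
    coroutineScope {
        val channel = upstream.produceIn(this)
        // Nothing has been requested yet, but the producer coroutine is already
        // scheduled; this yield() lets it run, and both "computing" lines print.
        yield()
        println("asking for the first element")
        val iterator = channel.iterator()
        if (iterator.hasNext()) println("got ${iterator.next()}")
        channel.cancel() // let coroutineScope return
    }
}
```

With the playground's `transformImperatively`, by contrast, nothing would be computed until `hasNext()` or `next()` is called.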
e
```kotlin
flow {
    coroutineScope {
        val channel = inputFlow.produceIn(this)
        val iterator = channel.iterator()
        while (iterator.hasNext()) {
            emit(transform(iterator.next()))
        }
    }
}
```
that doesn't start the inputFlow until the output flow is started
m
but the second element will be waiting before it is needed.
it will be computed immediately after the first call to next()
e
depends on the buffer size, which can be adjusted
m
yeah, with buffer size 0 it will be computed immediately after the first call to next(), otherwise earlier
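That run-ahead is observable; a sketch, assuming `buffer(0)` fuses into a rendezvous channel in `produceIn`:

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

fun main() = runBlocking {
    val f = flow {
        for (i in 1..3) {
            println("producing $i")
            emit(i)
        }
    }
    coroutineScope {
        val channel = f.buffer(0).produceIn(this) // fused into a rendezvous channel
        val iterator = channel.iterator()
        if (iterator.hasNext()) println("received ${iterator.next()}")
        // Typical output: "producing 1", "producing 2", "received 1" - once the
        // first value is handed over, the producer runs ahead to its next send
        // and suspends there before the consumer even resumes.
        channel.cancel()
    }
}
```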
e
in any case, I do think there is maybe some space here to handle backpressure and cancellation differently than what these building blocks give you, if that's what you need... but I don't think there's that much use for it
m
It somehow feels like an omission. All the combinators that currently exist, and more, could be implemented in terms of this one building block, which behaves roughly like what you wrote. But with the variant that you wrote you get the opposite "backpressure" behaviour from how they currently act. The variant in the playground gives the same behaviour.
The variant that you wrote will, if I understand correctly, depending on the dispatcher that's currently active, run things in parallel on separate threads. That's not what flows usually do unless you ask them to, I think.
e
it launches a coroutine, which may or may not bind to another thread. if you added `.buffer(0)`, it would be a rendezvous channel, which makes every single send/receive suspend until the matching receiver/sender is ready. but yes, it would still be "hot", and the sender would run until the next suspension point, whereas most (but not all) flows are "cold" and pull-based
you can look into https://github.com/cashapp/turbine for other ideas
m
Hey, thanks a lot for taking the time to respond 🙂 I came to think about this because I'm learning Kotlin and trying to find my way around the standard libraries. Flows are advertised a lot - especially if you're looking to learn Android programming you'll encounter them often. Channels are talked about less, and I get the feeling the Kotlin developers are trying to deemphasize them. Some Medium articles by Roman Elizarov seem to go in that direction, and he mentions that the concurrency-by-default / hot-by-default behaviour, and the communication overhead implied by it, is one of the problems. I started thinking about how to turn a Flow into a (suspending) Iterator (or a ReceiveChannel, which is more or less the same thing interface-wise) before I had seen that channels exist in Kotlin. Now that I know that channels exist, and since you pointed it out (thank you!), I understand that it's quite easy to implement the transformation from Flow to ReceiveChannel using the channel primitives (`produce { theFlow.collect { send(it) } }`, like you wrote). But having read up on channels and flows and the reasoning behind flows, I feel like this is the "wrong" implementation, and I suspect that's the reason why this function, and the "imperative flow transformation functions" (or whatever you want to call them) that can be built from it, are not in kotlinx.coroutines. The distinction between `channelBasedTransformImperatively` and what I feel is the `correctTransformImperatively` is exactly that

```kotlin
theFlow.channelBasedTransformImperatively { ... } == theFlow.buffer(0).correctTransformImperatively { ... }
```

(if you use capacity 0 for the channel in the implementation of `channelBasedTransformImperatively`, `.buffer(N)` otherwise, I guess). Note that in general `theFlow.buffer(0) != theFlow`. If I find the time I might submit a bug report and see what the Kotlin devs say. There are a few bug reports on the GitHub repo requesting specific flow combinators, which I suspect people would find easy to write themselves if they had "permission" to write them imperatively, i.e. if there was a sanctioned, advertised `transformImperatively` that people could use. Thanks in any case for taking the time to go back and forth here. (Oh, and just to make sure we are not talking past each other: the thing that makes this interesting is that the transform function would not have to work element-wise but would instead get access to the iterator. The implementation you wrote earlier would be just `inputFlow.buffer(0).map(transform)`. I assumed you knew what I meant. The actual implementation would be:
```kotlin
fun <T, R> Flow<T>.channelBasedTransformImperatively(
    transform: suspend FlowContextAndIterator<T, R>.() -> Unit
): Flow<R> {
    val inputFlow = this
    return flow {
        coroutineScope {
            val channel = inputFlow.produceIn(this)
            val iterator = channel.iterator()
            FlowContextAndIterator(this@flow, iterator).transform()
            channel.cancel() // don't hang if transform didn't drain the input
        }
    }
}
```
(Where `FlowContextAndIterator` implements both `FlowCollector<R>` and `ChannelIterator<T>` by delegation to the passed objects, so it gives access both to `next()` and to `emit()`.))
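The delegation described there might look like this (a sketch; the actual playground type may differ):

```kotlin
import kotlinx.coroutines.channels.ChannelIterator
import kotlinx.coroutines.flow.FlowCollector

// Sketch: a single receiver that exposes the downstream collector's emit()
// and the upstream channel's hasNext()/next(), both by delegation.
class FlowContextAndIterator<T, R>(
    collector: FlowCollector<R>,
    iterator: ChannelIterator<T>
) : FlowCollector<R> by collector, ChannelIterator<T> by iterator
```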
a
afaict the only practical difference is the behaviour of “it will be computed immediately after the first call to next()” - allowing the upstream to generate one value ahead of values being requested downstream. This seems like a minor difference - can you give an example of when it’s significant?
What you’re describing sounds like something you could do with a reactive-streams-style setup, where you have a peeking iterator that requests an item from the publisher only when it needs to fill, and suspends until the item has arrived. Which isn’t quite what a rendezvous channel does, as you’ve observed.
m
Roman Elizarov claims here: https://elizarov.medium.com/shared-flows-broadcast-channels-899b675e805c that the difference matters performance-wise. I'll quote the bit that I mean:
In the early versions of the library, we had only channels and we tried to implement various transformations of asynchronous sequences as functions that take one channel as an argument and return another channel as a result. It means that, for example, a `filter` operator would run in its own coroutine. The performance of such an operator was far from great, especially compared to just writing an `if` statement. In a hindsight, it is not surprising, because a channel is a synchronization primitive. Any channel, even an implementation that is optimized for a single producer and a single consumer, must support concurrent communicating coroutines and a data transfer between them needs synchronization, which is expensive in modern multicore systems. When you start building your application architecture on top of the asynchronous data streams, the need to have transformations naturally appears, and the channel costs start to accrue.
The implementation of transformImperatively using channels suffers from this same problem. The slightly-more-complex-to-write variant in the playground does not need any synchronization primitives, because in `flow(producer).transformImperatively(transformer).collect(...)` the producer and the transformer will never be running at the same time.
I also have an example of something where the behaviour, and not the performance, will break things, though I'm not sure how convincing that'll be: I have a `lines` function in the playground linked above, which turns a Flow<Char> into a Flow<Flow<Char>>, where the inner flows are the individual lines of the text in the original flow. If you think about it, unless you want to buffer full arbitrary-length lines, then when collecting the Flow<Flow<Char>> there is the constraint that you have to collect each inner flow immediately, otherwise this can't work. The `lines` function checks for this and throws if you violate that constraint. (There's nothing else it can really do.) So if you write `someFlow.lines().collect()`, for example, that will throw. `someFlow.lines().collect { println(it.count()) }` is fine. Now the point: `someFlow.lines().buffer(0).whatever().collect { ... }` will also throw, no matter what `whatever()` does. So with the channel-based implementation of transformImperatively, which is equivalent to `.buffer(0).transformImperatively { ... }`, you could not use `channelBasedTransformImperatively` after `lines` at all.
In fact, I expect any kind of "grouping" operator that goes from `Flow<T>` to `Flow<Flow<T>>` to exhibit this kind of behaviour: it must be forbidden to call buffer after the grouping operator, and you have to collect the inner flows. These kinds of grouping operators become easy to write with `transformImperatively` (see the sketch below). It would be a pity if you couldn't use `transformImperatively` after them any more.
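For illustration, a simpler cousin of `lines` (grouping into lists rather than inner flows) written against the hypothetical `transformImperatively` API; this assumes the playground's API, so it isn't runnable against stock kotlinx.coroutines:

```kotlin
// Hypothetical: assumes the playground's transformImperatively, whose lambda
// sees both the upstream iterator (hasNext/next) and the downstream emit().
fun <T> Flow<T>.chunked(size: Int): Flow<List<T>> =
    transformImperatively {
        while (hasNext()) {
            val chunk = ArrayList<T>(size)
            while (chunk.size < size && hasNext()) chunk += next()
            emit(chunk)
        }
    }
```

A true `Flow<Flow<T>>` grouping like `lines` layers the collect-the-inner-flow-immediately constraint on top of this, but the imperative skeleton is the same.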