# coroutines
m
Not sure if this is the right place to ask, but is there a reason why flows have an imperative interface for generating them, but no imperative interface for consuming them, like you have with channels or iterators? Or did I just not find it? I understand the benefits in readability etc. of using flow combinators where possible, but it seems to me there are certain parsing-type problems which are just not easily / readably expressible with the current flow combinators. Here https://pl.kotl.in/G07SkQwGw is a playground where I've implemented something which preserves the cold-by-default behaviour of flows while allowing you to consume and transform flows imperatively. Code sample:
```kotlin
// f1 is a flow of somewhat random numbers, no deeper meaning
val f1: Flow<Int> = flowOf(1, 1, 2, 3, 5, 8, 3, 1, 4, 5, 9, 4, 3, 7, 0, 7, 7, 4, 1, 5)
/*
Let's imagine that we are told f1 consists of "packets", which start with a length field followed by as many
numbers as indicated in the length field. We want to transform f1 into a flow where each element is the sum of
the numbers in the "payload" of one packet.
*/
runBlocking {
    println("packet sums:")
    f1.transformImperatively<Int, Int> {
        // hasNext() suspends until the upstream has produced a value or completed
        while (hasNext()) {
            emit(sum(next()))
        }
    }.collect(::println)
}
```
code for sum:
```kotlin
// Consumes up to `count` elements from the iterator and returns their sum;
// stops early if the upstream flow ends mid-packet.
suspend fun FlowIterator<Int>.sum(count: Int): Int {
    var s = 0
    for (i in 1..count) {
        if (!hasNext()) break
        s += next()
    }
    return s
}
```
It's not finished, more a proof of concept. (In particular, I haven't had time yet to try and understand flow cancellation, or the context preservation properties expected of flows, so don't expect it to behave correctly in that regard yet. Though I would very much expect it to be possible to make it behave correctly in that sense as well.) I would be happy about any kind of pointers to discussions about similar things, reasons why something like this isn't or shouldn't be in kotlinx.coroutines, reasons why it might break down later on, etc.
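For contrast, the same packet-sum logic written against stock kotlinx.coroutines builders needs a hand-rolled state machine inside the lambda; `packetSums` is just an illustrative name, not something from the playground:

```kotlin
import kotlinx.coroutines.flow.*

// Illustrative only: the explicit state machine the combinator-style version
// needs. remaining == -1 means "expecting a length field next".
fun Flow<Int>.packetSums(): Flow<Int> = flow {
    var remaining = -1
    var sum = 0
    collect { n ->
        if (remaining == -1) {
            remaining = n
            sum = 0
            if (remaining == 0) { emit(0); remaining = -1 }
        } else {
            sum += n
            if (--remaining == 0) { emit(sum); remaining = -1 }
        }
    }
}
```

This produces the same packet sums as the imperative version above (modulo the handling of a truncated final packet), but the control flow is inverted, which is the readability cost being described.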
e
we already have
```kotlin
val flow: Flow<T> = TODO()
val channel = flow.produceIn(this@coroutineScope)
val iterator = channel.iterator()
iterator.hasNext()
iterator.next()
```
m
Somehow I didn't see that. If I had stumbled across it earlier I might not have bothered, but now that I have, I'd argue that that is different from the thing I have here, because it starts producing values from the flow immediately, not on demand.
e
when should it start if not immediately?
your transform just needs to be wrapped in a `flow { }`
m
I.e. here the flow coroutine starts when you call flow.produceIn and the value will typically already be waiting when you call hasNext() or next(). Whereas the variant in the playground will only start computing the next value when you call hasNext() or next().
From this github discussion: https://github.com/Kotlin/kotlinx.coroutines/issues/254 I get the impression that wanting this subtle difference in behaviour was the original motivation for introducing flows even if channels already existed.
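The difference is easy to observe; a minimal sketch (the flow and the prints are purely for demonstration):

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

fun main() = runBlocking {
    val upstream = flow {
        println("computing 1"); emit(1)
        println("computing 2"); emit(2)
    }
    coroutineScope {
        val channel = upstream.produceIn(this)
        // Nothing has been requested yet, but the producer coroutine is already
        // scheduled; this yield() lets it run, and both "computing" lines print.
        yield()
        println("asking for the first element")
        val iterator = channel.iterator()
        if (iterator.hasNext()) println("got ${iterator.next()}")
        channel.cancel() // let coroutineScope return
    }
}
```

With the playground's `transformImperatively`, by contrast, nothing would be computed until `hasNext()` or `next()` is called.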
e
```kotlin
flow {
    coroutineScope {
        val channel = inputFlow.produceIn(this)
        val iterator = channel.iterator()
        while (iterator.hasNext()) {
            emit(transform(iterator.next()))
        }
    }
}
```
that doesn't start the inputFlow until the output flow is started
m
but the second element will be waiting before it is needed.
it will be computed immediately after the first call to next()
e
depends on the buffer size, which can be adjusted
m
yeah, with buffer size 0 it will be computed immediately after the first call to next(), otherwise earlier
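That run-ahead is observable; a sketch, assuming `buffer(0)` fuses into a rendezvous channel in `produceIn`:

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

fun main() = runBlocking {
    val f = flow {
        for (i in 1..3) {
            println("producing $i")
            emit(i)
        }
    }
    coroutineScope {
        val channel = f.buffer(0).produceIn(this) // fused into a rendezvous channel
        val iterator = channel.iterator()
        if (iterator.hasNext()) println("received ${iterator.next()}")
        // Typical output: "producing 1", "producing 2", "received 1" - once the
        // first value is handed over, the producer runs ahead to its next send
        // and suspends there before the consumer even resumes.
        channel.cancel()
    }
}
```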
e
in any case, I do think there is maybe some space here to handle backpressure and cancellation differently than what these building blocks give you, if that's what you need... but I don't think there's that much use for it
m
It somehow feels like an omission. All the combinators that currently exist, and more, could be implemented in terms of this one building block, which behaves roughly like what you wrote. But with the variant that you wrote you get the opposite "backpressure" behaviour from how they currently act. The variant in the playground gives the same behaviour.
The variant that you wrote will, if I understand correctly, depending on the dispatcher that's currently active, run things in parallel on separate threads. That's not what flows usually do unless you ask them to, I think.
e
it launches a coroutine, which may or may not bind to another thread. if you added `.buffer(0)`, it would be a rendezvous channel, which makes every single send/receive suspend until the matching receiver/sender is ready. but yes, it would still be "hot", and the sender would run until the next suspension point, whereas most (but not all) flows are "cold" and pull-based
you can look into https://github.com/cashapp/turbine for other ideas
m
Hey, thanks a lot for taking the time to respond 🙂 I came to think about this because I'm learning Kotlin and trying to find my way around the standard libraries. Flows are advertised a lot - especially if you're looking to learn Android programming you'll encounter them often. Channels are talked about less, and I get the feeling the Kotlin developers are trying to deemphasize them. Some Medium articles by Roman Elizarov seem to go in that direction, and he mentions that the concurrency-by-default / hot-by-default behaviour, and the communication overhead implied by it, is one of the problems. I started thinking about how to turn a Flow into a (suspending) Iterator (or a ReceiveChannel, which is more or less the same thing interface-wise) before I had seen that channels exist in Kotlin. Now that I know that channels exist, and since you pointed it out (thank you!), I understand that it's quite easy to implement the transformation from Flow to ReceiveChannel using the channel primitives (`produce { theFlow.collect { send(it) } }`, like you wrote). But having read up on channels and flows and the reasoning behind flows, I feel like this is the "wrong" implementation, and I suspect that's the reason why this function, and the "imperative flow transformation functions" (or whatever you want to call them) that can be built from it, are not in kotlinx.coroutines. The distinction between `channelBasedTransformImperatively` and what I feel is the `correctTransformImperatively` is exactly that

```kotlin
theFlow.channelBasedTransformImperatively { ... } == theFlow.buffer(0).correctTransformImperatively { ... }
```

(if you use capacity 0 for the channel in the implementation of `channelBasedTransformImperatively`, `.buffer(N)` otherwise, I guess). Note that in general `theFlow.buffer(0) != theFlow`. If I find the time I might submit a bug report and see what the Kotlin devs say. There are a few bug reports on the GitHub repo requesting specific flow combinators, which I suspect people would find easy to write themselves if they had "permission" to write them imperatively, i.e. if there was a sanctioned, advertised `transformImperatively` that people could use. Thanks in any case for taking the time to go back and forth here. (Oh, and just to make sure we are not talking past each other: the thing that makes this interesting is that the transform function would not have to work element-wise but would instead get access to the iterator. The implementation you wrote earlier would be just `inputFlow.buffer(0).map(transform)`. I assumed you knew what I meant. The actual implementation would be:
```kotlin
fun <T, R> Flow<T>.channelBasedTransformImperatively(
    transform: suspend FlowContextAndIterator<T, R>.() -> Unit
): Flow<R> {
    val inputFlow = this
    return flow {
        coroutineScope {
            val channel = inputFlow.produceIn(this)
            val iterator = channel.iterator()
            FlowContextAndIterator(this@flow, iterator).transform()
            channel.cancel() // don't hang if transform didn't drain the input
        }
    }
}
```
(Where `FlowContextAndIterator` implements both `FlowCollector<R>` and `ChannelIterator<T>` by delegation to the passed objects, so it gives access both to `next()` and to `emit()`.))
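The delegation described there might look like this (a sketch; the actual playground type may differ):

```kotlin
import kotlinx.coroutines.channels.ChannelIterator
import kotlinx.coroutines.flow.FlowCollector

// Sketch: a single receiver that exposes the downstream collector's emit()
// and the upstream channel's hasNext()/next(), both by delegation.
class FlowContextAndIterator<T, R>(
    collector: FlowCollector<R>,
    iterator: ChannelIterator<T>
) : FlowCollector<R> by collector, ChannelIterator<T> by iterator
```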
a
afaict the only practical difference is the behaviour of “it will be computed immediately after the first call to next()” - allowing the upstream to generate one value ahead of values being requested downstream. This seems like a minor difference - can you give an example of when it’s significant?
What you’re describing sounds like something you could do with a reactive-streams-style setup, where you have a peeking iterator that requests an item from the publisher only when it needs to fill, and suspends until the item has arrived. Which isn’t quite what a rendezvous channel does, as you’ve observed.
m
Roman Elizarov claims here: https://elizarov.medium.com/shared-flows-broadcast-channels-899b675e805c that the difference matters performance-wise. I'll quote the bit that I mean:
In the early versions of the library, we had only channels and we tried to implement various transformations of asynchronous sequences as functions that take one channel as an argument and return another channel as a result. It means that, for example, a `filter` operator would run in its own coroutine. The performance of such an operator was far from great, especially compared to just writing an `if` statement. In a hindsight, it is not surprising, because a channel is a synchronization primitive. Any channel, even an implementation that is optimized for a single producer and a single consumer, must support concurrent communicating coroutines and a data transfer between them needs synchronization, which is expensive in modern multicore systems. When you start building your application architecture on top of the asynchronous data streams, the need to have transformations naturally appears, and the channel costs start to accrue.
The implementation of transformImperatively using channels suffers from this same problem. The slightly-more-complex-to-write variant in the playground does not need any synchronization primitives, because in `flow(producer).transformImperatively(transformer).collect(...)` the producer and the transformer will never be running at the same time.
I also have an example of something where the behaviour, and not the performance, will break things, though I'm not sure how convincing that'll be: I have a `lines` function in the playground linked above, which turns a Flow<Char> into a Flow<Flow<Char>>, where the inner flows are the individual lines of the text in the original flow. If you think about it, unless you want to buffer full arbitrary-length lines, then when collecting the Flow<Flow<Char>> there is the constraint that you have to collect each inner flow immediately, otherwise this can't work. The `lines` function checks for this and throws if you violate that constraint. (There's nothing else it can really do.) So if you write `someFlow.lines().collect()`, for example, that will throw. `someFlow.lines().collect { println(it.count()) }` is fine. Now the point: `someFlow.lines().buffer(0).whatever().collect { ... }` will also throw, no matter what `whatever()` does. So with the channel-based implementation of transformImperatively, which is equivalent to `.buffer(0).transformImperatively { ... }`, you could not use `channelBasedTransformImperatively` after `lines` at all.
In fact, I expect any kind of "grouping" operator that goes from `Flow<T>` to `Flow<Flow<T>>` to exhibit this kind of behaviour: it must be forbidden to call buffer after the grouping operator, and you have to collect the inner flows. These kinds of grouping operators become easy to write with `transformImperatively` (see the sketch below). It would be a pity if you couldn't use `transformImperatively` after them any more.
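For illustration, a simpler cousin of `lines` (grouping into lists rather than inner flows) written against the hypothetical `transformImperatively` API; this assumes the playground's API, so it isn't runnable against stock kotlinx.coroutines:

```kotlin
// Hypothetical: assumes the playground's transformImperatively, whose lambda
// sees both the upstream iterator (hasNext/next) and the downstream emit().
fun <T> Flow<T>.chunked(size: Int): Flow<List<T>> =
    transformImperatively {
        while (hasNext()) {
            val chunk = ArrayList<T>(size)
            while (chunk.size < size && hasNext()) chunk += next()
            emit(chunk)
        }
    }
```

A true `Flow<Flow<T>>` grouping like `lines` layers the collect-the-inner-flow-immediately constraint on top of this, but the imperative skeleton is the same.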