Are there any plans to expand the amount of operation types kotlinlang #arrow

dave08

11/27/2024, 12:49 PM

Are there any plans to expand the number of operation types/arity for Arrow collectors? Currently the Kotlin stdlib offers many more operations than Arrow collectors do, and they can be chained beyond the 3 collectors available... also it seems like only NonSuspending collectors are available currently... or maybe I'm misunderstanding the concept?

Alejandro Serrano.Mena

11/27/2024, 1:01 PM

I don't completely follow, do you have more concrete examples of what you are missing? In theory you can easily create your own collectors by using

val sum = Collector.of(
  supply = { 0 },
  accumulate = Int::plus
)

dave08

11/27/2024, 1:05 PM

object : Collector.of?

Alejandro Serrano.Mena

11/27/2024, 1:05 PM

sorry, I started writing something and then changed it

dave08

11/27/2024, 1:06 PM

What about chaining more than 3 collectors?

dave08

11/27/2024, 1:06 PM

This seems to be the biggest arity:

fun <A, R, S, T, V> zip(
    x: Collector<A, R>,
    y: Collector<A, S>,
    z: Collector<A, T>,
    combine: suspend (R, S, T) -> V
): Collector<A, V>

Alejandro Serrano.Mena

11/27/2024, 1:06 PM

oh, yes, we can expand those without any problem

Alejandro Serrano.Mena

11/27/2024, 1:07 PM

when you said "chaining" I thought you meant "one after the other" not "in parallel"

dave08

11/27/2024, 1:08 PM

Zip is in parallel?

Alejandro Serrano.Mena

11/27/2024, 1:09 PM

parallel in the sense that both accumulations are done over the same data. So if you `zip` something that adds and something that counts, you get a tuple with the addition and the length at the end
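
To make "both accumulations are done over the same data" concrete, here is a minimal sketch that uses only the Kotlin stdlib. It is a hand-rolled illustration and not Arrow's implementation; the name `sumAndCount` is hypothetical.

```kotlin
// Hand-rolled sketch with the stdlib only; not Arrow's Collector machinery.
// Both accumulators advance on the same element, so the input is read once.
fun sumAndCount(values: Iterable<Int>): Pair<Int, Int> =
    values.fold(0 to 0) { (sum, count), value ->
        (sum + value) to (count + 1)
    }

fun main() {
    println(sumAndCount(listOf(3, 5, 8))) // prints (16, 3)
}
```

A zipped collector packages the same idea so that the "add" part and the "count" part can be written separately and combined afterwards.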

dave08

11/27/2024, 1:11 PM

Ok, now I see... so it's not really like stdlib's functions in that sense... but there's no example in the docs of any "one after the other"...?

dave08

11/27/2024, 1:11 PM

Is there a way to do that too?

Alejandro Serrano.Mena

11/27/2024, 1:12 PM

what would be an example of such collection? usually a collector returns a single value when executed, so there's no notion of "after the other"

Alejandro Serrano.Mena

11/27/2024, 1:12 PM

one could potentially have a collector that returns an `Iterable<A>`, and then execute the second on the result of the first one

dave08

11/27/2024, 1:12 PM

list.filter { }.distinct().map { }...

dave08

11/27/2024, 1:13 PM

But when using that it creates a new list every time

dave08

11/27/2024, 1:13 PM

And sequences don't support coroutines

dave08

11/27/2024, 1:13 PM

That's what I had in mind the first time I saw collectors

Alejandro Serrano.Mena

11/27/2024, 1:14 PM

maybe the documentation should be more clear about that, if you want to chain transformations which generate new collections, you should use `Sequence` or `Flow`; `Collector`s are for getting a single value at the end (like fold or reduce)
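
For the chained-transformation case, a short sketch of the `Sequence` route: the stages run lazily, element by element, so no intermediate list is built between `filter`, `distinct`, and `map`. The function name `evenSquares` and the concrete predicates are made up for illustration.

```kotlin
// Lazy pipeline: each element flows through all stages before the next one is read.
// Only the final toList() materializes a collection.
fun evenSquares(values: List<Int>): List<Int> =
    values.asSequence()
        .filter { it % 2 == 0 }
        .distinct()
        .map { it * it }
        .toList()

fun main() {
    println(evenSquares(listOf(2, 3, 2, 4))) // prints [4, 16]
}
```

When the stages need to suspend, `Flow` from kotlinx.coroutines offers operators of the same shape; that variant is not shown here because it needs the coroutines dependency.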

dave08

11/27/2024, 1:18 PM

And what's the whole story about the `Characteristics`? It seems like `Collector.of` would still need to specify one, no? I'm not too sure what they're needed for here... (w/o going into the source code...). Another point is when would we use something other than NonSuspending collectors? I'm having a bit of trouble picturing when something should be a collector, or just be done in the lambda producing the result...

dave08

11/27/2024, 1:20 PM

I think these points of custom collectors and their use-cases, could help in more people adopting it. And in the docs it says:

Collectors help build complex computations ---over sequences of values---, guaranteeing that those values are consumed only once.

Which is I guess why I was expecting "after the other"...

dave08

11/27/2024, 1:23 PM

It should maybe say:

> Collectors help build multiple computations over a single sequence of values, combining their result to produce one output while guaranteeing that those values are consumed only once.

but then you'd still maybe need to cover `parCollect()`... which confuses me a bit now, since anyways one should be using flows for this?

dave08

11/27/2024, 1:24 PM

Just fixed the proposed text...

Alejandro Serrano.Mena

11/27/2024, 1:28 PM

something is a collector if you could implement it using `fold` or `reduce`. For example, `sum`, `max`, and `size` are examples of collectors. Imagine that you want to obtain the sum and the maximum of a list, if you do

list.sum() to list.max()

you are iterating over the list twice. With collectors you can easily create something which does it in a single go

val collector = sumCollector.zip(maxCollector, ::Pair)
list.collect(collector)
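
A rough sketch of how such a composition can work under the hood. `MiniCollector`, its `zip`, and `collectWith` are hypothetical stand-ins written for this note; Arrow's real `Collector` type is richer (suspending variants, characteristics) and its signatures differ.

```kotlin
// Hypothetical stand-in for illustration only; not Arrow's Collector.
class MiniCollector<A, Acc, R>(
    val supply: () -> Acc,
    val accumulate: (Acc, A) -> Acc,
    val finish: (Acc) -> R,
)

// Zipping pairs up the two accumulators and feeds every element to both.
fun <A, Acc1, R1, Acc2, R2, V> MiniCollector<A, Acc1, R1>.zip(
    other: MiniCollector<A, Acc2, R2>,
    combine: (R1, R2) -> V,
): MiniCollector<A, Pair<Acc1, Acc2>, V> =
    MiniCollector<A, Pair<Acc1, Acc2>, V>(
        supply = { this.supply() to other.supply() },
        accumulate = { (left, right), value ->
            this.accumulate(left, value) to other.accumulate(right, value)
        },
        finish = { (left, right) -> combine(this.finish(left), other.finish(right)) },
    )

// Runs a collector with exactly one traversal of the input.
fun <A, Acc, R> Iterable<A>.collectWith(collector: MiniCollector<A, Acc, R>): R {
    var acc = collector.supply()
    for (value in this) acc = collector.accumulate(acc, value)
    return collector.finish(acc)
}

val sumCollector = MiniCollector<Int, Int, Int>({ 0 }, { acc, x -> acc + x }, { it })

// The maximum of an empty input is null.
val maxCollector = MiniCollector<Int, Int?, Int?>(
    { null },
    { acc, x -> if (acc == null || x > acc) x else acc },
    { it },
)

fun main() {
    val collector = sumCollector.zip(maxCollector, ::Pair)
    println(listOf(4, 9, 2).collectWith(collector)) // prints (15, 9)
}
```

The point of the design is that `sumCollector` and `maxCollector` stay independent and reusable, while the zipped result still walks the list once.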

dave08

11/27/2024, 1:29 PM

Why would we need the suspend version?

Alejandro Serrano.Mena

11/27/2024, 1:30 PM

just in case your accumulation needs to access some external system, for example

dave08

11/27/2024, 1:31 PM

And parCollect()?

dave08

11/27/2024, 1:31 PM

I can't imagine how that's the same idea?

Alejandro Serrano.Mena

11/27/2024, 1:33 PM

`parCollect` is about executing the accumulation in parallel, for example you can sum a list in that way

dave08

11/27/2024, 1:34 PM

Meaning instead of counting each element in the list, it splits the list in parts, counts each part concurrently and then adds the result together?

Alejandro Serrano.Mena

11/27/2024, 1:34 PM

exactly
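
The split/accumulate/combine idea described above can be sketched with a plain JVM thread pool. This is an assumption-laden illustration, not how `parCollect` is implemented (Arrow builds on coroutines); `parallelSum` and its `parts` parameter are hypothetical.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Splits the list into chunks, sums each chunk on a pool thread,
// then adds the partial sums. Valid because addition is associative.
fun parallelSum(values: List<Int>, parts: Int = 4): Int {
    require(parts > 0) { "parts must be positive" }
    if (values.isEmpty()) return 0
    val chunkSize = (values.size + parts - 1) / parts // ceiling division
    val pool = Executors.newFixedThreadPool(parts)
    try {
        val tasks = values.chunked(chunkSize).map { chunk -> Callable { chunk.sum() } }
        // invokeAll blocks until every chunk has been summed.
        return pool.invokeAll(tasks).fold(0) { total, future -> total + future.get() }
    } finally {
        pool.shutdown()
    }
}

fun main() {
    println(parallelSum((1..100).toList())) // prints 5050
}
```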

dave08

11/27/2024, 1:35 PM

But that only allows one collector at a time it seems, so one couldn't compose collectors there...?

dave08

11/27/2024, 1:57 PM

One use case I could see for more collectors is take and drop... if I have a list, and I want to do something on the first 10, and something else on the last 5 and put the list back together while only going through the list once... but then, I'd have to nest collectors somehow, or compose them like:

zip(take(10) + mapFirstTen, drop(10) + mapLastFive, ::combineList)
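
For comparison, the single-pass "one thing for the head, another for the rest" case can already be sketched with the stdlib alone. The helper `mapHeadAndTail` and its parameters are hypothetical names for this note, not part of Arrow or of the pseudocode above.

```kotlin
// One traversal: the index decides which transformation applies to each element.
fun <A, B> mapHeadAndTail(
    values: List<A>,
    headSize: Int,
    onHead: (A) -> B,
    onTail: (A) -> B,
): List<B> =
    values.mapIndexed { index, value ->
        if (index < headSize) onHead(value) else onTail(value)
    }

fun main() {
    println(mapHeadAndTail(listOf(1, 2, 3, 4, 5), 2, { it * 10 }, { -it }))
    // prints [10, 20, -3, -4, -5]
}
```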

dave08

11/27/2024, 2:58 PM

Btw, thanks for clearing up a lot of my questions so far, I'm now starting to have a better idea of uses for this 👌🏼!

Alejandro Serrano.Mena

11/27/2024, 8:29 PM

I wrote a bit of docs about this https://github.com/arrow-kt/arrow-website/pull/345 If you have some time, I'd love your feedback 🙂

dave08

11/28/2024, 10:40 AM

Yeah, that's already better, although you could possibly also integrate part of that in the first paragraph of that page to give a more exact summary of what collectors are... here: https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html they define:

> A mutable reduction operation that accumulates input elements into a mutable result container, optionally transforming the accumulated result into a final representation after all input elements have been processed. Reduction operations can be performed either sequentially or in parallel.
>
> Examples of mutable reduction operations include: accumulating elements into a `Collection`; concatenating strings using a `StringBuilder`; computing summary information about elements such as sum, min, max, or average; computing "pivot table" summaries such as "maximum valued transaction by seller", etc. The class `Collectors` provides implementations of many common mutable reductions.

also, there's no example or explanation for parCollect()...

dave08

11/28/2024, 10:41 AM

also:

> Collectors also have a set of characteristics, such as `Collector.Characteristics.CONCURRENT`, that provide hints that can be used by a reduction implementation to provide better performance.

dave08

11/28/2024, 10:41 AM

and they give examples of making your own...
