# arrow
d
Are there any plans to expand the number of operation types and arities for Arrow collectors? Currently the Kotlin stdlib offers far more operations than Arrow collectors do, and they can be chained beyond the 3 collectors available. It also seems that only NonSuspending collectors are available currently, or maybe I'm misunderstanding the concept?
a
I don't completely follow, do you have more concrete examples of what you are missing? In theory you can easily create your own collectors by using
val sum = Collector.of(
  supply = { 0 },
  accumulate = Int::plus
)
d
object : Collector.of?
a
sorry, I started writing something and then changed it
d
What about chaining more than 3 collectors?
This seems to be the biggest arity:
fun <A, R, S, T, V> zip(
    x: Collector<A, R>,
    y: Collector<A, S>,
    z: Collector<A, T>,
    combine: suspend (R, S, T) -> V
): Collector<A, V>
a
oh, yes, we can expand those without any problem
when you said "chaining" I thought you meant "one after the other" not "in parallel"
d
Zip is in parallel?
a
parallel in the sense that both accumulations are done over the same data. So if you `zip` something that adds and something that counts, you get a tuple with the addition and the length at the end
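To make the "same data, one pass" idea concrete, here is a minimal sketch in plain Kotlin. `MiniCollector`, `collectWith` and `zipCollectors` are hypothetical names invented for illustration; this is not Arrow's `Collector` type or its `zip`, only an approximation of the idea described above.

```kotlin
// Hypothetical minimal collector, NOT Arrow's actual Collector type:
// an initial accumulator, a step function, and a finishing function.
class MiniCollector<A, Acc, R>(
    val initial: () -> Acc,
    val step: (Acc, A) -> Acc,
    val finish: (Acc) -> R,
)

// Run a collector over a list in a single pass.
fun <A, Acc, R> List<A>.collectWith(c: MiniCollector<A, Acc, R>): R {
    var acc = c.initial()
    for (item in this) acc = c.step(acc, item)
    return c.finish(acc)
}

// "Parallel" composition: both collectors see every element of the same pass.
fun <A, Acc1, R1, Acc2, R2, V> zipCollectors(
    x: MiniCollector<A, Acc1, R1>,
    y: MiniCollector<A, Acc2, R2>,
    combine: (R1, R2) -> V,
): MiniCollector<A, Pair<Acc1, Acc2>, V> = MiniCollector(
    initial = { x.initial() to y.initial() },
    step = { (a1, a2), item -> x.step(a1, item) to y.step(a2, item) },
    finish = { (a1, a2) -> combine(x.finish(a1), y.finish(a2)) },
)

val sumOfInts = MiniCollector<Int, Int, Int>({ 0 }, { acc, n -> acc + n }, { it })
val countOfInts = MiniCollector<Int, Int, Int>({ 0 }, { acc, _ -> acc + 1 }, { it })

// One traversal of the list produces both the sum and the count.
val sumAndCount: Pair<Int, Int> =
    listOf(3, 5, 8).collectWith(zipCollectors(sumOfInts, countOfInts, ::Pair))
```

Here `sumAndCount` is the pair of the addition and the length, computed while iterating the list only once.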
d
Ok, now I see... so it's not really like stdlib's functions in that sense... but there's no example in the docs of any "one after the other"...?
Is there a way to do that too?
a
what would be an example of such collection? usually a collector returns a single value when executed, so there's no notion of "after the other"
one could potentially have a collector that returns an `Iterable<A>`, and then execute the second on the result of the first one
d
list.filter { }.distinct().map { }...
But when using that it creates a new list every time
And sequences don't support coroutines
That's what I had in mind the first time I saw collectors
a
maybe the documentation should be clearer about that: if you want to chain transformations which generate new collections, you should use `Sequence` or `Flow`; `Collector`s are for getting a single value at the end (like fold or reduce)
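As a small illustration of the `Sequence` suggestion, the sketch below compares the eager chain from the question with its lazy counterpart, using only the Kotlin stdlib. The sample data and predicate are made up for the example.

```kotlin
val words = listOf("apple", "avocado", "banana", "apple", "apricot")

// Eager: filter, distinct and map each allocate a new intermediate collection.
val viaLists: List<Int> = words
    .filter { it.startsWith("a") }
    .distinct()
    .map { it.length }

// Lazy: elements flow through the whole pipeline one at a time;
// only the final toList() builds a collection.
val viaSequence: List<Int> = words.asSequence()
    .filter { it.startsWith("a") }
    .distinct()
    .map { it.length }
    .toList()
```

Both produce the same result; the difference is only in how many intermediate collections are created along the way.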
d
And what's the whole story about the `Characteristics`? It seems like `Collector.of` would still need to specify one, no? I'm not too sure what they're needed for here (without going into the source code). Another point: when would we use something other than NonSuspending collectors? I'm having a bit of trouble picturing when something should be a collector rather than just being done in the lambda producing the result...
I think covering custom collectors and their use cases could help more people adopt them. And in the docs it says:
Collectors help build complex computations ---over sequences of values---, guaranteeing that those values are consumed only once.
Which is I guess why I was expecting "after the other"...
It should maybe say:
> Collectors help build multiple computations over a single sequence of values, combining their results to produce one output while guaranteeing that those values are consumed only once.

But then you'd still maybe need to cover `parCollect()`, which confuses me a bit now, since one should be using flows for this anyway?
Just fixed the proposed text...
a
something is a collector if you could implement it using `fold` or `reduce`. For example, `sum`, `max` or `size` are examples of collectors.
Imagine that you want to obtain the sum and the maximum of a list. If you do
list.sum() to list.max()
you are iterating over the list twice. With collectors you can easily create something which does it in a single go
val collector = sumCollector.zip(maxCollector, ::Pair)
list.collect(collector)
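For comparison, the single-pass computation that such a zipped collector stands for can be written by hand with a stdlib `fold` over a pair accumulator. This is only a sketch of the equivalent logic, not of how Arrow implements `zip`; the sample list is made up.

```kotlin
val numbers = listOf(4, 9, 2, 7)

// One traversal: the accumulator carries the running sum and the running maximum.
// Int.MIN_VALUE is the starting maximum, so an empty list would yield (0, Int.MIN_VALUE).
val sumAndMax: Pair<Int, Int> = numbers.fold(0 to Int.MIN_VALUE) { (runningSum, runningMax), n ->
    (runningSum + n) to maxOf(runningMax, n)
}
```

The collector version expresses the same thing, but lets the sum and the maximum be defined separately and combined afterwards.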
d
Why would we need the suspend version?
a
just in case your accumulation needs to access some external system, for example
d
And parCollect()?
I can't imagine how that's the same idea?
a
`parCollect` is about executing the accumulation in parallel; for example, you can sum a list that way
d
Meaning instead of counting each element in the list, it splits the list in parts, counts each part concurrently and then adds the result together?
a
exactly
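The split, accumulate and combine idea can be sketched with plain JVM threads. `parallelSum` is a hypothetical helper written for this illustration; it does not show how Arrow's `parCollect` works internally, and the `parts` parameter is an assumption of the sketch. It relies on addition being associative, so the partial sums can be combined in any grouping.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical helper, not Arrow's parCollect: split the list into chunks,
// sum each chunk on its own thread, then add the partial sums together.
fun parallelSum(values: List<Int>, parts: Int): Long {
    require(parts > 0) { "parts must be positive" }
    if (values.isEmpty()) return 0L
    val chunkSize = (values.size + parts - 1) / parts // ceiling division
    val pool = Executors.newFixedThreadPool(parts)
    try {
        // Each chunk is summed independently of the others.
        val tasks = values.chunked(chunkSize).map { chunk ->
            Callable { chunk.fold(0L) { acc, n -> acc + n } }
        }
        // invokeAll blocks until every task has finished; then combine the partial results.
        return pool.invokeAll(tasks).fold(0L) { acc, future -> acc + future.get() }
    } finally {
        pool.shutdown()
    }
}
```

For instance, `parallelSum((1..100).toList(), 4)` sums four chunks of 25 elements concurrently and then adds the four partial results.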
d
But that only allows one collector at a time it seems, so one couldn't compose collectors there...?
One use case I could see for more collectors is take and drop... if I have a list, and I want to do something on the first 10, and something else on the last 5 and put the list back together while only going through the list once... but then, I'd have to nest collectors somehow, or compose them like:
zip(take(10) + mapFirstTen, drop(10) + mapLastFive, ::combineList)
Btw, thanks for clearing up a lot of my questions so far, I'm now starting to have a better idea of uses for this 👌🏼!
a
I wrote a bit of docs about this https://github.com/arrow-kt/arrow-website/pull/345 If you have some time, I'd love your feedback 🙂
d
Yeah, that's already better, although you could possibly also integrate part of that in the first paragraph of that page to give a more exact summary of what collectors are... here: https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html they define:
> A mutable reduction operation that accumulates input elements into a mutable result container, optionally transforming the accumulated result into a final representation after all input elements have been processed. Reduction operations can be performed either sequentially or in parallel.
>
> Examples of mutable reduction operations include: accumulating elements into a `Collection`; concatenating strings using a `StringBuilder`; computing summary information about elements such as sum, min, max, or average; computing "pivot table" summaries such as "maximum valued transaction by seller", etc. The class `Collectors` provides implementations of many common mutable reductions.
also, there's no example or explanation for parCollect()...
also:
> Collectors also have a set of characteristics, such as `Collector.Characteristics.CONCURRENT`, that provide hints that can be used by a reduction implementation to provide better performance.
and they give examples of making your own...
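For reference, this is roughly what "making your own" looks like with the Java API quoted above, called from Kotlin. It is a sketch of `java.util.stream.Collector.of`, not of Arrow's collectors; the name `concatenating` and the sample stream are made up for the example. No characteristics are passed here, so none of the optional hints apply.

```kotlin
import java.util.function.BiConsumer
import java.util.function.BinaryOperator
import java.util.function.Function
import java.util.function.Supplier
import java.util.stream.Collector
import java.util.stream.Stream

// A custom java.util.stream.Collector that concatenates strings via a StringBuilder.
val concatenating: Collector<String, StringBuilder, String> = Collector.of(
    // supplier: creates a fresh mutable result container
    Supplier { StringBuilder() },
    // accumulator: folds one element into the container
    BiConsumer<StringBuilder, String> { sb, s -> sb.append(s) },
    // combiner: merges two partial containers (used by parallel streams)
    BinaryOperator<StringBuilder> { left, right -> left.append(right) },
    // finisher: turns the container into the final result
    Function<StringBuilder, String> { it.toString() },
)

val joined: String = Stream.of("a", "b", "c").collect(concatenating)
```

Characteristics such as `Collector.Characteristics.CONCURRENT` can be supplied as additional trailing arguments to `Collector.of` when the implementation supports them.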