https://kotlinlang.org logo
n

Nir

12/04/2020, 7:25 PM
If we're writing some kind of not-completely-trivial transformation that can work on both sequences and iterables (as most can), it's pretty common to have to write the function twice. Is there a convention for which one should call which, in terms of the best performance? Or is there some preferred way to share code in a third function/object? Or should the code just be copy pasted? I guess usually the version that runs on iterable is the one that gets
inline
, so perhaps you'd want the sequence version to call the iterable version?
t

Tobias Berger

12/04/2020, 7:59 PM
I don't think there is a convention. If you just implement one version and call it from the other, the direction doesn't really matter.
Iterable<T>.asSequence()
and
Sequence<T>.asIterable()
work the same way and whether your function is inlined shouldn't depend on either of those types. I would definitely avoid duplicating the code, so I see 2 possibilities:
1. Implement the logic for one type (the one more frequently used, or just thow a dice, not much of a performance impact either way) 2. Implement the logic in an additional function that uses the iterator directly and use in both of your other functions.
n

Nir

12/04/2020, 8:03 PM
Right. Yeah, it seems like a very much 6 of one, half a dozen of the other situation
maybe some feature kotlin introduces will help with this issue
e

ephemient

12/04/2020, 8:04 PM
converting Sequence to Iterable will lose laziness, converting Iterable to Sequence will lose performance
t

Tobias Berger

12/04/2020, 8:05 PM
not true. They yust wrap the
iterator()
call and change nothing about the behaviour
or to be precise: it may make a difference, depending on the actual implementations
e

ephemient

12/04/2020, 8:06 PM
ok, I meant: if you're using normal methods like
.map()
etc.
then Sequence -> Iterable is suddenly eager
if you are going Iterable -> Sequence, then what would have been inline code with potential unboxed values is now out-of-line code that is always boxed
t

Tobias Berger

12/04/2020, 8:07 PM
True. But actually, some operations on sequences also use intermittent Lists (e.g.
.sorted()
) Still, definitely something to keep in mind
n

Nir

12/04/2020, 8:09 PM
Yes, that's what I was thinking about putting the actual implementation in Iterable
e

ephemient

12/04/2020, 8:09 PM
it would be rather magical if
.sorted()
could be lazy (although Haskell manages it)
n

Nir

12/04/2020, 8:09 PM
Copy code
fun<T, R> Sequence<T>.chunkedBy(key: (T) -> R) = asIterable().chunkedBy(key)
And the actual implementation is in Iterable.chunkedBy
ah you're saying this loses laziness?
e

ephemient

12/04/2020, 8:10 PM
well, it depends on how
.chunkedBy()
is implemented
n

Nir

12/04/2020, 8:10 PM
right I see what you mean
t

Tobias Berger

12/04/2020, 8:11 PM
Personally, I prefer using sequences in most cases (especially when chaining operations like filter and map) so I would probably implement on Sequence if it makes sense.
n

Nir

12/04/2020, 8:11 PM
Well, chunkedBy is implemented using
sequence
Copy code
fun<T, R> Iterable<T>.chunkedBy(key: (T) -> R) = sequence {
    val iter = this@chunkedBy.iterator()
    if (!iter.hasNext()) return@sequence
I guess this is bad because I'm returning a sequence from something that was passed an iterator
which is unexpected behavior
e

ephemient

12/04/2020, 8:11 PM
hmm, in that case I think it should be an extension on Sequence to begin with
if somebody wants to lazily chunk an iterable they can call
asSequence()
themselves
most (all?) of the standard Iterable functions are eager though
(maybe all except for Grouping)
n

Nir

12/04/2020, 8:12 PM
@Tobias Berger I prefer sequences too and frankly dislike the existence of Iterable at all... but containers tend to be everywhere so if you want to do something quick and convenient, you want it implemented on Iterable
So, I guess things that take iterator don't actually return Iterator, do they
they return List
e

ephemient

12/04/2020, 8:13 PM
or Map, etc. but yes
n

Nir

12/04/2020, 8:14 PM
so let's say I did it the other way around. I implemented it on Sequence<T>.chunkedBy using sequence
t

Tobias Berger

12/04/2020, 8:14 PM
And I think you mean Iterable, not Iterator
n

Nir

12/04/2020, 8:14 PM
yes sorry
And now I have:
Copy code
inline fun<T, R> Iterable<T>.chunkedBy(key: (T) -> R) = asSequence.chunkedBy(key).toList()
All I'm losing now is some performance, correct?
e

ephemient

12/04/2020, 8:15 PM
right
n

Nir

12/04/2020, 8:15 PM
Ah, I guess it's more than performance though
with real iterables you can typically return early from lambdas
listOf(1,2,3).map { return }
will work for example
e

ephemient

12/04/2020, 8:16 PM
not sure it makes sense for chunking, but yeah
t

Tobias Berger

12/04/2020, 8:17 PM
Wenn you probably don't loose much performance, But it will create a new List instance. But that is usually the expected behaviour on the Iterable-Extensions
n

Nir

12/04/2020, 8:17 PM
Right. Yeah I'm playing around with placing inline modifiers, and now I'm getting complaints from the compiler about cross-inlining
need to reread that section 🙂
@Tobias Berger right. As always I know there are reasons for everything, I just wish sequence was the default
e

ephemient

12/04/2020, 8:17 PM
the performance loss from the fact that your selector is now not inlined, not from the List construction
n

Nir

12/04/2020, 8:17 PM
(and honestly only)
e

ephemient

12/04/2020, 8:18 PM
if you wanted to do something crazy, Scala does allow for non-local returns from (non-inline) lambdas
n

Nir

12/04/2020, 8:18 PM
not that crazy, thanks 🙂
e

ephemient

12/04/2020, 8:18 PM
which they implement behind the scenes with throw/catch of exceptions for control flow
(that's the crazy part)
n

Nir

12/04/2020, 8:18 PM
I just feel like, the lazy style (sequence) is mandatory because with large containers, it's crazy inefficient and memory prohibitive to keep creating new containers each time
the immediate style is a nice to have
e

ephemient

12/04/2020, 8:19 PM
so if you return early, and let another caller finish the sequence... they get an unexpected exception
n

Nir

12/04/2020, 8:19 PM
but it's also the default whenever you are transforming a list or something with map
e

ephemient

12/04/2020, 8:19 PM
it's not a great solution.
n

Nir

12/04/2020, 8:19 PM
Yeah, it sounds pretty strange.
t

Tobias Berger

12/04/2020, 8:19 PM
but you should still be able to early return from your lambda, even if you pass it through
n

Nir

12/04/2020, 8:20 PM
@Tobias Berger I don't think I can do it
maybe if I use crossinline, I need to look it up
e

ephemient

12/04/2020, 8:20 PM
no, crossinline does the opposite of what you want: it makes a lambda to an inline function not able to
return
n

Nir

12/04/2020, 8:20 PM
yeah, then it's impossible
Copy code
inline fun<T, R> Iterable<T>.chunkedBy(key: (T) -> R) = asSequence().chunkedBy(key).toList()
so now I have this
e

ephemient

12/04/2020, 8:21 PM
(this is necessary if you're inlining the lambda into something with a larger lifetime)
n

Nir

12/04/2020, 8:21 PM
if the sequence function is not inline, then this doesn't work
if the sequence function is inline, then the sequence function itself complains
t

Tobias Berger

12/04/2020, 8:21 PM
about what?
n

Nir

12/04/2020, 8:22 PM
I think that you aren't allowed to inline sequence functions
because they return these lazy objects that hold onto the lambda
that's why they're not inlined in the standard library either
e

ephemient

12/04/2020, 8:22 PM
.chunkedBy { return }
is not possible without inline, and Sequence operations require non-inline lambdas
n

Nir

12/04/2020, 8:22 PM
Right
t

Tobias Berger

12/04/2020, 8:22 PM
that depends on the operation you're trying to perform. Many Sequence methods are inline
n

Nir

12/04/2020, 8:23 PM
non-terminal ones
e.g.:
Copy code
fun <T, R> Sequence<T>.map(transform: (T) -> R): Sequence<R>
they can be inline but then you'd need to annotate the lambda anyway
the lambda itself cannot be an inline lambda, because you're not actually invoking the lambda immediately, you're returning an object that holds the lambda to invoke it later
t

Tobias Berger

12/04/2020, 8:24 PM
yeah, also makes sense...
e

ephemient

12/04/2020, 8:24 PM
yep, if they perform the transform lazily, there's no place to "return" to
n

Nir

12/04/2020, 8:24 PM
right
e

ephemient

12/04/2020, 8:24 PM
(unless you do it the Scala way... which is scary)
n

Nir

12/04/2020, 8:24 PM
So in conclusion, if you have non-trivial logic I'd say the way to go is to implement it for Sequence. using sequence
Then do:
Copy code
fun<T, R> Iterable<T>.chunkedBy(key: (T) -> R) = asSequence().chunkedBy(key).toList()
i.e. call the sequence version but immediately evaluate back to a list
t

Tobias Berger

12/04/2020, 8:25 PM
unless you really need to have it inlined
n

Nir

12/04/2020, 8:25 PM
and now you have the expected semantics in both cases
Then you have no choice but to copy and paste 😞
t

Tobias Berger

12/04/2020, 8:25 PM
or implement it on iterable
n

Nir

12/04/2020, 8:25 PM
you can factor out bits of logic into a third function but you can't simply factor the whole thing, in particular the third function can't touch the lambda itself
if you implement it on iterable, what does the iterable return?
A list, or a sequence?
t

Tobias Berger

12/04/2020, 8:26 PM
usually a list (ore whatever makes sense). But you can use that and turn it back into a sequenc
n

Nir

12/04/2020, 8:27 PM
yes, but it breaks the expected semantics of sequence
t

Tobias Berger

12/04/2020, 8:27 PM
as do some of the standard library functions on sequence.
n

Nir

12/04/2020, 8:27 PM
which are you thinking of?
there are terminal functions on sequences, yes
this function could be terminal but there's absolutely no reason for it to be
t

Tobias Berger

12/04/2020, 8:28 PM
e.g.
.sorted()
n

Nir

12/04/2020, 8:28 PM
so in that sense it would be very very surprising for the user
but there's no other way to implemented sorted
e

ephemient

12/04/2020, 8:29 PM
... there is, and Haskell it lazily
n

Nir

12/04/2020, 8:29 PM
wait... it returns a sequence....
t

Tobias Berger

12/04/2020, 8:29 PM
right, otherwise I'd expect them to have used it.
n

Nir

12/04/2020, 8:29 PM
what is this function doing
I'd be very nervous to use that function without understanding the idea behind it (if I cared about performance)
t

Tobias Berger

12/04/2020, 8:31 PM
well, like you said. You can't sort without getting all elements, so there is just no other way
n

Nir

12/04/2020, 8:31 PM
it returns a sequence though
not a list
e

ephemient

12/04/2020, 8:31 PM
Kotlin's
Sequence.sorted()
is sorta-eager. it doesn't sort until a terminal operation is performed, but it sorts everything as soon as pulled
so it is kind of strange for a sequence, but it can't be a list
t

Tobias Berger

12/04/2020, 8:32 PM
just what I wanted to say
n

Nir

12/04/2020, 8:32 PM
damn that's weird
e

ephemient

12/04/2020, 8:32 PM
(for comparison, the laziness in Haskell's sort is that it performs the comparisons lazily - if you only look at the first element of the result, it only performs O(n) comparisons, not O(n log n).)
n

Nir

12/04/2020, 8:33 PM
I can't say I'm really a fan of it per se. I understand it could be convenient in some cases, but it makes it very unclear what's really happening.
The reason for laziness here usually isn't laziness for its own sake, it's to be able to compose data structures without crazy overhead.
Yeah, haskell is lazy from the ground up though, very different
t

Tobias Berger

12/04/2020, 8:34 PM
@ephemient so it does the sorting lazy, but still requires all elements
n

Nir

12/04/2020, 8:34 PM
Also, Haskell's approach would likely result in N^2 sorting
e

ephemient

12/04/2020, 8:34 PM
yes. obviously you do need to look at all elements at least once to determine which one is the minimum
n

Nir

12/04/2020, 8:34 PM
unless there's some very deep magic there
e

ephemient

12/04/2020, 8:34 PM
no, it's still O(n log n) in the end
n

Nir

12/04/2020, 8:35 PM
Does it use quick select?
I guess that would be a good compromise
e

ephemient

12/04/2020, 8:35 PM
when your whole language is lazy, quicksort and quickselect are the same algorithm
n

Nir

12/04/2020, 8:35 PM
eh I dunno about that, the "haskell quicksort" isn't really quicksort
e

ephemient

12/04/2020, 8:36 PM
quicksort (as implemented in tutorials) is not really quicksort, true
t

Tobias Berger

12/04/2020, 8:36 PM
The reason for laziness here usually isn't laziness for its own sake, it's to be able to compose data structures without crazy overhead.
I'd say that's basically the point. It's not so much about creating a lazy, pull-based flow, as it is about building a kind of pipeline based on iterator implementations, which doesn't need a new collection instance to save the result of each step.
n

Nir

12/04/2020, 8:36 PM
Right, exactly
Which is why I'm not a fan of this behavior, it is creating a new collection so I'd rather just b ehonest about that and return a list and say it's terminal, that's just me though
Either way though, it's definitely not desirable
so I think the "least of all evils" here for common use cases is to implement on sequence, and then implement iterable by calling the sequence implementation eagerly into a list
t

Tobias Berger

12/04/2020, 8:38 PM
But still, you might need it in some situations and then you're happy to have it.
e

ephemient

12/04/2020, 8:38 PM
"most of all evils" solution, implement a macro processor and generate both lazy and eager variants separately
for what it's worth, the Kotlin standard library does this, to generate all those collection-like methods for the various array specializations: https://github.com/JetBrains/kotlin/tree/master/libraries/tools/kotlin-stdlib-gen
t

Tobias Berger

12/04/2020, 8:45 PM
I have used sorted() on a sequence before and think it is right to have it there, But still I actually think sorted() breakes the documentation of Sequence. It sais "The values are evaluated lazily", which is not true in this case. But what comes after this is even worse: "the sequence is potentially infinite." Calling .sorted() on an infinite sequence will at least end with an exeption, not unlikely even an OutOfMemoryError (e.g. int overflow in ArrayList size)
n

Nir

12/04/2020, 8:56 PM
I think having sorted is fine
I just think it should be a terminal operation returning list
That is after all what's happening under the hood anyway
t

Tobias Berger

12/04/2020, 8:57 PM
not exactly
the list is only created once a terminal function is actually called on the sequence
n

Nir

12/04/2020, 8:58 PM
yes, but when it is called, everything is processed into the list
and then asSequence is called on the list and returned
t

Tobias Berger

12/04/2020, 8:58 PM
also, if you're sorting a sequence, you might want to add more sequence-based operations afterwards. At least that's more likely than wanting to add List-Based operations
n

Nir

12/04/2020, 8:59 PM
it doesn't actually follow the processing model of a sequence
t

Tobias Berger

12/04/2020, 8:59 PM
it partly does
n

Nir

12/04/2020, 8:59 PM
If you return a list you'd free to call asSequence and continue
by "processing order" i mean that sequences apply all the operations to one piece of data, then all the operations to the next piece of data, etc
sorted() doesn't obey that at all
it will look at every single piece of data that enters it
immediately, before anything is called from the next sequence in the chain
t

Tobias Berger

12/04/2020, 9:01 PM
If it returned a List directly, the sequence would be terminated in the moment you call sorted(), which doesn't happen with the actual implementation.
n

Nir

12/04/2020, 9:01 PM
Yes, I've agreed it's not 100% identical
Well, okay, I see, you could in principle want to return a lazy sequence until later, and avoid terminating it...
t

Tobias Berger

12/04/2020, 9:01 PM
of course you can't maintain execution order if you call a method that has the sole purpose of changing the order
n

Nir

12/04/2020, 9:02 PM
sole purpose?
oh i see
t

Tobias Berger

12/04/2020, 9:03 PM
150 comments. This might be our new record
n

Nir

12/04/2020, 9:03 PM
Indeed 🙂
at least it was a fun convo
1
I think it's worth having the sequence/iterable implementation strategy as a tip somewhere on the kotlikn docs or on a blog post
I have thought about this problem before and I still learned a lot here
2 Views