If we're writing some kind of not-completely-trivi...
# announcements
n
If we're writing some kind of not-completely-trivial transformation that can work on both sequences and iterables (as most can), it's pretty common to have to write the function twice. Is there a convention for which one should call which, in terms of the best performance? Or is there some preferred way to share code in a third function/object? Or should the code just be copy pasted? I guess usually the version that runs on iterable is the one that gets
inline
, so perhaps you'd want the sequence version to call the iterable version?
t
I don't think there is a convention. If you just implement one version and call it from the other, the direction doesn't really matter.
Iterable<T>.asSequence()
and
Sequence<T>.asIterable()
work the same way and whether your function is inlined shouldn't depend on either of those types. I would definitely avoid duplicating the code, so I see 2 possibilities:
1. Implement the logic for one type (the one more frequently used, or just thow a dice, not much of a performance impact either way) 2. Implement the logic in an additional function that uses the iterator directly and use in both of your other functions.
n
Right. Yeah, it seems like a very much 6 of one, half a dozen of the other situation
maybe some feature kotlin introduces will help with this issue
e
converting Sequence to Iterable will lose laziness, converting Iterable to Sequence will lose performance
t
not true. They yust wrap the
iterator()
call and change nothing about the behaviour
or to be precise: it may make a difference, depending on the actual implementations
e
ok, I meant: if you're using normal methods like
.map()
etc.
then Sequence -> Iterable is suddenly eager
if you are going Iterable -> Sequence, then what would have been inline code with potential unboxed values is now out-of-line code that is always boxed
t
True. But actually, some operations on sequences also use intermittent Lists (e.g.
.sorted()
) Still, definitely something to keep in mind
n
Yes, that's what I was thinking about putting the actual implementation in Iterable
e
it would be rather magical if
.sorted()
could be lazy (although Haskell manages it)
n
Copy code
fun<T, R> Sequence<T>.chunkedBy(key: (T) -> R) = asIterable().chunkedBy(key)
And the actual implementation is in Iterable.chunkedBy
ah you're saying this loses laziness?
e
well, it depends on how
.chunkedBy()
is implemented
n
right I see what you mean
t
Personally, I prefer using sequences in most cases (especially when chaining operations like filter and map) so I would probably implement on Sequence if it makes sense.
n
Well, chunkedBy is implemented using
sequence
Copy code
fun<T, R> Iterable<T>.chunkedBy(key: (T) -> R) = sequence {
    val iter = this@chunkedBy.iterator()
    if (!iter.hasNext()) return@sequence
I guess this is bad because I'm returning a sequence from something that was passed an iterator
which is unexpected behavior
e
hmm, in that case I think it should be an extension on Sequence to begin with
if somebody wants to lazily chunk an iterable they can call
asSequence()
themselves
most (all?) of the standard Iterable functions are eager though
(maybe all except for Grouping)
n
@Tobias Berger I prefer sequences too and frankly dislike the existence of Iterable at all... but containers tend to be everywhere so if you want to do something quick and convenient, you want it implemented on Iterable
So, I guess things that take iterator don't actually return Iterator, do they
they return List
e
or Map, etc. but yes
n
so let's say I did it the other way around. I implemented it on Sequence<T>.chunkedBy using sequence
t
And I think you mean Iterable, not Iterator
n
yes sorry
And now I have:
Copy code
inline fun<T, R> Iterable<T>.chunkedBy(key: (T) -> R) = asSequence.chunkedBy(key).toList()
All I'm losing now is some performance, correct?
e
right
n
Ah, I guess it's more than performance though
with real iterables you can typically return early from lambdas
listOf(1,2,3).map { return }
will work for example
e
not sure it makes sense for chunking, but yeah
t
Wenn you probably don't loose much performance, But it will create a new List instance. But that is usually the expected behaviour on the Iterable-Extensions
n
Right. Yeah I'm playing around with placing inline modifiers, and now I'm getting complaints from the compiler about cross-inlining
need to reread that section 🙂
@Tobias Berger right. As always I know there are reasons for everything, I just wish sequence was the default
e
the performance loss from the fact that your selector is now not inlined, not from the List construction
n
(and honestly only)
e
if you wanted to do something crazy, Scala does allow for non-local returns from (non-inline) lambdas
n
not that crazy, thanks 🙂
e
which they implement behind the scenes with throw/catch of exceptions for control flow
(that's the crazy part)
n
I just feel like, the lazy style (sequence) is mandatory because with large containers, it's crazy inefficient and memory prohibitive to keep creating new containers each time
the immediate style is a nice to have
e
so if you return early, and let another caller finish the sequence... they get an unexpected exception
n
but it's also the default whenever you are transforming a list or something with map
e
it's not a great solution.
n
Yeah, it sounds pretty strange.
t
but you should still be able to early return from your lambda, even if you pass it through
n
@Tobias Berger I don't think I can do it
maybe if I use crossinline, I need to look it up
e
no, crossinline does the opposite of what you want: it makes a lambda to an inline function not able to
return
n
yeah, then it's impossible
Copy code
inline fun<T, R> Iterable<T>.chunkedBy(key: (T) -> R) = asSequence().chunkedBy(key).toList()
so now I have this
e
(this is necessary if you're inlining the lambda into something with a larger lifetime)
n
if the sequence function is not inline, then this doesn't work
if the sequence function is inline, then the sequence function itself complains
t
about what?
n
I think that you aren't allowed to inline sequence functions
because they return these lazy objects that hold onto the lambda
that's why they're not inlined in the standard library either
e
.chunkedBy { return }
is not possible without inline, and Sequence operations require non-inline lambdas
n
Right
t
that depends on the operation you're trying to perform. Many Sequence methods are inline
n
non-terminal ones
e.g.:
Copy code
fun <T, R> Sequence<T>.map(transform: (T) -> R): Sequence<R>
they can be inline but then you'd need to annotate the lambda anyway
the lambda itself cannot be an inline lambda, because you're not actually invoking the lambda immediately, you're returning an object that holds the lambda to invoke it later
t
yeah, also makes sense...
e
yep, if they perform the transform lazily, there's no place to "return" to
n
right
e
(unless you do it the Scala way... which is scary)
n
So in conclusion, if you have non-trivial logic I'd say the way to go is to implement it for Sequence. using sequence
Then do:
Copy code
fun<T, R> Iterable<T>.chunkedBy(key: (T) -> R) = asSequence().chunkedBy(key).toList()
i.e. call the sequence version but immediately evaluate back to a list
t
unless you really need to have it inlined
n
and now you have the expected semantics in both cases
Then you have no choice but to copy and paste 😞
t
or implement it on iterable
n
you can factor out bits of logic into a third function but you can't simply factor the whole thing, in particular the third function can't touch the lambda itself
if you implement it on iterable, what does the iterable return?
A list, or a sequence?
t
usually a list (ore whatever makes sense). But you can use that and turn it back into a sequenc
n
yes, but it breaks the expected semantics of sequence
t
as do some of the standard library functions on sequence.
n
which are you thinking of?
there are terminal functions on sequences, yes
this function could be terminal but there's absolutely no reason for it to be
t
e.g.
.sorted()
n
so in that sense it would be very very surprising for the user
but there's no other way to implemented sorted
e
... there is, and Haskell it lazily
n
wait... it returns a sequence....
t
right, otherwise I'd expect them to have used it.
n
what is this function doing
I'd be very nervous to use that function without understanding the idea behind it (if I cared about performance)
t
well, like you said. You can't sort without getting all elements, so there is just no other way
n
it returns a sequence though
not a list
e
Kotlin's
Sequence.sorted()
is sorta-eager. it doesn't sort until a terminal operation is performed, but it sorts everything as soon as pulled
so it is kind of strange for a sequence, but it can't be a list
t
just what I wanted to say
n
damn that's weird
e
(for comparison, the laziness in Haskell's sort is that it performs the comparisons lazily - if you only look at the first element of the result, it only performs O(n) comparisons, not O(n log n).)
n
I can't say I'm really a fan of it per se. I understand it could be convenient in some cases, but it makes it very unclear what's really happening.
The reason for laziness here usually isn't laziness for its own sake, it's to be able to compose data structures without crazy overhead.
Yeah, haskell is lazy from the ground up though, very different
t
@ephemient so it does the sorting lazy, but still requires all elements
n
Also, Haskell's approach would likely result in N^2 sorting
e
yes. obviously you do need to look at all elements at least once to determine which one is the minimum
n
unless there's some very deep magic there
e
no, it's still O(n log n) in the end
n
Does it use quick select?
I guess that would be a good compromise
e
when your whole language is lazy, quicksort and quickselect are the same algorithm
n
eh I dunno about that, the "haskell quicksort" isn't really quicksort
e
quicksort (as implemented in tutorials) is not really quicksort, true
t
The reason for laziness here usually isn't laziness for its own sake, it's to be able to compose data structures without crazy overhead.
I'd say that's basically the point. It's not so much about creating a lazy, pull-based flow, as it is about building a kind of pipeline based on iterator implementations, which doesn't need a new collection instance to save the result of each step.
n
Right, exactly
Which is why I'm not a fan of this behavior, it is creating a new collection so I'd rather just b ehonest about that and return a list and say it's terminal, that's just me though
Either way though, it's definitely not desirable
so I think the "least of all evils" here for common use cases is to implement on sequence, and then implement iterable by calling the sequence implementation eagerly into a list
t
But still, you might need it in some situations and then you're happy to have it.
e
"most of all evils" solution, implement a macro processor and generate both lazy and eager variants separately
for what it's worth, the Kotlin standard library does this, to generate all those collection-like methods for the various array specializations: https://github.com/JetBrains/kotlin/tree/master/libraries/tools/kotlin-stdlib-gen
t
I have used sorted() on a sequence before and think it is right to have it there, But still I actually think sorted() breakes the documentation of Sequence. It sais "The values are evaluated lazily", which is not true in this case. But what comes after this is even worse: "the sequence is potentially infinite." Calling .sorted() on an infinite sequence will at least end with an exeption, not unlikely even an OutOfMemoryError (e.g. int overflow in ArrayList size)
n
I think having sorted is fine
I just think it should be a terminal operation returning list
That is after all what's happening under the hood anyway
t
not exactly
the list is only created once a terminal function is actually called on the sequence
n
yes, but when it is called, everything is processed into the list
and then asSequence is called on the list and returned
t
also, if you're sorting a sequence, you might want to add more sequence-based operations afterwards. At least that's more likely than wanting to add List-Based operations
n
it doesn't actually follow the processing model of a sequence
t
it partly does
n
If you return a list you'd free to call asSequence and continue
by "processing order" i mean that sequences apply all the operations to one piece of data, then all the operations to the next piece of data, etc
sorted() doesn't obey that at all
it will look at every single piece of data that enters it
immediately, before anything is called from the next sequence in the chain
t
If it returned a List directly, the sequence would be terminated in the moment you call sorted(), which doesn't happen with the actual implementation.
n
Yes, I've agreed it's not 100% identical
Well, okay, I see, you could in principle want to return a lazy sequence until later, and avoid terminating it...
t
of course you can't maintain execution order if you call a method that has the sole purpose of changing the order
n
sole purpose?
oh i see
t
150 comments. This might be our new record
n
Indeed 🙂
at least it was a fun convo
âž• 1
I think it's worth having the sequence/iterable implementation strategy as a tip somewhere on the kotlikn docs or on a blog post
I have thought about this problem before and I still learned a lot here