kotlinlang #announcements

Nir

12/04/2020, 7:25 PM

If we're writing some kind of not-completely-trivial transformation that can work on both sequences and iterables (as most can), it's pretty common to have to write the function twice. Is there a convention for which one should call which, in terms of the best performance? Or is there some preferred way to share code in a third function/object? Or should the code just be copy-pasted? I guess usually the version that runs on iterable is the one that gets inline, so perhaps you'd want the sequence version to call the iterable version?

Tobias Berger

12/04/2020, 7:59 PM

I don't think there is a convention. If you just implement one version and call it from the other, the direction doesn't really matter. Iterable<T>.asSequence() and Sequence<T>.asIterable() work the same way and whether your function is inlined shouldn't depend on either of those types. I would definitely avoid duplicating the code, so I see 2 possibilities:

Tobias Berger

12/04/2020, 8:02 PM

1. Implement the logic for one type (the one more frequently used, or just roll a die; there's not much of a performance impact either way).
2. Implement the logic in an additional function that uses the iterator directly, and use it in both of your other functions.
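A minimal sketch of the second possibility, assuming a hypothetical transformation `dedupeConsecutive` (not from the thread) that drops consecutive duplicates. The shared core works on an `Iterator`, and the `Sequence` and `Iterable` extensions are thin wrappers that keep their usual lazy and eager semantics. This only works cleanly here because no lambda is involved.

```kotlin
// Hypothetical shared core: operates directly on an Iterator and stays lazy.
fun <T> Iterator<T>.dedupeConsecutive(): Iterator<T> {
    val source = this
    return iterator {
        if (!source.hasNext()) return@iterator
        var previous = source.next()
        yield(previous)
        for (current in source) {
            if (current != previous) {
                yield(current)
                previous = current
            }
        }
    }
}

// Lazy wrapper: a fresh source iterator is requested each time the result is iterated.
fun <T> Sequence<T>.dedupeConsecutive(): Sequence<T> =
    Sequence { this@dedupeConsecutive.iterator().dedupeConsecutive() }

// Eager wrapper: returns a List, as callers of Iterable extensions usually expect.
fun <T> Iterable<T>.dedupeConsecutive(): List<T> =
    iterator().dedupeConsecutive().asSequence().toList()
```

Calling `listOf(1, 1, 2, 2, 3, 1).dedupeConsecutive()` gives a `List`, while the same call on a `Sequence` gives another `Sequence` that does no work until it is consumed.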

Nir

12/04/2020, 8:03 PM

Right. Yeah, it seems like a very much 6 of one, half a dozen of the other situation

Nir

12/04/2020, 8:04 PM

maybe some feature kotlin introduces will help with this issue

ephemient

12/04/2020, 8:04 PM

converting Sequence to Iterable will lose laziness, converting Iterable to Sequence will lose performance

Tobias Berger

12/04/2020, 8:05 PM

not true. They just wrap the iterator() call and change nothing about the behaviour

Tobias Berger

12/04/2020, 8:06 PM

or to be precise: it may make a difference, depending on the actual implementations
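A small sketch of the wrapping claim, under the assumption that only the raw iterator is used and no collection operators are applied. The function name `pullTwoThroughIterable` is made up for illustration. An infinite sequence viewed through `asIterable()` still evaluates elements only as they are pulled.

```kotlin
// Returns the values pulled through the Iterable view and the values the source evaluated.
fun pullTwoThroughIterable(): Pair<List<Int>, List<Int>> {
    val evaluated = mutableListOf<Int>()
    // Infinite sequence 1, 2, 3, ... that records every element it produces.
    val naturals = generateSequence(1) { it + 1 }.onEach { evaluated += it }

    // asIterable() only wraps iterator(); nothing is evaluated at this point.
    val view: Iterable<Int> = naturals.asIterable()

    val iterator = view.iterator()
    val firstTwo = listOf(iterator.next(), iterator.next())
    return firstTwo to evaluated // both are [1, 2]
}
```

The difference discussed next only appears once operators such as `map` are called on the wrapped value.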

ephemient

12/04/2020, 8:06 PM

ok, I meant: if you're using normal methods like .map() etc.

ephemient

12/04/2020, 8:06 PM

then Sequence -> Iterable is suddenly eager

ephemient

12/04/2020, 8:07 PM

if you are going Iterable -> Sequence, then what would have been inline code with potential unboxed values is now out-of-line code that is always boxed
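To make the two directions concrete, here is a rough sketch that counts how often the lambda runs. The function names are hypothetical. `Iterable.map` is an inline function that runs immediately and builds a `List`, while `Sequence.map` stores the lambda and runs it only when a terminal operation pulls elements.

```kotlin
// Sequence -> Iterable: map is now the eager Iterable.map, so the lambda runs right away.
fun callsAfterIterableMap(): Int {
    var calls = 0
    val source: Sequence<Int> = sequenceOf(1, 2, 3)
    source.asIterable().map { calls++; it * 2 }
    return calls // 3
}

// Iterable -> Sequence: map is now the lazy Sequence.map, so nothing runs without a terminal operation.
fun callsAfterSequenceMap(): Int {
    var calls = 0
    val source: List<Int> = listOf(1, 2, 3)
    source.asSequence().map { calls++; it * 2 }
    return calls // 0
}
```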

Tobias Berger

12/04/2020, 8:07 PM

True. But actually, some operations on sequences also use intermediate Lists (e.g. .sorted()). Still, definitely something to keep in mind

Nir

12/04/2020, 8:09 PM

Yes, that's why I was thinking about putting the actual implementation in Iterable

ephemient

12/04/2020, 8:09 PM

it would be rather magical if .sorted() could be lazy (although Haskell manages it)

Nir

12/04/2020, 8:09 PM

fun<T, R> Sequence<T>.chunkedBy(key: (T) -> R) = asIterable().chunkedBy(key)

Nir

12/04/2020, 8:09 PM

And the actual implementation is in Iterable.chunkedBy

Nir

12/04/2020, 8:10 PM

ah you're saying this loses laziness?

ephemient

12/04/2020, 8:10 PM

well, it depends on how .chunkedBy() is implemented

Nir

12/04/2020, 8:10 PM

right I see what you mean

Tobias Berger

12/04/2020, 8:11 PM

Personally, I prefer using sequences in most cases (especially when chaining operations like filter and map) so I would probably implement on Sequence if it makes sense.

Nir

12/04/2020, 8:11 PM

Well, chunkedBy is implemented using sequence

Nir

12/04/2020, 8:11 PM

fun<T, R> Iterable<T>.chunkedBy(key: (T) -> R) = sequence {
    val iter = this@chunkedBy.iterator()
    if (!iter.hasNext()) return@sequence

Nir

12/04/2020, 8:11 PM

I guess this is bad because I'm returning a sequence from something that was passed an iterator

Nir

12/04/2020, 8:11 PM

which is unexpected behavior

ephemient

12/04/2020, 8:11 PM

hmm, in that case I think it should be an extension on Sequence to begin with

ephemient

12/04/2020, 8:12 PM

if somebody wants to lazily chunk an iterable they can call asSequence() themselves

ephemient

12/04/2020, 8:12 PM

most (all?) of the standard Iterable functions are eager though

ephemient

12/04/2020, 8:12 PM

(maybe all except for Grouping)

Nir

12/04/2020, 8:12 PM

@Tobias Berger I prefer sequences too and frankly dislike the existence of Iterable at all... but containers tend to be everywhere so if you want to do something quick and convenient, you want it implemented on Iterable

Nir

12/04/2020, 8:13 PM

So, I guess things that take iterator don't actually return Iterator, do they

Nir

12/04/2020, 8:13 PM

they return List

ephemient

12/04/2020, 8:13 PM

or Map, etc. but yes

Nir

12/04/2020, 8:14 PM

so let's say I did it the other way around. I implemented it on Sequence<T>.chunkedBy using sequence

Tobias Berger

12/04/2020, 8:14 PM

And I think you mean Iterable, not Iterator

Nir

12/04/2020, 8:14 PM

yes sorry

Nir

12/04/2020, 8:15 PM

And now I have:

inline fun<T, R> Iterable<T>.chunkedBy(key: (T) -> R) = asSequence().chunkedBy(key).toList()

Nir

12/04/2020, 8:15 PM

All I'm losing now is some performance, correct?

ephemient

12/04/2020, 8:15 PM

right

Nir

12/04/2020, 8:15 PM

Ah, I guess it's more than performance though

Nir

12/04/2020, 8:16 PM

with real iterables you can typically return early from lambdas

Nir

12/04/2020, 8:16 PM

listOf(1,2,3).map { return } will work for example
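A short sketch of that difference, with hypothetical function names. Because the `Iterable` operators are inline, a `return` inside the lambda leaves the enclosing function. The intermediate `Sequence` operators such as `map` are not inline, so the same `return` is rejected by the compiler.

```kotlin
// Non-local return: `return it` exits firstNegativeOrNull itself, because forEach is inline.
fun firstNegativeOrNull(values: List<Int>): Int? {
    values.forEach { if (it < 0) return it }
    return null
}

fun firstNegativeViaSequence(values: List<Int>): Int? {
    // Does not compile, since Sequence.map is not inline and 'return' is not allowed there:
    // values.asSequence().map { if (it < 0) return it; it }
    return values.asSequence().firstOrNull { it < 0 }
}
```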

ephemient

12/04/2020, 8:16 PM

not sure it makes sense for chunking, but yeah

Tobias Berger

12/04/2020, 8:17 PM

Well, you probably don't lose much performance, but it will create a new List instance. That is usually the expected behaviour of the Iterable extensions, though

Nir

12/04/2020, 8:17 PM

Right. Yeah I'm playing around with placing inline modifiers, and now I'm getting complaints from the compiler about cross-inlining

Nir

12/04/2020, 8:17 PM

need to reread that section 🙂

Nir

12/04/2020, 8:17 PM

@Tobias Berger right. As always I know there are reasons for everything, I just wish sequence was the default

ephemient

12/04/2020, 8:17 PM

the performance loss comes from the fact that your selector is now not inlined, not from the List construction

Nir

12/04/2020, 8:17 PM

(and honestly only)

ephemient

12/04/2020, 8:18 PM

if you wanted to do something crazy, Scala does allow for non-local returns from (non-inline) lambdas

Nir

12/04/2020, 8:18 PM

not that crazy, thanks 🙂

ephemient

12/04/2020, 8:18 PM

which they implement behind the scenes with throw/catch of exceptions for control flow

ephemient

12/04/2020, 8:18 PM

(that's the crazy part)

Nir

12/04/2020, 8:18 PM

I just feel like, the lazy style (sequence) is mandatory because with large containers, it's crazy inefficient and memory prohibitive to keep creating new containers each time

Nir

12/04/2020, 8:18 PM

the immediate style is a nice to have

ephemient

12/04/2020, 8:19 PM

so if you return early, and let another caller finish the sequence... they get an unexpected exception

Nir

12/04/2020, 8:19 PM

but it's also the default whenever you are transforming a list or something with map

ephemient

12/04/2020, 8:19 PM

it's not a great solution.

Nir

12/04/2020, 8:19 PM

Yeah, it sounds pretty strange.

Tobias Berger

12/04/2020, 8:19 PM

but you should still be able to early return from your lambda, even if you pass it through

Nir

12/04/2020, 8:20 PM

@Tobias Berger I don't think I can do it

Nir

12/04/2020, 8:20 PM

maybe if I use crossinline, I need to look it up

ephemient

12/04/2020, 8:20 PM

no, crossinline does the opposite of what you want: it makes a lambda passed to an inline function unable to return

Nir

12/04/2020, 8:20 PM

yeah, then it's impossible

Nir

12/04/2020, 8:21 PM

inline fun<T, R> Iterable<T>.chunkedBy(key: (T) -> R) = asSequence().chunkedBy(key).toList()

Nir

12/04/2020, 8:21 PM

so now I have this

ephemient

12/04/2020, 8:21 PM

(this is necessary if you're inlining the lambda into something with a larger lifetime)
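As an illustration of what crossinline is for, here is a minimal sketch using a hypothetical wrapper named `mapLazily`. The lambda parameter is forwarded into a non-inline lambda that outlives the call, so the compiler insists on `crossinline`, and in exchange callers lose the ability to `return` from it.

```kotlin
// `transform` is used inside the lambda given to the non-inline Sequence.map,
// so it must be marked crossinline; without the modifier this declaration is rejected.
inline fun <T, R> Sequence<T>.mapLazily(crossinline transform: (T) -> R): Sequence<R> =
    map { transform(it) }

fun doubled(values: Sequence<Int>): List<Int> {
    // Does not compile, because crossinline forbids non-local returns:
    // values.mapLazily { return emptyList() }
    return values.mapLazily { it * 2 }.toList()
}
```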

Nir

12/04/2020, 8:21 PM

if the sequence function is not inline, then this doesn't work

Nir

12/04/2020, 8:21 PM

if the sequence function is inline, then the sequence function itself complains

Tobias Berger

12/04/2020, 8:21 PM

about what?

Nir

12/04/2020, 8:22 PM

I think that you aren't allowed to inline sequence functions

Nir

12/04/2020, 8:22 PM

because they return these lazy objects that hold onto the lambda

Nir

12/04/2020, 8:22 PM

that's why they're not inlined in the standard library either

ephemient

12/04/2020, 8:22 PM

.chunkedBy { return } is not possible without inline, and Sequence operations require non-inline lambdas

Nir

12/04/2020, 8:22 PM

Right

Tobias Berger

12/04/2020, 8:22 PM

that depends on the operation you're trying to perform. Many Sequence methods are inline

Nir

12/04/2020, 8:23 PM

non-terminal ones

Nir

12/04/2020, 8:23 PM

e.g.:

fun <T, R> Sequence<T>.map(transform: (T) -> R): Sequence<R>

Nir

12/04/2020, 8:23 PM

they can be inline but then you'd need to annotate the lambda anyway

Nir

12/04/2020, 8:24 PM

the lambda itself cannot be an inline lambda, because you're not actually invoking the lambda immediately, you're returning an object that holds the lambda to invoke it later
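A simplified sketch of why the lambda has to be a real object. `MappedSequence` and `myMap` are hypothetical names, written in the spirit of how a lazy map can be built, not a copy of the standard library source. The returned sequence keeps `transform` in a field and calls it only after `myMap` has already returned.

```kotlin
// Holds on to the lambda so it can be invoked later, element by element.
class MappedSequence<T, R>(
    private val source: Sequence<T>,
    private val transform: (T) -> R
) : Sequence<R> {
    override fun iterator(): Iterator<R> = object : Iterator<R> {
        private val inner = source.iterator()
        override fun hasNext(): Boolean = inner.hasNext()
        override fun next(): R = transform(inner.next())
    }
}

fun <T, R> Sequence<T>.myMap(transform: (T) -> R): Sequence<R> = MappedSequence(this, transform)

// Records the order of events to show that the lambda runs after myMap has returned.
fun eventOrder(): List<String> {
    val events = mutableListOf<String>()
    val mapped = sequenceOf(1, 2).myMap { events += "transform $it"; it * 10 }
    events += "myMap returned"
    mapped.toList()
    return events // [myMap returned, transform 1, transform 2]
}
```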

Tobias Berger

12/04/2020, 8:24 PM

yeah, also makes sense...

ephemient

12/04/2020, 8:24 PM

yep, if they perform the transform lazily, there's no place to "return" to

Nir

12/04/2020, 8:24 PM

right

ephemient

12/04/2020, 8:24 PM

(unless you do it the Scala way... which is scary)

Nir

12/04/2020, 8:24 PM

So in conclusion, if you have non-trivial logic I'd say the way to go is to implement it for Sequence, using sequence

Nir

12/04/2020, 8:24 PM

Then do:

fun<T, R> Iterable<T>.chunkedBy(key: (T) -> R) = asSequence().chunkedBy(key).toList()

Nir

12/04/2020, 8:25 PM

i.e. call the sequence version but immediately evaluate back to a list

Tobias Berger

12/04/2020, 8:25 PM

unless you really need to have it inlined

Nir

12/04/2020, 8:25 PM

and now you have the expected semantics in both cases
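Putting the conclusion together, a possible sketch looks like the following. The thread only shows the first lines of chunkedBy, so the body here is an assumption: it groups consecutive elements that share the same key. The Sequence version is lazy and the Iterable version delegates to it and materializes a List.

```kotlin
// Lazy implementation on Sequence (assumed semantics: chunk runs of equal keys).
fun <T, R> Sequence<T>.chunkedBy(key: (T) -> R): Sequence<List<T>> = sequence {
    val iter = this@chunkedBy.iterator()
    if (!iter.hasNext()) return@sequence
    var chunk = mutableListOf(iter.next())
    var currentKey = key(chunk.first())
    for (item in iter) {
        val itemKey = key(item)
        if (itemKey != currentKey) {
            yield(chunk) // emit the finished chunk before starting a new one
            chunk = mutableListOf()
            currentKey = itemKey
        }
        chunk.add(item)
    }
    yield(chunk)
}

// Eager variant for Iterable: reuse the lazy version, then collect into a List.
fun <T, R> Iterable<T>.chunkedBy(key: (T) -> R): List<List<T>> =
    asSequence().chunkedBy(key).toList()
```

With this shape, `listOf(1, 1, 2, 3, 3).chunkedBy { it }` returns a list of three chunks, and the Sequence version can be applied to an infinite source as long as only a prefix is consumed.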

Nir

12/04/2020, 8:25 PM

Then you have no choice but to copy and paste 😞

Tobias Berger

12/04/2020, 8:25 PM

or implement it on iterable

Nir

12/04/2020, 8:25 PM

you can factor out bits of logic into a third function but you can't simply factor the whole thing, in particular the third function can't touch the lambda itself

Nir

12/04/2020, 8:26 PM

if you implement it on iterable, what does the iterable return?

Nir

12/04/2020, 8:26 PM

A list, or a sequence?

Tobias Berger

12/04/2020, 8:26 PM

usually a list (or whatever makes sense). But you can use that and turn it back into a sequence

Nir

12/04/2020, 8:27 PM

yes, but it breaks the expected semantics of sequence

Tobias Berger

12/04/2020, 8:27 PM

as do some of the standard library functions on sequence.

Nir

12/04/2020, 8:27 PM

which are you thinking of?

Nir

12/04/2020, 8:28 PM

there are terminal functions on sequences, yes

Nir

12/04/2020, 8:28 PM

this function could be terminal but there's absolutely no reason for it to be

Tobias Berger

12/04/2020, 8:28 PM

e.g.

.sorted()

Nir

12/04/2020, 8:28 PM

so in that sense it would be very very surprising for the user

Nir

12/04/2020, 8:28 PM

but there's no other way to implement sorted

ephemient

12/04/2020, 8:29 PM

... there is, and Haskell does it lazily

Nir

12/04/2020, 8:29 PM

wait... it returns a sequence....

Tobias Berger

12/04/2020, 8:29 PM

right, otherwise I'd expect them to have used it.

Nir

12/04/2020, 8:29 PM

what is this function doing

Nir

12/04/2020, 8:30 PM

I'd be very nervous to use that function without understanding the idea behind it (if I cared about performance)

Tobias Berger

12/04/2020, 8:31 PM

well, like you said. You can't sort without getting all elements, so there is just no other way

Nir

12/04/2020, 8:31 PM

it returns a sequence though

Nir

12/04/2020, 8:31 PM

not a list

ephemient

12/04/2020, 8:31 PM

Kotlin's Sequence.sorted() is sorta-eager. It doesn't sort until a terminal operation is performed, but it sorts everything as soon as the first element is pulled
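The "sorta-eager" behaviour can be observed with a small sketch. The function name `sortedPullLog` is made up for illustration. Calling `sorted()` pulls nothing, but asking for just the first element of the result drains the whole upstream sequence.

```kotlin
// Records when upstream elements are pulled relative to sorted() and first().
fun sortedPullLog(): List<String> {
    val log = mutableListOf<String>()
    val sorted = sequenceOf(3, 1, 2).onEach { log += "pulled $it" }.sorted()
    log += "sorted() returned" // nothing has been pulled yet

    val smallest = sorted.first() // pulls and sorts all three elements
    log += "first() = $smallest"
    return log
    // [sorted() returned, pulled 3, pulled 1, pulled 2, first() = 1]
}
```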

ephemient

12/04/2020, 8:32 PM

so it is kind of strange for a sequence, but it can't be a list

Tobias Berger

12/04/2020, 8:32 PM

just what I wanted to say

Nir

12/04/2020, 8:32 PM

damn that's weird

ephemient

12/04/2020, 8:32 PM

(for comparison, the laziness in Haskell's sort is that it performs the comparisons lazily - if you only look at the first element of the result, it only performs O(n) comparisons, not O(n log n).)

Nir

12/04/2020, 8:33 PM

I can't say I'm really a fan of it per se. I understand it could be convenient in some cases, but it makes it very unclear what's really happening.

Nir

12/04/2020, 8:33 PM

The reason for laziness here usually isn't laziness for its own sake, it's to be able to compose data structures without crazy overhead.

Nir

12/04/2020, 8:33 PM

Yeah, haskell is lazy from the ground up though, very different

Tobias Berger

12/04/2020, 8:34 PM

@ephemient so it does the sorting lazily, but still requires all elements

Nir

12/04/2020, 8:34 PM

Also, Haskell's approach would likely result in N^2 sorting

ephemient

12/04/2020, 8:34 PM

yes. obviously you do need to look at all elements at least once to determine which one is the minimum

Nir

12/04/2020, 8:34 PM

unless there's some very deep magic there

ephemient

12/04/2020, 8:34 PM

no, it's still O(n log n) in the end

Nir

12/04/2020, 8:35 PM

Does it use quick select?

Nir

12/04/2020, 8:35 PM

I guess that would be a good compromise

ephemient

12/04/2020, 8:35 PM

when your whole language is lazy, quicksort and quickselect are the same algorithm

Nir

12/04/2020, 8:35 PM

eh I dunno about that, the "haskell quicksort" isn't really quicksort

ephemient

12/04/2020, 8:36 PM

quicksort (as implemented in tutorials) is not really quicksort, true

Tobias Berger

12/04/2020, 8:36 PM

"The reason for laziness here usually isn't laziness for its own sake, it's to be able to compose data structures without crazy overhead."

I'd say that's basically the point. It's not so much about creating a lazy, pull-based flow, as it is about building a kind of pipeline based on iterator implementations, which doesn't need a new collection instance to save the result of each step.

Nir

12/04/2020, 8:36 PM

Right, exactly

Nir

12/04/2020, 8:37 PM

Which is why I'm not a fan of this behavior: it is creating a new collection, so I'd rather just be honest about that, return a list and say it's terminal. That's just me though

Nir

12/04/2020, 8:37 PM

Either way though, it's definitely not desirable

Nir

12/04/2020, 8:37 PM

so I think the "least of all evils" here for common use cases is to implement on sequence, and then implement iterable by calling the sequence implementation eagerly into a list

Tobias Berger

12/04/2020, 8:38 PM

But still, you might need it in some situations and then you're happy to have it.

ephemient

12/04/2020, 8:38 PM

"most of all evils" solution, implement a macro processor and generate both lazy and eager variants separately

ephemient

12/04/2020, 8:41 PM

for what it's worth, the Kotlin standard library does this, to generate all those collection-like methods for the various array specializations: https://github.com/JetBrains/kotlin/tree/master/libraries/tools/kotlin-stdlib-gen

Tobias Berger

12/04/2020, 8:45 PM

I have used sorted() on a sequence before and think it is right to have it there, but I still think sorted() breaks the documentation of Sequence. It says "The values are evaluated lazily", which is not true in this case. But what comes after this is even worse: "the sequence is potentially infinite." Calling .sorted() on an infinite sequence will at least end with an exception, quite possibly even an OutOfMemoryError (e.g. int overflow in ArrayList size)

Nir

12/04/2020, 8:56 PM

I think having sorted is fine

Nir

12/04/2020, 8:56 PM

I just think it should be a terminal operation returning list

Nir

12/04/2020, 8:57 PM

That is after all what's happening under the hood anyway

Tobias Berger

12/04/2020, 8:57 PM

not exactly

Tobias Berger

12/04/2020, 8:57 PM

the list is only created once a terminal function is actually called on the sequence

Nir

12/04/2020, 8:58 PM

yes, but when it is called, everything is processed into the list

Nir

12/04/2020, 8:58 PM

and then asSequence is called on the list and returned

Tobias Berger

12/04/2020, 8:58 PM

also, if you're sorting a sequence, you might want to add more sequence-based operations afterwards. At least that's more likely than wanting to add List-Based operations

Nir

12/04/2020, 8:59 PM

it doesn't actually follow the processing model of a sequence

Tobias Berger

12/04/2020, 8:59 PM

it partly does

Nir

12/04/2020, 8:59 PM

If you return a list you'd be free to call asSequence and continue

Nir

12/04/2020, 8:59 PM

by "processing order" i mean that sequences apply all the operations to one piece of data, then all the operations to the next piece of data, etc

Nir

12/04/2020, 8:59 PM

sorted() doesn't obey that at all

Nir

12/04/2020, 9:00 PM

it will look at every single piece of data that enters it

Nir

12/04/2020, 9:00 PM

immediately, before anything is called from the next sequence in the chain
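The processing order described here can be made visible with a short sketch. The function names are hypothetical. A sequence pipeline runs every stage for one element before moving to the next element, while the same pipeline on a list finishes each stage for all elements first.

```kotlin
// Sequence: element by element through the whole chain.
fun sequenceOrder(): List<String> {
    val log = mutableListOf<String>()
    sequenceOf(1, 2)
        .map { log += "map $it"; it }
        .filter { log += "filter $it"; true }
        .toList()
    return log // [map 1, filter 1, map 2, filter 2]
}

// List: stage by stage, with an intermediate list after map.
fun listOrder(): List<String> {
    val log = mutableListOf<String>()
    listOf(1, 2)
        .map { log += "map $it"; it }
        .filter { log += "filter $it"; true }
    return log // [map 1, map 2, filter 1, filter 2]
}
```

A stage like `sorted()` sits in between: once the first element is requested from it, it consumes its entire input before the next stage sees anything.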

Tobias Berger

12/04/2020, 9:01 PM

If it returned a List directly, the sequence would be terminated in the moment you call sorted(), which doesn't happen with the actual implementation.

Nir

12/04/2020, 9:01 PM

Yes, I've agreed it's not 100% identical

Nir

12/04/2020, 9:01 PM

Well, okay, I see, you could in principle want to return a lazy sequence until later, and avoid terminating it...

Tobias Berger

12/04/2020, 9:01 PM

of course you can't maintain execution order if you call a method that has the sole purpose of changing the order

Nir

12/04/2020, 9:02 PM

sole purpose?

Nir

12/04/2020, 9:02 PM

oh i see

Tobias Berger

12/04/2020, 9:03 PM

150 comments. This might be our new record

Nir

12/04/2020, 9:03 PM

Indeed 🙂

Nir

12/04/2020, 9:03 PM

at least it was a fun convo

Nir

12/04/2020, 9:04 PM

I think it's worth having the sequence/iterable implementation strategy as a tip somewhere in the Kotlin docs or in a blog post

Nir

12/04/2020, 9:04 PM

I have thought about this problem before and I still learned a lot here
