Suppose that you wanted to combine two generated lists ``` v kotlinlang #language-evolution

Suppose that you wanted to combine two generated l...

Derek Peirce

03/27/2021, 6:04 AM

Suppose that you wanted to combine two generated lists:

Copy code

val listA = listOf("foo", "bar")
    val listB = listOf("alpha", "beta", "gamma")

    val firstLetters = listA.map { it.first() } + listB.map { it.first() }

works, and it's as concise as can be, so everything is fine, right? Well, it works, but not very efficiently. If we inline the methods, we see that each

map

creates its own

ArrayList

, and then a third

ArrayList

is created with

to store all of them, which is two more list objects than we actually needed, because

respected that the individual lists shouldn't be mutated, even though they're never used again beyond this block of code. Consider a rewritten, more efficient form:

Copy code

val firstLetters = ArrayList<Char>(listA.size + listB.size).apply {
        listA.mapTo(this) { it.first() }
        listB.mapTo(this) { it.first() }
    }

This creates exactly the single list that we need, no wasted objects. It's also terribly ugly. But what if Kotlin allowed us to use the concise expressiveness of the first functional line with the efficiency of the four lines above? It would have to recognize that

map

has a corresponding

mapTo

, and then that the results of

map

are used only by

. The second requirement is easy, but the first requires a bit more. What if

mapTo

didn't exist as a separate method from

map

? Suppose

map

were written as:

Copy code

public inline fun <T, R> Iterable<T>.map(transform: (T) -> R): List<R> {
    @CollectionBuilder val destination = ArrayList<R>(collectionSizeOrDefault(10))
    for (item in this)
        destination.add(transform(item))
    return destination
}

@CollectionBuilder

would indicate to the compiler that it should build two versions of

map

, the one above and a transformed one, `map_collectionBuilder`:

Copy code

public inline fun <T, R, C : MutableCollection<in R>> Iterable<T>.map_collectionBuilder(destination: C, transform: (T) -> R): C {
    for (item in this)
        destination.add(transform(item))
    return destination
}

This is, of course, the original

mapTo

. As long as the result of

add

is never checked, it should be possible for any method that builds its own collection to instead append to an existing one, with the type

List

being effectively the default output. The compiler can then create its own empty list and fill it with each

map_collectionBuilder

call automatically. The only remaining problem is how to right-size that list. Maybe it would add the two existing

collectionSizeOrDefault

calls together, but that could get tricky to find. Or maybe the

map

method would need an annotation to give a hint to this collection-optimization step that this method creates a collection that is exactly as large as the input,

@SameSizeOutput

, while others like

filter

may indicate that as a ceiling,

@SameSizeOrSmallerOutput

. In such a case, perhaps the methods would be inverted, and we write this one method instead:

Copy code

public inline fun <T, R, C : MutableCollection<in R>> Iterable<T>.mapTo(@CollectionBuilder(directName = "map", size = SAME_SIZE) destination: C, transform: (T) -> R): C {
    for (item in this)
        destination.add(transform(item))
    return destination
}

and the corresponding

map

would be generated from there. If it seems strange for the language to specifically optimize

on collections like this, we've already done it for

String

, converting repeated `+`s into `StringBuilder`s, we may as well offer something similar to collections.

Youssef Shoaib [MOD]

03/27/2021, 9:26 AM

Just fyi there's an experimental API in the stdlib that kind of solves this but not really. It'll basically take your second example and instead write it as this:

Copy code

val firstLetters = buildList(listA.size + listB.size) {
        listA.mapTo(this) { it.first() }
        listB.mapTo(this) { it.first() }
    }

and personally I think this one is clearer and pretty much concise because the intention is immensely clear: I'm building a list, with an initial size, and mapping some items into it.

Derek Peirce

03/27/2021, 8:34 PM

That does work better, but it's just simplifying

ArrayList<Char>(...).apply {

into

buildList(...) {

, it still requires separate

mapTo(this)

calls.

listA.map { it.first() } + listB.map { it.first() }

is by far the most concise form, as it doesn't include manually specifying the size (which a sufficiently advanced compiler could infer on its own) and avoids having to

mapTo(this)

Youssef Shoaib [MOD]

03/27/2021, 9:23 PM

@Derek Peirce I think I might have a concise way that this can be currently implemented in Kotlin, but it relates to another missing optimisation that I'm currently in the process of developing a compiler plugin for. (I'm writing out the code for how this can be implemented right now so I'll brb in 5 minutes)

👍🏻 1

Youssef Shoaib [MOD]

03/27/2021, 11:01 PM

Copy code

// Could be named much much better, but I'm trying to make this type explicitly visible here
// The weird nullability, extra boolean param, and Int return type can be abstracted away
// in a @JvmInline value class, which is part of what KT-44530 proposes.
typealias ListInTheProcessOfBuilding<T> = (MutableList<T>?, Boolean) -> Int

inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: ListInTheProcessOfBuilding<T>): ListInTheProcessOfBuilding<T> =
  { list, shouldReturnSize ->
    this@plus.invoke(list, shouldReturnSize) + other.invoke(list, shouldReturnSize)
  }

/* Use this one if you dislike the call to toList at the end of the chain
inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: ListInTheProcessOfBuilding<T>): List<T> {
  val operation: ListInTheProcessOfBuilding<T> = (this@plus + other)
  return buildList(operation(null, true)) {
    operation(this, false)
  }
}
*/
fun main() {
  val listA = listOf("foo", "bar")
  val listB = listOf("alpha", "beta", "gamma")
  // Note that you lose non-local returns here because of noinline.
  val firstLetters = listA.map { it.first() } + listB.map { it.first() }
  println(firstLetters.toList())
}

/* noinline is needed just to get the code compiling, but hopefully with the compiler plugin that modifier won't be needed. I haven't figured out how to suppress that error though yet lol */
inline fun <T, R> List<T>.map(noinline transform: (T) -> R): ListInTheProcessOfBuilding<R> =
  { list, shouldReturnSize -> if (shouldReturnSize) size else mapTo(list!!, transform).let { 0 } }

inline fun <T> ListInTheProcessOfBuilding<T>.toList() = buildList { this@toList(this, false) }

(or, if you want it without the toList call (playground):

Copy code

// Could be named much much better, but I'm trying to make this type explicitly visible here
// The weird nullability, extra boolean param, and Int return type can be abstracted away
// in a @JvmInline value class, which is part of what KT-44530 proposes.
typealias ListInTheProcessOfBuilding<T> = (MutableList<T>?, Boolean) -> Int

inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: ListInTheProcessOfBuilding<T>): List<T> {
  return buildList(this@plus(null, true) + other(null, true)) {
    this@plus(this, false)
    other(this, false)
  }
}

fun main() {
  val listA = listOf("foo", "bar")
  val listB = listOf("alpha", "beta", "gamma")
  // Note that you lose non-local returns here because of noinline.
  val firstLetters = listA.map { it.first() } + listB.map { it.first() }
  println(firstLetters)
}

/* noinline is needed just to get the code compiling, but hopefully with the compiler plugin that modifier won't be needed. I haven't figured out how to suppress that error though yet lol */
inline fun <T, R> List<T>.map(noinline transform: (T) -> R): ListInTheProcessOfBuilding<R> =
  { list, shouldReturnSize -> if (shouldReturnSize) size else mapTo(list!!, transform).let { 0 } }

Or or or if you wanted to get really creative (playground):

Youssef Shoaib [MOD]

03/27/2021, 11:01 PM

Basically, your intended syntax can be implemented with something like the following (playground):

Youssef Shoaib [MOD]

03/27/2021, 11:01 PM

Copy code

// Could be named much much better, but I'm trying to make this type explicitly visible here
// The weird nullability, extra boolean param, and Int return type can be abstracted away
// in a @JvmInline value class, which is part of what KT-44530 proposes.
typealias ListInTheProcessOfBuilding<T> = (list: MutableList<T>?, shouldReturnSize: Boolean) -> Int
typealias FinalListInTheProcessOfBuilding<T> = (list: MutableList<T>?, shouldReturnSize: Boolean, unit: Unit) -> Int
inline val <T> ListInTheProcessOfBuilding<T>.size: Int get() = this(null, true)
inline fun <T> ListInTheProcessOfBuilding<T>.addTo(list: MutableList<T>) { this(list, false) }
inline val <T> FinalListInTheProcessOfBuilding<T>.size: Int get() = this(null, true, Unit)
inline fun <T> FinalListInTheProcessOfBuilding<T>.addTo(list: MutableList<T>) { this(list, false, Unit) }
inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: ListInTheProcessOfBuilding<T>): ListInTheProcessOfBuilding<T> =
  { list, shouldReturnSize ->
    this@plus.invoke(list, shouldReturnSize) + other.invoke(list, shouldReturnSize)
  }
inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: FinalListInTheProcessOfBuilding<T>): List<T> = buildList(size + other.size) {
  this@plus.addTo(this)
  other.addTo(this)
}
fun main() {
  val listA = listOf("foo", "bar")
  val listB = listOf("alpha", "beta", "gamma")
  // Note that you lose non-local returns here because of noinline.
  val firstLetters: List<Char> = listA.map { it.first() } + listB.mapFinal { it.first() }
  val firstLettersInDifferentOrder: List<Char> = listB.map { it.first() } + listA.map { it.first() }.final
  println(firstLetters)
  println(firstLettersInDifferentOrder)
}
/* noinline is needed just to get the code compiling, but hopefully with the compiler plugin that modifier won't be needed. I haven't figured out how to suppress that error though yet lol */
inline fun <T, R> List<T>.map(noinline transform: (T) -> R): ListInTheProcessOfBuilding<R> =
  { list, shouldReturnSize -> if (shouldReturnSize) size else mapTo(list!!, transform).let { 0 } }
inline fun <T, R> List<T>.mapFinal(noinline transform: (T) -> R): FinalListInTheProcessOfBuilding<R> = map(transform).final
inline val <T> ListInTheProcessOfBuilding<T>.final: FinalListInTheProcessOfBuilding<T>
  get() = { list, shouldReturnSize, _ ->
    this(
      list,
      shouldReturnSize
    )
  }
inline fun <T> ListInTheProcessOfBuilding<T>.toList() = buildList { this@toList(this, false) }

Youssef Shoaib [MOD]

03/27/2021, 11:14 PM

Basically, this "missing optimisation" that I'm talking about is the fact that the Kotlin compiler doesn't inline lambdas returned by inlined lambdas and/or stored in local vals like in the above code. I've already filed a bug/proposal for this at KT-44530 (Go vote on it btw lol), but basically with this proposal the code above will run efficiently and all the lambdas will get inlined at the call site of the plus operator which should give you a nice and efficient series of calls that creates just one single ArrayList (in the buildList call). I'm currently working on a compiler plugin called kotlin-lambda-return-inliner that is intended to perform this optimisation and more. I'm probably going to add your usecase to the possible uses in the issue and as a sample/test in the compiler plugin because it is quite interesting and realistic. Also about the actual implementation itself of the ListInTheProcessOfBuilding the first one requires a

toList

call at the end, which IMHO is acceptable since it looks like what you would do to a Sequence to make it into a list for example and so it seems idiomatic. The second version immediately converts any 2 added map calls into a list, which works for simple usecases but is probably confusing for anyone who tries to use an odd number of map calls. The last one is what I would go for if

toList

was absolutely out of the question (because I strongly prefer the

toList

option) but if it was out of question then I'd go for this one because it still allows multiple calls to happen with only the last call collapsing the chain. Also initially I wanted to have 2 different plus operators based on the needed return type (so that you can either "request" a ListInTheProcessOfBuilding or a List with type inference) but after diving into it a bit it seemed impossible to do (although I'm sure that there probably is a way that I'm too stupid to come up with) and so if you do find a way to get that to work then please LMK

Youssef Shoaib [MOD]

03/27/2021, 11:56 PM

Actually, I'm stupid because thankfully List is an interface and with my compiler plugin interface calls to inline classes would also be optimised and so something like this will work (playground) :

Youssef Shoaib [MOD]

03/27/2021, 11:56 PM

Copy code

@file:Suppress("OVERRIDE_BY_INLINE", "NOTHING_TO_INLINE")

import java.util.*
import java.util.function.Consumer
import java.util.function.IntFunction
import java.util.stream.Stream
enum class ListInTheProcessOfBuildingCall {
  AddTo, Size, Materialize
}

inline class ListInTheProcessOfBuilding<T> @PublishedApi internal constructor(@PublishedApi internal val lambda: (list: MutableList<T>?, call: ListInTheProcessOfBuildingCall) -> Any?) :
  List<T> {
  override inline fun contains(element: T): Boolean = materialize.contains(element)

  override inline fun containsAll(elements: Collection<T>): Boolean = materialize.containsAll(elements)

  override inline fun get(index: Int): T = materialize.get(index)

  override inline fun indexOf(element: T): Int = materialize.indexOf(element)

  override inline fun isEmpty(): Boolean = materialize.isEmpty()
  override inline fun iterator(): Iterator<T> = materialize.iterator()

  override inline fun lastIndexOf(element: T): Int = materialize.lastIndexOf(element)

  override inline fun listIterator(): ListIterator<T> = materialize.listIterator()

  override inline fun listIterator(index: Int): ListIterator<T> = materialize.listIterator(index)

  override inline fun spliterator(): Spliterator<T> = materialize.spliterator()
  override inline fun subList(fromIndex: Int, toIndex: Int): List<T> = materialize.subList(fromIndex, toIndex)

  override inline fun forEach(action: Consumer<in T>?) = materialize.forEach(action)

  override inline fun parallelStream(): Stream<T> = materialize.parallelStream()

  override inline fun stream(): Stream<T> = materialize.stream()

  override inline fun toString(): String = "ListInTheProcessOfBuilding(materialize=$materialize)"

  override inline val size: Int get() = lambda(null, ListInTheProcessOfBuildingCall.Size) as Int
  inline val materialize: List<T> get() = lambda(null, ListInTheProcessOfBuildingCall.Materialize) as List<T>
  inline fun addTo(list: MutableList<T>) {
    lambda(list, ListInTheProcessOfBuildingCall.AddTo)
  }
}

inline fun <T> ListInTheProcessOfBuilding(
  size: Int,
  noinline add: (MutableList<T>) -> Unit
): ListInTheProcessOfBuilding<T> {
  // variables captured in a closure are basically the same as a var in a class
  var materializedValue: List<T>? = null
  return ListInTheProcessOfBuilding { list, call ->
    when (call) {
      ListInTheProcessOfBuildingCall.AddTo -> add(list!!)
      ListInTheProcessOfBuildingCall.Size -> size
      ListInTheProcessOfBuildingCall.Materialize -> materializedValue
        ?: buildList { add(this) }.also { materializedValue = it }
    }
  }
}


operator fun <T> ListInTheProcessOfBuilding<T>.plus(other: ListInTheProcessOfBuilding<T>): ListInTheProcessOfBuilding<T> =
  ListInTheProcessOfBuilding(size + other.size) {
    addTo(it)
    other.addTo(it)
  }
operator fun <T> ListInTheProcessOfBuilding<T>.plus(other: List<T>): ListInTheProcessOfBuilding<T> =
  ListInTheProcessOfBuilding(size + other.size) {
    addTo(it)
    it.addAll(other)
  }


fun main() {
  val listA = listOf("foo", "bar")
  val listB = listOf("alpha", "beta", "gamma")
  val listC = listOf("function", "property", "object", "class")
  val listD = listOf("test", "best", "nest")
  val listE = listOf('x', 'y')
  // Note that you lose non-local returns here because of noinline.
  val firstLetters: List<Char> = listA.map { it.first() } + listB.map { it.first() } + listC.map { it.first() } + listD.map { it.first() } + listE
  println(firstLetters)
}

/* noinline is needed just to get the code compiling, but hopefully with the compiler plugin that modifier won't be needed. I haven't figured out how to suppress that error though yet lol */
inline fun <T, R> List<T>.map(noinline transform: (T) -> R): ListInTheProcessOfBuilding<R> = ListInTheProcessOfBuilding(size) { mapTo(it, transform) }

so yeah personally I prefer this one actually because it works perfectly as a drop-in replacement for List and it looks neat on the call site (Also I'm not splitting my messages on purpose, Slack for some reason just does that automatically so sorry for that)

Derek Peirce

03/29/2021, 5:26 AM

This looks like it would work (I'm guessing with replacing

buildList

with

buildList(size)

to get it right-sized) as long as the compiler's ability to inline were improved. However, it also relies on a final call to

materialize

that would be awkwardly wrapped around the output:

(listA + listB + ... + listE).materialize

. At that point, it becomes clear that this would have to be optimized at the level of the language, like how concatenating multiple strings together becomes a

StringBuilder

call. It's necessary so that

"a" + "b" + "c"

doesn't require a

toString()

call around it, the only way for the compiler to know when

is adding to the

StringBuilder

vs building the final string is a better representation in the language.

Youssef Shoaib [MOD]

03/30/2021, 2:25 AM

@Derek Peirce There's no need to materialize to be called as my example clearly shows.

ListInTheProcessOfBuilding

extends List on its own, and materialize is just simply an implementation detail of

ListInTheProcessOfBuilding

that any consumer doesn't need to worry about since, again, any List-related method just operates on List as an interface (or in a lot of cases Collection which is a superinterface of List). Every single method in the List interface is delegated inside of

ListInTheProcessOfBuilding

to a

materialize

call followed by the intended method. I could provide a more detailed example if you want one where a normal listOf is replaced with a

ListInTheProcessOfBuilding

in normal code so that you can see that it works right away with no awkward calls to

materialize

or anything like that, but simply think of it as if

StringBuilder

was a subtype of String, which means that you won't need a

toString

call. (I didn't realize by the way that I missed up the buildList size thing so whoops nice catch). Again, I'm currently working on a compiler plugin to support this, and there's an issue already for this to be supported, and, tbh, from what I've been seeing so far while trying to implement this plugin, this isn't a complicated optimization to implement, and like the compiler itself already has more complicated things in it, and so yeah hopefully this'll be implemented soon enough

Derek Peirce

03/31/2021, 2:51 AM

Ah, right, it is its own

List

. However, it still doesn't function quite like a naturally-built

List

, as it is instead a lazy

List

. If the list is only used within the current method, then a theoretically powerful enough inlining compiler could optimize that out, but if the list is passed into a non-inline method or returned from a non-inline method, then you're forced to keep the list-building components in a closure unless you manually call

toList()

. This incurs performance and garbage-collection penalties if laziness isn't desired.

Youssef Shoaib [MOD]

03/31/2021, 3:54 AM

For the performance side of things laziness-wise, the moment that you use that list in any way, shape, sort, or form as an actual list it will materialize and so the laziness won't last that long. Garbage-collection wise yes it is problematic if you pass it to a regular function, but if you look at the stdlib for example literally all of the functions are inline, and some library authors choose to have most of their methods as inline. If this becomes an actual language feature/optimization, then I guarantee you that a lot of people will then mark all the functions they possibly can as inline if they take in some form of interface or any kind of object that can possibly be a magic lambda thingy.

Derek Peirce

03/31/2021, 4:19 AM

Inlining isn't a magic wand, it comes with a cost. The larger a method is, the more expensive it is to inline it, and every inlined method called by an inline method contributes to that. Inlining just to optimize something that never should be lazy to make it truly not lazy would be a terrible trade-off in many cases. Overriding methods also can't be inlined at all, so any list passed up or down through such a boundary would have to become lazy.

7 Views

Open in Slack

Previous Next