https://kotlinlang.org logo
#language-evolution
Title
# language-evolution
d

Derek Peirce

03/27/2021, 6:04 AM
Suppose that you wanted to combine two generated lists:
Copy code
val listA = listOf("foo", "bar")
    val listB = listOf("alpha", "beta", "gamma")

    val firstLetters = listA.map { it.first() } + listB.map { it.first() }
+
works, and it's as concise as can be, so everything is fine, right? Well, it works, but not very efficiently. If we inline the methods, we see that each
map
creates its own
ArrayList
, and then a third
ArrayList
is created with
+
to store all of them, which is two more list objects than we actually needed, because
+
respected that the individual lists shouldn't be mutated, even though they're never used again beyond this block of code. Consider a rewritten, more efficient form:
Copy code
val firstLetters = ArrayList<Char>(listA.size + listB.size).apply {
        listA.mapTo(this) { it.first() }
        listB.mapTo(this) { it.first() }
    }
This creates exactly the single list that we need, no wasted objects. It's also terribly ugly. But what if Kotlin allowed us to use the concise expressiveness of the first functional line with the efficiency of the four lines above? It would have to recognize that
map
has a corresponding
mapTo
, and then that the results of
map
are used only by
+
. The second requirement is easy, but the first requires a bit more. What if
mapTo
didn't exist as a separate method from
map
? Suppose
map
were written as:
Copy code
public inline fun <T, R> Iterable<T>.map(transform: (T) -> R): List<R> {
    @CollectionBuilder val destination = ArrayList<R>(collectionSizeOrDefault(10))
    for (item in this)
        destination.add(transform(item))
    return destination
}
@CollectionBuilder
would indicate to the compiler that it should build two versions of
map
, the one above and a transformed one, `map_collectionBuilder`:
Copy code
public inline fun <T, R, C : MutableCollection<in R>> Iterable<T>.map_collectionBuilder(destination: C, transform: (T) -> R): C {
    for (item in this)
        destination.add(transform(item))
    return destination
}
This is, of course, the original
mapTo
. As long as the result of
add
is never checked, it should be possible for any method that builds its own collection to instead append to an existing one, with the type
List
being effectively the default output. The compiler can then create its own empty list and fill it with each
map_collectionBuilder
call automatically. The only remaining problem is how to right-size that list. Maybe it would add the two existing
collectionSizeOrDefault
calls together, but that could get tricky to find. Or maybe the
map
method would need an annotation to give a hint to this collection-optimization step that this method creates a collection that is exactly as large as the input,
@SameSizeOutput
, while others like
filter
may indicate that as a ceiling,
@SameSizeOrSmallerOutput
. In such a case, perhaps the methods would be inverted, and we write this one method instead:
Copy code
public inline fun <T, R, C : MutableCollection<in R>> Iterable<T>.mapTo(@CollectionBuilder(directName = "map", size = SAME_SIZE) destination: C, transform: (T) -> R): C {
    for (item in this)
        destination.add(transform(item))
    return destination
}
and the corresponding
map
would be generated from there. If it seems strange for the language to specifically optimize
+
on collections like this, we've already done it for
String
, converting repeated `+`s into `StringBuilder`s, we may as well offer something similar to collections.
y

Youssef Shoaib [MOD]

03/27/2021, 9:26 AM
Just fyi there's an experimental API in the stdlib that kind of solves this but not really. It'll basically take your second example and instead write it as this:
Copy code
val firstLetters = buildList(listA.size + listB.size) {
        listA.mapTo(this) { it.first() }
        listB.mapTo(this) { it.first() }
    }
and personally I think this one is clearer and pretty much concise because the intention is immensely clear: I'm building a list, with an initial size, and mapping some items into it.
d

Derek Peirce

03/27/2021, 8:34 PM
That does work better, but it's just simplifying
ArrayList<Char>(...).apply {
into
buildList(...) {
, it still requires separate
mapTo(this)
calls.
listA.map { it.first() } + listB.map { it.first() }
is by far the most concise form, as it doesn't include manually specifying the size (which a sufficiently advanced compiler could infer on its own) and avoids having to
mapTo(this)
.
y

Youssef Shoaib [MOD]

03/27/2021, 9:23 PM
@Derek Peirce I think I might have a concise way that this can be currently implemented in Kotlin, but it relates to another missing optimisation that I'm currently in the process of developing a compiler plugin for. (I'm writing out the code for how this can be implemented right now so I'll brb in 5 minutes)
👍🏻 1
Copy code
// Could be named much much better, but I'm trying to make this type explicitly visible here
// The weird nullability, extra boolean param, and Int return type can be abstracted away
// in a @JvmInline value class, which is part of what KT-44530 proposes.
typealias ListInTheProcessOfBuilding<T> = (MutableList<T>?, Boolean) -> Int

inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: ListInTheProcessOfBuilding<T>): ListInTheProcessOfBuilding<T> =
  { list, shouldReturnSize ->
    this@plus.invoke(list, shouldReturnSize) + other.invoke(list, shouldReturnSize)
  }

/* Use this one if you dislike the call to toList at the end of the chain
inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: ListInTheProcessOfBuilding<T>): List<T> {
  val operation: ListInTheProcessOfBuilding<T> = (this@plus + other)
  return buildList(operation(null, true)) {
    operation(this, false)
  }
}
*/
fun main() {
  val listA = listOf("foo", "bar")
  val listB = listOf("alpha", "beta", "gamma")
  // Note that you lose non-local returns here because of noinline.
  val firstLetters = listA.map { it.first() } + listB.map { it.first() }
  println(firstLetters.toList())
}

/* noinline is needed just to get the code compiling, but hopefully with the compiler plugin that modifier won't be needed. I haven't figured out how to suppress that error though yet lol */
inline fun <T, R> List<T>.map(noinline transform: (T) -> R): ListInTheProcessOfBuilding<R> =
  { list, shouldReturnSize -> if (shouldReturnSize) size else mapTo(list!!, transform).let { 0 } }

inline fun <T> ListInTheProcessOfBuilding<T>.toList() = buildList { this@toList(this, false) }
(or, if you want it without the toList call (playground):
Copy code
// Could be named much much better, but I'm trying to make this type explicitly visible here
// The weird nullability, extra boolean param, and Int return type can be abstracted away
// in a @JvmInline value class, which is part of what KT-44530 proposes.
typealias ListInTheProcessOfBuilding<T> = (MutableList<T>?, Boolean) -> Int

inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: ListInTheProcessOfBuilding<T>): List<T> {
  return buildList(this@plus(null, true) + other(null, true)) {
    this@plus(this, false)
    other(this, false)
  }
}

fun main() {
  val listA = listOf("foo", "bar")
  val listB = listOf("alpha", "beta", "gamma")
  // Note that you lose non-local returns here because of noinline.
  val firstLetters = listA.map { it.first() } + listB.map { it.first() }
  println(firstLetters)
}

/* noinline is needed just to get the code compiling, but hopefully with the compiler plugin that modifier won't be needed. I haven't figured out how to suppress that error though yet lol */
inline fun <T, R> List<T>.map(noinline transform: (T) -> R): ListInTheProcessOfBuilding<R> =
  { list, shouldReturnSize -> if (shouldReturnSize) size else mapTo(list!!, transform).let { 0 } }
Or or or if you wanted to get really creative (playground):
Basically, your intended syntax can be implemented with something like the following (playground):
Copy code
// Could be named much much better, but I'm trying to make this type explicitly visible here
// The weird nullability, extra boolean param, and Int return type can be abstracted away
// in a @JvmInline value class, which is part of what KT-44530 proposes.
typealias ListInTheProcessOfBuilding<T> = (list: MutableList<T>?, shouldReturnSize: Boolean) -> Int
typealias FinalListInTheProcessOfBuilding<T> = (list: MutableList<T>?, shouldReturnSize: Boolean, unit: Unit) -> Int
inline val <T> ListInTheProcessOfBuilding<T>.size: Int get() = this(null, true)
inline fun <T> ListInTheProcessOfBuilding<T>.addTo(list: MutableList<T>) { this(list, false) }
inline val <T> FinalListInTheProcessOfBuilding<T>.size: Int get() = this(null, true, Unit)
inline fun <T> FinalListInTheProcessOfBuilding<T>.addTo(list: MutableList<T>) { this(list, false, Unit) }
inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: ListInTheProcessOfBuilding<T>): ListInTheProcessOfBuilding<T> =
  { list, shouldReturnSize ->
    this@plus.invoke(list, shouldReturnSize) + other.invoke(list, shouldReturnSize)
  }
inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: FinalListInTheProcessOfBuilding<T>): List<T> = buildList(size + other.size) {
  this@plus.addTo(this)
  other.addTo(this)
}
fun main() {
  val listA = listOf("foo", "bar")
  val listB = listOf("alpha", "beta", "gamma")
  // Note that you lose non-local returns here because of noinline.
  val firstLetters: List<Char> = listA.map { it.first() } + listB.mapFinal { it.first() }
  val firstLettersInDifferentOrder: List<Char> = listB.map { it.first() } + listA.map { it.first() }.final
  println(firstLetters)
  println(firstLettersInDifferentOrder)
}
/* noinline is needed just to get the code compiling, but hopefully with the compiler plugin that modifier won't be needed. I haven't figured out how to suppress that error though yet lol */
inline fun <T, R> List<T>.map(noinline transform: (T) -> R): ListInTheProcessOfBuilding<R> =
  { list, shouldReturnSize -> if (shouldReturnSize) size else mapTo(list!!, transform).let { 0 } }
inline fun <T, R> List<T>.mapFinal(noinline transform: (T) -> R): FinalListInTheProcessOfBuilding<R> = map(transform).final
inline val <T> ListInTheProcessOfBuilding<T>.final: FinalListInTheProcessOfBuilding<T>
  get() = { list, shouldReturnSize, _ ->
    this(
      list,
      shouldReturnSize
    )
  }
inline fun <T> ListInTheProcessOfBuilding<T>.toList() = buildList { this@toList(this, false) }
Basically, this "missing optimisation" that I'm talking about is the fact that the Kotlin compiler doesn't inline lambdas returned by inlined lambdas and/or stored in local vals like in the above code. I've already filed a bug/proposal for this at KT-44530 (Go vote on it btw lol), but basically with this proposal the code above will run efficiently and all the lambdas will get inlined at the call site of the plus operator which should give you a nice and efficient series of calls that creates just one single ArrayList (in the buildList call). I'm currently working on a compiler plugin called kotlin-lambda-return-inliner that is intended to perform this optimisation and more. I'm probably going to add your usecase to the possible uses in the issue and as a sample/test in the compiler plugin because it is quite interesting and realistic. Also about the actual implementation itself of the ListInTheProcessOfBuilding the first one requires a
toList
call at the end, which IMHO is acceptable since it looks like what you would do to a Sequence to make it into a list for example and so it seems idiomatic. The second version immediately converts any 2 added map calls into a list, which works for simple usecases but is probably confusing for anyone who tries to use an odd number of map calls. The last one is what I would go for if
toList
was absolutely out of the question (because I strongly prefer the
toList
option) but if it was out of question then I'd go for this one because it still allows multiple calls to happen with only the last call collapsing the chain. Also initially I wanted to have 2 different plus operators based on the needed return type (so that you can either "request" a ListInTheProcessOfBuilding or a List with type inference) but after diving into it a bit it seemed impossible to do (although I'm sure that there probably is a way that I'm too stupid to come up with) and so if you do find a way to get that to work then please LMK
Actually, I'm stupid because thankfully List is an interface and with my compiler plugin interface calls to inline classes would also be optimised and so something like this will work (playground) :
Copy code
@file:Suppress("OVERRIDE_BY_INLINE", "NOTHING_TO_INLINE")

import java.util.*
import java.util.function.Consumer
import java.util.function.IntFunction
import java.util.stream.Stream
enum class ListInTheProcessOfBuildingCall {
  AddTo, Size, Materialize
}

inline class ListInTheProcessOfBuilding<T> @PublishedApi internal constructor(@PublishedApi internal val lambda: (list: MutableList<T>?, call: ListInTheProcessOfBuildingCall) -> Any?) :
  List<T> {
  override inline fun contains(element: T): Boolean = materialize.contains(element)

  override inline fun containsAll(elements: Collection<T>): Boolean = materialize.containsAll(elements)

  override inline fun get(index: Int): T = materialize.get(index)

  override inline fun indexOf(element: T): Int = materialize.indexOf(element)

  override inline fun isEmpty(): Boolean = materialize.isEmpty()
  override inline fun iterator(): Iterator<T> = materialize.iterator()

  override inline fun lastIndexOf(element: T): Int = materialize.lastIndexOf(element)

  override inline fun listIterator(): ListIterator<T> = materialize.listIterator()

  override inline fun listIterator(index: Int): ListIterator<T> = materialize.listIterator(index)

  override inline fun spliterator(): Spliterator<T> = materialize.spliterator()
  override inline fun subList(fromIndex: Int, toIndex: Int): List<T> = materialize.subList(fromIndex, toIndex)

  override inline fun forEach(action: Consumer<in T>?) = materialize.forEach(action)

  override inline fun parallelStream(): Stream<T> = materialize.parallelStream()

  override inline fun stream(): Stream<T> = materialize.stream()

  override inline fun toString(): String = "ListInTheProcessOfBuilding(materialize=$materialize)"

  override inline val size: Int get() = lambda(null, ListInTheProcessOfBuildingCall.Size) as Int
  inline val materialize: List<T> get() = lambda(null, ListInTheProcessOfBuildingCall.Materialize) as List<T>
  inline fun addTo(list: MutableList<T>) {
    lambda(list, ListInTheProcessOfBuildingCall.AddTo)
  }
}

inline fun <T> ListInTheProcessOfBuilding(
  size: Int,
  noinline add: (MutableList<T>) -> Unit
): ListInTheProcessOfBuilding<T> {
  // variables captured in a closure are basically the same as a var in a class
  var materializedValue: List<T>? = null
  return ListInTheProcessOfBuilding { list, call ->
    when (call) {
      ListInTheProcessOfBuildingCall.AddTo -> add(list!!)
      ListInTheProcessOfBuildingCall.Size -> size
      ListInTheProcessOfBuildingCall.Materialize -> materializedValue
        ?: buildList { add(this) }.also { materializedValue = it }
    }
  }
}


operator fun <T> ListInTheProcessOfBuilding<T>.plus(other: ListInTheProcessOfBuilding<T>): ListInTheProcessOfBuilding<T> =
  ListInTheProcessOfBuilding(size + other.size) {
    addTo(it)
    other.addTo(it)
  }
operator fun <T> ListInTheProcessOfBuilding<T>.plus(other: List<T>): ListInTheProcessOfBuilding<T> =
  ListInTheProcessOfBuilding(size + other.size) {
    addTo(it)
    it.addAll(other)
  }


fun main() {
  val listA = listOf("foo", "bar")
  val listB = listOf("alpha", "beta", "gamma")
  val listC = listOf("function", "property", "object", "class")
  val listD = listOf("test", "best", "nest")
  val listE = listOf('x', 'y')
  // Note that you lose non-local returns here because of noinline.
  val firstLetters: List<Char> = listA.map { it.first() } + listB.map { it.first() } + listC.map { it.first() } + listD.map { it.first() } + listE
  println(firstLetters)
}

/* noinline is needed just to get the code compiling, but hopefully with the compiler plugin that modifier won't be needed. I haven't figured out how to suppress that error though yet lol */
inline fun <T, R> List<T>.map(noinline transform: (T) -> R): ListInTheProcessOfBuilding<R> = ListInTheProcessOfBuilding(size) { mapTo(it, transform) }
so yeah personally I prefer this one actually because it works perfectly as a drop-in replacement for List and it looks neat on the call site (Also I'm not splitting my messages on purpose, Slack for some reason just does that automatically so sorry for that)
d

Derek Peirce

03/29/2021, 5:26 AM
This looks like it would work (I'm guessing with replacing
buildList
with
buildList(size)
to get it right-sized) as long as the compiler's ability to inline were improved. However, it also relies on a final call to
materialize
that would be awkwardly wrapped around the output:
(listA + listB + ... + listE).materialize
. At that point, it becomes clear that this would have to be optimized at the level of the language, like how concatenating multiple strings together becomes a
StringBuilder
call. It's necessary so that
"a" + "b" + "c"
doesn't require a
toString()
call around it, the only way for the compiler to know when
+
is adding to the
StringBuilder
vs building the final string is a better representation in the language.
y

Youssef Shoaib [MOD]

03/30/2021, 2:25 AM
@Derek Peirce There's no need to materialize to be called as my example clearly shows.
ListInTheProcessOfBuilding
extends List on its own, and materialize is just simply an implementation detail of
ListInTheProcessOfBuilding
that any consumer doesn't need to worry about since, again, any List-related method just operates on List as an interface (or in a lot of cases Collection which is a superinterface of List). Every single method in the List interface is delegated inside of
ListInTheProcessOfBuilding
to a
materialize
call followed by the intended method. I could provide a more detailed example if you want one where a normal listOf is replaced with a
ListInTheProcessOfBuilding
in normal code so that you can see that it works right away with no awkward calls to
materialize
or anything like that, but simply think of it as if
StringBuilder
was a subtype of String, which means that you won't need a
toString
call. (I didn't realize by the way that I missed up the buildList size thing so whoops nice catch). Again, I'm currently working on a compiler plugin to support this, and there's an issue already for this to be supported, and, tbh, from what I've been seeing so far while trying to implement this plugin, this isn't a complicated optimization to implement, and like the compiler itself already has more complicated things in it, and so yeah hopefully this'll be implemented soon enough
d

Derek Peirce

03/31/2021, 2:51 AM
Ah, right, it is its own
List
. However, it still doesn't function quite like a naturally-built
List
, as it is instead a lazy
List
. If the list is only used within the current method, then a theoretically powerful enough inlining compiler could optimize that out, but if the list is passed into a non-inline method or returned from a non-inline method, then you're forced to keep the list-building components in a closure unless you manually call
toList()
. This incurs performance and garbage-collection penalties if laziness isn't desired.
y

Youssef Shoaib [MOD]

03/31/2021, 3:54 AM
For the performance side of things laziness-wise, the moment that you use that list in any way, shape, sort, or form as an actual list it will materialize and so the laziness won't last that long. Garbage-collection wise yes it is problematic if you pass it to a regular function, but if you look at the stdlib for example literally all of the functions are inline, and some library authors choose to have most of their methods as inline. If this becomes an actual language feature/optimization, then I guarantee you that a lot of people will then mark all the functions they possibly can as inline if they take in some form of interface or any kind of object that can possibly be a magic lambda thingy.
d

Derek Peirce

03/31/2021, 4:19 AM
Inlining isn't a magic wand, it comes with a cost. The larger a method is, the more expensive it is to inline it, and every inlined method called by an inline method contributes to that. Inlining just to optimize something that never should be lazy to make it truly not lazy would be a terrible trade-off in many cases. Overriding methods also can't be inlined at all, so any list passed up or down through such a boundary would have to become lazy.
6 Views