Derek Peirce

    Derek Peirce

    1 year ago
    Suppose that you wanted to combine two generated lists:
    val listA = listOf("foo", "bar")
        val listB = listOf("alpha", "beta", "gamma")
    
        val firstLetters = listA.map { it.first() } + listB.map { it.first() }
    +
    works, and it's as concise as can be, so everything is fine, right? Well, it works, but not very efficiently. If we inline the methods, we see that each
    map
    creates its own
    ArrayList
    , and then a third
    ArrayList
    is created with
    +
    to store all of them, which is two more list objects than we actually needed, because
    +
    respected that the individual lists shouldn't be mutated, even though they're never used again beyond this block of code. Consider a rewritten, more efficient form:
    val firstLetters = ArrayList<Char>(listA.size + listB.size).apply {
            listA.mapTo(this) { it.first() }
            listB.mapTo(this) { it.first() }
        }
    This creates exactly the single list that we need, no wasted objects. It's also terribly ugly. But what if Kotlin allowed us to use the concise expressiveness of the first functional line with the efficiency of the four lines above? It would have to recognize that
    map
    has a corresponding
    mapTo
    , and then that the results of
    map
    are used only by
    +
    . The second requirement is easy, but the first requires a bit more. What if
    mapTo
    didn't exist as a separate method from
    map
    ? Suppose
    map
    were written as:
    public inline fun <T, R> Iterable<T>.map(transform: (T) -> R): List<R> {
        @CollectionBuilder val destination = ArrayList<R>(collectionSizeOrDefault(10))
        for (item in this)
            destination.add(transform(item))
        return destination
    }
    @CollectionBuilder
    would indicate to the compiler that it should build two versions of
    map
    , the one above and a transformed one, map_collectionBuilder:
    public inline fun <T, R, C : MutableCollection<in R>> Iterable<T>.map_collectionBuilder(destination: C, transform: (T) -> R): C {
        for (item in this)
            destination.add(transform(item))
        return destination
    }
    This is, of course, the original
    mapTo
    . As long as the result of
    add
    is never checked, it should be possible for any method that builds its own collection to instead append to an existing one, with the type
    List
    being effectively the default output. The compiler can then create its own empty list and fill it with each
    map_collectionBuilder
    call automatically. The only remaining problem is how to right-size that list. Maybe it would add the two existing
    collectionSizeOrDefault
    calls together, but that could get tricky to find. Or maybe the
    map
    method would need an annotation to give a hint to this collection-optimization step that this method creates a collection that is exactly as large as the input,
    @SameSizeOutput
    , while others like
    filter
    may indicate that as a ceiling,
    @SameSizeOrSmallerOutput
    . In such a case, perhaps the methods would be inverted, and we write this one method instead:
    public inline fun <T, R, C : MutableCollection<in R>> Iterable<T>.mapTo(@CollectionBuilder(directName = "map", size = SAME_SIZE) destination: C, transform: (T) -> R): C {
        for (item in this)
            destination.add(transform(item))
        return destination
    }
    and the corresponding
    map
    would be generated from there. If it seems strange for the language to specifically optimize
    +
    on collections like this, we've already done it for
    String
    , converting repeated +s into StringBuilders, we may as well offer something similar to collections.
    y

    Youssef Shoaib [MOD]

    1 year ago
    Just fyi there's an experimental API in the stdlib that kind of solves this but not really. It'll basically take your second example and instead write it as this:
    val firstLetters = buildList(listA.size + listB.size) {
            listA.mapTo(this) { it.first() }
            listB.mapTo(this) { it.first() }
        }
    and personally I think this one is clearer and pretty much concise because the intention is immensely clear: I'm building a list, with an initial size, and mapping some items into it.
    Derek Peirce

    Derek Peirce

    1 year ago
    That does work better, but it's just simplifying
    ArrayList<Char>(...).apply {
    into
    buildList(...) {
    , it still requires separate
    mapTo(this)
    calls.
    listA.map { it.first() } + listB.map { it.first() }
    is by far the most concise form, as it doesn't include manually specifying the size (which a sufficiently advanced compiler could infer on its own) and avoids having to
    mapTo(this)
    .
    y

    Youssef Shoaib [MOD]

    1 year ago
    @Derek Peirce I think I might have a concise way that this can be currently implemented in Kotlin, but it relates to another missing optimisation that I'm currently in the process of developing a compiler plugin for. (I'm writing out the code for how this can be implemented right now so I'll brb in 5 minutes)
    // Could be named much much better, but I'm trying to make this type explicitly visible here
    // The weird nullability, extra boolean param, and Int return type can be abstracted away
    // in a @JvmInline value class, which is part of what KT-44530 proposes.
    typealias ListInTheProcessOfBuilding<T> = (MutableList<T>?, Boolean) -> Int
    
    inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: ListInTheProcessOfBuilding<T>): ListInTheProcessOfBuilding<T> =
      { list, shouldReturnSize ->
        this@plus.invoke(list, shouldReturnSize) + other.invoke(list, shouldReturnSize)
      }
    
    /* Use this one if you dislike the call to toList at the end of the chain
    inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: ListInTheProcessOfBuilding<T>): List<T> {
      val operation: ListInTheProcessOfBuilding<T> = (this@plus + other)
      return buildList(operation(null, true)) {
        operation(this, false)
      }
    }
    */
    fun main() {
      val listA = listOf("foo", "bar")
      val listB = listOf("alpha", "beta", "gamma")
      // Note that you lose non-local returns here because of noinline.
      val firstLetters = listA.map { it.first() } + listB.map { it.first() }
      println(firstLetters.toList())
    }
    
    /* noinline is needed just to get the code compiling, but hopefully with the compiler plugin that modifier won't be needed. I haven't figured out how to suppress that error though yet lol */
    inline fun <T, R> List<T>.map(noinline transform: (T) -> R): ListInTheProcessOfBuilding<R> =
      { list, shouldReturnSize -> if (shouldReturnSize) size else mapTo(list!!, transform).let { 0 } }
    
    inline fun <T> ListInTheProcessOfBuilding<T>.toList() = buildList { this@toList(this, false) }
    (or, if you want it without the toList call (playground):
    // Could be named much much better, but I'm trying to make this type explicitly visible here
    // The weird nullability, extra boolean param, and Int return type can be abstracted away
    // in a @JvmInline value class, which is part of what KT-44530 proposes.
    typealias ListInTheProcessOfBuilding<T> = (MutableList<T>?, Boolean) -> Int
    
    inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: ListInTheProcessOfBuilding<T>): List<T> {
      return buildList(this@plus(null, true) + other(null, true)) {
        this@plus(this, false)
        other(this, false)
      }
    }
    
    fun main() {
      val listA = listOf("foo", "bar")
      val listB = listOf("alpha", "beta", "gamma")
      // Note that you lose non-local returns here because of noinline.
      val firstLetters = listA.map { it.first() } + listB.map { it.first() }
      println(firstLetters)
    }
    
    /* noinline is needed just to get the code compiling, but hopefully with the compiler plugin that modifier won't be needed. I haven't figured out how to suppress that error though yet lol */
    inline fun <T, R> List<T>.map(noinline transform: (T) -> R): ListInTheProcessOfBuilding<R> =
      { list, shouldReturnSize -> if (shouldReturnSize) size else mapTo(list!!, transform).let { 0 } }
    Or or or if you wanted to get really creative (playground):
    Basically, your intended syntax can be implemented with something like the following (playground):
    // Could be named much much better, but I'm trying to make this type explicitly visible here
    // The weird nullability, extra boolean param, and Int return type can be abstracted away
    // in a @JvmInline value class, which is part of what KT-44530 proposes.
    typealias ListInTheProcessOfBuilding<T> = (list: MutableList<T>?, shouldReturnSize: Boolean) -> Int
    typealias FinalListInTheProcessOfBuilding<T> = (list: MutableList<T>?, shouldReturnSize: Boolean, unit: Unit) -> Int
    inline val <T> ListInTheProcessOfBuilding<T>.size: Int get() = this(null, true)
    inline fun <T> ListInTheProcessOfBuilding<T>.addTo(list: MutableList<T>) { this(list, false) }
    inline val <T> FinalListInTheProcessOfBuilding<T>.size: Int get() = this(null, true, Unit)
    inline fun <T> FinalListInTheProcessOfBuilding<T>.addTo(list: MutableList<T>) { this(list, false, Unit) }
    inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: ListInTheProcessOfBuilding<T>): ListInTheProcessOfBuilding<T> =
      { list, shouldReturnSize ->
        this@plus.invoke(list, shouldReturnSize) + other.invoke(list, shouldReturnSize)
      }
    inline operator fun <T> ListInTheProcessOfBuilding<T>.plus(noinline other: FinalListInTheProcessOfBuilding<T>): List<T> = buildList(size + other.size) {
      this@plus.addTo(this)
      other.addTo(this)
    }
    fun main() {
      val listA = listOf("foo", "bar")
      val listB = listOf("alpha", "beta", "gamma")
      // Note that you lose non-local returns here because of noinline.
      val firstLetters: List<Char> = listA.map { it.first() } + listB.mapFinal { it.first() }
      val firstLettersInDifferentOrder: List<Char> = listB.map { it.first() } + listA.map { it.first() }.final
      println(firstLetters)
      println(firstLettersInDifferentOrder)
    }
    /* noinline is needed just to get the code compiling, but hopefully with the compiler plugin that modifier won't be needed. I haven't figured out how to suppress that error though yet lol */
    inline fun <T, R> List<T>.map(noinline transform: (T) -> R): ListInTheProcessOfBuilding<R> =
      { list, shouldReturnSize -> if (shouldReturnSize) size else mapTo(list!!, transform).let { 0 } }
    inline fun <T, R> List<T>.mapFinal(noinline transform: (T) -> R): FinalListInTheProcessOfBuilding<R> = map(transform).final
    inline val <T> ListInTheProcessOfBuilding<T>.final: FinalListInTheProcessOfBuilding<T>
      get() = { list, shouldReturnSize, _ ->
        this(
          list,
          shouldReturnSize
        )
      }
    inline fun <T> ListInTheProcessOfBuilding<T>.toList() = buildList { this@toList(this, false) }
    Basically, this "missing optimisation" that I'm talking about is the fact that the Kotlin compiler doesn't inline lambdas returned by inlined lambdas and/or stored in local vals like in the above code. I've already filed a bug/proposal for this at KT-44530 (Go vote on it btw lol), but basically with this proposal the code above will run efficiently and all the lambdas will get inlined at the call site of the plus operator which should give you a nice and efficient series of calls that creates just one single ArrayList (in the buildList call). I'm currently working on a compiler plugin called kotlin-lambda-return-inliner that is intended to perform this optimisation and more. I'm probably going to add your usecase to the possible uses in the issue and as a sample/test in the compiler plugin because it is quite interesting and realistic. Also about the actual implementation itself of the ListInTheProcessOfBuilding the first one requires a
    toList
    call at the end, which IMHO is acceptable since it looks like what you would do to a Sequence to make it into a list for example and so it seems idiomatic. The second version immediately converts any 2 added map calls into a list, which works for simple usecases but is probably confusing for anyone who tries to use an odd number of map calls. The last one is what I would go for if
    toList
    was absolutely out of the question (because I strongly prefer the
    toList
    option) but if it was out of question then I'd go for this one because it still allows multiple calls to happen with only the last call collapsing the chain. Also initially I wanted to have 2 different plus operators based on the needed return type (so that you can either "request" a ListInTheProcessOfBuilding or a List with type inference) but after diving into it a bit it seemed impossible to do (although I'm sure that there probably is a way that I'm too stupid to come up with) and so if you do find a way to get that to work then please LMK
    Actually, I'm stupid because thankfully List is an interface and with my compiler plugin interface calls to inline classes would also be optimised and so something like this will work (playground) :
    @file:Suppress("OVERRIDE_BY_INLINE", "NOTHING_TO_INLINE")
    
    import java.util.*
    import java.util.function.Consumer
    import java.util.function.IntFunction
    import java.util.stream.Stream
    enum class ListInTheProcessOfBuildingCall {
      AddTo, Size, Materialize
    }
    
    inline class ListInTheProcessOfBuilding<T> @PublishedApi internal constructor(@PublishedApi internal val lambda: (list: MutableList<T>?, call: ListInTheProcessOfBuildingCall) -> Any?) :
      List<T> {
      override inline fun contains(element: T): Boolean = materialize.contains(element)
    
      override inline fun containsAll(elements: Collection<T>): Boolean = materialize.containsAll(elements)
    
      override inline fun get(index: Int): T = materialize.get(index)
    
      override inline fun indexOf(element: T): Int = materialize.indexOf(element)
    
      override inline fun isEmpty(): Boolean = materialize.isEmpty()
      override inline fun iterator(): Iterator<T> = materialize.iterator()
    
      override inline fun lastIndexOf(element: T): Int = materialize.lastIndexOf(element)
    
      override inline fun listIterator(): ListIterator<T> = materialize.listIterator()
    
      override inline fun listIterator(index: Int): ListIterator<T> = materialize.listIterator(index)
    
      override inline fun spliterator(): Spliterator<T> = materialize.spliterator()
      override inline fun subList(fromIndex: Int, toIndex: Int): List<T> = materialize.subList(fromIndex, toIndex)
    
      override inline fun forEach(action: Consumer<in T>?) = materialize.forEach(action)
    
      override inline fun parallelStream(): Stream<T> = materialize.parallelStream()
    
      override inline fun stream(): Stream<T> = materialize.stream()
    
      override inline fun toString(): String = "ListInTheProcessOfBuilding(materialize=$materialize)"
    
      override inline val size: Int get() = lambda(null, ListInTheProcessOfBuildingCall.Size) as Int
      inline val materialize: List<T> get() = lambda(null, ListInTheProcessOfBuildingCall.Materialize) as List<T>
      inline fun addTo(list: MutableList<T>) {
        lambda(list, ListInTheProcessOfBuildingCall.AddTo)
      }
    }
    
    inline fun <T> ListInTheProcessOfBuilding(
      size: Int,
      noinline add: (MutableList<T>) -> Unit
    ): ListInTheProcessOfBuilding<T> {
      // variables captured in a closure are basically the same as a var in a class
      var materializedValue: List<T>? = null
      return ListInTheProcessOfBuilding { list, call ->
        when (call) {
          ListInTheProcessOfBuildingCall.AddTo -> add(list!!)
          ListInTheProcessOfBuildingCall.Size -> size
          ListInTheProcessOfBuildingCall.Materialize -> materializedValue
            ?: buildList { add(this) }.also { materializedValue = it }
        }
      }
    }
    
    
    operator fun <T> ListInTheProcessOfBuilding<T>.plus(other: ListInTheProcessOfBuilding<T>): ListInTheProcessOfBuilding<T> =
      ListInTheProcessOfBuilding(size + other.size) {
        addTo(it)
        other.addTo(it)
      }
    operator fun <T> ListInTheProcessOfBuilding<T>.plus(other: List<T>): ListInTheProcessOfBuilding<T> =
      ListInTheProcessOfBuilding(size + other.size) {
        addTo(it)
        it.addAll(other)
      }
    
    
    fun main() {
      val listA = listOf("foo", "bar")
      val listB = listOf("alpha", "beta", "gamma")
      val listC = listOf("function", "property", "object", "class")
      val listD = listOf("test", "best", "nest")
      val listE = listOf('x', 'y')
      // Note that you lose non-local returns here because of noinline.
      val firstLetters: List<Char> = listA.map { it.first() } + listB.map { it.first() } + listC.map { it.first() } + listD.map { it.first() } + listE
      println(firstLetters)
    }
    
    /* noinline is needed just to get the code compiling, but hopefully with the compiler plugin that modifier won't be needed. I haven't figured out how to suppress that error though yet lol */
    inline fun <T, R> List<T>.map(noinline transform: (T) -> R): ListInTheProcessOfBuilding<R> = ListInTheProcessOfBuilding(size) { mapTo(it, transform) }
    so yeah personally I prefer this one actually because it works perfectly as a drop-in replacement for List and it looks neat on the call site (Also I'm not splitting my messages on purpose, Slack for some reason just does that automatically so sorry for that)
    Derek Peirce

    Derek Peirce

    1 year ago
    This looks like it would work (I'm guessing with replacing
    buildList
    with
    buildList(size)
    to get it right-sized) as long as the compiler's ability to inline were improved. However, it also relies on a final call to
    materialize
    that would be awkwardly wrapped around the output:
    (listA + listB + ... + listE).materialize
    . At that point, it becomes clear that this would have to be optimized at the level of the language, like how concatenating multiple strings together becomes a
    StringBuilder
    call. It's necessary so that
    "a" + "b" + "c"
    doesn't require a
    toString()
    call around it, the only way for the compiler to know when
    +
    is adding to the
    StringBuilder
    vs building the final string is a better representation in the language.
    y

    Youssef Shoaib [MOD]

    1 year ago
    @Derek Peirce There's no need to materialize to be called as my example clearly shows.
    ListInTheProcessOfBuilding
    extends List on its own, and materialize is just simply an implementation detail of
    ListInTheProcessOfBuilding
    that any consumer doesn't need to worry about since, again, any List-related method just operates on List as an interface (or in a lot of cases Collection which is a superinterface of List). Every single method in the List interface is delegated inside of
    ListInTheProcessOfBuilding
    to a
    materialize
    call followed by the intended method. I could provide a more detailed example if you want one where a normal listOf is replaced with a
    ListInTheProcessOfBuilding
    in normal code so that you can see that it works right away with no awkward calls to
    materialize
    or anything like that, but simply think of it as if
    StringBuilder
    was a subtype of String, which means that you won't need a
    toString
    call. (I didn't realize by the way that I missed up the buildList size thing so whoops nice catch). Again, I'm currently working on a compiler plugin to support this, and there's an issue already for this to be supported, and, tbh, from what I've been seeing so far while trying to implement this plugin, this isn't a complicated optimization to implement, and like the compiler itself already has more complicated things in it, and so yeah hopefully this'll be implemented soon enough
    Derek Peirce

    Derek Peirce

    1 year ago
    Ah, right, it is its own
    List
    . However, it still doesn't function quite like a naturally-built
    List
    , as it is instead a lazy
    List
    . If the list is only used within the current method, then a theoretically powerful enough inlining compiler could optimize that out, but if the list is passed into a non-inline method or returned from a non-inline method, then you're forced to keep the list-building components in a closure unless you manually call
    toList()
    . This incurs performance and garbage-collection penalties if laziness isn't desired.
    y

    Youssef Shoaib [MOD]

    1 year ago
    For the performance side of things laziness-wise, the moment that you use that list in any way, shape, sort, or form as an actual list it will materialize and so the laziness won't last that long. Garbage-collection wise yes it is problematic if you pass it to a regular function, but if you look at the stdlib for example literally all of the functions are inline, and some library authors choose to have most of their methods as inline. If this becomes an actual language feature/optimization, then I guarantee you that a lot of people will then mark all the functions they possibly can as inline if they take in some form of interface or any kind of object that can possibly be a magic lambda thingy.
    Derek Peirce

    Derek Peirce

    1 year ago
    Inlining isn't a magic wand, it comes with a cost. The larger a method is, the more expensive it is to inline it, and every inlined method called by an inline method contributes to that. Inlining just to optimize something that never should be lazy to make it truly not lazy would be a terrible trade-off in many cases. Overriding methods also can't be inlined at all, so any list passed up or down through such a boundary would have to become lazy.