# language-evolution
e
There is a very nice article about returning multiple values in Kotlin without any allocation whatsoever by Louis. Beyond Android, this is also quite an important concern in real-time 3D graphics. Since there are several fields where this would be happily welcomed, I think the best option would be to have support built into the Kotlin language itself. Is there any KEEP in this regard? Would other people also find this useful?
e
The Kotlin compiler already transparently wraps vars in a kotlin.jvm.internal.Ref as needed. I think using that to implement out-parameters would be more likely to get into the compiler than forcing a function to be inlined at every call site just for the sake of removing some allocations.
That is of course not free of allocations, but they're allocations that the Kotlin compiler will silently make under many other circumstances anyway.
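A minimal sketch of the pattern being discussed, with illustrative names (not taken from the article): "returning" two values by assigning to captured vars. Because readSignal below is not inline, the lambda is a real object and the captured vars are boxed into kotlin.jvm.internal.Ref.IntRef instances; marking it inline would remove both allocations.
```kotlin
// Illustrative sketch: "returning" two values by assigning to captured vars.
// Since readSignal is not inline, the lambda allocates and the captured vars
// signal/levels are boxed into kotlin.jvm.internal.Ref.IntRef behind the scenes.
fun readSignal(block: (signal: Int, levels: Int) -> Unit) {
    block(3, 5) // stand-in values for an actual measurement
}

fun main() {
    var signal = 0
    var levels = 0
    readSignal { s, l ->
        signal = s
        levels = l
    }
    println("signal=$signal levels=$levels")
}
```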
TBH in @louiscad's original article, I feel that the better solution to that particular issue would have been something like
```kotlin
@JvmInline value class SignalLevel private constructor(private val value: ULong) {
    constructor(signal: Int, levels: Int) : this(signal.toULong() shl 32 or levels.toUInt().toULong())
    val signal: Int get() = (value shr 32).toInt()
    val levels: Int get() = value.toInt()
}
```
as the JVM's long is already a supported way to use two stack slots for a return value
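A quick usage sketch, assuming the SignalLevel class above: on the JVM, measure() should compile down to a method returning a primitive long, so no object is allocated for the two values.
```kotlin
// Usage sketch, assuming the SignalLevel class above. On the JVM this should
// compile to a method returning a primitive long, so no allocation occurs.
fun measure(): SignalLevel = SignalLevel(signal = -42, levels = 5) // illustrative values

fun main() {
    val m = measure()
    println("signal=${m.signal} levels=${m.levels}") // signal=-42 levels=5
}
```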
l
I agree that the inline value class is a good solution. I didn't use it because I wasn't comfortable checking the correctness of the low-level operations for something so simple. It would be great to have inline value classes made of 2 Ints stored in a Long, and the same for sub-64-bit numbers, as long as the total doesn't exceed 64 bits.
e
I'd much prefer an option where I'm sure a priori that there will be no allocation, instead of relying on compiler optimizations, which may not work out for unknown reason(s).
e
It would be interesting to have built-in Int×2, Short×2/×4, Byte×4/×8 (and corresponding unsigned) value types. Float×2 could be doable as well, but I think the use case is narrow enough that it's unlikely to get into the stdlib, and implementing it in an external library should work just fine.
e
You keep thinking only about primitives; there are also classes and so on...
Maybe a symmetric syntax:
```
typealias Callback = (Int, Foo) -> (Long, Bar)
val callback = { int, foo ->
   ...
   -> long, bar
}

...
val (a, b) = callback(c, d)
```
e
there's simply not a good way to implement it for general classes on the JVM, only trade-offs
e
Well, with inline, the compiler may do the work for us.
e
A callback without inlining results in allocating both the lambda and kotlin.jvm.internal.Ref boxes for anything it mutates. A callback with inlining needs to be done with care to avoid code explosion. Using a temporary immutable data structure to return multiple values is almost always fine: it comes from the bump allocator, and if it's temporary and never stored as a reference anywhere, nursery collection is fast. It's the best trade-off for most general purposes.
c
but is point 2 really true? you can just return a pair and use destructuring.
✔️ 1
```kotlin
fun return2(): Pair<String, Int> = Pair("String", 10)

val (a, b) = return2()
```
e
Sure, but with an allocation, if the compiler can't eliminate it via escape analysis.
c
Yeah, sorry, I didn't read this whole thread before posting; I was just commenting on the blog post.
In a lot of cases this is just premature optimisation and not worth the lost readability. But there could be a Pair<Int, Int> as an inline value class; that should also work with destructuring.
✔️ 1
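A minimal sketch of such an IntPair (illustrative, not an existing stdlib type): both Ints packed into one Long, still destructurable via operator componentN functions.
```kotlin
// Sketch of the Pair<Int, Int>-as-value-class idea above; IntPair is illustrative,
// not an existing stdlib type. It packs both Ints into one Long and still supports
// destructuring via operator componentN functions.
@JvmInline
value class IntPair(private val packed: Long) {
    constructor(first: Int, second: Int) :
        this(first.toLong() shl 32 or (second.toLong() and 0xFFFF_FFFFL))

    operator fun component1(): Int = (packed shr 32).toInt()
    operator fun component2(): Int = packed.toInt()
}

fun minMax(a: Int, b: Int): IntPair = if (a <= b) IntPair(a, b) else IntPair(b, a)

fun main() {
    val (lo, hi) = minMax(7, 3)
    println("$lo..$hi") // prints 3..7
}
```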
e
We are talking about low-level libs.
l
Or hotspots (code called at high frequencies)
e
I'm tempted to file an issue on YouTrack; I'm curious to see if this might lead to something concrete.
l
Go ahead, I'd love to have @JvmInline capabilities expand. BTW, a Long can work for storage even for Float types.
e
Yeah, as long as they go through Float.toRawBits()/Float.fromBits().
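A sketch of that raw-bits technique (FloatPair is illustrative, not from any existing library): two Floats stored in one Long by packing their bit patterns, so on the JVM the pair travels as a primitive long.
```kotlin
// Sketch of the raw-bits technique mentioned above; FloatPair is illustrative,
// not from any existing library. Both Floats are packed into one Long.
@JvmInline
value class FloatPair private constructor(private val bits: Long) {
    constructor(first: Float, second: Float) :
        this(first.toRawBits().toLong() shl 32 or (second.toRawBits().toLong() and 0xFFFF_FFFFL))

    val first: Float get() = Float.fromBits((bits ushr 32).toInt())
    val second: Float get() = Float.fromBits(bits.toInt())
}

fun main() {
    val p = FloatPair(3.5f, -0.25f)
    println("${p.first} ${p.second}") // 3.5 -0.25
}
```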
y
I commented something similar on the #feed post, but here it is again since it's useful here, too. The idea that Louis presented can be extended to instead returning lambdas from inline functions and having those lambdas inlined too. The idea probably seems confusing, so here's a code example:
```kotlin
fun main() {
    val (stuff, bonus) = getStuff()
}

inline fun getStuff(): ZeroCostPair<Stuff, Bonus> {
    val stuff = grabStuffFromCargoBikeBasket()
    val bonus = inspirationElixir()
    return ZeroCostPair(stuff, bonus)
}

// Implementation details
typealias ZeroCostPair<F, S> = (PairCall, F?, S?) -> Any?

enum class PairCall {
    First,
    Second
}

// Mimicking a constructor for the type.
inline fun <F, S> ZeroCostPair(first: F, second: S): ZeroCostPair<F, S> =
    { call, _, _ ->
        when (call) {
            PairCall.First -> first
            PairCall.Second -> second
        }
    }

// Again, the parameters are useless, so just pass in null for them, since at runtime the JVM
// won't actually know what F and S are (they get erased).
// We can safely cast the result of invoking the function as F or S because we know that
// ZeroCostPairs created using the factory function always return an F when PairCall.First is
// passed in, and likewise an S for PairCall.Second.
inline val <F, S> ZeroCostPair<F, S>.first get() = this(PairCall.First, null, null) as F
inline val <F, S> ZeroCostPair<F, S>.second get() = this(PairCall.Second, null, null) as S

// Needed for the destructuring in main() above: destructuring declarations resolve to
// operator componentN functions rather than to `first`/`second`.
inline operator fun <F, S> ZeroCostPair<F, S>.component1(): F = first
inline operator fun <F, S> ZeroCostPair<F, S>.component2(): S = second

// Illustrative placeholders so the snippet is self-contained (not part of the original idea).
class Stuff
class Bonus
fun grabStuffFromCargoBikeBasket() = Stuff()
fun inspirationElixir() = Bonus()
```
And the idea is that the returned lambda would just be fully inlined at compile time. I currently have a ticket for this and a prototype compiler plugin. However, I realise that the issue of code explosion due to inlining still persists here, but I think there are some possibilities to prevent that. For example, users themselves can do what the above example does, splitting the function into two other non-inline ones (namely grabStuffFromCargoBikeBasket and inspirationElixir), so the issue of code explosion is almost eliminated. I think this could also be implemented at the compiler level, albeit with a compile-time performance hit, by some analysis of how much each part of a function affects the rest. Since in general programmers are shifting towards a functional style of programming, code like this is quite common:
```kotlin
inline fun foo(param1: A, param2: B): Pair<X, Y> {
    val firstThing = // some calculations
    val secondThing = // even more calculations
    // do stuff with firstThing and secondThing
    // derive x and y from firstThing and secondThing
    return x to y
}
```
and so in that case this function could be split into 5 parts: 2 parts for the first 2 calculations, one part for the processing in the middle, then 2 parts for the derivation of x and y. In a case like this, the compiler, noticing that foo is inline, could split foo into 5 different synthetic methods, and then whenever foo gets inlined it would only add the bytecode needed to call those 5 methods. It could even be done on a case-by-case basis by calculating what the bytecode impact of the splitting would be compared to just letting it be inlined normally. The compiler already does some pretty complex analysis, so it doesn't seem heretical to suggest this.
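A manual sketch of that splitting idea (the names and bodies below are illustrative stand-ins): keep the inline shell tiny and push the heavy work into non-inline helpers, so each call site only gains a few call instructions.
```kotlin
// Manual sketch of the splitting idea above; names and bodies are stand-ins.
fun firstCalculation(param1: Int): Int = param1 * 31          // "some calculations"
fun secondCalculation(param2: Int): Int = param2 + 7          // "even more calculations"
fun process(first: Int, second: Int): Int = first xor second  // "do stuff with both"

inline fun foo(param1: Int, param2: Int): Pair<Int, Int> {
    val firstThing = firstCalculation(param1)
    val secondThing = secondCalculation(param2)
    val combined = process(firstThing, secondThing)
    return firstThing + combined to secondThing - combined     // derive x and y
}

fun main() {
    val (x, y) = foo(2, 3)
    println("$x $y")
}
```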
e
For a moment, you truly got me super excited! Then I read that it needs to be implemented by Kotlin 😕
y
Well, the prototype currently works and produces optimized bytecode. I think it is also multiplatform right now, though I haven't verified that. I'll be polishing it up soon and adding usage instructions, but yeah, it should be as easy as applying one plugin in your build.gradle.kts.
e
I can't wait to try it out.
y
Meanwhile, like the issue (if you haven't already), and if you come up with any use cases over the next months, it would be helpful to show me examples where the plugin would help, because I try to, y'know, keep a lot of samples that I test against to make sure that everything works properly.
e
For Vulkan, I created a library (vkk) which is object oriented. In C you usually have to pass a pointer into which the newly created object will be written, while the return value is a VkResult. With vkk you instead get the object directly as the return value, which is much more useful and common, and if you want to inspect the result, you can use the inlined lambda at the end and put your logic about it there. With your plugin, I'd be able to return the object and the result directly:
val (obj, res) = vk..
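A hypothetical sketch of the pattern described above; the types and function below are illustrative stand-ins, not vkk's actual API.
```kotlin
// Hypothetical sketch of the pattern described above; these are stand-ins,
// not vkk's actual API.
enum class VkResult { SUCCESS, ERROR_OUT_OF_DEVICE_MEMORY }
class Buffer

// Today: the created object is the return value, and the result code is exposed
// to an optional inlined lambda at the end.
inline fun createBuffer(onResult: (VkResult) -> Unit = {}): Buffer {
    val result = VkResult.SUCCESS // stand-in for the native call's result code
    onResult(result)
    return Buffer()
}

fun main() {
    val buffer = createBuffer { result -> check(result == VkResult.SUCCESS) }
    println(buffer)
    // With zero-cost multi-value returns, both could come back at once, e.g.:
    // val (buffer2, result) = createBuffer()
}
```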
e
For what it's worth, I had some fun building a code generator which produces every U?(Byte|Short|Int|Float)(Pair|Quad|Oct) that can be packed into an (Int|Long), all tested, including checking that they all get returned as primitives on the JVM.
e
@Youssef Shoaib [MOD] could you please keep me/us updated on your progress? That's extremely interesting.
l
Watching the repo releases could work, no?
e
yeah, but quick updates directly from the author are much easier to get
l
Depends on the scale though 🙃
🙂 1
j

https://www.youtube.com/watch?v=OFgxAFdxYAQ

At about minute 35 he finishes his asm examples and summarizes that no matter what you do, your performance comes down to counting the cache misses among a stack of ~10 concurrent speculative branches. However you want to fold your two values together for a return is a single factor in the overall locality of reference of everything involved, hard stop. Indirection pushes your program toward the "spill" threshold of too much inlining and loops too fat to stay in cache. Cliff Click designed the C2 HotSpot compiler, FWIW.
👀 1
c
Thanks, that's a great link!
👍 1
e
I finally found some time to watch it. Let's say I already had the feeling, but that was very enlightening. Thanks, Jim.
👍 1
A second thought about this: it'd be cool if inline on destructuring declarations automatically led to this kind of code, i.e.:
```kotlin
val (theStuff, theBonus) = getStuff()
```