# compose
q
I'm experimenting with a use of the compose runtime. For a quick viability check, I whipped up a test scenario similar to the https://github.com/krausest/js-framework-benchmark tests. If I try to add 10,000 items (with each item being a tree of 8 compose nodes) the compose runtime spends a very long time in Compose:applyChanges, around 57 seconds (see attached flame graph). Trying with only 1,000 items the runtime is much reduced, at less than a second, leading me to believe I might be running into a performance cliff of the slot table implementation.
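For context, a minimal stdlib-only sketch of what such a benchmark dataset might look like; `Item` and `buildData` are illustrative names, not from the original code, mirroring the way js-framework-benchmark builds its rows:

```kotlin
// Hypothetical model of one benchmark row (names are mine, not from the
// original code). The real test renders each item as a tree of 8 compose nodes.
data class Item(val id: Int, var label: String)

// Build the flat list of rows the iteration loop below walks over.
fun buildData(count: Int): List<Item> =
    (1..count).map { Item(id = it, label = "row $it") }

fun main() {
    val data = buildData(10_000)
    println("built ${data.size} items, first = ${data.first()}")
}
```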
I've found a possibly related bug report here: https://issuetracker.google.com/issues/246751843
j
Super interested in this. We're running Compose with a custom tree in a JS VM for work. One of the things I've done around the Compose runtime is switch to JS-native collections. The Kotlin collections, and collections written in Kotlin around things like Kotlin's Array (even though it bottoms out in a JS array), are pretty slow.
q
Very interesting. When I change my main iteration loop from
```kotlin
for (item in data) {
    key(item.id) {
        Row(item.id == selected, item, { selected = item.id }, {
            data.removeAt(data.indexOfFirst { it.id == item.id })
        })
    }
}
```
to a chunked one, like this
```kotlin
for (chunk in data.chunked(100)) {
    for (item in chunk) {
        key(item.id) {
            Row(item.id == selected, item, { selected = item.id }, {
                data.removeAt(data.indexOfFirst { it.id == item.id })
            })
        }
    }
}
```
the performance is vastly improved (from 80 seconds down to 2 seconds)
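As a quick sanity check that the two loops emit identical rows and only the grouping differs, `chunked(100)` wraps the same 10,000 items in 100 groups of 100 without reordering anything:

```kotlin
// Stdlib-only check: chunking changes the grouping, not the content.
fun main() {
    val data = (1..10_000).toList()
    val chunks = data.chunked(100)
    println("${chunks.size} chunks of ${chunks.first().size}")
    check(chunks.flatten() == data)  // same rows, same order
}
```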
s
interesting, I wouldn't expect these two to have a significant difference, as it results in the same number of rows in the end. Could you capture a trace of both?
also, looking at the trace in the original comment, it looks like a lot of the time is spent in a binary search over `ArrayList`, which might not be optimized for this in the JS implementation.
q
Here's a flamegraph for the fast version; while most of the time is still spent in `SlotWriter.moveFrom`, the whole applyChanges pass is now just 1200ms.
s
oh wait, it is the Java impl 😄
q
Also not sure if it was clear, but this behavior is observed running on the JVM, not with JS
s
my bad 🙂
yeah, this is surprising in general, i'd expect these two to be equivalent in perf. Most probably there's a different slot table structure generated by the compiler which makes it perform better.
q
Decompiling the class file, it seems that in the chunked version there is an extra startReplaceableGroup/endReplaceableGroup wrapping each chunk. Could it be that the case where a node has a huge number of children is not well optimized, leading to some n^2 or worse perf?
s
I think some node pointer tracking doesn't scale well in that case. Feel free to file a bug or comment with your use case on the bug you referenced above, and I'll share it internally.
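A toy cost model of that intuition (purely illustrative, not the actual slot table code): if per-child bookkeeping in a group with n children also has to touch the remaining siblings, total work grows quadratically in n, while chunking into groups of k bounds each group's work by roughly k²:

```kotlin
// Toy model, NOT the real SlotTable: assume handling each child costs as
// much as shifting all remaining siblings (like repeated removeAt(0) on
// an ArrayList of that size).
fun shiftCost(children: Int): Long {
    var cost = 0L
    for (i in 0 until children) cost += (children - i).toLong()
    return cost
}

fun main() {
    val flat = shiftCost(10_000)        // one group with 10,000 children
    val chunked = 100 * shiftCost(100)  // 100 groups of 100 children each
    println("flat=$flat chunked=$chunked")  // flat is ~99x the chunked cost
}
```

Under this (assumed) model, the 10,000-child group costs about 50M shift steps versus about 0.5M for the chunked layout, which is at least the right shape for an 80s-to-2s improvement.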
q
Will do. If I create even more nested loops (chunking by 1000/100/10) the performance is again improved, now spending most of the time in user code. Would it be helpful if I submitted the bug with sample code? If that's the case, I would also create a demo repository showing this behavior.
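The 1000/100/10 nesting described above can be sketched with stdlib calls only (`nestedChunks` is an illustrative name, not from the original code); each level caps how many direct children any single group has:

```kotlin
// Illustrative stdlib-only sketch of chunking by 1000/100/10: every group
// at every level ends up with at most 10 direct children.
fun nestedChunks(data: List<Int>): List<List<List<List<Int>>>> =
    data.chunked(1000).map { big ->
        big.chunked(100).map { mid -> mid.chunked(10) }
    }

fun main() {
    val data = (1..10_000).toList()
    val nested = nestedChunks(data)
    // 10 top-level groups, each holding 10 groups of 10 groups of 10 items
    println("${nested.size} x ${nested.first().size} x ${nested.first().first().size}")
    check(nested.flatten().flatten().flatten() == data)  // same rows, same order
}
```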
s
yeah, snippets or sample repo are definitely helpful
q