# compose
g
So I thought that recomposition only needed to execute a tiny amount of work for a tiny change in the general case, but it appears that every recomposition that occurs causes the runtime to iterate through all the observations within a composition for some invalidation check in
CompositionImpl::addPendingInvalidationsLocked
(I don't have enough context to understand what this is actually doing).
private fun addPendingInvalidationsLocked(values: Set<Any>, forgetConditionalScopes: Boolean) {
        var invalidated: HashSet<RecomposeScopeImpl>? = null

        values.fastForEach { value ->
            if (value is RecomposeScopeImpl) {
                value.invalidateForResult(null)
            } else {
                invalidated =
                    invalidated.addPendingInvalidationsLocked(value, forgetConditionalScopes)
                derivedStates.forEachScopeOf(value) {
                    invalidated =
                        invalidated.addPendingInvalidationsLocked(it, forgetConditionalScopes)
                }
            }
        }

        if (forgetConditionalScopes && conditionallyInvalidatedScopes.isNotEmpty()) {
            observations.removeScopeIf { scope ->
                scope in conditionallyInvalidatedScopes || invalidated?.let { scope in it } == true
            }
            conditionallyInvalidatedScopes.clear()
            cleanUpDerivedStateObservations()
        } else {
            invalidated?.let {
                observations.removeScopeIf { scope -> scope in it }
                cleanUpDerivedStateObservations()
            }
        }
    }
What I can understand is that
observations.removeScopeIf { scope -> scope in it }
iterates through every observation, and observations seems to scale with the total number of state objects being read. In one application I have 60,000 states, and Kotlin/JS seems to slow to a halt iterating through this list. Before anyone asks: these states are not all being read in one function. There are maybe 4 states read per composable function, and only the one function that actually depends on the mutated state gets recomposed; all others do not rerun.

In my mind, it is bad to have a linear relationship between the number of TOTAL states observed and the time it takes to restart a single composable function. I would think the relationship would only need to be roughly logarithmic, given the need for an Applier to traverse the tree.

Does anyone know how I might get around this iteration?
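A toy model of the shape being described (this is not the actual androidx `ScopeMap`; the class and counter are made up for illustration): when scopes live on the *value* side of a state-to-scopes map, removing scopes by predicate has to visit every entry, so the cost tracks the total number of observed states rather than the size of the change.

```kotlin
// Toy model of an observation map: state object -> set of scopes reading it.
// Illustrative only; not the real androidx.compose.runtime ScopeMap.
class ToyObservations {
    private val map = HashMap<Any, MutableSet<String>>()
    var entriesScanned = 0
        private set

    fun recordRead(state: Any, scope: String) {
        map.getOrPut(state) { mutableSetOf() }.add(scope)
    }

    // Mirrors the shape of removeScopeIf: because scopes are stored as
    // values, we must visit every (state, scopes) entry to find matches.
    fun removeScopeIf(predicate: (String) -> Boolean) {
        val iter = map.iterator()
        while (iter.hasNext()) {
            val entry = iter.next()
            entriesScanned++
            entry.value.removeAll { scope -> predicate(scope) }
            if (entry.value.isEmpty()) iter.remove()
        }
    }
}

fun main() {
    val obs = ToyObservations()
    // 60,000 states, ~4 reads per scope, matching the numbers above.
    repeat(60_000) { i -> obs.recordRead("state$i", "scope${i / 4}") }
    // Invalidating a single scope still touches all 60,000 entries.
    obs.removeScopeIf { it == "scope0" }
    println(obs.entriesScanned) // 60000
}
```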
m
I'm not an expert on the Compose internals, but my understanding is that
addPendingInvalidationsLocked
is only called from
recordChanges
which is only called from a
Recomposer
instance in response to actual changed values in a snapshot. In other words, the origin of the
values: Set<Any>
if you trace it all the way back is the
changedSet
in:
Snapshot.registerApplyObserver { changedSet, snapshot -> }
So that would not be iterating through all of the
State<>
observations, only the ones that changed. So unless I read the code wrong, it should only be iterating over 60,000 states if all of those states were mutated since the last snapshot. And even then, the
Recomposer
has code to only keep track of changed state that's actually read in a composition somewhere. So I wonder if maybe you're looking in the wrong place for the performance issue. If I might ask... what led you to suspect this code?
💁‍♂️ Independently of that particular code.... you may already know this, but just in case: there are things that can cause a composition to recompose everything. For example, if you are not using immutable/stable state (for example, if you use
List<>
rather than
ImmutableList<>
or have data classes that the compiler can't infer as stable that you didn't annotate with
@Stable
or
@Immutable
). This is a good place to start for more guidance on stability in compose.
(It's an Android link, but it also applies to the other Compose runtimes)
💁‍♂️ If you're interested in learning more about how Compose tracks state changes under the hood, there is this excellent article on the Snapshot system. You can actually observe changes yourself (there's sample code in the article) if you want to check if you're mutating state more than you expect.
g
Yeah so if you look at their implementation of ScopeMap, and specifically the line where they call
removeScopeIf
. That function has to traverse all the entries of the scatter map because scopes are stored as values and not as keys, which is where the bottleneck I'm talking about is. The traversal is very fast, but still linear in the size of the scatter map times the scope set size. So the invalidations set is just the one or two states that get invalidated, but the observations ScopeMap stores ALL of the scopes, which for 60,000 states being read means a minimum of 60,000 entries in the map. They actually call functions on ScopeMap at a number of sites that require full iteration of the map, but this was the worst offender.

What led me to this function was Chrome and JVM profilers, then manual stepping with their debuggers and trials with toy example apps.

I assure you there are no odd mutations occurring; every mutable state read and write is managed by my runtime persistence manager. I can see exactly where every state is read and written. None of this code can work with compose state directly, it must read it through the persistence manager. (But my toy reproducers don't use the persistence manager and have the same result.) I can also see whether any of our composable functions are executing, and compose is doing what it is supposed to do: just the minimal amount of work needed for the nearest restartable function (which in my case is every function, by design and confirmed with the compiler's report). And during recomposition I can see that only the minimal amount of state for the function is read (which is correct).

I'm confident that there is a scaling issue with this function and all of the places that iterate the observations ScopeMap. After looking at it longer, I suspect it might have been an intentional decision for performance when the number of states read is smaller by a factor of 10 or so (which is probably more common). It might cost less to traverse the scatter map values than to keep a second map in sync as its inverse.
I tried to look for issues relating to this hypothetical consideration but didn't find anything tangible.
My interpretation is that this is a fundamental limitation under a single CompositionImpl as of now, sadly. What I'm unsure of is whether there should be multiple composition impls or something else that breaks this up into smaller sets
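One possible shape for the inverse map mentioned above (a sketch of the idea in plain Kotlin, not a patch against the real runtime; all names are made up): keep a scope-to-states index alongside state-to-scopes, so that removing a scope only touches the entries that scope actually appears in.

```kotlin
// Sketch of a bidirectional observation index (illustrative names only).
// The trade-off: extra memory plus bookkeeping on every recorded read,
// in exchange for scope removal that is O(states read by that scope)
// instead of a full scan of all observations.
class IndexedObservations {
    private val stateToScopes = HashMap<Any, MutableSet<String>>()
    private val scopeToStates = HashMap<String, MutableSet<Any>>()

    fun recordRead(state: Any, scope: String) {
        stateToScopes.getOrPut(state) { mutableSetOf() }.add(scope)
        scopeToStates.getOrPut(scope) { mutableSetOf() }.add(state)
    }

    fun removeScope(scope: String) {
        // Only visit the states this scope actually read.
        scopeToStates.remove(scope)?.forEach { state ->
            val scopes = stateToScopes[state] ?: return@forEach
            scopes.remove(scope)
            if (scopes.isEmpty()) stateToScopes.remove(state)
        }
    }

    fun scopesOf(state: Any): Set<String> = stateToScopes[state].orEmpty()
}
```

With this layout, removing one scope that read 4 states costs 4 lookups regardless of how many other observations exist; whether the per-read sync cost is worth it for typical compositions is exactly the benchmark question.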
m
Ah, I think I misunderstood: When you said "In one application I have 60,000 states", I had assumed they were (for example) in different screens of an app (edit: or scrolled off the screen part of the time) or something, and not all in the composition at the same time. I'm having trouble imagining what kind of UI state would have that many visible elements on the screen at once all depending on independent state. I can come up with some contrived examples like... a 400x400 minesweeper board where each cell stores mine/no-mine and covered/uncovered as separate states — but that's very contrived (on a typical monitor, each cell would be just 4 pixels wide, and even then, there's not really any reason not to have a single
State<>
for each cell, or even for the whole board). Not saying you're doing it wrong, I'm just really curious what kind of UI would involve that number of independent state objects active in a composition all at once.
g
I'm copying this from another thread in this Workspace:

> I am making something like Phoenix LiveView or HTMX, but purely in Kotlin. The JVM server owns all of the logic and the client is very, very dumb, knowing nothing but the abstract elements it can render. All content for the web page is incrementally communicated over a web socket. When a user interacts with the UI, the interaction is handled by the server and then a UI state diff is communicated to the client. The client does not "think" on its own. But when it comes to things like animations, canvas rendering, and interaction prediction (that last one fundamentally can't be done on the server), the locality of execution is important, for latency, bandwidth, or both. These remote functions are small things to run on the client, but still determined by the server. Like, I always want the width and length of some resizable element to be the same. That could be done by the server, but latency would make it perform poorly.

The framework works at the level of nodes and properties on those nodes. The server has the ability to tell the client that property x for node 2 is now value y. Each property for each node is a state.

The framework is actually more general than just Kotlin/JS and compose. It supports different component sets and pluggable renderers for those component sets on any Kotlin platform. But assume there is a generic HTML component set, and assume there is maybe one that is more specific to UI components that are needed for our application. If there are 5 properties defined for a base HTMLElement and you want a web page with 2500 elements (that is how many the Google search page on my other screen has), that is 12,500 states right there. For every dedicated property I add, that is 2500 more states. Node graph editors need nodes, edges, pins, and all of those might have content, locations, styles, and other metadata.
I know there are optimizations to be done on my side with compose, like compressing all the properties into a single state or delaying state creation until the first modification, but they all have drawbacks and it really shouldn't be needed. At first glance I think they just need an inverse map, but maybe there is more work needed. I'm just hoping I can figure out something quick so I don't have to wait for the Android team to maybe accept a change optimizing for large numbers of states
Also, sorry, I've had a bout of insomnia and haven't really slept much, in case I'm not making sense
And the use case here is highly interactive applications that need to run on a server
m
This is probably something that someone from JetBrains or Google would be better suited to comment on, but my general feeling is that... I'm not sure if this is really a good use-case for Compose. Normally, Compose just deals with what is visible at any given time, and it looks to me that a lot of the architecture is based on that assumption. For example...
@Composable
fun MyComposable( selectedTab: Int ) {
    when( selectedTab ) {
        1 -> Tab1()
        2 -> Tab2()
    }
}

@Composable
fun Tab1() {
    var foo by remember { mutableStateOf(1) }
}

@Composable
fun Tab2() {
    var bar by remember { mutableStateOf(2) }
}
...here, if
selectedTab == 1
the state
bar
isn't "in" the composition, so it doesn't count towards the number of observations.
g
Yeah, that doesn't get observed either in this case
On the client, that is
m
Not sure if we're talking about the same thing: I'm talking about
bar
not being in
observations
in
CompositionImpl
because
Tab2()
didn't get called at all.
g
Yeah, I'm agreeing with that
Compose is used in that same way on the server. If the server didn't run Tab2, it wouldn't emit nodes for it, and the client wouldn't know about it
And if it did emit nodes under Tab2, and the user then clicked on Tab1, the server tells the client to forget about the nodes emitted by Tab2 and tells it about the nodes emitted by Tab1
m
Hmm, I guess I just don't know enough about HTMX etc to understand the issue.... but for example... it seems to me that scrolling or window resizing would be really slow...
g
Well, in compose UI, it doesn't skip computing the nodes in a scroll section unless they are in a lazy scroll
So the server will send all of those nodes over
It doesn't render them, but it still computes the tree. And even lazy works the same way it does on clients. Functionally, the difference between Compose UI and what I am doing that causes more states to exist is that I derive the state remotely, which means I need reactive state to drive updates. When computing the state locally, you don't need to wrap it in state infrastructure; you just let compose handle the recomposition and node reuse and it figures it out, without using state most of the time
But that still sucks for me
m
Are you using the same kinds of Composable functions on the backend? Like...
LazyList
Box
Button
and so on?
g
No, they are custom
m
Hmm...
g
With a custom applier
On the frontend I am using compose web nodes, standard stuff, just the inputs to my Compose functions are my nodes (UI Structs), and every property is backed by a state
m
This is very interesting to me because I am also working with a custom
Applier
(our team is building a multimedia video compositing library with a composable API... so you can use the same
State<>
to drive both the Compose UI and to get live updates to the video preview based on the same state as you edit video). Unfortunately, this is getting a bit beyond the scope of my knowledge (not knowing your architecture and granularity of nodes, not having visibility into the decisions behind the Compose under-the-hood architecture, etc.), but... my gut feeling is that the answer to your question lies here:
when computing the state locally you don't need to wrap it in state infrastructure, you just let compose handle the recomp and node reuse and it just figures it out, without using state most of the time
I think if I paraphrase, it comes down to something like this:
• Compose seems to be optimized based on the assumption that State<> will exist only for things that need to change dynamically
• BUT... you're making State<> for everything, even stuff that never, ever changes

So I wonder if there's a general way to automate optimizations here. I'm really throwing out some crazy ideas at this point, but I wonder if it might be worth considering using the
Snapshot
API on the server side to capture the sets of state that can actually be read/mutated and somehow optimize so that the client only makes State<> objects for those. In other words... if you can use the data from a
Snapshot
read/write list, you could generate two separate trees: a static one (that never changes because there are never state reads on the server side) and a dynamic one that "overlays" nodes on the static tree, but only for nodes where the server observed a state read. (It seems that might allow for a lot of other optimizations in the future too...)

Of course, you'd still need to have some state where subtrees get added/removed in response to (e.g.) if/when statements wrapping calls to composables. I could be completely off-base here, and I have no idea (other than a super general feeling) how one would go about doing Snapshot observation within an Applier... I'm just throwing out some random, partially-formed ideas in case it helps.

In my case, I solved a similar issue by basically making a rule: in a video composition, anything that references the current playhead time can't be inside the composition. In other words, this is allowed:
Rotate(45f * userSliderValue) {
    Video("blah.mp4")
}
But this is not allowed:
Rotate(playheadTime * 45f * userSliderValue) {
    Video("blah.mp4")
}
That was done to avoid recomposition on every frame during playback. Instead, what I did was make versions of the composables that take lambdas:
Rotate({playheadTime * 45f * userSliderValue}) {
    Video("blah.mp4")
}
Those lambdas are evaluated at playback-time. I know when
playheadTime
is accessed (delegation gives me that without even using State) but
userSliderValue
could be a MutableState, so I need to know if it changes so that the video composer can re-render. So for that, I just call all of those lambdas inside snapshots. This decoupled the two different state-dependent things (responding to UI changes and responding to time changes) and essentially moved one of them to render-time rather than compose-time. That doesn't really apply to what you're doing, but I mention it as an example of where I was able to use the lower-level
Snapshot
API to optimize parts of my projects that weren't really suited to the regular Compose design philosophy. Maybe that will give you some ideas. Maybe not.
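The lambda trick described above can be sketched in plain Kotlin (a toy model of the idea only; `RotateNode`, `compose`, and `render` are made-up names, and real snapshot-based change tracking is omitted): time-dependent values are captured as functions at compose time and evaluated per frame at render time, so playback never re-runs composition-time code.

```kotlin
// Toy model: a node whose rotation is a function of render time,
// captured once at "compose" time, evaluated per frame at "render" time.
class RotateNode(val angle: (timeSec: Double) -> Float)

var composeCount = 0

// "Compose time": runs once per UI change, not once per frame.
fun compose(userSliderValue: Float): RotateNode {
    composeCount++
    return RotateNode({ t -> (t * 45f * userSliderValue).toFloat() })
}

// "Render time": evaluates the captured lambda for each frame.
fun render(node: RotateNode, timeSec: Double): Float = node.angle(timeSec)

fun main() {
    val node = compose(userSliderValue = 2f)
    // Play back 60 frames; composition never re-runs.
    val frames = (0 until 60).map { render(node, it / 60.0) }
    println(composeCount) // 1
    println(frames[30])   // angle at t = 0.5s
}
```

In the real library, a snapshot read observer around the lambda calls is what detects that `userSliderValue` (a `MutableState`) changed and that a re-render is needed; that part is not modeled here.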
> On the frontend I am using compose web nodes, standard stuff, just the inputs to my Compose functions are my nodes (UI Structs), and every property is backed by a state (edited)
I wonder if it's really necessary to back each property with a state, rather than just making the whole input a single state. After all, Compose does optimize recomposition based on function parameters too... Again, just throwing random ideas out in case something inspires you. What you're working on sounds like a cool project in any case. I'm very interested to hear if you find a solution.
(Okay, it's late here and I need to get to sleep— I'll check back in the morning)
Okay, one last thought, adding to what I said here:
So I wonder if there's a general way to automate optimizations here. I'm really throwing out some crazy ideas at this point, but I wonder if it might be worth considering using the
Snapshot
API on the server side to capture the sets of state that can actually be read/mutated and somehow optimize so that the client only makes State<> objects for those. In other words... if you can use the data from a
Snapshot
read/write list to generate two separate trees: A static one (that never changes because there are never state reads on the server side) and a dynamic one that "overlays" the nodes on the static tree, but only for nodes where the server observed a state read.
If you know in advance that a given set of attributes will be static (because the server knows) you could basically have your client Composable functions take an
interface
and then have two under-the-hood implementations of the interface: one that's immutable (and just uses a data class) and one that is mutable, where the fields are backed by state. If the server notices there are no state reads at that particular level, it could mark the node "static" (it would still be possible to remove the node entirely if it leaves the composition, of course), and in that case the client can see the "static" flag on the node and just use the data-class version that's not backed by state.
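That split could look roughly like this on the client (all names here are hypothetical, and a plain var stands in for a Compose state-backed field):

```kotlin
// Hypothetical client-side sketch: one interface, two implementations.
interface ElementProps {
    val width: Int
    val label: String
}

// Used when the server marked the node "static": a plain data class,
// with no observation or state infrastructure attached.
data class StaticProps(
    override val width: Int,
    override val label: String,
) : ElementProps

// Stand-in for the state-backed implementation; in real code these
// fields would delegate to MutableState so mutations invalidate readers.
class DynamicProps(width: Int, label: String) : ElementProps {
    override var width: Int = width
    override var label: String = label
}

// The client picks the implementation from the server's "static" flag.
fun propsFor(isStatic: Boolean, width: Int, label: String): ElementProps =
    if (isStatic) StaticProps(width, label) else DynamicProps(width, label)
```

The point of the split is that only nodes the server actually observed state reads for pay the per-property observation cost.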
Okay, I'm for real out for the evening now. Again, what you're working on sounds super interesting... hope you find a solution! Good luck!
g
Thank you, I'm processing all that you have said
m
Probably most of it isn't useful in your use-case, but maybe it will inspire a solution. 🙂 Hope at least a tiny bit is helpful. Good night 🙂
s
A bit late to the discussion here, but wanted to comment on the original question: The snippet above does indeed traverse through all observed states, simply because
observations
is a map from key to multiple values and traversing over all values was faster than having a reverse index
ScopeMap
in our benchmarks. I checked how big this map gets in some apps and it was around 1000 (in the worst case), so your case is not something we optimized for (not saying that we shouldn't). Overall, having too many states is slower for many other reasons; for example, reading states is not free because composition has to record each of them, and it's likely that you are slowing down your initial composition times significantly. In general, I recommend bundling fields into immutable classes with a state wrapper and relying on skipping to provide incremental updates instead. You might be overinvalidating a little bit, but that would be way more efficient than having a state per field.
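A toy illustration of that bundling advice (plain Kotlin; `NodeProps` and the observation log are made up, standing in for Compose's state reads): per-field states record one observation entry per field per node, while one immutable bundle behind a single state records one entry per node, with `copy()` providing immutable updates.

```kotlin
// Made-up immutable bundle of what were previously 5 separate states.
data class NodeProps(
    val x: Int, val y: Int,
    val width: Int, val height: Int,
    val label: String,
)

// Toy stand-in for the composition's observation tracking.
class ObservationLog {
    val entries = mutableListOf<String>()
    fun <T> read(name: String, value: T): T { entries.add(name); return value }
}

fun main() {
    val log = ObservationLog()

    // Per-field layout: reading one node records 5 observation entries.
    log.read("node1.x", 0); log.read("node1.y", 0)
    log.read("node1.width", 100); log.read("node1.height", 50)
    log.read("node1.label", "a")

    // Bundled layout: reading the same data records 1 entry, and an
    // update replaces the whole immutable value via copy().
    val props = log.read("node2.props", NodeProps(0, 0, 100, 50, "b"))
    val moved = props.copy(x = 10)

    println(log.entries.size) // 6: five per-field entries vs one bundled
    println(moved.x)          // 10
}
```

The over-invalidation mentioned above shows up because replacing the bundle invalidates every reader of the node, even ones that only cared about `label`; skipping on unchanged parameters then limits the downstream work.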
g
By the way, thank you both for taking the time to respond to me! And yeah... I figured that would be the recommended solution, and honestly in most cases there won't be over invalidation anyways. After yesterday, I had honestly settled to just substitute an implementation of
runtime
with my own, but I will try this. There are still some issues that can't be resolved by packing the properties at the node level (I can explain more if you are curious). (Also, isn't that similar to how live literals work under the hood?) My intuition tells me it'll require some significant work to automatically pack across nodes without potentially really over-invalidating.

If I did want to have a conversation about optimizing for bigger numbers of states in AOSP, do you know if there is a template or guide to starting that conversation?

I also believe I'll have to deal with the fact that the Recomposer relies on the global snapshot (sadly not a), and I believe that will create a bottleneck when trying to scale the number of connections on a single JVM instance (without relying on class loader shenanigans, which would introduce its own host of issues), but I haven't experimented yet and that's a different conversation.
s
If you want to report the perf issue, just file a bug against the Compiler & Runtime component in Compose. About the server side: the global snapshot is an issue there, but compose also keeps a number of global locks that will be really slow in high-contention environments. Feel free to report that with your use case as well