# coroutines
m
In a product of ours we have had many performance issues connected to the network latency between the application and the database server. There are many trivial SQL queries executed in sequence, and each query costs a network round trip. Critically, the database only permits a single query at a time on a given connection/transaction, so I can't hide the latency by running the queries in parallel. Instead of "real" parallel execution, I had an idea to develop a coroutine dispatcher that allows each coroutine to issue a query, but queues the queries up and sends them in a single batch (one call to Statement.execute() with multiple statements, or using executeBatch()). Each query is associated with the continuation that will process its results, so once the result sets come in I can dispatch them all through multiple resumeWith() calls. My problem is: how do I know when to stop waiting for more queries to be queued up and actually make the JDBC call? Can the coroutine dispatcher somehow detect when await() is called (or there is an implicit wait) on any of its queued continuations? Basically, I want to keep queuing queries up until the originating control flow enters a waiting state.
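A minimal sketch of the shape being described, with hypothetical names (`QueryBatcher`, `flush`): each caller suspends in `execute()` and its continuation is stored next to its SQL, and `flush()` makes the single JDBC round trip and resumes everyone. It uses executeBatch()/update counts for brevity (real result-set batching would go through Statement.execute() and getMoreResults()), assumes single-threaded confinement, and leaves open exactly the question asked above: who calls `flush()`, and when.

```kotlin
import java.sql.Connection
import kotlin.coroutines.Continuation
import kotlin.coroutines.resume
import kotlin.coroutines.suspendCoroutine

// Hypothetical sketch, not an existing API. Assumes all calls happen on one
// thread; real code would need synchronization and error handling.
class QueryBatcher(private val connection: Connection) {
    private data class Pending(val sql: String, val cont: Continuation<Int>)
    private val pending = mutableListOf<Pending>()

    // Callers suspend here; the continuation that will process the result
    // is queued alongside the query itself.
    suspend fun execute(sql: String): Int = suspendCoroutine { cont ->
        pending += Pending(sql, cont)
    }

    // One network round trip for everything queued so far.
    fun flush() {
        val batch = pending.toList().also { pending.clear() }
        connection.createStatement().use { stmt ->
            batch.forEach { stmt.addBatch(it.sql) }
            val counts = stmt.executeBatch() // one update count per statement, in order
            batch.forEachIndexed { i, p -> p.cont.resume(counts[i]) }
        }
    }
}
```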
a
The dispatcher is the wrong unit to try to manipulate; you can do what you're describing through your suspending query function
m
So the suspending query function... queues up the query, right? But then remains the question, how to know when to send the batch?
Do I yield and then signal the shared queue that it should execute the batch?
a
how does a caller signal to you that they've enqueued the full batch?
and does it matter?
how long are you willing to wait to form a batch vs. sending queries serially?
m
I'm imagining that I have a block of code (a scope in a method) that fires up multiple coroutines. Eventually execution enters a point when all coroutines that were started, including the top coroutine, are all suspended, all waiting for some query to finish. That's when I want to fire off the batch and subsequently release all the waiting continuations when the results come back. I'm willing to wait as long as any coroutine is still running and able to queue up more queries, but only within this particular scope of execution (say, a REST query being serviced).
There will likely be internally dependent queries so once you resume the initial continuations there will be a bunch of new ones being queued up.
a
the coroutines machinery works in layers where lower layers by design don't have knowledge of the semantics of layers above. Dispatchers/ContinuationInterceptors only know how to modify the way that continuations resume; they know nothing about why a coroutine suspended or when/why one will resume. Jobs sort of know about vague structural dependencies between coroutines but again, no concept of why, and there aren't useful intermediate non-terminal states that you can use to represent, "I am still running but I am waiting for a very specific kind of result"
from either of those layers you can't know whether something is suspended waiting for a query vs. suspended waiting on a `delay` or similar vs. suspended waiting on some aggregated result of several queries running in different coroutines
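To make the layering point concrete, this is roughly all a ContinuationInterceptor ever sees (a minimal sketch; the logging is illustrative): it can wrap how continuations resume, but there is no callback for "a coroutine just suspended, and here's why".

```kotlin
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.Continuation
import kotlin.coroutines.ContinuationInterceptor

class LoggingInterceptor : AbstractCoroutineContextElement(ContinuationInterceptor),
    ContinuationInterceptor {
    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> =
        object : Continuation<T> {
            override val context get() = continuation.context
            override fun resumeWith(result: Result<T>) {
                // The interceptor only observes resumption. Whether the coroutine
                // had been waiting on a query, a delay, or an await is invisible.
                println("resuming with $result")
                continuation.resumeWith(result)
            }
        }
}
```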
m
Ok. So you don't see any way of using a coroutine based abstraction to hide the details of the batching needed to remove latency then?
a
I didn't say that, I said that you don't have enough info to be able to do it from a dispatcher 🙂
or from other notions of local "idleness" because they suffer from the same limitations
so either you need to accept that batching is going to be its own thing that determines when a batch is ready to go on its own without trying to monitor idleness of related operations, or you'll need to have the client give an explicit signal of some sort. That explicit signal might come from something like a dsl-scoped block of code reaching the end, but it's still "explicit" from the standpoint of client layering
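A sketch of that second option, under assumed names (`batchScope`/`BatchScope`, not an existing API): the block only enqueues queries as Deferreds, and reaching the end of the block is the caller's explicit signal that the batch is complete. The block is deliberately non-suspending, so a result can't be awaited before the batch has been flushed.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.Deferred

class BatchScope {
    private val pending = mutableListOf<Pair<String, CompletableDeferred<Int>>>()

    // Enqueue only; the result is not available until the block ends.
    fun query(sql: String): Deferred<Int> =
        CompletableDeferred<Int>().also { pending += sql to it }

    internal fun flush(executeBatch: (List<String>) -> IntArray) {
        val batch = pending.toList().also { pending.clear() }
        val counts = executeBatch(batch.map { it.first }) // one round trip
        batch.forEachIndexed { i, (_, d) -> d.complete(counts[i]) }
    }
}

// Reaching the end of the block is the explicit "batch complete" signal.
fun <T> batchScope(executeBatch: (List<String>) -> IntArray, block: BatchScope.() -> T): T {
    val scope = BatchScope()
    val result = scope.block() // non-suspending: nothing can await mid-batch
    scope.flush(executeBatch)
    return result
}
```

Dependent queries would then need a second `batchScope` round, which is the multi-level problem raised next.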
m
Yeah the problem is that it's hard to explicitly say when it's time to run the batch since there are multiple levels in the call chain and each level can have queries that are not dependent on the other's results (and can hence run in the same batch). The model I have in my head is that of a build system such as GNU Make: I feed it a dependency graph, and from that it implicitly figures out what it can run in parallel (i.e. in a batch) and what needs to wait because it depends on results from the previous tasks. And I was hoping that the coroutine machinery would have access to such a dependency tree for jobs based on which coroutine waits on what coroutine. But I guess I could make a DSL with lambda functions for something more akin to a build system instead.
a
it has no such dependency tree and really, it probably shouldn't. Relying on such a thing is always going to be fragile since it's so easy to construct a scenario where 3rd party code can suspend in such a way that there's a semantic dependency that isn't represented structurally within the system. Expanding the structure to be able to model all possible use cases would make the whole system unwieldy and possibly perform badly
m
I see. Thanks for taking the time to explain.
a
I think that as you work through this you might find that creating a precise dependency tracking setup doesn't perform any better than accumulating queries from a channel and then sending the whole batch after some short time delay
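For reference, a sketch of that channel-based approach; the `Request` type, the 5 ms window, and the `executeBatch` parameter are arbitrary assumptions. The channel should be created with `Channel(Channel.UNLIMITED)` so senders never suspend during the window.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

data class Request(val sql: String, val reply: CompletableDeferred<Int>)

fun CoroutineScope.timedBatcher(
    requests: Channel<Request>,               // create with Channel(Channel.UNLIMITED)
    executeBatch: (List<String>) -> IntArray,
) = launch {
    while (true) {
        val batch = mutableListOf(requests.receive()) // wait for the first query
        delay(5)                                      // short window for more to arrive
        while (true) {                                // drain whatever accumulated
            batch += requests.tryReceive().getOrNull() ?: break
        }
        val counts = executeBatch(batch.map { it.sql }) // one round trip
        batch.forEachIndexed { i, r -> r.reply.complete(counts[i]) }
    }
}
```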
m
Perhaps. Or I could issue the first query right away and while waiting for the results of that I keep queuing up subsequent queries.
👍 3
a
yeah, that's another idea too
it may even end up performing worse depending on how subsequent/otherwise unrelated queries get stacked up
play with it and profile
j
@Mattias Flodin The last thing you suggested is what is sometimes referred to as "natural batching" - not time-based, not size-based. When the "actor" (the DB) is ready, it takes all available elements from the queue and that's your new batch. While the actor is working on a batch, all new queries are enqueued, waiting for it to be ready. That might work well enough for you
👍 3
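A sketch of that natural batching loop, reusing the same hypothetical `Request` shape as above: no timer and no size limit. The actor suspends only when nothing is queued, drains everything that is, and queries arriving mid-batch become the next batch.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch

// Same shape as the earlier sketch.
data class Request(val sql: String, val reply: CompletableDeferred<Int>)

fun CoroutineScope.naturalBatcher(
    requests: Channel<Request>,               // create with Channel(Channel.UNLIMITED)
    executeBatch: (List<String>) -> IntArray,
) = launch {
    for (first in requests) {                 // suspend only when nothing is queued
        val batch = mutableListOf(first)
        while (true) {                        // take everything already waiting
            batch += requests.tryReceive().getOrNull() ?: break
        }
        // While this round trip is in flight, new queries pile up in the
        // channel and become the next batch -- no timer needed.
        val counts = executeBatch(batch.map { it.sql })
        batch.forEachIndexed { i, r -> r.reply.complete(counts[i]) }
    }
}
```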
m
Ah yes, I've had to deal with that issue with SharedFlow previously and solved it in a pretty roundabout way by keeping a separate event history that is checked at the same time the flow is polled for an event. Probably not a good solution for the general case.
m
Hey there, I just found this thread as I’m trying to solve basically the same problem. I’m trying to build Facebook’s DataLoader in Kotlin using coroutines. DataLoader is made for Node.js and basically uses the end of the current event loop cycle to dispatch a batch of “load” events. The goal is to use it in a GraphQL project to batch database queries that are executed in parallel when resolving fields. Since coroutines aren’t event loop-based, this poses quite a challenge. The only idea I have so far that will likely work is to dispatch a batch after a certain delay, plus optionally a manual trigger. That certainly adds a performance penalty of at least 1 ms per batch – maybe less if I implement an alternative to `delay` that supports sub-millisecond delays. @Mattias Flodin basically described what would be my ideal scenario too:
> Eventually execution enters a point when all coroutines that were started, including the top coroutine, are all suspended, all waiting for some query to finish. That’s when I want to fire off the batch and subsequently release all the waiting continuations when the results come back.
Calling `DataLoader.load(…)` makes it explicit that we’re waiting for something, and once a batch is dispatched all loads are combined into one query. @Adam Powell I don’t fully understand your point about how layering makes this impractical or impossible. What exactly is a layer? Are lower layers opaque to higher layers? Is there a good source to read about the architecture? Is it not possible to wait for all execution within that scope to be suspended, then execute some logic (dispatch a batch, which adds another suspended execution), and only then allow all executions to resume again? It doesn’t matter if a suspension is a query, a delay, or anything else. “Wait for all execution within that scope to be suspended” would just be the equivalent of Node.js’ end of the current event loop cycle, and allowing the execution to resume the equivalent of resuming the event loop.
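A rough sketch along those lines (the names and the 1 ms figure come from this message; this is not Facebook's actual DataLoader API, and it assumes a single-threaded dispatcher, so no locking): loads accumulate, a scheduled delay stands in for Node's end-of-tick, and `dispatch()` doubles as the optional manual trigger.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

class DataLoader<K, V>(
    private val scope: CoroutineScope,
    private val batchLoad: suspend (List<K>) -> List<V>,
) {
    private val pending = mutableListOf<Pair<K, CompletableDeferred<V>>>()
    private var scheduled: Job? = null

    suspend fun load(key: K): V {
        val deferred = CompletableDeferred<V>()
        pending += key to deferred
        // Stand-in for "end of the current event loop cycle": dispatch soon,
        // unless a dispatch is already scheduled.
        if (scheduled == null) scheduled = scope.launch {
            delay(1) // the ~1 ms penalty discussed above
            scheduled = null
            dispatch()
        }
        return deferred.await()
    }

    // Also callable directly as the optional manual trigger.
    suspend fun dispatch() {
        scheduled?.cancel() // a manual call supersedes the pending timer
        scheduled = null
        if (pending.isEmpty()) return
        val batch = pending.toList().also { pending.clear() }
        val values = batchLoad(batch.map { it.first }) // one combined query
        batch.forEachIndexed { i, (_, d) -> d.complete(values[i]) }
    }
}
```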