# coroutines
j
Hi! Is there a reason why `DefaultDispatcher-worker` threads would keep getting spawned without any limits? I've been debugging an Android app which is supposed to run in the background and periodically execute tasks that could be async (the async code uses coroutines). After running the app for a long time, it gets bloated with `DefaultDispatcher-worker` threads (I can see an increasing number of threads in the Profiler, and increasing ids when I print their names) and eventually crashes with an `OutOfMemoryError` (it happened once in a coroutine running on a `DefaultDispatcher-worker-739` thread; yes, 739). The coroutine handling in the app is quite poor, yes, and that's what I'm trying to fix, among other things; we mostly use the IO dispatcher. The only dependency we use that comes with a suspend interface, as far as I'm aware, is Ktor. There's also a place where we create a `newSingleThreadContext` and run a coroutine on that single thread. The single thread doesn't seem to be an issue, though: the context gets closed once it's no longer needed, and I don't see any zombie threads from that part of the code stacking up over time. So at this moment it looks like whatever is spawning `DefaultDispatcher-worker` threads is causing the problem. I thought, however, that both `Dispatchers.Default` and `Dispatchers.IO` are built on a limited thread pool, so I can't understand how hundreds of such threads could get spawned.
d
Your understanding is correct: by default, `Dispatchers.Default` has as many threads as there are CPU cores, and `Dispatchers.IO` has 64 threads, plus (at most) as many extra threads as `limitedParallelism` calls require. For example, this program uses at most 70 threads for `Dispatchers.IO`:
```kotlin
// 64 (default IO limit) + 2 + 4 = at most 70 threads in total.
val worker1 = Dispatchers.IO.limitedParallelism(2)
val worker2 = Dispatchers.IO.limitedParallelism(4)

suspend fun main() = coroutineScope {
  repeat(1000) {
    launch(worker1) { while (true) { } }
    launch(worker2) { while (true) { } }
    launch(Dispatchers.IO) { while (true) { } }
  }
}
```
There are caveats, though, as always.
• There are system properties that affect this. For example, `kotlinx.coroutines.io.parallelism` could be set to 100000, and then threads would spawn effectively unrestricted. I'd suggest running `git grep` for this property in the project.
• `limitedParallelism` could in theory be misused. This code has a bug:
```kotlin
// Intended to use at most two threads in total for all calls, but each
// invocation creates a *fresh* limited view of Dispatchers.IO, so the
// limits don't add up across calls.
fun CoroutineScope.runInATwoThreadedWorker(block: suspend () -> Unit) {
  launch(Dispatchers.IO.limitedParallelism(2)) {
    block()
  }
}
```
Calling `runInATwoThreadedWorker` repeatedly will create arbitrarily many threads, because every call gets its own two-thread allowance. So, I would also audit all `limitedParallelism` usages.
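A minimal sketch of the fixed pattern (the names here are illustrative, not from the thread): hoist the limited view into a single `val` so that every call shares the same two-thread budget.

```kotlin
import kotlinx.coroutines.*

// Create the limited view once; all launches through it share a single
// two-thread budget instead of each call getting a fresh one.
@OptIn(ExperimentalCoroutinesApi::class)
val twoThreadedWorker = Dispatchers.IO.limitedParallelism(2)

fun CoroutineScope.runInATwoThreadedWorker(block: suspend () -> Unit) {
    launch(twoThreadedWorker) { block() }
}
```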
j
Oh, I wasn't aware that `limitedParallelism` worked like this 🤔 Thanks for the tip, I've just learnt something new 😄 However, I haven't found any usages of it, nor of `kotlinx.coroutines.io.parallelism`, in the project. If these are the only ways to configure the dispatchers to different limits, it could mean that the issue may come from the dependencies, couldn't it?
d
> I wasn't aware that `limitedParallelism` worked like this 🤔

It's a special property of `Dispatchers.IO` called elasticity: https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines/-dispatchers/-i-o.html

> the issue may come from the dependencies

The buggy usage of `limitedParallelism` could happen in them, yes. I'd try logging the value of these system properties during execution:
• `kotlinx.coroutines.io.parallelism`
• `kotlinx.coroutines.scheduler.core.pool.size`
• `kotlinx.coroutines.scheduler.max.pool.size`
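A minimal way to read them (the function name is made up; `System.getProperty` returns `null` for properties no one has set):

```kotlin
// Sketch: collect the scheduler-related system properties so they can
// be logged at app startup. Unset properties come back as null.
fun coroutineSchedulerProperties(): Map<String, String?> =
    listOf(
        "kotlinx.coroutines.io.parallelism",
        "kotlinx.coroutines.scheduler.core.pool.size",
        "kotlinx.coroutines.scheduler.max.pool.size",
    ).associateWith { System.getProperty(it) }
```

For example, `coroutineSchedulerProperties().forEach { (k, v) -> println("$k = $v") }` at startup would make an unexpected override visible immediately.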
j
> I'd try logging the value of these system properties during execution

All three of them are `null`. Is that expected, or did I just look for the wrong system properties? 😅
d
If you checked the properties correctly, it just means no one set them, which is good, as it rules them out as culprits. So, I don't see any options except for someone leaking threads via `limitedParallelism`. In that case, I'd expect to see `kotlinx.coroutines.internal.LimitedDispatcher.Worker#run` in the stack traces of these threads.
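A quick way to inspect those stacks from inside the app (the helper name is made up; it relies only on the JDK's `Thread.getAllStackTraces`):

```kotlin
// Sketch: print the stack of every DefaultDispatcher-worker thread so
// frames like kotlinx.coroutines.internal.LimitedDispatcher$Worker.run
// become visible. Returns the matching thread names for convenience.
fun dumpDispatcherWorkers(): List<String> {
    val workers = Thread.getAllStackTraces()
        .filterKeys { it.name.startsWith("DefaultDispatcher-worker") }
    for ((thread, frames) in workers) {
        println(thread.name)
        frames.take(8).forEach { println("    at $it") }
    }
    return workers.keys.map { it.name }
}
```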
j
Ok, I'll focus more on the external dependencies we're using and try to look for clues in the stack traces. Thanks for your help, it's much appreciated!
d
Sure thing. Please let us know when you discover the cause of the problem.
c
I've been following this thread silently, and it was very interesting. Are the tips listed here available somewhere else for future reference? (blog posts, documentation…)
j
> Please let us know when you discover the cause of the problem.

I think it might have been Ktor setting `limitedParallelism` on its own after all. I did find `kotlinx.coroutines.internal.LimitedDispatcher$Worker.run` together with some `io.ktor` entries in our stack traces and noticed we used an older version of Ktor (2.0.2). After I updated it to the latest (2.3.5), I no longer saw the worrying amounts of `DefaultDispatcher-worker` threads.
If there wasn't a new version of Ktor, would setting the `kotlinx.coroutines.io.parallelism` property in our app overrule any unexpected use of `limitedParallelism` in the dependency? It looks like a really delicate API that might lead to quite erroneous situations, especially if used in dependencies that are out of the developer's control. It would be good to know if there is a way to have the last word in the end product and be able to set a hard limit to simply ignore any misuse in dependencies.
d
> If there wasn't a new version of Ktor, would setting the `kotlinx.coroutines.io.parallelism` property in our app overrule any unexpected use of `limitedParallelism` in the dependency?

It wouldn't. The whole point of `Dispatchers.IO` elasticity is to be able to do things like

```kotlin
val databaseConnection = Dispatchers.IO.limitedParallelism(200)
val fileReading = Dispatchers.IO.limitedParallelism(10)
```

and not have these different views fight with one another for threads. `limitedParallelism` is supposed to be used instead of manually created thread pools.
> It looks like it's a really delicate API that might lead to quite erroneous situations, especially if used in dependencies that are out of the developer's control.

The same can be said about any usage of thread pools.

> It would be good to know if there is a way to have the last word in the end product and be able to set a hard limit to simply ignore any misuse in dependencies.

The JVM itself doesn't allow this in general: no one can prevent a buggy dependency from spawning a thousand threads using `thread { }`, or from forgetting to close thread pools. Likewise, no one can prevent a buggy dependency from hogging all CPU time by flooding `Dispatchers.Default` with nonsense, and a buggy library could simply have a memory leak. Dependencies inevitably have many ways to cause resource starvation. I think adding a workaround for the specific issue of buggy `limitedParallelism` usage would add a lot of cognitive load around it, but wouldn't make a dent in this general class of bugs.
j
Ok, these are fair points indeed 👍