# coroutines
Helio
Hello guys, 👋🏽 I was wondering if any of you could shed some light on the snippet I'm sharing here. We have an application that uses the Ktor HTTP client with the CIO engine. The workflow I'm sharing makes a lot of requests asynchronously to another service; in production this can reach ~12k requests per minute. However, we've observed that the longer the application stays alive (e.g. after ~10 days), the more latency increases. We've taken a `jmap` and analysed the heap of the application, and we can see a lot of Dispatchers in `runnable` mode and just a few in `waiting`. When we compare this with a fresh deployment of the application, we see the opposite: a lot of Dispatchers in `waiting` mode and usually just a few in `runnable` mode. It is also worth mentioning that we see a high number of `thread tid`. We suspect that latency increases because, when the requests are fired asynchronously, only a handful of dispatchers are available, for example 8 out of 160. Is there anything obvious you think we might be missing in our workflow? Any help is greatly appreciated. Note: the snippet is not executable; it is shared only to illustrate the flow.
Mitchell
hey @Helio I can't see anything that stands out in the code. There might be a resource leak somewhere in the workflow: usages of `runCatching` (which also catches `CancellationException`, swallowing cancellation by mistake) or launching coroutines that never complete can cause that. If I were you, I'd try scheduling all the tasks in a dedicated `CoroutineScope`, capturing its children, and regularly emitting a gauge metric for how many children are in the active / completed / cancelled state. The theory is that there should be a healthy number of jobs in the active bucket, and it should not grow indefinitely.
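A minimal sketch of that monitoring idea, assuming kotlinx.coroutines on the JVM. `TrackedScope`, `launchTracked`, `Snapshot` and `demo` are hypothetical names, not library APIs. One detail shapes the design: finished children are detached from their parent, so a parent's `children` sequence only shows unfinished jobs. Completed, cancelled and failed counts are therefore tallied with `invokeOnCompletion` instead of being read from `children`.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper (not a kotlinx.coroutines API): a scope that counts how its children end.
class TrackedScope(dispatcher: CoroutineDispatcher = Dispatchers.IO) {
    data class Snapshot(val active: Int, val started: Long, val completed: Long, val cancelled: Long, val failed: Long)

    private val parent = SupervisorJob() // a failing child does not cancel its siblings
    // Failures are counted below; this no-op handler only stops them being reported as uncaught.
    private val scope = CoroutineScope(parent + dispatcher + CoroutineExceptionHandler { _, _ -> })

    private val started = AtomicLong()
    private val completed = AtomicLong()
    private val cancelled = AtomicLong()
    private val failed = AtomicLong()

    fun launchTracked(block: suspend CoroutineScope.() -> Unit): Job {
        started.incrementAndGet()
        val child = scope.launch(block = block)
        // If the child already finished, the handler runs immediately, so no completion is missed.
        child.invokeOnCompletion { cause ->
            when (cause) {
                null -> completed.incrementAndGet()
                is CancellationException -> cancelled.incrementAndGet()
                else -> failed.incrementAndGet()
            }
        }
        return child
    }

    // Finished children are detached from the parent, so this counts only jobs still running.
    fun snapshot() = Snapshot(
        active = parent.children.count { it.isActive },
        started = started.get(),
        completed = completed.get(),
        cancelled = cancelled.get(),
        failed = failed.get(),
    )

    fun shutdown() = parent.cancel()
}

// Demo: three jobs finish, one fails, one hangs until it is cancelled.
suspend fun demo(): Pair<TrackedScope.Snapshot, TrackedScope.Snapshot> {
    val tracked = TrackedScope()
    val normal = List(3) { tracked.launchTracked { delay(10) } }
    val failing = tracked.launchTracked { error("boom") }
    val hung = tracked.launchTracked { delay(Long.MAX_VALUE) } // simulated hanging job
    (normal + failing).joinAll()
    val beforeCancel = tracked.snapshot()
    hung.cancelAndJoin()
    val afterCancel = tracked.snapshot()
    tracked.shutdown()
    return beforeCancel to afterCancel
}

fun main() = runBlocking {
    val (before, after) = demo()
    println(before)
    println(after)
}
```

In a real service, `snapshot()` would be fed into whatever gauge your metrics library provides. A leak of hanging jobs shows up as `active` climbing steadily over days instead of hovering around a stable level.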
e.g. perhaps something like this happens:
```kotlin
import kotlinx.coroutines.*

suspend fun main() {
    coroutineScope {
        val foo = async { doStuff() }
        foo.await() // suspends until doStuff() finishes; a truly hung job never resolves
    }
}

suspend fun doStuff(): Unit {
    delay(100_000) // simulate a hanging job (100 s)
}
```
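The `runCatching` pitfall mentioned above can be sketched as follows, assuming kotlinx.coroutines. `runCatching` catches every `Throwable`, including the `CancellationException` that `delay` throws when its job is cancelled, so the code after it keeps running. The function names are hypothetical, and rethrowing `CancellationException` from `onFailure` is one common fix rather than the only one.

```kotlin
import kotlinx.coroutines.*

// Anti-pattern: runCatching also catches the CancellationException thrown by delay().
suspend fun swallowing(log: MutableList<String>) {
    runCatching { delay(1_000) }
    log += "swallowing: still running after cancel" // runs even though the job was cancelled
}

// Safer variant: rethrow CancellationException, handle any other failure.
suspend fun rethrowing(log: MutableList<String>) {
    runCatching { delay(1_000) }.onFailure { if (it is CancellationException) throw it }
    log += "rethrowing: still running after cancel" // never reached once cancelled
}

// Hypothetical test harness: start `block`, cancel it after 50 ms, wait for it to finish.
suspend fun cancelAfter50ms(block: suspend () -> Unit) = coroutineScope {
    val job = launch { block() }
    delay(50)
    job.cancelAndJoin()
}

fun main() = runBlocking {
    val log = mutableListOf<String>()
    cancelAfter50ms { swallowing(log) }
    cancelAfter50ms { rethrowing(log) }
    println(log) // only the swallowing variant adds an entry
}
```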
Helio
Oh, hello Mitchell! Haha, thanks so much for that. We'll continue the investigation to see if we can find anything. Appreciate your help!