kotlinlang #coroutines

Helio

01/04/2024, 1:53 AM

Hello guys, 👋🏽 I was wondering if any of you could please shed some light on the snippet I'm sharing here. We have an application that uses the Ktor HTTP client with the CIO engine. The workflow I'm sharing makes a lot of asynchronous requests to another service; in production this might reach ~12k every minute. However, we've observed that the longer the application stays alive, e.g. after ~10 days, we start to see an increase in latency. We've taken a `jmap` and analysed the heap of the application, and we can see that we have a lot of Dispatchers in `runnable` mode and just a few in `waiting`. When we compare with a fresh deployment of the application, we see the opposite: a lot of Dispatchers in `waiting` mode and usually just a few in `runnable` mode. It is worth mentioning as well that we see a high number of `thread tid`. We suspect that the latency starts to increase because, when the requests are fired asynchronously, only a handful of dispatchers are available, for example 8 out of 160. Is there anything obvious you think we might be missing in our workflow? Any help is greatly appreciated. Note: the snippet is not executable; it is just shared as a way to understand the flow.

Untitled.cpp

mitch

01/08/2024, 1:33 PM

hey @Helio I can't see anything that stands out in the code.. there might be a resource leak somewhere in the workflow; usages of runCatching (that caught cancellation mistakenly) or launching coroutines that never finish can do that... If I were you, maybe I'd try scheduling all the tasks in a specific `CoroutineScope`, capturing the children, and regularly firing a gauge metric on how many children are in the active / completed / cancelled state..? the theory is there should be a healthy number of jobs in the active bucket and it should not grow indefinitely
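The gauge idea above could be sketched roughly as follows. This is only a minimal illustration, not the original workflow: `TrackedScope`, `launchTracked`, `snapshot` and `JobGauge` are hypothetical names, and the sketch assumes tasks are started with `launch` on a dedicated scope backed by a `SupervisorJob`. Finished jobs are counted through `invokeOnCompletion`, and the active count is read from the supervisor's `children`.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicLong

// Hypothetical snapshot type; a real service would export these values as gauge metrics.
data class JobGauge(val active: Int, val completed: Long, val cancelledOrFailed: Long)

// Hypothetical wrapper: every task goes through one dedicated scope so its children can be sampled.
class TrackedScope {
    private val supervisor = SupervisorJob()
    private val scope = CoroutineScope(supervisor + Dispatchers.Default)
    private val completed = AtomicLong()
    private val cancelledOrFailed = AtomicLong()

    fun launchTracked(block: suspend CoroutineScope.() -> Unit): Job {
        val job = scope.launch(block = block)
        // cause == null means normal completion; anything else is cancellation or failure.
        job.invokeOnCompletion { cause ->
            if (cause == null) completed.incrementAndGet() else cancelledOrFailed.incrementAndGet()
        }
        return job
    }

    // Counts the children of the supervisor job that are still active.
    fun snapshot(): JobGauge = JobGauge(
        active = supervisor.children.count { it.isActive },
        completed = completed.get(),
        cancelledOrFailed = cancelledOrFailed.get(),
    )
}

fun main() = runBlocking {
    val tracked = TrackedScope()
    val ok = tracked.launchTracked { delay(10) }
    val hung = tracked.launchTracked { delay(Long.MAX_VALUE) } // simulates a job that never finishes
    ok.join()
    println(tracked.snapshot()) // the hung job is still counted as active here
    hung.cancelAndJoin()
    println(tracked.snapshot())
}
```

If the active count sampled this way keeps climbing over days while the request rate stays flat, that would point towards jobs that never finish.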

mitch

01/08/2024, 1:38 PM

i.e. perhaps something like this happens

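import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay

suspend fun main() {
    coroutineScope {
        val foo = async { doStuff() }
        foo.await() // this does not resolve until doStuff() returns, so the scope stays open
    }
}

// Simulates a hanging job: the long delay stands in for a call that never completes.
suspend fun doStuff() {
    delay(100_000)
}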
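The other suspect mentioned earlier in the thread, a `runCatching` that catches cancellation by mistake, can be sketched in a similar way. The helper names `swallowing`, `cooperative` and `continuesAfterCancel` are made up for this illustration and are not taken from the shared snippet. The sketch cancels a coroutine while it is suspended and reports whether the code after the suspending call still ran.

```kotlin
import kotlinx.coroutines.*

// runCatching catches every Throwable, including the CancellationException thrown by delay.
suspend fun swallowing() {
    runCatching { delay(10_000) }
}

// Same call, but cancellation is rethrown so the coroutine stops cooperatively.
suspend fun cooperative() {
    runCatching { delay(10_000) }
        .onFailure { if (it is CancellationException) throw it }
}

// Hypothetical helper: cancels a coroutine mid-suspension and reports whether it kept running afterwards.
suspend fun continuesAfterCancel(work: suspend () -> Unit): Boolean = coroutineScope {
    var reached = false
    val job = launch {
        work()
        reached = true // only reached if the cancellation was swallowed
    }
    delay(50) // give the child time to start and suspend
    job.cancelAndJoin()
    reached
}

fun main() = runBlocking {
    println("swallowing keeps running: ${continuesAfterCancel { swallowing() }}")
    println("cooperative keeps running: ${continuesAfterCancel { cooperative() }}")
}
```

In the swallowing variant the cancelled coroutine carries on with whatever follows the call, which is one way work can outlive the request that started it.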

Helio

01/08/2024, 10:01 PM

Oh, hello Mitchell! haha Thanks so much for that... We will continue with the investigation to see if we can find anything. Appreciate your help
