We have a case of a coroutine that stops running in a Ktor s… (kotlinlang #coroutines)


Ron Aharoni

12/13/2022, 8:53 AM

We have a case of a coroutine that stops running in a Ktor server, and we would like help debugging it. It happens sporadically on some of our machines, which don't seem to be short on resources (CPU / memory).


import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch

class KafkaConsumer<K, V> {

    // No Job in this context, so CoroutineScope() adds a plain Job() automatically
    private val scope = CoroutineScope(Dispatchers.IO)

    fun onMessage(message: KafkaMessage<K, V>?) {
        println("before launch") // this always shows
        scope.launch {
            println("in coroutine") // this stops showing sometimes
            process() // async operation that might take a second or two, but also with some small runBlocking inside
        }
    }

}

Inside process() we again use the IO dispatcher to make an HTTP call. Are there debugging tools or techniques we could use? Would using a separate dispatcher inside process() help isolate the problem? Thanks

Sam

12/13/2022, 9:31 AM

Is this close to the real code, or just a simplified example? Can you give some more information about what the code is supposed to do? In its current form, it appears as if it would immediately try to launch an infinite number of coroutines, which I would guess isn’t what you intend.

Ron Aharoni

12/13/2022, 10:36 AM

Hey @Sam, I edited the message to make the code clearer. The context is that we're launching a coroutine for every message consumed from Kafka.

Sam

12/13/2022, 10:40 AM

Okay, that makes more sense 👍. You said that process() has a call to runBlocking inside? That could be an issue, especially if the code inside the runBlocking tries to dispatch things back to the original dispatcher: runBlocking blocks its thread while it waits, so if enough IO threads end up blocked waiting for work queued on that same pool, the pool can starve. I would recommend making process() a suspend function, and avoiding runBlocking inside code that might be running in a coroutine.
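A minimal sketch of that suggestion, assuming the HTTP call inside process() is blocking. blockingHttpGet is a hypothetical stand-in for whatever client the real code uses. withContext(Dispatchers.IO) runs the blocking call on the IO pool and suspends the calling coroutine instead of blocking its thread, so no runBlocking is needed. If the real client already exposes suspend functions (Ktor's HttpClient does), the withContext wrapper can simply be dropped.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical blocking HTTP call standing in for the real client.
fun blockingHttpGet(url: String): String {
    Thread.sleep(100) // simulate network latency
    return "response from $url"
}

// process() as a suspend function, with no runBlocking inside:
// withContext suspends the caller while the blocking call runs on Dispatchers.IO.
suspend fun process(url: String): String = withContext(Dispatchers.IO) {
    blockingHttpGet(url)
}

fun main() = runBlocking {
    // runBlocking only at the program's entry point, never inside a coroutine.
    println(process("https://example.com")) // prints: response from https://example.com
}
```

Inside the consumer, the launch body then just calls the suspend function directly, for example scope.launch { process(url) }.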

streetsofboston

12/13/2022, 12:34 PM

Does/can process() throw an exception? If so, a single failing process() can cancel the 'scope' it was launched from, and from that moment on 'scope' can no longer run any new jobs/coroutines: launch still returns a Job, but it is cancelled before its body starts.
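A small reproduction of this failure mode, as a sketch: process() here is a hypothetical stand-in that throws on one message, and the CoroutineExceptionHandler is added only to stop the default handler from printing the stack trace. Otherwise the scope matches the question: CoroutineScope(Dispatchers.IO) gets a plain Job() by default, so the first failing child cancels that Job and every later launch is cancelled before its body runs.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections

// Hypothetical stand-in for the thread's process(): fails on the "bad" message.
fun process(message: String) {
    if (message == "bad") throw IllegalStateException("boom: $message")
}

// Launches one coroutine per message on the shared scope, like onMessage() does,
// and returns the messages whose coroutine body actually finished.
fun consume(scope: CoroutineScope, messages: List<String>): List<String> {
    val processed: MutableList<String> = Collections.synchronizedList(mutableListOf())
    runBlocking {
        for (m in messages) {
            scope.launch {
                process(m)
                processed += m
            }.join() // wait for each message so the demo is deterministic
        }
    }
    return processed.toList()
}

fun main() {
    val quiet = CoroutineExceptionHandler { _, _ -> } // only suppresses stack-trace printing
    // No Job in the context, so CoroutineScope() adds a plain Job(), as in the question.
    val scope = CoroutineScope(Dispatchers.IO + quiet)
    println(consume(scope, listOf("a", "bad", "b", "c"))) // prints: [a]
    println(scope.isActive) // prints: false (the failure cancelled the scope's Job)
}
```

Messages "b" and "c" are never processed even though "before launch"-style logging around the launch call would still run, which matches the symptom described in the question.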

streetsofboston

12/13/2022, 12:35 PM

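To get around this, construct 'scope' with a SupervisorJob(), e.g. CoroutineScope(SupervisorJob() + Dispatchers.IO), so that a failing child doesn't cancel the scope or its other children.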
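A sketch of that fix applied to a consumer like the one in the question. Assumptions: process() is a hypothetical stand-in that throws on one message, messages are plain strings rather than KafkaMessage<K, V>, and SupervisedConsumer and its failure-recording CoroutineExceptionHandler are illustrative additions. With SupervisorJob(), a failed child does not cancel the scope, so later messages keep being processed. A supervised child's exception is not propagated to the scope, so the handler is what makes failures visible.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections

// Hypothetical stand-in for process(): fails on the "bad" message.
fun process(message: String) {
    if (message == "bad") throw IllegalStateException("boom: $message")
}

class SupervisedConsumer {
    val processed: MutableList<String> = Collections.synchronizedList(mutableListOf())
    val failures: MutableList<String> = Collections.synchronizedList(mutableListOf())

    // SupervisorJob: one child's failure does not cancel the scope or its siblings.
    // The handler receives the uncaught exception of each failed child.
    private val scope = CoroutineScope(
        SupervisorJob() + Dispatchers.IO +
            CoroutineExceptionHandler { _, e -> failures += e.message ?: e.toString() }
    )

    fun onMessage(message: String): Job = scope.launch {
        process(message)
        processed += message
    }

    val isAlive: Boolean get() = scope.isActive
}

fun main() = runBlocking {
    val consumer = SupervisedConsumer()
    listOf("a", "bad", "b").map { consumer.onMessage(it) }.joinAll()
    println(consumer.processed.sorted()) // prints: [a, b]
    println(consumer.failures)           // prints: [boom: bad]
    println(consumer.isAlive)            // prints: true
}
```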

Pablichjenkov

12/13/2022, 2:16 PM

What happens if you register a completion callback on each job (job.invokeOnCompletion) and print the exception? Does that give more information about the underlying exception, or just the cancellation stack trace?
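One way to try this, as a sketch: register job.invokeOnCompletion on every launched coroutine and log the cause. The launchLogged helper, the message names and the failing block are hypothetical; the scope mirrors the question's CoroutineScope(Dispatchers.IO), plus a handler that only hides the default stack-trace output. The coroutine that actually failed reports its original exception, while coroutines launched after that report a CancellationException, which points at a dead scope rather than at process() itself.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections

val log: MutableList<String> = Collections.synchronizedList(mutableListOf())

// Hypothetical helper: launches a coroutine and records how it completed.
fun CoroutineScope.launchLogged(name: String, block: suspend () -> Unit): Job {
    val job = launch(CoroutineName(name)) { block() }
    // cause is null on success, a CancellationException if cancelled,
    // or the original exception if the coroutine failed.
    job.invokeOnCompletion { cause ->
        log += when (cause) {
            null -> "$name: completed"
            is CancellationException -> "$name: cancelled"
            else -> "$name: failed with $cause"
        }
    }
    return job
}

fun main() {
    val quiet = CoroutineExceptionHandler { _, _ -> } // hides the default stack trace only
    val scope = CoroutineScope(Dispatchers.IO + quiet) // plain Job(), as in the question
    runBlocking {
        scope.launchLogged("m1") { }.join()
        scope.launchLogged("m2") { error("boom") }.join()
        scope.launchLogged("m3") { }.join()
    }
    log.forEach(::println)
    // prints:
    // m1: completed
    // m2: failed with java.lang.IllegalStateException: boom
    // m3: cancelled
}
```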

Ron Aharoni

12/19/2022, 12:16 PM

Thanks all for the replies. @streetsofboston, it seems that was indeed the problem: an exception that canceled the scope. To be safe, we also removed all runBlocking calls.
