# coroutines
r
We have a case of a coroutine that stops running in a Ktor server we would like help with debugging. This happens sporadically on some of our machines. The machine doesn’t seem to lack resources (CPU / memory).
class KafkaConsumer {

   private val scope = CoroutineScope(Dispatchers.IO)

   fun onMessage(message: KafkaMessage<K, V>?) {
      print("before launch") // this always shows
      scope.launch() {
        print("in coroutine") // this stops showing sometimes
        process() // Async operation that might take a second or two but also with some small runBlocking inside
      }
   }

}
Inside of `process` we make use again of the IO dispatcher to make an HTTP call. Are there debugging tools / techniques we could use? Would using a separate dispatcher inside `process` help with isolating the problem? Thanks
s
Is this close to the real code, or just a simplified example? Can you give some more information about what the code is supposed to do? In its current form, it appears as if it would immediately try to launch an infinite number of coroutines, which I would guess isn’t what you intend.
r
hey @Sam I edited the message to make the code clear. the context is we’re launching on every message consumed from kafka.
s
Okay, that makes more sense 👍. You said that `process` has a call to `runBlocking` inside? That could be an issue, especially if the code inside the `runBlocking` tries to dispatch things back to the original dispatcher. I would recommend trying to make `process` a `suspend` function, and avoid calling `runBlocking` inside code that might be running in a coroutine.
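A minimal sketch of that suggestion, assuming `process` only needs to hop onto the IO dispatcher for its HTTP call. `fetchStatus` is a hypothetical stand-in for that call and is not part of the original code:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical stand-in for the HTTP call made inside process().
suspend fun fetchStatus(): Int {
    delay(10) // simulates network latency without blocking a thread
    return 200
}

// process() as a suspend function: it switches dispatcher with withContext
// and suspends while waiting, instead of blocking a thread with runBlocking.
suspend fun process(): Int = withContext(Dispatchers.IO) {
    fetchStatus()
}

fun main() = runBlocking {
    // runBlocking is used only here, at the program entry point,
    // where no coroutine is running on the calling thread yet.
    println(process())
}
```

Because `process` suspends rather than blocks, the thread that runs the launched coroutine is free to do other work while the call is in flight.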
s
Does/can process() throw an exception? If so, one such process() can cancel the 'scope' from which it is launched, and from that moment on 'scope' can no longer launch any new jobs/coroutines.
To get around this, construct 'scope' with a SupervisorJob()
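The difference can be sketched as follows. `secondLaunchRuns` is a hypothetical helper written for this illustration, and the `CoroutineExceptionHandler` is only there to keep the uncaught exception from being printed as a stack trace:

```kotlin
import kotlinx.coroutines.CoroutineExceptionHandler
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Launches a failing coroutine, then a healthy one, in the given scope
// and reports whether the body of the healthy one ever ran.
fun secondLaunchRuns(scope: CoroutineScope): Boolean {
    var ran = false
    runBlocking {
        // Simulates a process() call that throws.
        scope.launch { throw IllegalStateException("process() failed") }.join()
        // Simulates the next Kafka message arriving afterwards.
        scope.launch { ran = true }.join()
    }
    return ran
}

fun main() {
    val quiet = CoroutineExceptionHandler { _, e -> println("uncaught: ${e.message}") }

    // Plain Job (what CoroutineScope(Dispatchers.IO) creates by default):
    // the failed child cancels the scope, so later launches never run.
    val plain = CoroutineScope(Dispatchers.IO + Job() + quiet)
    println("Job: second launch ran = ${secondLaunchRuns(plain)}")

    // SupervisorJob: a failed child does not cancel the scope.
    val supervised = CoroutineScope(Dispatchers.IO + SupervisorJob() + quiet)
    println("SupervisorJob: second launch ran = ${secondLaunchRuns(supervised)}")
}
```

With the plain `Job`, the second `launch` returns an already cancelled job and its body is skipped, which matches the symptom of "in coroutine" no longer being printed.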
p
What happens if you register a completion callback on each job and print the exception? Does it give more information about the underlying exception, or just the cancellation stack trace?
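One way to do this is `Job.invokeOnCompletion`, which receives the cause of completion. The sketch below is an illustration only: `describeCompletion` and `launchAndDescribe` are hypothetical helpers, not code from the thread.

```kotlin
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.CoroutineExceptionHandler
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Turns the cause passed to invokeOnCompletion into a readable summary.
fun describeCompletion(cause: Throwable?): String = when (cause) {
    null -> "completed normally"
    is CancellationException -> "cancelled: ${cause.message}"
    else -> "failed: ${cause::class.simpleName}: ${cause.message}"
}

// Launches the block in the scope and returns how the resulting job ended.
fun launchAndDescribe(scope: CoroutineScope, block: suspend () -> Unit): String {
    val outcome = CompletableDeferred<String>()
    val job = scope.launch { block() }
    // The handler runs once the job is complete (immediately if it already is).
    job.invokeOnCompletion { cause -> outcome.complete(describeCompletion(cause)) }
    return runBlocking { outcome.await() }
}

fun main() {
    // Swallows the uncaught exception so the demo output stays short.
    val quiet = CoroutineExceptionHandler { _, _ -> }
    val scope = CoroutineScope(Dispatchers.IO + Job() + quiet)

    // The first job fails with the real exception.
    println(launchAndDescribe(scope) { error("boom") })
    // The scope is now cancelled, so this job only reports a cancellation.
    println(launchAndDescribe(scope) { })
}
```

The first job reports the original exception, while jobs launched after the scope was cancelled only report a `CancellationException`, so logging every completion helps locate the failure that cancelled the scope.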
r
Thanks all for the replies. @streetsofboston it seems that indeed was the problem, an exception which canceled the scope. To be safe we also removed all runBlocking calls.