# coroutines
r
We’re seeing flaky tests in our CI system when using the Turbine library to test flows. The error is:
kotlinx.coroutines.test.UncompletedCoroutinesError: After waiting for 10s, the test coroutine is not completing, there were active child jobs: [UndispatchedCoroutine{Active}@337aeb49]
We enabled DebugProbes to track down the running coroutine, and it’s:
Coroutine "coroutine#10":StandaloneCoroutine{Cancelled}@475d618b, state: RUNNING (Last suspension stacktrace, not an actual stacktrace)
        at app.cash.turbine.ChannelKt$withWallclockTimeout$2$timeoutJob$1.invokeSuspend(channel.kt:114)
        at _COROUTINE._CREATION._(CoroutineDebugging.kt:34)
This points us to this line:
val timeoutJob = GlobalScope.launch(Dispatchers.Default) { delay(timeout) }
It seems like this job is scheduled and cancelled, but then keeps running. How can a coroutine be cancelled and running at the same time when it only executes a single suspending call? Any advice? Could this potentially be a bug in the implementation?
f
Cancellation is cooperative. If the suspending function you execute doesn't cooperate (terminate promptly when the coroutine is cancelled), then cancellation will be ineffective. For example:
suspend fun example() {
  while (true) {
    Thread.sleep(1000)
  }
}
This is not cancellable and will run forever, regardless of whether you cancel the coroutine. Also, even when the code is cooperative, cancellation is asynchronous: some time may pass between the moment you cancel and the moment the coroutine stops executing. Generally speaking, you should leverage structured concurrency and avoid GlobalScope.
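To make the difference concrete, here is a minimal sketch, assuming kotlinx.coroutines is on the classpath. The helpers `nonCooperative` and `cooperative` are hypothetical names made up for illustration; neither comes from Turbine or from the code in question. The loops are bounded so the program terminates.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: Thread.sleep blocks the thread and never checks
// for cancellation, so cancel() cannot interrupt this loop.
suspend fun nonCooperative(iterations: Int) {
    repeat(iterations) {
        Thread.sleep(100)
    }
}

// Hypothetical helper: delay() is a cancellable suspension point, so the
// loop ends with a CancellationException once the job is cancelled.
suspend fun cooperative(iterations: Int) {
    repeat(iterations) {
        delay(100)
    }
}

fun main(): Unit = runBlocking {
    val blocking = launch(Dispatchers.Default) { nonCooperative(5) }
    val suspending = launch(Dispatchers.Default) { cooperative(5) }

    delay(50)
    blocking.cancel()
    suspending.cancel()

    // A job is marked as cancelled as soon as cancel() is called, but it is
    // only completed once its body has actually stopped executing.
    println("blocking:   cancelled=${blocking.isCancelled} completed=${blocking.isCompleted}")
    println("suspending: cancelled=${suspending.isCancelled} completed=${suspending.isCompleted}")

    // The cooperative job finishes promptly; the blocking one only finishes
    // after it has slept through all of its remaining iterations.
    joinAll(blocking, suspending)
}
```

A job in the "cancelled but not yet completed" window is what a debugger can report as cancelled and running at the same time.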
r
Note the line I’ve linked. delay(..) is cooperative and should not keep running. The tests in question usually execute within a few milliseconds. 10 seconds should be more than enough to cancel the coroutine.
s
A job that’s launched on the default dispatcher won’t cause the error that you shared. The error will only result from a coroutine that’s running on the test dispatcher. I’m not familiar with Turbine, but it appears that this timeout job is launched on the default dispatcher specifically to prevent it from interacting with the coroutine test scheduler. The fact it shows as still active is a red herring; you should look for other uncompleted coroutines that were launched in your test.
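As a rough sketch of that distinction, assuming kotlinx-coroutines-test and kotlin.test are available (the class and test names are hypothetical, and this is not Turbine's code): a job launched in GlobalScope is not a child of the test scope, so runTest does not wait for it, while a coroutine launched directly inside runTest is a child that must complete before the test can finish.

```kotlin
import kotlinx.coroutines.DelicateCoroutinesApi
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.awaitCancellation
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.test.runTest
import kotlin.test.Test

class UncompletedCoroutineSketch {

    @OptIn(DelicateCoroutinesApi::class)
    @Test
    fun globalScopeJobDoesNotBlockRunTest() = runTest {
        // Not a child of the test scope and not on the test dispatcher,
        // so runTest does not track it when deciding whether the test is done.
        val outside = GlobalScope.launch(Dispatchers.Default) { delay(60_000) }
        outside.cancel()
    }

    @Test
    fun childJobKeepsRunTestWaiting() = runTest {
        // A child of the test scope that never completes on its own.
        // This test is expected to fail: runTest waits for its timeout and
        // then reports the still-active child job.
        launch { awaitCancellation() }
    }
}
```

If the second pattern is the cause, the coroutine to look for is one started with launch, async, or a collector inside the test body (or in a scope built from the test scope) that is never cancelled or completed.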