#coroutines

xenomachina

03/11/2021, 5:18 PM
How does one debug deadlocks in coroutines? What I'd like to be able to do is run the code until it deadlocks, and see a snapshot of where each coroutine is "stuck". The coroutines tab in the debugger doesn't work, so I've just been adding logging around every suspend point I think could be responsible. This "works", but is incredibly tedious and slow, and also error-prone because it'll fail if I don't think of the guilty suspend point when adding my logging. Is there a better way?

louiscad

03/11/2021, 5:28 PM
If you use `Mutex`, know that it is not reentrant.
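
A minimal sketch of what "not reentrant" means in practice, assuming `kotlinx-coroutines-core` is on the classpath. The function name `nestedLockSucceeds` is hypothetical. A coroutine that already holds the `Mutex` tries to take it again; the inner attempt suspends, and only the timeout keeps the example from hanging.

```kotlin
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical demo: returns true if the nested lock was acquired,
// false if the attempt timed out.
fun nestedLockSucceeds(): Boolean = runBlocking {
    val mutex = Mutex()
    mutex.withLock {
        // This coroutine already holds the lock. Mutex does not track
        // re-entry, so the inner withLock suspends until the timeout
        // cancels it.
        withTimeoutOrNull(200) {
            mutex.withLock { true }
        } ?: false
    }
}

fun main() {
    println("nested lock acquired: ${nestedLockSucceeds()}")
}
```

Without the timeout, the inner `withLock` would never resume, which is one way to end up with a coroutine that looks stuck.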

xenomachina

03/11/2021, 5:31 PM
No `Mutex`, just using channels.

louiscad

03/11/2021, 5:34 PM
Channels are often best kept as an implementation detail. Once you do this, you just have to think: Is there something that waits for something else that waits for that something? Another thing that can look like a deadlock is an infinite loop, for example because a special condition changed, a function no longer suspends, or a `CancellationException` was caught when it should not have been and was not rethrown.
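
To illustrate the swallowed `CancellationException` case, here is a small sketch, assuming `kotlinx-coroutines-core`. The function `iterationsAfterCancel` and its counters are hypothetical. The loop is bounded so the example terminates; with an unbounded `while (true)` loop, the swallowing variant would keep running after cancellation.

```kotlin
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical demo: counts how many loop iterations still run after
// the job has been cancelled.
fun iterationsAfterCancel(rethrow: Boolean): Int = runBlocking {
    var cancelRequested = false
    var afterCancel = 0
    val job = launch {
        repeat(1_000) {
            try {
                delay(10)
            } catch (e: Exception) {
                // Catching Exception also catches CancellationException.
                // Rethrowing it lets cancellation end the loop.
                if (rethrow && e is CancellationException) throw e
            }
            if (cancelRequested) afterCancel++
        }
    }
    delay(50)
    cancelRequested = true
    job.cancelAndJoin()
    afterCancel
}

fun main() {
    println("swallowed: ${iterationsAfterCancel(rethrow = false)} iterations after cancel")
    println("rethrown:  ${iterationsAfterCancel(rethrow = true)} iterations after cancel")
}
```

When the exception is swallowed, every later `delay` fails immediately and the loop spins through its remaining iterations instead of stopping.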

xenomachina

03/11/2021, 5:36 PM
I'm not sure what you mean by "Channels are often best kept as an implementation detail".

louiscad

03/11/2021, 5:38 PM
If all of your codebase uses `Channel`, kinda like event buses, then it's harder to figure things out. Often, you can hide it in internals, behind a façade that exposes safer things like plain suspending functions, or `Flow`s, where it's unlikely you can get deadlocks.
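
One way to read the façade idea, as a sketch that assumes `kotlinx-coroutines-core`. The class `CounterService`, its `Command` messages, and `demoTotal` are all hypothetical names invented for illustration. The `Channel` stays private and callers only see plain suspending functions.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical service: the channel is an implementation detail.
class CounterService(scope: CoroutineScope) {
    private sealed class Command {
        class Add(val amount: Int) : Command()
        class Get(val reply: CompletableDeferred<Int>) : Command()
    }

    private val commands = Channel<Command>()

    init {
        // A single worker owns the state and drains the channel.
        scope.launch {
            var total = 0
            for (command in commands) {
                when (command) {
                    is Command.Add -> total += command.amount
                    is Command.Get -> command.reply.complete(total)
                }
            }
        }
    }

    suspend fun add(amount: Int) {
        commands.send(Command.Add(amount))
    }

    suspend fun total(): Int {
        val reply = CompletableDeferred<Int>()
        commands.send(Command.Get(reply))
        return reply.await()
    }

    // Closing the channel ends the worker's for loop.
    fun close() {
        commands.close()
    }
}

fun demoTotal(): Int = runBlocking {
    val counter = CounterService(this)
    counter.add(2)
    counter.add(3)
    val result = counter.total()
    counter.close() // lets the worker finish so runBlocking can return
    result
}

fun main() {
    println("total = ${demoTotal()}")
}
```

Because every wait goes through `add` or `total`, there are only two places where a caller can be suspended on the channel, which narrows down where to look when something stops responding.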

xenomachina

03/11/2021, 5:44 PM
Ah, yeah, I'm building something that uses channels internally, but uses regular suspend functions for the most part. I haven't found a use for flows just yet.
In any case, I know what can cause deadlocks, the issue is that if there's a bug that's causing a deadlock I don't see how to diagnose the problem without adding printfs everywhere.
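
For the "snapshot of where each coroutine is stuck" request, one option that may help is `DebugProbes` from the separate `kotlinx-coroutines-debug` artifact (JVM only). The sketch below assumes that artifact is on the test classpath and that the JVM allows the debug agent to attach. The function `dumpStuckCoroutines` and the worker named `stuck-worker` are hypothetical. Only coroutines created after `DebugProbes.install()` are tracked.

```kotlin
import java.io.ByteArrayOutputStream
import java.io.PrintStream
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.CoroutineName
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.debug.DebugProbes
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.yield

// Hypothetical demo: starts a coroutine that never resumes on its own,
// then captures a textual dump of all tracked coroutines.
@OptIn(ExperimentalCoroutinesApi::class)
fun dumpStuckCoroutines(): String = runBlocking {
    DebugProbes.install()
    try {
        val neverCompleted = CompletableDeferred<Unit>()
        val stuck = launch(CoroutineName("stuck-worker")) {
            neverCompleted.await() // suspends indefinitely
        }
        yield() // let the worker run up to its suspension point
        val buffer = ByteArrayOutputStream()
        DebugProbes.dumpCoroutines(PrintStream(buffer, true))
        stuck.cancel() // clean up so runBlocking can return
        buffer.toString()
    } finally {
        DebugProbes.uninstall()
    }
}

fun main() {
    println(dumpStuckCoroutines())
}
```

In a hanging test, the same `DebugProbes.dumpCoroutines()` call could be triggered from a watchdog thread or a test timeout hook, so the dump is printed at the moment the test stops making progress.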

louiscad

03/11/2021, 5:46 PM
Flows are asynchronous sequences, that's it. Useful for reactive programming, like reacting to a value change, doing some operations based on a sequence of values, listening to changes from multiple sources…
Are these suspending deadlocks or thread/lock-related deadlocks?

xenomachina

03/11/2021, 5:47 PM
Aren't channels also asynchronous sequences?

louiscad

03/11/2021, 5:48 PM
Channels are hot, unlike flows. Channels are more like BlockingQueue, and Flows, more like Sequence or Stream.
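
A small sketch of the hot/cold difference, assuming `kotlinx-coroutines-core`. The function names `coldFlowRuns` and `channelConsumedOnce` are hypothetical. The flow's body runs again for every collector, while values sent to a channel exist independently of any receiver and are consumed once.

```kotlin
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Cold: the builder block runs once per collection, and not before.
fun coldFlowRuns(): Int = runBlocking {
    var starts = 0
    val numbers = flow {
        starts++
        emit(1)
        emit(2)
    }
    numbers.toList() // first collection runs the block
    numbers.toList() // second collection runs it again
    starts
}

// Hot: values are buffered as soon as they are sent, and each value
// is received only once.
fun channelConsumedOnce(): Pair<List<Int>, List<Int>> = runBlocking {
    val channel = Channel<Int>(capacity = 2)
    channel.send(1) // buffered, no receiver needed yet
    channel.send(2)
    channel.close()
    val first = mutableListOf<Int>()
    for (value in channel) first += value
    val second = mutableListOf<Int>()
    for (value in channel) second += value // closed and drained: no iterations
    Pair(first, second)
}

fun main() {
    println("flow block ran ${coldFlowRuns()} times")
    println("channel reads: ${channelConsumedOnce()}")
}
```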

xenomachina

03/11/2021, 5:48 PM
I have no idea. I can't debug it. One of my unit tests just stops doing anything. If I pause in the debugger, I'm in `sun.misc.Unsafe.park()`.

louiscad

03/11/2021, 5:49 PM
It always ends up in `park`?

xenomachina

03/11/2021, 5:50 PM
As far as I can tell, yes.

louiscad

03/11/2021, 5:52 PM
Then you'll need to log things to find what never resumes.
What platform are you on?
JVM or Android?

xenomachina

03/11/2021, 5:53 PM
JVM (on Linux)

louiscad

03/11/2021, 5:54 PM
You can make an inline `withLog(…) { }` function that has a try/catch/finally block (and rethrows after logging), so it's less code to log entry/exit of suspend calls that might be the culprit.
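
A possible shape for such a helper, as a sketch. Only the name `withLog` comes from the message above; the signature, the `log` parameter, and the message format are assumptions. Because the function is `inline`, the block can call suspending functions when it is used from a coroutine.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical helper: logs entry, failure, and exit around a block.
inline fun <T> withLog(label: String, log: (String) -> Unit, block: () -> T): T {
    log("-> $label")
    try {
        return block()
    } catch (t: Throwable) {
        log("!! $label threw $t")
        throw t // rethrow after logging, including CancellationException
    } finally {
        log("<- $label")
    }
}

// Usage: collect the log lines in a list so they can be inspected.
fun demoLog(): List<String> = runBlocking {
    val lines = mutableListOf<String>()
    val value = withLog("fetch", { lines += it }) {
        delay(10) // suspension point inside the logged block
        42
    }
    lines += "value=$value"
    lines
}

fun main() {
    demoLog().forEach(::println)
}
```

A call that never resumes shows up as a `-> label` line with no matching `<- label` line.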

xenomachina

03/11/2021, 6:00 PM
Yeah, I've done that. Relying on debug logging just seems like a big step back. It's incredibly tedious. The broken exception stack traces also make debugging a challenge.

louiscad

03/11/2021, 6:02 PM
Debug logging has never been deprecated 😄

xenomachina

03/11/2021, 6:07 PM
True. Just seems I need to rely on it a lot more when working with coroutines.

louiscad

03/11/2021, 7:31 PM
I am personally fine with the debugger, setting breakpoints (not step by step though because it doesn't work over suspension points so far), and using evaluate expression, but depending on what you're doing, it might not be enough.

uli

03/11/2021, 9:50 PM
Are all the threads in `sun.misc.Unsafe.park()`?

xenomachina

03/11/2021, 10:11 PM
I just figured out what was going on with this one a couple of minutes ago. Turns out it wasn't really a deadlock. I'd run into this problem after dealing with a bunch of deadlocks, so I guess I had deadlocks on the brain. 🤦 What was actually happening: A test was waiting for something external to complete, and that thing was never completing. (I hadn't realized it had gotten this far, and was looking for a deadlock much earlier than this). It was doing roughly:
```kotlin
while (!isItDone()) {
    delay(100)
}
```
So when I was pausing in the debugger my test was inside the `delay(100)`, but unfortunately, the call stack in the debugger didn't show `delay`, or any of my code. Just the test framework, and about 5 other stack frames leading up to `park`.
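
A polling loop like this can be made to fail visibly instead of hanging by bounding it with a timeout. The sketch below assumes `kotlinx-coroutines-core`. The helper `awaitCondition`, its parameters, and `demoPolling` are hypothetical, and `isDone` stands in for the `isItDone()` check from the message above.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical helper: polls isDone and gives up after timeoutMillis.
// Returns true if the condition became true, false on timeout.
suspend fun awaitCondition(
    timeoutMillis: Long,
    pollMillis: Long = 100,
    isDone: () -> Boolean
): Boolean =
    withTimeoutOrNull(timeoutMillis) {
        while (!isDone()) {
            delay(pollMillis)
        }
        true
    } ?: false

fun demoPolling(): Pair<Boolean, Boolean> = runBlocking {
    // Never completes: the timeout turns the hang into a false result.
    val neverDone = awaitCondition(timeoutMillis = 300, pollMillis = 10) { false }
    // Completes on the third check.
    var checks = 0
    val eventuallyDone = awaitCondition(timeoutMillis = 1_000, pollMillis = 10) { ++checks >= 3 }
    Pair(neverDone, eventuallyDone)
}

fun main() {
    println(demoPolling())
}
```

In a test, a `false` result can be turned into an assertion failure with a message naming the external thing that never completed, which points straight at the wait instead of at `park`.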