# coroutines
x
How does one debug deadlocks in coroutines? What I'd like to be able to do is run the code until it deadlocks, and see a snapshot of where each coroutine is "stuck". The coroutines tab in the debugger doesn't work, so I've just been adding logging around every suspend point I think could be responsible. This "works", but is incredibly tedious and slow, and also error-prone because it'll fail if I don't think of the guilty suspend point when adding my logging. Is there a better way?
l
If you use `Mutex`, know that it is not reentrant.
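
To make the non-reentrancy point concrete, here is a minimal sketch, assuming `kotlinx-coroutines-core` is on the classpath. The function name `tryNestedLock` and the 200 ms timeout are made up for illustration. Locking the same `Mutex` twice from one coroutine suspends forever, so the timeout is the only thing that ends it.

```kotlin
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical helper: returns true if the nested lock was acquired,
// false if the attempt had to be abandoned by the timeout.
suspend fun tryNestedLock(timeoutMs: Long = 200): Boolean {
    val mutex = Mutex()
    val result = withTimeoutOrNull(timeoutMs) {
        mutex.withLock {
            // Same coroutine, same mutex: this inner call suspends until
            // the timeout cancels it, because Mutex is not reentrant.
            mutex.withLock { true }
        }
    }
    return result == true
}

fun main() = runBlocking {
    println("nested lock acquired: ${tryNestedLock()}")
}
```

Wrapping a suspect section in `withTimeoutOrNull` like this is also a cheap way to turn a silent hang into a visible failure while hunting for the real cause.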
x
No `Mutex`, just using channels.
l
Channels are often best kept as an implementation detail. Once you do this, you just have to think: is there something that waits for something else that, in turn, waits for the first thing? Another thing that might look like a deadlock is an infinite loop, for example because a special condition changed, or a function no longer suspends, or a `CancellationException` is caught when it should not be, and is not rethrown.
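
A minimal sketch of the "something waits for something else that waits for it" case, assuming `kotlinx-coroutines-core`. The names `circularWait`, `aToB` and `bToA` are invented for the example. Each coroutine receives before it sends, so neither can make progress, and the timeout is what reports it.

```kotlin
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical demo: returns true if both coroutines finished,
// false if the timeout had to cancel them.
suspend fun circularWait(timeoutMs: Long = 200): Boolean {
    val aToB = Channel<Int>() // rendezvous channels, no buffer
    val bToA = Channel<Int>()
    val finished = withTimeoutOrNull(timeoutMs) {
        coroutineScope {
            // A waits for B's message before sending its own...
            launch { val x = bToA.receive(); aToB.send(x + 1) }
            // ...and B waits for A's message before sending its own.
            launch { val y = aToB.receive(); bToA.send(y + 1) }
        }
        true
    }
    return finished == true
}

fun main() = runBlocking {
    println("completed: ${circularWait()}")
}
```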
x
I'm not sure what you mean by "Channels are often best kept as an implementation detail".
l
If all of your codebase uses `Channel`, kinda like event buses, then it's harder to figure things out. Often, you can hide it in internals, behind a façade that exposes safer things like plain suspending functions, or `Flow`s, where it's unlikely you can get deadlocks.
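
One possible shape for such a façade, as a sketch only: the `Counter` class, its `Add` message and the `add`/`close` functions are hypothetical and not from the thread. The channel is private, and callers only ever see a plain suspending function.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical example: state is owned by a single worker coroutine,
// and the channel used to reach it never leaks out of the class.
class Counter(scope: CoroutineScope) {
    private class Add(val amount: Int, val reply: CompletableDeferred<Int>)

    private val requests = Channel<Add>(Channel.BUFFERED)

    init {
        scope.launch {
            var total = 0
            for (request in requests) { // loop ends when the channel is closed
                total += request.amount
                request.reply.complete(total)
            }
        }
    }

    // The public surface: send a request and suspend until the worker replies.
    suspend fun add(amount: Int): Int {
        val reply = CompletableDeferred<Int>()
        requests.send(Add(amount, reply))
        return reply.await()
    }

    fun close() {
        requests.close()
    }
}

fun main() = runBlocking {
    val counter = Counter(this)
    println(counter.add(2))
    println(counter.add(3))
    counter.close() // lets the worker finish so runBlocking can return
}
```

With this layout, the only places a channel wait can happen are inside the class, which narrows down where to look when something stops responding.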
x
Ah, yeah, I'm building something that uses channels internally, but uses regular suspend functions for the most part. I haven't found a use for flows just yet.
In any case, I know what can cause deadlocks, the issue is that if there's a bug that's causing a deadlock I don't see how to diagnose the problem without adding printfs everywhere.
l
Flows are asynchronous sequences, that's it. Useful for reactive programming, like reacting to a value change, doing some operations based on a sequence of values, listening to changes from multiple sources…
Are these suspending deadlocks, or thread/lock-related deadlocks?
x
Aren't channels also asynchronous sequences?
l
Channels are hot, unlike flows. Channels are more like BlockingQueue, and Flows, more like Sequence or Stream.
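
A small sketch of the hot/cold difference, assuming `kotlinx-coroutines-core`. The helper names `countFlowRuns` and `receiveOnce` are invented for the example. The flow body runs once per collector and not at all without one, while an element sent to a channel exists regardless of receivers and is consumed exactly once.

```kotlin
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Cold: counts how many times the flow body runs for a given number of collectors.
suspend fun countFlowRuns(collectors: Int): Int {
    var runs = 0
    val numbers = flow {
        runs++ // executed each time someone collects, never before
        emit(1)
        emit(2)
    }
    repeat(collectors) { numbers.toList() }
    return runs
}

// Hot: the element is buffered without any receiver, then consumed once.
suspend fun receiveOnce(): Pair<Int, Boolean> {
    val channel = Channel<Int>(capacity = 1)
    channel.send(1) // succeeds even though nobody is receiving yet
    channel.close()
    val first = channel.receive()
    val drained = channel.receiveCatching().isClosed // nothing left to replay
    return first to drained
}

fun main() = runBlocking {
    println("flow runs with 0 collectors: ${countFlowRuns(0)}")
    println("flow runs with 2 collectors: ${countFlowRuns(2)}")
    println("channel: ${receiveOnce()}")
}
```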
x
I have no idea. I can't debug it. One of my unit tests just stops doing anything. If I pause in the debugger, I'm in `sun.misc.Unsafe.park()`.
l
Does it always end up in `park`?
x
As far as I can tell, yes.
l
Then you'll need to log things to find what never resumes.
What platform are you on?
JVM or Android?
x
JVM (on Linux)
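
On the JVM there is one alternative to hand-written logging that may be worth trying, not mentioned in the thread: the separate `kotlinx-coroutines-debug` artifact provides `DebugProbes`, which can print every tracked coroutine with its state and last known stack. The sketch below assumes that artifact is on the classpath; the API is marked experimental, and the `stuck-worker` coroutine is a stand-in for whatever never resumes.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.CoroutineName
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.debug.DebugProbes
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

@OptIn(ExperimentalCoroutinesApi::class)
fun main() = runBlocking {
    // Probes only track coroutines created after installation.
    DebugProbes.install()

    val neverCompleted = CompletableDeferred<Unit>()
    val stuck = launch(CoroutineName("stuck-worker")) {
        neverCompleted.await() // stands in for the suspend point that never resumes
    }

    delay(100) // give the worker time to reach its suspend point
    DebugProbes.dumpCoroutines() // writes each coroutine's state and stack to System.out

    stuck.cancel()
    DebugProbes.uninstall()
}
```

In a hanging test, the dump could be triggered from a watchdog or a timeout handler rather than after a fixed delay.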
l
You can make an inline `withLog(…) { }` function that has a try catch finally block (and rethrows after logging), so it's less code to log entry/exit of suspend calls that might be the culprit.
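
A possible version of that helper, as a sketch: only the name `withLog` comes from the thread, while the `label` parameter and the log format are guesses. Because the function is `inline`, the lambda can contain suspend calls when it is used inside a coroutine.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical helper: logs entry, failure and exit around a block.
inline fun <T> withLog(label: String, block: () -> T): T {
    println("enter $label")
    try {
        return block()
    } catch (t: Throwable) {
        println("fail  $label: $t")
        throw t // always rethrow, so CancellationException still propagates
    } finally {
        println("exit  $label")
    }
}

fun main() = runBlocking {
    val answer = withLog("delay(50)") {
        delay(50) // a suspend call is fine here because withLog is inlined
        42
    }
    println(answer)
}
```

An "enter" line with no matching "exit" line in the output then points at the call that never resumed.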
x
Yeah, I've done that. Relying on debug logging just seems like a big step back. It's incredibly tedious. The broken exception stack traces also make debugging a challenge.
l
Debug logging has never been deprecated 😄
x
True. Just seems I need to rely on it a lot more when working with coroutines.
l
I am personally fine with the debugger, setting breakpoints (not step by step though because it doesn't work over suspension points so far), and using evaluate expression, but depending on what you're doing, it might not be enough.
u
Are all the threads in `sun.misc.Unsafe.park()`?
x
I just figured out what was going on with this one a couple of minutes ago. Turns out it wasn't really a deadlock. I'd run into this problem after dealing with a bunch of deadlocks, so I guess I had deadlocks on the brain. 🤦 What was actually happening: A test was waiting for something external to complete, and that thing was never completing. (I hadn't realized it had gotten this far, and was looking for a deadlock much earlier than this). It was doing roughly:
```kotlin
while (!isItDone()) {
    delay(100)
}
```
So when I was pausing in the debugger my test was inside the `delay(100)`, but unfortunately, the call stack in the debugger didn't show `delay`, or any of my code. Just the test framework, and about 5 other stack frames leading up to `park`.
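
A polling loop like the one above can be given a time budget so the test fails loudly instead of parking forever. This is a sketch under assumptions: `awaitDone` and its parameters are hypothetical, and `isItDone` is passed in as a lambda because its real definition is not shown in the thread.

```kotlin
import kotlinx.coroutines.TimeoutCancellationException
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeout

// Hypothetical helper: polls isItDone until it returns true,
// or throws TimeoutCancellationException once timeoutMs has elapsed.
suspend fun awaitDone(timeoutMs: Long, pollMs: Long = 100, isItDone: () -> Boolean) {
    withTimeout(timeoutMs) {
        while (!isItDone()) {
            delay(pollMs)
        }
    }
}

fun main() = runBlocking {
    try {
        // { false } stands in for the external thing that never completes.
        awaitDone(timeoutMs = 300) { false }
    } catch (e: TimeoutCancellationException) {
        println("gave up waiting: ${e.message}")
    }
}
```

The exception names the timeout that fired, which at least identifies the wait that hung even when the paused debugger stack shows only `park`.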