Recently I’ve had a couple of ANR reports: ```jdk....
# coroutines
m
Recently I’ve had a couple of ANR reports:
Copy code
jdk.internal.misc.Unsafe.getReferenceVolatile
when calling
trySend
on a
BufferedChannel
(actually I’m using SingleShotEventBus). Is this something that can be mitigated?
Copy code
main (runnable):tid=1 systid=22340 
#00 pc 0x4bf790 libart.so (art::DumpNativeStack + 108) (BuildId: 02bec5940be704b863f6514fc7d81c41)
#01 pc 0x4b2940 libart.so (art::Thread::DumpStack const + 388) (BuildId: 02bec5940be704b863f6514fc7d81c41)
#02 pc 0x4b2280 libart.so (art::DumpCheckpoint::Run + 164) (BuildId: 02bec5940be704b863f6514fc7d81c41)
#03 pc 0x3c8c9c libart.so (art::Thread::RunCheckpointFunction + 148) (BuildId: 02bec5940be704b863f6514fc7d81c41)
#04 pc 0x5afae0 libart.so (artTestSuspendFromCode + 232) (BuildId: 02bec5940be704b863f6514fc7d81c41)
#05 pc 0x35501c libart.so (art_quick_test_suspend + 156) (BuildId: 02bec5940be704b863f6514fc7d81c41)
       at jdk.internal.misc.Unsafe.getReferenceVolatile(Native method)
       at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl.get(AtomicReferenceFieldUpdater.java:480)
       at kotlinx.coroutines.JobSupport.getState$kotlinx_coroutines_core(JobSupport.kt:164)
       at kotlinx.coroutines.JobSupport.isActive(JobSupport.kt)
       at kotlinx.coroutines.AbstractCoroutine.isActive(AbstractCoroutine.kt)
       at kotlinx.coroutines.channels.ProducerCoroutine.isActive(Produce.kt)
       at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:96)
       at kotlinx.coroutines.EventLoop.processUnconfinedEvent(EventLoop.java:65)
       at kotlinx.coroutines.DispatchedTaskKt.resumeUnconfined(DispatchedTask.kt:241)
       at kotlinx.coroutines.DispatchedTaskKt.dispatch(DispatchedTask.kt:159)
       at kotlinx.coroutines.CancellableContinuationImpl.dispatchResume(CancellableContinuationImpl.kt:470)
       at kotlinx.coroutines.CancellableContinuationImpl.completeResume(CancellableContinuationImpl.kt:586)
       at kotlinx.coroutines.channels.BufferedChannelKt.tryResume0(BufferedChannel.kt:2904)
       at kotlinx.coroutines.channels.BufferedChannelKt.access$tryResume0(BufferedChannel.kt:1)
       at kotlinx.coroutines.channels.BufferedChannel$BufferedChannelIterator.tryResumeHasNext(BufferedChannel.kt:1694)
       at kotlinx.coroutines.channels.BufferedChannel.tryResumeReceiver(BufferedChannel.kt:642)
       at kotlinx.coroutines.channels.BufferedChannel.updateCellSend(BufferedChannel.kt:458)
       at kotlinx.coroutines.channels.BufferedChannel.access$updateCellSend(BufferedChannel.kt:36)
       at kotlinx.coroutines.channels.BufferedChannel.trySend-JP2dKIU(BufferedChannel.kt:3307)
l
Are you catching all exceptions somewhere? You might have an infinite loop if you catch
CancellationException
without rethrowing it.
Other than that, I don't see why you'd have an ANR.
m
Ok thanks I’ll check
Does that mean every single place in suspending code where I have something like
catch(Exception)
I need to separate out the
CancelationException
and rethrow, or only under certain conditions?
l
Always rethrow
That's also documented on d.android.com/kotlin/coroutines
m
In https://developer.android.com/kotlin/coroutines#handling-exceptions it’s not rethrowing the
CancelationException
. Am I missing something?
l
Don't catch it, or rethrow it.
And that example is bad IMO
Looks like they updated the doc and it's no longer showing the gotcha 😠
m
Oh dear 😞
l
So, yeah, avoid preventing cancellation from happening
m
I wish there was a way in AS to search for try catch blocks but only in suspending code
l
You might also want to check inline functions because they can run suspending blocks
👍 1
m
How about
CancelationException
thrown from a
ContentProvider
implementation to its client? At the moment, I rethrow it as an
OperationCanceledException
which I guess could be a problem.
l
Seems like it's not the one from kotlinx.coroutines/java.util.concurrent, so it's up to you to figure out what's happening in those cases exactly, and how you should/want your code to react to that happening.
m
Do you mean there are some legitimate cases where you would want to catch it without rethrowing it?
l
I mean I don't know what's expected for that specific non Kotlin ContentProvider API
m
Regardless of what is expected of the ContentProvider, I wouldn’t want to leave a coroutine in some inconsistent state. I’ll have a play to see what happens if I just rethrow the CancellationException
Is there a reason why there is no lint check for this rethrowing? It seems like a very easy mistake to make.
l
The one from coroutines would propagate until the launch that started the coroutine eats the exception.
👍 1
I don't know, maybe @Vsevolod Tolstopyatov [JB] knows better why there's no warning, IDE intention or "lint" for this very (very very) common pitfall.
m
It almost feels like it should be a sibling of
Exception
and
Error
l
Oh yeah, that's a very good idea!
Somehow never thought about that.
m
Not gonna happen haha
l
I think it actually could happen
s
l
Would break things, but could be mitigated to break mostly "bad" code
v
There is no such inspection mostly because we haven't seen a lot of user-facing issues caused by accidental catching of
CancellationException
(NB: it's not like "there is no issues" but rather "We haven't seen much internally and we don't have enough external reports"). If you could report us things like "hey, here is my code that was really hard to troubleshoot. It's because I've caught
CE
and the child component got stuck", I would really appreciate it 😊
l
I'm pretty sure you don't see it much internally because you're not doing much Android development where exceptions are commonplace and where you often need to do defensive programming to some extent, only to be shot back in a sneaky way that some people probably don't even ever understand the root causes of because they're not coroutines experts like I am (humbly)
v
Definitely, we are kind of biased in that matter (and many others as well!), that's why external feedback is always appreciated. So any prior experiences and problems caused are welcome
m
I would never have guessed an ANR was caused by catching and not rethrowing a CE
v
This part is a bit unclear to me. The ANR stacktrace looks completely innocuous from the first glance. Potentially, there might be some loop in the form of
while (job.isActive) doWorkUnconditionally
, but other from that it's hard to pinpoint the potential problem
l
If I was asked this:
What do you believe is the worst pitfall of the ones remaining in Modern Android Development?
I would reply: "this issue with unintended breakage of cancellation and structured concurrency", because: • It's very sneaky • Hard to debug • Affects the whole app • You expect kotlinx.coroutines to be all about safety • It changes the way one needs to think about exceptions, without that being properly documented (and who reads doc? Well I do, but you know…)
thank you color 1
1
👌 1
I myself was bitten by this numerous times while writing greenfield "coroutines first". Last time it happened to me was probably in 2022, so 4-5 years after I considered myself very familiar with coroutines, and using them almost daily on my day job back then. Can't speak for others, that's just my experience.
I should probably come up with some snippets that show the problem one day, but I might never take the time to recreate those. What happened often is that some suspending code would be run inside a block that would catch any Exception (or worst, any Throwable), to avoid crashing everything when something called in the suspending code fails with an unpredictable exception. That code, including the part that would catch exceptions (or worst, throwables), would be located inside a loop (e.g. while(true)). On cancellation, you now have an infinite loop that will be extremely hard to debug, especially in release mode (and therefore, in production).
that loop might be there just to handle retries on error, or to restart whatever process once a previous iteration is done
I believe there are other use cases that can lead to something similar
v
Thanks! I'll see how we can incorporate this knowledge further
🙏🏼 1
l
There's this issue in Android Studio Jellyfish Canary 11 where it eats 300% CPU without a valid reason. I see
kotlinx.coroutines
there… Could this be related to that issue? blob thinking fast
Could be from a third party IDE plugin also of course, or could be another problem
k
It's worth noting that the less experienced engineers in my team, particularly ones on the iOS side working on our KMP code, dont realize that catching
Exception
without rethrowing CE is dangerous. We've had many, many issues regarding this. It's really hard to enforce on a team that isn't solely composed of senior engineers
☝🏼 1
And another example recently, I got a notification from davx5 specifying that the job was canceled on my phone. This leads me to believe that this issue is prevalent in popular OSS repos too
🫠 1
l
Copy code
@Deprecated
expect class CancellationException : Exception

@Deprecated
actual typealias CancellationException

class CancellationSignal : Throwable()
Copy code
@Deprecated
fun withTimeout(…)

@Deprecated
class TimeoutCancellationException
🫡
k
At the very least, I think that CancellationException shouldn’t inherit from IllegalStateException. I get why they did this for alias reasons with the JVM’s CancellationException, but now checks like this are error prone. It’s not just
catch (e: Exception)
.
Copy code
try {
  yield()
  checkNotNull(someField) { "It was null! Scary!" }
} catch (e: IllegalStateException) {
  logger.log(e)
  // BUG, we didn't rethrow CancellationException
}
I suspect we definitely wouldn’t be able to make any change like this until coroutines 2.0.0+
Perhaps a more graceful migration would be:
Copy code
// Coroutines < 2.0.0

@Deprecated("Use CancellationSignal") 
open class CancellationException : IllegalStateException

class CancellationSignal : CancellationException()

// Coroutines >= 2.0.0 

class CancellationSignal : Throwable()
👍 1
l
Your snippet with "BUG" misses a suspend call to make it plausible, maybe asd
delay(something)
or
awaitWhathever()
?
k
yeah sorry assume suspend because we’re in the coroutines channel. Lemme update.
m
Devs still need to be aware of this
CancellationException
curiosity because sometimes you want to catch everything in order to do some tidy-up (that is only required when something goes wrong) for example deleting a file that has been partially written to. So I’m not sure how much moving the Exception in the class hierarchy will help. I suppose if there was some kind of
finallyForThrowable
block (or if the
finally
block was passed a nullable
Throwable
), then that would work.
l
throwing null leads to a
NullPointerException
being thrown instead, which is ambiguous, so not great, best to just have a custom
Throwable
type IMO
m
Although getting OT, I was meaning it would be useful (in this case and others) to have a kind of finally for when a throwable is thrown (that’s the
finallyForThrowable
bit). Alternatively, the
finally
block could receive a
Throwable?
arg to indicate what was thrown (in the case of non-null), or null for when the
try
finished normally. Like with
finally
normally, you wouldn’t be throwing anything from there. It’s messy though because is the throwable the one thrown from the
try
or one potentially thrown from a
catch
?
k
catch everything in order to do some tidy-up (that is only required when something goes wrong) for example deleting a file that has been partially written to.
In a scenario like this you should probably be rethrowing the exception, regardless of its type.
Copy code
try {
  aThing()
} catch (e: Throwable) {
  doPartialTidyUp()
  throw e
}
m
Sometimes, but you may also be returning null, for example
l
You might, but should you? Personally, `try`/`finally` and
use { }
extensions do all I need. Then I need to catch `Exception`s general or specific. I can't think of any case where I would want to catch all
Error
subclasses or a random
Throwable
subclass that is not an
Exception
.
catch (e: Throwable)
is not something I'd want to see in my codebases, because it's almost certain it's not handling errors properly.
☝️ 1
m
How would you deal with a partially written file that needs to be deleted on any throwable?
k
With the code that I shared two messages ago
m
But catching
Throwable
is exactly what Louis said he doesn’t want.
l
Unless you rethrow I meant
m
This is the point of the alternative
finally
I was suggesting. You don’t need to think about rethrowing. It’s done for you.
k
Such a niche use case doesn't warrant a language feature IMO when it can be built with existing language features as a function like
use
.
m
How would you use
use
in this case? You still need to either use a try/catch and rethrow or use a messy boolean to indicate whether the block finished normally or not. I think the cleanest solution would be for the
finally
block to be passed a boolean to indicate whether the
try
completed normally or not.
l
It'd just save one line of code (
throw t
). If you find a proper name for a function taking 2 lambdas, running the second one on Throwable before rethrowing, or a name for a
Result<T>
extension that calls a block with the Throwable then rethrows (to be used after
runCatching
, then go for it!
m
It’s really not about saving one line of code, it’s more about enforcement. We could make a similar argument on why
finally
is not necessary. It can easily be achieved by calling a local tidyup function and rethrowing as necessary.
l
I think you'd want something like that (not possible with current Kotlin though):
Copy code
try {
    doPossiblyThrowingStuff()
} intercept (t: Throwable) {
    cleanup() // Won't suffice in case of process death!
}
This can work:
Copy code
runCatching {
    doPossiblyThrowingStuff()
}.intercept<Throwable> {
    cleanup() // Won't suffice in case of process death!
}
You'd need to document that
intercept
rethrows after running its block
And you can make it return
T
(from
Result<T>
)
m
I don’t even think it needs the Throwable generic because it’s very much about if anything goes wrong which is why I was thinking of a boolean.
l
I kept it because without it comes the question: "What is it intercepting exactly?"
m
Like this?
Copy code
fun <T> Result<T>.interceptFailure(failureBlock: () -> Unit): T {
    if (this.isFailure) {
        failureBlock()
        throw exceptionOrNull()!!
    }
    return getOrThrow()
}
l
Yes, and you can (should) make it inline
👍 1
k
Commenting here again — just reviewed a PR from one of my senior coworkers that catches
IllegalStateException
but did not rethrow CancellationException. The existence of utilities like
checkNotNull
in Kotlin popularized the use of general exceptions like ISE, and that directly conflicts with how cancellation should propagate through a system. Perhaps we should raise this as an issue on the repo?