< Ian Lake> < matvei> < Ryan Mentley> < tcracknell> < jim>

<@UJBPFB3SN> <@UJRAJCAVB> <@UJRHRQRM4> <@USH9HDC4T...

Jason A. Donenfeld

09/30/2020, 1:01 PM

@Ian Lake @matvei @Ryan Mentley @tcracknell @jim @Adam Powell @Alexandre Elias [G] @msfjarvis Here's an interesting bug/race I've found in the kotlin coroutine port to Android/Jetpack. Let's assume we're operating inside of a fragment, using the fragment's lifecycle coroutine scope. Consider code like this:

Copy code

lifecycleScope.launch {
	try {
		// do stuff
		delay(5000)
		// do stuff
	} catch (_: Throwable) {
		requireContext()
	}
}

When the Fragment is destroyed (by, e.g., the user pressing the back button), the Fragment's coroutine scope will be properly cancelled, as you'd expect. This in turn raises a CancellationException inside the scope. This exception is caught by our catch block, above. requireContext() is then called, and throws a new exception, because the context has already been destroyed. It seems like destruction of the context and such is happening in the wrong order. What should be happening instead is that context and binding and other various members that Fragment has should be set to null only after coroutines have exited. The example is contrived, but imagine something a bit more sophisticated, like:

Copy code

lifecycleScope.launch {
	try {
		binding!!.config = getConfigAsync()
	} catch (_: Throwable) {
		binding!!.config = null
	}
}

Or, even more innocent looking:

Copy code

lifecycleScope.launch {
	try {
		// do something expensive
	} catch (_: Throwable) {
		Toast.make(ctx, getString(R.string.error_string), LONG).show()
	}
}

That getString will crash too. You might respond that CancellationException should always be handled differently in every try-catch block ever for all coroutines. Maybe you're right. But that's a lot of refactoring and weirdness. And it seems like this would be better solved by just fixing the race of lifecycle shutdown vs context detachment. Does that seem reasonable?

Joost Klitsie

09/30/2020, 1:38 PM

Or maybe don't do error catching at all like this in your view layer?

Jason A. Donenfeld

09/30/2020, 1:39 PM

I wrote above "You might respond that CancellationException...". Doesn't that already address your objection?

Joost Klitsie

09/30/2020, 2:21 PM

No it doesn't. I mean why not handle the work in a separate component (like view model) and then observe changes on that one. The observing should stop when view dies

Joost Klitsie

09/30/2020, 2:24 PM

you could also not do things that throw errors, as kotlin tries to steer you away from throwing errors anyway. So either try to catch a typed error for the operation you try to perform

Joost Klitsie

09/30/2020, 2:24 PM

make sure the getconfig returns a Result<> object of some type

Casey Brooks

09/30/2020, 6:14 PM

It’s not a race condition, it’s just the natural ordering of things. By the time the

lifecycleScope

gets cancelled, the Fragment is already in a bad state, so anything that happens within the coroutine at that point can’t be guaranteed to work. The coroutine itself is asynchronous while the Fragment is synchronous, so you can’t be 100% sure UI operations done in the coroutine are going to be possible. You should either manually check for cancellation in that

catch

, or use some mechanism for communicating with the UI that makes that check for you to add a layer of separation between the async coroutine and the UI (such as sending the UI request through a channel/Flow and the UI reading from the channel/Flow, or wrapping UI operations explicitly in

withContext(Dispatchers.Main) { }

)

Jason A. Donenfeld

09/30/2020, 11:04 PM

@Casey Brooks Well, lifecycleScope is already on Dispatchers.Main.immediate, so dunno about that latter point. Unless you're suggesting that an explicit context there results in a no-op due to the prior cancellation. That's pretty hacky though. But... anyway, the basic issue remains the same. LifecycleScope is supposed to take care of lifecycle issues with tear downs. What you call the "natural ordering of things", I call a race, because it's unpredictable. And that creates a lot of unreliability. Instead of forcing users to handle more cases in error handlers or some other form of synchronization -- or however other suggestions we could come up with in the ordinary course of constructive conversation -- why not avoid the foot gun all together? Namely, have the Fragment shutdown wait for Dispatchers.Main contexts running for that lifecycle to terminate. That wouldn't introduce any blocking ANR situation either, since we're only talking about Dispatchers.Main here, and not some IO thread. In other words, defer nulling members of the context and doing the detaching until the end of the event loop itself. Instead of detaching context immediately after calling cancel, queue up detachment on the event loop subsequently to the cancellation.

Adam Powell

10/01/2020, 3:50 AM

I see what you're getting at, and I understand your frustration, but tbh I think the switch to immediate dispatchers for the lifecycle scopes was something we shouldn't have done for a number of reasons, and relying on the ability to do more than cleanup upon resuming with a cancellation exception is kind of a bad habit to get into.

👆 1

Adam Powell

10/01/2020, 3:51 AM

The default assumption when resuming a suspend function is that you're dispatched. You can't count on being on a particular state when a coroutine is resumed, especially with an exception. You have what you're resumed with (the return value or the exception) and that's about it.

Adam Powell

10/01/2020, 3:53 AM

If you're doing something where you need to hang onto the context for cleanup, say, unbinding a service or something, grab a reference to it before you suspend and hang onto it in a local variable so you have it to use for cleanup in a catch

Adam Powell

10/01/2020, 4:02 AM

and fragment lifecycle teardown is fundamentally subservient to activity lifecycle teardown. If an activity is being destroyed, that's a synchronous notification from the system, not a suggestion where it's going to wait for you. Letting things linger around is fibbing to your code in a way that is going to fall apart in the long run.

Adam Powell

10/01/2020, 4:03 AM

Lots of things don't work on an Activity context once it's been marked as destroyed by the system. Letting that reference linger isn't the favor it might look like

💯 1

Casey Brooks

10/01/2020, 1:28 PM

Something also to know is that the immediate dispatcher only starts coroutines immediately, it cannot cancel them immediately. Coroutine cancellation is not magical, it can’t stop code from running at arbitrary locations, but instead is cooperative and cancellation must be explicitly checked at discrete points in the code (https://kotlinlang.org/docs/reference/coroutines/cancellation-and-timeouts.html#cancellation-is-cooperative). Part of understanding how to use coroutines properly is understanding how to write your code to cooperate with cancellation as well.

Open in Slack

Previous Next