https://kotlinlang.org logo
Title
w

wasyl

01/26/2021, 2:14 PM
FYI I opened https://github.com/apollographql/apollo-android/issues/2899 which is super weird, and we spent better part of recent days chasing it down. I’m available for clarification if needed, esp. if it’s non-reproducible for Apollo folks
👀 1
m

mbonnin

01/26/2021, 2:39 PM
Thanks for the awesome reproducer, I can reproduce as well
🎉 1
If you get a chance to try https://github.com/apollographql/apollo-android/pull/2900, that fixes it for me.
w

wasyl

01/26/2021, 5:38 PM
Is it published somewhere?
Btw what do you mean by
NETWORK_ONLY watcher was cancelled when trying to refetch
does the first query refetch something in my repro? Or it’s just that it’s cancelled right after it emits first value that’s the issue?
m

mbonnin

01/26/2021, 6:38 PM
Is it published somewhere?
Not yet, there will be snapshots once it is merged. If you have the repo checked out locally, you can do
./gradlew publishToMavenLocal
and add
mavenLocal
to your repositories to try it out.
👍 1
does the first query refetch something in my repro?
Yea, the first query is a watcher so technically it refetches with a cache_first refetch policy but gets cancelled as soon as the network finishes because the flow gets cancelled
If you replace the first query
toFlow()
by
.await()
you won't have the problem
w

wasyl

01/26/2021, 8:47 PM
Thanks a lot 🙂 We can’t apply
.toFlow() -> .await()
fix in our codebase because abstractions, also we create watcher quite early and there’s no
.await()
on that. And testing on CI would be slightly too complicated if we had to build the library locally, so we’ll have to wait for a snapshot. I could only confirm I don’t observe the issue on my repro project anymore with the snapshot
m

mbonnin

01/26/2021, 8:49 PM
👍👍I just merged the pr. Snapshots should start to appear in ~30min
w

wasyl

01/26/2021, 8:50 PM
🙏 we’ll test it first thing tomorrow 🙂
Although how the fix works escapes me 😉 In the PR you’re delaying the error that happens in
onCacheRecordsChanged
Is the
RealApolloStore#publish
called from somewhere else than the networkOnly watcher? I thought the issue is that cancelling the watcher also interrupts
publish
call
m

mbonnin

01/26/2021, 8:52 PM
The store has 2 listeners. The first one throws so the second one doesn't get the update
And the second one is the one that we want to get the update
Tbh I think the first one shouldn't throw. It might be ok call refetch in 'cancelled' state
But that's a more intrusive fix which I'd keep for 3.0
👍 1
w

wasyl

01/26/2021, 9:02 PM
So the
ApooloStore#publish
is just posted on some executor, this call is not interrupted by anything? And the race condition is that with working case, we’d have: • network call completes, posts runnable to publish cache changes • watcher is canceled, unsubscribes
recordChangeSubscriber
• store publishes the changes And the not working case would be that • network call completes, posts runnable to publish cache • store starts publishing the changes, gets the first subscriber •
first()
cancels the flow, so the watcher is cancelled • store calls the first subscriber’s callback but it’s already canceled and throws exception ?
Feel free to let me know if this is too much to explain 🙂 I just want to understand better how Apollo works internally, as it’s quite helpful sometimes
👍 1
Seems the most recent snapshot fixes the issue for us 🎉 Thank you 🙂
Is there any release planned soon or not yet?
m

mbonnin

01/27/2021, 11:57 AM
Seems the most recent snapshot fixes the issue for us
Nice glad to hear that! 🎉
Is there any release planned soon or not yet?
There's not much activity in the main branch these days so releases tend to stretch a little bit. Ok if I do one on monday? I like Monday releases
w

wasyl

01/27/2021, 11:58 AM
Sure, no worries 🙂
👍 1
m

mbonnin

01/27/2021, 11:59 AM
I just want to understand better how Apollo works internally, as it’s quite helpful sometimes
Sorry I missed the question above. No problem at all, brainstorming is good and we can have more people know about the internals 👍
Yup what you described seems accurate
Nitpicking, I think in the not working case this happens: • network call completes, posts runnable to publish cache •
first()
 cancels the flow, so the watcher is cancelled • the runnable is dispatched • store calls the first subscriber’s callback but it’s already canceled and throws exception • the second subscriber (cache_only) doesn't get the store update
The publishing is all synchronous, I don't think it can be interrupted
w

wasyl

01/27/2021, 12:05 PM
If the watcher is cancelled, it unregisters the listener though, right? So the cancelling would’ve had to happen already after dispatching the runnable and acquiring reference to still-not-unregistered listener?
m

mbonnin

01/27/2021, 12:05 PM
Mmm good call
w

wasyl

01/27/2021, 12:08 PM
I think that’s why it’s happening so rarely. So my last question: if the
publish
call is just a runnable on a dispatcher, what happens with the exception? Is it thrown just to stop some potential further processing? Is it logged? Because as I understand it shouldn’t be propagated anywhere to the app code
m

mbonnin

01/27/2021, 12:09 PM
So the cancelling would’ve had to happen already after dispatching the runnable and acquiring reference to still-not-unregistered listener
Right, would need to happen somewhere here:
val recordChangeSubscriber: RecordChangeSubscriber = object : RecordChangeSubscriber {
    override fun onCacheRecordsChanged(changedRecordKeys: Set<String>) {
      if (dependentKeys.isEmpty() || !areDisjoint(dependentKeys, changedRecordKeys)) {
        // scheduling would need to happen somewhere here 
        refetch()
👍 1
what happens with the exception
Excellent question, it's lost forever I'm afraid
The semantics of all these dispatchers and and callbacks aren't great TBH, I'm hoping we can do something better with coroutines
Thinking about this a bit more, I think we should consider errors in subscribers as fatal errors, I don't see a reason to allow failure there
The proper fix is maybe to silently ignore a refetch after a cancel
(and crash if someone throws in a store listener)
but that scares me a bit 😅
w

wasyl

01/27/2021, 12:14 PM
I think I got it 🙂 I think the missing piece for me was that publishing cache updates was a posted runnable, and not part of making an actual call. That is, I though
writeToCacheAsynchronously=false
would mean that publishing is also synchronous. But it’s not, just saving to cache is synchronous, publishing the changes is not
👍 1
m

mbonnin

02/01/2021, 11:33 AM