FYI I opened <https github com apollographql apollo android kotlinlang #apollo-kotlin

FYI I opened <https://github.com/apollographql/apo...

wasyl

01/26/2021, 2:14 PM

FYI I opened https://github.com/apollographql/apollo-android/issues/2899 which is super weird, and we spent better part of recent days chasing it down. I’m available for clarification if needed, esp. if it’s non-reproducible for Apollo folks

👀 1

mbonnin

01/26/2021, 2:39 PM

Thanks for the awesome reproducer, I can reproduce as well

🎉 1

mbonnin

01/26/2021, 4:13 PM

If you get a chance to try https://github.com/apollographql/apollo-android/pull/2900, that fixes it for me.

wasyl

01/26/2021, 5:38 PM

Is it published somewhere?

wasyl

01/26/2021, 5:39 PM

Btw what do you mean by

NETWORK_ONLY watcher was cancelled when trying to refetch

does the first query refetch something in my repro? Or it’s just that it’s cancelled right after it emits first value that’s the issue?

mbonnin

01/26/2021, 6:38 PM

Is it published somewhere?

Not yet, there will be snapshots once it is merged. If you have the repo checked out locally, you can do

./gradlew publishToMavenLocal

and add

mavenLocal

to your repositories to try it out.

👍 1

mbonnin

01/26/2021, 6:39 PM

does the first query refetch something in my repro?

Yea, the first query is a watcher so technically it refetches with a cache_first refetch policy but gets cancelled as soon as the network finishes because the flow gets cancelled

mbonnin

01/26/2021, 6:39 PM

If you replace the first query

toFlow()

.await()

you won't have the problem

wasyl

01/26/2021, 8:47 PM

Thanks a lot 🙂 We can’t apply

.toFlow() -> .await()

fix in our codebase because abstractions, also we create watcher quite early and there’s no

.await()

on that. And testing on CI would be slightly too complicated if we had to build the library locally, so we’ll have to wait for a snapshot. I could only confirm I don’t observe the issue on my repro project anymore with the snapshot

mbonnin

01/26/2021, 8:49 PM

👍👍I just merged the pr. Snapshots should start to appear in ~30min

wasyl

01/26/2021, 8:50 PM

🙏 we’ll test it first thing tomorrow 🙂

wasyl

01/26/2021, 8:50 PM

Although how the fix works escapes me 😉 In the PR you’re delaying the error that happens in

onCacheRecordsChanged

wasyl

01/26/2021, 8:51 PM

Is the

RealApolloStore#publish

called from somewhere else than the networkOnly watcher? I thought the issue is that cancelling the watcher also interrupts

publish

call

mbonnin

01/26/2021, 8:52 PM

The store has 2 listeners. The first one throws so the second one doesn't get the update

mbonnin

01/26/2021, 8:53 PM

And the second one is the one that we want to get the update

mbonnin

01/26/2021, 8:53 PM

Tbh I think the first one shouldn't throw. It might be ok call refetch in 'cancelled' state

mbonnin

01/26/2021, 8:54 PM

But that's a more intrusive fix which I'd keep for 3.0

👍 1

wasyl

01/26/2021, 9:02 PM

So the

ApooloStore#publish

is just posted on some executor, this call is not interrupted by anything? And the race condition is that with working case, we’d have: • network call completes, posts runnable to publish cache changes • watcher is canceled, unsubscribes

recordChangeSubscriber

• store publishes the changes And the not working case would be that • network call completes, posts runnable to publish cache • store starts publishing the changes, gets the first subscriber •

first()

cancels the flow, so the watcher is cancelled • store calls the first subscriber’s callback but it’s already canceled and throws exception ?

wasyl

01/26/2021, 9:03 PM

Feel free to let me know if this is too much to explain 🙂 I just want to understand better how Apollo works internally, as it’s quite helpful sometimes

👍 1

wasyl

01/27/2021, 11:52 AM

Seems the most recent snapshot fixes the issue for us 🎉 Thank you 🙂

wasyl

01/27/2021, 11:52 AM

Is there any release planned soon or not yet?

mbonnin

01/27/2021, 11:57 AM

Seems the most recent snapshot fixes the issue for us

Nice glad to hear that! 🎉

Is there any release planned soon or not yet?

There's not much activity in the main branch these days so releases tend to stretch a little bit. Ok if I do one on monday? I like Monday releases

wasyl

01/27/2021, 11:58 AM

Sure, no worries 🙂

👍 1

mbonnin

01/27/2021, 11:59 AM

I just want to understand better how Apollo works internally, as it’s quite helpful sometimes

Sorry I missed the question above. No problem at all, brainstorming is good and we can have more people know about the internals 👍

mbonnin

01/27/2021, 12:00 PM

Yup what you described seems accurate

mbonnin

01/27/2021, 12:01 PM

Nitpicking, I think in the not working case this happens: • network call completes, posts runnable to publish cache •

first()

cancels the flow, so the watcher is cancelled • the runnable is dispatched • store calls the first subscriber’s callback but it’s already canceled and throws exception • the second subscriber (cache_only) doesn't get the store update

mbonnin

01/27/2021, 12:02 PM

The publishing is all synchronous, I don't think it can be interrupted

wasyl

01/27/2021, 12:05 PM

If the watcher is cancelled, it unregisters the listener though, right? So the cancelling would’ve had to happen already after dispatching the runnable and acquiring reference to still-not-unregistered listener?

mbonnin

01/27/2021, 12:05 PM

Mmm good call

wasyl

01/27/2021, 12:08 PM

I think that’s why it’s happening so rarely. So my last question: if the

publish

call is just a runnable on a dispatcher, what happens with the exception? Is it thrown just to stop some potential further processing? Is it logged? Because as I understand it shouldn’t be propagated anywhere to the app code

mbonnin

01/27/2021, 12:09 PM

So the cancelling would’ve had to happen already after dispatching the runnable and acquiring reference to still-not-unregistered listener

Right, would need to happen somewhere here:

Copy code

val recordChangeSubscriber: RecordChangeSubscriber = object : RecordChangeSubscriber {
    override fun onCacheRecordsChanged(changedRecordKeys: Set<String>) {
      if (dependentKeys.isEmpty() || !areDisjoint(dependentKeys, changedRecordKeys)) {
        // scheduling would need to happen somewhere here 
        refetch()

👍 1

mbonnin

01/27/2021, 12:09 PM

what happens with the exception

Excellent question, it's lost forever I'm afraid

mbonnin

01/27/2021, 12:10 PM

The semantics of all these dispatchers and and callbacks aren't great TBH, I'm hoping we can do something better with coroutines

mbonnin

01/27/2021, 12:11 PM

Thinking about this a bit more, I think we should consider errors in subscribers as fatal errors, I don't see a reason to allow failure there

mbonnin

01/27/2021, 12:12 PM

The proper fix is maybe to silently ignore a refetch after a cancel

mbonnin

01/27/2021, 12:13 PM

(and crash if someone throws in a store listener)

mbonnin

01/27/2021, 12:13 PM

but that scares me a bit 😅

wasyl

01/27/2021, 12:14 PM

I think I got it 🙂 I think the missing piece for me was that publishing cache updates was a posted runnable, and not part of making an actual call. That is, I though

writeToCacheAsynchronously=false

would mean that publishing is also synchronous. But it’s not, just saving to cache is synchronous, publishing the changes is not

👍 1

mbonnin

02/01/2021, 11:33 AM

The fix is released: https://github.com/apollographql/apollo-android/releases/tag/v2.5.3

🙏 1

2 Views

Open in Slack

Previous Next