# apollo-kotlin
j
We're testing offline mode and seeing a strange difference in behaviour between 2 different queries when returning data from the cache. In the working query we see the cached value emitted, followed as expected by an exception when it hits the network, whereas in the other one we only get the exception. I've looked at the data in the cache and confirmed that the entry we're querying for is there. We're using `FetchPolicy.CacheAndNetwork` in both cases. Is there anything else you'd recommend checking to help diagnose? Also, if I use `CacheOnly` as a test then the data does come back.
We're using Apollo Kotlin 4 btw
b
for the one that doesn't work, are you getting a cache miss or network exception?
j
just the network exception I believe... but let me double-check, as there are some wrappers around the exception handling (though they'd be the same in both cases). I should probably also ramp up the Apollo Kotlin logging level (would that also show cache misses?)... I think there's an example of doing that in Confetti 😃
But given that the data does come back when using `CacheOnly`, it seems that wouldn't be the issue?
b
Yeah you're probably right, better be sure though 😊
`CacheAndNetwork` should always emit twice; are you saying you only get 1 emission in the 2nd case? (not with `toFlowV3`)
j
So, I've enabled aeroplane mode to test this
so I just get the expected network error in the second case... but I'm going to double-check the exceptions now
we're still using `toFlowV3` btw, fwiw
b
Oooooh all right!
j
(plan is to update that soon)
I think I tried with `toFlow` but will confirm again (I know there's a difference in exception logic between them)
We're also using `doNotStore(false)` in the query fwiw (but it's used in both cases as well)
b
that shouldn't interfere
j
(actually looks like that's the default anyway....need to confirm why that's being called there)
b
yeah 🙂 `doNotStore(false)` is the same as not setting anything
if you could try `toFlow` and see if you do get 2 emissions and what they look like, maybe that'd help figure out what's going on
j
Will do....probably be tomorrow before I get a chance
Is there anything you'd recommend re. logging setup as well?
b
there's `ApolloClient.Builder.logCacheMisses(...)` that may help troubleshoot
if you didn't use it yet, looking at the contents of the cache is easier with the IDE plugin (which is nicer to use if you set up apollo-debug-server)
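A minimal sketch of enabling that logging, assuming Apollo Kotlin 4 with the normalized cache artifact on the classpath. The server URL, the cache size and the function name `buildLoggingClient` are placeholders invented for illustration, and calling `logCacheMisses` before `normalizedCache` is an assumption about the required ordering, not something stated in this thread:

```kotlin
import com.apollographql.apollo.ApolloClient
import com.apollographql.apollo.cache.normalized.api.MemoryCacheFactory
import com.apollographql.apollo.cache.normalized.logCacheMisses
import com.apollographql.apollo.cache.normalized.normalizedCache

// Placeholder endpoint, for illustration only.
private const val SERVER_URL = "https://example.com/graphql"

// Hypothetical helper: builds a client that reports every cache miss.
fun buildLoggingClient(): ApolloClient =
    ApolloClient.Builder()
        .serverUrl(SERVER_URL)
        // Print each cache miss; swap println for the app's own logger.
        .logCacheMisses { message -> println("Apollo cache miss: $message") }
        // Assumption: a 10 MB in-memory cache stands in for the real cache setup.
        .normalizedCache(MemoryCacheFactory(maxSizeBytes = 10 * 1024 * 1024))
        .build()
```

With this in place, a query that silently falls through to the network should leave a line in the log naming the missing record or field.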
j
yeah, I actually used that to confirm that the entry is in the cache
b
all right 👍
j
did use that option to pull down db from the device....must try out the apollo-debug-server
ok, I couldn't resist trying... so it works if I use `toFlow()` (with the associated update in error handling)!
b
that's good news 🎉 Still intrigued by what's happening with `toFlowV3` though
j
with the updated v4 error handling, should we need to catch `DefaultApolloException` (and, if so, what would it indicate)?
b
no, you shouldn't need to try catch in general
j
btw is there any difference in how we'd detect a cache miss error with the new v4 flow?
a colleague had recommended using `dataAssertNoErrors` and then catching the `DefaultApolloException` that it throws
b
well it's different, since these won't throw but will instead be surfaced in `.exception`, but they will still be `CacheMissException`
j
the updated `dataAssertNoErrors` also checks `exception`
b
if you prefer try/catch then yes, `dataAssertNoErrors` is one way to go
j
ah, I guess we check `cause` in that, which should be set to `CacheMissException` in this case
✅ 1
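The cause check discussed above can be sketched in plain Kotlin. The two exception classes below are stand-ins declared only so the example runs on its own (the real ones live in Apollo Kotlin), and `findCause` and `classify` are hypothetical helper names. Whether the library really sets `cause` to a `CacheMissException` is what the thread suggests, not something this sketch verifies:

```kotlin
// Stand-ins for illustration only; the real classes are provided by Apollo Kotlin.
class CacheMissException(message: String) : Exception(message)
class DefaultApolloException(message: String, cause: Throwable?) : Exception(message, cause)

// Walks the cause chain (starting with the receiver) and returns the first
// throwable of type T, or null if there is none.
inline fun <reified T : Throwable> Throwable.findCause(): T? =
    generateSequence(this) { it.cause }.filterIsInstance<T>().firstOrNull()

// Hypothetical classification used by error-handling code.
fun classify(error: Throwable): String =
    if (error.findCause<CacheMissException>() != null) "cache miss" else "other failure"

fun main() {
    val wrapped = DefaultApolloException("wrapped", CacheMissException("no cached record"))
    println(classify(wrapped))                   // prints "cache miss"
    println(classify(RuntimeException("boom")))  // prints "other failure"
}
```

Walking the whole chain rather than checking only the direct `cause` keeps the check working if a wrapper adds another layer.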
@bod did you by chance find anything with the v3 version that would have caused that cache issue? We're having to at least temporarily switch back to `toFlowV3` until some related error response handling changes are made. I think we can reproduce pretty reliably, so can perhaps add/capture some extra logging? cc @Marco Pierucci
b
Hey! Hmm not sure there is much you can log tbh. You could add breakpoints to try to figure out what's going on - starting with FetchPolicyInterceptors.kt:117 which is where the cache is looked up
j
cool, will try that
strange... I'm seeing the following being executed for the particular query, but I'm not getting into the `toFlowV3().map` block (or the associated `collect`)
```kotlin
emit(cacheResponse.newBuilder().isLast(false).build())
```
b
and does `cacheResponse` have data (no exception)?
j
yeah, looks good
only seems to be for this query for some reason.....trying to see what might be different
so, turns out the issue is not related to that particular query... it happens when we kick off 2 different queries, and we see the issue with whichever one is kicked off first... some cancellation kicking in perhaps...
if I put the invocation of each query in a `runBlocking` then they both correctly get data from the cache
b
hmm that's unexpected. So the scenario is to `toFlowV3` 2 queries in parallel, both with `CacheAndNetwork`, both are cached, while the network is down?
j
yeah, exactly... and it seems to work with `toFlow`...
b
let's see if I can repro on a basic project
j
Just to confirm... we would expect to get the response with the cached version first and then the `ApolloNetworkException` (with `toFlowV3`)?
b
correct. But the exception is thrown so this would interrupt your collect,
I can't seem to reproduce in my simple project that does this:
```kotlin
// First query: collected directly in the current coroutine.
try {
  apolloClient.query(LaunchListQuery())
      .fetchPolicy(FetchPolicy.CacheAndNetwork)
      .toFlowV3()
      .collect { println("0 " + (it.data != null)) }
} catch (e: Exception) {
  println("0 " + e)
}

// Second query: collected in a separately launched coroutine.
GlobalScope.launch {
  try {
    apolloClient.query(LaunchListQuery())
        .fetchPolicy(FetchPolicy.CacheAndNetwork)
        .toFlowV3()
        .collect { println("1 " + (it.data != null)) }
  } catch (e: Exception) {
    println("1 " + e)
  }
}
```
This prints:
```
0 true
0 com.apollographql.apollo.exception.ApolloNetworkException: Failed to execute GraphQL http network request
1 true
1 com.apollographql.apollo.exception.ApolloNetworkException: Failed to execute GraphQL http network request
```
j
ok, thanks for trying....I'll see if I can perhaps get small repro here
I think there are some scope issues at play... though not sure why using the v4 version would affect that.
m
> correct. But the exception is thrown so this would interrupt your collect
Yeah, that's the issue we have with replacing `toFlowV3` 😅
j
Adding `withContext` in those functions that were invoking the queries seems to have "fixed" the issue here (while still using `toFlowV3`)
b
😬 that's a bit worrisome...
I still can't imagine what could prevent this flow from emitting
j
yeah, seems strange all right
Ok, I'm back 😃. The above fixed an issue we had been able to reproduce pretty reliably, but now we're getting one that has mostly been seen by testers. We're getting some non-network exception being thrown (thinking maybe `CacheMissException`), but without being able to reproduce it here I haven't been able to confirm (am going to add more logging). Wondering if it could be related to the following: https://github.com/apollographql/apollo-kotlin/issues/5076
Just realised that was quite an old issue, but it seemed to relate to v4 (albeit I guess alphas at the time)
oh yeah, I guess the key thing about our issue above is that when the user retries it seems to work (again this is offline, so the fact that the retry works would seem to indicate the data is in the cache)
m
> I guess key thing about our issue above is that when user retries it seems to work
This. We have a cache setup that "works", with the caveat that sometimes, within the same session, it seems to have an issue emitting the existing cached data. I'll try to add more specific information tomorrow if I manage to repro
j
We're still using `ApolloCompositeException` btw... wonder if there are any potential implications around use of that (though still using `toFlowV3`)... along with use of `response.dataAssertNoErrors`. Perhaps that's not relevant in this case... again, when it fails we get just `ApolloNetworkException` (and the cached data is not returned)
I've started to try to replicate some of the setup in a minimal way in the following branch: https://github.com/joreilly/StarWars/tree/offline_cache_test
All related code is in `MainActivity` and `StarWarsRepository`
Run it with network, click get data, and then run it again in, say, aeroplane mode (the clear data button resets the data in the associated StateFlows used to store the query results).
Right now I do occasionally see that it doesn't return data from the cache... though it could also be an issue with how I'm using scopes... there are also some related logs (filter on "JFOR" in logcat) that show, for example, that we get the network exception (and no cached data emission before it) in the "failing" case
This is using Apollo 4 and also `toFlowV3`
m
```kotlin
@CacheAndNetworkFlowExceptions
fun <D : Operation.Data> ApolloCall<D>.toFlowWithKrakenFieldException(): Flow<D> {
    return toFlowV3()
        .map { response ->
            response.errors?.takeIf { it.isNotEmpty() }?.let { errors ->
                throw if (errors.containsAuthError()) {
                    UnauthorizedException()
                } else {
                    val message = errors.joinToString { it.message }
                    KrakenFieldClientException(message)
                }
            } ?: response.dataAssertNoErrors
        }.catch { exception ->
            throw when (exception) {
                is ApolloCompositeException -> {
                    val cacheMissed =
                        exception.suppressedExceptions.any { it is CacheMissException }
                    if (cacheMissed) {
                        KrakenFieldCacheMissedException(cause = exception)
                    } else {
                        wrapNetworkException(exception)
                    }
                }

                else -> wrapNetworkException(exception)
            }
        }
}
```
So... something I managed to notice is that when we see the error while executing with our extensions, we sometimes get the `ApolloNetworkException` but it's not intercepted by our `catch` (I actually had a logger there and nothing showed up). I'm not super versed in coroutines, but could there be something in `toFlowV3` that bubbles up an exception? Funny thing is that this behaviour is intermittent at best
j
@mbonnin that gh issue you replied to was in relation to this, but I'm actually not sure it's an actual cache miss (or at least one that's reported to us)
m
So the issue is that sometimes `ApolloNetworkException` is not caught by `catch {}` on the Flow?
A reproducer would help a million times here
j
I posted a reproducer above but actually don't think it exhibits same behaviour re. Exceptions ....but in both cases not seeing cached data emitted consistently
m
Ah, that one? https://github.com/joreilly/StarWars/tree/offline_cache_test Sorry, trying to catch up
j
It also might not exactly capture our scenario...
m
Trying now
So what I'm doing is I got the data from the network once, then go to offline mode and then restart the app and click "Get data". Sometimes I end up in this state
When that happens, I get this:
```
2024-08-16 10:20:27.942 16376-16376 System.out              dev.johnoreilly.starwars.androidApp  I  JFOR: collectPeopleInfo, Data ...
2024-08-16 10:20:27.948 16376-16376 System.out              dev.johnoreilly.starwars.androidApp  I  JFOR: collectPeopleInfo, Exception ...
2024-08-16 10:20:27.956 16376-16376 System.out              dev.johnoreilly.starwars.androidApp  I  JFOR: collectFilmInfo, Exception ...
```
So there's no log for the film data 🤔
j
Yeah, that's what I'm seeing too....we don't seem to be getting cached data....are you also seeing that it sometimes works?
If you click get data again
m
Yep, exactly
(digging into it)
j
I threw this together quickly so not sure there isn't some issue in my code
m
(trying to make the build an included build with apollo-kotlin, this turned out to be a bit more complicated than expected but I'm getting there...)
The MainActivity code doesn't seem very complicated
j
what might be a factor is collecting the 2 flows from that suspend function... with the expectation that the first will finish emitting before we start to collect the next...
we might be refactoring that but I think it's still an ok/valid setup?
m
So I put a log here and it's printed
So `emit` is called but not received, looks like
j
maybe some threading/synchronization issue
m
Ah wait, let me print the actual contents of the emitted response
Can't reproduce with the log of data (tried 7-8 times)
So it does look like a threading issue indeed 🤔
j
and specific to use of `toFlowV3` for some reason...
m
And now I can't reproduce at all, even without the logs 🤔
j
are u using debug version of apollo?
in case some timing issue
m
Could it be that in `toFlowV3`, the flow throws before it is collected on the UI side?
j
we're hopefully going to migrate to `toFlow` soon, but it's probably good to understand the cause anyway
m
There is a `channelFlow` involved. What happens if the `channelFlow` throws and there are still elements in the channel?
Let me write a quick test
Simple test seems to work:
```kotlin
val flow = withContext(Dispatchers.IO) {
    channelFlow {
        send(0)
        send(1)
        send(2)
    }.onEach {
        if (it == 2) {
            throw Exception("ouille")
        }
    }
}

flow.catch {
    println("exception")
}.collect {
    println("got $it")
    delay(1000)
}
```
result:
```
got 0
got 1
exception
```
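The test above throws from a downstream `onEach`. A variant closer to the question asked, where the `channelFlow` block itself throws after sending, could be sketched as follows. This assumes kotlinx.coroutines is on the classpath; `runThrowingChannelFlow` is a hypothetical helper name, and the sketch deliberately records what the collector receives instead of assuming an outcome, since how many elements arrive before the failure is exactly the open question:

```kotlin
import java.io.IOException
import kotlinx.coroutines.flow.catch
import kotlinx.coroutines.flow.channelFlow
import kotlinx.coroutines.runBlocking

// Hypothetical helper: collects a channelFlow whose producer block throws
// after sending, and reports what the collector actually observed.
fun runThrowingChannelFlow(): Pair<List<Int>, Throwable?> = runBlocking {
    val received = mutableListOf<Int>()
    var caught: Throwable? = null

    channelFlow<Int> {
        send(0)
        send(1)
        // The producer fails while earlier elements may still sit in the channel.
        // IOException is used so the failure is not confused with cancellation.
        throw IOException("ouille")
    }.catch { e ->
        caught = e
    }.collect { value ->
        received += value
    }

    received to caught
}

fun main() {
    val (received, caught) = runThrowingChannelFlow()
    println("received=$received caught=$caught")
}
```

If `received` comes back shorter than the list of sent values, that would be consistent with the "emit is called but not received" observation, though it would not by itself prove that this is what happens inside `toFlowV3`.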
Need to take a call, brb
j
btw we did revert locally to the 3.x version we had been using, and it turns out we were seeing that issue with that as well (offline is relatively new functionality in the app)
(but again we don't seem to be seeing the issue with v4 and `toFlow`)
m
> (but again don't seem to be seeing issue with v4 and `toFlow`)
I think this is basically because we moved away from throwing in favor of forwarding `Result`s
Ok, an update in case it's relevant: we seem to have fixed our issues by migrating to `toFlow` from Apollo v4 and not throwing anymore (forwarding exceptions from the `exception` field, or custom exceptions from the `errors` field, in the form of `Result<D>`)
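A rough sketch of that "forward instead of throw" shape, in plain Kotlin. `FakeResponse` is a simplified stand-in invented for this example (it is not the Apollo response type), and `toResult` is a hypothetical name; the point is only that both emissions of an offline `CacheAndNetwork` query reach the collector as values:

```kotlin
import java.io.IOException

// Hypothetical, simplified stand-in for an Apollo response; not the library type.
data class FakeResponse<D : Any>(
    val data: D? = null,
    val errors: List<String> = emptyList(),
    val exception: Throwable? = null,
)

// Maps a response to a Result instead of throwing, so a failed emission
// does not interrupt collection of the ones around it.
fun <D : Any> FakeResponse<D>.toResult(): Result<D> = when {
    exception != null -> Result.failure(exception)
    errors.isNotEmpty() -> Result.failure(IllegalStateException(errors.joinToString()))
    data != null -> Result.success(data)
    else -> Result.failure(IllegalStateException("Response had no data"))
}

fun main() {
    // Offline CacheAndNetwork, as described in the thread:
    // first the cached data, then the network failure.
    val offlineEmissions = listOf(
        FakeResponse(data = "cached films"),
        FakeResponse<String>(exception = IOException("network down")),
    )
    offlineEmissions.map { it.toResult() }.forEach { println(it) }
}
```

Because the failure travels as a `Result`, the UI can show the cached data and the error side by side rather than losing one to the other.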
m
Hi folks, apologies, today was a bit hectic. And thanks for the follow-up 🙏. Moving to `toFlow()` is the path forward 👍. I spent a bit of time with the reproducer this afternoon but I'm still not 100% clear what the issue is. If anything I'm confused about how `channelFlow {}` should handle errors 🤔
I'll keep digging and update this post if I find anything