# apollo-kotlin
s
Now that we're using 4.x and have a nicer way of handling errors without exceptions, I wanted to clean up how we separate real errors (like the backend responding with an error because of real business logic errors) that happen during network requests from all the noise we unfortunately log (because I haven't done it right) when requests fail due to a slow internet connection timing out, or fail immediately due to no internet at all. How are y'all doing this in your apps? Should it be as simple as grabbing the exception, or perhaps the suppressed exception in case it's the second one, and checking if it's of the right Throwable subclass? Any gotchas you have encountered trying to do this?
m
"backend responding with an error because of real business logic errors"
You'll have to understand how your backend returns errors. Typically, I would expect a GraphQL error, not a fetch error here
If you're getting a GraphQL error, I'd recommend checking for
response.data == null && response.errors.orEmpty().isNotEmpty()
s
Right, so backend errors that surface as errors and not exceptions should be quite clear to differentiate. Data must be null and errors must be non-null. There's pretty much always a reason for us to log those as errors, since something is going wrong which is probably resolvable on our end. Same if there is data but also errors at the same time, which would be a partial data scenario. Again, this must be something we can fix on our end, so we do want to log those as errors too.
Then I basically need to figure out, when there are exceptions and suppressed exceptions, which of these cases I'm in:
1. Something is actually going wrong somewhere, so I should still log it.
2. There is an exception due to a bad network, which I should just ignore and return a failure to the caller, but not log, since there's nothing actionable for us to do.
3. There is an exception which is neither a bad network one nor something actually going super wrong, but something like a cache miss (which was the case in Apollo v3), which is fine to happen. Everything should continue normally and I shouldn't log it, since there's nothing actionable for us to fix.
I mostly need to figure out when I am in scenario #2 or #3, so I can confidently not worry about those.
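The scenarios above could be sketched roughly as follows. This is only an illustration under stated assumptions: `StandInResponse` and `StandInCacheMiss` are hypothetical stand-ins (not Apollo types) that mimic the data / errors / exception shape discussed here, so the snippet runs on its own. In a real app the same branching would be applied to the library's response and exception types.

```kotlin
// Hypothetical stand-ins so the sketch is self-contained; these are NOT Apollo classes.
class StandInCacheMiss(message: String) : Exception(message)

data class StandInResponse<D>(
    val data: D?,
    val errors: List<String>?,   // stand-in for the GraphQL errors list
    val exception: Throwable?,   // stand-in for the exception carried by the response
)

enum class Outcome { SUCCESS, PARTIAL_DATA, GRAPHQL_ERROR, CACHE_MISS, FETCH_ERROR }

fun <D> classify(response: StandInResponse<D>): Outcome {
    val hasErrors = response.errors.orEmpty().isNotEmpty()
    return when {
        // Scenario 3: a cache miss is expected and not actionable.
        response.exception is StandInCacheMiss -> Outcome.CACHE_MISS
        // Scenarios 1 and 2: some other exception; needs a closer look at its type.
        response.exception != null -> Outcome.FETCH_ERROR
        // Data and errors together: partial data.
        response.data != null && hasErrors -> Outcome.PARTIAL_DATA
        // No data but errors: the backend rejected the request.
        hasErrors -> Outcome.GRAPHQL_ERROR
        // No exception and no errors: treated as success in this sketch.
        else -> Outcome.SUCCESS
    }
}

// Only outcomes that point at something fixable on our side are logged at error level.
fun shouldLogAsError(outcome: Outcome): Boolean =
    outcome == Outcome.GRAPHQL_ERROR || outcome == Outcome.PARTIAL_DATA
```

`FETCH_ERROR` is deliberately not logged at error level here. Whether a given fetch error deserves a louder log is the open question, so that decision is left to a separate check on the exception itself.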
The doc describes "Fetch errors" as:
• The app is offline or doesn't have access to the network.
• A DNS error occurred, making it impossible to look up the host.
• An SSL error occurred (e.g., the server certificate isn't trusted).
• The connection was closed.
etc.
It also mentions that we can only know about these by looking at the exception inside the response. That means the responsibility of knowing which of those maps to what kind of exception is on our end, right? I am not looking for an exhaustive list of what can happen here. I am mostly worried about what I said above: differentiating between the ones that are actionable on our end and the ones that we can just ignore and treat as failed requests.
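One possible way to do that mapping on the app side is to walk the exception, its causes, and its suppressed exceptions, and look for well-known JDK connectivity exceptions. The sketch below assumes a JVM/Android target and assumes that the underlying HTTP engine surfaces these JDK types somewhere in the chain, which should be verified for the engine in use. The function names are made up for illustration, and which types count as "noise" is a judgment call (SSL errors are left out here because they can go either way).

```kotlin
import java.io.IOException
import java.net.ConnectException
import java.net.SocketTimeoutException
import java.net.UnknownHostException

// Collects the throwable itself plus everything reachable through
// `cause` and suppressed exceptions, visiting each instance once.
fun Throwable.withCausesAndSuppressed(): List<Throwable> {
    val seen = LinkedHashSet<Throwable>()
    val queue = ArrayDeque<Throwable>()
    queue.add(this)
    while (queue.isNotEmpty()) {
        val current = queue.removeFirst()
        if (!seen.add(current)) continue        // guards against cyclic cause chains
        current.cause?.let { queue.add(it) }
        queue.addAll(current.suppressedExceptions)
    }
    return seen.toList()
}

// Assumption: DNS failures, timeouts, and refused connections are treated as
// non-actionable network noise. Adjust the list to taste.
fun isLikelyConnectivityNoise(throwable: Throwable): Boolean =
    throwable.withCausesAndSuppressed().any {
        it is UnknownHostException ||
            it is SocketTimeoutException ||
            it is ConnectException
    }

fun main() {
    // A timeout hidden as a suppressed exception of an earlier failure.
    val first = IOException("first attempt failed")
    first.addSuppressed(SocketTimeoutException("timed out"))
    println(isLikelyConnectivityNoise(first))
    println(isLikelyConnectivityNoise(IllegalStateException("bug in our code")))
}
```

Anything that does not match could still be logged, just at a lower level than error, so that unexpected failures stay visible without drowning out actionable ones.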
m
It's a tough problem, and one that doesn't have a very good answer IMO. Is a TLS handshake failure recoverable? Yes if you have forgotten to renew your certificate, no if the user is behind a captive portal that intercepts traffic
Is a premature EOF recoverable? Yes if your server finished early. No if some intermediate CDN proxy fails
Etc...
s
Yes, you are absolutely right. Let me preface this with another thought then 😅 I am in no way trying to get the perfect solution going here. I just want to stop having error-level logs in our system which are 99% about the user's network being down, while also not silencing real error logs that we send when something else is wrong and we still wanna know about it. Perhaps the answer to my question is that "something else is wrong" isn't really a thing when there are exceptions now with 4.x, and all of them are more or less not something recoverable on our end, at least most of the time, barring super edge cases like, as you say, us forgetting to renew our certificate 😅 I will still log those so we can look at them and take action when needed, but perhaps not at an error level which treats them as actual errors.
m
Yea, I hear you, it's just very difficult to find a good signal/noise ratio there
In a way, once you exclude the GraphQL errors and CacheMissExceptions, the problem is a general HTTP/network problem. I've asked myself that question a bunch of times but not sure there's a definitive answer
s
Should CacheMissException be a thing in 4.x btw? Shouldn't that just be an emission with null data now? This is what this https://www.apollographql.com/docs/kotlin/migration/4.0/#emitcachemissesboolean-is-removed made me believe. But yeah you are right, at that point we leave the GQL land indeed.
m
The response is emitted but still contains CacheMissException as a way to include more debugging information, so you should probably filter them out. I'll edit the wording there
It's not exactly the same question though. It's "should I retry that request?" vs "can I fix that request?"