Hey all. I'm investigating an issue between OkHttp...
# squarelibraries
i
Hey all. I'm investigating an issue between OkHttp and a remote server that seems to be closing connections pretty quickly—faster than the connection pool's configured
keepAliveDuration
. From the TCP/SSL dumps I've examined, it appears the server is properly sending an SSL
close_notify
message followed by a TCP FIN packet but it seems OkHttp doesn't recognize those when they happen. Much later, it may retrieve that connection from the pool, send a request, then attempt to read a response, upon which it throws an
IOException: unexpected end of stream on <server-name>
. So broad questions to start with: Is it expected that OkHttp ignores remote SSL/TCP closure events? And only discovers them after sending a request and attempting to read the response?
y
It's not surprising to be honest. More generally given the java blocking Socket API, those read or write operations are generally where you will detect it.
But especially on Android the network stacks can delay suppress socket closing.
i
I see. Do you know why sending the request appears to succeed but reading the response is where it fails? Is
retryOnConnectionFailure(true)
the only way to handle this?
y
That should be true by default
i
That's true but we disabled it for other reasons.
y
It should hopefully be safe, retrying requests that are either safe to retry, or it's clear they can't have started on the server.
The health checks should minimise the failures, but maybe not guaranteed
i
In this particular case, the call can't have started on the server because the socket is closed but OkHttp doesn't know that and appears to send the request anyway. How does OkHttp determine this is "safe" to retry?
And one more I'll dig out
i
Thanks for the links. My concern is about non-idempotent calls to the server—say, mutating some server-side state. The end-of-stream error that
retryOnConnectionFailure(true)
would retry is only triggered after the request has been "sent". Now, because in this case the server has closed the connection, the data haven't really been _sent across the wire—_merely sent to some buffer somewhere, which presumably would timeout at some point. But at that point in the call execution, it doesn't seem like OkHttp knows whether the data have actually made it to the wire. If the connection was broken after the request was sent and the call is not idempotent, it would not be safe to retry. Am I missing some other validation OkHttp uses to be safe about this? Or would setting
retryOnConnectionFailure(true)
potentially expose us to re-executing non-idempotent calls?
y
The errors that okhttp treats as connection errors are meant to account for whether it could be executing. But I'm a bit rusty on this.
If you can reproduce it doing the wrong thing, it's easy to diagnose. I'd have to check for that particular error you mention.
i
OK, let me see if I can assemble a minimal repro. Thanks for the help so far Yuri. 😊
y
Some discussion from @jessewilson https://publicobject.com/2023/11/20/idempotent-apis/ that indicates some caution needed
🫡 1
j
OkHttp special cases GET requests on the assumption that retrying them is safer
i
I don't see that logic in
RetryAndFollowUpInterceptor
...where is the special casing for GETs?
When it takes a connection from the pool it does an expensive connection health check unless it’s a GET
i
OK, so it appears that
doExtensiveHealthChecks
returns false for GET. Then
RealConnection.isHealthy
uses that plus idle time to determine whether to invoke an additional socket-level test. So I can see how non-GET requests get extra verification of socket health on connection acquisition...is that the thing which prevents resending a non-idempotent request? Because I don't yet see how that affects the retry logic in
RetryAndFollowUpInterceptor
. 🤔
j
It’s intended to reduce the opportunities for non-GET calls to require retries