okhttp interceptors question. application vs netwo...
# squarelibraries
c
okhttp interceptors question. application vs network. i use a third party library for inspecting traffic (chucker) and i recently noticed that 304s (cache hits) dont show as 304s when set as an app interceptor. if I switch it over to a network interceptor. i see 304's, but I also see one error along with each call. Is there something in okhttp that could be the culprit of sending an error to the network inspector library (chucker)
Here is with it set as app interceptor
If I have it as a network interceptor
network interceptor with charles looks fine on charles
j
are those exceptions?
c
If I drill into a failure it says
If it helps. I'm making these network calls every 10 seconds in the app so we can see the cache hits being done. I did also increase to 1 minute, and same issue.
j
that exception gets thrown when the response promises a body but fails to deliver one
or delivers a body when it said it wouldn't
so i would compare the content-length header values and the body length in charles
c
oooh. so it could be my backends fault problem!
ok. original response has
Copy code
HTTP/1.1 200 OK
Date: Thu, 01 Feb 2024 19:41:30 GMT
Server: Apache/2.4.41 (Ubuntu)
X-Powered-By: Express
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Last-Modified: 2024-01-25T14:55:47.485Z
Content-Type: application/json; charset=utf-8
Content-Length: 464
ETag: W/"1d0-Zvvdd0uIGAOxUBBQXVtRrGwrlp"
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive

{"status":"success","data":[{"ABL":1,"AsdasdLevel":1,"a":1,"b":1,"c":4,"a":4,"r":4,"e":0,"g":0,"f":0,"i":0,"h":0,"ee":0,"ListType":1,"j":null,"k":[1234,4321],"l":3,"m":1,"n":false,"o":{"p":"2024-01-25T14:55:47.485Z","rules":[]}}]}
I obfuscated the return body.
the cache hit response body is
Copy code
HTTP/1.1 304 Not Modified
Date: Thu, 01 Feb 2024 19:45:43 GMT
Server: Apache/2.4.41 (Ubuntu)
Connection: Keep-Alive
Keep-Alive: timeout=5, max=100
ETag: W/"1d0-Zvvdd0uIGAOxUBBQXVtRrGwrlp"
when I put the original response through a byte calculator. it does indeed get set to 464.
very interesting...
i wonder if this is by chance just an issue in the network inspector lib.
But i guess its showing an actual stack trace. so i guess it must be encountering some legit Exception.
hm. could it be that the server response (either the first time or second time) should return an empty line
\n
?
j
i believe http demands a trailing newline, yes
the headers are separated from the body by a trailing newline
a response with no body still contains a trailing newline (or two, if you count the one at the end of the last header)
c
holy shit. thats what it was. if i add an extra newline via a breakpoint in charles... then everything works fine.
wow
talk about a TIL
still weird that chucker shows 2 requests for the one request. even though the actual network request fails, but then it shows a success since i guess it has no issue showing the cached content. same goes for okhttp i guess. like. i find it weird that my app didn't crash. and seemingly i still get the 200 on the other side.
fun fun fun.
gotta go read the spec about the trailing new lines and send it to my backend team
thanks @jw
First paragraph
😍 1
Status, headers, empty line, optional body
😍 1
c
newline character missing in okhttp response. ^ Sorta related to the above, but kinda a new question... Any ideas why I get this response initially
Copy code
HTTP/1.1 404 Not Found
Date: Thu, 21 Mar 2024 21:28:13 GMT
Server: Apache/2.4.41 (Ubuntu)
X-Powered-By: Express
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Content-Type: application/json; charset=utf-8
Content-Length: 28
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive

{"Error":"404 Not Found..."}
and then I make the same request (with caching enabled on my okhttp client) and I get an exception of
java.io.EOFException: \n not found: limit=0 content=…
It doesn't even look like my second request makes it over the network. Looks like it short-circuits internally. Stacktrace says readResponseHeaders... but it didn't go out to the network... so it's using a cached response? Going to try to remove some headers... and sentry is somehow in there. maybe that's the culprit.
lol. okay. i disabled caching. and the issue still happens. idk whats going on
lmaooo. removed sentry and now no issues.
nevermind. still seems to happen. like in this case. made the call 4 times. no issues. 5th time got the /n error. hmmmmmmm 🤔
Just updated from 4.11.0 to 4.12.0... seems to work now... let me test some more. sorry for rubber ducking in here. lol
lol. worked fine 10 times in a row. 11th time. hit the \n error. WHAT IS GOING ON. lol
Like... does this look sus to anyone
Copy code
HTTP/1.1 404 Not Found
Date: Thu, 21 Mar 2024 22:47:06 GMT
Server: Apache/2.4.41 (Ubuntu)
X-Powered-By: Express
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Content-Type: application/json; charset=utf-8
ETag: W/"FOO"
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Length: 28

{"Error":"404 Not Found..."}
j
This response parsed successfully though, right?
I think it’s the subsequent one that’s failing?
Would you consider setting
retryOnConnectionFailure
on your client?
I suspect your server is timing out the socket connection and that’s racing with your outbound call, and the net result is this exception
If you set
retryOnConnectionFailure
OkHttp will automatically recover from this error for you
c
As far as I can tell (I have proxyman/charles on) it looks like the requests are suceeding successfully.
I have no issue enabling retryOnConnectionFailure. but I would say its weird that my calls are failing. especially since charles makes it seem like its not.
whats more interesting is that (since my network module is in a separate jvm-only module) I'm able to write some tests that just run on the jvm... and no issue. only seems to be an issue when running the network module in my android code.
could it be... my proxy?
i removed all my proxy breakpoints... even though I'm "continuing" the breakpoints instantly... it maybe seems like the \n error only happens when breakpoints on my proxy are on?
let me test some more so i can actually be sure.
nope. same thing.
darn.
and i have retryOnConnectionFailure enabled too
im on a samsung device... lets try a pixel instead 😃
lol. tried on pixel like 20 times. everything was fine. 21st time. \n error. Wat the heck. Has to be a server thing... no?
server somehow killing me
hard to tell if it dies before the request goes out, or after. When hooked up to charles... it seems like it happens before the request actually makes it over the network
http client is also pretty darn simple
Copy code
val client =
  OkHttpClient.Builder()
    .connectTimeout(20, java.util.concurrent.TimeUnit.SECONDS)
    .readTimeout(20, java.util.concurrent.TimeUnit.SECONDS)
    .writeTimeout(20, java.util.concurrent.TimeUnit.SECONDS)
    .retryOnConnectionFailure(true)
    .build()
i wonder if its a viewmodel thing. maybe my viewmodel scope is being killed sometimes. killing this network call mid flight?
j
I don’t think so
I would have expected that EOFException to be retried
c
I think it might be my VM
doing some more testing. but added logs to my VM.
k nope
sync is the name of the endpoint call. so. you can see it was good if i keep hitting the sync button. but eventually, that last one ended up hitting the \n breakpoint
so weird
its gotta be something dead simple...
whoops. in that last test run. i had .retryOnConnectionFailure(true) commented out. just to try to test the VM stuff. let me see if i can repro with that commented in.
j
retryOnConnectionFailure is the default . . . I wonder why it isn’t retrying
c
oh. i thought false was the default.
yeah. even with that specifically set to true, i launched the app. and this time it took two calls to sync for it to fail
and now 4 times in a row it happens on the second call to sync
Going to try to get some sleep. if anyone sees something interesting in this stack, let me know. maybe theres some path in there (like cache interceptor) that we shouldn't be hitting if we don't have cache enabled? maybe theres a rogue cache lingering?)
makes no sense why it doesn't want to repro on jvm
j
The cache interceptor is always installed to implement
Cache-Control: only-if-cached
1
c
ah. i think maybe i know the issue
Copy code
@Composable
fun LifecycleEvents(viewModel: MyViewModel) {
  val lifecycleOwner = LocalLifecycleOwner.current
  DisposableEffect(lifecycleOwner) {
    val observer = LifecycleEventObserver { _, event ->
      if (event == Lifecycle.Event.ON_RESUME) {
        viewModel.syncWithBackend()
      }
    }

    lifecycleOwner.lifecycle.addObserver(observer)

    onDispose { lifecycleOwner.lifecycle.removeObserver(observer) }
  }
}
I bet this composable is the culprit. because it only seems to happen when i jump back into the app
or. at least that composable has something to do with it.
something with the onDispose is my hunch
of course now i add logs all over that composable and no issue. lmaooo. this is a funky one
maybe its not that. but its kinda the only "weird" thing. even though it still seems pretty airtight.
Copy code
fun syncWithBackend() {
  viewModelScope.launch {
      syncUseCase.sync()
  }
}
lets see if i remove that lifecycle stuff. and instead just have a request that is sent via button press if that changes anything 🤞
nope. still repro'd. lmao. oh well. so much for that theory
but it still repros, if i leave app, come back, and hit the sync button. it seems to fail the first time... every time.
well. it doesn't repro every time when i do that. but like 50% when i do that it'll repro. I think its like an intermittent server issue. thats the only thing i can think of.
ran that network call in a jvm unit test. repeat(1_000). can't repro. will try the same in android app instead of click a button every time
that worked fine too. its like it doesn't like the first connection to the server... sometimes. all subsequent calls seem fine tho
j
I vaguely recall there being an HTTP proxy thing that’s broken in the Android emulator?
c
was able to repro again like 10 times in a row by leaving the app and just making the first network call when i come back. its legit like the network isn't ready or something when i pop back into the app lol
j
I wonder if that’s your problem
c
this is a real device tho
j
Or you’re on a real device
Yeah
TLS?
c
whats that website again
lmao
its probably that
j
Are you using TLS or not?
c
yeah. the api is https...
j
Okay, that actually eliminates a bunch of potential problems because you’re probably not talking to dumb software that’s rewriting your requests
c
let me try to remember that tls tracer/verifier website you showed me years ago
j
That’s not it
c
oh
j
Your TLS handshake is working
I think your only real problem is that the EOFException isn’t being retried. That one weirds me out
c
do you think its being retried in subsequent requests that are successful?
its frustrating that i cant seem to repro on jvm honestly.
maybe i'll keep paring down my android app to see when it stops behaving that way?
same thing if i try curl. can't seem to repro there
j
Have you caught it in a debugger yet?
c
yeah
every time i mention me catching it is with me in the debugger
j
What happens when it gets to RetryAndFollowUpInterceptor.recover()
c
it doesn't technically crash or anything.
but im also using retrofit + eithernet and stuff. so not sure if anything is hiding the issue. really the only way to tell its happening is via Chucker
and at first i was like "oh. its maybe just a weird thing with chucker" but ive tried other api endspoints, and everything works fine. its just this api endpoint that is problematic.
setting a breakpoint here. will let you know what happens.
Seemed to get to return true
j
so . . . does it retry?!
c
I'm fairly certain it is retrying successfully. yeah.
If I make 10 requests in a loop. I see 11 in chucker. the first 1 is the \n error, the next ten succeed
im still interested in why its failing though and has to retry in the first place. to me that seems to be the issue to resolve? im starting to get the feeling that your POV is that "this is fine, let it do its thing".
is there some way to check if the /n error is due to the actual response missing \n OR if it because the response actually got interupted?
j
It’s not a missing newline character
It’s an end-of-stream
The server is closing the connection, the client is attempting to reuse
This is a thing
c
ooof. okay. so i feel like this is my fault for being misled, but if you go to the top of this thread... we did actually identify an issue where my 304's from my backend were not sending a new line and jake pointed out that was incorrect. after we did that, everything seemingly worked, but then today we noticed that we were getting those errors again and we thought something regressed. reusing the connection and server is closing! TIL! I didn't know it would be such a frequent thing. wonder if there are knobs for that on the server. but cool. will let them know. sorry for being misled on this newline char issue 😭
👍🏻 1