# ktor
Hey everyone, I'm experiencing a super weird bug that I can't find the answer to, so that's why I'm here asking about it. When I'm querying my Ktor server behind an nginx reverse proxy with the Ktor client, sometimes the request fails with:
```
Exception in thread "main" java.io.EOFException: Chunked stream has ended unexpectedly: no chunk size
	at io.ktor.http.cio.ChunkedTransferEncodingKt.decodeChunked(ChunkedTransferEncoding.kt:77)
	at io.ktor.http.cio.ChunkedTransferEncodingKt$decodeChunked$3.invokeSuspend(ChunkedTransferEncoding.kt)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.internal.LimitedDispatcher.run(LimitedDispatcher.kt:42)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)
```
and nginx logs:

```
upstream prematurely closed connection while reading upstream
```
Keep in mind the "sometimes": it only happens randomly, with no rhyme or reason as to why. But here's the catch: I'm not able to reproduce this bug with curl! If I attempt to replicate it with curl, nginx does still log `upstream prematurely closed connection while reading upstream` when this happens, BUT curl successfully returns the response HTTP status code (200) and I can also download the response body without any issues! So the error log doesn't even make sense, since the response is correct. I'm still attempting to debug this, but the weird part is that Ktor fails to parse the response while curl parses it without any issues.
There is a fix that I found: if I add this to the nginx location block
```nginx
proxy_http_version 1.1;
proxy_set_header Connection "";
```
the issue goes away altogether, but why??? It doesn't make any sense! I thought that maybe it was something related to keep-alive, but nope: nginx forwards an HTTP/1.0 request plus `Connection: close` to the webapp, so it couldn't be a keep-alive connection causing issues.
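For context, a minimal location block with the fix applied might look like this (the `proxy_pass` address is a placeholder for the actual upstream):

```nginx
location / {
    # placeholder: replace with the real Ktor server address
    proxy_pass http://127.0.0.1:8080;

    # Talk to the upstream with HTTP/1.1 and clear the Connection header,
    # so nginx doesn't send its default "HTTP/1.0 + Connection: close"
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}
```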
There is this Stack Overflow question that also talks about the "Chunked stream has ended unexpectedly: no chunk size" error, but I don't know if it's related or not: https://stackoverflow.com/questions/75758637/ktor-chunked-stream-has-ended-unexpectedly-no-chunk-size
Another thing: the server is using the Netty engine. After testing a bit more, I cannot reproduce the bug with the CIO engine, while with Netty it randomly fails. I have a theory: I think the Netty engine has a race condition that causes the connection to be closed BEFORE nginx has received all of the data. This would only affect HTTP/1.0 clients that send `Connection: close` (which nginx does by default).
By default, nginx talks to the upstream with HTTP/1.0 and the `Connection: close` header. That header means the connection will be closed after the response is received. Maybe there's a race condition in Ktor's Netty engine where the server closes the connection before the client (nginx) has fully read the response. When nginx detects that the upstream closed the connection prematurely, it thinks "oh shit" and flushes the data it has received downstream. It does know the response status is complete (after the error is thrown, nginx logs that it sent an HTTP 200 response to the client); it just didn't receive the entire body before flushing downstream, and that incomplete response borks most HTTP clients. curl is unaffected because curl is curl, so I guess curl has some failsafes.

The reason why using HTTP/1.1 and removing the `Connection` header works is that, according to the HTTP/1.1 spec, the connection defaults to `keep-alive` when the `Connection` header isn't present. The server then doesn't close the connection, because it thinks nginx will reuse it later, making the client (in this case, nginx) handle the connection-close step, which avoids the issue. This is the only plausible theory I have right now, and it would explain why setting HTTP/1.1 + removing the `Connection` header OR switching HTTP engines fixes the issue.
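The failure mode in the theory above can be simulated with plain JDK sockets, no Ktor involved. The "server" below is a deliberately broken stand-in: it sends a chunked response but closes the connection before the terminating zero-length chunk, which is exactly the kind of truncated stream a strict chunked parser must reject:

```kotlin
import java.net.ServerSocket
import java.net.Socket
import kotlin.concurrent.thread

fun main() {
    val server = ServerSocket(0) // bind an ephemeral port
    val port = server.localPort

    // Fake upstream: answer with a chunked response, then close the socket
    // BEFORE sending the final "0\r\n\r\n" chunk (the suspected premature close).
    thread {
        server.accept().use { sock ->
            val reader = sock.getInputStream().bufferedReader()
            while (reader.readLine()?.isNotEmpty() == true) {
                // drain the request line and headers until the blank line
            }
            val out = sock.getOutputStream()
            out.write((
                "HTTP/1.1 200 OK\r\n" +
                "Transfer-Encoding: chunked\r\n" +
                "\r\n" +
                "5\r\nhello\r\n" // one chunk, but no terminating 0-length chunk
            ).toByteArray())
            out.flush()
        } // socket closed here, mid-stream
        server.close()
    }

    // Client: read the raw bytes until EOF. A compliant chunked decoder must
    // treat EOF before the 0-length chunk as an error ("no chunk size").
    Socket("127.0.0.1", port).use { sock ->
        sock.getOutputStream().write("GET / HTTP/1.1\r\nHost: x\r\n\r\n".toByteArray())
        val raw = sock.getInputStream().readBytes().decodeToString()
        val terminated = raw.endsWith("0\r\n\r\n")
        println("terminated=$terminated") // prints terminated=false
    }
}
```

So Ktor's client is arguably doing the spec-correct thing by throwing `EOFException` here; curl is just more lenient about a stream that ends after a complete chunk.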