# ktor
n
I have a strange issue... My Kotlin applications are running on Ubuntu Server and Raspberry Pi OS (server). They poll a file every 5-10 seconds using CIO-backed HTTP clients, checking for changes. They work as expected for hours, but then suddenly every request runs into a timeout:
io.ktor.client.network.sockets.ConnectTimeoutException: Connect timeout has expired [url=..., connect_timeout=3000 ms]
        at io.ktor.client.plugins.HttpTimeoutKt.ConnectTimeoutException(HttpTimeout.kt:213)
        at io.ktor.client.plugins.HttpTimeoutKt.ConnectTimeoutException$default(HttpTimeout.kt:210)
        at io.ktor.client.engine.cio.Endpoint.getTimeoutException(Endpoint.kt:268)
        at io.ktor.client.engine.cio.Endpoint.connect(Endpoint.kt:257)
        at io.ktor.client.engine.cio.Endpoint.access$connect(Endpoint.kt:25)
        at io.ktor.client.engine.cio.Endpoint$connect$1.invokeSuspend(Endpoint.kt)
The file is accessible and can be downloaded, e.g. with a simple browser, while the apps are getting timeouts. Besides, all other connections initiated by the applications fail too, e.g. WebSocket clients cannot reconnect to the server. Restarting the applications always solves the problem... 😕 Have you seen anything similar? Do you have any idea how I can debug the issue? Thanks.
I tried to switch to the Java client engine but ran into a WebSocket issue (the bearer Authorization header is sent only by the CIO client engine, the Java engine does not include it 😕 ), so I need more time to refactor the apps...
Full stack trace:
io.ktor.client.network.sockets.ConnectTimeoutException: Connect timeout has expired [url=..., connect_timeout=3000 ms]
        at io.ktor.client.plugins.HttpTimeoutKt.ConnectTimeoutException(HttpTimeout.kt:213)
        at io.ktor.client.plugins.HttpTimeoutKt.ConnectTimeoutException$default(HttpTimeout.kt:210)
        at io.ktor.client.engine.cio.Endpoint.getTimeoutException(Endpoint.kt:268)
        at io.ktor.client.engine.cio.Endpoint.connect(Endpoint.kt:257)
        at io.ktor.client.engine.cio.Endpoint.access$connect(Endpoint.kt:25)
        at io.ktor.client.engine.cio.Endpoint$connect$1.invokeSuspend(Endpoint.kt)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.internal.ScopeCoroutine.afterResume(Scopes.kt:32)
        at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
        at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:800)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:704)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:691)
Environment: latest stable Kotlin, Ktor, JDK 21...
a
Can you please try using the OkHttp engine?
n
I switched to the Java engine for the polling and so far it seems to work as expected. (But I will need more time to tell for sure.)
@Aleksei Tirman [JB] I haven't met the issue with the Java client engine since my previous comment. Should I specifically try the OkHttp engine as well to further investigate the root of the problem?
a
To investigate the issue further, a code snippet is required. You can test it with the OkHttp engine to make sure the problem is isolated to the CIO engine.
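One way to run that comparison is to build the polling client from an engine factory parameter, so the same code can run on CIO and on OkHttp. This is only a sketch, not the code from the affected apps: `pollingClient`, the placeholder URL, the request timeout and the loop count are assumptions for illustration, and it needs the `ktor-client-cio` and `ktor-client-okhttp` dependencies. Only the 3000 ms connect timeout comes from the stack trace above.

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.engine.HttpClientEngineConfig
import io.ktor.client.engine.HttpClientEngineFactory
import io.ktor.client.engine.cio.CIO
import io.ktor.client.engine.okhttp.OkHttp
import io.ktor.client.plugins.HttpTimeout
import io.ktor.client.request.get
import io.ktor.client.statement.bodyAsText
import java.io.IOException
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical helper: builds a client with the same timeouts on whichever engine is passed in.
fun <T : HttpClientEngineConfig> pollingClient(engine: HttpClientEngineFactory<T>): HttpClient =
    HttpClient(engine) {
        install(HttpTimeout) {
            connectTimeoutMillis = 3_000   // matches connect_timeout in the stack trace
            requestTimeoutMillis = 10_000  // assumption, not taken from the thread
        }
    }

fun main(): Unit = runBlocking {
    val url = "https://example.com/file.txt" // placeholder URL
    // Swap OkHttp for CIO here to compare the two engines with identical code.
    val client = pollingClient(OkHttp)
    client.use { c ->
        var lastBody: String? = null
        repeat(10) { // a real poller would loop until cancelled
            try {
                val body = c.get(url).bodyAsText()
                if (body != lastBody) {
                    println("File changed (${body.length} chars)")
                    lastBody = body
                }
            } catch (e: IOException) {
                // Ktor's connect and request timeout exceptions are IOExceptions on the JVM.
                println("Poll failed: ${e.message}")
            }
            delay(5_000)
        }
    }
}
```

Reusing one long-lived client for all polls, as above, also makes it easier to tell whether failures come from the engine itself rather than from clients being created and dropped per request.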
h
Hey @Norbi, out of curiosity... did you find the root cause? We are experiencing the same issue with the CIO engine. We noticed that we have a large number of TIME_WAIT sockets.
n
@Helio Unfortunately, no 😞 After I fixed a WebSocket authorization issue with the Java client engine, I switched to it, so currently I don't use the CIO client engine. (To tell the truth, I waited about a day before switching from CIO to Java, but the issue did not appear. Sadly, I didn't have more time to experiment 😞 )
Maybe you can try the recommendation of @Aleksei Tirman [JB] and switch to OkHttp (if you are on JVM).
h
Yes... I'd be keen on trying OkHttp... but I'd love to understand what the underlying cause could be. In our situation, we noticed that the number of these sockets grows a lot over time. We had almost 5k sockets in TIME_WAIT status. I'm not sure whether CIO is production ready yet. I'm going to try using Apache5 for now... let's see if this helps. I was talking with @e5l and he mentioned he identified an issue with connection pooling https://kotlinlang.slack.com/archives/C0A974TJ9/p1701762890530939?thread_ts=1701236098.950109&cid=C0A974TJ9, but I'm not sure if it has been fixed yet. :(
It has been ~4h since we replaced the CIO engine with Apache, and JVM metrics such as heap / threads seem much more stable after the replacement.... I will monitor throughout the week to see if it holds... But there does seem to be something fishy with CIO.
Besides these JVM metrics, we also noticed a large reduction in CPU and memory usage, along with latency.
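For anyone who wants to track the TIME_WAIT count over time on a Linux host (such as the Ubuntu and Raspberry Pi OS machines in this thread), here is a minimal sketch that counts sockets per TCP state by reading `/proc/net/tcp` and `/proc/net/tcp6`. The names `tcpStates` and `countTcpStates` are made up for this example. It counts sockets for the whole machine, not just one JVM process.

```kotlin
import java.io.File

// Linux TCP state codes as they appear (in hex) in the "st" column of /proc/net/tcp.
val tcpStates = mapOf(
    "01" to "ESTABLISHED", "02" to "SYN_SENT", "03" to "SYN_RECV",
    "04" to "FIN_WAIT1", "05" to "FIN_WAIT2", "06" to "TIME_WAIT",
    "07" to "CLOSE", "08" to "CLOSE_WAIT", "09" to "LAST_ACK",
    "0A" to "LISTEN", "0B" to "CLOSING",
)

// Counts sockets per TCP state from lines in /proc/net/tcp format.
// The first line is a header; the state is the fourth whitespace-separated column.
fun countTcpStates(lines: List<String>): Map<String, Int> =
    lines.drop(1)
        .map { it.trim().split(Regex("\\s+")) }
        .filter { it.size > 3 }
        .groupingBy { tcpStates[it[3].uppercase()] ?: "UNKNOWN(${it[3]})" }
        .eachCount()

fun main() {
    // /proc/net/tcp lists IPv4 sockets and /proc/net/tcp6 lists IPv6 sockets (Linux only).
    val totals = mutableMapOf<String, Int>()
    for (path in listOf("/proc/net/tcp", "/proc/net/tcp6")) {
        val file = File(path)
        if (!file.canRead()) continue
        for ((state, count) in countTcpStates(file.readLines())) {
            totals[state] = (totals[state] ?: 0) + count
        }
    }
    println(totals.toSortedMap())
}
```

Running this periodically while the app polls should show whether TIME_WAIT keeps growing with one engine and stays flat with another.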
n
Thanks for the info, this is really useful! (And indeed, I didn't expect such a huge difference in resource consumption!)