# ktor
t
Hey! I'm using Ktor for a web server (duh), and I happen to want to keep a lot of websocket connections open. I seem to be hitting some sort of limit at 3000 sockets, with more timing out. I can hold 100000~ websocket connections open on Bun (javascript runtime). Would you happen to have any tips on how to increase the number of websockets a ktor server can keep alive?
c
if you are on the JVM use the Netty engine, with the native connectors, for full performance/scalability.
t
I am in fact on the JVM, and I am using Netty, thx
Any other idea?
c
I usually ensure the native connectors are available, though that alone wouldn’t explain this limitation.
// bundle netty native libraries for linux and osx, for x86_64 and aarch_64
setOf("x86_64", "aarch_64").forEach { arch ->
    runtimeOnly(libs.netty.transportNativeEpoll) { artifact { classifier = "linux-${arch}" } }
    runtimeOnly(libs.netty.transportNativeKqueue) { artifact { classifier = "osx-${arch}" } }
}
Perhaps check that the websocket connection/message handling isn’t doing something blocking? A common issue is to have something blocking on a Netty thread, starving Netty from processing IO.
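To illustrate what "not blocking" could look like, here is a minimal sketch, assuming a Ktor 2.x-style server with the WebSockets plugin. `slowLookup`, the `/ws` path and port 8080 are hypothetical stand-ins, not taken from the code discussed here. The blocking call is moved to `Dispatchers.IO` so the thread serving the socket stays free for other connections.

```kotlin
import io.ktor.server.application.*
import io.ktor.server.engine.*
import io.ktor.server.netty.*
import io.ktor.server.routing.*
import io.ktor.server.websocket.*
import io.ktor.websocket.*
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Hypothetical stand-in for any blocking call (JDBC, file IO, a contended lock, ...).
fun slowLookup(key: String): String {
    Thread.sleep(50)
    return "value-for-$key"
}

fun main() {
    embeddedServer(Netty, port = 8080) {
        install(WebSockets)
        routing {
            webSocket("/ws") {
                for (frame in incoming) {
                    if (frame is Frame.Text) {
                        val key = frame.readText()
                        // Run the blocking call on the IO dispatcher instead of
                        // the thread that is currently handling this connection.
                        val value = withContext(Dispatchers.IO) { slowLookup(key) }
                        send(Frame.Text(value))
                    }
                }
            }
        }
    }.start(wait = true)
}
```

If the connection count recovers once blocking calls are wrapped like this, that points at the handler rather than at the engine.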
t
Apologies, I'm somewhat of a newbie here, how would I make sure that the native connectors are being used? I'm reasonably familiar with kotlin as a lang but not the ecosystem at all.
c
no worries. the native connectors won’t be used unless specified in your build script. that snippet above is for your Gradle script; it requires entries like this in gradle/libs.versions.toml:
netty-transportNativeEpoll = { module = "io.netty:netty-transport-native-epoll" }
netty-transportNativeKqueue = { module = "io.netty:netty-transport-native-kqueue" }
It further assumes you’re using the Ktor BOM in your build script, something like:
// ----- Ktor
implementation(platform(ktor.bom))
implementation(ktor.server.core)
implementation(ktor.server.netty)
implementation(ktor.server.compression)
implementation(ktor.server.forwardedHeader)
implementation(ktor.network.tls.certificates)
implementation(ktor.server.metrics.micrometer)
implementation(ktor.server.contentNegotiation)
implementation(ktor.serialization.jackson)
// ----- end Ktor
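To confirm at runtime that the native libraries actually loaded, a small check along these lines may help. It is a sketch that assumes both Netty native transport dependencies from the Gradle snippet are on the classpath (otherwise these classes will not resolve). It only reports whether the native code loaded in this JVM, not which transport a given server ended up using.

```kotlin
import io.netty.channel.epoll.Epoll
import io.netty.channel.kqueue.KQueue

fun main() {
    // True only if the matching native library was found and loaded.
    println("epoll available:  ${Epoll.isAvailable()}")
    println("kqueue available: ${KQueue.isAvailable()}")

    // When a transport is unavailable, Netty keeps the reason around.
    Epoll.unavailabilityCause()?.let { println("epoll not loaded: $it") }
    KQueue.unavailabilityCause()?.let { println("kqueue not loaded: $it") }
}
```

On Linux you would expect epoll to be available and kqueue not; on macOS the reverse.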
t
I see, I am in fact not using ktor bom. Let me look into it and then ask back if I cannot make it work myself.
Do you happen to have a sample libs.versions.toml? Google is giving me very few results for this.
c
Here’s a sample libs.versions.toml, with a reference to the Gradle docs.
# reference: <https://docs.gradle.org/current/userguide/version_catalogs.html>

[versions]

[plugins]

[libraries]

# native dependencies
netty-transportNativeEpoll = { module = "io.netty:netty-transport-native-epoll" }
netty-transportNativeKqueue = { module = "io.netty:netty-transport-native-kqueue" }
t
Thank you! I'm now using native transport on netty, but it still seems to cap out at about the same as before. I also managed to segfault with a null pointer deref.
c
wouldn’t have expected a material difference there, as noted earlier - NIO vs native is mainly about performance/memory use - both handle a large number of connections in the same way. Have you investigated the earlier suggestion - checking whether the websocket connect/message handlers are blocking in some way?
t
I don't think my handler is blocking?
Oh wait, synchronizedMap has blocking operations I think?
c
yes, indeed, that’s one culprit. Perhaps re-test with the handler commented-out temporarily, see if that gets the connections back where they should be, then work backwards from there.
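For context: `Collections.synchronizedMap` guards every call with one lock, so all handlers queue up behind each other, whereas `ConcurrentHashMap` lets reads proceed without that single lock. A minimal sketch of a connection registry built on it follows. `SessionRegistry` and its method names are hypothetical, made up for illustration, since the actual handler code isn't shown here.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical registry of open connections, keyed by a session id.
class SessionRegistry<S : Any> {
    private val sessions = ConcurrentHashMap<String, S>()

    fun add(id: String, session: S) { sessions[id] = session }

    fun remove(id: String) { sessions.remove(id) }

    fun count(): Int = sessions.size

    // Copies the current values; safe to call while other threads add or remove.
    fun snapshot(): List<S> = sessions.values.toList()
}

fun main() {
    val registry = SessionRegistry<String>()
    registry.add("a", "session-a")
    registry.add("b", "session-b")
    registry.remove("a")
    println(registry.count()) // prints 1
}
```

Whether this is the actual bottleneck is worth measuring rather than assuming, which is why commenting the handler out first is a good test.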
t
Thank you for the suggestions, I will go have dinner and come back to this. This was very helpful!
It seems that commenting out all the code let me handle more connections, but still not even 10000, which is a tenth of what my Bun server was doing
I seem to be getting ping timeouts regardless
Okay, doing a separate basic test seems to handle 10000 connections fine, I don't know what is different. I'll figure it out. Thank you!
Exception in thread "main" kotlinx.coroutines.CoroutinesInternalError: Fatal exception in coroutines machinery for DispatchedContinuation[BlockingEventLoop@6e16b8b5, Continuation at teles.client.ClientKt$main$2$1$1.invokeSuspend(Client.kt:102)@77192705]. Please read KDoc to 'handleFatalException' method and report this incident to maintainers
I think I am cursed.
c
That appears to be an exception from a client, not from ktor server
t
Yep, I'm doing testing with ktor client as well.
c
That client appears to not be using async IO (netty) - it has a blocking event loop. Are you using the same client as the one used to test the Bun server?
t
I think clients don't support Netty? But they do support Jetty, does that use netty under the hood? I'm using CIO for my clients, I believe that is async?
This is not the same client I used for Bun though. I didn't reimplement the full server in Bun. I wonder if the clients are the problem? That's not impossible and would make this a bit embarrassing since I've had that assumption since the beginning.
c
When performing load/comparative tests it’s best to change only a single thing, otherwise there are many variables in play, making it hard to discern why there are differences. The client plays a key role in generating the load here.
t
I understand this, I rushed some imprecise testing.
Okay, it seems that using the same Kotlin client gets me at least 12000 WSockets with the Bun server, and then I go oom? This is not making a lot of sense to me.
c
Check that the JVM is allocated an appropriate amount of memory; if it is, review the code for what is allocating memory, or analyze a heap dump.
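A quick way to see what the JVM was actually given is to print the heap limits from inside the process. This is a minimal sketch using only the standard `Runtime` API; `heapSummary` is a hypothetical helper name. The limit itself is set with the standard `-Xmx` JVM flag, and `-XX:+HeapDumpOnOutOfMemoryError` makes the JVM write a heap dump when it runs out.

```kotlin
// Hypothetical helper: summarizes current heap use against the configured maximum.
fun heapSummary(): String {
    val rt = Runtime.getRuntime()
    val mib = 1024 * 1024
    val max = rt.maxMemory() / mib                        // upper bound (-Xmx or the default)
    val used = (rt.totalMemory() - rt.freeMemory()) / mib // heap currently in use
    return "heap used: $used MiB of max $max MiB"
}

fun main() {
    println(heapSummary())
}
```

Logging this periodically during a load test shows whether memory grows per connection or jumps at some point.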
t
Okay I have to be cursed xD
okay goddamnit, this is a problem for tomorrow
My computer oom'd yesterday and froze. I had to shut it down and decided I would fix it today. For the first time ever I had FS corruption that was not auto solved by fsck and my computer would not boot. I managed to fix it by live booting and for the first time ever I had stuff in /lost+found . Interestingly they were chunks from the browser config which I was using to chat on here!
It seems that the issue is the clients!!! I can connect 100000 sockets from Bun to my ktor server!
I love Bun, I just really love kotlin's dev experience more.
Okay, how can I make the Ktor HTTP clients use up less ram and perhaps not time themselves out?
I have reached a few more connections 😄 (still Bun clients)
Okay, I asked gpt to convert my code to javascript and it seems that my client works well enough for testing and I'm doing literally 100x more operations than before, yay!
Would you happen to have any pointers on where to look in the Kotlin client implementation? I don't know why it's so much slower.
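One place that might be worth a look, purely as an assumption since the client code isn't shown: the `BlockingEventLoop` in the exception above is the event loop that `runBlocking` creates when it is called without a dispatcher, so every coroutine launched inside it runs on that single thread. A sketch of a load client that shares one `HttpClient` across all connections and runs its coroutines on `Dispatchers.Default` could look like this. The host, port, path and connection count are hypothetical.

```kotlin
import io.ktor.client.*
import io.ktor.client.engine.cio.*
import io.ktor.client.plugins.websocket.*
import io.ktor.http.*
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Passing a dispatcher means child coroutines run on a thread pool,
// not on runBlocking's single-threaded event loop.
fun main() = runBlocking(Dispatchers.Default) {
    val connections = 1_000 // hypothetical; raise gradually while watching memory

    // One shared client for every connection, rather than one client per socket.
    val client = HttpClient(CIO) { install(WebSockets) }

    repeat(connections) {
        launch {
            client.webSocket(method = HttpMethod.Get, host = "127.0.0.1", port = 8080, path = "/ws") {
                // Keep the socket open until the server closes it; ignore payloads.
                for (frame in incoming) { /* no-op */ }
            }
        }
    }
}
```

This may or may not be what is slowing the original client down; comparing it against the existing client with everything else held constant would show whether it matters.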