
Byron Katz

01/06/2021, 12:28 AM
Hi folks, I've been building a simplicity-prioritized web application from the ground up, with the fewest possible dependencies. That means I also built a server for this, using Socket and ServerSocket. It's been of the ordinary blocking kind, and I'm pretty happy. But not complacent. Right now I'm doing some research to see if there's any low-hanging fruit in terms of massive speed gains.

The driving business paradigm is that the application is for timekeeping at a company. If I run the code to add a time entry below the server layer, I can add a million time entries per second (thread-safely! It's atomic indexes and a ConcurrentHashMap for the shared mutable state). However, if I do the equivalent through the server layer, it's down to eleven thousand per second. Yuck. Well, I mean it's ok and all, but I want a million if there's some easy way to have it.

One of the big bottlenecks is the back-and-forth of the HTTP protocol, where the server examines the first line to see what it is (GET? POST? etc.), then reads the headers (Does it have a Content-Length? Who is this, per the cookie?), and then assembles a response in kind. It also complicates things that I am handling keep-alive as well, meaning the client might decide to stay on the socket for the next back-and-forth.

Which leads me to the idea of non-blocking. If I do that, sure, each individual request/response should have the same performance, but with all that waiting taking place in blocking servers, I would imagine I could parallelize this tremendously and get my million requests per second.

Questions:
1. Am I insane?
2. Is non-blocking the answer to this?
3. I've looked at Ktor's code. Is there other code that covers similar ground, written in pure Kotlin, test/quality-oriented, that's fast?
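For concreteness, here is a minimal sketch of the storage pattern described above: an atomic index plus a ConcurrentHashMap. The names TimeEntry and TimeEntryStore are illustrative, not from the actual project.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

data class TimeEntry(val id: Int, val employee: String, val minutes: Int)

class TimeEntryStore {
    private val index = AtomicInteger(0)
    private val entries = ConcurrentHashMap<Int, TimeEntry>()

    // Thread-safe without external locking: ids come from an atomic
    // increment, and ConcurrentHashMap handles concurrent puts.
    fun add(employee: String, minutes: Int): TimeEntry {
        val entry = TimeEntry(index.incrementAndGet(), employee, minutes)
        entries[entry.id] = entry
        return entry
    }
}
```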

dave08

01/06/2021, 3:27 AM
In Ktor, Vert.x, Micronaut, etc., they all have a thread pool handling incoming requests, but a golden rule is usually not to block that thread pool while processing the request; that way it's free to accept and queue up new incoming requests. So using non-blocking libraries in the processing (like jasync for databases and CIO for files), and using another coroutine dispatcher, might help, as sketched below. I'm no expert on this point, but I've used all these platforms a lot.
Otherwise, there are always things like nats.io that have little overhead for receiving messages.
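A minimal sketch of that golden rule using plain coroutines, independent of any particular framework; blockingDatabaseCall is a hypothetical stand-in for a blocking driver such as JDBC:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// The framework's request threads call this suspending handler. The
// blocking work is shifted to Dispatchers.IO, so the shared request
// pool stays free to accept and queue new connections.
suspend fun handleRequest(payload: String): String =
    withContext(Dispatchers.IO) {
        blockingDatabaseCall(payload)  // hypothetical blocking call
    }

fun blockingDatabaseCall(payload: String): String {
    Thread.sleep(50)  // simulate a slow, blocking driver
    return "stored: $payload"
}
```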

Byron Katz

01/06/2021, 3:29 AM
Hmm good ideas, thanks!

asad.awadia

01/06/2021, 3:52 AM
Are you sure your test is correct? Can you try using Vert.x or something? Very likely it's the client load that's the bottleneck.

Byron Katz

01/06/2021, 4:04 AM
That would be highly non-trivial to do

dave08

01/06/2021, 5:38 AM
Another workaround, until you get around to making changes, could be to put a few instances of the service behind a reverse proxy like nginx to spread the requests out; a sketch of such a config follows.
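A hedged sketch of what that nginx configuration might look like; the ports and the upstream name are placeholders:

```nginx
upstream timekeeping {
    server 127.0.0.1:8081;
    server 127.0.0.1:8082;
    keepalive 16;  # pool connections to the upstream instances
}

server {
    listen 80;
    location / {
        proxy_pass http://timekeeping;
        # Keep-alive to upstreams avoids per-request TCP setup.
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```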

Joel

01/06/2021, 7:17 PM
The fact that your non-server code can process millions of transactions per second implies that blocking code is not the issue. Asynchronous code would allow you to handle many long-running operations at the same time, but it won't help if your code is already highly optimized and CPU-bound. You're adding all of serialization and transmission when running over a server, which are both absurdly expensive compared to in-memory operations that may even fit into CPU cache. You should quantify the time spent in each of transmission, serialization, and business logic and see where the devil lies; a rough sketch of such a breakdown follows. You can also try using a UDP socket, which will be faster than TCP, at the risk of corrupted data. But hey, you never said accuracy was a concern. 🙂
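A rough sketch of that breakdown, timing each phase separately; parse, addTimeEntry, and serializeResponse are placeholders for the real parsing, business logic, and serialization:

```kotlin
import kotlin.system.measureNanoTime

fun parse(raw: String): String = raw.substringBefore("\r\n")  // stand-in
fun addTimeEntry(request: String) { /* business-logic stand-in */ }
fun serializeResponse(request: String): String =
    "HTTP/1.1 200 OK\r\n\r\n$request"                         // stand-in

// Transmission would be measured the same way, wrapped around the
// socket read/write in the real server.
fun profileOnce(rawRequest: String) {
    var request = ""
    val parseNs = measureNanoTime { request = parse(rawRequest) }
    val logicNs = measureNanoTime { addTimeEntry(request) }
    val serializeNs = measureNanoTime { serializeResponse(request) }
    println("parse=${parseNs}ns logic=${logicNs}ns serialize=${serializeNs}ns")
}
```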

Byron Katz

01/06/2021, 7:23 PM
After digging further, it does appear my original conclusion was false. I wasn't actually getting millions per second; there was a short-circuit I wasn't seeing, buried in the threads (and thus the failure wasn't propagating during test runs). In fact, I only get thousands per second, even below the HTTP layer.
@Joel could you explain this line further?
You're adding all of serialization and transmission when running over a server, which are both absurdly expensive compared to in-memory operations that may even fit into CPU cache.

Joel

01/06/2021, 7:37 PM
Which part?
Serialization: When you send a message via HTTP, you are converting some object to said message, i.e. a string. That adds a bunch of time and complexity.
Transmission: The JVM is highly optimized, and any work done within its confines is pretty fast. Any time you access the network you start hitting shared resources and system calls that are much slower than program execution. They're still fast in wall-clock terms, but relatively much slower than pure execution within the JVM. Plus, TCP itself is a whole ball of wax. Assuming you're testing locally, you're not hitting the actual network (which is relatively very slow), but the loopback path is still time-consuming; the sketch below contrasts the two.
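To make that concrete, a small sketch contrasting the two costs: a pure in-JVM call versus a round-trip over a loopback socket. Even with no real network involved, the socket path pays for system calls and TCP buffering (and being a micro-benchmark sketch, the numbers are only indicative):

```kotlin
import java.io.BufferedReader
import java.io.InputStreamReader
import java.io.PrintWriter
import java.net.ServerSocket
import java.net.Socket
import kotlin.concurrent.thread
import kotlin.system.measureNanoTime

fun main() {
    val server = ServerSocket(0)  // bind to any free port
    thread {
        server.accept().use { socket ->
            val reader = BufferedReader(InputStreamReader(socket.getInputStream()))
            val writer = PrintWriter(socket.getOutputStream(), true)
            var line = reader.readLine()
            while (line != null) {        // echo until the client hangs up
                writer.println(line.uppercase())
                line = reader.readLine()
            }
        }
    }

    Socket("127.0.0.1", server.localPort).use { client ->
        val writer = PrintWriter(client.getOutputStream(), true)
        val reader = BufferedReader(InputStreamReader(client.getInputStream()))
        val socketNs = measureNanoTime {
            repeat(1_000) {
                writer.println("hello")   // round-trip over loopback
                reader.readLine()
            }
        }
        val inJvmNs = measureNanoTime {
            repeat(1_000) { "hello".uppercase() }  // same work, no socket
        }
        println("loopback round-trip: ${socketNs / 1_000} ns each")
        println("in-JVM call:        ${inJvmNs / 1_000} ns each")
    }
    server.close()
}
```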