# http4k
For those of you into this kind of thing, Round 18 of the TechEmpower benchmarks is out. No big surprises - http4k is still consistently performing well:
🎉 3
👍 4
👏 1
I'm surprised by ktor performance... I thought it would be close to vert.x, given the kind of test
Worth reading:
Overall, the value we get from this architectural change is high, with connection scaling being the primary benefit, but it does come at a cost. We have a system that is much more complex to debug, code, and test, and we are working within an ecosystem at Netflix that operates on an assumption of blocking systems.
Thanks for your answer @dave. Really appreciated 👍
👍 1
From what I can read, the tests mentioned do not currently exercise a use case that the non-blocking/async/'reactive' architectures are designed to optimize. The tests are all 'constant load' synchronous request/reply tests with no dependencies across requests and minimal contention, where non-blocking IO has no advantages (except possibly in a very creative, specially designed architecture). There are some blog posts about 'future tests' which might test these cases and help isolate the tradeoffs. In my opinion, these tests would only favor async-style architectures if those implementations were also optimized for other criteria, sufficiently that the overhead of async processing (yes, there is overhead in exchange for latency reduction) is outweighed.
This reminds me of a paper published by Microsoft at a conference 20+ years ago showing performance results for a server written in C# vs C++ -- the C# server outperformed the C++ one. People in the audience (rightfully) were skeptical. The trick? In a client-server architecture, performance was measured primarily as latency and requests/sec from the client side. Reasonable, realistic metrics -- that had little to do with backend 'raw horsepower'. C# uses garbage collection and C++ manages memory inline (each caller/thread does its own alloc/free/new/delete). For cases where the server is not 100% utilized all the time, GC tends to happen in 'idle periods', while inline memory management overhead is incurred as part of every call. If there were no other differences (say 2 C++ programs, one using GC and one not) the GC version wins hands-down -- all (or almost all) of the memory management overhead occurs outside of the benchmarks (in the GC thread during 'idle' periods). I find a lot to ponder in this -- there is nothing at all 'wrong' with the different perspectives/measurements of 'performance', yet they are very different. My takeaway is that when discussing 'performance' one needs to be very thoughtful and open-minded about exactly what is being asked and what is being answered, and try very hard not to over-generalize -- but also not to simply discard all benchmarks as meaningless either. Well-implemented benchmarks have significant information value -- but rarely *simplistic* information -- even in a tightly constrained 'apples to apples' test -- the apples are often not the fruit one is thinking of.
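To make the GC-in-idle-periods point concrete, here is a toy sketch with a simulated clock. It is not a reconstruction of the Microsoft benchmark; the `simulate` function and all the numbers in it are made up for illustration, and it assumes a single-threaded server and ignores forced GC pauses entirely.

```python
def simulate(arrivals, service_cost, cleanup_cost, deferred):
    """Per-request latency as seen by the client (hypothetical toy model).

    arrivals     -- request arrival times, in ascending order
    service_cost -- time spent doing the actual work for one request
    cleanup_cost -- memory-management time caused by one request
    deferred     -- True: cleanup runs in idle gaps (GC-like);
                    False: cleanup is paid inside every call (inline free)
    """
    clock = 0.0      # time at which the server next becomes free
    pending = 0.0    # cleanup work that has been postponed
    latencies = []
    for arrival in arrivals:
        if deferred and arrival > clock:
            # The server is idle until this request arrives:
            # use the gap to work off postponed cleanup.
            done = min(arrival - clock, pending)
            pending -= done
            clock += done
        start = max(clock, arrival)
        finish = start + service_cost
        if deferred:
            pending += cleanup_cost   # postponed, not seen by the client
        else:
            finish += cleanup_cost    # paid before the reply goes out
        latencies.append(finish - arrival)
        clock = finish
    return latencies


# A lightly loaded server: one request every 10 time units.
arrivals = [0.0, 10.0, 20.0, 30.0]
print(simulate(arrivals, 3.0, 2.0, deferred=False))
print(simulate(arrivals, 3.0, 2.0, deferred=True))
```

With these made-up numbers the inline version reports a latency of 5.0 per request and the deferred version 3.0, even though both servers do essentially the same total amount of work. The difference is only in whether the cleanup falls inside the window the client is timing.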
@DALDEI totally agree. We as an industry do seem to have a tendency to fixate on this kind of thing, which to me is the most frustrating/amusing part, because IMHO >95% of all servers are never actually stretched to even an untuned limit. Having coded both async (Fintrospect) and (currently) sync (http4k) frameworks in this particular benchmark, I know hands down which I'd prefer to work with - it's the one that allows me to be the most productive - i.e. the maximum amount of stuff done with the minimum amount of hassle. I take a look at some of the implementations in the TFB repo and just wonder who would actually want to work with code like this? As was pointed out by @s4nchez on Twitter - it would also be nice for this kind of benchmark to focus on productivity aspects of the code as well as the raw numbers - things like (some type of) analysis of code complexity, number of lines of code, testability etc.
Then again, many many people are using popular libraries that perform pretty badly in the benchmarks, so maybe there is hope... 😉
re "there is hope" -- yes -- this is one area of benchmarking where IMHO it is reasonable to infer useful information even from over-generalized data. If the library/app/framework one is using consistently shows very poorly on benchmarks -- especially if they are independent and have different goals -- and particularly if the degree of underperformance is very high -- then one might reasonably consider that a different library/application/framework might not be such a bad idea to look at for the next project. Similarly -- when one does find an app/library/framework that is more appealing for other reasons (i.e. it's EASIER TO USE) -- said benchmarks do help a lot in persuading others (such as one's boss) that it may be worth the 'risk' of trying something new.
I'd love to see a 'benchmark' that focuses on 'How bad does it get when things break' instead of 'How fast can you go BEFORE things break'
👍 2