# kotlin-native
s
I created a multiplatform HTML streaming library. For fun I thought I'd benchmark it for macosX64 and JVM. Here is a link to the benchmark: https://github.com/ScottPierce/kotlin-html/blob/master/benchmark/src/commonMain/kotlin/dev/scottpierce/html/benchmark/SimpleBenchmark.kt Results:
```
JVM: Benchmark Completed. Benchmark Total Millis: 17324, Average iteration was 346.42 millis
Native: Benchmark Completed. Benchmark Total Millis: 185398, Average iteration was 3707.94 millis
```
The benchmark is basically building a large String (using a StringBuilder under the hood) that represents a webpage. It does this 20k times per iteration. Can anyone confirm that I've added the `-opt` compiler arg correctly? https://github.com/ScottPierce/kotlin-html/blob/master/benchmark/build.gradle.kts#L37
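For context, here is a minimal sketch of one common way to pass `-opt` to the Kotlin/Native compiler from a Gradle Kotlin DSL script. The target name and the use of an executable binary are assumptions; the linked build.gradle.kts may wire it up differently.

```kotlin
// build.gradle.kts (sketch only, not the actual benchmark configuration)
kotlin {
    macosX64("native") {
        binaries {
            executable {
                // -opt enables the optimizing Kotlin/Native backend.
                // Release binaries already pass it automatically.
                freeCompilerArgs += "-opt"
            }
        }
    }
}
```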
g
K/N string handling is pretty slow, as I remember (or at least it used to be). Try removing `-opt` and check the result.
e
`-opt` is added automatically for release binaries. Thank you for the benchmark.
o
Hmm, why doesn't your DSL use inline functions?
s
@olonho What do you mean? Any part of my DSL that takes a lambda (i.e. a non-void HTML element) uses an inline function. Look at any non-void HTML element: https://github.com/ScottPierce/kotlin-html/tree/master/kotlin-html/src/genMain/kotlin/dev/scottpierce/html/element kotlinx.html doesn't do this, and I'd imagine that's one of the reasons this library performs a bit better in my testing.
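For anyone skimming, a rough, self-contained sketch (not the library's real API) of what an inline element function buys you: because `div` is inline, the lambda body is inlined at the call site, so no function object is allocated per element.

```kotlin
// Sketch of the inline-DSL idea; the real HtmlWriter and element
// functions in the linked package have different signatures.
class HtmlWriter(private val sb: StringBuilder = StringBuilder()) {
    fun write(s: String): HtmlWriter { sb.append(s); return this }
    override fun toString(): String = sb.toString()
}

inline fun HtmlWriter.div(block: HtmlWriter.() -> Unit): HtmlWriter {
    write("<div>")
    block()          // inlined at the call site: no lambda allocation
    return write("</div>")
}

fun main() {
    val page = HtmlWriter().div { write("Hello") }
    println(page) // <div>Hello</div>
}
```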
I tried a few things. Currently the StringBuilder increases the `CharArray` capacity by 50% instead of the JVM's (*2 + 2). That helped slightly, shaving off ~200 millis per average iteration. I then changed the initial capacity of the StringBuilder to 8k (larger than the 7.7k-character page), and surprisingly that actually hurt performance, which doesn't make a lot of sense.
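To make the two growth strategies being compared concrete, here's a trivial sketch (not the stdlib code) of how each one resizes the backing array; fewer, larger copies trade allocation pressure against copy cost.

```kotlin
// JVM-style growth: newCapacity = oldCapacity * 2 + 2
fun jvmGrowth(oldCapacity: Int): Int = oldCapacity * 2 + 2

// The branch's growth: newCapacity = oldCapacity * 1.5
fun fiftyPercentGrowth(oldCapacity: Int): Int = oldCapacity + oldCapacity / 2

fun main() {
    var a = 16
    var b = 16
    repeat(8) { a = jvmGrowth(a); b = fiftyPercentGrowth(b) }
    println("After 8 resizes: JVM-style = $a, 50%-style = $b")
}
```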
Made a branch for it if anyone wants to try it out: https://github.com/ScottPierce/kotlin-html/tree/benchmark-LargeInitialCapacity
Here is a profile of the above branch where I've made the size of the StringBuilder larger than the page it has to hold. It's not clear to me where all the time is being eaten up.
j
How would I add Linux x64 to this build to help benchmark this?
I have a lot of Kotlin HTML templates in literal strings in production, so this could be helpful.
@spierce7 is this doable in a POSIX-library kind of way? `actual val currentTime: Long get() = (NSDate().timeIntervalSince1970 * 1000).toLong()`
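One possible POSIX-flavoured answer, as a sketch only: an `actual` for a Linux (or other POSIX) source set, assuming the benchmark's `expect val currentTime: Long` declaration; the NSDate version would stay in the macOS source set.

```kotlin
// linuxMain (sketch): POSIX implementation of the benchmark's expect
// declaration, using gettimeofday instead of NSDate.
import kotlinx.cinterop.alloc
import kotlinx.cinterop.memScoped
import kotlinx.cinterop.ptr
import platform.posix.gettimeofday
import platform.posix.timeval

actual val currentTime: Long
    get() = memScoped {
        val tv = alloc<timeval>()
        gettimeofday(tv.ptr, null)
        tv.tv_sec * 1000L + tv.tv_usec / 1000L
    }
```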
e
There is a function in the Kotlin standard library that works on all platforms: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.system/measure-time-millis.html And to add Linux, you just need to add a `linuxX64` target in https://github.com/ScottPierce/kotlin-html/blob/master/benchmark/build.gradle.kts the same way `macosX64` is already added.
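In other words, something along these lines in the benchmark's targets block (a sketch; the real script configures its targets with additional options):

```kotlin
// benchmark/build.gradle.kts (sketch of the targets block only)
kotlin {
    jvm()
    macosX64()
    linuxX64() // new target, declared the same way as macosX64
}
```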
g
For benchmarking it's better to use `measureNanoTime`, which is intended for measurement; `measureTimeMillis` uses wall-clock time, not a monotonic clock.
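A short sketch of the suggested usage; the body of the block is a stand-in, not the benchmark's actual iteration code.

```kotlin
import kotlin.system.measureNanoTime

// measureNanoTime is backed by a monotonic timer, so the reading is not
// affected by wall-clock adjustments during the run.
fun main() {
    val nanos = measureNanoTime {
        // stand-in for one benchmark iteration
        val sb = StringBuilder()
        repeat(10_000) { sb.append("<div>row $it</div>") }
    }
    println("Iteration took ${nanos / 1_000_000.0} ms")
}
```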
e
There is already some performance improvement for this benchmark on K/N master compared to 1.3.41, and we'll look at what more can be done for this case.
@gildor this is quite a large benchmark, so in this case it isn't very important. For microbenchmarks it matters, of course.
j
All methods listed in kotlin.system are native.
Yeah, so far cloning the Mac stuff has worked, and I get an error compiling Platform. It seems Platform should be ported to kotlin.system and inlined in IntelliJ to make this go away.
YAGNI on the NSDate stuff.
@spierce7 I'm looking through the code now; it seems the HtmlWriter and write() abstractions operate on arrays, whereas you would get the native performance you want from a rope abstraction.
StringBuilder is not a rope generator, but the javolution library takes extreme measures to support Text() objects with rope semantics for insertion/appending.
It seems like memoizing the write operations with their params to build a graph would be faster, and a reification step could then fill up the IO buffers in whatever ordered, linear streaming way is efficient.
Each element as a coroutine context would enable efficient concurrent graph assembly.
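A very rough sketch of the segment-collecting idea being floated here. This is explicitly not the library's HtmlWriter and not javolution's Text, just an illustration of the trade-off: instead of appending everything into one growing CharArray, keep the written segments and do a single exactly-sized concatenation (or flush them straight to an output buffer) at the end.

```kotlin
// Illustration only: a writer that defers concatenation.
class SegmentWriter {
    private val segments = ArrayList<String>()
    private var totalLength = 0

    fun write(s: String): SegmentWriter {
        // No previously written characters are ever copied here.
        segments.add(s)
        totalLength += s.length
        return this
    }

    fun build(): String {
        // One exactly-sized allocation, one pass over the segments.
        val sb = StringBuilder(totalLength)
        for (segment in segments) sb.append(segment)
        return sb.toString()
    }
}

fun main() {
    val html = SegmentWriter()
        .write("<ul>")
        .write("<li>one</li>")
        .write("<li>two</li>")
        .write("</ul>")
        .build()
    println(html)
}
```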
s
@jimn Not sure I follow the performance changes you are suggesting. I'll direct-message you.