# mathematics
a
Interesting fact: it seems that using the newer Oracle GraalVM JIT significantly improves performance, even for Multik.
I see about a 3x performance improvement for EJML and about a 2x improvement for Multik in the same benchmarks.
Another fun fact: switching two lines in KMath (turning on parallel buffer processing) lets us ramp performance up almost to the level of Multik. I am not sure whether parallel processing should be turned on by default...
i
Which two lines? I don't get the context.
a
Check the new pull request. I've added a new version of LinearAlgebra for the JVM that introduces parallel processing for matrix building. It works quite well for the dot operation.
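Since the pull request itself isn't quoted here, this is only a minimal sketch of what row-parallel matrix building for dot might look like with a JVM parallel stream. It is plain Java for illustration; the class and method names are hypothetical and this is not the actual KMath code.

```java
import java.util.stream.IntStream;

// Hypothetical sketch, not the actual KMath implementation: parallelise the
// dot operation by building each row of the result on a separate thread.
class ParallelDot {
    static double[][] dot(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] result = new double[n][m];
        // Rows of the product are independent, so a parallel stream over the
        // row index is safe: each task writes only to its own row.
        IntStream.range(0, n).parallel().forEach(i -> {
            double[] row = result[i];
            for (int l = 0; l < k; l++) {
                double aVal = a[i][l];
                double[] bRow = b[l];
                for (int j = 0; j < m; j++) {
                    row[j] += aVal * bRow[j];
                }
            }
        });
        return result;
    }
}
```

Parallelising over rows keeps the writes disjoint, so no synchronisation is needed, and each task carries enough arithmetic to amortise the thread overhead.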
i
So, it's just the JVM's parallel stream? Then it's not surprising, because it's just MISD. Clearly it should have been tried before...
We have spent a lot of effort to achieve SIMD with Viktor or ND4J.
a
Yes. I mean that usually it is not effective for mathematical operations, because the overhead from parallel processing is comparable to the benefit. But the dot operation specifically benefits a lot from it: the operations for each i and j can be done in parallel. And we do not need to make all operations parallel, only this one. We do that by creating a context for optimisation of a specific operation.
I checked the same context with other operations, and it does not work well.
i
GPU computations benefit from parallelism exactly because of the lower overhead you mentioned.
a
It is not lower. The computation itself is cheaper, but data transfer is much more expensive. A GPU works well when you can load all your data at once and not update it.
i
Do you have a benchmark comparing kmath-multik, kmath-multik with a parallel stream, and pure Multik?
Oh, I am wrong. Clearly, Multik has its own `dot`. And I'm sure that kmath, when it wraps Multik, adds no or minimal overhead.
a
Pure Multik is the same as kmath-multik, since KMath is a thin wrapper on top. But yes, I've done exactly that. On Oracle GraalVM, Multik is about 3-4 times faster than KMath with parallel processing, and about 20 times faster than KMath without parallel processing.
At the same time, KMath with parallel processing is faster than both EJML and TensorFlow-CPU.
i
I think there's no use case for kmath-core with parallelism involved, when it's worse than the wrappers anyway.
a
It is a good example. And it IS better than most of the wrappers. Multik is good, but it is rather limited: the dot operation is practically the only non-trivial operation it can do.
So if one wants reasonable flexibility but has a bottleneck at the dot operation, it could help. You just need to do a context switch.
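The "context switch" idea could be sketched roughly like this: the same dot signature behind a context interface, with a sequential default and a parallel variant you opt into only around the bottleneck. This is plain Java for illustration; `LinAlgContext` and both implementations are hypothetical names, not KMath's actual API.

```java
import java.util.stream.IntStream;

// Hypothetical operation-scoped context: callers pick the implementation
// only where the dot operation is the bottleneck.
interface LinAlgContext {
    double[][] dot(double[][] a, double[][] b);
}

// Sequential default: for small matrices, thread overhead would dominate.
class SequentialContext implements LinAlgContext {
    public double[][] dot(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] out = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int l = 0; l < k; l++)
                for (int j = 0; j < m; j++)
                    out[i][j] += a[i][l] * b[l][j];
        return out;
    }
}

// Parallel variant: each row is an independent task with enough arithmetic
// to pay for being scheduled on the common fork-join pool.
class ParallelContext implements LinAlgContext {
    public double[][] dot(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] out = new double[n][m];
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int l = 0; l < k; l++)
                for (int j = 0; j < m; j++)
                    out[i][j] += a[i][l] * b[l][j];
        });
        return out;
    }
}
```

Both contexts compute the same product; switching between them changes only where the work runs, which is why restricting the parallel context to the one hot operation avoids paying its overhead everywhere else.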
i
If it's just an example, then more experiments should be done, with tensors for instance.
a
Indeed. But tensors require a lot of cleanup, which I can't do right now.