
Ролан

05/07/2021, 5:51 AM
Thanks to Andrey Kislitsin, we have an example of a neural network in Kotlin Multiplatform using KMath's tensors, including both the forward and backward pass (so you can train it everywhere): https://github.com/mipt-npm/kmath/blob/feature/tensor-algebra/examples/src/main/kotlin/space/kscience/kmath/tensors/NeuralNetwork.kt
🔥 1
🦜 2
🙌 2
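[Editor's note] A hypothetical illustration, not the linked NeuralNetwork.kt: the smallest possible "forward and backward pass in plain Kotlin" is a single linear neuron y = w*x + b trained by SGD on squared error. The function name and training data below are invented for the sketch; the point is that with no native dependencies, the same code runs anywhere Kotlin Multiplatform does.

```kotlin
import kotlin.math.abs

// Train y = w*x + b on points generated by y = 2x + 1, using plain SGD.
// Everything is pure Kotlin over primitive doubles: portable across
// JVM, JS, and Native targets.
fun trainLinear(epochs: Int = 1000, lr: Double = 0.01): Pair<Double, Double> {
    val xs = doubleArrayOf(1.0, 2.0, 3.0, 4.0)
    val ys = doubleArrayOf(3.0, 5.0, 7.0, 9.0) // targets from y = 2x + 1
    var w = 0.0
    var b = 0.0
    repeat(epochs) {
        for (i in xs.indices) {
            val pred = w * xs[i] + b        // forward pass
            val grad = 2 * (pred - ys[i])   // dLoss/dPred for squared error
            w -= lr * grad * xs[i]          // backward pass via the chain rule
            b -= lr * grad
        }
    }
    return w to b
}

fun main() {
    val (w, b) = trainLinear()
    println("w = $w, b = $b") // converges towards w = 2.0, b = 1.0
    check(abs(w - 2.0) < 0.1 && abs(b - 1.0) < 0.1)
}
```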

Hampus Londögård

05/07/2021, 6:02 AM
Awesome! How big is the performance penalty compared to using Python (or rather, the Python DSL over Torch/TF, since you don't really write Python 😅)? Are the tensors ever shuffled to the JVM, or do they stay native until you try to print?

Ролан

05/07/2021, 6:29 AM
No, it's all KMP, including the tensors: no dependencies on anything, and you can run it anywhere KMP works. In terms of performance, that hasn't been our concern yet.

Hampus Londögård

05/07/2021, 6:31 AM
KMP = Kotlin Multiplatform? I meant more: do you shuffle the tensors into DoubleArrays, or do you keep them native (as PyTorch and others usually do) so that the operations happen through the original C/C++ code?

Ролан

05/07/2021, 6:32 AM
Tensors are backed by DoubleArray indeed; there is nothing native in that sense.
Yes for Kotlin Multiplatform, and that was the point of the exercise. Performance-wise we still need to see, but I think we are looking more at functionality right now. You won't train mega networks in a browser, after all.
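[Editor's note] A minimal sketch (hypothetical, not the actual KMath API) of what "tensors backed by DoubleArray" means: the shape is metadata, the data is one flat primitive array, and every operation is a plain Kotlin loop, so nothing needs to cross into native code.

```kotlin
// Hypothetical DoubleArray-backed tensor, invented for illustration.
class SimpleTensor(val shape: IntArray, val data: DoubleArray) {
    init {
        require(shape.fold(1) { acc, dim -> acc * dim } == data.size) {
            "data size must equal the product of the shape dimensions"
        }
    }

    // Element-wise addition: just a loop over the flat backing array.
    operator fun plus(other: SimpleTensor): SimpleTensor {
        require(shape.contentEquals(other.shape)) { "shape mismatch" }
        return SimpleTensor(
            shape.copyOf(),
            DoubleArray(data.size) { data[it] + other.data[it] }
        )
    }
}

fun main() {
    val a = SimpleTensor(intArrayOf(2, 2), doubleArrayOf(1.0, 2.0, 3.0, 4.0))
    val b = SimpleTensor(intArrayOf(2, 2), doubleArrayOf(0.5, 0.5, 0.5, 0.5))
    println((a + b).data.toList()) // [1.5, 2.5, 3.5, 4.5]
}
```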

Hampus Londögård

05/07/2021, 6:35 AM
Oh, I thought this was related to your previous PyTorch contribution 😄 But this is really cool; it seems simple enough to code that you could easily fit a framework on top which abstracts it into using lambda functions. Typed lambda functions + DL is something I've wanted for a while; Python simply doesn't cut it. Really cool contribution! (Unrelated) Do you happen to know if there's any progress on supporting the new Vector API on the JVM for kmath?

Ролан

05/07/2021, 6:43 AM
Thanks )), no, the PyTorch story is orthogonal to this. In fact, we wanted the user to be able to prototype simple things in a lightweight framework before reaching for monsters like PyTorch, TF, or DL4J. Sorry, as for the Vector API, I haven't heard anything yet.
👍 1

altavir

05/07/2021, 7:30 AM
The current work is the prelude to PyTorch integration; we need to understand how to make a better API for that. As for performance, it is not yet optimized, but after optimization I expect the difference from a native solution to be less than a factor of 3. It is also possible to add easy parallelization and lazy-computation optimization, so we could even win in some places.
As for vectors, do you mean https://openjdk.java.net/jeps/338? It is on the roadmap, but our current research shows that there is already good automatic vectorization in the latest JVMs, including GraalVM. There is also the Viktor project, and we have bindings for it (not for tensors, though).

Hampus Londögård

05/07/2021, 7:37 AM
That’s the one, yeah. Agreed, auto-vectorization is good. But when you know you want it from the get-go, it can make sense to code for it rather than hoping the JVM is smart, or that the loop simply runs enough times and is tight 🙂 F64 is more precision than I would have preferred, hehe.
I’ll most likely be migrating my project (londogard-nlp-toolkit) to kmath in the future; I really like the idea of swapping backends. For now I’ve simply used EJML (which you wrap), because there’s no expensive native interop when running single math operations. But once I introduce ML models & DL I’ll have to use something else, I think, at least for the DL models.
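[Editor's note] What "coding for vectorization" on today's JVM tends to mean in practice, absent the incubating Vector API (JEP 338): a tight, branch-free indexed loop over primitive arrays, which HotSpot/GraalVM auto-vectorization can often turn into SIMD code. This is a sketch of the loop shape, not a guarantee; whether it actually vectorizes depends on the JIT (floating-point reduction order, in particular, can block it).

```kotlin
// A dot product written to be auto-vectorization friendly: primitive
// DoubleArrays, a simple counted loop, no branches, no boxing.
fun dot(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "arrays must have equal length" }
    var sum = 0.0
    for (i in a.indices) {
        sum += a[i] * b[i] // plain multiply-accumulate over the arrays
    }
    return sum
}

fun main() {
    val x = doubleArrayOf(1.0, 2.0, 3.0)
    val y = doubleArrayOf(4.0, 5.0, 6.0)
    println(dot(x, y)) // 1*4 + 2*5 + 3*6 = 32.0
}
```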

altavir

05/07/2021, 7:39 AM
Indeed. The issue is here: https://github.com/mipt-npm/kmath/issues/249. It is marked as waiting for external contributions, so I hope some students will work on it soon. Meanwhile, as I said, we get very good results with GraalVM's automatic vectorization.

Ролан

05/07/2021, 8:26 AM
@altavir talking about performance is misleading here. Deep learning is just made for GPUs (I also advocate that beyond DL, but that's another story). We are trying to offer some functionality in places where you cannot afford huge GPUs, and there are really a lot of such applications. But you have to forget about performance.

altavir

05/07/2021, 8:26 AM
Indeed, I was talking about CPU only. Doing GPU work directly from the JVM would be hard.

Iaroslav Postovalov

05/07/2021, 9:51 PM
It's simply impossible, because CUDA-like APIs can't be created natively for the JVM, so FFI overhead in one form or another is unavoidable.

altavir

05/08/2021, 5:45 AM
It is possible, for example with http://www.jcuda.org/. You can't create shared memory with the GPU anyway. But the work is tedious. They are experimenting with it right now in MultiK.

Iaroslav Postovalov

05/08/2021, 9:17 AM
It is FFI, too.

Ролан

05/08/2021, 8:29 PM
@altavir those are just Java bindings to C wrappers of the CUDA libraries; you still cannot integrate your own CUDA kernels the way you would in Python or C++. You would have to go through JNI, with all the pain that entails.

altavir

05/08/2021, 9:18 PM
I've actually used OpenCL bindings, and they do not require JNI. As for object copies, you need to do those anyway to work with a GPU.

Ролан

05/09/2021, 6:06 AM
Of course, with OpenCL you can send your shader programs from the JVM; they are just strings. In C++ you can use boost::compute:
```cpp
namespace bc = boost::compute;
auto src_code = std::string_view{
    "float circle_area_gpu(Circle c) {\n"
    "    float pi = 3.14f;\n"
    "    return c.r * c.r * pi;\n"
    "}\n"
};
auto circle_area_gpu = bc::make_function_from_source<float(Circle)>(
    "circle_area_gpu", src_code.data()
);
```

altavir

05/09/2021, 6:16 AM
Yes, and things like Aparapi do the same for Java. I think there is some idea to use Kotlin IR to produce kernels for CUDA/OpenCL, but it is not implemented yet.

Ролан

05/09/2021, 6:17 AM
Yes, I was looking at TornadoVM as well, it looks great

altavir

05/09/2021, 6:19 AM
I've never used it. At the time I was playing with OpenCL, it had only just appeared. But it is basically the same idea as in your sample: kernels are generated dynamically from the Java bytecode. Kotlin IR would be even more advanced in this regard, since it is higher level.

Ролан

05/09/2021, 6:20 AM
I don't know whether TornadoVM or Aparapi somehow integrates nvcc compilation on the fly, like Numba does in Python, or whether it's all OpenCL.

altavir

05/09/2021, 6:20 AM
🤷‍♂️