Hi folks We are happy to share with you a very first preview kotlinlang #datascience


Pavel Gorgulov

12/28/2020, 9:00 AM

Hi folks! We are happy to share with you the very first preview of Multik, a library of multidimensional arrays we are working on. It provides a Kotlin-idiomatic, type- and dimension-safe API for mathematical operations on arrays. The API has two pluggable implementations, Kotlin/JVM and native (via JNI), and they can be switched on the fly to get the best performance. Right now we are in the first alpha stage and we are dying for your feedback. Read more: https://github.com/Kotlin/multik

👍 5

K 3

⏸️ 1

🔢 1

👀 2

🤔 2

altavir

12/28/2020, 9:35 AM

Is it possible to use custom types and operations in it, or are you limited to primitives?

altavir

12/28/2020, 9:36 AM

And I think it is closer to the #mathematics topic.

altavir

12/28/2020, 9:42 AM

It seems like you've taken a numpy approach, whereas we (kmath) took the Commons Math approach. I have tons of questions.

altavir

12/28/2020, 10:21 AM

cc @Iaroslav Postovalov

Pavel Gorgulov

12/28/2020, 10:37 AM

Yes, right now we are limited to primitives, since we use JNI critical array access for native code. But for the JVM, we may support other types in the future.

altavir

12/28/2020, 10:44 AM

I took a brief look at both the API and the implementation. It is really interesting, because this design is exactly what I decided NOT to do after my experience with numpy. I would really like to discuss it in more detail. Also, our tests showed that a direct native interface does not give any significant performance boost by itself (without cache optimization and other things), so it would be nice to discuss that as well. If you are interested, I can organize a JBR seminar on this.

Pavel Gorgulov

12/28/2020, 2:57 PM

I'm happy to discuss it, but I think it's too early to do it as a seminar.

altavir

12/28/2020, 2:57 PM

A discussion meeting then.

Michal Harakal

12/28/2020, 3:48 PM

@altavir could you please elaborate on the numpy vs. commons-math approach? Do you mean API style or underlying implementations?

altavir

12/28/2020, 3:56 PM

Both. Numpy gives a limited selection of ndarray components exposed by the underlying C implementation. The operations on those structures are hard-coded for the given structure. It means that you can't easily just add a complex number type. You need to add it everywhere and add new operations for it to the top-level structure. Also, you can't add new operations, because you need to "propagate" operations inside the ndarray (in order to know how to sum two arrays, you need to know how to sum their components). This approach is easier to implement, though. Later incarnations of Commons Math and Commons Numbers take another approach. They define generic algebras separated from the actual elements. So one can define an ndarray of anything if the appropriate algebra is supplied. In Java it looks rather cumbersome, but Kotlin scoped functions allow us to do it with minimal syntactic overhead. Making it work with the same performance is really hard, but I think that we managed it in KMath. What is more important, we managed to abstract away the inner structure of the nd-array, so one can use KMath as a wrapper API for any external library. For example, we can wrap Multik if it provides superior performance.
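The "algebra separated from elements" idea described above can be sketched in a few lines of Kotlin. This is a minimal illustration under stated assumptions, not the KMath API: the names Ring, Complex, ComplexRing, Vec and VecRing are all hypothetical and exist only for this example. The point is that the vector algebra is built from whatever element algebra is supplied, and the operators are only visible inside the algebra's scope.

```kotlin
// Hypothetical names throughout; this is an illustration, not the KMath API.
interface Ring<T> {
    val zero: T
    fun add(a: T, b: T): T
    fun multiply(a: T, b: T): T

    // Operators are member extensions, so they exist only inside the algebra's scope.
    operator fun T.plus(other: T): T = add(this, other)
    operator fun T.times(other: T): T = multiply(this, other)
}

data class Complex(val re: Double, val im: Double)

// The element algebra lives outside the element type itself.
object ComplexRing : Ring<Complex> {
    override val zero: Complex = Complex(0.0, 0.0)
    override fun add(a: Complex, b: Complex): Complex = Complex(a.re + b.re, a.im + b.im)
    override fun multiply(a: Complex, b: Complex): Complex =
        Complex(a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re)
}

data class Vec<T>(val items: List<T>)

// An element-wise algebra over vectors, derived from any supplied element algebra.
class VecRing<T>(private val elements: Ring<T>, private val size: Int) : Ring<Vec<T>> {
    override val zero: Vec<T> = Vec(List(size) { elements.zero })
    override fun add(a: Vec<T>, b: Vec<T>): Vec<T> =
        Vec(a.items.zip(b.items) { x, y -> elements.add(x, y) })
    override fun multiply(a: Vec<T>, b: Vec<T>): Vec<T> =
        Vec(a.items.zip(b.items) { x, y -> elements.multiply(x, y) })
}

fun main() {
    val a = Vec(listOf(Complex(1.0, 2.0), Complex(0.0, 1.0)))
    val b = Vec(listOf(Complex(3.0, -1.0), Complex(0.0, 1.0)))
    // The scope function brings the algebra's operators into play.
    val sum = with(VecRing(ComplexRing, 2)) { a + b }
    println(sum)
}
```

Adding a new element type here means writing one new Ring implementation; the vector code does not change. The cost is the boilerplate of defining and passing the context.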

👍 2

altavir

12/28/2020, 4:02 PM

As I already said, we considered the numpy approach, and there is actually a Kotlin library that follows it: https://github.com/kyonifer/koma. But in the end I decided that we can do better in Kotlin.

Michal Harakal

12/28/2020, 4:04 PM

Thank you for your answer, now I have some more topics to go through 🙂 I find your idea with context in kmath very interesting...

altavir

12/28/2020, 4:05 PM

It is much harder to implement, but in the end I think it is better. And we managed to achieve the level of performance of numpy for simple operations without using native parts.

altavir

12/28/2020, 4:09 PM

Still, as I said, it is harder. So it is quite possible there is a middle ground.

Michal Harakal

12/28/2020, 4:13 PM

I believe what has become important is interoperability/reuse with other libraries, e.g. integration with deep learning. This is what makes numpy so powerful in the Python world, besides being efficient etc.: I can use it everywhere. Also, multiplatform/native can become a huge advantage...

altavir

12/28/2020, 4:19 PM

Again, there are two different philosophies. Numpy is universal because everything is based on numpy. You can't use something different. I do not think this approach could be used in the Java/Kotlin ecosystem, because we already have a lot of libraries using different data formats. KMath is mostly a wrapper API (though it has its own default implementations), so it is fully compatible with existing libraries by design. Recently @Iaroslav Postovalov even added a binding for the native GSL libraries, with the same API and automatic conversions. We also have integration with another JBR product, Viktor, and with ND4J (also thanks to Iaroslav). The documentation is very thin though (help needed here!). But again, it requires some boilerplate to define a proper context algebra and is much more sophisticated.

Iaroslav Postovalov

12/28/2020, 5:33 PM

gsl libraries -> gnu scientific library

👌 3

breandan

12/29/2020, 12:24 PM

Nice! Looking forward to trying this out. Re: DSL/API design. Agree there is lots of room for improvement over NumPy. The J/K family of languages brought some interesting ideas to the table, and I think there are still unexplored areas for a type-safe DSL to contribute. Was recently reading an interesting comparison of index-based vs. index-free styles of array programming. Worth a glance if you haven't seen it: https://futhark-lang.org/blog/2020-12-28-futhark-and-dex.html

👍 2

Pavel Gorgulov

12/29/2020, 12:48 PM

@altavir I think we'll talk next year then, at any time convenient for you.

✔️ 2

breandan

12/30/2020, 10:34 AM

I really like the gradual type system. Inference seemed to work well on the examples I tried. Have you thought about whether there is a good way of adding dimension types? It would be neat if it were possible to infer the length of each dimension. I think there might be a way to synthesize dimension types up to a fixed length, but maybe it is better to wait until if/when the language supports dependent types, instead of shipping an ad hoc shape system in the DSL. One minor issue I noticed: it seems like operators are asymmetric. Do you plan to support

operator fun Number.<op>(other: MultiArray<T, D>)

Importing operators on demand with <Alt>+<Enter> gets a little tedious. Maybe there is an IDE/API solution to default to import on demand from the

org.jetbrains.kotlinx.multik.ndarray.operations.*

extensions. The IteratingNdArray extensions are a nice touch. I wonder if there is a way to inherit from Iterable without clashing with the operator overloads, or some other language pattern that would avoid reinventing the wheel here. Is the linear algebra API more or less stable, or what is the planned API surface? In particular, it would be helpful to know if Multik intends to support boolean arrays and the broader NumPy linalg API or just a subset of those features: https://numpy.org/doc/stable/reference/routines.logic.html https://numpy.org/doc/stable/reference/routines.linalg.html
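To make the operator asymmetry concrete, here is a minimal sketch of how a scalar-on-the-left operator can be supplied as an extension on Number. The Vec class is a hypothetical stand-in written for this example; it is not Multik's MultiArray, and nothing here reflects how Multik implements or plans to implement it.

```kotlin
// Hypothetical stand-in for an array type; not Multik's MultiArray.
class Vec(val data: DoubleArray) {
    // array + scalar is a member, so it is always available.
    operator fun plus(scalar: Number): Vec =
        Vec(DoubleArray(data.size) { data[it] + scalar.toDouble() })
}

// scalar + array has to be an extension on Number, declared separately.
// Without it, `2 + v` does not compile even though `v + 2` does.
operator fun Number.plus(other: Vec): Vec = other + this

fun main() {
    val v = Vec(doubleArrayOf(1.0, 2.0))
    println((v + 2).data.toList())
    println((2 + v).data.toList())
}
```

Because the second operator is a top-level extension, it has to be imported wherever it is used, which is the import friction mentioned above.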

altavir

12/30/2020, 11:27 AM

Iterable won't work because plus operation concatenates them by default. We had to introduce our own structures because of that. The iterator itself is bad for primitive performance.
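The clash is easy to reproduce with the standard library alone. The sketch below shows that plus on lists concatenates, and contrasts it with an element-wise sum over a primitive IntArray, which avoids boxing. The helper name elementwiseSum is made up for this example.

```kotlin
// Hypothetical helper: element-wise sum over primitive arrays, no boxing and no iterator.
fun elementwiseSum(a: IntArray, b: IntArray): IntArray {
    require(a.size == b.size) { "size mismatch" }
    return IntArray(a.size) { a[it] + b[it] }
}

fun main() {
    val a = listOf(1, 2, 3)
    val b = listOf(10, 20, 30)

    // The standard library defines plus on collections as concatenation.
    println(a + b) // [1, 2, 3, 10, 20, 30]

    // An element-wise sum has to be spelled out, and List<Int> stores boxed values.
    println(a.zip(b) { x, y -> x + y }) // [11, 22, 33]

    println(elementwiseSum(intArrayOf(1, 2, 3), intArrayOf(10, 20, 30)).toList()) // [11, 22, 33]
}
```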

Hampus Londögård

01/01/2021, 9:27 AM

@altavir hi, I got interested in kmath. How mature is it? There has been only one change (readme) in the last three months, and the release is not 1.0. Can you do simple arithmetic (*/+-), and does it have things like PCA and distances (cosine, Euclidean, etc.)? The documentation is a bit lacking from what I see.

Hampus Londögård

01/01/2021, 9:29 AM

I'm basically building machine learning tools (mostly NLP-related) in Kotlin and am now deciding on what math lib to use rather than my own FloatArray-extensions

altavir

01/01/2021, 9:35 AM

@Hampus Londögård KMath is in an experimental stage and will be there for a while. It is a huge endeavor, and currently we have only three people working on it when time allows. Also, we are experimenting a lot with the API (to use Kotlin features to their maximum), so I do not think the API outside the core module will be stable in the near future. And of course the documentation is a huge problem. On the other hand, we do not implement everything from scratch and prefer to create connectors to already existing libraries, so the implementations themselves are sound, and the main point is that you can switch implementations at any moment without breaking the API. The geometry module is on the near-term TODO list (some things are already there, but major new features are planned). I think that at this moment it is better to work together than to create a lot of incompatible tools, so I would really appreciate it if you could describe your case in an issue on GitHub, and we can see if the features are already there or if we need to implement anything new. PRs and new modules are always welcome. The detailed discussion could be done in #mathematics.

altavir

01/01/2021, 9:44 AM

Operations on buffers are quite efficient right now. If I remember correctly, we do not have anything specialized for float-32, but it is easy to add.

Hampus Londögård

01/01/2021, 10:53 AM

@altavir thanks! That is sound, I'll join the #mathematics channel!👍

Pavel Gorgulov

01/11/2021, 11:11 AM

@breandan I tried dimensional types but settled on the current version. Yes, I plan to support operators for Number. There is one more annoying thing with operators: plusAssign, timesAssign and so on. These are in-place operators and don't work with variables declared as var. Alexander is right about Iterable. In addition, Iterable operations often return a List as a result, which is not suitable for us. The API will expand. This applies in particular to linear algebra. Logical array support has not been discussed yet.
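The var limitation comes from Kotlin's rules for augmented assignment: when a type defines both plus and plusAssign, the statement x += y on a var is rejected as ambiguous, because it could mean either an in-place update or a reassignment. A minimal sketch, using a hypothetical Buf class rather than any Multik type:

```kotlin
// Hypothetical buffer type with both a pure and an in-place addition.
class Buf(val data: DoubleArray) {
    operator fun plus(other: Buf): Buf =
        Buf(DoubleArray(data.size) { data[it] + other.data[it] })

    operator fun plusAssign(other: Buf) {
        for (i in data.indices) data[i] += other.data[i]
    }
}

fun main() {
    val a = Buf(doubleArrayOf(1.0, 2.0))
    a += Buf(doubleArrayOf(10.0, 20.0)) // val: resolves to plusAssign, mutates in place
    println(a.data.toList()) // [11.0, 22.0]

    var b = Buf(doubleArrayOf(1.0, 1.0))
    // b += a   // does not compile: both plus and plusAssign are applicable to a var
    b = b + a   // explicit reassignment through plus works
    b.plusAssign(a) // explicit in-place update works as well
    println(b.data.toList()) // [23.0, 45.0]
}
```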

👍 1

breandan

01/19/2021, 6:50 PM

Makes sense, thanks for the info! I didn't look closely, so I'm not sure if it's possible to use NDArrays to index other NDArrays, but usually indexing has specific semantics. For example, two equal-length dimensions can be contracted without error, but may not produce a semantically valid result. This is the named tensor idea I mentioned, which PyTorch recently added support for. Perhaps you have already seen it: https://namedtensor.github.io/ Some possible notations:

nd1.contract(index){nd2}

(nd1*nd2)[index]

Re: Iterable. I've seen this pattern in other Kotlin libraries and it makes sense, although I wonder if there is a language solution to reduce duplication somehow.
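The named-dimension point can be illustrated with a small sketch in which each axis carries a name as well as a length, and contraction is refused when the names differ even if the lengths match. All names here (Axis, NamedMatrix, contract) are hypothetical and written only for this example; they are not Multik, KMath or PyTorch API.

```kotlin
// Hypothetical types: an axis is identified by its name and its length.
data class Axis(val name: String, val size: Int)

class NamedMatrix(val rows: Axis, val cols: Axis, val data: DoubleArray) {
    init {
        require(data.size == rows.size * cols.size) { "data does not match shape" }
    }

    // Row-major element access.
    operator fun get(i: Int, j: Int): Double = data[i * cols.size + j]
}

// Contracts a's column axis with b's row axis; both name and length must agree.
fun contract(a: NamedMatrix, b: NamedMatrix): NamedMatrix {
    require(a.cols == b.rows) { "cannot contract ${a.cols} with ${b.rows}" }
    val out = DoubleArray(a.rows.size * b.cols.size)
    for (i in 0 until a.rows.size) {
        for (j in 0 until b.cols.size) {
            var s = 0.0
            for (k in 0 until a.cols.size) s += a[i, k] * b[k, j]
            out[i * b.cols.size + j] = s
        }
    }
    return NamedMatrix(a.rows, b.cols, out)
}

fun main() {
    val x = Axis("x", 2)
    val y = Axis("y", 2)
    val z = Axis("z", 2)
    val identity = NamedMatrix(x, y, doubleArrayOf(1.0, 0.0, 0.0, 1.0))
    val m = NamedMatrix(y, z, doubleArrayOf(1.0, 2.0, 3.0, 4.0))

    println(contract(identity, m).data.toList())

    // Same lengths, different names: rejected instead of silently computed.
    println(runCatching { contract(m, m) }.isFailure)
}
```

The check runs at run time here; encoding axis names in the type system would move it to compile time, at the cost of a heavier API.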

altavir

01/19/2021, 6:53 PM

Hah, I actually did those named matrices four or five years ago, for a completely different reason of course: I needed to name correlation matrices for multi-dimensional fits. The current implementation uses the KMath Symbol instead of strings. Symbol was designed to designate variables in expressions, but it actually works fine for naming dimensions as well.
