# datascience
a
@Maria Khalusova It would be nice to have a discussion about API design. It won't do to just copy the Python API in Kotlin.
m
what particular API are you talking about?
a
TensorFlow/Keras/whatever needed for DS
I am not doing machine learning, so it is hard for me to design the API.
m
Let's chat, of course 🙂
a
Basically, what I want from ML/DS guys are some use cases, in order to understand how best to write an API for them in kmath. Sadly, I usually get something like "like Python", which is not helpful since I am not doing it in Python and I am not sure that Python does it in the most productive way.
m
What sort of API for ML do you want to have in kmath?
a
I am not sure. That is why I am asking. ML could be developed as a separate library, but if we are talking about multiplatform APIs, it is based on mathematics. For example, a lot of people say that they need linear algebra for that, but what specific operations do they need?
The TensorFlow operations from the article would look significantly better with kmath contexts. But I am not sure if it is needed.
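As a rough illustration of what "contexts" could mean here, a minimal Kotlin sketch: arithmetic operators are declared inside a context object and only become available inside a `with` block. The names `Vec`, `VectorContext`, `RealVectorContext` and `scaleAndShift` are hypothetical and written for this example only; they are not the kmath API.

```kotlin
// Hypothetical types for illustration only; this is NOT the kmath API.
class Vec(val values: DoubleArray)

// The context declares which operations exist for Vec.
interface VectorContext {
    operator fun Vec.plus(other: Vec): Vec
    operator fun Vec.times(k: Double): Vec
}

// One possible implementation: element-wise arithmetic on doubles.
object RealVectorContext : VectorContext {
    override operator fun Vec.plus(other: Vec): Vec {
        require(values.size == other.values.size) { "size mismatch" }
        return Vec(DoubleArray(values.size) { i -> values[i] + other.values[i] })
    }

    override operator fun Vec.times(k: Double): Vec =
        Vec(DoubleArray(values.size) { i -> values[i] * k })
}

// The operators resolve only because RealVectorContext is in scope here.
fun scaleAndShift(x: Vec, k: Double, y: Vec): Vec =
    with(RealVectorContext) { x * k + y }

fun main() {
    val result = scaleAndShift(
        Vec(doubleArrayOf(1.0, 2.0)),
        2.0,
        Vec(doubleArrayOf(10.0, 20.0))
    )
    println(result.values.toList()) // prints [12.0, 24.0]
}
```

Swapping in a different context object would change how `+` and `*` are computed without touching the expression itself, which is presumably the appeal over a direct port of the Python calls.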
m
I'm sorry it took forever to get back to you. How about we have this discussion some time in May? It is still a little too early, I think.
a
It would be cool. I think that one of the main reasons why the Python ecosystem is comfortable for users is that it has a common data exchange format and conventions shared by all packages (numpy/pandas). We need the same level of centralized design (not implementation) in Kotlin to attract people.
j
side by side examples would help to understand your concerns. Keras is a wrapper for multiple native wrappers. i wouldn't agree that pandas and keras pose the same python impedances.
a
I did not compare keras with pandas, I did not talk about python. The only thing I said is that Python has the benefit of common ground between libraries. Mostly numpy and pandas.
j
do you have any specific python ecosystem examples that you can present a proposed kotlinized analog for?
because of type safety you have to suspend disbelief in one or the other ideology, in my experience, to imagine that the python snippet and kotlin snippet represent the same intent.
a
They do not. I was talking about unified API.
j
kotlin has a long way to go to occupy similar space with python. this gist shows the pandas snippet and the pandas analog I've written performing the same work. while the python is 3 lines, the kotlinc overheads (for lack of knowing any way to make this more concise) and the commandline trial and error were far larger: https://gist.github.com/jnorthrup/74e1960bf8d5c4dbc5a9394806118bf0
your point looks like you are advocating for a solution in search of a problem.
m
Hi! I'm joining in on this late, but I've been interested in having TensorFlow and something akin to Keras models available in Kotlin as well. I've started a project. Note that I've just very recently begun experimenting. It's very superficial right now, and has a long way to go. However, this concept excites me. I'm working on it in my spare time, but if anyone's interested in working on this with me, I'd be very happy to work with folks on this, and hopefully build something that makes sense. Perhaps if more folks join on the same project this can become a reality? If anyone has a better code base already started, or better ideas, I'm 100% onboard with dumping my very beginning project and starting with something that's further along. I was also looking into adopting kmath, but have not yet had time to fully vet this out. My goal is to have this work eventually with both Kotlin JVM as well as Native. I set a site up for this: https://tensorflow-kotlin.dev The github repo is here: https://github.com/TensorFlow-Kotlin/tensorflow-kotlin
a
kmath requires a lot of feedback and community work on API design in order to become something more than a scientific toy project.
m
Gotcha. Makes sense. I might skip kmath for now and leave it as a todo while I try to get more of the other features built-out. Of course, I realize that going down the road without all the necessary community feedback could make things difficult later. I think right now I'm mainly trying to see what kind of operations and model types I can get built out and running. I really wish the Java or C API were released for TF2 but may have to wait a bit longer for that, or attempt implementing something. But I suppose that goes back to your earlier conversation about needing a consistent design that works for the community.
a
If you have any ideas, please write them here or in the kmath tracker. We have some manpower that could be spent on implementing urgent features or even contributing to an external project.
m
Thanks. Will do! I’ll collect my thoughts and send over hopefully this weekend.
j
@mattmoore do you have a benchmark project you can share in open source? I have more or less built the parts of pandas I have used in client projects to set up keras forecast pipelines. https://github.com/jnorthrup/columnar once you escape the quantity of local memory in a dataset using pandas overheads, whatever those overheads may add up to, you hit an exponential slowdown with system swap. i wrote this specifically to address that slowdown first as kotlin with NIO, though I'm inclined to say this would be better suited to any native address model with mmap features available than the NIO options, but it works well enough so far.
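For readers unfamiliar with the NIO approach mentioned above, a minimal sketch of reading a fixed-width column of doubles through a memory-mapped file, so the data is paged in by the OS rather than copied onto the JVM heap. The function `sumMappedDoubles` and the file layout (raw big-endian doubles, nothing else) are assumptions made for this example; this is not the columnar library's API or format.

```kotlin
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardOpenOption

// Hypothetical helper: sums a file of raw big-endian doubles via a read-only mapping.
// A single mapping is limited to Int.MAX_VALUE bytes, so larger files would
// need to be mapped in several windows; that is left out of this sketch.
fun sumMappedDoubles(path: Path): Double {
    FileChannel.open(path, StandardOpenOption.READ).use { channel ->
        val buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
        var sum = 0.0
        while (buffer.remaining() >= Double.SIZE_BYTES) {
            sum += buffer.getDouble() // advances the buffer position by 8 bytes
        }
        return sum
    }
}

fun main() {
    // Build a tiny three-value column file to read back.
    val path = Files.createTempFile("column", ".bin")
    path.toFile().deleteOnExit()
    val bytes = ByteBuffer.allocate(3 * Double.SIZE_BYTES)
        .putDouble(1.5).putDouble(2.5).putDouble(4.0)
        .array()
    Files.write(path, bytes)

    println(sumMappedDoubles(path)) // prints 8.0
}
```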
m
@jimn I have a project barely started to build a TensorFlow/Kotlin library, based on 1.15.0. It is nowhere near ready for use. I started this because it's something I've wanted for a while and other folks have commented to me they'd love to have this option available to them. I'm very interested in your solution. This weekend I'm planning to spend some time looking more in-depth at kmath as well as taking a look at your work with columnar. I'd totally post this in opensource, but I just want to make sure it's clear that my project is very early stage. I've looked at most of what needs to be done to get this built and it will take a lot of work. If folks are willing to help with the understanding it's a beginning effort, that would be awesome. I'd love to have help on this. I've seen several articles around about folks trying to do various things in TensorFlow with Java or Kotlin but I think in order to make TF/Kotlin a non-toy reality, there needs to be a solid project started where efforts can be focused. The goal of my project isn't just to have TensorFlow for Kotlin JVM, but to provide a more comprehensive API for the entire ML workflow. I'm also not entirely sure I want this to be available just for Kotlin. As I've thought over it more, I'm wondering if it makes sense to build this out as a more language-agnostic solution that can be more easily bound to various languages. As a disclaimer: I'm a software engineer who's been in the industry for over 15 years. I've built some ML solutions professionally, but I am by no means an expert in the data science industry. I do have a passion for this topic though, and am hoping to do what I can to help make this more of a reality for Kotlin folks. I would need help from the data science community to ensure that what I'm trying to build makes sense for them, otherwise it's a pointless project 🙂 I've started a channel #tensorflow that perhaps we can take this discussion into and maybe start sharing ideas and building this out.
j
I'm happy to build a straw man unit test to demonstrate a given data prep use case you want to bring up. I think I'll reserve judgement on the packaging language of TF itself. if there is an idiomatic improvement that kotlin can achieve over the existing keras, I'm game; the TF swift api has made some improvements, though that's not an environment i have at my disposal.