# datascience
Yingding Wang
Is there a Kotlin/Native dataflow or pipeline engine available that I can also run on embedded devices such as Android? Unfortunately, Spark is not an option for me.
Nikita Klimenko [JB]
Kotlin DataFrame (https://github.com/Kotlin/dataframe) targets the JVM and should work on Android too.
altavir
@Nikita Klimenko [JB] It is not a dataflow pipeline.
@Yingding Wang Could you please clarify what exactly you need? DataForge is a pipeline engine and it could work on Kotlin/Native, but it is a research library, so it has to be adapted for each project.
Yingding Wang
@altavir I want to do transfer learning with TFLite on Android. I am building a small ML framework for Kotlin that abstracts away the TFLite-specific code, something very similar to TFX (TensorFlow Extended) in Python, and I have started writing some prototype code. I thought that if a lightweight dataflow or pipeline engine already exists, I could reuse it and integrate it into my Android Kotlin mlflow framework as a first step. In most cases a dataflow or pipeline engine already has an orchestration mechanism that I can reuse, so I would only need to write task containers that trigger data processing or training tasks. Such a pipeline engine doesn't need its own DSL; a small in-memory scheduler would be just fine. Ideally it would also have some annotation mechanism so that I can extend tasks for Android. Otherwise I will need to write all of this on my own. 🙂
@altavir Could you give me some more details about the DataForge pipeline engine? Maybe an example that I can try out?
I have rephrased my question to ask about "Kotlin" instead of "Kotlin/Native". I think I can leave out NNAPI on Android or iOS at first, to make my life somewhat easier at the beginning. 🙂
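The kind of engine described above (task containers with dependencies, a small in-memory scheduler, no DSL) can be sketched in a few dozen lines of plain Kotlin. This is a hypothetical minimal sketch, not the API of DataForge, TFX, or any other library: `Task`, `Pipeline`, `task()`, `executionOrder()` and `run()` are illustrative names. It runs tasks sequentially on the calling thread, keeps every result in memory, and the "training" step is only a placeholder string where a real TFLite call would go.

```kotlin
// Hypothetical minimal pipeline engine; names are illustrative only.

/** A named unit of work that consumes the results of the tasks it depends on. */
class Task(
    val name: String,
    val dependsOn: List<String>,
    val action: (inputs: Map<String, Any?>) -> Any?
)

/** A tiny in-memory scheduler that runs registered tasks in dependency order. */
class Pipeline {
    private val tasks = LinkedHashMap<String, Task>()

    /** Registers a task; dependencies may be registered later. */
    fun task(name: String, vararg dependsOn: String, action: (Map<String, Any?>) -> Any?) {
        require(name !in tasks) { "Duplicate task: $name" }
        tasks[name] = Task(name, dependsOn.toList(), action)
    }

    /** Topological order via depth-first search; fails on unknown tasks or cycles. */
    fun executionOrder(): List<String> {
        val order = mutableListOf<String>()
        val done = mutableSetOf<String>()
        val inProgress = mutableSetOf<String>()

        fun visit(name: String) {
            if (name in done) return
            check(name !in inProgress) { "Cycle detected at task: $name" }
            val task = tasks[name] ?: error("Unknown task: $name")
            inProgress += name
            task.dependsOn.forEach { visit(it) }
            inProgress -= name
            done += name
            order += name
        }

        tasks.keys.forEach { visit(it) }
        return order
    }

    /** Runs every task once, passing it a map of its dependencies' results. */
    fun run(): Map<String, Any?> {
        val results = LinkedHashMap<String, Any?>()
        for (name in executionOrder()) {
            val task = tasks.getValue(name)
            val inputs = task.dependsOn.associateWith { results[it] }
            results[name] = task.action(inputs)
        }
        return results
    }
}

fun main() {
    val pipeline = Pipeline()
    // Registered out of order on purpose: the scheduler resolves dependencies.
    pipeline.task("train", "preprocess") { inputs ->
        val samples = inputs.getValue("preprocess") as List<*>
        "trained on ${samples.size} samples" // placeholder for a real training step
    }
    pipeline.task("load") { listOf(1, 2, 3, 4) }
    pipeline.task("preprocess", "load") { inputs ->
        (inputs.getValue("load") as List<*>).map { (it as Int) * 2 }
    }
    println(pipeline.executionOrder()) // [load, preprocess, train]
    println(pipeline.run()["train"])   // trained on 4 samples
}
```

A depth-first topological sort is used here because it rejects unknown dependencies and cycles before any task runs. Running independent tasks concurrently (for example with coroutines) or discovering tasks through annotations would be natural extensions, but they are not shown.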
altavir
For TensorFlow Lite you can just use the TensorFlow Java API; I think it will be enough. As for DataForge, it is here: https://github.com/SciProgCentre/dataforge-core. It was developed for much more complicated workflows (particularly, data analysis in particle physics). Here is an example of a task definition: https://github.com/SciProgCentre/dataforge-core/blob/5406a6a64c872159d4cb03a5ee757[…]otlin/space/kscience/dataforge/workspace/SimpleWorkspaceTest.kt. The problem is that it is a research library, so it is not fully "production ready"; it needs to be adjusted for each case. You are welcome to collaborate on the development, though.
On the bright side, it is much better suited than Spark to working in non-cluster or even heterogeneous/polyglot environments.
Yingding Wang
Thanks!