Do you know any time series manipulation and modelling libra kotlinlang #datascience

# datascience

Filipe Duarte

04/16/2021, 4:06 PM

Do you know any time series manipulation and modelling libraries for Kotlin?

Filipe Duarte

04/16/2021, 4:07 PM

Features like sliding window, and date-time, resampling and statistical linear models?

altavir

04/16/2021, 5:32 PM

Something could be done via kotlin-statistics. There is also a lot of this in Java. But in general, you need to be more specific.

Filipe Duarte

04/16/2021, 5:34 PM

• Model and forecast time series using ARIMA with a sliding window cross-validation.
• Resample time series frequency, e.g. seconds to minutes or daily to monthly.
• Transform a list of values into sublists of values depending on the time step.
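A minimal sketch of the sliding-window cross-validation splits mentioned above, using only the Kotlin standard library. The names `Split` and `slidingWindowSplits` are made up for illustration and are not part of any library; fitting an ARIMA model on each training window is left out, since that would need an external library.

```kotlin
// Hypothetical helper types, not a library API.
data class Split(val train: List<Double>, val test: List<Double>)

// Produces rolling-origin splits: each window holds trainSize points for
// fitting followed by testSize points for evaluation.
fun slidingWindowSplits(
    series: List<Double>,
    trainSize: Int,
    testSize: Int,
    step: Int = testSize
): List<Split> {
    require(trainSize > 0 && testSize > 0 && step > 0) { "sizes and step must be positive" }
    // windowed() yields every full window of trainSize + testSize points,
    // advancing by `step`; partial windows at the end are dropped by default.
    return series.windowed(size = trainSize + testSize, step = step) { window ->
        Split(train = window.take(trainSize), test = window.drop(trainSize))
    }
}

fun main() {
    val series = (1..10).map { it.toDouble() }
    for (split in slidingWindowSplits(series, trainSize = 4, testSize = 2)) {
        println("train=${split.train} test=${split.test}")
    }
}
```

The same `windowed` call, or `chunked` for non-overlapping blocks, also covers the third point about turning a list into sublists by a fixed number of steps.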

altavir

04/16/2021, 5:35 PM

I have not heard of anything Kotlin-first, but I am sure there is something in the Java world

Filipe Duarte

04/16/2021, 5:36 PM

Another thing: let's say I have a Java lib and want to use it in Kotlin while changing operators (doing operator overloading). Instead of using X.add(Y), I use X + Y. Is it possible?

Filipe Duarte

04/16/2021, 5:37 PM

Should I create additional operator methods or a function type? I ask this because I know that Java does not have operator overloading but has a gigantic ecosystem
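For what it is worth, Kotlin extension functions marked `operator` are one way to get `X + Y` on a Java type. The sketch below uses a stand-in class `Vec2` (hypothetical, defined here only to play the role of a Java library class that exposes `add`), plus one real JDK type, `java.time.Duration`.

```kotlin
import java.time.Duration

// Stand-in for a class from a Java library that only exposes add(); hypothetical.
class Vec2(val x: Double, val y: Double) {
    fun add(other: Vec2): Vec2 = Vec2(x + other.x, y + other.y)
}

// Extension operator: lets callers write a + b without touching the library class.
operator fun Vec2.plus(other: Vec2): Vec2 = this.add(other)

// Real JDK example: Duration has multipliedBy() but no times(), so map it.
operator fun Duration.times(factor: Long): Duration = this.multipliedBy(factor)

fun main() {
    val sum = Vec2(1.0, 2.0) + Vec2(3.0, 4.0)
    println("${sum.x}, ${sum.y}")

    // Duration.plus(Duration) is a Java method whose name and signature already
    // match the operator convention, so + works with no extension at all.
    val total = Duration.ofSeconds(30) + Duration.ofSeconds(45) * 2
    println(total)
}
```

Java methods that already happen to be named `plus`, `minus`, `times`, `get` and so on can be used as operators directly from Kotlin; the extension is only needed when the Java name differs, as with `add`.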

Ролан

04/16/2021, 6:21 PM

To be honest, I think Kotlin will be a great language for the execution part of algotrading. I wouldn't use it to develop the analytics and backtesting though. If you are absolutely desperate to use Kotlin for that as well, I would choose a powerful computational platform like deeplearning4j and build everything on top of it. I would also avoid apache commons.

Filipe Duarte

04/16/2021, 6:32 PM

I’m using python to develop the forecasting models and for research (PhD). I’m building models for intraday ... Data is huge and I am not used to building models with this data size. I’m thinking of using Spark for data wrangling

Filipe Duarte

04/16/2021, 6:33 PM

Maybe use Deeplearning4j to build deep learning models

Filipe Duarte

04/16/2021, 6:33 PM

Smile or Spark for machine learning

altavir

04/16/2021, 6:33 PM

Basic time-series manipulation does not require a lot of processing power. But not all tools are currently available in one place.
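As an illustration of how little is needed for basic manipulation, here is a sketch of resampling second-level data to minute frequency with `java.time` and the Kotlin standard library. `Tick` and `resampleToMinutes` are names invented for this example, and taking the mean of each bucket is just one possible aggregation.

```kotlin
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical data point: a timestamp and an observed value.
data class Tick(val time: Instant, val value: Double)

// Downsample to one point per minute, aggregating each bucket by its mean.
fun resampleToMinutes(ticks: List<Tick>): List<Tick> =
    ticks
        .groupBy { it.time.truncatedTo(ChronoUnit.MINUTES) } // bucket key = start of the minute
        .toSortedMap()                                       // keep buckets in time order
        .map { (minute, bucket) -> Tick(minute, bucket.map { it.value }.average()) }

fun main() {
    val start = Instant.parse("2021-04-16T16:00:00Z")
    // Two minutes of one-per-second observations with values 0, 1, 2, ...
    val ticks = (0 until 120).map { Tick(start.plusSeconds(it.toLong()), it.toDouble()) }
    resampleToMinutes(ticks).forEach { println("${it.time} -> ${it.value}") }
}
```

Daily-to-monthly resampling follows the same pattern, but the bucket key then depends on a time zone or calendar (for example a `YearMonth` derived from a `LocalDate`), which this sketch does not cover.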

Filipe Duarte

04/16/2021, 6:35 PM

Krangl has some features for data manipulation but I think it is not suited for big data

altavir

04/16/2021, 6:36 PM

Indeed, it is not. We are doing pretty big-data processing in Troitsk nu-mass analysis, which is close to time series, but it has a narrow focus and is probably not good for other tasks.

Ролан

04/16/2021, 7:07 PM

@Filipe Duarte you should use https://code.kx.com/q/learn/ for intraday data

Ролан

04/16/2021, 7:47 PM

Since you are an academic you will be fine with the license for kdb. If you don't want to learn q, for the kind of data you are talking about, you can go a long way with python combining numpy, pandas, numba and pytorch. For data storage you will be fine with h5, pt from pytorch. Spark was not designed to deal with this kind of data. You should avoid databases like cassandra as well.

👏 1

Filipe Duarte

04/16/2021, 8:32 PM

Thank you so much @Ролан! Do they give a license to academics to use kdb? I thought Spark fit time series data well. So in this case, for the sake of time management, it is best to follow your guidelines. Do you use Kotlin for high frequency financial data?

Ролан

04/16/2021, 9:32 PM

kdb 32 bit can be used by anyone, kdb 64 bit can be used for free on personal laptops with constraints on number of cores and RAM, for academics you get it for free with no constraints and you can run it on the cloud. I haven't seen anyone analysing market data in the JVM (execution yes of course, but that's different).

Filipe Duarte

04/16/2021, 9:35 PM

Do you use the Python API for kdb to do data analytics for market data?

Ролан

04/16/2021, 9:49 PM

I love q to be honest, but you can indeed work with pyq to call q from python, or vice-versa use embedpy to call python from q

👍 1

Ролан

04/16/2021, 9:52 PM

It is also interesting to note that for purely numerical data serialising to .h5 is only 2 times faster than using pytorch .pt, but reading from pytorch .pt .h5 files approximately

Filipe Duarte

04/16/2021, 9:55 PM

Are you talking about reading data from kdb+ and storing it using h5, then using pytorch to load this data?

Ролан

04/16/2021, 9:56 PM

kdb will be much faster than all this; I was talking about pure python/C++ here

Filipe Duarte

04/16/2021, 9:58 PM

but how to train the models with pytorch and then use the data with kdb+?

Ролан

04/16/2021, 9:59 PM

https://code.kx.com/q/interfaces/pyq/

😃 1

Ролан

04/16/2021, 10:00 PM

you can pass around numpy arrays without copying from q to python and vice-versa

👍 1

Ролан

04/16/2021, 10:01 PM

that's the beauty of buffer protocols and the reason why numpy is so useful in data analysis

Ролан

04/16/2021, 10:02 PM

we are lacking that in the JVM ecosystem, because you have to copy data all the time through FFI and that is very costly

Filipe Duarte

04/16/2021, 10:03 PM

because you could use the reference of the array?
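To make the reference-versus-copy distinction concrete, here is a small sketch that stays entirely inside the JVM. It only illustrates the difference between a view backed by the same memory and an independent copy; it is not a demonstration of the Python buffer protocol or of q/Python interop, and it makes no claim about what a particular native binding does.

```kotlin
import java.nio.DoubleBuffer

fun main() {
    val data = doubleArrayOf(1.0, 2.0, 3.0)

    // A copy owns separate memory: later writes to `data` are not visible in it.
    val copy = data.copyOf()

    // wrap() creates a view backed by the same array: no elements are copied.
    val view = DoubleBuffer.wrap(data)

    data[0] = 42.0
    println(copy[0])     // the copy still holds the old value
    println(view.get(0)) // the view sees the write
}
```

Passing a reference costs the same regardless of array size, whereas a copy costs time and memory proportional to the data, which is why copying on every call across a language boundary becomes expensive for large arrays.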

Filipe Duarte

04/16/2021, 10:06 PM

I am astonished ... kdb+ and the q language are a gem. It would be awesome if they integrated with the JVM

Ролан

04/16/2021, 10:07 PM

There is a client through sockets

Ролан

04/16/2021, 10:08 PM

whenever you do a trade it's very likely it will be executed on the JVM, but for analysis that data typically gets flushed to a kdb database

Filipe Duarte

04/16/2021, 10:17 PM

Understood. And do you use Kotlin for data analysis or something related?

Filipe Duarte

04/16/2021, 10:19 PM

I'm going to request the academic license for kdb+ and try it. Do you think it is worth it for me to continue learning Kotlin for financial data science and research (science experiments), thinking of the future?

Ролан

04/17/2021, 6:49 AM

Over the summer I actually plan to provide some bindings for pytorch to kotlin, which will add some of the data science tooling. My main motivation for doing so, though, is to bring powerful analytics tools closer to the data collection systems for doing realtime analytics. In trading, what I have typically seen is that ETL and collection systems are running on the JVM. The data is then sent via some broker (like kafka, redis etc., or even just flushed to a database, or just via sockets) to a whole different service, typically written in python, doing the analytics. The result of those analytics is then sent to a client/decision/execution platform (again typically running on the JVM). Now, some simple analytics can happen on the JVM straightaway but that is limited; Spark can help but it has very poor performance. For any other numerically intensive modelling you need to go native.
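As a rough idea of the kind of simple analytics that can run on the JVM right next to the data collection, here is a sketch of an incrementally updated rolling mean. The class name `RollingMean` is invented for this example; anything heavier (model fitting, large linear algebra) is outside its scope.

```kotlin
// Hypothetical helper: fixed-size rolling mean updated in O(1) per observation.
class RollingMean(private val window: Int) {
    private val values = ArrayDeque<Double>()
    private var sum = 0.0

    init {
        require(window > 0) { "window must be positive" }
    }

    // Adds one observation and returns the mean of the last `window` values
    // (or of all values seen so far while the window is still filling up).
    fun update(x: Double): Double {
        values.addLast(x)
        sum += x
        if (values.size > window) sum -= values.removeFirst()
        return sum / values.size
    }
}

fun main() {
    val mean = RollingMean(window = 3)
    for (price in listOf(1.0, 2.0, 3.0, 4.0)) {
        println(mean.update(price))
    }
}
```

Keeping a running sum avoids rescanning the window on every tick; for very long streams the accumulated floating-point error may call for periodically recomputing the sum from the stored values.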

🦜 1

Ролан

04/17/2021, 6:52 AM

That all said, there is a difference between the ability to run heavy analytics on the JVM and actually doing research. I think we are a bit far off on the latter.

Ролан

04/17/2021, 6:53 AM

So I would prototype something in python, and then instead of setting up that python code in production as a microservice, I would rewrite it for the JVM using bindings to libraries like tensorflow/pytorch.

Filipe Duarte

04/17/2021, 1:35 PM

So, for you, the ideal scenario would be to use the JVM with Kotlin for real-time analytics? In high-frequency trading, do you think the best way today is to use python to prototype the models and strategies and C++ for production, using machine learning libs like libtorch, tensorflow, mlpack? I’ve seen some posts on HN about using the JVM for HFT. Some firms are using it, but I don’t know much of the details. Maybe they operate at not so high a frequency.

altavir

04/17/2021, 1:40 PM

@Filipe Duarte JVM/Kotlin are optimal for any use case where you do not have a limit on memory consumption. The limitation right now is not in the language/platform, but in the availability of libraries.

Ролан

04/17/2021, 1:51 PM

@altavir is biased 😀

Ролан

04/17/2021, 1:52 PM

A lot of HFT are moving indeed to JVM from C++. But this is execution, it has nothing to do with analytics

Ролан

04/17/2021, 1:55 PM

I prototype in python indeed, and then write in C++ but mainly because I want to have the flexibility to integrate my algorithms with any kind of execution platform. People use C#, Go, Python, NodeJs, even Elixir etc. If I write something kotlin-jvm specific, how would I later integrate it with Go if needed?

altavir

04/17/2021, 1:55 PM

I have experience in C++, Python and some in Julia. The thing is that if you need to do something once and never touch it again, it does not really matter which language you take. This is why academics tend to take C++ (for flexibility) and Python (for simplicity). But the problem is that in my experience there is no such thing as a program for a single use. There is always some kind of support involved. And for support you need a good ecosystem. Python is considered good for prototyping and indeed, even biased as I am, I am teaching python to physicists, but it all works well only as long as we do not want to turn those prototypes into software.

Ролан

04/17/2021, 1:57 PM

I don't want to reinvent the wheel all the time tbh

altavir

04/17/2021, 1:59 PM

Indeed, but when we are doing not algorithms but full-scale software development, prototyping does not really work. You can't just translate something from python to, say, C++ or Java without doing the double work of rewriting and optimizing everything. Or you need to replicate familiar tools on every platform. Which is what you do now. My point is that right now the only advantage of the Python ecosystem is the number of libraries existing there.

altavir

04/17/2021, 2:05 PM

Right now the only salvation is that most of the major python libraries are done by large corporations which invest a lot in code quality, but I think that as soon as more people move from C++ to Python there will be the same problem with code quality and maintenance.

Ролан

04/17/2021, 2:43 PM

I am talking more about the maths. This is always very tricky to migrate. So I prefer to write that in C/C++ and keep it there. Then of course, because C++ is not so much fun for larger scale applications, you integrate it with other languages. JVM is certainly not the nicest club for mixing native but it's alright.

altavir

04/17/2021, 2:50 PM

Indeed. The problem with that solution is that you need a highly qualified person like yourself to adapt that solution to specific needs. In science we usually do not have such people available. On the other hand, record performance is not required either. Therefore we need something like python with good maintenance. In other words, Kotlin or Julia. My own choice of Kotlin over Julia has two reasons: much better tooling and support for the industrial non-computing ecosystem (networking, data storage, etc).

Filipe Duarte

04/18/2021, 3:26 AM

https://stackoverflow.blog/2021/02/22/choosing-java-instead-of-c-for-low-latency-systems/

Filipe Duarte

04/18/2021, 3:30 AM

When we read Java, could we think of Kotlin as well?

altavir

04/18/2021, 5:58 AM

Yep. And you can listen to some of Roman Elizarov's talks. He has experience with low-latency finance development in Java. C++ is probably irreplaceable in GameDev, where you need low-level access to the hardware.

altavir

04/18/2021, 6:00 AM

Things like Torch/TensorFlow as well, as long as they are hardware-specific

Filipe Duarte

04/19/2021, 1:56 PM

@altavir could you share links to some of these talks?

Filipe Duarte

04/19/2021, 2:28 PM

@Ролан which libs and frameworks do you use for machine learning, not deep learning? Let’s say... you are building experiments with several methods when prototyping. I have been using Python all the way, but with a new challenge, the high-frequency data, I was thinking of testing the JVM with Kotlin or Scala to do everything, prototype, and deploy. It is not algorithmic trading, it’s more an analytics and forecasting project with high-frequency financial data.

Filipe Duarte

04/19/2021, 2:38 PM

It’s going to be a system that automates the forecasting of some variables. I’m going to analyze the models and test new ideas. As you guys said, Python is ok for prototyping the idea, but not for production. So, I would like to have a monolithic system, using the same language for everything. But I am open to everything, combining languages if necessary.

Ролан

04/19/2021, 4:52 PM

@Filipe Duarte I recommend you use the resources available on the kdb website. That stack was specifically designed to help people solve the problems you are working on.

👍 1

Filipe Duarte

04/19/2021, 6:28 PM

I’m going to dive deep

Filipe Duarte

04/19/2021, 6:39 PM

The firm uses the kdb+
