# datascience
s
Is there any good library available that supports loading/writing data from/to disk? E.g. pandas supports writing the Feather format (I use it a lot) and HDF5 files (I use those a lot too). I've searched around for a few days and can't seem to find any library that supports that...
a
Krangl has some operations for that.
As for HDF5, no, there is no appropriate reader. The only library that supports it outside Python is NetCDF, and its API is not easy. The JVM ecosystem has its own nice formats, like Parquet. Feather is based on Arrow, so there should be some support for it. I'd be interested in playing around with Arrow if you want to collaborate.
s
Thanks for mentioning Parquet; it looks like pandas supports the Parquet format. I looked at the krangl DataFrame API, and it doesn't seem to support Parquet: http://holgerbrandl.github.io/krangl/javadoc/krangl/krangl/-data-frame/index.html My main use case is producing a DataFrame (time series), saving it to a file, and then loading it into Python for analysis and visualization. For now I'm using CSV files, which works but is a bit slow...
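For reference, the CSV route described above can be done with the JDK alone. This is a minimal sketch, assuming one minute bar per row; the `Bar` class, its column names and `writeBarsCsvGz` are hypothetical, not part of krangl or any other library.

```kotlin
import java.io.File
import java.util.zip.GZIPOutputStream

// Hypothetical row type for one minute bar; the column names are illustrative only.
data class Bar(
    val epochMinute: Long,
    val open: Double,
    val high: Double,
    val low: Double,
    val close: Double,
    val volume: Long,
)

// Hypothetical helper: writes bars as gzip-compressed CSV with a header row.
// Closing the writer (via `use`) also finishes the gzip stream.
fun writeBarsCsvGz(bars: List<Bar>, file: File) {
    GZIPOutputStream(file.outputStream()).bufferedWriter().use { out ->
        out.write("epoch_minute,open,high,low,close,volume\n")
        for (b in bars) {
            out.write("${b.epochMinute},${b.open},${b.high},${b.low},${b.close},${b.volume}\n")
        }
    }
}
```

On the pandas side, `pd.read_csv("bars.csv.gz")` should detect gzip from the `.gz` extension, since its default is `compression="infer"`.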
a
What is the size of the file? Can't you use plain CSV/TSV? Anyway, data formats are an important topic; we should discuss them more and maybe start a new library for columnar data I/O.
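To make "columnar data I/O" concrete, here is a toy sketch of a column-oriented binary layout using only `java.io`. The layout and the `writeColumns`/`readColumns` names are hypothetical and deliberately simplistic; this is NOT Feather, Arrow or Parquet, just an illustration of storing each column contiguously as raw values instead of text.

```kotlin
import java.io.DataInputStream
import java.io.DataOutputStream
import java.io.File

// Hypothetical on-disk layout (not Feather/Arrow/Parquet):
//   int32 rowCount, then rowCount int64 timestamps, then rowCount float64 closes,
// all big-endian as written by DataOutputStream.
fun writeColumns(file: File, timestamps: LongArray, closes: DoubleArray) {
    require(timestamps.size == closes.size) { "columns must have equal length" }
    DataOutputStream(file.outputStream().buffered()).use { out ->
        out.writeInt(timestamps.size)
        for (t in timestamps) out.writeLong(t) // whole timestamp column first
        for (c in closes) out.writeDouble(c)   // then the whole close column
    }
}

// Reads the layout above back; no text parsing is needed, each value is fixed-width.
fun readColumns(file: File): Pair<LongArray, DoubleArray> =
    DataInputStream(file.inputStream().buffered()).use { inp ->
        val n = inp.readInt()
        val timestamps = LongArray(n) { inp.readLong() }
        val closes = DoubleArray(n) { inp.readDouble() }
        timestamps to closes
    }
```

Skipping text parsing is where most of the speedup over CSV comes from; real formats such as Feather/Arrow and Parquet add schemas, more data types, compression and (for Feather) memory mapping on top of this basic idea.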
s
Currently I'm using compressed CSV: 20 years of minute-interval stock bars, about 20 MB per symbol, and I have to process 2000+ symbols. In Python, loading one of those csv.gz files takes 2.6+ seconds (Kotlin takes 800+ ms); a more efficient format would be much faster. I used to use the Feather format (producing and loading from Python), and loading the same amount of data took < 100 ms in Python. For my use case, what I need most is a good charting library; after that, it would be nice to have an efficient data format for large-dataset I/O.
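For a rough Kotlin-side load timing like the one quoted above, a stdlib-only reader could look like this. It is a minimal sketch, assuming a comma separator, a header row, no quoted fields and purely numeric values (fine for OHLCV bars, not for general CSV); `readCsvGzColumns` and `timeLoad` are hypothetical names, and the measured time will of course depend on the machine and file.

```kotlin
import java.io.File
import java.util.zip.GZIPInputStream
import kotlin.system.measureTimeMillis

// Hypothetical reader: loads a gzip-compressed CSV into one DoubleArray per column.
// Assumes a header row, ',' separator, no quoting, and numeric values only.
fun readCsvGzColumns(file: File): Map<String, DoubleArray> =
    GZIPInputStream(file.inputStream()).bufferedReader().useLines { lines ->
        val iter = lines.iterator()
        if (!iter.hasNext()) return emptyMap() // empty file: no header, no columns
        val header = iter.next().split(',')
        val columns = header.map { ArrayList<Double>() }
        for (line in iter) {
            if (line.isEmpty()) continue // tolerate blank lines
            val fields = line.split(',')
            for (i in header.indices) columns[i].add(fields[i].toDouble())
        }
        header.zip(columns.map { it.toDoubleArray() }).toMap()
    }

// Wall-clock milliseconds for one full load; good for rough comparisons only.
fun timeLoad(file: File): Long = measureTimeMillis { readCsvGzColumns(file) }
```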
Also, I saw some examples using kotlin-dataframe, but I'm not sure where to get them.