# datascience
a
Julia has announced the release of its own DataFrame: https://dataframes.juliadata.org/stable/. Right now they are probably the main competition to the emerging Kotlin for Data ecosystem, so we should study it. 🧵
DataFrame builders
```julia
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
```
Such things are not possible in a type-safe language, but we can do something similar and much safer:
```kotlin
val a by symbol
val b by symbol
val df = DataFrame {
    a(1..4)                // column `a` built from a range
    b("M", "F", "F", "M")  // column `b` built from explicit values
}
```
Column constructors here could be done via a member extension like `Symbol.invoke()`. Currently they all must be predefined in the builder, but with multiple receivers we could add them as extensions.
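A minimal, self-contained sketch of such a builder, to show that the member-extension trick compiles and behaves as described. `Symbol`, the `symbol` delegate, `DataFrame` and `DataFrameBuilder` are hypothetical stand-ins written for illustration (loosely in the style of KMath's `val x by symbol`), not the KMath or Kotlin DataFrame API. Because the column constructors are member extensions declared inside the builder, `a(1..4)` only resolves inside a `DataFrame { ... }` block.

```kotlin
import kotlin.properties.ReadOnlyProperty

// Hypothetical symbol: an identifier that only carries a name.
data class Symbol(val name: String)

// Delegate so that `val a by symbol` creates Symbol("a") from the property name.
val symbol: ReadOnlyProperty<Any?, Symbol> = ReadOnlyProperty { _, property -> Symbol(property.name) }

// Hypothetical column store: insertion-ordered map from symbol to column values.
class DataFrame(val columns: Map<Symbol, List<Any?>>) {
    val rowCount: Int get() = columns.values.firstOrNull()?.size ?: 0
}

// Builder scope. The column constructors are member extensions on Symbol,
// so they are only visible inside the builder lambda.
class DataFrameBuilder {
    private val columns = LinkedHashMap<Symbol, List<Any?>>()

    operator fun Symbol.invoke(range: IntRange) {
        columns[this] = range.toList()
    }

    operator fun Symbol.invoke(vararg values: Any?) {
        columns[this] = values.toList()
    }

    fun build(): DataFrame {
        require(columns.values.map { it.size }.distinct().size <= 1) { "All columns must have the same length" }
        return DataFrame(columns.toMap())
    }
}

// Factory function named like the class, so call sites read `DataFrame { ... }`.
fun DataFrame(block: DataFrameBuilder.() -> Unit): DataFrame = DataFrameBuilder().apply(block).build()

fun main() {
    val a by symbol
    val b by symbol
    val df = DataFrame {
        a(1..4)
        b("M", "F", "F", "M")
    }
    println(df.rowCount) // 4
    println(df.columns)  // {Symbol(name=a)=[1, 2, 3, 4], Symbol(name=b)=[M, F, F, M]}
}
```

With multiple receivers, the two `invoke` overloads could move out of `DataFrameBuilder` into extensions that require a builder in scope, which is what would let third-party code add new column constructors.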
Symbolic accessors
```julia
julia> df.A
```
Currently there is an ongoing effort by the Kotlin DataFrame team to do that via staged compilation, but in KMath we found another way: we can use predefined symbol objects as identifiers, so we could do things like this:
```kotlin
val A by symbol
df[A]  // look up the column identified by the symbol A
```
One could use column definitions instead and get a type-safe accessor.
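To make the difference between a plain symbol and a column definition concrete, here is a sketch in which `df[A]` returns untyped values while a typed definition yields `List<T>`. `Symbol`, `ColumnDef`, `column<T>()` and this `DataFrame` are hypothetical names chosen for illustration. Storing heterogeneous columns in one map still needs an unchecked cast somewhere; the typed accessor just confines it to a single place.

```kotlin
import kotlin.properties.ReadOnlyProperty

// Hypothetical untyped identifier, in the spirit of `val A by symbol`.
open class Symbol(val name: String) {
    override fun equals(other: Any?): Boolean = other is Symbol && other.name == name
    override fun hashCode(): Int = name.hashCode()
    override fun toString(): String = name
}

// Hypothetical typed column definition: same identity as a Symbol, plus an element type.
class ColumnDef<T>(name: String) : Symbol(name)

val symbol = ReadOnlyProperty<Any?, Symbol> { _, property -> Symbol(property.name) }

fun <T> column() = ReadOnlyProperty<Any?, ColumnDef<T>> { _, property -> ColumnDef(property.name) }

class DataFrame(private val columns: Map<Symbol, List<Any?>>) {
    private fun raw(key: Symbol): List<Any?> =
        columns[key] ?: throw NoSuchElementException("No column named $key")

    // Untyped access: `df[A]` returns List<Any?>, so the caller has to cast.
    operator fun get(key: Symbol): List<Any?> = raw(key)

    // Typed access: the definition carries T, so the single unchecked cast lives here.
    @Suppress("UNCHECKED_CAST")
    operator fun <T> get(column: ColumnDef<T>): List<T> = raw(column) as List<T>
}

fun main() {
    val name by symbol
    val age by column<Int>()
    val df = DataFrame(mapOf(Symbol("name") to listOf("Alice", "Bob"), Symbol("age") to listOf(30, 41)))
    val names: List<Any?> = df[name] // untyped
    val ages: List<Int> = df[age]    // typed, no cast at the call site
    println(names)      // [Alice, Bob]
    println(ages.sum()) // 71
}
```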
Row-based and column-based tables are easy to support. We can even use type-safe row accessors.
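One way to read the claim that both layouts are easy to support: put a small read-only interface in front of the storage and give rows a typed accessor that works the same for either layout. The sketch below assumes hypothetical `Column<T>`, `Table`, `ColumnTable`, `RowTable` and `Row` types; it is an illustration, not a proposed final API.

```kotlin
// Hypothetical typed column key: the name plus the element type T.
class Column<T>(val name: String)

// Read-only view shared by both storage layouts.
interface Table {
    val rowCount: Int
    fun <T> get(row: Int, column: Column<T>): T
}

// Column-based storage: one list per column.
class ColumnTable(private val columns: Map<String, List<Any?>>) : Table {
    override val rowCount: Int get() = columns.values.firstOrNull()?.size ?: 0

    @Suppress("UNCHECKED_CAST")
    override fun <T> get(row: Int, column: Column<T>): T = columns.getValue(column.name)[row] as T
}

// Row-based storage: one map per row.
class RowTable(private val rows: List<Map<String, Any?>>) : Table {
    override val rowCount: Int get() = rows.size

    @Suppress("UNCHECKED_CAST")
    override fun <T> get(row: Int, column: Column<T>): T = rows[row].getValue(column.name) as T
}

// Type-safe row accessor: `row[age]` returns Int when `age` is a Column<Int>.
class Row(private val table: Table, private val index: Int) {
    operator fun <T> get(column: Column<T>): T = table.get(index, column)
}

fun Table.rows(): List<Row> = List(rowCount) { Row(this, it) }

fun main() {
    val name = Column<String>("name")
    val age = Column<Int>("age")
    val byColumn = ColumnTable(mapOf("name" to listOf("Alice", "Bob"), "age" to listOf(30, 41)))
    val byRow = RowTable(listOf(mapOf("name" to "Alice", "age" to 30), mapOf("name" to "Bob", "age" to 41)))
    for (table in listOf<Table>(byColumn, byRow)) {
        val firstName: String = table.rows().first()[name]
        val totalAge: Int = table.rows().map { it[age] }.sum()
        println("$firstName $totalAge") // Alice 71
    }
}
```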
Accessor by expression
```julia
julia> df[(df.A .> 500) .& (300 .< df.C .< 400), :]
```
Well, it is hard to do that, and I am pretty sure we should not do it this way. It is better to create a `DataFrameQuery` object and a builder for it, like this:
```kotlin
df.query {
    a { it > 500 }
    c { it > 300 && it < 400 }  // strict bounds, as in the Julia example
}
```
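A sketch of how such a `DataFrameQuery` builder could work, assuming integer-only columns and a hypothetical `Symbol`/`DataFrame` pair invented for this example. Each `a { ... }` call inside the block registers a predicate on one column; the query then keeps the rows for which all predicates hold, so the conditions combine with AND like `.&` in the Julia version.

```kotlin
// Hypothetical identifier and integer-only, column-oriented frame.
data class Symbol(val name: String)

class DataFrame(val columns: Map<Symbol, List<Int>>) {
    val rowCount: Int get() = columns.values.firstOrNull()?.size ?: 0

    // Keeps only the given row indices, column by column.
    fun selectRows(indices: List<Int>): DataFrame =
        DataFrame(columns.mapValues { (_, values) -> indices.map { values[it] } })
}

// Collects one predicate per column reference.
class DataFrameQuery {
    private val conditions = mutableListOf<Pair<Symbol, (Int) -> Boolean>>()

    // Inside the query block, `a { it > 500 }` registers a condition on column `a`.
    operator fun Symbol.invoke(predicate: (Int) -> Boolean) {
        conditions += this to predicate
    }

    // A row matches when every registered condition holds for its values.
    fun matches(df: DataFrame, row: Int): Boolean =
        conditions.all { (column, predicate) -> predicate(df.columns.getValue(column)[row]) }
}

fun DataFrame.query(block: DataFrameQuery.() -> Unit): DataFrame {
    val query = DataFrameQuery().apply(block)
    return selectRows((0 until rowCount).filter { query.matches(this, it) })
}

fun main() {
    val a = Symbol("a")
    val c = Symbol("c")
    val df = DataFrame(mapOf(a to listOf(100, 600, 700, 900), c to listOf(350, 350, 450, 399)))
    val result = df.query {
        a { it > 500 }
        c { it > 300 && it < 400 }
    }
    println(result.columns) // {Symbol(name=a)=[600, 900], Symbol(name=c)=[350, 399]}
}
```

Keeping the conditions as data rather than evaluating them immediately leaves room for a backend to inspect or optimize a query before running it.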
Data mutability
I am pretty sure that data in the table should be immutable: when someone changes something, a new zero-copy table with one replaced column should be created.
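A sketch of what "zero-copy with one replaced column" can mean in practice, using a hypothetical minimal `DataFrame` (not the Kotlin DataFrame library's class). "Changing" a column returns a new table whose map points to new data for that one column and to the very same list instances for all the others; only the map of column references is copied. Input lists are copied once on the way in, so callers cannot mutate shared data afterwards.

```kotlin
// Hypothetical immutable, column-oriented table.
class DataFrame private constructor(private val columns: Map<String, List<Any?>>) {

    operator fun get(name: String): List<Any?> = columns.getValue(name)

    // Returns a new table; every column other than `name` is shared with this one.
    fun withColumn(name: String, values: List<Any?>): DataFrame {
        val expected = columns.values.firstOrNull()?.size
        require(expected == null || values.size == expected) { "Column length mismatch" }
        return DataFrame(columns + (name to values.toList()))
    }

    companion object {
        // Defensive copy of the inputs, done once at construction time.
        fun of(vararg columns: Pair<String, List<Any?>>): DataFrame =
            DataFrame(columns.associate { (name, values) -> name to values.toList() })
    }
}

fun main() {
    val df = DataFrame.of("a" to listOf(1, 2, 3), "b" to listOf("x", "y", "z"))
    val df2 = df.withColumn("a", listOf(10, 20, 30))
    println(df["a"])              // [1, 2, 3]: the original table is unchanged
    println(df2["a"])             // [10, 20, 30]
    println(df["b"] === df2["b"]) // true: column "b" is shared, not copied
}
```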
IO
I think the Kotlin ecosystem is quite rich in that regard; all we need is to provide convenient constructors for the DataFrame and make it an interface to allow third-party implementations.
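A sketch of both ideas together, assuming `DataFrame` is a small read-only interface; the names `DataFrame`, `ListDataFrame` and `fromCsv` are hypothetical, not an existing library API. Convenience constructors live as extensions on the interface's companion object, so an IO module (including a third-party one) can add its own `DataFrame.fromXxx(...)` without touching the core. The CSV reader is deliberately naive: no quoting, no type inference, and every row is assumed to have as many fields as the header.

```kotlin
// Hypothetical minimal contract that other libraries could implement on their own storage.
interface DataFrame {
    val columnNames: List<String>
    val rowCount: Int
    fun column(name: String): List<Any?>

    // Empty companion, so that constructors can be added as extensions.
    companion object
}

// Default in-memory implementation, hidden behind the interface.
private class ListDataFrame(private val data: Map<String, List<Any?>>) : DataFrame {
    override val columnNames: List<String> = data.keys.toList()
    override val rowCount: Int = data.values.firstOrNull()?.size ?: 0
    override fun column(name: String): List<Any?> = data.getValue(name)
}

// Convenience constructor: comma-separated text with a header line; values stay strings.
fun DataFrame.Companion.fromCsv(text: String): DataFrame {
    val lines = text.lines().filter { it.isNotBlank() }
    val header = lines.first().split(",").map { it.trim() }
    val rows = lines.drop(1).map { line -> line.split(",").map { it.trim() } }
    return ListDataFrame(header.withIndex().associate { (i, name) -> name to rows.map { it[i] } })
}

fun main() {
    val df = DataFrame.fromCsv(
        """
        name,age
        Alice,30
        Bob,41
        """.trimIndent()
    )
    println(df.columnNames)   // [name, age]
    println(df.column("age")) // [30, 41]
}
```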
> Accessor by expression
Nice point, but a simple (DataFrame) -> Boolean filter is generally enough.
As for mutability and IO, this is also how it is done in DataFrame.
a
I know, but since DataFrame does not have any public discussions yet, I decided to write the points here.
By the way, I would like to have some kind of discussion on the DataFrame API. I have some concerns about how it works right now.
cc @roman.belov @Anatoly Nikitin
h
I am late to the party, but thanks for the pointer!