Václav Škorpil
09/24/2024, 12:38 PM
Ian Koh
09/24/2024, 12:39 PM
Václav Škorpil
09/24/2024, 12:41 PM
Václav Škorpil
09/24/2024, 12:42 PM
Ian Koh
09/24/2024, 12:44 PM
Ian Koh
09/24/2024, 12:56 PM
filter method.
This, however, has its drawbacks too. If you look at the documentation, you'll see that the dataframe syntax is very non-standard. An example is the add method, which in PySpark would be withColumn. I think it is quite reasonable to assume that anyone who's considering Kotlin dataframes would already have exposure to dataframe libraries like pandas, PySpark or polars, all of which generally have similar syntax.
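For illustration, here is a minimal sketch of the naming difference described above, using Kotlin DataFrame's string-based column access. The sample data, the column names and the helper functions sample, withAgeNextYear and adults are hypothetical, and the PySpark lines in the comments are only rough counterparts, not exact equivalents.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; the column names are made up for this sketch.
fun sample(): DataFrame<*> = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
)

// Kotlin DataFrame: add a derived column with add.
// Rough PySpark counterpart: df.withColumn("ageNextYear", col("age") + 1)
fun withAgeNextYear(df: DataFrame<*>): DataFrame<*> =
    df.add("ageNextYear") { "age"<Int>() + 1 }

// Kotlin DataFrame: keep only the rows that match a predicate with filter.
// Rough PySpark counterpart: df.filter(col("age") > 18)
fun adults(df: DataFrame<*>): DataFrame<*> =
    df.filter { "age"<Int>() > 18 }

fun main() {
    // Chain both operations and print the resulting frame.
    adults(withAgeNextYear(sample())).print()
}
```

The sketch refers to columns by string name throughout, so it does not depend on generated column accessors. The trade-off is that a misspelled column name only fails at runtime.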
Because the syntax here is different, you'll have to spend a lot of time reading the docs to do operations that are simple in SQL. If you're working alone it's not a problem, but when other people have to read your code, I don't know how much friction that might cause.
My biggest pain point is that, when I want to use Kotlin's dataframes for something that's more complicated than the examples in the docs, I often run into issues. See the question I posted just now. The compiler doesn't recognise my dataframe's columns; I was able to keep working in the notebook despite this lack of recognition, while in the Kotlin source file the compiler is not letting me do anything else until I resolve it.
So, to conclude, I think Kotlin dataframes have a lot of potential. It is really helpful to approach data manipulation the way that data analysts and scientists do. However, because it's still not mature, whatever simplifications you enjoy are counterbalanced by an unfamiliar syntax and lack of suitable example code for what you want to do.
Ian Koh
09/24/2024, 12:57 PM
Ian Koh
09/24/2024, 1:01 PM
Václav Škorpil
09/24/2024, 1:01 PM
Václav Škorpil
09/24/2024, 1:04 PM
Václav Škorpil
09/24/2024, 1:04 PM
Ian Koh
09/24/2024, 1:07 PM
Václav Škorpil
09/24/2024, 1:09 PM
roman.belov
09/24/2024, 5:20 PM
Ian Koh
09/25/2024, 9:01 AM
Jolan Rensen [JB]
09/25/2024, 9:29 AM
keyValuePaths in readJson(), as json can potentially generate hundreds+ of columns and accessors which can get heavy really fast if they are not converted to key/value columns
Ian Koh
09/25/2024, 10:31 AM
Jolan Rensen [JB]
09/25/2024, 10:34 AM