# datascience
j
Hi! I'm currently working on converting a Python script into Kotlin. The script uses pandas to create a DataFrame with columns but no rows, and then appends data to it. I'm facing some challenges in Kotlin, as it's not straightforward to create a new dataframe by specifying columns without rows. Additionally, the append method in Kotlin seems to only accept varargs, not allowing for the association of a column with its corresponding value. Could you guide me on the most ideal way to implement this in a Kotlin dataframe? Thanks!
z
Hi, interesting idea. Could you please share the part of the script that creates the pandas dataframe?
j
@zaleslaw The actual script is a bit complicated. You can imagine it first creates a dataframe with columns but no rows, then, inside a few nested loops, it appends a row to the dataframe. What is the most ideal way to do this in Kotlin dataframe?
a
In my experience, DataFrame column transformation is quite unintuitive. I do not have a lot of experience with pandas, though; maybe it feels more natural to people with a pandas background.
c
There are several ways of doing it from my experience, but you can create a new empty data frame with columns like this:
data class Test(val foo: String, val bar: Int)
val df = DataFrame.emptyOf<Test>()
df
See also https://stackoverflow.com/questions/77215912/adding-rows-to-an-empty-typed-dataframe/77244204#77244204
j
@zaleslaw My question is more about how to build a Kotlin dataframe. The pandas one is just to explain that the original logic appends row by row. I want to know the optimal way to build a dataframe in Kotlin instead. I can use different but essentially equivalent logic, such as first building a list and then converting it to a dataframe. I'm just not sure if that is the best practice.
My essential question is: I have a loop that generates data for each row (translated from Python), represented by a function
inline fun getData(eachRow: (Int, String, /* column values */) -> Unit)
This function retrieves the data and applies eachRow to every row. How can I best use this to construct a data frame? ChatGPT suggests building a list first and then converting it to a dataframe. Is that the best approach?
a
I think the correct answer is that DataFrame is column-based storage; it is not intended to be used as row-based storage. I am not sure there is a way to fill rows other than creating a list first and converting it to columns later, but that won't be native. Tables-kt uses a different approach: it has both column-based and row-based tables and can convert between them on demand. For example, you can fill a row-based table and then auto-convert it to a DataFrame. Sadly, I don't have time right now to make it production ready (it mostly requires tests and documentation), so contributions are welcome.
👍 3
But the documentation is very poor right now.
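Since DataFrame is column-based, one option that stays with plain Kotlin DataFrame is to accumulate one mutable list per column inside the loop and assemble the frame once at the end. The following is only a minimal sketch, assuming the Kotlin DataFrame library is on the classpath; the column names foo and bar and the loop body are hypothetical, borrowed from the Test example earlier in the thread.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.toColumn

// Hypothetical example: collect values column by column, then build the frame once.
fun buildByColumns(): DataFrame<*> {
    val foo = mutableListOf<String>()
    val bar = mutableListOf<Int>()

    // Stand-in for the real row-generating loop.
    repeat(10) { no ->
        foo += "Hello $no"
        bar += no
    }

    // Each list becomes a named column; no intermediate DataFrames are created.
    return dataFrameOf(foo.toColumn("foo"), bar.toColumn("bar"))
}

fun main() {
    val df = buildByColumns()
    println(df.rowsCount()) // number of rows collected by the loop
    println(df)
}
```

The result here is an untyped DataFrame<*>; if a typed frame is wanted, a list of data class instances is usually the more convenient route.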
c
@Jason5lee You can do something like this:
%use dataframe

data class Test(val foo: String, val bar: Int)
var df = DataFrame.emptyOf<Test>()
repeat (10) { no ->
    df = df.append("Hello $no", no)
}
df
But given that the DataFrame API is functional, you end up constructing quite a lot of intermediate DataFrame objects, each of which copies all the data from the previous one. So if performance is a concern, it will probably be faster to collect all your rows in a List and add them all in one go.
👍 3
j
@Jason5lee I think the responses to your question sum up DataFrame correctly. While you can create a DataFrame with no rows, it's not the best way. Kotlin DataFrames are immutable, meaning that every iteration of adding a new row would copy all previous columns and rows into new ones. The most idiomatic Kotlin way is to have a data class (optionally annotated with @DataSchema) representing a row and use the buildList {} function from the standard library to build a list of instances of your data class from your read function. Then use .toDataFrame() to convert the list to a DataFrame 🙂 See the docs for more information, hope that helps!
👍 4
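For reference, the list-first approach recommended above could look roughly like this. It is a minimal sketch, assuming the Kotlin DataFrame library is on the classpath; the Record class, its two columns, and the body of getData are hypothetical stand-ins for the real row source described in the question.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.toDataFrame

// Hypothetical row type: one property per column.
@DataSchema
data class Record(val id: Int, val name: String)

// Hypothetical stand-in for the nested loops translated from Python.
inline fun getData(eachRow: (Int, String) -> Unit) {
    for (i in 0 until 3) {
        for (j in 0 until 2) {
            eachRow(i * 2 + j, "item-$i-$j")
        }
    }
}

// Collect every row into a list first, then convert to a DataFrame in one step.
fun buildFrame(): DataFrame<Record> {
    val rows: List<Record> = buildList {
        getData { id, name -> add(Record(id, name)) }
    }
    return rows.toDataFrame()
}

fun main() {
    val df = buildFrame()
    println(df.rowsCount()) // 3 * 2 = 6 rows from the loops above
    println(df)
}
```

The list is filled in place, and the DataFrame is created only once, so no intermediate frames are copied along the way.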