# datascience
p
Hi friends! DataFrame question. 🙂 I have a .csv file in which one of the columns holds a comma-separated string. I would like to split it and have the row replicated for each element. I have working code, but it's far from optimal. Code in thread. 🧵
So far, I move back and forth from data class to DataFrame, but it seems very inefficient:
//    ➜ cat test.csv
//    id,list
//    1,"a,b,c"
//    2,"d"
//    3,"e,f"

    data class Thing(val id: Int, val list: String)

    DataFrame
        .readCSV("test.csv")
        .toListOf<Thing>()
        .flatMap { thing ->
            thing.list.split(",").map { thing.copy(list = it) }
        }
        .toDataFrame()
        .print()

//    id list
//    0  1    a
//    1  1    b
//    2  1    c
//    3  2    d
//    4  3    e
//    5  3    f
Any suggestions are welcome! Thanks!
r
And with the upcoming compiler plugin it's even neater:
p
This is amazing! 🔥 Thank you so much, Roman! Looking forward to the upcoming plugin! Cheers!
r
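Oh, I forgot there's an easier way:
// `csv` is assumed to hold the contents of test.csv as a String
DataFrame.readDelimStr(csv)
    .split("list").by(',').inplace()  // "a,b,c" becomes a list of values, kept in the same column
    .explode("list")                  // one row per list element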
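For anyone reading along later, here is a rough sketch of what split-then-explode does to the rows, written in plain Kotlin with no DataFrame dependency. It is only an illustration of the semantics, not how the library implements it; the `explode` helper and the map-per-row representation are hypothetical names chosen for this example.

```kotlin
// Hypothetical helper (not part of Kotlin DataFrame): each row is a map
// from column name to value. The chosen column is split on the delimiter
// and the row is repeated once per resulting element.
fun explode(
    rows: List<Map<String, String>>,
    column: String,
    delimiter: Char = ','
): List<Map<String, String>> =
    rows.flatMap { row ->
        // Rows that lack the column pass through unchanged.
        val value = row[column] ?: return@flatMap listOf(row)
        // Copy the row once per element, overwriting only the split column.
        value.split(delimiter).map { part -> row + (column to part.trim()) }
    }

fun main() {
    // Same data as test.csv in the question.
    val rows = listOf(
        mapOf("id" to "1", "list" to "a,b,c"),
        mapOf("id" to "2", "list" to "d"),
        mapOf("id" to "3", "list" to "e,f"),
    )
    // Prints six lines: "1 a", "1 b", "1 c", "2 d", "3 e", "3 f"
    explode(rows, "list").forEach { println("${it["id"]} ${it["list"]}") }
}
```

Unlike the data class round trip in the question, this works for any column name, which is roughly the convenience that `split(...).by(...).inplace()` followed by `explode(...)` gives you on a real DataFrame.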