when working with the toDataFrame() , is there a w...
# datascience
y
When working with toDataFrame(), is there a way to create a dataframe dynamically, based on the runtime type of the input objects? 🧵
data class Person(val name: String, val age: Int)

val persons = listOf(Person("Alice", 15), Person("Bob", 20), Person("Charlie", 22))
val df = persons.toDataFrame()
println("The Person dataframe is")
println(df)

val dfAny = (persons as List<Any>).toDataFrame()
println("Any Data Frame is ")
println(dfAny)
In the above example, dfAny is empty.
j
toDataFrame()
works by retrieving the class of Person through a reified type parameter. If you cast the list to List<Any>, that won't work. However, I see you're working in a notebook, right? :) Try executing the cell and then, in the next cell, access
dfAny
Under the hood it should infer the types based on the data inside. If not, try
df.unbox()
Also, if you use
DISPLAY()
instead of print (or state your DataFrame on the last line of the cell), the DataFrame renders nicely :)
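A minimal sketch of the reified-type point above, using only the Kotlin standard library so it runs without DataFrame. `staticElementType` is a hypothetical helper, not part of any library; it only reports which class a reified `T` resolves to, on the assumption that this mirrors what `toDataFrame()` gets to see.

```kotlin
data class Person(val name: String, val age: Int)

// Hypothetical helper: returns the simple name of the class that the
// reified type parameter T resolves to at the call site.
inline fun <reified T> Iterable<T>.staticElementType(): String? = T::class.simpleName

fun main() {
    val persons = listOf(Person("Alice", 15), Person("Bob", 20))

    // T is inferred from the declared type of the list.
    println(persons.staticElementType())                 // Person

    // After the cast, T is Any, even though the elements are still Persons.
    println((persons as List<Any>).staticElementType())  // Any

    // The runtime class of each element is unaffected by the cast.
    println(persons.first()::class.simpleName)           // Person
}
```

The cast changes only what the compiler substitutes for `T`; the elements keep their runtime class, which is what any dynamic approach would have to inspect.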
y
sounds good.. thanks for the DISPLAY() tip…
j
I've now tried it myself with a pc. Looks like
List<Any>.toDataFrame()
indeed doesn't work.
.toDataFrame()
is shorthand for
.toDataFrame { properties() }
(which is sorta mentioned here). What this means is that it will try, with reflection, to build a dataframe with the properties of, in this case,
Any
.
Any
has no properties, so the dataframe will be empty. I made a little explanation notebook here https://gist.github.com/Jolanrensen/34b5b1572e28b9192697c71f93c29bc2
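To make the "Any has no properties" point concrete, here is a rough sketch that uses plain Java reflection as a stand-in. This is not how DataFrame inspects classes (it uses Kotlin reflection on properties); `fieldNames` is a hypothetical helper that just lists the backing fields declared on a class.

```kotlin
data class Person(val name: String, val age: Int)

// Hypothetical helper: names of the fields declared directly on a class,
// sorted because the JVM does not guarantee declaration order.
fun fieldNames(cls: Class<*>): List<String> =
    cls.declaredFields.map { it.name }.sorted()

fun main() {
    val rows: List<Any> = listOf(Person("Alice", 15))

    // The static element type (Any, i.e. java.lang.Object) declares nothing.
    println(fieldNames(Any::class.java))        // []

    // The runtime class of the first element does.
    println(fieldNames(rows.first().javaClass)) // [age, name]
}
```

Reflecting on the static type finds nothing to turn into columns, while the runtime class of an element carries the information a dynamic conversion would need.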
y
what if we had tried
List<*>
would that work here?
j
Since list is defined as
List<out T>
,
List<*>
is exactly the same as
List<Any?>
, so no 🙂
n
What are you trying to do? c: Seems interesting. We definitely could make public
toDataFrame(KClass)
. Then it will be possible to convert dynamically, for example like this:
list.toDataFrame(list[0]::class)
public inline fun <reified T> Iterable<T>.toDataFrame(noinline body: CreateDataFrameDsl<T>.() -> Unit): DataFrame<T> =
    createDataFrameImpl(T::class, body)
y
Thanks @Nikita Klimenko [JB], the idea is to generate a dataframe for any object type dynamically. I'm trying to leverage it in a CSV data ingestion pipeline, which will have a generic data importer, a transformation step, and an exporter to a database (perhaps via the Spark API).
n
https://github.com/Kotlin/dataframe/pull/825 It will take some time to discuss, merge, and publish a new dev version, so in the meantime you can try experimenting with this workaround:
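import kotlin.reflect.KClass
import kotlin.reflect.full.createType
import kotlin.reflect.full.memberProperties
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.toDataFrame

internal class A(val b: Int)

internal fun main() {
    val a: List<Any?> = listOf(A(1))
    val df = a.toDataFrame {
        // Use the properties of the first element's runtime class rather than those of the static type Any?
        val props = (a[0]!!::class.createType().classifier as? KClass<*>)!!.memberProperties.toTypedArray()
        properties(*props)
    }
    df.print()
}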

===
   b
 0 1
Let me know if it works for you, or if the API needs to be different in the PR.
y
The code snippet worked perfectly:
val df = data.toDataFrame {
    val props = (data[0]!!::class.createType().classifier as? KClass<*>)!!.memberProperties.toTypedArray()
    properties(*props)
}
Thanks!