Quick dataframe question, is it possible in a note...
# datascience
p
Quick dataframe question: is it possible in a notebook to write a dataframe extension method after the dataframe instance has been created based on a parsed CSV file?
fun DataFrame<???>.myMethod() = filter { some_column == "Yes" }
So the question is how to get the required type since the dataframe schema is automatically generated based on the data found in the CSV file.
i
Hi, you can find it out from the generated code, see screenshots
Another way to find it out is completion
And one more way: define the following function in your notebook:
@OptIn(kotlin.ExperimentalStdlibApi::class)
inline fun <reified T> DataFrame<T>.dtype(): kotlin.reflect.KType = kotlin.reflect.typeOf<T>()
And then just call it this way:
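A minimal sketch of how the reified-type trick behaves, assuming a hypothetical stand-in class `Frame<T>` in place of the real `DataFrame<T>` so that it runs on the Kotlin standard library alone. `SurveyRow` and `onlyYes` are also made-up names for illustration; in a notebook the type printed by `dtype()` would be the auto-generated schema type instead.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical stand-in for DataFrame<T>; not part of the dataframe library.
class Frame<T>(val rows: List<T>)

// Hypothetical row type standing in for an auto-generated schema.
data class SurveyRow(val some_column: String)

// Same idea as dtype() above: a reified type parameter lets typeOf<T>()
// report the static type argument of the receiver.
inline fun <reified T> Frame<T>.dtype(): KType = typeOf<T>()

// Once the type name is known, an extension method can be declared against it.
fun Frame<SurveyRow>.onlyYes(): Frame<SurveyRow> =
    Frame(rows.filter { it.some_column == "Yes" })

fun main() {
    val frame = Frame(listOf(SurveyRow("Yes"), SurveyRow("No")))
    println(frame.dtype())              // shows the static row type of `frame`
    println(frame.onlyYes().rows.size)  // prints 1
}
```

The type name shown by `dtype()` is what goes in place of `???` in the extension function signature.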
p
Thanks, works like a charm!!! P.S. What would be really great is if, in the (far) future, this could grow into something like "type providers", as exist in the F# language.
i
@Peter DataFrame has one more thing to suggest. You can define an interface, annotate it with @DataSchema, and all "compatible" dataframes will implement it. See example. Maybe it's even better than using implicitly generated types.
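A minimal sketch of that approach applied to the original question, assuming a Kotlin notebook where the library was loaded with `%use dataframe` and the CSV has a String column named `some_column`. The names `Survey`, `onlyYes`, and `df` are hypothetical. The cells are kept separate on the assumption that the accessor for `some_column` is generated only after the cell containing the `@DataSchema` declaration has run.

```kotlin
// Cell 1 (assumes `%use dataframe` was executed earlier in the notebook)
@DataSchema
interface Survey {
    val some_column: String
}

// Cell 2: extension method declared against the explicit schema
fun DataFrame<Survey>.onlyYes() = filter { some_column == "Yes" }

// Cell 3: `df` stands for the dataframe parsed from the CSV file
df.onlyYes()
```

This way the extension function does not depend on the name of an implicitly generated type.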
h
In which repo is this generic magic implemented? I'd love to backport this to krangl if possible. 🙂
i
Code generation is in the dataframe repository, but it makes extensive use of the Jupyter kernel API. You can use it too, as described here. A simple example is here.
n
@Peter Hi! We have a Gradle plugin that does schema inference from data samples: https://kotlin.github.io/dataframe/gradle.html#schema-inference. I think it's kind of similar to the FSharp.Data type provider in terms of functionality. I would love to hear any feedback so we can polish the plugin UX 🙂
h
Thanks for the pointers!