#datascience

holgerbrandl

01/20/2023, 11:02 PM
With (legacy) krangl (or dplyr in R) I could do `irisData.groupBy("Species").summarizeAt({ all() }, SumFuns.mean)`. How could I rewrite this to use kotlin-dataframe instead? Essentially I want to summarize all columns in a grouped data-frame to their mean. Conceptually, one may even want to use different aggregators here to not compute mean but also standard deviation or other aggregates at once. (cc @Jolan Rensen [JB])
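To make the request concrete, here is a minimal plain-Kotlin sketch of "group, then apply several aggregators to every numeric column". It uses only the standard library and deliberately does not use the krangl or kotlin-dataframe APIs. The row representation (a map from column name to value) and the names `sampleStd` and `summarizeAll` are hypothetical and chosen for illustration only.

```kotlin
// Plain-Kotlin illustration; NOT the krangl or kotlin-dataframe API.
// A row is modelled as Map<String, Any?> (hypothetical representation).

// Sample standard deviation (n - 1 denominator); NaN for fewer than two values.
fun sampleStd(values: List<Double>): Double {
    if (values.size < 2) return Double.NaN
    val mean = values.average()
    return kotlin.math.sqrt(values.sumOf { (it - mean) * (it - mean) } / (values.size - 1))
}

// Groups rows by the `key` column and applies every aggregator to every numeric column.
// Numeric columns are detected from the first row of each group.
// Result columns are named "<column>_<aggregator>".
fun summarizeAll(
    rows: List<Map<String, Any?>>,
    key: String,
    aggregators: Map<String, (List<Double>) -> Double>
): Map<Any?, Map<String, Double>> =
    rows.groupBy { it[key] }.mapValues { (_, group) ->
        val numericColumns = group.first().filterValues { it is Number }.keys - key
        buildMap {
            for (column in numericColumns) {
                val values = group.mapNotNull { (it[column] as? Number)?.toDouble() }
                for ((name, aggregate) in aggregators) {
                    put("${column}_$name", aggregate(values))
                }
            }
        }
    }
```

Passing an aggregator map with the entries `"mean"` and `"std"` yields, per group, one value for each combination of numeric column and aggregator, which is the shape of result the question asks for.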

Jolan Rensen [JB]

01/21/2023, 3:05 PM
@Nikita Klimenko [JB]

roman.belov

01/21/2023, 3:50 PM
@holgerbrandl probably I didn’t get the question, but generally everything is as easy as
For `mean` and `std` it looks like this

holgerbrandl

01/22/2023, 10:05 PM
Thanks for the advice @roman.belov. Works great. Is it possible also to provide a column selector `endsWith("Price")` and to provide a custom aggregation (as lambda)? Similar to the swiss-army-knife in dplyr `across` https://dplyr.tidyverse.org/reference/across.html
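As a rough illustration of the `across`-style behaviour being asked about (a column predicate plus a custom aggregation lambda, applied per group), here is a plain-Kotlin sketch. It is not the kotlin-dataframe API and makes no claim about how that library exposes this. The function `summarizeAcross` and its parameter names are hypothetical.

```kotlin
// Plain-Kotlin illustration of an across()-like summary; NOT the kotlin-dataframe API.
// `summarizeAcross` and its parameters are hypothetical names.
fun <K> summarizeAcross(
    rows: List<Map<String, Any?>>,
    groupKey: (Map<String, Any?>) -> K,      // how to assign a row to a group
    selectColumn: (String) -> Boolean,       // column selector, e.g. { it.endsWith("Price") }
    aggregate: (List<Double>) -> Double      // custom aggregation as a lambda
): Map<K, Map<String, Double>> =
    rows.groupBy(groupKey).mapValues { (_, group) ->
        group.flatMap { it.keys }.distinct()
            .filter(selectColumn)
            .associateWith { column ->
                // Non-numeric or missing cells are skipped.
                aggregate(group.mapNotNull { (it[column] as? Number)?.toDouble() })
            }
    }
```

For example, calling it with `{ it.endsWith("Price") }` as the selector and `{ v -> v.maxOf { it } - v.minOf { it } }` as the aggregation would compute the per-group value range of every column whose name ends in `Price`, and leave all other columns out of the result.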

roman.belov

01/22/2023, 10:11 PM
endsWith is already here. Will add it to the docs.
Regarding aggregation, there's also already quite a broad syntax for custom aggregation: https://kotlin.github.io/dataframe/groupby.html#aggregation Or do you have something different in mind?

holgerbrandl

01/22/2023, 10:18 PM
Interestingly when evaluating your suggestion from above in a debug window it fails with
Cannot find local variable 'this@AggregateGroupedDsl' with type org.jetbrains.kotlinx.dataframe.aggregation.AggregateGroupedDsl
As part of a program, it works fine.
I've studied the documentation of `aggregate` @roman.belov. However, because the docs present code only and do not include any data examples, I find them hard to understand/read (compared to the `across` docs from above). Also, the grammar of aggregate does not indicate to me how and if (a) column selection and (b) custom aggregates are possible. Neither can I find any example of how to do so. Are you sure it's possible? It's an edge use-case I believe, so not supporting it would be fine I guess, although clearly `dplyr::across` emerged because there are obviously use-cases for a more flexible syntax.