# datascience
a
Kudos and a sincere "thank-you" to @roman.belov and the Kotlin #datascience team. I know it's just in alpha/preview, but so far the API is terrific -- clear and concise, so much better than pandas' crazy API and confusing selectors. Working with typed column accessors is a breeze. Everything chains (like dplyr), which makes for easy pipelines. Columns containing classes enable calling class functions within the dataframe, very useful! Replaced 80% of an ugly, complex SQL-based process with about 50 lines of dataframe code; it will be deployed to production this week. `toDataFrame()` is all that's needed to pass a db query result from the ORM. Fantastic job!
h
Congratz on the deployment! How was the performance difference compared to raw SQL? 🙂 Which SQL engine are you using?
a
This process was on SQL Server. Hard to compare exactly on speed alone, but dataframe is probably just as fast or faster. Biggest gains are in flexibility with logic -- easy to add a field, or an if/then or `when`, early in the process. With SQL, if I added a field, I had to make sure that field flowed through to all subsequent queries... so lots of editing of views, each of which was composed of a sequence of CTEs, so a big mess (and what if you change a view that some other process also relies on?). The Kotlin `when` statement is so much easier than SQL's ugly `case when`. Real string and date formatters rather than SQL hacks with CONCAT(). Embedding functions in data classes within a dataframe column, so each object can perform the calculation appropriate to its type.
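A rough sketch of the three points above (`when` instead of `case when`, a real date formatter, and per-type functions on the objects held in a column). The `Account`, `Checking`, and `Savings` types, the fee thresholds, and `statementLine` are all made up for illustration, and a plain `List` stands in for a dataframe column so the snippet needs only the standard library.

```kotlin
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// Hypothetical row types: each one knows how to compute its own fee.
sealed interface Account {
    val balance: Double
    fun monthlyFee(): Double
}

data class Checking(override val balance: Double) : Account {
    // Branches are checked top to bottom, much like SQL CASE WHEN.
    override fun monthlyFee(): Double = when {
        balance >= 5_000.0 -> 0.0
        balance >= 1_000.0 -> 5.0
        else -> 12.0
    }
}

data class Savings(override val balance: Double) : Account {
    override fun monthlyFee(): Double = if (balance < 500.0) 3.0 else 0.0
}

// A real date formatter instead of string concatenation.
private val statementFormat = DateTimeFormatter.ofPattern("dd/MM/yyyy")

fun statementLine(account: Account, date: LocalDate): String {
    // Exhaustive `when` over the sealed hierarchy: no `else` branch needed.
    val kind = when (account) {
        is Checking -> "checking"
        is Savings -> "savings"
    }
    val fee = "%.2f".format(Locale.ENGLISH, account.monthlyFee())
    return "${date.format(statementFormat)}: $kind fee $fee"
}

fun main() {
    // The list plays the role of a column holding objects of mixed types.
    val accounts: List<Account> = listOf(Checking(6_000.0), Checking(1_500.0), Savings(100.0))
    val date = LocalDate.of(2024, 3, 5)
    accounts.forEach { println(statementLine(it, date)) }
}
```

Adding a new account type here means adding one class; the compiler then flags every `when` that does not handle it, which is the kind of safety the chain of SQL views did not give.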
h
I can really see the development pace skyrocketing, which could be worth losing a few milliseconds! 👍 But some type of benchmark would be cool. I'd expect well-written SQL to be plenty faster, as it doesn't have to materialize the data and the engine can further optimize the query. DataFrame libraries like Polars offer a LazyFrame, which can use these kinds of pushdown optimizations, etc.
Exciting, thanks for response!
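To make the eager-vs-lazy point concrete, here is a small analogy using only the Kotlin standard library. It is not Polars, SQL, or the Kotlin DataFrame API, and it does no real query planning; it only shows that a lazy pipeline can stop pulling rows once it has its answer, while an eager one materializes every intermediate result. The `Sale` type, the 100.0 threshold, and the `onVisit` counter hook are assumptions made for the sketch.

```kotlin
// Hypothetical row type standing in for a query result.
data class Sale(val region: String, val amount: Double)

// Eager: map and filter each build a full intermediate list before firstOrNull runs.
fun firstLargeEager(rows: List<Sale>, onVisit: () -> Unit): Sale? =
    rows.map { onVisit(); it }
        .filter { it.amount > 100.0 }
        .firstOrNull()

// Lazy: rows are pulled one at a time, and the pipeline stops at the first match.
fun firstLargeLazy(rows: List<Sale>, onVisit: () -> Unit): Sale? =
    rows.asSequence()
        .map { onVisit(); it }
        .filter { it.amount > 100.0 }
        .firstOrNull()

fun main() {
    val rows = listOf(
        Sale("north", 50.0),
        Sale("south", 150.0),
        Sale("east", 200.0),
        Sale("west", 30.0),
    )
    var eagerVisits = 0
    var lazyVisits = 0
    firstLargeEager(rows) { eagerVisits++ }
    firstLargeLazy(rows) { lazyVisits++ }
    println("eager visited $eagerVisits rows, lazy visited $lazyVisits rows")
}
```

Both functions return the same row; the difference is only in how many rows get touched along the way, which is why a benchmark on realistic data would be the real test.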