Kudos and a sincere "thank-you" to @roman.belov and the Kotl… | kotlinlang #datascience


andyg

08/10/2022, 9:00 AM

Kudos and a sincere "thank-you" to @roman.belov and the Kotlin #datascience team. I know it's just in alpha/preview, but so far the API is terrific: clear and concise, and so much better than pandas' crazy API and confusing selectors. Working with typed column accessors is a breeze. Chaining everything (like dplyr) makes for easy pipelines. Columns containing class instances let you call class functions within the dataframe, which is very useful! I replaced 80% of an ugly, complex SQL-based process with about 50 lines of dataframe code, and it will be deployed to production this week.

`toDataFrame()` is all that's needed to pass a DB query result from the ORM. Fantastic job!
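For readers new to the library, here is a minimal sketch of the workflow described above, assuming the Kotlin DataFrame library (Maven artifact `org.jetbrains.kotlinx:dataframe`) is on the classpath. `OrderRow`, `Pricing`, `Flat`, `Percent`, and `buildReport` are hypothetical stand-ins for ORM entities and the real process. The sketch uses string-based column access (`"amount"<Double>()`) rather than generated typed accessors, because those need extra code-generation setup outside a notebook.

```kotlin
// Assumes org.jetbrains.kotlinx:dataframe is on the classpath.
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical domain classes: each subtype knows its own fee calculation.
sealed interface Pricing { fun fee(amount: Double): Double }
data class Flat(val value: Double) : Pricing { override fun fee(amount: Double) = value }
data class Percent(val rate: Double) : Pricing { override fun fee(amount: Double) = amount * rate }

// Hypothetical row type standing in for an ORM entity / query result row.
data class OrderRow(val id: Int, val customer: String, val amount: Double, val pricing: Pricing)

fun buildReport(rows: List<OrderRow>): DataFrame<OrderRow> =
    rows.toDataFrame()                                  // one column per property; nested objects stay as cell values by default
        .filter { "amount"<Double>() >= 50.0 }          // keep only the larger orders
        .add("fee") {                                   // new column computed per row...
            "pricing"<Pricing>().fee("amount"<Double>()) // ...by calling a method on the object stored in the cell
        }
        .sortBy("id")                                   // deterministic row order
```

Each step returns a new `DataFrame`, which is what makes the dplyr-style chaining possible.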


Hampus Londögård

08/10/2022, 9:03 AM

Congratz on the deployment! How was the performance difference compared to raw SQL? 🙂 Which SQL engine are you using?

andyg

08/10/2022, 4:51 PM

This process was on SQL Server. It's hard to compare on speed alone, but dataframe is probably just as fast or faster. The biggest gains are in flexibility with logic: it's easy to add a field, or an if/then or `when`, early in the process. With SQL, if I added a field, I had to make sure it flowed through to all subsequent queries, which meant lots of editing of views, each consisting of a sequence of CTEs. A big mess (and what if you change a view that some other process also relies on?). A Kotlin `when` statement is so much easier than an ugly SQL `CASE WHEN`. Real string and date formatters instead of SQL hacks with `CONCAT()`. And embedding functions in data classes within a dataframe column lets each object perform the calculation appropriate to its type.
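A small plain-Kotlin sketch of the three points above, using only the standard library and `java.time`: a `when` expression in place of SQL's `CASE WHEN`, a real date formatter in place of `CONCAT()`-style string building, and a sealed hierarchy whose subtypes each carry their own calculation. The names `Charge`, `Standard`, `Discounted`, `tier`, and `label`, and the tier thresholds, are hypothetical. In a dataframe, objects like these would sit in a column and be called from an `add { }` expression.

```kotlin
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// Hypothetical domain types: each subtype carries its own calculation,
// so callers never need a CASE WHEN over a "type" column.
sealed interface Charge {
    val amount: Double
    fun total(): Double
}
data class Standard(override val amount: Double) : Charge {
    override fun total() = amount
}
data class Discounted(override val amount: Double, val percentOff: Int) : Charge {
    override fun total() = amount * (100 - percentOff) / 100
}

// `when` as an expression: the Kotlin counterpart of CASE WHEN ... THEN ... ELSE ... END.
fun tier(total: Double): String = when {
    total >= 1000.0 -> "large"
    total >= 100.0 -> "medium"
    else -> "small"
}

// A real date formatter instead of gluing strings together with CONCAT().
val labelDate: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")

// Locale.ROOT keeps the decimal separator a '.' regardless of the JVM's default locale.
fun label(customer: String, date: LocalDate, charge: Charge): String =
    "%s | %s | %.2f (%s)".format(
        Locale.ROOT, customer, date.format(labelDate), charge.total(), tier(charge.total())
    )
```

For example, `label("acme", LocalDate.of(2022, 8, 10), Discounted(100.0, 20))` produces `acme | 2022-08-10 | 80.00 (small)`. Adding a new kind of charge means adding one subtype, not editing every query that branches on a type column.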

Hampus Londögård

08/10/2022, 5:23 PM

I can really see the development pace skyrocketing, which could be worth losing a few milliseconds! 👍 But some kind of benchmark would be cool. I'd expect well-written SQL to be considerably faster, since it doesn't have to materialize the data and the engine can further optimize the query. DataFrame libraries like Polars offer a LazyFrame, which can apply these kinds of optimizations, such as predicate pushdown.
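To make the pushdown point concrete, here is a toy Kotlin sketch that counts how many rows leave a hypothetical `ToyTable` source. It is not how Polars or SQL Server work internally. Filtering after a full scan materializes every row, while pushing the predicate into the source materializes only the matches. That is the kind of work a query optimizer can avoid and an eager pipeline fed by a full query result has to pay for.

```kotlin
// Toy in-memory "table" standing in for a database or file source (hypothetical).
class ToyTable(private val rows: List<Int>) {
    // How many rows have been copied out of the source so far.
    var rowsMaterialized = 0
        private set

    // Eager path: every row leaves the source; the caller filters afterwards.
    fun scanAll(): List<Int> {
        rowsMaterialized += rows.size
        return rows.toList()
    }

    // Pushdown path: the predicate runs inside the source, so only matches leave it.
    fun scanWhere(predicate: (Int) -> Boolean): List<Int> {
        val matches = rows.filter(predicate)
        rowsMaterialized += matches.size
        return matches
    }
}
```

Over 1..1000 with the predicate `it % 100 == 0`, both paths return the same 10 rows, but the eager path materializes all 1,000 while the pushdown path materializes 10.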

Hampus Londögård

08/10/2022, 5:23 PM

Exciting, thanks for the response!
