kotlinlang #datascience

andyg

02/13/2024, 11:27 AM

Last week a SQL tutorial "SQL for data scientists in 100 queries" caught my eye, on the front page of HackerNews. Seemed like its 100 examples would make a great basis for a dataframe tutorial... so I created a clone: "Kotlin dataframe in 100 queries" (full repo w/ notebook file here). I'm interested in your feedback/comments, please feel free to reply here or to DM, thanks! (and thanks for your work on dataframe!)

🔥 14

K 4

K 2

👍 7

Pavel Gorgulov

02/13/2024, 12:28 PM

This is a very cool notebook! Great job! What do you use to convert notebook to html?

Jolan Rensen [JB]

02/13/2024, 1:01 PM

indeed awesome job!! 😄

Jolan Rensen [JB]

02/13/2024, 1:11 PM

Just some small notes regarding nullability:

• I'd replace `filter { it.someCol != null }` with `dropNulls { someCol }`, as that's a bit easier to read
• A lot of `!!` can be replaced with `.castToNotNullable()` on the column in the previous operation

For example, no. 49:

dfPenguins
    .dropNulls { body_mass_g }
    .convert { body_mass_g.castToNotNullable() }.with { 
        when {
            it < 3500 -> "small"
            it < 5000 -> "medium"
            // ... especially useful when there are lots of cases
            // a map also works, as we demonstrated in #41
            else -> "large"
        }
    }
...

Of course your version is equally correct 🙂 it's just my taste
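
To illustrate why dropping nulls at the type level removes the need for `!!`, here is a rough analogy using plain Kotlin collections rather than the DataFrame API. The names `sizeLabel`, `labelsWithAssertions`, and `labelsWithTypedDrop` and the sample values are made up for this sketch; only the thresholds come from the snippet above.

```kotlin
// Plain Kotlin collections, used only as an analogy for a nullable DataFrame column.
// All names and sample values here are hypothetical.

fun sizeLabel(massInGrams: Int): String = when {
    massInGrams < 3500 -> "small"
    massInGrams < 5000 -> "medium"
    else -> "large"
}

// Filtering on a null check keeps the element type as Int?,
// so every later use still needs `!!`.
fun labelsWithAssertions(bodyMasses: List<Int?>): List<String> =
    bodyMasses
        .filter { it != null }
        .map { sizeLabel(it!!) }

// filterNotNull() changes the element type to Int,
// so the later step needs no assertion.
fun labelsWithTypedDrop(bodyMasses: List<Int?>): List<String> =
    bodyMasses
        .filterNotNull()
        .map { sizeLabel(it) }

fun main() {
    val bodyMasses = listOf(3750, null, 3250, 5400)
    println(labelsWithAssertions(bodyMasses))
    println(labelsWithTypedDrop(bodyMasses))
}
```

Both functions return the same labels; the second just lets the compiler carry the "no nulls left" fact forward, which is roughly what `dropNulls` plus `castToNotNullable()` aims for on a column.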

Jolan Rensen [JB]

02/13/2024, 1:35 PM

Regarding your JSON section: It's actually possible to convert JSON columns to column groups in DataFrame. I just noticed this functionality of `parse` is undocumented, but you can `df.parse { jsonStringCol }` (or alternatively `df.convert { jsonStringCol }.with { DataRow.readJsonStr(it) }`) and then the JSON-reading abilities of DF will be utilized. Of course, Kotlinx Serialization is also possible, but sometimes that's a bit too much boilerplate IMO. Plus, it allows you to show column groups, which SQL doesn't support 🙂 (right?)

👍 1

andyg

02/13/2024, 8:04 PM

Thanks for the suggestions!! ... yes I used `dropNulls` in some places but I agree it is clearer... I've replaced most null filters with it, and `castToNotNullable` where appropriate. Also added some JSON `parse` examples, thank you for that tip, it's definitely a nice option rather than all the setup necessary for a full deserialization.

🙂 1

andyg

02/13/2024, 8:07 PM

HTML conversion is just the built-in "Save and Export" function from Jupyter (I think it uses "pandoc" under the hood). Styling is automatically applied from * \.jupyter\custom\custom.css
