# datascience
a
Last week a SQL tutorial, "SQL for data scientists in 100 queries", caught my eye on the front page of Hacker News. It seemed like its 100 examples would make a great basis for a dataframe tutorial, so I created a clone: "Kotlin dataframe in 100 queries" (full repo w/ notebook file here). I'm interested in your feedback/comments; please feel free to reply here or DM me. Thanks! (And thanks for your work on dataframe!)
🔥 14
👍 7
p
This is a very cool notebook! Great job! What do you use to convert the notebook to HTML?
j
indeed awesome job!! 😄
Just some small notes regarding nullability:
• I'd replace `filter { it.someCol != null }` with `dropNulls { someCol }`, as that's a bit easier to read.
• A lot of `!!` can be replaced with `.castToNotNullable()` on the column in the previous operation.
For example, no. 49:
dfPenguins
    .dropNulls { body_mass_g }
    .convert { body_mass_g.castToNotNullable() }.with { 
        when {
            it < 3500 -> "small"
            it < 5000 -> "medium"
            // ... especially useful when there are lots of cases
            // a map also works, as we demonstrated in #41
            else -> "large"
        }
    }
...
Of course your version is equally correct 🙂 it's just my taste
Regarding your JSON section: it's actually possible to convert JSON columns to column groups in DataFrame. I just noticed this functionality of `parse` is undocumented, but you can call `df.parse { jsonStringCol }` (or alternatively `df.convert { jsonStringCol }.with { DataRow.readJsonStr(it) }`), and then the JSON-reading abilities of DF will be utilized. Of course, Kotlinx Serialization is also possible, but sometimes that's a bit too much boilerplate IMO. Plus, it allows you to show column groups, which SQL doesn't support 🙂 (right?)
👍 1
a
Thanks for the suggestions! Yes, I used `dropNulls` in some places, but I agree it is clearer. I've replaced most null filters with it, and used `castToNotNullable` where appropriate. Also added some JSON `parse` examples. Thank you for that tip; it's definitely a nice option compared to all the setup necessary for a full deserialization.
🙂 1
HTML conversion is just the built-in "Save and Export" function from Jupyter (I think it uses "pandoc" under the hood). Styling is automatically applied from * \.jupyter\custom\custom.css