# datascience
p
⚠️ Hic sunt leones. Occasionally I have to bring data into a DataFrame from an unsupported database, so I wrote the following code as "an exercise for the reader":
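import java.sql.ResultSet
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.toDataFrame

// Iterates the cursor lazily; every element is the same ResultSet, advanced by one row.
fun ResultSet.asSequence(): Sequence<ResultSet> = sequence {
    while (next()) {
        yield(this@asSequence)
    }
}

// Enables row[index] access; JDBC column indices are 1-based.
operator fun ResultSet.get(index: Int): Any? = this.getObject(index)

// Collects the rows into one list per column, then builds the DataFrame from that map.
fun ResultSet.toDataFrame(): DataFrame<*> =
    mutableMapOf<String, MutableList<Any?>>()
        .let { map ->
            val names = List(metaData.columnCount) {
                metaData.getColumnName(it + 1)
            }
            this
                .asSequence()
                .forEach { row ->
                    names.forEachIndexed { index, name ->
                        map[name]?.add(row[index + 1]) ?: map.put(name, mutableListOf(row[index + 1]))
                    }
                }
            map
        }.toDataFrame()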
Hope someone can make use of this (even as a counterexample). 😁 Cheerio!
j
Thanks Paulo! This is very cool to see. I'm actually making a little guide for "unsupported" data sources in DataFrame, like Apache Spark, JetBrains Exposed, and maybe more, just to show how simple it is to create a DataFrame from them 🙂 Your example already shows this too! I did make a tiny refactor using some of the stdlib functions you may not have discovered yet 😉
fun ResultSet.toDataFrame(): DataFrame<*> =
    buildMap<String, MutableList<Any?>> {
        val names = List(metaData.columnCount) {
            metaData.getColumnName(it + 1)
        }
        this@toDataFrame.asSequence().forEach { row ->
            names.forEachIndexed { index, name ->
                this.getOrPut(name) { mutableListOf() } += row[index + 1]
            }
        }
    }.toDataFrame()
An approach like this works great for any type of database. If you can convert something to a list or map, you can make a DataFrame from it. Regarding JDBC ResultSets, however, we do offer a special "helper" argument, dbType: DbType, which you can define for any unsupported JDBC database you want. Then you can simply call DataFrame.readResultSet(resultSet, dbType). See the docs for more info about this approach.
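To illustrate the "convert it to a list or map" idea without a database at hand, here is a minimal stdlib-only sketch of the same row-to-column pivot. The helper name toColumns is hypothetical and not part of DataFrame; it assumes each row arrives as a Map<String, Any?> and pads with null when a row lacks a column.

```kotlin
// Hypothetical helper (not part of the DataFrame library): pivots row-oriented
// records into one list per column, keeping columns in first-seen order.
fun List<Map<String, Any?>>.toColumns(): Map<String, List<Any?>> =
    buildMap<String, MutableList<Any?>> {
        this@toColumns.forEachIndexed { rowIndex, row ->
            row.forEach { (name, value) ->
                // A column first seen in a later row is back-filled with nulls.
                val column = getOrPut(name) { MutableList<Any?>(rowIndex) { null } }
                column.add(value)
            }
            // Columns absent from this row get a null so all lists stay the same length.
            values.forEach { if (it.size <= rowIndex) it.add(null) }
        }
    }
```

For rows {id=1, name=Ada} and {id=2, name=null} this yields {id=[1, 2], name=[Ada, null]}. With DataFrame on the classpath, calling toDataFrame() on that map would be the final step, just as in the snippets above.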
p
Hey Jolan! Wow, this is amazing. 🔥 I will definitely check the docs. Always fun to play with DataFrame. Thanks so much!