why is `spark.sql.codegen.wholeStage` set to. `fal...
# kotlin-spark
e
why is `spark.sql.codegen.wholeStage` set to `false` for Jupyter?
j
Honestly, I don't remember 😅 Are you able to turn it on with something like `%use spark(spark.sql.codegen.wholeStage=true)`? Or is it set to false either way?
e
haven't tried it since I don't think it affects me
will let u know when I get a chance to try it
j
I suspect we turned it off because it tries to collapse Java code calls (as described here https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-whole-stage-codegen.html). This might cause issues since we wrap Java functions with Kotlin ones. Plus, in Jupyter, we're working with the Kotlin REPL, which is yet another abstraction on top of Kotlin, so the more Spark optimizations we can turn off, the more stable it will be
👍 1
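For reference, a minimal sketch of setting the flag explicitly outside Jupyter, using the plain Spark `SparkSession` API from Kotlin. It assumes `spark-sql` is on the classpath and that a local master is fine; the app name is made up for illustration, and this is not necessarily how the Jupyter integration sets it internally.

```kotlin
import org.apache.spark.sql.SparkSession

fun main() {
    // Assumption: spark-sql is available on the classpath and a local master is acceptable.
    val spark = SparkSession.builder()
        .master("local[*]")
        .appName("wholestage-codegen-check") // hypothetical app name
        .config("spark.sql.codegen.wholeStage", "false") // flip to "true" to compare
        .getOrCreate()

    // Read the effective value back from the session's runtime config.
    println(spark.conf().get("spark.sql.codegen.wholeStage"))

    // Print the physical plan of a trivial query. With whole-stage codegen enabled,
    // Spark marks fused operators with a `*(n)` prefix in this output.
    spark.range(10L).filter("id % 2 = 0").explain()

    spark.stop()
}
```

Comparing the `explain()` output with the flag on and off is a quick way to check whether the setting actually took effect in a given session.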
e
I see. What's the best place to understand how you do the Kotlin -> Scala translation?
j
Well yes, I'm working on a revamp to support Spark 3.4+. They reworked their internals, so our old hacky approach no longer works. However, this new approach is still a WIP and not yet published anywhere (apart from the draft PR). It should become way less hacky and support Spark Connect (in theory). But it still has some issues, as described in that issue thread. To see how we generate encoders/schemas/datatypes from Kotlin in the latest release, you can check out `Encoding.kt` and `KotlinReflection.scala`. I gave more explanation in this comment. We also provide many wrappers for functions, like the ones here.
🙏 1