#kotlin-spark

Eric Rodriguez

03/13/2024, 9:41 PM
Why is spark.sql.codegen.wholeStage set to false for Jupyter?

Jolan Rensen [JB]

03/18/2024, 9:58 AM
Honestly, I don't remember 😅 Are you able to turn it on like %use spark(spark.sql.codegen.wholeStage=true)? Or is it set to false either way?

Eric Rodriguez

03/18/2024, 10:07 AM
Haven't tried it since I don't think it affects me.
Will let you know when I get a chance to try it.

Jolan Rensen [JB]

03/18/2024, 1:32 PM
I suspect we turned it off because it tries to collapse Java code calls (as described here https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-whole-stage-codegen.html). This might cause issues since we wrap Java functions with Kotlin ones. Plus, in Jupyter, we're working with the Kotlin REPL, which is again an abstraction on top of Kotlin, so the more Spark optimizations we can turn off, the more stable it will be.
👍 1
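For anyone wanting to check whether the flag is really off or can be flipped, here is a minimal sketch using only the plain Spark API (no kotlin-spark-api wrappers). The local master and the app name are assumptions for illustration, and nothing here reflects how the Jupyter integration itself applies the setting.

```kotlin
import org.apache.spark.sql.SparkSession

fun main() {
    // Plain Spark session; local master and app name are illustrative assumptions.
    val spark = SparkSession.builder()
        .master("local[*]")
        .appName("whole-stage-codegen-check")
        .config("spark.sql.codegen.wholeStage", "false")
        .getOrCreate()

    // Read the effective value back from the runtime config.
    println(spark.conf().get("spark.sql.codegen.wholeStage"))

    // The flag is a runtime SQL conf, so it can also be changed on a live session.
    spark.conf().set("spark.sql.codegen.wholeStage", "true")
    println(spark.conf().get("spark.sql.codegen.wholeStage"))

    spark.stop()
}
```

In a notebook, the same spark.conf().get(...) call should show whether %use spark(spark.sql.codegen.wholeStage=true) took effect, assuming the integration exposes the session as spark.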

Eric Rodriguez

03/18/2024, 1:38 PM
I see. What's the best place to understand how you do the Kotlin -> Scala translation?

Jolan Rensen [JB]

03/18/2024, 1:44 PM
Well yes, I'm working on a revamp to support Spark 3.4+. They reworked their internals, so our old hacky approach no longer works. However, this new approach is still a WIP and not yet published anywhere (apart from the draft PR). It should become way less hacky and support Spark Connect (in theory). But it still has some issues, as described in that issue thread. To see how we generate encoders/schemas/datatypes from Kotlin in the latest release, you can check out Encoding.kt and KotlinReflection.scala. I gave more explanation in this comment. We also provide many wrappers for functions, like the ones here.
🙏 1