why is `spark.sql.codegen.wholeStage` set to `false` for Jupyter? (kotlinlang #kotlin-spark)

Eric Rodriguez

03/13/2024, 9:41 PM

why is `spark.sql.codegen.wholeStage` set to `false` for Jupyter?

Jolan Rensen [JB]

03/18/2024, 9:58 AM

Honestly, I don't remember 😅 Are you able to turn it on with `%use spark(spark.sql.codegen.wholeStage=true)`? Or is it set to false either way?
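One way to answer "is it set to false either way?" is to read the flag back from the live session and, if needed, override it at runtime. This is a minimal sketch assuming Spark 3.x and its `RuntimeConfig` API (`spark.conf().get` / `spark.conf().set`); `spark.sql.codegen.wholeStage` is a runtime SQL option, so it can be changed on a running session. The helper names `wholeStageCodegenSetting` and `setWholeStageCodegen` are hypothetical. In a notebook you would pass the session that the `%use spark` integration provides (assumed here to be available as `spark`) instead of building a local one.

```kotlin
import org.apache.spark.sql.SparkSession

// The option discussed in this thread.
val wholeStageKey = "spark.sql.codegen.wholeStage"

// Hypothetical helper: reads the flag as the running session sees it,
// regardless of whether it came from %use arguments, builder config or a default.
fun wholeStageCodegenSetting(spark: SparkSession): String =
    spark.conf().get(wholeStageKey)

// Hypothetical helper: overrides the flag for this session only;
// queries planned after this call pick up the new value.
fun setWholeStageCodegen(spark: SparkSession, enabled: Boolean) {
    spark.conf().set(wholeStageKey, enabled)
}

fun main() {
    // Local stand-in for the notebook's session; the builder option mimics
    // a session that starts with whole-stage codegen disabled.
    val spark = SparkSession.builder()
        .master("local[1]")
        .appName("wholestage-flag-check")
        .config(wholeStageKey, "false")
        .getOrCreate()

    println(wholeStageCodegenSetting(spark)) // prints "false"
    setWholeStageCodegen(spark, true)
    println(wholeStageCodegenSetting(spark)) // prints "true"

    spark.stop()
}
```

If the value read back right after `%use spark(spark.sql.codegen.wholeStage=true)` is still `false`, that would suggest the integration overrides the argument.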

Eric Rodriguez

03/18/2024, 10:07 AM

haven't tried it since I don't think it affects me

Eric Rodriguez

03/18/2024, 10:07 AM

will let u know when I get a chance to try it

Jolan Rensen [JB]

03/18/2024, 1:32 PM

I suspect we turned it off because whole-stage codegen fuses several physical operators into a single generated Java function (as described here https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-whole-stage-codegen.html). This might cause issues, since we wrap Java functions with Kotlin ones. Plus, in Jupyter we're working with the Kotlin REPL, which is yet another abstraction on top of Kotlin, so the more Spark optimizations we can turn off, the more stable it will be.
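The fusion described above can be observed directly. With the flag on, Spark wraps adjacent operators in one `WholeStageCodegenExec` stage, which `explain()` prints with a `*(n)` prefix. With the flag off, each operator stays separate. This is a minimal sketch against the plain Spark 3.x Java API called from Kotlin, not the kotlin-spark-api wrappers. It relies on `Dataset.queryExecution()` and `WholeStageCodegenExec`, which are developer-level, unstable internals that may differ between Spark versions. The helpers `buildQuery` and `isFusedIntoOneStage` are hypothetical names.

```kotlin
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.WholeStageCodegenExec

// Hypothetical helper: a tiny Range -> Filter -> Project query. A new Dataset is built
// on each call because a Dataset keeps the physical plan it computed the first time.
fun buildQuery(spark: SparkSession): Dataset<Row> =
    spark.range(0L, 1000L)
        .filter("id % 2 = 0")
        .selectExpr("id + 1 AS shifted")

// Hypothetical helper: plans the query with the flag set as requested and reports whether
// the root of the physical plan is a single fused WholeStageCodegenExec stage.
fun isFusedIntoOneStage(spark: SparkSession, wholeStage: Boolean): Boolean {
    spark.conf().set("spark.sql.codegen.wholeStage", wholeStage)
    val plan = buildQuery(spark).queryExecution().executedPlan()
    return plan is WholeStageCodegenExec
}

fun main() {
    val spark = SparkSession.builder()
        .master("local[1]")
        .appName("wholestage-plan-demo")
        .getOrCreate()

    // Flag on: the fused operators are printed with a "*(1)" prefix.
    spark.conf().set("spark.sql.codegen.wholeStage", true)
    buildQuery(spark).explain()

    // Flag off: Project, Filter and Range are printed as separate, unprefixed operators.
    spark.conf().set("spark.sql.codegen.wholeStage", false)
    buildQuery(spark).explain()

    println(isFusedIntoOneStage(spark, true))  // expected: true
    println(isFusedIntoOneStage(spark, false)) // expected: false

    spark.stop()
}
```

The query avoids shuffles on purpose, so adaptive query execution does not wrap the plan and the root node can be checked directly.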

Eric Rodriguez

03/18/2024, 1:38 PM

I see. What's the best place to understand how you do the Kotlin -> Scala translation?

Eric Rodriguez

03/18/2024, 1:38 PM

maybe here: https://github.com/Kotlin/kotlin-spark-api/issues/195

Jolan Rensen [JB]

03/18/2024, 1:44 PM

Well, yes. I'm working on a revamp to support Spark 3.4+. Spark reworked its internals, so our old hacky approach no longer works. However, the new approach is still a WIP and isn't published anywhere yet (apart from the draft PR). It should be far less hacky and (in theory) support Spark Connect, but it still has some issues, as described in that issue thread. To see how we generate encoders/schemas/data types from Kotlin in the latest release, you can check out Encoding.kt and KotlinReflection.scala. I gave more explanation in this comment. We also provide many wrappers for functions, like the ones here.
