ita (12/06/2021, 4:17 PM)
[message text missing]

altavir (12/06/2021, 6:39 PM)
[message text missing]

roman.belov (12/06/2021, 7:50 PM)
[message text missing]

ita (12/06/2021, 8:39 PM)
[message text missing]

Ilya Muradyan (12/07/2021, 1:51 AM)
Try the %dumpClassesForSpark magic. If it still doesn't work, please share the stacktrace.
ita (12/10/2021, 7:31 PM)
I tried %use spark, which calls %dumpClassesForSpark behind the scenes, and it worked smoothly. However, when I try to create a Spark DataFrame from an AWS S3 file I get java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found when running from Kotlin; using PySpark on the same container I can create it from the S3 file. I noticed that the kernel uses the $HOME/.m2 directory to find the Spark libraries. How can I use the Spark configuration and libraries from my $SPARK_HOME?

Ilya Muradyan (12/11/2021, 4:45 PM)
The default Spark version is 2.4.4, as you see. To use another version of Spark, specify it in the %use magic, e.g.:
%use spark(spark=2.4.8, scala=...)
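For instance, a fully spelled-out form of that magic might look like the line below; the scala=2.11 value is only an assumption (Spark 2.4.8 also ships Scala 2.12 builds), not something stated in the thread:
%use spark(spark=2.4.8, scala=2.11)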
It may turn out that you don't need all the libraries in this descriptor's dependencies block, or that you need some other libraries. It may also turn out that you need other imports or initialization code. In that case, you can try to avoid using %use spark at all.
First, execute the %dumpClassesForSpark magic.
Then, specify the desired dependencies, e.g.:
@file:DependsOn("org.apache.spark:spark-mllib_2.11:2.4.8")
If you want to use locally installed libraries, specify the path to the JAR instead of GAV coordinates:
@file:DependsOn("/path/to/library.jar")
Alternatively, if $SPARK_HOME is a local Maven repository (not just a directory with JARs), you can add it as a repository:
@file:Repository("/spark/home/path")
and then add dependencies via GAV coordinates as mentioned above.
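Put together, a minimal sketch of that variant could be the cell below; the repository path and the GAV coordinates are placeholders and assume the directory follows a Maven repository layout:
// placeholder path: a local Maven-layout repository containing the Spark artifacts
@file:Repository("/opt/spark/maven-repo")
// resolved from the repository above via GAV coordinates
@file:DependsOn("org.apache.spark:spark-sql_2.11:2.4.8")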
Then, specify the desired imports, e.g.:
import org.apache.spark.sql.*
And, finally, write the code that initializes the Spark session, e.g.:
val spark = SparkSession
    .builder()
    .appName("Spark example")
    .master("local")
    .getOrCreate()
val sc = spark.sparkContext()
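As a hedged follow-up to the original S3 question (not part of the answer above): with the hadoop-aws dependency on the classpath, reading a file over s3a:// could look roughly like this. The bucket and key are placeholders, and AWS credentials are assumed to come from the usual environment variables or instance profile:
// placeholder bucket and key; credentials are taken from the environment
val df = spark.read()
    .option("header", "true")                 // assume the first line is a header
    .csv("s3a://my-bucket/path/to/data.csv")
df.show(5)                                    // print the first rows to verify the read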
altavir (12/11/2021, 4:46 PM)
[message text missing]

asm0dey (12/11/2021, 6:25 PM)
[message text missing]