razvandragut
07/30/2021, 9:41 PM
spark version: 3.0.2
kotlin-spark-api: 1.0.1/2
Mike Lin
12/04/2021, 1:49 AM
Adam Szadkowski
12/11/2021, 8:05 PM
Sebastian Kazenbroot-Guppy
01/03/2022, 2:23 AM
Mike Lin
01/03/2022, 9:00 PM
davidh
02/23/2022, 3:52 PM
Michael Livshitz
04/13/2022, 12:05 PM
Jolan Rensen [JetBrains]
05/26/2022, 4:55 PM
Jolan Rensen [JetBrains]
05/31/2022, 3:33 PM
jaeyol.youn
06/29/2022, 12:43 AM
22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 4
22/06/29 01:58:58 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on lniuhwk3388.nhnjp.ism:42755 in memory (size: 48.1 KB, free: 366.3 MB)
22/06/29 01:58:58 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on lniuhwk4040.nhnjp.ism:36704 in memory (size: 48.1 KB, free: 1458.6 MB)
22/06/29 01:58:58 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on lniuhwk3393.nhnjp.ism:33023 in memory (size: 48.1 KB, free: 1458.6 MB)
22/06/29 01:58:58 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on lniuhwk3870.nhnjp.ism:34781 in memory (size: 48.1 KB, free: 1458.6 MB)
... (further ContextCleaner "Cleaned accumulator" lines omitted) ...
22/06/29 01:59:00 ERROR yarn.ApplicationMaster: User class threw exception: kotlin.jvm.KotlinReflectionNotSupportedError: Kotlin reflection implementation is not found at runtime. Make sure you have kotlin-reflect.jar in the classpath
kotlin.jvm.KotlinReflectionNotSupportedError: Kotlin reflection implementation is not found at runtime. Make sure you have kotlin-reflect.jar in the classpath
at kotlin.jvm.internal.ClassReference.error(ClassReference.kt:84)
at kotlin.jvm.internal.ClassReference.isData(ClassReference.kt:70)
at org.jetbrains.kotlinx.spark.api.ApiV1Kt.isSupportedClass(ApiV1.kt:172)
at org.jetbrains.kotlinx.spark.api.ApiV1Kt.generateEncoder(ApiV1.kt:167)
at com.zshp.agg.batch.bannedword.CatalogBannedWordSparkApplication.main(CatalogBannedWordSparkApplication.kt:159)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
22/06/29 01:59:00 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: kotlin.jvm.KotlinReflectionNotSupportedError: Kotlin reflection implementation is not found at runtime. Make sure you have kotlin-reflect.jar in the classpath
... (same stack trace as above) ...
)
22/06/29 01:59:00 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/06/29 01:59:00 INFO server.AbstractConnector: Stopped Spark@52631839{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
22/06/29 01:59:00 INFO ui.SparkUI: Stopped Spark web UI at <http://lniuhwk3388.nhnjp.ism:38055>
22/06/29 01:59:00 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
22/06/29 01:59:00 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
22/06/29 01:59:00 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/06/29 01:59:00 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
22/06/29 01:59:00 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/06/29 01:59:00 INFO memory.MemoryStore: MemoryStore cleared
22/06/29 01:59:00 INFO storage.BlockManager: BlockManager stopped
22/06/29 01:59:00 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/06/29 01:59:00 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/06/29 01:59:00 INFO spark.SparkContext: Successfully stopped SparkContext
22/06/29 01:59:00 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: kotlin.jvm.KotlinReflectionNotSupportedError: Kotlin reflection implementation is not found at runtime. Make sure you have kotlin-reflect.jar in the classpath
... (same stack trace as above) ...
)
22/06/29 01:59:00 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
22/06/29 01:59:00 INFO yarn.ApplicationMaster: Deleting staging directory <viewfs://iu/user/z_shopping_ai/.sparkStaging/application_1655162175779_1751556>
22/06/29 01:59:00 INFO util.ShutdownHookManager: Shutdown hook called
jaeyol.youn
06/29/2022, 12:45 AM
# gradle version 6.8.2
val kotlinVersion = "1.6.21"
dependency("org.apache.spark:spark-sql_2.12:2.4.4")
dependency("org.jetbrains.kotlinx.spark:kotlin-spark-api-2.4_2.12:1.0.2")
// shadow jar <https://github.com/johnrengelman/shadow>
dependency("com.github.johnrengelman.shadow:com.github.johnrengelman.shadow.gradle.plugin:6.1.0")
jaeyol.youn
06/29/2022, 12:48 AM
Spark Version: 2.4.4
$SPARK_HOME/bin/spark-submit \
--master yarn \
--deploy-mode cluster
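The KotlinReflectionNotSupportedError above means kotlin-reflect is missing from the runtime classpath on the cluster. A minimal build-script sketch (assumptions: Gradle Kotlin DSL, the Shadow plugin mentioned in this thread; coordinates mirror the versions quoted above) that bundles it into the fat jar:

```kotlin
// build.gradle.kts: a sketch, not the poster's actual build file.
plugins {
    kotlin("jvm") version "1.6.21"
    id("com.github.johnrengelman.shadow") version "6.1.0"
}

dependencies {
    // kotlin-reflect must end up inside the shadow jar; otherwise
    // kotlin-spark-api's isData/encoder checks throw
    // KotlinReflectionNotSupportedError on the cluster.
    implementation(kotlin("reflect"))
    implementation("org.jetbrains.kotlinx.spark:kotlin-spark-api-2.4_2.12:1.0.2")
    // Spark itself is provided by the cluster, so keep it out of the fat jar:
    compileOnly("org.apache.spark:spark-sql_2.12:2.4.4")
}
```

The key point is that `implementation` dependencies are shaded into the jar produced by `shadowJar`, while `compileOnly` ones are not.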
Jolan Rensen [JetBrains]
06/29/2022, 7:20 PM
Jolan Rensen [JetBrains]
08/04/2022, 2:48 PM
Michael Livshitz
08/29/2022, 8:33 PM
I tried kotlin-spark-api_3.0.1_2.12:1.2.1 alongside Kotlin 1.6.10, with a 1.8 JVM target, on an existing code base that was already using an older version of Kotlin Spark. Once I replaced my old dependencies with the new ones, I got: Cannot inline bytecode built with JVM target 11 into bytecode that is being built with JVM target 1.8. Please specify proper '-jvm-target' option. When I go to the sources I do see that it is bytecode 55 (Java 11). I was wondering if there is anything I can do about it, given that I'm stuck using a 1.8 JVM.
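On the JVM-target question: setting your own `-jvm-target` only affects the bytecode you compile, not a dependency's. A sketch of pinning a project to 1.8 (assumptions: Gradle Kotlin DSL, pre-2.0 Kotlin plugin API; note this still cannot make a dependency compiled to class-file version 55 usable on a 1.8 JVM, so the practical fix would be a library line built for 1.8):

```kotlin
// build.gradle.kts: sketch of aligning your own compilation target.
tasks.withType<org.jetbrains.kotlin.gradle.tasks.KotlinCompile>().configureEach {
    kotlinOptions.jvmTarget = "1.8"
}
java {
    sourceCompatibility = JavaVersion.VERSION_1_8
    targetCompatibility = JavaVersion.VERSION_1_8
}
```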
Thank you for making Spark awesome btw :D
Denis Ambatenne
10/06/2022, 10:39 AM
Aleksander Eskilson
11/15/2022, 3:54 PM
I asked about this in the kotlin-spark-api gitter a few days ago: https://gitter.im/JetBrains/kotlin-spark-api?at=636d86ea18f21c023ba7b373
There I'd noted that the default Spark classloader can have conflicts with kotlin-spark-api's replacement spark-sql Catalyst converters. The solution I'd initially tried was to set the spark.driver.userClassPathFirst config option, but in practice this causes a lot of other classpath issues with other supplied jars (things like s3 impl jars).
Another approach I'd tried was to make a shaded jar of my project, ensuring I included shaded dependencies for org.jetbrains.kotlinx.spark and org.jetbrains.kotlin. This seems to work when I point both the spark.jars and the spark.driver.extraClassPath options at my project's physical jar (the latter states it'll prepend jars to the Spark classpath, which I'd suppose is the reason the replacement Catalyst converters get properly picked up by Spark first).
That's all a little unwieldy, though. I'd prefer to use the spark.jars.packages option for my packaged project, but that doesn't seem to work (the spark-sql overloads seem not to get discovered by the Spark classloader before its own implementation).
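The shaded-jar approach described above can be sketched as a spark-submit invocation (paths, app name, and main class are hypothetical, not from this thread):

```shell
# Sketch only: the shaded jar bundles org.jetbrains.kotlinx.spark and
# org.jetbrains.kotlin, and extraClassPath prepends it so the replacement
# Catalyst converters are found before Spark's own classes.
spark-submit \
  --class com.example.MyApp \
  --conf spark.jars=/path/to/my-app-shaded.jar \
  --conf spark.driver.extraClassPath=/path/to/my-app-shaded.jar \
  /path/to/my-app-shaded.jar
```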
Is anyone else experiencing odd classloader issues between Spark and the kotlin-spark-api replacement of Catalyst converters? Ideas on how to resolve these classpath conflicts?
Jolan Rensen [JB]
12/02/2022, 1:12 PM
• BigInteger support in #182, thanks to #181
• New Spark versions: 3.2.3, 3.3.1, 3.2.2
• New Scala versions: 2.12.17, 2.13.10
• Updated Kotlin to 1.7.20
• Small bugfix regarding Map encoding
You can get the version that works with your Spark/Scala setup using the following table: https://github.com/Kotlin/kotlin-spark-api#supported-versions-of-apache-spark
(Might take a couple of hours for Maven Central to update)
holgerbrandl
12/12/2022, 1:43 PM
citiesWithCoordinates.rightJoin(populations, citiesWithCoordinates.col("name") `===` populations.col("city"))
.filter { (_, citiesPopulation) ->
citiesPopulation.population > 15_000_000L
}
Was this never working, or was destructuring support for Tuple2 dropped?
holgerbrandl
12/12/2022, 1:43 PM
Jolan Rensen [JB]
12/12/2022, 1:46 PM
import org.jetbrains.kotlinx.spark.api.tuples.*
to make sure it's imported 🙂 (the IDE doesn't always auto-import stuff like this correctly)
Jolan Rensen [JB]
12/12/2022, 1:54 PM
holgerbrandl
12/14/2022, 9:44 AM
Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (192.168.217.128 executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
... very long stacktrace ....
I was under the impression that all serialization should be very simple in this expression, and every job is returning a simple string.
What am I doing wrong?
Jolan Rensen [JB]
01/09/2023, 12:48 PM
Jolan Rensen [JB]
02/27/2023, 1:45 PM
Jolan Rensen [JB]
07/26/2023, 11:41 AM
the encoder<>() function, or you simply want to show your interest in the matter, let us know in the relevant issue: https://github.com/Kotlin/kotlin-spark-api/issues/195
As usual, let us know if you have any issues and enjoy the version bump :)
Tyler Kinkade
03/05/2024, 3:26 AM
%use dataframe
%use spark
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.*
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.*
import org.apache.spark.unsafe.types.CalendarInterval
import org.jetbrains.kotlinx.dataframe.*
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.columns.ColumnKind.*
import org.jetbrains.kotlinx.dataframe.schema.*
import org.jetbrains.kotlinx.spark.api.*
import java.math.*
import java.sql.*
import java.time.*
import kotlin.reflect.*
fun DataFrame<*>.toSpark(spark: SparkSession, sc: JavaSparkContext): Dataset<Row> {
val rows = sc.toRDD(rows().map(DataRow<*>::toSpark))
return spark.createDataFrame(rows, schema().toSpark())
}
fun DataRow<*>.toSpark(): Row =
RowFactory.create(
*values().map {
when (it) {
is DataRow<*> -> it.toSpark()
else -> it
}
}.toTypedArray()
)
fun DataFrameSchema.toSpark(): StructType =
DataTypes.createStructType(
columns.map { (name, schema) ->
DataTypes.createStructField(name, schema.toSpark(), schema.nullable)
}
)
fun ColumnSchema.toSpark(): DataType =
when (this) {
is ColumnSchema.Value -> type.toSpark() ?: error("unknown data type: $type")
is ColumnSchema.Group -> schema.toSpark()
is ColumnSchema.Frame -> error("nested dataframes are not supported")
else -> error("unknown column kind: $this")
}
fun KType.toSpark(): DataType? = when(this) {
typeOf<Byte>(), typeOf<Byte?>() -> DataTypes.ByteType
typeOf<Short>(), typeOf<Short?>() -> DataTypes.ShortType
typeOf<Int>(), typeOf<Int?>() -> DataTypes.IntegerType
typeOf<Long>(), typeOf<Long?>() -> DataTypes.LongType
typeOf<Boolean>(), typeOf<Boolean?>() -> DataTypes.BooleanType
typeOf<Float>(), typeOf<Float?>() -> DataTypes.FloatType
typeOf<Double>(), typeOf<Double?>() -> DataTypes.DoubleType
typeOf<String>(), typeOf<String?>() -> DataTypes.StringType
typeOf<LocalDate>(), typeOf<LocalDate?>() -> DataTypes.DateType
typeOf<Date>(), typeOf<Date?>() -> DataTypes.DateType
typeOf<Timestamp>(), typeOf<Timestamp?>() -> DataTypes.TimestampType
typeOf<Instant>(), typeOf<Instant?>() -> DataTypes.TimestampType
typeOf<ByteArray>(), typeOf<ByteArray?>() -> DataTypes.BinaryType
typeOf<Decimal>(), typeOf<Decimal?>() -> DecimalType.SYSTEM_DEFAULT()
typeOf<BigDecimal>(), typeOf<BigDecimal?>() -> DecimalType.SYSTEM_DEFAULT()
typeOf<BigInteger>(), typeOf<BigInteger?>() -> DecimalType.SYSTEM_DEFAULT()
typeOf<CalendarInterval>(), typeOf<CalendarInterval?>() -> DataTypes.CalendarIntervalType
else -> null
}
Error:
Line_31.jupyter.kts (20:48 - 55) 'toSpark' is a member and an extension at the same time. References to such elements are not allowed
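For context, a minimal pure-Kotlin sketch of what this diagnostic is about (hypothetical names, unrelated to Spark or DataFrame themselves): in a Jupyter Kotlin script, top-level declarations effectively become members of a snippet class, so a top-level extension function like `DataRow<*>.toSpark()` becomes a member-and-extension, and callable references to it (`DataRow<*>::toSpark`) are rejected by the compiler. Calling it through a lambda instead of a callable reference avoids the error:

```kotlin
class Scope {
    // A member-and-extension: an extension function declared inside a class,
    // similar to what happens to top-level extensions in a Jupyter .kts cell.
    fun String.shout(): String = uppercase()

    fun viaLambda(): List<String> {
        // Not allowed, same diagnostic as above:
        // return listOf("a", "b").map(String::shout)
        // Workaround: call through a lambda instead of a callable reference.
        return listOf("a", "b").map { it.shout() }
    }
}

fun main() {
    println(Scope().viaLambda()) // [A, B]
}
```

Applied to the snippet above, that would mean writing `rows().map { it.toSpark() }` rather than `rows().map(DataRow<*>::toSpark)`.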
How can I fix this?
Eric Rodriguez
03/12/2024, 4:32 PM
When I run it with spark-submit it just stalls forever:
24/03/12 12:09:26 INFO StandaloneSchedulerBackend: Granted executor ID app-20240312110926-0002/1 on hostPort 172.20.0.4:42795 with 1 core(s), 1024.0 MiB RAM
24/03/12 12:09:26 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.55:58381 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.1.55, 58381, None)
24/03/12 12:09:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.55, 58381, None)
24/03/12 12:09:26 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.55, 58381, None)
24/03/12 12:09:26 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Eric Rodriguez
03/13/2024, 11:31 AM
Eric Rodriguez
03/13/2024, 9:41 PM
spark.sql.codegen.wholeStage set to false for Jupyter?
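For reference, a sketch of where that flag is typically set, assuming session-builder configuration (spark.sql.codegen.wholeStage is a standard Spark SQL conf key; the app name and master here are illustrative, and whether disabling whole-stage codegen is advisable depends on the workload):

```kotlin
import org.apache.spark.sql.SparkSession

// Sketch: disabling whole-stage codegen when building a session,
// e.g. for notebook use where generated-code compilation causes issues.
val spark = SparkSession.builder()
    .appName("notebook")
    .master("local[*]")
    .config("spark.sql.codegen.wholeStage", "false")
    .getOrCreate()
```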