# datascience
i
Hi, I'm using Spark MLlib with Kotlin but get an error:
fun main() {
    withSpark {
        dsOf(c(1, 2, 0.4f), c(2, 2, 0.5f)).map {
            ALS.Rating(it._1, it._2, it._3)
        }.show()
    }
}
Exception in thread "main" java.lang.ClassCastException: kotlin.reflect.jvm.internal.KTypeImpl cannot be cast to kotlin.jvm.internal.TypeReference
a
@Pasha Finkelshteyn Could you help with that? It seems like some kind of kotlin version misconfiguration.
@Icyrockton are you on Android by chance?
i
Thanks for the reply. Not on Android, just a simple Spark project. I want to train an ALS model with Spark MLlib. This is my Gradle file:
import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar
plugins {
    kotlin("jvm") version "1.6.20"
    id("com.github.johnrengelman.shadow") version "5.2.0"
}

group = "org.example"
version = "1.0-SNAPSHOT"

repositories {
    mavenCentral()
}

dependencies {
    implementation(kotlin("stdlib"))
    implementation("org.jetbrains.kotlinx.spark:kotlin-spark-api-3.2:1.0.3")
    implementation("org.apache.spark:spark-sql_2.12:3.2.0")
    implementation("org.apache.spark:spark-streaming_2.12:3.2.0")
    implementation("org.apache.spark:spark-streaming-kafka-0-10_2.12:2.4.8")
    implementation("org.apache.spark:spark-mllib_2.12:3.2.0")
}

tasks {
    build {
        dependsOn(shadowJar)
    }
}

tasks{
    named<ShadowJar>("shadowJar"){
        dependencies {
            exclude {
                it.moduleGroup == "org.apache.spark" || it.moduleGroup == "org.scala-lang"
            }
        }
    }
}
p
Huh, TBH I didn't try using Spark MLlib. We need to check if ALS.Rating is supported
a
It could be a problem with Shadow, especially as you are using an old version. Could you try it without Shadow?
Also, as far as I remember, build depends on shadowJar by default
and the stdlib dependency is not needed since 1.6
p
Well, the exception points to something with reflection
I also asked Jolan to take a look, it looks like he's not in this chat yet
a
Yes. I saw similar reports on some Android vendors. In theory Shadow could mess with reflection
i
It still doesn't work
plugins {
    kotlin("jvm") version "1.6.20"
}

group = "org.example"
version = "1.0-SNAPSHOT"

repositories {
    mavenCentral()
}

dependencies {
    implementation("org.jetbrains.kotlinx.spark:kotlin-spark-api-3.2:1.0.3")
    implementation("org.apache.spark:spark-sql_2.12:3.2.0")
    implementation("org.apache.spark:spark-streaming_2.12:3.2.0")
    implementation("org.apache.spark:spark-streaming-kafka-0-10_2.12:2.4.8")
    implementation("org.apache.spark:spark-mllib_2.12:3.2.0")
}
p
@Icyrockton could you please post the entire stacktrace?
a
Interesting. Could you please drop the full stack trace, or at least the first mention of kotlinx.spark classes
i
message has been deleted
just a few lines
a
erm.. it seems like a runner error, not a Spark error
What is there on line 97?
just click the link
a
What's written on line 97?
i
message has been deleted
it jumps to ApiV1.kt:196
a
OK, so definitely unsupported type in Kotlin for Apache Spark
a
Ok, some weird thing with reification.
a
There are lots of inline functions, and it looks like typeOf<> returns something incorrect
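To make the typeOf<> remark above concrete, here is a minimal, pure-Kotlin sketch (the function name `capturedType` is hypothetical, not actual kotlin-spark-api code) of the mechanism the library relies on: an inline function with a reified type parameter captures the full generic type as a KType at the call site, and the encoder machinery inspects that KType at runtime. The ClassCastException in the stack trace happens inside exactly this kind of reflection path.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical illustration: reified type parameters let us capture
// the caller's full generic type, including type arguments.
inline fun <reified T> capturedType(): KType = typeOf<T>()

fun main() {
    val t = capturedType<List<Int>>()
    // The captured KType keeps the generic argument (Int), which is
    // what an encoder resolver needs to build a matching serializer.
    println(t)
    println(t.arguments)
}
```

If anything in this chain returns an unexpected KType implementation, a cast like the one in the reported `KTypeImpl cannot be cast to TypeReference` error can blow up.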
a
The error is really strange. @Icyrockton could you report it in github?
i
report it to the kotlin-spark-api repository?
a
@asm0dey I agree, and it is probably either a compiler bug, or you are using unsafe reflection internals somewhere.
a
Please be aware that we do not officially support Spark MLlib yet
@altavir we live in kinda bleeding edge of reflection, so it can be anything
@Icyrockton yes please, on github
i
org.apache.spark.ml.recommendation.ALS <--- this Rating gets the error
org.apache.spark.mllib.recommendation.Rating <--- this one is ok
a
Interesting. ALS$$Rating looks absolutely legit to me, we should support it. The only thing I can think of: it's a Scala generic, not a Kotlin one, so it's possible that we can't infer the generic type, and it fails
Is using Rating instead of ALS$$Rating an option for you?
The difference is that org.apache.spark.mllib.recommendation.Rating is not generic
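The suggested workaround could look roughly like this hedged sketch, assuming the classic mllib `ALS.train(RDD, rank, iterations)` API; the rank/iteration values here are arbitrary example numbers and this is untested against this exact project setup:

```kotlin
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.Rating
import org.jetbrains.kotlinx.spark.api.withSpark

fun main() {
    withSpark {
        val jsc = JavaSparkContext(spark.sparkContext())
        // mllib's Rating is a plain, non-generic Scala class, so the
        // Kotlin encoder/reflection path that fails on ml's ALS.Rating
        // is never involved here.
        val ratings = jsc.parallelize(listOf(Rating(1, 2, 0.4), Rating(2, 2, 0.5)))
        // rank = 10, iterations = 5 are arbitrary example values
        val model = ALS.train(ratings.rdd(), 10, 5)
        println(model.predict(1, 2))
    }
}
```

Note this trains through the RDD-based mllib API rather than the Dataset-based ml one, which is exactly the trade-off discussed below.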
i
It seems to work. Hmm, ALS.train is the RDD version. How do I convert to a Dataset?
a
RDD support is in progress right now, you can see ongoing work on GitHub
Still, please report on GitHub 🙂
j
The issue was that we did support the encoding of Scala Products, but didn't notice that case classes are also Products. The fix is on its way on GitHub 🙂
ALS.Rating is a case class, so after this fix, it should work