Could someone help me find the source code of hash...
# announcements
j
Could someone help me find the source code of hashCode for data classes in Kotlin?
I have a data class with about 150 fields defined in the primary constructor, and I think hashCode() returns different values for two objects that contain exactly the same data but live in different server environments.
k
Do you want the compiler code that generates it or the generated code?
j
I'm just interested in the exact formula that Kotlin uses when I call
myDataClassInstance.hashCode()
k
You can easily look at the generated code in IntelliJ: Tools > Kotlin > Show Kotlin Bytecode, then Decompile
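For reference, the decompiled hashCode of a data class on the JVM follows the usual multiply-by-31 pattern. The sketch below uses a hypothetical two-field class `Sample` and a hand-written `manualHash` (both names are illustrative, not from the thread) to show the shape of that formula. It assumes Kotlin/JVM.

```kotlin
// Hypothetical two-field data class, standing in for the real 150-field one.
data class Sample(val name: String, val count: Int?)

// Hand-written equivalent of the generated hashCode on the JVM:
// start from the first property's hash, then fold in each following
// property as result * 31 + hash, using 0 for a null property.
fun manualHash(s: Sample): Int {
    var result = s.name.hashCode()
    result = 31 * result + (s.count?.hashCode() ?: 0)
    return result
}

fun main() {
    val s = Sample("abc", 7)
    println(s.hashCode() == manualHash(s))
}
```

The result is therefore only as stable as the hashCode of each individual property.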
You don't happen to be using an array or something else that doesn't have the expected hashCode implementation?
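To illustrate the concern about arrays: calling hashCode() directly on a JVM array gives an identity-based value, while Kotlin's `contentHashCode()` is computed from the elements. A minimal sketch, with made-up sample values:

```kotlin
fun main() {
    val a = arrayOf("t1", "t2")
    val b = arrayOf("t1", "t2")

    // Identity-based: equal contents do not imply equal values here,
    // and the numbers may change from run to run.
    println(a.hashCode())
    println(b.hashCode())

    // Content-based: depends only on the elements.
    println(a.contentHashCode() == b.contentHashCode())
}
```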
j
The data class extends an abstract class with one abstract String val and one open var of an enum class. Other than that, my data class only has String and optional Int vals.
k
Let us know what you figure out!
j
Do you know if the hashCode of an enum class value is guaranteed to return the same value?
k
Yes it does. Ah wait it's between different JVMs? Then I'm not sure actually.
j
yes it's between JVMs
k
Looks like no in theory, although I don't know which classes change in practice. https://stackoverflow.com/questions/1516843/java-object-hashcode-result-constant-across-all-jvms-systems
d
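On the JVM, Enum.hashCode() is not the ordinal: java.lang.Enum inherits the identity hash from Object.hashCode(), so the value can differ between JVM runs.
Either way, your code should never rely on this implementation detail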
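On the JVM, java.lang.Enum.hashCode() simply delegates to Object.hashCode(), so an enum property contributes a per-process value to a data class hash. One possible workaround, sketched below with a hypothetical `Status` enum and helper `stableEnumHash`, is to derive the value from the constant's name instead, since String.hashCode() is specified by the Java API.

```kotlin
// Hypothetical enum, standing in for the one on the abstract base class.
enum class Status { ACTIVE, SUSPENDED }

// A value that depends only on the constant's name, not on the JVM instance.
fun stableEnumHash(value: Enum<*>): Int = value.name.hashCode()

fun main() {
    // Identity-based: may print a different number on every run.
    println(Status.ACTIVE.hashCode())
    // Name-based: repeatable across runs, but it changes if the constant is renamed.
    println(stableEnumHash(Status.ACTIVE))
}
```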
j
We have a service that delivers data to customers, along with all later changes to that data. We have been using data classes for this data and saving each instance's hashCode in a database; whenever the hashCode changes, we resend the data. This has been running for years, but now the hashCode generated on the server is somehow not the same as the one generated locally.
Decision has been made and we will do a new version of this that doesn't rely on hashCode 🙂
Anyway, thanks for all the help
Do you guys think it would be wrong to do md5(data.toString())?
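A minimal sketch of that idea, using the JDK's `MessageDigest`. The helper `digestHex` and the `Customer` class are hypothetical names chosen for illustration; the algorithm name is passed straight to `MessageDigest.getInstance`.

```kotlin
import java.security.MessageDigest

// Hex digest of a string's UTF-8 bytes, for an algorithm such as "MD5" or "SHA-1".
fun digestHex(algorithm: String, text: String): String =
    MessageDigest.getInstance(algorithm)
        .digest(text.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }

// Hypothetical data class used only for the demonstration.
data class Customer(val key: String, val name: String, val age: Int?)

fun main() {
    val data = Customer("k1", "Ada", null)
    // The generated toString() looks like: Customer(key=k1, name=Ada, age=null)
    println(digestHex("MD5", data.toString()))
}
```

One thing to keep in mind with this approach: the generated toString() includes the class name and property names, so renaming either changes the digest even when the data is identical.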
d
For this purpose, I would use SHA-1. MD5 is not good enough: practical collisions are known for it. SHA-1 can also be broken, but that still requires a very large amount of computing power. SHA-1 produces a 20-byte digest. You could use something like this:
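import java.nio.ByteBuffer
import java.security.MessageDigest

// Simple holder for the 20-byte SHA-1 digest.
class Hash(val bytes: ByteArray)

// Not thread-safe: do not share an instance between threads.
class DataHashCtx {
    private val md = MessageDigest.getInstance("SHA-1")
    private val bb = ByteBuffer.allocate(Int.SIZE_BYTES)

    fun reset() {
        md.reset()
        bb.clear()
    }

    fun update(value: Int) {
        // Write the Int at position 0 and feed its 4 bytes to the digest.
        bb.putInt(0, value)
        md.update(bb.array(), 0, Int.SIZE_BYTES)
    }

    fun update(value: String) {
        // String.hashCode() is specified by the API and unchanged since JDK 1.2.
        // Note that it reduces the string to 32 bits before SHA-1 sees it;
        // feeding value.toByteArray(Charsets.UTF_8) would keep the full content.
        update(value.hashCode())
    }

    fun hash(): Hash {
        // digest() returns the 20-byte SHA-1 value and resets the MessageDigest.
        return Hash(md.digest())
    }
}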
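// Field names and types are placeholders for the real properties.
data class MyData(
    val dataItem1: String,
    val dataItem2: String,
    val dataItem3: String,
    val dataItem4: Int,
    val dataItem5: Int
) {
    fun dataHash(ctx: DataHashCtx): Hash {
        ctx.reset()
        ctx.update(this.dataItem1)
        ctx.update(this.dataItem2)
        ctx.update(this.dataItem3)
        ctx.update(this.dataItem4)
        ctx.update(this.dataItem5)
        return ctx.hash()
    }
}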
But for an initial implementation you could well use data.toString().
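A variation on the same idea, as a sketch only: instead of feeding each string's 32-bit hashCode to SHA-1, feed its UTF-8 bytes, so the digest depends on the full content. The class name `FieldDigest`, the length prefix for strings, and the presence byte for nullable values are all assumptions made for this example, not part of the suggestion above.

```kotlin
import java.nio.ByteBuffer
import java.security.MessageDigest

// Not thread-safe, like any MessageDigest wrapper.
class FieldDigest {
    private val md = MessageDigest.getInstance("SHA-1")
    private val intBuf = ByteBuffer.allocate(Int.SIZE_BYTES)

    fun add(value: Int): FieldDigest {
        intBuf.putInt(0, value)
        md.update(intBuf.array(), 0, Int.SIZE_BYTES)
        return this
    }

    // A presence byte keeps null distinct from any real Int value.
    fun addOptional(value: Int?): FieldDigest {
        md.update((if (value == null) 0 else 1).toByte())
        if (value != null) add(value)
        return this
    }

    // The length prefix keeps ("ab", "c") distinct from ("a", "bc").
    fun add(value: String): FieldDigest {
        val bytes = value.toByteArray(Charsets.UTF_8)
        add(bytes.size)
        md.update(bytes)
        return this
    }

    // Returns the 20-byte digest and resets the underlying MessageDigest.
    fun finish(): ByteArray = md.digest()
}

fun main() {
    val digest = FieldDigest().add("some name").addOptional(null).add(42).finish()
    println(digest.joinToString("") { "%02x".format(it) })
}
```

Because the fields are added explicitly, the digest does not change when a property or the class is renamed, only when the data or the order of the calls changes.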
a
Personally, I'm not sure relying on hashes being different is the best approach. I guess it depends on how bad it is if you do get a collision.
j
We are using keys alongside these hashes, so collisions aren't really a concern for us. We have been using hashCode and, I just checked, we did have a few collisions there, but since we use a key alongside the hash, it only needs to not collide for the same key. So for us, going from 2^32 to 2^160 possible values should be just fine 🙂
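As a rough check on those numbers, the birthday approximation gives the expected number of colliding pairs among n hashes as about n(n-1)/2 divided by 2^bits. The sketch below assumes uniformly distributed hashes and a made-up record count of 100,000; a Kotlin Int hashCode has 32 bits and a SHA-1 digest has 160.

```kotlin
import kotlin.math.pow

// Expected number of colliding pairs among n values drawn uniformly
// from 2^bits possibilities (birthday approximation).
fun expectedCollidingPairs(n: Long, bits: Int): Double =
    n.toDouble() * (n - 1).toDouble() / 2.0 / 2.0.pow(bits)

fun main() {
    val records = 100_000L  // hypothetical record count
    println(expectedCollidingPairs(records, 32))   // on the order of one pair
    println(expectedCollidingPairs(records, 160))  // vanishingly small
}
```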