# stdlib
h
Hi there, why would 2 different lists return the same hash?
import java.util.Arrays

println(listOf(3, 263).hashCode())
println(listOf(5, 201).hashCode())
// -> 1317

println(Arrays.asList(3, 263).hashCode())
println(Arrays.asList(5, 201).hashCode())
// -> 40830
It's a more general problem, as the Java example suffers from exactly the same issue. Interestingly, arrays with the same data give different hashes, as documented at https://github.com/holgerbrandl/krangl/issues/75
i
Given that the `hashCode` function converts an infinite range of input values to the limited range of `Int` values, there could certainly be two lists that produce the same hash code value.
If the question is about why these two particular lists produce the same hash code, it's because for two-element lists the hash code formula is specified as `((1 * 31) + element1.hashCode()) * 31 + element2.hashCode()`, and an `Int` value's hash code is the value itself. Thus `(1 * 31 + 3) * 31 + 263` is the same as `(1 * 31 + 5) * 31 + 201` (both are 1317).
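The arithmetic above can be checked directly in a few lines (a quick sketch; the constants are just the list elements from the question):

```kotlin
fun main() {
    // Standard List.hashCode() for two elements:
    // ((1 * 31) + e1.hashCode()) * 31 + e2.hashCode()
    val h1 = (1 * 31 + 3) * 31 + 263
    val h2 = (1 * 31 + 5) * 31 + 201
    println(h1) // 1317
    println(h2) // 1317
    // Matches the actual stdlib list hash
    println(listOf(3, 263).hashCode() == h1) // true
}
```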
d
I'm impressed you have a collision example. Nvm, just saw the implementation. 😅
Arrays don't really override `hashCode`, which is why they are different.
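A quick sketch of that difference: arrays inherit the identity-based `hashCode` from `Object`, so two equal-content arrays almost certainly hash differently, whereas `contentHashCode()` applies the same element-based formula that lists use:

```kotlin
fun main() {
    val a = intArrayOf(3, 263)
    val b = intArrayOf(3, 263)
    // Identity hash: two distinct instances, almost certainly different
    println(a.hashCode() == b.hashCode()) // almost certainly false
    // Content-based hash: same formula as List.hashCode()
    println(a.contentHashCode() == b.contentHashCode()) // true
    println(a.contentHashCode() == listOf(3, 263).hashCode()) // true
}
```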
g
Looks like a library bug. Even if those lists have the same hashCode, it doesn't mean they are equal; you cannot use hashCode to identify equal objects, only different ones. For example, check the Java HashMap implementation.
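A sketch of that contract in action: the two colliding lists from the question are not equal, and a `HashMap` still keeps them apart because after the hash-based bucket lookup it compares candidate keys with `equals`:

```kotlin
fun main() {
    val k1 = listOf(3, 263)
    val k2 = listOf(5, 201)
    check(k1.hashCode() == k2.hashCode()) // hash collision
    check(k1 != k2)                       // but not equal

    // HashMap resolves the collision via equals(), so both entries survive
    val map = hashMapOf(k1 to "a", k2 to "b")
    println(map.size) // 2
    println(map[k1])  // a
    println(map[k2])  // b
}
```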
k
Indeed, the javadoc is very clear about this. For example, `return 0` is a valid `hashCode` implementation.
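To illustrate with a hypothetical class (not from the thread): a constant hash code never violates the contract "equal objects have equal hash codes", it only degrades hash-based collections to linear scans within one bucket:

```kotlin
class Point(val x: Int, val y: Int) {
    override fun equals(other: Any?) = other is Point && x == other.x && y == other.y
    // Legal but slow: every instance lands in the same bucket
    override fun hashCode() = 0
}

fun main() {
    val set = hashSetOf(Point(1, 2), Point(1, 2), Point(3, 4))
    println(set.size) // 2: equals() still distinguishes the points
}
```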
h
Thanks gentlemen for your help. I now understand that I misused hashCode when I actually wanted to assess equality. So even hashing to `Long` would give rise to an infinite number of collisions.
So it seems that for my use case in krangl, where I want to group data frames by a couple of selector columns (like in SQL), there is no way to avoid working with the complete grouping tuples. I liked the simplification of using the hashes, but as you helped me understand, this approach was just wrong.
k
If I understand your use case, that's just `groupBy` from the stdlib.
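A sketch of that idea with a made-up row type (the `Row` class and column names are assumptions for illustration, not krangl's API): `groupBy` keys each row by the complete selector tuple, so no hashing shortcut is needed, and `Pair` already implements `equals`/`hashCode` correctly for use as a map key:

```kotlin
data class Row(val city: String, val year: Int, val value: Double)

fun main() {
    val rows = listOf(
        Row("Berlin", 2020, 1.0),
        Row("Berlin", 2020, 2.0),
        Row("Munich", 2021, 3.0),
    )
    // Group by the full (city, year) tuple, like GROUP BY in SQL
    val groups = rows.groupBy { it.city to it.year }
    // Aggregate each group, e.g. sum a column
    println(groups.mapValues { (_, g) -> g.sumOf { it.value } })
    // {(Berlin, 2020)=3.0, (Munich, 2021)=3.0}
}
```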
h
Indeed @karelpeeters, `groupBy` seems to be the way forward. My original motivation to reduce the data with `hashCode` was better performance (without much thinking or benchmarking).