Thread
#stdlib
    h

    holgerbrandl

    3 years ago
    Hi there, why would 2 different lists return the same hash?
    println(listOf(3, 263).hashCode())
     println(listOf(5, 201).hashCode())
    // -> 1317
    
    println(Arrays.asList(3, 263).hashCode())
    println(Arrays.asList(5, 201).hashCode())
    // --> 40830
    It's a more general problem as also the java example suffers from exactly the same problem. Interestingly, arrays with the same data give differnt hashes as documented at https://github.com/holgerbrandl/krangl/issues/75
    ilya.gorbunov

    ilya.gorbunov

    3 years ago
    Given that
    hashCode
    function converts an infinite range of input values to the limited range of
    Int
    values, there could certainly be two lists that produce the same hash code value.
    If the question is about why these two particular lists produce the same hash code, it's because for two-element lists the hash code formula is specified as
    ((1 * 31) + element1.hashCode) * 31) + element2.hashCode
    , and an Int value's hash code is that value itself. Thus
    (1 * 31 + 3)*31 + 263
    is the same as
    (1 * 31 + 5)*31 + 201
    Dominaezzz

    Dominaezzz

    3 years ago
    I'm impressed you have a collision example. Nvm, just saw the implementation. 😅
    Arrays don't really override
    hashCode
    , which is why they are different.
    gildor

    gildor

    3 years ago
    Looks like a library bug, even if those lists has the same hashCode it doesn’t mean that they are equal, you cannot use hashCode to identify equal object, only different ones For example check Java HashMap implementation
    karelpeeters

    karelpeeters

    3 years ago
    Indeed, the javadoc is very clear about this. For example,
    return 0
    is a valid
    hashcode
    implementation.
    h

    holgerbrandl

    3 years ago
    Thanks gentleman for your help. I now understand that I misused hashCode when I actually wanted to assess equality. So even hashing to
    Long
    would give rise to an infinite number of collisions.
    So it seems for my usecase in krangl where I want to group data-frames by a couple of selector columns (like in sql), there is no way to avoid working with the complete grouping tuples. I liked the simplification when using the hashes, but as you helped me to understand this approach was just wrong.
    karelpeeters

    karelpeeters

    3 years ago
    If I understand your use case that's just
    groupBy
    from the stdlib.
    h

    holgerbrandl

    3 years ago
    Indeed @karelpeeters
    groupBy
    seems to be the way forward. My original motivation to reduce the data with
    hashCode
    was better performance (without much thinking or benchmarking).