#getting-started

Colton Idle

06/27/2023, 9:48 PM
If I want to track whether I encountered some object already with fast lookups, I'm inclined to write

`val mapOfExistence = mapOf<MyObject, Boolean>()`

but is there a way to just have

`val mapOfExistence = mapOf<MyObject>()`

which would just give me true/false depending on whether it exists in the map?

Vampire

06/27/2023, 9:49 PM
You mean `listOf<MyObject>()`? 🚫 1

Or `setOf<MyObject>()`? 👆 1

Youssef Shoaib [MOD]

06/27/2023, 9:50 PM
A map of objects to booleans is practically the definition of a set. That's because a set's `contains` is a function `(MyObject) -> Boolean`, which is the equivalent of the map's `get` with a default value of false.
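The set-vs-map equivalence described above can be sketched as follows (in Java, since the thread later digs into `java.util.HashMap`; the class and helper names are made up for illustration):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SetVsMap {
    // set membership behaves like a boolean-valued map lookup defaulting to false
    static boolean inSet(Set<String> seen, String key) {
        return seen.contains(key);
    }

    static boolean inMap(Map<String, Boolean> seen, String key) {
        return seen.getOrDefault(key, false);
    }

    public static void main(String[] args) {
        Set<String> seenSet = new HashSet<>();
        Map<String, Boolean> seenMap = new HashMap<>();
        seenSet.add("a");
        seenMap.put("a", true);
        // both styles answer "have I seen this already?" identically
        System.out.println(inSet(seenSet, "a") == inMap(seenMap, "a")); // true
        System.out.println(inSet(seenSet, "b") == inMap(seenMap, "b")); // true
    }
}
```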

Colton Idle

06/27/2023, 9:53 PM
wait. can I use a list?

no. I want constant time lookups... right. so a map is what I want.
And cool. a set seems like it should work. lemme give it a go

Joffrey

06/27/2023, 9:56 PM
Constant-time lookup of existence calls for a set

Colton Idle

06/27/2023, 10:01 PM
thank you all. brain fog is real.

Stephan Schröder

06/28/2023, 7:31 AM
TLDR: you can't get constant lookup, the best you can do is O(log n), which is close enough

Joffrey

06/28/2023, 8:03 AM

Stephan Schröder

06/29/2023, 9:35 AM

Joffrey

06/29/2023, 10:11 AM
Collisions depend on the hash function of course, but in practice it's considered constant time because there are extremely few collisions with the default hash functions. Definitely not O(log(n))!

Vampire

06/29/2023, 10:11 AM
The essence is, he wanted something that has the same lookup time as a `HashMap`, and that is a `HashSet`.
That hash code collisions lower the access time is of course correct. 🙂 If you have a well-distributed hash code, lookup is `O(1)`; if you have a badly distributed (i.e. constant) hash code, lookup is `O(n)`.

But all that is irrelevant anyway, as OP does not talk about `HashMap`, but about `mapOf()`, which happens to be implemented using `HashMap` on the JVM backend. But I don't think that is guaranteed. 😄
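The constant-hash worst case mentioned here can be demonstrated with a deliberately bad key type (a hypothetical sketch; the `Key` class and the constant `42` are made up for illustration):

```java
import java.util.HashSet;
import java.util.Set;

public class ConstantHashDemo {
    // hypothetical key type with the worst possible hash: every instance collides
    static final class Key {
        final int id;
        Key(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int hashCode() { return 42; } // constant: all keys share one bucket
    }

    static boolean containsAll(int n) {
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < n; i++) set.add(new Key(i));
        // still functionally correct, but each lookup scans the single colliding bucket: O(n)
        return set.size() == n && set.contains(new Key(n - 1));
    }

    public static void main(String[] args) {
        System.out.println(containsAll(1000)); // prints "true"
    }
}
```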

Stephan Schröder

06/29/2023, 11:19 AM

Vampire

06/29/2023, 11:57 AM
You are still assuming implementation details, which are not guaranteed.
But **if** we talk about the standard Java `HashMap` (and thus also `HashSet`, as it is implemented using a `HashMap`), even its JavaDoc states:

> This implementation provides constant-time performance for the basic operations (`get` and `put`), assuming the hash function disperses the elements properly among the buckets.

🙂

Joffrey

06/29/2023, 1:34 PM
> You are still assuming implementation details, which are not guaranteed
True 😄 but I'm ok to roll with hashmap/hashset for the sake of this discussion.

> the effective hash within the collection isn't the hash of the value but the hash of the value modulo number of slots/lists within the collection
Sure, but that doesn't mean the access cost scales with the number of elements, because the capacity is increased on demand in order to keep the size of those lists low. The only valid argument is about real hash collisions (not modulo collisions), because real hash collisions would keep lists long even through multiple capacity bumps (re-hashing wouldn't help in this case).

> how big is the chan[c]e that in a hashSet of capacity 8 (with a loadFactor of 0,76) 6 entries aren't colliding
That's not what big-O notation (and "constant time access") means. O(1) access time means that the access time doesn't grow indefinitely with the number of elements, and is instead bounded by some constant. So your analysis cannot just analyze one state; it should be measuring how "long" it takes to access an element when there are 10 elements in the map, and comparing it to how long it takes when there are 100, or 1000, or 10000 elements. If the access time stops growing with the size, it means it's bounded by a constant. Now, with the theoretical worst hash function (returning the same hash for any value), we do get an indefinitely growing list in a single bucket, independent of resizing. And it's effectively proportional to the number of elements, as it's effectively a linear search in a list. So that does mean `O(n)`.

> So my only argument is, that "constant access" isn't correct, but my "O(log n)" statement might well also be too pessimistic
The thing is, it depends on the hash function. But assuming a well-distributed hash function, it is correct to say `O(1)` access time.
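The distinction between modulo collisions (resolved by resizing) and real hash collisions can be illustrated with the bucket-index computation that `java.util.HashMap` effectively performs (a sketch; `bucket` is a made-up helper mirroring the spread-then-mask step, and the hash values are chosen for illustration):

```java
public class BucketDemo {
    // the spreading step java.util.HashMap applies before masking
    static int spread(int h) { return h ^ (h >>> 16); }

    // bucket index: spread hash masked by (power-of-two capacity - 1)
    static int bucket(int hash, int capacity) { return spread(hash) & (capacity - 1); }

    public static void main(String[] args) {
        int a = 0x5, b = 0x15; // distinct hashes differing only in bit 4
        // modulo collision at capacity 16: both land in bucket 5
        System.out.println(bucket(a, 16) == bucket(b, 16)); // true
        // after a capacity bump to 32, the extra mask bit separates them
        System.out.println(bucket(a, 32) == bucket(b, 32)); // false
    }
}
```

A real hash collision (identical 32-bit hashes) would keep colliding at every capacity, which is why only those collisions threaten the O(1) bound.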

Vampire

06/29/2023, 1:38 PM
...

Actually, no 😄

```java
/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower. Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.) So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

Joffrey

06/29/2023, 1:44 PM
"No" to what? 🙂

Vampire

06/29/2023, 1:50 PM
`O(1)`

As the hashes are not used as-is, you can get collisions, even if each instance has a different hash code and capacity is infinite.
So to reach `O(1)` you need not only a well-distributed hash function, but also one that stays in the range `0` - `(1 << 16) - 1`, i.e. `0` - `65535`.
So as soon as you have at least `65536` elements, `O(1)` is automatically busted with `java.util.Hash(Map/Set)`, no matter what you do. 🙂

Joffrey

06/29/2023, 2:15 PM
I'm not sure I entirely follow what you mean here. I mean, all the above discussion was theoretical, but in practice a hash is a 32-bit integer, so in any case we already had a limit of 2^32 elements if we don't want *any* collision (we can't have a different hash code for every value). With this particular implementation detail, it just means we will have collisions for sure starting at 2^16 instead of 2^32. But having collisions doesn't necessarily mean it's not `O(1)` anymore. It just means the constant is a bit bigger.

It could mean that it's not `O(1)` if the collisions followed some systematic pattern (and the distribution was thus uneven), and that's what the bit shift + XOR trick that you showed here is trying to avoid. Starting from a well-distributed 32-bit hash, using only half of the bits would create systematic collisions for some common patterns that do indeed heavily rely on only differing in half of the bits, so this bit trick is a mitigation of that.

Colton Idle

06/29/2023, 8:58 PM
I'm glad I asked my question. 😂

😂 2