I implemented the solution to day 7 (file system d...
# advent-of-code
l
I implemented the solution to day 7 (file system directory size calculation) recursively. Interestingly enough, a recursive
data class Directory(val directories: MutableList<Directory>, ...)
works but if I use
MutableSet
instead then it crashes with a stack overflow (although the filesystem tree in the input is quite shallow, 10 hierarchy levels tops).
The crash seems to be because
AbstractSet.hashCode()
also recursively calls
hashCode()
of each of the set's elements.
e
mutating elements of a set is bad practice, regardless
https://docs.oracle.com/javase/8/docs/api/java/util/Set.html
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects
equals
comparisons while the object is an element in the set.
Kotlin collections are just Java collections so all the same contracts apply
l
I now think that the stack overflow crash was actually caused because the directory contained a reference to its parent directory.
data class Directory(val parent: Directory?, val directories: MutableSet<Directory>, ...)
So
hashCode()
tried to calculated the hashcode of the parent, which again called the child
hashCode()
and recursion became endless
e
yes, but also if you're mutating elements of a set you are breaking its invariants and may get unexpected behavior like crashes or iteration that misses elements
l
Although I didn't have the same crash with mutable list
e
yes, because its layout is linear regardless of the contents of the elements, unlike a set
l
But the doc you mention says about "mutable objects as Set elements", not about the Set itself being mutable (i.e. adding/removing to it)
Or is it the same?
e
imagine a
Copy code
class HashSet<T> {
    private val contents = arrayOfNulls<T>(size / loadFactor)
    fun contains(element: T): Boolean =
        contents[element.hashCode().mod(contents.size)] == element
}
l
And if it's true, why is there a MutableSet at all?
e
if you mutate an element, you change its hashCode and equals, without the set knowing
leading to broken behavior
it mentions, as a special case, that sets cannot contain themselves
that is not all of the verboten behavior
p
Lol do english speakers use the word "verboten"?
l
I don't understand the code example you posted
And even Java Set can be mutated - it has
add()
and
remove()
methods. So I'm not sure I understand your remark about dangers of sets being mutable.
e
you can mutate a set - just not the elements in a set while they are members
l
But I don't mutate the elements
I just add them
🤦
Of course, they contain another MutableSets of directories
Now I get it, thanks!
e
if you create a cycle, your definitely mutated something in a set
but mostly: the fact that the elements are mutable in the first place is a trap
l
I should probably solve the directory tree construction differently then
e
this is why Python has a distinction between hashable (immutable) and mutable (non-hashable) types
and immediately rejects adding mutable values to a set (or using them as map keys)
too late to retrofit into the Java/Kotlin type system though
l
Thank you for the patient explanations!
e
sure. hope this explanation saves you from other problems in the future
l
Most definitely, that's why I do the Advent of Code, because later at work I have immediately a red light when seeing this kind of problems. Best way of learning is to fail in a safe environment 😉
Actually I asked ChatGPT to give me an example when it might cause trouble storing mutable objects in a set, and it came up with quite a nice one (obviously with a big error in a critical place, but the idea was pretty clear): Suppose you have a class
Person
containing a mutable variable
name
, where
equals()
and
hashCode()
are based on the
name
- pretty much a
data class Person(var name: String)
. Now create
Copy code
val person1 = Person("Alice")
val person2 = Person("John")
and add them to a
MutableSet
. If you now ask the set about its size, it will obviously print
2
. Now rename
person2
from "John" to "Alice", so that they become equal in the sense of the
equals()
method. Obviously a set should not contain two equal objects. Yet if you ask the set about its size now, it will still print
2
. When you ask the set to print all elements' names, it will print
Alice Alice
. I suppose that's because both elements are already stored in different hash buckets in the set, because their hashes were calculated at the moment of adding them to the set - and are not being recalculated when the objects are modified.
BTW for those curious: ChatGPT got the main idea of the example right, only its suggestion was to change the name from "John" to "Johnny", which didn't make any sense, because then the two objects would still be different. However it corrected itself after I pointed out this mistake 😉