Load populates the map from a file on disk that contains hash values calculated on previous runs of the program. The remaining hashes are calculated lazily, as needed, so I get quick results. I'm looking for duplicate files, so unless two files have the same size there's no reason to calculate a hash at all.
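For context, here is a minimal sketch of that size-first strategy. The SHA-256 helper, the path-keyed `MutableMap`, and the function names are my assumptions for illustration, not the real implementation:

```kotlin
import java.io.File
import java.security.MessageDigest

// Group by size first (cheap), then hash only the size collisions,
// pulling from the cache when a hash was computed on a previous run.
fun findDuplicates(files: List<File>, cache: MutableMap<String, String>): List<List<File>> =
    files.groupBy { it.length() }                       // cheap pass: group by size
        .values
        .filter { it.size > 1 }                         // only size collisions need hashing
        .flatMap { group ->
            group.groupBy { file ->
                cache.getOrPut(file.path) { sha256(file) }  // lazy, cached hash
            }.values
        }
        .filter { it.size > 1 }                         // keep only true duplicate groups

// Reads the whole file for brevity; a streaming digest would suit large files better.
fun sha256(file: File): String =
    MessageDigest.getInstance("SHA-256")
        .digest(file.readBytes())
        .joinToString("") { "%02x".format(it) }
```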
Unfortunately this means the entire operation sits at the edge and cannot be considered pure. Once I've done a pass through the analysis with a known set of files, I can stop worrying about side effects. If a file's hash is updated, I would feed that back as a bulk operation by merging it into the ArchiveHashCache value.
The question is: should I have a snapshot (i.e. immutable) version of ArchiveHashCache to pass around between operations, then accumulate and merge new hashes as above? Is there a good alternative?
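To make the question concrete, here is a rough sketch of the snapshot-and-merge shape I have in mind. The `analyze` function, its signature, and the path-keyed map are illustrative assumptions; `sha256` is the helper from the sketch above:

```kotlin
import java.io.File

// Immutable snapshot: merging returns a new value instead of mutating.
data class ArchiveHashCache(val hashes: Map<String, String>) {
    fun merge(newHashes: Map<String, String>): ArchiveHashCache =
        ArchiveHashCache(hashes + newHashes)
}

// The analysis reads the snapshot and reports any hashes it had to compute,
// so the caller can fold them back in afterwards in one bulk merge.
fun analyze(files: List<File>, snapshot: ArchiveHashCache): Map<String, String> {
    val computed = mutableMapOf<String, String>()
    for (file in files) {
        if (file.path !in snapshot.hashes) {
            computed[file.path] = sha256(file)   // cache miss: the only side effect
        }
    }
    return computed
}

// Usage at the edge:
// val computed = analyze(files, snapshot)
// val nextSnapshot = snapshot.merge(computed)
```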
My current implementation has a FileReference data class with a lazy hash property that delegates to the cache and recalculates from the physical File on a cache miss.
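Roughly, with the caching details assumed (the shared `MutableMap` and path keys are illustrative, not the actual cache API):

```kotlin
import java.io.File

// Rough reconstruction of the current design: the lazy property consults the
// shared cache first and only hashes the physical file on a miss.
data class FileReference(val file: File, private val cache: MutableMap<String, String>) {
    val hash: String by lazy {
        cache.getOrPut(file.path) { sha256(file) }  // hit: cached value; miss: hash and store
    }
}
```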