Hi, what's the most efficient way to remove an ele...
# announcements
g
Hi, what's the most efficient way to remove an element from a mutable set of ByteArray??
Copy code
val b = "1".toByteArray()  
val s = mutableSetOf("1".toByteArray())
    
s.remove(b) // <== this does not work, probably because "1".toByteArray() != "1".toByteArray()

s.removeIf { b.contentEquals(it) } // <= this works
r
My immediate thought is that there’s no point using a
Set
with arrays; `Set`’s contract is to use
equals
and
hashCode
to ensure no duplicates, and on the JVM I’m 99% sure every array is unique and cannot be equal to another array.
g
You are right, my intention is more about the content than the object itself. The content should be unique (eg, an UUID converted to bytes)
t
What's happening here 😮 This code looks like it does many things different than, intended. Try explicitly typing it. Remember
ByteArray
does not implement collection.
ByteArray
does not overrides
equals
r
I’d suggest not storing the content in an array then, store it in a structure that implements equals & hashCode appropriately.
g
The intention is to emulate in-memory a Redis Key-Set <String, Bytes> - how would you do that?
t
You mean Map<String, ByteArray>? 🤔
g
The Redis Sets data type, so naively it was Map<String, Set<ByteArray>>, but I agree that it does not make much sense to have a set of bytearray
t
I guess you may just create:
Copy code
inline class Bytes(val content : ByteArray) {
    override fun equals(other : Bytes) = TODO()
}
Or somthing like that to make it work : )
Hmm... still it would be boxed inside set...
g
I did that, but it seems overly complex
Copy code
class InMemoryKeySetStorage() : KeySetStorage, Flushable {
    val storage = ConcurrentHashMap<String, MutableSet<Bytes>>()

    override suspend fun getSet(key: String): Set<ByteArray> {
        return getBytesPerKey(key).map { it.content }.toSet()
    }

    override suspend fun addToSet(key: String, value: ByteArray) {
        getBytesPerKey(key).add(Bytes(value))
    }

    override suspend fun removeFromSet(key: String, value: ByteArray) {
        getBytesPerKey(key).remove(Bytes(value))

        // clean key if now empty
        if (getSet(key).isEmpty()) storage.remove(key)
    }

    override fun flush() {
        storage.clear()
    }
    
    private fun getBytesPerKey(key: String): MutableSet<Bytes> {
        return storage[key]
            ?: run {
                val set = mutableSetOf<Bytes>()
                storage[key] = set
                set
            }
    }

    class Bytes(val content : ByteArray) {
        override fun equals(other: Any?): Boolean {
            if (this === other) return true
            if (javaClass != other?.javaClass) return false

            other as Bytes

            if (!content.contentEquals(other.content)) return false

            return true
        }

        override fun hashCode(): Int {
            return content.contentHashCode()
        }
    }
}
t
Seems boxing is necessary. I don't see custom Set/Map implementations that would allow to supply with custom equals/hashCode functions
if (getSet(key).isEmpty()) storage.remove(key)
requires synchronization
You use
ConcurrentHashMap
which suggest you want your data structure to be concurrent, but
mutableSetOf
returns non-concurrent set implementation.
Are you sure
run
provides any kind of synchronization?
In Java word, the Key-Set structure is actually called Multimap. You can find some concurrent Multimap implementations online.
If you care for performance of concurrent operations, you may also try implementing non-blocking concurrent hash triee multimap. I can't find any implementations online.
n
a multimap is a different data structure though?
t
@Nir From Key-Set? Maybe I don't understand what Key-Set is afterall...
n
I haven't heard it called "Key-Set" so I'm not sure I know what that is 🙂
but a set is very different from a multimap
multimaps are maps. they have both keys and values.
and they are multi. They allow multiple equivalent keys.
there's maps, sets, multimaps, and multisets
all different
t
Map<String, Set<Bytes>> ~= Multimap<String, Bytes>
n
Okay, so when you wrote "Key-Set", you meant a Map<Key, Set<Value>>
t
val storage = ConcurrentHashMap<String, MutableSet<Bytes>>()
n
the answer is, no, these are still not equivalent
they are similar, but not identical
For a String specifically they would be the same though
assuming that == is not customizable
t
Maybe SetMultimap?
n
well, still not really the same, they store the same data, but not the same complexity guarantees. There's no way to check in O(1) if there's a byte belonging to a string, in a Multimap<String, Bytes>
t
You mean byte or Bytes?
n
I guess Bytes 🙂 Let's just say Value
t
Well... Interfaces do not define complexity of they operation.
n
I don't know about define but typically interfaces mostly try to provide API they can perform efficiently, to discourage inefficiency
most languages linked lists don't provide a random access operator, for example
t
There are SetMultimap with contains implementation of complexity O(1)
n
I dunno what a SetMultimap is 🙂
But at any rate, just saying that practically, depending what you want to do, a Map<String, Set<Value>> is not equivalent to a Multiset<String, Value>
t
A Multimap that cannot hold duplicate key-value pairs.
From guava SetMultimap Javadoc
The only difference I see is that Map<String, Set<Value>> can hold empty set for some keys.
n
I see what you mean, sure, but that's not a regular multimap
t
This is
SetMultimap
n
Sure. All I said was:
Map<String, Set<Value>> is not equivalent to a MultiMap<String, Value>
t
Tomasz Krakowiak  [7:40 PM]
Maybe SetMultimap?
Nir  [7:40 PM]
well, still not really the same, they store the same data, but not the same complexity guarantees. There's no way to check in O(1) if there's a byte belonging to a string, in a Multimap<String, Bytes>
n
Yep, look which data structure I reference at the end 🙂 I'm responding to your previous comment
t
You caught me. I said Multimap when I should said SetMultimap/
n
Gotcha, I didn't know you meant to say SetMultimap the whole time.
If you had said that I wouldn't have said anything since I've never even heard that term before, whereas multimap is a fairly standard CS term