What's the nicest and most efficient way to take t...
# getting-started
d
What's the nicest and most efficient way to take two lists, one of
data class Foo(val id: String, ...)
and one of
data class Baz(val id: String, ...)
and output a map of the id against a pair of Foo and Baz that correspond
Map<String, Pair<Foo?, Baz?>>
where either could not exist?
l
Can you make any guarantees? Will there always be a matching bazList id for each fooList id? What should happen if not? Are the lists the same size? What should happen if two items in fooList have the same id?
d
No sometimes there'll be an entry in bazList not in fooList and sometimes vice versa...
s
Copy code
val foosById = foos.associateBy { it.id }
val bazsById = bazs.associateBy { it.id }
val pairs = (foosById.keys + bazsById.keys).associateWith { foosById[it] to bazsById[it] }
d
One of the lists is going to be bigger most of the time, but one could have about 50-200 entries or more, the other could be from 1 to 100 or more... @Sam That looks pretty nice! I just wonder if there would be a way to save a bit on the intermediary allocations and steps... this api could have tons of requests... and for each one needs to process such a list.
l
If you want fewer allocations, try this:
Copy code
buildMap {
            val list1 = listOf<Foo>()
            val list2 = listOf<Bar>()

            list1.forEach {
                if(get(it.id) == null) put(it.id, it to null)
                else put(it.id, it to get(it.id).second)
            }

            list2.forEach {
                if(get(it.id) == null) put(it.id, null to it)
                else put(it.id, get(it.id).first to it)
            }
        }
except take list1, list2 as params instead of creating them.
You can make it more efficient by creating a mutable class that holds a Foo/Bar pair instead of using the immutable pair.
d
Doesn't that put the entries that have both not null twice?
l
If both are not null, then when it gets to list2, get(it.id) will return not null.
Specifically, get(it.id) will yield a (Foo to null)
d
get(it.id) will return not null.
It will go to the not null branch which adds it again...
by creating a mutable class
won't help because I need one pair for each combination to be saved in the final map, I just want to save on the intermediaries
l
buildMap builds a HashMap by default, I believe. A call to put(key, value) will update the existing value if there is one already.
d
Which is less efficient than just not trying to re-add it, and then won't create an extra pair for nothing...
e
Copy code
class MutablePair<T, U>(var first: T, var second: U>
buildMap<String, MutablePair<Foo?, Bar?>> {
    for (foo in foos) getOrPut(foo.id) { MutablePair(null, null) }.first = foo
   for (bar in bars) getOrPut(bar.id) { MutablePair(null, null) }.second = bar
}
avoids extra temporaries, at a cost of exposed mutability
👍🏼 1
l
You could try to only add once, at the cost of more iterations, since you’d have to go through once and find where there’s both ids (O(n^2)), then again for only foo (O(n)), then again for only bar (O(n)).
e
but I wouldn't worry about that without measuring
l
If you create the MutablePair above and combine it with the buildMap solution, it’s O(n). If you want to expose immutable pairs, you can add a map phase to map it to Pair, with a O(n) cost. For large lists, O(n^2+2n) is less efficient than O(3n).
d
For large lists, O(n^2+2n) is less efficient than O(3n).
So if it's only a few hundred entries per run, then O(n^2+2n), might be better? Or is that a lot?
e
unknown constants affect the comparison
l
Big O is most useful for looking at large scale, but for a few hundred items, I’d expect it to hold. Best way to find out is measurements.
e
GC usage differs and doesn't show up in traditional complexity analysis
👍 1
l
Try a best case, worst case, and average. Best case is where there’s only pairs or only singles. Not sure what worst case is in this case, but it may vary by the approach.
I’m sure kotlinx.benchmark will be helpful for getting a good answer.
👌🏼 1
e
it can still be pretty hard to tease apart in microbenchmarks, so if you don't have some way to perform in-situ performance tests, I would go with whichever is reasonably performant that you feel comfortable maintaining
👍🏼 1
👍 1
d
Thanks! I guess I'll try one, and see how it does in production for now, maybe just deploying it to a portion of our users in the beginning...
l
Just for completion, here’s some smaple code that should build a map without putting any ids twice
Copy code
buildMap {
            val list1 = listOf<Foo>()
            val list2 = listOf<Bar>()

            list1.forEach { item1 ->
                list2.forEach { item2 ->
                    if(item1.id == item2.id)  put(item1.id, item1 to item2)
                }
            }
            
            list1.forEach {
                if(get(it.id) == null) put(it.id, it to null)
            }

            list2.forEach {
                if(get(it.id) == null) put(it.id, null to it)
            }
        }
this solution has the downside of not handling duplicate ids in the same list well. How you sanitize the inputs will affect how much of a big deal this is.
👍🏼 1
d
One list for sure doesn't have any duplicates, the other one might, so I do this on it first... (NOT optimized either...):
Copy code
list1..groupBy { it.id }
        .mapValues { it.value.maxBy { entry -> entry.version } }
because I only need the highest version for that id... the other option (maybe better) could be to make the result
Map<String, Pair<Foo, List<Baz>>
and let my resolvers decide what to output from each of those map entries and one step would be comparing those versions... That was my goal. to have some processors process each entry they might care about and output an end result for it.
Because I have a bunch of criteria to decide what to finally output for each of those ids based on the values of that map