What's the nicest and most efficient way to take two lists o kotlinlang #getting-started

dave08

01/04/2023, 4:05 PM

What's the nicest and most efficient way to take two lists, one of

data class Foo(val id: String, ...)

and one of

data class Baz(val id: String, ...)

and output a map of the id against a pair of Foo and Baz that correspond

Map<String, Pair<Foo?, Baz?>>

where either could not exist?

Landry Norris

01/04/2023, 4:07 PM

Can you make any guarantees? Will there always be a matching bazList id for each fooList id? What should happen if not? Are the lists the same size? What should happen if two items in fooList have the same id?

dave08

01/04/2023, 4:09 PM

No, sometimes there'll be an entry in bazList that isn't in fooList, and sometimes vice versa...

Sam

01/04/2023, 4:09 PM

val foosById = foos.associateBy { it.id }
val bazsById = bazs.associateBy { it.id }
val pairs = (foosById.keys + bazsById.keys).associateWith { foosById[it] to bazsById[it] }
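
For reference, a self-contained version of the snippet above might look like the sketch below. The `label` field stands in for the elided `...` in the original data classes, and the function name `pairById` is hypothetical; neither comes from the thread.

```kotlin
// Hypothetical stand-ins for the data classes in the question;
// the `label` field replaces the elided "..." and is an assumption.
data class Foo(val id: String, val label: String = "")
data class Baz(val id: String, val label: String = "")

// Hypothetical helper name. Indexes both lists by id, then looks up every
// id from either side, so a missing entry shows up as null in the pair.
fun pairById(foos: List<Foo>, bazs: List<Baz>): Map<String, Pair<Foo?, Baz?>> {
    val foosById = foos.associateBy { it.id }
    val bazsById = bazs.associateBy { it.id }
    return (foosById.keys + bazsById.keys).associateWith { foosById[it] to bazsById[it] }
}
```

Note that `associateBy` keeps the last element when several share an id, so duplicates within one list are silently collapsed.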

dave08

01/04/2023, 4:12 PM

One of the lists is going to be bigger most of the time: one could have about 50-200 entries or more, the other anywhere from 1 to 100 or more... @Sam That looks pretty nice! I just wonder if there's a way to save a bit on the intermediate allocations and steps... this API could get tons of requests, and each one needs to process such a list.

Landry Norris

01/04/2023, 4:13 PM

If you want fewer allocations, try this:

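buildMap<String, Pair<Foo?, Bar?>> {
    val list1 = listOf<Foo>()
    val list2 = listOf<Bar>()

    list1.forEach {
        // New id: store the Foo alone. Known id: replace the Foo, keep the stored Bar.
        if (get(it.id) == null) put(it.id, it to null)
        else put(it.id, it to get(it.id)?.second)
    }

    list2.forEach {
        // New id: store the Bar alone. Known id: keep the stored Foo, attach this Bar.
        if (get(it.id) == null) put(it.id, null to it)
        else put(it.id, get(it.id)?.first to it)
    }
}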

except take list1, list2 as params instead of creating them.
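
A minimal sketch of that parameterized form, assuming `Foo` and `Bar` carry only an `id` and using the hypothetical name `mergeById`. The two branches of each `if`/`else` collapse into a single `put`, because a null-safe lookup already yields null when there is no entry for the id.

```kotlin
// Hypothetical stand-ins; the real classes have more fields than `id`.
data class Foo(val id: String)
data class Bar(val id: String)

// Hypothetical name. Same idea as the snippet above, with the lists as parameters.
fun mergeById(list1: List<Foo>, list2: List<Bar>): Map<String, Pair<Foo?, Bar?>> =
    buildMap<String, Pair<Foo?, Bar?>> {
        // After this loop every Foo id maps to (foo, null).
        for (foo in list1) put(foo.id, foo to null)
        // Reuse the Foo stored under the same id, if there is one.
        for (bar in list2) put(bar.id, get(bar.id)?.first to bar)
    }
```

An id present in both lists still gets two `Pair` allocations here (one per loop), which is the cost being discussed in the messages that follow.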

Landry Norris

01/04/2023, 4:15 PM

You can make it more efficient by creating a mutable class that holds a Foo/Bar pair instead of using the immutable pair.

dave08

01/04/2023, 4:15 PM

Doesn't that put the entries that have both not null twice?

Landry Norris

01/04/2023, 4:16 PM

If both are not null, then when it gets to list2, get(it.id) will return not null.

Landry Norris

01/04/2023, 4:16 PM

Specifically, get(it.id) will yield a (Foo to null)

dave08

01/04/2023, 4:19 PM

get(it.id) will return not null.

It will go to the not null branch which adds it again...

by creating a mutable class

won't help because I need one pair for each combination to be saved in the final map, I just want to save on the intermediaries

Landry Norris

01/04/2023, 4:20 PM

buildMap builds a HashMap by default, I believe. A call to put(key, value) will update the existing value if there is one already.

dave08

01/04/2023, 4:21 PM

Which is less efficient than just not trying to re-add it, and then won't create an extra pair for nothing...

ephemient

01/04/2023, 4:21 PM

class MutablePair<T, U>(var first: T, var second: U)
buildMap<String, MutablePair<Foo?, Bar?>> {
    for (foo in foos) getOrPut(foo.id) { MutablePair(null, null) }.first = foo
    for (bar in bars) getOrPut(bar.id) { MutablePair(null, null) }.second = bar
}

avoids extra temporaries, at a cost of exposed mutability

👍🏼 1

Landry Norris

01/04/2023, 4:22 PM

You could try to only add once, at the cost of more iterations, since you’d have to go through once and find where there’s both ids (O(n^2)), then again for only foo (O(n)), then again for only bar (O(n)).

ephemient

01/04/2023, 4:22 PM

but I wouldn't worry about that without measuring

Landry Norris

01/04/2023, 4:25 PM

If you create the MutablePair above and combine it with the buildMap solution, it’s O(n). If you want to expose immutable pairs, you can add a map phase to map it to Pair, with a O(n) cost. For large lists, O(n^2+2n) is less efficient than O(3n).
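
The combination described here could be sketched as follows: mutable holders while building, then one extra pass to expose immutable pairs. The function name `mergeToPairs` and the single-field `Foo`/`Bar` classes are assumptions for illustration.

```kotlin
// Hypothetical stand-ins; the real classes have more fields than `id`.
data class Foo(val id: String)
data class Bar(val id: String)

// Mutable holder used only while the map is being built.
class MutablePair<T, U>(var first: T, var second: U)

// Hypothetical name. One pass over each list fills the holders in place,
// then a final pass copies them into immutable Pairs.
fun mergeToPairs(foos: List<Foo>, bars: List<Bar>): Map<String, Pair<Foo?, Bar?>> {
    val holders = HashMap<String, MutablePair<Foo?, Bar?>>()
    for (foo in foos) holders.getOrPut(foo.id) { MutablePair<Foo?, Bar?>(null, null) }.first = foo
    for (bar in bars) holders.getOrPut(bar.id) { MutablePair<Foo?, Bar?>(null, null) }.second = bar
    return holders.mapValues { (_, holder) -> holder.first to holder.second }
}
```

This trades the per-update `Pair` allocations for one holder plus one final `Pair` per id, and a second map from `mapValues`. Whether that is a net win is exactly the kind of thing that needs measuring.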

dave08

01/04/2023, 4:27 PM

For large lists, O(n^2+2n) is less efficient than O(3n).

So if it's only a few hundred entries per run, then O(n^2+2n) might be better? Or is that a lot?

ephemient

01/04/2023, 4:29 PM

unknown constants affect the comparison

Landry Norris

01/04/2023, 4:29 PM

Big O is most useful for looking at large scale, but for a few hundred items, I’d expect it to hold. Best way to find out is measurements.

ephemient

01/04/2023, 4:29 PM

GC usage differs and doesn't show up in traditional complexity analysis

👍 1

Landry Norris

01/04/2023, 4:30 PM

Try a best case, worst case, and average. Best case is where there’s only pairs or only singles. Not sure what worst case is in this case, but it may vary by the approach.

Landry Norris

01/04/2023, 4:31 PM

I’m sure kotlinx.benchmark will be helpful for getting a good answer.

👌🏼 1

ephemient

01/04/2023, 4:33 PM

it can still be pretty hard to tease apart in microbenchmarks, so if you don't have some way to perform in-situ performance tests, I would go with whichever is reasonably performant that you feel comfortable maintaining
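
As a rough first look before setting up a proper benchmark, a crude timing harness along these lines can compare two of the approaches from this thread. This is only a sketch: it uses `measureNanoTime` from the standard library rather than kotlinx.benchmark, the function names and iteration counts are made up, and JIT and GC effects can easily distort the numbers.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical stand-ins; the real classes have more fields than `id`.
data class Foo(val id: String)
data class Bar(val id: String)

// Approach 1: index both lists, then join on the union of keys.
fun viaAssociate(foos: List<Foo>, bars: List<Bar>): Map<String, Pair<Foo?, Bar?>> {
    val foosById = foos.associateBy { it.id }
    val barsById = bars.associateBy { it.id }
    return (foosById.keys + barsById.keys).associateWith { foosById[it] to barsById[it] }
}

// Approach 2: fill a single map in two passes.
fun viaBuildMap(foos: List<Foo>, bars: List<Bar>): Map<String, Pair<Foo?, Bar?>> =
    buildMap<String, Pair<Foo?, Bar?>> {
        for (foo in foos) put(foo.id, foo to null)
        for (bar in bars) put(bar.id, get(bar.id)?.first to bar)
    }

// Average nanoseconds per call after a warm-up phase. The counts are arbitrary.
fun averageNanos(warmup: Int = 2_000, runs: Int = 10_000, block: () -> Unit): Long {
    repeat(warmup) { block() }
    return measureNanoTime { repeat(runs) { block() } } / runs
}

// Sample input in the size range mentioned earlier: 200 Foos, 100 Bars, 50 shared ids.
val sampleFoos = List(200) { Foo("id$it") }
val sampleBars = List(100) { Bar("id${it + 150}") }
```

Calling `averageNanos { viaAssociate(sampleFoos, sampleBars) }` and the same for `viaBuildMap` gives two numbers to compare, but treat them as a hint only and confirm with realistic data in the actual service.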

👍🏼 1

👍 1

dave08

01/04/2023, 4:35 PM

Thanks! I guess I'll try one, and see how it does in production for now, maybe just deploying it to a portion of our users in the beginning...

Landry Norris

01/04/2023, 4:38 PM

Just for completeness, here’s some sample code that should build a map without putting any ids twice

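buildMap<String, Pair<Foo?, Bar?>> {
    val list1 = listOf<Foo>()
    val list2 = listOf<Bar>()

    // Ids present in both lists: compares every Foo with every Bar, O(n^2).
    list1.forEach { item1 ->
        list2.forEach { item2 ->
            if (item1.id == item2.id) put(item1.id, item1 to item2)
        }
    }

    // Ids that only appear in list1.
    list1.forEach {
        if (get(it.id) == null) put(it.id, it to null)
    }

    // Ids that only appear in list2.
    list2.forEach {
        if (get(it.id) == null) put(it.id, null to it)
    }
}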

this solution has the downside of not handling duplicate ids in the same list well. How you sanitize the inputs will affect how much of a big deal this is.

👍🏼 1

dave08

01/04/2023, 4:45 PM

One list for sure doesn't have any duplicates, the other one might, so I do this on it first... (NOT optimized either...):

list1.groupBy { it.id }
    .mapValues { it.value.maxBy { entry -> entry.version } }

because I only need the highest version for that id... the other option (maybe better) could be to make the result

Map<String, Pair<Foo, List<Baz>>>

and let my resolvers decide what to output from each of those map entries, and one step would be comparing those versions... That was my goal: to have some processors process each entry they might care about and output an end result for it.
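
The de-duplication step above (keeping only the highest version per id) can also be done in a single pass, skipping the intermediate `List` per id that `groupBy` builds. This is a sketch under assumptions: the thread only shows that entries have an `id` and a `version`, so the `Int` type and the name `latestById` are hypothetical.

```kotlin
// Hypothetical shape: only `id` and `version` are known from the discussion,
// and `version` being an Int is an assumption.
data class Foo(val id: String, val version: Int)

// Hypothetical name. Keeps the highest-version Foo per id in one pass.
fun latestById(foos: List<Foo>): Map<String, Foo> =
    buildMap<String, Foo> {
        for (foo in foos) {
            val current = get(foo.id)
            // On a tie the earlier entry is kept, matching what maxBy returns.
            if (current == null || foo.version > current.version) put(foo.id, foo)
        }
    }
```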

dave08

01/04/2023, 4:46 PM

Because I have a bunch of criteria to decide what to finally output for each of those ids based on the values of that map
