Is there in the standard library or an external one a fucnti kotlinlang #announcements


Louis Saglio

09/23/2020, 7:50 PM

Is there in the standard library (or an external one) a function similar to random.choices in Python?

Nir

09/23/2020, 7:56 PM

I can't seem to see one, but implementing one on top of random seems reasonably straightforward

Nir

09/23/2020, 7:57 PM

do you need weights?

Louis Saglio

09/23/2020, 8:03 PM

Yes, alongside the possibility of returning multiple items from the original collection without duplicates. A naive implementation would be easy, but I need a very efficient one because it will be called thousands of times (possibly millions).

Nir

09/23/2020, 8:05 PM

you mean, sampling without replacement?

Nir

09/23/2020, 8:05 PM

sorry, sampling with replacement

Nir

09/23/2020, 8:05 PM

no, without. lol

Nir

09/23/2020, 8:05 PM

Okay. So sampling with replacement is very simple to write yourself.

Nir

09/23/2020, 8:06 PM

you just generate N doubles

Nir

09/23/2020, 8:06 PM

take a cumulative sum of the weights. For each double, binary search into the list to find the index, and return the value at that index.

Nir

09/23/2020, 8:07 PM

If the weights are integers and their total sum is relatively small, a lookup table is probably faster for large N
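The lookup-table idea for small integer weights could look roughly like this. It is a sketch, not code from the thread: `WeightedTable`, `draw` and `drawMany` are hypothetical names, and it assumes non-negative integer weights whose sum is small enough to materialize. Each element is stored once per unit of weight, so a draw is a single uniform index.

```kotlin
import kotlin.random.Random

// Hypothetical lookup table for small integer weights: element i is stored weights[i]
// times, so one uniform index picks it with probability weights[i] / sum(weights).
class WeightedTable<T>(population: List<T>, weights: List<Int>) {
    private val table: List<T>

    init {
        require(population.size == weights.size) { "population and weights must have the same size" }
        require(weights.all { it >= 0 }) { "weights must be non-negative" }
        val built = ArrayList<T>(weights.sum())
        for (i in population.indices) {
            repeat(weights[i]) { built.add(population[i]) }
        }
        require(built.isNotEmpty()) { "weights must sum to a positive value" }
        table = built
    }

    val size: Int get() = table.size

    // O(1) per draw; the price is O(sum of weights) memory
    fun draw(random: Random = Random.Default): T = table[random.nextInt(table.size)]

    fun drawMany(n: Int, random: Random = Random.Default): List<T> = List(n) { draw(random) }
}
```

Building the table costs O(sum of weights), so this only pays off when that sum stays small and many draws reuse one table.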

Nir

09/23/2020, 8:07 PM

are your weights integers or floats?

Louis Saglio

09/23/2020, 8:12 PM

ideally floats, but I can go with integers

Nir

09/23/2020, 8:12 PM

If you can keep the weights in a reasonable range, that would be faster

Louis Saglio

09/23/2020, 8:14 PM

To add context, it's for the selection phase of a genetic algorithm: the weights are the fitness values of the population, and the collection to choose from is the population.

Louis Saglio

09/23/2020, 8:15 PM

Python is very handy for this case: you can implement the whole selection phase with only a reasonably complex expression.

Louis Saglio

09/23/2020, 8:17 PM

Anyway thank you for your help 🙂

Nir

09/23/2020, 8:27 PM


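import kotlin.random.Random

// Weighted sampling with replacement, similar to Python's random.choices.
fun <T> Random.choices(population: List<T>, n: Int, weights: List<Double>): List<T> {
    require(population.size == weights.size) { "population and weights must have the same size" }
    val total = weights.sum()
    require(total > 0.0) { "weights must sum to a positive value" }
    // Cumulative weights normalized so the last entry is 1.0
    val cumWeights = weights.runningReduce { s, t -> s + t }.map { it / total }

    val result = ArrayList<T>(n)
    repeat(n) {
        val d = nextDouble() // uniform in [0, 1)
        // The comparison never returns 0, so binarySearch always reports -(insertionPoint) - 1,
        // where insertionPoint is the first index whose cumulative weight is greater than d
        val index = -(cumWeights.binarySearch { if (it <= d) -1 else 1 } + 1)
        result.add(population[index])
    }
    return result
}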


Nir

09/23/2020, 8:27 PM

it should look something like this

Nir

09/23/2020, 8:27 PM

if you already produce cumulative weights in your algorithm you can skip that step

Nir

09/23/2020, 8:28 PM

the way kotlin returns the index from binary search is pretty weird so there's probably off by 1 errors there
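For reference, Kotlin's List.binarySearch follows the Java convention: it returns the index of a match, or -(insertionPoint) - 1 when there is no match. A negative result r therefore decodes to insertion point -(r + 1), not -r. A short illustration; `insertionPoint` is a hypothetical helper.

```kotlin
// Hypothetical helper: decode binarySearch's result into the index where the key
// would be inserted to keep the list sorted (for an exact match, the match's index).
fun insertionPoint(searchResult: Int): Int =
    if (searchResult >= 0) searchResult else -(searchResult + 1)

val cumulative = listOf(0.2, 0.5, 1.0)

// Exact match at index 1
val exact = cumulative.binarySearch(0.5)    // 1
// 0.3 belongs between 0.2 and 0.5: insertion point 1, encoded as -(1) - 1
val between = cumulative.binarySearch(0.3)  // -2
// 0.9 belongs before 1.0: insertion point 2, encoded as -(2) - 1
val nearEnd = cumulative.binarySearch(0.9)  // -3
```

For a weighted draw you usually want the first cumulative weight strictly greater than the random number, which an exact match does not give you directly; a comparison lambda that never returns 0 sidesteps that case.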

Nir

09/23/2020, 8:28 PM

but that's the basic idea

Nir

09/23/2020, 8:32 PM

without replacement is trickier, I'd need to think about it
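One known approach to weighted sampling without replacement, not mentioned in the thread, is the Efraimidis-Spirakis method: give each item the key ln(u)/w, with u uniform in (0, 1], and keep the k largest keys. A minimal sketch, assuming strictly positive weights; `weightedSampleWithoutReplacement` is a hypothetical name. It is O(N log N) here because it sorts; a size-k heap would bring that down to O(N log k).

```kotlin
import kotlin.math.ln
import kotlin.random.Random

// Efraimidis-Spirakis weighted sampling without replacement.
// Each item gets key = ln(u) / w with u uniform in (0, 1]; the k largest keys win.
// This ranks items the same way as u^(1/w) but avoids underflow for small weights.
fun <T> weightedSampleWithoutReplacement(
    population: List<T>,
    weights: List<Double>,
    k: Int,
    random: Random = Random.Default
): List<T> {
    require(population.size == weights.size) { "population and weights must have the same size" }
    require(k in 0..population.size) { "k must be between 0 and the population size" }
    require(weights.all { it > 0.0 }) { "weights must be strictly positive" }

    return population.indices
        .map { i -> i to ln(1.0 - random.nextDouble()) / weights[i] } // 1 - [0, 1) = (0, 1]
        .sortedByDescending { it.second }
        .take(k)
        .map { population[it.first] }
}
```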

ephemient

09/23/2020, 9:05 PM

Python's random.choices() selects with replacement, by the way

ephemient

09/23/2020, 9:06 PM

are you expecting without replacement behavior?

ephemient

09/23/2020, 9:08 PM

for with replacement behavior, I think I'd do something like this: https://pl.kotl.in/j_b3jZ6jb

ephemient

09/23/2020, 9:10 PM

using a Sequence so the caller can pull as many as they want, and letting the weights wrap around if they're too short, because I feel like the Kotlin stdlib design is generally to return something (but you could save some code if you took that out)
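The linked code isn't reproduced here, but the two design choices described (returning a lazy Sequence, and cycling the weights when there are fewer weights than items) can be sketched as follows. `weightedRandoms` is a hypothetical name, and this version simply expands the cycled weights to the population's length up front and then reuses the cumulative-sum plus binary-search approach; it is not ephemient's TreeSet-based implementation.

```kotlin
import kotlin.random.Random

// Hypothetical extension: an endless lazy Sequence of weighted draws (with replacement).
// If weights is shorter than the list, it is cycled: item i gets weights[i % weights.size].
fun <T> List<T>.weightedRandoms(
    weights: List<Double>,
    random: Random = Random.Default
): Sequence<T> {
    require(isNotEmpty() && weights.isNotEmpty()) { "population and weights must be non-empty" }
    require(weights.all { it >= 0.0 }) { "weights must be non-negative" }
    val items = this
    // Expand the cycled weights into one running total per item (O(N) setup)
    val cumulative = DoubleArray(items.size)
    var running = 0.0
    for (i in items.indices) {
        running += weights[i % weights.size]
        cumulative[i] = running
    }
    val total = running
    require(total > 0.0) { "weights must sum to a positive value" }

    // Lazy: nothing is drawn until the caller pulls elements
    return sequence {
        while (true) {
            val d = random.nextDouble() * total // uniform in [0, total)
            // Upper bound: first index whose running total exceeds d
            var lo = 0
            var hi = items.size - 1
            while (lo < hi) {
                val mid = (lo + hi) ushr 1
                if (cumulative[mid] > d) hi = mid else lo = mid + 1
            }
            yield(items[lo])
        }
    }
}
```

With the thread's example input, (1..9).toList().weightedRandoms(listOf(1.0, 2.0)).take(1000).toList() gives each odd value weight 1 and each even value weight 2.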

Nir

09/23/2020, 9:14 PM

I have to admit, I don't follow how this works

Nir

09/23/2020, 9:16 PM

even the code in main:


(1..9).toList()
    .randoms(weights = listOf(1.0, 2.0))

Your weights list is not the same length as your population

Nir

09/23/2020, 9:18 PM

Or how you can take 1000 integers from the list 1-9 without replacement...

Nir

09/23/2020, 9:18 PM

oh, this is with replacement, so forget that part 🙂

ephemient

09/23/2020, 9:19 PM

I'm cycling the weights 😄

Nir

09/23/2020, 9:19 PM

But yeah, it just seems more complex, more math? And in the end, you call headSet, which is still going to be a log(N) call.

Nir

09/23/2020, 9:20 PM

just on a more complex data structure.

Nir

09/23/2020, 9:20 PM

so it's probably going to be slower.

Nir

09/23/2020, 9:20 PM

I'm guessing that SortedSet is probably going to be some kind of tree.

ephemient

09/23/2020, 9:20 PM

yeah default impl is TreeSet
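For comparison, a tree-backed lookup of the kind being discussed can be sketched with java.util.TreeMap keyed by running totals: higherEntry(d) returns the entry with the smallest key strictly greater than d, an O(log N) descent through a red-black tree rather than a binary search over a flat array. This is an assumption about the general shape of such code, not the linked snippet; `TreeSampler` is a hypothetical name.

```kotlin
import java.util.TreeMap
import kotlin.random.Random

// Hypothetical tree-backed sampler: keys are running totals, values are items.
class TreeSampler<T>(population: List<T>, weights: List<Double>) {
    private val byCumulative = TreeMap<Double, T>()
    private val total: Double

    init {
        require(population.size == weights.size) { "population and weights must have the same size" }
        var running = 0.0
        for (i in population.indices) {
            require(weights[i] >= 0.0) { "weights must be non-negative" }
            // Skip zero weights: they would repeat the previous key and overwrite its item
            if (weights[i] > 0.0) {
                running += weights[i]
                byCumulative[running] = population[i]
            }
        }
        require(running > 0.0) { "weights must sum to a positive value" }
        total = running
    }

    fun draw(random: Random = Random.Default): T {
        val d = random.nextDouble() * total // uniform in [0, total)
        // Smallest running total strictly greater than d
        return byCumulative.higherEntry(d).value
    }
}
```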

Nir

09/23/2020, 9:20 PM

Yeah, I have to admit I don't understand what advantage this has over my solution

ephemient

09/23/2020, 9:21 PM

well as I said, IMO kotlin library functions tend to return something instead of erroring on odd inputs

Nir

09/23/2020, 9:21 PM

err what

Nir

09/23/2020, 9:22 PM

If you have specific examples I'd be curious to see what they are.

Nir

09/23/2020, 9:22 PM

In this specific case I'm pretty sure that erroring out is the right call.

Nir

09/23/2020, 9:22 PM

Even if you wanted to cycle the weights though, it would still be faster to just build up an array of weights the same size via cycling, then call my implementation

ephemient

09/23/2020, 9:22 PM

that, I don't see why

Nir

09/23/2020, 9:23 PM

why it's right to error out?

ephemient

09/23/2020, 9:23 PM

N=population size, W=weight size: mine is O(log W), yours is O(log N)?

Nir

09/23/2020, 9:23 PM

the population size and the weight size are constrained to be the same?

Nir

09/23/2020, 9:23 PM

that's how the python function works

Nir

09/23/2020, 9:24 PM

Ah, I see. Yes, if you add this totally nonintuitive behavior of cycling the weights, then yours may be faster when you have many fewer weights than population
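To make the O(log W) claim concrete: with N items and W cycled weights, only the W running totals of a single cycle need to be stored. A draw first works out which repetition of the cycle it lands in (a division), then binary-searches within that cycle, so each draw is O(log W) after O(W) setup. A minimal sketch under those assumptions; `CycledWeights` and `drawIndex` are hypothetical names, and this is not the code behind the link.

```kotlin
import kotlin.random.Random

// Hypothetical sampler where weights cycle over n items: item i has weight weights[i % W].
// Only one cycle of running totals is stored, so setup is O(W) and each draw O(log W).
class CycledWeights(n: Int, weights: List<Double>) {
    init {
        require(n > 0 && weights.isNotEmpty()) { "need at least one item and one weight" }
        require(weights.all { it >= 0.0 }) { "weights must be non-negative" }
    }

    private val w = weights.size
    private val fullCycles = n / w   // complete repetitions of the weight list
    private val remainder = n % w    // items in a trailing partial repetition
    private val cycle = DoubleArray(w).also { arr -> // running totals within one cycle
        var running = 0.0
        for (j in 0 until w) {
            running += weights[j]
            arr[j] = running
        }
    }
    private val cycleTotal = cycle[w - 1]
    private val total = fullCycles * cycleTotal + (if (remainder > 0) cycle[remainder - 1] else 0.0)

    init {
        require(total > 0.0) { "weights must sum to a positive value" }
    }

    // First j in 0..hi with cycle[j] > x (returns hi if there is none)
    private fun upperBound(x: Double, hi: Int): Int {
        var lo = 0
        var h = hi
        while (lo < h) {
            val mid = (lo + h) ushr 1
            if (cycle[mid] > x) h = mid else lo = mid + 1
        }
        return lo
    }

    /** Returns the index (0 until n) of one weighted draw, with replacement. */
    fun drawIndex(random: Random = Random.Default): Int {
        val d = random.nextDouble() * total // uniform in [0, total)
        val lastCycle = if (remainder > 0) fullCycles else fullCycles - 1
        // Which repetition d falls into; the clamps guard against floating-point rounding
        val c = minOf((d / cycleTotal).toInt(), lastCycle)
        val offset = maxOf(0.0, d - c * cycleTotal)
        val limit = if (c == fullCycles) remainder - 1 else w - 1
        return c * w + upperBound(offset, limit)
    }
}
```

Mapping a draw back to an item is then population[sampler.drawIndex()].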

ephemient

09/23/2020, 9:25 PM

zip() doesn't fail when the inputs are different lengths, substringBefore/After doesn't fail if it isn't found, etc.

Nir

09/23/2020, 9:25 PM

If you have examples of what you wrote above I'm genuinely curious btw. Fail fast has been the consensus in software engineering for a while and Kotlin is mostly following consensus

Nir

09/23/2020, 9:25 PM

zip does that for a very specific reason

Nir

09/23/2020, 9:25 PM

so that it works with infinite ranges

ephemient

09/23/2020, 9:25 PM

all the orNull methods instead of throwing
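The standard-library behaviours mentioned in this exchange are easy to check directly. The comments show what each call returns; firstOrNull and randomOrNull return null on an empty collection where first() and random() throw NoSuchElementException.

```kotlin
// zip stops at the shorter input instead of failing
val zipped = listOf(1, 2, 3).zip(listOf("a", "b"))          // [(1, a), (2, b)]

// ...which is what lets it pair a finite list with an infinite sequence
val numbered = listOf("a", "b", "c").asSequence()
    .zip(generateSequence(1) { it + 1 })
    .toList()                                                // [(a, 1), (b, 2), (c, 3)]

// substringBefore returns the whole string when the delimiter is missing
val before = "key=value".substringBefore("=")               // "key"
val missing = "no delimiter here".substringBefore("=")      // "no delimiter here"

// The orNull variants return null instead of throwing on empty input
val empty = emptyList<Int>()
val firstOrNothing = empty.firstOrNull()                    // null
val picked = empty.randomOrNull()                           // null (Kotlin 1.4+)
val threw = runCatching { empty.first() }.isFailure         // true
```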

ephemient

09/23/2020, 9:26 PM

relative to Java, Kotlin definitely throws less

Nir

09/23/2020, 9:26 PM

kotlin likes exceptions less

Nir

09/23/2020, 9:26 PM

returning null though is signifying an error, just in a different way

ephemient

09/23/2020, 9:26 PM

anyway, even without that part

Nir

09/23/2020, 9:26 PM

(typically)

ephemient

09/23/2020, 9:26 PM

I believe returning a Sequence is useful

Nir

09/23/2020, 9:26 PM

sure, the sequence part, I agree with

Nir

09/23/2020, 9:27 PM

making it a member function of List to me is weird but largely subjective, I guess.

ephemient

09/23/2020, 9:27 PM

looks like Python doesn't have a sample-without-replacement-with-weights method

Nir

09/23/2020, 9:27 PM

(s/member/extension/)

ephemient

09/23/2020, 9:27 PM

well we already have fun <T> Collection<T>.randomOrNull() in the standard library

Nir

09/23/2020, 9:27 PM

fair enough, I didn't know that

ephemient

09/23/2020, 9:27 PM

this is basically the same

Nir

09/23/2020, 9:28 PM

it's hard to think how to do it efficiently, to be honest

Nir

09/23/2020, 9:28 PM

without replacement, without weights

Nir

09/23/2020, 9:28 PM

you would create a hash set of the indices

Nir

09/23/2020, 9:28 PM

hmm actually no, that doesn't work

Nir

09/23/2020, 9:28 PM

anyway, it seems doable to do it efficiently. with weights and without replacement seems really hard
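The "hash set of the indices" idea does work for the unweighted case if the indices are drawn the way Robert Floyd's sampling algorithm does it: for j from n - k up to n - 1, draw t uniformly from 0..j, and insert j instead of t if t was already chosen. Every k-element subset is equally likely, using exactly k random draws and O(k) memory. A sketch; `sampleIndices` and `sampleWithoutReplacement` are hypothetical names.

```kotlin
import kotlin.random.Random

// Floyd's algorithm: k distinct indices from 0 until n, uniformly, using k random draws.
// Note: HashSet iteration order is not random; shuffle the result if order matters.
fun sampleIndices(n: Int, k: Int, random: Random = Random.Default): Set<Int> {
    require(k in 0..n) { "k must be between 0 and n" }
    val chosen = HashSet<Int>(k * 2)
    for (j in n - k until n) {
        val t = random.nextInt(j + 1) // uniform in 0..j
        // Every earlier pick is < j, so if t is taken, j is guaranteed to be free
        if (!chosen.add(t)) chosen.add(j)
    }
    return chosen
}

fun <T> List<T>.sampleWithoutReplacement(k: Int, random: Random = Random.Default): List<T> =
    sampleIndices(size, k, random).map { this[it] }
```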

ephemient

09/23/2020, 9:29 PM

nah I think it's doable with a custom tree structure for the weights

ephemient

09/23/2020, 9:29 PM

just... messy
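One possible version of a "custom tree for the weights" is a Fenwick (binary indexed) tree. It keeps prefix sums in a flat array, finds the item for a uniform draw by stepping through powers of two in O(log N), and removes a drawn item by subtracting its weight, also O(log N). So k draws without replacement cost O(N + k log N). This sketch assumes non-negative integer weights (which Louis said would be acceptable), so every prefix sum stays exact; `WeightTree` and `drawWithoutReplacement` are hypothetical names.

```kotlin
import kotlin.random.Random

// Hypothetical Fenwick (binary indexed) tree over integer weights, for weighted
// sampling without replacement. Integer weights keep every prefix sum exact.
class WeightTree(weights: List<Long>) {
    private val n = weights.size
    private val tree = LongArray(n + 1)         // 1-based Fenwick array
    private val current = weights.toLongArray() // remaining weight of each item
    var total = 0L
        private set

    init {
        require(weights.all { it >= 0 }) { "weights must be non-negative" }
        // O(n) build: each node passes its partial sum up to its parent
        for (i in 1..n) {
            tree[i] += weights[i - 1]
            val parent = i + (i and -i)
            if (parent <= n) tree[parent] += tree[i]
        }
        total = weights.sum()
    }

    // Adds delta to the weight of item i (0-based) in O(log n)
    private fun add(i: Int, delta: Long) {
        var x = i + 1
        while (x <= n) {
            tree[x] += delta
            x += x and -x
        }
    }

    // Smallest 0-based index whose prefix sum is greater than target, in O(log n)
    private fun find(target: Long): Int {
        var pos = 0
        var remaining = target
        var step = Integer.highestOneBit(n)
        while (step > 0) {
            val next = pos + step
            if (next <= n && tree[next] <= remaining) {
                pos = next
                remaining -= tree[next]
            }
            step = step shr 1
        }
        return pos // the first pos items sum to <= target, so item pos (0-based) is the answer
    }

    /** Picks an index with probability proportional to its remaining weight and removes it. */
    fun drawAndRemove(random: Random = Random.Default): Int {
        check(total > 0) { "no weight left to draw from" }
        val index = find(random.nextLong(total)) // target uniform in [0, total)
        add(index, -current[index])
        total -= current[index]
        current[index] = 0
        return index
    }
}

// Hypothetical wrapper: k distinct items, each draw weighted by the weight still left.
// Throws if k exceeds the number of items with positive weight.
fun <T> List<T>.drawWithoutReplacement(weights: List<Long>, k: Int, random: Random = Random.Default): List<T> {
    require(size == weights.size) { "population and weights must have the same size" }
    val tree = WeightTree(weights)
    return List(k) { this[tree.drawAndRemove(random)] }
}
```

Although it is conceptually a tree, it lives in a single LongArray, so there is no per-node object or pointer chasing.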

Nir

09/23/2020, 9:29 PM

sure, that's what I mean

Nir

09/23/2020, 9:29 PM

but using trees at all is kinda slow

Nir

09/23/2020, 9:29 PM

but then I'm not sure how much difference it makes in these languages.

Nir

09/23/2020, 9:30 PM

in value semantic languages binary searches on arrays will beat the stuffing out of trees

Nir

09/23/2020, 9:30 PM

not sure how dramatic the gap is in Kotlin

ephemient

09/23/2020, 9:30 PM

on the JVM, List<Int> is always boxed (until Valhalla I guess)

ephemient

09/23/2020, 9:32 PM

and it doesn't do pointer tagging, although it does have compressed OOPs, so at least references take fewer than the 64 bits per memory address that native languages use

Nir

09/23/2020, 9:34 PM

right, that's what I figured. it's still fewer indirections, since the trees will be double indirections unless they are implemented inside the JVM or something

Nir

09/23/2020, 9:34 PM

but the difference between single and double indirections is less than the difference between single and none

ephemient

09/23/2020, 9:35 PM

moving GC also improves data locality so that's a factor too

Nir

09/23/2020, 9:35 PM

right
