Is there in the standard library (or an external o...
# announcements
l
Is there in the standard library (or an external one) a fucntion similar to
random.choices
in python ?
n
I can't seem to see one, but implementing one on top of random seems reasonably straightforward
do you need weights?
l
Yes, alognside with the possibility of returning multiple items from the original collection without duplicates. A naive implementation would be easy but I need an very efficient one because it will be called thousands of times (possibly millions).
n
you mean, sampling without replacement?
sorry, sampling with replacement
no, without. lol
Okay. So sampling with with replacement is very simple to write yourself.
you just generated N doubles
take a cumulative sum of the weights. For each double, binary search into the list to find the index, and return the value at that index.
If the weights are integers and their total sum is relatively small, a lookup table is probably faster for N large
are your weights integers or floats?
l
idealy floats but I can go with Integers
n
If you can keep the weights in a reasonable range, that would be faster
l
To add context, its for the selection phase of a genetic algorithm, the weights are the fitness values of the population and the collection to choose from is the population
Python is very handy for this case, you can implement the whole selection phase with only reasonably complex expression
Anyway thank you for your help 🙂
n
Copy code
fun<T> Random.choices(population: List<T>, n: Int, weights: List<Float>): List<T> {
    val result = ArrayList<T>()
    result.ensureCapacity(n)
    val cumWeights = run {
        val c = weights.runningReduce { s, t -> s + t }
        c.map { it / c.last() }
    }

    for (d in doubles(n.toLong())) {
        val index = cumWeights.binarySearch(d)
        result.add(population[-1*index])
    }
    return result
}
👍 1
it should look something like this
if you already produce cmulative weights in your algorithm you can skip that step
the way kotlin returns the index from binary search is pretty weird so there's probably off by 1 errors there
but that's the basic idea
without replacement is trickier, I'd need to think about it
e
Python's
random.choices()
selects with replacement, by the way
are you expecting without replacement behavior?
for with replacement behavior, I think I'd do something like this: https://pl.kotl.in/j_b3jZ6jb
using a Sequence so the caller can pull as many as they want, and letting the weights wrap around if they're too short, because I feel like the Kotlin stdlib design is generally to return something (but you could save some code if you took that out)
n
I have to admit, I don't follow how this works
even the code in main:
Copy code
(1..9).toList()
    	    .randoms(weights = listOf(1.0, 2.0))
Your weights list is not the same length as your population
Or how you can take 1000 integers, from the list 1-9 without replacement...
oh, this is with replacement, so forget that part 🙂
e
I'm cycling the weights 😄
n
But yeah, it just seems more complex, more math? And in the end, you call headSet, which is still going to be a log(N) call.
just on a more complex data structure.
so it's probably going to be slower.
I'm guessing that sortedset is probably going to be some kind of tree.
e
yeah default impl is TreeSet
n
Yeah, I have to admit I don't understand what advantage this has over my solution
e
well as I said, IMO kotlin library functions tend to return something instead of erroring on odd inputs
n
err what
If you have specific example I'd be curious to see what they are.
In this specific case I'm pretty sure that erroring out is the right call.
Even if you wanted to cycle the weights though, it would still be faster to just build up an array of weights the same size via cycling, then call my implementation
e
that, I don't see why
n
why it's right to error out?
e
N=population size, W=weight size: mine is O(log W), yours is O(log N)?
n
the population size and the weight size are constrained to be the same?
that's how the python function works
Ah, I see. Yes, if you add this totally nonintuitive behavior of cycling the weights, then yours may be faster when you have many fewer weights than population
e
zip() doesn't fail when the inputs are different lengths, substringBefore/After doesn't fail if it isn't found, etc.
n
If you have examples of what you wrote above I'm genuinely curious btw. Fail fast has been the consensus in software engineering for a while and Kotlin is mostly following consensus
zip does that for a very specific reason
so that it works with infinite ranges
e
all the orNull methods instead of throwing
relative to Java, Kotlin definitely throws less
n
kotlin likes exceptions less
returning null though is signifying an error, just in a different way
e
anyway, even without that part
n
(typically)
e
I believe returning a Sequence is useful
n
sure, the sequence part, I agree with
making it a member function of List to me is weird but largely subjective, I guess.
e
looks like Python doesn't have a sample-without-replacement-with-weights method
n
(/s/member/extension)
e
well we already have fun Iterable<T>.randomOrNull() in the standard library
n
fair enough, I didn't know that
e
this is basically the same
n
it's hard to think how to do it efficiently, to be honest
without replacement, without weights
you would create a hash set of the indices
hmm actually no, that doesn't work
anyway, it seems doable to do it efficiently. with weights and without replacement seems really hard
e
nah I think it's doable with a custom tree structure for the weights
just... messy
n
sure that's what i mean
but using trees at all is kinda slow
but then I'm not sure how much difference it makes in these languages.
in value semantic languages binary searches on arrays will beat the stuffing out of trees
not sure how dramatic the gap is in kotlikn
e
on the JVM, List<Int> is always boxed (until Valhalla I guess)
and it doesn't do pointer tagging, although it does have compressed OOPs so at least it takes fewer than 64 bits per memory address like native languages
n
right, that's what I figured. it's still fewer indirections, since the trees will be double indirections unless they are implemented inside the JVM or something
but the difference between single and double indirections is less than the differecne between single and none
e
moving GC also improves data locality so that's a factor too
n
right