<@U12AGS8JG> 2 questions, 1. on `Arb.distinct` wh...
# kotest
m
@sam 2 questions, 1. on
Arb.distinct
what’s the actual intention behind that? i.e. what’s the behaviour that you would expect if that arb is called? Am i understanding this right?
for any given arbitraries, return a uniformly distributed random elements based on a selector
as in, if the input arb yields
[1, 1, 1, 1, 1, 1, 1, 2]
, the distinct arb would yield
[1, 2, 1, 1, 2, 1, 2, 2]
or something like that with equal sampling probability of
1
and
2
2. are you still keen on introducing Edgecases<A> for the combinatorial edgecases? I’m still working on that atm, but will only proceed if you are.
l
I think that's the expected behavior for distinct, yes
👍 1
m
awesome - in which case i might have a way to do that. That’ll include some random sampling of the input. the reason
sequence.distinct
currently hangs is it’s trying to loop until it exhausts the whole population of the input arb (which is an infinite sequence)
s
No, distinct is about not repeating values
So if you have
<http://Arb.int|Arb.int>().distinct()
you will get random ints, but you won't get the same one twice in a run
It possibly has questionable value. It can be done by wrapping the arb with another arb that holds state.
m
🤔
That is a bit.. questionable 😆
So for 1 and 2, the third sample not available is the intended behaviour with this definition
To be absolutely honest if i were a kotest user i would avoid distinct at all cost if i learn that's the intended behaviour
s
It can be problematic yes. It's no different to the scenario in the stb lib that I pointed out,
sequence { yield(1) }.distinct().take(2)
m
I wonder if that's the case the return type should be reconsidered. Arb in my opinion is like a population that you can always sample from. Distinct definitely does not meet that criteria.
s
I don't think distinct is what you have an issue with necessarily, it's the "we might run out of elements and hang" that you're concerned about right?
m
Yeah but that stems from my understanding of arbs being a population. Hence in my head take(10) will return 10 randomly sampled elements from the population.
s
right
m
If distinct returns an exhaustive that's a different story
s
So I guess I can see two people having the following arguments on this. a) I have a square root function and I want to test it with 1000 random ints, but I don't want repeats because I'd just be testing the same thing twice
and b) Even without distinct you'd probably only get 2 repeats in a 1000 anyway so what's the real value in it
m
Indeed
s
I do agree with you in principle
my initial thought was that we could have a return that indicated the size. If you do
<http://Arb.int|Arb.int>().distinct()
you can have a max of Integer.Max - Integer.Min values
<http://Arb.int|Arb.int>(0..1000).distinct()
can only be used in a method that asks for at most 1000 iterations
the problem is, you can do arbitrary filtering, so it can always end up being uncertain
or I could do
.map { 0 }
now I have a sample space of 1
m
Yeah
s
just thinking out loud here. If we had a subtype of Arb, say RangedArb, that had a size component that was known, we could define distinct on that. Arb.int(0..1000) would return a ranged arb of size 1000, so distinct() could be called on that (we can error if you then ask for > 1000), and if you do an operation like .map or .filter that loses this information, you get a regular arb back.
m
I wonder if it makes more sense for distinct to return an exhaustive
From your explanation it seems like that's what we want
s
that would be nice, but only if we do the ranged arb thing, because
<http://Arb.int|Arb.int>().filter ( it == 0 }.exhaustive()
can't return ?
m
Nope, 😆
Back to drawing board
s
I guess we could do
arb.toExhaustive(requiredSize, maxValues)
which will pull at most maxValues to generate requiredSize, and will error if you don't get there.
or even use that signature with distinct
<http://Arb.int|Arb.int>().exhaustive(1000, 2000)
tries to up 2000 values to get an Exhaustive of 1000 values taken from the int arb
And as a convenience
fun Arb.Companion.toExhaustive(size) = toExhaustive(size, size * 2
there's actually nothing in the Exhaustive contract that says values have to be distinct
and exhaustives do loop
perhaps we model this as another subtype in the ADT as well -
Distinct
which is a group of randomized, but never repeated values
Arb
- infinite, random
Exhaustive
infinite, predictable, has min iteration requirement `Distinct`finite, random, has max iteration value due to being finite
m
https://kotlinlang.slack.com/archives/CT0G9SD7Z/p1599864664164400?thread_ts=1599859758.158400&amp;cid=CT0G9SD7Z Re this, hypothetically it should. I think i may have a way to sanely do that. Will get back to you, getting groceries now
Arb
 - infinite, random
Exhaustive
 infinite, predictable, has min iteration requirement
`Distinct`finite, random, has max iteration value due to being finite (edited)
🤔
s
Maybe it all makes no sense and we just drop distinct
Loads of options for you to play around with there 🤣
m
via random sampling the population: exhaustive
Copy code
fun <A, B> Arb<A>.distinctExhaustive(sampleSize: Int = 1000, rs: RandomSource = RandomSource.Default, selector: (A) -> B): Exhaustive<A> =
   exhaustive(
      this.generate(rs).take(sampleSize).map { it.value }
         .toList()
         .distinctBy(selector) // :tada: 
   )
🙈 rebalanced arbs
Copy code
fun <A, B> Arb<A>.distinctBy(sampleSize: Int = 1000, selector: (A) -> B): Arb<A> =
   Arb.list(this, (sampleSize-1)..sampleSize)
      .flatMap { listA: List<A> ->
         Arb.of(listA.distinctBy(selector))
      }
err, not convinced. may be useful, maybe not?
s
I guess we're not convinced by the usefulness of distinct, but having the user pass in a predicate is quite a nice way of passing the issue of sample space < requested count back to the user
m
Yeah I'm probably going to raise a pr with that just to fix the current issue. Regarding whether we keep it or not, what are your thoughts?
s
I like the idea of having a selector. I'm not convinced on the usefulness of distinct anymore as I can't think of any real usecase for it.
m
On it - expect a pr up by today or tomorrow 😆
👍🏻 1
@sam do you want me to close this PR? https://github.com/kotest/kotest/pull/1722 neither of us are convinced with distinct anymore
s
I'm not sure if we should fix it with caveats, or just deprecate it and remove it
I can't think of a compelling use case for distinct anymore now we have exhaustive
m
yeah
100%
i’d close it. we should close the issue as won’t fix as well
s
Let's do a PR to deprecate distinct
m
coolio
i’ve closed the issue. will raise a PR to deprecate distinct
s
ok great
m
s
merged
m
🎉
thinking out loud, perhaps a
fun <T> Arb<T>.toExhaustive(rs: RandomSource, distinct: boolean = false): Exhaustive<T>
makes more sense
s
Why would you want to turn an arb into an exhaustive for example
m
idk, i wouldn’t 😆
s
lol
Like if you're just generating a bunch of values, that's the same as an arb really
distinct was around from before I added the idea of exhaustive
you've convinced me over these weeks that it's pointless really
m
s
m
yeah i’m moving to that. flatmap already done that apparently, i just realized. so the Arb.bind can just be treated as a monadic bind
s
ok cool
m
obviously the problem of union with emptyList still exists tho, but let’s address that separately
like, if you still recall your suggestion on
EdgeCases<A>
that’s gonna solve that problem, but it’s a bit of a big one
s
if I recall the issue was, how do you make edgecases if some of the bind arbs don't have edgecases?
m
yeah haha, that’s a separate issue altogether i reckon
and it does mean we need to be breaking applicative law a bit 😆
s
I guess the other way to look at it is, if one or more of the arbs have no edgecases, then bind has no edgecases, as it is a function of the whole
m
yep, that’s the approach i’ll take for #1700
s
ok cool
m
and since we have
withEdgecsases
aready, i don’t think it really matters that much
s
right
yeah I mean if you decide to compose with something that has no edgecases, then the result is no edgecases. That's a fair approach.
m
yeah agreed. That also makes sense when you have a list and an empty list, the product would be empty. for reference: flatMap does the cartesian product.
Copy code
/**
 * Returns a new [Arb] which takes its elements from the receiver and maps them using the supplied function.
 */
fun <A, B> Arb<A>.flatMap(f: (A) -> Arb<B>): Arb<B> = object : Arb<B>() {
   override fun edgecases(): List<B> = this@flatMap.edgecases().flatMap { f(it).edgecases() }

   override fun sample(rs: RandomSource): Sample<B> = f(this@flatMap.sample(rs).value).sample(rs)
}
s
The list example is a good one yes. The edgecases for a bind are just the product of the edgecases of its components.
👍 1