< sam> 2 questions 1 on `Arb distinct` what s the actual int kotlinlang #kotest

<@U12AGS8JG> 2 questions, 1. on `Arb.distinct` wh...

mitch

09/11/2020, 9:29 PM

@sam 2 questions, 1. on

Arb.distinct

what’s the actual intention behind that? i.e. what’s the behaviour that you would expect if that arb is called? Am i understanding this right?

for any given arbitraries, return a uniformly distributed random elements based on a selector

as in, if the input arb yields

[1, 1, 1, 1, 1, 1, 1, 2]

, the distinct arb would yield

[1, 2, 1, 1, 2, 1, 2, 2]

or something like that with equal sampling probability of

and

2. are you still keen on introducing Edgecases<A> for the combinatorial edgecases? I’m still working on that atm, but will only proceed if you are.

LeoColman

09/11/2020, 9:38 PM

I think that's the expected behavior for distinct, yes

👍 1

mitch

09/11/2020, 9:50 PM

awesome - in which case i might have a way to do that. That’ll include some random sampling of the input. the reason

sequence.distinct

currently hangs is it’s trying to loop until it exhausts the whole population of the input arb (which is an infinite sequence)

sam

09/11/2020, 10:16 PM

No, distinct is about not repeating values

sam

09/11/2020, 10:17 PM

So if you have

<http://Arb.int|Arb.int>().distinct()

you will get random ints, but you won't get the same one twice in a run

sam

09/11/2020, 10:17 PM

It possibly has questionable value. It can be done by wrapping the arb with another arb that holds state.

mitch

09/11/2020, 10:18 PM

🤔

mitch

09/11/2020, 10:19 PM

That is a bit.. questionable 😆

mitch

09/11/2020, 10:20 PM

So for 1 and 2, the third sample not available is the intended behaviour with this definition

mitch

09/11/2020, 10:22 PM

To be absolutely honest if i were a kotest user i would avoid distinct at all cost if i learn that's the intended behaviour

sam

09/11/2020, 10:27 PM

It can be problematic yes. It's no different to the scenario in the stb lib that I pointed out,

sequence { yield(1) }.distinct().take(2)

mitch

09/11/2020, 10:35 PM

I wonder if that's the case the return type should be reconsidered. Arb in my opinion is like a population that you can always sample from. Distinct definitely does not meet that criteria.

sam

09/11/2020, 10:37 PM

I don't think distinct is what you have an issue with necessarily, it's the "we might run out of elements and hang" that you're concerned about right?

mitch

09/11/2020, 10:39 PM

Yeah but that stems from my understanding of arbs being a population. Hence in my head take(10) will return 10 randomly sampled elements from the population.

sam

09/11/2020, 10:40 PM

right

mitch

09/11/2020, 10:41 PM

If distinct returns an exhaustive that's a different story

sam

09/11/2020, 10:41 PM

So I guess I can see two people having the following arguments on this. a) I have a square root function and I want to test it with 1000 random ints, but I don't want repeats because I'd just be testing the same thing twice

sam

09/11/2020, 10:41 PM

and b) Even without distinct you'd probably only get 2 repeats in a 1000 anyway so what's the real value in it

mitch

09/11/2020, 10:42 PM

Indeed

sam

09/11/2020, 10:43 PM

I do agree with you in principle

sam

09/11/2020, 10:44 PM

my initial thought was that we could have a return that indicated the size. If you do

<http://Arb.int|Arb.int>().distinct()

you can have a max of Integer.Max - Integer.Min values

sam

09/11/2020, 10:44 PM

<http://Arb.int|Arb.int>(0..1000).distinct()

can only be used in a method that asks for at most 1000 iterations

sam

09/11/2020, 10:45 PM

the problem is, you can do arbitrary filtering, so it can always end up being uncertain

sam

09/11/2020, 10:45 PM

or I could do

.map { 0 }

now I have a sample space of 1

mitch

09/11/2020, 10:49 PM

Yeah

sam

09/11/2020, 10:49 PM

just thinking out loud here. If we had a subtype of Arb, say RangedArb, that had a size component that was known, we could define distinct on that. Arb.int(0..1000) would return a ranged arb of size 1000, so distinct() could be called on that (we can error if you then ask for > 1000), and if you do an operation like .map or .filter that loses this information, you get a regular arb back.

mitch

09/11/2020, 10:50 PM

I wonder if it makes more sense for distinct to return an exhaustive

mitch

09/11/2020, 10:50 PM

From your explanation it seems like that's what we want

sam

09/11/2020, 10:51 PM

that would be nice, but only if we do the ranged arb thing, because

<http://Arb.int|Arb.int>().filter ( it == 0 }.exhaustive()

can't return ?

mitch

09/11/2020, 10:52 PM

Nope, 😆

mitch

09/11/2020, 10:52 PM

Back to drawing board

sam

09/11/2020, 10:52 PM

I guess we could do

arb.toExhaustive(requiredSize, maxValues)

which will pull at most maxValues to generate requiredSize, and will error if you don't get there.

sam

09/11/2020, 10:53 PM

or even use that signature with distinct

sam

09/11/2020, 10:54 PM

<http://Arb.int|Arb.int>().exhaustive(1000, 2000)

tries to up 2000 values to get an Exhaustive of 1000 values taken from the int arb

sam

09/11/2020, 10:56 PM

And as a convenience

fun Arb.Companion.toExhaustive(size) = toExhaustive(size, size * 2

sam

09/11/2020, 10:56 PM

there's actually nothing in the Exhaustive contract that says values have to be distinct

sam

09/11/2020, 10:57 PM

and exhaustives do loop

sam

09/11/2020, 10:57 PM

perhaps we model this as another subtype in the ADT as well -

Distinct

which is a group of randomized, but never repeated values

sam

09/11/2020, 11:00 PM

Arb

- infinite, random

Exhaustive

infinite, predictable, has min iteration requirement `Distinct`finite, random, has max iteration value due to being finite

mitch

09/11/2020, 11:32 PM

https://kotlinlang.slack.com/archives/CT0G9SD7Z/p1599864664164400?thread_ts=1599859758.158400&cid=CT0G9SD7Z Re this, hypothetically it should. I think i may have a way to sanely do that. Will get back to you, getting groceries now

mitch

09/12/2020, 12:10 AM

Arb
- infinite, random

Exhaustive
infinite, predictable, has min iteration requirement

`Distinct`finite, random, has max iteration value due to being finite (edited)

🤔

sam

09/12/2020, 1:07 AM

Maybe it all makes no sense and we just drop distinct

sam

09/12/2020, 1:08 AM

Loads of options for you to play around with there 🤣

mitch

09/12/2020, 12:08 PM

via random sampling the population: exhaustive

Copy code

fun <A, B> Arb<A>.distinctExhaustive(sampleSize: Int = 1000, rs: RandomSource = RandomSource.Default, selector: (A) -> B): Exhaustive<A> =
   exhaustive(
      this.generate(rs).take(sampleSize).map { it.value }
         .toList()
         .distinctBy(selector) // :tada: 
   )

🙈 rebalanced arbs

Copy code

fun <A, B> Arb<A>.distinctBy(sampleSize: Int = 1000, selector: (A) -> B): Arb<A> =
   Arb.list(this, (sampleSize-1)..sampleSize)
      .flatMap { listA: List<A> ->
         Arb.of(listA.distinctBy(selector))
      }

err, not convinced. may be useful, maybe not?

sam

09/12/2020, 1:09 PM

I guess we're not convinced by the usefulness of distinct, but having the user pass in a predicate is quite a nice way of passing the issue of sample space < requested count back to the user

mitch

09/12/2020, 10:01 PM

Yeah I'm probably going to raise a pr with that just to fix the current issue. Regarding whether we keep it or not, what are your thoughts?

sam

09/12/2020, 10:02 PM

I like the idea of having a selector. I'm not convinced on the usefulness of distinct anymore as I can't think of any real usecase for it.

mitch

09/18/2020, 10:09 PM

On it - expect a pr up by today or tomorrow 😆

👍🏻 1

mitch

09/25/2020, 11:46 PM

@sam do you want me to close this PR? https://github.com/kotest/kotest/pull/1722 neither of us are convinced with distinct anymore

sam

09/25/2020, 11:46 PM

I'm not sure if we should fix it with caveats, or just deprecate it and remove it

sam

09/25/2020, 11:47 PM

I can't think of a compelling use case for distinct anymore now we have exhaustive

mitch

09/25/2020, 11:47 PM

yeah

mitch

09/25/2020, 11:47 PM

100%

mitch

09/25/2020, 11:47 PM

i’d close it. we should close the issue as won’t fix as well

sam

09/25/2020, 11:47 PM

Let's do a PR to deprecate distinct

mitch

09/25/2020, 11:47 PM

coolio

mitch

09/25/2020, 11:54 PM

i’ve closed the issue. will raise a PR to deprecate distinct

sam

09/25/2020, 11:54 PM

ok great

mitch

09/26/2020, 12:12 AM

raised https://github.com/kotest/kotest/pull/1730

sam

09/26/2020, 12:18 AM

merged

mitch

09/26/2020, 12:20 AM

🎉

mitch

09/26/2020, 12:27 AM

thinking out loud, perhaps a

fun <T> Arb<T>.toExhaustive(rs: RandomSource, distinct: boolean = false): Exhaustive<T>

makes more sense

sam

09/26/2020, 12:27 AM

Why would you want to turn an arb into an exhaustive for example

mitch

09/26/2020, 12:28 AM

idk, i wouldn’t 😆

sam

09/26/2020, 12:28 AM

lol

sam

09/26/2020, 12:28 AM

Like if you're just generating a bunch of values, that's the same as an arb really

sam

09/26/2020, 12:28 AM

distinct was around from before I added the idea of exhaustive

sam

09/26/2020, 12:28 AM

you've convinced me over these weeks that it's pointless really

mitch

09/26/2020, 12:30 AM

sam

09/26/2020, 12:31 AM

https://github.com/kotest/kotest/issues/1700 so what is left on this

mitch

09/26/2020, 12:31 AM

yeah i’m moving to that. flatmap already done that apparently, i just realized. so the Arb.bind can just be treated as a monadic bind

sam

09/26/2020, 12:32 AM

ok cool

mitch

09/26/2020, 12:32 AM

obviously the problem of union with emptyList still exists tho, but let’s address that separately

mitch

09/26/2020, 12:34 AM

like, if you still recall your suggestion on

EdgeCases<A>

that’s gonna solve that problem, but it’s a bit of a big one

sam

09/26/2020, 12:34 AM

if I recall the issue was, how do you make edgecases if some of the bind arbs don't have edgecases?

mitch

09/26/2020, 12:35 AM

yeah haha, that’s a separate issue altogether i reckon

mitch

09/26/2020, 12:35 AM

and it does mean we need to be breaking applicative law a bit 😆

sam

09/26/2020, 12:35 AM

I guess the other way to look at it is, if one or more of the arbs have no edgecases, then bind has no edgecases, as it is a function of the whole

mitch

09/26/2020, 12:36 AM

yep, that’s the approach i’ll take for #1700

sam

09/26/2020, 12:36 AM

ok cool

mitch

09/26/2020, 12:36 AM

and since we have

withEdgecsases

aready, i don’t think it really matters that much

sam

09/26/2020, 12:37 AM

right

sam

09/26/2020, 12:37 AM

yeah I mean if you decide to compose with something that has no edgecases, then the result is no edgecases. That's a fair approach.

mitch

09/26/2020, 12:38 AM

yeah agreed. That also makes sense when you have a list and an empty list, the product would be empty. for reference: flatMap does the cartesian product.

Copy code

/**
 * Returns a new [Arb] which takes its elements from the receiver and maps them using the supplied function.
 */
fun <A, B> Arb<A>.flatMap(f: (A) -> Arb<B>): Arb<B> = object : Arb<B>() {
   override fun edgecases(): List<B> = this@flatMap.edgecases().flatMap { f(it).edgecases() }

   override fun sample(rs: RandomSource): Sample<B> = f(this@flatMap.sample(rs).value).sample(rs)
}

sam

09/26/2020, 12:39 AM

The list example is a good one yes. The edgecases for a bind are just the product of the edgecases of its components.

👍 1

15 Views

Open in Slack

Previous Next