# kotest
m
@sam hey hey I finally got into this slack! thanks for answering my q on https://github.com/kotest/kotest/issues/1646 I'm currently prepping a PR for that (including introducing the new
value(rs: RandomSource): Sample<A>
). now a lot of the code is currently using values (it shows the strikethrough in intellij) and I'm fixing those as well. I realized that there are a fair few functions like this one
arb(...)
that wants Sequence in it, how would I go about that?
s
We would need to keep the original functions, and somehow come up with an alternative builder function that doesn't require sequences, but a simple function. Eg,
arb { n }
If we can't do that in the same package because of signature clashes then we can either add that to a subpackage (arb.builders?) or add a function to the arb companion,
fun <T> Arb.Companion.from(f: () -> T): Arb<T>
🎖️ 1
Arb.from could be Arb.builder or Arb.fn or whatever
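A minimal sketch of what such a companion builder could look like. `Sample` and `Arb` below are simplified stand-ins, not the real kotest classes, and `kotlin.random.Random` stands in for `RandomSource`. Passing the random source into the function is an assumption of this sketch; the signature suggested above takes no argument.

```kotlin
import kotlin.random.Random

// Simplified stand-ins for illustration only; not the real kotest types.
data class Sample<out A>(val value: A)

abstract class Arb<out A> {
    open fun edgecases(): List<A> = emptyList()
    abstract fun value(rs: Random): Sample<A>
    companion object
}

// Hypothetical builder: wraps a plain function, no Sequence required.
// (Assumption: the function receives the random source.)
fun <T> Arb.Companion.from(f: (Random) -> T): Arb<T> = object : Arb<T>() {
    override fun value(rs: Random): Sample<T> = Sample(f(rs))
}

// Usage: a six-sided die with no edge cases.
val dice: Arb<Int> = Arb.from { rs -> rs.nextInt(1, 7) }
```

Because the builder lives on the companion, it would not clash with an existing top-level `arb(...)` function that expects a `Sequence`.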
m
ah nice. Yeah i was thinking along the same line
Arb.create
already exists with the correct types, so i wonder if I can just add several more creates
meanwhile i'm hitting another wall: bind.. still thinking how to make that work with the edgecases with the single emission model..
hmm, looking at scalacheck, the way they do it is by using varying frequency i.e.
Arb.choose(vararg pairs: Pair<Int, Arb<A>>)
in kotest https://github.com/typelevel/scalacheck/blob/master/src/main/scala/org/scalacheck/Gen.scala#L1224-L1232
@sam i can make that change but will need to run it by you as to whether that design aligns to what you have in mind
s
Why is bind problematic with the single emission model ?
Do you mean because of how to handle edgecases? If so, I would just ignore edge cases. They don't really make much sense once you get into composition
Alternatively, the edge cases in bind, could be the permutation of the edge cases of each contributing arb
m
yeah tried the permutation approach (cartesian product) but it failed as soon as there's an arb with an empty edgecases. i.e.
[1,2,3]
['a', 'b', 'c']
[]

-> yields []
s
Right, I guess in that case, you just return no edge cases.
If you're trying to compose a, b, c into (a, b, c) and c has no edge cases, then you can't have a union with edge cases.
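The empty-product behaviour can be reproduced with a small standalone sketch. `cartesianProduct` is a hypothetical helper written for this illustration, not a kotest function.

```kotlin
// Cartesian product of several edge-case lists.
// Starts from a single empty tuple and extends every tuple with every
// element of the next list; one empty list therefore empties the result.
fun <A> cartesianProduct(lists: List<List<A>>): List<List<A>> =
    lists.fold(listOf(emptyList<A>())) { acc, next ->
        acc.flatMap { prefix -> next.map { prefix + it } }
    }

// Two non-empty inputs give 3 * 3 = 9 combinations.
val twoInputs = cartesianProduct(
    listOf(listOf<Any>(1, 2, 3), listOf<Any>('a', 'b', 'c'))
)

// Adding an input without edge cases gives 3 * 3 * 0 = 0 combinations.
val withEmpty = cartesianProduct(
    listOf(listOf<Any>(1, 2, 3), listOf<Any>('a', 'b', 'c'), emptyList<Any>())
)
```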
m
yeah it's a product after all.. alternatively i'm pretty sure the approach using
Arb.choose
would work. as in, when you do a bind, you'd assign something like weight 1 to the edge cases, and weight 9 to the randomized values
edgecases i feel is one of the most powerful feature in proptesting so that's one thing that i don't want to sacrifice
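A rough sketch of the weighted idea, with weight 1 for edge cases and weight 9 for random values. `weightedPick` is a hypothetical helper, `kotlin.random.Random` stands in for `RandomSource`, and falling back to the random generator when there are no edge cases is an assumption of this sketch.

```kotlin
import kotlin.random.Random

// Picks an edge case with probability 1/10, otherwise a random value.
// With no edge cases, the full weight goes to the random generator.
fun <A> weightedPick(edgecases: List<A>, random: (Random) -> A, rs: Random): A {
    val useEdge = edgecases.isNotEmpty() && rs.nextInt(10) == 0
    return if (useEdge) edgecases[rs.nextInt(edgecases.size)] else random(rs)
}

// Usage: 0 is the only edge case, 7 plays the role of "some random value".
val rs = Random(42)
val picks = (1..100).map { weightedPick(listOf(0), { 7 }, rs) }
```

Note that nothing here guarantees an edge case is ever picked; over a short run the 0 may simply never appear.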
s
I'm not sure how many other prop libraries do edge cases like kotest
I think it's acceptable to say, if you're using bind and one of the contributing arbs doesn't provide edge cases, then the product doesn't have edge cases either.
All the "basic" types have edge cases, so most of the time, your bind will too
Another alternative is this - take the permutations of the edge cases and if an arb has zero edge cases, we treat it as an edge case of 1, where that 1 value is random. If all arbs have zero edge cases, then bind has zero edge cases.
[1,2,3]
['a', 'b', 'c']
[] -> we treat this as [random]
That would give you [1, a, random], [2, a, random] and so on
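A sketch of this variant, under a few assumptions: `edgecaseProduct` is a hypothetical helper, `kotlin.random.Random` stands in for `RandomSource`, and the random stand-in value is drawn once per input rather than once per tuple.

```kotlin
import kotlin.random.Random

// Each input is a pair of (edge cases, random generator).
// An input without edge cases contributes a single random value;
// if no input has edge cases, the product has none either.
fun <A> edgecaseProduct(
    inputs: List<Pair<List<A>, (Random) -> A>>,
    rs: Random
): List<List<A>> {
    if (inputs.all { it.first.isEmpty() }) return emptyList()
    val filled = inputs.map { (edges, gen) ->
        if (edges.isEmpty()) listOf(gen(rs)) else edges
    }
    return filled.fold(listOf(emptyList<A>())) { acc, next ->
        acc.flatMap { prefix -> next.map { prefix + it } }
    }
}

// Usage: the third input has no edge cases, so it yields one random value.
val gen: (Random) -> Int = { it.nextInt(100, 200) }
val combos = edgecaseProduct(
    listOf(listOf(1, 2, 3) to gen, listOf(10, 20, 30) to gen, emptyList<Int>() to gen),
    Random(1)
)
```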
m
hang on - edgecases don't have access to the random seed, do they?
s
Yeah that's a good point
we could introduce it
m
Hmm.. we can do that, i'm not entirely sure that fits with the interface. I kinda like the edgecases being a list right now. So far we have a couple of options:
• cartesian product, with an empty list treated as a randomized list of 1. this needs access to rs
• the scalacheck approach using weighted arbs. no change needed in the interface.
I might bang my head to the keyboard a few more times for some inspiration.
s
How does the scalacheck one work? If the arb has no edge cases, then what does the weighting do?
m
it's going to give the full weight to
value(rs)
i suppose
s
I think the weighting in scalacheck is how they do edgecases. Give a bigger weighting to edge cases. I don't like that because you're not guaranteed to get edge cases, which I feel you should be.
I think any function accepting an int should always be tested with 0 for example. Not just "likely" tested with 0.
m
yeah agreed
s
I also agree with you that I like edgecases to not require a random source.
The only other thing I can think of, is to introduce a subtype of Arb,
ComposedArb
that has extra method(s) for dealing with this kind of thing.
You would require the property test framework to then be aware of the extra value in the ADT
My final suggestion is to have bind accept a random parameter of its own. This would require the user to provide their own random instance, or it could default.
m
I'm tempted to experiment with the ComposedArb idea. The final suggestion feels quite leaky as an abstraction for the users.
s
Have a play with ComposedArb and see what you can come up with. Probably some function that generates edge cases from a random source delegating to the underlying arbs.
❤️ 1
m
what's in my head right now is as if
edgecases(): Exhaustive<A>
(conceptually)
s
I think it would need to stay as List to avoid breakage
m
yeah definitely a list - sorry that's just how it felt like in my head.
👍🏻 1
because exhaustive.toArb does shove the values into a list and randomly chooses one
s
Right
abstract class ComposedArb<out A> : Gen<A>() {

   abstract fun edgecases(rs: RandomSource): List<A>

   abstract fun value(rs: RandomSource): Sample<A>
}
It doesn't feel quite right, given that it's just an Arb with the rs parameter in edgecases
m
indeed 🤔
s
What about if edgecases was to return an ADT itself.
m
something like a
(RandomSource) -> List<A>
kleisli?
s
sealed class Edgecases<out A> {

   abstract fun values(rs: RandomSource): List<A>

   // fixed edge cases, the random source is ignored
   class Static<out A>(val values: List<A>) : Edgecases<A>() {
      override fun values(rs: RandomSource): List<A> = values
   }
   // edge cases computed from the random source
   class Dynamic<out A>(val fn: (RandomSource) -> List<A>) : Edgecases<A>() {
      override fun values(rs: RandomSource): List<A> = fn(rs)
   }
}
yea
m
ha that's so good
s
The trick is making that work with the existing signature
Again we could introduce a sibling function,
fun edges(): Edgecases<A>
🎖️ 1
m
i think this is less of a user-facing thing, the old builder would still work because it's
_ -> List<A>
s
abstract class Arb<out A> : Gen<A>() {

   open fun edges(): Edgecases<A> = Edgecases.Static(edgecases())

   abstract fun edgecases(): List<A>

   abstract fun values(rs: RandomSource): Sequence<Sample<A>>

   companion object
}
Yes, some people do override Arb directly though. I also completely rewrote the prop test framework for 4.0.0 so I'm loath to introduce any breaking change now, no matter how small. Best to have stability for users who spent the time converting from 3.x to 4.x
m
ah yeah me dumb, didn't consider peeps who override edgecases directly
s
But that works, if you override edgecases, you can continue to do so. Or you can override edges().
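A self-contained sketch of how both override styles could coexist. The `Edgecases` and `Arb` types here are simplified stand-ins mirroring the shapes discussed above (with `kotlin.random.Random` in place of `RandomSource` and `value` returning a bare `A`); `LegacyIntArb` and `PairArb` are hypothetical examples.

```kotlin
import kotlin.random.Random

sealed class Edgecases<out A> {
    abstract fun values(rs: Random): List<A>

    class Static<out A>(private val fixed: List<A>) : Edgecases<A>() {
        override fun values(rs: Random): List<A> = fixed
    }

    class Dynamic<out A>(private val fn: (Random) -> List<A>) : Edgecases<A>() {
        override fun values(rs: Random): List<A> = fn(rs)
    }
}

abstract class Arb<out A> {
    open fun edgecases(): List<A> = emptyList()

    // The default delegates to the old function, so existing overrides keep working.
    open fun edges(): Edgecases<A> = Edgecases.Static(edgecases())

    abstract fun value(rs: Random): A
}

// Existing style: only the list-based function is overridden.
class LegacyIntArb : Arb<Int>() {
    override fun edgecases(): List<Int> = listOf(0, Int.MAX_VALUE)
    override fun value(rs: Random): Int = rs.nextInt()
}

// New style: edge cases that need the random source override edges().
class PairArb(private val a: Arb<Int>) : Arb<Pair<Int, Int>>() {
    override fun edges(): Edgecases<Pair<Int, Int>> =
        Edgecases.Dynamic { rs -> a.edgecases().map { it to rs.nextInt() } }

    override fun value(rs: Random): Pair<Int, Int> = a.value(rs) to rs.nextInt()
}
```

A framework written against this shape would only ever call `edges()`, which is what lets the list-based override remain valid.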
m
users who spent the time converting from 3.x to 4.x
i can relate lol
😂 1
s
I think I prefer this to composed arb
I don't like the terminology of static and dynamic, so better names perhaps
Maybe we don't even bother with the ADT,
typealias Edgecases<A> = (RandomSource) -> List<A>
m
or simply a data class for now:
data class Edgecases<A>(val edges: (RandomSource) -> List<A>)
i'll get something ready,
s
Although, arguing with myself, there's not much difference between returning a function that accepts a random source and just passing random source into the function
Let's take the Sequence -> Sample problem first, and leave bind as it is for now. Then we can introduce edge cases as a second PR.
✔️ 1
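sealed class Edgecases<out A> {
   // the nested class named List shadows the stdlib List in this scope, hence the qualified names
   abstract fun values(rs: RandomSource): kotlin.collections.List<A>
   class List<out A>(val values: kotlin.collections.List<A>) : Edgecases<A>() {
      override fun values(rs: RandomSource): kotlin.collections.List<A> = values
   }
   class Randomized<out A>(val fn: (RandomSource) -> kotlin.collections.List<A>) : Edgecases<A>() {
      override fun values(rs: RandomSource): kotlin.collections.List<A> = fn(rs)
   }
}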
m
awesome! yeah personally i like having something that's typed (like
Sample<A>
and in this case
Edgecases<A>
), which makes it easier to enrich and refactor in the future.
👍🏻 1
s
Ok sounds like a plan
m
thanks @sam!! that's awesome, let me prep something for you
👍🏻 1
@sam phew - i suppose i'll check this in first - the first PR to introduce single emission is up. https://github.com/kotest/kotest/pull/1688
I refrained from changing all the arbs in kotest-property. I suppose i'd need to focus on backward compatibility (and given that we haven't introduced
Edgecases<A>
yet)
@sam qq - i'm currently doing some experiment on bind with
Edgecases<A>
i'm not entirely convinced with the cartesian product approach, as the size of minimum iterations can explode with the number of edgecase combinations. i.e.
a - [1, 2, 3]
b - [1, 2, 3, 4]
c - [1, 2]
d - [1, 2, 3, 4, 5]
e - [] - randomize 1 element
if we were to compute the product we'll end up with 3 * 4 * 2 * 5 * 1 = 120 minimum iterations. currently Kotest takes the edgecases linearly due to generate(rs) and iterator.
an alternative would be to follow what kotest currently does (linearly take them), but then the combo between a,b,c,d, and e will be very deterministic:
(a, b, c, d, e)
(1, 1, 1, 1, r)
(2, 2, 2, 2, r)
(3, 3, r, 3, r)
(r, 4, r, 4, r)
(r, r, r, 5, r)
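Both numbers above can be checked with a short sketch. `productSize` and `linearRows` are hypothetical helpers written for this illustration; `null` plays the role of `r`, a slot that would be filled with a random value.

```kotlin
// Number of combinations when an input without edge cases counts as 1.
fun productSize(inputs: List<List<Int>>): Int =
    inputs.fold(1) { acc, edges -> acc * maxOf(edges.size, 1) }

// Linear selection: row i takes the i-th edge case of every input,
// or null once that input has run out of edge cases.
fun linearRows(inputs: List<List<Int>>): List<List<Int?>> {
    val rows = inputs.maxOfOrNull { it.size } ?: 0
    return (0 until rows).map { i -> inputs.map { it.getOrNull(i) } }
}

val inputs = listOf(
    listOf(1, 2, 3),        // a
    listOf(1, 2, 3, 4),     // b
    listOf(1, 2),           // c
    listOf(1, 2, 3, 4, 5),  // d
    emptyList<Int>()        // e
)
val combinations = productSize(inputs)
val rows = linearRows(inputs)
```

The linear variant needs only as many iterations as the longest edge-case list (5 here) instead of the full product (120), at the cost of always pairing the same positions together.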
I start to contemplate about this:
Do you mean because of how to handle edgecases? If so, I would just ignore edge cases. They don't really make much sense once you get into composition
questioning what's the correct thing to do in
bind
and
flatMap
case..
we can somewhat shuffle the list first (because we have access to random seed). However, in terms of flatMap the size of the list is still a problem. we can randomly pick candidates out of the edgecases, but that's going to be similar to the scalacheck approach using frequency 🤔
s
Tricky questions. The point of edgecases is to make some parts of the selection of values deterministic. Like I believe it's important to always test (Int) -> T with 0 for example. Most (all?) other frameworks are not going to guarantee you'll get the 0 every time.
So it appears our options are: a) combinatorial explosion b) linear select losing some combinations c) randomize the edge cases
a) is clearly the best option if it would work. I prefer b over c but other people may have a different opinion.
The framework will raise an error if we try to use an arb that generates more edge cases than the iteration count of our prop test. So if you have 2500 edge cases, and you try to use that arb in a forAll that only has 1000 iterations, it errors.
So that puts the focus on the user to not use bind in situations where you will end up with millions of edge cases (or you increase your iteration count appropriately).
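As a rough illustration of that guard (not the actual kotest check; `requireEnoughIterations` is a hypothetical name and the message text is made up):

```kotlin
// Fails when the edge cases alone would not fit into the configured iterations.
fun requireEnoughIterations(edgecaseCount: Int, iterations: Int) {
    require(edgecaseCount <= iterations) {
        "Need at least $edgecaseCount iterations to cover all edge cases, but only $iterations configured"
    }
}

// 2500 edge cases with 1000 iterations is rejected; 120 edge cases fit.
val tooMany = runCatching { requireEnoughIterations(2500, 1000) }
val fits = runCatching { requireEnoughIterations(120, 1000) }
```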
An interesting function to add would be
Arb.randomOnly()
which returns a copy of the arb but with the edge cases removed (that randomOnly name is a bit lame though) and Arb.withEdgeCases to copy the arb with different edge cases.
Then you could use bind, and drop the edge cases from one or more of the inputs, or all of them from the result.
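A possible shape for these two functions, sketched against a simplified stand-in `Arb` (not the real kotest class, with `kotlin.random.Random` in place of `RandomSource`). The names follow the message above; the implementation is an assumption.

```kotlin
import kotlin.random.Random

// Simplified stand-in for illustration only.
abstract class Arb<out A> {
    abstract fun edgecases(): List<A>
    abstract fun value(rs: Random): A
}

// Copy of the arb with the given edge cases; random values are unchanged.
fun <A> Arb<A>.withEdgeCases(edges: List<A>): Arb<A> {
    val self = this
    return object : Arb<A>() {
        override fun edgecases(): List<A> = edges
        override fun value(rs: Random): A = self.value(rs)
    }
}

// Copy of the arb with all edge cases removed.
fun <A> Arb<A>.randomOnly(): Arb<A> = withEdgeCases(emptyList())

// Usage
val ints: Arb<Int> = object : Arb<Int>() {
    override fun edgecases(): List<Int> = listOf(0, Int.MIN_VALUE, Int.MAX_VALUE)
    override fun value(rs: Random): Int = rs.nextInt()
}
val noEdges = ints.randomOnly()
val custom = ints.withEdgeCases(listOf(42))
```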
m
we can even offload those decisions to the user at execution. With the single-emission and new edgecases model that supports random seed, it's possible to encode both behaviours at time of generation. I'm thinking out loud here, but it might look like this:
fun generate(rs: RandomSource, edgesSamplingMode: Edgecases.SamplingMode = Edgecases.SamplingMode.Exhaustive) = TODO("implementation detail")

// i'm making stuff up here
sealed class SamplingMode {
    object Exhaustive : SamplingMode() // all exhaustive permutations

    // if we want to dirty our generated distribution with edgecases with fixed probability
    data class FixedProbabilitySampling(val samplingProbability: Double = 0.1) : SamplingMode()

    // if we want more control over the ratio of the distribution
    data class DynamicProbabilitySampling(val startProbability: Double = 1.0, val targetProbability: Double = 0.1, val decayFunction: (previousProbability: Double, iterations: Int) -> Double) : SamplingMode()
}
for these options the advantage of shuffling is so that we can test all edgecases combinations uniformly, i.e.
(a, b, c, d, e)
(1, r, 1, 3, r)
(2, 2, 2, 2, r)
(3, 1, r, r, r)
(r, 4, r, 1, r)
(r, 3, r, 5, r)
It's not as powerful as the exhaustive combinations, because it gives only an approximation of it. If things fail, dev will be presented with the random seed which they can use. I won't be surprised if this is a nice compromise for a system with many inputs.
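A sketch of the shuffled variant. `shuffledRows` is a hypothetical helper, `kotlin.random.Random` stands in for `RandomSource`, and `null` again marks a slot to be filled with a random value. Each input's edge cases are padded to the longest list and shuffled independently with the seeded source.

```kotlin
import kotlin.random.Random

// Pads every input to the same length with nulls, shuffles each column
// with the seeded source, then reads the columns row by row.
fun shuffledRows(inputs: List<List<Int>>, rs: Random): List<List<Int?>> {
    val rows = inputs.maxOfOrNull { it.size } ?: 0
    val columns = inputs.map { edges ->
        val padded: List<Int?> = List(rows) { i -> edges.getOrNull(i) }
        padded.shuffled(rs)
    }
    return (0 until rows).map { i -> columns.map { it[i] } }
}

val inputs = listOf(
    listOf(1, 2, 3),        // a
    listOf(1, 2, 3, 4),     // b
    listOf(1, 2),           // c
    listOf(1, 2, 3, 4, 5),  // d
    emptyList<Int>()        // e
)
val first = shuffledRows(inputs, Random(7))
val replay = shuffledRows(inputs, Random(7))
```

Every edge case still appears exactly once per input, and replaying with the same seed reproduces the same pairing, which is what makes a failing run repeatable.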
s
It's a solution but it does go against the concept of edgecases a bit
m
yeah i see your perspective - which i also strongly agree with. (I just happen to have been dealing with some developers who have some perspective around this). I think I like this approach. Users who want to transform their edgecases to more of a probabilistic Arbitrary can do that instead. Meanwhile I believe Kotest should do what it should do, i.e. getting all the edgecases permutations.
An interesting function to add would be
Arb.randomOnly()
which returns a copy of the arb but with the edge cases removed (that randomOnly name is a bit lame though) and Arb.withEdgeCases to copy the arb with different edge cases.
I believe I'll close #1646 and raise a new issue for this
s
Alright cool
we're making great improvements with this work
or you are
m
Thanks to you! I like the framework. And my secret agenda is to make Kotest the go-to standard for testing Kotlin projects in my company. 😉
it's by far superior to all the others
❤️ 1
and to me, the major selling point is definitely the property testing bit
💯 1