# kotest
c
I'm learning property testing. I want to generate a blank string and a non-blank string. My generators for each are:
```kotlin
val genBlankString: Arb<String> = Arb.string().filter { it.isBlank() }
val genNonBlankString: Arb<String> = Arb.string().filter { it.isNotBlank() }
```
What are the drawbacks of this approach? Specifically, what are the drawbacks of using `filter`? Does it reduce the number of tests that are being run and the comprehensiveness of the tests?
l
There are no major drawbacks to using `filter`. The sequence generated by `Arb.string()` is lazy and is filtered as elements are generated. The problem with your approach specifically is that getting a blank string out of all the possible strings is very unlikely, so your test will probably take a long time (if not forever) to generate enough strings to give you plenty of blank strings to work with. I would instead check the parameters of `Arb.string()`, as they can be tuned to your liking:
```kotlin
// Signature only: the defaults show what Arb.string() draws from.
fun Arb.Companion.string(
   minSize: Int = 0,
   maxSize: Int = 100,
   codepoints: Arb<Codepoint> = Codepoint.printableAscii()
): Arb<String>
```
It's not a `filter` issue specifically, but the way generation works.
The number of tests should remain the same either way (1000 by default, IIRC).
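A rough way to see the point about generation is a plain-Kotlin model of the three `Arb.string()` parameters. `randomString`, `blankFraction`, and the code point lists below are hypothetical names, and the model assumes lengths and characters are drawn uniformly. That is a simplification: kotest's real generator also injects edge cases such as the empty string. Under that assumption only about 1 in 100 default-sized strings is blank (almost all of them empty), and with `minSize = 1` it drops to roughly 1 in 10,000, whereas restricting the code points to whitespace makes every string blank with no filtering at all.

```kotlin
import kotlin.random.Random

// Printable ASCII (32..126), mirroring the default Codepoint.printableAscii().
val printableAscii: List<Int> = (32..126).toList()

// TAB, LF, VT, FF, CR, SPACE: every one of these satisfies Char.isWhitespace().
val asciiWhitespace: List<Int> = listOf(9, 10, 11, 12, 13, 32)

// Hypothetical stand-in for Arb.string(minSize, maxSize, codepoints): pick a
// length uniformly in minSize..maxSize, then draw each character uniformly
// from `codepoints`. Not kotest's implementation (no edge cases, no shrinking).
fun randomString(
    random: Random,
    minSize: Int = 0,
    maxSize: Int = 100,
    codepoints: List<Int> = printableAscii,
): String {
    val length = random.nextInt(minSize, maxSize + 1)
    return buildString {
        repeat(length) { appendCodePoint(codepoints.random(random)) }
    }
}

// Fraction of `samples` generated strings for which isBlank() holds.
fun blankFraction(
    random: Random,
    samples: Int,
    codepoints: List<Int>,
    minSize: Int = 0,
): Double {
    val blanks = (1..samples).count {
        randomString(random, minSize = minSize, codepoints = codepoints).isBlank()
    }
    return blanks.toDouble() / samples
}
```

In this model a `filter { it.isBlank() }` would still hand back the requested number of values, which matches the point that the test count does not change; it simply discards about 99 draws for every blank string it keeps.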
c
Thanks, @LeoColman
I assume you mean I should look into the `codepoints` argument.
l
I suppose the size of the string can help too, as a string of size 0 is considered blank by `isBlank()` (although not in the sense you mean).
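The empty-versus-blank distinction can be made explicit in a test. In Kotlin, `isEmpty()` is true only for `""`, while `isBlank()` is also true for strings made only of whitespace. The `Kind` enum, `classify`, and `coversAllKinds` below are hypothetical helpers, a sketch of how a test could check that a batch of generated strings hits all three cases.

```kotlin
// Three mutually exclusive kinds of string, as seen by isEmpty()/isBlank().
enum class Kind { EMPTY, BLANK_NOT_EMPTY, NOT_BLANK }

fun classify(s: String): Kind = when {
    s.isEmpty() -> Kind.EMPTY            // "" is empty, and also blank
    s.isBlank() -> Kind.BLANK_NOT_EMPTY  // e.g. " " or "\t\n"
    else -> Kind.NOT_BLANK               // contains at least one non-whitespace char
}

// True if `samples` contains at least one string of every Kind.
fun coversAllKinds(samples: Iterable<String>): Boolean =
    samples.map(::classify).toSet() == Kind.values().toSet()
```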
c
I want to test both empty and blank strings.
l
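```kotlin
import io.kotest.property.Arb
import io.kotest.property.arbitrary.Codepoint
import io.kotest.property.arbitrary.of

fun Codepoint.Companion.whitespace(): Arb<Codepoint> =
   Arb.of(listOf(
      9,  // TAB
      10, // LINE FEED
      11, // LINE TABULATION
      12, // FORM FEED
      13, // CARRIAGE RETURN
      32, // SPACE
   ).map(::Codepoint))
```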
This is probably what you want for generating strings made of whitespace.
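Building on that, here is a plain-Kotlin sketch of generating both kinds of string by construction rather than by `filter`. `stringOf`, `genBlank`, and `genNonBlank` are hypothetical names. In kotest the blank case could likely be written as `Arb.string(codepoints = Codepoint.whitespace())` with the extension above; the non-blank construction below is only a model of the idea, not a kotest API.

```kotlin
import kotlin.random.Random

// The same code points as the Codepoint.whitespace() extension above.
val whitespaceCodepoints: List<Int> = listOf(9, 10, 11, 12, 13, 32)

// Printable ASCII without SPACE: none of these is whitespace.
val visibleAscii: List<Int> = (33..126).toList()

// Printable ASCII including SPACE, for the unconstrained part of a string.
val printableAscii: List<Int> = (32..126).toList()

// Hypothetical helper: `length` characters drawn uniformly from `codepoints`.
fun stringOf(random: Random, length: Int, codepoints: List<Int>): String =
    buildString { repeat(length) { appendCodePoint(codepoints.random(random)) } }

// Blank by construction. minSize = 0 keeps "" in range, so both the empty
// case and the whitespace-only case come out of the same generator.
fun genBlank(random: Random, minSize: Int = 0, maxSize: Int = 10): String =
    stringOf(random, random.nextInt(minSize, maxSize + 1), whitespaceCodepoints)

// Non-blank by construction: optional whitespace padding, one guaranteed
// visible character, then arbitrary printable characters. No filter needed.
fun genNonBlank(random: Random, maxPadding: Int = 5, maxRest: Int = 10): String {
    val left = stringOf(random, random.nextInt(0, maxPadding + 1), whitespaceCodepoints)
    val anchor = stringOf(random, 1, visibleAscii)
    val rest = stringOf(random, random.nextInt(0, maxRest + 1), printableAscii)
    val right = stringOf(random, random.nextInt(0, maxPadding + 1), whitespaceCodepoints)
    return left + anchor + rest + right
}
```

The whitespace padding keeps leading and trailing whitespace in the non-blank cases, which a plain `filter { it.isNotBlank() }` over printable ASCII would only occasionally produce.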
s
Outside of ASCII there are a ton of invisible characters in Unicode. This is most definitely not a complete list, but here are some examples: https://invisible-characters.com/
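One consequence worth checking: "invisible" and "blank" are not the same thing. `String.isBlank()` checks each character with `Char.isWhitespace()`, which, per the Kotlin docs, accepts the Unicode space, line, and paragraph separator categories plus a few ASCII control characters. So a no-break space counts as whitespace, while zero-width characters do not. The map and function below are hypothetical names, and the results are for Kotlin/JVM; they may shift with the Unicode version of the runtime.

```kotlin
// Characters that render as a space or as nothing at all, keyed by Unicode name.
val invisibleSamples: Map<String, Char> = mapOf(
    "NO-BREAK SPACE (U+00A0)" to '\u00A0',
    "IDEOGRAPHIC SPACE (U+3000)" to '\u3000',
    "ZERO WIDTH SPACE (U+200B)" to '\u200B',
    "ZERO WIDTH JOINER (U+200D)" to '\u200D',
    "WORD JOINER (U+2060)" to '\u2060',
)

// For each sample, whether Kotlin's Char.isWhitespace() (and hence isBlank())
// treats it as whitespace.
fun whitespaceReport(): Map<String, Boolean> =
    invisibleSamples.mapValues { (_, c) -> c.isWhitespace() }
```

A string made only of zero-width characters is therefore not blank to Kotlin, even though it looks empty on screen.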
c
@Scott Fedorov, how would you write this generator?
s
Leonardo's suggestion for generating whitespace is probably good; I'm just suggesting you might also want to include more than those ASCII values. I've seen bad actors regularly use Unicode characters that are well outside the ASCII range but invisible, in order to skirt text-matching rules. You could spend a lot of time deep-diving into Unicode, so pick whatever makes sense for your use case, but I don't think there's a generic method.
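If "blank" should mean whatever Kotlin's `isBlank()` accepts, one option is to derive the alphabet from `Char.isWhitespace()` instead of hard-coding it. This is a plain-Kotlin sketch with hypothetical names; in kotest the list could presumably be wrapped the same way as the `whitespace()` extension above, e.g. `Arb.of(unicodeWhitespace.map { Codepoint(it.code) })`. Invisible characters that are not whitespace, such as U+200B, are excluded by construction, so the adversarial case described here would still need its own hand-picked list.

```kotlin
import kotlin.random.Random

// Every BMP character that Kotlin's Char.isWhitespace() accepts. Deriving the
// list from the predicate keeps the generator aligned with String.isBlank();
// the exact contents depend on the Unicode version of the runtime.
val unicodeWhitespace: List<Char> =
    (Char.MIN_VALUE..Char.MAX_VALUE).filter { it.isWhitespace() }

// Hypothetical generator: blank strings (including "") of up to maxSize
// characters, each drawn uniformly from the full whitespace set.
fun genUnicodeBlank(random: Random, maxSize: Int = 10): String = buildString {
    repeat(random.nextInt(0, maxSize + 1)) {
        append(unicodeWhitespace.random(random))
    }
}
```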