Rob Elliot
09/18/2024, 10:03 AM
"[a-z]+".toRegex() into a memoized val.
Decompiling the generated bytecode shows that the compiler emits code that instantiates a new Regex every time the code is called (as you'd expect).
Is there some other optimisation taking place (in the Regex class's constructor? In the JVM?) which would prevent it endlessly recompiling the same regex? Is compiling the regex a sufficiently fast operation that it's stupid to be wasting time extracting a constant? Or am I right to care about this?
Hristijan
09/18/2024, 10:40 AM
lazy
Not sure about whether this is a correct approach too
Rob Elliot
09/18/2024, 10:41 AM
Szymon Jeziorski
09/18/2024, 10:52 AM
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State

// One pattern per line; ${'$'} produces a literal '$' inside the raw string.
// Only the first four patterns are used by the benchmark below.
private val regexStrings = """
    (\W|^)stock\s{0,3}tips(\W|${'$'})
    (\W|^)stock\s{0,3}tip(s){0,1}(\W|${'$'})
    (\W|^)[\w.\-]{0,25}@(yahoo|hotmail|gmail)\.com(\W|${'$'})
    (\W|^)po[#\-]{0,1}\s{0,1}\d{2}[\s-]{0,1}\d{4}(\W|${'$'})
    ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})${'$'}
""".trimIndent().lines().map { it.trim() }

@State(Scope.Benchmark)
class RegexBenchmark {
    private val regexString0 = regexStrings[0]
    private val regexString1 = regexStrings[1]
    private val regexString2 = regexStrings[2]
    private val regexString3 = regexStrings[3]

    // Compiled once, when the benchmark state is created.
    private val regex0 = regexString0.toRegex()
    private val regex1 = regexString1.toRegex()
    private val regex2 = regexString2.toRegex()
    private val regex3 = regexString3.toRegex()

    // Reuses the precompiled instances.
    @Benchmark
    fun memoized(): List<Regex> = buildList {
        add(regex0)
        add(regex1)
        add(regex2)
        add(regex3)
    }

    // Compiles all four patterns on every invocation.
    @Benchmark
    fun createdOnDemand(): List<Regex> = buildList {
        add(regexString0.toRegex())
        add(regexString1.toRegex())
        add(regexString2.toRegex())
        add(regexString3.toRegex())
    }
}
The results suggest that, unless I oversimplified the benchmark, there is no magic optimization under the hood, and it is often better performance-wise to cache regexes:
benchmarks summary:
Benchmark                       Mode  Cnt     Score     Error  Units
RegexBenchmark.createdOnDemand  avgt    5  2552.654 ± 269.883  ns/op
RegexBenchmark.memoized         avgt    5    12.873 ±   1.689  ns/op
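
One way to see the absence of caching directly, rather than through timing, is to compare the underlying compiled patterns. This is a minimal sketch assuming the JVM target, where kotlin.text.Regex wraps a java.util.regex.Pattern that toPattern() exposes; the function name sharesCompiledPattern is made up for illustration.

```kotlin
import java.util.regex.Pattern

// Hypothetical helper: returns true only if two separate toRegex() calls on the
// same pattern string end up backed by the very same java.util.regex.Pattern.
fun sharesCompiledPattern(pattern: String): Boolean {
    val first: Pattern = pattern.toRegex().toPattern()
    val second: Pattern = pattern.toRegex().toPattern()
    // Reference equality: a shared (cached) Pattern would be the same object.
    return first === second
}
```

If sharesCompiledPattern("[a-z]+") returns false, each call compiled its own Pattern, which fits the gap between the two benchmark rows.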
Rob Elliot
09/18/2024, 10:54 AM
Klitos Kyriacou
09/18/2024, 11:01 AM
Rob Elliot
09/18/2024, 11:07 AM
import org.apache.commons.lang3.RandomStringUtils
import kotlin.time.measureTime

fun main() {
    val elapsed = measureTime {
        repeat(1_000_000) {
            if (RandomStringUtils.random(10).matches("[a-z]+".toRegex())) {
                println("matched!")
            }
        }
    }
    println("took $elapsed")
}
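
For a side-by-side view, the same kind of loop can be run once with the regex compiled per call and once with it hoisted into a val. This is only a rough sketch: the names lowercaseRegex, countMatchesPerCall, countMatchesHoisted and compareTimings are made up here, fixed inputs replace the random strings so no extra dependency is needed, and measureTime without JMH-style warm-up gives indicative numbers at best.

```kotlin
import kotlin.time.Duration
import kotlin.time.measureTime

// Compiled once; every call to countMatchesHoisted reuses this instance.
private val lowercaseRegex = "[a-z]+".toRegex()

// Compiles the pattern again for every input string.
fun countMatchesPerCall(inputs: List<String>): Int =
    inputs.count { it.matches("[a-z]+".toRegex()) }

// Reuses the hoisted regex for every input string.
fun countMatchesHoisted(inputs: List<String>): Int =
    inputs.count { it.matches(lowercaseRegex) }

// Returns (per-call duration, hoisted duration) for the given number of rounds.
fun compareTimings(inputs: List<String>, rounds: Int = 1_000): Pair<Duration, Duration> {
    var sink = 0 // accumulate results so the work is not trivially discarded
    val perCall = measureTime { repeat(rounds) { sink += countMatchesPerCall(inputs) } }
    val hoisted = measureTime { repeat(rounds) { sink += countMatchesHoisted(inputs) } }
    check(sink >= 0)
    return perCall to hoisted
}
```

Calling compareTimings(listOf("abc", "ABC", "a1", "xyz")) and printing both durations gives a quick impression; a proper harness such as JMH is still the more trustworthy measurement.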
Klitos Kyriacou
09/18/2024, 11:19 AM
Rob Elliot
09/18/2024, 11:24 AM
Rob Elliot
09/18/2024, 12:00 PM
Rob Elliot
09/18/2024, 12:14 PM
^[a-zA-Z0-9]+([-.][a-zA-Z0-9]+)*\.?$
Szymon Jeziorski
09/18/2024, 12:16 PM
import org.apache.commons.lang3.RandomStringUtils
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State
import kotlin.random.Random

// Four short patterns, one per line.
private val simpleRegexStrings = """
    [a-zA-Z]{2,}
    (\w+)\s*(\w+)
    \d{3}-\d{3}-\d{4}
    \d+
""".trimIndent().lines().map { it.trim() }

// Five longer patterns; ${'$'} produces a literal '$' inside the raw string.
private val complexRegexStrings = """
    (\W|^)stock\s{0,3}tips(\W|${'$'})
    (\W|^)stock\s{0,3}tip(s){0,1}(\W|${'$'})
    (\W|^)[\w.\-]{0,25}@(yahoo|hotmail|gmail)\.com(\W|${'$'})
    (\W|^)po[#\-]{0,1}\s{0,1}\d{2}[\s-]{0,1}\d{4}(\W|${'$'})
    ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})${'$'}
""".trimIndent().lines().map { it.trim() }

// 50 random inputs of length 10..99, ten from each character class.
private val testedStrings = with(RandomStringUtils.secure()) {
    buildList {
        repeat(10) { this += next(Random.nextInt(10, 100)) }
        repeat(10) { this += nextAscii(Random.nextInt(10, 100)) }
        repeat(10) { this += nextAlphabetic(Random.nextInt(10, 100)) }
        repeat(10) { this += nextNumeric(Random.nextInt(10, 100)) }
        repeat(10) { this += nextAlphanumeric(Random.nextInt(10, 100)) }
    }
}

@State(Scope.Benchmark)
class RegexBenchmark {
    // Compiled once, when the benchmark state is created.
    private val memoizedSimpleRegexes = simpleRegexStrings.map { it.toRegex() }
    private val memoizedComplexRegexes = complexRegexStrings.map { it.toRegex() }

    @Benchmark
    fun simpleMemoizedPlusMatches(): List<Boolean> =
        testedStrings.flatMap { string ->
            memoizedSimpleRegexes.map { it.matches(string) }
        }

    @Benchmark
    fun simpleCreatedOnDemandPlusMatches(): List<Boolean> =
        testedStrings.flatMap { string ->
            simpleRegexStrings.map { it.toRegex().matches(string) }
        }

    @Benchmark
    fun simpleMemoizedPlusFindAll(): List<MatchResult> =
        testedStrings.flatMap { string ->
            memoizedSimpleRegexes.flatMap { it.findAll(string).toList() }
        }

    @Benchmark
    fun simpleCreatedOnDemandPlusFindAll(): List<MatchResult> =
        testedStrings.flatMap { string ->
            simpleRegexStrings.flatMap { it.toRegex().findAll(string).toList() }
        }

    @Benchmark
    fun complexMemoizedPlusMatches(): List<Boolean> =
        testedStrings.flatMap { string ->
            memoizedComplexRegexes.map { it.matches(string) }
        }

    @Benchmark
    fun complexCreatedOnDemandPlusMatches(): List<Boolean> =
        testedStrings.flatMap { string ->
            complexRegexStrings.map { it.toRegex().matches(string) }
        }

    @Benchmark
    fun complexMemoizedPlusFindAll(): List<MatchResult> =
        testedStrings.flatMap { string ->
            memoizedComplexRegexes.flatMap { it.findAll(string).toList() }
        }

    @Benchmark
    fun complexCreatedOnDemandPlusFindAll(): List<MatchResult> =
        testedStrings.flatMap { string ->
            complexRegexStrings.flatMap { it.toRegex().findAll(string).toList() }
        }
}
benchmarks summary:
Benchmark                                         Mode  Cnt       Score       Error  Units
RegexBenchmark.complexCreatedOnDemandPlusFindAll  avgt    5  421853.928 ± 18149.835  ns/op
RegexBenchmark.complexMemoizedPlusFindAll         avgt    5  246762.121 ± 33902.417  ns/op
RegexBenchmark.complexCreatedOnDemandPlusMatches  avgt    5  235193.925 ± 13763.335  ns/op
RegexBenchmark.complexMemoizedPlusMatches         avgt    5   64234.799 ± 13706.492  ns/op
RegexBenchmark.simpleCreatedOnDemandPlusFindAll   avgt    5  190437.414 ±  7547.736  ns/op
RegexBenchmark.simpleMemoizedPlusFindAll          avgt    5  175379.259 ±  4604.692  ns/op
RegexBenchmark.simpleCreatedOnDemandPlusMatches   avgt    5   68223.847 ±  1464.906  ns/op
RegexBenchmark.simpleMemoizedPlusMatches          avgt    5   31435.911 ±  1861.502  ns/op
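
To put the summary in relative terms, the scores can be turned into a slowdown factor and a rough extra cost per compile. The sketch below copies the scores from the table and assumes each benchmark call performs 50 × (number of patterns) compiles in the on-demand variants (50 tested strings, 4 simple or 5 complex patterns). The Comparison class and its property names are made up for illustration, and the error margins are ignored, so the figures are estimates only.

```kotlin
// Hypothetical holder for one on-demand vs memoized pair from the table.
data class Comparison(
    val name: String,
    val onDemandNs: Double,
    val memoizedNs: Double,
    val compilesPerOp: Int,
) {
    // How many times slower the on-demand variant is.
    val slowdown: Double get() = onDemandNs / memoizedNs

    // Extra time attributable to each compile, assuming all else is equal.
    val extraNsPerCompile: Double get() = (onDemandNs - memoizedNs) / compilesPerOp
}

val comparisons = listOf(
    Comparison("complex findAll", 421853.928, 246762.121, 50 * 5),
    Comparison("complex matches", 235193.925, 64234.799, 50 * 5),
    Comparison("simple findAll", 190437.414, 175379.259, 50 * 4),
    Comparison("simple matches", 68223.847, 31435.911, 50 * 4),
)

// One human-readable line per comparison.
fun describe(c: Comparison): String =
    "%s: %.2fx slower, about %.0f ns extra per compile"
        .format(c.name, c.slowdown, c.extraNsPerCompile)
```

Printing comparisons.map(::describe) shows the complex patterns costing several hundred nanoseconds per compile, the same order of magnitude as the earlier four-pattern benchmark, while the relative penalty shrinks when the matching work itself dominates (findAll).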
Rob Elliot
09/18/2024, 12:17 PM
Daniel Pitts
09/18/2024, 1:49 PM
Regex("\\w+") matches, but val userNameRegex = Regex("\\w+") tells me WHY we're matching it.
It also has the benefit, as minuscule as it may be, of reducing the number of compiles.
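
A minimal sketch of that naming approach, with the lazy variant mentioned earlier in the thread alongside it. The name userNameRegex comes from the message above; lazyUserNameRegex and isValidUserName are hypothetical names added for illustration.

```kotlin
// The name documents intent; the top-level val means the pattern is compiled once.
val userNameRegex = Regex("\\w+")

// Alternative: defer compilation until the regex is first used.
val lazyUserNameRegex: Regex by lazy { Regex("\\w+") }

// Hypothetical helper: true when the whole candidate consists of word characters.
fun isValidUserName(candidate: String): Boolean = userNameRegex.matches(candidate)
```

Whether lazy is worth it likely depends on whether the regex is always used; for a pattern needed on every code path, the plain val is simpler.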