https://kotlinlang.org logo
#getting-started
Title
# getting-started
j

james

02/14/2022, 10:38 PM
in Kotlin, what is the most efficient way to replace many strings inside another string? what I mean is, I could do:
Copy code
originalString
    .replace("cat", "dog")
    .replace("goat", "cow")
    .replace("horse", "pig")
..but chaining those
replace()
calls would become very inefficient I assume, even if I created a map to iterate over. does Kotlin have any functions I can use to do a single pass over
originalString
and replace X number of items?
e

ephemient

02/14/2022, 10:43 PM
Copy code
originalString.replace("cat|goat|horse".toRegex()) {
    when (val value = it.value) {
        "cat" -> "dog"
        "goat" -> "cow"
        "horse" -> "pig"
        else -> value
    }
}
👍 1
😮 3
j

james

02/14/2022, 11:28 PM
nice one, thanks mate!
a

Alexander Maryanovsky

02/15/2022, 5:42 AM
Note that the overhead of a regular expression is very high. Unless you're manipulating a very big string, this is going to be slower than just a chained
replace
.
e

ephemient

02/15/2022, 7:49 AM
that's true, Regex (which just wraps java.util.regex.Pattern on JVM) doesn't optimize this case, and incurs quite a bit of overhead. but if you have to take care with overlapping matches, it can be a bit challenging to handle with a loop of string replacements. this is something where an algorithm like https://github.com/robert-bor/aho-corasick can do far better than the naive approach at scale. running the following benchmark on my machine, I get the results
Copy code
Benchmark                                       (replacements)  (size)   Mode  Cnt        Score        Error  Units
StringReplaceBenchmark.replaceMultiple                   Small      10  thrpt    5  6843692.841 ± 403750.822  ops/s
StringReplaceBenchmark.replaceMultiple                   Small     100  thrpt    5   752203.152 ±  24401.086  ops/s
StringReplaceBenchmark.replaceMultiple                   Small    1000  thrpt    5    56490.448 ±    833.514  ops/s
StringReplaceBenchmark.replaceMultiple                  Medium      10  thrpt    5  3439929.575 ±  37718.221  ops/s
StringReplaceBenchmark.replaceMultiple                  Medium     100  thrpt    5   316184.481 ±   5520.951  ops/s
StringReplaceBenchmark.replaceMultiple                  Medium    1000  thrpt    5    18601.717 ±   1656.362  ops/s
StringReplaceBenchmark.replaceMultiple                   Large      10  thrpt    5   493234.512 ±  14192.615  ops/s
StringReplaceBenchmark.replaceMultiple                   Large     100  thrpt    5    60454.752 ±   2081.056  ops/s
StringReplaceBenchmark.replaceMultiple                   Large    1000  thrpt    5     2916.083 ±     91.152  ops/s
StringReplaceBenchmark.replaceRegex                      Small      10  thrpt    5   640718.250 ±   6910.791  ops/s
StringReplaceBenchmark.replaceRegex                      Small     100  thrpt    5    75116.168 ±    506.125  ops/s
StringReplaceBenchmark.replaceRegex                      Small    1000  thrpt    5     8765.309 ±    684.884  ops/s
StringReplaceBenchmark.replaceRegex                     Medium      10  thrpt    5   194739.721 ±   2636.998  ops/s
StringReplaceBenchmark.replaceRegex                     Medium     100  thrpt    5    43549.693 ±    650.096  ops/s
StringReplaceBenchmark.replaceRegex                     Medium    1000  thrpt    5     2896.253 ±     23.372  ops/s
StringReplaceBenchmark.replaceRegex                      Large      10  thrpt    5    21717.137 ±    258.507  ops/s
StringReplaceBenchmark.replaceRegex                      Large     100  thrpt    5     3740.151 ±     90.825  ops/s
StringReplaceBenchmark.replaceRegex                      Large    1000  thrpt    5      395.084 ±     15.892  ops/s
StringReplaceBenchmark.replaceRegexPrecompiled           Small      10  thrpt    5  1152230.965 ±  36441.508  ops/s
StringReplaceBenchmark.replaceRegexPrecompiled           Small     100  thrpt    5    89396.970 ±   2498.650  ops/s
StringReplaceBenchmark.replaceRegexPrecompiled           Small    1000  thrpt    5     8850.670 ±     94.337  ops/s
StringReplaceBenchmark.replaceRegexPrecompiled          Medium      10  thrpt    5   297080.038 ±   8322.604  ops/s
StringReplaceBenchmark.replaceRegexPrecompiled          Medium     100  thrpt    5    47088.269 ±   1587.499  ops/s
StringReplaceBenchmark.replaceRegexPrecompiled          Medium    1000  thrpt    5     2940.105 ±     50.378  ops/s
StringReplaceBenchmark.replaceRegexPrecompiled           Large      10  thrpt    5    34627.292 ±   1444.838  ops/s
StringReplaceBenchmark.replaceRegexPrecompiled           Large     100  thrpt    5     4001.184 ±    155.575  ops/s
StringReplaceBenchmark.replaceRegexPrecompiled           Large    1000  thrpt    5      408.340 ±     18.374  ops/s
StringReplaceBenchmark.replaceTrie                       Small      10  thrpt    5   475508.504 ±  27061.453  ops/s
StringReplaceBenchmark.replaceTrie                       Small     100  thrpt    5   101975.264 ±    771.657  ops/s
StringReplaceBenchmark.replaceTrie                       Small    1000  thrpt    5    10060.275 ±    166.347  ops/s
StringReplaceBenchmark.replaceTrie                      Medium      10  thrpt    5    89048.167 ±    573.991  ops/s
StringReplaceBenchmark.replaceTrie                      Medium     100  thrpt    5    43124.884 ±    290.183  ops/s
StringReplaceBenchmark.replaceTrie                      Medium    1000  thrpt    5     5730.146 ±    155.629  ops/s
StringReplaceBenchmark.replaceTrie                       Large      10  thrpt    5    10000.751 ±    356.272  ops/s
StringReplaceBenchmark.replaceTrie                       Large     100  thrpt    5     8499.478 ±    114.008  ops/s
StringReplaceBenchmark.replaceTrie                       Large    1000  thrpt    5     3197.324 ±     37.847  ops/s
StringReplaceBenchmark.replaceTriePrecompiled            Small      10  thrpt    5  1646352.116 ±  38731.483  ops/s
StringReplaceBenchmark.replaceTriePrecompiled            Small     100  thrpt    5   118071.785 ±   2940.900  ops/s
StringReplaceBenchmark.replaceTriePrecompiled            Small    1000  thrpt    5    10122.962 ±    320.628  ops/s
StringReplaceBenchmark.replaceTriePrecompiled           Medium      10  thrpt    5  1011767.809 ±  48454.186  ops/s
StringReplaceBenchmark.replaceTriePrecompiled           Medium     100  thrpt    5    76427.972 ±   1846.560  ops/s
StringReplaceBenchmark.replaceTriePrecompiled           Medium    1000  thrpt    5     6137.517 ±     83.122  ops/s
StringReplaceBenchmark.replaceTriePrecompiled            Large      10  thrpt    5   804546.386 ±  11122.620  ops/s
StringReplaceBenchmark.replaceTriePrecompiled            Large     100  thrpt    5    59592.308 ±   4187.163  ops/s
StringReplaceBenchmark.replaceTriePrecompiled            Large    1000  thrpt    5     4790.049 ±    384.746  ops/s
which shows some that string replace gets linearly worse (as expected) while Aho-Corasick grows much slower
r

Roukanken

02/15/2022, 9:39 AM
tbh, with an efficient kotlin/java library for regex, you would have basically Aho-Corasick there automatically 😄
e

ephemient

02/15/2022, 9:46 AM
yep! but it won't be hooked up to kotlin.text.Regex so you'll have to use separate types just like this example anyway
j

Joffrey

02/15/2022, 9:54 AM
TIL there is a kotlinx.benchmark multiplatform library that wraps JMH on JVM mind blown thanks @ephemient
👌 1
14 Views