# getting-started

v79

10/19/2023, 9:34 PM
I'm getting an unexpected behaviour with `String.split(regex)` when there are no matches. Does `split()` return "the rest of the string" when a regex match fails? Code in 🧵
```kotlin
fun main() {
    // Matches a block separator such as "--- #blockName"
    val regex = Regex("-{3} #")
    val willMatch = """
    ---
    apple: banana
    --- #matches
    Yay
    """
    val willNotMatch = """
    ---
    Not a valid match
    """
    println("Find the number of matches")
    println(regex.findAll(willMatch).count())    // expect 1, got 1
    println(regex.findAll(willNotMatch).count()) // expect 0, got 0

    println("Split and build a map from the matches. Why does size != count from above?")
    println(
        willMatch.split(regex)
            .map { it.trim() }
            .associate { it.substringBefore("\n").trim() to it.substringAfter("\n") }
            .size
    ) // expect 1, got 2
    println(
        willNotMatch.split(regex)
            .map { it.trim() }
            .associate { it.substringBefore("\n") to it.substringAfter("\n") }
            .size
    ) // expect 0, got 1
}
```

ephemient

10/19/2023, 9:41 PM
split, with a regex or not, always returns at least 1 segment
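A minimal sketch of that rule using the regex from the question (the input strings here are made up for illustration): with no match, `split` returns a one-element list holding the original string, and each match adds one more segment.

```kotlin
// The separator regex from the question above
val separator = Regex("-{3} #")

fun main() {
    val noMatch = "---\nNot a valid match"
    // No match: split returns a one-element list holding the whole input
    println(noMatch.split(separator) == listOf(noMatch)) // true

    // One match: two segments, the text before and after the separator
    println("intro --- #name body".split(separator))     // [intro , name body]
}
```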

hho

10/19/2023, 9:42 PM
Just as the Java API docs state:
> If the expression does not match any part of the input then the resulting array has just one element, namely this string.

https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#split-java.lang.String-int-

ephemient

10/19/2023, 9:43 PM
Java's `String.split` actually changed behavior between Java 7 and Java 8
Kotlin's `split` is always consistent, though
simpler example to demonstrate:
```kotlin
"a-b".split('-') == listOf("a", "b")
```
there is 1 match and 2 results
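Extending that example a little (a sketch; the inputs are arbitrary): for a non-empty delimiter, `split` always returns one more segment than there are delimiter occurrences, and Kotlin keeps empty segments at either edge instead of dropping them.

```kotlin
fun main() {
    // 1 match, 2 results
    println("a-b".split('-')) // [a, b]
    // 2 matches, 3 results: delimiters at the edges produce empty segments
    println("-a-".split('-')) // [, a, ]
    // 0 matches, 1 result: the input itself
    println("ab".split('-'))  // [ab]

    // The invariant: segments == occurrences + 1
    for (s in listOf("a-b", "-a-", "ab", "--")) {
        println(s.split('-').size == s.count { it == '-' } + 1) // true
    }
}
```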

v79

10/19/2023, 9:50 PM
Thanks for the confirmation. I guess I'll need to check for matches, then only do the split if the match succeeds.
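The check-then-split plan could be sketched as below, assuming an empty list is the desired result when there is no separator; `splitIfMatches` is a hypothetical helper name. Note that when there is a match, the segment before the first separator is still included.

```kotlin
val separator = Regex("-{3} #")

// Hypothetical helper: split only if the regex occurs at least once,
// otherwise return an empty list instead of the single whole-input segment.
fun splitIfMatches(text: String, regex: Regex): List<String> =
    if (regex.containsMatchIn(text)) text.split(regex) else emptyList()

fun main() {
    println(splitIfMatches("a --- #b", separator))          // [a , b]
    println(splitIfMatches("no separator here", separator)) // []
}
```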

ephemient

10/19/2023, 9:50 PM
(the weird edge cases come from empty strings or empty matches)
why do you need to check for matches first?
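A small sketch of the edge cases mentioned above (inputs chosen for illustration): an empty input still yields one empty segment, and an empty delimiter matches at every position, including both ends of the string.

```kotlin
fun main() {
    // Empty input, no match: one segment, the (empty) input itself
    println("".split(',').size) // 1
    // Empty delimiter: matches before every character and at the end,
    // so there are leading and trailing empty segments
    println("abc".split(""))    // [, a, b, c, ]
}
```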

v79

10/19/2023, 9:53 PM
I'm extracting text that may be split by named blocks ("--- #blockName"). I need the names and the text that follows. But I don't want to split when there is no block name.

ephemient

10/19/2023, 9:54 PM
so? No matter what, it sounds like you don't want the content before the first match, so `.split().drop(1)` always gets you what you want, whether there is a match or not
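A minimal sketch of the `.split().drop(1)` suggestion applied to the question's two inputs (re-indented with `trimIndent()`); `namedBlocks` is a hypothetical helper name, and the name is assumed to be the rest of the separator line.

```kotlin
val separator = Regex("-{3} #")

// drop(1) discards whatever precedes the first separator, so an input
// with no separator yields an empty map.
fun namedBlocks(text: String): Map<String, String> =
    text.split(separator)
        .drop(1)
        .associate { block ->
            // Name = rest of the separator line; body = everything after it
            // ("" if the block has no newline at all).
            block.substringBefore("\n").trim() to block.substringAfter("\n", "").trim()
        }

fun main() {
    val willMatch = """
        ---
        apple: banana
        --- #matches
        Yay
    """.trimIndent()
    val willNotMatch = """
        ---
        Not a valid match
    """.trimIndent()
    println(namedBlocks(willMatch))    // {matches=Yay}
    println(namedBlocks(willNotMatch)) // {}
}
```

Both sizes now line up with the `findAll` counts from the question (1 and 0).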
or possibly `split` isn't the right tool anyway, if you can potentially have both named and unnamed blocks
e.g. https://pl.kotl.in/ZAzlNnaNn can't be done with just `split()`
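The linked playground's code isn't reproduced here; as a separate, hypothetical sketch, one alternative is to locate separator lines with `Regex.findAll` and slice the text between consecutive matches. It assumes a separator is a whole line that is either `---` (unnamed block) or `--- #name` (named block); `Block` and `parseBlocks` are illustrative names.

```kotlin
// Assumed format: a separator line is "---" or "--- #name"
val separatorLine = Regex("""^---(?: #(\S+))?[ \t]*$""", RegexOption.MULTILINE)

// name == null marks an unnamed block
data class Block(val name: String?, val text: String)

fun parseBlocks(input: String): List<Block> {
    val matches = separatorLine.findAll(input).toList()
    return matches.mapIndexed { i, m ->
        // A block runs from the end of its separator to the start of the next one
        val end = if (i + 1 < matches.size) matches[i + 1].range.first else input.length
        // Group 1 is absent (null) for an unnamed "---" separator
        Block(m.groups[1]?.value, input.substring(m.range.last + 1, end).trim())
    }
}

fun main() {
    val text = """
        ---
        unnamed content
        --- #first
        first content
        --- #second
        second content
    """.trimIndent()
    parseBlocks(text).forEach(::println)
}
```

Like `.split().drop(1)`, this ignores any text before the first separator, but it keeps each block's name (or its absence) attached to the block.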