# stdlib
m
stdlib, embrace codePoints!
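```kotlin
inline fun CharSequence.forEachCodePoint(
    start: Int = 0, end: Int = length,
    action: (index: Int, codePoint: Int) -> Unit
) {
    var i = start // UTF-16 char offset, not a code point index
    while (i < end) {
        // Decodes a surrogate pair into one code point. Note: codePointAt only checks
        // against length, so a pair straddling `end` is still decoded whole.
        val codePoint = Character.codePointAt(this, i)
        action(i, codePoint)
        i += Character.charCount(codePoint) // advance by 1 or 2 chars
    }
}
```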
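A quick usage sketch to show what `index` means: it is the UTF-16 char offset, so it jumps by 2 across a supplementary character such as an emoji. The extension is repeated so the snippet compiles on its own; `offsetsAndCodePoints` is a hypothetical helper for illustration.

```kotlin
// Repeated from the message above so this snippet is self-contained.
inline fun CharSequence.forEachCodePoint(
    start: Int = 0, end: Int = length,
    action: (index: Int, codePoint: Int) -> Unit
) {
    var i = start
    while (i < end) {
        val codePoint = Character.codePointAt(this, i)
        action(i, codePoint)
        i += Character.charCount(codePoint)
    }
}

// Hypothetical helper: collects (char offset, code point) pairs.
fun offsetsAndCodePoints(text: CharSequence): List<Pair<Int, Int>> {
    val out = mutableListOf<Pair<Int, Int>>()
    text.forEachCodePoint { index, codePoint -> out += index to codePoint }
    return out
}

// offsetsAndCodePoints("a\uD83D\uDE00b") == [(0, 0x61), (1, 0x1F600), (3, 0x62)]
```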
m
To avoid the ambiguity of whether start/end/index are char offsets or code point indices, I just use a sequence and then `drop`/`take` as necessary. Not as efficient, but fine in most cases.
```kotlin
fun CharSequence?.asCodePointSequence(): Sequence<Int> {
    return if (this.isNullOrEmpty()) {
        emptySequence()
    } else {
        sequence {
            // Declared inside the builder so each iteration of the sequence
            // starts from offset 0 (a shared var would make it single-use)
            var nextOffset = 0
            while (nextOffset < length) {
                val result = Character.codePointAt(this@asCodePointSequence, nextOffset)
                nextOffset += Character.charCount(result)
                yield(result)
            }
        }
    }
}
```
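If you do need to move between the two index spaces, the JDK's `Character.offsetByCodePoints` and `Character.codePointCount` translate between code point indices and char offsets. A minimal sketch; the helper names `charOffsetOf` and `codePointIndexOf` are hypothetical.

```kotlin
// Code point index -> UTF-16 char offset (walks the text from the start).
fun charOffsetOf(text: CharSequence, codePointIndex: Int): Int =
    Character.offsetByCodePoints(text, 0, codePointIndex)

// UTF-16 char offset -> number of code points before it.
fun codePointIndexOf(text: CharSequence, charOffset: Int): Int =
    Character.codePointCount(text, 0, charOffset)

// Slice by code point range without building a sequence.
fun codePointSlice(text: CharSequence, from: Int, until: Int): CharSequence =
    text.subSequence(charOffsetOf(text, from), charOffsetOf(text, until))
```

Both lookups are still linear in the offset, like `drop`/`take`, but they work on the original text and avoid boxing each code point into an `Int` object.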
m
Java has a similar `.codePoints(): IntStream`, but it's not really useful. I'd rather use an `IntIterator`.
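For comparison, an `IntIterator`-based version might look like the sketch below. `kotlin.collections.IntIterator` only requires `hasNext()` and `nextInt()`, so callers can pull unboxed code points. `CodePointIterator` and `codePointIterator()` are hypothetical names; on the JVM, `codePoints().iterator()` also yields a `PrimitiveIterator.OfInt` with `nextInt()`.

```kotlin
// Hypothetical iterator over the code points of a CharSequence.
class CodePointIterator(private val text: CharSequence) : IntIterator() {
    private var offset = 0 // UTF-16 offset of the next code point

    override fun hasNext(): Boolean = offset < text.length

    override fun nextInt(): Int {
        if (!hasNext()) throw NoSuchElementException()
        val codePoint = Character.codePointAt(text, offset)
        offset += Character.charCount(codePoint) // skip 1 or 2 chars
        return codePoint
    }
}

fun CharSequence.codePointIterator(): IntIterator = CodePointIterator(this)
```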
m
What are the pros/cons of building an `IntIterator` vs a `Sequence<Int>`?
m
An iterator has a direct `next()` method. This makes it easy, for example, to wrap it into an iterator of words: you just call `next()` until a space or punctuation is encountered. Doing the same with a `Sequence`/`Stream` is basically the same but requires more indirection, because a `Sequence`/`Stream` is not an `Iterator` itself but an (spl)iterator factory. Example: https://youtrack.jetbrains.com/issue/KT-38384