# stdlib
d
Proposal:
```kotlin
fun CharSequence.splitToSequences(
    vararg delimiters: String,
    ignoreCase: Boolean = false,
    limit: Int = 0
): Sequence<CharSequence>
```
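A minimal sketch of how the proposed signature could be implemented so that no piece is copied. `CharSequenceSlice` is a hypothetical helper (not a stdlib type) that views a range of the original text. Delimiter matching is delegated to the stdlib's `findAnyOf`, and `limit` follows `splitToSequence`'s convention (0 means no limit, otherwise at most `limit` pieces, the last one holding the remainder). Unlike the stdlib, this sketch simply rejects empty delimiters.

```kotlin
/** Hypothetical read-only view of source[start, end); nothing is copied until toString(). */
class CharSequenceSlice(
    private val source: CharSequence,
    private val start: Int,
    private val end: Int,
) : CharSequence {
    init {
        require(start in 0..end && end <= source.length) { "Invalid bounds [$start, $end)" }
    }

    override val length: Int get() = end - start

    override fun get(index: Int): Char {
        if (index !in 0 until length) throw IndexOutOfBoundsException("index $index, length $length")
        return source[start + index]
    }

    override fun subSequence(startIndex: Int, endIndex: Int): CharSequence {
        require(startIndex in 0..endIndex && endIndex <= length) { "Invalid bounds [$startIndex, $endIndex)" }
        return CharSequenceSlice(source, start + startIndex, start + endIndex)
    }

    // Content-based equality so slices can be used as map keys (e.g. for word counting).
    override fun equals(other: Any?): Boolean {
        if (this === other) return true
        if (other !is CharSequenceSlice || other.length != length) return false
        for (i in 0 until length) if (this[i] != other[i]) return false
        return true
    }

    // Same formula as String.hashCode, computed over the viewed characters.
    override fun hashCode(): Int {
        var h = 0
        for (i in 0 until length) h = 31 * h + this[i].code
        return h
    }

    override fun toString(): String = source.subSequence(start, end).toString()
}

/** Sketch of the proposed API: yields slices of the receiver instead of new Strings. */
fun CharSequence.splitToSequences(
    vararg delimiters: String,
    ignoreCase: Boolean = false,
    limit: Int = 0,
): Sequence<CharSequence> {
    require(limit >= 0) { "Limit must be non-negative, but was $limit" }
    require(delimiters.none { it.isEmpty() }) { "Empty delimiters are not supported in this sketch" }
    val text = this
    val delimiterList = delimiters.asList()
    return sequence {
        var start = 0
        var produced = 0
        while (true) {
            // With a positive limit, the limit-th piece is everything that remains.
            val match: Pair<Int, String>? =
                if (limit > 0 && produced == limit - 1) null
                else text.findAnyOf(delimiterList, start, ignoreCase)
            if (match == null) {
                yield(CharSequenceSlice(text, start, text.length))
                break
            }
            val (index, delimiter) = match
            yield(CharSequenceSlice(text, start, index))
            produced++
            start = index + delimiter.length
        }
    }
}
```

Because the slices only reference the source, a mutable receiver such as a `StringBuilder` would leak later edits into them; with an immutable `String` that is not a concern. Note that a slice never compares equal to a `String`, even with the same content.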
h
Yeah, I really want there to be a `String::split` method with all the params that `Collection::joinToString` has.
d
The existing `splitToSequence` yields `String`s instead of `CharSequence`s, which means more allocations. 😞
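Changing only the return type would not be enough on the JVM, because `String.subSequence` returns a copy (it delegates to `substring`). One existing way to get non-copying views is `java.nio.CharBuffer.wrap(csq, start, end)`, which returns a read-only buffer backed by the given character sequence. A JVM-only sketch, where `splitToCharBuffers` is a hypothetical helper that splits on a single delimiter character:

```kotlin
import java.nio.CharBuffer

// Hypothetical helper (JVM only): splits on one delimiter char, yielding read-only
// CharBuffer views over the original text instead of new String copies.
fun CharSequence.splitToCharBuffers(delimiter: Char): Sequence<CharBuffer> {
    val text = this
    return sequence {
        var start = 0
        while (true) {
            val end = text.indexOf(delimiter, start)
            if (end < 0) {
                // Last piece: everything after the final delimiter.
                yield(CharBuffer.wrap(text, start, text.length))
                break
            }
            yield(CharBuffer.wrap(text, start, end))
            start = end + 1
        }
    }
}
```

`CharBuffer` equality and hash codes depend on the remaining characters, so these views can serve as map keys for counting, provided nobody moves their position (relative `get()` calls do). The `CharBuffer` Javadoc warns about exactly this when buffers are used as hash keys.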
j
`CharSequence` is worth avoiding performance-wise. It's a meaningless marker interface that requires workarounds in Kotlin/Native (k-n) to avoid O(n) access.
Adapting `MappedByteBuffer` to `CharSequence` would be a good use for it.
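A minimal sketch of such an adapter, assuming the bytes use a single-byte encoding such as ASCII or ISO-8859-1 (multi-byte UTF-8 characters would not map one byte to one `Char`). `ByteBufferCharSequence` is a hypothetical name. The example wraps a heap `ByteBuffer`, but a `MappedByteBuffer` obtained from `FileChannel.map` is also a `ByteBuffer` and would plug in the same way.

```kotlin
import java.nio.ByteBuffer

// Hypothetical adapter: exposes bytes [offset, offset + length) of a ByteBuffer as chars,
// assuming a single-byte encoding. Absolute gets leave the buffer's position untouched,
// and subSequence returns another view rather than a copy.
class ByteBufferCharSequence(
    private val buffer: ByteBuffer,
    private val offset: Int = 0,
    override val length: Int = buffer.limit() - offset,
) : CharSequence {
    init {
        require(offset >= 0 && length >= 0 && offset + length <= buffer.limit()) { "Invalid bounds" }
    }

    override fun get(index: Int): Char {
        if (index !in 0 until length) throw IndexOutOfBoundsException("index $index, length $length")
        // Mask to 0..255 so bytes >= 0x80 map to U+0080..U+00FF instead of negative values.
        return (buffer.get(offset + index).toInt() and 0xFF).toChar()
    }

    override fun subSequence(startIndex: Int, endIndex: Int): CharSequence {
        require(startIndex in 0..endIndex && endIndex <= length) { "Invalid bounds" }
        return ByteBufferCharSequence(buffer, offset + startIndex, endIndex - startIndex)
    }

    override fun toString(): String {
        val sb = StringBuilder(length)
        for (i in 0 until length) sb.append(this[i])
        return sb.toString()
    }
}
```

Since it is a plain `CharSequence`, the existing stdlib functions such as `splitToSequence` work on it directly, without decoding the whole buffer first.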
e
But why?
d
To be fair, my current use case is quite specific: I have a large string of about 20,000 words separated by spaces, and I want to count the occurrences of each word. I'm multi-threading, and it's a bit rough to make large copies instead of slices.
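A minimal sketch of that use case with today's stdlib, assuming words are separated by space characters: each worker scans its own index range of the one shared `String`, so the large text is never chunked into copies. `countWords` and `countWordsInRange` are hypothetical names. Each word occurrence still goes through `substring`, which is exactly the per-piece allocation a `CharSequence`-returning split would try to avoid.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Counts space-separated words in text[from, to); empty tokens (e.g. double spaces) are skipped.
fun countWordsInRange(text: String, from: Int, to: Int): MutableMap<String, Int> {
    val counts = HashMap<String, Int>()
    var start = from
    while (start < to) {
        var end = text.indexOf(' ', start)
        if (end < 0 || end > to) end = to
        if (end > start) {
            val word = text.substring(start, end) // one String allocation per word occurrence
            counts[word] = (counts[word] ?: 0) + 1
        }
        start = end + 1
    }
    return counts
}

// Splits the index range into `threads` parts whose boundaries sit on spaces (so no word
// is cut in half), counts each part on a worker thread, then merges the partial maps.
fun countWords(text: String, threads: Int = Runtime.getRuntime().availableProcessors()): Map<String, Int> {
    require(threads > 0) { "threads must be positive" }
    val bounds = IntArray(threads + 1)
    bounds[threads] = text.length
    for (i in 1 until threads) {
        val guess = maxOf((text.length.toLong() * i / threads).toInt(), bounds[i - 1])
        val space = text.indexOf(' ', guess)
        bounds[i] = if (space < 0) text.length else space
    }
    val pool = Executors.newFixedThreadPool(threads)
    try {
        val partials = (0 until threads).map { i ->
            pool.submit(Callable { countWordsInRange(text, bounds[i], bounds[i + 1]) })
        }
        val total = HashMap<String, Int>()
        for (future in partials) {
            for ((word, n) in future.get()) total[word] = (total[word] ?: 0) + n
        }
        return total
    } finally {
        pool.shutdown()
    }
}
```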
j
I can think of at least a couple of libraries, like suffix search trees, that might find this applicable to `Sequence<T>`, where the JVM implementations (and my Kotlin port) vary between `Byte` and `Int` as the element type.
e
I’ve played with this at some point, and, unfortunately, the version with sequences does not give any significant performance benefit.
j
NIO `ByteBuffer` approximates C++ STL iterators for bytes, where you get a tiny bit of HotSpot love for a tight loop comparing values.
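For comparison, the same "slices, not copies" idea at the byte level: a tight loop of absolute `get(i)` calls compares each byte with the delimiter, and every piece is a `ByteBuffer` view sharing the original storage. `splitToSlices` and `byteView` are hypothetical helpers, and this sketch makes no claim about how much HotSpot actually optimizes the loop.

```kotlin
import java.nio.ByteBuffer

// Hypothetical helper: splits the remaining bytes of the receiver on `delimiter`,
// yielding views that share the buffer's storage (no byte copying).
fun ByteBuffer.splitToSlices(delimiter: Byte): Sequence<ByteBuffer> {
    val source = this
    val begin = position() // captured when splitToSlices is called
    val end = limit()
    return sequence {
        var start = begin
        for (i in begin until end) {
            // Absolute get(i) compares bytes without moving the buffer's position.
            if (source.get(i) == delimiter) {
                yield(byteView(source, start, i))
                start = i + 1
            }
        }
        yield(byteView(source, start, end))
    }
}

// Returns a view of source[start, end) with position 0 and limit end - start.
fun byteView(source: ByteBuffer, start: Int, end: Int): ByteBuffer {
    val dup = source.duplicate() // independent position/limit, shared content
    dup.limit(end)
    dup.position(start)
    return dup.slice()
}
```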
m
I'd like to see Iterators instead of Sequences 🤔
d
`asIterator()` solves that.
m
No-no-no, creating a redundant wrapper should be the responsibility of whoever wants it: `.toSequence()`.
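In the current stdlib, the two directions are spelled `Sequence.iterator()` and `Iterator.asSequence()`. A minimal sketch (`conversionsDemo` is a hypothetical name) showing that a sequence built from an iterator continues from wherever that iterator already is:

```kotlin
fun conversionsDemo(text: String): Pair<String, List<String>> {
    val words: Sequence<String> = text.splitToSequence(" ")

    // Sequence -> Iterator: each iterator() call starts a fresh pass.
    val iterator: Iterator<String> = words.iterator()
    val first = iterator.next()

    // Iterator -> Sequence: wraps this same iterator, so it resumes after `first`.
    val rest: Sequence<String> = iterator.asSequence()
    return first to rest.toList()
}
```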
e
We try to avoid public APIs that return single-shot entities like `Iterator`. They are error-prone to work with. That is why Kotlin collections do not have any operators on iterators. `Sequence` is multi-use, and thus easier to reason about, since it has no intrinsic “state” (used/not used) that a person using it has to keep in mind.
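A small illustration of that difference, using a hypothetical `passCounts` helper: counting the same `Sequence` twice gives the same answer, while a second pass over one `Iterator` finds nothing left.

```kotlin
// Returns the element counts of two passes over a Sequence,
// followed by two passes over a single Iterator obtained from it.
fun passCounts(words: Sequence<String>): List<Int> {
    val sequencePasses = listOf(words.count(), words.count()) // each pass restarts

    val iterator = words.iterator()
    var firstPass = 0
    while (iterator.hasNext()) { iterator.next(); firstPass++ }
    var secondPass = 0
    while (iterator.hasNext()) { iterator.next(); secondPass++ } // already exhausted
    return sequencePasses + listOf(firstPass, secondPass)
}
```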
m
But `.constrainOnce()` makes `Sequence`s even more unsafe.
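A minimal sketch of the concern: after `constrainOnce()` the type is still `Sequence<T>`, yet a second iteration throws `IllegalStateException`. `Iterator.asSequence()` returns a sequence constrained the same way. `failsOnSecondPass` is a hypothetical helper.

```kotlin
// Returns true if iterating `seq` a second time throws IllegalStateException.
fun failsOnSecondPass(seq: Sequence<Int>): Boolean {
    seq.toList() // the first pass always succeeds
    return try {
        seq.toList()
        false
    } catch (e: IllegalStateException) {
        true
    }
}
```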
e
Yes. That is why I say “we try to”. There are some primitives that violate this rule, but they are limited. Same with `Flow`: most of them are multi-use, with limited exceptions.