# language-proposals
b
Hi, there! I was converting some Swift code into Kotlin, and it feels weird, but AFAIK there isn't any way to deal with UTF-8 strings + ranges, especially when there are emojis (or other special UTF-8 chars). Example:
val str = "a😀cde"
val start = str.indexOf(str.first()) + 2
val end = str.indexOf(str.last()) - 2
val range = IntRange(start, end)
println(str.substring(range)) // should output "😀c", but will output "?c"
Swift makes use of a Range<String.Index> instead of a Range<Int> (IntRange). The String.Index is calculated based on a particular string, so it knows whether there are any emojis or extended grapheme clusters. Don't you think it would be nice to see a StringRange kind of thing in Kotlin? Source: https://stackoverflow.com/a/35193481/4418073
g
The JVM works with UTF-8 and UTF-16 properly, but emojis are outside of the Basic Multilingual Plane and are encoded with 2 UTF-16 code units (a surrogate pair), so your calculations don't work for this case
Do you have some real-life example of code where it would be useful to have some additional abstraction over strings to handle such cases? Because your example is of course not correct when you just add and subtract indexes
It's also not a problem of the range or substring itself, it's how strings are implemented on the JVM
But there are some tools that can help you handle this, even for your case
Check the code points API
val str = "a😀cde"
str.length // 6
str.codePoints().count() // 5
b
Requires Java 8 😭, but better than nothing.. Working with emojis in Kotlin has been very frustrating. I was hoping there was something that could work on both Java + JS, but apparently not. Thanks for the answers.
g
Only codePoints() requires Java 8; it's an abstraction that uses the Streams API
The other parts of the code points API are there; there's nothing that isn't available on Java 6
But they require some additional code
Could you show some real use case? Maybe it can be solved without code points at all
JS is also much more difficult, check for example this https://blog.jonnew.com/posts/poo-dot-length-equals-two
In general, I wouldn't recommend using unsafe arithmetic for substrings; for many use cases you can avoid it
Just found that Kotlin has an extension function to count code points:
val str = "a😀cde"
str.codePointCount(0, str.length) // 5
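The "additional code" mentioned above could look roughly like this. It is only a sketch: `substringByCodePoints` is a hypothetical helper, not part of the Kotlin standard library, and it relies on `offsetByCodePoints`, which on the JVM maps to the `java.lang.String` method of the same name. It addresses the string by code point offsets, so a surrogate pair is never cut in half.

```kotlin
// Hypothetical helper (not in the stdlib): take a substring using
// code point offsets instead of UTF-16 indexes (JVM only).
fun String.substringByCodePoints(startCp: Int, endCpExclusive: Int): String {
    // Translate code point offsets into UTF-16 indexes.
    val start = offsetByCodePoints(0, startCp)
    val end = offsetByCodePoints(start, endCpExclusive - startCp)
    return substring(start, end)
}
```

With `val str = "a\uD83D\uDE00cde"` (that is, "a😀cde"), `str.substringByCodePoints(1, 3)` returns "😀c". Note that this still works on code points, not on what the user sees as one character.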
b
A few weeks ago, I wanted the user to type a single "glyph", letter or emoji, i.e. a single "visible char". If I limited the string size to 1, emojis (obviously) couldn't be shown; a limit of 2 would allow two standard text chars. I ended up without an easy solution and went to sleep. My current problem, however, is trying to make a code editor in Kotlin (especially for Android, but having it eventually on JS would be awesome). There is a code editor in Swift which is very nice (https://github.com/louisdh/savannakit) but it has so many ranges, with use cases like calculating the paragraph number based on a range of text, and generating regex tokens (kind of a compiler)
Glad to know that, I've never seen it, great extension!! I think I can make it solve most of my issues.
g
I wanted the user to type a single “glyph”, letter or emoji
It's actually not such a simple problem. Some emojis are a combination of multiple code points; for example, the Family emoji is a combination of multiple code points that the font renderer draws in a specific way
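To make that concrete, here is a small sketch using one common way of writing a family emoji: man + zero-width joiner + woman + zero-width joiner + girl. The variable and function names are made up for illustration.

```kotlin
// Family emoji as a ZWJ sequence:
// U+1F468 (man), U+200D (ZWJ), U+1F469 (woman), U+200D (ZWJ), U+1F467 (girl).
val family = "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67"

// Number of UTF-16 code units, which is what String.length reports.
fun utf16Length(s: String): Int = s.length

// Number of Unicode code points (JVM only).
fun codePointLength(s: String): Int = s.codePointCount(0, s.length)
```

Here `utf16Length(family)` is 8 and `codePointLength(family)` is 5, even though a renderer that supports the sequence draws a single glyph. So counting code points is not enough for the "single visible char" case.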
b
Oh crap..
g
And even IDEA doesn't render it as a single character: 👪. But I still don't really understand the use case where you need to know the length of a string as it is rendered (one or multiple characters); it's just a matter of rendering
calculating the paragraph number based on a range of text, and generating regex tokens
I don't see why you need to know how emojis are rendered by the UI to solve those problems
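As a rough illustration of that point, a line (or paragraph) number can be computed from a plain UTF-16 offset without caring about emojis at all, assuming lines are separated by '\n'. `lineNumberAt` is a hypothetical helper, not taken from any editor's code.

```kotlin
// Hypothetical helper: 1-based line number of a UTF-16 offset.
// Surrogate halves never equal '\n', so emojis don't affect the count.
fun lineNumberAt(text: String, offset: Int): Int {
    require(offset in 0..text.length) { "offset out of bounds" }
    var line = 1
    for (i in 0 until offset) {
        if (text[i] == '\n') line++
    }
    return line
}
```

For `val text = "a\uD83D\uDE00\ncde\nf"`, `lineNumberAt(text, text.indexOf("cde"))` returns 2. Offsets coming from `indexOf` or a regex match are already UTF-16 indexes, so they can be passed in directly.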
b
I just discovered the same.. Well, I think that is it. If I have any super weird use case I'll let you know, but I think that was probably enough for today.
d
BreakIterator will probably accomplish what you are looking for - BreakIterator.getCharacterInstance will return an instance that allows you to loop over and count the number of graphemes
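A minimal sketch of that suggestion, assuming `java.text.BreakIterator` on the JVM (on Android, `android.icu.text.BreakIterator` offers a similar API). `countGraphemes` and `isSingleGrapheme` are hypothetical names. How multi-code-point sequences such as the family emoji are segmented depends on the JDK or ICU version, so treat the result for those as something to verify on your target platform.

```kotlin
import java.text.BreakIterator

// Hypothetical helper: count user-perceived characters by walking
// the character boundaries reported by BreakIterator.
fun countGraphemes(text: String): Int {
    val boundaries = BreakIterator.getCharacterInstance()
    boundaries.setText(text)
    var count = 0
    // next() returns the following boundary, or DONE at the end of the text.
    while (boundaries.next() != BreakIterator.DONE) count++
    return count
}

// Hypothetical helper for the "type a single glyph" case.
fun isSingleGrapheme(text: String): Boolean = countGraphemes(text) == 1
```

For "a😀cde" (`"a\uD83D\uDE00cde"`), `countGraphemes` returns 5, and `isSingleGrapheme("\uD83D\uDE00")` is true while `isSingleGrapheme("ab")` is false.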