Is it possible to count graphemes instead of chara...
# announcements
m
Is it possible to count graphemes instead of characters in pure Kotlin (something that works on both JVM and Native)? I'm looking for something like this, but for pure Kotlin: https://stackoverflow.com/a/29743275/1118475. E.g. an emoji in UTF-16 is counted as two characters by `emojiText.length`, and I would like a method that returns `1`.
e
ICU's BreakIterator can do that, although to use it from Kotlin MPP you'd have to wrap the Java and C++ libraries manually.
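For the JVM side only, here is a minimal sketch of the boundary-walking idea. It uses the JDK's built-in `java.text.BreakIterator` as a stand-in for ICU's class of the same name (an assumption made to keep the example dependency-free), and `graphemeCount` is a hypothetical helper name. How well the JDK class handles ZWJ emoji sequences depends on the JDK version, so treat it as an approximation rather than a full Unicode-conformant answer.

```kotlin
import java.text.BreakIterator

// Hypothetical helper: counts user-perceived characters by walking
// the character boundaries reported by the JDK's BreakIterator.
fun graphemeCount(text: String): Int {
    val iterator = BreakIterator.getCharacterInstance()
    iterator.setText(text)
    var count = 0
    // Each successful next() moves past one boundary-delimited cluster.
    while (iterator.next() != BreakIterator.DONE) count++
    return count
}

fun main() {
    val emojiText = "\uD83D\uDE00" // U+1F600 stored as a UTF-16 surrogate pair
    println(emojiText.length)         // 2
    println(graphemeCount(emojiText)) // 1
}
```

The same function also treats a base letter followed by a combining mark (e.g. `"e\u0301"`) as one unit. This does not run on Kotlin/Native, which is where the wrapping mentioned above comes in.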
m
Thanks for responding, @ephemient. Wrapping C++ sounds like something I would struggle with awfully. Maybe I don't need it at all, wdyt? It's a follow-up question to: https://kotlinlang.slack.com/archives/C3PQML5NU/p1627398784081900
c
I would guess there’s no pure-Kotlin solution for it. There may be wrapper libraries out there, but I’m not aware of any off the top of my head. Anything dealing with Unicode you’d want to delegate to the platform as much as possible, because it’s so complex and changes all the time. There are standard facilities for this in Java, and I would guess the Native stdlib includes something as well. So an expect/actual declaration that calls into platform libraries is probably going to be the easiest route.
e
Kotlin can use JVM methods to access code points, but that is still different from graphemes (think combining characters and ZWJ emoji, and the rules change with every Unicode release). If you want graphemes, you will need some external library. If you want code points, that is easier.
Swift can access a string by utf8, utf16, utf32, Unicode scalars, or extended grapheme clusters (the last of which is represented by Character)
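To make the code point versus grapheme distinction concrete, here is a small JVM-only sketch. `codePoints` is a hypothetical helper name; it relies on the `String.codePointCount` function that Kotlin exposes on the JVM. The sample string is a family emoji built from three emoji code points joined by two zero-width joiners, which a reader sees as a single symbol.

```kotlin
// Hypothetical helper: number of Unicode code points in the string (JVM only).
fun codePoints(text: String): Int = text.codePointCount(0, text.length)

fun main() {
    // U+1F468 ZWJ U+1F469 ZWJ U+1F467: one visible emoji, five code points
    val family = "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67"
    println(family.length)      // 8 UTF-16 code units
    println(codePoints(family)) // 5 code points
}
```

Counting code points fixes the surrogate-pair problem but still reports 5 here, which is why grapheme counting needs segmentation rules on top.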
m
I get it, no shortcuts then; I should delegate to the platform. Didn't realise it's so complicated and dynamic. Thanks guys, will update when I have a solution.
c
The specification is a 14 MB, 1030-page document which gets updated every year. So yeah, it’s pretty complex 😉 https://www.unicode.org/versions/Unicode13.0.0/UnicodeStandard-13.0.pdf
🤯 3