#random

Klitos Kyriacou

03/21/2024, 6:05 PM
Why don't Strings have a `size` property, just like Arrays and Collections, but have `length` instead?

Todd

03/21/2024, 6:13 PM
I'm guessing it's mostly to match Java. AFAIK, String is basically a CharArray. Arrays have `length`. Collections such as ArrayList have `size`.

Klitos Kyriacou

03/21/2024, 6:16 PM
I know, but Java arrays have `length` and yet Kotlin arrays have `size`. So why not the same with strings?

Todd

03/21/2024, 6:23 PM
TIL 😅
I wonder whether, if they had used `size` for a string, you might expect to get the total size in bytes, since a character like 🙂 takes up a different amount of space than `a`. Similar to how Rust implements `len`, but I guess Rust chose `len` instead of `size` 🤔
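To make the byte-size point concrete, here is a minimal sketch, assuming Kotlin/JVM and UTF-8 as the encoding (which is what Rust's `str::len` counts). The helper name `utf8ByteCount` is made up for illustration.

```kotlin
// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
// Assumes Kotlin/JVM, where String.toByteArray(Charset) is available.
fun utf8ByteCount(s: String): Int = s.toByteArray(Charsets.UTF_8).size

// "\uD83D\uDE42" is the 🙂 emoji (U+1F642) written as a UTF-16 surrogate pair.
val slightSmile = "\uD83D\uDE42"
```

With these definitions, `utf8ByteCount("a")` is 1 while `utf8ByteCount(slightSmile)` is 4.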

Klitos Kyriacou

03/21/2024, 6:33 PM
Do you expect `"😜".length` to be 1? It's 2, because Kotlin (like Java) counts UTF-16 code units.
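A small sketch of the difference between UTF-16 code units and code points, assuming Kotlin/JVM (where `codePointCount` is available). The function names are hypothetical.

```kotlin
// Number of UTF-16 code units, which is what String.length reports.
fun utf16Length(s: String): Int = s.length

// Number of Unicode code points (JVM-only extension on String).
fun codePointLength(s: String): Int = s.codePointCount(0, s.length)

// "\uD83D\uDE1C" is the 😜 emoji (U+1F61C) written as a UTF-16 surrogate pair.
val wink = "\uD83D\uDE1C"
```

Here `utf16Length(wink)` is 2 and `codePointLength(wink)` is 1.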

Sam

03/21/2024, 6:35 PM
Any attempt to define the length/size of a string is going to be flawed. It's just not a well defined concept. So I don't really mind what Kotlin calls it 😄
👆 1
😂 1

Todd

03/21/2024, 7:01 PM
Sorry that was a bad example. I more meant this: https://pl.kotl.in/DJGLW1N00

Ray Rahke

03/22/2024, 7:05 AM
@Sam Why do you say that? I would expect the emoji size to just be 1, and disagree with Java's decision there. The low-level encoding of a single symbol does not seem relevant to what we consider its length to be at the high level.

Sam

03/22/2024, 7:28 AM
But there's no good definition for what constitutes a "single symbol". Many languages don't divide their script into individual symbols at all. Unicode can still represent these by combining various different strokes and marks. Devanagari contains some simple examples of this sort of thing, e.g. `नमस्ते` (namaste), which is 6 code points forming 4 alphabet characters which (as you'll see if you try to select/copy them) are actually rendered as 3 clusters/symbols. Even in languages with a Latin alphabet, a single symbol isn't always well-defined. An `ë`, for example (e with umlaut), can be represented either by one Unicode code point (`U+00EB`) or two (`U+0065`, `U+0308`). Both are valid representations of the same text. If we store them as strings, do they have the same length, or different lengths? (Answer: who cares, String length is meaningless).
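The `ë` case can be checked directly. This is a rough sketch assuming Kotlin/JVM, using `java.text.Normalizer` to convert between the two forms. The names `composed`, `decomposed`, `namaste`, and `nfc` are made up for illustration.

```kotlin
import java.text.Normalizer

// One code point: U+00EB (precomposed e with diaeresis).
val composed = "\u00EB"

// Two code points: U+0065 (e) followed by U+0308 (combining diaeresis).
val decomposed = "e\u0308"

// The six code points of "namaste" in Devanagari, all in the Basic Multilingual Plane.
val namaste = "\u0928\u092E\u0938\u094D\u0924\u0947"

// Hypothetical helper: normalize to NFC (canonical composition).
fun nfc(s: String): String = Normalizer.normalize(s, Normalizer.Form.NFC)
```

The two spellings of `ë` have lengths 1 and 2 and are not equal as strings, yet `nfc(decomposed) == composed` holds. `namaste.length` is 6, even though it renders as fewer visible clusters.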

Loney Chou

03/22/2024, 8:04 AM
Agree here, a `String` just shouldn't be a `CharSequence` itself. An explicit conversion like `chars()` that returns a `CharSequence` is more sound IMO.
The representation of a `String` is so complex that accessing it shouldn't be as easy as a simple `get`. Direct indexing is error-prone because of how Unicode behaves.
👍 1

Sam

03/22/2024, 8:09 AM
💯 if String was a modern interface, it's hard to think of many functions/properties that it could actually contain. Outside of `toByteArray(Charset)`, most of its functions are actually leaking its implementation details 😞

Loney Chou

03/22/2024, 8:11 AM
Speaking of `String`, `Char` is a bit awkward as well because of surrogates. Emoji will break everyone's expectations.
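To illustrate the surrogate issue, a minimal sketch: indexing into a string that contains an emoji yields `Char` values that are only halves of a surrogate pair. The helper `containsSurrogates` is a hypothetical name.

```kotlin
// "\uD83D\uDE1C" is the 😜 emoji (U+1F61C) as a high/low surrogate pair.
val winkEmoji = "\uD83D\uDE1C"

// Hypothetical helper: true if any Char in the string is half of a surrogate pair,
// i.e. indexing by Char can land in the middle of a symbol.
fun containsSurrogates(s: String): Boolean = s.any { it.isSurrogate() }
```

Here `winkEmoji[0].isHighSurrogate()` and `winkEmoji[1].isLowSurrogate()` are both true, so neither `Char` is a meaningful character on its own.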

Sam

03/22/2024, 8:15 AM
Yeah, I would love to see the `char`/`Char` type in Java and Kotlin be deprecated, or at least relegated to an invisible implementation detail of some strings. User code needs to use `Int` to reliably represent a code point, though who knows if that'll even be future proof 😄
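For reference, one way to work with code points as `Int` values today, sketched under the assumption of Kotlin/JVM (it relies on `String.codePointAt` and `java.lang.Character`). Both function names are hypothetical.

```kotlin
// Hypothetical helper: split a string into its Unicode code points as Ints.
fun codePointsOf(s: String): List<Int> {
    val result = mutableListOf<Int>()
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)       // reads a full surrogate pair if one starts here
        result.add(cp)
        i += Character.charCount(cp)    // advance by 1 or 2 UTF-16 code units
    }
    return result
}

// Hypothetical helper: build a one-symbol string from a code point.
fun fromCodePoint(cp: Int): String = String(Character.toChars(cp))
```

For example, `codePointsOf("a\uD83D\uDE1C")` gives `[0x61, 0x1F61C]`, and `fromCodePoint(0x1F61C)` gives back the two-`Char` string for 😜.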