# random
k
Why don't Strings have a `size` property, just like Arrays and Collections, but they have a `length()` method instead?
t
I'm guessing it's mostly to match Java. AFAIK, String is basically a CharArray. Arrays have `length`. Collections such as ArrayList have `size`.
k
I know, but Java arrays have `length` and yet Kotlin arrays have `size`. So why not the same with strings?
t
TIL 😅
I wonder whether, if they used `size` for a string, you might expect it to give the total size in bytes, since a character like 🙂 takes up a different amount of space than `a`. Similar to how Rust implements `len`, but I guess Rust chose `len` instead of `size` 🤔
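A minimal Kotlin/JVM sketch of the distinction being drawn here: `length` counts UTF-16 code units, while the number of bytes depends on the encoding you choose. It assumes UTF-8, which is also what Rust's `str::len` counts; `utf8Size` is a hypothetical helper, not a standard API.

```kotlin
// Hypothetical helper: how many bytes the string occupies when encoded as UTF-8.
fun utf8Size(s: String): Int = s.toByteArray(Charsets.UTF_8).size

fun main() {
    for (s in listOf("a", "🙂")) {
        // length counts UTF-16 code units; utf8Size counts encoded bytes
        println("'$s': length=${s.length}, utf8Size=${utf8Size(s)}")
    }
    // 'a': length=1, utf8Size=1
    // '🙂': length=2, utf8Size=4
}
```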
k
Do you expect `"😜".size` to be 1? Its `length` is 2, because Kotlin (like Java) counts UTF-16 code units.
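For reference, a small Kotlin/JVM sketch contrasting `length` with a code-point count for the same emoji (`codePointCount` is a JVM-only function in `kotlin.text`):

```kotlin
fun main() {
    val emoji = "😜" // U+1F61C, outside the Basic Multilingual Plane
    // length counts UTF-16 code units: this code point needs a surrogate pair
    println(emoji.length) // 2
    // codePointCount counts Unicode code points in the given index range
    println(emoji.codePointCount(0, emoji.length)) // 1
}
```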
s
Any attempt to define the length/size of a string is going to be flawed. It's just not a well-defined concept. So I don't really mind what Kotlin calls it 😄
t
Sorry, that was a bad example. I meant something more like this: https://pl.kotl.in/DJGLW1N00
r
@Sam Why do you say that? I would expect the emoji's size to just be 1, and I disagree with Java's decision there. The low-level encoding of a single symbol does not seem relevant to what we consider its length to be at the high level.
s
But there's no good definition of what constitutes a "single symbol". Many languages don't divide their script into individual symbols at all. Unicode can still represent these by combining various different strokes and marks. Devanagari contains some simple examples of this sort of thing, e.g. नमस्ते (namaste), which is 6 code points forming 4 alphabet characters which, as you'll see if you try to select/copy them, are actually rendered as 3 clusters/symbols. Even in languages with a Latin alphabet, a single symbol isn't always well-defined. An ë (e with umlaut), for example, can be represented either by one Unicode code point (`U+00EB`) or two (`U+0065`, `U+0308`). Both are valid representations of the same text. If we store them as strings, do they have the same length, or different lengths? (Answer: who cares, String length is meaningless.)
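The `ë` case can be checked on the JVM with `java.text.Normalizer`. This is a sketch assuming NFC is the normalization form you want; the two spellings compare unequal and report different lengths until they are normalized:

```kotlin
import java.text.Normalizer

fun main() {
    val precomposed = "\u00EB"      // ë as a single code point
    val decomposed = "\u0065\u0308" // e followed by COMBINING DIAERESIS
    println(precomposed.length) // 1
    println(decomposed.length)  // 2
    // String equality compares UTF-16 code units, so these differ
    println(precomposed == decomposed) // false
    // NFC composes e + U+0308 into U+00EB, so the normalized forms match
    val nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC)
    println(nfc == precomposed) // true

    // नमस्ते: 6 code points, all in the BMP, so length is also 6
    val namaste = "\u0928\u092E\u0938\u094D\u0924\u0947"
    println(namaste.length) // 6
}
```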
l
Agree here, `String` just shouldn't be a `CharSequence` itself. An explicit conversion, like a `chars()` that returns a `CharSequence`, is more sound IMO.
The representation of `String` is complex enough that accessing it shouldn't be as easy as a direct `get`. That leads to bugs because of how Unicode behaves.
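A minimal Kotlin/JVM sketch of what a plain `get` (or index-based `substring`) actually hands back when a string contains an emoji, plus one way to step by code points instead (`offsetByCodePoints` is JVM-only):

```kotlin
fun main() {
    val s = "a😜b"
    // Index-based access works on UTF-16 code units, not symbols
    println(s.length) // 4
    println(s[1].isHighSurrogate()) // true: first half of the 😜 pair
    println(s[2].isLowSurrogate())  // true: second half
    // Slicing by index can split the pair, leaving a lone surrogate
    val cut = s.substring(0, 2)
    println(cut.last().isSurrogate()) // true
    // Advancing by code points keeps the emoji whole
    val end = s.offsetByCodePoints(0, 2)
    println(s.substring(0, end)) // a😜
}
```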
s
💯 If String were a modern interface, it would be hard to think of many functions/properties it could actually contain. Outside of `toByteArray(Charset)`, most of its functions actually leak its implementation details 😞
l
Speaking of String, `Char` is a bit awkward as well because of surrogates. Emoji will break everyone's expectations.
s
Yeah, I would love to see the `char`/`Char` type in Java and Kotlin be deprecated, or at least relegated to an invisible implementation detail of some strings. User code needs to use `Int` to reliably represent a code point, though who knows if that'll even be future-proof 😄
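Working with `Int` code points can be sketched on the JVM roughly like this; `codePointsOf` is a hypothetical helper, and a code point is still not necessarily one user-visible symbol (the `ë` and Devanagari cases above still apply):

```kotlin
// Hypothetical helper: returns each Unicode code point of s as an Int.
fun codePointsOf(s: String): List<Int> {
    val result = mutableListOf<Int>()
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)     // combines a surrogate pair into one Int
        result.add(cp)
        i += Character.charCount(cp) // advance by 1 or 2 UTF-16 code units
    }
    return result
}

fun main() {
    val cps = codePointsOf("a😜")
    println(cps.map { it.toString(16) }) // [61, 1f61c]
    // Rebuild the string from its code points
    val rebuilt = buildString { cps.forEach { appendCodePoint(it) } }
    println(rebuilt == "a😜") // true
}
```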