Why don't Strings have a `size` property just like Arrays an kotlinlang #random


Klitos Kyriacou

03/21/2024, 6:05 PM

Why don't Strings have a `size` property, just like Arrays and Collections, but they have a `length()` method instead?

Todd

03/21/2024, 6:13 PM

I'm guessing it's mostly to match Java. AFAIK, String is basically a CharArray. Arrays have `length`. Collections such as ArrayList have `size`.

Klitos Kyriacou

03/21/2024, 6:16 PM

I know, but Java arrays have `length` and yet Kotlin arrays have `size`. So why not the same with strings?

Todd

03/21/2024, 6:23 PM

TIL 😅

Todd

03/21/2024, 6:24 PM

I wonder if, had they used `size` for a string, you might expect to get the total size in bytes, since a character like 🙂 takes up a different amount of space than

. Similar to how Rust implements `len`, but I guess Rust chose `len` instead of `size` 🤔

Klitos Kyriacou

03/21/2024, 6:33 PM

Do you expect `"😜".size` to be 1? It's 2, because Kotlin (like Java) counts UTF-16 code units.
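To make the numbers in this exchange concrete, here is a minimal sketch, assuming Kotlin/JVM. The helper names `utf16Units`, `codePoints` and `utf8Bytes` are made up for illustration; they only wrap existing standard library calls to show three different "sizes" of the same string.

```kotlin
// Three different "sizes" of one string (Kotlin/JVM assumed).
// The helper names are illustrative, not standard library functions.

// Number of UTF-16 code units, which is what String.length reports.
fun utf16Units(s: String): Int = s.length

// Number of Unicode code points.
fun codePoints(s: String): Int = s.codePointCount(0, s.length)

// Number of bytes once the string is encoded as UTF-8.
fun utf8Bytes(s: String): Int = s.toByteArray(Charsets.UTF_8).size

fun main() {
    val s = "😜" // U+1F61C, outside the Basic Multilingual Plane
    println(utf16Units(s)) // 2
    println(codePoints(s)) // 1
    println(utf8Bytes(s))  // 4
}
```

None of the three numbers is "the" size of the string; each answers a different question.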

Sam

03/21/2024, 6:35 PM

Any attempt to define the length/size of a string is going to be flawed. It's just not a well defined concept. So I don't really mind what Kotlin calls it 😄

👆 1

😂 1

Todd

03/21/2024, 7:01 PM

Sorry that was a bad example. I more meant this: https://pl.kotl.in/DJGLW1N00

Ray Rahke

03/22/2024, 7:05 AM

@Sam Why do you say that? I would expect the emoji size to just be 1, and disagree with Java's decision there. The low-level encoding of a single symbol does not seem relevant to what we consider its length to be on the high-level plane.

Sam

03/22/2024, 7:28 AM

But there's no good definition for what constitutes a "single symbol". Many languages don't divide their script into individual symbols at all. Unicode can still represent these by combining various different strokes and marks. Devanagari contains some simple examples of this sort of thing, e.g. नमस्ते (namaste), which is 6 code points forming 4 alphabet characters which, as you'll see if you try to select/copy them, are actually rendered as 3 clusters/symbols. Even in languages with a Latin alphabet, a single symbol isn't always well-defined. An ë, for example (e with umlaut), can be represented either by one Unicode code point (`U+00EB`) or two (`U+0065 U+0308`). Both are valid representations of the same text. If we store them as strings, do they have the same length, or different lengths? (Answer: who cares, String length is meaningless).
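The two representations of ë can be compared directly. This is a small sketch, assuming Kotlin/JVM and using `java.text.Normalizer` for NFC normalization; the helper name `nfc` is made up for illustration.

```kotlin
import java.text.Normalizer

// Illustrative helper: normalize to NFC (canonical composition).
fun nfc(s: String): String = Normalizer.normalize(s, Normalizer.Form.NFC)

fun main() {
    val composed = "\u00EB"         // ë as a single code point
    val decomposed = "\u0065\u0308" // e followed by a combining diaeresis

    println(composed.length)                  // 1
    println(decomposed.length)                // 2
    println(composed == decomposed)           // false: different code unit sequences
    println(nfc(composed) == nfc(decomposed)) // true: same text after normalization

    // नमस्ते written out as its six code points
    val namaste = "\u0928\u092E\u0938\u094D\u0924\u0947"
    println(namaste.codePointCount(0, namaste.length)) // 6
}
```

The same visible text yields lengths 1 and 2, and plain `==` treats the two strings as different unless both are normalized first.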

Loney Chou

03/22/2024, 8:04 AM

Agreed. `String` just shouldn't be a `CharSequence` itself. An explicit conversion like `chars()`, which returns a `CharSequence`, is more sound IMO.

Loney Chou

03/22/2024, 8:07 AM

The representation of `String` is so complex that indexing into it shouldn't be as easy as a simple `get`. Direct indexing is bound to go wrong because of how Unicode behaves.

👍 1
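The "explicit conversion" idea could look roughly like the sketch below. `Text` is a hypothetical wrapper invented for illustration; it is not part of Kotlin or Java, and the choice of which views to expose is an assumption. Kotlin/JVM is assumed.

```kotlin
import java.nio.charset.Charset

// Hypothetical type (not in any standard library): text that is NOT
// itself a CharSequence, so there is no implicit length or get.
class Text(private val value: String) {
    // Explicit, opt-in view over the UTF-16 code units.
    fun chars(): CharSequence = value

    // Explicit encoding step for I/O; the caller chooses the charset.
    fun toByteArray(charset: Charset = Charsets.UTF_8): ByteArray =
        value.toByteArray(charset)

    override fun toString(): String = value
}

fun main() {
    val t = Text("😜")
    // t.length or t[0] would not compile: the caller has to pick a view first.
    println(t.chars().length)     // 2 UTF-16 code units
    println(t.toByteArray().size) // 4 UTF-8 bytes
}
```

The design choice here is that every notion of "size" requires naming the representation it refers to.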

Sam

03/22/2024, 8:09 AM

💯 if String was a modern interface, it's hard to think of many functions/properties that it could actually contain. Outside of `toByteArray(Charset)`, most of its functions are actually leaking its implementation details 😞

Loney Chou

03/22/2024, 8:11 AM

Speaking of String, Char is a bit awkward as well because of surrogates. Emoji will break everyone's expectations.
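A short sketch of the surrogate problem, using only standard `Char` functions. The helper name `hasSurrogates` is made up for illustration.

```kotlin
// Illustrative helper: true if any Char in the string is half of a surrogate pair.
fun hasSurrogates(s: String): Boolean = s.any { it.isSurrogate() }

fun main() {
    val s = "😜"
    println(s[0].isHighSurrogate()) // true: the first Char is only half of the emoji
    println(s[1].isLowSurrogate())  // true: the second Char is the other half
    println(hasSurrogates(s))       // true
    println(hasSurrogates("abc"))   // false

    // Slicing by Char index can cut the pair in two.
    val half = s.substring(0, 1)
    println(half.length) // 1: a lone high surrogate, not a valid character on its own
}
```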

Sam

03/22/2024, 8:15 AM

Yeah, I would love to see the `char`/`Char` type in Java and Kotlin be deprecated, or at least relegated to an invisible implementation detail of some strings. User code needs to use `Int` to reliably represent a code point, though who knows if that'll even be future-proof 😄
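Working with code points as `Int` values might look like this minimal sketch, assuming Kotlin/JVM. The helper names `codePointsOf` and `fromCodePoint` are made up for illustration.

```kotlin
// Illustrative helper: walk a string code point by code point (as Int),
// stepping over both halves of a surrogate pair at once.
fun codePointsOf(s: String): List<Int> {
    val result = ArrayList<Int>()
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)
        result.add(cp)
        i += Character.charCount(cp) // 1 for BMP code points, 2 for surrogate pairs
    }
    return result
}

// Illustrative helper: build a string back from a single code point.
fun fromCodePoint(cp: Int): String = String(Character.toChars(cp))

fun main() {
    val cps = codePointsOf("a😜")
    println(cps.map { "U+%04X".format(it) }) // [U+0061, U+1F61C]
    println(fromCodePoint(0x1F61C) == "😜")  // true
}
```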
