Kotlin Multiplatform lacks character set support. ...
# feed
e
Kotlin Multiplatform lacks character set support. I recently needed to get access to EBCDIC charsets under K/JS, so I went down the rabbit hole of studying how various providers generate them (e.g. OpenJDK, ICU4J). I ended up implementing https://github.com/lppedd/kotlinx-charset which turned out pretty great, tho it is still limited to EBCDIC. It would be cool to add other charsets like the ISO-8859-* variants, or JIS X 0213, so contributions are welcome (even to existing test cases!).
K 5
c
The prefix kotlinx- is usually reserved for official libraries, so maybe name it something else? Very nice project
👍 3
e
Thanks! I'll consider that, yeah. It would be a lot of work for me currently tho, so not a priority.
l
I ended up creating a compatibility layer that calls into ICU on native.
e
@loke In the end I just thought it was easier to generate my own code than to try C-interop with ICU or iconv or whatever else. Supporting all platforms becomes straightforward.
k
Just to add to what @CLOVIS mentioned, one thing I learned from some OSS drama in the Rust ecosystem is to avoid naming libraries using others' trademarks. https://github.com/redis-rs/redis-rs/issues/1419
e
I'm not using a reserved namespace tho. Is kotlinx a trademarked name?
l
That's fair. My biggest problem right now is that my JS backend is less capable than the others, because JS and Unicode are not great friends. And since I have working backends for everything else, I don't feel like writing a unicode data parser myself.
e
@loke btw, I've also observed differences in the way JDK charsets behave compared to ICU ones. A very common example is unmappable code points: ICU most often uses different byte sequences as replacement compared to the JDK. So there is also the question of consistency, do I keep the behavior the same as what the JDK does, or are inconsistencies ok between platforms?
l
That's a great question. I have no answer for you.
e
Still an open question for me, although I went down the route of sticking to how the JDK behaves, so that when I debug on the JVM I know the result will be exactly the same on the other platforms.
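For anyone following along, here is a minimal Java sketch of the JDK side of that question. It only shows what the JDK does for US-ASCII: `String.getBytes(Charset)` substitutes the charset's default replacement (`?`, byte 63) for unmappable characters. The second helper swaps in a different replacement byte through `CharsetEncoder.replaceWith`. The class name, the helper names, and the choice of 0x1A (the ASCII SUB control byte) are assumptions made purely for illustration, not a statement of what ICU emits for any given charset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of kotlinx-charset.
final class ReplacementDemo {

    // Encodes with the JDK's default replacement for US-ASCII ('?').
    static byte[] jdkDefault(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // Encodes with a caller-chosen replacement byte instead of the default.
    static byte[] withReplacement(String text, byte replacement) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE)
            .replaceWith(new byte[] { replacement });
        try {
            ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but the signature declares it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 has no US-ASCII mapping, so it gets replaced.
        System.out.println(Arrays.toString(jdkDefault("a\u00e9b")));
        System.out.println(Arrays.toString(withReplacement("a\u00e9b", (byte) 0x1A)));
    }
}
```

Same input, two different byte sequences, which is the kind of cross-platform mismatch being discussed. Pinning the replacement bytes per charset is one way to make every target agree with the JVM.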
k
Is kotlinx a trademarked name?
I am not a lawyer and don’t really know anything about trademark law, but Kotlin (not kotlinx) is trademarked in the US. I’m not sure if that also implicitly includes kotlinx. https://tsdr.uspto.gov/#caseNumber=86082774&caseSearchType=US_APPLICATION&caseType=DEFAULT&searchType=statusSearch
e
I don't think it applies. There are other non-official libraries with a kotlinx prefix on the internet. If they get to me asking to change it, I'll most likely do it, otherwise I won't go through the effort of renaming and redeploying honestly.
e
did you find https://github.com/fleeksoft/fleeksoft-io/blob/main/CharsetsReadme.md lacking in some way? it looks like it contains EBCDIC despite not being in the documentation https://github.com/fleeksoft/fleeksoft-io/issues/5
e
I wasn't aware of that to be honest. I needed IBM1390 and IBM1399, which are stateful extended double byte EBCDIC, and couldn't find them anywhere. Looking at the sources, they look like a direct translation from the JDK sources to Kotlin, or are they generated? I generate sources from .map/.nr/.c2b files (and generate those from .ucm files if they're missing).
Note, another reason for a complete reimplementation for me was the actual understanding of Unicode planes and how the transcoding process works. I wasn't comfortable in just translating existing code to Kotlin, as there is also a big test infrastructure behind them that is difficult to port.
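To make "stateful double byte EBCDIC" a bit more concrete, here is a rough Java sketch of the decoding side. It assumes the usual EBCDIC convention where Shift Out (0x0E) switches the stream into double-byte mode and Shift In (0x0F) switches it back. The class name and both lookup tables are hypothetical toy data: 0xC1 and 0xC2 are the EBCDIC bytes for `A` and `B`, but the double-byte entries are made up and are not real IBM1390 or IBM1399 mappings, and this is not how kotlinx-charset is implemented.

```java
import java.util.Map;

// Hypothetical toy decoder illustrating shift-state handling only.
final class StatefulDecoderSketch {
    static final int SHIFT_OUT = 0x0E; // enter double-byte mode
    static final int SHIFT_IN = 0x0F;  // return to single-byte mode

    // Toy tables. The double-byte entries are invented for illustration.
    static final Map<Integer, Character> SINGLE = Map.of(0xC1, 'A', 0xC2, 'B');
    static final Map<Integer, Character> DOUBLE = Map.of(0x4141, '\u3042', 0x4142, '\u3044');

    static String decode(byte[] input) {
        StringBuilder out = new StringBuilder();
        boolean doubleByte = false; // the decoder state carried between bytes
        int i = 0;
        while (i < input.length) {
            int b = input[i] & 0xFF;
            if (b == SHIFT_OUT) {
                doubleByte = true;
                i++;
            } else if (b == SHIFT_IN) {
                doubleByte = false;
                i++;
            } else if (!doubleByte) {
                // Single-byte mode: one byte maps to one character.
                out.append(SINGLE.getOrDefault(b, '\uFFFD'));
                i++;
            } else if (i + 1 < input.length) {
                // Double-byte mode: two bytes form one lookup key.
                int pair = (b << 8) | (input[i + 1] & 0xFF);
                out.append(DOUBLE.getOrDefault(pair, '\uFFFD'));
                i += 2;
            } else {
                // Truncated double-byte sequence at end of input.
                out.append('\uFFFD');
                i++;
            }
        }
        return out.toString();
    }
}
```

Because the same byte means different things depending on the current mode, an encoder or decoder has to track that mode across calls, and that is why generated lookup tables alone are not enough for these charsets.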
c
For the naming thing, in the Kotlin ecosystem it's actually a lot less of a problem than in Rust, because Maven coordinates have a ‘group’ section and ideally this library isn't in the package ‘kotlinx.’
💯 1
k
Good point!
e
The publishing process was actually quite interesting for me. Up to this point I had only deployed to private company repos. Have to admit it took a while to realize it's not possible to verify com.github.* namespaces anymore lol. In the end I re-used my own domain name.
c
Yeah Central is a mess
k
Oh, interesting. I publish under io.github.kevincianfarini. I need to register a domain for my libraries 😕
I need to do that before some of them go 1.0.0 anyway
e
I was going down the route of io.github too since the verification is automatic, but given I already had a personal domain I just tried using that one.
c
Note that the package doesn't have to match the maven group
👍 1
✔️ 1
Still, better keep something similar, it's confusing otherwise
k
Yup. I’m working towards API stability in Cardiologist so might as well plop this on the 1.0.0 milestone
e
I'm not sure what happens if for some reason my domain expires. I'm not gonna pay a thousand dollars to keep it up lol
e
I thought OSSRH allowed com.github.username as long as you can prove that you own username by creating specific named repos within there
k
I’m going to buy the cheapest crappiest domain for like $2/month
e
It's gone since mid 2021 😞
k
I thought OSSRH allowed com.github.username as long as you can prove that you own username by creating specific named repos within there
It’s gone since mid 2021
This doesn’t sound right. I claimed io.github.kevincianfarini in 2023
e
OSSRH is deprecated AFAIU. You used to be able to open an issue in Sonatype's Jira to verify manually, but that's not possible anymore. GitHub removed the possibility of using their com.github prefix too.
You can claim io.github, but not com.github
👍 1
k
This knowledge has all atrophied for me
I set it up once and immediately forgot all of it
e
I also have two Sonatype accounts now, one registered through GitHub, and the other via email. Which was strange because both accounts refer to the same email. A bit of a mess.
🫠 1
I’m going to buy the cheapest crappiest domain for like $2/month
Mine was 5 euros the first year, now it's 15. So, you never know what happens!
c
$2/month is quite expensive
I think mine are ~12€/year
e
Yeah my figures are per year too obviously!
c
When you buy they must give you the cost for the next year too
e
Not sure there was any indication of that. I'm using Namecheap as domain registrar. Will have to check.
c
I use Namecheap too, if the second year is more expensive it will tell you
gratitude thank you 1
e
anyhow back to the naming point: even if Maven artifacts have a two-level namespace, the JARs themselves often get copied into flat directories in applications. so you should still strive to use a distinct prefix to prevent issues in downstream consumers