# getting-started
s
I need to escape a path containing non-ASCII characters. I get this error:
Unexpected char 0xfc at 11 in Dropbox-API-Arg value: {"path":"/süße vögelchen/photo_9.jpg"}
In the forums they say this:
For these calls with the parameters in the header, you need to escape these characters. That is, when you use the “Dropbox-API-Arg” header, you need to make it “HTTP header safe”. This means using JSON-style “\uXXXX” escape codes for the character 0x7F and all non-ASCII characters.
Is there anything already built in to achieve that? Maybe in Ktor or kotlinx-serialization? What would KMP code look like to do this?
ChatGPT gives me this JVM-only code:
fun escapeString(input: String): String {
    val sb = StringBuilder()

    for (char in input) {
        when {
            char == 0x7F.toChar() || char.toInt() > 127 -> {
                // Escape non-ASCII characters using JSON-style "\uXXXX" escape codes
                sb.append("\\u").append(String.format("%04x", char.toInt()))
            }
            else -> sb.append(char)
        }
    }

    return sb.toString()
}
s
The choice of encoding for header values is really implementation-specific, so I don't think there will be a built-in method that works for exactly this use-case.
JSON strings only need to escape control characters (plus quotes and backslashes), so running it through a JSON encoder wouldn't cover all the characters you want to encode.
Technically control characters are the only ones strictly prohibited from HTTP header values too, but implementations are so inconsistent that sticking to plain ASCII is generally safest.
s
so running it through a JSON encoder wouldn't cover all the characters you want to encode.
Yes, it already runs through kotlinx-serialization, but that doesn't change umlauts. And there seems to be no option to do so.
s
yeah, there would be no need in ordinary JSON encoding
s
Okay, understood. So OkHttp should provide it. At least it's checking for that; that's where the error comes from.
s
Well, OkHttp may check for the presence of non-ASCII characters, but that doesn't mean it will provide a way to encode them to ASCII, since the specific encoding method depends on the API you're using.
The ChatGPT suggestion could be modified to pure Kotlin reasonably easily, but it is somewhat incomplete: it fails for non-BMP characters
To write a correct implementation you'd need to iterate the Unicode code points, rather than the 2-byte Java chars, and I don't think pure Kotlin has a way to do that 😞
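For reference, code points can be walked by hand using the surrogate checks on `Char` from the Kotlin standard library. The following is a minimal sketch, assuming those checks are available on all targets in use; `codePointsOf` is a hypothetical name, not an existing library function.

```kotlin
// Hypothetical helper: decodes a String into Unicode code points using only
// Kotlin stdlib surrogate checks (no java.lang.Character calls).
fun codePointsOf(input: String): List<Int> {
    val result = mutableListOf<Int>()
    var i = 0
    while (i < input.length) {
        val high = input[i]
        if (high.isHighSurrogate() && i + 1 < input.length && input[i + 1].isLowSurrogate()) {
            // Combine a surrogate pair into one supplementary code point.
            val low = input[i + 1]
            result += 0x10000 + ((high.code - 0xD800) shl 10) + (low.code - 0xDC00)
            i += 2
        } else {
            // BMP character (or an unpaired surrogate, passed through as-is).
            result += high.code
            i += 1
        }
    }
    return result
}
```

Unpaired surrogates are passed through unchanged in this sketch; a stricter version might reject them instead.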
s
Damn 😕
ChatGPT's suggestion should still be better than the current state.
s
If you're happy with the limitation, you can probably do something like this:
fun escapeString(input: String): String {
  val sb = StringBuilder()
  for (char in input) {
    if (char.code in 32..127) {
      sb.append(char)
    } else {
      sb.append("\\u" + char.code.toHexString().takeLast(4))
    }
  }
  return sb.toString()
}
It will probably fail for characters outside the BMP, unless the receiving server has the same bug in its code 😂 (which is actually not that unlikely, people suck at this stuff)
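On the non-BMP point, it may be worth noting that JSON's `\uXXXX` escape is defined per UTF-16 code unit, so a character outside the BMP is written as two escapes (a surrogate pair), which is what a per-`Char` loop emits. Whether the Dropbox server decodes such pairs is an assumption here and was not tested in this thread. A minimal sketch, with `escapeForHeader` as a hypothetical name, that also avoids `toHexString` (experimental in some Kotlin versions) by using `toString(16).padStart(4, '0')`:

```kotlin
// Hypothetical helper: keeps printable ASCII (0x20..0x7E) and writes every
// other UTF-16 code unit, including 0x7F, as a JSON-style \uXXXX escape.
fun escapeForHeader(input: String): String = buildString {
    for (char in input) {
        if (char.code in 0x20..0x7E) {
            append(char)
        } else {
            append("\\u")
            // Lowercase hex, left-padded to four digits.
            append(char.code.toString(16).padStart(4, '0'))
        }
    }
}
```

Under these assumptions, `/süße vögelchen/photo_9.jpg` would come out as `/s\u00fc\u00dfe v\u00f6gelchen/photo_9.jpg`, and U+1F600 as the pair `\ud83d\ude00`.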
s
Indeed, encoding is hard and especially Americans tend to forget about the rest of the world 😄
In Germany we only have a few special chars (like our umlauts ä, ö, ü) that also have common representations like ae, oe & ue. So it's usually not too bad. But I know that other countries go crazy with special chars. 😄
I'm thinking of France
Your code works! Thank you! 🙂 🙏
s
Glad I could help!
r
You could make that slightly more idiomatic by using the `buildString` function:
fun escapeString(input: String) = buildString {
  for (char in input) {
    if (char.code in 32..<127) {
      append(char)
    } else {
      append("\\u" + char.code.toHexString().takeLast(4))
    }
  }
}
Or, if you're feeling really spicy:
fun String.escape() = buildString {
  for (char in this@escape) {
    if (char.code in 32..<127) {
      append(char)
    } else {
      append("\\u" + char.code.toHexString().takeLast(4))
    }
  }
}
(Also note I changed it to `32..<127` to match the forum quote, which says 0x7F needs to be escaped as well.)