How can I process a file line-by-line in Kotlin/JS...
# javascript
j
How can I process a file line-by-line in Kotlin/JS (in a webpage context) without reading the whole file into memory? I'm currently trying to start with a JS FileReader, though I'd be willing to use a different API if it worked.
The browser is willing to load the whole file with `readAsArrayBuffer` (it wasn't with `readAsText`), but I crash with "oh, snap" (presumably out of memory) when I try to use `decodeToString` on it. Is there any way to get it as a `CharSequence` that I could use `splitToSequence` on? Or any other suggestions?
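Roughly, the failing approach looks like this (a reconstruction from the description above, not the asker's exact code; `naiveReadLines` is just a placeholder name):

```kotlin
import org.khronos.webgl.ArrayBuffer
import org.khronos.webgl.Int8Array
import org.w3c.files.File
import org.w3c.files.FileReader

// Load the whole File as an ArrayBuffer, decode it into a single String, then split it.
// The decodeToString call is the step that runs out of memory for a 500 MB file.
fun naiveReadLines(file: File, onLines: (Sequence<String>) -> Unit) {
    val reader = FileReader()
    reader.onload = {
        // A Kotlin/JS ByteArray is represented as an Int8Array, so this view costs nothing.
        val bytes = Int8Array(reader.result as ArrayBuffer).unsafeCast<ByteArray>()
        val text = bytes.decodeToString()          // crashes the tab on large files
        onLines(text.splitToSequence('\n'))
    }
    reader.readAsArrayBuffer(file)
}
```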
e
You could `slice()` the `File` into multiple chunks.
AFAIK that's the only clean way to do it.
j
So to line-delimit I'd need to implement my own stdio-style buffering I guess. (With the additional fun of the slices not necessarily cleanly matching UTF-8 boundaries.)
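A minimal sketch of that chunked buffering (hypothetical; `readLinesChunked` and its parameters are made-up names, not an existing API). It splits only at `'\n'` bytes, so a multi-byte UTF-8 sequence never straddles a decode:

```kotlin
import org.khronos.webgl.ArrayBuffer
import org.khronos.webgl.Int8Array
import org.w3c.files.File
import org.w3c.files.FileReader

// Read `file` slice by slice, emit complete lines, and carry the unfinished tail
// of each slice over to the next one.
fun readLinesChunked(
    file: File,
    chunkSize: Int = 8 * 1024 * 1024,
    onLine: (String) -> Unit,
    onDone: () -> Unit,
) {
    val total = file.size.toInt()          // Int offsets are fine for a 500 MB file
    var offset = 0
    var carry = ByteArray(0)               // bytes after the last newline seen so far

    fun readNext() {
        if (offset >= total) {
            if (carry.isNotEmpty()) onLine(carry.decodeToString().trimEnd('\r'))
            onDone()
            return
        }
        val end = minOf(offset + chunkSize, total)
        val reader = FileReader()
        reader.onload = {
            // A Kotlin/JS ByteArray is an Int8Array under the hood, so this view is free.
            val chunk = Int8Array(reader.result as ArrayBuffer).unsafeCast<ByteArray>()
            val data = carry + chunk
            var lineStart = 0
            for (i in data.indices) {
                if (data[i] == '\n'.code.toByte()) {
                    // Decode one complete line; trimEnd('\r') tolerates DOS line endings.
                    onLine(data.decodeToString(lineStart, i).trimEnd('\r'))
                    lineStart = i + 1
                }
            }
            carry = data.copyOfRange(lineStart, data.size)
            offset = end
            readNext()
        }
        reader.readAsArrayBuffer(file.slice(offset, end))
    }
    readNext()
}
```

Since decoding happens one line at a time, the 500 MB string is never materialized; each slice's leftover bytes simply roll into the next read.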
e
You can use `readAsText(chunk)` instead of `readAsArrayBuffer` to simplify it, and then somehow manage the EOL chars.
Although... you're right that it might result in a broken string. I'm not sure how to solve this part.
Sorry, I missed the "it wasn't with `readAsText`" part. You mean it did not load the entire content? That's strange.
j
It looks like the behavior with `readAsText` is to call the event callback with an empty string if the file is too big for memory.
e
Damn how big is the file?
j
500 MB
(That's after filtering; the original file was 3 GB. This is a stats viewer for stats from our server, and the particular thing I'm trying to view stats for lasted like 20 hours.)
e
Quickly found https://stackoverflow.com/a/32753261/1392277. See this comment:
"I just noticed this with some rather large datasets: a 257MB file reads but a 459MB file returns an empty string, Chrome 49"
j
`readAsArrayBuffer` loads the file fine (presumably it's just mmap'ing it or something internally), but my attempt to convert it to text fails with "oh snap".
e
Probably fails for the same reason `readAsText` fails, although the latter doesn't crash the page.
j
Yeah, agreed
e
So yeah, looks like you'll need custom logic, but to me it looks too messy
j
I was hoping someone else had written it already, but I guess not. 🙂
e
You could read the entire buffer, and then scan bytes to find line delimiters.
j
Yeah, that's probably a cleaner way to avoid issues with partial UTF-8. (It's a bit ugly because the files can have either Unix or DOS line endings, but that's manageable.)
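A sketch of that byte-scanning as a lazy line splitter over an already-loaded buffer, handling both Unix and DOS endings (`lineSequence` here is a made-up extension, not stdlib):

```kotlin
// Lazily split an in-memory ByteArray into lines, decoding only one line's worth
// of UTF-8 at a time instead of the whole buffer at once.
fun ByteArray.lineSequence(): Sequence<String> = sequence {
    var start = 0
    for (i in indices) {
        if (this@lineSequence[i] != '\n'.code.toByte()) continue
        // Drop a trailing '\r' so DOS-style endings produce the same lines as Unix ones.
        val end = if (i > start && this@lineSequence[i - 1] == '\r'.code.toByte()) i - 1 else i
        yield(decodeToString(start, end))
        start = i + 1
    }
    if (start < size) yield(decodeToString(start, size))   // last line without a trailing newline
}
```

Combined with `readAsArrayBuffer`, only the lines actually iterated get decoded, so no file-sized string is ever created.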
e
And what if the entire 500 MB text file is a single line? 👀
j
Well, that would fail no matter what... unless we have a buffered JSON parser, but I don't think we do... and even then I'd need the memory for the whole parsed JSON.
e
Also, check out the kotlinx-io UTF-8 reader functions, which should accept byte arrays. Not sure if they'll work for your use case though.
j
Yeah, I was wondering if kotlinx-io had tools in general for this, but I couldn't find them
e
You need to wrap your JS buffer into a kotlinx-io `Buffer`, and then you can use `Source.readString()`. Or you can even wrap your JS buffer into a custom `Source` implementation to avoid copying bytes.
Ah yeah that one was failing
Better if I go sleep, I'm forgetting stuff lol
The stdlib and kotlinx-io decoding implementations differ, so giving kotlinx-io a try looks OK.
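For the kotlinx-io route, a sketch assuming kotlinx-io-core's `Buffer`, its `write(ByteArray)` overload, and the `Source.readLine()` extension (worth double-checking against the current API); it uses `readLine()` rather than the `readString()` mentioned above since lines are what's wanted here:

```kotlin
import kotlinx.io.Buffer
import kotlinx.io.readLine

// Feed the raw bytes into a kotlinx-io Buffer and let its Source API do the
// per-line UTF-8 decoding, instead of decoding the whole buffer at once.
fun linesViaKotlinxIo(bytes: ByteArray): Sequence<String> = sequence {
    val buffer = Buffer()
    buffer.write(bytes)                       // Buffer is also a Sink, so it accepts a ByteArray
    while (true) {
        yield(buffer.readLine() ?: break)     // readLine() returns null once the buffer is exhausted
    }
}
```

A custom `RawSource` over the JS buffer, as suggested above, would avoid copying into a `ByteArray` first; the `Buffer` version is just the shortest thing to show.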
t
Do you have browser limitations?