# io
z
Is there any special attention that should be paid to loading large files (200MB+) into the browser with `Buffer`? What I'm seeing is that, on the JVM, I can read the file in about 2.5 seconds. In the browser, running the same verbatim code takes about 40 seconds and has much higher/faster memory consumption.
It actually might be deserializing inline classes in JS that's taking so much time
f
Could you please clarify how you load the files? Initially, I thought about reading files using `SystemFileSystem`, and for JS it's indeed tremendously slow and could be improved. But `SystemFileSystem` is not supported in the browser.
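For reference, by reading with `SystemFileSystem` I mean something along these lines (a minimal sketch with a placeholder path), which on the JS target goes through Node and isn't available in the browser:

```kotlin
import kotlinx.io.buffered
import kotlinx.io.files.Path
import kotlinx.io.files.SystemFileSystem
import kotlinx.io.readByteArray

// Minimal sketch: open a file through SystemFileSystem and read it fully.
// "data.bin" is just a placeholder path.
fun readAllBytes(path: String): ByteArray {
    val source = SystemFileSystem.source(Path(path)).buffered()
    try {
        return source.readByteArray()
    } finally {
        source.close()
    }
}
```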
z
Got a couple parts at play. I have a Kotlin Multiplatform JS project using the React wrapper. This project has a React component, which boils down to an `<input type="file">`. It has an `onChange` listener that attaches a `FileReader`. The `FileReader` has an `onload` event that reads the file as an `ArrayBuffer`. After that, I make an `Int8Array(ArrayBuffer)` to get a plain `ByteArray`. Then, as a sanity check, I actually iterate through the 249MB file and `xor` every byte with the last. All of that is the fast part.
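That flow boils down to roughly this (a simplified sketch, not the exact component code; the final cast works because Kotlin/JS backs `ByteArray` with an `Int8Array`):

```kotlin
import org.khronos.webgl.ArrayBuffer
import org.khronos.webgl.Int8Array
import org.w3c.files.File
import org.w3c.files.FileReader

// Simplified sketch of the load path; the real code sits inside the React onChange handler.
// The unsafeCast avoids a copy because Kotlin/JS represents ByteArray as an Int8Array.
fun readFileAsByteArray(file: File, onLoaded: (ByteArray) -> Unit) {
    val reader = FileReader()
    reader.onload = {
        val buffer = reader.result as ArrayBuffer
        onLoaded(Int8Array(buffer).unsafeCast<ByteArray>())
    }
    reader.readAsArrayBuffer(file)
}
```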
Then I convert that `ByteArray` to a `ByteString` and perform the same `xor` test. Again, still pretty fast.
But then, for some reason, when I go to deserialize ~4 million 4-byte objects from this `ByteString`, it takes about 80 seconds. Iterating over every byte takes milliseconds. It's a custom format, so I'm sure it's something I'm doing, but running the same code on the JVM is, like, immediately faster in all ways.
In the browser, I get an output like this.
4387994, 50494, 869688, 3467813, 1m 20.358s, 457ms, 6.555s, 5.44s, 22.593s, 0s, 0s
• 4,387,994 4-byte inline strings
• 50,494 4-byte inline ints
• 869,688 different 4-byte inline ints
• 3,467,813 2-byte inline shorts
• 1m20s total time to read 4,387,994 (first)
• 6.555s total time to read 869,688 (third)
• 5.44s total time to read a different group of inline ints
• 22.593s total time to read 3,467,813 (fourth)
On the JVM, the results are
4387994, 50494, 869688, 3467813, 1.650348681s, 11.886447ms, 172.236873ms, 304.934099ms, 677.563268ms, 30.814us, 3.202us
According to the profiler in the browser, most of the time is being spent in Major GC or `TypedArraySpeciesConstructor`, coming from `source.readTo` calls. Gonna try making an implementation that doesn't depend on `readTo`.
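For context, the hot pattern is roughly this shape (an assumed sketch, not the real code; `readChunk` is an invented name): each chunk gets carved off with `readTo`, which copies its bytes into a fresh `Buffer` before anything is decoded from it.

```kotlin
import kotlinx.io.Buffer
import kotlinx.io.Source
import kotlinx.io.readIntLe

// Assumed shape of the hot path: readTo copies chunkLength bytes out of the
// source into a fresh Buffer, so doing this per record means millions of copies.
fun readChunk(source: Source, chunkLength: Long): Buffer {
    val chunk = Buffer()
    source.readTo(chunk, chunkLength)
    return chunk
}

// e.g. decoding one 4-byte little-endian value from its own chunk
fun readValue(source: Source): Int = readChunk(source, 4L).readIntLe()
```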
f
Maybe you could share a snippet of the code where you're experiencing the performance problem? Unfortunately, I'm struggling to reconstruct it from the description 😿
I don't know if it exists already, but what would be nice is a re-readable Buffer over a fixed ByteString. I didn't really want to write the little-endian code myself, especially once I found out the `.toInt()` methods on the primitives fill the remaining upper bits with the sign bit, which made `OR`-ing the values require a mask to get the correct result. Additionally, I still needed the stream-like interaction, reading from first to last in order. But I had to effectively split this very long ByteString into smaller chunks at certain points, and `readTo` would make a copy. So I tried to figure something out without using `readTo` or making a copy.