# io
h
What's the best way to read a local file with random access? Use case: I have a Parquet file that keeps its metadata at the end of the file. So I need to read the last 8 bytes to get the start offset of the metadata footer, and then read the file starting from that offset to parse the metadata. I currently have a `RawSource`, so to "start" reading at a specific offset I want to use `rawSource.readAtMostTo()`, but this API requires a `Buffer`, not a `RawSink` that could be ignored via a `discardingSink`. (I can use `discardingSink().buffered().buffer` though, but it is an internal API.)
f
There's no good way to read files from a random offset, unfortunately. If a file is small enough to fit into memory (which may not be the case with Parquet files, I guess), you can open a source, get a buffered wrapper over it (`.buffered()`) and then use a peek source for every read. Something like:
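```kotlin
// fileSize is the file length in bytes, obtained separately.
SystemFileSystem.source(...).buffered().use { src ->
    // peek() reads ahead from the current position of src without consuming it,
    // so every peek below starts from the beginning of the file.
    val startOffset = src.peek().buffered().use { p ->
        p.skip(fileSize - 8)
        p.readLong() // or int?
    }
    src.peek().buffered().use { p ->
        p.skip(startOffset)
        p.readMetadataHeader() // placeholder for the actual metadata parsing
    }
}
```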
> (I can use `discardingSink().buffered().buffer` though, but it is an internal api.)

It's the same as creating a `Buffer` manually and then writing into it: there's no hidden connection between the discarding sink and a buffer, so nothing will be discarded.
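Since nothing is discarded automatically, skipping on a plain `RawSource` means reading into a scratch `Buffer` and clearing it by hand. A minimal sketch of that idea follows; `skipFully` is a hypothetical helper name (not part of kotlinx-io), and the 8192-byte chunk size is an arbitrary assumption.

```kotlin
import kotlinx.io.Buffer
import kotlinx.io.EOFException
import kotlinx.io.RawSource

// Hypothetical helper, not part of kotlinx-io: advances a RawSource by byteCount bytes.
fun RawSource.skipFully(byteCount: Long) {
    val scratch = Buffer()
    var remaining = byteCount
    while (remaining > 0L) {
        // Read at most one chunk into the scratch buffer (chunk size is an arbitrary choice).
        val read = readAtMostTo(scratch, minOf(remaining, 8192L))
        if (read == -1L) {
            throw EOFException("Source exhausted with $remaining bytes left to skip")
        }
        scratch.clear() // drop the bytes just read so memory use stays bounded
        remaining -= read
    }
}

fun main() {
    // A Buffer is itself a RawSource, so it can stand in for a file here.
    val data = Buffer()
    data.write(ByteArray(20_000) { (it % 128).toByte() })
    data.skipFully(19_999)
    println(data.readByte()) // the single remaining byte
}
```

This still reads every skipped byte from the underlying source; it only avoids keeping them in memory.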
h
And calling `buffer()` is cheap? I thought it would load the whole file into memory (not once, but each segment over and over).
f
Yes, it'll load the whole file eventually. That's why I mentioned "If a file is small enough to fit into a memory".
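For files that do not fit into memory, one possible workaround is to skip on the buffered source directly instead of peeking, and to reopen the file for the second read. As far as I understand, `skip` drops the bytes as it goes, so memory stays bounded, although the file is still read sequentially up to the target offset. The sketch below assumes the standard Parquet trailer (a 4-byte little-endian footer length followed by the magic `PAR1`); `readParquetFooter` is a hypothetical helper name.

```kotlin
import kotlinx.io.buffered
import kotlinx.io.files.Path
import kotlinx.io.files.SystemFileSystem
import kotlinx.io.readByteArray
import kotlinx.io.readIntLe
import kotlinx.io.readString

// Hypothetical helper: returns the raw (still unparsed) bytes of the Parquet footer metadata.
fun readParquetFooter(path: Path): ByteArray {
    val fileSize = SystemFileSystem.metadataOrNull(path)?.size
        ?: error("Cannot read metadata of $path")

    // Pass 1: the last 8 bytes hold the footer length (4 bytes, little-endian) and the magic.
    val footerLength = SystemFileSystem.source(path).buffered().use { src ->
        src.skip(fileSize - 8)
        val length = src.readIntLe()
        check(src.readString(4) == "PAR1") { "Not a Parquet file: $path" }
        length
    }

    // Pass 2: reopen the file and skip to where the footer starts.
    return SystemFileSystem.source(path).buffered().use { src ->
        src.skip(fileSize - 8 - footerLength)
        src.readByteArray(footerLength)
    }
}
```

Opening the file twice trades an extra sequential pass for not having to hold the whole file in a peek buffer.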
k
Is there any plan to add support for mmap? That would partially address this issue.
f
There is, yes. Here's a corresponding feature request: https://github.com/Kotlin/kotlinx-io/issues/397
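Until something like that lands, a JVM-only project could map the relevant parts of the file with `java.nio` directly. The sketch below is not kotlinx-io API and does not reflect what the feature request proposes; it only illustrates the mmap idea, assuming the standard Parquet trailer (4-byte little-endian footer length followed by `PAR1`). `readParquetFooterMapped` is a hypothetical name.

```kotlin
import java.nio.ByteOrder
import java.nio.channels.FileChannel
import java.nio.file.Path
import java.nio.file.StandardOpenOption

// JVM-only, hypothetical helper; uses java.nio rather than kotlinx-io.
fun readParquetFooterMapped(path: Path): ByteArray =
    FileChannel.open(path, StandardOpenOption.READ).use { channel ->
        val fileSize = channel.size()

        // Map only the last 8 bytes: footer length (little-endian) plus the magic.
        val tail = channel.map(FileChannel.MapMode.READ_ONLY, fileSize - 8, 8)
            .order(ByteOrder.LITTLE_ENDIAN)
        val footerLength = tail.getInt(0)
        val magic = ByteArray(4)
        tail.position(4)
        tail.get(magic)
        check(magic.decodeToString() == "PAR1") { "Not a Parquet file: $path" }

        // Map the footer region itself and copy it out.
        val footer = channel.map(
            FileChannel.MapMode.READ_ONLY,
            fileSize - 8 - footerLength,
            footerLength.toLong(),
        )
        ByteArray(footerLength).also { footer.get(it) }
    }
```

Only the mapped regions are touched, so nothing before the footer has to be read.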
k
Neat!