# io
h
What's the best way to read a local file with random access? My use case: I have a Parquet file that keeps its metadata at the end of the file, so I need to read the last 8 bytes to get the start offset of the metadata footer, and then read the file starting from that offset to parse the metadata. I currently have a `RawSource`, so to "start" reading at a specific offset I want to use `rawSource.readAtMostTo()` to skip the preceding bytes. But this API requires a `Buffer`, not a `RawSink`, so I can't just pass a `discardingSink()` that throws the bytes away. (I can use `discardingSink().buffered().buffer` though, but it is an internal API.)
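For context, the standard Parquet layout ends with a 4-byte little-endian length of the footer metadata followed by the 4-byte magic `PAR1`. So the last 8 bytes encode the metadata's length, and its start offset is `fileSize - 8 - length`. A minimal, dependency-free sketch of that arithmetic, assuming that layout (`footerStartOffset` is a hypothetical helper, not part of kotlinx-io):

```kotlin
// Hypothetical helper: given the file size and the file's last 8 bytes,
// compute the offset where the Parquet footer metadata starts.
// Assumed layout: ... <metadata> <4-byte little-endian length> "PAR1"
fun footerStartOffset(fileSize: Long, tail: ByteArray): Long {
    require(tail.size == 8) { "expected the last 8 bytes of the file" }
    val magic = tail.copyOfRange(4, 8).decodeToString()
    require(magic == "PAR1") { "not a Parquet file (magic was '$magic')" }
    // Assemble the little-endian 32-bit length from bytes 0..3 (as unsigned).
    val length = (tail[0].toLong() and 0xFF) or
        ((tail[1].toLong() and 0xFF) shl 8) or
        ((tail[2].toLong() and 0xFF) shl 16) or
        ((tail[3].toLong() and 0xFF) shl 24)
    val start = fileSize - 8 - length
    require(start >= 0) { "footer length $length exceeds file size $fileSize" }
    return start
}
```

For example, with a 1,000-byte file whose tail is `02 01 00 00 50 41 52 31` (length 258, then `PAR1`), `footerStartOffset(1000, tail)` returns 734.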
f
There's no good way to read files from a random offset, unfortunately. If a file is small enough to fit into memory (which may not be the case with Parquet files, I guess), you can open a source, wrap it in a buffered source (`.buffered()`), and then use a peek source for every read. Something like:
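```kotlin
import kotlinx.io.buffered
import kotlinx.io.files.SystemFileSystem

SystemFileSystem.source(...).buffered().use { src ->
    // fileSize: the file's length, e.g. SystemFileSystem.metadataOrNull(path)?.size
    val startOffset = src.peek().buffered().use { p ->
        p.skip(fileSize - 8)
        p.readLong() // or int?
    }
    // Each peek() source starts again from src's current position (offset 0 here).
    src.peek().buffered().use { p ->
        p.skip(startOffset)
        p.readMetadataHeader() // hypothetical: your own parsing code
    }
}
```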
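A fuller, runnable take on the peek-based approach, assuming the standard Parquet tail (a 4-byte little-endian metadata length followed by the magic `PAR1`) and a JVM target for `use`. `readParquetFooterBytes` is a hypothetical name; the kotlinx-io calls are `SystemFileSystem.metadataOrNull`, `source`, `buffered`, `peek`, `skip`, `readIntLe` and `readByteArray`:

```kotlin
import kotlinx.io.buffered
import kotlinx.io.files.Path
import kotlinx.io.files.SystemFileSystem
import kotlinx.io.readByteArray
import kotlinx.io.readIntLe

// Reads the raw footer metadata bytes of a Parquet file using peek() sources.
// Assumed tail layout: <metadata><4-byte LE length>"PAR1".
// Skipping through a peek source buffers everything before the target offset
// in `src`, so this only suits files that fit in memory.
fun readParquetFooterBytes(path: Path): ByteArray {
    val fileSize = SystemFileSystem.metadataOrNull(path)?.size
        ?: error("cannot read metadata of $path")
    require(fileSize >= 12) { "file too small to be Parquet" }
    return SystemFileSystem.source(path).buffered().use { src ->
        // First peek: jump to the last 8 bytes, read the length and check the magic.
        val footerLength = src.peek().use { p ->
            p.skip(fileSize - 8)
            val length = p.readIntLe().toLong() and 0xFFFFFFFFL
            val magic = p.readByteArray(4).decodeToString()
            require(magic == "PAR1") { "missing PAR1 magic" }
            length
        }
        val footerStart = fileSize - 8 - footerLength
        require(footerStart >= 0 && footerLength <= Int.MAX_VALUE) {
            "corrupt footer length $footerLength"
        }
        // Second peek: starts again at offset 0, so skip to the footer start.
        src.peek().use { p ->
            p.skip(footerStart)
            p.readByteArray(footerLength.toInt())
        }
    }
}
```

The returned bytes are the still-encoded metadata; decoding them is left to a Parquet/Thrift parser.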
> (I can use `discardingSink().buffered().buffer` though, but it is an internal api.)
It's the same as creating a `Buffer` manually and then writing into it: there's no hidden connection between the discarding sink and that buffer, so nothing will be discarded.
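To illustrate: the same skip can be done with a `Buffer()` you create yourself, reading chunks into it with `readAtMostTo` and clearing it after each chunk. A minimal sketch, assuming a plain forward-only `RawSource`; `discardBytes` and its `chunkSize` parameter are hypothetical names:

```kotlin
import kotlinx.io.Buffer
import kotlinx.io.EOFException
import kotlinx.io.RawSource

// Hypothetical helper: discard exactly `byteCount` bytes from a RawSource by
// reading them into a plain, user-created Buffer and clearing it after each chunk,
// so memory use stays around one chunk instead of growing with byteCount.
fun RawSource.discardBytes(byteCount: Long, chunkSize: Long = 8192L) {
    require(byteCount >= 0) { "byteCount < 0: $byteCount" }
    val scratch = Buffer()
    var remaining = byteCount
    while (remaining > 0) {
        val read = readAtMostTo(scratch, minOf(remaining, chunkSize))
        if (read == -1L) throw EOFException("source exhausted with $remaining bytes left to skip")
        remaining -= read
        scratch.clear() // drop the chunk; nothing else holds a reference to it
    }
}
```

Wrapping the source with `.buffered()` and calling `skip(n)` on the result does the same thing without a helper, as long as all subsequent reads also go through that buffered source.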
h
And is calling `buffer()` cheap? I thought it would load the whole file into memory (not once, but each segment over and over).
f
Yes, it'll load the whole file eventually. That's why I mentioned:
> If a file is small enough to fit into memory
kodee sad
k
Are there any plans to add mmap support? That would partially address this issue.
f
There is, yes. Here's a corresponding feature request: https://github.com/Kotlin/kotlinx-io/issues/397
blob ty sign 1
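While that request is open, a JVM-only option outside kotlinx-io is `java.nio.channels.FileChannel`, which can memory-map a region of a file. A minimal sketch, assuming the standard Parquet tail (a 4-byte little-endian metadata length followed by `PAR1`); `footerStartViaMmap` is a hypothetical name:

```kotlin
import java.nio.ByteOrder
import java.nio.channels.FileChannel
import java.nio.file.Path
import java.nio.file.StandardOpenOption

// JVM-only sketch (java.nio, not kotlinx-io): memory-map just the last 8 bytes
// of a file and decode a Parquet-style tail: <4-byte LE length>"PAR1".
// Returns the offset where the footer metadata starts.
fun footerStartViaMmap(path: Path): Long =
    FileChannel.open(path, StandardOpenOption.READ).use { channel ->
        val size = channel.size()
        require(size >= 8) { "file too small" }
        val tail = channel.map(FileChannel.MapMode.READ_ONLY, size - 8, 8)
            .order(ByteOrder.LITTLE_ENDIAN)
        val length = tail.getInt(0).toLong() and 0xFFFFFFFFL // absolute read at index 0
        val magic = String(ByteArray(4) { tail.get(4 + it) }, Charsets.US_ASCII)
        require(magic == "PAR1") { "missing PAR1 magic" }
        val start = size - 8 - length
        require(start >= 0) { "corrupt footer length $length" }
        start
    }
```

The same `map` call with the computed start and length maps the metadata itself; for plain random access without mapping, `FileChannel.read(ByteBuffer, position)` reads at an absolute offset instead.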
k
Neat!
c
Hey @Filipp Zhinkin, can you confirm that in the example you shared, the only time the file is loaded (segment by segment) into memory is when `skip()` is called, and that `peek()` and `buffered()` just wrap the source?
f
@Cherrio LLC, hey! Yes, that's correct. Sorry for the late reply.
K 1
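One way to observe this is to wrap the underlying source in a byte-counting `RawSource`. A sketch under these assumptions: `CountingSource` and `peekSkipDemo` are hypothetical illustration helpers (not part of kotlinx-io), and an in-memory `Buffer` stands in for the file:

```kotlin
import kotlinx.io.Buffer
import kotlinx.io.RawSource
import kotlinx.io.buffered

// Illustrative wrapper that counts how many bytes are pulled from `upstream`.
class CountingSource(private val upstream: RawSource) : RawSource {
    var bytesPulled: Long = 0
        private set

    override fun readAtMostTo(sink: Buffer, byteCount: Long): Long {
        val read = upstream.readAtMostTo(sink, byteCount)
        if (read > 0) bytesPulled += read
        return read
    }

    override fun close() = upstream.close()
}

// Returns (bytes pulled right after buffered() + peek(), bytes pulled after skip()).
fun peekSkipDemo(totalBytes: Int, skipBytes: Long): Pair<Long, Long> {
    val counting = CountingSource(Buffer().apply { write(ByteArray(totalBytes)) })
    return counting.buffered().use { src ->
        src.peek().use { p ->
            val beforeSkip = counting.bytesPulled // wrapping alone reads nothing
            p.skip(skipBytes)                     // pulls data into src's buffer
            beforeSkip to counting.bytesPulled
        }
    }
}
```

`peekSkipDemo(1_000_000, 999_992)` should report 0 bytes pulled before the skip and at least 999,992 after it, because the bytes skipped through the peek source end up held in `src`'s buffer.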