# io
d
Hi @Filipp Zhinkin! I'm pinging you because you seem to be the most involved in the `kotlinx-io` reincarnation. I would like to write a `kotlinx-io` adapter for `flatbuffers-kotlin`'s `ReadWriteBuffer`. To do that, I would need random-access reads and writes on buffers. Is this something on the roadmap? I stumbled upon this issue, but it was created for the previous incarnation of `kotlinx-io`. To clarify my use case:
• `flatbuffers-kotlin`'s `ReadBuffer` interface has a `limit` property indicating the read limit and only offers random-access reads (up to `limit`).
• `flatbuffers-kotlin`'s `ReadWriteBuffer` interface extends `ReadBuffer`, has a `writePosition` property (equal to `limit`), a `writeLimit` property and a `capacity` property, and offers both sequential writes (updating `writePosition` accordingly) and random-access writes (up to `capacity`).
• All read and write operations use little-endian byte order.
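To make the contract above concrete, here is a minimal model of it. The names (`LeReadBuffer`, `LeReadWriteBuffer`, `ArrayLeBuffer`) are hypothetical. The real `flatbuffers-kotlin` interfaces have many more methods, and `writeLimit` is omitted here. A plain `ByteArray` stands in for the storage. The model only encodes the three rules listed: reads are bounded by `limit`, sequential writes advance `writePosition` (which doubles as `limit`), and random-access writes are bounded by `capacity`, all in little-endian order.

```kotlin
// Hypothetical, simplified model of the contract described above.
// NOT the actual flatbuffers-kotlin ReadBuffer/ReadWriteBuffer interfaces.
interface LeReadBuffer {
    /** Reads are only valid for indices in [0, limit). */
    val limit: Int
    fun getByte(index: Int): Byte
    fun getInt(index: Int): Int // little-endian
}

interface LeReadWriteBuffer : LeReadBuffer {
    /** Next position for sequential writes; equals [limit]. */
    val writePosition: Int
    /** Random-access writes are allowed anywhere below this. */
    val capacity: Int
    fun putInt(value: Int)             // sequential: writes at writePosition, then advances it
    fun setInt(index: Int, value: Int) // random access: any index up to capacity
}

class ArrayLeBuffer(override val capacity: Int) : LeReadWriteBuffer {
    private val bytes = ByteArray(capacity)

    override var writePosition = 0
        private set
    override val limit: Int get() = writePosition

    override fun getByte(index: Int): Byte {
        require(index in 0 until limit) { "index $index out of [0, $limit)" }
        return bytes[index]
    }

    override fun getInt(index: Int): Int {
        require(index >= 0 && index + 4 <= limit) { "int at $index exceeds limit $limit" }
        // Least significant byte first (little-endian).
        return (bytes[index].toInt() and 0xFF) or
            ((bytes[index + 1].toInt() and 0xFF) shl 8) or
            ((bytes[index + 2].toInt() and 0xFF) shl 16) or
            ((bytes[index + 3].toInt() and 0xFF) shl 24)
    }

    override fun setInt(index: Int, value: Int) {
        require(index >= 0 && index + 4 <= capacity) { "int at $index exceeds capacity $capacity" }
        for (i in 0 until 4) bytes[index + i] = (value ushr (8 * i)).toByte()
    }

    override fun putInt(value: Int) {
        setInt(writePosition, value)
        writePosition += 4
    }
}
```

In this model, a random-access write beyond `writePosition` does not move `limit`. The written bytes only become readable once sequential writes catch up.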
f
Hi @Didier Villevalois! Currently, `kotlinx.io.Buffer` provides an API for random-access reads. There's a plan to augment that API with something similar to Okio's `UnsafeCursor` (but a bit safer) as a more performant alternative to `Buffer::get`. There were no plans to provide random-access writes, but that could be reconsidered if there's demand for it. Could you please elaborate a bit on what `kotlinx-io` adapters for FlatBuffers would look like? I see a few other potential issues there besides the missing random-access writes.
d
@Filipp Zhinkin, thanks a lot for your reply. Indeed, I missed `Buffer::get`, as I was looking for `getXXX` versions for all `XXX` primitive types (or rather the `Le`-suffixed versions). So I will have to implement those first. Do you think I could contribute those as extension functions directly in `kotlinx-io`? (If I understand correctly, they would later have to be rewritten to use the unsafe cursor so that they only `seek` once...) As for what the adapters would look like, I was naively thinking of something like:
```kotlin
fun Source.asFlatbuffersReadBuffer(): ReadBuffer
fun Sink.asFlatbuffersReadWriteBuffer(): ReadWriteBuffer
```
where `ReadBuffer` and `ReadWriteBuffer` are the two interfaces defined at the top of https://github.com/google/flatbuffers/blob/master/kotlin/flatbuffers-kotlin/src/commonMain/kotlin/com/google/flatbuffers/kotlin/Buffers.kt. Do you think I am on the right track? What potential issues do you see? Thanks for your help.
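The `Le`-suffixed getters mentioned above can be layered on top of any single-byte random-access getter. The sketch uses hypothetical top-level functions that take the getter as a `(Long) -> Byte` parameter, so it runs without `kotlinx-io`. With a `kotlinx.io.Buffer` one would pass a bound reference such as `buffer::get`, assuming the `Buffer::get` discussed here takes a `Long` position and returns a `Byte`. Each multi-byte read still performs one lookup per byte, which is exactly the cost a cursor would avoid.

```kotlin
// Hypothetical little-endian getters on top of a single-byte random-access accessor.
// Every call below performs one `get` (and hence one lookup) per byte.
fun getShortLe(get: (Long) -> Byte, position: Long): Short =
    ((get(position).toInt() and 0xFF) or
        ((get(position + 1).toInt() and 0xFF) shl 8)).toShort()

fun getIntLe(get: (Long) -> Byte, position: Long): Int {
    var result = 0
    for (i in 0 until 4) {
        // Byte i contributes bits [8*i, 8*i + 8): least significant byte first.
        result = result or ((get(position + i).toInt() and 0xFF) shl (8 * i))
    }
    return result
}

fun getLongLe(get: (Long) -> Byte, position: Long): Long {
    var result = 0L
    for (i in 0 until 8) {
        result = result or ((get(position + i).toLong() and 0xFFL) shl (8 * i))
    }
    return result
}

// Floating-point values reuse the integer readers and reinterpret the bits.
fun getFloatLe(get: (Long) -> Byte, position: Long): Float = Float.fromBits(getIntLe(get, position))
fun getDoubleLe(get: (Long) -> Byte, position: Long): Double = Double.fromBits(getLongLe(get, position))
```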
f
> Do you think I am on the right track?

`Sink` and `Source` are more like Java's `OutputStream` and `InputStream`, so `Buffer` would be a better counterpart for `ReadBuffer` and `ReadWriteBuffer`. That said, the extensions you've mentioned could simply wrap the internal buffer of the `Sink` or `Source`.
> What potential issues do you see?

One of the main issues with `Buffer::get` is its poor performance: every `get` call requires a segment lookup. And it's hard to get rid of that lookup, as indices are relative to the beginning of the buffer, and the beginning advances on every read. The latter may also be an issue for a `Sink`/`Source` wrapped into a `ReadBuffer`/`ReadWriteBuffer`. I don't think anyone will actually use a `Source` and its `ReadBuffer` view simultaneously, but every read from the `Source` will invalidate the corresponding `ReadBuffer` (and the same is true for `Sink`s and `ReadWriteBuffer`s). For `Sink`s, it could also be problematic to implement `requestCapacity`, as the only way to extend a `Sink`'s buffer is to write data into it. And yet another possible issue off the top of my head: a `Source` could be infinite or, more realistically, may not have a predefined size (for example, when it reads data from a network). It's unclear how `ReadBuffer::limit` should work in that case.
I don't have a lot of experience with FlatBuffers and don't see the full picture, so what I propose may sound silly. But instead of representing `Sink`/`Source`/`Buffer` as `ReadBuffer`/`ReadWriteBuffer` and dealing with their semantic differences, it may be better to support reading a `ReadBuffer` from a `Source`, and writing it and a `ReadWriteBuffer` to a `Sink`. The main issue here would be array copying, but that is something that could be addressed (relatively easily) in the near future.
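Both problems can be seen on a simplified model of a segmented buffer, a hypothetical `SegmentedBuffer` that is not `kotlinx-io`'s implementation. `get` has to walk the segment list from the head on each call. A consuming `readByte` shifts every position, so an index computed earlier now points at a different byte.

```kotlin
// Simplified model (not kotlinx-io's implementation): a buffer as a list of segments
// where positions are relative to the current head and reads consume from the head.
class SegmentedBuffer(segments: List<ByteArray>) {
    private val segments = ArrayDeque(segments.filter { it.isNotEmpty() }.map { it.copyOf() })
    private var headOffset = 0 // bytes already consumed from segments.first()

    val size: Long get() = segments.sumOf { it.size.toLong() } - headOffset

    /** Random-access read: walks segments from the head on every call, O(number of segments). */
    operator fun get(position: Long): Byte {
        require(position in 0 until size) { "position $position out of bounds" }
        var remaining = position + headOffset
        for (segment in segments) {
            if (remaining < segment.size) return segment[remaining.toInt()]
            remaining -= segment.size
        }
        error("unreachable: position was bounds-checked")
    }

    /** Sequential read: consumes one byte, shifting every later position down by one. */
    fun readByte(): Byte {
        val segment = segments.first()
        val b = segment[headOffset++]
        if (headOffset == segment.size) {
            segments.removeFirst()
            headOffset = 0
        }
        return b
    }
}
```

The second `check` in the usage below is the invalidation problem in miniature. Position `2` refers to a different byte after a single `readByte()`.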
> Indeed, I missed `Buffer::get` as I was looking for `getXXX` versions for all `XXX` primitive types (or rather the `Le`-suffixed versions). So I will have to implement those first. Do you think I could contribute those as extension functions directly in `kotlinx-io`? (If I understand correctly, they would have to be rewritten to use unsafe cursor later to only `seek` once...)

One of the reasons there are no such functions is the poor performance of random-access reads from `Buffer`, which makes `Buffer::get` almost impractical.
a
This discussion looks familiar. My solution to my own issue is here: https://github.com/SciProgCentre/dataforge-core/blob/kotlinx-io/dataforge-io/src/commonMain/kotlin/space/kscience/dataforge/io/Binary.kt (this is an implementation on top of the new `kotlinx-io` API). It would be nice to have it in the core API, because that would allow optimizing it for zero-copy use with random-access sources like files.
d
@Filipp Zhinkin, thank you very much for this detailed answer, and sorry for the late reply.

> One of the reasons there are no such functions is the poor performance of random-access reads from `Buffer`, which makes `Buffer::get` almost impractical.

I understand, and looking at the implementation details, this makes total sense.
> [...] it may be better to support reading `ReadBuffer` from `Source` and writing it and `ReadWriteBuffer` to `Sink`.
I see. So let me focus on reading a `ReadBuffer` from a `Source`. I experimented with adding read-only views with random-access reads from a `Source`. Would it make sense to add the following to `kotlinx-io`?
• Defining a `BufferView` (bad tentative name) with random access.
• In order to support views larger than a buffer's segment size, we would build a segment index array and use a binary search in that index. This would give `O(log n)` index indirection (where `n` is the number of segments).
• "Reading" a `BufferView` from a `Source` by creating a private buffer and wrapping it in a `BufferView`.
I toyed with this idea in the following branch: https://github.com/ptitjes/kotlinx-io/tree/read-random-access. Do you think it makes sense? Could something similar be added to `kotlinx-io`? (By the way, this could use the API you experimented with in the `segments-iter` branch.)
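The segment index from the second bullet could look roughly like the following. `IndexedView` is a hypothetical name, and this is not the code on the linked branch. Start offsets are computed once in O(n), and each `get` then binary-searches them in O(log n).

```kotlin
// Hypothetical read-only view (not the code on the linked branch): segment start
// offsets are precomputed once, and each get() binary-searches them in O(log n).
class IndexedView(source: List<ByteArray>) {
    private val segments: List<ByteArray> = source.filter { it.isNotEmpty() }
    // starts[i] = absolute offset of the first byte of segments[i] (strictly increasing)
    private val starts = LongArray(segments.size)
    val size: Long

    init {
        var offset = 0L
        for (i in segments.indices) {
            starts[i] = offset
            offset += segments[i].size
        }
        size = offset
    }

    operator fun get(index: Long): Byte {
        if (index < 0 || index >= size) throw IndexOutOfBoundsException("index $index, size $size")
        // Binary search for the last segment whose start is <= index.
        var lo = 0
        var hi = starts.size - 1
        while (lo < hi) {
            val mid = (lo + hi + 1) ushr 1
            if (starts[mid] <= index) lo = mid else hi = mid - 1
        }
        return segments[lo][(index - starts[lo]).toInt()]
    }
}
```

Compared with the cursor approach, the index costs one `LongArray` per view but makes truly random (non-local) access patterns cheap as well.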
f
@Didier Villevalois nice idea! In general, I like it a lot. What concerns me a bit is that there will be no way to invalidate a view once it is created, if a user reads data from the buffer (for some reason) and then attempts to use the same view again. It's unlikely that someone will intentionally write code like that, but as a library, we have to anticipate such cases.
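One common way to detect this kind of misuse is a fail-fast version check, similar in spirit to `ConcurrentModificationException` in Java collections. The owner increments a counter on every mutation, and a view remembers the counter value it was created with. The classes below (`VersionedBuffer`, `CheckedView`) are hypothetical. The consuming read copies the array only to keep the example short.

```kotlin
// Hypothetical fail-fast invalidation: every mutation of the owner bumps `version`,
// and a view captured at an older version refuses to read.
class VersionedBuffer(initial: ByteArray) {
    private var bytes = initial.copyOf()
    var version = 0
        private set

    val size: Int get() = bytes.size

    /** Consuming read (array copy only for brevity); invalidates existing views. */
    fun readByte(): Byte {
        check(bytes.isNotEmpty()) { "buffer is exhausted" }
        val b = bytes[0]
        bytes = bytes.copyOfRange(1, bytes.size)
        version++
        return b
    }

    internal fun peek(index: Int): Byte = bytes[index]

    fun view(): CheckedView = CheckedView(this, version)
}

class CheckedView internal constructor(
    private val owner: VersionedBuffer,
    private val expectedVersion: Int,
) {
    operator fun get(index: Int): Byte {
        if (owner.version != expectedVersion) {
            throw IllegalStateException("view invalidated: the underlying buffer was modified")
        }
        return owner.peek(index)
    }
}
```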
> (Btw, this could use the API you experimented in the `segments-iter` branch.)

It's for https://github.com/Kotlin/kotlinx-io/issues/135, which aims to provide a public API not only for iterating over segments, but also for accessing individual bytes within a segment.
> I toyed with this idea in the following branch: https://github.com/ptitjes/kotlinx-io/tree/read-random-access. Do you think it makes sense? Could a similar thing be added to `kotlinx-io`?

Do you mind opening an issue on GitHub to loop other potentially interested folks into the discussion? I think such an API might be useful, but it might make sense to wait until the Buffers/Segments API update I mentioned above is complete.
d
> What concerns me a bit is that there will be no way to invalidate a view once it was created, a user read data from the buffer (for some reason) and attempted to use the same view again. Unlikely someone will intentionally write a code like that, but for a library, we have to anticipate such cases.

I am not sure I follow your reasoning, as the buffer instance is fully private. But it is true that I did not include a way to release the segments, as I was focusing on random access. I expanded upon this on my branch:
• made `BufferView` implement `AutoCloseableAlias` (closing it clears the buffer),
• added a `BufferView.slice(startIndex: Long, endIndex: Long): BufferView` that uses `Buffer.copyTo(...)` (and hence uses shared copies of the segments),
• added a `Buffer.copyToView(startIndex: Long, endIndex: Long): BufferView` convenience method that also uses `Buffer.copyTo(...)`,
• factored out byte reinterpretation operations (to avoid duplicating logic between `Buffer` and `BufferView`),
• added `TODO`s referencing https://github.com/Kotlin/kotlinx-io/issues/135 where appropriate.
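The close/slice semantics described above can be sketched with a hypothetical `SliceableView` backed by a single `ByteArray`. Slices share the backing storage instead of copying bytes, loosely mirroring shared segment copies. `close()` drops only the closed view's reference. The sketch uses the JVM's `AutoCloseable` instead of `AutoCloseableAlias`, and it is not the code on the branch.

```kotlin
// Hypothetical closeable, sliceable view: slices share the backing array (no byte
// copying), and close() releases only this view's reference to it.
class SliceableView private constructor(
    private var backing: ByteArray?,
    private val start: Int,
    val size: Int,
) : AutoCloseable {
    constructor(bytes: ByteArray) : this(bytes, 0, bytes.size)

    operator fun get(index: Int): Byte {
        val data = checkNotNull(backing) { "view is closed" }
        if (index !in 0 until size) throw IndexOutOfBoundsException("index $index, size $size")
        return data[start + index]
    }

    /** Returns a view of [startIndex, endIndex) that shares this view's backing array. */
    fun slice(startIndex: Int, endIndex: Int): SliceableView {
        val data = checkNotNull(backing) { "view is closed" }
        require(startIndex in 0..endIndex && endIndex <= size) { "invalid range [$startIndex, $endIndex)" }
        return SliceableView(data, start + startIndex, endIndex - startIndex)
    }

    /** Releases this view's reference; slices taken earlier remain valid. */
    override fun close() {
        backing = null
    }
}
```

Because every slice holds its own reference, the storage is only eligible for collection once all views over it are closed. This is the same lifetime question that segment sharing raises in a real buffer.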
> Do you mind opening an issue on github to loop in other potentially interested folks into a discussion?

I created this issue: https://github.com/Kotlin/kotlinx-io/issues/225

> I think such an API might be useful, but it might make sense to wait until Buffers/Segments API update I mentioned above is complete.

Indeed. Do you need some help with #135? I would be glad to contribute to that too. (But I don't want to put any more pressure on you! :P)
f
> I created this issue: https://github.com/Kotlin/kotlinx-io/issues/225

Thanks!

> Indeed. Would you need some help on #135? I would be glad to contribute to this too. (But I don't want to put any more pressure on you! :P)

The task is currently in the design phase. I'll publish an update, along with a prototype implementation, as soon as it is more or less ready for public review. I'd really appreciate any suggestions and reviews then!