Hi < Filipp Zhinkin> I ping you because you seem to be the m kotlinlang #io

Didier Villevalois

09/11/2023, 10:47 AM

Hi @Filipp Zhinkin! I ping you because you seem to be the most involved in the `kotlinx-io` reincarnation. I would like to write a `kotlinx-io` adapter for `flatbuffers-kotlin`'s `ReadWriteBuffer`. To do that I would need random-access read and write to buffers. Is this something that is on the roadmap? I stumbled upon this issue but it was created for the previous incarnation of `kotlinx-io`. To clarify my use-case:

• `flatbuffers-kotlin`'s `ReadBuffer` interface has a `limit` property indicating the read limit and only offers random-access reads (up to `limit`).

• `flatbuffers-kotlin`'s `ReadWriteBuffer` interface extends `ReadBuffer`, has a `writePosition` (equals `limit`) property, a `writeLimit` property and a `capacity` property, and offers both sequential writes (updating `writePosition` adequately) and random-access writes (up to `capacity`).

• All read and write operations are in Little Endian byte order.
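To make the semantics in the list above concrete, here is a minimal sketch over a plain `ByteArray`. The class and its method names (`ArrayReadWriteBuffer`, `getInt`, `setInt`, `putInt`) are hypothetical and are not the actual `flatbuffers-kotlin` API; `writeLimit` is left out for brevity.

```kotlin
// Hypothetical, simplified model of the described semantics.
// Not the flatbuffers-kotlin interfaces; writeLimit is omitted.
class ArrayReadWriteBuffer(val capacity: Int) {
    private val data = ByteArray(capacity)

    // Sequential writes advance writePosition, which doubles as the read limit.
    var writePosition: Int = 0
        private set
    val limit: Int get() = writePosition

    // Random-access little-endian read, bounded by limit.
    fun getInt(index: Int): Int {
        require(index >= 0 && index + 4 <= limit) { "read past limit" }
        return (data[index].toInt() and 0xFF) or
            ((data[index + 1].toInt() and 0xFF) shl 8) or
            ((data[index + 2].toInt() and 0xFF) shl 16) or
            ((data[index + 3].toInt() and 0xFF) shl 24)
    }

    // Random-access little-endian write, bounded by capacity.
    // It does not move writePosition.
    fun setInt(index: Int, value: Int) {
        require(index >= 0 && index + 4 <= capacity) { "write past capacity" }
        for (i in 0 until 4) data[index + i] = (value ushr (8 * i)).toByte()
    }

    // Sequential write: stores at writePosition, then advances it.
    fun putInt(value: Int) {
        setInt(writePosition, value)
        writePosition += 4
    }
}
```

The point of the sketch is only that reads are capped by `limit` while random-access writes are capped by `capacity`, which is the part that has no obvious counterpart in a stream-like API.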

Filipp Zhinkin

09/11/2023, 11:27 AM

Hi @Didier Villevalois! Currently, `kotlinx.io.Buffer` provides an API for random-access reads. There's a plan to augment that API with something similar to Okio's UnsafeCursor (but a bit safer) as a more performant alternative to `Buffer::get`. There were no plans to provide random-access writes, but it could be reconsidered if there's a demand for it. Could you please elaborate a bit on what `kotlinx-io` adapters for FlatBuffers would look like? I see a few other potential issues there besides missing random-access writes.

Didier Villevalois

09/11/2023, 11:45 AM

@Filipp Zhinkin, thanks a lot for your reply. Indeed, I missed `Buffer::get` as I was looking for `getXXX` versions for all `XXX` primitive types (or rather the `Le`-suffixed versions). So I will have to implement those first. Do you think I could contribute those as extension functions directly in `kotlinx-io`? (If I understand correctly, they would have to be rewritten to use unsafe cursor later to only `seek` once...) As for what the adapters would look like, I was naively thinking of something like:

fun Source.asFlatbuffersReadBuffer(): ReadBuffer
fun Sink.asFlatbuffersReadWriteBuffer(): ReadWriteBuffer

where `ReadBuffer` and `ReadWriteBuffer` are the two interfaces defined at the top of https://github.com/google/flatbuffers/blob/master/kotlin/flatbuffers-kotlin/src/commonMain/kotlin/com/google/flatbuffers/kotlin/Buffers.kt. Do you think I am on the right track? What potential issues do you see? Thanks for your help.
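As an illustration of the `Le`-suffixed getters mentioned above, here is a rough sketch built on a byte-at-a-time accessor. `ByteAccess` and `ArrayByteAccess` are hypothetical stand-ins for anything that exposes a `get(index)` in the spirit of `Buffer::get`; they are not `kotlinx-io` types, and the extension names are assumptions.

```kotlin
// Hypothetical stand-in for a type with byte-wise random access.
interface ByteAccess {
    operator fun get(index: Long): Byte
}

// Trivial array-backed implementation, only here to make the sketch usable.
class ArrayByteAccess(private val bytes: ByteArray) : ByteAccess {
    override fun get(index: Long): Byte = bytes[index.toInt()]
}

// Each getter assembles the value from single-byte reads, least significant byte first.
fun ByteAccess.getShortLe(index: Long): Short {
    val b0 = this[index].toInt() and 0xFF
    val b1 = this[index + 1].toInt() and 0xFF
    return (b0 or (b1 shl 8)).toShort()
}

fun ByteAccess.getIntLe(index: Long): Int {
    var result = 0
    for (i in 0 until 4) {
        result = result or ((this[index + i].toInt() and 0xFF) shl (8 * i))
    }
    return result
}

fun ByteAccess.getLongLe(index: Long): Long {
    var result = 0L
    for (i in 0 until 8) {
        result = result or ((this[index + i].toLong() and 0xFFL) shl (8 * i))
    }
    return result
}
```

Note that `getLongLe` performs eight independent `get` calls here, which is why a cursor that seeks only once would be preferable.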

Filipp Zhinkin

09/11/2023, 1:08 PM

Do you think I am on the right track?

`Sink` and `Source` are more like `Input-` and `OutputStream` in Java, so `Buffer` would be a better counterpart for `Read-` and `ReadWriteBuffer`. Although, the extensions you've mentioned could simply wrap Sink's and Source's internal buffer.

What potential issues do you see?

One of the main issues with `Buffer::get` is its poor performance: every `get` call requires segment lookup. And it's hard to get rid of that lookup as indices are relative to the beginning of a buffer and it advances on every read. The latter may also be an issue for a Sink/Source wrapped into a ReadBuffer/ReadWriteBuffer: I don't think that someone will actually use a Source and ReadBuffer-view simultaneously, but every read from the Source will invalidate the corresponding ReadBuffer (and the same is true for Sinks and ReadWriteBuffers).

For Sinks it could also be problematic to implement `requestCapacity` as the only way to extend Sink's buffer is to write data into it. And yet another possible issue off the top of my head: a Source could be infinite, or more realistically, it may not have a predefined size (for example, when a source reads data from a network). It's unclear how `ReadBuffer::limit` should work in that case.

I don't have a lot of experience with flatbuffers and don't see the full picture, so what I'd propose may sound silly, but instead of representing Sink/Source/Buffer as Read/ReadWriteBuffers and dealing with their semantic differences, it may be better to support reading `ReadBuffer` from `Source` and writing it and `ReadWriteBuffer` to `Sink`. The main issue here would be array copying, but it is something that could be addressed (relatively easily) in the near future.

Indeed, I missed `Buffer::get` as I was looking for `getXXX` versions for all `XXX` primitive types (or rather the `Le`-suffixed versions). So I will have to implement those first. Do you think I could contribute those as extension functions directly in `kotlinx-io`? (If I understand correctly, they would have to be rewritten to use unsafe cursor later to only `seek` once...)

One of the reasons there are no such functions is the poor performance of random-access reads from `Buffer`, which makes `Buffer::get` almost impractical.
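The two problems described in this message (a segment lookup on every `get`, and indices shifting whenever the head is consumed) can be illustrated with a toy model. `ToySegmentedBuffer` is hypothetical and deliberately simplified; it is not how `kotlinx-io` implements `Buffer`.

```kotlin
// Hypothetical toy model of a segmented buffer, for illustration only.
class ToySegmentedBuffer(private val segmentSize: Int) {
    private val segments = ArrayDeque<ByteArray>()
    private var headOffset = 0 // bytes already consumed from the first segment

    var size: Long = 0
        private set

    // Counts how many segments random-access reads had to walk through.
    var segmentsVisited: Long = 0
        private set

    // Appends the bytes as new segments of at most segmentSize bytes each.
    fun write(bytes: ByteArray) {
        var from = 0
        while (from < bytes.size) {
            val to = minOf(from + segmentSize, bytes.size)
            segments.addLast(bytes.copyOfRange(from, to))
            from = to
        }
        size += bytes.size
    }

    // Index 0 is the current head, so every call walks segments from the front.
    operator fun get(index: Long): Byte {
        require(index in 0 until size) { "index out of bounds" }
        var remaining = index + headOffset
        for (segment in segments) {
            segmentsVisited++
            if (remaining < segment.size) return segment[remaining.toInt()]
            remaining -= segment.size
        }
        error("unreachable: index was checked against size")
    }

    // Consumes one byte from the head; every later index shifts down by one.
    fun readByte(): Byte {
        check(size > 0) { "buffer is empty" }
        val head = segments.first()
        val value = head[headOffset++]
        if (headOffset == head.size) {
            segments.removeFirst()
            headOffset = 0
        }
        size--
        return value
    }
}
```

In this model, any index captured before a `readByte()` call refers to a different byte afterwards, which is the invalidation problem for a view layered on top of a Source's internal buffer.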

altavir

09/11/2023, 6:20 PM

I thought that discussion looked familiar. My solution to my own issue is here: https://github.com/SciProgCentre/dataforge-core/blob/kotlinx-io/dataforge-io/src/commonMain/kotlin/space/kscience/dataforge/io/Binary.kt (this is an implementation on top of the new kotlinx-io API). It would be nice to have it in the core API because that would allow optimizing it for zero-copy use in random-access sources like files.

Didier Villevalois

09/22/2023, 8:42 AM

@Filipp Zhinkin thank you very much for this detailed answer and sorry for the late reply.

One of the reasons there are no such functions is the poor performance of random-access reads from `Buffer`, which makes `Buffer::get` almost impractical.

I understand and looking at the implementation details, this makes total sense.

[...] it may be better to support reading `ReadBuffer` from `Source` and writing it and `ReadWriteBuffer` to `Sink`.

I see. So let me focus on reading `ReadBuffer` from `Source`. I experimented with adding read-only views with random-access reads from `Source`. Would it make sense to add the following to `kotlinx-io`?

• Defining a `BufferView` (bad tentative name) with random-access.

• In order to support views larger than a buffer's segment size, we would make a segment index array and use a binary search in that index. This would give something like `O(log n)` index indirection (where `n` is the number of segments).

• "Reading" a `BufferView` from a `Source`, by creating a private buffer and wrapping it in a `BufferView`.

I toyed with this idea in the following branch: https://github.com/ptitjes/kotlinx-io/tree/read-random-access. Do you think it makes sense? Could a similar thing be added to `kotlinx-io`? (Btw, this could use the API you experimented in the `segments-iter` branch.)
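For reference, the segment index plus binary search idea could look roughly like the sketch below. `IndexedView` is a hypothetical name and this is not the code from the linked branch; segments are modelled as plain byte arrays.

```kotlin
// Hypothetical sketch of a read-only view with an index over its segments.
class IndexedView(private val segments: List<ByteArray>) {
    // starts[i] is the absolute index of the first byte of segments[i].
    private val starts = LongArray(segments.size)
    val size: Long

    init {
        var offset = 0L
        for (i in segments.indices) {
            starts[i] = offset
            offset += segments[i].size
        }
        size = offset
    }

    // O(log n) lookup in the number of segments, instead of a linear walk.
    operator fun get(index: Long): Byte {
        require(index in 0 until size) { "index out of bounds" }
        var low = 0
        var high = segments.size - 1
        // Find the last segment whose start is <= index.
        while (low < high) {
            val mid = (low + high + 1) ushr 1
            if (starts[mid] <= index) low = mid else high = mid - 1
        }
        return segments[low][(index - starts[low]).toInt()]
    }
}
```

Because the view owns its own list of segments and never consumes from it, the indices stay stable for the lifetime of the view.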

Filipp Zhinkin

09/22/2023, 3:42 PM

@Didier Villevalois nice idea! In general, I like it a lot. What concerns me a bit is that there will be no way to invalidate a view if, after it was created, a user reads data from the buffer (for some reason) and then attempts to use the same view again. It's unlikely that someone will intentionally write code like that, but for a library, we have to anticipate such cases.

(Btw, this could use the API you experimented in the `segments-iter` branch.)

It's for https://github.com/Kotlin/kotlinx-io/issues/135 which aimed to provide a public API not only for iterating over segments, but also for accessing individual bytes within the segment.

I toyed with this idea in the following branch: https://github.com/ptitjes/kotlinx-io/tree/read-random-access. Do you think it makes sense? Could a similar thing be added to `kotlinx-io`?

Do you mind opening an issue on github to loop in other potentially interested folks into a discussion? I think such an API might be useful, but it might make sense to wait until Buffers/Segments API update I mentioned above is complete.

Didier Villevalois

09/23/2023, 9:31 AM

What concerns me a bit is that there will be no way to invalidate a view if, after it was created, a user reads data from the buffer (for some reason) and then attempts to use the same view again. It's unlikely that someone will intentionally write code like that, but for a library, we have to anticipate such cases.

I am not sure that I follow your reasoning, as the buffer instance is fully private. But it is true that I did not include a way to release the segments as I was focusing on the random-access. I expanded upon this on my branch:

• made `BufferView` implement `AutoCloseableAlias`, by clearing the buffer,

• added a `BufferView.slice(startIndex: Long, endIndex: Long): BufferView` that uses `Buffer.copyTo(...)` (and hence sharedCopies the segments),

• added a `Buffer.copyToView(startIndex: Long, endIndex: Long): BufferView` convenience method that also uses `Buffer.copyTo(...)`,

• factored out byte reinterpretation operations (to avoid duplicated logic between `Buffer` and `BufferView`),

• added `TODO`s referencing https://github.com/Kotlin/kotlinx-io/issues/135, where appropriate.
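A much simplified sketch of the closeable view with `slice` could look like this. `ClosableView` is hypothetical and backed by a single shared `ByteArray` with `Int` indices, whereas the branch works on buffer segments with `Long` indices; it only illustrates that a slice shares storage with its parent and that a closed view rejects further reads.

```kotlin
// Hypothetical sketch; not the BufferView from the branch.
class ClosableView private constructor(
    private val data: ByteArray, // shared between a view and its slices, never copied
    private val offset: Int,
    val size: Int,
) : AutoCloseable {
    constructor(data: ByteArray) : this(data, 0, data.size)

    private var closed = false

    operator fun get(index: Int): Byte {
        check(!closed) { "view is closed" }
        require(index in 0 until size) { "index out of bounds" }
        return data[offset + index]
    }

    // Returns a narrower view over the same storage; endIndex is exclusive.
    fun slice(startIndex: Int, endIndex: Int): ClosableView {
        check(!closed) { "view is closed" }
        require(startIndex in 0..endIndex && endIndex <= size) { "invalid range" }
        return ClosableView(data, offset + startIndex, endIndex - startIndex)
    }

    // Marks this view as unusable; slices keep their own lifecycle.
    override fun close() {
        closed = true
    }
}
```

In this sketch closing a parent does not close its slices; whether that is the desired behaviour is a design question the sketch does not try to settle.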

Do you mind opening an issue on github to loop in other potentially interested folks into a discussion?

I created this issue: https://github.com/Kotlin/kotlinx-io/issues/225

I think such an API might be useful, but it might make sense to wait until Buffers/Segments API update I mentioned above is complete.

Indeed. Would you need some help on #135? I would be glad to contribute to this too. (But I don't want to put any more pressure on you! :P)

Filipp Zhinkin

09/25/2023, 3:23 PM

I created this issue: https://github.com/Kotlin/kotlinx-io/issues/225

Thanks!

Indeed. Would you need some help on #135? I would be glad to contribute to this too. (But I don't want to put any more pressure on you! :P)

The task is currently in a design phase, I'll publish an update as soon as it is more or less ready for public review along with a prototype implementation. Will appreciate any suggestions as well as reviews a lot then!
