Just realised that `BufferedReader.forEachLine` do... (kotlinlang #getting-started)

Rob Elliot

11/24/2021, 11:44 AM

Just realised that `BufferedReader.forEachLine` doesn’t distinguish between `last line\nEOF` and `last lineEOF`: in both cases the last line emitted is `last line`, whether it ends in a line feed or not. This is frustrating if you need to capture the output of the `Reader` exactly, but would also like to buffer based on line feeds rather than an arbitrary number of characters. Anyone know of a workaround? (It’s nearly NOT KOTLIN, I know, because it just delegates to `java.io.BufferedReader.readLine`… hope the fact that Kotlin has extension methods calling it justifies asking?!)
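For reference, the behaviour described above can be reproduced with a few lines. This is only an illustrative sketch: `linesOf` is a hypothetical helper name, and a `StringReader` stands in for whatever `Reader` is actually being consumed.

```kotlin
import java.io.StringReader

// Hypothetical helper: collects every line that forEachLine emits for the given text.
fun linesOf(text: String): List<String> {
    val lines = mutableListOf<String>()
    StringReader(text).buffered().forEachLine { lines += it }
    return lines
}

fun main() {
    // Both inputs yield the same single line; the trailing line feed is lost.
    println(linesOf("last line\n")) // [last line]
    println(linesOf("last line"))   // [last line]
}
```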

ephemient

11/24/2021, 12:23 PM

no, not really. this may not help you, but https://en.wikipedia.org/wiki/Text_file#Unix_text_files

POSIX defines a text file as a file that contains characters organized into zero or more lines, where lines are sequences of zero or more non-newline characters plus a terminating newline character

a file that ends in a non-newline character is not a "text" file, and most line-oriented processing tools (such as `sed`) don't preserve final lines without newlines either

Rob Elliot

11/24/2021, 1:56 PM

I decided in the end just to use `read()` directly. It was for capturing a child process’s standard out in memory while also passing it to the current process’s standard out, and I realised that for e.g. progress bars it was going to be rubbish to only emit at line feed anyway.
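A minimal sketch of what such a `read()`-based tee might look like, assuming the child's output is already wrapped in a `Reader`. The name `teeChars` and the 1024-char buffer size are illustrative choices, not taken from the thread, and a `StringReader` stands in for the process output.

```kotlin
import java.io.Reader
import java.io.StringReader
import java.io.StringWriter
import java.io.Writer

// Hypothetical helper: forwards everything from `source` to `sink` as soon as it
// arrives and returns an exact in-memory copy, trailing newline (or lack of it) included.
fun teeChars(source: Reader, sink: Writer): String {
    val captured = StringBuilder()
    val buffer = CharArray(1024)
    while (true) {
        val n = source.read(buffer) // returns -1 at end of stream
        if (n < 0) break
        sink.write(buffer, 0, n)
        sink.flush()                // emit partial lines (e.g. progress bars) promptly
        captured.append(buffer, 0, n)
    }
    return captured.toString()
}

fun main() {
    val echoed = StringWriter()
    val withNewline = teeChars(StringReader("hello world\n"), echoed)
    val withoutNewline = teeChars(StringReader("hello world"), echoed)
    println(withNewline.endsWith("\n"))    // true
    println(withoutNewline.endsWith("\n")) // false
}
```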

Rob Elliot

11/24/2021, 1:57 PM

(I want the captured output of `echo 'hello world'` and `printf 'hello world'` to contain the fact that the first has a newline and the second does not)

ephemient

11/24/2021, 1:58 PM

you'd have an easier time passing through as a binary InputStream/OutputStream, I suspect

ephemient

11/24/2021, 2:00 PM

(of course you can tee the bytestream through a CharsetDecoder or similar to get chars for your own use)

Rob Elliot

11/24/2021, 3:09 PM

Good ideas, thanks. Ideally I want to store the in-memory capture in some kind of ring buffer so that it’s not unbounded, so I think it would need to be char-based rather than byte-based; otherwise, if the encoding were UTF-8, dropping some early bytes might render the result undecodable.
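One possible shape for such a char-based ring buffer, as a rough sketch only. `CharRingBuffer` is a hypothetical class, not something from the thread or a library. Note that a Kotlin `Char` is a UTF-16 code unit, so evicting the oldest char could still split a surrogate pair at the start of the buffer.

```kotlin
// Hypothetical helper: keeps only the most recent `capacity` chars.
class CharRingBuffer(private val capacity: Int) {
    init { require(capacity > 0) { "capacity must be positive" } }

    private val chars = CharArray(capacity)
    private var next = 0 // index where the next char will be written
    private var size = 0 // number of valid chars, never more than capacity

    fun add(c: Char) {
        chars[next] = c
        next = (next + 1) % capacity
        if (size < capacity) size++
    }

    fun add(text: CharSequence) {
        for (c in text) add(c)
    }

    // Returns the retained chars, oldest first.
    override fun toString(): String {
        val start = if (size < capacity) 0 else next
        val sb = StringBuilder(size)
        for (i in 0 until size) sb.append(chars[(start + i) % capacity])
        return sb.toString()
    }
}

fun main() {
    val ring = CharRingBuffer(5)
    ring.add("hello world")
    println(ring) // world
}
```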

ephemient

11/24/2021, 3:56 PM

example that can pass process output directly, while also safely decoding UTF-8 for your own use, even if it contains binary garbage

err: "4+0 records in\n4+0 records out\n64 bytes transferred in 0.000016 secs (3947580 bytes/sec)\n"
out: "\r\uFFFD\uFFFD8\uFFFDd\uFFFD&\uFFFD\uFFFDA\uFFFD\u0580l\uFFFD\u0000\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\u0007\uFFFD`\uFFFD\uFFFDQB\u0000v\uFFFD\uFFFD\uFFFD\u491BD\uFFFD\uFFFDi\uFFFD\uFFFDe\uFFFD_\uFFFD)\uFFFD\uFFFD\u000Fy\u001C4\u0004\uFFFD\uFFFD\u0007Q\uFFFD\uFFFD#"
out: "\uFFFD"
exitCode: 0

(U+FFFD being the default replacement character)

Untitled.cpp
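The attached snippet itself is not reproduced in this log. As a rough sketch of the tee-and-decode idea (not ephemient's code), the following passes raw bytes through untouched while decoding a UTF-8 copy with a `CharsetDecoder` set to replace malformed input with U+FFFD. The name `teeAndDecode` and the buffer sizes are assumptions made for illustration.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.InputStream
import java.io.OutputStream
import java.nio.ByteBuffer
import java.nio.CharBuffer
import java.nio.charset.CodingErrorAction

// Hypothetical helper: copies bytes from `source` to `sink` unchanged and
// returns a lenient UTF-8 decoding of the same bytes.
fun teeAndDecode(source: InputStream, sink: OutputStream): String {
    val decoder = Charsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE)
    val bytes = ByteBuffer.allocate(8192)
    // UTF-8 never decodes to more chars than bytes, so equal capacity suffices.
    val chars = CharBuffer.allocate(8192)
    val captured = StringBuilder()

    fun drain() {
        chars.flip()
        captured.append(chars.toString())
        chars.clear()
    }

    while (true) {
        val n = source.read(bytes.array(), bytes.position(), bytes.remaining())
        if (n < 0) break
        sink.write(bytes.array(), bytes.position(), n) // raw pass-through
        sink.flush()
        bytes.position(bytes.position() + n)
        bytes.flip()
        decoder.decode(bytes, chars, false)
        drain()
        bytes.compact() // keep any incomplete multi-byte sequence for the next read
    }
    bytes.flip()
    decoder.decode(bytes, chars, true) // end of input: leftovers become U+FFFD
    drain()
    decoder.flush(chars)
    drain()
    return captured.toString()
}

fun main() {
    val input = "hi\n".toByteArray() + byteArrayOf(0xFF.toByte()) // 0xFF is never valid UTF-8
    val sink = ByteArrayOutputStream()
    val text = teeAndDecode(ByteArrayInputStream(input), sink)
    println(text == "hi\n\uFFFD")                  // true
    println(sink.toByteArray().contentEquals(input)) // true
}
```

Decoding incrementally with `endOfInput = false` lets a multi-byte sequence that straddles two reads be decoded correctly instead of being replaced.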

Rob Elliot

11/24/2021, 4:02 PM

Wow, thanks
