# getting-started
r
Just realised that `BufferedReader.forEachLine` doesn’t distinguish between `last line\nEOF` and `last lineEOF`: in both cases the last line emitted is `last line`, whether it ends in a line feed or not. This is frustrating if you need to capture the output of the `Reader` exactly, but would also like to buffer based on line feeds rather than an arbitrary number of characters. Anyone know of a workaround? (It’s nearly NOT KOTLIN I know, because it just delegates to `java.io.BufferedReader.readLine`… hope the fact that Kotlin has extension methods calling it justifies asking?!)
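To make the behaviour concrete, here is a minimal sketch. `linesOf` collects what `forEachLine` emits, and `forEachChunkKeepingLineFeed` is a hypothetical helper (not part of the Kotlin or Java standard library) that reads with `read()` and keeps the `\n` on each chunk, so that concatenating the chunks reproduces the input exactly. It assumes `\n` is the only terminator that matters and leaves closing the reader to the caller.

```kotlin
import java.io.Reader
import java.io.StringReader

// Collects the lines that forEachLine emits for the given text.
fun linesOf(text: String): List<String> {
    val lines = mutableListOf<String>()
    StringReader(text).buffered().forEachLine { lines += it }
    return lines
}

// Hypothetical helper: emits chunks that end in '\n', plus a final chunk
// without '\n' if the input does not end in a line feed.
fun Reader.forEachChunkKeepingLineFeed(action: (String) -> Unit) {
    val reader = this.buffered()
    val chunk = StringBuilder()
    while (true) {
        val c = reader.read()
        if (c == -1) break
        chunk.append(c.toChar())
        if (c == '\n'.code) {
            action(chunk.toString())
            chunk.setLength(0)
        }
    }
    // Anything left over is a final line with no trailing line feed.
    if (chunk.isNotEmpty()) action(chunk.toString())
}
```

With this, `linesOf("last line\n")` and `linesOf("last line")` both return a list containing just `last line`, while the helper emits `last line\n` in the first case and `last line` in the second.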
e
no, not really. this may not help you, but https://en.wikipedia.org/wiki/Text_file#Unix_text_files
POSIX defines a text file as a file that contains characters organized into zero or more lines, where lines are sequences of zero or more non-newline characters plus a terminating newline character
a file that ends in a non-newline character is not a "text" file, and most line-oriented processing tools (such as `sed`) don't preserve final lines without newlines either
r
I decided in the end just to use `read()` directly. It was for capturing a child process’s standard out in memory while also passing it to the current process’s standard out, and I realised that for e.g. progress bars it was going to be rubbish to only emit at line feed anyway.
(I want the captured output of `echo 'hello world'` and `printf 'hello world'` to contain the fact that the first has a newline and the second does not)
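A rough sketch of that `read()`-based approach, not the actual code used here: `tee` is a hypothetical helper that copies whatever arrives from a `Reader` to a `Writer` straight away and returns everything it copied. For a child process, the source could be something like `process.inputStream.reader()` and the sink `System.out.writer()`; both are assumptions about the setup.

```kotlin
import java.io.Reader
import java.io.Writer

// Hypothetical helper: forwards chars from source to sink as they arrive
// and returns an exact copy of everything that was forwarded.
fun tee(source: Reader, sink: Writer): String {
    val captured = StringBuilder()
    val buffer = CharArray(1024)
    while (true) {
        val n = source.read(buffer)
        if (n == -1) break
        sink.write(buffer, 0, n)
        sink.flush() // flush so partial lines (e.g. progress bars) appear promptly
        captured.append(buffer, 0, n)
    }
    return captured.toString()
}
```

Because nothing is split on line feeds, the capture for `echo 'hello world'` would end in `\n` and the one for `printf 'hello world'` would not.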
e
you'd have an easier time passing through as a binary InputStream/OutputStream, I suspect
(of course you can tee the bytestream through a CharsetDecoder or similar to get chars for your own use)
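One way that idea could look, as a sketch rather than a definitive implementation: `teeDecoding` is a hypothetical helper that forwards the raw bytes untouched and feeds the same bytes through a UTF-8 `CharsetDecoder` configured to replace malformed input. The buffer size is an arbitrary choice.

```kotlin
import java.io.InputStream
import java.io.OutputStream
import java.nio.ByteBuffer
import java.nio.CharBuffer
import java.nio.charset.CodingErrorAction

// Hypothetical helper: passes bytes through unchanged and returns a
// lenient UTF-8 decoding of them (malformed input becomes U+FFFD).
fun teeDecoding(source: InputStream, sink: OutputStream): String {
    val decoder = Charsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE)
    val bytes = ByteBuffer.allocate(8192)
    // UTF-8 never decodes to more chars than bytes, so this cannot overflow.
    val chars = CharBuffer.allocate(8192)
    val text = StringBuilder()

    fun drain() {
        chars.flip()
        text.append(chars)
        chars.clear()
    }

    while (true) {
        val n = source.read(bytes.array(), bytes.position(), bytes.remaining())
        if (n == -1) break
        sink.write(bytes.array(), bytes.position(), n) // raw pass-through
        sink.flush()
        bytes.position(bytes.position() + n)
        bytes.flip()
        decoder.decode(bytes, chars, false) // may leave an incomplete sequence
        drain()
        bytes.compact() // keep leftover bytes for the next read
    }
    bytes.flip()
    decoder.decode(bytes, chars, true) // a dangling partial sequence is replaced
    decoder.flush(chars)
    drain()
    return text.toString()
}
```

The sink always receives exactly the bytes the source produced; only the decoded copy is lossy when the input is not valid UTF-8.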
r
Good ideas, thanks. Ideally I want to store the in-memory capture in some kind of ring buffer so that it’s not unbounded, so I think it would need to be char-based rather than byte-based; otherwise, if the encoding were UTF-8, dropping some early bytes might render the result undecodable.
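A minimal sketch of such a char-based ring buffer, assuming a fixed capacity chosen up front. `CharRingBuffer` is a hypothetical class, not something from the standard library. One caveat: dropping the oldest `Char` can still split a surrogate pair, so the oldest retained char may be a lone low surrogate.

```kotlin
// Hypothetical bounded capture: keeps only the most recent `capacity` chars.
class CharRingBuffer(private val capacity: Int) {
    init {
        require(capacity > 0) { "capacity must be positive" }
    }

    private val data = CharArray(capacity)
    private var start = 0 // index of the oldest retained char
    private var size = 0  // number of retained chars

    fun append(c: Char) {
        if (size < capacity) {
            data[(start + size) % capacity] = c
            size++
        } else {
            data[start] = c // overwrite the oldest char
            start = (start + 1) % capacity
        }
    }

    fun append(text: CharSequence) {
        for (c in text) append(c)
    }

    // Returns the retained chars, oldest first.
    override fun toString(): String =
        CharArray(size) { data[(start + it) % capacity] }.concatToString()
}
```

For example, after appending `hello world` to a `CharRingBuffer(5)`, `toString()` returns `world`.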
e
example that can pass process output directly, while also safely decoding UTF-8 for your own use, even if it contains binary garbage
err: "4+0 records in\n4+0 records out\n64 bytes transferred in 0.000016 secs (3947580 bytes/sec)\n"
out: "\r\uFFFD\uFFFD8\uFFFDd\uFFFD&\uFFFD\uFFFDA\uFFFD\u0580l\uFFFD\u0000\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\u0007\uFFFD`\uFFFD\uFFFDQB\u0000v\uFFFD\uFFFD\uFFFD\u491BD\uFFFD\uFFFDi\uFFFD\uFFFDe\uFFFD_\uFFFD)\uFFFD\uFFFD\u000Fy\u001C4\u0004\uFFFD\uFFFD\u0007Q\uFFFD\uFFFD#"
out: "\uFFFD"
exitCode: 0
(U+FFFD being the default replacement character)
r
Wow, thanks