# general-advice
g
so I'm in this funny spot where I'm replacing a massively overbuilt ANTLR thing with what I'm hoping will be a hundred lines of hand-written parser for a CSV file. Is there an iconic Kotlin tokenizer implementation? I was kinda hoping I could implement something really simple with `Scanner`, but it demands delimiters that I don't want to give it. I was also hoping Kotlin might have a kind of reader + regex extension function, something like `java.io.Reader.takeWhile(regex: Regex)`, but no luck there either. What does a quintessential Kotlin text tokenizer look like?
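One common shape for a small stdlib-only tokenizer is an ordered list of named regex rules tried at the current offset. The sketch below assumes the input fits in a `String` and that Kotlin 1.7+ is available (for the stable `Regex.matchAt`); the `Token` class, the rule names, and `tokenize` are hypothetical names for illustration, not anything from the thread.

```kotlin
// Hypothetical token type for illustration.
data class Token(val kind: String, val text: String)

// Rules are tried in order at the current position; the first match wins.
// None of these patterns can match the empty string, so the loop always advances.
val rules: List<Pair<String, Regex>> = listOf(
    "QUOTED" to Regex("\"(?:[^\"]|\"\")*\""), // "..." with "" as an escaped quote
    "COMMA" to Regex(","),
    "NEWLINE" to Regex("\r?\n"),
    "TEXT" to Regex("[^,\"\r\n]+"),
)

fun tokenize(input: String): List<Token> {
    val tokens = mutableListOf<Token>()
    var pos = 0
    while (pos < input.length) {
        // matchAt only succeeds if the match starts exactly at `pos`.
        val hit = rules.firstNotNullOfOrNull { (kind, regex) ->
            regex.matchAt(input, pos)?.let { Token(kind, it.value) }
        } ?: error("Unexpected character '${input[pos]}' at offset $pos")
        tokens += hit
        pos += hit.text.length
    }
    return tokens
}
```

This trades streaming for simplicity: the whole file is read into memory first, which is usually fine for modest CSV files but is an assumption worth checking.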
a
I can really recommend https://github.com/alllex/parsus, it's nice and easy to use.
is there anything you're particularly interested in? E.g. performance, multiplatform support?
g
I was looking to avoid libraries entirely
I feel like one of the first things you do to show how neat your programming language is, is implement a parser for something, maybe even the language itself
and I'm mostly just interested in tokenization, I just want something in the stdlib that lets me take a file and apply some rules for grouping text from front to back
what I'm going with now is extension functions on `java.io.Reader`
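The thread doesn't show the actual extension functions, so the following is only a guess at what such a helper could look like. It assumes a `PushbackReader` (a `java.io.Reader` subclass) so the first non-matching character can be handed back, and it takes a character predicate instead of a `Regex`, since matching a regex against a stream needs buffering. The name `takeWhile` is hypothetical here.

```kotlin
import java.io.PushbackReader
import java.io.StringReader

// Hypothetical helper: reads characters while the predicate holds, then
// pushes the first non-matching character back for the next caller.
fun PushbackReader.takeWhile(predicate: (Char) -> Boolean): String {
    val sb = StringBuilder()
    while (true) {
        val c = read()
        if (c == -1) break            // end of stream
        if (!predicate(c.toChar())) {
            unread(c)                 // leave it for the next read
            break
        }
        sb.append(c.toChar())
    }
    return sb.toString()
}

// Small usage sketch: split "abc,42" into a word, a separator, and a number.
fun demoTakeWhile(): List<String> {
    val reader = PushbackReader(StringReader("abc,42"))
    val word = reader.takeWhile { it.isLetter() }
    val separator = reader.read().toChar().toString()
    val number = reader.takeWhile { it.isDigit() }
    return listOf(word, separator, number)
}
```

A default `PushbackReader` has a one-character pushback buffer, which is enough here because at most one character is unread per call.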
k
If you're reading CSV files, the recommended way has always been to use one of the many available CSV reader libraries. There is probably also one for kotlinx.serialization. CSV formats are complicated. For example, how would you deal with commas inside tokens? By putting tokens in quotation marks? Then how would you deal with quotation marks inside tokens? ... etc. By the way, regarding your use of `chars.use`: it's recommended not to close a `Scanner` when it's been opened on an `InputStream` that you haven't opened yourself. It's the caller's responsibility to close it.
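To make the quoting questions concrete, here is a minimal sketch of splitting a single record. It assumes RFC 4180-style rules: fields may be wrapped in double quotes, and a doubled quote (`""`) inside a quoted field means one literal quote. It does not handle newlines inside quoted fields or malformed input, and `parseCsvLine` is a hypothetical name.

```kotlin
// Splits one CSV record into fields, assuming RFC 4180-style quoting.
// Embedded newlines and unbalanced quotes are NOT handled in this sketch.
fun parseCsvLine(line: String): List<String> {
    val fields = mutableListOf<String>()
    val current = StringBuilder()
    var inQuotes = false
    var i = 0
    while (i < line.length) {
        val c = line[i]
        when {
            // A doubled quote inside a quoted field is one literal quote.
            inQuotes && c == '"' && i + 1 < line.length && line[i + 1] == '"' -> {
                current.append('"')
                i++ // skip the second quote of the pair
            }
            // Any other quote opens or closes a quoted section.
            c == '"' -> inQuotes = !inQuotes
            // A comma only separates fields outside quotes.
            c == ',' && !inQuotes -> {
                fields += current.toString()
                current.setLength(0)
            }
            else -> current.append(c)
        }
        i++
    }
    fields += current.toString() // last field has no trailing comma
    return fields
}
```

Even this small version needs a state flag and lookahead, which is the kind of edge case the library recommendation is pointing at.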
g
yeah I realized that, I just wasn't sure of the best way to wrap the suspension semantics of my `Sequence`
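One way to reconcile a lazy `Sequence` with the "caller closes" advice is to have the sequence only read from the reader and never close it, and let whoever opened the reader consume the sequence inside their own `use` block. This is a sketch under that assumption; the original code isn't shown in the thread, and `chunks` and `readAll` are hypothetical names.

```kotlin
import java.io.Reader
import java.io.StringReader

// Hypothetical helper: lazily yields delimiter-separated chunks.
// It deliberately does not close the reader; the caller owns it.
fun Reader.chunks(delimiter: Char = ','): Sequence<String> = sequence {
    val sb = StringBuilder()
    while (true) {
        val c = read()                 // resolves to the outer Reader receiver
        if (c == -1) break
        if (c.toChar() == delimiter) {
            yield(sb.toString())       // suspends until the consumer asks for more
            sb.setLength(0)
        } else {
            sb.append(c.toChar())
        }
    }
    yield(sb.toString())               // final chunk after the last delimiter
}

// The code that opened the reader is the code that closes it.
fun readAll(): List<String> =
    StringReader("a,b,c").use { reader ->
        reader.chunks().toList() // consume fully before `use` closes the reader
    }
```

The catch is that the sequence must be fully consumed before the `use` block exits; letting the lazy `Sequence` escape the block would mean reading from a closed reader.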