# android
l
Would it be difficult to write a CSV parser that maps lines to a data class based on annotations? The first question is: what do I need in order to declare my own annotations? Do I have to create a compiler that interprets these annotations? Edit: I know there are CSV libraries out there. It's for the learning experience.
c
I haven't used it personally, but there is an unofficial CSV serialization format for kotlinx.serialization: https://github.com/brudaswen/kotlinx-serialization-csv. You might give that a try before going and writing a parser/mapper yourself. The alternative, actually writing it yourself, is not terribly difficult, but it certainly isn't trivial. You'd typically either use reflection to do the mapping at runtime (which is JVM-only), or use KSP to generate mapping code at compile time (works for all Kotlin targets): https://github.com/google/ksp
l
Thanks @Casey Brooks, I know there are libraries for this purpose, but I'm doing it for the learning experience. I have 3 questions: 1. What are the advantages of generating code via annotations over reflection? I often read that reflection has performance penalties. 2. Do I always have to use kapt/KSP when using my own annotations? 3. Should I always use KSP instead of kapt? Kapt is becoming outdated AFAIK, right?
e
1. Reflection has some performance impact and won't work on non-JVM targets, or even under ProGuard (e.g. on Android) without care.
2. No, but it's the most straightforward way to add generated code output, based on other code as input, into your build.
3. kapt will remain supported, according to the latest blog post, but KSP is where development will focus.
c
1. Performance is one of the main reasons to go with code generation (KSP) over reflection. But a big concern, especially with Kotlin classes, is that reflection is a pure-Java API, and there are a lot of Kotlin constructs that Java reflection does not know about (nullability, for one), which can lead to some strange issues like getting NPEs on non-null properties. Another thing to consider is that there's no IDE help for runtime annotation processing, so it's pretty difficult to track what exactly is happening with the annotations, which makes it hard to use and debug without extensive documentation. With KSP and code-gen, you can always go and look at the generated code to figure out what it's doing, and since it's normal source code at that point the IDE can pick it up in autocomplete suggestions. That said, the reflection APIs are going to be easier to write than a KSP symbol processor.
2. No, but the alternatives are significantly more difficult, and not even officially public yet (compiler plugins). They do allow you to do much more than KSP/kapt, for extreme use-cases like Compose. KSP itself is built as a full compiler plugin.
3. For new processors, yes, KSP fully replaces kapt, which should now be considered deprecated. It's effectively a superset of kapt that is not tied to Java and offers a better ability to understand Kotlin-specific constructs, but otherwise it functions the same.
l
ok thank you very much for the good points, guys. I'm a bit overwhelmed tbh. I currently just need deserialization logic that converts a line (a String) into a data class with proper type conversion; serialization is not needed because I'm reading from an existing Excel file. I guess the right way is to go with KSP. Do you have any starting points for me?
c
You could also go another route and generate code from a simpler format, such as JSON, YAML, or .properties. It's easy enough to write a small script to take those formats as input and generate models and deserializers for those models, and then make a custom Gradle task to do that parsing. A benefit of that is that it's not tied to Kotlin either, and you could use most of that same code-gen for other platforms as well (for example, sharing models with iOS without needing to set up KMM, by also generating Swift code)
I don't think there's a ton of documentation for KSP beyond what's in its Github repo, but the #ksp channel might help you find your bearings with it
l
> You could also go another route and generate code from a simpler format, such as JSON, YAML, or .properties.
I guess that's not possible, @Casey Brooks. I already have an Excel file (400 lines) and the data structure is complex. Some columns are generated by Excel's Power Query editor. It would be hard to convert this to JSON
c
Not the file itself, but write some kind of YAML/JSON file that describes the data models. And then use that format to generate the CSV parser
l
Ah, ok, you mean to use the CSV and create an intermediate format like JSON from this CSV? Ok, I guess I can't follow. The JSON file should describe the model?
How does it help me? The only benefit I can see is that it is not coupled to Kotlin
e
@Casey Brooks for mapping to a data class, Kotlin reflection should be fine, and that does have access to nullability etc.; not sure why you'd need to drop down to Java reflection
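For illustration, a minimal kotlin-reflect sketch of that kind of mapping (untested; the MyBigRecord type and the tiny convert() helper are hypothetical, and it needs the kotlin-reflect dependency on the classpath):
import kotlin.reflect.KClass
import kotlin.reflect.KType
import kotlin.reflect.full.primaryConstructor

// Hypothetical record type used for illustration.
data class MyBigRecord(val field1: String, val anotherField: Int, val isChecked: Boolean)

// Map one CSV row (header name -> raw string value) onto any data class via kotlin-reflect.
fun <T : Any> mapRow(klass: KClass<T>, row: Map<String, String>): T {
    val ctor = klass.primaryConstructor
        ?: error("${klass.simpleName} has no primary constructor")
    val args = ctor.parameters.associateWith { param ->
        val raw = param.name?.let { row[it] }
        when {
            raw != null -> convert(raw, param.type)
            param.type.isMarkedNullable -> null // Kotlin reflection does see nullability
            else -> error("Missing column '${param.name}'")
        }
    }
    return ctor.callBy(args)
}

// A deliberately tiny set of supported conversions; extend as needed.
private fun convert(raw: String, type: KType): Any = when (type.classifier) {
    String::class -> raw
    Int::class -> raw.toInt()
    Boolean::class -> raw.toBoolean()
    else -> error("Unsupported type $type")
}

fun main() {
    val row = mapOf("field1" to "hello", "anotherField" to "42", "isChecked" to "true")
    println(mapRow(MyBigRecord::class, row))
    // MyBigRecord(field1=hello, anotherField=42, isChecked=true)
}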
c
I mean like basically defining your data models in YAML/JSON instead of Kotlin, and then some custom scripts generate the actual Kotlin data classes you'll use. For example, a YAML file like:
package: com.my.big.record
className: MyBigRecord
fields:
  - propertyName: field1
    type: String
  - propertyName: anotherField
    type: Int
  - propertyName: isChecked
    type: Boolean
  - ...
and your scripts produce the following code (which is mostly what you'd generate for KSP, too):
package com.my.big.record

data class MyBigRecord(
    val field1: String,
    val anotherField: Int,
    val isChecked: Boolean,
)

object MyBigRecordParser { 
    fun parse(csv: CsvRecord) : MyBigRecord { 
        return MyBigRecord(
            field1 = csv["field1"],
            anotherField = csv["anotherField"],
            isChecked = csv["isChecked"],
        )
    }
}
e
basically avoiding having to deal with the kapt/ksp mirror APIs by working outside of the system
you still have to integrate it into your build somehow, but that might be just "run the generator once and copy into sources" at first
c
Right, I'm just laying out some options I've used in the past, not necessarily saying one is the best choice. It all depends on the needs of your app and your team:
• KSP if you're fully bought in to maintaining the code generator. Since it's new, there may be changes you'll need to keep up with, but it's going to be the "most idiomatic" Kotlin route for robust code generation
• custom code-gen for simpler jobs, if you value the performance and IDE assistance and don't mind writing/maintaining those build scripts
• reflection if you can afford the performance penalty, don't mind losing IDE assistance, and (if using the Kotlin reflection library to work better with Kotlin constructs) can accept adding that huge dependency
e
for what it's worth, Kotlin stdlib itself is partly code-generated via an external process: https://github.com/JetBrains/kotlin/tree/master/libraries/tools/kotlin-stdlib-gen
l
@Casey Brooks ok, I got it now. If I have to use Gradle for creating this script I'm out, I won't bother with Gradle ^^ @ephemient Did I get you right that for my specific use case I would just have to generate the code once? I'm new to this KSP stuff, so stupid questions might be incoming
e
Gradle isn't hard to use; I've posted examples of code generation from it here in this Slack before
if you have questions about ksp, #ksp
l
There is one fundamental part which I might not understand yet. I use KSP to generate a class that maps a String to a data class and also does type conversion, right? What would this code look like? Wouldn't this code use reflection too?
e
No, the processor has access to a reflection-like mirror API
in the case of (k)apt, it's using a Java-like code model so there are some things that are awkward (e.g. properties show up multiple times, as constructor args, fields, and accessors)
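To make it concrete, the skeleton of a KSP processor looks roughly like this (an untested sketch: the @CsvRow annotation and the generated file contents are hypothetical, and in a real setup the annotation would live in its own module):
import com.google.devtools.ksp.processing.CodeGenerator
import com.google.devtools.ksp.processing.Dependencies
import com.google.devtools.ksp.processing.Resolver
import com.google.devtools.ksp.processing.SymbolProcessor
import com.google.devtools.ksp.processing.SymbolProcessorEnvironment
import com.google.devtools.ksp.processing.SymbolProcessorProvider
import com.google.devtools.ksp.symbol.KSAnnotated
import com.google.devtools.ksp.symbol.KSClassDeclaration

// Hypothetical marker annotation that user code would put on its data classes.
annotation class CsvRow

class CsvRowProcessor(private val codeGenerator: CodeGenerator) : SymbolProcessor {
    override fun process(resolver: Resolver): List<KSAnnotated> {
        // The compile-time "mirror" API: inspect declarations without any runtime reflection.
        resolver.getSymbolsWithAnnotation(CsvRow::class.qualifiedName!!)
            .filterIsInstance<KSClassDeclaration>()
            .forEach { decl ->
                val packageName = decl.packageName.asString()
                val className = decl.simpleName.asString()
                val properties = decl.getAllProperties().joinToString {
                    "${it.simpleName.asString()}: ${it.type.resolve().declaration.simpleName.asString()}"
                }

                // Emit a <ClassName>Parser.kt file; actually building the parser body
                // is left out of this sketch.
                codeGenerator.createNewFile(
                    Dependencies(false, decl.containingFile!!),
                    packageName,
                    "${className}Parser"
                ).bufferedWriter().use { out ->
                    out.write("package $packageName\n\n")
                    out.write("// TODO: generate ${className}Parser for fields: $properties\n")
                }
            }
        return emptyList()
    }
}

class CsvRowProcessorProvider : SymbolProcessorProvider {
    override fun create(environment: SymbolProcessorEnvironment): SymbolProcessor =
        CsvRowProcessor(environment.codeGenerator)
}
The provider gets registered via a META-INF/services file, as described in the KSP repo.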
c
KSP would be run every time the Kotlin compiler runs (every build), but you generally wouldn't have to do anything to hook it into Gradle beyond the normal KSP setup. Custom code-gen could be done in a separate project and published as a normal Kotlin library (you'd manually run the scripts and publish as necessary), or you could manually run the scripts and check the sources into your main repo, or, with some basic Gradle config, hook it into the build to make sure it's always up-to-date (like the example above)
e
well, a KSP processor should be written such that it can be cached, so maybe not every build :)
l
I should start with something simple ^^ I'm already overwhelmed. When you say "custom code-gen", what API are we talking about, or do you mean with Gradle? How would I do the type conversion?
I mean, we are talking about scripts and code generation, but I don't even know how to do the type conversion, and even if I went with reflection (the simplest solution) I'd first have to figure out how the mapping is done.
c
The parsing and type-conversion of the CSV would all happen at runtime, so your generator would need to emit that conversion code. For the sample code I posted above, assuming everything in the CsvRecord is a String value, you might generate a mapper class like:
object MyBigRecordParser { 
    fun parse(csv: CsvRecord) : MyBigRecord { 
        return MyBigRecord(
            field1 = csv["field1"],
            anotherField = csv["anotherField"].toIntOrNull() ?: 0,
            isChecked = csv["isChecked"].toBoolean(),
        )
    }
}
And of course, your code generator would have to know which types it supports, so it knows how to generate the proper "type conversion" for each field.
Keep in mind that the code you'd generate is just Kotlin. You'd just generate whatever code you'd normally use to do type conversion if you were writing it by hand. That's the general idea behind most code-generators: it's writing the same code you would anyway, but automatically.
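To make that concrete, a toy generator for the parser above can be plain string building. The FieldSpec type and the set of supported types here are made up for the example, and CsvRecord is still the hypothetical lookup type from before:
// Hypothetical description of one field, e.g. parsed from the YAML/JSON model file above.
data class FieldSpec(val propertyName: String, val type: String)

// The generator has to know which types it supports and what conversion to emit for each.
private fun conversionFor(field: FieldSpec): String = when (field.type) {
    "String" -> """csv["${field.propertyName}"]"""
    "Int" -> """csv["${field.propertyName}"].toIntOrNull() ?: 0"""
    "Boolean" -> """csv["${field.propertyName}"].toBoolean()"""
    else -> error("Unsupported type: ${field.type}")
}

// Produce the data class plus its parser as plain source text.
fun generate(packageName: String, className: String, fields: List<FieldSpec>): String = buildString {
    appendLine("package $packageName")
    appendLine()
    appendLine("data class $className(")
    fields.forEach { appendLine("    val ${it.propertyName}: ${it.type},") }
    appendLine(")")
    appendLine()
    appendLine("object ${className}Parser {")
    appendLine("    fun parse(csv: CsvRecord): $className {")
    appendLine("        return $className(")
    fields.forEach { appendLine("            ${it.propertyName} = ${conversionFor(it)},") }
    appendLine("        )")
    appendLine("    }")
    appendLine("}")
}

fun main() {
    val fields = listOf(
        FieldSpec("field1", "String"),
        FieldSpec("anotherField", "Int"),
        FieldSpec("isChecked", "Boolean"),
    )
    println(generate("com.my.big.record", "MyBigRecord", fields))
}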
l
But why do I even need to generate code? I can just write this code into a class? Wait, stupid question.
c
Yes, that's the Reflection route. You don't need to generate code at all if you use Reflection instead
e
for code generation I would recommend KotlinPoet, which also integrates with APT and KSP (insofar as being able to indicate to the output Filer which sources an element was generated from, which helps with incremental compilation), but you'll have to figure out for yourself what you want your generated code to look like
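As a small, untested taste of KotlinPoet, this emits the MyBigRecord data class from earlier; in a real processor you'd drive it from your field descriptions and write to the generated-sources directory instead of stdout:
import com.squareup.kotlinpoet.FileSpec
import com.squareup.kotlinpoet.FunSpec
import com.squareup.kotlinpoet.INT
import com.squareup.kotlinpoet.KModifier
import com.squareup.kotlinpoet.PropertySpec
import com.squareup.kotlinpoet.STRING
import com.squareup.kotlinpoet.TypeSpec

fun main() {
    // Build `data class MyBigRecord(val field1: String, val anotherField: Int)`.
    val constructor = FunSpec.constructorBuilder()
        .addParameter("field1", STRING)
        .addParameter("anotherField", INT)
        .build()

    val recordClass = TypeSpec.classBuilder("MyBigRecord")
        .addModifiers(KModifier.DATA)
        .primaryConstructor(constructor)
        // In KotlinPoet, constructor properties are parameters plus matching properties
        // whose initializer is the parameter name.
        .addProperty(PropertySpec.builder("field1", STRING).initializer("field1").build())
        .addProperty(PropertySpec.builder("anotherField", INT).initializer("anotherField").build())
        .build()

    val file = FileSpec.builder("com.my.big.record", "MyBigRecord")
        .addType(recordClass)
        .build()

    file.writeTo(System.out) // a real generator would write into a source directory or KSP's CodeGenerator
}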
l
If I go with code generation, no matter whether I use KotlinPoet, Gradle or KSP, wouldn't I end up writing almost the same code (for the generation) that I would write manually? Say my generated code looks like this:
package com.my.big.record

data class MyBigRecord(
    val field1: String,
    val anotherField: Int,
    val isChecked: Boolean,
)

object MyBigRecordParser { 
    fun parse(csv: CsvRecord) : MyBigRecord { 
        return MyBigRecord(
            field1 = csv["field1"],
            anotherField = csv["anotherField"].toIntOrNull() ?: 0,
            isChecked = csv["isChecked"].toBoolean(),
        )
    }
}
I'd have to write the code generation logic, which would look almost identical, no? I mean, since I need type conversion I would have to touch every single field, or did I miss something?
I guess the code-gen approach makes sense for something more generic, but for this single use case it might be better to go with reflection?
e
you should be able to design it so that it adapts to whatever shape your input data type is in
start with reflection if that's easier to get your head around
l
ok what would you go with?
e
codegen via Gradle if the schema was external (e.g. JSON), codegen via kapt if I needed Java compatibility, codegen via KSP otherwise. Maybe reflection if I was just prototyping something
l
Thanks for your time 🙂