# android
l
Would it be difficult to write a CSV parser that maps lines to a data class based on annotations? The first question is: what do I need in order to declare my own annotations? Do I have to create a compiler that interprets these annotations? Edit: I know there are CSV libraries out there. It's for the learning experience.
c
I haven't used it personally, but there is an unofficial CSV serialization format for kotlinx.serialization: https://github.com/brudaswen/kotlinx-serialization-csv. You might give that a try before going and writing a parser/mapper yourself. The alternative, actually writing it yourself, is not terribly difficult, but it certainly isn't trivial. You'd typically either use reflection to do the mapping at runtime (which is JVM-only), or use KSP to generate mapping code at compile time (works for all Kotlin targets): https://github.com/google/ksp
l
Thanks @Casey Brooks, I know there are libraries for this purpose, but I'm doing it for the learning experience. I have 3 questions: 1. What are the advantages of generating code via annotations over reflection? I often read that reflection has performance penalties. 2. Do I always have to use kapt/KSP when using my own annotations? 3. Should I always use KSP instead of kapt? Kapt is becoming outdated AFAIK, right?
e
1. Reflection has some performance impact and won't work on non-JVM targets, or even under ProGuard (e.g. on Android) without care.
2. No, but it's the most straightforward way to add generated code output, based on other code as input, into your build.
3. kapt will remain supported, according to the latest blog post, but KSP is where development will focus.
c
1. Performance is one of the main reasons to go with code generation (KSP) over reflection. But a big concern, especially with Kotlin classes, is that reflection is a pure-Java API, and there are a lot of Kotlin constructs that Java reflection does not know about (nullability, for one), which can lead to some strange issues like getting NPEs on non-null properties. Another thing to consider is that there's no IDE help for runtime annotation processing, so it's pretty difficult to track what exactly is happening with the annotations, which makes it hard to use and debug without extensive documentation. With KSP and code-gen, you can always go and look at the generated code to figure out what it's doing, and since it's normal source code at that point the IDE can pick it up in autocomplete suggestions. That said, the reflection APIs are going to be easier to write than a KSP symbol processor.
2. No, but the alternatives are significantly more difficult, and not even officially public yet (compiler plugins). They do allow you to do much more than KSP/kapt, for extreme use-cases like Compose. KSP itself is built as a full compiler plugin.
3. For new processors, yes, KSP fully replaces kapt, which should now be considered deprecated. It's effectively a superset of kapt that is not tied to Java and offers a better ability to understand Kotlin-specific constructs, but otherwise it functions the same.
l
ok thank you very much for the good points, guys. I'm a bit overwhelmed tbh. I currently just need deserialization logic that converts a line (a String) into a data class with proper type conversion; serialization is not needed because I'm reading from an existing Excel file. I guess the right way is to go with KSP. Do you have any starting points for me?
c
You could also go another route and generate code from a simpler format, such as JSON, YAML, or .properties. It's easy enough to write a small script to take those formats as input and generate models and deserializers for those models, and then make a custom Gradle task to do that parsing. A benefit of that is that it's not tied to Kotlin either, and you could use most of that same code-gen for other platforms as well (for example, sharing models with iOS without needing to set up KMM, by also generating Swift code)
I don't think there's a ton of documentation for KSP beyond what's in its Github repo, but the #ksp channel might help you find your bearings with it
l
> You could also go another route and generate code from a simpler format, such as JSON, YAML, or .properties.
I guess that's not possible, @Casey Brooks. I already have an Excel file (400 lines) and the data structure is complex. Some columns are generated by Excel's Power Query editor. It would be hard to convert this to JSON
c
Not the file itself, but write some kind of YAML/JSON file that describes the data models. And then use that format to generate the CSV parser
l
Ah, ok, you mean to use the CSV and create an intermediate format like JSON from this CSV? Ok, I guess I can't follow. The JSON file should describe the model?
How does it help me? The only benefit I can see is that it is not coupled to Kotlin
e
@Casey Brooks for mapping to a data class, Kotlin reflection should be fine, and that does have access to nullability etc.; not sure why you'd need to drop down to Java reflection
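For illustration, a minimal kotlin-reflect sketch of that kind of mapping (untested; the MyBigRecord type and the tiny convert() helper are hypothetical, and it needs the kotlin-reflect dependency on the classpath):
import kotlin.reflect.KClass
import kotlin.reflect.KType
import kotlin.reflect.full.primaryConstructor

// Hypothetical record type used for illustration.
data class MyBigRecord(val field1: String, val anotherField: Int, val isChecked: Boolean)

// Map one CSV row (header name -> raw string value) onto any data class via kotlin-reflect.
fun <T : Any> mapRow(klass: KClass<T>, row: Map<String, String>): T {
    val ctor = klass.primaryConstructor
        ?: error("${klass.simpleName} has no primary constructor")
    val args = ctor.parameters.associateWith { param ->
        val raw = param.name?.let { row[it] }
        when {
            raw != null -> convert(raw, param.type)
            param.type.isMarkedNullable -> null // Kotlin reflection does see nullability
            else -> error("Missing column '${param.name}'")
        }
    }
    return ctor.callBy(args)
}

// A deliberately tiny set of supported conversions; extend as needed.
private fun convert(raw: String, type: KType): Any = when (type.classifier) {
    String::class -> raw
    Int::class -> raw.toInt()
    Boolean::class -> raw.toBoolean()
    else -> error("Unsupported type $type")
}

fun main() {
    val row = mapOf("field1" to "hello", "anotherField" to "42", "isChecked" to "true")
    println(mapRow(MyBigRecord::class, row))
    // MyBigRecord(field1=hello, anotherField=42, isChecked=true)
}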
c
I mean like basically defining your data models in YAML/JSON instead of Kotlin, and then some custom scripts generate the actual Kotlin data classes you'll use. For example, a YAML file like:
package: com.my.big.record
className: MyBigRecord
fields:
  - propertyName: field1
    type: String
  - propertyName: anotherField
    type: Int
  - propertyName: isChecked
    type: Boolean
  - ...
and your scripts produce the following code (which is mostly what you'd generate for KSP, too):
package com.my.big.record

data class MyBigRecord(
    val field1: String,
    val anotherField: Int,
    val isChecked: Boolean,
)

object MyBigRecordParser { 
    fun parse(csv: CsvRecord) : MyBigRecord { 
        return MyBigRecord(
            field1 = csv["field1"],
            anotherField = csv["anotherField"],
            isChecked = csv["isChecked"],
        )
    }
}
e
basically avoiding having to deal with the kapt/ksp mirror APIs by working outside of the system
you still have to integrate it into your build somehow, but that might be just "run the generator once and copy into sources" at first
c
Right, I'm just laying out some options I've used in the past, not necessarily saying one is the best choice. It all depends on the needs of your app and your team:
• KSP if you're fully bought in to maintaining the code generator. Since it's new, there may be changes you'll need to keep up with, but it's going to be the "most idiomatic" Kotlin route for robust code generation
• custom code-gen for simpler jobs, if you value the performance and IDE assistance and don't mind writing/maintaining those build scripts
• reflection if you can afford the performance penalty, don't mind losing IDE assistance, and (if using the Kotlin reflection library to work better with Kotlin constructs) can accept adding that huge dependency
e
for what it's worth, Kotlin stdlib itself is partly code-generated via an external process: https://github.com/JetBrains/kotlin/tree/master/libraries/tools/kotlin-stdlib-gen
l
@Casey Brooks ok, I got it now. If I have to use Gradle for creating this script I'm out, I won't bother with Gradle ^^ @ephemient Did I get you right that for my specific use case I would just have to generate the code once? I'm new to this KSP stuff, so stupid questions might be incoming
e
Gradle isn't hard to use; I've posted examples of code generation from it here in this Slack before
if you have questions about ksp, #ksp
l
There is one fundamental part which I might not understand yet. I use KSP to generate a class that maps a String to a data class and also does type conversion, right? What would this code look like? Wouldn't this code use reflection too?
e
No, the processor has access to a reflection-like mirror API
in the case of (k)apt, it's using a Java-like code model so there are some things that are awkward (e.g. properties show up multiple times, as constructor args, fields, and accessors)
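To make it concrete, the skeleton of a KSP processor looks roughly like this (an untested sketch: the @CsvRow annotation and the generated file contents are hypothetical, and in a real setup the annotation would live in its own module):
import com.google.devtools.ksp.processing.CodeGenerator
import com.google.devtools.ksp.processing.Dependencies
import com.google.devtools.ksp.processing.Resolver
import com.google.devtools.ksp.processing.SymbolProcessor
import com.google.devtools.ksp.processing.SymbolProcessorEnvironment
import com.google.devtools.ksp.processing.SymbolProcessorProvider
import com.google.devtools.ksp.symbol.KSAnnotated
import com.google.devtools.ksp.symbol.KSClassDeclaration

// Hypothetical marker annotation that user code would put on its data classes.
annotation class CsvRow

class CsvRowProcessor(private val codeGenerator: CodeGenerator) : SymbolProcessor {
    override fun process(resolver: Resolver): List<KSAnnotated> {
        // The compile-time "mirror" API: inspect declarations without any runtime reflection.
        resolver.getSymbolsWithAnnotation(CsvRow::class.qualifiedName!!)
            .filterIsInstance<KSClassDeclaration>()
            .forEach { decl ->
                val packageName = decl.packageName.asString()
                val className = decl.simpleName.asString()
                val properties = decl.getAllProperties().joinToString {
                    "${it.simpleName.asString()}: ${it.type.resolve().declaration.simpleName.asString()}"
                }

                // Emit a <ClassName>Parser.kt file; actually building the parser body
                // is left out of this sketch.
                codeGenerator.createNewFile(
                    Dependencies(false, decl.containingFile!!),
                    packageName,
                    "${className}Parser"
                ).bufferedWriter().use { out ->
                    out.write("package $packageName\n\n")
                    out.write("// TODO: generate ${className}Parser for fields: $properties\n")
                }
            }
        return emptyList()
    }
}

class CsvRowProcessorProvider : SymbolProcessorProvider {
    override fun create(environment: SymbolProcessorEnvironment): SymbolProcessor =
        CsvRowProcessor(environment.codeGenerator)
}
The provider gets registered via a META-INF/services file, as described in the KSP repo.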
c
KSP would be run every time the Kotlin compiler runs (every build), but you generally wouldn't have to do anything to hook it into Gradle beyond the normal KSP setup. Custom code-gen could be done in a separate project and published as a normal Kotlin library (you'd manually run the scripts and publish as necessary), or you could manually run the scripts and check the sources into your main repo, or, with some basic Gradle config, hook it into the build to make sure it's always up-to-date (like the example above)
e
well, a KSP processor should be written such that it can be cached, so maybe not every build :)
l
I should start with something simple ^^ I'm already overwhelmed. When you say "custom code-gen", what API are we talking about, or do you mean with Gradle? How would I do the type conversion?
I mean, we are talking about scripts and code generation, but I don't even know how to do the type conversion, and even if I went with reflection (the simplest solution) I'd first have to figure out how the mapping is done.
c
The parsing and type-conversion of the CSV would all happen at runtime, so your generator would need to emit that conversion code. For the sample code I posted above, assuming everything in the CsvRecord is a String value, you might generate a mapper class like:
object MyBigRecordParser { 
    fun parse(csv: CsvRecord) : MyBigRecord { 
        return MyBigRecord(
            field1 = csv["field1"],
            anotherField = csv["anotherField"].toIntOrNull() ?: 0,
            isChecked = csv["isChecked"].toBoolean(),
        )
    }
}
And of course, your code generator would have to know which types it supports, so it knows how to generate the proper "type conversion" for each field.
Keep in mind that the code you'd generate is just Kotlin. You'd just generate whatever code you'd normally use to do type conversion if you were writing it by hand. That's the general idea behind most code-generators: it's writing the same code you would anyway, but automatically.
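To make that concrete, a toy generator for the parser above can be plain string building. The FieldSpec type and the set of supported types here are made up for the example, and CsvRecord is still the hypothetical lookup type from before:
// Hypothetical description of one field, e.g. parsed from the YAML/JSON model file above.
data class FieldSpec(val propertyName: String, val type: String)

// The generator has to know which types it supports and what conversion to emit for each.
private fun conversionFor(field: FieldSpec): String = when (field.type) {
    "String" -> """csv["${field.propertyName}"]"""
    "Int" -> """csv["${field.propertyName}"].toIntOrNull() ?: 0"""
    "Boolean" -> """csv["${field.propertyName}"].toBoolean()"""
    else -> error("Unsupported type: ${field.type}")
}

// Produce the data class plus its parser as plain source text.
fun generate(packageName: String, className: String, fields: List<FieldSpec>): String = buildString {
    appendLine("package $packageName")
    appendLine()
    appendLine("data class $className(")
    fields.forEach { appendLine("    val ${it.propertyName}: ${it.type},") }
    appendLine(")")
    appendLine()
    appendLine("object ${className}Parser {")
    appendLine("    fun parse(csv: CsvRecord): $className {")
    appendLine("        return $className(")
    fields.forEach { appendLine("            ${it.propertyName} = ${conversionFor(it)},") }
    appendLine("        )")
    appendLine("    }")
    appendLine("}")
}

fun main() {
    val fields = listOf(
        FieldSpec("field1", "String"),
        FieldSpec("anotherField", "Int"),
        FieldSpec("isChecked", "Boolean"),
    )
    println(generate("com.my.big.record", "MyBigRecord", fields))
}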
l
But why do I even need to generate code? I can just write this code into a class? Wait, stupid question.
c
Yes, that's the Reflection route. You don't need to generate code at all if you use Reflection instead
e
for code generation I would recommend KotlinPoet, which also integrates with APT and KSP (insofar as being able to indicate to the output Filer which sources an element was generated from, which helps with incremental compilation), but you'll have to figure out for yourself what you want your generated code to look like
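As a small, untested taste of KotlinPoet, this emits the MyBigRecord data class from earlier; in a real processor you'd drive it from your field descriptions and write to the generated-sources directory instead of stdout:
import com.squareup.kotlinpoet.FileSpec
import com.squareup.kotlinpoet.FunSpec
import com.squareup.kotlinpoet.INT
import com.squareup.kotlinpoet.KModifier
import com.squareup.kotlinpoet.PropertySpec
import com.squareup.kotlinpoet.STRING
import com.squareup.kotlinpoet.TypeSpec

fun main() {
    // Build `data class MyBigRecord(val field1: String, val anotherField: Int)`.
    val constructor = FunSpec.constructorBuilder()
        .addParameter("field1", STRING)
        .addParameter("anotherField", INT)
        .build()

    val recordClass = TypeSpec.classBuilder("MyBigRecord")
        .addModifiers(KModifier.DATA)
        .primaryConstructor(constructor)
        // In KotlinPoet, constructor properties are parameters plus matching properties
        // whose initializer is the parameter name.
        .addProperty(PropertySpec.builder("field1", STRING).initializer("field1").build())
        .addProperty(PropertySpec.builder("anotherField", INT).initializer("anotherField").build())
        .build()

    val file = FileSpec.builder("com.my.big.record", "MyBigRecord")
        .addType(recordClass)
        .build()

    file.writeTo(System.out) // a real generator would write into a source directory or KSP's CodeGenerator
}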
l
If I go with code generation, no matter whether I use KotlinPoet, Gradle or KSP, wouldn't I end up writing almost the same code (for the generation) that I would write manually? Say my generated code looks like this:
package com.my.big.record

data class MyBigRecord(
    val field1: String,
    val anotherField: Int,
    val isChecked: Boolean,
)

object MyBigRecordParser { 
    fun parse(csv: CsvRecord) : MyBigRecord { 
        return MyBigRecord(
            field1 = csv["field1"],
            anotherField = csv["anotherField"].toIntOrNull() ?: 0,
            isChecked = csv["isChecked"].toBoolean(),
        )
    }
}
I'd have to write the code generation logic, which would look almost identical, no? I mean, since I need type conversion I would have to touch every single field, or did I miss something?
I guess the code-gen approach makes sense for something more generic, but for this single use case it might be better to go with reflection?
e
you should be able to design it so that it adapts to whatever shape your input data type is in
start with reflection if that's easier to get your head around
l
ok what would you go with?
e
codegen via Gradle if the schema was external (e.g. JSON), codegen via kapt if I needed Java compatibility, codegen via KSP otherwise. Maybe reflection if I was just prototyping something
l
Thanks for your time 🙂