Hello, I am modeling the reading of a `.csv` file and want to… (kotlinlang #getting-started)


Esa

11/12/2019, 8:24 AM

Hello, I am modeling the reading of a `.csv` file, and want to be as explicit as possible. The file comes in a strict format, and I am able to say for sure what content-type each column will have. For this reason, I want to model it explicitly, like so:


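```kotlin
sealed class CsvField {
    // Each subclass must extend CsvField to be part of the sealed hierarchy.
    data class ExternalId(val value: String) : CsvField() {
        companion object {
            // Fixed position of this column in the CSV file.
            const val index: Int = 0
        }
    }
}
```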

Obviously there will be many more columns, so the sealed class ends up extremely verbose when every data class has its own companion object and there are around 15 of them. So, back to my question: when my intent is to have one subclass of a sealed class per column, with one constant, unchangeable field (the column index) and one field specified in the constructor (the value), is this the best way? Or am I missing something obvious? Is there a better pattern here? I tried `enum class` first, but that way I don’t get to specify the value of the field, only the expected data. I could have a data class with the enum as its type and a plain string as its value, but idk. Thoughts?
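For comparison, the enum-plus-value alternative mentioned above could look roughly like this. It is only a sketch: `CsvColumn`, `Field`, and `fieldsOf` are hypothetical names, and the four columns are borrowed from the sample file later in the thread rather than the real 15:

```kotlin
// Hypothetical enum: one constant per column, each carrying its fixed position.
enum class CsvColumn(val index: Int) {
    NAME(0),
    SURNAME(1),
    ID(2),
    EMAIL(3)
}

// Hypothetical pairing of a column with the raw string read from the file.
data class Field(val column: CsvColumn, val value: String)

// Builds one Field per column from an already-split row.
fun fieldsOf(row: List<String>): List<Field> =
    CsvColumn.values().map { column -> Field(column, row[column.index]) }
```

Here the index is declared once per enum constant, so no companion objects are needed; the cost is that validation has to switch on `field.column` instead of on the type of the field.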

karelpeeters

11/14/2019, 11:07 AM

What exactly do you want to model? Columns? Rows? Single fields? Can you show some input and the shape of output you want?

Esa

11/14/2019, 11:18 AM


```
NAME;SURNAME;ID;EMAIL
Ted;Jamesson;1;some@email.net
Joe;Johnsson;2;some@invalid@email.net
```

This would be parsed as 2×4 `CsvField` instances (two rows of four fields), with one subclass per column.
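A minimal sketch of that parsing step for the sample above, assuming a `CsvField` hierarchy with one data class per column. The subclass names and the functions `parseRow` and `parse` are illustrative, and the split on `;` is naive, so quoted fields containing semicolons are not handled:

```kotlin
// Hypothetical sealed hierarchy for the four sample columns.
sealed class CsvField {
    abstract val value: String

    data class Name(override val value: String) : CsvField()
    data class Surname(override val value: String) : CsvField()
    data class Id(override val value: String) : CsvField()
    data class Email(override val value: String) : CsvField()
}

// Turns one semicolon-separated line into one CsvField per column.
// Assumes exactly four columns in NAME;SURNAME;ID;EMAIL order.
fun parseRow(line: String): List<CsvField> {
    val cells = line.split(';')
    require(cells.size == 4) { "Expected 4 columns but got ${cells.size}: $line" }
    return listOf(
        CsvField.Name(cells[0]),
        CsvField.Surname(cells[1]),
        CsvField.Id(cells[2]),
        CsvField.Email(cells[3])
    )
}

// Skips the header line and parses every remaining non-blank line.
fun parse(csv: String): List<List<CsvField>> =
    csv.lines().drop(1).filter { it.isNotBlank() }.map(::parseRow)
```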

Esa

11/14/2019, 11:19 AM

The reason is that I have a `fun validate(csvField: CsvField)` that contains the logic to determine whether the string value provided for each column is valid, such as a regex that filters out the invalid email in the bottom row.
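One way such an email check could look, as a hedged sketch: the body of `validateByEmail` and the pattern are assumptions, deliberately far simpler than the full email address grammar, but strict enough to reject the `some@invalid@email.net` row:

```kotlin
// Deliberately simple pattern (an assumption for illustration): exactly one '@',
// no whitespace, and at least one dot after the '@'. Not a full RFC 5322 check.
private val EMAIL_REGEX = Regex("""[^@\s]+@[^@\s]+\.[^@\s]+""")

// Returns true only when the entire value matches EMAIL_REGEX.
fun validateByEmail(value: String): Boolean = EMAIL_REGEX.matches(value)
```

`Regex.matches` tests the whole input, so a second `@` anywhere in the value makes the check fail.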

karelpeeters

11/14/2019, 11:24 AM

What about a set of (immutable) instances that describe the columns and then a separate "dumb" storage that actually stores the values?

Esa

11/14/2019, 11:25 AM

I did that first, but I found that such a separation didn’t look as clean. In that implementation I had an enum for the columns and a string for the value. But then the validate function takes two parameters and isn’t as “smart” as dispatching on the type of the subclass.
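For comparison, the two-parameter version can still dispatch with an exhaustive `when`, because Kotlin checks `when` expressions over enum constants for exhaustiveness just as it does for sealed classes. A sketch, assuming a hypothetical `CsvColumn` enum for the sample file and placeholder rules for each column:

```kotlin
// Hypothetical enum for the sample NAME;SURNAME;ID;EMAIL columns.
enum class CsvColumn(val index: Int) {
    NAME(0),
    SURNAME(1),
    ID(2),
    EMAIL(3)
}

// Simplified email pattern (an assumption, not a complete email grammar).
private val EMAIL_REGEX = Regex("""[^@\s]+@[^@\s]+\.[^@\s]+""")

// Exhaustive over the enum: adding a constant without a branch is a compile error.
fun validate(column: CsvColumn, value: String): Boolean = when (column) {
    CsvColumn.NAME, CsvColumn.SURNAME -> value.isNotBlank()
    CsvColumn.ID -> value.toIntOrNull() != null
    CsvColumn.EMAIL -> EMAIL_REGEX.matches(value)
}
```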

Esa

11/14/2019, 11:27 AM


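```kotlin
// Dispatch on the subclass; each branch validates the raw string for its column.
// validateByName and validateByEmail are helpers that return Boolean.
fun validate(csvField: CsvField): Boolean =
    when (csvField) {
        is CsvField.Name -> validateByName(csvField.value)
        is CsvField.Email -> validateByEmail(csvField.value)
        // ...one branch per remaining column, so the `when` stays exhaustive
    }
```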

Something like this is what I want

Esa

11/14/2019, 11:27 AM

And I have that now, and it works, but the CsvField class now looks really messy. 😄
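One common way to trim that boilerplate, offered as a sketch rather than what was used in the thread: pass each subclass's fixed index to the sealed superclass constructor instead of declaring a companion object per subclass. The column set and the validation rules below are placeholders for the sample file:

```kotlin
// Each subclass hands its fixed column index to the superclass constructor,
// so no per-subclass companion object is needed.
sealed class CsvField(val index: Int) {
    abstract val value: String

    data class Name(override val value: String) : CsvField(0)
    data class Surname(override val value: String) : CsvField(1)
    data class Id(override val value: String) : CsvField(2)
    data class Email(override val value: String) : CsvField(3)
}

// Simplified email pattern (an assumption, not a complete email grammar).
private val EMAIL_REGEX = Regex("""[^@\s]+@[^@\s]+\.[^@\s]+""")

// A `when` used as an expression over a sealed class must be exhaustive,
// so a column added later without a validation branch will not compile.
fun validate(csvField: CsvField): Boolean = when (csvField) {
    is CsvField.Name, is CsvField.Surname -> csvField.value.isNotBlank()
    is CsvField.Id -> csvField.value.toIntOrNull() != null
    is CsvField.Email -> EMAIL_REGEX.matches(csvField.value)
}
```

The trade-off is that `index` is now read from an instance (`CsvField.Email("x@y.z").index`) rather than from the class itself (`ExternalId.index`). If the index is needed before any value exists, for example when picking cells out of a row, the companion-object or enum approach may still fit better.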

karelpeeters

11/14/2019, 11:28 AM

And what's the scenario? You get a csv file and you want to validate it? Or do you actually want to do something with the data?

Esa

11/14/2019, 11:28 AM

So calling all of this looks like:


```kotlin
val validData = csvValues.all { validate(it) }

if (validData) {
    proceedWithTask()
}
```

Esa

11/14/2019, 11:28 AM

Yeah, this is just the preliminary validation of a data file to figure out whether I should proceed with the actual work.

karelpeeters

11/14/2019, 11:30 AM

Then don't you want to get rid of the csv as soon as possible? Why not immediately parse into a real data structure and validate on the fly?
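A sketch of what parsing straight into a typed structure could look like, assuming a hypothetical `Person` class for the sample columns and the same kind of simplified email pattern; an invalid line comes back as `null` instead of being kept around as raw strings:

```kotlin
// Hypothetical typed row; property names follow the sample header.
data class Person(val name: String, val surname: String, val id: Int, val email: String)

// Simplified email pattern (an assumption, not a complete email grammar).
private val EMAIL_REGEX = Regex("""[^@\s]+@[^@\s]+\.[^@\s]+""")

// Parses one line directly into a Person, or returns null if any cell is invalid.
fun parsePerson(line: String): Person? {
    val cells = line.split(';')
    if (cells.size != 4) return null
    val (name, surname, idText, email) = cells
    val id = idText.toIntOrNull() ?: return null
    if (name.isBlank() || surname.isBlank() || !EMAIL_REGEX.matches(email)) return null
    return Person(name, surname, id, email)
}
```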

Esa

11/14/2019, 11:34 AM

Ah, well, most of the data are IDs of objects that are hidden behind an API. Converting them to the objects they represent means fetching the data through API calls, and before I do that I figure a bit of data cleaning on my end is prudent. That way I can recognize what will fail without actually hitting the endpoints yet.

Esa

11/14/2019, 11:34 AM

I may have misunderstood your question, though. However, when I use the endpoint to fetch the data objects, that is where the conversion from `CsvField` to `DomainObject?` happens.

karelpeeters

11/14/2019, 11:38 AM

Yeah, that is what I meant, I see. If you aren't going to be doing a lot of different things with the CSV values and only validate them, I would just keep the columns and the data separate. It's weird to have a class that contains both the data and what that data is supposed to look like, plus a single function that checks it.

Esa

11/14/2019, 1:03 PM

I see. How would you structure it? Currently there are 15 columns, so quite a bit of different validation is happening. I could spread the per-column validation out into separate functions, but that would still be 8-9 functions even if I group the ones with similar validation. What would be the better way of doing it?

karelpeeters

11/14/2019, 4:45 PM

My first thought would be something like this:


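```kotlin
// Column definitions are kept separate from the raw string data.
class CsvData(val columns: List<Column>, val data: List<List<String>>) {
    fun verify() {
        for (row in data) {
            // Every row must have exactly one value per column.
            check(columns.size == row.size)
            // Pair each value with its column and let the column validate it.
            for ((col, value) in (columns zip row))
                col.verify(value)
        }
    }
}

// Each column type decides what a valid value looks like.
interface Column {
    fun verify(value: String)
}

// The whole value must match the regex; check() throws IllegalStateException otherwise.
class RegexColumn(val regex: Regex): Column {
    override fun verify(value: String) {
        check(regex.matchEntire(value) != null)
    }
}
```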
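If a yes/no answer is more convenient than an exception, as with the `csvValues.all { validate(it) }` call earlier in the thread, the same column-driven design can return `Boolean` instead of calling `check`. A sketch under that assumption: `isValid`, `NotBlankColumn`, `IntColumn`, and `sampleColumns` are hypothetical names, and the header row is assumed to be stripped before the data reaches `CsvData`:

```kotlin
// Each column knows how to judge a single raw value.
interface Column {
    fun isValid(value: String): Boolean
}

class RegexColumn(private val regex: Regex) : Column {
    override fun isValid(value: String) = regex.matches(value)
}

class NotBlankColumn : Column {
    override fun isValid(value: String) = value.isNotBlank()
}

class IntColumn : Column {
    override fun isValid(value: String) = value.toIntOrNull() != null
}

class CsvData(private val columns: List<Column>, private val data: List<List<String>>) {
    // True only if every row has one value per column and every value passes its column's check.
    fun isValid(): Boolean = data.all { row ->
        row.size == columns.size &&
            (columns zip row).all { (column, value) -> column.isValid(value) }
    }
}

// Column layout for the sample NAME;SURNAME;ID;EMAIL file (simplified email pattern).
val sampleColumns = listOf(
    NotBlankColumn(),
    NotBlankColumn(),
    IntColumn(),
    RegexColumn(Regex("""[^@\s]+@[^@\s]+\.[^@\s]+"""))
)
```

Adding a new kind of column means adding one more `Column` implementation; `CsvData` itself does not change.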

Matteo Mirk

11/20/2019, 1:51 PM

I agree with this latest design: validation should be a responsibility of each column, executed polymorphically.
