Hello, I am modeling the reading of a `.csv`-file,...
# getting-started
e
Hello, I am modeling the reading of a `.csv` file, and want to be as explicit as possible. The file comes in a strict format, and I am able to say for sure what content-type each column will have. For this reason, I want to model it explicitly, like so:
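```kotlin
sealed class CsvField {
    // Each column is a subclass, so it has to extend CsvField.
    data class ExternalId(val value: String) : CsvField() {
        companion object {
            // Zero-based position of this column in the file.
            const val index: Int = 0
        }
    }
}
```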
So obviously there will be many more columns, which means the sealed class ends up looking extremely verbose when every data class has its own companion object, and there are like 15 of them. So back to my question: when my intent is for each subclass of a sealed class to have one constant, unchangeable field and one field specified in the constructor, is this the best way? Or am I missing something obvious? Is there a better pattern here? I tried with `enum class` first, but that way I don't get to specify the value of the field, only the expected data. I could have a data class with the enum as type, and a plain string as value, but idk. Thoughts?
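One way to cut the verbosity described above, as a minimal sketch: pass the constant index to the sealed superclass constructor instead of declaring a companion object per subclass. The `Email` subclass, its index, and both validation rules are assumptions made up for illustration; only `ExternalId` at index 0 comes from the question.

```kotlin
// Sketch: the column index moves into the sealed superclass constructor,
// so each subclass is a single line and needs no companion object.
sealed class CsvField(val index: Int) {
    abstract val value: String

    data class ExternalId(override val value: String) : CsvField(0)

    // Hypothetical second column, index chosen for illustration only.
    data class Email(override val value: String) : CsvField(1)
}

// Exhaustive `when` over the sealed hierarchy: a subclass that is not
// handled here becomes a compile error. The rules are placeholders.
fun validate(field: CsvField): Boolean = when (field) {
    is CsvField.ExternalId -> field.value.isNotBlank()
    is CsvField.Email -> field.value.count { it == '@' } == 1
}
```

The trade-off is that `index` becomes a property of each instance rather than a `const val` on the type, so it can no longer be read without an instance in hand.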
k
What exactly do you want to model? Columns? Rows? Single fields? Can you show some input and what shape of output you want?
e
```
NAME;SURNAME;ID;EMAIL
Ted;Jamesson;1;some@email.net
Joe;Johnsson;2;some@invalid@email.net
```
This would be parsed into 2x4 `CsvField` instances, with one subclass per column.
The reason is I have a `fun validate(csvField: CsvField)` that contains the logic to determine if a provided string value per column is valid or not, such as a regex to filter out the invalid email in the bottom row.
k
What about a set of (immutable) instances that describe the columns and then a separate "dumb" storage that actually stores the values?
e
I did that first, but I found such a separation didn’t look as clean. When I had that implementation, I had an enum for the columns, and a string for the value. But in that case, the validate function takes two parameters, and won’t be as “smart” as reading the type of the subclass.
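For comparison, the enum-plus-string variant does not have to end up with a two-parameter `validate`. A rough sketch, assuming the column and its raw value are bundled in one small data class and each enum constant carries its own rule. All names here (`Column`, `CsvCell`, `isValid`) and the rules themselves are hypothetical, not taken from the actual code:

```kotlin
// Hypothetical columns; each enum constant carries its index and its own rule.
enum class Column(val index: Int, val isValid: (String) -> Boolean) {
    NAME(0, { it.isNotBlank() }),
    EMAIL(3, { it.count { c -> c == '@' } == 1 })
}

// Pairs a column with the raw string read from the file.
data class CsvCell(val column: Column, val value: String)

// Still a single-parameter validate, and no `when` is needed:
// the cell delegates to the rule stored on its column.
fun validate(cell: CsvCell): Boolean = cell.column.isValid(cell.value)
```

Adding a column then means adding one enum constant, at the cost of losing a distinct type per column.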
```kotlin
// Returns a Boolean so it can be used in csvValues.all { validate(it) }.
fun validate(csvField: CsvField): Boolean =
    when (csvField) {
        is Name -> validateByName(csvField.value)
        is Email -> validateByEmail(csvField.value)
    }
```
Something like this is what I want
And I have that now, and it works, but the CsvField class now looks really messy. 😄
k
And what's the scenario? You get a csv file and you want to validate it? Or do you actually want to do something with the data?
e
So to call all of this it looks like
```kotlin
val validData = csvValues.all { validate(it) }

if (validData) {
    proceedWithTask()
}
```
Yea this is just the preliminary validation of a datafile to figure out if I should proceed with the actual work
k
Then don't you want to get rid of the csv as soon as possible? Why not immediately parse into a real data structure and validate on the fly?
e
Ah, well most of the data corresponds to IDs of objects that are hidden behind an API. Converting it to the objects they represent means fetching the data through API calls, and before I do that I figure a bit of data cleaning on my end is prudent. That way I can recognize what will fail without actually using the endpoints yet.
I may have misunderstood your question though. However, when I use the endpoint to fetch the data objects, the conversion from `CsvField` to `DomainObject?` happens.
k
Yeah that is what I meant, I see. If you aren't going to be doing a lot of different things with the csv values and only validate them, I would just keep the columns and data separate. It's weird to have a class that contains data and what that data is supposed to look like, plus a single function that checks that.
e
I see. How would you structure it? Currently it's 15 columns, so quite a bit of different validation happening. I could spread the validation per column out to their own functions, but that would still be 8-9 functions even if I group the ones with similar validation. What would be the better way of doing it?
k
My first thought would be something like this:
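```kotlin
// Column descriptions and raw cell values are kept separate.
class CsvData(val columns: List<Column>, val data: List<List<String>>) {
    // Throws IllegalStateException on the first invalid row or value.
    fun verify() {
        for (row in data) {
            // Every row must have exactly one value per column.
            check(columns.size == row.size)
            for ((col, value) in (columns zip row))
                col.verify(value)
        }
    }
}

// A column knows how to check a single raw value.
interface Column {
    fun verify(value: String)
}

// Accepts a value only if the whole string matches the regex.
class RegexColumn(val regex: Regex): Column {
    override fun verify(value: String) {
        check(regex.matchEntire(value) != null)
    }
}
```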
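To see how this design might be wired up against the sample file from earlier in the thread, here is a small sketch. It repeats the three types so the block stands on its own. The helpers `parseRows` and `isValid`, the `sampleColumns` list, and all regex patterns are assumptions added for illustration, and the parser is deliberately naive (no quoted fields).

```kotlin
interface Column {
    fun verify(value: String)
}

class RegexColumn(val regex: Regex) : Column {
    override fun verify(value: String) {
        check(regex.matchEntire(value) != null) { "'$value' does not match $regex" }
    }
}

class CsvData(val columns: List<Column>, val data: List<List<String>>) {
    fun verify() {
        for (row in data) {
            check(columns.size == row.size)
            for ((col, value) in columns zip row) col.verify(value)
        }
    }
}

// Hypothetical helper: splits semicolon-separated text into rows,
// skipping the header line. Does not handle quoted fields.
fun parseRows(text: String): List<List<String>> =
    text.lines().filter { it.isNotBlank() }.drop(1).map { it.split(';') }

// check() throws IllegalStateException on failure; turn that into a Boolean.
fun isValid(csv: CsvData): Boolean =
    try { csv.verify(); true } catch (e: IllegalStateException) { false }

// Hypothetical patterns for the NAME;SURNAME;ID;EMAIL columns.
val sampleColumns: List<Column> = listOf(
    RegexColumn(Regex("[A-Za-z]+")),
    RegexColumn(Regex("[A-Za-z]+")),
    RegexColumn(Regex("\\d+")),
    RegexColumn(Regex("[^@\\s]+@[^@\\s]+"))
)
```

With these patterns, `isValid(CsvData(sampleColumns, parseRows(fileText)))` would serve as the preliminary gate before `proceedWithTask()`. Because `verify()` stops at the first failure, reporting every bad cell would need a variant that collects errors instead of throwing.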
m
I agree with this latest design: validation should be a responsibility of each column, executed polymorphically.