# feed
s
So I wrote a library for CSV parsing in Kotlin called KSV: https://github.com/whichdigital/ksv The problem we had at 'Which?' was that we regularly import government data in an ever-so-slightly changing CSV format (e.g. changed column order, new columns at arbitrary positions, and column names with changing upper/lowercase and sometimes additional spaces). So this solution doesn't use the column index (because that kept changing), but uses the (normalized) column names in its mapping. A `csv2List` function is used to map an InputStream containing CSV text into a list of a user-defined data class. Strong type guarantees, including nullability and user-defined property types, are provided (as well as common CSV features like `,` being allowed in CSV values if they are surrounded by quotation marks).
@CsvRow data class DataRow(
    @CsvValue(name = "RQIA") val id: String,
    @CsvValue(name = "Number of beds") val bedCount: Int?,   // types can be nullable
    val addressLine1: String,                                // without annotation it's assumed that the column name is the property name
    val city: String = "London",                             // without value in the csv file the Kotlin default value is used
    @CsvTimestamp(name = "latest check", format = "yyyy/MM/dd|dd/MM/yyyy")  
    val latestCheckDate: LocalDate?,                         // multiple formats can be provided separated by '|'
    @CsvGeneric(name = "offers Cola, Sprite or Fanta", converterName = "beverageBoolean")
    val refreshments: Boolean?                               // a user-defined converter can be used
)

// register a user-defined converter
registerGenericConverter("beverageBoolean") {
    it.toLowerCase() == "indeed"
}

// some text in csv-format to be mapped
val csvStream: InputStream = """
  city, addressLine1, Number of beds, latest check, RQIA, "offers Cola, Sprite or Fanta"
  if a line doesn't fit the pattern, it will be discarded <- like this line, the next line is fine because city and Number of beds are nullable
      , "2 Marylebone Rd",          ,2020/03/11,   WERS234, nope
  Berlin, "Berkaer Str 41", 1       ,28/08/2012, "NONE123", indeed
  Paris,"Rue Gauge, Maison 1", 4    ,          , "FR92834",
  Atlantis,,25000,,,
  """.trimIndent().byteInputStream()

val dataRows: List<DataRow> = csv2List(            // <- this is how you trigger the mapping: you basically tell the lib where the InputStream comes from and which type the data is
  CsvSourceConfig(
    stream = csvStream 
  )    
)
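To illustrate the idea of mapping by normalized column name rather than by index, here is a minimal sketch. It is not KSV's actual implementation: the helper names `normalizeColumnName`, `splitCsvLine`, `indexColumns` and `cellByName` are made up for this example, and the normalization rule (trim, lowercase, collapse whitespace) is an assumption about what "normalized" might mean. The splitter keeps commas inside quotation marks but does not handle escaped quotes.

```kotlin
// Hypothetical helpers for illustration only; not taken from KSV.

/** Assumed normalization: ignore case, surrounding spaces and repeated spaces. */
fun normalizeColumnName(name: String): String =
    name.trim().lowercase().replace(Regex("\\s+"), " ")

/** Splits one line on commas, keeping commas that appear inside double quotes. */
fun splitCsvLine(line: String): List<String> {
    val cells = mutableListOf<String>()
    val current = StringBuilder()
    var inQuotes = false
    for (ch in line) {
        when {
            ch == '"' -> inQuotes = !inQuotes          // quotes toggle the state and are dropped
            ch == ',' && !inQuotes -> {                // an unquoted comma ends the cell
                cells += current.toString().trim()
                current.setLength(0)
            }
            else -> current.append(ch)
        }
    }
    cells += current.toString().trim()                 // last cell has no trailing comma
    return cells
}

/** Builds a lookup from normalized column name to its position in this particular file. */
fun indexColumns(headerLine: String): Map<String, Int> =
    splitCsvLine(headerLine)
        .mapIndexed { index, name -> normalizeColumnName(name) to index }
        .toMap()

/** Reads a cell by column name; returns null for a missing column or an empty cell. */
fun cellByName(columns: Map<String, Int>, row: List<String>, name: String): String? =
    columns[normalizeColumnName(name)]
        ?.let { row.getOrNull(it) }
        ?.takeIf { it.isNotEmpty() }
```

Because the lookup is rebuilt from each file's header line, a reordered or newly inserted column only changes the indices in the map, not the code that asks for `"Number of beds"`.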
👍 7
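The `format = "yyyy/MM/dd|dd/MM/yyyy"` option above suggests that several date patterns are tried in turn. How KSV does this internally isn't shown in the thread, so the following is only a sketch of one way to do it with `java.time`; the function name `parseDateWithFallback` is hypothetical.

```kotlin
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.time.format.DateTimeParseException

/**
 * Hypothetical helper: tries each '|'-separated pattern in order and returns
 * the first successful parse, or null if the text is blank or matches none.
 */
fun parseDateWithFallback(text: String, formats: String): LocalDate? {
    val trimmed = text.trim()
    if (trimmed.isEmpty()) return null
    for (pattern in formats.split('|')) {
        try {
            return LocalDate.parse(trimmed, DateTimeFormatter.ofPattern(pattern))
        } catch (e: DateTimeParseException) {
            // this pattern did not match; fall through to the next one
        }
    }
    return null
}
```

Returning null instead of throwing fits a nullable property like `latestCheckDate: LocalDate?`; for a non-nullable property a caller would presumably have to reject the row instead.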
j
Very cool! We use something similar to parse CSVs at my current job, and I'm interested to try out your solution. Ours is less sophisticated: the columns don't have any type information. It looks something like this:
data class PersonCsvEntry(
    @CsvField("Worker ID") var workerId: String? = null,
    @CsvField("Legal First Name") var firstName: String? = null,
    @CsvField("Legal Last Name") var lastName: String? = null,
    @CsvField("Phone Number") var phoneNumber: String? = null
): CsvEntry()
s
Have a look, and if you find a bug, open an issue on the GitHub page 🙂 (I assume I’d get an automatic email in that case; if you don’t hear anything within 2 days, send me a message here and I’ll have a look)