holgerbrandl
10/14/2024, 5:55 AM
DataFrame.readDelim(
    FileInputStream(File("src/main/resources/data/msleep.csv")),
    csvType = CSVType.DEFAULT,
    colTypes = mapOf("brainwt" to ColType.Double),
    parserOptions = ParserOptions(nullStrings = setOf("NA"))
)
It seems to guess that the brain weight column (brainwt) is a BigDecimal (it should be a Double), and it also struggles with the NA values despite the provided parser option:
java.lang.IllegalStateException: Couldn't parse 'NA' into type kotlin.Double
What am I doing wrong?
Jolan Rensen [JB]
10/14/2024, 1:30 PM
It looks like the Double parser doesn't take nullStrings into account (for the current csv implementation). Only the date-time parsers do. DataFrame, at the moment, uses NumberFormat.getInstance(locale).parse() to parse doubles, with some manual conversions such as "inf" and "nan". Unfortunately, "NA" is not recognized as a Double; only "NaN" is.
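A minimal sketch of why "NA" fails under a NumberFormat-based approach, using only the JDK. parseWithNumberFormat is a hypothetical helper for illustration, not DataFrame's actual code, and Locale.ROOT is an assumed locale:

```kotlin
import java.text.NumberFormat
import java.text.ParseException
import java.util.Locale

// Hypothetical helper (not part of DataFrame): returns the parsed value as a Double,
// or null when NumberFormat cannot read a number at the start of the text.
fun parseWithNumberFormat(text: String, locale: Locale = Locale.ROOT): Double? =
    try {
        NumberFormat.getInstance(locale).parse(text).toDouble()
    } catch (e: ParseException) {
        null
    }

fun main() {
    // A plain decimal is parsed as expected.
    println(parseWithNumberFormat("1.5"))
    // "NA" is not a number to NumberFormat, so parse() throws and the helper returns null.
    println(parseWithNumberFormat("NA"))
}
```

NumberFormat itself has no notion of null strings, so any "NA" handling would have to happen before or around this call.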
We're working on a completely new CSV implementation at the moment, for which I'll add this file as a test case. The next version of DF will likely have the new implementation as an experimental opt-in.
Until then, the best intermediate solution would be to read the column as String, and convert it to double manually.
If you want NA to become null, try
val df = DataFrame.readCSV(
    "path/to/msleep.csv",
    colTypes = mapOf("brainwt" to ColType.String),
).convert { "brainwt"<String>() }.with { it.toDoubleOrNull() }
or if you want to make it NaN:
val df = DataFrame.readCSV(
    "path/to/msleep.csv",
    colTypes = mapOf("brainwt" to ColType.String),
).convert { "brainwt"<String>() }.with { it.toDoubleOrNull() ?: Double.NaN }
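What the two `with` lambdas above do to individual cell values can be sketched with the Kotlin standard library alone. The function names and sample strings here are illustrative assumptions, not part of DataFrame or the msleep data:

```kotlin
// Hypothetical stand-in for the first lambda: unparseable text becomes null.
fun toNullableDouble(raw: String): Double? = raw.toDoubleOrNull()

// Hypothetical stand-in for the second lambda: unparseable text becomes NaN.
fun toDoubleOrNaN(raw: String): Double = raw.toDoubleOrNull() ?: Double.NaN

fun main() {
    // Illustrative inputs: a plain decimal, scientific notation, and a missing-value marker.
    val samples = listOf("0.00029", "3e-04", "NA")
    println(samples.map(::toNullableDouble))
    println(samples.map(::toDoubleOrNaN))
}
```

The nullable variant keeps missing values distinguishable from real numbers, while the NaN variant keeps the column non-nullable; which one fits depends on how the downstream code treats missing data.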
Jolan Rensen [JB]
10/14/2024, 2:41 PM
val df = DataFrame.readCSV(
    "/mnt/data/Projects/dataframe/examples/idea-examples/json/src/main/resources/msleep.csv",
    colTypes = mapOf("brainwt" to ColType.String),
).parse(ParserOptions(nullStrings = setOf("NA")))
Interestingly, the normal .parse operation does understand nullStrings, though the column still becomes a BigDecimal.
Jolan Rensen [JB]
10/14/2024, 2:44 PM
3e-04 unfortunately. Double.parseDouble("3e-04") does work, which is why my first example returns the correct result.
Jolan Rensen [JB]
10/14/2024, 6:15 PM
holgerbrandl
10/14/2024, 9:38 PM