Paulo Cereda
11/25/2022, 8:38 AM
[...] `.csv` file (exported from a third-party system) which has integer columns with `,` as thousands separator (e.g., `47,302`, `48,000`). Needless to say, this is potentially problematic.
When I load my `.csv` file into my Jupyter notebook, I believe `dataframe` relies on my system locale (`pt_BR`) and thus parses these integer columns as doubles: `pt_BR` has `,` as decimal separator and `.` as thousands separator. I end up with wrong values in those columns (the `.csv` is of course to blame, not `dataframe`). So I was wondering if I could (a) disable type inference for either the entire `.csv` or selected columns and get everything as strings, so I can manually parse these values, (b) change the underlying locale and see if it helps the type inference mechanism, or (c) have parsing rules associated with certain columns. Any suggestions are highly appreciated! I apologise in advance if this is trivial, but I failed to identify a similar scenario in the documentation. Cheers!
altavir
11/25/2022, 9:16 AM
altavir
11/25/2022, 9:18 AM
Paulo Cereda
11/25/2022, 9:20 AM
[...] `.csv` files before loading into Jupyter (I wrote a Kotlin script for this), but I was wondering if I could find an easier way.
Paulo Cereda
11/25/2022, 9:22 AM
altavir
11/25/2022, 9:22 AM
Paulo Cereda
11/25/2022, 9:23 AM
altavir
11/25/2022, 9:23 AM
Paulo Cereda
11/25/2022, 9:25 AM
altavir
11/25/2022, 9:35 AM
Paulo Cereda
11/25/2022, 9:55 AM
Nikita Klimenko [JB]
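11/25/2022, 7:12 PM
```kotlin
// Read "colName" as String, so no numeric type is inferred for that column
val df = DataFrame.readCSV(
    "datasets/decimals.csv",
    colTypes = mapOf("colName" to ColType.String)
)
```
b) It's also possible to provide a locale as a parameter to `readCSV`:
```kotlin
// Parse numbers with UK conventions: ',' groups thousands, '.' is the decimal mark
val df = DataFrame.readCSV(
    "datasets/decimals.csv",
    parserOptions = ParserOptions(locale = Locale.UK),
)
```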
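For the manual route in (a), a column read as String still has to be converted afterwards. Below is a minimal sketch of that step, assuming plain JDK `java.text.NumberFormat` rather than any DataFrame API; `parseLocalized` is a hypothetical helper name introduced only for illustration. It also shows why the same text yields a different number under `pt_BR`.

```kotlin
import java.text.NumberFormat
import java.text.ParsePosition
import java.util.Locale

// Hypothetical helper (not part of Kotlin DataFrame): parses text such as
// "47,302" using the number conventions of the given locale.
// Returns null when the text is not fully consumed by the parser.
fun parseLocalized(text: String, locale: Locale): Number? {
    val trimmed = text.trim()
    val format = NumberFormat.getNumberInstance(locale)
    val position = ParsePosition(0)
    val parsed = format.parse(trimmed, position)
    // Reject input with trailing characters the parser could not read
    return if (position.index == trimmed.length) parsed else null
}

fun main() {
    val ptBR = Locale.forLanguageTag("pt-BR")
    // ',' is a grouping separator in en_GB, so this is an integer
    println(parseLocalized("47,302", Locale.UK)) // 47302
    // ',' is the decimal separator in pt_BR, so this becomes a fraction
    println(parseLocalized("47,302", ptBR))      // 47.302
}
```

A helper like this could be applied to each value of a String column after loading. How to wire it into a DataFrame conversion is left out here, since that depends on the library version in use.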
Paulo Cereda
11/26/2022, 9:05 AM