Hello friends! I beseech your wisdom! 😊 I have to tackle a `.csv` file (exported from a third-party system) whose integer columns use `,` as the thousands separator (e.g. `47,302`, `48,000`). Needless to say, this is potentially problematic. 😅 When I load the `.csv` into my Jupyter notebook, I believe `dataframe` relies on my system locale (`pt_BR`) and therefore parses these integer columns as doubles, since `pt_BR` uses `,` as the decimal separator and `.` as the thousands separator. I end up with wrong values in those columns (the `.csv` is of course to blame, not `dataframe`).
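For reference, this is just my guess at the mechanism: assuming the notebook runs on the JVM and the parser behaves like `java.text.NumberFormat` (which I haven't verified), this snippet reproduces what I suspect is happening:

```kotlin
import java.text.NumberFormat
import java.util.Locale

fun main() {
    // Under pt_BR, "," is the decimal separator, so the CSV value "47,302"
    // parses as the double 47.302 instead of the intended integer 47302.
    val ptBr = NumberFormat.getInstance(Locale.forLanguageTag("pt-BR"))
    println(ptBr.parse("47,302"))  // 47.302

    // Under en_US, "," is the grouping (thousands) separator, so the same
    // text parses as 47302, which is what the file actually means.
    val enUs = NumberFormat.getInstance(Locale.US)
    println(enUs.parse("47,302"))  // 47302
}
```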
So I was wondering if I could:
(a) disable type inference for either the entire `.csv` or just selected columns and get everything as strings, so I can parse these values manually (a sketch of what I have in mind is in the P.S. below);
(b) change the underlying locale and see whether that helps the type inference mechanism; or
(c) attach parsing rules to specific columns.
Any suggestions are highly appreciated! I apologise in advance if this is trivial, but I couldn't find a similar scenario in the documentation. Cheers! 🙏
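P.S. For option (a), this is the kind of manual parsing I have in mind once I can get at the raw strings; it uses only the standard library, so it makes no assumptions about `dataframe`'s own API:

```kotlin
import java.text.NumberFormat
import java.util.Locale

// Hypothetical helper: parse a US-style grouped integer such as "47,302"
// by treating "," as the thousands separator regardless of the system locale.
fun parseGroupedInt(raw: String): Int =
    NumberFormat.getIntegerInstance(Locale.US).parse(raw.trim()).toInt()

fun main() {
    println(parseGroupedInt("47,302"))  // 47302
    println(parseGroupedInt("48,000"))  // 48000
}
```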