how do you guys handle determining a text file's encoding?
Ruckus
12/12/2017, 6:05 AM
We just had a huge project for that at work. Ultimately, management decided we'll assume all files are UTF-8. Turns out most people edit CSV files in Excel on Windows, which saves in Latin-1. So now we have a whole bunch of broken translations and no solution.
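One way to soften the Excel problem is to try a strict UTF-8 decode first and fall back to Latin-1 only when that fails. This is a minimal sketch under that assumption, not a known fix. The helper name `decode_text` is hypothetical. Latin-1 decoding never fails, so the fallback is a guess about where the file came from, not a detection.

```python
def decode_text(data: bytes) -> tuple[str, str]:
    """Hypothetical helper: decode bytes, preferring UTF-8, and report which encoding was used."""
    try:
        # Strict mode (the default) raises on any byte sequence that is not valid UTF-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point, so this always succeeds;
        # it only assumes the file came from a Latin-1 tool such as Excel on Windows.
        return data.decode("latin-1"), "latin-1"


# "café" as a Latin-1 tool saves it: é is the single byte 0xE9, which is invalid UTF-8 here.
print(decode_text("café".encode("latin-1")))  # ('café', 'latin-1')
print(decode_text("café".encode("utf-8")))    # ('café', 'utf-8')
```

On Western-locale Windows the ANSI code page is usually Windows-1252 rather than strict Latin-1, so swapping in `"cp1252"` may fit Excel output better. Python's cp1252 codec rejects a few undefined bytes, though, so Latin-1 is the safer last resort.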
Ruckus
12/12/2017, 6:06 AM
From what I understand, if the file doesn't declare its encoding, there's no way to know for sure, though there are some heuristic-based libraries that will try to guess.
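The reason there's no way to know for sure is that the same bytes are often valid under more than one encoding, and nothing in the bytes says which one was meant. This is a minimal illustration with the standard library. The candidate list is arbitrary and only for demonstration.

```python
# The UTF-8 encoding of "café" ends in the two bytes 0xC3 0xA9.
data = "café".encode("utf-8")

# Each of these decodes without error, but only one matches what the author typed.
decoded = {enc: data.decode(enc) for enc in ("utf-8", "latin-1", "cp1252")}
for enc, text in decoded.items():
    print(enc, text)
# utf-8 café
# latin-1 cafÃ©
# cp1252 cafÃ©
```

Heuristic detectors try to break the tie by scoring how plausible each decoding looks, for example against typical character frequencies. That is also why they can struggle with text whose character distribution is unusual.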
Ruckus
12/12/2017, 6:08 AM
In our case, we had a pretty nonstandard distribution of characters, so they didn't help much.
edvin
12/12/2017, 7:27 AM
I second that. With UTF-8 you can at least look for a BOM, which is a strong hint when it's there. Plenty of editors save UTF-8 without one, though, so a missing BOM doesn't tell you much. In any case, sticking with UTF-8 and even assuming it is probably the best you can do.
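Checking for a UTF-8 BOM is cheap with the standard library. This sketch only looks for the three bytes EF BB BF at the start of the data. The function name `has_utf8_bom` is hypothetical. Python's `utf-8-sig` codec strips a leading BOM when present and otherwise decodes like plain UTF-8, so it is a convenient default when reading files you expect to be UTF-8.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    # codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf".
    return data.startswith(codecs.BOM_UTF8)


with_bom = codecs.BOM_UTF8 + "café".encode("utf-8")
without_bom = "café".encode("utf-8")

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))  # True False
# "utf-8-sig" drops the BOM if there is one, so both decode to the same string.
print(with_bom.decode("utf-8-sig") == without_bom.decode("utf-8-sig"))  # True
```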