On Tue, 2021-05-11 at 11:00 +0200, Mauro Carvalho Chehab wrote:
> Yet, this series has two positive side effects:
> - it helps people needing to touch the documents using non-utf8 locales[1];
> - it makes it easier to grep for text;
> [1] There are still some widely used distros nowadays (LTS ones?) that don't set UTF-8 as default. Last time I installed a Debian machine I had to explicitly set the UTF-8 charset after install, as the default was ASCII encoding (can't remember if it was Debian 10 or an older version).
This whole line of thinking is fundamentally wrong.
The characters in a "text file" are encoded with a specific character set / encoding. To interpret that file and convert the bytes back to characters, we need to use the *same* charset.
That charset is a property of the text file, and each text file or piece of text in a system (like this email, which will contain a Content-Type: header indicating the charset) might be encoded with a *different* character set.
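To make that concrete, here is a minimal Python sketch. The sample string is just an illustration I picked, nothing from the series. It shows the same text producing different bytes under different charsets, and what happens when the reader guesses a charset other than the one the writer used.

```python
# Hypothetical sample text; any string with a non-ASCII character would do.
text = "café"

# The bytes that end up in the file depend on the charset used when writing.
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'

# Decoding with the *same* charset recovers the original characters.
same = utf8_bytes.decode("utf-8")

# Decoding with a different charset "works" but yields the wrong characters.
wrong = utf8_bytes.decode("latin-1")

# A stricter "default" such as ASCII cannot decode these bytes at all.
try:
    utf8_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(repr(same), repr(wrong), ascii_ok)
```

Nothing in the bytes themselves says which interpretation is right. That information has to travel with the file, or be agreed on out of band.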
In the days before you could connect computers together (or before you could exchange data between computers in different countries, at least) perhaps it made sense to store 'text' files without explicitly noting their encoding, and to interpret them using some kind of "default" character set.
Those days are long gone. You're trying to work around an egregiously stupid bug, if you're trying to pander to "default" encodings. There *is* no default encoding that even makes sense, except perhaps UTF-8. To *speak* of them as you did shows a misunderstanding of how broken they are. It's *precisely* that kind of half-baked thinking which always used to lead to stupid assumptions and double conversions and Mojibake. Before we just standardised on UTF-8 everywhere and it stopped mattering so much.
Just don't.
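For anyone who hasn't been bitten by the double conversions mentioned above, this is roughly how they happen. It's a sketch under the assumption that some tool in the chain guesses Latin-1 as the "default" for data that was really UTF-8; the variable names are made up for illustration.

```python
# Text correctly stored as UTF-8.
original = "é"
stored = original.encode("utf-8")            # b'\xc3\xa9'

# A tool assumes its "default" charset (Latin-1 here) and decodes wrongly...
misread = stored.decode("latin-1")           # two characters instead of one

# ...then "helpfully" converts what it thinks is the text to UTF-8.
double_encoded = misread.encode("utf-8")     # b'\xc3\x83\xc2\xa9'

# A correct UTF-8 reader now sees Mojibake, with no error reported anywhere.
seen_by_reader = double_encoded.decode("utf-8")

# Repair is only possible if you know exactly which wrong guess was made.
repaired = seen_by_reader.encode("latin-1").decode("utf-8")

print(repr(seen_by_reader), repr(repaired))
```

Every step succeeds without complaint, which is exactly why guessing a default is worse than failing loudly.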
Now, you *can* make this work if you really insist on it, even for systems with EBCDIC as their default encoding. Just make git do the "convert to local charset" on checkout, precisely the same way as it does CRLF for Windows systems. But it's stupid and anachronistic, so I don't really see the point.
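As a rough illustration of what such a checkout-time conversion would amount to, here is a Python sketch. It is not how git implements anything. The function names are hypothetical, and cp037 is just one EBCDIC code page I picked as the assumed local charset. (The gitattributes documentation describes a `working-tree-encoding` attribute for this sort of thing; check it for the real mechanism.)

```python
# Assumption: the local "default" charset is the EBCDIC code page cp037.
LOCAL_CHARSET = "cp037"

def on_checkout(repo_bytes: bytes) -> bytes:
    """Convert UTF-8 repository content to the local charset (hypothetical)."""
    return repo_bytes.decode("utf-8").encode(LOCAL_CHARSET)

def on_commit(worktree_bytes: bytes) -> bytes:
    """Convert local-charset working-tree content back to UTF-8 (hypothetical)."""
    return worktree_bytes.decode(LOCAL_CHARSET).encode("utf-8")

# Content representable in the local charset round-trips cleanly.
repo_blob = "naïve café\n".encode("utf-8")
worktree = on_checkout(repo_blob)
round_trip = on_commit(worktree)

# Content outside the local charset cannot be checked out at all.
try:
    on_checkout("→".encode("utf-8"))
    arrow_ok = True
except UnicodeEncodeError:
    arrow_ok = False

print(round_trip == repo_blob, arrow_ok)
```

The conversion is well defined only because the repository side has one known encoding. The local side still loses anything it can't represent, which is the anachronism.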