On Mon, 10 May 2021 15:22:02 -0400, "Theodore Ts'o" tytso@mit.edu wrote:
On Mon, May 10, 2021 at 02:49:44PM +0100, David Woodhouse wrote:
On Mon, 2021-05-10 at 13:55 +0200, Mauro Carvalho Chehab wrote:
This patch series is doing conversion only when using ASCII makes more sense than using UTF-8.
See, a number of converted documents ended up with weird characters like ZERO WIDTH NO-BREAK SPACE (U+FEFF). This specific character doesn't do any good.
Others use NO-BREAK SPACE (U+00A0) instead of 0x20. Harmless, until someone tries to use grep[1].
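To illustrate the grep problem, here is a minimal Python sketch (not part of the patch series; the function name and the sample text are made up for illustration). It shows that a search for a string with a plain 0x20 space misses text containing U+00A0, and it reports where such characters sit:

```python
NBSP = "\u00a0"  # NO-BREAK SPACE
BOM = "\ufeff"   # ZERO WIDTH NO-BREAK SPACE


def find_odd_chars(text):
    """Return (line, column, codepoint) for each U+00A0 or U+FEFF in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in (NBSP, BOM):
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits


# Hypothetical sample: line 1 hides a NBSP, line 2 starts with U+FEFF.
sample = "make\u00a0menuconfig\n\ufeffplain line\n"

# The search string uses a regular space, so the match fails.
print("make menuconfig" in sample)
print(find_odd_chars(sample))
```

The same check could be run over a documentation file before deciding which characters are worth replacing.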
Replacing those makes sense. But replacing emdashes — which are a distinct character that has no direct replacement in ASCII and which people do *deliberately* use instead of hyphen-minus — does not.
I regularly use --- for em-dashes and -- for en-dashes. Markdown will automatically translate 3 ASCII hyphens to em-dashes, and 2 ASCII hyphens to en-dashes. It's much, much easier for me to type 2 or 3 hyphens into my text editor of choice than trying to enter the UTF-8 characters.
Yeah, typing those UTF-8 chars is a lot harder than typing -- and --- on several text editors ;-)
Here, I only type UTF-8 chars for accents (my US-layout keyboards are all set to US international, so typing those is easy).
If we can make sphinx do this translation, maybe that's the best way of dealing with these two characters?
Sphinx already does that by default[1], using smartquotes:
https://docutils.sourceforge.io/docs/user/smartquotes.html
Those are the conversions that are done there:
- Straight quotes (" and ') turned into "curly" quote characters;
- dashes (-- and ---) turned into en- and em-dash entities;
- three consecutive dots (... or . . .) turned into an ellipsis char.
So, we can simply use straight single/double quotes, hyphens and dots in the source, and get curly quotes, en/em dashes and ellipses in the output.
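As a rough illustration of the dash and ellipsis rules, here is a simplified Python sketch. It is not the docutils smartquotes code: the function name is hypothetical, and quote handling is left out because it depends on context in ways this sketch doesn't cover.

```python
import re


def simple_smart_dashes(text):
    """Simplified stand-in for the dash/ellipsis conversions."""
    # Order matters: three hyphens must be handled before two.
    text = text.replace("---", "\u2014")  # em dash
    text = text.replace("--", "\u2013")   # en dash
    # Both "..." and ". . ." become a single ellipsis character.
    text = re.sub(r"\.\.\.|\. \. \.", "\u2026", text)
    return text


print(simple_smart_dashes("pages 10--20 --- see below..."))
```

Writing the plain ASCII form keeps the source greppable, while the rendered output still gets the typographic characters.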
[1] There's a way to disable it in conf.py, but in the kernel this is kept at its default: to automatically do such conversions.
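For reference, a sketch of the relevant conf.py knobs, written out with what I understand to be the Sphinx defaults. This is an illustration only, not a copy of the kernel's conf.py:

```python
# conf.py fragment (sketch): spelling out the defaults explicitly.
smartquotes = True          # set to False to disable all conversions
smartquotes_action = "qDe"  # q = quotes, D = dashes (-- and ---), e = ellipses
```

Dropping a letter from smartquotes_action would turn off just that class of conversion, e.g. "qe" to leave dashes alone.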
Thanks, Mauro