On 10/05/2021 12:55, Mauro Carvalho Chehab wrote:
The main point on this series is to replace just the occurrences where ASCII represents the symbol equally well
- U+2014 ('—'): EM DASH
Em dash is not the same thing as hyphen-minus, and the latter does not serve 'equally well'. People use em dashes because — even in monospace fonts — they make text easier to read and comprehend, when used correctly. I accept that some of the other distinctions — like en dashes — are needlessly pedantic (though I don't doubt there is someone out there who will gladly defend them with the same fervour with which I argue for the em dash) and I wouldn't take the trouble to use them myself; but I think there is a reasonable assumption that when someone goes to the effort of using a Unicode punctuation mark that is semantic (rather than merely typographical), they probably had a reason for doing so.
- U+2018 ('‘'): LEFT SINGLE QUOTATION MARK
- U+2019 ('’'): RIGHT SINGLE QUOTATION MARK
- U+201c ('“'): LEFT DOUBLE QUOTATION MARK
- U+201d ('”'): RIGHT DOUBLE QUOTATION MARK
(These are purely typographic, I have no problem with dumping them.)
- U+00d7 ('×'): MULTIPLICATION SIGN
Presumably this is appearing in mathematical formulae, in which case changing it to 'x' loses semantic information.
Using the above symbols will just trick tools like grep for no good reason.
NBSP, sure. That one's probably an artefact of some document format conversion somewhere along the line, anyway. But what kinds of things with × or — in are going to be grept for?
If there are em dashes lying around that semantically _should_ be hyphen-minus (one of your patches I've seen, for instance, fixes an *en* dash moonlighting as the option character in an `ethtool` command line), then sure, convert them. But any time someone is using a Unicode character to *express semantics*, even if you happen to think the semantic distinction involved is a pedantic or unimportant one, I think you need an explicit grep case to justify ASCIIfying it.
-ed