A couple of recent threads have again raised the recurring problem of what happens to non-standard characters when you copy and paste from the online OED.
It appears that, depending on your computer system and browser (and possibly on which OED server you access, if there is more than one and they run different software), one of two things will happen if you simply copy and paste. (For my examples, I’m using “heathen” and “sophist”. Don’t take it personally; they just happened to provide good examples of etymologies containing Greek and Anglo-Saxon letters.)
The non-standard characters may disappear entirely:
[OE. hen = OFris. hêthin, -en, OS. hêin (MDu., Du. heiden), OHG. heidan (MHG. heiden, Ger. heide), ON. heiinn (Sw., Da. heden); cf. Goth. hainô Gentile or heathen woman.
[ad. L. sophista, sophists, ad. Gr., f. to become wise or learned. Hence also Sp. and It. sofista, F. sophiste.]
or they may be replaced by the OED server’s “names” for the characters, enclosed in wavy brackets:
[OE. h{aeacu}{edh}en = OFris. hêthin, -en, OS. hê{edh}in (MDu., Du. heiden), OHG. heidan (MHG. heiden, Ger. heide), ON. hei{edh}inn (Sw., Da. heden); cf. Goth. hai{th}nô Gentile or heathen woman.
[ad. L. sophista, sophist{emac}s, ad. Gr. {sigma}{omicron}{phi}{iota}{sigma}{tau}{ghacu}{fsigma}, f. {sigma}{omicron}{phi}{giacu}{zeta}{epsilon}{sigma}{theta}{alpha}{iota} to become wise or learned. Hence also Sp. and It. sofista, F. sophiste.]
If your browser takes the latter approach, the result is unlovely but decipherable to some extent. Everyone here presumably knows what a sigma is, even if ghacu and aeacu are not so obvious. The first approach, by contrast, is seriously misleading. The OE word for heathen was not hen, and the second Latin word in the etymology for “sophist” should be sophistes with a macron over the e, not sophists. The wavy-brackets approach, even if unattractive and sometimes hard to interpret, does at least give notice that there are non-standard characters in the word. And where entire words drop out, as is typical with Greek, it gives notice that a Greek word was present at that point, which can help prevent a quotation from becoming gibberish.
As we all know, presenting misinformation here is held in low esteem, so what should someone whose browser takes the first approach do?
IMHO, the bare minimum is to visually compare what you see in the OED with what you see in your posting, and add a “missing characters” warning if needed. Inserting asterisks or underlines to indicate the position of the missing characters as well is a bit better.
If the poster is familiar enough with the non-standard characters to provide a transliteration, that’s even better:
[OE. haethen = OFris. hêthin, -en, OS. hêthin (MDu., Du. heiden), OHG. heidan (MHG. heiden, Ger. heide), ON. heithinn (Sw., Da. heden); cf. Goth. haithnô Gentile or heathen woman.
[ad. L. sophista, sophistes, ad. Gr. sophistés, f. sophízesthai to become wise or learned. Hence also Sp. and It. sofista, F. sophiste.]
When using this approach, I sometimes bracket multiple characters that stand for a single character in the original language, e.g., [th] for an edh or thorn.
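For anyone who would rather not do this by hand, here is a minimal Python sketch of the same idea. It assumes you already have the text with the real characters in it, and it handles only the three Germanic letters from the "heathen" example. The table `DIGRAPHS` and the function `transliterate` are names I made up for illustration; they are not part of any OED tool.

```python
# Hypothetical table: covers only the letters seen in the "heathen" example.
DIGRAPHS = {
    "\u01fd": "ae",  # ǽ (ash with acute); the accent is dropped
    "\u00f0": "th",  # ð (edh)
    "\u00fe": "th",  # þ (thorn)
}


def transliterate(text, bracket=False):
    """Replace each listed letter with its plain-ASCII stand-in.

    With bracket=True the stand-in is wrapped in square brackets to show
    that it represents a single character in the original.
    """
    out = []
    for ch in text:
        repl = DIGRAPHS.get(ch)
        if repl is None:
            out.append(ch)  # anything not in the table is left untouched
        elif bracket:
            out.append("[" + repl + "]")
        else:
            out.append(repl)
    return "".join(out)
```

Calling `transliterate(text)` gives the unbracketed style, and `transliterate(text, bracket=True)` gives the bracketed one. Letters such as ê pass through unchanged, which matches what I did by hand.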
It also makes things clearer if one restores the formatting, at least to the extent of italicization:
[OE. haethen = OFris. hêthin, -en, OS. hêthin (MDu., Du. heiden), OHG. heidan (MHG. heiden, Ger. heide), ON. heithinn (Sw., Da. heden); cf. Goth. haithnô Gentile or heathen woman.
[ad. L. sophista, sophistes, ad. Gr. sophistés, f. sophízesthai to become wise or learned. Hence also Sp. and It. sofista, F. sophiste.]
The most accurate, but also most laborious, method is to insert Unicode characters to reproduce the non-standard characters as exactly as possible:
[OE. hǽðen = OFris. hêthin, -en, OS. hêðin (MDu., Du. heiden), OHG. heidan (MHG. heiden, Ger. heide), ON. heiðinn (Sw., Da. heden); cf. Goth. haiþnô Gentile or heathen woman.
[ad. L. sophista, sophistēs, ad. Gr. σοφιστής, f. σοφίζεσθαι to become wise or learned. Hence also Sp. and It. sofista, F. sophiste.]
This approach does offer some chance for error, especially if, like me, you’re not well-versed in Greek diacriticals, so there may be some minor error in the above. But it’s clearly vastly more informative than simply letting the non-standard characters disappear without comment, and more attractive and readable than leaving the wavy brackets in. IMHO, if one has time for it (and a browser that supports it), it’s the best approach.
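If your browser gives you the wavy-bracket names, some of the labor can be automated. The following is only a rough Python sketch: the table `OED_NAMES` is my own guess, inferred purely from lining up the two bracketed examples in this post against their Unicode versions, and is not an official OED list. The function name `expand_oed_names` is likewise made up. Any name not in the table is left in its brackets, so nothing vanishes silently.

```python
import re

# Assumed mapping, inferred only from the "heathen" and "sophist" examples.
OED_NAMES = {
    "aeacu": "\u01fd",    # ǽ  ash with acute
    "edh": "\u00f0",      # ð  edh
    "th": "\u00fe",       # þ  thorn
    "emac": "\u0113",     # ē  e with macron
    "alpha": "\u03b1",    # α
    "epsilon": "\u03b5",  # ε
    "zeta": "\u03b6",     # ζ
    "theta": "\u03b8",    # θ
    "iota": "\u03b9",     # ι
    "omicron": "\u03bf",  # ο
    "sigma": "\u03c3",    # σ
    "fsigma": "\u03c2",   # ς  final sigma
    "tau": "\u03c4",      # τ
    "phi": "\u03c6",      # φ
    "ghacu": "\u03ae",    # ή  eta with acute (my reading of "ghacu")
    "giacu": "\u03af",    # ί  iota with acute (my reading of "giacu")
}

_BRACED = re.compile(r"\{([a-z]+)\}")


def expand_oed_names(text):
    """Replace {name} with its Unicode character; keep unknown names as-is."""
    def repl(match):
        return OED_NAMES.get(match.group(1), match.group(0))
    return _BRACED.sub(repl, text)
```

Run over the pasted etymology, this turns the known names into real characters; it is still worth comparing the result against the OED page by eye, since the table may well be wrong or incomplete.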
