Pasting from the OED
Posted: 27 June 2008 11:00 AM
Total Posts:  2777
Joined  2007-01-31

In a couple of recent threads, the recurring problem of what happens to non-standard characters when copying-and-pasting from the online OED arose again.

It appears that, depending on your computer system and browser (possibly also which OED server you access, if there is more than one and they have different software), one of two things will happen if you simply copy-and-paste. (For my examples, I’m using “heathen” and “sophist”. Don’t take it personally; they just happened to provide good examples of etymologies containing Greek and Anglo-Saxon letters.)

The non-standard characters may disappear entirely:

[OE. hen = OFris. hêthin, -en, OS. hêin (MDu., Du. heiden), OHG. heidan (MHG. heiden, Ger. heide), ON. heiinn (Sw., Da. heden); cf. Goth. hainô Gentile or heathen woman.

[ad. L. sophista, sophists, ad. Gr., f. to become wise or learned. Hence also Sp. and It. sofista, F. sophiste.]

or they may be replaced by the OED server’s “names” for the characters, enclosed in wavy brackets:

[OE. h{aeacu}{edh}en = OFris. hêthin, -en, OS. hê{edh}in (MDu., Du. heiden), OHG. heidan (MHG. heiden, Ger. heide), ON. hei{edh}inn (Sw., Da. heden); cf. Goth. hai{th}nô Gentile or heathen woman.

[ad. L. sophista, sophist{emac}s, ad. Gr. {sigma}{omicron}{phi}{iota}{sigma}{tau}{ghacu}{fsigma}, f. {sigma}{omicron}{phi}{giacu}{zeta}{epsilon}{sigma}{theta}{alpha}{iota} to become wise or learned. Hence also Sp. and It. sofista, F. sophiste.]

If your browser takes the latter approach, the result is unlovely but decipherable to some extent. Everyone here presumably knows what a sigma is, even if ghacu and aeacu are not so obvious. The first approach is seriously misleading. The OE word for heathen was not hen, and the second Latin word in the etymology for “sophist” should be sophistes with a macron over the e, not sophists. The wavy-brackets approach, even if unattractive and sometimes hard to interpret, does at least provide notice that there are nonstandard characters in the word. And in cases where entire words drop out, as is typical with Greek, it provides notice that a Greek word was present at that point, which can help prevent a quotation from becoming gibberish.

As we all know, presenting misinformation here is held in low esteem, so what ought someone whose browser takes the first approach to do?

IMHO, the bare minimum is to visually compare what you see in the OED with what you see in your posting, and add a “missing characters” warning if needed.  Inserting asterisks or underlines to indicate the position of the missing characters as well is a bit better.

If the poster is familiar enough with the non-standard characters to provide a transliteration, that’s even better:

[OE. haethen = OFris. hêthin, -en, OS. hêthin (MDu., Du. heiden), OHG. heidan (MHG. heiden, Ger. heide), ON. heithinn (Sw., Da. heden); cf. Goth. haithnô Gentile or heathen woman.

[ad. L. sophista, sophistes, ad. Gr. sophistés, f. sophízesthai to become wise or learned. Hence also Sp. and It. sofista, F. sophiste.]

When using this approach, I sometimes bracket multiple characters that stand for a single character in the original language, e.g., [th] for an edh or thorn.

It also makes things clearer if one restores the formatting, at least to the extent of italicization:

[OE. haethen = OFris. hêthin, -en, OS. hêthin (MDu., Du. heiden), OHG. heidan (MHG. heiden, Ger. heide), ON. heithinn (Sw., Da. heden); cf. Goth. haithnô Gentile or heathen woman.

[ad. L. sophista, sophistes, ad. Gr. sophistés, f. sophízesthai to become wise or learned. Hence also Sp. and It. sofista, F. sophiste.]

The most accurate, but also the most laborious, method is to insert Unicode characters to reproduce the non-standard characters as exactly as possible:

[OE. hǽðen = OFris. hêthin, -en, OS. hêðin (MDu., Du. heiden), OHG. heidan (MHG. heiden, Ger. heide), ON. heiðinn (Sw., Da. heden); cf. Goth. haiþnô Gentile or heathen woman.

[ad. L. sophista, sophistēs, ad. Gr. σοφιστής, f. σοφίζεσθαι to become wise or learned. Hence also Sp. and It. sofista, F. sophiste.]

This approach does offer some chance for error, especially if, like me, you’re not well-versed in Greek diacriticals, so there may be some minor error in the above. But it’s clearly vastly more informative than simply letting the non-standard characters disappear without comment, and more attractive and readable than leaving the wavy brackets in.  IMHO, if one has time for it (and a browser that supports it), it’s the best approach.
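
For those whose browser gives the wavy-bracket output, the substitution can be scripted rather than done by hand. What follows is only a minimal Python sketch: the table holds just the code names that turn up in the two examples in this post, paired with the characters used in the Unicode versions above, and it is certainly not a complete or official list of the OED's codes. The names OED_CODES and decode_oed_codes are made up for illustration. Codes that are not in the table are left in place and reported, so nothing disappears silently.

```python
import re

# Code names seen in this thread's "heathen" and "sophist" examples, paired
# with the characters used in the hand-made Unicode versions. This is an
# assumption-laden, partial table, not the OED's own list.
OED_CODES = {
    "aeacu": "\u01fd",   # ǽ  (ash with acute)
    "edh": "\u00f0",     # ð
    "th": "\u00fe",      # þ
    "emac": "\u0113",    # ē  (e with macron)
    "alpha": "α",
    "epsilon": "ε",
    "zeta": "ζ",
    "theta": "θ",
    "iota": "ι",
    "omicron": "ο",
    "sigma": "σ",
    "fsigma": "ς",       # final sigma
    "tau": "τ",
    "phi": "φ",
    "ghacu": "\u03ae",   # ή  (eta with acute)
    "giacu": "\u03af",   # ί  (iota with acute)
}

# A code is a run of letters or digits inside wavy brackets, e.g. {edh}.
CODE_PATTERN = re.compile(r"\{([A-Za-z0-9]+)\}")


def decode_oed_codes(text):
    """Replace known {code} names with Unicode characters.

    Returns the converted text plus a list of code names that were not
    recognised; those are left untouched in the text.
    """
    unknown = []

    def substitute(match):
        name = match.group(1)
        if name in OED_CODES:
            return OED_CODES[name]
        unknown.append(name)
        return match.group(0)

    return CODE_PATTERN.sub(substitute, text), unknown


if __name__ == "__main__":
    pasted = "[OE. h{aeacu}{edh}en = OFris. hêthin, -en, OS. hê{edh}in"
    converted, leftovers = decode_oed_codes(pasted)
    print(converted)
    if leftovers:
        print("Unrecognised codes, check by hand:", leftovers)
```

The list of unrecognised codes doubles as the “missing characters” warning: if it is not empty, the quotation still needs a look against the OED page before posting.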

[ Edited: 27 June 2008 06:36 PM by Dr. Techie ]

Posted: 27 June 2008 01:30 PM   [ # 1 ]
Total Posts:  3412
Joined  2007-01-29

I heartily agree.

Posted: 27 June 2008 01:47 PM   [ # 2 ]
Total Posts:  114
Joined  2008-04-24

presenting misinformation here is held in low esteem

Guilty as charged.  Feel free to put me at the bottom of your pile so that I can rise to the top, a breath of fresh air.  Polluted by the pile, but fresh.

Dave, could you please help here - my browser simply ignores all foreign script in a straight copy and paste from the OED. I know it isn’t right, but laziness is always a much easier solution. What is your recommendation? How do we copy and paste Unicode or other characters? If there is a site that includes an idiot’s usage guide to Unicode or whatever package(s) we need, please would you link to it somewhere on WordOrigins, or somewhere we can copy and paste.

I aim to please.

Posted: 27 June 2008 01:55 PM   [ # 3 ]
Total Posts:  2777
Joined  2007-01-31

It is my belief that the online OED does not use Unicode to represent Greek and other nonstandard characters. [Indeed, examination of the HTML code for one of their pages shows that they actually display little GIFs of the characters.] I do not think that any browser will allow you to simply copy and paste from the OED and generate correct output.  I would be happy to be proven wrong on this point, but I don’t think I will be unless and until the OED switches to Unicode.

Astal, to find out if your browser does copy and paste Unicode correctly, I suggest you try copying some of the Greek from my posting into your reply to this, and see if it shows up correctly.  I’m assuming you can see it in my post, correct?

Oh, and in case anyone is interested, I am using Mozilla Firefox 2.0.0.14, running under Mac OS 10.3.9.  When I copy-and-paste from the OED, I get the curly-brackets effect described above.

[ Edited: 27 June 2008 02:13 PM by Dr. Techie ]

Posted: 27 June 2008 02:02 PM   [ # 4 ]
Total Posts:  114
Joined  2008-04-24

ad. Gr. σοφιστής, f. σοφίζεσθαι to become wise

Yes, my browser supports Unicode.  I was guessing the OED uses Unicode, but whatever it uses, Dave, what is your preferred solution?  It’s a problem that obviously needs fixing since the OED is an invaluable resource.

Posted: 28 June 2008 05:58 AM   [ # 5 ]
Administrator
Total Posts:  4600
Joined  2007-01-03

Since I can’t do anything about it, it’s not a question of what my preferred solution is. It’s really up to the individual how they want to handle it.

I think the only clearly unacceptable solution is to do a simple cut and paste that omits the special characters with no indication that they should be there.

Retention of the OED-supplied bracketed characters is okay for the odd character here and there. But it becomes difficult and tiresome to read if the quoted passage is filled with many such characters.

Transliteration into modern, standard characters is fine. Thorn and eth can become th, ash ae, wynn a w, and the yogh a modern g or y, depending on the case. If you omit or change diacritical marks, you should note that the quotation is not exact and that anyone doing serious research needs to consult the actual OED (which they should be doing anyway; much as I love my own site and have the highest respect for the contributors here, I wouldn’t repeat anything said here in a real paper without checking the sources).
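
The letter substitutions just listed are mechanical enough to script. Here is a rough Python sketch, assuming the text already contains the real Unicode characters. The names TRANSLITERATION and transliterate are invented for this example. It drops all diacritical marks, so the result is exactly the kind of inexact quotation that needs a note saying so. Yogh is deliberately left alone, since whether it becomes g or y depends on the word.

```python
import unicodedata

# Substitutions described in the post above. Yogh is omitted on purpose:
# choosing between g and y needs a human decision.
TRANSLITERATION = {
    "þ": "th", "Þ": "Th",   # thorn
    "ð": "th", "Ð": "Th",   # eth (dh is an alternative some prefer)
    "æ": "ae", "Æ": "Ae",   # ash
    "ƿ": "w",  "Ƿ": "W",    # wynn
}


def transliterate(text):
    """Return a plain-letter approximation of Old English text.

    Diacritics are removed by decomposing each character (NFD) and
    discarding the combining marks, then the special letters are replaced.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return "".join(TRANSLITERATION.get(ch, ch) for ch in without_marks)


if __name__ == "__main__":
    for word in ["hǽðen", "hêðin", "heiðinn", "haiþnô"]:
        print(word, "->", transliterate(word))
```

Because accents are stripped before the letters are swapped, an accented ash is handled by the same table entry as a plain one. The same stripping would also remove accents from Greek, which is otherwise left untouched, so this is only suited to the Latin-script parts of an etymology.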

Unicode substitution is the best solution from the perspective of accurate communication, but it’s the most work and I won’t blame people who don’t want to bother. Kudos to those that do, though.

[Corrected errors pointed out by jheem. Doh! --dw]

[ Edited: 28 June 2008 06:21 AM by Dave Wilton ]

Posted: 28 June 2008 06:14 AM   [ # 6 ]
Total Posts:  407
Joined  2007-02-14

Thorn and ash can become th

A small correction: thorn {þ} and eth (or edh) {ð} become th (and optionally the latter can become dh); ash {æ} becomes ae.

Posted: 28 June 2008 08:28 AM   [ # 7 ]
Total Posts:  114
Joined  2008-04-24

I’ve just done a search on Unicode in this site, and in the Test discussion section is a thread called OED Letters: Junicode, where Dave has listed

Wynn:  ƿǷᚹ

Thorn: Þþᚦ

Eth:  Ððᚧᛟ

Ash:  æÆ

It might help if these and others often needed to replace missing OED letters were listed somewhere in a Sticky topic.  It would then be very easy to copy and paste, at least for those of us who can access Unicode.
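
If such a reference list were put together, it could be generated rather than typed. The following is a small Python sketch that prints each character from the list above with its code point and official Unicode name. The names REPLACEMENTS, describe and reference_lines are made up for the example, and the character groupings are simply copied from the list as quoted, not checked against any authority.

```python
import unicodedata

# Characters as quoted from the "OED Letters: Junicode" thread.
REPLACEMENTS = {
    "Wynn": "ƿǷᚹ",
    "Thorn": "Þþᚦ",
    "Eth": "Ððᚧᛟ",
    "Ash": "æÆ",
}


def describe(ch):
    """Format one character as 'U+XXXX <char> <UNICODE NAME>'."""
    return f"U+{ord(ch):04X} {ch} {unicodedata.name(ch, 'UNNAMED')}"


def reference_lines(table=REPLACEMENTS):
    """Build one line per character, prefixed with the label it is filed under."""
    lines = []
    for label, chars in table.items():
        for ch in chars:
            lines.append(f"{label}: {describe(ch)}")
    return lines


if __name__ == "__main__":
    print("\n".join(reference_lines()))
```

Printing the official names is also a cheap way to double-check that each character really is the letter it has been filed under before it goes into a Sticky.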

Posted: 25 August 2009 11:29 PM   [ # 8 ]
Total Posts:  1
Joined  2009-08-25

Hello, I hate to bump an old topic like this, but I came across this thread on Google, and I happen to have written a script a while back to solve this exact problem. It’s available at http://userscripts.org/scripts/show/2428 and requires the Firefox browser with the Greasemonkey add-on.

Unfortunately, I am aware of no publicly available list of the {} codes, so I have compiled a list by hand based on my day-to-day OED usage which now seems fairly comprehensive, although it is entirely possible that a few have escaped my attention (and they add new characters from time to time as they revise entries).

EDIT: In fact, I now see that “heathen” contains such a character, wouldn’t you know it....

[ Edited: 25 August 2009 11:31 PM by DopefishJustin ]

Posted: 26 August 2009 04:32 AM   [ # 9 ]
Total Posts:  2298
Joined  2007-01-30

I’ve had no further problems with the Greek letters in OED since finding this transliteration site. Just copy and paste into your post the letters you type with the Virtual Greek Keyboard and Bob’s your uncle.

αβγδεζηθικλμνξοπρστυφχψω
