1 of 2
1
HD: Where Do English Words Come From? 
Posted: 30 May 2014 07:13 AM   [ Ignore ]
Administrator
Total Posts:  4687
Joined  2007-01-03

Kind of goes to the heart of this site.

 
Posted: 30 May 2014 04:00 PM   [ Ignore ]   [ # 1 ]
Total Posts:  3053
Joined  2007-02-26
Dave Wilton - 30 May 2014 07:13 AM

Kind of goes to the heart of this site.

Very interesting, thanks for this.

The chart here represents the sources of English words as a percentage of the total new entries the OED has for that particular century. By using percentages, I correct for the bias the OED has for drawing upon sources from certain centuries. For example, since the OED was primarily compiled in the nineteenth century, the dictionary has far more new words from that period than any other—almost as many as the seventeenth and eighteenth centuries combined, and more than twice as many as the twentieth. Using percentages allows comparisons across the centuries.

I get the point, but surely there are centuries during which English was more, uh, logogenetically productive than others. In trying to eliminate sampling bias, might you not have introduced a distortion?

EDIT: BTW, pretty amazing that you are able to analyse the whole OED like this. I take it that you have it on your own disk, rather than just having online access?

[ Edited: 30 May 2014 04:15 PM by OP Tipping ]
 
Posted: 31 May 2014 03:32 AM   [ Ignore ]   [ # 2 ]
Administrator
Total Posts:  4687
Joined  2007-01-03

I get the point, but surely there are centuries during which English was more, uh, logogenetically productive than others. In trying to eliminate sampling bias, might you not have introduced a distortion?

There are two types of bias. One is editorial, with the editors focusing more on one century than another. We see this, for example, in the decline in entries from the twentieth century. Since the bulk of the OED was compiled in the nineteenth century, they’re still playing catch-up with the twentieth, and thus there are fewer entries for terms coined after 1900. It’s not that the twentieth century was less productive, it’s just that the corpus is skewed. The second is a genuine difference in the rate of new words; I suppose this is what you mean by “logogenetic.” Any comparison of absolute numbers will include both biases; you can’t do meaningful comparisons because you can’t tell, for example, whether a decline in Latin coinages results from Latin being less popular or from fewer new words overall. Looking at percentages eliminates both. It doesn’t introduce any new bias, but it means the percentages don’t tell you anything about century-by-century productivity differences.
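
To make the percentage approach concrete, here is a minimal sketch. The counts are invented for illustration and are not OED figures; the names `counts`, `shares`, and `percent` are hypothetical. It shows that a language's share within a century is unaffected by how many entries that century has overall.

```python
# Hypothetical counts of new entries by source language and century.
# These numbers are made up for illustration; they are not OED data.
counts = {
    "13th": {"French": 350, "Latin": 150, "Germanic": 500},
    "19th": {"French": 700, "Latin": 2100, "Germanic": 4200},
}


def shares(by_language):
    """Return each language's share of one century's new entries, in percent."""
    total = sum(by_language.values())
    return {lang: 100.0 * n / total for lang, n in by_language.items()}


# Percentages are computed within each century, so a century with many
# more entries overall does not dominate the comparison.
percent = {century: shares(langs) for century, langs in counts.items()}

for century, langs in percent.items():
    print(century, {lang: round(p, 1) for lang, p in langs.items()})
```

In this made-up sample the nineteenth century has twice as many French entries as the thirteenth in absolute terms, yet a much smaller French share. That is the kind of distinction absolute numbers would hide.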

For the two later charts, I’ve normalized the data by multiplying the absolute number of entries for a given language in a given century by a factor (the total number of entries divided by the number of entries for that century). That should eliminate the editorial bias, but it may skew the numbers in other ways. I believe it works pretty well for the big languages with lots of entries (i.e., with big numbers the different biases all come out in the wash). It probably does skew the numbers for languages that contribute relatively few words. But since I’ve grouped these languages into larger groups, I think that problem is taken care of.
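
The normalization described above can be sketched as follows. This assumes that "the total number of entries" means the grand total across all centuries in the sample, which is one reading of the description and may not match the actual spreadsheets. The counts are invented, and the names `counts`, `normalize`, and `normalized` are hypothetical.

```python
# Hypothetical counts of new entries by source language and century.
# These numbers are made up for illustration; they are not OED data.
counts = {
    "13th": {"French": 350, "Latin": 150, "Germanic": 500},
    "19th": {"French": 700, "Latin": 2100, "Germanic": 4200},
}

# Assumption: "total number of entries" is the grand total over all centuries.
grand_total = sum(sum(langs.values()) for langs in counts.values())


def normalize(by_language, grand_total):
    """Scale one century's raw counts by (grand total / century total)."""
    century_total = sum(by_language.values())
    factor = grand_total / century_total
    return {lang: n * factor for lang, n in by_language.items()}


normalized = {
    century: normalize(langs, grand_total) for century, langs in counts.items()
}

for century, langs in normalized.items():
    print(century, {lang: round(v) for lang, v in langs.items()})
```

Under this reading every century ends up with the same normalized total, so each normalized figure is simply that language's percentage share rescaled to a common count. It also shows why small languages are fragile: a handful of entries in a thinly sampled century gets multiplied by a large factor.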

And all of this assumes the OED’s sampling is a good one. Again, that’s probably the case when you’re working with large numbers, but it gets problematic when dealing with smaller ones.

I don’t think the OED can be used reliably to determine the relative overall productivity of coinages between the centuries (e.g., are we coining more words now than in centuries past). I’ve mentioned the twentieth-century issue, but there are others. The publication of dictionaries and glossaries can artificially inflate the numbers for a given year, assigning many old words to a later year. Reliance on relatively few manuscripts from the medieval period introduces issues. The introduction of the printing press caused an explosion in the number of books published in the sixteenth century. Etc.

BTW, pretty amazing that you are able to analyse the whole OED like this. I take it that you have it on your own disk, rather than just having online access?

Nope, I just have online access. I used the advanced search feature. It took some time, but wasn’t especially difficult. That’s why I provided the spreadsheets, so others who wanted the actual numbers wouldn’t have to put in the hours to get them. (This whole effort was really just a way to procrastinate getting started on my next dissertation chapter.)

 
Posted: 31 May 2014 04:23 AM   [ Ignore ]   [ # 3 ]
Total Posts:  3053
Joined  2007-02-26

Thanks again for the extra info.

 
Posted: 31 May 2014 06:12 AM   [ Ignore ]   [ # 4 ]
Total Posts:  3466
Joined  2007-01-29

The final contributor is non-language sources, things like personal and place names, acronyms, echoic words, and the like, including the infamous “origin unknown.”

I know what you mean, but there must be a better term than “non-language,” since all the sources you mention are in fact part of language.

 
Posted: 31 May 2014 07:17 AM   [ Ignore ]   [ # 5 ]
Administrator
Total Posts:  4687
Joined  2007-01-03

Yeah. The OED labels these as “other sources,” but that doesn’t clearly indicate that it means non-dialectal sources. I thought of using “non-dialectal,” but then thought that those not familiar with the linguistic sense of “dialect” might misinterpret it.

 
Posted: 31 May 2014 04:36 PM   [ Ignore ]   [ # 6 ]
Total Posts:  156
Joined  2007-02-15

Fascinating stuff mate! There’s a lot to take in there; it’s a nice insight into the sticks and stones of this site, as you say. Will have to have a wee look at that when gauging origins in future!

The thirteenth century, some two hundred years after the arrival of the Normans, was the peak of borrowing from French, with over a third of new words coming from French roots

Interesting, that 200-year hiatus from introduction in the new possession to the height of word-borrowing potential after the fact. Could we call that tipping point the zenith of 13th C middle class social mobility awareness? After that, French was just so 12th C don’t you cnaw?

Instead of using a percentage, I’ve normalized the raw numbers to eliminate the sampling bias

I love it when you talk technical. But I do appreciate the approach.

 
Posted: 31 May 2014 06:32 PM   [ Ignore ]   [ # 7 ]
Total Posts:  1263
Joined  2007-03-21

The exception is German, which starting in the eighteenth century begins to increase its contribution to English vocabulary, reaching 3% of new words all by itself, and nosing ahead of French by the twentieth century.

My favorite is Schadenfreude. The OED gives its earliest citation as 1852 but the word was obviously in use before that.

[1852 R. C. Trench Study of Words (ed. 3) ii. 29 What a fearful thing is it that any language should have a word expressive of the pleasure which men feel at the calamities of others; for the existence of the word bears testimony to the existence of the thing. And yet in more than one such a word is found… In the Greek ἐπιχαιρεκακία, in the German, ‘Schadenfreude’.]

I once used the word in a Facebook post and a German friend wrote, “Why do you use a German word for that?” I responded, “Because we don’t have one.”

 
Posted: 01 June 2014 05:27 AM   [ Ignore ]   [ # 8 ]
Total Posts:  3466
Joined  2007-01-29

You should have responded “The same reason you use the ‘English’ word Handy, except at least we borrowed a real word.”

 
Posted: 01 June 2014 06:36 AM   [ Ignore ]   [ # 9 ]
Total Posts:  3053
Joined  2007-02-26

I reckon English got the better deal out of that swap.

 
Posted: 01 June 2014 11:40 AM   [ Ignore ]   [ # 10 ]
Administrator
Total Posts:  4687
Joined  2007-01-03

Philip Durkin wrote a blog post on this same subject a few months back for the Oxford Dictionaries blog. The blog post also has an interactive graphic. Durkin also has a book out on the subject. I was aware of neither until just now.

[ Edited: 01 June 2014 11:45 AM by Dave Wilton ]
 
Posted: 01 June 2014 11:46 AM   [ Ignore ]   [ # 11 ]
Total Posts:  1263
Joined  2007-03-21
languagehat - 01 June 2014 05:27 AM

You should have responded “The same reason you use the ‘English’ word Handy, except at least we borrowed a real word.”

Esprit de l’escalier! Perfect.

 
Posted: 01 June 2014 12:56 PM   [ Ignore ]   [ # 12 ]
Total Posts:  2312
Joined  2007-01-30
Oecolampadius - 01 June 2014 11:46 AM

languagehat - 01 June 2014 05:27 AM
You should have responded “The same reason you use the ‘English’ word Handy, except at least we borrowed a real word.”

Esprit de l’escalier! Perfect.

That’s been my favourite expression for years. It’s the perfect way of describing those “That’s what I should have said!” moments, and God knows I get enough of those. I think it’s Diderot’s but I won’t swear to it.

 
Posted: 02 June 2014 03:43 AM   [ Ignore ]   [ # 13 ]
Administrator
Total Posts:  4687
Joined  2007-01-03

Yes, it’s Diderot, in his 1773 Paradoxe sur le comédien.

 
Posted: 03 June 2014 03:31 PM   [ Ignore ]   [ # 14 ]
Total Posts:  174
Joined  2013-10-14

The thirteenth century, some two hundred years after the arrival of the Normans, was the peak of borrowing from French, with over a third of new words coming from French roots…

Very interesting, even though I was not able to enlarge the chart (I tried zooming in, but the chart would not enlarge), so I was unable to read the print or the numbers.

Many words borrowed from French, German, Italian, Dutch etc. were in turn borrowed from Latin or Greek. This seems to indicate that the main source is invariably Latin or Greek.
For example: source is an English word derived from Old French sourse, which in turn comes from Latin surgere.

Madam comes from Old French ma dame, from Latin mea domina.

Therefore, it seems that the true origin of English words is Latin or Greek and all the other languages are just intermediaries.

I believe that the majority of English words draw from Latin or Greek, with Latin being predominant. Is this an inaccurate interpretation?
I’m just fishing for edification; if I’m completely off base I’d appreciate clarification.

 
Posted: 03 June 2014 08:31 PM   [ Ignore ]   [ # 15 ]
Administrator
Total Posts:  4687
Joined  2007-01-03

I believe that the majority of English words draw from Latin or Greek, with Latin being predominant. Is this an inaccurate interpretation?

The majority of English words come from Germanic roots, not Latin.

Many words borrowed from French, German, Italian, Dutch etc. were in turn borrowed from Latin or Greek. 

German and Dutch words are not likely to be from Latin, although words borrowed into English direct from German aren’t many in the scheme of things. (Note the difference between Germanic, which denotes the larger language family that includes English, Old English, Dutch, and German, and German, which refers to the modern language spoken between the Rhine and the Oder.)

French and Italian words are predominantly from Latin. In fact, you can consider French and Italian to be “modern Latin.”

Therefore, it seems that the true origin of English words is Latin or Greek and all the other languages are just intermediaries.

I’m not sure what a “true origin” is, unless the phrase is being used to denote a correct, as opposed to incorrect, etymology. Latin is just an “intermediary” for an even older language, just as Old English is an intermediary between modern English and proto-Germanic. It’s turtles all the way down, and where you stop tracing is an arbitrary decision. (Although after a bit the evidence dries up. We really can’t trace words further than Old English or Latin. We can figure out a pretty good approximation of what proto-Indo-European words were, but the reconstructed PIE roots probably don’t represent a real language; they somewhat resemble the ancestor languages (note the plural; there was probably not a single dialect of PIE). There are no reliable guesses as to what ancestors of the PIE languages might have looked like.)

But that being said, there are really two different questions. Where does English borrow its words from? And what proportion of English words stem from the various major language families?

[Edited to correct typos--dw]

[ Edited: 04 June 2014 08:37 AM by Dave Wilton ]