How Words Make It Into "The Dictionary"

10 March 2006

A question that I get asked with some frequency is: "I’ve invented a new word. How do I go about getting it in ‘the dictionary’?"

The questioner wants to get credit and fame for coining a term. Almost invariably, I have to disappoint them; their word usually has little chance of ever making it into a dictionary.

So how does a word get into the dictionary? The simple answer is that the editors of a particular dictionary must deem it worthy of inclusion. There is no organization that officially admits words into the English language. Our language is a very democratic one and we admit any word that people actually use. But whether or not a word makes it into a dictionary is another question and the lexicographers of each dictionary are the gatekeepers for entry into that work.

So what are the criteria that lexicographers use to determine if a word is "worthy" of inclusion in their dictionary? The criteria vary from dictionary to dictionary, but generally are something like the following:

Is the term used by a number of different people? If the term is in widespread use, it will probably be included. Words that are rarely used are less likely to be included. Exceptions might be made for rarely used words that appear in significant works of literature.

Has the term been used for an extended period of time? Many (perhaps most) words are invented and then die a quick death. Dictionaries are less likely to include newly coined words because chances are they will be gone by the time the dictionary is actually published. Lexicographers will wait a few years to see if the word is still around when the next edition is being prepared.

Does the term appear in significant works of literature? If the word is used by a famous writer, it is more likely to be included. There are many examples of words appearing in the dictionary that haven’t been used by anyone except a single, famous writer. This isn’t snobbery, but rather practicality. If it is used by a famous writer, there are bound to be people who read the work in question and look up the word to see what it means.

Is the term relevant to the dictionary’s audience? Believe it or not, dictionaries are usually aimed at particular audiences. Desktop and, even more so, paperback dictionaries are intended for people who need to look up a term quickly. These tomes will tend to include current, commonly used words over archaic, seldom found ones. Larger dictionaries tend to include more obscure terms. Some dictionaries have a distinct literary bent. Others seek to document slang, jargon, or regional uses. American dictionaries, as one might expect, tend to focus on American usages, British on British ones, Australian dictionaries on terms from Down Under, etc.

Is the term highly technical or used only by a specific subset of the population? If it is, it is less likely to be included, unless, of course, the dictionary specializes in the jargon of the group in question. Medical dictionaries, for example, include lots of terms used by doctors that do not appear in general dictionaries.

Is the term a proper name? Some dictionaries omit proper names.

Is there room to include it? Finally, lexicographers must make a judgment call on how the word ranks with other candidates for inclusion. There is never enough room to include every word, so some words don’t make the cut because they are deemed less important than others. Even online dictionaries, which are not constrained by space limitations, do have limits on time. The lexicographers simply don’t have time to research every word they come across and so they prioritize.

So can you suggest a word for inclusion? The answer for some dictionaries is yes. The best dictionaries actively seek help from the public, but only certain types of help are wanted. First, check a number of different dictionaries, including, if you have access, the Oxford English Dictionary. If the term appears in one or more, there is little point in submitting it to those that don’t include it. Lexicographers do check rival dictionaries to see what is included there. If the term is not in one dictionary, it was probably omitted because of editorial policy or the editors know about it and it will be in the next edition.

And lobbying for a word’s inclusion will not help--in fact, it will probably hurt its chances for inclusion as the editors will likely dismiss you as a crank. The best bet is to send the term to the dictionary, accompanied by verifiable citations of it actually being used, and just see what happens. The citations should cover a variety of sources and come from a period covering several years at least. Include the term, quotes from a number of sources that use the term, and enough bibliographic information for each source so that the editors can find it easily.

And what is a verifiable source? Simply put, the lexicographers are not going to take your word for it. They want to be able to see where and how the word is being used for themselves. That means the word has to be published, either in print, on the web, or in some recorded media that is accessible to researchers. What about oral sources? The trouble is that oral uses are nearly impossible to verify. You might include one or two, so long as you provide a specific date, the speaker, and the location and as long as there are other, written citations as well. Please, don’t just say "my grandmother used this all the time" or "my best friend said this thirty years ago in high school."

So where do you send this information? To contribute to the Oxford English Dictionary, consult http://www.oed.com/readers/research.html for submission criteria and contact information. Merriam-Webster maintains an Open Dictionary online that you can submit entries to. It’s at http://www3.merriam-webster.com/opendictionary/index.php. And the folks that created Wikipedia also run the Wiktionary and you can craft entries for that yourself at http://en.wiktionary.org/wiki/Main_Page. Keep in mind that The Open Dictionary and the Wiktionary are not scholarly works, although professional lexicographers do consult them for ideas and evidence. Most of what appears in these open dictionaries is junk, but there are some nuggets of gold to be found there.

If a word is in the dictionary, does it do any good to send in a citation? Yes, if the citation uses the term in a way different than the definition given or if it comes from a particular period. Lexicographers write their definitions based on how a word is used, so a new or different sense of the word can lead to a secondary definition. Also, if the word is used earlier or later than it has otherwise been known to be, or during a period where the dictionary is lacking citations, then this can be helpful. Finally, evidence of a regional or jargon term being used outside its normal habitat can be helpful.

Finally, will you get credit? Well, if you submit to Merriam-Webster’s Open Dictionary or the Wiktionary, you can see your submission on the web. But that’s about it. If you are a very active contributor to the OED, you may get a mention on the contributors list, but you won’t get public credit for a specific submission. You don’t do this kind of thing for credit, but rather for the love of language.

Navajo Code Talkers

3 March 2006

A recent posting on the wordorigins.org discussion forum discussed the Navajo Code Talkers of World War II. The subject combines two interests of mine, languages and the history of the war, and it also hearkens back to my Army days when I was in charge of communications security for my battalion.

The Navajo Code Talkers were human encryption/decryption machines that served with the US Marine Corps in the Pacific Theater. In 1942, a Philip Johnston, the son of missionaries to the Navajo Indian nation and one of a handful of non-Navajos who spoke the language, heard of the military’s search for a robust code that could be used on the battlefield and thought that the Navajo might have a solution.

The Marine Corps’ dilemma was that any code that was difficult to break took a long time to encrypt and decrypt. There were many code systems that worked well for strategic communications, where one had the luxury of time to translate into and out of the code, but in the heat of battle time was a commodity in short supply. If one tried to send a coded message, it would probably not be understood in time by the person at the other end. And if one sent a message "in the clear," the message would get through quickly to the intended recipient, but also to the enemy.

Johnston realized that the Navajo language would suit the Marine Corps’ need perfectly. Navajo is an Athapaskan language, closely related to Apache. In the 1940s, it was also an unwritten language, which meant that one had to travel to Navajo Indian reservations to learn it from a native speaker. (Since the 1940s, a Navajo alphabet has been developed and it is now a written language as well as a spoken one.) In the early 1940s there were only about 50,000 native Navajo speakers (today there are about 150,000) and only about thirty non-Navajos who spoke the language. And none of these were Japanese.

Johnston convinced the Marine Corps to conduct a test of Navajo as a "code." Under simulated battle conditions, Navajo recruits translated a three-line message from English into Navajo, transmitted it, and translated it back into English in about 20 seconds. It took an encryption machine thirty minutes to do the same task. The Marine Corps was sold and the first group of twenty-nine Code Talkers began training in the spring of 1942.

The Navajo code used by the Code Talkers was quite different from the Navajo language. The code used English grammar and syntax but Navajo vocabulary. Navajo words were used in two fashions. One was as a simple cipher for English letters. The second was as the basis for a limited glossary of code words.

The Code Talkers created this glossary consisting of about 450 commonly used English words using Navajo equivalents. Where no Navajo word existed, a common occurrence for many military jargon terms, a word would be created out of Navajo words. A fighter plane, for example, was a da-he-tih-hi, or hummingbird. A bomber was a jay-sho, or buzzard and its bombs were a-ye-shi, or eggs.

If a word was not in the glossary, it would be transmitted as a cipher. Each English letter had several (usually three) Navajo words that could be used to represent it. The letter A could be represented by the Navajo words wol-la-chee (ant), be-la-sana (apple), or tse-nill (axe). B could be na-hash-chid (badger), shush (bear), or toish-jeh (barrel). And so on.
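For readers who like to see a scheme spelled out, here is a rough sketch in Python of the two-tier idea described above: look a term up in the glossary first, and fall back to spelling it letter by letter, picking any one of the several code words for each letter. It is only an illustration. The tables contain just the handful of code words quoted in this article (so only A and B can be spelled), and the function names encode and decode are my own inventions, not anything the Code Talkers used.

```python
import random

# Glossary entries quoted in this article (English term -> code word).
# The real glossary had about 450 entries; this is a tiny illustrative subset.
GLOSSARY = {
    "fighter plane": "da-he-tih-hi",
    "bomber": "jay-sho",
    "bombs": "a-ye-shi",
}

# Letter cipher entries quoted in this article. Only A and B are given there,
# so this partial table cannot spell words that use other letters.
LETTERS = {
    "a": ["wol-la-chee", "be-la-sana", "tse-nill"],
    "b": ["na-hash-chid", "shush", "toish-jeh"],
}


def encode(term, rng=random):
    """Return the list of code words for one English term."""
    key = term.lower()
    # Tier 1: a glossary term is sent as a single code word.
    if key in GLOSSARY:
        return [GLOSSARY[key]]
    # Tier 2: spell the term out, choosing any code word for each letter.
    unknown = sorted({ch for ch in key if ch not in LETTERS})
    if unknown:
        raise KeyError(f"no cipher words in this partial table for: {unknown}")
    return [rng.choice(LETTERS[ch]) for ch in key]


def decode(code_words):
    """Map each code word back to a glossary term or a single letter."""
    glossary_back = {code: term for term, code in GLOSSARY.items()}
    letters_back = {
        code: letter for letter, codes in LETTERS.items() for code in codes
    }
    decoded = []
    for word in code_words:
        if word in glossary_back:
            decoded.append(glossary_back[word])
        else:
            decoded.append(letters_back[word])
    return decoded
```

Because each letter has several possible code words, the same spelled-out word can be sent in different ways on different occasions, which makes simple frequency counting less useful to an eavesdropper.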

The Code Talkers could be rather creative in choosing the translations for the glossary. The word for America was ne-he-mah, meaning our mother. Alaska was beh-hga, or with winter. Germany, a word that was probably not used too often by the Code Talkers in the Pacific, was besh-be-cha-he, or iron hat. And in that less politically correct time, Japan was beh-na-ali-tsosie, or slant eye.

Planes were named for types of birds. We’ve mentioned fighter planes and bombers, but a dive bomber was known as gini, or chicken hawk, and a torpedo plane was tas-chizzie, or swallow. Similarly, ships were named for aquatic creatures. A battleship was a lo-tso, or whale, and a submarine was a besh-lo, or iron fish.

The translations could be literal: artillery was be-al-doh-tso-lani, meaning many big guns. They could be metaphorical: a fortification was ah-na-sozi, or cliff dwelling, and a grenade was ni-ma-si, or potato. A tank was chay-da-gahi, or tortoise, and a tank destroyer was chay-da-gahi-nail-tsaidi, or tortoise killer. Or they could be plays on the English words: a dispatch, for example, was la-chai-en-seis-be-jay, or dog is patch.

The complete code can be found at http://www.history.navy.mil/faqs/faq61-4.htm.

The Navajo code proved to be especially resilient. The Japanese never broke it. In comparison, many of the other codes used by US forces were broken by the Japanese. Even when the Japanese caught on that Navajo was the basis for the code, it proved unbreakable. They even went so far as to force a Navajo prisoner-of-war, one of twenty captured when the Philippines fell at the start of the war, to listen to recordings of the Code Talkers. While he understood the words, he was unable to deduce the meaning of what was said.

Some 400 Navajos served as Marine Code Talkers during the war. They had a significant impact on the conduct of the war. They transmitted their coded messages efficiently and securely--six Code Talkers with the Fifth Marine Division on Iwo Jima sent and received over 800 messages in the first two days of the battle. Their contribution was considered so important that it remained classified until long after the war was over.

The Code Talkers are an interesting chapter in the history of WWII and of linguistics. With their function now performed by electronic devices, their like will never be seen again.

Linguistics Glossary

24 February 2006

This week we examine some terms that are used in the field of linguistics and on Wordorigins.org. Like any field, linguistics has its own jargon (see below), used to convey information precisely and concisely. Sometimes this jargon is opaque and daunting to those encountering it for the first time. So, in the interests of better communication, we present this glossary of linguistic terms:

accent, a system of pronunciation used by an individual or group.

argot, slang (see below), esp. that of a socially suspect group.

blend, the fusion of two or more words into one, e.g., motel is a blend of motor and hotel; also known as a portmanteau word.

borrowing, the taking of a word or phrase from one language into another, a word or phrase so taken, see loanword.

cant, a jargon (see below) used to mislead outsiders or protect secrets. Cants are ever changing; as the meanings of cant words become widely known, the group adopts new terms.

creole, a blend of two dialects. Creoles are more sophisticated than pidgins (see below), being full dialects in their own right. Creoles are often created from pidgins, when a generation grows to adulthood speaking a pidgin as their native language. Some of these creoles bear the name pidgin, although they are actually creoles, such as Tok Pisin of Papua New Guinea. Creole is also the name of a French-English blend spoken in Louisiana.

derivation, 1) the process by which a word changes over time, e.g., channel derives from the Latin canalem; 2) a process by which complex words are formed from simpler ones, primarily through the addition of affixes, e.g., handily derives from hand.

diachronic linguistics, the study of the history and patterns of change of and in language, see also synchronic linguistics. 19th century linguistics was largely diachronic, but this has ceded ground to synchronic linguistics.

dialect, a system of communication using structured vocal sounds (or in the case of languages for the deaf, physical signs) and which can be embodied in other media, such as writing. Dialect is synonymous with language, especially one characteristic of a particular region, class, or person. Dialects have distinctive accent, grammar, vocabulary, and idiom. In linguistic jargon, a dialect is not distinguishable from a language. Sometimes the term is used to refer to provincial modes of speech that differ from the “standard.”

etymology, the origin and history of a word, the study of the origins of words.

folk etymology, 1) a process of word change where an unfamiliar word is substituted with a familiar one, e.g., cater-corner becomes kitty-corner; 2) a popular and usually incorrect hypothesis of the origin of a term.

generalization, a process of semantic change where the meaning of a term broadens over time, e.g., to sail once meant to travel over water propelled by the wind and now is often used to refer to any travel via water and even to move through any medium smoothly and effortlessly.

grammar, the set of patterns describing the syntax and morphology of a dialect. Grammar can be implicit and innate, or formal and written. The latter tends to be a subset of the former, consisting of only those patterns and rules that need to be stated for instructive purposes.

idiom, an expression whose sense and usage is not predictable by the normal rules of grammar, syntax, or semantics. An example of an English idiom is neck of the woods, meaning a particular locale. Use of the expression is not particularly related to forested regions or to narrow strips of land (necks). Idioms are usually fixed grammatically; one cannot, for example, refer to the woods’ neck.

inflection, the grammatical form of a word, varying in pronunciation, spelling, or by the addition or deletion of affixes. Going and gone are both inflections of the verb to go. English has relatively few inflections. Other languages may have many more.

jargon, a specialized vocabulary, especially that of a trade, profession, or other activity.

language, a dialect. The concept of languages like English, French, or Chinese is a political/social construct rather than a linguistic one. For example, Norwegian and Danish are mutually intelligible (for the most part), yet they are popularly considered separate languages, while Mandarin and Cantonese are not intelligible to one another, yet both are considered dialects of Chinese. It is often said, “a language is a dialect with an army and a navy.”

loanword, a word that has been taken from another language, see borrowing.

morpheme, the fundamental structural unit of a word. The word dogs, for example, consists of two morphemes, dog and s. There are various morphological systems used in linguistics and not all are consistent.

onomastics, the study of proper names.

orthography, spelling.

phoneme, a discrete sound used in combination with others to pronounce words. The word tooth, for example, consists of three phonemes, the initial consonant t, the oo vowel sound, and the th sound at the end.

phonetic, relating to the vocalization of speech.

pidgin, a contact dialect or lingua franca formed from one or more dialects, usually containing a simplified grammar and a limited, polyglot vocabulary. Pidgins form where there is a need for communication, but no common tongue, frequently in trading ports or similar venues.

portmanteau word, a blend. A portmanteau is a suitcase that opens like a book, and the linguistic term comes to us from Lewis Carroll’s Through The Looking Glass, where Humpty Dumpty says to Alice, "Well, slithy means lithe and slimy... You see it’s like a portmanteau—there are two meanings packed up into one word."

semantic, relating to the meanings of words and phrases. Semantics is the study of such meanings.

semiotics, the study of signs and symbols.

slang, an informal, non-technical vocabulary consisting chiefly of synonyms for standard words and phrases. Slang is usually, but not always, ephemeral.

specialization, a process of semantic change where the meaning of a term narrows in scope, e.g., hound once meant any dog, but has shifted to refer to hunting dogs that pursue live prey.

synchronic linguistics, the study of the state of language at a given time, see diachronic linguistics. 20th and 21st century linguistics has been primarily synchronic.

syntax, the order of words in a phrase, the permissible combinations of words in a dialect and the rules by which they combine. In English, rules of syntax have largely replaced inflectional grammar. Other languages have more flexible syntax, but more rigorous inflection.

usage, how the elements of language are customarily used to produce meaning. Usage includes grammar, semantics, syntax, accent, punctuation, orthography, and idiom.

word, the fundamental unit of a dialect, a vocalization (or sign) with a discernable meaning.

Corrections

17 February 2006

I received more email comments about last week’s issue than about any other. Most focused on two typos: I misspelled Velcro and who. While I am an excellent proofreader of other people’s writing, like most writers I am abysmal at proofing my own words.

One writer objected to my classification of the words as from the 20th century, stating that the 20th century ran from 1901-2000, not 1900-1999. While this hairsplitting may be technically correct, common usage has the century (and millennium) ending on 31 December 1999. But the real reason for my choice of 1900-1999 as the dates is that the OED has no words with a first citation from 2000, so rather than leaving a blank in the slot for that year I shifted the set one year back.

Word a Year: 20th Century, Part 2

17 February 2006

Last week we examined fifty words, one from each of the years 1950-99. This week we look at words from the first half of the twentieth century.

The words chosen all have their first citation in the Oxford English Dictionary from the year in question. This does not mean that they were actually coined in that year; in fact, most probably were not, since it usually takes some time between the coining of a term and its appearance in print, and there is no guarantee that the OED has even identified the earliest recorded use. But the words were reasonably new to the English language in the year in question and as such are a good guide for tripping down memory lane and recalling what things were new and important in a given year.

For most years, the OED offers several hundred words to choose from. The words here were not selected in any scientific or systematic way. They are simply words that stood out as either representative of that year or just because I found them interesting for some reason or another. Some were obvious choices. What would 1925 be without monkey trial, for example? Others were significant in the year they were coined and remain so today. Big Brother was coined in 1949, with Stalinism dominating Russia and Maoism having just taken over China, but it still resonates today with the current news reports of warrantless NSA wiretaps of American citizens. Others, like Movietone, are only historical artifacts today. Some appear because I was surprised how late (or early) they appeared. I would have, for example, thought sexy was around long before 1928 and I would have thought gigolo was more recent than 1922. Others were chosen just because I like them and it’s a good excuse to include them in A Way With Words, like wanderlust.

So, without further ado, the words of 1900-49:

1949, Big Brother, n., omnipotent state authority, esp. one that spies on its citizens, from Orwell’s 1984

1948, dim sum, n., a Cantonese-style savory snack, a meal consisting of these

1947, jet stream, n., a strong wind in the upper troposphere, predominantly blowing from west to east

1946, on-again off-again, adj., vacillating, intermittent

1945, fissionable, adj., capable of undergoing and sustaining nuclear fission

1944, genocide, n., the deliberate extermination of a people or ethnic group

1943, acronym, n., a word formed by the initial letters of other words

1942, zoot suit, n., a style of man’s suit characterized by a long jacket with padded shoulders and high-waisted, tapering trousers

1941, paratrooper, n., a soldier trained to leave a perfectly good airplane via parachute

1940, blitz, n., a military attack launched with suddenness and great violence, esp. an aerial bombing attack, as the Blitz referring to the attacks on London in that year

1939, walkie-talkie, n., a hand-held, two-way radio

1938, nylon, n. & adj., type of synthetic fabric, denoting something made from nylon

1937, yeti, n., Sherpa name for a mythical ape-like mammal that dwells in the Himalayan mountains, the abominable snowman

1936, racism, n., the belief that characteristics and abilities are determined by race and that one race is superior to others

1935, testosterone, n., a steroid hormone responsible for the development of male secondary sexual characteristics

1934, audio, n., sound, esp. recorded or electronically transmitted sound

1933, V.I.P., abbrev., very important person

1932, bagel, n., a hard, ring-shaped, piece of bread

1931, black market, n., illegal trading in commodities that are illegal or in short supply

1930, Third Reich, n., the German state under Hitler’s rule, speculative until 1933 when it became a reality

1929, delist, v., to remove a security from the list of those traded by an exchange

1928, sexy, adj., concerned with sex, sexually attractive

1927, Movietone, n., brand name for a method of recording sound on a film negative, used chiefly in newsreels

1926, totalitarian, adj., pertaining to a system of government where all institutions are subordinated to the state

1925, monkey trial, n., Tennessee v. Scopes, the trial of a high school teacher for teaching evolution by natural selection in the public schools

1924, two-time, v., to deceive, to be unfaithful to a lover

1923, Houdini, adj. & v., descriptive of an escape or disappearance, to escape or disappear, after the stage name of magician Erich Weiss (1874-1926)

1922, gigolo, n., a male escort for a woman, a kept man

1921, Fascist, n. & adj., right-wing Italian nationalists under the leadership of Mussolini who came to power the following year, later applied to the Nazi party in Germany and more widely to anyone with right-wing, authoritarian views

1920, palooka, n., a stupid or loutish person, in boxing, a mediocre fighter

1919, airport, n., a place where passengers embark and disembark airplanes

1918, pogey bait, n., candy, from soldier’s slang

1917, camouflage, n. & v., concealment for military purposes, to so disguise something

1916, tank, n., an armored fighting vehicle

1915, Fritz, n., used to denote something German, esp. a German soldier or soldiers

1914, air-raid, n., an attack by airplane

1913, airmail, n., postal service conveyed by airplane

1912, vigorish, n., a percentage deducted from gambling winnings as payment to the house, a usurious rate of interest

1911, Cubism, n., a style of early 20th century art consisting of a rejection of perspective of a single viewpoint and representations of people and objects using simple geometric shapes, also Cubist, an artist who works in Cubist style

1910, sabotage, n., malicious damage inflicted on property during a labor strike or by military forces

1909, mouth-to-mouth, adj., denoting a method of artificial respiration

1908, Oz, n., spelling variant abbreviation of Australia

1907, Wimbledon, n., lawn tennis tournament played in this district of South London

1906, El, n., an elevated train

1905, Sinn Fein, n., the name of an Irish independence and nationalist movement

1904, heartland, n., the central region of a nation, esp. one that characterizes the nation as a whole

1903, chow mein, n., fried noodles served with a thick sauce or stew

1902, wanderlust, n., a desire or fondness for traveling

1901, Hall of Fame, n., a place where persons in a particular field or institution are commemorated

1900, Zeppelin, n., a dirigible airship