Words On The Web: www.oclc.org\worldcat

1 September 2006

The folks who bring you the Dewey Decimal System, the Online Computer Library Center, or OCLC, have a great catalog search service. By visiting their web site at http:\\www.oclc.org\worldcat, you can enter search terms and search a multitude of library catalogs for a book. You then enter your city or postal code and the Worldcat service will list the libraries that have that book in order of their distance from you.

For example, I enter in Word Myths and Emeryville, CA and I’m told that there are 408 libraries in the Worldcat system that have my book. The closest is the University of California Berkeley, some three miles away, followed by the San Francisco Public Library, across the bay some nine miles away. The farthest is the Singapore Polytechnic Library, half a world away.

This is an invaluable resource when you’re looking for a particularly hard-to-find book.

Classifying Human Knowledge, Part 1

1 September 2006

I’ve spent the last week organizing my library, a task that, surprisingly, has turned out to be quite interesting. In an effort to find a classification scheme that works for me, I’ve been looking at and learning about the various systems in use in libraries around the world.

The most famous is perhaps the Dewey Decimal System. Invented by Melvil Dewey in 1876, it is the most widely used library classification in the United States, used primarily by public and primary school libraries. The DDS divides all human knowledge into ten major divisions. Each of these has ten possible subdivisions, each of those has ten more, and so on. Hence the decimal.

The top level domains are:

  • 000 – Computer science, information, general works

  • 100 – Philosophy and psychology

  • 200 – Religion

  • 300 – Social sciences

  • 400 – Language

  • 500 – Science

  • 600 – Technology

  • 700 – Arts and recreation

  • 800 – Literature

  • 900 – History and geography

In the language category, for example, the subdivisions are:

  • 400 – General

  • 410 – Linguistics

  • 420 – English

  • 430 – Other Germanic languages

  • 440 – French, Provencal, and Catalan

  • 450 – Italian and Romanian

  • 460 – Spanish and Portuguese

  • 470 – Latin

  • 480 – Greek

  • 490 – Other languages

English, again for example, is broken into:

  • 421 – Writing system and phonology

  • 422 – Etymology

  • 423 – Dictionaries

  • 424 – Not used

  • 425 – Grammar

  • 426 – Not used

  • 427 – Language variations (dialects and slang)

  • 428 – Usage

  • 429 – Old English

The same numbers are used across the various categories to denote similar subdivisions. So 432 is German etymology and 482 is Greek etymology.

These categories can be further extended by numbers following a decimal point to further classify the work. The number .73, for example, denotes the United States. So the call number 427.73 is a book about American dialect. This consistent use of the same numerical combinations across all subdivisions (e.g., 973 is history of the United States) makes it easy for those familiar with the system to see how a book is classified.
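
For the programmers among us, here is a minimal sketch in Python of how such a number can be read digit by digit. It uses only the tables quoted above. The function name describe is my own invention, and the assumption that the English third-digit meanings carry over to every language division is a simplification of what the real schedules do.

```python
# Tables copied from the lists above; anything not listed is simply unknown here.
MAIN_CLASSES = {
    "0": "Computer science, information, general works",
    "1": "Philosophy and psychology",
    "2": "Religion",
    "3": "Social sciences",
    "4": "Language",
    "5": "Science",
    "6": "Technology",
    "7": "Arts and recreation",
    "8": "Literature",
    "9": "History and geography",
}
LANGUAGE_DIVISIONS = {
    "0": "General",
    "1": "Linguistics",
    "2": "English",
    "3": "Other Germanic languages",
    "4": "French, Provencal, and Catalan",
    "5": "Italian and Romanian",
    "6": "Spanish and Portuguese",
    "7": "Latin",
    "8": "Greek",
    "9": "Other languages",
}
# Assumption: the third-digit meanings given for English apply across the 400s.
LANGUAGE_SECTIONS = {
    "1": "Writing system and phonology",
    "2": "Etymology",
    "3": "Dictionaries",
    "5": "Grammar",
    "7": "Language variations (dialects and slang)",
    "8": "Usage",
}
REGIONS = {"73": "United States"}  # the only decimal extension mentioned above


def describe(number):
    """Spell out a Dewey number such as '427.73' one level at a time."""
    whole, _, decimals = number.partition(".")
    if len(whole) != 3 or not whole.isdigit():
        raise ValueError("expected three digits before the decimal point")
    parts = [MAIN_CLASSES[whole[0]]]
    if whole[0] == "4":  # only the language class is spelled out in this sketch
        parts.append(LANGUAGE_DIVISIONS[whole[1]])
        if whole[2] != "0":
            parts.append(LANGUAGE_SECTIONS.get(whole[2], "Not used"))
    if decimals in REGIONS:
        parts.append(REGIONS[decimals])
    return " > ".join(parts)


print(describe("427.73"))
print(describe("482"))
```

The point of the sketch is that each digit narrows the previous one, which is why the same final digit can mean etymology under both German and Greek.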

Since there are many different books in these broad categories, the category number is usually followed by a Cutter number (see below) that denotes the author’s name, e.g., T911 is Mark Twain and the category 813 T911 contains fiction by Twain (81 American Literature, 3 Fiction). For prolific authors, like Twain, this Cutter number is often followed by an alphabetic sequence that either represents the title or the order in which the library acquired the book–so that new acquisitions can simply be put at the end of the appropriate shelf. So The Adventures of Huckleberry Finn might have a call number of 813 T911 Ad, or 813 T911 Fi, or, as it is shelved in the Berkeley Public Library, 813 T911zb.

It’s often thought that the Dewey system is for non-fiction only. This erroneous notion arises because many libraries don’t use Dewey to classify fiction. Instead they use the author’s last name alone. This is helpful to general readers who just want to find the book and don’t care if T.S. Eliot is classified as American or British. So Huck Finn is classified as Fic Tw in many libraries. Similarly, biography is often not filed in the Dewey category of 920 and instead a book of Twain’s life is filed under B Tw.

The chief problem with the Dewey Decimal system is that it is very American and European focused. For example, most of the languages of the world are crammed into the 490 category. Arabic, Native American, and Finnish can all be found here. It is kept up to date by the Online Computer Library Center, which owns the rights to the system, with new categories, like computer science, added from time to time. But it is very much captive to a 19th century American view of the relative importance of various classes of knowledge.

An improvement over Dewey is the Universal Decimal Classification or UDC. Invented by Belgian bibliographers Paul Otlet and Henri La Fontaine at the beginning of the 20th century, it is a variation on Dewey’s original system. It is rarely used in the United States, but is the primary system for library classification in Britain and other English-speaking countries and can frequently be found classifying libraries in non-English-speaking countries as well. Like the Dewey system, it is kept up-to-date by a consortium of libraries.

The high level categories are similar to Dewey, except that 4 is not used and language and linguistics are grouped with literature in 8. The subcategories are organized so they are more easily extensible. You can keep adding digits to become more specialized.

The UDC also includes a notation system for denoting the relationship between categories in a book. This is especially powerful.

  • + plus sign, means that the book is about the two categories

  • / slash, means the book covers all the categories between the two numbers given

  • : colon, means the book is concerned with the relationship between the two categories

  • [ ] brackets, combines categories into a single unit

  • = equals sign, denotes the language in which the book is written.

So 31:[622+669](485)=20 is a book of statistics on mining and metallurgy in Sweden that is written in English, that is, Statistics:[Mining+Metallurgy](Sweden)=English.
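
The notation is regular enough that a few lines of Python can expand it. What follows is only a sketch under stated assumptions: the lookup tables hold just the five codes from the example, the function name expand_udc is hypothetical, and the reading of a number in parentheses as a place is inferred from the example rather than from the list of signs above.

```python
import re

# Only the codes that appear in the example; a real table would be far larger.
SUBJECTS = {"31": "Statistics", "622": "Mining", "669": "Metallurgy"}
PLACES = {"485": "Sweden"}      # assumption: (digits) marks a place
LANGUAGES = {"20": "English"}   # =digits marks the language of the book

# A place, a language, a subject number, or one of the connecting signs.
TOKEN = re.compile(r"\((\d+)\)|=(\d+)|(\d+)|([+/:\[\]])")


def expand_udc(notation):
    """Replace each code in a UDC string with its label, keeping the signs."""
    parts = []
    position = 0
    for match in TOKEN.finditer(notation):
        if match.start() != position:
            raise ValueError("unrecognized text at position %d" % position)
        position = match.end()
        place, language, subject, sign = match.groups()
        if place is not None:
            parts.append("(" + PLACES.get(place, place) + ")")
        elif language is not None:
            parts.append("=" + LANGUAGES.get(language, language))
        elif subject is not None:
            parts.append(SUBJECTS.get(subject, subject))
        else:
            parts.append(sign)
    if position != len(notation):
        raise ValueError("unrecognized text at position %d" % position)
    return "".join(parts)


print(expand_udc("31:[622+669](485)=20"))
```

Codes missing from the tables are left as numbers, so a partly known call number still comes out readable.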

A third system is the Cutter Expansive Classification. Invented by Charles Cutter in the 1880s and 1890s for Boston’s Athenaeum library, it is used by only a few libraries, mostly in New England. The top level domains of the Cutter system are:

  • A – General works

  • B-D – Philosophy, psychology, religion

  • E-G – Biography, history, geography

  • H-J – Social sciences, law

  • L-T – Science, technology

  • U-Vs – Military, sports, recreation

  • Vt-W – Theater, music, fine arts

  • X – Philology, language

  • Y – Literature

  • Z – Book arts, bibliography

The Cutter system also denotes the size of the volume in its call number, using points (.), pluses (+), and slashes (/) to denote books of small to large size. This is very useful if over or undersized works are stored separately or for quickly locating books on the shelves.

Cutter also devised an ingenious system for classifying authors’ names. He created tables of two or three digits that stand for the rest of an author’s name after the initial letter. A214, for example, is John Adams. These tables are used in most libraries as the basis of the author’s name portion of call numbers.

Next week: Library of Congress Classification and tags

Words On The Web: LibraryThing.com

25 August 2006

A persistent vexation of mine is not being able to find the book I want. I know it’s on the shelf somewhere, but I just can’t find it. I’ve often spent ten minutes or more tracking down a book. My personal library is large (over 500 books), but it is by no means huge. Another issue is that I occasionally find myself buying multiple copies of a book–I forget what books I already own. I’ve often thought that I can’t be the only one with this problem and that there must be an easy way of organizing my books that someone else has pioneered.

Well, this week I discovered LibraryThing.com. It is a sublime website. Cataloguing a library of some size is never easy, but LibraryThing.com makes it nearly so. So what is LibraryThing?

First, it is a site designed to help you catalog your books. You can enter your books online–usually just a few words from the title and the author’s last name–and hit the search button. LibraryThing will create a catalog entry for you based on the catalog of the Library of Congress, Amazon.com, or any one of several dozen major libraries around the world. It will give you the Library of Congress and Dewey Decimal call numbers, the ISBN, publisher information, etc. In just a few hours I created catalog entries for over half of my books and I expect to be done by Sunday.

The search function works incredibly well. Gone are my days of going to the Library of Congress website to find data on a book. LibraryThing’s search interface is far easier and much faster. LibraryThing does, however, have trouble searching on the classics. Searching on "Dracula, Stoker", for example, turns up several hundred possibilities. These include commentaries on the novel as well as the primary work itself. And there is no easy way to sort the returned entries–a Googlesque problem. But for most books, published in a handful of editions, this is not an issue.

You can also add your own tags to the entries in your collection. So you can tag all your books on quotations, or on slang, or about dogs, or 18th century French poetry. Whatever tags meet your needs.

Your catalog data resides on the LibraryThing servers. (You have the choice of whether to keep it private or make it available for viewing to others.) But you can download it in comma or tab-delimited formats for use by spreadsheets or database programs. There are even features to allow you to check your catalog from a mobile phone. (Useful when standing in the bookstore wondering if you already have a copy of that book you are about to buy.) And for those with large libraries, having a list of all your books offsite will help in reestablishing your collection in case of fire or other disaster.

The second aspect of the site is community. There are discussion forums galore. You can find other users who share the same tastes as you. (I share at least 41 books with another prominent contributor to the Wordorigins discussion forums. Go to the site and try to find him–his nearly 5,000 books put my paltry 500 or so to shame.) Users can contribute reviews and share cataloging schemes. You can get a list of recommended books based on what is in the libraries of readers similar to you.

It’s also fun to look at some of the statistics of the books cataloged. As I write this, there are 71,838 collections, containing 5,087,028 books, of which 1,175,812 are unique works. The most popular author is (no surprise) J.K. Rowling with 37,552 copies of her books in the combined collections. Stephen King is second with 28,824. The Bard rolls in at seventh place with 15,860. The most popular book is Harry Potter and the Half-Blood Prince with 6,047 copies–Harry Potter books occupy the top six positions, all with over 5,000 copies–attesting to the reader loyalty engendered by the series. In seventh is The Da Vinci Code (4,662). And giving some solace to those who are despairing at the lack of "great" books, Orwell’s 1984 takes the eighth spot (3,835). (Ironic, as LibraryThing is the antithesis of Big Brother.) The Catcher In The Rye (3,710) and The Hobbit (3,709) round out the top ten.

The site allows you to catalog up to 200 books for free. You can buy an annual membership for $10 that allows you an unlimited number of books in your catalog. Or a lifetime membership is just $25. So, the fees are quite reasonable and I, for one, was happy to pay to help keep such a site going.

I’m going to be spending the weekend in ontological ecstasy. I’ve already decided to rearrange the books by LC call number, with some variations from the official scheme that make sense for me (such as grouping all my books on toponyms together instead of regionally as the LC does). I haven’t decided whether or not to label the books with the call number. I’ll probably not do so, at least not at first.

Revisiting the Planets, Redux

25 August 2006

The International Astronomical Union (IAU) voted Thursday on a definition of the word planet. The proposed definition we reported on last week was rejected and the IAU defined a planet as a celestial body that

  • is in orbit around the Sun,

  • has sufficient mass for its self-gravity to overcome rigid body forces so that it assumes a hydrostatic equilibrium (nearly round) shape, and

  • has cleared the neighborhood around its orbit.

According to the IAU, this leaves our solar system with eight planets: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. By this definition, Pluto is not a planet because it has not cleared its neighborhood.

The IAU also rejected use of the proposed term pluton, for the class of objects similar to Pluto. That term is also commonly used by geologists for an igneous mass that forms when molten rock cools underground and it was thought that there could be confusion between the geological and astronomical senses–although that doesn’t seem very likely. Context would rule out any chance of confusion in most cases. After all, there isn’t any confusion of the geological and anatomical uses of vein.

More problematic for pluton is that in French and Italian this is the name for the former-planet Pluto. This could cause much confusion between the class and the specific body in those languages.

Instead of pluton, the IAU decided on another linguistically problematic term, dwarf planet, which is defined as a celestial body that:

  • is in orbit around the Sun,

  • has sufficient mass for its self-gravity to overcome rigid body forces so that it assumes a hydrostatic equilibrium (nearly round) shape,

  • has not cleared the neighborhood around its orbit, and

  • is not a satellite.

Benjamin Zimmer over at Language Log has a good discussion as to why this is a questionable form in English. In English compound nouns, the more general term is usually the second noun. Catfish are fish, not felines, and mountain lions are cats, not masses of rock. There are exceptions, like sea lion, but a dwarf star, arguably the most similar term to dwarf planet, is most definitely a star.

But perhaps the most cogent commentary on the subject is by Ruben Bolling who penned this cartoon. The third example is the best.

Planets & Plutons: An Update

18 August 2006

Back in November of last year, I wrote about the International Astronomical Union (IAU) and how planets were named. The IAU is currently meeting and has proposed a definition of planet. (It has not had a formal definition of the term to date.) The organization will vote on the proposal on Thursday. In addition to getting into the lexicographical game by coming up with a definition, the IAU is also proposing a new term, pluton, for Pluto-like objects that orbit the sun beyond Neptune.

The new definition was prompted by the continuing debate over whether or not Pluto should be considered a planet and by the discovery of 2003 UB313, unofficially nicknamed Xena after the warrior princess of television fame, an object much farther away and much larger than Pluto. But the IAU had some surprises when it announced its proposed definition this week.

Under the proposed definition, a planet is an object that

  • has sufficient mass to assume near-circular shape because of its own gravity, and

  • is in orbit around a star, and

  • is not itself a star, and

  • is not a satellite of a planet in the sense of having an orbit that goes around a center of gravity that is located inside a body that is independently a planet.

If the proposal passes, there will (for the moment) be twelve planets, instead of the current nine. And there are several surprises in the new planetary nominees. One item that is not surprising, but was by no means certain, is that Pluto will retain its planetary status. And given this, Xena will also be a planet–no surprise given that it is larger than Pluto. But the other two new planets are the shockers.

One is Ceres, the largest of the asteroids. Ceres, discovered in 1802, was once considered a planet, but had long since been demoted to asteroid status. It is only 930 km in diameter (Pluto is 2,274 km in diameter, Earth is 12,756 km, and the Earth’s Moon is 3,476 km). The new IAU definition of planet includes Ceres because it is a circumsolar object (it orbits the sun directly and not another object) and because it has enough mass to form a sphere.

The second surprise is Charon, Pluto’s moon. Even though it orbits Pluto, and not the sun directly, Charon makes the planetary cut because of the fourth criterion. The barycenter (center of gravity) of the Pluto-Charon system is outside Pluto. (By comparison, the barycenter of the Earth-Moon system is 1,700 km beneath the Earth’s surface.) Pluto-Charon is essentially a dual planet.
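
The barycenter arithmetic is easy to check. The sketch below is a rough calculation, not anything the IAU published: the masses and separations are approximate textbook values that I am assuming, the radii are simply half the diameters quoted above, and the function name barycenter_offset_km is made up for the illustration.

```python
def barycenter_offset_km(primary_mass_kg, secondary_mass_kg, separation_km):
    """Distance from the primary's center to the pair's center of mass."""
    total_mass = primary_mass_kg + secondary_mass_kg
    return separation_km * secondary_mass_kg / total_mass


# Approximate values assumed for illustration; they are not from the article.
EARTH_MASS_KG = 5.972e24
MOON_MASS_KG = 7.342e22
EARTH_MOON_SEPARATION_KM = 384_400

PLUTO_MASS_KG = 1.303e22
CHARON_MASS_KG = 1.586e21
PLUTO_CHARON_SEPARATION_KM = 19_600

# Radii taken as half the diameters quoted in the article.
EARTH_RADIUS_KM = 12_756 / 2
PLUTO_RADIUS_KM = 2_274 / 2

earth_offset_km = barycenter_offset_km(
    EARTH_MASS_KG, MOON_MASS_KG, EARTH_MOON_SEPARATION_KM
)
pluto_offset_km = barycenter_offset_km(
    PLUTO_MASS_KG, CHARON_MASS_KG, PLUTO_CHARON_SEPARATION_KM
)

# A positive depth means the barycenter lies inside the primary body.
earth_depth_km = EARTH_RADIUS_KM - earth_offset_km
pluto_depth_km = PLUTO_RADIUS_KM - pluto_offset_km

print(f"Earth-Moon barycenter depth below Earth's surface: {earth_depth_km:.0f} km")
print(f"Pluto-Charon barycenter depth below Pluto's surface: {pluto_depth_km:.0f} km")
```

With these inputs the Earth-Moon figure comes out at roughly 1,700 km below the surface, while the Pluto-Charon depth is negative, meaning the balance point sits in open space between the two bodies.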

The number of planets is not fixed at twelve and as more trans-Neptunian objects like Pluto and Xena are discovered many will also be so designated. There are several other known objects that might make the grade at a future IAU meeting, including Sedna and Quaoar, which are both larger than Ceres and Charon. Another is 2003 EL61, which is also larger than Charon and Ceres, but which has an elliptical shape rather than round.

The IAU is proposing a name for these icy trans-Neptunian objects, calling them plutons. This is a useful term, although it is also a geological term for an igneous rock formed below the Earth’s surface–there will probably be little or no confusion between the two. Pluton is superior to trans-Neptunian object as it can also be applied to such objects in other star systems, when and if they are discovered.

But what is of interest to us word mavens is not so much which objects receive planetary designations by the IAU, but the IAU’s attempt to define the word planet. First, they are not using the methodology that lexicographers do. They are not surveying usage and deriving a definition from how people use the term. Instead, as is usually the case for technical definitions, they are establishing a rigid definition intended to categorize objects by criteria that are scientifically useful. While this method is perfectly fine for technical definitions, their results are highly questionable.

The first problem with the definition is that of the barycenter codicil that admits Charon to planetary status. The barycenters of planet-moon systems are not fixed, changing over time. The Earth’s moon, for example, is moving away from the Earth by about 4 cm a year. In 40 million years (which may seem long to us but is only a moment in planetary time) the Moon will become a planet when the barycenter of the Earth-Moon system moves above the Earth’s surface. If objects can move in and out of planetary status in relatively short periods, does the categorization tell us anything meaningful?

Another is the question of how round is round? No planet is perfectly spherical. The Earth is somewhat pear-shaped, a bit fatter in the southern hemisphere than in the north. Saturn is noticeably squashed at the poles. What an object is made of and how fast it rotates are as important as mass in whether or not gravitational forces shape it into a sphere. As I have noted, 2003 EL61 is considerably larger than Ceres, but it is egg-shaped due to its rotation and the fact that it is probably made largely of ice instead of rock. This inserts a subjective and arbitrary judgment into what should be objective and meaningful criteria. The key isn’t roundness, it’s mass; in which case the definition should cut out the middle-man and specify a minimum mass.

But the larger question is whether or not any definition of "planet" serves a useful astronomical purpose. When objects are as different as rocks like the Earth, gas giants like Jupiter, and ice balls like Pluto, is there any sense in trying to lump them into a single category? Astronomy might be better served to classify the objects that orbit the sun into terrestrial bodies (big rocks like Mercury, Venus, Earth, Mars, as well as many moons), asteroids (rocks that are too small to be considered in the first category), gas giants (Jupiter, Saturn, Uranus, and Neptune), and plutons (Pluto, Xena, and comets).

Astronomers should take their cues from another discipline. At one time, biologists considered race an important category in studying humans. But over time it became clear that, although the characteristics associated with race were biologically determined, the characteristics could not be used to form any scientifically useful categories. Skin color, for example, tells you nothing about the person except their skin color. Today, no reputable biologist attempts to classify people by race. This is not to say that race is not culturally relevant, but it is meaningless as far as the science of biology is concerned. Biologists have gotten out of the race business, leaving the definition of racial categories to social scientists and lexicographers.

Astronomers should do the same thing with the word planet. They should declare that astronomically, the category is useless. Leave it to the lexicographers to find a definition based on how people (astronomers, schoolchildren, and the general public) use the term. Planets would still exist, but whether an object is defined as a planet should not concern an astronomer. In such a descriptive definition, the planets would consist of the eight classical planets and, depending on who you asked, Pluto and some of the other trans-Neptunian objects.

Little is served by a scientific body giving its imprimatur to a definition that has little or no scientific utility.

See the website Bad Astronomy for an excellent, if opinionated, review of the astronomical issues associated with the IAU’s definition of planet.