This week the New York Times took the unusual step of publishing an anonymous op-ed piece by someone identified as “a senior official in the Trump administration” that was sharply critical of Trump. The writer described the president as incompetent and out of his depth and said that they and other senior administration officials actively worked to keep Trump from making decisions. Needless to say, it was a rather explosive article and speculation about who wrote it began immediately.

One speculative claim, however, is of particular interest to this blog because of its linguistic nature. A certain Dan Bloom took to Twitter to argue that the piece was written by Vice President Mike Pence, and that the giveaway was the piece’s use of the word lodestar. The anonymous op-ed had praised the recently deceased Senator John McCain as being “a lodestar for restoring honor to public life and our national dialogue.” Bloom points to the fact that Pence has used lodestar on numerous occasions in the past, dating back to 2001, and that it is an unusual word. But his analysis is simply wrong.

Now, regardless of one’s political leanings, the idea that Pence penned the op-ed piece is rather delicious. His authorship would raise all sorts of constitutional and political questions. But, linguistically, the theory is a load of hooey. That’s simply not how one goes about ascribing authorship to an anonymous piece.

To start, lodestar is not all that unusual a term. The Oxford English Dictionary says it appears in current usage between 0.1 and 1.0 times per million words. That seems low at first glance, but it is the same range as overhang, life support, register, rewrite, nutshell, candlestick, rodeo, embouchure, and insectivore. The word is also quite familiar to lawyers, as the lodestar standard is a method courts use to estimate legal fees in a lawsuit, and there are a lot of lawyers working in the White House. (Most of the hits for lodestar in the Corpus of Contemporary American English are references to this method of legal fee calculation.)

Even more problematic, ascribing authorship to an anonymous text does not rely on single, uncommon content words (nouns, non-copulative verbs, adjectives, adverbs) like lodestar, and for one very good reason: the choice of these words depends largely on the topic being written about. One simply does not use the same content terms when writing about economics as opposed to biology, or about linguistics as opposed to a newspaper op-ed about the White House. Measuring the use of such words tells you the topic, not who wrote it.

Instead, a legitimate stylistic analysis relies on function words (prepositions, copulatives, pronouns, conjunctions), very common content words that do not depend on topic, and repeated collocations of words. Patterns in the use of these words do not shift with the topic, and they are also harder to fake or mask, which is exactly what the anonymous author of a politically explosive op-ed would likely try to do. Stylistic analysis measures the relative frequency of these words in an author’s writing and creates a “signature” that can be compared to the anonymous text. In practice, such analysis must be done by computer.
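To make the idea of a “signature” a bit more concrete, here is a toy sketch in Python. It is an illustration under stated assumptions, not how any particular study was done: the short function-word list and the names profile, cosine, and rank_candidates are hypothetical, and cosine similarity is swapped in as a simple way to compare profiles (published work often uses more refined measures, such as Burrows’s Delta, and lists of a hundred or more frequent words).

```python
from collections import Counter
import math
import re

# Hypothetical, deliberately tiny list of English function words.
# Real studies typically use the most frequent 100+ words of a corpus.
FUNCTION_WORDS = [
    "the", "of", "and", "a", "to", "in", "is", "that", "it", "for",
    "on", "with", "as", "but", "by", "not", "we", "they", "he", "be",
]


def profile(text: str) -> list[float]:
    """Relative frequency (per 1,000 tokens) of each function word in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return [0.0] * len(FUNCTION_WORDS)
    counts = Counter(tokens)
    return [1000 * counts[w] / len(tokens) for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two profiles; 1.0 means identical proportions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(anonymous: str, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate authors by how closely their signature matches the anonymous text."""
    target = profile(anonymous)
    scores = [(name, cosine(profile(sample), target)) for name, sample in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Note that nothing here looks at rare words like lodestar at all; the comparison rests entirely on how often each candidate uses words like the, of, and but. On a text of only a few hundred words those frequencies are very noisy, so a ranking like this would mean little.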

And there are fundamental problems with applying stylistic analysis to this particular op-ed piece. For one thing, at fewer than a thousand words, it is simply too short to create a reliable signature. One needs a text of several thousand words before reliable results can be generated. Then there is the problem of ghostwriters. Undoubtedly, many of the speeches and articles ascribed to a politician of Pence’s stature are written by staffers. One needs a large corpus of material known to be written by the person in question, and without one, any stylistic analysis would be frustrated.

Now, I have no idea who wrote the Times op-ed piece, but the idea that its use of lodestar demonstrates anything is just plain wrong. Such armchair linguistic analysis is simply not valid.


