To be fair to The Guardian, though, they do mention the OED’s discussion of the term “data”, which, itself, says that in English “data” was historically used with plural verbs (and that it is still used with plural verbs in some scientific fields). It isn’t totally clear if they are basing their statement regarding the “root” of the issue on 1) the rules of Latin grammar, 2) historical usage of the term in English, 3) current use of the term in some technical circles, or 4) some sort of poorly-defined mish-mash of the above three things. But I agree that their identification of the “root” of the issue seems to miss the point, which is whether, as a publication that is aimed at a wide audience of modern, mostly non-technical, English readers, it should use “data” in a manner consistent with the usage of modern, non-technical, English speakers, or whether it should insist on a usage that would be jarring to most such readers out of a desire to appease the prescriptivists (who will likely find much to complain about in The Guardian in any event).
I’m more than a little puzzled by the distinction the WSJ apparently makes in a half-concession to current usage. It seems to direct its writers to use “data” with singular verbs when “data” refers to an identifiable collection of information, but to insist on plural verbs when referring to a not-yet-collected body of information. The rationale seems to be that “data” can only be properly used as a singular when it refers to a “collection” of information, and a body of information cannot be said to be a collection if it has not been collected. So, one should say “the data that is available supports Zandar’s theory”, because there is a discrete and identifiable body of data that has been collected about that theory as of this moment, but one should say that “the data are still being collected” because, by definition, data that are in the process of being collected cannot be said to be a collection of data.
This strikes me as a rather silly distinction that, in addition to being pointless, would likely be opaque to the WSJ’s readers, who, I think, would most likely assume that a writer erred if he or she wrote that “the data that is available supports the x theory” but later wrote in the same article (and, perhaps, the same sentence) that “the data are still being collected”.
And what is the point of the distinction? Those who favor “the data are” probably favor such usage in all circumstances, while those who find “the data are” to clang in the ear likely find it jarring in all circumstances.
The Guardian, I take it, simply has its writers say “the data is” in all circumstances, which seems like a happier solution.