We have spent a good chunk of time attempting to decipher a new study on the prevalence of “Latin” languages on the Web, which, for some reason, are defined to include Romanian and German.

The conclusions, such as they can be deciphered, seem to run as follows:

Moments later, in an incomprehensible table, we are given the following figures:

We really cannot figure this out. Do the “Latin” languages account for 44.83% or 22.64% of Web sites? We handily doubt the former, which, if true, will surely trigger new rounds of fulmination from white supremacists enraged that the spics are even taking over the goddamn Internet. We assume the presence of Romanians and Germans in the estimate will remain unnoticed by this group.

At any rate, assuming for the moment that the statistics are remotely credible, the authors conclude that roughly the same proportion of speakers publishes online in every surveyed language. Every language group, it seems, has an equal propensity to publish online.

The methodology is also clever: Researchers merely searched for 57 words in the target languages. Evidently the search terms are all spelled uniquely in each language, avoiding cognates, whether false (as English and French location) or true (as English and French section). A couple of the search terms are spectacularly unlikely (“homosexualities” plural, anyone?), but it’s not a bad idea.
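
For the curious, the cognate-avoidance step might look something like the Python sketch below. This is our guess at the idea, not the researchers' actual procedure: the function name unique_terms is made up, and the candidate lists are illustrative ("location" and "section" come from the examples above, while "monday" and "lundi" are our own hypothetical additions). A term survives only if it appears in exactly one language's list.

```python
def unique_terms(wordlists):
    """Keep, for each language, only the terms found in no other language's list.

    `wordlists` maps a language name to a list of candidate search terms.
    Comparison is case-insensitive. Hypothetical helper, not from the study.
    """
    # Count how many languages each (lowercased) term appears in.
    counts = {}
    for words in wordlists.values():
        for term in {w.lower() for w in words}:
            counts[term] = counts.get(term, 0) + 1

    # A term is usable only if exactly one language claims it.
    return {
        lang: [w for w in words if counts[w.lower()] == 1]
        for lang, words in wordlists.items()
    }


# Illustrative candidates only; "monday" and "lundi" are our assumptions.
candidates = {
    "english": ["location", "section", "monday"],
    "french": ["location", "section", "lundi"],
}

filtered = unique_terms(candidates)
print(filtered)
```

Note that a spelling filter like this treats false cognates ("location") and true ones ("section") identically: both are dropped, since a search engine cannot tell which language a matching page is written in.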

We have yet to see clear and credible estimates of the prevalence of languages online. We suspect that the current research could be reliable if it were explained properly.

Posted on 2001-10-04