rfmcdonald
Last August I mentioned a recent computational study claiming that, contrary to the current consensus, the Urheimat of the Indo-European languages lay not in the Pontic steppes on the northern shore of the Black Sea but in Anatolia on its southern shore. I linked in passing to Razib Khan's criticism at GNXP of problems with the team's model (notably, Romani diverged from the South Asian Indo-European languages much more recently than the model claims). Now, at Geocurrents, Martin Lewis has a long post, titled in full "Mismodeling Indo-European Origin and Expansion: Bouckaert, Atkinson, Wade and the Assault on Historical Linguistics", that challenges the study's basic claims. Five paragraphs are excerpted below.


Our initial response was one of profound skepticism, as it hardly seemed likely that a single mathematical study could “solve” one of the most carefully examined conundrums of the distant human past. Recent work in both linguistics and archeology, moreover, has tended against the Anatolian hypothesis, placing Indo-European origins in the steppe and parkland zone of what is now Ukraine, southwest Russia, and environs. The massive literature on the subject was exhaustively weighed as recently as 2007 by David W. Anthony in his magisterial study, The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. Could such a brief article as that of Bouckaert et al. really overturn Anthony’s profound syntheses so easily?

The more we examined the articles in question, the more our reservations deepened. In the Science piece, the painstaking work of generations of historical linguists who have rigorously examined Indo-European origins and expansion is shrugged off as if it were of no account, even though the study itself rests entirely on the taken-for-granted work of linguists in establishing relations among languages based on words of common descent (cognates). In Wade’s New York Times article, contending accounts and lines of evidence are mentioned, but in a casual and slipshod manner. More problematic are the graphics offered by Bouckaert and company. The linguistic family trees generated by their model are clearly wrong, as we shall see in forthcoming posts. And on the website that accompanies the article, an animated map (“movie,” according to its creators) of Indo-European expansion is so error-riddled as to be amusing, and the conventional map on the same site is almost as bad. Mathematically intricate though it may be, the model employed by the authors nonetheless churns out demonstrably false information.

Failing the most basic tests of verification, the Bouckaert article typifies the kind of undue reductionism that sometimes gives scientific excursions into human history and behavior a bad name, based on the belief that a few key concepts linked to clever techniques can allow one to side-step complexity, promising mathematically elegant short-cuts to knowledge. While purporting to offer a truly scientific approach, Bouckaert et al. actually forward an example of scientism, or the inappropriate and overweening application of specific scientific techniques to problems that lie beyond their own purview.

The Science article lays its stake to scientific standing in a straightforward but unconvincing manner. The authors claim that as two theories of Indo-European (I-E) origin vie for acceptance, a geo-mathematical analysis based on established linguistic and historical data can show which one is correct. Actually, many theories of I-E origin have been proposed over the years, most of which—including the Anatolian hypothesis—have been rejected by most specialists on empirical grounds. Establishing the firm numerical base necessary for an all-encompassing mathematical analysis of splitting and spreading languages is, moreover, all but impossible. The list of basic cognates found among Indo-European languages is not settled, nor is the actual enumeration of separate I-E languages, and the timing of the branching of the linguistic tree remains controversial as well. As a result of such uncertainties, errors can easily accumulate and compound, undermining the approach.

The scientific failings of the Bouckaert et al. article, however, go much deeper than mere data uncertainty. The study rests on unexamined postulates about language spread, assuming that the process works through simple spatial diffusion in much the same way as a virus spreads from organism to organism. Such a hypothesis is intriguing, but must be regarded as a proposition rather than a given, as it does not rest on a foundation of evidence. The scientific method calls for all such assumptions to be put to the test. One can easily do so in this instance. One could, for example, mathematically model the hypothesized diffusion of Indo-European languages for historical periods in which we have firm linguistic-geographical information to see if the predicted patterns conform to those of the real world. If they do not, one could only conclude that the approach fails. Such failure could stem either from the fact that the data used are too incomplete and compromised to be of value (garbage in/garbage out), or from a more general collapse of the diffusional model. Either possibility would invalidate the Science article.


This is billed as the first of several posts criticizing the paper.