The Global Lexicostatistical Database: Mission statement
In comparative-historical linguistics, uniform basic wordlists of related or
potentially related languages are used primarily for two purposes. The first is
lexicostatistics, a simple but often effective technique that derives the
relative genetic proximity of languages from the percentage of «shared» items
on the Swadesh list, that is, items going back to the same historical ancestor.
The second is glottochronology, a slightly more complex procedure that assigns
absolute historical dates to these common ancestors, based on the idea that
basic lexicon is replaced over time at a constant or, at least, regularly
shifting rate that can serve as a rough equivalent of a «glotto-clock».
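
The two procedures described above can be illustrated with a minimal Python sketch. It is not the GLD's own method, which is not specified here. It assumes the classic textbook formulation of glottochronology, t = ln(c) / (2 ln(r)), with a constant retention rate r per millennium; the default value 0.86 is the figure traditionally quoted for the 100-item list and is used only as an assumption. The function names, the toy wordlists, and the cognate class numbers are all hypothetical.

```python
import math


def shared_share(list_a, list_b):
    """Lexicostatistics: share of meanings with matching cognate classes.

    Each wordlist maps a meaning (e.g. "eye") to a cognate class index;
    two words count as «shared» when their indexes are equal.
    Only meanings attested in both lists are compared.
    """
    common = [meaning for meaning in list_a if meaning in list_b]
    if not common:
        raise ValueError("the two wordlists have no meanings in common")
    matches = sum(1 for meaning in common if list_a[meaning] == list_b[meaning])
    return matches / len(common)


def divergence_time(share, retention=0.86):
    """Glottochronology (classic constant-rate formula), in millennia.

    t = ln(c) / (2 * ln(r)), where c is the shared share and r is the
    assumed retention rate per millennium (0.86 is an assumption here).
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")
    return math.log(share) / (2 * math.log(retention))


# Hypothetical toy data: meaning -> cognate class index.
lang_a = {"eye": 1, "water": 1, "stone": 1, "fire": 1}
lang_b = {"eye": 1, "water": 1, "stone": 2, "fire": 1}

share = shared_share(lang_a, lang_b)      # 3 of 4 meanings match
millennia = divergence_time(share)        # rough separation estimate
print(f"shared: {share:.0%}, estimated separation: {millennia:.2f} millennia")
```

A real calculation would use a full 100-item list, and its result is only as good as the cognacy judgments behind the class indexes.
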
Although both lexicostatistics and, especially, glottochronology have been
frequently criticized on various grounds (some of the criticisms are discussed,
answered, and refuted in GLD-related papers published on the site), together
they remain a viable and promising method of language classification and, most
importantly, the only universally applicable one. The exactness and reliability
of the results, however, depend significantly on how well the lexicostatistical
calculations are aligned with the results of historical-comparative analysis of
the basic lexicon.
The past decade has seen a major increase in interest in various
lexicostatistical techniques, much of it stemming from the development of new
phylogenetic classification methods in biology and a desire to test them in
other fields of study; there has been a veritable swarm of publications in
prestigious journals that apply complex statistical and probabilistic
algorithms to wordlists of languages. Very few of these publications, however,
have so far managed to make a serious impact on the general field of historical
linguistics. For the most part, what they offer is statistical approximations
that do not deal with individual «word histories» and sometimes go as far as to
contradict historical reality and common sense, whether because of false
priors, failure to take into account all the necessary factors, or, as is very
often the case, inadequate data collections.
The chief mission of the GLD is to assemble a unified and ordered collection of
basic lexical data on all, or at least most, of the world's languages. The
collection should be easy to submit to various automated algorithms of
analysis but, above all, should acknowledge and take advantage of, rather than
routinely ignore, the achievements of historical linguistics. To that end, the
data, wherever possible, should be accompanied by notes on available synchronic
and diachronic information; cognation indexes that tie together words of common
origin should be explicated and justified; and, most importantly, thorough
attention should be paid to the construction of the wordlist itself, since
experience shows that mistakes in the data are a very frequent problem with
some «popular» Swadesh wordlists indiscriminately employed by researchers
without a background in historical linguistics.
The wordlists that are collected and annotated on the GLD site may serve a
variety of purposes. Beyond the obvious one, serving as a basis for genetic
classifications, they will be of significant use to various typological
studies, particularly in the typology of phonetic change. Another advantage of
the annotation system employed in the database is the possibility of its
eventual use for the study of semantic shifts in language; any progress in this
sphere will have serious implications for almost every area of the linguistic
sciences.
© 2011-2016 George Starostin (site design, data input coordination)
© 2011-2016 Phil Krylov (programming, technical support)