The Global Lexicostatistical Database: Downloads

DOWNLOADS

The GLBD Experimental Tree-Building Procedures: Some Preliminary Results

Mostly as a rough preview for more important things to come, this page hosts some of the classificatory results, achieved within the GLD based on (1) application of standard comparative-historical procedure to relatively small («level 1») databases; (2) attempts to replace potentially subjective historical analysis with «objective» computerized procedures; (3) skipping detailed analysis of individual wordlists and creating a preliminary rough sketch of how one of the highest levels of the GLD could look like in the future.

1. «Regular» Trees

The GLD currently offers its users the possibility to create genealogical trees in on-line mode based on a whole variety of parameters (see A Quick Tutorial for more detailed information). The default tree is, however, assumed to be the one based on either «Fixed rate» (based on Sergei Starostin's revision of the Swadesh formula, with the rate of replacement per millennium = 0.05%) or «Variable rate» (based on the same formula, but with a variable rate, dependent on individual stability indexes of different Swadesh items); the two are usually very close. The table below allows to quickly browse through the «official» trees — including glottochronological dates of separation — for all of the language groups currently uploaded on the main GLD site.

Language group	Fixed rate tree	Variable rate tree	Language group	Fixed rate tree	Variable rate tree
Ekoid	$Description: Description: Description: Description: Description: Description: Description: Description: Description: Description: C:\Users\Gstarst\Desktop\eko-f.png$	$Description: Description: Description: Description: Description: Description: Description: Description: Description: Description: C:\Users\Gstarst\Desktop\eko-v.png$	North Khoisan	$Description: Description: Description: Description: Description: Description: Description: Description: Description: Description: C:\nkh-f.png$	$Description: Description: Description: Description: Description: Description: Description: Description: Description: Description: C:\nkh-v.png$
Nakh	$Description: Description: Description: Description: Description: Description: Description: Description: Description: Description: C:\nah.png$	$Description: Description: Description: Description: Description: Description: Description: Description: Description: Description: C:\nah-v.png$	Ob-Ugrian	$Description: Description: Description: Description: Description: Description: Description: Description: Description: Description: C:\oug-f.png$	$Description: Description: Description: Description: Description: Description: Description: Description: Description: Description: C:\oug-v.png$

2. The «Objectively Generated» Tree of Phonetic Similarity

This particular tree is constructed on principles that are similar to the much larger «ASJP World Language Tree of Lexical Similarity» (the latter utilizes a slightly more complex algorithm and includes about 100 times as many languages, although, on the other hand, it only operates on 40-item wordlists of generally more dubious quality). It includes data from all the languages entered into the GLD (minus some of the latest additions; new versions should be expected every few months), treated as follows:

a. Etymologically based cognation indexes, supplied by wordlist compilers, have been removed;

b. New cognation indexes are generated automatically by means of an algorithm that assumes «cognation» based on phonetic similarity of the words compared. Phonetic similarity is, in this case, defined as «the first and the second consonant of the roots of the compared words respectively belong to the same consonantal classes» (information on consonantal classes is contained in the StarLing file sound.dbf, whose structure can be seen here).

c. A lexicostatistical matrix and genealogical tree are constructed, based on new «cognation» indexes with the application of the «Variable rate» formula.

Click here to view the latest version (01.14.2013) of The «Objectively Generated» Tree of Phonetic Similarity.

Older versions: 10.19.2011, 07.31.2011.

Colors between nodes are to be understood as follows:

blue = the computer has correctly identified a language family or at least a subset of such a family (although its internal structure may still have been identified with errors). Such results are positive;

yellow = the computer has identified all or part of a «controversial» family that has been hypothesized by at least some linguists on a serious level, but has not found mass acceptance. Such results are relevant, but inconclusive;

red = the established link (as a rule, based on a minimal number of «cognates» that is indistinguishable from chance) reflects a currently unprovable or vaguely guessed connection at a deep level. Such results are irrelevant.

3. Preliminary Genealogical Tree For Eurasia (based on 50-item wordlists)

This tree can be viewed as a very rough, sketchy «shortcut» towards the ultimate goal of the GLD project. It has been built based on manual cognate indexation of 50-item wordlists, created for more than 150 low-level proto-languages (such as Proto-Germanic, Proto-Turkic, Proto-Ethiosemitic, Proto-Nakh, etc.) and language isolates of Eurasia. The main body of the database was put together by G. Starostin, with helpful input from A. Dybo, A. Kassian, M. Zhivlov, and other colleagues at the Center for Comparative Linguistics.

Cognates were identified, where possible, based on known phonetic correspondences. For poorly studied families as well as macrofamily levels, basic phonetic resemblance (determined by more or less the same criteria that are used for the «objectively generated» tree) has been used instead of regular correspondences. All of the comparisons have been done based on a «step-by-step» cladistic approach, as described in the GLD paper «Preliminary lexicostatistics».

Click here to view the latest version (07.31.2011) of The Preliminary Genealogical Tree For Eurasia (long variant).

Click here to view the latest version (07.31.2011) of The Preliminary Genealogical Tree For Eurasia (short variant).

Click here to see the data (without cognation indexes) in Excel form (WARNING: Many of the reconstructions are highly preliminary. Please do not quote anything from this file without consulting us at gstarst@rinet.ru).

BACK TO MAIN PAGE DATABASE LIST RUSSIAN VERSION