The Tower of Babel

Evolution of Human Language Project


Updates and upgrades

A new section, Meetings, has been added to the site, featuring programs and various materials from conferences, workshops, seminars, etc., organized by participants of the "Tower of Babel" project.

Our latest sub-project, more than a year in the making, is finally out: The Global Lexicostatistical Database. Check it out!

We now feature a complete re-design of the "Articles and Books" section, with a flexible system of work sorting and, additionally, more than twenty new titles added to the list. Please check it out!

As some of you may have noticed, two new options are now available for some of the databases on the site, namely, downloading them in complete form in PDF and Excel format. This will make it easier for the user to analyze the overall scope and value of the offered data, as well as make use of it offline. Please keep in mind, however, that much of the data are still offered in relatively raw form and must be used cautiously. Also, in order for the Excel files to be displayed correctly, you will need to install the LinoStar font which you can download here.
In addition to the new formats, all the databases for Khoisan languages have been seriously reworked so that they now serve as an approximate template for the "ideal" database: including all references to source material, general descriptions that include explanation of database structure, notes on transcription, classification, and - occasionally - sound correspondences, and also providing Swadesh wordlists to go along with the etymological databases, accompanied by genealogical tree diagrams. We hope that, in the future, all the other databases will be made to suit this template as well.

Finally, most of the introductory pages on the site have been renewed and remodeled. In particular, the links page has been rewritten from scratch (most of the links weren't functioning anyway); a brief description of all the new databases on the site has been added; an F.A.Q. file is now functioning; and minor updates have been introduced on most other pages as well.

Some radical modifications have been introduced in the "Articles and Books" section. A large number of publications have been added, including both new and older ones (many of them exceedingly hard to find or virtually unavailable in printed form), so that the section might reflect the views, methods, and results of the Moscow school of comparative linguistics on a more advanced level. Also, many of the previously available texts have been re-edited, with some serious errors due to font/conversion problems finally corrected.

After a long break (with a few minor updates to the databases every now and then), finally some major additions to the site:
1) A Frequently Asked Questions file (English version only) has been added to the main menu. Hopefully it will cover a lot of the topics that need to be clarified for regular users of the site. The F.A.Q. file is open and will necessarily expand in the future.
2) A welcome contribution from John Bengtson: the Basque comparative database! It is already integrated with the Sino-Caucasian database.
3) Some important additions to the "Articles and Books" section: Merritt Ruhlen's revised version of Joe Greenberg's comparative Amerind dictionary, and several articles by the late S.A. Starostin, including his classic work on lexicostatistics (in English) and his Proto-Yeniseian reconstruction (in Russian).

A new set of interactive maps has finally been integrated into the website, replacing the older (and, in the light of newly added databases, seriously obsolete) system. The maps stem from J. Greenberg's classification and have been kindly provided by Merritt Ruhlen. The language classification displayed on the maps, of course, primarily represents Greenberg's vision of the world's macrophyla, but the differences from ToB's classification are usually insignificant (besides, some of the maps are present in alternative variants - such as Greenberg's "Eurasiatic" vs. Illich-Svitych's "Nostratic"). Special thanks go to Artem Kozmin who has spent a long time working on the new system.

After a relatively long break, some minor and major additions. Minor ones include wiping out a few mistakes from selected databases. Major ones include the addition of O. V. Stolbova's comparative Chadic databases, which brings to a relative completion the Afro-Asiatic section of the site.

Some major technical modifications within the Altaic database (added hyperlinks between etymologies in the notes section, corrected some mistakes and misprints, as well as added links to the bibliography database on the site).

The Chinese database (bigchina.dbf) has been seriously updated, with detailed Russian meanings added as well as intermediate reconstructions belonging to several Old Chinese periods (from Classic through Western and Eastern Han to Postclassic) according to S. A. Starostin's reconstruction. Also, the "Articles" section now hosts S. A. Yatsemirsky's article on the history of Etruscan numerals, in both English and Russian versions.

Finally added the "In memoriam: S. Starostin" section, with biographic materials, photos and even a few audiofiles. The section is bound to be updated and modified; for comments and additions, please E-mail us at

The "Articles and Books" section has been enriched with S. A. Starostin's massive description of the comparative Sino-Caucasian phonology, written in 2004-05 and so far unpublished officially. The description is accompanied by a detailed glossary (more or less the equivalent of the corresponding database in PDF format).

All the etymological databases in the DOWNLOAD section have finally been renewed; two new databases - Kartvelian and Indo-European - have also been added.

More updates on the site: 1) We are proud to finally present Oleg Mudrak's extensive database on Comparative Eskimo, based on his own reconstruction as well as work by M. Fortescue. The Eskimo base is already linked to the Nostratic one, although the number of comparisons is rather limited for now.
2) Several 100-wordlists have been added (Baltic and Germanic languages, with added genealogical trees). Hopefully this is just the beginning, and more of these will be added gradually.
3) A new bibliographical database is currently under construction. For the moment, almost all the sources for the Khoisan database have been documented, as well as multiple sources for high-level databases.

Dear friends and colleagues,
to say that there can be no adequate replacement for S. Starostin in the world of comparative linguistics would be much more than an understatement. That said, he was never alone in his endless research, and yes, the work goes on. "The Tower of Babel" is still Sergei Starostin's homepage, and will always be as long as there are people still wishing to follow in his footsteps.
Anyway, cutting short the epic part: as some of you may have noticed, the site has undergone some MAJOR reworking over the past month. Crucial improvements include:
A) All of the previously available databases have been replaced by newer versions. This is particularly important for such families as Khoisan and Bahnaric, work on which is still in its initial phases, but improvements have been made in practically every database.
B) TONS of new databases added. Some of these were previously available on the Santa Fe version of the site, but some are presented here for the first time. Here is the list:
1) A database for "global" or "long-range" etymologies, mostly serving as a unifying point of entry for all of our Eurasiatic data, compiled by S. Starostin from a series of sources as well as featuring many of his own (for now - highly provisional, of course) etymologies;
2) A database for Nostratic etymologies, gathering in one place most of the pioneering data by V. M. Illich-Svitych and A. Dolgopolsky, with extensive contributions from many ToB participants.
3) The Indo-European database, created by S. L. Nikolayev on the basis of Pokorny's dictionary and other sources. Subordinate databases include collections of Baltic and Germanic etymologies, also compiled by S. L. Nikolayev, as well as scanned and OCR-ed versions of Vasmer's etymological dictionary of the Russian language and Pokorny's Indo-European dictionary.
4) The Uralic database, originally created by Ye. Khelimsky and substantially enlarged and detailed by S. A. Starostin on the basis of Redei's dictionary and other sources. The database is preliminary and does not yet include any subordinate files, although several intermediate databases are currently under construction. It is nevertheless nice to have at least something on Uralic etymology online.
5) The Kartvelian database, created by S. A. Starostin on the basis of Klimov's and Fähnrich's etymological dictionaries.
6) The Afroasiatic (Afrasian, Semito-Hamitic) database, created by A. Militarev and O. Stolbova on the basis of a variety of sources. It is under constant construction, but large parts of it are in readable shape and may be useful to the general public. The subordinate Semitic database - especially those parts of it which have been published as the first two volumes of A. Militarev and L. Kogan's Semitic Etymological Dictionary - remains the most elaborate part. Many other subordinate databases are exceedingly small - around 100-200 etymologies - but nevertheless publishable. The only section that has not yet been made public is the Chadic one, due to certain technical difficulties; we hope to have this question settled very soon.
7) The general Sino-Caucasian database, serving as the higher hierarchy level for the already available North Caucasian, Sino-Tibetan, and Yeniseian databases. It was created in its entirety by S. A. Starostin and features the latest version of his Sino-Caucasian reconstruction (a detailed description of phonological correspondences between all three branches was also completed shortly before his death and is now awaiting publication). The Sino-Caucasian database now also links to the subordinate Burushaski etymology; the Basque and Na-Dene data, unfortunately, have not yet been entered.
8) The Austric database includes a list of parallels between Austronesian and Austro-Asiatic languages, entered by I. Peiros and S. A. Starostin. Unfortunately, the subordinate Austronesian database (a reworked version of O. Dempwolff's dictionary) is not available yet due to technical reasons. A much larger version of the Austronesian database is also being prepared by I. Peiros and other participants of the project, but is currently in its initial stages. However, it is now possible to include a huge amount of Austro-Asiatic data, collected by I. Peiros and others as part of the IDS project. Formerly only the Bahnaric databases were available; today, subordinate databases include everything from Aslian to Palaung-Wa to Viet-Muong, etc. The quality of these databases varies widely - some do not include more than a hundred etymologies - but it is still better than nothing. Also available is a set of databases for Thai-Kadai languages, although it has not yet been linked to the common Austric base. More to come soon.
9) New databases for Khoisan languages, created by G. Starostin. These include a "macro-Khoisan" database - VERY preliminary stage, containing tentative matches between various subbranches of Khoisan including Hadza and Sandawe; a database for "Peripheral Khoisan", bringing together data from North Khoisan, South Khoisan and Eastern #Hoan; and several lower-level subordinate databases. All the reconstructions are liable to change practically every day, so reader beware.
C) The "Articles and Books" section has been hugely expanded as well. In addition to various works by S. Starostin that have been available for a long time, it now features contributions from most of the participants of ToB (including older as well as more current works) on language families ranging from Indo-European to Afroasiatic to Khoisan to Elamite. Most of the texts are in .PDF format to avoid any unnecessary trouble with font incompatibility. NOTE that the corpus is open to any submissions. If you have a well-written paper on comparative linguistics that you would like to see published, feel free to mail it to us so that we can add it to the archive.
D) Last but not least - we finally have a FORUM! Yes, you can now use it for technical questions, suggestions, announcements, or discussion on various linguistic topics. The forum has both an English language and a Russian language section and is moderated by G. Starostin.
More additions are to come soon. We are planning on having monthly updates in the database section, documented on the news page. Concerning new databases, Chadic and Eskimo etymologies are on their way. Presumably we will also be including 100-wordlists for selected databases, although work on this will be gradual. More updates in the "Articles and Books" sections are coming up. Finally - and very importantly - we are now designing a memorial section for S. A. Starostin, which will include photos, obituaries, and maybe even audio files.
For now, that is all.

Signed: George Starostin

Good news: after a lot of perturbation, we are finally moving to a new FreeBSD-operated platform. While the new server is inarguably much more powerful than the old one, the shift from Windows to Unix may result in occasional mistakes on the part of the server. Should you come across some weird side effect of the transition, contact us immediately at

Download the latest updates of StarLing (9.0.8: star4DOS.exe) and Star4Win (1.0.6: star4WIN.exe) software, available both as complete installation packages and as shorter packages without fonts and dictionaries (star4DOSshort.exe, star4winshort.exe). These versions also come with updated fonts and many new features, including an optional multilingual dictionary, multilang.exe.
Installation notes. Star4Win is complicated software, and we have received some installation complaints. So if you download and install the short version (star4winshort.exe), make sure that you also download the fonts package (ttffonts.exe) and, if you need standard dictionaries, also dicts.exe.
Normally the installation should automatically configure your system. But if the necessary fonts are not activated after installation, first try rebooting your computer (some systems, especially Windows 2000, are rather picky when it comes to font installation), and if the problems persist, please do not hesitate to contact us at
Many users who recently tried to view and edit StarLing dbf files using Microsoft Excel or Access have complained that the files are password protected. In fact the files are not password protected; the message is caused by some peculiarities of the StarLing database format. In the latest version we have dealt with this problem, so if you encounter it, just open and close the file in question using Star4Win 1.0.6 (or Star4DOS 9.0.8). After that the file should become readable in Excel or Access. Not that it will really help: files in the proprietary StarLing format (with fields of variable length) will still be unreadable, and for the others font problems will still be considerable, because of the restrictions of standard database editors. To export information from Star4Win you should rather use the Star4Win Print / List routines, as well as the converting procedures (available in Assist).
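As a rough illustration of why a spreadsheet program might report a dbf file as protected, here is a minimal Python sketch that decodes the fixed 32-byte header of a standard dBASE-style file. In that layout byte 15 is an encryption flag. Whether StarLing files actually tripped this particular flag is an assumption; the note above speaks only of "peculiarities" of the format. The function name and the sample bytes are hypothetical.

```python
import struct


def read_dbf_header(raw: bytes) -> dict:
    """Decode the fixed 32-byte prefix of a dBASE-style (.dbf) header."""
    if len(raw) < 32:
        raise ValueError("a DBF header needs at least 32 bytes")
    # Byte 0: version; bytes 1-3: last update (YY MM DD);
    # bytes 4-7: record count; 8-9: header length; 10-11: record length.
    version, yy, mm, dd, n_records, header_len, record_len = struct.unpack_from(
        "<4BIHH", raw, 0
    )
    return {
        "version": version,
        "last_update": (1900 + yy, mm, dd),
        "records": n_records,
        "header_length": header_len,
        "record_length": record_len,
        # Byte 15 is the encryption flag in the dBASE IV layout (assumption:
        # this is the byte that generic editors inspect).
        "encryption_flag": raw[15],
    }


# Hypothetical sample header: 5 records, flag byte set to 1.
sample = bytearray(bytes([0x03, 101, 8, 6]) + struct.pack("<IHH", 5, 65, 20) + bytes(20))
sample[15] = 1
info = read_dbf_header(bytes(sample))
print(info)
```

To inspect a real file, one would pass the first 32 bytes read in binary mode, for example `read_dbf_header(open(path, "rb").read(32))`, where `path` is whatever dbf file is at hand.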

The first part of I. Peiros' Austroasiatic etymological database is finally online. It is his Common Bahnaric database, with several subgroup databases. Thanks, Ilia!
1. Updates in the Russian section: Vasmer's etymological dictionary is online. The dictionary has been scanned and recognized using the ABBYY FineReader software (thank you, ABBYY!) by M. Daniel and A. Golovastikov, and converted into a StarLing database by S. Starostin.
2. Our colleagues in Leiden are about to put Pokorny's Indo-European etymological dictionary online, which was also digitized using ABBYY FineReader and put into database format by G. Starostin, and finally polished by A. Lubotsky.
If you wish to use either Vasmer or Pokorny, we recommend updating the fonts. The fonts that are specifically needed are the updated Times New Roman Star, Greek and Slav (for Old Slavonic). In Unicode encoding any standard Unicode font (like Arial Unicode MS) will do, but we recommend Palatino LinoStar. The latter font was specially developed by S. Bolotov for the Altaic Etymological Dictionary and contains a number of symbols absent in standard Unicode; it is now also available for download.
3. Download the latest updates of StarLing (9.0.7: star4DOS.exe) and Star4Win (1.0.5: star4WIN.exe) software, available both as complete installation packages and as shorter service packs for previous versions (dosupdate.exe, winupdate.exe). These versions also come with updated fonts, several fixed bugs and many new features.

06.08.2001 Latest updates are there:

1. The most recent version of the Altaic database is on line. Added are four Mongolian languages, two Japanese dialects, four Tungus-Manchu languages and twenty Turkic languages; about two hundred new Altaic roots; many of the previously available roots were updated in various ways.

2. The most recent version of Star4Win software (version 1.0.3) is downloadable. Many bugs were fixed and many new features added. The main setup file has become slightly smaller, since the CJK (Chinese-Japanese-Korean) component is now downloadable as a separate setup file (Starcjk.exe), which contains fonts for two Chinese encodings (Big5 and GB) and for Japanese (JIS). No Winupdate.exe is present, because the new version - unfortunately - contains several new DLLs and needs to be downloaded completely. But patch (service pack) files will surely follow.

For the past few days I have been fighting with another morphological hacker attack; so you may have noticed that the server was barely surviving. The attack was professional, through a proxy server, so it was difficult to pin down the original IP-address and disable it. I hope the problem is solved now, and hope that these attacks will not happen more often than once a year (see the news from 25.2.2000)!

9.3.2001. Several software updates are available. The Web database engine and scripts finally support Unicode - you may have noticed the UTF buttons on the query pages. See the sections pertaining to Unicode in HELP.
As usual, new downloadable updates are available, both for DOS and Windows versions. The DOS version obtained the (long overdue) documentation of the GRAMMAR function, and in the Windows version searching in text files has been (hopefully) fixed - although backward search and Search/Replace are still absent.

28.12.2000 The latest Winupdate.exe is there. Note: it will only update version 1.01; those who still have 1.00 should first download Star4Win.exe or 4WinShort.exe.

24.12.2000 All the downloadable executables were updated. The latest version number of Star4Win is 1.01. There are no significant changes compared with the last downloadable version (1.00), but the program (I hope) has become more stable. This version has no WINUPDATE.EXE; instead, download either the complete package (Star4Win.exe) or the shorter variant 4WinShort.exe. The latter may serve as an update to any previous version of Star4Win, but you will additionally need to download fonts (ttffonts.exe) and dictionaries (dict.exe) and install them manually.

I hope this will be the last update this year, century and millennium! Look for next updates in a month or so.


Download the latest WINUPDATE.EXE with numerous bugs fixed. Make sure you install the updates in the same folder where you installed STAR4WIN. The changes basically involve the fonts, the lst-files which contain recoding parameters, and the printing system. WINUPDATE.EXE is a service pack for Star4Win.exe or Star4WinShort.exe. If you have not installed Star4Win yet, do it (see instructions below), and install Winupdate afterwards.


Here comes the long awaited Windows version of StarLing software! The program was gradually translated from Clipper into C/xBase++ during the whole of last year, by Philip Krylov and Sergei Starostin. What we have now is a fully functional beta version of StarLing called Star4Win (alias: Star32.exe). Star4Win is fully compatible with all files generated by the DOS version; in their turn, files generated by Star4Win should be readable and editable with the DOS version. Most of StarLing functionality (and many more features) are present in Star4Win; what is still lacking are auxiliary procedures and the lexicostatistical component.

Star4Win may be downloaded as a single setup file (Star4Win.exe), but this may prove difficult, since the file takes up about 10 MB. So we also provide a way of downloading it as a shorter setup package (Star4WinShort.exe) plus a set of dictionaries (dict.exe) and fonts (winfonts.exe).

StarLing for DOS is still available. Moreover, Star4Win, lacking procedures and glottochronology, and still being in the early stages of testing, still cannot replace the DOS version completely. The procedure of downloading StarLing has also been modified. There is a single large setup file (Star4DOS.exe), and, as an alternative, the shorter Star4DOSShort.exe plus the set of dictionaries (dict.exe) and fonts (dosfonts.exe).

Note that the dict.exe is the same for both DOS and Windows versions.

Since the program (at least the Windows version) is being constantly updated, we shall regularly place updates on the Web. We hope that in most cases these will be small and easily downloadable files, say, winupdate.exe and dosupdate.exe.

13.04.2000 Many design elements in the CGI scripts have been changed (by my son, Anatoli Starostin); I hope this will make the users' life a bit easier.

15.03.2000 The possibility of searching for Greek letters (in curly brackets with the letter G: e.g. {Gphe/ro:}) has been added. See more details in Help. As usual, some updates - this time mainly in Sino-Tibetan files.
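To make the bracket convention concrete, here is a small Python sketch of how a query string might be split into plain and Greek-marked segments. It only recognizes the {G...} wrapper shown in the example above; how the transliteration inside the brackets maps onto Greek letters is documented in Help and is not reproduced here. The function name and the regular expression are hypothetical and are not part of the site's actual scripts.

```python
import re
from typing import List, Tuple

# A segment written as {G...}; nested braces are not expected (assumption).
GREEK_SEGMENT = re.compile(r"\{G([^{}]*)\}")


def split_query(query: str) -> List[Tuple[str, str]]:
    """Split a query into ("plain", text) and ("greek", text) parts."""
    parts = []
    pos = 0
    for match in GREEK_SEGMENT.finditer(query):
        if match.start() > pos:
            # Text before the Greek segment is ordinary search text.
            parts.append(("plain", query[pos:match.start()]))
        parts.append(("greek", match.group(1)))
        pos = match.end()
    if pos < len(query):
        parts.append(("plain", query[pos:]))
    return parts


print(split_query("{Gphe/ro:}"))
```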


1) News for Sino-Tibetologists. The Yamphu database from Roland Rutgers is finally available. Like the Limbu, Dumi and Kulung databases, it is linked to the Common Kiranti database. I must note that the Yamphu data have led to a rather significant modification of the Proto-Kiranti consonant reconstruction and a reduction of the Proto-Kiranti consonant inventory: I was able to do away with retroflex dentals and uvulars. As a consequence, the Proto-Kiranti system now fits much better into the reconstructed Proto-Sino-Tibetan system. I am now preparing a special publication on this subject.

2) Our server has been attacked several times by hackers trying to download (page by page) the complete list of Russian wordforms. I don't mind people looking for Russian paradigms - but automatic programs like this sometimes considerably hamper the server performance (remember the stories about hackers attacking Yahoo or CNN?) So I had to take precautions: after a critical amount of requests per day your computer will be cut off from the database server for good, until you write me an email explaining the purpose of your excessive activity. Please do not hesitate to contact me and I may suggest better ways of downloading the information you require. I shall not sue anybody!
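The precaution described above can be pictured with a minimal Python sketch: count requests per client per day, and once a client exceeds the limit, keep it blocked until it is manually released (for example, after an explanatory email). This is an illustration only, not the server's actual code; the class name, the method names and the limit value are all hypothetical.

```python
from collections import defaultdict
from datetime import date


class DailyRequestLimiter:
    """Block a client once it exceeds a daily request limit (sketch)."""

    def __init__(self, max_per_day: int):
        self.max_per_day = max_per_day      # hypothetical threshold
        self.counts = defaultdict(int)      # (client, day) -> requests seen
        self.blocked = set()                # clients cut off until released

    def allow(self, client: str, today: date) -> bool:
        if client in self.blocked:
            return False
        self.counts[(client, today)] += 1
        if self.counts[(client, today)] > self.max_per_day:
            # The block persists across days until unblock() is called.
            self.blocked.add(client)
            return False
        return True

    def unblock(self, client: str) -> None:
        """Release a client manually and reset its counters."""
        self.blocked.discard(client)
        for key in [k for k in self.counts if k[0] == client]:
            del self.counts[key]


limiter = DailyRequestLimiter(max_per_day=2)
day = date(2000, 2, 25)
results = [limiter.allow("10.0.0.1", day) for _ in range(3)]
print(results)
```

Keeping the block in a separate set, rather than relying on the daily counter, is what makes the cut-off last "for good" until someone lifts it.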

15.01.2000 A number of new updates are available:

1) We have a number of contributions from the Himalayan Project in Leiden University: the databases for Limbu, Dumi and Kulung. The Yamphu database is on its way. Thanks to George Van Driem, Gerard Tolsma and Roland Rutgers!

2) The Limbu, Dumi and Kulung databases are linked to the new Common Kiranti database which I have finally made more or less ready for Web publishing.

3) The Kiranti database is itself linked to the Sino-Tibetan database, which is also enriched by Lepcha data which were provided by Olga Mazo.

4) The Old Chinese database was significantly enriched and modified and includes now almost 5000 Old Chinese entries.

5) As usual there are updates in the Altaic and Dravidian databases.

6) Most recent copies of databases and software are available for download.

11.01.99 You will notice the presence of Semitic data (at last); as usual, Altaic updates. There have also been some interface changes: special descriptions for each database were introduced.


Several databases were updated: Dravidian and Altaic (lots of new roots), Old Chinese (links to the Sino-Tibetan database and numbers from Karlgren's Grammata Serica Recensa were added).


Some elements of the interface were updated - parameters of viewing pages at the bottom of the screen can now be set all at once, instead of doing it separately for each purpose.


Due to the advent of Windows 98 it has been necessary to update downloadable fonts. The archive TIMESTR.ZIP now contains two files: TIMESTAR.TTF and COURSTAR.TTF which you may install on your computer. The former contains the font Times New Roman Star and should be used instead of the earlier Times NewR Star; the latter is called CourierB Star. It is a fixed pitch font needed for editing purposes. Both should be installed in your browser options/preferences for the encoding "User Defined".

For editing purposes a keyboard layout is also supplied - so far, only for Windows users. It is called STARLING.KBD and should replace the standard KBDUS.KBD in the WINDOWS/SYSTEM folder. This keyboard layout works exactly like the standard US keyboard, but contains a complete set of special characters activated by pressing the right ALT key (alone or combined with SHIFT).


More innovations. A new method of search ("Match meaning") is now available - see details in Help.
Registered users can now edit the contents of the databases on line, not just view them. For details, consult


Several programming and cosmetic innovations were introduced: search in all database fields at once; highlighting of searched sequences etc. You'll see.

Our map is expanding: the Dravidian databases are there at last!


A download page is now available.


I have added some completely different stuff: Russian automatic morphology and Russian dictionary databases.


1. Finally we have a contribution from outside! William Wang and Chin-Chuan Cheng have sent in the latest version of their DOC database of Chinese dialects, which I am glad to include in the general system of databases. See the description of DOC in the Introduction.

2. Most databases are in a relatively stable state, but some are being constantly modified. There have been quite a number of updates in the Old Chinese database, and the Altaic database is changing quickly: its publication is drawing near, and a lot of work with individual roots is going on.

    "The Tower of Babel": The main goal of the project is to join efforts in the research of long-range connections between established linguistic families of the world. The Internet is a brilliant way to combine our attempts and to build up a commonly accessible database of roots, or etyma, reconstructed for the world's major (and minor) linguistic stocks.

On the following pages you will be able to look at computer databases for the dictionaries of Ozhegov, Zalizniak and Mueller, as well as analyze any Russian word and obtain its complete accented paradigm.

    STARLING is a software package designed by Sergei Starostin for various types of linguistic text and database processing, including handling of linguistic fonts in the DOS and WINDOWS operating systems, operations with linguistic databases and Internet presentation of linguistic data.