
      Phylogenetic Inference from Word Lists Using Weighted Alignment with Empirically Determined Weights

      research-article


          Abstract

          The paper investigates the task of inferring a phylogenetic tree of languages from the collection of word lists made available by the Automated Similarity Judgment Project (ASJP). This task involves three steps: (1) computing pairwise word distances, (2) aggregating word distances to a distance measure between languages and inferring a phylogenetic tree from these distances, and (3) evaluating the result by comparing it to expert classifications. For the first step, weighted alignment will be used, and a method to determine weights empirically will be presented. For the second step, a novel method will be developed that attempts to minimize the bias resulting from missing data. For the third step, several methods from the literature will be applied to a large collection of language samples to enable statistical testing. It will be shown that the language distance measure proposed here leads to substantially more accurate phylogenies than a method relying on unweighted Levenshtein distances between words.
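
          The first two steps can be illustrated with a minimal Python sketch. This is not the method of the paper: the weights here are supplied by hand rather than determined empirically, and the aggregation is a plain average over the concepts attested in both word lists, which only sidesteps missing data instead of correcting for the bias it introduces. The names `alignment_distance`, `language_distance`, and `distance_matrix`, and the toy word lists, are hypothetical. With the default costs, `alignment_distance` reduces to the unweighted Levenshtein distance named in the abstract as the baseline.

```python
from itertools import combinations


def alignment_distance(a, b, sub_cost=None, gap_cost=1.0):
    """Cost of the cheapest global alignment of strings a and b.

    sub_cost(x, y) is the cost of aligning symbol x with symbol y.
    The default (0 for a match, 1 for a mismatch) together with
    gap_cost=1 gives the unweighted Levenshtein distance.
    """
    if sub_cost is None:
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    # prev[j] holds the cost of aligning the processed prefix of a with b[:j].
    prev = [j * gap_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + gap_cost,            # x aligned with a gap
                curr[j - 1] + gap_cost,        # y aligned with a gap
                prev[j - 1] + sub_cost(x, y),  # x aligned with y
            ))
        prev = curr
    return prev[-1]


def language_distance(words_a, words_b, **kwargs):
    """Mean length-normalized word distance over shared concepts.

    words_a and words_b map a concept to a single word form.
    Concepts missing from either list are skipped (an assumption of
    this sketch, not the bias-minimizing method of the paper).
    Returns None if the two lists share no concept.
    """
    shared = [c for c in words_a if c in words_b]
    if not shared:
        return None
    total = 0.0
    for c in shared:
        wa, wb = words_a[c], words_b[c]
        total += alignment_distance(wa, wb, **kwargs) / max(len(wa), len(wb), 1)
    return total / len(shared)


def distance_matrix(word_lists, **kwargs):
    """Pairwise language distances, keyed by (language, language)."""
    return {
        (p, q): language_distance(word_lists[p], word_lists[q], **kwargs)
        for p, q in combinations(sorted(word_lists), 2)
    }


# Toy data for illustration only; these are not actual ASJP entries.
toy_lists = {
    "A": {"hand": "hant", "water": "vasa"},
    "B": {"hand": "hend", "water": "wata", "fire": "fair"},
    "C": {"hand": "mano", "fire": "fogo"},
}
matrix = distance_matrix(toy_lists)
```

          The resulting matrix is the kind of input a distance-based tree-building method expects. Passing a different `sub_cost` function is where weights for individual sound pairs would enter.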

          Most cited references (6)


          Comparison of phylogenetic trees


            Computational Feature-Sensitive Reconstruction of Language Relationships: Developing the ALINE Distance for Comparative Historical Linguistic Reconstruction


              On the accuracy of language trees

              Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages. From this perspective, the reconstruction of language trees is an example of an inverse problem: starting from present-day information, which is incomplete and often noisy, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees, we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores, we quantify the relative performance of the distance-based algorithms considered. Further, we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally, we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it.

                Author and article information

                Journal: Language Dynamics and Change (LDC)
                Publisher: Brill (The Netherlands)
                ISSN: 2210-5824, 2210-5832
                Published: 2013, volume 3, issue 2, pages 245-291
                DOI: 10.1163/22105832-13030204
                Copyright 2014 by Gerhard Jäger

                This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0) License.

                Subjects: Languages of Asia, General linguistics, Linguistics & Semiotics, Languages of Europe, Levels of linguistic analysis
                Keywords: automatic language classification, language phylogenies, ASJP, weighted string alignment
