
      Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants

      research-article


          Abstract

          Background

          The use of transcriptomic and genomic datasets for phylogenetic reconstruction has become increasingly common as researchers attempt to resolve recalcitrant nodes with increasing amounts of data. The large size and complexity of these datasets introduce significant phylogenetic noise and conflict into subsequent analyses. The sources of conflict may include hybridization, incomplete lineage sorting (ILS), or horizontal gene transfer, and may vary across the phylogeny. In phylogenetic analysis, noise and conflict have been accommodated in one of several ways: by binning gene regions into subsets to isolate consistent phylogenetic signal; by using gene-tree methods for reconstruction, where conflict is presumed to be explained by ILS; or through concatenation, where noise is presumed to be the dominant source of conflict. The results provided herein emphasize that analysis of individual homologous gene regions can greatly improve our understanding of the underlying conflict within these datasets.

          Results

          Here we examined two published transcriptomic datasets, the angiosperm group Caryophyllales and the aculeate Hymenoptera, for the presence of conflict, concordance, and gene duplications in individual homologs across the phylogeny. We found significant conflict throughout the phylogeny in both datasets and in particular along the backbone. While some nodes in each phylogeny showed patterns of conflict similar to what might be expected with ILS alone, the backbone nodes also exhibited low levels of phylogenetic signal. In addition, certain nodes, especially in the Caryophyllales, had highly elevated levels of strongly supported conflict that cannot be explained by ILS alone.

          Conclusion

          This study demonstrates that phylogenetic signal is highly variable in phylogenomic data sampled across related species and poses challenges when conducting species tree analyses on large genomic and transcriptomic datasets. Further insight into the conflict and processes underlying these complex datasets is necessary to improve and develop adequate models for sequence analysis and downstream applications. To aid this effort, we developed the open source software phyparts (https://bitbucket.org/blackrim/phyparts), which calculates unique, conflicting, and concordant bipartitions, maps gene duplications, and outputs summary statistics such as internode certainty (ICA) scores and node-specific counts of gene duplications.
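
The idea of counting concordant and conflicting bipartitions can be illustrated with a minimal Python sketch. This is not the phyparts implementation or its interface: it assumes rooted trees written as nested tuples, identical taxon sampling in every gene tree, and no gene duplications, and all function names are hypothetical. The score follows the entropy-based form of internode certainty, 1 + sum(p * log_n(p)) over the n competing bipartitions, without any frequency threshold or sign convention for cases where a conflicting bipartition outnumbers the reference clade.

```python
from collections import Counter
from math import log


def _collect(tree, found):
    """Return the leaf set below `tree`, recording every internal clade in `found`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_collect(child, found) for child in tree))
    found.add(leaves)
    return leaves


def clades(tree):
    """All clades (as frozensets of taxon names) of a rooted nested-tuple tree."""
    found = set()
    everything = _collect(tree, found)
    found.discard(everything)  # the root clade carries no information
    return found


def in_conflict(a, b):
    """Two clades conflict when they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def tally(species_clade, gene_trees):
    """Count gene trees concordant with a species-tree clade, and tally each
    distinct gene-tree clade that conflicts with it."""
    concordant = 0
    conflicting = Counter()
    for gene_tree in gene_trees:
        gene_clades = clades(gene_tree)
        if species_clade in gene_clades:
            concordant += 1
        for clade in gene_clades:
            if in_conflict(species_clade, clade):
                conflicting[clade] += 1
    return concordant, conflicting


def ica_score(concordant, conflicting):
    """Simplified ICA-style score: 1 + sum(p * log_n(p)) over n alternatives."""
    counts = [concordant] + list(conflicting.values())
    n = len(counts)
    total = sum(counts)
    if n < 2 or total == 0:
        return 1.0  # nothing competes with the reference clade
    return 1.0 + sum((c / total) * log(c / total, n) for c in counts if c > 0)


# Toy data (invented for illustration): four gene trees on taxa A-D,
# evaluated against the species-tree clade {A, B}.
gene_trees = [
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
    (("A", "C"), ("B", "D")),
]
concordant, conflicting = tally(frozenset({"A", "B"}), gene_trees)
score = ica_score(concordant, conflicting)
print(concordant, dict(conflicting), round(score, 4))
```

In this toy case two gene trees recover the clade {A, B}, while the clades {A, C} and {B, D} conflict with it in two and one gene trees respectively, so the score falls close to zero. A node supported by every gene tree would score 1.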

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12862-015-0423-0) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                eebsmith@umich.edu
                michael.moore@oberlin.edu
                josephwb@umich.edu
                yangya@umich.edu
                Journal
                BMC Evolutionary Biology (BMC Evol Biol)
                Publisher: BioMed Central (London)
                ISSN: 1471-2148
                Published: 5 August 2015
                Volume 15, article 150
                Affiliations
                Department of Ecology and Evolutionary Biology, University of Michigan, S State St, Ann Arbor, MI 48109, USA
                Department of Biology, Oberlin College, W Lorain St, Oberlin, OH 44074, USA
                Article
                Article ID: 423
                DOI: 10.1186/s12862-015-0423-0
                PMCID: PMC4524127
                PMID: 26239519
                Record ID: 3fd55cbd-2780-4aad-b7a4-6146b4350707
                © Smith et al. 2015

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 23 March 2015
                Accepted: 25 June 2015
                Categories
                Methodology Article

                Subject: Evolutionary Biology
                Keywords: phylogenomics, incomplete lineage sorting, transcriptome, gene tree conflict, gene duplication
