19
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      A second-generation assembly of the Drosophila simulans genome provides new insights into patterns of lineage-specific divergence

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          We create a new assembly of the Drosophila simulans genome using 142 million paired short-read sequences and previously published data for strain w 501 . Our assembly represents a higher-quality genomic sequence with greater coverage, fewer misassemblies, and, by several indexes, fewer sequence errors. Evolutionary analysis of this genome reference sequence reveals interesting patterns of lineage-specific divergence that are different from those previously reported. Specifically, we find that Drosophila melanogaster evolves faster than D. simulans at all annotated classes of sites, including putatively neutrally evolving sites found in minimal introns. While this may be partly explained by a higher mutation rate in D. melanogaster, we also find significant heterogeneity in rates of evolution across classes of sites, consistent with historical differences in the effective population size for the two species. Also contrary to previous findings, we find that the X chromosome is evolving significantly faster than autosomes for nonsynonymous and most noncoding DNA sites and significantly slower for synonymous sites. The absence of a X/A difference for putatively neutral sites and the robustness of the pattern to Gene Ontology and sex-biased expression suggest that partly recessive beneficial mutations may comprise a substantial fraction of noncoding DNA divergence observed between species. Our results have more general implications for the interpretation of evolutionary analyses of genomes of different quality.

          Related collections

          Most cited references51

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          The Sequence Alignment/Map format and SAMtools

          Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: rd@sanger.ac.uk
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            PAML 4: phylogenetic analysis by maximum likelihood.

            PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML). The programs may be used to compare and test phylogenetic trees, but their main strengths lie in the rich repertoire of evolutionary models implemented, which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses. Uses of the programs include estimation of synonymous and nonsynonymous rates (d(N) and d(S)) between two protein-coding DNA sequences, inference of positive Darwinian selection through phylogenetic comparison of protein-coding genes, reconstruction of ancestral genes and proteins for molecular restoration studies of extinct life forms, combined analysis of heterogeneous data sets from multiple gene loci, and estimation of species divergence times incorporating uncertainties in fossil calibrations. This note discusses some of the major applications of the package, which includes example data sets to demonstrate their use. The package is written in ANSI C, and runs under Windows, Mac OSX, and UNIX systems. It is available at -- (http://abacus.gene.ucl.ac.uk/software/paml.html).
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              MUSCLE: a multiple sequence alignment method with reduced time and space complexity

              Background In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and / or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles. Results We compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the two (MUSCLE-prog). We find MUSCLE-fast to be the fastest algorithm on all test sets, achieving average alignment accuracy similar to CLUSTALW in times that are typically two to three orders of magnitude less. MUSCLE-fast is able to align 1,000 sequences of average length 282 in 21 seconds on a current desktop computer. Conclusions MUSCLE offers a range of options that provide improved speed and / or alignment accuracy compared with currently available programs. MUSCLE is freely available at .
                Bookmark

                Author and article information

                Journal
                Genome Res
                Genome Res
                GENOME
                Genome Research
                Cold Spring Harbor Laboratory Press
                1088-9051
                1549-5469
                January 2013
                January 2013
                : 23
                : 1
                : 89-98
                Affiliations
                [1 ]Department of Ecology and Evolutionary Biology and the Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA;
                [2 ]Howard Hughes Medical Institute and the Lawrence Berkeley Laboratory, University of California Berkeley, Berkeley, California 94720, USA;
                [3 ]Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92617, USA
                Author notes
                [4 ]Corresponding author E-mail tthu@ 123456princeton.edu
                Article
                9518021
                10.1101/gr.141689.112
                3530686
                22936249
                c9b53e60-618f-401c-9aec-cb52a497b263
                © 2013, Published by Cold Spring Harbor Laboratory Press

                This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.

                History
                : 12 April 2012
                : 29 August 2012
                Categories
                Research

                Comments

                Comment on this article