72
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Natural variation in genome architecture among 205 Drosophila melanogaster Genetic Reference Panel lines

      research-article
      1 , 2 , 3 , 4 , 1 , 5 , 6 , 1 , 7 , 8 , 1 , 1 , 8 , 1 , 8 , 6 , 8 , 8 , 9 , 6 , 1 , 8 , 8 , 8 , 8 , 8 , 8 , 8 , 8 , 8 , 8 , 8 , 1 , 8 , 8 , 8 , 8 , 8 , 1 , 8 , 8 , 1 , 7 , 9 , 8 , 8 , 5 , 6 , 1 , 8 , 2 , 3 , 1 , 16
      Genome Research
      Cold Spring Harbor Laboratory Press

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          The Drosophila melanogaster Genetic Reference Panel (DGRP) is a community resource of 205 sequenced inbred lines, derived to improve our understanding of the effects of naturally occurring genetic variation on molecular and organismal phenotypes. We used an integrated genotyping strategy to identify 4,853,802 single nucleotide polymorphisms (SNPs) and 1,296,080 non-SNP variants. Our molecular population genomic analyses show higher deletion than insertion mutation rates and stronger purifying selection on deletions. Weaker selection on insertions than deletions is consistent with our observed distribution of genome size determined by flow cytometry, which is skewed toward larger genomes. Insertion/deletion and single nucleotide polymorphisms are positively correlated with each other and with local recombination, suggesting that their nonrandom distributions are due to hitchhiking and background selection. Our cytogenetic analysis identified 16 polymorphic inversions in the DGRP. Common inverted and standard karyotypes are genetically divergent and account for most of the variation in relatedness among the DGRP lines. Intriguingly, variation in genome size and many quantitative traits are significantly associated with inversions. Approximately 50% of the DGRP lines are infected with Wolbachia, and four lines have germline insertions of Wolbachia sequences, but effects of Wolbachia infection on quantitative traits are rarely significant. The DGRP complements ongoing efforts to functionally annotate the Drosophila genome. Indeed, 15% of all D. melanogaster genes segregate for potentially damaged proteins in the DGRP, and genome-wide analyses of quantitative traits identify novel candidate genes. The DGRP lines, sequence data, genotypes, quality scores, phenotypes, and analysis and visualization tools are publicly available.

          Related collections

          Most cited references88

          • Record: found
          • Abstract: found
          • Article: not found

          The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

          Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS--the 1000 Genome pilot alone includes nearly five terabases--make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            PLINK: a tool set for whole-genome association and population-based linkage analyses.

            Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.

              We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs for genetic variations that changes the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.
                Bookmark

                Author and article information

                Journal
                Genome Res
                Genome Res
                GENOME
                Genome Research
                Cold Spring Harbor Laboratory Press
                1088-9051
                1549-5469
                July 2014
                July 2014
                : 24
                : 7
                : 1193-1208
                Affiliations
                [1 ]Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA;
                [2 ]Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland;
                [3 ]Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland;
                [4 ]Center for Education in Liberal Arts and Sciences, Osaka University, Osaka-fu, 560-0043 Japan;
                [5 ]Genomics, Bioinformatics and Evolution Group, Institut de Biotecnologia i de Biomedicina (IBB), Department of Genetics and Microbiology, Campus Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain;
                [6 ]Department of Entomology, Texas A&M University, College Station, Texas 77843, USA;
                [7 ]Genome Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany;
                [8 ]Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA;
                [9 ]Virginia Tech Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
                Author notes
                [10]

                Joint first authors

                [11]

                Senior authors

                Present addresses: 12Chevron Inc., Houston, TX 77002, USA;

                [13]

                Syngenta, Research Triangle Park, NC 27709, USA;

                [14]

                Department of Biological Sciences, University of Cincinnati, Cincinnati, OH 45221, USA;

                [15]

                Shell International Exploration and Production, Inc., Houston, TX 77082-3101, USA.

                [16 ]Corresponding author E-mail trudy_mackay@ 123456ncsu.edu
                Article
                9518021
                10.1101/gr.171546.113
                4079974
                24714809
                fe9f69e2-0c9e-44e3-b3a2-1c70bd3a984e
                © 2014 Huang et al.; Published by Cold Spring Harbor Laboratory Press

                This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0.

                History
                : 20 December 2013
                : 1 April 2014
                Page count
                Pages: 16
                Categories
                Resource

                Comments

                Comment on this article