+1 Recommend
1 collections
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae)

      Read this article at

          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.


          Oak represents a valuable natural resource across Northern Hemisphere ecosystems, attracting a large research community studying its genetics, ecology, conservation, and management. Here we introduce a draft genome assembly of valley oak ( Quercus lobata) using Illumina sequencing of adult leaf tissue of a tree found in an accessible, well-studied, natural southern California population. Our assembly includes a nuclear genome and a complete chloroplast genome, along with annotation of encoded genes. The assembly contains 94,394 scaffolds, totaling 1.17 Gb with 18,512 scaffolds of length 2 kb or longer, with a total length of 1.15 Gb, and a N50 scaffold size of 278,077 kb. The k-mer histograms indicate an diploid genome size of ∼720–730 Mb, which is smaller than the total length due to high heterozygosity, estimated at 1.25%. A comparison with a recently published European oak ( Q. robur) nuclear sequence indicates 93% similarity. The Q. lobata chloroplast genome has 99% identity with another North American oak, Q. rubra. Preliminary annotation yielded an estimate of 61,773 predicted protein-coding genes, of which 71% had similarity to known protein domains. We searched 956 Benchmarking Universal Single-Copy Orthologs, and found 863 complete orthologs, of which 450 were present in > 1 copy. We also examined an earlier version (v0.5) where duplicate haplotypes were removed to discover variants. These additional sources indicate that the predicted gene count in Version 1.0 is overestimated by 37–52%. Nonetheless, this first draft valley oak genome assembly represents a high-quality, well-annotated genome that provides a tool for forest restoration and management practices.

          Related collections

          Most cited references 38

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Fast and accurate short read alignment with Burrows–Wheeler transform

          Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ∼10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: Contact:
            • Record: found
            • Abstract: not found
            • Article: not found

            Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

              • Record: found
              • Abstract: found
              • Article: not found

              A framework for variation discovery and genotyping using next-generation DNA sequencing data

              Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.

                Author and article information

                G3 (Bethesda)
                G3: Genes, Genomes, Genetics
                G3: Genes, Genomes, Genetics
                G3: Genes, Genomes, Genetics
                G3: Genes|Genomes|Genetics
                Genetics Society of America
                12 September 2016
                November 2016
                : 6
                : 11
                : 3485-3495
                [* ]Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095
                []Institute of the Environment and Sustainability, University of California, Los Angeles, California 90095
                []Institute of Genomics and Proteomics, University of California, Los Angeles, California 90095
                [§ ]Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205
                [** ]Department of Evolution and Ecology, University of California, Davis, California 95616
                [†† ]University of Maryland Center for Environmental Science, Appalachian Laboratory, Frostburg, Maryland 21532
                [‡‡ ]Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218
                [§§ ]Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, California 90095
                [*** ]Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218
                [††† ]Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland 21218
                Author notes
                [1 ]Corresponding author: Department of Ecology and Evolutionary Biology, University of California, Los Angeles, 4139 Terasaki Life Sciences Building, 610 Charles E. Young Drive East, Los Angeles, CA 90095-7239. E-mail: vlsork@
                Copyright © 2016 Sork et al.

                This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Figures: 5, Tables: 4, Equations: 0, References: 62, Pages: 11
                Genomic Selection


                Comment on this article