      An improved ovine reference genome assembly to facilitate in-depth functional annotation of the sheep genome


          Abstract

          Background

          The domestic sheep (Ovis aries) is an important agricultural species raised for meat, wool, and milk across the world. A high-quality reference genome for this species enhances the ability to discover genetic mechanisms influencing biological traits. Furthermore, a high-quality reference genome allows for precise functional annotation of gene regulatory elements. The rapid advances in genome assembly algorithms and emergence of sequencing technologies with increasingly long reads provide the opportunity for an improved de novo assembly of the sheep reference genome.

          Findings

          Short-read Illumina (55× coverage), long-read Pacific Biosciences (75× coverage), and Hi-C data from a Rambouillet ewe, retrieved from public databases, were combined with an additional 50× coverage of Oxford Nanopore data and assembled with Canu v1.9. The assembled contigs were scaffolded using the Hi-C data with Salsa v2.2, gaps were filled with PBsuite v15.8.24, and the assembly was polished with Nanopolish v0.12.5. After duplicate contigs were removed with PurgeDups v1.0.1, chromosomes were oriented and polished with 2 rounds of a pipeline that used freebayes v1.3.1 to call variants, Merfin to validate them, and BCFtools to generate the consensus FASTA. The resulting ARS-UI_Ramb_v2.0 assembly is 2.63 Gb in length and has improved continuity (contig NG50 of 43.18 Mb), with 19- and 38-fold fewer scaffolds than Oar_rambouillet_v1.0 and Oar_v4.0, respectively. ARS-UI_Ramb_v2.0 also has greater per-base accuracy and fewer insertions and deletions identified from mapped RNA sequence than previous assemblies.
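
The contig NG50 quoted above is a continuity statistic: the length of the shortest contig in the smallest set of longest contigs that together span at least half of an assumed genome size (N50 is the same calculation with the total assembly length in place of the genome size). The following is a minimal sketch of that calculation, not the authors' code; the function names `ng50` and `n50` and the toy contig lengths are illustrative assumptions and do not come from the article.

```python
from typing import Iterable


def ng50(contig_lengths: Iterable[int], genome_size: int) -> int:
    """Return the contig NG50 for an assumed genome size.

    Contigs are visited from longest to shortest; the result is the
    length of the contig at which the running total first reaches half
    of genome_size. Returns 0 if the contigs never cover half of it.
    """
    target = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return 0


def n50(contig_lengths: Iterable[int]) -> int:
    """Return N50: the same statistic, using the assembly's own total length."""
    lengths = list(contig_lengths)
    return ng50(lengths, sum(lengths))


if __name__ == "__main__":
    # Hypothetical toy lengths, chosen only to make the arithmetic easy to follow.
    toy_contigs = [50, 40, 30, 20, 10]
    print("N50:", n50(toy_contigs))
    print("NG50 for an assumed genome size of 200:", ng50(toy_contigs, 200))
```

Because NG50 is computed against a fixed genome size rather than each assembly's own length, it is better suited than N50 to comparing assemblies of the same species that differ in total size.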

          Conclusions

          The ARS-UI_Ramb_v2.0 assembly is a substantial improvement in contiguity that will optimize the functional annotation of the sheep genome and facilitate improved mapping accuracy of genetic variant and expression data for traits in sheep.


                Author and article information

                Journal
                GigaScience
                Oxford University Press
                ISSN: 2047-217X
                Published: 04 February 2022
                Volume: 11
                Article: giab096
                Affiliations
                Department of Animal, Veterinary, and Food Sciences, University of Idaho, 875 Perimeter Dr, Moscow, ID 83843, USA
                US Dairy Forage Research Center, USDA-ARS, 1925 Linden Drive, Madison, WI 53706, USA
                Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA
                The Roslin Institute, Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
                Utah State University, Old Main Hill, Logan, UT 84322, USA
                US Meat Animal Research Center, USDA-ARS, State Spur 18D, Clay Center, NE 68933, USA
                Animal Genomics and Improvement Laboratory, USDA-ARS, 10300 Baltimore Ave, Beltsville, MD 20705, USA
                Author notes
                Correspondence address. Brenda M. Murdoch, Department of Animal, Veterinary, and Food Sciences, University of Idaho, 875 Perimeter Dr, Moscow, ID 83843, USA. E-mail: bmurdoch@uidaho.edu
                Correspondence address. Benjamin D. Rosen, Animal Genomics and Improvement Laboratory, USDA-ARS, 10300 Baltimore Ave, Beltsville, MD 20705, USA. E-mail: ben.rosen@usda.gov
                Author information
                https://orcid.org/0000-0003-2796-9252
                https://orcid.org/0000-0002-7349-2451
                https://orcid.org/0000-0001-8675-3473
                https://orcid.org/0000-0001-9395-8346
                Article
                DOI: 10.1093/gigascience/giab096
                PMCID: PMC8848310
                PMID: 35134925
                © The Author(s) 2022. Published by Oxford University Press GigaScience.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 17 August 2021
                Revised: 28 October 2021
                Accepted: 25 December 2021
                Page count
                Pages: 9
                Funding
                Funded by: U.S. Department of Agriculture, DOI 10.13039/100000199;
                Funded by: National Institute of Food and Agriculture, DOI 10.13039/100005825;
                Award ID: 2013-67015-21228
                Funded by: National Institutes of Health, DOI 10.13039/100000002;
                Award ID: U54 HG003273
                Categories
                Data Note
                AcademicSubjects/SCI00960
                AcademicSubjects/SCI02254

                Keywords: Rambouillet, genome assembly, reference genome, sheep, Ovis aries
