
      Long-read RNA sequencing of human and animal filarial parasites improves gene models and discovers operons


          Abstract

          Filarial parasitic nematodes (Filarioidea) cause substantial disease burden to humans and animals around the world. Recently there has been a coordinated global effort to generate, annotate, and curate genomic data from nematode species of medical and veterinary importance. This has resulted in two chromosome-level assemblies (Brugia malayi and Onchocerca volvulus) and 11 additional draft genomes from Filarioidea. These reference assemblies facilitate comparative genomics to explore basic helminth biology and prioritize new drug and vaccine targets. While the continual improvement of genome contiguity and completeness advances these goals, experimental functional annotation of genes is often hindered by poor gene models. Short-read RNA sequencing data and expressed sequence tags, in combination with ab initio prediction algorithms, are employed for gene prediction, but these can result in missing clade-specific genes, fragmented models, imperfect mapping of gene ends, and lack of isoform resolution. Long-read RNA sequencing can overcome these drawbacks and greatly improve gene model quality. Here, we present Iso-Seq data for B. malayi and Dirofilaria immitis, etiological agents of lymphatic filariasis and canine heartworm disease, respectively. These data cover approximately half of the known coding genomes and substantially improve gene models by extending untranslated regions, cataloging novel splice junctions from novel isoforms, and correcting mispredicted junctions. Furthermore, we validated computationally predicted operons, manually curated new operons, and merged fragmented gene models. We carried out analyses of poly(A) tails in both species, leading to the identification of non-canonical poly(A) signals. Finally, we prioritized and assessed known and putative anthelmintic targets, correcting or validating gene models for molecular cloning and target-based anthelmintic screening efforts. Overall, these data significantly improve the catalog of gene models for two important parasites, and they demonstrate how long-read RNA sequencing should be prioritized for ongoing improvement of parasitic nematode genome assemblies.
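
The abstract mentions identifying non-canonical poly(A) signals but does not describe how. As a rough illustration only, and not the authors' method or the behavior of their polyAudit script, the sketch below scans the last nucleotides upstream of a cleavage site for the canonical AATAAA hexamer or a one-mismatch variant. The function name `classify_polya_signal`, the 40 nt window, and the one-mismatch definition of "non-canonical" are all assumptions made for this example.

```python
CANONICAL = "AATAAA"  # canonical poly(A) signal hexamer


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify_polya_signal(transcript_3p_end, window=40, max_mismatches=1):
    """Look for a poly(A) signal just upstream of the cleavage site.

    `transcript_3p_end` is the 3' end of a transcript with the poly(A) tail
    already trimmed, so its last base is taken to be the cleavage site.
    `window` and `max_mismatches` are illustrative assumptions.

    Returns (hexamer, distance, label), where distance is the number of
    nucleotides between the end of the hexamer and the cleavage site,
    or None when no hexamer is within `max_mismatches` of AATAAA.
    """
    region = transcript_3p_end.upper().replace("U", "T")[-window:]
    best = None
    for start in range(len(region) - 5):
        hexamer = region[start:start + 6]
        mismatches = hamming(hexamer, CANONICAL)
        if mismatches > max_mismatches:
            continue
        distance = len(region) - (start + 6)
        # Prefer fewer mismatches, then the hexamer closest to the cleavage site.
        candidate = (mismatches, distance, hexamer)
        if best is None or candidate < best:
            best = candidate
    if best is None:
        return None
    mismatches, distance, hexamer = best
    label = "canonical" if mismatches == 0 else "non-canonical"
    return hexamer, distance, label
```

A call such as `classify_polya_signal("GCGCGCGCGC" + "AATGAA" + "GCGCGCGCGC")` would report the one-mismatch hexamer as non-canonical. A real analysis would also need to handle internal priming and alignment artifacts, which this sketch ignores.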

          Author summary

          Filarial parasitic nematodes are vector-borne parasites that infect humans and animals. Brugia malayi and Dirofilaria immitis are transmitted by mosquitoes and cause human lymphatic filariasis and canine heartworm disease, respectively. Recent years have seen a dramatic increase in genomic and transcriptomic data sets and the concomitant increase in innovative strategies for drug target identification, validation, and screening. However, while the completeness of genome assemblies of filarial parasitic nematodes has seen steady improvements, the reliability of gene models has not kept pace, hindering cloning efforts. Long-read RNA sequencing technologies are uniquely able to improve gene models, but have not been widely used for the causative agents of neglected tropical diseases. Here, we report the improvement of gene models in both B. malayi and D. immitis by long-read RNA sequencing. We identified novel operons, deprecated false positive operons, identified dozens of novel genes, and described the parameters of polyadenylation. We also focused on putative anthelmintic targets, identifying novel isoforms and correcting gene models. These data substantially increase the trustworthiness of gene models in these two species and demonstrate how long-read sequencing approaches should be prioritized in the continued improvement of genome assemblies and their gene annotations.


                Author and article information

                Contributors
                Roles: Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing
                Roles: Methodology, Resources, Writing – review & editing
                Roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS Neglected Tropical Diseases (PLoS Negl Trop Dis)
                Public Library of Science (San Francisco, CA USA)
                ISSN: 1935-2727, 1935-2735
                Published: 16 November 2020
                Volume 14, Issue 11, e0008869
                Affiliations
                [001] Department of Pathobiological Sciences, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
                National University of Ireland Galway, IRELAND
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0002-5909-4190
                https://orcid.org/0000-0003-0582-006X
                https://orcid.org/0000-0001-9233-1760
                Article
                Manuscript ID: PNTD-D-20-01113
                DOI: 10.1371/journal.pntd.0008869
                PMCID: PMC7704054
                PMID: 33196647
                43e35062-f199-41a5-8fce-3b739a6bf314
                © 2020 Wheeler et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 22 June 2020
                Accepted: 9 October 2020
                Page count
                Figures: 7, Tables: 0, Pages: 22
                Funding
                Funded by: National Institute of Allergy and Infectious Diseases (funder ID: http://dx.doi.org/10.13039/100000060)
                Award ID: R01AI151171
                Funding for MZ is provided by an R01 grant from the National Institute of Allergy and Infectious Diseases (R01AI151171, NIH.gov). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Organisms > Eukaryota > Animals > Invertebrates > Nematoda > Brugia > Brugia Malayi
                Biology and Life Sciences > Zoology > Animals > Invertebrates > Nematoda > Brugia > Brugia Malayi
                Biology and Life Sciences > Genetics > DNA > Operons
                Biology and Life Sciences > Biochemistry > Nucleic Acids > DNA > Operons
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > RNA Sequencing
                Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > RNA Sequencing
                Biology and Life Sciences > Computational Biology > Genome Analysis > Gene Prediction
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Gene Prediction
                Biology and Life Sciences > Genetics > Genomics > Animal Genomics > Invertebrate Genomics
                Research and Analysis Methods > Animal Studies > Experimental Organism Systems > Model Organisms > Caenorhabditis Elegans
                Research and Analysis Methods > Model Organisms > Caenorhabditis Elegans
                Research and Analysis Methods > Animal Studies > Experimental Organism Systems > Animal Models > Caenorhabditis Elegans
                Biology and Life Sciences > Organisms > Eukaryota > Animals > Invertebrates > Nematoda > Caenorhabditis > Caenorhabditis Elegans
                Biology and Life Sciences > Zoology > Animals > Invertebrates > Nematoda > Caenorhabditis > Caenorhabditis Elegans
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Gene Mapping
                Research and Analysis Methods > Molecular Biology Techniques > Gene Mapping
                Medicine and Health Sciences > Medical Conditions > Parasitic Diseases > Nematode Infections
                Custom metadata
                vor-update-to-uncorrected-proof: 2020-11-30
                All data analysis and visualization scripts are publicly available at https://github.com/zamanianlab/Filarid_IsoSeq-ms. The script for poly(A) analysis can be found at https://github.com/zamanianlab/polyAudit. Long-read sequencing data has been deposited into NIH BioProjects PRJNA548902 (B. malayi) and PRJNA640410 (D. immitis).

                Infectious disease & Microbiology