Comparative genomics and community curation further improve gene annotations in the nematode Pristionchus pacificus

Abstract

Background

Nematode model organisms such as Caenorhabditis elegans and Pristionchus pacificus are powerful systems for studying the evolution of gene function at a mechanistic level. However, the identification of P. pacificus orthologs of candidate genes known from C. elegans is complicated by the discrepancy in the quality of gene annotations, a common problem in nematode and invertebrate genomics.

Results

Here, we combine comparative genomic screens for suspicious gene models with community-based curation to further improve the quality of gene annotations in P. pacificus. We extend previous curations of one-to-one orthologs to larger gene families and to orphan genes. Cross-species comparisons of protein lengths, screens for atypical domain combinations and species-specific orphan genes resulted in 4,311 candidate genes that were subjected to community-based curation. Corrections for 2,946 gene models were implemented in a new version of the P. pacificus gene annotations. The new set of gene annotations contains 28,896 genes and has a single-copy ortholog completeness level of 97.6%.
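
One of the screens named above, the cross-species comparison of protein lengths, can be illustrated with a minimal sketch. It is not the authors' pipeline: the record type, the function name, the gene identifiers, the example lengths and the ratio thresholds (0.5 and 2.0) are all assumptions chosen for illustration. The idea is only that a predicted protein much shorter or much longer than its ortholog in a well-annotated reference species may indicate a truncated or artificially fused gene model worth manual inspection.

```python
from dataclasses import dataclass


@dataclass
class OrthologPair:
    """A predicted protein and its ortholog in a reference species (hypothetical record)."""
    gene_id: str
    query_length: int      # protein length (amino acids) in the species being curated
    reference_length: int  # protein length of the ortholog in the reference species


def flag_suspicious(pairs, min_ratio=0.5, max_ratio=2.0):
    """Return (gene_id, length ratio) for pairs whose ratio lies outside [min_ratio, max_ratio].

    The thresholds are illustrative assumptions, not values taken from the article.
    """
    flagged = []
    for pair in pairs:
        if pair.reference_length <= 0:
            continue  # skip records without a usable reference length
        ratio = pair.query_length / pair.reference_length
        if ratio < min_ratio or ratio > max_ratio:
            flagged.append((pair.gene_id, round(ratio, 2)))
    return flagged


# Made-up example data: one plausible model, one possibly fused, one possibly truncated.
pairs = [
    OrthologPair("gene_A", 480, 500),
    OrthologPair("gene_B", 1500, 500),
    OrthologPair("gene_C", 120, 500),
]
candidates = flag_suspicious(pairs)
for gene_id, ratio in candidates:
    print(f"{gene_id}\tlength ratio {ratio}")
```

In a workflow of the kind described, genes flagged this way would be candidates for manual curation rather than automatic correction, since a large length difference can also reflect genuine biological divergence.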

Conclusions

Our work demonstrates the effectiveness of comparative genomic screens to identify suspicious gene models and the scalability of community-based approaches to improve the quality of thousands of gene models. Similar community-based approaches can help to improve the quality of gene annotations in other invertebrate species, including parasitic nematodes.

Author and article information

Contributors

Christian Rödelsperger:

ORCID: http://orcid.org/0000-0002-7905-9675

christian.roedelsperger@tuebingen.mpg.de

Journal

Journal ID (nlm-ta): BMC Genomics

Journal ID (iso-abbrev): BMC Genomics

Title: BMC Genomics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2164

Publication date (Electronic): 12 October 2020

Publication date PMC-release: 12 October 2020

Publication date Collection: 2020

Volume: 21

Electronic Location Identifier: 708

Affiliations

Department for Integrative Evolutionary Biology, Max Planck Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany (GRID grid.419495.4, ISNI 0000 0001 1014 8330)

Article

Publisher ID: 7100

DOI: 10.1186/s12864-020-07100-0

PMC ID: 7552371

PubMed ID: 33045985

SO-VID: adf43d25-dea0-4353-a1d0-3e9deb546fe9

License:

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

History

Date received: 3 August 2020

Date accepted: 23 September 2020

Funding

Funded by: Max-Planck-Gesellschaft (DE)

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: genome, evolution, Caenorhabditis elegans, parasitic nematodes, orphan genes
