A formalized description of the standard human variant nomenclature in Extended Backus-Naur Form

Abstract

Background

The use of a standard human sequence variant nomenclature is advocated by the Human Genome Variation Society in order to unambiguously describe genetic variants in databases and literature. There is a clear need for tools that allow the mining of data about human sequence variants and their functional consequences from databases and literature. Existing text mining focuses on the recognition of protein variants and their effects. The recognition of variants at the DNA and RNA levels is essential for dissemination of variant data for diagnostic purposes. Development of new tools is hampered by the complexity of the current nomenclature, which requires processing at the character level to recognize the specific syntactic constructs used in variant descriptions.

Results

We approached the gene variant nomenclature as a scientific sublanguage and created two formal descriptions of the syntax in Extended Backus-Naur Form: one at the DNA-RNA level and one at the protein level. To ensure compatibility with older versions of the human sequence variant nomenclature, previously recommended variant description formats have been included. The first grammar versions were designed to help build variant description handling in the Alamut mutation interpretation software. The DNA and RNA level descriptions were then updated and used to construct the context-free parser of the Mutalyzer 2 sequence variant nomenclature checker, which has already been used to check more than one million variant descriptions.
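To illustrate how an Extended Backus-Naur Form rule set can map onto a parser, the following is a minimal sketch in Python. The four-rule toy grammar in the comments is an assumption made for illustration only; it covers a tiny subset of coding DNA descriptions and is not the published grammar, nor the Mutalyzer 2 parser. The names `Variant`, `Parser`, `ParseError` and `parse_variant` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Toy grammar in EBNF (illustrative subset, NOT the published grammar):
#   variant  = "c." , position , [ "_" , position ] , change ;
#   position = digit , { digit } ;
#   change   = nt , ">" , nt | "del" | "dup" ;
#   nt       = "A" | "C" | "G" | "T" ;


@dataclass
class Variant:
    start: int
    end: int
    kind: str                  # "subst", "del" or "dup"
    ref: Optional[str] = None  # reference nucleotide (substitutions only)
    alt: Optional[str] = None  # observed nucleotide (substitutions only)


class ParseError(ValueError):
    """Raised when the input does not match the toy grammar."""


class Parser:
    """Recursive-descent parser: one method per grammar rule."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def accept(self, literal: str) -> bool:
        # Consume the literal if it is next in the input.
        if self.text.startswith(literal, self.pos):
            self.pos += len(literal)
            return True
        return False

    def expect(self, literal: str) -> None:
        if not self.accept(literal):
            raise ParseError(f"expected {literal!r} at offset {self.pos}")

    def position(self) -> int:
        begin = self.pos
        while self.pos < len(self.text) and self.text[self.pos] in "0123456789":
            self.pos += 1
        if begin == self.pos:
            raise ParseError(f"expected a position at offset {begin}")
        return int(self.text[begin:self.pos])

    def nucleotide(self) -> str:
        if self.pos < len(self.text) and self.text[self.pos] in "ACGT":
            self.pos += 1
            return self.text[self.pos - 1]
        raise ParseError(f"expected a nucleotide at offset {self.pos}")

    def variant(self) -> Variant:
        self.expect("c.")
        start = self.position()
        end = self.position() if self.accept("_") else start
        if self.accept("del"):
            result = Variant(start, end, "del")
        elif self.accept("dup"):
            result = Variant(start, end, "dup")
        else:
            ref = self.nucleotide()
            self.expect(">")
            result = Variant(start, end, "subst", ref, self.nucleotide())
        if self.pos != len(self.text):
            raise ParseError(f"unexpected trailing text at offset {self.pos}")
        return result


def parse_variant(description: str) -> Variant:
    return Parser(description).variant()
```

For example, `parse_variant("c.76_78del")` yields a `Variant` with `start=76`, `end=78` and `kind="del"`. The sketch checks syntax only; it does not verify positions or nucleotides against a reference sequence.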

Conclusions

The Extended Backus-Naur Form provided an overview of the full complexity of the syntax of the sequence variant nomenclature, which remained hidden in the textual format and the division of the recommendations across the DNA, RNA and protein sections of the Human Genome Variation Society nomenclature website (http://www.hgvs.org/mutnomen/). This insight into the syntax of the nomenclature could be used to design detailed and clear rules for software development. The Mutalyzer 2 parser demonstrated that the formal grammar facilitates decomposition of complex variant descriptions into their individual parts. The Extended Backus-Naur Form or parts of it can be used or modified by adding rules, allowing the development of specific sequence variant text mining tools and other programs that can generate or handle sequence variant descriptions.
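As a rough illustration of what decomposing a description into its individual parts can look like, the sketch below splits a description into reference, coordinate type and the list of variants in a bracketed allele. It is an assumption-laden simplification: it uses a regular expression and string splitting rather than a context-free parser, does not handle nested or multi-allele bracket forms, and the names `Description` and `decompose` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Description(NamedTuple):
    reference: str        # e.g. a sequence accession with version
    coordinate_type: str  # single-letter prefix before the dot, e.g. "c"
    variants: List[str]   # raw text of each individual variant


# reference ":" type "." body  (simplified; an assumption, not the grammar)
_PATTERN = re.compile(r"^(?P<ref>[^:\s]+):(?P<type>[cgmnr])\.(?P<body>.+)$")


def decompose(description: str) -> Description:
    match = _PATTERN.match(description)
    if match is None:
        raise ValueError(f"not a recognised description: {description!r}")
    body = match.group("body")
    # A bracketed body lists several variants separated by semicolons.
    if body.startswith("[") and body.endswith("]"):
        variants = body[1:-1].split(";")
    else:
        variants = [body]
    if any(not variant for variant in variants):
        raise ValueError(f"empty variant in description: {description!r}")
    return Description(match.group("ref"), match.group("type"), variants)
```

Under these assumptions, `decompose("NM_004006.2:c.[76A>C;83G>C]")` returns the reference `NM_004006.2`, the coordinate type `c` and the two variants `76A>C` and `83G>C`. A grammar-based parser would additionally validate each variant against the full rule set.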

Most cited references 15

Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion.

J. den Dunnen, S. Antonarakis (2000)

Consistent gene mutation nomenclature is essential for efficient and accurate reporting, testing, and curation of the growing number of disease mutations and useful polymorphisms being discovered in the human genome. While a codified mutation nomenclature system for simple DNA lesions has now been adopted broadly by the medical genetics community, it is inherently difficult to represent complex mutations in a unified manner. In this article, suggestions are presented for reporting just such complex mutations. Copyright 2000 Wiley-Liss, Inc.

Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker.

Ernest LW van Heurn, Johan T. den Dunnen, Sheila Wildeman … (2007)

Unambiguous and correct sequence variant descriptions are of utmost importance, not in the least since mistakes and uncertainties may lead to undesired errors in clinical diagnosis. We developed the Mutation Analyzer (Mutalyzer) sequence variation nomenclature checker (www.lovd.nl/mutalyzer; last accessed 13 September 2007) for automated analysis and correction of sequence variant descriptions using reference sequences from any organism. Mutalyzer handles most variation types: substitution, deletion, duplication, insertion, indel, and splice-site changes following current recommendations of the Human Genome Variation Society (HGVS). Input is a GenBank accession number or an uploaded reference sequence file in GenBank format with user-modified annotation, an HGNC gene symbol, and the variant (single or in a batch file). Mutalyzer generates variant descriptions at DNA level, the level of all annotated transcripts and the deduced outcome at protein level. To validate Mutalyzer's performance and to investigate the sequence variant description quality in locus-specific mutation databases (LSDBs), more than 11,000 variants in the PAH, BIC BRCA2, and HbVar databases were analyzed, showing that 87%, 25%, and 38%, respectively, were error-free and following the recommendations. Low recognition rates in BIC and HbVar (38% and 51%, respectively) were due to lack of a well-annotated genomic reference sequence (HbVar) or noncompliance to the guidelines (BRCA2). Provided with well-annotated genomic reference sequences, Mutalyzer is very effective for the curation of newly discovered sequence variation descriptions and existing LSDB data. Mutalyzer will be linked to the Leiden Open source Variation Database (LOVD) (www.LOVD.nl; last accessed 13 September 2007) and is the first module of a sequence variant effect prediction package. (c) 2007 Wiley-Liss, Inc.

MutationFinder: a high-performance system for extracting point mutation mentions from text.

William A Baumgartner, Greg Randolph, Lawrence Hunter … (2007)

Discussion of point mutations is ubiquitous in biomedical literature, and manually compiling databases or literature on mutations in specific genes or proteins is tedious. We present an open-source, rule-based system, MutationFinder, for extracting point mutation mentions from text. On blind test data, it achieves nearly perfect precision and a markedly improved recall over a baseline. MutationFinder, along with a high-quality gold standard data set, and a scoring script for mutation extraction systems have been made publicly available. Implementations, source code and unit tests are available in Python, Perl and Java. MutationFinder can be used as a stand-alone script, or imported by other applications. http://bionlp.sourceforge.net.

Author and article information

Conference

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date (Collection): 2011

Publication date (Electronic): 5 July 2011

Volume: 12

Issue: Suppl 4

Page: S5

Affiliations

[1] Department of Human Genetics, Center for Human and Clinical Genetics, Leiden University Medical Center, Leiden, the Netherlands

[2] Interactive Biosoftware, Rouen, France

Article

Publisher ID: 1471-2105-12-S4-S5

DOI: 10.1186/1471-2105-12-S4-S5

PMC ID: 3194197

PubMed ID: 21992071

SO-VID: 89416ce4-bb8f-4d85-9df9-f81716e521ef

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Conference name: ECCB 2010 Workshop: Annotation interpretation and management of mutations (AIMM)
