A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature

Abstract

Summary: Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available, comprehensive programming library exists. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics.
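To make the parse, validate and format cycle concrete, the following is a minimal standard-library sketch for one narrow case: a single-base substitution such as `NM_001637.3:c.1582G>A`. It is not the package's API. The names `Substitution`, `parse_substitution` and `to_hgvs` are hypothetical, the accession pattern is a simplifying assumption, and real HGVS syntax (deletions, insertions, intronic offsets, protein variants) is far richer than this pattern covers.

```python
import re
from typing import NamedTuple

# Assumed, simplified pattern (illustration only): a versioned RefSeq-style
# accession, a coordinate type, a 1-based position and a single-base change.
_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+)"   # e.g. NM_001637.3
    r":(?P<seq_type>[cgmn])\."             # coding, genomic, mitochondrial, non-coding
    r"(?P<position>\d+)"                   # position on the reference sequence
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"    # reference base > alternate base
)


class Substitution(NamedTuple):
    """Hypothetical container for a parsed single-base substitution."""
    accession: str
    seq_type: str
    position: int
    ref: str
    alt: str

    def to_hgvs(self) -> str:
        """Format the variant back into an HGVS-style string."""
        return f"{self.accession}:{self.seq_type}.{self.position}{self.ref}>{self.alt}"


def parse_substitution(text: str) -> Substitution:
    """Parse and minimally validate a simple substitution string."""
    match = _SUBSTITUTION.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple HGVS substitution: {text!r}")
    if match["ref"] == match["alt"]:
        # A substitution must change the base; reject no-op descriptions.
        raise ValueError("reference and alternate bases are identical")
    return Substitution(
        match["accession"],
        match["seq_type"],
        int(match["position"]),
        match["ref"],
        match["alt"],
    )


variant = parse_substitution("NM_001637.3:c.1582G>A")
print(variant.position, variant.to_hgvs())
```

The sketch checks only syntax. Confirming that the stated reference base actually occurs at that position, or mapping the variant between transcript and genomic coordinates, requires sequence and alignment data that it does not model.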

Availability and implementation: The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs).

Contact: reecehart@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Author and article information

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (iso-abbrev): Bioinformatics

Journal ID (publisher-id): bioinformatics

Journal ID (hwp): bioinfo

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date (Print): 15 January 2015

Publication date (Electronic): 30 September 2014

Publication date PMC-release: 30 September 2014

Volume: 31

Issue: 2

Pages: 268-270

Affiliations

¹Invitae Inc., San Francisco, CA 94107 and ²23andMe Inc., Mountain View, CA 94043, USA

Author notes

*To whom correspondence should be addressed.

Associate Editor: John Hancock

Article

Publisher ID: btu630

DOI: 10.1093/bioinformatics/btu630

PMC ID: 4287946

PubMed ID: 25273102

SO-VID: 3c31559e-626f-425b-bfde-37695c35b0ca

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 26 June 2014

Date revision received: 29 August 2014

Date accepted: 17 September 2014

Page count

Pages: 3
