
      Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers

      research-article
      BMC Bioinformatics
      BioMed Central
      ECCB 2010 Workshop: Annotation interpretation and management of mutations (AIMM)


          Abstract

          Background

          Most information on genomic variations and their associations with phenotypes is covered exclusively in scientific publications rather than in structured databases. These texts commonly describe variations in natural language; database identifiers are seldom mentioned. This complicates the retrieval of variations and their associated articles, as well as information extraction tasks such as the search for biological implications. To overcome these challenges, procedures that map textual mentions of variations to database identifiers need to be developed.
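As a rough illustration of what mapping a textual mention to a database identifier involves, the following minimal Python sketch parses three common mention styles and looks them up in a toy index. It is not the workflow described in this article: the patterns, the `normalize` function, the gene name `GENE1`, and the placeholder identifiers are all assumptions made for this example, and a real system would query dbSNP rather than a hard-coded dictionary.

```python
import re
from typing import Optional

# Hypothetical lookup table: (gene, level, position, ref, alt) -> identifier.
# The identifiers are placeholders, not real dbSNP records.
TOY_INDEX = {
    ("GENE1", "protein", 123, "A", "T"): "rsPLACEHOLDER1",
    ("GENE1", "dna", 367, "G", "A"): "rsPLACEHOLDER1",
}

# Deliberately incomplete three-letter to one-letter amino acid map.
THREE_TO_ONE = {"Ala": "A", "Thr": "T", "Val": "V", "Met": "M"}

RS_PATTERN = re.compile(r"\brs(\d+)\b", re.IGNORECASE)
PROTEIN_PATTERN = re.compile(r"\b([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\b")
DNA_PATTERN = re.compile(r"\b(\d+)\s*([ACGT])\s*(?:>|->|/)\s*([ACGT])\b")


def normalize(mention: str, gene: str) -> Optional[str]:
    """Return an identifier for a mention, or None if it cannot be resolved."""
    # Case 1: the text already contains an rs-style identifier.
    m = RS_PATTERN.search(mention)
    if m:
        return "rs" + m.group(1)
    # Case 2: protein-level mention such as "Ala123Thr".
    m = PROTEIN_PATTERN.search(mention)
    if m and m.group(1) in THREE_TO_ONE and m.group(3) in THREE_TO_ONE:
        key = (gene, "protein", int(m.group(2)),
               THREE_TO_ONE[m.group(1)], THREE_TO_ONE[m.group(3)])
        return TOY_INDEX.get(key)
    # Case 3: DNA-level mention such as "367G>A".
    m = DNA_PATTERN.search(mention)
    if m:
        key = (gene, "dna", int(m.group(1)), m.group(2), m.group(3))
        return TOY_INDEX.get(key)
    return None


if __name__ == "__main__":
    for text in ["rs1234", "Ala123Thr", "367G>A", "Val10Met"]:
        print(text, "->", normalize(text, "GENE1"))
```

Even this toy version needs the gene as extra context and returns nothing for a mention that is absent from the index, which hints at why unambiguous normalization is hard in practice.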

          Results

          This article describes a workflow for the normalization of variation mentions, i.e. their association with unique database identifiers. Common pitfalls in the interpretation of single nucleotide polymorphism (SNP) mentions are highlighted and discussed. The developed normalization procedure achieves a precision of 98.1% and a recall of 67.5% for the unambiguous association of variation mentions with dbSNP identifiers on a text corpus of 296 MEDLINE abstracts containing 527 SNP mentions.

          The annotated corpus is freely available at http://www.scai.fraunhofer.de/snp-normalization-corpus.html.
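For readers less familiar with the evaluation measures, the sketch below shows how precision and recall are typically computed when predicted (mention, identifier) pairs are compared against a gold standard, and how the two values combine into an F1 score. The function names and the toy data are assumptions for illustration only; the article itself reports only precision and recall, so the F1 value derived from them here (roughly 0.80) is a computed figure, not a result stated by the authors.

```python
def precision_recall_f1(gold: set, predicted: set) -> tuple:
    """Compare predicted (mention_id, identifier) pairs against gold pairs."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data, not taken from the corpus: four gold pairs, three predictions.
    gold = {("m1", "rs1"), ("m2", "rs2"), ("m3", "rs3"), ("m4", "rs4")}
    predicted = {("m1", "rs1"), ("m2", "rs2"), ("m3", "rs9")}
    print(precision_recall_f1(gold, predicted))

    # F1 implied by the precision and recall reported in the abstract.
    print(round(f1_score(0.981, 0.675), 3))
```

A high precision paired with a lower recall, as reported above, corresponds to a procedure that rarely assigns a wrong identifier but leaves a sizeable share of mentions unresolved.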

          Conclusions

          Comparable approaches usually focus on variations described at the protein sequence level and neglect the problems posed by other SNP mentions. The results presented here indicate that normalizing SNPs described at the DNA level is more difficult than normalizing SNPs described at the protein level. The challenges associated with normalization are illustrated with ambiguities and errors that occur in this corpus.


                Author and article information

                Conference
                BMC Bioinformatics
                BioMed Central
                ISSN 1471-2105
                5 July 2011; 12(Suppl 4): S4
                Affiliations
                [1] Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Schloss Birlinghoven, 53754 Sankt Augustin, Germany
                [2] Knowledge Management in Bioinformatics, Humboldt-University Berlin, Unter den Linden 6, 10099 Berlin, Germany
                [3] Research Unit on Biomedical Informatics (GRIB), IMIM-Hospital del Mar, UPF, PRBB, c/Dr. Aiguader 88, E-08003 Barcelona, Spain
                [4] University of Applied Science and Arts Dortmund, Department of Computer Science, Emil-Figge-Str. 42, 44227 Dortmund, Germany
                Article
                Publisher ID: 1471-2105-12-S4-S4
                DOI: 10.1186/1471-2105-12-S4-S4
                PMC ID: 3194196
                PubMed ID: 21992066
                Copyright ©2011 Thomas et al; licensee BioMed Central Ltd.

                This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                ECCB 2010 Workshop: Annotation interpretation and management of mutations (AIMM), Ghent, Belgium
                Categories
                Research

                Bioinformatics & Computational biology