
      Experimental Design-Based Functional Mining and Characterization of High-Throughput Sequencing Data in the Sequence Read Archive

      PLoS ONE
      Public Library of Science


          Abstract

          High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize genome sequencing, transcriptomics, and epigenetics. Sequencing data are captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects had been submitted to SRA, double the number of a year earlier. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search SRA entries and download raw sequences of interest, because the data structure is complicated and the experimental conditions accompanying the raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA without a quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and from full-text versions of journal articles, retrieved 2748 SRA ID-PMID pairs, and constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, using the Medical Subject Headings (MeSH) extracted from the articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called “Gendoo”, and we generated hyperlinks between the diseases extracted from SRA and their Gendoo feature profiles. The project, publication, and disease lists resulting from this study are available at our web service, “DBCLS SRA” (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.
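          The abstract describes two mechanical steps: pulling SRA accessions out of article full text to form SRA ID-PMID pairs, then joining those pairs with each article's MeSH headings to keep only disease terms. The Python sketch below illustrates both steps under stated assumptions; it is not the authors' pipeline. The accession regular expression, the function names (extract_accessions, sra_pmid_pairs, sra_disease_pairs), and the rule that a MeSH descriptor counts as a disease term when its tree number falls in category C (Diseases) are hypothetical choices made for illustration. It also covers only the article-text side, not IDs harvested from SRA metadata.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# ASSUMPTION (hypothetical pattern): read-archive accessions start with
# S (NCBI SRA), E (EBI ENA) or D (DDBJ), then "R", then one of
# A (submission), P (study), X (experiment), S (sample), R (run),
# followed by at least six digits.
ACCESSION_RE = re.compile(r"\b[SED]R[APXSR]\d{6,}\b")


def extract_accessions(text: str) -> Set[str]:
    """Return the set of read-archive accessions mentioned in free text."""
    return set(ACCESSION_RE.findall(text))


def sra_pmid_pairs(full_texts: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Turn {PMID: full article text} into (accession, PMID) pairs."""
    pairs: Set[Tuple[str, str]] = set()
    for pmid, text in full_texts.items():
        for acc in extract_accessions(text):
            pairs.add((acc, pmid))
    return pairs


def sra_disease_pairs(
    pairs: Iterable[Tuple[str, str]],
    mesh_by_pmid: Dict[str, List[Tuple[str, str]]],
) -> Set[Tuple[str, str]]:
    """Join (accession, PMID) pairs with each article's MeSH headings.

    mesh_by_pmid maps a PMID to (descriptor, tree number) tuples.
    ASSUMPTION: a descriptor is treated as a disease term when its tree
    number lies in MeSH category C (Diseases).
    """
    result: Set[Tuple[str, str]] = set()
    for acc, pmid in pairs:
        for term, tree_number in mesh_by_pmid.get(pmid, []):
            if tree_number.startswith("C"):
                result.add((acc, term))
    return result


if __name__ == "__main__":
    # Toy inputs: PMIDs, texts and truncated tree numbers are invented
    # for illustration only.
    texts = {
        "111": "Reads were deposited in SRA under SRP000001 (run SRR0000011).",
        "222": "This article cites no accession numbers.",
    }
    mesh = {"111": [("Neoplasms", "C04"), ("Humans", "B01")]}
    pairs = sra_pmid_pairs(texts)
    print(sorted(pairs))
    print(sorted(sra_disease_pairs(pairs, mesh)))
```

          Keeping the pairs as sets makes repeated mentions of the same accession in one article collapse to a single pair, which matches counting distinct SRA ID-PMID and SRA ID-MeSH term pairs.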

                Author and article information

                Contributors
                Role: Editor
                Journal: PLoS ONE
                Publisher: Public Library of Science (San Francisco, USA)
                ISSN: 1932-6203
                Published: 22 October 2013
                Volume 8, Issue 10, e77910
                Affiliations
                [1] Database Center for Life Science (DBCLS), Research Organization of Information and Systems (ROIS), Tokyo, Japan
                Cairo University, Egypt
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Conceived and designed the experiments: TN TO HB. Performed the experiments: TN TO. Analyzed the data: TN TO. Wrote the paper: TN HB.

                Article ID: PONE-D-13-08161
                DOI: 10.1371/journal.pone.0077910
                PMCID: PMC3805581
                PMID: 24167589
                Copyright © 2013

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 20 February 2013
                Accepted: 5 September 2013
                Page count
                Pages: 7
                Funding
                This work was supported by the Life Science Database Integration Project of the National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency (JST). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article

                Uncategorized
