      Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

      research-article
      PLoS Biology
      Public Library of Science

          Abstract

          The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

          Abstract

          An international team has systematically validated and annotated just over 21,000 human genes using full-length cDNA, thereby providing a valuable new resource for the human genetics community.


                Author and article information

                Journal
                PLoS Biol
                pbio
                PLoS Biology
                Public Library of Science (San Francisco, USA)
                1544-9173
                1545-7885
                June 2004
                20 April 2004
                Volume: 2
                Issue: 6
                Pages: e162
                Affiliations
                [1] Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
                [2] Bioinformatics Laboratory, Genome Research Department, National Institute of Agrobiological Sciences, Ibaraki, Japan
                [3] Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
                [4] EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
                [5] Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
                [6] Nara Institute of Science and Technology, Nara, Japan
                [7] Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
                [8] BITS Company, Shizuoka, Japan
                [9] Quantum Bioinformatics Group, Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Kyoto, Japan
                [10] Reverse Proteomics Research Institute, Chiba, Japan
                [11] Central Research Laboratory, Hitachi, Tokyo, Japan
                [12] Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
                [13] National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
                [14] Centre National de la Recherche Scientifique (CNRS), Laboratoire de Physique Mathematique, Montpellier, France
                [15] The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
                [16] National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
                [17] Department of Biological Sciences, Idaho State University, Pocatello, Idaho, United States of America
                [18] Korea Research Institute of Bioscience and Biotechnology, Taejeon, Korea
                [19] Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
                [20] Genexpress—CNRS—Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
                [21] Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
                [22] Tokyo Research Laboratories, Kyowa Hakko Kogyo Company, Tokyo, Japan
                [23] MIPS—Institute for Bioinformatics, GSF—National Research Center for Environment and Health, Neuherberg, Germany
                [24] Centre for Bioinformatics and Biological Computing, School of Information Technology, Murdoch University, Murdoch, Western Australia, Australia
                [25] Medical Education and Biomedical Research Facility, University of Iowa, Iowa City, Iowa, United States of America
                [26] Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
                [27] Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
                [28] HUGO Gene Nomenclature Committee, University College London, London, United Kingdom
                [29] Genome Science Laboratory, RIKEN, Saitama, Japan
                [30] Ludwig Institute of Cancer Research, Sao Paulo, Brazil
                [31] CNRS, Vandoeuvre les Nancy, France
                [32] Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
                [33] Department of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
                [34] Swiss Institute of Bioinformatics, Geneva, Switzerland
                [35] Bioresource Information Division, RIKEN BioResource Center, RIKEN Tsukuba Institute, Ibaraki, Japan
                [36] Genome Knowledgebase, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
                [37] Chinese National Human Genome Center at Shanghai, Shanghai, China
                [38] Division of Genetic Resources, National Institute of Infectious Diseases, Tokyo, Japan
                [39] Graduate School of Frontier Sciences, Department of Integrated Biosciences, University of Tokyo, Chiba, Japan
                [40] Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
                [41] Department of Primary Care and Population Sciences, Royal Free University College Medical School, University College London, London, United Kingdom
                [42] Clinical and Molecular Genetics Unit, The Institute of Child Health, London, United Kingdom
                [43] Department of Genetic Information, Division of Molecular Life Science, School of Medicine, Tokai University, Kanagawa, Japan
                [44] South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
                [45] Kazusa DNA Research Institute, Chiba, Japan
                [46] RZPD Resource Center for Genome Research, Heidelberg, Germany
                [47] Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
                [48] Pennsylvania State University, University Park, Pennsylvania, United States of America
                [49] Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
                [50] Medical Photobiology Department, Photon Medical Research Center, Hamamatsu University School of Medicine, Shizuoka, Japan
                [51] Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
                [52] Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
                [53] Department of Biological Sciences, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Kanagawa, Japan
                [54] Global Scientific Information and Computing Center, Tokyo Institute of Technology, Tokyo, Japan
                [55] Molecular Biology Laboratory, Medicinal Research Laboratories, Taisho Pharmaceutical Company, Saitama, Japan
                [56] Department of Population Genetics, National Institute of Genetics, Shizuoka, Japan
                [57] Human Genome Research Group, Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
                [58] Columbia University and Columbia Genome Center, New York, New York, United States of America
                [59] Department of Biotechnology, Royal Institute of Technology, Stockholm, Sweden
                [60] Biology Division and Genome Task Group, Office of Biological and Environmental Research, United States Department of Energy, Washington, D.C., United States of America
                [61] Faculty of Bio-Science, Nagahama Institute of Bio-Science and Technology, Shiga, Japan
                [62] Institute for Genomic Research, Rockville, Maryland, United States of America
                [63] Center for Genome Information, Department of Environmental Health, University of Cincinnati, Cincinnati, Ohio, United States of America
                [64] State Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, Rui-Jin Hospital, Shanghai Second Medical University, Shanghai, China
                [65] PointOne Systems, Wauwatosa, Wisconsin, United States of America
                [66] Graduate School of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan
                [67] Department of Genetics, Graduate University for Advanced Studies, Shizuoka, Japan
                [68] Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
                Article
                DOI: 10.1371/journal.pbio.0020162
                PMC ID: 393292
                PubMed ID: 15103394
                Copyright: © 2004 Imanishi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
                History
                Received: 19 December 2003
                Accepted: 1 April 2004
                Categories
                Research Article
                Bioinformatics/Computational Biology
                Genetics/Genomics/Gene Therapy
                Homo (Human)

                Life sciences
