      GENCODE reference annotation for the human and mouse genomes


      Nucleic Acids Research

      Oxford University Press

          The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high-quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference-quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature, to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as


                Author and article information

                Nucleic Acids Res
                08 January 2019
                24 October 2018
                Volume: 47
                Issue: Database issue
                Pages: D766-D773
                [1] European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
                [2] UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
                [3] Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
                [4] Department of Medical Oncology, Inselspital, University Hospital, University of Bern, Bern, Switzerland
                [5] Department of Biomedical Research (DBMR), University of Bern, Bern, Switzerland
                [6] MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
                [7] Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
                [8] Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
                [9] Department of Bioscience, Brunel University London, Uxbridge UB8 3PH, UK
                [10] Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
                [11] Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
                [12] Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
                [13] Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT 06520, USA
                [14] Systems Biology Institute, Yale University, West Haven, CT 06516, USA
                [15] Centre of New Technologies, University of Warsaw, Warsaw, Poland
                [16] Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
                [17] Program in Computational Biology & Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
                [18] Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
                [19] Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
                [20] Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
                Author notes
                To whom correspondence should be addressed. Tel: +44 1223 492581; Fax: +44 1223 494494; Email: flicek@
                © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                Page count
                Pages: 8
                Funded by: National Human Genome Research Institute 10.13039/100000051
                Award ID: U41HG007234
                Funded by: Wellcome Trust 10.13039/100004440
                Award ID: WT108749/Z/15/Z
                Award ID: WT200990/Z/16/Z
                Database Issue


