
      OMGene: mutual improvement of gene models through optimisation of evolutionary conservation

      Research article, BMC Genomics, BioMed Central
      Genome annotation, Annotation errors, Orthogroups, Orthology, Gene model


          Abstract

          Background

          The accurate determination of the genomic coordinates for a given gene, its gene model, is of vital importance to the utility of its annotation and to the accuracy of bioinformatic analyses derived from it. Currently available methods of computational gene prediction, while on the whole successful, frequently disagree on the model for a given predicted gene, and some or all of the variant gene models often fail to match the biologically observed structure. Many prediction methods can be bolstered by experimental data such as RNA-seq. However, these resources are not always available, and they rarely give a comprehensive portrait of an organism’s transcriptome because expression profiles are temporal and tissue-specific.

          Results

          Orthology between genes provides evolutionary evidence to guide the construction of gene models. OMGene (Optimise My Gene) aims to improve gene model accuracy in the absence of experimental data by optimising the consistency of multiple sequence alignments of orthologous genes from multiple species. Using RNA-seq data sets from plants, mammals, and fungi, considering intron/exon junction representation and exon coverage, and assessing the intra-orthogroup consistency of subcellular localisation predictions, we demonstrate the utility of OMGene for improving gene models in annotated genomes.
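
          The idea of preferring the gene model that makes an alignment of orthologous proteins more consistent can be illustrated with a small sketch. This is not OMGene's actual algorithm or scoring function: the sum-of-pairs identity score, the function names, and the toy sequences are all assumptions made for illustration, and the candidate gene models are assumed to be already translated and aligned to the orthologues.

```python
from itertools import combinations


def column_score(column):
    """Fraction of residue pairs in one alignment column that are identical.

    Gaps ('-') never count as a match. Illustrative scoring only.
    """
    pairs = list(combinations(column, 2))
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b and a != "-")
    return matches / len(pairs)


def alignment_score(rows):
    """Mean column score over an alignment given as equal-length strings."""
    if not rows or len({len(r) for r in rows}) != 1 or len(rows[0]) == 0:
        raise ValueError("rows must be non-empty strings of equal length")
    return sum(column_score(col) for col in zip(*rows)) / len(rows[0])


def best_candidate(orthologs, candidates):
    """Return the name of the candidate whose aligned protein sequence
    gives the most consistent alignment with the orthologues."""
    return max(
        candidates,
        key=lambda name: alignment_score(orthologs + [candidates[name]]),
    )


# Hypothetical, pre-aligned toy sequences (not real data).
orthologs = ["MKTAYIAK", "MKTAYIAK"]
candidates = {
    "current": "MK---IAK",   # model that skips a conserved exon
    "revised": "MKTAYIAK",   # model that includes it
}
chosen = best_candidate(orthologs, candidates)
```

          In this toy case the revised model is chosen because it restores the conserved region that the current model omits. A real implementation would also have to generate candidate models from the genome sequence and realign them, which this sketch does not attempt.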

          Conclusions

          We show that significant improvements in the accuracy of gene model annotations can be made, both in established and in de novo annotated genomes, by leveraging information from multiple species.


                Author and article information

                Contributors
                +44 (0) 1865 275123, steven.kelly@plants.ox.ac.uk
                Journal
                BMC Genomics
                BioMed Central (London)
                ISSN 1471-2164
                Published 27 April 2018
                Volume 19, article 307
                Affiliations
                ISNI 0000 0004 1936 8948, GRID grid.4991.5, Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
                Author information
                http://orcid.org/0000-0001-8583-5362
                Article
                Publisher ID: 4704
                DOI: 10.1186/s12864-018-4704-z
                PMCID: 5923031
                PMID: 29703150
                f1231a29-0417-4ba2-be6a-3452fddee96d
                © The Author(s). 2018

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 17 November 2017
                Accepted: 19 April 2018
                Funding
                Funded by: H2020 European Research Council (FundRef http://dx.doi.org/10.13039/100010663)
                Award ID: 637765
                Funded by: Engineering and Physical Sciences Research Council (FundRef http://dx.doi.org/10.13039/501100000266)
                Award ID: EP/G03706X/1
                Categories
                Software
                Custom metadata
                Genetics
