      Thousands of human non-AUG extended proteoforms lack evidence of evolutionary selection among mammals

      research-article

          Abstract

          The synthesis of most proteins begins at AUG codons, yet a small number of non-AUG initiated proteoforms are also known. Here we analyse a large number of publicly available Ribo-seq datasets to identify novel, previously uncharacterised non-AUG proteoforms using the Trips-Viz implementation of a novel algorithm for detecting translated ORFs. In parallel, we analyse a genomic alignment of 120 mammals to identify evidence of protein-coding evolution in sequences encoding potential extensions. Unexpectedly, we find that the number of non-AUG proteoforms identified with ribosome profiling data greatly exceeds the number with strong phylogenetic support, suggesting their recent evolution. Our study argues that the protein-coding potential of the human genome greatly exceeds that detectable through comparative genomics and exposes the existence of multiple proteins encoded by the same genomic loci.

          Abstract

          Analysis of a large number of Ribo-seq datasets and genomic alignments led to the detection of novel non-AUG proteoforms. Unexpectedly, the number of non-AUG proteoforms identified with Ribo-seq greatly exceeds the number with strong phylogenetic support.

                Author and article information

                Contributors
                120220049@umail.ucc.ie
                p.baranov@ucc.ie
                Journal
                Nat Commun
                Nature Communications
                Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 23 December 2022
                Volume: 13
                Article number: 7910
                Affiliations
                [1] School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland (GRID grid.7872.a; ISNI 0000000123318773)
                [2] SFI Centre for Research Training in Genomics Data Science, University College Cork, Cork, Ireland (GRID grid.7872.a; ISNI 0000000123318773)
                [3] Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, RAS, Moscow, Russia (GRID grid.418853.3; ISNI 0000 0004 0440 1573)
                [4] Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia (GRID grid.14476.30; ISNI 0000 0001 2342 9668)
                [5] European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK (GRID grid.52788.30; ISNI 0000 0004 0427 7672)
                Author information
                http://orcid.org/0000-0001-5996-7855
                http://orcid.org/0000-0001-5320-3045
                http://orcid.org/0000-0003-4789-7495
                http://orcid.org/0000-0001-9017-0270
                Article
                Publisher ID: 35595
                DOI: 10.1038/s41467-022-35595-6
                PMCID: 9789052
                PMID: 36564405
                Record ID: 63cc3793-bad6-48fd-94eb-392a6bd92c28
                © The Author(s) 2022

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 1 June 2022
                Accepted: 12 December 2022
                Funding
                Funded by: FundRef https://doi.org/10.13039/100004440, Wellcome Trust (Wellcome);
                Award ID: 210692/Z/18/Z
                Funded by: FundRef https://doi.org/10.13039/501100001602, Science Foundation Ireland (SFI);
                Award ID: 20/FFP-A/8929
                Award ID: 18/CRT/6214
                Funded by: FundRef https://doi.org/10.13039/501100002081, Irish Research Council (An Chomhairle um Thaighde in Éirinn);
                Funded by: SFI-HRB-Wellcome Trust Biomedical Research Partnership, grant number 210692/Z/18/Z
                Funded by: FundRef https://doi.org/10.13039/501100001596, Irish Research Council for Science, Engineering and Technology (IRCSET);
                Funded by: FundRef https://doi.org/10.13039/501100006769, Russian Science Foundation (RSF);
                Award ID: 19-14-00152
                Funded by: FundRef https://doi.org/10.13039/100013060, European Molecular Biology Laboratory (EMBL Heidelberg);
                Categories
                Article
                Keywords
                sequence annotation, genome informatics, proteome, molecular evolution, translation
