Plant Proteins Are Smaller Because They Are Encoded by Fewer Exons than Animal Proteins

      Abstract

      Protein size is an important biochemical feature, since longer proteins can harbor more domains and can therefore display more biological functionalities than shorter proteins. We found remarkable differences in protein length, exon structure, and domain count among different phylogenetic lineages. While eukaryotic proteins have an average size of 472 amino acid residues (aa), average protein sizes in plant genomes are smaller than those of animals and fungi. Proteins unique to plants are ∼81 aa shorter than plant proteins conserved among other eukaryotic lineages. The smaller average size of plant proteins could not be explained by endosymbiosis, subcellular compartmentation, or exon size, but rather by exon number. Metazoan proteins are encoded on average by ∼10 exons of small size [∼176 nucleotides (nt)]. Streptophyta have on average only ∼5.7 exons of medium size (∼230 nt). Multicellular species code for large proteins by increasing the exon number, while most unicellular organisms employ larger exons (>400 nt). Among subcellular compartments, membrane proteins are the largest (∼520 aa), whereas the smallest proteins correspond to the gene ontology group of ribosome (∼240 aa). On average, plant proteins are encoded by about half the number of exons and also contain fewer domains than animal proteins. Interestingly, endosymbiotic proteins that migrated to the plant nucleus became larger than their cyanobacterial orthologs. We thus conclude that plants have proteins larger than those of bacteria but smaller than those of animals or fungi. Compared to the average of eukaryotic species, plants have ∼34% more proteins, but these are ∼20% smaller. This suggests that photosynthetic organisms are unique and therefore deserve special attention with regard to the evolutionary forces acting on their genomes and proteomes.
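
      The exon figures quoted in the abstract can be turned into a rough back-of-the-envelope check. The sketch below multiplies mean exon number by mean exon size and divides by three nucleotides per codon. It assumes that every exon nucleotide is coding (untranslated regions and the stop codon are ignored), which is a simplification not stated in the article, so the results are upper bounds rather than the authors' measured protein sizes. The function name `implied_protein_length` is hypothetical.

```python
# Rough estimate of the protein length implied by mean exon count and size.
# Assumption (not from the article): all exon nucleotides are coding, so
# UTRs and the stop codon are ignored and each value is an upper bound.

NT_PER_CODON = 3


def implied_protein_length(mean_exons: float, mean_exon_nt: float) -> float:
    """Return the protein length (aa) implied by exon count x exon size."""
    return mean_exons * mean_exon_nt / NT_PER_CODON


# Mean exon number and mean exon size (nt) as quoted in the abstract.
lineages = {
    "Metazoa": (10.0, 176.0),
    "Streptophyta": (5.7, 230.0),
}

estimates = {
    name: implied_protein_length(n_exons, exon_nt)
    for name, (n_exons, exon_nt) in lineages.items()
}

for name, aa in estimates.items():
    print(f"{name}: ~{aa:.0f} aa")
```

      Under these assumptions the estimate comes to roughly 587 aa for Metazoa and 437 aa for Streptophyta. The slightly larger plant exons do not compensate for the lower exon number, which is in line with the abstract's conclusion that exon number, not exon size, accounts for the smaller plant proteins.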

            Author and article information

            Affiliations
            [1] Genetic Engineering Department, CINVESTAV Unidad Irapuato, Irapuato, CP 36821, Mexico
            [2] Colegio de Postgraduados, Campus Montecillo, Texcoco, CP 56230, Mexico
            Author notes
            [*] Corresponding author. atiessen@ira.cinvestav.mx
            [a] ORCID: 0000-0003-3156-1209.
            [b] ORCID: 0000-0002-3202-1784.
            [c] ORCID: 0000-0003-4193-2720.
            [d] ORCID: 0000-0001-5572-4274.

            Journal
            Title: Genomics, Proteomics & Bioinformatics (Genomics Proteomics Bioinformatics)
            Publisher: Elsevier
            ISSN: 1672-0229, 2210-3244
            Published: 18 December 2016 (issue: December 2016)
            Volume: 14
            Issue: 6
            Pages: 357-370
            PMID: 27998811
            PMCID: 5200936
            PII: S1672-0229(16)30187-5
            DOI: 10.1016/j.gpb.2016.06.003
            © 2016 The Authors

            This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

            Categories
            Original Research