
Universal features in the genome-level evolution of protein domains



      Novel protein domain stochastic duplication/innovation models that are independent of genome-specific features are used to interpret global trends of genome evolution.



      Protein domains can be used to study proteome evolution at a coarse scale. In particular, their populations in genomes follow notable statistical distributions. It is known that the number of domains with a given topology follows a power-law distribution. We focus on a further aspect: these distributions, and the number of distinct topologies, follow collective trends, or scaling laws, that depend only on the total number of domains and not on genome-specific features.


      We present a stochastic duplication/innovation model, in the class of the so-called 'Chinese restaurant processes', that explains this observation with two universal parameters, representing a minimal number of domains and the relative weight of innovation to duplication. Furthermore, we study a model variant where new topologies are related to occurrence in genomic data, accounting for fold specificity.
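      As a rough illustration of the kind of process described above, the following is a minimal sketch of a standard two-parameter Chinese restaurant process. It is not the authors' implementation: the abstract does not give the model's equations, so the update rule, the function name simulate_crp, and the parameter names alpha and theta are assumptions taken from the textbook form of the process. Here theta loosely plays the role of a characteristic minimal size, and alpha sets the weight of innovation (a new topology) relative to duplication (a new member of an existing topology).

```python
import random


def simulate_crp(n_domains, alpha=0.3, theta=1.0, seed=0):
    """Grow a hypothetical genome one domain at a time.

    Assumed textbook two-parameter Chinese restaurant process
    (0 <= alpha < 1, theta > 0); not taken from the paper itself.
    Returns a list whose i-th entry is the number of domains
    belonging to topology (domain class) i.
    """
    rng = random.Random(seed)
    class_sizes = []
    for n in range(n_domains):
        n_classes = len(class_sizes)
        # Innovation: the new domain founds a new topology.
        p_innovation = (theta + alpha * n_classes) / (n + theta)
        if rng.random() < p_innovation:
            class_sizes.append(1)
        else:
            # Duplication: pick an existing class i with weight (k_i - alpha),
            # so large classes are favoured (a rich-get-richer rule).
            weights = [k - alpha for k in class_sizes]
            i = rng.choices(range(n_classes), weights=weights)[0]
            class_sizes[i] += 1
    return class_sizes


if __name__ == "__main__":
    for size in (100, 1000, 5000):
        sizes = simulate_crp(size)
        print(size, "domains ->", len(sizes), "topologies; largest class:", max(sizes))
```

      Under these assumptions the number of distinct topologies grows more slowly than the total number of domains, and the class sizes are broadly distributed, which is the qualitative behaviour the abstract attributes to the data.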


      Both models agree quantitatively, in general terms, with data from hundreds of genomes, which indicates that the domains of a genome are built with a combination of specificity and robust self-organizing phenomena. The latter are related to the basic evolutionary 'moves' of duplication and innovation, and give rise to the observed scaling laws independently of the specific evolutionary history of a genome. We interpret this as the concurrent effect of neutral and selective drives, which increase duplication and decrease innovation in larger and more complex genomes. The validity of our model would imply that the empirical observation of a small number of folds in nature may be a consequence of their evolution.


            Author and article information

            [1 ]Università degli Studi di Milano, Dip. Fisica. Via Celoria 16, 20133 Milano, Italy
            [2 ]INFN, Via Celoria 16, 20133 Milano, Italy
            Genome Biology (Genome Biol)
            BioMed Central
            30 January 2009
            Volume 10, Issue 1, Article R12
            Copyright © 2009 Cosentino Lagomarsino et al.; licensee BioMed Central Ltd.

            This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



