      Exceptional Diversity, Non-Random Distribution, and Rapid Evolution of Retroelements in the B73 Maize Genome


          Recent comprehensive sequence analysis of the maize genome now permits detailed discovery and description of all transposable elements (TEs) in this complex nuclear environment. Reiteratively optimized structural and homology criteria were used in the computer-assisted search for retroelements, TEs that transpose by reverse transcription of an RNA intermediate, with the final results verified by manual inspection. Retroelements were found to occupy the majority (>75%) of the nuclear genome in maize inbred B73. Unprecedented genetic diversity was discovered in the long terminal repeat (LTR) retrotransposon class of retroelements, with >400 families (>350 newly discovered) contributing >31,000 intact elements. The two other classes of retroelements, SINEs (four families) and LINEs (at least 30 families), were observed to contribute 1,991 and ∼35,000 copies, respectively, or a combined ∼1% of the B73 nuclear genome. With regard to fully intact elements, the median copy number across all retroelement families in maize was 2, because >250 LTR retrotransposon families contained only one or two intact members that could be detected in the B73 draft sequence. The majority, perhaps all, of the investigated retroelement families exhibited non-random dispersal across the maize genome, with LINEs, SINEs, and many low-copy-number LTR retrotransposons exhibiting a bias for accumulation in gene-rich regions. In contrast, most (but not all) medium- and high-copy-number LTR retrotransposons were found to preferentially accumulate in gene-poor regions like pericentromeric heterochromatin, while a few high-copy-number families exhibited the opposite bias. Regions of the genome with the highest LTR retrotransposon density contained the lowest LTR retrotransposon diversity. These results indicate that the maize genome provides a great number of different niches for the survival and procreation of a great variety of retroelements that have evolved to differentially occupy and exploit this genomic diversity.
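          The median figure in the abstract follows from simple arithmetic: once more than half of all families have only one or two intact members, the median cannot exceed 2, however large the remaining families are. The sketch below illustrates this with made-up copy counts. Only the proportion of low-copy families (250 of 400) mirrors the abstract; every individual count, and the names `low_copy`, `higher_copy`, and `copies_per_family`, are hypothetical and are not data from the paper.

```python
from statistics import median

# Hypothetical intact-copy counts per retroelement family.
# Assumption: 250 of 400 families have one or two intact members,
# echoing the abstract's ">250 families"; the remaining values are
# arbitrary illustrative numbers, not measurements.
low_copy = [1] * 130 + [2] * 120                    # 250 low-copy families
higher_copy = [5] * 100 + [50] * 40 + [1000] * 10   # 150 larger families
copies_per_family = low_copy + higher_copy

# With 400 families, the median is the mean of the 200th and 201st
# sorted values, and both fall inside the block of low-copy families.
median_copies = median(copies_per_family)
total_copies = sum(copies_per_family)

print(f"families: {len(copies_per_family)}")
print(f"median intact copies per family: {median_copies}")
print(f"total intact copies: {total_copies}")
```

          In this toy distribution a handful of high-copy families account for most of the intact elements while leaving the median untouched, which is the same skew the abstract describes.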

          Author Summary

          Although TEs are a major component of all studied plant genomes, and are the most significant contributors to genome structure and evolution in almost all eukaryotes that have been investigated, their properties and reasons for existence are not well understood in any eukaryotic genome. In order to begin a comprehensive study of TE contributions to the structure, function, and evolution of both genes and genomes, we first identified all of the TEs in maize and then investigated whether there were non-random patterns in their dispersal. We used homology and TE structure criteria in an effort to discover all of the retroelements in the recently sequenced genome from maize inbred B73. We found that the retroelements are incredibly diverse in maize, with many hundreds of families that show different insertion and/or retention specificities across the maize chromosomes. Most of these element families are present in low copy numbers and had been missed by previous searches that relied on a high-copy-number criterion. Different element families exhibited very different biases for accumulation across the chromosomes, indicating that they can detect and utilize many different chromatin environments.

                Author and article information

                [1] Department of Genetics, University of Georgia, Athens, Georgia, United States of America
                [2] Department of Plant Biology, University of Georgia, Athens, Georgia, United States of America
                [3] Université de Perpignan, Via Domitia, CNRS UMR5096 LGDP, Perpignan, France
                [4] Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, Indiana, United States of America
                Fred Hutchinson Cancer Research Center, United States of America
                Author notes

                Conceived and designed the experiments: RSB JCE CC JMD RPW PJS JLB. Performed the experiments: RSB JCE CC NU AJ JMD RPW PJS JLB. Analyzed the data: RSB JCE CC NU AJ JMD RPW PJS JLB. Contributed reagents/materials/analysis tools: RSB JCE CC JMD RPW PJS JLB. Wrote the paper: RSB JCE CC JMD RPW PJS JLB.

                Role: Editor
                PLoS Genet
                PLoS Genetics
                Public Library of Science (San Francisco, USA)
                November 2009
                20 November 2009
                Volume: 5
                Issue: 11
                Baucom et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                Pages: 13
                Research Article
                Genetics and Genomics
                Genetics and Genomics/Genome Projects
                Genetics and Genomics/Genomics
                Genetics and Genomics/Plant Genomes and Evolution