      GapCoder automates the use of indel characters in phylogenetic analysis

      research-article
      BMC Bioinformatics
      BioMed Central


          Abstract

          Background

          Several ways of incorporating indels into phylogenetic analysis have been suggested. Simple indel coding has two strengths: (1) biological realism and (2) efficiency of analysis. In this method, each indel with a different start and/or end position is treated as a separate character. The presence or absence of each indel character is then added to the data set.

          Algorithm

          We have written a program, GapCoder, to automate this procedure. The program reads aligned data sets in PIR format, finds the indels, and adds the indel-based characters. The output is a NEXUS format file that includes a table showing which region each indel character is based on. If regions are excluded from analysis, this table makes it easy to identify the corresponding indel characters for exclusion.
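          The coding step described above can be illustrated with a minimal Python sketch. This is not GapCoder's source code; the function names (`gap_runs`, `simple_indel_coding`), the in-memory dictionary input (rather than a PIR file), and the use of `-` as the gap symbol are assumptions made for illustration. Scoring a taxon as `?` when a longer gap fully covers the indel region follows simple indel coding as it is usually described, and is likewise an assumption about the program's behavior. Leading and trailing gaps are not treated specially here.

```python
import re


def gap_runs(seq, gap="-"):
    """Return (start, end) pairs, 0-based and inclusive, for each run of gaps."""
    pattern = re.escape(gap) + "+"
    return [(m.start(), m.end() - 1) for m in re.finditer(pattern, seq)]


def simple_indel_coding(alignment, gap="-"):
    """Code each distinct gap (unique start/end pair) as one character.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Returns (indels, coded), where indels is the sorted list of gap regions
    (the region table) and coded maps each taxon to a string of states:
      '1' = the taxon has exactly this gap,
      '?' = a longer gap in the taxon covers this region (assumed inapplicable),
      '0' = otherwise.
    """
    runs = {name: gap_runs(seq, gap) for name, seq in alignment.items()}
    # Every gap with a different start and/or end is a separate character.
    indels = sorted({region for rs in runs.values() for region in rs})

    coded = {}
    for name, rs in runs.items():
        states = []
        for start, end in indels:
            if (start, end) in rs:
                states.append("1")
            elif any(s <= start and end <= e for s, e in rs):
                states.append("?")
            else:
                states.append("0")
        coded[name] = "".join(states)
    return indels, coded


# Hypothetical four-taxon alignment used only to exercise the sketch.
alignment = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "AC--ACGTAC",
    "taxonC": "AC--AC-TAC",
    "taxonD": "A----CGTAC",
}
indels, coded = simple_indel_coding(alignment)
for name in alignment:
    print(name, alignment[name], coded[name])
```

          The returned `indels` list plays the role of the region table: if alignment columns are later excluded, any indel character whose region overlaps the excluded columns can be looked up there and excluded as well.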

          Discussion

          Manual implementation of the simple indel coding method can be very time-consuming, especially in data sets where indels are numerous and/or overlapping. GapCoder automates this method and is therefore particularly useful when phylogenetic analyses need to be repeated many times, such as when different alignments, taxon sets, or character sets are being explored. GapCoder is currently available for Windows from http://www.home.duq.edu/~youngnd/GapCoder.


                Author and article information

                Journal
                BMC Bioinformatics
                BioMed Central (London)
                1471-2105
                19 February 2003
                Volume 4, article 6
                Affiliations
                [1] Department of Biological Sciences, Duquesne University, Pittsburgh, PA 15219, USA
                [2] Biology Department, Trinity University, 715 Stadium Dr., San Antonio, TX 78212, USA
                Article
                1471-2105-4-6
                10.1186/1471-2105-4-6
                153505
                12689349
                05ffcbce-2dec-4bc1-b7ea-438ed3b27fed
                Copyright © 2003 Young and Healy; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
                History
                29 November 2002
                19 February 2003
                Categories
                Methodology Article

                Bioinformatics & Computational biology
