      CLU: A new algorithm for EST clustering

      research-article
      Ptitsyn [1], Hide [2]
      BMC Bioinformatics
      BioMed Central
      Second Annual MidSouth Computational Biology and Bioinformatics Society Conference. Bioinformatics: a systems approach
      7–9 October 2004


          Abstract

          Background

          The continuous flow of expressed sequence tag (EST) data remains one of the richest sources of discoveries in modern biology. The first step in EST data mining is usually EST clustering: grouping the original fragments according to their annotation or their similarity to known genomic DNA or to each other. Clustered EST data, accumulated in databases such as UniGene, STACK and the TIGR Gene Indices, have proven crucial in research areas ranging from gene discovery to the regulation of gene expression.
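As an illustration of what grouping fragments by similarity to each other can mean in practice, here is a minimal sketch of single-linkage clustering by shared k-mers. It is not the CLU algorithm or d2_cluster. The function names, the word length `k`, and the `min_shared` threshold are all assumptions chosen for the example, and reverse-complement matches are ignored for brevity.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_shared_kmers(seqs, k=8, min_shared=3):
    """Group sequence indices by single linkage.

    Two sequences are linked when they share at least `min_shared`
    distinct k-mers (an assumed, illustrative similarity criterion).
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmers(s, k) for s in seqs]
    for a, b in combinations(range(len(seqs)), 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


ests = [
    "ACGTACGGTCAGTTCA",  # overlaps the next fragment
    "TACGGTCAGTTCAGGA",
    "TTTTGGGGCCCCAAAA",  # unrelated fragment
]
print(cluster_by_shared_kmers(ests))
```

The all-pairs comparison is quadratic in the number of sequences, which is acceptable only for small inputs; production tools index the words first.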

          Results

          We have developed a new nucleotide sequence matching algorithm, and an implementation of it, for clustering EST sequences. The program is based on the original CLU match detection algorithm, which performs better than the widely used d2_cluster. The CLU algorithm automatically ignores low-complexity regions such as poly-tracts and short tandem repeats.
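To make the notion of low-complexity regions concrete, the sketch below masks poly-tracts and short tandem repeats before any matching would take place. The abstract does not describe how CLU detects such regions, so this regular-expression approach is a stand-in, and the unit length (1 to 4 bases) and the minimum of four consecutive copies are assumed thresholds.

```python
import re

# A unit of 1-4 bases followed by at least three more copies of itself,
# i.e. four or more consecutive copies in total (assumed thresholds).
_LOW_COMPLEXITY = re.compile(r"([ACGT]{1,4}?)\1{3,}")


def mask_low_complexity(seq, mask_char="N"):
    """Replace poly-tracts and short tandem repeats with mask_char."""
    return _LOW_COMPLEXITY.sub(
        lambda m: mask_char * len(m.group(0)), seq.upper()
    )


poly_a = "ACGTTGC" + "A" * 10 + "GGCT"   # fragment with a poly-A tract
ca_repeat = "GGATC" + "CA" * 5 + "TTGA"  # fragment with a (CA)n repeat
print(mask_low_complexity(poly_a))
print(mask_low_complexity(ca_repeat))
```

Masked positions can then be skipped when comparing words, so that two unrelated ESTs are not joined merely because both carry a poly-A tail.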

          Conclusion

          CLU represents a new generation of EST clustering algorithms with improved performance over current approaches. An early implementation can be applied in small and medium-size projects. The CLU program is open source and available free of charge. It can be downloaded from http://compbio.pbrc.edu/pti


                Author and article information

                Conference
                BMC Bioinformatics
                BioMed Central (London)
                1471-2105
                15 July 2005; 6(Suppl 2): S3
                Affiliations
                [1] Pennington Biomedical Research Center, 6400 Perkins Rd., Baton Rouge, LA 70808
                [2] South African National Bioinformatics Institute, P/b X17 UWC SANBI Bellville 7535
                Article
                1471-2105-6-S2-S3
                10.1186/1471-2105-6-S2-S3
                1637039
                16026600
                c67ddb45-614e-42e1-911c-48a69b7f2f93
                Copyright © 2006 Ptitsyn and Hide; licensee BioMed Central Ltd.

                This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Second Annual MidSouth Computational Biology and Bioinformatics Society Conference. Bioinformatics: a systems approach
                Little Rock, AR, USA
                7–9 October 2004
                Categories
                Proceedings

                Bioinformatics & Computational biology
