
      Workflow management systems for gene sequence analysis and evolutionary studies – A Review

      research-article
      Bioinformation
      Biomedical Informatics
      Analysis, bioinformatics, databases, phylogeny, integration, workflows


          Abstract

          The post-‘omic’ era has seen the development of many primary, secondary and derived databases. Many analytical and visualization bioinformatics tools have been developed to manage and analyze the data produced by large sequencing projects. The heterogeneity of these databases and tools makes it difficult for researchers to retrieve information from varied sources and to run the different bioinformatics tools needed for a given analysis. Building integrated bioinformatics platforms, which bring together various databases, tools and algorithms, is one of the most challenging tasks the bioinformatics community faces. This article describes the bioinformatics analysis workflow management systems that have been developed for gene sequence analysis and phylogeny. It will be useful for biotechnologists, molecular biologists, computer scientists and statisticians engaged in computational biology and bioinformatics research.
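          A workflow management system chains independent tools so that each one runs only once its inputs are ready. As a rough illustration only, and not the design of any particular system reviewed here, such a workflow can be modelled as a dependency graph and executed in topological order. The sketch below uses Python's standard-library `graphlib` (Python 3.9+); the step names and toy functions are hypothetical placeholders for real database queries, aligners and tree builders.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# A step receives the outputs of all steps run so far and returns its own output.
Step = Callable[[Dict[str, object]], object]


def run_workflow(steps: Dict[str, Step], deps: Dict[str, List[str]]) -> Dict[str, object]:
    """Run every step after the steps it depends on, collecting outputs by step name."""
    graph = {name: deps.get(name, []) for name in steps}
    unknown = {d for ds in graph.values() for d in ds} - set(steps)
    if unknown:
        raise KeyError(f"undeclared dependencies: {sorted(unknown)}")
    results: Dict[str, object] = {}
    # static_order() yields each node after all of its predecessors and
    # raises graphlib.CycleError if the dependencies form a cycle.
    for name in TopologicalSorter(graph).static_order():
        results[name] = steps[name](results)
    return results


# Toy usage: hypothetical stand-ins for a fetch, an alignment and a tree step.
steps: Dict[str, Step] = {
    "fetch": lambda r: ["seqA", "seqB", "seqC"],
    "align": lambda r: [s.upper() for s in r["fetch"]],
    "tree": lambda r: "tree(" + ",".join(r["align"]) + ")",
}
deps = {"align": ["fetch"], "tree": ["align"]}
out = run_workflow(steps, deps)
print(out["tree"])  # tree(SEQA,SEQB,SEQC)
```

          Declaring dependencies rather than a fixed call sequence is what lets a workflow engine reorder, parallelize or resume steps; a real system would also handle tool invocation, data formats and failures.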


          Most cited references (14)


          TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets.

          TGICL is a pipeline for analysis of large Expressed Sequence Tags (EST) and mRNA databases in which the sequences are first clustered based on pairwise sequence similarity, and then assembled by individual clusters (optionally with quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures including SMP and PVM.
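          The first of the two stages described above, clustering by pairwise similarity, can be illustrated on its own. The sketch below is not TGICL's code: it assumes pairwise hits are already available as (id, id, percent identity) tuples and groups sequences by single-linkage clustering with a union-find structure, one common way to form such clusters. The 95% threshold is a hypothetical default, and the assembly of each cluster into a consensus is not shown.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_by_similarity(
    ids: Iterable[str],
    hits: Iterable[Tuple[str, str, float]],
    min_identity: float = 95.0,  # hypothetical threshold, not a TGICL default
) -> List[List[str]]:
    """Single-linkage clustering: any pair at or above min_identity shares a cluster."""
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every hit must refer to ids listed in `ids`; unknown ids raise KeyError.
    for a, b, identity in hits:
        if identity >= min_identity:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    clusters: Dict[str, List[str]] = {}
    for i in parent:
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first; ties broken by member names.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Toy usage with made-up EST identifiers and identities.
ids = ["est1", "est2", "est3", "est4", "est5"]
hits = [("est1", "est2", 98.5), ("est2", "est3", 96.0), ("est4", "est5", 80.0)]
clusters = cluster_by_similarity(ids, hits)
print(clusters)  # [['est1', 'est2', 'est3'], ['est4'], ['est5']]
```

          Note that est1 and est3 end up together without a direct hit between them; this transitive grouping is characteristic of single linkage.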

            PseudoPipe: an automated pseudogene identification pipeline.

            Mammalian genomes contain many 'genomic fossils' i.e. pseudogenes. These are disabled copies of functional genes that have been retained in the genome by gene duplication or retrotransposition events. Pseudogenes are important resources in understanding the evolutionary history of genes and genomes. We have developed a homology-based computational pipeline ('PseudoPipe') that can search a mammalian genome and identify pseudogene sequences in a comprehensive and consistent manner. The key steps in the pipeline involve using BLAST to rapidly cross-reference potential "parent" proteins against the intergenic regions of the genome and then processing the resulting "raw hits" -- i.e. eliminating redundant ones, clustering together neighbors, and associating and aligning clusters with a unique parent. Finally, pseudogenes are classified based on a combination of criteria including homology, intron-exon structure, and existence of stop codons and frameshifts.
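            The processing of raw hits described above can be sketched under strong simplifying assumptions. Each raw hit is a hypothetical `Hit` record (chromosome, strand, half-open coordinates, parent protein); only exact duplicates are treated as redundant; and hits to the same parent on the same strand are merged when they overlap or lie within a hypothetical `max_gap` of each other. PseudoPipe's actual redundancy and clustering criteria, and its later alignment and classification steps, are not reproduced here.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional


@dataclass(frozen=True)
class Hit:
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    parent: str


@dataclass
class HitCluster:
    chrom: str
    strand: str
    parent: str
    start: int
    end: int
    n_hits: int


def cluster_hits(hits: List[Hit], max_gap: int = 1000) -> List[HitCluster]:
    # Exact duplicates are redundant, so drop them; then sort so that hits sharing
    # (chrom, strand, parent) are contiguous and ordered by position for groupby.
    unique = sorted(set(hits), key=lambda h: (h.chrom, h.strand, h.parent, h.start, h.end))
    clusters: List[HitCluster] = []
    for (chrom, strand, parent), group in groupby(
        unique, key=lambda h: (h.chrom, h.strand, h.parent)
    ):
        current: Optional[HitCluster] = None
        for h in group:
            if current is not None and h.start - current.end <= max_gap:
                # Overlapping or nearby hit to the same parent: extend the cluster.
                current.end = max(current.end, h.end)
                current.n_hits += 1
            else:
                current = HitCluster(chrom, strand, parent, h.start, h.end, 1)
                clusters.append(current)
    return clusters


# Toy usage with made-up coordinates.
hits = [
    Hit("chr1", "+", 100, 300, "P1"),
    Hit("chr1", "+", 100, 300, "P1"),    # exact duplicate, dropped
    Hit("chr1", "+", 800, 950, "P1"),    # 500 bp downstream, same cluster
    Hit("chr1", "+", 5000, 5200, "P1"),  # far away, new cluster
    Hit("chr1", "-", 120, 280, "P2"),
]
result = cluster_hits(hits, max_gap=1000)
for c in result:
    print(c.chrom, c.strand, c.parent, c.start, c.end, c.n_hits)
```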

              Hal: an Automated Pipeline for Phylogenetic Analyses of Genomic Data

              The rapid increase in genomic and genome-scale data is resulting in unprecedented levels of discrete sequence data available for phylogenetic analyses. Major analytical impasses exist, however, prior to analyzing these data with existing phylogenetic software. Obstacles include the management of large data sets without standardized naming conventions, identification and filtering of orthologous clusters of proteins or genes, and the assembly of alignments of orthologous sequence data into individual and concatenated super alignments. Here we report the production of an automated pipeline, Hal, that produces multiple alignments and trees from genomic data. These alignments can be produced by a choice of four alignment programs and analyzed by a variety of phylogenetic programs. In short, the Hal pipeline connects the programs BLASTP, MCL, user-specified alignment programs, GBlocks, ProtTest and user-specified phylogenetic programs to produce species trees. The script is available at sourceforge (http://sourceforge.net/projects/bio-hal/). The results from an example analysis of Kingdom Fungi are briefly discussed.
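              One step named above, assembling per-gene alignments into a concatenated super alignment, is straightforward to sketch. The function below is an illustrative assumption, not part of Hal: it takes one dictionary per gene (taxon name to aligned sequence) and pads taxa missing from a gene with gap characters so every row of the supermatrix has the same length. The taxon names and sequences in the usage lines are toy values.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence (all rows equal length)


def concatenate_alignments(alignments: List[Alignment], gap: str = "-") -> Alignment:
    """Join per-gene alignments into one supermatrix, padding missing taxa with gaps."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each alignment must be non-empty with equal-length rows")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon absent from this gene contributes a run of gaps of the same width.
            pieces[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}


# Toy usage: two "genes", one of which lacks TaxonB.
gene1 = {"TaxonA": "MKV-L", "TaxonB": "MKVAL", "TaxonC": "MRV-L"}
gene2 = {"TaxonA": "TTG", "TaxonC": "TSG"}
supermatrix = concatenate_alignments([gene1, gene2])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

              Padding with gaps rather than dropping incomplete taxa is one common choice; a stricter pipeline might instead keep only orthologous clusters present in every taxon before concatenating.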

                Author and article information

                Journal
                Bioinformation
                Biomedical Informatics
                ISSN: 0973-8894; 0973-2063
                Published: 17 July 2013
                Volume 9, Issue 13, pages 663-672
                Affiliations
                Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, Library Avenue, Pusa, New Delhi - 110012
                Article ID: 97320630009663
                DOI: 10.6026/97320630009663
                PMC ID: 3732438
                PMID: 23930017
                ScienceOpen ID: c8752187-b4dc-4e1e-a7fe-b1e5eec0a71b
                © 2013 Biomedical Informatics

                This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.

                History
                Received: 19 February 2013
                Accepted: 23 February 2013
                Categories
                Current Trends

                Bioinformatics & Computational biology
