      Proceedings of the 2014 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference

      introduction
      BMC Bioinformatics
      BioMed Central
      11th Annual MCBIOS Conference
      6-8 March 2014


          Abstract

Introduction

The MidSouth Computational Biology and Bioinformatics Society (MCBIOS 2014) held its eleventh annual conference at the Wes Watkins Center at Oklahoma State University, Stillwater on March 7-8, 2014. The theme was "From Genome to Phenome: Connecting the Dots". The Conference Chair this year was Rakesh Kaundal, who is also an MCBIOS board member, and the conference committee members were Ulrich K. Melcher and Doris Kupfer. The current president is Andy Perkins, and Cesar Compadre was elected President-Elect for 2015-16. There were 154 registrants and a total of 125 abstracts submitted (50 oral and 75 poster presentations).

Keynote speakers were Owen White from The University of Maryland School of Medicine, whose talk was titled "The Human Microbiome Project: Large-Scale Data Management and Analysis", and Jody Hey from Temple University, whose talk was titled "Designing Genealogy Samplers for Population Genetics". Dr. William Slikker, Director of the Food and Drug Administration's National Center for Toxicological Research, concluded with a talk on the past ten years of MCBIOS and a perspective on its future. Participants also had the opportunity to attend a workshop on next-generation sequencing (NGS), hosted by Peter Hoyt of OSU and Dr. Graham Wiley of the Oklahoma Medical Research Foundation. The workshop included a keynote on genome annotation by Dr. Joshua Orvis of The University of Maryland School of Medicine and Johns Hopkins University.

The winners of conference awards were:

Best Paper Award:
• Zongliang Yue, Ping Wan, Hui Huang, Zhan Xie and Jake Y. Chen for "SLDR: A Method to Identify New Gene Regulatory Relationship Candidates" [1]

Best Paper Runner-up:
• Nam S Vo and Vinhthuy Phan for "Exploiting dependencies of pairwise-comparison outcomes to predict patterns of gene response" [2]

Best Oral Presentations (Post-Doctoral fellows):
• Michael A. Bauer, University of Arkansas for Medical Sciences
• Erich A. Peterson, University of Arkansas for Medical Sciences

Best Oral Presentations (students):
• Karl Walker, Arkansas State University, 1st place
• Shraddha Thakkar, University of Arkansas for Medical Sciences, 2nd place
• Mihir Jaiswal, University of Arkansas at Little Rock, 3rd place

Best Poster (Computation):
• Stephen Reichley, Mississippi State University, 1st place
• Kushal Bohra, Texas A&M University at Commerce, 2nd place
• Austin McCullough, John Brown University, 3rd place

Best Poster (Biology):
• Shraddha Thakkar, University of Arkansas for Medical Sciences, 1st place
• Sunetra Das, University of Oklahoma, 2nd place (tie)
• Garima Saxena, University of North Texas, 2nd place (tie)
• Kangmei Zhao, University of Oklahoma, 2nd place (tie)

Selecting papers for the MCBIOS XI Proceedings

All papers were anonymously peer-reviewed by at least 2 reviewers and quantitatively evaluated on three criteria (Novelty, Impact and Clarity), which enabled the papers to be ranked. Editors who were also co-authors of submitted papers were not permitted to handle their own papers editorially. This year, 22 papers were submitted and 16 were deemed acceptable by reviewers, giving an acceptance rate of 73%, higher than last year's 59%. Papers generally fell into four categories:

Networks and pathways

Zongliang Yue et al. [1] developed a new computational technique called Step-Level Differential Response (SLDR) to identify genetic regulatory relationships. This technique takes advantage of functional genomics data for the same species under different perturbation conditions, and is therefore complementary to currently popular computational techniques. In particular, it can identify "rare" activation/inhibition relationship events that can be difficult to find in experimental results. SLDR is computationally efficient, with O(N^2) complexity, and may be applied to the mining of functional genomics big data for future network biology and network medicine applications.
The manuscript by Toby and colleagues [3] reports some interesting observations regarding the possible role of the SigB regulon in the divergence of members of the Bacillus cereus group, based on the protein-coding content of 25 completed whole genomes of B. cereus group isolates. The authors used cluster analysis of orthologous proteins to reconstruct the clade structure of the B. cereus group, and found that the resulting structure follows the pattern of which genes do or do not belong to the SigB regulon. Their observation suggests the hypothesis that horizontal gene transfer, gene duplication/divergence and deletion dictate the underlying coding capacity in these genomes, and that this is likely a common pattern in prokaryotic evolution.

Peterson et al. developed a workflow based on whole exome sequencing to discover clonal lineages in tumor samples. They use samples from a single multiple myeloma patient, taken at initial presentation and at two subsequent relapses, to arrive at key single-nucleotide and insertion/deletion events in key genes that presumably gave rise to the initial presentation and the relapses. What is unique is the visualization tool, CloneViz, which displays these events by chromosome and applies Gaussian kernel density plots for easy comparison between samples. The benefit is a better understanding of the evolution of a cancer [4].

Genomics & transcriptomics

In light of a shrinking NIH budget, researchers can stretch funds further if they can decrease the number of samples used for gene expression profiling. The paper by Vo and Phan [2] addresses the issue of small sample size in microarray experiments that contain measurements of multiple responses. Although they developed their method for microarray studies, it can readily be scaled to gene expression profiling using RNA sequencing technology. The latter requires complex data processing, made easy by the paper of Peng et al.
Their software, SeqAssist [5], processes raw FASTQ files and extracts useful biological data. Zhang et al. used two other state-of-the-art pipelines to analyze whole genomes obtained by sequencing 35 Korean individuals [6]. They not only identified genomic variants specific to the Korean population, but also analyzed the functions of genes and the disease susceptibility associated with these SNPs. Michael A. Bauer et al. describe custom software tools for the integration and analysis of data from various molecular profiling methods. This software allows the integration of microarray, RNA-Seq, and whole exome sequencing data, and was applied here, for illustrative purposes, to a sample from a patient with multiple myeloma [7].

Proteomics

The manuscript by Jaiswal and co-authors [8] presents a new algorithm for analyzing cross-linking mass spectrometry data, XLPM, and its implementation. The results of applying the new software to model systems are shown. Central to this approach is the B-y ion filter, which the authors have used to analyze the NIST spectral library. The authors claim that the XLPM selection filter leads to increased analysis speed and higher-confidence non-specific cross-link data.

Zhang and Zhao [9] performed quality measurements on mass-spectrometry metabolomics data. They conclude that their proposed zigzag index is better than several metrics, including the MCQ index, in evaluating the quality of extracted ion chromatograms (EICs). This work is important because effective quality metrics are needed to filter out low-quality EICs.

Millions of dollars are spent annually to better understand how pathogens infect their hosts and to identify potential targets for therapeutics. Protein-Protein Interaction (PPI) is an important mechanism playing a crucial role in host-pathogen interactions and pathogenicity.
Although several bioinformatics methods have been developed to predict PPIs at the intra-species level, there are very few studies at the inter-species level. As a case study on the Arabidopsis-Pseudomonas syringae interaction system, Sitanshu S. Sahu et al. [10] developed various domain- and interolog-based approaches to predict a genome-scale PPI network. This is the first report of deciphering an inter-species interactome in any plant-microbe system.

Jo and Cheng [11] address an important problem in proteome science, namely protein fold recognition. They propose a random forest based machine learning method to predict the fold of proteins, and demonstrate its efficiency by comparing it with existing approaches. The method should help in recognizing the correct structural fold for template-based protein structure modeling.

Laccases (E.C. 1.10.3.2) are multi-copper oxidases that have gained importance in many industries such as biofuels, pulp production, textile dye bleaching, bioremediation, and food production. Previous classification systems for laccase enzymes are based on multiple sequence alignments, and they largely follow species taxonomy rather than substrate ranges, enzyme properties, or specific function. Weirick et al. [12] developed a two-phase classification system: they first used unsupervised learning approaches to identify various laccase subtypes based on sequence features and enzyme function, and then developed a supervised learning method for classifying new laccases from the unknowns. This tool will be a useful resource to the biotechnology community, especially those working in the area of biofuels.

Huiwen Ng et al. [13] developed a competitive docking approach (CDA) for performing ligand docking in estrogen receptors.
The CDA compares the non-covalent interactions between a specific ligand and two separate docking models, based on the respective docking scores of the docked complexes, and therefore better reflects the receptor-ligand interaction. The CDA is extensible to other receptor targets, both to screen for potential binders and to differentiate between agonists and antagonists, and is as applicable to drug discovery as to regulatory testing.

Miscellaneous

Thousands of bioinformatics programs have been published and put online, but studies have shown that many of these resources become inaccessible in a time-dependent manner [14]. Jason Hennessey et al. conduct the largest survey to date of the current availability of previously published URLs, covering over 27,000 of them. They attempt to identify which ones are Scientific Data Analysis Resources (SDARs) and analyze factors associated with the probability that they are still available. They find that SDAR production tends to be distributed widely among institutions (in contrast to publications in general, the bulk of which tend to come from a relatively small fraction of institutions), and that more authors per SDAR-producing paper tends to equate with a greater probability of future accessibility [15]. Weizhong Zhao et al. discuss methods for summarizing text datasets by topic modeling and show its effectiveness on three separate biological datasets [16]. Stephen Grace et al. present Haystack, a web-based tool for metabolomics research. Haystack is designed to visualize, parse, filter, and extract significant features from Liquid Chromatography Mass Spectrometry (LCMS) datasets. They show it is effective when comparing proteomics data from plants grown under two different light conditions [17].

Future meetings

MCBIOS XII will be held at The Statehouse Convention Center in Little Rock, Arkansas, from March 12-14, 2015, tentatively entitled "Emerging Trends in Bioinformatics".
The 2014-2015 MCBIOS President is Chaoyang "Joe" Zhang from the University of Southern Mississippi. MCBIOS is a regional affiliate of the International Society for Computational Biology (http://www.ISCB.org). For information regarding MCBIOS and our future meetings, see http://www.MCBIOS.org.

Competing interests

The authors declare that they have no competing interests.

Declaration of funding

Funding for the publication of this editorial was authorized by and obtained from the Mid-South Computational Biology and Bioinformatics Society.

Authors' contributions

All authors served as editors for these proceedings, with JDW serving as Senior Editor. All authors helped write this editorial.

          Most cited references (17)


          Competitive molecular docking approach for predicting estrogen receptor subtype α agonists and antagonists

          Background Endocrine disrupting chemicals (EDCs) are exogenous compounds that interfere with the endocrine system of vertebrates, often through direct or indirect interactions with nuclear receptor proteins. Estrogen receptors (ERs) are particularly important protein targets and many EDCs are ER binders, capable of altering normal homeostatic transcription and signaling pathways. An estrogenic xenobiotic can bind ER as either an agonist or antagonist to increase or inhibit transcription, respectively. The receptor conformations in the complexes of ER bound with agonists and antagonists are different and dependent on interactions with co-regulator proteins that vary across tissue type. Assessment of chemical endocrine disruption potential depends not only on binding affinity to ERs, but also on changes that may alter the receptor conformation and its ability to subsequently bind DNA response elements and initiate transcription. Using both agonist and antagonist conformations of the ERα, we developed an in silico approach that can be used to differentiate agonist versus antagonist status of potential binders. Methods The approach combined separate molecular docking models for ER agonist and antagonist conformations. The ability of this approach to differentiate agonists and antagonists was first evaluated using true agonists and antagonists extracted from the crystal structures available in the protein data bank (PDB), and then further validated using a larger set of ligands from the literature. The usefulness of the approach was demonstrated with enrichment analysis in data sets with a large number of decoy ligands. Results The performance of individual agonist and antagonist docking models was found comparable to similar models in the literature. 
When combined in a competitive docking approach, they provided the ability to discriminate agonists from antagonists with good accuracy, as well as the ability to efficiently select true agonists and antagonists from decoys during enrichment analysis. Conclusion This approach enables evaluation of potential ER biological function changes caused by chemicals bound to the receptor which, in turn, allows the assessment of a chemical's endocrine disrupting potential. The approach can be used not only by regulatory authorities to perform risk assessments on potential EDCs but also by the industry in drug discovery projects to screen for potential agonists and antagonists.
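The competitive step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `DockingResult` type, the `classify` function, the binder cutoff of -6.0, and the convention that lower (more negative) scores mean stronger predicted binding are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DockingResult:
    """Scores for one ligand docked into both receptor models (hypothetical type)."""
    ligand: str
    agonist_score: float      # score against the agonist-conformation model
    antagonist_score: float   # score against the antagonist-conformation model


def classify(result: DockingResult, binder_cutoff: float = -6.0) -> str:
    """Label a ligand by letting the two docking models compete.

    Assumption: lower (more negative) scores mean stronger predicted binding,
    and binder_cutoff is an arbitrary illustrative threshold.
    """
    best = min(result.agonist_score, result.antagonist_score)
    if best > binder_cutoff:
        # Neither model predicts sufficiently strong binding.
        return "non-binder"
    if result.agonist_score < result.antagonist_score:
        return "agonist"
    if result.antagonist_score < result.agonist_score:
        return "antagonist"
    # Equal scores give no basis for a call.
    return "undetermined"
```

For example, a ligand scoring -9.0 against the agonist model and -7.0 against the antagonist model would be labeled an agonist under these assumptions.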

            Predicting genome-scale Arabidopsis-Pseudomonas syringae interactome using domain and interolog-based approaches

            Background Every year pathogenic organisms cause billions of dollars' worth damage to crops and livestock. In agriculture, study of plant-microbe interactions is demanding a special attention to develop management strategies for the destructive pathogen induced diseases that cause huge crop losses every year worldwide. Pseudomonas syringae is a major bacterial leaf pathogen that causes diseases in a wide range of plant species. Among its various strains, pathovar tomato strain DC3000 (PstDC3000) is asserted to infect the plant host Arabidopsis thaliana and thus, has been accepted as a model system for experimental characterization of the molecular dynamics of plant-pathogen interactions. Protein-protein interactions (PPIs) play a critical role in initiating pathogenesis and maintaining infection. Understanding the PPI network between a host and pathogen is a critical step for studying the molecular basis of pathogenesis. The experimental study of PPIs at a large scale is very scarce and also the high throughput experimental results show high false positive rate. Hence, there is a need for developing efficient computational models to predict the interaction between host and pathogen in a genome scale, and find novel candidate effectors and/or their targets. Results In this study, we used two computational approaches, the interolog and the domain-based to predict the interactions between Arabidopsis and PstDC3000 in genome scale. The interolog method relies on protein sequence similarity to conduct the PPI prediction. A Pseudomonas protein and an Arabidopsis protein are predicted to interact with each other if an experimentally verified interaction exists between their respective homologous proteins in another organism. The domain-based method uses domain interaction information, which is derived from known protein 3D structures, to infer the potential PPIs. 
If a Pseudomonas and an Arabidopsis protein contain an interacting domain pair, one can expect the two proteins to interact with each other. The interolog-based method predicts ~0.79M PPIs involving around 7700 Arabidopsis and 1068 Pseudomonas proteins in the full genome. The domain-based method predicts 85650 PPIs comprising 11432 Arabidopsis and 887 Pseudomonas proteins. Further, around 11000 PPIs have been identified as interacting from both the methods as a consensus. Conclusion The present work predicts the protein-protein interaction network between Arabidopsis thaliana and Pseudomonas syringae pv. tomato DC3000 in a genome wide scale with a high confidence. Although the predicted PPIs may contain some false positives, the computational methods provide reasonable amount of interactions which can be further validated by high throughput experiments. This can be a useful resource to the plant community to characterize the host-pathogen interaction in Arabidopsis and Pseudomonas system. Further, these prediction models can be applied to the agriculturally relevant crops.
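The two prediction rules described in this abstract, and their consensus, can be sketched in a few lines of Python. This is only an illustration of the logic as stated: the function names, the dictionary-based inputs, and the toy identifiers in the usage note are hypothetical, and the real pipeline's homology search and domain assignment steps are not shown.

```python
from itertools import product


def interolog_predictions(host_homologs, pathogen_homologs, known_ppis):
    """Predict host-pathogen pairs whose homologs are known to interact.

    host_homologs / pathogen_homologs: dict mapping a protein ID to the set
    of its homologs in other organisms. known_ppis: set of frozensets, each
    holding an experimentally verified interacting pair.
    """
    predicted = set()
    for host, pathogen in product(host_homologs, pathogen_homologs):
        if any(
            frozenset((h, p)) in known_ppis
            for h in host_homologs[host]
            for p in pathogen_homologs[pathogen]
        ):
            predicted.add((host, pathogen))
    return predicted


def domain_predictions(host_domains, pathogen_domains, interacting_domains):
    """Predict pairs in which the two proteins carry an interacting domain pair.

    host_domains / pathogen_domains: dict mapping a protein ID to its set of
    domain IDs. interacting_domains: set of frozensets of domain pairs.
    """
    predicted = set()
    for host, pathogen in product(host_domains, pathogen_domains):
        if any(
            frozenset((d1, d2)) in interacting_domains
            for d1 in host_domains[host]
            for d2 in pathogen_domains[pathogen]
        ):
            predicted.add((host, pathogen))
    return predicted


def consensus(interolog_set, domain_set):
    """Pairs supported by both methods."""
    return interolog_set & domain_set
```

With toy inputs in which host protein `AT1` has homolog `REF_A`, pathogen protein `PS1` has homolog `REF_B`, and `REF_A`-`REF_B` is a known interaction, `interolog_predictions` returns the pair `("AT1", "PS1")`. The nested loop over all host-pathogen pairs is chosen for clarity; a genome-scale run would more likely index the known interactions first.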

              Improving protein fold recognition by random forest

              Background Recognizing the correct structural fold among known template protein structures for a target protein (i.e. fold recognition) is essential for template-based protein structure modeling. Since the fold recognition problem can be defined as a binary classification problem of predicting whether or not the unknown fold of a target protein is similar to an already known template protein structure in a library, machine learning methods have been effectively applied to tackle this problem. In our work, we developed RF-Fold that uses random forest - one of the most powerful and scalable machine learning classification methods - to recognize protein folds. Results RF-Fold consists of hundreds of decision trees that can be trained efficiently on very large datasets to make accurate predictions on a highly imbalanced dataset. We evaluated RF-Fold on the standard Lindahl's benchmark dataset comprised of 976 × 975 target-template protein pairs through cross-validation. Compared with 17 different fold recognition methods, the performance of RF-Fold is generally comparable to the best performance in fold recognition of different difficulty ranging from the easiest family level, the medium-hard superfamily level, and to the hardest fold level. Based on the top-one template protein ranked by RF-Fold, the correct recognition rate is 84.5%, 63.4%, and 40.8% at family, superfamily, and fold levels, respectively. Based on the top-five template protein folds ranked by RF-Fold, the correct recognition rate increases to 91.5%, 79.3% and 58.3% at family, superfamily, and fold levels. Conclusions The good performance achieved by the RF-Fold demonstrates the random forest's effectiveness for protein fold recognition.
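The top-one and top-five recognition rates quoted in this abstract can be understood through a small Python sketch of the evaluation metric. This is not RF-Fold itself: the function name, the input layout, and the convention that a higher score means the template is more likely to share the target's fold are assumptions for illustration.

```python
def top_k_recognition_rate(scores, target_fold, template_fold, k=1):
    """Fraction of targets with a correct-fold template among the top k.

    scores: dict mapping each target ID to a dict of template ID -> score,
    where a higher score is assumed to mean "more likely the same fold".
    target_fold / template_fold: dicts mapping IDs to fold labels.
    """
    if not scores:
        raise ValueError("scores must contain at least one target")
    hits = 0
    for target, template_scores in scores.items():
        # Rank templates from highest to lowest score for this target.
        ranked = sorted(template_scores, key=template_scores.get, reverse=True)
        if any(template_fold[t] == target_fold[target] for t in ranked[:k]):
            hits += 1
    return hits / len(scores)
```

Raising `k` from 1 to 5 can only keep or increase the rate, which matches the pattern of the figures reported above at every difficulty level.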

                Author and article information

                Journal: BMC Bioinformatics
                Publisher: BioMed Central
                ISSN: 1471-2105
                Published: 21 October 2014
                Volume: 15, Suppl 11, article I1
                Affiliations
                [1] Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation; 825 N.E. 13th Street, Oklahoma City, OK 73104-5005, USA
                [2] Biochemistry and Molecular Biology Dept, University of Oklahoma Health Sciences Center, USA
                [3] Stephenson Cancer Center, University of Oklahoma Health Sciences Center, USA
                [4] Department of Geriatric Medicine, University of Oklahoma Health Sciences Center, USA
                [5] Civil Aerospace Medical Institute, Federal Aviation Administration, Oklahoma City, OK 73169, USA
                [6] Department of Computer Science and Engineering, Mississippi State University, Mississippi State, MS 39762, USA
                [7] School of Computing, University of Southern Mississippi, Hattiesburg, MS 39406, USA
                [8] Department of Biochemistry & Molecular Biology, Oklahoma State University, Stillwater, OK 74078, USA
                [9] Bioinformatics Facility, Institute for Integrative Genome Biology, Department of Botany & Plant Sciences, University of California, Riverside, CA 92521, USA
                [10] Virginia Commonwealth University, Richmond Academy of Medicine, Department of Biostatistics, USA
                Article
                1471-2105-15-S11-I1
                10.1186/1471-2105-15-S11-I1
                4251036
                25350879
                Copyright © 2014 Wren et al.; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                11th Annual MCBIOS Conference
                Stillwater, OK, USA
                6-8 March 2014
                Categories
                Introduction

                Bioinformatics & Computational biology