
      Proceedings of the 2012 MidSouth computational biology and bioinformatics society (MCBIOS) conference

      BMC Bioinformatics
      BioMed Central
      Proceedings of the Ninth Annual MCBIOS Conference. Dealing with the Omics Data Deluge
      17-18 February 2012
      bioinformatics, conferences, MCBIOS, ISCB


          Abstract

          Introduction

          The ninth annual conference of the MidSouth Computational Biology and Bioinformatics Society (MCBIOS 2012), "Making Sense of the Omics Data Deluge", took place in Oxford, Mississippi, on February 17-18, 2012. This year's Conference Chairs were Dr. Dawn Wilkins of the University of Mississippi and Dr. Doris Kupfer of the Federal Aviation Administration, who is also the current MCBIOS President (2011-2012). There were 170 registrants and a total of 106 abstracts (34 oral presentations and 72 poster session abstracts).

          Keynote speakers for 2012 were Dr. Michael Gribskov, Purdue University, who gave the opening address, "After the Deluge: Bioinformatics meets big data"; Dr. David J. States, OncProTech LLC, who presented "Data Intensive Proteomics" remotely via WebEx; and Sultan Meghji, Appistry Inc., who gave the Saturday morning address, "Simple, Fast and Affordable - Turning the myriad of data into action - technologies to support personalized medicine". Invited speaker Dr. William Slikker, Director of the Food and Drug Administration's National Center for Toxicological Research, presented "Regulatory Science: Challenges and Progress", outlining the role research at the FDA plays in the agency's regulatory responsibilities. Participants also had the opportunity to attend hands-on workshops on NCBI tools, presented by Dr. Peter Cooper, NCBI/NLM/NIH staff scientist, and a collaboration workshop focused on the timber rattlesnake genome, facilitated by Dr. Ed Perkins, Army Corps of Engineers.
          The winners of conference awards were:

          Best Oral Presentations (students)
          1st Place: Shana Stoddard, University of Mississippi
          2nd Place: Neal Platt, Mississippi State
          3rd Place: Aleksandra Markovets, UALR

          Best Oral Presentations (post-doctoral fellows)
          1st Place: Mikhail Dozmorov, OMRF
          2nd Place: Zhichao Liu, NCTR

          Best Poster (Computation)
          1st Place: Shraddha Thakkar, UAMS
          2nd Place: Xingyan Kuang, University of Missouri
          3rd Place: Sule Dogan, Mississippi State

          Best Poster (Biology)
          1st Place: Tamer Aldwairi, Mississippi State
          2nd Place: Dilip Gautam, Mississippi State
          3rd Place: Bin Pang, University of Missouri

          Proceedings summary

          This year, 13 of the 20 papers submitted (65%) were accepted for publication in the conference proceedings [1-13]. This equals the lowest number of papers published, set by the first MCBIOS conference in 2003, which also accepted 13 papers, and it is the second lowest number of papers submitted to the proceedings (17 were submitted in 2004). It was a substantial drop from the 21 papers published in last year's Proceedings [14-34]. All papers were peer-reviewed by two or more reviewers. Our goal is to be inclusive yet rigorous in the peer-review process, so that accepted papers are both high quality and reflective of the work presented at the conference. Papers generally fell into five categories.

          Genomic analysis

          Ptitsyn et al. report an algorithm for analyzing whole genomes in terms of the genes they share. It provides an important way to quantify the gain and loss of genes across phyla, and the authors identified core genes common to each phylum [6]. Verma and Melcher [8] describe a Support Vector Machine (SVM) model for distinguishing peptides originating from plant host proteins from those originating from proteobacterial plant pathogen proteins. A feature set combining single amino acid compositions and dipeptide compositions gave the highest accuracy. Yao et al. [9] present a detailed phylogenetic and transcriptome analysis of three classes of secondary-wall-associated NAC domain transcription factors across 19 higher plant species. In addition, they use computational modeling to predict the genes regulated or co-regulated by these transcription factors. The study reveals coordinated functioning of several NAC genes and identifies a number of novel genes and pathways that may be involved in cell wall biosynthesis.

          Systems biology/pathways

          The abundance of different databases itself creates an "omics data deluge". Hui Huang et al. [3] addressed this issue by creating the PAGED database (http://bio.informatics.iupui.edu/PAGED), which includes data from OMIM, GAD, MSigDB, miRecords and other databases as a one-stop solution for exploratory science. Zhang and Drabier [10] took a different approach to data integration by compiling an Integrated Pathway Analysis Database (IPAD, http://bioinfo.hsc.unt.edu/ipad). This database defines associations among genes, proteins, pathways, diseases, drugs and organs, which are essential for understanding the relationships between these entities; these relationships can be quantified by running enrichment analysis with flexible threshold options. Zhang and Berleant [12] developed BirdsEyeView (http://metnetdb.org/MetNet_BirdsEyeView.htm), a Java application for visualizing gene lists and expression data in the context of cellular localization, pathways and Gene Ontology annotations. Developed for plant research, this customizable tool offers a flexible and intuitive view of the processes and pathways affected by the genes of interest.

          RNA-seq

          Reddy et al. [7] used RNA-Seq to develop an expression-based data analysis workflow, built from freely available software, to validate and expand the existing annotation of the cattle pathogen Mannheimia haemolytica PHL213. Using the pipeline, the study confirmed the existing M. haemolytica annotation and identified potential novel genes and operon structures, demonstrating and validating the use of this elegant, simple and easily implemented bioinformatics pipeline.

          Proteomics

          Zhang et al. compare the effects of organelle enrichment on the sensitivity of protein identification by high-throughput mass spectrometry in aerial tissues of A. thaliana, and further compare the biological effects of two hormone treatments, zeatin and brassinosteroid, on protein expression levels in mitochondria and chloroplasts. Their results suggest that physical enrichment of organelles increases the sensitivity of the assay for identifying organelle-specific proteins. They also find that the two hormones affect different biological pathways to achieve a similar physiological effect, an increase in biomass for bioenergy production [13]. Zhang and Su [11] analyzed protein structure flexibility by comparing different structures of identical proteins in terms of structural similarity, secondary structure and sequence alignment. They report that proteins have several stable conformations and that structures of identical sequences may differ significantly from one another. This will help in evaluating the accuracy of protein structure prediction methods; for example, molecular dynamics simulations may be needed to construct a set of reference structures as criteria for such studies. Pechan and Gwaltney [5] investigated the relationship between tandem mass spectral fragment ion intensities and the distribution of in vacuo protonation states that can be modeled from peptide sequences. Their work suggests that ion intensities in peptide mass spectra can be calculated based solely on amino acid sequence.

          Miscellaneous

          Halil Bisgin et al. use topic modeling to analyze pharmacological similarity and evaluate their system in terms of its potential to reposition drugs, that is, to find additional uses for existing drugs.
          Doing so is important because new drug development is extremely expensive and time-consuming [1]. Zhifa Liu et al. [4] evaluated four Bayesian network scoring functions, Minimum Description Length (MDL), Akaike's Information Criterion (AIC), Bayesian Dirichlet equivalence score (BDeu) and factorized Normalized Maximum Likelihood (fNML), and analyzed their performance in terms of success rate in recovering 'true' gold standard networks. They report that MDL outperforms the other scoring functions. This study provides useful guidance for analyzing biological networks such as gene regulatory networks (GRNs). Fu et al. [2] applied multiple instance learning via embedded selection (MILES) to construct quantitative structure-activity relationship (QSAR) models linking the 3D shapes of bioactive compounds to their targets. The authors demonstrated that their method, built solidly on previous research, improves drug activity prediction without overfitting.

          Future meetings

          The Stoney Creek Inn & Conference Center in Columbia, Missouri, will host MCBIOS 2013 on April 5-6. This will be the tenth anniversary of the MCBIOS conference, entitled "The 10th Anniversary in a Decade of Change: Discovery in a Sea of Data". The 2012-2013 MCBIOS President is Ed Perkins, US Army Engineer Research and Development Center, and Andy Perkins, Mississippi State University, is now President-elect. MCBIOS is a regional affiliate of the International Society for Computational Biology (http://www.ISCB.org). For information regarding MCBIOS and future meetings, see http://www.MCBIOS.org.

          Competing interests

          The authors declare that they have no competing interests.

          Authors' contributions

          All authors served as editors for these proceedings, with JDW serving as Senior Editor. All authors helped write this editorial.


          Most cited references


          Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens

          Background RNA-Seq is a recently developed high-throughput sequencing technology for profiling the entire transcriptome of any organism. It has several major advantages over current hybridization-based approaches such as microarrays. However, the cost per sample of RNA-Seq is still prohibitive for most laboratories. With continued improvement in sequence output, it would be cost-effective to multiplex several samples and sequence them in a single lane with sufficient transcriptome coverage. The objective of this analysis is to evaluate what sequencing depth might be sufficient to interrogate gene expression profiles in the chicken by RNA-Seq. Results Two cDNA libraries from chicken lungs were sequenced initially, generating 4.9 million (M) and 1.6 M (60 bp) reads, respectively. With significant improvements in sequencing technology, two technical replicate cDNA libraries were re-sequenced, yielding 29.6 M and 28.7 M (75 bp) reads. More than 90% of annotated genes were detected in the data sets with 28.7-29.6 M reads, while only 68% of genes were detected in the data set with 1.6 M reads. The correlation coefficients of gene expression between technical replicates within the same sample were 0.9458 and 0.8442. To evaluate the appropriate depth needed for mRNA profiling, a random sampling method was used to generate different numbers of reads from each sample. There was a significant increase in correlation coefficients from a sequencing depth of 1.6 M to 10 M for all genes except highly abundant genes. No significant improvement was observed from 10 M to 20 M (75 bp) reads. Conclusion The analysis from the current study demonstrated that 30 M (75 bp) reads is sufficient to detect all annotated genes in chicken lungs. Ten million (75 bp) reads could detect about 80% of annotated chicken genes, and RNA-Seq at this depth can serve as a replacement for microarray technology.
Furthermore, the depth of sequencing had a significant impact on measuring the expression of low-abundance genes. Finally, combining experimental and simulation approaches is a powerful way to address the relationship between sequencing depth and transcriptome coverage.
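The random sampling step described in the Results above (drawing fewer reads from a deep library and asking how many genes are still detected and how well expression still correlates) can be illustrated with a small Python sketch. It is not the authors' pipeline: it assumes reads have already been aligned and assigned to genes, represents each read by its gene label, compares each subsample against the full library with a Pearson correlation of raw counts, and uses hypothetical helper names (`subsample`, `depth_curve`) and toy data.

```python
import math
import random
from collections import Counter


def subsample(read_genes, depth, seed=0):
    """Draw `depth` reads without replacement from per-read gene labels."""
    return random.Random(seed).sample(read_genes, depth)


def genes_detected(counts, min_reads=1):
    """Count genes supported by at least `min_reads` reads."""
    return sum(1 for c in counts.values() if c >= min_reads)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def depth_curve(read_genes, depths, seed=0):
    """For each depth: (depth, genes detected, correlation with full-library counts)."""
    full = Counter(read_genes)
    genes = sorted(full)
    results = []
    for depth in depths:
        sub = Counter(subsample(read_genes, depth, seed))
        # Counter returns 0 for genes absent from the subsample.
        r = pearson([full[g] for g in genes], [sub[g] for g in genes])
        results.append((depth, genes_detected(sub), r))
    return results


if __name__ == "__main__":
    # Hypothetical toy library: 200 genes with skewed abundances,
    # one list entry per read naming the gene it was assigned to.
    rng = random.Random(42)
    reads = [f"gene{i}" for i in range(200) for _ in range(1 + int(rng.expovariate(0.05)))]
    n = len(reads)
    for depth, detected, r in depth_curve(reads, [n // 20, n // 5, n]):
        print(depth, detected, round(r, 3))
```

Sampling without replacement mirrors drawing a smaller library from the same pool of reads. A real analysis would probably repeat each depth with several seeds and correlate normalized or log-transformed values rather than raw counts.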

            Mining FDA drug labels using an unsupervised learning technique - topic modeling

            Background The Food and Drug Administration (FDA) approved drug labels contain a broad array of information, ranging from adverse drug reactions (ADRs) to drug efficacy, risk-benefit considerations, and more. However, the labeling language used to describe this information is free text that often contains ambiguous semantic descriptions, which makes it very difficult to retrieve useful information from the labeling text consistently and accurately for comparative analysis across drugs. Consequently, this task has largely relied on manual reading of the full text by experts, which is time-consuming and labor-intensive. Method In this study, a novel unsupervised text mining method called topic modeling was applied to drug labeling with the goal of discovering "topics" that group together drugs with similar safety concerns and/or therapeutic uses. A total of 794 FDA-approved drug labels were used in this study. First, three labeling sections (Boxed Warning, Warnings and Precautions, and Adverse Reactions) of each drug label were processed with the Medical Dictionary for Regulatory Activities (MedDRA) to convert the free text of each label into standard ADR terms. Next, topic modeling with latent Dirichlet allocation (LDA) was applied to generate 100 topics, each associated with a set of drugs grouped together by probability analysis. Lastly, the efficacy of the topic modeling was evaluated against known information about the therapeutic uses and safety data of the drugs. Results The results demonstrate that drugs grouped by topic share the same safety concerns and/or therapeutic uses with statistical significance (P<0.05). The identified topics have distinct contexts that can be directly linked to specific adverse events (e.g., liver injury or kidney injury) or therapeutic applications (e.g., anti-infectives for systemic use).
We were also able to use topics to identify potential adverse events that might arise from specific medications. Conclusions The successful application of topic modeling to FDA drug labeling demonstrates its potential utility as a means of hypothesis generation, inferring hidden relationships among concepts (in this study, drug safety and therapeutic use) in biomedical documents.
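Latent Dirichlet allocation, the model behind the topic modeling step above, can be fitted with collapsed Gibbs sampling. The sketch below is a minimal pure-Python illustration under stated assumptions, not the study's implementation: each drug label is treated as a bag of MedDRA-style ADR terms, the hyperparameters `alpha` and `beta`, the function name `lda_gibbs` and the toy labels are hypothetical, and the study's 100 topics are reduced to 2.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for latent Dirichlet allocation (illustrative).

    docs: list of documents, each a list of term strings (e.g. ADR terms per label).
    Returns (vocab, doc_topic, topic_term) with smoothed probability estimates.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]       # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]   # topic-term counts
    nk = [0] * n_topics                        # tokens per topic
    z = []                                     # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)        # random initial assignment
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_term = [
        [(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
        for k in range(n_topics)
    ]
    return vocab, doc_topic, topic_term


if __name__ == "__main__":
    # Hypothetical toy labels: two liver-injury-like and two kidney-injury-like term sets.
    labels = {
        "drug_A": ["hepatotoxicity", "jaundice", "ALT increased"] * 3,
        "drug_B": ["hepatitis", "hepatotoxicity", "ALT increased"] * 3,
        "drug_C": ["renal failure", "creatinine increased", "nephrotoxicity"] * 3,
        "drug_D": ["proteinuria", "renal failure", "nephrotoxicity"] * 3,
    }
    names = list(labels)
    vocab, doc_topic, topic_term = lda_gibbs([labels[n] for n in names], n_topics=2)
    for name, theta in zip(names, doc_topic):
        # Assign each drug to its most probable topic.
        print(name, max(range(2), key=lambda k: theta[k]))
    for k, phi in enumerate(topic_term):
        top = sorted(range(len(vocab)), key=lambda w: -phi[w])[:3]
        print("topic", k, [vocab[w] for w in top])
```

Assigning each drug to its most probable topic is one simple way to read drug groupings off `doc_topic`; the study's statistical evaluation against known therapeutic uses and safety data is not reproduced here.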

              Predicting glioblastoma prognosis networks using weighted gene co-expression network analysis on TCGA data

              Background Using gene co-expression analysis, researchers have been able to predict clusters of genes with consistent functions that are relevant to cancer development and prognosis. We applied a weighted gene co-expression network (WGCN) analysis algorithm to glioblastoma multiforme (GBM) data obtained from the TCGA project and predicted a set of gene co-expression networks related to GBM prognosis. Methods We modified the Quasi-Clique Merger (QCM) algorithm into an edge-covering Quasi-Clique Merger algorithm (eQCM) for mining weighted sub-networks in the WGCN. Each sub-network is used as a set of features to separate patients into two groups with the K-means algorithm. Survival times of the two groups are compared using the log-rank test and Kaplan-Meier curves. Simulations using random sets of genes are carried out to determine thresholds for the log-rank test p-values used in network selection. Sub-networks with p-values below their corresponding thresholds were further merged into clusters based on overlap ratios (>50%). The functions of each cluster are analyzed using gene ontology enrichment analysis. Results Using the eQCM algorithm, we identified 8,124 sub-networks in the WGCN, of which 170 show p-values below their corresponding thresholds. These were then merged into 16 clusters. Conclusions We identified 16 gene clusters associated with GBM prognosis using the eQCM algorithm. Our results not only confirmed previous findings, including the importance of the cell cycle and immune response in GBM, but also suggested important epigenetic events in GBM development and prognosis.
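The per-sub-network test described in the Methods above (split patients into two groups with K-means on the sub-network's genes, then compare the groups' survival with a log-rank test) can be sketched in plain Python. This is an illustrative sketch, not the authors' eQCM code: `kmeans2` and `logrank` are hypothetical helper names, the toy expression and survival values are invented, the p-value uses the chi-square distribution with one degree of freedom, and the study's simulation-derived thresholds are not reproduced.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans2(points, n_iter=100, seed=0):
    """Split feature vectors into two groups with Lloyd's k-means (k = 2)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, 2)]
    labels = None
    for _ in range(n_iter):
        # Assign each patient to the nearest center.
        new = [min((0, 1), key=lambda c: sqdist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def logrank(times, events, groups):
    """Two-group log-rank test.

    times: follow-up times; events: 1 if death observed, 0 if censored;
    groups: 0/1 labels. Returns (chi-square statistic, p-value), 1 degree of freedom.
    """
    data = list(zip(times, events, groups))
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0
    for t in sorted({tt for tt, e, _ in data if e}):
        at_risk = [g for tt, _, g in data if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square(1): P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical toy data: a 3-gene sub-network's expression in six patients,
    # survival time in months, and event indicator (1 = death observed).
    expr = [[1.0, 0.9, 1.2], [0.8, 1.1, 1.0], [1.1, 1.0, 0.9],
            [3.0, 2.8, 3.1], [2.9, 3.2, 3.0], [3.1, 3.0, 2.7]]
    times = [30, 28, 35, 8, 12, 10]
    events = [1, 0, 1, 1, 1, 1]
    groups = kmeans2(expr)
    chi2, p = logrank(times, events, groups)
    print(groups, round(chi2, 3), round(p, 4))
```

In the study, a sub-network was kept only if its p-value fell below a threshold estimated from random gene sets; a fuller sketch would obtain that threshold by repeating this same test on many randomly drawn gene sets of matching size.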

                Author and article information

                Conference
                BMC Bioinformatics, BioMed Central (ISSN 1471-2105)
                Published 11 September 2012; Volume 13, Suppl 15: S1
                Affiliations
                [1 ]Arthritis and Immunology Research Program, Oklahoma Medical Research Foundation; 825 N.E. 13th Street, Oklahoma City, OK 73104-5005, USA
                [2 ]Biochemistry and Molecular Biology Department, University of Oklahoma Health Sciences Center, USA
                [3 ]Civil Aerospace Medical Institute, Federal Aviation Administration, Oklahoma City, OK 73169, USA
                [4 ]National Institute for Microbial Forensics & Food and Agricultural Biosecurity, Department of Biochemistry & Molecular Biology, Oklahoma State University, Stillwater, OK 74078, USA
                [5 ]Department of Computer Science and Engineering, Mississippi State University, Box 9637, Mississippi State, MS 39762, USA
                Article
                1471-2105-13-S15-S1
                10.1186/1471-2105-13-S15-S1
                3439718
                23046182
                Copyright ©2012 Wren et al.; licensee BioMed Central Ltd.

                This is an open access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Conference location: Oxford, MS, USA
                Categories
                Introduction

                Bioinformatics & Computational biology
