
      Knowledge discovery and sequence-based prediction of pandemic influenza using an integrated classification and association rule mining (CBA) algorithm.


          Abstract

          Pandemic influenza is a major concern worldwide. The availability of advanced technologies and of the nucleotide sequences of a large number of pandemic and non-pandemic influenza viruses in 2009 provides a great opportunity to investigate, through data mining tools, the underlying rules of pandemic induction. Here, for the first time, an integrated classification and association rule mining algorithm (CBA) was used to discover the rules underpinning the alteration of non-pandemic sequences into pandemic ones. We hypothesized that the extracted rules could lead to the development of an efficient expert system for predicting influenza pandemics. To this end, we used a large dataset containing 5373 HA (hemagglutinin) segments of the 2009 H1N1 pandemic and non-pandemic influenza sequences. The analysis was carried out for both nucleotide and protein sequences. We found a number of new rules that potentially indicate previously undiscovered antigenic sites in the influenza structure. At the nucleotide level, alteration of thymine (T) at position 260 was the key feature distinguishing non-pandemic from pandemic sequences. At the protein level, rules including the substitutions I233K and M334L were the differentiating features. CBA classifies pandemic and non-pandemic sequences efficiently and with high accuracy at both the nucleotide and protein levels. Finding hotspots in influenza sequences is significant because they represent regions with low antibody reactivity. We argue that the virus evades the host immune response through mutations at these spots. Based on the discovered rules, we developed the software "Prediction of Pandemic Influenza" for discriminating pandemic from non-pandemic sequences. This study opens a new avenue for discovering association rules between mutation points during the evolution of pandemic influenza.
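          The abstract describes CBA only at a high level. The following is a minimal, hypothetical sketch of how a CBA-style pipeline could operate on aligned sequences; it is not the authors' implementation. Assumptions: each aligned sequence is encoded as a set of `position=residue` items (so a feature such as "T at nucleotide 260" would become the item `260=T`); brute-force itemset enumeration stands in for CBA's Apriori-based rule generator; rules are ranked by confidence, then support, then shorter antecedent (as a proxy for "generated earlier"); and classifier building is a simplified database-coverage pass without CBA's final error-based pruning. The function names and the four toy sequences are illustrative only and are not data from the study.

```python
from collections import Counter
from itertools import combinations

def to_items(seq):
    """Encode an aligned sequence as a set of "position=residue" items (1-based)."""
    return frozenset(f"{i}={c}" for i, c in enumerate(seq, start=1))

def mine_cars(data, min_sup, min_conf, max_len=2):
    """Mine class association rules (antecedent -> label).

    Brute-force enumeration of itemsets up to max_len stands in for CBA's
    Apriori-based rule generator; it is only practical for toy inputs.
    """
    n = len(data)
    all_items = sorted(set().union(*(items for items, _ in data)))
    rules = []
    for k in range(1, max_len + 1):
        for combo in combinations(all_items, k):
            ante = frozenset(combo)
            covered = [label for items, label in data if ante <= items]
            if not covered:
                continue
            for label, count in Counter(covered).items():
                sup, conf = count / n, count / len(covered)
                if sup >= min_sup and conf >= min_conf:
                    rules.append((ante, label, conf, sup))
    # Precedence: higher confidence, then higher support, then shorter
    # antecedent (a stand-in for "generated earlier"), then lexical order.
    rules.sort(key=lambda r: (-r[2], -r[3], len(r[0]), sorted(r[0])))
    return rules

def build_classifier(rules, data):
    """Simplified database-coverage selection (no final error-based pruning)."""
    remaining = list(data)
    classifier = []
    for ante, label, _conf, _sup in rules:
        if not remaining:
            break
        matched = [y for items, y in remaining if ante <= items]
        # Keep a rule only if it correctly classifies a still-uncovered case.
        if label in matched:
            classifier.append((ante, label))
            remaining = [(items, y) for items, y in remaining if not ante <= items]
    # Default class: majority of uncovered cases, else of the whole training set.
    pool = remaining or data
    default = Counter(y for _, y in pool).most_common(1)[0][0]
    return classifier, default

def predict(classifier, default, seq):
    """Return the label of the first matching rule, or the default class."""
    items = to_items(seq)
    for ante, label in classifier:
        if ante <= items:
            return label
    return default

# Hypothetical toy alignment, NOT data from the study.
TRAIN = [(to_items(s), y) for s, y in [
    ("ATGC", "non-pandemic"),
    ("ATGA", "non-pandemic"),
    ("ACGC", "pandemic"),
    ("ACGA", "pandemic"),
]]

if __name__ == "__main__":
    rules = mine_cars(TRAIN, min_sup=0.25, min_conf=0.8)
    clf, default = build_classifier(rules, TRAIN)
    for ante, label in clf:
        print("IF", " AND ".join(sorted(ante)), "THEN", label)
    print("GCTT ->", predict(clf, default, "GCTT"))
```

          On this toy input the selected rules are "IF 2=C THEN pandemic" and "IF 2=T THEN non-pandemic", so the unseen sequence GCTT is labelled pandemic. Ordering rules by precedence and predicting with the first matching rule mirrors CBA's design, in which a single high-confidence rule, rather than a vote, decides each case.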


          Author and article information

          Journal
          J Biomed Inform
          Journal of biomedical informatics
          ISSN 1532-0480 (electronic), 1532-0464 (print)
          Oct 2015; 57
          Affiliations
          [1] Department of Computer Science and IT, School of Electrical Engineering and Computer Science, Shiraz University, Shiraz, Iran.
          [2] Department of Computer Science and IT, School of Electrical Engineering and Computer Science, Shiraz University, Shiraz, Iran. Electronic address: sami@shirazu.ac.ir.
          [3] School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, University of South Australia, Adelaide, Australia; Institute of Biotechnology, Shiraz University, Shiraz, Iran; Department of Genetics and Evolution, School of Biological Sciences, The University of Adelaide, Adelaide, Australia. Electronic address: Esmaeil.Ebrahimie@unisa.edu.au.
          Article
          PII: S1532-0464(15)00158-6
          DOI: 10.1016/j.jbi.2015.07.018
          PMID: 26232668
          3644a6a7-6938-4566-9ab8-22534bccd86a
          Copyright © 2015 Elsevier Inc. All rights reserved.

          Keywords: Association rule mining, CBA algorithm, Pandemic influenza prediction
