7
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: not found
      • Article: not found

      A Survey of Data Mining and Deep Learning in Bioinformatics

      Read this article at

      ScienceOpenPublisherPubMed
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Related collections

          Most cited references72

          • Record: found
          • Abstract: not found
          • Article: not found

          Extracting and composing robust features with denoising autoencoders

            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Neighbor-net: an agglomerative method for the construction of phylogenetic networks.

            We present Neighbor-Net, a distance based method for constructing phylogenetic networks that is based on the Neighbor-Joining (NJ) algorithm of Saitou and Nei. Neighbor-Net provides a snapshot of the data that can guide more detailed analysis. Unlike split decomposition, Neighbor-Net scales well and can quickly produce detailed and informative networks for several hundred taxa. We illustrate the method by reanalyzing three published data sets: a collection of 110 highly recombinant Salmonella multi-locus sequence typing sequences, the 135 "African Eve" human mitochondrial sequences published by Vigilant et al., and a collection of 12 Archeal chaperonin sequences demonstrating strong evidence for gene conversion. Neighbor-Net is available as part of the SplitsTree4 software package.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Rotation forest: A new classifier ensemble method.

              We propose a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage simultaneously individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name "forest." Accuracy is sought by keeping all principal components and also using the whole data set to train each base classifier. Using WEKA, we examined the Rotation Forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with Bagging, AdaBoost, and Random Forest. The results were favorable to Rotation Forest and prompted an investigation into diversity-accuracy landscape of the ensemble models. Diversity-error diagrams revealed that Rotation Forest ensembles construct individual classifiers which are more accurate than these in AdaBoost and Random Forest, and more diverse than these in Bagging, sometimes more accurate as well.
                Bookmark

                Author and article information

                Journal
                Journal of Medical Systems
                J Med Syst
                Springer Science and Business Media LLC
                0148-5598
                1573-689X
                August 2018
                June 28 2018
                August 2018
                : 42
                : 8
                Article
                10.1007/s10916-018-1003-9
                29956014
                0158440d-9261-4fe7-89a6-a1ddca1e5e4a
                © 2018

                http://www.springer.com/tdm

                History

                Comments

                Comment on this article