Framework for making better predictions by directly estimating variables' predictivity.

      Abstract

      We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity with which variable sets can be assessed for predictivity, preferably high. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, because of its inflated bias for moderate sample sizes and its sensitivity to noisy, useless variables. We demonstrate that the $I$-score of the partition retention (PR) method of variable selection (VS) yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the $I$-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the $I$-score on real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using partition retention and the $I$-score can aid in finding variable sets with promising prediction rates; however, further research on sample-based measures of predictivity is much needed.
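
The abstract does not state a formula for the score, so the following is only a rough sketch under stated assumptions. It uses the unnormalized form that commonly appears in the partition retention literature: partition the sample by the joint values of a discrete variable set, then sum n_j^2 (mean_j - mean)^2 over the cells. The function name `i_score`, the input layout, and the omission of any normalizing constant are illustrative choices, not taken from the article.

```python
from collections import defaultdict


def i_score(X, y):
    """Unnormalized influence-style score for a discrete variable set.

    X: list of tuples, one tuple of discrete variable values per observation.
    y: list of numeric outcomes (e.g. 0/1 class labels).
    Assumed form: sum over partition cells of n_j^2 * (cell mean - overall mean)^2.
    Published versions may divide by a normalizing constant; none is applied here.
    """
    n = len(y)
    overall_mean = sum(y) / n

    # Group outcomes by the joint value of the variable set (the partition cells).
    cells = defaultdict(list)
    for row, outcome in zip(X, y):
        cells[tuple(row)].append(outcome)

    # Cells whose mean outcome departs from the overall mean contribute more,
    # with larger cells weighted more heavily.
    return sum(
        len(v) ** 2 * (sum(v) / len(v) - overall_mean) ** 2
        for v in cells.values()
    )


# A variable that separates the outcomes versus one unrelated to them.
y = [0, 0, 1, 1]
informative = [(0,), (0,), (1,), (1,)]
noisy = [(0,), (1,), (0,), (1,)]
print(i_score(informative, y), i_score(noisy, y))
```

In this toy sample the informative variable scores higher than the unrelated one, which is the behavior the abstract attributes to the statistic. It says nothing about the bias or lower-bound properties claimed for the real estimator.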

      Most cited references 11

      Gene expression profiling predicts clinical outcome of breast cancer.

      Breast cancer patients with the same stage of disease can have markedly different treatment responses and overall outcome. The strongest predictors for metastases (for example, lymph node status and histological grade) fail to classify accurately breast tumours according to their clinical behaviour. Chemotherapy or hormonal therapy reduces the risk of distant metastases by approximately one-third; however, 70-80% of patients receiving this treatment would have survived without it. None of the signatures of breast cancer gene expression reported to date allow for patient-tailored therapy strategies. Here we used DNA microarray analysis on primary breast tumours of 117 young patients, and applied supervised classification to identify a gene expression signature strongly predictive of a short interval to distant metastases ('poor prognosis' signature) in patients without tumour cells in local lymph nodes at diagnosis (lymph node negative). In addition, we established a signature that identifies tumours of BRCA1 carriers. The poor prognosis signature consists of genes regulating cell cycle, invasion, metastasis and angiogenesis. This gene expression profile will outperform all currently used clinical parameters in predicting disease outcome. Our findings provide a strategy to select patients who would benefit from adjuvant therapy.

        The Elements of Statistical Learning

          A review of feature selection techniques in bioinformatics.

          Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques. In this article, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques, and discussing their use, variety and potential in a number of both common as well as upcoming bioinformatics applications.

            Author and article information

            Affiliations
            [1] Department of Politics, Princeton University, Princeton, NJ 08540.
            [2] Department of Statistics, Harvard University, Cambridge, MA 02138.
            [3] Department of Statistics, Columbia University, New York, NY 10027.
            Email: slo@stat.columbia.edu, chernoff@stat.harvard.edu, tz33@columbia.edu
            Journal
            Title: Proceedings of the National Academy of Sciences of the United States of America
            Abbreviation: Proc. Natl. Acad. Sci. U.S.A.
            ISSN: 1091-6490 (electronic), 0027-8424 (print)
            Published: Dec 13 2016
            Volume: 113
            Issue: 50
            PMID: 27911830
            Publisher ID: 1616647113
            DOI: 10.1073/pnas.1616647113
            PMC ID: 5167195
