      Accurate Computation of Survival Statistics in Genome-Wide Studies


          Abstract

          A key challenge in genomics is to identify genetic variants that distinguish patients with different survival times following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications. This is because the two populations determined by a genetic variant may have very different sizes, and because the evaluation of many possible variants demands highly accurate computation of very small p-values. We demonstrate this problem for cancer genomics data, where the standard log-rank test leads to many false positive associations between somatic mutations and survival time. We develop and analyze a novel algorithm, Exact Log-rank Test (ExaLT), that accurately computes the p-value of the log-rank statistic under an exact distribution that is appropriate for populations of any size. We demonstrate the advantages of ExaLT on data from published cancer genomics studies, finding significant differences from the reported p-values. We analyze somatic mutations in six cancer types from The Cancer Genome Atlas (TCGA), finding mutations with known association to survival as well as several novel associations. In contrast, standard implementations of the log-rank test report dozens to hundreds of likely false positive associations as more significant than these known associations.

          Author Summary

          The identification of genetic variants associated with survival time is crucial in genomic studies. To this end, a number of methods have been proposed for computing a p-value that summarizes the difference in survival time between two or more populations. The most widely used of these methods is the log-rank test. Widely used implementations of the log-rank test exhibit a systematic error that emerges in most genome-wide applications, where the two populations have very different sizes and where very small p-values must be computed accurately because a large number of candidate variants are evaluated. Considering cancer genomics applications, we show that this systematic error leads to many false positive associations between somatic variants and survival time. We present and analyze a new algorithm, ExaLT, that accurately computes the p-value for the log-rank test under a distribution that is appropriate for the parameters found in genomics. Unlike previous approaches, ExaLT allows the accuracy of the computation to be controlled. We use ExaLT to analyze cancer genomics data from The Cancer Genome Atlas (TCGA), identifying several novel associations in addition to well-known associations. In contrast, the standard implementations of the log-rank test report a huge number of presumably false positive associations.
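The gap between the asymptotic approximation and an exact distribution can be illustrated with a small sketch. The code below is not ExaLT and does not reproduce the authors' algorithm; it is a brute-force enumeration over all group relabelings, which is feasible only for tiny samples. The function names (`logrank_terms`, `asymptotic_p`, `permutation_p`) and the 12-patient cohort are hypothetical and chosen purely for illustration, and the test is taken as one-sided (group 1 has more events than expected).

```python
import math
from itertools import combinations


def logrank_terms(times, events, in_group1):
    """Return (U, V) for group 1: observed minus expected events, and the
    hypergeometric variance used by the usual normal approximation."""
    u = 0.0
    v = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group1[i])
        dead = [i for i in at_risk if times[i] == t and events[i]]
        d = len(dead)
        d1 = sum(1 for i in dead if in_group1[i])
        u += d1 - d * n1 / n
        if n > 1:
            v += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return u, v


def asymptotic_p(times, events, in_group1):
    """One-sided p-value from the normal approximation Z = U / sqrt(V)."""
    u, v = logrank_terms(times, events, in_group1)
    if v == 0:
        return 1.0
    return 0.5 * math.erfc(u / math.sqrt(2 * v))


def permutation_p(times, events, in_group1):
    """One-sided exact p-value by enumerating every relabeling that keeps
    the size of group 1 fixed (brute force, tiny samples only)."""
    n = len(times)
    n1 = sum(in_group1)
    u_obs, _ = logrank_terms(times, events, in_group1)
    hits = total = 0
    for chosen in combinations(range(n), n1):
        members = set(chosen)
        labels = [i in members for i in range(n)]
        u, _ = logrank_terms(times, events, labels)
        total += 1
        if u >= u_obs - 1e-9:  # tolerance for floating-point ties
            hits += 1
    return hits / total


# Made-up toy cohort (an assumption, not data from the article):
# 12 patients, no censoring, and a "mutated" group of only 2 patients
# who happen to have the two shortest survival times.
times = list(range(1, 13))
events = [True] * 12
mutated = [True, True] + [False] * 10

p_asym = asymptotic_p(times, events, mutated)
p_exact = permutation_p(times, events, mutated)
print(f"asymptotic p = {p_asym:.2e}, exact permutation p = {p_exact:.4f}")
```

In this toy setting there are only 66 ways to choose 2 patients out of 12, so the exact one-sided p-value cannot fall below 1/66 (about 0.015), whereas the normal approximation reports a value on the order of 1e-4. This is the kind of overstatement of significance for unbalanced groups that the summary describes; how ExaLT computes the exact p-value efficiently and with controlled accuracy is described in the article itself.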


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 7 May 2015
                Volume 11, Issue 5
                Affiliations
                [1 ]Department of Mathematics and Computer Science, University of Southern Denmark, Funen, Denmark
                [2 ]Department of Computer Science, Brown University, Providence, Rhode Island, United States of America
                [3 ]Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
                Ontario Institute for Cancer Research, CANADA
                Author notes

                I have read the journal’s policy and have the following conflicts: EU is a member of the scientific board and a consultant at Nabsys Inc.

                Conceived and designed the experiments: FV BJR EU. Performed the experiments: FV AP. Analyzed the data: FV AP. Contributed reagents/materials/analysis tools: FV BJR EU. Wrote the paper: FV BJR EU.

                Article
                Manuscript ID: PCOMPBIOL-D-13-01745
                DOI: 10.1371/journal.pcbi.1004071
                PMCID: PMC4423942
                PMID: 25950620

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                Page count
                Figures: 3, Tables: 0, Pages: 18
                Funding
                This work is supported by NSF grants IIS-1016648 and IIS-1247581, and by NIH grant R01HG007069. BJR is supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund, an Alfred P. Sloan Research Fellowship, and an NSF CAREER Award (CCF-1053753). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article

                Quantitative & Systems biology
