Rediscover: an R package to identify mutually exclusive mutations

Abstract

Motivation

Discover is an algorithm developed to identify mutually exclusive genomic events. Its main contribution is a statistical analysis based on the Poisson–Binomial (PB) distribution that takes into account the mutation rates of genes and samples. Discover is very effective at identifying mutually exclusive mutations, but at the expense of speed on large datasets: the PB is computationally costly to compute, and checking all potential mutually exclusive alterations requires millions of tests.
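To make the cost argument concrete, the following is a minimal Python sketch of an exact PB computation by dynamic programming. It is not the Discover or Rediscover code. It assumes that each gene has a per-sample alteration probability, that the two genes are altered independently within a sample, and that the quantity tested is the number of samples in which both genes are altered. The function names and the toy probabilities are hypothetical.

```python
from typing import List


def poisson_binomial_pmf(probs: List[float]) -> List[float]:
    """Exact PMF of the number of successes among independent Bernoulli trials.

    Standard dynamic-programming recursion: add one trial at a time.
    Cost is O(n^2) for n trials, which is what makes the exact PB expensive.
    """
    pmf = [1.0]  # with zero trials, P(X = 0) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        pmf = nxt
    return pmf


def mutex_p_value(p_a: List[float], p_b: List[float], observed_overlap: int) -> float:
    """Lower-tail probability P(X <= observed_overlap).

    Assumption (not from the source text): in sample i the two genes are
    co-altered with probability p_a[i] * p_b[i], so the overlap count X
    follows a Poisson-Binomial distribution.
    """
    co_probs = [a * b for a, b in zip(p_a, p_b)]
    pmf = poisson_binomial_pmf(co_probs)
    return sum(pmf[: observed_overlap + 1])


# Hypothetical per-sample alteration probabilities for two genes, four samples.
p_gene_a = [0.5, 0.2, 0.8, 0.1]
p_gene_b = [0.4, 0.5, 0.5, 0.3]
full_pmf = poisson_binomial_pmf([a * b for a, b in zip(p_gene_a, p_gene_b)])
p_value = mutex_p_value(p_gene_a, p_gene_b, observed_overlap=0)
```

A small lower-tail probability indicates fewer co-alterations than expected by chance. Because the recursion is quadratic in the number of samples and must be repeated for every gene pair (about two million pairs for 2000 genes), the exact test becomes slow on large cohorts.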

Results

We have implemented a new version of the package, called Rediscover, that provides both exact and approximate computations of the PB. Rediscover's exact implementation is slightly faster than Discover for large and medium-sized datasets. The approximation is 100–1000 times faster on these datasets, making it possible to get results in less than a minute on a standard desktop. Rediscover also has a smaller memory footprint. The new package is available on CRAN and provides functions that integrate it with other R packages such as maftools and TCGAbiolinks.
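The trade-off between exact and approximate PB computations can be illustrated with a short Python sketch. The approximation used here is a plain normal approximation with continuity correction, chosen only because it is simple. It is not claimed to be the approximation implemented in Rediscover, and the function names and probabilities are hypothetical.

```python
import math
from typing import List


def exact_cdf(probs: List[float], k: int) -> float:
    """Exact P(X <= k) for a Poisson-Binomial variable, O(n^2) time."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for j, mass in enumerate(pmf):
            nxt[j] += mass * (1.0 - p)
            nxt[j + 1] += mass * p
        pmf = nxt
    return sum(pmf[: k + 1])


def normal_approx_cdf(probs: List[float], k: int) -> float:
    """Approximate P(X <= k) by matching mean and variance, O(n) time.

    Illustrative stand-in only: a normal approximation with a 0.5
    continuity correction, not the package's own method.
    """
    mean = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:
        # Degenerate case: every trial is certain, so X equals its mean.
        return 1.0 if k >= mean else 0.0
    z = (k + 0.5 - mean) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical success probabilities for 200 samples (values 0.05 to 0.32).
probs = [0.05 + 0.03 * (i % 10) for i in range(200)]
exact = exact_cdf(probs, 30)
approx = normal_approx_cdf(probs, 30)
```

The approximate version needs only two sums over the samples, whereas the exact version rebuilds the full distribution. That difference in work per test is the kind of saving that makes an approximation attractive when millions of gene pairs are tested, at the price of a small error in the tail probability.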

Availability and implementation

Rediscover is available at CRAN (https://cran.r-project.org/web/packages/Rediscover/index.html).

Supplementary information

Supplementary data are available at Bioinformatics online.

Most cited references (3 of 8 shown)

Maftools: efficient and comprehensive analysis of somatic variants in cancer

Anand Mayakonda, De-Chen Lin, Yassen Assenov … (2018)

Numerous large-scale genomic studies of matched tumor-normal samples have established the somatic landscapes of most cancer types. However, the downstream analysis of data from somatic mutations entails a number of computational and statistical approaches, requiring usage of independent software and numerous tools. Here, we describe an R Bioconductor package, Maftools, which offers a multitude of analysis and visualization modules that are commonly used in cancer genomic studies, including driver gene identification, pathway, signature, enrichment, and association analyses. Maftools only requires somatic variants in Mutation Annotation Format (MAF) and is independent of larger alignment files. With the implementation of well-established statistical and computational methods, Maftools facilitates data-driven research and comparative analysis to discover novel results from publicly available data sets. In the present study, using three of the well-annotated cohorts from The Cancer Genome Atlas (TCGA), we describe the application of Maftools to reproduce known results. More importantly, we show that Maftools can also be used to uncover novel findings through integrative analysis.

COSMIC: the Catalogue Of Somatic Mutations In Cancer

John G Tate, Sally Bamford, Harry C Jubb … (2018)

COSMIC, the Catalogue Of Somatic Mutations In Cancer (https://cancer.sanger.ac.uk) is the most detailed and comprehensive resource for exploring the effect of somatic mutations in human cancer. The latest release, COSMIC v86 (August 2018), includes almost 6 million coding mutations across 1.4 million tumour samples, curated from over 26 000 publications. In addition to coding mutations, COSMIC covers all the genetic mechanisms by which somatic mutations promote cancer, including non-coding mutations, gene fusions, copy-number variants and drug-resistance mutations. COSMIC is primarily hand-curated, ensuring quality, accuracy and descriptive data capture. Building on our manual curation processes, we are introducing new initiatives that allow us to prioritize key genes and diseases, and to react more quickly and comprehensively to new findings in the literature. Alongside improvements to the public website and data-download systems, new functionality in COSMIC-3D allows exploration of mutations within three-dimensional protein structures, their protein structural and functional impacts, and implications for druggability. In parallel with COSMIC’s deep and broad variant coverage, the Cancer Gene Census (CGC) describes a curated catalogue of genes driving every form of human cancer. Currently describing 719 genes, the CGC has recently introduced functional descriptions of how each gene drives disease, summarized into the 10 cancer Hallmarks.

New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx

Mohamed Mounir, Marta Lucchetta, Tiago Silva … (2019)

The advent of Next-Generation Sequencing (NGS) technologies has opened new perspectives in deciphering the genetic mechanisms underlying complex diseases. Nowadays, the amount of genomic data is massive and substantial efforts and new tools are required to unveil the information hidden in the data. The Genomic Data Commons (GDC) Data Portal is a platform that contains different genomic studies including the ones from The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiatives, accounting for more than 40 tumor types originating from nearly 30000 patients. Such platforms, although very attractive, must make sure the stored data are easily accessible and adequately harmonized. Moreover, they have the primary focus on the data storage in a unique place, and they do not provide a comprehensive toolkit for analyses and interpretation of the data. To fulfill this urgent need, comprehensive but easily accessible computational methods for integrative analyses of genomic data that do not renounce a robust statistical and theoretical framework are required. In this context, the R/Bioconductor package TCGAbiolinks was developed, offering a variety of bioinformatics functionalities. Here we introduce new features and enhancements of TCGAbiolinks in terms of i) more accurate and flexible pipelines for differential expression analyses, ii) different methods for tumor purity estimation and filtering, iii) integration of normal samples from other platforms, and iv) support for other genomics datasets, exemplified here by the TARGET data. Evidence has shown that accounting for tumor purity is essential in the study of tumorigenesis, as these factors promote confounding behavior regarding differential expression analysis. With this in mind, we implemented these filtering procedures in TCGAbiolinks. Moreover, a limitation of some of the TCGA datasets is the unavailability or paucity of corresponding normal samples. We thus integrated into TCGAbiolinks the possibility to use normal samples from the Genotype-Tissue Expression (GTEx) project, which is another large-scale repository cataloging gene expression from healthy individuals. The new functionalities are available in the TCGAbiolinks version 2.8 and higher released in Bioconductor version 3.7.

Author and article information

Contributors

Juan A Ferrer-Bonsoms

Angel Rubio

Journal

Title: Bioinformatics

Publisher: Oxford University Press (OUP)

ISSN (Print): 1367-4803

ISSN (Electronic): 1460-2059

Publication date Other: February 01 2022

Publication date (Print): January 12 2022

Publication date (Electronic): October 19 2021

Volume: 38

Issue: 3

Pages: 844-845

Affiliations

[1] Department of Biomedical Engineering and Sciences, TECNUN, University of Navarra, San Sebastian, Spain

Article

DOI: 10.1093/bioinformatics/btab709

PubMed ID: 34664620

SO-VID: 6df39c0f-f46b-40a6-b327-8d193069fe11

License:

https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model
