Global computational alignment of tumor and cell line transcriptional profiles

Abstract

Cell lines are key tools for preclinical cancer research, but it remains unclear how well they represent patient tumor samples. Direct comparisons of tumor and cell line transcriptional profiles are complicated by several factors, including the variable presence of normal cells in tumor samples. We thus develop an unsupervised alignment method (Celligner) and apply it to integrate several large-scale cell line and tumor RNA-Seq datasets. Although our method aligns the majority of cell lines with tumor samples of the same cancer type, it also reveals large differences in tumor similarity across cell lines. Using this approach, we identify several hundred cell lines from diverse lineages that present a more mesenchymal and undifferentiated transcriptional state and that exhibit distinct chemical and genetic dependencies. Celligner could be used to guide the selection of cell lines that more closely resemble patient tumors and improve the clinical translation of insights gained from cell lines.

Abstract

The determination of whether cancer cell lines recapitulate the molecular features of corresponding patient tumours remains essential for the selection of appropriate cell line models for preclinical studies. The method developed here, Celligner, integrates cancer cell line and tumour RNA-seq datasets and reveals large differences in their concordance across cell lines and cancer types.

Author and article information

Contributors

James M. McFarland:

ORCID: http://orcid.org/0000-0001-9978-480X

jmmcfarl@broadinstitute.org

Journal

Journal ID (nlm-ta): Nat Commun

Journal ID (iso-abbrev): Nat Commun

Title: Nature Communications

Publisher: Nature Publishing Group UK (London)

ISSN (Electronic): 2041-1723

Publication date (Electronic): 4 January 2021

Publication date PMC-release: 4 January 2021

Publication date Collection: 2021

Volume: 12

Electronic Location Identifier: 22

Affiliations

[1] Broad Institute of MIT and Harvard, Cambridge, MA, USA (GRID grid.66859.34)

[2] Department of Medical Oncology, Dana Farber Cancer Institute, Boston, MA, USA (GRID grid.65499.37; ISNI 0000 0001 2106 9910)

[3] Harvard Medical School, Boston, MA, USA (GRID grid.38142.3c; ISNI 0000 0004 1936 754X)

Author information

William C. Hahn http://orcid.org/0000-0003-2840-9791

Jesse S. Boehm http://orcid.org/0000-0002-6795-6336

Francisca Vazquez http://orcid.org/0000-0002-2857-4685

Aviad Tsherniak http://orcid.org/0000-0002-3797-1877

James M. McFarland http://orcid.org/0000-0001-9978-480X

Article

Publisher ID: 20294

DOI: 10.1038/s41467-020-20294-x

PMC ID: 7782593

PubMed ID: 33397959

SO-VID: 20fbf918-633f-4038-8cc5-88ab66336902

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received: 6 April 2020

Date accepted: 24 November 2020

Custom metadata

ScienceOpen disciplines: Uncategorized

Keywords: cancer genomics, cancer models, data integration
