Pancreatic ductal adenocarcinoma (PDA) is a lethal disease. Overall survival is typically six months from diagnosis 1 . Numerous phase III trials of agents effective in other malignancies have failed to benefit unselected PDA populations, although patients do occasionally respond. Studies in other solid tumors have shown that heterogeneity in response is determined, in part, by molecular differences between tumors. Further, treatment outcomes are improved by targeting drugs to tumor subtypes in which they are selectively effective, with breast 2 and lung 3 cancers providing recent examples. Identification of PDA molecular subtypes has been frustrated by a paucity of tumor specimens available for study. We have overcome this problem by combined analysis of transcriptional profiles of primary PDA samples from several studies along with human and mouse PDA cell lines. We define three PDA subtypes: classical, quasi-mesenchymal, and exocrine-like, and present evidence for clinical outcome and therapeutic response differences between them. We further define gene signatures for these subtypes that may have utility in stratifying patients for treatment and present preclinical model systems that may be used to identify new subtype specific therapies. Global gene expression analysis has proved useful for subtype identification in many human tumor types 4 . We approached PDA subtype identification by first identifying intrinsically variable (standard deviation > 0.8) genes in two gene expression microarray datasets from resected PDA. We generated one dataset using microdissected PDA material (UCSF tumors, n=27) from clinical samples for which information on survival duration was available and the second was previously published (Badea, et al.) 5 . To increase power, we merged these two datasets using the distance weighted discrimination (DWD) method 6,7 and included intrinsically variable genes common to both studies. We then performed nonnegative matrix factorization (NMF) analysis with consensus clustering 8 to identify subtypes of the disease. This analysis supported up to three subtypes (cophenetic coefficient >0.99; Supplementary Figs. 1, 2a and Supplementary Tables 1–3). We then developed a gene signature by using subtypes defined in NMF analysis of the merged clinical datasets to supervise significance analysis of microarrays (SAM) analysis 9 with false discovery rate (FDR) less than 0.001. This resulted in a 62 gene signature, designated PDAssigner. The three PDA subtypes in the merged clinical dataset and their expression of PDAssigner genes are shown in Fig. 1a. We designated these subtypes as classical, quasi-mesenchymal (QM-PDA) and exocrine-like, based on our interpretation of subtype specific gene expression. The classical subtype had high expression of adhesion-associated and epithelial genes, the QM-PDA subtype showed high expression of mesenchyme associated genes. The exocrine-like subtype showed relatively high expression of tumor cell derived digestive enzyme genes, with immunohistochemical staining supporting this observation (Supplementary Fig. 3). Analysis of PDAssigner gene expression in three additional published PDA expression datasets of unique origin, platform or processing 10–12 also supported these three subtypes (Supplementary Fig. 4) demonstrating the robust nature of the subtype classification in early stage PDA. Survival after PDA resection has been associated with many factors including stage (tumor size and nodal involvement) and grade (degree of differentiation) 13 , but no one factor has been consistently prognostic 14,15 . We found that stratification by PDA transcriptional subtype provided useful prognostic information in one PDA dataset (UCSF) for which clinical annotation was available. Specifically, patients with classical subtype tumors fared better than patients with QM-PDA subtype tumors after resection (p=0.038, log rank, Fig. 1b). In this same data set, stage and grade were not significantly related (p>0.99), stage was not significantly associated with subtype (p=0.40), while grade was (p=0.041) (univariate analyses). In a multivariate Cox regression model including stage and subtype, subtype was an independent predictor of overall survival (p=0.024) indicating that PDA subtype independently contributed prognostic information to pathological staging in PDA. Associations of PDA subtype with other clinical variables are summarized in Supplementary Table 4. This analysis supports the use of subtypes (as defined using PDAssigner) as an independent prognostic indicator in resected PDA. Further validation of PDA subtypes and preclinical identification of subtype specific therapeutic agents would be facilitated by the availability of laboratory models of the subtypes. Therefore, we asked if the PDA subtypes were represented in a collection of 19 human and 15 mouse PDA cell lines. The 19 human PDA cell lines were selected from publicly available PDA lines while the 15 mouse lines were derived by us from genetically engineered Tp53 −/− and INK4A −/−16 models of PDA. The analyses of the human and mouse PDA cell lines using the 62 PDAssigner genes are shown in Fig. 1c,d, as well as in Supplementary Figs. 2b–e. Supplementary Tables 5 and 6. These cell line collections contain representatives of the classical and QM-PDA subtypes, but not the exocrine-like subtype. The absence of the exocrine-like subtype in the cell line collection raises the possibility that this subtype is an artifact of contaminating normal pancreas tissue adjacent to tumor. However, the UCSF samples were prepared by microdissection to enrich for PDA cancer cells thereby minimizing contaminating tumor-associated stroma and/or adjacent normal exocrine pancreas. This dataset includes the exocrine-like subtype, which argues that it is a bona fide PDA subtype. Thus, we conclude that the cell line collections model two of the PDA subtypes thereby enabling exploration of biological differences between these two PDA subtypes and may facilitate screening for classical and QM-PDA subtype-specific therapeutic agents and biological properties. Two genes associated with PDA subtypes, GATA binding protein 6 (GATA6) and v-ki-ras2 kirsten rat sarcoma viral oncogene homolog (KRAS), two variable genes in our UCSF PDA dataset, Supplementary Table 1a), have been implicated in both aspects of normal development and cancer pathophysiology in published studies. GATA-family transcription factors are associated with tissue specific differentiation and have been demonstrated to be subtype specific markers in other cancers. For example, GATA binding protein 3 (GATA3) is required for luminal differentiation in the breast 17 , and high GATA3 characterizes luminal subtype breast cancers 18 . Likewise, GATA6 is essential for pancreatic development 19 and has been implicated in PDA 20,21 . GATA6 is highly expressed in most classical subtype tumors and cell lines, and comparatively low in the QM-PDA cell lines and tumors. Additionally, a previously published gene signature associated with GATA6 overexpression 20 is enriched in the classical subtype (Supplementary Fig. 5). Seeking to establish a possible functional role underlying the observed differences in GATA6 expression, we assessed the impact of GATA6-directed RNAi knockdown on colony formation in soft agar in the classical and QM-PDA cell lines. GATA6 depletion impaired anchorage-independent growth in classical PDA cell lines, but not in QM-PDA cell lines (Fig. 2), consistent with a functional, subtype-specific role for this transcription factor in the classical PDA subtype. Recent work in the mouse demonstrates that PDA can arise from a variety of precursor cells by activating KRAS in distinct cellular compartments of the pancreas 22 . Others have found that cancer cell lines harboring mutant KRAS differ in their dependence on KRAS 23 . These findings imply plasticity in either reliance on KRAS signaling or a cell-type specific role for mutant KRAS in different cells of origin/lineages in PDA, or both. They further suggested to us that despite KRAS mutation in most PDAs, KRAS dependence might differ by PDA subtype. We found KRAS mRNA levels elevated in classical subtype PDAs relative to the other subtypes (Supplementary Fig. 6, p 0.8. We then merged SD-selected datasets using DWD 7 as described 6 . We column (samples) normalized to N(0,1) and row (probe or gene) normalized each dataset by median centering. We merged the processed datasets using DWD and finally, we median centered the rows. NMF and SAM Analysis We analyzed the merged datasets by consensus clustering-based NMF 8 to identify stable subtypes using R code from GenePattern 30 . See supplement for details regarding the interpretation of subtypes derived from consensus-based NMF clustering. We identified PDAssigner genes using three-class SAM analysis based on classes identified from NMF analysis using the Bioconductor 31 package, Siggenes, and generated heatmaps of samples by PDAssigner genes using Cluster software 32 . For cell line classification, we merged the cell line datasets with core PDA clinical datasets after selection of the 62 PDAssigner genes from each dataset followed by DWD based merging. We visualized datasets using the Hierarchical Clustering Viewer (HCV) from GenePattern 30 . Clinical Outcome Analysis We calculated overall survival in days from the time of PDA resection until date of death as defined by the State of California Death Registry and clinical records. We employed the log-rank test for univariate associations with survival or the Cox proportional hazards model for multivariate modeling of survival. We applied Fisher’s exact test to investigate the relationships among subtype, stage and grade. We used the R language for all analyses. We drew the survival curve using web-based GenePattern 30 . Drug Sensitivity We plated 2.5x103 cells per well on day 0, treated with erlotinib or gemcitabine in nine, five-fold serial dilutions on day 1 and estimated cell number using Cell Titre Glow assay (CTG, Promega) on day 4. IC50 was calculated using the Calcusyn program (Biosoft). RNAi We obtained validated pLKO-based shRNA vectors shKRAS#5 33 from Dr. B.R. Stockwell (NYU) and shGATA6#5, and shLuc 34 from Dr. R Adam, (Boston Children’s Hospital). We packaged lentiviruses, transduced cells and then selected in puromycin for 48 hours. For shKRAS proliferation experiments, we plated 2.5 x103 transduced cells per well on day 0 in 96 well plates, then counted one plate on day one and the other plate on day four. We confirmed protein knockdown by western blotting using the Odyssey system, with 10ug per lane of total protein and the c19 KRas antibody (Santa Cruz), normalized to total actin (I-19, Santa Cruz) and compared to pLKOshLuc -KRas levels. For GATA6 knockdown experiments, after puromycin selection cells we trypsinized and plated transduced cells in soft agar as described 35 . We assessed GATA6 transcript levels on the day of plating in soft agar as described 34 . See Supplementary Information for detailed methods. Supplementary Material 1 2 3 4 5 6 7 8 9