We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome. The methods are designed for data collected by fluorescence in situ hybridization (FISH), an experimental technique especially well suited to characterizing intratumor heterogeneity using counts of probes to genetic regions frequently gained or lost in tumor development. Here, we develop new provably optimal methods for computing an edit distance between the copy number states of two cells given evolution by copy number changes of single probes, all probes on a chromosome, or all probes in the genome. We then apply this theory to develop a practical heuristic algorithm, implemented in publicly available software, for inferring tumor phylogenies on data from potentially hundreds of single cells by this evolutionary model. We demonstrate and validate the methods on simulated data and published FISH data from cervical cancers and breast cancers. Our computational experiments show that the new model and algorithm lead to more parsimonious trees than prior methods for single-tumor phylogenetics and to improved performance on various classification tasks, such as distinguishing primary tumors from metastases obtained from the same patient population.
Cancer is an evolutionary system whose growth and development is attributed to aberrations in well-known genes and to cancer-type specific genomic imbalances. Here, we present methods for reconstructing the evolution of individual tumors based on cell-to-cell variations between copy numbers of targeted regions of the genome. The methods are designed to work with fluorescence in situ hybridization (FISH), a technique that allows one to profile copy number changes in potentially thousands of single cells per study. Our work advances the prior art by developing theory and practical algorithms for building evolutionary trees of single tumors that can model gain or loss of genetic regions at the scale of single genes, whole chromosomes, or the entire genome, all common events in tumor evolution. We apply these methods on simulated and real tumor data to demonstrate substantial improvements in tree-building accuracy and in our ability to accurately classify tumors from their inferred evolutionary models. The newly developed algorithms have been released through our publicly available software, FISHtrees.