      Is Open Access

      deFuse: An Algorithm for Gene Fusion Discovery in Tumor RNA-Seq Data

      research-article


          Abstract

          Gene fusions created by somatic genomic rearrangements are known to play an important role in the onset and development of some cancers, such as lymphomas and sarcomas. RNA-Seq (whole transcriptome shotgun sequencing) is proving to be a useful tool for the discovery of novel gene fusions in cancer transcriptomes. However, algorithmic methods for the discovery of gene fusions using RNA-Seq data remain underdeveloped. We have developed deFuse, a novel computational method for fusion discovery in tumor RNA-Seq data. Unlike existing methods that use only unique best-hit alignments and consider only fusion boundaries at the ends of known exons, deFuse considers all alignments and all possible locations for fusion boundaries. As a result, deFuse is able to identify fusion sequences with demonstrably better sensitivity than previous approaches. To increase the specificity of our approach, we curated a list of 60 true positive and 61 true negative fusion sequences (as confirmed by RT-PCR), and have trained an AdaBoost classifier on 11 novel features of the sequence data. The resulting classifier has an estimated value of 0.91 for the area under the ROC curve. We have used deFuse to discover gene fusions in 40 ovarian tumor samples, one ovarian cancer cell line, and three sarcoma samples. We report herein the first gene fusions discovered in ovarian cancer. We conclude that gene fusions are not infrequent events in ovarian cancer and that these events have the potential to substantially alter the expression patterns of the genes involved; gene fusions should therefore be considered in efforts to comprehensively characterize the mutational profiles of ovarian cancer transcriptomes.
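
The classification step described in the abstract can be illustrated with a minimal sketch. This is not the deFuse implementation. The data are synthetic, the 11 features are random placeholders rather than the features used in the paper, and the stump-based AdaBoost, the even/odd hold-out split, and all function names (`make_synthetic`, `train_adaboost`, `score`, `roc_auc`) are assumptions made purely for illustration. Only the example counts (60 positives, 61 negatives, 11 features) are taken from the abstract.

```python
import math
import random


def make_synthetic(n_pos=60, n_neg=61, n_features=11, seed=0):
    """Synthetic stand-in for labelled fusion candidates (NOT real data).

    The first three features are shifted for positives; the rest are noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label, count in ((1, n_pos), (-1, n_neg)):
        shift = 1.0 if label == 1 else 0.0
        for _ in range(count):
            X.append([rng.gauss(shift if j < 3 else 0.0, 1.0)
                      for j in range(n_features)])
            y.append(label)
    return X, y


def stump(row, j, thr, pol):
    """One-feature threshold classifier returning +1 or -1."""
    return pol if row[j] >= thr else -pol


def train_adaboost(X, y, rounds=10):
    """Discrete AdaBoost over decision stumps; labels must be +1 / -1."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n                      # uniform initial example weights
    model = []                             # (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        best = None
        for j in range(d):
            for thr in sorted({row[j] for row in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, row, yi in zip(w, X, y)
                              if stump(row, j, thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) / division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump(row, j, thr, pol))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, row):
    """Weighted vote of all stumps; larger means more likely positive."""
    return sum(a * stump(row, j, t, p) for a, j, t, p in model)


def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties = 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def holdout_auc(seed=0):
    """Train on even-indexed examples, report AUC on odd-indexed ones."""
    X, y = make_synthetic(seed=seed)
    train = range(0, len(X), 2)
    test = range(1, len(X), 2)
    model = train_adaboost([X[i] for i in train], [y[i] for i in train])
    return roc_auc([score(model, X[i]) for i in test], [y[i] for i in test])


if __name__ == "__main__":
    print("hold-out AUC on synthetic data:", round(holdout_auc(), 3))
```

The AUC printed here reflects only the synthetic data and has no bearing on the 0.91 reported in the abstract. With so few labelled examples, a single hold-out split gives a noisy estimate, so a resampling scheme such as cross-validation would usually be preferred in practice.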

          Author Summary

          Genome rearrangements and associated gene fusions are known to be important oncogenic events in some cancers. We have developed a novel computational method called deFuse for detecting gene fusions in RNA-Seq data and have applied it to the discovery of novel gene fusions in sarcoma and ovarian tumors. We assessed the accuracy of our method and found that deFuse produces substantially better sensitivity and specificity than two other published methods. We have also developed a set of 60 positive and 61 negative examples that will be useful for accurate identification of gene fusions in future RNA-Seq datasets. We have trained a classifier on 11 novel features of the 121 examples, and show that the classifier is able to accurately identify real gene fusions. The 45 gene fusions reported in this study represent the first ovarian cancer fusions reported, as well as novel sarcoma fusions. By examining the expression patterns of the affected genes, we find that many fusions are predicted to have functional consequences and thus merit experimental follow-up to determine their clinical relevance.


                Author and article information

                Contributors
                Role: Editor
                Journal
                Title: PLoS Computational Biology (PLoS Comput Biol)
                Journal identifiers: plos, ploscomp
                Publisher: Public Library of Science (San Francisco, USA)
                ISSN: 1553-734X, 1553-7358
                Publication date: 19 May 2011
                Volume: 7
                Issue: 5
                Article number: e1001138
                Affiliations
                [1] Centre for Translational and Applied Genomics, BC Cancer Agency, Vancouver, British Columbia, Canada
                [2] School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
                [3] Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
                [4] Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
                [5] Department of Molecular Oncology, BC Cancer Agency, Vancouver, British Columbia, Canada
                Accelrys, United States of America
                Author notes

                Conceived and designed the experiments: AM AZ MAM MH TON SCS DH SPS. Performed the experiments: AM. Analyzed the data: AM. Contributed reagents/materials/analysis tools: AM. Wrote the paper: AM SPS. Assisted with algorithm development: FH. Software development: RG MGFS. Copy number variation analysis: GH. Development of precursor software: MG. Gene expression analysis: AHM. RT-PCR validation: JS. FISH validation: NM MP.

                Article
                Manuscript ID: 10-PLCB-RA-2589R4
                DOI: 10.1371/journal.pcbi.1001138
                PMC ID: 3098195
                PubMed ID: 21625565
                Record ID: 6fbd0e3c-0fa1-4d4c-8220-16082875d9e2
                McPherson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                Received: 28 July 2010
                Accepted: 18 April 2011
                Page count
                Pages: 16
                Categories
                Research Article
                Computational Biology/Genomics
                Computer Science/Applications
                Oncology/Genitourinary Cancers
                Oncology/Sarcomas

                Quantitative & Systems biology
