      Sources of bias in measures of allele-specific expression derived from RNA-seq data aligned to a single reference genome

          Abstract

          Background

          RNA-seq can be used to measure allele-specific expression (ASE) by assigning sequence reads to individual alleles; however, relative ASE is systematically biased when sequence reads are aligned to a single reference genome. Aligning sequence reads to both parental genomes can eliminate this bias, but this approach is not always practical, especially for non-model organisms. To improve accuracy of ASE measured using a single reference genome, we identified properties of differentiating sites responsible for biased measures of relative ASE.

          Results

          We found that clusters of differentiating sites prevented sequence reads from an alternate allele from aligning to the reference genome, causing a bias in relative ASE favoring the reference allele. This bias increased with greater sequence divergence between alleles. Increasing the number of mismatches allowed when aligning sequence reads to the reference genome and restricting analysis to genomic regions with fewer differentiating sites than the number of mismatches allowed almost completely eliminated this systematic bias. Accuracy of allelic abundance was increased further by excluding differentiating sites within sequence reads that could not be aligned uniquely within the genome (imperfect mappability) and reads that overlapped one or more insertions or deletions (indels) between alleles.
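The mechanism described above can be illustrated with a toy model. The following is a minimal sketch, not the authors' pipeline. It assumes equal true expression of both alleles, one read per start position per allele, and an aligner that rejects any read with more mismatches than `max_mismatches`. The function names, the coordinates, and the 36-base read length are hypothetical choices for illustration only.

```python
from bisect import bisect_left


def sites_in_window(sorted_sites, start, read_len):
    """Count differentiating sites inside the read window [start, start + read_len)."""
    return bisect_left(sorted_sites, start + read_len) - bisect_left(sorted_sites, start)


def reference_fraction(sites, focal, read_len, max_mismatches):
    """Expected fraction of aligned reads carrying the reference allele at `focal`.

    Assumption: reference-allele reads always align (zero mismatches), while an
    alternate-allele read carries one mismatch per differentiating site it spans.
    """
    sorted_sites = sorted(sites)
    ref_reads = 0
    alt_reads = 0
    # Every read that covers the focal site starts within read_len - 1 bases upstream of it.
    for start in range(max(0, focal - read_len + 1), focal + 1):
        ref_reads += 1
        if sites_in_window(sorted_sites, start, read_len) <= max_mismatches:
            alt_reads += 1
    return ref_reads / (ref_reads + alt_reads)


# Three differentiating sites clustered within 10 bases of each other.
clustered = [100, 105, 110]
for allowed in (1, 2, 3):
    print(allowed, reference_fraction(clustered, focal=105, read_len=36, max_mismatches=allowed))
```

In this toy model an isolated site yields a reference fraction of 0.5 with one mismatch allowed, whereas the clustered site is pushed toward 1.0 until the mismatch allowance is at least the number of sites a read can span.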

          Conclusions

          After aligning sequence reads to a single reference genome, measures of allelic abundance were comparable to those derived from aligning sequence reads to both parental genomes once three classes of differentiating sites were excluded: sites with at least as many neighboring differentiating sites as the number of mismatches allowed, sites with imperfect mappability, and sites with one or more indels nearby.
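The exclusion criteria in this conclusion can be sketched as a simple site filter. This is a hedged illustration rather than the authors' implementation. The `Site` record, its `mappable` and `near_indel` flags, and the definition of a "neighboring" site as any other differentiating site fewer than `read_len` bases away are assumptions introduced here; the abstract does not specify how neighbors, mappability, or indel proximity were computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical record for one differentiating site."""
    position: int
    mappable: bool = True     # assumed precomputed: reads over this site align uniquely
    near_indel: bool = False  # assumed precomputed: reads over this site overlap an indel


def filter_sites(sites, read_len, max_mismatches):
    """Keep only sites that pass all three exclusion criteria."""
    positions = [s.position for s in sites]
    kept = []
    for site in sites:
        # Assumption: a neighbor is any other site that a read covering this site could span.
        neighbors = sum(
            1 for p in positions
            if p != site.position and abs(p - site.position) < read_len
        )
        if neighbors >= max_mismatches:
            continue  # too many nearby differences for alternate-allele reads to align
        if not site.mappable or site.near_indel:
            continue
        kept.append(site)
    return kept


sites = [
    Site(100), Site(105), Site(110),  # a cluster of three sites
    Site(500),                        # an isolated, well-behaved site
    Site(900, mappable=False),
    Site(1300, near_indel=True),
]
print([s.position for s in filter_sites(sites, read_len=36, max_mismatches=2)])
```

Counting neighbors on both sides of a site is a conservative choice, since no single read can span the full window; a stricter per-read count would retain more sites.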

                Author and article information

                Contributors
                Journal
                BMC Genomics
                BioMed Central
                1471-2164
                2013
                7 August 2013
                Volume: 14
                Article number: 536
                Affiliations
                [1] Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
                [2] Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
                [3] Department of Molecular, Cellular, and Developmental Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI 48109, USA
                Article
                1471-2164-14-536
                DOI: 10.1186/1471-2164-14-536
                PMCID: 3751238
                PMID: 23919664
                5152fe25-fc69-4577-8eed-d7c6d41cb475
                Copyright ©2013 Stevenson et al.; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 14 February 2013
                Accepted: 5 August 2013
                Categories
                Methodology Article

                Genetics
                next-generation sequencing, mapping bias, Drosophila melanogaster, Drosophila simulans, DGRP, allelic imbalance, genomics, gene expression, Illumina
