      Spermatogenesis Studies Reveal a Distinct Nonsense-Mediated mRNA Decay (NMD) Mechanism for mRNAs with Long 3′UTRs

      PLoS Genetics
      Public Library of Science


          Abstract

          Extensive alternative splicing and polyadenylation of pre-mRNAs not only expands the protein coding potential of our genomes but also generates a wealth of mRNA isoforms with different 3′ untranslated regions (UTRs) [1,2]. Since 3′UTRs are major regulators of mRNA stability, localization, and translation, the tissue-specific, developmentally regulated, and stress-induced generation of alternative 3′UTRs greatly contributes to the posttranscriptional regulation of gene expression. It is an intriguing observation that mRNAs with the longest 3′UTRs are predominately present in the brain, whereas the testis is enriched in mRNA isoforms with shorter 3′UTRs [3,4]. In general, mRNAs with shorter 3′UTRs tend to be more stable because (i) they contain fewer binding sites for decay-inducing miRNAs or RNA-binding proteins (RBPs) and (ii) long 3′UTRs can trigger nonsense-mediated mRNA decay (NMD) [1,5]. The term NMD was initially coined to describe the accelerated degradation observed for mRNAs with nonsense mutations that prematurely truncate the open reading frame (ORF) [6]. NMD serves an important cellular quality control function by reducing the production of potentially deleterious C-terminally truncated proteins. However, genome-wide studies uncovered that beyond degrading aberrant mRNAs harboring premature translation termination codons (PTCs), NMD also targets many “normal” mRNAs encoding apparently full-length functional proteins, suggesting a broader biological function in posttranscriptional gene regulation [5,7]. The three conserved core factors (UPF1, UPF2, and UPF3) and additional metazoan-specific proteins are required for NMD in mammalian cells and, although the exact molecular mechanism of NMD is not known, inefficient or aberrant translation termination seems to be a key trigger for NMD. 
It has been empirically found that exon–exon junctions located >50 nucleotides downstream of the termination codon often trigger NMD, which is typically the case in aberrant PTC-containing transcripts, and that long 3′UTRs can also elicit NMD, a feature found in many of the PTC-free NMD targets [8–10]. Whether these two groups of NMD substrates are recognized and degraded by a common mechanism or whether they employ mechanistically distinct branches of NMD is a major unresolved question in the field. Two new studies on mouse spermatogenesis [11,12], both published in this issue of PLOS Genetics, now shed some unexpected new light on this question. During spermatogenesis, spermatogonia differentiate into spermatocytes, which undergo meiosis and postmeiotically develop into round and, further, elongated spermatids (Fig 1).

Fig 1 (10.1371/journal.pgen.1005979.g001). Illustration depicting mammalian spermatogenesis and the effect of a UPF2 or TDRD6 knockout on two different types of NMD-targeted mRNAs. Precursors of chromatoid bodies (CBs: red) begin to form in late-stage spermatocytes at the end of meiosis and condense into the typical CB structure in early round spermatids. CBs are enriched in different classes of RNA, NMD factors, and additional proteins, including TDRD6 and MVH.

Both papers report that the NMD factors UPF1 and UPF2 are highly expressed in postmeiotic spermatocytes and spermatids, where they are found in germ cell-specific perinuclear structures called chromatoid bodies (CBs). CBs consist of RNA, many RNA-binding proteins, helicases, and several members of the TUDOR-domain protein family (TDRDs); furthermore, CBs are best known for their role in piRNA biogenesis [13,14]. A conditional UPF2 knockout in spermatogonia led to infertile mice with small testes [11], resembling the “Sertoli-only syndrome” in humans; ablation of TDRD6, which disrupts CB formation and arrests spermatogenesis, gave a similar phenotype. 
Transcriptome profiling of UPF2 or TDRD6 knockout spermatocytes and round spermatids revealed in both cases a strong enrichment of transcripts with long 3′UTRs among the up-regulated RNAs, while the classical NMD targets (defined by the presence of an exon–exon junction >50 nucleotides downstream of the termination codon) were largely unaffected [11,12]. The UPF2 knockout data are reminiscent of a previous study showing that NMD induced by the presence of exon junction complex (EJC) factors eIF4A3, Y14, and MAGOH downstream of the PTC does not require UPF2 [15]. Together with the results from Bao and colleagues [11], this finding provides evidence for the existence of mechanistically different modes of NMD acting on different types of transcripts and, in particular, strongly suggests a UPF2-independent route of NMD for transcripts with EJCs in the 3′UTR. Because UPF2 is thought to play a crucial role in NMD by promoting the SMG1-mediated phosphorylation of UPF1 [5], it remains, however, unclear how UPF1 phosphorylation could be achieved in the postulated UPF2-independent NMD mode. That the TDRD6 knockout affects the transcriptome in the same way as the UPF2 knockout suggests that there must be a germ-line–specific aspect to NMD. Fanourgakis and colleagues provide several lines of evidence that, in spermatocytes and round spermatids, NMD occurs in the CBs [12]. In the absence of TDRD6, CB formation was disrupted, UPF1 no longer bound to UPF2 or to the CB component MVH (a Vasa-like helicase), and the association of UPF1 and UPF2 with long 3′UTR-containing mRNAs was reduced. Taken together, these observations imply that intact CBs are a prerequisite for this branch of NMD [12]. In contrast, the other NMD branch targeting mRNAs with EJCs in the 3′UTR does not require intact CBs because it was not affected by the TDRD6 knockout. It is noteworthy that piRNA biogenesis was not affected in the TDRD6-/- cells either. 
In summary, the two new studies [11,12] corroborate the view that what is currently called NMD might in fact represent more than one mechanistically distinct mRNA degradation pathway by genetically separating the UPF2- and TDRD6-dependent degradation of mRNAs with long 3′UTRs from the UPF2- and TDRD6-independent degradation of mRNAs with EJCs in the 3′UTR. It will be a future challenge to elucidate the exact mechanisms of the two different NMD routes. The two studies also establish an essential role for NMD in mammalian spermatogenesis by contributing to shaping the male-germ-cell–specific transcriptome, which is typified by mRNAs with unusually short 3′UTRs. While germ-line–specific alternative polyadenylation factors ensure that important mRNAs for sperm development are expressed with a short 3′UTR, a UPF2- and TDRD6-dependent branch of NMD clears ubiquitously expressed mRNAs with long 3′UTRs in these cells.

          Most cited references (10)


          Nonsense-mediated mRNA decay: an intricate machinery that shapes transcriptomes.

          Nonsense-mediated mRNA decay (NMD) is probably the best characterized eukaryotic RNA degradation pathway. Through intricate steps, a set of NMD factors recognize and degrade mRNAs with translation termination codons that are positioned in abnormal contexts. However, NMD is not only part of a general cellular quality control system that prevents the production of aberrant proteins. Mammalian cells also depend on NMD to dynamically adjust their transcriptomes and their proteomes to varying physiological conditions. In this Review, we discuss how NMD targets mRNAs, the types of mRNAs that are targeted, and the roles of NMD in cellular stress, differentiation and maturation processes.

            Nonsense-mediated mRNA decay - mechanisms of substrate mRNA recognition and degradation in mammalian cells.

            The nonsense-mediated mRNA decay (NMD) pathway is well known as a translation-coupled quality control system that recognizes and degrades aberrant mRNAs with truncated open reading frames (ORF) due to the presence of a premature termination codon (PTC). However, a more general role of NMD in posttranscriptional regulation of gene expression is indicated by transcriptome-wide mRNA profilings that identified a plethora of physiological mRNAs as NMD targets. In this review, we focus on mechanistic aspects of target mRNA identification and degradation in mammalian cells, based on the available biochemical and genetic data, and point out knowledge gaps. Translation termination in a messenger ribonucleoprotein particle (mRNP) environment lacking necessary factors for proper translation termination emerges as a key determinant for subjecting an mRNA to NMD, and we therefore review recent structural and mechanistic insight into translation termination. In addition, the central role of UPF1, its crucial phosphorylation/dephosphorylation cycle and dynamic interactions with other NMD factors are discussed. Moreover, we address the role of exon junction complexes (EJCs) in NMD and summarize the functions of SMG5, SMG6 and SMG7 in promoting mRNA decay through different routes. This article is part of a Special Issue entitled: RNA Decay mechanisms. Copyright © 2013 Elsevier B.V. All rights reserved.

              Biased alternative polyadenylation in human tissues

              Background Polyadenylation is essential for the 3'-end formation of most mRNAs in eukaryotes. It involves two tightly coupled steps, cleavage of a nascent mRNA and polymerization of a poly(A) tail at the 3' end of the cleaved RNA. An array of factors are involved in the process, including factors that seem to be exclusively involved in polyadenylation, such as cleavage-polyadenylation specificity factor (CPSF), cleavage stimulatory factor (CstF), cleavage factors (CFs) I and II, and poly(A) polymerase (PAP), and factors that are involved in both polyadenylation and other cellular processes, including transcription and mRNA splicing, such as RNA polymerase II, Symplekin [1,2], PC4 [3], Ssu72 [4], heterogeneous nuclear ribonucleoprotein (hnRNP) F [5], hnRNP H/H' [6], U2AF65 [7], U1A [8-10], polypyrimidine tract binding protein (PTB) [11], and SRp20 [12]. The fact that some factors are involved in both polyadenylation and transcription and mRNA splicing supports the notion that these processes are tightly coupled [13,14]. In addition, the processing efficiency of polyadenylation has a direct impact on the amount of mRNAs produced [15]. Abnormal processing efficiency can lead to human diseases such as thrombophilia [16]. Both biochemical and bioinformatic methods have been applied to the identification of cis-regulatory elements (or cis elements) for polyadenylation. The polyadenylation signal (PAS) is located 10 to 35 nucleotides (nt) upstream of the cleavage site, and serves as the binding site for CPSF. It is usually AAUAAA or a single nucleotide variant [17,18]. U/GU-rich elements are located within approximately 40 nt downstream of the cleavage site [19,20], serving as the binding site for CstF. In addition, several auxiliary upstream elements and downstream elements have been found in viral or cellular genes that can promote or repress polyadenylation [21-24]. 
Recent studies have shown that over half of the human genes have multiple polyadenylation sites (poly(A) sites) [18,25]. Like alternative initiation and alternative splicing, alternative polyadenylation (Alt-PA) contributes to the complexity of the transcriptome in human cells by producing mRNAs with different 3' untranslated regions (3'UTRs) and/or encoding variable protein isoforms [15]. The regulation of 3'UTRs by Alt-PA can have a different impact on the mRNA metabolism, as 3'UTRs can contain various regulatory elements, such as AU-rich elements responsible for mRNA stability [26,27] and miRNA target sequences involved in the regulation of mRNA translation [28-30]. The effect of Alt-PA on protein coding is usually coupled with alternative splicing [15], and has been demonstrated for several genes. Well-studied examples include regulation of the IgM heavy chain gene [31] and regulation of calcitonin/calcitonin gene-related peptide [32,33]. Many poly(A) sites are preferentially used in certain tissues and under specific cellular conditions [15,34]. It is not known, however, whether the pattern of poly(A) site usage is systematically different among human tissues, which could result in coordinate regulation of 3'UTRs or encoded proteins for a large number of genes. Here we describe our effort to study tissue-specific Alt-PA events using bioinformatic approaches. Using expressed sequence tag (EST) data and a newly developed method named GAUGE (for global study of poly(A) site usage by gene-based EST vote), we investigated 42 tissue types. We found that several tissues tend to use poly(A) sites that are biased toward certain locations of a gene, that is, 5' or 3' poly(A) sites. For poly(A) sites located in the 3'-most exon, biased usage was found in the nervous system, brain, pancreatic islet, ear, bone marrow, uterus, retina, placenta, ovary, and blood. 
For poly(A) sites located in introns or internal exons, biased usage was observed in cerebrum, soft tissue, pancreas, lung, prostate, skin, placenta, esophagus, eye, retina, and blood. In addition, we found that eye, retina, and placenta tend to use poly(A) sites not preferred in other tissues. Using microarray expression data of polyadenylation-related protein factors, we found that several brain tissues have high concordance with each other, and low concordance with other tissues. Finally, we identified several cis elements that are preferentially associated with brain-specific poly(A) sites. Taken together, our data suggest that systematic bias of Alt-PA occurs in several human tissues, and both cis elements and trans-acting factors are responsible for regulating Alt-PA. Results Positional preference of polyadenylation in human tissues We have previously shown that approximately 54% of human genes have multiple poly(A) sites [18]. Poly(A) sites can be located in various regions of a gene, including introns, internal exons, and 3'-most exons [18,25]. To address whether there are positional preferences of Alt-PA in human tissues, we evaluated tissue-specific poly(A) site usage based on the relative position of a poly(A) site in a gene. We have previously classified human genes into three types according to the locations of their poly(A) sites [18] (also shown in Figure 1 in Additional data file 1). Briefly, genes with only one poly(A) site are classified as type I genes, genes with multiple poly(A) sites all in the 3'-most exon as type II genes, and genes with poly(A) sites located in introns or internal exons as type III genes. Alt-PA of type II genes may result in mRNAs with variable 3'-UTRs, and the usage of poly(A) sites located in introns or internal exons of type III genes can potentially have an impact on protein sequence or lead to mRNAs with no in-frame stop codons. 
Thus, by investigating the poly(A) site usage of type II and III genes, one can address the question of whether Alt-PA leads to variable 3'-UTRs or protein products in certain tissue types. To this end, we classified poly(A) sites of type II genes into 2F (the 5'-most poly(A) site), 2L (the 3'-most poly(A) site), and 2M (middle poly(A) sites between 2F and 2L); and classified poly(A) sites of type III genes into 3U (poly(A) sites located upstream of the 3'-most exon) and 3D (poly(A) sites located in the 3'-most exon) (Figure 1 in Additional data file 1). Based on the EST tissue information obtained from the UniGene database [35] and assisted by the cDNA library classification made by Yeo et al. [36], we first grouped cDNA libraries into tissue types. In order to make quantitative comparisons, we used only non-normalized cDNA libraries. In total, we grouped 609 non-normalized cDNA libraries into 42 tissue types, corresponding to 86,495 poly(A/T)-tailed ESTs (Table 1 in Additional data file 1). To examine tissue-specific usage of the three types of poly(A) sites in type II genes (2F, 2M, and 2L) and the two types of poly(A) sites in type III genes (3U and 3D), we developed a method named GAUGE as follows. Type II or type III genes cast votes for the usage of different poly(A) site types in every tissue. The votes were based on the number of supporting ESTs for certain poly(A) sites, for example, 2F, 2M, or 2L. To account for the difference of expression levels among genes, the number of ESTs for each poly(A) site type was divided by the total number of ESTs supporting each gene. Thus, for each gene, the sum of the vote for all site types equals 1. The percentage of usage of a poly(A) site type in a tissue can then be measured by the votes cast by all genes expressed in that tissue divided by the number of genes. We then carried out Chi-squared tests to measure the bias of usage of different types of sites in each tissue. 
For each poly(A) site type in a tissue, its percent of usage was compared with the median percent of usage of all tissues. The difference was normalized to the median and called distance (Figure 1). Complete lists of values for all tissues are provided in Tables 2 and 3 in Additional data file 1. Several tissues were found to have significantly biased (p value 0.75) for most tissues, suggesting consistency of these two studies with respect to the factors investigated. As expected, a distinct cluster containing several brain tissues (amygdala, thalamus, caudate nucleus, fetal brain, and whole brain) can be discerned (average r-value 0.87 within the cluster), which showed low concordance with other tissues (average r-value 0.55 between the cluster and other tissues). The clustering result to some extent agrees with our studies using ESTs. For example, lung, ovary, placenta, and prostate, which were among the 25 tissues in the microarray studies, had significant positional bias towards 5' poly(A) sites (2F or 3U; Figure 1), and brain and cerebrum had a statistically significant positional preference for 3' poly(A) sites (2L or 3D; Figure 1). Consistent with these observations, expression data from brain tissues correlated poorly (mean r-value of 0.41) with those from placenta, lung, ovary, or prostate (Figure 3c). Two studies showed that, of the 21 genes, U1A and PTB had a similar expression pattern across tissues, as did PC4, PCF11, τCstF-64, and hnRNP H'. The difference between brain tissues and others was mainly attributable to the expression of four genes: The expression of PTB and U1A was consistently lower in brain tissues than in other tissues (p value +100) of a poly(A) site [18,37]. 
We divided the genomic sequence surrounding a poly(A) site into four regions: -40/-1 nt, where the PAS is located; +1/+40 nt, where U/GU-rich elements are usually located; -100/-41 nt, where auxiliary upstream elements may be located; and +41/+100 nt, where auxiliary downstream elements may be located (Figure 5a). In addition, we used the -300/-200 and +200/+300 regions as control regions, which, based on our current knowledge of cis elements for polyadenylation, should contain very few, if any, regulatory elements for polyadenylation. To identify cis elements in brain-specific poly(A) sites, we took an approach that is similar to the method described in [24]. Briefly, hexamers (4,096 in total) were assigned two scores: $z_{un}$, the difference between the frequency of occurrence in a specific sequence region, for example, -100/-41, of poly(A) sites used in brain tissues (a total of 2,495 sites) and those not used in the brain (a total of 3,297 sites); and $z_{pc}$, the difference between the frequency of occurrence in a specific poly(A) region and the frequency of occurrence in control regions. We then selected hexamers with both $z_{un}$ and $z_{pc}$ greater than 2.5. A z score of 2.5 corresponds to a p value of approximately 0.01 in a normal distribution. To avoid the identification of cis elements that are globally preferred, we filtered out hexamers with $z_{sw}$ > 2.5, where $z_{sw}$ is the difference between the frequency of occurrence in a specific sequence region of strong poly(A) sites and that of weak poly(A) sites, using 75% as the cutoff for classification of strong and weak sites. Selected hexamers were grouped by their mutual similarities, and groups with more than three hexamers were used to build sequence logos. In addition, position-specific scoring matrices (PSSMs) were derived from the logos, and used to search corresponding cis elements in poly(A) regions. 
We identified five putative elements that were significantly over-represented in various regions of poly(A) sites preferentially used in brain tissues (Figure 5). Among these, a GU element (Figure 5d, right panel) was identified in region +1/+40, which seems to be the binding site for CstF-64. GU elements should be general enhancers for polyadenylation. As we filtered out hexamers that are significantly associated with strong poly(A) sites, the fact that a GU element still remains indicates that the GU element is strongly biased to poly(A) sites used in brain tissues. This notion is in line with the difference between the percent of hits profile of the GU element in brain specific poly(A) sites compared to non-brain poly(A) sites (Figure 5d, right panel). As the expression of CstF-64 is similar between brain tissues and other tissues and the expression of τCstF-64 is significantly higher in brain tissues, the identified element could be the preferred binding site of τCstF-64. This prediction, however, needs to be validated in wet lab experiments. In addition, we found that the UCUUU element (Figure 5d, left panel) was over-represented in region +1/+40. UCUUU is known to be the binding site of PTB [11,41]. Interestingly, the UUC/GUG element identified in the -100/-41 region (Figure 5b, right panel) also resembles PTB binding sites. As shown by the microarray data, PTB expression is low in brain tissues, whereas the nPTB level is high. Thus, it will be interesting to examine whether nPTB binds to these cis elements and plays a role in poly(A) site selection in brain tissues. Furthermore, two other elements (Figure 5b, left panel and Figure 5c) seem to be related to U-rich elements and the AAUAAA PAS. Their significance is not clear, despite the fact that their percent of hits profiles differ between poly(A) sites preferred in brain tissues and those not preferred (Figure 5b,c). 
They could well be general regulatory elements that were not filtered out using $z_{sw}$ scores (see above). In line with this notion, both elements only had four supporting hexamers, whereas the GU element and UCUUU element had five supporting hexamers, and the UUC/GUG element had seven supporting hexamers (Figure 3 in Additional data file 1). Discussion We have detected biased poly(A) site usage in several human tissues using GAUGE. GAUGE was designed to detect systematic bias of poly(A) site usage in different tissues. The idea is that individual genes may not have the statistical power to reveal an overall trend, whereas significant patterns could emerge using a large number of genes. Although the numbers of cDNA libraries and ESTs for some tissue types were sufficient to allow us to make statistical conclusions, some others, such as heart and thymus, did not have enough for sensitive detection (Table 1 in Additional data file 1). If more ESTs become available, this approach could be carried out for these tissues in the future. On a similar note, an inherent limitation of our approach is that we could not assess the bias for individual genes due to lack of statistical power, which, at the current stage, is best addressed by wet lab experiments. For poly(A) sites located in the 3'-most exon, the nervous system, brain, pancreatic islet, ear, bone marrow, and uterus tend to use 3' poly(A) sites, whereas retina, placenta, ovary, and blood tend to use 5' poly(A) sites. This observation indicates that genes may express mRNAs with longer 3'UTRs in certain tissues than in others, and the pattern is systematically controlled. Consistent with our observation, it has been suggested that brain tissues tend to express larger genes than other tissues [42], presumably due to the low mitotic activity of highly differentiated cells in the brain allowing more time to express long transcripts. 
Our data also suggest that each tissue type may have a defined 'program' to produce mRNAs of a certain length. Given that 3'UTRs contain various RNA regulatory elements, it is conceivable that this mode of gene regulation could coordinately influence mRNA metabolism for a large number of genes. However, the exact impact of this systematic control needs to be explored in wet lab settings. In addition, lung, prostate, skin, placenta, esophagus, eye, retina, and blood were found to have higher usage of poly(A) sites located upstream of the 3'-most exon than other tissues. The usage of these poly(A) sites could result in truncated mRNAs without in-frame stop codons, or mRNAs encoding distinct protein isoforms. The coordinated regulation of poly(A) site usage could, therefore, lead to a switch in the expression of protein isoforms. As poly(A) sites located upstream of the 3'-most exon are next to introns and internal exons, regulation of this type of poly(A) site is complicated by other factors, such as transcription and mRNA splicing. For example, both the IgM heavy chain gene [31] and the calcitonin/calcitonin gene-related peptide gene [32,33] switch protein products by using different poly(A) sites under certain cellular conditions. In both cases, alternative splicing was also shown to be involved. We found that the expression of U1A, PC4, τCstF-64, PTB, and nPTB was significantly different between brain tissues and other tissues. The differences may contribute to the distinct Alt-PA pattern in the brain. It has been shown that brain tissues exhibit high levels of alternative splicing, especially exon skipping [36], which is consistent with our observation of a low expression level of PTB, a repressor of mRNA splicing [43], in brain tissues. It has also been shown that PTB can modulate polyadenylation efficiency by competing with CstF-64 for binding to downstream U/GU-rich elements [11]. 
nPTB shares high sequence homology with PTB [44,45] (Figure 4a in Additional data file 1), but its activity in regulating polyadenylation has not been studied. U1A can modulate polyadenylation by interacting with the poly(A) polymerase [10]. Furthermore, PC4 can regulate polyadenylation by interacting with CstF-64 [3]. τCstF-64 appears to be a paralog of CstF-64 (75% identity in protein sequence), which has been previously reported to be highly expressed in the brain and testis [46]. CstF-64 and τCstF-64 are highly homologous (>95% identity; Figure 4b in Additional data file 1) in both the amino-terminal RNA binding domain, which is responsible for interacting with U/GU-rich elements, and the carboxy-terminal 63 amino acid region, which has been implicated in binding to PC4 [3], indicating that the functions of CstF-64 and τCstF-64 may overlap extensively. Thus, nPTB and τCstF-64 appear to be functional homologs of PTB and CstF-64, respectively. Our observations that both nPTB and τCstF-64 mRNA levels are higher in brain tissues than other tissues, whereas the PTB mRNA level is lower in brain tissues and there is no difference in CstF-64 mRNA expression between brain tissues and other tissues (Figure 2 in Additional data file 1), indicate that brain tissues use a different set of genes to regulate splicing and polyadenylation, albeit their functions may be similar to their counterparts in other tissues. In this study, we used brain tissues as a model to correlate the presence of cis elements and expression of trans factors. The reason to choose brain tissues is that biased usage of poly(A) sites was observed in brain tissues and high concordance of gene expression of polyadenylation factors was detected among several brain tissues. The latter is important as microarray data often contain noise. Using two datasets and several brain tissues gave us assurance as to the quality of the data. On the other hand, we only focused on known polyadenylation factors. 
Other protein factors that may also be involved in the regulation of polyadenylation were not examined in this study. Nevertheless, the significant presence of PTB and CstF-64 binding sites near poly(A) sites preferentially used in brain tissues correlates with high expression levels of nPTB and τCstF-64, which suggests a model where nPTB and τCstF-64 function cooperatively in poly(A) site selection in brain tissues. However, the exact details of the interactions need to be investigated in the future. In addition, the role of PC4 in regulating poly(A) site selection in the brain is to be examined, as a higher level of PC4 was observed in brain tissues versus other tissues. Furthermore, the same approach for identifying tissue-specific usage of cis elements can be applied to other tissue types. Ear, retina, and placenta will be particularly interesting to study, as they were found to use poly(A) sites that are not frequently used in other tissues and all three tissues tend to use 5' poly(A) sites. Materials and methods Datasets and resources Genes with alternative poly(A) sites, their annotations including poly(A) positions and supporting EST evidence were obtained from polyA_DB [38]. General annotations of cDNA libraries were downloaded from the UniGene database [35]. A PERL script was used to determine whether a cDNA library is normalized or non-normalized. A cDNA library is classified as non-normalized if its annotation contains 'non-normaliz' or 'not normaliz', or does not contain the string 'normaliz' in any part of the annotation. Assisted by tissue annotations for cDNA libraries made by Yeo et al. [36], we grouped 609 non-normalized cDNA libraries into 42 tissue types. Microarray datasets [39,40] were downloaded from NCBI GEO [47]. Mappings of probe-sets to LocusLink IDs were obtained from the Affymetrix website [48]. 
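The library classification rule stated above (non-normalized if the annotation contains 'non-normaliz' or 'not normaliz', or lacks 'normaliz' altogether) can be sketched in a few lines. The original analysis used a PERL script that is not reproduced here; the Python function below is only an illustration, and its name and the case-insensitive matching are assumptions.

```python
def is_non_normalized(annotation: str) -> bool:
    """Apply the stated rule to one cDNA library annotation string.

    Assumption: matching is case-insensitive (the source does not say).
    """
    text = annotation.lower()
    # Explicit statements that the library was not normalized.
    if "non-normaliz" in text or "not normaliz" in text:
        return True
    # Otherwise the library counts as non-normalized only if
    # normalization is never mentioned at all.
    return "normaliz" not in text
```

Checking the explicit negative phrases first matters, because both of them also contain the substring 'normaliz'.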
Identification of biased usage of poly(A) sites in human tissues by GAUGE For genes with more than one poly(A) site, we used the number of supporting ESTs to classify strong or weak sites. To make robust assessment, we used three cutoffs for the classification, specifically, 60%, 75%, and 90%. For each cutoff, the poly(A) site with the percent of supporting ESTs above the cutoff was classified as a strong site. If there was a strong poly(A) site, other sites of the same gene were classified as weak sites. In addition, we required that the sum of ESTs for all weak sites must be above 1. Type II and type III genes were classified as previously described [18]. Poly(A) sites in type II genes were classified into 2F (the 5'-most poly(A) site), 2L (the 3'-most poly(A) site), and 2M (middle poly(A) sites between 2F and 2L); and poly(A) sites in type III genes were classified into 3U (poly(A) sites located upstream of the 3'-most exon) and 3D (poly(A) sites located in the 3'-most exon). To study the usage of poly(A) sites, we allowed each gene to cast votes for the usage of poly(A) site types according to supporting ESTs. The vote was calculated as follows:

\[ V_{p,t} = \sum_{g} \frac{e_{p,g,t}}{E_{g,t}} \]

where $V_{p,t}$ is the votes for the usage of poly(A) site type p (strong/weak; or 2F/2M/2L; or 3U/3D) in tissue type t; $e_{p,g,t}$ is the number of poly(A/T)-tailed ESTs supporting the usage of poly(A) site type p for gene g in tissue t; and $E_{g,t}$ is the total number of poly(A/T)-tailed ESTs associated with gene g in tissue t. Percent of usage of a poly(A) type in a tissue (also called observed usage) was calculated by dividing the votes by the total number of genes investigated in the tissue. The median usage for a poly(A) site type is the median of the percent of usage values for the type of all tissues. The distance to the median usage was calculated by (observed usage - median usage)/median usage. 
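The voting scheme described above can be illustrated with a minimal sketch. It assumes EST counts are available as a nested dictionary keyed by tissue, gene, and site type; the function names, the data layout, and the toy counts in the usage note are hypothetical and not taken from the study, whose analyses were carried out in R.

```python
from collections import defaultdict
from statistics import median


def gauge_usage(est_counts):
    """Per-tissue usage of each poly(A) site type.

    est_counts: {tissue: {gene: {site_type: number_of_supporting_ESTs}}}
    Each gene's votes are its EST counts divided by its total ESTs,
    so one gene contributes a total vote of 1 per tissue.
    """
    usage = {}
    for tissue, genes in est_counts.items():
        votes = defaultdict(float)
        n_genes = 0
        for sites in genes.values():
            total = sum(sites.values())
            if total == 0:
                continue  # gene has no supporting ESTs in this tissue
            n_genes += 1
            for site_type, n in sites.items():
                votes[site_type] += n / total
        usage[tissue] = {p: v / n_genes for p, v in votes.items()} if n_genes else {}
    return usage


def distance_to_median(usage, site_type):
    """(observed usage - median usage) / median usage, per tissue."""
    observed = {t: u.get(site_type, 0.0) for t, u in usage.items()}
    med = median(observed.values())
    if med == 0:
        raise ValueError("median usage is zero; distance is undefined")
    return {t: (v - med) / med for t, v in observed.items()}
```

For example, with toy counts `{"brain": {"g1": {"2F": 1, "2L": 3}}, "lung": {"g1": {"2F": 2, "2L": 2}}}`, `gauge_usage` gives brain a 2L usage of 0.75 and lung a 2L usage of 0.5. Dividing by each gene's total keeps highly expressed genes from dominating the tissue-level estimate.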
A chi-squared test was performed for each tissue with the null hypothesis that the usage of a given poly(A) site type in the tissue does not differ from the median usage. The analyses were carried out in R [49].

Microarray data analysis of trans-acting factors

mRNA expression data were obtained from the NCBI GEO database [47]. The average difference values were normalized to the 75th percentile within each chip. When more than one probe-set mapped to the same gene, the median value was used to represent the mRNA expression level. For tissue types with more than one sample, median values were used. Clustering of tissues and genes with respect to expression profiles of polyadenylation factors was carried out using the Cluster program [50], and the results were presented using TreeView [50].

Identification of candidate cis elements

Genomic regions -100/+100 nt surrounding the poly(A) sites were divided into four sub-regions: -100/-41 nt, -40/-1 nt, +1/+40 nt, and +41/+100 nt. Frequencies of occurrence of all 4,096 hexamers were calculated in each sub-region and in control regions (-300/-200 and +200/+300). Three scores were used to select hexamers in each sub-region: $z_{un}$, for the difference between the frequency of occurrence at poly(A) sites used in the brain and at poly(A) sites not used in the brain; $z_{pc}$, for the difference between the frequency of occurrence in the sub-region and the frequency of occurrence in the control regions; and $z_{sw}$, for the difference between the frequency of occurrence in the sub-region of strong poly(A) sites and that of weak poly(A) sites, using the 75% cutoff (see above). All z scores were calculated using the following equations. For hexamers in set a and set b, $z_{ab}$ was calculated as follows:

$$z_{ab}(H) = \frac{f_a(H) - f_b(H)}{\sqrt{p\,(1 - p)\left(\frac{1}{N_a} + \frac{1}{N_b}\right)}}$$

where

$$p = \frac{N_a f_a(H) + N_b f_b(H)}{N_a + N_b}$$

and $N_a$ and $N_b$ are the total number of hexamers associated with set a and set b, respectively. $f_a(H)$ and $f_b(H)$ are the frequency of occurrence of hexamer H in set a and set b, respectively.
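A minimal sketch of the z-score comparison follows, assuming the standard pooled two-proportion form, in which the frequencies of one hexamer in two sets are compared against their pooled frequency. The function name, its count-based arguments, and the zero-variance guard are choices made for this illustration.

```python
import math


def hexamer_z_score(count_a, n_a, count_b, n_b):
    """Pooled two-proportion z score for one hexamer in sets a and b.

    count_a, count_b: occurrences of the hexamer in each set.
    n_a, n_b: total number of hexamers counted in each set.
    """
    f_a = count_a / n_a
    f_b = count_b / n_b
    # Pooled frequency of the hexamer across both sets.
    p = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # hexamer absent from (or saturating) both sets
    return (f_a - f_b) / se


# Hypothetical counts: 30 of 1,000 hexamers in set a, 10 of 1,000 in set b.
z = hexamer_z_score(30, 1000, 10, 1000)
enriched_in_a = z > 2.5
```

A positive score means the hexamer is more frequent in set a, so swapping the two sets flips the sign.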
A cutoff of 2.5 was used to select hexamers that are significantly biased to one set, which corresponds to a p value of approximately 0.01 in a normal distribution. Thus, selection of hexamers by two criteria should result in less than one falsely identified hexamer (4,096 × 0.01 × 0.01 ≈ 0.4). Hexamers that had both $z_{un}$ and $z_{pc}$ above the cutoff, and $z_{sw}$ below the cutoff, were selected for further analysis.

Selected hexamers were grouped based on their mutual distances using the hierarchical clustering function in R with the average agglomerative method. The distance between two hexamers is their dissimilarity score (d), calculated as d = 6 - s, where s is a similarity score. s was calculated using a dynamic programming method for global sequence alignment that does not allow gaps; match and mismatch scores were 1 and 0, respectively. A cutoff of 2.5 was used to group hexamers. Only groups containing more than three hexamers after clustering were selected for further analysis.

Hexamers in the same group were aligned by a multiple sequence alignment method that uses the hexamer with the highest frequency of occurrence as the seed; all other hexamers were aligned to the seed. Aligned hexamers were used to build sequence logos representing cis elements with the Web Logo tool [51]. The height of each nucleotide in a sequence logo reflects the occurrence of the nucleotide in the cis element. Aligned hexamers were also used to generate position-specific scoring matrices (PSSMs), which were used to search sequences containing poly(A) sites. For each position in a given cis element, the score was calculated by:

$$S(n,p) = \log_2 \frac{f(n,p)}{f(n)}$$

where S(n,p) is the score for nucleotide n at position p, f(n,p) is the frequency of occurrence of nucleotide n at position p, and f(n) is the frequency of occurrence of nucleotide n in a specific poly(A) region. Sequences that scored positively against a PSSM were called hits.
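The PSSM scoring step can be sketched as follows. Several details here are assumptions rather than the published procedure: the hexamers are taken as already aligned and of equal length, a pseudocount keeps the logarithm defined for unseen nucleotides, a window's score is the sum of its per-position scores, and sequences contain only uppercase A, C, G, and T. The names `build_pssm` and `find_hits` are hypothetical.

```python
import math
from collections import Counter


def build_pssm(aligned_hexamers, background, pseudocount=1.0):
    """Per-position log2 odds: S(n,p) = log2(f(n,p) / f(n)).

    background maps each nucleotide to f(n), its frequency in the region.
    The pseudocount is an assumption of this sketch, not part of the source.
    """
    total = len(aligned_hexamers) + 4 * pseudocount
    pssm = []
    for pos in range(len(aligned_hexamers[0])):
        counts = Counter(h[pos] for h in aligned_hexamers)
        pssm.append({
            n: math.log2(((counts[n] + pseudocount) / total) / background[n])
            for n in "ACGT"
        })
    return pssm


def find_hits(sequence, pssm):
    """Return (start, window, score) for every window with a positive score."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][n] for i, n in enumerate(window))
        if score > 0:
            hits.append((start, window, score))
    return hits


# Hypothetical aligned hexamers and a uniform background, for illustration.
hexamers = ["TGTGTG", "TGTGTA", "TGTGTT", "TGTATG"]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pssm = build_pssm(hexamers, uniform)
hits = find_hits("AAATGTGTGAAA", pssm)
```

In practice the background frequencies would come from the specific poly(A) sub-region being searched, as the scoring formula requires, rather than from a uniform distribution.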
Additional data files

The following additional data are available with the online version of this paper. Additional data file 1 is a PDF containing supplemental tables and figures.

Supplementary Material

Additional data file 1

The tables list: Table 1, tissues, cDNA libraries, genes, ESTs, and poly(A) sites used in this study; Table 2, poly(A) site usage of type II genes; Table 3, poly(A) site usage of type III genes; Table 4, usage of strong and weak poly(A) sites using 60% as cutoff; Table 5, usage of strong and weak poly(A) sites using 75% as cutoff; Table 6, usage of strong and weak poly(A) sites using 90% as cutoff. The figures show: Figure 1, schematic representation of three types of genes; Figure 2, boxplots of mRNA expression levels of polyadenylation-related factors in brain tissues versus other tissues; Figure 3, selection of hexamers in different poly(A) regions; Figure 4, multiple sequence alignments of PTB and nPTB and of CstF-64 and τCstF-64.

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Genet
                PLoS Genetics
                Public Library of Science (San Francisco, CA USA )
                1553-7390
                1553-7404
                5 May 2016
                May 2016
                Volume: 12
                Issue: 5
                Article: e1005979
                Affiliations
                [001]Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
                University of Basel, SWITZERLAND
                Author notes

                The author has declared that no competing interests exist.

                Article
                Manuscript: PGENETICS-D-16-00304
                DOI: 10.1371/journal.pgen.1005979
                PMCID: PMC4858265
                PMID: 27149371
                © 2016 Oliver Mühlemann

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Page count
                Figures: 1, Tables: 0, Pages: 4
                Funding
                The author’s research is supported by the NCCR RNA & Disease and by grant 31003A-162986, both funded by the Swiss National Science Foundation. The funders had no influence on the preparation of the article.
                Categories
                Perspective
                Biology and life sciences
                Biochemistry
                Nucleic acids
                RNA
                Messenger RNA
                Biology and Life Sciences
                Cell Biology
                Cell Processes
                Cell Cycle and Cell Division
                Meiosis
                Spermatogenesis
                Biology and Life Sciences
                Cell Biology
                Chromosome Biology
                Meiosis
                Spermatogenesis
                Biology and Life Sciences
                Physiology
                Reproductive Physiology
                Spermatogenesis
                Medicine and Health Sciences
                Physiology
                Reproductive Physiology
                Spermatogenesis
                Biology and Life Sciences
                Cell Biology
                Cellular Types
                Animal Cells
                Germ Cells
                Sperm
                Spermatocytes
                Biology and Life Sciences
                Cell Biology
                Cellular Types
                Animal Cells
                Germ Cells
                Sperm
                Spermatids
                Biology and Life Sciences
                Genetics
                Gene Expression
                Gene Regulation
                Biology and life sciences
                Biochemistry
                Proteins
                RNA-binding proteins
                Biology and Life Sciences
                Computational Biology
                Genome Analysis
                Transcriptome Analysis
                Biology and Life Sciences
                Genetics
                Genomics
                Genome Analysis
                Transcriptome Analysis
                Biology and Life Sciences
                Genetics
                Gene Expression
                Polyadenylation

                Genetics
