Blog
About

  • Record: found
  • Abstract: found
  • Article: found
Is Open Access

Transcript changes in Vibrio cholerae in response to salt stress

, , , ,

Gut Pathogens

BioMed Central

Vibrio cholerae, Salt stress, Transcription, PCR array

Read this article at

Bookmark
      There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

      Abstract

      Vibrio cholerae, which is a serious human intestinal pathogen, often resides and thrives in estuaries but requires major self-regulation to overcome intestinal hyperosmotic stress or high salt stress in water and food. In the present study, we selected multiple O1 and O139 group V. cholerae strains that were isolated from different regions and during different years to study their salt tolerance. Based on the mechanisms that other bacteria use to respond to high salt stress, we selected salt stress-response related genes to study the mechanisms which V. cholerae responds to high salt stress.

      V. cholerae strains showed salt-resistance characteristics that varied in salt concentrations from 4% to 6%. However, group O1 and group O139 showed no significant difference in the degree of salt tolerance. The primary responses of bacteria to salt stress, including Na + exclusion, K + uptake and glutamate biosynthesis, were observed in V. cholerae strains. In addition, some sigma factors were up-regulated in V. cholerae strains, suggesting that V. cholerae may recruit common sigma factors to achieve an active salt stress response. However, some changes in gene transcript levels in response to salt stress in V. cholerae were strain-specific. In particular, hierarchical clustering of differentially expressed genes indicated that transcript levels of these genes were correlated with the degree of salt tolerance. Therefore, elevated transcript levels of some genes, including sigma factors and genes involved in peptidoglycan biosynthesis, may be due to the salt tolerance of strains. In addition, high salt-tolerant strains may recruit common as well as additional sigma factors to activate the salt stress response.

      Electronic supplementary material

      The online version of this article (doi:10.1186/s13099-014-0047-8) contains supplementary material, which is available to authorized users.

      Related collections

      Most cited references 28

      • Record: found
      • Abstract: found
      • Article: not found

      Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes

      Background Gene-expression analysis is increasingly important in many fields of biological research. Understanding patterns of expressed genes is expected to provide insight into complex regulatory networks and will most probably lead to the identification of genes relevant to new biological processes, or implicated in disease. Two recently developed methods to measure transcript abundance have gained much popularity and are frequently applied. Microarrays allow the parallel analysis of thousands of genes in two differentially labeled RNA populations [1], while real-time RT-PCR provides the simultaneous measurement of gene expression in many different samples for a limited number of genes, and is especially suitable when only a small number of cells are available [2,3,4]. Both techniques have the advantage of speed, throughput and a high degree of potential automation compared to conventional quantification methods, such as northern-blot analysis, ribonuclease protection assay, or competitive RT-PCR. Nevertheless, these new approaches require the same kind of normalization as the traditional methods of mRNA quantification. Several variables need to be controlled for in gene-expression analysis, such as the amount of starting material, enzymatic efficiencies, and differences between tissues or cells in overall transcriptional activity. Various strategies have been applied to normalize these variations. Under controlled conditions of reproducible extraction of good-quality RNA, the gene transcript number is ideally standardized to the number of cells, but accurate enumeration of cells is often precluded, for example when starting with solid tissue. Another frequently applied normalization scalar is the RNA mass quantity, especially in northern blot analysis. There are several arguments against the use of mass quantity. The quality of RNA and related efficiency of the enzymatic reactions are not taken into account. Moreover, in some instances it is impossible to quantify this parameter, for example, when only minimal amounts of RNA are available from microdissected tissues. Probably the strongest argument against the use of total RNA mass for normalization is the fact that it consists predominantly of rRNA molecules, and is not always representative of the mRNA fraction. This was recently evidenced by a significant imbalance between rRNA and mRNA content in approximately 7.5% of mammary adenocarcinomas [5]. Also, it has been reported that rRNA transcription is affected by biological factors and drugs [6,7,8]. Further drawbacks to the use of 18S or 28S rRNA molecules as standards are their absence in purified mRNA samples, and their high abundance compared to target mRNA transcripts. The latter makes it difficult to accurately subtract the baseline value in real-time RT-PCR data analysis. To date, internal control genes are most frequently used to normalize the mRNA fraction. This internal control - often referred to as a housekeeping gene - should not vary in the tissues or cells under investigation, or in response to experimental treatment. However, many studies make use of these constitutively expressed control genes without proper validation of their presumed stability of expression. But the literature shows that housekeeping gene expression - although occasionally constant in a given cell type or experimental condition - can vary considerably (reviewed in [9,10,11,12]). With the increased sensitivity, reproducibility and large dynamic range of real-time RT-PCR methods, the requirements for a proper internal control gene have become increasingly stringent. In this study, we carried out an extensive evaluation of 10 commonly used housekeeping genes in 13 different human tissues, and outlined a procedure for calculating a normalization factor based on multiple control genes for more accurate and reliable normalization of gene-expression data. Furthermore, this normalization factor was validated in a comparative study with frequently applied microarray scaling factors using publicly available microarray data. Results Expression profiling of housekeeping genes Primers were designed for ten commonly used housekeeping genes (ACTB, B2M, GAPD, HMBS, HPRT1, RPL13A, SDHA, TBP, UBC and YWHAZ) (see Table 1 for full gene name, accession number, function, chromosomal localization, alias, existence of processed pseudogenes, and indication that primers span an intron; see Table 2 for primer sequences). Special attention was paid to selecting genes that belong to different functional classes, which significantly reduces the chance that genes might be co-regulated. The expression level of these 10 internal control genes was determined in 34 neuroblastoma cell lines (independently prepared in different labs from different patients), 20 short-term cultured normal fibroblast samples from different individuals, 13 normal leukocyte samples, 9 normal bone-marrow samples, and 9 additional normal human tissues from pooled organs (heart, brain, fetal brain, lung, trachea, kidney, mammary gland, small intestine and uterus). The raw expression values are available as a tab-delimited file (see Additional data files). Single control normalization error To determine the possible errors related to the common practice of using only one housekeeping gene for normalization, we calculated the ratio of the ratios of two control genes in two different samples (from the same tissue panel) and termed it the single control normalization error, E (see Materials and methods). For two ideal internal control genes (constant ratio between the genes in all samples), E equals 1. In practice, observed E values are larger than 1 and constitute the erroneous E-fold expression difference between two samples, depending on the particular housekeeping gene used for normalization. E values were calculated for all 45 two-by-two combinations of control genes and 865 two-by-two sample combinations within the available tissue panels (neuroblastoma, fibroblast, leukocyte, bone marrow and a series of normal tissues from Clontech; that is, a total of 38,925 data points) (Figure 1). In addition, the systematic error distribution was calculated by analysis of repeated runs of the same control gene. The average 75th and 90th percentile E values are 3.0 (range 2.1-3.9), and 6.4 (range 3.0-10.9), respectively. Gene-stability measure and ranking of selected housekeeping genes It is generally accepted that gene-expression levels should be normalized by a carefully selected stable internal control gene. However, to validate the presumed stable expression of a given control gene, prior knowledge of a reliable measure to normalize this gene in order to remove any nonspecific variation is required. To address this circular problem, we developed a gene-stability measure to determine the expression stability of control genes on the basis of non-normalized expression levels. This measure relies on the principle that the expression ratio of two ideal internal control genes is identical in all samples, regardless of the experimental condition or cell type. In this way, variation of the expression ratios of two real-life housekeeping genes reflects the fact that one (or both) of the genes is (are) not constantly expressed, with increasing variation in ratio corresponding to decreasing expression stability. For every control gene we determined the pairwise variation with all other control genes as the standard deviation of the logarithmically transformed expression ratios, and defined the internal control gene-stability measure M as the average pairwise variation of a particular gene with all other control genes. Genes with the lowest M values have the most stable expression. Assuming that the control genes are not co-regulated, stepwise exclusion of the gene with the highest M value results in a combination of two constitutively expressed housekeeping genes that have the most stable expression in the tested samples. To manage the large number of calculations, we have written a Visual Basic Application (VBA) for Microsoft Excel - termed geNorm - that automatically calculates the gene-stability measure M for all control genes in a given set of samples (geNorm is freely available from the authors on request). The program enables elimination of the worst-scoring housekeeping gene (that is, the one with the highest M value) and recalculation of new M values for the remaining genes. Using this VBA applet, we ranked the ten control genes in the five tissue panels tested according to their expression stability (Figure 2, Table 3). In addition, the systematic variation was calculated as the pairwise variation, V, for repeated RT-PCR experiments on the same gene, reflecting the inherent machine, enzymatic and pipet variation. Normalization factor calculation based on the geometric mean of multiple control genes We concluded that in order to measure expression levels accurately, normalization by multiple housekeeping genes instead of one is required. Consequently, a normalization factor based on the expression levels of the best-performing housekeeping genes must be calculated. For accurate averaging of the control genes, we propose to use the geometric mean instead of the arithmetic mean, as the former controls better for possible outlying values and abundance differences between the different genes. The number of genes used for geometric averaging is a trade-off between practical considerations and accuracy. It is obvious that an accurate normalization factor should not include the rather unstable genes that were observed in some tissues. On the other hand, it remains relatively impractical to quantify, for example, eight control genes when only a few target genes need to be studied, or when only minimal amounts of RNA are available. Furthermore, it is a waste of resources to quantify more genes than necessary if all genes are relatively stably expressed and if the normalization factor does not significantly change whether or not more genes are included. Taking all this into consideration, we recommend the minimal use of the three most stable internal control genes for calculation of an RT-PCR normalization factor (NF n , n = 3), and stepwise inclusion of more control genes until the (n + 1)th gene has no significant contribution to the newly calculated normalization factor (NF n + 1). To determine the possible need or utility of including more than three genes for normalization, the pairwise variation V n/n + 1 was calculated between the two sequential normalization factors (NF n and NF n + 1) for all samples within the same tissue panel (with a ij = NF n,i and a ik = NF n + 1,i , n the number of genes used for normalization (3 ≤ n ≤ 9), and i the sample index; see Equations 2 and 3 in Materials and methods). A large variation means that the added gene has a significant effect and should preferably be included for calculation of a reliable normalization factor. For all tissue types, normalization factors were calculated for the three most stable control genes (that is, those with the lowest M value) and for seven additional factors by stepwise inclusion of the most stable remaining control gene. Pairwise variations were subsequently calculated for every series of NF n and NF n + 1 normalization factors, reflecting the effect of adding an (n + 1)th gene (Figure 3a). It is apparent that the inclusion of a fourth gene has no significant effect (that is, low V 3/4 value) for leukocytes, fibroblasts and bone marrow. This is also illustrated by the nearly perfect correlation between NF3 and NF4 values, as shown for fibroblasts in Figure 3b. On the basis of these data, we decided to take 0.15 as a cut-off value, below which the inclusion of an additional control gene is not required. For neuroblastoma and the pool of normal tissues, one and two additional genes, respectively, are necessary for reliable normalization (see also Figure 3b). The high V 8/9 and V 9/10 values for the normal pool, neuroblastoma and leukocytes corroborate very well the findings obtained by stepwise exclusion of the worst-scoring control gene (Figure 2). This analysis showed an initial steep decrease in average M value, pointing at two aberrantly expressed control genes for leukocytes and one unstable gene for neuroblastoma and the pool of normal tissues. Furthermore, the need to include additional control genes for these last two tissue panels is in keeping with the high variation in control-gene expression, as evidenced from Figure 2. Validation of proposed real-time RT-PCR normalization factors To assess the validity of the established gene-stability measure, that is, that genes with the lowest M values have indeed the most stable expression, we determined the gene-specific variation for each control gene as the variation coefficient of the expression levels after normalization. This coefficient should be minimal for proper housekeeping genes. Three different normalization factors were calculated, based on the geometric mean of three genes with, respectively, the lowest (NF3(1-3)), the highest (NF3(8-10)), and intermediate M values (NF3(6-8)) (as determined by geNorm). We subsequently determined the average gene-specific variation of the three genes with the most stable expression (that is, the lowest variation coefficient) for each normalization factor and within each tissue panel (Figure 4a). It is clear that the gene-specific variation in all tissue panels is by far the smallest when the data are normalized to NF3(1-3). This demonstrates that the gene-stability measure effectively identified the control genes with the most stable expression. To verify that a high M value is characteristic of an unstable or differentially expressed gene, we analyzed the expression level of MYCN - a highly differentially expressed protooncogene in neuroblastoma with prognostic value [13] - together with the set of ten housekeeping genes. MYCN was readily identified as the most differentially expressed gene, with an M value of 6.02 compared to 2.17 for the least stable control gene (B2M) in neuroblastoma. It was further observed that normalization with a single control gene consistently resulted in significantly higher gene-specific variations of the other control genes (data not shown), which underscores the improvement in normalization by using multiple housekeeping genes. To show that the associations between the best control genes are independent of cell proliferation, we analyzed the expression level of the proliferation marker PCNA in the neuroblastoma cancer cell lines, and determined the Spearman rank correlation coefficient between the raw expression levels of the four best housekeepers and the marker gene PCNA. From this analysis, it was clear that the control genes were - as expected - significantly correlated (p < 0.001, correlation coefficient between 0.60 and 0.76). In contrast, no correlation was observed between PCNA and three of the four control genes, and only a weak correlation (p = 0.024, coefficient = 0.43) between PCNA and control gene HPRT1. These data firmly demonstrate that the most stable control genes (identified by the geNorm algorithm) are not per se linked to the state of cell proliferation of the samples. To further validate the accuracy of geometric averaging of carefully selected control genes for normalization, the geometric means of housekeeping-gene expression levels obtained from publicly available microarray data were compared with commonly applied microarray normalization factors calculated for the same data. For this purpose, an 8,000-gene array data set [14] was chosen, containing nine of the ten control genes evaluated in this RT-PCR study. Two commonly applied microarray normalization factors (based on median ratio normalization, and total intensity normalization) [15,16,17] were determined for eight randomly selected hybridization sets. Subsequently, for each hybridization set, the background-corrected expression levels of nine housekeeping genes for the two fluorescence channels were imported into geNorm and ranked, as described for the RT-PCR data. As these microarray data originate from hybridizations of cell lines from various histological origin versus a reference pool of multiple cell lines, we have calculated the geometric mean of the five most stable control genes (NF5) for each hybridization set, in accordance to the recommendations for reliable normalization within a heterogeneous tissue panel (see previous paragraph). Alternatively, internal control genes were excluded in a stepwise manner until the M values of the remaining genes were below 0.7 (experimental value shown to eliminate the most variable and outlying genes in this microarray dataset). Depending on the hybridization set, seven to nine genes fitted this criterion, upon which the geometric mean was calculated (NF M < 0.7). Both normalization factors (NF5 and NF M < 0.7) were shown to be similar to the calculated microarray normalization factors (Figure 4b). Tissue-specific housekeeping gene expression To compare the control gene-expression levels within the heterogeneous group of all 13 tested tissues, the same set of control genes should be used for normalization. We therefore calculated the geometric mean of six control genes that were withheld from the set of ten genes after elimination of the two genes with the highest M value within each tissue panel (that is, B2M, RPL13A, ACTB and HMBS) (see Table 3). Given the large variety of tested tissues, this is the optimal strategy to eliminate most variation, and to allow direct comparison between the different samples. Under the assumption of equal PCR threshold cycle values for equal transcript numbers of different genes, an estimation of the transcript abundance of the various control genes can be made. Figure 5 shows that the ten tested genes belong to various abundance classes, with an approximately 400-fold expression difference between the most abundant (ACTB) and the rarest (HMBS) transcript. Although the overall abundance of a given control gene in the different tissues is relatively similar, we clearly observe tissue-specific expression differences, for example, B2M expression level is 112-fold higher in leukocytes compared to fetal brain, and ACTB shows an expression difference of 22-fold between fibroblasts and heart tissue. It is also clear that some genes have a relatively constant expression level (for example, UBC and HPRT1) compared to the differential expression pattern of others (for example, B2M and ACTB). Discussion Accurate normalization of gene-expression levels is an absolute prerequisite for reliable results, especially when the biological significance of subtle gene-expression differences is studied. Still, little attention has been paid to the systematic study of normalization procedures and the impact on the conclusions. For RT-PCR, there is a general consensus on using a single control gene for normalization purposes. A comprehensive literature analysis of expression studies that were published in high-impact journals during 1999 indicated that GAPD, ACTB, 18S and 28S rRNA were used as single control genes for normalization in more than 90% of cases [11]. As numerous studies reported that housekeeping gene expression can vary considerably [6,9,10,11,12], the validity of the conclusions is highly dependent on the applied control. Some laboratories have tried to find the optimal control gene for their experimental system, and often rRNA molecules were proposed as best references. These studies should be approached with some caution, as often only the variation in expression of the tested genes with respect to the mass loading of total RNA was assessed. As rRNA molecules make up the bulk of total RNA, they should indeed correlate very well with the total RNA mass, but that does not necessarily make them good control genes. As outlined in the introduction, total RNA and rRNA levels are not proper references, because of the observed imbalance between rRNA and mRNA fractions. In addition to searching for a stable control gene, we aimed at determining the errors related to the common practice of single control normalization. In this study, we provide evidence that a conventional normalization strategy based on a single housekeeping gene leads to erroneous normalization up to 3.0- and 6.4-fold in 25% and 10% of the cases, respectively, with sporadic cases showing error values above 20. This analysis showed that a few control genes were unstable and significantly differentially expressed in some tissue panels, as evidenced by the decrease from 5.9 to 4.5 for the 90th-percentile single control normalization error value for neuroblastoma when the B2M gene is omitted (data not shown). This finding agrees with the reported differential expression of B2M in neuroblastoma, corresponding to the stage of differentiation of the tumor cells [18]. The error-distribution curves not only reflect the stability of expression of the applied controls, but also the sample heterogeneity within a tissue panel, as noted from the less steep curve for the heterogeneous set of normal pooled tissues compared to the other, relatively homogeneous, tissue panels. In this regard, the issue has been raised that finding proper control genes is even more important when working with tissues of different histological origin [9]. The single control normalization error values point to inherent noisy oscillations in expression levels of the control genes, a finding which has been corroborated in other large-scale studies where several thousand genes were measured in different cells or tissues by microarray analysis. No gene was found on an 8,000-feature array that did not vary by ratios of at least twofold across a panel of 60 cell lines [14], and a set of genes frequently used for normalization (including GAPD and ACTB) was found to vary in expression by 7- to 23-fold [9]. Taken together, our data and these studies clearly show that ideal and universal control genes do not exist. This warrants the search for stably expressed genes in each experimental system, and for the development of an accurate normalization strategy. To validate the expression stability of the tested control genes without any prior assumption of a metric for standardization, we had initially measured the correlation between the raw, non-normalized expression levels of any two control genes, which should be nearly perfect for proper control genes. We observed, however, that the data range between the minimum and maximum expression levels, or any outlying value, could have a profound influence on the slope of the regression line, and consequently on the value of the correlation coefficient. This made Pearson and Spearman correlation coefficients unsuitable for this kind of analysis. We have therefore developed a new stability measure, based on the principle that the expression ratio of two proper control genes should be identical in all samples, regardless of the experimental condition or cell type, with increasing ratio variation corresponding to decreasing expression stability of one (or both) of the tested genes. The proposed standard deviation of log-transformed control gene ratios is a robust measure for the variation between two control genes, as it does not impose any requirements for normality or homoscedasticity of the data points. Furthermore, this measure is independent of the abundance difference between the genes, and is equally affected by any outlying or extreme ratio (that is, outliers for a sample with low or high overall expression, or outliers caused by an upregulated or downregulated gene have an equivalent increase in pairwise variation V). Logarithmic transformation of the ratios is required for symmetrical distribution of the data around zero, resulting in equal absolute values (but opposite signs) for a given ratio and the inverse ratio. As a result, the standard deviation of log-transformed ratios is identical to the standard deviation of log-transformed inverse ratios, which makes this measure characteristic for every combination of two genes. Having established a robust measure to assess the variation in expression of two control genes, we subsequently defined a gene-stability measure M as the average pairwise variation between a particular gene and all other control genes. Using a VBA applet geNorm developed in-house, we ranked ten commonly used housekeeping genes belonging to different functional and abundance classes according to their expression stability in five tested tissue panels. The clear decrease of M of the remaining control genes during stepwise exclusion of the worst-scoring gene points at differences in the stability of gene-specific expression and demonstrates that the remaining genes are more stably expressed than the excluded genes. Some tissue panels show a relatively steep initial decline, which reflects the exclusion of one or more aberrantly expressed control genes (for example, ACTB and HMBS for leukocytes), as also noticed from the single control normalization error analyses (see above). The average gene stability values of the remaining genes during stepwise elimination of the least stable control genes also indicates tissue-specific differences, with bone marrow and the pool of normal tissues having the lowest and highest overall expression variation, respectively. The latter is no surprise, given the larger tissue heterogeneity in this panel. The question of whether the observed high variation for neuroblastoma is a cancer-related phenomenon of deregulated expression is currently under further investigation. From these analyses, it is clear that there is no universal control gene suitable for all cell types. ACTB and B2M appear to be the worst-scoring genes, whereas UBC, GAPD and HPRT1 seem to be the best overall control genes, each belonging to the four most stable genes in four out of five tested tissues. However, these generalizations should be treated with caution. B2M appears to be one of the least stable control genes, but is nevertheless a good choice for normalization of leukocyte expression levels. This clearly demonstrates that a proper choice of housekeeping genes is highly dependent on the tissues or cells under investigation. This is even more important when considering the differences in transcript abundance of some control genes between different tissues. The large expression differences between the tissues tested for B2M and ACTB, for instance, would definitely result in large normalization errors if they were used for standardization. Interestingly, the observed tissue-specific expression of these control genes is in keeping with their known role or function: there is high B2M expression in leukocytes, where it is a major cell-surface marker, and relatively low non-muscle cytoskeletal ACTB expression in heart tissue, which is predominantly of muscular origin. In view of the inherent variation in expression of housekeeping genes, we recommend the use of at least three proper control genes for calculating a normalization factor, and present a procedure to determine whether or not more - and if so, how many - control genes were required for reliable normalization. This analysis clearly showed that three stable control genes sufficed for accurate normalization of samples with relatively low expression variation, whereas other tissue panels required a fourth, or even a fifth control gene to capture the observed variation. The purpose of normalization is to remove the sampling differences (such as RNA quantity and quality) in order to identify real gene-specific variation. For proper internal control genes, this variation should be minimal or none. To validate the gene-stability measure M and the geNorm algorithm to identify the most stable control genes in a set of samples, we have calculated the gene-specific variation for each gene as the coefficient of variation of normalized expression levels. To this end, the raw expression values were standardized to different normalization factors, calculated as the geomean of the most, intermediate, or least stable control genes (as determined by geNorm). The rationale of this analysis is that a normalization factor based on proper internal control genes should remove all nonspecific variation. In contrast, unstable control genes cannot completely remove the nonspecific variation, and even add more variation, resulting in larger so-called gene-specific variations for the tested control genes. This analysis clearly demonstrated that most nonspecific variation was removed when the most stable control genes (as determined by geNorm) were used for normalization, which proves that the novel stability measure and strategy presented here effectively allowed the stability of gene expression in the different tissue panels to be assessed. Further validation demonstrated that the geometric mean of carefully selected control genes is an accurate estimate of the mRNA transcript fraction, as determined by comparison with frequently applied microarray normalization factors. Although both RT-PCR normalization factors based on geometric averaging are relatively similar, the one based on at least seven control genes (that is, NF M < 0.7) is slightly more equivalent to the microarray-scaling factors. Two possible explanations can account for this observation. First, the five most stable control genes as determined by geNorm are based on only two RNA samples (that is, a Cy3-labeled reference pool, and a Cy5-labeled test sample), in contrast to the RT-PCR data, where 9 to 34 samples were used, resulting in more reliable estimation of the expression stability. Second, recent technical reports clearly state that array hybridization analyses experience considerable - often underestimated - variation and uncertainty at several levels. Accurate background fluorescence correction and spot quality assessment, among others, have been described as critical issues for reliable ratio estimation [19,20,21]. The higher variability associated with array hybridization results might thus explain the need for more control genes to normalize the data. Nevertheless, this study clearly showed that normalization based on the geometric mean of carefully selected control genes results in equivalent ratio estimation compared to commonly applied array scale factors, which validates its use for RT-PCR normalization. In addition, the method presented could easily be applied to normalize gene-expression levels resulting from microarray hybridization experiments, where only a limited number of genes are spotted, including some housekeeping genes. In conclusion, we described and validated a procedure to identify the most stable control genes in a given set of tissue samples, and to determine the optimal number of genes required for reliable normalization of RT-PCR data. The strategy presented can be applied to any number or kind of genes or tissues, and should allow more accurate gene-expression profiling. This is of utmost importance for studying the biological significance of subtle expression differences, and for confirmatory and/or extended analyses of microarray results by means of RT-PCR. Materials and methods Sample preparation Thirty-four neuroblastoma cell lines were grown to subconfluency according to standard culture conditions. RNA was isolated using the RNeasy Midi Kit (Qiagen) according to the manufacturer's instructions. Nine RNA samples from pooled normal human tissues (heart, brain, fetal brain, lung, trachea, kidney, mammary gland, small intestine and uterus) were obtained from Clontech. Blood and fibroblast biopsies were obtained from different normal healthy individuals. Thirteen leukocyte samples were isolated from 5 ml fresh blood using Qiagen's erythrocyte lysis buffer. Fibroblast cells from 20 upper-arm skin biopsies were cultured for a short time (3-4 passages) and harvested at subconfluency as described [22]. Bone marrow samples were obtained from nine patients with no hematological malignancy. Total RNA of leukocyte, fibroblast and bone marrow samples was extracted using Trizol (Invitrogen), according to the manufacturer's instructions. Real-time RT-PCR DNase treatment, cDNA synthesis, primer design and SYBR Green I RT-PCR were carried out as described [23]. In brief, 2 μg of each total RNA sample was treated with the RQ1 RNase-free DNase according to the manufacturer's instructions (Promega). Treated RNA samples were desalted (to prevent carry over of magnesium) before cDNA synthesis using Microcon-100 spin columns (Millipore). First-strand cDNA was synthesized using random hexamers and SuperscriptII reverse transcriptase according to the manufacturer's instructions (Invitrogen), and subsequently diluted with nuclease-free water (Sigma) to 12.5 ng/μl cDNA. RT-PCR amplification mixtures (25 μl) contained 25 ng template cDNA, 2x SYBR Green I Master Mix buffer (12.5 μl) (Applied Biosystems) and 300 nM forward and reverse primer. Reactions were run on an ABI PRISM 5700 Sequence Detector (Applied Biosystems). The cycling conditions comprised 10 min polymerase activation at 95°C and 40 cycles at 95°C for 15 sec and 60°C for 60 sec. Each assay included (in duplicate): a standard curve of four serial dilution points of SK-N-SH or IMR-32 cDNA (ranging from 50 ng to 50 pg), a no-template control, and 25 ng of each test cDNA. All PCR efficiencies were above 95%. Sequence Detection Software (version 1.3) (Applied Biosystems) results were exported as tab-delimited text files and imported into Microsoft Excel for further analysis. The median coefficient of variation (based on calculated quantities) of duplicated samples was 6%. Single control normalization error E For any given m tissue samples, real-time RT-PCR gene-expression levels a ij of n internal control genes are measured. For every combination of two tissue samples p and q, and every combination of two internal control genes j and k, the single control normalization error E was calculated (Equation 1). This is the fold expression difference between samples p and q when normalized to housekeeping gene j or k. ( j,k [1,n], p,q [1,m], j ≠ k and p ≠ q): Internal control gene-stability measure M For every combination of two internal control genes j and k, an array A jk of m elements is calculated which consist of log2-transformed expression ratios a ij /a ik (Equation 2). We define the pairwise variation V jk for the control genes j and k as the standard deviation of the A jk elements (Equation 3). The gene-stability measure M j for control gene j is the arithmetic mean of all pairwise variations V jk (Equation 4). ( j,k [1,n] and j ≠ k): V jk = st.dev (A jk )     (3) Normalization of array data Publicly available raw microarray data [14] were downloaded as tab-delimited files. Eight hybridization data sets were randomly selected and imported into Microsoft Excel software for further manipulation (MCF7, DU-145, 786-0, BC2, K562, A549, U251, and SK-OV-3). For each hybridization array, all spots with Cy3 or Cy5 fluorescence intensities below the average overall background level plus one standard deviation were discarded. Subsequently, a local background correction for each spot was applied. Two scale factors were calculated for each slide on the basis of median ratio normalization (median ratio set to 1) and total intensity normalization (equalized sum of fluorescence intensities for both channels). Nine housekeeping genes were identified by BLAST similarity or keyword search against the database of cDNA clones present on the array (see IMAGE clones listed in Table 1). Additional data files The raw expression values are available as a tab-delimited file. Supplementary Material Additional data file 1 Raw expression values Click here for additional data file
        Bookmark
        • Record: found
        • Abstract: found
        • Article: found
        Is Open Access

        A novel and universal method for microRNA RT-qPCR data normalization

        Background MicroRNAs (miRNAs) are an important class of gene regulators, acting on several aspects of cellular function such as differentiation, cell cycle control and stemness. Not surprisingly, deregulated miRNA expression has been implicated in a wide variety of diseases, including cancer [1]. Moreover, miRNA expression profiling of different tumor entities resulted in the identification of miRNA signatures correlating with patient diagnosis, prognosis and response to treatment [2]. Despite the small size of miRNA molecules, several technologies have been developed that enable high-throughput and sensitive miRNA profiling, such as microarrays [3-8], real-time quantitative PCR (RT-qPCR) [9,10] and bead-based flow cytometry [2]. In terms of accuracy and specificity, RT-qPCR has become the method of choice for measuring gene expression levels, both for coding and non-coding RNAs. However, the accuracy of the results is largely dependent on proper data normalization. As numerous variables inherent to an RT-qPCR experiment need to be controlled for in order to differentiate experimentally induced variation from true biological changes, the use of multiple reference genes is generally accepted as the gold standard for RT-qPCR data normalization [11]. Typically, a set of candidate reference genes is evaluated in a pilot experiment with representative samples from the experimental condition(s). Ideally these candidate reference genes belong to different functional classes, significantly reducing the possibility of confounding co-regulation. In case of miRNA profiling, only few candidate reference miRNAs have been reported [12]. Generally, other small non-coding RNAs are used for normalization. These include both small nuclear RNAs (for example, U6) and small nucleolar RNAs (for example, U24, U26). Strategies for normalization of high-dimensional expression profiling experiments (using, for example, microarray technology, but recently also transcriptome sequencing) generally take advantage of the huge amount of data generated and often use (almost) all available data points. These strategies range from a straightforward approach based on the mean or median expression value to more complex algorithms such as lowess normalization, quantile normalization or rank invariant normalization [13]. In this study we successfully introduce the mean expression value in a given sample to normalize high-throughput miRNA RT-qPCR data and compare its performance to the currently adopted approach based on small nuclear/nucleolar RNAs. In addition, we provide a workflow for proper data normalization of both large scale (whole miRNome) and small scale miRNA profiling experiments. Results Stability of the mean miRNA expression To evaluate the suitability of the mean miRNA expression value as a normalization factor, we profiled 448 miRNAs and controls in a subset of 61 neuroblastoma (NB) tumor samples and 384 miRNAs and controls in 49 T-cell acute lymphoblastic leukemia (T-ALL) samples, 18 leukemias with EVI1 overexpression, 8 normal human tissues and 11 normal bone marrow samples using a high throughput miRNA profiling platform based on Megaplex stem-loop RT-qPCR technology in combination with a limited cycle pre-amplification [9,10]. For each of the above mentioned sample sets all 18 available small RNA controls were quantified. For each individual sample, the mean expression value was calculated based on those miRNAs that were expressed according to a Cq detection cut-off of 35 PCR cycles [10] (Cq, or quantification cycle, is the standard name for the Ct or Cp value according to Real-time PCR Data Markup Language (RDML) guidelines [14]). Expression stability of the mean expression value, the small RNA controls and a selection of three miRNAs (miR-17-5p, miR-191 and miR-103) previously proposed as universal reference miRNAs was then assessed for each sample set using the geNorm algorithm [11]. To reduce the risk of including genes that are putatively co-regulated, a number of small RNA controls residing within the same gene cluster were discarded, retaining only one representative small RNA control per cluster. This was the case for RNU44, U47 and U75 on 1q25, and RNU58A and RNU58B on 18q21, of which RNU44 and RNU58A were randomly retained for further analysis. Naturally, only those small RNA controls that are expressed in all samples within a sample set were evaluated for their expression stability. geNorm analysis clearly shows that the mean expression value is a suitable normalization factor in the different tissue groups under investigation. In terms of expression stability, the mean expression value is top ranked in the T-ALL samples, the NB samples, the normal human tissues and the normal bone marrow samples when compared to 16, 17, 14 and 18 candidate small RNA controls/miRNAs, respectively (Figure 1 and Additional data file 1). For the leukemia samples with EVI1 overexpression the mean expression value ranked second (compared to 17 small RNA controls/miRNAs; Additional data file 1). Several of the high ranking small RNA controls are the same ones proposed by the manufacturer as most suitable for miRNA normalization. The expression stability of one of the so-called universal reference miRNAs (miR-191) proposed by Peltier and Latham [12] equaled that of the mean expression value in the NB sample set. In the other sample sets, none of these three miRNAs performed as well as the mean expression value. When we calculated an alternative mean expression value (only including those miRNAs that are expressed in all samples within a given sample set), it was never as good or better (in terms of suitability as normalization factor) than the mean expression value of all expressed miRNAs. This indicates that the mean expression value more faithfully represents the input amount when all expressed miRNAs per sample are considered. All results obtained with geNorm were independently confirmed with the Normfinder algorithm [15] (data not shown). Figure 1 geNorm expression stability plot. Expression stability of 13 different small RNA controls and the mean expression value in the T-cell acute lymphoblastic leukaemia sample set. The mean expression value shows the highest expression stability across all 49 samples analyzed. Mean expression value normalization reduces technical variation The variation in gene expression data is a combination of biological and technical variation. The purpose of normalization is to reduce the technical variation within a dataset, enabling a better appreciation of the biological variation. We calculated the coefficient of variation (CV) for each individual miRNA across all samples within a given tissue group and used it as a normalization performance measure. Lower CVs hereby denote better removal of experimentally induced noise [16,17]. Relative expression data were normalized using either the mean expression value of all expressed miRNAs or the mean of the most stable small RNA controls (as identified by geNorm; arithmetic means were calculated in log space). The optimal number of stable controls was determined on the basis of a pairwise variation analysis between subsequent normalization factors using a cut-off value of 0.15 as described in Vandesompele et al. [11]. The cumulative distribution of the individual CV values was plotted for both raw (not normalized) and normalized data (Figure 2). Figure 2 Cumulative distribution of miRNA coefficient of variation (CV) values. The cumulative distribution of miRNA CV values in the neuroblastoma sample set when no normalization is applied (blue), stable RNA control (RNU24, RNU44, RNU58A and RNU6B) normalization is applied (red), mean expression value normalization is applied (green) or normalization with miRNAs/small RNA controls resembling the mean expression value (Z30, RNU24, miR-361, miR-331 and miR-423) is applied (purple). While normalization using stable small RNA controls clearly results in a significant decrease of the CV value in the NB sample set, this shift is only apparent for the 50% least variable miRNAs (paired sample t-test, P < 0.001; Figure 2 and Additional data file 2). For the 50% most variable miRNAs no significant reduction in variation is observed (P = 0.253; Additional data file 2), indicating that elimination of technical variation is restricted to only half of the miRNAs profiled. In contrast, after normalization with the mean expression value there is an overall decrease in variation that is significant both for the 50% least variable (P < 0.001) and the 50% most variable (P < 0.001) miRNAs (Additional data file 2). Furthermore, a more pronounced reduction in variation is observed compared to stable small RNA control normalization (Figure 2). As true differentially expressed miRNAs predominantly reside in the most variable half of the dataset (50% most variable), only mean expression value normalization is capable of reducing the number of false negatives. Reduction of false positives is possible with both normalization strategies but to different extents as mean expression value normalization results in a stronger decrease of technical variation for the 50% least variable miRNAs. Similar results were obtained for the other sample sets (Additional data file 3 and data not shown). Mean expression value normalization identifies true biological changes in patient samples While the mean expression value is the best ranked normalization factor and significantly reduces technical variation, the question remains how different normalization strategies affect biological changes. To address this issue, we evaluated differential expression of the miRNAs belonging to the mir-17-92 cluster in the NB sample set. The miR-17-92 cluster contains a total of six different miRNAs (miR-17, miR-18a, miR-19a, miR-20a, miR-19b and miR-92) and has recently been shown to be a direct target of the MYC family of transcription factors using chromatin immunoprecipitation (ChIP) [18,19]. In NB cells, MYCN directly binds to the miR-17-92 promoter, resulting in an activation of mir-17-92 expression [18]. Accordingly, NB cells with amplification and activation of the MYCN oncogene display elevated mir-17-92 expression [18,20,21]. To confirm MYCN binding to the miR-17-92 promoter, we performed ChIP-chip experiments using a MYCN-specific antibody in three different NB cell lines, Kelly, IMR5 and WAC2. To assess whether transcripts from this region are actively transcribed and elongated, we additionally analyzed histone marks for active transcription (H3K4me3), repression (H3K27me3), and elongation (H3K36me3) together with MYCN binding. In all tested NB cell lines, binding of MYCN was preferentially found to the miR-17-92 promoter region encompassing the two canonical e-boxes upstream of miR-17 (Additional data file 4). Furthermore, MYCN binding to the miR-17-92 promoter was strongly associated with histone marks for active transcription (H3K4me3) and elongation (H3K36me3) (Additional data file 4). To confirm the MYCN ChIP-chip data, we performed ChIP-qPCR on ChIP samples from WAC2 and IMR5 cells. Both promoter fragments were enriched in the two cell lines under investigation, with fold changes comparable to that of the MDM2 positive control, confirming direct MYCN binding to the miR-17-92 promoter (Additional data file 5). To assess the impact of different normalization strategies on differential miR-17-92 expression, the NB sample set, consisting of 22 MYCN amplified (MNA) and 39 MYCN single copy (MNSC) tumor samples, was normalized using either the mean expression value or the stable small RNA controls. Differential miR-17-92 expression was then evaluated by means of the average fold change in expression between the MNA and MNSC tumor samples (Figure 3). When the data are normalized using the stable small RNA controls, none of the 8 miRNA transcripts that were analyzed reach a 2-fold expression difference and only one miRNA, miR-92, exceeds a 1.5-fold expression difference (fold change = 1.85). Moreover, miR-92 is the only miRNA from the miR-17-92 cluster with a significant differential expression between MYCN amplified and MYCN single copy tumor samples (Mann-Whitney, Benjamini-Hochberg multiple testing correction, P < 0.05). Figure 3 Differential miR-17-92 expression in neuroblastoma tumor samples. Average fold change expression difference of eight different miRNAs residing within the miR-17-92 cluster in MYCN amplified neuroblastoma samples compared to MYCN single copy neuroblastoma samples. Fold changes were calculated upon stable small RNA control (RNU24, RNU44, RNU58A and RNU6B) normalization (dark grey), mean expression value normalization (light grey) and normalization with miRNAs that resemble the mean expression value (miR-425, miR-191 and miR-125a; medium grey. These results are not in line with previous studies reporting differential expression of multiple miRNAs from the miR-17-92 cluster nor do they match our findings, and those of others, regarding the direct interaction between MYCN and the miR-17-92 promoter [18]. Furthermore, our analysis of histone markers bound to the region is more in line with an actively transcribed entire miR-17-92 cluster in MYCN amplified cell lines. When the same dataset is normalized with the mean expression value, 7 miRNAs reach a 1.5-fold expression difference and half of the miRNAs exceed the 2-fold expression difference. All but one miRNA, mir-17-3p, were found to be significantly differentially expressed between MNA and MNSC tumors (Mann-Whitney, Benjamini-Hochberg multiple testing correction, P < 0.05). A recent study by Chen and Stallings [20] reports on differential miRNA expression between MNA and MNSC tumors, measured by stem-loop RT-qPCR. Here, only one miRNA from the five miR-17-92 miRNAs that were evaluated was reported as significantly upregulated in the MNA tumor samples. In that study, miRNA expression data were normalized using two small RNA controls, RNU19 and RNU66. We reanalyzed the same dataset and applied the mean expression value normalization strategy. As expected, all but one miRNA, miR-17-3p, were significantly upregulated in the MNA tumors (Mann-Whitney, Benjamini-Hochberg multiple testing correction, P < 0.05; data not shown). To ascertain that these observations are not restricted to miR-17-92, we identified an additional MYCN regulated miRNA cluster using ChIP-chip. MiR-181a-1 and miR-181b-1 are located within 500 bp of each other and show strong MYCN binding in two MNA NB cell lines, Kelly and IMR5. MYCN binding was strongly associated with histone marks for transcription (H3K4me3) and elongation (H3K36me3) (Additional data file 6). Accordingly, mir-181a and mir-181b expression should be upregulated in MNA NB tumor samples. Upon mean expression value normalization, both miRNAs exceed the 1.5-fold expression difference (FCmir-181a = 2.28, FCmir-181b = 1.67). Upon normalization with stable small RNA controls, only miR-181a has a fold change above 1.5-fold (FCmir-181a = 1.59). For miR-181b, no change in expression could be detected (FCmir-181b = 1.14). These results confirm that the ability of mean expression normalization to extract true biological variation from a dataset is not limited to miR-17-92. Mean expression value normalization identifies true biological changes in cell lines While small RNA control normalization fails to identify differential miR-17-92 expression in patient tumor samples, it has been successfully applied by Fontana and colleagues [18] to detect differential miR-17-92 expression in NB cell lines. To evaluate our method in cell lines, we measured miRNA expression in two NB cell lines also used by Fontana and colleagues, one MYCN single copy (SK-N-AS) and one MYCN amplified (IMR-32). MiR-17-92 fold induction upon mean expression value normalization was consistently higher compared to fold inductions reported by Fontana and colleagues. Further, fold changes for all 5 miRNAs exceed the 1.5-fold expression difference whereas with small RNA control normalization this is only true for 4 out of 5 miRNAs (Additional data file 7). Mean expression value normalization reduces false positive MYCN downregulated miRNAs We sought further support for our new normalization strategy by investigating the overall differential miRNA expression in the two subsets of NB tumor samples. miRNAs that were not expressed in all samples were excluded from the analysis to avoid over- or underestimation of fold changes. Upon normalization with stable small RNA controls, we found an average miRNA expression fold change of 0.756, suggesting that the majority of the miRNAs were downregulated in the MNA tumor samples. Indeed, 89.1% of the miRNAs displaying a minimum 1.5-fold expression difference are expressed at lower levels in the MNA tumor samples (Additional data file 8) indicating a bias towards the identification of downregulated miRNAs. When normalizing with the mean expression value the average miRNA expression fold change levels out to a value of 1.036, representing a more balanced situation. Here, only 57.6% of the differentially expressed miRNAs are downregulated in the MNA tumor samples. Moreover, the fold change expression difference for the 10% most downregulated miRNAs, identified after stable small RNA control normalization, remains largely unaffected upon normalization with the mean expression value (Additional data file 9), suggesting that this normalization strategy more adequately reduces the number of false positive MYCN downregulated miRNAs compared to stable small RNA control normalization. This is in perfect agreement with the larger reduction of variation obtained with mean expression value normalization (see above). miRNAs resembling the mean The use of the mean expression value for data normalization implies that a large number of genes are profiled (450 or 384 in this study). Such screening experiments are often performed in an initial phase but almost never in subsequent validation studies that focus on a limited number of miRNAs. We therefore assessed whether we could identify miRNAs or small RNA controls that resemble the mean expression value and whether their geometric mean could be successfully used to mimic mean expression value normalization. After log transformation, we calculated the geNorm pairwise variation V value to determine robust similarity in expression of a given gene with the mean expression value. For each tissue group the optimal number of miRNAs/small RNA controls was selected and the geometric mean of their relative expression values was used for normalization (Table 1). In the NB sample set, the reduction in technical variation is highly similar to that obtained after mean expression value normalization, as illustrated by the cumulative distribution plot of miRNA CV values (Figure 2). Here also, the overall decrease in variation is significant both for the 50% least variable (P < 0.001) and the 50% most variable (P < 0.001) miRNAs (Additional data file 2). Similar results were obtained for other sample sets (Additional data file 3). These findings indicate that the geometric mean of a limited number of carefully selected miRNAs/small RNA controls that resemble the mean can be successfully used for normalization of gene expression profiling experiments in follow-up studies where only a limited number of miRNA molecules are studied. Table 1 Selection of miRNAs that resemble the mean expression value Neuroblastoma T-ALL EVI1 leukemia Normal tissue Normal bone marrow miR-425* Z30† miR-191* miR-572* miR-140* miR-191* RNU24† miR-140* let-7f* miR-30c* miR-125a* miR-361* miR-16* miR-632* miR-328* miR-331* miR-339* miR-423* RPL21† *Human mature miRNA. †Small RNA control. T-ALL, T-cell acute lymphoblastic leukaemia. We further investigated the use of these stable miRNAs/small RNA controls for normalization by evaluating the impact on differential miRNA expression. In the NB sample set, differential expression of the miR-17-92 cluster is significant for all but one miRNA, with fold changes highly similar to those obtained upon normalization with the mean expression value (Figure 3). Moreover, miRNA expression profiles generated with both normalization strategies are significantly correlated as over 90% of all miRNAs display a correlation coefficient above 0.8 and 65% have a correlation coefficient above 0.9 (Spearman's Rank rho value; Figure 4). Similar results were obtained with other sample sets (data not shown). Figure 4 Cumulative distribution of Spearman's Rank rho values. The cumulative distribution of the Spearman's Rank rho values for each individual miRNA in the neuroblastoma sample set. The rho-values represent the degree of correlation between the miRNA expression profile upon mean expression value normalization or normalization with miRNAs resembling the mean expression value. Normalization using miRNAs that resemble the mean is platform independent Finally, the correlation between both normalization strategies was validated on an independent dataset of microarray miRNA expression data from 12 NB cell lines. Probe intensities were log transformed and the mean expression value was calculated for each array. Subsequently, miRNAs with expression levels correlating to the mean expression value were identified as outlined above and the best miRNAs were selected for further normalization. Log intensities were normalized using either the mean expression value of all probes or the mean expression of the selected miRNAs. Hierarchical clustering of a compiled dataset consisting of mean and miRNA normalized samples reveals a high correlation between each sample pair as pairs consistently cluster together (Additional data file 10). Over 95% of all miRNAs show a correlation coefficient above 0.7 and 87% have a correlation coefficient above 0.8 (Spearman's Rank rho value). These results illustrate that normalization using miRNAs that resemble the mean expression value is platform independent and closely mimics normalization using the mean expression value. Discussion In this study we present the use of the mean miRNA expression value as a new method for miRNA RT-qPCR data normalization. This method was validated across different independent datasets and clearly outperforms the current normalization strategy that is based on the use of endogenous small RNA controls. Our results demonstrate that the mean expression value of all expressed miRNAs is characterized by high expression stability, according to geNorm analysis, resulting in an adequate removal of technical variability, as measured by the CV of normalized expression values. While mean normalization results in reduction of noise over all expressed miRNA, stable small RNA control normalization only achieves this for the 50% least variable miRNAs. Interestingly, the mean expression value of all expressed miRNAs performs better than one based on only those miRNAs that are expressed in all samples. This suggests a more accurate representation of input RNA fluctuations when all miRNAs are considered. Furthermore, the mean expression value is more stable than a set of three miRNAs (miR-103, miR-191 and miR-17-5p) previously proposed as universal reference miRNAs [12]. Only in the NB sample set could we confirm stable expression of miR-191 and miR-103. miR-17-5p is activated by MYC transcription factors, which results in mir-17-5p overexpression in tumors with activated MYC signaling [18,19]. Moreover, mir-17-5p is located on 13q31.3, a region frequently amplified in B-cell lymphomas, resulting in elevated mir-17-5p expression [22]. Accordingly, mir-17-5p does not qualify as a proper candidate reference miRNA. Several studies report on the use of synthetic RNA or miRNA molecules as spike-in controls for mRNA/miRNA expression data normalization [23-26]. While these kind of controls have value in assay validation and quality control, they only correct for extraction efficiency (when added to the cells prior to RNA isolation) or reverse transcription efficiency (when added to the RNA) differences when using them for normalization. As such, they do not control for all experimental variability, and are not assumption-free as it is assumed that the experimenter starts with the same quantity of equal quality template. Normalization factors that are based on endogenous small RNA molecules, such as the small RNA controls, miRNA molecules, or the mean miRNA expression value, are therefore preferred. To assess the impact of small RNA control, miRNA or mean expression value normalization on biological variation, we studied the differential expression of the miR-17-92 cluster in the NB dataset, consisting of samples with and without MYCN amplification. Because differential expression of this miRNA cluster has been repeatedly documented, both in the context of MYC family transcription factors and in the context of NB tumors [18,19], we reasoned that it could serve as an excellent positive control. Strikingly, only 1 of the 8 miR-17-92 miRNAs analyzed showed an expression fold change of at least 1.5-fold after small RNA control normalization. A 1.5-fold expression difference cut-off is based on several miRNA profiling studies confirming that subtle changes in miRNA expression, such as a 1.5-fold difference, can have a significant impact on the biology of the cell [27-32]. As a consequence, a proper normalization strategy that enables detection of these small changes is of the utmost importance. Upon mean expression value normalization, seven miRNAs exceeded the 1.5-fold expression difference. For one miRNA, mir-17-3p, no expression difference was detected; however, the status of mir-17-3p as a functional miRNA is still controversial [19,33,34]. We and others have shown that MYC transcription factors actively bind to the miR-17-92 promoter [18,19]. In addition, we here describe histone marks associated with active transcription and elongation that are not restricted to a single miRNA but encompass the entire set of miRNAs from the miR-17-92 cluster. Taken together with the fact that the miR-17-92 cluster is transcribed as a single transcript (pri-miR-17-92) [22], most likely all miRNAs should be activated in the MNA NB cells. The results obtained with mean expression value normalization are best in line with this hypothesis. While small RNA control normalization in the clinical tumor samples appears not to be affective, in cultured cells this strategy is capable of detecting differential expression for the majority of the mir-17-92 miRNAs [18]. This could be explained by the degree of heterogeneity of the sample set under consideration. Tumor samples are typically more heterogeneous than cultured cells and, therefore, require a more robust normalization strategy that is able to reduce this variability. Apart from differential miR-17-92 expression, we also evaluated global miRNA expression in the NB tumors with regard to MYCN amplification status. Upon normalization with stable small RNA controls, differential miRNA expression was highly unbalanced, with 89.1% of all differentially expressed miRNAs being downregulated. In contrast, literature reports on differential mRNA expression with regard to MYCN amplification status suggest a more balanced situation. From a total of 678 coding genes that have been described as differentially expressed, 63% are upregulated and 37% are downregulated [35]. The unbalanced differential miRNA expression that is observed upon stable small RNA control normalization is most likely caused by an unbalanced normalization factor that hypercorrects miRNA expression in MYCN amplified tumors. Indeed, we calculated a significantly higher normalization factor for amplified versus not-amplified tumors (data not shown). Furthermore, small RNA controls and miRNAs are transcribed by different RNA polymerases [36], possibly making these small RNA controls improper normalizers for miRNA expression. This has been well established for mRNA expression normalization as ribosomal RNAs, which are transcribed by RNA polymerase I, are often poor and unstable normalizers for mRNAs [11], which are transcribed by RNA polymerase II. Mean expression value normalization is based on the expression of miRNAs and results in a more balanced differential miRNA expression with only 57.6% downregulated miRNAs. Importantly, mean expression value normalization is only valid if a large number of miRNAs are profiled. However, for small scale experiments, typically focusing on a selection of miRNAs, this is not the case. To overcome this problem, we have shown that it is possible to identify miRNAs and small RNA controls that resemble the mean expression value. Our results indicate that a normalization factor based on the selection of miRNAs/small RNA controls resembling the mean expression value performs equally well as the mean expression value itself. We therefore propose a workflow consisting of a pilot experiment in which miRNAs/small RNA controls can be identified that resemble the mean expression value. Subsequently, these can be used for proper normalization of miRNA expression in targeted small scale experiments, focusing on only a limited number of genes. miRNA gene expression studies in which no prior whole miRNome expression profiling can be performed should be preceded by a careful selection of the most stable small RNA controls. In this case, cautious interpretation of the data is warranted. Conclusions A proper normalization strategy is a crucial aspect of the RT-qPCR data analysis workflow. For large scale miRNA expression profiling studies we have shown that mean expression value normalization outperforms the current normalization strategy that makes use of small RNA controls. For those experiments focusing on a limited number of miRNAs we propose a workflow that is based on the selection of miRNAs/small RNA controls that resemble the mean expression value. This strategy is innovative, straightforward and universally applicable and enables a more accurate assessment of relevant biological variation from a miRNA RT-qPCR experiment. Materials and methods Samples A total of 147 samples from 5 different tissue groups were used in this study, including 61 NB tumors, 49 T-ALL samples, 18 leukemias with EVI1 overexpression, 8 normal human tissue samples (brain, colon, heart, kidney, liver, lung, breast, adrenal gland) and 11 normal bone marrow samples. RNA samples from the normal human tissue group were obtained from Stratagene (Cedar Creek, TX, USA). NB tumor RNA was isolated using the miRNeasy mini kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. RNA from leukemic and normal bone marrow samples was isolated as described previously [37]. For each sample, total RNA integrity was measured using the Experion (Bio-Rad, Hercules, CA, USA) and evaluated through the RNA quality index; for all samples this was higher than 5. RDML data and MIQE guidelines Compliance of qPCR experiments with the MIQE (Minimum Information for Publication of Quantitative Real-Time PCR Experiments) guidelines [38,39] is listed in the MIQE checklist (Additional data file 11). Raw miRNA expression, experimental annotation and sample annotation are available in the RDML data format [14,40] (Additional data file 12). Cell culture Twelve NB cell lines (NGP, IMR-32, SMS-KAN, SK-N-BE(2c), LAN-5, LAN-6, SK-MYC2, SK-N-AS, SK-N-SH, NBL-S, SK-N-FI and CLB-GA) were cultured in RPMI 1640 medium (Invitrogen, Carlsbad, CA, USA) supplied with 15% fetal calf serum, 1% penicillin/streptomycin, 1% kanamycin, 1% glutamine, 2% HEPES (1 M), 1% sodiumpyruvate (100 nM) and 0.1% beta-mercapto (50 nM). At 80% confluence, cells were harvested by scraping for total RNA isolation (miRNeasy, Qiagen). MicroRNA profiling miRNA expression was measured as described previously [10]. Briefly, 20 ng of total RNA was reverse transcribed using the Megaplex RT stem-loop primer pool (Applied Biosystems, Foster City, CA, USA), enabling miRNA specific cDNA synthesis for 430 different human miRNAs and 18 small RNA controls. Subsequently, Megaplex RT product was pre-amplified by means of a 14-cycle PCR reaction with a miRNA specific forward primer and universal reverse primer to increase detection sensitivity. Finally, a 1,600-fold dilution of pre-amplified miRNA cDNA was used as input for a 40-cycle qPCR reaction with miRNA specific hydrolysis probes and primers (Applied Biosystems). All reactions were performed on the 7900 HT (Applied Biosystems) using the gene maximization strategy [41]. Raw Cq values were calculated using the SDS software version 2.1 applying automatic baseline settings and a threshold of 0.05. For further data analysis, only those miRNAs with a Cq value equal to or below 35 (representing single molecule template detection [10]) were taken into account. For NB tumor samples all 448 miRNAs and small RNA controls were profiled. RT-qPCR assays were spread across two 384-well plates. Inter-run variation was accounted for by equalizing the mean Cq-value of the 18 small RNA controls that were profiled in both plates. For the remaining samples 366 miRNAs and 18 small RNA controls were profiled in a single 384-well plate. Selection of stable normalizers Assessing gene expression stability of putative normalizer genes was done using two different algorithms, geNorm [11] and Normfinder [15]. Raw Cq values were transformed to linear scale before analysis. Normalization factors were calculated as the geometric mean of the expression of the stable normalizers [41]. Selection of the optimal number of stable normalizers was based on geNorm's pairwise variation analysis between subsequent normalization factors using a cut-off value of 0.15 for the inclusion of additional normalizers [11]. Selection of miRNAs/small RNA controls that resemble the mean expression value For robust and unbiased selection of genes whose expression level best correlates with the mean expression level, we used the geNorm V value [11]. In brief, for each miRNA and small RNA control we calculated the difference between its Cq value and the average Cq value of all expressed genes, per sample, within a given sample set. Next, the standard deviation of these differences was determined for every miRNA and small RNA control. The miRNAs or small RNA controls with the lowest standard deviation most closely resemble the mean expression value. The optimal number of miRNAs/small RNA controls for normalization was determined upon geNorm analysis of the ten best ranked normalizers. To avoid including miRNAs that are putatively co-regulated, we determined their genomic location and excluded those miRNAs that are located within 2 kb of each other using miRGen [42]. Co-regulated miRNAs were replaced by the next best ranked miRNA. Chromatin immunoprecipitation Immunoprecipitation was performed as described previously using 10 μg of MYCN (Santa Cruz, sc-53993, Santa Cruz, CA, USA) antibodies [43]. Histone marks for active transcription (H3K4me3; Abcam, ab8580, Cambridge, MA, USA), repression (H3K27me3; Upstate, 07-449, Lake Placid, NY, USA), and elongation (H3K36me3; Abcam, ab9050) were assessed together with MYCN binding. ChIP-DNA templates from Kelly, IMR5, WAC2 cells using MYCN, H3K4me3, H3K27me3 and H3K36me3 were amplified for DNA microarray analysis (Agilent Human Promoter ChIP-chip Set 244 K, Santa Clara, CA, USA) using the WGA (whole genome amplification) (Sigma, St. Louis, MO, USA) method as previously described [43]. DNA labeling, array hybridization and measurement were performed according to Agilent mammalian ChIP-chip protocols. For the visualization of ChIP-chip results, the cureos package version 0.2 for R was used (available upon request). Real-time ChIP-qPCR was performed using SYBR Green I detection chemistry (Eurogentec, Seraing, Belgium) on a LightCycler480 (Roche, Basel, Switzerland). Primer sequences for MYCN binding sites in the mir-17-92 and MDM2 promoter regions were described previously [19,44]. Signals were normalized based on the average abundance of three non-specific genomic regions in the ChIP samples using qBasePlus version 1.1 software [45]. Fold enrichment in the MYCN precipitated samples was calculated relative to the input sample and compared to that of a fourth non-specific region. All primer sequences are available in the public RTprimerDB database [46] (gene (RTPrimerDB-ID): miR-17-92 promoter A (7796), miR-17-92 promoter B (7797), MDM2 promoter (7798), non-specific region 1 (7799), non-specific region 2 (7800), non-specific region 3 (7801), non-specific region 4 (7802)) [47]. Locked nucleic acid microarrays In total, 5 μg of total RNA was hybridized to immobilized locked nucleic acid-modified capture probes according to Castoldi et al. [48]. Background- and flag-corrected median intensities were log transformed and normalized according to the mean signal of each array. Hierarchical clustering Hierarchical clustering of the miRNA expression data was performed using Spearman's rank correlation as the sample and gene distance measure and pairwise complete linkage as implemented in the Genepattern 2.0 software [49]. Abbreviations ChIP: chromatin immunoprecipitation; CV: coefficient of variation; miRNA: microRNA; MNA: MYCN amplified; MNSC: MYCN single copy; NB: neuroblastoma; RDML: Real-time PCR Data Markup Language; RT-qPCR: real-time quantitative PCR; T-ALL: T-cell acute lymphoblastic leukaemia. Authors' contributions PM carried out the miRNA profiling experiments and data analysis and drafted the manuscript. PVV and ADW performed miRNA profiling experiments. DM and FW are responsible for MYCN ChIP-on-chip data. FS and JV conceived the study and participated in its design and coordination. All authors read and approved the final manuscript. Additional data files The following additional data are available with the online version of this paper: a figure showing geNorm expression stability plots (Additional data file 1); a figure showing the mean miRNA CV value in the neuroblastoma sample set (Additional data file 2); a figure showing the cumulative distribution of miRNA CV values (Additional data file 3); a figure showing ChIP-chip results for the miR-17-92 cluster (Additional data file 4); a figure showing ChIP-qPCR results for the miR-17-92 cluster (Additional data file 5); a figure showing ChIP-chip results for the miR-181a-1/miR-181b-1 cluster (Additional data file 6); a figure showing miR-17-92 expression in neuroblastoma cell lines (Additional data file 7); a figure showing overall differential miRNA expression in the neuroblastoma sample set (Additional data file 8); a figure showing fold change expression difference correlation for MYCN downregulated miRNAs (Additional data file 9); a figure showing hierarchical clustering of neuroblastoma cell lines based on miRNA expression (Additional data file 10); a table listing the MIQE checklist (Additional data file 11); a collection of RDML files containing miRNA expression for all data sets (Additional data file 12). Supplementary Material Additional data file 1 Expression stability of small RNA controls and the mean expression value in (a) the neuroblastoma sample set, (b) the leukemias with EVI1 overexpression, (c) the normal bone marrow samples and (d) the normal human tissues. Click here for file Additional data file 2 Mean miRNA CV value for (a) the 50% least variable and (b) 50% most variable miRNAs when no normalization is applied, stable small RNA control normalization is applied, mean expression value normalization is applied or normalization with miRNAs/small RNA controls resembling the mean is applied. (a) All three normalization strategies result in a significant decrease of the mean CV value. (b) Only mean expression value normalization and normalization with miRNAs/small RNA controls resembling the mean result in a significant decrease of the mean CV value. Stable small RNA controls for the T-ALL samples: RNU24, RNU44, RNU48, RNU58A, U18 and Z30; for the leukemias with EVI1 overexpression: RNU6B, RNU24 and RNU58A; for the normal bone marrow samples: RNU44, RNU24 and RNU48; and for the normal human tissues: RPL21, RNU38B and RNU24. MiRNAs/small RNA controls that resemble the mean expression value are listed in Table 1. Click here for file Additional data file 3 The cumulative distribution of miRNA CV-values in (a) the T-ALL sample set, (b) the leukemias with EVI1 overexpression, (c) the normal bone marrow samples and (d) the normal human tissues when no normalization is applied (blue), stable RNA control normalization is applied (red), mean expression value normalization is applied (green) or normalization with miRNAs resembling the mean expression value is applied (purple). Stable small RNA controls for the T-ALL samples: RNU24, RNU44, RNU48, RNU58A, U18 and Z30; for the leukemias with EVI1 overexpression: RNU6B, RNU24 and RNU58A; for the normal bone marrow samples: RNU44, RNU24 and RNU48; for the normal human tissues: RPL21, RNU38B and RNU24. MiRNAs/small RNA controls that resemble the mean expression value are listed in Table 1. Click here for file Additional data file 4 ChIP-chip results of the miR-17-92 cluster are given for Kelly, IMR5, and WAC2. Oligonucleotide position is given as bars according to the chromosomal localization. Color coding of the bars represents the log2 ratios MYCN versus input from ChIP-chip experiments, were red means positive and green negative values. Histone marks for active transcription (H3K4me3), repression (H3K27me3) and enlongation (H3K36me3) as measured by ChIP-chip are given together with MYCN binding using the same color coding. miRNA transcript information (miRBase version 11.0), CpG islands, and conservation among 28 species were implemented for the region as given by the respective annotation tracks deposited in the UCSC database (Hg 18, release March 2006). Position of canonical (CACGTG) and non-canonical E-boxes from in silico scanning of the respective sequence is given. Grey coding for results of the positional weight matrix (PWM) scan represents the P-values of the 12 bp MYCN binding motif from the TRANSFAC database. Red line = median log2 ratio MYCN versus input as calculated for each chromosome individually. Click here for file Additional data file 5 Fold enrichment of specific and non-specific genomic regions in the MYCN precipitated samples compared to the input sample as determined by qPCR. MiR-17-92 promoter A and miR-17-92 promoter B are two MYCN specific e-box containing regions in the miR-17-92 promoter. MDM2 promoter is a MYCN specific e-box containing region in the MDM2 promoter and serves as a positive control. The negative control is a non-specific, non e-box containing genomic region. Click here for file Additional data file 6 ChIP-chip results of the miR-181a-1/miR-181b-1 cluster are given for Kelly, IMR5, and WAC2. Oligonucleotide position is given as bars according to the chromosomal localization. Color coding of the bars represents the log2 ratios MYCN versus input from ChIP-chip experiments, were red means positive and green negative values. Histone marks for active transcription (H3K4me3), repression (H3K27me3) and enlongation (H3K36me3) as measured by ChIP-chip are given together with MYCN binding using the same color coding. miRNA transcript information (miRBase version 11.0), CpG islands, and conservation among 28 species were implemented for the region as given by the respective annotation tracks deposited in the UCSC database (Hg 18, release March 2006). Position of canonical (CACGTG) and non-canonical E-boxes from in silico scanning of the respective sequence is given. Grey coding for results of the positional weight matrix (PWM) scan represents the P-values of the 12 bp MYCN binding motif from the TRANSFAC database. Red line = median log2 ratio MYCN versus input as calculated for each chromosome individually. Click here for file Additional data file 7 Relative expression of miR-17-5p, miR-18a, miR-19a, miR-20a and miR-92a in one MYCN single copy cell line (SK-N-AS) and one MYCN amplified cell line (IMR-32) upon mean expression value normalization. Relative expression values were rescaled to those in SK-N-AS. Click here for file Additional data file 8 Average fold change expression difference of all miRNAs with an expression below the Cq cutoff of 35 PCR cycles in MYCN amplified neuroblastoma samples compared to MYCN single copy neuroblastoma samples. Fold changes were calculated upon stable small RNA control normalization (black) and mean expression value normalization (orange). Plotted fold changes are log2-based and sorted from positive (upregulated in MYCN amplified tumor samples) to negative (downregulated in MYCN amplified tumor samples). Dashed lines represent a two-fold expression difference. Arrows indicate the threshold between up- and downregulated miRNAs for both normalization strategies (the number of up- and downregulated miRNAs is indicated left and right of each arrow, respectively). Click here for file Additional data file 9 Correlation plot showing the average fold change expression difference for the 10% most downregulated miRNAs in MYCN amplified tumors compared to MYCN single copy tumors upon stable small RNA control normalization (x-axis) and mean expression value normalization (y-axis). Both axes are log2-based. The corresponding trend line has a coefficient of determination of 0.973 (R2), a slope approaching 1 and a Y-intercept of 0.449. Click here for file Additional data file 10 Heatmap representing a hierarchical clustering analysis of 24 paired samples based on their miRNA expression profiles. Each sample pair consists of a different neuroblastoma cell line for which the miRNA expression was normalized with the mean expression value or with miRNAs resembling the mean expression value. Cell lines are numbered from 1 to 12. The tag represents the applied normalization strategy (M stands for mean expression value normalization, m for normalization with miRNAs resembling the mean expression value). Click here for file Additional data file 11 Compliance of qPCR experiments with the MIQE guidelines. Click here for file Additional data file 12 RDML files containing miRNA expression and a sample annotation for each sample set. Click here for file
          Bookmark
          • Record: found
          • Abstract: found
          • Article: not found

          Bacterial osmoadaptation: the role of osmolytes in bacterial stress and virulence.

          Two general strategies exist for the growth and survival of prokaryotes in environments of elevated osmolarity. The 'salt in cytoplasm' approach, which requires extensive structural modifications, is restricted mainly to members of the Halobacteriaceae. All other species have convergently evolved to cope with environments of elevated osmolarity by the accumulation of a restricted range of low molecular mass molecules, termed compatible solutes owing to their compatibility with cellular processes at high internal concentrations. Herein we review the molecular mechanisms governing the accumulation of these compounds, both in Gram-positive and Gram-negative bacteria, focusing specifically on the regulation of their transport/synthesis systems and the ability of these systems to sense and respond to changes in the osmolarity of the extracellular environment. Finally, we examine the current knowledge on the role of these osmostress responsive systems in contributing to the virulence potential of a number of pathogenic bacteria.
            Bookmark

            Author and article information

            Affiliations
            [ ]State Key Laboratory for Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, 155, Changbai Road, Changping, Beijing 102206 China
            [ ]Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Hangzhou, 310006 China
            Contributors
            fuxiuping313@sina.com
            liangweili@icdc.cn
            dupengcheng@icdc.cn
            yanmeiying@icdc.cn
            kanbiao@icdc.cn
            Journal
            Gut Pathog
            Gut Pathog
            Gut Pathogens
            BioMed Central (London )
            1757-4749
            30 December 2014
            30 December 2014
            2014
            : 6
            : 1
            4293811
            47
            10.1186/s13099-014-0047-8
            © Fu et al.; licensee Biomed Central. 2014

            This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

            Categories
            Research
            Custom metadata
            © The Author(s) 2014

            Gastroenterology & Hepatology

            vibrio cholerae, salt stress, transcription, pcr array

            Comments

            Comment on this article