
      DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data



          Single-cell RNA sequencing (scRNA-seq) offers new opportunities to study gene expression of tens of thousands of single cells simultaneously. We present DeepImpute, a deep neural network-based imputation algorithm that uses dropout layers and loss functions to learn patterns in the data, allowing for accurate imputation. Overall, DeepImpute yields better accuracy than six other publicly available scRNA-seq imputation methods on experimental data, as measured by the mean squared error or Pearson’s correlation coefficient. DeepImpute is an accurate, fast, and scalable imputation tool that is suited to handle the ever-increasing volume of scRNA-seq data, and is freely available at https://github.com/lanagarmire/DeepImpute.
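The abstract names two accuracy measures, mean squared error and Pearson's correlation coefficient. The sketch below shows one plausible way to compute both over a set of held-out entries. It is an illustration only, not the DeepImpute evaluation code: the function name `masked_scores`, the toy matrix, and the idea of scoring only masked positions are assumptions made for this example.

```python
import numpy as np

def masked_scores(truth, imputed, mask):
    """Return (MSE, Pearson r) over the entries selected by a boolean mask.

    truth, imputed: cells x genes arrays on the same scale.
    mask: boolean array marking the entries that were hidden before imputation.
    (Hypothetical helper; not part of any published package.)
    """
    t = truth[mask].astype(float)
    p = imputed[mask].astype(float)
    mse = float(np.mean((t - p) ** 2))
    r = float(np.corrcoef(t, p)[0, 1])
    return mse, r

# Toy data (made up): 3 cells x 4 genes.
truth = np.array([[1.0, 2.0, 0.0, 4.0],
                  [0.5, 3.0, 1.0, 2.0],
                  [2.0, 0.0, 3.0, 1.0]])

# Pretend four entries were hidden and then imputed with small errors.
mask = np.zeros_like(truth, dtype=bool)
mask[0, 1] = mask[0, 3] = mask[1, 3] = mask[2, 2] = True
imputed = truth.copy()
imputed[mask] += np.array([0.1, -0.1, 0.2, -0.2])

mse, r = masked_scores(truth, imputed, mask)
print(f"MSE = {mse:.3f}, Pearson r = {r:.3f}")
```

A lower MSE and a higher Pearson r both indicate imputed values closer to the held-out truth; the two can disagree, since Pearson r ignores a constant offset or scale factor that MSE penalizes.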


          Most cited references 27


          Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells

          Recent molecular studies have revealed that, even when derived from a seemingly homogenous population, individual cells can exhibit substantial differences in gene expression, protein levels, and phenotypic output 1–5 , with important functional consequences 4,5 . Existing studies of cellular heterogeneity, however, have typically measured only a few pre-selected RNAs 1,2 or proteins 5,6 simultaneously because genomic profiling methods 3 could not be applied to single cells until very recently 7–10 . Here, we use single-cell RNA-Seq to investigate heterogeneity in the response of bone marrow derived dendritic cells (BMDCs) to lipopolysaccharide (LPS). We find extensive, and previously unobserved, bimodal variation in mRNA abundance and splicing patterns, which we validate by RNA-fluorescence in situ hybridization (RNA-FISH) for select transcripts. In particular, hundreds of key immune genes are bimodally expressed across cells, surprisingly even for genes that are very highly expressed at the population average. Moreover, splicing patterns demonstrate previously unobserved levels of heterogeneity between cells. Some of the observed bimodality can be attributed to closely related, yet distinct, known maturity states of BMDCs; other portions reflect differences in the usage of key regulatory circuits. For example, we identify a module of 137 highly variable, yet co-regulated, antiviral response genes. Using cells from knockout mice, we show that variability in this module may be propagated through an interferon feedback circuit involving the transcriptional regulators Stat2 and Irf7. Our study demonstrates the power and promise of single-cell genomics in uncovering functional diversity between cells and in deciphering cell states and circuits.

            The technology and biology of single-cell RNA sequencing.

            The differences between individual cells can have profound functional consequences, in both unicellular and multicellular organisms. Recently developed single-cell mRNA-sequencing methods enable unbiased, high-throughput, and high-resolution transcriptomic analysis of individual cells. This provides an additional dimension to transcriptomic information relative to traditional methods that profile bulk populations of cells. Already, single-cell RNA-sequencing methods have revealed new biology in terms of the composition of tissues, the dynamics of transcription, and the regulatory relationships between genes. Rapid technological developments at the level of cell capture, phenotyping, molecular biology, and bioinformatics promise an exciting future with numerous biological and medical applications.

              Tracing the Derivation of Embryonic Stem Cells from the Inner Cell Mass by Single-Cell RNA-Seq Analysis

              Introduction The derivation of embryonic stem cells (ESCs) from the inner cell mass (ICM) of mouse blastocysts consisting of about 20 cells occurs in vitro under a variety of culture conditions, such as in the presence of leukemia inhibitory factor (LIF) and fetal calf serum (FCS) (Evans and Kaufman, 1981; Ying et al., 2008). After about 5 days in culture, the inner cell mass outgrowths of blastocysts are disrupted into small clusters of cells and passaged until the establishment of ESC lines. Thus, the ICM cells that, in vivo, are subject to a strict developmental program undergo a transformation into cells with a capacity for infinite self-renewal while retaining pluripotency. The precise molecular changes accompanying this transition remain to be fully elucidated, which is hampered by the limited number of cells available for analysis (Niwa, 2007). Pluripotent E3.5-E4.5 primitive ectoderm/epiblast (PE) and ESCs can both contribute to all three germ layers and the germ line when injected into host blastocysts to form chimera (Niwa, 2007). However, only the ESCs cultured in vitro have the capacity for unlimited self-renewal while retaining their pluripotency (Niwa, 2007; Smith, 2006). Some differences between the ICM and ESCs have been identified, such as the expression of pramel5, pramel6, and pramel7 in the ICM, which are repressed in ESC (Kaji et al., 2007). Other genes, including Dicer (Bernstein et al., 2003; Kanellopoulou et al., 2005; Murchison et al., 2005), Nanog (Chambers et al., 2007; Chambers et al., 2003; Mitsui et al., 2003), Mbd3 (Kaji et al., 2006; Kaji et al., 2007), and Ezh2 (O'Carroll et al., 2001; Shen et al., 2008), are essential for the establishment of pluripotent PE cells in the ICM but dispensable for the maintenance of ESCs. 
There have been intensive studies on ESCs in recent years, but these have usually been on bulk cells by RNA-Seq, cDNA microarray, SAGE, and EST sequencing (Niwa, 2007; Ivanova et al., 2006; Cloonan et al., 2008). However, the precise changes accompanying the process of conversion of ICM to ESCs remain to be fully elucidated. To gain insight into this process, we used blastocysts from Oct4-ΔPE-GFP transgenic mice and cultured them in vitro under the classical conditions consisting of LIF and FCS used for the derivation of ESCs (Niwa, 2007). The Oct4-ΔPE-GFP reporter we used is under the control of only the distal enhancer for Oct4 (also known as Pou5f1) and lacks the proximal enhancer (Yeom et al., 1996). This GFP reporter shows expression in the E3.5 ICM, E4.5 epiblast, primordial germ cells (PGCs), and ESCs, but not in the postimplantation epiblast or in the epiblast stem cells (EpiSCs) (Yeom et al., 1996; Bao et al., 2009). Notably, the distal enhancer of Oct4 represents the densest binding locus for the key pluripotency-specific transcription factors in ESCs (Chen et al., 2008), which makes it an ideal reporter for tracing the course of changes during the establishment of ESCs from ICM. By analyzing single Oct4-ΔPE-GFP-positive and Oct4-ΔPE-GFP-negative cells, we set out to monitor changes in ICM cells during their progression toward ESCs. We used our recently developed single-cell RNA-Seq transcriptome analysis to investigate the critical early changes during this process (Tang et al., 2009). Results and Discussion Analysis of Individual ICM Outgrowth Cells First, we analyzed the three key pluripotency genes during the course of blastocyst culture and the formation of outgrowths (Figure 1). At each stage, we chose between 10 and 26 single cells for analysis. We generated cDNAs by whole transcriptome amplification (WTA) of these individual cells (see Experimental Procedures for details). 
All ICM cells (22/22) tested showed high expression of Oct4, Sox2, and Nanog. However, among cells from day 3 outgrowths that had high Oct4 expression, about 39% (7/18) had already lost expression of Nanog and/or Sox2, indicating that they might be losing pluripotency. By contrast, most of the cells from day 5 outgrowths (11/13) that had high Oct4 expression also showed high expression of both Sox2 and Nanog, suggesting that these may represent the earliest population that had acquired or were likely on course to acquire the ESC-like fate with the potential for self-renewal. We were also able to establish an ESC line from a single cell isolated from a day 5 outgrowth (data not shown). As expected, all the ESCs (23/23) had high expression of these three pluripotency genes. Expression Dynamics of 385 Genes in 74 Single Cells from ICM to ESCs Next, we chose 385 pluripotency and early differentiation related genes to monitor their expression in cells from the ICM, as well as from day 3 and day 5 outgrowths, and from ESCs at single-cell resolution (Table S1). All 14 ESCs analyzed had high expression (Ct = 19–28) of Oct4, Sox2, Nanog, Dppa4, Dppa5, Sall4, Utf1, Rex2, and Rif1, indicating their pluripotent character (Figure 2A and Figure S1). By contrast, we detected little or no expression (Ct = 40) of all 23 early differentiation marker genes (ectoderm markers: Pax6, Otx1, Neurod1, Nes, Lhx5, and Hoxb1; mesoderm markers: Tbx2, T, Nkx2-5, Myod1, Myf5, Mesdc1, Mesdc2, Kdr, Isl1, Hand1, and Eomes; endoderm markers: Onecut1, Gata4, Gata5, and Gata6; extraembryonic markers: Cdx2 and Tpbpa) (see Table S1). Similarly, all 14 cells isolated and analyzed from ICM showed high expression of the nine pluripotency-specific genes. 
However, expression of some genes, for example, c-Myc, which was shown to be an important reprogramming factor for pluripotency (Takahashi and Yamanaka, 2006), was highly heterogeneous in cells from the ICM (Ct = 24–40); this variability was progressively reduced until, finally, all ESCs consistently expressed c-Myc (Figure 2B). Interestingly, we found that Tet1 and Tet2 (Table S1), which were recently shown to mediate DNA demethylation in ESCs, were highly expressed in both ICM and ESCs, but their expression only decreased in Oct4-negative cells present in the ICM outgrowths. Thus, our observations support their importance for pluripotency (Tahiliani et al., 2009). Since ESCs can also be maintained in an undifferentiated state by LIF and BMP4 (Ying et al., 2003), we investigated the expression of a key receptor, Bmpr1a, and found it to be heterogeneous in the ICM (Ct = 27–40). However during the ICM outgrowth, Bmpr1a expression was detected more consistently until, finally, all ESCs (14/14) showed strong expression. This suggests that all ESCs have the potential to respond to Bmp4 signaling (Figure 2C). Conversely, for Bmp4, all ICM cells (14/14) showed high expression, but this declined during the course of ICM outgrowths so that ultimately only about 50% (7/14) of individual ESCs retained Bmp4 expression (Ct = 25–40). This is compatible with the fact that maintenance of ESCs can be achieved by the addition of exogenous Bmp4 or serum, which contains Bmp4 (Ying et al., 2003). During the course of ICM outgrowth toward ESCs, we found clear upregulation of several genes, including Tcf15, Prdm5, Zic3, Ifitm1, Nodal, and Bex1, indicating that they may potentially be important during the transition to ESCs and/or for their subsequent maintenance (Figure 2D). Indeed, Nodal is a known regulator of self-renewal but is not essential for the pluripotency of ESCs (see below). 
By contrast, there was clear downregulation of some genes during ICM outgrowth, such as Gata4, Gata6, Pramel7, Tbx3, Bmi1, Bcl2l14, Nr5a2, and Amhr2, which potentially have ICM specific development-related functions (Figure 2E). For example, ICM has the potential to develop into primitive endoderm cells, for which Gata4 and Gata6 are crucial regulators (Fujikura et al., 2002; Koutsourakis et al., 1999; Morrisey et al., 1998). Thus, repression of these genes may allow ICM cells to exit from their inherent developmental program as they acquire the ability for self-renewal while retaining pluripotency as ESCs. Molecular Changes during the Transition from ICM to ESCs To understand the dynamic nature of gene expression in individual cells at the whole-genome scale, we randomly selected 12 individual ESCs and generated their digital transcriptome profile (Figure 3A, Figure S2, and Tables S2 and S3) (Tang et al., 2009). Indeed, all of the 12 ESCs analyzed had high expression of Oct4, Sox2, Nanog, Rex1 (also known as Zfp42), Dppa5, and Utf1, which indicates that all of them are in an undifferentiated state and are pluripotent. To confirm the reliability of our single-cell RNA-Seq approach, we compared our data with that obtained from bulk analysis of ESCs (Cloonan et al., 2008). We found that on average, an individual ESC expresses 10,815 genes (RPM > 0.1), which means that we captured expression of at least 94.6% of the genes in a single cell of those detected by deep sequencing in bulk assays of ESCs (Cloonan et al., 2008). Overall, 65.8% (13,326 out of 20,259) of known genes were expressed in 12 single ESCs, which shows that our RNA-Seq data represent an accurate reflection of the entire transcriptome in ESCs at single-cell resolution. 
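The detection figures quoted above (genes with RPM > 0.1 per cell, and the share of known genes detected across the 12 cells) can be illustrated with a short sketch. It assumes a cells x genes matrix of RPM values and reads "expressed in 12 single ESCs" as "detected in at least one of the cells"; that reading, the helper name `detection_summary`, and the toy matrix are assumptions for illustration. Only the counts 13,326 and 20,259 come from the text.

```python
import numpy as np

def detection_summary(rpm, threshold=0.1):
    """Summarize gene detection in a cells x genes RPM matrix.

    Returns (mean number of genes detected per cell,
             fraction of genes detected in at least one cell).
    (Hypothetical helper written for this example.)
    """
    detected = np.asarray(rpm) > threshold
    per_cell = detected.sum(axis=1)   # genes above threshold in each cell
    in_any = detected.any(axis=0)     # gene detected in any cell
    return float(per_cell.mean()), float(in_any.mean())

# Toy matrix (made up): 2 cells x 4 genes.
toy_rpm = np.array([[5.0, 0.0, 0.2, 0.05],
                    [3.0, 0.0, 0.0, 0.30]])
mean_per_cell, frac_any = detection_summary(toy_rpm)

# The fraction reported in the text, recomputed from the two quoted counts.
reported_fraction = 13326 / 20259
print(mean_per_cell, frac_any, round(reported_fraction * 100, 1))
```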
To understand the relationship between ESCs and the ICM/Epiblast cells from which they were derived, we compared the single-cell RNA-Seq transcriptomes of these cells (Figure S2) to determine the extent to which ESCs resemble E3.5 ICM or E4.5 Epiblast cells (Nichols et al., 2009). We found that the molecular signature of all undifferentiated ESCs maintained under our culture conditions are clearly different from both ICM and epiblast cells based on the principal component analysis of their transcriptomes. This means that at the molecular level, ESCs are distinct from E3.5 ICM or E4.5 Epiblast (Figure 3A). We detected a large set of genes, which show clear differential expression between ICM/Epiblast and ESCs. (Table S2 and Figure S3. Note 2,475 genes with fold change, FC[ESC/ICM] > 4, p  4, p 10, Figure 4), which indicates that the former set of genes have a higher propensity for a more dynamic regulation of expression among individual cells of the same type. These genes include Hoxd13, Hoxb3, Hoxb5, and Ddx3y that showed highly variable expression in ESCs, whereas Gm364, Tmem80, Hdx, Trpm3, Enox2, Ilvbl, Has3, Pygm, and Fbxw13 showed a great variation in expression within ICM cells. Some genes, such as Tnk1, Myof, Adamts9, Tspan12, Rhox6, Epha7, Dhrs3, Fam189a1, and Nudt18, showed highly variable expression in both ESCs and ICM (Table S2). These variations are probably not because of technical reasons because genes expressed at low levels (RPM 1) between cells of ESCs and ICM, Gene Ontology (GO) analysis showed that the genes involved in cellular growth, cellular assembly, amino acid metabolism, and lipid metabolism were significantly enriched (p 4, or 1.5, p 1.45, p 1.51, p 1.51, p 1.4, p 0.1 RPM) and found that their correlation coefficient is 0.92, confirming the accuracy of our single-cell RNA-Seq data (Figure S3). 
Alternative Splicing during the ICM Outgrowth at Whole-Genome Scale Alternative splicing plays an important role in defining tissue identity and specificity. It is estimated that nearly 95% of the mammalian multiexon genes express multiple transcript variants through alternative splicing (Chen and Manley, 2009). We wished to know if alternative splicing was a major feature during the outgrowth process of ICM toward ESCs. We addressed the expression dynamics of all the 6,331 transcript variants from the 2,567 RefSeq genes with multiple known isoforms, which has not been addressed previously. 1,852 transcript variants were expressed (at least 5 counts) in either ICM or ESCs. And from them, 417 transcript variants were upregulated (fold change, FC[ESC/ICM, splicing] > 2, p 1.41, p 1.56, p 2.41, p 1.27, p 1.57, p 1.65, p 1.3, p 0.02) for pluripotency-related genes (FC[Day5 Oct4+/Oct4−] > 4, r > 0.6). The loss of this class of miRNAs may contribute to the phenotype of loss of pluripotent Oct4-positive epiblast cells when Dicer is knocked out in early embryos (Bernstein et al., 2003). The second class of miRNAs preferably target the ESC-specific pluripotency genes (FC[ESC/ICM] > 4) (miR-669b, -298, -692, -204, -28, -149, -34a, -182, ↑-129-5p, -133a, -320; the target enrichment is 1.6-fold, p 0.03) in ICM-specific genes (FC[ESC/ICM] < 0.25). The loss of this class of miRNAs may contribute to the phenotype of resistance to differentiation when Dicer or DGCR8, two key components of the miRNA processing pathway, are knocked out in established ESCs (Kanellopoulou et al., 2005; Wang et al., 2007). Taken together, miRNAs may contribute to ESC's ability to maintain the balance between pluripotency and the potential for rapid differentiation, through one set of miRNAs targeting genes that drive differentiation, while a separate set of miRNAs target ESC-specific pluripotency genes. 
Conclusion Our study provides insight into the dynamic molecular changes that accompany cell-fate changes. During the conversion of ICM cells to ESCs, there is an evident arrest of a normal developmental program, which is subverted in vitro in favor of a potential for unrestricted self-renewal while retaining the ability to undergo differentiation into all the diverse cell types. We demonstrate how both the retention of expression of key genes allows inheritance of a fundamental property of the ICM, namely pluripotency, while other changes in the transcriptome permit exit from a normal developmental program and confer a key property of self-renewal. Changes in epigenetic regulators apparently allow for the stability of the newly acquired epigenotype, which is crucial for the inherent plasticity of ESCs. The conversion from ICM to ESC is also coupled with a role for distinct sets of miRNAs that allow for both self-renewal while the cells retain the ability to respond rapidly to cues for differentiation. Our investigation may serve as a paradigm for other studies, including regulation and differentiation of small numbers of stem cells in adults. Our approach is applicable to studies on small groups of differentiating cells and for gaining insight into how developmental programs might be undermined, leading to the formation of diseased tissues, including cancers. Experimental Procedures Isolation of Embryos and Single Cells All embryos were recovered from 129 females mated with Oct4-ΔPE-GFP transgenic male mice. The transgenic GFP expression of the reporter is under the control of Oct4 promoter and distal enhancer, but the proximal enhancer region is deleted. This GFP transgene reporter shows expression in the E3.5 ICM and E4.5 Epiblast of blastocysts and PGC in vivo and in ESC (Yeom et al., 1996). E3.5 and E4.5 blastocysts were flushed from the uterus of 129 pregnant females. 
For ESC outgrowth, E3.5 blastocysts were cultured in KSOM medium for the first day and then transferred to GMEM medium (GIBCO, cat. no. 21710-025) with 15% fetal calf serum (FCS) (GIBCO, cat. no. 16000-044) and 1000 U/ml LIF on mitomycin C-treated MEF feeder cells for all later periods. The time when the E3.5 blastocysts were placed into culture was designated as day 0. For the isolation of single cells of E3.5 ICM or E4.5 epiblast, the blastocysts were first placed in a mouse trophoblast antibody for 30 min. Then they were treated with complement for 30 min. After this, the lysed trophectoderm cells were removed and the isolated ICM or epiblast was placed in EGTA-PBS for 10 min. After that, they were further treated with trypsin at 37°C for 5 min. Then they were transferred into GMEM medium with 15% FCS and dissociated into a single-cell suspension. The resulting single cells were washed twice in BSA-PBS and prepared to be picked as single cells. To isolate cells from a blastocyst/ICM outgrowth, the outgrowth was treated with trypsin for 5 min to dissociate its core from the surrounding trophectoderm progeny. The inner core of cells in the outgrowth was treated with EGTA-PBS for 10 min at room temperature and trypsin for 5 min at 37°C. The core of cells was dissociated by pipetting into a single-cell suspension in GMEM medium with 15% FCS. Next, the GFP-positive and -negative cells were separated manually under a fluorescence microscope. The single cells were washed twice in BSA-PBS before they were picked individually for subsequent analysis. Preparation of Single-Cell cDNAs The single-cell RNA-seq method has been described in detail previously (Tang et al., 2009, 2010). In brief, an individual cell was manually picked and transferred into lysate buffer by a mouth pipette, followed by reverse transcription directly on the whole-cell lysate. 
Following this procedure, terminal deoxynucleotidyl transferase was used to add a poly(A) tail to the 3′ end of first-strand cDNAs, which was followed by 20 + 9 cycles of PCR to amplify the single-cell cDNAs. RNA-Seq Library Preparation, Sequencing, and Alignment After generation of the target cDNA from a single cell, 100 ng cDNA (0.5–3 kb) was sheared into 80–130 bp fragments. P1 and P2 adaptors were ligated to each end, and the fragments were subjected to 8–10 cycles of PCR amplification. Emulsion PCR reactions were performed by combining 1.6 billion 1 μm diameter beads that had P1 primers covalently attached to their surfaces with 500 pg of single-cell libraries. Applied Biosystems SOLiD sequencer generated 50-base sequences, and AB's whole transcriptome software tools were used to analyze the sequencing reads (http://solidsoftwaretools.com/gf/project/transcriptome/). The reads obtained from each cell were matched to the Mouse genome (mm 9, NCBI Build 37) and reads that aligned uniquely were used in the downstream analysis. These reads were used to create base coverage files (in a wiggle format), which can be viewed directly in the UCSC genome browser, or to detect known or novel exon-exon junctions. Unambiguously mapped reads were first used to generate exon counts and then transcript or gene counts. Feature counts were normalized using the RPM (read per million aligned reads) method, and no adjustment to gene/transcript size was made because our protocol has a limited coverage of 0.5–3 kb from the 3′ end of the transcripts. An alternative analysis was used for alignments that were not aligned to their full length, where reads were aligned to a reference containing exon-exon junctions, using 42 bases on each side for junctions, allowing up to four mismatches for the full length of the read (50 bases) (Tang et al., 2009). The quality of the single-cell RNA-Seq data was analyzed (Figure S7). 
These analyses showed that our single-cell RNA-Seq data are highly reproducible, reliable, and accurate for ICM outgrowth and ESCs. Real-Time PCR For TaqMan real-time PCR, 1.0 μl of diluted cDNAs was used for each 10 μl real-time PCR (1× PCR Universal Master Mix, 250 nM TaqMan probe, and 900 nM of each primer, which are commercially available as ready-to-use assays, custom-plated in 384-well plates or TaqMan Low Density Array cards by Applied Biosystems). All reactions were run in duplicate. The PCR was run as follows using an AB7900 with 384-well plates: first, 95°C for 10 min to activate the Taq polymerase, then 40 cycles of 95°C for 15 sec and 60°C for 1 min. MicroRNA Profiling of ICM and ESCs The detailed protocol has been described previously (Tang et al., 2006). In brief, 10 cells were picked into a PCR tube by glass capillary and were lysed by heat treatment at 95°C for 5 min. Then the microRNAs were reverse transcribed into cDNAs with a pool of 330 stem-looped primers. After this, these microRNA cDNAs were amplified by 18 cycles of PCR with 330 forward primers and a universal reverse primer. Finally, the cDNAs were split and each individual microRNA was measured by TaqMan probe-directed real-time PCR. Three biological replicates were done for each type of cell.
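The RPM (reads per million aligned reads) normalization mentioned in the library preparation and alignment description can be sketched as follows. This is a minimal illustration, not the pipeline used in the cited study: the function name `rpm_normalize` is hypothetical, and the sketch assumes that the total number of aligned reads per cell can be approximated by the sum of that cell's feature counts. As in the text, no gene-length adjustment is applied.

```python
import numpy as np

def rpm_normalize(counts):
    """Scale a cells x features count matrix to reads per million.

    Assumption: each row's sum stands in for that cell's total number of
    uniquely aligned reads. No correction for gene/transcript length.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("every cell needs at least one aligned read")
    return counts / totals * 1e6

# Toy counts (made up): 2 cells x 3 features.
rpm = rpm_normalize([[10, 30, 60],
                     [1, 1, 2]])
print(rpm)
```

After normalization each row sums to one million, so values are comparable across cells sequenced to different depths.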

                Author and article information

                Genome Biol
                Genome Biology
                BioMed Central (London)
                18 October 2019
                Volume: 20
                [1] ISNI 0000 0001 2188 0957, GRID grid.410445.0, Department of Information and Computer Science, University of Hawaii at Manoa, Honolulu, HI 96816, USA
                [2] ISNI 0000 0001 2188 0957, GRID grid.410445.0, Department of Epidemiology, University of Hawaii Cancer Center, 701 Ilalo Street, Honolulu, HI 96813, USA
                [3] ISNI 0000 0001 2188 0957, GRID grid.410445.0, Department of Molecular Biology and Bioengineering, University of Hawaii at Manoa, Honolulu, HI 96816, USA
                [4] ISNI 0000000086837370, GRID grid.214458.e, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48105, USA
                © The Author(s) 2019

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                Funded by: FundRef http://dx.doi.org/10.13039/100000002, National Institutes of Health;
                Award ID: K01ES025434
                Award ID: R01 LM012373
                Award ID: R01 HD084633


                rna-seq, single-cell, imputation, deep learning, machine learning, neural network, dropout, deepimpute

