115
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      Disentangling the effects of type 2 diabetes and metformin on the human gut microbiota

      research-article

      Read this article at

      ScienceOpenPublisherPMC
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          In recent years, several associations between common chronic human disorders and altered gut microbiome composition and function have been reported 1,2 . In most of these reports, treatment regimens were not controlled for and conclusions could thus be confounded by the impact of various drugs on the microbiome. This may obfuscate microbial causes, protective factors, or diagnostically relevant signals. The present study addresses disease and drug signatures in the human gut microbiome of type 2 diabetes mellitus (T2D). Two recent quantitative gut metagenomics studies of T2D patients unstratified for treatment yielded divergent conclusions regarding its associated gut microbiotal dysbiosis 3,4 . Here we show, using 784 available human metagenomes, how antidiabetic medication confounds these results and analyse in detail the effects of the most widely used antidiabetic drug, metformin. We provide support for microbial mediation of therapeutic effects of metformin through short-chain fatty acid (SCFA) production as well as for potential microbiota-mediated mechanisms behind known intestinal adverse effects in the form of a relative increase of Escherichia abundance. Controlling for metformin treatment, we report a unified signature of gut microbiome shifts in T2D with a depletion of butyrate-producing taxa 3,4 . These in turn cause functional microbiome shifts, in part alleviated by metformin-induced changes. Overall, the present study emphasizes the need to disentangle effects of specific diseases from those of drugs in studies of human microbiomes. T2D is a disorder of elevated blood glucose levels (hyperglycaemia) primarily due to insulin resistance and inadequate insulin secretion, with rising global prevalence. Genetic and environmental risk factors are known, the latter including dietary habits and a sedentary lifestyle 5 , and gut microbiota involvement is increasingly recognized 3,4,6,7 , although findings diverge between studies 8 ; e.g. Qin et al. (2012) 3 report several Clostridium species enriched in T2D whereas Karlsson et al. (2013) 4 instead report enrichment of several Lactobacilli (see Supplementary Discussion). Treatment involves medication and lifestyle intervention, which may confound reported gut dysbiosis. Many T2D patients receive metformin, an oral blood glucose-lowering, non-metabolizable compound whose primary and dominant metabolic effect is the inhibition of liver glucose production 9 . At least 30% of patients report adverse effects including diarrhea, nausea, vomiting, and bloating, with underlying mechanisms poorly understood. Studies in animals 10 and humans 11 suggest some beneficial effects of metformin on glucose metabolism may be microbially mediated. Here, we built a multi-country T2D metagenomic dataset, starting with gut microbial samples from a non-diabetic Danish cohort of 277 individuals within the MetaHIT project 12 and additional novel Danish MetaHIT metagenomes from 75 T2D and 31 type 1 diabetes (T1D) patients sequenced using the same protocols (samples abbr. MHD). Treatment information was obtained for all MHD samples, as well as for samples from a previously reported 4 cohort of 53 female Swedish T2D patients along with 92 nondiabetic individuals (43 NGT, 49 IGT) (SWE) and a subgroup of 71 Chinese T2D patients with available information on antidiabetic treatment as well as 185 non-diabetic Chinese individuals 3 (CHN). For all these 784 gut metagenomes (Supplementary Table S1), taxonomic and functional profiles were determined (see Methods), verifying our meta-analysis framework to be appropriate and robust in the context of theoretical considerations and through simulations (Supplementary Discussion 1, Extended Data Figure 1a) as well as characterizing differences between the datasets (Extended Data Figure 2). Initial analysis unstratified for treatment but controlling for demographic and technical variation between datasets (Supplementary Discussion 2, Supplementary Table S2) recovered a majority of previously reported associations (Supplementary Discussion 2, Supplementary Table S3) but with large divergence between datasets. Suspecting confounding treatments, we tested for influence of diet and antidiabetic medications (Supplementary Discussion 3, Supplementary Table S4, Extended Data Figure 1b), finding an effect only of metformin. Since the fraction of medicated patients (“T2D metformin+”) varied strongly (21% CHN, 38% SWE and 77% MHD) samples were stratified on metformin treatment status. Multivariate analysis showed significant (Permanova FDR < 0.005) differences in gut taxonomic composition between metformin-untreated T2D (“T2D metformin−”) (n = 106) patients and non-diabetic controls (“ND CTRL”) (n = 554), consistent with a broad-range dysbiosis in T2D (Figure 1A, Supplementary Table S5, see also Extended Data Table 1a and Supplementary Discussion 3 for an analysis of variances broken down by source). While metformin treatment status could be reliably recovered from microbial composition using support vector machines (SVMs), metformin-untreated T2D status itself could not (Figure 1B, Supplementary Table S6). In contrast, drug treatment-blinded T2D samples could in all three cohorts be separated from ND CTRL samples with similar accuracy as previously reported 3,4 , suggesting that the T2D metformin+ classifier robustly outperforms T2D metformin− classifiers across datasets (Supplementary Table S7). We further explored T2D gut microbiome alterations in 106 metformin-untreated T2D compared with 554 ND CTRL samples through univariate tests of microbial taxonomic and functional differences, with significant trends shown in Figure 2A. Metformin-untreated T2D was associated with a decrease in genera containing known butyrate producers such as Roseburia, Subdoligranulum, and a cluster of butyrate-producing Clostridiales spp. (Supplementary Table S8), consistent with previous indications 3,4 . More fine-grained taxonomic analysis indicate some driver species (Supplementary Discussion 4, Supplementary Table S9), and further finds changes in abundance of several unclassified Firmicutes, often reduced or reversed under metformin treatment (see Supplementary Discussion 4). Although an increase in Lactobacillus was seen in treatment-unstratified T2D samples (as previously found experimentally 13 ), this trend was eliminated or reversed when controlling for metformin. Functionally, we found enrichment of catalase (conceivably a response to increased peroxide stress under inflammation) and modules for ribose, glycine and tryptophan amino acid degradation, but a decrease in threonine and arginine degradation and in pyruvate synthase capacity (Supplementary Table S10). While these functional differences could result from strain-level composition changes or be a compound effect of subtle enrichment/depletion of larger ecological guilds, the abundance of most of these modules correlated with abundance of the significantly altered microbial genera (Figure 2A). To interpret our findings on T2D gut microbiota shifts further, we compared with 31 adult T1D patients (Supplementary Table S1, for further discussion of this sub-cohort, see also Supplementary Discussion 5, Supplementary Table S6 and Supplementary Table S11). This group is dysglycaemic like T2D patients, allowing us to separate purely glycaemic phenotype effects from T2D-specific microbial features. Gene richness was significantly (Wilcox FDR < 0.1) elevated in the T1D microbiomes (Figure 2B), whereas in T2D it was reduced (Supplementary Table S10), as reported previously 6 . Features found to distinguish metformin-untreated T2D from ND CTRL microbiomes did not replicate when comparing T1D to ND CTRL. Instead, most contrasts between metformin-untreated T2D samples and controls were reversed in adult T1D patients. In contrast, some microbial functions differentially abundant between metformin-untreated T2D and controls showed similar trends in T1D samples (Figure 2A), although not significantly so, possibly due to lower statistical power. We therefore conclude the majority of gut microbiota shifts visible in metformin-untreated T2D are not simply effects of dysglycaemia, but rather directly or indirectly associated with the causes or progression of T2D. Suspecting microbial mediation of some of the therapeutic effects of metformin, we next compared T2D metformin-treated (n = 93) and T2D metformin-untreated (n = 106) samples to characterize the treatment effect in more detail. Multivariate contrasts of T2D metformin-treated with T2D metformin-untreated samples appeared weaker than those between T2D metformin-untreated and ND CTRL samples, the former only significant at bacterial family level (Permanova FDR < 0.1), suggesting the effects of metformin treatment on gut microbial composition are poorly captured by multivariate analysis. Univariate tests of the effects of metformin treatment showed a significant increase of Escherichia spp. and a reduced abundance of Intestinibacter, the latter fully consistent across the different country datasets (Figure 3A) whereas the former is not seen in the CHN cohort, where diabetics and controls alike are enriched in Escherichia spp. relative to Scandinavian controls. Correcting for differences in gender, BMI and fasting levels of plasma glucose or serum insulin (some of which were significantly different between datasets, Supplementary Table S12) retained these differences as significant (Supplementary Table S13). Fasting serum concentrations of metformin were obtained for the MHD cohort and correlated significantly with abundances of both genera (Figure 3B). Amplicon-based analysis of an independent T2D cohort likewise validated an increase of Escherichia spp. and a reduced abundance of Intestinibacter in metformin-treated patients (Extended Data Figure 1c, Extended Data Table 1b, Supplementary Discussion 6). The metformin-associated changes might derive from taxon-specific resistance/sensitivity to the bacteriostatic or bactericidal properties of the drug 14 . The genus Intestinibacter was defined only recently 15 and includes the human isolate Clostridium bartletti 16 , since reclassified as Intestinibacter bartlettii. Little is known about its role in the colon ecosystem and how it might affect human health. However, I. bartlettii abundances were lower in pigs susceptible to colonization by enterotoxigenic Escherichia 17 , consistent with the pattern seen here following metformin treatment. Analysis of the SEED and GMM functional annotations linked to Intestinibacter show it to be resistant to oxidative stress and able to degrade fucose, hinting to indirect involvement in mucus degradation. It also appears to possess the genetic potential for sulphite reduction, including part of an assimilatory sulphate reduction pathway. Analysis of gut microbial functional potential more generally (Figure 3C) suggested that indirect metformin treatment effects, including reduced intestinal lipid absorption 18 and lipopolysaccharide (LPS)-triggered local inflammation can provide a competitive advantage to Escherichia species 19 possibly triggering a positive feedback loop further contributing to the observed taxonomic changes. At the same time, metformin might revert T2D-associated changes, as several gut microbial genera are more similar in abundance to ND CTRL levels under metformin treatment, notably Subdoligranulum and to some extent Akkermansia. The latter was previously shown to reduce insulin resistance in murine models when increased in abundance through prebiotics 20 , and has been shown to similarly increase in abundance under metformin treatment 10,21 . In human samples, the trend was however inconsistent between country subsets and only MHD samples show a similar response (Extended Data Figure 3). With respect to microbiota-mediated impact on host glucose regulation, the functional analyses (Figure 3C) demonstrated significantly enhanced butyrate and propionate production potential in metformin-treated individuals (Supplementary Table S14). Intriguingly, recent studies in mice have shown that an increase in colonic production of these SCFAs triggers intestinal gluconeogenesis (IGN) via complementary mechanisms. Butyrate activates IGN gene expression through a cAMP-dependent mechanism in enterocytes, while propionate, itself a substrate of IGN, activates IGN gene expression via the portal nervous system and the fatty acid receptor FFAR3 22,23 . In rodents the net result of an increased IGN is a beneficial effect on glucose and energy homeostasis with reductions in hepatic glucose production, appetite and body weight. Taken together, our characterization of a metformin-associated human gut microbiome suggests novel mechanisms contributing to the beneficial effects of the drug on host metabolism. Both on a compositional and functional level, we found significant microbiome alterations that are consistent with well-known side-effects of metformin treatment (Figure 3C). Most such metformin-associated functional shifts, including enrichment of virulence factors and gas metabolism genes, could be attributed to the significantly increased abundance of Escherichia (Supplementary Discussion 7, Supplementary Table S14, Supplementary Table S15). In conclusion, our results suggest partial gut microbial mediation of both therapeutic and adverse effects of the most widely used antidiabetic medication, metformin, though further validation is required to conclude causality and to clarify how such mediation might occur. Our study of T2D illustrates the need to disentangle specific disease dysbioses from effects of treatment on human microbiomes. The importance of this point was further shown by the fact that the previously reported high accuracy 3,4 of gut microbial signatures for identifying treatment-unstratified T2D patients decreased dramatically when considering a large set of metformin-naïve patients only, highlighting a general need to bear treatment regimens in mind both when developing and applying microbiome-based diagnostic and prognostic tools for common disorders or their pre-morbidity states. Methods Danish MetaHIT diabetic study Patient recruitment, enrolment and processing Patients with type 2 diabetes (T2D) were either recruited from the Inter99 study population 24 or from the out-patient clinic at Steno Diabetes Center, Gentofte, Denmark. Patients with known T2D were included if the patient had clinically defined T2D on the day of examination according to the WHO definition 25 . Inclusion criteria were fasting serum C-peptide above 200 pmol/l and negative testing for serum glutamic acid decarboxylase (GAD) 65 antibodies (to exclude type 1 diabetes (T1D), Latent Autoimmune Diabetes of the Adult (LADA)), secondary forms of diabetes like chronic pancreatitis diabetes or syndromic diabetes, no antibiotic treatment two months prior to inclusion, and no known gastro-intestinal diseases, no previous bariatric surgery or medication known to affect the immune system. All patients with T1D were recruited from the out-patient clinic at Steno Diabetes Center, Gentofte, Denmark (n=31). Inclusion criteria were dependence on insulin treatment from time of diagnosis, fasting serum C-pepide below 200 pmol/l, HbA1c above 8.0 % (64 mmol/l) to ensure current hyperglycemia, T1D duration and dependence on insulin treatment > 5 years, no antibiotic treatment at least two months prior to inclusion, and no known gastro-intestinal diseases. All study participants were of North European ethnicity. The study participants were examined on two different days with approximately 14 days apart. On the first day study participants were examined after an over-night fast. Height was measured without shoes to the nearest 0.5 cm, and weight was measured without shoes and wearing light clothes to the nearest 0.1 kg. Hip and waist circumference was measured using a non-expandable measuring tape to the nearest 0.5 cm. Waist circumference was measured midway between the lower rib margin and the iliac crest. Hip circumference was measured as the largest circumference between the waist and the thighs. Blood pressure was assessed while the participant was lying at an up-right position after at least 5 minutes of rest using a cuff of appropriate size (A&D, UA-787 plus digital or A&D, UA-779). Blood pressure was measured at least twice and the average of the measurements was calculated. At the second day of examination all participants delivered a stool sample which was immediately frozen after home collection and stored at minus 80 degrees. Information on medication status was obtained by questionnaire and interview at the first day of examination. Of the 75 T2D patients, ten patients (12 %) received no hyperglycaemic medications, 58 patients (77 %) received the biguanide metformin; of these 28 patients (37 %) received metformin as the only anti-hyperglycaemic medication, 10 patients (13 %) received sulfonylurea alone or in combination with metformin, 14 patients (19 %) received a combination of oral anti-diabetic drugs and insulin treatment and 4 patients (5 %) were on insulin treatment only. Eleven patients (15 %) patients received dipeptidyl peptidase-4 (DPP4) inhibitors or glucagon-like peptide-1 (GLP1), all of them in combination with metformin. Patients were reported as receiving antihypertensive treatment if at least one of the following drugs was reported: spironolactone, thiazides, loop diuretics, beta blockers, calcium channel blockers, moxonidin or drugs affecting the renin-angiotensin system (n=55 for T2D (73 %) and n=23 (74%) for T1D. Patients receiving statins, fibrates and/or ezetimibe were reported as receiving lipid lowering medication (n=56 for T2D (75 %; all on statin treatment), and n=24 for T1D (77 %; 74 % on statin treatment). All T1D patients were on insulin treatment as their only blood glucose lowering treatment. All biochemical analyses were performed on blood samples drawn in the morning after an over-night fast of at least 10 hours. Plasma glucose was analyzed by a glucose oxidase method (Granutest, Merck, Darmstadt, Germany) with a detection limit of 0.11 mmol/l and intra- and interassay coefficients of variation (CV) of <0.8% and <1.4%, respectively. HbA1c was measured on TOSOH G7 by ion-exchange high performance liquid chromatography. Serum C-peptide was measured using a time-resolved fluoroimmunoassay with the AutoDELFIA C-peptide kit (Perkin-Elmer, Wallac, Turku, Finland), having a detection limit of 5 pmol/l and intra- and interassay CV of <4.7% and <6.4%, respectively. Serum insulin (excluding des and intact proinsulin) was measured using the AutoDELFIA insulin kit (Perkin-Elmer, Wallac, Turku, Finland) with a detection limit of 3 pmol/l and with intra- and interassay CV of <3.2% and <4.5%, respectively. Plasma cholesterol, plasma HDL-cholesterol and plasma triglycerides were all measured on Vitros 5600 using reflect-spectrophotometrics. Plasma LDL – cholesterol was calculated using Friedewald’s equation. Blood leucocytes and white blood cell differential count were measured on Sysmex XS 1000i using flow cytometrics. Plasma metformin was determined by high performance liquid chromatography followed by tandem mass spectrometry (LC-MS/MS). Briefly the proteins were precipitated with acetonitril containing the deuterated internal standard, Metformin-d6, Hydrochloride, and the supernatant diluted by acetonitril. The analysis was performed on a Waters Acquity UPLC I-class system connected to a Xevo TQ-S tandem mass spectrometer in electrospray positive ionization mode. Separation was achieved on a Waters XBridgeT BEH Amide 2.5μm column and gradient elution with A: 100 mM ammonium formate pH 3.2 and B: Acetonitril. The MRM transitions used for metformin and metformin-d6 were 130.2>71.0 and 136.2>60.0. Calibrators were prepared by spiking drug free serum with metformin to a concentration of 2000 ng/ml. B12 was measured using Vitros Immunodiagnostic Products. GAD65 was measured on serum samples by a sandwich ELISA (RSR ltd.). Inter-assay and intra-assay CV were < 16.6 % and <6.7 % respectively, and with a detection limit of 0.57 U/ml. Stool samples were obtained at the homes of each participant and samples were immediately frozen by storing them in their home freezer. Frozen samples were delivered to Steno Diabetes Center using insulating polystyrene foam containers, and then they were stored at −80 °C until analysis. The time span from sampling to delivery at the Steno Diabetes Center was intended to be as short as possible and no more than 48 hours. A frozen aliquot (200 mg) of each faecal sample was suspended in 250 μl of guanidine thiocyanate, 0.1 M Tris, pH 7.5, and 40 μl of 10% N-lauroyl sarcosine. Then, microbial DNA extraction was conducted as previously reported 12 . The DNA concentration and its molecular size were estimated using nanodrop (Thermo Scientific) and agarose gel electrophoresis. Generation and availability of metagenomic samples Already available Danish metagenomic samples were those reported in 26 Li et al. and references therein (excluding 14 samples removed due to average read length below 40 nt, and with 5 Chinese and 21 Swedish samples with less than the rarefaction threshold of 7M reads in total excluded from functional profile or diversity analyses), with newly sequenced samples deposited in the European Bioinformatics Institute Sequence Read Archive under accession ERP004605. All information on Swedish data was retrieved from published data 4 . In addition to published data on Chinese individuals 3 , we retrieved information on metformin treatment on a subset of 71 Chinese T2D patients. One hundred and twelve samples from Qin et al. 3 lacked metformin treatment metadata and were therefore discarded except for measuring differences between the country datasets disregarding treatment or diabetic status. Characteristics of all study participants included in the present protocol are given in Supplementary Table S1. Validation cohort recruitment and sample processing Additional Danish T2D patients were recruited at the Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen during 2014 as a part of the ongoing MicrobDiab study (http://metabol.ku.dk/research-project-sites/microbdiab/). T2D patients were included in the study if time since T2D diagnosis was less than 5 year, if they were between 35 and 75 years of age, Caucasian and if they had not received antibiotics within the past four months of inclusion. In total 30 T2D patients (21 male/9 female) were identified. Fecal samples were collected at the home of the patients including immediate freezing of samples in home freezers, and transport of samples to the hospital stored on dry ice. The samples were stored at −80 degrees C until DNA extraction. Information of medication was obtained from questionnaires. In total 21 (70%) of the T2D patients received metformin. Ethics statement All individuals in both the Danish MetaHIT study and the Danish validation study gave written informed consent before participation in the studies. Both studies were approved by the Ethical Committees of the Capital Region of Denmark (MetaHIT study: HC-2008-017; validation study: H-3-2013-102 ). Both studies were conducted in accordance with the principles of the Declaration of Helsinki. Metagenomic analysis Construction of a non-redundant metagenomic reference gene catalogue Illumina shotgun sequencing was applied to DNA extracted from 620 faecal samples originating from the MetaHIT project (Supplementary Table S1). Raw sequencing data were processed using the MOCAT (version 1.1) software package 27 . Reads were trimmed (option read_trim_filter) using a quality and length cutoff of 20 and 30bp, respectively. Trimmed reads were subsequently screened against a custom database of Illumina adapters (option screen_fastafile) and the human genome version 19 using a 90% identity cut-off (option screen). The resulting high-quality reads were assembled (option assembly) and assemblies revised (option assembly revision). Genes were predicted on scaftigs with a minimum length of 500 bp (option gene_prediction). Predicted protein-coding genes with a minimum length of 100 bp were clustered at 95% sequence identity using CD-HIT (version 4.6.1) 28 with parameters set to: -c 0.95, -G 0 -aS 0.9, -g 1, -r 1). The representative genes of the resulting clusters were “padded” (i.e., extended up to 100 bp at each end of the sequence using the sequence information available from the assembled scaftigs), resulting in the final reference gene catalogue used in this study. The reference gene catalogue was functionally annotated using SmashCommunity 29 (version 1.6) after aligning the amino acid sequence of each gene to the KEGG 30 (version 62) and eggNOG 31 [(version 3) databases. Profiling of metagenomic samples Raw insert (sequenced fragments of DNA represented by single or paired-end reads) count profiles were generated using MOCAT 27 by mapping high-quality reads from each metagenome to the reference gene catalogue (option screen) using an alignment length and identity cut-off of 45 and 95%, respectively. For each gene, the number of inserts that matched the protein-coding region was counted. Counts of inserts that mapped with the same alignment score to multiple genes were distributed equally among them. Taxonomic abundances were computed at the level of metagenomic operational taxonomic units (mOTUs) 32 , normalized to the length of the concatenated marker genes for each mOTU to yield the abundances used for the study, and subsequently binned at broader taxonomic levels (genus, family, class, etc.). Rarefaction of metagenomic data and microbial diversity measurements For all metagenome-derived measures except the mOTU taxonomic assignments, read counts were rarefied in order to avoid any artifacts of sample size on low-abundance genes. Rarefied matrices were obtained as follows. Data matrices were rarefied to 7M reads per sample. This threshold was chosen to include most samples, but 5 Chinese and 21 Swedish samples were excluded due to having less than 7M reads per sample. Rarefactions were done using a C++ program developed for the Tara project 33 . In total we did 30 repetitions, in each of which we measured the richness, evenness, chao1 and Shannon diversity metrics within a rarefaction. The median value of these was taken as the respective diversity measurement for each sample. The first of 30 rarefactions of each sample was used to create a rarefied gene abundance matrix and KO abundance profiles were calculated by summing the rarefied abundance of genes annotated to the respective KO gene. Metagenomic species (MGS) construction Clustering of the catalogue genes by co-abundance, as described in Nielsen et al. 34 , defined 10,754 co-abundance gene groups (CAGs) with very high correlations (Pearson correlation coefficient > 0.9). The 925 largest of these, having more than 700 genes, were termed metagenomic species (MGS). The abundance profiles of the CAGs and MGS were determined as the medium gene abundance (downsized to 7M reads per sample) throughout the samples. Furthermore, the CAGs and MGS were taxonomically annotated, by sequence similarity to known reference genomes. Functional annotation/binning of metagenomes To avoid falsely drawing gut microbial functional conclusions from high abundance of single genes remotely homologous to members of a functional pathway, an approach was used that required presence of multiple pathway members. Functional pathway abundance was calculated from gene catalog KO annotation and MGS abundances per sample. Thus KOs present in each MGS were used to determine for that CAG/MGS which functional modules were represented within its genetic repertoire, requiring for this that >90% of KO’s necessary for the completion of a reaction pathway should be present, when also taking alternative enzymatic pathways into account. The module abundance within a sample was calculated from CAG abundance in each respective sample, summing over all CAGs which had the module present. Rarefied median coverage of CAG/MGSs were used, so no further normalization of the module abundance matrix was required. Abundance of genetic potential falling under the same higher-order functional levels was calculated by summing up all abundances of the lower-level functional modules within each sample. Existing functional annotation databases cover gut metabolic pathways relatively poorly. To account for this, a number of additional bacterial gene functional modules were curated and annotated, extending the KEGG system; these are referred to in result tables as GMMs (Gut Microbial Modules) and were previously described in Le Chatelier et al 12 . 16S amplicon processing 16S amplicons from frozen samples were sequenced 300bp + 200bp paired end reads using a Illumina miSeq machine. We used the LotuS 35 pipeline in short amplicon mode with default quality filtering, clustering and denoising OTUs with UPARSE 36 , removing chimeric OTUs against the RDP reference database (http://drive5.com/uchime/rdp_gold.fa) with uchime 37 , merging reads with FLASH 38 and assigning a taxonomy against the SILVA 119 rRNA database 39 and further refined by BLAST searches against the NCBI rRNA database 40 to identify Intestinibacter OTUs, using the following LotuS command line options: “-p miSeq -refDB SLV -doBlast blast -amplicon_type SSU -tax_group bacteria -derepMin 2 -CL 2 -thr 14”. Univariate tests of taxonomic or functional abundance differences Microbial taxa where mean abundance over all samples was less than 30 reads, or which were present in less than 3 samples were excluded from univariate and classifier analyses. All abundances were normalized by total sample sum. For module tables no feature filters were used except requiring the module to be present in at least 20 samples. Filtered data tables were made available online. Univariate testing for differential abundances of each taxonomic unit between two or more groups was tested using Mann-Whitney-U or Kruskal-Wallis tests, respectively, corrected for multiple testing using the Benjamini-Hochberg false discovery control procedure rate (e.g. q-values) 41 . Post-hoc statistical testing for significant differences between all combinations of two groups was conducted only for taxa with abundances significantly different at P < 0.2. Wilcoxon rank-sum tests were calculated for all possible group combinations and corrected for multiple testing again using the Benjamini-Hochberg false discovery rate, as implemented in R. Where controlling for potential confounders such as source study, we used blocked “independence_test” function calls with options “ytrafo = rank, teststat=scalar” for blocked WRST and “ytrafo = rank, teststat=quad” for blocked KWT, as implemented in the COIN software package 42 for R. Similarly we applied these independence tests in the framework of post-hoc testing as described above. Analysis of correlations between taxonomic or functional features, community diversity indices and sample metadata variables were conducted using Spearman correlation tests as implemented in R, and corrected for multiple tests using the Benjamini-Hochberg false discovery rate control procedure. To control for confounders such as source study in univariate correlation analyses, blocked Spearman tests as implemented in COIN (settings “independence_test”, options ytrafo = rank, xtrafo=rank, distribution=asymptotic) were used. In some analyses taxa were corrected for the influence of a continuous confounder variable like microbial community richness; in these cases the residual of a linear model between normalized, log-transformed taxa abundances and overall sample gene richness was used to correct for the confounding variable. Power analysis was conducted by randomly subsampling to a given sample number, repeated 5 times to achieve robust results. Ordinations and multivariate tests All ordinations (NMDS, dbRDA) and subsequent statistical analyses were calculated using the R-package vegan 43 using Canberra distances on normalized taxa abundance matrices, then visualized using the ggplot2 R package 44 . Community differences were calculated using a permutation test on the respective NMDS reduced feature space, as implemented in vegan. Furthermore, we calculated intergroup differences for the microbiota using PERMANOVA 45 as implemented in vegan. This test compares the intragroup distances to the intergroup distances in a permutation scheme and from this calculates a P-value. For all PERMANOVA tests we used 2×105 randomizations and a normalized genus-level mOTU abundance matrix, using Canberra intersample distances. PERMANOVA post hoc P-values were corrected for multiple testing using the Benjamini-Hochberg false discovery rate control procedure. Analysis of variance broken down by cohort, treatment and disease status was conducted by fitting these distances to a linear model of sample metadata distances, as further described in Supplementary Discussion 3.2. Classifier construction and evaluation To create classifiers for separating samples from different subsets, a L1 restricted LASSO using the R glmnet package 46 was carried out to test for an optimal value of lambda (number of features to be used in the final predictor) in a 5-fold cross-validated and internally 4-fold cross-validated LASSO run on all data. After this, the previously determined value of lambda was manually controlled for number of features used against the root mean square error of the classifier. In a 5-fold cross-validation an independent LASSO classifier was trained on 4/5 of the data using the previously determined value of lambda, and response values were predicted on 1/5 of the data. LASSO models with a Poisson response type were used in all cases. Binary classifications between T2D and ND CTRL samples were performed with a R reimplementation of the robust recursive feature elimination support vector machine (rRFE-SVM) 47 procedure. The SVM was performed in an outer cross-validation scheme on 4/5 of the data. Of these randomly 90% were selected 200 times in each cross-validation for the RFE, to create a feature ranking from an average over these runs. Classifier performance was validated on the remaining 1/5 of samples using the pre-established feature ranking. In case of several cohorts, ROC-AUC scores were measured for each cohort separately. Code availability The MGS technology has previously been described 34 and is available online (http://git.dworzynski.eu/mgs-canopy-algorithm/wiki/Home). The mOTU resource has been made publically available (http://www.bork.embl.de/software/mOTU/) and was analyzed using MOCAT 27 which is also publically available (http://vm-lux.embl.de/~kultima/MOCAT/). The 16S pipeline LotuS 35 is freely available online (http://psbweb05.psb.ugent.be/lotus). The novel gene catalog has been deposited online (http://vm-lux.embl.de/~kultima/share/gene_catalogs/620mhT2D/), as have the raw amplicon sequences (http://vm-lux.embl.de/~forslund/t2d/). Statistical analysis and data visualization was conducted using freely available R libraries: vegan, coin and ggplot2 and was described in more details elsewhere 48,49 . Data matrices and R source code for replicating the central tests conducted on the data have been deposited online (http://vm-lux.embl.de/~forslund/t2d/). Evaluation of dietary habits A subset of the Danish study participants answered a validated food frequency questionnaire (FFQ) in order to obtain information on the habitual dietary habits. A complete dataset was obtained for 66 % of the non-diabetic individuals and 88 % of T2D patients. When evaluating the dietary data the consumed quantity was determined by multiplying portion size by the corresponding consumption frequency reported. Standard portion sizes for women and men, separately, were used in this calculation 50,51 . All food items in the FFQ were linked to food items in the Danish Food Composition Databank 52 . Estimation of daily intake of macro- and micronutrients for each participant was based on calculations in the software program FoodCalc version 1.3 53 . Extended Data Extended Data Figure 1 A. As a positive control for the meta-analysis pipeline, true signal was removed from the data by randomly reshuffling sample labels. Artificial contrast was thereafter introduced between random groups containing as many such reshuffled samples as were in the original sets of T2D metformin+ (nCHN=15, nMHD=58, nSWE=20) and T2D metformin− (nCHN=56, nMHD=17, nSWE=33) samples in each original study subset, using the genus Akkermansia as an example feature. Samples randomly assigned to the sets of fake “metformin treated” and “control” categories had their Akkermansia genus abundances adjusted to match the scale of the metformin effect on Escherichia genus abundance reported here (metformin-treated samples roughly 150% as likely to have nonzero abundance, with a roughly threefold higher abundance where present), while retaining their dataset origin labels. The full meta-analysis pipeline (study set blocked KWT test, post-hoc WRS test) was applied to these samples. Benjamini-Hochberg-corrected P-values (FDR scores/Q-values) from testing for a metformin effect on Akkermansia abundance are here plotted in logarithmic scale on the vertical axis for 100 randomizations of the entire shuffled dataset, either without (left boxplot) or with (right boxplot) the artificial Akkermansia metformin signal added after shuffling the data to remove original signal. Box plot borders show medians and quartiles, with points outside this range shown as vertical whisker lines and point markers, whiskers extend to 1.58 × interquartile range / sqrt (n). Horizontal guide lines are shown for ease of visualization corresponding to different false discovery rate thresholds. For randomly reshuffled data, as expected no significant contrast is detected, while the artificially introduced signal is reliably detected, roughly matching expectations from the definition of the false discovery rate itself. B. To investigate statistical power for the other medications tracked, five random sub-samplings were made of pairs of medicated and non-medicated samples, at each increasing number of included sample pairs, and the overall analysis replicated for each, testing each genus for significantly (KW-test followed by post-hoc WRS test) differential abundance between cases and controls, at different BH FDR significance cutoffs marked in the figure using different colours. Out of the total number of samples for which medication status was known, equal numbers n of medicated and unmedicated samples were chosen randomly in repeated iterations. This number n was varied up to its largest possible value (smallest of either number of medicated or unmedicated samples in overall dataset) and is what is shown on the horizontal axis. The vertical axis shows number of features significant relative to each cutoff, with standard deviation over each set of five randomized samples shown as error bars. C. The graphs show Intestinibacter and Escherichia median and quartile abundances as boxplots, whiskers extend to 1.58 × interquartile range / sqrt (n), with samples extreme relative to the interquartile range shown as point markers and with samples below detection threshold (DT) plotted at y = 0, in 21 additional T2D metformin+ and 9 additional T2D metformin− samples. Differences in abundance between sample categories are significant (WRS test, BH FDR < 0.1). The samples where Intestinibacter was detected all fall among the 9/30 untreated rather than the 21/30 metformin-treated samples, consistent with severe depletion under treatment, whereas Escherichia abundances increase under treatment, likewise consistent with observations from the main dataset. Extended Data Figure 2 Differences in physiological variables and microbiome characteristics between Chinese (n=368), Danish MetaHIT (n=383) and Swedish (n=145) gut metagenome sample sets. A. Several participant metadata variables are significantly different between cohorts, of which a subselection is shown here as boxplots displaying median and quartiles, with samples outside this range shown as point markers and whiskers, whiskers extend to 1.58 × interquartile range / sqrt (n). B. In a PCoA ordination of Bray-Curtis distances between samples on bacterial family level, clear differences between samples from the different cohorts become apparent. These are largely explained by taxonomic differences as summarized at the phylum level. C. Boxplots for gut microbial taxa show medians and quartiles of log-transformed read counts for mOTUs summarized at the level of bacterial genera, for the three country subsets across sample categories, with samples outside this range shown as point markers and whiskers, whiskers extend to 1.58 × interquartile range / sqrt (n). For all boxplots, tests for significant differences (Kruskal-Wallis test adjusted for study source) were performed with P-values shown at the head of each figure. Star markers note significance of tests done for each country subset separately. Extended Data Figure 3 Taxonomic microbiome composition comparison between T2D metformin− (n=106), T2D metformin+ (n=93) and ND CTRL (n=554) gut metagenomes with particular focus on possible taxonomic restoration under metformin treatment for certain taxa. Boxplots show medians and quartiles log-transformed read counts for mOTUs summarized at the level of bacterial genera, for the three country subsets across sample categories, with samples outside this range shown as point markers and whiskers, whiskers extend to 1.58 × interquartile range / sqrt (n). Tests for significant differences (Kruskal-Wallis test adjusted for study source) were performed with P-values shown at the head of each figure. Star markers show results of tests for each country subset separately. Extended Data Table 1 A. The analysis of variances table shows the results of modelling the Canberra distances between T2D metformin− (n=106), T2D metformin+ (n=93) and ND CTRL (n = 554) samples with predictor variables encoding same/different diabetes status, same/different treatment, and same/different study source/country. Fractions of explained variance are taken as fractions of sum of square deviations from the model relative to the total deviation. B. Bacterial taxa found significantly different in gut abundance under metformin treatment were tested (WRS-test) for significant differential relative abundance in a separate cohort under 16S amplicon sequencing between T2D metformin+ (n=26) and T2D metformin− (n=8) samples. a Degrees of freedom Sum of squares Explained variation F-statistic Pr(>F) Treatment 1 128 3.8% 21454.01 <2E-016 *** Disease 1 42 1.2% 7039.86 <2E-016 *** Country 1 376 11.1% 63206.96 <2E-016 *** Treatment × Disease 1 1 0.0% 192.62 <2E-016 *** Treatment × Country 1 67 2.0% 11209.33 <2E-016 *** Disease × Country 1 1 0.0% 218.97 <2E-016 *** Treatment × Disease × Country 1 0 0.0% 22.79 0.00000181 *** Residuals 567001 3375 100.0% Total: 3990 b Database OTU identifier MWU P-value Enriched in Mean abundance (%) OTU_45 0.048968332 T2D metformin+ 0.803960725 OTU_1038 0.0319637913 T2D metformin− 0.000185722 Database OTU identifier OTU_45 OTU_1038 Domain Bacteria Bacteria Phylum Proteobacteria Firmicutes Class Gammaproteobacteria Clostridia Order Enterobacteriales Clostridiales Family Enterobacteriaceae Peptostreptococcaceae Genus Escherichia-Shigella Intestinibacter Supplementary Material Supplementary Discussion Supplementary Information Supplementary Table Legends SITable1 SITable2 SITable3 SITable4 SITable5 SITable6 SITable7 SITable8 SITable9 SITable10 SITable11 SITable12 SITable13 SITable14 SITable15 SITable16

          Related collections

          Most cited references 24

          • Record: found
          • Abstract: found
          • Article: not found

          FLASH: fast length adjustment of short reads to improve genome assemblies.

          Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome. We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds. The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash. t.magoc@gmail.com.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            The gut microbiome in health and in disease

            Recent technological advancements and expanded efforts have led to a tremendous growth in the collective knowledge of the human microbiome. This review will highlight some of the important recent findings in this area of research.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Host-derived nitrate boosts growth of E. coli in the inflamed gut.

              Changes in the microbial community structure are observed in individuals with intestinal inflammatory disorders. These changes are often characterized by a depletion of obligate anaerobic bacteria, whereas the relative abundance of facultative anaerobic Enterobacteriaceae increases. The mechanisms by which the host response shapes the microbial community structure, however, remain unknown. We show that nitrate generated as a by-product of the inflammatory response conferred a growth advantage to the commensal bacterium Escherichia coli in the large intestine of mice. Mice deficient in inducible nitric oxide synthase did not support the growth of E. coli by nitrate respiration, suggesting that the nitrate generated during inflammation was host-derived. Thus, the inflammatory host response selectively enhances the growth of commensal Enterobacteriaceae by generating electron acceptors for anaerobic respiration.
                Bookmark

                Author and article information

                Journal
                0410462
                6011
                Nature
                Nature
                Nature
                0028-0836
                1476-4687
                14 October 2015
                02 December 2015
                10 December 2015
                10 June 2016
                : 528
                : 7581
                : 262-266
                Affiliations
                [1 ]European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
                [2 ]Center for the Biology of Disease, VIB, Leuven, Belgium
                [3 ]Department of Bioscience Engineering, Vrije Universiteit Brussel, Brussels, Belgium
                [4 ]The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
                [5 ]KU Leuven – University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, Leuven, Belgium
                [6 ]MICALIS, Institut National de la Recherche Agronomique, Jouy en Josas, France
                [7 ]Metagenopolis, Institut National de la Recherche Agronomique, Jouy en Josas, France
                [8 ]Institute of Cardiometabolism and Nutrition, Paris, France
                [9 ]Center for Biological Sequence Analysis, Dept. of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark
                [10 ]Department of Biology, University of Copenhagen, Copenhagen, Denmark
                [11 ]Department of Applied Tumor Biology, Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
                [12 ]Molecular Medicine Partnership Unit , University of Heidelberg and European Molecular Biology Laboratory, Heidelberg, Germany
                [13 ]Research Centre for Prevention and Health, Capital Region of Denmark, Copenhagen, Denmark
                [14 ]Department of Public Health, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
                [15 ]Faculty of Medicine, University of Aalborg, Aalborg, Denmark
                [16 ]Novo Nordisk Foundation Center for Protein Research, Disease Systems Biology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
                [17 ]Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark
                [18 ]BGI-Shenzhen, Shenzhen, China
                [19 ]Princess Al Jawhara Albrahim Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
                [20 ]Macau University of Science and Technology, Avenida Wai long, Taipa, Macau, China
                [21 ]Department of Medicine and State Key Laboratory of Pharmaceutical Biotechnology, University of Hong Kong, Hong Kong
                [22 ]King’s College London, Centre for Host-Microbiome Interactions, Dental Institute Central Office, Guy’s Hospital, United Kingdom
                Author notes

                Author Contributions

                O. P., S. D. E. & P.B devised the project, designed the study protocol and supervised all phases of the project. T.N., T.H., T.J., & O.P. carried out patient phenotyping and clinical data analyses, T.N. & F.L. performed sample collection & DNA extraction, J.D. supervised DNA extraction, J.W. supervised DNA sequencing and gene profiling, A.Y.V. and R.H. performed additional microbial DNA extraction and amplicon sequencing, J.R, H.B.N., S.B., S. D. E., P.B. & O.P. designed and supervised the data analyses, K.F., F.H., G.F., E.LC., S.S., E.P., S.S-V., V.G., H.K.P, M.A., P.I.C., J.R.K. & H.B.N performed the data analyses, K.F., F.H., T.N., P.B, S.D.E. & O.P. wrote the paper. All authors contributed to data interpretation, discussions and editing of the paper. MetaHIT consortium members contributed to the design and execution of the study.

                Article
                EMS65452
                10.1038/nature15766
                4681099
                26633628

                Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

                Categories
                Article

                Uncategorized

                Comments

                Comment on this article