In recent years, several associations between common chronic human disorders and altered
gut microbiome composition and function have been reported
1,2
. In most of these reports, treatment regimens were not controlled for and conclusions
could thus be confounded by the impact of various drugs on the microbiome. This may
obfuscate microbial causes, protective factors, or diagnostically relevant signals.
The present study addresses disease and drug signatures in the human gut microbiome
of type 2 diabetes mellitus (T2D). Two recent quantitative gut metagenomics studies
of T2D patients unstratified for treatment yielded divergent conclusions regarding
its associated gut microbiotal dysbiosis
3,4
. Here we show, using 784 available human metagenomes, how antidiabetic medication
confounds these results and analyse in detail the effects of the most widely used
antidiabetic drug, metformin. We provide support for microbial mediation of therapeutic
effects of metformin through short-chain fatty acid (SCFA) production as well as for
potential microbiota-mediated mechanisms behind known intestinal adverse effects in
the form of a relative increase of Escherichia abundance. Controlling for metformin
treatment, we report a unified signature of gut microbiome shifts in T2D with a depletion
of butyrate-producing taxa
3,4
. These in turn cause functional microbiome shifts, in part alleviated by metformin-induced
changes. Overall, the present study emphasizes the need to disentangle effects of
specific diseases from those of drugs in studies of human microbiomes.
T2D is a disorder of elevated blood glucose levels (hyperglycaemia) primarily due
to insulin resistance and inadequate insulin secretion, with rising global prevalence.
Genetic and environmental risk factors are known, the latter including dietary habits
and a sedentary lifestyle
5
, and gut microbiota involvement is increasingly recognized
3,4,6,7
, although findings diverge between studies
8
; e.g. Qin et al. (2012)
3
report several Clostridium species enriched in T2D whereas Karlsson et al. (2013)
4
instead report enrichment of several Lactobacilli (see Supplementary Discussion).
Treatment involves medication and lifestyle intervention, which may confound reported
gut dysbiosis. Many T2D patients receive metformin, an oral blood glucose-lowering,
non-metabolizable compound whose primary and dominant metabolic effect is the inhibition
of liver glucose production
9
. At least 30% of patients report adverse effects including diarrhea, nausea, vomiting,
and bloating, with underlying mechanisms poorly understood. Studies in animals
10
and humans
11
suggest some beneficial effects of metformin on glucose metabolism may be microbially
mediated. Here, we built a multi-country T2D metagenomic dataset, starting with gut
microbial samples from a non-diabetic Danish cohort of 277 individuals within the
MetaHIT project
12
and additional novel Danish MetaHIT metagenomes from 75 T2D and 31 type 1 diabetes
(T1D) patients sequenced using the same protocols (samples abbr. MHD). Treatment information
was obtained for all MHD samples, as well as for samples from a previously reported
4
cohort of 53 female Swedish T2D patients along with 92 nondiabetic individuals (43
NGT, 49 IGT) (SWE) and a subgroup of 71 Chinese T2D patients with available information
on antidiabetic treatment as well as 185 non-diabetic Chinese individuals
3
(CHN). For all these 784 gut metagenomes (Supplementary Table S1), taxonomic and functional
profiles were determined (see Methods), verifying our meta-analysis framework to be
appropriate and robust in the context of theoretical considerations and through simulations
(Supplementary Discussion 1, Extended Data Figure 1a) as well as characterizing differences
between the datasets (Extended Data Figure 2). Initial analysis unstratified for treatment
but controlling for demographic and technical variation between datasets (Supplementary
Discussion 2, Supplementary Table S2) recovered a majority of previously reported
associations (Supplementary Discussion 2, Supplementary Table S3) but with large divergence
between datasets. Suspecting confounding treatments, we tested for influence of diet
and antidiabetic medications (Supplementary Discussion 3, Supplementary Table S4,
Extended Data Figure 1b), finding an effect only of metformin. Since the fraction
of medicated patients (“T2D metformin+”) varied strongly (21% CHN, 38% SWE and 77%
MHD) samples were stratified on metformin treatment status. Multivariate analysis
showed significant (Permanova FDR < 0.005) differences in gut taxonomic composition
between metformin-untreated T2D (“T2D metformin−”) (n = 106) patients and non-diabetic
controls (“ND CTRL”) (n = 554), consistent with a broad-range dysbiosis in T2D (Figure
1A, Supplementary Table S5, see also Extended Data Table 1a and Supplementary Discussion
3 for an analysis of variances broken down by source). While metformin treatment status
could be reliably recovered from microbial composition using support vector machines
(SVMs), metformin-untreated T2D status itself could not (Figure 1B, Supplementary
Table S6). In contrast, drug treatment-blinded T2D samples could in all three cohorts
be separated from ND CTRL samples with similar accuracy as previously reported
3,4
, suggesting that the T2D metformin+ classifier robustly outperforms T2D metformin−
classifiers across datasets (Supplementary Table S7).
We further explored T2D gut microbiome alterations in 106 metformin-untreated T2D
compared with 554 ND CTRL samples through univariate tests of microbial taxonomic
and functional differences, with significant trends shown in Figure 2A. Metformin-untreated
T2D was associated with a decrease in genera containing known butyrate producers such
as Roseburia, Subdoligranulum, and a cluster of butyrate-producing Clostridiales spp.
(Supplementary Table S8), consistent with previous indications
3,4
. More fine-grained taxonomic analysis indicate some driver species (Supplementary
Discussion 4, Supplementary Table S9), and further finds changes in abundance of several
unclassified Firmicutes, often reduced or reversed under metformin treatment (see
Supplementary Discussion 4). Although an increase in Lactobacillus was seen in treatment-unstratified
T2D samples (as previously found experimentally
13
), this trend was eliminated or reversed when controlling for metformin. Functionally,
we found enrichment of catalase (conceivably a response to increased peroxide stress
under inflammation) and modules for ribose, glycine and tryptophan amino acid degradation,
but a decrease in threonine and arginine degradation and in pyruvate synthase capacity
(Supplementary Table S10). While these functional differences could result from strain-level
composition changes or be a compound effect of subtle enrichment/depletion of larger
ecological guilds, the abundance of most of these modules correlated with abundance
of the significantly altered microbial genera (Figure 2A).
To interpret our findings on T2D gut microbiota shifts further, we compared with 31
adult T1D patients (Supplementary Table S1, for further discussion of this sub-cohort,
see also Supplementary Discussion 5, Supplementary Table S6 and Supplementary Table
S11). This group is dysglycaemic like T2D patients, allowing us to separate purely
glycaemic phenotype effects from T2D-specific microbial features. Gene richness was
significantly (Wilcox FDR < 0.1) elevated in the T1D microbiomes (Figure 2B), whereas
in T2D it was reduced (Supplementary Table S10), as reported previously
6
. Features found to distinguish metformin-untreated T2D from ND CTRL microbiomes did
not replicate when comparing T1D to ND CTRL. Instead, most contrasts between metformin-untreated
T2D samples and controls were reversed in adult T1D patients. In contrast, some microbial
functions differentially abundant between metformin-untreated T2D and controls showed
similar trends in T1D samples (Figure 2A), although not significantly so, possibly
due to lower statistical power. We therefore conclude the majority of gut microbiota
shifts visible in metformin-untreated T2D are not simply effects of dysglycaemia,
but rather directly or indirectly associated with the causes or progression of T2D.
Suspecting microbial mediation of some of the therapeutic effects of metformin, we
next compared T2D metformin-treated (n = 93) and T2D metformin-untreated (n = 106)
samples to characterize the treatment effect in more detail. Multivariate contrasts
of T2D metformin-treated with T2D metformin-untreated samples appeared weaker than
those between T2D metformin-untreated and ND CTRL samples, the former only significant
at bacterial family level (Permanova FDR < 0.1), suggesting the effects of metformin
treatment on gut microbial composition are poorly captured by multivariate analysis.
Univariate tests of the effects of metformin treatment showed a significant increase
of Escherichia spp. and a reduced abundance of Intestinibacter, the latter fully consistent
across the different country datasets (Figure 3A) whereas the former is not seen in
the CHN cohort, where diabetics and controls alike are enriched in Escherichia spp.
relative to Scandinavian controls. Correcting for differences in gender, BMI and fasting
levels of plasma glucose or serum insulin (some of which were significantly different
between datasets, Supplementary Table S12) retained these differences as significant
(Supplementary Table S13). Fasting serum concentrations of metformin were obtained
for the MHD cohort and correlated significantly with abundances of both genera (Figure
3B). Amplicon-based analysis of an independent T2D cohort likewise validated an increase
of Escherichia spp. and a reduced abundance of Intestinibacter in metformin-treated
patients (Extended Data Figure 1c, Extended Data Table 1b, Supplementary Discussion
6). The metformin-associated changes might derive from taxon-specific resistance/sensitivity
to the bacteriostatic or bactericidal properties of the drug
14
. The genus Intestinibacter was defined only recently
15
and includes the human isolate Clostridium bartletti
16
, since reclassified as Intestinibacter bartlettii. Little is known about its role
in the colon ecosystem and how it might affect human health. However, I. bartlettii
abundances were lower in pigs susceptible to colonization by enterotoxigenic Escherichia
17
, consistent with the pattern seen here following metformin treatment. Analysis of
the SEED and GMM functional annotations linked to Intestinibacter show it to be resistant
to oxidative stress and able to degrade fucose, hinting to indirect involvement in
mucus degradation. It also appears to possess the genetic potential for sulphite reduction,
including part of an assimilatory sulphate reduction pathway. Analysis of gut microbial
functional potential more generally (Figure 3C) suggested that indirect metformin
treatment effects, including reduced intestinal lipid absorption
18
and lipopolysaccharide (LPS)-triggered local inflammation can provide a competitive
advantage to Escherichia species
19
possibly triggering a positive feedback loop further contributing to the observed
taxonomic changes. At the same time, metformin might revert T2D-associated changes,
as several gut microbial genera are more similar in abundance to ND CTRL levels under
metformin treatment, notably Subdoligranulum and to some extent Akkermansia. The latter
was previously shown to reduce insulin resistance in murine models when increased
in abundance through prebiotics
20
, and has been shown to similarly increase in abundance under metformin treatment
10,21
. In human samples, the trend was however inconsistent between country subsets and
only MHD samples show a similar response (Extended Data Figure 3). With respect to
microbiota-mediated impact on host glucose regulation, the functional analyses (Figure
3C) demonstrated significantly enhanced butyrate and propionate production potential
in metformin-treated individuals (Supplementary Table S14). Intriguingly, recent studies
in mice have shown that an increase in colonic production of these SCFAs triggers
intestinal gluconeogenesis (IGN) via complementary mechanisms. Butyrate activates
IGN gene expression through a cAMP-dependent mechanism in enterocytes, while propionate,
itself a substrate of IGN, activates IGN gene expression via the portal nervous system
and the fatty acid receptor FFAR3
22,23
. In rodents the net result of an increased IGN is a beneficial effect on glucose
and energy homeostasis with reductions in hepatic glucose production, appetite and
body weight. Taken together, our characterization of a metformin-associated human
gut microbiome suggests novel mechanisms contributing to the beneficial effects of
the drug on host metabolism.
Both on a compositional and functional level, we found significant microbiome alterations
that are consistent with well-known side-effects of metformin treatment (Figure 3C).
Most such metformin-associated functional shifts, including enrichment of virulence
factors and gas metabolism genes, could be attributed to the significantly increased
abundance of Escherichia (Supplementary Discussion 7, Supplementary Table S14, Supplementary
Table S15).
In conclusion, our results suggest partial gut microbial mediation of both therapeutic
and adverse effects of the most widely used antidiabetic medication, metformin, though
further validation is required to conclude causality and to clarify how such mediation
might occur. Our study of T2D illustrates the need to disentangle specific disease
dysbioses from effects of treatment on human microbiomes. The importance of this point
was further shown by the fact that the previously reported high accuracy
3,4
of gut microbial signatures for identifying treatment-unstratified T2D patients decreased
dramatically when considering a large set of metformin-naïve patients only, highlighting
a general need to bear treatment regimens in mind both when developing and applying
microbiome-based diagnostic and prognostic tools for common disorders or their pre-morbidity
states.
Methods
Danish MetaHIT diabetic study
Patient recruitment, enrolment and processing
Patients with type 2 diabetes (T2D) were either recruited from the Inter99 study population
24
or from the out-patient clinic at Steno Diabetes Center, Gentofte, Denmark. Patients
with known T2D were included if the patient had clinically defined T2D on the day
of examination according to the WHO definition
25
. Inclusion criteria were fasting serum C-peptide above 200 pmol/l and negative testing
for serum glutamic acid decarboxylase (GAD) 65 antibodies (to exclude type 1 diabetes
(T1D), Latent Autoimmune Diabetes of the Adult (LADA)), secondary forms of diabetes
like chronic pancreatitis diabetes or syndromic diabetes, no antibiotic treatment
two months prior to inclusion, and no known gastro-intestinal diseases, no previous
bariatric surgery or medication known to affect the immune system.
All patients with T1D were recruited from the out-patient clinic at Steno Diabetes
Center, Gentofte, Denmark (n=31). Inclusion criteria were dependence on insulin treatment
from time of diagnosis, fasting serum C-pepide below 200 pmol/l, HbA1c above 8.0 %
(64 mmol/l) to ensure current hyperglycemia, T1D duration and dependence on insulin
treatment > 5 years, no antibiotic treatment at least two months prior to inclusion,
and no known gastro-intestinal diseases. All study participants were of North European
ethnicity.
The study participants were examined on two different days with approximately 14 days
apart. On the first day study participants were examined after an over-night fast.
Height was measured without shoes to the nearest 0.5 cm, and weight was measured without
shoes and wearing light clothes to the nearest 0.1 kg. Hip and waist circumference
was measured using a non-expandable measuring tape to the nearest 0.5 cm. Waist circumference
was measured midway between the lower rib margin and the iliac crest. Hip circumference
was measured as the largest circumference between the waist and the thighs. Blood
pressure was assessed while the participant was lying at an up-right position after
at least 5 minutes of rest using a cuff of appropriate size (A&D, UA-787 plus digital
or A&D, UA-779). Blood pressure was measured at least twice and the average of the
measurements was calculated. At the second day of examination all participants delivered
a stool sample which was immediately frozen after home collection and stored at minus
80 degrees. Information on medication status was obtained by questionnaire and interview
at the first day of examination. Of the 75 T2D patients, ten patients (12 %) received
no hyperglycaemic medications, 58 patients (77 %) received the biguanide metformin;
of these 28 patients (37 %) received metformin as the only anti-hyperglycaemic medication,
10 patients (13 %) received sulfonylurea alone or in combination with metformin, 14
patients (19 %) received a combination of oral anti-diabetic drugs and insulin treatment
and 4 patients (5 %) were on insulin treatment only. Eleven patients (15 %) patients
received dipeptidyl peptidase-4 (DPP4) inhibitors or glucagon-like peptide-1 (GLP1),
all of them in combination with metformin. Patients were reported as receiving antihypertensive
treatment if at least one of the following drugs was reported: spironolactone, thiazides,
loop diuretics, beta blockers, calcium channel blockers, moxonidin or drugs affecting
the renin-angiotensin system (n=55 for T2D (73 %) and n=23 (74%) for T1D. Patients
receiving statins, fibrates and/or ezetimibe were reported as receiving lipid lowering
medication (n=56 for T2D (75 %; all on statin treatment), and n=24 for T1D (77 %;
74 % on statin treatment). All T1D patients were on insulin treatment as their only
blood glucose lowering treatment.
All biochemical analyses were performed on blood samples drawn in the morning after
an over-night fast of at least 10 hours. Plasma glucose was analyzed by a glucose
oxidase method (Granutest, Merck, Darmstadt, Germany) with a detection limit of 0.11
mmol/l and intra- and interassay coefficients of variation (CV) of <0.8% and <1.4%,
respectively. HbA1c was measured on TOSOH G7 by ion-exchange high performance liquid
chromatography. Serum C-peptide was measured using a time-resolved fluoroimmunoassay
with the AutoDELFIA C-peptide kit (Perkin-Elmer, Wallac, Turku, Finland), having a
detection limit of 5 pmol/l and intra- and interassay CV of <4.7% and <6.4%, respectively.
Serum insulin (excluding des and intact proinsulin) was measured using the AutoDELFIA
insulin kit (Perkin-Elmer, Wallac, Turku, Finland) with a detection limit of 3 pmol/l
and with intra- and interassay CV of <3.2% and <4.5%, respectively. Plasma cholesterol,
plasma HDL-cholesterol and plasma triglycerides were all measured on Vitros 5600 using
reflect-spectrophotometrics. Plasma LDL – cholesterol was calculated using Friedewald’s
equation. Blood leucocytes and white blood cell differential count were measured on
Sysmex XS 1000i using flow cytometrics. Plasma metformin was determined by high performance
liquid chromatography followed by tandem mass spectrometry (LC-MS/MS). Briefly the
proteins were precipitated with acetonitril containing the deuterated internal standard,
Metformin-d6, Hydrochloride, and the supernatant diluted by acetonitril. The analysis
was performed on a Waters Acquity UPLC I-class system connected to a Xevo TQ-S tandem
mass spectrometer in electrospray positive ionization mode. Separation was achieved
on a Waters XBridgeT BEH Amide 2.5μm column and gradient elution with A: 100 mM ammonium
formate pH 3.2 and B: Acetonitril. The MRM transitions used for metformin and metformin-d6
were 130.2>71.0 and 136.2>60.0. Calibrators were prepared by spiking drug free serum
with metformin to a concentration of 2000 ng/ml. B12 was measured using Vitros Immunodiagnostic
Products. GAD65 was measured on serum samples by a sandwich ELISA (RSR ltd.). Inter-assay
and intra-assay CV were < 16.6 % and <6.7 % respectively, and with a detection limit
of 0.57 U/ml.
Stool samples were obtained at the homes of each participant and samples were immediately
frozen by storing them in their home freezer. Frozen samples were delivered to Steno
Diabetes Center using insulating polystyrene foam containers, and then they were stored
at −80 °C until analysis. The time span from sampling to delivery at the Steno Diabetes
Center was intended to be as short as possible and no more than 48 hours.
A frozen aliquot (200 mg) of each faecal sample was suspended in 250 μl of guanidine
thiocyanate, 0.1 M Tris, pH 7.5, and 40 μl of 10% N-lauroyl sarcosine. Then, microbial
DNA extraction was conducted as previously reported
12
. The DNA concentration and its molecular size were estimated using nanodrop (Thermo
Scientific) and agarose gel electrophoresis.
Generation and availability of metagenomic samples
Already available Danish metagenomic samples were those reported in
26
Li et al. and references therein (excluding 14 samples removed due to average read
length below 40 nt, and with 5 Chinese and 21 Swedish samples with less than the rarefaction
threshold of 7M reads in total excluded from functional profile or diversity analyses),
with newly sequenced samples deposited in the European Bioinformatics Institute Sequence
Read Archive under accession ERP004605.
All information on Swedish data was retrieved from published data
4
. In addition to published data on Chinese individuals
3
, we retrieved information on metformin treatment on a subset of 71 Chinese T2D patients.
One hundred and twelve samples from Qin et al.
3
lacked metformin treatment metadata and were therefore discarded except for measuring
differences between the country datasets disregarding treatment or diabetic status.
Characteristics of all study participants included in the present protocol are given
in Supplementary Table S1.
Validation cohort recruitment and sample processing
Additional Danish T2D patients were recruited at the Novo Nordisk Foundation Center
for Basic Metabolic Research, University of Copenhagen during 2014 as a part of the
ongoing MicrobDiab study (http://metabol.ku.dk/research-project-sites/microbdiab/).
T2D patients were included in the study if time since T2D diagnosis was less than
5 year, if they were between 35 and 75 years of age, Caucasian and if they had not
received antibiotics within the past four months of inclusion. In total 30 T2D patients
(21 male/9 female) were identified. Fecal samples were collected at the home of the
patients including immediate freezing of samples in home freezers, and transport of
samples to the hospital stored on dry ice. The samples were stored at −80 degrees
C until DNA extraction. Information of medication was obtained from questionnaires.
In total 21 (70%) of the T2D patients received metformin.
Ethics statement
All individuals in both the Danish MetaHIT study and the Danish validation study gave
written informed consent before participation in the studies. Both studies were approved
by the Ethical Committees of the Capital Region of Denmark (MetaHIT study: HC-2008-017;
validation study: H-3-2013-102 ). Both studies were conducted in accordance with the
principles of the Declaration of Helsinki.
Metagenomic analysis
Construction of a non-redundant metagenomic reference gene catalogue
Illumina shotgun sequencing was applied to DNA extracted from 620 faecal samples originating
from the MetaHIT project (Supplementary Table S1). Raw sequencing data were processed
using the MOCAT (version 1.1) software package
27
. Reads were trimmed (option read_trim_filter) using a quality and length cutoff of
20 and 30bp, respectively. Trimmed reads were subsequently screened against a custom
database of Illumina adapters (option screen_fastafile) and the human genome version
19 using a 90% identity cut-off (option screen). The resulting high-quality reads
were assembled (option assembly) and assemblies revised (option assembly revision).
Genes were predicted on scaftigs with a minimum length of 500 bp (option gene_prediction).
Predicted protein-coding genes with a minimum length of 100 bp were clustered at 95%
sequence identity using CD-HIT (version 4.6.1)
28
with parameters set to: -c 0.95, -G 0 -aS 0.9, -g 1, -r 1). The representative genes
of the resulting clusters were “padded” (i.e., extended up to 100 bp at each end of
the sequence using the sequence information available from the assembled scaftigs),
resulting in the final reference gene catalogue used in this study.
The reference gene catalogue was functionally annotated using SmashCommunity
29
(version 1.6) after aligning the amino acid sequence of each gene to the KEGG
30
(version 62) and eggNOG
31
[(version 3) databases.
Profiling of metagenomic samples
Raw insert (sequenced fragments of DNA represented by single or paired-end reads)
count profiles were generated using MOCAT
27
by mapping high-quality reads from each metagenome to the reference gene catalogue
(option screen) using an alignment length and identity cut-off of 45 and 95%, respectively.
For each gene, the number of inserts that matched the protein-coding region was counted.
Counts of inserts that mapped with the same alignment score to multiple genes were
distributed equally among them. Taxonomic abundances were computed at the level of
metagenomic operational taxonomic units (mOTUs)
32
, normalized to the length of the concatenated marker genes for each mOTU to yield
the abundances used for the study, and subsequently binned at broader taxonomic levels
(genus, family, class, etc.).
Rarefaction of metagenomic data and microbial diversity measurements
For all metagenome-derived measures except the mOTU taxonomic assignments, read counts
were rarefied in order to avoid any artifacts of sample size on low-abundance genes.
Rarefied matrices were obtained as follows. Data matrices were rarefied to 7M reads
per sample. This threshold was chosen to include most samples, but 5 Chinese and 21
Swedish samples were excluded due to having less than 7M reads per sample. Rarefactions
were done using a C++ program developed for the Tara project
33
. In total we did 30 repetitions, in each of which we measured the richness, evenness,
chao1 and Shannon diversity metrics within a rarefaction. The median value of these
was taken as the respective diversity measurement for each sample. The first of 30
rarefactions of each sample was used to create a rarefied gene abundance matrix and
KO abundance profiles were calculated by summing the rarefied abundance of genes annotated
to the respective KO gene.
Metagenomic species (MGS) construction
Clustering of the catalogue genes by co-abundance, as described in Nielsen et al.
34
, defined 10,754 co-abundance gene groups (CAGs) with very high correlations (Pearson
correlation coefficient > 0.9). The 925 largest of these, having more than 700 genes,
were termed metagenomic species (MGS). The abundance profiles of the CAGs and MGS
were determined as the medium gene abundance (downsized to 7M reads per sample) throughout
the samples. Furthermore, the CAGs and MGS were taxonomically annotated, by sequence
similarity to known reference genomes.
Functional annotation/binning of metagenomes
To avoid falsely drawing gut microbial functional conclusions from high abundance
of single genes remotely homologous to members of a functional pathway, an approach
was used that required presence of multiple pathway members. Functional pathway abundance
was calculated from gene catalog KO annotation and MGS abundances per sample. Thus
KOs present in each MGS were used to determine for that CAG/MGS which functional modules
were represented within its genetic repertoire, requiring for this that >90% of KO’s
necessary for the completion of a reaction pathway should be present, when also taking
alternative enzymatic pathways into account. The module abundance within a sample
was calculated from CAG abundance in each respective sample, summing over all CAGs
which had the module present. Rarefied median coverage of CAG/MGSs were used, so no
further normalization of the module abundance matrix was required. Abundance of genetic
potential falling under the same higher-order functional levels was calculated by
summing up all abundances of the lower-level functional modules within each sample.
Existing functional annotation databases cover gut metabolic pathways relatively poorly.
To account for this, a number of additional bacterial gene functional modules were
curated and annotated, extending the KEGG system; these are referred to in result
tables as GMMs (Gut Microbial Modules) and were previously described in Le Chatelier
et al
12
.
16S amplicon processing
16S amplicons from frozen samples were sequenced 300bp + 200bp paired end reads using
a Illumina miSeq machine. We used the LotuS
35
pipeline in short amplicon mode with default quality filtering, clustering and denoising
OTUs with UPARSE
36
, removing chimeric OTUs against the RDP reference database (http://drive5.com/uchime/rdp_gold.fa)
with uchime
37
, merging reads with FLASH
38
and assigning a taxonomy against the SILVA 119 rRNA database
39
and further refined by BLAST searches against the NCBI rRNA database
40
to identify Intestinibacter OTUs, using the following LotuS command line options:
“-p miSeq -refDB SLV -doBlast blast -amplicon_type SSU -tax_group bacteria -derepMin
2 -CL 2 -thr 14”.
Univariate tests of taxonomic or functional abundance differences
Microbial taxa where mean abundance over all samples was less than 30 reads, or which
were present in less than 3 samples were excluded from univariate and classifier analyses.
All abundances were normalized by total sample sum. For module tables no feature filters
were used except requiring the module to be present in at least 20 samples. Filtered
data tables were made available online.
Univariate testing for differential abundances of each taxonomic unit between two
or more groups was tested using Mann-Whitney-U or Kruskal-Wallis tests, respectively,
corrected for multiple testing using the Benjamini-Hochberg false discovery control
procedure rate (e.g. q-values)
41
. Post-hoc statistical testing for significant differences between all combinations
of two groups was conducted only for taxa with abundances significantly different
at P < 0.2. Wilcoxon rank-sum tests were calculated for all possible group combinations
and corrected for multiple testing again using the Benjamini-Hochberg false discovery
rate, as implemented in R. Where controlling for potential confounders such as source
study, we used blocked “independence_test” function calls with options “ytrafo = rank,
teststat=scalar” for blocked WRST and “ytrafo = rank, teststat=quad” for blocked KWT,
as implemented in the COIN software package
42
for R. Similarly we applied these independence tests in the framework of post-hoc
testing as described above.
Analysis of correlations between taxonomic or functional features, community diversity
indices and sample metadata variables were conducted using Spearman correlation tests
as implemented in R, and corrected for multiple tests using the Benjamini-Hochberg
false discovery rate control procedure. To control for confounders such as source
study in univariate correlation analyses, blocked Spearman tests as implemented in
COIN (settings “independence_test”, options ytrafo = rank, xtrafo=rank, distribution=asymptotic)
were used.
In some analyses taxa were corrected for the influence of a continuous confounder
variable like microbial community richness; in these cases the residual of a linear
model between normalized, log-transformed taxa abundances and overall sample gene
richness was used to correct for the confounding variable. Power analysis was conducted
by randomly subsampling to a given sample number, repeated 5 times to achieve robust
results.
Ordinations and multivariate tests
All ordinations (NMDS, dbRDA) and subsequent statistical analyses were calculated
using the R-package vegan
43
using Canberra distances on normalized taxa abundance matrices, then visualized using
the ggplot2 R package
44
. Community differences were calculated using a permutation test on the respective
NMDS reduced feature space, as implemented in vegan.
Furthermore, we calculated intergroup differences for the microbiota using PERMANOVA
45
as implemented in vegan. This test compares the intragroup distances to the intergroup
distances in a permutation scheme and from this calculates a P-value. For all PERMANOVA
tests we used 2×105 randomizations and a normalized genus-level mOTU abundance matrix,
using Canberra intersample distances. PERMANOVA post hoc P-values were corrected for
multiple testing using the Benjamini-Hochberg false discovery rate control procedure.
Analysis of variance broken down by cohort, treatment and disease status was conducted
by fitting these distances to a linear model of sample metadata distances, as further
described in Supplementary Discussion 3.2.
Classifier construction and evaluation
To create classifiers for separating samples from different subsets, a L1 restricted
LASSO using the R glmnet package
46
was carried out to test for an optimal value of lambda (number of features to be used
in the final predictor) in a 5-fold cross-validated and internally 4-fold cross-validated
LASSO run on all data. After this, the previously determined value of lambda was manually
controlled for number of features used against the root mean square error of the classifier.
In a 5-fold cross-validation an independent LASSO classifier was trained on 4/5 of
the data using the previously determined value of lambda, and response values were
predicted on 1/5 of the data. LASSO models with a Poisson response type were used
in all cases.
Binary classifications between T2D and ND CTRL samples were performed with a R reimplementation
of the robust recursive feature elimination support vector machine (rRFE-SVM)
47
procedure. The SVM was performed in an outer cross-validation scheme on 4/5 of the
data. Of these randomly 90% were selected 200 times in each cross-validation for the
RFE, to create a feature ranking from an average over these runs. Classifier performance
was validated on the remaining 1/5 of samples using the pre-established feature ranking.
In case of several cohorts, ROC-AUC scores were measured for each cohort separately.
Code availability
The MGS technology has previously been described
34
and is available online (http://git.dworzynski.eu/mgs-canopy-algorithm/wiki/Home).
The mOTU resource has been made publically available (http://www.bork.embl.de/software/mOTU/)
and was analyzed using MOCAT
27
which is also publically available (http://vm-lux.embl.de/~kultima/MOCAT/). The 16S
pipeline LotuS
35
is freely available online (http://psbweb05.psb.ugent.be/lotus). The novel gene catalog
has been deposited online (http://vm-lux.embl.de/~kultima/share/gene_catalogs/620mhT2D/),
as have the raw amplicon sequences (http://vm-lux.embl.de/~forslund/t2d/). Statistical
analysis and data visualization was conducted using freely available R libraries:
vegan, coin and ggplot2 and was described in more details elsewhere
48,49
. Data matrices and R source code for replicating the central tests conducted on the
data have been deposited online (http://vm-lux.embl.de/~forslund/t2d/).
Evaluation of dietary habits
A subset of the Danish study participants answered a validated food frequency questionnaire
(FFQ) in order to obtain information on the habitual dietary habits. A complete dataset
was obtained for 66 % of the non-diabetic individuals and 88 % of T2D patients. When
evaluating the dietary data the consumed quantity was determined by multiplying portion
size by the corresponding consumption frequency reported. Standard portion sizes for
women and men, separately, were used in this calculation
50,51
. All food items in the FFQ were linked to food items in the Danish Food Composition
Databank
52
. Estimation of daily intake of macro- and micronutrients for each participant was
based on calculations in the software program FoodCalc version 1.3
53
.
Extended Data
Extended Data Figure 1
A. As a positive control for the meta-analysis pipeline, true signal was removed from
the data by randomly reshuffling sample labels. Artificial contrast was thereafter
introduced between random groups containing as many such reshuffled samples as were
in the original sets of T2D metformin+ (nCHN=15, nMHD=58, nSWE=20) and T2D metformin−
(nCHN=56, nMHD=17, nSWE=33) samples in each original study subset, using the genus
Akkermansia as an example feature. Samples randomly assigned to the sets of fake “metformin
treated” and “control” categories had their Akkermansia genus abundances adjusted
to match the scale of the metformin effect on Escherichia genus abundance reported
here (metformin-treated samples roughly 150% as likely to have nonzero abundance,
with a roughly threefold higher abundance where present), while retaining their dataset
origin labels. The full meta-analysis pipeline (study set blocked KWT test, post-hoc
WRS test) was applied to these samples. Benjamini-Hochberg-corrected P-values (FDR
scores/Q-values) from testing for a metformin effect on Akkermansia abundance are
here plotted in logarithmic scale on the vertical axis for 100 randomizations of the
entire shuffled dataset, either without (left boxplot) or with (right boxplot) the
artificial Akkermansia metformin signal added after shuffling the data to remove original
signal. Box plot borders show medians and quartiles, with points outside this range
shown as vertical whisker lines and point markers, whiskers extend to 1.58 × interquartile
range / sqrt (n). Horizontal guide lines are shown for ease of visualization corresponding
to different false discovery rate thresholds. For randomly reshuffled data, as expected
no significant contrast is detected, while the artificially introduced signal is reliably
detected, roughly matching expectations from the definition of the false discovery
rate itself.
B. To investigate statistical power for the other medications tracked, five random
sub-samplings were made of pairs of medicated and non-medicated samples, at each increasing
number of included sample pairs, and the overall analysis replicated for each, testing
each genus for significantly (KW-test followed by post-hoc WRS test) differential
abundance between cases and controls, at different BH FDR significance cutoffs marked
in the figure using different colours. Out of the total number of samples for which
medication status was known, equal numbers n of medicated and unmedicated samples
were chosen randomly in repeated iterations. This number n was varied up to its largest
possible value (smallest of either number of medicated or unmedicated samples in overall
dataset) and is what is shown on the horizontal axis. The vertical axis shows number
of features significant relative to each cutoff, with standard deviation over each
set of five randomized samples shown as error bars.
C. The graphs show Intestinibacter and Escherichia median and quartile abundances
as boxplots, whiskers extend to 1.58 × interquartile range / sqrt (n), with samples
extreme relative to the interquartile range shown as point markers and with samples
below detection threshold (DT) plotted at y = 0, in 21 additional T2D metformin+ and
9 additional T2D metformin− samples. Differences in abundance between sample categories
are significant (WRS test, BH FDR < 0.1). The samples where Intestinibacter was detected
all fall among the 9/30 untreated rather than the 21/30 metformin-treated samples,
consistent with severe depletion under treatment, whereas Escherichia abundances increase
under treatment, likewise consistent with observations from the main dataset.
Extended Data Figure 2
Differences in physiological variables and microbiome characteristics between Chinese
(n=368), Danish MetaHIT (n=383) and Swedish (n=145) gut metagenome sample sets.
A. Several participant metadata variables are significantly different between cohorts,
of which a subselection is shown here as boxplots displaying median and quartiles,
with samples outside this range shown as point markers and whiskers, whiskers extend
to 1.58 × interquartile range / sqrt (n).
B. In a PCoA ordination of Bray-Curtis distances between samples on bacterial family
level, clear differences between samples from the different cohorts become apparent.
These are largely explained by taxonomic differences as summarized at the phylum level.
C. Boxplots for gut microbial taxa show medians and quartiles of log-transformed read
counts for mOTUs summarized at the level of bacterial genera, for the three country
subsets across sample categories, with samples outside this range shown as point markers
and whiskers, whiskers extend to 1.58 × interquartile range / sqrt (n). For all boxplots,
tests for significant differences (Kruskal-Wallis test adjusted for study source)
were performed with P-values shown at the head of each figure. Star markers note significance
of tests done for each country subset separately.
Extended Data Figure 3
Taxonomic microbiome composition comparison between T2D metformin− (n=106), T2D metformin+
(n=93) and ND CTRL (n=554) gut metagenomes with particular focus on possible taxonomic
restoration under metformin treatment for certain taxa. Boxplots show medians and
quartiles log-transformed read counts for mOTUs summarized at the level of bacterial
genera, for the three country subsets across sample categories, with samples outside
this range shown as point markers and whiskers, whiskers extend to 1.58 × interquartile
range / sqrt (n). Tests for significant differences (Kruskal-Wallis test adjusted
for study source) were performed with P-values shown at the head of each figure. Star
markers show results of tests for each country subset separately.
Extended Data Table 1
A. The analysis of variances table shows the results of modelling the Canberra distances
between T2D metformin− (n=106), T2D metformin+ (n=93) and ND CTRL (n = 554) samples
with predictor variables encoding same/different diabetes status, same/different treatment,
and same/different study source/country. Fractions of explained variance are taken
as fractions of sum of square deviations from the model relative to the total deviation.
B. Bacterial taxa found significantly different in gut abundance under metformin treatment
were tested (WRS-test) for significant differential relative abundance in a separate
cohort under 16S amplicon sequencing between T2D metformin+ (n=26) and T2D metformin−
(n=8) samples.
a
Degrees of freedom
Sum of squares
Explained variation
F-statistic
Pr(>F)
Treatment
1
128
3.8%
21454.01
<2E-016 ***
Disease
1
42
1.2%
7039.86
<2E-016 ***
Country
1
376
11.1%
63206.96
<2E-016 ***
Treatment × Disease
1
1
0.0%
192.62
<2E-016 ***
Treatment × Country
1
67
2.0%
11209.33
<2E-016 ***
Disease × Country
1
1
0.0%
218.97
<2E-016 ***
Treatment × Disease × Country
1
0
0.0%
22.79
0.00000181 ***
Residuals
567001
3375
100.0%
Total:
3990
b
Database OTU identifier
MWU P-value
Enriched in
Mean abundance (%)
OTU_45
0.048968332
T2D metformin+
0.803960725
OTU_1038
0.0319637913
T2D metformin−
0.000185722
Database OTU identifier
OTU_45
OTU_1038
Domain
Bacteria
Bacteria
Phylum
Proteobacteria
Firmicutes
Class
Gammaproteobacteria
Clostridia
Order
Enterobacteriales
Clostridiales
Family
Enterobacteriaceae
Peptostreptococcaceae
Genus
Escherichia-Shigella
Intestinibacter
Supplementary Material
Supplementary Discussion
Supplementary Information
Supplementary Table Legends
SITable1
SITable2
SITable3
SITable4
SITable5
SITable6
SITable7
SITable8
SITable9
SITable10
SITable11
SITable12
SITable13
SITable14
SITable15
SITable16