Diet and Maternal Gestational Weight Gain Predict Metabolic Maturation of Infant Gut Microbiomes

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

INTRODUCTION Commensal gut bacterial communities (microbiomes) are predicted to influence human health and disease 1,2 . Neonatal gut microbiomes are colonized with maternal and environmental flora, and mature toward a stable composition over two to three years 3,4 . To study pre- and post-natal determinants of infant microbiome development, we analyzed 402 fecal metagenomes from 60 infants aged 0–8 months, using longitudinal generalized linear mixed models (GLMMs). Distinct microbiome signatures correlated with breastfeeding, formula ingredients, and maternal gestational weight gain (GWG). Amino acid synthesis pathway accretion in breastfed microbiomes complemented normative breastmilk composition. Prebiotic oligosaccharides, designed to promote breastfed-like microflora 5 , predicted functional pathways distinct from breastfed infant microbiomes. Soy formula in six infants was positively associated with Lachnospiraceae and pathways suggesting a short-chain fatty acid (SCFA)-rich environment, including glycerol to 1-butanol fermentation, which is potentially dysbiotic. GWG correlated with altered carbohydrate degradation and enriched vitamin synthesis pathways. Maternal and postnatal antibiotics predicted microbiome alterations, while delivery route had no persistent effects. Domestic water source correlates suggest water may be an underappreciated determinant of microbiome acquisition. Clinically important microbial pathways with significant dietary correlates included dysbiotic markers 6,7 , core enterotype features 8 , and synthesis pathways for enteroprotective 9 and immunomodulatory 10,11 metabolites, epigenetic mediators 1 , and developmentally-critical vitamins 12 , warranting further investigation. MAIN TEXT Commensal gut microbes contribute to pathogen exclusion, nutrient acquisition, and immune recognition, thereby preventing or modulating multiple human pathologies 1,2 . Understanding determinants of early microbiome establishment can guide health-promotion and disease-prevention efforts. Human milk provides optimal infant nutrition 12 , and favors gut Bifidobacterium and Lactobacillus spp 4,13 . While commercial formulas closely approximate breastmilk composition 12,14 , and galacto- (GOS) and fructo-oligosaccharides (FOS) are designed to mimic human milk oligosaccharides 5 , breastfed and formula-fed infant gut microbiomes remain distinct 4,12 . The impact of specific formula ingredients on gut microbiome acquisition is underdetermined. To test the hypothesis that specific formula components alter developing gut microbiomes’ taxa and gene-encoded functions, we whole-metagenome shotgun sequenced 402 frozen fecal samples collected monthly from 60 healthy twins (median gestational age 37 weeks) from birth to eight months 3,13 (Table S1). We constructed longitudinal GLMMs for taxa and genetically-encoded functional pathways (for brevity hereafter referred to as “pathways”) inferred using MetaPhlAn2 and HUMAnN2; all p values are two-tailed, from maximum-likelihood GLMMs Tukey-corrected for multiple comparisons (see Online Methods, Tables S2-S7). This study, approved by the Human Research Protection Office of Washington University School of Medicine, complied with all ethical regulations. Written informed consent was obtained for all subjects. We identified multiple known determinants of gut microbiome assembly, confirming the validity of our approach. (Figures 1, S1) 4,15–17 . Alpha diversity (Shannon index) correlated positively with time (N=402 samples, p<0.001) and fruit/vegetable exposure (N=160, p=0.011), and negatively with maternal intrapartum ampicillin-sulbactam (N=46, p=0.005) and any postnatal antibiotics (N=49, p=0.043). Bifidobacteriaceae enrichment correlated with >50% breastfeeding (N=75, p=0.003) and lifetime GOS exposure (N=204, p=0.005). Lachnospiraceae increased with time (N=402, p<0.001) and decreased with any breastfeeding (N=125, p=0.014), Enterobacteriaceae decreased with time (N=402, p<0.001) and GOS (N=204, p=0.003), and Bacteroidaceae decreased with Cesarean delivery (N=227, p=0.003) and increased with fruit/vegetable exposure (N=160, p=0.004). Breastfed infant gut microbiomes accrued amino acid synthesis pathways that complemented breastmilk’s changing amino acid content 14 , suggesting that parallel milk and microbiome development may reflect physiologic adaptation (Figure 2). Majority-breastfed (>50%) infant gut microbiomes (N=75) had significantly more methionine (p<0.001), BCAA (branched-chain amino acids isoleucine/leucine/valine, p=0.020), cysteine/serine (p=0.012), threonine (p=0.004), and arginine (p=0.023) synthesis pathways. All pathways enriched in breastfed microbiomes except cysteine/serine correspond to amino acids less concentrated in breastmilk than in standard infant formula 14,18 . Breastfed microbial arginine and BCAA synthesis pathways increased sharply after birth and plateaued at ~60 days, coinciding precisely with normative declining amino acid content as breastmilk transitions from colostrum to mature milk. 14 (Fig. 2). Breastmilk is low in methionine and cysteine in all lactation stages 14 ; breastfed microbiomes had more methionine and cysteine pathways at all timepoints. Histidine and tryptophan are more abundant in breastmilk than in formula 14 , and breastfed microbiomes had significantly less histidine-purine-pyrimidine (p=0.046) and tryptophan-precursor chorismate (p<0.001) 19 synthesis pathways. Glutamate and glutamic acid are abundant in breastmilk 14 , and glutamate synthesis pathways (PWY-5505), though too sparse to model, were almost exclusive to formula-fed microbiomes (N=114, 90% of total). Lysine was an exception to milk-microbe complementarity. Infant formulas have more lysine than breastmilk, yet formula-fed microbiomes had more lysine synthesis pathways (p=0.003). Lysine synthesis pathways mapped to Bacteroides and Firmicutes genera (Supplementary Table 40); formula-associated enrichment likely reflects accelerated microbiome maturation following breastfeeding cessation 4,13 . Milk-microbiome complementarity may be physiologically relevant to neonatal and infant protein balance 12,14 . Although breastmilk’s amino acid content declines post-partum 14 and formula composition is static, normative serum arginine, cysteine, and methionine levels decline almost identically in breastfed and formula-fed infants 20 , suggesting a “gap” that might be filled by microbially-produced amino acids. Breastfeeding-enriched metabolic pathways could mechanistically explain some of its known benefits 11,12 . Arginine and cysteine might prevent serious infections 10,11 and biotin, enriched in breastfed infants (p=0.006), inhibits pathogenic E. coli adherence 9 . Many breastfeeding-associated amino acid synthesis pathways mapped to Bifidobacterium spp., an exceptionally successful breastfed gut colonizer. Breastfeeding-correlated enrichment of Bifidobacterium-identified amino acid synthesis pathways in a pattern contemporaneous and complementary to human milk maturation might reflect ancestral co-evolution with commensal microbiota. GOS and FOS are added to formulas to promote breastfed-like microbial communities 5 . Although lifetime GOS exposure correlated with Bifidobacteriaceae enrichment, prebiotics did not uniformly predict breastfed-like functional pathways, highlighting current technologic limitations of formula design and manufacturing (Figure S2). Concurrent GOS and FOS exposure (N=26) predicted increased microbial BCAA (p<0.001) and threonine (p=0.038) synthesis pathways, mimicking breastfeeding. Lifetime GOS exposure (N=204) predicted decreased tyrosine (p=0.004), cysteine/serine (p=0.003), and arginine-polyamine (p=0.040) synthesis pathways, opposing breastfeeding. In all models, prebiotic coefficients approximately equaled or exceeded those for breastfeeding. Pathways depleted with GOS exposure primarily belonged to Enterobacteriaceae (Table S4); discordant GOS and breastfeeding correlates might reflect GOS-related decrease in Enterobacteriaceae 5 . Six infants from four families were soy-exposed; sample size ranged from 31–37, depending on soy formula type (+/− FOS) and exposure type (current or lifetime, Table S6). Soy feeding predicted greater alpha diversity (Shannon index, N=31, p=0.036), low Bifidobacteriaceae (N=31, p<0.001) and high Lachnospiraceae (N=32, p<0.001) content; in both taxonomic models, the coefficient for soy was greater than for breastfeeding (Figures 1b, 3, S3). Two soy-exposed twin pairs were soy-discordant, permitting comparison with a related control. Twins are expected to have similar microbiomes 3,13 , yet soy-discordant twin microbiomes were dissimilar, while unrelated soy-exposed microbiomes had strong resemblance. Soy encourages Lachnospiraceae proliferation 16 , but has no clear effect on Bifidobacteriaceae 16,21 . Soy formula could disfavor Bifidobacteriaceae via cidal effects of soy isoflavone derivatives 22 , by containing prebiotics (FOS) with weak bifidogenic properties 23 , or by favoring competing taxa 16 . Pre-soy samples were few (N=6), but pre-post soy comparisons did not suggest soy-mediated bifidobacterial suppression: soy-fed microbiomes were low in Bifidobacteriaceae prior to soy exposure. Low pre-soy bifidobacterial content suggests that low-Bifidobacteriaceae microbiomes might drive soy formula selection, especially as soy-feeding is usually elective 24 , rather than required for galactosemia, congenital lactase deficiency, and cow’s milk protein allergy 25 . Soy-correlated depletion of Bifidobacteriaceae-identified methionine (N=31, p=0.010) and S-adenosyl methionine (N=37, p=0.019) synthesis pathways suggests a mechanism for this effect (Table S4). Low-Bifidobacteriaceae microbiomes are associated with infant colic, which often prompts formula changes 6 . Methionine is a plausible mediator of enteric symptoms, as it is affects both gut epithelia 26 and motility 27 . Indeed, methionine synthesis pathways positively correlated with reported diarrhea in our cohort (N=16, p<0.001), possibly representing a clinical correlate of methionine’s reported prokinetic properties 27 . Soy protein is methionine-deficient relative to mammalian casein and whey proteins; soy formula is methionine-supplemented with a free methionine content ~125 times that of breastmilk 18,24 . These gut-specific effects of methionine provide a biologically-plausible mechanism for symptoms associated with low bifidobacterial and methionine synthesis pathway content to improve after initiation of high-methionine formula. Several soy-associated pathways – chorismate synthesis (N=31, p<0.001), lactose/galactose degradation (N=37, p=<0.001), and starch degradation (N=31, p<0.001) – suggested SCFA-producing Lachnospiraceae proliferation. Soy-correlated enrichment of lysine synthesis (N=32, p<0.001), riboflavin synthesis (N=32, p<0.001), and glycerol-to-butanol fermentation (N=32, p<0.001) pathways suggested adaptation to SCFAs. Lactose/galactose and starch degradation pathways frequently mapped to Lachnospiraceae (Table S4), and a greater proportion of chorismate synthesis pathways were Blautia-identified post-soy exposure (Fig. 3c). Lysine provides an acetate and butyrate synthesis substrate 28 , butyrate stress in Clostridium spp. induces upregulation of riboflavin and downregulation of methionine synthesis 29 , acetate stress promotes glycerol to butanol fermentation 29 , and many microbes co-regulate riboflavin synthesis genes with metabolic stress response modules 30 . Some soy-associated changes are potentially dysbiotic: decreased Bifidobacteriaceae and elevated glycerol to 1-butanol fermentation combined with high Lachnospiraceae content have been associated with inflammation, allergies, and hepatic steatosis 2,7 . These dysbiotic features raise concerns about the long-term safety and efficacy of elective soy formula feeding. Maternal GWG has yet-to-be determined effects on infant gut microbiome development 15,31,32 . Here, GWG (N=402) predicted persistent enrichment of infants’ microbial glucose (p<0.001) and glycogen (p=0.005) degradation pathways, and phenylalanine (p=0.011), cysteine/serine (p<0.001), folate (p=0.015), thiamine (p<0.001), biotin (p<0.001), and pyridoxine (p=0.009) synthesis pathways, after controlling for gestational age, maternal diabetes, and pre-pregnancy body mass index (Table S1). Starch degradation pathways negatively correlated with GWG (p=0.032) The GWG distribution in our cohort roughly corresponded with Institute of Medicine guidelines for twin pregnancies (see Online Methods): women with inadequate and excessive GWG fell into the first and fourth quartiles, respectively. GWG-correlated pathways plotted by age and quartile suggest that GWG-mediated effects persistent at 8 months are most apparent in infants born to mothers who gained the least weight, and low GWG appears more impactful with increasing gestational age (Figure 4). Although true malnutrition is unlikely in our cohort, maternal undernutrition increases risk of oxidative injury, glucose dysregulation, adiposity, and cardiovascular disease in offspring 1 . Several GWG-enriched vitamin synthesis pathways (pyridoxine, thiamine, folate) are critical to early infant neurodevelopment 12 and thiamine synthesis pathways are a proposed distinguishing core ‘enterotype’ feature 8 . GWG negatively correlates with folate synthesis pathway abundance in the placental microbiome 33 . We observed the inverse relationship in our population (GWG-associated folate pathway enrichment), perhaps representing compensation for the fetal microenvironment. Folic acid is a key epigenetic mediator, and might effectuate enduring host-microbe interactions and mediate fetal origins of disease 1 . GWG-associated microbial metabolic pathway changes persisting eight months postnatally extends current knowledge that GWG influences microbiome development in the first months of human life 15,31,32 and up to one year in non-human primates 34 . As maternal dietary records and weight gain by trimester were not collected, we can neither identify trimester-specific modulations nor attribute GWG-associated effects to specific dietary variables (e.g. fat content). Enduring GWG-associated changes independent of delivery route or breastfeeding might reflect altered in-utero meconium colonization 35 , microbe transfer from caregivers 36 , and other genetic or environmental factors (e.g. family feeding practices) influencing both GWG and infant microbiome acquisition. Maternal intrapartum antibiotics predicted postnatal development of taxa and functional pathways, eclipsing the effects of delivery route and postnatal antibiotics (Figure S4). Maternal intrapartum ampicillin-sulbactam exposure (N=46) predicted depleted histidine/purine/pyrimidine synthesis (p=0.012) and homolactic fermentation (p<0.001) pathways in offspring microbiomes. Postnatal amoxicillin exposure (N=38), analogous to ampicillin without sulbactam, predicted increased histidine/purine/pyrimidine synthesis pathways (p=0.011). Maternal intrapartum clindamycin exposure (N=25) positively correlated with Lachnospiraceae (p= 0.008), Enterobacteriaceae (p<0.001), and cysteine/serine (p<0.001) and biotin (p=0.002) synthesis pathways. Clindamycin was given immediately prior to Cesarean delivery in our cohort, but the more frequently used cefazolin (N=164) did not correlate with these pathways. Lack of persistent microbiome effects associated with Cesarean delivery when corrected for confounders is consistent with prior reports 15 . Infant multivitamin with iron exposure (N=40, Fig. S5) predicted enriched arginine-polyamine (p=0.018), folate (p<0.001), and heme (p=0.026) biosynthesis and homolactic fermentation pathways (p=0.028). Domestic drinking water sources had associated microbiome signatures (Figure S5); sample size depended on exposure type (Table S6). Lactose/galactose degradation pathways positively correlated with filtered water exposure (N=42, p=0.004); enhanced bacterial counts with home water filters might explain this effect 37 . Tap water exposure predicted decreased Enterobacteriaceae (N=251, p=0.016), glycogen degradation (N=230, p=0.006) and homolactic fermentation (N=230, p=0.007) pathways. Bottled water exposure predicted increased homolactic (N=122, p=0.002) pathways, and boiled/distilled water correlated with increased pyridoxine synthesis pathways (N=61, p=0.003). Together with animal data 38 , these patterns suggest an underappreciated influence of drinking water on microbiome acquisition. Although this DNA-based study represents genetic potential rather than confirmed functions, our observations are consistent with transcriptomic studies showing enriched arginine biosynthesis transcripts in mother-fed relative to formula-fed piglets 39 and enhanced BCAA synthesis with sialylated oligosaccharide exposure in mice 40 . Further work is required to mechanistically establish a causal relationship between soy exposure and soy-fed microbiome signatures and to definitively show that soy protein per se drives these changes, likely via experimental validation in microbiome-humanized gnotobiotic mice 13 . In summary, our findings suggest host-microbe metabolic mutualism in infancy, whereby gut microbiome gene content expands to counterbalance components lacking in human milk (Fig. S6). We propose that this milk-microbiome synergy reflects physiologic co-evolution with our earliest commensals, and could play a major teleological role in infant protein nutrition and child growth. The observed discordance between microbial functional correlates of formula components (e.g. prebiotics) and breastmilk may warrant revised metrics for evaluating the safety and efficacy of infant formulas. Soy formulas corresponded with profoundly altered taxa and pathways, some of which have pathologic correlates 6,7 . Finally, the association between maternal GWG and altered infant microbiome carbohydrate utilization and vitamin synthesis pathways enduring eight months postnatally extends current knowledge that maternal GWG influences early microbiome acquisition. These data can inform further ecologic and mechanistic interrogations of gut microbiome development. ONLINE METHODS Study Population This study was approved by the Human Research Protection Office of Washington University School of Medicine in St. Louis, and it complied with all ethical regulations. Written informed consent was obtained from all adult participants, and from the parents or legal guardians of all minor subjects. We used fecal samples that had been frozen at −80oC since collection at monthly intervals from a birth cohort of healthy twins in St. Louis, in which the mothers had consented to monthly fecal sample collection from birth until two years of age 3,13,41–43 . We selected a time interval of 0–8 months of age to capture transitions from breastfeeding to formula and early introduction of solid food. To minimize potential confounding effects of early illness or antibiotic administration, we excluded any neonates who received antibiotics in hospital following delivery. Because of this predetermined exclusion criterion, we also excluded all infants with a maternal history of chorioamnionitis. 402 samples from 60 infants in thirty-one families met our pre-defined coverage threshold of 5 million reads (2.5 million forward/reverse) before processing 44 , for a median of 7 samples per infant (IQR 6–8). Demographic data are provided in Table S1. We excluded neonates treated with antibiotics in the first week of life to avoid potential bias from early illness or antibiotic exposure; there were accordingly no infants with a maternal history of chorioamnionitis. Infant age at stool collection ranged from the day of delivery to 253 days. All infants were exposed to solid food by the end of the study period. The median gestational age was 37 weeks (IQR 36–38), 43% of infants were delivered vaginally, and 47% of twins were monozygotic, 50% dizygotic, and 3% of unknown zygosity. Four infants’ mothers were diabetic (7%), six infants’ mothers developed preeclampsia (10%), and two infants were born to a mother with both conditions. DNA Extraction and Sequencing We extracted fecal metagenomic DNA and a positive control (Zymobiomics microbial community standard D6300), and used a modified Nextera DNA Library Preparation Kit protocol to prepare DNA for Illumina-platform sequencing (NextSeq-High; ~400,000,000 max reads, 150 cycles per read). A positive control (Zymobiomics community standard) and a negative control (nuclease-free water) were included in sequencing runs. Detailed experimental protocols follow. DNA extraction We extracted DNA using the MoBio DNEasy PowerSoil Extraction Kit (Qiagen, 12888–100) according to the manufacturer’s instructions, with the following modification: in lieu of centrifugation, we used bead-beating with a BioSpec Mini-BeadBeater for 4 minutes. Bead-beating consisted of 2 minutes on the “homogenize setting”, 2 minutes on ice, and then 2 minutes on the “homogenize setting”. A Zymobiomics microbial community standard (Zymobiomics, D6300) 0.75 mL was also extracted along with fecal DNA samples. DNA was eluted in 100uL nfH20 and quantitated using a Qbit fluorometer and a Qbit dsDNA HS Assay Kit (Invitrogen, Q32854) according to the manufacturer’s instructions. Nextera library preparation Fecal DNA samples were diluted to a concentration of 0.5 ng/uL and 1uL of each sample (including a nuclease free water negative control and the Zymo community standard positive control) were added to a 96-well plate. Sequencing libraries were prepared using the Nextera DNA Library Preparation Kit (Illumina, FC-121–1011) protocol according to the manufacturer’s instructions, with the following modifications: Tagmentation Tagmentation master mix (TMM) Preparation: Component 1 rxn (uL) 100 rxns (uL) TD buffer 1.25 125.0 TDE1 enzyme 0.125 12.5 Nuclease free water 0.125 12.5 1.5uL TMM added to 1uL gDNA in each well of the 96-well plate, vortexed, and centrifuged Plate covered with microseal B and incubated in a Thermocycler at 55C for 15 minutes. Adapter Addition KAPA HiFi PCR master mix (KAPA HiFi HotStart 2x ready mix #KK2602/KM2605) used for addition of oligonucleotide index adapters. 11.2 uL KAPA PCR MasterMix and 8.8 uL of adapters (1uM) to each well, vortex and centrifuge. PCR done with following Thermocycler protocol 72C, 3:00 98C, 5:00 98C, 0:10 63C, 0:30 72C, 0:30 go to iii 13X 72C, 5:00 4C, forever PCR cleanup Add 22.5 uL AmpPure XP beads to PCR reaction (Agencourt A63881) Incubate 5 min at room temperature Separate beads x 2min on magnetic stand Remove supernatant Wash beads x 2 with 200uL 80% ethanol Air-dry x 15 min Add 30uL resuspension buffer (10 mM Tris-Cl, 1 mM EDTA, 0.05% Tween-20 (pH 8.0)), pipet mix Incubate at RT for 5 minutes Separate beads on magnetic plate x 2 minutes Transfer 27uL supernatant to new plate Quantitate DNA with Qubit HS dsDNA Assay kit (Invitrogen, Q32854) Illumina sequencing Library Pooling: After quantitation, sequencing libraries were pooled in triplicate to minimize the effects of pipetting error. Schema for pooling included the following rules: Target of 5ng DNA per sample, per pool If calculated volume for 5ng <1uL, samples were diluted (2x, 5x, 10x, or 20x) so the volume was >1uL Triplicate pools quantitated with Qubit HS dsDNA Assay kit (Invitrogen, Q32854) Each pool was added to a single pool to make an equimolar solution, and diluted to a concentration of 2ng/uL Pool submitted for Illumina platform sequencing (MiSeq flowcell) as a ~500,000 read spike-in sample; reads analyzed to determine evenness of sample distribution. If needed, a fourth corrective pool was pipetted and added to the solution to ensure adequate read coverage (>2.5M forward/reverse) for all samples. Sequencing Pooled samples diluted to 2ng/uL with nfH20 (~10uM based on expected fragment size) were submitted for Illumina platform sequencing (NextSeq-High; ~400,000,000 max reads) with 150 cycles per read. Sequence data was returned as a .fastq file with reads demultiplexed according to oligonucleotide adapter indexes. Clinical Data Collection Clinical data were collected from medical records at the time of delivery, monthly parental surveys at the time of stool sample collection, and outpatient pediatric records, and securely stored on a RedCap database. Parental questionnaires, infant medical records, and formula manufacturers’ labels provided a detailed clinical and dietary dataset (including symptoms, medications, and introduction of new foods) associated with each sample. Parental dietary reports included infant formula brands, solid foods, and water sources, as well as fields for reporting daily or weekly frequency of each dietary exposure from the CDC Infant Feeding Practices Study II 45 . As exclusive breastfeeding was rare in this twin cohort, infants were classified as breastfed if their parents reported >=50% of their feeds as breastmilk in the survey associated with a given stool sample. All breastfed infants received maternal milk; there were no reported exposures to banked or donated human milk. Medication exposures reported on parental surveys were confirmed with medical records from the child’s primary care physician. Information from the manufacturer’s label for each infant formula was used to generate a suite of variables representing exposure to specific formula ingredients (e.g. lactose, sucrose, soy protein, GOS, FOS); full details are below. Clinical data analytic specifications Clinical data de-identified of any protected health information was collected from medical records at the time of delivery, monthly parental surveys at the time of stool sample collection, and outpatient pediatric medical records, and securely stored on the RedCap database. Parental dietary reports included: Binary fields for exposure to human milk, various infant formula brands, foods, medications, experience of symptoms, etc. Fields for frequency of exposure to a food type, expressed either as the number of times an infant received a food per day, or per week Free text options To transform dietary information into data that were usable in statistical models, the following steps were followed All frequency information listed as exposures per day was converted into exposures per week for convenience Percentage of feeds comprised of formula were calculated from parental reports of number of formula feeds per week and number of breastfeeds per week. A binary variable for “Mostly Breastfeeding” was also generated if the percentage of breastfeeds was > 50%. Carbohydrate, protein, and prebiotic (galacto-oligosaccharides, fructo-oligosaccharides) ingredients were determined for each infant formula according to the manufacturer’s label (Table S7). Binary variables for exposure to each ingredient at each timepoint were generated according to the brand(s) of formula(s) the parents had reported, and the manufacturer’s labels. If parents reported using any brand of formula use on the survey associated with a stool sample, binary variables for ingredients in that formula were coded as “1”, even if the parents otherwise recorded that the infant was exclusively breastfed (i.e. if parents reported 100% breastfeeds, but filled in Enfamil Lipil as a formula they selected, the infant was coded as mostly breastfed, but exposed to the ingredients in Enfamil Lipil). If there was ambiguity in the specific brand of formula, then missing values were recorded for binary variables (e.g. if it was unclear whether an infant was given Enfamil Lipil or Enfamil Premium, Lactose and Cow’s milk Formula, which are present in both, would be coded as “1” but galacto-oligosaccharides and Polydextrose, which were only present in Premium, were recorded as missing). Twin siblings were assumed to have concordant feeding practices unless the parents specified otherwise. Prebiotic variables were assigned according to exposure to neither, one, or both prebiotics (galacto-oligosaccharides and fructo-oligosaccharides) GOS: exposure to galacto-oligosaccharides, regardless of concurrent FOS exposure FOS: exposure to fructo-oligosaccharides, regardless of concurrent GOS exposure Only GOS: exposure to galacto-oligosaccharides with NO concurrent FOS exposure Only FOS: exposure to fructo-oligosaccharides with NO concurrent GOS exposure. ONLY found in soy formulas. GOS/FOS: concurrent exposure to galacto-oligosaccharides AND fructo-oligosaccharides. Solid food binary variables were aggregated as follows: Fruit or vegetable exposure → Fruit/Veg variable; positive if either Fruit or Vegetables were positive Meat, fish, or egg exposure → MeatFishEggs variable; positive if any of the components were positive Juice or sweetened drink exposure → JuiceSweetDrink variable; positive if either component was positive Cereal or starch exposure → CerealStarch variable; positive if either component was positive An AnyDairy variable was created for exposure to any dairy product, including cow’s milk formula. For binary variables reflecting current exposure to a food, medication, or ingredient, a second binary variable was generated reflecting lifetime exposure to that food, medication, or ingredient (exposure at any point prior). Sample size for all binary variables is listed in Table S6. Continuous variables (day of life, maternal weight gain, gestational age, weight) were log10-transformed prior to statistical analysis. Sample size for all continuous variables is 402. Sequence Data Processing A predetermined minimum sequencing depth of 5 million raw reads (2.5 million forward/reverse) per sample was required for inclusion in the study. Reads were trimmed using Trimmomatic (trimmomatic/0.33, minimum length = 60), and human DNA contamination was removed using Deconseq (Deconseq/0.4.3-chr38). We used MetaPhLan 2 (metaphlan2/2.2.0) 46 to extract taxonomic data, and HUMAnN2 (humann2/0.9.4) 47 to identify microbial functions. All taxonomic data is reported as relative abundance; all functional data was normalized to counts per million using the humann2_renorm_table function. Full details are below. Quality control Only samples with >2.5M raw reads in each direction (>5M total raw reads) were included in this study. Fig. S7 shows reads by subject age (months) and the distribution of samples included in the study by age in months There was no systematic bias in raw reads by age. Neonates and infants <3 months had fewer successful samples than infants 3–8 months of age, with neonates having the lowest number of samples that met our quality threshold. The median number of raw reads per sample was 11.3 million (IQR 6.3 million); the median number of reads following trimming and filtering human DNA was 9.2 million (IQR 5.6 million). We trimmed reads using Trimmomatic 48 (trimmomatic/0.33), with the following specifications PE –phred33 SLIDINGWINDOW:6:10 LEADING:13 TRAILING:13 MINLEN:60 We eliminated human sequences using Deconseq/0.4.3-chr38 49 . All analyses were performed on trimmed and decontaminated samples. Decontaminated sequence data was publicly deposited to protect the privacy of human subjects (Bioproject ID PRJNA473126, accession codes SAMN09259835-SAMN09260236). Taxonomic data extraction We used MetaPhlAn2 46 (metaphlan2/2.2.0) to extract taxonomic data from quality-filtered reads, with the following specifications -- mpa_pkl ${mpa_dir}/db_v20/mpa_v20_m200.pkl -- bowtie2db ${mpa_dir}/db_v20/mpa_v20_m200 Control samples (both a negative control and a positive control from a Zymo community standard) were included in all sequencing runs; the community standard failed in the fifth run. There were no taxa identified from the negative control samples. Although there was some bias in the community standard (gram negative organisms overrepresented, gram positive underrepresented), likely reflective of bias in DNA extraction, the results were highly reproducible, which is reassuring for analysis of longitudinal trends. There were small proportions (relative abundance <0.1%) of taxa identified in the community standard sample that were not part of the theoretical community composition: Nauvomozyma unclassified, Pantoea unclassified, and Eremothecium unclassified (Table S8). Nauvomozyma and Eremothecium were not identified in any fecal samples, and Pantoea unclassified was only found in a relatively small number of fecal samples (N=73 out of 402). There were no taxa identified in the negative control sample. Community standard and negative control results did not suggest any systemic contamination. Functional data extraction We used Humann2 47 (humann2/0.9.4) to identify genes and functional pathways from short-read data, with the following specifications. -- input-format fastq -- search-mode uniref50 -- bypass-translated-search -- bypass-prescreen -- gap-fill off We used the function humann2_renorm_table to convert gene and pathway output into normalized counts per million. All models are performed on community-wide counts of MetaCyc-identified functional pathways 47,50 . Individual pathways contributing to aggregate families are detailed in Table S2. The proportion of functional pathways identified as homologous to specific genera are summarized in Table S4. In order to model the abundance of pathways related to synthesis of a specific metabolite (e.g. clinical predictors of arginine synthetic pathway abundance instead of just the abundance of arginine synthesis I or arginine synthesis IV), pathways that were related to a specific metabolite were aggregated by summing the normalized community-wide abundance. Statistical Analysis Statistical analysis and generation of figures was performed in R using the vegan, ape, ggplot2, lme4, lmerTest, MuMin, and multcomp packages. Alpha-diversity is reported as the Shannon index, determined from species-level abundance using the vegan diversity() function. PCOA plots were generated from a Bray-Curtis dissimilarity matrix of family-level taxa generated using the vegan vegdist() and ape pcoa() functions. Sequential MANOVA was performed using the vegan adonis() function. All generalized linear mixed models (GLMMs) in this study are maximum-likelihood mixed models generated using the lme4 lmer() function, and Because the close resemblance between twins’ microbial communities represent an important potential confounding factor 3,13,42 , we controlled for twin status by including both Family and subject (Time | Subject) as mandatory random effects in all models. Time, in log(days) was a mandatory fixed effect in all longitudinal GLMMs; all other fixed effects were back-fitted using a stepwise approach, according to the following schema. As the effects of some clinical variables (e.g. specific formula ingredients) on the developing gut microbiome are completely unknown, we began the model-fitting process agnostic to which variables would be significant and screened all variables for inclusion. To broadly screen for covariation between clinical variables and microbiome features, for every taxonomic or pathway variable, we created two arrays of metadata corresponding to the values above and below the median (relative abundance or normalized CPM). We then applied a two-tailed test to compare these two arrays (t-test for continuous variables and Fisher’s Exact test for binary variables) and included all metadata variables with a screening p value <0.05 in a first-approximation GLMM. Because of the potential effects of Cesarean delivery and breastfeeding on the developing microbiome, they were always included in the first-approximation GLMM, even if they did not pass the screening test. Maternal weight variables represented a special case, with multiple potential confounding variables 15,31,32,35,51–53 . GWG would ideally be classified as normal, inadequate, or excessive according to maternal pre-pregnancy BMI and estimated gestational age of delivery, according to Institute of Medicine guidelines 54 . However, such calculations are established only for singleton pregnancies, with provisional guidelines available for total weight gain in twin gestation. Thus, in our twin population, we attempted to control for confounding variables such as pre-pregnancy BMI, gestational age, maternal diabetes, and preeclampsia, by modifying our model selection pathway so that pre-pregnancy BMI and gestational age at delivery were always included in our first-approximation GLMM, even if they did not pass the initial screening test. The GWG distribution in our cohort roughly corresponded with provisional IOM guidelines for GWG in twin pregnancies (16.8–24.5kg for normal pre-pregnancy BMI, 14.1–22.7 kg for overweight pre-pregnancy BMI, 11–3-19.1 kg for obese pre-pregnancy BMI) 54 . All mothers with inadequate GWG were in the first quartile of our population (2–15kg), while the fourth quartile from our population (26–33kg) represented excessive weight gain irrespective of pre-pregnancy BMI. Additional information found in Table S1. First-approximation GLMM was then back-fitted with the lmerTest step() function, and the MuMin rsquaredGLMM() function as a preferred post-hoc test for goodness of fit. All p values are two-tailed, and are adjusted for multiple comparisons using the multcomp glht() function (tension = Tukey) 55 . Parameters for all GLMMs are in Table S1, and statistically significant coefficients are summarized in Table S5. Full details are below. Statistical Modeling: All maximum-likelihood longitudinal generalized linear mixed models were constructed using the lme4, lmerTest, MuMin, and multcomp packages in R. For all taxonomic and functional pathways, the model formulae took the format of: lmer(PathwayOrTaxon ~ (1 | Family) + (0 + log(day of life) | Subject) + log (day of life) + x + y + …., REML=FALSE, data=df) Family and (time | subject) were mandatory random effects and time was a mandatory fixed effect in all models. Fixed effects were backfitted according to the following schema: Screening for candidate variables For each outcome variable of interest (pathway or taxon abundance), the median was determined. Two arrays of clinical variables were created; one associated with values above the median for the pathway or taxon of interest, and one associated with values below the median To screen for co-variation of clinical variables with the outcome variable of interest, a two-tailed t-test was done for all continuous clinical variables, and a two-tailed Fisher’s Exact test was done for all binary clinical variables. This screening test was performed to select candidate variables for inclusion in a longitudinal generalized linear mixed model (GLMM). No statistical conclusions were made based on this screening test, as this simple screen could not correct for repeated sampling over time, familial effects, and correction for confounding variables. All clinical variables with p<0.05 on initial t-test or Fisher’s Exact test screening were considered candidate variables for inclusion in the next naïve model-fitting set. Day of life, delivery route, and breastfeeding (>50%) were always included in the set of candidate variables, regardless of significance in the initial variable screening step. If any maternal weight variable (maternal pre-pregnancy BMI or maternal gestational weight gain) came through the initial screening step, then maternal pre-pregnancy BMI, maternal gestational weight gain, and infant gestational age at delivery were all included in the set of candidate variables, due to the potential for confounding effects. Naïve model fitting As binary variables were in two formats (current exposure to an ingredient vs lifetime exposure to an ingredient), two models were fitted: one for current exposure; one for lifetime exposure. Demographic variables (e.g. maternal age, infant birthweight, day of life) were included in all models If variables were supersets of other variables (e.g. “Maternal Peripartum antibiotics” is a superset of “Maternal Ampicillin” and “Maternal Cefazolin”), the supersets and subsets were not included in the same model; instead a specific model (with only subset variables) and a general model (with only superset variables) were created. Maximum-likelihood generalized linear models of all candidate variables identified in step 1 were created using the lmer() function in the lme4 package. The step() function in the lmerTest package was used to backfit maximum-likelihood generalized linear models (GLMMs) for all candidate variables, with a significance cutoff of 0.05 for retaining fixed effects. Pseudo-R2 was determined using r.squaredGLMM() in the MuMin package Testing for contribution of interaction terms If the correlation matrix of the output model showed any values >0.1 or < −0.1, between infant age and another variable, an interaction term for that variable and infant age (x * log(day of life)) was added to the set of candidate variables, and back-fitting with the step() function was repeated Pseudo-R2 was determined using r.squaredGLMM() in the MuMin package Model comparison The best model was selected from the set of backfitted models associated with a given outcome variable, which included a current-exposure model and a lifetime-exposure model. If superset/subset variables were part of the candidate set, then the current and/or lifetime-exposure models were also divided into specific and general models. Pseudo-R2 was prioritized as a post-hoc test to select the best model. Adjustment for multiple comparisons The glht() function in the multcomp package 55 was used to adjust p values in the preferred model for multiple comparison (lincfit=mcp(tension=“Tukey”)). Supplementary Material 1 2 Table S3 Table S4 Table S5 Table S6 Table S7 Table S8

Related collections

Most cited references 38

Record: found
Abstract: found
Article: found

Is Open Access

Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets

Robert Schmieder, Robert Edwards (2011)

High-throughput sequencing technologies have strongly impacted microbiology, providing a rapid and cost-effective way of generating draft genomes and exploring microbial diversity. However, sequences obtained from impure nucleic acid preparations may contain DNA from sources other than the sample. Those sequence contaminations are a serious concern to the quality of the data used for downstream analysis, causing misassembly of sequence contigs and erroneous conclusions. Therefore, the removal of sequence contaminants is a necessary and required step for all sequencing projects. We developed DeconSeq, a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets ( 150 bp mean read length). DeconSeq is publicly available as standalone and web-based versions. The results can be exported for subsequent analysis, and the databases used for the web-based version are automatically updated on a regular basis. DeconSeq categorizes possible contamination sequences, eliminates redundant hits with higher similarity to non-contaminant genomes, and provides graphical visualizations of the alignment results and classifications. Using DeconSeq, we conducted an analysis of possible human DNA contamination in 202 previously published microbial and viral metagenomes and found possible contamination in 145 (72%) metagenomes with as high as 64% contaminating sequences. This new framework allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods. DeconSeq's web interface is simple and user-friendly. The standalone version allows offline analysis and integration into existing data processing pipelines. DeconSeq's results reveal whether the sequencing experiment has succeeded, whether the correct sample was sequenced, and whether the sample contains any sequence contamination from DNA preparation or host. In addition, the analysis of 202 metagenomes demonstrated significant contamination of the non-human associated metagenomes, suggesting that this method is appropriate for screening all metagenomes. DeconSeq is available at http://deconseq.sourceforge.net/.

0 comments Cited 355 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Gut microbiota profiling of pediatric NAFLD and obese patients unveiled by an integrated meta-omics based approach.

Federica Del Chierico, Valerio Nobili, Pamela Vernocchi … (2017)

There is evidence that non-alcoholic fatty liver disease (NAFLD) is affected by gut microbiota. Therefore, we investigated its modifications in paediatric NAFLD patients using targeted-metagenomics (MG) and metabolomics (MB). Stools were collected from 61 consecutive patients diagnosed with NAFL, NASH, or obesity and 54 healthy subjects (CTRLs), matched in a case-control fashion. Operational taxonomic units were pyrosequenced targeting 16S ribosomal RNA and volatile organic compounds (VOCs) determined by solid-phase micro-extraction GC-MS. The α-diversity was highest in CTRLs followed by obese, NASH, NAFL patients and β-diversity distinguished between patients and CTRLs, but not NAFL and NASH. Compared to CTRLs, in NAFLD patients Actinobacteria were significantly increased and Bacteroidetes reduced. There were no significant differences amongst NAFL, NASH, and obese groups. Overall NAFLD patients had increased levels of Bradyrhizobium, Anaerococcus, Peptoniphilus, Propionibacterium acnes, Dorea, Ruminococcus and reduced proportions of Oscillospira and Rikenellaceae compared to CTRLs. After reducing MG and MB data dimensionality, multivariate analyses indicated Oscillospira decrease in NAFL and NASH groups, and Ruminococcus, Blautia, and Dorea increase in NASH patients compared to CTRLs. Of the 292 VOCs, 26 were up- and 2 down-regulated in NAFLD patients. Multivariate analyses found that combination of Oscillospira, Rickenellaceae, Parabacteroides, Bacteroides fragilis, Sutterella, Lachnospiraceae, 4-methyl-2-pentanone, 1-butanol, and 2-butanone could discriminate NAFLD patients from CTRLs. Univariate analyses found significantly lower levels of Oscillospira and higher levels of 1-pentanol and 2-butanone in NAFL compared to CTRLs. In NASH, lower levels of Oscillospira were associated with higher abundance of Dorea, Ruminococcus and higher levels of 2-butanone, 4-methyl-2-pentanone compared to CTRLs.

0 comments Cited 288 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: not found
Article: not found

Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life.

Fredrik Bäckhed, Josefine Roswall, Yangqing Peng … (2015)

0 comments Cited 275 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Journal

Journal ID (nlm-journal-id): 9502015

Journal ID (pubmed-jr-id): 8791

Journal ID (nlm-ta): Nat Med

Journal ID (iso-abbrev): Nat. Med.

Title: Nature medicine

ISSN (Print): 1078-8956

ISSN (Electronic): 1546-170X

Publication date Nihms-submitted: 11 September 2018

Publication date (Electronic): 29 October 2018

Publication date (Print): December 2018

Publication date PMC-release: 29 April 2019

Volume: 24

Issue: 12

Pages: 1822-1829

Affiliations

[1 ]Division of Newborn Medicine, Department of Pediatrics, Washington University in St. Louis School of Medicine, St. Louis, MO, USA

[2 ]The Edison Family Center for Genome Sciences and Systems Biology, Washington University in St. Louis School of Medicine, St. Louis, MO, USA

[3 ]Division of Gastroenterology, Hepatology, and Nutrition Department of Pediatrics, Washington University in St. Louis School of Medicine, St. Louis, MO, USA

[4 ]Department of Molecular Microbiology, Washington University in St. Louis School of Medicine, St. Louis, MO, USA

[5 ]Department of Pathology and Immunology, Washington University in St. Louis School of Medicine, St. Louis, MO, USA

[6 ]Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, USA

Author notes

AUTHOR CONTRIBUTIONS

A.M.B-D., A.W.D., B.B.W, P.I.T., and G.D. conceived of experiments and design of work and analyses. B.B.W. and P.I.T. oversaw collection and stewardship of fecal samples and clinical metadata inventories. A.M.B-D. performed wet-lab experiments with advice from G.D. A.M.B-D. performed computational analyses with advice from A.W.D. and G.D. Article drafting was performed by A.M.B-D with critical revision performed by A.W.D., B.B.W, P.I.T., and G.D.

[* ]To whom correspondence should be addressed: A.M.B-D ( dudenhoeffer@ 123456wustl.edu ); G.D. ( dantas@ 123456wustl.edu )

Article

Manuscript ID: NIHMS1506371

DOI: 10.1038/s41591-018-0216-2

PMC ID: 6294307

PubMed ID: 30374198

SO-VID: b4b83c56-ea9d-441d-a8ef-a2bfd0b8b6e0

License:

Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

Diet and Maternal Gestational Weight Gain Predict Metabolic Maturation of Infant Gut Microbiomes

Read this article at

Abstract

Related collections

Role of Microbes in Soil Fertility and Human Health

Most cited references 38

Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets

Gut microbiota profiling of pediatric NAFLD and obese patients unveiled by an integrated meta-omics based approach.

Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life.

Author and article information

Journal

Affiliations

Author notes

Article

History

Categories

Comments

Comment on this article

Similar content 9

Cited by 88

Most referenced authors 1,127