Determinants of plasma metabolites in individuals with impaired glucose control
Metabolomic profiling was performed on plasma samples collected from individuals (aged 50–64 years) with prediabetes or treatment-naive T2D and from controls with normal glucose tolerance (NGT). Participants were drawn from the impaired glucose tolerance (IGT) (n = 697) and Swedish CArdioPulmonary bioImage Study (SCAPIS) (n = 470) cohorts from Sweden25, which served as discovery and validation cohorts, respectively (Fig. 1). In the discovery cohort, 220 individuals had NGT, 185 had isolated impaired fasting glucose (IFG), 173 had isolated IGT, 74 had combined glucose intolerance (CGI) and 45 had screen-detected T2D based on fasting glucose levels or oral glucose tolerance test (OGTT). In the validation cohort, 201 individuals had NGT, 130 had isolated IGT, 84 had CGI and 55 had T2D. As 364 of 477 individuals (76.3%) with prediabetes and T2D in the discovery cohort were overweight or obese (body mass index (BMI) ≥ 25), the NGT group was BMI-matched with the IGT group in the validation cohort (Supplementary Table 1) to partly mitigate the potential confounding effects of overweight and obesity. The detailed clinical characteristics of both cohorts can be found in Supplementary Table 1. A total of 978 plasma metabolites, primarily derived from amino acid (22.1%) and lipid (45.4%) metabolism, were measured and annotated (Supplementary Table 2).
Fasting blood glucose (FBG) and the OGTT were used to screen individuals with varying degrees of glucose intolerance. The gradient-boosted decision trees (GBDT) algorithm was used to predict plasma metabolites based on data collected from the food frequency questionnaire (FFQ), clinical tests and gut microbiome profiling. n indicates the sample size for the two cohorts, or the number of features in the diet, clinical, gut microbiome and plasma metabolome datasets.
While clinical phenotypes, microbiome and diet have been linked to the blood metabolome in healthy individuals from Israel14, it is important to explore whether these factors also apply to the Swedish cohort, including those with prediabetes and T2D. To this end, we used the same analytical strategy, that is, the gradient-boosted decision trees (GBDT) algorithm14 (Methods and Extended Data Fig. 1). We evaluated the relative predictive power of three feature groups for each circulating metabolite measured in the Swedish IGT cohort, which ranges from normal glucose control to treatment-naive T2D: 34 clinical biomarkers (Supplementary Table 1), 1,427 metagenomic species (MGSs) (Supplementary Table 3)25 and 193 dietary variables based on MiniMeal-Q28,29, a validated web-based interactive food frequency questionnaire (FFQ) (Supplementary Table 4). In total, 645 of 978 (65.9%) metabolites were significantly associated with at least one feature group (Supplementary Table 5; Wald test, Padj < 0.1). In particular, the GBDT models predicting the circulating level of each metabolite reached a median and maximum explained variance of 13.6% and 66.3%, respectively, with the clinical data (465 associated metabolites in total), 7.8% and 47.2% with the microbiome data (197 metabolites), and 1.3% and 38.3% with the diet data (272 metabolites) (Fig. 2a and Supplementary Table 5). The relative predictive power of these three factors over the whole metabolome, calculated based on new GBDT models to predict the principal metabolomics components, was 56.2%, 29.4% and 12.4% of the full model for clinical, microbiome and diet data, respectively (Fig. 2b). These findings show that potential determinants persist in prediabetes and T2D, with the gut microbiome alone accounting for nearly one-third of the explained variance in blood metabolites, twice that measured in healthy individuals14,15,16,23.
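The explained variance reported for each metabolite can be read as an out-of-sample R². As a minimal sketch, assuming held-out predictions are already available from some fitted model (the GBDT fitting itself is not shown, and the function names and toy values are hypothetical), the per-metabolite score and its summary across metabolites could be computed as follows:

```python
from statistics import mean, median


def explained_variance(observed, predicted):
    """Out-of-sample R^2: 1 - SS_res / SS_tot on held-out samples."""
    centre = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - centre) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def summarize(r2_by_metabolite):
    """Median and maximum explained variance across metabolites."""
    values = list(r2_by_metabolite.values())
    return median(values), max(values)


# Hypothetical held-out levels for a single metabolite (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = explained_variance(observed, predicted)
```

Scoring on held-out samples matters here: an in-sample R² from a flexible tree ensemble would overstate how much of a metabolite's variance a feature group really predicts.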
a, Box and swarm plots illustrating the explained variance of the top 50 significantly predicted metabolites from clinical data, microbiome and diet using the GBDT algorithm in the discovery cohort. The P values were estimated based on 1,000 iterations of bootstrapping; the number of metabolites with significant predictions in each group is shown in parentheses (Wald test, two-sided Padj < 0.1). The boxes show the median (line), the 25th/75th percentiles (box) and 1.5 times the interquartile range (IQR) (whiskers). b, Comparison of the explained variance according to each feature group, with the full model incorporating all features. c, Explained variance for the 197 predicted microbiome-associated metabolites in the Swedish versus Israeli cohorts. Fifteen influential outliers (with biochemical names labeled) were identified using Cook’s distance (larger than three times the mean Cook’s distance) based on a linear regression model (raw two-sided P < 2.2 × 10⁻¹⁶). The circle sizes are proportional to the explained variance according to diet in the Israeli cohort. d, Relative abundances of L. asaccharolyticus in the Israeli versus Swedish cohorts (n = 969 and n = 1,167, respectively). The boxes show the median (line), the 25th/75th percentiles (box) and 1.5 times the IQR (whiskers). e, The top 100 metabolites, ranked according to the explained variance from microbiome data, and their differences in plasma levels between CONV-R and GF mice. The symbols (+) in black and red indicate whether a metabolite was detected in mice and whether it significantly differed between the two groups (Wilcoxon rank-sum test, two-sided Padj < 0.1), respectively. Labeled metabolite names (*) indicate Metabolon-annotated IDs requiring validation. The boxes show the median (line), the 25th/75th percentiles (box) and 1.5 times the IQR (whiskers).
f, Explained variance (mean out-of-sample R²) of the Shannon index according to clinical biomarkers, amino acids, lipids or all metabolites was assessed using random forest regression (n = 697 samples), with tenfold cross-validation repeated ten times (n = 100 repeats; shown as the mean ± s.e.m.).
Robust predictions of microbiome-associated metabolites
We next used five distinct approaches to predict and validate the 197 microbiome-associated metabolites identified (Fig. 2a and Supplementary Table 5): (1) we evaluated the impact of distinct metagenomics pipelines, including reference-free canopy clustering25, the reference-based Kraken 2 (ref. 30) and the lineage-specific marker-gene-based MetaPhlAn 4 (ref. 31) to predict the microbiome-associated metabolites; (2) we used two machine learning (ML) methods, that is, GBDT and random forest, to establish microbiome–metabolome associations based on the same MGSs25; (3) we used the same ML algorithms to link metabolites to the Kyoto Encyclopedia of Genes and Genomes orthologies and compared the performance of these orthology models to that of MGSs; (4) we assessed the robustness of microbiome-associated metabolites across populations, that is, the Israeli14 and British TwinsUK23 cohorts versus the Swedish cohort; (5) we verified whether the predicted microbiome-associated metabolites were significantly altered in germ-free (GF) versus conventionally raised (CONV-R) mice.
We observed that microbiome–metabolome associations were consistent across pipelines, with a Pearson correlation of 0.97 (P < 2.2 × 10⁻¹⁶) both between Canopy and Kraken 2 (Extended Data Fig. 2a) and between Canopy and MetaPhlAn 4 (Extended Data Fig. 2b). They were also consistent between microbiome configurations at the MGS and Kyoto Encyclopedia of Genes and Genomes orthology levels (Pearson correlation coefficient R = 0.95; P < 2.2 × 10⁻¹⁶; Extended Data Fig. 2c), between computational methods (R = 0.73; P < 2.2 × 10⁻¹⁶; Extended Data Fig. 2d) and between populations (R = 0.74 and P < 2.2 × 10⁻¹⁶ in the Israeli14 versus Swedish cohorts; Fig. 2c). Robust microbiome–metabolome associations were also replicated in the British TwinsUK cohort23 (R = 0.60; P = 8.12 × 10⁻⁹; Extended Data Fig. 2e), despite the gap of 0.9 ± 1.3 years between the collection of fecal and blood samples.
Apart from the generally consistent microbiome–metabolome associations across populations, we identified 15 metabolites differently predicted by the microbiome between the Israeli and Swedish cohorts (Fig. 2c). These metabolites were dominated by xenobiotics, including 11 metabolites involved in benzoate (3-phenylpropionate) and xanthine metabolism (for example, caffeine and 5-acetylamino-6-amino-3-methyluracil), with the remaining four from amino acid metabolism (phenol sulfate, indole-acetate, phenylacetylglutamine and p-cresol-glucuronide). Interestingly, the eight xanthine-related metabolites involved in caffeine metabolism, along with quinate (a compound commonly found in coffee), were associated with diet in the Israeli cohort but not in the Swedish cohorts. This may be attributed to distinct dietary habits. Epidemiological data and food logs show that coffee intake in the Israeli cohort is about one-third of that in the Swedish cohort, despite doubling over the past 50 years (data from the Food and Agriculture Organization of the United Nations32; Extended Data Fig. 3a). In agreement, 95.2% of individuals in the Swedish cohorts reported at least one cup of coffee per day, while 84.6% and 57.8% reported two or more, and more than three, cups of coffee per day, respectively (Extended Data Fig. 3b). Thus, we hypothesized that the gut microbiome of Swedes has adapted to routine coffee exposure, and that the high intake of coffee may reduce the variability of these metabolites that can be attributed to diet. The relative abundances of Lawsonibacter asaccharolyticus, a bacterium involved in coffee metabolism19,33, were indeed lower in the Israeli cohort compared to the Swedish cohort (Fig. 2d). Furthermore, the abundance of this bacterium, but not that of other Lawsonibacter species including Lawsonibacter sp900066825, was associated with more frequent coffee consumption (Extended Data Fig. 3c).
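The 15 differently predicted metabolites are the influential outliers flagged in Fig. 2c by Cook's distance, using a cutoff of three times the mean distance. The following is a minimal sketch of that rule for a simple linear regression of one cohort's explained variance on the other's; the function names and the toy data are hypothetical, and the authors' actual regression setup may differ.

```python
def cooks_distances(x, y):
    """Cook's distance of every point in a simple linear regression y ~ x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # fitted parameters: intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


def influential_points(x, y, factor=3.0):
    """Indices whose Cook's distance exceeds `factor` times the mean distance."""
    d = cooks_distances(x, y)
    cutoff = factor * sum(d) / len(d)
    return [i for i, di in enumerate(d) if di > cutoff]


# Toy data (illustrative only): a perfect line with one displaced point.
x = [float(i) for i in range(10)]
y = x[:-1] + [30.0]
flagged = influential_points(x, y)
```

Because the cutoff scales with the mean distance, the rule is relative: it flags points that are influential compared with the rest of the data set rather than against a fixed threshold.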
GF and CONV-R mice offer a robust model to validate in-silico-predicted microbiome-associated metabolites in vivo. Thus, we performed metabolomic profiling of plasma from the portal vein of these mice and identified 66 of 197 microbiome-associated metabolites found in humans, with over half (54.5%) showing significant differences between the two models (Supplementary Table 6 and Fig. 2e), thus confirming their strong association with the gut microbiota.
Finally, we explored whether metabolites were associated with microbial diversity by assessing the Shannon index. Consistent with previous results24, we confirmed that interindividual differences in the gut microbiome were reflected in the plasma metabolome, which explained 49.4% of the variance in alpha diversity, whereas clinical biomarkers explained only 9.4% of the variance (Fig. 2f). Unexpectedly, we also observed that the variances explained by lipid-derived (39.8%) and amino-acid-derived (35.1%) metabolites were minimally additive (43.5% when combined), suggesting that the microbiome–lipid interactions were interconnected, either directly or indirectly, with microbiome–amino acid interactions34,35.
Molecular signatures of individuals with impaired glucose control
In total, we identified 64, 510, 450 and 585 metabolites that showed significantly altered plasma levels in individuals with isolated IFG, isolated IGT, CGI and T2D, respectively, compared to the NGT controls in the discovery cohort (Wilcoxon rank-sum test, Padj < 0.1), resulting in 759 metabolites potentially associated with impaired glucose control (Supplementary Table 7). Of these molecular signatures, 502 were also altered in the validation cohort, of which 54.2% were annotated as lipid-related and 20.3% as amino-acid-related metabolites (Fig. 3a and Supplementary Table 7). However, imidazole propionate, one of the top-ranked microbiome-associated metabolites (Fig. 2e), was significantly increased in individuals with impaired glucose control versus NGT only in the IGT (discovery) cohort and not in the SCAPIS (validation) cohort (Padj = 0.11) (Supplementary Table 7); accordingly, it was not included in the downstream analyses. Of the 502 metabolites, 469 (126 microbiome-associated) remained significantly associated with higher or lower odds ratios (ORs) for IFG, IGT or CGI/T2D after adjusting for group differences in age and sex, similarly to a previous study9 (Fig. 3b and Supplementary Table 8; logistic regression analyses; Padj < 0.1).
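The Padj < 0.1 threshold used throughout refers to P values adjusted for multiple testing. This section does not state which correction was applied, so the sketch below assumes the Benjamini-Hochberg false discovery rate procedure purely for illustration; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four metabolites (illustrative only).
raw = [0.01, 0.04, 0.03, 0.20]
padj = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(padj) if p < 0.1]
```

With hundreds of metabolites tested per comparison, filtering on adjusted rather than raw P values keeps the expected share of false positives among the reported signatures bounded.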
a, Circular heatmap showing the 502 metabolites consistently altered in the prediabetes and T2D groups versus the NGT group in both the discovery (D) and validation (V) cohorts (Wilcoxon rank-sum test; two-sided Padj < 0.1). b, Top 100 metabolites with significantly lower or higher ORs of CGI/T2D risks after adjusting for age and sex (logistic regression analyses; two-sided Padj < 0.1). c, Metabolites uniquely associated with specific prediabetes and T2D subgroups and those shared across groups. Metabolites associated with overweight or obesity (n = 117 of 165) in the NGT group of the discovery cohort, T2D (n = 150), heart failure (HF; n = 99) and kidney disease (KD; n = 111) in the EPIC-Norfolk cohort, and acute coronary syndrome (ACS; n = 205) in Israelis are highlighted with colored lines if they were also T2D-associated in Swedes. d, Proportions of overweight-associated and obesity-associated metabolites over all prediabetes-associated, IFG-associated, IGT-associated, CGI-associated or T2D-associated metabolites. Groups labeled with different letters (a or b) indicate significant statistical differences (two-sided chi-squared test). e, Venn diagram showing that a total of 143 microbiome-associated metabolites (calculated by summing the three numbers highlighted in red) identified in Israelis (n = 104) or Swedes (n = 197) overlap with prediabetes-associated and T2D-associated metabolites (n = 502). f,g, Random forest classifiers in distinguishing CGIs and T2Ds from NGTs in the discovery (f) and validation (g) cohorts based on the FINDRISC, microbiome, 143 microbiome-associated and prediabetes-associated and diabetes-associated metabolites, the 32 most robust microbiome-associated metabolites identified in both the Swedish and Israeli cohorts, or all 501 of 502 prediabetes-associated and diabetes-associated metabolites excluding glucose.
The performance of the classifiers is assessed by AUC; cross-validation AUCs based on tenfold cross-validation repeated ten times are provided for the discovery cohort, and true prediction AUCs for the validation cohort.
Comparison of the altered metabolites across distinct prediabetes and T2D groups revealed that 56 of 502 metabolites were significantly altered in isolated IFG compared to the NGT control. Interestingly, these 56 metabolites were concurrently altered in all subgroups of prediabetes and T2D, prompting the question of whether, and to what extent, IFG and IGT are fundamentally different (Fig. 3c and Supplementary Table 7). In contrast, 241 (48.0%) of altered metabolites were shared among subgroups characterized by glucose intolerance (isolated IGT, CGI and T2D).
Metabolites linked to prediabetes or diabetes were then compared with those associated with distinct cardiometabolic diseases to identify potential shared metabolic pathways between the two conditions9,13. When the 220 individuals with NGT in the discovery cohort were stratified according to BMI, 165 metabolites were significantly altered in overweight or obese individuals (BMI ≥ 25; n = 108) compared to the lean controls (BMI < 25; n = 112) (Fig. 3c and Supplementary Table 7). Of these, 117 (70.9%) overlapped with the 502 prediabetes-associated and T2D-associated metabolites but only eight were identified as overweight-specific and obesity-specific (including malonate, methylmalonate, cortisol, myo-inositol, 2-palmitoylglycerol (16:0), 2-linoleoylglycerol (18:2), 3β-7α-dihydroxy-5-cholestenoate and N-δ-acetylornithine) (Supplementary Tables 7 and 9). A connection between cortisol and obesity is well established36 and gut bacteria metabolizing myo-inositol were recently suggested to be enriched in an obesity-related gut microbiome enterotype37. In addition, 33 obesity-associated metabolites were identified in the isolated IFG group, accounting for 58.9% of altered metabolites in this subgroup, which constituted a significantly larger proportion than in the isolated IGT (26.2%), CGI (28.8%) or T2D (27.5%) groups (chi-squared test, P < 0.01) (Fig. 3d). Our results also demonstrated that 245 of 502 metabolites were associated with noncommunicable diseases in the EPIC-Norfolk cohort9, which included 150 associated with incidence of T2D, 99 with heart failure (HF) and 111 with kidney disease (KD) (Fig. 3c and Supplementary Table 9). We also observed that 392 of the 533 metabolites that differed between individuals with acute coronary syndrome (ACS) and non-ACS controls11 were detected in our study. Notably, 52.3% (205 of 392) were consistently associated with prediabetes and T2D (Fig. 3c and Supplementary Table 9), which is not unexpected as 31.2% of patients with ACS had T2D11.
This conclusion is supported by studies indicating that similar microbiome and metabolome alterations are observed across the span of cardiometabolic diseases from obesity to HF12.
Among the 502 metabolites identified as potential biomarkers of impaired glucose control, 143 were microbiome-associated in either the Swedish or Israeli cohort (Fig. 3e). We performed random forest classification to compare the metabolome’s ability to distinguish CGI and T2D from NGT controls, versus microbiome-based classifiers and the FINnish Diabetes Risk SCore (FINDRISC), which showed similar performance25. The models were trained and optimized in the discovery cohort, then applied to the validation cohort for prediction. Model performance was assessed using the area under the curve (AUC). The non-glucose-metabolite-based (n = 501) classifiers demonstrated superior performance compared to the MGS classifiers and the FINDRISC score, with AUCs of 0.89 and 0.83 in the discovery and validation cohorts, respectively (Fig. 3f,g), comparable to models using all metabolites without preselection (AUCs of 0.89 and 0.84, respectively; Extended Data Fig. 4). Models based on the 143 microbiome-associated metabolites linked to impaired glucose control, or the 32 metabolites robustly associated with the gut microbiome across populations, also showed superior performance compared to the MGS classifier, with AUCs of 0.79 and 0.76, respectively, in the validation cohort (Fig. 3g).
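The AUC used to compare these classifiers has a direct interpretation: it is the probability that a randomly chosen case (CGI or T2D) receives a higher predicted score than a randomly chosen NGT control. A minimal sketch of that pairwise definition follows; the function name, labels and scores are hypothetical, and the random forest that would produce the scores is not shown.

```python
def roc_auc(labels, scores):
    """AUC as the share of case-control pairs ranked correctly (ties count 0.5)."""
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities (1 = CGI/T2D, 0 = NGT), illustrative only.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(labels, scores)
```

Because the AUC depends only on how scores rank cases against controls, a model trained in the discovery cohort can be scored in the validation cohort without recalibrating its probabilities.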
Diet–microbiota interactions affecting glucose control
We next conducted feature attribution analysis based on the SHapley Additive exPlanation (SHAP) approach to identify the potential effects of specific MGSs and lifestyle factors on plasma molecular signatures affecting glucose control. SHAP values quantify feature importance and attribute gut microbiome taxa contributions to functional perturbations while preserving microbial composition38. We focused on the 502 consistently changed metabolites in the two Swedish cohorts and 118 MGSs associated with prediabetes and T2D in the same individuals identified previously25 (Supplementary Table 3).
Our results indicate that among the MGS–metabolite pairs with the highest SHAP values, the recently isolated but largely uncharacterized species Hominifimenecus microfluidus can have a significant impact on the metabolism of several xenobiotics, including quinate (Extended Data Fig. 5 and Supplementary Table 10). As expected, the variation in abundance of this bacterium is similar to that of L. asaccharolyticus (Fig. 2d), with much lower abundances in Israelis compared to Swedes (Extended Data Fig. 6). Faecalibacterium species are among the key features associated with indolepropionate levels, which are inversely associated with the risk of T2D, consistent with previous findings39. Another notable MGS–metabolite pair was observed between Ruminococcus gnavus and isoursodeoxycholate, which is consistent with the known ability of R. gnavus to produce iso-bile acids40. The capacity to produce isoursodeoxycholate may provide a mechanism for how R. gnavus contributes to inflammation and cardiometabolic disease41. Additionally, predicted plasma levels of metabolites involved in phenylalanine metabolism, such as phenylacetate, phenylacetylglutamate and phenylacetylglutamine, were associated with certain bacteria of the Clostridium genus, which are linked to heightened cardiovascular disease risk42,43.
To gain a broader understanding of the interactions between the plasma metabolome and different predictive MGSs, we then used the top 300 metabolite–MGS pairs with the strongest SHAP values for dynamic network visualization using a force-directed algorithm. Notably, this analysis identified H. microfluidus and Blautia wexlerae, both members of the Lachnospiraceae family, as the key nodes of the metabolome–microbiome dynamics in prediabetes and T2D (Fig. 4a). Further network analysis confirmed these observations: three MGSs (H. microfluidus, B. wexlerae and Agathobacter rectalis) were consistently ranked among the top five features based on their high node degree and betweenness centrality, potentially acting as keystone species (Fig. 4b). Interestingly, we observed an inverse relationship between H. microfluidus and B. wexlerae via four metabolites, of which three were involved in benzoate metabolism, including catechol sulfate, 3-phenylpropionate and hippurate (Fig. 4a,b). The tight connections between hippurate and different Blautia species and strains have also been observed in the LifeLines DEEP cohort16,44. Additional mediation analyses revealed a bidirectional relationship between these two bacteria: hippurate mediates 21.1% of the effect of H. microfluidus on B. wexlerae, while 17.8% of the effect of B. wexlerae on H. microfluidus is mediated through this metabolite (Fig. 4c). Note that the SHAP-based analyses were consistent with Spearman correlation analyses both in the Swedish cohort (Fig. 4d) and in a geographically independent Chinese cohort where we had previously profiled the gut microbiome using the same methods45 (Fig. 4e). These findings are consistent with a gut microbiome structure consisting of two competing guilds across populations and health status46.
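Node degree and betweenness centrality, the two measures used to rank candidate keystone species, can be illustrated on a toy bipartite graph. The sketch below links the two species to the three benzoate metabolites named above, plus quinate for H. microfluidus; this is an illustrative subset rather than the 300-pair network, edges are treated as unweighted, and the implementation (Brandes' algorithm, unnormalized) is an assumption about how such scores could be computed.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality of an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


# Toy subset of species-metabolite links (illustrative, unweighted).
edges = [
    ("H. microfluidus", "catechol sulfate"),
    ("H. microfluidus", "3-phenylpropionate"),
    ("H. microfluidus", "hippurate"),
    ("H. microfluidus", "quinate"),
    ("B. wexlerae", "catechol sulfate"),
    ("B. wexlerae", "3-phenylpropionate"),
    ("B. wexlerae", "hippurate"),
]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

degree = {v: len(neighbours) for v, neighbours in adj.items()}
centrality = betweenness(adj)
```

In this toy graph the species with the extra metabolite link has both the highest degree and the highest betweenness, because every shortest path to quinate passes through it; the two measures need not agree in larger networks.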
a, Bi-network showing the top 300 MGS–metabolite pairs with the largest absolute SHAP values based on a force-directed algorithm. b, Network analysis of the node degree and betweenness centrality for those top 300 MGS–metabolite pairs. c, Bidirectional causal inference using mediation analyses to estimate the proportions of effect mediated by hippurate between H. microfluidus and B. wexlerae. d, Dot plot showing the significant correlations between plasma hippurate and H. microfluidus and B. wexlerae, respectively, in the discovery cohort (Spearman ρ correlation analyses, raw two-sided P < 2.2 × 10⁻¹⁶). e, Replication of the Hominifimenecus-hippurate-Blautia associations in a Chinese cohort (Spearman ρ correlation analyses, two-sided raw P < 2.2 × 10⁻¹⁶). f, Top ten metabolites identified as important features in the 2-h OGTT, FBG, HbA1c, fasting insulin, HOMA-IR or FINDRISC (n = 49 in total) based on the GBDT models; metabolites were ordered according to their SHAP values (reflecting feature importance) to 2-h OGTT levels. The lifestyle features (purple), including both the physical activity levels (as measured by steps per day) and dietary components, or MGSs (green) with the maximum or minimum SHAP values for each metabolite are also shown on the right. Metabolites were colored using pathway annotations, including those involved in amino acid, lipid, carbohydrate and xenobiotic metabolism as in a.
Next, we assessed the SHAP values of metabolites in relation to several glucose and insulin indices in our cohorts, encompassing 2-h OGTT levels, fasting blood glucose (FBG), hemoglobin A1c (HbA1c), fasting insulin, homeostatic model assessment of insulin resistance (HOMA-IR) and FINDRISC (Fig. 4f and Supplementary Table 11). Our findings revealed that the primary metabolites reflective of FINDRISC were generally consistent with those influenced by fasting insulin and HOMA-IR but not FBG, suggesting that FINDRISC may reflect insulin resistance rather than glycemia per se from a molecular perspective. Of interest, catechol sulfate and hippurate emerged as the top two features with negative contributions to 2-h OGTT levels, but not to fasting insulin and HOMA-IR, for which the most positive and negative contributors were glutamate and 1-(1-enyl-palmitoyl)-GPC (P-16:0), respectively. Our results further indicated that Bifidobacterium adolescentis was linked to lower levels of α-ketobutyrate and 2-hydroxybutyrate, which showed the highest positive SHAP values with 2-h OGTT (Fig. 4f). Note that the SHAP values of metabolites regarding the 2-h OGTT strongly correlated with the model coefficients from the linear ridge regressions, demonstrating robustness across distinct ML methods (R = 0.71, P < 2.2 × 10⁻¹⁶; Extended Data Fig. 7).
Lifestyle-specific modulation of diabetes-linked metabolites
In line with the established understanding that both gut microbiota and T2D are influenced by lifestyle changes10,22, our analyses identified physical activity levels (measured as steps per day; Fig. 4f) and several dietary components to be among the top factors influencing variations in distinct diabetes-related metabolites. Thus, we analyzed the plasma metabolome data from two previous longitudinal trials, one focusing on diet22 and the other on exercise47, to identify molecular responses to these lifestyle interventions. We could determine the levels of 307 of 502 metabolites associated with impaired glucose control; of these, 125 were associated with improvement in insulin sensitivity (as reflected by HOMA-IR) upon dietary intervention22. These metabolites comprised lipids (77, 61.6%), amino acids (39, 31.2%) and xenobiotics (9, 7.2%) (Supplementary Table 12); 123 of the 125 metabolites overlapped with the metabolites profiled in the exercise intervention study aiming to characterize the metabolic benefits of short-term exercise47 (Supplementary Table 12).
Additional hierarchical clustering analysis revealed that the 123 overlapping metabolites could be classified into eight clusters based on their differences in prediabetes and T2D versus NGTs, and their responses to both the dietary and exercise interventions (Fig. 5). Eighty-one (65.9%) of these metabolites responded to at least one of the interventions. Importantly, we observed that lifestyle–metabolite interactions varied depending on the type of intervention (Fig. 5 and Supplementary Table 12), similar to the heterogeneity observed in T2D pathogenesis. Specifically, 32 metabolites were reversible after both interventions (clusters 2 and 7), while 42 metabolites were not altered by either intervention (clusters 3 and 8). Moreover, 28 metabolites showed reversal only after the dietary intervention (clusters 4 and 5), whereas 21 metabolites responded exclusively to exercise (clusters 1 and 6).
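The cluster counts above can be cross-checked with simple arithmetic. The snippet below uses only the numbers stated in the text; the dictionary keys are hypothetical labels for the four response patterns.

```python
# Metabolite counts per response pattern, as stated in the text.
cluster_counts = {
    "both": 32,           # clusters 2 and 7
    "neither": 42,        # clusters 3 and 8
    "diet_only": 28,      # clusters 4 and 5
    "exercise_only": 21,  # clusters 1 and 6
}

total = sum(cluster_counts.values())
responders = total - cluster_counts["neither"]
responder_share = round(100 * responders / total, 1)
```

The four patterns add up to the 123 overlapping metabolites, and the 81 responders correspond to the 65.9% reported.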
Heatmap showing the metabolites involved in amino acid, lipid and xenobiotic metabolism (n = 123) that overlap between the 502 metabolites altered in prediabetes and T2D and those profiled in two clinical trials: a dietary intervention (14 days) and a 1-h exercise intervention (sampled before and at 120 and 180 min after exercise). Responses reversed (Y, yes; N, no) by diet (D), exercise (E) or both (B) were clustered and are shown in distinct colors beside the row clustering branches. Representative metabolites, including 14 overlapping with Fig. 4f, are labeled in red, and five others in black. Wilcoxon rank-sum test and one-way repeated-measures analysis of variance were used to identify altered metabolites in the cohorts and the two longitudinal datasets (Padj < 0.1), respectively.
Interestingly, 14 of the top 49 features associated with glucose or insulin indices (Fig. 4f) were also identified and ten of these were reversible using short-term lifestyle changes; the remaining four (hippurate, 1-oleoyl-GPC (18:1), α-ketobutyrate and 2-hydroxy(iso)butyrate; Fig. 5) were not. This suggests that other factors modulate these metabolites. In support, significantly elevated plasma hippurate levels were observed in the Chinese cohort when stratified according to high versus low physical fitness levels (P = 0.048; Extended Data Fig. 8a) and correlated with maximal oxygen uptake (Extended Data Fig. 8b), suggesting that long-term, but not short-term, physical exercise might modulate this microbial metabolite. In agreement, average daily steps, which are indicative of habitual physical activity, emerged as the second most influential factor positively associated with circulating hippurate levels in our Swedish cohort (Extended Data Fig. 8c). The plasma levels of three branched-chain fatty acids, repeatedly linked to glucose control and insulin resistance48, could be reduced with short-term exercise but, as expected, not with a high-protein diet. In contrast, 7α-hydroxy-3-oxo-4-cholestenoic acid (7-HOCA), a new substrate of liver 5β-reductase contributing to liver lipid dysregulation49, and 5-α-androstan-3-β,17-β-diol disulfate, a top feature associated with alcohol consumption50, could only be reduced by the dietary intervention and not by exercise. Thus, our results indicate that the interactions between lifestyle and the microbiome–metabolome axis are modifiable targets for T2D management; however, optimal health benefits might be achievable only through a combination of lifestyle modifications.