Study design, participants, and sample collection
Participants were recruited from the San Francisco Bay Area, California. Inclusion criteria were generally good health, with no prior diabetes diagnosis, no uncontrolled hypertension or major organ disease, and no use of diabetes medication. Participants underwent evaluations and screening tests (e.g., HbA1c, fasting plasma glucose, insulin, lipid panel, and creatinine at baseline) at the Clinical and Translational Research Unit after overnight fasting. The study protocol was reviewed and approved by the Institutional Review Board at Stanford University School of Medicine Human Research Protection Office (Institutional Review Board #43883). All participants provided written informed consent. This trial is registered on ClinicalTrials.gov (NCT03919877; “Precision Diets for Diabetes Prevention”; 2019-04-18). Participants underwent gold-standard metabolic tests (described in detail in the following sections) after a 10-h overnight fast, including an oral glucose tolerance test (OGTT), an insulin suppression test (IST), and an isoglycemic intravenous glucose infusion test (IIGI). The metabolic test results determined participants’ metabolic subphenotypes, such as insulin resistance (IR), beta-cell dysfunction, and incretin dysfunction.
Main cohort
Thirty-six healthy adults were included in the final analyses as the main study cohort (also called the training cohort) (Table 1).
Validation cohort
An independent cohort of 10 individuals completed metabolic tests and provided lifestyle data. The demographics, labs, and metabolic test results are summarized in Supplementary Table 1.
Lifestyle deep profiling using wearable biosensors and feature extraction
Using digital health monitoring technologies, we monitored participants’ dietary intake, sleep characteristics, physical activity, and glucose levels in real time throughout the study period (at least 14 consecutive days). Participants were required to maintain their normal eating, sleep, and physical activity habits during the study.
For dietary data collection, participants were required to log all food and beverage items consumed in real time on the Cronometer food tracking app (Cronometer Software, Inc., Revelstoke, BC, Canada). A median of 20.5 days of food logs was collected from 36 participants. Over 92% of participants provided more than 10 days of diet data during the study period. To enhance the accuracy of the diet data, days with a reported daily caloric intake of less than 500 kcal, as well as days with a reported overnight fasting period exceeding 24 hours, were excluded. Registered dietitians monitored participants’ food log entries (food items, calories, and nutrient compositions) throughout the study. We also ensured that all participants recorded dietary intake data for at least two weekdays and one weekend day, so that the logs were representative of their typical dietary habits. No dietary data were missing for any of the 36 participants. A total of 74 diet features (51 energy-adjusted nutrient levels, 10 food groups, and 13 meal timings) were extracted (Fig. 1 and Supplementary Table 2).
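The two exclusion rules for diet days can be sketched as a simple filter. This is a minimal illustration, not the study's pipeline: the `DietDay` record, its field names, and the example values are hypothetical, and treating exactly 500 kcal and exactly 24 hours as valid is an assumption about the boundaries.

```python
from dataclasses import dataclass


@dataclass
class DietDay:
    """Hypothetical per-day summary of a food log (field names are assumed)."""
    date: str
    kcal: float              # reported total energy intake for the day
    overnight_fast_h: float  # reported overnight fasting period, in hours


MIN_KCAL = 500.0   # days below this are excluded
MAX_FAST_H = 24.0  # days with a longer overnight fast are excluded


def keep_diet_day(day: DietDay) -> bool:
    """Return True if the day passes both exclusion rules."""
    return day.kcal >= MIN_KCAL and day.overnight_fast_h <= MAX_FAST_H


# Made-up example days, for illustration only.
days = [
    DietDay("day-01", 1850.0, 11.5),
    DietDay("day-02", 420.0, 12.0),   # excluded: under 500 kcal
    DietDay("day-03", 2100.0, 26.0),  # excluded: fast longer than 24 h
]
kept = [d.date for d in days if keep_diet_day(d)]
```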
For sleep and physical activity data collection, participants wore a Fitbit Ionic band (Fitbit, Inc., San Francisco, CA) for the study period. Fitbit data were available for only 24 of the 36 participants because the Fitbit Ionic was recalled for potential burn hazards during the study period. From these 24 participants, a median of 55 nights of sleep data and 64 days of physical activity data were collected. To ensure data accuracy, only days with 4–12 hours of overnight sleep data were considered, and days with fewer than 500 steps were excluded. In total, 14 sleep features (1 quantity, 9 qualities, 4 timings) and 23 physical activity features (4 activity levels, 19 timings) were extracted (Fig. 1 and Supplementary Table 2). This study did not use the duration of each sleep stage because we did not have access to open-source Fitbit data to independently validate the algorithm predicting sleep structure in our population. Finally, heart rate (HR) data were also extracted.
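The wearable quality-control thresholds can be written as two small predicates. This is a sketch under assumptions: the function names and example values are hypothetical, and the 4–12 hour range is treated as inclusive at both ends.

```python
MIN_SLEEP_H = 4.0   # shortest overnight sleep record that is kept
MAX_SLEEP_H = 12.0  # longest overnight sleep record that is kept
MIN_STEPS = 500     # days with fewer steps are excluded


def valid_sleep_night(sleep_hours: float) -> bool:
    """Keep nights with 4-12 hours of recorded overnight sleep (inclusive, assumed)."""
    return MIN_SLEEP_H <= sleep_hours <= MAX_SLEEP_H


def valid_activity_day(steps: int) -> bool:
    """Keep days with at least 500 recorded steps."""
    return steps >= MIN_STEPS


# Made-up example records, for illustration only.
sleep_hours = [7.2, 3.1, 12.5, 8.0]
daily_steps = [8400, 120, 499, 500]

kept_sleep = [h for h in sleep_hours if valid_sleep_night(h)]
kept_steps = [s for s in daily_steps if valid_activity_day(s)]
```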
For continuous glucose monitoring (CGM), participants wore a Dexcom G4 CGM device (Dexcom Inc., San Diego, CA) for the study period. Of note, readings from the glucose monitoring devices were not made available to the participants until the end of the study; therefore, lifestyle habits were not affected by the readings. CGM data were collected for a median of 28 days from 35 participants (Fig. 1).
Gold-standard metabolic physiological tests
Participants underwent glucose metabolic tests after 10-h overnight fasting to determine metabolic characteristics, such as tissue-specific IR, beta-cell dysfunction, and incretin dysfunction. The details of the physiologic tests are described in Metwally et al.5, and are summarized as follows.
Muscle IR was quantified through an insulin suppression test (IST). In a validated IST39,40, participants were infused with octreotide (0.27 μg m−2 min−1), insulin (32 mU m−2 min−1), and glucose (267 mg m−2 min−1) for 240 min. In this test, participants showed different levels of steady-state plasma glucose (SSPG), which indicates an individual’s capacity for insulin-mediated glucose disposal14.
Beta cell function was assessed during an oral glucose tolerance test (OGTT). Specifically, plasma glucose levels were measured at 16 timepoints (−10, 0, 10, 15, 20, 30, 40, 50, 60, 75, 90, 105, 120, 135, 150, and 180 min) following a 75 g oral glucose load, while insulin and C-peptide were measured at 7 timepoints (0, 15, 30, 60, 90, 120, and 180 min) using a Millipore radioimmunoassay at the Core Lab for Clinical Studies, Washington University School of Medicine in St. Louis (WashU). The insulin secretion rate was calculated from C-peptide levels during the OGTT using the Insulin SECretion (ISEC) software. Then, a disposition index (DI; (pmol*dL)/(kg*ml))15 was calculated as the area under the insulin secretion rate curve divided by the SSPG. Beta cell function was classified based on the DI.
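The disposition index described above is a ratio of an area under a curve to SSPG, which can be sketched as follows. This is an illustration only: trapezoidal integration, the function names, and the example values are assumptions, and the insulin secretion rates are taken as given (in the study they come from the ISEC software, which is not reproduced here).

```python
from typing import Sequence


def trapezoid_auc(times: Sequence[float], values: Sequence[float]) -> float:
    """Area under a sampled curve using the trapezoidal rule (assumed method)."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching time/value samples")
    auc = 0.0
    for i in range(1, len(times)):
        auc += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return auc


def disposition_index(isr_times_min: Sequence[float],
                      isr_values: Sequence[float],
                      sspg: float) -> float:
    """DI = area under the insulin secretion rate curve, divided by SSPG."""
    if sspg <= 0:
        raise ValueError("SSPG must be positive")
    return trapezoid_auc(isr_times_min, isr_values) / sspg


# Insulin/C-peptide sampling times from the text; the rates and SSPG are made up.
OGTT_TIMES_MIN = [0, 15, 30, 60, 90, 120, 180]
example_isr = [2.0, 6.0, 8.0, 7.0, 5.0, 4.0, 2.5]
example_di = disposition_index(OGTT_TIMES_MIN, example_isr, sspg=120.0)
```

Dividing by SSPG scales secretion by insulin resistance, so the same secretion curve yields a lower DI in a more insulin-resistant participant.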
Incretin function was quantified using an IIGI test. In this test, participants were continuously infused with dextrose via an intravenous catheter. The incretin effect (IE%) was quantified by comparing the plasma glucose and C-peptide profiles in response to a glucose load given orally (OGTT) with those in response to dextrose given intravenously (IIGI).
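The text does not state the formula used for IE%. A commonly used definition expresses the incretin effect as the share of the oral beta-cell response that is absent under the matched intravenous infusion; the sketch below assumes that definition, and the function name and inputs (areas under the C-peptide or insulin secretion curves) are hypothetical.

```python
def incretin_effect_percent(auc_oral: float, auc_iv: float) -> float:
    """IE% = 100 * (AUC_oral - AUC_iv) / AUC_oral (assumed, commonly used definition).

    auc_oral: area under the beta-cell response curve during the OGTT
    auc_iv:   area under the same curve during the IIGI
    """
    if auc_oral <= 0:
        raise ValueError("oral AUC must be positive")
    return 100.0 * (auc_oral - auc_iv) / auc_oral


# Made-up areas, for illustration only.
example_ie = incretin_effect_percent(auc_oral=200.0, auc_iv=80.0)
```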
Hepatic IR was estimated with the hepatic IR (HIR) index equation, which uses insulin, BMI, body fat%, and HDL cholesterol levels and was validated against endogenous glucose production measured during a euglycemic–hyperinsulinemic clamp41. Adipose tissue IR was calculated from the average plasma free fatty acid (FFA) concentration measured at 90, 100, and 110 min during the modified IST.
Data analyses
All data analyses, corresponding key findings, and interpretations are described in detail in Supplementary Table 8. To test for differences in baseline demographics, labs, and metabolic test results between normoglycemia and prediabetes/T2D groups, as well as between the main and validation cohorts, the Wilcoxon rank-sum test was used for non-normally distributed continuous variables, and the χ2 test or Fisher’s exact test was used for categorical variables.
To identify dietary patterns and their relationship to metabolic characteristics in the cohort, principal component analysis (PCA) was performed on meal timing features, and participants were color-coded by HbA1c, muscle IR (SSPG), incretin effect, or beta-cell function (disposition index). Then, we used covariate-adjusted multiple linear regression (MLR) models to examine differences in the energy contribution of each meal timing between metabolic groups while adjusting for age, sex, BMI, and ethnicity. P values were adjusted for multiple testing with the Benjamini–Hochberg (BH) procedure.
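The BH adjustment applied to the regression P values can be sketched in a few lines. This is a generic implementation of the Benjamini–Hochberg step-up procedure, not the study's code; the function name and the example P values are made up.

```python
from typing import List, Sequence


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up P values, for illustration only.
example_adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
```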
Individual-to-systemic analytical framework: linear regression analysis (individual) and training machine learning prediction models (systemic)
To assess individual associations of diet, sleep, and activity features with glucose outcomes (CGM and metabolic test results), we used the least absolute shrinkage and selection operator (LASSO) combined with regression models. For each glucose outcome, we performed a grid search (values ranging from λ = 10^10 to λ = 10^−2) to optimize the hyperparameter λ and selected the model that minimized the test mean squared error (MSE). The LASSO models selected lifestyle features associated with glucose outcomes and provided an estimate of the predictive value of each feature individually (Supplementary Tables 3–5). Then, we used regression analyses to examine individual associations of diet, sleep, and activity with glucose outcomes. P values were BH-adjusted for multiple testing.
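The λ grid search can be illustrated with a small, dependency-free LASSO fitted by coordinate descent. This is a sketch under stated assumptions rather than the study's implementation: the objective is taken as (1/2n)·||y − Xβ||² + λ·||β||₁ on pre-centered data without an intercept, leave-one-out is assumed as the held-out scheme, the grid uses one value per power of ten, and the toy data and all function names are hypothetical.

```python
from typing import List


def soft_threshold(rho: float, lam: float) -> float:
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             n_iter: int = 200) -> List[float]:
    """Fit LASSO coefficients by cyclic coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def loo_mse(X: List[List[float]], y: List[float], lam: float) -> float:
    """Leave-one-out mean squared error for a given lambda."""
    errors = []
    for i in range(len(X)):
        beta = lasso_cd(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        pred = sum(x * b for x, b in zip(X[i], beta))
        errors.append((y[i] - pred) ** 2)
    return sum(errors) / len(errors)


# Grid from 10^10 down to 10^-2, one value per power of ten (spacing assumed).
lambdas = [10.0 ** e for e in range(10, -3, -1)]

# Toy, pre-centered data: the outcome depends on the first feature only.
X = [[-1.5, 0.5], [-1.0, -1.0], [-0.5, 1.0], [0.0, 0.0],
     [0.5, -1.0], [1.0, 1.0], [1.5, -0.5]]
y = [-2.9, -2.1, -0.95, 0.0, 0.95, 2.1, 2.9]

best_lam = min(lambdas, key=lambda lam: loo_mse(X, y, lam))
best_beta = lasso_cd(X, y, best_lam)
```

At the large end of the grid every coefficient is shrunk to zero, so the search effectively compares the empty model against progressively less penalized ones.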
We built integrated prediction models based on all three lifestyle modalities and demographic information to predict metabolic characteristics. Since many features are highly dependent on each other, we removed obvious dependencies and kept a total of 47 features as the starting set (e.g., baseline BMI was kept, and height and weight were removed). Features were then centered and scaled. Because the prediction models required all three lifestyle modalities simultaneously, and individuals without Fitbit data had missing values, we replaced these missing values with the cohort mean, as models built on data imputed by multivariate imputation by chained equations (MICE) failed to predict all metabolic classes. Next, the LASSO approach selected relevant features, and then models with no regularization were built39. The hyperparameter λ and the final model were selected by leave-one-out cross-validation, using the MSE as the criterion. Model coefficients of the top 10 selected features were plotted for all analyses (Fig. 6 and Supplementary Fig. 6). P values were adjusted for multiple testing.
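The centering, scaling, and mean-imputation steps can be sketched for a single feature column. This is an illustration with assumptions: the function name is hypothetical, missing values are represented as `None`, the sample standard deviation of the observed values is used for scaling, and, following the order given in the text, imputation happens after scaling, where the cohort mean of a centered feature is zero.

```python
import statistics
from typing import List, Optional


def scale_then_impute(column: List[Optional[float]]) -> List[float]:
    """Center and scale observed values, then fill missing ones with the mean.

    After centering, the cohort mean is 0, so missing entries become 0.0.
    """
    observed = [v for v in column if v is not None]
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample SD (an assumption)
    if sd == 0:
        raise ValueError("feature has zero variance and cannot be scaled")
    return [0.0 if v is None else (v - mu) / sd for v in column]


# Made-up feature with one participant lacking wearable data.
example_scaled = scale_then_impute([1.0, None, 3.0])
```

Mean imputation leaves the imputed participants at the center of each wearable feature, so those features contribute nothing to their predictions in a linear model.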
Time series analysis of activity and CGM
To examine the effects of the time series interaction between step counts and SSPG status on CGM mean values, linear models with permutation were fit for seven time windows spanning 24 hours (05:00–08:00, 08:00–11:00, 11:00–14:00, 14:00–17:00, 17:00–21:00, 21:00–24:00, and 24:00–05:00 the next day). Then, a shifted Pearson correlation analysis with permutation was performed between step counts and CGM mean values, by SSPG status subgroup, across the seven time windows.
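One way to read "shifted Pearson correlation with permutation" is to lag one series by a whole number of time windows and compare the observed correlation against correlations obtained after shuffling. The sketch below assumes that reading; the lag convention, number of permutations, two-sided test, function names, and toy values are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    if den == 0:
        raise ValueError("constant series has no defined correlation")
    return num / den


def shift(steps: Sequence[float], glucose: Sequence[float],
          lag: int) -> Tuple[List[float], List[float]]:
    """Pair steps in window t with CGM mean in window t + lag (lag >= 0)."""
    if lag == 0:
        return list(steps), list(glucose)
    return list(steps[:-lag]), list(glucose[lag:])


def permutation_pvalue(x: Sequence[float], y: Sequence[float],
                       n_perm: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Observed r and a two-sided permutation P value (shuffling y)."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy series in which CGM mean follows step count one window later.
steps = [1200, 300, 4500, 2500, 800, 6000, 150, 3000]
glucose = [100.0, 102.0, 93.0, 135.0, 115.0, 98.0, 150.0, 91.5]

x_lagged, y_lagged = shift(steps, glucose, lag=1)
r_obs, p_perm = permutation_pvalue(x_lagged, y_lagged)
```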
Correlation network analysis among time-matched lifestyle behaviors
To identify intercorrelations among the three lifestyle modalities, we used Spearman correlation with permutation. All correlation and interaction analyses were adjusted for multiple testing.
Model validation on an independent validation cohort
Lifestyle and metabolic test data from the independent validation cohort were first preprocessed to extract the same lifestyle (diet, sleep, and physical activity) features and metabolic subtypes as used in the main cohort. Beta-cell function (all 10 normal) and muscle IR (9 insulin-sensitive and 1 insulin-resistant) were highly skewed in distribution. Therefore, we focused on incretin function for validation, which was evenly split (normal, n = 5; dysfunction, n = 5), providing a balanced test set. Then, we applied the final trained prediction model derived from the main cohort to this independent dataset. This yielded an MSE, which was compared against a baseline error obtained by always predicting the largest group.
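The comparison between model error and the largest-group baseline can be sketched as follows. This is an illustration only: encoding the classes as 0 and 1 is an assumption, the function names are hypothetical, and the prediction vector is a made-up placeholder rather than study output.

```python
from collections import Counter
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_error(labels: Sequence[int]) -> float:
    """MSE obtained by always predicting the most frequent class."""
    majority, _ = Counter(labels).most_common(1)[0]
    return mse(labels, [majority] * len(labels))


# Class balance from the text: 5 normal (0) and 5 dysfunction (1).
validation_labels = [0] * 5 + [1] * 5
baseline_error = majority_baseline_error(validation_labels)

# Hypothetical placeholder predictions, NOT results from the study.
placeholder_pred = [0.2, 0.1, 0.4, 0.3, 0.6, 0.7, 0.9, 0.5, 0.8, 0.6]
placeholder_error = mse(validation_labels, placeholder_pred)
```

With an evenly split outcome, the largest-group baseline error is 0.5, so a model is informative on this cohort only if its MSE falls below that value.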