We used the same analysis strategy as the Canadian study16, and utilized two data sources: the CKB and CNHS (2015) (Supplementary Fig. 7). The CKB was used to develop a prediction model for the 5-year mortality risk that focused on the hazard of death related to lifestyle factors while accounting for other potential risk factors. The Ethical Review Committee of the Chinese Center for Disease Control and Prevention (CDC; Beijing, China), the Peking University Health Science Center (Beijing, China), and the Oxford Tropical Research Ethics Committee, University of Oxford (UK) approved the study. The CNHS was used to apply the above prediction model to estimate the period-based LE of the whole population under various scenarios and was approved by the Ethical Committee of China CDC. Due to sex differences in LE, we performed separate analyses for men and women.
Data for model development and validation
The CKB is a nationwide population-based prospective cohort study including over 0.5 million adults. The study design has been detailed elsewhere35. In brief, 512,723 participants aged 30–79 were enrolled during 2004–2008 from five urban and five rural regions, covering a wide range of risk exposures, disease patterns, and levels of economic development. Two periodic resurveys were conducted on about 5% of randomly chosen surviving participants in 2008 and 2013–2014. All participants signed informed consent forms.
All participants were followed up for mortality immediately after baseline enrollment by linkage to the National Disease Surveillance Points (DSP) system, supplemented with active follow-up. The loss to follow-up was <1% before censoring on December 31, 2018.
Data for simulation analysis
The CNHS (2015–2017) was the latest round of cross-sectional surveys for Chinese national nutrition and chronic disease surveillance, with nationally representative samples from 302 survey sites across 31 provincial-level administrative divisions in the mainland of China. The adult survey was completed in 2015. Participants were selected using a stratified multistage cluster sampling scheme, as previously reported36. All participants had completed written informed consent forms.
Candidate predictors
In both the CNHS and CKB baseline surveys, all participants completed a questionnaire and had physical measurements taken. Candidate predictors were identified using the following rules: (1) ever included in mortality risk prediction models in previous studies37,38,39,40,41; (2) available in the CKB and CNHS surveys. We eventually pre-specified 22 candidate predictors, including age, education level, marital status, smoking status, alcohol consumption, total physical activity level, dietary intake (fresh vegetables, fresh fruits, red meat, and fish/seafood), sleep duration, body mass index (BMI), systolic blood pressure, diastolic blood pressure, resting heart rate, self-rated health status, and personal medical histories (coronary heart disease, stroke, cancer, chronic obstructive pulmonary disease [COPD], asthma, and diabetes). The assessment of the candidate predictors is described in the Supplementary Material.
Definition of simulated scenarios
Referring to the action goals set in Healthy China 2030 and the recommended intakes of various foods in the Dietary Guidelines for Chinese Residents9,42, we predefined two simulated scenarios for each lifestyle factor: ideal and practical scenarios, with the goals set in the practical scenario being more achievable (Table 3). We also considered an alternative scenario for smoking, in which the whole population is assumed to never smoke. In addition, based on the Healthy China 2030 goal of reducing smoking prevalence in the whole population to 20%, we further assumed that this would be achieved only by reducing smoking prevalence in men from 51.6% to 40.1%. As a result, we recoded current smokers amounting to 11.5% of all men as quitters.
Due to the low prevalence of excessive alcohol consumption in the CNHS population (men: 11.9%; women: 0.4%), we did not set a practical scenario for this factor. However, we considered an alternative scenario for alcohol consumption, assuming that all people with excessive alcohol use were non-daily drinkers.
Total physical activity level was modeled as a continuous measure, with 1 MET-h/d representing an increase in moderate-intensity physical activity of about 15 min per day43. In addition to the ideal and practical scenarios, we also referred to the goal of Healthy China 2030, which is to increase the proportion of individuals who engage in regular physical activity to 40%. Regular physical activity is defined as engaging in physical activity of moderate intensity or higher ≥3 times per week, each time lasting ≥30 min. The proportions of men and women in the CNHS achieving this physical activity level were 2% and 1.2%, respectively, so we recoded an additional 38% of all men and 38.8% of all women, drawn from those who had not achieved this level, as having achieved it.
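The recoding step for a prevalence target can be illustrated with a minimal Python sketch. It is not the authors' procedure (the analyses were run in Stata): the function name `recode_to_target`, the random selection of individuals, and the unweighted hypothetical sample are all assumptions made for illustration, and survey weights are ignored.

```python
import random

def recode_to_target(achieved, target_share, seed=0):
    """Flip randomly chosen False entries to True until the share of True
    reaches target_share (unweighted; an illustrative assumption)."""
    flags = list(achieved)
    n = len(flags)
    needed = round(target_share * n) - sum(flags)
    if needed <= 0:
        return flags  # target already met; nothing to recode
    # Only individuals who have not yet achieved the level can be recoded.
    candidates = [i for i, ok in enumerate(flags) if not ok]
    for i in random.Random(seed).sample(candidates, needed):
        flags[i] = True
    return flags

# Hypothetical sample of 1,000 men, 2% of whom report regular physical activity.
men = [True] * 20 + [False] * 980
men_scenario = recode_to_target(men, 0.40)
```

In this hypothetical sample, 380 of the 1,000 men (38 percentage points) are recoded, which mirrors the gap between the observed 2% and the 40% goal.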
For dietary habits, the target values in the ideal scenario are the amounts of fresh fruits and fish/seafood that the population needs to consume to reach the upper limits of the dietary guideline recommendations. In contrast, those in the practical scenario are the amounts required to reach the lower limits of the recommendations. We did not simulate changes in red meat intake because the CNHS population already consumed more than the recommended level. The combined impact of changes in these lifestyle factors was estimated by setting all lifestyle factors to ideal or practical scenarios.
Statistical analysis
Development and validation of the prediction model
Two participants in the CKB were excluded due to missing BMI data, leaving 210,203 men and 302,518 women in the current study. We developed separate models for men and women. To reduce overfitting44, the models were fitted to a random two-thirds sample (derivation cohort: men 140,135 and women 201,678) and evaluated in the remaining one-third (validation cohort: men 70,068 and women 100,840). To improve prediction performance, we used the intake amount data from the second resurvey as a proxy measure of mean consumption for each frequency category at baseline45. Details on the calculations and results have been presented in the Supplementary Material and Supplementary Tables 8 and 9.
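A random two-thirds split of this kind can be sketched in Python as follows. The function name `split_two_thirds`, the seed, and the use of integer (floor) division are assumptions; the source does not state how the cut point was rounded, although floor division reproduces the cohort sizes reported above.

```python
import random

def split_two_thirds(ids, seed=2024):
    """Shuffle the identifiers and return (derivation, validation) lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 2 // 3  # assumption: round the cut point down
    return shuffled[:cut], shuffled[cut:]

# Placeholder identifiers standing in for the 210,203 men and 302,518 women.
men_derivation, men_validation = split_two_thirds(range(210_203))
women_derivation, women_validation = split_two_thirds(range(302_518))
```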
Cox proportional hazards models were used to develop models stratified by ten study regions, with follow-up time as the time scale. Participants were considered at risk from the baseline enrollment until the date of death, loss to follow-up, or December 31, 2018, whichever was earlier. We used backward elimination (P < 0.05 to retain) to select predictors. Due to the large sample size, most predictors were statistically significantly associated with mortality risk but only modestly improved the model’s predictive accuracy. We therefore subjected all remaining predictors to further selection with the Bayesian information criterion (BIC). If the BIC decreased when a variable was removed, that variable was excluded from the model46. To test the model’s stability, we repeated the selection using forward and bidirectional techniques instead of backward elimination. The two alternative strategies selected the same set of predictors.
We used restricted cubic splines to test possible non-linear relationships between continuous variables and mortality risk. If non-linearity was detected, the variables were transformed using the natural logarithm or converted to categorical variables, and the form that gave the model the smallest BIC was chosen. All continuous variables were mean-centered to control for multicollinearity and provide a more straightforward interpretation of the regression estimates. Linear and squared terms of age were included to fit the non-linear increase in death hazard in older ages47,48. All two-way interactions were considered, but none significantly improved model performance. Finally, 17 and 16 predictors were kept in the models for men and women, respectively, both including six lifestyle factors, namely smoking status, alcohol consumption, total physical activity level, and daily intake amount of fresh fruits, red meat, and fish/seafood. Baseline survival at 5 years, \(S_{0}(5)\), was estimated by pooling the region-specific \(S_{0}(5)\) values, weighted by the number of deaths by 5 years49. Briefly, the 5-year all-cause mortality risk for an individual with risk factor values X is:
$$F(5,\mathbf{X})=1-S_{0}(5)^{\exp (V)}$$
where \(F(5,\mathbf{X})\) is the absolute risk of mortality in 5 years, \(S_{0}(5)\) is the baseline survival probability at 5 years, and V equals \(\beta_{1}X_{1}+\beta_{2}X_{2}+\ldots+\beta_{n}X_{n}\), where \(X_{i}\) is the value of predictor i and \(\beta_{i}\) is the beta coefficient for predictor i.
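As a worked illustration of the risk formula, the following Python sketch computes the 5-year risk from a baseline survival probability and a linear predictor. The function name `five_year_risk` and all numerical values are hypothetical and are not taken from the fitted models.

```python
import math

def five_year_risk(s0_5, betas, x):
    """Absolute 5-year mortality risk: 1 - S0(5) ** exp(V)."""
    if len(betas) != len(x):
        raise ValueError("betas and x must have the same length")
    # Linear predictor V = beta_1 * X_1 + ... + beta_n * X_n
    v = sum(b * xi for b, xi in zip(betas, x))
    return 1.0 - s0_5 ** math.exp(v)

# Hypothetical values, for illustration only (not estimates from the study).
baseline_survival = 0.98
coefs = [0.5, -0.2]
reference = five_year_risk(baseline_survival, coefs, [0.0, 0.0])
exposed = five_year_risk(baseline_survival, coefs, [1.0, 0.0])
```

With all predictors at zero (the mean, for mean-centered continuous variables), V is 0 and the risk reduces to 1 minus the baseline survival. A positive coefficient raises the risk above that reference value.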
In the validation cohort, discrimination performance was assessed using the AUC, also known as the c-index. Calibration performance was graphically assessed by comparing the mean predicted risks over 5 years to the observed risks across deciles of predicted risks. The observed risks were obtained using Kaplan–Meier analyses. We also repeated the model development process using the whole population rather than the derivation cohort to test the model’s stability and parameter accuracy. We further examined whether the lifestyle effects were mediated by other variables by comparing three progressively adjusted models: (1) age and lifestyle factors only; (2) adding sociodemographic factors, including education level and marital status; (3) further adjusting for health indicators comprising self-rated health, baseline stroke, cancer, COPD, diabetes, BMI, systolic blood pressure, and resting heart rate.
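For readers unfamiliar with the concordance measure, a minimal Python sketch of one common estimator (Harrell's C) is given here. Which estimator the authors used is not stated in the text, so this choice, the function name `harrell_c`, and the toy data are assumptions. The quadratic loop is only suitable for small examples.

```python
def harrell_c(times, events, risks):
    """Share of comparable pairs in which the subject who died earlier
    had the higher predicted risk (ties in risk count as one half)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            # Comparable pair: subject i died before subject j's last observed time.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: follow-up years, death indicator (1 = died), predicted 5-year risk.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
c_perfect = harrell_c(toy_times, toy_events, [0.9, 0.6, 0.4, 0.1])
c_reversed = harrell_c(toy_times, toy_events, [0.1, 0.4, 0.6, 0.9])
```

A value of 1 means the predicted risks rank every comparable pair correctly, 0.5 corresponds to chance, and 0 means the ranking is fully reversed.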
Simulation of the impact of changes in lifestyle prevalence on LE for the whole population
The constructed model was applied to the CNHS population for simulation analysis. Participants were excluded if: (1) they could not be weighted due to lack of address information (n = 62); (2) they were younger than 30 years of age (n = 7289; in accordance with the CKB population used for model development); (3) they had missing data on the predictors (n = 3605); (4) they had implausible food intakes (n = 2230; fresh fruits >600 g/d, red meat >400 g/d, fish/seafood >200 g/d; the cut-off values were chosen based on the upper 99th percentile value). After these exclusions, 31,515 men and 36,049 women remained in the analysis.
With reference to the Canadian study cited above, we assessed the impact of changes in lifestyle prevalence using a cause-deleted period life table approach16,50. Unlike a typical period life table beginning with age- and sex-specific mortality rates, which are then converted to age- and sex-specific mortality risk, we constructed sex-specific 5-year abridged period life tables (30 to 85 years) using weighted mortality risk derived from the prediction model. When the prevalence of all lifestyle factors of interest remained at the level of the CNHS population, the obtained LE estimates were designated as the LE under “base scenario”. The LE under each simulated scenario was calculated using the same method and then compared to the base scenario to estimate the impact of lifestyle intervention on the whole population.
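The life table calculation can be sketched in Python under simplifying assumptions. The sketch takes survey-weighted mean predicted risks for the eleven 5-year age groups from 30–34 to 80–84, assumes deaths are spread evenly within each interval, and truncates the table at age 85, so it returns expected years lived between ages 30 and 85. How the authors closed the table is not stated here. The function names and all risk values are hypothetical.

```python
def weighted_mean_risk(risks, weights):
    """Survey-weighted mean of predicted 5-year risks within one age group."""
    return sum(r * w for r, w in zip(risks, weights)) / sum(weights)

def life_years_30_to_85(q_by_age):
    """Expected years lived between ages 30 and 85, given 5-year death
    probabilities for age groups 30-34, 35-39, ..., 80-84."""
    width = 5.0
    alive = 1.0  # proportion alive at age 30 (radix of 1)
    person_years = 0.0
    for q in q_by_age:
        if not 0.0 <= q <= 1.0:
            raise ValueError("death probabilities must lie in [0, 1]")
        survivors = alive * (1.0 - q)
        # Assumption: deaths occur, on average, at the midpoint of the interval.
        person_years += width * (alive + survivors) / 2.0
        alive = survivors
    return person_years

# Hypothetical 5-year risks for the base scenario (not study estimates).
base_q = [0.005, 0.008, 0.012, 0.02, 0.03, 0.05, 0.08, 0.12, 0.19, 0.28, 0.40]
# Hypothetical scenario in which every age-specific risk is 10% lower.
scenario_q = [q * 0.9 for q in base_q]
gain = life_years_30_to_85(scenario_q) - life_years_30_to_85(base_q)
```

The difference `gain` plays the role of the change in LE between a simulated scenario and the base scenario.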
The CIs of LE were estimated using parametric bootstrapping with 500 runs, combining the stochastic error from the model parameters and the exposure variability in the CNHS population.
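A rough Python sketch of how the two sources of uncertainty might be combined in one bootstrap loop is given here. It is illustrative only and makes several assumptions not found in the text: coefficients are drawn independently from normal distributions (a full analysis would use the coefficient covariance matrix), individuals are resampled without survey weights, the statistic is the mean predicted risk rather than LE, and the interval is taken from the 2.5th and 97.5th percentiles. The function name `bootstrap_ci` and all inputs are hypothetical.

```python
import math
import random

def bootstrap_ci(rows, betas, ses, s0_5, runs=500, seed=1):
    """Percentile interval for the mean predicted 5-year risk."""
    rng = random.Random(seed)
    stats = []
    for _ in range(runs):
        # Parameter uncertainty: draw each coefficient from N(beta, se^2).
        drawn = [rng.gauss(b, se) for b, se in zip(betas, ses)]
        # Exposure variability: resample individuals with replacement.
        sample = rng.choices(rows, k=len(rows))
        risks = [
            1.0 - s0_5 ** math.exp(sum(b * x for b, x in zip(drawn, row)))
            for row in sample
        ]
        stats.append(sum(risks) / len(risks))
    stats.sort()
    # Approximate 2.5th and 97.5th percentiles of the bootstrap statistics.
    return stats[int(0.025 * runs)], stats[int(0.975 * runs) - 1]

# Hypothetical single-predictor data (1 = exposed, 0 = unexposed).
toy_rows = [[0.0], [1.0], [1.0], [0.0]]
lower, upper = bootstrap_ci(toy_rows, betas=[0.4], ses=[0.05], s0_5=0.97)
```

Replacing the mean-risk statistic with a life table calculation on each resample would give an interval for LE instead.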
We also demonstrated how lifestyle changes in specific age groups affect the LE of the whole population. Taking smoking as an example, we first used the 30–39 age group as the target population for tobacco control, then changed a certain proportion of individuals from a current to a former smoking status so that the prevalence of current smoking in this age group achieved the simulated scenario. Next, we gradually expanded the age range of the target population by 10 years (30–49, 30–59, 30–69, 30–79, and the whole population) and repeated the calculation of LE. All statistical analyses were performed using Stata (version 15.0, StataCorp), and graphs were plotted using R version 4.0.3.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
