Data sources and research design
The UK Biobank is a large-scale prospective study that recruited over 500,000 middle-aged participants from April 2007 to December 2010. For this analysis, we included 501,483 participants, of whom 907 had a baseline diagnosis of PD (Supplementary Fig. 1). Ethical approval for the UK Biobank study was granted by the North West Multicenter Research Ethics Committee.
Definition of the digestive diseases
This study focused on 14 specific diseases of the digestive system that are commonly encountered and widely studied in clinical research. In addition to these individual diseases, we defined a composite outcome, termed overall digestive diseases, which included individuals diagnosed with any of these conditions at baseline. Diagnoses were identified from a combination of diagnostic codes sourced from national inpatient datasets, primary care datasets, cancer registries, and self-reported medical conditions. Detailed diagnostic codes are provided in Supplementary Table 1. According to the annual audit committee report, the accuracy of these diagnostic codes is >89% (https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/319355/PbR_DAF_costing_briefing_June_2014.pdf).
Ascertainment of PD
The date of PD onset was determined using the algorithm recommended by UK Biobank (https://biobank.ndph.ox.ac.uk/ukb/ukb/docs/alg_outcome_main.pdf). Disease information was obtained via linkage to inpatient electronic health records and death registries, including data from Hospital Episode Statistics (HES) in England, the Scottish Morbidity Record, and the Patient Episode Database for Wales. Mortality data, including date and cause of death, were sourced from NHS Digital (England and Wales) and the NHS Central Register (Scotland). Follow-up time was defined as the interval from baseline assessment to the earliest of PD diagnosis, death, loss to follow-up, or censoring. Censoring dates were October 31, 2022 (England), August 31, 2022 (Scotland), and May 31, 2022 (Wales). PD cases were identified using the ICD-10 code G20. Further details on diagnostic criteria and associated codes are provided in Supplementary Table 2.
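The follow-up definition above can be sketched in a few lines. This is a minimal illustration in Python rather than the R code used for the study; the function name `follow_up`, its argument names, and the 365.25-day year are assumptions made for the example, while the three censoring dates are those stated in the text.

```python
from datetime import date

# Region-specific administrative censoring dates, as stated in the text.
CENSOR_DATES = {
    "England": date(2022, 10, 31),
    "Scotland": date(2022, 8, 31),
    "Wales": date(2022, 5, 31),
}


def follow_up(baseline, region, pd_date=None, death_date=None, lost_date=None):
    """Return (years_of_follow_up, pd_event) for one participant.

    Follow-up ends at the earliest of PD diagnosis, death, loss to
    follow-up, or the regional censoring date. Argument names are
    hypothetical, not UK Biobank field names.
    """
    censor = CENSOR_DATES[region]
    candidates = [d for d in (pd_date, death_date, lost_date, censor) if d is not None]
    end = min(candidates)
    # A PD event is counted only if the diagnosis is what ended follow-up.
    pd_event = pd_date is not None and pd_date == end
    years = (end - baseline).days / 365.25  # assumed year length
    return years, pd_event
```

A participant recruited in Wales on 1 January 2008 and diagnosed with PD on 1 January 2015 would contribute about 7 years of follow-up and count as an event; a diagnosis recorded after the regional censoring date would not count.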
Covariate assessment
The analysis incorporated several covariates: age at recruitment, sex (female or male), BMI, Townsend Deprivation Index (TDI), ethnicity (White, Asian, Black, and Other), alcohol consumption (never, previous, and current), physical activity (whether a person met the UK physical activity guidelines of 150 min of walking or moderate activity per week or 75 min of vigorous activity), healthy diet, educational attainment (any school degree, college or university degree, vocational, and other), smoking status (never, adolescence, adulthood, and childhood), hypertension (normal, elevated, stage 1 hypertension, and stage 2 hypertension), and family history of PD.
Definition of nonsteroidal anti-inflammatory drugs (NSAIDs) and proton pump inhibitors (PPIs)
During the baseline assessment, the routine use of NSAIDs and PPIs among participants was initially assessed via a touchscreen questionnaire and subsequently confirmed during an oral interview conducted by trained staff. In the touchscreen questionnaire assessing NSAID use, participants were asked whether they regularly consumed any of the following medications: Aspirin, Ibuprofen (e.g., Nurofen), Paracetamol, Ranitidine (e.g., Zantac), Omeprazole (e.g., Zanprol), Laxatives (e.g., Dulcolax, Senokot), None of the above, Do not know, or Prefer not to answer32.
For the assessment of PPI use, participants were queried with the question, “Do you regularly take any prescription medications?” “Regular use” was defined as taking the medication on most days of the week over the past 4 weeks. If participants selected “Yes” or “Unsure,” they were then asked by the interviewer, “In the touchscreen questionnaire, you indicated that you are taking regular prescription medications. Could you please specify which medications these are?” Information regarding PPI use was recorded in free text format, with the types of PPIs documented including omeprazole, lansoprazole, pantoprazole, rabeprazole, and esomeprazole33. Detailed definitions and descriptions of the covariates are provided in Supplementary Table 2.
Assessment of sleep patterns and dietary diversity
The sleep patterns were evaluated using five key sleep behaviors: sleep duration, circadian preference, insomnia, snoring, and daytime sleepiness, as assessed through a touchscreen questionnaire34. Each behavior was scored as either 1 (healthy) or 0 (unhealthy) based on previously established health criteria. The cumulative sleep health score ranged from 0 to 5, with higher scores indicating healthier sleep patterns. The participants were then classified into two categories: “ideal sleep pattern” and “poor sleep pattern,” based on the median sleep health score.
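As a rough sketch of the scoring step, the snippet below sums five binary indicators and splits a cohort at the median. It is written in Python for illustration only. The behavior keys, the function names, and the choice to assign scores equal to the median to the "ideal" group are assumptions; the text does not state how ties at the median were handled, and the criteria that define a healthy behavior are taken as already applied.

```python
from statistics import median

# Hypothetical keys for the five behaviors named in the text.
BEHAVIORS = ("duration", "chronotype", "insomnia", "snoring", "daytime_sleepiness")


def sleep_score(flags):
    """Sum five 0/1 indicators (1 = healthy) into a score from 0 to 5."""
    return sum(int(bool(flags[b])) for b in BEHAVIORS)


def classify_by_median(scores):
    """Label each score 'ideal' or 'poor' relative to the cohort median.

    Assumption: scores equal to the median are labeled 'ideal'.
    """
    cut = median(scores)
    return ["ideal" if s >= cut else "poor" for s in scores]
```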
Dietary diversity was assessed by examining the frequency of food intake and calculating the Shannon Diversity Index. This index quantifies dietary diversity by calculating the richness of food types and the balance of their consumption frequencies. The higher the index, the more diverse the diet35,36. The formula for calculating the metric is as follows:
$$H=-\sum_{i=1}^{S}P_{i}\,\ln P_{i}$$
$$\begin{array}{l}\text{where}\\ H=\text{Shannon diversity index}\\ P_{i}=\text{fraction of the entire population made up of species } i\\ \ln P_{i}=\text{natural log of } P_{i}\\ S=\text{number of species encountered}\\ \sum=\text{sum from species } 1 \text{ to species } S\end{array}$$
Foods were categorized by consumption frequency and assigned integer codes (1 for the lowest frequency and N for the highest). Based on the resulting Shannon index, participants were categorized into high and low dietary diversity groups.
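To make the Shannon formula concrete for diet data, the sketch below treats each food type as a "species" and takes each proportion as that food's frequency code divided by the sum of all codes. That mapping from frequency codes to proportions is an assumption made for illustration, as is the function name `shannon_index`; the study's exact weighting is not specified in the text.

```python
import math


def shannon_index(freqs):
    """Shannon diversity H = -sum(p_i * ln p_i) over food types.

    `freqs` holds one non-negative frequency value per food type.
    Assumption: p_i is the food's value divided by the total.
    Foods with zero frequency contribute nothing to the sum.
    """
    total = sum(freqs)
    if total <= 0:
        return 0.0
    h = 0.0
    for f in freqs:
        if f > 0:
            p = f / total
            h -= p * math.log(p)
    return h
```

Four foods eaten equally often give H = ln 4, the maximum for four food types, whereas a diet concentrated on a single food gives H = 0.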
Statistical analysis
The UK Biobank dataset contains a substantial amount of missing data, which could reduce the power to detect associations and introduce bias. To address this concern, we applied random forest imputation37, a machine learning-based multivariate technique that uses all covariates to improve the prediction of missing values. Missing data related to observed variables can lead to greater bias in complete case analysis38; thus, multiple imputation provides more reliable estimates. Given the extent of missing data in our sample, imputation was deemed essential to mitigate potential bias. The results of the imputation analysis are detailed in the supplementary materials (Supplementary Table 3).
We used chi-square (χ2) tests for categorical variables and analysis of variance (ANOVA) for continuous variables to evaluate differences in baseline characteristics across digestive diseases. We then applied Cox proportional hazards regression models to estimate the associations between overall digestive diseases and the risk of PD. Given the potential for co-occurrence among digestive diseases, we also assessed the association between the number of digestive diseases and PD risk. In addition, we applied Cox models to estimate the associations of individual digestive diseases with PD risk. Three models were fitted: (1) Model 1 adjusted for age and sex; (2) Model 2 further adjusted for ethnicity, Townsend deprivation index, body mass index, educational attainment, healthy diet, smoking status, alcohol consumption, physical activity, and hypertension; (3) Model 3 additionally adjusted for family history of PD. The false discovery rate (FDR) procedure was used to adjust P-values39.
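For readers unfamiliar with FDR adjustment, the following sketch shows one common variant, the Benjamini-Hochberg step-up procedure. The text does not name the specific FDR procedure used, so treating it as Benjamini-Hochberg is an assumption, and the function name `bh_adjust` is hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in input order.

    Each P-value is scaled by m / rank (rank 1 = smallest), then a
    running minimum from the largest rank downward enforces
    monotonicity and caps adjusted values at 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With 14 individual diseases tested, an adjusted value below 0.05 would then be read as significant at a 5% false discovery rate.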
To estimate the proportion of PD cases that could potentially be prevented by eliminating the 11 digestive diseases significantly associated with PD, we used the causalPAF R package to calculate the population attributable fraction (PAF) for these conditions. The PAF calculation was based on Levin’s formula40, which requires relative risk (RR) estimates and the prevalence of each risk factor41.
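Levin's formula itself is short: with exposure prevalence p and relative risk RR, PAF = p(RR − 1) / (1 + p(RR − 1)). The sketch below implements only this textbook formula in Python; it does not reproduce the package-based calculation used in the study, and the function name `levin_paf` and the example numbers are hypothetical.

```python
def levin_paf(prevalence, rr):
    """Population attributable fraction by Levin's formula.

    prevalence: proportion of the population exposed (0 to 1).
    rr: relative risk of the outcome in exposed vs. unexposed (> 0).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    excess = prevalence * (rr - 1.0)
    return excess / (1.0 + excess)
```

For a purely illustrative exposure with 20% prevalence and RR = 1.5, the formula gives 0.1 / 1.1, or about 9.1% of cases attributable to the exposure.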
We further examined the associations between PD risk and specific digestive diseases—gastritis, duodenitis, pancreatitis and appendicitis—within both acute and chronic subgroups using Cox proportional hazards models.
Subgroup analyses and interaction analyses were conducted to explore the potential effect modifications by sex, ethnicity, alcohol consumption, educational attainment, smoking status, and hypertension.
In addition, we investigated the joint effects of lifestyle factors (sleep patterns and dietary diversity) and digestive diseases on PD risk42. First, we used Cox proportional hazards regression to evaluate associations between these lifestyle factors and digestive diseases. Next, we combined each of the 11 digestive diseases significantly associated with PD with each lifestyle factor to form new variables, categorized into four groups based on a 2 × 2 matrix of digestive disease exposure (no or yes) and lifestyle factor risk (low-risk or high-risk). Low-risk lifestyle factors were defined as “ideal sleep pattern” and “high dietary diversity.” Likelihood ratio tests were performed to examine the interaction between digestive diseases and lifestyle factors in relation to PD risk, thereby providing further insights into their collective role in PD pathogenesis.
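The 2 × 2 grouping can be illustrated with a small helper that maps each participant to one of the four joint categories. The function names and label strings are hypothetical and chosen only for this example; which category served as the reference group in the Cox models is not stated in the text.

```python
from collections import Counter


def joint_group(has_digestive_disease, low_risk_lifestyle):
    """Return one of four joint-exposure labels from the 2 x 2 matrix."""
    disease = "disease" if has_digestive_disease else "no disease"
    lifestyle = "low-risk" if low_risk_lifestyle else "high-risk"
    return f"{disease} / {lifestyle}"


def group_counts(participants):
    """Count participants per joint group.

    `participants` is an iterable of (has_disease, low_risk) pairs.
    """
    return Counter(joint_group(d, l) for d, l in participants)
```

The resulting four-level variable can then enter a survival model as a single categorical covariate.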
All analyses were two-tailed, and p < 0.05 was considered to indicate statistical significance. All statistical analyses were conducted using R version 4.2.2.
Sensitivity analysis
To evaluate the robustness of our findings, we performed a series of sensitivity analyses: (1) We restricted the analysis to participants aged >55 years at recruitment (n = 290,625) to control for potential immortal time bias, which occurs when follow-up periods without event risk are incorrectly included43. (2) To minimize reverse causation, we excluded participants who developed PD within the first 2 years following baseline assessment. (3) We employed a competing risk model to account for death and loss to follow-up as competing events, thereby providing a more comprehensive assessment of PD risk44. (4) We adjusted for the use of NSAIDs and PPIs to account for their potential confounding effects. NSAIDs are commonly used among PD patients and may influence disease risk, while PPIs are frequently prescribed for digestive diseases and have been linked to PD45.
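The reverse-causation analysis in (2) amounts to a simple filter on the analytic cohort, sketched below. The record keys `years` and `pd_event`, the function name, and the use of a strict "less than 2 years" boundary are assumptions made for illustration; the text does not say how a diagnosis at exactly 2 years was treated.

```python
def exclude_early_cases(records, lag_years=2.0):
    """Drop participants diagnosed with PD within `lag_years` of baseline.

    Each record is a dict with hypothetical keys:
      'years'    - follow-up time in years
      'pd_event' - True if follow-up ended with a PD diagnosis
    Participants censored early without PD are retained.
    """
    return [r for r in records if not (r["pd_event"] and r["years"] < lag_years)]
```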
