All methods were performed in accordance with the relevant guidelines and regulations.
Study design and participants
Data were obtained from the China Health and Retirement Longitudinal Study (CHARLS), a nationally representative longitudinal survey of Chinese middle-aged and older adults conducted by Peking University. The CHARLS baseline survey was conducted from 2011 to 2012 and covered 150 counties in 28 provinces of China. A wide range of information on socioeconomic status and health circumstances, as well as anthropometric and laboratory measurements, was collected [21]. Participants were followed up in 2013, 2015, and 2018 through face-to-face computer-assisted personal interviews (CAPI). Detailed descriptions of the survey design and procedures are available elsewhere [21].
In this study, we restricted our analysis to participants aged 60 years and older who had no ADL disability at the CHARLS baseline survey (2011 wave). At baseline, 2840 participants with missing information on key variables, such as all physical performance measures and ADL status, were excluded, and 4303 well-functioning participants were included in the analyses. Compared with the excluded participants, the included participants were older, more likely to be female, and had worse demographic characteristics, chronic conditions, and health behaviors (Table S1). After a 4-year follow-up, 2111 participants were lost to follow-up and 2192 reported complete information on the ADL outcome; the two groups had similar baseline characteristics (Fig. S1 and Table S2).
All participants signed informed consent at the time of participation, and this study was approved by the Institutional Review Board of Peking University (IRB00001052–11014).
Outcome
ADL was evaluated with the Katz ADL scale, which covers six daily self-care tasks: taking a bath, eating, getting in and out of bed, dressing, using the toilet, and maintaining continence of urine and feces [22]. In this study, participants were classified as having ADL disability if they reported needing any help with at least one of these ADL items [23].
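The outcome definition above can be sketched as a small Python function. This is only an illustration of the stated rule (help needed with at least one of six items); the item labels in `ADL_ITEMS` and the function name are hypothetical and do not correspond to CHARLS variable names.

```python
# Hypothetical labels for the six Katz ADL items listed in the text.
ADL_ITEMS = ("bathing", "eating", "transferring",
             "dressing", "toileting", "continence")


def has_adl_disability(needs_help):
    """Return True if help is needed with at least one ADL item.

    needs_help maps each item label to True (needs any help) or False.
    """
    missing = [item for item in ADL_ITEMS if item not in needs_help]
    if missing:
        raise ValueError(f"missing ADL items: {missing}")
    return any(needs_help[item] for item in ADL_ITEMS)
```

For example, a participant who needs help only with bathing would be classified as having ADL disability, whereas one who needs no help with any item would not.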
Physical performances—handgrip strength
We assessed upper limb function with the handgrip strength test. Subjects were asked to stand and hold the dynamometer at a right angle (90°), squeezing the handle as hard as possible for a few seconds. Each hand was measured twice in turn. The maximum value (kg) across all four attempts was used as the measure of handgrip strength [24].
Physical performances—the short physical performance battery (SPPB)
We evaluated lower limb function with the SPPB, which comprises three tests: balance, gait speed, and repeated chair stands. In the balance test, participants were asked to take two of the following balance tests: side-by-side stand, semi-tandem stand, and full tandem stand. All participants were first asked to perform a semi-tandem stand. If they were able to hold the semi-tandem stand for 10 s, they were then asked to perform the full tandem stand for 30 s (participants aged 70 or above) or 60 s (participants aged less than 70). Otherwise, they were asked to perform a side-by-side stand for about 10 s [25]. In the gait speed test, subjects walked twice (there and back) along a 2.5-m straight course at their usual speed, and the time taken was recorded [25,26,27,28]. For the repeated chair stands test, subjects were asked to stand up from and sit down on a chair five times as quickly as possible with their arms crossed over their chest. The time was measured from the moment the subjects started to stand up until they were fully standing after rising for the fifth time [25]. Each test was scored from 0 to 4. The balance test score depended on the hierarchical combination of performance on the three kinds of balance tests. In the other two tests, a score of 0 was assigned to those who were unable to complete the test, and scores from 1 to 4 were assigned according to the quartiles of the time required to complete the test [29]. The SPPB score was obtained by summing the scores for the balance, gait speed, and repeated chair stands tests, and ranged from 0 (worst performance) to 12 (best performance).
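The scoring of the two timed tests and the SPPB total can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring code used in the study: it assumes that shorter times are better (the fastest quartile scores 4), that quartile cut-offs are computed from the participants who completed the test, and that the balance score (0 to 4) has already been derived. The function names are hypothetical.

```python
import statistics


def quartile_scores(times):
    """Score a timed test from 0 to 4.

    None means the test could not be completed (score 0). Completed
    tests score 1 to 4 by quartile of time; ASSUMPTION: the fastest
    quartile receives 4 and the slowest receives 1.
    """
    completed = [t for t in times if t is not None]
    q1, q2, q3 = statistics.quantiles(completed, n=4)
    scores = []
    for t in times:
        if t is None:
            scores.append(0)
        elif t <= q1:
            scores.append(4)
        elif t <= q2:
            scores.append(3)
        elif t <= q3:
            scores.append(2)
        else:
            scores.append(1)
    return scores


def sppb_total(balance, gait, chair):
    """Sum the three component scores (each 0 to 4) into a 0 to 12 total."""
    for score in (balance, gait, chair):
        if not 0 <= score <= 4:
            raise ValueError("each component score must be between 0 and 4")
    return balance + gait + chair
```

Note that `statistics.quantiles` uses its default "exclusive" interpolation method here; other quartile conventions can shift the cut-offs slightly in small samples.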
Physical performances—gait speed
Gait speed is one component of the SPPB, and the test procedure is described in the SPPB section. The average speed of the two trials was used in the analysis [25].
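As a small worked example of the gait speed calculation, the sketch below converts the two recorded walking times over the 2.5-m course into speeds and averages them. It assumes that "average speed" means the arithmetic mean of the two trial speeds (rather than total distance divided by total time); the function name is hypothetical.

```python
COURSE_LENGTH_M = 2.5  # course length stated in the SPPB section


def gait_speed_m_per_s(time_trial_1, time_trial_2):
    """Mean of the two trial speeds in m/s.

    ASSUMPTION: each trial speed is computed separately and the two
    speeds are then averaged.
    """
    if time_trial_1 <= 0 or time_trial_2 <= 0:
        raise ValueError("walking times must be positive")
    speed_1 = COURSE_LENGTH_M / time_trial_1
    speed_2 = COURSE_LENGTH_M / time_trial_2
    return (speed_1 + speed_2) / 2
```

For walking times of 2.5 s and 5.0 s, the trial speeds are 1.0 and 0.5 m/s, giving an average of 0.75 m/s.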
Other predictors
The following variables were also considered as predictors: age, gender, marital status, education, social activity, drinking, smoking, night sleep, comorbidities, body mass index (BMI), self-assessment of health conditions, depressive symptoms, and cognitive function [30, 31]. Age was classified into four groups: 60–64, 65–69, 70–74, and 75 years or older [32]. Marital status was categorized as married or cohabiting, widowed, or other (separated, divorced, or never married). Education was categorized into five levels: illiterate, primary school, middle school, high school, and college and above [32]. Social activity frequency was classified as never, not regularly, almost weekly, or almost daily [28]. Drinking was divided into four categories: never, quit drinking, less than once a month, and more than once a month [28]. Smoking was classified into four categories: never, quit smoking, less than 20 cigarettes a day, and more than 20 cigarettes a day [28]. Night sleep duration was classified as less than 6 h, 6 to 9 h, or more than 9 h [28]. Comorbidity was defined as having two or more self-reported chronic diseases [28]. BMI was classified according to WHO cut-off points for Chinese: underweight (BMI < 18.5 kg/m²), normal weight (BMI 18.5 to 23.9 kg/m²), overweight (BMI 24 to 27.9 kg/m²), and obese (BMI ≥ 28 kg/m²) [33]. Self-reported health condition was classified as good, fair, poor, or very poor. Cognitive function was assessed in two domains, episodic memory and mental intactness, with global cognitive scores ranging from 0 to 21 [25, 34]. The episodic memory score was defined as the average of the immediate and delayed recall scores and ranged from 0 to 10 [34].
In CHARLS, the mental intactness tests included serial subtraction of 7 from 100 (up to five times), naming the date (month, day, and year), the day of the week, and the season of the year, and copying a picture of intersecting pentagons. Answers to these questions were summed into a mental intactness score ranging from 0 to 11 [34]. Depressive symptoms were measured using the 10-item Center for Epidemiologic Studies Depression Scale (CES-D-10; score range 0 to 30). Participants with scores ≥10 were considered to have significant depressive symptoms [35].
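Three of the predictor definitions above can be illustrated with a short sketch. It rests on stated assumptions: BMI values falling between the listed bounds (for example 23.95) are assigned using the lower limit of the next category, and the global cognitive score is taken as the sum of the episodic memory and mental intactness scores, which is consistent with the stated ranges (0 to 10 plus 0 to 11 gives 0 to 21) but is not spelled out in the text. All function names are hypothetical.

```python
def bmi_category(bmi):
    """Classify BMI (kg/m^2) using the cut-offs given in the text."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 24:   # ASSUMPTION: values between 23.9 and 24 count as normal
        return "normal weight"
    if bmi < 28:   # ASSUMPTION: values between 27.9 and 28 count as overweight
        return "overweight"
    return "obese"


def global_cognition(immediate_recall, delayed_recall, mental_intactness):
    """Global cognitive score (0 to 21).

    Episodic memory is the mean of immediate and delayed recall (0 to 10).
    ASSUMPTION: the global score is episodic memory plus mental
    intactness (0 to 11).
    """
    episodic_memory = (immediate_recall + delayed_recall) / 2
    return episodic_memory + mental_intactness


def has_depressive_symptoms(cesd10_score):
    """CES-D-10 score of 10 or more indicates significant symptoms."""
    return cesd10_score >= 10
```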
Statistical analysis
A descriptive analysis was performed to characterize the study population. Continuous variables, which were not normally distributed, were reported as medians and quartiles, and categorical variables were reported as numbers and percentages. We compared baseline characteristics between ADL status groups using the Kruskal-Wallis test for continuous variables and the chi-square test for categorical variables.
We established six logistic regression models. Model 1 (the fundamental model) was built using backward stepwise selection based on the Akaike information criterion (AIC). This procedure retained seven predictors (gender, age, smoking, self-reported health condition, BMI, depressive symptoms, and cognitive function) from 13 candidate predictors (age, gender, marital status, education, social activity, drinking, smoking, night sleep, comorbidity, self-reported health condition, BMI, depressive symptoms, and cognitive function). Five physical performance-based models were then built on Model 1 by adding handgrip strength (Model 2), SPPB (Model 3), gait speed (Model 4), handgrip strength plus SPPB (Model 5), and handgrip strength plus gait speed (Model 6). In our study, Model 2 represented the upper limb model, Models 3 and 4 represented the lower limb models, and Models 5 and 6 served as comprehensive models combining both upper and lower limbs. For the predictors in each model, we report odds ratios (ORs) and the corresponding 95% confidence intervals (CIs). Moreover, we converted each model into a nomogram so that an individual's risk probability can be calculated as a concrete number.
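The link between logistic regression coefficients, odds ratios, and the individual risk probabilities that a nomogram displays can be sketched as follows. This is a generic illustration of standard logistic regression arithmetic, not the fitted models from this study: the Wald-type interval (z = 1.96) is an assumption about how the CIs were computed, and any coefficient values passed in are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a log-odds coefficient.

    ASSUMPTION: a Wald-type 95% interval, exp(beta -/+ z * se).
    """
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


def predicted_risk(intercept, coefs, values):
    """Predicted probability from a logistic model.

    coefs and values map predictor names to coefficients and to the
    individual's predictor values, respectively.
    """
    linear_predictor = intercept + sum(
        coefs[name] * values[name] for name in coefs
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))
```

A nomogram is a graphical rescaling of the same linear predictor: each predictor contributes points in proportion to its coefficient times its value, and the total maps to the probability returned by `predicted_risk`.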
Model performance was evaluated in terms of discrimination, calibration, and clinical utility. Discrimination was quantified by the concordance index (C-index), which is equivalent to the area under the receiver-operating characteristic curve (AUC) in logistic regression. An AUC closer to 1 indicates better discriminative ability, whereas an AUC closer to 0.5 indicates discrimination no better than chance [36]. A C-index ≥0.70 was considered to indicate good discrimination [37]. We used calibration plots to assess model calibration by comparing the predicted probabilities with the observed outcomes. The 45-degree line represents perfect calibration, and curves closer to this line indicate better calibration [38]. Decision curve analysis (DCA) was conducted to determine the clinical utility of each model by quantifying the net benefit across threshold probabilities [39]; an intervention would be made only when the predicted probability of the outcome reached the threshold value. Moreover, we validated our models internally using 1000 bootstrap resamples to generate bootstrap-corrected C-indexes and calibration plots.
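Two of the performance measures described above can be written out explicitly. The sketch below computes the C-index as the proportion of event/non-event pairs in which the event has the higher predicted risk (ties count one half), and the net benefit at a single threshold using the standard decision-curve formula, true positives per person minus false positives per person weighted by the odds of the threshold. It is an illustration in plain Python with hypothetical function names, not the R routines used in the study.

```python
def c_index(y_true, y_prob):
    """Concordance index for a binary outcome (equals the AUC)."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    nonevents = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1.0
            elif p_event == p_nonevent:
                concordant += 0.5  # tied predictions count one half
    return concordant / (len(events) * len(nonevents))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob)
                    if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies.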
We also used the Integrated Discrimination Improvement (IDI) and the Net Reclassification Improvement (NRI) to assess the incremental benefit of the extended models (Models 2 to 6) over Model 1. The IDI reflects the average net improvement in the predicted risk of ADL disability in the extended models [40, 41]. The NRI can be interpreted as the proportion of correct risk reclassification after adding physical performance measures to Model 1 [40]. Category-free NRI was adopted because there is no consensus on how to categorize ADL disability risk in the older community population. In general, an NRI or IDI > 0 indicates a positive incremental benefit, that is, better predictive performance of the extended model than of the original model.
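The two incremental-value measures can be sketched with their usual definitions. Category-free NRI sums, for events, the proportion whose predicted risk moved up minus the proportion whose risk moved down, and for non-events the reverse. IDI is the change in mean predicted risk among events minus the change among non-events. This is a plain Python illustration with hypothetical function names; it omits the standard errors and significance tests that accompany these indices in practice.

```python
def category_free_nri(y_true, p_old, p_new):
    """Category-free (continuous) NRI of a new model versus an old one."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = n_nonevent = 0
    for y, old, new in zip(y_true, p_old, p_new):
        if y == 1:
            n_event += 1
            up_event += int(new > old)
            down_event += int(new < old)
        else:
            n_nonevent += 1
            up_nonevent += int(new > old)
            down_nonevent += int(new < old)
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    return ((up_event - down_event) / n_event
            + (down_nonevent - up_nonevent) / n_nonevent)


def idi(y_true, p_old, p_new):
    """Integrated discrimination improvement of a new model versus an old one."""
    def mean(values):
        return sum(values) / len(values)

    change_events = [new - old for y, old, new
                     in zip(y_true, p_old, p_new) if y == 1]
    change_nonevents = [new - old for y, old, new
                        in zip(y_true, p_old, p_new) if y == 0]
    if not change_events or not change_nonevents:
        raise ValueError("need at least one event and one non-event")
    return mean(change_events) - mean(change_nonevents)
```

With these definitions, the category-free NRI ranges from -2 to 2, and positive values of either index favor the extended model.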
All statistical analyses were performed using R software (version 3.0.2; http://www.Rproject.org) and SPSS (version 20.0). All statistical tests were two-sided, and statistical significance was set at P < 0.05.