The choice of self-rated health measures matter when predicting mortality: evidence from 10 years follow-up of the Australian longitudinal study of ageing

Background: Self-rated health (SRH) measures with different wording and reference points are often used as equivalent health indicators in public health surveys that estimate health outcomes, such as healthy life expectancy and mortality, for older adults. Whilst the robust relationship between SRH and mortality is well established, it is not known how comparable different SRH items are in their relationship to mortality over time. We used a dynamic evaluation model to investigate how well time-varying SRH measures with different reference points predict mortality in older adults over time.

Methods: We used seven waves of data from the Australian Longitudinal Study of Ageing (1992 to 2004; N = 1733, 52.6% males). Cox regression analysis was used to evaluate the relationship between three time-varying SRH measures (global, age-comparative and self-comparative reference points) and mortality in older adults (65+ years).

Results: After accounting for other mortality risk factors, 'poor' global SRH ratings were associated with a mortality risk 2.83 times that of 'excellent' ratings. In contrast, the relationship of age-comparative and self-comparative SRH with mortality was moderated by age: these comparative SRH measures did not independently predict mortality for adults over 75 years of age in adjusted models.

Conclusions: We found that a global measure of SRH not referenced to age or self is the best predictor of mortality, and is the most reliable measure of self-perceived health for longitudinal research and population health estimates of healthy life expectancy in older adults. Findings emphasize that the SRH measures are not equivalent measures of health status.


Background
Self-rated health (SRH) is a widely used measure of health status in public health and epidemiological research because of its strong associations with other subjective and objective measures of well-being, health outcomes and mortality [e.g. [1]]. The multidimensional concept of health that is encapsulated within a single global SRH response is considered by the World Health Organisation (WHO [2]) and the Euro-REVES 2 project [3] to be one of the best indicators of health at the individual and population level. Both of these bodies have extensively investigated the relationship between global SRH and health outcomes and have recommended the measure for estimating policy-relevant data on aspects of public health such as healthy life expectancy and mortality [4].
The most commonly used SRH measure has a global or current reference point (i.e. how would you rate your health in general/at the present time?). A comparative reference point is also often used to anchor the assessment, such as comparing current health to previous health (self-comparative), or same-aged peers (age-comparative). All these forms of the SRH item are in use in health surveys around the world as an indicator of older adults' years lived in good health [e.g. [3]]. Despite their extensive use, and the robust relationship between poor ratings of SRH and major health outcomes, there is scant research that has compared how these SRH measures perform in older adult populations; in particular it has not been established how the SRH measures compare in their relationship with mortality.
In the few studies that have compared the association between SRH items and mortality, mixed results have been reported. Manderbacka, Kareholt, Martikainen and Lundberg [5] found that the predictive quality of global and age-comparative SRH items depended on gender, with the age-comparative item being a better predictor of mortality for males than for females in simultaneous models. In contrast, Vuorisalmi, Lintonen and Jylhä [6] reported that both an age-comparative and a global item were stronger predictors of mortality for males than for females. Even less attention has been paid to the self-comparative item, and the temporal reference point used in the few existing studies has been broad (i.e. ranging from 5 to 10 years previously). As the interval over which participants gauge their health expands, so does the possibility that these retrospective reports are erroneous. For example, Bath [7] found that a global SRH item was more robust in predicting mortality than a self-comparative measure that asked older adults to compare their present health to their health five years previously. In a study that compared global, age-comparative and self-comparative (10 years previously) SRH items, the predictive quality of the self-comparative item became less robust for males over longer follow-up periods, whereas none of the SRH items predicted mortality for females in adjusted models after 3- and 7.5-year follow-ups [8].
The inconsistent findings regarding the predictive quality of the different SRH items, and the effect that gender may have on the SRH-mortality relationship, warrant further investigation. It is also unclear to what extent these SRH measures predict mortality independently of other associated changes, as not all studies have accounted for other known mortality risks such as demographic variables, health and health behaviours. Furthermore, no study has accounted for the high correlation between the SRH measures by modelling the three SRH measures simultaneously.
A further consideration is whether the SRH-mortality relationship is constant over time. There is a growing body of evidence that a dynamic evaluation of self-rated health, that is, one that takes into account the potentially time-varying nature of SRH, may provide a more authentic depiction of the relationship between subjective health assessments and mortality [9]. Cross-sectional and longitudinal studies suggest that SRH does not remain stable across the lifespan [10][11][12][13]; therefore, using a single-occasion measurement of SRH to predict mortality may not take into account the biopsychosocial interactions of health and the lifespan health trajectory [14,15]. The advantage of modelling time-varying SRH ratings is that each participant's most recent rating, rather than a single baseline rating, can be related to their subsequent mortality risk, and other predictors can be treated as time-varying in the same way. Previous studies have found that time-varying measures of global SRH are superior predictors of mortality compared to a single fixed measure [15][16][17][18]. However, no study to date has investigated how dynamic changes in age-comparative or self-comparative SRH may relate to mortality risk.
The aim of this study is to fill current gaps within the literature regarding the sensitivity of time-varying SRH measures with different reference points to predict mortality in an older adult sample. The unique contributions of the study are (1) the dynamic evaluation of the relationship between three time-varying SRH measures (global, age-comparative and self-comparative) on mortality risk, (2) the identification of the unique impact of each SRH measure by controlling for time-dependent and time-varying measures of biopsychosocial factors known to increase mortality risk in older adults [15][16][17], and (3) the comparison of the independent and concurrent relationship between the three SRH measures on mortality in order to account for the correlation of these measures and further determine their unique predictive quality. While the literature suggests gender differences in the relationship between SRH measures and mortality [e.g. [5,7]], and age-group differences in SRH ratings [e.g. [13,19]], it is not clear if men and women exhibit the same association between SRH and mortality at different ages. Therefore, interactions between SRH items and gender and age-groups are also investigated in order to more fully assess these dimensions of difference.

Sample
The Australian Longitudinal Study of Ageing (ALSA) has been fully described elsewhere [20]. In brief, households with residents aged 70 years and over were identified from the South Australian Electoral Database. The sample was stratified by area, gender, and 5-year cohort groups (70-74, 75-79, 80-84, and ≥ 85) [21]. Males were over-sampled to ensure sufficient numbers at follow-up. Of the 2,705 eligible residents, 1,477 (55%) agreed to be interviewed. Spouses (>65 years) and co-residents (>70 years) of the sample were also asked to participate, which brought the total number of participants at baseline to 2,087 community-dwelling and residential care individuals. The patterns of health care utilization in the final sample were found to be similar to those of the general Australian population [20].
Data collection began in 1992. Baseline and waves 3 (24 months from baseline), 6 (96 months) and 7 (120 months) consisted of a comprehensive two-hour home interview including questions on demographic, health, medical, psychosocial, and physical status. Waves 2 (12 months from baseline), 4 (36 months), and 5 (60 months) were conducted via short telephone interviews and addressed changes in biopsychosocial factors since the last measurement period. Data from all seven waves were included in the current study. Baseline ages ranged from 65 to 103 years (mean age = 78.14 years, SD = 6.68). At the final wave, the remaining 489 participants were aged 75 to 102 years (mean age = 84.94 years, SD = 4.90). Non-response at wave 7 was due to death (58.8% of the baseline participants), participants being unable to be contacted (2.0%), participants having moved out of scope of the study (2.3%), and refusal to be interviewed (13.5%). Table 1 displays the descriptive characteristics for the final sample across all waves.

Self-rated health (SRH)
Global SRH was measured with the question "How would you rate your overall health at the present time?" (1-'excellent' to 5-'poor'). Age-comparative SRH was measured in response to the question "Would you say your health is better (1), about the same (2) or worse (3) than most people your age?" Self-comparative SRH was worded "Is your health now better (1), about the same (2) or not as good (3) as it was 12 months ago?" Global and self-comparative SRH were measured at all seven waves, whilst the age-comparative item was measured at baseline and waves 3, 6, and 7. SRH ratings were reverse coded so that the highest score corresponded to the most positive health rating.
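The reverse coding described above maps a raw code x on an item with k ordered categories to k + 1 - x. A minimal Python sketch follows, assuming the raw responses are stored as integers from 1 to k; the function and constant names are illustrative and do not come from the ALSA data files.

```python
# Hypothetical helper names; raw codes are assumed to be integers 1..k.
GLOBAL_CATEGORIES = 5       # 1 = 'excellent' ... 5 = 'poor'
COMPARATIVE_CATEGORIES = 3  # 1 = 'better', 2 = 'about the same', 3 = 'worse' / 'not as good'


def reverse_code(score: int, n_categories: int) -> int:
    """Reverse a 1..n_categories rating so the highest score is the most positive rating."""
    if not 1 <= score <= n_categories:
        raise ValueError(f"score {score} is outside 1..{n_categories}")
    return n_categories + 1 - score


print(reverse_code(1, GLOBAL_CATEGORIES))       # 'excellent' -> 5
print(reverse_code(5, GLOBAL_CATEGORIES))       # 'poor' -> 1
print(reverse_code(3, COMPARATIVE_CATEGORIES))  # 'worse' -> 1
```

The same function serves both the five-category global item and the three-category comparative items, because only the number of categories changes.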

Demographics
Demographic variables included gender, age, community versus residential dwelling, partner status, annual income, and number of years of education. The education question asked participants how old they were when they left school, with possible responses ranging from 1 = never went to school, 2 = under fourteen years, 3 = fourteen years, 4 = fifteen years, to 7 = eighteen or more years. The education variable was dichotomised at the median category to contrast leaving school at ≤ 14 years versus ≥ 15 years of age [20,22].
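The median split described above amounts to a single threshold on the response codes: codes 1 to 3 (left school at 14 or younger) versus codes 4 and above (left school at 15 or older). A minimal Python sketch, assuming the codes are stored as integers 1 to 7; the function and constant names are hypothetical.

```python
# Hypothetical threshold: code 4 ('fifteen years') is the first code in the upper group.
FIRST_UPPER_CODE = 4


def left_school_at_15_or_older(code: int) -> bool:
    """Dichotomise the age-left-school item: False for <= 14 years, True for >= 15 years."""
    if not 1 <= code <= 7:
        raise ValueError(f"education code {code} is outside 1..7")
    return code >= FIRST_UPPER_CODE


print([left_school_at_15_or_older(c) for c in (1, 3, 4, 7)])
```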

Physical and functional health and medications
Participants were asked at baseline and waves 3, 6, and 7 if they had been diagnosed with and were currently suffering from a heart condition, cancer, or diabetes. In addition, participants were shown a prompt card that listed another 38 medical conditions including arthritis, diabetes, and gallstones, and were asked to indicate if they suffered from these, as well as to list any other conditions they had. The total number of conditions currently suffered was summed to create a continuous aggregate score of number of conditions. At baseline and waves 3 and 6, participants were asked to name, and show the interviewer the containers of, the prescribed and non-prescribed medications they were currently taking. The number of medications was summed to give a continuous variable of the total number of medications for each respondent.
Functional status was assessed using the Activities of Daily Living (ADL) and the Instrumental Activities of Daily Living (IADL) measures [23] at all seven waves. The ADL measures difficulties in bathing, dressing, eating, using the toilet, and getting around or away from home. The IADL questions cover ten activities, including housework, meal preparation, money management and use of public transport. Items are coded (0 = "no difficulty" and 1 = "difficulty") and summed so that higher scores indicate greater functional disability.

Smoking status
Smoking status (current and past smoking of cigarettes, pipes or cigars) was measured at baseline. Responses were coded as (1) current smoker, (2) ex-smoker and (3) never smoked.

Depressive symptoms
Depressive symptoms were measured at baseline and waves 3, 6 and 7 using the Center for Epidemiologic Studies Depression Scale (CES-D), a 20-item questionnaire designed for use in community-based epidemiological studies [24]. Respondents rated how often they had felt each way in the last week on a four-point scale ranging from rarely or none of the time (0) to most of the time (3). Summed scores range from 0 to 60, with a higher score indicating more depressive symptoms. The scale had a high level of internal consistency (Cronbach's alpha = .85).
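The internal consistency coefficient reported above is conventionally computed as alpha = k / (k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items (20 for the CES-D). The sketch below implements that standard formula with the Python standard library, assuming complete item responses arranged with one row per respondent; it illustrates the statistic and is not the authors' analysis code.

```python
from statistics import variance
from typing import List


def cronbach_alpha(item_scores: List[List[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix (one row per respondent)."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in item_scores]) for j in range(n_items)]
    # Sample variance of the summed scale score (e.g. a CES-D total between 0 and 60)
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Toy responses on a 0-3 scale for four respondents and three items (hypothetical data)
print(cronbach_alpha([[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 2, 3]]))
```

Items that move together inflate the variance of the total relative to the summed item variances, which is what pushes alpha towards 1.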

Cognitive functioning
Cognitive functioning was measured at baseline and waves 3, 6 and 7 with the Mini-Mental State Examination [MMSE: [25]]. The scale assesses orientation to place and time, attention and calculation, and memory recall [20]. The MMSE has been shown to have satisfactory reliability and construct validity and displays a high degree of sensitivity for moderate to severe cognitive impairment [26].

Statistical analysis
Participants with more than 25% of observable data missing (n = 354, 16.9%) were removed from the data set [27], leaving a final sample of 1,733 (52.6% males). This criterion was used because Byrne [27] has shown that model and fit estimates are comparable between a complete data set and one with up to 25% data loss when a full information maximum likelihood imputation method is used. Participants removed from the data set were more likely to be male (χ²(1) = 15.84, p < .001), to be older at baseline (t(2085) = 7.95, p < .001), to have more problems with ADLs (t(2085) = 3.34, p = .001) and IADLs (t(2085) = 3.32, p = .001), and to be taking more medications (t(2085) = -4.51, p < .001).
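The exclusion rule above can be sketched as a filter on each participant's share of missing observations. This minimal Python sketch assumes that each participant's observable item responses are pooled into one sequence with None marking a missing value; the paper does not state exactly how the denominator was defined, and the function and constant names are hypothetical.

```python
from typing import Dict, Optional, Sequence

MAX_MISSING_FRACTION = 0.25  # participants with MORE than 25% missing are excluded


def missing_fraction(observations: Sequence[Optional[float]]) -> float:
    """Share of a participant's observable responses that are missing (None)."""
    return sum(value is None for value in observations) / len(observations)


def retain_participants(
    data: Dict[str, Sequence[Optional[float]]],
) -> Dict[str, Sequence[Optional[float]]]:
    """Keep participants whose missing share does not exceed MAX_MISSING_FRACTION."""
    return {pid: obs for pid, obs in data.items() if missing_fraction(obs) <= MAX_MISSING_FRACTION}


# Hypothetical example: "A" is exactly at the 25% limit and kept; "B" (50%) is dropped
example = {"A": [1.0, 2.0, None, 4.0], "B": [None, None, 3.0, 4.0]}
print(sorted(retain_participants(example)))
```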
Of the final sample, 21.3% had < 5% missing data over the seven waves, 14.9% had 6 to 10% missing, 6.8% had 11 to 15% missing, 13.9% had 16 to 20% missing, and 43.1% had > 21% missing. As expected, sensitivity analysis revealed a .96 probability (area under the Receiver Operating Characteristic (ROC) curve; standard error = .005) that a randomly chosen participant in the final sample who had died over the follow-up period would have a greater percentage of missing data than a randomly chosen survivor. Missing values for the remaining sample were imputed with the maximum likelihood approach of the Expectation Maximization (EM) algorithm [28]. The EM algorithm uses all available data and iterates between two steps: estimating the missing values from the observed responses and the current parameter estimates, and re-estimating the parameters to maximise the likelihood of the resulting complete data [29].
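The ROC area quoted above has a direct probabilistic reading: it equals the proportion of all (decedent, survivor) pairs in which the decedent has the higher percentage of missing data, with ties counted as one half (the Mann-Whitney form of the AUC). The brute-force Python sketch below makes that reading explicit; the inputs are hypothetical and it is meant for illustration rather than for large samples.

```python
from typing import List


def auc_probability(decedents: List[float], survivors: List[float]) -> float:
    """P(random decedent has more missing data than random survivor), ties counted as 1/2."""
    if not decedents or not survivors:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for d in decedents:
        for s in survivors:
            if d > s:
                score += 1.0
            elif d == s:
                score += 0.5
    return score / (len(decedents) * len(survivors))


# Hypothetical percentages of missing data for two decedents and two survivors
print(auc_probability([30.0, 20.0], [10.0, 20.0]))
```

Of the four pairs here, three favour the decedent and one is a tie, so the sketch returns 3.5 / 4.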
Cox regression models were used to analyse the effect of time-varying predictors and covariates on mortality risk [30]. The number of years from the baseline interview until death or censoring was the time metric used in the models. Cox regression is a partial likelihood method of estimation that takes into account the number and rank order of deaths in the sample. Because of a 'conditioning argument' [30, p. 520] within the partial likelihood method, no assumptions are made about the shape of the baseline hazard function, so only the effects of the predictors and covariates are evaluated. The great advantage of this approach is that a model can be fitted however complex the baseline hazard function is. Following Singer and Willett [30], the hazard of the event (mortality) is modelled as

$$h(t_{ij}) = h_0(t_j)\exp\left(\beta_1 X_{1i} + \beta_2 X_{2ij}\right)$$

where $X_{1i}$ represents a time-invariant predictor or covariate and $X_{2ij}$ a time-varying predictor or covariate. Individual $i$'s hazard at time $t_j$ is therefore the product of the baseline hazard function $h_0(t_j)$ and the individual's risk score at that time (i.e. the antilog of $\beta_1 X_{1i} + \beta_2 X_{2ij}$), and the ratio $h(t_{ij})/h_0(t_j)$ is the individual's mortality hazard ratio (HR) at time $t_j$. To deal with the unbalanced data (i.e. not all variables are observed at every wave; see Table 1), we carried forward the most recent value of each time-varying predictor to the next wave if it was missing at that wave [30]. Singer and Willett argue that this approach is particularly appropriate to account for the shortfall of predictor information for categorical data when there are complex patterns of temporal variation in observations. SRH items were treated as ordinal variables, with the reference category designated as the most positive rating (i.e. 'excellent' for global, and 'better' for age- and self-comparative SRH).
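To make the estimation logic concrete, the sketch below computes the Cox partial log-likelihood for a single time-varying covariate, filling unobserved waves by carrying the last observed value forward. It is a minimal illustration under simplifying assumptions (one covariate, a value observed at the first wave, tied death times sharing one risk set as in the Breslow approximation) and is not the software used in the study; in practice such models are fitted with an established survival package. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Participant:
    time: float   # years from baseline to death or censoring
    died: bool    # True if death was observed, False if censored
    waves: Dict[float, Optional[float]] = field(default_factory=dict)  # wave time -> value (None = missing)


def carry_forward(waves: Dict[float, Optional[float]]) -> Dict[float, float]:
    """Last observation carried forward: a missing wave takes the most recent observed value."""
    filled: Dict[float, float] = {}
    last: Optional[float] = None
    for t in sorted(waves):
        if waves[t] is not None:
            last = waves[t]
        if last is None:
            raise ValueError("covariate must be observed at the first wave")
        filled[t] = last
    return filled


def value_at(filled: Dict[float, float], t: float) -> float:
    """Covariate value in force at time t: the value from the latest wave at or before t."""
    return filled[max(w for w in filled if w <= t)]


def partial_log_likelihood(beta: float, people: List[Participant]) -> float:
    """Cox partial log-likelihood for one time-varying covariate.

    Each death at time t contributes beta * x_i(t) - log(sum_k exp(beta * x_k(t))) over the
    risk set, with every covariate evaluated at t. The baseline hazard h0(t) appears in the
    numerator and denominator of each term and cancels, so only the order of deaths and the
    covariate values in each risk set enter the calculation.
    """
    filled = [carry_forward(p.waves) for p in people]
    loglik = 0.0
    for i, p in enumerate(people):
        if not p.died:
            continue
        t = p.time
        # Risk set: everyone whose death or censoring time is at or after t
        risk_set = [k for k, q in enumerate(people) if q.time >= t]
        numerator = beta * value_at(filled[i], t)
        denominator = sum(math.exp(beta * value_at(filled[k], t)) for k in risk_set)
        loglik += numerator - math.log(denominator)
    return loglik


# Hypothetical three-person example with an unobserved wave in each record
people = [
    Participant(1.5, True, {0.0: 3.0, 1.0: None}),
    Participant(4.0, True, {0.0: 2.0, 1.0: 1.0, 2.0: None, 3.0: None}),
    Participant(10.0, False, {0.0: 5.0, 1.0: 4.0, 2.0: 4.0, 3.0: 4.0, 5.0: None, 8.0: 3.0}),
]
print(partial_log_likelihood(0.5, people))
```

Maximising this function over beta gives the Cox estimate; exp(beta) is then the hazard ratio for a one-unit difference in the covariate at the same point in time.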
Covariates were computed to reflect time-varying values over the observation periods, with the exception of the time-invariant covariates (gender, education, income and smoking status). Income and smoking status were included as categorical variables, and baseline measures were therefore used for ease of interpretation. For time-invariant covariates, the HR represents the effect of a one-unit change in the related predictor on the raw hazard of mortality over the 10 years of follow-up. For the time-varying predictors and covariates, the HR represents the weighted average of short-term mortality risk across the 10 years of follow-up (i.e. the mean of the risks from baseline to the first follow-up measurement, from that measurement to the next, and so on) [31]. For the categorical predictors the interpretation is essentially the same, except that the HR represents the risk of mortality relative to the reference group.
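The hazard ratio interpretation given above follows from the proportional hazards form of the model, $h(t_{ij}) = h_0(t_j)\exp(\beta_1 X_{1i} + \beta_2 X_{2ij})$. As a brief worked derivation (standard Cox model algebra, not a result of this study), compare two individuals at the same time $t_j$ who differ by one unit on a time-invariant covariate $X_1$ and share the same value of $X_2$:

$$\frac{h_0(t_j)\exp\{\beta_1 (x+1) + \beta_2 X_{2ij}\}}{h_0(t_j)\exp\{\beta_1 x + \beta_2 X_{2ij}\}} = \exp(\beta_1) = \mathrm{HR}$$

The baseline hazard cancels, so the HR does not depend on $t_j$. For a categorical predictor coded as indicator variables against a reference category, the same algebra gives $\mathrm{HR} = \exp(\beta_c)$ for category $c$ relative to the reference; for example, an HR of 2.83 for 'poor' versus 'excellent' global SRH corresponds to a coefficient of $\beta_c = \ln 2.83 \approx 1.04$.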
Adding gender and age-group interaction terms to the ordinal models would have introduced an additional 32 parameters. To keep the models parsimonious [32], SRH items were not entered as categorical variables in the separate interaction models, which allowed a single interaction effect to be identified for each SRH item. Where an interaction term was significant, adjusted ordinal models were run separately by group to ascertain group differences.

Associations between SRH measures and mortality
Prior to the Cox regression models, the relationships between the three SRH measures were investigated. As expected, correlations between the global, age-comparative and self-comparative SRH measures across the seven waves were all significant at p < .05. Correlations between global and age-comparative SRH were moderate, ranging from .271 (p < .001) at wave 7 to .471 (p < .001) at baseline. Similar correlations were found between global and self-comparative SRH, ranging from .278 (p < .001) at baseline to .475 (p < .001) at wave 4. Correlations were smaller between age-comparative and self-comparative SRH, ranging from .138 at wave 7 to .221 at baseline. Table 2 shows the unadjusted associations between SRH items and mortality as well as the net effects of SRH items on mortality. 'Poor' global SRH increased the unadjusted risk of mortality by 4.71 times compared to 'excellent' ratings (model 1). 'Worse' age-comparative SRH increased mortality risk by 2 times compared to 'better' ratings (model 2). 'Not as good' self-comparative ratings increased mortality risk by 1.23 times compared to 'better' ratings (model 3).
When global SRH was placed in the same models as age-comparative (model 4) and self-comparative (model 5) SRH, the relationships between 'worse' age-comparative and 'not as good' self-comparative ratings and mortality became non-significant. Model 6 revealed that 'worse' age-comparative and 'not as good' self-comparative ratings independently predicted mortality when placed in the same model. In model 7, after accounting for the shared variance of all three SRH items, 'poor' global SRH was the strongest independent predictor of mortality. These results indicate that the global SRH measure accounts for the relationships between the comparative SRH measures and mortality. Table 3 shows the models adjusted for demographic and health risk factors. In the independent models, after accounting for other mortality risk factors, 'poor', 'fair' or 'good' global SRH ratings and 'worse' age-comparative ratings over time were associated with a significant increase in mortality risk for older adults compared to the most positive ratings. In contrast, 'same' self-comparative ratings over time were associated with a significantly lower mortality risk than 'better' ratings. In the final full model, all three SRH items were entered to account for overlap between these measures. This model shows that the relationships between the three SRH measures and mortality remained relatively unchanged from the independent models in Table 3. The most notable difference is the reduction in the hazard ratio for 'poor' global SRH from 3.37 to 2.83. This suggests that a poor global rating reflects both age and self comparison processes to some degree.

Gender and age by SRH item interactions
As described above, separate models investigated gender and age interactions with SRH. The gender by SRH interaction term was not significant (HR = 0.99, 95% CI: 0.98, 1.01). However, a significant age by SRH interaction term was found (HR = 1.00, 95% CI: 1.00, 1.001). To investigate the interaction effect, separate adjusted models were run for each age group (see Table 4). Age groups were defined at baseline as young-old (65 to 74 years), old-old (75 to 84 years) and oldest-old (85+ years), as in the gerontological literature [33].
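The age-group assignment above can be sketched as a simple banding function. This minimal Python sketch assumes baseline age is recorded in years (possibly fractional) and that the groups are the half-open intervals 65 to under 75, 75 to under 85, and 85 and over; the function name is hypothetical.

```python
def age_group(baseline_age: float) -> str:
    """Assign the baseline age group used for the stratified models (hypothetical helper)."""
    if baseline_age < 65:
        raise ValueError("sample ages start at 65 years")
    if baseline_age < 75:
        return "young-old"   # 65 to 74 years
    if baseline_age < 85:
        return "old-old"     # 75 to 84 years
    return "oldest-old"      # 85+ years


print([age_group(a) for a in (65, 74.9, 75, 84, 85, 103)])
```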
The young-old age-group model revealed that 'poor' global and 'worse' age-comparative ratings significantly predicted mortality. For old-old adults, age-comparative and self-comparative SRH did not independently predict mortality. Similarly, only 'poor' global ratings significantly predicted mortality in the oldest-old age group. 'Same' self-comparative ratings, averaged over time, were associated with a significant reduction in mortality risk for young-old and oldest-old adults compared to 'better' ratings.

Discussion
Our results indicate that the three SRH items do not have comparable relationships with mortality. These results build on previous findings that SRH measures with different reference points are not interchangeable measures of subjective health for older adults [5,34]. To our knowledge, this is the first time the predictive nature of three commonly used SRH items has been compared over time in a dynamic evaluation model. This comparison revealed that, overall, global SRH was the strongest predictor of mortality when the time-varying nature of the ratings was taken into account, whilst the weakest association was with the self-comparative item.
These findings are contrary to some previous studies that have found that an age-comparative SRH item is a more robust predictor of mortality than a global item for males in a similar age group [5], or that the three SRH items had similar predictive qualities (55 to 85 year old sample) [8]. However, in most previous studies the SRH-mortality associations have been modelled separately by gender [e.g. [5][6][7][8]]. The contrasting methodology used in the current study revealed non-significant interaction terms for gender, indicating that the relationship between the different SRH measures and mortality was not significantly different for males and females. Hence separate models for men and women were not justified. Furthermore, our findings expand the literature because the three SRH items were compared through a dynamic model of SRH and mortality, using time-varying SRH ratings to predict mortality rather than ratings at a single point in time. By modelling the mortality hazard using time-varying predictors and covariates, we assessed the cumulative short-term effects of SRH ratings on mortality over a 10-year follow-up period. While well-suited to the global SRH data, the findings indicate that this dynamic evaluation

The difference in predictive quality of these SRH measures is most likely due to the fundamental nature of anchoring the health evaluation to a particular reference point, such as peers or one's own past health. For example, the age-comparative item may enhance health assessments through a self-protective 'social downgrading' process [35], whereas the forced temporal aspect of the self-comparative item elicits more negative ratings because it makes recent negative changes in health more salient [36].
Etiologically speaking, self-perceived temporal decline in health could be argued to be a good indicator of imminent mortality risk, as the further analysis above has demonstrated. However, with advanced health decline older adults may perceive that their health cannot get any worse, and thus may begin to rate their health as 'the same' as previously, which places an inherent limit on the ability of the self-comparative item to provide unique mortality information over time. This limitation of the self-comparative SRH measure may also explain our seemingly counterintuitive finding that 'same' self-comparative ratings are protective of mortality risk compared to 'better' ratings. For example, an individual who rates their health as 'better' than the previous year could be reflecting on their experience of recent health issues that may subsequently increase mortality risk. The contrast between the proportion of the current sample who rated their health as "better than others their own age" and "not as good as twelve months ago", along with the small to medium correlations found between the SRH measures, supports the notion that the reference point invokes specific comparison processes which can bias health assessments [14,37], making them less predictive of mortality over time.
Idler and Benyamini [14] and Jylhä [38] argue that the robust SRH-mortality relationship found in global SRH is most likely due to complex, dynamic human judgements that include contextual evaluation frameworks where past and current health is considered along with future health expectations. In the few studies that have compared the determinants of global and comparative SRH items, the global measure has been found to be the most inclusive measure of subjective health in terms of its associations with other factors of health [37,39]. For example, Eriksson et al. [39] found that physical, functional and mental health, health behaviours, and psychosocial factors (such as social support), held significantly stronger associations with global SRH compared to an age-comparative measure. Similarly, the global measure has previously been shown to be the most comprehensive SRH item for the ALSA sample used here [40]. Taken together, these findings suggest that the strong association observed between global SRH and mortality is due to the global measure reflecting an all-encompassing evaluation of health compared to the other SRH items.
In the current study, the utility of the SRH items for predicting mortality depended on age. In particular, the two comparative measures did not provide unique information about mortality risk in adults over 75 years of age. It has been argued that the age-comparative item is not appropriate for use in older populations, or in samples with a large age range, because of its sensitivity to age [6,34,38]. The current findings support this notion of age-sensitivity and extend it to the self-comparative item. Further research is needed to clarify whether these comparison effects extend to other age groups or are merely cohort effects.
Whilst the focus of the current study was on the impact of the different SRH measures on mortality, and most of the covariate relationships with mortality were as expected, some findings in the models are noteworthy. For example, we found a significant protective effect for number of medical conditions. We suggest that this counterintuitive relationship between number of medical conditions and mortality is a product of the large number of non-life-threatening conditions that were included, such as cataracts, gout, hernia, ingrown toenails, and migraines. The conditions included in the aggregate variable were not weighted for life threat, as previous research has suggested that this does not necessarily improve model fit [41]; however, those authors found that the combination of disease stage and comorbidity increased mortality risk. Together with the current findings, this suggests that future research is needed to ascertain the best way to measure and weight comorbidity in relation to mortality risk.
The major strength of our study is that the large amount of longitudinal data (up to seven waves spanning 12 years) allowed comprehensive health models to be tested using a time-varying approach. Furthermore, the mortality relationship of each SRH item was established by directly comparing the items whilst accounting for shared variance with the other SRH items and health covariates. A limitation of the data set is the unbalanced data collection (i.e. not all measures were observed at each wave) and the different modes of data collection (face-to-face versus telephone interview). Whilst previous research has supported the reliability of telephone interviewing and its strong correlation with face-to-face interviewing in established samples [e.g. [42,43]], the unbalanced data may require care in the interpretation of results. Singer and Willett [30] argue that imputing time-varying predictors when they are not observed by carrying forward the most recent value (as was done here; see the Statistical analysis section) is most likely to result in a conservative estimate. It should also be noted that the small number of oldest-old adults at baseline remaining in the wave 6 and 7 samples could limit our findings, as small sample sizes reduce the statistical power to detect significant effects [29]. Baseline selection effects in this age group must also be taken into account [44], along with possible sample attrition due to causes other than mortality. For example, the significant differences in characteristics of the excluded participants (due to missing data; see the Statistical analysis section above) suggest a possible bias, as the oldest-old, males, and those with more functional difficulties and medications were less likely to be included in the final sample.
However, the finding that 'poor' global SRH predicted mortality for the oldest-old age group in the adjusted models, as did being male and having more ADL difficulties, suggests that our results are more likely to underestimate the effects. Therefore, the relationship between time-varying SRH and mortality risk may in fact be stronger than indicated here.

Conclusions
In conclusion, global, age-comparative and self-comparative SRH items embody unique, age-sensitive, associations with mortality over time. Researchers should exercise caution when pooling or harmonising SRH items as they are not comparable measures of health. Future research investigating the potential for time-varying, and even change in, SRH measures to predict other major health outcomes, such as functional disability and health care utilisation, may extend the application of SRH items for indicators of population health.
The age sensitivity of the comparative SRH measures suggests they should be used with caution in older adult populations, particularly for predicting mortality. Furthermore, the usefulness of tracking age- and self-comparative measures to predict mortality is limited by the anchoring of the evaluation to the reference point. In contrast, 'poor' global SRH is a robust predictor of mortality across age groups over time, indicating that the global item is the most reliable measure of self-perceived health for older adults.