Item distribution and inter-rater reliability of the German version of the quality of life in Alzheimer’s disease scale (QoL-AD) proxy for people with dementia living in nursing homes

Background: The Quality of Life in Alzheimer’s disease scale (QoL-AD) is a widely used Health Related Quality of Life (HRQoL) instrument. However, studies investigating the instrument’s inter-rater reliability (IRR) are missing. This study aimed to determine the item distribution and IRR of the German proxy version of the QoL-AD (13 Items) and a nursing home-specific instrument version (QoL-AD NH, 15 Items).
Methods: The instruments were applied to 73 people with dementia living in eight nursing homes in Germany. Individuals with dementia were assessed two times by blinded proxy raters. The IRR analyses were based on methodological criteria of the quality appraisal tool for studies of diagnostic reliability (QAREL), the COSMIN group and the single-measure Intra-Class Correlation Coefficient (ICC) for absolute agreement ≥0.70.
Results: All items for both instrument versions demonstrated acceptable item difficulty, with the exception of one item (QoL-AD proxy). The IRR was moderate for the QoL-AD (ICC: 0.65) and insufficient for the QoL-AD NH (ICC: 0.18). The additional computation of the average measure ICC for two proxy-raters demonstrated a strong IRR (ICC: 0.79) for the QoL-AD and a weak IRR for the QoL-AD NH (ICC: 0.31). The detailed analysis of the IRR for each item underpinned the need for the further development of both instruments.
Conclusions: The unsatisfactory IRRs for both instruments highlight the need for the development of a user guide including general instructions for instrument application as well as definitions and examples reflecting item meaning. Priority should be given to the development of reliable proxy-person versions of both instruments.
Trial registration: ClinicalTrials.gov: NCT02295462, Date of registration: 11-20-2014.


Background
Health Related Quality of Life (HRQoL) has become an important outcome in dementia research [1,2]. HRQoL is defined as a "multidimensional concept that reflects the individual's subjective perception of the impact of a health condition on everyday living" [3]. The Quality of life in Alzheimer's Disease (QoL-AD) scale has been widely used to measure HRQoL in people with dementia [4][5][6][7]. The QoL-AD was developed in the US [4] and is based on the conceptual work by Lawton [8], who defines the QoL concept as multidimensional, including subjective (e.g., perceived QoL and psychological well-being) and objective components (e.g., behavioral competence and environment). The instrument was originally developed as a self-rating version for community-dwelling people with dementia but has also been used frequently as a proxy instrument in the nursing home setting [9][10][11].
In addition to the self- and proxy versions of the QoL-AD, Edelman et al. [5] developed the QoL-AD NH, a version of the original instrument adapted specifically for people living in nursing homes. Whereas self-rating means that a person with dementia rates his or her own QoL, a proxy-rating is a QoL rating of a particular person with dementia by a proxy, e.g. a relative or caregiver. Pickard & Knight [12] distinguish two proxy-rating perspectives, "proxy-proxy" and "proxy-person". In the former, the proxy rates the HRQoL of a person with dementia from his or her own (proxy) perspective; in the latter, the proxy rates the QoL of a person with dementia as he or she thinks the person with dementia would rate it [12]. Both proxy perspectives are appropriate for the assessment of HRQoL. Unfortunately, the applied proxy perspective mostly remains unclear in the literature [13]. One recent study investigated both perspectives (proxy-proxy and proxy-person), comparing them to self-reports of people with and without dementia in nursing homes. The three perspectives were assessed with different versions of the EuroQol-5D [13]. The results show that both proxy perspectives overestimate low self-reports and underestimate high self-reports. This tendency to attenuate self-ratings existed for both proxy perspectives, with a smaller gap between self-ratings and the proxy-person perspective [13]. These results highlight the need for a clear definition and description, and for a psychometric investigation, of both proxy perspectives. In general, there is little information on the psychometric properties of the proxy versions of the QoL-AD and QoL-AD NH [1,2]. In particular, there is a lack of information about the inter-rater reliability (IRR) of the QoL-AD proxy and QoL-AD NH proxy scales [2].
This lack of information is often neglected in the literature [1,14,15,16,17,18] and its impact on the validity of the QoL-AD proxy and QoL-AD NH proxy scales is unclear [19].
A detailed user guide with instructions for the instrument application is available for the self-rating versions, but not for proxy-rating versions. It is particularly unclear whether the items of both proxy versions have to be rated from a proxy-proxy or a proxy-person perspective.
The QoL-AD proxy and QoL-AD NH proxy are frequently applied, most likely because of the instruments' anticipated feasibility. The low number of items (QoL-AD proxy = 13 items, QoL-AD NH proxy = 15 items) and the fact that no comprehensive training is necessary (instructions for the proxy versions are not available) allow resource-saving data collection in contrast to other QoL measures for people with dementia (e.g. QUALIDEM, Dementia Care Mapping instrument). The number of missing values has been described as low [9,11,20].
The discrepancy between this lack of knowledge and the uncontrolled usage of the QoL-AD proxy and QoL-AD NH proxy emphasizes the relevance of a comprehensive evaluation of the IRR of both measures.
While the German version of the QoL-AD proxy has been available for some years, we have only recently conducted a cross-cultural adaptation of the QoL-AD NH proxy to the German context [21].
Given the international lack of knowledge concerning the IRR of the QoL-AD proxy and QoL-AD NH proxy, as well as the different perspective gaps between self-ratings and proxy perspectives, the objective of the present study was to evaluate the item distribution and IRR of both instruments, based on a proxy-proxy perspective for the QoL-AD proxy and a proxy-person perspective for the QoL-AD NH proxy.

Study design
This study was conducted between June 2015 and March 2016 as a nested cohort study within the randomized controlled trial EPCentCare [22], which aimed to reduce antipsychotic medication in nursing home residents. EPCentCare was carried out in three German regions, whereas the present evaluation took place in Northern Germany only. The study sample consisted of people with dementia from eight nursing homes located in Schleswig-Holstein and Hamburg. The investigation of the IRR of the QoL-AD proxy and QoL-AD NH proxy was based on the criteria of the quality appraisal tool for studies of diagnostic reliability (QAREL) [23] and the COSMIN group [24] (e.g. sample size calculation, description of blinding of raters, description of raters).

Sample size calculation
The sample size calculation was based on an estimated intra-class correlation coefficient (ICC) of 0.75, ratings of two independent raters (registered nurses and nursing assistants) and a width of 0.20 for the 95% Confidence Interval (CI). This resulted in a calculated sample of 75 residents with dementia [25].
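The arithmetic behind this calculation can be illustrated with a short sketch. It assumes the large-sample approximation commonly attributed to Bonett for the number of subjects needed to estimate an ICC with a given confidence interval width; whether reference [25] uses exactly this formula is not stated in the text, and the function name `icc_sample_size` is hypothetical. Under this assumption, the planning values given above (ICC = 0.75, two raters, width 0.20) lead to a sample of 75 residents.

```python
import math
from statistics import NormalDist


def icc_sample_size(rho: float, k: int, width: float, conf: float = 0.95) -> int:
    """Approximate number of subjects for a CI of total width `width`.

    Assumption: large-sample approximation (attributed to Bonett):
    n = 8 z^2 (1 - rho)^2 (1 + (k - 1) rho)^2 / (k (k - 1) width^2) + 1
    rho   : planning value of the ICC
    k     : number of raters per subject
    width : desired total width of the confidence interval
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided normal quantile
    n = 8 * z**2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2
    n = n / (k * (k - 1) * width**2) + 1
    return math.ceil(n)  # round up to whole subjects


if __name__ == "__main__":
    # Planning values taken from the text above
    print(icc_sample_size(rho=0.75, k=2, width=0.20))
```

Because the required sample grows with the inverse square of the interval width, halving the targeted width roughly quadruples the number of residents needed.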

Procedures
According to the inclusion criteria of the EPCentCare trial, nursing homes with at least 50 residents were eligible for participation in the study. For this IRR evaluation, the predefined inclusion criterion for residents with dementia was a Dementia Screening Scale score ≥ 3 [26]. Exclusion criteria were a temporary stay in respite care or a primary diagnosis of schizophrenia or bipolar disorder.
Inclusion criteria for caregivers were at least half-time employment and at least one year of nursing training ("nursing assistant" qualification). Qualification levels of proxy-raters depended on organizational conditions and staffing levels at the time of data collection in the participating nursing homes. Additionally, caregivers had to have been at work on at least seven days within the last two weeks prior to data collection and had to have a close relationship with the assessed resident. Based on these criteria, caregivers were identified and assigned to the assessed residents by the management staff (head nurse) of each participating nursing home. The close relationship with the assessed resident enabled the caregivers to rate the HRQoL based on a proxy-person and proxy-proxy perspective.
The IRR evaluation was based on independent proxy-ratings from registered nurses or nursing assistants referring to the preceding two weeks. Both independent ratings for one measure and one resident took place in the same shift and under the same circumstances. Each caregiver was blinded to the ratings of the other rater. To ensure blinding of raters and standardized data collection, the application of the QoL-AD proxy and QoL-AD NH proxy was supervised by researchers of the EPCentCare trial, who had been trained in applying QoL instruments prior to the data collection. The supervision included a short instruction about the assessed construct (e.g. HRQoL, agitated behavior), the underlying time frame (preceding two weeks) and the perspective (e.g. proxy-person). Because organizational conditions and staffing levels differed at the time of data collection, the two independent ratings could come from caregivers with different qualifications (registered nurses and nursing assistants). On each data collection occasion, one caregiver completed one QoL-AD proxy version at the beginning and the other at the end of the data collection. The order in which the two versions were applied was randomized.

Instruments
The QoL-AD consists of 13 items, which can be answered by self-rating (person with dementia) or proxy-rating (e.g. caregiver). Response options for all items are "poor", "fair", "good" and "excellent", resulting in item scores between 1 and 4 and total scores between 13 and 52, with higher scores indicating higher HRQoL. In 2005 the QoL-AD was translated into German [27]. For this investigation the German version provided by Mapi Research Trust, Lyon was used [28]. Although information related to the linguistic validation of the German version is not available, the German QoL-AD proxy demonstrated sufficient internal consistency and structural validity in two recent studies [6,29].
For the nursing home version (QoL-AD NH proxy), two items of the original version concerning financial and marital status (Money, Marriage) were removed and four items were added (People who work here, Ability to take care of oneself, Ability to live with others, and Ability to make choices in one's life) [5]. The response options for the QoL-AD NH correspond to the original version, resulting in total scores from 15 to 60. In 2016 the nursing home version was translated into German and linguistically validated [21]. While no information on psychometric properties is available for the German version, the original US proxy-proxy version showed sufficient internal consistency in three studies [5,30,31] and a nearly perfect IRR [31].
Cognitive impairment of participating residents was assessed by nursing staff using the Dementia Screening Scale (DSS) [26], a seven-item measure with a three-point response scale (0, 1, 2) resulting in scores between 0 and 14, with higher scores indicating more cognitive impairment.
Agitated behavior was assessed by nursing staff using an adapted German version [32] of the Cohen-Mansfield Agitation Inventory (CMAI) [33,34], a proxy-measurement consisting of 25 items rated on a seven-point scale of frequency of occurrence resulting in scores between 25 and 175 [35], with higher scores indicating higher frequencies of agitation. Age, gender, length of stay in the nursing home in months, and care dependency level (0, I, II, or III) as defined by the German long-term care insurance were taken from the residents' case files. In addition, proxy-rater characteristics were assessed with single items.

Data analysis
Sample characteristics and item distribution were computed using descriptive statistics. Item difficulty was examined based on the item distribution. An item mean in the first 20% of the scale was defined as a floor effect and in the last 20% as a ceiling effect. To gain comprehensive information on the degree of IRR for the QoL-AD proxy and QoL-AD NH proxy, a reliability coefficient was calculated for each item and for all items in total. This procedure was based on earlier IRR studies and allowed a detailed interpretation and comparison of the IRR of each item [36,37]. The IRR for each item was based on the overall proportion of agreement (p_o). Moreover, we computed the linear weighted κ statistic for ordinal data (κ_w) [38,39], because p_o considers only crude agreement and ignores the possibility that agreement could occur by chance alone. The two paradoxical properties of κ statistics were considered during the interpretation of the results [40]. The interpretation of κ_w values was based on the following ranges: 0.00-0.20, slight; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, substantial; and 0.81-1.00, nearly perfect [41]. The IRR of the QoL-AD proxy and QoL-AD NH proxy total scores was evaluated using single-measure Intraclass Correlation Coefficients (ICC) based on a two-way random-effects model for absolute agreement. Additionally, the average-measure ICC for two raters was calculated. This coefficient estimates the IRR of a collaborative QoL-AD proxy or QoL-AD NH proxy rating by two raters. Based on the recommendation by Terwee et al. [24], we targeted κ_w and ICC values ≥0.7. To analyze the level of uncertainty, 95% CIs for ICCs and κ_w values were examined. The 95% CI for κ_w values was based on 10,000 bootstrapped samples and determined using the percentile method [42], i.e. the 0.025 and 0.975 quantiles of the bootstrap distribution of κ_w served as interval limits.
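The item-level coefficients described above can be illustrated with a minimal sketch. It is not the code used in the study (the analysis was run in R); the function names and the example ratings are hypothetical, and the percentile limits are taken with a simple nearest-rank rule, which is an assumption of this sketch. Linear weights penalize a disagreement in proportion to the distance between the two response categories.

```python
import random


def proportion_agreement(r1, r2):
    """Overall proportion of exact agreement between two raters."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def linear_weighted_kappa(r1, r2, categories=(1, 2, 3, 4)):
    """Weighted kappa with linear weights for ordinal categories."""
    n, c = len(r1), len(categories)
    idx = {cat: i for i, cat in enumerate(categories)}
    obs = [[0.0] * c for _ in range(c)]  # observed joint proportions
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1 / n
    row = [sum(obs[i]) for i in range(c)]  # marginals of rater 1
    col = [sum(obs[i][j] for i in range(c)) for j in range(c)]  # rater 2
    # Linear disagreement weight: |i - j| / (c - 1)
    d_obs = sum(abs(i - j) / (c - 1) * obs[i][j]
                for i in range(c) for j in range(c))
    d_exp = sum(abs(i - j) / (c - 1) * row[i] * col[j]
                for i in range(c) for j in range(c))
    if d_exp == 0:  # both raters used a single identical category
        return float("nan")
    return 1 - d_obs / d_exp


def percentile_bootstrap_ci(r1, r2, stat, n_boot=10_000, seed=1):
    """Percentile bootstrap CI: resample rated persons with replacement."""
    rng = random.Random(seed)
    n = len(r1)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        value = stat([r1[i] for i in sample], [r2[i] for i in sample])
        if value == value:  # drop undefined (NaN) replicates
            estimates.append(value)
    estimates.sort()
    lower = estimates[int(0.025 * (len(estimates) - 1))]
    upper = estimates[int(0.975 * (len(estimates) - 1))]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical ratings of ten residents on one item (1 = poor ... 4 = excellent)
    rater_a = [1, 2, 2, 3, 3, 3, 4, 2, 1, 3]
    rater_b = [1, 2, 3, 3, 2, 3, 4, 3, 2, 3]
    print(proportion_agreement(rater_a, rater_b))
    print(linear_weighted_kappa(rater_a, rater_b))
    print(percentile_bootstrap_ci(rater_a, rater_b, linear_weighted_kappa))
```

Resampling whole rating pairs, rather than each rater's ratings separately, keeps the dependence between the two raters intact in every bootstrap replicate.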
Statistical analysis was performed using R Version 3.2.4 [43] and the software packages "irr" Version 0.84 [44] and "boot" Version 1.3-18 [45,46].
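For readers who want to trace the total-score coefficients, the following sketch computes the single-measure and average-measure ICC for absolute agreement from the mean squares of a two-way layout (residents by raters). It is a language-neutral illustration in Python, not the R code used in the study; the function name and the example scores are hypothetical, and complete data without missing values are assumed.

```python
def icc_absolute_agreement(ratings):
    """Two-way ICC for absolute agreement.

    ratings: list of rows, one row per rated person, one column per rater.
    Returns (single-measure ICC, average-measure ICC).
    """
    n = len(ratings)      # number of rated persons
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-person mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square

    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


if __name__ == "__main__":
    # Hypothetical total scores of six residents rated by two caregivers
    scores = [[30, 33], [25, 24], [41, 37], [28, 31], [35, 36], [22, 27]]
    single, average = icc_absolute_agreement(scores)
    print(round(single, 2), round(average, 2))
```

Because the between-rater mean square enters the denominator, a systematic difference in level between the two raters lowers these coefficients, which is the sense in which they measure absolute agreement rather than mere consistency.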

Characteristics of the sample
The sample consisted of 73 residents with dementia and 21 caregivers from eight nursing homes (Table 1).

Item distribution
The descriptive investigation of the QoL-AD proxy items (proxy-proxy perspective) demonstrated a balanced distribution (Table 2). The response option "good" was used most frequently whereas the response category "excellent" was used least often. Distributions of the other response options were balanced. Only one item showed a floor effect (item 10, ability to do chores around the house).
Analyses for the QoL-AD NH proxy (proxy-person perspective) yielded similar results. However, no item showed floor or ceiling effects.
Missing value analyses demonstrated low percentages of missing values in general. Only item 12 (Money) of the QoL-AD proxy showed a high percentage of missing values (29%). The main reasons were nurses' refusal to rate this item or a lack of knowledge about residents' financial situation. A descriptive comparison of the eleven comparable items shows no clear pattern that one perspective or QoL scale leads to higher QoL ratings.
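The floor and ceiling criterion used for these item analyses (an item mean in the first or last 20% of the scale) can be written down explicitly. On the four-point response scale from 1 to 4 the scale spans three points, so the floor threshold is 1 + 0.2 × 3 = 1.6. The ceiling threshold of 3.4 in the sketch below is derived by symmetry and is an assumption, as is the function name `classify_item_mean`.

```python
def classify_item_mean(mean, scale_min=1, scale_max=4, fraction=0.20):
    """Label an item mean as floor effect, ceiling effect or acceptable."""
    span = scale_max - scale_min
    # Rounding removes floating-point noise at the thresholds (1.6 and 3.4)
    floor_cut = round(scale_min + fraction * span, 10)
    ceiling_cut = round(scale_max - fraction * span, 10)  # assumed by symmetry
    if mean < floor_cut:
        return "floor"
    if mean > ceiling_cut:
        return "ceiling"
    return "acceptable"


if __name__ == "__main__":
    # Hypothetical item means, for illustration only
    for item_mean in (1.4, 2.6, 3.5):
        print(item_mean, classify_item_mean(item_mean))
```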

Inter-rater reliability
The results of the IRR evaluation for the QoL-AD proxy and the QoL-AD NH proxy are displayed in Table 3.

Item distribution
This study describes a comprehensive evaluation of the item distribution and inter-rater reliability of the German QoL-AD proxy and QoL-AD NH proxy.
The investigation of the item distribution of the QoL-AD proxy demonstrated a balanced distribution of the four response options. Twelve out of 13 items showed an acceptable item difficulty. Only one item (item 10: ability to do chores around the house) showed a floor effect, i.e. an item mean in the first 20% of the scale (< 1.6; see Table 2), and item 12 (Money) showed a high percentage of missing values (29%). One reason for these results for items 10 and 12 might be a missing cross-cultural adaptation of the QoL-AD proxy to the German context and in particular to German nursing homes.
The analysis for the QoL-AD NH proxy yielded similar results with an acceptable item difficulty for all 15 items. With the exception of the identified floor effect for item 10 of the QoL-AD proxy, these descriptive results are in line with previous results [4,5,21,47]. Given the relatively small sample size, the identified floor effect for item 10 of the QoL-AD proxy must be interpreted with caution.
A descriptive comparison of the eleven comparable items shows no clear pattern that one perspective leads to higher QoL ratings. Among the items rated in both perspectives, we identified higher mean values for items 1, 2 and 4 of the QoL-AD (proxy-proxy) and for items 3, 5, 6, 7, 8, 9, 10 and 15 of the QoL-AD NH (proxy-person). We had assumed systematically higher QoL ratings based on a proxy-person perspective compared to a proxy-proxy perspective. This assumption was based on previous studies, which demonstrated systematically lower proxy-based QoL ratings compared to self-ratings [48,49], and on one recent study, which showed a smaller perspective gap between self-ratings and the proxy-person perspective [13]. The major reason for the missing difference between both perspectives might be the only moderate to weak IRR of the applied scales and perspectives.

Inter-rater reliability
The IRR results demonstrate a moderate IRR for the QoL-AD proxy (ICC: 0.65 with 13 items; ICC: 0.63 with 12 items) and an insufficient IRR for the QoL-AD NH proxy (ICC: 0.18). The additional computation of the average-measure ICC for two proxy-raters demonstrated a strong IRR of 0.79 (13 items) or 0.77 (12 items) for the QoL-AD proxy and a weak IRR for the QoL-AD NH proxy (0.31).
The detailed analysis of the IRR of each item yielded heterogeneous results. Based on κ_w and p_o, only item 6 (family) of the QoL-AD proxy demonstrated a moderate IRR. All other QoL-AD proxy items showed fair (items 1: physical health, 2: energy, 3: mood, 5: memory, 7: marriage, 10: ability to do chores around the house, 13: life as a whole) or slight IRR (items 4: living situation, 8: friends, 9: him-/herself as a whole, 11: ability to do things for fun, 12: money). Based on the high number of missing values and the slight IRR of item 12 (money of the resident), we recommend the exclusion of this item for the QoL-AD proxy application in nursing homes.
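As a plausibility check, the single-measure and average-measure coefficients reported here are consistent with each other if one assumes that they are linked by the Spearman-Brown relation, which holds algebraically for the two-way ICC for absolute agreement. With k = 2 raters (the notation below is introduced for this illustration only):

```latex
\mathrm{ICC}_{\text{average}}
  = \frac{k\,\mathrm{ICC}_{\text{single}}}{1 + (k-1)\,\mathrm{ICC}_{\text{single}}},
\qquad k = 2
\\[6pt]
\text{QoL-AD proxy (13 items):}\quad \frac{2 \times 0.65}{1 + 0.65} = \frac{1.30}{1.65} \approx 0.79
\\[4pt]
\text{QoL-AD proxy (12 items):}\quad \frac{2 \times 0.63}{1 + 0.63} = \frac{1.26}{1.63} \approx 0.77
\\[4pt]
\text{QoL-AD NH proxy:}\quad \frac{2 \times 0.18}{1 + 0.18} = \frac{0.36}{1.18} \approx 0.31
```

The gain from averaging two raters is therefore largest in relative terms when the single-measure reliability is low, but it does not lift a coefficient of 0.18 anywhere near the targeted value of 0.70.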
For the QoL-AD NH proxy only items 1 (physical health) and 15 (life overall) yielded a fair IRR. All other items demonstrated a slight IRR.
The in-depth analysis of the IRR indicates the need for improvement of both instruments for their application in research and practice. An improvement of the IRR might be reached through a structured instrument user guide including clear definitions and examples related to the meaning of each QoL-AD proxy and QoL-AD NH proxy item. The positive effect of such a user guide has been demonstrated in a recent IRR study on the dementia-specific QoL instrument QUALIDEM [37].
This study is the first IRR study dealing with the proxy version of the QoL-AD. Therefore, study results can only be compared to the IRR results of other dementia-specific QoL and HRQoL instruments [2]. [50]. Both study results have to be interpreted with caution because of several methodological limitations [2]. The Quality of Life in Late stage Dementia scale (QUALID) demonstrated good IRR with an ICC value of 0.83 [51]. For the QUALIDEM, a recent study yielded a good IRR for all subscales for people with mild to very severe dementia (ICC = 0.79-0.96) with the exception of the subscale negative affect (ICC = 0.64) [37]. These three instruments reflect overall QoL. One frequently used instrument for the assessment of dementia-specific HRQoL is the DEMQOL [17]. Unfortunately, no information was available on the IRR of the DEMQOL proxy version, which would have allowed a comparison to our study results [2]. Due to the heterogeneous range of QoL domains assessed with different QoL and HRQoL instruments, a comparison of IRR results between different instruments is limited. However, IRR results of instruments like QUALIDEM and QUALID demonstrate that good IRR values are achievable for proxy-rated dementia-specific QoL.
Our IRR results for the QoL-AD NH proxy can be compared to one previous US study, which demonstrated a high IRR with an ICC value of 0.99 [31]. In contrast, our IRR results seem low. However, the study by Sloane et al. [31] had several methodological limitations and used a proxy-proxy perspective, whereas our results are based on a proxy-person perspective. The different perspectives also explain the differences between the QoL-AD proxy and the QoL-AD NH proxy in our study.
A comparison of equal items (items 1-6 and 7-10) from the QoL-AD proxy and QoL-AD NH proxy shows lower p_o and κ_w for the items assessed with the proxy-person perspective (QoL-AD NH proxy).
The QoL-AD proxy data were based on a proxy-proxy perspective which means that proxies rate the HRQoL of care recipients from their proxy perspective. This might be an easier perspective compared to the so-called proxy-person perspective as there is no need for a perspective shift by the proxy rater. Nevertheless, proxy-proxy ratings can also be influenced by attitudes [52], life satisfaction, the assessment circumstances and challenging behaviors of people with dementia living in nursing homes [53]. Moreover, HRQoL ratings based on proxy-proxy ratings might not be particularly valid due to the partial loss of subjectivity of the HRQoL assessment [54].
In contrast, the proxy-person perspective requires a change of perspectives. Here, a proxy assesses the HRQoL of a person with dementia as he or she thinks the person with dementia would rate it. It can be assumed that this perspective is more difficult for proxies because the required level of individuality for each HRQoL rating is high.
The results of a recent study [13], which showed a smaller perspective gap between self-ratings and the proxy-person perspective, underpin the need for the further development of the proxy versions of the QoL-AD and QoL-AD NH to enable a reliable assessment from a proxy-person perspective for both instruments.

Limitations
This study is the first investigation of IRR of the QoL-AD based on a proxy-proxy perspective and the QoL-AD NH based on a proxy-person perspective. The strengths of this IRR study are the in-depth analysis of the IRR of each instrument item, the inclusion of people at all stages of dementia and the methodological rigor based on the criteria of the quality appraisal tool for studies of diagnostic reliability (QAREL) as well as the COSMIN group [24]. The following limitations should be considered when interpreting the results.
First, although the preplanned sample size was almost achieved, the widths of the CIs of the computed ICC values range between 0.29 and 0.67 and thus exceed the width of 0.20 targeted in the sample size calculation. This statistical uncertainty has to be taken into account especially when interpreting the IRR of the QoL-AD NH. Our results provide a good basis for sample size calculations of further IRR studies.
Second, the applied data collection procedure led to a QoL-AD proxy and a QoL-AD NH proxy assessment for a resident with dementia by one caregiver on one occasion. Despite the random order of the ratings a possible influence of the rating of one measure on the rating of the second measure cannot be excluded.
Third, proxy-raters' characteristics have to be interpreted with caution due to the high rate of missing values.
Fourth, a close relationship between the proxy and the assessed person with dementia helps proxy raters to reach the level of individuality required for each HRQoL rating. German nursing homes usually use primary nursing care models, and the head nurse was instructed to assign a primary nurse to each assessed resident. Unfortunately, the shortage of nurses in German nursing homes may have influenced the assignments made by the head nurses. A varying understanding of this criterion by the assigning head nurses may be partly responsible for the IRR results.

Conclusions
This IRR study demonstrated a moderate IRR for the QoL-AD based on a proxy-proxy perspective and an insufficient IRR for the QoL-AD NH (proxy-person perspective). According to established cut-off points for the interpretation of IRR values, there is a need for the improvement of both instruments. We recommend the development of a user guide including general instructions for the application of both instruments as well as definitions and examples reflecting the meaning of each item. Priority should be given to the development of reliable proxy-person versions of both instruments. Until a user guide is available, the QoL-AD proxy might be conducted as a collaborative rating by at least two proxy-raters. However, this approach is limited to a proxy-proxy perspective, which means a partial loss of the subjectivity of the HRQoL assessment and, as a result, a reduction in validity. Additionally, we recommend the exclusion of item 12 (money of the resident) for the QoL-AD proxy application in nursing homes.
The QoL-AD NH based on a proxy-person perspective should not be used until a user guide is available and further IRR studies have demonstrated an improved IRR.