Study population
The present study was part of the Zurich Life and Death with Advanced Dementia (ZULIDAD) study, the aim of which was to describe the situation of PAD dying in nursing homes [29]. ZULIDAD was a prospective, multi-perspective, observational study conducted in 11 nursing homes in the Zurich area of Switzerland: 10 municipal nursing homes and one privately managed nursing home specializing in dementia care (Sonnweid AG). Details of the recruitment process are available in the published study protocol [29]. The residents in the participating institutions were screened using the nursing homes’ Resident Assessment Instrument-Minimum Data Set (RAI-MDS) (Swiss version 2.0) databases [30]. The RAI tool was developed to provide a standardized and interdisciplinary approach to care planning in long-term care settings. It consists of several modules, such as the MDS, which contains indicator elements including disease diagnoses, health conditions, cognitive abilities, and nutritional status. Another element is the cognitive performance scale (CPS), which was developed from five MDS items (“comatose,” “short-term memory,” “cognitive skills for decision making,” “making self understood,” and “self performance”). The final CPS score ranges from 0 to 6, with higher scores indicating more severe cognitive impairment [23].
The inclusion criteria were a diagnosis of dementia (RAI-MDS items “Alzheimer’s disease” or “dementia other than Alzheimer’s disease”) and a CPS score of 5 or 6, which indicates severe impairment (“advanced dementia”) [23]. A CPS score of 5 is comparable to a Mini Mental State Examination (MMSE) score of 5 [31]. The exclusion criterion was cognitive impairment due to other conditions, such as a major stroke, tumor, or coma (RAI-MDS item “disease diagnoses”).
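The screening rule described above can be sketched as a simple predicate. This is only an illustration: the `Resident` record and its field names are hypothetical and do not correspond to actual RAI-MDS variable names.

```python
from dataclasses import dataclass


@dataclass
class Resident:
    # Hypothetical fields; the real RAI-MDS database uses its own item codes.
    has_dementia_diagnosis: bool           # "Alzheimer's disease" or other dementia item
    cps_score: int                         # Cognitive performance scale, 0 to 6
    impairment_from_other_condition: bool  # e.g. major stroke, tumor, or coma


def meets_criteria(resident: Resident) -> bool:
    """Apply the inclusion criteria and the exclusion criterion from the text."""
    included = resident.has_dementia_diagnosis and resident.cps_score in (5, 6)
    return included and not resident.impairment_from_other_condition


print(meets_criteria(Resident(True, 5, False)))   # True
print(meets_criteria(Resident(True, 4, False)))   # False: CPS score below 5
print(meets_criteria(Resident(True, 6, True)))    # False: exclusion criterion applies
```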
Of the 1,786 nursing home residents screened, 410 (23.0%) met the inclusion criteria. Of these 410 residents, two were not eligible due to other neurodegenerative diseases, 37 had no healthcare proxies who could be contacted, and 15 died between the eligibility assessment and the start of data collection. Altogether, 356 healthcare proxies (relatives and professionals) were contacted, of whom 126 (35.4%) consented to participation.
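The participant flow can be re-derived from the counts given above. The short sketch below uses only the numbers reported in the text; the variable names are illustrative.

```python
# Counts as reported in the text.
screened = 1786
met_inclusion = 410
other_neurodegenerative_disease = 2
no_healthcare_proxy = 37
died_before_data_collection = 15
consented = 126

# Proxies contacted = residents meeting the criteria minus the three drop-out groups.
proxies_contacted = (
    met_inclusion
    - other_neurodegenerative_disease
    - no_healthcare_proxy
    - died_before_data_collection
)


def percentage(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("Proxies contacted:", proxies_contacted)
print("Share meeting inclusion criteria (%):", percentage(met_inclusion, screened))
print("Consent rate (%):", percentage(consented, proxies_contacted))
```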
Questionnaires and data collection
The MSSE consists of eight items relating to health conditions and two items relating to professional and family estimations of the patient’s suffering. Each item is scored dichotomously as present (1) or absent (0) (yes/no format). The total score ranges from 0 to 10 (0–3 corresponding to low, 4–6 to intermediate, and 7–10 to high levels of suffering) [12]. Since no manual exists for the MSSE, no specific rater training could be performed with the nurses. Instead, all nurses received a 60–90 min one-on-one introduction to the study questionnaire from a researcher. Furthermore, researchers were available throughout the study to answer any questions regarding the questionnaire. Although the Kuder-Richardson Formula 20 (KR-20) is a relevant index to evaluate internal consistency in dichotomously scored scales such as the MSSE, the instrument developers reported Cronbach’s alpha for the original English version of the MSSE, with coefficients of 0.735 and 0.718, respectively, for groups assessed by two physicians [12].
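The scoring rule described above (ten binary items summed into a total, which is then mapped to one of three levels) can be written as a minimal sketch. The function name and the example ratings are illustrative only and are not taken from the study data.

```python
def score_msse(items):
    """Return (total, level) for ten MSSE item scores (1 = present, 0 = absent)."""
    if len(items) != 10 or any(value not in (0, 1) for value in items):
        raise ValueError("expected ten item scores, each 0 or 1")
    total = sum(items)
    if total <= 3:
        level = "low"
    elif total <= 6:
        level = "intermediate"
    else:
        level = "high"
    return total, level


# Made-up example: five of the ten items are rated as present.
print(score_msse([1, 0, 1, 1, 0, 0, 1, 0, 1, 0]))  # (5, 'intermediate')
```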
In addition to the MSSE, two further measures were collected to test construct validity: symptom management, assessed with the Symptom Management–End-of-Life with Dementia (SM-EOLD) scale [32], and global estimations of physical and psychological suffering, assessed with two separate single items. The SM-EOLD consists of nine items, including those relating to pain, shortness of breath, and fear, and is recommended for assessing nursing home residents with dementia [33]. Proxy respondents indicate how frequently they have observed the symptoms in the last four weeks on a six-step scale (“never,” “once a month,” “two or three times a month,” “once a week,” “two or three times a week,” or “daily”), with possible scores ranging from 0 to 45 and higher scores indicating better symptom management. Global physical and psychological suffering were assessed using two separate questions referring to the previous seven days on an 11-step scale ranging from 0 (no suffering) to 10 (highest possible suffering): “How would you rate the extent of the PAD’s physical suffering?” and “How would you rate the extent of the PAD’s psychological suffering?”
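The SM-EOLD total can be illustrated with a short sketch. The point values below are an assumption inferred from the stated range (nine items, 0 to 45) and from the statement that higher scores indicate better symptom management; they also assume that all nine items describe a symptom, so any positively worded item would need the reverse mapping.

```python
# Assumed point values: the rarer a symptom, the more points it earns.
FREQUENCY_POINTS = {
    "never": 5,
    "once a month": 4,
    "two or three times a month": 3,
    "once a week": 2,
    "two or three times a week": 1,
    "daily": 0,
}


def score_sm_eold(responses):
    """Sum the points for nine frequency responses (total between 0 and 45)."""
    if len(responses) != 9:
        raise ValueError("expected nine item responses")
    return sum(FREQUENCY_POINTS[response] for response in responses)


print(score_sm_eold(["never"] * 9))  # 45, the best possible score
print(score_sm_eold(["daily"] * 9))  # 0, the worst possible score
```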
The MSSE, the SM-EOLD, and the two separate questions assessing global physical and psychological suffering were all administered by primary nurses. The primary nursing care system emphasizes person-centered delivery and assigns specific nurses to specific patients [34]. Due to the close relationship that primary nurses develop with their patients, they can be considered reasonably accurate observers of suffering in PAD.
Of the 126 PAD included in the ZULIDAD study, data for two were missing because their primary nurses did not carry out the baseline measurements. Thus, 124 PAD were assessed by 95 primary nurses. Among these nurses, 72 were responsible for one PAD each and 23 for multiple PAD (17 primary nurses for two PAD each and six primary nurses for three PAD each). Of the 124 PAD, SM-EOLD scores were missing for 10 (8.1%). Thus, SM-EOLD scores were available for 114 PAD. The global estimations of physical and psychological suffering were available for all 124 PAD. All data presented in this article are from baseline measurements collected between December 2013 and December 2014. Permission to use and translate the instruments for this study was obtained from the original authors of the MSSE (Dr Aminoff) and the SM-EOLD (Dr Volicer).
German translation of the MSSE
The English version of the MSSE [12] was translated into German and cross-culturally adapted to the German context based on the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) guidelines [35]. Two researchers performed independent forward translations and reconciled them into one forward translation, which was back-translated by a native English speaker with a professional background in dementia care. All three translators reviewed the back-translation and harmonized the German version with it. Three experts (a general practitioner, a senior long-term care nurse, and a family member of a PAD) then reviewed the harmonized German version for comprehensibility, relevance, face validity, and intertranslation validity and finalized it.
Psychometric scale performance analysis
Scale development involves complex and systematic procedures grounded in theoretical and methodological rigor in relation to the measurement problem at hand. The theoretical model serves as a guide for conceptual formulations and the definition and operationalization of the phenomenon to be measured [36]. To assess whether the MSSE questionnaire matched the intended goals of the developers, we first analyzed the theoretical and conceptual framework of the MSSE questionnaire, how the latent construct of suffering was defined and operationalized, and whether the item generation was based on deduction, induction, or a combination of the two methods.
Clinically useful measures should exhibit minimal floor and ceiling effects, which are considered to be present when more than 15% of the persons assessed achieve the lowest or highest possible total score, respectively [37]. When such effects occur, patients with the lowest (0) or highest (10) possible total MSSE score cannot be distinguished from each other with respect to their level of suffering.
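The 15% rule can be checked directly from a list of total scores. The following is a minimal sketch with made-up scores; the function name and its defaults (score range 0 to 10, threshold 0.15) are illustrative.

```python
def floor_ceiling_effects(totals, min_score=0, max_score=10, threshold=0.15):
    """Report the share of persons at the lowest and highest possible total score."""
    n = len(totals)
    floor_share = sum(1 for total in totals if total == min_score) / n
    ceiling_share = sum(1 for total in totals if total == max_score) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        # An effect is flagged only when the share exceeds the threshold.
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical total scores for ten persons (not study data).
example_totals = [0, 0, 1, 2, 3, 4, 5, 6, 7, 10]
print(floor_ceiling_effects(example_totals))
```

In this made-up example, two of ten persons (20%) have the lowest score, so a floor effect is flagged, whereas one of ten (10%) has the highest score, which stays below the threshold.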
Structural validity was evaluated by confirmatory factor analysis (CFA) to assess whether the scores of the MSSE instrument would confirm the predefined unidimensionality of the construct of suffering [38]. We also conducted a Mokken scale analysis (MSA) [39] to assess the assumption of unidimensionality. The investigation of MSA models is suitable when the number of items in a questionnaire is low [40], as with the MSSE.
Construct validity, defined as the degree to which the scores of an instrument are consistent with hypotheses based on the assumption that the instrument validly measures the intended construct, was assessed using convergent validity and divergent validity [41]. Convergent validity refers to how strongly a scale correlates with related variables or other related measures, and divergent validity refers to a scale’s lack of correlation with dissimilar variables or unrelated measures [18].
For convergent validity, we hypothesized that the total MSSE score would correlate highly with the global estimations of physical and psychological suffering. For divergent validity, we hypothesized that the total MSSE score would correlate only weakly with the SM-EOLD total score.
Statistical analysis
All analyses were performed with Stata, version 16.1 for Mac [42]. Missing data were not imputed, and omitted items were excluded from the analysis. The measurement properties of the scores produced by the instruments were assessed using several indices [38, 43].
Descriptive analysis was used to calculate the mean scores (M) and standard deviations (SD) of the sociodemographic and clinical variables of PAD, the sociodemographic variables of the primary nurses, the MSSE total scores, the SM-EOLD total scores, and the two separate single items for the global estimations of physical and psychological suffering. Nominal data were reported as frequencies (numbers, percentages).
The internal consistency reliability of the dichotomously scored MSSE items was measured using the Kuder-Richardson Formula 20 (KR-20). A value of > 0.7 was expected [18, 44].
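For reference, the KR-20 for k dichotomous items is k/(k − 1) × (1 − Σ p_i q_i / σ²), where p_i is the proportion of persons scoring 1 on item i, q_i = 1 − p_i, and σ² is the variance of the total scores. The sketch below is an illustration in Python, not the Stata procedure used in the study. It assumes the population variance (dividing by n) for the total scores, and the example data are made up.

```python
from statistics import pvariance


def kr20(rows):
    """KR-20 for a list of respondents, each a list of 0/1 item scores."""
    n = len(rows)
    k = len(rows[0])
    # Proportion of respondents scoring 1 on each item.
    p = [sum(row[i] for row in rows) / n for i in range(k)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    # Population variance of the total scores (assumption; must be non-zero).
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


# Hypothetical responses of four persons to three items.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(example), 3))  # 0.75
```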
To assess structural validity, a CFA was used. Variation and covariation among the 10 items were evaluated using fit indices for a reflective one-factor structure model. We calculated the following fit indices: the root mean square error of approximation (RMSEA) with a 90% confidence interval (CI), the comparative fit index (CFI), and the Tucker–Lewis index (TLI). Although cutoff rules are still under discussion, it has been suggested that a CFI of > 0.95 and an RMSEA of < 0.06 indicate an acceptable fit for binary variables [45]. TLI values close to 0.95 are considered to demonstrate an acceptable fit. A factor loading of > 0.5 was expected for each item. In the MSA, scalability was measured using Loevinger’s coefficient H. By convention, the strength of a scale is considered weak (0.3 ≤ H < 0.4), moderate (0.4 ≤ H < 0.5), or strong (0.5 ≤ H ≤ 1.0) [40].
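To make the scalability coefficient concrete, the sketch below computes Loevinger’s H for dichotomous items as the sum of the observed inter-item covariances divided by the sum of the maximum covariances attainable given the item proportions. It is an illustration only, not the routine used in the study. The label returned for H < 0.3 is an assumption, because the convention above defines only the three higher bands.

```python
from itertools import combinations


def loevinger_h(rows):
    """Scale-level H for a list of respondents, each a list of 0/1 item scores."""
    n = len(rows)
    k = len(rows[0])
    p = [sum(row[i] for row in rows) / n for i in range(k)]
    observed = 0.0
    maximum = 0.0
    for i, j in combinations(range(k), 2):
        p_both = sum(row[i] * row[j] for row in rows) / n
        observed += p_both - p[i] * p[j]
        # With fixed marginals, P(both = 1) can be at most min(p_i, p_j).
        maximum += min(p[i], p[j]) - p[i] * p[j]
    return observed / maximum


def scale_strength(h):
    """Classify H using the conventional bands quoted in the text."""
    if h >= 0.5:
        return "strong"
    if h >= 0.4:
        return "moderate"
    if h >= 0.3:
        return "weak"
    return "below the weak threshold"  # assumed label for H < 0.3


# Made-up responses forming a perfect Guttman pattern, for which H equals 1.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
h = loevinger_h(example)
print(round(h, 3), scale_strength(h))  # 1.0 strong
```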
Convergent and divergent validity were determined using Pearson’s correlation coefficient (r) [38]. The size of a correlation was considered small (0.1 ≤ |r| < 0.3), moderate (0.3 ≤ |r| < 0.5), or high (|r| ≥ 0.5) [46]. A two-tailed p value of < 0.05 was considered statistically significant.
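As an illustration of this step, the sketch below computes Pearson’s r and maps it to the effect-size bands. Two choices are assumptions: a value lying exactly on a threshold is assigned to the higher band, and the classification uses the absolute value of r. The label "negligible" for values below 0.1 is also assumed, and the example data are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def effect_size(r):
    """Classify |r|; threshold values go to the higher band (assumption)."""
    size = abs(r)
    if size >= 0.5:
        return "high"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumed label for |r| < 0.1


# Hypothetical total scores and global suffering ratings (not study data).
totals = [2, 4, 5, 7, 9]
ratings = [1, 3, 6, 6, 8]
r = pearson_r(totals, ratings)
print(round(r, 2), effect_size(r))
```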