Measurement properties, feasibility and clinical utility of the Doloplus-2 pain scale in older adults with cognitive impairment: a systematic review

Background The Doloplus-2 is a scale for assessing pain in older adults with cognitive impairment. It is used in clinical practice and research. However, evidence for its measurement properties, feasibility and clinical utility remains incomplete. This systematic review synthesizes previous research on the measurement properties, feasibility and clinical utility of the scale.
Method We conducted a systematic search in three databases (CINAHL, Medline and PsycINFO) for studies published in English, French, German, Dutch/Flemish or a Scandinavian language between 1990 and April 2017. We also reviewed the Doloplus-2 homepage and reference lists of included studies to supplement our search. Two reviewers independently reviewed titles and abstracts and performed the quality assessment and data abstraction.
Results A total of 24 studies were included in this systematic review. The quality of the studies varied, but many lacked sufficient detail about the samples and response rates. The Doloplus-2 has been studied using diverse samples in a variety of settings; most study participants were long-term care residents and people with dementia. Sixteen studies addressed various aspects of the scale's feasibility and clinical utility, but their results are limited and inconsistent across settings and samples. Support for the scale's reliability, validity and responsiveness varied widely across the studies. Generally, the reliability coefficients reached acceptable benchmarks, but the evidence for different aspects of the scale's validity and responsiveness was incomplete.
Conclusion Additional high-quality studies are warranted to determine in which populations of older adults with cognitive impairment the Doloplus-2 is reliable, valid and feasible. The ability of the Doloplus-2 to meaningfully quantify pain, measure treatment response and improve patient outcomes also needs further investigation.
Trial registration PROSPERO reg. no.: CRD42016049697, registered 20 Oct. 2016.
Electronic supplementary material The online version of this article (10.1186/s12877-017-0643-9) contains supplementary material, which is available to authorized users.


Background
Cognitive impairment is increasing globally [1], as is the global population over 60 years old [2]. Pain is a well-documented, highly prevalent issue in older adults with cognitive impairment, who often suffer from conditions like musculoskeletal disorders, malignancy, gastrointestinal and cardiac conditions [3][4][5]. It is estimated that at least 50% of older adults with cognitive impairment residing in long-term care (LTC) facilities have pain on a regular basis [6,7].
Pain assessment is essential for adequate pain management [7,8], but assessing pain in older adults with cognitive impairment remains a challenging issue due to impaired memory, changes in cognitive processing, and a reduced ability or inability to communicate verbally [7,9]. Thus, caregivers may need alternative methods to obtain information about the person's pain. When older adults with cognitive impairment cannot report pain themselves, the next best option, the so-called 'silver standard', is assessment by the person who is most familiar with the patient's everyday life [10]. However, previous research has reported that pain assessment in older adults with cognitive impairment often depends on a health care provider's (HCP) subjective impression and occasionally appears to be mere guesswork [11,12]. Therefore, in clinical practice, it may be useful for HCPs to use pain assessment tools that account for the population's distinctive characteristics. However, pain assessment tools are used infrequently, which may contribute to the fact that un(der)managed pain remains a major problem in this population [6,13,14]. Furthermore, there is limited evidence regarding the measurement properties, feasibility and clinical utility of pain assessment tools for older adults with cognitive impairment. Currently, no single tool is recommended [9,15,16]. However, a 2014 meta-review that reviewed 28 tools developed specifically for pain assessment in people with dementia identified the Doloplus-2 pain scale as one of the better tools currently available [9].
The Doloplus-2 is based on the Doloplus, which was developed by Wary et al. in 1993 [17]. The Doloplus was based on a tool that used behaviour to assess pain in children with neoplastic disease (the Douleur Enfant Gustave Roussy scale). The Doloplus assessed pain in older people with verbal communication difficulties by assessing their behaviour using three subscales: somatic, psychomotor and psychosocial reactions to pain. Each subscale included five items (for a total of 15 items), and each item received a score of 0, 1 or 2 [18]. In 1994, a network of geriatricians from Switzerland and France began developing the Doloplus-2, based on the Doloplus. The Doloplus-2 has the same three subscales, but the total number of items was reduced to ten:

1) Somatic reaction to pain includes five items: 'somatic complaints', 'protective body postures adopted at rest', 'protection of sore areas', 'expression' and 'sleep pattern'.
2) Psychomotor reaction to pain includes two items: 'washing and/or dressing' and 'mobility'.
3) Psychosocial reaction to pain includes three items: 'communication', 'social life' and 'problems of behaviour'.
The ten items on the Doloplus-2 are scored from 0 to 3; higher scores represent more intense pain [19]. The total score can range from 0 to 30. The score for the somatic reactions subscale ranges from 0 to 15, the psychomotor reaction subscale ranges from 0 to 6, and the psychosocial subscale ranges from 0 to 9. If the rater considers an item inappropriate, the item is not scored. A combined score of 5 or higher suggests the presence of pain [19].
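The scoring rules described above can be sketched in a few lines of Python. This is only an illustration: the item keys, the function name `score_doloplus2` and the choice to let unscored items contribute nothing to the sums are assumptions made for the example, not part of the scale's official documentation.

```python
from typing import Dict, Optional

# Hypothetical item keys, grouped by subscale as described in the text
# (5 somatic, 2 psychomotor and 3 psychosocial items, each scored 0-3).
SUBSCALES = {
    "somatic": [
        "somatic_complaints",
        "protective_postures_at_rest",
        "protection_of_sore_areas",
        "expression",
        "sleep_pattern",
    ],
    "psychomotor": ["washing_dressing", "mobility"],
    "psychosocial": ["communication", "social_life", "problems_of_behaviour"],
}
CUT_OFF = 5  # a total of 5 or higher suggests the presence of pain


def score_doloplus2(items: Dict[str, Optional[int]]) -> Dict[str, object]:
    """Return subscale sums, the total score and the cut-off flag.

    A missing key or a value of None marks an item the rater did not score.
    Assumption: such items simply add nothing to the sums.
    """
    result: Dict[str, object] = {}
    total = 0
    for subscale, names in SUBSCALES.items():
        subtotal = 0
        for name in names:
            value = items.get(name)
            if value is None:
                continue  # item judged inappropriate and left unscored
            if value not in (0, 1, 2, 3):
                raise ValueError(f"{name}: score must be 0, 1, 2 or 3")
            subtotal += value
        result[subscale] = subtotal
        total += subtotal
    result["total"] = total
    result["pain_suggested"] = total >= CUT_OFF
    return result
```

With every item scored 3, the sketch yields subscale sums of 15, 6 and 9 and a total of 30, matching the ranges given above.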
The Doloplus-2 covers most of the pain behaviour categories recommended in the American Geriatrics Society's guidelines for 'The management of persistent pain in older persons' [20]; only 'change in mental status' is missing. The Doloplus-2 includes the categories 'facial expression', 'verbalizations/vocalization', 'body movements', 'changes in interpersonal interactions' and 'changes in activity patterns or routines'. The Doloplus-2 indicates a progression of pain rather than pain experienced in a specific moment [16]. An HCP (e.g. physician, registered nurse, nursing assistant) who knows the patient well should score the Doloplus-2. According to the developers, a trained HCP can complete the scale in approximately five minutes [17]. The Doloplus-2 was officially validated in 1999 and was published in English in 2001 [17,19]. The tool has since been translated into many different languages [21][22][23][24].
Several reviews of pain assessment tools for older adults with cognitive impairment have been published, including a meta-review [9]. Some of these include the Doloplus-2 [15,16,25-27]. However, more studies on the Doloplus-2 have been published since the last systematic review in 2012 (these reviewers conducted a systematic search up to 2010) [26]. The Doloplus-2 is one of the more extensively tested tools for pain assessment [9,15], and it has been identified as one of the most promising tools for pain assessment in older adults with cognitive impairment [9]. Furthermore, the scale is used in clinical practice and research across the world. For this reason, this review focuses solely on the Doloplus-2. It seeks to thoroughly examine the scale's feasibility, clinical utility and measurement properties when used to assess pain in older adults, as this evidence remains incomplete. A feasible, useful and accurate scale is essential to ensure that older adults in pain are correctly identified as such, consistently and over time. Furthermore, for a pain scale to guide pain management decisions and support efficient evaluations, it must be actionable and easy to interpret, and it cannot take so many resources that it disrupts clinical care. Therefore, this systematic review examines the feasibility, clinical utility and measurement properties of the Doloplus-2 scale when used to assess pain in older adults with cognitive impairment.

Method
This systematic review was prospectively registered with PROSPERO under reg. no. CRD42016049697. The PRISMA guidelines for reporting on systematic reviews were followed. Due to the clinical, methodological and statistical heterogeneity of the included studies, a descriptive approach was adopted in the research synthesis.

Data sources and search strategy
A systematic search was conducted in CINAHL (March 2016), Medline (August 2016) and PsycINFO (September 2016) in collaboration with a research librarian. The search strategy was formulated in CINAHL and adapted for Medline and PsycINFO, using keywords, Boolean operators and each database's controlled vocabulary. The results were limited to the period from 1990 to the dates the searches were performed (Additional file 1).
In addition to the systematic search, a search for the keyword 'Doloplus' was performed in the three databases (February 2017). In CINAHL, 'all text' was selected so that the entire article text was searched for the term 'Doloplus'. Medline and PsycINFO do not have the 'all text' option for searching with keywords, so only titles and abstracts were searched for the keyword. The systematic and keyword searches in all three databases were saved immediately, and e-mail alerts were set up for every search. We received automatic e-mail notifications from all three databases whenever a new publication matching our search criteria (for the systematic or the keyword search) became available in the database. These monthly auto-alerts were reviewed until April 2017, and articles which met the inclusion criteria were included in this review.
In addition to the database searches, the list of previous publications (including publications from 1993 to 2008) provided on the Doloplus-2 online home page was reviewed. Articles which met the inclusion criteria were included.

Eligibility criteria
A study was eligible for inclusion if it: i) used the Doloplus-2 to assess pain in cognitively impaired patients (any stage) aged 65 and older; and ii) was published in English, French, German, Dutch/Flemish or a Scandinavian language. Studies in which the Doloplus-2 was described but not used were excluded, as were studies in which the scale was used to validate other observational pain assessment tools. Dissertations, editorials, guidelines and expert opinion papers were excluded as well. Literature reviews were also excluded since they do not contain original data.

Process of study selection
The studies were selected in two steps. First, two reviewers independently screened the titles and abstracts to determine the studies' eligibility for inclusion. Discrepancies and uncertainties were discussed by the review team until a consensus was reached. In the second step, two reviewers independently assessed the full text of the articles for eligibility. The reference lists of the included articles were also reviewed for additional eligible studies to supplement the data sources previously described.

Quality assessment
Two reviewers independently assessed the quality of the included studies using the Mixed Methods Appraisal Tool (MMAT) [28]. The 2011 version of the MMAT allows for the description and appraisal of the methodological quality of five types of studies: i) qualitative, ii) quantitative randomized controlled trials, iii) quantitative non-randomized, iv) quantitative descriptive, and v) mixed methods. Each type has its own set of quality criteria. The criteria are scored 'yes', 'no' or 'can't tell', followed by comments. The MMAT's inter-rater reliability is moderate to excellent [29]. Since this is the first systematic review of the Doloplus-2, we wanted to provide a comprehensive review of the scale, so no study was excluded based on the quality assessment.

Data abstraction
All the reviewers used a standardized data abstraction sheet. Two reviewers independently abstracted information from the studies, including study objective, setting, sample characteristics, how the Doloplus-2 was administered and the results of the assessment, and clinical utility and feasibility data. Feasibility was defined as the time and resources required to collect and process the assessment, encompassing ease of use, the need for staff training, and the time required to complete the assessment [30]. Clinical utility was defined as 'usefulness to clinical practice': the scale's usefulness in identifying pain and whether the result of the assessment could assist clinical decisions (e.g. administration of analgesics) [10]. Information about the Doloplus-2's measurement properties was also abstracted. As a guide for abstracting data on measurement properties, we used the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes [31]. Different authors propose various criteria for assigning strength of association to particular values, but we chose the guidelines for instrument reliability and precision suggested by Hahn et al. [32].

Results
A total of 2692 citations were initially identified for possible inclusion through the systematic search of the three databases. The citations were transferred into EndNote and duplicates were removed; 2131 unique citations remained (Box A). An additional 649 publications were identified through other sources (Box B). There were so many additional publications because the other sources were manually screened, and we did not have a reference system to remove duplicates or those already retrieved through the systematic search. In total, 2780 publications were screened. After the titles and abstracts were reviewed, 42 full-text studies were assessed for eligibility. We were unsure whether five articles met the eligibility criteria, and we attempted to contact the corresponding authors via e-mail. For two of those, no e-mail address was found. Of the three authors contacted, two did not respond, and one provided sufficient information [33]. Consequently, four studies were excluded because we were unable to determine whether they fulfilled the eligibility criteria [34][35][36][37]. Fourteen more studies were excluded based on a review of the full text (see Fig. 1). Articles reporting on the same research project but describing different or new results were included as separate sources ([22,38], [39][40][41] and [42,43]). A qualitative synthesis was conducted on a total of 24 studies.
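The selection counts reported above can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative, and the number of duplicates removed is derived here rather than reported by the authors.

```python
# Counts taken from the text; variable names are illustrative.
identified_in_databases = 2692
unique_after_deduplication = 2131  # Box A
identified_other_sources = 649     # Box B

# Derived, not reported in the text: duplicates removed from the database hits.
duplicates_removed = identified_in_databases - unique_after_deduplication

# Publications screened on title and abstract.
screened = unique_after_deduplication + identified_other_sources

full_text_assessed = 42
excluded_eligibility_unclear = 4   # eligibility could not be determined
excluded_after_full_text = 14      # excluded on full-text review

included = (
    full_text_assessed
    - excluded_eligibility_unclear
    - excluded_after_full_text
)

print(f"duplicates removed: {duplicates_removed}")
print(f"screened: {screened}")
print(f"included in synthesis: {included}")
```

The derived totals agree with the 2780 screened publications and 24 included studies reported in the text.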

Feasibility
Fifteen studies reported, in varying detail, that the raters received some form of training in how to use the Doloplus-2 to collect data [21-23, 38, 42, 43, 45-50, 53-55]. Nine of the studies included clear (but brief) information about the content of the training [21-23, 38, 42, 45, 46, 50, 54]. The training method was reported in nine studies [22,38,42,43,45,46,49,50,54], and six reported the duration or amount of training [7,22,38,45,46,49]. Every study that described the trainer reported that a member of the research team provided the training [22, 39-43, 46, 48, 49, 54]. Two studies simply mentioned that training was provided without providing any details [47,55], and one [53] referred to the procedure of another study [50]. In one study, raters gave feedback on the importance of being trained in data collection using the Doloplus-2 and of knowing the patients' normal behaviour in order to use the Doloplus-2 correctly [21].
Ten studies specified that the raters were familiar with the patients' normal behaviour [21-23, 38, 42, 43, 45, 47, 48, 50]. In the remaining studies, this was not clear or not reported. Most of the Doloplus-2 assessments were conducted by a person with a background in nursing [21-24, 33, 38-43, 45-49, 53, 54], sometimes in collaboration with research assistants (RA) or a researcher. In other studies, physicians [50] or an occupational therapist [56] performed the assessments. A description of the raters was not provided or was unclear in four studies [44,51,52,55]. One study reported the initial impact of nurses' qualifications: more highly qualified nurse raters tended to assign higher pain ratings on the Doloplus-2. The effect of nurse qualifications seemed to disappear with repeated use of the scale, and the number of raters did not bias the result [48].
On average, it took raters five to ten minutes per patient to complete the Doloplus-2 [49,50,53]. The raters thought that the scale's administrative burden was small [21]. They also thought that the Doloplus-2 was feasible [23] and easy to use [50,53] and that the manual was clear [24].

Clinical utility
In one study, after a year of regular Doloplus-2 assessments, patients' pain scores decreased significantly, and HCPs' use of analgesic therapy with non-opioids (Step 1 of the WHO pain ladder) increased significantly, from a baseline of 30% to 100% [53]. In another pre- and post-test study, participants in the experimental group were assessed with the Doloplus-2 and received significantly more analgesics than the control group, which was not assessed with the Doloplus-2 [54].
Some studies also evaluated the Doloplus-2's usefulness. One study found that the scale was useful in assessing pain [22], whereas another study reported that the Doloplus-2 was the least useful of the three pain scales evaluated [24]. The scale has been reported to facilitate valuable discussions about patients [21]. Raters using the Doloplus-2 stated that the psychosocial items were difficult to understand and score [22,24] and that these items should be cautiously scored because abnormal social reactions can also be caused by dementia [21]. Furthermore, the highest congruency between Doloplus-2 scores over 5 and registered nurses (RN) reporting 'don't know' when proxy-rating pain was found on the psychosocial subscale [42].
When comparing the Doloplus-2 with other methods used to assess pain in older adults with cognitive impairment, one study in a nursing home found that nurses evaluated significantly more patients as having pain when using the Doloplus-2 than when proxy-rating pain. With proxy-rating alone, nurses were not able to say whether one-third of the patients appeared to be in pain [42]. A second study found that patients reported more pain using the Visual Analogue Scale (VAS) than nurses did using the Doloplus-2 [49]. The same study also found that of all the patients who self-reported pain, only one in five scored ≥5 on the Doloplus-2. This raises the question of whether the cut-off score should be adjusted [42,49]. The different study populations (verbal and nonverbal) may explain the different results. It is possible that pain behaviour in people who are able to self-report is different from that of people who cannot self-report due to more advanced cognitive impairment.

Measurement properties
Seventeen studies reported on one or more measurement properties of the Doloplus-2 (Table 3).

Reliability
Internal consistency The Cronbach's alpha for the total scale ranged from 0.67 [49] to 0.84 [33,49], indicating low to moderately good internal consistency across settings. The alpha coefficients for the total scale did not increase when any of the items were deleted [22], but they were lower for patients with dementia than for those who were not cognitively impaired [49]. The items in the Doloplus-2 are heterogeneous, so they are not expected to correlate well with each other since they reflect a variety of dimensions [42].
The Cronbach's alpha for the subscales ranged from low to moderate or good internal consistency in the different settings, including nursing homes (0.60 to 0.84) [22,42].
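For readers unfamiliar with the statistic, Cronbach's alpha for a scale with k items is conventionally computed as k/(k-1) × (1 − Σ item variances / variance of the total score). The following is a minimal sketch of that standard formula using sample variances; the function name and the data layout (one row of item scores per patient) are assumptions for the example, and it does not reproduce the analysis of any included study.

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha for a patients-by-items score matrix.

    scores[p][i] is the score of patient p on item i (hypothetical layout).
    Uses sample variances (n - 1 in the denominator).
    """
    if len(scores) < 2:
        raise ValueError("at least two patients are required")
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in scores):
        raise ValueError("every patient needs a score for every item")

    # Variance of each item across patients.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the summed (total) score across patients.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

Items that rise and fall together across patients push alpha towards 1, whereas unrelated items push it towards 0, which is why a heterogeneous scale such as the Doloplus-2 would not be expected to reach very high values.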
Test-retest reliability Test-retest reliability was high to excellent in one study in a hospital setting (Intraclass Correlation Coefficient (ICC) = 0.96) [49]. The test-retest reliability for multilingual versions of the test in multiple settings was moderately good to high or excellent; the ICC ranged from 0.62 (the Dutch version) to 0.98 (the Italian version) [50].
Inter-rater reliability Inter-rater reliability was tested using different statistical techniques (ICC, Pearson correlation, Kappa statistics, Wilcoxon signed rank, paired t-test, matching scores) [22,23,47,48,50]. Agreement among raters ranged from 0.73 [48] to 0.97 [50], indicating moderately good to high or excellent inter-rater reliability across settings. Agreement for the subscales ranged from 0.60 to 0.84 [22]. One study compared pain level categorizations (the Doloplus-2 total score was used to classify patients into groups with mild, moderate or severe pain) across raters and found moderately good agreement (0.42 and 0.50) on two testing occasions [48]. The mean κ values for pairs of raters at each pain intensity level (mild, moderate, severe) increased as pain intensity increased (from mild 0.04 to severe 0.38) [51]. High-intensity behaviour is more obvious and most likely easier for raters to spot and agree on. One study found no statistically significant differences between the two raters in the total score [33]. Another study found no difference between mean total scores for RA-RN pairs but found a statistically significant difference between the mean total scores of RA-Nursing Assistant (NA) pairs; the NAs reported more pain cues than the RAs [38]. In another study, the proportion of matching scores between researchers and RNs was 77.5% (p < 0.01) [23].
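As an illustration of one of the agreement statistics mentioned above, the sketch below computes Cohen's kappa for two raters who each assign every patient to a pain category. The included studies do not state which kappa variant or software they used, so the unweighted two-rater form, the function name and the category labels are all assumptions made for the example.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters rating the same patients."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of patients")
    n = len(rater_a)

    # Observed proportion of patients placed in the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (observed - expected) / (1 - expected)
```

A value of 1 means the raters agreed on every patient, whereas a value near 0 means they agreed no more often than their category frequencies alone would predict.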

Content validity
The degree to which the (items of an) instrument seems to be an adequate reflection of the construct to be measured was only addressed in one study, which reported that the scale pinpoints important pain clues [21].
Construct validity A 1-factor solution was the best description in two studies using exploratory factor analysis [33,48]. In a study using principal component analysis, items loaded on three factors, and each item correlated with the subscale to which it originally belonged as well as with the overall scale [22]. A single-factor model best described the correlation between the Doloplus-2 and two other observational pain assessment tools (the Abbey Pain Scale and the Checklist of Nonverbal Pain Indicators), indicating that these scales measure the same single construct [48].
Cross-cultural validity was examined in three studies. In these, a group of experts or the raters of the scale reviewed the content of the translated versions of the Doloplus-2 [21][22][23].
To consider 'hypothesis testing', one study examined the correlations between the Doloplus-2 and the so-called 'known correlates of pain'. This study found a statistically significant correlation between the Doloplus-2 and functional ability and depression in dementia [22]. Another study reported that there was no statistically significant difference between mean scores on the Doloplus-2 facial items across different levels of pain intensity [51]. A known-groups technique was used to compare the Doloplus-2 scores of a 'no pain' group and a 'daily pain' group. This study found that the mean score was obviously higher in the 'daily pain' group than in the 'no pain' group. Another study reported low correlations between the Doloplus-2 and other measures of pain (the Pain Assessment Checklist for Seniors with Limited Ability to Communicate, the Pain Assessment in Advanced Dementia, the Visual Analogue Scale (VAS) and the Verbal Rating Scale) [24]. However, it is possible that self-rated pain, hypothesized correlates and other observational measures of pain assess different dimensions of pain than the Doloplus-2 [22,48]. One study reported that several items on the Doloplus-2 are related to delirium, depression and/or the severity of dementia; item 10 ('Problems of behaviour') on the psychosocial subscale appears to be the least specific [46].

Table 3 (excerpt)
Study not named in this excerpt
The translation was approved by all administrators. No item was pointed out as confusing, difficult to understand or otherwise problematic.
Criterion (concurrent): Experts' pain rating with the NRS-11 was used as a pain criterion. The experts rated 25 patients as pain free, where the Doloplus-2 made five false positives with scores of 5 and 6. Of the 59 cases, the Doloplus-2 made false negatives on 10 occasions: a Doloplus-2 score < 5 at the same time as the expert rated above 0 on the NRS-11. In five of these cases, the expert's score was one half (usually 0 at rest and 1 in movement), three had a score of 1 and the remaining two were rated with 2 and 3 on the NRS-11. The Doloplus-2 explained 62% (R²) of the pain distribution. For 85% of the assessments, the Doloplus-2 score (0-30) multiplied by 0.25 (beta) corresponded to the expert score ± 1 unit on the 0-10 NRS scale. Facial expression alone explained 48% (R² = 0.48) of the experts' scores. When including the items Protective body postures at rest, Communication and Somatic complaints, these four items explained 68% of the total variability in the experts' scores.
Hølen, 2007, Norway [47]
Reliability (inter-rater): Agreement between a geriatric specialist nurse and an enrolled nurse on the total score was 0.77 (ICC), with a 95% CI of 0.47-0.92. Assessed in the 16 patients included at the geriatric hospital unit.
Criterion (concurrent): The pain criterion was the specialist nurse (pain expert) who made a single evaluation of each patient's pain level on the NRS-11. Doloplus-2 scores against the expert scores produced an R² = 0.023, implying poor criterion validity of the Doloplus-2 when compared to the pain expert's evaluation. An association was found between the pain expert and the geriatric expert nurse who administered the Doloplus-2 in 16 patients in the hospital, R² = 0.54.
NR; NR
Monacelli, 2013, Italy [53]
NR; NR
Reduction of total mean score between the first assessment and after 1 year of follow-up (Wilcoxon rank test), R² = 0.216, p < 0.001
NR
Neville, 2014, Australia [48]
Internal consistency: Cronbach's alpha for the two rater groups on the two assessment occasions was 0.86 and 0.87.
Reliability (test-retest): Agreement for the two testing occasions occurring two weeks apart. Pearson correlation 0.71 for both rater groups.
Reliability (inter-rater): (no entry in this excerpt)
Criterion (concurrent): Pain criterion was RNs' initial yes/no rating of the residents' pain. Pearson correlation for each rater group at the first testing occasion showed moderate correlations at 0.43 (rater group 1) and 0.45 (rater group 2).
Construct (structural): EFA showed a 1-factor solution was the best description of the factor structure of the Doloplus-2.
Criterion validity Five studies reported on the correlation between the Doloplus-2 and a 'gold standard' or 'pain criterion' [33,42,48,49,51]. A moderately high correlation (Spearman 0.7) was reported for the University of Alabama Birmingham Pain Behaviour Scale [33].
One study reported a low correlation (Pearson 0.4) with RNs' yes/no rating of patient pain [48], and another study found that significantly more patients were evaluated as experiencing pain when using Doloplus-2 than with RNs' proxy rating of pain [42]. No significant correlations were observed between the Doloplus-2 and the Facial Action Coding System at any level of pain intensity (mild, moderate or severe) [51].
One study reported a low correlation (Spearman 0.46) with patients' self-assessment (VAS), but the correlation was higher in patients without dementia than in patients with dementia. Moreover, the Doloplus-2 predicted 41% of the variability in pain intensity as measured by the VAS, with the somatic dimension explaining the most [49]. Two studies compared the Doloplus-2 to experts' pain ratings on the Numeric Rating Scale (NRS)-11. One found that the criterion validity of the Doloplus-2 was satisfactory and that the Doloplus-2 explained 62% of the experts' pain score; the item 'facial expression' alone explained 48% of the experts' scores [21]. The second study that used pain experts found no association between the experts' ratings and the Doloplus-2 scores [47]. However, in this study, the criterion validity increased when the Doloplus-2 was administered by a specialized geriatric nurse [47].

Responsiveness
Four studies examined the ability of the Doloplus-2 to detect changes in pain over time [53][54][55][56]. One study reported a statistically significant reduction in the total mean score after one year of monthly assessments [53], while three studies demonstrated a statistically significant reduction in the total [54][55][56] and subscale scores [55] post-treatment.

Discussion
This review synthesizes the available research on the feasibility, clinical utility and measurement properties of the Doloplus-2 pain scale in older adults with cognitive impairment. Previous reviews have concluded that there is limited evidence for the feasibility, clinical utility, and validity of the measurement properties of pain assessment tools for older adults with cognitive impairment [9,15]. Based on the 24 studies summarized in this review, we draw a similar conclusion for the Doloplus-2. Of the studies evaluated, only four were assessed as high-quality studies based on the MMAT. There were significant variations in the designs and methods of analysis in the included studies. The majority were performed in LTC settings with patients with cognitive impairment and used small, heterogeneous samples, which limited the possibility of sub-group analyses. Consequently, it is difficult to draw conclusions about the suitability and effectiveness of the scale in various subpopulations (i.e. varying types and degrees of cognitive impairment). Furthermore, the methods of assessing pain with the Doloplus-2 varied across the studies. There was considerable variation in how the studies reporting on at least one of the COSMIN measurement properties assessed reliability, validity and responsiveness. The same was true of the handful of studies that explicitly assessed feasibility and clinical utility, which also used small samples.
Because older adults with cognitive impairment (especially in the severe stage) often have a limited ability to communicate pain, their expressions of pain may not be obvious and may be difficult to interpret. Consequently, it is essential that clinicians and researchers use appropriate, effective tools when assessing pain in older adults with cognitive impairment. Furthermore, the measurement properties of such tools are not fixed attributes of the scale and vary according to population [57,58], and validation is a long process which needs to be repeated [47,59]. These findings have several implications for clinical practice and future research.
First, it must be further evaluated whether and how the results of the Doloplus-2 assessment can guide clinical decisions and improve patient outcomes. This may vary across settings and populations. One important issue is whether all of the Doloplus-2 items detect pain, rather than other symptoms, in older adults with cognitive impairment [21,22,24,46]. The overlap between manifestations of pain and those of delirium, dementia and/or depressive symptoms can make it difficult to assess and confidently identify pain (distinct from delirium or depressive symptoms) in this population, who are prone to these comorbidities [60,61]. This may affect treatment decisions based on Doloplus-2 assessments and the quality of the pain management. Previous studies have reported that nurses and physicians experience some uncertainty about the accuracy of pain assessment in older adults with cognitive impairment, and they may be reluctant to administer analgesics as a result of this uncertainty [8]. A combination of Doloplus-2 assessment with the use of observational tools to evaluate comorbidities such as depressive symptoms and delirium may increase the scale's validity and its ability to provide significant clinical information about pain in this population.
The Doloplus-2 is one of the few observational pain assessment tools that provides a cut-off to categorize patients with 'pain' and 'no pain' [9]. The developers of the Doloplus-2 recommend a cut-off ≥5, but they also point out that pain cannot be excluded even with a score below 5 [17,19]. A cut-off score can make the results of the assessment easier to interpret and more meaningful and actionable [58,62] in clinical practice and research. To our knowledge, this cut-off, which is based on clinical experience [19], has not been evaluated. Questions have been raised about whether the established cut-off will entail an under- or overestimation of pain [43,49]. According to the Doloplus-2 Group, higher scores indicate increasing pain intensity [19]. However, there is no evidence supporting the assumption that HCPs can determine pain intensity from patient behaviour [15], nor is there evidence suggesting that it is appropriate to assume that intensity of behaviour is proportional to intensity of pain. Therefore, we argue that the Doloplus-2 only indicates whether a patient may be in pain or not; it does not indicate anything about the intensity of the patient's pain. Thus, there is a need to validate the cut-off score and to examine HCPs' interpretations of the (change in) score. How the score informs clinical decisions and actions must also be evaluated, as this is an important indication of the scale's clinical utility in everyday practice.
Second, more research is needed concerning the feasibility of the Doloplus-2 across settings and populations. There appear to be large variations in how the Doloplus-2 is administered. These variations include the raters' professional qualifications, the training provided (if any), and raters' familiarity with the patients' usual behaviour and habits. As the developers of the Doloplus-2 point out, using the scale requires training [17]. The raters need to understand how it works and the terminology used in the scale. Use of the scale also requires an ability to note changes in a patient's usual behaviour and an awareness of pain and pain control in older adults not able to self-report pain [17,19] in order to plausibly achieve the best fit between the rater's assessment and the patient's experience [9].
However, while such an ideal situation might be feasible for a research study, is it feasible for everyday clinical use? Providing training and securing the availability of staff familiar with patients demands many resources and may impede the scale's feasibility. Across health care settings, staff turnover is high and changing work shifts are common. Furthermore, a shortage of nurses is projected in the next 10 to 20 years [63]. Therefore, the most realistic scenario is a care facility in which the scale is administered by a significant number of HCPs with varying amounts of training, professional and personal skills, and familiarity with the patients, which may affect its reliability [38].
The administration, scoring and interpretation of the scale also need to be described in an unambiguous, reproducible manner. According to the Doloplus-2 guidelines, items on the scale should not be scored if they do not apply to the patient [17]. This is a methodological concern because the total score is affected by unanswered items. It is not clear whether a minimum number of items must be answered in order to use the scale correctly [54]. Consequently, if the Doloplus-2 is to be used in everyday clinical practice, it may be necessary to evaluate the scale's guidelines and determine what actually works in the variety of settings where older adults with cognitive impairment receive health care. Furthermore, how to effectively and easily facilitate everyday use while obtaining valid, reliable results should be explored.
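The concern about unanswered items can be illustrated with a small arithmetic sketch. It is not part of the Doloplus-2 guidelines or of any included study. It assumes the commonly described format of ten items each rated 0 to 3 and applies the developers' recommended cut-off of ≥5. The function name, the use of None for an unscored item, and the example ratings are all hypothetical.

```python
from typing import Dict, Optional, Sequence

CUTOFF = 5          # developers' recommended cut-off (total >= 5 suggests pain)
MAX_ITEM_SCORE = 3  # assumption: each item is rated 0-3


def summarize(items: Sequence[Optional[int]]) -> Dict[str, object]:
    """Sum the scored items; None marks an item left unscored (not applicable)."""
    scored = [v for v in items if v is not None]
    for v in scored:
        if not 0 <= v <= MAX_ITEM_SCORE:
            raise ValueError(f"item score out of range: {v}")
    total = sum(scored)
    return {
        "total": total,
        "n_scored": len(scored),
        "n_unscored": len(items) - len(scored),
        # The attainable maximum shrinks with every unscored item,
        # but the cut-off stays fixed at 5.
        "max_possible": MAX_ITEM_SCORE * len(scored),
        "at_or_above_cutoff": total >= CUTOFF,
    }


# Hypothetical ratings: the same observed behaviour, once with all ten
# items scored and once with three items left unscored.
complete = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0]
partial = [1, 1, 0, None, 0, None, 1, 0, 0, None]

print(summarize(complete))
print(summarize(partial))
```

In this hypothetical case the fully scored assessment reaches the cut-off and the partially scored one does not, although the observed behaviour is identical. Whether totals from incompletely scored assessments should be rescaled, or interpreted at all, is an open question the sketch does not answer.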
Third, the Doloplus-2 is based on sound assumptions about the multidimensionality of pain. Its items are supported by the literature on how older adults who are unable to communicate verbally express pain [15]. However, the results of our review suggest that there is limited research on the validity of the content of the Doloplus-2. No studies have been done to determine whether clinicians and experts in the various fields of caring for older adults with different types and stages of cognitive impairment consider the scale to be comprehensive. As previously discussed, some items of the Doloplus-2 have been reported to be difficult to administer, probably because the items are somewhat unspecific regarding pain, which may lead to uncertain results. Even though face validity only provides information about whether the Doloplus-2 appears to measure pain, it is still important, as clinicians and experts need to have confidence in the scale's relevance to the construct they want to measure.
Furthermore, it is necessary to evaluate whether the items are equivalent in all multilingual versions, and whether all translated versions of the Doloplus-2 are conceptually, semantically and operationally equivalent [58] to the original French version. If different versions of the Doloplus-2 are not equivalent, it is uncertain whether observed differences in, for example, pain prevalence assessed with the Doloplus-2 are due to actual differences in pain or subtle variations in what the tool is actually measuring. Comparing results and interpreting differences or similarities must be done with caution [58]. Additionally, translation issues, such as ambiguous wording that different raters may understand differently, may lead to inconsistency in scoring some items [21].
The results of our review suggest that it is difficult to establish the construct and criterion validity of the Doloplus-2. The studies included in this review used a variety of hypothesized pain criteria and pain correlates (measures for the same/unrelated constructs) to test these aspects of the scale's validity. Moreover, tests were conducted under a wide range of circumstances and samples. There is no gold standard to use as a benchmark for the assessment of pain in older adults with cognitive impairment due to the subjectivity of pain, and that makes it difficult to evaluate the scale's criterion validity [9].
There is also a lack of interventional studies using rigorous investigation methods, and there is limited evidence regarding the responsiveness of the Doloplus-2. An unresponsive instrument may indicate an improvement in the patient's pain when there actually is none, or it may fail to detect true improvement. There is some controversy over trying to test 'responsiveness' as a property of an instrument as it is hard to disentangle the instrument's characteristics from the characteristics of the treatment provided [58]. However, it is important for clinicians and researchers to know if an intervention induces change in the patient's condition. Therefore, future research should investigate whether the Doloplus-2 measures change in a meaningful way and whether it can be used to evaluate the effect of pain treatments in older adults with cognitive impairment.

Strengths and limitations
This review has several strengths. We used systematic methods and multiple sources to identify relevant studies. We also included articles written in languages other than English. Two reviewers independently assessed the titles, abstracts and quality of the studies. The MMAT was used for quality assessment to allow for the different study designs included in this review, and, in order to provide a comprehensive review, studies were not excluded based on methodological quality. Two reviewers independently abstracted data according to the COSMIN guidelines; this meant that measurement properties were assessed in a uniform way to avoid confusion regarding relevance, terminology, definitions and design.
One limitation of this review is that the authors of the included studies may have used different definitions for the measurement properties than those provided by COSMIN, which may have led us to misinterpret or misrepresent their findings. An example provided by the COSMIN initiative is the definition of 'responsiveness', which may be defined as "the ability to detect clinically important change" or as "the ability to detect change in the construct to be measured". These definitions reflect different constructs [31].
Furthermore, our findings are limited due to the heterogeneity of the included studies. Also, some quality criteria of included studies may have been rated as insufficient simply because the necessary information was not available. Four studies that may have had important findings were excluded because we were unsure whether they fulfilled the inclusion criteria. Although we tried to contact the authors of these articles, we were unsuccessful, which may be due to the fact that some of these studies were published ten to fifteen years ago. Finally, approximately one-third of our included studies were retrieved from the supplementary sources. This might indicate a possible bias in the systematic search strategy in the databases, such as missing indexed terms, possibly resulting in a lower number of articles and thereby incomplete conclusions.
Despite these limitations, our review is relevant for both clinicians and researchers. It provides valuable insight about the evidence regarding aspects of the use and the measurement properties of the Doloplus-2. It also highlights some of the complex, challenging issues in the field of pain assessment in older adults with cognitive impairment.

Conclusion
The Doloplus-2 has been cited as one of the more extensively tested and promising tools for pain assessment in older adults with cognitive impairment. Still, this review suggests that there is a lack of comprehensive, high-quality evidence regarding the feasibility, clinical utility and measurement properties of this scale when assessing pain in older adults with cognitive impairment. Further research should examine the Doloplus-2 across a range of settings. Moreover, future studies should use more homogeneous samples and provide clear definitions of the type and stage of cognitive impairment and pain. Also, more studies should be done using rigorous methods and large sample sizes in order to better allow clinicians and researchers to assess the tool's effectiveness and appropriateness for measuring pain in older people with cognitive impairment.