Cross-sectoral inter-rater reliability of the clinical frailty scale – a Danish translation and validation study
BMC Geriatrics volume 20, Article number: 443 (2020)
Focus on frailty status has become increasingly important when determining care plans within and across health care sectors. A standardized frailty measure applicable for both primary and secondary health care sectors is needed to provide a common reference point. The aim of this study was to translate the Clinical Frailty Scale (CFS) into Danish (CFS-DK) and test inter-rater reliability for key health care professionals in the primary and secondary sectors using the CFS-DK.
The Clinical Frailty Scale was translated into Danish using the ISPOR principles for translation and cultural adaptation, which include forward and back translation, review by the original developer, and cognitive debriefing. For the validation exercise, 40 participants were asked to rate 15 clinical case vignettes using the CFS-DK. The raters came from four groups of health care professionals: primary care physicians (n = 10), community nurses (n = 10), and hospital doctors from internal medicine (n = 10) and intensive care (n = 10). Inter-rater reliability was assessed using intraclass correlation coefficients (ICC), and sensitivity analysis was performed using multilevel random effects linear regression.
The Clinical Frailty Scale was translated and culturally adapted into Danish and is presented in this paper in its final form. Inter-rater reliability in the four professional groups ranged from ICC 0.81 to 0.90. Sensitivity analysis showed no significant impact of professional group or length of clinical experience. The health care professionals considered the CFS-DK to be relevant for their own area of work and for cross-sectoral collaboration.
The Clinical Frailty Scale was translated and culturally adapted into Danish. The inter-rater reliability was high in all four groups of health care professionals involved in cross-sectoral collaborations. However, the use of case vignettes may reduce the generalizability of the reliability findings to real-life settings. The CFS has the potential to serve as a common reference tool when treating and rehabilitating older patients.
It is a global concern that even highly effective health care systems will struggle to meet the demands of the increasing share of aged and frail populations . Frailty is a health state associated with the ageing process and is recognized as a good estimate of changes associated with molecular ageing, i.e. biological age [2, 3]. Frail citizens often need support from several different health and social care providers and are frequently subject to fragmented continuity of care due to poor cross-sectoral coordination and communication . Knowledge of frailty status enables the identification of citizens who need tailored treatment and care plans within and across health care sectors . This requires the primary and secondary health care sectors to use a standardized frailty measurement tool that has transdisciplinary acceptance [6, 7]. Such a tool could act as a reference point for treatment and care and serve as a safeguard against ageism in allocation of healthcare resources.
Multiple scales and instruments for measuring frailty exist and have been tested in various settings [5, 8]. The Frailty Index  and the Frailty Phenotype  are the most prominent and are often used as reference. However, the Clinical Frailty Scale (CFS) is increasingly used in clinical research across medical specialities and emergency medical services , likely due to its ease of use and speed of completion .
The CFS was developed in Canada in 2005 and was validated for diagnostic accuracy of frailty in people aged 65 years and over in the primary care sector . The original 7-point version was modified in 2008 by the developers to its current form, a 9-point scale with pictograms (Fig. 1) . Since then, its predictive performance has also been validated in the secondary health care sector for multiple outcomes, including mortality and admissions to intensive care , length of stay , outcomes from resuscitation , interventions in the intensive care unit , and survival in the intensive care unit . Although the CFS is of particular interest for cross-sectoral implementation, its psychometric properties, such as inter-rater reliability, have not been evaluated or compared across the primary and secondary health care sectors, which is necessary for establishing a common reference for frailty measurement.
The aim of the current study was first to translate the CFS into Danish (CFS-DK) using standard methodology and then to test inter-rater reliability for key health care professionals in the primary and secondary sectors using the CFS-DK.
The study was conducted from 16th March to 8th May 2020.
To ensure cultural and conceptual compliance with the source instrument [9, 11], we translated the CFS into the Danish language using the 10-step ISPOR Principles of Good Practice for the Translation and Cultural Adaptation of Patient-Reported Outcomes . The Danish translation process is depicted in Fig. 2.
Health care professionals from the primary and secondary health care sectors were recruited to validate the CFS-DK and to support potential future use in a cross-sectoral context. Forty raters assessed 15 written clinical case vignettes using the CFS-DK. The raters were 10 community nurses, 10 general practitioners, and 20 hospital doctors (10 from internal medicine, 10 from intensive care) and were recruited as a convenience sample from the authors’ professional network.
Cases consisted of a short text and a picture of each case-patient and were selected to collectively represent all nine levels of the CFS. Cases provided essential information on 1) symptoms of disease, 2) dependency on others, 3) cognitive function, and 4) physical condition. Each case was presented alongside a picture of the CFS-DK for reference (Fig. 1). The cases were constructed to imitate real-life patients by authors SKN (senior registrar in geriatric medicine) and KAR (consultant and professor in geriatric medicine). Before completing the questionnaire, raters were asked to view a five-minute video introducing frailty as a concept, the CFS and its use in clinical practice, and pitfalls in using the CFS, e.g. when scoring patients with dementia. The raters then assessed each case according to the 9-point CFS-DK. The cases were presented in random order of severity, but every rater rated them in the same order. Finally, raters were asked to assess the relevance of the CFS for their own area of work and for cross-sectoral collaboration.
Study data were collected using an online questionnaire developed in REDCap (version 9.1.15, © 2020 Vanderbilt University) [18, 19], an electronic data capture tool hosted by the Open Patient data Explorative Network (OPEN) at Odense University Hospital, Region of Southern Denmark.
Inter-rater reliability was assessed using the intraclass correlation coefficient (ICC) with 95% confidence intervals (CI) [20, 21] that was calculated i) for all 40 raters, and ii) within each professional group. The ICCs were considered poor (< 0.40), fair (0.40–0.59), good (0.60–0.75), or excellent (> 0.75) according to standard practice .
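The text does not state which of the Shrout and Fleiss ICC forms was computed. As a minimal sketch, assuming the two-way random-effects, absolute-agreement, single-rater form ICC(2,1) and a complete ratings matrix (every rater scores every case, as in this 15 cases × 40 raters design), the coefficient and the cut-offs listed above could be computed as follows. The function names `icc_2_1` and `classify_icc` are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to case i by rater j. The matrix must be
    complete (every rater scores every case).
    """
    n = len(ratings)       # number of cases (targets)
    k = len(ratings[0])    # number of raters
    grand = sum(map(sum, ratings)) / (n * k)
    case_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: cases (rows), raters (columns), residual.
    ss_cases = k * sum((m - grand) ** 2 for m in case_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_cases - ss_raters

    ms_cases = ss_cases / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_cases - ms_error) / (
        ms_cases + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


def classify_icc(icc):
    """Label an ICC with the cut-offs stated in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc <= 0.75:
        return "good"
    return "excellent"


# Worked example table from Shrout and Fleiss (1979): 6 targets, 4 judges.
sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
      [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(sf), 2), classify_icc(icc_2_1(sf)))  # prints: 0.29 poor
```

The example table is the one used by Shrout and Fleiss, for which they report ICC(2,1) ≈ 0.29. The sketch returns only the point estimate; the 95% confidence intervals reported in the study require the F-distribution-based formulas from the same paper, or a statistics package.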
The sensitivity of the inter-rater reliability was examined using a random effects linear regression  with CFS-DK scores as the outcome, random effects for case and rater, and rater experience and professional group as covariates. Community nurses were used as the reference group. This allowed us to assess the extent to which the inter-rater reliability was sensitive to the raters’ professional group, their length of clinical experience, and any unobserved variation between raters and cases. Other studies have used a graphical Bland-Altman approach , but this was less appropriate for the current study because our raters came from four different professional groups.
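Written out, the regression described above might look as follows. This is our reading of the text rather than an equation given by the authors: the symbol names are hypothetical, clinical experience is assumed to enter linearly (e.g. in years), and normally distributed random effects are assumed. Here $y_{ij}$ is the CFS-DK score given to case $i$ ($i = 1,\dots,15$) by rater $j$ ($j = 1,\dots,40$); $G_j$, $I_j$ and $C_j$ are indicators for general practitioners, internal medicine doctors and intensive care doctors (all zero for community nurses, the reference group); and $E_j$ is the rater's clinical experience.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_{\mathrm{GP}}\,G_j + \beta_{\mathrm{IM}}\,I_j + \beta_{\mathrm{ICU}}\,C_j
          + \beta_{\mathrm{exp}}\,E_j + u_i + v_j + \varepsilon_{ij},\\
u_i &\sim \mathcal{N}(0, \sigma^2_{\mathrm{case}}), \qquad
v_j \sim \mathcal{N}(0, \sigma^2_{\mathrm{rater}}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2_{\varepsilon}).
\end{aligned}
```

In this notation, the variance components reported in the results (4.40 between cases and 0.01 between raters) would correspond to the estimates of $\sigma^2_{\mathrm{case}}$ and $\sigma^2_{\mathrm{rater}}$.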
Statistical analysis was performed using SAS software (SAS Institute Inc., Cary, NC, USA). The statistical significance threshold for all tests was set to P < 0.05. Figures were made using “R” software (Version 3.6.1)  and the ggplot2 package .
In accordance with the ISPOR guidelines , the final report of the translation process is available in Additional file 1. The translation process is depicted in Fig. 2 and is summarized as follows:
Steps 2–3: We observed high agreement in the meaning and wording of the two forward translations. Minor incongruities were mostly related to synonyms for particular words. Differences were discussed and resolved in a reconciliation meeting.
Steps 4–5: The back translation corresponded well to the reconciled forward translation. The few discrepancies were identified and discussed, leading to minor word changes in items 4, 5, 6, and 9.
Step 6: To ensure concordance between the harmonized translation and the source instrument, the original instrument developer was contacted for review and feedback. This led to two minor changes in items 5 and 9.
Steps 7–8: To test for conceptual coherence, interpretation, and cultural relevance, the translated instrument was tested on five respondents from the target population (two general practitioners, one community nurse, and two geriatricians). Three respondents were women and two were men, and their occupational experience ranged from 5 to 40 years. The five respondents were asked to score the same three cases, after which a cognitive debriefing was held with each respondent individually. This did not result in changes to the translation.
Steps 9–10: The final translation was proofread. The original source instrument and the Danish translation are presented in Fig. 1.
All 40 raters assessed the 15 cases, yielding 600 observations with 40 replicate observations per case. Twenty-two of the 40 raters (55%) were women, and mean length of clinical experience in the four professional groups ranged from 14 to 17 years, with hospital doctors being the most experienced.
The overall inter-rater reliability for all 40 raters (based on individual assessments) was ICC 0.85 (95% CI 0.74–0.93). As shown in Fig. 3, the inter-rater reliability was similar for the four professional groups, though highest for the hospital doctors specialized in intensive care (ICC 0.90, 95% CI 0.82–0.96). The ratings had narrow interquartile ranges, and median ratings were within one level for all but one case. The ratings for Case 13 (which described a case at level 9 with terminal illness but ‘not otherwise evidently frail’) had wide interquartile ranges for both primary care physicians and community nurses.
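The per-case summaries described above (median and interquartile range of the ratings) can be sketched as below. This is illustrative only: the ratings are hypothetical, not study data; the "inclusive" quartile method is our assumption, since quartile conventions differ between statistical packages; and we assume that "within one level" refers to the CFS level each vignette was designed to represent. The function name `case_summary` is hypothetical.

```python
from statistics import median, quantiles


def case_summary(scores, intended_level):
    """Median, interquartile range and a 'within one level' flag for one case.

    scores: CFS-DK ratings (1-9) given by all raters to a single vignette.
    intended_level: the CFS level the vignette was written to represent
    (an assumption about what the median is compared against).
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    med = median(scores)
    return {"median": med, "iqr": q3 - q1,
            "within_one": abs(med - intended_level) <= 1}


# Hypothetical ratings for two vignettes (not study data).
concordant = [5, 5, 5, 6, 5, 4, 5, 5, 6, 5]
discordant = [9, 7, 4, 9, 6, 3, 8, 9, 5, 7]  # loosely mimics a level-9 case
print(case_summary(concordant, intended_level=5))
print(case_summary(discordant, intended_level=9))
```

A wide interquartile range, as in the second example, flags a vignette whose ratings spread over several levels even when the overall ICC remains high.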
Table 1 shows the results of the random effects linear regression (with CFS-DK scores as the outcome and with case and rater as unobserved random effects), which examined the sensitivity of inter-rater reliability to rater characteristics. Although the hospital doctors specialized in intensive care came close to differing significantly from the reference group (p = 0.05), the inter-rater reliability did not differ significantly between the professional groups or according to the length of the raters’ clinical experience (p = 0.96). The variance components showed that variation between cases (4.40) was much larger than variation between raters (0.01).
All but one of the 40 raters considered the CFS-DK to be relevant for their own area of work, and all raters considered it to be relevant for cross-sectoral collaboration.
We translated the CFS into Danish (CFS-DK) and validated the translation. The cross-sectoral validation found the CFS-DK to have excellent inter-rater reliability, both within each of the four health professional groups and between these groups. The health care professionals also considered the CFS-DK relevant to their area of work and to cross-sectoral collaboration.
These results suggest that the CFS-DK is a useful measure of frailty that can be applied meaningfully in both the primary and secondary health care sectors. Previous studies have provided evidence of good sensitivity and specificity of the CFS  and its ability to predict a range of adverse health outcomes [12,13,14,15,16].
A valid and reliable measure of clinical frailty that can be used both within and across health care sectors has several advantages. It has the potential to improve collaborative efforts in the treatment, care, and rehabilitation of frail patients, who often need input from a variety of health and social care providers. Reporting a standardized frailty measure by, for example, community nurses could enable primary care physicians to recognize early signs of functional and physical deterioration among patients receiving home and social care. Hospital doctors and patients could benefit from a frailty assessment when determining treatment options and when planning hospital discharge and rehabilitation or end-of-life care. A standardized frailty measure might also help community nurses identify patients requiring extra follow-up after hospital treatment.
The CFS is best suited as the entry point for intervention planning, as it is designed for cross-sectional assessments rather than for tracking trajectories . The approach for cross-sectoral comparison of inter-rater reliability employed in this study could also be used to validate trajectory-tracking models.
In our study, the overall inter-rater reliability was high (ICC 0.85). A recent validation study of the CFS in both English and French reported similarly high inter-rater reliability of 0.87 (95%CI: 0.76–0.93) for native French doctors using the source CFS in English, and 0.76 (95%CI: 0.57–0.87) for native French nurses using the French translation of the CFS .
While these results correspond well with the observed inter-rater reliability for hospital doctors in this study, reliability evaluations should be interpreted with care as they depend on the assessment conditions , e.g. location, disturbances, rater characteristics and availability of information. Other psychometric properties of the CFS-DK, such as responsiveness and predictive validity, should also be tested under conditions closer to daily clinical routine and on actual clinical cases, as recently demonstrated for the German version of the CFS . This could usefully be done in combination with, for example, intervention studies. Convergent validity of the CFS was tested during development against the Frailty Index  and could also be tested in different settings.
The estimates of inter-rater reliability in the current study would probably have been higher if Case 13, which described level 9 on the CFS (terminally ill but otherwise no evidence of frailty), had been analysed separately. This level could have been further explained with a note at the start of the CFS tool, or specific training could have been provided on rating this particular level. An underlying assumption in the CFS is that life expectancy declines with increasing frailty from item 1 to item 9. However, functional dependency and frailty progress only from item 1 to item 8. At item 9, physical limitation is not apparent and, by definition, level 9 is “not otherwise evidently frail”. This likely confuses raters, as reflected in the wide distribution of ratings for Case 13. Another possible explanation is that Case 13 uses terms such as “[lung cancer disseminated to multiple organs]” and “[declined palliative therapy]”, which require health care professionals to recognize the implied consequences (i.e. a severe prognosis and short residual lifespan).
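Whether setting aside a single discordant case raises the ICC depends on the data, because dropping a case also removes some between-case variance. A leave-one-case-out recomputation is one way to check such a claim. The sketch below is self-contained, assumes the ICC(2,1) form (the study does not state which ICC form it used), and runs on hypothetical ratings rather than study data. The function names are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) (Shrout and Fleiss): two-way random, absolute agreement, single rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    ss_cases = k * sum((sum(r) / k - grand) ** 2 for r in ratings)
    ss_raters = n * sum((sum(r[j] for r in ratings) / n - grand) ** 2 for j in range(k))
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ms_cases = ss_cases / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = (ss_total - ss_cases - ss_raters) / ((n - 1) * (k - 1))
    return (ms_cases - ms_error) / (
        ms_cases + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


def leave_one_case_out(ratings):
    """Map each case index to the ICC obtained when that case is dropped."""
    return {i: icc_2_1(ratings[:i] + ratings[i + 1:]) for i in range(len(ratings))}


# Hypothetical ratings (rows = cases, columns = raters): three concordant
# cases and one discordant case, loosely mimicking the spread seen for Case 13.
ratings = [[1, 1, 1], [4, 4, 4], [7, 7, 7], [9, 3, 6]]
print(round(icc_2_1(ratings), 3))
for case, icc in leave_one_case_out(ratings).items():
    print(case, round(icc, 3))
```

In this toy example, dropping the discordant case (index 3) gives the largest ICC. With real data, the leave-one-out values should be reported alongside, not instead of, the full-sample estimate.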
Calibration of the CFS, perhaps in particular of item 9, is likely to depend on frailty incidence , which may differ between health care sectors. Use of the CFS within a hospital might be limited by a lack of discriminative ability among patients with severely impaired functional levels (e.g. in geriatric departments), as it was developed for community-dwelling adults aged 65 years and over; however, no systematic reviews of the prognostic performance of the CFS have yet been conducted.
The responsibility for care in the frail population varies across European countries, and this has implications for the future use of the CFS. In the Scandinavian countries and the Netherlands, there is societal consensus for a welfare system in which the public sector is responsible for providing good and equal health and long-term social care [30,31,32]. However, demographic changes mean that responsibility for long-term care is slowly moving towards families and relatives, even in publicly financed health care systems . Consulting family and relatives will thus probably be important when health care professionals use the CFS in the future.
Strengths and limitations
This study used clinical case vignettes, and although the case vignettes were clinically appropriate and built to imitate real-life patients, they could also be regarded as hypothetical. Case vignettes allow raters to assess exactly the same information, with the same structure, in a nearly identical setting and at the same time point, enabling like-for-like comparisons between professional groups. The cases were designed to encompass all the CFS levels, and the ratings given to the 15 cases clearly reflect these different degrees of severity. Conversely, case vignettes introduce a risk of inflated inter-rater reliability and ICC, as raters assess all cases at once, which does not reflect clinical practice. Another limitation of this study is the relatively small number of cases (n = 15). However, the use of case vignettes allowed a high number of observations (600; 15 cases × 40 raters), four times as many as in previous investigations of inter-rater reliability following translation of the CFS .
Finally, raters were recruited as a convenience sample, which inherently risks inflating the reliability estimates and biasing the sensitivity analyses.
This study has several strengths. First, the translation process was completed using a rigorous procedure that followed the ISPOR guidelines. Poorly translated instruments threaten the validity of the data, and quality is dependent on methodology . Second, the validation included key actors most often involved in significant transitions between the primary and secondary health care sectors, and the health care professionals were experienced in their clinical fields. Third, raters were informed that inter-rater reliability would be compared between professional groups but that individual rater performance would not be assessed. We expect this to have reduced the risk of a Hawthorne effect (i.e. awareness of being observed).
The Clinical Frailty Scale was translated and culturally adapted into Danish following a careful and well-established standard process. The inter-rater reliability was high in all four groups of health care professionals involved in cross-sectoral collaborations. However, the use of case vignettes may reduce the generalizability of the reliability findings to real-life settings. The CFS has the potential to serve as a common reference tool when treating and rehabilitating older patients.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
CFS: Clinical Frailty Scale
CFS-DK: Clinical Frailty Scale - Danish version
IADL: Instrumental Activities of Daily Life
ICC: Intraclass correlation coefficient
KIP: Key in-country person
Turner G, Clegg A, British Geriatrics Society, Age UK, Royal College of General Practitioners. Best practice guidelines for the management of frailty: a British Geriatrics Society, Age UK and Royal College of General Practitioners report. Age Ageing. 2014;43:744–7. https://doi.org/10.1093/ageing/afu138.
Fried LP, Tangen CM, Walston J, Newman AB, Hirsch C, Gottdiener J, et al. Frailty in older adults: evidence for a phenotype. J Gerontol A Biol Sci Med Sci. 2001;56:M146–56. https://doi.org/10.1093/gerona/56.3.m146.
Rockwood K, Howlett SE. Fifteen years of progress in understanding frailty and health in aging. BMC Med. 2018;16:220. https://doi.org/10.1186/s12916-018-1223-3.
Humphries R. Integrated health and social care in England – progress and prospects. Health Policy. 2015;119:856–9. https://doi.org/10.1016/j.healthpol.2015.04.010.
Theou O, Squires E, Mallery K, Lee JS, Fay S, Goldstein J, et al. What do we know about frailty in the acute care setting? A scoping review. BMC Geriatr. 2018;18:139. https://doi.org/10.1186/s12877-018-0823-2.
Hoogendijk EO, Afilalo J, Ensrud KE, Kowal P, Onder G, Fried LP. Frailty: implications for clinical practice and public health. Lancet. 2019;394:1365–75. https://doi.org/10.1016/S0140-6736(19)31786-6.
Reeves D, Pye S, Ashcroft DM, Clegg A, Kontopantelis E, Blakeman T, et al. The challenge of ageing populations and patient frailty: Can primary care adapt? BMJ. 2018;362:1–7. https://doi.org/10.1136/bmj.k3349.
Apóstolo J, Cooke R, Bobrowicz-Campos E, Santana S, Marcucci M, Cano A, et al. Predicting risk and outcomes for frail older adults. JBI Database Syst Rev Implement Rep. 2017;15:1154–208. https://doi.org/10.11124/JBISRIR-2016-003018.
Rockwood K, Song X, MacKnight C, Bergman H, Hogan DB, McDowell I, et al. A global clinical measure of fitness and frailty in elderly people. CMAJ. 2005;173:489–95.
Lewis ET, Dent E, Alkhouri H, Kellett J, Williamson M, Asha S, et al. Which frailty scale for patients admitted via emergency department? A cohort study. Arch Gerontol Geriatr. 2019;80:104–14. https://doi.org/10.1016/j.archger.2018.11.002.
Dalhousie University. Clinical Frailty Scale. 2020. https://www.dal.ca/sites/gmr/our-tools/clinical-frailty-scale.html. Accessed 15 May 2020.
Kaeppeli T, Rueegg M, Dreher-Hummel T, Brabrand M, Kabell-Nissen S, Carpenter CR, et al. Validation of the clinical frailty scale for prediction of thirty-day mortality in the emergency department. Ann Emerg Med. 2020;76:291–300. https://doi.org/10.1016/j.annemergmed.2020.03.028.
Wallis SJ, Wall J, Biram RWS, Romero-Ortuno R. Association of the clinical frailty scale with hospital outcomes. QJM. 2015;108:943–9. https://doi.org/10.1093/qjmed/hcv066.
Wharton C, King E, MacDuff A. Frailty is associated with adverse outcome from in-hospital cardiopulmonary resuscitation. Resuscitation. 2019;143:208–11. https://doi.org/10.1016/j.resuscitation.2019.07.021.
Montgomery CL, Zuege DJ, Rolfson DB, Opgenorth D, Hudson D, Stelfox HT, et al. Implementation of population-level screening for frailty among patients admitted to adult intensive care in Alberta, Canada. Can J Anesth. 2019;66:1310–9. https://doi.org/10.1007/s12630-019-01414-8.
Guidet B, de Lange DW, Boumendil A, Leaver S, Watson X, Boulanger C, et al. The contribution of frailty, cognition, activity of daily life and comorbidities on outcome in acutely admitted patients over 80 years in European ICUs: the VIP2 study. Intensive Care Med. 2020;46:57–69. https://doi.org/10.1007/s00134-019-05853-1.
Wild D, Grove A, Martin M, Eremenco S, McElroy S, Verjee-Lorenz A, et al. Principles of good practice for the translation and cultural adaptation process for patient-reported outcomes (PRO) measures: report of the ISPOR task force for translation and cultural adaptation. Value Health. 2005;8:94–104. https://doi.org/10.1111/j.1524-4733.2005.04054.x.
Harris PA, Taylor R, Minor BL, Elliott V, Fernandez M, O’Neal L, et al. The REDCap consortium: Building an international community of software platform partners. J Biomed Inform. 2019;95:103208. https://doi.org/10.1016/j.jbi.2019.103208.
Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42:377–81. https://doi.org/10.1016/j.jbi.2008.08.010.
Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420–8. https://doi.org/10.1037//0033-2909.86.2.420.
MedCalc Software Ltd. MedCalc. 2020. https://www.medcalc.org/index.php. Accessed 15 May 2020.
Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess. 1994;6:284–90.
Baltagi BH. Econometric analysis of panel data. Fourth ed. New York: Wiley; 2008.
Abraham P, Courvoisier DS, Annweiler C, Lenoir C, Millien T, Dalmaz F, et al. Validation of the clinical frailty score (CFS) in French language. BMC Geriatr. 2019;19:322. https://doi.org/10.1186/s12877-019-1315-8.
R Core Team. R: A Language and Environment for Statistical Computing. 2019. www.R-project.org. Accessed 15 May 2020.
Wickham H. ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag; 2016. https://ggplot2.tidyverse.org. Accessed 15 May 2020.
Belloni G, Cesari M. Frailty and Intrinsic Capacity: Two Distinct but Related Constructs. Front Med. 2019;6:133. https://doi.org/10.3389/fmed.2019.00133.
Kottner J, Audige L, Brorson S, Donner A, Gajewski BJ, Hrobjartsson A, et al. Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. J Clin Epidemiol. 2011;64:96–106. https://doi.org/10.1016/j.jclinepi.2010.03.002.
Debray TPA, Damen JAAG, Snell KIE, Ensor J, Hooft L, Reitsma JB, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ. 2017;356:i6460. https://doi.org/10.1136/bmj.i6460.
Kvist J, Greve B. Has the Nordic welfare model been transformed? Soc Policy Adm. 2011;45:146–60. https://doi.org/10.1111/j.1467-9515.2010.00761.x.
Lyttkens CH, Christiansen T, Häkkinen U, Kaarboe O, Sutton M, Welander A. The core of the Nordic health care system is not empty. Nord J Health Econ. 2016;4:7–27.
Verbeek-Oudijk D, Woittiez I, Eggink E, Putman L. Who cares in Europe? A comparison of long-term care for the over-50s in sixteen European countries. Geneva: The Geneva Association; 2014. https://www.genevaassociation.org/sites/default/files/research-topics-document-type/pdf_public//ga2014-health31-verbeek-oudijkwoittiezegginkputman.pdf. Accessed 15 May 2020.
National Committee on Health Research Ethics. What to notify? 2020. https://en.nvk.dk/how-to-notify/what-to-notify. Accessed 15 May 2020.
The authors acknowledge copyright holder and developer of the source CFS, Kenneth Rockwood, for allowing translation and validation and Olga Theou for providing feedback on the translated instrument. For forward translation, the authors acknowledge Jessica Joan Williams. For participating in the cognitive debriefing and validation of the CFS-DK, the authors acknowledge Jens Vestergaard, Katja Thomsen, Lars Matzen, and Susan Feldborg. For proofreading, the authors acknowledge Christina Boesen Kristensen. Finally, the authors also acknowledge David Hass from the Open Patient data Explorative Network (OPEN), Odense University Hospital, Region of Southern Denmark for assistance in data management.
This study was supported by funding from the University of Southern Denmark (SKN), Region of Southern Denmark (SKN, AF), and Innovation Fund Denmark (AF). The funding bodies had no influence on study design, data collection, interpretation, or decision on publication.
Ethics approval and consent to participate
In accordance with the guidelines from the Danish National Ethics Committee, the study was not subject to notification . Health care professionals performing the validation of the CFS-DK gave their written consent to participate. The study was approved by the Danish Data Protection Agency (rec. nr. 20/16898).
Consent for publication
No images, details or videos in this study relate to an individual person. Pictures accompanying case vignettes were licenced from Colourbox.dk.
The authors declare that they have no competing interests.
Cite this article
Nissen, S.K., Fournaise, A., Lauridsen, J.T. et al. Cross-sectoral inter-rater reliability of the clinical frailty scale – a Danish translation and validation study. BMC Geriatr 20, 443 (2020). https://doi.org/10.1186/s12877-020-01850-y