High inter-rater reliability of Japanese bedriddenness ranks and cognitive function scores: a hospital-based prospective observational study

Background The statistical validities of the official Japanese classifications of activities of daily living (ADLs), including bedriddenness ranks (BR) and cognitive function scores (CFS), have yet to be assessed. To this aim, we evaluated the ability of BR and CFS to assess ADLs using inter-rater reliability and criterion-related validity. Methods New inpatients aged ≥75 years were enrolled in this hospital-based prospective observational study. BR and CFS were assessed once by an attending nurse, and then by a social worker/medical clerk. We evaluated inter-rater reliability between different professions by calculating the concordance rate, kappa coefficient, Cronbach’s α, and intraclass correlation coefficient. We also estimated the relationship of the Barthel Index and Katz Index with the BR and CFS using Spearman’s correlation coefficients. Results For the 271 patients enrolled, BR at the first assessment revealed 66 normal, 10 of J1, 15 of J2, 18 of A1, 31 of A2, 37 of B1, 35 of B2, 22 of C1, and 32 of C2. The concordance rate between the two BR assessments was 68.6%, with a kappa coefficient of 0.61, Cronbach’s α of 0.91, and an intraclass correlation coefficient of 0.83, thus showing good inter-rater reliability. BR was negatively correlated with the Barthel Index (r = − 0.848, p < 0.001) and Katz Index (r = − 0.820, p < 0.001), showing justifiable criterion-related validity. Meanwhile, CFS at the first assessment revealed 92 normal, 47 of 1, 19 of 2a, 30 of 2b, 60 of 3a, 8 of 3b, 8 of 4, and 0 of M. The concordance rate between the two CFS assessments was 70.1%, with a kappa coefficient of 0.62, Cronbach’s α of 0.87, and an intraclass correlation coefficient 0.78, thus also showing good inter-rater reliability. CFS was negatively correlated with the Barthel Index (r = − 0.667, p < 0.001) and Katz Index (r = − 0.661, p < 0.001), showing justifiable criterion-related validity. Conclusions BR and CFS could be reliable and easy-to-use grading scales of ADLs in acute clinical practice or large-scale screening, with high inter-rater reliabilities among different professions and significant correlations with well-established, though complicated to use, instruments to assess ADLs. Trial registration UMIN000041051 (2020/7/10). Supplementary Information The online version contains supplementary material available at 10.1186/s12877-021-02108-x.


Background
In the early 1990s, the Ministry of Health, Labour and Welfare (MHLW) released classifications for activities of daily living (ADLs), consisting of bedriddenness ranks and cognitive function scores [1]. These classifications have been widely used under the Japanese Health Insurance and Nursing-care Insurance systems, by attending physicians of elderly patients to assess them for nursing care insurance eligibility [2], by certified evaluators of long-term care to assess persistent disabilities of patients in their own home, or by attending nurses to determine the level of hospital care or discharge support required by elderly inpatients [3][4][5]. Bedriddenness ranks assess the degree of bedriddenness, which is classified into nine grades using four classification steps (S1, Fig. A) [5]. Cognitive function scores are used to evaluate cognitive impairment, which is classified into eight grades using five classification steps (S1, Fig. B) [5]. In aged societies with limited medical and nursing care resources, such as Japan, easily obtainable bedriddenness ranks and cognitive function scores could be used as comprehensive ADL indicators to screen at-risk older people for their daily living assistance requirements. Furthermore, bedriddenness ranks and cognitive function scores could be used as common tools between hospital or senior care home settings and home or local community settings of patients. While it would be ideal to gather information about specific basic ADLs of all older patients, many are left unassessed due to the overwhelmingly large number of requirements needed to even qualify for an assessment in Japan, which is globally at the forefront of our aged society. Other developed countries with an aged society will certainly experience the same situation as Japan in the near future. These classifications are already used to evaluate ADLs in older patients in common settings, such as hospitals, nursing-care facilities, and local communities, to provide patients with consecutive, long-term care in Japan.
Other than the MHLW bedriddenness ranks and cognitive function scores, several standard and established scores of ADLs and impairment of cognitive function MHLW have been reported. These include the Barthel Index (BI), which classifies the abilities of 10 basic ADLs into two to four grades [6], the Katz Index (KI), which classifies six basic ADLs into two grades (an independent or care-requiring condition), and the Mini-Mental State Examination (MMSE), which evaluates 11 items (e.g. naming objects, calculation, and manual dexterity while drawing). The BI, KI, and MMSE are well-established measures; however, they are complex and can be difficult to implement. Indeed, despite their high inter-rater reliability [7,8], they are complicated and time-consuming to performparticularly the BI and MMSE [9,10] which makes these measures difficult to use in an acute care hospital setting or to evaluate many subjects, such as mass screening [3][4][5]11]. On the other hand, bedriddenness ranks and cognitive function scores are extremely simple and easy to implement, and could therefore be more suited to such situations [3][4][5]11]. Despite their routine use in daily clinical practice and in research settings in Japan [3][4][5]11], the precision and validation of bedriddenness ranks and cognitive function scores have yet to be established.
We herein report the usefulness of the MHLW bedriddenness ranks and cognitive function scores as tools to assess ADLs. For this, we examined inter-rater reliability and criterion-related validity in a prospective investigation of inpatients at an acute care hospital in a suburban city in Japan.

Study design and patients
This study was a hospital-based prospective observational study. The subjects were inpatients aged 75 years or older at Yuaikai Oda Hospital (S2, Appendix.), an acute care hospital in a suburban city in Japan, who were admitted from November 2017 to September 2019. The following patients were excluded: those who did not consent to participate in the study, those whose hospital stay was shorter than 24 h, those who were in a serious or possibly fatal condition, and/or those with miscellaneous conditions that made evaluation impossible. A serious or possibly fatal condition was defined as having a disturbance of consciousness equal to or more severe than III-100 of the Japan Coma Scale within 72 h of admission, meaning that they were unable to open their eyes or take action to avoid painful stimuli [12], had poor vital signs, with a shock index > 1 (i.e., pulse rate divided by systolic blood pressure > 1) [13], showed percutaneous oxygen saturation < 90%, and required administration of 8 L/min or more of oxygen.

Data and data sources
The variables we checked from the medical records on admission were age, sex (male or female), ambulance transfer (presence or absence), admission with a referral letter from a primary physician (presence or absence), duration of hospitalization (days), department of admission (Internal Medicine, Surgery, Cardiology, Dermatology, Otorhinolaryngology, Neurosurgery), MHLW bedriddenness rank and cognitive function score, emergency admission (presence or absence), place of abode (home, nursing home, or other place), basic ADLs (eating, moving, personal maintenance, going to the toilet, bathing, walking, going up and down the stairs, dressing, defecation, and urination; independently or not), and diseases causing hospitalization (according to the International Classification of Diseases, 10th edition: ICD-10). Furthermore, we calculated the BI and KI using data on basic ADLs that were routinely and systematically assessed in each patient on admission by an attending nurse.
Bedriddenness ranks were classified into five grades, which were further divided into nine grades, as follows: normal, J (J1, J2), A (A1, A2), B (B1, B2), and C (C1, C2) (S1, Fig. A). Cognitive function scores were classified into six grades, which were further divided into eight grades, as follows: normal, 1, 2 (2a, 2b), 3 (3a, 3b), 4, and M (S1, Fig. B) [5,14]. In this study, MHLW bedriddenness ranks and cognitive function scores were checked twice; at Oda hospital, bedriddenness ranks and cognitive function scores of all inpatients aged 75 years or older are routinely checked by an attending nurse within 24 h after admission, which are subsequently confirmed by another nurse in charge of discharge support care, with results recorded in the medical chart both times. The results of this routine check of bedriddenness ranks and cognitive function scores were derived from the medical charts for Assessment 1. For Assessment 2, these results were checked independently within 72 h after admission by a medical social worker or a medical clerk of the hospital, i.e. a person of a medical profession other than a nurse, who was blinded to the results of Assessment 1. The flowchart shown by in supplemental Figure 1 (S1, Fig.) shows the scoring procedure for both Assessment 1 and 2. Although the criterion of Bedriddenness rank J is "A person who has mild disorders, but who is almost independent in activities of daily living and can go out using public transport (J1) or around the neighborhood (J2)", bedriddenness ranks were evaluated at the patients' bedside in the hospital, for both Assessment 1 and 2. These evaluations were made according to information obtained during interviews with the patient or family members, or observations about the patient's condition. For Assessment 1, a referral letter or nursing records were also used in these evaluations. All variables other than the results of Assessment 2 were collected from medical charts.

Statistical analysis
The background characteristics of all the eligible patients were analyzed according to the grade of bedriddenness ranks or cognitive function scores. Categorical background characteristic variables are presented as the numbers of patients and percentages, and continuous background characteristic variables as the median and quartile ranges. Categorical variables were analyzed using the Chi-square test, and continuous variables were analyzed using an analysis of variance and F-test.
After excluding patients with missing data in Assessment 2, the concordance rate, kappa coefficient, Cronbach's α, and intraclass correlation coefficient between Assessment 1 and 2 for both the bedriddenness ranks and cognitive function scores were calculated. The correlations of bedriddenness ranks/cognitive function scores with BI and KI were analyzed using Spearman's rank correlation coefficient.
Statistical significance was set at p < 0.05. IBM SPSS Statistics (version 25, IBM, Armonk, New York, USA) was used for the statistical analyses.

Ethical considerations
This study conforms to Ethical Guidelines for Medical and Health Research Involving Human Subjects issued by the Japanese MHLW and the Ministry of Education, Culture, Sports, Science, and Technology. This study was approved by the research ethics committee of the Yuai-kai Foundation and Oda Hospital (No. 20171201). We obtained consent from each patient individually, and anonymity of the patients was protected.

Enrollment and background characteristics of patients
During the study period, 3222 inpatients aged 75 years or older were admitted, 2906 of whom were excluded according to the exclusion criteria. After excluding patients with missing data, 271 patients were eligible to be enrolled (Fig. 1). Table 1 8 4, and 0 M. More detailed analyses of background characteristics according to bedriddenness ranks or cognitive function scores showed differences according to age, sex, emergency admission, pre-and post-hospitalization living locations, and duration of hospitalization (Tables 2 and 3). The diseases causing hospitalization consisted of 15 conditions among the major categories, and 144 diseases among the subcategories of the ICD-10. The condition with the highest incidence, 63 cases, among the major categories was cardiovascular diseases, while the disease with the highest incidence, 22 cases, among the subcategories was congestive heart failure.

Inter-rater reliability of the MHLW bedriddenness ranks
The concordance rate of bedriddenness ranks between Assessment 1 and 2 was 68.6%, kappa coefficient 0.61, Cronbach's α 0.91, and intraclass correlation coefficient 0.83 (95% confidence interval (CI): 0.79-0.86). The concordance rates of Assessment 1 and 2 in each grade of bedriddenness ranks in descending order were 87.5% for C2, 75.8% for Normal, 73.0% for B1, 71.0% for A2, 68.2% for C1, 65.7% for B2, and less than 50% for J1, J2, and A1 (Table 4). There was  Categorical variables are presented as the number of patients and percentages, and continuous variables as the median and quartile range no significant difference in concordance rates of bedriddenness ranks between Assessment 1 and 2 among the major categories of the ICD-10 (Chi-square test, χ 2 = 11.1, p = 0.681, S3, Table.).

Inter-rater reliability of the MHLW cognitive function scores
The concordance rate of cognitive function scores between Assessment 1 and 2 was 70.1%, kappa coefficient 0.62, Cronbach's α 0.87, and intraclass correlation coefficient 0.78 (95% CI: 0.73-0.82). No patients were classified as having a cognitive function score M in Assessment 1. The concordance rates of Assessment 1 and 2 in each grade of cognitive function scores in descending order were 87.5% for 3b, 85.9% for Normal, 75% for 3a, 75% for 4, 63.8% for 1, and less than 50% for 2a and 2b (Table 5). There was no significant difference in concordance rates of cognitive function scores between Assessment 1 and 2 among the major categories of the ICD-10 (Chi-square test, χ 2 = 14.5, p = 0.415, S3, Table.).

Discussion
A good inter-rater reliability of the MHLW bedriddenness ranks was found in this study. The BI and KI, which are well established scales to assess ADLs, have been reported to have good inter-rater reliabilities, as indicated by a Pearson's correlation coefficient of 0.9 and by an intraclass correlation coefficient of 1.00 [7]. However, the BI takes a long time to implement due to its complexity [10], and the KI is unreliable when examining patients who require assistance for almost all of their basic ADLs [15], both of which are significant disadvantages. In comparison, the simplicity and high reliability of the MHLW bedriddenness ranks could means that this is a better scale to assess ADLs in contexts such as nursing homes or mass screenings, where many elderly patients require care for daily life, or busy clinical settings such as acute care hospitals. Additionally, bedriddenness ranks could provide us with more detailed information on patients who are bedridden by dividing them into two categories, B (Chair-bound) and C (Bed-bound) [16,17]. However, we should note that there were relatively lower concordance rates of patients belonging to bedriddenness ranks J1, J2, and A1 between Assessment 1 and 2 than those of patients belonging to other categories. In Assessment 1, which was performed by attending nurses, information was also available from bedside observation of patients or from patients' family members; however, in Assessment 2, which was performed by a medical social worker or medical clerk, the source of information was limited to a brief interview with patients. This highlights the need to gain information not only from patients themselves when using these easy-to-use, reliable scales, but also other sources such as from their family members, especially for patients whose conditions fluctuate according to the day and time of day. The MHLW cognitive function scores also had an excellent inter-rater reliability, MHLW as indicated by a kappa coefficient of 0.62, Cronbach's α of 0.87, and intraclass correlation coefficient 0.78. The MMSE [18] and ABC Dementia Scale [19] are existing measures of cognitive function that have been reported to have good inter-rater reliabilities, as shown by a kappa coefficient of 0.98 and weighted kappa coefficient of 0.75, respectively [7]. However, the MMSE consists of 11 items, including drawing pictures [7], and the ABC Dementia Scale of 13 items [19], which makes these measures cumbersome to use in extremely busy clinical settings in Japan, such as acute care hospitals or for mass screenings [5]. In contrast, cognitive function scores are simpler and easier-to-use, while also having sufficient reliability for use in mass screening or busy clinical

(14%) -
Death within 30 days after discharge 1 (1%) 2 (5%) 1 (6%) 0 (0%) 7 (14%) 0 (0%) 0 (0%) 0 (0%) 1 (14%) 0.079 Categorical variables are presented as the number of patients and percentages, and continuous variables as the median and quartile range a For continuous and categorical variables, between-group comparisons were made using the F-test for analysis of variance and the Chi-square test, respectively b This category has no quartile range settings, despite their slightly lower inter-rater reliability than the MMSE and ABC Dementia Scale. However, we should also be aware of the relatively lower concordance rates of patients belonging to cognitive function score categories of 2a and 2b between Assessment 1 and 2 than those of patients belonging to other categories, similarly to the case of bedriddenness ranks. Making judgements about whether patients are independent (a cognitive function score 1 or less) or require nursing care (a cognitive function score 3 or more) is straightforward due to the apparentness and unambiguousness of these scores. However, making judgements about patients with a score of 2, who could be somewhat independent under someone's careful observation, despite certain cognitive impairments or difficulties in communication, could rely on subjective assessments, making such judgments less decisive. This characteristic of a cognitive function score of 2 meant that we relied on information gained not only from an interview with patients themselves, but also from interviews with their family members or bedside observations, to better ensure that attending nurses of Assessment 1 made the correct judgment. This also suggests that we should source information not only from patients themselves, but also from their family members or careful bedside observations, for both the cognitive function scores and bedriddenness ranks. The MHLW bedriddenness ranks can be easier-to-use alternatives to the KI and BI. Furthermore, bedriddenness ranks were correlated with the BI and KI, which  The cognitive function score, routinely checked by a nurse, was derived from the medical charts for Assessment 1. In Assessment 2, the cognitive function score was checked independently within 72 h after admission, without knowing the results of Assessment 1, by a medical social worker or a medical clerk, i.e. a person of medical profession other than a nurse The bedriddenness rank, routinely checked by a nurse, was derived from the medical charts for Assessment 1. In Assessment 2, bedriddenness was checked independently within 72 h after admission, and without knowledge of the Assessment 1 results, by a medical social worker or a medical clerk, i.e. a person of a medical profession other than a nurse suggests that the various outcomes related to the BI and KI may also be related to the bedriddenness ranks [20][21][22]. Additionally, low bedriddenness ranks were significantly related to many variables in the present study and pressure ulcers, decreased ADL, malnutrition [3], oral mucosal epithelial detachment [4], and fall-related injuries in the previous studies [5]. We found that bedriddenness ranks, which generally provide a more approximate classification of ADLs than the BI or KI, were correlated with the BI and KI. However, for patients who were almost independent, with a BI of ≥80 or a KI of ≥5, the bedriddenness ranks offered a more detailed assessment of ADLs than did the BI or KI by the finer classifications of normal, J1, J2, and A1 (normal to house-bound) (Fig.  2). In a similar way, in completely bedridden patients with a KI of 0, the bedriddenness ranks gave a more detailed assessment by the finer classifications of B2 to C2.
The MHLW cognitive function scores were also significantly correlated with the BI and KI, and were related to age, sex, and location of livelihood. We did not compare cognitive function scores with other established scales of cognition, such as the MMSE, because not all inpatients completed the assessment of those established scales in this study. Thus, cognitive function scores should be compared with the standard scales of cognitive function in future research.

Limitations
In the present study, we excluded 2906 patients due to a lack of informed consent after failing to explain the research protocol to patients themselves or their family members within 72 h after admission. This was partly due to a shortage of human resources, as well as patients' refusal to provide informed consent. However, the Spearman's rank correlation coefficient between the MHLW bedriddenness ranks and the BI was −0.848, p < 0.001 (a), and between the bedriddenness ranks and the KI this was − 0.820, p < 0.001 (b) characteristics of included subjects were similar to those of the overall inpatient group (S4, Table), and so we believe that this exclusion had negligible effects on the present results. The MHLW bedriddenness ranks and cognition function scores found in this study were different to those of elderly people living in care facilities [23] and those living in the local community [24]; this could be a result of selection bias, which could have resulted in an inter-rater reliability that differed from other groups of patients with different background characteristics. Furthermore, the concordance rates within each grade of bedriddenness ranks and cognitive function scores varied between Assessments 1 and 2, with low concordance rates, albeit only slightly.

Conclusions
The MHLW bedriddenness ranks and cognitive function scores are official Japanese classifications of ADLs. Our results indicate that the implementation of these ADL grading scales are more practical than existing ones, especially in busy clinical practice or large-scale screening. Furthermore, they had a high inter-rater reliability, even among assessors with different professions, and their grades were significantly correlated with those of wellestablished scales to assess ADLs.