Comparing the Functional Independence Measure and the interRAI/MDS for use in the functional assessment of older adults: a review of the literature

Background: The rehabilitation of older persons is often complicated by increased frailty and medical complexity, which in turn present challenges for the development of health information systems. Objective investigation and comparison of the effectiveness of geriatric rehabilitation services requires information systems that are comprehensive, reliable, valid, and sensitive to clinically relevant changes in older persons. The Functional Independence Measure (FIM) is widely used in rehabilitation settings; in Canada, it is the central component of the National Rehabilitation Reporting System of the Canadian Institute for Health Information. An alternative system has been developed by the interRAI consortium. We conducted a literature review to compare the development and measurement properties of these two systems.
Methods: English language literature published between 1983 (initial development of the FIM) and 2008 was searched using the Medline and CINAHL databases and the reference lists of retrieved articles. Relevant articles were summarized and charted using the criteria proposed by Streiner. Additionally, attention was paid to the ability of the two systems to address issues particularly relevant to older rehabilitation clients, such as medical complexity, comorbidity, and responsiveness to small but clinically meaningful improvements.
Results: In total, 66 articles were found that met the inclusion criteria. The majority of FIM articles studied inpatient rehabilitation settings, while the majority of interRAI/MDS articles focused on nursing home settings. There is evidence supporting the reliability of both instruments. There were few articles that investigated the construct validity of the interRAI/MDS.
Conclusion: Additional psychometric research is needed on both the FIM and MDS, especially with regard to their use in different settings and with different client groups.


Background
Measuring and reporting health outcomes has become an essential component in guiding the development and evolution of health care systems. As the focus of health care changes to adapt to the aging population, aggregate data from health assessment systems can be used to inform policy decisions regarding service use and best practices [1]. One health care setting that serves a primarily older clientele is post-acute rehabilitation [2]. There is a substantial need for accurate assessment in this population, as it can have significant implications for older patients' care planning and future quality of life [3]. Despite some encouraging research in this area [4][5][6], there are limited data that focus on measuring rehabilitation outcomes in older adults [7]. One major challenge is that the performance of currently available assessment systems is not well understood in this population.
Development of valid and reliable outcome measures for use with older adults is complicated by frailty, comorbidity, and heterogeneity in this population. Geriatric patients differ from their younger counterparts in that they tend to have lower functional status on admission and higher clinical complexity due to multicausal disability and intercurrent medical conditions [4,8,9]. Older adults are an extremely diverse population and represent a wide range of physical and cognitive abilities [2]. Individualized measures, such as Goal Attainment Scaling, have been suggested as a possible approach to address this heterogeneity [10]; however, such measures present challenges for the development of a consistent database of client information. Wells and colleagues [8] recommend that standardized tools be used for diagnosis, assessment, and outcome measurement in geriatric rehabilitation. Instruments that are designed for younger, healthier, and more homogeneous groups are unlikely to have the same psychometric properties with older adults [2], and additional research is required specifically on the performance of assessment tools and outcome measures in older populations of rehabilitation patients.
Both the Functional Independence Measure (FIM) [11] and the interRAI/Minimum Data Set (MDS) [12,13] are instruments designed to measure functional ability, and both have been used widely with older persons and are mandated in multiple health care settings. Specific components of these instruments collect parallel information and items on both the FIM and the MDS can be used to predict total scores on the other tool [14,15].
In spite of their similarities, the two tools differ in range of content coverage, item definitions, scoring, and psychometrics, which prevents direct translation of scores from one instrument to the other [16,17]. Comparative information on their psychometric properties would be helpful in assessing the relative merits and potential applications of the two instruments. The purpose of this investigation was to examine previously published research on the measurement properties of these tools for use in populations of older adults.

Functional Independence Measure (FIM)
The FIM was developed in 1983 by a task force created by the American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation, headed by Carl Granger and Byron Hamilton [11]. To generate items, this group conducted a literature review of 36 existing functional performance measures [8]. The final instrument was based on the Barthel Index [18,19], which has been in use since the 1950s [20]. The FIM was designed to measure physical and cognitive disability and focuses on burden of care [11]. The main objective in its development was to create a generic measure that could be administered by clinicians and non-clinicians to assess patients in all age groups with a wide variety of diagnoses [11]. The FIM contains a total of 18 items. Thirteen of these items constitute the motor subscale and the remaining five items form the cognitive subscale [21]. The motor subscale collects information on self care, sphincter control, transfer, and locomotion, and the cognitive subscale focuses on communication and social cognition. All items are scored using a seven-point ordinal scale that is based on the amount of assistance the patient requires to perform each activity [21]. Higher scores on the FIM denote patients who have a higher level of independence and require less assistance [21]. The sum of all 18 items gives the patient's total score, which ranges from 18 to 126 [21]. The FIM is the major source of functional status data in the National Rehabilitation Reporting System (NRS) of the Canadian Institute for Health Information [22].
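The scoring arithmetic described above can be illustrated with a minimal sketch. It assumes item ratings of 1 to 7 (inferred from the stated seven-point scale and the total range of 18 to 126) and assumes that the 13 motor items are listed before the 5 cognitive items. The function name and the input ordering are hypothetical choices made for illustration and are not part of the FIM itself.

```python
from typing import Dict, Sequence

N_MOTOR = 13       # motor subscale items
N_COGNITIVE = 5    # cognitive subscale items
MIN_RATING = 1     # assumed lowest rating (most assistance)
MAX_RATING = 7     # assumed highest rating (most independent)


def fim_scores(ratings: Sequence[int]) -> Dict[str, int]:
    """Sum 18 item ratings into motor, cognitive, and total scores.

    Assumes the first 13 ratings are motor items and the last 5 are
    cognitive items (an ordering chosen for this sketch).
    """
    if len(ratings) != N_MOTOR + N_COGNITIVE:
        raise ValueError("expected exactly 18 item ratings")
    for rating in ratings:
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating out of range: {rating}")
    motor = sum(ratings[:N_MOTOR])
    cognitive = sum(ratings[N_MOTOR:])
    return {"motor": motor, "cognitive": cognitive, "total": motor + cognitive}


# A fully independent patient scores the maximum on every item.
print(fim_scores([7] * 18))  # {'motor': 91, 'cognitive': 35, 'total': 126}
```

Under these assumptions the motor subscale ranges from 13 to 91 and the cognitive subscale from 5 to 35, which together give the 18 to 126 total range reported for the instrument.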

interRAI/Minimum Data Set (MDS)
interRAI is an international research consortium that develops comprehensive assessment tools that are principally intended for older adult populations [13,23]. These Resident Assessment Instruments (RAIs) are used internationally in a wide variety of health care settings for a large number of applications including care planning, outcome measurement, and quality indicators [24]. Currently, there are 12 RAI tools designed for use in rehabilitation, long term care, home care, and other settings across the health care continuum [23]. The instruments consist of over 300 items encompassing a large array of patient characteristics including functional status, admission history, medical conditions and other information [24]. Initially these items were generated by reviewing previous literature on over 60 assessment instruments [25]. The final sets of items were selected based on extensive clinical deliberations and an iterative review process mainly focused on interrater reliability and clinical relevance [25]. All of the tools contain a proportion of common items that are intended to facilitate communication across multiple health care settings [26,13]. Each individual tool also includes specialized items exclusive to that setting [26]. The instrument specifically designed for use in rehabilitation is the interRAI Post Acute Care [27,23]. Physical functioning is measured by a range of activities of daily living (ADL) items that can be summed to form several ordinal ADL scales [28]. These items were designed to measure activities across a wide range of functional independence levels to enable the detection of functional changes in individuals with both high and low levels of functioning [28]. Each item is scored on the basis of the amount of assistance required for performance, with higher scores indicating greater dependence [28]. 
The scales were developed based on exploratory factor analyses and hypothesis testing to arrange the ADL items hierarchically in relation to loss of functioning. There is currently no consensus on a single standard ADL subscale for the interRAI instruments [28-32].
Cognitive functioning can be estimated using the interRAI instruments in two ways: the 5-item Cognitive Performance Scale (CPS) [33] or the 11-item MDS Cognition Scale (MDS-COGS) [34]. Both scales are ordinal, with the CPS ranging from 0 (intact) to 6 (very severe impairment) and the MDS-COGS ranging from 0 (cognitively intact) to 10 (very severe impairment). These scales were both developed based on their correlation with and ability to predict scores of existing cognition scales, including the

Criteria for considering studies in this review
All relevant English language articles that were published between January 1983 (the initial year of development for the FIM) and June 2008 were included in this review. The following inclusion and exclusion criteria were established to determine article relevance:

Inclusion criteria
1) The study population included older adults (55+)
2) The main focus of the article was on some aspect related to the development and/or measurement properties of the FIM and/or MDS instruments

Exclusion criteria
1) The article focused on child, adolescent, and/or young adult populations
2) The article did not contain original data, statistical analyses, and/or results
3) The article was a review of previously published work
4) The article solely focused on patients with spinal cord injuries and/or traumatic brain injuries
5) The article was focused on reports of experimental versions of the FIM and/or MDS or reported assessments of the properties of additional items or short forms that are not currently used in clinical practice
6) The instruments were used in the study as an intervention (e.g. instrument used to test the effects of a comprehensive assessment on patient outcomes)
7) The article did not relate to MDS items or subscales that are comparable to FIM items

Search methods for identification of studies

Electronic searches
Published material was identified by searching the MEDLINE and CINAHL databases with the following search strategy: 5) S1 AND S3 AND S4

Manual searches
The reference lists of the retrieved articles were examined for additional relevant papers.

Data collection and analysis
Guided by the inclusion and exclusion criteria, the first author eliminated irrelevant articles based on the title of the publication and the content of its abstract. All potentially relevant articles were retrieved and reviewed. Any article that was retrieved but was later found to be potentially irrelevant was reviewed by the second author. When the relevance was questionable, the two authors discussed the paper to arrive at a final conclusion. For each of the selected articles, information was gathered and charted according to the reliability and validity criteria proposed by Streiner [38].

Results
The initial keyword search identified 944 articles, of which 850 were excluded based on review of the title and abstract. Ten additional articles were identified by handsearching the reference lists of articles obtained in the initial search. Of the 94 articles retrieved for further review, 24 were excluded based on relevance and 12 were excluded as they were reviews of previously published works (Figure 1).
A nearly equal number of FIM articles investigated internal consistency and interrater reliability, while most MDS articles focused on interrater reliability. For both instruments, few articles investigated intrarater reliability. Four of the FIM articles focused on inpatient rehabilitation populations and five studied community residents mostly receiving home care. A large majority of MDS articles focused on nursing home residents and no articles were found that solely focused on inpatient rehabilitation. Clinicians were commonly used as raters for both instruments; three FIM and two MDS articles used researchers to assess the participants.
Internal consistency was high for the FIM total score (α = 0.88
During the development of MDS instruments, unreliable items were progressively eliminated, resulting in increasing reliability estimates over time [25,81]. Five articles investigated the internal consistency of functional status related outcome measures in the MDS. In all five studies, the researchers concluded that the scale(s) investigated was (were) internally consistent. However, because many characteristics, including subjects, setting, and raters, differ between the studies, and reliability is dependent on such variations [94], it is not currently possible to generalize across these articles about patterns in consistency. Zimmerman and colleagues [87] were the only group to investigate the intrarater reliability of an MDS subscale. They found that the relative amount of within- and between-rater error changed for the MDS-COGS depending on which cut-point was used. High interrater reliability has been repeatedly shown for MDS items in nursing home settings (individual items: r = 0.75-0.99, κ = 0.56-0.84, wκ = 0.33-1.0). Many of these studies investigated the reliability of MDS items in isolation and did not assess the reliability of summative scales within the instrument. Across all types of reliability, when summative scales were investigated, there was a lack of consistency in the MDS items used to form cognitive and ADL subscales.
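For readers less familiar with the internal consistency statistic reported above, the following is a minimal sketch of Cronbach's α computed from item-level ratings. It uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores); the function name and the toy ratings are hypothetical and are not data from any reviewed study.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of ratings.

    item_scores[i][j] is the rating of respondent i on item j.
    Sample variances (n - 1 denominator) are used throughout.
    """
    n_respondents = len(item_scores)
    if n_respondents < 2:
        raise ValueError("need at least two respondents")
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in item_scores):
        raise ValueError("every respondent needs a rating for every item")

    # Variance of each item across respondents.
    columns: List[tuple] = list(zip(*item_scores))
    item_variances = [variance(column) for column in columns]

    # Variance of the summed (total) score across respondents.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: three respondents, two items.
toy = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(toy), 3))  # 0.667
```

Because α depends on the variance of the sample being assessed, the same scale can yield different values in different settings, which is consistent with the caution above about generalizing across studies.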
Sixty-one of the articles in the sample investigated the validity of the instruments. The FIM and the MDS were independently discussed in 41 and 20 articles, respectively. This difference was mainly due to the notably larger proportion of FIM articles that focused on construct validity. Eight articles investigated the responsiveness of the FIM and only three investigated the responsiveness of the MDS; there was considerably more evidence supporting the responsiveness of the FIM than the MDS. The majority of FIM articles focused on inpatient rehabilitation, and the remainder studied populations in a variety of health care settings including home care, neurorehabilitation, nursing homes, and acute care. Almost three quarters of the MDS articles investigated the validity of the tool in nursing home residents; no articles exclusively focused on patients in rehabilitation settings.
Figure 1. Results of search strategy.
66,67,69,70]. These articles consistently found evidence of differential item functioning (DIF) between impairment groups; however, they disagreed on its clinical relevance. Eight articles investigated the responsiveness of the FIM; most estimated clinically relevant change using effect size and standardized response mean statistics. All of these articles focused on patients in neurorehabilitation or inpatient rehabilitation settings and consistently found that the FIM total, FIM motor, and FIM motor subscales are responsive and the FIM cognitive and FIM cognitive subscales are not responsive in this population [41,44,52,57,60,74,76,77]. The FIM was also found to be as responsive as other functional assessment instruments used in inpatient rehabilitation, including the Barthel Index (BI).

Similar to the FIM, one article formally assessed the face validity of the MDS [25]. In a nursing home setting, fol-
Of the four articles that focused on construct validity, one investigated the structure of the MDS using a confirmatory factor analysis [79]. The authors found that the factor structure differed between groups of nursing home clients depending on their level of cognitive impairment [79].

Discussion
The purpose of this review was to accumulate and synthesize past research focusing on the reliability and validity of the FIM and the MDS for use with older adults. To our knowledge, there have been no publications to date that have systematically reviewed and compared evidence of the psychometric properties of both tools. It is important for functional status outcome measures to be validated for use with older adults because this group of individuals represents a substantial proportion of the population being assessed with these instruments. Also, it is unlikely that the measurement properties of assessment tools will be consistent between the older and younger populations [2,38,94].
For both the FIM and the MDS, the majority of articles used samples from the same type of health care setting. Over half of the FIM studies were conducted in inpatient rehabilitation settings and almost two-thirds of the MDS articles were conducted with nursing home residents. Also, as MDS instruments are composed of similar items, psychometric data for a single MDS instrument, usually the MDS 2.0, were often extrapolated to other MDS instruments. This may not be appropriate, as reliability and validity estimates are dependent on variation in the sample on which the instrument was tested [94].
For both the FIM and the MDS, few articles were located that investigated intrarater reliability. Traditionally, it is more practical and economical to assess interrater reliability as it includes more sources of error: the raters are different and the participant being assessed may have changed over the testing period [38]. As a result, intrarater reliability is necessary but not sufficient for interrater reliability. However, intrarater reliability can be used to further investigate the source of low interrater reliability. For example, if an instrument has low interrater reliability and high intrarater reliability, it may mean that the raters have been trained inadequately, resulting in inconsistent evaluations [38]. Daving and colleagues [40] used clinicians to investigate the reliability of the FIM in community residents. They found that reliability ranged from poor to excellent, with the least reliable assessments being those completed at different times by different raters. As the interrater reliability of the FIM was generally high in other settings, an intrarater reliability study should be conducted to determine whether clinicians assessing community residents are the source of this inconsistency. In both of the articles that investigated the intrarater reliability of the FIM, the raters were not clinicians. Because researchers have different background knowledge and may receive different, more intensive training before conducting assessments, this may have artificially inflated the results, leading to the high and narrower range of estimates. Using researchers instead of clinician raters also limited investigation of the sources of error in the natural environment.
Streiner and Norman [94] assert that validity evidence from a series of converging experiments is superior to the results of one study. This is due to the inability of a single study to investigate definitively all aspects of an instrument's hypothetical construct; conclusions regarding the validity of an instrument may vary with the sample, setting and many other factors [38,94]. Therefore, the validity of an instrument is established by the accumulation of evidence across multiple studies. In this sample, there were twice as many studies investigating the validity of the FIM as the MDS. Both the FIM and the MDS have been repeatedly shown to correlate with commonly used assessment instruments in this area. However, because the outcome measures contained in both instruments were developed using these previously existing assessment tools [18,19,33,34] and there is no 'gold standard' instrument for measuring functional status in older adults, these investigations are not sufficient to establish the validity of either instrument. Relative to the FIM articles, the MDS articles were especially lacking in studies that focus on construct validity. There is a need for future research to investigate the construct validity of functionally related outcome measures contained in the MDS, including assessment of dimensionality, floor and ceiling effects, differential item functioning and responsiveness. Additional research is also needed on the construct validity of the FIM to investigate inconsistent findings regarding dimensionality and differential item functioning.
Determining the responsiveness of tools used to measure functional status in older adults is important because small-scale changes may represent very large, clinically relevant changes in quality of life. For example, a small change on a tool's scale can mean the difference between discharge to a long-term care facility and discharge to home care. A number of methods have been proposed for the analysis of responsiveness [99,100]; however, there is currently no consensus on a "gold standard" measure of responsiveness [99]. As a result, it is suggested that multiple measures of responsiveness be used in a single study to allow for the interpretation of patterns across different recommended statistics [101]. The methods used to measure the responsiveness of the FIM and the MDS differed widely across studies, and very few studies applied more than one responsiveness statistic to the same sample. More research is needed to determine the responsiveness of the FIM and MDS.
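As an illustration of applying more than one responsiveness statistic to the same sample, the sketch below computes two statistics named in the reviewed FIM studies, the effect size (ES) and the standardized response mean (SRM). It assumes the commonly used definitions, ES = mean change / standard deviation of admission scores and SRM = mean change / standard deviation of change scores; individual studies may have used variants. The function name and the scores are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def responsiveness(admission: Sequence[float],
                   discharge: Sequence[float]) -> Dict[str, float]:
    """Effect size and standardized response mean for paired scores.

    admission[i] and discharge[i] are the scores of the same patient
    at the two assessment points.
    """
    if len(admission) != len(discharge):
        raise ValueError("admission and discharge must be paired")
    if len(admission) < 2:
        raise ValueError("need at least two patients")

    changes = [after - before for before, after in zip(admission, discharge)]
    mean_change = mean(changes)

    sd_admission = stdev(admission)  # spread of baseline scores
    sd_change = stdev(changes)       # spread of individual change scores
    if sd_admission == 0 or sd_change == 0:
        raise ValueError("standard deviation of zero; statistic undefined")

    return {
        "effect_size": mean_change / sd_admission,
        "srm": mean_change / sd_change,
    }


# Hypothetical total scores for three patients.
result = responsiveness(admission=[60, 70, 80], discharge=[70, 85, 100])
print(result)  # {'effect_size': 1.5, 'srm': 3.0}
```

In this toy sample the two statistics differ by a factor of two even though they summarize the same change scores, which shows why results based on different statistics are hard to compare across studies.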
Several limitations of this research are recognized. Although a detailed search strategy was developed to locate articles that fit the criteria for this review, it is possible that studies that did not principally focus on the psychometric properties of the MDS or the FIM could contain additional information on the reliability and validity of the tools. Also, all studies meeting the inclusion/exclusion criteria were included in the review regardless of their methodological merit. As we were aware of no prior attempt to collect and synthesize this information our aim was to be as comprehensive and inclusive as possible.
Lastly, this review did not address the accreditation or training requirements, labour or time requirements for completion, software costs, or other administrative expenses associated with either instrument. These would clearly be relevant considerations for organizations considering adoption of one of these instruments.

Conclusion
This review assembled and compared available evidence of the reliability and validity of two major systems for the functional assessment of older adults. Overall, we found that there is evidence for the reliability of both instruments; however, the majority of FIM studies were carried out in inpatient rehabilitation settings and most of the MDS articles were conducted with nursing home residents. Before clinicians can confidently use the instruments outside of these settings, additional psychometric research is needed on both the FIM and MDS, especially with regard to their use in different settings and in different client groups. We also found that there is considerably more literature examining the validity of the FIM than is available for the MDS instruments. This supports the continued use of the FIM as a component of the NRS. Nonetheless, it is also important to consider that this analysis only included the ADL and cognition items from the MDS, which contains a more comprehensive set of items that may enhance its utility. The compatibility of the interRAI instruments across multiple health care settings should also be considered before determining which tool is the most appropriate outcome measure for this population. We suggest that, in particular, more research is needed to investigate the construct validity of the outcome measures derived from the MDS instruments. Lastly, a direct "head to head comparison" of both tools in the same population would yield valuable information, especially in terms of the assessment of their responsiveness to change. Such a study could also allow for analysis (using Rasch methods, for example) that would facilitate direct statistical comparison of results obtained using the two instruments. While such analyses could theoretically lead to the development of a hybrid instrument, it is unlikely that such an instrument would gain broad acceptance given the extensive investments already made in the two systems. It is more likely that the results would help users of each system to better understand the results obtained with the other instrument.