Performance-based clinical tests of balance and muscle strength used in young seniors: a systematic literature review

Background Many balance and strength tests exist that have been designed for older seniors, often aged ≥70 years. To guide strategies for preventing functional decline, valid and reliable tests are needed to detect early signs of functional decline in young seniors. Currently, little is known about which tests are being used in young seniors and their methodological quality. This two-step review aims to 1) identify commonly used tests of balance and strength, and 2) evaluate their measurement properties in young seniors. Methods First, a systematic literature search was conducted in MEDLINE to identify primary studies that employed performance-based tests of balance and muscle strength, and which aspects of balance and strength these tests assess in young seniors aged 60–70. Subsequently, for tests used in ≥3 studies, a second search was performed to identify method studies evaluating their measurement properties. The quality of included method studies was evaluated using the Consensus-based Standards for selection of health Measurement Instruments (COSMIN) checklist. Results Of 3454 articles identified, 295 met the inclusion criteria. For the first objective, 69 balance and 51 muscle strength tests were identified, with variations in administration mode and outcome reporting. Twenty-six balance tests and 15 muscle strength tests were used in ≥3 studies, with proactive balance tests and functional muscle power tests used most often. For the second objective, the search revealed 1880 method studies, of which nine studies (using 5 balance tests and 1 strength test) were included for quality assessment. The Timed Up and Go test was evaluated the most (4 studies), while the Community Balance and Mobility (CBM) scale was the second most assessed test (3 studies). For strength, one study assessed the reliability of the Five times sit-to-stand. 
Conclusion Commonly used balance and muscle strength tests in young seniors vary greatly with regard to administration mode and outcome reporting. Few studies have evaluated measurement properties of these tests when used in young seniors. There is a need for standardisation of existing tests to improve their informative value and comparability. For measuring balance, the CBM is a new and promising tool to detect even small balance deficits in young seniors. Electronic supplementary material The online version of this article (10.1186/s12877-018-1011-0) contains supplementary material, which is available to authorized users.

Balance and muscle strength tests can be used to assess and monitor an individual's health over time, and to predict multi-morbidity, dependence in basic activities of daily living (ADLs) and early mortality [18][19][20][21][22]. Such tests are also of substantial value in predicting future health status and functional performance in older adults [22].
Numerous performance-based clinical tests assessing balance and/or muscle strength exist. Tests of grip strength, walking speed, sit-to-stand, and standing balance have been shown to be markers of both current and future health [1,18,19,20,21]. As a result, there is increased interest in these tests and their potential use as simple screening tools in the general population to identify people who may benefit from targeted interventions aimed at preventing functional decline [1,18,23,24].
However, to test balance and muscle strength adequately, the tests must be sufficiently challenging, since early detection of loss of balance and muscle strength is important to prevent age-related functional decline in young seniors [25][26][27][28][29]. For young seniors, generally functioning at a higher level, it is questionable whether existing balance and muscle strength tests are sensitive enough to detect early subtle balance declines [1,23]. Balance is a complex composite of multiple body systems including the ability to align different body segments and to generate multi-joint movements to effectively control body position and movement [30]. Since balance is highly task-specific, several aspects need to be assessed which can be categorized into static steady-state balance (i.e., maintaining a steady position in sitting or standing), dynamic steady-state balance (i.e., walking), proactive balance (i.e., anticipating a predicted disturbance such as crossing or walking around an obstacle), and reactive balance (i.e., compensating for a disturbance) [30]. Recent systematic reviews of the literature on balance tests have shown that widely used assessment tools such as the Berg Balance Scale (BBS) or Short Physical Performance Battery (SPPB) show ceiling effects in community-dwelling, healthy older adults aged 60 years and over [23,31]. Ceiling effects of these instruments in higher functioning older adults will hamper the detection of early balance deficits, and thus intervention-related changes over time may not be detected [32,33]. Although some balance tests, such as the Fullerton Advanced Balance (FAB) scale [34], have been developed for use in higher functioning older adults, these tests typically do not include tasks that challenge balance for the specific population of healthy, higher functioning older adults [35,36].
For muscle strength, commonly used tests such as the Five times sit-to-stand (5STS) are not challenging enough to detect risk factors in higher functioning older adults [37]. Especially with regard to confirming the effects of an intervention, such tests have ceiling effects, as most older adults can perform the test effortlessly and therefore do not show changes in performance level [37].
At present, no systematic literature review has examined which balance and muscle strength tests are used for the population of young seniors. The aim of this systematic review was to 1) identify any performance-based clinical tests used to measure balance and/or muscle strength in young seniors aged 60-70 years, and 2) evaluate the measurement properties of the most commonly used performance-based clinical balance and muscle strength tests.

Study design
The study is a two-step systematic literature review with two separate literature searches. The first step included the search and systematic review of performance-based clinical tests used for measuring balance or muscle strength in young seniors.
The second step included a search and a systematic review of methodological studies evaluating the measurement properties of performance-based clinical tests that have been used in ≥3 studies identified in step one.

Search strategy
The search in step one was performed in MEDLINE to identify relevant studies published until June 1st 2016, with an update to identify newer studies published until November 5th 2018 (Fig. 1). A combination of free-text and MeSH terms was used that represents the following concepts: 'postural balance', 'muscle strength', 'movement', 'motor activity', 'physical exertion', 'physical endurance', 'exercise tolerance', and 'physical fitness'. Additional search terms aimed to exclude animal studies, participants outside our target age group, and non-English studies (see Additional file 1). The search in step two was performed in MEDLINE and EMBASE to identify relevant method studies published until December 19th 2017, and was also updated to include newer studies published until November 23rd 2018 (Fig. 2). We combined a search on the most commonly identified tests (≥3 articles) with a search on measurement properties, including validity, reliability, sensitivity, accuracy, responsiveness, and specificity (see Additional file 1).

Inclusion/exclusion criteria
In the first step, articles were included if they (1) described a performance-based clinical test that measured aspects of balance and/or muscle strength, (2) included participants with an age or mean age between 60 and 70 years, and (3) were written in English. Articles were excluded if (1) in principle the test could not be completed without fixed laboratory equipment, (2) all groups were included on the basis of having a clinical condition (i.e., no healthy and/or control groups), or (3) manuscripts were reviews, books, posters, or conference proceedings. In the second step, articles were included if they (1) described a performance-based clinical test that was used in at least 3 studies identified in the first search, (2) evaluated one or more measurement properties in one or more of the tests described, and (3) included participants with an age or mean age between 60 and 70 years.
For the selection of articles in the first part of the study, two authors performed independent reviews of article abstracts. Discrepancies were discussed until agreement was achieved; if not, a third reviewer made the final decision. The tests detected were labelled "in-lab" when they required advanced, fixed lab equipment, or "out-of-lab" if in principle they could be performed in a home setting. Although gait speed is a very common measure of physical performance in older adults, it is not a specific measure of balance or muscle strength, but is rather considered to be a general measure of health and function [38,39]. Therefore, we included articles with tests of gait speed only if the test included one or more additional test elements that challenge the sensory system beyond that of normal or fast walking and thus require a balance reaction (i.e. dynamic, proactive or reactive). Test batteries were included if one or more of the tests in the battery was in accordance with our definition of a performance-based test of balance and/or strength.
The review of full-texts was completed by three of the authors where one reviewed all articles and two reviewed one-half each. Discrepancies were discussed with one of the other reviewers and a decision was made based on consensus. For the second part of the study, two authors each screened one-half of the abstracts and full-texts of the methodological studies.

Data extraction
Information from each full-text article was extracted into an Excel sheet, containing information about the performance-based clinical tests (name of the test, measurement unit, scoring, and sample characteristics).
Results were categorized into sections representing balance or muscle strength measures. Since balance tests are task-specific, they were categorized according to the framework of Shumway-Cook and Woollacott [30]: (1) static steady-state balance (i.e., maintaining a steady position in sitting or standing), including measures of postural sway obtained during quiet standing (e.g. CoM sway); (2) dynamic steady-state balance (i.e., walking); (3) proactive balance (i.e., anticipating predicted disturbances such as crossing or walking around an obstacle); (4) reactive balance (i.e., compensating for disturbances); and (5) results of balance test batteries. Muscle strength tests were categorized according to a previously published qualitative review [10], resulting in the following categories: (1) 1 Repetition Maximum (1RM); (2) Maximum Isometric Strength (MIS); and (3) Muscle Power.

Assessment of measurement properties
The quality of the method studies included in the second step was evaluated by three independent reviewers using the COSMIN checklist [40]. COSMIN describes how to rate the quality of the following nine categories of measurement properties: internal consistency, reliability, measurement error, content validity, structural validity, hypotheses testing, cross-cultural validity, criterion validity, and responsiveness, with several items within each category [40]. Each category is rated as "poor", "fair", "good" or "excellent", with a "worst-score-counts" approach, meaning that each category receives the lowest rating achieved for any of the items within that category [40]. As the criteria for each rating score can differ between categories, the method studies receive a rating for each measurement property assessed. Thus, the quality of a study evaluating validity and reliability of a test can be rated "poor" for its assessment of validity and "fair" for its assessment of reliability. Two amendments were made to the COSMIN guidelines. The first refers to the handling of missing cases. Because missing cases are largely an issue with questionnaires and not with tests of physical performance, this item was not considered relevant for the quality assessment, and thus articles were not given negative ratings for not addressing it. The second refers to sample sizes. Articles with sample sizes between 21 and 30 were rated as "fair" instead of "poor", as the sample size affects the precision of estimates rather than the quality of the methodological study itself [41].
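The rule that a category receives the lowest rating of any of its items can be illustrated with a short sketch. The item ratings below are hypothetical, the function name `category_rating` is chosen for illustration only, and the two amendments described above are not modelled.

```python
# Rating levels ordered from lowest to highest, as named in the text.
LEVELS = ["poor", "fair", "good", "excellent"]

def category_rating(item_ratings):
    """Return the lowest rating among the items of one measurement property."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    return min(item_ratings, key=LEVELS.index)

# Hypothetical item ratings for a single method study.
study = {
    "reliability": ["excellent", "good", "fair", "good"],
    "criterion validity": ["good", "poor", "excellent"],
}

ratings = {prop: category_rating(items) for prop, items in study.items()}
print(ratings)  # {'reliability': 'fair', 'criterion validity': 'poor'}
```

A single weak item therefore determines the rating of the whole category, which is why one study can be rated differently for each measurement property it assesses.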

Study selection
Out of 3454 articles identified, 295 articles were included in the full-text review (Fig. 1). In total, 69 balance tests and 51 muscle strength tests were identified (Table 1; Additional file 2). Out of these tests, 26 balance tests and 15 muscle strength tests were used in ≥3 articles. These tests were included in the second search on measurement properties, and revealed only three method studies from reviewing 874 abstracts and 131 full-text articles (Fig. 2).
All studies included young seniors, where 282 studies had a sample with a mean age between 60 and 70 years

Dynamic steady-state balance tests
A total of 14 tests assessing dynamic steady-state balance were identified: (1) the tandem walk, with variations in the distance walked (9.14 m; 10 m), (2) the Step test, with variations in the demand of the activity (using the worse leg), (3) the Four Square Step Test (FSST), (4) a step width and length measuring walking test, (5) the Maximum Step Length (MSL) test, (6) the 360° turn, (7) the 180° turn, (8) the 6 m backwards walk test, (9) the 10 m walk under single- and dual-task conditions, (10) the floor transfer task, (11) the Star Excursion Balance Test (SEBT), (12) a walking test measuring dynamic balance and agility, (13) the narrow corridor walk, and (14) the sideways walk test. The method of scoring included (1) total time (s), (2) distance (step width and length), (3) number of steps, (4) number of missteps, (5) percentage (inability to complete the test), and (6) scoring (categorized according to the total time for completion of test).

Proactive balance tests
Eight tests for assessing proactive balance control were identified. The Timed Up and Go (TUG) test was used in 92 studies, with variations in (1) set pace (self-paced; fast paced), (2) distance walked (range 2.44-3.05 m), (3) turn (walk to a line on the floor and return; walk to a cone, turn around the cone and return), (4) chair (with/without armrests; with/without backrest; height range 40-46 cm), (5) number of trials (range 1-4), (6) incorporated cognitive (counting backwards; saying animal names) and motor (carrying a cup of water) tasks, and (7)

Reactive balance tests
Seven tests for assessing reactive balance control were identified: (1) the Reactive Balance Test, measuring oscillations in medio-lateral and anterior-posterior directions, (2) the Push and Release Test, measuring the number of steps needed to regain balance, (3) the adaptive gait test, measuring gait speed (m/s) and the number of step errors, (4) the Step Execution Test, measuring reaction time (ms), (5) the Backwards Stepping Test, measuring ground reaction forces (N/kg), (6) the Crossover Stepping Test, measuring ground reaction forces (N/kg), and (7) the Limits of Stability test, measuring reaction time (s), movement velocity (m/s), and maximum excursion (%).

Performance test batteries/scales
Nine performance test batteries that included different balance tasks were identified: (1) the Berg Balance Scale  (9) the Functional Movement Measurement (FMM). All performance test batteries used a scoring scheme (e.g., 0 'unable to perform' up to 4 'able to perform the task safely') for the assessment of the performance.

Maximum isometric strength tests
Nine tests measuring Maximum Isometric Strength (MIS) were identified. Eleven studies used MIS tests of knee extensors, with variations in (1) outcome (mean of trials; best trial), and (2) outcome dimension (kg; N/kg; percentage, i.e., muscle strength/bodyweight). Six studies evaluated leg muscle strength, assessed by force (kg). Ankle dorsiflexor MIS tests were used in seven studies, evaluated either by force (kg, N/kg) or percentage (muscle strength/bodyweight). Five studies assessed ankle plantar flexor strength by force (kg). One study included MIS tests of hip extensors, and two included tests of hip flexors and hip abductors, evaluated by force (kg) or percentage (i.e., muscle strength in relation to total bodyweight). Elbow extensor strength was measured in one study by force (kg), and knee flexor strength in one study by percentage (muscle strength/bodyweight).

Assessment of measurement properties
Thirty-nine tests were used in ≥3 articles that were identified through step 1. In step 2, nine studies were identified that assessed measurement properties of four balance tests/scales (10s Tandem stance, TUG, SPPB, CBM) and one strength test (5STS). The quality assessment of these nine included method studies [42,52,56,57,58,59,60,61,62,63] is shown in an additional file (see Additional file 3). The quality of the study that assessed validity and reliability of the 10s Tandem stance [61] was rated "poor" according to the COSMIN checklist [40]. Four studies assessed the measurement properties of the TUG, with their study quality rated "good" [42,59] for measures of validity, and "poor" for measures of reliability [59,60]. Three studies assessed measurement properties of the CBM; the quality of these studies was rated as "fair" for measures of validity [52,58,62], "poor" for internal consistency [52], and "good" for reliability [52,62]. The quality of the study assessing the SPPB in younger seniors was rated "excellent" for validity and "good" for reliability [57]. For strength, the study assessing reliability of the 5STS was rated as "fair" [56].

Discussion
In the first step, this systematic review identified 120 performance-based clinical tests used to measure balance and/or muscle strength in young seniors, of which 69 measured balance and 51 measured muscle strength. The TUG (92 articles), BBS (35 articles), and SPPB (34 articles) were the most used balance tests in our sample. Different variations of STS (e.g. 5STS, 30s STS) were most often used to assess muscle strength (128 articles), with the 5STS as the most commonly used test (51 articles), followed by the 30s STS (51 studies). In the second step, ten method studies were identified for the 39 performance-based clinical tests which were most commonly used. The method studies evaluated measurement properties of the 10s Tandem stance, TUG, SPPB, CBM, and 5STS in samples of young seniors.
Proactive balance was the aspect of balance that was tested most frequently, with TUG as the most frequently used test (92 articles; 61,826 participants). This finding aligns with an earlier review that found TUG to be the most used test to predict falls in healthy community-dwelling older adults aged ≥60 years [31]. TUG is fast to perform and easy to administer, and cut-offs between 12 and 13 s have shown moderate to high sensitivity and specificity in predicting falls in older adults [42,64]. However, the TUG is a general test of mobility that provides little or no information on underlying balance deficits [30]. Performance of TUG is a relatively complex task in terms of motor performance, including a 'sit-to-stand'-movement, walking, turning and a 'turn-to-sit'-movement, but for young seniors, the score of total duration may not be sensitive enough to reveal early signs of functional decline [20]. The instrumented version of TUG could potentially be a more useful test of balance and mobility in higher functioning groups, as more details of the quality and quantity of the performance can be obtained objectively than merely the total duration [65].
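How a TUG cut-off translates into sensitivity and specificity for fall prediction can be sketched as follows. The times, the fall outcomes and the 12.5 s cut-off are invented for illustration (the cut-off is merely a value within the 12 to 13 s range mentioned above); none of these numbers come from the cited studies.

```python
def sensitivity_specificity(times_s, fell, cutoff_s):
    """Flag a TUG time >= cutoff_s as 'at risk' and compare with observed falls.

    Assumes the sample contains at least one faller and one non-faller.
    """
    pairs = list(zip(times_s, fell))
    tp = sum(t >= cutoff_s and f for t, f in pairs)      # fallers flagged
    fn = sum(t < cutoff_s and f for t, f in pairs)       # fallers missed
    tn = sum(t < cutoff_s and not f for t, f in pairs)   # non-fallers cleared
    fp = sum(t >= cutoff_s and not f for t, f in pairs)  # non-fallers flagged
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical TUG times (s) and whether each participant fell during follow-up.
times = [8.1, 9.4, 10.2, 11.0, 12.6, 13.1, 14.0, 9.9, 12.9, 15.2]
fell = [False, False, False, True, True, True, True, False, False, True]

sens, spec = sensitivity_specificity(times, fell, cutoff_s=12.5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a high-functioning sample most TUG times fall well below any such cut-off, so the binary classification carries little information about early decline.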
For balance performance test batteries, BBS was the most commonly used test (35 articles; 2324 participants), closely followed by the SPPB (34 articles; 17,687 participants). BBS is widely used and has been coined the "gold standard" of balance assessment tools [66]. BBS is a significant predictor for ADL disability onset in older adults aged 80 and over [67], but in samples with a mean age in the mid-seventies it suffers from ceiling effects [68][69][70], even in older adults with a falls history [31]. A previous systematic review recommended the SPPB as the best performance-based tool for measuring physical function in older adults due to superior qualities related to validity, reliability, and responsiveness compared to other tests [71]. This review generally reported few ceiling effects for the SPPB in the "general (mixed) population" of community-dwelling older adults. However, when applied in higher-functioning community-dwelling older adults, the SPPB also showed ceiling effects [32,72]. Despite being extensively used in older people in general and receiving praise for their measurement properties, the BBS and SPPB do not appear to be good enough for assessing physical performance in well-functioning young seniors due to ceiling effects. In this review, the method study assessing the measurement properties of the SPPB was rated "excellent" for its measure of validity and "good" for its measure of reliability [57]. However, the results of the method studies are not considered in this quality rating, and the relatively high mean scores on the SPPB in this study (9.7 ± 2.0) align with the findings of other studies in healthy young seniors [32,72].
The most frequently used muscle strength tests across all categories were those including some variation of the 'sit-to-stand'-movement (128 studies), with the 5STS (61 articles; 81,289 participants) and the 30s STS (51 articles; 7493 participants) being the most popular among them.
The 5STS is commonly used as a test of physical performance in clinical assessments [73], and is also part of the SPPB test battery. We found a large variety in how this test was administered, thus making comparisons between versions a challenge. In the original and most applied protocol, the subject is "timed from the initial sitting position to the final standing position at the end of the fifth stand" [74]. In an earlier meta-analysis, the mean score on 5STS from 4184 participants between 60 and 69 years was 11.4 s [75]. This is relatively fast compared to identified cut-offs of 13.6 s for indication of increased disability and morbidity [76], and 15 s for predicting recurrent fallers [77]. However, as this test also lacks validation in young seniors, we have no basis for recommending this performance-based clinical test as a good measure for this specific population.
The second most used tool with a STS-variation was the 30s STS, originally developed to overcome floor effects of the 5STS [78]. We did not identify any method study that assessed the measurement properties of the 30s STS, but in community-dwelling adults with a mean age of 70.5 ± 5.5 years, the test showed a test-retest reliability of ICC .89 and moderate concurrent validity, with associations with weight-adjusted 1 RM leg-press of r = .71 (women) and .78 (men) [78]. Therefore, the 30s STS could be suitable to measure physical performance in young seniors, but further studies are warranted to confirm this.
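A ceiling effect can be made concrete as the share of participants who reach the maximum possible score. The sketch below uses invented SPPB scores (maximum score 12); the 15% threshold is a commonly used convention for flagging ceiling effects and is an assumption here, not a criterion stated in this review.

```python
def ceiling_proportion(scores, max_score):
    """Share of participants who achieved the maximum possible score."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s == max_score for s in scores) / len(scores)

# Hypothetical SPPB total scores in a well-functioning sample (SPPB maximum = 12).
sppb_scores = [12, 11, 12, 10, 12, 9, 12, 12, 11, 12]

share_at_max = ceiling_proportion(sppb_scores, max_score=12)
CEILING_THRESHOLD = 0.15  # assumed convention, not taken from the review

print(f"{share_at_max:.0%} of participants scored the maximum")
print("ceiling effect flagged:", share_at_max > CEILING_THRESHOLD)
```

When most participants already score the maximum, neither early deficits nor intervention-related improvements can register on the scale.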
In the second step, nine method studies were identified, covering only four of the 26 balance tests and one of the 13 strength tests that had been used in ≥3 articles. It is apparent that very few of all available tests for measuring balance and/or strength have been assessed for their measurement properties in healthy young seniors. The quality of most of the method studies rated in this review ranged only from "poor" to "fair". However, there seems to be a shift in focus towards the current target group in the literature, as indicated by the high number of new studies that was identified in the updated literature search (Figs. 1 and 2).
The CBM and the 10s Tandem Stance were two of the tests that emerged as being used in ≥3 studies in the updated search. Therefore, these tests were added to the updated search of method studies. In two of three method studies assessing the CBM [52,58], the measures of reliability were all high (>.97) and validity good to excellent in young seniors [52,58]. However, study quality was rated "poor" with regard to validity measures with the COSMIN checklist. The studies assessing the CBM reported no ceiling effects in young seniors due to its challenging, higher level tasks [52,58], and the CBM could be considered a feasible tool to adequately assess balance performance in healthy, higher functioning young seniors. The study assessing the 10s Tandem Stance found that valid and reliable measures of the Centre of Pressure (COP) can be obtained from a Wii Balance Board (WBB), compared to a laboratory force plate [61]. Such a device could be a suitable tool for a home-based assessment of balance/posture measures. However, COP measures as assessed by the WBB have not been evaluated in younger seniors so far.
New method studies of tests that were already included before the updated search, such as TUG, SPPB, and 5STS, indicate that not only new tests, but also well-established tests are evaluated for their potential suitability in measuring balance and/or strength in young seniors. The TUG showed excellent reliability, but both studies were rated as "poor" regarding their overall methodological quality [59,60]. Another study, rated "good" according to COSMIN, found cut-off scores of 12.47 s on the TUG to be an accurate measure for screening of fall risk [42], while another study reported low discriminative ability of the TUG for healthy older adults vs. older adults with a history of falls [63], which is in line with previous findings concluding that the TUG is able to discriminate between fallers and multiple fallers, but not between non-fallers and fallers [79].
Based on the findings in this review, there seems to be only one promising scale for adequately assessing balance in healthy young seniors, i.e. showing no ceiling effects and having measures of high validity and reliability, namely the CBM. However, important measures such as responsiveness to identify intervention-related changes are currently lacking for this balance scale.
A limitation of this systematic review is the restriction to English-language articles, which might have influenced the final number of identified tests. However, this review was based on a broadly designed literature search which aimed at getting a broad overview of existing performance-based clinical tests used for measuring balance and/or muscle strength in young seniors. Due to the large number of identified and included articles, our search is unlikely to have missed any frequently used tests.

Conclusion
This systematic review identified a large number of performance-based clinical tests that have been used to measure balance and/or muscle strength in young seniors. The most commonly used balance tests suffer from ceiling effects in young seniors. Additionally, there is a wide variety and hence lack of consensus on how to administer balance and muscle strength tests, and how to report their outcomes. There is a need for guidance on how to administer and conduct balance and strength tests to improve their informative value and comparability of outcomes. Only nine method studies were identified that assessed the measurement properties of tests used in young seniors, indicating that more studies are required to identify suitable tests for assessing balance and strength in young seniors. Only in the last 2 years, three studies assessing the measurement properties of the CBM in healthy young seniors have been identified, indicating that it could be a promising tool to adequately measure balance. The CBM has a standardised assessment procedure and studies show that it is the only scale applied in young seniors not showing ceiling effects [52,58], being more challenging and thus more sensitive to detect changes in balance performance in healthy younger seniors. However, more research is needed to further analyse its measurement properties, especially in terms of responsiveness and sensitivity to change [52,58,62].
In general, more challenging tests are needed to adequately assess young seniors' physical performance, especially when aiming to identify early declines in function so that preventive strategies can be initiated in a timely manner.