Validity and reliability testing of the Spanish version of the BESTest and mini-BESTest in healthy community-dwelling elderly

Background The Balance Evaluation Systems Test (BESTest) and its abbreviated version, the Mini-BESTest are clinical examination of balance impairment, but its psychometric properties have not yet been tested in European Spanish. We aimed to assess the psychometric properties of BESTest and Mini-BESTest in Spanish in community-dwelling elderly people. Methods We designed a cross-sectional transcultural adaptation and validation study. Convenience sample of thirty (N-30) adults aged 65 to 89 years old without balance problems were recruited. Two physiotherapists assessed participants at the same time. Internal consistency of Spanish BESTest and Mini-BESTest was carried out by obtaining the Cronbach Alpha. The reproducibility between raters was studied with the Intraclass Correlation Coefficient. The Pearson correlation coefficient was calculated by comparing the relationship between the BESTest, mini-BESTest, Berg Balance Scale (BBS) and Falls Efficacy Scale-International (FES-I). Results BESTest and Mini-BESTest showed good internal consistency. BESTest and Mini-BESTest total scores showed an excellent inter-rater agreement. There was a significant correlation between total score of the BESTest and the Mini-BESTest (r = 0.65; p < 0.001). BESTest had a moderate association with BBS and a strong association with FES-I. Mini-BESTest had a fair correlation with BBS and FES-I. Total scores obtained by women at BESTest and at Mini-BESTest were significantly lower than those reached by men. The differences observed in all the test when disaggregating data by sex require further research. Conclusions Spanish versions of BESTest and Mini-BESTest are comprehensible for new raters. They are reliable tools to provide information on which particular balance systems show impairment in community dwelling older adults. Elderly women had a worse quality of balance and a greater perception of their risk of falling. Trial registration This study was registered in ClinicalTrials.gov with NCT 03403218 on 2018/01/17.


Background
Motor control is defined as the ability to regulate or direct the mechanisms essential for balance and movement [1]. It is very complex and involves many different underlying systems. A system must be understood as a set of elements whose parts or components are related to at least some of the other components.
During ageing, motion initiation, postural control and movement are usually impaired by factors involving effectors, sensory and cognitive systems. Specifically for the vestibular system, above 70 years old, a decrease in the number of sensory vestibular cells and a reduction in vestibular nerve fibres and neurons in the vestibular nuclei has been described [2]. Visual function is also impaired, including reduction in visual acuity, depth perception and peripheral vision [3]. In the musculoskeletal system, osteoporosis, sarcopenia and reduction of body muscle strength are established effects to a greater or lesser extent. As regards the effects of ageing in cognition, selective attention is reported to be less effective in the elderly compared to young subjects. Among other degradations, an increase in the attentional cost of posture control and displacement has also been mentioned [4].
Falls are a geriatric syndrome derived from this deterioration and constitute one of the most frequent and potentially significant public health problems [5]. Although there is no clear indication of the factors that imply worse balance for men and women older than 65 years, there appears to be general agreement in the literature that old women have a worse balance than old men [6][7][8].
The potential of clinical measurements of balance is usually limited to identifying the risk of falling, within the elderly population. The Berg Balance Scale (BBS) is a widely used clinical scale for functional balance and is currently considered as a reference for assessing balance in people with risk of falling due to stroke or Parkinson's disease [9,10].
Considered the gold standard, the BBS has been strongly evaluated as valid and reliable but there are still several factors that indicate that the BBS should be used in conjunction with other balance measures. There are few tasks in the BBS to test dynamic balance, which may limit its power to identify sensitive older adults from those who live independently in the community [10]. Also, ceiling effect and floor effect has been reported for the BBS when used with community dwelling older adults [7,11]. The Falls Efficacy Scale-International (FES-I) is a measure of "fear of falling" or, more properly, "concerns about falling", which is also suitable for research and clinical practice. Stroke rehabilitation or Spanish studies about sarcopenic obesity or balance in the older people with instability include this instrument [12][13][14]. The identification of valid, reliable and clinically relevant outcome measures to document and evaluate the evolution or recovery of a health condition is vital in determining the effectiveness of interventions. The Balance Evaluation Systems Test (BESTest) and its abbreviated version (Mini-BESTest) are clinical balance tools that allow the identification of not only the risk of falling, but also which impairments are involved in dysfunctional balance. Having detected the degraded components underlying balance control, specific types of therapeutic intervention for different types of balance problems can be applied [10].
The BESTest has shown to have high correlation with functional gait and balance performance and moderate correlation with self-assessed fear of falling (FES-I) in English, its source language [15,16].
However, clinical systems used all over the world must not only be translated, but must also be adapted culturally to maintain content validity at a conceptual level across different cultures [17]. Psychometric properties must also be preserved in the destination language. This makes it necessary to analyse its validity before dissemination [18].
The purpose of this study was to translate and adapt BESTest and Mini-BESTest to Spanish (from Spain, Europe) and investigate its validity in elderly healthy subjects without apparent difficulties in balance and locomotion. Secondly, we aimed to study with both instruments the effect of sex-gender in balance.
We hypothesized that BESTest and Mini-Bestest would highly correlate with the Berg Balance Scale (BBS) and would have a low correlation with self-assessed fear of falling in healthy older adults measured by means of Falls Efficacy Scale-International (FES-I).

Study design
A cross-sectional transcultural adaptation and validation study design was used.

Participants
Participants were 30 community-dwelling elderly subjects (14 males and 16 females) (mean age: 73.3, range: 65-89) recruited with a convenience sampling. Inclusion criteria were: (1) no present health problems relating to balance disturbances; (2) no history of falling in the last three months; (3) the ability to walk 6 m without orthopaedic devices or the assistance from another person; (4) cognitively able to receive three verbal instructions; (5) capable to perform tests without excessive fatigue.
The study followed the Declaration of Helsinki and all the participants received oral and written information about the study and signed the informed consent. Ethical approval was obtained from the Comité Etico de Investigación de Aragón -Spain (CEICA).

Clinical scales
The BESTest is a 27-item test organized in six sections, using a model of motor control as a theoretical framework [1]. Each item is scored from 3 (normal) to 0 (severe, not capable) points based on time or performance criteria resulting in a total possible score of 108 points, which are converted to a percentage score. Higher scores indicate better balance.
The mini-BESTest was developed using factor analysis to identify the items of the BESTest that represented dynamic balance, eliminating redundant and insensitive items from the BESTest [5]. It includes tasks from the BESTest sections "Anticipatory Postural Adjustments", "Postural Responses", "Sensory Orientation" and "Stability in Gait", and comprises 14 items scored from 2 (normal) to 0 (severe, not capable), resulting in a total possible score of 28 points, where higher scores indicate better balance.
The BBS consists of 14 functional balance items that focus on static and dynamic balance abilities. Each item is scored from 4 (normal function) to 0 (impossible). The total possible score is 56 points. The interpretation of the result is: ≤20 -wheelchair user, > 20 ≤ 40 -walking with assistance, > 40 ≤ 56independent. A score lower than 46 points indicates that the person has a high risk of falling [10].
The FES-I system measures the level of confidence shown by the subject in carrying out basic activities of daily life. It comprises 16 items with a Likert scale of 4. The total possible score is 64 points. Higher scores indicate worse balance [19].

Transcultural translation process
For the translation of BESTest, Mini-BESTest and their respective instructions, the Beaton recommendations were followed [17]. Firstly, two qualified people translated both instruments from English to Spanish. Then a certified translator back translated the tests to English. The original author of the tests, Fay Horak, reviewed and commented both English documents. All corrections were translated to Spanish and integrated in the managed balance tests. The International System of Units was adopted in the Spanish version of both tests, converting pounds into kilograms and feet and inches into centimetres. Results were rounded off to the nearest unit. A group of physiotherapists instructed in BESTest and Mini-BESTest assessment reviewed the final versions and followed a pre-test study at the Hospital Universitario Miguel Servet (Zaragoza, Spain) with neurological patients and 12 community-dwelling elderly healthy subjects. This training period allowed adjustment of a minor number of instructions to improve items understanding in the cultural context.

Validation procedure
Feasibility analysis was not necessary because these measurement systems are already applied and validated in other languages [20,21].
Two trained senior physiotherapists assessed the 30 subjects by means of the BESTest and Mini-BESTest, the BBS and FES-I. Redundant items of BBS and BESTest were assessed together. The Mini-BESTest scores were rated simultaneously with the BESTest scores, so that the corresponding test was only applied once for each subject.
Evaluations were made at the home of each subject, where required clinical equipment was taken. The full session lasted approximately 45 min. The BERG scale was always administered before BESTest; FES-I was fulfilled at the beginning or at the end of the assessment. To avoid fatigue of participants, both physiotherapists scored each item at the same time without communication between them.
The data capture period of the study lasted two months.

Data analysis
Descriptive statistics were first calculated for age, height, weight and each of the measures with 95% confidence interval (CI) when applicable. Cronbach Alpha was calculated to assess internal consistency of BESTest and mini-BESTest. Values higher than 0.70 were considered to indicate internal consistency of scales. Reliability was calculated via Intraclass correlation coefficient (ICC) and 95% CI using analysis of variance models. Consistent with previous work, an ICC was defined as (1) very good or excellent > 0.75; (2) good, 0.74 to 0.60; (3) fair, 0.59 to 0.40; and (4) poor, < 0.4 [22,23] . Minimum detectable change (MDC) was calculated starting from the standard error of the measurement and taking into account the standard deviation of the values of all the participants. A 95% CI was applied to calculate the formula. The Pearson correlation coefficient was calculated to examine criteria validity by comparing the relationship between the BESTest, mini-BESTest, BBS and FES-I. A correlation (r value) between 0 and 0.25 was interpreted as little or no relationship, between 0.25 and 0.5 as a fair relationship, between 0.5 0.75 as moderate, and above 0.75 as a very good to excellent relationship.
The presence of floor and ceiling effects was defined as 15% or more of the participants having the lowest or respectively the highest possible score on the BESTest and the Mini-BESTest [24].
All analyses were conducted with SPSS software, version 22.0 (IBM Statistic for Windows. Armonk, NY: IBM Corp.). The effect of sex-gender was analysed after verifying the variables normal distribution using oneway ANOVA. TIBCO® Data Science -Statistica® was used for this last analysis.

Results
30 people participated in the study, 14 were men (46.7%) and 16 women (53.3%). They were aged 65 to 89 years old. None of the participants used walking aids. Our cohort do not present anthropometric differences depending on gender (Table 1). Table 1 presents the BBS and FES-I scores as the parameters that illustrate the balance status of our cohort and their own perception about risk of falling. BBS showed significant differences between men (51.3 (1.4)) and women (49.10 (3.40)), (F(1.28) = 5.07, p < 0.03, Table 1). Only 3 people (all woman) showed a BBS score under 46, meaning that our cohort had a high risk of falling, although they had not reported falls in the last three months.
None of the participants got the lowest or highest possible score at the BESTest and Mini-BESTest, so no floor or ceiling effects were detected.
The total score (%) for BESTest obtained by the second rater presented a significantly higher score for men than for women (F(2. 27) = 5.28, p = 0.01, Table 2).
When analysing the BESTest section scores disaggregated by gender, scores obtained by women were lower than those reached by men, regardless of the BESTest or Mini-BESTest section. ANOVA showed a significant difference depending on gender in sections I, IV and VI. Values at section I (Biomechanical Constraints) were F(2. 27) = 3.22, p = 0.06. The mean differences were 1.9 (p < 0.03) obtained by rater 1 and 1.9 (p < 0.02) by rater 2 ( Table 2). This difference represents 14% of the maximal score of section I.
In section IV (Postural Responses), the value was F(2. 27) = 4.48, p = 0.02 for both raters (Scheffé post-hoc for both raters was p < 0.01). This difference represents 10% of the maximal score of the Postural Responses section.
For Mini-BESTest, scores obtained by women were also lower than those reached by men in all sections and for both raters (Table 2).

Internal consistency reliability
BESTest and Mini-BESTest showed a good internal consistency, with Cronbach alpha values between 0.79 and 0.98 (Table 3).

Validity
There was a strong positive and statistically significant correlation between the total scores of BESTest and Mini-BESTest (r = 0.65; p < 0.001, Table 4). Correlations between corresponding sections of both systems were strongly positive and statistically significant (Table 4).
Discriminating by gender, all sections correlated strongly and obtained statistical significance. Higher correlation was observed in the sensory orientation section, being higher in men (r = 0.98 and 0.90 respectively; p < 0.001, Table 4). There was a moderate correlation in women's postural responses (r = 0.44, p < 0.05, Table 4).
BESTest had a moderate association with BBS (r = 0.43, p 0.02, Table 5) and a strong association with FES-I with statistical significance(r = 0.90). Mini-BESTest had a fair correlation with BBS and FES-I. There was no    (Table 5).

Discussion
This study aimed to verify psychometric properties of BESTest and Mini-BESTest once translated to Spanish in a community-based sample of older people without related balance problems and secondarily to study sexgender differences. Validation was done by comparing to BBS and FES-I. Our results showed that the Spanish translated version of BESTest and Mini-BESTest are reliable and valid measures of balance performance for communitydwelling elderly people not prone to falling, and that scores of BESTest and Mini-BESTest obtained by women were lower than the ones obtained by men. Data  Fig. 1 The BESTest total score inter-rater agreement   [5,7]. Both balance measurement systems showed a good internal consistency for items in respect of the total score and for each of the six sections of BESTest and the four of the Mini-BESTest, also demonstrating that the items measured the same underlying attribute. O'Hosky et al. [5], showed that the BESTest, mini-BESTest and brief-BESTest present the construct validity for assessing balance in adults aged over 50 years. We also found a greater degree of agreement between raters for the BESTest validity, indicated by excellent inter-rater reliability for total score and for test subsections. Values for ICC in all sections, except Postural responses, were similar to those obtained by Padgett or Wang-Hsu [23,25]. Reliability in Mini-BESTest indicated excellent interrater concordance data, although with lower scores in some sections than those registered in people with chronic stroke or Parkinson's disease [11,22,26]. Positive correlations between BESTest and Mini-BESTest were found, as obtained by other studies with the same scales developed in other languages [16,18]. Scores of BESTest and Mini-BESTest obtained by women were lower than men. There is strong evidence that physical activity reduces the risk of falling [27,28]. Some papers have also pointed out that physical activity is more intense and frequent in elderly men, while activities of daily living are usually more demanding for women [29]. Unfortunately, in our study we have not recorded a validated score for physical activity, but the participants who were men reported practicing more physical activity than women. This is the only factor that could explain our results. Comorbidity, medication or socioeconomic differences, other important-factors mentioned in the literature as responsible for poor balance [29][30][31] should not be taken into account in our present work because we selected for healthy subjects, unhealthy were excluded, and they were mostly couples with a similar socioeconomic status. Nevertheless, despite the oral report of physical activity, due to the absence of a systematic collection of this information through selfreports or administered scales, we consider that one should take with caution the interpretation that the differences in balance score between genders is due to differences in levels of physical activity. Those differences observed in all the test when disaggregating data by sex require further research.
At the criterion validity and correlating with BBS, a significant correlation was found with BESTest and a fair one with Mini-BESTest, obtaining an inverse correlation in men, as in other research [5]. It could be asked if, in relation to the BESTest and BBS, this inverse correlation is only in any person with good balance or especially in men because, as some studies point out, they have better balance. Those results suggest that other measurement systems should be included to identify in a more accurate and sensitive way individuals with functional limitations beyond BBS [10,11]. Fear of falling measured by FES-I showed a strong association with BESTest, although this was not statistically significant. Regarding the Mini-BEST, in the total sample a negative correlation was observed with FES-I, so the risk of falling decreased when dynamic scores in balance increased, as expected [13,32]. BESTest correlated positively and strongly with Mini-BESTest, but this correlation was inverse. It could be that aspects that the Mini-BESTest fails to evaluate, that are included in the BESTest, make this assessment less accurate in people who are afraid of falling. This situation is particularly important in the group of women, since in men the correlation does not have a positive sign. In general, we have observed that the correlation between the scales used in this study is greater the worse the balance is. To detect which aspects of balance can be improved in the healthy ageing population, it would be the BESTest that detects more precisely these difficulties. This is especially true in the diagnosis and treatment of balance problems considering the differences observed between women and men. We want to highlight these results, which lead us to ask ourselves if better balance in independent elderly people who have no history of a recent fall is linked to gender. There is no consensus in the literature: some studies did not find them, while others observed indeed lower levels of postural stability and balance for men [29][30][31].
It would be also crucial to identify those factors that decrease or increase risk of falling and that contribute to maintaining functional independence.

Study limitations and directions for future research
The not-fully-random sampling that was adopted for practical reasons limits the strength with which these results can be used to infer properties of the general population. This study was designed to assess the psychometric properties of BESTest and Mini-BESTest in communitydwelling elderly people in Spanish. However, given the differences obtained in the correlations between BESTest and MiniBESTest and FES-I or BBS when disaggregating data by gender, it should be studied if, at similar ages, the same level of physical activity in that population implies a similar quality of balance. This would require a larger sample size than the one in the present study.
We strongly recommend cultural adaptation to other Spanish-speaking countries, especially in Central and South America.

Conclusions
Spanish versions of BESTest and Mini-BESTest are balance systems comprehensible for new raters. The BESTest and Mini-BESTest are valid and reliable tools to provide information on which particular balance systems were the underlying cause of balance impairments in community dwelling older adults. Our results suggest that elderly women showed a worse quality of balance and a greater perception of their risk of falling than men.