Absolute and relative reliability of acute effects of aerobic exercise on executive function in seniors
BMC Geriatrics volume 17, Article number: 247 (2017)
Aging is accompanied by a decline of executive function. Aerobic exercise training induces moderate improvements of cognitive domains (i.e., attention, processing, executive function, memory) in seniors. Most conclusive data are obtained from studies with dementia or cognitive impairment. Confident detection of exercise training effects requires adequate between-day reliability and low day-to-day variability obtained from acute studies, respectively. These absolute and relative reliability measures have not yet been examined for a single aerobic training session in seniors.
Twenty-two healthy and physically active seniors (age: 69 ± 3 y, BMI: 24.8 ± 2.2, VO2peak: 32 ± 6 mL/kg/bodyweight) were enrolled in this randomized controlled cross-over study. A repeated between-day comparison [i.e., day 1 (habituation) vs. day 2 & day 2 vs. day 3] of executive function testing (Eriksen-Flanker-Test, Stroop-Color-Test, Digit-Span, Five-Point-Test) before and after aerobic cycling exercise at 70% of the heart rate reserve [0.7 × (HRmax – HRrest)] was conducted. Reliability measures were calculated for pre, post and change scores.
Large between-day differences between day 1 and 2 were found for reaction times (Flanker and Stroop Color testing) and completed figures (Five-Point test) at pre and post testing (0.002 < p < 0.05, 0.16 < ηp² < 0.38). These differences notably declined when comparing day 2 and 3. Absolute between-day variability (CoV) dropped from 10 to 5% when comparing day 2 vs. day 3 instead of day 1 vs. day 2. Also ICC ranges increased from day 1 vs. day 2 (0.65 < ICC < 0.87) to day 2 vs. day 3 (0.40 < ICC < 0.93). Interestingly, reliability measures for pre-post change scores were low (0.02 < ICC < 0.71). These data did not improve when comparing day 2 with day 3. During inhibition tests, reaction times showed excellent reliability values compared to the poor to fair reliability of accuracy.
Notable habituation to the whole testing procedure should be considered as it increased the reliability of different executive function tests. Change scores of executive function after acute aerobic exercise cannot be detected reliably. Large intra- and inter-individual variability of responses to acute aerobic exercise in seniors can be presumed.
One-third of the global and nearly half of the Western population will be aged >60 years at the end of the twenty-first century. The process of biological aging, particularly in later adulthood, goes along with deteriorations of physical and cognitive function. Cognitive function is a robust predictor of mortality in older people at population level, and executive control seems to predict the functional status during daily life in seniors. Executive functions refer to a family of top-down cognitive processes underlying the organization and control of goal-directed behaviour. According to Miyake et al., inhibitory control (control of attention, behaviour and emotions to override a dominant or pre-potent response), working memory (storage, manipulation and retrieval of information) and cognitive flexibility (ability to flexibly shift between mental sets) are considered its core components.
Previous reviews led to the “executive function hypothesis”, which suggests that regular physical activity and exercise selectively elicit benefits in this cognitive domain [7, 8]. Although benefits of exercise targeting cardiovascular fitness were also reported for attention and long-term memory in older adults, some meta-analytical findings show no evidence for improvements of cognitive performance after a period of aerobic training. Nonetheless, improvements in cognitive function following chronic exercise are considered clinically relevant as most findings from observational studies suggest that regular exercise can delay the onset of future dementia, possibly due to a promotion of cognitive reserves protecting cognitive function in spite of disease or damage. Even low but regular levels of physical activity (PA) were found to be positively associated with cognitive function during aging (>100,000 participants across 20 nations).
In contrast to the heterogeneous findings obtained from longitudinal studies, acute bouts of aerobic exercise seem to transiently improve several dimensions of cognitive function, whereby immediate and delayed benefits were pronounced for executive function. A recent meta-analysis revealed that moderate aerobic exercise elicits particularly beneficial effects for inhibitory control, working memory and task-switching in preadolescent children and older adults compared to other age groups. These acute improvements of cognitive function were reported for time-dependent measures and appeared to be independent of the participant’s fitness level. Acute exercise-induced changes of cortical, vascular, hemodynamic and metabolic functions [15,16,17] have been discussed as underlying mechanisms for improved cognitive control. Although benefits elicited by a single exercise bout are considered to be transient, such cognitive improvements are still of high practical relevance. One major advantage of acute effects over chronic effects is that they can be elicited quickly, independent of the fitness level. Furthermore, older adults may use a single exercise session as a strategy to prepare for situations demanding high executive control. Whereas both acute and chronic facilitation of executive function by exercise gained notable attention, the variability and reliability of the tests employed for assessing cognitive domains in older adults as well as the reliability of acute effects of exercise have long been disregarded.
Thus, examining changes of executive function following acute exercise requires adequate detection of meaningful change following these interventions. Therefore, the identification of baseline and/or post-exercise variability of the respective cognitive testing parameters is needed. Otherwise, a reliable justification of detrimental or beneficial effects on cognitive performance is hindered. In this regard, the quantification of day-to-day reliability of a certain variable reflecting executive function needs to be discussed with respect to 1) chance, 2) system-immanent errors and 3) biological variability. A resulting fundamental question is “how reliable is a particular assessment tool and how precisely and reproducibly can acute exercise training effects on executive function be identified”. This is of particular importance as a certain amount of error is inherent when testing human beings and, thus, reliability can be considered as the amount of error or variability which is accepted for a particular test. Hence, it is important to know whether an acute change of performance (e.g., executive function) can be attributed to the intervention rather than to its day-to-day variation. Day-to-day variation can be due to the training setting, age group, occurring fatigue, activity level or disease status. In this regard, relative (e.g., intra-class correlation coefficients) and absolute variability estimates (e.g., standard error of measurement or coefficients of variation) need to be identified for baseline and post-exercise executive function in an acute setting of healthy seniors.
Against this background, the present study aimed at investigating day-to-day reliability of a variety of tests on executive function before and after acute aerobic exercise (i.e., Eriksen Flanker Test, Stroop Test, Digit Span, and Five Point Test) in a group of healthy seniors. Thereby, various absolute and relative reliability indices were collected within a three-day repeated measures design in an acute exercise setting. We aimed at disentangling whether acute changes of executive function vary in seniors. Besides acute effects of exercise on executive function, this information is needed to estimate the likelihood of detectable change in future training studies on exercise and executive function.
Participants and general design
Twenty-two healthy, physically and recreationally active seniors (Table 1) were recruited via a Senior Club (ProSenectute) and voluntarily participated in the present reliability study. Exclusion criteria were prior stroke, heart attack, heart failure and surgery, bypass, cardiac dysrhythmia, acute flu or cold, spinal, joint and head pain, diabetes mellitus, untreated hypertension (>160/>100), acute and chronic inflammatory conditions, severe arthrosis, recurrent vertigo, knee or hip endoprosthesis, and trauma within the last 6 months. None of the included participants reported any of those conditions. We conducted a repeated between-day comparison (i.e., day 1 vs. day 2 and day 2 vs. day 3) within weekly intervals (Fig. 1). None of the included participants reported any cardiac, pulmonary or neurological condition, elevated blood pressure or medication intake based on the physical activity readiness questionnaire (PAR-Q). Seniors with at least mild cognitive impairment (MCI) based on the Mini-mental state exam (scoring between 20 and 25) and the clock drawing test were excluded. None of the recruited seniors needed to be excluded due to MCI. The sample size of 20 subjects was based on an at least moderate correlation between pre and post testing during executive function testing and delivers a strong power (1 − beta error) of 90% at a strict alpha level of 0.01. We recruited more participants to allow for expected dropouts, which did not occur.
All seniors were requested to refrain from moderate to severe exercise within the last 24 h prior to spiroergometric exercise testing or acute aerobic exercise training. Caffeine intake was not allowed 5 h prior to exercise testing or training. No caffeine withdrawal symptoms were observed. Between-day variability of cognitive function before and after acute aerobic exercise was assessed on three days in weekly intervals (Fig. 1). The study was approved by the local ethical committee of the University of Basel (11/23/2015–254) and meets the criteria of the declaration of Helsinki. After receiving all relevant study information, the participants signed an informed consent to the study including a permission to publish the data. The Freiburg physical activity questionnaire was used to assess baseline physical activity in h/week (Frey et al. 1995). The total amount of weekly physical activity includes baseline physical activity (e.g., daily walked or biked distance, stair climbing), leisure time activity (e.g., hiking, dancing, bowling) and sportive activity (disciplines). The summarized weekly hours were used to describe activity of both groups. Participants’ body fat and weight were assessed using the InBody 170 (JP Global Markets GmbH, Germany). A measuring pole was used to measure body height. These are common, valid and good to excellently reliable tools for anthropometric assessment (0.8 < r < 0.95).
Aerobic cycling exercise and exercise intensity determination
Based on health-related exercise recommendations of the American College of Sports Medicine, 30 min of aerobic cycling exercise at 70% of the heart rate reserve (HRR) using the “Karvonen-formula” (0.7 × (HRmax – HRrest)) was applied in-between cognitive testing [26, 27]. In order to calculate HRR, maximal heart rates (HRmax) were obtained from maximal spiroergometric ramp-like exercise testing on a treadmill. Briefly, the protocol started at a velocity of 6 km/h and an inclination of 0.1% for a time period of 1 min. Intensity was increased by 1 km/h every minute until volitional exhaustion was reached. Maximal exertion was considered verified if at least three of the following five criteria were reached: 1) age-predicted maximal heart rate, 2) breathing frequency (>35/min), 3) capillary lactate concentration (>8 mmol/L), 4) ventilatory equivalent for oxygen uptake (>35) and 5) respiratory exchange ratio (RER >1.1). Ventilatory parameters were collected using the Cortex Metalyzer® 3B metabolic test system (Cortex Biophysik GmbH, Leipzig, Germany). VO2peak and HRmax were derived from the final 30 s before exercise cessation.
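As an illustration of the intensity prescription, the following minimal Python sketch computes the heart rate reserve and the corresponding target heart rate. It assumes the commonly used form of the Karvonen formula, in which resting heart rate is added back to the chosen fraction of the reserve; the text above only states the fraction of the reserve itself. The function names and the example heart rates are hypothetical and not taken from the study.

```python
def heart_rate_reserve(hr_max: float, hr_rest: float) -> float:
    """Heart rate reserve (HRR) in beats per minute: HRmax - HRrest."""
    if hr_max <= hr_rest:
        raise ValueError("hr_max must be greater than hr_rest")
    return hr_max - hr_rest


def karvonen_target_hr(hr_max: float, hr_rest: float, intensity: float = 0.7) -> float:
    """Target heart rate = HRrest + intensity * (HRmax - HRrest).

    Assumption: the resting heart rate is added back to the fraction of
    the reserve, as in the commonly cited Karvonen formula.
    """
    if not 0.0 < intensity <= 1.0:
        raise ValueError("intensity must be in (0, 1]")
    return hr_rest + intensity * heart_rate_reserve(hr_max, hr_rest)


# Hypothetical participant: HRmax 160 bpm, HRrest 60 bpm
reserve = heart_rate_reserve(160, 60)       # 100 bpm reserve
target = karvonen_target_hr(160, 60, 0.7)   # 60 + 0.7 * 100 bpm
print(f"HRR = {reserve:.0f} bpm, target HR = {target:.0f} bpm")
```

For this hypothetical participant, 70% of the reserve corresponds to 70 bpm above rest.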
Different aspects of executive function were assessed using the Five-Point-Test  and computer-based modified versions of the Eriksen-Flanker task (Eriksen & Eriksen, 1974), Stroop Color-Word  as well as Digit-Span (forward & backward) in a counterbalanced order. All computer-based tests were administered by the same rater with Presentation 18.0 (NeuroBehavioral Systems, USA). No breaks between testing were allowed. Cognitive assessments were performed in a separate room with one participant at a time. Prior to testing, instructions were provided verbally in a standardized manner. Afterwards, instructions were also presented on the screen to make sure the participants understood the task. Following the instruction, noise was kept to a minimum. Environmental temperature was held constant at 21 °C during cognitive testing.
The modified Flanker task was used to assess the inhibitory component of executive control . During the task, participants are required to respond to a centrally presented target stimulus (vertical visual angle: 8.5°) by pressing a button corresponding to its direction. In congruent trials, the target stimulus was surrounded by six arrows facing the same direction, whereas in incongruent trials the centrally presented target stimulus was facing in the opposite direction of the flanking arrows. Participants completed one practice block with 10 trials and one test block with 100 trials. In each trial, arrows were presented focally for 200 ms after a fixation period of 1000 ms. The response window was set to 1000 ms and participants received feedback on their response. Congruent and incongruent trials were presented in random order and with equal probability. Reaction time and accuracy for congruent and incongruent trials were calculated to assess speed processing and interference control.
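The outcome calculation described above can be sketched as follows. This is a minimal Python example under stated assumptions: the trial tuple layout, the function name `summarize_trials` and the example data are hypothetical, and averaging reaction time over correct responses only is an assumption, since the text does not specify how error trials were handled.

```python
from statistics import mean


def summarize_trials(trials):
    """Summarize accuracy and mean reaction time per trial type.

    trials: iterable of (condition, correct, rt_ms) tuples, e.g.
            ("congruent", True, 512.0).
    Returns {condition: {"accuracy": float, "mean_rt_ms": float}}.
    """
    by_condition = {}
    for condition, correct, rt_ms in trials:
        by_condition.setdefault(condition, []).append((correct, rt_ms))

    summary = {}
    for condition, rows in by_condition.items():
        # Assumption: reaction time is averaged over correct responses only.
        correct_rts = [rt for ok, rt in rows if ok]
        summary[condition] = {
            "accuracy": len(correct_rts) / len(rows),
            "mean_rt_ms": mean(correct_rts) if correct_rts else float("nan"),
        }
    return summary


# Hypothetical trials, not study data
demo = [
    ("congruent", True, 500.0),
    ("congruent", True, 520.0),
    ("incongruent", True, 600.0),
    ("incongruent", False, 450.0),
]
print(summarize_trials(demo))
```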
Stroop color test (SCT)
The Stroop Color-Word is a standard test to assess inhibitory control [32, 33]. The stimulus used in this task is a color name presented in the centre of the screen (vertical visual angle: 8.5°). It is either printed in ink matching the color name (compatible trials) or in a different color of ink (incompatible trials). Participants are instructed to press a button corresponding to the ink in the color block or the name of the color in the word block. The colors/words chosen for this task were “rot” (red), “grün” (green), “gelb” (yellow) and “blau” (blue). Participants completed a practice block with 12 trials as well as one color and one word block with 96 trials each. The order of test blocks was counter-balanced across participants and both types of trials (compatible, incompatible) appeared with equal probability. Each trial started with a 500 ms fixation period, followed by the presentation of a stimulus over 200 ms. Responses were collected within a 2500 ms window and participants received feedback on their accuracy. As dependent measures of information processing and interference control reaction time and accuracy were calculated for compatible and incompatible trials, respectively.
Digit span forward and backward (DSF/DSB)
Digit Span Forward and backward is used to assess working memory and updating of working memory, respectively . In this task, participants were required to repeat a sequence of digits (1–9 presented with equal probability) on a computer keyboard in the same (forward) and in reversed order (backward). Digits did not occur in regular ascending or descending sequences with equal consecutive step sizes. In all trials, each digit was presented for 500 ms with an inter-stimulus interval of 500 ms. Although participants were instructed to provide a timely response, the time window was not limited. The span length was increased by one digit every two trials (starting from 3 in digit span forward and 2 in backward), until the limit of two successive errors was reached. Measures obtained from the task were lengths of the longest span answered correctly forward and backward as well as the number of cumulative errors.
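The scoring rule described above can be illustrated with a short Python sketch. It assumes that any two consecutive incorrect trials end the test, regardless of whether they share the same span length; the text does not settle this point. The function name and the example response sequence are hypothetical.

```python
def score_digit_span(responses, start_length):
    """Score one Digit Span run.

    responses: list of booleans in presentation order (True = correct).
    start_length: 3 for forward, 2 for backward, as described in the text.
    The span length grows by one digit after every two trials.
    Returns (longest_correct_span, cumulative_errors).
    """
    longest = 0
    errors = 0
    successive_errors = 0
    for trial_index, correct in enumerate(responses):
        span_length = start_length + trial_index // 2
        if correct:
            longest = max(longest, span_length)
            successive_errors = 0
        else:
            errors += 1
            successive_errors += 1
            # Assumption: two errors in a row terminate the test.
            if successive_errors == 2:
                break
    return longest, errors


# Hypothetical forward run: span lengths 3, 3, 4, 4, 5, 5, 6, 6
run = [True, True, True, False, True, True, False, False]
print(score_digit_span(run, start_length=3))
```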
The Five-Point test is used for the assessment of figural fluency functions, which relate to the set-shifting component of executive control . For this task, participants received a sheet of paper, on which 40 five-dot matrices were printed. Each matrix was identical and consisted of a fixed pattern of five symmetrically arranged dots. Participants were required to draw as many designs as possible in 2 min by connecting the dots with at least one straight line. After the investigator demonstrated two possible designs, participants were asked to perform the task. The Five-Point test was scored by counting the total number of unique designs and repetitions of designs (perseverative errors).
All outcome measures were checked for normal distribution (Kolmogorov Smirnov test) and variance homogeneity (Levene test). Data are given as means with standard deviations (SD) and 90% confidence intervals (90% CI), respectively.
Repeated measures analyses of variance were applied separately for each outcome measure between the two subsequent trials at the beginning and at the end of the testing procedure. An α-level of 0.05 was accepted as the threshold for statistical significance. Effect sizes for variance analyses were given as partial eta squared (ηp²) with values ≥0.01, ≥0.06 and ≥0.14 indicating small, moderate, or large effects, respectively. Intraclass correlation coefficients (ICC) as a measure of relative reliability were computed according to the formula ICC = 1 − (SEM²/SD²), with SD serving as the between-subject standard deviation.
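For orientation, partial eta squared can be recovered from a reported F statistic and its degrees of freedom, and then classified with the thresholds given above. The following Python sketch shows this under stated assumptions: the function names, the "negligible" label for values below 0.01 and the example F value are hypothetical and do not come from the study.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error),
    which follows from eta_p^2 = SS_effect / (SS_effect + SS_error).
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be >= 0 and degrees of freedom must be > 0")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def classify_effect(eta_p2: float) -> str:
    """Label an effect using the 0.01 / 0.06 / 0.14 thresholds from the text."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "moderate"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"  # label below 0.01 is an assumption of this sketch


# Hypothetical result: F(1, 21) = 5.25 for a two-day comparison of 22 participants
eta = partial_eta_squared(5.25, 1, 21)
print(f"partial eta squared = {eta:.2f} ({classify_effect(eta)})")
```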
We calculated the standard error of measurement (SEM, computed as the SD of the between-day differences divided by the square root of 2) as well as the log-transformed coefficient of variation (CoV) together with 90% confidence limits as measures of absolute reliability [18, 19]. Reliability data were analysed in Microsoft Excel® using a published spreadsheet by Hopkins (Hopkins 2007).
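The reliability indices named above can be sketched in a few lines of Python. This is not a reimplementation of the spreadsheet used in the study. Two details are assumptions of this sketch: the between-subject SD is pooled across the two days, and the CoV is obtained by computing the typical error on natural-log-transformed values and back-transforming it to a percentage. The function names and example reaction times are hypothetical.

```python
import math
from statistics import stdev, variance


def typical_error(day_a, day_b):
    """SEM (typical error): SD of the paired differences divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(day_a, day_b)]
    return stdev(diffs) / math.sqrt(2)


def icc_from_sem(day_a, day_b):
    """ICC = 1 - SEM^2 / SD^2, following the formula given in the text."""
    sem = typical_error(day_a, day_b)
    # Assumption: between-subject variance is pooled over both days.
    pooled_variance = (variance(day_a) + variance(day_b)) / 2
    return 1 - sem ** 2 / pooled_variance


def cov_percent(day_a, day_b):
    """Coefficient of variation (%) from log-transformed data.

    Assumption: typical error of ln-transformed values, back-transformed
    as 100 * (exp(error) - 1).
    """
    log_a = [math.log(x) for x in day_a]
    log_b = [math.log(x) for x in day_b]
    return 100 * (math.exp(typical_error(log_a, log_b)) - 1)


# Hypothetical reaction times (ms) for six participants on two days
day2 = [520, 480, 610, 555, 500, 590]
day3 = [510, 495, 600, 540, 515, 575]
print(f"SEM = {typical_error(day2, day3):.1f} ms")
print(f"ICC = {icc_from_sem(day2, day3):.2f}")
print(f"CoV = {cov_percent(day2, day3):.1f} %")
```

Because the ICC divides the error variance by the between-subject variance, a homogeneous sample lowers the ICC even when the SEM stays the same, which is one reason to report both kinds of index.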
Day-to-day variability between the 1st and 2nd day at pre and post testing
At pre testing, meaningful differences were observed between the 1st and 2nd day of testing for the Eriksen-Flanker test (compatible: p = 0.05, ηp² = 0.12) and the Five-Point test (correctly completed figures: p = 0.05, ηp² = 0.20). The Stroop Color-Word test revealed relevant differences between the 1st and 2nd day for compatible and incompatible reaction time at both pre- and post-testing (pre: 0.002 < p < 0.04, 0.16 < ηp² < 0.38) (Table 2, columns 4 and 5). Coefficients of variation below 10% were found for reaction times and accuracy on the Eriksen-Flanker test (0.5% < CoV < 5.5%) and the Stroop-Color test (0.8% < CoV < 8.7%) (Table 2, column 9). Digit-span testing at pre and post revealed very large CoV ranging between 15 and 43%, particularly for cumulative errors both backward and forward. ICC values were mainly good to excellent for the Eriksen-Flanker test (0.66 < ICC < 0.84), the Five-Point test (0.71 < ICC < 0.75) and the Stroop Color-Word test (0.65 < ICC < 0.87, except for accuracy, compatible pre: ICC = 0.28 and incompatible post: ICC = 0.25).
Day-to-day variability between the 2nd and 3rd day at pre and post testing
Notable differences between the 2nd and 3rd day were found only for the Eriksen-Flanker test (pre, reaction time, incongruent: p = 0.02, ηp² = 0.19), the Five-Point test (correctly completed figures, post: p = 0.03, ηp² = 0.20) and the Stroop Color-Word test (pre, correct response, compatible: p = 0.03, ηp² = 0.22). Coefficients of variation (CoV) for reaction times and completed figures were mainly around 5% for the Eriksen-Flanker test, the Stroop Color-Word test and the Five-Point test (Table 2, column 9). Digit-span testing at pre and post revealed very large CoVs ranging between 11 and 39%. ICC values were mainly fair to excellent for all tests (0.40 < ICC < 0.93), except for accuracy on the Eriksen-Flanker test (−0.47 < ICC < 0.05) and completed stages or errors on the Five-Point test (0.05 < ICC < 0.40).
Day-to-day variability for the change score between day 1 and 2 and day 2 and 3
Change scores between pre and post testing for day 1 vs. day 2 as well as day 2 vs. day 3 showed insufficient relative and absolute reliability values (Table 3), ranging between 0.35 < ICC < 0.67 for the Eriksen-Flanker test and −0.16 < ICC < 0.44 for the Stroop-Color test. These values did not improve when comparing day 2 vs. day 3. They were even lower for the Digit-Span and Five-Point test. Absolute reliability measures also showed large typical errors for the change scores during all testing conditions (Table 3).
The present study assessed absolute (e.g., CoV) and relative (e.g., ICC) between-day variability of executive function (i.e., Eriksen-Flanker-test, Stroop-Color-test, Digit-Span and Five-Point-test) before and after a single bout of moderately intense aerobic cycling exercise (30 min at 70% of the heart rate reserve). The study was applied to healthy and active seniors using a three-day (habituation day, first day, second day) repeated measures design.
First, we found notable between-day habituation (from 1st to 2nd day) mainly for some time-dependent measures obtained from the Eriksen-Flanker and Stroop-Color tests as well as for completed figures in the Five-Point test at both pre and post testing. This is not surprising from a general viewpoint of between-day learning. However, our testing and training sessions were separated by 7 days. Thus, habituation effects should also be accounted for when between-trial breaks last up to 7 days. As these habituation effects became smaller from day 2 to day 3, relative between-day reliability (i.e., ICC) of time-dependent measures of inhibitory control (assessed with the Eriksen-Flanker test and Stroop Color-Word test) and of the number of completed figures in the Five-Point test further increased from acceptable to excellent. For the Stroop Color-Word and Flanker task, habituation effects have been shown previously using a short or longer retest interval. The present results therefore indicate that habituation to the testing procedures should be considered in older adults when studies aim to examine changes of inhibitory control over time. As the reliability of time-dependent measures of inhibitory control was excellent after a practice day at pre and post level, the temporal stability of these outcomes suggests that this subcomponent of executive functioning reflects stable individual differences. Despite comparatively high accuracy (>90%) on the Flanker and Stroop Color-Word tests, the percentage of correct responses showed remarkably lower absolute and relative reliability values than time-dependent measures in both testing scenarios. This could be due to the fact that older adults achieved very high accuracy rates, so that a ceiling effect in performance resulted in less discriminative power and variance between participants.
Furthermore, a similar accuracy rate on trials assessing information processing and trials assessing inhibitory control indicates that the number of correct responses does not discriminate well between different cognitive functions and should not be used as the main outcome for the Flanker and Stroop Color-Word test. This is considered to be true if the high accuracy rates are not solely due to participants completing the tasks with a prevention focus. This strategic inclination promotes an increase of correct responses, whereas a promotion focus reduces reaction time. However, all participants were instructed to perform the task as quickly and accurately as possible, so that the instructor did not systematically change the strategic inclination of the participants and a simple trade-off between reaction time and accuracy seems less likely.
Concerning working memory, the Digit-Span testing revealed poor to fair indices for both relative (ICC) and absolute (CoV) reliability, with CoV around 20% for completed stages and 40% for cumulative errors. These findings hold true for pre and post exercise testing values. For the Digit Span backward, Waters and Caplan have also reported a test-retest reliability that is lower than desirable in older adults. However, habituation to cognitive testing increased the relative reliability of the number of completed stages. Consequently, this is the only measure of the Digit Span showing a temporal stability that justifies its use in the detection of acute or chronic effects of exercise on different aspects of executive function. However, a lower reliability of working memory measures compared to outcomes obtained from the inhibition tasks might not be test-specific. The number of trials in the Digit Span was much lower than in the Stroop Color-Word and Flanker task. It is very likely that a higher number of trials would have decreased the variability between test and retest. Therefore, increasing the number of runs on the backward and forward version of the Digit Span in addition to a habituation to the testing procedures might be most promising for reducing test-retest variability.
In contrast to other studies assessing the reliability of different executive function tasks, test-retest variability was measured before and after a moderate aerobic exercise session. The present study also calculated the change scores of executive function between pre and post testing. Interestingly, test-retest reliability was very similar between pre and post exercise values after habituation to the testing procedures. Thus, acute aerobic exercise does not increase the variability of executive functioning between days. Additionally, high test-retest reliability of post exercise values suggests that either the effects or the lack of effects of exercise on different aspects of executive function can adequately be reproduced. In this respect, particularly time-dependent measures of the Flanker task and the Stroop Color-Word test, completed figures in the Five-Point test and completed stages in the Digit Span backward can be used to detect acute effects of exercise on inhibitory control, task-switching and working memory, respectively.
Although widely and solely reported, these “relative reliability” data need to be handled with caution. Intra-class correlation coefficients are highly sensitive to inter-individual variability (heterogeneity) and their magnitude can be difficult to interpret. Absolute reliability data enable comparison with other tests. Performance tests in athletes mostly require CoV levels below 5%, and recreational settings mostly require CoV values around or below the 10% level. Higher CoV values increasingly impede a reliable detection of “real” change due to the respective intervention. However, heterogeneous populations (e.g., seniors with mental decline, chronic disease) and settings (e.g., lab, home-based) might entail meaningful baseline and exercise-induced variability of cognitive outcomes. Thus, meaningful interventional change on the individual level could be “masked” by variability of the measuring and biological “system”, respectively. Variability values within the 10% level were found for time-dependent variables in our group of healthy seniors. Only a minority of testing instruments (e.g., Digit-Span testing) revealed unacceptable absolute variability in seniors. Low CoV values for speed and accuracy are needed to increase the likelihood of detecting true intervention-related changes rather than chance variation. From a scientific point of view, acute intervention studies commonly evaluate mean changes from pre- to post-testing on a group or population level with a notable and inherent amount of noise. As a consequence, reported reliability data should be used to estimate the required sample size to detect meaningful intervention effects.
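One way such reliability data might feed into study planning is sketched below in Python. The sketch is not part of the study's analysis. It assumes a simple paired design, a normal approximation, and that the SD of individual differences equals √2 × SEM; the function names and the example values (SEM of 20 ms, target change of 15 ms) are hypothetical.

```python
import math
from statistics import NormalDist


def smallest_detectable_change(sem: float, confidence: float = 0.95) -> float:
    """Change an individual must exceed to be outside measurement noise.

    Computed as z * sqrt(2) * SEM for the chosen two-sided confidence level.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(2) * sem


def paired_sample_size(sem: float, delta: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n for detecting a mean change `delta` in a paired design.

    Assumptions: normal approximation and SD of differences = sqrt(2) * SEM.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    sd_diff = math.sqrt(2) * sem
    return math.ceil(((z_alpha + z_beta) * sd_diff / delta) ** 2)


# Hypothetical values: SEM of 20 ms, target mean change of 15 ms
print(f"smallest detectable change: {smallest_detectable_change(20):.1f} ms")
print(f"participants needed: {paired_sample_size(20, 15)}")
```

Larger typical errors inflate both quantities, which illustrates why measures with a high CoV are a poor basis for detecting intervention effects.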
The present study comprises some limitations that need to be addressed. First, we included active and healthy seniors only. It might be reasonable that participants with MCI show larger absolute variability with lower values for correct responses (ceiling effects) compared to their healthy counterparts. However, seniors per se provide large inter-individual differences in cognitive functions due to different morphological and functional aspects of brain aging (e.g., localization of lesions). Thus, our results cannot be transferred to older and frail seniors with mild to severe cognitive impairment. Moreover, day-to-day variability in post-exercise assessments might have been influenced by the participants’ dynamic capacity for adjusting cognitive processing to external demands (i.e., cognitive reserve). Second, test-retest reliability was assessed for the acute effects of exercise on executive function, so that it remains unclear whether similar ICCs can be expected in a longitudinal design. However, the reproducibility of post-exercise effects indicates that studies investigating chronic effects should control for any exercise bouts performed prior to the assessment of executive function. Third, test-retest reliability was assessed before and after a single aerobic exercise session. Consequently, the present findings do not permit any conclusions about the durability of the effects elicited by acute exercise. A review of the current literature suggests that acute benefits on executive function are maintained for at least 20 to 60 min after exercise cessation .
Mainly time-dependent variables of executive function (e.g., reaction time of the Eriksen-Flanker and Stroop-Color test, correctly completed figures of the Five-Point test) showed notable differences at baseline and after moderate aerobic exercise between day 1 and 2. This difference decreased when comparing day 2 and 3. Thus, a notable habituation effect to the whole experimental set-up can be assumed and should be considered in future acute exercise studies on executive function. Moreover, acute effects of moderate exercise intensity should not be overrated as the change scores are poorly reliable. Absolute (CoV dropped from around 10% to 5% for reaction time) and relative reliability indices (ICC values from poor/fair to good/excellent) also improved when comparing between-day reliability for day 2 vs. 3 with day 1 vs. 2. Correct responses and cumulative errors as accuracy indicators showed high percentage values indicating a ceiling effect in this population. However, reliability of accuracy turned out to be poor. Thus, highly accurate responses with little variation before and after exercise can nevertheless result in poor reliability outcomes. Overall, Digit-span testing revealed absolute variability between 20 and 40%. This testing instrument might impede sufficient detection of exercise-induced effects on executive function. Future research on baseline variability and exercise-induced effects on reliability, including different types of exercise (e.g., strength, endurance, balance) in frailer and diseased seniors, is needed in order to elucidate the specific effects of exercise on executive function in the elderly population.
CoV: Coefficient of Variation
HRmax: Maximal Heart Rate
HRR: Heart Rate Reserve
HRrest: Resting Heart Rate
ICC: Intraclass Correlation Coefficient
MCI: Mild Cognitive Impairment
mmol/L: Millimole per Liter
r: Pearson Correlation Coefficient
RER: Respiratory Exchange Ratio
SEM: Standard Error of Measurement
VO2peak: Peak Oxygen Uptake
We would like to cordially thank all participants for their enthusiastic participation and appreciate their compliance.
No funding for the study was received.
Availability of data and materials
Raw data are stored in anonymized form according to the ethical guidelines of good scientific and clinical practice.
Ethics approval and consent to participate
The study was approved by the local ethics committee of the University of Basel (Ethikkommission Nordwestschweiz: http://eknz.ch; Approval Number: 11/23/2015–254) and meets the criteria of the Declaration of Helsinki. After receiving all relevant information, the participants gave written informed consent to take part in the study, including permission to publish the data. No competing interests need to be declared.
Consent for publication
Consent for publication was given by all participants and co-authors prior to submission.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.