The trail making test as a screening instrument for driving performance in older drivers; a translational research

Background: In many countries, primary care physicians determine whether or not older drivers are fit to drive. Little, however, is known about the effects of cognitive decline on driving performance or about the means to detect it. This study explores to what extent the trail making test (TMT) can give clinicians indications of their older patients' on-road driving performance in the context of cognitive decline.
Methods: This translational study was nested within a cohort study and an exploratory psychophysics study. The target population consisted of older drivers without major cognitive or physical disorders. We therefore recruited and tested 404 home-dwelling drivers, aged 70 years or more and holding valid driver's licenses, who volunteered to participate in a driving refresher course. Forty-five drivers also agreed to undergo further testing at our lab. On-road driving performance was evaluated by driving instructors during a validated 45-minute open-road circuit. Drivers were classified as excellent, good, moderate, or poor according to their score on a standardized evaluation of on-road driving performance.
Results: The area under the receiver operating characteristic (ROC) curve for detecting poorly performing drivers was 0.668 (CI95% 0.558 to 0.778) for the TMT-A and 0.662 (CI95% 0.542 to 0.783) for the TMT-B. TMT performance was related to contrast sensitivity, motion direction, orientation discrimination, working memory, verbal fluency, and literacy. Older patients with a TMT-A ≥ 54 seconds or a TMT-B ≥ 150 seconds had a threefold (CI95% 1.3 to 7.0) increased risk of performing poorly during the on-road evaluation. The TMT had a sensitivity of 63.6%, a specificity of 64.9%, a positive predictive value of 9.5%, and a negative predictive value of 96.9%.
Conclusion: In screening settings, the TMT would lead clinicians to needlessly consider driving cessation in nine out of ten drivers who screen positive. Given the important negative impact this could have on older drivers, this study confirms that the TMT is not specific enough for clinicians to justify driving cessation without complementary investigations of driving behavior.
Electronic supplementary material: The online version of this article (doi:10.1186/1471-2318-14-123) contains supplementary material, which is available to authorized users.


Background
The trail making test (TMT) is a neuropsychological paper-and-pencil test that was initially developed by the US Army during the Second World War to evaluate overall performance in new recruits [1]. During the late 1940s and early 1950s, two of its creators, Armitage [2] and Reitan [3], transposed its application to the assessment of brain injury in patients following stroke. Its ability to assess fitness to drive was first tested in 1992 for patients with closed brain injury [4] and for older drivers the following year [5]. Since then, studies have shown the TMT to be one of the best-performing paper-and-pencil neuropsychological tests for predicting driving difficulties [6,7,8].
As with most neuropsychological tests, the association between TMT performance and on-road evaluations is only weak [7,9]. For example, a recent study showed that the TMT has limited ability to correctly identify patients deemed unfit to drive [10]. In addition, studies have so far failed to define appropriate TMT-B cut-off values for detecting unfitness to drive [11]. These issues are crucial for the many guidelines [12,13,14], including those of the American Medical Association and the Canadian Medical Association, that recommend the TMT to assess fitness to drive. The TMT is nevertheless now used by primary care physicians who, in many countries, have assumed responsibility for detecting unfit older drivers, with some relative success [15]. However, current guidelines and cut-off points for the TMT could lead many primary care practitioners and geriatricians to wrongly consider enforcing driving cessation when assessing fitness to drive. Given the negative consequences for home-dwelling older patients, for whom losing their driver's license often entails important changes with negative consequences for their health [15,16], this debate needs to be addressed more specifically. This study investigated to what extent primary care physicians and geriatricians can transpose TMT screening results to their patients' hypothetical performance in an on-road evaluation.

Objectives
Our primary objective was to study the strength of the association between the TMT and on-road performance and to provide clinicians with predictive values of driving performance when screening older people in an apparently healthy cognitive state. Our secondary objectives were to provide TMT normative data for healthy older drivers, to verify whether level of education is an appropriate indicator of the literacy required to perform the TMT-B, and to break the TMT-B down into psychophysical components known to change with aging and cognitive decline.

Design
This translational research was nested in two separate studies. The first was a cohort study exploring cognitive decline, driving performance, and driving cessation. The second was an exploratory psychophysics study investigating the links between cognitive decline, metabolism, and genetic factors.

Settings
Our aim was to study a representative sample of older drivers independently of their health status. We therefore chose to test fitness to drive and on-road driving performance of older participants in a driving refresher course provided by the Swiss Automobile Club.

Participants
In collaboration with the State Driver and Vehicle Licensing Agency and the Swiss Automobile Club, we wrote to all drivers who had reached their 70th year and were residents of eastern Lausanne (fall 2011), northern Vaud and Valais (spring 2012), western Lausanne (fall 2012), and Vevey, Montreux, Aigle, and Entremont (spring 2013), inviting them to participate in a refresher course on driving competencies. During the refresher course, all participants were offered the opportunity to participate in this study. During the spring 2013 session, older drivers were also invited to join the second part of this study, which investigated the psychophysical components of the TMT. To be included, participants had to hold a valid Swiss driver's license, be aged 70 years or over, and not be institutionalized.

TMT
The first part of the TMT (part A) measures the time participants need to connect 25 numbered circles in ascending order. In the second part (B), 13 numbers and 12 letters have to be connected alternately in their numerical and alphabetical order. Participants were notified of errors immediately and required to correct them without assistance while the clock kept running.
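The connection order described above can be written out explicitly. The following Python sketch builds the expected target sequences for parts A and B and locates the first incorrect connection in a recorded sequence. The function names are hypothetical and illustrative only; they are not part of any published TMT scoring software.

```python
import string
from typing import Optional


def tmt_a_sequence() -> list[str]:
    """Part A: the 25 numbered circles connected in ascending order."""
    return [str(i) for i in range(1, 26)]


def tmt_b_sequence() -> list[str]:
    """Part B: 13 numbers and 12 letters alternating, 1-A-2-B-...-12-L-13."""
    sequence = []
    for i in range(1, 14):
        sequence.append(str(i))
        if i <= 12:  # letters A..L follow numbers 1..12; number 13 ends the trail
            sequence.append(string.ascii_uppercase[i - 1])
    return sequence


def first_error(expected: list[str], produced: list[str]) -> Optional[int]:
    """Index of the first connection that deviates from the expected order.

    Returns None if every connection made so far matches the expected order.
    """
    for index, (want, got) in enumerate(zip(expected, produced)):
        if want != got:
            return index
    return None


if __name__ == "__main__":
    part_b = tmt_b_sequence()
    print(" ".join(part_b))
    print(first_error(part_b, ["1", "A", "2", "C"]))
```

Because errors are corrected with the clock running, an error in this test format shows up as added time rather than as a separate penalty, which is why the sketch only locates the deviation.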

Medical status and driving history
Older drivers were invited to volunteer for a two-hour interview to collect information on their driving history and medical status. Visual acuity, visual field, contrast sensitivity, medication, functional mobility using the Timed Up-and-Go test (TUG) [17], the MoCA [18], average weekly distance driven, and history of accidents were among the data collected and used for this study.

Defining the healthy population for normative values
The healthy population was defined as drivers with normal vision (decimal visual acuity ≥ 0.6, binocular visual field ≥ 140°), normal cognitive function on the Montreal Cognitive Assessment (MoCA ≥ 26), normal functional mobility (TUG < 13.5 sec), and no known risk of sudden blackout (history of sudden blackout, epilepsy, arrhythmia, uncontrolled diabetes, or sleep apnea), and who were not taking class III medication, either regularly or occasionally [19].
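The inclusion rule for the normative population is a conjunction of thresholds. The Python sketch below encodes the thresholds stated above; the record type and its field names are hypothetical, and the two boolean fields assume that the blackout-risk history and class III medication use have already been determined clinically.

```python
from dataclasses import dataclass


@dataclass
class DriverAssessment:
    # Hypothetical field names; only the thresholds come from the text.
    decimal_acuity: float        # visual acuity in decimal notation
    binocular_field_deg: float   # binocular visual field in degrees
    moca: int                    # Montreal Cognitive Assessment score
    tug_seconds: float           # Timed Up-and-Go duration in seconds
    blackout_risk: bool          # sudden blackout, epilepsy, arrhythmia, uncontrolled diabetes, or sleep apnea
    class_iii_medication: bool   # regular or occasional class III medication


def is_healthy(d: DriverAssessment) -> bool:
    """True only if the driver meets every criterion of the healthy population."""
    return (
        d.decimal_acuity >= 0.6
        and d.binocular_field_deg >= 140
        and d.moca >= 26
        and d.tug_seconds < 13.5   # strict inequality, as in the definition
        and not d.blackout_risk
        and not d.class_iii_medication
    )


if __name__ == "__main__":
    print(is_healthy(DriverAssessment(0.8, 150, 27, 10.2, False, False)))
    print(is_healthy(DriverAssessment(0.8, 150, 25, 10.2, False, False)))
```

Note that the boundary cases differ: a MoCA of exactly 26 qualifies, whereas a TUG of exactly 13.5 seconds does not.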

On-road driving evaluation
Routes were standardized for participants from the same region. They were difficult enough for lapses to occur and long enough (≈45 minutes) to assess the effects of sustained attention. Routes included urban and rural sections, secondary and principal roads and highways, simple and complex intersections, roundabouts (circular intersections with changing on-road priorities), traffic signals, and complex lane selections. The Swiss National Council for Road Security validated the routes. Twelve driving instructors participated in the study; they were either self-employed or employed by the Swiss Automobile Club, and all were certified by the Swiss National Council for Road Security with a specific diploma for older-driver instruction. Driving instructors were blinded to the results of the psycho-medical evaluation and reported their "gestalt" evaluation of driving performance as "good" or "sufficient" for each of the following criteria: respecting road regulations, handling the vehicle, speed adaptation, correct position on the road, comfort, behavior toward other road users, observation, and anticipation. Driving competencies were summarized as excellent (no lapse), good (lapses reported for one or two items), moderate (lapses reported for three to five items), or poor (lapses reported for six to eight items). Principal component analysis and Rasch analysis confirmed the unidimensionality of this scoring method (eigenvalue = 5.1) and its good fit to an overall trait (R1c = 12.2, d.f. = 14, p = 0.565).
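The summary score maps the number of criteria with a reported lapse onto four classes. The following Python sketch illustrates that mapping. It assumes, as an interpretation not stated explicitly in the text, that a criterion rated "sufficient" rather than "good" counts as a lapse; the function names are hypothetical.

```python
CRITERIA = (
    "respecting road regulations",
    "handling vehicle",
    "speed adaptation",
    "correct position on the road",
    "comfort",
    "behavior toward other road users",
    "observation",
    "anticipation",
)


def count_lapses(ratings: dict[str, str]) -> int:
    # Assumption: a criterion rated "sufficient" (not "good") is a lapse.
    return sum(1 for criterion in CRITERIA if ratings[criterion] == "sufficient")


def classify_driving(lapse_items: int) -> str:
    """Map the number of criteria (0 to 8) with lapses to a performance class."""
    if not 0 <= lapse_items <= len(CRITERIA):
        raise ValueError("lapse_items must be between 0 and 8")
    if lapse_items == 0:
        return "excellent"
    if lapse_items <= 2:
        return "good"
    if lapse_items <= 5:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    ratings = {criterion: "good" for criterion in CRITERIA}
    ratings["observation"] = "sufficient"
    ratings["anticipation"] = "sufficient"
    print(classify_driving(count_lapses(ratings)))
```

The cumulative count is what the Rasch analysis supports: a single underlying trait, so the number of lapsed items can be treated as one score.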

Literacy
To evaluate the influence of literacy on the TMT-B, an additional task, the KHE task, was developed specifically for this study at a later stage. Participants were asked to name the letter that follows each of three given letters of the alphabet. As soon as a participant gave a correct answer, the next reference letter was provided. Participants were instructed to answer correctly as quickly as possible. The task was timed from the moment the first reference letter was announced to the moment the participant gave the third answer, and the duration and number of errors were recorded. The letters used were K, H, and E, and the expected answers were "L", "I", and "F".

Psychophysical components
Over two additional sessions of two and a half hours each, participants in the spring 2013 session underwent a series of additional tests in our lab. A researcher blinded to the TMT and on-road evaluation results tested visual acuity (Landolt C, FrACT version 3.7 l) [20], contrast sensitivity (Gabor patch), visual backward masking (Vernier task) [21,22], motion direction sensitivity, orientation discrimination sensitivity, biological motion, visual search (16 objects), the Simon effect, simple response time, executive functions (Wisconsin Card Sorting Test), verbal fluency, and working memory (digit span forward and backward task). For further details on these tests, see Additional file 1.

Statistical methods
Sample size was calculated to detect a two-fold increase in the risk of performing poorly on the driving test, assuming that one patient out of five would be positive on the TMT and that 20% of the participants would exhibit poor driving performance. With a significance level of 0.05 and a power of 0.9, this required recruiting 408 participants.
We excluded patients for whom data on driving performance or TMT were unavailable. TMT values were log-transformed, and their association with driving performance was tested using linear regression. Driving performance was dichotomized in two ways: poorly performing drivers versus all other drivers, and drivers who performed well versus all others. We then defined two cut-off values: the first to identify poor drivers with a specificity of 75%, and the second to identify good drivers with a specificity of 75%. TMT-A and TMT-B results were then combined (A and B negative to rule out, A or B positive to rule in), and predictive values were measured. We then used logistic regression to verify whether this association was influenced by age, gender, education, or driving experience. All continuous variables entered in the model were transformed to be normally distributed and to range from zero to one between the fifth and ninety-fifth percentiles of the healthy population. For this analysis, missing data were completed using sequential regression multiple imputation by chained equations (50 imputations).
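One way to obtain a cut-off "with a specificity of 75%" is to choose the smallest observed time such that at least 75% of the drivers who are not cases fall below it. The Python sketch below illustrates that idea on hypothetical data; it is not necessarily the exact procedure used in this study (for example, the handling of ties or interpolation between observed values is not described), and the function name is hypothetical. The rule-out cut-off for good drivers would be the mirror image, computed on the times of drivers who did not perform well.

```python
import bisect
import math


def cutoff_for_specificity(non_case_times: list[float],
                           target_specificity: float = 0.75) -> float:
    """Smallest observed time t such that the rule "positive if time >= t"
    classifies at least target_specificity of non-cases as negative."""
    values = sorted(non_case_times)
    # Number of non-cases that must fall strictly below the cut-off.
    needed = math.ceil(target_specificity * len(values))
    for t in sorted(set(values)):
        # bisect_left counts how many sorted values are strictly smaller than t.
        if bisect.bisect_left(values, t) >= needed:
            return t
    # No observed value reaches the target; any threshold above the maximum does.
    return math.inf


if __name__ == "__main__":
    # Hypothetical TMT-B times (seconds) of drivers not rated as poor.
    non_poor = [62, 75, 80, 88, 94, 101, 110, 118, 130, 145, 160, 190]
    cut = cutoff_for_specificity(non_poor)
    specificity = sum(t < cut for t in non_poor) / len(non_poor)
    print(f"cut-off = {cut} s, specificity = {specificity:.0%}")
```

With real data, the cut-offs reported in this study (TMT-A ≥ 54 sec, TMT-B ≥ 150 sec) come from the study sample, which this sketch does not attempt to reproduce.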
To model components of the TMT-B using psychophysical measures, we used Poisson regression with a robust variance estimator. All continuous variables were transformed to range from 0 to 1 between the twenty-fifth and seventy-fifth percentiles of the studied population. Statistical methods were defined prior to analysis and run using STATA 12, except for the neural network analysis, for which we used the Neural Network Toolbox 8.1 in MATLAB R2013b.
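Both rescalings described above map a chosen pair of percentiles onto 0 and 1, so that a regression coefficient expresses the effect of moving from the lower to the upper percentile. The Python sketch below shows this linear rescaling, with an optional log transform as applied to TMT durations. The function name and the percentile values in the example are hypothetical; the actual percentiles of the healthy and studied populations are not reproduced here.

```python
import math


def rescale(x: float, p_low: float, p_high: float, log: bool = False) -> float:
    """Map x so that p_low -> 0 and p_high -> 1.

    Values outside [p_low, p_high] fall below 0 or above 1; nothing is clipped.
    With log=True, the log-transformed values are rescaled instead.
    """
    if log:
        x, p_low, p_high = math.log(x), math.log(p_low), math.log(p_high)
    return (x - p_low) / (p_high - p_low)


if __name__ == "__main__":
    # Hypothetical 5th and 95th percentiles of TMT-A (seconds) in healthy drivers.
    p5, p95 = 25.0, 70.0
    for seconds in (25.0, 42.0, 70.0, 90.0):
        print(seconds, round(rescale(seconds, p5, p95, log=True), 3))
```

The same function applies to the Poisson models by passing the 25th and 75th percentiles of the studied population instead.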

Ethical standards
Both studies from which data were drawn were approved by the official state ethics committee for the Canton of Vaud (www.cer-vd.ch) under the references CE 157/2011 and CE 384/2011. An amendment was accepted in May 2013 to obtain participants' consent to share data between the studies. All participants gave their informed consent prior to their inclusion. Both studies were performed in accordance with the ethical standards of the 2008 amended Declaration of Helsinki (Seoul).

Population description
Between May 2011 and September 2013, 404 (40.2%) participants in a driving refresher course for older people took part in this study. Reasons for not participating are provided in Figure 1. Participants' characteristics are described in the left column of Table 1. Forty-one of these drivers also volunteered to undergo a series of psychophysical tests at our lab.

Reference values for the healthy population
One hundred and ninety-seven participants (48.8%) were considered healthy. Reasons for excluding the remaining 207 are provided in Figure 1. Compared with other drivers, healthy drivers were younger, more likely to be female, and less likely to have been involved in an accident involving injury during the previous two years (Table 1). Half of the healthy drivers took less than 42 seconds to perform the TMT-A and less than 94 seconds to perform the TMT-B (Table 2). Independently of age or education, TMT-A and TMT-B durations varied widely among healthy older drivers, with normal values spanning up to a threefold range. We observed a slight increase in TMT duration for drivers aged 80 and over compared with younger participants. The mean difference was 8.4 seconds for the TMT-A (R² = 0.038, p = 0.006) and 31.7 seconds for the TMT-B (R² = 0.061, p < 0.001). Lower education level and gender were associated with TMT-B but not with TMT-A. The gender difference seemed to arise from a minority of male participants with very slow performances. For age and education, the probability distribution of TMT-B values showed an overall shift toward slower execution times. This supports the hypothesis that cognitive decline affects performance for all drivers, even in the absence of motor or cognitive disorders, and that difficulties with the alphabet might need to be accounted for.

Literacy versus years of education
Seventy consecutive participants completed the KHE task. The main outcome, the time needed to provide correct responses, ranged from 2.8 to 25 seconds, with a median of 6.8 seconds. At least one error was made by 21.7% of participants. On average, making an error increased the duration of the task by 5.6 seconds (CI95% 2.8 to 8.5, p < 0.001). KHE task durations of 12 seconds or more were considered positive (n = 9). The KHE task predicted the number of seconds required to complete the TMT-B better than level of education did (R² = 0.023 vs. R² = 0.006, likelihood ratio test p < 0.001). Based on our regression analysis, adjusting for difficulties with the alphabet would require reducing TMT-B values by 25% for those with a positive KHE task (≥12 seconds).
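The KHE scoring and the suggested literacy adjustment reduce to two small rules. The Python sketch below encodes the expected answers, the 12-second positivity threshold, and the 25% reduction of TMT-B stated above; the function and constant names are hypothetical, and applying the reduction as a simple multiplicative factor is an assumption about how the regression result would be used in practice.

```python
KHE_POSITIVE_SECONDS = 12.0  # KHE durations of 12 s or more were considered positive
KHE_TMT_B_REDUCTION = 0.25   # TMT-B reduction suggested by the regression analysis


def khe_expected_answers(letters: str = "KHE") -> list[str]:
    """The letter following each reference letter (valid for A to Y only)."""
    return [chr(ord(letter) + 1) for letter in letters]


def adjusted_tmt_b(tmt_b_seconds: float, khe_seconds: float) -> float:
    """Reduce the TMT-B time by 25% when the KHE task is positive."""
    if khe_seconds >= KHE_POSITIVE_SECONDS:
        return tmt_b_seconds * (1 - KHE_TMT_B_REDUCTION)
    return tmt_b_seconds


if __name__ == "__main__":
    print(khe_expected_answers())
    print(adjusted_tmt_b(160.0, khe_seconds=13.0))  # positive KHE task
    print(adjusted_tmt_b(160.0, khe_seconds=6.8))   # median KHE time, unchanged
```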

On-road evaluation
Of the 404 older drivers, 190 (47.0%) were considered excellent drivers, 109 (27.0%) good drivers, 83 (20.5%) moderate drivers, and only 22 (5.4%) poor drivers. Many excellent drivers had poor TMT results (Figure 2). Nevertheless, independently of age, gender, and education level, drivers who performed poorly on the on-road evaluation took 22.2% (CI95% 0.4 to 48.7, p = 0.045) more time to perform the TMT-A and 63.9% (CI95% 14.8 to 134.2, p = 0.007) more time to perform the TMT-B than other drivers. The TMT's ability to correctly classify those with poor driving performance was above chance for both the TMT-A (AUC = 0.668, CI95% 0.558 to 0.778) and the TMT-B (AUC = 0.662, CI95% 0.542 to 0.783). Older drivers were then categorized into three groups. Those with a TMT-A < 35 sec and a TMT-B < 80 sec were ruled out as being unfit to drive (13.1% of drivers), those with a TMT-A ≥ 54 sec or a TMT-B ≥ 150 sec were ruled in as potentially unfit to drive (35.1%), and the remaining drivers (51.8%) fell into a grey zone. Not a single driver from the ruled-out group was evaluated as a poor driver, whereas fourteen of the 148 "unfit" drivers (9.5%) were. The risk of being a poor driver was three times higher for drivers with a TMT-A ≥ 54 sec or a TMT-B ≥ 150 sec (CI95% 1.3 to 7.0; p = 0.007). Relying on the TMT alone, we would nevertheless need to refer approximately one participant out of three for an on-road evaluation. Of every ten patients who would then undergo the on-road evaluation, only one would be considered a poor driver (sensitivity = 63.6%, specificity = 64.9%, PPV = 9.5%, NPV = 96.9%, PLR = 1.81, NLR = 0.56). Besides the TMT, cognitive impairment, as measured by the modified MoCA, and driving experience were also associated with on-road driving performance (Table 4).
However, the MoCA, which includes a modified TMT, was only associated with poor driving performance for those with severe cognitive impairment, not for those with mild cognitive impairment (Figure 2C and Table 4). The model for driving performance including TMT-A, age, Timed Up-and-Go test, and distance driven per week (Table 4, Model A) showed only TMT-A and distance driven to be related to driving performance. The same was observed when modeling TMT-B with education level included (Table 4, Model B). Furthermore, including these factors in neural network modeling did not identify drivers with poor driving performance any better than using the TMT alone (sensitivity = 52.8%, specificity = 43.3%, PPV = 53.0%, NPV = 97.5%).

Table 1 notes: *The healthy population was defined as drivers with normal optical vision, no cognitive impairment (MoCA ≥ 26), normal functional mobility (TUG < 13.5 sec), no known risk of sudden blackout, and without class III medication affecting driving performance. † MARS contrast sensitivity was not collected from the start of the study and was therefore available for only 158 participants, of whom 76 were healthy. ‡ P-values are for comparing healthy participants to "unhealthy" participants. CS = contrast sensitivity, MoCA = Montreal Cognitive Assessment, MoCA mod = modified MoCA (without TMT or education level).
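The screening metrics reported above all derive from one 2×2 table. The Python sketch below reconstructs that table from the counts given in the text (404 drivers, 22 poor drivers, 148 drivers ruled in, of whom 14 were poor), treating every driver not ruled in (ruled-out and grey-zone drivers) as TMT-negative, and recomputes the metrics. The risk-ratio confidence interval uses the standard log-scale (Katz) method; the text does not state which method was used, so agreement with the reported 1.3 to 7.0 should be read as a consistency check, not as a description of the authors' software.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # TMT positive, poor driver
    fp: int  # TMT positive, not a poor driver
    fn: int  # TMT negative, poor driver
    tn: int  # TMT negative, not a poor driver

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def positive_lr(self) -> float:
        return self.sensitivity() / (1 - self.specificity())

    def negative_lr(self) -> float:
        return (1 - self.sensitivity()) / self.specificity()

    def risk_ratio(self, z: float = 1.959964) -> tuple[float, float, float]:
        """Risk of poor driving in positives vs negatives, with a log-scale 95% CI."""
        rr = (self.tp / (self.tp + self.fp)) / (self.fn / (self.fn + self.tn))
        se = math.sqrt(1 / self.tp - 1 / (self.tp + self.fp)
                       + 1 / self.fn - 1 / (self.fn + self.tn))
        return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


# Counts reconstructed from the text: 148 positives (14 poor), 22 poor of 404 drivers.
table = TwoByTwo(tp=14, fp=148 - 14, fn=22 - 14, tn=404 - 148 - (22 - 14))

if __name__ == "__main__":
    rr, low, high = table.risk_ratio()
    print(f"Se {table.sensitivity():.1%}, Sp {table.specificity():.1%}, "
          f"PPV {table.ppv():.1%}, NPV {table.npv():.1%}")
    print(f"PLR {table.positive_lr():.2f}, NLR {table.negative_lr():.2f}")
    print(f"RR {rr:.2f} (95% CI {low:.1f} to {high:.1f})")
```

The low PPV follows directly from the low prevalence of poor drivers (22 of 404): even a test with moderate sensitivity and specificity yields mostly false positives in this setting.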

History of motor vehicle collisions
One hundred and sixty-seven drivers (41.3%) reported having had a motor vehicle collision (MVC) during the past two years. Drivers with either a TMT-A ≥ 54 seconds or a TMT-B ≥ 150 seconds had a shorter MVC-free period than other drivers (HR = 1.48, CI95% 1.06 to 2.06, p = 0.022).

Overview of results and clinical applications
This study shows that cognitive decline in the absence of disease affects both TMT and driving performance. The decline mainly concerns drivers aged 80 years or more. Using the TMT for screening purposes below that age seems unjustified unless there is a known underlying cause of cognitive decline. We also advise against relying on age-specific percentiles to define cut-off points of abnormality, as this can lead to natural cognitive decline not being accounted for. The same applies to education level, as it neglects underlying cognitive deficits that would also have affected schooling. Instead, we suggest verifying that patients do not have difficulties with the alphabet, and considering TMT-B results invalid for those who require 12 seconds or more to perform the KHE task. Under these conditions, our study provides clinicians with a simple rule for interpreting TMT results when screening for unfitness to drive. Effects of cognitive decline on driving can be ruled out for those who perform the TMT-A in less than 35 seconds and the TMT-B in less than 80 seconds. Conversely, negative consequences of cortical dysfunction for driving performance can be suspected for those with a TMT-A ≥ 54 seconds or a TMT-B ≥ 150 seconds. These drivers are three times more likely to be poor drivers. However, if all of them were to cease driving, we would needlessly reduce the mobility of nine out of ten positive patients. This must be avoided, as reduced mobility is known to affect patients and to have important negative consequences for their health [15,23,24]. Our results show that the psychophysical functions evaluated by the TMT are indeed those most useful for driving; in other words, TMT performance is affected by reduced performance on basic visual tasks deemed essential for driving. Why, then, is TMT performance only weakly correlated with on-road performance?
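The interpretation rule above can be summarized as a three-zone classifier. The Python sketch below encodes the stated thresholds. How an invalid TMT-B should be handled is not spelled out in the text; the sketch assumes that when the KHE task takes 12 seconds or more, TMT-B is simply ignored, so only TMT-A can rule a driver in and nobody can be ruled out. Names are hypothetical, and a "rule_in" result is meant to prompt further assessment such as an on-road evaluation, not driving cessation.

```python
from typing import Optional

RULE_OUT_A, RULE_OUT_B = 35.0, 80.0  # both must be met to rule out
RULE_IN_A, RULE_IN_B = 54.0, 150.0   # either one suffices to rule in
KHE_INVALID_SECONDS = 12.0           # KHE time from which TMT-B is treated as invalid


def tmt_screening_zone(tmt_a: float, tmt_b: Optional[float],
                       khe_seconds: Optional[float] = None) -> str:
    """Return "rule_out", "rule_in", or "grey" for TMT times in seconds."""
    # Assumption: a KHE time of 12 s or more invalidates TMT-B, which is then ignored.
    if khe_seconds is not None and khe_seconds >= KHE_INVALID_SECONDS:
        tmt_b = None
    if tmt_a >= RULE_IN_A or (tmt_b is not None and tmt_b >= RULE_IN_B):
        return "rule_in"   # effects of cortical dysfunction on driving can be suspected
    if tmt_b is not None and tmt_a < RULE_OUT_A and tmt_b < RULE_OUT_B:
        return "rule_out"  # effects of cognitive decline on driving can be ruled out
    return "grey"


if __name__ == "__main__":
    for a, b, khe in [(30, 70, None), (40, 100, None), (40, 160, None), (30, 70, 13)]:
        print(a, b, khe, tmt_screening_zone(a, b, khe))
```

Checking the rule-in condition first means a driver who meets a rule-in threshold can never be ruled out, which matches the "A or B positive" definition.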
The similarly poor performance of the MoCA suggests that this is not due to a lack of sensitivity of the TMT. Our results even suggest that, when screening older drivers, the TMT does better than the MoCA at distinguishing poorly performing drivers from other drivers. The TMT is also known to perform better than the Mini-Mental State Examination (MMSE) in detecting poorly performing drivers. The heavy load of memory tests within these test batteries might affect their validity in predicting on-road events, since memory has been shown to have little to do with driving performance [7]. The TMT, on the other hand, is a more precise indicator of reduced visual processing speed [25,26]. We suggest that older drivers may be well aware of their visual limitations related to cognitive decline and have had time to adapt their behavior so that these limitations are not perceptible during the on-road evaluation. In older drivers, tactical and strategic compensations have been shown to reduce the risk of accidents [27]. The underlying mechanisms of these compensations remain unknown, and the compensations are thus difficult to evaluate clinically. When investigating cognitive decline, we therefore encourage physicians to confirm unfitness to drive with an on-road evaluation. Occupational therapists, in collaboration with a driving instructor, are best placed to address this problem [28].

Comparison to previous studies
Our results are very similar to those of Classen et al. [29], who used an arbitrary cut-off point of TMT-B > 180 sec and found an OR of 2.5 for failing an on-road test. Mazer et al. [30] used a different arbitrary cut-off point of three or more errors during the TMT-B in patients with stroke and found an OR of 6.0 for being judged a poor driver on a 43-item assessment form completed by an occupational therapist. The association of the TMT-B with road accidents may be even weaker: Ball et al. [31] found an OR of 1.21, Marottoli et al. [32] an HR of 1.42, and Rozzini et al. [33] an OR of 2.3. All these results show that the TMT does not clearly distinguish poor drivers from others. Compared with other tests, the TMT does just as poorly at distinguishing good from poor drivers as any other test, including the UFOV [7], or combinations of determinants such as the 4C [34]. It has also been shown that 40% of drivers with severe cognitive impairment are considered competent drivers during on-road evaluations [35]. This suggests that the TMT's lack of precision is not due to the nature of the test itself, but rather to the complexity of the ways in which older drivers adapt their behavior to their condition, and to the fact that they can perform well even when affected by cognitive decline.

Limitations
The studied population was not randomly sampled from the general population and corresponds to approximately 6% of all older drivers from four regions. Nevertheless, the prevalence of accidents involving injury was very similar to that observed in the general population [36], and the prevalence of minor cognitive impairment was not lower than usually expected [37]. Finally, in Switzerland, drivers aged 70 years and over must have a physician assess their fitness to drive every two years. We therefore believe this sample to be representative of patients without severe cognitive impairment attending their primary care physician for their compulsory evaluation of fitness to drive. Another limitation relates to the debate over whether or not on-road evaluation is the 'gold standard' of driving performance. In other words, is there a strong link between on-road evaluation and road accidents? Keall and Frith [36] showed that drivers aged 80 years or more who failed an on-road driving test had a 1.7-fold (CI95% 1.3 to 2.2) increased risk of being involved in a crash involving injury in the following two years. This cannot be considered a strong link, but it is nevertheless of the same magnitude as the increased risk observed for drivers with a 0.08% blood alcohol concentration. Conversely, this also means that in Keall and Frith's study, 98.8% of drivers who failed the on-road test were not involved in an accident involving injury and would therefore have been unjustly prevented from driving had their licenses been withdrawn. This is nevertheless the cost that our society is ready to pay for road safety. Unlike those who drink and drive, older drivers do not choose to become impaired. We must therefore always keep in mind that older drivers carry the burden of this sacrifice and should be treated with the highest respect and regard for agreeing to do so.

Conclusion
Our results do not support the use of the TMT as a single measure for deciding whether an older driver is unfit to drive. A discussion on the potential impact of cognitive decline on driving performance should be initiated with those who have a TMT-A ≥ 54 seconds or a TMT-B ≥ 150 seconds. When driving difficulties are identified, efforts should be made to have older drivers themselves make the decision to give up driving. Physicians are well placed to encourage them in this process and to help find alternative solutions for maintaining the mobility of older people [38].

Additional file
Additional file 1: Description of psychophysical tests. This pdf file provides details on the methods used to measure underlying psychophysical functions potentially related to the Trail Making Task.
Competing interests
PV and BF are developing a computer-based screening instrument for assessing cognitive fitness to drive in the primary care environment. They have nevertheless officially renounced any personal financial interest in this instrument. All other authors declare no competing interests.

Authors' contributions
BF, MH, and PV applied for grants; BF, PV, DH, and MH designed the study; PV and DH wrote the protocols; PV and IC collected clinical data for the cohort study; IC designed the KHE test; DH collected psychometric data for the EPFL study; PV planned the statistical methods and analyzed the data; all authors interpreted the results; PV wrote the first draft of the manuscript. All authors have given final approval of this version of the manuscript.