Development and validation of a model to estimate the risk of acute ischemic stroke in geriatric patients with primary hypertension

Objectives This study aimed to construct and validate a prediction model of acute ischemic stroke in geriatric patients with primary hypertension. Methods This retrospective file review collected information on 1367 geriatric patients diagnosed with primary hypertension, with or without acute ischemic stroke, between October 2018 and May 2020. The study cohort was randomly divided into a training set and a testing set at a ratio of 70:30. A total of 15 clinical indicators were assessed using the chi-square test and then multivariable logistic regression analysis to develop the prediction model. We employed the area under the curve (AUC) and calibration curves to assess the performance of the model and a nomogram for visualization. Internal verification by bootstrap resampling (1000 times) and external verification with the independent testing set were used to determine the accuracy of the model. Finally, this model was compared with four machine learning algorithms to identify the most effective method for predicting the risk of stroke. Results The prediction model identified six variables (smoking, alcohol abuse, blood pressure management, stroke history, diabetes, and carotid artery stenosis). The AUC was 0.736 in the training set, 0.730 after bootstrap resampling, and 0.725 in the external verification. The calibration curves showed close agreement between the predicted and actual diagnosis of stroke in both the training set and the testing set. The multivariable logistic regression analysis and the support vector machine with radial basis function kernel were the best models, each with an AUC of 0.710. Conclusion The prediction model using multivariable logistic regression analysis has considerable accuracy and can be visualized in a nomogram, which is convenient for its clinical application.


Introduction
According to estimates by the World Health Organization, stroke is the second leading cause of death, with a projected 7.8 million deaths and 23 million first-time ischemic stroke events by 2030 [1]. Many risk factors for stroke, such as hypertension, dyslipidemia, diabetes, smoking, and alcohol consumption, have been identified [2]. With rising levels of prosperity and an aging population, the prevalence of hypertension in China increased from 23.4% in 1991 to 28.6% in 2011 (affecting approximately 300 million adults), which places a huge burden on public health resources [3]. Acute ischemic stroke is common in hypertensive patients, especially in elderly patients with multiple risk factors.
Considering the high fatality and disability rates resulting from stroke, we intended to develop a practical prediction model by integrating the common risk factors observed in the clinic. Estimating the risk of acute ischemic stroke in geriatric patients with primary hypertension allows appropriate preventive measures to be taken. Because they are user-friendly, nomograms have been widely used for medical diagnosis and prognosis evaluation in recent years [4,5]. Our aim was to provide an individualized clinical decision tool for physicians.

Study design and data source
This retrospective file review extracted information on geriatric patients who were older than 60 years [6] and diagnosed with primary hypertension, with or without acute ischemic stroke, from the electronic medical record database of the Affiliated Hospital of Guangdong Medical University between October 2018 and May 2020. Patients with detailed clinical information and biochemical and imaging examinations were included in the study. The diagnosis of acute ischemic stroke was based on neuroimaging.
In total, the files of 1367 patients were analyzed in this retrospective study and randomly divided into a training set and a testing set at a ratio of 70:30.
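The random division can be illustrated with a minimal Python sketch. The splitting procedure actually used by the authors is not described, so the function name `split_train_test`, the fixed seed, and the simple rounding rule below are assumptions made for illustration only; the set sizes they produce need not match those reported in the study.

```python
import random

def split_train_test(patient_ids, train_fraction=0.7, seed=0):
    """Randomly partition patient IDs into a training list and a testing list."""
    rng = random.Random(seed)          # fixed seed (an assumption) for reproducibility
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical IDs 0..1366 standing in for the 1367 patient files.
train_ids, test_ids = split_train_test(range(1367))
print(len(train_ids), len(test_ids))
```

A stratified split, which keeps the proportion of stroke cases equal in both sets, is a common alternative when the outcome is imbalanced.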

Study variables
A total of 15 risk factors associated with stroke were included in the study based on the literature [1,7-9] and are listed in Table 1. The selected risk factors are indicators that can be easily assessed in clinical practice. All the risk factors were transformed into categorical variables to develop a nomogram. For this type of model, the sample size should be at least ten times the number of variables [11].

Statistical analysis
All variables were expressed as counts (%). Statistical analysis was performed using R software 3.6.1 (http://www.R-project.org/). The risk factors showing a P-value < 0.05 in the chi-square test were regarded as statistically significant. Multivariable logistic regression analysis was used to identify the optimal variables for the construction of the prediction model. These variables were expressed as odds ratios (ORs) with 95% confidence intervals (CIs) and P-values. The area under the curve (AUC) and calibration curves were used to assess the performance of the prediction model. A nomogram was developed to visualize the prediction model in a user-friendly manner [12,13]. Furthermore, we applied four machine-learning classifiers (random forest, support vector machine with polynomial kernel, support vector machine with radial basis function kernel, and backpropagation neural network) using JupyterLab 1.2.6 (https://jupyterlab.readthedocs.io/en) to compare the results with the multivariable logistic regression model. The best combination of parameters of the machine learning algorithms was identified based on the highest log-likelihood. The average log-likelihood over five repetitions of fivefold cross-validation was used to select the optimal parameters [14].
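The parameter selection step, averaging the log-likelihood over five repetitions of fivefold cross-validation, can be sketched as follows. This is an illustration in plain Python rather than the authors' code: `repeated_kfold`, `mean_log_likelihood`, and `mean_cv_score` are hypothetical helper names, and `score_fn` stands for a user-supplied function that fits a candidate model on the training indices and returns its score on the validation indices.

```python
import math
import random

def repeated_kfold(n_samples, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)                      # new random partition per repetition
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            valid = folds[k]
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, valid

def mean_log_likelihood(y_true, p_pred, eps=1e-12):
    """Average Bernoulli log-likelihood of binary outcomes under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)             # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def mean_cv_score(score_fn, n_samples, **kwargs):
    """Average score_fn(train_idx, valid_idx) over all folds and repetitions."""
    scores = [score_fn(tr, va) for tr, va in repeated_kfold(n_samples, **kwargs)]
    return sum(scores) / len(scores)
```

Under this scheme, each candidate parameter combination would be scored with `mean_cv_score`, and the combination with the highest average log-likelihood would be kept.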

Baseline characteristics and optimal risk factors identification
Among the 1367 patients diagnosed with primary hypertension between October 2018 and May 2020 in this study, 437 had suffered an acute ischemic stroke. A total of 959 patients were assigned to the training set and 408 to the testing set. Detailed information about the characteristics of patients in the total cohort and the training set is shown in Tables 2 and 3, respectively.
There were nine variables (gender, smoking, alcohol abuse, blood pressure management, a history of stroke, diabetes, carotid artery stenosis (CAS), total cholesterol, and LDL-cholesterol) with statistically significant differences (P < 0.05) in the chi-square test. Six variables (smoking, alcohol abuse, blood pressure management, stroke history, diabetes, CAS) showed a statistically significant difference (P < 0.05) in the multivariable logistic regression analysis. The results of the multivariable logistic regression analysis are displayed as forest plots in Fig. 1.
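For a binary risk factor, the chi-square screening step amounts to a test on a 2x2 contingency table. The sketch below computes the Pearson statistic without continuity correction; whether the authors applied a correction is not stated, and the function name and the counts in the usage example are hypothetical, not study data.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table = [[a, b], [c, d]], with rows = risk factor present/absent
    and columns = stroke/no stroke.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts for illustration only.
chi2, p = chi_square_2x2([[10, 20], [20, 10]])
print(round(chi2, 3), p < 0.05)
```

A variable passing this screen (P < 0.05) would then be carried forward to the multivariable logistic regression.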

Construction and assessment of the prediction nomogram
The prediction model was constructed by multivariable logistic regression based on the six identified variables (smoking, alcohol abuse, blood pressure management, stroke history, diabetes, CAS). The nomogram in Fig. 2 visualizes the model in a user-friendly manner.
Nomogram interpretation: The observed value of each variable is assigned a number of points by drawing a vertical line up to the points scale at the top. The sum of the points for all variables corresponds to the individual risk of acute ischemic stroke. For example, consider a geriatric patient with a history of ischemic stroke, smoking, and poor blood pressure management, but no alcohol abuse or carotid stenosis. The score for each feature is read from the value of each variable: smoking (68 points) + history of ischemic stroke (54 points) + poor blood pressure management (100 points) + no alcohol abuse or carotid stenosis (0 points) = 222 total points. A vertical line dropped from the total points scale to the acute ischemic stroke risk scale at the bottom shows that the probability of acute ischemic stroke is about 75%.
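The point arithmetic in this worked example can be written out as a short Python sketch. Only the three point values quoted in the text are included; the points for the remaining variables and the mapping from total points to probability must be read from Fig. 2 and are deliberately not reproduced here. The dictionary keys and the function name are hypothetical labels.

```python
# Points quoted in the text for the worked example. Values for alcohol abuse,
# diabetes and carotid artery stenosis are not stated in the text, so they
# are omitted rather than guessed.
NOMOGRAM_POINTS = {
    "smoking": 68,
    "stroke_history": 54,
    "poor_bp_management": 100,
}

def total_points(present_factors, points=NOMOGRAM_POINTS):
    """Sum the nomogram points of the risk factors present in a patient."""
    return sum(points[factor] for factor in present_factors)

# Absent factors (alcohol abuse, carotid stenosis) contribute 0 points.
example_patient = ["smoking", "stroke_history", "poor_bp_management"]
print(total_points(example_patient))  # 222
```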
The AUC of the prediction model was 0.736 in the training set, 0.730 after 1000 bootstrap resamples, and 0.725 in the external verification using the testing set (Fig. 3). The calibration curves showed close agreement between the predicted and observed probabilities of stroke in both the training set and the testing set (Fig. 4).
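As an illustration of how these discrimination measures are obtained, the following Python sketch computes the AUC as the probability that a randomly chosen stroke case receives a higher predicted risk than a randomly chosen non-case, and averages it over bootstrap resamples. It is a simplification under stated assumptions: the predictions are held fixed in each resample, whereas bootstrap validation in R typically refits the model each time, and the function names are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC via pairwise comparison of cases (1) and non-cases (0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(labels, scores, n_boot=1000, seed=0):
    """Mean AUC over bootstrap resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                  # skip resamples containing a single class
            values.append(auc(ys, [scores[i] for i in idx]))
    return sum(values) / len(values)

# Hypothetical outcomes and predicted risks for illustration only.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```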

Multivariable logistic regression analysis and machine learning
We constructed prediction models based on the same variables using the five algorithms (multivariable logistic regression and the four machine learning classifiers) and verified them using the testing set. The multivariable logistic regression analysis and the support vector machine with radial basis function kernel both achieved an AUC of 0.71, which was higher than that of the other three prediction models (Fig. 5).

Discussion
This study developed a practical nomogram that includes six variables that can be easily identified in the clinic to help physicians identify patients at high risk of stroke, enabling them to implement preventive measures as early as possible. Blood pressure management was the variable most strongly associated with stroke in our model. With aging, vascular elasticity decreases as a consequence of atherosclerosis. Thus, it is recommended that systolic blood pressure in the elderly be kept below 150 mmHg [15]. A meta-analysis reported that there was a 41% reduction in stroke for every blood pressure reduction of 10 mmHg systolic or 5 mmHg diastolic [16]. Although various hypertension guidelines indicate a certain goal of blood pressure control, few large-scale, evidence-based clinical studies focus on hypertension or stroke in very elderly patients. Physicians should be aware of this practical clinical problem and pay attention to individualized blood pressure management in elderly patients [17], without ignoring the symptoms and feelings of very elderly patients. In addition to the absolute value of blood pressure, blood pressure variability deserves attention. Excessive blood pressure fluctuation in the morning is a classic phenomenon. Kario used ambulatory blood pressure monitoring and magnetic resonance imaging and demonstrated that an exaggerated early morning blood pressure surge was independently associated with stroke in elderly hypertensive patients. The risk of stroke in patients with a morning blood pressure surge > 55 mmHg was 2.7 times higher than that in patients with a morning blood pressure surge < 55 mmHg. Pierdomenico reached a similar conclusion: stroke was associated with an exaggerated early morning blood pressure surge independently of the 24-h average blood pressure [18,19].
Smoking and alcoholism are controllable risk factors for stroke. Both played an important role in our prediction model, and these were valid for more than 90% of the males in our cohort. A large number of clinical studies in different races and populations have confirmed the strong association between smoking and stroke, while exposure to secondhand smoke should also be noted. Current smokers are at least two-to-four times more likely to have a stroke than those who never smoked or those who quit smoking 10 years ago [20]. Some epidemiological studies have demonstrated that the impact of drinking on stroke risk depends on the quantity. A small amount of red wine may reduce the risk of cardiovascular disease and stroke. However, alcohol abuse (> 60 g/day) is associated with an increased risk of stroke in the long term [21,22].
CAS is a marker of systemic atherosclerosis that can be easily detected by ultrasound. According to studies from the 1980s, the annual risk of ipsilateral stroke was 3% in patients with a CAS ≥ 50%, which increased to 5.5% in patients with a CAS > 75%. With the widespread use of preventive drugs, the annual risk of stroke has been reduced to 0.34% for patients with a CAS ≥ 50% in contemporary studies [23,24].
Other risk factors that are not included in our nomogram, such as age, total cholesterol and LDL-cholesterol [25-27], have been shown to be related to stroke in numerous clinical trials and should be considered by clinicians. It is worth noting that elderly patients usually present with multiple chronic diseases, such as hypertension, diabetes and coronary heart disease. The risk of ischemic stroke arising from the pathological organ changes caused by these diseases may be greater than that arising from physiological aging [28]. Additionally, elderly patients often do not adhere to prescribed treatments. The direct visual display of the nomogram model can play a role in educating elderly patients and increase their adherence to treatment.
In the era of artificial intelligence, machine learning has become a popular method in data analysis. It utilizes mathematical models and training data to make predictions [29,30]. Random forests, support vector machines, and backpropagation neural networks are three representative machine learning algorithms that are increasingly used to predict adverse events in clinical practice and in biological research on tumors [31,32]. Although these machine learning algorithms have attracted much attention with the availability of increasingly voluminous datasets (such as electronic medical records), their internal processes resemble a "black box", and this poor interpretability and visualization limits their practical application.
In a number of reports, the results of multivariable logistic regression analysis as the classic reference standard were compared with those of machine learning algorithms. In our study, the machine learning algorithms offered no obvious advantage over multivariable logistic regression in evaluating a binary categorical problem (whether or not patients will suffer an acute ischemic stroke). This conclusion is the same as that of several recent studies [14,33].
Our prediction model based on multivariable logistic regression analysis not only has considerable accuracy but also can be visualized by a nomogram, which is convenient for its clinical application.

Limitations
This study was a single-center retrospective study, which limits its generalizability. As a retrospective study, potential selection bias was inevitable. Furthermore, there are numerous other stroke-related risk factors, such as the body mass index, dietary habits, and physical exercise, that were not analyzed because they were not reported in the electronic records of patients.

Authors' contributions
Xifeng Zheng, Fang Fang and Weidong Nong were involved in the conception and design of the study. Xifeng Zheng and Fang Fang were responsible for software, visualization and article writing. Weidong Nong and Yu Yang were involved in analysis of the data. Xifeng Zheng and Dehui Feng provided scientific supervision. All authors reviewed and approved the final manuscript.

Funding
This research is part of a Zhanjiang Science and Technology Program project (No. 2021B01364).

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Declarations
Ethics approval and consent to participate
The research was approved by the Ethics Committee of the Affiliated Hospital of Guangdong Medical University, and informed consent was waived due to the retrospective nature of the analysis. The researchers made every effort to protect patient information from disclosure.

Consent for publication
Not Applicable.