Modeling mortality prediction in older adults with dementia receiving COVID-19 vaccination

Objective: This study compared COVID-19 outcomes between vaccinated and unvaccinated older adults with and without cognitive impairment.
Method: Electronic health records from Israel from March 2020 to February 2022 were analyzed for a large cohort (N = 85,288) aged 65+. Machine learning models were constructed to predict mortality risk from patient factors. Outcomes examined were COVID-19 mortality and hospitalization post-vaccination.
Results: Our study highlights the significant reduction in mortality risk among older adults with cognitive disorders following COVID-19 vaccination, showcasing a survival rate improvement to 93%. Utilizing machine learning for mortality prediction, we found the XGBoost model, enhanced with inverse probability of treatment weighting, to be the most effective, achieving an AUC-PR value of 0.89. This underscores the importance of predictive analytics in identifying high-risk individuals, emphasizing the critical role of vaccination in mitigating mortality and supporting targeted healthcare interventions.
Conclusions: COVID-19 vaccination strongly reduced poor outcomes in older adults with cognitive impairment. Predictive analytics can help identify highest-risk cases requiring targeted interventions.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12877-024-04982-7.


Introduction
The coronavirus disease 2019 (COVID-19) pandemic created immense challenges for older adults with Alzheimer's disease and related dementias (ADRD) [1,2]. These progressive neurological conditions involve impairment of cognitive functions including memory, language, and thinking, which were exacerbated by the pandemic's disruptions [3]. Indeed, individuals with ADRD have faced disproportionate adverse outcomes from COVID-19 infection compared to those without dementia, including higher mortality rates as shown in meta-analyses [4]. Increased morbidity, accelerated cognitive and functional decline, and a rise in urgent hospitalizations have also been reported in this population [5,6]. These negative effects resulted from reduced access to formal caregivers, unemployment among individuals with ADRD, and physical isolation during lockdowns [2,7]. Disrupted routines and decreased activity levels also contributed to documented cases of worsened behavioral issues and depression [6].
Accurately predicting mortality risk is especially critical for patients with ADRD, as it allows clinicians to have informed discussions with families, guide treatment decisions, and provide appropriate levels of care [7,8]. However, while COVID-19 vaccinations have benefited the general senior population, their precise impacts on outcomes among older adults with ADRD remain less understood [2,5]. Despite full vaccination, older adults with ADRD have shown higher breakthrough infection risks than vaccinated seniors without dementia [9]. Accordingly, there is a need for research comparing mortality rates and hospitalization incidence between vaccinated and unvaccinated individuals with ADRD across all pandemic phases. This study utilized predictive modeling to compare long-term mortality and hospitalization rates between people with and without ADRD, differentiated by vaccination status. These advanced analytics were employed because they can offer insights into vaccination effectiveness for this vulnerable group and generate valuable prognostic information to aid patients, families, and providers.

Methods
This was a retrospective cohort study of the electronic medical records, between March 01, 2020 and February 28, 2022, of older adults aged ≥ 65 living in the community. All participants were insured for at least two years prior to the study [T1 (2018-2020)] and during the study time frame [T2 (2020-2022)] with Maccabi Healthcare Services (MHS), one of the major health management organizations in Israel. The cohort was divided into a group of patients with dementia and an age, gender, and socioeconomically matched control group without dementia. Both patient groups were further divided into those who had received COVID-19 vaccination and those who had not (see below).

Data collection
The electronic medical records for the dementia study group were obtained from the Cognitive Disorders Registry established by MHS in 2019. This registry was designed to facilitate comprehensive monitoring of patients with cognitive deterioration and includes the continuum of patients from pre-dementia mild cognitive impairment (MCI) all the way to severe dementia. The data for the control group were obtained from electronic medical records in MHS's general national database. The primary independent variable was COVID-19 vaccination status. Patients who had received a minimum of two mRNA or viral vector vaccine doses were designated as vaccinated, while patients who had received either a sole dose or none were designated as non-vaccinated. This classification leveraged the vaccination dates as a cumulative factor, ensuring a dynamic assessment of vaccination status over time. Data were collected on sociodemographic variables, including age, gender, geographical location, and socioeconomic strata; COVID-19 characteristics, including COVID-19 infection and number of vaccine doses; and clinical attributes, including severity of dementia, prescription or utilization of antipsychotic and/or antidepressant medications, diagnosed depression, utilization of home-based medical regimens, incidents of bone fractures including hip fractures, and the presence of documented medical conditions spanning hypertension, chronic obstructive pulmonary disease (COPD), diabetes mellitus (DM), immunosuppression, and obesity.
Data on healthcare-associated variables were also obtained, including metrics related to activities of daily living (ADL); incidences of visits to the emergency department; cumulative hospitalization duration; frequency of hospitalizations, geriatric clinic visits, family-centered clinic visits, geriatric teleconsultations, family telephonic interactions, appointments with psychiatrists, and social worker consultations; frequency of falls; and their attendant costs.

Data processing
In the data processing pipeline (Fig. 1), several fundamental steps were undertaken to ensure the quality, reliability, and suitability of the dataset for subsequent analysis. The process began with sanitization, involving the cautious removal or modification of sensitive, redacted, or inconsistent entries to enhance data integrity. Subsequently, normalization was employed to standardize data features to a uniform scale, enabling equitable comparisons and analyses among diverse attributes. Addressing the challenge of missing values, a substitution strategy was applied, wherein gaps in the dataset were filled using techniques like K-Nearest Neighbors (KNN) imputation or predefined placeholders. To augment the dataset's richness and diversity, augmentation methods were deployed, generating new instances through replication, combination, or duplication of existing data points. Computation of absolute differences played a significant role for clinical characteristics, facilitating the calculation of variations between periods T1 (2018-2020) and T2 (2020-2022). Categorical data underwent transformation through one-hot encoding, converting categorical variables into numerical format by creating binary columns. Another approach, label encoding, was employed to assign unique integer labels to categories. Data aggregation was carried out to merge datasets from diverse sources, such as healthcare records, vaccination records, and demographic information. This process facilitated a comprehensive view of individual profiles by combining relevant attributes into a unified record for each subject. These data pre-processing steps collectively ensured data readiness for subsequent analysis, enhancing the reliability and validity of insights derived from the dataset in the context of the study's objectives.
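Some of the steps described above can be illustrated with a minimal, standard-library Python sketch. It is not the study's pipeline: the function names and the toy values are hypothetical, and simple mean imputation stands in for the KNN imputation mentioned in the text (in practice, tools such as scikit-learn's KNNImputer and OneHotEncoder would likely be used).

```python
from statistics import mean


def impute_mean(values):
    """Fill None entries with the mean of the observed values.

    Assumption: a simple stand-in for KNN imputation, for illustration only.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale numbers to the [0, 1] range (one common form of normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def abs_difference(t1, t2):
    """Absolute change in a clinical measure between periods T1 and T2."""
    return [abs(b - a) for a, b in zip(t1, t2)]


def one_hot(categories):
    """Turn a categorical column into one binary column per level."""
    levels = sorted(set(categories))
    return [{f"is_{lvl}": int(c == lvl) for lvl in levels} for c in categories]


# Hypothetical toy records: hospitalization days per patient in each period.
days_t1 = impute_mean([2.0, None, 4.0])   # missing value filled with 3.0
days_t2 = [6.0, 3.0, 1.0]
delta = abs_difference(days_t1, days_t2)  # [4.0, 0.0, 3.0]
scaled = min_max_scale(delta)             # [1.0, 0.0, 0.75]
ses = one_hot(["low", "high", "low"])
```

The absolute-difference step is the one that produces the T2 versus T1 change features; the other helpers only prepare columns for modeling.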

Statistical analysis
Descriptive statistics, including means and standard deviations for continuous variables and percentages for categorical variables, were utilized to characterize the sociodemographic, COVID-19, and clinical characteristics of the participants. To assess the statistical differences between groups, t-tests were employed for continuous variables, while chi-squared tests were used for categorical variables, as appropriate. The level of significance for all statistical analyses was 5%. The data analysis was performed using Python (version 3.9.16).
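As an illustration of the categorical comparison, the following sketch computes a chi-squared test of independence for a 2x2 table using only the Python standard library. The counts are hypothetical, no continuity correction is applied, and the function name is an assumption for this example; the closed-form p-value used here holds only for one degree of freedom.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test for a 2x2 contingency table (no continuity correction).

    Returns (statistic, p_value). For one degree of freedom the survival
    function of the chi-squared distribution equals erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows = vaccinated / unvaccinated, columns = female / male.
stat, p_value = chi_squared_2x2([[30, 70], [60, 40]])
significant = p_value < 0.05  # 5% significance level, as stated in the text
```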

Machine learning models
For mortality prediction, we employed both logistic regression and tree-based models (Fig. 1). In our utilization of the logistic regression model, we adopted the "newton-cg" solver, which leverages second-order derivative information (Hessian matrix) to adjust its search direction. This characteristic contributes to quicker convergence, especially when addressing imbalanced data scenarios. The "newton-cg" solver is particularly suited for imbalanced datasets exhibiting distinct class separation. This suitability is observed when a majority of instances within the minority class are distinctly separated from those in the majority class [10-13].
In refining our tree-based classifiers, including Random Forest, XGBoost, LightGBM, and CatBoost, we tuned hyperparameters using random search cross-validation, as facilitated by the scikit-learn library. This process focused on optimizing the area under the receiver operating characteristic curve (AUC) as the primary scoring criterion. We then used the best-performing hyperparameters to evaluate the models on a separate validation set. Our evaluation function computed the AUC-ROC and AUC-PR and also employed bootstrapping to estimate 95% confidence intervals for these metrics, providing a robust assessment of model performance. This approach helped us address potential overfitting and assess the generalizability of our predictive models to unseen data, as reflected in detailed metrics including recall, precision, F1 score, and balanced accuracy. The hyperparameters that underwent adjustment included the number of trees (n_estimators), the maximum depth of trees (max_depth), and the learning rate. There were some differences for specific classifiers. For instance, the learning rate was not applicable to the Random Forest classifier, and for the CatBoost classifier, the maximum depth was constrained to a maximum value of 16, leading to the use of different intervals during tuning. With these exceptions, the majority of hyperparameters were consistent across models. The best model was determined using the AUC score. The same fine-tuning process was extended to models that incorporated various built-in methods to address class imbalance, as detailed in Table 2. In addition to these methods, a technique called inverse probability of treatment weighting (IPTW) was employed in conjunction with the logistic regression and Gaussian Naive Bayes (GaussianNB) algorithms [14].
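The tuning procedure described above can be sketched with scikit-learn's RandomizedSearchCV. This is an illustrative example under stated assumptions, not the study's configuration: the data are synthetic, scikit-learn's GradientBoostingClassifier stands in for XGBoost, LightGBM, and CatBoost, and the search ranges, number of iterations, and fold count are hypothetical.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic, imbalanced stand-in data (roughly 10% positives); not the study data.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search ranges for the three hyperparameters named in the text.
param_distributions = {
    "n_estimators": randint(50, 150),
    "max_depth": randint(2, 6),
    "learning_rate": uniform(0.01, 0.3),  # samples from [0.01, 0.31]
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,           # number of random configurations to try
    scoring="roc_auc",  # select on cross-validated AUC
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out validation set.
val_auc = roc_auc_score(y_val, search.predict_proba(X_val)[:, 1])
```

For a Random Forest, the learning_rate entry would simply be dropped from the search space, mirroring the classifier-specific differences noted in the text.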
IPTW was used to counteract the bias introduced by the imbalanced distribution of mortality in the dataset. This was achieved by assigning appropriate weights to individual instances based on their treatment propensities. In essence, these weights represented the degree to which each instance was representative of the overall population. Logistic regression and GaussianNB were employed to model the propensity scores, which indicate the likelihood of an individual being vaccinated. These propensity scores were then utilized to compute weights for each instance in the dataset. The strength of this methodology lies in the combination of logistic regression and GaussianNB to compute propensity scores, leading to the generation of customized weights for each instance. These individualized weights were then used to counteract the adverse effects of class imbalance in the dataset. The ultimate result was an enhancement in both the fairness and accuracy of subsequent predictions related to mortality.
The assessment of the models involved diverse scoring techniques provided by the scikit-learn library. These methods encompassed AUC, as discussed above, area under the precision-recall curve (AUC-PR), confusion matrix, specificity, sensitivity, recall, precision, F1 score, and balanced accuracy. Furthermore, the evaluation of the random search cross-validation used the learning_curve function, which provides cross-validated training and test scores across varying training set sizes. To visualize the models' performance, their AUC and AUC-PR values were portrayed using the roc_curve and precision_recall_curve functions from the scikit-learn library. This allowed for a comprehensive understanding of how the models' predictive capabilities were distributed across different thresholds and recall-precision balances. The AUC-PR metric focuses its evaluation on the performance of the positive (minority) class, thus enabling a more targeted assessment of a model's effectiveness in identifying and classifying instances of the minority class, which is often of greater interest in imbalanced scenarios.
Confidence intervals (CIs) for both AUC and AUC-PR were computed to quantify the uncertainty associated with these performance metrics. Bootstrapping, a resampling technique, was employed to generate multiple resampled datasets from the original data, allowing for the creation of distributions of AUC and AUC-PR values. By repeating this process numerous times, confidence intervals were established by determining the lower and upper percentile bounds within the distributions. These intervals provided a measure of the range within which the true AUC and AUC-PR values were likely to fall with a specified confidence level [15].
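The percentile bootstrap described above can be sketched in plain Python as follows. The function names, the number of resamples, and the toy labels and scores are assumptions made for illustration; the AUC is computed with the rank-based (Mann-Whitney) formulation rather than a library call.

```python
import random


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, metric=roc_auc, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a ranking metric."""
    if len(set(y_true)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical labels (1 = died) and predicted risks for 12 patients.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.60, 0.25, 0.35, 0.40, 0.70, 0.80, 0.90]
point_estimate = roc_auc(y_true, scores)
ci_low, ci_high = bootstrap_ci(y_true, scores)
```

The same bootstrap_ci helper could be reused for AUC-PR by passing a different metric function.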

Ethical consideration
The study protocol was approved by the Institutional Human Subjects Ethics Committee of Maccabi Healthcare Services (0075-22-MHS). Written informed consent was waived by the Institutional Review Board of Maccabi Healthcare Services. All performed procedures followed the ethical standards of both the institutional and national research committees.

Results
Throughout the duration of the study, 29,925 individuals were included in the Cognitive Disorder Registry and 68,392 in the control group, totaling 98,317 individuals in the cohort. To ensure the integrity and comparability of our data, 13,029 individuals were excluded during the data preprocessing phase. These individuals were excluded because they did not have records in both periods, T1 (before the pandemic) and T2 (during the pandemic), rather than solely on the basis of unmatched criteria such as age, gender, and socioeconomic status. This approach was taken to accurately assess the impact of COVID-19 vaccination on mortality by comparing individuals with consistent data across both time frames. Of the remaining 85,288 individuals, 68,556 (80%) had received vaccination, while 16,732 (20%) remained unvaccinated. Statistical analysis revealed a significant difference in gender, age, and socioeconomic status (SES) distribution between the vaccinated and unvaccinated groups (p < 0.001), with the unvaccinated group being older and of lower SES (Appendix 1). This demographic disparity was due to the larger number of vaccinated patients in the study sample. During the initial sampling, vaccination status was not considered as a parameter, and the two groups initially showed no significant differences in demographic characteristics.
The unvaccinated group was also disadvantaged relative to the vaccinated group in other respects, including more hospitalization days (see Table 1).
These discrepancies in the continuous variables resulting from the data imbalance between the vaccinated and unvaccinated patients could significantly affect the training of the machine learning models to predict mortality. The imbalance in the distribution of these variables may introduce bias and skew the model's learning process, making it more challenging to accurately predict mortality outcomes for both the vaccinated and unvaccinated groups. To tackle the imbalance, the machine learning models were categorized into four distinct groups: those without weighted classes, models utilizing class-weighting techniques, models incorporating IPTW using logistic regression, and models employing IPTW with GaussianNB (as outlined in eAppendix 2).
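For the two IPTW-based groups, per-patient weights are derived from propensity scores. The sketch below uses the standard unstabilized IPTW formulation (1/e for treated, 1/(1 - e) for untreated); whether the study used exactly this form is not stated, so it should be read as an assumption. The function name, the clipping bounds, and the example scores are hypothetical, and the propensity scores are taken as given rather than fitted.

```python
def iptw_weights(vaccinated, propensity, clip=(0.01, 0.99)):
    """Inverse probability of treatment weights from propensity scores.

    vaccinated: 1 if the patient was vaccinated, 0 otherwise.
    propensity: estimated probability of being vaccinated, e.g. from a
        logistic regression or GaussianNB propensity model.
    clip: assumed bounds that keep extreme scores from producing huge weights.
    """
    weights = []
    for treated, e in zip(vaccinated, propensity):
        e = min(max(e, clip[0]), clip[1])
        weights.append(1.0 / e if treated == 1 else 1.0 / (1.0 - e))
    return weights


# Hypothetical propensity scores for four patients.
vaccinated = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.8, 0.2]
weights = iptw_weights(vaccinated, propensity)
# An unvaccinated patient who "looked likely" to be vaccinated (e = 0.8)
# receives a larger weight than a vaccinated patient with the same score.
```

Weights of this kind are typically supplied to the classifier through a per-sample weight argument at fit time, which is how they would differ from built-in class weighting.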
The models were evaluated on the validation set using a comprehensive suite of scoring metrics, with a particular focus on AUC and AUC-PR curves, as illustrated in Fig. 2. All models, each employing distinct weighting methodologies, consistently exhibited AUC values ranging from 0.96 to 0.98, along with AUC-PR values between 0.82 and 0.89, indicative of robust performance. LightGBM with IPTW computed by GaussianNB attained the highest AUC value, while XGBoost without any weighting yielded the highest AUC-PR value, positioning them as potential optimal models. Despite the promising scores, the prospect of overfitting required attention. While several models demonstrated elevated AUC scores across all sections, certain models displayed overfitting tendencies when training and validation results were compared using the learning_curve method. Notably, the random forest and CatBoost models exhibited overfitting, with random forest displaying the most severe case. The logistic regression model showed very little overfitting, evidenced by the minimal disparity between training and validation AUC values. The results for XGBoost also indicated convergence between training and validation AUC values. While LightGBM demonstrated promising training and validation AUC outcomes, it showed more limited generalization capabilities than XGBoost and logistic regression.
To identify the optimal model for the test set, confidence intervals (CIs) for both AUC and AUC-PR were employed. The XGBoost model utilizing IPTW calculated with logistic regression exhibited the narrowest AUC CI and AUC-PR CI (AUC CI: 0.97624-9, AUC-PR CI: 0.8903-5), suggesting enhanced confidence in its performance assessment. On the test set, the XGBoost model utilizing IPTW calculated with logistic regression yielded an AUC value of 0.9773 and an AUC-PR value of 0.8969. The AUC CI (0.97732-0.97737) and AUC-PR CI (0.8969-0.8971) further substantiate the model's reliable performance.
The feature importance analysis from the selected model (Fig. 3) underscores the substantial influence of the "Vaccinated" feature in predicting mortality, marking it as a compelling determinant of individual longevity that potentially overshadows the significance of other features such as the visiting features (Table 3).

Table 1 Continuous variables analysis. Mean and standard deviation (SD) of the absolute magnitudes of differences between the measurements taken at time points T2 and T1

Discussion
The key findings of this study offer important insights into the effects of COVID-19 vaccination on mortality outcomes in older adults with cognitive disorders and dementia. Among this high-risk population, a lack of vaccination was associated with dramatically increased risks of mortality, despite higher COVID-19 positivity rates in the vaccinated group. These results align with previous studies demonstrating reduced mortality after vaccination in older and medically fragile populations [16-18].
For dementia specialists, the findings of a strong protective effect of COVID-19 vaccination on mortality risk among those with cognitive impairment have broader clinical implications as we enter the post-pandemic period. Continuing to emphasize the significance of regular immunization against high-risk respiratory pathogens, such as influenza and COVID-19, offers clinicians the opportunity to extend the observed advantages in longevity that emerged during the pandemic to this particularly susceptible patient demographic [19]. The optimization of vaccination coverage stands as a potent, yet underutilized, avenue for diminishing preventable mortality and morbidity linked with vaccine-preventable diseases among individuals with dementia [19,20].
The considerable imbalance between vaccinated and unvaccinated groups introduces confounding biases reflecting real-world demographic disparities in vaccination uptake. The unvaccinated individuals tended to be older, frailer, and sicker, with higher rates of comorbid conditions like DM and immunosuppression that could independently increase mortality risk [21,22]. The unvaccinated group also had greater needs for healthcare services, including hospitalization and daily living assistance. By employing rigorous statistical weighting techniques during modeling, our machine learning approach accounted for these imbalances. The feature importance analyses further adjust for factors like dementia severity when identifying vaccination as the top predictor of mortality [23,24].
The feature importance analysis from the optimized XGBoost model illuminates the contribution of various factors in predicting mortality risk within this vaccinated and unvaccinated cohort of older adults with cognitive disorders. The prominent role of the "Vaccinated" feature, especially within those with cognitive disorders, highlights a potential protective association between COVID-19 vaccination and reduced mortality in this high-risk group [8,25]. The difference in hospitalization days for any cause also emerged as an influential indicator of mortality, implying that greater disease severity and more intensive healthcare interventions are closely linked to worse outcomes [26,27].
The collective significance of treatment-related factors, such as costs, access to home care, eligibility for nursing home services, and assistance requirements for activities of daily living, underscores the pivotal role of comprehensive supportive care in shaping mortality outcomes, suggesting that greater care availability promotes longevity in this population [28,29].
Proximity of caregivers emerged as a salient predictor of mortality specifically among those over 65 years of age. This highlights the key role that regular in-person care and assistance from dedicated caregivers may play in promoting longevity in older populations, including those with cognitive impairment [30-32]. On the other hand, visiting features related to frequency of visits from physicians, nurses, and social workers carried little importance in the predictive model. This lack of significance for visiting features likely stems from widespread limitations or reductions in routine in-person healthcare visits during the COVID-19 pandemic period. With temporary disruptions to normal healthcare access, the frequency of visiting various providers was likely diminished and may not have held its typical influence in mortality prediction models [33]. However, this finding underscores the need to re-establish regular visitation and care coordination as the pandemic wanes, particularly for vulnerable seniors and those unable to independently access services [34,35].
The findings from this study have several important implications for clinical practice and policy decisions regarding COVID-19 vaccination of older adults with dementia. First, we provide novel evidence that vaccination conferred significant protection against mortality even during the highly transmissible Omicron period. This adds to the scarce literature and reinforces vaccination as a critical strategy for protecting this vulnerable population.
Second, by applying ML models, we were able to generate personalized risk predictions based on individual patient characteristics. This offers clinicians valuable guidance to help identify those at highest risk of poor outcomes who may warrant more aggressive preventive measures or care planning. Such individualized risk stratification could help optimize resource allocation and clinical management of COVID-19 in this complex patient group [36].
Finally, at a policy level, our results provide public health decision-makers with real-world effectiveness data to inform vaccination prioritization, booster recommendations, and public messaging targeting older adults with dementia and their caregivers. Demonstrating the ongoing protection against severe illness and death enhances confidence in COVID-19 vaccines as a priority intervention for this high-risk population.

Limitations
This study had several limitations. First, the single health system cohort may not fully generalize findings to other populations, though the large diverse sample provides valuable real-world data. Second, while rigorous methods were used to account for confounding, residual confounding from unmeasured factors is possible in this observational study. The analyses would be strengthened with additional details on timing of vaccination, specific vaccine types, and reasons for non-vaccination. Longer-term follow-up is needed to ascertain enduring protective effects. Further, while IPTW was used to balance observed factors, residual bias from unmeasured confounders not included in the propensity score model cannot be ruled out.
Nevertheless, leveraging a robust predictive modeling approach, this study offers clinically useful evidence that COVID-19 vaccination substantially reduces mortality risk in a vulnerable population of older adults with cognitive impairment, underscoring the importance of focused efforts to increase vaccine uptake among those at highest risk. It is possible that some of those who remained unvaccinated represented a very high-risk group. Advanced health issues, disability, or severe cognitive impairment could have impacted their ability to consent to or access vaccination. For those with extreme frailty or dementia, the choice may have been out of their control. Therefore, the unvaccinated in our study may over-represent the oldest and sickest individuals faced with barriers to vaccination. We cannot rule out residual confounding from factors like diminished capacity, despite our adjustments. Generalizability to very frail older populations is limited.

Conclusions
Our analysis reveals a significant impact of COVID-19 vaccination on reducing mortality risk within a large, demographically diverse cohort of older adults, both with and without cognitive impairment. Notably, among those diagnosed with cognitive disorders, vaccination was associated with a remarkable 93% (p < 0.001) survival rate. This rate was derived through a comparative analysis between vaccinated and unvaccinated individuals within the cognitive disorder registry subgroup, highlighting the profound protective effect of vaccination. These findings underscore the urgent need to enhance vaccination efforts and address disparities, ensuring vulnerable populations receive the protection they critically need to prevent mortality. Additionally, the machine learning models, particularly XGBoost with inverse probability of treatment weighting, provide clinically useful tools to reliably predict individual patient mortality risk based on key factors like vaccination status, hospitalization duration, and dementia severity. Overall, the study highlights the vital importance of holistic, patient-centered medical care and equitable access to protective vaccination to improve outcomes for older adults during COVID-19.

Fig. 1 Data integration and pre-processing pipeline for integrated databases

Fig. 2 The effect of imbalance weighting and IPTW on the performance of machine learning models for binary mortality prediction

Table 2
Evaluation metrics: test set results of the XGBoost model using IPTW with logistic regression

Fig. 3 Feature importance analysis within the XGBoost model with IPTW weights calculated using logistic regression

Table 3
Performance comparison of various machine learning models for binary mortality prediction. Description: The table displays performance metrics for various machine learning models employed in binary mortality prediction on the validation set. Evaluation criteria include AUC, AUC-PR, confusion matrix (TP/FP/FN/TN), specificity, sensitivity, recall, precision, F1 score, and balanced accuracy. The models encompass fundamental machine learning algorithms, models employing built-in weighted imbalance methods, and models utilizing inverse probability of treatment weighting, incorporating both logistic regression and GaussianNB.