Development of machine learning models for patients in the high intrahepatic cholangiocarcinoma incidence age group



Intrahepatic cholangiocarcinoma (ICC) has a poor prognosis and is understudied. Based on the clinical features of patients with ICC, we constructed machine learning models to assess how these features affect survival and to accurately determine patient prognosis, aiming to develop reference values that guide physicians in developing more effective treatment plans.


This study used machine learning (ML) algorithms to build prediction models using ICC data on 1,751 patients from the SEER (Surveillance, Epidemiology, and End Results) database and 58 hospital cases. The models’ performances were compared using receiver operating characteristic curve analysis, C-index, and Brier scores.


A total of eight variables were used to construct the ML models. Our analysis identified the random survival forest model as the best for prognostic prediction. In the training cohort, its C-index, Brier score, and Area Under the Curve values were 0.76, 0.124, and 0.882, respectively, and it also performed well in the test cohort. Kaplan–Meier survival analysis revealed that the model could effectively determine patient prognosis.


To our knowledge, this is the first study to develop ML prognostic models for ICC in the high-incidence age group. Of the ML models, the random survival forest model was best at prognosis prediction.


The incidence and mortality rates of intrahepatic cholangiocarcinoma (ICC), which is the second most common primary liver cancer and accounts for 10% of all primary liver cancers, are increasing [1]. Compared with hepatocellular carcinoma (HCC), ICC is less well understood and has a worse prognosis. Although radical surgery is a curative treatment for patients with early-stage ICC, many patients are diagnosed at an advanced stage. Moreover, a study from high-volume centers found that after hepatectomy, patients with ICC have a five-year survival rate of 25–35%, which was mainly attributed to a high recurrence rate [2].

Although the American Joint Committee on Cancer (AJCC) staging system is the most widely used tool for evaluating the prognosis of patients with ICC, its accuracy is limited because it does not account for the effects of treatment, age, and other important factors [3]. Nomograms have been increasingly researched in recent years [4, 5], but they are based on multivariate Cox regression analysis with fixed assigned weights, which makes them outdated and rigid tools [6]. Through machine learning (ML) algorithms, computers can learn from large-scale, disparate healthcare data and then make decisions or predictions without being explicitly programmed. In many tasks, such as diagnosis, classification, and survival prediction, ML models have key advantages over traditional statistical models [7].

According to the 2020 Global Cancer Observatory (Cancer Today), the number of liver cancer cases rose sharply after the age of 50 years, while its incidence fell after the age of 74 years. Therefore, patients aged 50–74 years make up the largest and most representative group of patients with ICC. Focusing on this group, we used ML algorithms to investigate how the clinical features of patients with ICC affect survival and to predict prognosis accurately, aiming to provide reference values for guiding clinicians in making treatment plans.


Patient selection and study variables

Data on patients with ICC who were diagnosed with primary intrahepatic bile duct cancer between 2000 and 2020 were obtained using SEER*Stat software (version 8.4.2). External validation data, with diagnosis supported by clear pathology results, were obtained from Renmin Hospital of Wuhan University. The study’s ethical approval was granted by the Clinical Research Ethics Committee, Renmin Hospital of Wuhan University. Study variables included age, sex, race, marital status, time between diagnosis and treatment, histological grade, AJCC-TNM stage, tumor site surgery information, regional lymph node removal information, tumor size, sequence number, number of tumors (number of malignant tumors in lifetime), sequence of surgery and systemic therapy, chemotherapy, and radiotherapy. Patients with the primary site code C22.1, an age of 50–74 years, and complete follow-up information were included. Those with missing or unclear data records, controversial grouping data, or a survival of less than a month were excluded. When two or more medical records were available, the most recent one was used. The patient screening process is outlined in Fig. 1.

Fig. 1

Flow chart of patients’ selection in the training and test cohorts from the SEER database
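The screening rules described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the field names (`patient_id`, `year`, `primary_site`, `age`, `survival_months`, `vital_status`) are hypothetical and do not correspond to actual SEER*Stat column names, and choosing the most recent record after the eligibility check is one possible reading of the text.

```python
def is_eligible(record):
    """Return True if a record meets the inclusion and exclusion criteria."""
    # Hypothetical field names; a missing value counts as an unclear record.
    required = ("primary_site", "age", "survival_months", "vital_status")
    if any(record.get(field) is None for field in required):
        return False
    return (
        record["primary_site"] == "C22.1"      # intrahepatic bile duct
        and 50 <= record["age"] <= 74          # high-incidence age group
        and record["survival_months"] >= 1     # survival under a month excluded
    )


def select_cohort(records):
    """Keep eligible records, one per patient (the most recent year wins)."""
    latest = {}
    for record in records:
        if not is_eligible(record):
            continue
        pid = record["patient_id"]
        if pid not in latest or record["year"] > latest[pid]["year"]:
            latest[pid] = record
    return list(latest.values())


# Toy records, invented for illustration only.
base = {"primary_site": "C22.1", "age": 62, "survival_months": 14, "vital_status": "dead"}
records = [
    {**base, "patient_id": 1, "year": 2010},
    {**base, "patient_id": 1, "year": 2015, "age": 67},
    {**base, "patient_id": 2, "year": 2012, "age": 45},
    {**base, "patient_id": 3, "year": 2012, "survival_months": 0},
    {**base, "patient_id": 4, "year": 2012, "primary_site": "C22.0"},
    {**base, "patient_id": 5, "year": 2012, "survival_months": None},
]
cohort = select_cohort(records)
print([(r["patient_id"], r["year"]) for r in cohort])
```

Only the 2015 record of patient 1 survives the filter in this toy input; every other record fails one criterion.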

Variable selection and machine learning model construction

Data from 1,751 patients with ICC were randomly divided at a 7:3 ratio into training and internal test cohorts. Univariate and multivariate Cox analyses were then used to identify variables with prognostic value (statistically significant variables with hazard ratios [HR] of > 1 or < 1). The prediction models were constructed using the open-source Python library scikit-survival, version 0.21.0 (Python version 3.11.4) [8].
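A random 7:3 split can be sketched with the Python standard library as follows. The seed and the helper name `split_cohort` are assumptions made for illustration; the study does not state how the split was implemented.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=42):
    """Shuffle patient IDs and split them into training and test lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


train_ids, test_ids = split_cohort(range(1751))
print(len(train_ids), len(test_ids))
```

With 1,751 patients, rounding 70% gives 1,226 training and 525 test patients. Library helpers may round the cohort sizes differently.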

Evaluation of model prediction accuracy and superiority

C-index, time-dependent AUC, and Brier score analyses were used to assess model prediction accuracy [9, 10]. The Brier score measures the difference between the predicted probability and the true outcome, with higher scores indicating poorer prediction accuracy and calibration [11]. The ML models were compared and analyzed using decision curve analysis (DCA). The best cutoff value for risk grouping was determined using the X-tile software [12]. The patients were then classified into the high-, medium-, or low-risk groups. The differences in the groups’ overall survival (OS) rates and actual patient survival probabilities were determined using Kaplan–Meier (KM) analysis.
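To make the two headline metrics concrete, the sketch below computes Harrell's C-index and a Brier score at a fixed time horizon on invented toy data. It is a simplified illustration, not the study's evaluation code: the Brier score here simply skips patients censored before the horizon, whereas survival-analysis libraries typically apply censoring weights. All values and names are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # the earlier time in a pair must be an observed death
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk died first: concordant
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied risk counts as half
    return concordant / comparable


def brier_score_at(times, events, survival_probs, horizon):
    """Mean squared error between predicted survival at `horizon` and status.

    Patients censored before `horizon` are skipped (no censoring weights).
    """
    errors = []
    for t, e, p in zip(times, events, survival_probs):
        if t > horizon:
            observed = 1.0   # known to be alive at the horizon
        elif e:
            observed = 0.0   # died at or before the horizon
        else:
            continue         # censored early: status at the horizon unknown
        errors.append((observed - p) ** 2)
    return sum(errors) / len(errors)


# Toy data: follow-up in months, event flag (1 = death), model outputs.
times = [5, 12, 20, 30, 40]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.3, 0.4, 0.5, 0.1]
survival_at_24 = [0.2, 0.6, 0.5, 0.7, 0.9]

c_index = concordance_index(times, events, risk_scores)
brier = brier_score_at(times, events, survival_at_24, horizon=24)
print(f"C-index = {c_index:.2f}, Brier score at 24 months = {brier:.3f}")
```

In this toy input, six of the eight comparable pairs are ordered correctly, so the C-index is 0.75. A lower Brier score means the predicted probabilities are closer to the observed outcomes.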

Interpretation of the random survival forest (RSF) model

The model was interpreted in two ways: with SHapley Additive exPlanations (SHAP) plots and through a Java-based prediction website. SHAP is a Python package for interpreting ML models. For each prediction sample, a SHAP value is assigned to each feature, and the larger the absolute SHAP value, the greater the feature’s influence. The value’s sign indicates whether the feature affects the result positively or negatively [13, 14]. To improve this study’s practical value, an interactive website was developed, on which one-, three-, and five-year OS can be calculated automatically by entering the required clinical information.
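The idea behind a per-feature SHAP value can be illustrated with an exact Shapley computation on a tiny, made-up risk function. This is a sketch under strong assumptions: `toy_risk` is hypothetical and is not the RSF model, and a single reference patient stands in for the background data that the SHAP package would normally average over.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the reference patient
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to the patient's value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    """Hypothetical risk score with an interaction between surgery and stage."""
    return (10 + 30 * x["no_surgery"] + 20 * x["stage_iv"]
            + 15 * x["no_surgery"] * x["stage_iv"] - 5 * x["small_tumor"])


patient = {"no_surgery": 1, "stage_iv": 1, "small_tumor": 0}
reference = {"no_surgery": 0, "stage_iv": 0, "small_tumor": 1}

phi = shapley_values(toy_risk, patient, reference)
print(phi)
```

The values sum to the difference between the patient's predicted risk and the reference prediction, which is the additivity property that makes SHAP plots readable: positive values push the prediction toward higher risk and negative values toward lower risk.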

Statistical analyses

Statistical analyses were performed in R version 4.2.1 and Python version 3.11.4. The “survival”, “survminer”, and “timeROC” packages were used for univariate and multivariate Cox regression, forest plots, and receiver operating characteristic (ROC) analyses, respectively. HRs of > 1 and < 1 indicate risk and protective factors, respectively. A Chi-square test was used to assess differences in the distribution of variables between the two cohorts. Survival rates were compared using a log-rank test. All statistical tests were two-sided, with P < 0.05 indicating statistically significant differences.


Baseline characteristics of the training and test cohorts

The study involved 1,751 patients (women: 830) with ICC from the SEER database, who were divided into the training (N = 1226, 70%) and test (N = 525, 30%) cohorts. More than half of the patients started treatment within a month of diagnosis, and almost all were treated within three months. Although the number of patients with various TNM stages was about the same, the histological grade of the tumors was mainly moderately or poorly differentiated, and highly differentiated or undifferentiated tumors were less common. Many studies indicate that surgery is the main treatment strategy for ICC [15, 16]. In the training cohort, most patients underwent hepatectomy, including wedge or segmental resection, lobectomy, extended lobectomy, hepatectomy, and bile duct excision. Very few patients received liver transplantation or local tumor destruction, such as cryotherapy, photodynamic therapy, and radiofrequency ablation. However, 33% of the patients did not undergo surgery. Notably, most patients also received chemotherapy. The training and test cohorts’ baseline data are shown in Table 1.

Table 1 Demographic and clinical characteristics of patients with intrahepatic cholangiocarcinoma

Variable selection

Cox regression analyses were used to identify prognostic variables in the training cohort. The study included 15 variables, and after univariate Cox analysis, age, race, marital status, and radiotherapy were excluded (Table S1). This analysis was followed by a multivariate Cox analysis (Table S2). Analysis revealed that the number of malignant tumors and the sequence number were highly collinear. Therefore, only the malignant tumors’ sequence number was retained. Finally, eight statistically significant prognostic factors were selected (Fig. 2A).

Fig. 2

Demonstration of multivariate Cox analysis and analysis of different months from diagnosis to treatment. A Forest plot based on multivariate Cox regression analysis. B Bar plot of important features of ICC patients in different months from diagnosis to treatment. The vertical coordinate is the percentage of the feature subgroup in the group

As shown in the figure, OS was slightly worse in men than in women, which is consistent with previous reports on HCC and most other cancers [17, 18]. Surprisingly, the HR was higher in the group treated within one month of diagnosis than in the group treated after more than three months, which is counterintuitive and contrary to several reports [19]. We therefore hypothesized that this feature was overshadowed by other important features because of a small sample size and conducted a correlation analysis. In the “0–1 month” group (Fig. 2B), the proportion of patients with TNM stage I disease was significantly lower than the proportion with stage IV disease, whereas the opposite trend was observed in the “>3 months” group, which had a significantly higher percentage of stage I patients than the other two groups. This analysis also revealed that most tumors in the “>3 months” group were < 5 cm in size, whereas the “0–1 month” group had a substantial number of tumors that were > 10 cm in size.

Histological grade, AJCC-TNM stage, and tumor size correlated negatively with OS and, unsurprisingly, surgery and chemotherapy prolonged OS, with hepatectomy and liver transplantation (LT) being significantly better than local tumor destruction. However, the analysis did not reveal an advantage of LT over hepatectomy, probably because LT data were available for only about 2% of the cases, so large individual differences may have affected the outcomes. Interestingly, patients in the “1st of 2 or more” group had relatively better prognoses than patients with ICC only. However, there were no significant differences when compared with the “not 1st primary” group.

ML model construction and comparison

Cox proportional hazards (CPH), survival tree (Tree), gradient boosted machine (GBM), and RSF models were developed based on the training cohort and their parameters were optimized using five-fold cross-validation (Table S3 and Fig. S1). To evaluate the models’ performances, their C-indexes and Brier scores were first calculated (Table 2). These analyses revealed that the RSF model performed best, with a high C-index (0.76) and a low Brier score (0.124). Next, we calculated the four models’ Area Under the Curve (AUC) values over time (Fig. 3A). This analysis revealed that for the RSF model, the average AUC value was 0.882, which was markedly higher than the AUCs of the other models. Importantly, the RSF model’s AUC value in the first year was higher than in the other periods, indicating that it could predict short-term prognosis more accurately. DCA revealed that the use of our models, especially the RSF model, to guide treatment can benefit patients (Fig. 3B–D). Because the RSF model performed much better than the other three models, it was used for follow-up analyses.

Table 2 C-index and Brier score of machine learning models
Fig. 3

Evaluation of the performance of four ML models. A Time-dependent AUC for the four models. B-D DCA of ML models for one-year, three-year, and five-year OS prediction in the training cohort

Validation of the RSF model’s performance

The RSF model’s performance was validated in internal and external test cohorts. The external test cohort had 58 patients (S2). The RSF model performed well in both cohorts (both C-indexes: >0.72, both Brier scores: <0.18, Table S4). In the internal test cohort, ROC curve analysis revealed that the model’s AUC values for one-, three-, and five-year OS were 0.774, 0.789, and 0.815, respectively. However, because of an insufficient sample size, fifth-year ROC curve analysis could not be conducted on the external test cohort. The analysis was therefore done for the second year. Surprisingly, in the external test cohort, the model’s predictive accuracy was high in the first year (AUC: 0.937), and it also performed well in the second (AUC: 0.795) and third years (AUC: 0.727). The model was further evaluated by comparing the consistency between actual survival probabilities and the predicted probabilities (Fig. 4C–F). This analysis revealed that the model’s predictions were highly consistent with the actual situation.

Fig. 4

ROC curves and calibration curves of the RSF model in test cohorts. A ROC curves for RSF model predicting 1-, 3-, and 5-year OS in the internal test cohort. B ROC curves for RSF model predicting 1-, 2-, and 3-year OS in the external test cohort. C–E Calibration curves for the first (C), third (D), and fifth (E) years in the internal test cohort. F–H Calibration curves for the first (F), second (G), and third (H) years in the external test cohort

Risk stratification based on the RSF model

The ability of TNM staging to predict patient prognosis was poor (Fig. 5A). We therefore developed a risk stratification system based on the training cohort’s patient risk scores (Fig. 5B). Patient risk scores were determined from the RSF model’s predictions and they ranged from 17.7 to 221.3, with scores of < 83.5 indicating low risk, scores of > 136.1 indicating high risk, and scores that fall between these values indicating intermediate risk. KM analysis revealed that patients in various subgroups had significantly different OS rates (Fig. 5C), with the high-risk group having the worst prognosis and the low-risk group having the best prognosis.
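The three-level grouping can be written as a small lookup that uses the cutoffs reported above (83.5 and 136.1). The function name and the example scores are illustrative assumptions; how the RSF risk score itself is produced is not shown here.

```python
LOW_CUTOFF = 83.5    # scores below this value: low risk
HIGH_CUTOFF = 136.1  # scores above this value: high risk


def risk_group(score):
    """Map an RSF risk score to the low-, intermediate-, or high-risk group."""
    if score < LOW_CUTOFF:
        return "low"
    if score > HIGH_CUTOFF:
        return "high"
    return "intermediate"


# Example scores spanning the reported range (17.7 to 221.3).
example_scores = [17.7, 83.5, 100.0, 136.1, 221.3]
groups = [risk_group(s) for s in example_scores]
print(list(zip(example_scores, groups)))
```

Scores equal to either cutoff fall into the intermediate group, following the strict inequalities in the text.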

Fig. 5

Risk stratification system based on RSF model. A Survival curves based on TNM stage. B Cut off values for optimal grouping determined using X-tile. C KM survival curves based on RSF model

The RSF model’s feature importance and interpretation

The SHAP technique calculates each feature’s contribution to the model’s final prediction for any instance, x_i. In the SHAP figure (Fig. 6), the model’s variables are listed in descending order of importance, with the variable ‘whether the tumor primary site underwent surgery’ being the most important. Positive SHAP values indicate an increased probability of “death”, with higher values indicating higher risk and vice versa. The results indicate that ‘no surgery at the tumor primary site’ and TNM stage IV increased the probability of “death”, whereas a tumor size of < 5 cm increased the probability of “survival”. To demonstrate prognosis prediction, three patients were randomly selected from the training cohort (Fig. 6B–D). To present our model more intuitively and facilitate its use by clinicians, we developed a website where users can predict OS by entering their data and then clicking “determine” to get the predicted results. The model can also be used to assess whether a treatment is beneficial. By holding all other variables constant and entering a different treatment, one can see whether the predicted outcome improves or worsens, thereby determining whether an intervention is beneficial.

Fig. 6

The SHAP plot of the RSF model. A SHAP beeswarm summary plot on the impact of input variables on the RSF model’s prediction. B The local SHAP plot of patient #1. Patient #1: 50-year-old male, survival time was 1 month, died. AJCC TNM stage was IV, histological grade was IV, tumor size = 12.5 cm. He was treated immediately after diagnosis, underwent hepatectomy and chemotherapy, and only had ICC in his life. C The local SHAP plot of patient #2. Patient #2: 66-year-old female, survival time was 45 months, died. AJCC TNM stage was IV, histological grade was II, tumor size = 2.0 cm. She was treated 1 month after diagnosis, underwent hepatectomy and chemotherapy, and only had ICC in her life. D The local SHAP plot of patient #3. Patient #3: 66-year-old female, survival time was 18 months, alive. AJCC TNM stage was I, histological grade was II, tumor size = 3.2 cm. She was treated 1 month after diagnosis and underwent hepatectomy and chemotherapy. Prior to being diagnosed with ICC, she had multiple malignant tumors. The red ribbons in the local SHAP plot represent risk factors that lead to a poor prognosis, whereas the blue ribbons are the relatively protective factors


Despite liver cancer incidence and mortality increasing annually, accurate prognostic models for guiding clinical decisions are lacking since most current models are Cox regression-based nomograms. Here, we found that ICC incidence is highest in the 50–74 years age group and sought to develop CPH, Tree, GBM, and RSF models for predicting ICC prognosis in this age group [20]. Our findings indicate that ML models exhibit good predictive performance, with the RSF model exhibiting the highest prognosis prediction accuracy. In the training cohort, the RSF model had a C-index of 0.76, a Brier score of 0.124, and an average AUC value of 0.882. Further validation analysis of the model’s utility and accuracy using internal and external tests revealed C-indexes of 0.72 (internal) and 0.80 (external).

Based on Cox regression analyses, our model incorporated eight variables. In most cancers, women have better OS than men [17, 18]. Cong et al. suggested that this may be because women have a better liver foundation since only about 49% of women with liver cancer have cirrhosis when compared to 68% of men [21]. Other studies indicate that the longer OS may be because of earlier liver cancer detection since more women undergo regular ultrasound and α-fetoprotein (AFP) tests, and therefore have better treatment results [22]. However, it is also reported that molecular factors may account for gender differences in OS, such as differential CXCL14, ATF5HAMP, and GPR37 expression, and different levels of Notch and PI3K/AKT signaling [23]. Regarding how the time between diagnosis and treatment affects prognosis, there are discrepancies in reported studies. One study reported that delay in the time from diagnosis to treatment did not significantly affect the OS of patients with liver cancer [24]. However, Tsai et al. reported that in early liver cancer, the longer the time between diagnosis and treatment, the lower the survival rate [25]. Interestingly, some studies indicate that the time interval between diagnosis and treatment may not correlate significantly with prognosis and that it may correlate with prognosis positively or negatively [26]. Therefore, the effect of this factor on ICC prognosis remains controversial.

Histologically, ICC is a highly-to-moderately differentiated adenocarcinoma that in the early stages, often invades the portal vein, lymphatic vessels, and intrahepatic nerves [27]. Moreover, the larger the tumor, the higher the vascular invasion incidence, and in many cancers, tumor size is a prognostic factor [28, 29]. Tumors with a size of ≤ 2 cm have been associated with a good five-year survival rate (63.4%), while patients with tumors that are ≤ 2 cm and no lymph node metastasis, portal vein invasion, or biliary ductal invasion, have a 100% five-year survival rate. However, tumors with a size of > 2 cm are associated with a decline in the five-year survival rate [30]. Although tumor size affects prognosis, resection indications are not limited to tumor size.

Our results indicate that the AJCC-TNM stage and surgery at the tumor site are the most important factors affecting OS. It is reported that at the time of diagnosis, only 20–30% of patients are eligible for resection, mainly because of multifocal tumors and metastases [31]. The main surgical intervention for ICC is hepatectomy, which offers patients about three years of disease-free survival. For advanced, localized, or unresectable ICC, local treatments like transcatheter arterial chemoembolization and thermal ablation are widely used, which, as palliative care, can significantly prolong OS [32]. Importantly, for tumor sizes of < 3 cm, thermal ablation is reported to have a similar impact on survival as hepatectomy and a lower complication rate while being less expensive [33]. LT efficacy in ICC is reported to be significantly worse than that of HCC [34, 35]. However, although LT is often a treatment option for unresectable malignant liver and bile duct tumors, our findings do not show LT’s superiority over hepatectomy because of an insufficient amount of data. However, a recent study reported satisfactory LT results showing that in carefully selected patients with ICC, when combined with neoadjuvant chemotherapy, LT resulted in a five-year OS rate of 83.3% and a five-year disease-free survival of 50% [36]. Therefore, for ICC, it is important to identify ideal LT candidates, and further research is needed.

Over the past decade, gemcitabine and cisplatin have become the standard postoperative adjuvant ICC therapy. It is also reported that neoadjuvant therapy can benefit the survival of patients with ICC [37, 38]. Other studies have shown that surprisingly, combining trans-arterial drug-eluting bead therapy with chemotherapy was efficacious [31, 39]. It is interesting to note the effect of the variable, ‘sequence number’, on ICC prognosis in this study. Although we did not identify relevant ICC studies, a study by Heo et al. found that in HCC, patients with cancer and longer survival had a higher risk of developing a second primary tumor, indicating that patients who developed a second primary tumor survived relatively longer [40]. Similar results were reported by Wang et al. for small cell lung cancer [41], who showed that patients with lung cancer (LC) may die prematurely because of poorer health or higher tumor malignancy, without suffering from other tumors. Moreover, patients with LC, who develop additional tumors, inevitably receive additional antitumor therapy, which may also act as anti-LC therapy. Finally, patients with simple LC may have defective immune surveillance, which may lead to “immune escape”, whereas secondary tumors may activate cancer-related immune mechanisms. These factors may also apply to ICC.

Our study has several strengths. Using the latest SEER data, we are, to our knowledge, the first to use ML algorithms to construct prognostic models for the high ICC incidence age group. We addressed the visualization and application challenges of ML models using the SHAP technique and by developing a prediction website. However, this study has limitations. First, because it is retrospective, it may have selection bias. Therefore, prospective studies are needed to validate our findings. Second, the SEER database only covers the U.S., and the external test cohort used in this study had only 58 patients. This study would have been more robust if it had involved larger datasets. Third, because of SEER database limitations, some potentially important variables, such as targeted therapy, immunotherapy, and genetic factors, were not available, and including them may improve the performance of ML models.


This study used eight variables to construct ML models for predicting the prognosis of patients in the high ICC incidence age group. Our analyses indicate that the RSF model could predict ICC OS most accurately.

Data availability

The dataset used in this study can be requested from the SEER source website at


  1. Sposito C, Droz Dit Busset M, Virdis M, Citterio D, Flores M, Bongini M, et al. The role of lymphadenectomy in the surgical treatment of intrahepatic cholangiocarcinoma: a review. Eur J Surg Oncol. 2022;48(1):150–9.

  2. Spolverato G, Kim Y, Ejaz A, Alexandrescu S, Marques H, Aldrighetti L, et al. Conditional probability of long-term Survival after Liver Resection for Intrahepatic Cholangiocarcinoma: a multi-institutional analysis of 535 patients. JAMA Surg. 2015;150(6):538–45.

  3. Büttner S, Galjart B, Beumer BR, van Vugt JLA, van Eijck CHJ, Polak WG, et al. Quality and performance of validated prognostic models for survival after resection of intrahepatic cholangiocarcinoma: a systematic review and meta-analysis. HPB (Oxford). 2021;23(1):25–36.

  4. Wang Y, Li J, Xia Y, Gong R, Wang K, Yan Z, et al. Prognostic nomogram for intrahepatic cholangiocarcinoma after partial hepatectomy. J Clin Oncol. 2013;31(9):1188–95.

  5. Hyder O, Marques H, Pulitano C, Marsh JW, Alexandrescu S, Bauer TW, et al. A nomogram to predict long-term survival after resection for intrahepatic cholangiocarcinoma: an eastern and western experience. JAMA Surg. 2014;149(5):432–8.

  6. Ji GW, Jiao CY, Xu ZG, Li XC, Wang K, Wang XH. Development and validation of a gradient boosting machine to predict prognosis after liver resection for intrahepatic cholangiocarcinoma. BMC Cancer. 2022;22(1):258.

  7. Ngiam KY, Khor IW. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 2019;20(5):e262–73.

  8. Pölsterl S. scikit-survival: a library for time-to-event analysis built on top of scikit-learn. J Mach Learn Res. 2020;21(212):1–6.

  9. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. 2010;5(9):1315–6.

  10. Van Calster B, Vergouwe Y, Looman CW, Van Belle V, Timmerman D, Steyerberg EW. Assessing the discriminative ability of risk models for more than two outcome categories. Eur J Epidemiol. 2012;27(10):761–70.

  11. Dankers F, Traverso A, Wee L, van Kuijk SMJ. Prediction Modeling Methodology. In: Kubben P, Dumontier M, Dekker A, editors. Fundamentals of Clinical Data Science. Cham (CH): Springer; 2019. p. 101–20.

  12. Camp RL, Dolled-Filhart M, Rimm DL. X-tile: a new bio-informatics tool for biomarker assessment and outcome-based cut-point optimization. Clin Cancer Res. 2004;10(21):7252–9.

  13. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Proceedings of the 31st International Conference on Neural Information Processing Systems; Long Beach, California, USA: Curran Associates Inc.; 2017. pp. 4768–77.

  14. Lin J, Yin M, Liu L, Gao J, Yu C, Liu X, et al. The development of a Prediction Model based on Random Survival Forest for the postoperative prognosis of pancreatic Cancer: a SEER-Based study. Cancers (Basel). 2022;14(19).

  15. Krenzien F, Nevermann N, Krombholz A, Benzing C, Haber P, Fehrenbach U, et al. Treatment of Intrahepatic Cholangiocarcinoma-A Multidisciplinary Approach. Cancers (Basel). 2022;14(2).

  16. Diggs LP, Fagenson AM, Putatunda V, Lau KN, Grandhi MS, Pitt HA. Intrahepatic cholangiocarcinoma: how do hepatectomy outcomes compare to liver metastases and hepatocellular carcinoma? HPB (Oxford). 2023;25(11):1420–8.

  17. Zhang H, Han J, Xing H, Li ZL, Schwartz ME, Zhou YH, et al. Sex difference in recurrence and survival after liver resection for hepatocellular carcinoma: a multicenter study. Surgery. 2019;165(3):516–24.

  18. Dong M, Cioffi G, Wang J, Waite KA, Ostrom QT, Kruchko C, et al. Sex differences in Cancer incidence and survival: a Pan-cancer Analysis. Cancer Epidemiol Biomarkers Prev. 2020;29(7):1389–97.

  19. Cone EB, Marchese M, Paciotti M, Nguyen DD, Nabi J, Cole AP, et al. Assessment of Time-to-Treatment Initiation and Survival in a cohort of patients with common cancers. JAMA Netw Open. 2020;3(12):e2030072.

  20. Zhou Y, McArdle JJ. Rationale and applications of Survival Tree and Survival Ensemble methods. Psychometrika. 2015;80(3):811–33.

  21. Cong WM, Wu MC, Zhang XH, Chen H, Yuan JY. Primary hepatocellular carcinoma in women of mainland China. A clinicopathologic analysis of 104 patients. Cancer. 1993;71(10):2941–5.

  22. Dohmen K, Shigematsu H, Irie K, Ishibashi H. Longer survival in female than male with hepatocellular carcinoma. J Gastroenterol Hepatol. 2003;18(3):267–72.

  23. Natri HM, Wilson MA, Buetow KH. Distinct molecular etiologies of male and female hepatocellular carcinoma. BMC Cancer. 2019;19(1):951.

  24. Rao A, Rich NE, Marrero JA, Yopp AC, Singal AG. Diagnostic and therapeutic delays in patients with Hepatocellular Carcinoma. J Natl Compr Canc Netw. 2021;19(9):1063–71.

  25. Tsai WC, Kung PT, Wang YH, Kuo WY, Li YH. Influence of the time interval from diagnosis to treatment on survival for early-stage liver cancer. PLoS ONE. 2018;13(6):e0199532.

  26. Jacobsen MM, Silverstein SC, Quinn M, Waterston LB, Thomas CA, Benneyan JC, et al. Timeliness of access to lung cancer diagnosis and treatment: a scoping literature review. Lung Cancer. 2017;112:156–64.

  27. Vijgen S, Terris B, Rubbia-Brandt L. Pathology of intrahepatic cholangiocarcinoma. Hepatobiliary Surg Nutr. 2017;6(1):22–34.

  28. Feng H, Lyu Z, Zheng J, Zheng C, Wu Q, Liang W, et al. Association of tumor size with prognosis in colon cancer: a Surveillance, Epidemiology, and end results (SEER) database analysis. Surgery. 2021;169(5):1116–23.

  29. Yang F, Chen H, Xiang J, Zhang Y, Zhou J, Hu H, et al. Relationship between tumor size and disease stage in non-small cell lung cancer. BMC Cancer. 2010;10:474.

  30. Kubo S, Shinkawa H, Asaoka Y, Ioka T, Igaki H, Izumi N, et al. Liver Cancer Study Group of Japan Clinical Practice Guidelines for Intrahepatic Cholangiocarcinoma. Liver cancer. 2022;11(4):290–314.

  31. Moris D, Palta M, Kim C, Allen PJ, Morse MA, Lidsky ME. Advances in the treatment of intrahepatic cholangiocarcinoma: an overview of the current and future therapeutic landscape for clinicians. CA Cancer J Clin. 2023;73(2):198–222.

  32. Fabritius MP, Ben Khaled N, Kunz WG, Ricke J, Seidensticker M. Image-guided local treatment for Unresectable Intrahepatic Cholangiocarcinoma-Role of Interventional Radiology. J Clin Med. 2021;10(23).

  33. Zhang SJ, Hu P, Wang N, Shen Q, Sun AX, Kuang M, et al. Thermal ablation versus repeated hepatic resection for recurrent intrahepatic cholangiocarcinoma. Ann Surg Oncol. 2013;20(11):3596–602.

  34. Becker NS, Rodriguez JA, Barshes NR, O’Mahony CA, Goss JA, Aloia TA. Outcomes analysis for 280 patients with cholangiocarcinoma treated with liver transplantation over an 18-year period. J Gastrointest Surg. 2008;12(1):117–22.

  35. Sapisochin G, de Lope CR, Gastaca M, de Urbina JO, López-Andujar R, Palacios F, et al. Intrahepatic cholangiocarcinoma or mixed hepatocellular-cholangiocarcinoma in patients undergoing liver transplantation: a Spanish matched cohort multicenter study. Ann Surg. 2014;259(5):944–52.

  36. Sapisochin G, Ivanics T, Heimbach J. Liver transplantation for Intrahepatic Cholangiocarcinoma: Ready for Prime Time? Hepatology (Baltimore, Md). 2022;75(2):455–72.

  37. Rizzo A, Brandi G. Neoadjuvant therapy for cholangiocarcinoma: a comprehensive literature review. Cancer Treat Res Commun. 2021;27:100354.

  38. Yadav S, Xie H, Bin-Riaz I, Sharma P, Durani U, Goyal G, et al. Neoadjuvant vs. adjuvant chemotherapy for cholangiocarcinoma: a propensity score matched analysis. Eur J Surg Oncol. 2019;45(8):1432–8.

  39. Martin RCG 2nd, Simo KA, Hansen P, Rocha F, Philips P, McMasters KM, et al. Drug-eluting bead, Irinotecan Therapy of Unresectable Intrahepatic Cholangiocarcinoma (DELTIC) with concomitant systemic gemcitabine and cisplatin. Ann Surg Oncol. 2022;29(9):5462–73.

  40. Heo J, Noh OK, Oh YT, Chun M, Kim L. Second primary cancer after liver transplantation in hepatocellular carcinoma: a nationwide population-based study. Hepatol Int. 2017;11(6):523–8.

  41. Wang S, Hu S, Huang S, Su L, Guo Q, Wu B, et al. Better survival and prognosis in SCLC survivors after combined second primary malignancies: a SEER database-based study. Medicine (Baltimore). 2023;102(6):e32772.


We thank Yang Hao for his contribution to building the online website in Java.


This research was funded by the National Key Research and Development Program of China (2022YFC2407304), the Natural Science Foundation of Hubei Province (2022CFB122), and the National Natural Science Foundation of China (82370654).

Author information

Authors and Affiliations



JS, YZ, and DY were involved in the conception and design of this study. JS and YZ wrote the main manuscript text. JS, YZ, ZW, and DY provided methodology. JS and JP prepared the dataset. JS and YZ performed the analysis. ZW, JP, and XW interpreted the results. All authors contributed to writing the final draft and prepared the final manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Kailiang Zhao or Youming Ding.

Ethics declarations

Ethics approval and consent to participate

Approval of the research protocol by an Institutional Review Board: This study was approved by the Clinical Research Ethics Committee, Renmin Hospital of Wuhan University (Approval Number: WDRY2023-K064). Owing to the retrospective nature of the study, the same committee waived the need for informed consent.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Shen, J., Yang, D., Zhou, Y. et al. Development of machine learning models for patients in the high intrahepatic cholangiocarcinoma incidence age group. BMC Geriatr 24, 553 (2024).
