Multimorbidity patterns in the elderly: a prospective cohort study with cluster analysis

Background: Multimorbidity is the coexistence of two or more chronic diseases in the same individual; however, there is no consensus about the best definition. In addition, few studies have described the variability of multimorbidity patterns over time. The aim of this study was to identify multimorbidity patterns and their variability over a 6-year period in patients older than 65 years attended in primary health care.
Methods: A cohort study with yearly cross-sectional analysis of electronic health records from 50 primary health care centres in Barcelona. Selected patients had multimorbidity and were 65 years of age or older in 2009. Diagnoses (International Classification of Primary Care, second edition) were extracted using O'Halloran criteria for chronic diseases. Multimorbidity patterns were identified using two steps: 1) multiple correspondence analysis and 2) k-means clustering. Analysis was stratified by sex and age group (65–79 and ≥80 years) at the beginning of the study period.
Results: Analysis of 2009 electronic health records from 190,108 patients with multimorbidity (59.8% women) found a mean age of 71.8 years for the 65–79 age group and 84.16 years for those over 80 (standard deviation [SD] 4.35 and 3.46, respectively); the median number of chronic diseases was seven (interquartile range [IQR] 5–10). We obtained 6 clusters of multimorbidity patterns (1 nonspecific and 5 specific) in each group; the specific patterns were Musculoskeletal, Endocrine-metabolic, Digestive/Digestive-respiratory, Neuropsychiatric, and Cardiovascular. A minimum of 42.5% of the sample remained in the same pattern at the end of the study, reflecting the stability of these patterns.
Conclusions: This study identified six multimorbidity patterns in each group: one nonspecific pattern and five specific patterns, each related to an organic system. The multimorbidity patterns obtained had similar characteristics throughout the study period. These data are useful to improve clinical management of each specific subgroup of patients showing a particular multimorbidity pattern.
Electronic supplementary material: The online version of this article (10.1186/s12877-018-0705-7) contains supplementary material, which is available to authorized users.


Background
Multimorbidity is defined as the coexistence of two or more chronic diseases [1,2]. Although overall life expectancy and healthy life years have increased worldwide, quality of life and functional capacity have worsened [3] owing to chronic conditions strongly related to ageing. Some studies predict a rise in prevalence of these conditions [4]; population multimorbidity prevalence currently ranges from 12.9% to 95.1% [5]. In addition, rates of hospitalization and treatment for people with chronic diseases have soared; consequently, the burden of disease on health systems in general, and on primary health care in particular, is expected to grow [3].
Although life expectancy has increased in the last century [3], research on multimorbidity has been limited and has focused on describing prevalence, estimating severity, and assessing quality of life [6,7].
In clinical practice, individual patients often present with a collection of chronic diseases which may or may not have a common aetiology, but which require greatly differing and often incompatible management. Prevalence studies, mostly with cross-sectional designs, have identified multimorbidity patterns in patients older than 65 years, but few prospective longitudinal studies have been published and none of them has analysed a period longer than 4 years [5]. With better knowledge about the evolution of multimorbidity patterns, the joint management of several chronic diseases could be more effective.
On the other hand, most of the published studies considered diseases, not individuals, as the variable of analysis in assessing multimorbidity patterns. This inhibits an exploration of multimorbidity patterns that takes into account their trajectories and evolution along the individual's lifetime.
Finally, no consensus has been established about a standard model to determine multimorbidity patterns. Published studies differ in the variables included, such as the unit of analysis selected (patients versus diseases), the statistical method for grouping diseases (factor analysis vs. cluster analysis), diseases included (chronic and/or acute), and number of diseases considered [8,9]. Among the available methods, non-hierarchical cluster analysis assigns patients to a specified number of clusters [10]. The results are less susceptible to outliers in the data, the influence of the distance measure chosen, or the inclusion of inappropriate or irrelevant variables. Some non-hierarchical cluster analysis methods, like k-means, use algorithms that do not need a distance matrix and can analyse extremely large data sets [10][11][12].
The aim of this study was to identify multimorbidity patterns over a six-year study period in electronic health records from a Mediterranean urban population older than 65 years and with multimorbidity, attended in primary health care centres in Barcelona (Spain).

Design, setting, and study population
A cohort study with a cross-sectional analysis was carried out in each year of the study period, from 2009 to 2014, in Barcelona, Catalonia (Spain), a city in the Mediterranean region with 1,619,337 inhabitants (31/12/2009) [13]. The Spanish National Health Service provides universal coverage, financed mainly by tax revenue. The Catalan Health Institute (CHI) manages 50 primary health care centres (PHCs) in Barcelona that represent 74% of the population [14]. The CHI's Information System for Research in Primary Care (SIDIAP) contains the clinical information as electronic health records (EHR) recorded by its PHCs since 2006 [15][16][17].
Inclusion criteria were 65-94 years of age on 31 December 2009 and at least one PHC visit during the 6-year study period. From the initial sample of 206,146 (Fig. 1), we excluded people who moved or otherwise sought care outside the CHI system. The only reason to exit the cohort was death (n = 24,013), and no new participants were introduced during the study period.
Prevalence of individual conditions varies with age, as do multimorbidity and disease patterns. In order to obtain a more homogeneous sample in terms of multimorbidity, we focused on patients from Barcelona city with multimorbidity, defined as 2 or more diagnoses of chronic disease active as of 31 December 2009. We obtained information on that population over 6 years and analysed the data 6 times at cross-sectional time points, every December from 2009 to 2014. However, mortality data were obtained 5 times, from 2010 to 2014.

Coding and selection of diseases
Diseases are coded in SIDIAP using International Classification of Diseases version 10 (ICD-10). We mapped ICD-10 codes to International Classification of Primary Care, second edition (ICPC-2) codes in order to select chronic diseases by O'Halloran criteria [18] based on the ICPC-2. We only considered chronic diseases with a prevalence over 1% to avoid spurious associations and obtain epidemiologically coherent patterns. Chronic diseases were coded as a dichotomous variable.
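The selection rule described above (keep chronic diseases with a prevalence over 1%, then code each as present or absent) can be sketched as follows. This is a minimal illustration in Python, not the code used in the study; the function name, the input layout (one set of active diagnoses per patient) and the toy data are assumptions made for the example.

```python
def select_chronic_diseases(records, min_prevalence=0.01):
    """Return (diseases kept, indicator rows) for one analysis year.

    records: dict mapping a patient id to the set of that patient's active
    chronic diagnoses (hypothetical input layout, not the SIDIAP schema).
    """
    n_patients = len(records)
    counts = {}
    for diagnoses in records.values():
        for disease in diagnoses:
            counts[disease] = counts.get(disease, 0) + 1
    # Keep only diseases whose prevalence is strictly over the threshold.
    kept = sorted(d for d, c in counts.items() if c / n_patients > min_prevalence)
    # Dichotomous coding: 1 = diagnosis present, 0 = absent.
    indicator = {
        pid: [1 if d in diagnoses else 0 for d in kept]
        for pid, diagnoses in records.items()
    }
    return kept, indicator


# Toy data (invented): 200 patients, one rare diagnosis held by one patient.
records = {i: {"Hypertension, uncomplicated"} for i in range(200)}
for i in range(100):
    records[i] = records[i] | {"Lipid disorder"}
records[0] = records[0] | {"Rare condition"}

kept, indicator = select_chronic_diseases(records)
```

In this toy example the rare diagnosis (prevalence 0.5%) is dropped, so each patient is described by two 0/1 columns.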

Variables
The unit of measurement was the diagnosis (values: 1 for present, 0 for absent). Other variables recorded for each patient were the following: number of different diseases (chronic diseases active on 31 December each year), age groups in 2009 (65-79; ≥80), and sex (women, men).

Statistical analysis
Data access: Data were obtained from SIDIAP after the study was authorized. All authors were granted access to the database. Sex and age were universally recorded, so there were no missing values and no imputation was required. Incorrect codes for sex-specific diagnoses and diagnoses with inconsistent dates were excluded.

Descriptive analysis
Analyses were stratified by sex and age. Descriptive statistics were used to summarize overall information. Categorical variables were expressed as frequencies (percentage) and continuous variables as mean (standard deviation, SD) or median (interquartile range, IQR). The chi-square test and the Mann-Whitney test were used to assess differences between age groups by sex.
Prevalence of each chronic disease was calculated for each year in order to study the evolution over time. Multimorbidity patterns were identified using two steps: 1) multiple correspondence analysis (MCA) and 2) k-means clustering. For every year of study, MCA and k-means analysis included only those individuals who were alive as of 31 December of that year.

Multiple correspondence analysis
This data analysis technique for nominal categorical data was used to detect and represent underlying structures in the data set. The MCA method allows representation in a multidimensional space of relationships between a set of dichotomous or categorical variables, in our case diagnoses, that would otherwise be difficult to observe in contingency tables and to show groups of patients with the same characteristics [19,20]. MCA also allows the direct representation of patients as points (coordinates) in geometric space, transforming the original binary data to continuous data. The MCA analysis was based on the indicator matrix. Optimal number of dimensions extracted and percentages of inertia were determined by scree plot.
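To make the transformation from binary diagnoses to continuous patient coordinates concrete, the following Python sketch performs a correspondence analysis of the indicator matrix using NumPy. It is an independent illustration under stated assumptions, not the procedure run in the study: the function name and the toy matrix are invented, and each diagnosis is assumed to be both present and absent at least once so that no category has zero mass.

```python
import numpy as np


def mca_row_coordinates(binary, n_dims):
    """MCA of the indicator matrix built from 0/1 diagnosis columns."""
    binary = np.asarray(binary, dtype=float)
    # Indicator matrix: one "present" and one "absent" column per diagnosis.
    Z = np.hstack([binary, 1.0 - binary])
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row (patient) masses
    c = P.sum(axis=0)                    # column (category) masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    inertia = sing ** 2                  # principal inertias (eigenvalues)
    # Principal coordinates of the patients on each dimension.
    coords = (U * sing) / np.sqrt(r)[:, None]
    return coords[:, :n_dims], inertia


# Toy matrix (invented): 6 patients x 3 diagnoses.
toy = np.array([
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
])
coords, inertia = mca_row_coordinates(toy, n_dims=2)
explained = inertia / inertia.sum()      # share of inertia per dimension
```

The `explained` vector is what a scree plot would display; the retained columns of `coords` are the continuous variables passed on to the clustering step.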

k-means clustering
From the geometric space created in MCA, patients were classified in clusters according to proximity criteria using the k-means algorithm with random initial centroids. A cluster centre was obtained for each cluster. The optimal number of clusters (k) was assessed according to the Calinski-Harabasz criterion, using 100 iterations. The optimal number of clusters is the solution with the highest Calinski-Harabasz index value. To assess internal cluster quality, cluster stability of the optimal solution was computed using Jaccard bootstrap values with 100 runs [10]. "Highly stable" clusters should yield average Jaccard similarities of 0.85 and above.
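The two quality measures named above can be written out explicitly. The sketch below, in plain Python, computes the Calinski-Harabasz index of a given partition (between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom) and the Jaccard similarity between two clusters viewed as sets of patients. It does not implement k-means or the bootstrap resampling itself; function names and toy values are assumptions for illustration only.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: [B / (k - 1)] / [W / (n - k)].

    Assumes at least two clusters and a non-zero within-cluster dispersion.
    """
    n, dims = len(points), len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dims)]
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    between = within = 0.0
    for members in clusters.values():
        centre = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        # Between-cluster dispersion, weighted by cluster size.
        between += len(members) * sum(
            (centre[d] - overall[d]) ** 2 for d in range(dims)
        )
        # Within-cluster dispersion around the cluster centre.
        within += sum((p[d] - centre[d]) ** 2 for p in members for d in range(dims))
    return (between / (k - 1)) / (within / (n - k))


def jaccard(cluster_a, cluster_b):
    """Jaccard similarity between two clusters given as sets of patient ids."""
    a, b = set(cluster_a), set(cluster_b)
    return len(a & b) / len(a | b)


# Toy coordinates (invented): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = calinski_harabasz(points, [0, 0, 1, 1])   # compact, separated clusters
poor = calinski_harabasz(points, [0, 1, 0, 1])   # clusters mixing both pairs
overlap = jaccard({1, 2, 3}, {2, 3, 4})
```

The partition with the higher index is preferred, which is how candidate values of k are compared; in a bootstrap stability check, `jaccard` would be applied between each original cluster and its best match in every resampled solution and then averaged.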

Multimorbidity patterns
To describe multimorbidity patterns, frequencies and percentages of diseases in each cluster were calculated. Observed/expected (O/E) ratios were obtained by dividing disease prevalence in the cluster by disease prevalence in each age group, by sex. To define a specific pattern, we considered those diseases with an intracluster prevalence ≥20% and an over-expression with O/E ratio ≥2 [21]. The names of patterns are related to the main system affected in each cluster.
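As a worked illustration of the two thresholds, the Python sketch below applies the rule (intracluster prevalence ≥20% and O/E ratio ≥2) to one cluster. The function name is hypothetical and the prevalence values are invented for the example; they are not results from the study.

```python
def specific_pattern(cluster_prev, stratum_prev, min_prev=0.20, min_oe=2.0):
    """Return {disease: O/E ratio} for diseases that define a specific pattern.

    cluster_prev: disease prevalence within the cluster (observed).
    stratum_prev: disease prevalence in the whole age-sex stratum (expected).
    """
    selected = {}
    for disease, observed in cluster_prev.items():
        oe_ratio = observed / stratum_prev[disease]
        if observed >= min_prev and oe_ratio >= min_oe:
            selected[disease] = round(oe_ratio, 2)
    return selected


# Invented prevalences for one cluster and its stratum.
cluster_prev = {
    "Osteoporosis": 0.60,
    "Hypertension, uncomplicated": 0.70,
    "Dementia": 0.10,
}
stratum_prev = {
    "Osteoporosis": 0.25,
    "Hypertension, uncomplicated": 0.69,
    "Dementia": 0.04,
}
pattern = specific_pattern(cluster_prev, stratum_prev)
```

Here only Osteoporosis qualifies: hypertension is frequent but not over-expressed (O/E close to 1), and dementia is over-expressed (O/E 2.5) but falls below the 20% prevalence threshold.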
Descriptive statistics of age and number of diagnoses per each cluster were also obtained. Clinical criteria were used to evaluate the consistency and utility of the final cluster solution, based on clusters previously described in the literature and a consensus opinion drawn from the clinical experience of the research team (3 family physicians and 2 epidemiologists engaged in daily patient care). Stability in the patterns was considered as the number of persons staying in the same pattern in 2014, as well as the percentage of people who remained in the same pattern at the end of the study compared to 2009. The consistency of multimorbidity patterns was established by analysing the number (percentage) of people who remained stable within the cluster during the study period.
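The stability measure described above can be sketched in Python as the percentage of people assigned to the same pattern in the first and last year. The function name and data layout are hypothetical, and the choice of denominator (people in the pattern in 2009 who are still present in 2014) is an assumption of this sketch rather than a detail stated in the text.

```python
def pattern_stability(start, end):
    """Percentage of people found in the same pattern at both time points.

    start, end: dicts mapping patient id -> pattern name in the first and
    last year. Patients absent from `end` (e.g. deceased) are left out of
    the denominator; this is an assumption of the sketch.
    """
    stayed, total = {}, {}
    for pid, pattern in start.items():
        if pid not in end:
            continue
        total[pattern] = total.get(pattern, 0) + 1
        if end[pid] == pattern:
            stayed[pattern] = stayed.get(pattern, 0) + 1
    return {p: 100.0 * stayed.get(p, 0) / total[p] for p in total}


# Invented assignments: patient 3 died, patient 2 changed pattern.
patterns_2009 = {1: "Nonspecific", 2: "Nonspecific", 3: "Nonspecific",
                 4: "Cardiovascular", 5: "Cardiovascular"}
patterns_2014 = {1: "Nonspecific", 2: "Cardiovascular",
                 4: "Cardiovascular", 5: "Cardiovascular"}
stability = pattern_stability(patterns_2009, patterns_2014)
```

Using everyone in the 2009 cluster as the denominator instead would lower the percentages, so the choice should be stated whenever such figures are compared.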
The analyses were carried out using SPSS for Windows, version 18 (SPSS Inc., Chicago, IL, USA) and R version 3.3.1 (R Foundation for Statistical Computing, Vienna, Austria) with the packages FactoMineR, fpc, and vegan.

Results
Out of 206,146 persons analysed at the beginning of the study in 2009, 190,108 (92.2%) fulfilled multimorbidity criteria (Fig. 1) and 59.8% were women. The mean age at the beginning of the study was 71.8 (SD 4.35) years for the group 65-79 years old, and 84.2 years (SD 3.46) for the group over 80. In 2009, 31.2% to 39.1% of the population had fewer than 5 chronic diseases, while 40.2% to 42.3% had 6 to 9 diseases and 20.7% to 28.2% had received more than 10 diagnoses. The median number of diseases was 7 (IQR 5-10) for women and for men older than 80 years; the younger men (aged 65-79 years) had a median of 6 diseases (IQR 4-9) (Table 1).

Chronic diseases prevalence
Hypertension, uncomplicated was the most prevalent chronic disease in all groups over the period of time studied, followed by Lipid disorder. In the group aged 65-79 years, uncomplicated hypertension affected 69% of women and 68% of men in 2009, and lipid disorder affected 57.7% and 49.4%, respectively. Other prevalent diagnoses for women in this age group in 2009 were Osteoporosis (32.6%), Obesity (29.2%), and Depressive disorder (27.3%); among men, ageing-related diseases were prevalent, including Benign prostatic hypertrophy (41.6%), Cataracts (21.4%), and Diabetes, non-insulin-dependent (30.8%). The top 10 chronic diseases for women and men throughout the study period are shown in Fig. 2. Few changes in prevalence were observed over the 6 years analysed.

K-means clustering
Using the Calinski criterion, six clusters were considered as the optimal solution for both age and sex strata. Average Jaccard bootstrap values for both women and men were 0.85 and above.

Multimorbidity patterns
For each of the four groups studied (two age groups of men and women), 6 clusters were identified using the k-means method. The first pattern, formed by only the most prevalent diseases, was named the "nonspecific" pattern; the remaining 5 patterns were specific to Musculoskeletal, Endocrine-metabolic, Digestive/digestive-respiratory, Neuropsychiatric, and Cardiovascular diseases, in decreasing order depending on the percentage of the population included [see Additional files 1, 2].
The first cluster had the largest percentage of the sample, both women and men: 35. 6 Table 3 shows men aged 65-79 years with the Neuropsychiatric pattern, containing almost the same diseases as the homologous pattern in women. Differences between the patterns are mainly sex-related diseases such as Benign prostatic hypertrophy.
Following the same method as these two examples, it can be observed that chronic diseases included in each pattern at the beginning of the study period mostly persisted throughout the 6 years analysed. Some variations were observed, such as a chronic disease leaving a pattern when it no longer met the inclusion criteria, sometimes by only a few decimal points [see Additional files 1-4]. Among women aged 80 and older, as in the younger group, we defined six clusters (Nonspecific and 5 specific multimorbidity patterns) with the same names, even if the diseases varied, because the main system affected was the same. The Musculoskeletal, Endocrine-metabolic, Digestive and Cardiovascular patterns showed changes in 1 or 2 diseases, but the Neuropsychiatric pattern had added 4 diseases to the cluster by the end of the study period [see Additional file 3].
Several differences were observed in the older group of men, as well. First, the Endocrine-metabolic pattern in this age group was defined by diseases localized in the Cardiovascular patterns in men aged 65-79 years.
Secondly, the Digestive pattern incorporated respiratory diseases, becoming the Digestive-respiratory pattern (as in the last year analysed in men 65-79 years), composed of 9 more chronic diseases than the Digestive pattern. Thirdly, the Neuropsychiatric and Cardiovascular patterns lost some diseases. Finally, no important changes were found in the Musculoskeletal pattern [see Additional file 4].
Furthermore, the percentage of patients whose multimorbidity pattern remained stable was at least 42.5% for all patterns in each sex and age group. The Nonspecific patterns had the highest values for stability at the end of the period for all groups except men aged 80 and older, for whom the Cardiovascular pattern was the highest (Fig. 3).

Discussion
We explored multimorbidity patterns and their 6-year evolution in people aged 65 years and older with multimorbidity attended in PHC. The most prevalent chronic diseases, Hypertension, uncomplicated and Lipid disorder, were represented in all clusters in all four groups (i.e., men and women aged 65-79 and ≥80 years). We found 6 clusters per group, 5 of them with a specific pattern related to an organic system: Musculoskeletal, Endocrine-metabolic, Digestive/Digestive-respiratory, Neuropsychiatric and Cardiovascular patterns. We analysed multimorbidity patterns over 6 years and found that they remained quite similar from the beginning to the end of the study period. We observed a high prevalence of multimorbidity in our population sample, with a higher proportion for women, as in other published studies [5,8] and described 6 patterns in each study group. In addition, the prevalence of chronic diseases and multimorbidity patterns was similar to previous studies in Catalonia [22] and in other developed countries [23][24][25]. In a separate study in the same sample, we analysed mortality rates and observed higher mortality among men with Digestive-respiratory patterns and among women with Cardiovascular pattern [26]. In both age groups, both men and women had the same 5 multimorbidity pattern names plus one additional cluster: a Digestive disease pattern in women and a Digestive-respiratory pattern in men. This difference is probably related to the smoking and alcohol habits that were more common among men than among women in the age groups studied [27]. The differences observed between age groups were related to disease prevalence and O/E ratio; no significant differences between men and women were found in the systems that were most commonly affected by the prevalent diseases. As a result, future clinical guidelines could focus on improving common management of multimorbidity in all older patients.
It is particularly noteworthy that more than 50% of those showing the Nonspecific pattern remained in that same pattern across the period analysed, without moving on to a specific pattern; a few degenerative diseases were added in the older groups. In addition, this first (Nonspecific) cluster was defined by highly prevalent diseases, with no over-represented chronic diseases, so that the association between diseases could exist by chance. Consequently, this first cluster showed that a considerable portion of the sample had no system-specific pattern.
In contrast, across the specific patterns we also observed a large proportion (range from 42.5 to 64.7%) of people remaining stable (in terms of chronic disease prevalence) in the same pattern. Maximum stability was observed for the Nonspecific pattern in both groups aged 65 to 79 years and in older women; for men aged 80 and older, the Cardiovascular pattern showed the greatest stability. Moreover, although some people changed from one pattern to another, the multimorbidity patterns remained mostly stable during the 6 years studied, confirming the long-term stability of the multimorbidity pattern composition. In view of these results, an association could be hypothesized between multimorbidity and specific genetic conditions, as well as previously suggested associations with lifestyle and environmental conditions [28].
Estimates of multimorbidity pattern prevalence differ widely in the literature because of variations in methods, data sources and structures, populations and diseases studied. Although this makes it challenging to compare study results [5,29,30], there are some similarities between the present and previous studies. For instance, the most common organic systems affected in previous studies of multimorbidity patterns were cardiovascular/metabolic, neuropsychiatric (mental health), and musculoskeletal [30]. Our study found patterns affecting these same organic systems; however, it offers another point of view for defining multimorbidity patterns. Cluster analysis shows the complexity of multimorbidity in persons aged 65 years and older and is likely to be helpful in shaping future strategies to continue studying this important health issue.
Previous studies have analysed no more than four years of data [29], compared to six years of information about the evolution of a multimorbidity pattern in our study. As a result, we identified long-term stability in multimorbidity patterns, observing some differences between age groups, related to prevalence and O/E ratio in chronic diseases. Useful information can be extracted from our study for the monitoring and treatment of each multimorbidity pattern.

Strengths and limitations
A major strength of this study is the analysis of a large, high-quality EHR database, representative of a large population. In the context of a national health system with universal coverage, EHR data have been shown to yield more reliable and representative conclusions than those derived from survey-based studies [25]. The inclusion of all chronic diagnoses registered in EHR contributed to a more accurate analysis of the multimorbidity patterns in this population. Moreover, the use of data collected by the primary health care system increased the external validity of the information extracted because primary care centres in Barcelona attended more than 70% of the population at least once a year during the study period. As the nonspecific pattern contained well-known chronic diseases with established clinical guidance, the information extracted is relevant but less useful in clinical practice than the specific patterns defined. The long time period observed provided information on the stability of the patterns over six years, enabling us to focus on creating better strategies to address all five specific patterns in terms of prevention, diagnosis, and treatment of these systemic clusters of prevalent diseases.
A number of limitations must be taken into account as well. First, EHR accuracy depends on the data entered by each general physician or nurse, and EHR systems are not designed as general-purpose research tools [31]. Another weakness could be the attention only to chronic diseases, which precludes awareness of acute diseases or bio-psychosocial factors [2]. Nonetheless, the inclusion of a wide range of diseases makes it possible to find multimorbidity patterns not previously obtained and increases complexity in terms of assembling patterns. Finally, we did not have data on cause of death.
In addition, using MCA can produce low percentages of variation on principal axes, complicating the choice of the number of dimensions to retain. We assumed a five-dimension solution, using the elbow rule in the scree plot to have the most accurate solution possible without including an unwieldy number of dimensions in the analysis [19]. Although we did not retain the total variance of the dataset, clustering techniques can be applied to the reduced dataset while preserving its complexity.
The strength of using k-means cluster analysis is that the results are less susceptible to outliers in the data, the influence of the chosen distance measure, or the inclusion of inappropriate or irrelevant variables. The method can also analyse extremely large data sets (as in this study), as no distance matrix is required. On the other hand, some disadvantages of the method are that different solutions can occur for each set of seed points and there is no guarantee of optimal clustering [11]. To minimize this shortcoming, we tested the internal validity of our solution using bootstrap methods [32], and the results were highly stable (Jaccard > 0.85). In addition, the method is not efficient when a large number of potential cluster solutions are to be considered [11]; to address this limitation, we computed the optimal number of clusters using analytical indexes such as the Calinski-Harabasz index [33].

Future research
With this confirmation of the stability of multimorbidity patterns across age groups, sex, and time, some actions could be considered to improve multimorbidity management. For instance, clinical guidance could encompass a specific pattern to deal with its complexity rather than creating multiple guidelines for each of the chronic diseases. Relevant information could be extracted from our study for the monitoring and treatment of each multimorbidity pattern. Finally, genetic factors, as well as socioeconomic status, should be taken into account in future studies.

Conclusions
We identified a very large proportion of people over 65 years with multimorbidity, distributed in six clusters; five affected a specific system in the body and one had a nonspecific pattern. The major portion of the sample fit this last pattern, which had few diseases; this finding could be related to genetic or social characteristics of the sample. On the other hand, stability in a specific pattern over an extended time period might give us the information needed to take a new approach and improve a patient's situation. For instance, a new clinical practice guideline could be developed to control a combination of chronic diseases rather than each one individually.
As the prevalence of chronic diseases was stable over the period studied, multimorbidity patterns also became firmer. Therefore, the k-means technique is useful to analyse multimorbidity patterns in real-world data.
The observation that multimorbidity patterns are constant over time is very useful for the specific clinical management of each patient who fits a specific multimorbidity pattern. Further studies using this method in other groups of patients should be performed to validate the results obtained.