### Study population

This study used data from Wave I of the Longitudinal Ageing Study in India (LASI), a nationally representative survey of 72,250 adults aged 45 years and above conducted in 2017–18. The survey followed a multistage stratified area probability cluster sampling design to select respondents: a three-stage design in rural areas and a four-stage design in urban areas. In the first stage, Primary Sampling Units (PSUs), that is, sub-districts (Tehsils/Taluks), were selected in each state and union territory. In the second stage, villages were selected in rural areas and wards in urban areas within the selected PSUs. In rural areas, households were then selected from the selected villages in the third stage. Sampling in urban areas involved an additional stage: in the third stage, one Census Enumeration Block (CEB) was randomly selected in each selected ward, and in the fourth stage, households were selected from these CEBs. Each consenting respondent in the sampled households was administered an individual survey schedule.

The Indian Council of Medical Research (ICMR) granted ethical approval for conducting LASI. All methods were carried out in accordance with relevant guidelines and regulations, and informed consent was obtained from all subjects and/or their legal guardian(s). The detailed methodology, including complete information on the survey design and data collection, ethical considerations, and quality control measures, is available in the published survey report [11]. For this study, the sample of older adults aged 60 years and above was considered (*n* = 31,464; men = 16,366 and women = 15,098).

### Outcome variable

The outcome variable for this study was cognitive impairment (CoI), assessed using a composite cognitive score based on five cognitive domains: memory, orientation, arithmetic, executive functioning, and object naming. The composite score ranges from 0 to 43, with higher scores indicating better cognitive ability. Respondents in the lowest 10th percentile of the score were classified as having poor cognitive functioning, that is, cognitive impairment [12].
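The 10th-percentile cut-off can be illustrated with a minimal sketch. It assumes unweighted scores, the nearest-rank definition of the percentile, and that scores tied at the cut-off are counted as impaired; the source does not state these details, LASI analyses may apply survey weights or a different percentile rule, and the function name `classify_coi` and the scores are hypothetical.

```python
import math


def classify_coi(scores, pct=10):
    """Flag cognitive impairment (1) for scores at or below the pct-th percentile.

    Assumptions (illustrative, not from the source): unweighted scores,
    nearest-rank percentile, ties at the cut-off counted as impaired.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based nearest-rank position
    cutoff = ordered[rank - 1]
    flags = [1 if s <= cutoff else 0 for s in scores]
    return cutoff, flags


# Hypothetical composite scores on the 0-43 scale
scores = [12, 30, 25, 8, 40, 33, 19, 27, 35, 22]
cutoff, coi = classify_coi(scores)
print(cutoff, coi)  # 8 [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

Because ties at the cut-off are all flagged, the share classified as impaired can exceed 10% when many respondents share the cut-off score.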

### Predictor variables

*Demographic factors* included gender (male, female), age (60–69, 70–79, 80+ years), marital status (currently in union, not in union), caste (Scheduled Caste [SC], Scheduled Tribe [ST], Other Backward Classes [OBC], and none of them), religion (Hindu, others), place of residence (rural, urban), and region (north, west, south, east, and north-east). The *socioeconomic factors* considered were years of schooling (no schooling, up to 9 years, and 10 or more years), working status (currently working, currently not working), and monthly per capita consumption expenditure (MPCE) quintile. *Social support* was assessed using two elements: financial support (no, yes) and living arrangements (living alone, living with spouse and/or others, living with spouse and children, living with children and others, and living with others only). The *health aspects* included as predictors were body mass index (BMI: underweight, normal, obese), self-rated health (good, moderate, poor), depression (no, yes), difficulty in activities of daily living (ADL: no, yes), and difficulty in instrumental activities of daily living (IADL: no, yes). Depressive symptoms were assessed using the CES-D scale [13]. Additionally, alcohol consumption (yes, no) and smoking (yes, no) were included under the health aspects.

### Statistical analysis

Data were analysed using Stata 16.1. Bivariate differences in the prevalence of CoI across categories of the predictor variables were tested using the chi-square test. Based on the composite cognitive score, respondents were classified into two groups: 0 = no cognitive impairment and 1 = cognitive impairment. In the first stage, multiple binary logistic regression was used to estimate the associations of demographic, socioeconomic, social support, and health factors with CoI.
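The bivariate chi-square comparison mentioned above can be sketched as a Pearson statistic on a contingency table of predictor categories by CoI status. This is a minimal, unweighted illustration with hypothetical counts and a hypothetical function name; a survey-weighted analysis would typically apply a design-based correction (for example Rao–Scott) instead.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    of observed counts (rows = predictor categories, columns = CoI 0/1)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, df


# Hypothetical counts: rows = residence (rural, urban), columns = CoI (0 = no, 1 = yes)
stat, df = pearson_chi2([[90, 10], [95, 5]])
print(round(stat, 3), df)  # 1.802 1
```

The p-value follows from the upper tail of the chi-square distribution with `df` degrees of freedom, for example `scipy.stats.chi2.sf(stat, df)`.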

The binary logistic regression model can be written as:

$$\ln\left(\frac{\pi}{1-\pi}\right)=a+\beta_1X_1+\beta_2X_2+\beta_3X_3+\dots+\beta_nX_n$$

where \(\pi\) is the probability of CoI, \(a\) is the intercept, \(X_1, X_2, X_3, \dots, X_n\) are the explanatory variables and \(\beta_1, \beta_2, \beta_3, \dots, \beta_n\) are the regression coefficients.
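A minimal sketch of fitting this model by maximum likelihood (Newton–Raphson) is shown below. It uses simulated data with known coefficients rather than LASI, ignores survey weights, and the function name `fit_logit` is hypothetical; in practice the model would be fitted with Stata's `logit` or an equivalent routine.

```python
import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Fit ln(pi / (1 - pi)) = a + X @ beta by Newton-Raphson (maximum likelihood).

    X: (n, p) array of explanatory variables (no intercept column).
    y: (n,) array of 0/1 outcomes (1 = CoI).
    Returns (coef, loglik) where coef[0] is the intercept a.
    """
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X1 @ coef))           # fitted probabilities
        grad = X1.T @ (y - pi)                           # score vector
        info = X1.T @ (X1 * (pi * (1.0 - pi))[:, None])  # information matrix
        step = np.linalg.solve(info, grad)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    pi = 1.0 / (1.0 + np.exp(-X1 @ coef))
    loglik = float(np.sum(y * np.log(pi) + (1.0 - y) * np.log(1.0 - pi)))
    return coef, loglik


# Simulated (not LASI) data with known coefficients a = -2, beta = (0.8, -0.5)
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
true_coef = np.array([-2.0, 0.8, -0.5])
p_true = 1.0 / (1.0 + np.exp(-(true_coef[0] + X @ true_coef[1:])))
y = (rng.random(5000) < p_true).astype(float)

coef, loglik = fit_logit(X, y)
odds_ratios = np.exp(coef[1:])  # exp(beta_j): odds ratio per unit change in X_j
```

Categorical predictors such as age group or MPCE quintile would enter `X` as 0/1 indicator columns, with one reference category omitted.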

In the second stage, the *concentration index (C)* and *concentration curve (CC)* were used to measure economic (consumption expenditure-related) inequality in CoI, and CoI among older adults was examined across economic status quintiles. The CC plots the cumulative percentage of CoI on the Y-axis against the cumulative percentage of the population, ranked by economic status from poorest to richest, on the X-axis. C is defined as twice the area between the line of equality and the CC, and can be calculated using the following formula:

$$\text{Concentration index}\,(C)=\frac{2}{\mu}\,\operatorname{cov}(h,r)$$

where \(h\) is the health outcome (CoI among older adults in this study), \(\mu\) is the mean of \(h\), and \(r\) is the fractional rank of individuals in the distribution of economic status (MPCE quintiles). C ranges from −1 to +1; a value of 0 indicates perfect equality, that is, no economic inequality in CoI. A positive value indicates that CoI is more concentrated among richer individuals (pro-poor inequality), whereas a negative value indicates that it is more concentrated among poorer individuals (pro-rich inequality).
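C can be computed either from the CC, as twice the area between the line of equality and the curve, or from the covariance formula \(C=(2/\mu)\,\operatorname{cov}(h,r)\). The sketch below computes both and shows that they agree on a small example. It assumes unweighted individual-level data, fractional ranks \((i-0.5)/n\) with tied values given their mean rank, and the population covariance (divisor \(n\)); the function names and the four-person example are hypothetical.

```python
def fractional_rank(x):
    """Fractional rank in (0, 1), poorest first; tied values get their mean rank."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and x[order[j + 1]] == x[order[i]]:
            j += 1
        mean_rank = ((i + j) / 2 + 0.5) / n  # mean of (k + 0.5) / n over the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def concentration_index(h, econ):
    """C = (2 / mu) * cov(h, r), using the population (divide-by-n) covariance."""
    n = len(h)
    r = fractional_rank(econ)
    mu = sum(h) / n
    r_bar = sum(r) / n
    cov = sum((hi - mu) * (ri - r_bar) for hi, ri in zip(h, r)) / n
    return 2.0 * cov / mu


def concentration_curve(h, econ):
    """CC points: cumulative population share vs cumulative share of h,
    ranking individuals from poorest to richest (ties kept in input order)."""
    order = sorted(range(len(h)), key=lambda i: econ[i])
    total, n = sum(h), len(h)
    xs, ys, cum = [0.0], [0.0], 0.0
    for k, i in enumerate(order, start=1):
        cum += h[i]
        xs.append(k / n)
        ys.append(cum / total)
    return xs, ys


def c_from_curve(xs, ys):
    """Twice the area between the line of equality and the CC (trapezoid rule):
    C = 2 * (0.5 - area under CC) = 1 - 2 * area."""
    area = sum((xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2 for k in range(1, len(xs)))
    return 1.0 - 2.0 * area


# Hypothetical example: CoI (1 = yes) for four people, MPCE increasing left to right
h = [1, 1, 0, 0]
mpce = [1000, 2000, 3000, 4000]
c_cov = concentration_index(h, mpce)                  # -0.5: CoI concentrated among the poor
c_area = c_from_curve(*concentration_curve(h, mpce))  # same value from the curve
```

With tied economic status (for example when ranking by quintile), the curve function keeps tied individuals in input order, so the two results can differ; assigning mean ranks to ties, as in `fractional_rank`, is a common convention.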

To determine the contribution of the different groups of explanatory variables, we used the Shapley decomposition [14,15,16]. We applied a grouped Shapley decomposition to quantify the contribution of demographic, socioeconomic, social support, and health-related variables to the inequality in CoI among older adults. The Shapley value decomposition is well suited to regression-based methods because it does not require the regression model to be linear. It computes the change in the measure of interest when a group of explanatory variables is removed from the model and averages this marginal contribution over all possible combinations of the remaining groups, so that the group contributions add up exactly to the total. We describe the procedure using the 'zero' Shapley decomposition, in which excluded variables are set to zero. Four groups of variables were included: demographic variables (sex, age, marital status, religion, and place of residence of the respondents); socioeconomic variables (education level and wealth status, measured by MPCE quintile); social support variables (financial support and living arrangements); and health-related variables (self-rated health, alcohol consumption, ever-smoked status, difficulties in ADL and IADL, and depression).
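The grouped Shapley rule can be sketched in a few lines, assuming the LRI of every combination of groups has already been computed and stored in a dictionary keyed by the set of groups kept in the model. Each group's contribution is the weighted sum of its marginal contributions, with weight \(|S|!\,(G-|S|-1)!/G!\) for a coalition \(S\) of other groups; with \(G = 4\) groups this gives \(1/4\) when none or all three of the other groups are present and \(1/12\) when one or two are. The function name and the toy LRI values below are hypothetical (not LASI results); exact fractions are used so that the additivity check is exact, but float LRI values work as well.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_contributions(groups, value):
    """Shapley contribution of each group of variables to value(all groups).

    groups: list of group names.
    value:  dict mapping a frozenset of groups kept in the model to its LRI;
            value[frozenset()] must be 0 (no explanatory variables).
    """
    G = len(groups)
    contrib = {}
    for g in groups:
        others = [o for o in groups if o != g]
        total = Fraction(0)
        for size in range(G):  # size of the coalition S of other groups
            weight = Fraction(factorial(size) * factorial(G - size - 1), factorial(G))
            for S in combinations(others, size):
                S = frozenset(S)
                total += weight * (value[S | {g}] - value[S])  # marginal contribution of g
        contrib[g] = total
    return contrib


# Toy LRI values: additive effects plus a demographic x health interaction
groups = ["demographic", "socioeconomic", "social_support", "health"]
base = {"demographic": Fraction(2, 100), "socioeconomic": Fraction(3, 100),
        "social_support": Fraction(1, 100), "health": Fraction(6, 100)}


def toy_lri(kept):
    v = sum((base[g] for g in kept), Fraction(0))
    if {"demographic", "health"} <= kept:
        v += Fraction(2, 100)  # shared part, split equally by the Shapley rule
    return v


value = {frozenset(S): toy_lri(frozenset(S))
         for k in range(len(groups) + 1) for S in combinations(groups, k)}
contrib = shapley_contributions(groups, value)
# contrib: demographic 3/100, socioeconomic 3/100, social_support 1/100, health 7/100
```

The shared 2/100 attributable to the demographic and health groups together is split equally between them, and the four contributions sum to the full-model value.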

Let \(d_{ki}\) denote the value of demographic variable \(k\) (\(k = 1,\dots,K\)) for individual \(i\), \(e_{hi}\) the value of socio-economic variable \(h\) (\(h = 1,\dots,H\)) for individual \(i\), and \(s_{li}\) the value of social support variable \(l\) (\(l = 1,\dots,L\)) for individual \(i\). Finally, let \(h_{mi}\) denote the value of health variable \(m\) (\(m = 1,\dots,M\)) for individual \(i\).

The likelihood ratio index (LRI) of the full model, which includes all four sets of variables, can be written as,

$$LRI_1=LRI\left(d_{ki}\neq0;\,e_{hi}\neq0;\,s_{li}\neq0;\,h_{mi}\neq0\right).$$

Suppose, for example, that the demographic variables, \(d_{ki}\), are not included in the regression. The likelihood ratio index is then expressed as,

$$LRI_2=LRI\left(d_{ki}=0;\,e_{hi}\neq0;\,s_{li}\neq0;\,h_{mi}\neq0\right).$$

Similarly, if the socio-economic variables, \(e_{hi}\), are excluded from the regression, the likelihood ratio index is \(LRI_3=LRI\left(d_{ki}\neq0;\,e_{hi}=0;\,s_{li}\neq0;\,h_{mi}\neq0\right)\). If the social support variables, \(s_{li}\), are excluded, it is \(LRI_4=LRI\left(d_{ki}\neq0;\,e_{hi}\neq0;\,s_{li}=0;\,h_{mi}\neq0\right)\). Lastly, if the health variables, \(h_{mi}\), are excluded, it is \(LRI_5=LRI\left(d_{ki}\neq0;\,e_{hi}\neq0;\,s_{li}\neq0;\,h_{mi}=0\right)\).

Naturally, two sets of explanatory variables could also be excluded at the same time. The likelihood ratio indices obtained when the demographic and socio-economic variables, the demographic and social support variables, the demographic and health variables, the socio-economic and social support variables, the socio-economic and health variables, and the social support and health variables are excluded are denoted, respectively, \(LRI_6\), \(LRI_7\), \(LRI_8\), \(LRI_9\), \(LRI_{10}\) and \(LRI_{11}\). Likewise, when three sets are excluded, \(LRI_{12}\), \(LRI_{13}\), \(LRI_{14}\) and \(LRI_{15}\) denote the indices of the models that retain only the demographic, only the socio-economic, only the social support and only the health variables, respectively.

Applying the Shapley procedure, each set's marginal contribution is weighted by \(1/4\) when none or all three of the other sets are in the model and by \(1/12\) when one or two of them are. The contribution of the demographic variables, \(\mathbf{C}_\mathbf{d}\), to the full-model likelihood ratio index, \(LRI_1\), may then be expressed as,

$$\mathbf{C}_\mathbf{d}=\left(\frac14\right)\left(LRI_1-LRI_2\right)+\left(\frac1{12}\right)\left[\left(LRI_3-LRI_6\right)+\left(LRI_4-LRI_7\right)+\left(LRI_5-LRI_8\right)\right]+\left(\frac1{12}\right)\left[\left(LRI_{11}-LRI_{13}\right)+\left(LRI_{10}-LRI_{14}\right)+\left(LRI_9-LRI_{15}\right)\right]+\left(\frac14\right)LRI_{12}$$

where the last term uses the fact that, by definition, \(LRI\left(d_{ki}=0;\,e_{hi}=0;\,s_{li}=0;\,h_{mi}=0\right)=0\).

Similarly, the contribution of the socio-economic variables, \(\mathbf{C}_\mathbf{e}\), to \(LRI_1\) may be expressed as,

$$\mathbf{C}_\mathbf{e}=\left(\frac14\right)\left(LRI_1-LRI_3\right)+\left(\frac1{12}\right)\left[\left(LRI_2-LRI_6\right)+\left(LRI_4-LRI_9\right)+\left(LRI_5-LRI_{10}\right)\right]+\left(\frac1{12}\right)\left[\left(LRI_{11}-LRI_{12}\right)+\left(LRI_8-LRI_{14}\right)+\left(LRI_7-LRI_{15}\right)\right]+\left(\frac14\right)LRI_{13}$$

Likewise, the contribution of the social support variables, \(\mathbf{C}_\mathbf{s}\), to \(LRI_1\) may be expressed as,

$$\mathbf{C}_\mathbf{s}=\left(\frac14\right)\left(LRI_1-LRI_4\right)+\left(\frac1{12}\right)\left[\left(LRI_2-LRI_7\right)+\left(LRI_3-LRI_9\right)+\left(LRI_5-LRI_{11}\right)\right]+\left(\frac1{12}\right)\left[\left(LRI_{10}-LRI_{12}\right)+\left(LRI_8-LRI_{13}\right)+\left(LRI_6-LRI_{15}\right)\right]+\left(\frac14\right)LRI_{14}$$

Finally, the contribution of the health-related variables, \(\mathbf{C}_\mathbf{h}\), to \(LRI_1\) may be expressed as,

$$\mathbf{C}_\mathbf{h}=\left(\frac14\right)\left(LRI_1-LRI_5\right)+\left(\frac1{12}\right)\left[\left(LRI_2-LRI_8\right)+\left(LRI_3-LRI_{10}\right)+\left(LRI_4-LRI_{11}\right)\right]+\left(\frac1{12}\right)\left[\left(LRI_9-LRI_{12}\right)+\left(LRI_7-LRI_{13}\right)+\left(LRI_6-LRI_{14}\right)\right]+\left(\frac14\right)LRI_{15}$$

It is then easy to verify that,

$$\mathbf{C}_\mathbf{d}+\mathbf{C}_\mathbf{e}+\mathbf{C}_\mathbf{s}+\mathbf{C}_\mathbf{h}=LRI\left(d_{ki}\neq0;\,e_{hi}\neq0;\,s_{li}\neq0;\,h_{mi}\neq0\right)=LRI_1$$

In other words, by considering all possible combinations of the sets of explanatory variables, we can derive the respective contributions of the demographic, socio-economic, social support and health-related variables to the full-model likelihood ratio index.
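The sub-model indices \(LRI_1, LRI_2, \dots\) that enter these formulas can be sketched as follows. This is a minimal illustration under stated assumptions: the LRI is taken to be McFadden's likelihood ratio index, \(1-\ln L_{\text{model}}/\ln L_0\), where \(L_0\) is the likelihood of the intercept-only model, so that it equals 0 when all explanatory variables are set to zero; the data are simulated rather than LASI; survey weights are ignored; and the function names `logit_loglik` and `lri_table` are hypothetical. Setting a group's variables to zero is implemented by dropping their columns, which gives the same maximised likelihood.

```python
from itertools import combinations

import numpy as np


def logit_loglik(X, y, max_iter=50, tol=1e-10):
    """Maximised log-likelihood of a logistic regression with intercept (Newton-Raphson)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X1 @ coef))
        info = X1.T @ (X1 * (pi * (1.0 - pi))[:, None])
        step = np.linalg.solve(info, X1.T @ (y - pi))
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    pi = 1.0 / (1.0 + np.exp(-X1 @ coef))
    return float(np.sum(y * np.log(pi) + (1.0 - y) * np.log(1.0 - pi)))


def lri_table(X, y, groups):
    """Map each frozenset of groups kept in the model to its LRI, assumed here to be
    McFadden's index 1 - lnL(model) / lnL(intercept only).

    groups: dict mapping group name -> list of column indices of X.
    Excluded ('zeroed') groups are dropped from the design matrix before fitting.
    """
    p_bar = y.mean()
    ll_null = len(y) * (p_bar * np.log(p_bar) + (1.0 - p_bar) * np.log(1.0 - p_bar))
    names = list(groups)
    table = {frozenset(): 0.0}  # no explanatory variables: LRI = 0 by definition
    for size in range(1, len(names) + 1):
        for kept in combinations(names, size):
            cols = [c for g in kept for c in groups[g]]
            table[frozenset(kept)] = 1.0 - logit_loglik(X[:, cols], y) / ll_null
    return table


# Simulated (not LASI) data: six predictors split into four hypothetical groups
rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 6))
eta = -1.5 + X @ np.array([0.6, 0.3, 0.5, 0.0, 0.4, 0.7])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
groups = {"demographic": [0, 1], "socioeconomic": [2], "social_support": [3], "health": [4, 5]}

lri = lri_table(X, y, groups)
lri_1 = lri[frozenset(groups)]                     # full model (LRI_1)
lri_2 = lri[frozenset(groups) - {"demographic"}]   # demographic variables excluded (LRI_2)
```

Each key of the returned dictionary is the set of groups kept in the model, so the 16 entries cover the full model, every exclusion of one, two or three sets, and the empty model; a dictionary of this form is what a grouped Shapley calculation over the four sets requires.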