Palliative LHS Analysis

Virginia M. Miori

Data Ethics and Digital Privacy in Learning Health Systems for Palliative Medicine

ISBN: 978-1-80262-310-9, eISBN: 978-1-80262-309-3

ISSN: 2050-2060

Publication date: 15 November 2023

Abstract

Data mapping from synthesized data to palliative care characteristics was the final step before the final analysis of survival. Background and foundation for Kaplan-Meier curves are provided before generating curves for the three Palliative Care Groups. Interpretations of the Kaplan-Meier curves are presented along with interpretation of the associated Hazard Curves. Three statistical hypothesis tests, completed on a pairwise basis, are used to verify that the survival curves differ by group. Patients mapped to specific groups may be further supported through advice, counseling, and other services to assist them in moving to a more advantageous care group.

Citation

Miori, V.M. (2023), "Palliative LHS Analysis", Miori, V.M., Miori, D.J., Burton, F. and Cardamone, C.G. (Ed.) Data Ethics and Digital Privacy in Learning Health Systems for Palliative Medicine (Studies in Media and Communications, Vol. 23), Emerald Publishing Limited, Leeds, pp. 111-123. https://doi.org/10.1108/S2050-206020230000023008

Publisher

Emerald Publishing Limited

Copyright © 2024 Virginia M. Miori


As a reminder, the synthetic data was representative of the state of Massachusetts. In addition to census data and Centers for Disease Control (CDC) data, the Commonwealth Fund Biennial Health Insurance Survey, 2020 was also important in establishing a foundation (Collins, Gunja, & Aboulafia, 2020). The following findings were relevant:

  • The sample was gathered nationally, with 4,272 participants.

  • More than 43% of working-age adults had inadequate health insurance when the COVID-19 pandemic began.

  • The adult uninsured rate was 12.5%, with 9.5% having a gap in coverage, and 21.3% being underinsured.

  • Half of underinsured or uninsured adults were paying medical debt over time.

These findings shed additional light on the difficulties associated with delivering palliative care in an equitable fashion. The SYNTHEA data mapping process incorporates these findings through weighting of characteristics.

The Learning Health System (LHS) has the primary purpose of analyzing the combined EHR data and public data, such as the US Census American Community Survey and the Centers for Disease Control Behavioral Risk Factor Surveillance System. The two parts of the system (client and server) are described in detail in Chapter 10. The current chapter details the survival analysis and its interpretation as performed by the LHS.

Data Mapping to Characteristics of Care

The augmented synthetic data required mapping to the characteristics previously defined by Miori, Chalmers, and Miori (2021). The characteristics include advocacy, translation, education, employment, income as a percentage of poverty level, insurance coverage, overall health, and denial of need for one’s own care. This section does not provide mappings for advocacy, overall health, and denial of need for care, though we can draw conclusions about these factors based on the collected data. These three characteristics are truly individual-specific. They may take the form of answered questions and/or healthcare provider observations, and they will serve as inputs for the client side of the LHS.

  • Translation: The language variable acts as a limited surrogate for translation. It reflects only those synthesized patients who do not speak English in their home. To fully act as a surrogate, this variable may be revised based on a patient’s observed ability to assimilate the medical vocabulary (as delivered by their healthcare providers). The three levels of language skills are: English in Home/Fluent, Conversant, and Poor.

  • Education: The educational attainment variable was assigned using the empirical distributions developed from the PUMS data, as described in Chapter 6. For the LHS, three levels were identified: Incomplete Secondary, Secondary, and Post Secondary. Given that palliative needs tend to evolve more fully as patients age, the educational attainment levels heavily reflect trends from the 1960s to 1980s.

  • Employment: The employment status variable was also assigned using the empirical distributions developed in Chapter 6, along with the IRS definition of full-time work. Three levels were identified: Unemployed, Underemployed, and Well Employed/Retired. The synthetic patients were heavily weighted toward well employed and retired, moving patients closer to the highest level of palliative care.

  • Income: Income was represented by two variables, income bracket and income as a percentage of poverty level. Income estimate generation has only recently become available in SYNTHEA modules. Since the data was generated before this Income module became available, the PUMS data salaries were used. Each patient was assigned to a salary bracket based on sex, race, and age group. The brackets were summarized and analyzed to produce empirical distributions by demographic intersections of sex, race, and age group.

As is customary in the data mapping, the distributions were used to assign income to the synthesized patients. The second income variable was income as a percentage of poverty level. To calculate this value, the mean income for each bracket (in Table 8.1) was divided by $26,200, the published poverty level for families of four in 2020 (2020 Poverty Guidelines, 2023). Three levels of income were identified: 150% of poverty level or below, 150% to 250% of poverty level, and 250% or more of poverty level. Given the importance of income in this care level evaluation, collecting income level by patient would allow this estimating process to be replaced and would improve accuracy.

Table 8.1.

Mean Income Assignment for Percent of Poverty Level Calculation.

Bracket Mean Income Value
< $10K $5,000
$10K <= x < $15K $12,500
$15K <= x < $20K $17,500
$20K <= x < $25K $22,500
$25K <= x < $35K $30,000
$35K <= x < $50K $42,500
$50K <= x < $75K $62,500
>= $75K $135,000
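The percent-of-poverty calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's implementation: the bracket labels, function names, and the handling of values that fall exactly on 150% or 250% are assumptions, while the bracket means and the $26,200 poverty level come from the text and Table 8.1.

```python
# Bracket means from Table 8.1, in US dollars. The dictionary keys are
# illustrative labels, not field names from the chapter's data.
BRACKET_MEAN_INCOME = {
    "<10K": 5_000,
    "10K-15K": 12_500,
    "15K-20K": 17_500,
    "20K-25K": 22_500,
    "25K-35K": 30_000,
    "35K-50K": 42_500,
    "50K-75K": 62_500,
    ">=75K": 135_000,
}

# Published 2020 poverty level for a family of four, as used in the chapter.
POVERTY_LEVEL_2020 = 26_200


def percent_of_poverty(bracket: str) -> float:
    """Mean bracket income expressed as a percentage of the poverty level."""
    return 100.0 * BRACKET_MEAN_INCOME[bracket] / POVERTY_LEVEL_2020


def income_level(pct: float) -> int:
    """Map a percentage of poverty level to income code 1, 2, or 3.

    Assumption: exactly 150% falls in level 1 and exactly 250% in level 3;
    the chapter does not state how boundary values are treated.
    """
    if pct <= 150.0:
        return 1
    if pct < 250.0:
        return 2
    return 3


for bracket in BRACKET_MEAN_INCOME:
    pct = percent_of_poverty(bracket)
    print(f"{bracket:>8}: {pct:6.1f}% of poverty level -> income code {income_level(pct)}")
```

Under these assumptions the five lowest brackets map to income code 1, the next two to code 2, and the top bracket to code 3.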

  • Insurance: Insurance is represented by three levels: uninsured, underinsured, and insured. The definition of uninsured is simple to understand, but the breakpoint between insured and underinsured is much less clear. Commonly, a patient who earns more than poverty level is considered to be underinsured if annual health care costs exceed 10% of their annual income. Patients living at or below poverty level are considered to be underinsured if annual health care costs exceed anywhere from 5% to 7.5% of their annual income.

Within the synthetic patient data, there is a field for lifetime medical expenses, and a second field for lifetime covered expenses. Each of these values was divided by the patient age to get annual values. The difference between these values was calculated and divided by that patient’s estimated annual income. The resulting percentage was compared to 10% and 7.5% respectively for synthesized patients above poverty level and those at or below poverty level. Patients with medical expenses exceeding these thresholds are considered to be underinsured.
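The underinsurance check in the preceding paragraph might look like the following sketch. The function and parameter names are hypothetical, and the `has_coverage` flag is an assumption: the chapter does not say how uninsured patients were identified in the synthetic data.

```python
def insurance_level(
    lifetime_expenses: float,
    lifetime_covered: float,
    age_years: float,
    annual_income: float,
    poverty_level: float = 26_200,
    has_coverage: bool = True,  # assumption: supplied by some other field
) -> int:
    """Return insurance code 1 (uninsured), 2 (underinsured), or 3 (insured)."""
    if not has_coverage:
        return 1
    if age_years <= 0 or annual_income <= 0:
        raise ValueError("age and income must be positive to annualize costs")

    # Annualize both lifetime totals by dividing by age, then take the
    # difference: the yearly medical cost not covered by insurance.
    annual_uncovered = lifetime_expenses / age_years - lifetime_covered / age_years

    # Share of estimated annual income consumed by uncovered costs.
    share = annual_uncovered / annual_income

    # 10% threshold above poverty level, 7.5% at or below it (per the text).
    threshold = 0.10 if annual_income > poverty_level else 0.075
    return 2 if share > threshold else 3


# Illustrative values only: $3,000 per year uncovered.
print(insurance_level(300_000, 120_000, 60, 42_500))  # about 7.1% of income
print(insurance_level(300_000, 120_000, 60, 22_500))  # about 13.3% of income
```

Keeping the thresholds as named values in one place makes it straightforward to test the 5% lower bound mentioned in the text as an alternative to 7.5%.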

Kaplan-Meier Curve Background

Data that examines end event alternatives, like survival data, poses a challenge in analysis. Since not all members of the population (or sample) will have experienced the end event, traditional techniques like logit and probit models are not applicable. These approaches provide valuable insights into the odds of events and ranges of categories (both meaningful in this setting), but they rely on fully populated data.

The Kaplan-Meier curve was introduced in 1958 to accommodate survival modeling (Rich et al., 2010). It allows for incomplete data, which in this research is represented by palliative patients who continue to survive. The approach is limited to a single factor that differs between population groups, but this limitation is overcome by grouping the patients as presented in Table 8.2.

Table 8.2.

Group Characteristics for Palliative Care.

Characteristic                Group 1                    Group 2                        Group 3
Patient Goal Status           Poorly Defined or Absent   Possible Discussion of Goals   Well Defined Goals
Patient Care Status           Absence of Care            Adequate Care                  Exceptional Care
Advocacy                      No                         Yes                            Yes
Translation                   Poor                       Conversant                     Fluent
Insurance                     Uninsured                  Underinsured                   Insured
Education                     Incomplete Secondary       Secondary                      Post Secondary
Employment                    Unemployed                 Underemployed                  Well Employed/Retired
% of Poverty Level            0%-150%                    150%-250%                      >250%
Overall Health                Poor                       Good                           Excellent
Denial of Need for Own Care   Yes                        No                             No

To use Kaplan-Meier curves, three variables must be defined. The first is outcome; in the case of this research, outcome was designated as “Alive” or “Deceased.” This information was articulated by the death date field, which contained either a null value or a date. Patients with a null value in this field were interpreted as “Alive” and those with a non-null date were interpreted as “Deceased.”

The second variable represents the time until an end event occurs. To determine this time, the intervention, or the point in time at which treatment began, had to be identified. Since the patients were viewed as palliative when they exhibited three or more chronic conditions (comorbidities), the diagnosis date for the third condition was identified as the intervention date, the date representing the beginning of treatment. The necessary calculation was the serial (secular) time variable, calculated as the time to the end event. Using date fields and a date subtraction function, the time to the end event (death in this case) was calculated as the difference between a patient’s death date and intervention date or, in the case of surviving patients, as the difference between the data collection date and the intervention date. The date subtraction function yielded a result in days. This number was converted to months using 365.2422 days, the true length of a year on Earth, divided by 12 months (Manning, 2023), which gives an average month length of 30.437 days.
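As an illustration of the outcome and time variables just described, the following Python sketch derives both from three dates. The function and argument names are hypothetical; only the null-death-date rule and the 365.2422 / 12 conversion are taken from the text.

```python
from datetime import date
from typing import Optional, Tuple

# Average month length used in the chapter: 365.2422 / 12, about 30.437 days.
DAYS_PER_MONTH = 365.2422 / 12


def time_to_event_months(
    intervention: date,
    death: Optional[date],
    collection: date,
) -> Tuple[float, bool]:
    """Return (months from intervention to end point, died flag).

    A missing death date means the patient is treated as alive, and the
    data collection date is used as the end point instead.
    """
    died = death is not None
    end = death if died else collection
    days = (end - intervention).days
    return days / DAYS_PER_MONTH, died


# Deceased patient: third comorbidity diagnosed 2015-01-01, died 2016-01-01.
print(time_to_event_months(date(2015, 1, 1), date(2016, 1, 1), date(2020, 12, 31)))

# Surviving patient: time runs to the data collection date.
print(time_to_event_months(date(2015, 1, 1), None, date(2020, 12, 31)))
```

A patient whose intervention date coincides with the death date yields a time of 0 months, which is the case discussed in the next paragraph.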

In some cases, the intervention/treatment date was coincident with the patient’s death date. These observations remain in the data, but further research and evaluation are warranted on this choice. It could be effectively argued that these patients would not be considered palliative. Inclusion or exclusion of these patients impacts the shape and progression of the survival curves.

Finally, the group or stratification assignment populates the third variable. In clinical studies, the group is simply a representation of the experimental group into which a participant was randomly assigned. For the palliative patients, the group assignments require greater complexity to yield meaningful and applicable results.

Group Assignment

The characteristics in Table 8.2 were used to map synthesized patients to Groups 1–3 using the coded variables for education, employment, income, insurance, and translation as discussed in detail in Chapter 7. For the mapping, each level of each characteristic was numbered from one to three with one being the least advantaged level of the variable. For example, unemployed was assigned a code of 1, underemployed was assigned a code of 2, and well employed/retired was assigned a code of 3. The complete variable definitions are contained in the Appendix.

Interactions between these variables are intuitively apparent and important to understand. As an illustration, insurance, employment, and income may have multiple interactions. Public insurance (Medicare and Medicaid) can be more comprehensive in coverage than some private insurance purchased through the health marketplace. It is therefore useful to evaluate statistical significance among the pairwise coded variables.

Cross-tabulation of the data was completed with subsequent chi-square tests. Table 8.3 shows a sample cross-tabulation between income level and insurance status, and Table 8.4 provides the associated chi-square test output. Table 8.5 summarizes the chi-square test results for all of the coded variables. With a p-value less than our chosen alpha of 0.05, we conclude that the variable distributions differ from each other. In cases of insignificance, like the pairing of Employment Status and Insurance Status, no difference between the variable distributions is concluded to exist. Variables may therefore overlap and impact more than one of the categories in Table 8.2.

Table 8.3.

Income Level and Insurance Status Cross-Tabulation.

                 Insurance Code
                 1       2       3       Total
Income Code 1    569     8602    221     9392
Income Code 2    408     6522    407     7337
Income Code 3    487     7309    955     8751
Total            1464    22433   1583    25480

Table 8.4.

Chi-Square Results for Income Level by Insurance Status.

Chi-Square Tests
                               Value         df   Asymptotic Significance (2-sided)
Pearson Chi-Square             578.486 (a)   4    <.001
Likelihood Ratio               591.682       4    <.001
Linear-by-Linear Association   309.476       1    <.001
N of Valid Cases               25480

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 421.56.
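The Pearson chi-square statistic can be recomputed directly from the counts in Table 8.3. The sketch below uses only the Python standard library and is not the statistical software that produced the chapter's output; it is meant to show how the expected counts, the statistic, and the degrees of freedom are obtained.

```python
# Observed counts from Table 8.3: rows are income codes 1-3,
# columns are insurance codes 1-3.
observed = [
    [569, 8602, 221],
    [408, 6522, 407],
    [487, 7309, 955],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

dof = (len(row_totals) - 1) * (len(col_totals) - 1)
min_expected = min(min(row) for row in expected)

print(f"N = {n}, df = {dof}")
print(f"Pearson chi-square = {chi_square:.3f}")
print(f"Minimum expected count = {min_expected:.2f}")
```

The result should agree with the Pearson value and the minimum expected count of 421.56 reported with Table 8.4, which also confirms that the test output belongs to the income by insurance cross-tabulation.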

Table 8.5.

Chi-Square Results Summary.

Variable 1 Variable 2 Significance Level
Employment Education 0.041
Employment Income <0.001
Employment Insurance 0.874
Employment Translation <0.001
Education Income <0.001
Education Insurance 0.680
Education Translation 0.388
Income Insurance <0.001
Income Translation <0.001
Insurance Translation 0.038

The overlapping nature of the variables supports the choice of examining the variables in combination, rather than individually. This approach yields a powerful impact in the group assignments. The coded variables or characteristics are weighted based on their relative level of importance in the assignment of groups. The weights assigned for this round of analysis were: Education (0.25), Translation (0.20), Employment (0.125), Income % Poverty (0.125), and Insurance (0.30). In general, the weights and ranges should always be adjusted to reflect the priorities placed on the variables by healthcare practitioners. They should also be reflective of the most relevant demographics and/or patient populations. The selection of these weight values was intended, first and foremost, to achieve a proof of concept while producing a fairly accurate reflection of the broader population. The weights are easily adjusted to reflect changes or differences in priorities and importance. The weighted scores were accumulated and groups were assigned based on ranges:

  • Group 1 [0.0, 2.2),

  • Group 2 [2.2, 2.6), and

  • Group 3 [2.6, 3.0].

The final group assignments placed 47.6% of synthesized patients in Group 1, 29.4% of patients in Group 2, and 22.9% of patients in Group 3.
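The weighted scoring and range-based assignment can be sketched as follows. The dictionary keys and function names are illustrative, and the use of `Decimal` is a design choice of this sketch (it keeps scores that land exactly on 2.2 or 2.6 from being misplaced by floating-point rounding), not something stated in the chapter.

```python
from decimal import Decimal

# Weights from the text; they sum to 1.0, so scores range from 1 to 3.
WEIGHTS = {
    "education": Decimal("0.25"),
    "translation": Decimal("0.20"),
    "employment": Decimal("0.125"),
    "income": Decimal("0.125"),
    "insurance": Decimal("0.30"),
}


def care_score(codes: dict) -> Decimal:
    """Weighted sum of the five coded characteristics (each coded 1, 2, or 3)."""
    return sum(WEIGHTS[name] * codes[name] for name in WEIGHTS)


def care_group(score: Decimal) -> int:
    """Assign Group 1 [0.0, 2.2), Group 2 [2.2, 2.6), or Group 3 [2.6, 3.0]."""
    if score < Decimal("2.2"):
        return 1
    if score < Decimal("2.6"):
        return 2
    return 3


# Hypothetical patient: strong education and language skills, but
# unemployed, low income, and underinsured.
patient = {"education": 3, "translation": 3, "employment": 1, "income": 1, "insurance": 2}
score = care_score(patient)
print(score, care_group(score))
```

Because the weights and cut points live in two small definitions, a practitioner-driven change in priorities only requires editing those values.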

Rather than allowing an LHS to substitute an analytics judgment, the flexibility inherent in the weighting strategies allows physicians to use their own nuanced and experience-based judgment in diagnosing patients. With the groups assigned, survival curves could be generated.

Kaplan-Meier Curve Generation

Before generating the Kaplan-Meier curves, understanding the implications of the incomplete data was critical. Based on the nature of progression of palliative care, a frame of seven years was chosen as the maximum for the analysis. From the point of the diagnosis of the third comorbidity, seven years allows for the lifespan of the vast majority of patients. This is equivalent in nature to setting this time as the duration of a study.

During any study, participants or patients may depart the study. Patients may elect to leave treatment, or seek alternative medical care, and fall out of the analysis. In addition, patients lacking an end event date are also considered to have dropped off. These patients are censored to eliminate bias between groups based on inconsistent departure proportions. The Kaplan-Meier analysis assumes that censored patients follow the same progression as patients who have reached the end event, or death in this analysis.
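To make the role of censoring concrete, the following sketch implements the Kaplan-Meier product-limit estimate, S(t) = Π (1 − d_i / n_i), in plain Python. It is an illustration under assumptions, not the chapter's software: the function names are hypothetical, and treating patients who survive past 84 months as censored at 84 months is an interpretation of the seven-year frame described above.

```python
from collections import Counter


def censor_at(times, events, horizon=84.0):
    """Cap follow-up at the horizon; later events become censored observations."""
    capped_times = [min(t, horizon) for t in times]
    capped_events = [bool(e) and t <= horizon for t, e in zip(times, events)]
    return capped_times, capped_events


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where a death occurred.

    times  : months from intervention to death or censoring
    events : True if the patient died, False if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths plus censored patients at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t are still counted as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Made-up follow-up times in months, for illustration only.
times = [0.0, 5.0, 5.0, 12.0, 20.0, 30.0, 90.0]
events = [True, True, False, True, False, True, True]
for t, s in kaplan_meier(*censor_at(times, events)):
    print(f"{t:5.1f} months: S(t) = {s:.3f}")
```

Censored patients reduce the number at risk for later time points but never trigger a drop in the curve, which is how surviving patients contribute information without a death date.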

The Log-Rank test, also known as the Mantel-Cox test, is used when groups are considered to be independent, but is robust against departures from proportional hazards. The Gehan-Breslow-Wilcoxon test requires that one group have a consistently higher risk than another. If two curves cross, then the relationship switches, but the assumption is still met. The Tarone-Ware test places heavier weight on risks in the earlier time periods, and is a variant of the Log-Rank test. In all three cases, the null and alternative hypotheses are:

H0. There are no differences between the populations in Groups 1–3.

H1. Differences exist between the populations in Groups 1–3.
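The three tests share one structure and differ only in how each event time is weighted: the Log-Rank test uses a weight of 1, the Breslow (generalized Wilcoxon) test uses the number at risk n, and the Tarone-Ware test uses the square root of n. The sketch below shows that structure for a single pair of groups. It is a simplified standard-library illustration with hypothetical names, not the software used to produce the chapter's tables.

```python
import math


def p_value_1df(chi_square: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


def weighted_logrank(times_a, events_a, times_b, events_b, weight="logrank"):
    """Two-group weighted log-rank test; returns (chi-square, p-value).

    weight: "logrank" (w = 1), "breslow" (w = n), or "tarone-ware" (w = sqrt(n)),
    where n is the total number of patients at risk at each event time.
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    score = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        if n < 2:
            continue  # no variance contribution from a single patient
        w = {"logrank": 1.0, "breslow": float(n), "tarone-ware": math.sqrt(n)}[weight]
        # Observed minus expected deaths in group A under the null hypothesis.
        score += w * (d_a - d * n_a / n)
        # Hypergeometric variance of the number of deaths in group A.
        variance += w * w * d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi_square = score * score / variance if variance > 0 else 0.0
    return chi_square, p_value_1df(chi_square)


# Made-up data for illustration: group A dies earlier than group B.
a_times, a_events = [1, 2, 3, 4, 5], [True] * 5
b_times, b_events = [6, 7, 8, 9, 10], [True] * 5
for name in ("logrank", "breslow", "tarone-ware"):
    chi2, p = weighted_logrank(a_times, a_events, b_times, b_events, weight=name)
    print(f"{name:12s} chi-square = {chi2:.3f}, p = {p:.4f}")
```

Applying the function to each pair of groups (1 vs. 2, 1 vs. 3, 2 vs. 3) gives the kind of pairwise comparison used to evaluate the hypotheses above.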

As a conservative approach, the results of all three tests are presented along with pairwise comparison tests. The means and medians for survival time are presented in Table 8.6, along with confidence intervals for both. The pairwise tests are provided in Table 8.7.

Table 8.6.

Mean and Median Output from the Kaplan-Meier Curve Generation.

Means and Medians for Survival Time
          Mean (a)                                              Median
                                  95% Confidence Interval                             95% Confidence Interval
Group     Estimate   Std. Error   Lower Bound   Upper Bound     Estimate   Std. Error   Lower Bound   Upper Bound
1         23.727     .396         22.950        24.503          13.700     .968         11.804        15.597
2         24.951     .394         24.17         25.724          16.329     .825         14.711        17.947
3         27.548     .430         26.705        28.391          20.896     .900         19.132        22.659
Overall   25.358     .235         24.897        25.818          17.084     .518         16.069        18.100

a. Estimation is limited to the largest survival time if it is censored.

Table 8.7.

Pairwise Tests on Kaplan-Meier Curves.

Pairwise Comparisons
                                         Group 1              Group 2              Group 3
Test                             Group   Chi-Square   Sig.    Chi-Square   Sig.    Chi-Square   Sig.
Log Rank (Mantel-Cox)            1       -            -       6.664        .010    48.238       <0.01
                                 2       6.664        .010    -            -       20.035       <0.01
                                 3       48.238       <0.01   20.035       <0.01   -            -
Breslow (Generalized Wilcoxon)   1       -            -       5.596        .018    41.958       <0.01
                                 2       5.596        .018    -            -       18.037       <0.01
                                 3       41.958       <0.01   18.037       <0.01   -            -
Tarone-Ware                      1       -            -       6.003        .014    46.720       <0.01
                                 2       6.003        .014    -            -       20.324       <0.01
                                 3       46.720       <0.01   20.324       <0.01   -            -

The estimated mean survival time for patients in Group 1 was 23.727 months, for patients in Group 2 it was 24.951 months, and for patients in Group 3 it was 27.548 months. All pairs tested as significantly different using all three tests, leading to the rejection of the null hypothesis and the conclusion that the Kaplan-Meier curves are distinct, with statistically significant differences.

The Kaplan-Meier curves are provided in Fig. 8.1. The vertical axis is labeled as the cumulative survival. In the case of palliative patients, this axis may be interpreted as the proportion of quality of life. As their conditions worsen, and Geriatric Syndromes begin to have an impact, the quality of life decreases. Patients will be less capable both physically and emotionally, with potential mental health issues such as depression appearing. Over the time of their palliative (non-curative) treatment, their quality of life will continue to decline until the end of life. The horizontal axis represents the time in months from diagnosis as a palliative patient until end of life. The observations end at 84 months or 7 years. Note that the immediate drop in quality of life at the 0-month mark is due to the patients whose dates of diagnosis for the third comorbidity and death were coincident.

Fig. 8.1. Kaplan-Meier Curves for Palliative Population Groups 1–3.

The resulting Hazard Functions are provided in Fig. 8.2. The hazard rate is defined as the probability that an event (end of life in this case) will occur at any point in time. The vertical axis is the probability of life ending for palliative patients in each of the three groups. By the end of the seventh year of palliative care, the probability of life ending exceeds 0.8, or 80%.

Fig. 8.2. Hazard Rates for Palliative Patients in Groups 1–3.

Use of Kaplan-Meier Curves

The Kaplan-Meier curves and associated outputs confirm the differences between Groups 1–3, and confirm that quality of life and life expectancy are improved with higher quality care. The purpose of generating these curves and presenting them with the LHS is to demonstrate the potential improvements in quality of life, and the potential length of life, that may be achieved with changes in care.

In addition to the variables obtained through the connected Electronic Health Record systems, the client side of the LHS will capture additional observational data such as advocacy, translation, denial of need for care, and overall health. These observed characteristics are factored into the determination of the care group.

  • Advocacy: Advocacy refers to the level of championing that a patient has. It can take the form of participation at medical appointments, note taking, seeking referrals and references, and, overall, shouldering responsibilities and alleviating stress for the patient.

  • Translation: We spoke of translation as it applied to English language acquisition, but an additional component of translation is whether a patient (or their advocate) has the ability to assimilate and fully understand the medical language associated with the patient’s care. The lack of translation ability may come from many different sources including education, field of occupation, and English language acquisition.

  • Denial of Need for Care: Patient care is only as effective as the follow through on the side of the patient. If someone denies the need for their own care, they will not be participating in any constructive means to improve their quality of life. Denial is very fear based, and difficult to counter. It is an important component of determining quality of life and life expectancy.

  • Overall Health: As determined by a medical provider, a characterization of overall health is a meaningful determinant of the ability to sustain quality of life.

The analysis presented in this volume did not reflect these observed factors. The LHS does however use this information to further refine the Kaplan-Meier curves. As a future enhancement to the LHS, Sentiment Analysis and Aspect Analysis may be applied to chart notes and make additional contributions to thorough, yet concise patient histories. Sentiment analysis numerically characterizes positive, negative, neutral, and mixed sentiments in chart notes. Aspect analysis numerically characterizes the topic areas in the chart notes. Collectively, they build a complete picture of this unstructured patient data.

It is worth again emphasizing that the purpose of the Palliative LHS is to assist in healthcare decision making. It is not intended to be a diagnostic tool. It does however help to identify factors that may shift patients to higher performing care groups. An LHS will not alleviate resource challenges that exist in many hospitals and health care facilities, but it can identify concrete steps to improve care, particularly as the steps relate to advocacy, emotional/mental health support, and connection to available resources outside of the healthcare facility.

Appendix

Education

Table 8.8.

Education Levels.

Education                  Level                  #
Never attended/Kg only     Incomplete Secondary   1
Grades 1–12 No Diploma     Incomplete Secondary   1
HS Diploma or GED          Secondary              2
Some College No Degree     Secondary              2
Associate Degree           Post Secondary         3
Bachelor's Degree          Post Secondary         3
Master's Degree            Post Secondary         3
Terminal Degree            Post Secondary         3

Translation

Table 8.9.

Translation Levels.

Translation
English in Home 3
Fluent 3
Conversant 2
Poor 1

Employment

Table 8.10.

Employment Levels.

Employment
Unemployed 1
Underemployed 2
Well Employed/Retired 3

Income

Table 8.11.

Income Ranges.

Income
<10K 5000
10K <= x <15K 12500
15K <= x <20K 17500
20K <= x <25K 22500
25K <= x <35K 30000
35K <= x <50K 42500
50K <= x <75K 62500
>=75K 135000

Table 8.12.

Income Levels.

Income as % of Poverty Level
0-150% 1
150-250% 2
250% and up 3

Insurance

Table 8.13.

Insurance Levels.

Insurance
Uninsured 1
Underinsured 2
Insured 3

Age Group

Table 8.14.

Age Groups.

Age Group
1 under 18
2 18 to 24
3 25 to 29
4 30 to 34
5 35 to 39
6 40 to 44
7 45 to 49
8 50 to 54
9 55 to 59
10 60 to 64
11 65 and over

References

2020 Poverty Guidelines. U.S. Federal Poverty Guidelines Used to Determine Financial Eligibility for Certain Federal Programs. (2023, April 4). Retrieved from https://aspe.hhs.gov/topics/poverty-economic-mobility/poverty-guidelines/prior-hhs-poverty-guidelines-federal-register-references/2020-poverty-guidelines

Collins, S., Gunja, M., & Aboulafia, G. (2020, August 19). U.S. Health Insurance Coverage in 2020: A looming crisis in affordability. Findings from the Commonwealth Fund Biennial Health Insurance Survey, 2020. Retrieved from https://www.commonwealthfund.org/publications/issue-briefs/2020/aug/looming-crisis-health-coverage-2020-biennial

Manning, M. (2023, April 24). How many days are in a year? Retrieved from https://pumas.nasa.gov/sites/default/files/examples/04_21_97_1.pdf

Miori, V., Chalmers, K., & Miori, D. (2021). Ethical data stratification and analysis to eliminate unconscious bias in learning health systems for palliative medicine. Journal of Healthcare Ethics & Administration, 7(2), 115. https://www.jheaonline.org/pdf/Miori_Jhea.1.71628.pdf

Rich, J. T., Neely, J. G., Paniello, R. C., Voelker, C. C., Nussenbaum, B., & Wang, E. W. (2010). A practical guide to understanding Kaplan-Meier curves. Otolaryngology–Head and Neck Surgery, 143(3), 331–336. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3932959/