Questioning the QALY: A closer look at the Quality-Adjusted Life-Year

March 10, 2021


By Nina Shevzov-Zebrun

Peer Reviewed

“How much would I have to pay you to lick the subway floor?”

“How much money would it take for you to wear an inflatable T-Rex costume to school?”

Growing up, my sister and I thrived on this game—the game of assigning value to various actions and states of being. So, naturally, when I came across the QALY,1 I had to learn more.

The quality-adjusted life-year, or QALY, is a metric used to evaluate the effectiveness of health interventions in a standardized fashion, with the broader goal of optimizing healthcare resource allocation. Seeking to integrate “the bio-medical and the psycho-social,” the QALY is defined as time × utility, where time is the additional years of life a given intervention affords, and utility is the quality of that time (in terms of morbidity) relative to the same time spent in “perfect health.”2 In this way, QALYs assign worth to life in various states of fitness to determine interventions’ relative cost-effectiveness.3

The Institute for Clinical and Economic Review values one QALY in the U.S. at roughly $100,000.4 That is the value, or break-even point, of one year lived in perfect health, two years lived in “half-perfect” health, and so on.
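The arithmetic above can be made concrete with a minimal sketch. It assumes the simple time × utility definition, a utility scale running from 0 (death) to 1 (perfect health), and the approximate $100,000 figure cited above; the function names are illustrative and are not drawn from any published tool.

```python
# Approximate U.S. value of one QALY cited in the text (an assumption, not a fixed constant).
VALUE_PER_QALY_USD = 100_000


def qalys(years: float, utility: float) -> float:
    """Return QALYs as time x utility, with utility between 0 (death) and 1 (perfect health)."""
    if years < 0:
        raise ValueError("years must be non-negative")
    if not 0.0 <= utility <= 1.0:
        raise ValueError("utility must lie between 0 and 1")
    return years * utility


def monetary_value(years: float, utility: float,
                   value_per_qaly: float = VALUE_PER_QALY_USD) -> float:
    """Convert a QALY total into dollars at the assumed value per QALY."""
    return qalys(years, utility) * value_per_qaly


one_perfect_year = qalys(1, 1.0)   # one year in perfect health
two_half_years = qalys(2, 0.5)     # two years in "half-perfect" health

print(one_perfect_year, two_half_years)  # 1.0 1.0
print(monetary_value(2, 0.5))            # 100000.0
```

Under this definition the two scenarios are indistinguishable: each yields exactly one QALY, which is the equivalence that several of the criticisms discussed below call into question.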

As would any measure condensing the complexity of human experience, the QALY has come under scrutiny for a variety of ethical, philosophical, mathematical, and economic reasons. Here, in this brief look at a massive topic, we will:

  • Review the history of the QALY and its derivation
  • Recap its common criticisms
  • Discuss a study probing the QALY’s validity
  • Broaden our discussion to questions of “population-level” versus “individual-level” health, which the QALY brings to the fore.

The QALY first appeared in the literature in the 1970s. According to Zeckhauser and Shepard, the authors credited as first using the acronym, the central question behind the QALY is: “where should we spend whose money to undertake what program to save which lives with what probability?”5 They outline the benefits of “bringing lifesaving discussion out into the open,” touting the potential of the QALY and similar indices to align policy “more closely with the valuation of the members of society.”5

In addition to detailing its societal benefits, early papers lay out the QALY’s philosophical-mathematical underpinnings. Then and now, the World Health Organization defines health as “a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity.”6 Accordingly, QALY developers defined health as a “three-dimensional” space with axes of physical, emotional, and social function, and utility as the relative value of any point in that space “as perceived by society” (specifically, an “appropriate sample of subjects from the population of interest”).6 Utility, they argue, while unique across individuals and life stages, can and should be aggregated into a population-level index to maximize overall “expected utility gain.”3,6,7 Death is thus assigned a utility of 0 and perfect health a utility of 1 in the original model.6 This 0-1 index range persists today in QALY calculations, ostensibly allowing for comparison of vastly different disease states and of the cost-effectiveness of potential interventions: comparison, for instance, of the relative “bang for the buck” of a new hypertension drug versus a novel breast cancer screening program.8
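To illustrate how a shared 0-1 utility scale permits this kind of “bang for the buck” comparison, here is a rough sketch that ranks two interventions by cost per QALY gained. Every number is hypothetical and chosen only to make the arithmetic easy to follow, and the `Intervention` class is an illustrative construct. Real analyses typically compare each intervention against an alternative (an incremental ratio) rather than in isolation.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    name: str
    cost_usd: float       # hypothetical cost per patient
    years_gained: float   # hypothetical additional life-years
    utility: float        # quality weight: 0 (death) to 1 (perfect health)

    @property
    def qalys_gained(self) -> float:
        # QALYs = time x utility
        return self.years_gained * self.utility

    @property
    def cost_per_qaly(self) -> float:
        # Treat an intervention that yields no QALYs as infinitely expensive per QALY.
        if self.qalys_gained == 0:
            return float("inf")
        return self.cost_usd / self.qalys_gained


# Illustrative values only; not taken from any study.
options = [
    Intervention("screening program", cost_usd=30_000, years_gained=1.0, utility=0.5),
    Intervention("hypertension drug", cost_usd=20_000, years_gained=2.0, utility=0.8),
]

# Lower cost per QALY means more health gained per dollar.
ranked = sorted(options, key=lambda o: o.cost_per_qaly)
for o in ranked:
    print(f"{o.name}: {o.qalys_gained:.2f} QALYs, ${o.cost_per_qaly:,.0f} per QALY")
```

With these made-up inputs, the drug comes out at $12,500 per QALY and the screening program at $60,000 per QALY, so the drug ranks first. The ranking depends entirely on the utility weights supplied, which is why the way those weights are derived matters so much.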

With this background, we progress to key criticisms of the QALY, divided into humanistic and mathematical-methodological domains. Humanistically, criticisms include:2

  • The model’s absolute valuation of certain lives over others (eg, wheelchair-bound lives treated as worth less and assigned a lower utility value)
  • The model’s overly coarse utilitarian nature, with a QALY carrying the same “value” for individuals across lifespans, locations, and cultures
  • The model’s potential to rigidify medical options and decision-making, preventing providers from individualizing treatment

From a mathematical-methodological perspective, criticisms include:2,9

  • The model’s assumption that death is the lowest utility state possible—might not some disease states arguably be worse than death?
  • The mismatch between multiplied “life-year” and “utility” units, and thus the QALY’s mathematical incoherence (in ways too complex to detail here)
  • The unreliability, obscurity, and bias of techniques used to determine the utility values of various health states

In part as a response to these and other criticisms, ten years ago the European Consortium in Healthcare Outcomes and Cost-Benefit Research (ECHOUTCOME) initiated a study to test the validity of key assumptions behind the QALY—including the assumption that time and utility are 100% independent, and that inclination to sacrifice or fight for life years remains constant over time.3

Conducted across Belgium, France, Italy, and the UK, the study asked more than 1000 participants about their preferences for various combinations of health states and times lived in them.3 Notably, only 70% of individuals displayed “consistent preferences” for health states (specifically, preference for more, rather than fewer, years in a wheelchair). Overall, the responses of this subgroup demonstrated that QALY-calculated utility values fail to align reliably with experimentally observed ones.3 While the study was, by its authors’ own admission, flawed in a number of ways (including likely selection bias, given that all participants were relatively young academics), it nevertheless calls into question the widespread use of QALYs in high-stakes decision-making. Are QALYs really, as proponents argue, the “best method available” to compare and prioritize health interventions?3

In this final section, we step back to consider a central tension underlying the QALY debate: the tension between “individual-level” and “population-level” health, or, put differently, the balance between addressing the needs of the “few,” however rare or extraordinary (eg, rare pediatric brain tumors), and the common plights of the statistically aggregated “many” (eg, hypertension). These two approaches to health are often presented as diametrically opposed. Physician-scholar Onyebuchi Arah calls this the “if-it’s-not-individual-it’s-collective” conception of health policy, in which choices made for the population cannot be trusted to consistently benefit independent individuals (and vice versa).10 The standing recommendation to start screening individuals for colon cancer at age 50, for example, misses the few individuals affected at 38.11 Broadening guidelines to screen at 38, however, arguably wastes resources better allocated elsewhere at the collective level. According to Arah, this supposed opposition is a false one: population and individual health are inextricably intertwined, and can only be understood in the context of one another.10

Metrics like the QALY that wrestle with the complexity of such socio-medical tensions—and attempt to quantify what many consider unquantifiable in hopes of bridging the individual-population health gap—have impact on innumerable lives. It is imperative that providers, policy makers, and others across the health stewardship spectrum consider the intricacies of such metrics when making decisions at any level—for the one, or for the many.

Nina Shevzov-Zebrun is a third-year medical student at NYU Grossman School of Medicine

Peer reviewed by Michael Tanner, MD, associate editor, Clinical Correlations

Image courtesy of Wikimedia Commons

References

  1. Cohen J. A QALY Is A QALY Is A QALY, Or Is It? Forbes Healthcare. https://www.forbes.com/sites/joshuacohen/2019/01/14/a-qaly-is-a-qaly-is-a-qaly-or-is-it/?sh=76685115496a January 14, 2019. Accessed October 22, 2020.
  2. Pettitt DA, Raza S, Naughton B, et al. The limitations of QALY: a literature review. J Stem Cell Res Ther. 2016;6:4.
  3. Beresniak A, Medina-Lara A, Auray JP, et al. Validation of the underlying assumptions of the quality-adjusted life-years outcome: results from the ECHOUTCOME European project. Pharmacoeconomics. 2015;33(1):61-69.
  4. Smith WS. The U.S. shouldn’t use the ‘QALY’ in drug cost-effectiveness reviews. Stat News. statnews.com/2019/02/22/qaly-drug-effectiveness-reviews. February 22, 2019. Accessed October 22, 2020.
  5. Zeckhauser R, Shepard D. Where Now for Saving Lives? Law and Contemporary Problems. 1976;40:5-45.
  6. Torrance GW, Thomas WH, Sackett DL. A utility maximization model for evaluation of health care programs. Health Serv Res. 1972;7(2):118-133.
  7. Prokop D. Von Neumann–Morgenstern utility function. Encyclopædia Britannica. 2016. Accessed October 22, 2020. britannica.com/topic/von-Neumann-Morgenstern-utility-function.
  8. Holekamp NM, Duff SB, Rajput Y. Quality-adjusted life years for the retina specialist. Retina Today. Accessed October 22, 2020. https://retinatoday.com/articles/2016-mar/quality-adjusted-life-years-for-the-retina-specialist
  9. Prieto L, Sacristán JA. Problems and solutions in calculating quality-adjusted life years (QALYs). Health Qual Life Outcomes. 2003;1:80.
  10. Arah OA. On the relationship between individual and population health. Med Health Care Philos. 2009;12(3):235-244.
  11. Final Recommendation Statement: Screening for Colorectal Cancer. U.S. Preventive Services Task Force. June 15, 2016. Accessed October 22, 2020. www.uspreventiveservicestaskforce.org/uspstf/document/RecommendationStatementFinal/colorectal-cancer-screening.