Supervisor: Megan Coylewright
Cardiovascular (CV) clinical trials often use composite endpoints to provide an adequate assessment of a new treatment strategy whilst reducing sample size and saving costs 1. Using composite endpoints, however, complicates sample size estimation and the interpretation of results. The composite endpoint is often defined as a time-to-first-event variable, with each component considered an event of interest. Conventional methods, including the log-rank test, the Cox proportional hazards model and the Kaplan-Meier plot, treat all components of a composite endpoint as if they were of equal importance and often take into account only the first component event to occur. In CV clinical trials, different endpoints commonly differ in clinical importance, so these methods may lead to a loss of information.
Stolker et al. (2014) administered surveys to cardiovascular patients and clinical trialists 2. Participants were asked to distribute 25 “spending weights” across 5 common adverse events comprising composite endpoints in CV trials: death, myocardial infarction (MI), stroke, coronary revascularization, and hospitalization for angina. Patients and trialists placed greater emphasis on “hard” CV events (death, MI, stroke) than on repeated revascularization or hospitalization. Among “hard” CV events, trialists considered reducing the risk of death more important than reducing the risk of MI or stroke, whereas patients viewed avoiding death, MI and stroke as equally important. The findings suggest that patients and trialists have different priorities, and that equal weights in a composite endpoint may not reflect the values of either patients or trialists. It is important to identify the order of clinical priorities during the design phase of a clinical trial, and to take into account patient preferences and values during the decision-making process. In addition, how patients weighted clinical endpoints was found to be related to age, race, and household income. It might therefore be necessary to understand the heterogeneity in patient preferences and to conduct benefit-risk assessments by subgroup. However, this survey study has several noteworthy limitations. There was no comprehensive discussion process regarding the definitions of specific adverse events, and the survey questions were framed in terms of reducing the risk of each event, which ignores a potential assumption, held by some participants, that reducing the risk of nonfatal events may also reduce the risk of death 3.
Patient-centered outcomes research was mandated by Section 6301 of the United States Patient Protection and Affordable Care Act, which authorized the establishment of the Patient-Centered Outcomes Research Institute (PCORI), intended to assist patients, clinicians, purchasers and policy-makers in making informed decisions about disease prevention, treatment, management, etc. The main duties of PCORI include identifying priorities and establishing an agenda for patient-centered outcomes research. One of the key principles of an evidence-generation system and of patient-centered health care is meaningful stakeholder engagement in clinical trials 4. The Early Feasibility Study Program was introduced by the US Food and Drug Administration (FDA) to facilitate medical device studies, which requires a holistic reform of the clinical studies ecosystem while taking into account the rights and interests of patients 5. With regard to benefit-risk assessments, it is critical for the FDA to build a harmonized framework that accounts for patient viewpoints, yet the methodology for evaluating patients’ benefit-risk preferences and accounting for the heterogeneity in those preferences remains an emerging field 6. The Medical Device Innovation Consortium developed a Patient Centered Framework Report, which aims to describe how to integrate patient preferences into medical device benefit-risk assessments and the regulatory approval process 7.
Current trial design often involves the selection of outcomes without significant patient input. Translating the results into real-world shared decision-making encounters can then be challenging. We review statistical methods commonly used in cardiovascular clinical trials and describe their limitations. We then offer examples of embedding patient priorities early in trial design (both for patient preference elicitation and for outcome analysis methods), and a framework for how to implement this in practice.
To briefly describe how current CV trials are analyzed, we use EMPHASIS-HF, PARTNER trial, and the CHARM program as examples.
Leon et al. (2010) analyzed the data from the PARTNER trial to compare the treatment effects of transcatheter aortic-valve implantation and standard therapy (e.g. balloon aortic valvuloplasty) in patients with severe aortic stenosis who were not suitable for surgery 8. The primary endpoint was death from any cause. The coprimary endpoint was the composite of death from any cause and the first hospitalization. The Kaplan-Meier analysis was used to construct survival curves for each time-to-event variable. The log-rank test was used to compare the event rates of two treatment groups. The coprimary endpoint was also analyzed by the nonparametric method introduced by Finkelstein and Schoenfeld (1999) 9, which allows for multiple pairwise comparisons for all patient pairs, first on time to death, and then on time to first hospitalization, accounting for the clinical hierarchies in the composite endpoint.
Zannad et al. (2011) analyzed the data from EMPHASIS-HF to investigate the effects of eplerenone in patients with chronic systolic heart failure and mild symptoms 10. A composite of death from cardiovascular causes or a first hospitalization for heart failure was the primary outcome and was analyzed via Kaplan-Meier estimates and Cox proportional-hazards models. Pfeffer et al. (2003) analyzed the data from the CHARM programme to evaluate the effects of candesartan in chronic heart failure. The CHARM programme was designed as three trials for three different patient populations. The primary outcome for the overall programme was death from any cause, and for each of the component trials it was CV death or first hospital admission for chronic heart failure. Kaplan-Meier analysis and the log-rank test were used.
The current endpoint analyses emphasize time to first hospitalization or to the fatal event, and therefore ignore recurrent hospitalizations as well as mortality after other endpoints. An illness-death model can be used to explore the impact of treatment on the hazard of the terminal event after the occurrence of a non-terminal event in the semi-competing risks data setting 11. Semi-competing risks data are data on a non-terminal event whose observation is subject to a terminal event 12. Poisson, Andersen-Gill and Negative Binomial models can be used to analyze repeated hospitalizations; the composite effect of recurrent hospitalizations and CV death can be analyzed by counting death as an additional event. However, these methods do not provide flexibility in analyzing a composite endpoint consisting of three or more component events of ordinal clinical importance.
The win ratio approach
Finkelstein and Schoenfeld (1999) first developed a nonparametric test, based on the Wilcoxon rank sum test, that combines a time-to-event measure and a longitudinal measure to jointly analyze mortality and longitudinal treatment effects 9. Pocock et al. (2012) extended this test and proposed the win ratio approach, which is closely related to the “proportion in favor of treatment” of Buyse (2010), to analyze prioritized outcomes 13, 14. Subjects from the treatment and control groups are first paired, and a “winner” within each pair is then identified by comparing times to the component events sequentially, in the order of their clinical priority. Subjects are considered “tied” if a “winner” cannot be determined within the pair. The win ratio is the number of pairs in which the new-treatment patient wins divided by the number of pairs in which the new-treatment patient loses, with values greater than one favoring the new treatment. “Matched” and “unmatched” versions of the approach were discussed. In the matched pairs approach, each subject in the new treatment group is matched with a unique subject in the control group based on their baseline risk profiles. The confidence interval (CI) and P-value can be calculated from the formulae in the original article. In the unmatched approach, every subject from the new treatment group is compared with every subject from the control group. The CI and P-value can be calculated by using the bootstrap method. This approach has been applied to several recent trials (e.g. EMPHASIS-HF, the PARTNER B trial, and the CHARM program).
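The unmatched procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in any of the cited trials: the `Patient` record, the function names, and the two-level hierarchy (death first, then first hospitalization) are assumptions made for the example, and the CI is a simple percentile bootstrap that resamples each arm separately.

```python
import random
from dataclasses import dataclass


@dataclass
class Patient:
    death_time: float    # time of death, or of censoring if died is False
    died: bool
    hosp_time: float     # time of first hospitalization, or of censoring
    hospitalized: bool


def compare_event(t_a, event_a, t_b, event_b):
    """Return +1 if A wins, -1 if B wins, 0 if the pair cannot be decided."""
    if event_b and t_a > t_b:   # B had the event while A was still event-free
        return 1
    if event_a and t_b > t_a:   # A had the event while B was still event-free
        return -1
    return 0


def compare_pair(a, b):
    """Compare on death first; fall back to hospitalization only if tied."""
    result = compare_event(a.death_time, a.died, b.death_time, b.died)
    if result != 0:
        return result
    return compare_event(a.hosp_time, a.hospitalized,
                         b.hosp_time, b.hospitalized)


def win_ratio(treatment, control):
    """Unmatched win ratio: every treated patient versus every control."""
    wins = losses = 0
    for a in treatment:
        for b in control:
            result = compare_pair(a, b)
            wins += result == 1
            losses += result == -1
    # No losses: the ratio is unbounded (also returned if no pair is decided).
    return wins / losses if losses else float("inf")


def bootstrap_ci(treatment, control, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling patients within each arm."""
    rng = random.Random(seed)
    stats = sorted(
        win_ratio(rng.choices(treatment, k=len(treatment)),
                  rng.choices(control, k=len(control)))
        for _ in range(n_boot)
    )
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data, for illustration only.
treatment = [Patient(10, False, 10, False), Patient(8, True, 5, True)]
control = [Patient(6, True, 3, True), Patient(10, False, 4, True)]
print(win_ratio(treatment, control))  # 3 wins and 1 loss give 3.0
```

In the toy data, three of the four pairs are won by the treated patient (two on death, one on hospitalization) and one is lost on death, so the win ratio is 3. A real analysis would also need to restrict each comparison to the follow-up time shared by the pair.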
Take the CHARM program as an example. Rogers et al. (2014) reanalyzed the treatment effect of candesartan in patients with heart failure, taking recurrent hospitalizations into account 15. They first analyzed the repeat hospitalizations with Poisson, Andersen-Gill and Negative Binomial models, and then analyzed the composite of recurrent hospitalizations and CV death by extending these methods so that cardiovascular death was counted as an additional event. A joint model for recurrent hospitalizations and CV death was also fitted to estimate the treatment effect on recurrent hospitalizations whilst treating death as informative censoring. Furthermore, the win ratio approach was used to account for clinical hierarchies in composite outcomes. Each patient pair was compared first on the time to CV death, and then on the number of hospitalizations over the pair's common follow-up time. The results demonstrate that methods incorporating recurrent hospitalizations show a larger treatment effect of candesartan than the conventional time-to-first-hospitalization analysis, even when accounting for death. Notably, in CHARM-Preserved the win ratio approach was less powerful than the Negative Binomial method for repeated events, because the absence of a treatment effect on the prioritized outcome, CV death, diluted the treatment effect on recurrent hospitalizations, which rank second in the clinical hierarchy. Thus, the win ratio approach might be more suitable for trials with the hypothesis that any treatment benefit will manifest itself in reduced incidence of both cardiovascular death and heart failure hospitalizations 15.
The limitation of the unmatched win ratio approach is that comparing patients with different baseline risk profiles will be “unfair” in both directions. This problem is minimized in a randomized controlled trial, where the distribution of risk factors is similar in both arms, and can be further reduced by stratification. Compared to the “unmatched” version, the “matched” version can be cumbersome: (1) it necessitates the calculation of a baseline risk score, which can be challenging, subjective, and unrealistic when the risk factors differ for the components of a composite endpoint; (2) the same data might not yield a unique estimate of the win ratio, because several subjects may have the same baseline profile; and (3) some patients may have no appropriate matching partner with the same risk score and cannot be included in the analysis, which may reduce the study power. The “matched” version is more applicable when the risk factors for the components of a composite endpoint are easy to identify and measure.
Even though the win ratio approach was first proposed by Pocock et al. (2012) to analyze time-to-event endpoints, Wang and Pocock demonstrated, using real trial examples and simulation studies, that the win ratio can also be applied to ordinal and continuous non-normal endpoints 16. Compared with other non-parametric approaches, the win ratio approach, by using the bootstrap method, can yield an informative estimate of the treatment difference, with a CI and P-value for the win ratio statistic.
Several limitations of the win ratio approach need to be acknowledged: (1) the original article did not provide a clear null hypothesis to be tested with the win ratio statistic, or closed-form solutions for the variance estimator and the sample size estimate; (2) tied pairs are ignored in the calculation of the win ratio statistic, which wastes information and reduces statistical power; (3) it is a univariate approach that does not allow adjustment for potential confounders and clustering effects; and (4) the approach is highly dependent on the censoring and follow-up distributions 17.
To address some of the aforementioned concerns, Luo et al. (2015) developed a statistical framework for the win ratio approach by specifying the null hypothesis to be tested in the survival data setting: the win ratio statistic tests the intersection of hypothesis (1), no effect on the hazard rate for the terminal event over time, and hypothesis (2), no effect on the hazard rate for the non-terminal event conditional on the terminal event over time 18. A closed-form variance estimator for the win ratio and the win difference in the unmatched version was derived by using U-statistics 18. As indicated by Luo et al. (2015), this variance expression involves the distributions of follow-up and censoring.
Luo et al. (2017) investigated the relationship between the win ratio approach and the traditional survival analysis of time to first event, and found that the first-event analysis actually prioritizes the non-terminal event rather than treating the non-terminal and terminal events as equally important 19. Moreover, they proposed a weighted win ratio approach, where the weighting refers to the time at which events occur, not the type of event. Wins and losses are weighted according to when they occur (i.e. a win occurring later may be weighted more than a win occurring earlier). Several choices for the weights of terminal and non-terminal events are provided. However, the weighted win ratio approach still seems to depend on the censoring distributions.
Further studies are needed to remove the impact of the censoring distributions on the win ratio by using appropriate weight functions.
The desirability of outcome ranking and the partial credit strategy
Evans et al. (2015) proposed the desirability of outcome ranking (DOOR) to compare treatment strategies by ranking trial participants with respect to the desirability of their overall outcome 20. Patients are first partitioned into hierarchical levels, with the number and definition of levels tailored to the clinical disease of interest. The distributions of DOORs are then compared between treatment strategies with the null hypothesis “there is no difference in DOOR distribution”, and the alternative hypothesis “the new strategy has a higher DOOR (i.e. the probability that a randomly selected patient will have a better DOOR if assigned to the new strategy vs the control strategy is >50%)”. The probability that a randomly selected patient from the new strategy group has a higher DOOR than a randomly selected patient from the old strategy group can be calculated as the number of pairwise comparisons between treatment groups in which the new strategy has a higher score than the old strategy, divided by the total number of pairwise comparisons between treatment groups. CIs can be calculated by using rank-based methods, such as those based on the Wilcoxon-Mann-Whitney test, or via simulation. Evans et al. (2017) also used the traditional hazard ratio to explain the concept of DOOR 21: a survival trial designed to detect a hazard ratio of R can also be considered a trial powered to detect a probability of a better outcome on treatment of 1/(1+R).
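The pairwise calculation and the hazard ratio conversion described above can be sketched as follows. This is an illustrative sketch only: the function names and the toy rank data are hypothetical, higher ranks are assumed to mean more desirable outcomes, and the optional `ties_as_half` flag (counting each tied pair as half a win) is an added assumption, not part of the description above.

```python
def door_probability(new, old, ties_as_half=False):
    """Share of new-vs-old pairs in which the new-strategy patient ranks higher.

    `new` and `old` hold one ordinal DOOR rank per patient, with higher
    values assumed to be more desirable.
    """
    better = sum(a > b for a in new for b in old)
    ties = sum(a == b for a in new for b in old)
    total = len(new) * len(old)
    credit = 0.5 * ties if ties_as_half else 0.0
    return (better + credit) / total


def prob_better_from_hazard_ratio(hazard_ratio):
    """Probability of a better outcome implied by a hazard ratio R: 1/(1+R)."""
    return 1.0 / (1.0 + hazard_ratio)


# Hypothetical ranks (3 = best outcome, 1 = worst), for illustration only.
new_ranks = [3, 3, 2, 1]
old_ranks = [3, 2, 1, 1]
print(door_probability(new_ranks, old_ranks))                     # 8 / 16 = 0.5
print(door_probability(new_ranks, old_ranks, ties_as_half=True))  # 10.5 / 16
print(prob_better_from_hazard_ratio(0.5))                         # 1 / 1.5
```

In the toy data, 8 of the 16 pairs favor the new strategy and 5 are tied, which shows how strongly the handling of ties can affect the estimate when the ordinal scale has few levels.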
The traditional “benefit:risk analyses” approach first analyzes each endpoint and then summarizes the results for treatment comparison. DOOR first evaluates the clinical outcome, combining benefits and harms, for each patient, and then integrates these data for the comparison of treatment groups, allowing results to be interpreted at the patient level (i.e. using outcomes to analyze patients rather than using patients to analyze outcomes). As stated by Evans et al. (2015), DOOR uses ordinal outcomes to rank the clinical importance of multiple events at the patient level, which overcomes the problem of the traditional time-to-first-event approach in CV trials. DOOR does not depend on distributional or other assumptions. Compared to approaches that assign scores to each specific component, DOOR, as a rank-based approach, is easier to conduct. However, it is still difficult to achieve consensus on an ordinal ranking. In addition, DOOR cannot reflect the magnitude of influence of each specific component.
Evans et al. (2017) extended the DOOR strategy by introducing the partial credit strategy, which assigns an appropriate magnitude of influence to each specific component 21. Evans et al. (2017) used a trial comparing colistin with a new therapy as an example, with an ordinal composite clinical outcome comprising survival without a major adverse event (AE), survival with a major AE, and death. The difference in mean scores is a linear function of the partial credit, with the between-group difference in the rate of survival with a major AE as the slope and the between-group difference in the rate of survival without a major AE as the intercept. The partial credit ranges from 0 to 1, with 0 corresponding to using survival without a major AE as a binary outcome, and 1 corresponding to using survival as a binary outcome. This strategy allows patients and clinicians to decide where the tipping point is for the treatment comparison, with the partial credit selected according to their own preferences.
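The linear relationship described above can be made concrete with a short sketch. It assumes a score of 1 for survival without a major AE, the partial credit for survival with a major AE, and 0 for death; the function names and the outcome rates are hypothetical and are not taken from the colistin example.

```python
def mean_score_difference(rates_new, rates_control, partial_credit):
    """Difference in mean scores (new minus control) for a given partial credit.

    Each `rates_*` tuple is (survival without major AE, survival with
    major AE, death). Death scores 0, so it drops out of the mean.
    """
    intercept = rates_new[0] - rates_control[0]   # survival without major AE
    slope = rates_new[1] - rates_control[1]       # survival with major AE
    return intercept + slope * partial_credit


def tipping_point(rates_new, rates_control):
    """Partial credit at which the two strategies have equal mean scores."""
    intercept = rates_new[0] - rates_control[0]
    slope = rates_new[1] - rates_control[1]
    if slope == 0:
        return None   # the difference does not depend on the partial credit
    return -intercept / slope


# Hypothetical outcome rates, for illustration only.
new_therapy = (0.60, 0.25, 0.15)
control = (0.65, 0.10, 0.25)

for credit in (0.0, 0.5, 1.0):
    print(credit, mean_score_difference(new_therapy, control, credit))
print(tipping_point(new_therapy, control))
```

With these hypothetical rates the control looks better when survival with a major AE earns no credit, the new therapy looks better when it earns full credit, and the conclusion changes at a partial credit of about one third. A tipping point outside the range 0 to 1 would mean that the conclusion does not depend on the partial credit chosen.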
Integrate patient preferences
As mentioned by Evans et al. (2017), construction of the ordinal outcome is critical for the DOOR strategy; it requires careful deliberation tailored to the clinical disease of interest and consensus on the ranks. Although individuals could assign different partial credit scores and draw their own conclusions about the trial, a specific partial credit score should be selected in the trial design phase in order to estimate the required sample size and to provide a transparent regulatory setting. Similarly, the main issue in the win ratio approach is to determine the order of clinical priority of the components within a composite endpoint.
The Medical Device Innovation Consortium Patient Preference Framework lists several stated-preference methods, including conjoint analysis and discrete-choice experiments, best-worst scaling exercises, direct-assessment questions, and the threshold technique 7. Conjoint analysis, particularly the discrete choice experiment (DCE), was cited in the recommendations released by the FDA as a favored stated-preference elicitation method 6. DCE was first used in economics by Lancaster (1966) 22, and has been increasingly used to quantify the preferences of patients, physicians and other stakeholders 23.
Tong et al. (2012) used DCE to investigate the relative weights and importance of the components of the composite endpoint of major adverse cardiac and cerebrovascular events: death, stroke, nonfatal MI and need for repeat revascularization 24. Among a total of 224 patients who completed the survey, risk of death was found to be most important (relative weight of 0.23), followed by stroke (0.18), potential increased longevity and recovery time (each 0.17), myocardial infarction (0.14), and risk of repeat revascularization (0.11).
Najafzadeh et al. (2014) employed DCE to elicit patients’ preferences in anticoagulant therapy, with risk attributes including: nonfatal stroke, nonfatal myocardial infarction, CV death, major bleeding, bleeding death, and need for monitoring 25. A total of 341 patients completed all DCE questions. On average, patients valued a 1% increased risk of a fatal bleeding event the same as a 2% increase in nonfatal myocardial infarction, a 3% increase in nonfatal stroke, a 3% increase in cardiovascular death, a 6% increase in major bleeding, and a 16% increase in minor bleeding. Patients with distinct previous experiences with myocardial infarction or stroke were found to hold different preferences.
While both qualitative and quantitative methods can be used to elicit patient benefit risk preferences, qualitative methods alone will likely not provide the level of evidence that can inform regulatory benefit risk assessments 7. A recent systematic review of 254 healthcare DCEs demonstrates that qualitative methods were used to select attributes and/or levels (n=95; 66%) and/or pilot the DCE survey (n=26; 18%), with the focus group (n=63; 44%) and the interview (n=109; 76%) as two most popular methods 26. All authors participating in the survey (n=50; 100%) thought that qualitative methods added value to their DCE studies.
- Rauch G, Rauch B, Schuler S and Kieser M. Opportunities and challenges of clinical trials in cardiology using composite primary endpoints. World J Cardiol. 2015;7:1-5.
- Stolker JM, Spertus JA, Cohen DJ, Jones PG, Jain KK, Bamberger E, Lonergan BB and Chan PS. Re-thinking composite endpoints in clinical trials: insights from patients and trialists. Circulation. 2014:CIRCULATIONAHA. 113.006588.
- Califf RM. Patient-centered outcomes composites: a glimpse of the future. Circulation. 2014;130:1223-4.
- Califf RM, Robb MA, Bindman AB, Briggs JP, Collins FS, Conway PH, Coster TS, Cunningham FE, De Lew N, DeSalvo KB, Dymek C, Dzau VJ, Fleurence RL, Frank RG, Gaziano JM, Kaufmann P, Lauer M, Marks PW, McGinnis JM, Richards C, Selby JV, Shulkin DJ, Shuren J, Slavitt AM, Smith SR, Washington BV, White PJ, Woodcock J, Woodson J and Sherman RE. Transforming Evidence Generation to Support Health and Health Care Decisions. N Engl J Med. 2016;375:2395-2400.
- Holmes DR, Jr., Califf R, Farb A, Abel D, Mack M, Syrek Jensen T, Zuckerman B, Leon M and Shuren J. Overcoming the Challenges of Conducting Early Feasibility Studies of Medical Devices in the United States. J Am Coll Cardiol. 2016;68:1908-1915.
- Califf RM. Benefit-Risk Assessments at the US Food and Drug Administration Finding the Balance. Jama-J Am Med Assoc. 2017;317:693-694.
- Medical Device Innovation Consortium. A framework for incorporating information on patient preferences regarding benefit and risk into regulatory assessments of new medical technology. Available from: http://mdic.org/wp-content/uploads/2015/05/MDIC_PCBR_Framework_Proof5_Web.pdf. [Accessed Dec 28, 2017].
- Leon MB, Smith CR, Mack M, Miller DC, Moses JW, Svensson LG, Tuzcu EM, Webb JG, Fontana GP, Makkar RR, Brown DL, Block PC, Guyton RA, Pichard AD, Bavaria JE, Herrmann HC, Douglas PS, Petersen JL, Akin JJ, Anderson WN, Wang D, Pocock S and Investigators PT. Transcatheter aortic-valve implantation for aortic stenosis in patients who cannot undergo surgery. N Engl J Med. 2010;363:1597-607.
- Finkelstein DM and Schoenfeld DA. Combining mortality and longitudinal measures in clinical trials. Stat Med. 1999;18:1341-54.
- Zannad F, McMurray JJ, Krum H, van Veldhuisen DJ, Swedberg K, Shi H, Vincent J, Pocock SJ, Pitt B and Group E-HS. Eplerenone in patients with systolic heart failure and mild symptoms. N Engl J Med. 2011;364:11-21.
- Jazic I, Schrag D, Sargent DJ and Haneuse S. Beyond Composite Endpoints Analysis: Semicompeting Risks as an Underutilized Framework for Cancer Research. Jnci-J Natl Cancer I. 2016;108.
- Fine JP, Jiang H and Chappell R. On semi-competing risks data. Biometrika. 2001;88:907-919.
- Buyse M. Generalized pairwise comparisons of prioritized outcomes in the two‐sample problem. Statistics in medicine. 2010;29:3245-3257.
- Pocock SJ, Ariti CA, Collier TJ and Wang D. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European heart journal. 2012;33:176-182.
- Rogers JK, Pocock SJ, McMurray JJ, Granger CB, Michelson EL, Östergren J, Pfeffer MA, Solomon SD, Swedberg K and Yusuf S. Analysing recurrent hospitalizations in heart failure: a review of statistical methodology, with application to CHARM‐Preserved. European journal of heart failure. 2014;16:33-40.
- Wang D and Pocock S. A win ratio approach to comparing continuous non‐normal outcomes in clinical trials. Pharmaceutical statistics. 2016;15:238-245.
- Rauch G, Jahn‐Eimermacher A, Brannath W and Kieser M. Opportunities and challenges of combined effect measures based on prioritized outcomes. Statistics in medicine. 2014;33:1104-1120.
- Luo X, Tian H, Mohanty S and Tsai WY. An alternative approach to confidence interval estimation for the win ratio statistic. Biometrics. 2015;71:139-145.
- Luo X, Qiu J, Bai S and Tian H. Weighted win loss approach for analyzing prioritized outcomes. Statistics in Medicine. 2017;36:2452-2465.
- Evans SR, Rubin D, Follmann D, Pennello G, Huskins WC, Powers JH, Schoenfeld D, Chuang-Stein C, Cosgrove SE, Fowler VG, Jr., Lautenbach E and Chambers HF. Desirability of Outcome Ranking (DOOR) and Response Adjusted for Duration of Antibiotic Risk (RADAR). Clin Infect Dis. 2015;61:800-6.
- Evans S. Using outcomes to analyze patients rather than patients to analyze outcomes: partial credit, pragmatism, and benefit: risk evaluation. Trials. 2017;18.
- Lancaster KJ. A new approach to consumer theory. Journal of political economy. 1966;74:132-157.
- Hauber AB, Gonzalez JM, Groothuis-Oudshoorn CG, Prior T, Marshall DA, Cunningham C, IJzerman MJ and Bridges JF. Statistical Methods for the Analysis of Discrete Choice Experiments: A Report of the ISPOR Conjoint Analysis Good Research Practices Task Force. Value Health. 2016;19:300-15.
- Tong BC, Huber JC, Ascheim DD, Puskas JD, Ferguson TB, Jr., Blackstone EH and Smith PK. Weighting composite endpoints in clinical trials: essential evidence for the heart team. Ann Thorac Surg. 2012;94:1908-13.
- Najafzadeh M, Gagne JJ, Choudhry NK, Polinski JM, Avorn J and Schneeweiss SS. Patients’ preferences in anticoagulant therapy: discrete choice experiment. Circ Cardiovasc Qual Outcomes. 2014;7:912-9.
- Vass C, Rigby D and Payne K. The Role of Qualitative Research Methods in Discrete Choice Experiments. Med Decis Making. 2017;37:298-313.