1. INTRODUCTION
The success of chimeric antigen receptor (CAR) T-cell therapy, particularly for B-cell hematologic malignancies, has fundamentally altered the landscape of cancer treatment [1, 2]. CAR T-cells are viable T-lymphocytes that are collected from patients and genetically engineered to express a synthetic CAR, which recognizes a target antigen and subsequently kills tumor cells [3]. CAR T-cells contain modular constructs consisting mainly of T-cell activating domains and a surface single-chain variable fragment that binds the target antigen [4]. Since 2016, six CAR T-cell products have been approved by the US Food and Drug Administration (FDA) for medical use in treating B-cell hematologic malignancies. By the end of April 2022, more than 1100 related clinical trials were registered on ClinicalTrials.gov, involving various target antigens, CAR constructs, and cancer types.
Early phase trials are critical for the success of CAR T-cell therapy development, because they determine the recommended phase II dose (RP2D) to be evaluated in subsequent late-phase trials. If a subtherapeutic dose is incorrectly selected as the RP2D, phase III trials may fail because of a lack of efficacy; moreover, if a toxic dose is incorrectly selected as the RP2D, drug development may be terminated because of high toxicity or delays due to post hoc dose adjustment. Because early phase trials are a necessary step for every investigational CAR T-cell therapy, small or moderate improvements observed in early phase trials may translate into large improvements in the therapy development. This article focuses on the design of early phase CAR-T cell therapy trials. Herein, we first elucidate why the conventional paradigm based on the maximum tolerated dose (MTD) is problematic for CAR-T cell therapies. We then introduce the phase I-II design paradigm, as illustrated with several novel Bayesian adaptive designs, to optimize the dose and development of CAR-T cell therapies according to benefit-risk tradeoff.
2. METHODS
2.1 The conventional more-is-better paradigm
The conventional phase I dose-finding paradigm was developed in the era of cytotoxic therapies, with an aim of identifying the MTD on the basis of dose-limiting toxicity (DLT). The fundamental assumption underlying this more-is-better paradigm is that the efficacy monotonically increases with the dose, as is often true for cytotoxic drugs. In this case, the MTD is the dose that yields the highest efficacy with acceptable toxicity. Many designs have been developed to identify the MTD, which can generally be classified into algorithm-based designs, model-based designs, and model-assisted designs [5, 6]. The best-known example of the algorithm-based design is the 3+3 design, which uses a simple pre-specified algorithm to guide the dose escalation and de-escalation (e.g., the dose is escalated with 0/3 DLT, retained with 1/3 DLT, and de-escalated with 2/3 or more DLTs). The 3+3 design is simple but has the drawback of poor operating characteristics [7, 8]. Model-based designs, such as the continuous reassessment method (CRM) [9], have been proposed to improve the operating characteristics of phase I trials. The CRM assumes a statistical model (e.g., a logistic model) to describe the dose-toxicity relationship, then continuously updates the estimate of the model after each cohort of patients to determine the dose assignment for the next cohort. CRM outperforms the 3+3 design [8] and has higher accuracy in identifying the MTD, but it is complicated to implement because the model must be repeatedly fitted after each cohort.
Model-assisted designs were developed to combine the advantages of algorithm-based designs and model-based designs [5, 10]. Similarly to model-based designs, the model-assisted design uses a statistical model (e.g., the binomial model) for efficient decision-making; like the algorithm-based design, however, its dose escalation and de-escalation rule can be pre-determined before the onset of the trial. Consequently, model-assisted designs achieve statistical efficiency similar to that of model-based designs, but they can be implemented as simply as algorithm-based designs. Examples of model-assisted design include the Bayesian optimal interval (BOIN) design [11, 12] and the keyboard design [13]. BOIN is the first and the only novel dose-finding design to date that has received the Fit-for-Purpose designation from the FDA as a drug development tool, because of its desirable operating characteristics. BOIN is widely and increasingly used in practice [6].
For CAR T-cell therapies, the conventional more-is-better paradigm is problematic. Unlike cytotoxic therapies, CAR T-cells work as a living and targeted drug: they proliferate in the host’s body and exert cytotoxic activity against tumor cells expressing the target antigen. Owing to the differences in kinetics relative to those of conventional drugs, and the heterogeneity across patients, the monotonicity assumption regarding the dose-exposure and dose-efficacy relationships does not hold true for many CAR T-cell therapies. For example, in the phase I 17001 study (NCT02631044) [14], within a dose range of 50 to 150×10^6 CAR T-cells per kg in 256 patients, the relationships of dose with exposure (Cmax and the area under the curve) and with the best overall response were both flat, suggesting that the efficacy had already plateaued in the investigated dose range. Phase II studies of pediatric and young adult B-cell acute lymphoblastic leukemia (NCT02435849 (ELIANA) and NCT02228096 (ENSIGN)) have also demonstrated the absence of a monotonic dose-exposure relationship [15]. In addition, with CAR T-cell therapies, DLTs are rarely observed in the therapeutic dose range, although low grade or manageable toxicity is common, thus making the MTD unattainable in phase I trials. Table 1 shows the phase I trial results for six CAR T-cell therapies approved by the FDA. None of the trials identified the MTD. Nonetheless, among four CAR T-cell therapy studies examining more than one dose, three selected an RP2D lower than the maximum administered dose (MAD), which is already lower than the MTD, because lower doses have been found to yield a better benefit-risk tradeoff (i.e., similar efficacy with a lower frequency of non-DLT toxicity than the MAD).
Dose | Axicabtagene ciloleucel (Yescarta) | Brexucabtagene autoleucel (Tecartus) | Lisocabtagene maraleucel (Breyanzi) | Idecabtagene vicleucel (Abecma) | Tisagenlecleucel† (Kymriah) | Ciltacabtagene autoleucel† (Carvykti) |
---|---|---|---|---|---|---|
MTD | Not found | Not found | Not found | Not found | Not found | Not found |
MAD | 2×10^6 | 2×10^6* | 150×10^6 | 800×10^6* | 2.5×10^8 | 7.5×10^5 |
RP2D | 2×10^6 | 1×10^6 | 100×10^6 | 150×10^6 and 450×10^6 | (0.2–5)×10^6 | 7.5×10^5 |
*No DLTs observed at the MAD.
†Only a single dose was investigated.
Therefore, for CAR T-cell therapies, the MTD is not highly relevant, whereas the optimal biological dose (OBD) that produces the best benefit-risk tradeoff (e.g., the lowest dose that reaches the efficacy plateau) is crucial. By definition, identifying the OBD requires consideration of both toxicity and efficacy to assess the benefit-risk (or desirability) of each dose, which cannot be accomplished in conventional phase I trial designs. For ease of understanding, we use the terms benefit-risk and desirability interchangeably herein. Unfortunately, most CAR T-cell therapies still use conventional MTD-based dose finding methods based only on the DLTs. For example, three of the four CAR T-cell therapies in Table 1 consider the incidence of DLTs only in the dose escalation stage. Misalignment between the trial design and objective results in low accuracy in identifying the OBD, as well as potential ethical concerns. For example, when a low dose has reached an efficacy plateau with clear clinical benefit, the 3+3 design and CRM will continue escalating the dose until the MAD is reached, and will treat more patients at doses that do not improve efficacy yet have greater toxicity. This critical issue, which is generally pertinent to targeted therapies including CAR T-cell therapies, has drawn extensive attention from regulatory agencies and industry. In September 2021, the FDA released Guidance on Benefit-Risk Assessment for New Drug and Biological Products [16]. In February 2022, the FDA initiated Project Optimus [17] “to reform the dose optimization and dose selection paradigm in oncology drug development” by shifting from an MTD-based approach to benefit-risk based dose optimization and drug development.
2.2 Dose optimization for CAR T-cell therapies
In this section, we introduce several novel Bayesian adaptive designs that account for both toxicity and efficacy, to optimize the dose for CAR T-cell therapies. These designs are known as phase I-II designs in the trial design literature [18, 19]. Readers should not confuse phase I-II designs with designs that simply connect phase I and II sequentially in a trial—a design often called a seamless phase I/II trial. Here, phase I-II designs refer to a design paradigm that simultaneously accounts for toxicity (conventionally considered in phase I) and efficacy (conventionally considered in phase II) to make decisions regarding dose assignment and selection, on the basis of the benefit-risk tradeoff. In addition, toxicity and efficacy are used generically herein to denote the endpoints that represent the risk and benefit of the treatment, respectively. Depending on the trial, toxicity can be DLTs, low grade toxicity, or tolerability (e.g., the percentage of dose interruption and discontinuation), and efficacy can be tumor response, the percentage of tumor shrinkage, pharmacodynamic endpoints, or other surrogate efficacy biomarkers. The phase I-II design paradigm has been reviewed by Yan et al. (2017) [18] and Yuan et al. (2018) [19].
Several phase I-II designs have been proposed to identify the OBD [20–34]. They can be classified as model-based designs and model-assisted designs. In model-based design, a statistical model (e.g., a bivariate logistic model) is posited to describe dose-toxicity and dose-efficacy relationships. On the basis of the observed interim data, the estimate of the model is updated, and the dose that has the highest desirability (i.e., benefit-risk tradeoff) is identified to make decisions regarding dose escalation and de-escalation. This process is repeated until the maximum sample size is reached or other early stopping criteria are met, and then the dose with the highest desirability is selected as the OBD. Examples of model-based phase I-II designs include the EffTox design (Thall and Cook, 2004), the late-onset EffTox (LO-EffTox) design (Jin et al., 2014), and the phase I-II design for immunotherapy (Liu et al., 2018).
Despite several successful applications, such as the recent trial examples of Tidwell et al. (2021) [35] and Msaouel et al. (2022) [36], the use of model-based phase I-II designs has been fairly limited. One major reason for this limited use is that these designs are statistically and computationally complicated. To account for toxicity and efficacy, the model used by the designs is substantially more complex than the conventional phase I trial design (e.g., CRM). Consequently, implementation of these novel designs requires extensive computational infrastructure and experienced biostatisticians with strong expertise in Bayesian adaptive designs, both of which may not be available at many institutions. In addition, highly complicated and structured parametric models make the design susceptible to model misspecification. Tidwell et al. (2021) have used a pancreatic trial to illustrate the challenges of implementing a phase I-II design and have provided several potential solutions. One solution is to use model-assisted designs.
Model-assisted designs have been proposed to simplify the implementation of phase I-II trials while yielding a performance comparable to or better than that of model-based designs. Because model-assisted designs do not assume any parametric dose-toxicity and dose-efficacy relationship, they are more robust than model-based designs. For these reasons, we focus on model-assisted phase I-II designs herein, and we use BOIN12 [33] and U-BOIN [32] to illustrate this approach. Other examples of model-assisted phase I-II designs have been provided by Lin et al. (2017) [28], Takeda et al. (2018) [29], and Shi et al. (2021) [34].
We first describe the BOIN12 design [33]. BOIN12 uses utility to measure the benefit-risk tradeoff. For illustration purposes, we consider binary toxicity and efficacy endpoints. Here, the toxicity endpoint need not be DLTs. Given that many CAR T-cell therapies rarely cause DLTs, the toxicity endpoint can instead be an appropriate severe adverse event and should be tailored to the investigational therapy. The efficacy endpoint can be objective response, a pharmacokinetic endpoint (e.g., Cmax or area under the curve for T-cell expansion), or any appropriate surrogate efficacy endpoint. For any given patient in a trial, there are four possible outcomes: (no toxicity, efficacy); (no toxicity, no efficacy); (toxicity, efficacy); and (toxicity, no efficacy), as shown in Table 2.
Clearly, (no toxicity, efficacy) is the most desirable, (toxicity, no efficacy) is the least desirable, and the other two outcomes have intermediate desirability. BOIN12 assigns the most desirable outcome (no toxicity, efficacy) a score of 100, and the least desirable outcome (toxicity, no efficacy) a score of 0; it then elicits the scores of the other two outcomes from clinicians to reflect their clinical desirability. For example, in Table 2, the clinician assigns a score of 60 to (toxicity, efficacy) and a score of 40 to (no toxicity, no efficacy).
The desirability of a dose is an average of the utility scores of the four outcomes, in which each score is weighted by the probability of observing that outcome. For example, for a dose with probabilities of 0.6, 0.1, 0.2, and 0.1 for observing the outcomes of (no toxicity, efficacy), (no toxicity, no efficacy), (toxicity, efficacy), and (toxicity, no efficacy), respectively, the desirability of that dose is 0.6 × 100 + 0.1 × 40 + 0.2 × 60 + 0.1 × 0 = 76. Clearly, a dose with a higher likelihood of producing favorable outcomes will have a higher desirability. The utility approach is highly flexible and can accommodate various types of benefit-risk considerations. For example, if the goal is to identify the safe dose with the highest efficacy, the outcomes of (toxicity, efficacy) and (no toxicity, no efficacy) can simply be assigned scores of 100 and 0, respectively.
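The weighted average described above can be sketched in a few lines of Python. This is only an illustration: the function name `dose_desirability` and the dictionary layout are assumptions made here, not part of any BOIN12 software, and the utility scores are the ones elicited in the example (100, 40, 60, and 0).

```python
# Illustrative sketch: desirability (expected utility) of a dose.
# Keys are (toxicity, efficacy) indicators; all names here are hypothetical.

UTILITY = {
    (0, 1): 100,  # no toxicity, efficacy
    (0, 0): 40,   # no toxicity, no efficacy
    (1, 1): 60,   # toxicity, efficacy
    (1, 0): 0,    # toxicity, no efficacy
}


def dose_desirability(outcome_probs, utility=UTILITY):
    """Average the utility scores, weighting each by its outcome probability."""
    total = sum(outcome_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(utility[outcome] * p for outcome, p in outcome_probs.items())


example_probs = {(0, 1): 0.6, (0, 0): 0.1, (1, 1): 0.2, (1, 0): 0.1}
print(dose_desirability(example_probs))  # approximately 76, as in the text
```

Changing the two elicited scores in `UTILITY` is all that is needed to express a different benefit-risk preference.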
On the basis of the observed interim toxicity and efficacy data, the BOIN12 design adaptively assigns patients to the dose with the highest estimated desirability. The dose-finding rule of the BOIN12 design is depicted in Figure 1. A key feature of BOIN12 is that dose desirability can be pre-tabulated and included in the trial protocol before the start of the trial (Table 3). Thus, when conducting a trial, there is no need for the complicated calculations and estimations required in model-based designs. The desirability of a dose, and thus the dose assignment, can be simply determined using the desirability table as follows: the number of patients treated at that dose, the number of patients who experienced toxicity, and the number of patients who experienced efficacy are counted, and Table 3 is then used to determine the dose desirability, or more precisely, the rank-based desirability score (RDS). The dose with the highest RDS value is chosen to treat the next patients. For example, suppose that, at a certain point in the trial, the numbers of patients treated at the first three doses, d1, d2, and d3, are 3, 6, and 3, the numbers of patients with toxicity are 0, 1, and 1, and the numbers of patients with efficacy outcomes are 0, 4, and 1. The current dose is d2. According to the dose-finding rule, because the observed toxicity rate at d2, 1/6 ≈ 0.17, is less than the escalation boundary λ_e = 0.26, Table 3 is consulted, and the RDS values of d1, d2, and d3 are found to be 33, 66, and 39, respectively. Because d2 has the highest RDS, a decision is made to stay at d2 for treating the next cohort of patients. An extensive simulation study has indicated that BOIN12 has desirable operating characteristics, and it outperforms several more complicated model-based phase I-II designs [33], such as the EffTox design.
No. pts | No. tox | No. eff | Desirability score | No. pts | No. tox | No. eff | Desirability score | No. pts | No. tox | No. eff | Desirability score |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 54 | 6 | 3 | 0 | 3 | 9 | 2 | 7 | 76 |
3 | 0 | 0 | 33 | 6 | 3 | 1 | 10 | 9 | 2 | 8 | 85 |
3 | 0 | 1 | 50 | 6 | 3 | 2 | 19 | 9 | 2 | 9 | 91 |
3 | 0 | 2 | 69 | 6 | 3 | 3 | 34 | 9 | 3 | 0 | E |
3 | 0 | 3 | 84 | 6 | 3 | 4 | 46 | 9 | 3 | 1 | 5 |
3 | 1 | 0 | 21 | 6 | 3 | 5 | 61 | 9 | 3 | 2 | 12 |
3 | 1 | 1 | 39 | 6 | 3 | 6 | 74 | 9 | 3 | 3 | 22 |
3 | 1 | 2 | 57 | 6 | ≥4 | Any | E | 9 | 3 | 4 | 31 |
3 | 1 | 3 | 73 | 9 | 0 | 0 | E | 9 | 3 | 5 | 43 |
3 | 2 | 0 | 11 | 9 | 0 | 1 | 22 | 9 | 3 | 6 | 55 |
3 | 2 | 1 | 27 | 9 | 0 | 2 | 31 | 9 | 3 | 7 | 67 |
3 | 2 | 2 | 45 | 9 | 0 | 3 | 43 | 9 | 3 | 8 | 78 |
3 | 2 | 3 | 63 | 9 | 0 | 4 | 55 | 9 | 3 | 9 | 87 |
3 | 3 | Any | E | 9 | 0 | 5 | 67 | 9 | 4 | 0 | E |
6 | 0 | 0 | 19 | 9 | 0 | 6 | 78 | 9 | 4 | 1 | 2 |
6 | 0 | 1 | 34 | 9 | 0 | 7 | 87 | 9 | 4 | 2 | 7 |
6 | 0 | 2 | 46 | 9 | 0 | 8 | 92 | 9 | 4 | 3 | 14 |
6 | 0 | 3 | 61 | 9 | 0 | 9 | 95 | 9 | 4 | 4 | 25 |
6 | 0 | 4 | 74 | 9 | 1 | 0 | E | 9 | 4 | 5 | 36 |
6 | 0 | 5 | 86 | 9 | 1 | 1 | 14 | 9 | 4 | 6 | 48 |
6 | 0 | 6 | 93 | 9 | 1 | 2 | 25 | 9 | 4 | 7 | 59 |
6 | 1 | 0 | 13 | 9 | 1 | 3 | 36 | 9 | 4 | 8 | 71 |
6 | 1 | 1 | 24 | 9 | 1 | 4 | 48 | 9 | 4 | 9 | 81 |
6 | 1 | 2 | 38 | 9 | 1 | 5 | 59 | 9 | 5 | 0 | E |
6 | 1 | 3 | 51 | 9 | 1 | 6 | 71 | 9 | 5 | 1 | 1 |
6 | 1 | 4 | 66 | 9 | 1 | 7 | 81 | 9 | 5 | 2 | 4 |
6 | 1 | 5 | 80 | 9 | 1 | 8 | 90 | 9 | 5 | 3 | 8 |
6 | 1 | 6 | 89 | 9 | 1 | 9 | 94 | 9 | 5 | 4 | 17 |
6 | 2 | 0 | 6 | 9 | 2 | 0 | E | 9 | 5 | 5 | 28 |
6 | 2 | 1 | 16 | 9 | 2 | 1 | 8 | 9 | 5 | 6 | 40 |
6 | 2 | 2 | 30 | 9 | 2 | 2 | 17 | 9 | 5 | 7 | 52 |
6 | 2 | 3 | 42 | 9 | 2 | 3 | 28 | 9 | 5 | 8 | 64 |
6 | 2 | 4 | 58 | 9 | 2 | 4 | 40 | 9 | 5 | 9 | 76 |
6 | 2 | 5 | 70 | 9 | 2 | 5 | 52 | 9 | ≥6 | Any | E |
6 | 2 | 6 | 83 | 9 | 2 | 6 | 64 |
Note: Pts., Tox., and Eff. denote patients, toxicity, and efficacy, respectively. “E” indicates that the dose should be eliminated because it does not satisfy the safety and efficacy admissible criteria (i.e., it is not admissible because of high toxicity or low efficacy).
In applying BOIN12, one important consideration is that the design assumes that the efficacy endpoint is quickly ascertainable. This assumption is reasonable for some CAR-T trials. For example, in a phase I trial of donor-derived CD7 CAR T-cell therapy in patients with relapsed or refractory T-cell acute lymphoblastic leukemia, all 18 complete remissions (in 20 patients) were observed by day 15 [37]. However, for some CAR-T trials, efficacy (e.g., best response) is late-onset and may take a long time (e.g., 60 days) to be observed. For example, in a phase I study on adult patients with relapsed or refractory B-cell non-Hodgkin’s lymphoma (NCT02631044), later complete responses were observed among 28 patients, with a wide range of 1.8 to 12.5 months [38]. In this case, applying BOIN12 would be logistically challenging, because researchers would need to wait for the efficacy endpoint of each cohort to be fully observed before enrolling the next cohort. To address this issue, Zhou et al. (2021) have proposed time-to-event BOIN12 (TITE-BOIN12) [39], which allows real-time decisions to be made when some patients’ efficacy (and/or toxicity) data are pending. The basic idea of TITE-BOIN12 is to use the pending patients’ follow-up time to predict the unobserved outcome and thereby enable real-time decision-making; see Zhou et al. (2021) for details. Another approach that may alleviate the late-onset efficacy issue is the U-BOIN design.
Unlike BOIN12, U-BOIN uses a two-stage approach ( Figure 2 ). In the first stage, dose escalation is performed according to toxicity by using the BOIN design. The goal of stage 1 is to explore the dose space and collect preliminary toxicity and efficacy data. At the end of stage 1, the admissible dose set is identified, which is defined as a set of doses that satisfy safety and efficacy requirements as follows:
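$$\Pr(\text{toxicity rate} < \phi_T \mid \text{data}) > C_T \qquad (1)$$
$$\Pr(\text{efficacy rate} > \phi_E \mid \text{data}) > C_E \qquad (2)$$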
where ϕ_T and ϕ_E are a prespecified toxicity upper limit (e.g., 0.3) and an efficacy lower limit (e.g., 0.2), respectively, and C_T and C_E are thresholds. The safety criterion (1) states that, on the basis of the observed data, the probability that the toxicity rate is lower than ϕ_T should be greater than the threshold C_T; the efficacy criterion (2) states that, on the basis of the observed data, the probability that the efficacy rate is higher than ϕ_E should be greater than the threshold C_E. The objective of the admissible rules is to rule out excessively toxic and futile doses, not to identify the optimal dose. Thus, we recommend using relatively low thresholds such as C_T = 0.05 and C_E = 0.05, thereby avoiding accidental exclusion of promising doses because of the large uncertainty caused by small sample sizes. The values of C_T and C_E should be calibrated with simulations based on the trial parameters under consideration (e.g., sample size, ϕ_T, ϕ_E, and the benefit-risk tradeoff criterion) to ensure desirable operating characteristics.
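As a rough illustration of how criteria (1) and (2) might be evaluated, the sketch below computes the two posterior probabilities for a single dose. It assumes independent Beta(1, 1) priors on the toxicity and efficacy rates, which is an assumption made here for simplicity and may differ from the prior used in U-BOIN; the function names are hypothetical. With integer Beta parameters, the Beta distribution function can be computed exactly from a binomial tail sum, so only the standard library is needed.

```python
from math import comb


def beta_cdf_int(p, a, b):
    """CDF at p of a Beta(a, b) distribution with positive integer a and b.

    Uses the identity I_p(a, b) = P(Binomial(a + b - 1, p) >= a).
    """
    m = a + b - 1
    return sum(comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(a, m + 1))


def is_admissible(n, n_tox, n_eff, phi_t=0.3, phi_e=0.2, c_t=0.05, c_e=0.05):
    """Check criteria (1) and (2) for one dose under an assumed Beta(1, 1) prior."""
    # Posterior of the toxicity rate: Beta(1 + n_tox, 1 + n - n_tox).
    pr_safe = beta_cdf_int(phi_t, 1 + n_tox, 1 + n - n_tox)
    # Posterior of the efficacy rate: Beta(1 + n_eff, 1 + n - n_eff).
    pr_effective = 1 - beta_cdf_int(phi_e, 1 + n_eff, 1 + n - n_eff)
    return pr_safe > c_t and pr_effective > c_e


print(is_admissible(n=3, n_tox=0, n_eff=1))  # True: little evidence against the dose
print(is_admissible(n=6, n_tox=5, n_eff=3))  # False: toxicity criterion (1) fails
```

Because the thresholds are low, a dose is excluded only when the data speak strongly against it, which matches the stated purpose of the admissible rules.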
At stage 2, U-BOIN randomizes patients among the admissible doses. The goal is to collect additional data to learn more about these promising doses and identify the OBD. At this stage, several randomization strategies can be used, e.g., equal randomization or adaptive randomization, depending on the setting. If the efficacy endpoint requires a long time to be observed, equal randomization is a good choice, because its implementation does not require the observation of the efficacy endpoint. In addition, equal randomization is simple and allows for learning of each dose sufficiently and uniformly. If the efficacy endpoint is quickly ascertainable, adaptive randomization may be an attractive option, because it assigns more patients to more desirable doses (e.g., the OBD), thus not only benefitting patients but also providing more data information on the OBD to facilitate go/no-go decisions and inform the design of subsequent phase IIb or III trials.
2.3 Cohort expansion/phase II trial
After identification of the OBD, cohort expansion (or phase II) is routinely performed at the OBD to gain more information on the toxicity and efficacy profile of the CAR T-cell therapy. The resulting trial is sometimes called a seamless phase I/II trial. For dose expansion, the conventional approach uses the Simon optimal two-stage design: first, n_1 patients are treated. If the number of responses is greater than r_1, an additional n_2 patients are enrolled; otherwise, the trial is stopped [40]. This approach has several limitations when applied to CAR T-cell therapy. First, the first- and second-stage sample sizes n_1 and n_2 are determined completely on the basis of statistical considerations (e.g., type I error, power, and the null and alternative hypotheses) without accounting for the high heterogeneity of CAR T-cell therapy and other objectives of the study. Consider a dose expansion with a null response rate of 0.15, a target response rate of 0.4, a type I error of 0.05, and a power of 80%. The Simon optimal two-stage design will first enroll n_1 = 7 patients, and if the number of responses is two or more, then an additional n_2 = 18 patients are enrolled. In this design, n_1 and n_2 are highly imbalanced, which may be undesirable. As described previously, CAR T-cell therapy is a living drug, and its effects are highly dependent on whether T cells successfully expand in the patient, largely as a result of the patient’s characteristics and genetic makeup. Consequently, CAR T-cell therapy tends to have more heterogeneous effects than other drugs. A substantial possibility exists that, owing to randomness, the first seven patients may have relatively poor prognoses and T-cell expansion, thus leading to accidental termination of a therapy that is actually effective. In addition, beyond efficacy, cohort expansion often has translational objectives (e.g., studying correlates) to understand the mechanism of the therapy.
Even if the therapy is not highly effective, treating a reasonable number of patients to obtain biospecimens and data for translational research remains useful. For these reasons, being able to adjust the interim sample size on the basis of multiple considerations is highly desirable for researchers. In the above example, investigators may prefer having an interim examination at the middle of the trial with n_1 = 12 and n_2 = 13. The second limitation of the Simon two-stage design is that it handles only a binary efficacy endpoint, and it allows for only one interim examination. In some CAR T-cell therapies, progression-free survival (PFS), or PFS and the objective response rate (ORR) used together as multiple endpoints, may be more appropriate; in other cases, performing multiple interim examinations is more efficient.
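The risk of stopping an effective therapy after a small first stage can be quantified with a binomial calculation. The sketch below is illustrative only (the function names are hypothetical) and uses the first stage from the example above: seven patients, with the trial continuing only if two or more responses are seen, that is, stopping when there is at most one response.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(0, k + 1))


def prob_early_stop(n1, r1, p):
    """Probability that stage 1 yields r1 or fewer responses, stopping the trial."""
    return binom_cdf(r1, n1, p)


# Stage 1 of the example: n1 = 7, stop if the number of responses is at most 1.
for rate in (0.15, 0.40):
    print(f"true response rate {rate:.2f}: "
          f"P(stop early) = {prob_early_stop(7, 1, rate):.3f}")
# prints 0.717 for a rate of 0.15 and 0.159 for a rate of 0.40
```

Under these assumptions, a therapy whose true response rate equals the target of 0.4 would still be stopped after the first seven patients about 16% of the time, before any allowance for the additional between-patient heterogeneity discussed above.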
The Bayesian optimal phase II (BOP2) design provides a highly flexible adaptive design to address the aforementioned issues [41]. BOP2 allows investigators to specify the number and timing of interim examinations on the basis of various trial considerations. BOP2 is highly efficient, and it maximizes the power to detect effective treatments. Using the response rate as an example, BOP2 makes interim go/no-go decisions based on the following Bayesian decision criterion: stop the trial if
$$\Pr(\text{response rate} > \theta \mid \text{data}) < C_n$$
Here, θ is the null response rate (e.g., 0.25), and C_n is an adaptive threshold depending on the interim sample size n. The criterion states that if the observed interim data indicate that the probability that the response rate is better than the null value (e.g., 0.25) is small (i.e., less than C_n), the trial should be stopped. The threshold C_n is optimized to maximize the power of the trial; see Zhou et al. (2017) for technical details.
This posterior-probability-based Bayesian decision criterion is intuitive and highly flexible. For example, if the endpoint is PFS, the following similar Bayesian decision criterion could be used: stop the trial if
$$\Pr(\text{median PFS} > \gamma \mid \text{data}) < C_n$$
Here, γ is the null median PFS (e.g., 6 months). The criterion states that if the observed interim data indicate that the probability that the median PFS is better than the null value (e.g., 6 months) is small (i.e., less than C_n), the trial should be stopped.
One important advantage of BOP2 is that its decision boundaries can be prespecified in a similar manner to that with the Simon design, despite the use of the sophisticated Bayesian decision rule; Table 4 provides an example. Therefore, BOP2 is extremely easy to implement in practice. At each interim examination, the investigator simply assesses whether the stopping boundaries have been crossed to determine go or no-go.
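To make concrete how a posterior-probability rule turns into a fixed stopping boundary, the sketch below evaluates the response-rate criterion for every possible interim count and returns the largest count at which the trial would stop. It is a simplified illustration, not the BOP2 algorithm: it assumes a Beta(1, 1) prior, takes the threshold as a user-supplied number rather than the optimized C_n, and uses hypothetical function names.

```python
from math import comb


def post_prob_exceeds(x, n, theta):
    """Pr(response rate > theta | x responses in n patients), Beta(1, 1) prior.

    The posterior is Beta(1 + x, 1 + n - x); for integer parameters its CDF
    equals a binomial tail: I_p(a, b) = P(Binomial(a + b - 1, p) >= a).
    """
    a, m = 1 + x, n + 1
    cdf = sum(comb(m, j) * theta**j * (1 - theta) ** (m - j) for j in range(a, m + 1))
    return 1 - cdf


def stopping_boundary(n, theta, c_n):
    """Largest response count at which the no-go criterion is met, or -1 if none."""
    boundary = -1
    for x in range(n + 1):
        if post_prob_exceeds(x, n, theta) < c_n:
            boundary = x  # the posterior probability increases with x
    return boundary


# c_n = 0.8 is an arbitrary illustrative threshold, not an optimized BOP2 value.
print(stopping_boundary(n=13, theta=0.45, c_n=0.8))
```

Because the boundary depends only on the interim sample size, the null value, and the threshold, it can be tabulated before the trial starts, which is why no model fitting is needed at the interim examination.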
3. APPLICATION AND TRIAL ILLUSTRATION
BOIN12, U-BOIN, and BOP2 have been used in many trials of CAR T-cell therapies and other targeted therapies. For example, trial NCT04835519 has used BOIN12 to evaluate safety and find the OBD of functionally enhanced CD33 CAR T-cells in participants with relapsed or refractory acute myeloid leukemia. U-BOIN has been used to design a phase I trial to identify the OBD for an NK cell therapy with or without atezolizumab in patients with advanced and refractory non-small cell lung cancer (NCT05334329). NCT04359784 is a phase II trial using BOP2 to assess the efficacy of anakinra in decreasing the occurrence of cytokine release syndrome and nerve damage (neurotoxicity) in patients with B-cell non-Hodgkin lymphoma receiving CD19-targeted CAR T-cell therapy. Because these trials are ongoing, we present two hypothetical trials to illustrate the use of BOIN12 and BOP2 in practice.
3.1 A dose finding trial based on BOIN12
Consider a phase I CAR T-cell trial to find the OBD from four doses (d1, d2, d3, and d4) = (50, 150, 450, and 800)×10^6 cells. The maximum sample size is 24, and patients are treated in cohorts of three. The trial may be stopped early if nine or more patients are treated at a dose, and that dose shows acceptable toxicity and desirable efficacy. The lower efficacy limit is ϕ_E = 0.25, and the upper toxicity limit is ϕ_T = 0.33. The efficacy endpoint is the response rate, and the toxicity endpoint is a severe toxicity event (e.g., grade ≥ 3 cytokine release syndrome and neurotoxicity). The utility score in Table 2 is used.
Figure 3 shows how the trial is conducted by using BOIN12. The trial starts by treating the first three patients at d1; none of the patients have toxicity, and one patient has a response. On the basis of the decision rule in Figure 1, because the toxicity rate is lower than λ_e = 0.26 (i.e., the escalation boundary of BOIN), Table 3 is used to determine the desirability of d1 and d2, which is 50 and 54, respectively. Because d2 has a higher desirability score, the dose is escalated to treat the second cohort of patients at d2. For the second cohort, no toxicity is observed, and two patients have responses. Again, because the toxicity rate is lower than λ_e = 0.26, Table 3 is used to determine the desirability of d1, d2, and d3, which is 50, 69, and 54, respectively. Thus, d2 is retained for treating the third cohort, in which one patient has toxicity, and all three patients have responses. By this point, six patients have been treated at d2, one showing toxicity and five showing responses. Because the toxicity rate is 1/6 < 0.26, on the basis of Table 3, the desirability of d1, d2, and d3 is 50, 80, and 54, respectively. Therefore, d2 is retained, and the fourth cohort is treated at d2; no toxicity is observed, and two patients have responses. By this point, a total of nine patients have been treated at d2, one with toxicity and seven showing responses, thus meeting the early stopping rule. Dose finding is stopped, and d2 (i.e., 150×10^6 cells) is selected as the OBD. In contrast, if the 3+3 design were used, assuming that all doses are safe, the dose would be escalated until d4, six patients would be treated at d4, and the conclusion would be that no MTD had been found.
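The walk-through above can be replayed with a small lookup-based sketch. It covers only the branch of the rule that the text describes (observed toxicity rate at the current dose below the escalation boundary, followed by a comparison of the RDS values of the adjacent doses) and copies just the Table 3 entries needed here. The function name, the tie-breaking toward the lower dose, and the simplified early stopping check (nine patients at a dose, without re-checking admissibility) are assumptions of this sketch.

```python
# Excerpt of Table 3: (patients, toxicities, responses) -> rank-based desirability score.
RDS = {
    (0, 0, 0): 54, (3, 0, 0): 33, (3, 0, 1): 50, (3, 0, 2): 69,
    (3, 1, 1): 39, (6, 1, 4): 66, (6, 1, 5): 80, (9, 1, 7): 81,
}
LAMBDA_E = 0.26  # BOIN escalation boundary quoted in the text


def next_dose(current, counts):
    """Pick the next dose among current-1, current, current+1 by highest RDS.

    Only the branch described in the text is handled: the observed toxicity
    rate at the current dose is below LAMBDA_E. The remaining branches of the
    BOIN12 rule (Figure 1) are deliberately left out of this sketch.
    """
    n, tox, _ = counts[current]
    if n == 0 or tox / n >= LAMBDA_E:
        raise NotImplementedError("branch not covered by this sketch")
    candidates = [d for d in (current - 1, current, current + 1) if 0 <= d < len(counts)]
    # Ties are broken in favour of the lower dose (an assumption).
    return max(candidates, key=lambda d: (RDS[counts[d]], -d))


counts = [(0, 0, 0)] * 4                      # one tuple per dose d1..d4
cohorts = [(0, 1), (0, 2), (1, 3), (0, 2)]    # (toxicities, responses) per cohort of 3
current, path = 0, []
for tox, eff in cohorts:
    n0, t0, e0 = counts[current]
    counts[current] = (n0 + 3, t0 + tox, e0 + eff)
    path.append(current + 1)                  # record dose level actually given
    if counts[current][0] >= 9:               # simplified early stopping check
        break
    current = next_dose(current, counts)

print(path, counts[current])  # [1, 2, 2, 2] (9, 1, 7)
```

The recorded path matches the illustration: one cohort at d1, then three cohorts at d2, at which point nine patients have been treated at d2.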
3.2 A phase II trial based on BOP2
Consider a phase II CAR T-cell trial, as a continuation of the above dose finding trial, to evaluate the efficacy of the therapy at the OBD (150×10^6 cells). The primary endpoint is the response rate. The null value of the response rate is 0.45 (i.e., regarded as futile), and the target value is 0.7. The type I error is set to 0.05, and the power is set to 80%. According to trial considerations, the investigator prefers having one interim examination at the middle of the trial.
By applying BOP2, the following design is obtained: enroll 13 patients; if eight or more responses are observed, then enroll an additional 13 patients. Among a total of 26 patients, if 16 or more responses are observed, the treatment is regarded as promising.
4. SOFTWARE
The online apps BOIN Suite and BOP2 Suite, available at www.trialdesign.org, provide easy-to-use tools to design CAR-T trials. Each module in the apps has an intuitive graphical user interface and extensive documentation to help users navigate through the processes. Below are schematic steps to design a trial by using BOIN Suite.
Step 1: Specify the design parameters (e.g., sample size, cohort size, and target DLT rate) through close collaboration between clinicians and biostatisticians, on the basis of clinical considerations and statistical calibration using computer simulations.
Step 2: Run the software to produce the decision table and the design diagram, then perform simulations to generate the operating characteristics of the design.
Step 3: Prepare the trial protocol by using the design template and sample text automatically generated by the software, then conduct the trial.
5. DISCUSSION
CAR T-cell therapy holds great potential to treat and cure cancer. It shows distinct characteristics from other therapies, thus making the conventional MTD-based dose finding paradigm dysfunctional or inefficient. We introduced the phase I-II design paradigm to optimize the dose of CAR T-cell therapies according to the benefit-risk tradeoff. In particular, model-assisted designs such as BOIN12 and U-BOIN provide robust, easy-to-implement approaches to identify the OBD. For cohort expansion and phase II trials, BOP2 provides a highly flexible and powerful design to make go/no-go decisions.
Many challenges remain. For example, often only a subgroup of patients is responsive to CAR T-cell therapy. Precise identification of the sensitive subgroup and corresponding predictive biomarkers is highly important to realize the full potential of precision medicine and further improve the efficacy of CAR T-cell therapy. Some research has been conducted in this regard by incorporating genomic biomarkers [42] and adaptive enrichment strategies [43], but more research is warranted for CAR T-cell therapy.