INTRODUCTION

Alzheimer disease (AD) drug development has a high failure rate. Drug development decision making can be improved by applying lessons learned from past trials. Improved interpretation of animal models, better pharmacologic characterization in phase I and phase II trials, appropriate sample sizes, diagnosis of AD with biomarker support, optimization of global recruitment, and avoidance of inappropriate subgroup analyses can improve drug development success rates.

The prevalence of AD doubles every 5 years after the age of 65 years, and the disease is becoming increasingly common as the world's population ages. It is estimated that in the United States alone, the number of patients with AD will grow from 5.3 million now to nearly 14 million by 2050.1 To address this impending public health disaster, there is an urgent need to discover and develop new drugs to prevent, delay the onset of, slow the progression of, or treat the cognitive and behavioral symptoms of AD. AD drug development has proven to be unusually difficult, with a 99.6% failure rate in the decade from 2002 to 2012,2 and the success rate currently remains at the same low level.

Each clinical trial provides evidence on a narrow range of questions. For example: does this dose of the test agent, given for a specific period of time (e.g., 18–24 months for disease‐modifying therapies [DMTs]) to a defined population (e.g., preclinical AD; prodromal AD; mild, moderate, or severe AD dementia), produce a statistically significant difference compared with placebo in change from baseline on the prespecified primary outcomes, such as measures of cognition (e.g., the Alzheimer's Disease Assessment Scale – Cognitive Portion)3 and function (e.g., the Alzheimer's Disease Cooperative Study Activities of Daily Living scale)?4 Questions regarding effects in other populations, at other doses, over other exposure durations, and on other instruments must all be addressed in separate trials.
These complex constraints on clinical trials have evolved so that trials define efficacy in a way that is acceptable to regulatory agencies, such as the US Food and Drug Administration (FDA) and the European Medicines Agency. Regulatory acceptance of the data is the only way to gain marketing approval and make the agent widely available to patients. Each trial is a critical test of a narrow hypothesis, and each incorporates methodologic decisions that offer valuable insights into AD drug development. It is important to extract as much learning as possible from every trial so that the lessons can be applied to future trials and improve the likelihood of success. A review of the literature identifies several steps in drug development that have been recurrent obstacles to success. Perspectives on these lessons learned from past clinical trials are provided here, with suggestions for how they may be applied to future trials. Figure 1 shows how these lessons align with the phases of drug development.

Figure 1 Lessons learned as they apply to the phases of drug development. BBB, blood‐brain barrier.

LESSON 1: ANIMAL MODELS DO NOT PREDICT HUMAN EFFICACY OR TOXICITY

Animal models of AD are an important means of investigating efficacy and toxicity at the preclinical stage, before humans are exposed to possibly toxic or ineffective compounds. A commonly used animal model is a transgenic mouse carrying both amyloid precursor protein and presenilin 1 mutations. Triple and 5× transgenic models, as well as many types of gene knock‐in and gene knock‐out models, have been created to allow focused interrogation of the biology of AD.5 Many of the animal models address the amyloidogenic process leading to cortical plaques similar to those observed in human AD.6 These genetically engineered animals have abnormalities of amyloid metabolism but generally lack other aspects of human AD.
Amyloid transgenic animals do not exhibit tau accumulation or cell death and have limited inflammatory changes.7 They show cognitive changes but do not develop a severe progressive dementia equivalent to the human disease. Many types of therapy have reduced amyloid abnormalities in these animals and have often improved performance on tests such as the Morris Water Maze or Novel Object Recognition.5 None of these preclinical successes has predicted success in humans.

An important issue with animal models is their irreproducibility.8 If an experiment cannot be reproduced within a single model or across related models, its ability to predict human outcomes is suspect. Strain, age, sex, diet, light, and handler behavior may all influence animal behavior. Randomization and sample size are important aspects of animal trial design that have often been suboptimal.9 Lack of rigor in these aspects of animal testing may contribute to poor reproducibility both across models and in translating results from animals to humans.

The lesson to be derived from these observations is that animal models serve as important gateways in the drug development process when the animal experiments are rigorously executed. These models reveal the impact of an intervention on specific pathways, such as amyloid production or clearance. Advancing to human testing a drug that did not succeed as expected in animals would be unwise, because success in animals provides evidence about a specific aspect of the biology of AD and about the mechanism and efficacy of the proposed therapy. Animal models do not, however, provide evidence about the potential impact on the wider array of pathology characteristic of human AD, and they cannot be expected to predict the success of a candidate therapy in humans.
The models are simulacra of specific aspects of human AD, such as amyloidosis, and cannot be taken as models of the full spectrum of human AD pathology or as predictors of human benefit.10 Use of induced pluripotent stem cells derived from humans with AD is a promising means of humanizing drug development much earlier in the process, possibly recapitulating more human‐like conditions for preclinical assessments of drug efficacy and safety.11, 12

LESSON 2: ENSURE THAT THE DRUG ENTERS THE BRAIN

Small molecules intended to affect AD pathophysiology must cross the blood‐brain barrier (BBB). The same requirement applies to all classes of agents; monoclonal antibodies (mAbs) are discussed below. Some AD therapies have been advanced to late‐stage trials without evidence that they enter the central nervous system (CNS). In general, compounds must be small (<500 Da) to cross the BBB unless they are subject to facilitated transport.13 Even compounds of appropriate size may be transported out of the CNS, for example, by P‐glycoprotein efflux mechanisms.14 The mechanisms that exclude compounds from the CNS differ between rodents and humans; thus, CNS entry in animal models does not adequately establish BBB penetration in humans.15 Tarenflurbil is an example of an AD treatment candidate that had effects in animal models of AD but likely did not cross the human BBB in sufficient amounts, and it failed in clinical trials.16 Nonhuman primate models provide better guidance on human BBB penetration but are not often used in development programs. The optimal means of demonstrating BBB penetration, estimating the plasma/brain ratio, and determining whether CNS exposure is compatible with the therapeutic effects demonstrated preclinically is to measure the level of the agent in the cerebrospinal fluid (CSF). CSF levels are a reasonable surrogate for brain levels, which are inaccessible in human studies.
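The CSF‐based check described above can be sketched as simple arithmetic on paired phase I samples. This is a minimal illustration under stated assumptions, not a regulatory method: the function names, the matched‐time‐point sampling, the comparison against a single preclinically effective concentration, and the example numbers are all hypothetical. The 500 Da screen restates the size rule from the text and, as the text notes, says nothing about facilitated transport or efflux.

```python
from dataclasses import dataclass

# Size rule of thumb from the text: small molecules generally need to be <500 Da
# to cross the BBB unless they use facilitated transport (efflux is not modeled here).
SIZE_LIMIT_DA = 500.0


@dataclass
class PairedSample:
    """Hypothetical paired plasma and CSF concentrations (same units) at a matched time."""
    plasma_conc: float
    csf_conc: float


def passes_size_screen(molecular_weight_da: float) -> bool:
    """True if the compound is below the ~500 Da size rule of thumb."""
    return molecular_weight_da < SIZE_LIMIT_DA


def csf_plasma_ratio(sample: PairedSample) -> float:
    """CSF/plasma concentration ratio, used as a surrogate index of brain penetration."""
    if sample.plasma_conc <= 0:
        raise ValueError("plasma concentration must be positive")
    return sample.csf_conc / sample.plasma_conc


def cns_exposure_covers_target(sample: PairedSample, preclinical_effective_conc: float) -> bool:
    """Hypothetical rule: the CSF level reaches the concentration that was effective preclinically."""
    return sample.csf_conc >= preclinical_effective_conc


if __name__ == "__main__":
    # Hypothetical numbers (e.g., nM): high plasma exposure, but only 5% appears in CSF.
    sample = PairedSample(plasma_conc=200.0, csf_conc=10.0)
    print(passes_size_screen(420.0))                # True
    print(csf_plasma_ratio(sample))                 # 0.05
    print(cns_exposure_covers_target(sample, 25.0)) # False
```

In this invented example the compound passes the size screen yet reaches a CSF concentration below the level that worked preclinically, which is the situation the text cautions against discovering only after late‐stage trials.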
Brain accumulation, brain clearance mechanisms, and intracellular entry are not fully resolved by CSF measures and remain a source of uncertainty.15 Monoclonal antibodies are large molecules that are largely excluded from the CNS; only a very small fraction of the circulating antibody, typically about 1 in 1,000 antibody molecules (0.1%), crosses the BBB and enters the CNS.17 Once in the brain, mAbs may engage Aβ or tau directly, or they may initiate inflammatory mechanisms in which activated microglia ingest the target and remove it from the CNS.17 An alternative view of mAb activity is that, by binding peripheral Aβ, mAbs create a “peripheral sink”: the resulting peripheral/central imbalance passively draws Aβ across the BBB into the periphery, where it can be cleared.17 The peripheral sink hypothesis was assessed in recent trials of solanezumab, an mAb that binds peripheral Aβ. The agent markedly increased peripheral Aβ levels but did not change CSF Aβ(1–42) levels and did not meet its prespecified clinical outcomes. These findings do not support a peripheral sink effect.18 The lesson of these observations is that, in small‐molecule drug development, CNS levels of the agent should be established in phase I. Proof‐of‐concept cannot be determined without establishing CNS penetration and plasma/CSF ratios.

LESSON 3: DETERMINE MAXIMUM TOLERATED DOSE

It is important to establish a maximum tolerated dose (MTD) whenever possible to ensure that the highest tolerable doses have been explored. In some cases, receptor occupancy studies may support conclusions about dosing without an MTD if the receptor is fully occupied at lower doses. In other cases, solubility, volume, or other limitations may cap the administered dose, and the MTD cannot be determined. Outside these exceptional circumstances, an MTD should be determined. Without an MTD, failure to show a drug‐placebo difference in phase II or phase III leaves open the possibility that the dose was inadequate.
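The MTD logic above can be sketched as a scan over ascending dose cohorts. This is a minimal illustration built on assumptions that are not from the text: each cohort is summarized by the number of participants treated and the number with a dose‐limiting adverse event, and a dose is called intolerable when the observed rate reaches a hypothetical threshold of one in three. The function separates an MTD that was identified from one that was never reached, for example because solubility capped the dose.

```python
from typing import Optional, Sequence, Tuple

# (dose in mg, participants treated, participants with a dose-limiting adverse event)
Cohort = Tuple[float, int, int]


def estimate_mtd(cohorts: Sequence[Cohort],
                 intolerable_rate: float = 1 / 3) -> Tuple[Optional[float], bool]:
    """Return (mtd, reached).

    Doses are scanned in ascending order. The first dose whose observed
    dose-limiting event rate is >= intolerable_rate (a hypothetical rule)
    marks the limit, and the MTD is the highest dose below it.
    reached=False means no tested dose was intolerable, so the MTD is
    undetermined and a later negative trial cannot rule out an inadequate dose.
    (None, True) means even the lowest tested dose was intolerable.
    """
    highest_tolerated: Optional[float] = None
    for dose, n_treated, n_events in sorted(cohorts):
        if n_treated <= 0:
            raise ValueError(f"cohort at {dose} mg has no participants")
        if n_events / n_treated >= intolerable_rate:
            return highest_tolerated, True
        highest_tolerated = dose
    return None, False


if __name__ == "__main__":
    escalation = [(10.0, 6, 0), (30.0, 6, 1), (100.0, 6, 3)]
    print(estimate_mtd(escalation))  # (30.0, True)
    capped = [(10.0, 6, 0), (30.0, 6, 0)]  # escalation stopped by a solubility cap
    print(estimate_mtd(capped))      # (None, False)
```

Returning an explicit "reached" flag, rather than simply the highest dose given, keeps the capped‐dose case visible, which matters for interpreting a later negative efficacy trial.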
LESSON 4: SAMPLE SIZES MUST BE SUFFICIENT TO TEST THE HYPOTHESIS RIGOROUSLY

Efficacy of a test agent is determined by the difference in change from baseline between the treatment and placebo groups. For symptomatic therapies tested in short trials (e.g., 3–6 months), the active group typically improves above baseline, whereas the placebo group shows no change or a mild decline below baseline.19 There is often a desire to use small sample sizes to minimize the cost of clinical trials. Small groups, however, can be affected by outliers and recruitment biases, and they can be misleading because of irregular outcomes in the placebo group. Unusual improvement in the placebo group undermines the ability to determine whether the drug is efficacious, whereas an unusually rapid decline in the placebo group may misleadingly suggest an overly robust benefit from the therapy. The latter outcome might lead investigators to underpower a subsequent trial in the expectation of a similarly robust effect. To minimize the risk of irregular outcomes due to small numbers, the placebo group should include at least 100 participants.20 Each active treatment group should be at least as large as the placebo arm. This lesson applies primarily to phase II trials, in which go/no‐go decisions are made. Larger samples are necessary in phase III trials to confirm expected effects.

LESSON 5: PLACEBO DECLINE PROVIDES INSIGHT INTO THE POPULATION RECRUITED TO THE TRIAL

AD is a progressive disease, and failure to observe decline in the placebo group in trials lasting 6 months or more indicates a failed trial.21 For DMTs, efficacy is determined by less rapid decline in the group on active therapy; if the placebo group does not decline, the hypothesized superior efficacy of the active treatment has not been tested, and no conclusion regarding efficacy can be drawn.
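The sample‐size advice in Lesson 4 and the placebo‐decline check in Lesson 5 are connected: the smaller the placebo group, the more likely chance alone yields a group mean that shows no decline even when the population is truly declining. The sketch below assumes that individual changes on a cognitive scale scored so that higher means worse (as on the Alzheimer's Disease Assessment Scale – Cognitive Portion) are normally distributed; the true mean decline of 2 points and standard deviation of 6 points are hypothetical values chosen only for illustration, not estimates from the cited trials.

```python
from math import sqrt
from statistics import NormalDist


def prob_placebo_shows_no_decline(true_mean_decline: float, sd_change: float, n: int) -> float:
    """P(observed placebo mean change <= 0) under a normal model.

    Change is scored so that positive values mean worsening. The sampling
    distribution of the group mean has standard error sd_change / sqrt(n).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    standard_error = sd_change / sqrt(n)
    return NormalDist(mu=true_mean_decline, sigma=standard_error).cdf(0.0)


if __name__ == "__main__":
    # Hypothetical natural-history values: mean worsening 2 points, SD 6 points.
    for n in (20, 50, 100):
        p = prob_placebo_shows_no_decline(2.0, 6.0, n)
        print(f"n={n:>3}: P(no observed decline) = {p:.4f}")
```

With these assumed values, a 20‐person placebo group fails to show mean decline roughly 7% of the time, whereas for 100 participants the probability falls well below 0.1%. Real trials add outliers and recruitment bias on top of this pure sampling noise.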
Lack of decline may sometimes be attributed to the inclusion of non‐AD patients in the trial (see Lesson 7 below). The lesson suggested is that failure of the placebo group to decline indicates operational flaws and a failed trial.

LESSON 6: ACTIVE COMPARATORS PROVIDE INSIGHT INTO THE QUALITY OF TRIALS

Donepezil has been shown to improve cognition in mild‐to‐moderate AD, with a 1.5–2.5 point drug‐placebo difference in change from baseline on the Alzheimer's Disease Assessment Scale – Cognitive Portion.22, 23 An active treatment arm with this agent can serve as a useful comparison group in trials. If no benefit is seen in the donepezil arm compared with placebo, the study has not achieved its operational goal, and no conclusion can be drawn regarding the efficacy of the test agent.21, 24 The lesson derived is that an active comparator arm can provide insight into the quality of a clinical trial, and failure of the active comparator to produce improvement suggests operational shortcomings in the trial. If inclusion of an active comparator requires a true placebo arm, an active comparator arm may not be feasible, because nearly all current trials allow standard‐of‐care treatment with approved agents.

LESSON 7: CLINICAL DIAGNOSIS OF AD WITHOUT BIOMARKER CONFIRMATION IS NOT SUFFICIENTLY ACCURATE FOR TRIALS

Studies of amyloid imaging in patients recruited to clinical trials show that up to 50% of those with mild cognitive impairment and 25% of those with mild dementia do not have a measurable amyloid plaque burden and do not meet biomarker criteria for AD.25 Participants without AD typically show little or no change over time,26 and when they are included in AD trials, they reduce the observed decline of the placebo group and make it more difficult to establish a drug‐placebo difference.
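The cost of enrolling participants without AD can be illustrated with the standard normal‐approximation formula for the per‐arm sample size of a two‐arm comparison of mean change, n = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ². The sketch assumes, as a simplification not stated in the text, that amyloid‐negative participants neither decline nor respond to treatment, so the observable drug‐placebo difference shrinks in proportion to the fraction of participants who have AD, while the outcome standard deviation is held fixed. The 2‐point effect, 6‐point standard deviation, and 50% amyloid‐positive fraction are hypothetical inputs.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for a two-sided, two-sample test of mean change (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


def observable_effect(effect_in_ad: float, fraction_with_ad: float) -> float:
    """Drug-placebo difference diluted by participants without AD (assumed to show no change)."""
    if not 0 < fraction_with_ad <= 1:
        raise ValueError("fraction_with_ad must be in (0, 1]")
    return effect_in_ad * fraction_with_ad


if __name__ == "__main__":
    full = n_per_arm(delta=2.0, sd=6.0)
    diluted = n_per_arm(delta=observable_effect(2.0, 0.5), sd=6.0)
    print(full, diluted)  # 142 566
```

Because n scales with 1/Δ², halving the observable effect roughly quadruples the required sample size, which is one way to see why biomarker confirmation of AD at entry pays for itself.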
The apolipoprotein E epsilon 4 allele is present in 65% of patients with AD27; therefore, trial populations with a substantially lower proportion of apolipoprotein E epsilon 4 carriers can be assumed to include fewer patients with AD. For anti‐amyloid agents, the presence of brain amyloid is critical for demonstrating efficacy. For drugs targeting nonamyloid AD‐related mechanisms, such as tau agents, confirming brain amyloidosis is necessary to ensure an accurate diagnosis. Agents producing cognitive enhancement may act through mechanisms independent of AD‐specific pathology (e.g., 5‐HT6 antagonists), but accurate diagnosis will help ensure that the placebo group demonstrates the expected natural history of AD. A lesson to be derived from these observations is that clinical diagnosis of AD is insufficient for including participants in clinical trials. Confirming the diagnosis with amyloid imaging or with CSF measures of amyloid and tau or phospho‐tau is required to ensure the presence of AD in trial participants.28

LESSON 8: TARGET ENGAGEMENT SHOULD BE DEMONSTRATED IN PHASE II OF THE DEVELOPMENT PROGRAM

Rigorous demonstration of hypothesized pharmacological effects in early‐stage trials will assist successful drug development. Many drugs fail in phase III because of lack of efficacy29; in some cases, engagement of the intended target in humans was not demonstrated in phase II, and the biological basis for the expected clinical response was not established. Phase IIa proof‐of‐concept studies have two related aspects: target engagement and proof‐of‐pharmacology. Target engagement is a measure showing that the intended target of the drug is engaged in humans.
Receptor occupancy studies using positron emission tomography are often used to show target engagement by agents that act at a specific receptor.30 Another example of target engagement is the effect of beta‐site amyloid precursor protein cleaving enzyme (BACE) inhibitors on CSF BACE activity.31 Proof‐of‐pharmacology can be demonstrated by showing the downstream effects of successful target engagement. Gamma‐secretase inhibitors and gamma‐secretase modulators reduce β‐amyloid protein synthesis, as demonstrated using the stable isotope labeling kinetics technique.32, 33 Gamma‐secretase inhibitors and modulators change the cleavage of the amyloid precursor protein, producing an increase in short amyloid fragments that are detectable in CSF and establish a pharmacologic effect.33, 34 BACE inhibitors have been shown to decrease CSF amyloid following chronic treatment, supporting proof‐of‐pharmacology.35 Removal of β‐amyloid plaques from the brain can be regarded as both a target engagement and a proof‐of‐pharmacology outcome. Aducanumab is an mAb that showed a dose‐ and time‐dependent reduction in amyloid plaques and a reduction in the rate of cognitive decline on some (but not all) cognitive measures included in a phase I trial.36 Target engagement of this type is not sufficient to predict drug efficacy. Some immunotherapies (e.g., AN1792, bapineuzumab, and gantenerumab) have been shown to engage their target and reduce plaque amyloid without producing a corresponding clinical benefit.37, 38, 39 Thus, engaging the target is necessary for a treatment response but does not ensure that cognitive benefit will follow. This may reflect the fact that insoluble plaque amyloid is not the most toxic form of amyloid; soluble species are more neurotoxic and may not be reduced by all antibodies. Phase II trials should, at a minimum, establish target engagement and the doses to be advanced to phase III. A robust phase II program would also show a clinical benefit.
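Receptor occupancy, one of the target engagement measures mentioned above, is commonly estimated from PET binding potential (BP) as occupancy = 1 − BP(on drug)/BP(baseline), and occupancy is often related to plasma concentration with a simple saturable model, occupancy = C/(C + EC50). The sketch below assumes that one‐site model with a maximal occupancy of 100%; the binding potentials, the EC50, and the 80% occupancy goal are hypothetical values, not recommendations from the text.

```python
def occupancy_from_pet(bp_baseline: float, bp_on_drug: float) -> float:
    """Fractional receptor occupancy from PET binding potentials (baseline vs. on drug)."""
    if bp_baseline <= 0:
        raise ValueError("baseline binding potential must be positive")
    return 1.0 - bp_on_drug / bp_baseline


def occupancy_from_concentration(conc: float, ec50: float) -> float:
    """One-site saturable model: occupancy = C / (C + EC50), assuming 100% maximal occupancy."""
    return conc / (conc + ec50)


def concentration_for_occupancy(target_occupancy: float, ec50: float) -> float:
    """Invert the one-site model: C = EC50 * occ / (1 - occ)."""
    if not 0 <= target_occupancy < 1:
        raise ValueError("target occupancy must be in [0, 1)")
    return ec50 * target_occupancy / (1.0 - target_occupancy)


if __name__ == "__main__":
    print(occupancy_from_pet(bp_baseline=2.0, bp_on_drug=0.5))     # 0.75
    # Hypothetical EC50 of 5 ng/mL: plasma level needed for 80% occupancy.
    print(round(concentration_for_occupancy(0.80, ec50=5.0), 6))  # 20.0
```

Because, as the text stresses, engaging the target does not guarantee clinical benefit, a calculation like this supports dose selection for later phases rather than predicting efficacy.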
Large sample sizes are typically required to demonstrate significant clinical benefit, and many sponsors have progressed to phase III without conducting phase II studies. Establishing proof of target engagement or proof‐of‐pharmacology is one means of derisking phase III even without demonstrating cognitive benefit in phase II. DMTs advanced without evidence of target engagement lack a critical part of the foundation of drug development. Development programs that use biomarkers have a greater success rate than those that do not.40 Progressing to phase III without phase II also limits the safety information available to inform phase III.41 A challenge for AD drug development is that many mechanisms have no biomarkers available to demonstrate target engagement or proof‐of‐pharmacology. It is critical that, as agents advance from screening assays to animal models to human testing, biomarkers indicative of target engagement be developed in concert with the candidate therapy. Biomarkers developed during drug development programs may become companion biomarkers useful to practitioners once the agent reaches the market. The lesson derived from the existing studies is that demonstrating target engagement is a key means of derisking a development program and that proceeding to phase III without evidence of target engagement or proof‐of‐pharmacology places the program at high risk of a negative outcome.

LESSON 9: ASSURE DOSE SELECTION IN PHASE II

Dose‐response relationships provide critical information for dose choices. Dose escalation studies in phase I and dose refinement studies in phase II should provide confidence in the dose(s) selected for phase III.42 Dosing approaches ideally establish a low dose that is ineffective, one or two mid‐range doses that are effective, and a high dose that is not tolerated and not acceptable.
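The dose‐finding pattern described above (an ineffective low dose, effective mid‐range doses, and an unacceptable high dose) suggests a simple selection rule: choose the lowest dose that is tolerated and whose effect is convincingly above zero. The sketch below is a hypothetical decision rule, not a regulatory criterion; it treats an arm as effective when the lower bound of an approximate 95% confidence interval for the drug‐placebo difference exceeds zero, and all doses and effect estimates are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DoseArm:
    """Hypothetical phase II summary for one dose arm."""
    dose_mg: float
    effect: float      # drug-placebo difference on the primary outcome (positive = benefit)
    effect_se: float   # standard error of that difference
    tolerated: bool    # safety and tolerability judged acceptable


def lowest_effective_tolerated_dose(arms: Sequence[DoseArm], z: float = 1.96) -> Optional[float]:
    """Lowest tolerated dose whose approximate 95% CI lower bound for benefit is above zero."""
    for arm in sorted(arms, key=lambda a: a.dose_mg):
        lower_bound = arm.effect - z * arm.effect_se
        if arm.tolerated and lower_bound > 0:
            return arm.dose_mg
    return None  # no dose is both effective and tolerated


if __name__ == "__main__":
    arms = [
        DoseArm(5.0, 0.2, 0.5, True),    # ineffective low dose
        DoseArm(10.0, 1.8, 0.6, True),   # effective mid-range dose
        DoseArm(20.0, 2.2, 0.6, True),   # effective mid-range dose
        DoseArm(40.0, 2.5, 0.7, False),  # high dose not tolerated
    ]
    print(lowest_effective_tolerated_dose(arms))  # 10.0
```

Scanning doses from lowest to highest makes the rule favor the smallest adequate exposure; a real program would also weigh exposure‐response modeling and the full safety profile rather than a single confidence bound.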
Regulatory agencies expect patients to receive the lowest effective dose to ensure that they are not exposed to unnecessary side effects.

LESSON 10: COLLECT MULTIPLE BIOMARKERS TO ASSESS OUTCOMES

Knowledge of the neurobiology of AD is incomplete. Moreover, AD biology is complex,43 and biomarkers provide limited windows into this complex and poorly understood disease. Although working models of the order of events in AD have been proposed, none has been proven and none has guided successful DMT development. Agnostic approaches to biomarkers (e.g., the amyloid, tau, neurodegeneration [A/T/N] framework) can be used to acknowledge the exploratory nature of biomarker documentation of drug effects.44 To support disease modification as the outcome of a therapy, trial sponsors should collect A/T/N biomarker data as well as other emerging biomarkers and biomarkers specifically linked to the mechanism of the intervention, to delineate a comprehensive view of the impact of treatment. Preclinical, clinical, and biomarker data can be synthesized to support disease modification. The FDA specifies how to qualify biomarkers for use in trials of putative DMTs.45 A lesson learned is that dependence on a single key biomarker should be avoided and that multiple biomarker outcomes should be collected.

LESSON 11: WORLD REGIONS VARY IN TERMS OF THE PATIENTS ENTERING TRIALS

Clinical trials, especially those for DMTs, are large and often require more than 1,000 patients. Timely recruitment for such trials often necessitates including clinical trial sites from many global regions. Regional variations in language, culture, trial experience, standard of care, genetics, nutrition, and other factors create variability in the data that may compromise the ability to demonstrate a drug‐placebo difference. Recent studies of global trials show that North American and Western European data are very similar with regard to baseline characteristics, placebo group behavior, outcome measures, and adverse event reporting.
Substantial variations in these parameters were observed in other global regions.46, 47 The lesson to be derived from these studies is that sponsors should seek ways to minimize variability in global trials to ensure greater data homogeneity.

LESSON 12: SUBGROUP ANALYSIS OF A NEGATIVE TRIAL CAN BE MISLEADING

Post hoc analyses of negative trials are often pursued to detect treatment‐responsive subgroups that can be exploited in future trials. This approach entails a substantial risk of being misled by spurious results. Subgroups are not subject to the same recruitment or randomization as the original group; subgroup sample sizes are often small, leading to underpowered analyses; and the outcome measures are typically not optimized for a specific subgroup. Basing a phase III program on a subgroup analysis of a phase II trial with a negative outcome has usually resulted in a negative phase III trial. Examples include the negative phase III outcomes for tarenflurbil,48 bapineuzumab,49 solanezumab,50 and ELND005.50 A lesson from these experiences is to apply guidelines that reduce the likelihood of being misled by phase II subgroup analyses. Table 1 shows the principal recommendations for subgroup analysis.51, 52 A hypothesis‐generating subgroup observation can be tested by conducting a phase II trial in that subgroup.

Table 1 Guidelines to establish the likely validity of a subgroup as a guide to additional trials; a “yes” answer is most consistent with a nonspurious subgroup (from 51,52)

Design
- Was the subgroup variable a baseline characteristic?
- Was the subgroup variable a stratification factor at randomization?
- Was the subgroup hypothesis specified a priori?
- Was the subgroup analysis one of a small number of subgroup hypotheses tested (≤5)?

Analysis
- Can chance explain the subgroup difference?
- Was the test of interaction significant (P < 0.05)?
- Was the significant interaction effect independent, if there were multiple significant interactions?

Context
- Was the direction of the subgroup effect correctly prespecified?
- Was the subgroup effect consistent with evidence from previous related studies?
- Was the subgroup effect consistent across related outcomes?
- Was there indirect evidence to support the apparent subgroup effect, for example, biological rationale, laboratory tests, or animal studies?

Systematic reviews
- Is the subgroup difference suggested by comparisons within rather than between studies?

John Wiley & Sons, Ltd.

SUMMARY

Successful AD drug development is an unmet need, and more disciplined approaches to drug development will help reduce the current high rate of negative trials.53 The lessons learned from trials with negative outcomes can inform drug development. Better preclinical models of AD; better knowledge of BBB penetration, MTD, and dose response; demonstration of target engagement; more accurate diagnosis using biomarkers; selective use of regional populations in trials; adequate trial size; construction of trial populations that exhibit decline in the placebo group; improvement in an active comparator group; and avoidance of misleading subgroup analyses can all contribute to greater success in AD drug development. These improvements in trial conduct and interpretation must be matched by studying candidate therapies with superior efficacy. This two‐pronged approach to AD therapy development will result in the delivery of urgently needed drugs to the rapidly growing AD population.

Conflict of Interest

Dr Cummings has provided consultation to Abbvie, Acadia, Adamas, Anavex, Avanir, Avid, Axovant, Biogen, Boehringer‐Ingelheim, Bracket, Dart, Eisai, Genentech, Lilly, Lundbeck, Medavante, Merck, Neurocog, Novartis, Otsuka, Pfizer, QR Pharma, Roche, Takeda, and Toyama pharmaceutical and assessment companies.