FDA Approved — And Ineffective: Database Explainer
BACKGROUND AND RATIONALE:
Recent scandals about drugs like Aduhelm and Relyvrio have led to concerns about the U.S. Food and Drug Administration’s drug approval process and whether it can protect the public from ineffective and dangerous drugs. To determine the extent of the problem, we conducted a two-year-long investigation into the scientific criteria that the agency relied on for each of its 429 new drug approvals from Jan. 1, 2013 through Dec. 31, 2022.
All new molecular entities were included from the FDA’s New Drug Applications (NDAs) and Biologic License Applications (BLAs) databases. Four essential criteria were selected for review, based on requirements enacted by Congress in 1962 and codified at 21 CFR 314.126, which states that “adequate and well-controlled studies” are required to provide “substantial evidence” that a drug is safe and effective. Flexibility was built in for exceptional circumstances in 21 CFR 314 Subpart H. The 1962 law led the way to making the United States a global leader in drug regulation.
Still in effect, that regulation provides the basis for the first three criteria assessed in this investigation: 1) randomized control arms; 2) blinding; and 3) replication. Its use of the plural “studies” has generally been interpreted to mean that more than one controlled study is needed. The regulation also states that steps should be taken to minimize bias by using blinding and randomized trials with concurrent controls (rather than historical controls, which the FDA states should be reserved for “special circumstances”).
The fourth criterion rests on the nature of the patient outcome used to measure a drug’s efficacy. Until the late 1980s, efficacy studies generally used a clinical outcome, defined as a measure of whether a patient feels or functions better, spends fewer days in the hospital, or survives longer.
A 1986 appellate court ruling, Warner-Lambert Co. v. Heckler, affirmed that the FDA was correct to refuse approval of a drug that fails to provide a meaningful clinical benefit. The ruling went on to indicate that the effect size (the amount or degree of benefit) would have to be clinically meaningful and not just “statistically significant.” (In this statistical context, the word “significant” does not in any way imply “important” or “meaningful,” as it does in common English usage; instead it describes whether or not a finding is likely to be due to chance alone.)
In the late 1980s, the agency began to increase its acceptance of surrogate outcomes, such as a laboratory test or imaging study, that are thought to be “reasonably likely” to predict a clinical benefit. Surrogate outcomes were originally used by sponsors to determine whether it would be worthwhile to progress to larger studies that test for a clinical benefit; alternatively, if surrogates were submitted to the FDA for final approval, they were generally reserved for drugs to treat serious and life-threatening illnesses. Since then, multiple studies have shown that surrogate outcomes, even “validated” surrogates, often correlate poorly with clinical outcomes. Today, surrogate outcomes are no longer reserved for exceptional circumstances. Rather, as this investigation shows, surrogates have become the most common endpoint in studies submitted to the FDA by companies seeking approval for their drugs. This raises concerns about the clinical benefit of many drugs that have been approved.
For this reason, this investigation used a fourth criterion in assessing the science behind the 429 drugs in our database: whether or not the study or studies on which an approval was based looked at a clinical versus a surrogate outcome.
This investigation did not assess drug harms and focused solely on whether data reviewed by FDA to approve drugs were sufficient to ensure that the drugs were effective. Even in a safety context, we believe it is important to consider efficacy, because ineffective drugs can still cause physical and financial harm. Some drugs currently on the market have never been shown to be effective.
METHODOLOGY:
All 429 drugs (including imaging tracers, dyes, and vaccines) were entered into an Excel database, and the study or studies used to grant each approval were reviewed as listed in the sponsor’s FDA-approved label under “Section 14: Clinical Studies.” Based on the FDA’s criteria for what constitutes “substantial evidence” of efficacy to support new drug approvals, four trained reviewers assessed whether the studies satisfied each of this investigation’s four criteria: randomized control arms; blinding; replication; and clinical outcome. If a study’s methodology was not adequately described in the label, the reviewers sought more in-depth descriptions from ClinicalTrials.gov, the FDA’s Multi-Discipline Review, and/or the published medical literature. Only those studies cited by the FDA for its approval were assessed. Each of the drug labels can be found in current and archived databases, as listed below and in the database itself.
Only the first approval of each drug was reviewed. Drugs already on the market can win multiple subsequent approvals if they are tested and shown to work for other conditions or diseases; Avastin and Keytruda, for example, were each approved to treat numerous cancers. This investigation evaluated only the first approval, when the drug was originally put on the market.
To avoid charges of bias against drug makers, reviewers were trained, in cases of uncertainty, to always indicate that a study had met the standard for the criterion in question. For example, if a study was only partially blinded, it was scored as blinded. If a sponsor conducted a second study in a different population, even though it was not a true replication, it was scored as replicated. If a study was not blinded to either the patients or the investigators, but outcomes were assessed by a blinded third party, the study was marked as blinded.
For drugs granted simultaneous approvals for more than one condition or disease, reviewers scored a criterion as met if the studies for either indication satisfied it, even if the studies for the other indication did not. For example, if one study was blinded and the other was not, or the first was replicated and the second was not, the approval was marked as meeting both the blinding and replication criteria. Examples include Rozlytrek, which was scored as “replicated” even though it was initially approved for two cancers and only one had a replication study, and Lutathera, which was also approved for two cancers but only one was studied with a control group. By using this methodology in cases of uncertainty or partial evidence, this investigation purposefully biased rulings in favor of the sponsor’s claims of efficacy.
When a clinical outcome was a primary endpoint, it was marked as clinical regardless of whether the result was positive or negative. This was relevant in the case of Fetroja, which used a combined surrogate and clinical outcome: the clinical outcome was negative, and the drug was approved based on the surrogate component of the combined endpoint.
After all four criteria were assessed, a final global assessment of each drug was made to determine whether the research studies met, or failed to meet, “minimal FDA standards.” If all four criteria were marked as having been met, the reviewer entered “yes”; if any one or more of the four criteria were missing, the reviewer entered “no.”
We found that 73 percent of new drug approvals were based on studies that failed to meet one or more of the minimal criteria set forth by the FDA. The intentional bias in favor of drugmakers means that it is likely that the situation is worse than this investigation indicates.
Two databases are maintained for interested researchers. The initial database is locked and maintained as a record of how all statistics reported in the published articles in The Lever were derived. The locked database includes errors made by both the reviewers and the FDA (the FDA’s errors included, for example, describing two drugs as a single drug). Data were initially locked in October 2024; however, mechanical errors introduced during downloads of the data (line adjustments, such as one drug reported on three lines) were subsequently corrected, and a final data lock was made on Feb. 25, 2025. Further changes were made on March 13, 2025 to correct dead or missing hyperlinks. No criteria assessments were changed at any point.
The live online database has been updated to correct some errors. An Error and Update sheet tracks all corrections and updates. Readers are invited to submit any suspected errors or updates (such as drug removals or new boxed warnings) for potential adjustments that will be made by Jeanne Lenzer and/or other trained reviewers. As such, the online database is not fit for statistical analysis and instead is provided as a service to readers. If researchers want to conduct an analysis on the live database, they will need to vet the determinations (all corrected findings will be marked as such) using a clearly defined protocol.
The online database, which is available to the public, is searchable by drug names, and each drug appears with a color-based scoring response and its explanation as follows:
GREEN (4 points): The clinical studies used to guide an approval decision met all four minimal scientific standards. This means that the methodology used to assert efficacy and safety conformed with the general FDA standard for what is ordinarily required. However, meeting all four standards does not guarantee that a drug will prove to be effective and/or safe, only that the findings are more likely to support the sponsor’s claims.
YELLOW (3 points): The study or studies used for FDA approval did not meet one of the four minimal FDA standards criteria, and therefore the studies submitted in support of the drug are not optimal according to the FDA’s usual requirements.
RED (0, 1 or 2 points): The failure of a study or studies to meet two, three, or all four of the minimal FDA standards makes it impossible in almost any circumstance to trust that the drug is safe or effective. However, it should not be interpreted to mean the drug is a bad drug – only that the evidence provided by the drug’s sponsor is inadequate to verify claims of safety and efficacy.
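The color scoring above amounts to a simple tally: one point per criterion met, with the color determined by the total. A minimal sketch in Python (the function name and signature are illustrative only, not part of the investigation’s actual tooling):

```python
# Illustrative scoring helper (hypothetical; not the investigation's actual code).
# One point is awarded per criterion met: randomized control arm, blinding,
# replication, and a clinical (rather than surrogate) primary outcome.
def score_drug(randomized, blinded, replicated, clinical_outcome):
    points = sum([randomized, blinded, replicated, clinical_outcome])
    if points == 4:
        color = "GREEN"       # met all four minimal standards
    elif points == 3:
        color = "YELLOW"      # missed exactly one standard
    else:
        color = "RED"         # missed two or more standards
    return points, color

# A study that met every criterion except replication:
print(score_drug(True, True, False, True))  # (3, 'YELLOW')
```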
We assessed additional elements for each new drug approval, including whether the drug was awarded orphan status or breakthrough status; whether it carried a boxed warning at the time of approval (the most serious warning the FDA issues, regarding serious and/or potentially deadly side effects); and whether the drug was withdrawn or discontinued after approval.
KAPPA REVIEW:
An interrater reliability review was conducted on 20 percent of the full dataset. Cohen’s kappa statistic was 0.841, based on the following two-reviewer results:
R=Reviewer
R1 & R2 both affirmative: 196 instances
R1 affirmative R2 negative: 19 instances
R1 negative R2 affirmative: 8 instances
R1 & R2 both negative: 129 instances
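The reported kappa can be reproduced directly from these counts. A short Python sketch of the standard two-rater Cohen’s kappa calculation:

```python
# Cohen's kappa from the two-reviewer counts reported above.
a = 196   # both affirmative
b = 19    # R1 affirmative, R2 negative
c = 8     # R1 negative, R2 affirmative
d = 129   # both negative
n = a + b + c + d                      # 352 paired ratings

p_observed = (a + d) / n               # raw proportion of agreement
# Chance agreement, from each reviewer's marginal "yes"/"no" rates
p_r1_yes = (a + b) / n
p_r2_yes = (a + c) / n
p_expected = p_r1_yes * p_r2_yes + (1 - p_r1_yes) * (1 - p_r2_yes)

kappa = (p_observed - p_expected) / (1 - p_expected)
print(round(kappa, 3))  # 0.841
```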
The standard interpretation of Cohen’s kappa is as follows:
0: no agreement
0.01–0.20: none to slight agreement
0.21–0.40: fair agreement
0.41–0.60: moderate agreement
0.61–0.80: substantial agreement
0.81–1.00: almost perfect agreement (0.84 for this investigation)
McHugh, M. L. (2012). Interrater reliability: the kappa statistic. Biochemia medica, 22(3), 276-282.
LIMITATIONS
This investigation focuses on the approval process itself, and the database did not include a systematic review of harms, although some specific cases are detailed in the articles. Nor did it include data on events occurring after approval, such as post-market studies, drug complications picked up after approval, or lawsuits revealing harms.
We did not address whether any of these drugs are either safe or effective; drugs that failed to meet one or more of the four “minimal criteria” could still be safe and effective, while those that did satisfy all of the criteria could nevertheless be unsafe and/or ineffective. We assessed only whether the agency had “substantial evidence” that a drug is effective, according to the FDA’s own Guidance on Demonstrating Substantial Evidence of Effectiveness.
We did not address the methodological quality of the submitted studies themselves. There are many potential sources of bias in any study; even studies meeting all four basic criteria can (and often do) lead to false conclusions of efficacy and safety. For example, we did not evaluate effective unblinding, dropout rates, the use of historical controls, run-in phases, or whether “statistically significant” results were clinically meaningful (see Part 1 of the series for examples and a discussion of the latter issue).
As noted, reviewers were trained to err always on the side of accepting any of the criteria as having been met if there was uncertainty. We thus can only comment that at least 73 percent of approvals were based on studies that failed to meet all four criteria that the FDA asserts are standard.
DISCUSSION
In addition to accepting provisional and incomplete data from randomized clinical trials, the agency has also accepted so-called “real-world” or observational data as part of approval reviews. While real-world data might have some useful purposes to identify drug harms, such data are alarmingly unreliable when used to approve drugs; a study of how well real-world data replicated randomized controlled trials found that non-randomized studies failed in 85 percent of cases.
This investigation demonstrates that the extensive use of expedited pathways is associated with an increased risk that claims of benefit will later prove to be false or misleading. We did not address how often the use of an expedited pathway was appropriate, although we did find some examples of how sponsors have exploited their use (see Part I). Clearly, there are times when limited data and an expedited pathway are necessary and reasonable. For example, one drug in the database, Ebanga, was approved based on a single randomized controlled trial (RCT) to treat Ebola. In the face of a rapidly progressive and highly fatal condition, it was not necessary to replicate the first study, which showed a survival benefit. Instead, researchers subsequently tested Ebanga against other treatments, establishing it as a successful intervention.
However, the Ebola outbreak was an exceptional circumstance, which is not the case for most conditions, even most cancers and pandemics, which generally have far lower mortality rates than Ebola. It is worth noting that researchers were able in a time of crisis to carry out a highly useful RCT during an Ebola outbreak. Replication is one of the cornerstones of good science — without it, there is no science in most instances. However, the FDA has increasingly allowed the exception to become the rule, and has recently issued a draft guidance (pending at the time of this writing) to allow the agency to use a single study for drug approvals.
Conversely, even well-controlled, blinded RCTs using a clinical outcome can lead to incorrect conclusions. For example, Rexulti met standard criteria to treat psychosis in demented patients and to treat depression, but a subsequent analysis of Rexulti and two same-class drugs found they failed to provide the initially claimed benefits. There are many ways that even well-controlled, blinded, randomized trials with clinical endpoints can be undermined by research bias, such as the use of run-in phases and extension phases (which exclude from analysis patients who would be likely to do poorly). Strawman comparators (such as dose differences intended to amplify the harms of a comparator drug) and publication of only a subset of patients (as was the case with high-dose steroids for acute spinal cord injury) are among the many ways that RCTs can lead to exaggerated and false claims of benefit.
Although both expedited and standard approval pathways can lead to incorrect conclusions about a drug, it is only when bedrock criteria are fulfilled (like having a control arm, etc.) that issues of bias can be addressed to further reduce false results.
Our conclusion is that the approvals of Aduhelm, Rexulti, and other drugs discussed in this article are not one-off lapses in decision-making at the FDA, but instead are part of a systemic problem in which the agency has, in concert with industry, lowered the scientific bar for drug approvals. This decline has affected the overwhelming majority of drugs approved by the agency for several decades.
Databases used for this investigation
New Molecular Entities and New Biologic Approvals by Years
Archived NME and BLAs prior to 2015
Novel Drug Approvals (direct links to drug labels) by year
Novel Drug Approvals Archived 2011-2016
CDER Drug and Biologic Accelerated Approvals Based on a Surrogate Endpoint
FDA database of surrogate approvals
W/D cancer drugs based on accelerated approvals
Drugs@FDA with history of actions
"Unapproved Drugs" that remain on market
Database lead analyst: Jerome R. Hoffman, M.D., professor of medicine emeritus UCLA, epidemiologist and clinical trial expert;
Database members: Milutin Kostic, M.D., psychiatrist and former Fulbright Scholar; Joe Fraiman, M.D., emergency physician; Maximilian Siebert, PhD, Research Fellow in Therapeutic Science, Harvard Medical School.
Database conception and administrators: Jeanne Lenzer and Shannon Brownlee; Guarantor: Jeanne Lenzer (for access to original locked database, contact jeanne.lenzer@gmail.com)
Advisors
Jerome R. Hoffman, MD, Professor Emeritus of Medicine, epidemiologist, UCLA
Reshma Ramachandran, MD, MPP, MHS; Assistant Professor, Yale School of Medicine; Co-Director, Yale Collaboration for Regulatory Rigor, Integrity, and Transparency (CRRIT).
Adriane Fugh-Berman, MD, Director, PharmedOut, and Professor, Dept. of Pharmacology and Physiology, Georgetown University Medical Center
Maximilian Siebert, PhD, post-doc, Harvard, database group/METRICS, US and Germany
Steven Goodman, MD, Professor of Epidemiology and Population Health at Stanford University
Iona Heath, MD, Former President Royal College of General Practitioners, England
John Ioannidis, MD, Professor of Medicine, Stanford, international expert on clinical trials, US and Greece
Milutin Kostic, MD, Fulbright scholar, psychiatrist, Associate Professor Faculty of Medicine, University of Belgrade, Serbia.
Anonymous MD, FDA insider
Joe Fraiman, MD, emergency physician
Erick Turner, MD, Professor Emeritus, Department of Psychiatry, Oregon Health & Science University, former FDA advisor
Kim Witczak, FDA consumer representative
Diana Zuckerman, PhD, President National Center for Health Research
Paul Glasziou, PhD, Professor Bond University evidence-based medicine, Australia