Evaluating cancer drugs at FDA
In the June 2nd paper issue of BusinessWeek (published online 5/21) the article “Cancer’s Cruel Economics” by Catherine Arnst provides a high-level look at the difficulty some small companies are having getting their cancer therapies approved in the US.
The focus is on cancer immunotherapies, particularly Antigenics’ Oncophage. I last discussed Oncophage in 2006, after the first report of its Phase 3 results. I have also devoted space in this forum to other cancer therapies mentioned in the BusinessWeek article including Dendreon’s Provenge and Genitope’s MyVax.
I was intrigued by a quote attributed to Richard Pazdur, head of CDER’s Oncology review division:
“[Post hoc subgroup analysis for differential treatment effect] is like shooting an arrow and then painting the bull’s-eye around it,” says Pazdur. “You cannot use subset analysis to salvage a failed trial.”
Pazdur’s concern regarding treatment effect inferences derived from post hoc subgroup (subset) analyses rests on firm grounds, but the quote suggests a black and white attitude towards their utility, without any room for compromise. That’s too bad, because the rule of thumb Pazdur is apparently using to reject subgroup evidence of efficacy is imperfect, undoubtedly resulting in the rejection of some effective therapies.
I’m not going to write a manuscript-length post describing the many risks inherent in inferential subgroup analyses. There are many published reviews you can find that do that. Suffice it to say that the risks of both false-negative and false-positive inferences are inflated with subgroup analyses relative to the main analysis (primary hypothesis test), whether the analyses are pre hoc (defined before the trial results accrue) or post hoc (sometimes called retrospective). Pre hoc analyses are less susceptible than post hoc analyses to Pazdur’s target drawing, especially when the specifics of the subgroup are rigorously pre-defined, and so they are preferred by regulators.
What I’ve found to be less well represented in the literature is a situation in which the weight of evidence presented in the subgroup analysis is sufficient to do what Pazdur says cannot be done: “salvage a failed trial.” I’ll not focus specifically on Provenge or Oncophage, but the example from the literature I’ll cite is relevant to both.
In order to determine whether any subgroup analyses provide evidence sufficient to warrant drug approval, it is necessary to first know the expected false-positive rate of a subgroup analysis, given a false-positive rate of 5% in the overall (main) analysis. A 5% rate is chosen because that rate is usually considered an acceptable one by clinical practitioners and drug regulators. Thus, when there is no real treatment effect, the null hypothesis is falsely rejected in 1 of 20 trials. FDA usually requires two independent experiments (trials) for evidence of efficacy, resulting in an overall false-positive rate of 0.25% (0.05×0.05), though one statistically significant experiment with corroborating evidence from others is sometimes sufficient, particularly for accelerated approvals.
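The two-trial arithmetic is easy to check. Here is a minimal sketch, assuming the trials are independent and each is tested at the same nominal level; the function name is mine, chosen only for illustration.

```python
def joint_false_positive(alpha: float, n_trials: int = 2) -> float:
    """Chance that every one of n independent trials is falsely positive
    when there is no real treatment effect (assumes independence and a
    common significance level alpha)."""
    return alpha ** n_trials


if __name__ == "__main__":
    single = 0.05
    both = joint_false_positive(single, 2)
    print(f"one trial: {single:.2%}, two independent trials: {both:.2%}")
```

Requiring a second independent trial therefore cuts the false-positive rate by a factor of twenty, not two.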
In an important study published in 2001 by the UK’s NHS, Brookes et al. used simulations of 100,000 clinical trials each to determine the false-positive (and false-negative) rates of subgroup analyses for different types of study designs. They simulated two subgroup analyses (ignoring the effects of multiple analyses, which inflate Type 1 error) and tested a variety of relative treatment effects and subgroup sizes.
The simulations showed that when there was in reality no main or subgroup effect of treatment, and the overall (main) analysis of treatment was falsely positive (i.e. null hypothesis was rejected at the nominal p<0.05) then the chance of falsely declaring one subgroup as demonstrating a treatment effect was high. For a survival study, this chance was 61%. In other words, with no real main treatment or subgroup effect, when there was a false-positive main effect, one of two subgroups analyzed will appear to have a treatment effect well over half the time.
However, under the same set of circumstances, when the main null hypothesis is not rejected (i.e. a true negative inference is made), then one subgroup will show evidence of a treatment effect much less often, only 6.5% of the time, approaching the overall effect false-positive rate of 5%. In other words, the probability of falsely rejecting a subgroup-specific null hypothesis in the absence of overall and subgroup effects is reasonably low if the overall effect is correctly negative.
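The contrast between these two conditional rates can be illustrated with a much cruder simulation than the one Brookes et al. ran. The sketch below is not their design: it assumes two equal-sized, non-overlapping subgroups, no true effect anywhere, and normally distributed test statistics, so the overall statistic is simply the sum of the two subgroup z-values divided by the square root of 2. The function name and seed are arbitrary, and the percentages it prints should not be expected to match the survival-study figures quoted above.

```python
import math
import random

Z_CRIT = 1.959964  # two-sided 5% critical value of the standard normal


def simulate_null_trials(n_trials: int = 100_000, seed: int = 2008):
    """Simulate trials with no true effect in either of two equal subgroups.

    Returns a pair of estimated probabilities that at least one subgroup
    test is nominally significant:
      (given the overall test IS significant,
       given the overall test is NOT significant)
    """
    rng = random.Random(seed)
    n_sig = n_nonsig = 0          # trials by overall result
    sub_sig = sub_nonsig = 0      # ...of which some subgroup looks positive
    for _ in range(n_trials):
        z1 = rng.gauss(0.0, 1.0)  # subgroup 1 test statistic under the null
        z2 = rng.gauss(0.0, 1.0)  # subgroup 2 test statistic under the null
        z_all = (z1 + z2) / math.sqrt(2.0)  # pooled (overall) statistic
        any_subgroup = abs(z1) > Z_CRIT or abs(z2) > Z_CRIT
        if abs(z_all) > Z_CRIT:
            n_sig += 1
            sub_sig += any_subgroup
        else:
            n_nonsig += 1
            sub_nonsig += any_subgroup
    return sub_sig / n_sig, sub_nonsig / n_nonsig


if __name__ == "__main__":
    given_sig, given_nonsig = simulate_null_trials()
    print(f"subgroup 'effect' given false-positive overall test: {given_sig:.1%}")
    print(f"subgroup 'effect' given true-negative overall test:  {given_nonsig:.1%}")
```

Even in this simplified setting the qualitative pattern holds: a spurious subgroup finding is common when the overall result is itself a false positive, and comparatively rare when the overall result is correctly negative.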
Of course, the above simulation findings aren’t by themselves capable of determining whether a subgroup-specific effect is real or not. They simply suggest that the regulator need not reject out-of-hand statistical evidence of a subgroup-differential treatment effect when evidence for an overall effect is absent, as Dr. Pazdur’s quote suggests he is willing to do in some cases.
Evidence that the apparent subgroup effect in a survival study is real will be strengthened by the following factors:
- The main effect does not contradict the purported subgroup effect
- The subgroup-specific analysis was defined a priori
- A significant test of interaction between treatment and the subgroup is in evidence prior to any subgroup-specific test
- The total number of subgroups analyzed is small, and, if not, an inference of treatment effect made on any one subgroup uses an appropriately conservative adjustment of the significance level
- There is strong biological plausibility for the differential subgroup effect
- The size of the subgroup is large relative to the total sample size (i.e. relatively representative of the total population)
- The conduct of the study, particularly the handling of dropouts and non-compliant subjects, creates confidence in the quality of the subgroup data
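Two of the factors above, the interaction test and the conservative adjustment for multiple subgroups, lend themselves to a short calculation. The following is a rough sketch only: it assumes the subgroup effect estimates (for example, log hazard ratios) are independent and approximately normal, uses a simple Bonferroni correction as one possible conservative adjustment, and the function names and input numbers are hypothetical.

```python
import math


def bonferroni_alpha(alpha: float, n_subgroups: int) -> float:
    """Per-subgroup significance level under a Bonferroni correction."""
    return alpha / n_subgroups


def interaction_test(est_a: float, se_a: float, est_b: float, se_b: float):
    """z-test for a difference between two independent subgroup estimates.

    Returns (z, two-sided p-value) using the normal approximation.
    """
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    # Hypothetical log hazard ratios and standard errors for two subgroups.
    z, p = interaction_test(-0.50, 0.18, -0.05, 0.20)
    print(f"interaction z = {z:.2f}, two-sided p = {p:.3f}")
    print(f"per-subgroup alpha with 5 subgroups: {bonferroni_alpha(0.05, 5):.3f}")
```

A subgroup-specific test would be examined only if the interaction p-value is itself convincing, and then against the adjusted threshold rather than the nominal 5%.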
Finally, as I’ve argued before in the case of Provenge, when the evidence of efficacy is marginal, regulators have a duty to the public they serve to weigh with utmost care and without bias the risk of introducing an ineffective medicine versus the risk of withholding ready availability of an effective medicine from a gravely ill population without other treatment options.

May 27th, 2008 at 3:01 pm
An interesting discussion regarding the statistical side of the issue; however, sound scientific and regulatory decisions cannot and should never be made based solely on the outcome of a statistical analysis. Other crucial information that must be considered includes consistency of outcome data across clinical trials (considered
May 27th, 2008 at 4:05 pm
While the above is a good discussion of the statistical aspects of an approval decision, it does not touch on any of the other information that should be considered in making drug approval decisions, especially when the statistics are less than classically perfect. Dr. Pazdur is notorious for his laser-like focus on pre-specified statistical analyses, prospectively-defined approval criteria and rigid application of approval endpoints that he deems appropriate to support approval. However, his approach tends to be uninformed by rational professional judgment, which must include consideration of (1) how well the mechanism of action is understood and supported by direct scientific and clinical observation/measurement, (2) whether the effect seen in a subset of patients can be explained by anything other than treatment with the drug based on each case history (despite claims this is anecdotal, understanding the data collectively requires an in-depth understanding of each individual data point), (3) knowledge of the natural course of the disease, and (4) other factors which vary from drug to drug and disease to disease. In other words, the p-value alone cannot be allowed to make the decision if we want our drug approvals, and those who make the decisions regarding drug approvals, to be correct, timely and effective.
There is no other field of applied science where the development of scientific progress, and the decisions regarding whether progress has been made, has been so thoroughly co-opted by a single, narrow, profoundly-limited subspecialty (biostatistics) as has happened in clinical research, drug development and drug approvals.
In the end, data from clinical trials and statistical analysis (or any analysis of a data set) can only inform decisions, not make them. The decisions must be made by knowledgeable, experienced, open-minded people who consider all of the information available and make the best decision they can make based on the entirety of the information they possess.
In direct contravention of this well-tested rule of applied science, Dr. Pazdur and his Office of Oncology Drug Products insist that the p-value alone makes the decision, and until they get a sufficiently perfect p-value, the decision will be delay and denial, no matter how compelling the overall weight of evidence, including the medical and first-principles scientific information, may be that a drug is safe enough and effective enough to represent progress that should reach patients.
Isn’t it time we asked ourselves how a crucially important field of scientific and medical inquiry, and an entire regulatory process, somehow came to be dominated by a single, extremely limited method of data analysis? Where is it written that all clinical trials must be designed to accommodate the arcane requirements of relative frequentist statistical analysis, and the only way to decide whether a drug should be approved is to calculate a hypertechnical and near-meaningless (scientifically and medically) statistical metric called a p-value?
A number of drugs that obviously represented progress have been delayed for years by Dr. Pazdur and his office in the last 7 years, with lethal effect on cancer patients, as he stubbornly insisted on clinical trials designed to produce a perfect p-value. It is time for open minds, a much broader perspective on how we test and evaluate new cancer drugs, and an acknowledgment that statistics is not an adequate replacement for the immensely more powerful and virtually unlimited toolbox of scientific inquiry.
Steve Walker
Abigail Alliance
May 27th, 2008 at 7:42 pm
Steve Walker and Frank Burroughs of the Abigail Alliance have advocated for nearly 6 years for particular drugs/treatments in the FDA approval process.
Each drug/treatment they have backed in trying to achieve earlier access for terminal cancer patients has eventually been FDA approved, but with the loss of thousands upon thousands of lives because of the delayed approval.
EVERY treatment/drug they advocated for has been approved giving them a perfect batting average when advocating to the FDA and what they testify and tell Congress.
As I understand, they are also advocating for Congress to approve the ACCESS ACT to help terminal patients…
What I don’t understand is how Pazdur is allowed to circumvent Congress and public laws in slowing the accelerated approval and in other ways delaying access to potentially life-saving drugs/treatments.
One such glaring example is Pazdur having Howard Isadore Scher of Sloan Kettering appointed to the FDA AC panel in March 07 to sit in judgment of Provenge; the panel voted 17-0 that Provenge was safe and 13-4 that it showed substantial efficacy.
In order to sit in judgment on the Provenge FDA AC panel, Scher certified to the FDA he had only 3 conflicts of interest; internet research suggests he has 17… 14 more than he listed, including 2 egregious ones:
1. Lead investigator of a competing prostate cancer trial which subsequently failed its clinical trial
2. Financial interest in Proquest Investments with ties to other, competing prostate cancer treatments.
Scher, Pazdur’s puppet on that AC panel, also wrote a letter against Provenge approval AFTER the FDA panel meeting where Provenge earned an enthusiastic panel recommendation to the FDA.
What’s Pazdur to gain from all his actions?
Does he bear a grudge and resentment for Dr. von Eschenbach being picked over Pazdur at M.D. Anderson in Texas and as the FDA Commissioner? If so, doesn’t Pazdur see his personal vendetta is deadly to cancer patients?
Why now is he giving such interviews?
Pazdur, how about putting patients first for a change?