Examining alternative core symptom inclusion criteria in randomized placebo-controlled trials for acute major depression
Dr. Evyn Peters, MD, FRCPC, MSc
Clinical trials testing the effectiveness of antidepressant medication usually compare the active medication to an inactive placebo pill. Despite many successful trials demonstrating the effectiveness of antidepressant medications, a given drug may still fail to separate from placebo. After several decades of clinical trials, placebo response rates have not decreased. This suggests a need to better understand factors that contribute to high placebo response rates.
Clinical trials typically have a long list of inclusion and exclusion criteria used to select patients. Virtually all antidepressant trials specify a minimum depression symptom score required to participate (e.g., a score of 18 or more on the 17-item Hamilton Depression Rating Scale; HAMD-17). This practice is problematic for two reasons: (1) it does not appear to have decreased placebo response rates over time, and (2) scales such as the HAMD-17 include items that assess symptoms which are rare or infrequent in outpatient samples. The latter point is particularly problematic because an antidepressant cannot outperform placebo on symptoms/items which were absent or minimal to begin with.
According to the diagnostic criteria for Major Depressive Disorder (MDD), only two symptoms are core symptoms, at least one of which must be present to make a diagnosis: depressed mood and lack of interest/pleasure. Not surprisingly, in MDD trials, scores on the HAMD-17 items that measure these symptoms are higher than those for most other symptoms. As such, when it comes to designing a clinical trial, these two items may be a better index of depression severity than total HAMD-17 scale scores.
The purpose of this study is to re-analyse data from 10 randomized placebo-controlled antidepressant trials to examine whether the use of stricter inclusion criteria derived from the HAMD-17 scale items that assess core depressive symptoms (depressed mood, item 1, and lack of interest, item 7) would have resulted in larger drug-placebo differences.
The main outcome of interest is whether depression scores decreased by 50% or more by the end of the trial (i.e., a treatment response).
We expect to find that as scores on these items increase, patients will have lower placebo response rates compared to the sample overall. As a result, drug-placebo differences will be larger.
If this is confirmed here and with additional research, the results would suggest that drug developers could design trials with lower placebo response rates simply by altering the inclusion criteria slightly. This would potentially help increase the chances that new medication trials are successful against placebo, thus helping make new treatments available to patients suffering from MDD.
[
{ "PostingID": 1623, "Title": "GSK-MY-1043/BRL-029060/115", "Description": "A multicenter, randomized, double-blind, placebo-controlled comparison of paroxetine and fluoxetine in the treatment of major depressive disorder." },
{ "PostingID": 1633, "Title": "GSK-29060/448", "Description": "A Double-Blind, Placebo Controlled Trial to Evaluate the Clinical Effects of Immediate Release Paroxetine and Modified Release Paroxetine in the Treatment of Major Depression" },
{ "PostingID": 1634, "Title": "GSK-29060/449", "Description": "A Double-Blind, Placebo Controlled Trial to Evaluate the Clinical Effects of Immediate Release Paroxetine and Modified Release Paroxetine in the Treatment of Major Depression" },
{ "PostingID": 1638, "Title": "GSK-29060/810", "Description": "A double-blind, placebo-controlled, 3-arm, fixed-dose study of 12.5 mg/day and 25 mg/day Paroxetine CR in the treatment of Major Depression." },
{ "PostingID": 2129, "Title": "GSK-29060/128", "Description": "A Multicenter, Randomized, Double-Blind, Placebo-Controlled Comparison of Paroxetine and Fluoxetine in the Treatment of Major Depressive Disorder" },
{ "PostingID": 2130, "Title": "GSK-29060/251", "Description": "A Double-Blind, Randomized Trial of Paroxetine Versus Placebo In Patients With Depression Accompanied by Anxiety" },
{ "PostingID": 20091, "Title": "GSK-WELL AK1A4002", "Description": "A Multicenter, Double-Blind, Placebo-Controlled Comparison of the Effects on Sexual Functioning of Wellbutrin (Bupropion HCl) Sustained Release and Sertraline in Outpatients with Moderate to Severe Recurrent Major Depression" },
{ "PostingID": 20092, "Title": "GSK-WELL AK1A4001", "Description": "A Multicenter, Double-Blind, Placebo-Controlled Comparison of the Effects on Sexual Functioning of Wellbutrin (Bupropion HCl) Sustained Release and Sertraline in Outpatients with Moderate to Severe Recurrent Major Depression" },
{ "PostingID": 20093, "Title": "GSK-WELL AK1A4007", "Description": "A Multicenter, Double-Blind, Placebo-Controlled Comparison of the Safety and Efficacy and Effects on Sexual Functioning of Wellbutrin (Bupropion HCl) Sustained Release (SR) and Fluoxetine in Outpatients with Moderate to Severe Recurrent Major Depression" },
{ "PostingID": 20095, "Title": "GSK-WELL AK1A4006", "Description": "A Multicenter, Double-Blind, Placebo-Controlled Comparison of the Safety and Efficacy and Effects on Sexual Functioning of Wellbutrin (Bupropion HCl) Sustained Release (SR) and Fluoxetine in Outpatients with Moderate to Severe Recurrent Major Depression" }
]
Analysis Plan Overview
The analysis will be conducted using Stata and/or R. We are requesting .csv data files which can be easily imported into Stata or R.
The analysis is estimated to require no more than 12 months to complete, during which time the data will be stored on the University of Saskatchewan encrypted network. After the analysis is complete, or at the end of the data access period (whichever is sooner), the data will be permanently deleted from this network. Only the statistical output (i.e., the analytic results) and the statistical syntax will be retained for a minimum of five years. We will not be using any kind of machine learning or artificial intelligence.
The analysis will proceed in several steps:
Step 1. Data Preparation
Data from the 10 trials (ITT populations only) will be pooled together, resulting in one large dataset with two treatment groups: SSRI (n = 2653) and placebo (n = 1270).
Missing HAMD-17 scores will be filled with LOCF (last observation carried forward) imputation, consistent with the original trials.
A binary outcome variable reflecting response will be created. Response is defined as a decrease in HAMD-17 score of 50% or more from baseline to the end of the trial (8 or 12 weeks, depending on the trial).
Step 2. Treatment effects in the overall sample
Drug and placebo response rates with 95% confidence intervals and number-needed-to-treat statistics will be estimated to establish the pooled response rates in the entire study sample.
Then, this will be repeated in only those patients with screening/baseline HAMD Item 1 (depressed mood) scores >=2. Since this was not a requirement for most of these studies, we can gauge the utility of this now commonly used inclusion criterion.
Step 3. Examination of alternative inclusion criteria
We will then calculate the drug and placebo response rates with 95% confidence intervals and number-needed-to-treat statistics, stratified by the following criteria:
a) Screening/Baseline HAMD Item 1 >=2 AND Item 7 >=2
b) Screening/Baseline HAMD Item 1 >=3 OR Item 7 >=3
c) Screening/Baseline HAMD Item 1 >=3
d) Screening/Baseline HAMD Item 1 + 7 >=6
e) Screening/Baseline HAMD Item 1 >=3 AND Item 7 >=3
f) Screening/Baseline HAMD Item 1 ==4 OR Item 7 ==4
These criteria allow us to examine the utility of requiring higher core depressive symptoms at study entry. We have made the assumption that the mean baseline score for item 1 (depressed mood) and item 7 (work/interests) in the pooled sample will likely be around 2.7-2.8 [10, 11, 19]. Thus, we expect the % of patients meeting each criterion to be around 50% (criteria a, b, and c) or less (criteria d, e, f). Based on a previous analysis [11], we anticipated that very few patients would score 4 on both symptoms (<10%), so we did not consider this to be a feasible criterion to examine.
In particular, these criteria were chosen to examine the utility of requiring higher scores on one particular core symptom (e.g., c), either symptom (e.g., b and f), or both (a, d, e).
Our expectation is that, moving from criterion a through f, the % of patients meeting each criterion will be lower. This is expected to result in larger drug-placebo differences and smaller number-needed-to-treat statistics, driven primarily by a reduction in placebo response rates.
Non-overlapping confidence intervals for the drug-placebo response-rate difference between subjects meeting and not meeting a criterion will be considered evidence of treatment-effect heterogeneity. Only criteria with non-overlapping confidence intervals will be considered further in the analysis.
We have not specified a method to select one “best” criterion because different trials may benefit from different criteria.
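The Step 3 comparison can be sketched in Python as follows. This is a minimal illustration, not the planned Stata/R syntax: the `Patient` record and its field names are hypothetical, the interval is a simple Wald interval (the plan does not specify an interval method), and each arm of each subgroup is assumed to be non-empty.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record; field names are illustrative, not from the trial data."""
    arm: str         # "drug" or "placebo"
    item1: int       # baseline HAMD item 1 (depressed mood), 0-4
    item7: int       # baseline HAMD item 7 (work/interests), 0-4
    responder: bool  # >=50% HAMD-17 decrease from baseline to endpoint


# Criteria a-f from Step 3, written as predicates on baseline item scores.
CRITERIA = {
    "a": lambda p: p.item1 >= 2 and p.item7 >= 2,
    "b": lambda p: p.item1 >= 3 or p.item7 >= 3,
    "c": lambda p: p.item1 >= 3,
    "d": lambda p: p.item1 + p.item7 >= 6,
    "e": lambda p: p.item1 >= 3 and p.item7 >= 3,
    "f": lambda p: p.item1 == 4 or p.item7 == 4,
}


def response_difference(patients, z=1.96):
    """Drug-minus-placebo response-rate difference, Wald 95% CI, and NNT."""
    def rate(arm):
        group = [p for p in patients if p.arm == arm]
        return sum(p.responder for p in group) / len(group), len(group)

    p_drug, n_drug = rate("drug")
    p_plac, n_plac = rate("placebo")
    diff = p_drug - p_plac
    se = math.sqrt(p_drug * (1 - p_drug) / n_drug + p_plac * (1 - p_plac) / n_plac)
    # NNT is the reciprocal of the absolute risk difference; undefined if diff <= 0.
    nnt = 1 / diff if diff > 0 else math.inf
    return {"diff": diff, "ci": (diff - z * se, diff + z * se), "nnt": nnt}


def stratified(patients, key):
    """Results for patients meeting vs. not meeting the criterion named by key."""
    meets = [p for p in patients if CRITERIA[key](p)]
    fails = [p for p in patients if not CRITERIA[key](p)]
    return response_difference(meets), response_difference(fails)


def intervals_overlap(ci_1, ci_2):
    """True if two (lower, upper) intervals share at least one point."""
    return ci_1[0] <= ci_2[1] and ci_2[0] <= ci_1[1]
```

A criterion would be carried forward under the rule above when `intervals_overlap` returns `False` for the two intervals returned by `stratified`.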
A trial that wants to emphasize signal detection may, for example, opt for a criterion that is more restrictive (e.g., d, e, or f) but results in a larger drug-placebo difference compared to a less restrictive criterion. Conversely, a trial that wants to emphasize external validity may choose a criterion that results in a smaller treatment effect but excludes fewer patients.
Instead, we are only looking to establish a list of possible alternative inclusion criteria by virtue of them selecting patient subgroups with significantly larger drug-placebo differences, as evidenced by non-overlapping confidence intervals. This can be confirmed using a logistic regression model with a treatment-by-subgroup interaction; a significant interaction would confirm an interpretation of treatment-effect heterogeneity. If there are no non-overlapping confidence intervals, we will not proceed with further analysis in Step 4.
Step 4. Heterogeneity and sensitivity analysis
In order to test for inter-study variability in the pooled estimate of treatment-effect heterogeneity, we will use the individual patient data meta-analysis command developed by Fisher [20] for Stata. This allows us to estimate the effect size for each individual study (prior to pooling of data) and then pool the study effects using inverse-variance weights, so that the pooled estimate takes inter-study differences into account. The command can then calculate the Q statistic (an index of heterogeneity). Although we do not anticipate significant between-study heterogeneity given the similar trial methods and patient selection/sampling procedures, this will allow us to observe and test for between-study heterogeneity if it is present. The command allows treatment-by-factor interactions to be specified in the model (in this case, a treatment-by-subgroup interaction). Thus, we can test for inter-study heterogeneity in the measure of treatment-effect heterogeneity.
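The second stage of such a two-stage analysis can be illustrated with a short Python sketch. It assumes each study has already produced an estimate and standard error (for instance, a treatment-by-subgroup interaction coefficient on the log-odds scale) and applies standard fixed-effect inverse-variance pooling with Cochran's Q. It does not reproduce the Stata command cited in [20]; the function name and inputs are hypothetical.

```python
import math


def pool_inverse_variance(estimates, std_errors):
    """Fixed-effect inverse-variance pooling of per-study estimates.

    Returns (pooled estimate, pooled standard error, Cochran's Q, degrees of freedom).
    """
    # Each study is weighted by the reciprocal of its sampling variance.
    weights = [1 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    # Q sums the weighted squared deviations of study estimates from the pooled one.
    q = sum(w * (est - pooled) ** 2 for w, est in zip(weights, estimates))
    return pooled, pooled_se, q, len(estimates) - 1
```

Under the null hypothesis of no between-study heterogeneity, Q is conventionally referred to a chi-square distribution with the returned degrees of freedom.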
The model to be used here is logistic regression, given the binary outcome. A significant interaction is interpreted as evidence of treatment-effect heterogeneity, similar to non-overlapping confidence intervals in Step 3.
In addition to the above, we will re-estimate the drug and placebo response rates, stratified by each criterion with non-overlapping confidence intervals, using a leave-one-out-style sensitivity analysis. This allows us to observe whether significant interactions remain after sequentially excluding each of the 10 studies, in essence ruling out the possibility that a single outlier study drove the overall pooled estimate.
In addition, we will calculate the drug and placebo response rates in the subgroup meeting a given inclusion criterion, stratified by trial series (GSK-29060 studies vs. AK- studies) and study duration (8 vs. 12 weeks), to see if the drug-placebo differences remain larger compared to the overall sample.
In final logistic models, age and sex, then baseline HAMD-17 scores, can be included as covariates to see if treatment-by-subgroup interactions remain significant.
Step 5. Comparison to HAMD-17 sum scores
We will then use screening/baseline HAMD-17 sum scores to establish severity cut-offs that identify similarly sized subgroups compared to the criteria that appeared to result in larger treatment effects (e.g., if criteria c, d, and e resulted in significant treatment-effect heterogeneity and isolated 50%, 40%, and 35% of the sample, we would calculate whatever HAMD-17 score cut-offs were needed to isolate 50%, 40%, and 35% of the sample).
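One way to find such a size-matched cut-off is sketched below in Python. The function name and the "score >= cut-off" convention are assumptions made for illustration; because HAMD-17 totals are integers, the matched subgroup will usually only approximate the target proportion.

```python
def matching_cutoff(total_scores, target_fraction):
    """Find the HAMD-17 total-score cut-off whose ">= cut-off" subgroup is
    closest in size to target_fraction of the sample.

    Returns (cutoff, achieved_fraction). On exact ties the lower cut-off is kept.
    """
    n = len(total_scores)
    best_cutoff, best_fraction, best_gap = None, None, None
    for cutoff in sorted(set(total_scores)):
        fraction = sum(score >= cutoff for score in total_scores) / n
        gap = abs(fraction - target_fraction)
        if best_gap is None or gap < best_gap:
            best_cutoff, best_fraction, best_gap = cutoff, fraction, gap
    return best_cutoff, best_fraction
```

For instance, calling `matching_cutoff(baseline_totals, 0.40)` would return the cut-off that isolates roughly the most severe 40% of patients by total score, which could then be compared with an item-based criterion that also isolated about 40% of the sample.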
We hypothesize that the drug-placebo response-rate differences will be smaller, and the number-needed-to-treat statistics larger, in the HAMD-17-based severity subgroups than in their analogous Item 1/7-based subgroups.
This is an indirect way of testing whether simply requiring higher HAMD-17 total scores for inclusion would improve signal detection as efficiently as the criteria derived from the core symptom items. By efficiently, we mean excluding fewer patients from the sample overall to achieve the same drug-placebo difference. While it is possible that requiring a certain level of higher HAMD-17 sum scores would increase drug-placebo differences to a similar degree, we suspect that this would require excluding a larger % of patients from the sample. Thus, with this analysis, we can demonstrate the corollary: that comparably stringent criteria (i.e., criteria that exclude a similar number of patients) derived from HAMD-17 sum scores do not improve signal detection as much as the core-symptom-derived criteria. Another way to think of this analysis is that it examines which index of severity is more proximally related to treatment outcomes.
References
[1] Furukawa, T. A., Cipriani, A., Leucht, S., Atkinson, L. Z., Ogawa, Y., Takeshima, N., ... & Salanti, G. (2018). Is placebo response in antidepressant trials rising or not? A reanalysis of datasets to conclude this long-lasting controversy. Evidence-Based Mental Health, 21(1), 1-3.
[2] Furukawa, T. A., Maruo, K., Noma, H., Tanaka, S., Imai, H., Shinohara, K., ... & Cipriani, A. (2018). Initial severity of major depression and efficacy of new generation antidepressants: individual participant data meta‐analysis. Acta Psychiatrica Scandinavica, 137(6), 450-458.
[3] Posternak, M. A., Zimmerman, M., Keitner, G. I., & Miller, I. W. (2002). A reevaluation of the exclusion criteria used in antidepressant efficacy trials. American Journal of Psychiatry, 159(2), 191-200.
[4] Zimmerman, M., Clark, H. L., Multach, M. D., Walsh, E., Rosenstein, L. K., & Gazarian, D. (2015). Have treatment studies of depression become even less generalizable? A review of the inclusion and exclusion criteria used in placebo-controlled antidepressant efficacy trials published during the past 20 years. Mayo Clinic Proceedings, 90(9), 1180-1186.
[5] von Glischinski, M., von Brachel, R., Thiele, C., & Hirschfeld, G. (2021). Not sad enough for a depression trial? A systematic review of depression measures and cut points in clinical trial registrations. Journal of Affective Disorders, 292, 36-44.
[6] Bandelow, B., Bauer, M., Vieta, E., El-Khalili, N., Gustafsson, U., Earley, W. R., & Eriksson, H. (2014). Extended release quetiapine fumarate as adjunct to antidepressant therapy in patients with major depressive disorder: pooled analyses of data in patients with anxious depression versus low levels of anxiety at baseline. The World Journal of Biological Psychiatry, 15(2), 155-166.
[7] Laoutidis, Z. G., & Kioulos, K. T. (2015). Desvenlafaxine for the acute treatment of depression: a systematic review and meta-analysis. Pharmacopsychiatry, 25(06), 187-199.
[8] Trivedi, M. H., Bandelow, B., Demyttenaere, K., Papakostas, G. I., Szamosi, J., Earley, W., & Eriksson, H. (2013). Evaluation of the effects of extended release quetiapine fumarate monotherapy on sleep disturbance in patients with major depressive disorder: a pooled analysis of four randomized acute studies. International Journal of Neuropsychopharmacology, 16(8), 1733-1744.
[9] Peters, E. M., Zhang, Y., Lodhi, R., Li, H., & Balbuena, L. (2021). Melancholic features in bipolar depression and response to lamotrigine: A pooled analysis of five randomized placebo-controlled trials. Journal of Clinical Psychopharmacology, 41(3), 315-319.
[10] Hieronymus, F., Lisinski, A., Nilsson, S., & Eriksson, E. (2019). Influence of baseline severity on the effects of SSRIs in depression: an item-based, patient-level post-hoc analysis. The Lancet Psychiatry, 6(9), 745-752.
[11] Peters, E. M., Lodhi, R. J., Zhang, Y., Li, H., & Balbuena, L. (2021). Lamotrigine for acute bipolar depression: An exploratory item‐level analysis. Brain and Behavior, 11(8), e2222.
[12] Timmerby, N., Andersen, J. H., Søndergaard, S., Østergaard, S. D., & Bech, P. (2017). A systematic review of the clinimetric properties of the 6-item version of the Hamilton Depression Rating Scale (HAM-D6). Psychotherapy and Psychosomatics, 86(3), 141-149.
[13] Martino, D. J., Szmulewicz, A. G., Valerio, M. P., & Parker, G. (2019). Melancholia: an attempt at definition based on a review of empirical data. The Journal of Nervous and Mental Disease, 207(9), 792-798.
[14] Valerio, M. P., Szmulewicz, A. G., & Martino, D. J. (2018). A quantitative review on outcome-to-antidepressants in melancholic unipolar depression. Psychiatry Research, 265, 100-110.
[15] Peters, E. M., Bowen, R., & Balbuena, L. (2020). Melancholic depression and response to quetiapine: A pooled analysis of four randomized placebo-controlled trials. Journal of Affective Disorders, 276, 696-698.
[16] Papakostas, G. I., Østergaard, S. D., & Iovieno, N. (2015). The nature of placebo response in clinical studies of major depressive disorder. The Journal of Clinical Psychiatry, 76(4), 12518.
[17] Melander, H., Salmonson, T., Abadie, E., & van Zwieten-Boot, B. (2008). A regulatory Apologia—a review of placebo-controlled studies in regulatory submissions of new-generation antidepressants. European Neuropsychopharmacology, 18(9), 623-627.
[18] Vöhringer, P. A., & Ghaemi, S. N. (2011). Solving the antidepressant efficacy question: effect sizes in major depressive disorder. Clinical Therapeutics, 33, B49-B61.
[19] Hieronymus, F., Emilsson, J. F., Nilsson, S., & Eriksson, E. (2016). Consistent superiority of selective serotonin reuptake inhibitors over placebo in reducing depressed mood in patients with major depression. Molecular Psychiatry, 21(4), 523-530.
[20] Fisher, D. J. (2015). Two-stage individual participant data meta-analysis and generalized forest plots. The Stata Journal, 15(2), 369-396.
Peters, Evyn M. MD; Aziz, Saba MPhil; Balbuena, Lloyd PhD. Examining Alternative Inclusion Criteria Based on Core Symptoms of Depression in Antidepressant Clinical Trials. Journal of Clinical Psychopharmacology ():10.1097/JCP.0000000000001926, November 18, 2024
DOI: 10.1097/JCP.0000000000001926