Lupus is a chronic autoimmune disease that is difficult to treat because the type and severity of symptoms that occur may vary considerably between patients and also within patients over time. Safer and more effective medications for lupus are urgently needed, but the fluctuating nature of the symptoms makes it very challenging for clinical trials to prove that experimental treatments are superior to standard therapies. As a result, many lupus trials in the past have failed, and it is often unclear whether this failure is due to the drug not working or because the study designs, outcome measures, and methods for analyzing the data were not able to detect the treatment signals in the midst of considerable disease heterogeneity and other sources of background “noise”. The primary objective of this project is to use existing data from completed randomized SLE trials to obtain a greater understanding of the impact of disease heterogeneity on trial results, and to devise better methodological strategies for addressing this issue in future studies. We will apply various statistical methods to investigate the magnitude of between- and within-patient variability in treatment response patterns, and evaluate if clinical and laboratory patient characteristics, such as exposure to specific standard of care therapies, predict the likelihood and duration of response. We will also explore novel methods that make more efficient use of clinical trial data in evaluating treatment effects. Our expectation is that more stringent endpoints and analytic approaches that maximize information about a patient's disease activity during follow up will better discriminate experimental treatments from control treatments. The long-term goal in accomplishing our study aims is to substantially improve the methodological aspects of lupus trials and ultimately accelerate the discovery and approval of effective new therapies.
[{ "PostingID": 1416, "Title": "GSK-HGS1006-C1056", "Description": "A Phase 3, Multi-Center, Randomized, Double-Blind, Placebo-Controlled, 76-Week Study to Evaluate the Efficacy and Safety of Belimumab (HGS1006, LymphoStat-B™), a Fully Human Monoclonal Anti-BLyS Antibody, in Subjects with Systemic Lupus Erythematosus (SLE)" },{ "PostingID": 1417, "Title": "GSK-HGS1006-C1057", "Description": "A Phase 3, Multi-Center, Randomized, Double-Blind, Placebo-Controlled, 52-Wk Study to Evaluate the Efficacy and Safety of Belimumab (HGS1006, LymphoStat-B™), a Fully Human Monoclonal Anti-BLyS Antibody, in Subjects With Systemic Lupus Erythematosus (SLE)" }]
Aim 1. Characterize the magnitude and influence of disease heterogeneity in SLE clinical trials. 1.1 Between- and within-patient variability in response patterns during 52 weeks of follow up on standard of care and experimental therapies will be quantified, along with the intraclass correlation coefficient (ICC), for the following composite responder indices: SLE Responder Index (SRI-4, SRI-5, SRI-6), the British Isles Lupus Assessment Group (BILAG)-based Composite Lupus Assessment (BICLA) and Lupus Low Disease Activity State (LLDAS). Confidence intervals for the ICC will be obtained with the method of Zou and Donner.19 Since the ICC is an average measure of the degree of stability in response across all subjects, pairwise correlations between visits will also be estimated to assess the temporal pattern in degree of agreement. In addition, heterogeneity in the correlation among repeated outcomes from the same subject will be assessed. The correlation is likely to be influenced by patient characteristics and treatment exposure; response may be more stable in patients on the experimental treatment arm, and our preliminary studies suggest that a high protein-creatinine ratio is associated with greater within-subject variability in response. We will identify factors contributing to the heterogeneity in correlations using the second-order generalized estimating equations (GEE) approach of Yan and Fine.20 If response Y_ij is measured at each of k_i visits for the ith patient, the upper diagonal elements of R_i, the k_i x k_i correlation matrix for the outcomes, can be arranged into a vector ρ_i of pairwise correlations that is modeled as h(ρ_i) = W_i α, where h() is a known link function (e.g., Fisher transformation), W_i is a matrix of covariates (e.g., protein/creatinine), and α is a vector of parameters. The marginal mean of the response vector will be modeled using the logit link, both with and without W_i and other covariates as predictors in sensitivity analysis.
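The variability summaries described in 1.1 can be illustrated with a minimal sketch. It assumes a balanced patients x visits array of binary responder status and uses the one-way ANOVA estimator of the ICC, which is only one of several possible estimators and is not necessarily the one the analysis will use. The data are synthetic, generated from a hypothetical random-intercept model; the function names, the intercept mean and the patient-level standard deviation are illustrative assumptions, and the Zou and Donner confidence interval is not implemented here.

```python
import numpy as np

def anova_icc(y):
    """One-way ANOVA ICC for a balanced (patients x visits) array."""
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    patient_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Between-patient and within-patient mean squares.
    msb = k * np.sum((patient_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((y - patient_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def simulate_responses(n_patients, n_visits, patient_sd, rng):
    """Synthetic binary responder data with a patient-level random intercept."""
    # Hypothetical logit-scale intercepts; values are not taken from any trial.
    intercepts = rng.normal(-1.0, patient_sd, size=(n_patients, 1))
    prob = 1.0 / (1.0 + np.exp(-intercepts))
    return (rng.random((n_patients, n_visits)) < prob).astype(int)

rng = np.random.default_rng(0)
y = simulate_responses(560, 11, patient_sd=1.5, rng=rng)

icc = anova_icc(y)                          # average stability across visits
visit_corr = np.corrcoef(y, rowvar=False)   # visits x visits pairwise correlations
print(round(icc, 3))
print(visit_corr.shape)
```

The pairwise matrix `visit_corr` is what would be inspected for a temporal pattern (for example, correlations decaying as visits get further apart), which the single ICC value averages over.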
Model parameters will be estimated using the geepack package in R.21 The stability of individual components of SLEDAI will be investigated using similar approaches to assess which are the biggest contributors to the variability in SRI and whether modifications at the item level are needed to develop more stable disease activity measures. Results regarding the sources and degree of variability in longitudinal response measures will be critical for determining the appropriate length of follow-up, visit schedules, and timing of interim analyses in future trials. 1.2. The influence of background medications and steroid regimens on clinical outcomes will be assessed. Rates of different composite response measures will be compared between background treatment groups (MMF, AZA, MTX, other) at specific visits, and logistic regression models will be fit to adjust for baseline steroid dose (as main effect and effect modifier) and other key baseline characteristics. Associations between background medications and time to flare will be evaluated using Cox proportional hazards models. Repeated measures of response and recurrent flares will also be analyzed using GEE methods and multivariate survival models (see 5.4.2), respectively, and the effect of cumulative levels of steroid exposure will be modeled with time-dependent covariates. Given the potential for confounding by indication from a large number of variables, propensity score methods for multiple treatments22 will be used to balance patient characteristics across SOC medication subgroups. Initially, these analyses will be performed separately in the SOC and experimental arms; the two arms will also be combined to evaluate the heterogeneity in experimental treatment effects across SOC subgroups. Inflation in the type 1 error rate due to testing of multiple response measures and SOC subgroups will be addressed by controlling the false discovery rate.23
1.3. Prognostic and predictive factors for clinical outcomes will be identified to improve trial design for a heterogeneous disease. Knowing the clinical and molecular characteristics that can influence a patient's outcome in a clinical trial helps in targeting patients for specific treatments and designing more efficient studies. To utilize these variables appropriately, however, one must have a clear understanding of whether they are prognostic factors, predictive factors, or both, and these distinctions have not been fully appreciated in the SLE literature. Prognostic factors are variables that influence disease outcomes in patients untreated or treated with standard therapy, and are important for identifying those with a good versus poor prognosis on SOC, and devising strategies for patient selection and risk stratification in clinical trials. Predictive factors, in contrast, indicate which characteristics influence the magnitude of the effect of a specific new treatment and the patient subgroups within a heterogeneous population that would benefit most from that treatment. Prognostic factors that lower response rates on SOC will not necessarily magnify treatment differences unless the factors also modify the experimental treatment effect. We will systematically evaluate baseline clinical and laboratory features for their predictive and prognostic effects to maximize their utility in designing future studies. The SOC arms of past trials provide the ideal data for evaluating prognostic factors.24 Earlier we identified several patient characteristics prognostic of persistent BILAG response on SOC such as higher C3 levels,10 but results must be confirmed with more data and with other response measures. Logistic regression models will be fit to different response outcomes with clinical and laboratory characteristics as the predictor variables. Model calibration and discrimination will be assessed with the Hosmer-Lemeshow test and area under the ROC curve.
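As a small illustration of the discrimination measure mentioned above, the area under the ROC curve can be computed directly as the proportion of responder/non-responder pairs in which the responder has the higher predicted probability, with ties counted as one half. The sketch below is a generic calculation under that definition, not the analysis code; the function name and the four example predictions are hypothetical.

```python
import numpy as np

def auc_from_pairs(y_true, y_score):
    """AUC as the fraction of (responder, non-responder) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every responder score with every non-responder score.
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Hypothetical predicted probabilities for two non-responders and two responders.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(auc_from_pairs(labels, scores))  # 3 of 4 pairs are ordered correctly: 0.75
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of responders from non-responders.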
To address the potential for over-fitting, leave-one-out cross-validation will be performed. To investigate a variable's predictive effects, data from both the SOC and experimental treatment arms will be used to fit interaction terms between the predictive factor of interest and the treatment variable in the logistic regression models. Qualitative interactions (treatment works only in specific subgroups) as well as quantitative interactions (treatment works in all subgroups but magnitude differs) will be explored. Power calculations: To maximize power for Aim 1, the BLISS trials will be combined, resulting in approximately N=560 each in the placebo/SOC and belimumab arms. With this sample size and assuming an average of 11 measurements per patient, the maximum width of the 95% CI for the ICC for response will be ± 3.4%, indicating excellent precision. Assuming 10 - 20% of patients were taking AZA, MMF, or MTX, the minimum detectable absolute difference in response rates between background medication groups in each arm is < 21% with 80% power. The minimum detectable odds ratio (OR) for response for a continuous prognostic factor is 1.3 - 1.5 for a 1 SD increase above the mean, assuming the R^2 between the prognostic factor and other covariates in the logistic model is 0 - 0.4 and a 20% response rate in the reference group.
Aim 2. Improve outcome measures and analytic approaches for detecting treatment differences in trial populations with substantial between- and within-patient heterogeneity in symptoms. 2.1 Sustained response and response duration. We will evaluate whether measures of response that incorporate response duration magnify treatment differences in the BLISS trials.
Rates of sustained response using various degrees of stringency, e.g., response at 100%, 80%, 60% of visits, or landmark visit only, will be estimated in the SOC and experimental arms over different time intervals during follow-up to assess when and at which degree of outcome stringency treatment differences are maximized. Response duration (sojourn and total time) on SOC and experimental therapy will also be estimated using a two-state reversible multi-state Markov model (MSM), which can handle complex incomplete longitudinal data and provides a more comprehensive picture of the response profile and the role of patient characteristics in different aspects of response.25 The average sojourn time in response is clinically important because it indicates how long response is expected to last if the patient does respond to therapy. The effects of treatment and other predictors on the likelihood of attaining and sustaining response will be modeled using the following proportional hazards regression approach: λ_ij(t | x) = λ_ij0(t) exp(x^T β_ij), where λ_ij(t | x) is the transition intensity at time t between states i and j (i, j = 1 or 2 for non-response and response states, respectively) given x = (x_1, …, x_p)^T, a (p x 1) vector of covariates which is assumed to be constant over time and across states; λ_ij0(t) is the baseline transition intensity; and β_ij = (β_ij1, …, β_ijp)^T is the corresponding vector of regression coefficients, which quantifies the effect of x on transitions between states i and j. Baseline transition intensities will be modeled as both constant and piecewise constant over time.
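To make the quantities in this model concrete, the following sketch works through the constant-intensity case only, where a two-state model has a closed-form transition probability matrix and the mean sojourn time in response is the reciprocal of the response to non-response intensity. All numerical values (weekly intensities, the treatment hazard ratio of 1.5) and the function names are hypothetical and chosen purely for illustration; they are not estimates from the BLISS trials, and the sketch does not perform any model fitting.

```python
import numpy as np

def intensity(baseline, beta, x):
    """Proportional hazards form: baseline intensity times exp(x^T beta)."""
    return baseline * np.exp(np.dot(x, beta))

def transition_matrix(q12, q21, t):
    """P(t) for a two-state time-homogeneous Markov model.

    State 1 = non-response, state 2 = response;
    q12 and q21 are the constant transition intensities between them.
    """
    s = q12 + q21
    decay = np.exp(-s * t)
    return np.array([
        [(q21 + q12 * decay) / s, q12 * (1 - decay) / s],
        [q21 * (1 - decay) / s, (q12 + q21 * decay) / s],
    ])

# Hypothetical weekly intensities (illustrative only).
q12_control = 0.05            # non-response -> response on SOC
q21 = 0.10                    # response -> non-response, same in both arms here
beta_treat = [np.log(1.5)]    # assumed treatment effect on attaining response
q12_active = intensity(q12_control, beta_treat, [1.0])

mean_sojourn_response = 1.0 / q21          # expected weeks spent in response per episode
p52_control = transition_matrix(q12_control, q21, 52.0)
p52_active = transition_matrix(q12_active, q21, 52.0)
print(mean_sojourn_response)
print(p52_control[0, 1], p52_active[0, 1])  # P(responding at week 52 | non-responder at baseline)
```

Allowing β to differ by transition, as in the model above, is what lets a covariate affect attaining response (state 1 to 2) differently from sustaining it (state 2 to 1); the sketch applies the treatment effect to the first transition only as a simplifying assumption.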
Parameters will be estimated using the maximum likelihood approach and analyses will be performed using the msm package in R.25 2.2 Multiple SLE flares: Patients can experience multiple flares over 52 weeks of follow up, yet SLE trials typically consider only time until the first flare.3-5,9 Analyzing only the first severe flare could exclude a substantial proportion of all the severe flares, reducing study power. Standard Poisson regression can be easily fit to analyze flare counts, but ignores the timing of and correlations between events. Flares (BILAG and SLE Flare Index) in the BLISS trials will be analyzed using recurrent event methods, which have been applied in other disease areas but not in SLE. Specifically, we will fit and compare the results of the Andersen-Gill (AG) and the Prentice, Williams and Peterson (PWP-total time and gap time) models.26 Frailty models will also be used to account for unmeasured heterogeneity that cannot be explained by covariates, and to obtain a more complete understanding of treatment effects on flares.27 Since the appropriate method will depend on the number of flares, relationship between flares, and other factors, guidelines for which approach to implement in SLE trials will be developed. Analyses will be performed using PROC PHREG in SAS28 and frailtypack in R.29 Power: BLISS-52 and BLISS-76 will be analyzed separately in Aim 2 to assess whether the proposed methods improve detection of treatment differences in each trial compared to conventional methods. With a sample size of approximately 280 patients per arm (placebo/SOC and belimumab arms) in each trial, 80% power, and a two-sided α=5%, the minimum detectable absolute difference in sustained response rates between groups is 9% - 12% in each trial, assuming placebo rates of 10% - 30%.
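The order of magnitude of this minimum detectable difference can be checked with a rough sketch. The version below assumes a two-sided test of two independent proportions using the pooled-variance normal approximation without continuity correction; the proposal does not state which test or software was used, so this is an assumption, and the function names are illustrative. Under this approximation the values come out close to, and slightly below, the 9% - 12% range quoted above, which would be consistent with a more conservative test having been used in the original calculation.

```python
from statistics import NormalDist

def power_two_proportions(p0, p1, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p0 + p1) / 2
    se_null = (2 * p_bar * (1 - p_bar) / n_per_arm) ** 0.5
    se_alt = ((p0 * (1 - p0) + p1 * (1 - p1)) / n_per_arm) ** 0.5
    return NormalDist().cdf((abs(p1 - p0) - z_alpha * se_null) / se_alt)

def min_detectable_difference(p0, n_per_arm, power=0.80, alpha=0.05):
    """Smallest absolute increase over p0 detectable with the requested power."""
    lo, hi = 0.0, 1.0 - p0
    for _ in range(60):  # bisection on the size of the difference
        mid = (lo + hi) / 2
        if power_two_proportions(p0, p0 + mid, n_per_arm, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi

# Placebo rates of 10% and 30% with 280 patients per arm, as stated in the text.
mdd_low = min_detectable_difference(0.10, 280)
mdd_high = min_detectable_difference(0.30, 280)
print(round(mdd_low, 3), round(mdd_high, 3))
```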
In addition, using the sample size approach for the Andersen-Gill model,30 a minimum of 200 flares of any type in the high dose and placebo arms of each trial, a baseline hazard of 0.17 in placebo corresponding to a median time to flare of 4 months, a variance inflation factor of 1.4 to account for the within-subject correlation in multiple events, 80% power and α=5%, the minimum detectable hazard ratio between groups is 0.63 in each trial.
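The arithmetic behind these figures can be reproduced approximately with a short sketch. It assumes a Schoenfeld-type events formula for a log hazard ratio, with the required number of events multiplied by the variance inflation factor, 1:1 allocation between arms, a two-sided test, and a baseline hazard expressed per month under an exponential model. The method in the cited reference may differ in detail, and the function name is illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def min_detectable_hr(n_events, vif=1.0, alloc=0.5, power=0.80, alpha=0.05):
    """Minimum detectable hazard ratio (< 1) for a given total number of events.

    Solves n_events = vif * (z_alpha + z_beta)^2 / (alloc * (1 - alloc) * log(HR)^2).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    log_hr = z * sqrt(vif / (n_events * alloc * (1 - alloc)))
    return exp(-log_hr)

hr = min_detectable_hr(200, vif=1.4)   # 200 flares, inflation factor 1.4
median_months = log(2) / 0.17          # exponential median for a hazard of 0.17 per month
print(round(hr, 2), round(median_months, 1))
```

With these inputs the sketch gives a hazard ratio of about 0.63 and a median time to flare of about 4 months, in line with the values stated above.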
Kim M, Pradhan K, Izmirly P, Kalunian K, Hanrahan L, Merrill J. Identifying Subgroups of SLE Patients with Differential Responses to a BLyS Inhibitor: Application of a Machine Learning Algorithm to Clinical Trial Data [abstract]. Arthritis Rheumatol. 2019; 71 (suppl 10).
https://acrabstracts.org/abstract/identifying-subgroups-of-sle-patients-with-differential-responses-to-a-blys-inhibitor-application-of-a-machine-learning-algorithm-to-clinical-trial-data/