1. Methodology

1.1. Criteria for study inclusion

The PICOS (population, interventions, comparator, outcomes, study design) criteria used to guide the selection of studies for this systematic literature review are listed below:

Population: Treatment-naive adults and adolescents with HIV

Interventions and Comparators: Integrase inhibitors, non-nucleoside reverse transcriptase inhibitors, and protease inhibitors within a three-drug ART regimen

Outcomes:
• Viral suppression
• Increase in CD4 cell counts
• Mortality
• AIDS defining illnesses
• Discontinuation
• Discontinuation due to adverse events
• Retention
• Severe adverse events

Treatments will be differentiated according to the specific drugs, doses and frequencies of administration. The only drugs that will be considered interchangeable are lamivudine (3TC) and emtricitabine (FTC), due to their molecular likeness; they are referred to here as XTC. Non-standard doses will not be considered a reason for exclusion during study selection; however, non-standard doses that do not serve as connectors (i.e., were not compared to two or more treatments of interest) will be excluded in the final selection stage (following full-text selection). ART regimens with a single antiviral agent and those with two agents that include one or more NRTI will not be considered eligible. Similarly, with the exception of boosted regimens, ART regimens with four or more agents will not be eligible (e.g., NNRTI+PI+2NRTI). Trials that have mixed backbones will be included if the backbones are equally distributed across arms. Trials where backbones were selected prior to randomization are considered eligible. Trials failing to report on backbone distribution or reporting imbalanced backbone distributions will be excluded. Finally, conference abstracts published prior to 2012 will be excluded from the database searches. Non-English studies will be excluded as well, as will studies that included healthy volunteers.

1.2. Literature search

1.2.1. Sources

A comprehensive systematic search of the literature will be conducted using the following databases: Cochrane Central Register of Controlled Trials, EMBASE, and MEDLINE. Conference abstracts provided through the EMBASE search, as well as those from the International AIDS conference (AIDS), the annual Conference on Retroviruses and Opportunistic Infections (CROI), and the conference on HIV Pathogenesis, Treatment and Prevention (IAS), will also be reviewed to determine whether there are relevant studies that were recently completed (conference abstracts will be restricted to the last three years). Additionally, hand searches of the bibliographies of published systematic reviews and health technology assessments will be performed. We will also perform manual searches of clinicaltrials.gov and the metaRegister of Controlled Trials to identify RCTs that are not yet published but are potentially eligible for inclusion.

1.2.2. Search strategy

The general search strategy will identify papers according to the population of interest, the inclusion of interventions and comparators of interest, and a restriction to randomized controlled trials. The population will be identified as having HIV or AIDS and as neither treatment-experienced nor failing treatment. The search will further exclude publication types that are not of interest (i.e., newsletters and reviews).

1.2.3. Study selection

Two investigators, working independently, will scan all abstracts and proceedings identified in the literature search. The same two investigators will independently review, in full text, the abstracts and proceedings deemed potentially relevant. If any discrepancies occur between the studies selected by the two investigators, a third investigator will provide arbitration.

1.2.4. Study quality

The validity of individual trials will be assessed using the Risk of Bias instrument, endorsed by the Cochrane Collaboration.
This instrument is used to evaluate seven key domains: sequence generation; allocation concealment; blinding of participants and personnel; blinding of outcome assessors; incomplete outcome data; selective outcome reporting; and other sources of bias.

1.2.5. Data extraction

Two investigators, working independently, will extract data on study characteristics, interventions, patient characteristics at baseline, and outcomes for the study populations of interest for the final list of selected eligible studies. Any discrepancies between the data extracted by the two data extractors will be resolved by involving a third reviewer and coming to a consensus. Data will be provided in a Microsoft Excel Workbook with sheets corresponding to the different information categories.

For each continuous outcome, the change from baseline at the end of the randomized phase will be extracted, along with the corresponding sample size, the standard deviation (SD) of the mean change from baseline, and measures of uncertainty (i.e., standard error (SE), 95% confidence interval (CI), and p-value) for all relevant intervention groups. If the change from baseline is not provided, we will extract the score at the follow-up time point of interest and the baseline score, and calculate the change. In such cases, the standard error of the change will be estimated by combining the standard errors at both time points using an outcome-specific correlation coefficient (ρ). The outcome-specific correlation can be obtained by deriving it from studies that report both the change and the measurements at both time points. Given that this was not available for the outcomes of interest, we used the conservative value of 0.5. In cases where interquartile ranges (IQR) are provided, the width of the IQR will be divided by 1.35 to estimate the standard deviation.
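The two conversions described above can be sketched in a few lines of Python. This is a minimal illustration, not the extraction workbook's actual logic: it assumes the usual variance-of-a-difference formula for combining the two standard errors (the text does not spell the formula out), and the function names and example values are hypothetical.

```python
import math


def se_of_change(se_baseline, se_followup, rho=0.5):
    """Standard error of the mean change from baseline.

    Assumption: the standard variance-of-a-difference formula, with
    correlation rho between the baseline and follow-up measurements.
    rho=0.5 is the value stated in the text.
    """
    variance = (se_baseline ** 2 + se_followup ** 2
                - 2.0 * rho * se_baseline * se_followup)
    return math.sqrt(variance)


def sd_from_iqr(q1, q3):
    """Approximate the SD as the IQR width divided by 1.35 (rule from the text)."""
    return (q3 - q1) / 1.35


if __name__ == "__main__":
    # Hypothetical CD4 summaries (cells/mm^3), for illustration only.
    print(se_of_change(se_baseline=12.0, se_followup=15.0))
    print(sd_from_iqr(q1=180.0, q3=342.0))
```

With rho set to 0 the first function reduces to the independent-samples case, so a correlation of 0.5 yields a smaller standard error than treating the two time points as unrelated.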
If the SE is not reported, it will be calculated according to the following hierarchy: the reported 95% CI by intervention group; the SD by intervention group along with the sample size; the 95% CI of the difference between intervention groups; p-values by intervention group; p-values for the difference between intervention groups. Where measures of dispersion are not reported at all, they will be imputed: the mean standard deviation among the studies that report one will be used, and standard errors will be derived from it.

2.1. Analyses

The choice of analytical methodology will depend on data availability.

2.1.1. Evaluation of consistency between direct and indirect comparisons

Prior to the network meta-analysis (NMA), the consistency between direct and indirect comparisons will be evaluated for networks that contain closed loops. For each of the comparisons (i.e., contrasts) that are part of a closed loop made up of more than one RCT, we will split the available trials into direct and indirect information. For each contrast in question, two (pooled) relative treatment effect estimates will be obtained: one from independent-means (or independent-effects) models using only the trials providing direct comparisons, and one from an NMA of the remaining trials, which provide only indirect evidence. This iterative technique is called edge-splitting. The difference between the estimates generated by the two sets of evidence will be evaluated with the Bucher test for inconsistency.

2.1.2. Pairwise and network meta-analyses

In situations with very limited and sparse data, a narrative review will be used as an alternative to quantitative analysis. The latter were restricted to the sub-population analyses. When sufficient data are available for quantitative evidence synthesis, a conventional pairwise meta-analysis will be employed as a first step.
Pairwise meta-analyses will be conducted using the traditional frequentist approach with the DerSimonian-Laird random-effects model, and the I-squared measure will be used to gauge the degree of heterogeneity. When multiple treatments are available within the evidence base, we will employ network meta-analyses (NMA). All NMAs will be conducted within the Bayesian framework using Bayesian hierarchical models. Under the assumption of consistency, the NMA model relates the data from the individual studies to basic parameters reflecting the (pooled) relative treatment effect and safety profiles between interventions. Based on these parameters, the relative treatment effects for each of the contrasts in the network will be obtained.

The NMA will be extended to use individual patient-level data (IPD) in order to conduct meta-regression adjustments. Using only aggregate data for meta-regression puts the analysis at risk of the ecological fallacy. It is well accepted that deriving regression adjustments from the relationships observed in IPD and applying these across a network is preferable to meta-regression based on aggregate data only. The variables that we will adjust for are clinical variables (baseline CD4, viral load, ADI) and demographic variables (age, sex, ethnicity, etc.). Regression will be conducted using hierarchical Bayesian models.

For the methods piece, we will analyze the data while modelling the control arm in trials as if it had not been observed, and compare the results to those that would have been obtained using the full data set. The purpose here is to compare our method to those currently proposed in the literature.

For each outcome and subgroup of interest, fixed-effects or random-effects models will be applied. Because some heterogeneity is always anticipated, random-effects models tend to be favoured. Model selection will be conducted using the deviance information criterion (DIC) according to NICE conventions.
The DIC provides a measure of model fit that penalizes model complexity. Model fit will also be assessed using leverage plots, and any outliers identified in this fashion will be investigated further. The model with the best fit will be chosen as the primary analysis model.

2.1.3. Node definitions and backbone adjustments

Given that the research questions for this project concern third-agent antivirals (i.e., non-backbone antivirals), we choose to define the nodes in terms of specific antivirals rather than specific ART regimens. Treatments with multiple standard doses or frequencies of administration will not be differentiated on this basis. For example, nevirapine 200 mg twice daily (bid) will be considered equivalent to nevirapine 400 mg once daily (qd). The only treatment for which doses will be distinguished in the analysis is efavirenz: standard dose (600 mg qd) and low dose (400 mg qd).

Defining nodes according to a single ARV rather than the full regimen significantly simplifies the modelling and the interpretation of results. Nonetheless, it is important to account for differences in backbone therapies. RCTs that use the same backbone in all trial arms do not require any adjustment for backbones; however, RCTs employing different backbones require adjustments in order to properly measure the effect attributable to the antiviral agent comparison being estimated. Two approaches will be used to address differences in backbone regimens. First, backbone regimens will be categorized as TDF + XTC (the reference category), abacavir (ABC) + XTC, zidovudine (AZT) + XTC, and other. The other category will include treatments such as stavudine (d4T) and didanosine (ddI) as well as the agents contained in the previous categories. We will use arm-specific meta-regression to adjust estimates for differences in backbones according to these categories.
The alternative approach is to simply restrict the evidence base to trials that do not differ with respect to backbones.

The most notable trial to differ in backbones is the SINGLE trial comparing EFV to DTG, which is central to the research questions. Otherwise, trials that differ in backbones tend to be older and include older agents (e.g., nelfinavir, indinavir, etc.) or to be endonodal. Endonodal trials are those that compare a node to itself. Indeed, some trials differing in backbone only will be included to improve the backbone meta-regression adjustments. Such trials, for example those comparing EFV to EFV, will only be of interest in the analysis using meta-regression adjustments for differences in backbones. The adjusted model will serve as the primary analysis; however, for outcomes where differences in backbones are restricted to endonodal trials or a few older trials with dated regimens, the restricted model will be used instead.

2.1.4. Models

All outcomes are either binary or continuous. Viral suppression and CD4 outcomes are frequently reported at multiple time points and will be analysed separately for each of the three time points of interest: 24 weeks, 48 weeks, and 96 weeks. The remaining outcomes tend to be reported at a single time point, which varies and typically coincides with trial duration. During the feasibility assessment stage, the relationship between follow-up time and outcomes was explored. The odds ratios are the more important consideration, given that they represent the effect being modelled. The odds ratios at multiple time points within a single trial were connected to further help determine whether follow-up time is an effect modifier of relative treatment effects. The odds ratios tend to be stable over time or to include an equal amount of downward and upward trends. On this basis, we will model the relative treatment effects on all remaining variables using the outcomes combined across multiple time points.
For studies reporting one of these outcomes at multiple time points, the values at the longest follow-up will be used.

For binary outcomes (mortality, AIDS defining illnesses, viral suppression, loss to follow-up, serious adverse events, and regimen substitutions) we will use a logistic regression model with the logit link function and a binomial likelihood. We choose to present results as odds ratios (OR) for these models so as to avoid the ceiling effect that limits relative risks (RR) for outcomes with proportions around 0.8 to 0.95. To test for the presence of heterogeneity, both fixed-effects and random-effects models will be employed. For the random-effects model, the conventional non-informative prior, a uniform distribution between 0 and 2, will be applied to the between-trial standard deviation.

For continuous outcomes (increase in CD4 count) we will use linear regression models with an identity link and a normal likelihood. The data will be arm based, and we will model the differences in change from baseline between all informed treatment comparisons. Estimates of comparative efficacy will be presented as mean differences.

2.1.5. Sensitivity analysis

For viral suppression, we will use the intention-to-treat (ITT) outcomes for our primary analysis and consider the per-protocol outcomes as a sensitivity analysis. Additionally, trials report multiple cut-off values for the definition of viral suppression. Newer trials tend to use a cut-off of <50 copies/mL, but some trials used higher cut-off values, <200 and <400 copies/mL, due to the limited sensitivity of older assays. While the cut-off does affect the absolute count, we will explore whether it alters relative treatment effects.

2.1.6. Presentation of results

The primary outputs of the Bayesian NMA are posterior distributions of the relative treatment effects between all interventions in the networks, e.g., odds ratios for discontinuation and mean change from baseline in CD4 cell counts.
The results for all outcomes will be presented in NMA crosstables as OR or mean differences (MD). The posterior distributions of relative treatment effects and modelled outcomes will be summarized by the median and 95% credible intervals (CrIs), constructed from the 2.5th and 97.5th percentiles. As this project pertains to questions particular to leading first-line regimens, the estimated comparative effects presented in the results section will not include treatments that are solely used as connectors and are of limited interest (i.e., the older drugs).

2.1.7. Software

The parameters of the different models will be estimated using a Markov Chain Monte Carlo (MCMC) method implemented in the OpenBUGS software package. A first series of iterations from the OpenBUGS sampler will be discarded as 'burn-in', and the inferences will be based on additional iterations using two chains. Convergence of the chains will be confirmed by the Gelman-Rubin statistic. All analyses will be performed using R version 3.3.1 (http://www.r-project.org/) and OpenBUGS version 3.2.3 (OpenBUGS Project Management Group).
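To make the convergence check and the posterior summary described above concrete, the following Python sketch computes a basic Gelman-Rubin potential scale reduction factor for two chains, along with the median and 2.5th/97.5th percentiles of the pooled draws. It is only an illustration under stated assumptions: the analysis itself relies on OpenBUGS and R, whose built-in diagnostics differ in detail from this textbook form, and the function names and simulated draws here are hypothetical.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor (R-hat) for one parameter.

    `chains` is a list of equal-length lists of post-burn-in draws.
    Values close to 1 suggest the chains have mixed.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


def summarize_posterior(draws):
    """Return (median, 2.5th percentile, 97.5th percentile) of the draws."""
    s = sorted(draws)

    def percentile(p):
        # Linear interpolation between the two nearest order statistics.
        k = p * (len(s) - 1)
        lo, hi = math.floor(k), math.ceil(k)
        return s[lo] + (s[hi] - s[lo]) * (k - lo)

    return percentile(0.5), percentile(0.025), percentile(0.975)


if __name__ == "__main__":
    # Simulated log odds ratio draws from two chains (illustration only).
    rng = random.Random(2016)
    chain_a = [rng.gauss(0.3, 0.2) for _ in range(5000)]
    chain_b = [rng.gauss(0.3, 0.2) for _ in range(5000)]
    print("R-hat:", gelman_rubin([chain_a, chain_b]))
    median, lower, upper = summarize_posterior(chain_a + chain_b)
    # Exponentiate to report on the odds ratio scale.
    print("OR median and 95% CrI:",
          math.exp(median), math.exp(lower), math.exp(upper))
```

Summarizing on the log scale and exponentiating afterwards gives the same median and interval limits as summarizing the odds ratios directly, because percentiles are preserved under monotone transformations.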