**Proposal** 1374

Title of Proposed Research

A novel representation of vaccine efficacy trial datasets for use in computer simulation of vaccination policy

Lead Researcher

Michael M. Wagner

Affiliation

University of Pittsburgh

Funding Source

NIH, MIDAS Informatics Services Group (ISG)

Potential Conflicts of Interest

None

Data Sharing Agreement Date

25 May 2016

Lay Summary

BACKGROUND

Computer simulation is important in vaccination policy analysis. It is effectively the only method available when a disease is rare (e.g., Brouwers' 2006 study of smallpox vaccination) or when a decision is urgent (e.g., Lee's 2010 study of vaccination policies for 2009 H1N1).

This proposal focuses on a particular method of computer simulation called 'agent-based simulation' (ABS), which is the most realistic method for computer simulation of vaccination policy. Briefly, when using ABS to analyze a vaccination policy P, we first create a population of 'agents' whose sociodemographic and disease characteristics match those of a population of interest. We then program the simulator to emulate policy P. In particular, we use published data about vaccine efficacy (VE) as the probability that an agent develops immunity as a result of being vaccinated.
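The vaccination step described above can be illustrated with a minimal sketch. The agent fields, the function names `make_population` and `apply_vaccine_efficacy`, and the coverage and VE values are assumptions made for illustration; they are not taken from the proposal or from any simulator.

```python
import random

def make_population(n_agents, coverage, rng):
    """Create agents; each is vaccinated with probability `coverage`."""
    return [
        {"id": i, "vaccinated": rng.random() < coverage, "immune": False}
        for i in range(n_agents)
    ]

def apply_vaccine_efficacy(agents, ve, rng):
    """Give each vaccinated agent immunity with probability `ve`."""
    for agent in agents:
        if agent["vaccinated"]:
            agent["immune"] = rng.random() < ve

rng = random.Random(42)  # fixed seed so the run is repeatable
population = make_population(10_000, coverage=0.6, rng=rng)  # hypothetical coverage
apply_vaccine_efficacy(population, ve=0.7, rng=rng)          # hypothetical single VE

n_vaccinated = sum(a["vaccinated"] for a in population)
n_immune = sum(a["immune"] for a in population)
immune_share = n_immune / n_vaccinated
print(f"{n_immune} of {n_vaccinated} vaccinated agents became immune")
```

Because every vaccinated agent receives the same `ve`, the share of vaccinated agents who become immune is close to that single value regardless of their other characteristics.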

A limitation of ABS analysis of vaccination policy is that published results of VE trials typically report a single overall VE, or VE conditioned on one covariate (e.g., age). As a result, ABS's potential to realistically simulate the effects of co-existing diseases, medications, age, gender and other socio-demographic characteristics of a population is under-used.

Thus, the BROAD OBJECTIVE OF THE PROPOSED RESEARCH is to improve the information available about VE for use in ABS analysis of vaccination policy.

IMPACT

We expect that an improvement in information about VE needed for vaccination policy analysis will lead to more effective use of vaccines, and ultimately improvements in health.

OBJECTIVE

The objective of the proposed research is to develop and evaluate the use of Bayesian Networks (BNs) as a more complete statistical representation of the results of VE trials. Our planned evaluation will study how the more complete statistical information changes the results of ABS simulation of policy.

METHOD

Bayesian Networks: We will use a BN to represent the statistical information in each VE trial dataset. A BN is a compact mathematical representation of the full-joint probability distribution over a set of variables.
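To make "compact" concrete: a BN factorizes the joint distribution into one conditional distribution per variable given its parents in the graph. The four-variable structure in the second line (age group A, gender G, vaccination status V, disease outcome D) is an assumed example, not a network learned from any trial dataset.

```latex
\[
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
\]
\[
P(A, G, V, D) \;=\; P(A)\, P(G)\, P(V)\, P(D \mid A, G, V)
\]
```

If all four variables were binary, the full joint table would need 15 free parameters, while this factorization needs 1 + 1 + 1 + 8 = 11. The saving grows as the graph becomes sparser relative to the number of variables.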

Machine learning: We will use standard machine learning algorithms to infer the BN representation of a VE trial dataset.

Probability that an agent becomes immune during ABS: We will use a standard BN inference algorithm to obtain the VE for each vaccinated agent, conditioned on the agent's gender and other covariates of VE.
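As a rough sketch of how a covariate-specific VE might be read off a BN, the example below runs inference by enumeration on a tiny hand-written network. The network structure, all probability values, and the function names are hypothetical. The sketch also assumes the common definition VE = 1 - (risk in vaccinated) / (risk in unvaccinated), which the proposal does not spell out.

```python
from itertools import product

# Hypothetical network: AgeGroup -> Disease <- Vaccinated, Gender -> Disease.
P_AGE = {"young": 0.4, "old": 0.6}
P_GENDER = {"F": 0.5, "M": 0.5}
P_VACC = {True: 0.5, False: 0.5}
# P(Disease = True | age, gender, vaccinated); illustrative values only.
P_DISEASE = {
    ("young", "F", True): 0.02, ("young", "F", False): 0.10,
    ("young", "M", True): 0.03, ("young", "M", False): 0.10,
    ("old", "F", True): 0.06, ("old", "F", False): 0.12,
    ("old", "M", True): 0.08, ("old", "M", False): 0.12,
}

def joint(age, gender, vacc, disease):
    """Joint probability of one full assignment, using the BN factorization."""
    p_d = P_DISEASE[(age, gender, vacc)]
    return P_AGE[age] * P_GENDER[gender] * P_VACC[vacc] * (p_d if disease else 1 - p_d)

def prob_disease(evidence):
    """P(Disease = True | evidence), summing over all unobserved variables."""
    num = den = 0.0
    for age, gender, vacc in product(P_AGE, P_GENDER, P_VACC):
        assignment = {"age": age, "gender": gender, "vaccinated": vacc}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # assignment contradicts the evidence
        p_true = joint(age, gender, vacc, True)
        num += p_true
        den += p_true + joint(age, gender, vacc, False)
    return num / den

def conditional_ve(covariates):
    """VE given an agent's known covariates (assumed definition: 1 - relative risk)."""
    risk_vaccinated = prob_disease({**covariates, "vaccinated": True})
    risk_unvaccinated = prob_disease({**covariates, "vaccinated": False})
    return 1 - risk_vaccinated / risk_unvaccinated

ve_young_female = conditional_ve({"age": "young", "gender": "F"})
ve_female = conditional_ve({"gender": "F"})  # age unknown, so it is marginalized out
print(round(ve_young_female, 3), round(ve_female, 3))
```

The second call shows the practical benefit: when a covariate is unknown for an agent, the same inference routine still returns a VE by marginalizing over it.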

STUDY DESIGN

The study is a comparison of the existing method for releasing results of VE trials (i.e., tables in publications) with a new method that uses a BN representation of a VE trial dataset. As proof of concept, we will compare the number of infections predicted by an ABS vaccination policy analysis that uses published results with that of an analysis that uses the BN representation of the VE trial dataset.
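The comparison can be pictured with a toy simulation that is run twice: once with a single overall VE and once with covariate-specific VEs. Everything here is an assumption for illustration: the VE values, the age-dependent coverage, the attack rate, and the function `simulate_infections`. The toy model has no transmission dynamics, and comparing mean counts stands in for the formal test described in the Statistical Analysis Plan.

```python
import random
from statistics import mean

# Hypothetical values; not taken from any trial.
OVERALL_VE = 0.6
VE_BY_AGE = {"young": 0.8, "old": 0.4}
COVERAGE_BY_AGE = {"young": 0.9, "old": 0.3}  # the policy reaches the young more

def simulate_infections(ve_for_agent, n_agents=5000, attack_rate=0.3, seed=0):
    """Count infections when immunity is drawn from `ve_for_agent(age)`."""
    rng = random.Random(seed)
    infections = 0
    for _ in range(n_agents):
        age = "young" if rng.random() < 0.5 else "old"
        vaccinated = rng.random() < COVERAGE_BY_AGE[age]
        immune = vaccinated and rng.random() < ve_for_agent(age)
        exposed = rng.random() < attack_rate
        if exposed and not immune:
            infections += 1
    return infections

seeds = range(20)
published_runs = [simulate_infections(lambda age: OVERALL_VE, seed=s) for s in seeds]
stratified_runs = [simulate_infections(lambda age: VE_BY_AGE[age], seed=s) for s in seeds]
print(mean(published_runs), mean(stratified_runs))
```

In this setup the two analyses disagree because the policy mostly vaccinates the group in which the vaccine works best, an effect that a single overall VE cannot capture.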

PUBLICATION

We will communicate the results of this research via scientific publication in the field of medical informatics.

Study Data Provided

Posting ID 1286, GSK-444563/024: A multi-country & multi-center study to assess the efficacy, immunogenicity & safety of two doses of GSK Biologicals' oral live attenuated HRV vaccine given concomitantly with routine EPI vaccinations including OPV in healthy infants

Posting ID 1294, GSK-102247: A multi-country & multi-center study to assess the efficacy, safety & immunogenicity of 2 doses of GSK Biologicals' oral live attenuated human rotavirus (HRV) vaccine in healthy infants in co-administration with specific childhood vaccines

Posting ID 1295, GSK-109810: To assess long-term efficacy & safety of subjects approximately 3 years after priming with 2 doses of GlaxoSmithKline (GSK) Biologicals' oral live attenuated human rotavirus (HRV) vaccine (Rotarix) in the primary vaccination study (102247).

Posting ID 1296, GSK-102248: Multi-Center Study to Assess the Efficacy, Safety and Immunogenicity of 2 or 3 Doses of GSK Biologicals' Oral Live Attenuated Human Rotavirus (HRV) Vaccine Given Concomitantly With Routine EPI Vaccinations in Healthy Infants

Posting ID 1538, GSK-104438: A randomized, double-blind, placebo-controlled, post-marketing phase III Study to evaluate the efficacy of GSK Biologicals' influenza vaccine (Fluarix™) administered intramuscularly in adults.

Posting ID 2204, GSK-108134: A study to demonstrate the efficacy of GSK Biologicals' influenza vaccine (Fluarix™) administered intramuscularly in adults

Statistical Analysis Plan

We will use statistical analysis mainly in two steps of the proposed research: (i) machine-learning a Bayesian Network (BN) representation of each vaccine efficacy (VE) trial dataset, and (ii) comparing the BN representation with the published VEs using agent-based simulation (ABS).

Machine-learning BNs from a VE trial dataset generally consists of three main steps: preparing the dataset, learning BNs from the dataset using standard machine-learning algorithms, and evaluating the BNs. We will use statistical analysis in the evaluation step to find the best BN representation of the dataset, and we will use that representation in the comparison step.

1 MACHINE-LEARN BN REPRESENTATIONS

1-1 Preparing VE Trial Datasets

We will prepare each VE trial dataset for use by machine-learning algorithms that can infer a BN representation from the dataset.

1-1-1 Handling Missing Values

Some machine-learning algorithms for learning BNs cannot handle datasets with missing values. To use these algorithms, we must either replace missing values (e.g., with the mean or mode of the existing values) or remove the data points that contain missing values.

For each VE trial dataset, we prefer to handle missing values the same way the VE trial did, so that we machine-learn the BN representation from the same dataset from which the trial derived its published VEs.

1-1-2 Discretizing Continuous Variables

Some machine-learning algorithms for learning BNs cannot use datasets with continuous variables (e.g., age, vaccination time). We will discretize continuous variables using methods such as binning. Binning groups the continuous values into a number of bins, each of which represents a smaller interval within the whole range of the continuous variable.

1-2 Learning BNs

For each VE trial dataset we will use a number of standard machine-learning algorithms (e.g., PC, Greedy Thick Thinning, Bayesian Search) to obtain BN representations of the dataset.
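Before a dataset is handed to any such learning algorithm, the preparation steps in sections 1-1-1 and 1-1-2 might look like the following sketch. It uses mode imputation and equal-width binning purely as examples of the options named there. The helper names and the sample ages are hypothetical, and in practice the missing-value handling would follow whatever the original trial did.

```python
from collections import Counter

def impute_mode(values):
    """Replace None entries with the most common observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 (equal-width intervals)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]  # all values identical: a single bin
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages_months = [2, 3, None, 5, 3, 11, 8, None, 3]  # hypothetical infant ages
filled = impute_mode(ages_months)
binned = equal_width_bins(filled, n_bins=3)
print(filled)
print(binned)
```

Equal-width binning is only one choice; equal-frequency bins or clinically meaningful cut points would slot into the same place.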
Each machine-learning algorithm takes a VE trial dataset as input and outputs a BN representation of the dataset.

We will verify each BN representation by examining whether we can obtain the published VEs from it.

1-3 Evaluating BNs

For each VE trial dataset we will evaluate the obtained BN representations in order to find the best representation among them. In particular, we will plot the Receiver Operating Characteristic (ROC) curve for each BN and use the Area Under the ROC curve (AUROC) as a measure for comparing the performance of the BNs. Treating each BN as a binary classifier of an outcome variable such as disease, the ROC curve shows the sensitivity of the BN as a function of its specificity across different discrimination thresholds of the outcome variable. The AUROC summarizes how well a BN can distinguish between the two values of the outcome variable. We will select the BN with the best AUROC as the BN representation of each VE trial dataset.

2 COMPARE THE METHODS

For each VE trial dataset we will compare its selected BN representation with the VEs published by the VE trial. In particular, we will compare the number of infections simulated by ABS of a vaccination policy when using the VEs from the BN representation with the number of infections simulated by ABS of the same vaccination policy when using the published VEs.

A statistically significant difference between the numbers of infections in the two ABSs indicates that the result of a vaccination-policy analysis can differ when it uses the BN representation of a VE trial dataset instead of the published VEs.

To analyze the difference between the numbers of infections in the two ABSs, we will compare the Kaplan-Meier (K-M) curves of the simulations. We will use the log-rank test to compare the K-M curves, with the null hypothesis that there is no difference in the number of infections between the two ABSs.

References

Brouwers, L., Mäkilä, K., & Camitz, M. (2006). Spridning av smittkoppor—Simulerings experiment [Spread of smallpox—Simulation experiment]. SMI-Rapport, 5, 2006T.

Lee, B. Y., Brown, S. T., Korch, G. W., Cooley, P. C., Zimmerman, R. K., Wheaton, W. D., ... & Burke, D. S. (2010). A computer simulation of vaccine prioritization, allocation, and rationing during the 2009 H1N1 influenza pandemic. Vaccine, 28(31), 4875-4879.

Publication Citation

Tajgardoon M, Wagner MM, Visweswara S, Zimmerman RK. A Novel Representation of Vaccine Efficacy Trial Datasets for use in Computer Simulation of Vaccination Policy. AMIA Summits on Translational Science Proceedings. 2018;2017:389.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5961808/