Towards precision medicine in Chronic Obstructive Pulmonary Diseases: predicting individual treatment response and general outcomes in SUMMIT
Proposal
11340
Title of Proposed Research
Towards precision medicine in Chronic Obstructive Pulmonary Diseases: predicting individual treatment response and general outcomes in SUMMIT
Lead Researcher
Wim Janssens
Affiliation
University Hospitals Leuven, Department of Respiratory Diseases, Research Laboratory BREATHE, Department CHROMETA, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
Funding Source
Potential Conflicts of Interest
Data Sharing Agreement Date
10 June 2020
Lay Summary
With the constant increase in computational power and the rise of big data, artificial intelligence (AI) is advancing rapidly and is entering many domains of healthcare. The central idea of the project is to use machine learning to detect patterns linking baseline patient characteristics, interventions and particular outcomes of randomized controlled trials (RCTs). The advantage of machine learning over classical statistical models is its ability to learn more complex patterns in the data without many assumptions about underlying distributions. At present, RCTs provide the highest level of evidence on the efficacy of any pharmaceutical intervention. These trials are essential in product development and are obligatory in different phases before health authorities will consider any drug for reimbursement and broad clinical use.

In RCTs, current statistical approaches compare groups and subgroups on predefined outcomes. These approaches are highly important for demonstrating the efficacy or futility of an intervention at group level. A clinician in daily practice, however, must make treatment decisions at the individual level, and it is still hard to predict from the group evidence of an RCT whether an individual patient will respond to an intervention. Machine learning models may overcome this problem by estimating the likelihood that an individual will respond to a studied intervention. By learning from relationships in a detailed training data set, algorithms will be developed to predict probabilities of outcomes at the individual patient level.

This project will focus on the Study to Understand Mortality and Morbidity (SUMMIT) dataset in the domain of Chronic Obstructive Pulmonary Disease (COPD). COPD is most often caused by smoking and is currently the third leading cause of mortality worldwide (after cardiovascular disease and cancer). Different treatments, delivered via inhalers or as oral drugs, have been validated by large randomized controlled trials and are currently used in daily practice. Unfortunately, many of the patients taking these medications daily do not respond to the drug that has been prescribed to them. This is caused by the high heterogeneity and complexity of the disease, which is not yet well understood. Some subgroups may respond better to a treatment while other subgroups do not respond at all. Tools that can distinguish responders from non-responders in advance are therefore of utmost importance, not only from the individual perspective but also from a health-economic perspective.

The objective of our research is to develop models based on different AI approaches, e.g., support vector machines, random forests and deep learning, and to compare their predictive accuracy in outcome prediction. Different sets of baseline features will be used to maximize predictive power with minimal redundancy. Training and validation data sets will be used to develop ready-for-use approaches that allow fast evaluation of large pharmaceutical trials. In parallel, cloud-based software tools will be developed to allow clinicians to predict therapeutic responses. As such, artificial intelligence may be at the doorstep of personalized medicine in chronic respiratory diseases.
Study Data Provided
GSK-HZC113782 (Posting ID 4109): A Clinical Outcomes Study to compare the effect of Fluticasone Furoate/Vilanterol Inhalation Powder 100/25mcg with placebo on Survival in Subjects with moderate Chronic Obstructive Pulmonary Disease (COPD) and a history of or at increased risk for cardiovascular disease
Statistical Analysis Plan
The first step is to conduct an exploratory data analysis in order to understand the characteristics of the data. It is important to understand how each variable in the dataset behaves and to what extent values are missing. Correlations between the variables will be tested (Pearson and Spearman correlation), and preliminary regression analysis can give a first indication of which variables have a greater influence on the outcomes. This will be done in both univariate and multivariate form, and the model used will depend on the outcome: linear regression for continuous outcomes and logistic regression for discrete outcomes. For survival analysis, Kaplan-Meier curves and Cox proportional hazards models will be used. For testing differences between groups, we will use ANOVA followed by post hoc tests with Holm-Bonferroni correction for multiple testing.

Before the data can be used by any model, some preprocessing will be required. This includes normalization or standardization, encoding of variables and handling of null values. An important step is the handling of missing data. Single imputation methods (e.g., mean imputation) handle this in a relatively simple way, but these primitive methods have downsides such as introducing bias. Alternative methods have been proposed and will be explored in this research. A statistically more powerful method is multiple imputation, which accounts for the uncertainty of the imputations by averaging over multiple imputed datasets. Other methods stem from artificial intelligence and learn the imputations from samples in which the value of the variable is known. These methods range from simple k-nearest neighbors to random forests.

The dataset then has to be prepared for machine learning. This will include feature engineering and possibly feature selection. During feature engineering, the domain knowledge of the in-house pulmonology experts in our research laboratory can be incorporated.
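The contrast between mean imputation and k-nearest-neighbors imputation described above can be illustrated with a minimal sketch. This is not the project's implementation: the function names `mean_impute` and `knn_impute`, the two columns (age, FEV1 % predicted) and all values are invented for illustration, and only the Python standard library is assumed.

```python
import math

def mean_impute(rows):
    """Replace each missing value (None) with the mean of its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def knn_impute(rows, k=2):
    """Replace each missing value with the mean of that column among the
    k nearest rows in which the column is observed."""
    def distance(a, b):
        # Root-mean-square difference over the columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows where column j is known.
            donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:  # leave the value missing if no donor exists
                completed[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return completed

# Hypothetical toy data: [age in years, FEV1 % predicted]; last FEV1 is missing.
rows = [
    [60.0, 55.0],
    [62.0, 57.0],
    [80.0, 40.0],
    [82.0, 38.0],
    [61.0, None],
]

print(mean_impute(rows)[4][1])      # 47.5, the overall column mean
print(knn_impute(rows, k=2)[4][1])  # 56.0, the mean of the two patients closest in age
```

In this toy case the mean imputation pulls the missing value toward the whole cohort, while the neighbor-based value reflects the most similar patients. Neither variant expresses imputation uncertainty, which is what multiple imputation adds.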
Feature selection can reduce model complexity and the risk of overfitting while also shortening computation time.

Once these steps are completed and an understanding of the data has been obtained, the development of prediction models can start. The dataset will be split into a training set and a test set. We will use the former to develop models and to choose the best one. Final evaluation will then be performed on the independent test set. The split ratio will initially be 80:20. Models will be evaluated by k-fold cross-validation on the training set, repeated over different random splits to lower the variance of the estimated error.

We will first assess how classical methods such as linear and logistic regression perform, in order to have a benchmark. Evaluating these results will indicate where the difficulties in the dataset lie. This will be followed by training existing models such as k-nearest neighbors, support vector machines, random forests and ensemble methods. Each model usually has hyperparameters that need to be tuned. This will be done during cross-validation using either a grid search or a random search, depending on the computational load of each model and the size of the dataset. We can then make a first evaluation of the added value of artificial intelligence.

After comparing the existing models, the main goal of this research is to develop new models that perform better. We will try to adjust and improve existing models or build completely new deep learning architectures. Since machine learning on data from randomized controlled trials is mostly unexplored, we will be very exploratory in the approaches tested and validated.

Since an ultimate goal of our research is to develop software with the potential to be implemented in the clinical setting for certain predictions, it is important that end-users trust the decisions our models make. Interpretability is a recently growing field in artificial intelligence, and clinicians are usually skeptical of black-box models. Using existing interpretability methodologies, we will provide the reasoning behind the models' decisions and rank the most important contributing features per individual for certain decisions. We think this would be a great step towards a world in which clinicians, artificial intelligence, and the highest level of evidence from randomized trials complement each other. Ultimately, this should result in better decision making in general.
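The evaluation protocol in the plan above (an 80:20 hold-out split, repeated k-fold cross-validation on the training portion, and a grid search over hyperparameters) can be sketched as follows. This is an illustrative outline only, assuming the Python standard library: the data are synthetic, a small k-nearest-neighbors classifier stands in for whichever models are actually compared, and all function names are hypothetical.

```python
import random
import statistics

def train_test_split(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and hold out test_fraction of them as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)

def k_fold(indices, k, seed):
    """Yield (train, validation) index lists for one random k-fold partition."""
    shuffled = indices[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def knn_predict(X, y, train_idx, query, n_neighbors):
    """Majority vote among the n_neighbors training points closest to query."""
    nearest = sorted(
        train_idx,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], query)),
    )[:n_neighbors]
    votes = sum(y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def cv_accuracy(X, y, indices, n_neighbors, k=5, repeats=3):
    """Mean validation accuracy over `repeats` different random k-fold splits."""
    scores = []
    for rep in range(repeats):
        for train, val in k_fold(indices, k, seed=rep):
            correct = sum(
                knn_predict(X, y, train, X[i], n_neighbors) == y[i] for i in val
            )
            scores.append(correct / len(val))
    return statistics.mean(scores)

def grid_search(X, y, train_idx, grid):
    """Return the grid value with the best repeated cross-validation accuracy."""
    return max(grid, key=lambda n: cv_accuracy(X, y, train_idx, n))

# Synthetic two-class data (an assumption for illustration, not trial data).
rng = random.Random(42)
X, y = [], []
for i in range(100):
    label = i % 2
    center = 6.0 * label
    X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
    y.append(label)

train_idx, test_idx = train_test_split(len(X), test_fraction=0.2)
grid = (1, 3, 5, 7)
best_k = grid_search(X, y, train_idx, grid)

# The test set is touched only once, after model selection is finished.
test_acc = sum(
    knn_predict(X, y, train_idx, X[i], best_k) == y[i] for i in test_idx
) / len(test_idx)
print(best_k, test_acc)
```

The point of the design is that hyperparameters are chosen using the training indices alone, so the accuracy on the held-out 20% remains an independent estimate. A random search would replace the fixed `grid` with sampled candidate values.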
Publication Citation
Verstraete K, Gyselinck I, Huts H, et al. Estimating individual treatment effects on COPD exacerbations by causal machine learning on randomised controlled trials. Thorax 2023;78:983-989.
DOI:
http://dx.doi.org/10.1136/thorax-2022-219382