TY  - GEN
A1  - Kurilov, Roman
Y1  - 2019///
N2  - Despite significant progress in cancer research, effective cancer treatment is still a challenge. Cancer treatment approaches are shifting from standard cytotoxic chemotherapy regimens towards a precision oncology paradigm, where the choice of treatment is personalized, i.e. based on a tumor's molecular features. In order to match tumor molecular features with therapeutics, we need to identify biomarkers of response and build predictive models. The recent growth of large-scale pharmacogenomics resources, which combine drug sensitivity and multi-omics information on a large number of samples, provides the necessary data for biomarker identification and drug response modelling. However, although many attempts have been made to use this information for drug response prediction, our ability to accurately predict drug response using genetic data remains limited.

In this work we used pharmacogenomics data from the largest publicly available studies in order to systematically assess various aspects of the drug response model-building process, with the ultimate goal of improving prediction accuracy. We applied several machine learning methods (regularized regression, support vector machines, random forest) to predict response to a number of drugs. We found that while the accuracy of response prediction varies across drugs (in most cases R2 values vary between 0.1 and 0.3), different machine learning algorithms applied to the same drug have similar prediction performance. Experiments with a range of different training sets for the same drug showed that the predictive power of a model depends on the type of molecular data, the selected drug response metric, and the size of the training set. It depends less on the number of features selected for modelling and on class imbalance in the training set. We also implemented and tested two methods for improving the consistency of pharmacogenomics data coming from different datasets.

We tested our ability to correctly predict response in xenografts and patients using models trained on cell lines. We obtained reasonably accurate predictions in only a fraction of the tested cases, notably for response to erlotinib in the NSCLC xenograft cohort, and for responses to erlotinib and docetaxel in the NSCLC and BRCA patient cohorts, respectively.

This work also includes two applied pharmacogenomics analyses. The first is an analysis of a drug-sensitivity screen performed on a panel of Burkitt cell lines. This combines unsupervised data exploration with supervised modelling. The second is an analysis of drug-sensitivity data for the DKFZ-608 compound and the generation of the corresponding response prediction model.

In summary, we applied machine learning techniques to available high-throughput pharmacogenomics data to study the determinants of accurate drug response prediction. Our results can help to draft guidelines for building accurate models for personalized drug response prediction and thereby contribute to advancing precision oncology.
UR  - https://archiv.ub.uni-heidelberg.de/volltextserver/26166/
ID  - heidok26166
TI  - Assessment of modeling strategies for drug response prediction in cell lines and xenografts
AV  - public
ER  -