In this talk, I will present our recent work on deciphering the evolution of viruses using machine-learning approaches.
In a first study [1], the goal was to analyze large HIV datasets to identify drug resistance mutations (DRMs). We used simple machine-learning approaches with high explanatory power (e.g., logistic classifiers) to select descriptors, here mutations, that were highly correlated with the resistance status of the training strains. This approach identified six new mutations significantly associated with resistance. However, our results indicate that all mutations directly conferring resistance have likely already been found (which is good news!), and that our newly discovered DRMs are accessory or compensatory mutations. Moreover, beyond these accessory relationships, we found no significant signal of more subtle epistasis, in which several mutations that individually do not seem to confer any resistance would do so in combination.
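The feature-selection idea can be illustrated with a minimal sketch, assuming each strain is encoded as a binary vector of mutation presence/absence, a logistic classifier is fitted to the resistant/susceptible labels, and mutations are ranked by their fitted weights. The mutation names (mutA, mutB, mutC), the toy counts, the L2 penalty and the plain gradient-descent fit are all hypothetical choices made for illustration, not the pipeline of [1]; the sketch also omits the significance testing needed to call a mutation "significantly associated".

```python
import math


def fit_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Fit a logistic classifier by batch gradient descent (illustrative only).

    X: list of binary feature vectors (1 = mutation present).
    y: list of 0/1 labels (1 = resistant).
    A small L2 penalty on the weights (not the bias) keeps them finite.
    Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = bias + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        bias -= lr * grad_b / n
    return w, bias


# Hypothetical toy data: three mutations in a full factorial design, 10 strains
# per genotype. Resistant counts are invented so that mutA has a strong effect,
# mutB a weak one, and mutC none (its presence never changes the counts).
resistant_counts = {(0, 0): 1, (0, 1): 2, (1, 0): 8, (1, 1): 9}  # keyed by (mutA, mutB)
X, y = [], []
for (a, b), k in resistant_counts.items():
    for c in (0, 1):
        for i in range(10):
            X.append([a, b, c])
            y.append(1 if i < k else 0)

names = ["mutA", "mutB", "mutC"]
w, bias = fit_logistic(X, y)
# Rank mutations by fitted weight: larger positive weight = stronger association with resistance
ranking = [name for _, name in sorted(zip(w, names), reverse=True)]
```

Ranking by weight is the simplest readout: a weight that stays near zero (mutC) carries no signal, whereas a smaller positive weight (mutB) is the kind of pattern one might then inspect as a possible accessory effect.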
The second study [2] concerns phylodynamics. Given a phylogeny of strains, which serves as a proxy for the transmission tree among patients, the goal was to estimate key epidemiological parameters such as R0, the basic reproductive number (the expected number of transmissions per infectious case). The standard approach relies on mathematical modeling and complex systems of ordinary differential equations, which become intractable for trees with more than about 500 tips. We used a deep-learning approach instead: a neural network was trained to estimate parameters from trees simulated over a broad range of parameter values. This approach proved both more accurate and much faster than the standard one, making it possible to analyze trees with tens of thousands of tips.
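The simulate-then-learn idea can be sketched in a few lines under heavily simplifying assumptions that are ours, not those of [2]. Outbreaks are assumed to follow a constant-rate birth-death model with transmission rate λ = R0·ψ and removal rate ψ, so that R0 = λ/ψ (transmission rate times the mean infectious period 1/ψ). Each simulated outbreak is reduced to one hypothetical summary statistic, the ratio of transmission events to removal events, instead of a full tree, and the neural network is replaced by an ordinary least-squares line. The functions simulate_counts, summary_stat and train_estimator, and the prior range for R0, are illustrative only.

```python
import random


def simulate_counts(r0, rng, n_events=500, psi=1.0):
    """Simulate a constant-rate birth-death epidemic (hypothetical setup).

    Each infectious case transmits at rate lam = r0 * psi and is removed at
    rate psi. Only the sequence of events matters for the summary statistic,
    so event times are not drawn. Runs that die out before n_events are
    discarded and restarted, i.e. we condition on an observable outbreak.
    Returns (transmissions, removals).
    """
    lam = r0 * psi
    p_transmit = lam / (lam + psi)  # probability that the next event is a transmission
    while True:
        infectious, transmissions, removals = 1, 0, 0
        for _ in range(n_events):
            if rng.random() < p_transmit:
                infectious += 1
                transmissions += 1
            else:
                infectious -= 1
                removals += 1
                if infectious == 0:
                    break  # outbreak died out: restart
        if infectious > 0:
            return transmissions, removals


def summary_stat(transmissions, removals):
    # Hypothetical summary statistic standing in for the full tree
    return transmissions / max(removals, 1)


def train_estimator(n_sims=200, seed=0):
    """Fit an R0 estimator on outbreaks simulated over a broad prior range."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_sims):
        r0 = rng.uniform(1.5, 5.0)  # hypothetical prior range for R0
        xs.append(summary_stat(*simulate_counts(r0, rng)))
        ys.append(r0)
    # Ordinary least squares for r0 ~ intercept + slope * stat (stand-in for the network)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


intercept, slope = train_estimator()
# Apply the trained estimator to a new outbreak simulated with R0 = 3
observed = summary_stat(*simulate_counts(3.0, random.Random(42)))
r0_hat = intercept + slope * observed
```

The structure is what matters here: parameters are drawn over a broad range, data are simulated for each draw, and an estimator is fitted to map simulated data back to the parameters that generated them; once trained, it is applied to new data at negligible cost. In the approach described above, a neural network working on the trees themselves plays the role of the least-squares line.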
These two case studies are intended to illustrate the interplay between machine learning and phylogenetics, which is expected to become increasingly common and fruitful.
[1] Blassel L., Tostevin A., Villabona-Arenas C.J., Peeters M., Hue S. & Gascuel O. (2021). Using machine learning and big data to explore the drug resistance landscape in HIV. PLoS Computational Biology, 17(8), e1008873.
[2] Voznica J., Zhukova A., Boskova V., Saulnier E., Lemoine F., Moslonka-Lefebvre M. & Gascuel O. (2022). Deep learning from phylogenies to uncover the epidemiological dynamics of outbreaks. Nature Communications, 13, 3896.
Olivier Gascuel studied mathematics and completed a PhD in computer science. He began working in bioinformatics in the late 1980s, at the very beginning of the genomic era and of the rapid growth of interactions between mathematicians, computer scientists and molecular biologists. His early interests were sequence analysis and protein structure prediction using machine-learning approaches. Since the mid-1990s, he has concentrated on evolution and phylogenetics, with a particular focus on mathematical and computational tools and concepts. From 2015 to 2020, he headed the newly created Center for Bioinformatics, Biostatistics and Integrative Biology (C3BI) of the Pasteur Institute in Paris, with a particular interest in pathogens and their evolution. He joined the National Museum of Natural History in Paris in 2021 to develop research on biodiversity. He has authored numerous phylogenetics software programs, some of them highly cited, and is a member of the French Academy of Sciences.
Web page: https://isyeb.mnhn.fr/fr/annuaire/olivier-gascuel-7496