In this talk, I will present our recent work to decipher the evolution of viruses using machine-learning approaches.

In a first study [1], the goal was to analyze large HIV datasets for drug resistance mutations (DRMs). We used simple machine-learning approaches (e.g., logistic classifiers) with high explanatory power to select the descriptors (mutations) most strongly correlated with the resistance status of the training strains. This approach identified six new mutations significantly associated with resistance. However, our results suggest that all mutations directly conferring resistance have probably already been found (which is good news!), and that our newly discovered DRMs are accessory or compensatory mutations. Moreover, beyond these accessory relationships, we found no significant signal of further, more subtle epistasis, that is, of combinations of several mutations that individually do not seem to confer any resistance.
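To make the idea of selecting mutations with a logistic classifier concrete, here is a minimal sketch on synthetic data. It is not the pipeline used in [1]: the toy dataset, the function names (`fit_logistic`, `make_toy_data`), the choice of an L2 penalty and all numeric settings are assumptions made for illustration only. Each strain is encoded as a binary vector (mutation present or absent), and mutations are ranked by the absolute value of their fitted weight.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, l2=0.01, lr=0.5, epochs=300):
    """Fit an L2-penalised logistic classifier by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_toy_data(n=400, d=8, causal=(2, 5), seed=0):
    """Synthetic strains: d binary mutations, only `causal` ones affect resistance.

    Purely hypothetical data, not derived from any real HIV dataset.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [1 if rng.random() < 0.3 else 0 for _ in range(d)]
        logit = -2.0 + sum(3.0 * xi[j] for j in causal)
        y.append(1 if rng.random() < sigmoid(logit) else 0)
        X.append(xi)
    return X, y


X, y = make_toy_data()
weights, intercept = fit_logistic(X, y)

# Rank mutations by the magnitude of their weight; the largest are the candidates.
ranked = sorted(range(len(weights)), key=lambda j: -abs(weights[j]))
top_two = sorted(ranked[:2])
print("candidate mutation indices:", top_two)
```

Because the model is linear in the mutation indicators, each weight can be read directly as the contribution of one mutation, which is what makes this family of classifiers easy to interpret. A real analysis would also need significance testing and a correction for multiple comparisons, which this sketch omits.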

The second study [2] concerns phylodynamics. Given a phylogeny of strains, which is a proxy for the transmission tree among patients, the goal was to estimate key epidemiological parameters, such as R0, the basic reproductive number (expected number of transmissions per infectious case). The standard approach relies on mathematical modeling and complex systems of ordinary differential equations, which cannot be solved for trees with more than about 500 tips. We used a deep learning approach, in which a neural network was trained to estimate parameters from simulated trees generated for a broad range of parameter values. The approach proved to be both more accurate and much faster than the standard approach, allowing for the analysis of trees with tens of thousands of tips.
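The principle of learning parameters from simulations can be illustrated with a deliberately tiny sketch. It is not the method of [2]: a one-variable least-squares fit stands in for the neural network, a single summary statistic (mean number of transmissions per case) stands in for the tree, and the simulator is a toy birth-death process. All names (`simulate_offspring_mean`, `fit_line`) and numeric settings are assumptions for illustration.

```python
import random


def simulate_offspring_mean(r0, n_cases, rng):
    """Mean number of transmissions per case under a toy birth-death model.

    With transmission rate lam and removal rate mu (R0 = lam / mu), the next
    event affecting a case is a transmission with probability R0 / (1 + R0),
    so the number of transmissions before removal is geometric with mean R0.
    """
    p_transmit = r0 / (1.0 + r0)
    total = 0
    for _ in range(n_cases):
        while rng.random() < p_transmit:
            total += 1
    return total / n_cases


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(42)

# 1. Draw parameters from a broad range and simulate one dataset per draw.
train_r0 = [rng.uniform(1.0, 5.0) for _ in range(200)]
train_stat = [simulate_offspring_mean(r, 1000, rng) for r in train_r0]

# 2. Learn the mapping from simulated data back to the parameter.
slope, intercept = fit_line(train_stat, train_r0)

# 3. Apply the learned mapping to "observed" data (here simulated with R0 = 3).
observed_stat = simulate_offspring_mean(3.0, 1000, rng)
estimated_r0 = slope * observed_stat + intercept
print(f"estimated R0: {estimated_r0:.2f}")
```

The expensive part, simulation and training, is done once; estimating parameters for a new dataset then only requires evaluating the learned mapping, which is why this kind of approach can scale to very large inputs.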

These two case studies are intended to show the interactions between machine learning and phylogenetics, which are expected to become increasingly common and fruitful.

[1] Blassel L., Tostevin A., Villabona-Arenas C. J., Peeters M., Hue S. & Gascuel O., 2021. Using machine learning and big data to explore the drug resistance landscape in HIV. PLoS Computational Biology, vol. 17, no. 8, e1008873.

[2] Voznica J., Zhukova A., Boskova V., Saulnier E., Lemoine F., Moslonka-Lefebvre M. & Gascuel O., 2022. Deep learning from phylogenies to uncover the epidemiological dynamics of outbreaks. Nature Communications, vol. 13, 3896.