Machine Learning and Reasoning for Drug Discovery
A tutorial @ECML-PKDD 2021.
TL;DR: This tutorial reviews recent
developments in drug discovery using machine learning methods.
Applied AI Institute, Deakin University
Slides | Videos
Powered by neural networks, modern machine learning has enjoyed great successes
in data-intensive domains such as computer vision and language, where
humans naturally perform well. Machine learning equipped with
reasoning is now accelerating fields that traditionally require deep
expertise such as physics, chemistry and biomedicine. This tutorial
provides an overview of how machine learning and reasoning are speeding
up and lowering the cost of drug discovery. This includes how machine
learning can help in a wide range of areas such as novel molecule
identification, protein representation, drug-target binding, drug
repurposing, generative drug design, chemical reaction prediction, retrosynthesis
planning, drug-drug interaction, and safety assessment. We will also
discuss relevant machine learning models for graph classification,
molecular graph transformation, drug generation using deep generative
models and reinforcement learning, and chemical reasoning.
The tutorial assumes some familiarity with deep learning.
The tutorial is broadly organised into three parts. Part A introduces the
drug discovery pipeline from virtual in silico screening to wet lab
experimentation to clinical trials. We will then explain how machine
learning and reasoning play a role in each stage of the pipeline.
Part B focuses on representation learning of molecular structures and
predicting biochemical properties given the structures. On representing
drugs, we will cover traditional fingerprints, string representation
and learning, graph representation and learning. On representing
proteins, we will discuss recent unsupervised embedding techniques
operating on the sequences and 2D structures. Then we talk about how
drugs and proteins interact, and recent deep learning techniques to model
and predict their binding. Part B ends with the topic of polypharmacy
and predicting drug-drug interaction.
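To make the graph view of a molecule concrete, here is a minimal sketch in plain Python. It is an illustration only, not the tutorial's own code: the three-element vocabulary, the function names, and the untrained sum-aggregation update are all assumptions chosen for brevity. Real message-passing networks use learned weight matrices and richer atom and bond features.

```python
from typing import Dict, List, Tuple

# Heavy-atom graph of ethanol (SMILES "CCO"): atoms are nodes, bonds are edges.
ATOMS: List[str] = ["C", "C", "O"]
BONDS: List[Tuple[int, int]] = [(0, 1), (1, 2)]
ELEMENTS: List[str] = ["C", "N", "O"]  # toy vocabulary (assumption)


def one_hot(symbol: str) -> List[float]:
    """Encode an element symbol as a one-hot vector over ELEMENTS."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def neighbours(n_atoms: int, bonds: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Build an undirected adjacency list from the bond list."""
    adj: Dict[int, List[int]] = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def message_passing_round(
    features: List[List[float]], adj: Dict[int, List[int]]
) -> List[List[float]]:
    """One untrained round: each atom adds its neighbours' vectors to its own."""
    updated = []
    for i, own in enumerate(features):
        agg = list(own)
        for j in adj[i]:
            agg = [x + y for x, y in zip(agg, features[j])]
        updated.append(agg)
    return updated


def readout(features: List[List[float]]) -> List[float]:
    """Sum atom vectors into one molecule-level vector."""
    return [sum(col) for col in zip(*features)]


features = [one_hot(a) for a in ATOMS]
adj = neighbours(len(ATOMS), BONDS)
hidden = message_passing_round(features, adj)
molecule_vector = readout(hidden)
print(hidden)           # per-atom vectors after one round
print(molecule_vector)  # graph-level vector a property predictor would consume
```

After one round, each atom's vector also reflects its immediate neighbours; the summed readout is the kind of fixed-size vector a downstream property predictor would take as input.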
Part C covers the optimisation of molecular structures to meet
desirable drug properties, and generative models for goal-directed
exploration in the drug space. We also talk about the chain of
synthesis of target drugs, including reaction prediction and
retrosynthesis planning. Finally, we explain reasoning over
domain knowledge graphs, with applications to recommendation and drug
repurposing.
How is this relevant to
Drug discovery is among the scientific areas with the most profound impact on
humanity. The field is steadily moving from being knowledge-driven
towards data-driven, where we now routinely screen hundreds of millions
of potential drugs, and explore the astronomically large chemical
space. Machine learning is making important contributions to the field,
finding new drugs for previously undruggable targets. On the one hand,
the current advances in deep learning coupled with big compute have
opened up new opportunities to accelerate the drug discovery pipeline.
On the other hand, the domain offers new challenges unseen before and
this has motivated the development of new kinds of modelling
techniques, especially in the area of graphs and geometric machine
learning.
Tran, “AI for drug discovery” (Slides | Video). An invited talk @VietAI Summit, HCM City, Vietnam, Nov 2019.
Part A: Introduction (30 mins)
- Drug discovery pipeline
- Machine learning tasks in drug discovery
Part B: Molecular representation and property prediction (90 mins)
- Self-supervised learning of molecules
- regression and classification
- Explaining graph prediction
- Data efficient drug discovery
- Protein representation learning
- Protein folding
- Drug-target binding prediction
- Drug-target binding as graph-graph interaction
- Polypharmacy and drug-drug interaction
Part C: Drug design & synthesis (90 mins)
- Molecular optimisation
- optimisation in latent space
- generative models for molecules
- models for molecules
- Reasoning on biomedical knowledge graphs
- Chemical reaction as graph morphism
- Wrapping up & future