From Deep Learning to Deep Reasoning
A tutorial @KDD 2021, August 14th (Virtual).

TL;DR: This tutorial reviews recent developments to extend the capacity of neural networks to “learning to reason” from data, where the task is to determine if the data entails a conclusion.


A/Prof Truyen Tran

Dr Vuong Le

Dr Hung Le

Dr Thao Le

Applied AI Institute, Deakin University

Slides (Part A | Part B | Part C)

The rise of big data and big compute has brought modern neural networks to many walks of digital life, thanks to the relative ease of constructing large models that scale to the real world. The current successes of Transformers and self-supervised pretraining on massive data have led some to believe that deep neural networks will be able to do almost everything, given enough data and computational resources. However, this might not be the case. While neural networks are quick to exploit surface statistics, they generalize poorly to novel combinations. Current neural networks do not perform deliberate reasoning, that is, the capacity to deliberately deduce new knowledge from contextualized data. This tutorial reviews recent developments to extend the capacity of neural networks to “learning to reason” from data, where the task is to determine if the data entails a conclusion. This capacity opens up new ways to generate insights from data through arbitrary querying in natural language, without the need to predefine a narrow set of tasks.

Prerequisite: the tutorial assumes some familiarity with deep learning.


The tutorial consists of three main parts. Part A covers the learning-to-reason framework and explains how neural networks can serve as a strong backbone for reasoning through their natural operations, such as binding, attention, and dynamic computational graphs. We will also show how neural networks can learn to perform combinatorial algorithms. Part B goes into more detail on how neural networks reason over unstructured and structured data and across multiple modalities, covering reasoning over sets, relations, graphs, and time. Part C reviews more advanced topics, including neural networks with external memories, learning to reason with limited labels, and recursive reasoning with theory of mind. Special attention will be paid to neural memories as a fundamental mechanism to support reasoning over entities, relations, and even neural programs. Whenever possible, case studies in text understanding and visual question answering will be presented.

Existing related talks


Part A: Learning to reason framework (60 mins)

  • Reasoning as a prediction skill that can be learnt from data.
    • Question answering as zero-shot learning.
  • Neural network operations for learning to reason:
    • Concept-object binding.
    • Attention & transformers.
    • Dynamic neural networks, conditional computation & differentiable programming.
  • Reasoning as iterative representation refinement & query-driven program synthesis and execution.
    • Compositional attention networks.
    • Neural module networks.
  • Combinatorial reasoning.
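As an illustration of the attention operation listed in Part A, the following is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. It is not taken from the tutorial slides; the function names (`softmax`, `attend`) and the toy keys, values, and query are assumptions chosen only to show how a query softly selects among context items.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns the attention weights and the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    width = len(values[0])
    read = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]
    return weights, read


# Hypothetical toy context: three items, each with a key and a value.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [4.0, 0.0]  # a query that matches the first key dimension

weights, read = attend(query, keys, values)
print(weights)  # items 0 and 2 receive equal, larger weights than item 1
print(read)
```

In a trained model the queries, keys, and values would be produced by learned projections; here they are fixed by hand so that the selection behaviour is easy to inspect.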

Part B: Reasoning over unstructured and structured data (60 mins)

  • Cross-modality reasoning, the case of vision-language integration.
  • Reasoning as set-set interaction.
    • Query processing.
    • Context processing.
    • Dual-attention.
    • Conditional set functions.
  • Relational reasoning.
    • Relation networks.
    • Graph neural networks.
    • Graph embedding.
    • Graph convolutional networks.
    • Graph attention.
    • Message passing.
    • Query-conditioned dynamic graph construction.
    • Reasoning over knowledge graphs.
  • Temporal reasoning.
    • Video question answering.
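To make the message-passing item in Part B concrete, here is a minimal sketch of one propagation round on a small graph, in plain Python. It is an assumption-laden toy rather than any specific model from the tutorial: the function name `message_passing_step`, the sum aggregation, and the three-node chain are illustrative, and real graph neural networks would apply learned transformations to the messages.

```python
def message_passing_step(features, edges):
    """One round of message passing with sum aggregation.

    Each node adds the sum of its in-neighbours' current features
    to its own feature vector.
    """
    incoming = {node: [0.0] * len(vec) for node, vec in features.items()}
    for src, dst in edges:
        for i, x in enumerate(features[src]):
            incoming[dst][i] += x
    return {
        node: [h + m for h, m in zip(features[node], incoming[node])]
        for node in features
    }


# Hypothetical chain graph a - b - c, stored as directed edges both ways.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
# Only node "a" carries a signal at the start.
features = {"a": [1.0], "b": [0.0], "c": [0.0]}

after_one = message_passing_step(features, edges)
after_two = message_passing_step(after_one, edges)

print(after_one)  # the signal has reached "b" but not "c"
print(after_two)  # after a second round the signal has reached "c"
```

The example shows why multiple rounds matter for multi-hop relational reasoning: information from a node two edges away only arrives after two propagation steps.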

Part C: Advanced topics (60 mins)

  • Reasoning with external memories.
    • Memory of entities: memory-augmented neural networks.
    • Memory of relations with tensors and graphs.
    • Memory of programs & neural program construction.
  • Learning to reason with fewer labels:
    • Data augmentation with analogical and counterfactual examples.
    • Question generation.
    • Self-supervised learning for question answering.
    • Learning with external knowledge graphs.
  • Recursive reasoning with neural theory of mind.
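The external-memory topic in Part C can be sketched as a small memory matrix with content-based addressing, a soft read, and an erase-then-add write in the style of Neural Turing Machines. This is a hedged toy in plain Python, not the presenters' implementation; the names `address`, `read_memory`, `write_memory`, the sharpening factor `beta`, and the two-slot memory are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity with a small constant to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def address(memory, key, beta=5.0):
    """Content-based addressing: softmax over sharpened cosine similarities."""
    scores = [beta * cosine(row, key) for row in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(memory, weights):
    """Soft read: weighted sum of memory rows."""
    width = len(memory[0])
    return [sum(w * row[i] for w, row in zip(weights, memory)) for i in range(width)]


def write_memory(memory, weights, erase, add):
    """Erase-then-add write: each cell becomes cell * (1 - w * e) + w * a."""
    return [
        [cell * (1.0 - w * e) + w * a for cell, e, a in zip(row, erase, add)]
        for w, row in zip(weights, memory)
    ]


# Hypothetical two-slot memory with three-dimensional contents.
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

read_weights = address(memory, key=[0.9, 0.1, 0.0])
recalled = read_memory(memory, read_weights)
print(read_weights)  # slot 0 receives most of the weight
print(recalled)

# Overwrite slot 0 completely using a hard (one-hot) write weighting.
updated = write_memory(memory, [1.0, 0.0], erase=[1.0, 1.0, 1.0], add=[0.0, 0.0, 1.0])
print(updated)  # slot 0 is replaced, slot 1 is untouched
```

Because every step is a smooth function of the weights, the same read and write operations remain differentiable when the key, erase, and add vectors are emitted by a neural controller.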

