From Deep Learning to Deep Reasoning
A tutorial @KDD 2021, August
TL;DR: This tutorial reviews recent
developments that extend the capacity of neural networks toward “learning to
reason” from data, where the task is to determine whether the data entails a
conclusion.
A/Prof Truyen Tran
Dr Vuong Le
Dr Hung Le
Dr Thao Le
Applied AI Institute, Deakin University
(Part A | Part B | Part C)
The rise of big data and big compute has brought modern neural networks to
many walks of digital life, thanks to the relative ease of constructing
large models that scale to the real world. Current successes of
Transformers and self-supervised pretraining on massive data have led
some to believe that deep neural networks will be able to do almost
everything whenever we have data and computational resources. However,
this might not be the case. While neural networks are quick to exploit
surface statistics, they fail miserably to generalize to novel
combinations. Current neural networks do not perform deliberate
reasoning, that is, the capacity to deliberately deduce new knowledge from
contextualized data. This tutorial reviews recent developments that
extend the capacity of neural networks toward “learning to reason” from
data, where the task is to determine whether the data entails a conclusion.
This capacity opens up new ways to generate insights from data through
arbitrary querying in natural language, without the need to
predefine a narrow set of tasks.
This tutorial assumes some familiarity with deep learning.
The tutorial consists of three main parts. Part A covers the
learning-to-reason framework and explains how neural networks can serve as
a strong backbone for reasoning through their natural operations such as
binding, attention, and dynamic computational graphs. We will also
show how neural networks can learn to perform combinatorial algorithms.
Part B goes into
more detail on how neural networks perform reasoning over unstructured
and structured data, and across multiple modalities. Reasoning over
sets, relations, graphs, and time will be explained. Part C reviews more
advanced topics, including neural nets with external memories, learning to reason with limited labels, and recursive reasoning with theory of
mind. Special attention
will be paid to neural memories as a fundamental mechanism to support
reasoning over entities, relations, and even neural programs. Whenever
possible, case studies in text understanding and visual question
answering will be presented.
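As a rough illustration of the attention operation and the memory read mentioned above, the following sketch computes a query-driven soft read over a small set of memory slots using scaled dot-product attention. It is a minimal, dependency-free sketch rather than any specific model covered in the tutorial; the function names (`softmax`, `attention_read`) and the toy keys, values, and query are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_read(query, keys, values):
    """Soft read from memory: weight each value by how well its key matches the query.

    query:  list[float] of length d
    keys:   list of list[float], each of length d (one per memory slot)
    values: list of list[float] (one per memory slot)
    Returns (weights, read_vector).
    """
    d = len(query)
    # Scaled dot-product score between the query and every key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The read vector is the attention-weighted average of the values.
    read = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, read


# Toy memory with three slots (illustrative data, not from the tutorial).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [4.0, 0.0]  # matches slots 0 and 2 more strongly than slot 1

weights, read = attention_read(query, keys, values)
print(weights)
print(read)
```

Because every step is differentiable, the same read can be trained end to end; in a learned model the query, keys, and values would typically be produced by trainable projections rather than fixed by hand.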
Part A: Learning to reason framework (60 mins)
as a prediction skill that can be learnt from data.
answering as zero-shot learning.
network operations for learning to reason:
neural networks, conditional computation & differentiable
as iterative representation refinement & query-driven program synthesis and execution.
- Combinatorics reasoning.
Part B: Reasoning over unstructured and structured data (60 mins)
reasoning, the case of vision-language integration.
as set-set interaction.
dynamic graph constructions
over knowledge graphs.
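To make reasoning over graphs more concrete, here is a minimal sketch of one round of message passing, the basic step behind graph neural networks. It is an assumption-laden illustration, not a method from the tutorial: the fixed 0.5/0.5 averaging rule stands in for the learned message and update functions a real model would use, and the function name `message_passing_round` and the toy chain graph are hypothetical.

```python
def message_passing_round(features, edges):
    """One round of message passing with mean aggregation.

    features: dict mapping node -> list[float] (node state)
    edges:    list of directed (src, dst) pairs
    Returns a new dict of node states; the input is left unchanged.
    """
    dim = len(next(iter(features.values())))
    incoming = {node: [0.0] * dim for node in features}
    counts = {node: 0 for node in features}

    # Each node sends its current state along its outgoing edges.
    for src, dst in edges:
        for j in range(dim):
            incoming[dst][j] += features[src][j]
        counts[dst] += 1

    # Fixed update rule (a stand-in for a learned function):
    # mix the node's own state with the mean of its incoming messages.
    updated = {}
    for node, h in features.items():
        n = max(counts[node], 1)
        updated[node] = [0.5 * h[j] + 0.5 * incoming[node][j] / n for j in range(dim)]
    return updated


# Toy chain a - b - c, with each undirected edge listed in both directions.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
h0 = {"a": [1.0], "b": [0.0], "c": [0.0]}

h1 = message_passing_round(h0, edges)
h2 = message_passing_round(h1, edges)
print(h1)
print(h2)
```

In this toy graph the signal that starts at node `a` reaches node `c` only after two rounds, which shows why multi-hop inference over a graph usually needs several rounds of propagation.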
Part C: Advanced topics (60 mins)
- Reasoning with external memories
- Memory of entities –
memory-augmented neural networks
- Memory of relations with tensors
- Memory of programs & neural
- Learning to reason with fewer labels:
- Data augmentation with analogical
and counterfactual examples
- Question generation
- Self-supervised learning for
- Learning with external knowledge
- Recursive reasoning with neural theory of mind