
From Deep Learning to Deep Reasoning
A tutorial @KDD 2021, August 14th (Virtual).
TL;DR: This tutorial reviews recent
developments to extend the capacity of neural networks to “learning to
reason” from data, where the task is to determine if the data entails a
conclusion.
Presenters
A/Prof Truyen Tran

Dr Vuong Le

Dr Hung Le

Dr Thao Le

Applied AI Institute, Deakin University 
Slides
(Part A  Part B  Part C)
The rise of big data and big compute has brought modern neural networks to
many walks of digital life, thanks to the relative ease of constructing
large models that scale to the real world. Current successes of
Transformers and self-supervised pretraining on massive data have led
some to believe that deep neural networks will be able to do almost
everything whenever we have data and computational resources. However,
this might not be the case. While neural networks are quick to exploit
surface statistics, they fail miserably at generalizing to novel
combinations. Current neural networks do not perform deliberate
reasoning, that is, the capacity to deduce new knowledge from
contextualized data. This tutorial reviews recent developments to
extend the capacity of neural networks to “learning to reason” from
data, where the task is to determine if the data entails a conclusion.
This capacity opens up new ways to generate insights from data through
arbitrary querying in natural language, without the need to
predefine a narrow set of tasks.
Prerequisite: the tutorial assumes some familiarity with deep learning.
Content
The tutorial consists of three main parts. Part A covers the
learning-to-reason framework and explains how neural networks can serve as
a strong backbone for reasoning through their natural operations such as
binding, attention & dynamic computational graphs. We will also
show how neural networks can learn to perform combinatorial algorithms.
Part B goes into more detail on how neural networks perform reasoning
over unstructured and structured data, and across multiple modalities.
Reasoning over sets, relations, graphs and time will be explained.
Part C reviews more advanced topics, including neural nets with external
memories, learning to reason with limited labels, and recursive reasoning
with theory of mind. Special attention will be paid to neural memories as
a fundamental mechanism to support reasoning over entities, relations,
and even neural programs. Whenever possible, case studies in text
understanding and visual question answering will be presented.
Existing related talks
Structure:
Part A: Learning-to-reason framework (60 mins)
 Reasoning as a prediction skill that can be learnt from data.
 Question answering as zero-shot learning.
 Neural network operations for learning to reason:
   Concept-object binding.
   Attention & transformers.
   Dynamic neural networks, conditional computation & differentiable programming.
 Reasoning as iterative representation refinement & query-driven program synthesis and execution.
 Compositional attention networks.
 Combinatorial reasoning.
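As a rough illustration of the attention operation listed under Part A, the sketch below computes scaled dot-product attention for a single query in plain Python. The function names (`softmax`, `attend`) and the toy vectors are made up for this example and are not taken from the tutorial slides; real models learn the query, key and value projections instead of using fixed numbers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query:  list of d floats
    keys:   n lists of d floats
    values: n lists of floats (all the same length)
    Returns (weights, read_out), where read_out is the weighted sum of values.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim_v = len(values[0])
    read_out = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(dim_v)
    ]
    return weights, read_out


# Toy data (hypothetical): the query points in the same direction as the
# first key, so the first value should dominate the read-out.
query = [1.0, 0.0]
keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, read_out = attend(query, keys, values)
```

The query acts as a soft selector over the items: the better a key matches the query, the more its value contributes to the result. This is one way a question can steer which parts of the data are used at each reasoning step.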
Part B: Reasoning over unstructured and structured data (60 mins)
 Cross-modality reasoning, the case of vision-language integration.
 Reasoning as set-set interaction.
   Context processing.
   Dual-attention.
   Conditional set functions.
 Relational reasoning
   Relation networks
   Graph neural networks
     Graph embedding.
     Graph convolutional networks.
     Graph attention.
     Message passing.
   Query-conditioned dynamic graph constructions
   Reasoning over knowledge graphs.
 Temporal reasoning
   Video question answering.
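To make the message passing item under Part B more concrete, here is a minimal sketch of one round of sum-aggregation message passing on an undirected graph. The update rule (add the neighbours' states to the node's own state) is a deliberately simplified, parameter-free assumption; graph neural networks in practice use learned message and update functions. The name `message_passing_step` and the toy graph are hypothetical.

```python
def message_passing_step(node_states, edges):
    """One round of sum-aggregation message passing.

    node_states: dict mapping node name -> list of floats
    edges:       list of undirected (u, v) pairs
    Assumed update rule: new_h[v] = h[v] + sum of h[u] over neighbours u.
    """
    dim = len(next(iter(node_states.values())))
    incoming = {v: [0.0] * dim for v in node_states}
    for u, v in edges:
        for j in range(dim):
            incoming[v][j] += node_states[u][j]  # message u -> v
            incoming[u][j] += node_states[v][j]  # message v -> u
    return {
        v: [h + m for h, m in zip(node_states[v], incoming[v])]
        for v in node_states
    }


# Path graph a - b - c with one-hot initial states (toy data).
states = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.0, 0.0, 1.0],
}
edges = [("a", "b"), ("b", "c")]

after_one = message_passing_step(states, edges)
after_two = message_passing_step(after_one, edges)
```

After one round, node `a` has only heard from its direct neighbour `b`; information from `c` reaches `a` only in the second round. Each round extends a node's view by one hop, which is why multi-step relational reasoning is often modelled as repeated message passing.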
Part C: Advanced topics (60 mins)
 Reasoning with external memories
   Memory of entities: memory-augmented neural networks
   Memory of relations with tensors and graphs
   Memory of programs & neural program construction.
 Learning to reason with fewer labels:
   Data augmentation with analogical and counterfactual examples
   Question generation
   Self-supervised learning for question answering
   Learning with external knowledge graphs
 Recursive reasoning with neural theory of mind.
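The external memory theme of Part C can be sketched as a small slot memory with content-based addressing and an erase-then-add write, in the spirit of memory-augmented neural networks. This is only an illustrative sketch under assumed choices: the helper names (`cosine`, `address`, `read`, `write`), the `sharpness` parameter and the toy memory are hypothetical, and a real model would produce the key, erase and add vectors with a learned controller.

```python
import math


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)


def address(memory, key, sharpness=10.0):
    """Content-based addressing: softmax over scaled cosine similarities."""
    scores = [sharpness * cosine(row, key) for row in memory]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read(memory, weights):
    """Read vector: weighted sum of memory slots."""
    width = len(memory[0])
    return [
        sum(w * row[j] for w, row in zip(weights, memory))
        for j in range(width)
    ]


def write(memory, weights, erase, add):
    """Erase-then-add write: M[i] <- M[i] * (1 - w_i * erase) + w_i * add."""
    return [
        [m * (1.0 - w * e) + w * a for m, e, a in zip(row, erase, add)]
        for w, row in zip(weights, memory)
    ]


# Toy memory with three slots (hypothetical values).
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w = address(memory, key=[0.0, 1.0, 0.0])  # should focus on slot 1
r = read(memory, w)
new_memory = write(memory, w, erase=[1.0, 1.0, 1.0], add=[0.5, 0.5, 0.0])
```

Because addressing, reading and writing are all smooth functions of the weights, the whole read-write loop stays differentiable and can be trained end to end together with the network that drives it.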
