
Deep Learning 1.0 and Beyond
A tutorial @IEEE SSCI 2020, Canberra, December 1st (Virtual). Slides (Part I; Part II)
Videos (Part IA, Part IB; Part IIA, Part IIB)
Deep
Learning has taken the digital world by storm. As a general purpose
technology, it is now present in all walks of life. Although the
fundamental developments in methodology have been slowing down in the
past few years, applications are flourishing with major breakthroughs
in Computer Vision, NLP and Biomedical Sciences. The primary successes
can be attributed to the availability of large labelled data, powerful
GPU servers and programming frameworks, and advances in neural
architecture engineering. This combination enables rapid construction
of large, efficient neural networks that scale to the real world. But
the fundamental questions of unsupervised learning, deep reasoning, and
rapid contextual adaptation remain unsolved. We shall call what we
currently have Deep Learning 1.0, and the next possible breakthroughs
Deep Learning 2.0.
Prerequisite: the
tutorial assumes some familiarity with deep learning.
Content
This
tutorial goes through recent advances in methods and applications of
deep learning, and sketches a possible picture of the future. The
tutorial is broadly divided into two parts: Deep Learning 1.0 and Deep
Learning 2.0. In the first part, I start with the three classic
architectures: feedforward, recurrent and convolutional neural
networks. The tutorial then briefly reviews important concepts such as
attention, fast weights and architecture search. Then I cover the two
architectures that are at the frontier of current deep learning
research: the Transformer family and Graph Neural Networks. The last
topic is deep unsupervised learning, including classic models such as the
autoencoder and recent algorithms such as BERT and self-supervised
techniques. In the second part of the tutorial, I discuss the reason
why we need to move beyond the current deep learning paradigm, and then
walk through emerging topics that may shape the next phase of research.
In particular, I present a general dual-process cognitive architecture
which can be implemented as a neural system. One of the main components
in the architecture is the memory subsystem, which extends the
capacity of the current fast parallel inference neural models. The
capacity is realised through reasoning engines that carry out
deliberative sequential inference. Finally, I cover the emerging topic
of neural theory of mind, which is concerned with the social dimension
of multi-agent learning systems.
Structure
Part I: Deep Learning 1.0 (105 mins)
 Introduction (10 mins)
 Classic models and concepts (25 mins)
 Transformers (20 mins)
 Graph neural networks (20 mins)
 Deep unsupervised learning (20 mins)
Part II: Deep Learning 2.0 (90 mins)
 Introduction (10 mins)
 A system view (20 mins)
 Neural memories (20 mins)
 Neural reasoning (20 mins)
 Neural theory of mind (20 mins)

References

Anonymous, “Neural spatio-temporal reasoning with object-centric self-supervised learning”, https://openreview.net/pdf?id=rEaz5uTcL6Q

Bello, Irwan, et al. "Neural optimizer search with reinforcement learning." ICML (2017).

Bengio, Yoshua, Aaron Courville, and Pascal Vincent. "Representation learning: A review and new perspectives." IEEE Transactions on Pattern Analysis and Machine Intelligence 35.8 (2013): 1798-1828.

Bottou, Léon. "From machine learning to machine reasoning." Machine Learning 94.2 (2014): 133-149.

Dehghani, Mostafa, et al. "Universal Transformers." International Conference on Learning Representations. 2018.

Kien Do, Truyen Tran, and Svetha Venkatesh. "Graph Transformation Policy Network for Chemical Reaction Prediction." KDD’19.

Kien Do, Truyen Tran, Svetha Venkatesh, “Learning deep matrix representations”, arXiv preprint arXiv:1703.01454

Gilmer, Justin, et al. "Neural message passing for quantum chemistry." ICML (2017).

Ha, David, Andrew Dai, and Quoc V. Le. "Hypernetworks." ICLR (2017).

Heskes, Tom. "Stable fixed points of loopy belief propagation are local minima of the Bethe free energy." Advances in Neural Information Processing Systems. 2003.

Hudson, Drew A., and Christopher D. Manning. "Compositional attention networks for machine reasoning." ICLR (2018).

Karras, T., Aila, T., Laine, S., & Lehtinen, J. "Progressive growing of GANs for improved quality, stability, and variation." ICLR (2018).

Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM (JACM) 44.5 (1997): 697-725.

Hung Le, Truyen Tran, Svetha Venkatesh, “Self-attentive associative memory”, ICML (2020).

Hung Le, Truyen Tran, Svetha Venkatesh, “Neural stored-program memory”, ICLR (2020).

Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran, “Dynamic Language Binding in Relational Visual Reasoning”, IJCAI (2020).

Le-Khac, Phuc H., Graham Healy, and Alan F. Smeaton. "Contrastive Representation Learning: A Framework and Review." arXiv preprint arXiv:2010.05113 (2020).

Liu, Xiao, et al. "Self-supervised learning: Generative or contrastive." arXiv preprint arXiv:2006.08218 (2020).

Marcus, Gary. "Deep learning: A critical appraisal." arXiv preprint arXiv:1801.00631 (2018).

Mao, Jiayuan, et al. "The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision." International Conference on Learning Representations (2019).

Nguyen, Dung, et al. "Theory of Mind with Guilt Aversion Facilitates Cooperative Reinforcement Learning." Asian Conference on Machine Learning (2020).

Penmatsa, Aravind, Kevin H. Wang, and Eric Gouaux. "X-ray structure of dopamine transporter elucidates antidepressant mechanism." Nature 503.7474 (2013): 85-90.

Pham, Trang, et al. "Column Networks for Collective Classification." AAAI (2017).

Ramsauer, Hubert, et al. "Hopfield networks is all you need." arXiv preprint arXiv:2008.02217 (2020).

Rabinowitz, Neil C., et al. "Machine theory of mind." ICML (2018).

Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. "End-to-end memory networks." Advances in Neural Information Processing Systems (2015).

Tay, Yi, et al. "Efficient transformers: A survey." arXiv preprint arXiv:2009.06732 (2020).

Xie, Tian, and Jeffrey C. Grossman. "Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties." Physical Review Letters 120.14 (2018): 145301.

You, Jiaxuan, et al. "GraphRNN: Generating realistic graphs with deep autoregressive models." ICML (2018).
