Deep Learning 1.0 and Beyond

A tutorial @IEEE SSCI 2020, Canberra, December 1st (Virtual).

Slides (Part I; Part II)

Videos (Part IA, Part IB; Part IIA, Part IIB)

Deep Learning has taken the digital world by storm. As a general-purpose technology, it is now present in all walks of life. Although fundamental developments in methodology have slowed down in the past few years, applications are flourishing, with major breakthroughs in Computer Vision, NLP and Biomedical Sciences. The primary successes can be attributed to the availability of large labelled datasets, powerful GPU servers and programming frameworks, and advances in neural architecture engineering. This combination enables rapid construction of large, efficient neural networks that scale to the real world. But the fundamental questions of unsupervised learning, deep reasoning, and rapid contextual adaptation remain unsolved. We shall call what we currently have Deep Learning 1.0, and the next possible breakthroughs Deep Learning 2.0.

Prerequisite: the tutorial assumes some familiarity with deep learning.


This tutorial goes through recent advances in methods and applications of deep learning, and sketches a possible picture of the future. It is broadly divided into two parts: Deep Learning 1.0 and Deep Learning 2.0. In the first part, I start with the three classic architectures: feed-forward, recurrent and convolutional neural networks. The tutorial then briefly reviews important concepts such as attention, fast weights and architecture search. Then I cover the two architectures that are at the frontier of current deep learning research: the Transformer family and Graph Neural Networks. The last topic is deep unsupervised learning, including classic methods such as autoencoders and more recent ones such as BERT and self-supervised techniques.

In the second part of the tutorial, I discuss why we need to move beyond the current deep learning paradigm, and then walk through emerging topics that may shape the next phase of research. In particular, I present a general dual-process cognitive architecture which can be implemented as a neural system. One of the main components of the architecture is the memory sub-system, which extends the capacity of current fast, parallel-inference neural models. The capacity is realised through reasoning engines that carry out deliberative sequential inference. Finally, I cover the emerging topic of neural theory of mind, which is concerned with the social dimension of multi-agent learning systems.


Part I: Deep Learning 1.0 (105 mins)

  • Introduction (10 mins)
  • Classic models and concepts (25 mins)
  • Transformers (20 mins)
  • Graph neural networks (20 mins)
  • Deep unsupervised learning (20 mins)
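
As a small illustration of the attention concept that underlies the Transformer family listed above, here is a minimal sketch of scaled dot-product attention in plain Python. It is not taken from the tutorial slides: the function names `softmax` and `attention` and the toy `tokens` input are assumptions made for this example, and real Transformers add learned projections, multiple heads and positional information on top of this core step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (illustrative only).

    Each query is scored against every key, the scores are normalised with
    softmax, and the output is the weighted average of the value vectors.
    """
    dim = len(keys[0])
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(out_dim)]
        )
    return outputs


# Toy self-attention: three 2-d tokens attend to each other.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
```

Because the weights are non-negative and sum to one, every output vector is a convex combination of the value vectors, which is why attention is often described as a soft, differentiable lookup.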

Part II: Deep Learning 2.0 (90 mins)

  • Introduction (10 mins)
  • A system view (20 mins)
  • Neural memories (20 mins)
  • Neural reasoning (20 mins)
  • Neural theory of mind (20 mins)
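
To make the idea of a neural memory queried over several sequential steps more concrete, the following is a minimal sketch loosely in the spirit of multi-hop, content-addressed memory reads (as in end-to-end memory networks). It is an assumption-laden toy, not the architecture presented in the tutorial: `read_memory`, `multi_hop` and the example keys and values are hypothetical names, and the learned embeddings of a real model are left out.

```python
import math


def read_memory(query, memory_keys, memory_values):
    """Content-based read: softmax over query-key similarity, then a
    weighted sum of the stored values. Returns (readout, weights)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in memory_keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(memory_values[0])
    readout = [
        sum(w * val[j] for w, val in zip(weights, memory_values))
        for j in range(dim)
    ]
    return readout, weights


def multi_hop(query, memory_keys, memory_values, hops=2):
    """Sequential inference: each hop reads the memory and adds the
    readout to the controller state before the next read."""
    state = list(query)
    for _ in range(hops):
        readout, _ = read_memory(state, memory_keys, memory_values)
        state = [s + r for s, r in zip(state, readout)]
    return state


# Two memory slots with nearly orthogonal keys (toy values).
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
readout, weights = read_memory([1.0, 0.0], keys, values)
final_state = multi_hop([1.0, 0.0], keys, values, hops=2)
```

The single read is one parallel lookup, while `multi_hop` chains several lookups so that later reads depend on what earlier reads retrieved, a simple stand-in for deliberative, step-by-step inference.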


  1. Anonymous, “Neural spatio-temporal reasoning with object-centric self-supervised learning”,

  2. Bello, Irwan, et al. "Neural optimizer search with reinforcement learning." ICML (2017).

  3. Bengio, Yoshua, Aaron Courville, and Pascal Vincent. "Representation learning: A review and new perspectives." IEEE transactions on pattern analysis and machine intelligence 35.8 (2013): 1798-1828.

  4. Bottou, Léon. "From machine learning to machine reasoning." Machine learning 94.2 (2014): 133-149.

  5. Dehghani, Mostafa, et al. "Universal Transformers." International Conference on Learning Representations. 2018.

  6. Do, Kien, Truyen Tran, and Svetha Venkatesh. "Graph Transformation Policy Network for Chemical Reaction Prediction." KDD (2019).

  7. Do, Kien, Truyen Tran, and Svetha Venkatesh. "Learning deep matrix representations." arXiv preprint arXiv:1703.01454 (2017).

  8. Gilmer, Justin, et al. "Neural message passing for quantum chemistry." ICML (2017).

  9. Ha, David, Andrew Dai, and Quoc V. Le. "Hypernetworks." ICLR (2017).

  10. Heskes, Tom. "Stable fixed points of loopy belief propagation are local minima of the bethe free energy." Advances in neural information processing systems. 2003.

  11. Hudson, Drew A., and Christopher D. Manning. "Compositional attention networks for machine reasoning." ICLR (2018).

  12. Karras, Tero, et al. "Progressive growing of GANs for improved quality, stability, and variation." ICLR (2018).

  13. Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM (JACM) 44.5 (1997): 697-725.

  14. Le, Hung, Truyen Tran, and Svetha Venkatesh. "Self-attentive associative memory." ICML (2020).

  15. Le, Hung, Truyen Tran, and Svetha Venkatesh. "Neural stored-program memory." ICLR (2020).

  16. Le, Thao Minh, Vuong Le, Svetha Venkatesh, and Truyen Tran. "Dynamic Language Binding in Relational Visual Reasoning." IJCAI (2020).

  17. Le-Khac, Phuc H., Graham Healy, and Alan F. Smeaton. "Contrastive Representation Learning: A Framework and Review." arXiv preprint arXiv:2010.05113 (2020).

  18. Liu, Xiao, et al. "Self-supervised learning: Generative or contrastive." arXiv preprint arXiv:2006.08218 (2020).

  19. Marcus, Gary. "Deep learning: A critical appraisal." arXiv preprint arXiv:1801.00631 (2018).

  20. Mao, Jiayuan, et al. "The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision." International Conference on Learning Representations (2019).

  21. Nguyen, Dung, et al. "Theory of Mind with Guilt Aversion Facilitates Cooperative Reinforcement Learning." Asian Conference on Machine Learning (2020).

  22. Penmatsa, Aravind, Kevin H. Wang, and Eric Gouaux. "X-ray structure of dopamine transporter elucidates antidepressant mechanism." Nature 503.7474 (2013): 85-90.

  23. Pham, Trang, et al. "Column Networks for Collective Classification." AAAI (2017).

  24. Ramsauer, Hubert, et al. "Hopfield networks is all you need." arXiv preprint arXiv:2008.02217 (2020).

  25. Rabinowitz, Neil C., et al. "Machine theory of mind." ICML (2018).

  26. Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. "End-to-end memory networks." Advances in neural information processing systems (2015).

  27. Tay, Yi, et al. "Efficient transformers: A survey." arXiv preprint arXiv:2009.06732 (2020).

  28. Xie, Tian, and Jeffrey C. Grossman. "Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties." Physical review letters 120.14 (2018): 145301.

  29. You, Jiaxuan, et al. "GraphRNN: Generating realistic graphs with deep auto-regressive models." ICML (2018).