
[Source: rdn-consulting]  

AI Future Projects

 Projects
» New inductive biases in deep learning
» Memory architectures for neural networks
» Indirection mechanisms for generalization
» Compositional reasoning in vision-language
» Collaborative priors for LLM multi-agents
» Learning for relational and causal reasoning
» Theory of mind architectures
» Efficient exploration of combinatorial spaces
» Theory of mind in LLMs
» Scaling with sparse mixture of experts
» Theory-informed machine learning
» Representing and reasoning over noisy data
» Structured reasoning in video
» Understanding human behaviours in video
» Visual question answering
» Video dialog
 
Projects (Finished)
      
New inductive biases in deep learning

This research project explores novel architectural designs for neural networks, drawing direct inspiration from biological neural systems. By studying the structural organization of the brain, particularly the columnar architecture of the neocortex and the routing mechanisms of the thalamus, we develop modular networks optimized for diverse data types including matrices, tensors, graphs, and relational data. Our approach integrates key cognitive mechanisms such as working memory for enhanced problem-solving capabilities and episodic memory for temporal information integration. This biomimetic framework aims to improve neural network performance by incorporating proven solutions from neuroscience, potentially leading to more robust and adaptable AI systems.

Column networks
Column Networks, inspired by cortical columns, for multi-relational learning.

Memory architectures for neural networks

While current deep learning systems demonstrate remarkable capabilities in pattern recognition, they struggle with higher-order cognitive tasks like complex system manipulation, rapid adaptation, and maintaining coherent long-term interactions. This research addresses these limitations by developing novel memory architectures that move beyond simple pattern matching. Our approach implements explicit memory systems capable of robust generalization, reduced reliance on rote memorization, and program-like information storage. This framework forms the foundation of a comprehensive cognitive architecture that seamlessly integrates learning, reasoning, and creative processes. By incorporating these advanced memory mechanisms, we aim to bridge the gap between current AI capabilities and human-like cognitive flexibility and contextual understanding.

Variational memory encoder decoder
Generative models with variational memory
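The explicit memory systems described above typically rest on content-based addressing: a controller emits a query vector, attends over a bank of memory slots, and reads back a weighted mixture. A minimal NumPy sketch of such a read (the function name, slot count, and dimensions are illustrative assumptions, not the project's actual architecture):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(memory, query):
    """Content-based addressing: attend over memory slots with a query vector."""
    scores = memory @ query / np.sqrt(query.size)   # similarity per slot
    weights = softmax(scores)                       # soft address over slots
    return weights @ memory, weights                # weighted read-out

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 16))   # 8 memory slots, 16-dim each (toy sizes)
q = rng.normal(size=16)
read, w = memory_read(M, q)
assert np.isclose(w.sum(), 1.0)
```

Writes work analogously, with the same soft address used to erase and add content to the selected slots.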

Compositional reasoning in vision-language domains
Compositionality is pervasive in nature, language, and our thought processes, enabling us to understand complex concepts by combining simpler elements. However, these compositional structures must be uncovered from raw signals and texts through sophisticated AI methods. Our research develops neural architectures that learn to identify and manipulate compositional patterns across visual and linguistic domains. By integrating computer vision and natural language processing, we model how basic visual elements combine into objects, scenes, and actions, while simultaneously capturing how words form phrases, sentences, and narratives. This compositional understanding enables more robust and interpretable AI systems capable of human-like reasoning across modalities.
 

A system for compositional reasoning
Compositional reasoning over complex queries.

Learning for relational and causal reasoning
This research investigates the fundamental role of relational and causal structures across natural phenomena, linguistic systems, and cognitive processes. By recognizing that causality and relationships are core organizing principles in both physical and abstract domains, we develop novel computational approaches to capture and reason about these structures. Our work spans multiple levels, from identifying basic causal mechanisms in natural systems to understanding how relational thinking shapes language acquisition and human reasoning. Through advanced machine learning techniques, we aim to create AI systems that can learn and leverage these inherent structural patterns, leading to more sophisticated understanding and decision-making capabilities comparable to human cognition.

Relational Dynamic Memory Network
Relational Dynamic Memory Network, a model for detecting relations between graphical structures.

Indirection mechanisms for better generalization
This research explores fundamental mechanisms enabling human-like generalization and abstraction in artificial intelligence systems. By investigating how humans effortlessly transfer knowledge across disparate domains through analogical reasoning, we develop new computational frameworks that support sophisticated abstraction capabilities. Our approach encompasses multiple dimensions: automated discovery of objects and their relationships, implementation of functional programming principles, development of indirect reference mechanisms, and formulation of complex analogies. These components work together to create AI systems capable of symbolic manipulation and abstract reasoning, ultimately enabling the kind of extreme generalization characteristic of human intelligence. This work aims to bridge the gap between current AI's domain-specific competence and human-level general intelligence.

Analogical reasoning for IQ-test questions
A system that abstracts away visual details and focuses on relations between images via an indirection mechanism, enabling it to solve IQ-test problems.
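One way to picture the indirection idea: discard the content of individual items and operate only on the relations between them, which is roughly what solving "A is to B as C is to ?" requires. A toy sketch under that assumption (difference vectors stand in for learned relation embeddings; `solve_analogy` is our own illustrative name, not the project's model):

```python
import numpy as np

def relation(a, b):
    """Abstract away item content: keep only the relation (here, a difference vector)."""
    return b - a

def solve_analogy(a, b, c, candidates):
    """A:B :: C:? -- pick the candidate whose relation to C best matches relation(A, B)."""
    target = relation(a, b)
    dists = [np.linalg.norm(relation(c, x) - target) for x in candidates]
    return int(np.argmin(dists))

# toy 'panels' represented as feature vectors
a, b = np.array([1., 0.]), np.array([2., 0.])      # relation: +1 on axis 0
c = np.array([0., 5.])
cands = [np.array([0., 6.]), np.array([1., 5.]), np.array([0., 4.])]
assert solve_analogy(a, b, c, cands) == 1          # [1., 5.] = c + (+1, 0)
```

The key point is that `solve_analogy` never compares the panels themselves, only the relation vectors, which is the abstraction step the indirection mechanism provides.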

Theory of mind architectures

This research develops artificial intelligence systems capable of understanding and attributing mental states to others—a fundamental aspect of human social cognition. Drawing from developmental psychology, cognitive science, and anthropology, we design architectures that enable AI agents to engage in sophisticated social interactions. Our approach encompasses multiple innovations: role-learning frameworks for cooperative agents, guilt-aversion mechanisms to enhance cooperation, and memory-augmented neural networks for processing long-term social experiences. Particular emphasis is placed on developing false-belief understanding, allowing agents to recognize that others may hold beliefs incongruent with reality. The project aims to create more socially intelligent AI systems that can effectively collaborate in team environments. 

Theory of mind in agents

A multi-agent system equipped with social-psychological reasoning.


Efficient exploration of combinatorial spaces

This research addresses the fundamental challenge of navigating vast combinatorial search spaces in critical domains including structural design, materials science, drug discovery, and network optimization. Given the exponential growth of possible solutions with problem size, traditional exhaustive search methods become intractable. We develop novel generative AI approaches that intelligently balance exploration of new possibilities, exploitation of promising solutions, and maintenance of solution diversity. Our methods employ advanced sampling strategies and learning algorithms to efficiently traverse these complex spaces, enabling practical solutions to previously intractable problems. This work aims to accelerate discovery and optimization processes across multiple scientific and engineering domains.

Crystal structures generated using Generative AI

Crystal structures generated and optimized by several generative AI techniques.
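The explore/exploit/diversity balance described above can be sketched with a toy pool-based search over bitstrings, a stand-in for any discrete design space. The 20% random-restart rate, pool size, and single-bit mutation are arbitrary illustrative choices, not the project's actual generative method:

```python
import random

def search(score, n_bits=12, iters=300, pool_size=8, seed=0):
    """Keep a small pool of candidate solutions, balancing exploitation
    (mutate the current best), exploration (occasional random restarts),
    and diversity (no duplicate pool members)."""
    rng = random.Random(seed)
    def rand_bits(): return tuple(rng.randint(0, 1) for _ in range(n_bits))
    def mutate(x):
        i = rng.randrange(n_bits)                  # flip one random bit
        return x[:i] + (1 - x[i],) + x[i + 1:]

    pool = [rand_bits() for _ in range(pool_size)]
    for _ in range(iters):
        best = max(pool, key=score)
        cand = rand_bits() if rng.random() < 0.2 else mutate(best)
        worst = min(pool, key=score)
        if cand not in pool and score(cand) > score(worst):
            pool[pool.index(worst)] = cand         # replace the weakest member
    return max(pool, key=score)

# toy objective: maximize the number of ones in the bitstring
best = search(score=sum)
assert sum(best) >= 10
```

In the real setting `score` would be an expensive property evaluation (e.g. a predicted crystal energy) and the candidate generator a learned generative model rather than bit flips.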


Collaborative priors for LLM-powered multi-agents

This research develops novel frameworks to enhance collaboration between Large Language Model (LLM) powered agents, particularly in challenging social dilemma scenarios. While LLMs demonstrate remarkable individual capabilities, they lack inherent collaborative mechanisms. We design specialized priors that guide these agents toward effective team cooperation and long-term goal achievement. Our approach implements two key methodologies: strategic prompting and targeted interventions, leveraging the rich cooperative concepts embedded in LLMs' pre-training. This framework enables agents to make decisions that balance individual actions with team objectives, creating more effective multi-agent systems capable of handling complex social interactions and collective decision-making scenarios.

LLM-agent prompted with social priors

LLM-agent prompted with social priors.
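Of the two methodologies, strategic prompting is the simpler to illustrate: a cooperative prior is prepended to the agent's task prompt so its decision is conditioned on team objectives. A toy sketch (the prior text and function name are our own illustrations, not the project's prompts):

```python
def with_social_prior(task_prompt,
                      prior="You are part of a team; weigh long-term group payoff "
                            "over short-term individual gain."):
    """Prepend a cooperative prior so the LLM agent conditions on team objectives."""
    return f"{prior}\n\n{task_prompt}"

# a classic social-dilemma choice, framed for the agent
p = with_social_prior("Choose: defect (+2 for you now) or cooperate (+1 for every agent).")
assert p.startswith("You are part of a team")
```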


Theory of mind in LLMs

This research investigates and enhances Large Language Models' capacity to understand and reason about mental states of others—a capability not inherently developed through standard training processes. While LLMs excel at text compression and instruction following, their ability to model others' beliefs, intentions, and knowledge states remains limited. We analyze how these models currently represent and process social understanding, and develop novel architectures and training approaches to explicitly incorporate theory of mind capabilities. Our work spans multiple dimensions: mental state attribution, belief modeling, intention recognition, and social reasoning. The project aims to create more socially intelligent language models that can better understand and navigate human interactions.

Theory of mind with LLMs

DALL·E 3 illustration of Theory of Mind in LLMs.


Scaling with sparse mixture of experts

This research investigates the fundamental properties and optimization of Sparse Mixture of Experts (SMoE) architectures for Large Language Model training. While SMoE offers promising scalability benefits, its operational dynamics remain incompletely understood. Our project conducts systematic analysis of critical design elements: token representation strategies, router indexing mechanisms, and methods to prevent mode collapse. We examine how different architectural choices affect model performance, routing efficiency, and computational resource utilization. Special attention is given to understanding noise effects on routing decisions and expert specialization. This work aims to establish theoretical foundations and practical guidelines for building more efficient and robust SMoE-based language models.
 

Discrete representation for MoE

Discrete representation for MoE.
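At the core of an SMoE layer is the router: each token's representation is scored against every expert, only the top-k experts are activated, and their outputs are mixed with renormalized gate weights. A minimal NumPy sketch of top-k routing (the shapes and k=2 are illustrative assumptions, not the project's configuration):

```python
import numpy as np

def top_k_routing(tokens, router_w, k=2):
    """Route each token to its top-k experts; gates renormalized over the chosen k."""
    logits = tokens @ router_w                       # [n_tokens, n_experts]
    top = np.argsort(-logits, axis=1)[:, :k]         # indices of the k best experts
    picked = np.take_along_axis(logits, top, axis=1)
    e = np.exp(picked - picked.max(axis=1, keepdims=True))
    gates = e / e.sum(axis=1, keepdims=True)         # softmax over selected experts only
    return top, gates

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # 4 tokens, 8-dim
W = rng.normal(size=(8, 6))        # router weights for 6 experts
experts, gates = top_k_routing(x, W)
assert experts.shape == (4, 2)
assert np.allclose(gates.sum(axis=1), 1.0)
```

The design questions the project studies (noisy routing, mode collapse, expert specialization) all concern how `logits` are produced and perturbed before this top-k selection.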


Representing and reasoning over noisy data: Thinking, fast and slow
This research develops a unified framework that mimics the "thinking fast and slow" patterns in humans for analyzing complex, multimodal sensor data streams in large-scale systems. We address the fundamental challenge of integrating diverse data types—including text, audio, video, and sensor feeds—into coherent representations that support long-term reasoning. Our approach combines self-supervised learning with sophisticated memory architectures to process time-varying, noisy information streams. The framework enables effective pattern recognition and prediction across extended timeframes, with applications in cyber threat detection, situational awareness, and intelligent system monitoring. By creating scalable methods for handling noisy, heterogeneous data, we aim to enhance decision-making capabilities in complex networked environments.

Partners: Australian Department of Defence

Duration: 2022-2025


A general framework for thinking fast and slow in dealing with noisy sensors.


Structured reasoning in video
This research aims to uncover and leverage inherent structural patterns in video content, moving beyond pixel-level analysis to enable sophisticated reasoning about events, objects, and narratives. The project encompasses three key components: developing human-centric models for understanding behavioral context, implementing visual abductive reasoning to explain observed events, and creating predictive frameworks for future event states. Through these approaches, we enhance AI systems' ability to comprehend complex human interactions and temporal relationships in visual scenes. By incorporating commonsense knowledge and structured reasoning mechanisms, we aim to bridge the gap between simple pattern recognition and human-like understanding of visual narratives.

COMPUTER architecture for human activity recognition

An architecture to model human activity in context.


Human behaviour understanding in video

This research develops advanced AI systems for comprehensively analyzing human behavior in video data across diverse environmental contexts. Our approach integrates multiple levels of analysis: trajectory modeling, social interaction patterns, causal inference of trigger events, and prediction of actions and intentions. We incorporate fundamental human behavioral principles as inductive biases, including goal-directed behavior, environmental affordances, and commonsense reasoning. The project culminates in developing a multimodal foundation model that seamlessly processes video, text, and object information, grounding abstract knowledge in concrete visual observations. This framework enables deeper understanding of human behavior in both static and dynamic camera scenarios. 

Partners: iCetana

Anomaly detection with skeleton trajectories
Detecting anomalies in video using skeleton trajectories (last row).

Visual question answering 

This research advances AI systems' ability to comprehend and answer natural language questions about visual content, bridging low-level pattern recognition with high-level symbolic reasoning. We develop dynamic computational architectures that enable iterative reasoning processes, guided by linguistic queries to analyze visual scenes. The project encompasses several key innovations: query-specific neural architectures for spatio-temporal object interaction analysis, frameworks for understanding complex human-object relationships and events, and novel multimodal prompting strategies for few-shot learning in Large Vision Language Models. This work aims to create more sophisticated visual reasoning systems capable of handling complex, multi-step queries about visual content.

Answering question about video

Answering questions about a video.

Video dialog

This research develops advanced systems for natural conversations about video content of any duration. We create neural-symbolic architectures that combine visual understanding with dialogue capabilities, enabling AI to engage in meaningful discussions about video content. Our approach parses complex video sequences into structured representations of object trajectories and their interactions, maintaining dynamic dialogue states that evolve with conversation. The system employs sophisticated neural-symbolic reasoning mechanisms to track object relationships across space and time, while managing conversation history to ensure contextually coherent responses. By integrating object-oriented representations with self-attention mechanisms, we enable comprehensive understanding of long-range temporal dependencies and complex narratives in video content. 

A model for multi-turn video dialog

A model for multi-turn video dialog.