
[Source: rdn-consulting]  

AI Future Projects

 Projects
» New inductive biases in deep learning
» Memory architectures for neural networks
» Indirection mechanisms for generalization
» Compositional reasoning in vision-language
» Collaborative priors for LLM multi-agents
» Learning for relational and causal reasoning
» Theory of mind architectures
» Efficient exploration of combinatorial spaces
» Theory of mind in LLMs
» Scaling with sparse mixture of experts
» Theory-informed machine learning
» Representing and reasoning over noisy data
» Structured reasoning in video
» Understanding human behaviours in video
» Visual question answering
» Video dialog
 
Projects (Finished)
      
New inductive biases in deep learning

This research project explores novel architectural designs for neural networks, drawing direct inspiration from biological neural systems. By studying the structural organization of the brain, particularly the columnar architecture of the neocortex and the routing mechanisms of the thalamus, we develop modular networks optimized for diverse data types including matrices, tensors, graphs, and relational data. Our approach integrates key cognitive mechanisms such as working memory for enhanced problem-solving capabilities and episodic memory for temporal information integration. This biomimetic framework aims to improve neural network performance by incorporating proven solutions from neuroscience, potentially leading to more robust and adaptable AI systems.

Column networks
Column Networks, inspired by cortical columns, for multi-relational learning.

Memory architectures for neural networks

While current deep learning systems demonstrate remarkable capabilities in pattern recognition, they struggle with higher-order cognitive tasks like complex system manipulation, rapid adaptation, and maintaining coherent long-term interactions. This research addresses these limitations by developing novel memory architectures that move beyond simple pattern matching. Our approach implements explicit memory systems capable of robust generalization, reduced reliance on rote memorization, and program-like information storage. This framework forms the foundation of a comprehensive cognitive architecture that seamlessly integrates learning, reasoning, and creative processes. By incorporating these advanced memory mechanisms, we aim to bridge the gap between current AI capabilities and human-like cognitive flexibility and contextual understanding.

Variational memory encoder decoder
Generative models with variational memory
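The explicit memory systems described above typically rest on content-based addressing: a controller emits a query vector, attends over a bank of memory slots, and reads back a weighted mixture. A minimal NumPy sketch of such a read (the function name, slot count, and dimensions are illustrative assumptions, not the project's actual architecture):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(memory, query):
    """Content-based addressing: attend over memory slots with a query vector."""
    scores = memory @ query / np.sqrt(query.size)   # similarity per slot
    weights = softmax(scores)                       # soft address over slots
    return weights @ memory, weights                # weighted read-out

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 16))   # 8 memory slots, 16-dim each (toy sizes)
q = rng.normal(size=16)
read, w = memory_read(M, q)
assert np.isclose(w.sum(), 1.0)
```

Writes work analogously, with the same soft address used to erase and add content to the selected slots.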

Compositional reasoning in vision-language domains
Compositionality is pervasive in nature, language, and our thought processes, enabling us to understand complex concepts by combining simpler elements. However, these compositional structures must be uncovered from raw signals and texts through sophisticated AI methods. Our research develops neural architectures that learn to identify and manipulate compositional patterns across visual and linguistic domains. By integrating computer vision and natural language processing, we model how basic visual elements combine into objects, scenes, and actions, while simultaneously capturing how words form phrases, sentences, and narratives. This compositional understanding enables more robust and interpretable AI systems capable of human-like reasoning across modalities.
 

A system for compositional reasoning
Compositional reasoning over complex queries.

Learning for relational and causal reasoning
This research investigates the fundamental role of relational and causal structures across natural phenomena, linguistic systems, and cognitive processes. By recognizing that causality and relationships are core organizing principles in both physical and abstract domains, we develop novel computational approaches to capture and reason about these structures. Our work spans multiple levels, from identifying basic causal mechanisms in natural systems to understanding how relational thinking shapes language acquisition and human reasoning. Through advanced machine learning techniques, we aim to create AI systems that can learn and leverage these inherent structural patterns, leading to more sophisticated understanding and decision-making capabilities comparable to human cognition.

Relational Dynamic Memory Network
Relational Dynamic Memory Network, a model for detecting relations between graphical structures.

Indirection mechanisms for better generalization
This research explores fundamental mechanisms enabling human-like generalization and abstraction in artificial intelligence systems. By investigating how humans effortlessly transfer knowledge across disparate domains through analogical reasoning, we develop new computational frameworks that support sophisticated abstraction capabilities. Our approach encompasses multiple dimensions: automated discovery of objects and their relationships, implementation of functional programming principles, development of indirect reference mechanisms, and formulation of complex analogies. These components work together to create AI systems capable of symbolic manipulation and abstract reasoning, ultimately enabling the kind of extreme generalization characteristic of human intelligence. This work aims to bridge the gap between current AI's domain-specific competence and human-level general intelligence.

Analogical reasoning for IQ-test questions
A system that abstracts away visual details and focuses on relations between images via an indirection mechanism, enabling it to solve IQ-test problems.
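One way to picture the indirection idea: discard the content of individual items and operate only on the relations between them, which is roughly what solving "A is to B as C is to ?" requires. A toy sketch under that assumption (difference vectors stand in for learned relation embeddings; `solve_analogy` is our own illustrative name, not the project's model):

```python
import numpy as np

def relation(a, b):
    """Abstract away item content: keep only the relation (here, a difference vector)."""
    return b - a

def solve_analogy(a, b, c, candidates):
    """A:B :: C:? -- pick the candidate whose relation to C best matches relation(A, B)."""
    target = relation(a, b)
    dists = [np.linalg.norm(relation(c, x) - target) for x in candidates]
    return int(np.argmin(dists))

# toy 'panels' represented as feature vectors
a, b = np.array([1., 0.]), np.array([2., 0.])      # relation: +1 on axis 0
c = np.array([0., 5.])
cands = [np.array([0., 6.]), np.array([1., 5.]), np.array([0., 4.])]
assert solve_analogy(a, b, c, cands) == 1          # [1., 5.] = c + (+1, 0)
```

The key point is that `solve_analogy` never compares the panels themselves, only the relation vectors, which is the abstraction step the indirection mechanism provides.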

Theory of mind architectures

This research develops artificial intelligence systems capable of understanding and attributing mental states to others—a fundamental aspect of human social cognition. Drawing from developmental psychology, cognitive science, and anthropology, we design architectures that enable AI agents to engage in sophisticated social interactions. Our approach encompasses multiple innovations: role-learning frameworks for cooperative agents, guilt-aversion mechanisms to enhance cooperation, and memory-augmented neural networks for processing long-term social experiences. Particular emphasis is placed on developing false-belief understanding, allowing agents to recognize that others may hold beliefs incongruent with reality. The project aims to create more socially intelligent AI systems that can effectively collaborate in team environments. 

Theory of mind in agents

A multi-agent system equipped with social-psychological reasoning.


Efficient exploration of combinatorial spaces

This research addresses the fundamental challenge of navigating vast combinatorial search spaces in critical domains including structural design, materials science, drug discovery, and network optimization. Given the exponential growth of possible solutions with problem size, traditional exhaustive search methods become intractable. We develop novel generative AI approaches that intelligently balance exploration of new possibilities, exploitation of promising solutions, and maintenance of solution diversity. Our methods employ advanced sampling strategies and learning algorithms to efficiently traverse these complex spaces, enabling practical solutions to previously intractable problems. This work aims to accelerate discovery and optimization processes across multiple scientific and engineering domains.

Crystal structures generated using Generative AI

Crystal structures generated and optimized by several generative AI techniques.
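The explore/exploit/diversity balance described above can be sketched with a toy pool-based search over bitstrings, a stand-in for any discrete design space. The 20% random-restart rate, pool size, and single-bit mutation are arbitrary illustrative choices, not the project's actual generative method:

```python
import random

def search(score, n_bits=12, iters=300, pool_size=8, seed=0):
    """Keep a small pool of candidate solutions, balancing exploitation
    (mutate the current best), exploration (occasional random restarts),
    and diversity (no duplicate pool members)."""
    rng = random.Random(seed)
    def rand_bits(): return tuple(rng.randint(0, 1) for _ in range(n_bits))
    def mutate(x):
        i = rng.randrange(n_bits)                  # flip one random bit
        return x[:i] + (1 - x[i],) + x[i + 1:]

    pool = [rand_bits() for _ in range(pool_size)]
    for _ in range(iters):
        best = max(pool, key=score)
        cand = rand_bits() if rng.random() < 0.2 else mutate(best)
        worst = min(pool, key=score)
        if cand not in pool and score(cand) > score(worst):
            pool[pool.index(worst)] = cand         # replace the weakest member
    return max(pool, key=score)

# toy objective: maximize the number of ones in the bitstring
best = search(score=sum)
assert sum(best) >= 10
```

In the real setting `score` would be an expensive property evaluation (e.g. a predicted crystal energy) and the candidate generator a learned generative model rather than bit flips.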


Collaborative priors for LLM-powered multi-agents

This research develops novel frameworks to enhance collaboration between Large Language Model (LLM) powered agents, particularly in challenging social dilemma scenarios. While LLMs demonstrate remarkable individual capabilities, they lack inherent collaborative mechanisms. We design specialized priors that guide these agents toward effective team cooperation and long-term goal achievement. Our approach implements two key methodologies: strategic prompting and targeted interventions, leveraging the rich cooperative concepts embedded in LLMs' pre-training. This framework enables agents to make decisions that balance individual actions with team objectives, creating more effective multi-agent systems capable of handling complex social interactions and collective decision-making scenarios.

LLM-agent prompted with social priors

LLM-agent prompted with social priors.
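Of the two methodologies, strategic prompting is the simpler to illustrate: a cooperative prior is prepended to the agent's task prompt so its decision is conditioned on team objectives. A toy sketch (the prior text and function name are our own illustrations, not the project's prompts):

```python
def with_social_prior(task_prompt,
                      prior="You are part of a team; weigh long-term group payoff "
                            "over short-term individual gain."):
    """Prepend a cooperative prior so the LLM agent conditions on team objectives."""
    return f"{prior}\n\n{task_prompt}"

# a classic social-dilemma choice, framed for the agent
p = with_social_prior("Choose: defect (+2 for you now) or cooperate (+1 for every agent).")
assert p.startswith("You are part of a team")
```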


Theory of mind in LLMs

This research investigates and enhances Large Language Models' capacity to understand and reason about mental states of others—a capability not inherently developed through standard training processes. While LLMs excel at text compression and instruction following, their ability to model others' beliefs, intentions, and knowledge states remains limited. We analyze how these models currently represent and process social understanding, and develop novel architectures and training approaches to explicitly incorporate theory of mind capabilities. Our work spans multiple dimensions: mental state attribution, belief modeling, intention recognition, and social reasoning. The project aims to create more socially intelligent language models that can better understand and navigate human interactions.

Theory of mind with LLMs

DALL·E 3 illustration of Theory of Mind in LLMs.


Scaling with sparse mixture of experts

This research investigates the fundamental properties and optimization of Sparse Mixture of Experts (SMoE) architectures for Large Language Model training. While SMoE offers promising scalability benefits, its operational dynamics remain incompletely understood. Our project conducts systematic analysis of critical design elements: token representation strategies, router indexing mechanisms, and methods to prevent mode collapse. We examine how different architectural choices affect model performance, routing efficiency, and computational resource utilization. Special attention is given to understanding noise effects on routing decisions and expert specialization. This work aims to establish theoretical foundations and practical guidelines for building more efficient and robust SMoE-based language models.
 

Discrete representation for MoE

Discrete representation for MoE.
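At the core of an SMoE layer is the router: each token's representation is scored against every expert, only the top-k experts are activated, and their outputs are mixed with renormalized gate weights. A minimal NumPy sketch of top-k routing (the shapes and k=2 are illustrative assumptions, not the project's configuration):

```python
import numpy as np

def top_k_routing(tokens, router_w, k=2):
    """Route each token to its top-k experts; gates renormalized over the chosen k."""
    logits = tokens @ router_w                       # [n_tokens, n_experts]
    top = np.argsort(-logits, axis=1)[:, :k]         # indices of the k best experts
    picked = np.take_along_axis(logits, top, axis=1)
    e = np.exp(picked - picked.max(axis=1, keepdims=True))
    gates = e / e.sum(axis=1, keepdims=True)         # softmax over selected experts only
    return top, gates

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # 4 tokens, 8-dim
W = rng.normal(size=(8, 6))        # router weights for 6 experts
experts, gates = top_k_routing(x, W)
assert experts.shape == (4, 2)
assert np.allclose(gates.sum(axis=1), 1.0)
```

The design questions the project studies (noisy routing, mode collapse, expert specialization) all concern how `logits` are produced and perturbed before this top-k selection.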


Representing and reasoning over noisy data: Thinking, fast and slow
This research develops a unified framework that mimics the "thinking fast and slow" patterns in humans for analyzing complex, multimodal sensor data streams in large-scale systems. We address the fundamental challenge of integrating diverse data types—including text, audio, video, and sensor feeds—into coherent representations that support long-term reasoning. Our approach combines self-supervised learning with sophisticated memory architectures to process time-varying, noisy information streams. The framework enables effective pattern recognition and prediction across extended timeframes, with applications in cyber threat detection, situational awareness, and intelligent system monitoring. By creating scalable methods for handling noisy, heterogeneous data, we aim to enhance decision-making capabilities in complex networked environments.

Partners: Australian Department of Defence

Duration: 2022-2025


A general framework for thinking fast and slow in dealing with noisy sensors.


Structured reasoning in video
This research aims to uncover and leverage inherent structural patterns in video content, moving beyond pixel-level analysis to enable sophisticated reasoning about events, objects, and narratives. The project encompasses three key components: developing human-centric models for understanding behavioral context, implementing visual abductive reasoning to explain observed events, and creating predictive frameworks for future event states. Through these approaches, we enhance AI systems' ability to comprehend complex human interactions and temporal relationships in visual scenes. By incorporating commonsense knowledge and structured reasoning mechanisms, we aim to bridge the gap between simple pattern recognition and human-like understanding of visual narratives.

COMPUTER architecture for human activity recognition

An architecture to model human activity in context.


Human behaviour understanding in video

This research develops advanced AI systems for comprehensively analyzing human behavior in video data across diverse environmental contexts. Our approach integrates multiple levels of analysis: trajectory modeling, social interaction patterns, causal inference of trigger events, and prediction of actions and intentions. We incorporate fundamental human behavioral principles as inductive biases, including goal-directed behavior, environmental affordances, and commonsense reasoning. The project culminates in developing a multimodal foundation model that seamlessly processes video, text, and object information, grounding abstract knowledge in concrete visual observations. This framework enables deeper understanding of human behavior in both static and dynamic camera scenarios. 

Partners: iCetana

Anomaly detection with skeleton trajectories
Detecting anomalies in video using skeleton trajectories (last row).

Visual question answering 

This research advances AI systems' ability to comprehend and answer natural language questions about visual content, bridging low-level pattern recognition with high-level symbolic reasoning. We develop dynamic computational architectures that enable iterative reasoning processes, guided by linguistic queries to analyze visual scenes. The project encompasses several key innovations: query-specific neural architectures for spatio-temporal object interaction analysis, frameworks for understanding complex human-object relationships and events, and novel multimodal prompting strategies for few-shot learning in Large Vision Language Models. This work aims to create more sophisticated visual reasoning systems capable of handling complex, multi-step queries about visual content.

Answering question about video

Answering questions about a video.

Video dialog

This research develops advanced systems for natural conversations about video content of any duration. We create neural-symbolic architectures that combine visual understanding with dialogue capabilities, enabling AI to engage in meaningful discussions about video content. Our approach parses complex video sequences into structured representations of object trajectories and their interactions, maintaining dynamic dialogue states that evolve with conversation. The system employs sophisticated neural-symbolic reasoning mechanisms to track object relationships across space and time, while managing conversation history to ensure contextually coherent responses. By integrating object-oriented representations with self-attention mechanisms, we enable comprehensive understanding of long-range temporal dependencies and complex narratives in video content. 

A model for multi-turn video dialog

A model for multi-turn video dialog.