Projects
Research projects
I have been involved in the following projects:
- National Computational Merit Allocation Scheme 2024 (NCMAS 2024)
- Lead Researcher
- Project Title: MotionNetLite - Video Dynamics Distillation for Scalable Models
- In this research project, we aim to develop more lightweight video data formats that efficiently distill motion at different levels of granularity by removing redundant information while focusing on the core dynamics. We also explore accelerating training and testing while reducing the amount of video data that must be stored, and address a scientific question: ‘How much information is contained in video data, and in what format?’. Many downstream video processing tasks will benefit from our video dynamics distillation process, making video understanding easier and more efficient. This also opens up a new research direction: exploring better video data representations for more lightweight, cutting-edge video models.
- This work has the potential to impact Safety and Security, Future Cities, IoT, Agri-business, and Defence via applications in Health and wellbeing, Safety, and Innovative industries. Our work focuses on researching advanced data-driven technologies that support all areas of science and society to provide national benefit. Video understanding, e.g., action recognition and anomaly detection, is needed in the surveillance of airports, malls, etc. It has applications in monitoring the health and well-being of the elderly population, in farming, and in crop analysis. This project also has the potential to ‘Shape Societal Transformations’. For instance, action recognition is a necessary component of recognition from wearable clothing, monitoring health and exercise regimes in the gym, recommendation systems via wearables, recognition of fake videos on social media, etc.
- This project focuses on ‘Analysing, Representing and Modelling data’, as video processing models require spatio-temporal modelling of time series, video frames, sequences, etc. My proposal aims at overcoming the ‘Fundamental limits of data’, e.g., via in-the-wild learning and reinforcement learning to explore natural sources of information (e.g., predicting the future evolution of video frames learns the intrinsic manifold of video/motion data). I hope to bring these ideas to social media, wearable devices, and recommender systems, thus shaping a ‘data-driven society’.
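The idea of distilling core dynamics by discarding redundant static content can be illustrated with a toy sketch. This is only a minimal assumption-laden illustration (frame differencing with a hand-picked threshold), not the project's actual method:

```python
# Toy sketch of "motion distillation" by frame differencing: keep only pixels
# that change between consecutive frames, discarding redundant static content.
# Frame values and the threshold are illustrative assumptions.

def distill_motion(frames, threshold=10):
    """Return per-step motion masks: 1 where a pixel changed by more than
    `threshold` between consecutive grayscale frames, 0 elsewhere."""
    masks = []
    for prev, curr in zip(frames, frames[1:]):
        mask = [[1 if abs(c - p) > threshold else 0
                 for p, c in zip(prow, crow)]
                for prow, crow in zip(prev, curr)]
        masks.append(mask)
    return masks

# Two 2x3 grayscale frames: only one pixel changes between them.
f0 = [[0, 0, 0], [0, 0, 0]]
f1 = [[0, 0, 0], [0, 200, 0]]
print(distill_motion([f0, f1]))  # [[[0, 0, 0], [0, 1, 0]]]
```

The resulting masks are far sparser than the raw frames, which is the intuition behind storing and learning from distilled dynamics rather than full video.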
- The NCI National AI Flagship Merit Allocation Scheme
- Assistant Researcher
- Project Title: Robust anomaly detection in human-centric videos
- This project aims at developing advanced computer vision and deep learning techniques to identify and characterise anomalies in video data where humans are central. The project leverages cutting-edge technology to enhance security, safety, and surveillance systems, making them more effective in detecting unusual behaviours and events, which may range from security breaches and accidents to rare medical conditions in healthcare applications.
- The significance of this project lies in its unique focus on human-centric videos. While anomaly detection in videos is an established field, the novelty emerges from its specialised application to situations where human activity is central. The innovative aspects include: (i) Robustness: the project seeks to develop highly reliable models capable of detecting anomalies in complex, real-world scenarios, where human interactions and activities can vary significantly. (ii) Real-time analysis: by applying these methods to real-time video streams, the project addresses the demand for timely responses to anomalies in security, industrial, and healthcare settings. (iii) Ethical considerations: the project incorporates ethical safeguards, such as ensuring privacy and avoiding bias in the identification of anomalies, thereby making the technology responsible and trustworthy. Beyond its academic contributions, this project has the potential to contribute significantly to the economy, society, the environment, and culture.
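A common baseline formulation of video anomaly detection is to score each observation against a model of normal behaviour and flag large deviations. The sketch below is a hedged, simplified stand-in (z-scoring per-frame "motion energy" against a normal baseline), not the project's actual detector; all numbers are made up for illustration:

```python
import statistics

# Toy anomaly flagging: a frame is anomalous when its motion energy deviates
# from the "normal" baseline by more than z_threshold standard deviations.

def anomaly_flags(energies, baseline, z_threshold=3.0):
    """Return a boolean flag per observed energy value."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return [abs(e - mu) / sigma > z_threshold for e in energies]

baseline = [10.0, 11.0, 9.0, 10.5, 9.5]  # motion energy of "normal" clips
stream = [10.2, 9.8, 30.0]               # the last frame is a sudden spike
print(anomaly_flags(stream, baseline))   # [False, False, True]
```

Real systems replace the scalar energy with learned spatio-temporal features, but the score-then-threshold structure is the same.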
- NCI Adapter Scheme Q4 2023 (HPC funding scheme)
- Assistant Researcher
- Project Title: Towards building general-purpose multimodal foundation models
- Scope: Vision-language pre-training (VLP) has attracted rapidly growing attention in both the computer vision and NLP communities due to the emergence of large-scale multimodal foundation models such as Contrastive Language-Image Pre-training (CLIP). It is very encouraging to see that many Vision-Language (VL) systems have been deployed in industry; for example, the iPhone can generate image captions read by VoiceOver for vision-impaired users. Although multimodal intelligence has been applied in many areas, including image-text, core computer vision, and video-text tasks, many factors remain to be considered, including robustness to new domains, fairness, and responsible-AI issues.
- Aim: One common theme that stands out is how to build a general-purpose multimodal foundation model. We aim to build a foundation model that is stable and generalisable, and can be readily adapted to various downstream tasks, ranging from image-level vision tasks (e.g., image classification, retrieval, and captioning) and region-level vision tasks (e.g., object detection and phrase grounding) to pixel-level vision tasks (e.g., segmentation and image generation). To build a general-purpose foundation model, we need a unified model architecture that can be readily scaled up and that, when pre-trained at scale, can be readily adapted to various downstream computer vision and VL tasks.
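The CLIP-style matching mentioned above boils down to embedding images and text into a shared space and pairing each image with its most cosine-similar caption. A minimal sketch, using tiny hand-made embeddings as a stand-in for learned ones (an assumption purely for illustration):

```python
import math

# CLIP-style image-text matching in miniature: embeddings live in a shared
# space, and each image is paired with the text of highest cosine similarity.
# The 2-D vectors below are hand-made assumptions; real models learn them
# via large-scale contrastive pre-training.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match(image_embs, text_embs):
    """For each image embedding, the index of the best-matching text."""
    return [max(range(len(text_embs)), key=lambda j: cosine(img, text_embs[j]))
            for img in image_embs]

images = [[1.0, 0.1], [0.1, 1.0]]  # e.g. a dog photo, a car photo
texts = [[0.9, 0.2], [0.0, 1.0]]   # e.g. "a dog", "a car"
print(match(images, texts))        # [0, 1]
```

The same shared-embedding interface is what lets one pre-trained model be adapted to retrieval, classification (by matching against class-name prompts), and other downstream tasks.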
- Summer Research Internship Program
- Summer Scholar
- Project Title: Video dynamics distillation
- Video captures motion such as natural dynamics and human actions. A large body of research has been dedicated to learning and extracting spatio-temporal features for scene understanding, human action recognition, anomaly detection, etc. Nowadays, video data are often preprocessed, using either machine learning tools and computer vision algorithms or physical sensors, into formats that aid video understanding: optical flow that highlights the dynamics, depth videos that segment foreground objects or human subjects, and skeleton sequences that focus on human actions. However, videos contain highly redundant information, which hinders both the effectiveness and the efficiency of motion extraction, forcing the computer vision community to work hard on various forms of large-scale pre-training and to grapple with the ‘dark matter’ of large models.
- In this research project, we aim to develop more lightweight video data formats that efficiently distill motion at different levels of granularity by removing redundant information while focusing on the core dynamics. Many downstream video processing tasks will benefit from our video dynamics distillation process, making video understanding easier and more efficient. This also opens up a new research direction: exploring better video data representations for more lightweight, cutting-edge video models.
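One concrete way redundancy removal reduces video storage is keyframe selection: keep a frame only when it differs enough from the last frame kept. The sketch below is a hedged toy version (flattened 1-D "frames", a hand-picked threshold), not the project's method:

```python
# Toy keyframe selection: retain a frame only when its summed absolute
# difference from the last retained frame exceeds `threshold`. Frame values
# and the threshold are illustrative assumptions.

def select_keyframes(frames, threshold=50):
    """Return the indices of the frames retained."""
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i], frames[kept[-1]]))
        if diff > threshold:
            kept.append(i)
    return kept

# Flattened 1-D "frames": frames 0-2 are nearly static, frame 3 changes a lot.
frames = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [90, 0, 0, 0]]
print(select_keyframes(frames))  # [0, 3]
```

Here four frames compress to two, while the large change that carries the motion is preserved.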