
Dr Zhenghao Chen
Lecturer in Data Science
School of Information and Physical Sciences
Career Summary
Biography
Dr. Zhenghao Chen is a Faculty Member at the University of Newcastle. He obtained his B.Eng. (Honours Class 1) and Ph.D. at the University of Sydney in 2017 and 2022, respectively. He was a Research Engineer at TikTok, a Postdoctoral Research Fellow at the University of Sydney, and a Visiting Research Scientist at Microsoft Research and Disney Research. For his academic merit he has been awarded the Google Australia Prize, the Australian Government RTP International Fellowship, the ACM SIGMM Outstanding Thesis Award, and the Microsoft Research StarTrack Fellowship.
Dr. Chen's research interests encompass Computer Vision, Natural Language Processing, and Machine Learning. He is particularly known for his expertise in Generative AI, applying his research in both academic and industrial contexts. Dr. Chen has published extensively in flagship AI conferences such as CVPR, ICCV, ECCV, MM, AAAI and ICLR, and in journals such as T-PAMI, T-IP and T-MI, and holds several patents. Additionally, he serves on the Conference Program Committees (PC) of CVPR, ECCV, ICCV, SIGGRAPH, AAAI, IJCAI, MICCAI, MM and KDD, and organizes workshops at MM and ICCV. He also acts as a journal reviewer for IJCV, T-IP, T-MI, T-CSVT and PR, and as a Guest Editor for MDPI Algorithms and Frontiers in AI.
Qualifications
- DOCTOR OF PHILOSOPHY, University of Sydney
- BACHELOR OF INFORMATION TECHNOLOGY, University of Sydney
Keywords
- Computer Vision
- Machine Learning
- Multimedia
- Natural Language Processing
Fields of Research
Code | Description | Percentage |
---|---|---|
460307 | Multimodal analysis and synthesis | 20 |
461103 | Deep learning | 30 |
460304 | Computer vision | 30 |
460208 | Natural language processing | 20 |
Professional Experience
UON Appointment
Title | Organisation / Department |
---|---|
Lecturer in Data Science | University of Newcastle, School of Information and Physical Sciences, Australia |
Academic appointment
Dates | Title | Organisation / Department |
---|---|---|
1/9/2022 - 1/5/2024 | Postdoctoral Research Fellow | University of Sydney, Australia |
Professional appointment
Dates | Title | Organisation / Department |
---|---|---|
1/5/2024 - 1/10/2024 | Research Engineer | TikTok, Australia |
1/10/2022 - 1/2/2023 | Visiting Research Scientist | Disney Research, Switzerland |
Awards
Honours
Year | Award |
---|---|
2025 | Microsoft Research Asia StarTrack Fellowship, Microsoft Research |
2024 | ACM SIGMM Award for Outstanding PhD Thesis in Multimedia Computing, Communications and Application, Association for Computing Machinery (ACM) |
Scholarship
Year | Award |
---|---|
2019 | Australian Government Research Training Program (RTP) Fellowship (International), Australian Government Department of Education |
2017 | Google Australia Prize for Excellence in Computer Science |
Teaching
Code | Institution | Role | Duration |
---|---|---|---|
ELEC5304 | The University of Sydney | Coordinator & Lecturer | 1/3/2023 - 1/7/2023 |
ELEC5306 | The University of Sydney | Lecturer | 1/7/2022 - 1/11/2022 |
COMP1140 | School of Information and Physical Sciences (SIPS), University of Newcastle | Coordinator & Lecturer | 21/7/2025 - 3/11/2025 |
COMP1010 | School of Information and Physical Sciences (SIPS), University of Newcastle | Coordinator & Lecturer | 3/3/2025 - 7/7/2025 |
Publications
For publications that are currently unpublished or in-press, details are shown in italics.
Chapter (2 outputs)
Year | Citation | Altmetrics | Link |
---|---|---|---|
2022 | Wang Z, Huo X, Chen Z, Zhang J, Sheng L, Xu D, 'Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation', 13692, 175-191 (2022) Point cloud registration aims at estimating the geometric transformation between two point cloud scans, in which point-wise correspondence estimation is the key to its success. In addition to previous methods that seek correspondences by hand-crafted or learnt geometric features, recent point cloud registration methods have tried to apply RGB-D data to achieve more accurate correspondence. However, it is not trivial to effectively fuse the geometric and visual information from these two distinctive modalities, especially for the registration problem. In this work, we propose a new Geometry-Aware Visual Feature Extractor (GAVE) that employs multi-scale local linear transformation to progressively fuse these two modalities, where the geometric features from the depth data act as the geometry-dependent convolution kernels to transform the visual features from the RGB data. The resultant visual-geometric features are in canonical feature spaces with alleviated visual dissimilarity caused by geometric changes, by which more reliable correspondence can be achieved. The proposed GAVE module can be readily plugged into recent RGB-D point cloud registration frameworks. Extensive experiments on 3D Match and ScanNet demonstrate that our method outperforms the state-of-the-art point cloud registration methods even without correspondence or pose supervision. | | |
2020 | Hu Z, Chen Z, Xu D, Lu G, Ouyang W, Gu S, 'Improving Deep Video Compression by Resolution-Adaptive Flow Coding', 12347 LNCS, 193-209 (2020) In learning-based video compression approaches, it is an essential issue to compress pixel-level optical flow maps by developing new motion vector (MV) encoders. In this work, we propose a new framework called Resolution-adaptive Flow Coding (RaFC) to effectively compress the flow maps globally and locally, in which we use multi-resolution representations instead of single-resolution representations for both the input flow maps and the output motion features of the MV encoder. To handle complex or simple motion patterns globally, our frame-level scheme RaFC-frame automatically decides the optimal flow map resolution for each video frame. To cope with different types of motion patterns locally, our block-level scheme called RaFC-block can also select the optimal resolution for each local block of motion features. In addition, the rate-distortion criterion is applied to both RaFC-frame and RaFC-block to select the optimal motion coding mode for effective flow coding. Comprehensive experiments on four benchmark datasets, HEVC, VTL, UVG and MCL-JCV, clearly demonstrate the effectiveness of our overall RaFC framework after combining RaFC-frame and RaFC-block for video compression. | | |
Conference (16 outputs)
Year | Citation | Altmetrics | Link |
---|---|---|---|
2025 | Yang L, Wang Z, Chen Z, Liang X, Zhou L, 'MedXChat: A Unified Multimodal Large Language Model Framework Towards CXRs Understanding and Generation', 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI) (2025) [E1] | | |
2025 | Yu S, Jin C, Wang H, Chen Z, Jin S, Zuo Z, Xu X, Sun Z, Bingni Z, Wu J, Hao Z, Sun Q, 'Frame-Voyager: Learning to Query Frames for Video Large Language Models', International Conference on Learning Representations (ICLR) 2025 (2025) | | |
2024 | Liu X, Chen Z, Luping Z, Xu D, Xi W, Bai G, Yihan Z, Zhao J, 'UFDA: Universal Federated Domain Adaptation with Practical Assumptions', Proceedings of the 38th AAAI Conference on Artificial Intelligence, 14026-14034 (2024) [E1] | | |
2024 | Liu L, Hu Z, Chen Z, 'Towards Point Cloud Compression for Machine Perception: A Simple and Strong Baseline by Learning the Octree Depth Level Predictor', Generalizing from Limited Resources in the Open World: Second International Workshop, GLOW 2024, Held in Conjunction with IJCAI 2024, 2160 CCIS, 3-17 (2024) [E1] | | |
2024 | Chen Z, Zhou L, Hu Z, Xu D, 'Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression', MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, 11022-11031 (2024) [E1] | | |
2023 | Yang X, Lin G, Chen Z, Zhou L, 'Neural Vector Fields: Implicit Representation by Explicit Learning', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023-June, 16727-16738 (2023) [E1] | | |
2023 | Liu L, Hu Z, Chen Z, Xu D, 'ICMH-Net: Neural Image Compression Towards both Machine Vision and Human Vision', Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, 8047-8056 (2023) [E1] Neural image compression has gained significant attention thanks to the remarkable success of deep neural networks. However, most existing neural image codecs focus solely on improving human vision perception. In this work, our objective is to enhance image compression methods for both human vision quality and machine vision tasks simultaneously. To achieve this, we introduce a novel approach to Partition, Transmit, Reconstruct, and Aggregate (PTRA) the latent representation of images to balance the optimizations for both aspects. By employing our method as a module in existing neural image codecs, we create a latent representation predictor that dynamically manages the bit-rate cost for machine vision tasks. To further improve the performance of auto-regressive-based coding techniques, we enhance our hyperprior network and predictor module with context modules, resulting in a reduction in bit-rate. The extensive experiments conducted on various machine vision benchmarks such as ILSVRC 2012, VOC 2007, VOC 2012, and COCO demonstrate the superiority of our newly proposed image compression framework. It outperforms existing neural image compression methods in multiple machine vision tasks including classification, segmentation, and detection, while maintaining high-quality image reconstruction for human vision. | | |
2023 | Chen Z, Relic L, Azevedo R, Zhang Y, Gross M, Xu D, Zhou L, Schroers C, 'Neural Video Compression with Spatio-Temporal Cross-Covariance Transformers', Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, 8543-8551 (2023) [E1] | | |
2022 | Chen Z, Lu G, Hu Z, Liu S, Jiang W, Xu D, 'LSVC: A Learning-based Stereo Video Compression Framework', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022-June, 6063-6072 (2022) In this work, we propose the first end-to-end optimized framework for compressing automotive stereo videos (i.e., stereo videos from autonomous driving applications) from both left and right views. Specifically, when compressing the current frame from each view, our framework reduces temporal redundancy by performing motion compensation using the reconstructed intra-view adjacent frame and at the same time exploits binocular redundancy by conducting disparity compensation using the latest reconstructed cross-view frame. Moreover, to effectively compress the introduced motion and disparity offsets for better compensation, we further propose two novel schemes called motion residual compression and disparity residual compression to respectively generate the predicted motion offset and disparity offset from the previously compressed motion offset and disparity offset, such that we can more effectively compress residual offset information for better bit-rate saving. Overall, the entire framework is implemented by fully-differentiable modules and can be optimized in an end-to-end manner. Our comprehensive experiments on three automotive stereo video benchmarks, Cityscapes, KITTI 2012 and KITTI 2015, demonstrate that our proposed framework outperforms the learning-based single-view video codec and the traditional hand-crafted multi-view video codec. | | |
2021 | Chen Z, Gu S, Zhu F, Xu J, Zhao R, 'Improving Facial Attribute Recognition by Group and Graph Learning', Proceedings IEEE International Conference on Multimedia and Expo (2021) Exploiting the relationships between attributes is a key challenge for improving multiple facial attribute recognition. In this work, we are concerned with two types of correlations, namely spatial and non-spatial relationships. For the spatial correlation, we aggregate attributes with spatial similarity into a part-based group and then introduce a Group Attention Learning to generate the group attention and the part-based group feature. On the other hand, to discover the non-spatial relationship, we model a group-based Graph Correlation Learning to explore affinities of predefined part-based groups. We utilize such affinity information to control the communication between all groups and then refine the learned group features. Overall, we propose a unified network called Multi-scale Group and Graph Network. It incorporates these two newly proposed learning strategies and produces coarse-to-fine graph-based group features for improving facial attribute recognition. Comprehensive experiments demonstrate that our approach outperforms the state-of-the-art methods. | | |
2020 | Liu Z, Zhang H, Chen Z, Wang Z, Ouyang W, 'Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition', 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 140-149 (2020) | | |
2017 | Chen Z, Zhou J, Wang X, Swanson J, Chen F, Feng D, 'Neural net-based and safety-oriented visual analytics for time-spatial data', Proceedings of the International Joint Conference on Neural Networks, 2017-May, 1133-1140 (2017) Safety-oriented visualization is one of the significant approaches to gain insights from time-spatial data, while neural nets currently serve as a decent way to perform machine learning in the data mining industry. This paper proposes a visual analytics pipeline for trajectory data enabling better understanding of the movement patterns of people, using a Neural Network as back-end and other visualization techniques as front-end for gaining information on preferences of attractions, similarities of groups, popularities of attractions and patterns of movement flow. Such understandings help to address management issues by extracting the outstanding features to detect abnormal patterns, such as detection of crime and prediction of overall movements. Successfully dealing with those issues would bring significant improvements to the entire management of public facilities such as parks and transportation. | | |
2017 | Zhi W, Yueng HWF, Chen Z, Zandavi SM, Lu Z, Chung YY, 'Using Transfer Learning with Convolutional Neural Networks to Diagnose Breast Cancer from Histopathological Images', Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 10637 LNCS, 669-676 (2017) Diagnosis from histopathological images is the gold standard in diagnosing breast cancer. This paper investigates using transfer learning with convolutional neural networks to automatically diagnose breast cancer from patches of histopathological images. We compare the performance of using transfer learning with an off-the-shelf deep convolutional neural network architecture, VGGNet, and a shallower custom architecture. Our proposed final ensemble model, which contains three custom convolutional neural network classifiers trained using transfer learning, achieves a significantly higher image classification accuracy on the large public benchmark dataset than the current best results, for all image resolution levels. | | |
2017 | Zhi W, Chen Z, Yueng HWF, Lu Z, Zandavi SM, Chung YY, 'Layer Removal for Transfer Learning with Deep Convolutional Neural Networks', Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 10635 LNCS, 460-469 (2017) It is usually difficult to find datasets of sufficient size to train Deep Convolutional Neural Networks (DCNNs) from scratch. In practice, a neural network is often pre-trained on a very large source dataset. Then, a target dataset is transferred onto the neural network. This approach is a form of transfer learning, and allows very deep networks to achieve outstanding performance even when a small target dataset is available. It is thought that the bottom layers of the pre-trained network contain general information, which are applicable to different datasets and tasks, while the upper layers of the pre-trained network contain abstract information relevant to a specific dataset and task. While studies have been conducted on the fine-tuning of these layers, the removal of these layers has not yet been considered. This paper explores the effect of removing the upper convolutional layers of a pre-trained network. We empirically investigated whether removing upper layers of a deep pre-trained network can improve performance for transfer learning. We found that removing upper pre-trained layers gives a significant boost in performance, but the ideal number of layers to remove depends on the dataset. We suggest removing pre-trained convolutional layers when applying transfer learning on off-the-shelf pre-trained DCNNs. The ideal number of layers to remove will depend on the dataset, and remains a parameter to be tuned. | | |
2016 | Liu G, Chen Z, Yeung HWF, Chung YY, Yeh WC, 'A new weight adjusted particle swarm optimization for real-time multiple object tracking', Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 9948 LNCS, 643-651 (2016) This paper proposes a novel Weight Adjusted Particle Swarm Optimization (WAPSO) to overcome the occlusion problem and computational cost in multiple object tracking. To this end, a new update strategy for the inertia weight of the particles in WAPSO is designed to maintain particle diversity and prevent premature convergence. Meanwhile, the implementation of a mechanism that enlarges the search space upon the detection of occlusion enhances WAPSO's robustness to non-linear target motion. In addition, the choice of Root Sum Squared Errors as the fitness function further increases the speed of the proposed approach. The experimental results have shown that, in combination with the model feature that enables initialization of multiple independent swarms, the high-speed WAPSO algorithm can be applied to multiple non-linear object tracking for real-time applications. | | |
Journal article (4 outputs)
Year | Citation | Altmetrics | Link |
---|---|---|---|
2025 | Yang L, Chen Z, Wang K, Zhou L, 'Improving CXR Bone Suppression by Exploiting Domain-level and Instance-level Information', IEEE Transactions on Medical Imaging (2025) [C1] For chest X-ray image (CXR) analysis, effective bone structure suppression is essential for uncovering lung abnormalities and facilitating accurate clinical diagnoses. While recent deep generative models, to some extent, improve the reconstruction quality of bone-suppressed CXRs, they often fall short in delivering substantial improvements in downstream diagnosis tasks. This limitation is attributed to a narrow focus on instance-specific details, neglecting broader domain-level knowledge, which hampers bone-suppression effectiveness. In response to these challenges, our proposed framework adopts a novel approach that integrates both instance-level and domain-level information. To capture instance information, our model employs a hybrid approach using both cross-covariance attention blocks (CABs) to underscore relevant image information and a following Vision Transformers (ViTs) encoder for image feature embedding. To capture domain information, we introduce multi-head codebook attention (MCA) which leverages a codebook structure with a multi-head attention mechanism to capture global, domain-level information specific to the bone-suppressed CXR domain, thereby refining the synthesis process. During optimization, our two-stage training scheme involves an MCA learning stage that encapsulates the domain of bone-suppressed CXRs in MCA through a ViT-based GAN model, and a synthesis stage that employs the learned codebook to generate bone-suppressed CXRs from the original ones, enhancing instance synthesis through domain insights. Moreover, the incorporation of CABs further refines pixel-level instance information. Extensive experiments demonstrate the superior performance of our approach, improving PSNR by 8.36% and SSIM by 2.7% for bone suppression while boosting lung disease classification by 2.8% and 4.2% on two datasets and segmentation by 1.5%. | | |
2025 | Yang X, Lin G, Chen Z, Zhou L, 'Neural Vector Fields: Generalizing Distance Vector Fields by Codebooks and Zero-Curl Regularization', IEEE Transactions on Pattern Analysis and Machine Intelligence, 47, 5818-5831 (2025) [C1] Recent neural network-based surface reconstruction can be roughly divided into two categories, one warping templates explicitly and the other representing 3D surfaces implicitly. To enjoy the advantages of both, we propose a novel 3D representation, Neural Vector Fields (NVF), which adopts the explicit learning process to manipulate meshes and implicit unsigned distance function (UDF) representation to break the barriers in resolution and topology. This is achieved by directly predicting the displacements from surface queries and modeling shapes as Vector Fields, rather than relying on network differentiation to obtain direction fields as most existing UDF-based methods do. In this way, our approach is capable of encoding both the distance and the direction fields so that the calculation of direction fields is differentiation-free, circumventing the non-trivial surface extraction step. Furthermore, building upon NVFs, we propose to incorporate two types of shape codebooks, i.e., NVFs (Lite or Ultra), to promote cross-category reconstruction through encoding cross-object priors. Moreover, we propose a new regularization based on analyzing the zero-curl property of NVFs, and implement this through the fully differentiable framework of our NVF (Ultra). We evaluate both NVFs on four surface reconstruction scenarios, including watertight vs non-watertight shapes, category-agnostic reconstruction vs category-unseen reconstruction, category-specific, and cross-domain reconstruction. | | |
2024 | Guo J, Chen Z, Ma Y, Liu X, Kim J, Ouyang W, Tao D, 'EMCLR'24 Chairs' Welcome', EMCLR 2024: Proceedings of the 1st International Workshop on Efficient Multimedia Computing Under Limited Resources, Co-located with MM 2024 (2024) | | |
2022 | Chen Z, Gu S, Lu G, Xu D, 'Exploiting Intra-Slice and Inter-Slice Redundancy for Learning-Based Lossless Volumetric Image Compression', IEEE Transactions on Image Processing, 31, 1697-1707 (2022) [C1] | | |
Patent (3 outputs)
Year | Citation | Altmetrics | Link |
---|---|---|---|
2025 | Chen Z, Albuquerque ARGD, Schroers CR, Zhang Y, Relic L, 'Contextual video compression framework with spatial-temporal cross-covariance transformers' (2025) | | |
2021 | Chen Z, Xu J, Zhu F, Rui Z, 'Facial attribute recognition method and apparatus, and electronic device and storage medium' (2021) | | |
2020 | Chen Z, Xu J, Zhao R, 'Face recognition method and apparatus, electronic device, and storage medium' (2020) | | |
Dr Zhenghao Chen
Position
Lecturer in Data Science
School of Information and Physical Sciences
College of Engineering, Science and Environment
Contact Details
zhenghao.chen@newcastle.edu.au