📝 Publications
Multimodal Learning

Gamma: Toward Generic Image Assessment with Mixture of Assessment Experts
Hantao Zhou, Rui Yang, Longxiang Tang, Guanyi Qin, Runze Hu, Xiu Li
ACM Multimedia (ACM MM), 2025
- Gamma is a general image assessment model that can be applied to natural, underwater, AIGC, and face image assessment, among other tasks.

UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment
Hantao Zhou, Longxiang Tang, Rui Yang, Guanyi Qin, Yan Zhang, Runze Hu, Xiu Li
- UniQA is a foundational multimodal image assessment model that supports image quality and aesthetic assessment tasks and generalizes well to natural, AIGC, and medical images, among others.
- We generate a multimodal dataset with a multimodal large language model and pre-train CLIP on authentic and synthetic data.

Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, Jiaya Jia
European Conference on Computer Vision (ECCV), 2024
- We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework to retain pre-trained knowledge in continual learning of Vision-Language Models (VLMs).

SemanticAC: Semantics-Assisted Framework for Audio Classification
Yicheng Xiao, Yue Ma, Shuyan Li, Hantao Zhou, Ran Liao, Xiu Li
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
- We propose SemanticAC, a semantics-assisted framework for audio classification that better leverages semantic information.

Detection and Segmentation

UniHead: Unifying Multi-Perception for Detection Heads
Hantao Zhou, Rui Yang, Yachao Zhang, Haoran Duan, Yawen Huang, Runze Hu, Xiu Li, Yefeng Zheng
IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2024
- UniHead unifies various perceptions in a single detection head via novel attention modules, which can enhance the performance of many classical detectors on both object detection and segmentation tasks.

Video Object Segmentation with Dynamic Query Modulation
Hantao Zhou, Runze Hu, Xiu Li
International Conference on Multimedia and Expo (ICME), 2024
- We design a dynamic query modulation framework for Video Object Segmentation, which can update queries and perform multi-object interaction effectively.

ETDNet: Efficient Transformer-Based Detection Network for Surface Defect Detection
Hantao Zhou, Rui Yang, Runze Hu, Chang Shu, Xiaochu Tang, Xiu Li
IEEE Transactions on Instrumentation and Measurement (IEEE TIM), 2022
- We propose the Efficient Transformer-Based Detection Network (ETDNet) for defect detection, which includes a variety of novel Transformer-based module designs to improve detection performance.

Patent:
- Unified Vision-Language Model Pre-training and Adaptation Method for Image Quality and Aesthetic Assessment, Xiu Li, Hantao Zhou