📠 Academic Services
Conference Reviewer: ICLR (2024-26), ACMMM (2023-25), ICME (2024-25)
Journal Reviewer: International Journal of Computer Vision (IJCV)
📖 Education
- 2022.09 - 2025.06, Master, Tsinghua University, Shenzhen.
- 2018.09 - 2022.06, Bachelor, Xidian University, Xi'an.
💻 Internships
- Baidu ERNIE (文心一言), 2024.05 - 2024.12, Shenzhen.
Topic: Multimodal Large Language Model Pre-training
Job Description: I developed Multimodal Large Language Models (MLLMs) for ERNIE Bot. Specifically, I focused on video MLLM pre-training, involving video, image, audio, and language modalities.
- Tencent Youtu, 2024.03 - 2024.05, Shenzhen.
Topic: Multimodal Large Language Model Pre-training
Job Description: I worked on Multimodal Large Language Models based on discrete coding.
- DJI Automotive, 2023.10 - 2024.02, Shenzhen.
Topic: Multimodal Image-Text Pre-training
Job Description: I developed an image-text retrieval system for DJI Automotive. Specifically, I constructed a traffic image-text dataset and improved the existing multimodal model’s performance on traffic scenes through traffic image-text pre-training. I also used an LLM (Large Language Model) and a diffusion model to generate synthetic data that further improved the model’s performance.
- Tencent, 2023.03 - 2023.07, Shenzhen.
Topic: Text to Image Generation (AIGC)
Job Description: I employed various techniques to improve the performance of the AIGC model, such as image aesthetics assessment and human keypoint detection.