Academic Services

Conference Reviewer: ICLR (2024-26), ACMMM (2023-25), ICME (2024-25)

Journal Reviewer: International Journal of Computer Vision (IJCV)

📖 Education

  • 2022.09 - 2025.06, Master's, Tsinghua University, Shenzhen.
  • 2018.09 - 2022.06, Bachelor's, Xidian University, Xi'an.

💻 Internships

Baidu ERNIE (文心一言), 2024.05 - 2024.12, Shenzhen.
  • Topic: Multimodal Large Language Model Pre-training
  • Job Description: I developed a Multimodal Large Language Model (MLLM) for ERNIE Bot, focusing on video MLLM pre-training across the video, image, audio, and language modalities.
Tencent Youtu, 2024.03 - 2024.05, Shenzhen.
  • Topic: Multimodal Large Language Model Pre-training
  • Job Description: I worked on a Multimodal Large Language Model based on discrete coding.
DJI Automotive, 2023.10 - 2024.02, Shenzhen.
  • Topic: Multimodal Image-Text Pre-training
  • Job Description: I developed an image-text retrieval system for DJI Automotive. Specifically, I constructed a traffic image-text dataset and improved the existing multimodal model's performance on traffic scenes through traffic image-text pre-training. I also used a Large Language Model (LLM) and a diffusion model to generate synthetic data that further improved the model's performance.
Tencent, 2023.03 - 2023.07, Shenzhen.
  • Topic: Text to Image Generation (AIGC)
  • Job Description: I applied techniques such as image aesthetics assessment and human keypoint detection to improve the performance of the AIGC model.