• Baidu ERNIE (文心一言), 2024.05 - 2024.12, Shenzhen.
    Topic: Multimodal Large Language Model Pre-training
    Job Description: I develop Multimodal Large Language Models (MLLMs) for ERNIE Bot. Specifically, I focus on video MLLM pre-training, which involves the video, image, audio, and language modalities.
  • Tencent Youtu, 2024.03 - 2024.05, Shenzhen.
    Topic: Multimodal Large Language Model Pre-training
    Job Description: I work on Multimodal Large Language Models based on discrete coding.
  • DJI Automotive, 2023.10 - 2024.02, Shenzhen.
    Topic: Multimodal Image-Text Pre-training
    Job Description: I develop an image-text retrieval system for DJI Automotive. Specifically, I construct a traffic image-text dataset and improve the existing multimodal model’s performance on traffic scenes through traffic image-text pre-training. I also use a Large Language Model (LLM) and a diffusion model to generate synthetic data that further improves the model’s performance.
  • Tencent, 2023.03 - 2023.07, Shenzhen.
    Topic: Text to Image Generation (AIGC)
    Job Description: I employed various techniques, such as image aesthetics assessment and human keypoint detection, to improve the performance of the AIGC model.