- Baidu ERNIE (文心一言), 2024.05 - 2024.12, Shenzhen.
Topic: Multimodal Large Language Model Pre-training
Job Description: I develop Multimodal Large Language Models (MLLMs) for ERNIE Bot. Specifically, I focus on video MLLM pre-training, which involves the video, image, audio, and language modalities.
- Tencent Youtu, 2024.03 - 2024.05, Shenzhen.
Topic: Multimodal Large Language Model Pre-training
Job Description: I work on a Multimodal Large Language Model based on discrete coding.
- DJI Automotive, 2023.10 - 2024.02, Shenzhen.
Topic: Multimodal Image-Text Pre-training
Job Description: I develop an image-text retrieval system for DJI Automotive. Specifically, I construct a traffic image-text dataset and improve the existing multimodal model’s performance on traffic scenes through traffic image-text pre-training. I also use a Large Language Model (LLM) and a diffusion model to generate synthetic data that further improves the model’s performance.
- Tencent, 2023.03 - 2023.07, Shenzhen.
Topic: Text to Image Generation (AIGC)
Job Description: I employed various techniques to improve the performance of the AIGC model, such as image aesthetics assessment and human keypoint detection.