I specialize in integrating multimodal understanding with generative modeling, with a focus on enhancing autoregressive models to generate and interpret continuous data such as images, videos, and 3D content. My expertise extends to large-scale training on clusters of hundreds to thousands of GPUs, and I have contributed to the training of foundational text-to-image and text-to-video models.
My research also spans AIGC (AI-generated content), federated learning, and robotics, and I have authored multiple publications at top-tier conferences in these fields.
Feel free to reach out via email for collaboration or inquiries.
Last updated: December 22, 2024
Powered by Jekyll and the Minimal Light theme.