I am a Generative AI Researcher at ByteDance (TikTok), focusing on video generation and unified multimodal modeling. My recent work explores a unified autoregressive (AR) generative framework built on continuous token representations, intended as a scalable foundation for multimodal generation. This framework has yielded notable improvements in high-fidelity image synthesis, and I am now extending it to audio and video generation, with an emphasis on temporal consistency, cross-modal alignment, and controllability.
Last updated: Nov. 16, 2025 · Feel free to reach out via email
arXiv
arXiv
ICLR
arXiv
arXiv
CVPR
AAAI
CVPR
CVPR
ICLR
arXiv
CVPR
IROS
ROBIO
ICIA