I'm an incoming Assistant Professor at the College of Artificial Intelligence, Tsinghua University. Previously, I was a Research Scientist at Meta Reality Labs and GenAI, primarily focusing on first-person vision and generative AI models (such as Llama3, Llama4, and EMU). I completed my Ph.D. in Robotics at Georgia Tech, advised by Prof. James Rehg. I also work closely with Prof. Yin Li from the University of Wisconsin–Madison. I was fortunate to collaborate with Prof. Siyu Tang and Prof. Michael Black during my visit to ETH Zurich and the Max Planck Institute. I enjoyed a wonderful internship at Facebook Reality Labs, where I worked with Dr. Chao Li, Dr. Lingni Ma, Dr. Kiran Somasundaram, and Prof. Kristen Grauman on egocentric action recognition and localization. I am honored to have received several awards, including Best Paper Candidate at CVPR 2022 and ECCV 2024, and the BMVC Best Student Paper Award. Before joining Georgia Tech, I earned my Master’s degree from Carnegie Mellon University and my Bachelor’s degree from Beihang University.
As a primary contributor, I have helped construct several widely recognized egocentric video datasets, including Ego4D, Ego-Exo4D, EGTEA Gaze+, and the Behavior Vision Suite, which have been broadly adopted in both academia and industry. I have also proposed multiple algorithms for egocentric action recognition and anticipation, some of which will be deployed in the next-generation smart glasses developed by Meta Reality Labs. During my time at Meta GenAI, I was deeply involved in the training and evaluation of large-scale generative multimodal models, including EMU, Llama3, and Llama4 (multimodal components only).
*This image of Jaime Lannister charging alone at Daenerys and her dragon captures what it often takes to do science: you must be willing to stand as a lone warrior.