Hi, I am Yi Su (苏仪). I am a Senior Researcher at Tencent, working in the Hunyuan AI Infra Department in Shanghai through Tencent Project UP (青云计划).
I am also completing my M.S. at Soochow University, supervised by Assoc. Prof. Juntao Li and Prof. Min Zhang. My work focuses on efficient training and deployment of Large Language Models, especially through quantization, KV-cache optimization, efficient attention, and other acceleration techniques.
🔥 News
- 2026.04: Released a lightweight 2-bit translation model: Hy-MT1.5-1.8B-2bit.
- 2026.04: My previous work at Kuaishou, the Kwai Summary Attention Technical Report, is released.
- 2026.04: One paper is accepted by ACL 2026: General-RL.
- 2026.03: I joined the Tencent Hunyuan AI Infra Department as a Senior Researcher through Tencent Project UP (青云计划).
- 2026.03: Released new work on efficiency: LongFlow, DapQ, and Quantized Inference for OneRec-V2.
- 2025.10: Released OneRec-Think, bringing explicit reasoning into generative recommendation.
- 2025.08: Our work at Kuaishou, the OneRec-V2 Technical Report, is released.
- 2025.06: I joined the Kuaishou OneRec Team as a Research Intern through the Kuaishou K-Star Project (快Star-X).
- 2025.05: Two papers are accepted by ACL 2025: OTT and OPT-Tree.
📖 Education
Supervised by Assoc. Prof. Juntao Li and Prof. Min Zhang; researching efficient LLMs.
💼 Work Experiences
- Build efficient LLM serving systems around quantization, model compression, and KV-cache optimization for production-scale Hunyuan models.
- Explore efficient attention architectures to reduce latency and memory pressure in long-context serving.
- Develop and maintain AngelSlim, an open-source model compression toolkit for practical LLM deployment.
- Develop low-precision training and inference for OneRec-V2, adapting LLM compression techniques to industrial generative recommendation.
- Design efficient attention mechanisms, including Kwai Summary Attention, for scalable recommendation foundation models.
- Improve LLM reasoning capabilities through algorithmic pipelines for general-domain inference and alignment.
- Design reward signals for applying reinforcement learning beyond verifiable tasks.
- Research KV-cache quantization and compression algorithms for efficient LLM inference.
- Bring KV-cache quantization techniques into Ascend-vLLM for deployment on Ascend NPUs.
📝 Selected Publications
Full list is available on my Google Scholar.
Yi Su, Xinchen Luo, Hongtao Cheng, Ziteng Shu, ..., Ruiming Tang
Yi Su, Zhenxu Tian, Dan Qiao, Yuechi Zhou, Juntao Li, Min Zhang
Yi Su, Dian Yu, Linfeng Song, Juntao Li, Haitao Mi, Zhaopeng Tu, Min Zhang, Dong Yu
Yi Su, Yuechi Zhou, Quantong Qiu, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang
Jikai Wang*, Yi Su*, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Min Zhang
Yi Su, Yunpeng Tai, Yixin Ji, Juntao Li, Bowen Yan, Min Zhang
Yi Su, Yixin Ji, Juntao Li, Hai Ye, Min Zhang
🎖 Honors and Awards
- 2025: Champion of the Huawei Shengsi Model Development Challenge
- 2025: Soochow University National Scholarship for Graduate Students
- 2025: Soochow University Graduate Outstanding Scholarship
- 2024: Champion of the Huawei Shengsi Model Development Challenge
- 2023: Soochow University Excellent Undergraduate Graduation Thesis
- 2023: Soochow University Outstanding Graduate Student