Hi, I am Yi Su (苏仪). I am a Senior Researcher at Tencent, working in the Hunyuan AI Infra Department in Shanghai as part of Tencent Project UP (青云计划).

I am also completing my M.S. at Soochow University, supervised by Assoc. Prof. Juntao Li and Prof. Min Zhang. My work focuses on efficient training and deployment of Large Language Models, especially through quantization, KV-cache optimization, efficient attention, and other acceleration techniques.

Research interests: Efficient LLMs · Quantization · KV Cache Optimization · Efficient Attention

📖 Education

Soochow University
M.S. in Computer Science · Institute of Computer Science and Technology · Suzhou
2023.09 - 2026.06

Supervised by Assoc. Prof. Juntao Li and Prof. Min Zhang. Researching Efficient LLMs.

Soochow University
B.Eng. in Computer Science · Institute of Computer Science and Technology · Suzhou
2019.09 - 2023.06

💼 Work Experiences

Tencent · Hunyuan AI Infra Department
Senior Researcher · Tencent Project UP (青云计划)
2026.03 - Present · Shanghai
  • Build efficient LLM serving systems around quantization, model compression, and KV-cache optimization for production-scale Hunyuan models.
  • Explore efficient attention architectures to reduce latency and memory pressure in long-context serving.
  • Develop and maintain AngelSlim, an open-source model compression toolkit for practical LLM deployment.
Kuaishou · OneRec Team
Research Intern · K-Star (快Star-X)
2025.06 - 2026.02 · Beijing
  • Developed low-precision training and inference for OneRec-V2, adapting LLM compression techniques to industrial generative recommendation.
  • Designed efficient attention mechanisms, including Kwai Summary Attention, for scalable recommendation foundation models.
Tencent · AI Lab
Research Intern
2024.11 - 2025.06 · Shenzhen
  • Improved LLM reasoning capabilities through algorithmic pipelines for general-domain inference and alignment.
  • Designed reward signals for applying reinforcement learning beyond verifiable tasks.
Huawei Cloud · AI System Innovation Lab
Research Intern
2024.08 - 2024.11 · Hangzhou
  • Researched KV-cache quantization and compression algorithms for efficient LLM inference.
  • Integrated KV-cache quantization techniques into Ascend-vLLM for deployment on Ascend NPUs.

📝 Selected Publications

A full list is available on my Google Scholar profile.

arXiv 2026 Kwai Summary Attention Technical Report

Kuaishou OneRec Team

arXiv 2026 Quantized Inference for OneRec-V2

Yi Su, Xinchen Luo, Hongtao Cheng, Ziteng Shu, ..., Ruiming Tang

arXiv 2026 LongFlow: Efficient KV Cache Compression for Reasoning Models

Yi Su, Zhenxu Tian, Dan Qiao, Yuechi Zhou, Juntao Li, Min Zhang

arXiv 2025 OneRec-V2 Technical Report

Kuaishou OneRec Team

ACL 2026 Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

Yi Su, Dian Yu, Linfeng Song, Juntao Li, Haitao Mi, Zhaopeng Tu, Min Zhang, Dong Yu

ACL 2025 Accurate KV Cache Quantization with Outlier Tokens Tracing

Yi Su, Yuechi Zhou, Quantong Qiu, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang

ACL 2025 OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure

Jikai Wang*, Yi Su*, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Min Zhang

ACL 2024 Demonstration Augmentation for Zero-shot In-context Learning

Yi Su, Yunpeng Tai, Yixin Ji, Juntao Li, Bowen Yan, Min Zhang

🎖 Honors and Awards

  • 2025: Champion of the Huawei Shengsi Model Development Challenge
  • 2025: Soochow University National Scholarship for Graduate Students
  • 2025: Soochow University Graduate Outstanding Scholarship
  • 2024: Champion of the Huawei Shengsi Model Development Challenge
  • 2023: Soochow University Outstanding Undergraduate Graduation Thesis
  • 2023: Soochow University Outstanding Graduate Student