Experience

Industry

Applied Scientist · 2025.02 – Present
Building Amazon Nova (2.0 and beyond) post-training and reasoning systems.

Agentic Post-Training · 2025.06 – Present

Designed agentic SFT/RL recipes with tool use (Python interpreter + external tools).
Built efficient inference infrastructure for data curation and online rollouts.
Built and maintain an agentic RL framework (training + evaluation + debugging).
Ongoing: scaling and reliability improvements.

Multi-Modal Reasoning · 2025.02 – 2025.05

Applied Scientist · 2024.08 – 2025.02
Post-training and evaluations for LLM alignment and usefulness.

Developed internal DPO variants to improve alignment.
Built diagnostic pipelines for preference alignment (data vs. reward-model issues).
Created RPGBench, a benchmark for evaluating LLMs as role-playing game engines.
Optimized data synthesis and evaluation pipelines for inference efficiency.

University of Illinois Urbana–Champaign

PhD in Computer Science · 2019.08 – 2024.08
Advisor: Prof. Heng Ji

Tsinghua University

BE in Electronic Engineering; secondary BS in Mathematics · 2015.09 – 2019.07
NLP research advised by Prof. Zhiyuan Liu.