About Me

Hi — I’m Pengfei Yu, working on agentic learning for large language models at Amazon AGI Foundations.

I am a major contributor to Nova2's agentic capabilities, which achieve state-of-the-art tool-integrated reasoning performance on academic benchmarks and strong results on agentic benchmarks. My current research focuses on agentic post-training (SFT & RL) to boost LLMs' general intelligence through more robust reinforcement learning frameworks, better context management, and improved reward signals. Previously, I worked at Boson AI, where I focused on language-model post-training and evaluation for role-playing games.

I earned a Ph.D. in Computer Science from UIUC, advised by Prof. Heng Ji. I was honored to have Prof. Jiawei Han, Prof. Derek Hoiem, Prof. Graham Neubig, and Dr. Scott Wen-tau Yih (alphabetically ordered) advise my Ph.D. thesis, Association Knowledge in Natural Language Learning. Before that, I earned Bachelor's degrees in Electronic Engineering and Mathematics at Tsinghua University, where I did NLP research under the guidance of Prof. Zhiyuan Liu.

What I work on

  • Agentic Post-Training: Effective, robust, and scalable SFT/RL recipes for LLM agents, including tool-integrated reasoning, search-based rollouts, context management, optimized agent contracts, and reward modeling.
  • Evaluation & safety for agents: Benchmarks, failure analysis, and guardrails for risky agent behaviors.
  • LLM Mechanisms: Understanding hallucination, knowledge editing mechanisms, and decision-making patterns.

Recent publications

For Featured Publications, please see my Research page. For a complete list, please visit my Google Scholar Profile.

  • SABER: safeguarding mutating steps in LLM agents with reflection + context management.
  • TauBench-Verified: curating a high-quality agentic benchmark for fair evaluation of LLM agents.
  • RPGBench: a benchmark for evaluating LLMs as role-playing game engines (game creation + simulation) with robust validity checks.
  • Event-based knowledge editing: defining editing boundaries and model behaviors under conflicting context.
  • Hallucination analysis: connecting hallucination to generalization and data bias, and proposing causal-inference-based detection and mitigation methods.

What else

I believe in using AI to accelerate scientific discovery. I think scientist–AI collaboration is one of the most meaningful near-term paths to impact, even as we move toward increasingly autonomous agents. In addition, I have broader interests in Diffusion & Flow-Matching Models, Pretraining Data Selection, Model Architecture Design, and ML Systems.

If you have any questions or collaboration ideas, please feel free to reach out.


Recent Posts

Parallel Generation Streams for LLMs

3 minute read

Parallel Generation Streams (PGS) allow models to branch, merge, and discard generation streams dynamically, enabling more sophisticated probability model...