I’m currently a researcher at OpenAI, where I work on reinforcement learning, small models, and synthetic data. I led the release of 4o-mini and have contributed to other models such as o*-mini and o3.
Before that, I worked at Hudson River Trading and Meta AI, where I researched flavors of sequential decision making and deep learning.
I graduated from UC Berkeley, where I worked on reinforcement learning and modeling offline sequence data. I was fortunate to be advised by Pieter Abbeel and Igor Mordatch.
Email: matches my arXiv papers
Blog
- Jun 2025 — AI Models for Pokemon Games
  What can Pokemon teach us about designing interactive agents?
- Mar 2024 — Spending Inference Time
  How should we structure inference compute to maximize performance?
- Feb 2024 — LoRAs as Composable Programs
  How can we design LLMs to be future-proof operating systems?
- Jan 2024 — Unifying RLHF Objectives
  What are different RL algorithms actually doing?
Papers
- Summary — Towards a Universal Decision Making Paradigm
  How can we design a universal learning method for sequential decision making?
- Jun 2021 — Decision Transformer
  How can we perform reinforcement learning with autoregressive sequence models?
- Mar 2021 — Pretrained Transformers as Universal Computation Engines
  What are the limits of transfer of large pretrained language models?
