I’m currently a researcher at OpenAI, where I work on reinforcement learning, small models, and synthetic data. I led the release of 4o-mini and have contributed to other models such as o*-mini and o3.
Before that, I worked at Hudson River Trading and Meta AI, where I researched flavors of sequential decision making and deep learning.
I graduated from UC Berkeley, where I worked on reinforcement learning and modeling offline sequence data. I was fortunate to be advised by Pieter Abbeel and Igor Mordatch.
Email: matches my arXiv papers
Blog
- Jun 2025 — AI Models for Pokemon Games
  What can Pokemon teach us about designing interactive agents?
- Mar 2024 — Spending Inference Time
  How should we structure inference compute to maximize performance?
- Feb 2024 — LoRAs as Composable Programs
  How can we design LLMs to be future-proof operating systems?
- Jan 2024 — Unifying RLHF Objectives
  What are different RL algorithms actually doing?
Papers
- Summary — Towards a Universal Decision Making Paradigm
  How can we design a universal learning method for sequential decision making?
- Jun 2021 — Decision Transformer
  How can we perform reinforcement learning with autoregressive sequence models?
- Mar 2021 — Pretrained Transformers as Universal Computation Engines
  What are the limits of transfer of large pretrained language models?
