Adam Gleave
imitation: Clean Imitation Learning Implementations
imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch. We include three inverse …
Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell
Calculus on MDPs: Potential Shaping as a Gradient
In reinforcement learning, different reward functions can be equivalent in terms of the optimal policies they induce. A particularly …
Erik Jenner, Herke Van Hoof, Adam Gleave
Reducing Exploitability with Population Based Training
Self-play reinforcement learning has achieved state-of-the-art, and often superhuman, performance in a variety of zero-sum games. Yet …
Pavel Czempin, Adam Gleave
A Primer on Maximum Causal Entropy Inverse Reinforcement Learning
Inverse Reinforcement Learning (IRL) algorithms infer a reward function that explains demonstrations provided by an expert acting in …
Adam Gleave, Sam Toyer
Uncertainty Estimation for Language Reward Models
Language models can learn a range of capabilities from unsupervised training on text corpora. However, to solve a particular problem …
Adam Gleave, Geoffrey Irving