Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

Abstract

Reinforcement Learning from Human Feedback (RLHF) has achieved impressive empirical success while relying on a small amount of human feedback. However, there is limited theoretical justification for this phenomenon. Additionally, most recent studies focus on value-based algorithms, despite the recent empirical successes of policy-based algorithms. In this work, we consider an RLHF algorithm based on policy optimization (PO-RLHF). The algorithm builds on the popular Policy Cover-Policy Gradient (PC-PG) algorithm, which assumes knowledge of the reward function. PO-RLHF does not assume knowledge of the reward function; instead, it uses trajectory-based comparison feedback to infer the reward function. We provide performance bounds for PO-RLHF with low query complexity, which offer insight into why a small amount of human feedback may be sufficient to achieve good performance with RLHF. We propose and analyze the algorithms PG-RLHF and NN-PG-RLHF for two important settings: linear and neural function approximation, respectively.
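To make "uses trajectory-based comparison feedback to infer the reward function" more concrete, here is a minimal sketch for the linear setting. It assumes a Bradley-Terry (logistic) comparison model, in which trajectory τ_a is preferred to τ_b with probability σ(θᵀ(φ(τ_a) − φ(τ_b))), and fits θ by L2-regularized maximum likelihood using plain gradient descent. The feature map φ, the function name `estimate_reward_params`, the synthetic data, and all hyperparameters are hypothetical illustrations; this is not the authors' PO-RLHF procedure.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def estimate_reward_params(feat_a, feat_b, prefs, reg=1e-2, lr=0.5, steps=2000):
    """Fit theta for an assumed linear reward r(tau) = theta @ phi(tau).

    feat_a, feat_b: (n, d) arrays of hypothetical trajectory features phi(tau).
    prefs: (n,) array with 1.0 if trajectory a was preferred, else 0.0.
    Assumes P(a preferred over b) = sigmoid(theta @ (phi_a - phi_b)).
    """
    diff = feat_a - feat_b
    n, d = diff.shape
    theta = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(diff @ theta)  # predicted preference probabilities
        # Gradient of the mean negative log-likelihood plus an L2 penalty.
        grad = diff.T @ (p - prefs) / n + reg * theta
        theta -= lr * grad
    return theta


# Synthetic usage: simulate comparisons from a known theta and try to recover it.
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])
n, d = 2000, 3
feat_a = rng.normal(size=(n, d))
feat_b = rng.normal(size=(n, d))
prefs = (rng.random(n) < sigmoid((feat_a - feat_b) @ true_theta)).astype(float)

theta_hat = estimate_reward_params(feat_a, feat_b, prefs)
cosine = theta_hat @ true_theta / (np.linalg.norm(theta_hat) * np.linalg.norm(true_theta))
print(theta_hat, cosine)
```

The usage lines compare the estimate with the true parameter by cosine similarity rather than exact values, since the L2 penalty slightly shrinks the estimate's magnitude.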

Bio

Dr. Yihan Du is currently a postdoctoral researcher at the University of Illinois Urbana-Champaign, working with Prof. R. Srikant. Her research interests lie in machine learning, with emphases on reinforcement learning and online learning. Dr. Du obtained her Ph.D. degree from the Institute for Interdisciplinary Information Sciences (headed by Prof. Andrew Chi-Chih Yao) at Tsinghua University in 2023. She has published several papers at top machine learning conferences, including ICML, NeurIPS, ICLR and AAAI. Dr. Du has also received several honors, including the China Computer Federation (CCF) Agent and Multi-Agent System Doctoral Dissertation Award and the Tsinghua Outstanding Doctoral Dissertation Award.

Event Contact: Iam-Choon Khoo

About

The School of Electrical Engineering and Computer Science was created in the spring of 2015 to allow greater access to courses offered by both departments for undergraduate and graduate students in exciting collaborative research fields.

We offer B.S. degrees in electrical engineering, computer science, computer engineering and data science, and graduate degrees (master's and Ph.D.) in electrical engineering and in computer science and engineering. EECS focuses on the convergence of technologies and disciplines to meet today’s industrial demands.

School of Electrical Engineering and Computer Science

The Pennsylvania State University

207 Electrical Engineering West

University Park, PA 16802

814-863-6740

Department of Computer Science and Engineering

814-865-9505

Department of Electrical Engineering

814-865-7039