      Is Open Access

      SAIL: Self-Improving Efficient Online Alignment of Large Language Models

      Preprint


          Abstract

          Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which can lead to sub-optimal performance. On the other hand, recent literature has focused on designing online RLHF methods but still lacks a unified conceptual formulation and suffers from distribution shift issues. To address this, we establish that online LLM alignment is underpinned by bilevel optimization. By reducing this formulation to an efficient single-level first-order method (using the reward-policy equivalence), our approach generates new samples and iteratively refines model alignment by exploring responses and regulating preference labels. In doing so, we permit alignment methods to operate in an online and self-improving manner, as well as generalize prior online RLHF methods as special cases. Compared to state-of-the-art iterative RLHF methods, our approach significantly improves alignment performance on open-sourced datasets with minimal computational overhead.
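          As a rough illustration of the offline baseline the abstract names, the sketch below computes the standard DPO loss for a single preference pair. It relies on the reward-policy equivalence, under which the implicit reward is beta times the log-ratio of the policy to a reference policy (up to a term that depends only on the prompt). This is not the SAIL algorithm itself. The function name, argument names, and the default beta are assumptions chosen for illustration, and the inputs are assumed to be summed sequence log-probabilities.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    All inputs are log-probabilities of full responses given the same prompt.
    The names and the default beta are illustrative assumptions.
    """
    # Implicit rewards from the reward-policy equivalence:
    # r(x, y) = beta * log(pi(y|x) / pi_ref(y|x)), dropping the prompt-only term,
    # which cancels when two responses to the same prompt are compared.
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected

    # Loss is -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# Raising the chosen response's likelihood relative to the reference lowers the loss.
improved = dpo_loss(-1.0, -2.0, -1.5, -1.5)
```

          In an offline setting the pairs come from a fixed dataset. An online method of the kind the abstract describes would instead draw new responses from the current policy before computing a loss of this type.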

          Author and article information

          Date: 21 June 2024
          arXiv: 2406.15567
          Identifier: 942fb134-de56-49b4-b3d6-60bbaa35a38e
          License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          24 pages, 6 figures, 3 tables
          cs.LG cs.AI cs.CL stat.ML

          Theoretical computer science, Machine learning, Artificial intelligence
