
      Iterative improvements from feedback for language models

      Preprint
      In review
      research-article
      ScienceOpen Preprints
      ScienceOpen
      language models, reinforcement learning

            Abstract

            Iterative improvement from feedback is a general approach behind many, if not all, successful systems. Ground truth in the loop is critical. Language models (LMs) like ChatGPT are phenomenal; however, issues remain, such as hallucinations and a lack of planning and controllability. We may leverage LMs' language competence to handle tasks by prompting, fine-tuning, and augmenting with tools and APIs. AI aims for optimality. (Current) LMs are approximations and thus induce an LM-to-real gap. Our aim is to bridge this gap. Previous work shows that grounding, agency, and interaction are the cornerstones of sound and solid LMs. Iterative improvement from feedback is critical for further progress of LMs, and reinforcement learning is a promising framework, although pre-training followed by fine-tuning is a popular approach. Iterative updates are too expensive for monolithic large LMs, so smaller LMs are desirable and a modular architecture is preferred. These help make LMs adapt to humans, but not vice versa. We discuss challenges and opportunities, in particular data and feedback, methodology, evaluation, interpretability, constraints, and intelligence.


            Author and article information

            Journal
            ScienceOpen Preprints
            ScienceOpen
            7 July 2023
            Affiliations
            [1] RL4RealLife.org
            Author notes
            Author information
            https://orcid.org/0000-0002-4270-2487
            Article
            10.14293/PR2199.000220.v1
            454e6452-b1f3-40b6-bd25-8d62bfbeb661

            This work has been published open access under Creative Commons Attribution License CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

            History
            : 7 July 2023
            Categories

            Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
            Computer science, Artificial intelligence
