


      Towards a Universal Scaling Law of LLM Training and Inference

      Preprint
      In review
      research-article
      ScienceOpen Preprints
      ScienceOpen
      large language model, scaling law

            Abstract

            Guided by the prophecy of scaling law, large language models (LLMs) demonstrate higher levels of intelligence with increased size and computational power. Meanwhile, the overall performance of small LLMs also appears to follow a scaling trend when a higher inference cost is paid for prompting and sampling. However, the inherent relatedness between training and inference on the path of scaling up is less studied. In this article, we present a universal theory on the joint computational scaling of LLM training and inference, which characterizes the general behaviors of LLMs in various settings. Based on simple modeling of several key hyperparameters, we give intuitive explanations for the effectiveness of various techniques at both training and inference time. To explain the limitations of the current inference paradigm, we further propose the concept of meta-scaling to address the problem of error accumulation in the inference scaling process. We hope that this work can provide insight into future LLM research, development, and applications.
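            For readers unfamiliar with the shape of a training scaling law, the sketch below evaluates the Chinchilla-style parametric loss form from the scaling-law literature (Hoffmann et al., 2022). This is an illustrative assumption, not necessarily the formulation proposed in this preprint; the constants are the published Chinchilla fits.

```python
# Illustrative only: Chinchilla-style parametric scaling law
# L(N, D) = E + A / N^alpha + B / D^beta
# where N = parameter count, D = training tokens. Constants below are
# the published Chinchilla fits, used here purely for demonstration.

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss as a function of model size and data."""
    E, A, B = 1.69, 406.4, 410.7       # irreducible loss and fitted scales
    alpha, beta = 0.34, 0.28           # fitted exponents for N and D
    return E + A / n_params**alpha + B / n_tokens**beta

# Larger models trained on more tokens approach the irreducible loss E.
small = predicted_loss(1e9, 20e9)      # ~1B params, 20B tokens
large = predicted_loss(70e9, 1.4e12)   # ~70B params, 1.4T tokens
print(small, large)
```

Under this form, both terms decay as power laws, so doubling either model size or data alone yields diminishing returns, which is the basic intuition behind jointly scaling training and inference compute.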

            Content

            Author and article information

            Journal
            ScienceOpen Preprints
            ScienceOpen
            17 September 2024
            Affiliations
            [1 ] Huawei Noah's Ark Lab;
            Author notes
            Author information
            https://orcid.org/0000-0001-5730-8792
            Article
            10.14293/PR2199.001074.v1
            ff29502d-50ee-4ed5-9631-a5731d211dcc

            This work has been published open access under the Creative Commons Attribution License CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use, and publishing policy can be found at www.scienceopen.com.

            History
            17 September 2024
            Categories

            All data generated or analysed during this study are included in this published article (and its supplementary information files).
            Computer science, Artificial intelligence
