      Is Open Access

      Words by the tail: Assessing lexical diversity in scholarly titles using frequency-rank distribution tail fits

      research-article


          Abstract

          This research assesses the evolution of lexical diversity in scholarly titles using a new indicator based on Zipfian frequency-rank distribution tail fits. At the operational level, both head and tail fits of Zipfian word distributions are more independent of corpus size than other lexical diversity indicators, but tail fits clearly outperform head fits in that regard. This benchmark-setting performance of Zipfian distribution tails makes it possible to distinguish actual patterns in lexical diversity from the statistical noise that other indicators generate when corpus size fluctuates. Empirically, analysis of Web of Science (WoS) article titles from 1975 to 2014 shows that the lexical concentration of scholarly titles in Natural Sciences & Engineering (NSE) and Social Sciences & Humanities (SSH) articles increased by a little less than 8% over the whole period. With the exception of the lexically concentrated Mathematics, Earth & Space, and Physics, NSE article titles all increased in lexical concentration, suggesting a probable convergence of concentration levels in the near future. As regards SSH disciplines, aggregation effects observed at the disciplinary group level suggest that, behind the stable concentration levels of SSH disciplines, a cross-disciplinary homogenization of the highest word frequency ranks may be at work. Overall, these trends suggest a progressive standardization of wording in scientific article titles, which are written using an increasingly restricted and cross-disciplinary set of words.
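
          The abstract does not spell out the fitting procedure, so the following is only a minimal sketch of what a head fit and a tail fit of a frequency-rank distribution could look like. It assumes an ordinary least-squares line in log-log space and a hypothetical `split_rank` separating head from tail; the function names are illustrative and are not taken from the authors' scripts, which are available at the OSF link given under Custom metadata.

```python
import math
from collections import Counter


def rank_frequency(titles):
    """Return word frequencies sorted from most to least frequent.

    Tokenization here is a plain lowercase whitespace split (an assumption).
    """
    counts = Counter(word for title in titles for word in title.lower().split())
    return sorted(counts.values(), reverse=True)


def fit_loglog_slope(freqs, start_rank, end_rank):
    """Least-squares slope of log(frequency) against log(rank).

    Ranks are 1-based and the range start_rank..end_rank is inclusive.
    """
    ranks = range(start_rank, end_rank + 1)
    xs = [math.log(r) for r in ranks]
    ys = [math.log(freqs[r - 1]) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def head_and_tail_slopes(freqs, split_rank):
    """Fit the head (ranks 1..split_rank) and the tail (remaining ranks) separately."""
    head = fit_loglog_slope(freqs, 1, split_rank)
    tail = fit_loglog_slope(freqs, split_rank + 1, len(freqs))
    return head, tail


# Synthetic, ideal Zipf input (frequency proportional to 1/rank), not real title data.
freqs = [1000.0 / r for r in range(1, 201)]
head, tail = head_and_tail_slopes(freqs, split_rank=20)
# For this ideal input both slopes are numerically close to -1.
```

          On real titles, `rank_frequency` would supply the frequency list, and the tail slope could then be tracked across years or disciplines. Where the split between head and tail is placed is a modelling decision that this sketch leaves open.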

                Author and article information

                Contributors
                Roles: Conceptualization; Formal analysis; Methodology; Software; Validation
                Roles: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Resources; Software; Validation; Visualization; Writing – original draft; Writing – review & editing
                Roles: Conceptualization; Data curation; Writing – review & editing
                Roles: Conceptualization; Funding acquisition; Supervision; Validation; Writing – review & editing
                Role: Editor
                Journal
                Title: PLoS ONE
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Publication date: 9 July 2018
                Volume: 13
                Issue: 7
                Article number: e0197775
                Affiliations
                [1] École de Bibliothéconomie et des Sciences de l’information, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal, QC, H3C 3J7, Canada
                [2] Centre for Science and Technology Studies, Leiden University, P.O. Box 905, 2300 AX Leiden, The Netherlands
                [3] Observatoire des Sciences et des Technologies (OST), Centre Interuniversitaire de Recherche sur la Science et la Technologie (CIRST), Université du Québec à Montréal, CP 8888, Succ. Centre-Ville, Montréal, QC, H3C 3P8, Canada
                KU Leuven, BELGIUM
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0002-0760-0290
                Article
                Manuscript number: PONE-D-18-02903
                DOI: 10.1371/journal.pone.0197775
                PMCID: PMC6037356
                PMID: 29985920
                e78683ec-f6d6-4dbc-b423-74c2b7f9cdfe
                © 2018 Bérubé et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 27 January 2018
                Accepted: 8 May 2018
                Page count
                Figures: 9, Tables: 5, Pages: 31
                Funding
                The author(s) received no specific funding for this work.
                Categories
                Research Article
                Social Sciences
                Linguistics
                Semantics
                Social Sciences
                Linguistics
                Computational Linguistics
                Social Sciences
                Linguistics
                Languages
                Social Sciences
                Social Sciences
                Linguistics
                Grammar
                Phonology
                Vocabulary
                Engineering and Technology
                Physical Sciences
                Physics
                Mathematical Physics
                Social Sciences
                Linguistics
                Grammar
                Syntax
                Custom metadata
                All data and scripts used to generate the different lexical distributions can be accessed via the Open Science Framework at https://osf.io/hxrua/.
