      From categories to gradience: Auto-coding sociophonetic variation with random forests

      research-article


          Abstract

          The time-consuming nature of coding sociophonetic variables that are typically treated as categorical represents an impediment to addressing research questions around these variables that require large volumes of data. In this paper, we apply a machine learning method, random forest classification (Breiman, 2001), to automate coding (categorical prediction) of two English sociophonetic variables traditionally treated as categorical, non-prevocalic /r/ and word-medial intervocalic /t/, based on tokens’ acoustic signatures. We found good performance for binary classifiers of non-prevocalic /r/ (Absent versus Present) and medial /t/ (Voiced versus Voiceless), but not for medial /t/ with a six-way coding distinction (largely due to some codes being sparsely represented in the training data). This method also yields rankings of acoustic measures in terms of importance in classification. Beyond any individual measures, this method generates probabilistic predictions of variation (classifier probabilities) that represent a composite of the acoustic cues fed into the model. In a listening experiment, we found that not only did classifier probabilities significantly capture gradience in trained listeners’ perceptions of rhoticity, they better predicted listeners’ perceptions than individual acoustic measures. This method thus represents a new approach to reconciling the categorical and continuous dimensions of sociophonetic variation.
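          As a rough illustration of how a forest of trees can turn several acoustic cues into one graded "classifier probability", the sketch below fits a toy forest of one-split trees (stumps) in plain Python. It is not the authors' pipeline: the three measures (`cue_a`, `cue_b`, `noise`), the synthetic tokens, and all function names are hypothetical, and counting how often each measure is chosen for a split is only a crude stand-in for a proper importance ranking.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def fit_stump(rows, labels, features):
    """Best single-threshold split over the candidate features (lowest weighted Gini)."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                vote_left = 1 if sum(left) * 2 >= len(left) else 0
                vote_right = 1 if sum(right) * 2 >= len(right) else 0
                best = (score, f, t, vote_left, vote_right)
    return best

def fit_forest(rows, labels, n_trees=50, mtry=2, seed=1):
    """Each stump sees a bootstrap sample and a random subset of mtry features."""
    rng = random.Random(seed)
    n, n_feat = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        feats = rng.sample(range(n_feat), mtry)         # random feature subset
        stump = fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        if stump is not None:
            forest.append(stump[1:])                    # (feature, threshold, votes)
    return forest

def predict_proba(forest, row):
    """Classifier probability = share of trees voting for class 1 ("Present")."""
    votes = [vl if row[f] <= t else vr for f, t, vl, vr in forest]
    return sum(votes) / len(votes)

# Synthetic tokens (hypothetical): columns are cue_a, cue_b, noise.
data_rng = random.Random(0)
rows, labels = [], []
for label in (0, 1):                                    # 0 = "Absent", 1 = "Present"
    centre = 1.0 if label else -1.0
    for _ in range(60):
        rows.append([data_rng.gauss(centre, 0.7),
                     data_rng.gauss(centre, 0.7),
                     data_rng.gauss(0.0, 1.0)])
        labels.append(label)

forest = fit_forest(rows, labels)
p_present = predict_proba(forest, [2.5, 2.5, 0.0])      # both cues point to "Present"
p_absent = predict_proba(forest, [-2.5, -2.5, 0.0])     # both cues point to "Absent"
p_mixed = predict_proba(forest, [2.5, -2.5, 0.0])       # cues disagree
usage = Counter(f for f, _, _, _ in forest)             # how often each measure was split on
print(p_present, p_absent, p_mixed, dict(usage))
```

          In this toy setup, a token whose cues disagree receives an intermediate probability because trees that split on different measures vote differently, which is the sense in which the probability is a composite of the cues rather than a reading of any single one.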


          Most cited references (73)

          • Random Forests
          • Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1). ISSN 1548-7660
          • Collinearity: a review of methods to deal with it and a simulation study evaluating their performance

                Author and article information

                Journal: Laboratory Phonology: Journal of the Association for Laboratory Phonology
                Publisher: Ubiquity Press
                ISSN: 1868-6354
                Published: 10 June 2020
                Volume, issue, article: 11(1): 6
                Affiliations
                [1] Department of Linguistics, University of Pittsburgh, Pittsburgh, PA, US
                [2] New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Christchurch, NZ
                [3] Department of Linguistics, University of Canterbury, Christchurch, NZ
                Author information
                http://orcid.org/0000-0002-6070-1138
                http://orcid.org/0000-0003-3282-6555
                http://orcid.org/0000-0001-8127-0413
                http://orcid.org/0000-0002-2341-0921
                Article
                DOI: 10.5334/labphon.216
                Copyright: © 2020 The Author(s)

                This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.

                History
                24 July 2019
                11 April 2020
                Categories
                Journal article

                Subjects: Applied linguistics, General linguistics, Linguistics & Semiotics
                Keywords: machine learning, rhoticity, New Zealand English, sociophonetic variation
