

      Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning

      research-article


          Abstract

          Background

          Qualitative self- or parent-reports used in assessing children’s behavioral disorders are often inconvenient to collect and can be misleading due to missing information, rater biases, and limited validity. A data-driven approach to quantify behavioral disorders could alleviate these concerns. This study proposes a machine learning approach to identify screams in voice recordings that avoids the need to gather large amounts of clinical data for model training.

          Objective

          The goal of this study is to evaluate whether a machine learning model trained only on publicly available audio data sets can detect screaming sounds in audio streams captured in an at-home setting.

          Methods

          Two sets of audio samples were prepared to evaluate the model: a subset of the publicly available AudioSet data set and a set of audio data extracted from the TV show Supernanny, which was chosen for its similarity to clinical data. Scream events were manually annotated for the Supernanny data, and existing annotations were refined for the AudioSet data. Audio feature extraction was performed with a convolutional neural network pretrained on AudioSet. A gradient-boosted tree model was trained and cross-validated for scream classification on the AudioSet data and then validated independently on the Supernanny audio.
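The train, cross-validate, then validate-independently workflow described above can be sketched in a few lines. This is only an illustrative toy under loud assumptions: the 3-dimensional random vectors are hypothetical stand-ins for the CNN embeddings, the boosted model is a hand-rolled ensemble of depth-1 trees rather than whatever library the authors used, accuracy replaces the paper's metrics for brevity, and every function name and hyperparameter here is invented for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares fit of a depth-1 tree (one feature, one threshold) to the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]


def fit_boosted_stumps(X, y, rounds=10, lr=0.5):
    """Gradient boosting on log loss: each stump fits the current residuals y - p."""
    model, F = [], [0.0] * len(X)
    for _ in range(rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        j, t, lv, rv = fit_stump(X, residuals)
        model.append((j, t, lr * lv, lr * rv))  # learning rate baked into leaf values
        F = [fi + (lr * lv if row[j] <= t else lr * rv) for fi, row in zip(F, X)]
    return model


def predict_score(model, row):
    return sigmoid(sum(lv if row[j] <= t else rv for j, t, lv, rv in model))


def accuracy(model, X, y):
    hits = sum((predict_score(model, row) >= 0.5) == bool(yi) for row, yi in zip(X, y))
    return hits / len(y)


def make_clips(n, shift, rng):
    """Hypothetical embeddings: scream clips (label 1) are shifted on feature 0."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(shift * label, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_pub, y_pub = make_clips(120, 4.0, rng)   # stands in for the public training clips
X_home, y_home = make_clips(60, 4.0, rng)  # stands in for the independent recordings

# k-fold cross-validation on the "public" set
k, cv_scores = 3, []
for fold in range(k):
    train = [i for i in range(len(X_pub)) if i % k != fold]
    test = [i for i in range(len(X_pub)) if i % k == fold]
    m = fit_boosted_stumps([X_pub[i] for i in train], [y_pub[i] for i in train])
    cv_scores.append(accuracy(m, [X_pub[i] for i in test], [y_pub[i] for i in test]))

# Refit on all public data, then score once on the untouched independent set
final_model = fit_boosted_stumps(X_pub, y_pub)
print("cross-validated accuracy:", sum(cv_scores) / k)
print("independent-set accuracy:", accuracy(final_model, X_home, y_home))
```

The point of the structure is that the independent set never influences training or model selection, so its score reflects how the model transfers to recordings it was not tuned on.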

          Results

          On the held-out AudioSet clips, the model achieved an area under the receiver operating characteristic curve (ROC-AUC) of 0.86. The same model applied to three full episodes of Supernanny audio achieved an ROC-AUC of 0.95 and an average precision (positive predictive value) of 42%, even though screams made up only 1.3% (n=92/7166 seconds) of the total run time.
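To make the two reported metrics concrete, here is a minimal sketch of how ROC-AUC and average precision can be computed from per-second labels and scores, along with the scream prevalence implied by the 92 of 7166 seconds quoted above. The toy labels and scores are made up for illustration and are not data from the study; the function names are likewise hypothetical. A scorer that ranks seconds at random would be expected to reach an average precision close to the prevalence, which is the context in which the 42% figure should be read.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each positive example.

    Simplified: assumes at least one positive and does not treat tied scores specially.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up toy example: 1 = scream, 0 = no scream
labels = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.05, 0.3, 0.7]
print("ROC-AUC:", roc_auc(labels, scores))
print("average precision:", average_precision(labels, scores))

# Prevalence from the abstract: 92 scream seconds out of 7166 total seconds
prevalence = 92 / 7166
print("scream prevalence: {:.1%}".format(prevalence))
print("reported AP relative to prevalence: {:.1f}x".format(0.42 / prevalence))
```

Because ROC-AUC is insensitive to class imbalance while average precision is not, reporting both gives a fuller picture when the positive class is as rare as it is here.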

          Conclusions

          These results suggest that a scream-detection model trained with publicly available data could be valuable for monitoring clinical recordings and identifying tantrums, without depending on the collection of costly, privacy-protected clinical data for model training.


          Most cited references (23)


          CNN architectures for large-scale audio classification


            Audio Set: An ontology and human-labeled dataset for audio events


              Methodological issues in interviewing and using self-report questionnaires with people with mental retardation.

              In this article the authors review methodological issues that arise when interviews and self-report questionnaires are used with people with mental retardation and offer suggestions for overcoming some of the difficulties described. Examples are drawn from studies that use qualitative methodology, quantitative studies assessing different question types, and studies reporting on the development of instruments measuring psychiatric symptoms, self-concept, and quality of life. Specific problems that arise with respect to item content (e.g., quantitative judgments, generalizations), question phrasing (e.g., modifiers), response format (e.g., acquiescence, multiple-choice questions), and psychometric properties (factor structure and validity) are discussed. It is argued that because many self-report questionnaires include questions that have been found to be problematic in this population, more attention needs to be paid to establishing the validity of such measures and to clearly defining the population for which the instrument is designed.

                Author and article information

                Contributors
                Journal
                JMIR Form Res
                JFR
                JMIR Formative Research
                JMIR Publications (Toronto, Canada )
                2561-326X
                June 2020
                16 June 2020
                Volume: 4
                Issue: 6
                Article: e18279
                Affiliations
                [1 ] The Abigail Wexner Research Institute Nationwide Children's Hospital Columbus, OH United States
                [2 ] Department of Psychology Nationwide Children's Hospital Columbus, OH United States
                Author notes
                Corresponding Author: Emre Sezgin emre.sezgin@nationwidechildrens.org
                Author information
                https://orcid.org/0000-0003-4462-9735
                https://orcid.org/0000-0001-8798-9605
                https://orcid.org/0000-0002-0411-6073
                https://orcid.org/0000-0002-1924-9055
                https://orcid.org/0000-0003-2876-2042
                Article
                Publisher ID: v4i6e18279
                DOI: 10.2196/18279
                PMCID: PMC7327591
                PMID: 32459656
                ScienceOpen record ID: 73e08178-e122-4a29-ae7c-930068808d51
                ©Rebecca O'Donovan, Emre Sezgin, Sven Bambach, Eric Butter, Simon Lin. Originally published in JMIR Formative Research (http://formative.jmir.org), 16.06.2020.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on http://formative.jmir.org, as well as this copyright and license information must be included.

                History
                : 17 February 2020
                : 23 March 2020
                : 17 April 2020
                : 19 April 2020
                Categories
                Original Paper

                Keywords: machine learning, scream detection, audio event detection, tantrum identification, autism, behavioral disorder, data-driven approach
