
      PSST! Prosodic Speech Segmentation with Transformers

      Preprint


          Abstract

          Self-attention mechanisms have enabled transformers to achieve superhuman performance on many speech-to-text (STT) tasks, yet automatic prosodic segmentation has remained unsolved. In this paper we finetune Whisper, a pretrained STT model, to annotate intonation unit (IU) boundaries by repurposing low-frequency tokens. Our approach achieves an accuracy of 95.8%, outperforming previous methods without the need for large-scale labeled data or enterprise-grade compute resources. We also diminish input signals by applying a series of filters, finding that low-pass filters with a 3.2 kHz cutoff improve segmentation performance in out-of-sample and out-of-distribution contexts. We release our model as both a transcription tool and a baseline for further improvements in prosodic segmentation.
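          The input-filtering step described in the abstract can be sketched with SciPy. Only the 3.2 kHz cutoff comes from the text; the function name, the filter order, and the choice of a zero-phase Butterworth design are illustrative assumptions, not the authors' exact pipeline.

          ```python
          import numpy as np
          from scipy.signal import butter, sosfiltfilt

          def low_pass(audio: np.ndarray, sr: int = 16_000,
                       cutoff_hz: float = 3200.0, order: int = 5) -> np.ndarray:
              """Apply a zero-phase Butterworth low-pass filter to a mono signal."""
              sos = butter(order, cutoff_hz, btype="low", output="sos", fs=sr)
              return sosfiltfilt(sos, audio)

          # Toy signal: a 200 Hz tone (inside the passband) plus a 6 kHz tone
          # (above the 3.2 kHz cutoff, so it is strongly attenuated).
          sr = 16_000
          t = np.arange(sr) / sr
          mixed = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 6000 * t)
          filtered = low_pass(mixed, sr)
          ```

          The zero-phase (forward-backward) filtering avoids shifting boundary timings, which matters if the filtered audio is then aligned against IU boundary annotations.
          
          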


          Author and article information

          Published: 03 February 2023 (preprint)
          arXiv: 2302.01984
          License: http://creativecommons.org/licenses/by/4.0/
          Comments: 5 pages, 3 figures. For associated repository, see https://github.com/Nathan-Roll1/psst
          arXiv categories: cs.CL, cs.SD, eess.AS
          Subject areas: Theoretical computer science, Electrical engineering, Graphics & Multimedia design
