

      Broadcast Language Identification & Subtitling System (BLISS)

Wang et al. (affiliations 1, 2, 3, 2, 4)

      Proceedings of the 32nd International BCS Human Computer Interaction Conference (HCI)

      Human Computer Interaction Conference

      4 - 6 July 2018

Keywords: Automatic Speech Recognition (ASR), Accent, Automated Subtitling, Background Noise, BLISS, Human-Computer Interaction, Kaldi, LibriSpeech


          Abstract

Accessibility is an important area of Human Computer Interaction (HCI), and regulations in many countries mandate that broadcast media content be accessible to all. Currently, most subtitles for offline and live broadcasts are produced by people. However, subtitling methods employing re-speaking with Automatic Speech Recognition (ASR) technology are increasingly replacing manual methods. We discuss here the subtitling component of BLISS (Broadcast Language Identification & Subtitling System), an ASR system for automated subtitling and broadcast monitoring built using the Kaldi ASR Toolkit. The BLISS Gaussian Mixture Model (GMM)/Hidden Markov Model (HMM) acoustic model has been trained on ~960 hours of read speech, and its language model on ~900k words, combined with a 200k-word pronunciation dictionary, from the LibriSpeech corpus. In tests on ~5 hours of unseen clean speech test data with little background noise and seen accents, BLISS gives a recognition accuracy of 91.87% based on the WER (Word Error Rate) metric. For ~5 hours of unseen challenge speech test data, with higher-WER speakers, BLISS's accuracy drops to 75.91%. A BLISS Deep Neural Network (DNN) acoustic model has also been trained on ~100 hours of read speech data. Its accuracy for ~5 hours of unseen clean and unseen challenge speech test data is 92.88% and 77.27% respectively, based on WER. Future work includes training the DNN model on ~960 hours of read speech data using CUDA GPUs and incorporating algorithms for background noise reduction. The BLISS core engine is also intended as a Language Identification system for broadcast monitoring (BLIS). This paper focuses on its subtitling application (BLSS).
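The accuracy figures quoted above follow from the standard Word Error Rate definition, WER = (substitutions + deletions + insertions) / number of reference words, with accuracy reported as 100% minus WER (so 91.87% accuracy corresponds to a WER of 8.13%). As a minimal illustrative sketch, not the BLISS evaluation code (Kaldi ships its own scoring tools), the metric can be computed with a word-level edit distance; the function name `wer` here is ours:

```python
# Illustrative word-error-rate (WER) computation via word-level
# Levenshtein distance. Not Kaldi's scorer; for demonstration only.

def wer(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

if __name__ == "__main__":
    reference = "accessibility is an important area of hci"
    hypothesis = "accessibility is important area of hci"  # one deletion
    w = wer(reference, hypothesis)
    print(f"WER = {w:.2%}, accuracy = {1 - w:.2%}")  # WER = 14.29%
```

Recognition accuracy as reported in the abstract is then simply 100% minus the WER computed this way over the test transcripts.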


                Author and article information

Conference: July 2018
Pages: 1-6
Affiliations
[1] Faculty of Computing, Engineering & Built Environment, Ulster University (Magee), Derry/Londonderry, BT48 7JL, Northern Ireland
[2] Faculty of Computing, Engineering & Built Environment, Ulster University (Jordanstown), Newtownabbey, BT37 0QB, Northern Ireland
[3] Department of Computing, Letterkenny Institute of Technology (LYIT), Port Road, Letterkenny, F92 FC93, Co. Donegal, Ireland
[4] Faculty of Arts, Humanities & Social Sciences, Ulster University (Magee), Derry/Londonderry, BT48 7JL, Northern Ireland
Article
DOI: 10.14236/ewic/HCI2018.150
© Wang et al. Published by BCS Learning and Development Ltd. Proceedings of British HCI 2018. Belfast, UK.

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Proceedings of the 32nd International BCS Human Computer Interaction Conference (HCI 32)
Belfast, UK, 4-6 July 2018
Series: Electronic Workshops in Computing (eWiC)
Conference: Human Computer Interaction Conference
ISSN: 1477-9358
Publisher: BCS Learning & Development
Journal page: https://ewic.bcs.org/
Categories: Electronic Workshops in Computing
