Examining the Combination of Multi-Band Processing and Channel Dropout for Robust Speech Recognition
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. MTA-SZTE Research Group on Artificial Intelligence, Szeged, Hungary. ORCID iD: 0000-0002-0546-116x
Institute of Informatics, University of Szeged, Szeged, Hungary.
Department of Electrical Engineering (ESAT), KU Leuven, Leuven, Belgium.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0003-4029-6574
2019 (English). In: Proc. Interspeech 2019, The International Speech Communication Association (ISCA), 2019, p. 421-425. Conference paper, Published paper (Refereed)
Abstract [en]

A pivotal question in Automatic Speech Recognition (ASR) is the robustness of the trained models. In this study, we investigate the combination of two methods commonly applied to increase the robustness of ASR systems. On the one hand, inspired by auditory experiments and signal processing considerations, multi-band processing has been used for decades to improve the noise robustness of speech recognition. On the other hand, dropout is a commonly used regularization technique to prevent overfitting by keeping the model from becoming over-reliant on a small set of neurons. We hypothesize that the careful combination of the two approaches would lead to increased robustness, by preventing the resulting model from over-relying on any given band.

To verify our hypothesis, we investigate various approaches for the combination of the two methods using the Aurora-4 corpus. The results obtained corroborate our initial assumption, and show that the proper combination of the two techniques leads to increased robustness, and to significantly lower word error rates (WERs). Furthermore, we find that the accuracy scores attained here compare favourably to those reported recently on the clean training scenario of the Aurora-4 corpus.
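
The idea of applying dropout at the level of whole frequency bands can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `band_dropout`, the contiguous equal-width band layout, the drop probability, and the rule of always keeping at least one band are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def band_dropout(frames, n_bands, drop_prob, rng, training=True):
    """Zero out whole frequency bands in a list of feature frames.

    frames: list of frames, each a list of per-channel values.
    Hypothetical helper; band layout and keep rule are assumptions.
    """
    out = [list(frame) for frame in frames]  # work on a copy
    if not training or drop_prob <= 0.0 or not out:
        return out

    n_channels = len(out[0])
    # Assumption: bands are contiguous channel groups of near-equal width.
    edges = [(i * n_channels) // n_bands for i in range(n_bands + 1)]

    # One keep/drop decision per band, shared by all frames of the utterance.
    keep = [rng.random() >= drop_prob for _ in range(n_bands)]
    if not any(keep):
        # Assumption: never drop every band, so some input always remains.
        keep[rng.randrange(n_bands)] = True

    for band, kept in enumerate(keep):
        if kept:
            continue
        for frame in out:
            for channel in range(edges[band], edges[band + 1]):
                frame[channel] = 0.0
    return out


# Usage: 5 frames of 40 filterbank channels, split into 5 bands.
rng = random.Random(0)
features = [[1.0] * 40 for _ in range(5)]
masked = band_dropout(features, n_bands=5, drop_prob=0.5, rng=rng)
```

Dropping a band for the whole utterance, rather than per frame, is one possible design choice here; it forces the downstream model to recognize speech without that band at all, which is the kind of over-reliance the abstract describes.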

Place, publisher, year, edition, pages
The International Speech Communication Association (ISCA), 2019. p. 421-425
Series
Proceedings in ISCA Archive, ISSN 1990-9772
Keywords [en]
multi-band processing, band-dropout, robust speech recognition, Aurora-4
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-76905
DOI: 10.21437/Interspeech.2019-3215
ISI: 000831796400085
Scopus ID: 2-s2.0-85074721972
OAI: oai:DiVA.org:ltu-76905
DiVA, id: diva2:1373866
Conference
20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), 15-19 September, 2019, Graz, Austria
Available from: 2019-11-28 Created: 2019-11-28 Last updated: 2023-05-08 Bibliographically approved

Authority records

Kovács, György; Liwicki, Marcus
