Exploring low-dimensional structures of modulation spectra for robust speech recognition

Bi Cheng Yan, Chin Hong Shih, Shih Hung Liu, Berlin Chen

Research output: Contribution to journal › Conference article

Abstract

The development of noise-robustness techniques is vital to the success of automatic speech recognition (ASR) systems in the face of varying sources of environmental interference. Recent studies have shown that exploring low-dimensional structures of speech features can yield good robustness. In this vein, research on low-rank representation (LRR), which considers the intrinsic structures of speech features lying on low-dimensional subspaces, has gained considerable interest from the ASR community. When speech features are contaminated with various types of environmental noise, their corresponding modulation spectra can be regarded as superpositions of unstructured sparse noise over the inherent linguistic information. In this paper, we therefore explore the low-dimensional structures of modulation spectra in the hope of obtaining more noise-robust speech features. The main contribution is a novel use of the LRR-based method to discover the subspace structures of modulation spectra, thereby alleviating the negative effects of noise interference. We also extensively compare our approach with several widely used feature-based normalization methods. All experiments were conducted and verified on the Aurora-4 database and task. The empirical results show that the proposed LRR-based method can provide significant word error reductions for a typical DNN-HMM hybrid ASR system.

Original language: English
Pages (from-to): 3637-3641
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
DOI: 10.21437/Interspeech.2017-611
Publication status: Published - 2017 Jan 1
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: 2017 Aug 20 to 2017 Aug 24

Fingerprint

Robust Speech Recognition
Speech recognition
Automatic Speech Recognition
Modulation
Interference
Subspace
Noise Robustness
Error Reduction
Acoustic noise
Linguistics
Normalization
Superposition
Robustness
Speech
Experiment

Keywords

  • Deep neural network
  • Low-rank representation
  • Modulation spectrum
  • Robustness
  • Sparse representation

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

Exploring low-dimensional structures of modulation spectra for robust speech recognition. / Yan, Bi Cheng; Shih, Chin Hong; Liu, Shih Hung; Chen, Berlin.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2017-August, 01.01.2017, p. 3637-3641.

@article{30d5aa3ac7f048e0bbc45b59d479fb73,
title = "Exploring low-dimensional structures of modulation spectra for robust speech recognition",
abstract = "The development of noise-robustness techniques is vital to the success of automatic speech recognition (ASR) systems in the face of varying sources of environmental interference. Recent studies have shown that exploring low-dimensional structures of speech features can yield good robustness. In this vein, research on low-rank representation (LRR), which considers the intrinsic structures of speech features lying on low-dimensional subspaces, has gained considerable interest from the ASR community. When speech features are contaminated with various types of environmental noise, their corresponding modulation spectra can be regarded as superpositions of unstructured sparse noise over the inherent linguistic information. In this paper, we therefore explore the low-dimensional structures of modulation spectra in the hope of obtaining more noise-robust speech features. The main contribution is a novel use of the LRR-based method to discover the subspace structures of modulation spectra, thereby alleviating the negative effects of noise interference. We also extensively compare our approach with several widely used feature-based normalization methods. All experiments were conducted and verified on the Aurora-4 database and task. The empirical results show that the proposed LRR-based method can provide significant word error reductions for a typical DNN-HMM hybrid ASR system.",
keywords = "Deep neural network, Low-rank representation, Modulation spectrum, Robustness, Sparse representation",
author = "Yan, {Bi Cheng} and Shih, {Chin Hong} and Liu, {Shih Hung} and Chen, Berlin",
year = "2017",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2017-611",
language = "English",
volume = "2017-August",
pages = "3637--3641",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR

T1 - Exploring low-dimensional structures of modulation spectra for robust speech recognition

AU - Yan, Bi Cheng

AU - Shih, Chin Hong

AU - Liu, Shih Hung

AU - Chen, Berlin

PY - 2017/1/1

Y1 - 2017/1/1

AB - The development of noise-robustness techniques is vital to the success of automatic speech recognition (ASR) systems in the face of varying sources of environmental interference. Recent studies have shown that exploring low-dimensional structures of speech features can yield good robustness. In this vein, research on low-rank representation (LRR), which considers the intrinsic structures of speech features lying on low-dimensional subspaces, has gained considerable interest from the ASR community. When speech features are contaminated with various types of environmental noise, their corresponding modulation spectra can be regarded as superpositions of unstructured sparse noise over the inherent linguistic information. In this paper, we therefore explore the low-dimensional structures of modulation spectra in the hope of obtaining more noise-robust speech features. The main contribution is a novel use of the LRR-based method to discover the subspace structures of modulation spectra, thereby alleviating the negative effects of noise interference. We also extensively compare our approach with several widely used feature-based normalization methods. All experiments were conducted and verified on the Aurora-4 database and task. The empirical results show that the proposed LRR-based method can provide significant word error reductions for a typical DNN-HMM hybrid ASR system.

KW - Deep neural network

KW - Low-rank representation

KW - Modulation spectrum

KW - Robustness

KW - Sparse representation

UR - http://www.scopus.com/inward/record.url?scp=85039151148&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85039151148&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2017-611

DO - 10.21437/Interspeech.2017-611

M3 - Conference article

VL - 2017-August

SP - 3637

EP - 3641

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -