Discriminative autoencoders for acoustic modeling

Ming Han Yang, Hung Shin Lee, Yu Ding Lu, Kuan Yu Chen, Yu Tsao, Berlin Chen, Hsin Min Wang

Research output: Contribution to journal › Conference article › peer-review

5 Citations (Scopus)


Speech data typically contain information irrelevant to automatic speech recognition (ASR), such as speaker variability and channel/environmental noise, lurking deep within acoustic features. This unwanted information is entangled with the phonetic content and hinders the development of an ASR system. In this paper, we propose a new framework based on autoencoders for acoustic modeling in ASR. Unlike other variants of autoencoder neural networks, our framework is able to isolate phonetic components from a speech utterance by simultaneously taking two kinds of objectives into consideration. The first relates to the minimization of reconstruction errors and helps learn the most salient and useful properties of the data. The second functions in the middlemost code layer, where the categorical distribution of the context-dependent phone states is estimated for phoneme discrimination and the derivation of acoustic scores, the proximity relationship among utterances spoken by the same speaker is preserved, and the intra-utterance noise is modeled and abstracted away. We describe the implementation of the discriminative autoencoders for training tri-phone acoustic models and present TIMIT phone recognition results, which demonstrate that our proposed method outperforms the conventional DNN-based approach.
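The dual-objective idea in the abstract can be illustrated with a minimal forward-pass sketch: an encoder maps an acoustic feature frame to a code layer, from which one branch reconstructs the input (objective 1) and another estimates a categorical distribution over phone states (objective 2), with the two losses combined in a weighted sum. All dimensions, weights, and the mixing weight `alpha` below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): feature vector size,
# middlemost code layer size, and number of context-dependent phone states.
FEAT_DIM, CODE_DIM, NUM_STATES = 40, 32, 10

# Randomly initialized weights for a one-layer encoder/decoder sketch.
W_enc = rng.normal(scale=0.1, size=(CODE_DIM, FEAT_DIM))
W_dec = rng.normal(scale=0.1, size=(FEAT_DIM, CODE_DIM))
W_cls = rng.normal(scale=0.1, size=(NUM_STATES, CODE_DIM))

def forward(x):
    """Encode one feature frame, then decode it and classify its phone state."""
    code = np.tanh(W_enc @ x)            # middlemost code layer
    recon = W_dec @ code                 # reconstruction branch (objective 1)
    logits = W_cls @ code                # phone-state branch (objective 2)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax over phone states
    return code, recon, probs

def joint_loss(x, state, alpha=0.5):
    """Weighted sum of reconstruction error and phone-state cross-entropy."""
    _, recon, probs = forward(x)
    mse = np.mean((recon - x) ** 2)
    xent = -np.log(probs[state] + 1e-12)
    return alpha * mse + (1 - alpha) * xent

x = rng.normal(size=FEAT_DIM)            # one frame of acoustic features
loss = joint_loss(x, state=3)
print(float(loss))
```

In training, minimizing this joint loss by gradient descent would push the code layer to retain what is needed for reconstruction while shaping it for phone-state discrimination; the paper's additional speaker-proximity and noise-modeling terms are omitted here for brevity.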

Pages (from-to): 3557-3561
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2017
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: 20 Aug 2017 - 24 Aug 2017

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

