新穎基於預訓練語言表示模型於語音辨識重新排序之研究

Translated title of the contribution: Innovative Pretrained-based Reranking Language Models for N-best Speech Recognition Lists

Shih Hsuan Chiu, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper proposes two BERT-based models for accurately rescoring (reranking) N-best speech recognition hypothesis lists. Reranking the N-best hypothesis lists decoded with the acoustic model has been shown to improve the performance of two-stage automatic speech recognition (ASR) systems. Although pre-trained contextualized language models have achieved state-of-the-art performance in many NLP applications, there is a dearth of work investigating their effectiveness in ASR. In this paper, we develop simple yet effective methods for improving ASR by reranking the N-best hypothesis lists with BERT (bidirectional encoder representations from Transformers). Specifically, we treat N-best hypothesis reranking as a downstream task and simply fine-tune the pre-trained BERT. We propose two BERT-based reranking language models: (1) uniBERT, which leverages BERT to elicit an ideal unigram from a given N-best list so as to assist an LSTM language model (LSTMLM); and (2) classBERT, which treats N-best list reranking as a multi-class classification problem. Both models harness the power of BERT to rerank the N-best hypothesis lists generated in the initial ASR pass. Experiments on the benchmark AMI dataset show that the proposed reranking methods outperform the baseline LSTMLM, a strong and widely used competitor, with a 3.14% improvement in word error rate (WER).
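
The classBERT formulation in particular lends itself to a compact illustration. Below is a minimal sketch, not the authors' implementation, of scoring and reranking an N-best list with a fine-tunable BERT model, assuming the Hugging Face transformers and PyTorch packages; the function name rerank and the single-logit scoring head are illustrative assumptions.

# Minimal sketch of BERT-based N-best reranking (classBERT-style), assuming
# Hugging Face transformers; names and the scoring head are illustrative only.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# One logit per hypothesis; training would apply a softmax across the N
# candidates of each list and a cross-entropy loss against the oracle choice.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)
model.eval()

def rerank(nbest):
    """Score each ASR hypothesis with BERT and return the list best-first."""
    enc = tokenizer(nbest, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**enc).logits.squeeze(-1)  # shape: (N,)
    order = torch.argsort(scores, descending=True).tolist()
    return [nbest[i] for i in order]

# Toy 3-best list; before fine-tuning, the ordering is essentially arbitrary.
print(rerank([
    "i want to recognize speech",
    "i want to wreck a nice beach",
    "i won to recognize speech",
]))

In practice, the scores produced by such a fine-tuned BERT would be compared against (or, as in uniBERT, combined with) the LSTMLM scores used in the baseline reranking pass described in the abstract.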

Translated title of the contribution: Innovative Pretrained-based Reranking Language Models for N-best Speech Recognition Lists
Original language: Chinese (Traditional)
Title of host publication: ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing
Editors: Jenq-Haur Wang, Ying-Hui Lai, Lung-Hao Lee, Kuan-Yu Chen, Hung-Yi Lee, Chi-Chun Lee, Syu-Siang Wang, Hen-Hsen Huang, Chuan-Ming Liu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 148-162
Number of pages: 15
ISBN (Electronic): 9789869576932
Publication status: Published - 2020
Event: 32nd Conference on Computational Linguistics and Speech Processing, ROCLING 2020 - Taipei, Taiwan
Duration: 2020 Sept 24 - 2020 Sept 26

Publication series

Name: ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing

Conference

Conference: 32nd Conference on Computational Linguistics and Speech Processing, ROCLING 2020
Country/Territory: Taiwan
City: Taipei
Period: 2020/09/24 - 2020/09/26

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
