TY - GEN
T1 - Cross-utterance Reranking Models with BERT and Graph Convolutional Networks for Conversational Speech Recognition
AU - Chiu, Shih Hsuan
AU - Lo, Tien Hong
AU - Chao, Fu An
AU - Chen, Berlin
N1 - Publisher Copyright:
© 2021 APSIPA.
PY - 2021
Y1 - 2021
N2 - How to effectively incorporate cross-utterance information cues into a neural language model (LM) has emerged as one of the intriguing issues for automatic speech recognition (ASR). Existing research efforts on improving contextualization of an LM typically regard previous utterances as a sequence of additional input and may fail to capture complex global structural dependencies among these utterances. In view of this, we in this paper seek to represent the historical context information of an utterance as graph-structured data so as to distill cross-utterances, global word interaction relationships. To this end, we apply a graph convolutional network (GCN) on the resulting graph to obtain the corresponding GCN embeddings of historical words. GCN has recently found its versatile applications in social-network analysis, text summarization, and among others due mainly to its ability of effectively capturing rich relational information among elements. However, GCN remains largely underexplored in the context of ASR, especially for dealing with conversational speech. In addition, we frame ASR N-best reranking as a prediction problem, leveraging bidirectional encoder representations from transformers (BERT) as the vehicle to not only seize the local intrinsic word regularity patterns inherent in a candidate hypothesis but also incorporate the cross-utterance, historical word interaction cues distilled by GCN for promoting performance. Extensive experiments conducted on the AMI benchmark dataset seem to confirm the pragmatic utility of our methods, in relation to some current top-of-the-line methods.
AB - How to effectively incorporate cross-utterance information cues into a neural language model (LM) has emerged as one of the intriguing issues for automatic speech recognition (ASR). Existing research efforts on improving contextualization of an LM typically regard previous utterances as a sequence of additional input and may fail to capture complex global structural dependencies among these utterances. In view of this, we in this paper seek to represent the historical context information of an utterance as graph-structured data so as to distill cross-utterances, global word interaction relationships. To this end, we apply a graph convolutional network (GCN) on the resulting graph to obtain the corresponding GCN embeddings of historical words. GCN has recently found its versatile applications in social-network analysis, text summarization, and among others due mainly to its ability of effectively capturing rich relational information among elements. However, GCN remains largely underexplored in the context of ASR, especially for dealing with conversational speech. In addition, we frame ASR N-best reranking as a prediction problem, leveraging bidirectional encoder representations from transformers (BERT) as the vehicle to not only seize the local intrinsic word regularity patterns inherent in a candidate hypothesis but also incorporate the cross-utterance, historical word interaction cues distilled by GCN for promoting performance. Extensive experiments conducted on the AMI benchmark dataset seem to confirm the pragmatic utility of our methods, in relation to some current top-of-the-line methods.
KW - BERT
KW - GCN
KW - N-best hypothesis reranking
KW - automatic speech recognition
KW - cross-utterance
KW - language modeling
UR - http://www.scopus.com/inward/record.url?scp=85126688859&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126688859&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85126688859
T3 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
SP - 1104
EP - 1110
BT - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Y2 - 14 December 2021 through 17 December 2021
ER -