TY - GEN
T1 - An effective contextual language modeling framework for speech summarization with augmented features
AU - Weng, Shi-Yan
AU - Lo, Tien-Hong
AU - Chen, Berlin
N1 - Publisher Copyright:
© 2021 European Signal Processing Conference, EUSIPCO. All rights reserved.
PY - 2021/1/24
Y1 - 2021/1/24
AB - The tremendous growth of multimedia content containing speech is driving an urgent need for efficient and effective automatic summarization methods. To this end, rapid progress has been made in applying supervised deep neural network-based methods to extractive speech summarization. More recently, the Bidirectional Encoder Representations from Transformers (BERT) model has achieved record-breaking success on many natural language processing (NLP) tasks, such as question answering and language understanding. In view of this, this paper contextualizes and enhances the state-of-the-art BERT-based model for speech summarization, making contributions that are at least three-fold. First, we explore the incorporation of confidence scores into sentence representations to examine whether doing so can alleviate the negative effects of imperfect automatic speech recognition (ASR). Second, we augment the sentence embeddings obtained from BERT with extra structural and linguistic features, such as sentence position and inverse document frequency (IDF) statistics. Finally, we validate the effectiveness of our proposed method on a benchmark dataset, comparing it with several classic and well-established speech summarization methods.
KW - BERT
KW - Confidence score
KW - Extractive speech summarization
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85099301811&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099301811&partnerID=8YFLogxK
U2 - 10.23919/Eusipco47968.2020.9287432
DO - 10.23919/Eusipco47968.2020.9287432
M3 - Conference contribution
AN - SCOPUS:85099301811
T3 - European Signal Processing Conference
SP - 316
EP - 320
BT - 28th European Signal Processing Conference, EUSIPCO 2020 - Proceedings
PB - European Signal Processing Conference, EUSIPCO
T2 - 28th European Signal Processing Conference, EUSIPCO 2020
Y2 - 24 August 2020 through 28 August 2020
ER -