In this two-year research project, we set out to design and develop novel modeling frameworks for speech summarization. The project was motivated by the urgent need for efficient and effective automatic summarization methods to cope with the vast amounts of multimedia data associated with speech. In recent years, we have witnessed rapid progress in applying supervised deep neural network-based methods to extractive speech summarization. Notably, the Bidirectional Encoder Representations from Transformers (BERT) model has achieved record-breaking success on many natural language processing (NLP) tasks, such as question answering and language understanding. In view of this, apart from various recurrent neural network (RNN) based summarization models, we adopted and contextualized the state-of-the-art BERT-based model for speech summarization. The major contributions of this project are at least three-fold. First, we explored the incorporation of confidence scores into sentence representations to investigate whether doing so can alleviate the negative effects of imperfect automatic speech recognition (ASR). Second, we augmented the sentence embeddings obtained from BERT with extra structural and linguistic features, such as sentence position and inverse document frequency (IDF) statistics. Third, we confirmed the efficacy of our proposed methods on a benchmark dataset, comparing them with several classic and widely used speech summarization methods. Results of this project have been presented at several premier international conferences, including ICASSP 2019, ASRU 2019, APSIPA 2019, ICASSP 2020, Interspeech 2020, APSIPA 2020, and EUSIPCO 2020, as well as in prestigious journals, such as ACM Transactions on Asian and Low-Resource Language Information Processing.
In what follows, we focus exclusively on the efforts made to address the above-mentioned challenges and their corresponding results, based on material published at ICASSP 2019, ICASSP 2020, and EUSIPCO 2020.
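As a minimal sketch of how the first two contributions could be combined, the snippet below down-weights each sentence representation by its ASR confidence and appends structural/linguistic features (normalized sentence position and an IDF statistic). The function name and plain-list inputs are hypothetical placeholders; in the actual work the representations come from BERT.

```python
def augment_sentence_features(embeddings, confidences, idf_scores):
    """Fuse ASR confidence and extra features with sentence embeddings.

    embeddings  : list of sentence embedding vectors (lists of floats)
    confidences : per-sentence ASR confidence scores in [0, 1]
    idf_scores  : per-sentence IDF statistics

    Returns one augmented feature vector per sentence. This is an
    illustrative sketch, not the exact fusion used in the project.
    """
    n = len(embeddings)
    augmented = []
    for i, (emb, conf, idf) in enumerate(zip(embeddings, confidences, idf_scores)):
        # Down-weight the embedding of sentences with low ASR confidence,
        # so likely misrecognized sentences contribute less.
        weighted = [conf * x for x in emb]
        # Append structural/linguistic features: normalized position and IDF.
        position = i / max(n - 1, 1)
        augmented.append(weighted + [position, idf])
    return augmented
```

A downstream summarizer (e.g., a classifier that selects the top-ranked sentences) would then consume these augmented vectors instead of the raw embeddings.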
Effective start/end date: 2018/08/01 → 2020/07/31
- speech summarization
- artificial neural network
- contextualized language model
- speech recognition
- confidence score