A recurrent neural network language modeling framework for extractive speech summarization

Kuan Yu Chen, Shih Hung Liu, Berlin Chen, Hsin Min Wang, Wen Lian Hsu, Hsin Hsi Chen

Research output: Contribution to journal › Conference article

5 Citations (Scopus)

Abstract

Extractive speech summarization, which automatically selects a set of representative sentences from a spoken document so as to concisely express the document's most important themes, has been an active area of research and development. A recent school of thought employs the language modeling (LM) approach for important sentence selection, which has proven effective for performing speech summarization in an unsupervised fashion. However, a major challenge facing the LM approach is how to formulate the sentence models and accurately estimate their parameters for each spoken document to be summarized. This paper continues this general line of research, and its contribution is two-fold. First, we propose a novel and effective recurrent neural network language modeling (RNNLM) framework for speech summarization, in which the deduced sentence models capture not only word usage cues but also long-span structural information about word co-occurrence relationships within spoken documents, obviating the strict bag-of-words assumption. Second, the utility of the method originating from our proposed framework and that of several widely used unsupervised methods are analyzed and compared extensively. A series of experiments conducted on a broadcast news summarization task demonstrates the performance merits of our summarization method when compared to several state-of-the-art unsupervised methods.
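
For readers unfamiliar with the LM selection criterion the abstract refers to, the following minimal Python sketch illustrates the general unsupervised idea: each sentence is ranked by how likely its sentence model is to generate the whole document, and the top-ranked sentences form the extractive summary. This is only an illustration of the baseline unigram LM approach under our own assumptions, not the paper's RNNLM formulation; the proposed framework would, roughly speaking, replace the unigram estimate of P(w|S) with conditional word probabilities produced by a recurrent network, which is what lets it capture the long-span dependencies the abstract describes. All identifiers (sentence_score, extract_summary, lam, bg_counts) are illustrative, not from the paper.

    import math
    from collections import Counter

    def sentence_score(sentence_tokens, doc_tokens, bg_counts, bg_total, lam=0.5):
        # Log-likelihood of the whole document under a smoothed unigram
        # sentence model: P(w|S) is linearly interpolated with a background
        # distribution (Jelinek-Mercer smoothing) to avoid zero probabilities.
        s_counts = Counter(sentence_tokens)
        s_len = max(len(sentence_tokens), 1)
        score = 0.0
        for w in doc_tokens:
            p_sent = s_counts[w] / s_len
            p_bg = bg_counts.get(w, 0) / bg_total if bg_total else 0.0
            p = lam * p_sent + (1.0 - lam) * p_bg
            score += math.log(p) if p > 0 else float("-inf")
        return score

    def extract_summary(doc_sentences, bg_counts, bg_total, k=3):
        # Rank every sentence by how well its model "generates" the document,
        # then keep the k best, restoring their original order.
        doc_tokens = [w for s in doc_sentences for w in s]
        ranked = sorted(
            range(len(doc_sentences)),
            key=lambda i: sentence_score(doc_sentences[i], doc_tokens,
                                         bg_counts, bg_total),
            reverse=True,
        )
        return sorted(ranked[:k])

    # Toy usage: the document doubles as its own background model here,
    # purely to keep the example self-contained.
    doc = [
        "the summit produced a new trade agreement".split(),
        "delegates also discussed the weather".split(),
        "the agreement covers tariffs and trade in services".split(),
    ]
    bg = Counter(w for s in doc for w in s)
    print(extract_summary(doc, bg, sum(bg.values()), k=1))

In an RNNLM-based variant, the per-word probability inside the scoring loop would come from a recurrent network conditioned on the preceding words of the document rather than from sentence-level counts, so word order and co-occurrence structure influence the score.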

Original language: English
Article number: 6890220
Journal: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2014-September
Issue number: September
ISSN: 1945-7871
DOI: 10.1109/ICME.2014.6890220
Publication status: Published - 2014 Sep 3
Event: 2014 IEEE International Conference on Multimedia and Expo, ICME 2014 - Chengdu, China
Duration: 2014 Jul 14 - 2014 Jul 18

Keywords

  • language modeling
  • long-span structural information
  • recurrent neural network
  • speech summarization

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Cite this

A recurrent neural network language modeling framework for extractive speech summarization. / Chen, Kuan Yu; Liu, Shih Hung; Chen, Berlin; Wang, Hsin Min; Hsu, Wen Lian; Chen, Hsin Hsi.

In: Proceedings - IEEE International Conference on Multimedia and Expo, Vol. 2014-September, No. September, 6890220, 03.09.2014.
