Leveraging topical and positional cues for language modeling in speech recognition

Hsuan Sheng Chiu, Kuan Yu Chen, Berlin Chen

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

This paper investigates language modeling with topical and positional information for large vocabulary continuous speech recognition. We first compare among a few topic models both theoretically and empirically, including document topic models and word topic models. On the other hand, since for some spoken documents such as broadcast news stories, the composition and the word usage of documents of the same style are usually similar, the documents hence can be separated into partitions consisting of identical rhetoric or topic styles by the literary structures, like introductory remarks, elucidations of methodology or affairs, conclusions of the articles, references or footnotes of reporters, etc. We hence present two position-dependent language models for speech recognition by integrating word positional information into the existing n-gram and topic models. The experiments conducted on broadcast news transcription seem to indicate that such position-dependent models obtain comparable results to the existing n-gram and topic models.

Original language: English
Pages (from-to): 1465-1481
Number of pages: 17
Journal: Multimedia Tools and Applications
Volume: 72
Issue number: 2
DOI: 10.1007/s11042-013-1456-2
Publication status: Published - 2014 Jan 1

Fingerprint

Speech recognition
Continuous speech recognition
Transcription
Chemical analysis
Experiments

Keywords

  • Language model
  • Language model adaptation
  • Positional information
  • Speech recognition
  • Topical information

ASJC Scopus subject areas

  • Software
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Leveraging topical and positional cues for language modeling in speech recognition. / Chiu, Hsuan Sheng; Chen, Kuan Yu; Chen, Berlin.

In: Multimedia Tools and Applications, Vol. 72, No. 2, 01.01.2014, p. 1465-1481.

@article{9fe291daa08e4e24b29ce9f352a1b204,
title = "Leveraging topical and positional cues for language modeling in speech recognition",
abstract = "This paper investigates language modeling with topical and positional information for large vocabulary continuous speech recognition. We first compare among a few topic models both theoretically and empirically, including document topic models and word topic models. On the other hand, since for some spoken documents such as broadcast news stories, the composition and the word usage of documents of the same style are usually similar, the documents hence can be separated into partitions consisting of identical rhetoric or topic styles by the literary structures, like introductory remarks, elucidations of methodology or affairs, conclusions of the articles, references or footnotes of reporters, etc. We hence present two position-dependent language models for speech recognition by integrating word positional information into the existing n-gram and topic models. The experiments conducted on broadcast news transcription seem to indicate that such position-dependent models obtain comparable results to the existing n-gram and topic models.",
keywords = "Language model, Language model adaptation, Positional information, Speech recognition, Topical information",
author = "Chiu, {Hsuan Sheng} and Chen, {Kuan Yu} and Chen, Berlin",
year = "2014",
month = "1",
day = "1",
doi = "10.1007/s11042-013-1456-2",
language = "English",
volume = "72",
pages = "1465--1481",
journal = "Multimedia Tools and Applications",
issn = "1380-7501",
publisher = "Springer Netherlands",
number = "2",
}

TY  - JOUR
T1  - Leveraging topical and positional cues for language modeling in speech recognition
AU  - Chiu, Hsuan Sheng
AU  - Chen, Kuan Yu
AU  - Chen, Berlin
PY  - 2014/1/1
Y1  - 2014/1/1
N2  - This paper investigates language modeling with topical and positional information for large vocabulary continuous speech recognition. We first compare among a few topic models both theoretically and empirically, including document topic models and word topic models. On the other hand, since for some spoken documents such as broadcast news stories, the composition and the word usage of documents of the same style are usually similar, the documents hence can be separated into partitions consisting of identical rhetoric or topic styles by the literary structures, like introductory remarks, elucidations of methodology or affairs, conclusions of the articles, references or footnotes of reporters, etc. We hence present two position-dependent language models for speech recognition by integrating word positional information into the existing n-gram and topic models. The experiments conducted on broadcast news transcription seem to indicate that such position-dependent models obtain comparable results to the existing n-gram and topic models.
AB  - This paper investigates language modeling with topical and positional information for large vocabulary continuous speech recognition. We first compare among a few topic models both theoretically and empirically, including document topic models and word topic models. On the other hand, since for some spoken documents such as broadcast news stories, the composition and the word usage of documents of the same style are usually similar, the documents hence can be separated into partitions consisting of identical rhetoric or topic styles by the literary structures, like introductory remarks, elucidations of methodology or affairs, conclusions of the articles, references or footnotes of reporters, etc. We hence present two position-dependent language models for speech recognition by integrating word positional information into the existing n-gram and topic models. The experiments conducted on broadcast news transcription seem to indicate that such position-dependent models obtain comparable results to the existing n-gram and topic models.
KW  - Language model
KW  - Language model adaptation
KW  - Positional information
KW  - Speech recognition
KW  - Topical information
UR  - http://www.scopus.com/inward/record.url?scp=84904857190&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84904857190&partnerID=8YFLogxK
U2  - 10.1007/s11042-013-1456-2
DO  - 10.1007/s11042-013-1456-2
M3  - Article
AN  - SCOPUS:84904857190
VL  - 72
SP  - 1465
EP  - 1481
JO  - Multimedia Tools and Applications
JF  - Multimedia Tools and Applications
SN  - 1380-7501
IS  - 2
ER  - 