Speech Genre Classification in Online Multimedia Platforms: A Cross-Modal Approach Integrating Text and Prosody

Research output: Contribution to journal › Conference article › peer-review

Abstract

The rise of online multimedia, particularly on YouTube, has transformed information dissemination. This study examines the multimodal nature of online speech, focusing on two prevalent speech genres in Taiwan Mandarin YouTube content: entertaining and informative clips. We collected 100-minute video clips from sixteen influential YouTubers for each genre and segmented the clips into inter-pause units (IPUs) for subsequent analyses. For each IPU, acoustic features describing durational, rhythmic, and pitch patterns were derived from the speech signal, while bag-of-words lexical features were derived from the textual content. Our objectives were twofold: first, to explore genre-specific prosodic patterns using the proposed prosodic feature set, and second, to evaluate how much these prosodic features improve speech genre classification accuracy when integrated with textual features. Results show that the ensemble model achieves 84.6% accuracy, outperforming the prosody-only and text-only monomodal models, which suggests that prosodic features play a complementary role in speech genre classification. Furthermore, our findings underscore the impact of semantic topics on textual features, which can lead to misclassification of topic-neutral IPUs in monomodal models. This study highlights the importance of considering both prosodic and textual features when determining speech genres in multimodal discourse.
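The pipeline outlined in the abstract (IPU segmentation, per-IPU text and prosody features, and a combined model) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 0.2-second pause threshold, the function names, the use of speech rate as the only prosodic feature, and the weighted-average fusion standing in for the ensemble model.

```python
from collections import Counter

def segment_ipus(words, min_pause=0.2):
    """Group (token, start, end) tuples into inter-pause units.

    A new IPU starts whenever the silence before a token lasts at least
    min_pause seconds. The 0.2 s default is an assumed value.
    """
    ipus, current = [], []
    for token, start, end in words:
        if current and start - current[-1][2] >= min_pause:
            ipus.append(current)
            current = []
        current.append((token, start, end))
    if current:
        ipus.append(current)
    return ipus

def bag_of_words(ipu):
    """Count token occurrences in one IPU (unordered lexical features)."""
    return Counter(token for token, _, _ in ipu)

def speech_rate(ipu):
    """Tokens per second over the IPU: one simple durational feature."""
    duration = ipu[-1][2] - ipu[0][1]
    return len(ipu) / duration if duration > 0 else 0.0

def late_fusion(p_text, p_prosody, w_text=0.5):
    """Weighted average of two per-class probability dicts.

    This is a hypothetical stand-in for an ensemble of a text-only and a
    prosody-only classifier; both dicts must share the same labels.
    """
    return {
        label: w_text * p_text[label] + (1.0 - w_text) * p_prosody[label]
        for label in p_text
    }

# Toy word-level alignment: (token, start time, end time) in seconds.
words = [
    ("大家", 0.00, 0.30),
    ("好", 0.30, 0.50),
    ("今天", 1.00, 1.40),
    ("介紹", 1.40, 1.90),
]
ipus = segment_ipus(words)
features = [(bag_of_words(ipu), speech_rate(ipu)) for ipu in ipus]

# Made-up classifier outputs for one IPU, fused with equal weights.
fused = late_fusion(
    {"entertaining": 0.4, "informative": 0.6},
    {"entertaining": 0.7, "informative": 0.3},
)
predicted = max(fused, key=fused.get)
```

In this toy example the two monomodal scores disagree and the fused score decides the label, which is the kind of case where a second modality can help with topic-neutral IPUs.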

Original language: English
Pages (from-to): 195-199
Number of pages: 5
Journal: Proceedings of the International Conference on Speech Prosody
DOIs
Publication status: Published - 2024
Event: 12th International Conference on Speech Prosody, Speech Prosody 2024 - Leiden, Netherlands
Duration: 2024 Jul 2 → 2024 Jul 5

Keywords

  • genre classification
  • multimodal discourse
  • online discourse
  • speech genre
  • speech prosody

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
