Flexible VAD-PVAD Transition: A Detachable PVAD Module for Dynamic Encoder RNN VAD

  • En Lun Yu
  • Chien Chun Wang
  • Jeih Weih Hung
  • Shih Chieh Huang
  • Berlin Chen

Research output: Contribution to journal › Conference article › peer-review

Abstract

In this paper, we propose the Flexible Dynamic Encoder RNN (FDE-RNN), a model that can switch between voice activity detection (VAD) and personalized voice activity detection (PVAD) without redundant resource consumption. In static PVAD modeling, performing VAD typically requires either merging categories or omitting speaker embeddings, which often leaves a model that is excessively large and impractical for VAD tasks. In contrast, FDE-RNN adapts by removing the personalization module when functioning as a VAD, significantly reducing resource demands. Furthermore, on PVAD tasks, FDE-RNN uses dynamic neural networks with a gating-based skipping mechanism that bypasses redundant computations during non-speech segments, further improving computational efficiency. Extensive experiments demonstrate that FDE-RNN outperforms all prior methods on both PVAD and VAD tasks in terms of overall performance. Notably, when functioning as a VAD, FDE-RNN uses only 30% of the parameters required by competitive models, underscoring its efficiency and scalability.
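
The two ideas in the abstract, a detachable personalization module and gate-based skipping of non-speech frames, can be illustrated with a toy sketch. This is not the authors' implementation: the energy-threshold gate, the averaging "encoder", the cosine-similarity personalization score, the 0.5 decision threshold, and every name below are hypothetical stand-ins chosen only to make the control flow concrete. FDE-RNN itself is a trained neural model whose details are not given in this record.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def energy_gate(frame: Sequence[float], threshold: float = 0.1) -> bool:
    """Hypothetical cheap gate: True when mean frame energy exceeds the threshold."""
    return sum(x * x for x in frame) / len(frame) > threshold


def toy_encoder(frame: Sequence[float], state: Vector) -> Vector:
    """Stand-in for the expensive recurrent update (assumed, not the paper's encoder)."""
    return [0.5 * s + 0.5 * x for s, x in zip(state, frame)]


def cosine_score(state: Vector, speaker_embedding: Vector) -> float:
    """Hypothetical personalization module: cosine similarity to the enrolled speaker."""
    dot = sum(a * b for a, b in zip(state, speaker_embedding))
    norm = math.sqrt(sum(a * a for a in state)) * math.sqrt(
        sum(b * b for b in speaker_embedding)
    )
    return dot / norm if norm else 0.0


@dataclass
class ToyDetector:
    # Leaving this as None "detaches" the personalization module (plain VAD mode).
    personalization: Optional[Callable[[Vector, Vector], float]] = None
    encoder_calls: int = 0

    def run(
        self,
        frames: Sequence[Sequence[float]],
        speaker_embedding: Optional[Vector] = None,
    ) -> List[str]:
        state = [0.0] * len(frames[0])
        labels = []
        for frame in frames:
            if not energy_gate(frame):
                # Skip the encoder entirely; the previous state is carried over.
                labels.append("non-speech")
                continue
            state = toy_encoder(frame, state)
            self.encoder_calls += 1
            if self.personalization is None or speaker_embedding is None:
                labels.append("speech")  # VAD mode: two classes
            else:
                score = self.personalization(state, speaker_embedding)
                labels.append(
                    "target-speech" if score > 0.5 else "non-target-speech"
                )
        return labels


frames = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.01, 0.0]]

vad = ToyDetector()  # personalization module detached
vad_labels = vad.run(frames)

pvad = ToyDetector(personalization=cosine_score)  # module attached
pvad_labels = pvad.run(frames, speaker_embedding=[1.0, 0.0])

print(vad_labels)
print(pvad_labels, pvad.encoder_calls)
```

In this sketch the same detector object serves both tasks: without the personalization callable it labels frames as speech or non-speech, and with it the speech frames are split into target and non-target speech. The `encoder_calls` counter shows that the two low-energy frames never reach the encoder.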

Original language: English
Pages (from-to): 5793-5797
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2025
Event: 26th Interspeech Conference 2025 - Rotterdam, Netherlands
Duration: 2025 Aug 17 → 2025 Aug 21

Keywords

  • Dynamic Neural Networks
  • Personalized Voice Activity Detection
  • Voice Activity Detection

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Language and Linguistics
  • Modelling and Simulation
  • Human-Computer Interaction
