
Speaker Conditional Sinc-Extractor for Personal VAD

  • En Lun Yu
  • Kuan Hsun Ho
  • Jeih Weih Hung
  • Shih Chieh Huang
  • Berlin Chen

Research output: Contribution to journal › Conference article › peer-review

5 citations (Scopus)

Abstract

This study explores a novel application of Sinc-convolution to Personal Voice Activity Detection (PVAD). The Sinc-Extractor (SE) network, developed for PVAD, learns the cutoff frequencies and band gains of sinc functions to extract acoustic features. Additionally, the speaker conditional SE (SCSE) module incorporates speaker information from high-dimensional d-vectors into low-dimensional acoustic features. SE-PVAD and Vanilla PVAD have similar model size and computing load, while SCSE-PVAD is more compact with shorter inference time as it excludes speaker embedding. Evaluated with concatenated utterances from the LibriSpeech corpus, SE-PVAD outperforms Vanilla PVAD significantly. SCSE-PVAD matches Vanilla PVAD's performance but reduces input feature dimensionality and network complexity. Thus, SCSE-PVAD can function like a typical VAD, accepting only acoustic features, making it suitable for low-resource wearable devices.
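The abstract's idea of a sinc filter described only by its cutoff frequencies and a band gain can be illustrated with a minimal NumPy sketch. This is not the authors' SE network: the function names `sinc_bandpass` and `response_at`, the 101-tap kernel, the Hamming window, and the 16 kHz sampling rate are all assumptions made for illustration. In a trainable model the two cutoffs and the gain would be learnable parameters updated by backpropagation rather than fixed arguments.

```python
import numpy as np


def sinc_bandpass(f_low, f_high, gain=1.0, kernel_size=101, sample_rate=16000):
    """Windowed-sinc band-pass kernel set by two cutoffs (Hz) and a band gain.

    Hypothetical helper: all defaults are illustrative assumptions.
    """
    # Sample times in seconds, centered on zero so the kernel is symmetric.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    t = n / sample_rate
    # Ideal low-pass impulse responses; np.sinc(x) = sin(pi x) / (pi x).
    low = 2 * f_low * np.sinc(2 * f_low * t)
    high = 2 * f_high * np.sinc(2 * f_high * t)
    # Band-pass = difference of two low-pass filters, scaled to discrete time.
    kernel = (high - low) / sample_rate
    # Taper the truncated sinc to reduce ripple from the finite kernel length.
    kernel = kernel * np.hamming(kernel_size)
    return gain * kernel


def response_at(kernel, freq, sample_rate=16000):
    """Magnitude of the kernel's frequency response at a single frequency (Hz)."""
    n = np.arange(len(kernel))
    return abs(np.sum(kernel * np.exp(-2j * np.pi * freq * n / sample_rate)))


if __name__ == "__main__":
    band = sinc_bandpass(500.0, 1500.0, gain=1.0)
    print("in-band response at 1 kHz:", response_at(band, 1000.0))
    print("out-of-band response at 4 kHz:", response_at(band, 4000.0))
```

Because each band is fully described by three numbers (two cutoffs and a gain), a bank of such filters has far fewer free parameters than an unconstrained convolution kernel of the same length, which fits the abstract's emphasis on compact models.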

Original language: English
Pages (from-to): 2115-2119
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs
Publication status: Published - 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sep 2024 → 5 Sep 2024

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
