TY - GEN
T1 - Few-Shot Open-Set Keyword Spotting with Multi-Stage Training
AU - Li, Lo Ya
AU - Lo, Tien Hong
AU - Hung, Jeih Weih
AU - Huang, Shih Chieh
AU - Chen, Berlin
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - As human-computer interaction technologies continue to advance, keyword spotting (KWS) systems have gained prominence in everyday devices. This study explores few-shot keyword recognition under open-set conditions, a challenging yet crucial area in speech processing. To this end, we design and develop a multi-stage training method that combines the advantages of acoustic and phonetic features, thereby substantially enhancing the capability of a KWS model. By learning multiple feature types through joint training on a single dataset, our KWS model is equipped with a more robust feature extractor for few-shot KWS. Experimental results demonstrate that our model outperforms strong baselines, achieving a 15% improvement in recognition accuracy on open-set tests in a 10-shot 10-way setting. This research confirms the effectiveness of our multi-stage strategy and suggests promising directions for future development in keyword recognition technologies.
AB - As human-computer interaction technologies continue to advance, keyword spotting (KWS) systems have gained prominence in everyday devices. This study explores few-shot keyword recognition under open-set conditions, a challenging yet crucial area in speech processing. To this end, we design and develop a multi-stage training method that combines the advantages of acoustic and phonetic features, thereby substantially enhancing the capability of a KWS model. By learning multiple feature types through joint training on a single dataset, our KWS model is equipped with a more robust feature extractor for few-shot KWS. Experimental results demonstrate that our model outperforms strong baselines, achieving a 15% improvement in recognition accuracy on open-set tests in a 10-shot 10-way setting. This research confirms the effectiveness of our multi-stage strategy and suggests promising directions for future development in keyword recognition technologies.
KW - Few-shot learning
KW - Keyword spotting
UR - http://www.scopus.com/inward/record.url?scp=85218193375&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85218193375&partnerID=8YFLogxK
U2 - 10.1109/APSIPAASC63619.2025.10848588
DO - 10.1109/APSIPAASC63619.2025.10848588
M3 - Conference contribution
AN - SCOPUS:85218193375
T3 - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
BT - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Y2 - 3 December 2024 through 6 December 2024
ER -