TY - GEN
T1 - Gated Adapters with Balanced Activation for Effective Contextual Speech Recognition
AU - Liu, Yu Chun
AU - Wang, Yi Cheng
AU - Pai, Li Ting
AU - Lu, Jia Liang
AU - Chen, Berlin
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In end-to-end (E2E) automatic speech recognition (ASR), accurately recognizing rare words, such as named entities, remains a significant challenge. Although existing contextual biasing techniques have improved recognition rates for named entities, they often incur substantial computational costs and the risk of false biasing. Recent research has shown that integrating gating mechanisms with contextual biasing adapters can dynamically regulate activation, effectively reducing unnecessary computational overhead. However, we observed that gating mechanisms tend not to activate when encountering particularly rare instances within named entities. To address this challenge, we combined the gating mechanism with a novel activation-balanced objective, resulting in the gate-balanced adapter. This approach not only sustains high recognition rates for named entities but also significantly reduces character error rates (CER) and overall computational load. A series of experiments were conducted on the AISHELL-1 dataset, and the results showed approximately a 1.2% reduction in CER compared to the baseline, highlighting its potential for practical applications.
AB - In end-to-end (E2E) automatic speech recognition (ASR), accurately recognizing rare words, such as named entities, remains a significant challenge. Although existing contextual biasing techniques have improved recognition rates for named entities, they often incur substantial computational costs and the risk of false biasing. Recent research has shown that integrating gating mechanisms with contextual biasing adapters can dynamically regulate activation, effectively reducing unnecessary computational overhead. However, we observed that gating mechanisms tend not to activate when encountering particularly rare instances within named entities. To address this challenge, we combined the gating mechanism with a novel activation-balanced objective, resulting in the gate-balanced adapter. This approach not only sustains high recognition rates for named entities but also significantly reduces character error rates (CER) and overall computational load. A series of experiments were conducted on the AISHELL-1 dataset, and the results showed approximately a 1.2% reduction in CER compared to the baseline, highlighting its potential for practical applications.
KW - Automatic speech recognition
KW - contextual biasing
KW - contextual adapter
KW - long-tail learning
UR - https://www.scopus.com/pages/publications/85215695174
UR - https://www.scopus.com/pages/publications/85215695174#tab=citedBy
U2 - 10.1109/O-COCOSDA64382.2024.10800541
DO - 10.1109/O-COCOSDA64382.2024.10800541
M3 - Conference contribution
AN - SCOPUS:85215695174
T3 - 2024 27th Conference on the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2024 - Proceedings
BT - 2024 27th Conference on the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2024 - Proceedings
A2 - Su, Ming-Hsiang
A2 - Yeh, Jui-Feng
A2 - Liao, Yuan-Fu
A2 - Lee, Chi-Chun
A2 - Tsao, Yu
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th Conference on the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2024
Y2 - 17 October 2024 through 19 October 2024
ER -