TY - GEN
T1 - A Preliminary Study on Environmental Sound Classification Leveraging Large-Scale Pretrained Model and Semi-Supervised Learning
AU - Tsao, You Sheng
AU - Lo, Tien Hong
AU - Li, Jiun Ting
AU - Weng, Shi Yan
AU - Chen, Berlin
N1 - Publisher Copyright: © 2021 ROCLING 2021 - Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing. All rights reserved.
PY - 2021
Y1 - 2021
N2 - With the widespread commercialization of smart devices, research on environmental sound classification has gained increasing attention in recent years. In this paper, we set out to make effective use of a large-scale pretrained audio model and a semi-supervised training paradigm for environmental sound classification. To this end, an environmental sound classification method is first put forward, whose component model is built on top of a large-scale pretrained audio model. Further, to simulate a low-resource sound classification setting where only limited supervised examples are available, we instantiate the notion of transfer learning with a recently proposed training algorithm (namely, FixMatch) and a data augmentation method (namely, SpecAugment) to achieve semi-supervised model training. Experiments conducted on the benchmark dataset UrbanSound8K reveal that our classification method can lead to an accuracy improvement of 2.4% over a current baseline method.
KW - Environmental Sound Classification
KW - Semi-supervised learning
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85127442839&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127442839&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85127442839
T3 - ROCLING 2021 - Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing
SP - 103
EP - 110
BT - ROCLING 2021 - Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing
A2 - Lee, Lung-Hao
A2 - Chang, Chia-Hui
A2 - Chen, Kuan-Yu
PB - The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
T2 - 33rd Conference on Computational Linguistics and Speech Processing, ROCLING 2021
Y2 - 15 October 2021 through 16 October 2021
ER -