A Preliminary Study on Environmental Sound Classification Leveraging Large-Scale Pretrained Model and Semi-Supervised Learning

You Sheng Tsao, Tien Hong Lo, Jiun Ting Li, Shi Yan Weng, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the widespread commercialization of smart devices, research on environmental sound classification has attracted growing attention in recent years. In this paper, we set out to make effective use of a large-scale pretrained audio model and a semi-supervised training paradigm for environmental sound classification. To this end, we first put forward an environmental sound classification method whose component model is built on top of a large-scale pretrained audio model. Further, to simulate a low-resource sound classification setting in which only limited supervised examples are available, we instantiate the notion of transfer learning with a recently proposed training algorithm (namely, FixMatch) and a data augmentation method (namely, SpecAugment) to achieve semi-supervised model training. Experiments conducted on the benchmark dataset UrbanSound8K reveal that our classification method yields an accuracy improvement of 2.4% over a current baseline method.
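The abstract names FixMatch and SpecAugment without describing how they are combined. The following is a minimal illustrative sketch, not the authors' implementation. It assumes that SpecAugment-style masking serves as the strong augmentation in FixMatch, which the abstract does not state. The function names, mask widths, and the 0.95 confidence threshold are hypothetical choices made for illustration, and only the frequency and time masking parts of SpecAugment are shown (time warping is omitted).

```python
import math
import random


def spec_augment(spec, rng, max_freq_mask=8, max_time_mask=16):
    """Zero one random frequency band and one random time band.

    `spec` is a list of rows (frequency bins) of floats (time frames).
    Mask widths are hypothetical defaults, not values from the paper.
    """
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched
    f = rng.randint(0, min(max_freq_mask, n_freq))  # band height (inclusive)
    f0 = rng.randint(0, n_freq - f)                 # band start
    t = rng.randint(0, min(max_time_mask, n_time))  # band width (inclusive)
    t0 = rng.randint(0, n_time - t)
    for i in range(n_freq):
        for j in range(n_time):
            if f0 <= i < f0 + f or t0 <= j < t0 + t:
                out[i][j] = 0.0
    return out


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def fixmatch_unlabeled_loss(weak_logits, strong_logits, threshold=0.95):
    """FixMatch-style consistency loss on a batch of unlabeled examples.

    The prediction on the weakly augmented input becomes a hard pseudo-label,
    kept only if its confidence reaches `threshold` (assumed value). The loss
    is the cross-entropy of the strongly augmented prediction against that
    pseudo-label, averaged over the whole batch.
    """
    total, mask = 0.0, []
    for weak, strong in zip(weak_logits, strong_logits):
        weak_logp = log_softmax(weak)
        pseudo = max(range(len(weak)), key=weak_logp.__getitem__)
        confident = math.exp(weak_logp[pseudo]) >= threshold
        mask.append(confident)
        if confident:
            total += -log_softmax(strong)[pseudo]
    return total / len(weak_logits), mask
```

In a full training loop, this unlabeled term would be added to the ordinary supervised cross-entropy on the labeled examples, with the logits produced by the classifier built on the pretrained audio model.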

Original language: English
Title of host publication: ROCLING 2021 - Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing
Editors: Lung-Hao Lee, Chia-Hui Chang, Kuan-Yu Chen
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 103-110
Number of pages: 8
ISBN (Electronic): 9789869576949
Publication status: Published - 2021
Event: 33rd Conference on Computational Linguistics and Speech Processing, ROCLING 2021 - Taoyuan, Taiwan
Duration: 2021 Oct 15 to 2021 Oct 16

Publication series

Name: ROCLING 2021 - Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing

Conference

Conference: 33rd Conference on Computational Linguistics and Speech Processing, ROCLING 2021
Country/Territory: Taiwan
City: Taoyuan
Period: 2021/10/15 to 2021/10/16

Keywords

  • Environmental Sound Classification
  • Semi-supervised learning
  • Transfer learning

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Speech and Hearing
