Formosa Speech Recognition Challenge 2020 and Taiwanese across Taiwan Corpus

Yuan Fu Liao, Chia Yu Chang, Hak Khiam Tiun, Huang Lan Su, Hui Lu Khoo, Jane S. Tsay, Le Kun Tan, Peter Kang, Tsun Guan Thiann, Un Gian Iunn, Jyh Her Yang, Chih Neng Liang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Taiwanese (a.k.a. Taiwanese Hokkien, Hoklo, Taigi, Southern Min or Min-Nan) is an endangered language: because of the dominance of Mandarin, the number of Taiwanese speakers continues to drop, especially among the younger generations. To address this problem, a Taiwanese speech-enabled human-computer interface that supports people's daily lives is essential. Therefore, the Formosa Speech in the Wild (FSW) project was established to collect a large-scale Taiwanese across Taiwan (TAT) speech corpus to boost the development of Taiwanese speech recognition (TSR). The Formosa Speech Recognition Challenge 2020 (FSR-2020) was also hosted to promote the corpus and to evaluate the performance of state-of-the-art TSR systems. This paper briefly introduces the TAT corpus and the FSR-2020 challenge, presents the provided data profile and evaluation plan, and reports experimental baseline results. A subset of the TAT corpus, TAT-Vol1, is provided free of charge to all participants under a non-commercial license, and its corresponding Kaldi baseline recipes have been published online. Experimental results show that the combination of the TAT corpus and the baseline recipes is a good resource pack for TSR research and development.

Original language: English
Title of host publication: Proceedings of 2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 65-70
Number of pages: 6
ISBN (Electronic): 9781728198965
Publication status: Published - 2020 Nov 5
Event: 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020 - Virtual, Yangon, Myanmar
Duration: 2020 Nov 5 to 2020 Nov 7

Publication series

Name: Proceedings of 2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020

Conference

Conference: 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020
Country/Territory: Myanmar
City: Virtual, Yangon
Period: 2020/11/05 to 2020/11/07

Keywords

  • Formosa Speech Recognition Challenge 2020
  • Machine learning
  • Taiwanese across Taiwan Corpus
  • Taiwanese Speech Recognition

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Linguistics and Language
