Formosa Speech Recognition Challenge 2020 and Taiwanese across Taiwan Corpus

  • Yuan Fu Liao
  • Chia Yu Chang
  • Hak Khiam Tiun
  • Huang Lan Su
  • Hui Lu Khoo
  • Jane S. Tsay
  • Le Kun Tan
  • Peter Kang
  • Tsun Guan Thiann
  • Un Gian Iunn
  • Jyh Her Yang
  • Chih Neng Liang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

Taiwanese (a.k.a. Taiwanese Hokkien, Hoklo, Taigi, Southern Min or Min-Nan) is an endangered language: because of the dominance of Mandarin, the number of Taiwanese speakers continues to drop, especially among the younger generations. To address this problem, a Taiwanese speech-enabled human-computer interface that supports people's daily lives is essential. The Formosa Speech in the Wild (FSW) project was therefore established to collect a large-scale Taiwanese speech corpus, Taiwanese across Taiwan (TAT), to boost the development of Taiwanese speech recognition (TSR). The Formosa Speech Recognition Challenge 2020 (FSR-2020) was also hosted to promote the corpus and to evaluate the performance of state-of-the-art TSR systems. This paper briefly introduces the TAT corpus and the FSR-2020 challenge, presents the profile of the provided data and the evaluation plan, and reports experimental baseline results. A subset of the TAT corpus, TAT-Vol1, is provided free of charge to all participants under a non-commercial license, and its corresponding Kaldi baseline recipes have been published online. Experimental results show that the combination of the TAT corpus and the baseline recipes is a good resource pack for TSR research and development.

Original language: English
Title of host publication: Proceedings of 2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 65-70
Number of pages: 6
ISBN (Electronic): 9781728198965
Publication status: Published - 2020 Nov 5
Event: 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020 - Virtual, Yangon, Myanmar
Duration: 2020 Nov 5 to 2020 Nov 7

Publication series

Name: Proceedings of 2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020

Conference

Conference: 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020
Country/Territory: Myanmar
City: Virtual, Yangon
Period: 2020/11/05 to 2020/11/07

Keywords

  • Formosa Speech Recognition Challenge 2020
  • Machine learning
  • Taiwanese Speech Recognition
  • Taiwanese across Taiwan Corpus

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Linguistics and Language
