An Innovative BERT-Based Readability Model

Hou Chiang Tseng, Hsueh Chih Chen, Kuo En Chang, Yao Ting Sung, Berlin Chen

Research output: Chapter in Book/Report › Conference contribution

Abstract

Readability refers to the degree to which a given text (article) can be understood by its readers. When readers read a text with high readability, they achieve better comprehension and learning retention. However, developing effective readability prediction models that can automatically and accurately assess the readability of a given text has been a long-standing challenge. When building readability prediction models for the Chinese language, word segmentation ambiguity is a knotty problem that inevitably arises during text pre-processing. In view of this, we present in this paper a novel readability prediction approach for the Chinese language, building on the recently proposed Bidirectional Encoder Representations from Transformers (BERT) model, which can capture both syntactic and semantic information of a text directly from its character-level representation. With the BERT-based readability prediction model, which takes consecutive character-level representations as its input, we can effectively assess the readability of a given text without performing error-prone word segmentation. We empirically evaluate the performance of our BERT-based readability prediction model on a benchmark task by comparing it with a strong baseline that uses a well-known classification model, fastText, in conjunction with word-level representations. The results demonstrate that the BERT-based model with character-level representations performs on par with the fastText-based model with word-level representations, yielding an average accuracy of 78.45%. This finding also shows the promise of conducting readability assessment of Chinese text directly from character-level representations.
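
The contrast the abstract draws between word-level and character-level input can be illustrated with a small sketch. It is not the authors' pipeline: the toy lexicon, the example sentence, and the helper names `segmentations` and `char_level_tokens` are all assumptions made for illustration. It shows that a word lexicon can yield more than one valid segmentation of the same Chinese string, whereas splitting into single characters (as BERT's released Chinese model does for CJK text) is unambiguous.

```python
def char_level_tokens(text, max_len=512):
    """Split text into single-character tokens with BERT-style [CLS]/[SEP] markers."""
    chars = [ch for ch in text if not ch.isspace()]
    # Reserve two positions for the special tokens.
    chars = chars[: max_len - 2]
    return ["[CLS]"] + chars + ["[SEP]"]


def segmentations(text, lexicon):
    """Enumerate every way to split text into words found in a (toy) lexicon."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


# Hypothetical toy lexicon and sentence, for illustration only.
TOY_LEXICON = {"研究", "研究生", "生命", "命", "起源"}
sentence = "研究生命起源"

word_level = segmentations(sentence, TOY_LEXICON)  # several candidate splits
char_level = char_level_tokens(sentence)           # exactly one split

print(word_level)
print(char_level)
```

With this toy lexicon the word-level view has two competing analyses of the same six characters, so a segmenter has to pick one and may pick wrongly; the character-level view never faces that choice, which is the property the abstract relies on.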

Original language: English
Title of host publication: Innovative Technologies and Learning - 2nd International Conference, ICITL 2019, Proceedings
Editors: Lisbet Rønningsbakk, Ting-Ting Wu, Frode Eika Sandnes, Yueh-Min Huang
Publisher: Springer
Pages: 301-308
Number of pages: 8
ISBN (Print): 9783030353421
DOI: https://doi.org/10.1007/978-3-030-35343-8_32
Publication status: Published - 1 Jan 2019
Event: 2nd International Conference on Innovative Technologies and Learning, ICITL 2019 - Tromsø, Norway
Duration: 2 Dec 2019 to 5 Dec 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11937 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Conference on Innovative Technologies and Learning, ICITL 2019
Country: Norway
City: Tromsø
Period: 2 Dec 2019 to 5 Dec 2019

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

  • Cite this

    Tseng, H. C., Chen, H. C., Chang, K. E., Sung, Y. T., & Chen, B. (2019). An Innovative BERT-Based Readability Model. In L. Rønningsbakk, T-T. Wu, F. E. Sandnes, & Y-M. Huang (Eds.), Innovative Technologies and Learning - 2nd International Conference, ICITL 2019, Proceedings (pp. 301-308). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11937 LNCS). Springer. https://doi.org/10.1007/978-3-030-35343-8_32