An Innovative BERT-Based Readability Model

Hou Chiang Tseng, Hsueh Chih Chen, Kuo En Chang, Yao Ting Sung*, Berlin Chen


Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Citations (Scopus)


Readability refers to how easily a given text (article) can be understood by readers. When readers read a highly readable text, they achieve better comprehension and learning retention. However, developing effective readability prediction models that can automatically and accurately assess the readability of a given text has been a long-standing challenge. When building readability prediction models for the Chinese language, word segmentation ambiguity is a thorny problem that inevitably arises during text pre-processing. In view of this, we present a novel readability prediction approach for the Chinese language, building on the recently proposed Bidirectional Encoder Representations from Transformers (BERT) model, which can capture both syntactic and semantic information of a text directly from its character-level representation. Because the BERT-based readability prediction model takes consecutive character-level representations as its input, it can assess the readability of a given text without error-prone word segmentation. We empirically evaluate the BERT-based readability prediction model on a benchmark task by comparing it with a strong baseline that uses a well-known classification model, fastText, in conjunction with word-level representations. The results demonstrate that the BERT-based model with character-level representations performs on par with the fastText-based model with word-level representations, yielding an average accuracy of 78.45%. This finding also shows the promise of conducting readability assessment of Chinese text directly from character-level representations.
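To make the character-level idea concrete, here is a minimal sketch of how a Chinese text could be turned into a token sequence without any word segmentation, together with the accuracy measure used to report results. It is an illustration only, not the authors' pipeline: the function names `to_char_tokens` and `accuracy`, the `max_len` default of 512, and the sample sentence are all assumptions introduced here, and no BERT or fastText model is actually run.

```python
from typing import List

CLS, SEP = "[CLS]", "[SEP]"  # BERT-style boundary markers


def to_char_tokens(text: str, max_len: int = 512) -> List[str]:
    """Turn raw text into a character-level token sequence (hypothetical helper).

    No word segmentation is performed: every non-whitespace character
    becomes its own token, so segmentation ambiguity never arises.
    """
    if max_len < 2:
        raise ValueError("max_len must leave room for [CLS] and [SEP]")
    chars = [ch for ch in text if not ch.isspace()]
    # Reserve two positions for the boundary markers.
    return [CLS] + chars[: max_len - 2] + [SEP]


def accuracy(predicted: List[int], gold: List[int]) -> float:
    """Fraction of texts whose predicted readability level matches the gold level."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


tokens = to_char_tokens("今天天氣很好")
print(tokens)
# ['[CLS]', '今', '天', '天', '氣', '很', '好', '[SEP]']
```

A word-level baseline would first have to decide where word boundaries fall in the same sentence, which is the step where segmentation errors can enter. The character-level sequence above sidesteps that decision entirely.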

Title of host publication: Innovative Technologies and Learning - 2nd International Conference, ICITL 2019, Proceedings
Editors: Lisbet Rønningsbakk, Ting-Ting Wu, Frode Eika Sandnes, Yueh-Min Huang
Publication status: Published - 2019
Event: 2nd International Conference on Innovative Technologies and Learning, ICITL 2019 - Tromsø, Norway
Duration: 2 Dec 2019 to 5 Dec 2019


Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11937 LNCS


Conference: 2nd International Conference on Innovative Technologies and Learning, ICITL 2019

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

