An Innovative BERT-Based Readability Model

Hou Chiang Tseng, Hsueh Chih Chen, Kuo En Chang, Yao Ting Sung*, Berlin Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Citations (Scopus)

Abstract

Readability refers to the degree of difficulty with which a given text (article) can be understood by readers. When readers read a text with high readability, they achieve better comprehension and learning retention. However, developing effective readability prediction models that can automatically and accurately assess the readability of a given text has been a long-standing challenge. When building readability prediction models for the Chinese language, word segmentation ambiguity is a thorny problem that inevitably arises during text pre-processing. In view of this, we present in this paper a novel readability prediction approach for the Chinese language, building on the recently proposed Bidirectional Encoder Representations from Transformers (BERT) model, which can capture both syntactic and semantic information of a text directly from its character-level representation. With a BERT-based readability prediction model that takes consecutive character-level representations as its input, we effectively assess the readability of a given text without performing error-prone word segmentation. We empirically evaluate the performance of our BERT-based readability prediction model on a benchmark task, comparing it with a strong baseline that utilizes a well-known classification model (fastText) in conjunction with word-level representations. The results demonstrate that the BERT-based model with character-level representations performs on par with the fastText-based model with word-level representations, yielding an accuracy of 78.45% on average. This finding also offers the promise of conducting readability assessment of Chinese texts directly from character-level representations.
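The key idea in the abstract is that Chinese BERT models operate on individual characters, so a text can be fed to the model without any word segmentation step. The minimal sketch below illustrates that character-level input preparation; it is not the authors' code, and the tiny vocabulary and function names are purely illustrative (a real system would use a pretrained vocabulary such as that of bert-base-chinese):

```python
# Illustrative sketch of character-level input preparation for a
# BERT-style readability classifier. Chinese BERT tokenization treats
# each character as a token, so no word segmentation (and none of its
# ambiguity) is required. All names and the toy vocabulary below are
# assumptions for illustration only.

def to_char_tokens(text):
    """Split a Chinese text into character tokens, BERT-style,
    adding the standard [CLS]/[SEP] boundary tokens."""
    return ["[CLS]"] + list(text) + ["[SEP]"]

def encode(tokens, vocab):
    """Map tokens to integer ids, falling back to [UNK] for
    out-of-vocabulary characters."""
    return [vocab.get(t, vocab["[UNK]"]) for t in tokens]

# Toy vocabulary; a pretrained model ships its own (much larger) one.
vocab = {"[CLS]": 0, "[SEP]": 1, "[UNK]": 2, "我": 3, "愛": 4, "讀": 5}

tokens = to_char_tokens("我愛讀書")   # ['[CLS]', '我', '愛', '讀', '書', '[SEP]']
ids = encode(tokens, vocab)           # '書' is OOV here → maps to [UNK]
```

In the paper's setting, the resulting id sequence would be passed to BERT, whose [CLS] output is then used to predict a readability level; the fastText baseline instead consumes segmented words, which is where segmentation ambiguity enters.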

Original language: English
Title of host publication: Innovative Technologies and Learning - 2nd International Conference, ICITL 2019, Proceedings
Editors: Lisbet Rønningsbakk, Ting-Ting Wu, Frode Eika Sandnes, Yueh-Min Huang
Publisher: Springer
Pages: 301-308
Number of pages: 8
ISBN (Print): 9783030353421
DOIs
Publication status: Published - 2019
Event: 2nd International Conference on Innovative Technologies and Learning, ICITL 2019 - Tromsø, Norway
Duration: 2019 Dec 2 - 2019 Dec 5

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11937 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Conference on Innovative Technologies and Learning, ICITL 2019
Country/Territory: Norway
City: Tromsø
Period: 2019/12/02 - 2019/12/05

Keywords

  • BERT
  • Readability
  • Representation learning
  • Text classification
  • fastText

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
