Constructing and validating readability models: the method of integrating multilevel linguistic features with machine learning

Yao Ting Sung*, Ju Ling Chen, Ji Her Cha, Hou Chiang Tseng, Tao Hsing Chang, Kuo En Chang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

41 Citations (Scopus)


Multilevel linguistic features have been proposed for discourse analysis, but there have been few applications of multilevel linguistic features to readability models and also few validations of such models. Most traditional readability formulae are based on generalized linear models (GLMs; e.g., discriminant analysis and multiple regression), but these models have to comply with certain statistical assumptions about data properties and include all of the data in formulae construction without pruning the outliers in advance. The use of such readability formulae tends to produce a low text classification accuracy, while using a support vector machine (SVM) in machine learning can enhance the classification outcome. The present study constructed readability models by integrating multilevel linguistic features with SVM, which is more appropriate for text classification. Taking the Chinese language as an example, this study developed 31 linguistic features as the predicting variables at the word, semantic, syntax, and cohesion levels, with grade levels of texts as the criterion variable. The study compared four types of readability models by integrating unilevel and multilevel linguistic features with GLMs and an SVM. The results indicate that adopting a multilevel approach in readability analysis provides a better representation of the complexities of both texts and the reading comprehension process.
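The idea of grouping predictor variables by linguistic level can be illustrated with a minimal sketch. It is not the authors' feature set: the abstract does not list the 31 features, so the five features below, the `CONNECTIVES` list, and the function names `extract_features` and `to_vector` are all hypothetical stand-ins. It also tokenizes English with a regular expression for simplicity, whereas Chinese text would first need a word segmenter. The sketch shows how a unilevel and a multilevel feature vector could be built from the same text before being passed, together with the text's grade level, to a classifier such as an SVM.

```python
import re

# Hypothetical connective list; the study's actual cohesion features
# are not given in the abstract.
CONNECTIVES = {"because", "however", "therefore", "although", "but", "so"}


def extract_features(text):
    """Return illustrative features grouped by linguistic level."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    n_words = len(words)
    n_sents = len(sentences)
    return {
        "word": {
            "word_count": n_words,
            "mean_word_length": (
                sum(len(w) for w in words) / n_words if n_words else 0.0
            ),
        },
        # Type-token ratio is only a rough stand-in for a semantic-level feature.
        "semantic": {
            "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        },
        "syntax": {
            "mean_sentence_length": n_words / n_sents if n_sents else 0.0,
        },
        "cohesion": {
            "connective_count": sum(1 for w in words if w in CONNECTIVES),
        },
    }


def to_vector(features, levels):
    """Flatten the chosen levels into one vector, in a fixed order."""
    return [
        features[level][name]
        for level in levels
        for name in sorted(features[level])
    ]


text = "The cat sat. The cat ran because it saw a dog."
feats = extract_features(text)

# Unilevel input: word-level features only.
unilevel = to_vector(feats, ["word"])
# Multilevel input: features from all four levels.
multilevel = to_vector(feats, ["word", "semantic", "syntax", "cohesion"])

print(unilevel)
print(multilevel)
```

Keeping the features in a dictionary keyed by level makes it easy to build the unilevel and multilevel inputs from one extraction pass, which is the kind of comparison the abstract describes.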

Original language: English
Pages (from-to): 340-354
Number of pages: 15
Journal: Behavior Research Methods
Issue number: 2
Publication status: Published - 2015 Jun 1


  • Linguistic features
  • Multilevel
  • Readability
  • Support vector machine
  • Validity

ASJC Scopus subject areas

  • Experimental and Cognitive Psychology
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Psychology (miscellaneous)
  • General Psychology

