Integrating LSA-based hierarchical conceptual space and machine learning methods for leveling the readability of domain-specific texts

Hou Chiang Tseng, Berlin Chen, Tao Hsing Chang, Yao-Ting Sung

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Text readability assessment is a challenging interdisciplinary endeavor with rich practical implications. It has long drawn the attention of researchers internationally, and the readability models developed since have been widely applied in various fields. Previous readability models have used only the linguistic features employed for general text analysis and have not been sufficiently accurate when used to gauge domain-specific texts. In view of this, this study proposes a hierarchical conceptual space constructed with latent semantic analysis (LSA) that can be used to train a readability model to accurately assess domain-specific texts. Compared with a baseline using a traditional model, the new model improves accuracy by 13.88% to reach 68.98% when leveling social science texts, and by 24.61% to reach 73.96% when leveling natural science texts. When the readability features developed in this study are combined with general linguistic features, the accuracy of leveling social science texts improves by an even larger margin, 31.58%, to reach 86.68%, and that of natural science texts improves by 26.56% to reach 75.91%. These results indicate that the readability features developed in this study can be used both on their own to train a readability model for leveling domain-specific texts and in combination with the more common linguistic features to enhance the efficacy of the model. Future research can expand the generalizability of the model by applying the proposed method to texts from different fields and grade levels, thus broadening its practical applications.
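The accuracy figures in the abstract can be cross-checked with simple arithmetic. The sketch below assumes that each reported improvement is an absolute gain in percentage points over the baseline. The abstract does not say this explicitly, and the helper names are illustrative only. Under that assumption, subtracting each gain from the corresponding final accuracy gives the same implied baseline for both feature configurations within a domain (55.10% for social science, 49.35% for natural science), which supports the percentage-point reading.

```python
# Figures reported in the abstract, as (gain, final accuracy), both in percent.
# Assumption: each gain is an absolute percentage-point improvement over the baseline.
REPORTED = {
    "social science": {"lsa_only": (13.88, 68.98), "combined": (31.58, 86.68)},
    "natural science": {"lsa_only": (24.61, 73.96), "combined": (26.56, 75.91)},
}


def implied_baseline(gain, final):
    """Return the baseline accuracy implied by a percentage-point gain."""
    return round(final - gain, 2)


def implied_baselines(figures):
    """Map each domain and feature configuration to its implied baseline."""
    return {
        domain: {name: implied_baseline(gain, final)
                 for name, (gain, final) in configs.items()}
        for domain, configs in figures.items()
    }


if __name__ == "__main__":
    for domain, configs in implied_baselines(REPORTED).items():
        print(domain, configs)
```

If the gains were relative improvements instead, the two configurations would imply different baselines for the same domain, so the agreement is a useful sanity check rather than proof of how the authors computed the numbers.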

Original language: English
Pages (from-to): 331-361
Number of pages: 31
Journal: Natural Language Engineering
Volume: 25
Issue number: 3
DOI: 10.1017/S1351324919000093
Publication status: Published - 2019 May 1

Fingerprint

Leveling
Learning method
Learning systems
Semantics
Linguistics
Natural sciences
Social sciences
Machine learning
Conceptual space
Latent semantic analysis
Readability
Text analysis
Gages
School grade

Keywords

  • Domain-specific text
  • Machine learning
  • Readability
  • Support vector machine
  • Text mining

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence

Cite this

Integrating LSA-based hierarchical conceptual space and machine learning methods for leveling the readability of domain-specific texts. / Tseng, Hou Chiang; Chen, Berlin; Chang, Tao Hsing; Sung, Yao-Ting.

In: Natural Language Engineering, Vol. 25, No. 3, 01.05.2019, p. 331-361.

@article{3c8f2fde67a647019e58f861372607fb,
title = "Integrating LSA-based hierarchical conceptual space and machine learning methods for leveling the readability of domain-specific texts",
abstract = "Text readability assessment is a challenging interdisciplinary endeavor with rich practical implications. It has long drawn the attention of researchers internationally, and the readability models since developed have been widely applied to various fields. Previous readability models have only made use of linguistic features employed for general text analysis and have not been sufficiently accurate when used to gauge domain-specific texts. In view of this, this study proposes a latent-semantic-analysis (LSA)-constructed hierarchical conceptual space that can be used to train a readability model to accurately assess domain-specific texts. Compared with a baseline reference using a traditional model, the new model improves by 13.88{\%} to achieve 68.98{\%} of accuracy when leveling social science texts, and by 24.61{\%} to achieve 73.96{\%} of accuracy when assessing natural science texts. We then combine the readability features developed for the current study with general linguistic features, and the accuracy of leveling social science texts improves by an even higher degree of 31.58{\%} to achieve 86.68{\%}, and that of natural science texts by 26.56{\%} to achieve 75.91{\%}. These results indicate that the readability features developed in this study can be used both to train a readability model for leveling domain-specific texts and also in combination with the more common linguistic features to enhance the efficacy of the model. Future research can expand the generalizability of the model by assessing texts from different fields and grade levels using the proposed method, thus enhancing the practical applications of this new method.",
keywords = "Domain-specific text, Machine learning, Readability, Support vector machine, Text mining",
author = "Tseng, Hou Chiang and Chen, Berlin and Chang, Tao Hsing and Sung, Yao-Ting",
year = "2019",
month = "5",
day = "1",
doi = "10.1017/S1351324919000093",
language = "English",
volume = "25",
pages = "331--361",
journal = "Natural Language Engineering",
issn = "1351-3249",
publisher = "Cambridge University Press",
number = "3",

}

TY - JOUR

T1 - Integrating LSA-based hierarchical conceptual space and machine learning methods for leveling the readability of domain-specific texts

AU - Tseng, Hou Chiang

AU - Chen, Berlin

AU - Chang, Tao Hsing

AU - Sung, Yao-Ting

PY - 2019/5/1

Y1 - 2019/5/1

AB - Text readability assessment is a challenging interdisciplinary endeavor with rich practical implications. It has long drawn the attention of researchers internationally, and the readability models since developed have been widely applied to various fields. Previous readability models have only made use of linguistic features employed for general text analysis and have not been sufficiently accurate when used to gauge domain-specific texts. In view of this, this study proposes a latent-semantic-analysis (LSA)-constructed hierarchical conceptual space that can be used to train a readability model to accurately assess domain-specific texts. Compared with a baseline reference using a traditional model, the new model improves by 13.88% to achieve 68.98% of accuracy when leveling social science texts, and by 24.61% to achieve 73.96% of accuracy when assessing natural science texts. We then combine the readability features developed for the current study with general linguistic features, and the accuracy of leveling social science texts improves by an even higher degree of 31.58% to achieve 86.68%, and that of natural science texts by 26.56% to achieve 75.91%. These results indicate that the readability features developed in this study can be used both to train a readability model for leveling domain-specific texts and also in combination with the more common linguistic features to enhance the efficacy of the model. Future research can expand the generalizability of the model by assessing texts from different fields and grade levels using the proposed method, thus enhancing the practical applications of this new method.

KW - Domain-specific text

KW - Machine learning

KW - Readability

KW - Support vector machine

KW - Text mining

UR - http://www.scopus.com/inward/record.url?scp=85063979660&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063979660&partnerID=8YFLogxK

U2 - 10.1017/S1351324919000093

DO - 10.1017/S1351324919000093

M3 - Article

AN - SCOPUS:85063979660

VL - 25

SP - 331

EP - 361

JO - Natural Language Engineering

JF - Natural Language Engineering

SN - 1351-3249

IS - 3

ER -