The Corpus of Emotional Valences for 33,669 Chinese Words Based on Big Data

Chia Yueh Chang, Yen Cheng Chen, Meng Ning Tsai, Yao Ting Sung, Yu Lin Chang, Shu Yen Lin, Shu Ling Cho, Tao Hsing Chang, Hsueh Chih Chen*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Emotion theories are mainly classified as categorical or dimensional approaches. Given the importance of emotional words in emotion research, researchers have constructed a co-occurrence corpus of 7 types of emotion words using word co-occurrence and big data corpora. However, in addition to the categorical approach, the dimensional approach plays an important role in natural language processing. In particular, valence has an important influence on the study of emotion and language. In this study, the co-occurrence corpus of 7 types of emotion words constructed by Chen et al. [1] was expanded to create a corpus of emotional valences. Then, stepwise multiple regression analysis was performed with one criterion variable and 15 predictor variables. The criterion variable was the emotional valence of each of 553 frequently occurring stimulus words included in the Chinese Word Association Norms [2]. The predictor variables were the emotion co-occurrence scores for 2 clusters (literal emotion words and metaphorical emotion words) crossed with 7 types of emotions (happiness, love, surprise, sadness, anger, disgust, and fear), giving 14 scores, plus the virtue word co-occurrence score. The emotion words were those common to both the co-occurrence corpus of 7 types of emotion words constructed by Chen et al. [1] and the Chinese Word Association Norms established by Hu et al. [2]. The results showed that the scores for literal happiness word co-occurrences, metaphorical happiness word co-occurrences, literal disgust word co-occurrences, literal fear word co-occurrences, and virtue word co-occurrences could predict the valence values of emotion words, with the multiple correlation coefficient of the regression reaching .729. Subsequently, the valence values of 33,669 words were established using the formula obtained from the multiple regression analysis of the 553 words.
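The regression step described above can be illustrated with a small sketch. It fits an ordinary least squares formula on a handful of rated words and then applies that formula to an unrated word. This is only an illustration under stated assumptions: the numbers are synthetic placeholders, only two predictors are used (labelled here as happiness and disgust co-occurrence scores), the helper names `fit_ols` and `predict_valence` are hypothetical, and the stepwise selection of predictors used in the study is not reproduced.

```python
# Minimal sketch: ordinary least squares via the normal equations (pure Python).
# All numbers are synthetic placeholders, not data or coefficients from the study.


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation updates both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(features, targets):
    """Return [intercept, b1, ..., bk] minimising the squared error."""
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_valence(coefficients, feature_row):
    """Apply the fitted formula to one word's co-occurrence scores."""
    return coefficients[0] + sum(
        b * x for b, x in zip(coefficients[1:], feature_row)
    )


# Hypothetical (happiness, disgust) co-occurrence scores for rated words.
rated_scores = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
# Synthetic valences generated from 5 + 2 * happiness - 3 * disgust.
rated_valences = [5.0 + 2.0 * h - 3.0 * d for h, d in rated_scores]

coefficients = fit_ols(rated_scores, rated_valences)
# Score an unrated word from its co-occurrence scores alone.
new_word_valence = predict_valence(coefficients, (3.0, 0.5))
```

Because the synthetic valences are an exact linear function of the two scores, the fit recovers the generating coefficients. With real ratings the fit would be approximate, which is what the reported multiple correlation coefficient summarizes.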
Next, to test the cross-validity of the established valences, the correlation between the actual and the predicted valence values was analyzed for the words shared with the norm established by Lee and Lee [3] for the emotionality ratings and free associations of 267 common Chinese words. The results showed that the correlation between the two was .755, indicating that the values predicted from big data corpora and word co-occurrence were similar to the manually determined values. Based on theories and tests, this study used the co-occurrence data of 7 emotions and virtue to construct the corpus of emotional valences for 33,669 Chinese words. The results showed that the combined use of big data corpora and word co-occurrence can effectively expand existing corpora that were established based on emotional categories, improve the efficiency of manual construction of corpora, and establish a larger corpus of emotional words.
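The cross-validation check described above amounts to a Pearson correlation between predicted and human-rated valences over the words that appear in both sources. The following is a minimal sketch of that check, not the authors' procedure: the word keys, the valence numbers, and the helper names `pearson_r` and `cross_validate` are all hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def cross_validate(predicted, rated):
    """Correlate predicted and rated valences over the shared words only."""
    shared = sorted(predicted.keys() & rated.keys())
    r = pearson_r([predicted[w] for w in shared], [rated[w] for w in shared])
    return r, shared


# Hypothetical placeholder values, not figures from either norm.
predicted_valences = {"word_a": 6.1, "word_b": 3.0, "word_c": 4.8, "word_d": 7.2}
rated_valences = {"word_a": 6.5, "word_b": 2.5, "word_c": 5.0, "word_e": 1.0}

r, shared_words = cross_validate(predicted_valences, rated_valences)
```

Restricting the comparison to shared words matters because the predicted corpus is far larger than any manually rated norm, so only the overlap can be checked against human judgments.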

Original language: English
Title of host publication: HCI in Business, Government and Organizations - 9th International Conference, HCIBGO 2022, Held as Part of the 24th HCI International Conference, HCII 2022, Proceedings
Editors: Fiona Fui-Hoon Nah, Keng Siau
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 141-152
Number of pages: 12
ISBN (Print): 9783031055430
Publication status: Published - 2022
Event: 9th International Conference on HCI in Business, Government and Organizations, HCIBGO 2022 Held as Part of the 24th HCI International Conference, HCII 2022 - Virtual, Online
Duration: 2022 Jun 26 → 2022 Jul 1

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13327 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Conference on HCI in Business, Government and Organizations, HCIBGO 2022 Held as Part of the 24th HCI International Conference, HCII 2022
City: Virtual, Online
Period: 2022/06/26 → 2022/07/01

Keywords

  • Big data
  • Chinese
  • Emotion
  • Valence
  • Word co-occurrence

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
