TY - JOUR
T1 - Dual-Model Prediction of Affective Engagement and Vocal Attractiveness From Speaker Expressiveness in Video Learning
AU - Suen, Hung Yue
AU - Hung, Kuo En
AU - Tseng, Fan Hsun
N1 - Publisher Copyright: © 2026 IEEE.
PY - 2026/6/1
Y1 - 2026/6/1
AB - This article outlines a machine-learning-enabled, speaker-centric emotion AI approach that predicts audience affective engagement and vocal attractiveness in asynchronous video-based learning, relying solely on speaker-side affective expressions. Motivated by the demand for scalable, privacy-preserving affective computing applications, the approach uses two distinct regression models trained on a large corpus drawn from massive open online courses (MOOCs) to support affectively engaging learning experiences. The affective engagement model integrates emotional expressions from facial dynamics, oculomotor features, prosody, and cognitive semantics, while a second regression model predicts vocal attractiveness exclusively from speaker-side acoustic features. On speaker-independent test sets, both models achieved high predictive performance (R² = 0.85 for affective engagement and R² = 0.88 for vocal attractiveness), confirming that speaker-side affect can functionally represent aggregated audience feedback. The article thus presents a speaker-centric emotion AI approach supported by an empirical study showing that speaker-side multimodal features, including acoustics, can prospectively forecast audience feedback without requiring audience-side input.
KW - Affective sensing
KW - computational education
KW - emotional AI
KW - emotional expression
KW - intelligent tutoring systems
KW - learning analytics
KW - multimodal fusion
KW - sentiment analysis
UR - https://www.scopus.com/pages/publications/105036561754
UR - https://www.scopus.com/pages/publications/105036561754#tab=citedBy
U2 - 10.1109/TCSS.2026.3675249
DO - 10.1109/TCSS.2026.3675249
M3 - Article
AN - SCOPUS:105036561754
SN - 2329-924X
VL - 13
SP - 4111
EP - 4119
JO - IEEE Transactions on Computational Social Systems
JF - IEEE Transactions on Computational Social Systems
IS - 3
ER -