Multimodal fusion using learned text concepts for image categorization

Qiang Zhu*, Mei-Chen Yeh, Kwang-Ting Cheng

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

36 Citations (Scopus)

Abstract

Conventional image categorization techniques primarily rely on low-level visual cues. In this paper, we describe a multimodal fusion scheme which improves image classification accuracy by incorporating information derived from the embedded text detected in the image under classification. Specific to each image category, a text concept is first learned from a set of labeled texts in images of the target category using Multiple Instance Learning [1]. For an image under classification which contains multiple detected text lines, we calculate a weighted Euclidean distance between each text line and the learned text concept of the target category. Subsequently, the minimum distance, along with low-level visual cues, is used as the feature for SVM-based classification. Experiments on a challenging image database demonstrate that the proposed fusion framework achieves higher accuracy than state-of-the-art methods for image classification.
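
As a rough illustration of the fusion step described in the abstract, the sketch below computes a weighted Euclidean distance between each detected text line and a learned text concept, takes the minimum over all text lines, and appends it to the low-level visual feature vector before SVM classification. This is a minimal sketch, not the authors' code: the concept vector `c` and per-dimension weights `w` are random placeholders (the paper learns them via Multiple Instance Learning [1]), the fallback for images with no detected text, the feature dimensions, and the use of scikit-learn's SVC with an RBF kernel are all assumptions made here for illustration.

```python
import numpy as np
from sklearn.svm import SVC

def weighted_euclidean(x, c, w):
    # Weighted Euclidean distance between a text-line feature vector x
    # and a learned text concept c, with per-dimension weights w.
    return float(np.sqrt(np.sum(w * (x - c) ** 2)))

def min_concept_distance(text_lines, c, w, no_text_default=1e6):
    # Minimum weighted distance over all detected text lines in an image.
    # Images with no detected text fall back to a large constant -- an
    # assumption here; the abstract does not specify this case.
    if not text_lines:
        return no_text_default
    return min(weighted_euclidean(x, c, w) for x in text_lines)

def fuse_features(visual_feat, text_lines, c, w):
    # Concatenate the low-level visual cues with the minimum text-concept
    # distance to form the joint feature vector for the SVM.
    return np.append(visual_feat, min_concept_distance(text_lines, c, w))

# Toy usage on random data; dimensions and labels are hypothetical.
rng = np.random.default_rng(0)
dim_text, dim_vis, n_train = 16, 64, 40
c = rng.random(dim_text)   # text concept (learned via MIL in the paper)
w = rng.random(dim_text)   # per-dimension weights (also learned in the paper)

X = np.array([
    fuse_features(rng.random(dim_vis),
                  [rng.random(dim_text) for _ in range(3)],  # 3 text lines
                  c, w)
    for _ in range(n_train)
])
y = rng.integers(0, 2, n_train)      # 1 = target category, 0 = otherwise

clf = SVC(kernel="rbf").fit(X, y)    # SVM on the fused features
```

Appending the single scalar distance to the visual feature vector is the simplest way to realize the "jointly used as the features" fusion the abstract describes; the per-category concept means one such fused classifier is trained per target category.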

Original language: English
Title of host publication: Proceedings of the 14th Annual ACM International Conference on Multimedia, MM 2006
Pages: 211-220
Number of pages: 10
DOIs
Publication status: Published - 2006
Externally published: Yes
Event: 14th Annual ACM International Conference on Multimedia, MM 2006 - Santa Barbara, CA, United States
Duration: Oct 23, 2006 - Oct 27, 2006

Publication series

Name: Proceedings of the 14th Annual ACM International Conference on Multimedia, MM 2006

Other

Other: 14th Annual ACM International Conference on Multimedia, MM 2006
Country/Territory: United States
City: Santa Barbara, CA
Period: 2006/10/23 - 2006/10/27

ASJC Scopus subject areas

  • General Computer Science
  • Media Technology
