TY - GEN
T1 - Multimodal fusion using learned text concepts for image categorization
AU - Zhu, Qiang
AU - Yeh, Mei Chen
AU - Cheng, Kwang Ting
PY - 2006
Y1 - 2006
N2 - Conventional image categorization techniques primarily rely on low-level visual cues. In this paper, we describe a multimodal fusion scheme that improves image classification accuracy by incorporating information derived from the embedded texts detected in the image under classification. Specific to each image category, a text concept is first learned from a set of labeled texts in images of the target category using Multiple Instance Learning [1]. For an image under classification that contains multiple detected text lines, we calculate a weighted Euclidean distance between each text line and the learned text concept of the target category. Subsequently, the minimum distance and the low-level visual cues are jointly used as features for SVM-based classification. Experiments on a challenging image database demonstrate that the proposed fusion framework achieves higher accuracy than state-of-the-art methods for image classification.
AB - Conventional image categorization techniques primarily rely on low-level visual cues. In this paper, we describe a multimodal fusion scheme that improves image classification accuracy by incorporating information derived from the embedded texts detected in the image under classification. Specific to each image category, a text concept is first learned from a set of labeled texts in images of the target category using Multiple Instance Learning [1]. For an image under classification that contains multiple detected text lines, we calculate a weighted Euclidean distance between each text line and the learned text concept of the target category. Subsequently, the minimum distance and the low-level visual cues are jointly used as features for SVM-based classification. Experiments on a challenging image database demonstrate that the proposed fusion framework achieves higher accuracy than state-of-the-art methods for image classification.
KW - Image annotation
KW - Image categorization
KW - Multimodal fusion
KW - Multiple instance learning
KW - Text detection
UR - http://www.scopus.com/inward/record.url?scp=34547210642&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547210642&partnerID=8YFLogxK
U2 - 10.1145/1180639.1180698
DO - 10.1145/1180639.1180698
M3 - Conference contribution
AN - SCOPUS:34547210642
SN - 1595934472
SN - 9781595934475
T3 - Proceedings of the 14th Annual ACM International Conference on Multimedia, MM 2006
SP - 211
EP - 220
BT - Proceedings of the 14th Annual ACM International Conference on Multimedia, MM 2006
T2 - 14th Annual ACM International Conference on Multimedia, MM 2006
Y2 - 23 October 2006 through 27 October 2006
ER -