TY - GEN
T1 - Self-Supervised Multi-Label Classification with Global Context and Local Attention
AU - Chen, Chun Yen
AU - Yeh, Mei Chen
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/5/30
Y1 - 2024/5/30
N2 - Self-supervised learning has proven highly effective across various tasks, showcasing its versatility in different applications. Despite these achievements, the challenges inherent in multi-label classification have seen limited attention. This paper introduces GAELLE, a novel self-supervised multi-label classification framework that simultaneously captures image context and object information. GAELLE employs a combination of global context and local attention mechanisms to discern diverse levels of semantic information in images. The global component comprehensively learns image content while local attention eliminates object-irrelevant nuances by aligning embeddings with a projection head. The integration of global and local features in GAELLE effectively captures intricate object-scene relationships. To further enhance this capability, we introduce a global and local swap prediction technique, facilitating the nuanced interplay between various objects and scenes within images. Experimental results showcase GAELLE’s state-of-the-art performance in self-supervised multi-label classification tasks, highlighting its effectiveness in uncovering complex relationships between multiple objects and scenes in images.
KW - Attention
KW - Multi-label classification
KW - Self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85199209591&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85199209591&partnerID=8YFLogxK
U2 - 10.1145/3652583.3658026
DO - 10.1145/3652583.3658026
M3 - Conference contribution
AN - SCOPUS:85199209591
T3 - ICMR 2024 - Proceedings of the 2024 International Conference on Multimedia Retrieval
SP - 934
EP - 942
BT - ICMR 2024 - Proceedings of the 2024 International Conference on Multimedia Retrieval
PB - Association for Computing Machinery, Inc
T2 - 2024 International Conference on Multimedia Retrieval, ICMR 2024
Y2 - 10 June 2024 through 14 June 2024
ER -