TY - JOUR
T1 - Indirect visual–semantic alignment for generalized zero-shot recognition
AU - Chen, Yan He
AU - Yeh, Mei Chen
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
PY - 2024/4
Y1 - 2024/4
N2 - Our paper addresses the challenge of generalized zero-shot learning, where the label of a target image may belong to either a seen or an unseen category. Previous methods for this task typically learn a joint embedding space where image features and their corresponding class prototypes are directly aligned. However, this can be difficult due to the inherent gap between the visual and semantic spaces. To overcome this challenge, we propose a novel learning framework that relaxes the alignment requirement. Our approach employs a metric learning-based loss function to optimize the visual embedding model, allowing for different penalty strengths on within-class and between-class similarities. By avoiding pairwise comparisons between image and class embeddings, our approach achieves more flexibility in learning discriminative and generalized visual features. Our extensive experiments demonstrate the superiority of our method with performance on par with the state of the art on five benchmarks.
AB - Our paper addresses the challenge of generalized zero-shot learning, where the label of a target image may belong to either a seen or an unseen category. Previous methods for this task typically learn a joint embedding space where image features and their corresponding class prototypes are directly aligned. However, this can be difficult due to the inherent gap between the visual and semantic spaces. To overcome this challenge, we propose a novel learning framework that relaxes the alignment requirement. Our approach employs a metric learning-based loss function to optimize the visual embedding model, allowing for different penalty strengths on within-class and between-class similarities. By avoiding pairwise comparisons between image and class embeddings, our approach achieves more flexibility in learning discriminative and generalized visual features. Our extensive experiments demonstrate the superiority of our method with performance on par with the state of the art on five benchmarks.
KW - Deep metric learning
KW - Fine-grained visual recognition
KW - Generalized zero-shot learning
KW - Visual–semantic embedding
UR - http://www.scopus.com/inward/record.url?scp=85189647451&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85189647451&partnerID=8YFLogxK
U2 - 10.1007/s00530-024-01313-z
DO - 10.1007/s00530-024-01313-z
M3 - Article
AN - SCOPUS:85189647451
SN - 0942-4962
VL - 30
JO - Multimedia Systems
JF - Multimedia Systems
IS - 2
M1 - 111
ER -