Indirect visual–semantic alignment for generalized zero-shot recognition

Yan He Chen, Mei Chen Yeh*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

Our paper addresses the challenge of generalized zero-shot learning, where the label of a target image may belong to either a seen or an unseen category. Previous methods for this task typically learn a joint embedding space in which image features and their corresponding class prototypes are directly aligned. However, this alignment can be difficult due to the inherent gap between the visual and the semantic space. To overcome this challenge, we propose a novel learning framework that relaxes the alignment requirement. Our approach employs a metric learning-based loss function to optimize the visual embedding model, allowing different penalty strengths on within-class and between-class similarities. By avoiding pair-wise comparisons between image and class embeddings, our approach gains flexibility in learning discriminative and generalized visual features. Extensive experiments demonstrate the effectiveness of our method, with performance on par with the state of the art on five benchmarks.
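To illustrate the kind of objective the abstract describes, below is a minimal, hypothetical sketch of a metric-learning loss that compares image embeddings with one another (rather than with class prototypes) and applies separate penalty strengths to within-class and between-class cosine similarities. The function name and the parameters `alpha`, `beta`, and `margin` are illustrative assumptions, not the authors' actual formulation; the aggregation follows a common multi-similarity-style log-sum-exp scheme.

```python
import torch
import torch.nn.functional as F

def indirect_alignment_loss(embeddings, labels, alpha=2.0, beta=10.0, margin=0.5):
    """Illustrative metric-learning loss over image-image pairs.

    alpha controls the penalty strength on within-class (positive-pair)
    similarities, beta the strength on between-class (negative-pair)
    similarities; both names are hypothetical.
    """
    z = F.normalize(embeddings, dim=1)          # unit-norm visual embeddings
    sim = z @ z.t()                             # pairwise cosine similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos = same & ~eye                           # within-class pairs (no self-pairs)
    neg = ~same                                 # between-class pairs
    # Pull within-class similarities above the margin (penalty scaled by alpha);
    # push between-class similarities below it (penalty scaled by beta).
    pos_term = torch.log1p(torch.exp(-alpha * (sim[pos] - margin)).sum())
    neg_term = torch.log1p(torch.exp(beta * (sim[neg] - margin)).sum())
    return pos_term + neg_term
```

Because the loss involves only image-image similarities, the class-prototype space never needs to be directly matched against the visual space, which is the relaxation the abstract refers to.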

Original language: English
Article number: 111
Journal: Multimedia Systems
Volume: 30
Issue number: 2
DOIs
Publication status: Published - Apr 2024

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications
