TY - JOUR
T1 - SemanticHash
T2 - Hash Coding Via Semantics-Guided Label Prototype Learning
AU - Tu, Cheng-Hao
AU - Yang, Huei-Fang
AU - Yang, Shih-Min
AU - Yeh, Mei-Chen
AU - Chen, Chu-Song
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2021/2/1
Y1 - 2021/2/1
N2 - In this article, we propose SemanticHash, a simple and effective deep neural network model that leverages semantic word embeddings (e.g., BERT) for hash code learning. Both images and class labels are compressed into K-bit binary vectors by the visual and semantic hash functions, respectively, which are jointly learned and aligned to optimize semantic consistency. The K-dimensional class label prototypes, projected from the semantic word embeddings, guide the hash mapping on the image side and vice versa, so that the K-bit image hash codes are aligned with their semantic prototypes and are therefore more discriminative. Extensive experimental results on four benchmarks, the CIFAR-10, NUS-WIDE, ImageNet, and MS-COCO datasets, demonstrate the effectiveness of our approach. We also perform studies to analyze the effects of quantization and word semantic spaces and to explain the relations among the learned class prototypes. Finally, the generalization capability of the proposed approach is further demonstrated: it achieves competitive performance in comparison with state-of-the-art unsupervised and zero-shot hashing methods. Impact Statement: Hash code learning is an important technology that enables efficient image retrieval on large-scale data. While existing hashing algorithms can effectively generate compact binary codes in a supervised setting when trained on a moderate-size dataset, they struggle to scale to large datasets and do not generalize to unseen datasets. The proposed approach overcomes these limitations. Compared with state-of-the-art methods, our solution achieves an average performance improvement of 2.1% on four moderate-size benchmarks and an improvement of 4.7% on ImageNet, a large-scale dataset with over 1.2 M training images. Beyond its superior performance on popular benchmarks for binary hash code learning, the proposed technology also performs well in cross-dataset and zero-shot (i.e., the testing concepts are unseen during training) scenarios. Our approach attains over 17.7% improvement in zero-shot retrieval performance compared with the state of the art in the area. This article thus provides a powerful solution for utilizing massive data for fast and accurate image retrieval in the big data era.
AB - In this article, we propose SemanticHash, a simple and effective deep neural network model that leverages semantic word embeddings (e.g., BERT) for hash code learning. Both images and class labels are compressed into K-bit binary vectors by the visual and semantic hash functions, respectively, which are jointly learned and aligned to optimize semantic consistency. The K-dimensional class label prototypes, projected from the semantic word embeddings, guide the hash mapping on the image side and vice versa, so that the K-bit image hash codes are aligned with their semantic prototypes and are therefore more discriminative. Extensive experimental results on four benchmarks, the CIFAR-10, NUS-WIDE, ImageNet, and MS-COCO datasets, demonstrate the effectiveness of our approach. We also perform studies to analyze the effects of quantization and word semantic spaces and to explain the relations among the learned class prototypes. Finally, the generalization capability of the proposed approach is further demonstrated: it achieves competitive performance in comparison with state-of-the-art unsupervised and zero-shot hashing methods. Impact Statement: Hash code learning is an important technology that enables efficient image retrieval on large-scale data. While existing hashing algorithms can effectively generate compact binary codes in a supervised setting when trained on a moderate-size dataset, they struggle to scale to large datasets and do not generalize to unseen datasets. The proposed approach overcomes these limitations. Compared with state-of-the-art methods, our solution achieves an average performance improvement of 2.1% on four moderate-size benchmarks and an improvement of 4.7% on ImageNet, a large-scale dataset with over 1.2 M training images. Beyond its superior performance on popular benchmarks for binary hash code learning, the proposed technology also performs well in cross-dataset and zero-shot (i.e., the testing concepts are unseen during training) scenarios. Our approach attains over 17.7% improvement in zero-shot retrieval performance compared with the state of the art in the area. This article thus provides a powerful solution for utilizing massive data for fast and accurate image retrieval in the big data era.
KW - Convolutional neural networks (CNNs)
KW - representation learning
KW - semantics-guided artificial intelligence
KW - supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85124133970&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124133970&partnerID=8YFLogxK
U2 - 10.1109/TAI.2021.3068322
DO - 10.1109/TAI.2021.3068322
M3 - Article
AN - SCOPUS:85124133970
SN - 2691-4581
VL - 2
SP - 42
EP - 57
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
IS - 1
ER -