Enhanced visual attention-guided deep neural networks for image classification

Chia Hung Yeh, Min Hui Lin, Po Chao Chang, Li Wei Kang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

30 Citations (Scopus)


A fully connected layer is an essential component of a convolutional neural network (CNN), an architecture that has proven successful at image classification in several related applications. A CNN begins with convolution and pooling operations that decompose an input image into features; the resulting feature maps are then fed into a fully connected neural network, which drives the final classification decision for the input image. However, the learned feature maps in a CNN are sometimes not informative enough for the fully connected layers to produce good classification results. In this article, a visual attention learning module is proposed to enhance the classification capability of the fully connected layers in a CNN. By learning better feature maps that emphasize salient regions and suppress meaningless ones, improved classification performance can be obtained by integrating the proposed module ahead of the fully connected layers. The proposed visual attention learning module can be imposed on any existing CNN-based image classification model to achieve incremental improvements with negligible overhead. In our experiments, the proposed method achieves average top-1 accuracies of 95.32%, 92.73%, and 66.50% on our collected Underwater Fish dataset, the public Animals-10 dataset, and the public Stanford Cars dataset, respectively.
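The abstract does not reproduce the module's architecture, so the following is only an illustrative sketch of the general idea it describes: reweighting CNN feature maps with a learned-style attention map so that salient spatial positions are emphasized before the pooled features reach the fully connected classifier. The spatial-attention formulation (sigmoid over the channel-wise mean) and all function names here are assumptions for illustration, not the authors' exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention_reweight(feature_maps):
    """Reweight CNN feature maps (C, H, W) by a spatial attention map.

    The attention map is the sigmoid of the channel-wise mean activation,
    so spatial positions with strong responses are emphasized and weak
    (background) positions are attenuated.
    """
    attn = sigmoid(feature_maps.mean(axis=0, keepdims=True))  # shape (1, H, W)
    return feature_maps * attn  # broadcast the map across all channels

def classify(feature_maps, fc_weights, fc_bias):
    """Pool the reweighted maps and apply a fully connected layer."""
    reweighted = spatial_attention_reweight(feature_maps)
    pooled = reweighted.mean(axis=(1, 2))   # global average pool -> (C,)
    logits = fc_weights @ pooled + fc_bias  # fully connected -> (num_classes,)
    return logits

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))  # toy feature maps: 8 channels, 4x4
W = rng.standard_normal((3, 8))         # toy 3-class fully connected layer
b = np.zeros(3)
print(classify(feats, W, b).shape)      # prints (3,)
```

Because the attention values lie in (0, 1), the module only rescales existing activations, which is consistent with the paper's claim that it can be attached to an existing CNN with negligible overhead.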

Original language: English
Pages (from-to): 163447-163457
Number of pages: 11
Journal: IEEE Access
Publication status: Published - 2020


Keywords

  • Convolutional neural networks
  • Deep learning
  • Image classification
  • Object recognition
  • Salient feature learning

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering


