Fully connected layers are an essential component of a convolutional neural network (CNN), an architecture that has been shown to be successful at classifying images in many applications. A CNN begins with convolution and pooling operations that decompose an input image into feature maps. The result of this process is then fed into a fully connected neural network, which drives the final classification decision for the input image. However, the learned feature maps in a CNN are sometimes not discriminative enough for the fully connected layers to produce good classification results. In this article, a visual attention learning module is proposed to enhance the classification capability of the fully connected layers in a CNN. By learning better feature maps that emphasize salient regions and suppress meaningless ones, better classification performance can be obtained by integrating the proposed module into the fully connected layers. The proposed visual attention learning module can be imposed on any existing CNN-based image classification model to achieve incremental improvements with negligible overhead. Based on our experiments, the proposed method achieves average top-1 accuracies of 95.32%, 92.73%, and 66.50%, respectively, on our collected Underwater Fish dataset, the public Animals-10 dataset, and the public Stanford Cars dataset.
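The general idea of attention-based reweighting of feature maps can be sketched as follows. This is a minimal, illustrative NumPy example of a generic spatial-attention gate, not the paper's actual module; the function name `spatial_attention` and the 1x1-convolution scoring scheme are assumptions for illustration only.

```python
import numpy as np

def spatial_attention(feature_maps, w, b):
    """Illustrative spatial-attention gate (hypothetical sketch,
    not the module proposed in the article).

    feature_maps: (C, H, W) activations from the conv/pool stages.
    w: (C,) weights of a 1x1 convolution that scores each location.
    b: scalar bias.
    Returns re-weighted feature maps of the same shape.
    """
    # Score every spatial location with a 1x1 convolution
    # (a weighted sum over channels), then squash to (0, 1).
    scores = np.tensordot(w, feature_maps, axes=([0], [0])) + b  # (H, W)
    attention = 1.0 / (1.0 + np.exp(-scores))                     # (H, W)
    # Salient locations (attention near 1) are emphasized and the
    # rest are attenuated before the fully connected layers.
    return feature_maps * attention[None, :, :]

rng = np.random.default_rng(0)
fmaps = rng.standard_normal((8, 7, 7))  # toy feature maps
w = rng.standard_normal(8)
out = spatial_attention(fmaps, w, 0.0)
print(out.shape)  # (8, 7, 7)
```

Because the sigmoid gate lies in (0, 1), every activation is scaled down in proportion to its estimated saliency, which is the general mechanism an attention module uses to steer the fully connected classifier toward informative regions.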