Enhanced visual attention-guided deep neural networks for image classification

Chia Hung Yeh, Min Hui Lin, Po Chao Chang, Li Wei Kang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

30 Citations (Scopus)


A fully connected layer is an essential component of a convolutional neural network (CNN), an architecture that has proven successful at image classification in several related applications. A CNN begins with convolution and pooling operations that decompose an input image into features; the resulting feature maps are then fed into a fully connected neural network, which drives the final classification decision for the input image. However, the learned feature maps in a CNN are sometimes not informative enough for the fully connected layers to produce good classification results. In this article, a visual attention learning module is proposed to enhance the classification capability of the fully connected layers in a CNN. By learning better feature maps that emphasize salient regions and suppress meaningless ones, improved classification performance can be obtained by integrating the proposed module ahead of the fully connected layers. The proposed visual attention learning module can be imposed on any existing CNN-based image classification model to achieve incremental improvements with negligible overhead. In our experiments, the proposed method achieves average top-1 accuracies of 95.32%, 92.73%, and 66.50% on our collected Underwater Fish dataset, the public Animals-10 dataset, and the public Stanford Cars dataset, respectively.
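The abstract does not reproduce the module's architecture, so the following is only an illustrative sketch of the general idea it describes: reweighting CNN feature maps with a learned-style attention map so that salient spatial positions are emphasized before the pooled features reach the fully connected classifier. The spatial-attention formulation (sigmoid over the channel-wise mean) and all function names here are assumptions for illustration, not the authors' exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention_reweight(feature_maps):
    """Reweight CNN feature maps (C, H, W) by a spatial attention map.

    The attention map is the sigmoid of the channel-wise mean activation,
    so spatial positions with strong responses are emphasized and weak
    (background) positions are attenuated.
    """
    attn = sigmoid(feature_maps.mean(axis=0, keepdims=True))  # shape (1, H, W)
    return feature_maps * attn  # broadcast the map across all channels

def classify(feature_maps, fc_weights, fc_bias):
    """Pool the reweighted maps and apply a fully connected layer."""
    reweighted = spatial_attention_reweight(feature_maps)
    pooled = reweighted.mean(axis=(1, 2))   # global average pool -> (C,)
    logits = fc_weights @ pooled + fc_bias  # fully connected -> (num_classes,)
    return logits

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))  # toy feature maps: 8 channels, 4x4
W = rng.standard_normal((3, 8))         # toy 3-class fully connected layer
b = np.zeros(3)
print(classify(feats, W, b).shape)      # prints (3,)
```

Because the attention values lie in (0, 1), the module only rescales existing activations, which is consistent with the paper's claim that it can be attached to an existing CNN with negligible overhead.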

Original language: English
Pages (from-to): 163447-163457
Number of pages: 11
Journal: IEEE Access
Publication status: Published - 2020


Keywords

  • Convolutional neural networks
  • Deep learning
  • Image classification
  • Object recognition
  • Salient feature learning

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering


