Adaptive Locality Guidance: Enhancing Vision Transformers on Tiny Datasets

Jules Rostand, Chen Chien Hsu*, Cheng Kai Lu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As studies show that a lack of sufficient data leads Vision Transformers (VTs) to mainly learn global information from the input, the recently proposed Locality Guidance (LG) approach [1] uses a lightweight Convolutional Neural Network (CNN) pretrained on the same dataset to guide the VT into learning local features as well. Under a dual learning framework, the LG significantly boosts the accuracy of different VTs on multiple tiny datasets, at the mere cost of a slight increase in training time. However, we also find that the LG prevents the models from learning global aspects to their full ability. To remedy this limitation, we propose the Adaptive Locality Guidance (ALG), an improved version which uses the LG as an initialization tool and, after a certain number of epochs, lets the VT learn by itself in a supervised fashion. Specifically, we estimate the needed duration of the LG from a threshold set on the evolution of the distance separating the features of the VT from those of the lightweight CNN used for guidance. As our improved method can be used in a plug-and-play fashion, we successfully apply it across four different VTs on the CIFAR-100 dataset. Experimental results show that the proposed ALG significantly reduces the computational cost added in training by the LG, and further increases the validation accuracy by up to 5.49%, thereby achieving new State-Of-The-Art (SOTA) results among tiny VTs.
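
The abstract states only that the guidance duration is estimated from a threshold on the evolution of the VT-to-CNN feature distance; it does not give the exact criterion. The following is therefore a minimal sketch of one plausible reading, not the authors' implementation. The function name `guidance_stop_epoch`, the relative-change rule, and the `threshold` and `patience` parameters are all assumptions introduced for illustration.

```python
def guidance_stop_epoch(distances, threshold=0.01, patience=1):
    """Pick the epoch after which locality guidance could be switched off.

    ASSUMPTION: the stopping rule here (relative change of the per-epoch
    feature distance falling below `threshold` for `patience` consecutive
    epochs) is a hypothetical criterion, not the one from the paper.

    distances: per-epoch average distance between VT features and the
               features of the guiding CNN (one float per epoch).
    Returns the 0-based epoch index at which the rule fires, or None if
    the distance never stabilizes within the recorded history.
    """
    calm_epochs = 0
    for epoch in range(1, len(distances)):
        prev, curr = distances[epoch - 1], distances[epoch]
        # Relative change of the distance between consecutive epochs.
        rel_change = abs(prev - curr) / max(abs(prev), 1e-12)
        calm_epochs = calm_epochs + 1 if rel_change < threshold else 0
        if calm_epochs >= patience:
            return epoch
    return None


# Hypothetical distance history, for illustration only.
history = [1.0, 0.5, 0.3, 0.299, 0.2985]
stop = guidance_stop_epoch(history, threshold=0.01)
```

In a training loop under these assumptions, the guidance loss term would be dropped once the returned epoch is reached, and the VT would continue with the supervised loss alone.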

Original language: English
Title of host publication: 2025 IEEE International Conference on Consumer Electronics, ICCE 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331521165
Publication status: Published - 2025
Event: 2025 IEEE International Conference on Consumer Electronics, ICCE 2025 - Las Vegas, United States
Duration: 2025 Jan 11 to 2025 Jan 14

Publication series

Name: Digest of Technical Papers - IEEE International Conference on Consumer Electronics
ISSN (Print): 0747-668X
ISSN (Electronic): 2159-1423

Conference

Conference: 2025 IEEE International Conference on Consumer Electronics, ICCE 2025
Country/Territory: United States
City: Las Vegas
Period: 2025/01/11 to 2025/01/14

Keywords

  • Convolutional Neural Network
  • Locality Guidance
  • Supervised Learning
  • Vision Transformer

ASJC Scopus subject areas

  • Industrial and Manufacturing Engineering
  • Electrical and Electronic Engineering
