Fine-grained video super-resolution via spatial-temporal learning and image detail enhancement

Chia Hung Yeh, Hsin Fu Yang, Yu Yang Lin, Wan Jen Huang, Feng Hsu Tsai, Li Wei Kang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

This paper addresses the problem of fine-grained video super-resolution (FGVSR), which aims to suppress the temporal flickering caused by separately processed consecutive frames and to enhance the quality of restored frame details when upscaling videos. Some existing video SR methods fail to sufficiently utilize spatial-temporal information from the input low-resolution (LR) videos, while others generate undesirable artifacts or fail to reconstruct image details well. To overcome these problems, we present a novel deep learning framework for FGVSR, which takes a set of consecutive LR video frames and generates the corresponding super-resolved frames. Our deep FGVSR framework focuses on reconstructing missing information from the LR sources based on the proposed multi-frame alignment and refinement strategies. More specifically, we propose an alignment module, where multiple frames are aligned at the feature level, to prevent the output videos from flickering. Then, we introduce a feature fusion module, where the aligned features produced by the alignment module are fused and refined in a multi-scale manner. Finally, the proposed refinement module reconstructs the missing information based on the fused features. In addition, we embed an image enhancement module on the skip connection from the input layer to the output layer of our network to further enhance the SR results. Experimental results show that the proposed deep FGVSR achieves state-of-the-art performance, compared with existing deep learning-based VSR methods, on three well-known benchmarks: REDS, Vid4, and Vimeo90k. More specifically, compared with the state-of-the-art VSR methods in our experiments, our FGVSR achieves PSNR improvements ranging from 0.70 dB to 9.54 dB. Our method has also been shown to be effective for other image restoration tasks, such as image inpainting.
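The abstract describes a pipeline of alignment, fusion/refinement, upsampling, and an enhancement branch on an input-to-output skip connection, but gives no architectural details. The following is a minimal, hypothetical PyTorch sketch of how such a pipeline could be wired together; all module names, layer choices, frame counts, and hyperparameters here are assumptions for illustration, not the authors' actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AlignmentModule(nn.Module):
    """Aligns neighbor-frame features to the center frame at feature level (placeholder: plain convs)."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels * 2, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, neighbor_feat, center_feat):
        # Condition the neighbor features on the center frame and predict aligned features.
        return self.conv(torch.cat([neighbor_feat, center_feat], dim=1))


class FusionRefinement(nn.Module):
    """Fuses the aligned features from all frames and refines them (placeholder: 1x1 fuse + residual convs)."""
    def __init__(self, channels, num_frames):
        super().__init__()
        self.fuse = nn.Conv2d(channels * num_frames, channels, 1)
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, aligned_feats):
        fused = self.fuse(torch.cat(aligned_feats, dim=1))
        return fused + self.refine(fused)


class FGVSRSketch(nn.Module):
    """Hypothetical skeleton: align -> fuse/refine -> upsample, plus an enhancement skip branch."""
    def __init__(self, num_frames=5, channels=64, scale=4):
        super().__init__()
        self.scale = scale
        self.extract = nn.Conv2d(3, channels, 3, padding=1)
        self.align = AlignmentModule(channels)
        self.fusion = FusionRefinement(channels, num_frames)
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )
        # Enhancement module on the skip connection from the input to the output.
        self.enhance = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, frames):
        # frames: (B, T, 3, H, W) low-resolution clip; the center frame is super-resolved.
        b, t, c, h, w = frames.shape
        feats = [self.extract(frames[:, i]) for i in range(t)]
        center = feats[t // 2]
        aligned = [self.align(f, center) for f in feats]
        refined = self.fusion(aligned)
        sr = self.upsample(refined)
        # Skip connection: bicubic-upsampled center frame passed through the enhancement branch.
        base = F.interpolate(frames[:, t // 2], scale_factor=self.scale,
                             mode="bicubic", align_corners=False)
        return sr + self.enhance(base)


if __name__ == "__main__":
    model = FGVSRSketch()
    clip = torch.randn(1, 5, 3, 32, 32)   # five 32x32 LR frames
    print(model(clip).shape)               # torch.Size([1, 3, 128, 128])
```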

Original language: English
Article number: 107789
Journal: Engineering Applications of Artificial Intelligence
Volume: 131
Publication status: Published - May 2024

Keywords

  • Convolutional neural networks
  • Deep learning
  • Video enhancement
  • Video frame alignment
  • Video reconstruction
  • Video super-resolution

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering
