MATIC: Multilingual Accurate Textual Image Customization via Joint Generative Artificial Intelligence

  • Chiao Hsin Wu*
  • I. Wei Lai

  *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Advances in diffusion models enable the creation of highly detailed images. However, fusing text with images poses significant challenges, including maintaining text accuracy across languages, placing text optimally, and choosing appropriate typography. To address these challenges, we introduce the Multilingual Accurate Textual Image Customization (MATIC) framework. MATIC employs the Chain-of-Thought (CoT) concept to decompose the textual image generation process into multiple steps, leveraging diverse generative artificial intelligence models, including a Multimodal Large Language Model (MMLLM) and a diffusion model. The framework first generates the desired text and a corresponding prompt for the diffusion model based on user input. The diffusion-generated image is then examined to remove any undesired text. Meanwhile, the typographic elements are designed to align with the visual content. Finally, the textual image is fused with the aid of a grid coordinate system, evaluated by the MMLLM, and further customized by the user through natural language. Experimental results demonstrate that MATIC can produce accurate, high-quality, multilingual textual images that meet user requirements across various domains, including digital marketing, graphic design, and educational content creation.
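The abstract mentions a grid coordinate system for fusing text with the image but does not describe it. As a rough illustration only, the sketch below assumes the image is divided into a uniform grid of rows and columns and that a text region is specified as a cell plus an optional span. The names `Box` and `grid_cell_to_box` and all parameters are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Pixel bounding box; right and bottom edges are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def grid_cell_to_box(row, col, rows, cols, width, height,
                     row_span=1, col_span=1):
    """Map a grid cell (plus optional span) to a pixel box.

    Assumption: a uniform rows x cols grid over a width x height image.
    This is an illustrative guess, not the paper's actual scheme.
    """
    if rows < 1 or cols < 1:
        raise ValueError("grid must have at least one row and one column")
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("cell lies outside the grid")
    if row_span < 1 or col_span < 1:
        raise ValueError("spans must be at least 1")
    if row + row_span > rows or col + col_span > cols:
        raise ValueError("span extends past the grid")

    # Integer division keeps adjacent cells flush with no gaps or overlap.
    left = col * width // cols
    top = row * height // rows
    right = (col + col_span) * width // cols
    bottom = (row + row_span) * height // rows
    return Box(left, top, right, bottom)


if __name__ == "__main__":
    # Place a headline across the two middle columns of the bottom row
    # of a 4 x 4 grid on a 1024 x 512 image.
    print(grid_cell_to_box(3, 1, rows=4, cols=4, width=1024, height=512,
                           col_span=2))
```

Under these assumptions, a language model would only need to output small integers (row, column, span) rather than raw pixel coordinates, which is one plausible reason to use a grid for placement.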

Original language: English
Title of host publication: Proceedings - 2025 IEEE 18th Pacific Visualization Conference, PacificVis 2025
Publisher: IEEE Computer Society
Pages: 352-357
Number of pages: 6
ISBN (Electronic): 9798331505813
Publication status: Published - 2025
Event: 18th IEEE Pacific Visualization Conference, PacificVis 2025 - Tokyo, Japan
Duration: 2025 Apr 22 to 2025 Apr 25

Publication series

Name: IEEE Pacific Visualization Symposium
ISSN (Print): 2165-8765
ISSN (Electronic): 2165-8773

Conference

Conference: 18th IEEE Pacific Visualization Conference, PacificVis 2025
Country/Territory: Japan
City: Tokyo
Period: 2025/04/22 to 2025/04/25

Keywords

  • Artificial intelligence
  • Computer Vision
  • Computing methodologies
  • Natural language processing

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Computer Graphics and Computer-Aided Design
