Abstract
This study presents AMUSE++, an advanced speech enhancement framework that extends the MUSE++ model by redesigning its core Mamba module with two major improvements. First, the originally unidirectional one-dimensional (1D) Mamba is transformed into a bi-directional architecture to capture temporal dependencies more effectively. Second, this module is extended to a two-dimensional (2D) structure that jointly models the time and frequency dimensions, capturing richer speech features essential for enhancement tasks. In addition to these structural changes, we propose a Preliminary Denoising Module (PDM) as an advanced front-end, composed of multiple cascaded 2D bi-directional Mamba blocks that preprocess and denoise the input speech features before the main enhancement stage. Extensive experiments on the VoiceBank+DEMAND dataset demonstrate that AMUSE++ significantly outperforms the backbone MUSE++ across a variety of objective speech enhancement metrics, including measures of perceptual quality and intelligibility. These results confirm that the combination of bi-directionality, two-dimensional modeling, and an enhanced denoising front-end provides a powerful approach to challenging noisy speech scenarios. AMUSE++ thus represents a notable advancement in neural speech enhancement architectures, paving the way for more effective and robust speech enhancement systems in real-world applications.
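The abstract does not give implementation details, so the following is only a toy sketch of the three ideas it names: a bi-directional scan, a 2D scan over time and frequency, and a cascade of such blocks. Everything in it is an assumption made for illustration: a fixed linear recurrence stands in for Mamba's selective state-space scan, the two directions are merged by summation, the two axes are merged by averaging, and the function names (`scan_1d`, `bidirectional_scan`, `bidirectional_scan_2d`, `cascade`) are hypothetical rather than taken from AMUSE++ or MUSE++.

```python
# Toy sketch, not the AMUSE++ implementation. A fixed linear recurrence
# replaces Mamba's selective scan; all merge rules below are assumptions.

def scan_1d(seq, decay=0.5):
    """Unidirectional recurrence h[t] = decay * h[t-1] + x[t]."""
    out, h = [], 0.0
    for x in seq:
        h = decay * h + x
        out.append(h)
    return out


def bidirectional_scan(seq, decay=0.5):
    """Sum a forward scan and a backward scan (assumed merge rule)."""
    fwd = scan_1d(seq, decay)
    # Reverse the input, scan it, then reverse back to the original order.
    bwd = scan_1d(seq[::-1], decay)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


def transpose(grid):
    """Swap the two axes of a list-of-lists grid."""
    return [list(col) for col in zip(*grid)]


def bidirectional_scan_2d(spec, decay=0.5):
    """Scan a (time x frequency) grid along both axes and average them."""
    # Each row is one time frame, so scanning a row runs along frequency.
    along_freq = [bidirectional_scan(row, decay) for row in spec]
    # Each column is one frequency bin, so scanning it runs along time.
    along_time = transpose(
        [bidirectional_scan(col, decay) for col in transpose(spec)]
    )
    return [
        [0.5 * (a + b) for a, b in zip(row_f, row_t)]
        for row_f, row_t in zip(along_freq, along_time)
    ]


def cascade(spec, num_blocks=2, decay=0.5):
    """Apply several 2D bi-directional blocks one after another."""
    for _ in range(num_blocks):
        spec = bidirectional_scan_2d(spec, decay)
    return spec


impulse = [0.0, 1.0, 0.0]
causal = scan_1d(impulse)            # the first step sees no future context
both = bidirectional_scan(impulse)   # context arrives from both sides

grid = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
spread = bidirectional_scan_2d(grid)  # energy spreads along both axes
stacked = cascade(grid, num_blocks=3)
```

In this toy setting, the unidirectional scan leaves the step before the impulse at zero, while the bi-directional scan gives it a non-zero value. The 2D variant spreads a single time-frequency impulse to its neighbours along both axes. The model described in the abstract uses learned Mamba blocks, not fixed decays.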
| Original language | English |
|---|---|
| Article number | 282 |
| Journal | Electronics (Switzerland) |
| Volume | 15 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - 2026 Jan |
Keywords
- bi-directional 2D mamba
- lightweight neural networks
- mamba state-space models
- speech enhancement
- time–frequency modeling
ASJC Scopus subject areas
- Control and Systems Engineering
- Signal Processing
- Hardware and Architecture
- Computer Networks and Communications
- Electrical and Electronic Engineering
Title
AMUSE++: A Mamba-Enhanced Speech Enhancement Framework with Bi-Directional and Advanced Front-End Modeling