Abstract
This work presents a comparative study of voice activity detection (VAD) in noise using simple acoustic features and relatively compact recurrent models within a controlled MATLAB-based framework. For each utterance, nine baseline spectral-plus-periodicity features, MFCCs, and FBANKs are extracted and passed to several lightweight BiLSTM-based networks, either alone or preceded by a 1D CNN layer. The main experiments are carried out at a fixed SNR to separate the influence of the network structure from that of the feature type, and an additional series with four SNR levels is used to assess whether the same performance trends hold when the SNR varies. The results show that adding a compact CNN front-end before the BiLSTM consistently improves detection scores, that MFCCs generally outperform the baseline spectral–periodicity features and often give better recall/F1 than FBANKs for the considered lightweight models, and that (Formula presented.) +BiLSTM with 13-dimensional MFCCs offers a favorable trade-off between accuracy, robustness across SNRs, and model size. Because all conditions share a single MATLAB implementation with fixed noise types, SNR values, and evaluation metrics, this work is positioned as a benchmark and practical guideline publication for noise-robust, resource-constrained VAD, rather than as a proposal of a completely new deep-learning architecture.
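The abstract fixes the SNR for its main experiments and reports recall and F1. As a rough illustration of those two ingredients, the following Python sketch scales a noise signal to a target SNR and scores frame-level speech/non-speech decisions. It is not the authors' MATLAB implementation: the function names (`scale_noise_to_snr`, `mix_at_snr`, `frame_scores`), the toy signals, and the 10 dB value are all assumptions made for illustration, and speech power is assumed to be measured over the whole utterance.

```python
import math


def scale_noise_to_snr(speech, noise, snr_db):
    """Scale `noise` so that (speech power / noise power) equals snr_db in dB.

    Assumption: power is the mean square over the whole signal, silence included.
    """
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise must have non-zero power")
    # SNR_dB = 10*log10(p_speech / (gain^2 * p_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [gain * n for n in noise]


def mix_at_snr(speech, noise, snr_db):
    """Add the rescaled noise to the speech, sample by sample."""
    scaled = scale_noise_to_snr(speech, noise, snr_db)
    return [s + n for s, n in zip(speech, scaled)]


def frame_scores(reference, predicted):
    """Precision, recall, and F1 for binary frame labels (1 = speech)."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy usage with hypothetical values (not taken from the paper).
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, -0.5, 0.5, -0.5]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
precision, recall, f1 = frame_scores([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Keeping the noise scaling separate from the scoring mirrors the idea of holding noise types, SNR values, and metrics fixed while only the features or the network change.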
| Original language | English |
|---|---|
| Article number | 263 |
| Journal | Electronics (Switzerland) |
| Volume | 15 |
| Issue number | 2 |
| Publication status | Published - 2026 Jan |
Keywords
- convolutional neural network
- noise robustness
- speech enhancement
- voice activity detection
ASJC Scopus subject areas
- Control and Systems Engineering
- Signal Processing
- Hardware and Architecture
- Computer Networks and Communications
- Electrical and Electronic Engineering
Title
A Comparative Experimental Study on Simple Features and Lightweight Models for Voice Activity Detection in Noisy Environments