Automatic Music Transcription Leveraging Generalized Cepstral Features and Deep Learning

Yu Te Wu, Berlin Chen, Li Su

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Spectral features are limited in modeling musical signals with multiple concurrent pitches because it is difficult to suppress the interference that the harmonic peaks of one pitch impose on another. In this paper, we show that combining multiple features, represented in both the frequency and time domains, with deep learning models can reduce such interference. These features are derived systematically from conventional pitch detection functions that relate to one another through the discrete Fourier transform and a nonlinear scaling function. Neural networks trained on these features outperform state-of-the-art methods while using less training data.
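The feature family described above can be made concrete with a short sketch. Applying a nonlinear scaling function to the power spectrum and then an inverse DFT yields a generalized cepstrum: its peaks lie at pitch periods (lags) rather than at harmonic frequencies, so the interference patterns in the two domains differ, and a model that sees both representations can discount them. The Python sketch below is a minimal illustration under stated assumptions; the function names, the power-law form of the scaling, and the exponent value gamma are placeholders, not the authors' exact formulation.

# A minimal sketch, assuming a power-law scaling function; the names,
# the gamma value, and the rectification step are illustrative
# assumptions, not the paper's exact recipe.
import numpy as np

def power_spectrum(frame):
    """Magnitude-squared DFT of one windowed audio frame (frequency domain)."""
    windowed = frame * np.hanning(len(frame))
    return np.abs(np.fft.rfft(windowed)) ** 2

def nonlinear_scale(x, gamma=0.24):
    """Power-law compression; as gamma approaches 0 this approaches log scaling."""
    return x ** gamma

def generalized_cepstrum(spec, gamma=0.24):
    """Inverse DFT of the nonlinearly scaled spectrum (time/lag domain).
    Peaks at a given lag mark candidate pitch periods, complementing
    the harmonic peaks of the spectrum itself."""
    ceps = np.fft.irfft(nonlinear_scale(spec, gamma))
    return np.maximum(ceps, 0.0)  # half-wave rectify: keep salient positive peaks

# Usage: one frequency-domain and one time-domain feature per frame,
# stacked as complementary input channels for a neural network.
frame = np.random.randn(2048)                        # stand-in for one audio frame
spec = nonlinear_scale(power_spectrum(frame))        # spectrum-like feature
ceps = generalized_cepstrum(power_spectrum(frame))   # cepstrum-like feature

Iterating the scale-then-transform step generates further members of the family, which appears to be the systematic derivation from pitch detection functions that the abstract describes.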

Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 401-405
Number of pages: 5
ISBN (Print): 9781538646588
DOIs: 10.1109/ICASSP.2018.8462079
Publication status: Published - 2018 Sep 10
Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: 2018 Apr 15 - 2018 Apr 20

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2018-April
ISSN (Print): 1520-6149

Conference

Conference: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country: Canada
City: Calgary
Period: 18/4/15 - 18/4/20

Fingerprint

  • Transcription
  • Discrete Fourier transforms
  • Neural networks
  • Deep learning

Keywords

  • Automatic music transcription
  • Cepstrum
  • Convolutional neural networks
  • Deep learning

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Wu, Y. T., Chen, B., & Su, L. (2018). Automatic Music Transcription Leveraging Generalized Cepstral Features and Deep Learning. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings (pp. 401-405). [8462079] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2018-April). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2018.8462079

@inproceedings{fcd7df3a4aa24d9da53a12fe6e624c7f,
title = "Automatic Music Transcription Leveraging Generalized Cepstral Features and Deep Learning",
abstract = "Spectral features are limited in modeling musical signals with multiple concurrent pitches due to the challenge to suppress the interference of the harmonic peaks from one pitch to another. In this paper, we show that using multiple features represented in both the frequency and time domains with deep learning modeling can reduce such interference. These features are derived systematically from conventional pitch detection functions that relate to one another through the discrete Fourier transform and a nonlinear scaling function. Neural networks modeled with these features outperform state-of-the-art methods while using less training data.",
keywords = "Automatic music transcription, Cepstrum, Convolutional neural networks, Deep learning",
author = "Wu, {Yu Te} and Berlin Chen and Li Su",
year = "2018",
month = "9",
day = "10",
doi = "10.1109/ICASSP.2018.8462079",
language = "English",
isbn = "9781538646588",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "401--405",
booktitle = "2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings",
}

TY - GEN
T1 - Automatic Music Transcription Leveraging Generalized Cepstral Features and Deep Learning
AU - Wu, Yu Te
AU - Chen, Berlin
AU - Su, Li
PY - 2018/9/10
Y1 - 2018/9/10
N2 - Spectral features are limited in modeling musical signals with multiple concurrent pitches due to the challenge to suppress the interference of the harmonic peaks from one pitch to another. In this paper, we show that using multiple features represented in both the frequency and time domains with deep learning modeling can reduce such interference. These features are derived systematically from conventional pitch detection functions that relate to one another through the discrete Fourier transform and a nonlinear scaling function. Neural networks modeled with these features outperform state-of-the-art methods while using less training data.
AB - Spectral features are limited in modeling musical signals with multiple concurrent pitches due to the challenge to suppress the interference of the harmonic peaks from one pitch to another. In this paper, we show that using multiple features represented in both the frequency and time domains with deep learning modeling can reduce such interference. These features are derived systematically from conventional pitch detection functions that relate to one another through the discrete Fourier transform and a nonlinear scaling function. Neural networks modeled with these features outperform state-of-the-art methods while using less training data.
KW - Automatic music transcription
KW - Cepstrum
KW - Convolutional neural networks
KW - Deep learning
UR - http://www.scopus.com/inward/record.url?scp=85054289665&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054289665&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2018.8462079
DO - 10.1109/ICASSP.2018.8462079
M3 - Conference contribution
AN - SCOPUS:85054289665
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 401
EP - 405
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
ER -