Abstract
Multi-instrument automatic music transcription (AMT) is a critical yet under-investigated problem in music information retrieval (MIR). Beyond the difficulties faced by traditional AMT research, multi-instrument AMT requires further work on high-level modeling of musical semantics, efficient training methods for multiple attributes, and clearly defined problem scenarios for evaluating system performance. In this article, we propose a multi-instrument AMT method that combines signal processing techniques for pitch saliency with novel deep learning techniques, drawing on concepts partly inspired by multi-object recognition, instance segmentation, and image-to-image translation in computer vision. The proposed method is flexible enough to address all the sub-tasks of multi-instrument AMT, including multi-instrument note tracking, a task that has rarely been investigated before, and it achieves state-of-the-art performance on the sub-task of multi-pitch streaming.
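The abstract does not give architectural details, but its keywords mention self-attention and multi-pitch estimation. As a loose illustration only, and not the authors' actual model, the following minimal PyTorch sketch shows how a self-attention encoder over spectrogram frames could produce frame-level multi-pitch activations. The class name `SelfAttentionMultiPitch`, all layer sizes, and the 88-pitch output range are hypothetical assumptions.

```python
# Illustrative sketch only -- NOT the architecture from the paper.
# A self-attention encoder over spectrogram frames producing per-frame
# multi-pitch activations. All sizes and names here are hypothetical.
import torch
import torch.nn as nn

class SelfAttentionMultiPitch(nn.Module):
    def __init__(self, n_bins=352, d_model=256, n_heads=4,
                 n_layers=2, n_pitches=88):
        super().__init__()
        # Project each spectrogram frame (n_bins frequency bins) to d_model.
        self.input_proj = nn.Linear(n_bins, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        # Self-attention lets every frame attend to the whole excerpt,
        # capturing longer-range musical context than local convolutions.
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Per-frame multi-label head: one activation per candidate pitch.
        self.head = nn.Linear(d_model, n_pitches)

    def forward(self, spec):  # spec: (batch, time, n_bins)
        h = self.input_proj(spec)
        h = self.encoder(h)
        # Sigmoid, not softmax: several pitches can sound simultaneously.
        return torch.sigmoid(self.head(h))  # (batch, time, n_pitches)

if __name__ == "__main__":
    model = SelfAttentionMultiPitch()
    spec = torch.randn(2, 100, 352)   # two 100-frame excerpts
    probs = model(spec)
    print(probs.shape)                # torch.Size([2, 100, 88])
```

The multi-label sigmoid output reflects the multi-pitch setting, where each frame may contain several concurrent pitches; extending the output to per-instrument channels would be one way to approach multi-pitch streaming, though the paper's actual instance-segmentation-inspired design should be consulted for that.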
| Original language | English |
|---|---|
| Article number | 9222310 |
| Pages (from-to) | 2796-2809 |
| Number of pages | 14 |
| Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
| Volume | 28 |
| DOIs | |
| Publication status | Published - 2020 |
Keywords
- Automatic music transcription
- deep learning
- multi-pitch estimation
- multi-pitch streaming
- self-attention
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Acoustics and Ultrasonics
- Computational Mathematics
- Electrical and Electronic Engineering