Multi-Instrument Automatic Music Transcription with Self-Attention-Based Instance Segmentation

Yu Te Wu, Berlin Chen, Li Su*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

34 Citations (Scopus)


Multi-instrument automatic music transcription (AMT) is a critical but less investigated problem in the field of music information retrieval (MIR). With all the difficulties faced by traditional AMT research, multi-instrument AMT needs further investigation on high-level music semantic modeling, efficient training methods for multiple attributes, and a clear problem scenario for system performance evaluation. In this article, we propose a multi-instrument AMT method, with signal processing techniques specifying pitch saliency, novel deep learning techniques, and concepts partly inspired by multi-object recognition, instance segmentation, and image-to-image translation in computer vision. The proposed method is flexible for all the sub-tasks in multi-instrument AMT, including multi-instrument note tracking, a task that has rarely been investigated before. State-of-the-art performance is also reported in the sub-task of multi-pitch streaming.

Original language: English
Article number: 9222310
Pages (from-to): 2796-2809
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication status: Published - 2020


Keywords

  • Automatic music transcription
  • deep learning
  • multi-pitch estimation
  • multi-pitch streaming
  • self-attention

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering

