


References in brief:
[1] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” Signal Processing, IEEE Transactions on, vol. 45, no. 11, pp. 2673–2681, 1997.
[2] J. Schmidhuber, “Long short-term memory: Tutorial on LSTM recurrent networks,” http://people.idsia.ch/~juergen/lstm/index.htm, 2003.
[3] E. Foxley, “Nottingham dataset,” http://ifdo.ca/~seymour/nottingham/nottingham.html, 2011. Accessed: 04-19-2015.
[4] M. Greentree (1996), http://www.jsbchorales.net/index.shtml. Accessed: 04-19-2015.

Predicting Missing Music Components With Bidirectional Long Short-Term Memory Neural Networks

I-Ting Liu¹ and Richard Randall¹,²

¹ School of Music, Carnegie Mellon University, Pittsburgh, US
² Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, US

07.05.16

INTRODUCTION

METHODS

MA and SATB texture

RESULTS & DISCUSSION

• Predicting missing components (entire parts or voices) from complex multipart musical textures has attracted researchers in music information retrieval and music theory. However, previous approaches have been limited to either two-part melody and accompaniment (MA) textures or four-part Soprano-Alto-Tenor-Bass (SATB) textures.

• We propose a single framework applicable to both textures, based on a Bidirectional Long Short-Term Memory (BLSTM) recurrent neural network.

• We treat each voice as a part (e.g. the melody of the MA texture or the Soprano of the SATB texture). The problem we address is: given an incomplete texture, how successfully can we generate the missing part?

• Predictions are made using a Bidirectional Long Short-Term Memory (BLSTM) recurrent neural network, which can learn the relationships between components and can thus be trained to predict a missing one.

• MA: The input at time t is the melody, encoded as a 12-dimensional vector of pitch classes. The output is the chord played at time t.

• SATB: The input at time t is the set of pitches sounding at time t, encoded as an 88-dimensional vector (the 88 notes on a keyboard). The output at time t is the predicted missing note.

• A Bidirectional Recurrent Neural Network (BRNN) consists of two hidden layers, one processing the sequence forward in time and one backward, both connected to the same input and output. A BRNN can be trained using standard backpropagation through time.
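The MA and SATB input encodings described above can be sketched as follows. This is only an illustration under stated assumptions: the poster does not give its exact encoding, so the binary (multi-hot) representation, the use of MIDI note numbers, the mapping of MIDI note 21 (A0) to the first of the 88 dimensions, and the helper names `pitch_class_vector` and `keyboard_vector` are all hypothetical.

```python
# Hypothetical helpers: the poster does not specify its encoding in detail.
# Assumption: notes are MIDI note numbers and vectors are binary (multi-hot).

LOWEST_KEY = 21   # assumed: MIDI number of A0, the lowest key of an 88-key piano
NUM_KEYS = 88


def pitch_class_vector(midi_pitches):
    """12-dimensional vector; entry k is 1 if a sounding note has pitch class k (C = 0)."""
    vec = [0] * 12
    for pitch in midi_pitches:
        vec[pitch % 12] = 1
    return vec


def keyboard_vector(midi_pitches):
    """88-dimensional vector; entry i is 1 if key i (A0 = 0, ..., C8 = 87) is sounding."""
    vec = [0] * NUM_KEYS
    for pitch in midi_pitches:
        index = pitch - LOWEST_KEY
        if not 0 <= index < NUM_KEYS:
            raise ValueError(f"MIDI pitch {pitch} is outside the 88-key range")
        vec[index] = 1
    return vec


# One time step of an MA melody (E4) and of three sounding SATB voices (C3, G3, E4).
melody_step = pitch_class_vector([64])
chorale_step = keyboard_vector([48, 55, 64])
```

In this sketch a melody note and its octave transpositions share one 12-dimensional input, while the 88-dimensional vector keeps the register of every voice.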


A Bidirectional Recurrent Neural Network (BRNN) unfolded in time.
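To make the unfolded structure concrete, here is a minimal forward-pass sketch of a bidirectional network with plain tanh units, not the authors' implementation. The layer sizes, the random weights, the omission of bias terms and of an output nonlinearity, and all function and parameter names are assumptions chosen for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rnn_pass(xs, W_in, W_rec):
    """Run a tanh RNN over xs and return the hidden state at every step."""
    h = [0.0] * len(W_rec)
    states = []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]
        h = [math.tanh(v) for v in pre]
        states.append(h)
    return states


def brnn_forward(xs, params):
    """Two hidden layers read the same input; both feed the same output layer."""
    fwd = rnn_pass(xs, params["W_in_f"], params["W_rec_f"])
    # The backward layer reads the sequence reversed; re-reverse to align with time.
    bwd = rnn_pass(xs[::-1], params["W_in_b"], params["W_rec_b"])[::-1]
    outputs = []
    for h_f, h_b in zip(fwd, bwd):
        y_f = matvec(params["W_out_f"], h_f)
        y_b = matvec(params["W_out_b"], h_b)
        outputs.append([a + b for a, b in zip(y_f, y_b)])
    return outputs


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_in, n_hid, n_out = 12, 8, 5   # illustrative sizes only
params = {
    "W_in_f": random_matrix(n_hid, n_in, rng),
    "W_rec_f": random_matrix(n_hid, n_hid, rng),
    "W_in_b": random_matrix(n_hid, n_in, rng),
    "W_rec_b": random_matrix(n_hid, n_hid, rng),
    "W_out_f": random_matrix(n_out, n_hid, rng),
    "W_out_b": random_matrix(n_out, n_hid, rng),
}
xs = [[1.0 if i == t else 0.0 for i in range(n_in)] for t in range(6)]
ys = brnn_forward(xs, params)   # one output vector per time step
```

Because the backward layer has already seen the end of the sequence, the output at the first time step depends on later inputs as well as earlier ones, which a unidirectional RNN cannot offer.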

An LSTM block containing one linear cell (orange) and three nonlinear gating units (green).
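The block in the figure can be illustrated with a single-unit LSTM step. This sketch assumes the common formulation with input, forget, and output gates and no peephole connections; the poster only states that the block has one linear cell and three gating units, so the exact equations, the weight layout, and the names `lstm_step` and `W` are assumptions.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, W):
    """One time step of a single LSTM block with a scalar cell state.

    W maps each unit name to (weights, bias); weights apply to x + [h_prev].
    """
    z = list(x) + [h_prev]

    def affine(name):
        weights, bias = W[name]
        return sum(w * v for w, v in zip(weights, z)) + bias

    i = sigmoid(affine("input_gate"))    # how much new content enters the cell
    f = sigmoid(affine("forget_gate"))   # how much of the old cell state is kept
    o = sigmoid(affine("output_gate"))   # how much of the cell is exposed
    g = math.tanh(affine("candidate"))   # proposed new content
    c = f * c_prev + i * g               # linear cell: a gated, additive update
    h = o * math.tanh(c)
    return h, c


# Input gate closed, forget and output gates open: the cell simply carries its state.
zero_w = [0.0, 0.0]
W = {
    "input_gate": (zero_w, -100.0),
    "forget_gate": (zero_w, 100.0),
    "output_gate": (zero_w, 100.0),
    "candidate": ([1.0, 0.0], 0.0),
}
h, c = lstm_step([0.5], h_prev=0.0, c_prev=0.7, W=W)
```

With these hand-picked weights the cell state passes through the step unchanged, which is the property that lets an LSTM retain information over long spans.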

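[Figure: accuracy by model (BLSTM, LSTM, BRNN, RNN, MLP); accuracy axis from 0 to 0.8; bar labels shown: 54.66%, 66.58%, 68.86%, 67.57%, 71.13%]

[Figure: accuracy by missing voice (Soprano, Alto, Tenor, Bass) and model (BLSTM, LSTM, BRNN, RNN, MLP); accuracy axis from 0 to 0.8]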

• 2-Part MA Dataset: Nottingham Folk Tunes, 962 double-track MIDI files

• 4-Part SATB Dataset: J.S. Bach Chorales, 378 multi-track MIDI files

[Figure: two score examples, each comparing the original part with the predicted part]