
References

1. Moray Allan and Christopher K. I. Williams. Harmonising chorales by probabilistic inference. Advances in Neural Information Processing Systems, 17:25–32, 2005.

2. Giuseppe Amato, Malte Behrmann, Frédéric Bimbot, Baptiste Caramiaux, Fabrizio Falchi, Ander Garcia, Joost Geurts, Jaume Gibert, Guillaume Gravier, Hadmut Holken, Hartmut Koenitz, Sylvain Lefebvre, Antoine Liutkus, Fabien Lotte, Andrew Perkis, Rafael Redondo, Enrico Turrin, Thierry Vieville, and Emmanuel Vincent. AI in the media and creative industries, May 2019. arXiv:1905.04175v1.

3. Gérard Assayag, Camilo Rueda, Mikael Laurson, Carlos Agon, and Olivier Delerue. Computer assisted composition at IRCAM: From PatchWork to OpenMusic. Computer Music Journal (CMJ), 23(3):59–72, September 1999.

4. Lei Jimmy Ba and Rich Caruana. Do deep nets really need to be deep?, October 2014. arXiv:1312.6184v7.

5. Johann Sebastian Bach. 389 Chorales (Choral-Gesänge). Alfred Publishing Company, 1985.

6. David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. How to explain individual classification decisions. Journal of Machine Learning Research (JMLR), (11):1803–1831, June 2010.

7. Gabriele Barbieri, François Pachet, Pierre Roy, and Mirko Degli Esposti. Markov constraints for generating lyrics with style. In Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), pages 115–120, Montpellier, France, August 2012.

8. Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(8):1798–1828, August 2013.

9. Piotr Bojanowski, Armand Joulin, David Lopez-Paz, and Arthur Szlam. Optimizing the latent space of generative networks, July 2017. arXiv:1707.05776v1.

10. Diane Bouchacourt, Emily Denton, Tejas Kulkarni, Honglak Lee, Siddharth Narayanaswamy, David Pfau, and Josh Tenenbaum (Eds.). NIPS 2017 Workshop on Learning Disentangled Representations: from Perception to Control, December 2017. https://sites.google.com/view/disentanglenips2017.

11. Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 1159–1166, Edinburgh, Scotland, U.K., 2012.

12. Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Chapter 14 – Modeling and generating sequences of polyphonic music with the RNN-RBM. In Deep Learning Tutorial – Release 0.1, pages 149–158. LISA lab, University of Montréal, September 2015. http://deeplearning.net/tutorial/deeplearning.pdf.

13. Mason Bretan, Gil Weinberg, and Larry Heck. A unit selection methodology for music generation using deep neural networks. In Ashok Goel, Anna Jordanous, and Alison Pease, editors, Proceedings of the 8th International Conference on Computational Creativity (ICCC 2017), pages 72–79, Atlanta, GA, USA, June 2017.

14. Jean-Pierre Briot, Gaëtan Hadjeres, and François-David Pachet. Deep learning techniques for music generation – A survey, September 2017. arXiv:1709.01620.

15. Jean-Pierre Briot and François Pachet. Music generation by deep learning – Challenges and directions. Neural Computing and Applications (NCAA), October 2018. Special Issue on Deep Learning for Music and Audio.

16. Shan Carter, Zan Armstrong, Ludwig Schubert, Ian Johnson, and Chris Olah. Activation atlas. Distill, March 2019. https://distill.pub/2019/activation-atlas.

17. Davide Castelvecchi. The black box of AI. Nature, 538:20–23, October 2016.

18. E. Colin Cherry. Some experiments on the recognition of speech, with one and two ears. The Journal of the Acoustical Society of America, 25(5):975–979, September 1953.

19. Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN Encoder-Decoder for statistical machine translation, September 2014. arXiv:1406.1078v3.

20. Keunwoo Choi, György Fazekas, Kyunghyun Cho, and Mark Sandler. A tutorial on deep learning for music information retrieval, September 2017. arXiv:1709.04396v1.

21. Keunwoo Choi, György Fazekas, and Mark Sandler. Text-based LSTM networks for automatic music composition. In 1st Conference on Computer Simulation of Musical Creativity (CSMC 16), Huddersfield, U.K., June 2016.

22. François Chollet. Building autoencoders in Keras, May 2016. https://blog.keras.io/building-autoencoders-in-keras.html.

23. Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. The loss surfaces of multilayer networks, January 2015. arXiv:1412.0233v3.

24. Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling, December 2014. arXiv:1412.3555v1.

25. Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-Yi Lee, and Lin-Shan Lee. Audio Word2Vec: Unsupervised learning of audio segment representations using sequence-to-sequence autoencoder, June 2016. arXiv:1603.00982v4.

26. David Cope. The Algorithmic Composer. A-R Editions, 2000.

27. David Cope. Computer Models of Musical Creativity. MIT Press, 2005.

28. Fabrizio Costa, Thomas Gärtner, Andrea Passerini, and François Pachet. Constructive Machine Learning – Workshop Proceedings, December 2016. http://www.cs.nott.ac.uk/~psztg/cml/2016/.

29. Márcio Dahia, Hugo Santana, Ernesto Trajano, Carlos Sandroni, and Geber Ramalho. Generating rhythmic accompaniment for guitar: the Cyber-João case study. In Proceedings of the IX Brazilian Symposium on Computer Music (SBCM 2003), pages 7–13, Campinas, SP, Brazil, August 2003.

30. Shuqi Dai, Zheng Zhang, and Gus Guangyu Xia. Music style transfer issues: A position paper, March 2018. arXiv:1803.06841v1.

31. Ernesto Trajano de Lima and Geber Ramalho. On rhythmic pattern extraction in bossa nova music. In Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR 2008), pages 641–646, Philadelphia, PA, USA, September 2008. ISMIR.

32. Roger T. Dean and Alex McLean, editors. The Oxford Handbook of Algorithmic Music. Oxford Handbooks. Oxford University Press, 2018.

33. Jean-Marc Deltorn. Deep creations: Intellectual property and the automata. Frontiers in Digital Humanities, 4, February 2017. Article 3.

34. Misha Denil, Loris Bazzani, Hugo Larochelle, and Nando de Freitas. Learning where to attend with deep architectures for image tracking, September 2011. arXiv:1109.3737v1.

35. Guillaume Desjardins, Aaron Courville, and Yoshua Bengio. Disentangling factors of variation via generative entangling, October 2012. arXiv:1210.5474v1.

36. Rob DiPietro. A friendly introduction to cross-entropy loss, 02/05/2016. https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/.


37. Carl Doersch. Tutorial on variational autoencoders, August 2016. arXiv:1606.05908v2.

38. Pedro Domingos. A few useful things to know about machine learning. Communications of the ACM (CACM), 55(10):78–87, October 2012.

39. Kenji Doya and Eiji Uchibe. The Cyber Rodent project: Exploration of adaptive mechanisms for self-preservation and self-reproduction. Adaptive Behavior, 13(2):149–160, 2005.

40. Shlomo Dubnov and Greg Surges. Chapter 6 – Delegating creativity: Use of musical algorithms in machine listening and composition. In Newton Lee, editor, Digital Da Vinci – Computers in Music, pages 127–158. Springer-Verlag, 2014.

41. Kemal Ebcioğlu. An expert system for harmonizing four-part chorales. Computer Music Journal (CMJ), 12(3):43–51, Autumn 1988.

42. Douglas Eck and Jürgen Schmidhuber. A first look at music composition using LSTM recurrent neural networks. Technical report, IDSIA/USI-SUPSI, Manno, Switzerland, 2002. No. IDSIA-07-02.

43. Ronen Eldan and Ohad Shamir. The power of depth for feedforward neural networks, May 2016. arXiv:1512.03965v4.

44. Ahmed Elgammal, Bingchen Liu, Mohamed Elhoseiny, and Marian Mazzone. CAN: Creative adversarial networks generating “art” by learning about styles and deviating from style norms, June 2017. arXiv:1706.07068v1.

45. Emerging Technology from the arXiv. Deep learning machine solves the cocktail party problem. MIT Technology Review, April 2015. https://www.technologyreview.com/s/537101/deep-learning-machine-solves-the-cocktail-party-problem/.

46. Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, and Pascal Vincent. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research (JMLR), (11):625–660, 2010.

47. Douglas Eck et al. Magenta Project, Accessed on 20/06/2017. https://magenta.tensorflow.org.

48. François Pachet et al. Flow Machines – Artificial Intelligence for the future of music, 2012. http://www.flow-machines.com.

49. Otto Fabius and Joost R. van Amersfoort. Variational recurrent auto-encoders, June 2015. arXiv:1412.6581v6.

50. José David Fernández and Francisco Vico. AI methods in algorithmic composition: A comprehensive survey. Journal of Artificial Intelligence Research (JAIR), (48):513–582, 2013.

51. Rebecca Fiebrink and Baptiste Caramiaux. The machine learning algorithm as creative musical tool, November 2016. arXiv:1611.00379v1.

52. Davis Foote, Daylen Yang, and Mostafa Rohaninejad. Audio style transfer – Do androids dream of electric beats?, December 2016. https://audiostyletransfer.wordpress.com.

53. Eric Foxley. Nottingham Database, Accessed on 12/03/2018. https://ifdo.ca/~seymour/nottingham/nottingham.html.

54. Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Professional Computing Series. Addison-Wesley, 1995.

55. Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. A neural algorithm of artistic style, September 2015. arXiv:1508.06576v2.

56. Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2414–2423. IEEE, June 2016.

57. Robert Gauldin. A Practical Approach to Eighteenth-Century Counterpoint. Waveland Press, 1988.

58. Michael Genesereth and Yngvi Björnsson. The international general game playing competition. AI Magazine, pages 107–111, Summer 2013.

59. Felix A. Gers and Jürgen Schmidhuber. Recurrent nets that time and count. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, volume 3, pages 189–194. IEEE, 2000.


60. Kratarth Goel, Raunaq Vohra, and J. K. Sahoo. Polyphonic music generation by modeling temporal dependencies using a RNN-DBN. In Proceedings of the International Conference on Artificial Neural Networks, number 8681 in Theoretical Computer Science and General Issues, pages 217–224. Springer International Publishing, 2014.

61. Michael Good. MusicXML for notation and analysis. In Walter B. Hewlett and Eleanor Selfridge-Field, editors, The Virtual Score: Representation, Retrieval, Restoration, pages 113–124. MIT Press, 2001.

62. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

63. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets, June 2014. arXiv:1406.2661v1.

64. Alex Graves. Generating sequences with recurrent neural networks, June 2014. arXiv:1308.0850v5.

65. Alex Graves, Greg Wayne, and Ivo Danihelka. Neural Turing machines, December 2014. arXiv:1410.5401v2.

66. Gaëtan Hadjeres. Interactive Deep Generative Models for Symbolic Music. PhD thesis, École Doctorale EDITE, Sorbonne Université, Paris, France, June 2018.

67. Gaëtan Hadjeres and Frank Nielsen. Interactive music generation with positional constraints using Anticipation-RNN, September 2017. arXiv:1709.06404v1.

68. Gaëtan Hadjeres, Frank Nielsen, and François Pachet. GLSR-VAE: Geodesic latent space regularization for variational autoencoder architectures, July 2017. arXiv:1707.04588v1.

69. Gaëtan Hadjeres, François Pachet, and Frank Nielsen. DeepBach: a steerable model for Bach chorales generation, June 2017. arXiv:1612.01010v2.

70. Jeff Hao. Hao staff piano roll sheet music, Accessed on 19/03/2017. http://haostaff.com/store/index.php?main_page=article.

71. Dominik Hörnel. ChordNet: Learning and producing voice leading with neural networks and dynamic programming. Journal of New Music Research (JNMR), 33(4):387–397, 2004.

72. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer-Verlag, 2009.

73. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, December 2015. arXiv:1512.03385v1.

74. Dorien Herremans and Ching-Hua Chuan. Deep Learning for Music – Workshop Proceedings, May 2017. http://dorienherremans.com/dlm2017/.

75. Dorien Herremans and Ching-Hua Chuan. The emergence of deep learning: new opportunities for music and audio technologies. Neural Computing and Applications (NCAA), April 2019. Special Issue on Deep Learning for Music and Audio.

76. Walter Hewlett, Frances Bennion, Edmund Correia, and Steve Rasmussen. MuseData – an electronic library of classical music scores, Accessed on 12/03/2018. http://www.musedata.org.

77. Lejaren A. Hiller and Leonard M. Isaacson. Experimental Music: Composition with an Electronic Computer. McGraw-Hill, 1959.

78. Geoffrey E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, August 2002.

79. Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, July 2006.

80. Geoffrey E. Hinton and Ruslan R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

81. Geoffrey E. Hinton and Terrence J. Sejnowski. Learning and relearning in Boltzmann machines. In David E. Rumelhart, James L. McClelland, and PDP Research Group, editors, Parallel Distributed Processing – Explorations in the Microstructure of Cognition: Volume 1: Foundations, pages 282–317. MIT Press, Cambridge, MA, USA, 1986.

82. Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.


83. Douglas Hofstadter. Staring Emmy straight in the eye – and doing my best not to flinch. In David Cope, editor, Virtual Music – Computer Synthesis of Musical Style, pages 33–82. MIT Press, 2001.

84. Hooktheory. Theorytabs, Accessed on 26/07/2017. https://www.hooktheory.com/theorytab.

85. Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991.

86. Allen Huang and Raymond Wu. Deep learning for music, June 2016. arXiv:1606.04930v1.

87. Cheng-Zhi Anna Huang, David Duvenaud, and Krzysztof Z. Gajos. ChordRipple: Recommending chords to help novice composers go beyond the ordinary. In Proceedings of the 21st International Conference on Intelligent User Interfaces (IUI 16), pages 241–250, Sonoma, CA, USA, March 2016. ACM.

88. Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, and Douglas Eck. Music Transformer: Generating music with long-term structure, December 2018. arXiv:1809.04281v3.

89. Eric J. Humphrey, Juan P. Bello, and Yann LeCun. Feature learning and deep architectures: New directions for music informatics. Journal of Intelligent Information Systems (JIIS), 41(3):461–481, 2013.

90. Patrick Hutchings and Jon McCormack. Using autonomous agents to improvise music compositions in real-time. In João Correia, Vic Ciesielski, and Antonios Liapis, editors, Computational Intelligence in Music, Sound, Art and Design – 6th International Conference, EvoMUSART 2017, Amsterdam, The Netherlands, April 19–21, 2017, Proceedings, number 10198 in Theoretical Computer Science and General Issues, pages 114–127. Springer International Publishing, 2017.

91. Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift, March 2015. http://arxiv.org/abs/1502.03167v3.

92. Natasha Jaques, Shixiang Gu, Richard E. Turner, and Douglas Eck. Tuning recurrent neural networks with reinforcement learning, November 2016. arXiv:1611.02796.

93. Daniel Johnson. Composing music with recurrent neural networks, August 2015. http://www.hexahedria.com/2015/08/03/composing-music-with-recurrent-neural-networks/.

94. Daniel D. Johnson. Generating polyphonic music using tied parallel networks. In João Correia, Vic Ciesielski, and Antonios Liapis, editors, Computational Intelligence in Music, Sound, Art and Design – 6th International Conference, EvoMUSART 2017, Amsterdam, The Netherlands, April 19–21, 2017, Proceedings, number 10198 in Theoretical Computer Science and General Issues, pages 128–143. Springer International Publishing, 2017.

95. Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research (JAIR), (4):237–285, 1996.

96. Ujjwal Karn. An intuitive explanation of convolutional neural networks, August 2016. https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/.

97. Andrej Karpathy. The unreasonable effectiveness of recurrent neural networks, May 2015. http://karpathy.github.io/2015/05/21/rnn-effectiveness/.

98. Jeremy Keith. The Session, Accessed on 21/12/2016. https://thesession.org.

99. Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim. The (un)reliability of saliency methods, November 2017. arXiv:1711.00867v1.

100. Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, and Sven Dähne. Learning how to explain neural networks: PatternNet and PatternAttribution, 2017. arXiv:1705.05598v2.

101. Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes, May 2014. arXiv:1312.6114v10.

102. Jan Koutník, Klaus Greff, Faustino Gomez, and Jürgen Schmidhuber. A Clockwork RNN, December 2014. arXiv:1402.3511v1.

103. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, volume 1 of NIPS 2012, pages 1097–1105, Lake Tahoe, NV, USA, 2012. Curran Associates Inc.


104. Bernd Krueger. Classical Piano Midi Page, Accessed on 12/03/2018. http://piano-midi.de/.

105. Andrey Kurenkov. A ‘brief’ history of neural nets and deep learning, Part 4, December 2015. http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning-part-4/.

106. Patrick Lam. MCMC methods: Gibbs sampling and the Metropolis-Hastings algorithm, Accessed on 21/12/2016. http://pareto.uab.es/mcreel/IDEA2017/Bayesian/MCMC/mcmc.pdf.

107. Kevin J. Lang, Alex H. Waibel, and Geoffrey E. Hinton. A time-delay neural network architecture for isolated word recognition. Neural Networks, 3(1):23–43, 1990.

108. Stefan Lattner, Maarten Grachten, and Gerhard Widmer. Imposing higher-level structure in polyphonic music generation using convolutional restricted Boltzmann machines and constraints. Journal of Creative Music Systems (JCMS), 2(2), March 2018.

109. Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, and Andrew Y. Ng. Building high-level features using large scale unsupervised learning. In 29th International Conference on Machine Learning, Edinburgh, U.K., 2012.

110. Yann LeCun and Yoshua Bengio. Convolutional networks for images, speech, and time-series. In Michael A. Arbib, editor, The handbook of brain theory and neural networks, pages 255–258. MIT Press, Cambridge, MA, USA, 1998.

111. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998.

112. Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. The MNIST database of handwritten digits, 1998. http://yann.lecun.com/exdb/mnist/.

113. Yann LeCun, John S. Denker, and Sara A. Solla. Optimal brain damage. In David S. Touretzky, editor, Advances in Neural Information Processing Systems 2, pages 598–605. Morgan Kaufmann, 1990.

114. Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), pages 609–616, Montréal, QC, Canada, June 2009. ACM.

115. André Lemme, René Felix Reinhart, and Jochen Jakob Steil. Online learning and generalization of parts-based image representations by non-negative sparse autoencoders. Neural Networks, 33:194–203, September 2012.

116. Fei-Fei Li, Andrej Karpathy, and Justin Johnson. Convolutional neural networks (CNNs / ConvNets) – CS231n Convolutional neural networks for visual recognition Lecture Notes, Winter 2016. http://cs231n.github.io/convolutional-networks/#conv.

117. Feynman Liang. BachBot, 2016. https://github.com/feynmanliang/bachbot.

118. Feynman Liang. BachBot: Automatic composition in the style of Bach chorales – Developing, analyzing, and evaluating a deep LSTM model for musical style. Master’s thesis, University of Cambridge, Cambridge, U.K., August 2016. M.Phil in Machine Learning, Speech, and Language Technology.

119. Hyungui Lim, Seungyeon Ryu, and Kyogu Lee. Chord generation from symbolic melody using BLSTM networks. In Xiao Hu, Sally Jo Cunningham, Doug Turnbull, and Zhiyao Duan, editors, Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), pages 621–627, Suzhou, China, October 2017. ISMIR.

120. Qi Lyu, Zhiyong Wu, Jun Zhu, and Helen Meng. Modelling high-dimensional sequences with LSTM-RTRBM: Application to polyphonic music generation. In Proceedings of the 24th International Conference on Artificial Intelligence, pages 4138–4139. AAAI Press, 2015.

121. Sephora Madjiheurem, Lizhen Qu, and Christian Walder. Chord2Vec: Learning musical chord embeddings. In Proceedings of the Constructive Machine Learning Workshop at 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, December 2016.

122. Dimos Makris, Maximos Kaliakatsos-Papakostas, Ioannis Karydis, and Katia Lida Kermanidis. Combining LSTM and feed forward neural networks for conditional rhythm composition. In Giacomo Boracchi, Lazaros Iliadis, Chrisina Jayne, and Aristidis Likas, editors, Engineering Applications of Neural Networks: 18th International Conference, EANN 2017, Athens, Greece, August 25–27, 2017, Proceedings, Communications in Computer and Information Science, pages 570–582. Springer International Publishing, 2017.

123. Iman Malik and Carl Henrik Ek. Neural translation of musical style, August 2017. arXiv:1708.03535v1.

124. Stéphane Mallat. GANs vs VAEs, September 2018. Personal communication.

125. Rachel Manzelli, Vijay Thakkar, Ali Siahkamari, and Brian Kulis. Conditioning deep generative raw audio models for structured automatic music. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR 2018), pages 182–189, Paris, France, September 2018. ISMIR.

126. Huanru Henry Mao, Taylor Shin, and Garrison W. Cottrell. DeepJ: Style-specific music generation, January 2018. arXiv:1801.00887v1.

127. John A. Maurer. A brief history of algorithmic composition, March 1999. https://ccrma.stanford.edu/~blackrse/algorithm.html.

128. Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, and Yoshua Bengio. SampleRNN: An unconditional end-to-end neural audio generation model, February 2017. arXiv:1612.07837v2.

129. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, September 2013. arXiv:1301.3781v3.

130. Marvin Minsky and Seymour Papert. Perceptrons: An Introduction to Computational Geometry. MIT Press, 1969.

131. Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.

132. MIDI Manufacturers Association (MMA). MIDI Specifications, Accessed on 14/04/2017. https://www.midi.org/specifications.

133. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning, December 2013. arXiv:1312.5602v1.

134. Olof Mogren. C-RNN-GAN: Continuous recurrent neural networks with adversarial training, November 2016. arXiv:1611.09904v1.

135. Grégoire Montavon, Sebastian Lapuschkin, Alexander Binder, Wojciech Samek, and Klaus-Robert Müller. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition, (65):211–222, 2017.

136. Alexander Mordvintsev, Christopher Olah, and Mike Tyka. Deep Dream, 2015. https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html.

137. Dan Morris, Ian Simon, and Sumit Basu. Exposing parameters of a trained dynamic model for interactive music creation. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI 2008), pages 784–791, Chicago, IL, USA, July 2008. AAAI Press.

138. Michael C. Mozer. Neural network composition by prediction: Exploring the benefits of psychophysical constraints and multiscale processing. Connection Science, 6(2–3):247–280, 1994.

139. Kevin P. Murphy. Machine Learning: a Probabilistic Perspective. MIT Press, 2012.

140. Andrew Ng. Sparse autoencoder – CS294A/CS294W Lecture notes – Deep Learning and Unsupervised Feature Learning Course, Winter 2011. https://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf.

141. Andrew Ng. CS229 Lecture notes – Machine Learning Course – Part I Linear Regression, Autumn 2016. http://cs229.stanford.edu/notes/cs229-notes1.pdf.

142. Andrew Ng. CS229 Lecture notes – Machine Learning Course – Part IV Generative Learning algorithms, Autumn 2016. http://cs229.stanford.edu/notes/cs229-notes2.pdf.

143. Gerhard Nierhaus. Algorithmic Composition: Paradigms of Automated Music Generation. Springer-Verlag, 2009.

144. Martin J. Osborne and Ariel Rubinstein. A Course in Game Theory. MIT Press, July 1994.

145. François Pachet. Beyond the cybernetic jam fantasy: The Continuator. IEEE Computer Graphics and Applications (CG&A), 4(1):31–35, January/February 2004. Special issue on Emerging Technologies.


146. François Pachet, Jeff Suzda, and Daniel Martín. A comprehensive online database of machine-readable leadsheets for Jazz standards. In Alceu de Souza Britto Junior, Fabien Gouyon, and Simon Dixon, editors, Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR 2013), pages 275–280, Curitiba, PA, Brazil, November 2013. ISMIR.

147. François Pachet, Alexandre Papadopoulos, and Pierre Roy. Sampling variations of sequences for structured music generation. In Xiao Hu, Sally Jo Cunningham, Doug Turnbull, and Zhiyao Duan, editors, Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), pages 167–173, Suzhou, China, October 2017. ISMIR.

148. François Pachet and Pierre Roy. Markov constraints: Steerable generation of Markov sequences. Constraints, 16(2):148–172, 2011.

149. François Pachet and Pierre Roy. Imitative leadsheet generation with user constraints. In Torsten Schaub, Gerhard Friedrich, and Barry O’Sullivan, editors, ECAI 2014 – Proceedings of the 21st European Conference on Artificial Intelligence, Frontiers in Artificial Intelligence and Applications, pages 1077–1078. IOS Press, 2014.

150. François Pachet, Pierre Roy, and Gabriele Barbieri. Finite-length Markov processes with constraints. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011), pages 635–642, Barcelona, Spain, July 2011.

151. Alexandre Papadopoulos, François Pachet, and Pierre Roy. Generating non-plagiaristic Markov sequences with max order sampling. In Mirko Degli Esposti, Eduardo G. Altmann, and François Pachet, editors, Creativity and Universality in Language, Lecture Notes in Morphogenesis. Springer International Publishing, 2016.

152. Alexandre Papadopoulos, Pierre Roy, and François Pachet. Assisted lead sheet composition using FlowComposer. In Michel Rueher, editor, Principles and Practice of Constraint Programming: 22nd International Conference, CP 2016, Toulouse, France, September 5-9, 2016, Proceedings, Programming and Software Engineering, pages 769–785. Springer International Publishing, 2016.

153. Aniruddha Parvat, Jai Chavan, and Siddhesh Kadam. A survey of deep-learning frameworks. In Proceedings of the International Conference on Inventive Systems and Control (ICISC 2017), Coimbatore, India, January 2017.

154. Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. DeepXplore: Automated whitebox testing of deep learning systems, September 2017. arXiv:1705.06640v4.

155. Frank Preiswerk. Shannon entropy in the context of machine learning and AI, 04/01/2018. https://medium.com/swlh/shannon-entropy-in-the-context-of-machine-learning-and-ai-24aee2709e32.

156. Mathieu Ramona, Giordano Cabral, and François Pachet. Capturing a musician’s groove: Generation of realistic accompaniments from single song recordings. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI 2015) – Demos Track, pages 4140–4141, Buenos Aires, Argentina, July 2015. AAAI Press / IJCAI.

157. Bharath Ramsundar and Reza Bosagh Zadeh. TensorFlow for Deep Learning. O’Reilly Media, March 2018.

158. Dario Ringach and Robert Shapley. Reverse correlation in neurophysiology. Cognitive Science, 28:147–166, 2004.

159. Curtis Roads. The Computer Music Tutorial. MIT Press, 1996.

160. Adam Roberts. MusicVAE supplementary materials, Accessed on 27/04/2018. g.co/magenta/musicvae-samples.

161. Adam Roberts, Jesse Engel, Colin Raffel, Curtis Hawthorne, and Douglas Eck. A hierarchical latent vector model for learning long-term structure in music, June 2018. arXiv:1803.05428v2.

162. Adam Roberts, Jesse Engel, Colin Raffel, Curtis Hawthorne, and Douglas Eck. A hierarchical latent vector model for learning long-term structure in music. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018). ACM, Montreal, QC, Canada, July 2018.


163. Adam Roberts, Jesse Engel, Colin Raffel, Ian Simon, and Curtis Hawthorne. MusicVAE: Creating a palette for musical scores with machine learning, March 2018. https://magenta.tensorflow.org/music-vae.

164. Stacey Ronaghan. Deep learning: Which loss and activation functions should I use?, 26/07/2018. https://towardsdatascience.com/deep-learning-which-loss-and-activation-functions-should-i-use-ac02f1c56aa8.

165. Frank Rosenblatt. The Perceptron – A perceiving and recognizing automaton. Technical report, Cornell Aeronautical Laboratory, Ithaca, NY, USA, 1957. Report 85-460-1.

166. David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, October 1986.

167. Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.

168. Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs, June 2016. arXiv:1606.03498v1.

169. Andy M. Sarroff and Michael Casey. Musical audio synthesis using autoencoding neural nets, 2014. http://www.cs.dartmouth.edu/~sarroff/papers/sarroff2014a.pdf.

170. Mike Schuster and Kuldip K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, (11):2673–2681, 1997.

171. Mary Shaw and David Garlan. Software Architecture: Perspectives on an Emerging Discipline. Prentice Hall, 1996.

172. Roger N. Shepard. Geometric approximations to the structure of musical pitch. Psychological Review, (89):305–333, 1982.

173. Ian Simon and Sageev Oore. Performance RNN: Generating music with expressive timing and dynamics, 29/06/2017. https://magenta.tensorflow.org/performance-rnn.

174. Ian Simon, Adam Roberts, Colin Raffel, Jesse Engel, Curtis Hawthorne, and Douglas Eck. Learning a latent space of multitrack measures, June 2018. arXiv:1806.00195v1.

175. Spotify for Artists. Innovating for writers and artists, Accessed on 06/09/2017. https://artists.spotify.com/blog/innovating-for-writers-and-artists.

176. Mark Steedman. A generative grammar for Jazz chord sequences. Music Perception, 2(1):52–77, 1984.

177. Bob L. Sturm and João Felipe Santos. The endless traditional music session, Accessed on 21/12/2016. http://www.eecs.qmul.ac.uk/~sturm/research/RNNIrishTrad/index.html.

178. Bob L. Sturm, João Felipe Santos, Oded Ben-Tal, and Iryna Korshunova. Music transcription modelling and composition using deep learning. In Proceedings of the 1st Conference on Computer Simulation of Musical Creativity (CSMC 16), Huddersfield, U.K., April 2016.

179. Felix Sun. DeepHear – Composing and harmonizing music with neural networks, Accessed on 21/12/2017. https://fephsun.github.io/2015/09/01/neural-music.html.

180. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 3104–3112. Curran Associates, Inc., 2014.

181. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions, September 2014. arXiv:1409.4842v1.

182. Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks, February 2014. arXiv:1312.6199v4.

183. Li Tao. Facial recognition snares China’s air con queen Dong Mingzhu for jaywalking, but it’s not what it seems. South China Morning Post, November 2018. https://www.scmp.com/tech/innovation/article/2174564/facial-recognition-catches-chinas-air-con-queen-dong-mingzhu.

184. David Temperley. The Cognition of Basic Musical Structures. MIT Press, Cambridge, MA, USA, 2011.


185. The International Association for Computational Creativity. International Conferences on Computational Creativity (ICCC), Accessed on 17/05/2018. http://computationalcreativity.net/home/conferences/.

186. Lucas Theis, Aaron van den Oord, and Matthias Bethge. A note on the evaluation of generative models, 2015. arXiv:1511.01844.

187. John Thickstun, Zaid Harchaoui, and Sham Kakade. Learning features of music from scratch, December 2016. arXiv:1611.09827.

188. Alexey Tikhonov and Ivan P. Yamshchikov. Music generation with variational recurrent autoencoder supported by history, July 2017. arXiv:1705.05458v2.

189. Peter M. Todd. A connectionist approach to algorithmic composition. Computer Music Journal (CMJ), 13(4):27–43, 1989.

190. A. M. Turing. Computing machinery and intelligence. Mind, LIX(236):433–460, October 1950.

191. Dmitry Ulyanov and Vadim Lebedev. Audio texture synthesis and style transfer, December 2016. https://dmitryulyanov.github.io/audio-texture-synthesis-and-style-transfer/.

192. Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Özlem Aslan, Shengjie Wang, Abdelrahman Mohamed, Matthai Philipose, Matt Richardson, and Rich Caruana. Do deep convolutional nets really need to be deep (or even convolutional)?, May 2016. arXiv:1603.05691v2.

193. Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio, December 2016. arXiv:1609.03499v2.

194. Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, and Koray Kavukcuoglu. Conditional image generation with PixelCNN decoders, June 2016. arXiv:1606.05328v2.

195. Hado van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double Q-learning, December 2015. arXiv:1509.06461v3.

196. Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Statistics for Engineering and Information Science. Springer-Verlag, 1995.

197. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, December 2017. arXiv:1706.03762v5.

198. Karel Veselý, Arnab Ghoshal, Lukáš Burget, and Daniel Povey. Sequence-discriminative training of deep neural networks. In Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), pages 2345–2349, Lyon, France, August 2013. ISCA.

199. Christian Walder. Modelling symbolic music: Beyond the piano roll, June 2016. arXiv:1606.01368.

200. Christian Walder. Symbolic Music Data Version 1.0, June 2016. arXiv:1606.02542.

201. Chris Walshaw. ABC notation home page, Accessed on 21/12/2016. http://abcnotation.com.

202. Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3):279–292, 1992.

203. Raymond P. Whorley and Darrell Conklin. Music generation from statistical models of harmony. Journal of New Music Research (JNMR), 45(2):160–183, 2016.

204. WikiArt.org. WikiArt – Visual Art Encyclopedia, Accessed on 22/08/2017. https://www.wikiart.org.

205. Cody Marie Wild. What a disentangled net we weave: Representation learning in VAEs (Pt. 1), July 2018. https://towardsdatascience.com/what-a-disentangled-net-we-weave-representation-learning-in-vaes-pt-1-9e5dbc205bd1.

206. Michael Wooldridge. An Introduction to MultiAgent Systems. John Wiley & Sons, 2009.

207. Lonce Wyse. Audio spectrogram representations for processing with convolutional neural networks. In Proceedings of the 1st International Workshop on Deep Learning for Music, pages 37–41, Anchorage, AK, USA, May 2017.

208. Iannis Xenakis. Formalized Music: Thought and Mathematics in Composition. Indiana University Press, 1963.


209. Yamaha. e-Piano Junior Competition, Accessed on 19/03/2018. http://www.piano-e-competition.com/.

210. Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. Attribute2Image: Conditional image generation from visual attributes, October 2016. arXiv:1512.00570v2.

211. Li-Chia Yang, Szu-Yu Chou, and Yi-Hsuan Yang. MidiNet: A convolutional generative adversarial network for symbolic-domain music generation. In Xiao Hu, Sally Jo Cunningham, Doug Turnbull, and Zhiyao Duan, editors, Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), pages 324–331, Suzhou, China, October 2017. ISMIR.

212. Julian Georg Zilly, Rupesh Kumar Srivastava, Jan Koutník, and Jürgen Schmidhuber. Recurrent highway networks, July 2017. arXiv:1607.03474v5.


Glossary

ABC notation A text-based musical notation for folk and traditional music.

Accompaniment The musical part which provides the rhythmic and/or harmonic support for the melody or main themes of a song or instrumental piece. There are many different styles and types of accompaniment in different genres and styles of music. Examples are: a harmony accompaniment through a progression (sequence) of chords to be played by a polyphonic instrument such as a piano or guitar, and a counterpoint accompaniment through a sequence of melodic voices to be played by human voices or by instruments.

Activation function The function associated with a neural network layer. In the case of hidden layers, its purpose is to add nonlinearity. Standard examples are sigmoid, tanh and ReLU. In the case of the output layer, its purpose is to organize the result in order to be able to interpret it. Examples of output layer activation functions are: softmax for computing associated probabilities in the case of a categorical classification task with a single label to be selected, and identity in the case of a prediction task.
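As an illustration (a sketch, not from the book), the standard activation functions can be written in a few lines of NumPy; the input values below are arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max())     # shifted for numerical stability
    return e / e.sum()

x = np.array([-1.0, 0.0, 2.0])
print(sigmoid(x), np.tanh(x), relu(x))
print(softmax(x), softmax(x).sum())   # probabilities summing to 1
```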

Algorithmic composition The use of algorithms and computers to generate music compositions (symbolic form) or music pieces (audio form). Examples of models and algorithms are: grammars, rules, stochastic processes (e.g., Markov chains), evolutionary methods and artificial neural networks.

Architecture An (artificial neural network) architecture is the structure of the organization of computational units (neurons), usually grouped in layers, and their weighted connexions. Examples of types of architecture are: feedforward (aka multilayer Perceptron), recurrent (RNN), autoencoder and generative adversarial networks. Architectures process representations (in our case, of musical content) which have been encoded.

Artificial neural network A family of bio-inspired machine learning algorithms whose model is based on weighted connexions between computing units (neurons). Weights are incrementally adjusted during the training phase in order for the model to fit the data (examples).

Attention mechanism A mechanism inspired by the human visual system which focuses at each time step on some specific elements of the input sequence. This is modeled by weighted connexions onto the sequence elements (or onto the sequence of hidden units) which are subject to be learned.
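A minimal sketch of the mechanism (illustrative only; in a real architecture the alignment scores come from learned weighted connexions): each sequence element receives a softmax weight according to its relevance to a query, and the output is the weighted sum of the sequence:

```python
import numpy as np

def attend(query, keys, values):
    scores = keys @ query                  # one alignment score per element
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax -> attention weights
    return weights @ values                # weighted sum over the sequence

keys = values = np.eye(3)                  # a toy sequence of 3 elements
print(attend(np.array([5.0, 0.0, 0.0]), keys, values))  # mostly element 0
```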

Autoencoder A specific case of artificial neural network architecture with an output layer mirroring the input layer and with a hidden layer. Autoencoders are good at extracting features.

Backpropagation A shorthand for “backpropagation of errors”, it is the algorithm used to compute the gradients (partial derivatives with respect to each weight parameter and to the bias) of the cost function. Gradients will be used to guide the minimization of the cost function in order to fit the data.

Bag-of-words (BOW) It consists in transforming the original text (or arbitrary representation) into a vocabulary composed of all occurring tokens (items). Then, various measures can be used to characterize the text, the most common being term frequency, i.e. the number of times a term appears in the text. It is mainly used for feature extraction, e.g., to characterize and compare texts.
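For example, a term-frequency bag-of-words can be computed directly with Python's Counter (an illustrative sketch):

```python
from collections import Counter

def bag_of_words(text):
    # Term frequency: token -> number of occurrences in the text.
    return Counter(text.lower().split())

print(bag_of_words("the cat sat on the mat"))
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
```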

Bias The b offset term of a simple linear regression model h(x) = b + θx and, by extension, of a neural network layer.

Bias node The node of a neural network layer corresponding to a bias. Its constant value is 1 and is usually notated as +1.

Binning A technique to discretize a continuous interval or to reduce the dimensionality of a discrete interval. It consists of dividing the original domain of values into smaller intervals and replacing each bin (and the values within it) by a representative value, often the central value.
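An illustrative NumPy sketch, assuming four equal bins over [0, 1] and the central value as the representative of each bin:

```python
import numpy as np

values = np.array([0.07, 0.12, 0.48, 0.51, 0.93])
edges = np.linspace(0.0, 1.0, 5)            # 4 equal bins over [0, 1]
centers = (edges[:-1] + edges[1:]) / 2      # representative value of each bin

bins = np.digitize(values, edges[1:-1])     # bin index of each value
print(centers[bins])                        # [0.125 0.125 0.375 0.625 0.875]
```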

Bottleneck hidden layer (aka Innermost hidden layer) The innermost hidden layer of a stacked autoencoder. It provides a compact and high-level embedding of input data and may be used as a seed for generation (by the chain of decoders).

Challenge One of the qualities (requirements) that may be desired for music generation. Examples of challenges are: incrementality, originality and structure.

Chromagram (aka Chroma) A discretized version of a spectrogram. It is discretized onto the tempered scale and is independent of the octave.
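As a simplified illustration of the folding involved (a sketch, not a full chroma implementation), each frequency bin of a magnitude spectrum can be mapped onto one of the 12 pitch classes of the tempered scale (pitch class 0 is arbitrarily set to A here):

```python
import numpy as np

def chroma_from_spectrum(magnitudes, sample_rate, n_fft):
    chroma = np.zeros(12)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    for mag, freq in zip(magnitudes, freqs):
        if freq < 20:                        # skip DC and sub-audible bins
            continue
        # Distance in semitones from A4 (440 Hz), folded onto one octave.
        pitch_class = int(np.round(12 * np.log2(freq / 440.0))) % 12
        chroma[pitch_class] += mag
    return chroma
```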

Classification A machine learning task about the attribution of an instance to a class (from a set of possible classes). An example is to determine if the next note is a C4, a C♯4, etc.

Compound architecture An artificial neural network architecture which results from a combination of architectures. Examples of types of combination are composition, nesting and pattern instantiation.


Conditioning architecture The parametrization of an artificial neural network architecture by some conditioning information (e.g., a bass line, a chord progression, etc.) represented via a specific extra input, in order to guide the generation.

Connexion A relation between a neuron and another neuron representing a computational flow from the output of the first neuron to an input of the second neuron. A connexion is modulated by a weight which will be adjusted during the training phase.

Convolution In mathematics, a mathematical operation on two functions sharing the same domain that produces a third function which is the integral (or the sum in the discrete case – the case of images made of pixels) of the pointwise multiplication of the two functions varying within the domain in an opposing way. Inspired both by mathematical convolution and by a model of human vision, it has been adapted to artificial neural networks and it improves pattern recognition accuracy by exploiting the spatial local correlation present in natural images. The basic principle is to slide a matrix (named a filter, a kernel or a feature detector) through the entire image (seen as the input matrix), and for each mapping position to compute the dot product of the filter with each mapped portion of the image and then sum up all elements of the resulting matrix. The results are named feature maps.
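A direct (naive) NumPy sketch of this sliding dot product; note that, as in most deep learning frameworks, the kernel is not flipped, so this is strictly speaking a cross-correlation:

```python
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            # Pointwise multiplication of filter and window, then sum.
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

image = np.arange(16.0).reshape(4, 4)
edge_kernel = np.array([[1.0, -1.0]])   # responds to horizontal changes
print(convolve2d(image, edge_kernel))
```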

Correlation Any statistical relationship, whether causal or not, between two random variables. Artificial neural networks are good at extracting correlations between variables, for instance between input variables and output variables and also between input variables.

Cost function (aka Loss function) The function used for measuring the distance between the prediction by an artificial neural network architecture (ŷ) and the actual target (true value y). Various cost functions may be used, depending on the task (prediction or classification) and the encoding of the output, e.g., mean squared error, binary cross-entropy and categorical cross-entropy.
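Illustrative NumPy definitions of the three cost functions mentioned (assuming, for the cross-entropies, that y_pred contains valid probabilities):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred):
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true, y_pred):
    # y_true is one-hot; y_pred is a distribution (e.g., from softmax).
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.8, 0.1])
print(categorical_cross_entropy(y_true, y_pred))  # small: good prediction
```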

Counterpoint In musical theory, an approach for the accompaniment of a melody through a set of other melodies (voices). An example is a chorale with 3 voices (alto, tenor and bass) matching a soprano melody. Counterpoint focuses on the horizontal relations between successive notes for each simultaneous melody (voice) and then considers the vertical relations between their progression (e.g., to avoid parallel fifths).

Cross-entropy A function measuring the dissimilarity between two probability distributions. It is used as a cost (loss) function for a classification task to measure the difference between the prediction by an artificial neural network architecture (ŷ) and the actual target (true value y). There are two types of cross-entropy cost functions: binary cross-entropy when the classification is binary and categorical cross-entropy when the classification is multiclass with a single label to be selected.

Data synthesis A machine learning technique to generate synthetic data as a way to artificially augment the size of the dataset (the number of training examples), in order to improve accuracy and generalization of the learnt model. In the musical domain, a natural and easy way is transposition, i.e. to transpose all examples in all keys.
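For instance (an illustrative sketch with MIDI pitch numbers), transposing one melody into all 12 keys turns a single training example into 12:

```python
# A melody encoded as MIDI pitch numbers (start of a C major scale).
melody = [60, 62, 64, 65, 67]

# Transpose the example into all 12 keys to augment the dataset.
augmented = [[pitch + shift for pitch in melody] for shift in range(12)]
print(len(augmented))   # 12 training examples derived from one
```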

Dataset The set of examples used for training an artificial neural network architecture. The dataset is usually divided into two subsets: the training set used during the training phase and the validation set used to estimate the ability for generalization by the model learnt.

Decoder The decoding component of an autoencoder which reconstructs the compressed representation (an embedding) from the hidden layer into a representation at the output layer as close as possible to the initial data representation at the input layer.

Decoder feedforward strategy A strategy for generating content based on an autoencoder architecture in which values are assigned onto the latent variables of the hidden layer and forwarded into the decoder component of the architecture in order to generate a musical content corresponding to the abstract description inserted.

Deep learning (aka Deep neural network) An artificial neural network architecture with a significant number of successive layers.

Discriminator The discriminative model component of generative adversarial networks (GAN) which estimates the probability that a sample came from the real data rather than from the generator.

Embedding In mathematics, an injective and structure-preserving mapping. Initially used for natural language processing, it is now often used in deep learning as a general term for encoding a given representation into a vector representation.

Encoder The encoding component of an autoencoder which transforms the data representation from the input layer into a compressed representation (an embedding) at the hidden layer.

Encoding The encoding of a representation consists in the mapping of the representation (composed of a set of variables, e.g., pitch or dynamics) into a set of inputs (also named input nodes or input variables) for the neural network architecture. Examples of encoding strategies are: value encoding, one-hot encoding and many-hot encoding.
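An illustrative sketch of one-hot and many-hot encoding over a small, hypothetical pitch vocabulary:

```python
import numpy as np

PITCHES = ["C4", "D4", "E4", "F4", "G4", "A4", "B4"]

def one_hot(pitch):
    # One-hot encoding: a single note becomes a vector with a single 1.
    vector = np.zeros(len(PITCHES))
    vector[PITCHES.index(pitch)] = 1.0
    return vector

def many_hot(pitches):
    # Many-hot encoding: a chord becomes a vector with one 1 per note.
    vector = np.zeros(len(PITCHES))
    for pitch in pitches:
        vector[PITCHES.index(pitch)] = 1.0
    return vector

print(one_hot("E4"))                  # [0. 0. 1. 0. 0. 0. 0.]
print(many_hot(["C4", "E4", "G4"]))   # [1. 0. 1. 0. 1. 0. 0.]
```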

End-to-end architecture An artificial neural network architecture that processes the raw unprocessed data – without any pre-processing, transformation of representation, or extraction of features – to produce a final output.

Enharmony In the tempered system, the equivalence of notes with the same pitch, for example A♯ with B♭, although harmonically they are distinct.

Feature map Also named a convolved feature, this is the result of applying a filter matrix (also named a feature detector) at a specific position of an image and summing up all dot products. This represents the basic operation of a convolutional artificial neural network architecture.

Feedforward The basic way for a neural network architecture to process an input, by feedforwarding the input data through the successive layers of neurons of the architecture until producing the output. A feedforward neural architecture (also named multilayer neural network or multilayer Perceptron, MLP) is composed of successive layers, with at least one hidden layer.

Fourier transform A transformation (continuous or discrete) decomposing a signal into its elementary components (sinusoidal waveforms). As well as compressing the information, it is fundamental for musical purposes as it reveals the harmonic components of the signal.
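
For example, a discrete Fourier transform of a synthetic signal, using NumPy (the signal is ours: a 440 Hz fundamental plus one weaker harmonic):

import numpy as np

sample_rate = 8000                        # samples per second
t = np.arange(sample_rate) / sample_rate  # one second of time points

# A 440 Hz sine (A4) plus a weaker 880 Hz harmonic
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

spectrum = np.abs(np.fft.rfft(signal))
frequencies = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

# The two strongest components are recovered at 440 Hz and 880 Hz
print(frequencies[np.argsort(spectrum)[-2:]])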

Generative adversarial networks (GAN) A compound architecture composed of two component architectures, the generator and the discriminator, which are trained simultaneously with opposed objectives. The generator objective is to generate synthetic samples resembling real data, while the discriminator objective is to detect synthetic samples.
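
This adversarial training can be summarized by the two-player minimax game of the original GAN formulation (reproduced here for reference):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]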

Generator The generative model component of generative adversarial networks (GAN), whose objective is to transform a random noise vector into a synthetic (faked) sample which resembles real samples drawn from a distribution of real data.

Gradient A partial derivative of the cost function with respect to a weight parameter or a bias.

Gradient descent A basic algorithm for training a linear regression model or an artificial neural network. It consists in an incremental update of the weight parameters, guided by the gradients of the cost function, until reaching a minimum.
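
A minimal sketch on a one-variable linear regression (learning y = 2x; the data and learning rate are ours):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x                      # target: slope 2, no bias

w, learning_rate = 0.0, 0.05
for step in range(200):
    y_hat = w * x
    # Gradient of the mean squared error cost with respect to w
    gradient = np.mean(2 * (y_hat - y) * x)
    w -= learning_rate * gradient  # incremental update, against the gradient

print(w)  # converges close to 2.0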

Harmony In music theory, a system for organizing simultaneous notes. Harmony focuses on the vertical relations between simultaneous notes, as objects on their own (chords), and then considers the horizontal relations between them (e.g., harmonic cadences).

Hidden layer Any neuron layer located between the input layer and the output layer of a neural network architecture.

Hold The information that a note is held, i.e., that its duration extends over more than a single time step.

Hyperparameter A higher-order parameter describing the configuration of a neural network architecture and its behavior. Examples are: the number of layers, the number of neurons per layer, the learning rate and the stride (for a convolutional architecture).

Input layer The first layer of a neural network architecture. It is an interface consisting in a set of nodes without internal computation.

Input manipulation strategy A strategy for generating content based on the incremental modification of a representation to be processed by an artificial neural network architecture.

Iterative feedforward strategy A strategy for generating content by iteratively generating its successive time slices.

Latent variable In statistics, a variable which is not directly observed. In deep learning architectures, a variable within a hidden layer. By sampling latent variable(s), one may control the generation, e.g., in the case of a variational autoencoder.

Layer A component of a neural network architecture composed of a set of neurons with no direct connexions between them.

Linear regression In statistics, linear regression is an approach for modeling the (assumed linear) relationship between a scalar variable and one or several explanatory variable(s).

Linear separability The ability to separate, by a line or a hyperplane, the elements of two different classes represented in a Euclidean space.

Long short-term memory (LSTM) A type of recurrent neural network architecture with the capacity for learning long-term correlations without suffering from the vanishing or exploding gradient problem during the training phase. The idea is to secure information in memory cells, protected from the standard data flow of the recurrent network. Decisions about writing to, reading from and forgetting the values of cells are performed by the opening or closing of gates, expressed at a distinct control level, while being learnt during the training process.
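
The gating mechanism can be summarized by the standard LSTM equations (one common variant, stated here for reference; σ is the logistic sigmoid and ⊙ the elementwise product):

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)
h_t = o_t \odot \tanh(c_t)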

Many-hot encoding Strategy used to encode simultaneously several values of a categorical variable, e.g., a triadic chord composed of three note pitches. As for a one-hot encoding, it is based on a vector having as its length the number of possible values (e.g., from C4 to B4). Each occurrence of a note is represented by a corresponding 1, with all other elements being 0.

Markov chain A stochastic model describing a sequence of possible states. The chance of moving from the current state to another (or the same) state is governed by a probability that does not depend on the previous states.
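
A minimal sketch of sampling a note sequence from a first-order Markov chain (the three-state transition matrix is ours):

import numpy as np

states = ["C", "E", "G"]
# transitions[i][j]: probability of moving from state i to state j
transitions = np.array([[0.1, 0.6, 0.3],
                        [0.4, 0.2, 0.4],
                        [0.5, 0.3, 0.2]])

rng = np.random.default_rng(0)
current = 0  # start on C
sequence = [states[current]]
for _ in range(7):
    # The next state depends only on the current one
    current = rng.choice(len(states), p=transitions[current])
    sequence.append(states[current])
print(sequence)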

Melody The abbreviation of a single-voice monophonic melody, that is, a sequence of notes for a single instrument with at most one note at a time.

Musical instrument digital interface (MIDI) A technical standard that describes a protocol, a digital interface and connectors for interoperability between various electronic musical instruments, software and devices.

Multilayer Perceptron (MLP) A feedforward neural architecture composed of successive layers, with at least one hidden layer.

Multivoice (aka Multitrack) The abbreviation of a multivoice polyphony, that is, a set of sequences of notes intended for more than one voice or instrument.

Neuron The atomic processing element (unit) of an artificial neural network architecture, inspired by the biological model of a neuron. A neuron has several input connexions, each one with an associated weight, and one output. A neuron will compute the weighted sum of all its input values and then apply its associated activation function in order to compute its output value. Weights will be adjusted during the training phase of the neural network architecture.
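
A minimal sketch of this computation (names are ours; see also Nonlinear function below):

import numpy as np

def neuron_output(inputs, weights, bias, activation=np.tanh):
    # Weighted sum of the inputs, plus the bias, passed through
    # the activation function
    return activation(np.dot(weights, inputs) + bias)

print(neuron_output(np.array([0.5, -1.0]), np.array([0.8, 0.2]), 0.1))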

Node The atomic structural element of an artificial neural network architecture. A node could be a processing unit (a neuron) or a simple interface element for a value, e.g., in the case of the input layer or a bias node.

Nonlinear function A function used as an activation function in an artificial neural network architecture, in order to introduce nonlinearity and to address the linear separability limitation.

Objective The nature and the destination of the musical content to be generated by a neural network architecture. Examples of objectives are: a monophonic melody to be played by a human flutist, or a polyphonic accompaniment to be played by a synthesizer.

One-hot encoding Strategy used to encode a categorical variable (e.g., a note pitch) as a vector having as its length the number of possible values (e.g., from C4 to B4). A given element (e.g., a note pitch) is represented by a corresponding 1, with all other elements being 0. The name comes from digital circuits, one-hot referring to a group of bits among which the only legal (possible) combinations of values are those with a single high (hot) (1) bit, all the others being low (0).

Output layer The last layer of a neural network architecture. It includes the output activation function, which could be, for instance, a sigmoid or a softmax in the case of a classification task.

Overfitting The situation for an artificial neural network architecture (and, more generally speaking, for a machine learning algorithm) when the model learnt is well fit to the training data but not to the evaluation data. This means the inability of the model to generalize well.

Parameter The parameters of an artificial neural network architecture are the weights associated with each connexion between neurons, as well as the biases associated with each layer.

Perceptron One of the first artificial neural network architectures, created by Rosenblatt in 1957. It had no hidden layer and suffered from the linear separability limitation.

Piano roll Representation of a melody (monophonic or polyphonic) inspired by automated pianos. Each “perforation” represents a note control information, to trigger a given note. The length of the perforation corresponds to the duration of the note. In the other dimension, the localization (height) of a perforation corresponds to its pitch.
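
A minimal sketch of a piano roll as a Boolean pitch × time matrix (the dimensions and melody are ours):

import numpy as np

n_pitches, n_time_steps = 128, 16       # MIDI pitch range x time steps
piano_roll = np.zeros((n_pitches, n_time_steps), dtype=bool)

# A two-note melody: C4 held for four time steps, then E4 for four
piano_roll[60, 0:4] = True
piano_roll[64, 4:8] = True

print(np.nonzero(piano_roll.any(axis=1))[0])  # active pitches: [60 64]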

Pitch class The name of the corresponding note (e.g., C), independently of the octave position. Also named chroma.

Polyphony The abbreviation of a single-voice polyphony, that is, a sequence of notes for a single instrument (e.g., a guitar or a piano) with possibly simultaneous notes.

Pooling For a convolutional architecture, a data dimensionality reduction operation (by max, average or sum) applied to each feature map produced by a convolutional stage, while retaining significant information. Pooling brings the important property of invariance to small transformations, distortions and translations in the input image.
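
A minimal sketch of 2×2 max pooling (names are ours):

import numpy as np

def max_pool(feature_map, size=2):
    # Keep the maximum of each non-overlapping size x size block
    h, w = feature_map.shape
    blocks = feature_map[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

print(max_pool(np.arange(16).reshape(4, 4)))
# [[ 5  7]
#  [13 15]]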

Pre-training A technique, also named greedy layer-wise unsupervised training, consisting in the prior training in cascade (one layer at a time) of each hidden layer. It turned out to be a significant improvement for the accurate training of artificial neural networks with several layers, by initializing the weights based on learnt data.

Q-learning An algorithm for reinforcement learning, based on an incremental refinement of the action value function Q, which represents the cumulated rewards for a given state and a given action.
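
The standard incremental update rule (stated here for reference, with learning rate α and discount factor γ):

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]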

Recurrent connexion A connexion from the output of a node to its input. By extension, a layer recurrent connexion fully connects all layer node outputs to all node inputs. This is the basis of a recurrent neural network (RNN) architecture.

Recurrent neural network (RNN) A type of artificial neural network architecture with recurrent connexions. It is used to learn sequences.

Reinforcement learning An area of machine learning concerned with an agent making successive decisions about an action in an environment, while receiving a reward (reinforcement signal) after each action. The objective for the agent is to find the best policy, maximizing its cumulated rewards.

Reinforcement strategy A strategy for content generation that models the generation of successive notes as a reinforcement learning problem, while using an RNN as a reference for the modeling of the reward. Therefore, one may introduce arbitrary control objectives (e.g., adherence to the current tonality, maximum number of repetitions, etc.) as additional reward terms.

ReLU The rectified linear unit function, which may be used as a hidden layer nonlinear activation function, especially in the case of convolutions.

Representation The nature and format of the information (data) used to train and to generate musical content. Examples of types of representation are: signal, spectrum, piano roll and MIDI.

Rest The information about the absence of a note (silence) during one (or more) time step(s).

Restricted Boltzmann machine (RBM) A specific type of artificial neural network that can learn a probability distribution over its set of inputs. It is stochastic, has no output and uses a specific learning algorithm (typically contrastive divergence).

Sampling The action of producing an item (a sample) according to a given probability distribution over the possible values. As more and more samples are generated, their distribution should more closely approximate the given distribution.

Sampling strategy A strategy for generating content where the variables of a content representation are incrementally instantiated and refined, according to a target probability distribution which has been previously learnt.

Seed-based generation An approach to generate arbitrary content (e.g., a long melody) from a minimal piece of (seed) information (e.g., a first note).

Self-supervised learning A category of machine learning where the output value of the example (the target value of the supervision) is equal to the input value. An example is the training of an autoencoder.

Sigmoid Also named the logistic function, it is used as an output layer activation function for binary classification tasks, and it may also be used as a hidden layer nonlinear activation function.

Single-step feedforward strategy A strategy for generating content where a feedforward architecture processes, in a single processing step, a global temporal scope representation which includes all time slices.

Softmax Generalization of the sigmoid (logistic) function to the case of multiple classes. Used as an output activation function for multiclass single-label classification.
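
A minimal sketch of a numerically stable softmax, followed by sampling from the resulting distribution (see the Sampling entry above; names are ours):

import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is
    # a probability distribution over the classes
    shifted = np.exp(logits - np.max(logits))
    return shifted / np.sum(shifted)

logits = np.array([2.0, 1.0, 0.1])
probabilities = softmax(logits)
print(probabilities)  # sums to 1

# Sampling a class (e.g., a note pitch) instead of taking the argmax
rng = np.random.default_rng(0)
print(rng.choice(len(logits), p=probabilities))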

Sparse autoencoder An autoencoder with a sparsity constraint, such that its hidden layer units are inactive most of the time. The objective is to enforce the specialization of each unit in the hidden layer as a specific feature detector.

Spectrogram A visual representation of the spectrum of an audio signal, obtained via a Fourier transform.

Stacked autoencoder A set of hierarchically nested autoencoders with decreasing numbers of hidden layer units.

Strategy The way the architecture will process representations in order to generate the objective, while matching desired requirements. Examples of types of strategy are: single-step feedforward, iterative feedforward and decoder feedforward.

Stride For a convolutional architecture, the number of pixels by which the filter matrix is slid over the input matrix.

Style transfer The technique for capturing a style (e.g., of a given painting, by capturing the correlations between neurons for each layer) and applying it onto another content.

Supervised learning A category of machine learning where, for each training example, a target information (a scalar value in the case of a regression, and a class in the case of a classification) is provided.

Support vector machine (SVM) A class of supervised machine learning models for linear classification with optimization of the separation margin. A kernel method is usually associated with an SVM, in order to transform the initial nonlinear classification problem into a linear classification problem within a higher-dimension space.

Tanh (aka Hyperbolic tangent) The hyperbolic tangent function, which may be used as a hidden layer nonlinear activation function.

Test set (aka Validation set) The subset of examples (of the dataset) which is used for evaluating the ability of the learnt model to generalize, that is, to predict or to classify properly in the presence of yet unseen data.

Time slice The time interval considered as an atomic portion (grain) of the temporal representation used by an artificial neural network architecture.

Time step The atomic increment of time considered by an artificial neural network architecture.

Training set The subset of examples (of the dataset) which is used for training the artificial neural network architecture.

Transfer learning An area of machine learning concerned with the ability to reuse what has been learnt and to apply (transfer) it to related domains or tasks.

Turing test Initially codified in 1950 by Alan Turing and named by him the “imitation game”, the “Turing test” is a test of the ability of a machine to exhibit intelligent behavior equivalent to (and, more precisely, indistinguishable from) the behavior of a human. In his imaginary experimental setting, Turing proposed the test to be a natural language conversation between a human (the evaluator) and a hidden actor (another human or a machine). If the evaluator cannot reliably tell the machine from the human, the machine is said to have passed the test.

Unit See neuron.

Unit selection strategy A strategy for content generation based on querying successive musical units (e.g., one-measure-long melody segments) from a database and concatenating them, in order to generate a sequence according to some user characteristics.

Unsupervised learning A category of machine learning which extracts information from data without any added label or class information.

Variational autoencoder (VAE) An autoencoder with the added constraint that the encoded representation (its latent variables) follows some prior probability distribution, usually a Gaussian distribution. The variational autoencoder is therefore able to learn a “smooth” latent space, mapping to realistic examples, which provides interesting ways to control the variation of the generation.
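
This constraint appears in the standard VAE training objective (stated here for reference), which adds to the reconstruction term a Kullback-Leibler divergence pulling the encoder distribution q(z|x) towards the prior p(z):

\mathcal{L}(x) = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{\mathrm{KL}}(q(z|x) \,\|\, p(z))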

Value encoding The direct encoding of a numerical value as a scalar.

Vanishing or exploding gradient problem A known problem when training a recurrent neural network, caused by the difficulty of estimating gradients: in backpropagation through time, recurrence brings repetitive multiplications and could thus lead to over-amplifying or minimizing effects (numerical errors). The long short-term memory (LSTM) architecture was designed to solve this problem.

Waveform The raw representation of a signal as the evolution of its amplitude in time.

Weight A numerical parameter associated with a connexion between a node (neuron or not) and a unit (neuron). A neuron will compute the weighted sum of the activations of its connexions and then apply its associated activation function. Weights will be adjusted during the training phase.

Zero-padding For a convolutional architecture, the padding of the input matrix with zeros around its border.


Index

ABC notation, 32, 48, 135, 263
Abstraction, 1
Accent, 187
Accompaniment, 16, 116, 219, 243, 263
Accurate, 192
Action, 107
  evaluation, 108
  selection, 108
Activation, 179, 181
  function, xxvii, 59, 64, 78, 81, 263
Ad hoc, 113, 130, 147
Adaptation, 108, 217
AF, see Activation function, 59
Agent, 107, 108
Agnostic, 5
AI, see Artificial intelligence, 4, 218, 246
Alan Turing, 3
AlexNet, 52
Algorithm, 195
Algorithmic composition, 2, 7, 263
Analysis, 223
ANN, see Artificial neural network
Anomaly detection, 2
Anticipation-RNN, 103, 109, 157, 163, 171, 224
Architectural, 109, 163
  meta-level, 81
  pattern, 66, 102, 109
Architecture, vii, 12, 55, 115, 263
Argmax, 71, 73, 131, 157
Argsort, 74
Art, 208, 248
Artificial
  intelligence, xxvii, 4, 51, 218, 246
  neural network, see Neural network, 1, 52, 263
AST, see Audio style transfer, 224
Asymmetry, 129, 166
Attention mechanism, 98, 99, 112, 245, 264
Attribute, 87, 157, 159, 161, 204
  vector arithmetics, 87, 204
Audio, 20, 21, 41, 102, 166, 196
  style transfer, xxvii
AudioWord2Vec, 112
Autoencoder, 2, 36, 40, 65, 82, 85, 89, 90, 107, 109, 111, 121, 243, 245, 264
Autoencoder(RNN, RNN), 109, 111
Automated, 17
Autonomous, 17
Average, 87, 204
Axis, 151
Axone, 59
Bach, 3, 4, 125, 134, 149, 170, 211, 213, 247
BachBot, 34, 218, 224, 247
Backpropagation, 51, 79, 96, 98, 99, 177, 181, 264
  through time, xxvii, 96, 98
Backward, 212
Bag-of-words, xxvii, 40, 193, 264
BALSTM, see Bi-Axial LSTM, 153
Bar, 27
Baroque, 170
Basic building block, 62, 67
Bass, 213
  line, 47, 103, 163, 164
Beat, 27, 103, 163, 164
Beethoven, 171
Bernoulli, 92
Bi-Axial LSTM, xxvii, 153, 170, 171, 224
Bias, 22, 53, 56, 62, 264
  node, 56, 264
Bidirectional, 109, 144
  LSTM, xxvii, 143
  recurrent neural network, 110, 161, 173, 202, 212
  RNN, see Bidirectional recurrent neural network, 110
Bill Evans, 199
Bin, 44, 45, 171
Binary, 28, 42
  cross-entropy, 72, 78
Binning, 44, 202, 264
Biological
  inspiration, 56, 59
  neural network, 56
Black box, 218
Block, see LSTM block
  Gibbs sampling, 124, 126, 149
BLSTM, see Bidirectional LSTM, 110, 127, 143, 147, 173, 212, 224, 241
Blues, 127
BluesM, 224
BluesMC, 224
Boltzmann machine, 90
Boolean, 42, 91, 92
Bossa nova, 137
Bottleneck hidden layer, 89, 121, 177, 264
Bottom-up, 6, 13, 192
BOW, see Bag-of-words, 40, 193, 264
BPTT, see Backpropagation through time, 96
Brittle, 5
Bucketing, 44

C-RBM, see Convolutional restricted Boltzmann machine, 92, 102, 109, 125, 156, 187, 196, 201, 224, 244
C-RNN-GAN, 104, 109, 110, 161, 173, 212, 224, 244
Cadence, 147, 212
CAN, see Creative adversarial networks, 208
Case-based reasoning, xxvii, 137
Categorical, 42, 91
  cross-entropy, 72, 77
Causal convolution, 166
CBR, see Case-based reasoning, 137
Cell, see LSTM cell
Celtic, 48, 135, 224
Cepstrum, 21
Chain rule, 79
Challenge, vii, 12, 13, 115, 120, 264
CHORAL, 3
Chorale, 16, 116, 149, 211, 213
Chord, 16, 26, 34, 40, 124, 143, 147
  progression, 16, 34, 103, 143, 163
Chord2Vec, 34
Chroma, see Chromagram, 264
Chromagram, 23, 264
Classical, 48, 149, 152, 201
Classical piano MIDI database, 48, 152–154
Classification, 1, 2, 65, 71, 106, 116, 264
  task, 45, 131
ClockworkRNN, 206
CNN, see Convolutional neural network, 99
Cocktail party effect, 41
Coherence, 17, 47
Column vector, 57
Composer, 170
Composition, 109
  style transfer, 186, 195
Compound architecture, 108, 125, 264
Computer music, 3, 7
Concatenation, 192, 194
  cost, 192, 194
CONCERT, 210, 224, 244
Conditional
  architecture, see Conditioning architecture, 163
  probability, 85, 95, 148, 157, 173, 213
Conditioning, 103, 163, 164, 166, 168, 171, 196, 208
  architecture, 103, 163, 265
  input, 103, 104, 109, 112, 113, 163, 164, 166, 171
Conditioning(RNN, RNN), 109
Connexion, 81, 265
Consonant, 177
Constrained sampling, xxvii, 156, 187, 188
Constraint, 156, 173, 186
  satisfaction, 189
Content, 2, 116
  generation, 2
Context, 161, 246, 248
Continuous, 20, 42, 92
Contrastive divergence, 91, 187
Control, 81, 103, 108, 124, 137, 192, 195, 207, 215
  interface, 17
  parameter, 208
Convergence, 189
Convex, 55, 79
ConvNet, see Convolutional neural network, 99
Convolution, 1, 99, 103, 166, 208, 244, 265
Convolutional, 102
  architecture, 244
  network, see Convolutional neural network, 102, 104, 166, 245
  neural network, xxvii, 52, 99
  restricted Boltzmann machine, xxvii, 187
Convolutional(RBM), 109
Convolved feature, 100
Corpus, 2, 5, 192
Correlation, 100, 180, 181, 219, 265
Cost, 54, 55, 79, 84, 85, 194
  function, 72, 109, 179
Counterpoint, 16, 116, 117, 147, 177, 213, 219, 265
Country, 144
Couple, 148
Coverage, 17
Creative, 208
  adversarial networks, xxvii, 208
Creativity, 7, 162, 170, 207, 248
Criterium, 192, 193
Cross-entropy, 72, 85, 265
  cost, 72, 112, 150, 175
CS, see Constrained sampling, 187
CSV, see Comma-separated values
Cumulated rewards, 108
Cyber-Joao, 137, 198

Data synthesis, 81, 265
Database, 192, 195
Dataset, 37, 47, 48, 116, 178, 192, 217, 266
  augmentation, 47, 81
dBFS, see Decibel relative to full scale, 23
Decibel, 25
  relative to full scale, xxvii, 23
Decision tree, 44
Decoder, 83, 90, 109, 111, 121, 266
  feedforward strategy, 121, 127, 157, 244, 266
Decomposition, 23
Deep, 69
  belief net, 52
  learning, vii, 1, 62, 218, 266
  network, see Deep neural network, 89
  neural network, 108, 266
  reinforcement learning, 107, 108
Deep Dream, 177, 178, 222
deepAutoController, 123, 215, 224
DeepBach, 39, 40, 113, 117, 125, 157, 159, 174, 177, 210, 211, 215, 218, 219, 222, 224, 241, 243, 246, 247
DeepHear, 36, 121, 177, 224, 244, 246
DeepHearC, 177, 224
DeepHearM, 123, 224
DeepJ, 45, 103, 153, 154, 163, 170, 196, 224
Dendrite, 59
Depth, 68, 100
Destination, 15
Deterministic, 91, 156
Dilated convolution, 166
Discrete, 20, 42
Discretization, 37, 171
Discriminator, 105, 107, 208, 266
Disentanglement, 196
  learning, 196
Distance, 54, 177, 187, 193
Distribution, see Probability distribution, 124, 137, 187, 208
Double Q-learning, 108
Dropout, 81
Drums, 46, 164
Duration, 30, 33
DX7, 2
Dynamics, 41, 136, 137
Early stopping, 81
Embedding, 83, 90, 103, 121, 163, 171, 177, 193–195, 202, 266
Emergence, 179
EMI, see Experiments in musical intelligence, 3
Emulative, 208
Encapsulate, 111
Encoder, 83, 89, 107, 109, 111, 266
Encoding, 40, 42, 72, 83, 90, 111, 266
End-to-end architecture, 22, 266
Energy, 92
Engineering, 59
Enharmony, 19, 40, 266
Entangled, 41
Entanglement, 196
Entropy, 76
Entry point, 155
Environment, 107, 108
Equiprobable, 209
Equivariance, 101
Error, 79
Estimation, 54, 79, 105, 108, 187, 190
Evaluation, 66, 195, 214, 247
Event message, 29
Expectation, 77
Experience, 65
Experiments in musical intelligence, xxvii, 3
Explainability, 218
Explanatory variable, 53
Exploding gradient problem, 98, 273
Exploration, 188
  exploitation dilemma, 108
Expressiveness, 41, 136
Extensional, 133, 168, 170
Extraction, 19, 65, 137, 192
Fast, 245
  Fourier transform, xxvii, 123
Feature, 19, 40, 89, 121, 192, 193, 208
  -based representation, 40
  detector, 84, 100
  extraction, 40, 83, 89, 193
  extractor, 193
  map, 100, 266
  matching, 106, 162, 208
Feedback, 108, 192, 217
Feedforward, 36, 62, 116, 121, 181, 267
  network, see Feedforward neural network, 113, 120, 164, 178, 245
  neural network, 67, 78, 116, 243
  propagation, 78
Fermata, 38, 213, 219
FFT, see Fast Fourier transform
Fill, 192
Filter, 100
Fine-grained, 194
Fixed-length, 111
Flow, 151
Flow Machines, 36, 48, 199
FlowComposer, 5, 199
Flute, 16
FM, see Frequency modulation, 2
Folk, 48, 135, 144, 149, 192
  -rnn, 33, 135
Form, 186, 201
Forward, 212
Fourier transform, 19, 23, 123, 267
Frequency, 25
  modulation, xxvii, 2

Gain, 65, 108, 190
Game theory, 105
GAN, see Generative adversarial networks, 104, 107, 109, 110, 157, 161, 208, 267
GAN(RNN, RNN), 109
GarageBand, 3
Gated recurrent unit, xxvii, 98
Gaussian distribution, 85
GD, see Gradient descent
Generality, 5
Generalization, 5, 47, 54, 79, 217
  error, 80
Generate, 116
  -and-test, 124, 156
Generated data, 20, 36
Generation, 2, 5, 103, 108, 163, 166, 187, 217
  data, 20, 36
  phase, 19
Generative
  adversarial networks, xxvii, 104, 157, 161, 208, 267
  latent optimization, xxvii, 106
  model, 85, 90, 157, 161
Generator, 104, 208, 267
Genre, see Musical genre, 170
Geodesic latent space regularization, xxvii, 158
Gibbs sampling, xxvii, 85, 92, 124, 188, 213
GLO, see Generative latent optimization, 106
Global, 186
  conditioning, 166, 171
  minimum, 55
GLSR, see Geodesic latent space regularization, 158
GLSR-VAE, 112, 158, 178, 224, 244
GoogLeNet, 69
GPU, see Graphics processing unit, 1, 64
Gradient, 55, 79, 181, 267
  ascent, 179
  descent, xxvii, 55, 66, 177, 267
Grammar, 4, 5
Granularity, 192
Graphics processing unit, xxvii, 1, 64
Greedy layer-wise unsupervised training, 52
Grid search, 219
GRU, see Gated recurrent unit, 98
GS, see Gibbs sampling, 92, 124, 188
Guitar, 16, 137
Handcrafted, 193
  feature, 40, 134
  model, 5
Harmonics, 23, 38
Harmonization, 199, 212
Harmony, 16, 125, 147, 151, 267
Hertz, 25
Heterogeneous, 47, 108
Heuristic, 55, 194
Hexahedria, 224
Hidden
  layer, 51, 67, 93, 121, 267
  layer unit, 89
  Markov model, xxvii, 146
  unit, see Hidden layer unit
Hierarchical, 1, 89, 112, 187, 202, 206
High-level, 179, 189
Higher
  -order, 187
  level, 89
History, 245
HMM, see Hidden Markov model, 146
Hold, 38, 39, 159, 174, 267
Homogeneous, 108
Hook, 155
Hyperbolic tangent, 60, 272
Hyperparameter, 79, 81, 100, 106, 157, 219, 267
Hypothesis, 53

Hz, see Hertz
Identification, 2
Identity, 83, 85
Illiac Suite, 2
Image, 2, 48, 99, 177, 178, 180, 244
  recognition, 1, 52, 98
Incremental, 108, 180, 243
Index, 93, 193, 195
Initialization, 55
Innermost hidden layer, 89, 264
Input, 19
  layer, 62, 67, 103, 163, 267
  manipulation strategy, 177, 185, 195, 267
  node, 42, 56, 62, 82
  variable, 42
Instantiate, 109
Instrument, 16, 103, 163, 166
Integer, 42
Integration, 150
Intensional, 168
Interactive, 17, 215
Interface, 121
Interpolation, 87, 204
Interval, 26
Invariance, 101, 187, 245
Iterative feedforward strategy, 36, 127, 138, 147, 156, 195, 210, 222, 244, 268
Jazz, 16, 27, 48, 144, 192, 201
Jitter, 179
JSB Chorales dataset, 48, 153
Kernel, 100
Key, 47
KL-divergence, see Kullback-Leibler divergence, 77
Knowledge representation, 20
Kullback-Leibler divergence, xxvii, 77
L2, 81
Label, 40, 65, 83, 97, 103, 121, 163, 177, 195
Language, 111
Latent
  space, 85, 87, 244
  variable, 83–85, 87, 244, 268
Layer, 1, 55, 67, 68, 78, 245, 268
Lead sheet, 16, 34, 48
Lead sheet data base, xxvii, 48
Learning, 2, 5, 108, 217
  rate, 55, 81
Length, 111, 126
Likelihood, 78
Limitation, vii, 120
Linear
  algebra, 56, 64
  regression, 2, 53, 268
  separability, 67, 268
Local
  conditioning, 166
  minimum, 79
Logistic
  function, see Sigmoid function
  regression, 2, 59, 71
Long
  -term dependency, 98, 99
  short-term memory, xxvii, 1, 52, 98, 268
Loss, 54
Lossy representation, 40
Low-level, 189
LSDB, see Lead sheet data base, 36, 48
LSTM, see Long short-term memory, 1, 52, 98, 99, 113, 201, 245, 268
Lyrics, 16, 34
Machine learning, xxviii, 1
Major, 27, 187
Manifold, 85
  learning, 83, 85
Many-hot encoding, 43, 45–47, 168, 268
Markov
  chain, 2, 189, 199, 207, 268
  constraint, 189
Matrix, 57, 63, 64, 78
Max/MSP, 2
Maximize, 177
Maximum likelihood estimation, 78
Mean squared error, xxviii, 54, 72
Measure, 27, 33, 245
Mel-frequency cepstral coefficients, xxvii, 21
Melody, see Single-voice monophonic melody, 34, 116, 125, 147, 268
Memory, 80, 98, 152, 168
  -based, 80
Meta-level, 81, 98
Metadata, 32, 38, 213
Meter, 27, 32, 186, 187
Metropolis-Hastings, 124
MFCC, see Mel-frequency cepstral coefficients, 21
Michel Legrand, 196, 199
MIDI, see Musical instrument digital interface, 2, 17, 29, 38, 40, 171, 268
  note number, 29, 42
  note velocity, 29
MidiNet, 20, 27, 30, 36, 103, 104, 106, 162, 163, 167, 224, 244, 245
Miles Davis, 192

MiniBach, 36, 117, 147, 177, 210, 211, 218, 219, 224, 243
Minibatch, 217
  gradient descent, 55
Minimax, 105
Minor, 27, 187
MIR, see Music information retrieval, 6
Misclassification, 106
ML, see Machine learning, 1
MLP, see Multilayer Perceptron, 78, 268
MNIST, see Modified National Institute of Standards and Technology, 48, 86
Modality, 103, 163
Mode, 15, 187
Model, 53, 55
Modified National Institute of Standards and Technology, xxviii
Modularity, 163
Monophonic, 16, 43
  melody, see Single-voice monophonic melody
Motif, 178, 179, 187, 201, 245
Mozart, 3, 187
MSE, see Mean squared error
Multi
  -many-hot encoding, 43
  -one-hot encoding, 43, 45
  multiclass single label, 73
Multiclass
  multilabel, 73
  single label, 73
Multicriteria, 195, 247
Multilayer
  neural network, see Neural network
  Perceptron, xxviii, 78, 268
Multinomial, 92
Multinoulli, 91, 92
Multiple linear regression, 53
Multitrack, see Multivoice polyphony, 28, 30, 43, 268
  polyphony, see Multivoice polyphony
Multivariate linear regression, 57
Multivoice, see Multivoice polyphony, 28, 43, 219, 268
  polyphony, 16, 218
MuseData library, 48, 153, 154
Music, 2
  information retrieval, xxviii, 6
  style transfer, 185
  unit, 192
Music I, 2
Musical
  content, vii
  genre, 5, 103, 123, 163
  instrument digital interface, xxvii, 2, 29, 268
  unit, 192
MusicNet dataset, 48
MusicTransformer, 245
MusicVAE, 47, 110, 112, 178, 202, 224, 244
MusicXML, 34
Nash equilibrium, 105
Natural language processing, xxviii, 34, 40, 98
Nested, 109
Neural
  net, see Neural network
  network, xxviii, 51, 62, 104, 115, 116, 218
  Turing machine, xxviii, 98
Neuron, 56, 268
NLP, see Natural language processing, 40
NN, see Neural network
Node, 56, 269
Nonlinear, 71
  function, 59, 269
Norm, 208
Normal distribution, 85
Notation convention, 27, 32, 34, 53, 60, 63, 69, 71, 78, 85, 93, 115, 228, 236
Note, 16, 25, 34, 113, 124
  beginning, 39
  duration, 37, 117
  ending, 38, 39
  onset, 41, 168, 187, 198
  step, 36, 133
  tie, 38
Nottingham database, 48, 153
NTM, see Neural Turing machine, 98
Nuance, 136
Objective, vii, 11, 15, 20, 116, 208, 269
  destination, 15
  facet, 15
  mode, 15
  type, 15
  use, 15
Octave, 23, 25, 153
One-hot encoding, 42, 43, 45, 46, 71, 269
OpenMusic, 5
Optimization, 66, 81
Organization, 201
Originality, 207
Output, 19
  activation function, 71, 72
  layer, 62, 67, 269
  layer activation function, see Output activation function
  node, 62, 82
Overestimation, 108

Overfitting, 54, 81, 217, 269
Parallel, 166
Parameter, 53, 81, 269
  initialization, 55
  sharing, 101
Parameterization, 121
Pareidolia, 178
Pattern, see Architectural pattern, 187
  instantiation, 109
PCA, see Principal component analysis, 83
Perceptron, 51, 69, 78, 269
Percussion, 46
Performance, 41, 49, 137, 195
  style transfer, 195
Performance RNN, 41, 44, 137, 224
Piano, 16, 149, 152
  roll, 30, 38, 43, 269
Pitch, 25, 29, 30, 32, 42, 103, 149, 245
  class, 23, 26, 32, 144, 269
  notation, 25
Pivot representation, 111
Pixel, 179
Plagiarism, 8, 122
Policy, 65, 108
Polymorphism, 20
Polyphonic, 16
Polyphony, see Single-voice polyphony, 113, 270
Pooling, 101, 166, 270
Pop, 144, 167
Positional constraint, 157, 163, 173
Pre-training, 1, 52, 55, 91, 270
Prediction, 1, 2, 54, 62, 65, 71, 115, 116
Principal component analysis, xxviii, 83
Probability, 53, 71
  distribution, 53, 71, 77, 85, 90, 95, 124, 134
Process, 194
Profile, 187
Propagation, 152
Pseudo-Gibbs sampling, 213
Psychedelic, 178
Q-learning, 108, 270
  -table, 108
  function, 108
  network, 190
Quantization, 41, 136
Query, 192, 195
R&B, see Rhythm and blues
Radial function, 69
Ragtime, 121, 177
Random, 55, 121, 135, 156, 177, 180
  noise, 179
Randomness, 137
Range, 92
Rank, 194, 195
RBM, see Restricted Boltzmann machine, 65, 90, 92, 109, 113, 125, 127, 156, 270
RBMC, 224
Real, 92
ReChord, 199
Recopy, 207
Rectified linear unit, xxviii, 60
Recurrent, 102
  connexion, 93, 270
  highway network, xxviii, 175
  network, see Recurrent neural network, 52, 65, 103, 104, 111, 113, 156, 245
  neural network, xxviii, 36, 93, 125, 243, 244, 270
Recursion, 97, 127, 138, 147
Referential, 223
Refinement, 109
Regression, 2, 65, 71
Regularization, 81, 208
Reinforcement
  learning, xxviii, 65, 107, 270
  signal, 107
  strategy, 189, 192, 270
Relation, 223
Relationship, 115
Relevance, 193
ReLU, see Rectified linear unit, 60, 101, 270
Rendering, 199
Repetition, 138
Replay, 108
Representation, vii, 1, 12, 42, 55, 66, 117, 134, 270
  learning, 85
ResNet, 69
Rest, 25, 159, 270
Restricted Boltzmann machine, xxviii, 2, 90, 125, 156, 270
Retrieval, 137
Reuse, 245
Reverse, 168
Reward, 65, 107, 108, 190, 192, 218
RHN, see Recurrent highway network, 175
Rhythm, 27, 164, 224
Rhythmic pattern, 137
Richard Feynman, 3
RL, see Reinforcement learning, 107
RL-Tuner, 107, 190, 201, 218, 224
RNN, see Recurrent neural network, 93, 102, 103, 109–111, 125, 243, 244, 270

RNN Encoder-Decoder, 34, 108, 109, 111, 147, 198, 244, 245
RNN-DBN, 150
RNN-RBM, 109, 113, 125, 148, 151, 224
Robustness, 45, 101
Rock, 143, 192
Row vector, 57
Rule, 2, 5, 137
  -based system, 4, 5
Saliency map, 221
Sample, 91, 104, 106, 124, 156
SampleRNN, 206
Sampling, 71, 74, 78, 85, 90, 124, 149, 156, 243, 271
  strategy, 125, 134, 185, 210, 213, 222, 271
Scalar, 42, 43
Scale, 187
Sceptrum, 21
Scope, 17
Score, 25, 195
Search, 186
Seed, 90, 120, 121, 127, 128
  -based generation, 127, 147, 243, 271
Selection, 192
Selective Gibbs sampling, xxviii, 188
Self
  -information, 75
  -similarity, 187
  -supervised learning, 83, 121, 193, 271
Semantic relevance, 192, 193
Sense of direction, 192, 201
Sequence, 16, 111, 125
  -to-sequence learning, 111
Sequencer, 17
Sequential, 127, 141, 224
SGD, see Stochastic gradient descent, 55, 66, 79
SGS, see Selective Gibbs sampling, 188
Sigmoid, 60, 71, 271
Signal, 12, 21, 23
  processing, 20
Similarity, 123, 177, 187, 207
Simple linear regression, 53
Simulated annealing, 188
Simultaneous, 161
Single
  -step, 116
  -step feedforward strategy, 36, 116, 120, 210, 243, 271
  -track polyphony, see Single-voice polyphony, 46, 47
  -voice monophonic melody, 16
  -voice polyphony, 16, 218
Sinusoidal, 23
Size, 47
Slice, see Time slice
Softmax, 45, 46, 71, 271
Solo, 192
Sonata, 187
Sound, 195
Sparse autoencoder, 84, 109, 121, 271
Sparsity, 17, 48, 84, 187
Speaker, 166
Specialization, 109, 246
Spectrogram, 23, 271
Spectrum, 19, 23
Stacked autoencoder, 89, 108, 109, 271
State, 107
Step, 36, 91, 95
Stochastic, 2, 91, 124, 156
  gradient descent, xxviii, 55, 66, 79
  music, 3
Strategy, viii, 12, 13, 114, 115, 156, 271
Stride, 100, 271
Structure, 81, 186, 187, 192, 201
  imposition, 196
Style, 2, 5, 15, 103, 163, 170, 171, 180, 192, 208
  imposition, 177
  transfer, 177, 180, 184, 195, 246, 271
Sub-symbolic, 4, 218
Subsampling, 101
Successor semantic relevance, 194
Summary, 223
Supervised learning, 65, 79, 83, 91, 97, 116, 120, 271
Support vector machine, xxviii, 52, 272
SVM, see Support vector machine, 52, 272
Symbolic, 4, 20, 51, 196
  representation, 19
Symbolic Music dataset, 48
Symmetric, 83, 110, 173
Synthesis, 2
Synthesizer, 2
Synthetic data, 47
System, 11, 115
Tab, 30, 36, 48
Tablature, 30
Tag, 103, 113, 163, 166, 209
Tanh, 60, 272
Target, 65, 177
Task, 2, 11, 246
Tchaikovsky, 170
Templagiarism, 186
Template, 186
Tempo, 34, 41, 136

Temporal
  scope, 36, 219
  sequence, 93, 113
Ternary, 28
Test
  data, 19
  set, 80, 272
Text, 2, 32, 40, 47
  -to-speech, xxviii, 166, 192
Textual representation, 32
Texture, 179
TheoryTab database, 48
Threshold, 59
Tick, 29
Tied, 38
  note, 193, 218
Timbre, 196
  style transfer, 195, 196
Time, 36, 93, 244
  -delay neural network, 102
  frame, 34, 218
  quantization, 48
  signature, 27, 187
  slice, 36, 272
  step, 36, 37, 39, 99, 104, 164, 166, 171, 235, 243, 272
  stretching, 41
Time-Windowed, 139, 224, 241
Timing, 137
Tonality, 186
Top-down, 192
Track, 16, 29, 43, 47
Trade-off, 162
Training, 54, 79, 106, 166, 217, 245
  corpus, 207
  data, 19
  error, 80
  phase, 19
  set, 80, 217, 272
Transcription, 125, 192
Transfer learning, 52, 177, 184, 245, 272
Transformer, 99, 245
Transition, 192, 193
Translation, 1, 2, 99, 111, 244
Transposition, 47
  invariance, 153
Triad, 26, 144
TTS, see Text-to-speech, 166, 192
Tuple, 193
Turing, see Alan Turing
  test, 3, 105, 214, 247, 272
Typology, viii
Unfolded, 93
Unit, 56, 59, 92, 272
  selection strategy, 192, 201, 272
UnitSelection, 224
Universal approximator, 67
Unsupervised learning, 65, 83, 91, 272
Use, 15
User, 192
  -defined, 170
  interface, 124, 215
VAE, see Variational autoencoder, 85, 107, 109, 110, 112, 157, 196, 272
Validation
  data, 19
  set, 80, 82, 272
Value encoding, 42, 45, 46, 272
Vanishing gradient problem, 98, 273
Variable, 19, 53
  length, 111, 147, 243
Variance, 188
Variational, 112
  autoencoder, xxviii, 85, 89, 109, 110, 112, 157, 178, 196, 244, 272
  recurrent autoencoder, xxviii, 158, 202
  recurrent autoencoder supported by history, xxviii, 175
Variational(Autoencoder(RNN, RNN)), 112
Vectorized, 64
Velocity, 29, 38, 41
Video game, 158
Visible unit, 92
Visual, 178
Vocabulary, 33
Vocal, 16
Voice, 16, 43, 47, 147, 213
Volume, 29, 41
VRAE, see Variational recurrent autoencoder, 147, 158, 161, 178, 202, 224, 244
VRASH, see Variational recurrent autoencoder supported by history, 113, 163, 175, 224
Waltz, 27
Waveform, 19, 21, 166, 273
WaveNet, 20, 45, 102–104, 113, 163, 164, 166, 171, 206, 224, 244, 245
Web, 214
Weight, 53, 79, 81, 98, 273
  decay, 81
Weighted sum, 59
Window, 187
Word embedding, 40
Word2Vec, 34, 40
Xenakis, 3

XML, 144
Yamaha e-Piano Competition dataset, 49
Zero-padding, 100, 273