Computational Synthesis and Creative Systems

Series Editors

François Pachet, Paris, France
Pablo Gervás, Madrid, Spain
Andrea Passerini, Trento, Italy
Mirko Degli Esposti, Bologna, Italy

Creativity has become the motto of the modern world: everyone, every institution, and every company is exhorted to create, to innovate, to think out of the box. This calls for the design of a new class of technology, aimed at assisting humans in tasks that are deemed creative.

Developing a machine capable of synthesizing completely novel instances from a certain domain of interest is a formidable challenge for computer science, with potentially ground-breaking applications in fields such as biotechnology, design, and art. Creativity and originality are major requirements, as is the ability to interact with humans in a virtuous loop of recommendation and feedback. The problem calls for an interdisciplinary perspective, combining fields such as machine learning, artificial intelligence, engineering, design, and experimental psychology. Related questions and challenges include the design of systems that effectively explore large instance spaces; evaluating automatic generation systems, notably in creative domains; designing systems that foster creativity in humans; formalizing (aspects of) the notions of creativity and originality; designing productive collaboration scenarios between humans and machines for creative tasks; and understanding the dynamics of creative collective systems.

This book series intends to publish monographs, textbooks and edited books with a strong technical content, and focuses on approaches to computational synthesis that contribute not only to specific problem areas, but more generally introduce new problems, new data, or new well-defined challenges to computer science.

More information about this series at http://www.springer.com/series/15219

Jean-Pierre Briot • Gaëtan Hadjeres • François-David Pachet

Deep Learning Techniques for Music Generation

Jean-Pierre Briot
LIP6, Sorbonne Université, CNRS
Paris, France

Gaëtan Hadjeres
Sony Computer Science Laboratories
Paris, France

François-David Pachet
Spotify Creator Technology Research Lab
Paris, France

ISSN 2509-6575
ISSN 2509-6583 (electronic)
Computational Synthesis and Creative Systems
ISBN 978-3-319-70162-2
ISBN 978-3-319-70163-9 (eBook)
https://doi.org/10.1007/978-3-319-70163-9

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Jean-Pierre Briot dedicates this book to the memory of his late colleague, musician and friend, Les Gasser.

Preface

This book is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis:

• Objective

– What musical content is to be generated? Examples are: melody, polyphony, accompaniment or counterpoint.

– For what destination and for what use? To be performed by human(s) (in the case of a musical score), or by a machine (in the case of an audio file).

• Representation

– What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter and beat.

– What format is to be used? Examples are: MIDI, piano roll or text.

– How will the representation be encoded? Examples are: scalar, one-hot or many-hot.

• Architecture

– What type(s) of deep neural network is (are) to be used? Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks.

• Challenge

– What are the limitations and open challenges? Examples are: variability, interactivity and creativity.

• Strategy

– How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling or input manipulation.

For each dimension, we conduct a comparative analysis of various models and techniques, and we propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described in this book and are used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last part of this book includes some discussion and some prospects. A table of contents, a list of tables, a list of figures, a table of acronyms, a bibliography, a glossary and an index complete this book.

Supplementary material is provided at the following companion web site:

www.briot.info/dlt4mg/

Paris and Rio de Janeiro, Jean-Pierre Briot
Paris, Gaëtan Hadjeres
Paris, François-David Pachet

Acknowledgements

This research was partly conducted within the Flow Machines project which received funding from the European Research Council under the European Union Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. 291156.

The authors thank CNRS, LIP6, Sorbonne Université, Sony CSL and Spotify CTRL for their support and research environments. Jean-Pierre Briot also thanks CAPES, PUC-Rio and UNIRIO for their additional support.

The authors thank Ronan Nugent, senior editor at Springer, for his careful supervision of the whole publishing process.

Jean-Pierre would like to thank his wife Marta for her patience and support during the making of this book.

Contents

1 Introduction
  1.1 Motivation
    1.1.1 Computer-Based Music Systems
    1.1.2 Autonomy versus Assistance
    1.1.3 Symbolic versus Sub-Symbolic AI
    1.1.4 Deep Learning
    1.1.5 Present and Future
  1.2 This Book
    1.2.1 Other Books and Sources
    1.2.2 Other Models
    1.2.3 Deep Learning versus Markov Models
    1.2.4 Requisites and Roadmap
    1.2.5 Limits

2 Method
  2.1 Dimensions
    2.1.1 Objective
    2.1.2 Representation
    2.1.3 Architecture
    2.1.4 Challenge
    2.1.5 Strategy
  2.2 Discussion

3 Objective
  3.1 Facets
    3.1.1 Type
    3.1.2 Destination and Use
    3.1.3 Mode
    3.1.4 Style

4 Representation
  4.1 Phases and Types of Data
  4.2 Audio versus Symbolic
  4.3 Audio
    4.3.1 Waveform
    4.3.2 Transformed Representations
    4.3.3 Spectrogram
    4.3.4 Chromagram
  4.4 Symbolic
  4.5 Main Concepts
    4.5.1 Note
    4.5.2 Rest
    4.5.3 Interval
    4.5.4 Chord
    4.5.5 Rhythm
  4.6 Multivoice/Multitrack
  4.7 Format
    4.7.1 MIDI
    4.7.2 Piano Roll
    4.7.3 Text
    4.7.4 Markup Language
    4.7.5 Lead Sheet
  4.8 Temporal Scope and Granularity
    4.8.1 Temporal Scope
    4.8.2 Temporal Granularity
  4.9 Metadata
    4.9.1 Note Hold/Ending
    4.9.2 Note Denotation (versus Enharmony)
    4.9.3 Feature Extraction
  4.10 Expressiveness
    4.10.1 Timing
    4.10.2 Dynamics
    4.10.3 Audio
  4.11 Encoding
    4.11.1 Strategies
    4.11.2 From One-Hot to Many-Hot and to Multi-One-Hot
    4.11.3 Summary
    4.11.4 Binning
    4.11.5 Pros and Cons
    4.11.6 Chords
    4.11.7 Special Hold and Rest Symbols
    4.11.8 Drums and Percussion
  4.12 Dataset
    4.12.1 Transposition and Alignment
    4.12.2 Datasets and Libraries

5 Architecture
  5.1 Introduction to Neural Networks
    5.1.1 Linear Regression
    5.1.2 Notations
    5.1.3 Model Training
    5.1.4 Gradient Descent Training Algorithm
    5.1.5 From Model to Architecture
    5.1.6 From Model to Linear Algebra Representation
    5.1.7 From Simple to Multivariate Model
    5.1.8 Activation Function
  5.2 Basic Building Block
    5.2.1 Feedforward Computation
    5.2.2 Computing Multiple Input Data Simultaneously
  5.3 Machine Learning
    5.3.1 Definition
    5.3.2 Categories
    5.3.3 Components
    5.3.4 Optimization
  5.4 Architectures
  5.5 Multilayer Neural Network aka Feedforward Neural Network
    5.5.1 Abstract Representation
    5.5.2 Depth
    5.5.3 Output Activation Function
    5.5.4 Cost Function
    5.5.5 Interpretation
    5.5.6 Entropy and Cross-Entropy
    5.5.7 Feedforward Propagation
    5.5.8 Training
    5.5.9 Overfitting
    5.5.10 Regularization
    5.5.11 Hyperparameters
    5.5.12 Platforms and Libraries
  5.6 Autoencoder
    5.6.1 Sparse Autoencoder
    5.6.2 Variational Autoencoder
    5.6.3 Stacked Autoencoder
  5.7 Restricted Boltzmann Machine (RBM)
    5.7.1 Training
    5.7.2 Sampling
    5.7.3 Types of Variables
  5.8 Recurrent Neural Network (RNN)
    5.8.1 Visual Representation
    5.8.2 Training
    5.8.3 Long Short-Term Memory (LSTM)
    5.8.4 Attention Mechanism

  5.9 Convolutional Architectural Pattern
    5.9.1 Principles
    5.9.2 Stages
    5.9.3 Pooling
    5.9.4 Multilayer Convolutional Architecture
    5.9.5 Convolution over Time
  5.10 Conditioning Architectural Pattern
  5.11 Generative Adversarial Networks (GAN) Architectural Pattern
    5.11.1 Challenges
  5.12 Reinforcement Learning
  5.13 Compound Architectures
    5.13.1 Composition Types
    5.13.2 Bidirectional RNN
    5.13.3 RNN Encoder-Decoder
    5.13.4 Variational RNN Encoder-Decoder
    5.13.5 Polyphonic Recurrent Networks
    5.13.6 Further Compound Architectures
    5.13.7 The Limits of Composition

6 Challenge and Strategy
  6.1 Notations for Architecture and Representation Dimensions
  6.2 An Introductory Example
    6.2.1 Single-Step Feedforward Strategy
    6.2.2 Example: MiniBach Chorale Counterpoint Accompaniment Symbolic Music Generation System
    6.2.3 A First Analysis
  6.3 A Tentative List of Limitations and Challenges
  6.4 Ex Nihilo Generation
    6.4.1 Decoder Feedforward
    6.4.2 Sampling
  6.5 Length Variability
    6.5.1 Iterative Feedforward
  6.6 Content Variability
    6.6.1 Sampling
  6.7 Expressiveness
    6.7.1 Example: Performance RNN Piano Polyphony Symbolic Music Generation System
  6.8 RNN and Iterative Feedforward Revisited
    6.8.1 #1 Example: Time-Windowed Melody Symbolic Music Generation System
    6.8.2 #2 Example: Sequential Melody Symbolic Music Generation System
    6.8.3 #3 Example: BLSTM Chord Accompaniment Symbolic Music Generation System

    6.8.4 Summary
  6.9 Melody-Harmony Interaction
    6.9.1 #1 Example: RNN-RBM Polyphony Symbolic Music Generation System
    6.9.2 #2 Example: Hexahedria Polyphony Symbolic Music Generation Architecture
    6.9.3 #3 Example: Bi-Axial LSTM Polyphony Symbolic Music Generation Architecture
  6.10 Control
    6.10.1 Dimensions of Control Strategies
    6.10.2 Sampling
    6.10.3 Conditioning
    6.10.4 Input Manipulation
    6.10.5 Input Manipulation and Sampling
    6.10.6 Reinforcement
    6.10.7 Unit Selection
  6.11 Style Transfer
    6.11.1 Composition Style Transfer
    6.11.2 Timbre Style Transfer
    6.11.3 Performance Style Transfer
    6.11.4 Example: FlowComposer Composition Support Environment
  6.12 Structure
    6.12.1 Example: MusicVAE Multivoice Hierarchical Symbolic Music Generation System
    6.12.2 Other Temporal Architectural Hierarchies
  6.13 Originality
    6.13.1 Conditioning
    6.13.2 Creative Adversarial Networks
  6.14 Incrementality
    6.14.1 Note Instantiation Strategies
    6.14.2 Example: DeepBach Chorale Multivoice Symbolic Music Generation System
  6.15 Interactivity
    6.15.1 #1 Example: deepAutoController Audio Music Generation System
    6.15.2 #2 Example: DeepBach Chorale Symbolic Music Generation System
    6.15.3 Interface Definition
  6.16 Adaptability
  6.17 Explainability
    6.17.1 #1 Example: BachBot Chorale Polyphonic Symbolic Music Generation System
    6.17.2 #2 Example: deepAutoController Audio Music Generation System
    6.17.3 Towards Automated Analysis

  6.18 Discussion

7 Analysis
  7.1 Referencing and Abbreviations
  7.2 System Analysis
  7.3 Correlation Analysis

8 Discussion and Conclusion
  8.1 Global versus Time Step
  8.2 Convolution versus Recurrent
  8.3 Style Transfer and Transfer Learning
  8.4 Cooperation
  8.5 Specialization
  8.6 Evaluation and Creativity
  8.7 Conclusion

References

Glossary

Index

List of Tables

5.1 Relation between output activation function and cost (loss) function

6.1 MiniBach summary
6.2 DeepHearM summary
6.3 deepAutoController summary
6.4 RBMC summary
6.5 BluesC summary
6.6 BluesMC summary
6.7 Examples of PHCCCF pitch representation
6.8 CONCERT summary
6.9 Celtic system summary
6.10 Performance RNN summary
6.11 Time-Windowed summary
6.12 Sequential architecture summary
6.13 BLSTM summary
6.14 RNN-RBM summary
6.15 Hexahedria summary
6.16 Bi-Axial LSTM summary
6.17 VRAE summary
6.18 GLSR-VAE summary
6.19 C-RNN-GAN summary
6.20 Rhythm system summary
6.21 WaveNet summary
6.22 MidiNet summary
6.23 DeepJ summary
6.24 Anticipation-RNN summary
6.25 VRASH summary
6.26 DeepHearC summary
6.27 C-RBM summary
6.28 RL-Tuner summary
6.29 Unit selection summary

6.30 Audio (timbre) style transfer (AST) summary
6.31 MusicVAE summary
6.32 DeepBach summary
6.33 BachBot summary

7.1 Systems referencing . . . 224
7.2 Abbreviations for the types of objective . . . 225
7.3 Abbreviations for the types of representation . . . 226
7.4 Abbreviations for the types of architecture . . . 226
7.5 Abbreviations for the types of challenge . . . 227
7.6 Abbreviations for the types of strategy . . . 227
7.7 Systems summary (1/2) . . . 229
7.8 Systems summary (2/2) . . . 230
7.9 System × Objective . . . 231
7.10 System × Representation (1/2) . . . 232
7.11 System × Representation (2/2) . . . 233
7.12 System × Architecture & Strategy . . . 234
7.13 System × Challenge . . . 235
7.14 Representation × Objective . . . 237
7.15 Architecture & Strategy × Objective . . . 238
7.16 Objective & Architecture & Strategy × Representation . . . 239
7.17 Strategy × Architecture . . . 240
7.18 Architecture & Strategy × Challenge . . . 240

List of Figures

4.1 Example of a waveform . . . 22
4.2 Example of a waveform with a fine grain resolution. Excerpt from a waveform visualization (sound of a guitar) by Michael Jancsy reproduced from “https://plot.ly/~michaeljancsy/205.embed” with permission of the author . . . 22
4.3 Example of a spectrogram of the spoken words “nineteenth century”. Reproduced from Aquegg’s original image at “https://en.wikipedia.org/wiki/Spectrogram” . . . 23
4.4 Examples of chromagrams. (a) Musical score of a C-major scale. (b) Chromagram obtained from the score. (c) Audio recording of the C-major scale played on a piano. (d) Chromagram obtained from the audio recording. Reproduced from Meinard Mueller’s original image at “https://en.wikipedia.org/wiki/Chroma_feature” under a CC BY-SA 3.0 licence . . . 24
4.5 C major chord with an open position/voicing: 1-5-3 (root, 5th and 3rd) . . . 27
4.6 Excerpt from a MIDI file . . . 30
4.7 Score corresponding to the MIDI excerpt . . . 30
4.8 Automated piano and piano roll. Reproduced from Yaledmot’s post “https://www.youtube.com/watch?v=QrcwR7eijyc” with permission of YouTube . . . 31
4.9 Example of symbolic piano roll. Reproduced from [70] with permission of Hao Staff Music Publishing (Hong Kong) Co Ltd. . . . 31
4.10 Score of “A Cup of Tea” (Traditional). Reproduced from The Session [98] with permission of the manager . . . 32
4.11 ABC notation of “A Cup of Tea”. Reproduced from The Session [98] with permission of the manager . . . 33
4.12 Folk-rnn notation of “A Cup of Tea”. Reproduced from [178] with permission of the authors . . . 33
4.13 Lead sheet of “Very Late” (Pachet and d’Inverno). Reproduced with permission of the composers . . . 35


4.14 Temporal scope for a piano roll-like representation . . . 37
4.15 a) Extract from a J. S. Bach chorale and b) its representation using the hold symbol “ ”. Reproduced from [69] with permission of the authors . . . 39
4.16 Various types of encoding . . . 44

5.1 Example and counterexample of linear separability . . . 51
5.2 Example of simple linear regression . . . 54
5.3 Gradient descent . . . 56
5.4 Architectural model of linear regression . . . 57
5.5 Architectural model of multivariate linear regression . . . 58
5.6 Architectural model of multivariate linear regression showing the bias and the weights corresponding to the connexions to the third output . . . 58
5.7 Architectural model of multivariate linear regression with activation function . . . 59
5.8 Sigmoid function . . . 60
5.9 Tanh function . . . 61
5.10 ReLU function . . . 61
5.11 Example of a feedforward neural network (detailed) . . . 67
5.12 Example of feedforward neural network (simplified) . . . 68
5.13 Example of a feedforward neural network (abstract) . . . 68
5.14 (left) GoogLeNet 27-layer deep network architecture. Reproduced from [181] with permission of the authors. (right) ResNet 34-layer deep network architecture. Reproduced from [73] with permission of the authors . . . 70
5.15 Cost functions and interpretation for real and binary values . . . 73
5.16 Cost function and interpretation for a multiclass single label . . . 74
5.17 Cost function and interpretation for a multiclass multilabel . . . 74
5.18 Cost function and interpretation for a multi multiclass single label . . . 75
5.19 -log function . . . 76
5.20 Example of a feedforward neural network (abstract) pipelined computation . . . 79
5.21 Underfit, good fit and overfit models . . . 80
5.22 Autoencoder architecture . . . 83
5.23 Visualization of the input image motives that maximally activate each of the hidden units of a sparse autoencoder architecture. Reproduced from [140] with permission of the author . . . 84
5.24 Various digits generated by decoding sampled latent points at regular intervals on the MNIST handwritten digits database . . . 86
5.25 Comparison of interpolations between the top and the bottom melodies by (left) interpolating in the data (melody) space and (right) interpolating in the latent space and decoding it into melodies. Reproduced from [161] with permission of the authors . . . 88


5.26 A 2-layer stacked autoencoder architecture, resulting in a 4-layer full architecture . . . 89
5.27 Restricted Boltzmann machine (RBM) architecture . . . 90
5.28 Recurrent neural network (folded) . . . 93
5.29 Recurrent neural network (unfolded) . . . 94
5.30 Standard connexions versus recurrent connexions (unfolded) . . . 94
5.31 Recurrent neural network (folded) . . . 95
5.32 Recurrent neural network (unfolded) . . . 96
5.33 Training a recurrent neural network . . . 97
5.34 LSTM architecture (conceptual) . . . 99
5.35 Convolution, filter and feature map. Inspired by Karn’s data science blog post [96] . . . 101
5.36 Pooling. Inspired by Karn’s data science blog post [96] . . . 102
5.37 Convolutional deep neural network architecture. Inspired by Karn’s data science blog post [96] . . . 102
5.38 Conditioning architecture . . . 104
5.39 Generative adversarial networks (GAN) architecture. Reproduced from [157] with permission of O’Reilly Media . . . 105
5.40 Reinforcement learning – conceptual model. Reproduced from [39] with permission of SAGE Publications, Inc./Corwin . . . 107
5.41 Bidirectional RNN architecture . . . 110
5.42 RNN Encoder-Decoder architecture. Inspired from [19] . . . 111
5.43 RNN Encoder-Decoder audio Word2Vec architecture. Reproduced from [25] with permission of the authors . . . 112

6.1 MiniBach architecture . . . 118
6.2 MiniBach architecture and encoding . . . 119
6.3 Example of a chorale counterpoint generated by MiniBach from a soprano melody . . . 119
6.4 DeepHear stacked autoencoder architecture. Extension of a figure reproduced from [179] with permission of the author . . . 122
6.5 Training DeepHear. Extension of a figure reproduced from [179] with permission of the author . . . 122
6.6 Generation in DeepHear. Extension of a figure reproduced from [179] with permission of the author . . . 123
6.7 Samples generated by the RBM trained on J. S. Bach chorales. Reproduced from [11] with permission of the authors . . . 126
6.8 A chord training example for blues generation. Reproduced from [42] with permission of the authors . . . 128
6.9 Blues chord generation architecture . . . 128
6.10 Example of blues generated (excerpt). Reproduced with permission of the authors . . . 130
6.11 Sampling the softmax output . . . 131
6.12 CONCERT PHCCCF pitch representation. Inspired by [172] and [138] . . . 132


6.13 CONCERT duration representation. Inspired by [138] . . . 133
6.14 CONCERT architecture. Reproduced from [138] with permission of Taylor & Francis (www.tandfonline.com) . . . 134
6.15 Example of melody generation by CONCERT based on the J. S. Bach training set. Reproduced from [138] with permission of Taylor & Francis (www.tandfonline.com) . . . 135
6.16 Score of “The Mal’s Copporim” automatically generated. Reproduced from [178] with permission of the authors . . . 136
6.17 Example of Performance RNN representation. Reproduced from [173] with permission of the authors . . . 138
6.18 Time-Windowed architecture. Inspired from [189] . . . 140
6.19 Sequential architecture. Inspired from [189] . . . 142
6.20 Examples of melodies generated by the Sequential architecture. (o) Original plan melody learnt. (e1 and e2) Melodies generated by extrapolating from a new plan melody. Inspired from [189] . . . 142
6.21 Examples of melodies generated by the Sequential architecture. (oA and oB) Original plan melodies learnt. (i1 and i2) Melodies generated by interpolating between oA plan and oB plan melodies. Inspired from [189] . . . 143
6.22 Example of extracted data from a single measure. Reproduced from [119] under a CC BY 4.0 licence . . . 144
6.23 BLSTM architecture. Reproduced from [119] under a CC BY 4.0 licence . . . 145
6.24 Comparison of generated chord progressions (HMM, DNN-HMM, BLSTM and original). Reproduced from [119] under a CC BY 4.0 licence . . . 146
6.25 RNN-RBM architecture. Reproduced from [12] with permission of the authors . . . 148
6.26 Example of a sample generated by RNN-RBM trained on J. S. Bach chorales. Reproduced from [12] with permission of the authors . . . 150
6.27 Hexahedria architecture (folded). Reproduced from [93] with permission of the author . . . 151
6.28 Hexahedria architecture (unfolded). Reproduced from [93] with permission of the author . . . 152
6.29 Bi-Axial LSTM architecture. Reproduced from [126] with permission of the authors . . . 154
6.30 Example of Bi-Axial LSTM generated music (excerpt). Reproduced from [94] . . . 155
6.31 Visualization of the VRAE latent space encoded data. Extended from [49] with permission of the authors . . . 159
6.32 Visualization of GLSR-VAE latent space encoded data. Reproduced from [68] with permission of the authors . . . 160
6.33 Examples of 2 measures long melodies (separated by double bar lines) generated by GLSR-VAE. Reproduced from [68] with permission of the authors . . . 160


6.34 C-RNN-GAN architecture. Reproduced from [134] with permission of the authors . . . 161
6.35 C-RNN-GAN generated example (excerpt). Reproduced from [134] with permission of the authors . . . 162
6.36 Rhythm generation architecture. Reproduced from [122] with permission of the authors . . . 165
6.37 Example of a rhythm pattern generated. The five lines of the piano roll correspond (downwards) to: kick, snare, toms, hi-hat and cymbals. Reproduced from [122] with permission of the authors . . . 165
6.38 Example of a rhythm pattern generated with a specific bass line as the conditioning input. Reproduced from [122] with permission of the authors . . . 165
6.39 WaveNet architecture. Reproduced from [193] with permission of the authors . . . 167
6.40 MidiNet architecture. Reproduced from [211] with permission of the authors . . . 168
6.41 Architecture of the MidiNet generator. Reproduced from [211] with permission of the authors . . . 169
6.42 DeepJ architecture. Reproduced from [126] with permission of the authors . . . 172
6.43 Example of baroque music generated by DeepJ. Reproduced from [126] with permission of the authors . . . 172
6.44 Visualization of DeepJ embedding space. Extended from [126] with permission of the authors . . . 173
6.45 Anticipation-RNN architecture. Reproduced from [67] with permission of the authors . . . 174
6.46 Examples of melodies generated by Anticipation-RNN. Reproduced from [67] with permission of the authors . . . 175
6.47 VRASH architecture. Reproduced from [188] with permission of the authors . . . 176
6.48 VRASH architecture with a focus on the decoder. Extended from [188] with permission of the authors . . . 176
6.49 Deep Dream architecture (conceptual) . . . 179
6.50 Deep Dream. Example of a higher-layer unit maximization transformation. Created by Google’s Deep Dream. Original picture: Abbey Road album cover, Beatles, Apple Records (1969). Original photograph by Iain Macmillan . . . 180
6.51 Deep Dream architecture focusing on a lower-level unit . . . 181
6.52 Deep Dream. Example of a lower-layer unit maximization transformation. Reproduced from [136] under a CC BY 4.0 licence. Original photograph by Zachi Evenor . . . 182
6.53 Style transfer full architecture/process. Extension of a figure reproduced from [55] with permission of the authors . . . 183
6.54 Tubingen’s Neckarfront. Photograph by Andreas Praefcke. Reproduced from [55] with permission of the authors . . . 183


6.55 Style transfer of “The Starry Night” by Vincent van Gogh (1889) on Tubingen’s Neckarfront photograph. Reproduced from [55] with permission of the authors . . . 184
6.56 Style transfer of “The Shipwreck of the Minotaur” by J. M. W. Turner (1805) on Tubingen’s Neckarfront photograph. Reproduced from [55] with permission of the authors . . . 185
6.57 Variations on the style transfer of “Composition VII” by Wassily Kandinsky (1913) on Tubingen’s Neckarfront photograph. Reproduced from [55] with permission of the authors . . . 186
6.58 C-RBM architecture. Reproduced from [108] with permission of the authors . . . 188
6.59 Piano roll sample generated by C-RBM. Reproduced with permission of the authors . . . 188
6.60 RL-Tuner architecture. Reproduced from [92] with permission of the authors . . . 190
6.61 Evolution during training of the two types of rewards for the RL-Tuner architecture. Reproduced from [92] with permission of the authors . . . 191
6.62 Unit selection indexing architecture. Reproduced from [13] with permission of the authors . . . 193
6.63 Unit selection based on semantic cost. Reproduced from [13] with permission of the authors . . . 194
6.64 Anisotropic music vs an isotropic image. Incorporating Aquegg’s original image from “https://en.wikipedia.org/wiki/Spectrogram” and the painting “The Starry Night” by Vincent van Gogh (1889) . . . 198
6.65 Yesterday (Lennon/McCartney) (first 15 measures) – original harmonization. Reproduced from [152] with permission of the authors . . . 199
6.66 Yesterday (Lennon/McCartney) (first 15 measures) – reharmonization by FlowComposer in the style of Michel Legrand. Reproduced from [152] with permission of the authors . . . 200
6.67 Yesterday (Lennon/McCartney) (first 15 measures) – reharmonization by FlowComposer in the style of Bill Evans. Reproduced from [152] with permission of the authors . . . 200
6.68 Flow Composer control panel. Reproduced from [152] with permission of the authors . . . 200
6.69 Example of a Flow Composer interactively generated lead sheet. Reproduced from [149] with permission of the authors . . . 201
6.70 MusicVAE architecture. Reproduced from [162] with permission of the authors . . . 203
6.71 Example of a trio music generated by MusicVAE. Reproduced from [161] with permission of the authors . . . 204
6.72 Example of a melody generated (middle) by MusicVAE by averaging the latent spaces of two melodies (top and bottom). Reproduced from [162] with permission of the authors . . . 205


6.73 Example of a melody generated (bottom) by MusicVAE by adding a “high note density” attribute vector to the latent space of an existing melody (top). Reproduced from [161] with permission of the authors . . . 205
6.74 Correlation matrices of the effect of adding (left) or subtracting (right) an attribute to other attributes in MusicVAE. Reproduced from [162] with permission of the authors . . . 206
6.75 Creative adversarial networks (CAN) architecture. Reproduced from [44] with permission of the authors . . . 208
6.76 Examples of images generated by CAN. Reproduced from [44] with permission of the authors . . . 209
6.77 Note generation/instantiation – three main strategies . . . 211
6.78 DeepBach architecture. Reproduced from [69] with permission of the authors . . . 212
6.79 DeepBach incremental generation/sampling algorithm . . . 214
6.80 Example of a chorale generated by DeepBach. Reproduced from [69] with permission of the authors . . . 214
6.81 Snapshot of a deepAutoController information window showing hidden units. Reproduced from [169] with permission of the authors . . . 215
6.82 DeepBach user interface. Reproduced from [69] with permission of the authors . . . 216
6.83 Example of score encoding in BachBot. Reproduced from [118] . . . 219
6.84 Correlation analysis of BachBot layer/unit activation. Reproduced from [118] . . . 220

Acronyms

AF Activation function
AI Artificial intelligence
ANN Artificial neural network
AST Audio style transfer
BALSTM Bi-Axial LSTM
BLSTM Bidirectional LSTM
BOW Bag-of-words
BPTT Backpropagation through time
CAN Creative adversarial networks
CBR Case-based reasoning
CNN Convolutional neural network
ConvNet Convolutional neural network
C-RBM Convolutional restricted Boltzmann machine
CS Constrained sampling
dBFS Decibel relative to full scale
EMI Experiments in musical intelligence
FFT Fast Fourier transform
FM Frequency modulation
GAN Generative adversarial networks
GD Gradient descent
GLO Generative latent optimization
GLSR Geodesic latent space regularization
GPU Graphics processing unit
GRU Gated recurrent unit
GS Gibbs sampling
HMM Hidden Markov model
KL-divergence Kullback-Leibler divergence
LSDB Lead sheet data base
LSTM Long short-term memory
MFCC Mel-frequency cepstral coefficients
MIDI Musical instrument digital interface


MIR Music information retrieval
ML Machine learning
MLP Multilayer Perceptron
MNIST Modified National Institute of Standards and Technology
MSE Mean squared error
NLP Natural language processing
NN Neural network
NTM Neural Turing machine
PCA Principal component analysis
RBM Restricted Boltzmann machine
ReLU Rectified linear unit
RHN Recurrent highway network
RL Reinforcement learning
RNN Recurrent neural network
SGD Stochastic gradient descent
SGS Selective Gibbs sampling
SVM Support vector machine
TTS Text-to-speech
VAE Variational autoencoder
VRAE Variational recurrent autoencoder
VRASH Variational recurrent autoencoder supported by history
