

BacHMMachine: An Interpretable and Scalable Model for Algorithmic Harmonization for Four-part Baroque Chorales

Yunyao Zhu, Stephen Hahn, Simon Mak, Yue Jiang, Cynthia Rudin

Duke University

Abstract

Algorithmic harmonization – the automated harmonization of a musical piece given its melodic line – is a challenging problem that has garnered much interest from both music theorists and computer scientists. One genre of particular interest is the four-part Baroque chorales of J.S. Bach. Methods for algorithmic chorale harmonization typically adopt a black-box, "data-driven" approach: they do not explicitly integrate principles from music theory but rely on a complex learning model trained with a large amount of chorale data. We propose instead a new harmonization model, called BacHMMachine, which employs a "theory-driven" framework guided by music composition principles, along with a "data-driven" model for learning compositional features within this framework. As its name suggests, BacHMMachine uses a novel Hidden Markov Model based on key and chord transitions, providing a probabilistic framework for learning key modulations and chordal progressions from a given melodic line. This allows for the generation of creative, yet musically coherent chorale harmonizations; integrating compositional principles allows for a much simpler model that results in vast decreases in computational burden and greater interpretability compared to state-of-the-art algorithmic harmonization methods, at no penalty to quality of harmonization or musicality. We demonstrate this improvement via comprehensive experiments and Turing tests comparing BacHMMachine to existing methods.

Chorale harmonization is the task of generating a musically appropriate harmony given a melody as the input. In this work, we focus on Baroque-style chorale harmonizations in four voices (soprano, alto, tenor, and bass); the given melody is a pre-existing soprano voice, and the harmony arises from the interaction of the four melodic lines. While Baroque-style chorale harmonization is a creative endeavor, it also follows formal rules set by musical convention. For instance, Gradus ad Parnassum (Fux 1725) is an early treatise on species counterpoint codifying Palestrina's Renaissance style. Subject to such principles, composers may write coherent harmonizations in any number of ways, and the choice among them often reflects a composer's creative signature.

In recent years, algorithmic harmonization has garnered notable interest from both music theorists and computer scientists. Much of the existing literature adopts a "data-driven" approach: chorales are first converted to training data, then used within state-of-the-art machine learning models for harmonization. Such methods are typically "black-box," ignoring underlying compositional principles from music theory (e.g., the DeepBach model of Hadjeres, Pachet, and Nielsen 2017 or the BachBot model of Liang et al. 2017). We provide a more comprehensive review of existing harmonization models in a later section.

Existing methods have important limitations. Because such methods use complex models for learning chorale features, they typically do not embed the guiding principles that underlie chorale composition. The expectation is that such models will learn these rules from data, but this learning is not perfect, and results often lack the musical coherence present in human-written harmonizations. This is particularly salient for Baroque chorales due to their highly organized harmonic structure. Furthermore, by ignoring structure provided by compositional principles, such harmonization models require a large amount of training data to learn this structure, not to mention the idiosyncratic characteristics of given composers. For certain use cases, the training sample size needed for satisfactory model training may not even be available, resulting in unsatisfactory performance. Even when such data are available, training the harmonization model with large datasets can be computationally expensive, error-prone, and difficult to troubleshoot or tune.

In our work, we adopt a "theory-driven" learning model which emulates the process an expert musician may use for harmonization (Andrews and Sclater 1993). For Baroque chorales, this harmonization process typically involves (i) generating the tonal and harmonic progressions from the given melodic line and (ii) using the progressions to generate voice-leading for the remaining voices. To mimic Step (i), our proposed model learns the relationship between the given melody and the local tonalities (or keys) of the chorale, as well as the relationship between the melody and the underlying harmonic progression. The tonal and harmonic progressions can then be efficiently inferred via either Viterbi decoding (Viterbi 1967) or posterior decoding (Russell and Norvig 2002). Once the "backbone" of the chorale (its tonal and harmonic progressions) is generated, we mimic Step (ii) by building a probabilistic model for harmonization under the inferred progression, subject to Baroque compositional guidelines.

Figure 1: Annotated excerpt of J.S. Bach's chorale harmonization, Jesu, deine tiefen Wunden, BWV 194/6.

Our model, which we call BacHMMachine, provides an efficient, theory-guided, interpretable, easy-to-tune approach to the chorale harmonization problem. Human experiments suggest a preference for BacHMMachine over the best black-box harmonization methods. The proposed method also yields great computational savings compared to the state of the art: ∼30x faster than Google's Coconet (Dinculescu et al. 2019) in generating chorales and ∼1,000x faster than the approach of Allan and Williams (2004). Furthermore, Turing tests suggest a surprisingly good ability to generate convincing chorales. Finally, as we can interpret the model directly, we are able to gain insights into music composition that cannot be obtained using existing methods.

Musical Background

We study in this work a specific genre of music: the four-part Baroque chorales of J.S. Bach. Figure 1 shows an excerpt of one such chorale. Here, the soprano (top) voice line in a chorale is the melody of the chorale, often taken from pre-existing hymn tunes. This melody can be interpreted as the "horizontal" aspect of a piece of music. On the other hand, harmony involves the relationship between the notes of the chorale sung simultaneously in different voices. In this sense, harmony is often viewed as the "vertical" aspect of music. Music theorists (see, e.g., Rameau 1722) typically classify vertical harmony using Roman numeral notation (e.g., I, IV, V), with different numerals describing unique structures. For instance, "I" represents triadic harmony built on the tonic scale degree (the first note of the scale), while "V7" represents a "seventh" chord built on the dominant scale degree (the fifth note of the scale). An example is provided in Figure 1.

Music of the Baroque era usually follows a harmonic framework known as the "phrase model" (Laitz 2016; White and Quinn 2018; see Figure 2). This framework dictates that harmonies should progress from those functioning as tonic (I, VI, III) to those functioning as predominant (II, IV, VI) or dominant (V, VII). Predominants lead to dominants, and dominants resolve back to tonics. Within each functional category (tonic, predominant, or dominant), harmonies tend to progress by root relations of a descending third or fifth. Harmonic progressions that go against the phrase model are known as retrogressive and rarely occur in Baroque music. Chorales that satisfy the phrase model yield musically cohesive harmonic progressions; those that do not may sound improperly resolved, and are musically displeasing.

Figure 2: Visualizing the phrase model harmonic framework (Laitz 2016). Numerals may be major, minor, or diminished.
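To make the framework concrete, the phrase model can be encoded as an allowed-transition map. The sketch below is our own Python illustration of Figure 2, with hypothetical names; it is not code from the paper:

```python
# Our own illustrative encoding of the phrase model (Figure 2).
# Functional categories: tonic (T), predominant (PD), dominant (D).
NEXT_OK = {"T": {"T", "PD", "D"}, "PD": {"PD", "D"}, "D": {"D", "T"}}

# Roman numerals mapped to their possible functions (VI is dual-function per the text).
FUNCS = {"I": {"T"}, "III": {"T"}, "VI": {"T", "PD"},
         "II": {"PD"}, "IV": {"PD"}, "V": {"D"}, "VII": {"D"}}

def is_retrogressive(a: str, b: str) -> bool:
    """True if no functional reading of the progression a -> b fits the phrase model."""
    return not any(fb in NEXT_OK[fa] for fa in FUNCS[a] for fb in FUNCS[b])

assert is_retrogressive("V", "IV")       # dominant back to predominant: retrogressive
assert not is_retrogressive("II", "V")   # predominant to dominant: follows the model
```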

Another important aspect of chorale composition is the relationship between local tonalities. Tonality refers to a musical passage's centricity around a single tone, where other tones in the musical environment are hierarchically related to the central pitch. Temporary transitions from one tonality (or key) to another, or modulations, are widely used to provide a more engaging and complex harmonic structure. Like surface-level harmonic progressions, deeper tonal shifts in music also usually follow the phrase model. The sequence of tonal shifts in a chorale, or its key progression, is integral to a chorale's musical signature. Figure 1 shows the modulation from the tonic key of B-flat major to F major, then back to B-flat major.

Additional rules must be followed for pleasing and convincing compositions. For instance, there are rules regarding chord inversions in the bass voice, and principles prescribing which chord notes can be doubled or omitted. To ensure a full sound, the soprano and alto voices and the alto and tenor voices should stay within an octave of each other. Notes in an upper voice should be relatively stable, with sparing use of melodic leaps. Parallel fifths and octaves are strongly discouraged, as they erode voice independence and are distracting to the listener. Non-chord tones (such as suspensions, passing, and neighboring tones) may be added to smooth voice leading and add rhythmic diversity (see Figure 1).

Existing Harmonization Methods

Existing methods can be broadly grouped into those relying on Markovian models and those based on deep learning. Yi and Goldsmith (2007) use a factored Markov decision process planner to generate chordal progressions based on an input melody. However, their resulting harmonization (see Figure 3) has serious flaws from a musical perspective. There are parallel octaves, unusual note doublings, dissonant leaps in the bass, awkward voice spacing and chord inversions (e.g., ending on I64), and retrogressive harmonic progressions, which result in musically displeasing harmonies.

Figure 3: Violations of compositional principles for the chorale harmonization in Yi and Goldsmith (2007).

Kaliakatsos-Papakostas and Cambouropoulos (2014) introduce a hierarchical modeling approach using a hidden Markov model (HMM), with user-specified fixed chords as "anchors" and generation of the chords connecting them. Using a second HMM, they produce the bass voice given the chordal progression produced by the first model. Although introducing chord constraints is musically interesting, the reliance on human experts to manually insert fixed chords makes the harmonization process less flexible. Additionally, their study fails to incorporate non-chord tones and does not always adhere to Baroque composition principles.

Allan and Williams (2004) propose a more musically agnostic approach that uses HMMs to generate chordal progressions. A first HMM is used to generate chord notes and a second HMM adds non-chord notes decorating the harmonization. One drawback is its overly large search space, requiring over 2,800 hidden states for chorale representation! One reason is that, for this model, hidden states represent the unique sequence of intervals from the bass note. As such, there are many musically redundant hidden states that represent the same chord with different transpositions. In contrast, we adopt a more musically-informed (and much smaller) state space which captures tonality and chordal structure.

Some notable studies in chorale harmonization use deep learning methods. HARMONET (Hild, Feulner, and Menzel 1991) is a hierarchical architecture containing five neural nets determining chordal progression, chord inversions, the bass note, and non-chord notes. Two recent approaches using neural networks are DeepBach and BachBot. DeepBach (Hadjeres, Pachet, and Nielsen 2017) is a graphical model that generates four-part chorales using recurrent neural networks, while BachBot (Liang et al. 2017) is an automatic composition system that uses a deep long short-term memory model. Both approaches are supposedly agnostic as they "rely on little musical knowledge." Likely because of this, generated harmonizations from such approaches often violate important composition guidelines. Furthermore, these deep neural networks require a large training set and are less interpretable and harder to tune than HMMs.

BacHMMachine

What distinguishes our approach from existing work is the emphasis on emulating harmonization procedures of trained musicians that incorporate Baroque-style chorale composition conventions. Our model includes three important novel elements: key modulation, data representation of each chord by its Roman numeral (including inversions), and strictly following chord inversion and voice-crossing constraints.

Modulation: Existing methods generally focus only on chord transitions, with chords assumed to belong to only one key throughout the entire chorale. However, chord transitions also depend on their location within the chorale and on the presence of modulation; existing methods thus fail to capture these musically important nuances, overlooking the prevalence and importance of key modulation in Bach chorales. To produce an authentic-sounding harmonization, it is crucial to take into account key transitions in addition to chord transitions.

Chord-Equivalence Representation: Many existing methods treat any concurrent vertical combination of notes as a unique chord. However, functionally identical chords might occur in different forms (e.g., inversions, transpositions). To reduce the dimensionality of the emission and transition matrices, we treat these different forms as one, as is typical in chorale analysis. We consider only 31 unique chords in our training set and transpose all chorales to C major or A minor, greatly reducing search space and training time. We emphasize the underlying structure (chordal progression and key modulation) rather than the absolute pitches of notes.
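As a small illustration of this normalization (our own sketch, not the paper's code), transposition to C major amounts to shifting MIDI pitches by the tonic's pitch class:

```python
def transpose_to_c(midi_notes, tonic_pitch_class):
    """Shift a melody down by its tonic's pitch class, mapping the tonic to C."""
    return [m - tonic_pitch_class for m in midi_notes]

# A B-flat major fragment (tonic pitch class 10) normalized to C major:
transpose_to_c([70, 74, 75, 74, 72, 70], 10)   # -> [60, 64, 65, 64, 62, 60]
```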

Harmonization Constraints: We impose basic harmonization constraints when specifying each voice line. At the chord level, we strictly follow the chord inversion and voice crossing constraints: combinations of notes violating them are not considered in the final harmonization. At the chorale level, once we generate possible harmonizations for the input melody, we check them for potential violations of composition guidelines and assign a penalty whenever a violation (e.g., voice crossing between chords, large jumps, parallel fifths/octaves) occurs. We output the harmonization with the lowest overall penalty.

Key-Chord Hidden Markov Model

The first step in our harmonization framework is to infer plausible key and chordal progressions given an input melody. This is achieved by a novel Key-Chord HMM model which integrates tonality and chordal structure to achieve efficient, scalable, and interpretable harmonization.

Let $M = (m_1, \ldots, m_n)$ be the given melody line, with $m_t$ the melody note at time $t$ (i.e., on the $t$-th beat). This can be seen as the sequence of visible states for the HMM. Let $K = (k_1, \ldots, k_n)$ be the hidden key progression capturing the modulation of the chorale, with $k_t \in \mathcal{K}$ the key at time $t$, where $\mathcal{K}$ is the state space of 24 keys (12 major, 12 minor). Let $C = (c_1, \ldots, c_n)$ be its hidden chordal progression, with $c_t \in \mathcal{C}$ the chord quality at time $t$, where $\mathcal{C}$ is the state space of 31 chords. We aim to recover (or decode) the sequences of keys and chords $K$ and $C$ from the melody $M$.

We build the Key-Chord HMM in two stages, first for the key progression, then for the chordal progression. For the key sequence $K$, we impose the following first-order Markovian model on transition probabilities:

$P(k_{t+1} \mid k_1, \ldots, k_t) = P(k_{t+1} \mid k_t) =: T^K_{k_t, k_{t+1}},$    (1)

for $t = 1, \ldots, n$. Here, $T^K_{k_t, k_{t+1}} := P(k_{t+1} \mid k_t)$ denotes the transition probability from key $k_t$ to key $k_{t+1}$, which we estimate using chorale data. This first-order Markovian assumption is standard for HMMs, and can be justified by the earlier phrase model chordal structure. Given key $k_t$, we assume that $m_t$, the melody note at time $t$, depends only on $k_t$, i.e.:

$P(m_t \mid k_1, \ldots, k_t, m_1, \ldots, m_{t-1}) = P(m_t \mid k_t) =: E^K_{k_t, m_t},$    (2)

for $t = 1, \ldots, n$. Here, $E^K_{k_t, m_t}$ denotes the emission probability of melody note $m_t$ from key $k_t$. This is again a standard HMM assumption, justifiable by the earlier discussion that the melody line can be well-characterized by its underlying tonality and chord quality.

Next, for the chord sequence $C$, we presume that the key sequence has already been decoded from data (call this inferred sequence $K^*$; more on this in the next subsection). We again adopt a first-order Markovian model for transition probabilities:

$P(c_{t+1} \mid c_1, \ldots, c_t) = P(c_{t+1} \mid c_t) =: T^C_{c_t, c_{t+1}},$    (3)

for $t = 1, \ldots, n$. Here, $T^C_{c_t, c_{t+1}} := P(c_{t+1} \mid c_t)$ denotes the transition probability from chord $c_t$ to chord $c_{t+1}$, which we estimate from data. This can again be reasoned by the earlier phrase model. Given the inferred key $k^*_t$ and chord $c_t$, we then assume that the transposed melody note $\delta_t = m_t - k^*_t$ (i.e., modulo key change) follows the model:

$P(\delta_t \mid c_1, \ldots, c_t, \delta_1, \ldots, \delta_{t-1}) = P(\delta_t \mid c_t) =: E^C_{c_t, \delta_t},$    (4)

for $t = 1, \ldots, n$. This leverages the observation that similar harmonic structures are used over different tonalities. Figure 4 visualizes the Key-Chord HMM model: the observed states are the input soprano melody line and the hidden states are the underlying keys and chords.

Figure 4: Key-Chord HMM visualization.

In practice, both the emission probabilities $E^K$ and $E^C$, as well as the key and chord transition probabilities $T^K$ and $T^C$, must be estimated from chorale training data. We adopt the following hybrid estimation approach. First, to ensure the harmonization does not violate progressions from the phrase model, we set the probabilities of retrogressive chord transitions (i.e., those violating the phrase model) to be near zero. The remaining parameters are then estimated from the training data using maximum likelihood estimation (Casella and Berger 2021). This ensures our model not only generates musically coherent chordal progressions in line with compositional principles, but also permits us to learn a composer's creative style under such constraints. Our proposed model requires substantially fewer parameters than existing HMM harmonization models (specifically Allan and Williams 2004, which requires estimation of over $2{,}800^2$ transition probabilities). As shown later, this yields a computationally efficient and interpretable harmonization model, competitive with state-of-the-art models in terms of harmonization quality.
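To illustrate this estimation step concretely, the following sketch (our own Python with illustrative names, not the authors' released code) estimates a transition matrix by normalized bigram counts after forcing retrogressive transitions to near zero:

```python
import numpy as np

N_KEYS, N_CHORDS = 24, 31   # 12 major + 12 minor keys; 31 unique chords

def estimate_transitions(counts, retrogressive_mask, eps=1e-8):
    """MLE of a transition matrix from bigram counts, with transitions that
    violate the phrase model forced to near-zero probability before normalizing."""
    counts = counts.astype(float) + eps       # smooth so every row can normalize
    counts[retrogressive_mask] = eps          # near zero, per the estimation scheme
    return counts / counts.sum(axis=1, keepdims=True)

# counts[i, j] = number of observed chord i -> chord j transitions in training data;
# mask[i, j] = True wherever i -> j is retrogressive under the phrase model.
counts = np.random.randint(0, 50, size=(N_CHORDS, N_CHORDS))   # stand-in data
mask = np.zeros((N_CHORDS, N_CHORDS), dtype=bool)              # filled via phrase model
T_C = estimate_transitions(counts, mask)                       # rows sum to 1
```

The key transition matrix $T^K$ would be estimated the same way, and the emission matrices $E^K$ and $E^C$ follow analogously from key-note and chord-note co-occurrence counts.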

Algorithm 1: Key-Chord Viterbi decoding

Viterbi decoding for keys:
• Set $V^K_0(0) \leftarrow 1$ and $V^K_k(0) \leftarrow 0$ for all $k \in \mathcal{K}$.
• For $t = 0, \ldots, n-1$, update for all $k \in \mathcal{K}$:
  $V^K_k(t+1) \leftarrow \max_{i \in \mathcal{K}} \{ V^K_i(t) \, E^K_{k, m_{t+1}} \, T^K_{i,k} \}$.
• Set $K^*$ as the key sequence achieving $\max_{i \in \mathcal{K}} V^K_i(n)$.

Viterbi decoding for chords:
• Set $V^C_0(0) \leftarrow 1$ and $V^C_c(0) \leftarrow 0$ for all $c \in \mathcal{C}$.
• For $t = 0, \ldots, n-1$, update for all $c \in \mathcal{C}$:
  $V^C_c(t+1) \leftarrow \max_{i \in \mathcal{C}} \{ V^C_i(t) \, E^C_{c, m_{t+1} - k^*_{t+1}} \, T^C_{i,c} \}$.
• Set $C^*$ as the chord sequence achieving $\max_{i \in \mathcal{C}} V^C_i(n)$.

Inferring Hidden Keys and Chords: We employ two approaches for inferring the underlying hidden key-chord sequence from the Key-Chord HMM.

Viterbi Decoding: The Viterbi decoding algorithm (Viterbi 1967) is a popular dynamic programming method for inferring hidden states in HMMs and is widely used in signal processing, natural language processing (Jurafsky 2000), and other fields. Here, a two-step implementation of the Viterbi algorithm allows for efficient inference of the underlying key and chord sequences.

Given the melody line $M$, the key inference problem can be formulated as

$K^* \in \arg\max_{K} P(K \mid M).$    (5)

Here, $P(K \mid M)$ is the posterior probability of a certain key sequence $K$ given the melody line $M$ under the Key-Chord HMM. This optimization, however, involves $|\mathcal{K}|^n$ variables, which can be high-dimensional. The Viterbi algorithm provides an efficient way to solve this optimization problem via dynamic programming. In our implementation, we used the Viterbi decoding function in the Python package hmmlearn (Lebedev et al. 2021). Similarly, given the melody line $M$ and the inferred key sequence $K^*$, the chord inference problem can be formulated as

$C^* \in \arg\max_{C} P(C \mid M - K^*).$    (6)

This can again be efficiently solved via the Viterbi algorithm, with the observed states now taken to be the transposed melody $M - K^*$. Algorithm 1 outlines this two-stage Viterbi algorithm for inferring the underlying key-chord sequence $(K^*, C^*)$.
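For concreteness, here is a minimal sketch of one decoding stage using hmmlearn, assuming a recent release (≥ 0.3, where the discrete-emission model is CategoricalHMM; older versions name it MultinomialHMM) and parameters estimated as described above:

```python
import numpy as np
from hmmlearn import hmm

def viterbi_decode(T, E, start, observations):
    """Viterbi-decode a hidden state sequence from integer-coded observations."""
    model = hmm.CategoricalHMM(n_components=T.shape[0])
    model.startprob_, model.transmat_, model.emissionprob_ = start, T, E
    obs = np.asarray(observations).reshape(-1, 1)    # hmmlearn expects shape (n, 1)
    _, states = model.decode(obs, algorithm="viterbi")
    return states

# Stage 1 decodes keys from the melody; Stage 2 reruns the same call on the
# transposed melody (m_t - k_t*, index bookkeeping omitted) with T_C and E_C.
```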

Posterior Decoding: Posterior decoding provides an alternate approach for hidden state inference. Instead of finding the key and chord sequences that maximize the joint posterior probabilities $P(K \mid M)$ and $P(C \mid M)$, posterior decoding finds the key-chord combination $(k_t, c_t)$ that maximizes the marginal posterior probabilities $P(k_t \mid M)$ and $P(c_t \mid M)$ at each time $t$. As with the earlier key-chord Viterbi decoding, an analogous two-step procedure can be used for key-chord posterior decoding, by performing the forward-backward algorithm (an efficient posterior decoding algorithm; see Russell and Norvig 2002) first for keys, then for chords. Both the Viterbi algorithm and posterior decoding are widely used for HMM decoding, and we employ both in our experiments.
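Under the same parameterization, posterior decoding can be sketched with hmmlearn's forward-backward marginals (again an illustration, not the authors' code):

```python
def posterior_decode(model, obs):
    """At each beat, pick the hidden state maximizing the marginal posterior.

    model: a parameterized hmmlearn model, as constructed in viterbi_decode above;
    obs:   integer-coded observations of shape (n, 1).
    """
    gamma = model.predict_proba(obs)   # forward-backward marginals, shape (n, n_states)
    return gamma.argmax(axis=1)        # argmax of P(state_t | M) for each t
```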

Harmonization Model

In the second step, we use the inferred key-chord progression to generate the harmonization. From music theory, each chord admits a limited number of note arrangements over the four voices, and is subject to certain constraints that ensure a musically coherent harmonization. We embed the following constraints within our model (constraints 1-3 are illustrated in the sketch after the list):

1. No voice crossings: the harmonization must maintain the Soprano-Alto-Tenor-Bass (SATB) order vertically.

2. Voice lines should stay in their vocal range: alto notes should be between F3-D5, tenor notes between B2-G4, and bass notes between E2-C4.

3. Voice spacing: the soprano and alto voices should not be more than an octave apart; the same should also hold for the alto and tenor voices.

4. Chord structure: chords should obey their specific inversion and note doubling guidelines; see Andrews and Sclater (1993) for a detailed discussion of such rules.
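A sketch of constraints 1-3 as a per-beat predicate on MIDI pitches (our own illustration; the MIDI equivalents of the stated ranges are F3 = 53, D5 = 74, B2 = 47, G4 = 67, E2 = 40, C4 = 60):

```python
def satisfies_vertical_constraints(soprano, alto, tenor, bass):
    """Check constraints 1-3 for one beat's voicing, given MIDI pitches."""
    no_crossing = soprano >= alto >= tenor >= bass               # 1. SATB order
    in_range = (53 <= alto <= 74) and (47 <= tenor <= 67) \
               and (40 <= bass <= 60)                            # 2. vocal ranges
    spacing = (soprano - alto) <= 12 and (alto - tenor) <= 12    # 3. within an octave
    return no_crossing and in_range and spacing
```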

Given the generated progression, we enumerate chord arrangements $a^{[1]}_t, a^{[2]}_t, \ldots$ that satisfy the above vertical constraints at time $t$. A chord arrangement $a_t \in \mathbb{R}^3$ represents the alto, tenor, and bass notes. Given a selected chord arrangement $a_t$, we iterate through all possible consecutive chord arrangement pairs $(a_t, a^{[1]}_{t+1}), (a_t, a^{[2]}_{t+1}), \ldots$ and select the $a_{t+1}$ that minimizes the Euclidean distance between the consecutive chord arrangements. With different chord arrangements at the first beat, different harmonizations can be generated for a given chorale. In the case of a tie, we check for horizontal constraints such as voice crossings and parallel intervals. We allow occasional constraint violations (with a penalty incurred when violations are found), and select the harmonization with the lowest penalty.
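The greedy voice-leading step can be sketched as follows (illustrative only; `candidates` stands for the enumerated arrangements that pass the vertical constraints):

```python
import numpy as np

def next_arrangement(current, candidates):
    """Among candidate (alto, tenor, bass) triples for the next beat, pick the
    one with the smallest Euclidean distance from the current arrangement."""
    cur = np.asarray(current, dtype=float)
    dists = [np.linalg.norm(np.asarray(c, dtype=float) - cur) for c in candidates]
    return candidates[int(np.argmin(dists))]

# From (alto, tenor, bass) = (65, 60, 48), the nearer candidate wins:
next_arrangement((65, 60, 48), [(67, 62, 43), (65, 58, 48)])   # -> (65, 58, 48)
```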

Non-Chord Notes

We further diversify the rhythm of the generated harmonization by adding non-chord notes: ornamentation notes which do not belong to the given chordal progression. While there are a variety of non-chord tones (see Andrews and Sclater 1993), we focus on the following types:

1. Passing notes between neighboring notes a third apart.

2. Auxiliary notes between two repeated chord notes.

3. Appoggiaturas, or non-chord notes on strong beats.

These non-chord notes are added probabilistically and independently at each beat, with probabilities estimated from training chorales or specified manually.
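As an illustration of this probabilistic insertion, here is a rough sketch for passing notes under simplifying assumptions: notes are (MIDI pitch, beats) pairs, the inserted tone is the chromatic midpoint rather than the proper diatonic scale tone, and `p_pass` stands in for a probability estimated from training chorales:

```python
import random

def add_passing_notes(voice, p_pass=0.3, rng=random.Random(0)):
    """Between consecutive notes a third apart (3-4 semitones), insert an
    intervening tone with probability p_pass, halving the first note's duration."""
    out = []
    for (p1, d1), (p2, _) in zip(voice, voice[1:]):
        if abs(p2 - p1) in (3, 4) and rng.random() < p_pass:
            out.append((p1, d1 / 2))
            out.append(((p1 + p2) // 2, d1 / 2))   # chromatic midpoint, for brevity
        else:
            out.append((p1, d1))
    out.append(voice[-1])
    return out
```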

Data

We use the Bach chorale corpus and chorale analyses provided in the Music21 toolkit (Cuthbert and Ariza 2021), dividing chorales into those in major vs. minor keys. When processing chorale data, we treat quarter notes as the unit for one time step. Notes lasting more than one beat are split on a per-beat basis, and notes lasting less than one beat are grouped into a list of notes that together span one beat. At each time step, melody notes are represented by their corresponding MIDI pitches, chords are represented using Roman numeral notation, and keys are represented using letters. We note that the data used to fit BacHMMachine is different from the data used in past studies. Instead of using raw note data corresponding directly to pitches, we use the annotated chorales (containing chordal and modulation information; see Figure 1), which are proofread by music theorists (Jones, Tymoczko, and Robb 2021).
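A rough sketch of this per-beat encoding using music21 (assuming music21 v7+; the chorale choice here is arbitrary, and rests, chords, and the grouping of sub-beat notes are simplified away):

```python
from music21 import corpus

chorale = corpus.parse('bach/bwv269')          # any chorale in the corpus works here
soprano = chorale.parts[0].flatten().notes     # the soprano is conventionally part 0

melody = []
for n in soprano:
    beats = max(1, int(n.quarterLength))       # notes longer than a beat: split per beat
    melody.extend([n.pitch.midi] * beats)      # sub-beat notes would instead be grouped
```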

Experiments

To assess performance, we conducted audience-preference experiments comparing our harmonizations to those generated by existing state-of-the-art algorithmic harmonization models. The first experiment (Exp. 1) compares the proposed BacHMMachine with the approach in Allan and Williams (2004), and the second experiment (Exp. 2) compares BacHMMachine with Google's Coconet model (Huang et al. 2017; Dinculescu et al. 2019), which was featured on Google's homepage on Bach's 334th birthday. We further conducted a preference experiment and Turing test using Bach's original harmonizations (Exp. 3). Finally, we compared BacHMMachine with either Viterbi or posterior decoding (Exp. 4), and with or without the use of non-chord notes (Exp. 5). Each participant evaluated chorales generated from each of these five experiments, as well as one set of "sanity-check" questions which compare an original Bach chorale to one deliberately composed to be unpleasant and dissonant.

For each experiment, we randomly selected an existing major-key chorale and used the soprano voice as the input melody. We displayed two video files, each showing the harmonization generated by one of the methods. Video files presented a scrolling score and highlighted notes as they were sung in SATB voices. We blinded the label of each harmonization. To account for order effects, the harmonization generated using BacHMMachine had a 50% chance of being first or second. Respondents could pause and replay videos an unlimited number of times. Following the videos, we asked respondents their preference regarding the two harmonizations presented. Exp. 3 contained an additional Turing-test question asking which harmonization was composed by Bach. Responses were evaluated on a five-point Likert scale that accounted for the random ordering of each harmonization pair presented.

We launched all surveys on Amazon Mechanical Turk (MTurk), requesting responses from MTurk crowd-workers with master qualifications and approval rates of >95%. We rejected any responses that failed the "sanity check" (i.e., not indicating preference for Bach's original harmonization over our deliberately dissonant composition). Our final analysis dataset consisted of 116 participants out of 160. Among our participants, 101 (87%) reported enjoyment of classical music or status as music students or professionals; only 15 (13%) reported seldom listening to classical music.

Exp. 1: Comparison to Allan and Williams      n (%)
  Definitely prefer BacHMMachine              32 (27.6%)
  Somewhat prefer BacHMMachine                33 (28.4%)
  No preference                                2 (1.7%)
  Somewhat prefer Allan and Williams          25 (21.6%)
  Definitely prefer Allan and Williams        24 (20.7%)

Exp. 2: Comparison to Google's Coconet
  Definitely prefer BacHMMachine              26 (22.4%)
  Somewhat prefer BacHMMachine                33 (28.4%)
  No preference                               11 (9.5%)
  Somewhat prefer Coconet                     30 (25.9%)
  Definitely prefer Coconet                   16 (13.8%)

Exp. 3: Comparison to original Bach
  Definitely prefer BacHMMachine              20 (17.2%)
  Somewhat prefer BacHMMachine                23 (19.8%)
  No preference                                9 (7.8%)
  Somewhat prefer original Bach               41 (35.3%)
  Definitely prefer original Bach             23 (19.8%)

Table 1: Comparison of harmonization approaches: "Which harmonization do you prefer?"

Preference and Computational Results

Table 1 shows the preference results for Exp. 1, Exp. 2, and Exp. 3. We see that more respondents preferred BacHMMachine's harmonizations to those generated by both the Allan and Williams implementation (Mammana et al. 2019) and Google's Coconet application (Dinculescu et al. 2019), suggesting that the quality of BacHMMachine's harmonization is as good as, if not better than, that of existing harmonization approaches. We also see that more respondents preferred Bach's original harmonizations to ours, which is not at all surprising. However, only 13.8% of respondents identified Bach's original harmonizations with certitude according to the results of the Turing test (Table 2), with 36 (31%) respondents believing to some degree that our harmonization was more likely to have been composed by Bach. This result suggests that, while BacHMMachine might not have captured all of Bach's creative idiosyncrasies, it can produce musically convincing four-part chorale harmonizations on par with (if not better than) those generated by more complex methods. For the methodological choices regarding BacHMMachine evaluated in Exp. 4 and Exp. 5, Table 3 suggests that more respondents preferred our harmonizations generated using posterior decoding and with non-chord notes.

                                              n (%)
  Definitely BacHMMachine                     10 (8.6%)
  Somewhat BacHMMachine                       26 (22.4%)
  No preference                               22 (19.0%)
  Somewhat original Bach                      42 (36.2%)
  Definitely original Bach                    16 (13.8%)

Table 2: Turing test: "Which was composed by Bach?"

Exp. 4: Posterior decoding vs. Viterbi        n (%)
  Definitely prefer posterior decoding        24 (20.7%)
  Somewhat prefer posterior decoding          33 (28.4%)
  No preference                               16 (13.8%)
  Somewhat prefer Viterbi                     26 (22.4%)
  Definitely prefer Viterbi                   17 (14.7%)

Exp. 5: Inclusion of non-chord notes
  Definitely prefer no non-chord notes        16 (13.8%)
  Somewhat prefer no non-chord notes          32 (27.6%)
  No preference                               15 (12.9%)
  Somewhat prefer with non-chord notes        30 (25.9%)
  Definitely prefer with non-chord notes      23 (19.8%)

Table 3: Comparison of BacHMMachine decoding approaches: "Which harmonization do you prefer?"

Table 4 reports the computation time (in seconds) needed to train each of the three algorithms on the same eight chorales, as performed on a personal computer using a 1.8 GHz Dual-Core Intel Core i5 processor. The reported results are the averages of 5 algorithm runs. We see that the training time for the Allan and Williams algorithm is ∼1,000x that of the proposed BacHMMachine. After both models are trained, the harmonization of a new chorale can be performed very quickly (under a second). Unfortunately, we cannot compare the model training time for Google's Coconet, since the application was already pre-trained. However, the computation time for Coconet's trained model to harmonize a new chorale is ∼30x longer than that of BacHMMachine. This significant edge in computation time (whether for model training or harmonization) shows that the proposed approach is indeed much more scalable than the state of the art. This is again due, in large part, to the integration of compositional principles within the harmonization model, which allows for model reduction and efficient training.

Approach               Train time    Harmonization time
BacHMMachine           6.27          0.60
Allan and Williams     6175.05       0.56
Google's Coconet       (unknown)     18.56

Table 4: Comparison of training time on 8 chorales and harmonization time (given the trained model), in seconds.

Interpretability

Figure 5 shows an excerpt of a chorale harmonized using BacHMMachine, with posterior decoding and including non-chord notes. A careful musical analysis shows that the generated harmonization satisfies many of the compositional principles desired for Baroque chorale composition. We investigate this further below.

Figure 5: Annotated excerpt of the BacHMMachine harmonization of the melody from J.S. Bach's chorale Ach Gott und Herr, BWV 255.

Figure 6 visualizes the key transition probabilities learned by BacHMMachine. There are several observations of interest. First, as expected, we see that the estimated key probabilities strongly conform to the phrase model. Second, we see that the key transition probabilities demonstrate a high degree of key stability, meaning that the generated chorales only change keys approximately once per phrase. This frequency of key changes is similar to Bach chorale harmonizations. Finally, we also observe that all keys have a strong probability of modulating back to the tonic, with the next most common destination being the dominant, which again is in line with musical expectation (Laitz 2016). This shows that, by integrating compositional principles within the harmonization model, BacHMMachine can learn interpretable stylistic features which are verifiable from music theory.

Figure 6: Estimated key transition probabilities (%) in the trained BacHMMachine model.

Figure 7 visualizes the chord transition probabilities learned by BacHMMachine, organized by the tonic (T), predominant (PD) and dominant (D) chord groups from the phrase model (see Figure 2). We see that, as expected, the estimated chord probabilities from BacHMMachine closely follow the harmonic structure dictated by the phrase model. In particular, the model learned that tonics progress to tonics, predominants, or dominants with near equal probability, that most predominants progress to dominants, and that most dominants progress to tonics. This agrees with the expected chord transitions of the phrase model (Laitz 2016), which demonstrates the interpretability of the proposed model in corroborating stylistic features of Baroque chorales.

Figure 7: Estimated chord transition probabilities in the trained BacHMMachine model, grouped by tonic (T), predominant (PD) and dominant (D) chords.

Conclusion

In this study, we described a probabilistic framework capable of generating musically convincing chorale harmonizations. The main strength of the model is that it is musically informed, emulating the harmonization process of a human composer by incorporating musical guidelines and constraints. Because of this, we are able to reduce the number of violations of composition guidelines. By using chorale analyses instead of raw musical pitches as input data, our method requires considerably fewer hidden states and takes a tiny fraction of the training time of other HMM approaches. The use of HMMs also makes the generation process more interpretable, with faster composition times compared to recent deep learning approaches.

Because we directly use musical information, it is straightforward to extend our model to take advantage of additional considerations. For instance, we may explore supplemental data-encoding schemes that discretize chorales into smaller time units, add an additional layer regarding key modulation or pivot chords, or consider higher-order Markovian models. Regardless, it is clear that by incorporating musical principles in our model, we are able to achieve high-quality harmonization with much simpler and easily-interpretable models at a tiny fraction of the computational cost. We encourage future researchers to consider such domain-specific information when designing generative models to hopefully achieve similar results.

References

Allan, M.; and Williams, C. K. I. 2004. Harmonising Chorales by Probabilistic Inference. In Proceedings of Neural Information Processing Systems.

Andrews, W. G.; and Sclater, M. 1993. Materials of Western Music: Part 1. Gordon V. Thompson Music.

Casella, G.; and Berger, R. L. 2021. Statistical Inference. Cengage Learning.

Cuthbert, M. S.; and Ariza, C. 2021. Music21: A Toolkit for Computer-Aided Musicology and Symbolic Music Data. URL https://github.com/cuthbertLab/music21.

Dinculescu, M.; Huang, C.-Z. A.; Cooijmans, T.; Roberts, A.; Courville, A.; and Eck, D. 2019. Coconet coucou. URL http://coconet.glitch.me/.

Fux, J. J. 1725. Gradus ad Parnassum (Steps or Ascent to Mount Parnassus). W. W. Norton & Company.

Hadjeres, G.; Pachet, F.; and Nielsen, F. 2017. DeepBach: a Steerable Model for Bach Chorales Generation. In ICML.

Hild, H.; Feulner, J.; and Menzel, W. 1991. HARMONET: A Neural Net for Harmonizing Chorales in the Style of J. S. Bach. In Proceedings of Neural Information Processing Systems.

Huang, C.-Z. A.; Cooijmans, T.; Roberts, A.; Courville, A.; and Eck, D. 2017. Counterpoint by Convolution. In International Society for Music Information Retrieval (ISMIR).

Jones, A.; Tymoczko, D.; and Robb, H. 2021. Music21 Corpus: Bach Chorale Analyses. URL https://github.com/cuthbertLab/music21/tree/master/music21/corpus/bach/choraleAnalyses.

Jurafsky, D. 2000. Speech & Language Processing. Pearson Education India.

Kaliakatsos-Papakostas, M. A.; and Cambouropoulos, E. 2014. Probabilistic harmonization with fixed intermediate chord constraints. In ICMC.

Laitz, S. 2016. The Complete Musician: An Integrated Approach to Tonal Theory, Analysis, and Listening. Oxford University Press.

Lebedev, S.; Lee, A.; Varoquaux, G.; and Farrow, C. 2021. hmmlearn: Unsupervised learning and inference of Hidden Markov Models. URL https://github.com/hmmlearn/hmmlearn.

Liang, F. T.; Gotham, M.; Johnson, M.; and Shotton, J. 2017. Automatic Stylistic Composition of Bach Chorales with Deep LSTM. In ISMIR.

Mammana, L.; Nisoli, E.; Moray, A.; and Williams, C. 2019. Generation of Bach chorales harmonisation using Hidden Markov Models. URL https://github.com/lorenzomammana/py-bach-harmonisation.

Rameau, J.-P. 1722. Treatise on Harmony. Dover Publications.

Russell, S.; and Norvig, P. 2002. Artificial Intelligence: A Modern Approach. Pearson.

Viterbi, A. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13(2): 260–269.

White, C. W.; and Quinn, I. 2018. Chord Context and Harmonic Function in Tonal Music. Music Theory Spectrum 40(2): 314–335. ISSN 0195-6167. doi:10.1093/mts/mty021. URL https://doi.org/10.1093/mts/mty021.

Yi, L.; and Goldsmith, J. 2007. Automatic Generation of Four-part Harmony. In BMA.