summing matrix elements of the lieb-liniger model with … · 2020-06-23 · charles stross,...

79
Summing matrix elements of the Lieb-Liniger model with reinforcement learning Master’s thesis theoretical physics conducted between September 2017 and July 2018 Author Teun Zwart (10499873) Supervisor Prof. Dr. Jean-Sébastien Caux Second assessor Dr. Jasper van Wezel Instituut voor Theoretische Fysica Faculteit der Natuurwetenschappen, Wiskunde en Informatica Universiteit van Amsterdam

Upload: others

Post on 05-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

Summing matrix elements of the Lieb-Linigermodel with reinforcement learning

Master’s thesis theoretical physicsconducted between September 2017 and July 2018

AuthorTeun Zwart (10499873)

SupervisorProf. Dr. Jean-Sébastien Caux

Second assessorDr. Jasper van Wezel

Instituut voor Theoretische FysicaFaculteit der Natuurwetenschappen, Wiskunde en Informatica

Universiteit van Amsterdam

Page 2: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

Cover image: Randall Munroe, Machine Learning, xkcd, https://xkcd.com/1838.

Page 3: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

Science is magic that works.

Kurt Vonnegut, Cat’s Cradle

All sufficiently advanced technology isindistinguishable from magic.

Arthur C. Clark

In the distance, the cat hears the soundof lobster minds singing in the void,a distant feed streaming from theircometary home as it drifts silently outthrough the asteroid belt, en route to achilly encounter beyond Neptune. Thelobsters sing of alienation and obso-lescence, of intelligence too slow andtenuous to support the vicious pace ofchange that has sandblasted the hu-man world until all the edges peoplecling to are jagged and brittle.

Charles Stross, Accelerando

Page 4: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

3

Abstract

Correlation functions play an important role in physics, but analytical results do not alwaysexist. For Bethe ansatz solvable models the ABACUS algorithm can numerically find Fouriertransformed correlation functions known as dynamical structure functions or DSFs. Thisoccurs through the summation of matrix elements in the Lehmann representation of the DSF.This summing is not optimal, in that it does not sum matrix elements in strictly monotonicallydecreasing order.

In this thesis we apply reinforcement learning to the summing of matrix elements, hopingto develop a strategy that leads to more optimal summation than done by ABACUS. Wedevelop a representation of the Lieb-Liniger model suitable for use in machine learning anduse Q-learning to learn the summation strategy of the density operator. We find that thealgorithm is able to learn to sum in such a way that it gives preference to highly contributingstates, but that performance is far inferior to that shown by ABACUS. As a corollary we alsouse supervised learning to try to optimize the numerical evaluation of Bethe ansatz parametersknown as rapidities, but find that any advantage this method has declines considerably withsystem size. The evaluation time added by the neural network makes this method unfit forfinding rapidities.

Popular summary

Correlation functions play an important role in physics. They tell us how strongly microscopicvariables, such as the magnetization of density, at different times and locations are related ina model. Calculating these functions is impossible for many systems and has to be done witha computer. One way to do this is to transform the correlation function to a form in whichwe perform a summation of so-called matrix elements. Every matrix element is related to astate in which the model we are interested in can be. The problem is that, to be certain thatall information is considered, you need to sum the matrix elements of all states of the model.In many systems the number of states is large or infinite.

It turns out that only a small fraction of all states has to be considered to get a very goodapproximation of the correlation functions. This makes the calculation practical to perform.The ABACUS algorithm can do this summation for a class of systems that are “Bethe ansatzsolvable”, which means an exact solution exists. In the ideal case ABACUS could start withthe largest matrix element and then continue with progressively smaller matrix elements.Then a calculation could be stopped knowing that all important contributions up to thatpoint have been taken into account. Due to the complex relation between states and matrixelements, this turns out to be difficult to do.

The subject of this thesis is the optimization of the summation such that only importantmatrix elements are summed. For this we use reinforcement learning. Reinforcement learningis a form of machine learning in which an algorithm has an interaction with the environmentand sees the results of this interaction in the form of a reward, after which the interaction ischanged to optimize for higher rewards. In the past few years this class of algorithms hasknown many successes, among which is beating the world champion in the Japanese boardgame Go.

In this thesis we trained a reinforcement learning algorithm to select states with largematrix elements, given the current state the system is occupying. The algorithm was able tofind large matrix elements, especially when compared to randomly selected states. Comparedwith ABACUS however, ABACUS turned out to perform better.

Page 5: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

4

Populaire samenvatting

Correlatiefuncties spelen een belangrijke rol in de natuurkunde. Ze vertellen ons hoe sterkmicroscopische variabelen, zoals de magnetisatie of de dichtheid, op verschillende plekkenen tijden in een materiaal of model, aan elkaar gerelateerd zijn. Het berekenen van dezefuncties is voor veel modellen onmogelijk en moet daarom met de computer worden gedaan.Een manier om dit te doen is door de correlatiefunctie te transformeren naar een vormwaarin we een optelling van zogenaamde matrixelementen uitvoeren. Ieder matrixelementis gerelateerd aan een toestand waarin het systeem waarin we geïnteresseerd zijn zich kanbevinden. Het probleem is dat, om zeker te weten dat je alle informatie hebt, je voor alletoestanden van het systeem de matrixelementen moet optellen. In veel systemen is dit eengroot of zelfs oneindig aantal.

Het blijkt echter dat maar een kleine fractie van alle toestanden hoeft te worden over-wogen om een heel goede benadering voor de correlatiefunctie te krijgen. Dit is wat deberekening praktisch maakt om uit te voeren. Het ABACUS algoritme kan dit doen voor eenklasse systemen die “Bethe ansatz solvable” zijn, wat wil zeggen dat er een exacte oplossingvoor bestaat. In het ideale geval zou ABACUS met het grootste matrixelement beginnen endaarna steeds kleine matrixelementen vinden. Zo kan een berekening afgebroken wordenin de wetenschap dat alle belangrijke bijdragen tot dat punt zijn meegenomen. Door decomplexe relatie tussen toestanden en matrixelementen blijkt het niet makkelijk om dit tedoen.

Het onderwerp van deze scriptie is het optimaliseren van deze optelling zodat alleenbelangrijke matrixelementen worden meegenomen. Hiervoor gebruiken we reinforcementlearning. Reinforcement learning is een vorm van machine learning waarin een algoritmeeen interactie met de omgeving heeft en de resultaten van die interactie ziet in de vorm vaneen beloning, waarna deze de interactie zo aanpast dat wordt geoptimaliseerd voor hogerebeloningen. In de afgelopen jaren hebben deze algoritmen veel successen gekend, waaronderhet verslaan van de wereldkampioen in het Japanse bordspel Go.

In deze scriptie trainden we een reinforcement learning algoritme om toestanden teselecteren met grote matrixelementen, gegeven een huidige toestand waarin het systeem zichbevond. Het algoritme bleek inderdaad in staat om grote matrixelementen te vinden, zekerin vergelijking met een willekeurige selectie van toestanden. In vergelijking met ABACUSbleek echter dat ABACUS beter functioneerde.

Page 6: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

Contents

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3Popular summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3Populaire samenvatting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1 Introduction 7

2 The Bethe ansatz 92.1 The Lieb-Liniger model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.1.1 A solution for the Schrödinger equation . . . . . . . . . . . . . . . . . . . 92.1.2 Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.1.3 The Gaudin norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2 The thermodynamic limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.2.1 The ground state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.2.2 Excitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.3 Displacement functions for excitations . . . . . . . . . . . . . . . . . . . . . . . . 152.3.1 The ground state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162.3.2 Calculating L(F) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202.3.3 Excited states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.4 Form factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222.4.1 The algebraic Bethe ansatz . . . . . . . . . . . . . . . . . . . . . . . . . . . 222.4.2 Form factor for the ρ operator . . . . . . . . . . . . . . . . . . . . . . . . 23

3 The ABACUS algorithm 253.1 Numerically solving the Bethe equations . . . . . . . . . . . . . . . . . . . . . . . 263.2 Scanning over the Hilbert space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4 Machine learning techniques 314.1 Machine learning in physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314.2 Machine learning: the basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.2.1 Neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324.3 Reinforcement learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.3.1 Q-learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

5 Methods and results 385.1 Regressing rapidities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

5.1.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395.2 Walking through state space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

5.2.1 State and move representation . . . . . . . . . . . . . . . . . . . . . . . . 435.2.2 Exact results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445.2.3 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5

Page 7: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

CONTENTS 6

6 Discussion and conclusion 606.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

6.1.1 Non-Markovian nature of DSF Hilbert scanning . . . . . . . . . . . . . . 606.1.2 Problems with reinforcement learning . . . . . . . . . . . . . . . . . . . . 60

6.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636.2.1 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

6.3 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

A Boundary term of the Lieb-Liniger model 67

B The f-sum rule 68B.1 The ρ operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

C Convergence results for machine learning rapidities 73

D Q-learning results 74

Bibliography 75

Page 8: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

The future is already here — it’s justnot very evenly distributed.

William Gibson

1Introduction

Correlation functions are one of the central pillars of theoretical physics. These functions,generally of the form

⟨O(x , t)O(0,0)⟩ , (1.1)

describe the correlations in time and space of microscopic variables. Much work has gone intofinding correlation functions, but it is not clear that for a given model closed form solutionsexist.

In this thesis we consider the numerical evaluation of correlation functions for a classof models which are exactly solvable using the Bethe ansatz. Solutions of Bethe ansatzsolvable models take the form of (superpositions of) plane waves, parameterized by numbersknown as rapidities. If we quantize the system an equation known as the Bethe equationemerges, which allows for the calculation of these rapidities. Physical properties of Betheansatz solvable models have expressions in terms of rapidities [1]. For this class of modelsthe ABACUS algorithm [2] allows for the numerical evaluation of the Fourier transform ofthe correlation function, known as a dynamical structure function or DSF.

The DSF can be evaluated by considering its Lehmann representation:

S(k,ω) =2πN

µ

λ0�

�Ok

�µ��

2δ�

ω− Eµ + E0

, (1.2)

where the sum runs over the eigenstates of the model under consideration. We focus onthis summation here. The summing of the matrix elements in equation (1.2) is difficultbecause the possible values of these matrix elements span many orders of magnitude. Tomaximize efficiency in the calculation of DSFs it is thus desirable to start summing the largestmatrix elements first and continue with progressively smaller values. This ensures thatthe largest possible contributions to the DSFs are taken into account when a calculationreaches the maximum allowed computation time. Ensuring this ordering of matrix elementsis a non-trivial task and ABACUS does not sum matrix elements in strictly monotonicallydecreasing order. Here we may hope to see an improvement.

To that end we turn to reinforcement learning. Reinforcement learning is a machinelearning technique in which a computer learns a strategy for solving a problem by trying

7

Page 9: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

CHAPTER 1. INTRODUCTION 8

different actions, observing the rewards these actions bring, and then altering it decisionmaking process to optimize for higher rewards. In the past five years reinforcement learninghas seen impressive results. It has allowed superhuman performance on 1980s video games,purely by observing the pixels on screen [3,4]. Much publicised was AlphaGo [5], a programcapable of defeating the best human player in the board game Go, a feat thought to be atleast a decade away.

While the initial version of AlphaGo first trained on human expert games and then wenton to hone its abilities by playing itself, later versions started with only the rules of thegame and learned purely by self-play [6]. This resulted in play that was stronger than wheninitially trained on human games, with the same result seen in chess and shogi (Chinesechess) [7]. The Google Deepmind team behind these breakthroughs noted that [6], “Ourresults comprehensively demonstrate that a pure reinforcement learning approach is fully feasible,even in the most challenging of domains: it is possible to train to superhuman level, withouthuman examples or guidance, given no knowledge of the domain beyond basic rules. Furthermore,a pure reinforcement learning approach requires just a few more hours to train, and achievesmuch better asymptotic performance, compared to training on human expert data.”

This final quote inspired the use of reinforcement learning in the domain of DSF cal-culation. Our hope is that a reinforcement learning algorithm may beat the handcraftedheuristics used in ABACUS in efficiently summing matrix elements. The main question weseek to answer in this thesis is whether we can train a reinforcement learning algorithm todo so. To that end we must also determine how reinforcement learning works and can beapplied exploring the Hilbert space, and how different states should be represented for usein a reinforcement learning environment. As an addendum we will also consider whetherthe numerical calculation of rapidities benefits from using machine learning techniques.

We structure this thesis as follows: in chapter 2 we review some of the basics of theBethe ansatz and its use in solving the Lieb-Liniger model, which we will use throughoutthis thesis because it has properties that make it simpler to work with. Chapter 3 describesthe ABACUS algorithm, its use, and the state scanning algorithm. In chapter 4 we givean introduction to machine learning and specifically reinforcement learning. We describeour approach and results in chapter 5. Finally, chapter 6 discusses some of the difficultiesinherent in reinforcement learning, summarizes our results and makes recommendations forfuture research directions.

The code used to produce the results in this thesis is to be found athttps://github.com/teunzwart/deepscanning.

Page 10: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

You haff to vork fery feery haaard.

Hans Bethe

2The Bethe ansatz

The Bethe ansatz allows for a way to solve certain one-dimensional systems exactly. Proposedby Bethe in 1931 to solve the Heisenberg model, it refined and corrected the earlier work ofBloch [8].

2.1 The Lieb-Liniger model

Here, to introduce and understand the Bethe ansatz, we focus on the Lieb-Liniger model,finding its eigenfunctions and excitation spectrum. The Lieb-Liniger model, first proposedby Lieb and Liniger in 1963 [9,10], consists of bosons interacting through a delta-functionpotential. The Hamiltonian is

H = −N∑

j=1

∂ 2

∂ x2j

+ 2c∑

j<l

δ�

x j − x l

. (2.1)

Here c is the interaction strength of the model, ħh= 1 and we set 2m= 1. In the limit c→ 0particles behave as free bosons, while c→∞ yields fermionic behaviour and is known asthe Tonks-Girardeau model [9,11]. We will only solve the repulsive case (c > 0) here.

2.1.1 A solution for the Schrödinger equation

We will start solving the Lieb-Liniger model in the two-particle case, which readily generalizesto many particles.

For two particles, the Hamiltonian becomes

H = −∂ 2

∂ x21

−∂ 2

∂ x22

+ 2cδ(x1 − x2). (2.2)

We can write this in a more convenient form without the delta-function by integratingthe Schrödinger equation over a small interval [−ε,ε]. This yields the condition [9] (seeappendix A for an explicit derivation)

9

Page 11: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.1. THE LIEB-LINIGER MODEL 10

∂ x2−∂

∂ x1− c�

ψ(x1, x2)

x1=x2

= 0. (2.3)

Since particles only interact with each other when their positions coincide, this allows us tosimplify the Schrödinger equation to

−∂ 2

∂ x21

−∂ 2

∂ x22

ψ(x1, x2) = Eψ(x1, x2), (2.4)

with equation (2.3) as a boundary condition [9]. We assume an ordering of the positionsx1 < x2. Since we are dealing with bosons, we also require the wave function to be symmetric:ψ(x1, x2) = ψ(x2, x1). Effectively we have rewritten the Schrödinger equation such thatwe only consider the wave function away from the delta-function interaction points. Theboundary condition then reintroduces the interaction terms.

We can now conjecture a solution for the Lieb-Liniger model based on equation (2.4).The Bethe ansatz solves the Schrödinger equation with a superposition of plane waves:

ψ(x1, x2) = A12ei(λ1 x1+λ2 x2) + A21ei(λ2 x1+λ1 x2), (2.5)

where the λ’s are pseudo-momenta, so-called because they are unobservable [11]. The A’sare the amplitudes of the plane waves making up the wave function. Since we must fulfillequation (2.3) the amplitudes have to obey the equation

A12

A21= −

c − i(λ1 −λ2)c + i(λ1 −λ2)

. (2.6)

The right-hand side of this equation has modulus 1, meaning we can write it as a phase:

A12

A21= −e−iφ(λ1−λ2), (2.7)

where [1]

φ(λ1 −λ2) =1i

log�

c + i(λ1 −λ2)c − i(λ1 −λ2)

, (2.8)

or

φ(λ1 −λ2) = 2arctan�

λ1 −λ2

c

(2.9)

if we assume the argument of φ to be real [12].We can now write the two-particle wave function by setting A21 = eiφ/2 (and ignoring

normalization for the moment) to get

ψ(x1, x2) = −e−i2φ(λ1−λ2)+i(λ1 x1+λ2 x2) + e

i2φ(λ1−λ2)+i(λ2 x1+λ1 x2). (2.10)

Generalization to the whole domain (not just where x1 < x2) follows from requiring thewave function to be fully symmetric under coordinate exchange, yielding

ψ(x1, x2) = sgn(x2 − x1)∑

P∈π2

(−1)[P]eiλP1x1+iλP2

x2−i2 sgn(x2−x1)φ(λP1

−λP2). (2.11)

Page 12: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.1. THE LIEB-LINIGER MODEL 11

The sum runs over the permutations of the set (1, 2) [13]. This wave function is antisymmetricunder exchange of quasimomenta, meaning that any wave function in which λ1 = λ2 is equalto zero.

Going to the many-particle state is now a straightforward generalization of the two-particle case. The wave function in this case becomes [14]

ψ(x1, . . . , xN ) =∑

P∈πN

A(P)ei∑

j λPjx j (2.12)

where we sum over permutations of the set (1, . . . , N), and we have a condition on theamplitudes [11]

A(P)A(P ′)

= −eiφ(λP−λP′). (2.13)

2.1.2 Quantization

Since we would like to study the properties of the system for finite densities, we confinethe system to a finite length L. This translates into a periodicity requirement on the wavefunction. It require that the wave functions obeys [11]

ψ�

x1, . . . , x j + L, . . . , xN

=ψ�

x1, . . . , x j , . . . , xN

∀ j ∈ 1, . . . , N . (2.14)

If we impose this condition on the wave function, the quasimomenta have to obey therelation [1]

eiλ j L =∏

l 6= j

λ j −λl + ic

λ j −λl − ic. (2.15)

These are the Bethe equations. If we take the logarithm of equation (2.15) we get

λ j +1L

N∑

l=1

φ�

λ j −λl

=2πL

I j , j = 1, . . . , N , (2.16)

where

I j ∈

¨

Z+ 12 , N even

Z N odd. (2.17)

The numbers I j uniquely label the quasimomenta and are half-odd integers. They act as thequantum numbers of the theory. These Bethe numbers may not coincide with each other, sincethis would also imply equal rapidities (theorem 3). Because the wave function is then equalto 0 such coincidences are not allowed, which is reminiscent of the Pauli principle [2,11].

We can get the momentum and energy of the Lieb-Liniger model from the rapidities. Bysumming over equation (2.16) the momentum is :

P =N∑

j=1

λ j =2πL

N∑

j=1

I j , (2.18)

where the second step is valid because we sum the odd part of the Bethe equations over aneven interval. The energy meanwhile is [1]

E =N∑

j=1

λ2j . (2.19)

We now prove three properties of the Bethe equations and Bethe numbers:

Page 13: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.1. THE LIEB-LINIGER MODEL 12

Theorem 1. The solutions of the Bethe equations equation (2.16) are unique and exist, given a(half-) integer set of quantum numbers {I} [15].

Proof. This follows from a variational argument, where the Bethe equations are the extremumof an action. The Yang-Yang action for the Lieb-Liniger model is

S =L2

N∑

j=1

λ2j − 2π

N∑

j=1

n jλ j +12

N∑

j=1

N∑

k=1

θ1

λ j −λk

, (2.20)

where θ1(λ) =∫ λ

0 θ (µ)dµ. The Bethe equations follow from this by setting ∂ S∂ λ j= 0, which

corresponds to a minimum. To show this is indeed a minimum, we need to have a positivedefinite matrix of second derivatives:

∂ 2S∂ λ j∂ λl

= δ jl

L +N∑

m=1

C�

λ j −λm

− C�

λ j −λl

, (2.21)

where we have the kernel

C(λ−µ) = dφ(x)dx

x=λ−µ=

2c

c2 + (λ−µ)2. (2.22)

By introducing a vector v j with real components, we get [1]

j,l

∂ 2S∂ λ j∂ λl

v j vl =N∑

j=1

Lv2j +

N∑

j>l=1

C�

λ j −λm

��

v j − vl

�2 ≥ LN∑

j=1

v2j > 0, (2.23)

meaning that the action is convex, and the minimum defining the Bethe equations is unique.

Theorem 2. All solutions of the Bethe equations for the Lieb-Liniger model are real.

Proof. We consider the set of complex solutions to the Bethe equations {λ j}, where we callthe value with the largest complex part in this set λmax:

Imλmax ≥ Imλ j . (2.24)

If we put this into the non-logarithmic form of the Bethe equations and take the modulus weget

�eiλmax L�

�=

k

λmax −λk + icλmax −λk − ic

≥ 1, (2.25)

since�

λ+ icλ− ic

≥ 1 if Imλ≥ 0. (2.26)

This means that Imλmax ≤ 0 since only then eiλmax L ≥ 1. Defining Imλmin ≤ λ j as thesmallest imaginary part of a rapidity we can in the same way prove that Imλmin ≥ 0. Takingthese statements together we see that Imλ j = 0∀ j, and hence all λ’s are real [1].

Theorem 3. If I j > Ik, then λ j > λk. If I j = Ik then λ j = λk.

Page 14: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.2. THE THERMODYNAMIC LIMIT 13

Proof. Subtracting the relevant Bethe equations from each other we get

λ j −λk +1L

N∑

l=1

φ�

λ j −λl

−φ�

λk −λl

��

=2πL

I j − Ik

. (2.27)

Because |atan(x)− atan(y)| ≤ |x − y|, the left side has the same sign as its first term, sincethe first term in the sum is larger than the second term, hence showing the theorem to betrue [1,14].

2.1.3 The Gaudin norm

The norm of a Bethe eigenstate has an expression proportional to a determinant. This resultwas first conjectured by Gaudin and for the Lieb-Liniger model takes the form [16]

⟨{λ}|{λ}⟩= cNN∏

k=1

N∏

j=k+1

λ2jk + c2

λ2jk

det (G({λ})), (2.28)

where G({λ}) is the Gaudin matrix of the state, with elements

G({λ}) jk = δ jk

L +N∑

m=1

C�

λ j −λm

− C�

λ j −λk

, (2.29)

and λ jk = λ j −λk.

2.2 The thermodynamic limit

2.2.1 The ground state

In going to the thermodynamic limit we let N and L tend to infinity, keeping D = N/Lconstant. We first introduce a continuum version of the Bethe equations in terms of thefunction λ(x):

Lλ(x) +N∑

k=1

θ (λ(x)−λk) = 2πLx . (2.30)

where λ(x) takes values that are solutions to the original Bethe equations when x = I j/L:λ j = λ(I j/L). If we invert the equation and take its derivative with respect to λ we get thedensity of particles in momentum space:

ρ(λ(x)) =dx(λ)

dλ. (2.31)

This allows us to differentiate equation (2.30) with respect to λ and write it as

1+1L

N∑

k=1

C(λ(x)−λk) = 2πρ(λ(x)). (2.32)

If we now change the sum to an integral we get

1L

l

C(λ(x)−λl)→∫ λF

−λF

C(λ−µ)ρ(µ)dµ, (2.33)

Page 15: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.2. THE THERMODYNAMIC LIMIT 14

leading to [1]

ρ(λ)−1

∫ λF

−λF

C(λ−µ)ρ(µ)dµ= 12π

. (2.34)

This is the Lieb equation, which is the same as Love’s equation in electrostatics, describingthe capacitance of a circular capacitor [14]. Together with

D =NL=

∫ λF

−λF

ρ(λ)dλ (2.35)

this allows for the calculation of ρ(λ) for a given value of D.

2.2.2 Excitations

Given the ground state we can build different excited states. Two types of excitations exist:particle-like (type I) excitations in which a new particle outside the Fermi interval is addedwith momentum k > |λF | and hole-like (type II) excitations, where a particle is removedfrom the Fermi interval with momentum k < |λF |. A particle may also be excited out of theFermi interval, but this is a combination of the other two excitations [11].

The N -particle ground state in terms of Bethe numbers is

{I}=§

−N − 1

2,−

N − 32

, . . . ,N − 3

2,

N − 12

ª

. (2.36)

Adding an excitation, changing the state to an N + 1-particle state, results in

{I}=§

−N2

,−N2+ 1, . . . ,

N2− 1,

N2+ qª

, (2.37)

for q > 0. Due to the coupled and non-linear nature of the Bethe equations, the addition of asingle particle results in a change for all rapidities of the system [11].

We define the displacement of the rapidities as [13]

d j

L= λ j −λ0

j , (2.38)

with the superscript denoting that the rapidity is part of the ground state rapidity set.Subtracting the Bethe equations of the ground state from the excited state we get [11,13]

d j = −π−φ�

λ j − q�

−N∑

l=1

φ�

λ j −λl

−φ�

λ0j −λ

0l

��

, (2.39)

where q is the rapidity of the new particle. Expanding the term in brackets gives1

d j = −π−φ�

λ j − q�

−1L

N∑

l=1

2c

c2 +�

λ0j −λ

0l

�2

d j − dl

+O�

NL2

, (2.40)

1The expansions goes as follows:

φ�

λ j −λl

−φ�

λ0j −λ

0l

= φ�

λ0j −λ

0l +∆λ

−φ�

λ0j −λ

0l

where ∆λ=1L

d j − dl

≈∆λ∂φ(x)∂ x

x=λ0j −λ

0l

=1L

d j − dl

� 2c

c2 +�

λ0j −λ

0l

�2 .

Page 16: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.3. DISPLACEMENT FUNCTIONS FOR EXCITATIONS 15

or, by rearranging and turning the sums into integrals, extending d j → d(λ, q) to the contin-uum

d(λ, q)

1+ 2π

∫ λF

−λF

dλ′C�

λ−λ′�

ρ�

λ′�

=

−π−φ�

λ− q�

+ 2π

∫ λF

−λF

dλ′C�

λ−λ′�

ρ�

λ′�

d�

λ′, q�

. (2.41)

If we use the Lieb equation and introduce Dp(λ, q) = d(λ, q)ρ(λ), which is the displacementfunction for particle-like excitations, we get the integral equation [11,13]

Dp

λ, q�

−∫ λF

−λF

dλ′C�

λ′ −λ�

Dp

λ′, q�

=−12π

π+φ(λ− q)�

. (2.42)

When dealing with holes we use the same idea. Starting from the N -particle ground statewe can create a hole-like excitation by removing a particle in the Fermi interval to create anN − 1-particle state with quantum numbers

{I}=§

−N2+ 1, . . . ,

N2−m− 1,

N2−m+ 1, . . . ,

N2

ª

. (2.43)

Here 0 < m < N2 and we get a hole displacement function described by the integral equa-

tion [13]

Dh

λ, q�

−∫ λF

−λF

dλ′C�

λ−λ′�

Dh

λ′, q�

=1

π+φ(λ− q)�

. (2.44)

2.3 Displacement functions for excitations

From the previous section we have integral equations for the displacement functions of theLieb-Liniger model. We now turn to solving these equations. To that end we introduce thetruncated kernel:

C(F)�

λ,λ′�

= θ�

λF −�

�λ′�

C�

λ−λ′�

θ�

λF − |λ|�

. (2.45)

Its associated inverse is defined by�

1+L(F)�

∗�

1− C(F)� �

λ,λ′�

= δ�

λ−λ′�

, λ,λ′ ∈ R, (2.46)

where we define ∗ as the convolution:

( f ∗ g)(x) =

∫ ∞

−∞dy f (x − y)g(y). (2.47)

With this we can rewrite the differential equations for the displacement functions in equa-tion (2.42) and equation (2.44) to be valid for a larger domain.

Using equation (2.45) we can write the particle displacement integral equation as

1− C(F)�

∗ Dp

λ,λp

= −θ�

λF − |λ|�

πsgn�

λp

+φ�

λ−λp

��

(2.48)

Page 17: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.3. DISPLACEMENT FUNCTIONS FOR EXCITATIONS 16

for which the solution is

Dp

λ,λp

= −1

∫ λF

−λF

dλ′�

δ�

λ−λ′�

+L(F)�

λ,λ′�� �

πsgn�

λp

+φ�

λ′ −λp

��

, (2.49)

which is valid for all�

�λp

�> λF and all λ. We can use that

−�

πsgn�

λp

+φ�

λ′ −λp

��

=

∫ sgn(λp)∞

λp

dλ′d

dλ′φ�

λ−λ′�

= −∫ sgn(λp)∞

λp

dλ′C�

λ−λ′�

,

(2.50)

leading to [13]

Dp

λ,λp

= −1

∫ sgn(λp)∞

λp

dλ′∫ λF

−λF

dλ′′�

δ�

λ−λ′′�

+L(F)�

λ,λ′′��

C�

λ′′ −λ′�

. (2.51)

For holes we have

1− C(F)�

∗ Dh

λ,λh

=θ�

λF − |λ|�

πsgn(λh) +φ(λ−λF )�

, (2.52)

for which the solution is

Dh

λ,λh

=1

∫ λF

−λF

dλ′�

δ�

λ−λ′�

+L(F)�

λ,λ′�� �

πsgn(λh) +φ�

λ′ −λh

��

. (2.53)

Rewriting using

πsgn(λh) +φ(λ−λh)�

=

∫ λh

−sgn(λh)∞dλ′

ddλ′φ�

λ−λ′�

= −∫ λh

−sgn(λh)∞dλ′C

λ−λ′�

(2.54)

gives

Dh

λ,λh

= −1

∫ λh

−sgn(λh)∞dλ′

∫ λF

−λF

dλ′′�

δ�

λ−λ′′�

+L(F)�

λ,λ′′��

C�

λ′′ −λ′�

, (2.55)

which is valid for all |λh|< λF and all λ [13].We now simplify the displacement functions for particle-like (type I) and hole-like (type II)

excitations, first for the ground state and then for excited states. Note that the expression wefind differ from those in e.g. [1], because they assume that in the displacement functionsL(F) inverts C, which is not the case.

2.3.1 The ground state

For type I excitations we have the displacement function equation (2.51). We can rewritethis in a more explicit way. Lets first focus on the part involving the δ-function, which we

Page 18: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.3. DISPLACEMENT FUNCTIONS FOR EXCITATIONS 17

can rewrite, using a Heaviside step function, as

−∫ sgn(λp)∞

λp

dλ′∫ λF

−λF

dλ′′δ�

λ−λ′′�

C�

λ′′ −λ′�

(2.56)

= −∫ sgn(λp)∞

λp

dλ′∫ ∞

−∞dλ′′θ

λF −�

�λ′′�

δ�

λ−λ′′�

C�

λ′′ −λ′�

(2.57)

= −θ�

λF − |λ|�

∫ sgn(λp)∞

λp

dλ′C�

λ−λ′�

. (2.58)

We change variables a = λ−λ′:

− θ�

λF − |λ|�

∫ sgn(λp)∞

λp

dλ′C�

λ−λ′�

= θ�

λF − |λ|�

∫ λ−sgn(λp)∞

λ−λp

daC�

a�

. (2.59)

Assuming λ to be finite, the upper bound of the integral reduces to −sgn(λp)∞.Equation (2.59) is 0 if |λ|> λF , so the only λ’s we have to consider in the integral are

|λ|< λF , but from the definition of the displacement we know�

�λp

�> λF , meaning�

�λp

�> |λ|for all λ in the integral. Then λ− λp always has the opposite sign from λp. Thus we canrewrite equation (2.59) as

θ (λF − |λ|)∫ λ−sgn(λp)∞

λ−λp

daC(a) = θ (λF − |λ|)∫ −sgn(λp)∞

−sgn(λp)|λ−λp|daC(a). (2.60)

Since the kernel is symmetric, the value of the integral does not depend on whethersgn

λp

is +1 or −1. This only adds a sign-function in front of the integral:

θ�

λF − |λ|�

∫ −sgn(λp)∞

−sgn(λp)|λ−λp|daC

a�

= −sgn�

λp

θ�

λF − |λ|�

∫ ∞

|λ−λp|daC

a�

(2.61)

= −sgn�

λp

θ�

λF − |λ|�

π− 2atan

��

�λ−λp

c

��

(2.62)

= −2sgn�

λp

θ�

λF − |λ|�

atan

c�

�λ−λp

, (2.63)

where the last equality is valid because the argument of the arctangent is strictly positive.The above expression would not be valid without the Heaviside function, since the integralcan only be solved in the manner above when |λ|< λF .

The second part of equation (2.51), which involves L(F)�

λ,λ′�

, can’t be rewritten inclosed form, since there is no closed form solution for L(F)

λ,λ′�

. But we can rewrite thisterm to get rid of the improper integral, which will help in numerical evaluation of the

Page 19: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.3. DISPLACEMENT FUNCTIONS FOR EXCITATIONS 18

expression. We have

−∫ sgn(λp)∞

λp

dλ′∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

C�

λ′′ −λ′�

(2.64)

= −∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

∫ sgn(λp)∞

λp

dλ′C�

λ′′ −λ′�

(2.65)

=

∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

∫ λ′′−sgn(λp)∞

λ′′−λp

daC�

a�

(2.66)

= −sgn�

λp

∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

∫ ∞

|λ′′−λp|daC

a�

(2.67)

= −sgn�

λp

∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

π− 2 atan

��

�λ′′ −λp

c

��

(2.68)

= −2sgn�

λp

∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

atan

c�

�λ′′ −λp

, (2.69)

where, as before, we argued that�

�λ′′�

�<�

�λp

� and thus that λ′′ −λp always has the oppositesign from λp.

Taking these results together, the displacement function for type I excitations becomes

Dp

λ,λp

= −1π

sgn�

λp

θ�

λF − |λ|�

atan

c�

�λ−λp

−1π

sgn�

λp

∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

atan

c�

�λ′′ −λp

. (2.70)

Figure 2.1 shows an example.For type II excitations the idea is much the same as type I. Recalling equation (2.55), the

displacement is

Dh(λ,λh) = −1

∫ λh

−sgn(λh)∞dλ′

∫ λF

−λF

dλ′′�

δ�

λ−λ′′�

+L(F)�

λ,λ′′��

C�

λ′′ −λ′�

, (2.71)

which is valid for all λh (if |λh| < λF ) and all λ [13]. The δ-function part of this equationbecomes

−∫ λh

−sgn(λh)∞dλ′

∫ λF

−λF

dλ′′δ�

λ−λ′′�

C�

λ′′ −λ′�

= −θ�

λF − |λ|�

∫ λh

−sgn(λh)∞dλ′C

λ−λ′�

.

(2.72)

A change of variables a = λ−λ′ results in:

−θ�

λF − |λ|�

∫ λh

−sgn(λh)∞dλ′C

λ−λ′�

= θ�

λF − |λ|�

∫ λ−λh

λ+sgn(λh)∞daC

a�

. (2.73)

Page 20: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.3. DISPLACEMENT FUNCTIONS FOR EXCITATIONS 19

1.00 0.75 0.50 0.25 0.00 0.25 0.50 0.75 1.00

12.2

12.1

12.0

11.9

11.8

11.7

11.6

11.5

Dp

Figure 2.1: Dp(λ,λp) plotted for c = 4, λF = 1, λp = 1.5.

Since we again assume λ to be finite, the lower bound of the integral reduces to sgn(λh)∞.Because both |λh|< λF and |λ|< λF (due to the Heaviside function), we can no longer makestatements about the sign of λ−λh and have to be a bit more general in our expression. Weget

θ (λF − |λ|)∫ λ−λh

sgn(λh)∞daC(a) = θ (λF − |λ|)

2atan�

λ−λh

c

− sgn(λh)π�

. (2.74)

For hole-like excitations, the part of the displacement function involving L(F)(λ,λ′)becomes

−∫ λh

−sgn(λh)∞dλ′

∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

C�

λ′′ −λ′�

(2.75)

= −∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

∫ λh

−sgn(λh)∞dλ′C

λ′′ −λ′�

(2.76)

=

∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

∫ λ′′−λh

sgn(λh)∞daC

a�

(2.77)

=

∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

2 atan�

λ′′ −λh

c

− sgn�

λh

π

, (2.78)

assuming λ′′ finite.The expression for the displacement of type II excitations is thus

Dh

λ,λh

=1πθ�

λF − |λ|�

atan�

λ−λh

c

− sgn�

λh

�π

2

+1π

∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

atan�

λ′′ −λh

c

− sgn�

λh

�π

2

, (2.79)

with an example plotted in figure 2.2.

Page 21: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.3. DISPLACEMENT FUNCTIONS FOR EXCITATIONS 20

1.00 0.75 0.50 0.25 0.00 0.25 0.50 0.75 1.0014.4

14.2

14.0

13.8

13.6

Dh

Figure 2.2: Dh(λ,λh) plotted for c = 4, λF = 1, λh = 0.5.

2.3.2 Calculating L(F)

In the above expressions we still have L(F), for which there is no closed form solution, andwhich thus has to be numerically evaluated.

Recalling equation (2.46), the definition of L(F) is [13]�

1+L(F)�

∗�

1− C(F)� �

λ,λ′�

= δ�

λ−λ′�

, λ,λ′ ∈ R (2.80)

or explicitly∫ ∞

−∞dλ′′

δ�

λ−λ′′�

+L(F)�

λ,λ′′�� �

δ�

λ′′ −λ′�

− C(F)�

λ′′,λ′��

= δ�

λ−λ′�

, (2.81)

with λ,λ′,λ′′ ∈ R. If we perform the integration, cancel common terms and rearrange, weget the integral equation

L(F)�

λ,λ′�

=

∫ ∞

−∞L(F)

λ,λ′′�

C(F)�

λ′′,λ′�

+ C(F)�

λ,λ′�

(2.82)

= θ�

λF −�

�λ′�

∫ λF

−λF

dλ′′L(F)�

λ,λ′′�

C�

λ′′ −λ′�

+ C(F)�

λ,λ′�

. (2.83)

Now we can make use of the method of successive approximation [17] to numericallysolve this equation (a Fredholm integral equation of the second kind).

If we have an integral equation of the form

φ(x) = f (x) +ω

∫ b

aK(x , t)φ(t)dt, (2.84)

where K(x , t) is a continuous integration kernel on the interval [a, b], then it seems reasonableto assume that to lowest order the solution of the integral equation is f (x), under thecondition that ω is small. This gives a lowest order approximation φ0(x) = f (x), withfurther substitution yielding

φ1(x) = f (x) +ω

∫ b

aK(x , t)φ0(t)dt, (2.85)

Page 22: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.3. DISPLACEMENT FUNCTIONS FOR EXCITATIONS 21

1.00 0.75 0.50 0.25 0.00 0.25 0.50 0.75 1.00

14.8

14.6

14.4

14.2

14.0

(F)

Figure 2.3: L(F)(λ,λ′) plotted for c = 4, λF = 1, λ′ = 0.

which we hope to be more tractable than the original equation. In general the n + 1-thapproximation is defined in terms of the n-th approximation as

φn+1(x) = f (x) +ω

∫ b

aK(x , t)φn(t)dt. (2.86)

A sufficient but not necessary criterion for convergence is |ω|‖K‖2 < 1, with ‖K‖2 theL2-norm:

‖K‖2 =

∫ b

a

∫ b

a|K(x , t)|2dxdt

�1/2

. (2.87)

Here we have

L(F)n+1

λ,λ′�

= θ�

λF −�

�λ′�

∫ λF

−λF

L(F)n

λ,λ′′�

C�

λ′′ −λ′�

dλ′′ + C(F)�

λ,λ′�

, (2.88)

which converges as long as�

8λF

catan

2λF

c

��1/2

< 1. (2.89)

If we only want to know the value of L(F) at a single point we can use that at each iterationstep L(F) is a number, allowing us to take it out of the integral. We get

L(F)n+1

λ,λ′�

= θ�

λF −�

�λ′�

L(F)n

λ,λ′�

∫ λF

−λF

C�

λ′ −λ′′�

dλ′′ + C(F)�

λ,λ′�

(2.90)

= −2θ�

λF −�

�λ′�

L(F)n

λ,λ′�

atan�

λ′ +λF

c

− atan�

λ′ −λF

c

��

+ CF�

λ,λ′�

, (2.91)

where L(F)0

λ,λ′�

= 0. Figure 2.3 shows a numerical evaluation of L(F).

Page 23: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.4. FORM FACTORS 22

2.3.3 Excited states

For excited states the displacement functions for type I and II excitations are respectively [13]

Dp

λ,λp

= −1

∫ sgn(λp)∞

λp

dλ′∫ λF

−λF

dλ′′�

δ�

λ−λ′′�

+L(F)�

λ−λ′′��

C�

λ′′ −λ′�

, (2.92)

and

Dh

λ,λh

= −1

∫ λh

−sgn(λh)∞dλ′

∫ λF

−λF

dλ′′�

δ�

λ−λ′′�

+L�

λ−λ′′��

C�

λ′′ −λ′�

. (2.93)

2.4 Form factors

Now we look at form factors for the Lieb-Liniger model, which we will require for calculatingdynamical structure functions (see chapter 3). Form factors or matrix elements can beobtained by using the algebraic Bethe ansatz (until now we used a coordinate representationof the wave function known as the coordinate Bethe ansatz). We will briefly introduce thealgebraic Bethe ansatz.

2.4.1 The algebraic Bethe ansatz

In the algebraic Bethe ansatz (ABA) we define our operators in terms of fields. The quantizedHamiltonian of the Lieb-Liniger model is

H =

∫ L

0

dx�

∂xΨ†(x)∂xΨ(x) + cΨ†(x)Ψ†(x)Ψ(x)Ψ(x)

, (2.94)

with the field operators obeying the commutation relation [Ψ(x),Ψ†(y)] = δ(x − y).In the ABA a fundamental object is the monodromy matrix, consisting of operators

A(λ), B(λ), C(λ), D(λ):

T (λ) =

A(λ) B(λ)C(λ) D(λ)

, (2.95)

where

A(λ) |0⟩= a(λ) |0⟩ , D(λ) |0⟩= d(λ) |0⟩ , (2.96)

when acting on the vacuum |0⟩. Physical states may be built using B(λ) and C(λ):

|{λ}⟩=N∏

j=1

B�

λ j

|0⟩ and ⟨{λ}|= ⟨0|N∏

j=1

C�

λ j

. (2.97)

In the Lieb-Liniger model we have [18,19]

a(λ) = e−i L2λ, d(λ) = ei L

2λ. (2.98)

Page 24: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.4. FORM FACTORS 23

The Yang-Baxter equation [1] encodes the commutation relations between the operators.This equation depends on an object known as the R-matrix. For the Lieb-Liniger model, theR-matrix is

R(λ,µ) =

f (λ,µ) 0 0 00 g(λ,µ) 1 00 1 g(λ,µ) 00 0 0 f (λ,µ)

, (2.99)

with

f (λ,µ) =λ−µ+ icλ−µ

, g(λ,µ) =icλ−µ

. (2.100)

Physical operators are defined in terms of the field operator. To explicitly determine theaction of the operators on states we use the commutation relations

Ψ(0), B(λ)�

= −ip

cA(λ),�

C(λ),Ψ†(0)�

= ip

cD(λ). (2.101)

This allows us to derive

Ψ(0)N∏

k=1

B(λk) |0⟩= −ip

cN∑

k=1

Λka(λk)N∏

m=1m 6=k

B(λm) |0⟩ , (2.102)

⟨0|N∏

k=1

C(λk)Ψ†(0) = i

pc

N∑

k=1

⟨0|N∏

m=1m 6=k

C(λm)eΛd(λk), (2.103)

with [19]

Λk =N∏

m=1m 6=k

f (λk,λm), eΛk =N∏

m=1m 6=k

f (λm,λk). (2.104)

With these tools, we can calculate form factors of operators. We have only skimmed thesurface of the ABA. References [1] and [20] contain a (much) more thorough introduction.

2.4.2 Form factor for the ρ operator

Now we can calculate the form factor of the density operator. We only present here the result,referring to [18] for the derivation.

The density operator is

ρ̂�

x�

=N∑

j=1

δ�

x − x j

, (2.105)

with�

x j

Nj=1 the positions of the particles in the gas, or in terms of fields

ρ̂(x) = Ψ†(x)Ψ(x). (2.106)

Page 25: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

2.4. FORM FACTORS 24

The form factor is [18]

⟨{µ}|ρ̂(0)|{λ}⟩=det

IN + U)

V+(p)− V−(p)

N∑

j=1

µ j −λ j

!

×

N∏

j=1

V+�

λ j

− V−�

λ j

��

N∏

j=1

N∏

k=1

λ j −λk + ic

µ j −λk

, (2.107)

with element of the matrix U given by

U jk = iµ j −λ j

V+�

λ j

− V−�

λ j

N∏

m=1m 6= j

µm −λ j

λm −λ j

C�

λ j −λk

− C�

p−λk

��

, (2.108)

and

V�

λ j

�±=

N∏

k=1

µk −λ j ± ic

λk −λ j ± ic. (2.109)

The parameter p is an arbitrary complex numbers which is not necessarily in the set ofrapidities of the states. We will generally set it to zero, since that simplifies numericalevaluation of equation (2.107) and the form factor does not depend on their values. Notethat for the Lieb-Liniger model V

λ j

�−= (V

λ j

�+)∗, since all rapidities are real in this case.

Including normalization we have for the form factors F [21]

F({µ}, {λ}) = ⟨{µ}|ρ̂(x)|{λ}⟩p

⟨{µ}|{µ}⟩ ⟨{λ}|{λ}⟩. (2.110)

If two states have the same momentum, the matrix element vanishes [18], while if {µ} = {λ}we have F = N

L .

Page 26: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

Hilbert space is a big place.

Unknown

3The ABACUS algorithm

The ABACUS algorithm, developed by Caux [16,22–24], allows for the computation of dy-namical correlation functions. Despite some analytic progress, this is still mostly a numericalaffair.

Specifically we are interested in calculating two-point zero-temperature equilibrium1

correlation functions of the form¬

O j(t)O†j′(0)

, (3.1)

where j = 1 . . . N denotes lattice sites. This is the expectation value for arbitrary timeseparations t and distances j − j′. The equivalent Fourier transformed quantity is thedynamical structure factor (DSF)

S(k,ω) =1N

N∑

j, j′=1

exp−ik( j− j′)

∫ ∞

−∞dt expiωt

¬

O j(t)O†j′(0)

. (3.2)

We can construct equation (3.1) from this given knowledge of the DSF for all energies andmomenta, but equation (3.2) is often more useful because this quantity is directly accessiblein experiments.

Calculating equation (3.2) is often still complicated, but by introducing the Fouriertransform of the operators Ok =

j exp−ik j O j, and performing the time integrations, weget the Lehmann representation of the DSF:

S(k,ω) =2πN

µ

λ0�

�Ok

�µ��

2δ�

ω− Eµ + E0

, (3.3)

where we sum over intermediate eigenstates of the Hamiltonian under consideration, withE0 the energy of the ground state

�λ0�

. We call the matrix elements ⟨λ|O|µ⟩ form factors.To calculate DSFs we require:

• an orthonormal eigenstate basis;

1At the time of writing ABACUS is also capable of calculating correlation function at arbitrary temperatureand out-of-equilibrium (having been developed further since the appearance of [2] in 2009), but we will notconsider that here further.

25

Page 27: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

3.1. NUMERICALLY SOLVING THE BETHE EQUATIONS 26

• form factors for the operator Ok we are interested in;

• a way to do the summation over the intermediate states.

The Bethe ansatz provides the first and second item. The summation proves to be moredifficult, but for this we can use ABACUS (Algebraic Bethe Ansatz-based Computation ofUniversal Structure factors) to perform it numerically. As long as a system is Bethe ansatzintegrable and finite in size ABACUS can in principle compute its DSF. Even for large systemsthe results are highly accurate, with this accuracy being mostly energy and momentumindependent [2]. In fact, the accuracy can be measured using sum rules, and saturationsabove 99% are possible for systems with hundreds of particles.

3.1 Numerically solving the Bethe equations

As we saw in the previous chapter, we need to know the rapidities of states in order tocalculate form factors and energies. In general the computation of rapidities of Betheintegrable systems is non-trivial, with complex rapidities making the computation difficult.For the Lieb-Liniger model there is a simplifying condition, since the rapidities are all realand the Yang-Yang action of the model is convex (see theorem 1), meaning there is a uniquesolution for a given set of quantum numbers.

In this case we can use the matrix Newton method to calculate rapidities. To introducethis method we first look at the one-dimensional case. If we want to find the value of x thatsolves f (x) = 0, we can iteratively approximate this value by

xk+1 = xk −f�

xk�

f ′(xk), k = 0,1, 2, . . . , (3.4)

with given starting value x0. This converges quadratically (meaning on each iteration k+ 1the error εk+1 is of order ε2

k).Now we can turn to the case at hand, where we solve the Bethe equations. These are

coupled non-linear equations of the general form f(x) = 0. In this case we can find thesolution x by iteratively applying the formula

xk+1 = xk −�

J�

xk��−1

f�

xk�

, k = 0, 1,2 . . . , (3.5)

where J is the Jacobian matrix of f and x0 ∈ Rn. As in the single variable case this convergesquadratically.

To prevent the calculation of the inverse of the Jacobian, in practice Newton’s methodtakes the form

J�

xk��

xk+1 − xk�

= −f�

xk�

, (3.6)

where, given xk, we calculate J�

xk�

and f�

xk�

, and then solve the system of linear coupledequations to get ∆x= xk+1 − xk, which is added to xk to get xk+1 [25]. This linear systemcan be solved using LU decomposition [26].2

2In an LU decomposition we rewrite the set of linear equations:

A · x= (L ·U) · x= L · (U · x) = b, (3.7)

decomposing L ·U = A in a lower and an upper triangular matrix. The advantage of this is that we can now solvethis system in two parts, first solving L · y= b, and then solving U · x= y, which is easier to do with a triangularmatrix [26].

Page 28: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

3.1. NUMERICALLY SOLVING THE BETHE EQUATIONS 27

In ABACUS we are of course interested in the Bethe equations. Here we will specificallylook at those of the Lieb-Liniger model. Recalling chapter 2, the Bethe equations in logarithmicform are

λ j +1L

N∑

l=1

φ�

λ j −λl

=2πL

I j , j = 1, . . . , N , (3.8)

or

2πL

I j −λ j −1L

N∑

l=1

φ�

λ j −λl

= B�

{λ}�

, (3.9)

with

φ�

λ j −λl

= 2 arctan

λ j −λl

c

. (3.10)

If we recast this is the shape of equation (3.6), we get

G��

λk�

∆λ= −B��

λk�

, (3.11)

where we used that the Jacobian of the Bethe equations is equal to the Gaudin matrix(equation (2.29)) [2].

Additionally, damping may be used while solving the system, with the updating of λlooking like

λk+1 = λk + γ∆λ. (3.12)

Defining the difference squared as

∆2 =1N

N∑

i=1

∆λi

λi

�2

, (3.13)

the update algorithm for γ at each iteration is

if ∆2k+1 >∆

2k and γk > 0.5 then

γk+1 = 0.5γkelse if ∆2

k+1 <∆2k then

γk+1 = 1elseγk+1 = γk

The Bethe equations for real rapidities are considered solved when [2]

∆2 < 104 × ε2, (3.14)

where ε is machine precision for the datatype double, which is “the difference between1 and the least value greater than 1 that is representable” [27]. On machines imple-menting IEEE 754–2008 (the standard for floating-point arithmetic) [28] this is equal to2−52 ≈ 2.22× 10−16.

Page 29: Summing matrix elements of the Lieb-Liniger model with … · 2020-06-23 · Charles Stross, Accelerando. 3 Abstract Correlation functions play an important role in physics, but analytical

3.2. SCANNING OVER THE HILBERT SPACE 28

3.2 Scanning over the Hilbert space

With the rapidities in hand (and the form factors and Gaudin norm readily computed there-from, see chapter 2), we are now faced with the summation over states. At face valuewe may now hit a wall, since we are tasked with exploring a factorially large space andfind eigenstates with monotonically decreasing absolute numerical form factor values. Thedimensionality of the state space is equal to the number of allowed choices of quantumnumbers because a Pauli-like principle is at work. For the Lieb-Liniger model (where allrapidities are real, and for a particle preserving operator), given N particles and a UV cutofffor the upper bound, I∞+, and lower bound of the quantum number I∞−, the number ofstates is [2]3

I∞+ + I∞− + 1N

=(I∞+ + I∞− + 1)!

N !(I∞+ + I∞− + 1− N)!. (3.15)

Luckily, there are some principles that allow for the definition of an efficient algorithm forscanning the Hilbert space.

Firstly, form factors for large enough systems are approximately continuous, meaning thechanging of a single quantum number does not change the form factor a lot. Thus, givenan eigenstate with a large form factor and thus a large contribution to the DSF, we can scanstates close to this given state to find other large contributions.

Secondly, form factors between the ground state and excited states generally decreasewhen more particle-hole excitations are created in the area in pseudo-momentum spaceclamped by the Fermi level.

Thirdly, as quantum numbers become larger, moving further from the Fermi level inquantum number space, the form factors also decrease [2].

With these principles we can start to define a protocol for scanning the Hilbert space.Starting from the lowest-energy state we generate states with low-lying particle-hole pairexcitations in the Fermi interval (low-lying meaning the states are close to the center of theinterval). If the contributions from these states to the chosen sum rule are large enough,further scanning of this state commences, with increasing complexity of the state in the formof more particle-hole pairs.

In the context above, large enough means that the form factor associated with thestate is above the so-called “running threshold”, with initial value such that the number ofintermediate states is of order N2. ABACUS saves the form factors, and the threshold islowered after a scan. A new scan is initiated of those sectors in the Hilbert space for whichthe average form factor is above the current running threshold. ABACUS thus keeps track ofthe already scanned states in the Hilbert space to gather knowledge on where to scan nextand to avoid doubly counting states [2].

In practice we are therefore descending a “mountain” of form factors as slowly as possible,always trying to move in the direction of smallest change. This proves especially difficultsince the form factors are not monotonically decreasing while moving around in the Hilbertspace. ABACUS can not fully compensate for this non-monotonicity, as seen in figure 3.1,where we show the absolute magnitude of form factors as a function of their order in thecomputation. We see both that some highly contributing form factors are missed early on,appearing as spikes in the distribution later, and we see small form factors spuriously summedearly, resulting in wasted computations.

3This is the number of states if particle Bethe numbers can not exceed the UV cutoffs. In the currentimplementation of ABACUS scanning within a single momentum slice is possible, which may include states inwhich individual particles exceed Imax, but which sum to lie within the allowed interval. Therefore this numbercan now be seen as a lower limit on the size of the Hilbert space under consideration.



Figure 3.1: Size of matrix element magnitudes encountered while summing intermediate states of the XXZ model. The yellow line denotes strictly decreasing magnitude. Note that while in general the size of the form factors decreases, not all large form factors are found early on. Figure taken from [2].

After an ABACUS run, the available data consist of a set of energy-momentum-form factor triplets. The quality of these data, meaning how well they capture the important contributions from the Hilbert space, can be determined with sum rules. A suitable candidate is the f-sum rule [23], which relates the first energy moment of the DSF to a function of k. With this sum rule the intensity in individual momentum slices can be explicitly determined. For the density-density correlation function of the Lieb-Liniger model this becomes

\int_{-\infty}^{\infty} \frac{\mathrm{d}\omega}{2\pi} \, \omega \, S(k, \omega) = \frac{1}{N} \sum_{\mu} (E_\mu - E_0) \left| \langle \lambda_0 | \rho | \mu \rangle \right|^2 = \frac{N}{L} k^2. \qquad (3.16)

Appendix B contains a full derivation of the generic formula, as well as of the specific case above.
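To make the role of the sum rule concrete, here is a minimal sketch of how the saturation in one momentum slice could be computed from the stored triplets (the function name and array layout are our assumptions, not the ABACUS interface):

    import numpy as np

    def fsum_saturation(energies, ff_squared, e0, n, length, k):
        """Fraction of the f-sum rule (3.16) exhausted by the summed
        states in a single momentum slice k, for the Lieb-Liniger
        density operator."""
        lhs = np.sum((energies - e0) * ff_squared) / n
        rhs = (n / length) * k**2
        return lhs / rhs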

Figure 3.2 shows the DSF in energy-momentum space for the density-density correlator of the Lieb-Liniger model obtained from ABACUS runs. In table 3.1 we see the efficiency of ABACUS. It presents the number of states necessary to reach a given saturation in the XXZ



Figure 3.2: Plots of the dynamical structure factor of the Lieb-Liniger model for different values of the interaction parameter γ. In this case we have L = 100 and N = 100. Figure taken from [23].

         N = 50                N = 100               N = 200               N = 400
90%      235 (4.9×10^-12)      944 (6.8×10^-26)      8772 (5.3×10^-54)     117373 (3.5×10^-111)
95%      385 (8.2×10^-12)      2492 (1.8×10^-25)     27096 (1.6×10^-53)    573466 (1.7×10^-110)
99%      1515 (3.2×10^-11)     18490 (1.3×10^-24)    469120 (2.8×10^-52)
99.9%    7478 (1.6×10^-10)     141560 (1.0×10^-23)
99.99%   26724 (5.7×10^-10)    932706 (6.8×10^-23)

Table 3.1: Number of states required for a given sum rule saturation in the XXZ model with Δ = 0.6 and S^z_tot = 0.1. In parentheses is the fraction of the Hilbert space this represents. Table reproduced from [2].

model⁴ and the fraction of the Hilbert space necessary to reach this saturation. Although the algorithm is not perfect, only small fractions are necessary.

⁴The XXZ model is another Bethe ansatz integrable model, consisting of spins, with Hamiltonian [11]

H = J \sum_{n=1}^{N} \left( S^x_n S^x_{n+1} + S^y_n S^y_{n+1} + \Delta S^z_n S^z_{n+1} \right). \qquad (3.17)


Real stupidity beats artificial intelligence every time.

Terry Pratchett, Hogfather

4 Machine learning techniques

4.1 Machine learning in physics

In the past decade machine learning has emerged as an important tool in many areas. The increase in computing power for both CPUs and GPUs (useful for the parallel linear algebra often used in machine learning) has led to a boom in research on both the commercial and the academic side. Some of the more high-profile examples of this are self-driving cars (in development by many technology and car companies) [29] and the development of AlphaGo, which beat the world champion in the board game Go by learning from both human professional games and playing against itself [5–7]. In the life sciences, machine learning uses include detecting risk factors for cardiovascular diseases from pictures of the retina [30] and recognizing genome variants in sequencing (with better performance than handcrafted sequencers) [31]. Image manipulation is also benefiting from machine learning, with the ability to change the weather in pictures or transfer the species of the animal in one picture to that in another [32].¹

In physics, applications have also been found in a variety of fields. CERN is exploring ways of tagging particles in LHC collisions using deep learning [33]. Astronomy uses machine learning to classify objects in images and determine redshifts, tasks which have to be automated due to the often large data sets resulting from even a single night of observations [34].

Condensed matter physics is now also benefiting from this field. Recently, Carleo and Troyer [35] proposed a way to find the ground state wave function of spin systems by approximately encoding the state of a system in a neural network. This allows for accurate approximations of the ground state energy while avoiding the exponential number of parameters encountered when representing the wave function. This technique has been extended to different systems such as the Hubbard model [36]. Phase transitions and order parameters have also been extracted from both static and periodically driven systems, using only measurable quantities such as the magnetization of individual spins [37].

Reinforcement learning has been used to design setups for optical experiments which need complex entangled states. Since the setups are often difficult to design, reinforcement

¹The implications of this brave new world in which no picture is to be trusted are part of a discussion that is only just beginning.




Figure 4.1: A cat. While this image is instantly recognizable to a human, translating such recognition to a computer proves non-trivial. Image courtesy of E. W. Zwart.

learning is used, where the computer receives a reward if it creates an ‘interesting’ state [38].

Interesting links also exist between deep learning and physics on a more fundamental level. For example, an exact mapping was established between the variational renormalization group and deep learning, implying that deep learning works by extracting relevant data features through coarse-graining, similar to the renormalization group [39].

In the rest of this chapter we will review some aspects of machine learning useful for solving Bethe ansatz solvable models.²

4.2 Machine learning: the basics

Today’s world is awash in data, and with the increase in computational power that has swept the world we are now being flooded by it. Making sense of this flood is becoming increasingly important and requires us to find patterns in the chaos. For numbers this is relatively easy: if we have an input set (1, 2, 3) and output set (1, 4, 9), some basic mathematics shows that the function f(x) = x² links the inputs and outputs. But what about more complex data? A picture of a cat (figure 4.1) may be instantly recognized by a human, but translating this recognition to a computer is non-trivial. Which pixels encode the “catness” of a picture as opposed to its “dogness”? Manually writing such a function proves difficult, with the resulting function performing poorly, but there is a way of doing this in the form of neural networks.

4.2.1 Neural networks

Neural networks allow for the efficient approximation of functions which map inputs to outputs. Given an input x and an output t (for target), a neural net f acts as f(x) ≈ t.

A neural net consists of layers, with each layer composed of nodes. Connections exist between these layers. The layers between the input and the output layer are called hidden. We can now construct a simple neural network with a single hidden layer.

²Note that we always talk here about applying machine learning to quantum mechanics, not about machine learning on quantum computers (see [38]).



If we have an input x of size D to a network, the first layer looks like

a_j = \sum_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0}, \qquad (4.1)

with j = 1, . . . , M and the superscript (1) denoting the first layer. We call a_j the activation.

The parameters w^{(1)}_{ji} are called weights, with the w^{(1)}_{j0} being called biases. Equation (4.1) constructs M linear combinations of the input. These linear combinations are then transformed with a differentiable, nonlinear activation function h(a_j) [40]. The nonlinearity is key here, because it allows the network to act as a general function approximator [41].

We can construct a second layer in the same manner:

a_k = \sum_{j=1}^{M} w^{(2)}_{kj} h(a_j) + w^{(2)}_{k0}, \qquad (4.2)

with k = 1, . . . , K and K the total number of outputs. Then an appropriate activation function is applied to the a_k's to generate the network outputs y_k. The problem we are trying to solve determines the appropriate output activation function. For regression (finding continuous variables) the output would be linear (y_k = a_k), while for classification we can use a logistic sigmoid function y_k = σ(a_k), with

\sigma(a) = \frac{1}{1 + e^{-a}}. \qquad (4.3)

Taking it all together we get a network that looks like (using a sigmoid as the last activation)

y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=1}^{M} w^{(2)}_{kj} \, h\left( \sum_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0} \right) + w^{(2)}_{k0} \right), \qquad (4.4)

with w the set of all weight and bias parameters. Evaluation of equation (4.4) is known as forward propagation and the network is a feedforward network: there are no layers which feed information back to previous layers.

The bias parameters can be absorbed into the weight parameters by defining an additional input variable x_0 = 1, resulting in

y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=0}^{M} w^{(2)}_{kj} \, h\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right). \qquad (4.5)
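Equation (4.5) is compact enough to implement directly. A minimal NumPy sketch of forward propagation through this two-layer network, with tanh as the hidden activation (the array shapes and names are our choice):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x, w1, w2):
        """Forward propagation, equation (4.5).
        x:  input vector of size D
        w1: (M, D + 1) first-layer weights, column 0 holding the biases
        w2: (K, M + 1) second-layer weights, column 0 holding the biases
        """
        x = np.concatenate(([1.0], x))   # absorb the bias: x_0 = 1
        z = np.tanh(w1 @ x)              # hidden outputs h(a_j)
        z = np.concatenate(([1.0], z))   # bias unit for the second layer
        return sigmoid(w2 @ z)           # network outputs y_k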

Now we get to training the model, or, in other words, tuning the network weights in such a way that the network can link inputs to outputs. The ultimate goal here is to get the network to generalize, providing useful predictions for data not present in the training set. If we have a training set {x_n} and target set {t_n}, with n = 1, . . . , N, during training we minimize the error function³

E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(\mathbf{x}_n, \mathbf{w}) - t_n \right)^2. \qquad (4.6)

³This is only one possible choice for the error function. For regression, other choices include the mean square error and the absolute error.



Note that a problem that can occur during training is overfitting to the training set, meaning the error is small but the network does not perform well when given new samples. To detect this and be able to mitigate it, we can use a validation set separate from the training set on which we determine the error [40].

Training now consists of finding a weight vector w which minimizes equation (4.6). If we make a small step in weight vector space, w + δw, the resulting change in the error function is δE ≈ δwᵀ∇E(w). The error function is a smooth continuous function, so its minimum occurs when ∇E(w) = 0. Locating such a minimum is non-trivial due to the nonlinear nature of the error function. Points with ∇E(w) = 0 are often local minima, and in a two-layer network with M hidden units as described above, any point in weight space is part of a set of M!2^M equivalent points. Finding an exact solution for the minimum is thus unfeasible, prompting instead the use of iterative approaches of the form

\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} + \Delta\mathbf{w}^{(\tau)}, \qquad (4.7)

with τ labelling the iteration step.

A simple way to perform this iterative optimization is through gradient descent. Updates move the weights in the direction of the negative gradient:

\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E\left( \mathbf{w}^{(\tau)} \right). \qquad (4.8)

The parameter η > 0 is known as the learning rate.⁴ Because the error is defined over the entire training set, at each iteration the entire set has to be processed when evaluating ∇E. This is inefficient and means that training can only occur on an existing data set which cannot change during training. An online method exists where the error consists of a sum of errors on each data point:

E(\mathbf{w}) = \sum_{n=1}^{N} E_n(\mathbf{w}). \qquad (4.9)

Such stochastic gradient descent updates the weights for one data point at a time:

\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n\left( \mathbf{w}^{(\tau)} \right). \qquad (4.10)

An advantage of this method is that training is less susceptible to local minima, since a minimum of the entire data set is rarely the minimum of a single data point.

As presented, stochastic gradient descent only works for a single linear equation. Here we are dealing with nonlinear networks of multiple layers. The answer to updating the parameters comes in the form of error backpropagation.

In each layer we compute the activation

a_j = \sum_{i} w_{ji} z_i, \qquad (4.11)

with z_i the output of the previous layer. After we forward propagate the input data to generate an output, we can use gradient descent, for which we require the derivatives of the error function. We have (noting that the biases have been absorbed into the weights)

\frac{\partial E_n}{\partial w_{ji}} = \frac{\partial E_n}{\partial a_j} \frac{\partial a_j}{\partial w_{ji}}. \qquad (4.12)

⁴Parameters such as η are known as hyperparameters and influence the model during training.



Introducing \delta_j = \partial E_n / \partial a_j and noting z_i = \partial a_j / \partial w_{ji} (equation (4.11)) gives

\frac{\partial E_n}{\partial w_{ji}} = \delta_j z_i. \qquad (4.13)

This means that to evaluate derivatives throughout the network, only the values of δ_j and z_i in each layer are required. If the output layer has linear activation functions, y_k = \sum_i w_{ki} x_i, and the error is

E_n = \frac{1}{2} \sum_{k} \left( y_{nk} - t_{nk} \right)^2, \qquad (4.14)

we have δ_k = y_{nk} − t_{nk}.

For hidden layers we must take into account all the nodes in the next layer to which a node passes information during forward propagation. We get

\delta_j = \frac{\partial E_n}{\partial a_j} = \sum_{k} \frac{\partial E_n}{\partial a_k} \frac{\partial a_k}{\partial a_j}, \qquad (4.15)

and through substitution we get

\delta_j = \left. \frac{\partial h(x)}{\partial x} \right|_{x = a_j} \sum_{k} w_{kj} \delta_k. \qquad (4.16)

With these equations neural networks can be trained [40].
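For the two-layer network above, equations (4.10) to (4.16) amount to only a few lines. A minimal sketch of a single stochastic gradient descent step for a tanh hidden layer and a linear output layer, with the squared error (4.14); biases are omitted for brevity and the names are ours:

    import numpy as np

    def sgd_step(x, t, w1, w2, eta=0.01):
        """One backpropagation/SGD update, equations (4.10)-(4.16)."""
        a1 = w1 @ x                   # hidden pre-activations a_j
        z = np.tanh(a1)               # hidden outputs z_j = h(a_j)
        y = w2 @ z                    # linear outputs y_k
        delta2 = y - t                # output deltas, from (4.14)
        # hidden deltas (4.16); h'(a) = 1 - tanh(a)^2 for tanh
        delta1 = (1 - z**2) * (w2.T @ delta2)
        w2 -= eta * np.outer(delta2, z)   # dE_n/dw_kj = delta_k z_j (4.13)
        w1 -= eta * np.outer(delta1, x)
        return w1, w2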

4.3 Reinforcement learning

The machine learning techniques we have seen until now all rely on large data sets to function. They use expert data to train, and this (hopefully) allows for generalization to new data. Often the acquisition of expert data is difficult or impossible. In these cases reinforcement learning comes into play. In reinforcement learning a model interacts with the environment and receives rewards, changing over time to maximize the reward received. The algorithm we will consider here is Q-learning.

4.3.1 Q-learning

In Q-learning we learn an action-value function Q, which tells us which action is the most valuable to take from any given state. Q-learning has the advantage of being off-policy, meaning that the optimal action-value function q∗ is approximated independently of the policy followed during training. As long as states are visited and the Q-function is updated, q∗ is approximated.

Q-learning is a form of temporal-difference learning, looking at the reward of the next step to make an estimate of the optimal policy. It has the advantage that it has to look forward only a single time step to update its estimate of the optimal policy to follow. This is a form of bootstrapping: at every step the guess for the best action to take is based on a guess for what the best action in the next state is.

Q-learning is also model-free, meaning that it does not know how the environment the agent is interacting with changes based on its actions. All knowledge about the environment comes solely through rewards.



In Q-learning, at any given step we perform the update

Q(s_t, a_t) \leftarrow (1 - \alpha) Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) \right]. \qquad (4.17)

This means that at every time step we update the value of Q by some amount (modulated by the step size α). We select an action a_t to take, and observe the new state s_{t+1} and the reward r_{t+1} received. Then we determine the highest Q-value for the new state s_{t+1}, and use these values to update our original estimate of Q(s_t, a_t). The parameter γ acts as a discount on future rewards [41].
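In the tabular setting, the update (4.17) is a single line; a minimal sketch, using the hyperparameter values α = 0.01 and γ = 0.975 adopted in chapter 5 (the table layout is our choice):

    import numpy as np

    def q_update(Q, s, a, r, s_next, alpha=0.01, gamma=0.975):
        """Tabular Q-learning update, equation (4.17).
        Q maps each state to an array with one entry per action."""
        target = r + gamma * np.max(Q[s_next])
        Q[s][a] = (1 - alpha) * Q[s][a] + alpha * target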

A problem unique to reinforcement learning is the trade-off between exploration and exploitation. An agent has to exploit the high-reward actions it has discovered over time, but also has to explore the action space to find high-reward actions. This is an unsolved problem. We will use an ε-greedy selection algorithm to ensure exploration, taking a random action a fraction ε of the time and following the agent's guess for the best action during the remaining 1 − ε fraction [41].

The simplest problems to which Q-learning applies are tabular, meaning that we can make a table of all states and actions, and enumerate the reward for every combination. The problem with the tabular method is that exhaustively tabulating every possible state and action is often impossible. The state space may be factorially large, meaning that almost every state will never have been encountered before. This means that we have to generalize to new states from previous experience. Here we can turn to function approximators like neural networks (section 4.2.1). We require neural networks which can deal with incremental data (such as via stochastic gradient descent), since we update at each step. Note that there is no proof of convergence to an optimal solution for this method as there is for the tabular method [41, 42].

Initialize Q(s; w) with random weights
 1: for epoch = 1, M do
 2:   for t = 1, T do
 3:     Select a random action a_t with probability ε,
 4:     otherwise select a_t = arg max_a Q(s_t; w)
 5:     Execute action a_t and observe the reward and state s_{t+1}
 6:     Make a copy y of Q
 7:     if s_t is a terminal state then
 8:       y_j = (1 − α) Q(s_t; w) + α r_j
 9:     else
10:       y_j = (1 − α) Q(s_t; w) + α [ r_j + γ max_{a'} Q(s_{t+1}; w) ]
11:     Perform gradient descent on ( y_j − Q(s_t; w) )²

Algorithm 1: Q-learning

The algorithm we use is a combination of what is presented in [41] (the step parameter α) and deep Q-learning from [3]. Note that we do not use the experience replay of [3], which is a queue of previous (state, action, reward, new state) tuples from which random selections are used for training. Algorithm 1 shows the algorithm. During each time step we choose a move according to an ε-greedy policy. Then the chosen action is taken, the new state is reached and the reward is observed. In a copy of Q, which we denote y, we update the entry corresponding to the taken action, and then update the neural network to better match the optimal policy we want to approximate. This gives the algorithm its off-policy nature.



Note that the error is effectively a minimization of the rescaled change to the Q-value:

y_j - Q(s_t) = \alpha \left[ r_j + \gamma \max_{a} Q(s_{t+1}) - Q(s_t) \right]. \qquad (4.18)

The parameter w denotes the weights of the neural network. We also use the trick employed by [3] of using a network with an output over all actions, as opposed to requiring a network evaluation for every action, to improve efficiency.

Finally, we should mention the credit assignment problem. When playing games such as chess or Go, where a reward is only given at the end of a match (win, loss, draw), assigning credit to a given move for enabling victory is difficult. A move that was crucial to winning may have been played early in the game, and encoding this into the network is difficult [41]. Due to the nature of the reward function we will use later, this problem is not so prevalent in the current work.


If you torture the data long enough, it will confess anything.

Ronald Coase

5 Methods and results

Now that we have introduced the necessary theoretical tools, we can combine them in a way applicable to integrable models. We start with simple regression to find rapidities and then turn to exploring the state space of the Lieb-Liniger model. All neural networks and code were written in Python, with the neural networks using the Keras framework [43] with a TensorFlow backend [44].

5.1 Regressing rapidities

As discussed (chapter 2), rapidities play a central role in Bethe ansatz integrable models. They form the basis for calculations of form factors and of quantities such as the energy of a system.

We have also seen, in section 3.1, how rapidities may be found numerically using the (multidimensional) Newton method. This method converges quadratically, but requires an initial guess for the rapidities. Here we may find an improvement by way of machine learning.

In the naive case we base the initial values for the Newton method on a simplified first-order version of the Bethe equations (equation (2.16)):

\lambda_j = \frac{2\pi}{L} I_j. \qquad (5.1)

We expect the number of iterations required for convergence of the Newton method to be reduced if the initial values are closer to their true values. For this we use a neural network to act as a function approximator, feeding in an array of N Bethe numbers and outputting an array of N rapidities.

The methodology is as follows: we generate a data set of 20000 sets of Bethe numbers of length N and calculate their corresponding rapidities, for different values of N. The Bethe numbers are randomly drawn from a normal distribution centered on zero, with a variance of N (this corresponds to biasing the generated states to be close to the ground state). We do not use an additional UV cutoff.

The generated data is used for training a neural network consisting of an input layer of size N, two hidden layers of size N² and an output layer of size N. Apart from the output layer,





Figure 5.1: Predicted and exact rapidities for the Lieb-Liniger model with c = 1, L = N = 20.

all layers use a hyperbolic tangent activation function. This architecture was arrived at by experimentation. More complex networks did not show improvements in rapidity estimation and took longer to train and to evaluate. We minimize the mean square error.
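A minimal Keras sketch of this architecture (layer sizes and activations as described above; the choice of optimizer is our assumption, since the text does not fix it):

    from keras.models import Sequential
    from keras.layers import Dense

    def build_rapidity_network(n):
        """Regressor mapping N Bethe numbers to N rapidities: two tanh
        hidden layers of size N^2 and a linear output layer."""
        model = Sequential([
            Dense(n**2, activation='tanh', input_shape=(n,)),
            Dense(n**2, activation='tanh'),
            Dense(n),  # linear activation for regression
        ])
        model.compile(optimizer='adam', loss='mean_squared_error')
        return model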

5.1.1 Results

We compare four different ways of determining initial values: using equation (5.1) (denoted “Naive Bethe”), using the neural network described above (denoted “ML”), using a vector of random numbers drawn from a normal distribution with mean zero and variance N (“Random”), and such a random vector with its entries sorted (“Random sorted”).

We also compare all these methods with damping (equation (3.12)) both enabled and disabled. We use systems with c = 1 and L = N = 5, 10 and 20. Of the 20000 data points, 19000 are used for training and 1000 for validation over 100 training epochs.

We observe that the neural network learns the structure of the rapidities reasonably well. Notably, it learned part of theorem 3: if I_j > I_k, then λ_j > λ_k (see figure 5.1).

The real question is whether the initial guesses of the neural network are better than those of the Naive Bethe approach. Comparing the different initial guess methods in figure 5.2, we see this is the case. We compare the average number of iterations to convergence over 1000 sets of Bethe numbers for each method. A lower number of iterations is better, with a fractional number of iterations possible due to averaging. The difference between Naive Bethe and ML is most pronounced for N = 5, being on average 0.95 iterations. This difference decreases to 0.43−0.45 iterations for N = 10 and to 0.07−0.09 iterations for N = 20. This decrease is probably due to the inability of the neural network to capture the increasing number of cross interactions of the rapidities with increasing system size. Compared to random initialization (either sorted or unsorted), both Naive Bethe and ML perform better. We see that sorting random initial values improves convergence, probably because a sorted list of monotonically increasing initial values captures the basic structure of the rapidities. Damping does not appear to have a large impact on convergence, and from here on we only consider undamped rapidity calculations.

When we compare the time until convergence, a different picture emerges. Looking at



figure 5.3 we see that the ML method is consistently slower than Naive Bethe, although the difference decreases as N increases. This is probably because during Newton iterations we use LU decompositions, which have an algorithmic complexity O(N³) [26]. For large N this step outweighs the increased complexity introduced by the neural network compared to a naive guess, because the number of parameters determines the number of calculations necessary for evaluating the neural network and is of order O(N²). This also explains why the random initializations are generally slower: the increased number of iterations, each with its associated O(N³) operations, makes reaching convergence take longer. Note that the durations in figure 5.3 for machine learning do not include the time required for data set generation and training, making the actual times required to get predictions for the rapidities longer than those reported. Machine learning thus does not aid in determining rapidities more efficiently.

The raw values reported in figures 5.2 and 5.3 are found in appendix C.

5.2 Walking through state space

We now turn to exploring the state space of Bethe ansatz integrable models using machine learning. Given a starting point in state space (usually the ground state), we want to explore the state space in the most efficient way possible.

Recalling the Lehmann representation of the dynamical structure function (DSF, equation (3.3))

S(k, \omega) = \frac{2\pi}{N} \sum_{\mu} \left| \langle \lambda_0 | O_k | \mu \rangle \right|^2 \delta\left( \omega - E_\mu + E_0 \right), \qquad (5.2)

we note that if we want to evaluate the above expression, we in principle have to sum over the entire Hilbert space of the model in question (potentially a space of infinite size) to capture all behaviour of the DSF. Due to the impracticality of this we instead consider states that contribute a lot to the above sum. In general these are states with a large matrix element. More precisely, they are states with large contributions to the f-sum rule (see equation (3.16)):

\frac{1}{N} \sum_{\mu} \left( E_\mu - E_0 \right) \left| \langle \lambda_0 | \rho | \mu \rangle \right|^2 = \frac{N}{L} k^2, \qquad (5.3)

where we have specialized to the case of the density operator ρ = Ψ†Ψ for the Lieb-Liniger model. This allows us to determine which fraction of states has been summed in a given momentum slice k.

At present ABACUS implements the scanning of the Hilbert space in the form of a large function of approximately 600 lines, using heuristics and the method outlined in chapter 3 to capture important states as well as possible. Looking at figure 3.1, we see that the summation is not yet optimal: some important states are not captured early in the summation. This results in a lower than possible saturation of the f-sum rule for a given running time of ABACUS.

Thus, we turn to machine learning to scan through the Hilbert space in a hopefully more efficient way. This also moves complexity from the handcrafted function into a neural network. While the neural network is more opaque in how it arrives at solutions, it may still present a more systematic way of arriving at an algorithm.

We use reinforcement learning as a way to iteratively arrive at an optimal state summation, inspired by the recent successes of Google DeepMind in playing Go [5, 6], chess and shogi [7], and 1980s computer games [3, 4]. We use the Lieb-Liniger model and find the DSF of the density-density correlation function. These methods are especially notable because




Figure 5.2: Number of Newton iterations until convergence for N = 5, 10, 20, averaged over 1000 sets of Bethe numbers. The black bars denote standard errors.




Figure 5.3: Time in seconds until convergence for N = 5, 10, 20. We average over 10 runs of 1000 sets of Bethe numbers. Only undamped methods are considered. The black bars denote standard errors.



they do not require handcrafting the inputs to the neural networks, but learn higher-level representations from scratch. Specifically, we will use Q-learning as in the last two references. We refer to chapter 4 for a general explanation of Q-learning and focus here only on the specifics relevant to Hilbert space scanning of the Lieb-Liniger model. Note that the learning we use is tabula rasa, meaning that there is no pretraining on a known data set and all learning occurs from scratch.

5.2.1 State and move representation

When using neural networks, we need a concrete representation of the state. A vector of Bethe numbers has a resolution that is too low, as it hides some of the degrees of freedom of the system, notably not covering the holes that appear between particles. Instead we use a binary encoding for Lieb-Liniger model states, mapping the Bethe numbers to a vector where 1s represent particles and 0s represent empty locations in Bethe number space. Because the vector has to have a finite size, this representation does force the use of a UV cutoff Imax, above which there are no Bethe numbers and which cannot be occupied. States in binary representation are then vectors of length Nworld = 2Imax + 1. For example, the ground state of the N = 3 system with Imax = 5 is¹

\left( 0 \; 0 \; 0 \; 0 \; 1 \; 1 \; 1 \; 0 \; 0 \; 0 \; 0 \right). \qquad (5.4)
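Converting a set of (integer, odd-N) Bethe numbers into this binary representation is a matter of shifting each Bethe number by Imax; a minimal sketch:

    import numpy as np

    def to_binary(bethe_numbers, i_max):
        """Binary occupation vector of length 2*i_max + 1, with 1s at
        the occupied Bethe numbers (odd N, so integer Bethe numbers)."""
        state = np.zeros(2 * i_max + 1, dtype=int)
        for i in bethe_numbers:
            state[i + i_max] = 1
        return state

    # Ground state of N = 3 with i_max = 5, reproducing (5.4):
    print(to_binary([-1, 0, 1], 5))  # [0 0 0 0 1 1 1 0 0 0 0]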

Moves are determined by a neural network consisting of an input layer of size Nworld, two hidden layers of size ⌊N_world^{3/2}⌋ and an output layer of size N_world², with each layer having hyperbolic tangent activation functions. The network implements the idea of changing the Q-function from the form Q(s, a) to Q(s), with an output over all possible actions [3, 4]. This simplifies the decision-making process in that only a single evaluation of the neural net is required for each step, instead of an evaluation for every possible action. We calculate the loss using the mean square error.
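A minimal Keras sketch of this move network (sizes and activations as stated; the optimizer is again our assumption):

    from keras.models import Sequential
    from keras.layers import Dense

    def build_q_network(n_world):
        """Q-network mapping a binary state of length Nworld to one
        Q-value per (removal location, placement location) pair."""
        hidden = int(n_world**1.5)
        model = Sequential([
            Dense(hidden, activation='tanh', input_shape=(n_world,)),
            Dense(hidden, activation='tanh'),
            Dense(n_world**2, activation='tanh'),
        ])
        model.compile(optimizer='adam', loss='mean_squared_error')
        return model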

A move is represented in the same way as AlphaZero represents moves in chess [7], except “playing” on a one-dimensional board instead of a two-dimensional chess board. Moves are represented as a tuple, with the x-coordinate representing the location of the particle to remove and the y-coordinate being the new location on which the particle is placed (note that in Python, due to the way nested arrays are indexed, the first coordinate corresponds to the vertical axis of a matrix). Moves are selected based on the output of the neural net with the highest value, subject to the constraints that

• a move does not lead to a state that has already been visited (to prevent double counting);

• a move does not remove a particle from an unoccupied position;

• a move does not place a particle on an already occupied position;

• a new state does not have an integer momentum (L/2π) P > Imax (see section 5.2.2).

If the move with the largest Q-factor is illegal, the move with the next-highest Q-value is selected, and so on until a legal move is encountered; a sketch of this selection follows below. Note that during training we use ε-greedy state selection, with a random (legal) action being selected a fraction ε of the time and Q-network-mediated actions being chosen during the remaining 1 − ε fraction.

¹Due to rounding errors when considering the half-integer Bethe numbers associated with even N, this representation is currently only implemented for odd N, and we only consider those from now on.
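The constraints above translate into a simple masked search over the sorted network output; a minimal sketch, taking the momentum cutoff symmetrically in ±P as in section 5.2.2 (the flat move index idx = i · Nworld + j, encoding “remove at i, place at j”, is our convention):

    import numpy as np

    def select_move(q_values, state, visited, i_max):
        """Return the new state reached by the legal move with the
        highest Q-value, or None if no legal move remains."""
        n_world = len(state)
        for idx in np.argsort(q_values)[::-1]:    # moves by descending Q
            remove, place = divmod(idx, n_world)
            if state[remove] == 0 or state[place] == 1:
                continue  # no particle to remove / position occupied
            new_state = state.copy()
            new_state[remove], new_state[place] = 0, 1
            # integer momentum (L / 2 pi) P = sum of occupied Bethe numbers
            if abs(np.sum(np.flatnonzero(new_state) - i_max)) > i_max:
                continue  # total momentum outside the allowed interval
            if tuple(new_state) in visited:
                continue  # already summed: prevent double counting
            return new_state
        return None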




Figure 5.4: The number of states in each momentum slice for the Lieb-Liniger model for N = 5 and Imax = 12. Note that we dropped the prefactor 2π/L for each momentum.

We also consider what happens when we minimize the number of particle-hole pairs. To that end, we define the number of particle-hole pairs Nph as

N_{ph}(s, r) = N - \mathrm{len}\left( J(s)_{s_i = 1} \cap J(r)_{r_i = 1} \right), \qquad (5.5)

where s is the state for which to determine the number of particle-hole pairs and r is the reference state, in this case the ground state. J(s)_{s_i=1} denotes the set of indices for which s_i = 1 (likewise for r). Both the state and the reference state are in the representation of equation (5.4). Effectively this measures the number of particles that have left the Fermi interval. It has the desired properties of being 0 when r = s and N_{ph} = N when no two particles are in the same place in s and r.
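A minimal sketch of equation (5.5), for states in the binary representation of equation (5.4):

    import numpy as np

    def n_particle_hole(state, reference):
        """Number of particle-hole pairs, equation (5.5): N minus the
        number of positions occupied in both state and reference."""
        n = int(np.sum(reference))
        shared = int(np.sum(state * reference))
        return n - shared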

5.2.2 Exact results

A problem now presents itself. In the representation discussed above, the higher pseudo-momentum states are less well represented than lower momentum states, since there are more possible combinations that result in a small momentum. We see this if we consider for example a Lieb-Liniger model with c = 1, L = N = 5 and Imax = 12. This choice of parameters allows for \binom{N_{world}}{N} = \binom{25}{5} = 53130 possible combinations, concentrated around the origin. For this number of states we can feasibly generate all combinations and calculate the density operator form factors of the associated states. Doing so for all possible permutations of length 5 with individual particle momentum in the interval (−12, 12) results in a Gaussian distribution (figure 5.4). The maximum possible state momentum is given by

P_{max} = \frac{2\pi}{L} \sum_{i=0}^{N-1} \left( I_{max} - i \right) = \frac{\pi}{L} N \left( 2 I_{max} - N + 1 \right) = \frac{\pi}{L} N \left( N_{world} - N \right), \qquad (5.6)

which in this case gives (L/2π) P_max = 50. If we calculate the sum rule saturations for each momentum slice, the saturations go to 0 as (L/2π) P becomes larger (figure 5.5), falling off especially quickly for |(L/2π) P| > Imax. This is caused by the lack of states with larger momentum and proves




Figure 5.5: Maximum possible sum rule saturation per momentum slice in the Hilbert space with Imax = 12. Note how the saturations fall off sharply at around Imax. This is a result of the lack of states with the corresponding momentum. Not pictured are the saturations outside of the interval (−12, 12), which decay exponentially.

especially problematic if we calculate total sum rule saturations over all states. The higher momentum slices will suppress the total attainable sum rule saturation.

The problem can be somewhat mitigated by giving Imax two roles in the scanning. It acts both as the cutoff for the momentum of individual particles and as the cutoff on the total allowed momentum. This contrasts with ABACUS, where Imax acts only as the cutoff for the total momentum, meaning that states can in principle be built from particles with momentum far above Imax, as long as particles with approximately opposite momentum cancel each other out. Thus ABACUS, when considering states, can always reach high saturations in every momentum slice. ABACUS also has the advantage that it can scan within an individual slice. Because of this, when calculating the f-sum rule saturation we only consider states for which (L/2π) P ∈ [−Imax, Imax]. If we take all states in this interval and use N = 5, Imax = 12, the total sum rule saturation is approximately 83.5%. We will later consider this to be 100% in our given subspace.

Furthermore, we look at the sum rule and form factor contributions of states with different numbers of particle-hole excitations, as calculated with equation (5.5). Looking at figure 5.6, where we plot the mean form factor size for each possible number of particle-hole pairs, we see that it decreases exponentially as Nph increases. We see the same pattern of exponential decay when plotting the mean sum rule saturation per Nph (figure 5.7).

Even though the number of states with a given Nph increases for increasing Nph, this growth is only approximately linear (figure 5.8), meaning that only the states with Nph = 1, 2 have a large contribution. This validates the use of equation (5.5) during training to select states with large matrix elements. Note that due to the system size considered, the number of states with a given number of particle-hole pairs falls off at some point. This is strictly a consequence of the system size considered. In larger systems there are always more states with n + 1 pairs than with n pairs.




Figure 5.6: Mean square form factor per number of particle-hole pairs.


Figure 5.7: Mean sum rule saturation per given number of particle-hole pairs.

5.2.3 Numerical results

We consider six different ways of selecting states to sum:

• A neural network trained with and without checking the number of particle-hole pairs of states;

• Random state selection. In this case we set ε = 1 in the ε-greedy algorithm and do not decrease it over the course of the run.

For both methods we calculate the DSFs both with and without minimizing Nph.² The measure we use to compare methods is how many states are required to reach a given f-sum rule saturation. Due to the representation of the states and the cutoff used, direct comparison

²Note that during training, even when enabling particle-hole pair minimization, we do not use it during the forward pass of Q-learning. Doing so degrades performance.




Figure 5.8: Number of states with a given number of particle-hole pairs.

with ABACUS is difficult (section 5.2.2), but we do give some results for a rough estimate of relative performance.

The required sum rule saturation is set at 83%. In the considered subspace the maximum achievable saturation is 83.5%, due to the lower number of states with higher momenta (section 5.2.2). Therefore 83% corresponds to approximately 83/83.5 ≈ 99.5% of the total achievable sum rule saturation. We average the number of states necessary over five runs with random initial seeds. If training occurs, we do so for 25 epochs of 500 states. We set ε = 0.1 and do not change it during training. We do not start from full exploration (ε = 1) because many states will contribute little to the sum rule saturation, leading to poor results. In evaluation we also have ε = 0.1. We set the hyperparameters α = 0.01 and γ = 0.975. If convergence has not been reached after 3000 states, we terminate the scanning. Encountered states are stored for later analysis and to prevent double counting.

As an approximate measure of the average size of the matrix elements, we fit a linear trend line through the logarithm of the data. The value at the midpoint of this line acts as a proxy for the average size, allowing for a comparison between different methods.

For the reward function we do not use the obvious choice of form factors or sum rule saturation, since these measures span many decades. Instead we use

Input: F, µ, λ, Nworld
1: if |F| > 10^-5 then
2:   return |F|^{1/10}
3: else if d(µ, λ) < √Nworld then
4:   return 1 / d(µ, λ)^{1/10}
5: else
6:   return −1

where F is the form factor of the current state µ, λ is the reference state and d(µ, λ) is the distance between the current state and the reference state, defined as




Figure 5.9: Number of states needed to reach 83% sum rule saturation, averaged over five runs. The error bars denote standard errors.


Figure 5.10: Mean square form factor size proxy, averaged over five runs. The error bars denote standard errors.

Input: µ, λ
1: d = 0
2: for i = 1 to N do
3:   d = d + |I^i_µ − I^i_λ|
4: return d

In other words, d measures the sum of the distances each particle has traveled in Bethe number space from its original position in the reference state.
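A minimal Python sketch of the distance and reward functions above (the second reward branch implicitly assumes d > 0, i.e. µ ≠ λ, just as the pseudocode does):

    import numpy as np

    def distance(mu, lam):
        """Summed distance in Bethe number space between the i-th
        particles of mu and reference state lam (binary representation);
        the common shift by Imax cancels in the differences."""
        return int(np.sum(np.abs(np.flatnonzero(mu) - np.flatnonzero(lam))))

    def reward(form_factor, mu, lam, n_world):
        """Reward used during Q-learning, as defined above."""
        if abs(form_factor) > 1e-5:
            return abs(form_factor)**0.1
        d = distance(mu, lam)
        if d < np.sqrt(n_world):
            return 1.0 / d**0.1   # assumes mu differs from lam (d > 0)
        return -1.0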

Figures 5.9 and 5.10 show the results for the number of states necessary for convergence and the size of the form factors. We consider states with c = 1, L = N = 5 and Imax = 12, and sum the matrix elements of the density operator. Appendix D contains full



results. The baseline measurement we use is a random exploration of the state space, with no checks on the number of particle-hole pairs during DSF calculation; any action is accepted as long as it is valid, meaning it does not add or remove particles where that is not possible, revisit a state, or create a state with too large a momentum. Looking at figure 5.12 we see that none of the runs converge after 3000 steps. Average square form factor sizes are of order 10^{-16}, as seen in figure 5.11.

If we enforce particle-hole pair minimization, the picture changes. Now only 728 states are necessary to reach saturation, and the average form factor size is of order 10^{-7} (figure 5.14). The source of randomness here is the random selection of a state from among all states sharing the lowest value of Nph. We see that enforcing particle-hole minimization is an important measure in selecting states with large form factors. Note the plateauing of the sum rule saturation that occurs for most runs (figure 5.13). This probably happens because most of the large form factors have already been selected and the last few percentage points have to be made up by many form factors with small contributions.

Now we come to the results of using Q-learning. If Nph-minimization is used during both training and DSF calculation, we see the best results of all methods considered, with an average number of states to convergence of 545 (although again we often see plateauing, figure 5.15). The average form factor also has the largest value, at 6.53×10^{-7}, although the error is large (figure 5.16).

The most interesting result is seen when we do enable Nph-minimization during training but not during evaluation. In this case the number of states required to reach saturation is 1475, although a large error of 909 states is present and one of the runs did not saturate after 3000 states (figure 5.17). Average form factors are of order 10^{-8}, and visually comparing the results of this run (figure 5.18) with those of the random evaluation without Nph-minimization (figure 5.11) confirms this. The performance of this method confirms that learning is occurring, since there is no Nph-minimization during evaluation, meaning that all decisions are based on the information gathered during training.

That these results come from learning is further confirmed by the evaluation with particle-hole minimization disabled in training and enabled in evaluation. The average number of states required is 795 (figure 5.19), with matrix elements of order 10^{-8} (figure 5.20). This is worse than with Nph-minimization enabled in training, pointing to the possibility that the minimization leads the model to learn about states with large contributions and thus high rewards.

Finally, we come to an evaluation with Nph-minimization disabled in both training and evaluation. Here none of the five runs converge (figure 5.21), but the average form factor is of order 10^{-12} (figure 5.22), meaning learning still occurs. The poor results of this method show that using only the reward function to shape action selection does not provide enough feedback for learning.

An illustration of the learned behaviour is visible when considering the trained model of the Q-function of a system with N = 11 and Imax = 20. When evaluating the ground state (figure 5.23), we see a structure occurring in the resulting Q-matrix (figure 5.24, top). Unoccupied positions should not be selected for removal, while occupied positions should not be selected for particle placement. We see this occurring in the form of bands of high and low Q-factors. The horizontal yellow band of high Q-factors consists of actions that would take a particle from the Fermi interval and deposit it somewhere outside. In contrast, the vertical blue band corresponds to forbidden actions that would take a particle from an unoccupied position and place it on an already occupied site. These actions with two forbidden sub-actions are more strongly disfavored than the actions represented by the corner quadrants, which represent taking a particle from an unoccupied position and placing it at



another unoccupied site. The band structure is clearer when we highlight the 100 lowest (figure 5.24, lower left) and the 100 highest (figure 5.24, lower right) Q-factors.

Finally, we compare with ABACUS. When we use the same settings as above (L = N = 5, Imax = 12), we find that only 158 states are necessary for 99.5% sum rule saturation, with a form factor size proxy of 0.0009 (figure 5.25). We should note that, because ABACUS can scan within momentum slices, uniformly high saturations can be achieved (figure 5.26).

If we consider a larger system, the power of ABACUS becomes even clearer. We calculate the density-density DSF for c = 1, L = N = 11 and Imax = 40. This yields a state space of size at least \binom{81}{11} ≈ 1.2×10^{13}. The desired sum rule saturation of 99.9% (figure 5.27) is achieved with 9376 states, summed in approximately 7 seconds. This is at most a fraction 7.7×10^{-10} of the Hilbert space under consideration. The form factor size proxy has a value of 8.01×10^{-5} (figure 5.28). This shows that ABACUS is highly efficient in summing matrix elements.

In conclusion we can say that learning does occur when using Q-learning to sum matrix elements, but that this method is far outclassed by the current implementation of ABACUS. Of special note is the speed at which ABACUS operates, summing 10000 states in 7 seconds, whereas for smaller systems our method needs 20 seconds to evaluate approximately 2000 states, preceded by 600 seconds of training.




Figure 5.11: Square density operator form factors as a function of order of computation for random state selection without minimizing Nph, for a selected DSF calculation.


Figure 5.12: Saturation as a function of order of computation for random state selection without minimizing Nph, for five runs of DSF calculation.




Figure 5.13: Saturation as a function of order of computation for random state selection while minimizing Nph, for five runs of DSF calculation.


Figure 5.14: Square density operator form factors as a function of order of computation for random state selection while minimizing Nph, for a selected DSF calculation.




Figure 5.15: Saturation as a function of order of computation for Q-learning state selection while minimizing Nph during both training and DSF calculation, for five runs of DSF calculation.


Figure 5.16: Square density operator form factors as a function of order of computation for Q-learning state selection while minimizing Nph during both training and DSF calculation, for a selected DSF calculation.




Figure 5.17: Saturation as a function of order of computation for Q-learning state selection while minimizing Nph during training but not during DSF calculation, for five runs of DSF calculation. Note that one of the DSF calculation runs did not converge, reaching ∼79.4% saturation after 3000 states.


Figure 5.18: Square density operator form factors as a function of order of computation for Q-learning state selection while minimizing Nph during training but not during DSF calculation, for a selected DSF calculation.




Figure 5.19: Saturation as a function of order of computation for Q-learning state selection while not minimizing Nph during training but with minimization during DSF calculation, for five runs of DSF calculation.


Figure 5.20: Square density operator form factors as a function of order of computation for Q-learning state selection while not minimizing Nph during training but with minimization during DSF calculation, for a selected DSF calculation.




Figure 5.21: Saturation as a function of order of computation for Q-learning state selection without minimizing Nph during either training or DSF calculation, for five runs of DSF calculation.


Figure 5.22: Square density operator form factors as a function of order of computation for Q-learning state selection without minimizing Nph during either training or DSF calculation, for a selected DSF calculation.




Figure 5.23: Ground state of the Lieb-Liniger model in binary representation with N = 11 and Imax = 20. The yellow dots represent particles.


Figure 5.24: Q-matrix for the Lieb-Liniger model ground state with N = 11 and Imax = 20 (top); the horizontal axis gives the location to place a particle, the vertical axis the location to remove a particle from. In the lower left image the 100 lowest Q-factors are highlighted; the 100 highest are highlighted in the lower right image.




Figure 5.25: Square form factors for the density operator of the Lieb-Liniger model, at c = 1, L = N = 5, Imax = 12, calculated with ABACUS.


Figure 5.26: Saturation per momentum slice for the density operator of the Lieb-Liniger model, at c = 1, L = N = 5, Imax = 12, calculated with ABACUS.


5.2. WALKING THROUGH STATE SPACE 59

Figure 5.27: Saturation per momentum slice for the density operator of the Lieb-Liniger model, at c = 1, L = N = 11, Imax = 40, calculated with ABACUS. [Plot: saturation versus momentum.]

Figure 5.28: Squared density operator form factors of the Lieb-Liniger model, at c = 1, L = N = 11, Imax = 40, calculated with ABACUS. [Plot: squared form factor (logarithmic scale) versus order in computation, showing the ordered form factors and a trend line.]


I may not have gone where I intended to go, but I think I have ended up where I needed to be.

Douglas Adams, The Long Dark Tea-Time of the Soul

6 Discussion and conclusion

6.1 Discussion

The methods used in this thesis suffer from some difficulties.

6.1.1 Non-Markovian nature of DSF Hilbert space scanning

Firstly, while the Lieb-Liniger model is Markovian, the Hilbert space scanning used to calculate DSFs is not. This is because, to prevent overcounting, the entire history of previously visited states must be taken into consideration during each state selection. This breaks the property of Markov processes that all information relevant to taking the best action is present in the current state. DSF scanning is an example of a hidden-state task. Applying reinforcement learning algorithms to non-Markovian tasks (which are the norm when considering real-world control tasks) may lead to suboptimal performance [45]. But for the problem at hand the use of a history over states is essential.
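To make the role of the history concrete, the following minimal sketch (with hypothetical helpers propose_moves and score; this is not the thesis code) shows where the Markov property is broken: the transition taken from a given state depends on the ever-growing set of visited states, not on the current state alone.

    def scan_hilbert_space(initial_state, propose_moves, score, n_steps):
        # The full history must be kept to avoid double counting.
        visited = {initial_state}
        state = initial_state
        for _ in range(n_steps):
            # The candidate set, and hence the chosen transition,
            # depends on `visited`: the process is not Markovian.
            candidates = [s for s in propose_moves(state) if s not in visited]
            if not candidates:
                break
            state = max(candidates, key=score)
            visited.add(state)
        return visited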

Furthermore, the proof of convergence of the Q-learning algorithm to an optimal policy depends on the Markovian nature of the problem [42]. Note though that convergence happens only in the t → ∞ limit, and is thus of questionable relevance when Q-learning is actually used.
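For reference, the tabular update rule of [42], to which this convergence proof applies, is

\[
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],
\]

with learning rate $\alpha$ and discount factor $\gamma$; besides the Markov property, the proof requires (among other conditions) that every state-action pair be visited infinitely often, which a finite scan cannot provide.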

6.1.2 Problems with reinforcement learning

A number of issues with reinforcement learning more generally are present. One of these is that reinforcement learning is data inefficient. In [46] a study was made of performance in Atari games by combining different techniques beyond those used in [3, 4]. Even in the best case approximately 15 million frames (at 60 hertz) were required to reach human-level performance, corresponding to three days of game time. But this was already an improvement on the original work done in [4], where only one in four frames was used to improve computational efficiency and which required 38 days of game time to train on 50 million sample frames. A similar data inefficiency is seen in [47]. Although the results are impressive (an agent learns to navigate a challenging environment consisting of hurdles and gaps using only a simple reward function), training took 100 hours of wall clock time on 64 workers, for 6400 hours in total [47, Figure 1]. This is all in contrast to humans, who are often able to learn from a single example [48].


Figure 6.1: Log plot of petaflop/s-days used in training AI over time. The doubling time is 3.5 months. Figure taken from [49].

This leads to the conjecture that a well-engineered handcrafted function may outperform a reinforcement learning solution, if such a function can be created. This is the case for ABACUS, where the applicability to large systems stems from the preliminary information provided by integrability, which is built into the algorithm [2]. Learning the structure of the form factor space by reinforcement learning would thus probably take an inordinate number of samples.

State-of-the-art machine learning setups are also experiencing a large increase in the compute power they use. Data [49] compiled by the nonprofit OpenAI¹ shows an increase by a factor of 300,000 in the number of petaflop/s-days² used in training machine learning systems between 2012 and the end of 2017 (figure 6.1). This corresponds to a doubling time of 3.5 months and is the result of increased parallelism and monetary investment. This doubling time is far shorter than the 18 to 24 months predicted by Moore's law.

Another difficulty, more specific to reinforcement learning and evolutionary computing, is the design of reward functions. Reward functions are often designed to provide a simple quantitative measure which corresponds to the desired goal. The solutions which algorithms come up with are sometimes not in line with what researchers expect.

One example consisted of a simulated environment in which the goal was to evolve creatures with locomotion strategies, fitness being measured by the average ground velocity over 10 seconds. Instead of developing movement strategies, some creatures evolved to have a long rigid body, which would fall over when the simulation started, causing large average ground velocities.

Another example is seen in automated program repair, where a program automatically fixes another, buggy, program. The developer writes a set of tests which the program has to pass. GenProg, which used digital evolution to generate debugging programs, fixed a sorting algorithm by always making it return an empty list, which the tests considered sorted.

¹ OpenAI was founded with the goal of doing research in finding safe ways towards the realization of artificial general intelligence (see https://openai.com/about).

² A petaflop/s-day is defined as doing 10¹⁵ neural network operations per second for one day, or approximately 10²⁰ operations per day [49].


Solutions may also depend on bugs in the simulation platform being used, acting in a way as automated bug discovery. In a simulation in which swimming strategies were evolved, the numerical integration was inaccurate. Fast motion would lead to the accumulation of errors, which creatures exploited by twitching body parts rapidly, providing free energy. In a similar vein, buggy collision detection was used by creatures to obtain free energy by penetrating the ground. When the simulation picked this up and corrected it, the resulting upward force led to forward locomotion [50].³

Preventing such reward function hacking and undesired side effects is an emerging field of study within AI, foreshadowing the possible eventual emergence of artificial general intelligence [51]. Concurrently we are seeing a political debate develop related to the ways in which AI may be used maliciously right now [52].

Reward functions with a simple scoring also have difficulties when confronted with problems that have a clear temporal structure. An example of this can be found in the game Montezuma's Revenge, in which a tomb is traversed in search of treasure. This game has a temporal structure in that certain doors within the game are locked until the player retrieves the appropriate key. A reward function based only on score performs poorly in this game [4], because the agent is not able to causally reason about the actions required to maximize the score.⁴

We also see the difficulty of developing reward functions in the DSF scanning developed here. The obvious choice for the reward function is the absolute form factor squared for the operator under consideration. The problem with this approach comes from the many decades spanned by the form factors. This makes the rewards for many transitions small, meaning these rewards will often only register as noise.
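To make the scale problem concrete, here is a minimal sketch contrasting the raw reward with a logarithmically rescaled variant (the rescaling is purely illustrative and is not the scheme used in this thesis):

    import math

    def reward_raw(form_factor_squared):
        # Raw squared form factor: values span many decades, so most
        # transitions yield rewards indistinguishable from noise.
        return form_factor_squared

    def reward_log(form_factor_squared, floor=1e-30):
        # Illustrative alternative: compressing the decades keeps
        # small-but-relevant contributions visible to the learner.
        return math.log10(max(form_factor_squared, floor))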

Finally, reinforcement learning research is facing problems with regard to the reproducibility of findings. In [53] a systematic study was made of factors impacting performance. They found that simple changes, such as changing the activation function used in the neural networks, can have a large effect on model performance. This makes the comparison between different models difficult if those effects are not taken into account.

One source of non-determinism is the random seeds used to initialize the weights of neural networks. Figure 6.2 shows two averages for performance in the HalfCheetah environment, each taken over a set of five random seeds.⁵ The exact seeds chosen make a large difference in performance. This may lead to inaccuracy when reporting on the performance of algorithms, because reported performance is often based on the average over five runs.

Figure 6.3 shows another aspect of performance influenced by random seeds. We see the performance over time of a new exploration method developed in [55] and applied to HalfCheetah. The solid line is the median reward at each iteration, averaged over ten random seeds, with the shaded area showing the quartiles. Notice that the lowest quartile (the white area beneath the light orange shaded area) lies close to zero even after 1000 training iterations. This means that in almost a quarter of cases the performance of the trained model is poor, caused solely by the initial random seed.

³ Reference [50] contains many more examples.

⁴ There is now also progress in this field. OpenAI is working on agents which can beat professional players in the game Dota 2. Dota 2 is a five-on-five strategy game in which the goal is to destroy the other team's base. Games can last upwards of 45 minutes and require long-term strategies to win. The agent is now able to win against amateur and semi-professional players and will play against a professional team during the yearly Dota 2 tournament "The International" in August 2018. Note that this agent does suffer from some of the other problems discussed above, notably resource usage (100,000 CPUs) and data inefficiency (each training day is equivalent to 180 years of game time). It also does not learn to play directly from pixels, but uses a data interface to get the value of each of the 20,000 relevant variables in the game [54].

⁵ HalfCheetah is a research environment for reinforcement learning in which a cheetah-like creature has the goal of learning to walk.


Figure 6.2: Performance of a reinforcement learning algorithm in the HalfCheetah environment, averaged over two different sets of five random seeds each. Note the large difference in performance, even though only the seeds differ between runs. Figure taken from [53]. [Plot: "HalfCheetah-v1 (TRPO, Different Random Seeds)"; average return versus timesteps, with one curve per set of seeds.]

Figure 6.3: Reward in the HalfCheetah environment over time. The solid line is the median of ten random seeds, with the shaded area denoting quartiles. The lowest quartile (the white area below the light orange shaded area) shows that little learning takes place for those seeds. Figure taken from [55].

Finally, deep learning currently has no way of learning high-level concepts about its environment. In learning Pong [3, 4], there is no knowledge about the concepts of a "ball" or a "paddle", only an input vector of pixels. Hierarchical structures also pose a problem, leading to difficulties when dealing with areas like language [48].

6.2 Conclusion

In this thesis we considered ways in which to numerically calculate the Fourier transform of correlation functions, known as dynamical structure functions or DSFs. In the Lehmann representation DSFs are a sum over matrix elements. In Bethe ansatz integrable models these DSFs can be calculated with the ABACUS algorithm. The summation over these matrix elements is not fully optimal, meaning that matrix elements are not summed in strictly decreasing order. This results in inefficiency when considering finite compute times, since states with large contributions may be missed.

We looked at the Bethe ansatz solvable Lieb-Liniger model. This model was chosen because the parameters which define the wave function, known as rapidities, are real. Additionally, given the quantum numbers of a state, only a single solution exists. We reviewed the Bethe ansatz, finding expressions for physical properties of the Lieb-Liniger model.

After this we reviewed the current operation of ABACUS. We examined how to numerically solve for the rapidities of the Lieb-Liniger model, and gave a high-level overview of the scanning of the Hilbert space as currently implemented.

A review of the machine learning techniques used later was given. We then considered whether supervised learning could make the calculation of rapidities faster by providing better initial guesses based on Bethe quantum numbers. We found that for small systems the number of Newton-method iterations decreased with the better guesses provided by a neural network, but that this improvement diminished with increasing system size. Additionally, the computation time added by the evaluations of the neural network made the machine learning approach slower than using a naive initial guess.
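To fix ideas, here is a minimal sketch of this setup, assuming the logarithmic Bethe equations in the form $L\lambda_j + 2\sum_k \arctan((\lambda_j - \lambda_k)/c) = 2\pi I_j$ (the actual thesis code, including its damping scheme, differs):

    import numpy as np

    def bethe_residual(lam, bethe_numbers, c, L):
        # F_j = L*lambda_j + 2*sum_k arctan((lambda_j - lambda_k)/c) - 2*pi*I_j
        I = np.asarray(bethe_numbers, dtype=float)
        diff = lam[:, None] - lam[None, :]
        return L * lam + 2.0 * np.arctan(diff / c).sum(axis=1) - 2.0 * np.pi * I

    def bethe_jacobian(lam, c, L):
        diff = lam[:, None] - lam[None, :]
        kernel = 2.0 * c / (c**2 + diff**2)   # derivative of 2*arctan(x/c)
        np.fill_diagonal(kernel, 0.0)         # the k = j term is constant
        jac = -kernel
        np.fill_diagonal(jac, L + kernel.sum(axis=1))
        return jac

    def solve_rapidities(bethe_numbers, c, L, guess, tol=1e-13, max_iter=100):
        # Newton iteration; `guess` is where a trained network could
        # replace the naive free-particle guess 2*pi*I_j/L.
        lam = np.asarray(guess, dtype=float).copy()
        for n_iter in range(1, max_iter + 1):
            residual = bethe_residual(lam, bethe_numbers, c, L)
            if np.max(np.abs(residual)) < tol:
                return lam, n_iter
            lam -= np.linalg.solve(bethe_jacobian(lam, c, L), residual)
        return lam, max_iter

The returned iteration count is the quantity whose averages are reported in table C.1 of appendix C.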

We then developed a method to optimize the summation of matrix elements using Q-learning. For this we developed a binary-encoded representation of Lieb-Liniger states, as well as a way to encode state transitions in a neural network. We considered some of the limitations of this representation, notably its bias towards low-momentum states. Furthermore we considered the relationship between the number of particle-hole pairs and the size of density operator form factors.
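A minimal sketch of a binary encoding of this kind (the function name is hypothetical; compare figure 5.23 for the N = 11, Imax = 20 ground state):

    import numpy as np

    def encode_state(bethe_numbers, i_max):
        # One slot per allowed Bethe number in [-Imax, Imax];
        # occupied slots are set to 1.
        state = np.zeros(2 * i_max + 1, dtype=np.int8)
        for I in bethe_numbers:
            state[I + i_max] = 1
        return state

    # Ground state for N = 11 (integer Bethe numbers -5, ..., 5):
    ground_state = encode_state(range(-5, 6), i_max=20)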

We proceeded to deploy Q-learning to find optimal summation strategies for the density operator. We found that random state selection led to poor performance, as expected. Activating particle-hole pair minimization increased performance, leading to better results, with larger form factors being found. Results from Q-learning show that particle-hole minimization is essential for learning to select high-value states. If minimization is enabled during both training and evaluation, the best results are obtained in terms of the number of states required to reach saturation. Learning is also seen to occur when we use minimization during training but not during evaluation: while saturation takes longer than in other cases, it does occur, which hints at a strategy being developed during training. Disabling minimization during training gave poor results when it was also disabled during evaluation, with no convergence occurring; with minimization enabled during evaluation, results were better.

Finally, we considered shortcomings of Q-learning as applied to DSF summation, and of reinforcement learning in general.

6.2.1 Outlook

In this thesis we used supervised learning to offer an initial guess for the numerical evaluation of rapidities. Reinforcement learning may also be of use here. Given a set of Bethe numbers, the rapidities can be predicted, after which convergence to machine precision is achieved through Newton iterations. The number of iterations can be used as part of a reward function, with the goal of improving the rapidity predictions by minimizing the number of Newton iterations.

One of the limitations of the current approach to Q-learning is that the representation used requires a UV cutoff on the Bethe numbers to be able to represent the possible moves in Hilbert space. Furthermore, the size of the action space scales as $(I_\infty^+ + I_\infty^- + 1)^2$, meaning that action space sizes quickly become too large for efficient computation, irrespective of the number of particles involved. This problem may be solved by using an alternative representation of actions in terms of continuous changes. This has recently been done with Q-learning in continuous control applications, such as operating a simulated gripper to grab a ball, controlled only by applying torque to the joints [56, 57].

Another improvement may come from using experience replay [3, 4]. This provides a way to deal with the high correlation between consecutive states, by saving a tuple consisting of the current state, the action taken, the observed reward and the new state. Training is then performed on small batches drawn from this set of transitions. This has the advantage that data is used multiple times during training, improving efficiency. Furthermore, the correlations between consecutive states are broken, reducing the variance of updates; a minimal sketch of such a buffer is given below. We did not use experience replay here because in DSF calculation states may only be seen once to prevent double counting, which makes individual states dependent on the states that came before.
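The sketch below is in the spirit of the replay buffers of [3, 4]; the details of the actual implementations differ.

    import random
    from collections import deque

    class ReplayBuffer:
        def __init__(self, capacity):
            # Old transitions are discarded once capacity is reached.
            self.transitions = deque(maxlen=capacity)

        def push(self, state, action, reward, next_state):
            # Store the (s, a, r, s') tuple described above.
            self.transitions.append((state, action, reward, next_state))

        def sample(self, batch_size):
            # Random minibatches reuse data and break the correlation
            # between consecutive states, reducing update variance.
            return random.sample(self.transitions, batch_size)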

A further weakness in the current implementation is that only "connected states" can be reached from any given state. By this we mean states that differ from the current state under consideration by only a single particle move, as in the sketch below. This differs from ABACUS, in which a tree of states is descended and at any point any previously reached state can be used as the base for a new expansion into Hilbert space. The tree-like approach may be considered for further research.
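A minimal sketch of the move generation implied by this restriction, using the binary occupation representation described above (the helper is hypothetical; the thesis implementation differs in detail):

    def connected_states(state):
        # Yield every state reachable by moving a single particle from
        # an occupied Bethe number slot to an empty one.
        occupied = [i for i, n in enumerate(state) if n == 1]
        empty = [i for i, n in enumerate(state) if n == 0]
        for src in occupied:
            for dst in empty:
                child = list(state)
                child[src], child[dst] = 0, 1
                yield tuple(child)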

We note that the approach explored in this thesis is in principle usable for any model, not only Bethe ansatz integrable systems. This is the case as long as eigenstates and matrix elements are known, a suitable expression for the f-sum rule exists, and a representation of states and moves can be found that is usable for computation. Further research is therefore warranted in this area, since not all models have a system such as ABACUS in place to calculate DSFs.

On another front, progress in computing DSFs may be made by Moore's law and the passage of time (as well as the development of new algorithms). We are seeing an increase in the system sizes realizable with exact diagonalization, with the low-energy properties of up to 50 spin-1/2 particles now accessible exactly [58].

Furthermore, the mapping between deep learning and the renormalization group [39] should be explored further. This could provide insight into the workings of neural networks. It may also provide clues to the optimal number of layers in a network for a given input set, the determination of which currently depends on experience and experimentation.

Finally, the fields of machine learning and artificial intelligence in general are currently experiencing large changes and advances. New methods and techniques are constantly being developed and may also be applicable to the problem of DSF calculation.⁶

6.3 Acknowledgements

I would like to thank Jean-Sébastien Caux for his supervision and valuable suggestions. Thanks to Sam Kuypers for help in understanding some derivations. I would also like to thank Sam, along with the rest of the J-S master's gang (Jorran de Wit and Rebekka Koch), for always being there to have lunch with and their willingness to hear me grumble about my project. On the same note I would like to thank all the other people in the master's room, in particular Miró van der Worp, who was always organizing lectures and talks to help distract everyone from their thesis.

⁶ They may also lead to paperclip maximizers [59], uncooperative spaceship computers and Arnold Schwarzenegger being sent back in time.


To my theatre friends I would like to say: thanks for being non-physicists and for all the parties.

Finally, I would like to thank my parents, Erik and Mickey. Without their support over the past year I would never have finished this thesis. Erik, maybe I should have followed your advice: "leer een echt vak" ("learn a real trade") [60].


If this goes badly and I make a crater, I want it named after me!

Iain M. Banks, Against a Dark Background

A Boundary term of the Lieb-Liniger model

Here we explicitly derive the boundary term (equation (2.3)) of the Lieb-Liniger model [13]. We start with the Schrödinger equation

\[
\left( -\frac{\partial^2}{\partial x_1^2} - \frac{\partial^2}{\partial x_2^2} + 2c\,\delta(x_1 - x_2) \right) \psi(x_1, x_2) = E\,\psi(x_1, x_2). \tag{A.1}
\]

Substituting $x_+ = \frac{x_1 + x_2}{2}$ and $x_- = x_1 - x_2$ this becomes

\[
\left( -\frac{1}{2}\frac{\partial^2}{\partial x_+^2} - 2\frac{\partial^2}{\partial x_-^2} + 2c\,\delta(x_-) \right) \psi = E\psi. \tag{A.2}
\]

Integrating over an interval $x_- \in [-\varepsilon, \varepsilon]$ we get

\[
-\frac{1}{2}\frac{\partial^2}{\partial x_+^2} \int_{-\varepsilon}^{\varepsilon} \psi\,\mathrm{d}x_- - 2\int_{-\varepsilon}^{\varepsilon} \frac{\partial^2}{\partial x_-^2}\psi\,\mathrm{d}x_- + 2c\int_{-\varepsilon}^{\varepsilon} \delta(x_-)\,\psi\,\mathrm{d}x_- = E\int_{-\varepsilon}^{\varepsilon} \psi\,\mathrm{d}x_-. \tag{A.3}
\]

The first term on the left-hand side vanishes when taking $\lim_{\varepsilon \to 0^+}$, as does the term on the right-hand side, leaving us with

\[
-\left. \frac{\partial}{\partial x_-}\psi \right|_{x_- = -0^+}^{x_- = 0^+} + \left. c\,\psi \right|_{x_- = 0} = 0. \tag{A.4}
\]

Given that we are dealing with the bosonic case we require symmetric wave functions, $\psi(x_1, x_2) = \psi(x_2, x_1)$. This simplifies the result to

\[
\left. \left( \partial_{x_2} - \partial_{x_1} - c \right) \psi(x_1, x_2) \right|_{x_2 - x_1 = 0^+} = 0. \tag{A.5}
\]


It’s an ancient and honorable termfor the final step in any engineeringproject. Turn it on, see if it smokes.

Lois McMaster Bujold, Falling Free

B The f-sum rule

When computing the dynamical structure function numerically we need a way to measure what fraction of the relevant states has been summed. For this we can use the f-sum rule, which is a model-independent equality linking an integral over the dynamic structure function to a commutator of the Hamiltonian of the system and the operator under investigation [61]. We first derive the general formula and then find sum rules for the density operator of the Lieb-Liniger model.

Consider a lattice model where $N$ is the number of lattice sites and thus the system size. Starting with the dynamic structure function we have

\[
S(k, \omega) = \frac{1}{2N} \sum_{j, j' = 1}^{N} e^{-ik(j - j')} \int_{-\infty}^{\infty} \mathrm{d}t\, e^{i\omega t} \left\langle \left[ O_j(t), O_{j'}^\dagger(0) \right] \right\rangle, \tag{B.1}
\]

where the expectation value $\langle \ldots \rangle = \langle \gamma | \ldots | \gamma \rangle$ is over some state $\gamma$, although any averaging will work, as long as we are consistent. Defining the Fourier transform

\[
O_j = \frac{1}{N} \sum_k e^{ikj} O_k \tag{B.2}
\]

we can rewrite part of this equation:

\[
\frac{1}{2N} \sum_{j, j' = 1}^{N} e^{-ik(j - j')} \left\langle \left[ O_j(t), O_{j'}^\dagger(0) \right] \right\rangle = \frac{1}{2N^3} \sum_{j, j'} \sum_{k', k''} e^{ij(k' - k)} e^{ij'(k - k'')} \left\langle \left[ O_{k'}(t), O_{k''}^\dagger(0) \right] \right\rangle \tag{B.3}
\]
\[
= \frac{1}{2N} \sum_{k', k''} \delta_{k, k'} \delta_{k, k''} \left\langle \left[ O_{k'}(t), O_{k''}^\dagger(0) \right] \right\rangle \tag{B.4}
\]
\[
= \frac{1}{2N} \left\langle \left[ O_k(t), O_k^\dagger(0) \right] \right\rangle. \tag{B.5}
\]


Now we look at the first frequency moment:

\[
I_k = \int_{-\infty}^{\infty} \frac{\mathrm{d}\omega}{2\pi}\, \omega\, S(k, \omega) \tag{B.6}
\]
\[
= \frac{1}{2N} \int_{-\infty}^{\infty} \frac{\mathrm{d}\omega}{2\pi}\, \omega \int_{-\infty}^{\infty} \mathrm{d}t\, e^{i\omega t} \left\langle \left[ O_k(t), O_k^\dagger(0) \right] \right\rangle \tag{B.7}
\]
\[
= \frac{1}{2N} \int_{-\infty}^{\infty} \frac{\mathrm{d}\omega}{2\pi}\, \omega \int_{-\infty}^{\infty} \mathrm{d}t\, e^{i\omega t} \left( \langle\gamma| O_k(t)\, O_k^\dagger(0) |\gamma\rangle - \langle\gamma| O_k^\dagger(0)\, O_k(t) |\gamma\rangle \right) \tag{B.8}
\]
\[
= \frac{1}{2N} \int_{-\infty}^{\infty} \frac{\mathrm{d}\omega}{2\pi}\, \omega \int_{-\infty}^{\infty} \mathrm{d}t\, e^{i\omega t} \sum_\alpha \left( \langle\gamma| O_k(t) |\alpha\rangle \langle\alpha| O_k^\dagger(0) |\gamma\rangle - \langle\gamma| O_k^\dagger(0) |\alpha\rangle \langle\alpha| O_k(t) |\gamma\rangle \right) \tag{B.9}
\]
\[
= \frac{1}{2N} \int_{-\infty}^{\infty} \frac{\mathrm{d}\omega}{2\pi}\, \omega \int_{-\infty}^{\infty} \mathrm{d}t\, e^{i\omega t} \sum_\alpha \left( \langle\gamma| e^{iHt} O_k e^{-iHt} |\alpha\rangle \langle\alpha| O_k^\dagger |\gamma\rangle - \langle\gamma| O_k^\dagger |\alpha\rangle \langle\alpha| e^{iHt} O_k e^{-iHt} |\gamma\rangle \right) \tag{B.10}
\]
\[
= \frac{1}{2N} \int_{-\infty}^{\infty} \frac{\mathrm{d}\omega}{2\pi}\, \omega \int_{-\infty}^{\infty} \mathrm{d}t\, e^{i\omega t} \sum_\alpha \left( e^{-i(E_\alpha - E_\gamma)t} \langle\gamma| O_k |\alpha\rangle \langle\alpha| O_k^\dagger |\gamma\rangle - e^{i(E_\alpha - E_\gamma)t} \langle\gamma| O_k^\dagger |\alpha\rangle \langle\alpha| O_k |\gamma\rangle \right), \tag{B.11}
\]

where we used the Heisenberg relation $O(t) = e^{iHt} O e^{-iHt}$ and introduced a complete set of states $\alpha$, $1 = \sum_\alpha |\alpha\rangle \langle\alpha|$. With the identity

\[
\int_{-\infty}^{\infty} \mathrm{d}t\, e^{i(\omega - \omega')t} = 2\pi\,\delta(\omega - \omega') \tag{B.12}
\]

we can rewrite equation (B.11) as

\[
\frac{1}{2N} \sum_\alpha \int_{-\infty}^{\infty} \mathrm{d}\omega\, \omega \left( \delta\!\left( \omega - E_\alpha + E_\gamma \right) \langle\gamma| O_k |\alpha\rangle \langle\alpha| O_k^\dagger |\gamma\rangle - \delta\!\left( \omega + E_\alpha - E_\gamma \right) \langle\gamma| O_k^\dagger |\alpha\rangle \langle\alpha| O_k |\gamma\rangle \right) \tag{B.13}
\]
\[
= \frac{1}{2N} \sum_\alpha \left( E_\alpha - E_\gamma \right) \left( \langle\gamma| O_k |\alpha\rangle \langle\alpha| O_k^\dagger |\gamma\rangle + \langle\gamma| O_k^\dagger |\alpha\rangle \langle\alpha| O_k |\gamma\rangle \right) \tag{B.14}
\]
\[
= \frac{1}{2N} \sum_\alpha \left( -\langle\gamma| [H, O_k] |\alpha\rangle \langle\alpha| O_k^\dagger |\gamma\rangle + \langle\gamma| O_k^\dagger |\alpha\rangle \langle\alpha| [H, O_k] |\gamma\rangle \right) \tag{B.15}
\]
\[
= -\frac{1}{2N} \langle\gamma| \left[ [H, O_k], O_k^\dagger \right] |\gamma\rangle, \tag{B.16}
\]

where we used $H |\alpha\rangle = E_\alpha |\alpha\rangle$ and $1 = \sum_\alpha |\alpha\rangle \langle\alpha|$. We have thus derived [62]

\[
\int_{-\infty}^{\infty} \frac{\mathrm{d}\omega}{2\pi}\, \omega\, S(k, \omega) = -\frac{1}{2N} \langle\gamma| \left[ [H, O_k], O_k^\dagger \right] |\gamma\rangle. \tag{B.17}
\]


B.1 The ρ operator

We can now calculate the f-sum rule explicitly for the density operator $\rho$ when considering the Lieb-Liniger model [62]. The Hamiltonian in second-quantized form is [11]

\[
H = \int \mathrm{d}x \left( \partial_x \psi^\dagger(x)\, \partial_x \psi(x) + c\, \psi^\dagger(x) \psi^\dagger(x) \psi(x) \psi(x) \right), \tag{B.18}
\]

with commutator $\left[ \psi(x), \psi^\dagger(x') \right] = \delta(x - x')$. Applying the Fourier transform $\psi(x) = \frac{1}{L} \sum_k e^{ikx} \psi_k$, we get

\[
H = \frac{1}{L} \sum_k k^2\, \psi_k^\dagger \psi_k + \frac{c}{L^3} \sum_{k_1, k_2, q} \psi_{k_1 + q}^\dagger \psi_{k_2 - q}^\dagger \psi_{k_2} \psi_{k_1}, \tag{B.19}
\]

with commutator $\left[ \psi_k, \psi_{k'}^\dagger \right] = L\, \delta_{k, k'}$. The dynamic structure function in this case becomes

\[
S(k, \omega) = \frac{1}{2L} \int_0^L \mathrm{d}x\, \mathrm{d}x'\, e^{-ik(x - x')} \int_{-\infty}^{\infty} \mathrm{d}t\, e^{i\omega t} \left\langle \left[ \rho(x, t), \rho(x', 0) \right] \right\rangle, \tag{B.20}
\]

with the Fourier transform of the density operator being

\[
\rho_k = \int_0^L \mathrm{d}x\, e^{-ikx} \rho(x) \tag{B.21}
\]
\[
= \int_0^L \mathrm{d}x\, e^{-ikx}\, \psi^\dagger(x) \psi(x) \tag{B.22}
\]
\[
= \frac{1}{L^2} \int_0^L \mathrm{d}x \sum_{k', k''} e^{ik''x - ikx - ik'x}\, \psi_{k'}^\dagger \psi_{k''} \tag{B.23}
\]
\[
= \frac{1}{L} \sum_{k_1} \psi_{k_1}^\dagger \psi_{k_1 + k}. \tag{B.24}
\]

Adapting equation (B.17) to the Lieb-Liniger model, the f-sum rule becomes

\[
\int_{-\infty}^{\infty} \frac{\mathrm{d}\omega}{2\pi}\, \omega\, S(k, \omega) = -\frac{1}{2L} \left\langle \left[ [H, \rho_k], \rho_{-k} \right] \right\rangle. \tag{B.25}
\]

Calculating the right-hand side of the sum rule, we start with the commutator:

\[
[H, \rho_k] = \frac{1}{L} \sum_{k_1} \left( \frac{1}{L} \sum_{k_2} k_2^2 \left[ \psi_{k_2}^\dagger \psi_{k_2}, \psi_{k_1}^\dagger \psi_{k_1 + k} \right] + \frac{c}{L^3} \sum_{k_3, k_4, q} \left[ \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_4} \psi_{k_3}, \psi_{k_1}^\dagger \psi_{k_1 + k} \right] \right). \tag{B.26}
\]

The first commutator in equation (B.26) is

\[
\left[ \psi_{k_2}^\dagger \psi_{k_2}, \psi_{k_1}^\dagger \psi_{k_1 + k} \right] = \psi_{k_2}^\dagger \psi_{k_2} \psi_{k_1}^\dagger \psi_{k_1 + k} - \psi_{k_1}^\dagger \psi_{k_1 + k} \psi_{k_2}^\dagger \psi_{k_2} \tag{B.27}
\]
\[
= \psi_{k_2}^\dagger \left[ \psi_{k_2}, \psi_{k_1}^\dagger \right] \psi_{k_1 + k} + \psi_{k_2}^\dagger \psi_{k_1}^\dagger \psi_{k_2} \psi_{k_1 + k} - \psi_{k_1}^\dagger \left[ \psi_{k_1 + k}, \psi_{k_2}^\dagger \right] \psi_{k_2} - \psi_{k_1}^\dagger \psi_{k_2}^\dagger \psi_{k_1 + k} \psi_{k_2} \tag{B.28}
\]
\[
= L\, \delta_{k_1, k_2}\, \psi_{k_2}^\dagger \psi_{k_1 + k} - L\, \delta_{k_1 + k, k_2}\, \psi_{k_1}^\dagger \psi_{k_2}. \tag{B.29}
\]


The second commutator of equation (B.26) is

\[
\left[ \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_4} \psi_{k_3}, \psi_{k_1}^\dagger \psi_{k_1 + k} \right] = \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_4} \psi_{k_3} \psi_{k_1}^\dagger \psi_{k_1 + k} - \psi_{k_1}^\dagger \psi_{k_1 + k} \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_4} \psi_{k_3} \tag{B.30}
\]
\[
= \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \left[ \psi_{k_4} \psi_{k_3}, \psi_{k_1}^\dagger \right] \psi_{k_1 + k} + \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_1}^\dagger \psi_{k_4} \psi_{k_3} \psi_{k_1 + k} - \psi_{k_1}^\dagger \left[ \psi_{k_1 + k}, \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \right] \psi_{k_4} \psi_{k_3} - \psi_{k_1}^\dagger \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_1 + k} \psi_{k_4} \psi_{k_3} \tag{B.31}
\]
\[
= \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \left[ \psi_{k_4} \psi_{k_3}, \psi_{k_1}^\dagger \right] \psi_{k_1 + k} - \psi_{k_1}^\dagger \left[ \psi_{k_1 + k}, \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \right] \psi_{k_4} \psi_{k_3}, \tag{B.32}
\]

where the second and fourth terms of (B.31) cancel because the creation operators commute among themselves, as do the annihilation operators. We have

\[
\left[ \psi_{k_4} \psi_{k_3}, \psi_{k_1}^\dagger \right] = \psi_{k_3} \psi_{k_4} \psi_{k_1}^\dagger - \psi_{k_1}^\dagger \psi_{k_3} \psi_{k_4} \tag{B.33}
\]
\[
= \psi_{k_3} \left[ \psi_{k_4}, \psi_{k_1}^\dagger \right] + \psi_{k_3} \psi_{k_1}^\dagger \psi_{k_4} - \psi_{k_1}^\dagger \psi_{k_3} \psi_{k_4} \tag{B.34}
\]
\[
= L\, \psi_{k_3} \delta_{k_1, k_4} + \left[ \psi_{k_3}, \psi_{k_1}^\dagger \right] \psi_{k_4} + \psi_{k_1}^\dagger \psi_{k_3} \psi_{k_4} - \psi_{k_1}^\dagger \psi_{k_3} \psi_{k_4} \tag{B.35}
\]
\[
= L\, \psi_{k_3} \delta_{k_1, k_4} + L\, \psi_{k_4} \delta_{k_1, k_3} \tag{B.36}
\]

and

\[
\left[ \psi_{k_1 + k}, \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \right] = \psi_{k_1 + k} \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger - \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_1 + k} \tag{B.37}
\]
\[
= \left[ \psi_{k_1 + k}, \psi_{k_3 + q}^\dagger \right] \psi_{k_4 - q}^\dagger + \psi_{k_3 + q}^\dagger \psi_{k_1 + k} \psi_{k_4 - q}^\dagger - \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_1 + k} \tag{B.38}
\]
\[
= \left[ \psi_{k_1 + k}, \psi_{k_3 + q}^\dagger \right] \psi_{k_4 - q}^\dagger + \psi_{k_3 + q}^\dagger \left[ \psi_{k_1 + k}, \psi_{k_4 - q}^\dagger \right] + \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_1 + k} - \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_1 + k} \tag{B.39}
\]
\[
= L\, \delta_{k_1 + k, k_3 + q}\, \psi_{k_4 - q}^\dagger + L\, \delta_{k_1 + k, k_4 - q}\, \psi_{k_3 + q}^\dagger. \tag{B.40}
\]

With this the second part of the commutator is (suppressing prefactors)

\[
\sum_{k_1, k_3, k_4, q} \left[ \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_4} \psi_{k_3}, \psi_{k_1}^\dagger \psi_{k_1 + k} \right] \tag{B.41}
\]
\[
= L \sum_{k_1, k_3, k_4, q} \left( \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \left( \psi_{k_3} \delta_{k_1, k_4} + \psi_{k_4} \delta_{k_1, k_3} \right) \psi_{k_1 + k} - \psi_{k_1}^\dagger \left( \delta_{k_1 + k, k_3 + q}\, \psi_{k_4 - q}^\dagger + \delta_{k_1 + k, k_4 - q}\, \psi_{k_3 + q}^\dagger \right) \psi_{k_4} \psi_{k_3} \right) \tag{B.42}
\]
\[
= L \sum_{k_3, k_4, q} \left( \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_3} \psi_{k_4 + k} + \psi_{k_3 + q}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_4} \psi_{k_3 + k} - \psi_{k_3 + q - k}^\dagger \psi_{k_4 - q}^\dagger \psi_{k_3} \psi_{k_4} - \psi_{k_3 + q}^\dagger \psi_{k_4 - q - k}^\dagger \psi_{k_3} \psi_{k_4} \right) \tag{B.43}
\]
\[
= 0, \tag{B.44}
\]

when we perform the summation over $k_3$ and $k_4$ (shifting $k_3 \to k_3 + k$ in the third term and $k_4 \to k_4 + k$ in the fourth).


Thus we have

\[
[H, \rho_k] = \frac{1}{L^2} \sum_{k_1, k_2} k_2^2 \left( L\, \delta_{k_1, k_2}\, \psi_{k_2}^\dagger \psi_{k_1 + k} - L\, \delta_{k_1 + k, k_2}\, \psi_{k_1}^\dagger \psi_{k_2} \right) \tag{B.45}
\]
\[
= \frac{1}{L} \sum_{k_1} k_1^2 \left( \psi_{k_1}^\dagger \psi_{k_1 + k} - \psi_{k_1 - k}^\dagger \psi_{k_1} \right) \tag{B.46}
\]
\[
= \frac{1}{L} \sum_{k_1} \left( k_1^2 - (k_1 + k)^2 \right) \psi_{k_1}^\dagger \psi_{k_1 + k} \tag{B.47}
\]
\[
= \frac{1}{L} \sum_{k_1} \left( -2 k k_1 - k^2 \right) \psi_{k_1}^\dagger \psi_{k_1 + k}. \tag{B.48}
\]

Now we can calculate the rest of the right-hand side of the sum rule:

\[
[[H, \rho_k], \rho_{-k}] = \frac{1}{L^2} \sum_{k_1} \left( -2 k k_1 - k^2 \right) \sum_{k_2} \left[ \psi_{k_1}^\dagger \psi_{k_1 + k}, \psi_{k_2}^\dagger \psi_{k_2 - k} \right], \tag{B.49}
\]

where

\[
\sum_{k_2} \left[ \psi_{k_1}^\dagger \psi_{k_1 + k}, \psi_{k_2}^\dagger \psi_{k_2 - k} \right] = \sum_{k_2} \left( \psi_{k_1}^\dagger \psi_{k_1 + k} \psi_{k_2}^\dagger \psi_{k_2 - k} - \psi_{k_2}^\dagger \psi_{k_2 - k} \psi_{k_1}^\dagger \psi_{k_1 + k} \right) \tag{B.50}
\]
\[
= \sum_{k_2} \left( \psi_{k_1}^\dagger \left[ \psi_{k_1 + k}, \psi_{k_2}^\dagger \right] \psi_{k_2 - k} + \psi_{k_1}^\dagger \psi_{k_2}^\dagger \psi_{k_1 + k} \psi_{k_2 - k} - \psi_{k_2}^\dagger \left[ \psi_{k_2 - k}, \psi_{k_1}^\dagger \right] \psi_{k_1 + k} - \psi_{k_2}^\dagger \psi_{k_1}^\dagger \psi_{k_2 - k} \psi_{k_1 + k} \right) \tag{B.51}
\]
\[
= L \sum_{k_2} \left( \delta_{k_1 + k, k_2}\, \psi_{k_1}^\dagger \psi_{k_2 - k} - \delta_{k_2 - k, k_1}\, \psi_{k_2}^\dagger \psi_{k_1 + k} \right) \tag{B.52}
\]
\[
= L \left( \psi_{k_1}^\dagger \psi_{k_1} - \psi_{k_1 + k}^\dagger \psi_{k_1 + k} \right). \tag{B.53}
\]

Thus

\[
[[H, \rho_k], \rho_{-k}] = \frac{1}{L} \sum_{k_1} \left( -2 k k_1 - k^2 \right) \left( \psi_{k_1}^\dagger \psi_{k_1} - \psi_{k_1 + k}^\dagger \psi_{k_1 + k} \right) \tag{B.54}
\]
\[
= \frac{1}{L} \sum_{k_1} \left( -2 k k_1 - k^2 - k^2 + 2 k k_1 \right) \psi_{k_1}^\dagger \psi_{k_1} \tag{B.55}
\]
\[
= -\frac{2 k^2}{L} \sum_{k_1} \psi_{k_1}^\dagger \psi_{k_1} \tag{B.56}
\]
\[
= -2 k^2 N, \tag{B.57}
\]

where we used

\[
N = \int_0^L \mathrm{d}x\, \psi^\dagger(x) \psi(x) = \frac{1}{L^2} \int_0^L \mathrm{d}x \sum_{k, k'} e^{i x (k' - k)}\, \psi_k^\dagger \psi_{k'} = \frac{1}{L} \sum_k \psi_k^\dagger \psi_k. \tag{B.58}
\]

The f-sum rule for the dynamical structure factor of the density operator of the Lieb-Liniger model is

\[
\int_{-\infty}^{\infty} \frac{\mathrm{d}\omega}{2\pi}\, \omega\, S(k, \omega) = \frac{N}{L} k^2. \tag{B.59}
\]
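In a numerical scan, equation (B.59) is what allows us to measure the saturation: the ratio of the partial Lehmann sum to the right-hand side. A minimal sketch of this bookkeeping, assuming an averaging state for which $|\langle\alpha|\rho_k|\gamma\rangle| = |\langle\alpha|\rho_{-k}|\gamma\rangle|$, so that the two Lehmann terms in (B.14) contribute equally and the factors of $L$ cancel between the two sides:

    def sum_rule_saturation(form_factors_sq, excitation_energies, n_particles, k):
        # Fraction of the f-sum rule (B.59) exhausted at momentum k != 0:
        #   sum_alpha (E_alpha - E_gamma) |<alpha|rho_k|gamma>|^2 / (N k^2)
        weighted = sum(e * ff for ff, e in
                       zip(form_factors_sq, excitation_energies))
        return weighted / (n_particles * k**2)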


I'm not dumb. I just have a command of thoroughly useless information.

Bill Watterson

C Convergence results for machine learning rapidities

Here we present the raw values for the number of iterations of the Newton method and the time until convergence, as presented in chapter 5.

Method          Damped   N = 5      N = 10     N = 20
Naive Bethe     Yes      5.09(30)   5.03(17)   5.03(17)
                No       5.09(30)   5.01(11)   5.00(5)
ML              Yes      4.14(35)   4.58(57)   4.94(27)
                No       4.14(35)   4.58(70)   4.93(25)
Random          Yes      6.80(71)   7.07(58)   7.14(58)
                No       6.64(57)   6.83(37)   6.87(34)
Random sorted   Yes      5.92(49)   5.92(49)   6.03(42)
                No       5.74(50)   5.83(59)   5.92(28)

Table C.1: Newton method iterations necessary for convergence, for different initial value methods and N = 5, 10, 20.

Method          N = 5     N = 10    N = 20
Naive Bethe     1.39(15)  4.01(8)   14.20(30)
ML              1.64(1)   4.26(3)   14.69(20)
Random          1.60(1)   5.42(2)   19.15(36)
Random sorted   1.43(1)   4.68(2)   16.60(21)

Table C.2: Time in seconds necessary to converge for 1000 sets of Bethe numbers, for N = 5, 10, 20. Only undamped results are shown. We use the average of 10 sets of 1000.


Sometimes the questions are complicated and the answers are simple.

Dr. Seuss

D Q-learning results

Here we present the raw values for the number of states necessary to reach 83% sum rule saturation for c = 1, L = N = 5 and Imax = 12.

             Particle-hole minimization
Method       Training   DSF calculation   Number of states   Form factor size proxy
Random       —          Yes               728 ± 131          (1.31 ± 0.60) × 10⁻⁷
             —          No                3000¹              (2.79 ± 1.43) × 10⁻¹⁶
Q-learning   Yes        Yes               545 ± 151          (6.53 ± 6.34) × 10⁻⁷
             Yes        No                1475 ± 909²        (1.73 ± 2.94) × 10⁻⁸
             No         Yes               795 ± 212          (2.28 ± 3.34) × 10⁻⁷
             No         No                3000³              (1.56 ± 3.10) × 10⁻¹²

Table D.1: Number of states required to reach convergence, and the size of the form factor size proxy, for the methods under consideration. The reported errors are standard errors.

¹ No convergence occurred for any of the five runs after summing state contributions.
² One out of five runs did not converge, reaching ∼79.4% saturation after 3000 steps.
³ No convergence occurred for any of the five runs after summing state contributions.


Bibliography

[1] V. E. Korepin, N. M. Bogoliubov and A. G. Izergin, Quantum Inverse Scattering Method and Correlation Functions, Cambridge University Press, Cambridge, ISBN 9780511628832 (1993), doi:10.1017/CBO9780511628832.

[2] J.-S. Caux, Correlation functions of integrable models: A description of the ABACUS algorithm, J. Math. Phys. 50, 095214 (2009), doi:10.1063/1.3216474.

[3] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra and M. Riedmiller, Playing Atari with Deep Reinforcement Learning, arXiv:1312.5602v1.

[4] V. Mnih et al., Human-level control through deep reinforcement learning, Nature 518, 529 (2015), doi:10.1038/nature14236.

[5] D. Silver et al., Mastering the game of Go with deep neural networks and tree search, Nature 529, 484 (2016), doi:10.1038/nature16961.

[6] D. Silver et al., Mastering the game of Go without human knowledge, Nature 550, 354 (2017), doi:10.1038/nature24270.

[7] D. Silver et al., Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, arXiv:1712.01815v1.

[8] H. Bethe, Zur Theorie der Metalle I. Eigenwerte und Eigenfunktionen der linearen Atomkette, Z. Physik 71, 205 (1931), doi:10.1007/BF01341708.

[9] E. H. Lieb and W. Liniger, Exact Analysis of an Interacting Bose Gas. I. The General Solution and the Ground State, Phys. Rev. 130, 1605 (1963), doi:10.1103/PhysRev.130.1605.

[10] E. H. Lieb, Exact Analysis of an Interacting Bose Gas. II. The Excitation Spectrum, Phys. Rev. 130, 1616 (1963), doi:10.1103/PhysRev.130.1616.

[11] F. Franchini, An Introduction to Integrable Techniques for One-Dimensional Quantum Systems, Springer International Publishing, Cham, ISBN 9783319484860 (2017), doi:10.1007/978-3-319-48487-7.

[12] M. Panfil and J.-S. Caux, Finite-temperature correlations in the Lieb-Liniger one-dimensional Bose gas, Phys. Rev. A 89, 033605 (2014), doi:10.1103/PhysRevA.89.033605.

[13] J.-S. Caux, Integrability in atomic and condensed matter physics in and out of equilibrium, Lectures for Séminaire d'excellence en physique, Paris, June 2015, unpublished (2015).

[14] M. Gaudin and J.-S. Caux, The Bethe Wavefunction, Cambridge University Press, Cambridge, ISBN 9781107053885 (2009), doi:10.1017/cbo9781107053885.


[15] C. N. Yang and C. P. Yang, Thermodynamics of a One-Dimensional System of Bosons with Repulsive Delta-Function Interaction, J. Math. Phys. 10, 1115 (1969), doi:10.1063/1.1664947.

[16] J.-S. Caux, P. Calabrese and N. A. Slavnov, One-particle dynamical correlations in the one-dimensional Bose gas, J. Stat. Mech. P01008 (2007), doi:10.1088/1742-5468/2007/01/P01008.

[17] S. M. Zemyan, The Classical Theory of Integral Equations, Birkhäuser Boston, Boston, ISBN 9780817683481 (2012), doi:10.1007/978-0-8176-8349-8.

[18] N. A. Slavnov, Nonequal-time current correlation function in a one-dimensional Bose gas, Theor. Math. Phys. 82, 273 (1990), doi:10.1007/bf01029221.

[19] N. A. Slavnov, Calculation of Scalar Products of Wave Functions and Form Factors in the Framework of the Algebraic Bethe Ansatz, Theor. Math. Phys. 79, 502 (1989), doi:10.1007/bf01016531.

[20] N. A. Slavnov, Algebraic Bethe ansatz, arXiv:1804.07350.

[21] J. De Nardis and M. Panfil, Density form factors of the 1D Bose gas for finite entropy states, J. Stat. Mech. P02019 (2015), doi:10.1088/1742-5468/2015/02/P02019.

[22] J.-S. Caux, R. Hagemans and J. M. Maillet, Computation of dynamical correlation functions of Heisenberg chains: the gapless anisotropic regime, J. Stat. Mech. P09003 (2005), doi:10.1088/1742-5468/2005/09/P09003.

[23] J.-S. Caux and P. Calabrese, Dynamical density-density correlations in the one-dimensional Bose gas, Phys. Rev. A 74, 031605 (2006), doi:10.1103/PhysRevA.74.031605.

[24] J.-S. Caux and J. M. Maillet, Computation of Dynamical Correlation Functions of Heisenberg Chains in a Magnetic Field, Phys. Rev. Lett. 95, 077201 (2005), doi:10.1103/PhysRevLett.95.077201.

[25] E. Süli and D. Mayers, An Introduction to Numerical Analysis, Cambridge University Press, Cambridge, ISBN 9780521007948 (2003).

[26] W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, Cambridge, ISBN 9780521880688 (2007).

[27] Standard for Programming Language C++, International Organization for Standardization, INCITS/ISO/IEC 14882:2014 (2016).

[28] 754-2008 - IEEE Standard for Floating-Point Arithmetic, IEEE (2008), doi:10.1109/ieeestd.2008.4610935.

[29] J. Levinson et al., Towards fully autonomous driving: Systems and algorithms, 2011 IEEE Intelligent Vehicles Symposium (IV) (2011), doi:10.1109/ivs.2011.5940562.

[30] R. Poplin, A. V. Varadarajan, K. Blumer, Y. Liu, M. V. McConnell, G. S. Corrado, L. Peng and D. R. Webster, Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning, Nat. Biomed. Eng. 2, 158 (2018), doi:10.1038/s41551-018-0195-0.


[31] R. Poplin et al., Creating a universal SNP and small indel variant caller with deep neural networks, bioRxiv 092890 (2016), doi:10.1101/092890.

[32] M.-Y. Liu, T. Breuel and J. Kautz, Unsupervised Image-to-Image Translation Networks, arXiv:1703.00848.

[33] M. Paganini, Machine Learning Algorithms for b-Jet Tagging at the ATLAS Experiment, arXiv:1711.08811.

[34] N. M. Ball and R. J. Brunner, Data mining and machine learning in astronomy, Int. J. Mod. Phys. D 19, 1049 (2010), doi:10.1142/s0218271810017160.

[35] G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355, 602 (2017), doi:10.1126/science.aag2302.

[36] H. Saito, Solving the Bose–Hubbard Model with Machine Learning, J. Phys. Soc. Jpn. 86, 093001 (2017), doi:10.7566/jpsj.86.093001.

[37] E. van Nieuwenburg, E. Bairey and G. Refael, Learning phase transitions from dynamics, arXiv:1712.00450v1.

[38] V. Dunjko and H. J. Briegel, Machine learning & artificial intelligence in the quantum domain: a review of recent progress, Rep. Prog. Phys. 81, 074001 (2018), doi:10.1088/1361-6633/aab406.

[39] P. Mehta and D. J. Schwab, An exact mapping between the Variational Renormalization Group and Deep Learning, arXiv:1410.3831v1.

[40] C. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag, New York, ISBN 9780387310732 (2006).

[41] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, ISBN 9780262193986 (2018).

[42] C. J. C. H. Watkins and P. Dayan, Q-learning, Mach. Learn. 8, 279 (1992), doi:10.1007/bf00992698.

[43] F. Chollet et al., Keras, https://keras.io (2015).

[44] M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems, https://www.tensorflow.org (2015).

[45] S. D. Whitehead and L.-J. Lin, Reinforcement learning of non-Markov decision processes, Artif. Intell. 73, 271 (1995), doi:10.1016/0004-3702(94)00012-p.

[46] M. Hessel et al., Rainbow: Combining Improvements in Deep Reinforcement Learning, arXiv:1710.02298.

[47] N. Heess et al., Emergence of Locomotion Behaviours in Rich Environments, arXiv:1707.02286.

[48] G. Marcus, Deep Learning: A Critical Appraisal, arXiv:1801.00631.

[49] D. Amodei and D. Hernandez, AI and Compute, https://blog.openai.com/ai-and-compute (2018).


[50] J. Lehman et al., The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities, arXiv:1803.03453.

[51] D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman and D. Mané, Concrete Problems in AI Safety, arXiv:1606.06565.

[52] M. Brundage et al., The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation, arXiv:1802.07228.

[53] P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup and D. Meger, Deep Reinforcement Learning that Matters, arXiv:1709.06560.

[54] OpenAI, OpenAI Five, https://blog.openai.com/openai-five (2018).

[55] R. Houthooft, X. Chen, Y. Duan, J. Schulman, F. De Turck and P. Abbeel, VIME: Variational Information Maximizing Exploration, Advances in Neural Information Processing Systems 29, p. 1109–1117 (2016).

[56] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver and D. Wierstra, Continuous control with deep reinforcement learning, arXiv:1509.02971.

[57] S. Gu, T. Lillicrap, I. Sutskever and S. Levine, Continuous Deep Q-Learning with Model-based Acceleration, Proceedings of the 33rd International Conference on Machine Learning, p. 2829–2838 (2016).

[58] A. Wietek and A. M. Läuchli, Sublattice Coding Algorithm and Distributed Memory Parallelization for Large-Scale Exact Diagonalizations of Quantum Many-Body Systems, arXiv:1804.05028.

[59] M. Tegmark, Life 3.0: Being Human in the Age of Artificial Intelligence, Random House, ISBN 9781101946596 (2017).

[60] E. W. Zwart, Fluid inclusions in carbonate rocks and calcite cements, Ph.D. thesis, Vrije Universiteit Amsterdam, ISBN 9090085955 (1995).

[61] L. Pitaevskii and S. Stringari, Bose-Einstein Condensation and Superfluidity, Oxford University Press, Oxford, ISBN 9780198758884 (2016), doi:10.1093/acprof:oso/9780198758884.001.0001.

[62] J.-S. Caux, The f-sum rule, unpublished (2011).