Quantum Neural Network States: A Brief Review of Methods and Applications

Zhih-Ahn Jia,1, 2, 3, ∗ Biao Yi,4 Rui Zhai,5 Yu-Chun Wu,1, 2, † Guang-Can Guo,1, 2 and Guo-Ping Guo1, 2, 6

1Key Laboratory of Quantum Information, Chinese Academy of Sciences, School of Physics, University of Science and Technology of China, Hefei, Anhui, 230026, P.R. China

2CAS Center For Excellence in Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, Anhui, 230026, P.R. China

3Microsoft Station Q and Department of Mathematics, University of California, Santa Barbara, California 93106-6105, USA

4Department of Mathematics, Capital Normal University, Beijing 100048, P.R. China
5Institute of Technical Physics, Department of Engineering Physics, Tsinghua University, Beijing 10084, People's Republic of China

6Origin Quantum Computing, Hefei, 230026, P.R. China

One of the main challenges of quantum many-body physics is the exponential growth in the dimensionality of the Hilbert space with system size. This growth makes solving the Schrödinger equation of the system extremely difficult. Nonetheless, many physical systems have a simplified internal structure that typically makes the number of parameters needed to characterize their ground states exponentially smaller. Many numerical methods then become available to capture the physics of the system. Among modern numerical techniques, neural networks, which show great power in approximating functions and extracting features of big data, are now attracting much interest. In this work, we briefly review the progress in using artificial neural networks to build quantum many-body states. We take the Boltzmann machine representation as a prototypical example to illustrate various aspects of neural network states. We also briefly review classical neural networks and illustrate how to use neural networks to represent quantum states and density operators. Some physical properties of neural network states are discussed. For applications, we briefly review the progress in many-body calculations based on neural network states, the neural network approach to quantum state tomography, and the classical simulation of quantum computing based on Boltzmann machine states.

I. INTRODUCTION

One of the most challenging problems in condensed matter physics is to find the eigenstates of a given Hamiltonian. The difficulty stems mainly from the scaling of the Hilbert space dimension, which grows exponentially with the system size [1, 2]. To obtain a better understanding of quantum many-body systems beyond the mean-field paradigm and to study the behavior of strongly correlated electrons requires effective approaches to the problem. Although the dimension of the Hilbert space grows exponentially with the number of particles in general, physical states fortunately often have internal structure, for example, obeying the entanglement area law, which makes them easier to handle than the general case [3–7]. Physical properties of the system usually restrict the form of the ground state; for example, area-law states [8], ground states of local gapped systems [9], and many-body localized systems can be efficiently represented by tensor networks [3, 10–12], a tool developed in recent years to attack the difficulty of representing quantum many-body states efficiently. The tensor network approach has achieved great success in quantum many-body problems. It has become a standard tool

∗Electronic address: [email protected], [email protected]
†Electronic address: [email protected]

and many classical algorithms based on tensor networks have been developed, such as the density-matrix renormalization group [13], projected entangled pair states (PEPS) [14], the folding algorithm [15], entanglement renormalization [16], and time-evolving block decimation [17]. The research on tensor-network states also includes studies on finding new representations of quantum many-body states.

During the last few years, machine learning has grown rapidly as an interdisciplinary field. Machine learning techniques have been successfully applied in many different scientific areas [18–20]: computer vision, speech recognition, and chemical synthesis, among others. Combining quantum physics and machine learning has generated a new and exciting field of research, quantum machine learning [21], which has recently attracted much attention [22–29]. The research on quantum machine learning can be loosely categorized into two branches: developing new quantum algorithms, which share some features of machine learning and perform faster and better than their classical counterparts [22–24], and using classical machine learning methods to assist the study of quantum systems, for example, distinguishing phases [25], quantum control [30], error correction of topological codes [31], and quantum tomography [32, 33]. The latter branch is the focus of this work. Given the substantial progress so far, we stress here that machine learning can also be used to attack the difficulties encountered with quantum many-body states.

Since 2001, researchers have been trying to use machine learning techniques, especially neural networks, to deal with quantum problems, for example, solving the Schrödinger equation [34–37]. Later, in 2016, neural networks were introduced as a variational ansatz for representing quantum many-body ground states [26]. This stimulated an explosion of results applying machine learning methods to the investigation of condensed matter physics; see, e.g., Refs. [22–25, 27–29, 38]. Carleo and Troyer initially introduced the restricted Boltzmann machine (RBM) to solve the transverse-field Ising model and the antiferromagnetic Heisenberg model and to study the time evolution of these systems [26]. Later, the entanglement properties of the RBM states were investigated [27], as was their representational power [29, 39]. Many explicit RBM constructions for different systems were given, including the Ising model [26], toric code [28], graph states [29], stabilizer codes [38, 40], and topologically ordered states [28, 38, 39, 41]. Furthermore, deep Boltzmann machine (DBM) states were also investigated from different approaches [29, 42, 43].

Despite all the progress in applying neural networks in quantum physics, many important topics still remain to be explored. Obvious ones are the exact definition of a quantum neural network state and the mathematics and physics behind the efficiency of quantum neural network states. Although RBM and DBM states have been investigated from different aspects, there are many other neural networks. It is natural to ask whether they can similarly be used to represent quantum states and what the relationships and differences between these representations are. Digging deeper, a central problem in studying neural networks is their representational power. We can ask (a) what its counterpart in quantum mechanics is and how to make a neural network represent quantum states efficiently, and (b) what kind of states can be efficiently represented by a specific neural network. In this work, we partially investigate these problems and review the important progress in the field.

The work is organized as follows. In Section II, we introduce the definition of an artificial neural network. We explain in Section II A the feed-forward neural networks (perceptron and logistic neural networks), the convolutional neural network, and the stochastic recurrent neural network, the so-called Boltzmann machine (BM). Next, in Section II B we explain the representational power of feed-forward neural networks in representing given functions and of BMs in approximating given probability distributions. In Section III, we explain how a neural network can be used as a variational ansatz for quantum states; the method given in Section III A is model-independent, that is, the way to construct states can be applied to any neural network (with the ability to continuously output real or complex numbers). Some concrete examples of neural network states are given in Section III A 1. Section III A 2 is devoted to the efficiency of neural networks in representing quantum states, and in Section III B we introduce the basic concepts of tensor-network states, which are closely related to neural network states. In Section IV, we briefly review the neural network representation of the density operator and the quantum state tomography scheme based on neural network states. In Section V, we discuss the entanglement features of neural network states. The application of these states in classically simulating quantum computing circuits is discussed in Section VI. In the last section, some concluding remarks are given.

II. ARTIFICIAL NEURAL NETWORKS AND THEIR REPRESENTATIONAL POWER

A neural network is a mathematical model that is an abstraction of the biological nervous system, which consists of adaptive units called neurons that are connected via an extensive network of synapses [44]. The basic elements comprising the neural network are artificial neurons, which are mathematical abstractions of biological neurons. When activated, each neuron releases neurotransmitters to connected neurons and changes the electric potentials of these neurons. There is a threshold potential value for each neuron; when the electric potential exceeds the threshold, the neuron is activated.

There are several kinds of artificial neuron models. Here, we introduce the most commonly used McCulloch–Pitts neuron model [45]. Consider n inputs x_1, x_2, …, x_n, which are transmitted by n corresponding weighted connections w_1, w_2, …, w_n (see Figure 1). After the signals have reached the neuron, they are added together according to their weights, and then the value is compared with the bias b of the neuron to determine whether the neuron is activated or deactivated. The process is governed by the activation function f, and the output of the neuron is written as y = f(∑_{i=1}^{n} w_i x_i − b). Note that we can regard the bias as a weight w_0 for some fixed input x_0 = −1; the output then has the more compact form y = f(∑_{i=0}^{n} w_i x_i). Putting a large number of neurons together and allowing them to connect with each other in some connection pattern produces a neural network. Note that this is just a special representation that allows us to picture the neural network intuitively, especially in regard to feed-forward neural networks. Indeed, there are many other mathematical structures by which to characterize the different kinds of neural networks. Here we give several important examples of neural networks.
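As a minimal illustration of the neuron just described (the helper names and the choice of a Heaviside activation are our own assumptions, not from the text), a direct transcription in Python reads:

import numpy as np

def heaviside(z):
    # activated (1) if the weighted input exceeds the threshold, otherwise 0
    return 1.0 if z > 0 else 0.0

def neuron_output(x, w, b, f=heaviside):
    # McCulloch-Pitts neuron: y = f(sum_i w_i x_i - b)
    return f(np.dot(w, x) - b)

# equivalent compact form: absorb the bias as weight w_0 with fixed input x_0 = -1
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.7, -0.4, 0.9])
b = 1.2
x0 = np.concatenate(([-1.0], x))
w0 = np.concatenate(([b], w))
print(neuron_output(x, w, b), heaviside(np.dot(w0, x0)))  # identical outputs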

A. Some examples of neural networks

An artificial neural network is a set of neurons in which some, or all, of the neurons are connected according to a certain pattern. Note that we put neurons at both the input and output ends. These input and output neurons are not neurons as introduced previously; their role depends on the learning problem, and there may or may not be activation functions associated with them. In what follows, we


FIG. 1: (a) The McCulloch–Pitts neuron model, where x_1, …, x_n are the inputs to the neuron, w_1, …, w_n are the weights corresponding to each input, Σ is the summation function, b the bias, f the activation function, and y the output of the neuron; (b) a simple artificial neural network with an input layer, hidden layers, and an output layer.

briefly introduce feed-forward neural networks, convolutional neural networks, and Boltzmann machines.

1. Rosenblatt’s perceptron and logistic neural network

To explain neural networks, we first consider an example of a feed-forward neural network, the perceptron, which was invented by Rosenblatt. In the history of artificial neural networks, the multilayer perceptron plays a crucial role. In a perceptron, the activation function of each neuron is set to the Heaviside step function

f(x) = 0 for x ≤ 0,  and  f(x) = 1 for x > 0,    (1)

where the value one represents the activated status of the neuron and the value zero represents the deactivated status. The output value of one neuron is the input value of the neurons connected to it.

The power of the perceptron comes mainly from its hierarchical recursive structure. It has been shown that the perceptron can perform universal classical computation [45–47]. To see this, we note that the NAND and FANOUT operations are universal for classical computation. In the perceptron, we still assume that a FANOUT operation is available.^1 We only need to show that the perceptron can simulate the NAND operation.

^1 Here, we emphasize the importance of the FANOUT operation, which is usually omitted from the universal set of gates in classical computation theory. However, the operation is forbidden in quantum computation by the famous no-cloning theorem.

TABLE I: Some popular activation functions.

  Name                       Function
  logistic function          f(x) = 1/(1 + e^{−x})
  tanh                       tanh(x) = (e^x − e^{−x})/(e^x + e^{−x})
  cos                        cos(x)
  softmax^a                  σ(x)_j = e^{x_j} / ∑_i e^{x_i}
  rectified linear unit      ReLU(x) = max{0, x}
  exponential linear unit    ELU(x) = x for x ≥ 0, α(e^x − 1) for x < 0
  softplus                   SP(x) = ln(e^x + 1)

^a The softmax function acts on vectors x and is usually used in the final layer of a neural network.

Suppose that x_1 and x_2 are the two inputs of a neuron, each with weight −2, and that the bias of the neuron is set to −3. When the inputs are x_1 = x_2 = 1, the output is f(x_1 w_1 + x_2 w_2 − b) = f(−4 + 3) = 0; otherwise the output is 1, which is exactly the output of the NAND operation.
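This NAND construction can be checked directly; the following self-contained sketch (with the weights and bias from the text) prints the full truth table:

import numpy as np

def neuron(x, w, b):
    # Heaviside-activated neuron: f(sum_i w_i x_i - b)
    return 1 if np.dot(w, x) - b > 0 else 0

w, b = np.array([-2.0, -2.0]), -3.0   # both weights -2, bias -3, as in the text
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", neuron(np.array([x1, x2], dtype=float), w, b))
# only the input (1, 1) yields 0, reproducing the NAND truth table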

Although the perceptron is powerful in many applications, it still has some shortcomings. The most prominent one is that the activation function is not continuous: a small change in the weights may produce a large change in the output of the network, which makes the learning process difficult. One way to remedy this shortcoming is to smooth out the activation function, usually by choosing the logistic function,

f(x) = 1/(1 + e^{−x}).    (2)

The resulting network is usually called the logistic neural network (or sigmoid neural network). In practice, logistic neural networks are used extensively, and many problems can be solved with them. We remark that the logistic function is chosen for convenience in the updating process of learning; many other smooth step-like functions can be chosen as activation functions. In a neural network, the activation functions need not all be the same. In Table I, we list some popular activation functions.

Next, we see how a neural network learns with the gradient descent method. We illustrate the learning process in the supervised learning framework using two different sets of labelled data, S = {(x_i, y_i)}_{i=1}^{N} and T = {(z_i, t_i)}_{i=1}^{M}, known as training data and test data, respectively; here y_i (resp. t_i) is the label of x_i (resp. z_i). Our aim is to find the weights and biases of the neural network such that the network output y(x_i) (which depends on the network parameters Ω = {w_{ij}, b_i}) approximates y_i for all training inputs x_i. To quantify how well the neural network approximates the given labels, we introduce a cost function, which measures the difference between y(x_i) and y_i,

C(Ω) := C(y(x_i), y_i) = (1/2N) ∑_{i=1}^{N} ‖y(x_i) − y_i‖^2,    (3)

where N denotes the number of data points in the training set, Ω the set of network parameters w_{ij} and b_i, and the sum runs over all data in the training set. Here we choose the quadratic norm; therefore, the cost function is called quadratic. Our aim now is to minimize the cost function as a multivariable function of the network parameters such that C(Ω) ≈ 0, which can be done by the well-known gradient descent method.

The intuition behind the gradient descent method is that we can regard the cost function, in some sense, as the height of a landscape whose location is specified by the network parameters. Our aim is to repeatedly go downhill from some initial place (a given configuration of the neural network) until we reach the lowest point. Formally, starting from a given configuration of the neural network, i.e., given parameters w_{ij} and b_i, the gradient descent algorithm repeatedly computes the gradient ∇C = (∂C/∂w_{ij}, ∂C/∂b_i). The updating formulae are given by

w_{ij} → w'_{ij} = w_{ij} − η ∂C/∂w_{ij},    (4)

b_i → b'_i = b_i − η ∂C/∂b_i,    (5)

where η is a small positive parameter known as the learning rate.

In practice, there are many difficulties in applying the gradient descent method to train a neural network. A modified form, stochastic gradient descent, is usually used to speed up the training process. In the stochastic gradient method, sampling over the training set is introduced, i.e., we randomly choose N' samples S' = {(X_1, Y_1), …, (X_{N'}, Y_{N'})} such that the average gradient of the cost function over S' roughly equals the average gradient over the whole training set S. The updating formulae are accordingly modified as

w_{ij} → w'_{ij} = w_{ij} − (η/N') ∑_{i=1}^{N'} ∂C(X_i)/∂w_{ij},    (6)

b_i → b'_i = b_i − (η/N') ∑_{i=1}^{N'} ∂C(X_i)/∂b_i,    (7)

where C(X_i) = ‖y(X_i) − Y_i‖^2/2 is the cost function for the training input X_i.
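The following is a minimal sketch, in Python/NumPy, of stochastic gradient descent on the quadratic cost for a single-layer logistic network; the toy dataset, architecture, and variable names are illustrative assumptions, not taken from the original text:

import numpy as np

rng = np.random.default_rng(0)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# toy training set: N inputs of dimension d with binary labels
N, d = 200, 3
X = rng.normal(size=(N, d))
Y = (X.sum(axis=1) > 0).astype(float)

w, b = np.zeros(d), 0.0          # network parameters Omega = {w, b}
eta, batch = 0.5, 20             # learning rate eta and mini-batch size N'

for step in range(500):
    idx = rng.choice(N, size=batch, replace=False)   # sample the mini-batch S'
    x_b, y_b = X[idx], Y[idx]
    y_out = logistic(x_b @ w - b)                    # network output y(X_i)
    # gradient of C(X_i) = ||y(X_i) - Y_i||^2 / 2, averaged over the batch
    delta = (y_out - y_b) * y_out * (1.0 - y_out)
    w -= eta / batch * (x_b.T @ delta)               # update of Eq. (6)
    b -= eta / batch * np.sum(-delta)                # update of Eq. (7); dC/db has opposite sign
print("training accuracy:", np.mean((logistic(X @ w - b) > 0.5) == Y))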

The test data T is usually chosen to be different from S; when the training process is done, the test data are used to assess the performance of the neural network, which for many traditionally difficult problems (such as classification and recognition) is very good. As discussed later, feed-forward neural networks and many other neural networks also work well in approximating quantum states [48, 49], which is the main theme of this paper.

2. Convolutional neural network

Convolutional neural networks are another important class of neural networks and are most commonly used to analyze images. A typical convolutional neural network consists of a sequence of different interleaved layers, including convolutional layers, pooling layers, and a fully connected layer. Through a differentiable function, every layer transforms the previous layer's data (usually pixels) into a new set of data (pixels).

For regular neural networks, each neuron is fully connected with the neurons in the previous layer. However, in the convolutional layer of a convolutional neural network, each neuron only connects with neurons in a local neighborhood of the previous layer. More precisely, in a convolutional layer, the new pixel values of the k-th layer are obtained from the (k−1)-th layer by a filter that determines the size of the neighborhood and then gives v^{(k)}_{ij} = ∑_{p,q} w_{ij;pq} v^{(k−1)}_{p,q}, where the sum runs over the neurons in the local neighborhood of v^{(k)}_{ij}. After the filter scans the whole image (all pixel values), a new image (a new set of pixel values) is obtained. Pooling layers are usually added periodically in between successive convolutional layers, and their function is to reduce the size of the data. For example, max (or average) pooling chooses the maximum (or average) value of the pixels of the previous layer contained in the filter. The last, fully connected layer is the same as in a regular neural network and outputs a class label used to determine into which class the image is categorized.
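As a concrete illustration, the following NumPy sketch applies a single convolutional filter followed by 2×2 max pooling to a toy image; the image, filter values, and sizes are arbitrary choices for illustration only:

import numpy as np

def convolve2d(image, kernel):
    # valid convolution: out[i, j] = sum_{p,q} kernel[p, q] * image[i+p, j+q]
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(kernel * image[i:i + kh, j:j + kw])
    return out

def max_pool(image, size=2):
    # non-overlapping max pooling over size x size blocks
    h, w = image.shape[0] // size, image.shape[1] // size
    return image[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.default_rng(1).random((8, 8))    # toy 8x8 "image"
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # arbitrary 2x2 filter
print(max_pool(convolve2d(image, kernel)).shape)   # (3, 3) pooled feature map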

The weights and biases of a convolutional neural network are learnable parameters, whereas hyperparameters such as the size of the filter and the number of interleaved convolutional and pooling layers are usually fixed. Convolutional neural networks perform well in classification-type machine learning tasks such as image recognition [18, 50, 51]. As has been shown numerically [52], convolutional neural networks can also be used to build quantum many-body states.

3. Boltzmann machine

We now introduce another special type of artificial neural network, the Boltzmann machine (BM) (also known as the stochastic Hopfield network with hidden units), which is an energy-based neural network model [53, 54]. It has recently been applied in many different areas of physics [26, 27, 29, 31, 38, 39, 55–60], and quantum versions of BMs, quantum BMs, have also been investigated [61]. As the BM is very similar to the classical Ising model, here we explain the BM neural network by frequently referring to the terminology of the Ising model. Notice that the BM is very different from the perceptron and the logistic neural network, as it does not treat each neuron individually; therefore, there is no activation function attached to each specific neuron. Instead, the BM treats the neurons as a whole.

Given a graph G with vertex set V(G) and edge set E(G), the neurons s_1, …, s_n (spins in the Ising model) are placed on the vertices, with n = |V(G)|. If two vertices i and j are connected, there is a weight w_{ij} (coupling constant in the Ising model) between the corresponding neurons s_i and s_j. For each neuron s_i, there is also a corresponding local bias b_i (local field in the Ising model). As is done for the Ising model, for each series of input values s = (s_1, …, s_n) (a spin configuration in the Ising model), we define its energy as

E(s) = −∑_{⟨ij⟩∈E(G)} w_{ij} s_i s_j − ∑_i b_i s_i.    (8)

Up to now, everything is just as for the Ising model; no new concepts or techniques have been introduced. The main difference is that the BM construction introduces a coloring of the vertices: each vertex receives a label, hidden or visible. We assume the first k neurons are hidden neurons, denoted h_1, …, h_k, and the remaining l neurons are visible neurons, denoted v_1, …, v_l, with k + l = n. Therefore, the energy is now written E(h, v). The BM is a parametric model of a joint probability distribution between the variables h and v, with the probability given by

p(h, v) = e^{−E(h,v)} / Z,    (9)

where Z = ∑_{h,v} e^{−E(h,v)} is the partition function.
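For a small BM, the joint distribution (9) can be evaluated by brute-force enumeration, as in the sketch below; the tiny bipartite (RBM-like) architecture, binary {0, 1} units, and parameter values are arbitrary illustrations:

import itertools
import numpy as np

# a tiny BM: 2 visible units, 1 hidden unit, binary values {0, 1}
W = np.array([[0.5], [-1.0]])   # weights w_ij between visible i and hidden j
a = np.array([0.1, 0.2])        # visible biases
b = np.array([-0.3])            # hidden biases

def energy(v, h):
    # E(h, v) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i W_ij h_j
    return -(a @ v + b @ h + v @ W @ h)

configs = [(np.array(v), np.array(h))
           for v in itertools.product([0, 1], repeat=2)
           for h in itertools.product([0, 1], repeat=1)]
Z = sum(np.exp(-energy(v, h)) for v, h in configs)   # partition function
for v, h in configs:
    print(v, h, np.exp(-energy(v, h)) / Z)           # p(h, v) from Eq. (9)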

The general BM is very difficult to train, and therefore restricted architectures have been introduced. The restricted BM (RBM) was initially invented by Smolensky [62] in 1986. In the RBM, it is assumed that the graph G is bipartite: the hidden neurons only connect with visible neurons, and there are no intra-layer connections. This restricted structure makes the neural network easier to train, and RBMs have therefore been extensively investigated and used [26, 27, 29, 31, 38, 39, 55–60]. The RBM can approximate every discrete probability distribution [63, 64].

The BM is, most notably, a stochastic recurrent neural network, whereas the perceptron and the logistic neural network are feed-forward neural networks. There are many other types of neural networks; for a more comprehensive list, see textbooks such as Refs. [65, 66]. The BM is crucial for quantum neural network states, and hence BM states are also the most studied. In later sections, we shall discuss the physical properties of BM neural network states and their applications.

4. Tensor networks

Tensor networks are certain contraction patterns of tensors, which play an important role in many scientific areas such as condensed matter physics, quantum information and quantum computation, computational physics, and quantum chemistry [3–7, 12]. We discuss some details of tensor networks later in this review. Here, we only comment on the connection between tensor networks and machine learning.

Many different tensor network structures have been developed over the years for solving different problems, such as matrix product states (MPS) [67–69], projected entangled pair states (PEPS) [14], the multiscale entanglement renormalization ansatz (MERA) [16], branching MERA [70, 71], tree tensor networks [72], matrix product operators [73–76], projected entangled pair operators [77–79], and continuous tensor networks [80–82]. A large number of numerical algorithms based on tensor networks are now available, including the density-matrix renormalization group [13], the folding algorithm [15], entanglement renormalization [16], time-evolving block decimation [17], and the tangent space method [83].

One of the most important properties that empowers tensor networks is that entanglement is much easier to treat in this representation. Many studies have appeared in recent years indicating that tensor networks have a close relationship with state-of-the-art neural network architectures. On the theory side, it was shown in Ref. [84] that machine learning architectures can be understood via tensor networks and their entanglement patterns. In practical applications, tensor networks can also be used for many machine-learning tasks, for example, performing learning tasks by optimizing an MPS [85, 86], preprocessing datasets based on layered tree tensor networks [87], classifying images via MPS and tree tensor networks [85–88], and realizing quantum machine learning via tensor networks [89]. Both tensor networks and neural networks can be applied to represent quantum many-body states; the differences and connections between the two kinds of representations are extensively explored in several works [27, 29, 38, 40, 43, 59, 90]. We shall review some of this progress in detail in Section III.

B. Representational power of neural networks

Next we comment on the representational power of neural networks, which is important for understanding the representational power of quantum neural network states. In 1900, Hilbert formulated his famous list of 23 problems, among which the thirteenth is devoted to the possibility of representing an n-variable function as a superposition of functions of fewer variables. This problem is closely related to the representational power of neural networks. Kolmogorov [91, 92] and Arnold [93] proved that for continuous n-variable functions this is indeed the case. The result is known as the Kolmogorov–Arnold representation theorem (alternatively, the Kolmogorov superposition theorem):

Theorem 1. Any n-variable real continuous function f : [0, 1]^n → R expands as sums and compositions of continuous univariate functions; more precisely, there exist real positive numbers a, b, λ_p, λ_{pq} and a real monotonically increasing function φ : [0, 1] → [0, 1] such that

f(x_1, …, x_n) = ∑_{q=1}^{2n+1} F(∑_{p=1}^{n} λ_p φ(x_p + aq) + bq),    (10)

or

f(x_1, …, x_n) = ∑_{q=1}^{2n+1} F(∑_{p=1}^{n} λ_{pq} φ(x_p + aq)),    (11)

where F is a real continuous function called the outer function, and φ is called the inner function. Note that a and F may be different in the two representations.

Obviously, the mathematical structure in the theorem is very similar to that of feed-forward neural networks. Since the initial work of Kolmogorov and Arnold, numerous follow-up works have contributed to a deeper understanding of the representational power of neural networks from different aspects [94]. Mathematicians have considered the problem for different support sets and different metrics between functions. The discrete-function version of the problem has also been extensively investigated. As mentioned in the previous section, McCulloch and Pitts [45] showed that any Boolean function can be represented by the perceptron and, based on this fact, Rosenblatt developed the learning algorithm [95]. Słupecki proved that all k-logic functions can be represented as a superposition of one-variable functions and any given significant function [96]. Cybenko [97], Funahashi [98], and Hornik and colleagues [99] proved that n-variable functions defined on a compact subset of R^n may be approximated by a four-layer network with only logistic activation functions and a linear activation function. Hecht [100] went a step further; he proved that any n-variable continuous function can be represented by a two-layer neural network involving logistic activation functions in the first layer and arbitrary activation functions in the second layer. These results are summarized as follows:

Theorem 2. Feed-forward neural networks can approximate any continuous n-variable function and any n-variable discrete function.

For the stochastic recurrent neural network, the BM, its power in approximating probability distributions has also been studied extensively. An important result of Le Roux and Bengio [63] states that

Theorem 3. Any discrete probability distribution p : B^n → R_{≥0} can be approximated arbitrarily well, in the metric of the Kullback–Leibler divergence, by an RBM with k + 1 hidden neurons, where k = |supp(p)| is the cardinality of the support of p (i.e., the number of vectors with non-zero probability).

The theorem states that any discrete probability distribution can be approximated by an RBM. The bound on the number of hidden neurons was later improved [64].

Here we must stress that these representation theorems only guarantee that a given function or probability distribution can be represented by some neural network; they say nothing about the efficiency of the representation. In practice, the number of parameters to be learned cannot be too large relative to the number of input neurons when we


FIG. 2: Schematic diagram of the neural network ansatz state.

build a neural network. If a neural network can represent a function or distribution with a number of parameters that depends polynomially on the number of input neurons, we say that the representation is efficient.

III. ARTIFICIAL NEURAL NETWORK ANSATZ FOR QUANTUM MANY-BODY SYSTEMS

A. Neural network ansatz state

We now describe how a neural network can be used as a variational ansatz for quantum many-body systems. Consider a many-body pure state |Ψ⟩ of an N-particle p-level physical system,

|Ψ⟩ = ∑_{v_1, v_2, …, v_N = 1}^{p} Ψ(v_1, v_2, …, v_N) |v_1⟩ ⊗ |v_2⟩ ⊗ … ⊗ |v_N⟩.

The coefficient Ψ(v_1, v_2, …, v_N) of the state can be regarded as an N-variable complex function. To characterize the state, we only need to give the value of the function Ψ for each configuration v = (v_1, v_2, …, v_N). One of the difficulties of quantum many-body physics is that the complete characterization of an N-particle system requires O(p^N) coefficients, which is exponentially large in the system size N and therefore computationally intractable. Let us now see how a neural network can be used to attack this difficulty.

To represent a quantum state, we first need to build a specific neural network architecture, whose set of adjustable parameters we denote by Ω = {w_{ij}, b_i}. The number of input neurons is assumed to be the same as the number of physical particles, N. For each series of inputs v = (v_1, …, v_N), we require the neural network to output a complex number Ψ(v, Ω), which depends on the values of both the input and the parameters of the neural network. In this way, a variational state

|Ψ(Ω)⟩ = ∑_{v ∈ Z_p^N} Ψ(v, Ω) |v⟩,    (12)

is obtained, where the sum runs over all basis labels, |v⟩ denotes |v_1⟩ ⊗ … ⊗ |v_N⟩, and Z_p is the set {0, 1, …, p − 1}

(the labels of the local basis). The state in Equation (12) is a variational state. For a given Hamiltonian H, the corresponding energy functional is

E(Ω) = ⟨Ψ(Ω)|H|Ψ(Ω)⟩ / ⟨Ψ(Ω)|Ψ(Ω)⟩.    (13)

In accordance with the variational method, the aim is now to minimize the energy functional and obtain the corresponding parameter values, with which the (approximate) ground state is obtained. The process of adjusting the parameters and finding the minimum of the energy functional is performed using neural network learning (see Figure 2). Alternatively, if an appropriate dataset exists, we can also build quantum neural network states by standard machine learning procedures rather than by minimizing the energy functional: we first build a neural network with learnable parameters Ω and then train the network with the available dataset. Once the training process is completed, the parameters of the neural network are fixed, and we obtain the corresponding approximate quantum state.
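To make Eqs. (12) and (13) concrete, the sketch below assembles the variational state vector from an arbitrary network output Ψ(v, Ω) (here a stand-in function with random complex parameters, not any specific network from the text) and evaluates the energy functional for a given Hamiltonian matrix (a random Hermitian matrix is used purely as a placeholder):

import itertools
import numpy as np

rng = np.random.default_rng(0)
N, p = 3, 2                                   # three particles with p = 2 levels
Omega = rng.normal(size=(N, p)) + 1j * rng.normal(size=(N, p))

def psi(v, Omega):
    # stand-in for a network output Psi(v, Omega); any complex-valued network would do
    return np.exp(np.sum(Omega[np.arange(N), v]))

basis = list(itertools.product(range(p), repeat=N))
state = np.array([psi(np.array(v), Omega) for v in basis])   # Eq. (12), unnormalized

A = rng.normal(size=(p ** N, p ** N)) + 1j * rng.normal(size=(p ** N, p ** N))
H = (A + A.conj().T) / 2                      # placeholder Hermitian "Hamiltonian"

E = np.real(state.conj() @ H @ state) / np.real(state.conj() @ state)   # Eq. (13)
print("E(Omega) =", E)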

The efficiency of the neural network ansatz in representing a quantum many-body state is defined through the relation between the number of nonzero parameters |Ω| involved in the representation and the number of physical particles N: if |Ω| = O(poly(N)), the representation is called efficient. The aim when solving a given eigenvalue equation is therefore to build a neural network with which the ground state can be represented efficiently.

To obtain quantum neural network states from the above construction, we first need to make the neural network a complex neural network, that is, to use complex parameters and output complex values. In practice, some neural networks may have difficulty outputting complex values; therefore, we need another way to build a quantum neural network state |Ψ⟩. The wavefunction Ψ(v) can be written as Ψ(v) = R(v) e^{iθ(v)}, where the amplitude R(v) and phase θ(v) are both real functions; hence, we can represent them by two separate neural networks with parameter sets Ω_1 and Ω_2. The quantum state is determined by the union of these sets, Ω = Ω_1 ∪ Ω_2 (Figure 4). In Section IV, we give an explicit example to clarify this construction. Hereinafter, most of our discussion remains focused on the complex neural network approach.

Let us now see some concrete examples of neural network states.

1. Some examples of neural network states

The first neural network state we consider is the logistic neural network state, where the weights and biases must now be chosen as complex numbers and the activation function f(z) = 1/(1 + e^{−z}) is also a complex function. As shown in Figure 3, we take a two-qubit state as an example. We assume the biases are b_1, …, b_4 for the hidden neurons h_1, …, h_4, respectively; the weights between neurons are denoted by w_{ij}. We next construct the state coefficient neuron by neuron.

In Figure 3, the output of h_i, i = 1, 2, 3, is y_i = f(v_1 w_{1i} + v_2 w_{2i} − b_i), respectively. These outputs are transmitted to h_4; after acting with h_4, we get the state coefficient

Ψ_log(v_1, v_2, Ω) = f(w_{14} y_1 + w_{24} y_2 + w_{34} y_3 − b_4),    (14)

where Ω = {w_{ij}, b_i}. Summing over all possible input values, we obtain the quantum state |Ψ_log(Ω)⟩ = ∑_{v_1, v_2} Ψ_log(v_1, v_2, Ω)|v_1, v_2⟩ up to a normalization factor. We see that the logistic neural network states have a hierarchical, iterative structure that is responsible for the power of the network in representing states.
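A direct transcription of this two-qubit construction (with randomly chosen complex parameters, purely for illustration; the variable names are ours) looks as follows:

import numpy as np

rng = np.random.default_rng(2)
# complex weights w[i, j] from visible v_i to hidden h_j (j = 1, 2, 3),
# weights w4[j] from h_j to the output neuron h_4, and biases b[1..4]
w = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
w4 = rng.normal(size=3) + 1j * rng.normal(size=3)
b = rng.normal(size=4) + 1j * rng.normal(size=4)

def f(z):
    # complex logistic activation
    return 1.0 / (1.0 + np.exp(-z))

def psi_log(v1, v2):
    y = f(v1 * w[0] + v2 * w[1] - b[:3])      # outputs of h_1, h_2, h_3
    return f(np.dot(w4, y) - b[3])            # Eq. (14): output of h_4

# unnormalized two-qubit state in the computational basis
state = np.array([psi_log(v1, v2) for v1 in (0, 1) for v2 in (0, 1)])
print(state / np.linalg.norm(state))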

However, when we want to give the neural network parameters of a given state |Ψ⟩ explicitly, we find that f(z) = 1/(1 + e^{−z}) cannot exactly take the values zero and one, as they are only the asymptotic values of f. This shortcoming can be remedied by a smoothed step function constructed in another way. Here we give a real-function solution; the complex case can be treated similarly. The idea is very simple: we cut the function into pieces and then glue them together smoothly. Suppose that we want to construct a smooth activation function F(x) such that

F(x) = 0 for x ≤ −a/2,  F(x) ∈ (0, 1) for −a/2 < x < a/2,  F(x) = 1 for x ≥ a/2.    (15)

We can choose the kernel function

K(x) = 4x/a^2 + 2/a for −a/2 ≤ x ≤ 0,  and  K(x) = 2/a − 4x/a^2 for 0 ≤ x < a/2.    (16)

The required function can then be constructed as

F(x) = ∫_{x−a/2}^{x+a/2} K(x − t) s(t) dt,    (17)

where s(t) is the step function. It is easy to check that the constructed function F(x) is differentiable and satisfies Equation (15). In this way, explicit neural network parameters can be obtained for any given state.

Note that the above representation of the quantum state Ψ(v) by a neural network requires developing a complex-valued neural network, which can be difficult in some cases. Because the quantum state Ψ(v) can also be expressed through an amplitude R(v) and a phase θ(v) as Ψ(v) = R(v) e^{iθ(v)}, we can represent the amplitude and phase separately by two neural networks, R(Ω_1, v) and θ(Ω_2, v), where Ω_1 and Ω_2 are the respective parameter sets of the networks. This approach is used in representing a density operator by purification, to be discussed in Section IV.

For the BM states, we notice that classical BM networks can approximate a discrete probability distribution. The quantum state coefficient Ψ(v) is the square

FIG. 3: Examples of two-qubit neural network ansatz states: a logistic neural network state, a fully connected BM state, an RBM state, and a DBM state.

root of a probability distribution and therefore should also be representable by a BM. This is one reason that BM states were introduced as a representation of quantum states. Here we first treat the fully connected BM states (Figure 3); the RBM and DBM cases are similar. As in the logistic states, the weights and biases of the BM are now complex numbers. The energy function is defined as

E(h, v) = −(∑_i v_i a_i + ∑_j h_j b_j + ∑_{⟨ij⟩} w_{ij} v_i h_j + ∑_{⟨jj′⟩} w_{jj′} h_j h_{j′} + ∑_{⟨ii′⟩} w_{ii′} v_i v_{i′}),    (18)

where a_i and b_j are the biases of the visible and hidden neurons, respectively, and w_{ij}, w_{jj′}, and w_{ii′} are connection weights. The state coefficients are now

Ψ_BM(v, Ω) = ∑_{h_1} … ∑_{h_l} e^{−E(h,v)} / Z,    (19)

with Z = ∑_{v,h} e^{−E(h,v)} the partition function; the sum runs over all possible values of the hidden neurons. The quantum state is |Ψ_BM(Ω)⟩ = ∑_v Ψ_BM(v, Ω)|v⟩ / N, where N is the normalizing factor.

Because the fully connected BM states are extremely

difficult to train in practice, the more commonly used ones are the RBM states, where there is one hidden layer and one visible layer and there are no intra-layer connections [hidden (resp. visible) neurons do not connect with hidden (resp. visible) neurons]. In this instance, the energy function becomes

E(h, v) = −∑_i a_i v_i − ∑_j b_j h_j − ∑_{ij} v_i W_{ij} h_j = −∑_i a_i v_i − ∑_j h_j (b_j + ∑_i v_i W_{ij}).    (20)

The wavefunction is then

Ψ(v, Ω) ∼ ∑_{h_1} … ∑_{h_l} e^{∑_i a_i v_i + ∑_j h_j (b_j + ∑_i v_i W_{ij})} = ∏_i e^{a_i v_i} ∏_j Γ_j(v; b_j, W_{ij}),    (21)

where by '∼' we mean that the overall normalization factor and the partition function Z(Ω) are omitted, and Γ_j = ∑_{h_j} e^{h_j (b_j + ∑_i v_i W_{ij})} equals 2 cosh(b_j + ∑_i v_i W_{ij}) or 1 + e^{b_j + ∑_i v_i W_{ij}} when h_j takes values in {±1} or {0, 1}, respectively. This product form of the wavefunction plays an important role in understanding the RBM states.
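The closed form (21) can be checked numerically against the explicit sum over hidden configurations; the following sketch does so for a small RBM with {±1}-valued hidden units (the sizes and random parameters are purely illustrative):

import itertools
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2                                       # visible and hidden units
a = rng.normal(size=n) + 1j * rng.normal(size=n)  # visible biases a_i
b = rng.normal(size=m) + 1j * rng.normal(size=m)  # hidden biases b_j
W = rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))

def psi_closed(v):
    # Eq. (21): prod_i e^{a_i v_i} * prod_j 2 cosh(b_j + sum_i v_i W_ij)
    return np.exp(a @ v) * np.prod(2 * np.cosh(b + v @ W))

def psi_sum(v):
    # explicit sum over hidden configurations h_j in {-1, +1}
    total = 0.0 + 0.0j
    for h in itertools.product([-1, 1], repeat=m):
        h = np.array(h)
        total += np.exp(a @ v + h @ (b + v @ W))
    return total

v = np.array([1, -1, 1])
print(psi_closed(v), psi_sum(v))   # the two values agree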

The DBM has more than one hidden layer; indeed, as has been shown in Ref. [29], any BM can be transformed into a DBM with two hidden layers. Hence, we shall only be concerned with DBMs with two hidden layers. The wavefunction is written explicitly as

Ψ(v, Ω) ∼ ∑_{h_1} … ∑_{h_l} ∑_{g_1} … ∑_{g_q} e^{−E(v,h,g)} / Z,    (22)

where the energy function is now of the form E(v, h, g) = −∑_i v_i a_i − ∑_k c_k g_k − ∑_j h_j b_j − ∑_{⟨ij⟩} W_{ij} v_i h_j − ∑_{⟨kj⟩} W_{kj} h_j g_k. The DBM is also difficult to train in general, but the DBM states have a stronger representational power than the RBM states; the details are discussed in the next subsection.

2. Representational power of neural network states

Since neural network states were introduced into many-body physics to efficiently represent the ground states of the transverse-field Ising model and the antiferromagnetic Heisenberg model [26], many researchers have studied their representational power. We now know that RBMs are capable of representing many different classes of states [28, 29, 38, 58]. Unlike their unrestricted counterparts, RBMs allow efficient sampling, and they are also the most studied case. The DBM states have also been explored in various works [29, 42, 43]. In this section, we briefly review the progress in this direction.

We first list some known classes of states that can be efficiently represented by an RBM: Z_2 toric code states [28]; graph states [29]; stabilizer states with generators of pure type S_X, S_Y, S_Z and their arbitrary unions [38]; perfect surface code states, and surface code states with boundaries, defects, and twists [38]; Kitaev's D(Z_d) quantum double ground states [38]; arbitrary stabilizer code states [40]; ground states of the double semion model and the twisted quantum double models [41]; states of the Affleck–Lieb–Kennedy–Tasaki model and the two-dimensional CZX model [41]; states of Haah's cubic code model [41]; and generalized-stabilizer and hypergraph states [41]. An algorithmic way to obtain the RBM parameters of the stabilizer code state for an arbitrary given stabilizer group S has also been developed [40].

Although many important classes of states may be represented by the RBM, there is a crucial limiting result [29]: there exist states that can be expressed as PEPS [101] but cannot be efficiently represented by an RBM; moreover, the class of RBM states is not closed under unitary transformations. One way to remedy this defect is to add one more hidden layer, that is, to use the DBM.

The DBM can efficiently represent physical states including:

• any state that can be efficiently represented by an RBM;^2

• any n-qubit quantum state generated by a quantum circuit of depth T, with O(nT) hidden neurons [29];

• tensor network states consisting of n local tensors with bond dimension D and maximum coordination number d, with O(nD^{2d}) hidden neurons [29];

• the ground states of Hamiltonians with gap ∆, with O((m^2/∆)(n − log ε)) hidden neurons, where ε is the representational error [29].

^2 This can be done by setting all the parameters involved in the deep hidden layer to zero; only the parameters of the shallow hidden layer remain nonzero.

Although there are many known results concerning BM states, the representational power of other neural networks has nevertheless barely been explored.

B. Tensor network states

Let us now introduce a closely related representation of quantum many-body states, the tensor network representation, which was originally developed in the context of condensed matter physics based on the idea of the renormalization group. Tensor network states now have applications in many different scientific fields. Arguably, the most important property of tensor network states is that entanglement is much easier to read out than in other representations.

Although there are many different types of tensor networks, we focus here on the two simplest and most accessible ones, the MPS and the PEPS. For more comprehensive reviews, see Refs. [3–7, 12].

By definition, a rank-n tensor is a complex variable with n indices, for example A_{i_1, i_2, …, i_n}. The number of values that an index i_k can take is called the bond dimension of i_k. The contraction of two tensors is a new tensor, defined as the sum over any number of pairs of indices; for example, C_{i_1,…,i_p, k_1,…,k_q} = ∑_{j_1,…,j_l} A_{i_1,…,i_p, j_1,…,j_l} B_{j_1,…,j_l, k_1,…,k_q}. A tensor network is a set of tensors for which some (or all) of the indices are contracted.
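In code, such contractions are conveniently expressed with einsum; the following sketch contracts two random tensors over one shared index (the shapes are arbitrary illustrations):

import numpy as np

rng = np.random.default_rng(4)
A = rng.random((2, 3, 4))      # rank-3 tensor A_{i p j}
B = rng.random((4, 5))         # rank-2 tensor B_{j k}

# contract over the shared index j: C_{i p k} = sum_j A_{i p j} B_{j k}
C = np.einsum('ipj,jk->ipk', A, B)
print(C.shape)                 # (2, 3, 5)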

Representing a tensor network graphically is quite convenient. The corresponding diagram is called a tensor network diagram, in which a rank-n tensor is represented as a vertex with n edges: a scalar is just a vertex, a vector is a vertex with one edge, and a matrix is a vertex with two edges (23). A contraction is graphically represented by connecting two vertices through a shared edge. For two vectors and two matrices, this corresponds to the inner product, ∑_i a_i b_i (24), and the matrix product, ∑_j A_{ij} B_{jk} (25), respectively.

How can we use a tensor network to represent a many-body quantum state? The idea is to regard the wavefunction Ψ(v_1, …, v_n) = ⟨v|Ψ⟩ as a rank-n tensor Ψ_{v_1, …, v_n}. In some cases, the tensor wavefunction can be broken into small pieces, specifically, into a contraction of small tensors, for example,

Ψ_{v_1, …, v_n} = ∑_{α_1, …, α_n} A^{[1]}_{v_1; α_n α_1} A^{[2]}_{v_2; α_1 α_2} … A^{[n]}_{v_n; α_{n−1} α_n},    (26)

where each A^{[k]}_{v_k; α_{k−1} α_k} is a local tensor depending only on some subset of the indices v_1, …, v_n. In this way, physical properties such as entanglement are encoded into the contraction pattern of the tensor network diagram. It turns out that this kind of representation is very powerful for solving many physical problems.
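The following NumPy sketch builds a small periodic-boundary MPS of the form (26) from random local tensors and contracts it into the full p^n-component wavefunction (the sizes and random tensors are arbitrary illustrations):

import numpy as np

rng = np.random.default_rng(5)
n, p, D = 4, 2, 3            # sites, physical dimension, bond dimension
# one rank-3 local tensor A[k][v, alpha_{k-1}, alpha_k] per site
A = [rng.random((p, D, D)) for _ in range(n)]

def mps_amplitude(v):
    # Psi_{v_1...v_n} = Tr(A[0][v_1] A[1][v_2] ... A[n-1][v_n]) for periodic boundaries
    M = A[0][v[0]]
    for k in range(1, n):
        M = M @ A[k][v[k]]
    return np.trace(M)

psi = np.array([mps_amplitude(np.unravel_index(idx, (p,) * n))
                for idx in range(p ** n)])
psi /= np.linalg.norm(psi)
print(psi.shape)             # (16,) normalized wavefunction for 4 qubits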

There are several important tensor network structures. We take two prototypical tensor network states used for 1d and 2d systems, the MPS [67–69] and the PEPS [14], as examples to illustrate the construction of tensor-network states. In Table II, we list some of the most popular tensor-network structures, including MPS, PEPS, MERA [16], branching MERA [70, 71], and tree tensor networks [72]. We also list the main physical properties of these structures, such as correlation length and

TABLE II: Some popular tensor network structures and their properties.

  Tensor network structure                                        Entanglement entropy S(A)   Correlation length ξ   Local observable ⟨O⟩
  Matrix product state                                            O(1)                        finite                 exact
  Projected entangled pair state (2d)                             O(|∂A|)                     finite/infinite        approximate
  Multiscale entanglement renormalization ansatz (1d)             O(log |∂A|)                 finite/infinite        exact
  Branching multiscale entanglement renormalization ansatz (1d)   O(log |∂A|)                 finite/infinite        exact
  Tree tensor networks                                            O(1)                        finite                 exact

entanglement entropy. For more examples, see Refs. [3–7, 12].

An MPS with periodic boundary conditions has exactly the form of Equation (26), consisting of many local rank-3 tensors. For the open boundary case, the boundary local tensors are replaced by rank-2 tensors, and the inner part remains the same. MPSs correspond to the low-energy eigenstates of local gapped 1d Hamiltonians [102, 103]. The correlation length of an MPS is finite and MPSs obey the entanglement area law; thus they cannot be used to represent quantum states of critical systems that violate the area law [8].

The PEPS can be regarded as a higher-dimensional generalization of the MPS. As an example, one may consider a 2d 3 × 3 PEPS state with open boundaries, Ψ_PEPS(v) (27) (tensor network diagram not reproduced here). The typical local tensors of a PEPS are rank-5 tensors in the inner part, rank-4 tensors on the boundary, and rank-3 tensors at the corners. The 2d PEPS captures the low-energy eigenstates of 2d local Hamiltonians, which obey the entanglement area law [8]. PEPSs differ from MPSs in that their correlation length is not always finite, so they can be used to represent quantum states of critical systems. However, there is by now no efficient way to extract physical information from a PEPS exactly; therefore, many approximate methods have been developed in recent years.

The tensor network states have a close relationship

with neural network states. Their connections have been extensively explored in many studies [29, 39, 59]. Here, we briefly discuss how to transform an RBM state into a tensor network state. To do this, we need to regard visible and hidden neurons as tensors. For example, the visible neuron v_i and the hidden neuron h_j are replaced by

V^{(i)} = ( 1  0 ; 0  e^{a_i} ),    (28)

H^{(j)} = ( 1  0 ; 0  e^{b_j} ),    (29)

and the weighted connection between v_i and h_j is also replaced by a tensor

W^{(ij)} = ( 1  1 ; 1  e^{w_{ij}} ).    (30)

It is easy to check that both the RBM and the tensor network representation give the same state. Note that, with some further optimization, any local RBM state can be transformed into an MPS [59]. The general correspondence between RBM and tensor-network states has been discussed in Ref. [59]. One crucial point is that here we are only concerned with reachability, specifically, whether one representation can be expressed by the other; in practical applications, we must also know the efficiency of representing one by the other. As indicated in Section III A 2, there exist tensor network states which cannot be efficiently represented by an RBM.
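One can verify this correspondence numerically for a small case: contracting the V, H, and W tensors reproduces the RBM amplitude of Eq. (21). The sketch below checks a single visible–hidden pair with arbitrary real parameters; the diagonal contraction used here is a simplified stand-in for the full copy-tensor construction of Ref. [59]:

import numpy as np

a, b, w = 0.3, -0.7, 1.1                        # arbitrary parameters: one visible, one hidden unit
V = np.diag([1.0, np.exp(a)])                    # Eq. (28)
H = np.diag([1.0, np.exp(b)])                    # Eq. (29)
W = np.array([[1.0, 1.0], [1.0, np.exp(w)]])     # Eq. (30)

for v in (0, 1):
    # RBM amplitude with the hidden unit traced out: sum_h e^{a v + b h + w v h}
    rbm = sum(np.exp(a * v + b * h + w * v * h) for h in (0, 1))
    # tensor network contraction: sum_h V[v, v] * W[v, h] * H[h, h]
    tn = sum(V[v, v] * W[v, h] * H[h, h] for h in (0, 1))
    print(v, rbm, tn)                            # the two columns agree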

We note that there are also several studies trying to combine the respective advantages of a tensor network and a neural network to give a more powerful representation of the quantum many-body states [90].


C. Advances in quantum many-body calculations

There are several studies concerning numerical tests of the accuracy and efficiency of neural network states for different physical systems and different physical phenomena [22–29, 34–38]. The early works trying to use neural networks to solve the Schrödinger equation [34–37] date back to 2001. More recently, in 2016, Carleo and Troyer made the approach popular for calculating physical quantities of quantum systems [26]. Here we briefly discuss several examples of numerical calculations in many-body physics, including spin systems and bosonic and fermionic systems.

Transverse-field Ising model. The Hamiltonian for the Ising model immersed in a transverse field is given by

H_tIsing = −J ∑_{⟨ij⟩} Z_i Z_j − B ∑_i X_i,    (31)

where the first sum runs over all nearest-neighbor pairs. For the 1d case, the system is gapped as long as J ≠ B but gapless when J = B. In Ref. [26], Carleo and Troyer demonstrated that the RBM state works very well in finding the ground state of this model. By minimizing the energy E(Ω) = ⟨Ψ(Ω)|H_tIsing|Ψ(Ω)⟩/⟨Ψ(Ω)|Ψ(Ω)⟩ with respect to the network parameters Ω using improved gradient-descent optimization, they showed that the RBM states achieve arbitrary accuracy for both 1d and 2d systems.
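As a minimal illustration of this variational procedure (not the actual stochastic-reconfiguration scheme of Ref. [26]), the following sketch minimizes E(Ω) for a 3-spin transverse-field Ising chain with a real-valued RBM ansatz, using exact enumeration and a crude finite-difference gradient; all sizes and hyperparameters are illustrative assumptions:

import itertools
import numpy as np

# tiny transverse-field Ising chain: N = 3 spins, J = 1, B = 0.5
N, J, B = 3, 1.0, 0.5
Z = np.diag([1.0, -1.0]); X = np.array([[0.0, 1.0], [1.0, 0.0]]); I2 = np.eye(2)
def chain(ops):
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out
H = (sum(-J * chain([Z if k in (i, i + 1) else I2 for k in range(N)]) for i in range(N - 1))
     + sum(-B * chain([X if k == i else I2 for k in range(N)]) for i in range(N)))

# real RBM ansatz with M hidden units, evaluated on the full spin basis
M = 6
basis = [np.array(v) for v in itertools.product([1, -1], repeat=N)]
def state(theta):
    a, b, W = theta[:N], theta[N:N + M], theta[N + M:].reshape(N, M)
    return np.array([np.exp(a @ v) * np.prod(2 * np.cosh(b + v @ W)) for v in basis])
def energy(theta):
    s = state(theta)
    return (s @ H @ s) / (s @ s)

# crude finite-difference gradient descent on E(Omega)
rng = np.random.default_rng(6)
theta = 0.1 * rng.normal(size=N + M + N * M)
eta, eps = 0.05, 1e-5
for step in range(300):
    grad = np.array([(energy(theta + eps * e) - energy(theta - eps * e)) / (2 * eps)
                     for e in np.eye(theta.size)])
    theta -= eta * grad
print("variational energy:", energy(theta))
print("exact ground energy:", np.linalg.eigvalsh(H).min())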

Antiferromagnetic Heisenberg model. The antiferromagnetic Heisenberg model is of the form

H = J ∑_{⟨ij⟩} S_i · S_j,  (J > 0),    (32)

where the sum runs over all nearest-neighbor pairs. In Ref. [26], calculations for this model were performed for 1d and 2d systems using RBM states. The accuracy of the neural network ansatz turns out to be much better than that of the traditional spin-Jastrow ansatz [104] for the 1d system. The 2d system is harder, and more hidden neurons are needed to reach high accuracy. In Ref. [105], a combined approach is presented: the RBM architecture was combined with a conventional variational Monte Carlo method with paired-product (geminal) wave functions to calculate the ground-state energy and ground state. They showed that the combined method has a higher accuracy than that achieved by each method separately.

J1-J2 Heisenberg model. The J1-J2 Heisenberg

model (also known as the frustrated Heisenberg model) is of the form

H = J_1 ∑_{⟨ij⟩} S_i · S_j + J_2 ∑_{⟨⟨ij⟩⟩} S_i · S_j,    (33)

where the first sum runs over all nearest-neighbor pairs and the second sum runs over the next-nearest-neighbor pairs. Cai and Liu [48] constructed neural network states for this model using feed-forward neural networks. They used the variational Monte Carlo method to find the ground state of the 1d system and obtained precisions of ∼ O(10^{−3}). Liang and colleagues [52] investigated the model using a convolutional neural network and showed that the precision of the calculation based on the convolutional neural network exceeds that of the string bond state calculation.

Hubbard model. The Hubbard model is a model of interacting particles on a lattice that aims to capture the phase transition between conductors and insulators. It has been used to describe superconductivity and cold atom systems. The Hamiltonian is of the form

H = −t ∑_{⟨ij⟩,σ} (c†_{i,σ} c_{j,σ} + c†_{j,σ} c_{i,σ}) + U ∑_i n_{i,↑} n_{i,↓},    (34)

where the first term accounts for the kinetic energy and the second for the potential energy; c†_{i,σ} and c_{i,σ} denote the usual creation and annihilation operators, with n_{i,σ} = c†_{i,σ} c_{i,σ}. The phase diagram of the Hubbard model has not been completely determined yet. In Ref. [105], Nomura and colleagues numerically analyzed the ground state energy of the model by combining the RBM with the pair-product states approach. They showed numerically that the accuracy of the calculation surpasses that of the many-variable variational Monte Carlo approach for U/t = 4, 8. A modified form of the model, described by the Bose–Hubbard Hamiltonian, was studied in Ref. [49] using a feed-forward neural network. The result is in good agreement with calculations from exact diagonalization and the Gutzwiller approximation.

Here we have briefly mentioned several important examples of numerical calculations for many-body physical systems. Numerous other numerical works concerning many different physical models have appeared; we refer the interested reader to, e.g., Refs. [22–29, 34–38, 48, 49, 52, 105].

IV. DENSITY OPERATORS REPRESENTED BY NEURAL NETWORKS

A. Neural network density operator

In realistic applications of quantum technologies, the states that we are concerned with are often mixed because the system is never perfectly isolated from its environment. Mixed states are mathematically characterized by a density operator ρ, which is (i) Hermitian, ρ† = ρ; (ii) positive semi-definite, ⟨Ψ|ρ|Ψ⟩ ≥ 0 for all |Ψ⟩; and (iii) of unit trace, Tr ρ = 1. A pure state |Ψ⟩ provides a representation of the density operator ρ_Ψ = |Ψ⟩⟨Ψ|, and general mixed states are incoherent superpositions (classical mixtures) of pure density operators. Let us consider the situation in which the physical space of the system is H_S with basis v_1, · · · , v_n and the environment space is H_E with basis e_1, · · · , e_m. For a given mixed state ρ_S of the system, if we take into account the effect of the environment, there is a pure state |Ψ_SE⟩ = ∑_v ∑_e Ψ(v, e)|v⟩|e⟩ for which ρ_S = Tr_E |Ψ_SE⟩⟨Ψ_SE|. Every mixed state can be purified in this way.

FIG. 4: RBM construction of the latent-space purification for a density operator (panels: amplitude and phase networks).

In Ref. [106], Torlai and Melko explored the possibility of representing mixed states ρ_S using the RBM. The idea is the same as that for pure states: we build a neural network with parameters Ω, and, for the fixed basis |v⟩, the density operator is given by the matrix entries ρ(Ω, v, v′), which are determined by the neural network. Therefore, we only need to map a given neural network with parameters Ω to a density operator as

ρ(Ω) = ∑_{v,v′} |v⟩ ρ(Ω, v, v′) ⟨v′|.  (35)

To this end, the purification method for density operators is used. The environment is now represented by some extra hidden neurons e_1, · · · , e_m besides the hidden neurons h_1, · · · , h_l. The purification |Ψ_SE⟩ of ρ_S is now captured by the parameters of the network, which we still denote as Ω, i.e.,

|Ψ_SE⟩ = ∑_v ∑_e Ψ_SE(Ω, v, e)|v⟩|e⟩.  (36)

By tracing out the environment, the density operator is also determined by the network parameters,

ρ_S = ∑_{v,v′} [ ∑_e Ψ_SE(Ω, v, e) Ψ*_SE(Ω, v′, e) ] |v⟩⟨v′|.  (37)

To represent the density operators, Ref. [106] takes the approach of representing the amplitude and phase of the purified state |Ψ_SE⟩ by two separate neural networks. First, the environment units are embedded into the hidden-neuron space; that is, some new hidden neurons e_1, · · · , e_m are introduced, which are fully connected to all visible neurons (see Figure 4). The parameters corresponding to the amplitude and phase of the wave function are now encoded in the RBM with two different sets of parameters. That is, Ψ_SE(Ω, v, e) = R(Ω_1, v, e) e^{iθ(Ω_2, v, e)} with Ω = Ω_1 ∪ Ω_2, where R(Ω_1, v, e) and θ(Ω_2, v, e) are each characterized by the corresponding RBM (this structure is called the latent-space purification by the authors). In this way, the coefficients of the purified state |Ψ_SE⟩ encoded by the RBM are

Ψ_SE(Ω, v, e) = √( ∑_h e^{−E(Ω_1, v, h, e)} / Z(Ω_1) ) · e^{ i [log ∑_h e^{−E(Ω_2, v, h, e)}] / 2 },  (38)

where Z(Ω_i) = ∑_h ∑_e ∑_v e^{−E(Ω_i, v, h, e)} is the partition function corresponding to Ω_i. The density operator can now be obtained from Equation (37).
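The trace-out step of Eq. (37) is easy to verify numerically on a toy system. The sketch below assumes an RBM energy of the form E(Ω, v, h, e) = −(a·v + b·h + c·e + vᵀWh + vᵀUe); the precise coupling pattern used in Ref. [106] may differ, and all sizes and names are illustrative.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n_v, n_h, n_e = 2, 3, 2                       # visible, hidden, environment units

def random_params():
    return dict(a=rng.standard_normal(n_v), b=rng.standard_normal(n_h),
                c=rng.standard_normal(n_e),
                W=rng.standard_normal((n_v, n_h)), U=rng.standard_normal((n_v, n_e)))

O1, O2 = random_params(), random_params()     # amplitude (Omega_1) and phase (Omega_2) RBMs

def energy(p, v, h, e):
    return -(p['a'] @ v + p['b'] @ h + p['c'] @ e + v @ p['W'] @ h + v @ p['U'] @ e)

def marginal(p, v, e):
    # sum over the hidden units h of exp(-E(Omega, v, h, e))
    return sum(np.exp(-energy(p, v, np.array(h), e)) for h in product([0, 1], repeat=n_h))

vs = [np.array(v) for v in product([0, 1], repeat=n_v)]
es = [np.array(e) for e in product([0, 1], repeat=n_e)]
Z1 = sum(marginal(O1, v, e) for v in vs for e in es)

def psi_SE(v, e):
    # Eq. (38): amplitude from Omega_1, phase from Omega_2
    amp = np.sqrt(marginal(O1, v, e) / Z1)
    phase = 0.5 * np.log(marginal(O2, v, e))
    return amp * np.exp(1j * phase)

Psi = np.array([[psi_SE(v, e) for e in es] for v in vs])   # rows: system, cols: environment
rho_S = Psi @ Psi.conj().T                                 # Eq. (37): trace out the environment
print(np.trace(rho_S).real)                                # = 1 up to numerical error
print(np.allclose(rho_S, rho_S.conj().T))                  # Hermitian
print(np.linalg.eigvalsh(rho_S).min() >= -1e-12)           # positive semi-definite
```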

B. Neural network quantum state tomography

Quantum state tomography aims to identify or reconstruct an unknown quantum state from a dataset of experimental measurements. The traditional exact brute-force approach to quantum state tomography is feasible only for systems with a small number of degrees of freedom; otherwise the demand on computational resources is prohibitive. For pure states, the compressed-sensing approach circumvents the experimental difficulty and requires only a reasonable number of measurements [107]. MPS tomography works well for states with low entanglement [108, 109]. For general mixed states, the permutationally invariant tomography scheme, which relies on the internal symmetry of the quantum states, has limited efficiency [110]. Despite all this progress, the general case of quantum state tomography is still very challenging.

The neural network representation of quantum states provides another approach to state tomography. Here we review its basic idea. For clarity (although there will be some overlap), we discuss its application to pure states and mixed states separately.

Following the work by Torlai and colleagues [33], the neural network tomography of a pure quantum state works as follows. To reconstruct an unknown state |Ψ⟩, we first perform a collection of measurements in bases v^(i), i = 1, · · · , N, and thereby obtain the probabilities p_i(v^(i)) = |⟨v^(i)|Ψ⟩|². The aim of neural network tomography is to find a set of RBM parameters Ω such that the RBM state Φ(Ω, v^(i)) mimics the probabilities p_i(v^(i)) as closely as possible in each basis. This can be done by training the neural network to minimize the distance function (total divergence) between |Φ(Ω, v^(i))|² and p_i(v^(i)). The total divergence is chosen as

D(Ω) = ∑_{i=1}^{N} D_KL[ |Φ(Ω, v^(i))|² ‖ p_i(v^(i)) ],  (39)

where D_KL[ |Φ(Ω, v^(i))|² ‖ p_i(v^(i)) ] is the Kullback–Leibler (KL) divergence in basis v^(i).

Note that to estimate the phase of |Ψ⟩ in the reference basis, a sufficiently large number of measurement bases should be included. Once the training is completed, we obtain the target state |Φ(Ω)⟩ in RBM form, which is the reconstructed state for |Ψ⟩. In Ref. [33], Torlai and colleagues test the scheme on the W state, the modified W state with local phases, Greenberger–Horne–Zeilinger and Dicke states, and also the ground states of the transverse-field Ising model and the XXZ model. They find the scheme to be very efficient: the number of measurement bases usually scales only polynomially with system size.
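The following sketch illustrates the divergence-minimization step for a single (reference) basis and a positive wave function, fitting an RBM marginal distribution to a measured distribution by exact gradient descent on a toy system. The scheme of Ref. [33] additionally uses measurements in many rotated bases to recover phases; here the divergence is taken in the forward direction D_KL(p ‖ p_Ω), and all names and sizes are illustrative.

```python
import numpy as np
from itertools import product

n_v, n_h = 3, 4
rng = np.random.default_rng(0)
configs = np.array(list(product([0, 1], repeat=n_v)), dtype=float)

# "Experimental" distribution to be reconstructed (here simply a random target).
p_target = rng.random(len(configs)); p_target /= p_target.sum()

a = np.zeros(n_v); b = np.zeros(n_h); W = 0.01 * rng.standard_normal((n_v, n_h))

def model_probs():
    # p_Omega(v) proportional to exp(a.v) * prod_j 2 cosh(b_j + v.W_j); exact normalization
    log_uw = configs @ a + np.sum(np.log(2 * np.cosh(b + configs @ W)), axis=1)
    uw = np.exp(log_uw - log_uw.max())
    return uw / uw.sum()

lr = 0.1
for step in range(2000):
    p_model = model_probs()
    # exact gradient of KL(p_target || p_Omega) for this tiny system
    t = np.tanh(b + configs @ W)                     # shape (n_configs, n_h)
    grad_a = -(p_target - p_model) @ configs
    grad_b = -(p_target - p_model) @ t
    grad_W = -configs.T @ ((p_target - p_model)[:, None] * t)
    a -= lr * grad_a; b -= lr * grad_b; W -= lr * grad_W

kl = np.sum(p_target * np.log(p_target / model_probs()))
print(kl)   # close to zero after training
```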

The mixed-state case is studied in Ref. [106] and is based on the RBM representations of density operators. The core idea is the same as for the pure state: to reconstruct an unknown density operator ρ, we build an RBM neural network density operator σ(Ω) with RBM parameter set Ω. Before training the RBM, we perform a collection of measurements v^(i) and obtain the corresponding probability distributions p_i(v^(i)) = ⟨v^(i)|ρ|v^(i)⟩. The training process involves minimizing the total divergence between the experimental probability distribution and the probability distribution calculated from the test RBM state σ(Ω). After the training process, we obtain a compact RBM representation of the density operator ρ, which may be used to calculate the expectation values of physical observables. Neural network state tomography is efficient and accurate in many cases, and it provides a good supplement to traditional tomography schemes.

V. ENTANGLEMENT PROPERTIES OF NEURAL NETWORK STATES

The notion of entanglement is ubiquitous in physics. To understand the entanglement properties of many-body states is a central theme in both condensed matter physics and quantum information theory. Tensor network representations of quantum states have an important advantage in that entanglement can be read out more easily. Here, we discuss the entanglement properties of the neural network states for comparison with tensor networks.

For a given N-particle quantum system in state |Ψ⟩, we divide the N particles into two groups A and A^c. With this bipartition, we calculate the Renyi entanglement entropy S^α_R(A) := [1/(1−α)] log Tr ρ_A^α, which characterizes the entanglement between A and A^c, where ρ_A = Tr_{A^c}(|Ψ⟩⟨Ψ|) is the reduced density matrix. If the Renyi entanglement entropy is nonzero, then A and A^c are entangled.
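For small systems, the Renyi entropy of a bipartition can be computed directly from the state vector; a brute-force sketch follows (illustrative only, and exponential in system size).

```python
import numpy as np

# Renyi entanglement entropy S_R^alpha(A) = log(Tr rho_A^alpha) / (1 - alpha)
# for a bipartition of an n-qubit pure state into the first n_A and last n_B qubits.

def renyi_entropy(psi, n_A, n_B, alpha=2):
    M = psi.reshape(2**n_A, 2**n_B)     # reshape the state vector into a system/bath matrix
    rho_A = M @ M.conj().T              # reduced density matrix of A (trace out B)
    evals = np.linalg.eigvalsh(rho_A)
    evals = evals[evals > 1e-12]
    if alpha == 1:                      # von Neumann entropy as the alpha -> 1 limit
        return float(-np.sum(evals * np.log(evals)))
    return float(np.log(np.sum(evals**alpha)) / (1 - alpha))

# Example: a 6-qubit random state, cut into 3 + 3 qubits
rng = np.random.default_rng(0)
psi = rng.standard_normal(2**6) + 1j * rng.standard_normal(2**6)
psi /= np.linalg.norm(psi)
print(renyi_entropy(psi, 3, 3, alpha=2))
```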

The entanglement property is encoded in the geometry of the contraction patterns of the local tensors for tensor network states. For neural network states, it was shown that the entanglement is encoded in the connection patterns of the neural networks [27, 43, 58–60]. For RBM states, Deng, Li, and Das Sarma [27] showed that locally connected RBM states obey the entanglement area law; see Figure 5(a) for an illustration of a local RBM state. Nonlocal connections result in volume-law entanglement of the states [27]. We extended this result to any BM, showing that, by cutting the intra-layer connections and adding hidden neurons, any BM state may be reduced to a DBM state with several hidden layers. Then, using the folding trick, folding the odd layers and the even layers separately, every BM is reduced to a DBM with only two hidden layers [29, 43]. We then showed that locally connected DBM states obey the entanglement area law, and DBMs with nonlocal connections possess volume-law entanglement [43]; see Figure 5(b) for an illustration of a local DBM state.

FIG. 5: Example of (a) a local RBM state and (b) a local DBM state.

The relationship between BM and tensor network states was investigated in Refs. [29, 58, 59], and an algorithmic way of transforming an RBM state into an MPS was given in Ref. [59]. The capability of the BM to represent tensor network states was investigated in Refs. [29, 58] from a complexity-theory perspective.

One final aspect is realizing the holographic geometry-entanglement correspondence using BM states [43, 60]. When proving the entanglement area law and volume law for BM states, the concept of locality must be introduced; this means that we must introduce a geometry between neurons. This geometry determines the entanglement features of the state. To understand the holographic entanglement entropy, we first tile the neurons in a given geometry and then let the network learn from data. After the learning process is done, we can inspect the connection pattern of the neural network and analyze the corresponding entanglement properties, which have a direct relationship to the given geometry, such as the sign of the spatial curvature.

Although much progress on the entanglement properties of neural network states has been made, we still know very little about them. The entanglement features of neural networks other than the BM have not been investigated at all and remain to be explored in future work.

VI. QUANTUM COMPUTING AND NEURAL NETWORK STATES

There is another crucial application of neural network states, namely, the classical simulation of quantum computing, which we briefly review in this section. It is well known that quantum algorithms can provide exponential speedups over the best known classical algorithms for certain problems, such as factoring integers [111]. Quantum computers are being actively developed of late, but one crucial issue, known as quantum supremacy [112], emerges naturally. Quantum supremacy concerns the potential capabilities of quantum computers that classical computers practically do not have, and the resources required to simulate quantum algorithms on a classical computer. Studies of classical simulations of quantum algorithms can also guide us toward understanding the practical applications of the quantum computing platforms recently developed in different laboratories. Here we introduce the approach to simulating quantum circuits based on the neural network representation of quantum states.

Following Ref. [29], we first discuss how to simulate quantum computing via DBM states, since in the DBM formalism all operations can be written out analytically. A general quantum computing process can be loosely divided into three steps: (i) initial-state preparation, (ii) applying quantum gates, and (iii) measuring the output state. For the DBM simulation of quantum computing, the initial state is first represented by a DBM network. We are mainly concerned with how to apply a universal set of quantum gates in the DBM representation. As we shall see, this can be achieved by adding hidden neurons and weighted connections. Here the universal gate set is chosen as the single-qubit rotation around the z-axis, Z(θ), the Hadamard gate H, and the controlled rotation around the z-axis, CZ(θ) [113].

We continue to denote the computational basis by |v⟩; the input state is then represented by the DBM neural network as Ψ_in(v, Ω) = ⟨v|Ψ_in(Ω)⟩. To simulate circuit quantum computing, characterized by a unitary transform U_C, we need to devise strategies so that we can apply all the universal quantum gates and achieve the transform

⟨v|Ψ_in(Ω)⟩ --(DBM)--> ⟨v|Ψ_out(Ω)⟩ = ⟨v|U_C|Ψ_in(Ω)⟩.  (40)

Let us first consider how to construct the Hadamard gate operation,

H|0⟩ = (1/√2)(|0⟩ + |1⟩),  H|1⟩ = (1/√2)(|0⟩ − |1⟩).  (41)

If H acts on the i-th qubit of the system, we can represent the operation in terms of the coefficients of the state,

Ψ(· · · v_i · · · ) --(H)--> Ψ′(· · · v′_i · · · ) = ∑_{v_i=0,1} (1/√2) (−1)^{v_i v′_i} Ψ(· · · v_i · · · ).  (42)

In the DBM setting, it is clear now that the Hadamard DBM transform of the i-th qubit adds a new visible neuron v′_i, which replaces v_i, and another hidden neuron H_i; v_i now becomes a hidden neuron. The connection weight is given by W_H(v, H_i) = iπ/8 − (ln 2)/2 − iπv/2 − iπH_i/4 + iπvH_i, where v = v_i, v′_i. One easily checks that ∑_{H_i=0,1} e^{W_H(v_i,H_i)+W_H(v′_i,H_i)} = (1/√2)(−1)^{v_i v′_i}, which completes the construction of the Hadamard gate operation.
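The identity above is easy to verify numerically; a short check of the Hadamard weights, using our reconstruction W_H(v, H_i) = iπ/8 − (ln 2)/2 − iπv/2 − iπH_i/4 + iπvH_i, is sketched below.

```python
import numpy as np

# Check: sum_{H=0,1} exp(W_H(v_i, H) + W_H(v'_i, H)) = (1/sqrt(2)) * (-1)^(v_i * v'_i)

def W_H(v, H):
    return 1j*np.pi/8 - np.log(2)/2 - 1j*np.pi*v/2 - 1j*np.pi*H/4 + 1j*np.pi*v*H

for vi in (0, 1):
    for vpi in (0, 1):
        lhs = sum(np.exp(W_H(vi, H) + W_H(vpi, H)) for H in (0, 1))
        rhs = (-1)**(vi * vpi) / np.sqrt(2)
        assert np.isclose(lhs, rhs), (vi, vpi, lhs, rhs)
print("Hadamard DBM weights verified")
```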

The Z(θ) gate operation,

Z(θ)|0⟩ = e^{−iθ/2}|0⟩,  Z(θ)|1⟩ = e^{iθ/2}|1⟩,  (43)

can be constructed similarly. We again add a new visible neuron v′_i and a hidden neuron Z_i, and v_i becomes a hidden neuron that should be traced out. The connection weight is given by W_{Z(θ)}(v, Z_i) = −(ln 2)/2 + iθv/2 + iπvZ_i, where v = v_i, v′_i. The DBM transform of the controlled-Z(θ) gate is slightly different from that of the single-qubit gates because it is a two-qubit operation acting on v_i and v_j. To simplify the calculation, we give here the explicit construction for CZ. This can be done by introducing a new hidden neuron H_{ij}, which connects both v_i and v_j with the same weights as those given by the Hadamard gate. In summary, we have

H ⇔ [DBM network],  (44)

Z(θ) ⇔ [DBM network],  (45)

CZ ⇔ [DBM network].  (46)

Note that we can also achieve the quantum gate H in the DBM setting by directly adding a new visible neuron v′_i and connecting it with the (now hidden) neuron v_i. Z(θ) can also be realized in the DBM setting by changing the bias of the visible neuron v_i. We chose the method presented above simply to make the construction clearer and more systematic.
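A corresponding check applies to the Z(θ) weights, again using our reconstruction W_{Z(θ)}(v, Z_i) = −(ln 2)/2 + iθv/2 + iπvZ_i; with these weights the traced sum reproduces diag(1, e^{iθ}), i.e., Z(θ) up to the global phase e^{iθ/2}.

```python
import numpy as np

# Check: sum_{Z=0,1} exp(W_Z(v_i, Z) + W_Z(v'_i, Z)) = delta_{v_i, v'_i} * exp(i*theta*v_i)

def W_Z(v, Z, theta):
    return -np.log(2)/2 + 1j*theta*v/2 + 1j*np.pi*v*Z

theta = 0.7   # arbitrary test angle
for vi in (0, 1):
    for vpi in (0, 1):
        lhs = sum(np.exp(W_Z(vi, Z, theta) + W_Z(vpi, Z, theta)) for Z in (0, 1))
        rhs = (vi == vpi) * np.exp(1j*theta*vi)
        assert np.isclose(lhs, rhs), (vi, vpi, lhs, rhs)
print("Z(theta) DBM weights verified (up to a global phase)")
```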

The above protocol based on the DBM provides an exact simulation but has a drawback: the sampling of the DBM quickly becomes intractable with increasing circuit depth because the gates are realized by adding deep hidden neurons. In contrast, RBMs are easier to train, and a simulation based on the RBM has already been developed [114]. The basic idea is the same as in the DBM approach, the main difference being that the Hadamard gate cannot be simulated exactly in the RBM setting. In Ref. [114], the authors developed an approximation method to simulate the Hadamard gate operation. The RBM realizations of Z(θ) and CZ(θ) are achieved by adjusting the biases and by introducing a new hidden neuron with weighted connections, respectively.


VII. CONCLUDING REMARKS

In this work, we discussed aspects of quantum neural network states. Two important kinds of neural networks, feed-forward and stochastic recurrent, were chosen as examples to illustrate how neural networks can be used as variational ansatz states for quantum many-body systems. We reviewed the research progress on neural network states. The representational power of these states was discussed, and the entanglement features of the RBM and DBM states were reviewed. Some applications of quantum neural network states, such as quantum state tomography and the classical simulation of quantum computing, were also discussed.

In addition to the foregoing, we present some remarks on the main open problems regarding quantum neural network states.

• One crucial problem is to explain why neural networks work so well for some special tasks. There should be deep reasons for this. Understanding the mathematics and physics behind neural networks may help us build many other important classes of quantum neural network states and guide us in applying neural network states to different scientific problems.

• Although BM states have been studied from various aspects, many other neural networks are less explored with regard to representing quantum states, both numerically and theoretically. This raises the questions of whether other networks can also efficiently represent quantum states and what the differences between these different representations are.

• Developing a representation theorem for complex functions is also a very important topic for quantum neural network states. Because, as we have discussed, quantum neural network states must be built from complex neural networks, it is important to understand the expressive power of complex neural networks.

• A good understanding of entanglement features is of great importance for understanding quantum phases and the quantum advantage in some information tasks. Therefore, we can also ask whether there is an easy way to read out entanglement properties from specific neural networks, as there is for tensor networks.

We hope that our review of quantum neural network states inspires more work on, and exploration of, the crucial topics highlighted above.

Acknowledgments

Z.-A. Jia thanks Zhenghan Wang and the Department of Mathematics of UCSB for their hospitality. He also acknowledges Liang Kong and Tian Lan for discussions during his stay at the Yau Mathematical Sciences Center of Tsinghua University, and he also benefited from discussions with Giuseppe Carleo during the first international conference on "Machine Learning and Physics" at IAS, Tsinghua University. This work was supported by the Anhui Initiative in Quantum Information Technologies (Grant No. AHY080000).

[1] T. J. Osborne, “Hamiltonian complexity,” Reports onProgress in Physics 75, 022001 (2012).

[2] F. Verstraete, “Quantum hamiltonian complexity:Worth the wait,” Nature Physics 11, 524 (2015).

[3] R. Orus, “A practical introduction to tensor networks:Matrix product states and projected entangled pairstates,” Annals of Physics 349, 117 (2014).

[4] Z. Landau, U. Vazirani, and T. Vidick, “A polynomialtime algorithm for the ground state of one-dimensionalgapped local hamiltonians,” Nature Physics 11, 566(2015).

[5] I. Arad, Z. Landau, U. Vazirani, and T. Vidick, “Rig-orous rg algorithms and area laws for low energy eigen-states in 1d,” Communications in Mathematical Physics356, 65 (2017).

[6] N. Schuch, M. M. Wolf, F. Verstraete, and J. I. Cirac,“Computational complexity of projected entangled pairstates,” Phys. Rev. Lett. 98, 140506 (2007).

[7] A. Anshu, I. Arad, and A. Jain, “How local is the in-formation in tensor networks of matrix product statesor projected entangled pairs states,” Phys. Rev. B 94,195143 (2016).

[8] J. Eisert, M. Cramer, and M. B. Plenio, “Colloquium,”Rev. Mod. Phys. 82, 277 (2010).

[9] L. Amico, R. Fazio, A. Osterloh, and V. Vedral, “En-tanglement in many-body systems,” Rev. Mod. Phys.80, 517 (2008).

[10] M. Friesdorf, A. H. Werner, W. Brown, V. B. Scholz,and J. Eisert, “Many-body localization implies thateigenvectors are matrix-product states,” Phys. Rev.Lett. 114, 170505 (2015).

[11] F. Verstraete, V. Murg, and J. Cirac, “Matrix prod-uct states, projected entangled pair states, and vari-ational renormalization group methods for quantumspin systems,” Advances in Physics 57, 143 (2008),https://doi.org/10.1080/14789940801912366 .

[12] R. Orus, “Tensor networks for complex quantum sys-tems,” arXiv preprint arXiv:1812.04011 (2018).

[13] S. R. White, “Density matrix formulation for quan-tum renormalization groups,” Phys. Rev. Lett. 69, 2863(1992).

[14] F. Verstraete and J. I. Cirac, "Renormalization algorithms for quantum-many body systems in two and higher dimensions," arXiv preprint cond-mat/0407066 (2004).

[15] M. C. Banuls, M. B. Hastings, F. Verstraete, and J. I. Cirac, "Matrix product states for dynamical simulation of infinite chains," Phys. Rev. Lett. 102, 240603 (2009).

[16] G. Vidal, “Entanglement renormalization,” Phys. Rev.Lett. 99, 220405 (2007).

[17] G. Vidal, “Efficient classical simulation of slightly en-tangled quantum computations,” Phys. Rev. Lett. 91,147902 (2003).

[18] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,”Nature 521, 436 (2015).

[19] G. E. Hinton and R. R. Salakhutdinov, “Reducing thedimensionality of data with neural networks,” Science313, 504 (2006).

[20] R. S. Sutton and A. G. Barto, Reinforcement learning:An introduction, Vol. 1 (MIT press Cambridge, 1998).

[21] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost,N. Wiebe, and S. Lloyd, “Quantum machine learning,”Nature 549, 195 (2017).

[22] P. Rebentrost, M. Mohseni, and S. Lloyd, “Quan-tum support vector machine for big data classification,”Phys. Rev. Lett. 113, 130503 (2014).

[23] V. Dunjko, J. M. Taylor, and H. J. Briegel, “Quantum-enhanced machine learning,” Phys. Rev. Lett. 117,130501 (2016).

[24] A. Monras, G. Sentıs, and P. Wittek, “Inductive super-vised quantum learning,” Phys. Rev. Lett. 118, 190503(2017).

[25] J. Carrasquilla and R. G. Melko, “Machine learningphases of matter,” Nature Physics 13, 431 (2017).

[26] G. Carleo and M. Troyer, “Solving the quantum many-body problem with artificial neural networks,” Science355, 602 (2017).

[27] D.-L. Deng, X. Li, and S. Das Sarma, “Quantum en-tanglement in neural network states,” Phys. Rev. X 7,021021 (2017).

[28] D.-L. Deng, X. Li, and S. Das Sarma, “Machine learn-ing topological states,” Phys. Rev. B 96, 195145 (2017).

[29] X. Gao and L.-M. Duan, “Efficient representation ofquantum many-body states with deep neural networks,”Nature Communications 8, 662 (2017).

[30] M. August and X. Ni, “Using recurrent neural networksto optimize dynamical decoupling for quantum mem-ory,” Phys. Rev. A 95, 012335 (2017).

[31] G. Torlai and R. G. Melko, “Neural decoder for topo-logical codes,” Phys. Rev. Lett. 119, 030501 (2017).

[32] Y. Zhang and E.-A. Kim, “Quantum loop topographyfor machine learning,” Phys. Rev. Lett. 118, 216401(2017).

[33] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer,R. Melko, and G. Carleo, “Neural-network quantumstate tomography,” Nature Physics 14, 447 (2018).

[34] C. Monterola and C. Saloma, “Solving the nonlinearschrodinger equation with an unsupervised neural net-work,” Optics Express 9, 72 (2001).

[35] C. Monterola and C. Saloma, “Solving the nonlinearschrodinger equation with an unsupervised neural net-work: estimation of error in solution,” Optics commu-nications 222, 331 (2003).

[36] C. Caetano, J. Reis Jr, J. Amorim, M. R. Lemes, andA. D. Pino Jr, “Using neural networks to solve nonlineardifferential equations in atomic and molecular physics,”International Journal of Quantum Chemistry 111, 2732(2011).

[37] S. Manzhos and T. Carrington, “An improved neuralnetwork method for solving the schrodinger equation,”Canadian Journal of Chemistry 87, 864 (2009).

[38] Z.-A. Jia, Y.-H. Zhang, Y.-C. Wu, L. Kong, G.-C. Guo,and G.-P. Guo, “Efficient machine-learning representa-tions of a surface code with boundaries, defects, domainwalls, and twists,” Phys. Rev. A 99, 012307 (2019).

[39] Y. Huang and J. E. Moore, “Neural network representa-tion of tensor network and chiral states,” arXiv preprintarXiv:1701.06246 (2017).

[40] Y.-H. Zhang, Z.-A. Jia, Y.-C. Wu, and G.-C. Guo,“An efficient algorithmic way to construct boltzmannmachine representations for arbitrary stabilizer code,”arXiv preprint arXiv:1809.08631 (2018).

[41] S. Lu, X. Gao, and L.-M. Duan, “Efficient represen-tation of topologically ordered states with restrictedboltzmann machines,” arXiv preprint arXiv:1810.02352(2018).

[42] W.-C. Gan and F.-W. Shu, “Holography as deep learn-ing,” International Journal of Modern Physics D 26,1743020 (2017).

[43] Z.-A. Jia, Y.-C. Wu, and G.-C. Guo, “Holographicentanglement-geometry duality in deep neural networkstates,” to be published (2018).

[44] T. Kohonen, “An introduction to neural computing,”Neural Networks 1, 3 (1988).

[45] W. S. McCulloch and W. Pitts, “A logical calculus ofthe ideas immanent in nervous activity,” The bulletinof mathematical biophysics 5, 115 (1943).

[46] M. Minsky and S. A. Papert, Perceptrons: An introduc-tion to computational geometry (MIT press, 2017).

[47] M. A. Nielsen, Neural Networks and Deep Learning (De-termination Press, 2015).

[48] Z. Cai and J. Liu, “Approximating quantum many-bodywave functions using artificial neural networks,” Phys.Rev. B 97, 035116 (2018).

[49] H. Saito, “Solving the bose–hubbard model with ma-chine learning,” Journal of the Physical Society of Japan86, 093001 (2017).

[50] Y. LeCun, Y. Bengio, et al., “Convolutional networksfor images, speech, and time series,” The handbook ofbrain theory and neural networks 3361, 1995 (1995).

[51] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Ima-genet classification with deep convolutional neural net-works,” in Advances in neural information processingsystems (2012) pp. 1097–1105.

[52] X. Liang, W.-Y. Liu, P.-Z. Lin, G.-C. Guo, Y.-S. Zhang,and L. He, “Solving frustrated quantum many-particlemodels with convolutional neural networks,” Phys. Rev.B 98, 104426 (2018).

[53] G. E. Hinton and T. J. Sejnowski, “Optimal perceptualinference,” in Proceedings of the IEEE conference onComputer Vision and Pattern Recognition (IEEE NewYork, 1983) pp. 448–453.

[54] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, “Alearning algorithm for boltzmann machines,” Cognitivescience 9, 147 (1985).

[55] G. Torlai and R. G. Melko, “Learning thermodynamicswith boltzmann machines,” Phys. Rev. B 94, 165134(2016).

[56] K.-I. Aoki and T. Kobayashi, “Restricted boltzmannmachines for the long range ising models,” ModernPhysics Letters B 30, 1650401 (2016).

[57] S. Weinstein, "Learning the Einstein-Podolsky-Rosen correlations on a restricted Boltzmann machine," arXiv preprint arXiv:1707.03114 (2017).

[58] L. Huang and L. Wang, “Accelerated monte carlo simu-lations with restricted boltzmann machines,” Phys. Rev.B 95, 035105 (2017).

[59] J. Chen, S. Cheng, H. Xie, L. Wang, and T. Xiang,“Equivalence of restricted boltzmann machines and ten-sor network states,” Phys. Rev. B 97, 085104 (2018).

[60] Y.-Z. You, Z. Yang, and X.-L. Qi, “Machine learn-ing spatial geometry from entanglement features,” Phys.Rev. B 97, 045153 (2018).

[61] M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy,and R. Melko, “Quantum boltzmann machine,” Phys.Rev. X 8, 021050 (2018).

[62] P. Smolensky, Information processing in dynamical sys-tems: Foundations of harmony theory, Tech. Rep.(COLORADO UNIV AT BOULDER DEPT OF COM-PUTER SCIENCE, 1986).

[63] N. L. Roux and Y. Bengio, “Representational powerof restricted boltzmann machines and deep belief net-works,” Neural Computation 20, 1631 (2008).

[64] G. Montufar and N. Ay, “Refinements of universal ap-proximation results for deep belief networks and re-stricted boltzmann machines,” Neural Comput. 23,1306 (2011).

[65] C. Bishop, C. M. Bishop, et al., Neural networks forpattern recognition (Oxford university press, 1995).

[66] L. V. Fausett et al., Fundamentals of neural net-works: architectures, algorithms, and applications,Vol. 3 (Prentice-Hall Englewood Cliffs, 1994).

[67] M. Fannes, B. Nachtergaele, and R. F. Werner,“Finitely correlated states on quantum spin chains,”Communications in mathematical physics 144, 443(1992).

[68] A. Klumper, A. Schadschneider, and J. Zittartz, “Ma-trix product ground states for one-dimensional spin-1quantum antiferromagnets,” EPL (Europhysics Letters)24, 293 (1993).

[69] A. Klumper, A. Schadschneider, and J. Zittartz,“Equivalence and solution of anisotropic spin-1 modelsand generalized tj fermion models in one dimension,”Journal of Physics A: Mathematical and General 24,L955 (1991).

[70] G. Evenbly and G. Vidal, “Class of highly entangledmany-body states that can be efficiently simulated,”Phys. Rev. Lett. 112, 240502 (2014).

[71] G. Evenbly and G. Vidal, “Scaling of entanglement en-tropy in the (branching) multiscale entanglement renor-malization ansatz,” Phys. Rev. B 89, 235113 (2014).

[72] Y.-Y. Shi, L.-M. Duan, and G. Vidal, “Classical simula-tion of quantum many-body systems with a tree tensornetwork,” Phys. Rev. A 74, 022320 (2006).

[73] M. Zwolak and G. Vidal, “Mixed-state dynamics inone-dimensional quantum lattice systems: A time-dependent superoperator renormalization algorithm,”Phys. Rev. Lett. 93, 207205 (2004).

[74] J. Cui, J. I. Cirac, and M. C. Banuls, “Variationalmatrix product operators for the steady state of dissi-pative quantum systems,” Phys. Rev. Lett. 114, 220601(2015).

[75] A. A. Gangat, T. I, and Y.-J. Kao, “Steady states ofinfinite-size dissipative quantum chains via imaginarytime evolution,” Phys. Rev. Lett. 119, 010501 (2017).

[76] B.-B. Chen, L. Chen, Z. Chen, W. Li, and A. Weichsel-

baum, “Exponential thermal tensor network approachfor quantum lattice models,” Phys. Rev. X 8, 031082(2018).

[77] P. Czarnik and J. Dziarmaga, “Variational approach toprojected entangled pair states at finite temperature,”Phys. Rev. B 92, 035152 (2015).

[78] M. M. Parish and J. Levinsen, “Quantum dynamics ofimpurities coupled to a fermi sea,” Phys. Rev. B 94,184303 (2016).

[79] A. Kshetrimayum, H. Weimer, and R. Orus, “A sim-ple tensor network algorithm for two-dimensional steadystates,” Nature communications 8, 1291 (2017).

[80] F. Verstraete and J. I. Cirac, “Continuous matrix prod-uct states for quantum fields,” Phys. Rev. Lett. 104,190405 (2010).

[81] J. Haegeman, J. I. Cirac, T. J. Osborne, I. Pizorn,H. Verschelde, and F. Verstraete, “Time−dependentvariational principle for quantum lattices,” Phys. Rev.Lett. 107, 070601 (2011).

[82] J. Haegeman, T. J. Osborne, H. Verschelde, and F. Ver-straete, “Entanglement renormalization for quantumfields in real space,” Phys. Rev. Lett. 110, 100402(2013).

[83] J. Haegeman, T. J. Osborne, and F. Verstraete,“Post−matrix product state methods: To tangent spaceand beyond,” Phys. Rev. B 88, 075133 (2013).

[84] Y. Levine, O. Sharir, N. Cohen, and A. Shashua,“Bridging many-body quantum physics and deeplearning via tensor networks,” arXiv preprintarXiv:1803.09780 (2018).

[85] E. Stoudenmire and D. J. Schwab, “Supervised learningwith tensor networks,” in Advances in Neural Informa-tion Processing Systems (2016) pp. 4799–4807.

[86] Z.-Y. Han, J. Wang, H. Fan, L. Wang, and P. Zhang,“Unsupervised generative modeling using matrix prod-uct states,” Phys. Rev. X 8, 031012 (2018).

[87] E. M. Stoudenmire, “Learning relevant features of datawith multi-scale tensor networks,” Quantum Scienceand Technology 3, 034003 (2018).

[88] D. Liu, S.-J. Ran, P. Wittek, C. Peng, R. B. Garcıa,G. Su, and M. Lewenstein, “Machine learning bytwo-dimensional hierarchical tensor networks: A quan-tum information theoretic perspective on deep architec-tures,” arXiv preprint arXiv:1710.04833 (2017).

[89] W. Huggins, P. Patel, K. B. Whaley, and E. M.Stoudenmire, “Towards quantum machine learningwith tensor networks,” arXiv preprint arXiv:1803.11537(2018).

[90] I. Glasser, N. Pancotti, M. August, I. D. Rodriguez,and J. I. Cirac, “Neural-network quantum states, string-bond states, and chiral topological states,” Phys. Rev.X 8, 011006 (2018).

[91] A. Kolmogorov, “The representation of continuous func-tions of several variables by superpositions of continuousfunctions of a smaller number of variables,” .

[92] A. N. Kolmogorov, “On the representation of contin-uous functions of many variables by superposition ofcontinuous functions of one variable and addition,” inDoklady Akademii Nauk, Vol. 114 (Russian Academy ofSciences, 1957) pp. 953–956.

[93] V. I. Arnold, Vladimir I. Arnold-Collected Works: Rep-resentations of Functions, Celestial Mechanics, andKAM Theory 1957-1965, Vol. 1 (Springer Science &Business Media, 2009).


[94] D. Alexeev, “Neural-network approximation of func-tions of several variables,” Journal of Mathematical Sci-ences 168, 5 (2010).

[95] F. Rosenblatt, Principles of neurodynamics. perceptronsand the theory of brain mechanisms, Tech. Rep. (COR-NELL AERONAUTICAL LAB INC BUFFALO NY,1961).

[96] J. S lupecki, “A criterion of fullness of many-valuedsystems of propositional logic,” Studia Logica 30, 153(1972).

[97] G. Cybenko, “Approximations by superpositions of asigmoidal function,” Mathematics of Control, Signalsand Systems 2, 183 (1989).

[98] K.-I. Funahashi, “On the approximate realization ofcontinuous mappings by neural networks,” Neural net-works 2, 183 (1989).

[99] K. Hornik, M. Stinchcombe, and H. White, “Mul-tilayer feedforward networks are universal approxima-tors,” Neural networks 2, 359 (1989).

[100] R. Hecht-Nielsen, “Kolmogorov’s mapping neural net-work existence theorem,” in Proceedings of the IEEEInternational Conference on Neural Networks III (IEEEPress, 1987) pp. 11–13.

[101] X. Gao, S.-T. Wang, and L.-M. Duan, “Quantumsupremacy for simulating a translation-invariant isingspin model,” Phys. Rev. Lett. 118, 040502 (2017).

[102] M. B. Hastings, “Solving gapped hamiltonians locally,”Phys. Rev. B 73, 085115 (2006).

[103] M. B. Hastings, “An area law for one-dimensional quan-tum systems,” Journal of Statistical Mechanics: Theoryand Experiment 2007, P08024 (2007).

[104] R. Jastrow, “Many-body problem with strong forces,”Phys. Rev. 98, 1479 (1955).

[105] Y. Nomura, A. S. Darmawan, Y. Yamaji, andM. Imada, “Restricted boltzmann machine learning forsolving strongly correlated quantum systems,” Phys.Rev. B 96, 205152 (2017).

[106] G. Torlai and R. G. Melko, “Latent space purificationvia neural density operators,” Phys. Rev. Lett. 120,240503 (2018).

[107] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, andJ. Eisert, “Quantum state tomography via compressedsensing,” Phys. Rev. Lett. 105, 150401 (2010).

[108] M. Cramer, M. B. Plenio, S. T. Flammia, R. Somma,D. Gross, S. D. Bartlett, O. Landon-Cardinal,D. Poulin, and Y.-K. Liu, “Efficient quantum state to-mography,” Nature communications 1, 149 (2010).

[109] B. Lanyon, C. Maier, M. Holzapfel, T. Baumgratz,C. Hempel, P. Jurcevic, I. Dhand, A. Buyskikh, A. Da-ley, M. Cramer, et al., “Efficient tomography of aquantum many-body system,” Nature Physics 13, 1158(2017).

[110] G. Toth, W. Wieczorek, D. Gross, R. Krischek,C. Schwemmer, and H. Weinfurter, “Permutationallyinvariant quantum tomography,” Phys. Rev. Lett. 105,250403 (2010).

[111] M. A. Nielsen and I. L. Chuang, Quantum computationand quantum information (Cambridge university press,2010).

[112] J. Preskill, “Quantum computing and the entanglementfrontier,” arXiv preprint arXiv:1203.5813 (2012).

[113] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J. A. Smolin, and H. Weinfurter, "Elementary gates for quantum computation," Phys. Rev. A 52, 3457 (1995).

[114] B. Jonsson, B. Bauer, and G. Carleo, "Neural-network states for the classical simulation of quantum computing," arXiv preprint arXiv:1808.05232 (2018).