sampling etworks - wordpress.com

8
RESEARCH STATEMENT 1 Hanbaek Lyu My research is broadly driven by questions related to understanding emergent or latent properties of complex sys- tems and data sets. Such questions and techniques that I develop to address them span across the fields of probability, combinatorics, dynamical systems, optimization, and machine learning. 1. ONLINE DICTIONARY LEARNING + MCMC SAMPLING +NETWORKS 1.1. Motivation. In modern data analysis, a central step is to find a low-dimensional representation to better under- stand, compress, or convey the key phenomena captured in the data. Dictionary-learning (DL) algorithms are ma- chine learning techniques that learn interpretable latent structures of complex data sets and are applied regularly in text and image data analysis [EA06, MES07, Pey09]. A primary tool for DL is nonnegative matrix factorization (NMF), which is a nonconvex optimization problem to approximately factorize a data matrix into the product of two nonnega- tive matrices [LS99], which have a wide range of applications in text analysis for topic modeling, image reconstruction, bioinformatics for protein-protein interaction network, and so on. [SGH02, BB05, BBL + 07, CWS + 11, TN12, BMB + 15, RPZ + 18]. In order to handle large data (e.g., large fMRI images) with high speed and low memory consumption, a number of online matrix factorization (OMF) algorithms [MBPS10, GTLY12, Mai13, MMTV17] have been developed, which independently samples a small portion of it from the target distribution (e.g., uniform) and progressively learn a desired factorization instead of accessing full data at once. Meanwhile, Markov chain Monte Carlo (MCMC) methods has met with a great success as one of the most versa- tile sampling techniques across many disciplines such as Bayesian inference and machine learning [ADFDJ03], where direct sampling from the target distribution is often computationally intractable. The idea of MCMC algorithm is to design a Markov chain exploring the sample space only using local information and converging to the target distribu- tion. Naturally MCMC algorithms output dependent samples, so classical OMF algorithms for i.i.d. data samples are not directly applicable to MCMC data sequences. Modern data are not only large in their size, but they may also have intricate structure. A prime example is in the form of networks, which are formal representation of the architecture of interactions between entities in many com- plex systems in the physical, biological, and social sciences. As most real-world networks are sparse, independently choosing a set of k nodes from a network does not return any meaningful information with high probability. More- over, a naive use of random walk sampling may result in emphasizing nodes with large degree, which could result in undesirable biases in learning results (see, e.g., [BCZ + 16, ASS + 19] for issues on algorithmic fairness). Therefore, in order to learn efficiently and correctly from large and structured data, we need to develop a comprehensive theory for simultaneous MCMC sampling and matrix factorization. 1.2. Online Matrix/Tensor factorization for Markovian data. In [LNB20], we propose a framework for online matrix factorization for dependent data samples generated as a function of some underlying Markov chain, and prove an almost sure convergence of the online algorithm directly against the dependence in data samples. Our result opens up a wide variety of applications by combining classical OMF algorithms with MCMC sampling in diverse domains (see Subsection 1.3 for an example in network learning). To my best knowledge, this is the first time that consis- tency of any online dictionary learning algorithm has been established beyond independently sampled data streams. Figure 1. Plot of loss vs. MCMC iterations (unit 10 4 ) for subsampling epochs of 1000, 10000, 100000, and 500000 for the Ising model at tem- peratures T = 0.5 (left), T = 2.26 (middle), and T = 5 (right), respectively. Less frequent subsampling of MCMC trajectory results in more efficient dictionary learning. A practical implication of our result is that one can now extract features more efficiently from the same depen- dent data streams, as there is no need to subsample the data sequence to approximately satisfy the indepen- dence assumption (see Figure 1). The main theoretical bottleneck is the fact that the distribution of new data conditional on the present could be drastically different from the desired stationary distribution, which makes it impossible to use the theory of quasi-martingale and empirical processes. Our main innovation is to condi- tion the next data not on the present but on a distant past and let the Markov chain mix to the stationary distribution during the unconditioned period. This allows us to control the difference between the new and the average losses by concentration of Markov chains. 1 Click to see a 2-page version of the research statement 1

Upload: others

Post on 30-Dec-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SAMPLING ETWORKS - WordPress.com

RESEARCH STATEMENT1

Hanbaek Lyu

My research is broadly driven by questions related to understanding emergent or latent properties of complex sys-tems and data sets. Such questions and techniques that I develop to address them span across the fields of probability,combinatorics, dynamical systems, optimization, and machine learning.

1. ONLINE DICTIONARY LEARNING + MCMC SAMPLING + NETWORKS

1.1. Motivation. In modern data analysis, a central step is to find a low-dimensional representation to better under-stand, compress, or convey the key phenomena captured in the data. Dictionary-learning (DL) algorithms are ma-chine learning techniques that learn interpretable latent structures of complex data sets and are applied regularly intext and image data analysis [EA06, MES07, Pey09]. A primary tool for DL is nonnegative matrix factorization (NMF),which is a nonconvex optimization problem to approximately factorize a data matrix into the product of two nonnega-tive matrices [LS99], which have a wide range of applications in text analysis for topic modeling, image reconstruction,bioinformatics for protein-protein interaction network, and so on. [SGH02, BB05, BBL+07, CWS+11, TN12, BMB+15,RPZ+18]. In order to handle large data (e.g., large fMRI images) with high speed and low memory consumption, anumber of online matrix factorization (OMF) algorithms [MBPS10, GTLY12, Mai13, MMTV17] have been developed,which independently samples a small portion of it from the target distribution (e.g., uniform) and progressively learna desired factorization instead of accessing full data at once.

Meanwhile, Markov chain Monte Carlo (MCMC) methods has met with a great success as one of the most versa-tile sampling techniques across many disciplines such as Bayesian inference and machine learning [ADFDJ03], wheredirect sampling from the target distribution is often computationally intractable. The idea of MCMC algorithm is todesign a Markov chain exploring the sample space only using local information and converging to the target distribu-tion. Naturally MCMC algorithms output dependent samples, so classical OMF algorithms for i.i.d. data samples arenot directly applicable to MCMC data sequences.

Modern data are not only large in their size, but they may also have intricate structure. A prime example is in theform of networks, which are formal representation of the architecture of interactions between entities in many com-plex systems in the physical, biological, and social sciences. As most real-world networks are sparse, independentlychoosing a set of k nodes from a network does not return any meaningful information with high probability. More-over, a naive use of random walk sampling may result in emphasizing nodes with large degree, which could result inundesirable biases in learning results (see, e.g., [BCZ+16, ASS+19] for issues on algorithmic fairness). Therefore, inorder to learn efficiently and correctly from large and structured data, we need to develop a comprehensive theory forsimultaneous MCMC sampling and matrix factorization.

1.2. Online Matrix/Tensor factorization for Markovian data. In [LNB20], we propose a framework for online matrixfactorization for dependent data samples generated as a function of some underlying Markov chain, and prove analmost sure convergence of the online algorithm directly against the dependence in data samples. Our result opensup a wide variety of applications by combining classical OMF algorithms with MCMC sampling in diverse domains(see Subsection 1.3 for an example in network learning). To my best knowledge, this is the first time that consis-tency of any online dictionary learning algorithm has been established beyond independently sampled data streams.

Figure 1. Plot of loss vs. MCMC iterations (unit 104) for subsampling

epochs of 1000, 10000, 100000, and 500000 for the Ising model at tem-

peratures T = 0.5 (left), T = 2.26 (middle), and T = 5 (right), respectively.

Less frequent subsampling of MCMC trajectory results in more efficient

dictionary learning.

A practical implication of our result is that one can nowextract features more efficiently from the same depen-dent data streams, as there is no need to subsamplethe data sequence to approximately satisfy the indepen-dence assumption (see Figure 1). The main theoreticalbottleneck is the fact that the distribution of new dataconditional on the present could be drastically differentfrom the desired stationary distribution, which makesit impossible to use the theory of quasi-martingale andempirical processes. Our main innovation is to condi-tion the next data not on the present but on a distantpast and let the Markov chain mix to the stationary distribution during the unconditioned period. This allows usto control the difference between the new and the average losses by concentration of Markov chains.

1Click to see a 2-page version of the research statement

1

Page 2: SAMPLING ETWORKS - WordPress.com

2

Spatial activation atom # 14

(𝑎) (𝑏) (𝑐) (𝑑)

Temporal activation Spatial activation

0 sec 2 sec

Figure 2. Learning 20 CP-dictionary patches from video frames on brain

activity across the mouse cortex. Our algorithm learns 20 spatial co-

activation patterns of the neurons in the mice cortex (left and middle),

and simultaneously learned are their temporal activation patterns (right)

In the setting of multi-modal data, tensor factoriza-tion algorithms (e.g., CP-decomposition) plays a simi-lar role as matrix factorization algorithms that providesthe functionality of dimension reduction and dictionarylearning simultaneously for each mode and are receivinggreat attention from the community [KB09]. In [SLN20],we develop the first online nonnegative tensor factoriza-tion (NTF) algorithm that is guaranteed to converge un-der the similar Markovian setting, whereas convergenceguarantee has not been available even under the i.i.d. as-sumption [ZEB18, ZVB+16, DZLZ18]. Being able to online-decompose multi-modal data invites a wide range of appli-cations (see Figure 2).

In response to the utmost challenge in the scientific community against the COVID-19 pandemic in 2020, I havealso developed novel methods for learning and predicting COVID-19 related data based on my earlier works on onlineNMF and NTF algorithms, including joint time-series data [LSMN20] and dynamic tweeter data [KKL+20].

Currently, I am developing a unifying framework of Alternating Stochastic Majorization-Minimization, which gen-eralizes my works on online NMF/NTF algorithms to the more general setting of optimization problems for dependentand multi-modal data. This framework also extends classical theories of stochastic optimization algorithms includ-ing convex optimization, proximal gradient, and expectation-maximization to the dependent data setting, yieldingoptimal convergence rates for convex problems as long as correlation in the data streams decays as fast as the clas-sical convergence rate of O(1/

pn). A key finding is that online learning algorithms in this class are robust against

dependence in data sequence.

1.3. Network Dictionary Learning. When analyzing texts documents, one can learn individual characters and sym-bols (microscale structures) or their total counts in the document (macroscale structures), but may still not havelearned how words come together to form sentences or paragraphs. Analogously for networks, one may learn their

Network Dictionary Learning

Network data

reconstruction

Interpretable parts

(Dictionary)

Network

Motif

Sample

MCMC Motif sampling

Memoli, Lyu, Sivakoff (2019+)

Dictionary

Dictionary

Dictionary

𝐷𝑎𝑡𝑎

𝐷𝑎𝑡𝑎

𝐷𝑎𝑡𝑎

Lyu, Needell, Balzano (2019+)

Online Matrix Factorization for Markovian data

+

(Low-rank basis)

UCLA Facebook Network CALTECH Facebook Network

Network Dictionary Network Dictionary

CYCLE by M.C. Escher

Image Dictionary

a b c

Figure 3. (a) 25 latent shapes learned from an image by NMF. (b,c) 25

latent motifs for k = 21-node connected subgraphs learned from UCLAand Caltech. The learned latent motifs reveal that in UCLA, two distant

friends tend to have a unique path of friendships, but in Caltech, there

are multiple paths of friendships interconnecting two distant friends.

microscale structures (e.g., nodes and edges) or macroscalestructures (e.g., degree distribution), but still lack insighton their mesoscale structures, such as community orcore–periphery structures [POM09, RPFM17]. Recently,the importance of understanding such mesoscale struc-tures of networks has been widely recognized [OFR+12,KSBB18]. At the same time, we are facing an enormousamount of large network data due to the explosion in ourability to measure, process, and store network data invarious domains. Hence, it would be an important andtimely contribution to develop an algorithmic tool that‘compresses’ a large network into a set of its mesoscalestructures so that we obtain network data compressionand interpretible mesoscale structures simultaneously.

Figure 4. Performance of five algorithms on edge inference tasks on three

real-world networks measured by area-under-the-curve (AUC) scores as-

sociated with receiver operating characteristic (ROC) curves.

In [LKVP20], we develop a systematic approach ofNetwork Dictionary Learning for learning ‘latent motifs’that reveals how a network ‘looks’ at mesoscale. Ourmethod is based on sampling a large number of k-nodeconnected subgraphs of a network using the MCMC mo-tif sampling algorithms that I developed in [LMS19] andthen applying online NMF algorithm in [LNB20] to findan approximate basis for the sampled subgraphs. Theuse of MCMC motif sampling algorithm is crucial herein order to sample a sequence of k neighboring nodes from the uniform (unbiased) distribution. The basis elementsform k-node networks, which we call latent motifs of the network. In analogy with text data, they play the role ofelementary ‘words’ that compose ‘sentences’ of k-node connected subgraphs of the network. From the learned latentmotifs, one can directly infer the mesoscale structure of the original network (see Figure 3). Using our method, we

Page 3: SAMPLING ETWORKS - WordPress.com

3

find that various real-world networks such as Facebook friendship networks, Coronavirus and Homo sapiens protein-protein interaction (PPI) networks, and an arXiv collaboration network possess a few latent motifs that together canwell-approximate most subnetworks at a fixed mesoscale.

The ability to encode a network using a set of latent motifs opens up a wide variety of network-analysis tasks, suchas network comparison, denoising, and edge inference. As a key application of our method, we perform networkdenoising tasks by using only the latent motifs that one learns directly from the corrupted networks. Our network de-noising method achieves state-of-the-art performance for the edge inference tasks , even compared to other popularsupervised Graph Neural Network algorithms such as node2vec [GL16] and DeepWalk [PARS14] (see Figure 4). Sucha high performance is partly due to the apparent robustness of latent motifs against random noise. Moreover, ournetwork denoising approach is novel because it does not require any negative examples (unsupervised), which mightnot be available in practice, making our method more applicable to practical network denoising tasks. We also providerigorous convergence analysis and theoretical error bounds of our proposed algorithms.

I am currently developing extensions of this line of research to temporal networks by combining with my recentwork in online NTF [SLN20] and also to labeled networks using a work in process on online semi-supervised NMF.

2. COMBINATORIAL AND PROBABILISTIC APPROACHES TO OSCILLATOR AND CLOCK SYNCHRONIZATION

(NSF AWARD: DMS-2010035)

If a group of people is given local clocks with arbitrarily set times, and there is no global reference (for exampleGPS), is it possible for the group to synchronize all clocks by only communicating with nearby members? In or-der for a distributed system to be able to perform high-level tasks that may go beyond the capability of an individ-ual agent, the system must first solve a "clock synchronization" problem to establish a shared notion of time. Thestudy of clock synchronization (or coupled oscillators) has been an important subject of research in mathematicsand various areas of science for decades [Str00], with fruitful applications in many areas including wildfire monitor-ing, electric power networks, robotic vehicle networks, large-scale information fusion, and wireless sensor networks[DB12, NL07, PS11]. However, there has been a gap between our theoretical understanding of systems of coupled

0

32

16

48

0

1

Figure 5. Illustration of clock synchronization by the adaptive 4-

coupling. The underlying graph is a spanning tree of the 150×150 square

grid. Oscillators on the nodes of the spanning tree interact according to

the ‘adaptive’ version of the inhibitory pulse-coupling in [Lyu17]. Phase

of each node is indicated by a warm color between black and white.

oscillators and practical requirements for clock syn-chronization algorithms in modern application contextssuch as robustness in arbitrary perturbation, boundedmemory consumption, and energy efficiency in com-munication [MS90, KB12, WND13].

One of the key difficulties in guaranteeing robustsynchronization under arbitrary perturbation lies in thecyclic hierarchy in the phase space. A widely used obser-vation in the literature is that, if all initial phases are con-centrated in an open half-circle, such a cyclic hierarchydisappears and we have robust synchronization results in various settings. This principle has been used pervasivelyin the literature of clock synchronization [NF11, KKBT12, PC15, NWD15] and also in multi-agent consensus problems[Mor05, PJM10, Cha11]. In a series of solo papers [Lyu15, Lyu16, Lyu17], I have developed a systematic approachesusing discrete pulse-coupled oscillators to not only break the half-circle barrier, but also to meet with the minimalresource and energy constraint in clock synchronization algorithms.

The novelty and significance of my contribution and my vision in the field of oscillator and clock synchronizationhave been recognized by the National Science Foundation and was awarded by the NSF Grant DMS-2010035 in May2020. Supported by NSF, I have already successfully mentored a group of REU students in summer 2020 on the project“Machine learning approches to oscillator and clock synchronization”. The project involves generating a large data-base for the collective behavior of some models of discrete coupled oscillators on finite graphs, and applying machinelearning techniques to extract key features of the pair of network topology and initial configuration that guaranteesynchronization. Our data-driven machine learning approach has shed a new light in understanding complicatedbehavior of coupled oscillators (e.g., Kuramoto model) on arbitrary graphs.

3. PHASE TRANSITION IN CONTINGENCY TABLES WITH NON-UNIFORM MARGINS

Contingency tables are n ×m matrices of nonnegative integer entries with prescribed row sums r = (a1, · · · , am)and columns sums c = (b1, · · · ,bn) called margins, where by M (r,c) we denote the set of all such tables. They are

Page 4: SAMPLING ETWORKS - WordPress.com

4

fundamental objects in statistics for studying dependence structure between two or more variables, see e.g. [Eve92,FLL17, Kat14], and they also correspond to bipartite multi-graphs with given degrees and play an important role incombinatorics and graph theory, see e.g. [Bar09, DG95, DS98]. Counting their number |M (r,c)| and sampling anelement from M (r,c) uniformly at random are two fundamental problems concerning contingency tables with manyconnections and applications to other fields [DG95, BLP20] (e.g., testing hypothesis on co-occurrence of species inecology [CS79]). A historic guiding principle to these problems is the independent heuristic, which was introduced byI. J. Good as far back as in 1950 [Goo50]. The heuristic states that the constraints for the rows and columns of the tableare asymptotically independent as the size of the table grows to infinity. This yields a simple yet surprisingly accurateformula that approximates the count |M (r,c)|. The independence heuristic and also implies the hypergeometric (orFisher-Yates) should approximate the uniform distribution on M (r,c).

𝐵 1 2 0 1 + √2

c ≡ √2 c c

1

√2

𝐶𝑛

𝐶𝑛

𝐶𝑛

𝐵𝐶𝑛

𝐵𝐶𝑛

𝐶𝑛 𝐶𝑛 𝐶𝑛 𝐵𝐶𝑛 𝐵𝐶𝑛 ⋯

𝑛 𝑛

𝑛

𝑛

Geom(𝐶)

Geom(𝐵 𝐶)

Geo

m( 𝐵

𝐶)

Geom 𝐶(𝐵 − 𝐵 )𝑛 𝑛 𝑛

𝑛

𝑛

𝑛 → ∞

𝐵 (1 + 𝐶)

(𝐵 − 𝐵)(𝐵 + 𝐵 − 2)

Geom(𝐶)

Geom(𝐵𝐶)

Geo

m( 𝐵

𝐶)

Geom

𝐵 < 𝐵 𝐵 > 𝐵

𝐵𝐶𝑛

𝐵𝐶𝑛

𝐶𝑛

𝐶𝑛 𝐶𝑛

𝐶𝑛 𝐶𝑛 𝐶𝑛 𝐵𝐶𝑛

⋮ ⋮

Figure 6. (Left) Contingency table with parameters n,δ,B and C . First

bnδc rows and columns have margins bC nc, the last n rows and columns

have margins bBC nc. (Right) Limiting distributions of the entries in the

uniform contingency table X in the subcritical B < Bc (left) and supercrit-

ical B > Bc (right) regimes for thick bezels 1/2 < δ < 1. Geom(λ) denotes

geometric distribution with mean λ.

Both of these implications of the independent heuris-tic has been verified when the margins are constant orhave bounded ratio close to one [CM10, GM08, BLSY10].However, when the margins are far from being con-stant, Barvinok [Bar10] conjectured that there is a dras-tic difference between the uniform and hypergeomet-ric distribution on M (r,c), and Barvinok and Hartigan[BH12] showed that the independent heuristic gives alarge undercounting of contingency tables. My workwith Dittmer and Pak [DLP20] and Pak [LP20] providesthe first complete answer to this puzzle; Contingency ta-bles exhibit a sharp phase transition when the heterogen-ity of margins exceeds a certain critical threshold.

In [DLP20], I have rigorously shown, for the first time, that there exists a sharp phase transition in the uniformdistribution on M (r,c) as the size of the table grow, when the margins assume two values BC and C and their ratio Bexceeds the critical ratio Bc = 1+p

1+1/C . Namely, the hypergeometric distribution correctly estimates the uniformdistribution for B < Bc , but does so drastically differently for B > Bc (see Figure 6). Such a sharp phase transition givesa probabilistic answer to the statistical question on why sampling a uniformly distributed contingency table is hard.Moreover, our result settle Barvinok’s conjecture [Bar10] (except for a special case).

More recently in [LP20], I have established a similar phase transition result for contingency tables, this time in theperspective of the counting problem. Roughly speaking, our result show that the rows and columns of contingencytables are asymptotically independent (and hence the independence heuristic is correct) when the ratio B betweenthe two margins is strictly less than the critical threshold Bc , but suddenly they become positively correlated as soonas B exceeds Bc and the independence heuristic gives exponential undercounting. This is quite surprising since theindependence heuristic does not “notice” the phase transition at Bc and changes smoothly with B . Both of my resultshow that the historic independent heuristic in 1950 captures the structure of contingency tables remarkably wellfor near-homogeneous margins, but in general positive correlation between rows and columns may emerge and theheuristic fails dramatically.

4. INTERACTING PARTICLE SYSTEMS AND DISCRETE SPATIAL PROCESSES

Many important phenomena that we would like to understand − formation of public opinion, trending topics onsocial networks, growth of stock market, development of cancer cells, outbreak of epidemics, and collective compu-tation in distributed systems − are closely related to predicting large-scale behavior of systems of locally interactingagents. Discrete spatial processes provide a simple framework for modeling such systems: A vertex coloring X t : V →Zκ

on a given graph G = (V ,E) updates in discrete or continuous time according to a fixed deterministic or random tran-sition rule. In a typical setting in applied probability literature, one draws the initial coloring X0 from some probabilitymeasure and asks how the probability P(X t has property P ) behaves. The answer usually depends on details such astopology of the underlying graph and parameters in the model.

In collaboration with many researchers in the field, I have addressed the above question for a number of mod-els arising from different contexts: the firefly cellular automata (coupled oscillators), the cyclic cellular automata (BZchecmial reaction), the Greenberg-Hastings model (neural network), the cyclic particle system (multicolor acyclicvoter model) [FL17], and lastly, the parking process and ballistic annihilation (annihilating particle systems) [DGJ+17].

Page 5: SAMPLING ETWORKS - WordPress.com

5

We studied the first model on the one-dimensional integer lattice Z by constructing integer-valued comparison pro-cesses on Z and using combinatorial correspondences [LS17b, LS17a]. We were able to used one of the main tech-niques to answer an open question on persistence of correlated partial sums [LS17a], which generalizes the classicalSparre-Anderson’s theorem [And53] in 1953. In [GLS16], we have characterized the limiting behavior of the κ = 3Cyclic Cellular automata and Greenberg-Hastings model on arbitrary graphs by constructing a similar comparisonprocess on the universal cover of the underlying graph. This was the first time that the long-term behavior of non-trivial dynamical systems are completely characterized on arbitrary underlying graphs. For the cyclic particle system,we proved a conjecture of Bramson and Griffeath in 1989 that the 3- and 4-color system clusters on Z.

Figure 7. Simulation of parking process with critical den-

sity p = 0.5. Blue=cars and Red=spots. Time goes upwards.

My recent contribution to this field concerns multitype annihilat-ing particle systems, namely, ballistic annihilation, the parking pro-cess, and diffusion-limited annihilating systems. In [JL18], we haveobtained a complete phase transition picture in the asymmetric bal-listic annihilation, where only some heuristic was available in thephysics literature [DRFP95]. On the other hand, we proposed andstudied the parking process (where two types of particles performrandom walks and annihilate each other) on a class of lattice-likegraphs (e.g., Cayley graphs) and obtained sharp phase transition in the long-time particle density [DGJ+17] and alsoa stretched exponential decay in parking times [DLS20]. Extending some of our techniques we developed for theparking process as well as using new techniques, we have answered a number open problems on particle densitiesin diffusion-limited annihilating systems [JJLS20] (e.g., the Bramson-Lebowitz asymptotics [BL91] in the asymmetricsetting).

5. SOLITONS, BOX-BALL SYSTEMS, AND INTEGRABLE PROBABILITY

Integrable systems roughly refer to nonlinear systems where one can explicitly write down the solutions in termsof a set of more elementary functions, just like we can superimpose solutions to linear differential equations. Under-standing such systems can greatly advance our understanding on more general nonlinear dynamical systems, whichwas highlighted by the 2014 Fields medal on the groundbreaking work of Hairer [Hai13] on the Kardar–Parisi–Zhang(KPZ) equation that describes the evolution of the profile of a growing interface [KPZ86]. Another closely related in-tegrable system is given by the Korteweg-de Veris (KdV) equation, which was proposed by Korteweg and de Veris in1895 in order to model particle-like waves (solitons) on shallow water surfaces [New85, Woy06, MQR16]. It has been acentral topic in statistical and mathematical physics over the past seventy years [DJ06]. A line of my research programconsiders a discrete counterpart of the KdV equation, known as the box-ball systems (BBS) [TS90, TTMS96]. Theyare known to arise both from the quantum and classical integrable systems and enjoy deep connections to quantumgroups, crystal base theory, solvable lattice models, the Bethe ansatz, soliton equations, the KdV equation, tropicalgeometry and so forth [FYO00, HHI+01, KOS+06, IKT12].

Figure 8. An example of 5-color Box-ball system evolution, which decom-

poses into solitons of non-decreasing lengths. The soliton statistic is also

determined by the associated invariant Young diagram on the right.

An important but notoriously difficult question forKdV equations is the following: If the system is randomlyinitialized, what is the limiting statistics of the solitonsemerging from the system, as the system size tends to in-finity? My research on BBS aims to address this questionin the discrete setting of BBS, and it leads a body of workin the literature of randomized BBS [LLP17, CKST18,KL18, FG18, KL18, CS19a, CS19b, LLPS]. In 2017 withLevine and Pike, we addressed this question for the basic 1-color BBS with i.i.d. initial configuration [LLP17]. Thiswork is considered as the foundational paper in the literature of randomized BBS and has cited by most papers inthe literature. In collaborations with Kuniba and Okado [KLO18, KL18], I have investigated the row lengths in themulticolor BBS and obtained Schur polynomials representations of their scaling limit as well as their large deviationsprinciple. One of the contribution there is to merge two entirely different approach – large deviations and Markovchains in probability and Thermodynamic Bethe Ansatz in statistical physics. Furthermore, with Lewis, Pylyavskyy,and Sen [LLPS], I have obtained scaling limits of the columns lengths in multicolor BBS using a modified version ofGreene-Kleitman invariants for BBS and circular exclusion processes. I am continuing to investigate on this topic andhope that we may get a better understanding on KdV from investigating its discrete counterpart of BBS.

Page 6: SAMPLING ETWORKS - WordPress.com

6

REFERENCES

[ADFDJ03] Christophe Andrieu, Nando De Freitas, Arnaud Doucet, and Michael I Jordan, An introduction to mcmc for machine learning, Machine

learning 50 (2003), no. 1-2, 5–43.

[And53] Erik Sparre Andersen, On sums of symmetrically dependent random variables, Skand. Aktuarietidskr. 36 (1953), 123–138.

[ASS+19] Alexander Amini, Ava P Soleimany, Wilko Schwarting, Sangeeta N Bhatia, and Daniela Rus, Uncovering and mitigating algorithmic bias

through learned latent structure, Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019, pp. 289–295.

[Bar09] Alexander Barvinok, Asymptotic estimates for the number of contingency tables, integer flows, and volumes of transportation polytopes,

International Mathematics Research Notices 2009 (2009), no. 2, 348–385.

[Bar10] , What does a random contingency table look like?, Combinatorics, Probability and Computing 19 (2010), no. 4, 517–539.

[BB05] Michael W Berry and Murray Browne, Email surveillance using non-negative matrix factorization, Computational & Mathematical

Organization Theory 11 (2005), no. 3, 249–264.

[BBL+07] Michael W Berry, Murray Browne, Amy N Langville, V Paul Pauca, and Robert J Plemmons, Algorithms and applications for approximate

nonnegative matrix factorization, Computational statistics & data analysis 52 (2007), no. 1, 155–173.

[BCZ+16] Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai, Man is to computer programmer as woman is to

homemaker? debiasing word embeddings, Advances in neural information processing systems, 2016, pp. 4349–4357.

[BH12] Alexander Barvinok and J Hartigan, An asymptotic formula for the number of non-negative integer matrices with prescribed row and

column sums, Transactions of the American Mathematical Society 364 (2012), no. 8, 4323–4368.

[BL91] Maury Bramson and Joel L Lebowitz, Asymptotic behavior of densities for two-particle annihilating random walks, Journal of statistical

physics 62 (1991), no. 1-2, 297–372.

[BLP20] Petter Brändén, Jonathan Leake, and Igor Pak, Lower bounds for contingency tables via lorentzian polynomials, arXiv preprint

arXiv:2008.05907 (2020).

[BLSY10] Alexander Barvinok, Zur Luria, Alex Samorodnitsky, and Alexander Yong, An approximation algorithm for counting contingency tables,

Random Structures & Algorithms 37 (2010), no. 1, 25–66.

[BMB+15] Rostyslav Boutchko, Debasis Mitra, Suzanne L Baker, William J Jagust, and Grant T Gullberg, Clustering-initiated factor analysis appli-

cation for tissue classification in dynamic brain positron emission tomography, Journal of Cerebral Blood Flow & Metabolism 35 (2015),

no. 7, 1104–1111.

[Cha11] Bernard Chazelle, The total s-energy of a multiagent system, SIAM Journal on Control and Optimization 49 (2011), no. 4, 1680–1706.

[CKST18] David A Croydon, Tsuyoshi Kato, Makiko Sasada, and Satoshi Tsujimoto, Dynamics of the box-ball system with random initial conditions

via Pitman’s transformation, arXiv preprint arXiv:1806.02147 (2018).

[CM10] E Rodney Canfield and Brendan D McKay, Asymptotic enumeration of integer matrices with large equal row and column sums, Combi-

natorica 30 (2010), no. 6, 655.

[CS79] Edward F Connor and Daniel Simberloff, The assembly of species communities: chance or competition?, Ecology 60 (1979), no. 6, 1132–

1140.

[CS19a] David A Croydon and Makiko Sasada, Duality between box-ball systems of finite box and/or carrier capacity, arXiv preprint

arXiv:1905.00189 (2019).

[CS19b] , Invariant measures for the box-ball system based on stationary Markov chains and periodic Gibbs measures, arXiv preprint

arXiv:1905.00186 (2019).

[CWS+11] Yang Chen, Xiao Wang, Cong Shi, Eng Keong Lua, Xiaoming Fu, Beixing Deng, and Xing Li, Phoenix: A weight-based network coordinate

system using matrix factorization, IEEE Transactions on Network and Service Management 8 (2011), no. 4, 334–347.

[DB12] Florian Dorfler and Francesco Bullo, Synchronization and transient stability in power networks and nonuniform kuramoto oscillators,

SIAM Journal on Control and Optimization 50 (2012), no. 3, 1616–1642.

[DG95] Persi Diaconis and Anil Gangolli, Rectangular arrays with fixed margins, Discrete probability and algorithms, Springer, 1995, pp. 15–41.

[DGJ+17] Micheal Damron, Janko Gravner, Matthew Junge, Hanbaek Lyu, and David Sivakoff, Parking on transitive unimodular graphs,

arXiv.org/1710.10529 (2017).

[DJ06] EM De Jager, On the origin of the korteweg-de vries equation, arXiv preprint math/0602661 (2006).

[DLP20] Sam Dittmer, Hanbaek Lyu, and Igor Pak, Phase transition in random contingency tables with non-uniform margins, Transactions of

the AMS. 373 (2020), 8313–8338.

[DLS20] Michael Damron, Hanbaek Lyu, and David Sivakoff, Stretched exponential decay for subcritical parking times on Zd , arXiv preprint

arXiv:2008.05072 (2020).

[DRFP95] Michel Droz, Pierre-Antoine Rey, Laurent Frachebourg, and Jarosław Piasecki, Ballistic-annihilation kinetics for a multivelocity one-

dimensional ideal gas, Physical Review E 51 (1995), no. 6, 5541.

[DS98] Persi Diaconis and Bernd Sturmfels, Algebraic algorithms for sampling from conditional distributions, The Annals of statistics 26 (1998),

no. 1, 363–397.

[DZLZ18] Yishuai Du, Yimin Zheng, Kuang-chih Lee, and Shandian Zhe, Probabilistic streaming tensor decomposition, 2018 IEEE International

Conference on Data Mining (ICDM), IEEE, 2018, pp. 99–108.

[EA06] Michael Elad and Michal Aharon, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Transactions

on Image processing 15 (2006), no. 12, 3736–3745.

[Eve92] Brian S Everitt, The analysis of contingency tables, Chapman and Hall/CRC, 1992.

[FG18] Pablo A Ferrari and Davide Gabrielli, BBS invariant measures with independent soliton components, arXiv preprint arXiv:1812.02437

(2018).

[FL17] Eric Foxall and Hanbaek Lyu, Clustering of three and four color cyclic particle systems in one dimension, arXiv.org/1711.04741 (2017).

[FLL17] Morten Fagerland, Stian Lydersen, and Petter Laake, Statistical analysis of contingency tables, Chapman and Hall/CRC, 2017.

Page 7: SAMPLING ETWORKS - WordPress.com

7

[FYO00] Kaori Fukuda, Yasuhiko Yamada, and Masato Okado, Energy functions in box ball systems, International Journal of Modern Physics A

15 (2000), no. 09, 1379–1392.

[GL16] Aditya Grover and Jure Leskovec, node2vec: Scalable feature learning for networks, Proceedings of the 22nd ACM SIGKDD International

Conference on Knowledge Discovery and Data Mining, 2016, pp. 855–864.

[GLS16] Janko Gravner, Hanbaek Lyu, and David Sivakoff, Limiting behavior of 3-color excitable media on arbitrary graphs, Annals of Applied

Probability (to appear) (2016).

[GM08] Catherine Greenhill and Brendan D McKay, Asymptotic enumeration of sparse nonnegative integer matrices with specified row and

column sums, Advances in Applied Mathematics 41 (2008), no. 4, 459–481.

[Goo50] Isidore Jacob Good, Probability and the weighing of evidence.

[GTLY12] Naiyang Guan, Dacheng Tao, Zhigang Luo, and Bo Yuan, Online nonnegative matrix factorization with robust stochastic approximation,

IEEE Transactions on Neural Networks and Learning Systems 23 (2012), no. 7, 1087–1099.

[Hai13] Martin Hairer, Solving the kpz equation, Annals of mathematics (2013), 559–664.

[HHI+01] Goro Hatayama, Kazuhiro Hikami, Rei Inoue, Atsuo Kuniba, Taichiro Takagi, and Tetsuji Tokihiro, The A(1)M automata related to crystals

of symmetric tensors, Journal of Mathematical Physics 42 (2001), no. 1, 274–308.

[IKT12] Rei Inoue, Atsuo Kuniba, and Taichiro Takagi, Integrable structure of box–ball systems: crystal, Bethe ansatz, ultradiscretization and

tropical geometry, Journal of Physics A: Mathematical and Theoretical 45 (2012), no. 7, 073001.

[JJLS20] Tobias Johnson, Matthew Junge, Hanbaek Lyu, and David Sivakoff, Particle density in diffusion-limited annihilating systems, arXiv

preprint arXiv:2005.06018 (2020).

[JL18] Matthew Junge and Hanbaek Lyu, The phase structure of asymmetric ballistic annihilation, arXiv preprint arXiv:1811.08378 (2018).

[Kat14] Maria Kateri, Contingency table analysis, Statistics for Industry and Technology 525 (2014).

[KB09] Tamara G Kolda and Brett W Bader, Tensor decompositions and applications, SIAM Review 51 (2009), no. 3, 455–500.

[KB12] Johannes Klinglmayr and Christian Bettstetter, Self-organizing synchronization with inhibitory-coupled oscillators: Convergence and

robustness, ACM Transactions on Autonomous and Adaptive Systems (TAAS) 7 (2012), no. 3, 30.

[KKBT12] Johannes Klinglmayr, Christoph Kirst, Christian Bettstetter, and Marc Timme, Guaranteeing global synchronization in networks with

stochastic interactions, New Journal of Physics 14 (2012), no. 7, 073031.

[KKL+20] Lara Kassab, Alona Kryshchenko, Hanbaek Lyu, Denali Molitor, Deanna Needell, and Elizaveta Rebrova, On nonnegative matrix and

tensor decompositions for covid-19 twitter dynamics, arXiv preprint arXiv:2010.01600 (2020).

[KL18] Atsuo Kuniba and Hanbaek Lyu, Large deviations and one-sided scaling limit of multicolor box-ball system, arXiv preprint

arXiv:1808.08074 (2018).

[KLO18] Atsuo Kuniba, Hanbaek Lyu, and Masato Okado, Randomized box-ball systems, limit shape of rigged configurations and Thermodynamic

Bethe ansatz, arXiv preprint arXiv:1808.02626v4 (2018).

[KOS+06] Atsuo Kuniba, Masato Okado, Reiho Sakamoto, Taichiro Takagi, and Yasuhiko Yamada, Crystal interpretation of Kerov–Kirillov–

Reshetikhin bijection, Nuclear Physics B 740 (2006), no. 3, 299–327.

[KPZ86] Mehran Kardar, Giorgio Parisi, and Yi-Cheng Zhang, Dynamic scaling of growing interfaces, Physical Review Letters 56 (1986), no. 9,

889.

[KSBB18] Ankit N. Khambhati, Ann E. Sizemore, Richard F. Betzel, and Danielle S. Bassett, Modeling and interpreting mesoscale network dynamics,

NeuroImage 180 (2018), 337–349.

[LKVP20] Hanbaek Lyu, Yacoub Kureh, Joshua Vendrow, and Mason A. Porter, Learning low-rank latent mesoscale structures in networks, Draft

available: LINK (2020).

[LLP17] Lionel Levine, Hanbaek Lyu, and John Pike, Double jump phase transition in a random soliton cellular automaton, arXiv preprint

arXiv:1706.05621 (2017).

[LLPS] Joel Lewis, Hanbaek Lyu, Pasha Pylyavskyy, and Arnab Sen, Scaling limit of soliton lengths in a multicolor box-ball system.

[LMS19] Hanbaek Lyu, Facundo Memoli, and David Sivakoff, Sampling random graph homomorphisms and applications to network data anal-

ysis, arXiv:1910.09483 (2019).

[LNB20] Hanbaek Lyu, Deanna Needell, and Laura Balzano, Online matrix factorization for Markovian data and applications to network dictio-

nary learning, To appear in Journal of Machine Learning Research (arXiv:1911.01931) (2020).

[LP20] Hanbaek Lyu and Igor Pak, On the number of contingency tables and the independence heuristic, arXiv preprint arXiv:2009.10810 (2020).

[LS99] Daniel D. Lee and H. Sebastian Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (1999), no. 6755,

788.

[LS17a] Hanbaek Lyu and David Sivakoff, Persistence of sums of correlated increments and clustering in cellular automata, Submitted. Preprint

available at arXiv.org/1706.08117 (2017).

[LS17b] , Synchronization of finite-state pulse-coupled oscillators on Z, arXiv.org:1701.00319 (2017).

[LSMN20] Hanbaek Lyu, Christopher Strohmeier, Georg Menz, and Deanna Needell, Covid-19 time-series prediction by joint dictionary learning

and online nmf, arXiv preprint arXiv:2004.09112 (2020).

[Lyu15] Hanbaek Lyu, Synchronization of finite-state pulse-coupled oscillators, Physica D: Nonlinear Phenomena 303 (2015), 28–38.

[Lyu16] , Phase transition in firefly cellular automata on finite trees, arXiv preprint arXiv:1610.00837 (2016).

[Lyu17] , Global synchronization of pulse-coupled oscillators on trees, To appear in SIAM Journal on Applied Dynamical Systems. Preprint

available at arXiv:1604.08381 (2017).

[Mai13] Julien Mairal, Stochastic majorization-minimization algorithms for large-scale optimization, Advances in Neural Information Process-

ing Systems, 2013, pp. 2283–2291.

[MBPS10] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro, Online learning for matrix factorization and sparse coding, Journal of

Machine Learning Research 11 (2010), 19–60.

Page 8: SAMPLING ETWORKS - WordPress.com

8

[MES07] Julien Mairal, Michael Elad, and Guillermo Sapiro, Sparse representation for color image restoration, IEEE Transactions on Image Pro-

cessing 17 (2007), no. 1, 53–69.

[MMTV17] Arthur Mensch, Julien Mairal, Bertrand Thirion, and Gaël Varoquaux, Stochastic subsampling for factorizing huge matrices, IEEE Trans-

actions on Signal Processing 66 (2017), no. 1, 113–128.

[Mor05] Luc Moreau, Stability of multiagent systems with time-dependent communication links, Automatic Control, IEEE Transactions on 50(2005), no. 2, 169–182.

[MQR16] Konstantin Matetski, Jeremy Quastel, and Daniel Remenik, The kpz fixed point, arXiv preprint arXiv:1701.00018 (2016).

[MS90] Renato E Mirollo and Steven H Strogatz, Synchronization of pulse-coupled biological oscillators, SIAM Journal on Applied Mathematics

50 (1990), no. 6, 1645–1662.

[New85] Alan C Newell, Solitons in mathematics and physics, SIAM, 1985.

[NF11] Joel Nishimura and Eric J Friedman, Robust convergence in pulse-coupled oscillators with delays, Physical review letters 106 (2011),

no. 19, 194101.

[NL07] Sujit Nair and Naomi Ehrich Leonard, Stable synchronization of rigid body networks, Networks and Heterogeneous Media 2 (2007),

no. 4, 597.

[NWD15] Felipe Nunez, Yongqiang Wang, and Francis J Doyle, Synchronization of pulse-coupled oscillators on (strongly) connected graphs, Auto-

matic Control, IEEE Transactions on 60 (2015), no. 6, 1710–1715.

[OFR+12] Jukka-Pekka Onnela, Daniel J. Fenn, Stephen Reid, Mason A. Porter, Peter J. Mucha, Mark D. Fricker, and Nick S. Jones, Taxonomies of

networks from community structure, Physical Review E 86 (2012), no. 3, 036104.

[PARS14] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena, DeepWalk: Online learning of social representations, Proceedings of the 20th ACM

SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 701–710.

[PC15] Anton V Proskurnikov and Ming Cao, Synchronization of pulse-coupled oscillators and clocks under minimal connectivity assumptions,

arXiv preprint arXiv:1510.02338 (2015).

[Pey09] Gabriel Peyré, Sparse modeling of textures, Journal of Mathematical Imaging and Vision 34 (2009), no. 1, 17–31.

[PJM10] Antonis Papachristodoulou, Ali Jadbabaie, and Ulrich Munz, Effects of delay in multi-agent consensus and oscillator synchronization,

Automatic Control, IEEE Transactions on 55 (2010), no. 6, 1471–1477.

[POM09] Mason A Porter, Jukka-Pekka Onnela, and Peter J Mucha, Communities in networks, Notices of the AMS 56 (2009), no. 9, 1082–1097.

[PS11] Roberto Pagliari and Anna Scaglione, Scalable network synchronization with pulse-coupled oscillators, IEEE Transactions on Mobile

Computing 10 (2011), no. 3, 392–405.

[RPFM17] Puck Rombach, Mason A Porter, James H Fowler, and Peter J Mucha, Core-periphery structure in networks (revisited), SIAM Review 59(2017), no. 3, 619–646.

[RPZ+18] Bin Ren, Laurent Pueyo, Guangtun Ben Zhu, John Debes, and Gaspard Duchêne, Non-negative matrix factorization: robust extraction

of extended structures, The Astrophysical Journal 852 (2018), no. 2, 104.

[SGH02] Arkadiusz Sitek, Grant T Gullberg, and Ronald H Huesman, Correction for ambiguous solutions in factor analysis using a penalized least

squares objective, IEEE transactions on medical imaging 21 (2002), no. 3, 216–225.

[SLN20] Christopher Strohmeier, Hanbaek Lyu, and Deanna Needell, Online nonnegative tensor factorization and cp-dictionary learning for

markovian data, arXiv preprint arXiv:2009.07612 (2020).

[Str00] Steven H Strogatz, From kuramoto to crawford: exploring the onset of synchronization in populations of coupled oscillators, Physica D:

Nonlinear Phenomena 143 (2000), no. 1, 1–20.

[TN12] Leo Taslaman and Björn Nilsson, A framework for regularized non-negative matrix factorization, with application to the analysis of gene

expression data, PloS One 7 (2012), no. 11, e46331.

[TS90] Daisuke Takahashi and Junkichi Satsuma, A soliton cellular automaton, J. Phys. Soc. Japan 59 (1990), no. 10, 3514–3519.

[TTMS96] Tetsuji Tokihiro, Daisuke Takahashi, Junta Matsukidaira, and Junkichi Satsuma, From soliton equations to integrable cellular automata

through a limiting procedure, Physical Review Letters 76 (1996), no. 18, 3247.

[WND13] Yongqiang Wang, Felipe Nunez, and Francis J Doyle, Increasing sync rate of pulse-coupled oscillators via phase response function design:

theory and application to wireless networks, Control Systems Technology, IEEE Transactions on 21 (2013), no. 4, 1455–1462.

[Woy06] Wojbor A Woyczynski, Burgers-kpz turbulence: Göttingen lectures, Springer, 2006.

[ZEB18] Shuo Zhou, Sarah Erfani, and James Bailey, Online cp decomposition for sparse tensors, 2018 IEEE International Conference on Data

Mining (ICDM), IEEE, 2018, pp. 1458–1463.

[ZVB+16] Shuo Zhou, Nguyen Xuan Vinh, James Bailey, Yunzhe Jia, and Ian Davidson, Accelerating online cp decompositions for higher order

tensors, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1375–

1384.

HANBAEK LYU, DEPARTMENT OF MATHEMATICS, UNIVERSITY OF CALIFORNIA, LOS ANGELES, CA 91786.

Email address: [email protected]