
CYCLIC GRAPH ATTENTIVE MATCH ENCODER (CGAME): A NOVEL NEURAL NETWORK FOR OD ESTIMATION

Guanzhou Li
Tsinghua University

Beijing, [email protected]

Yujing He
Tsinghua University


Jianping Wu*
Tsinghua University


ABSTRACT

Origin-Destination (OD) estimation plays an important role in traffic management and traffic simulation in the era of Intelligent Transportation Systems (ITS). However, previous model-based methods are under-determined and therefore depend heavily on additional assumptions and extra data. Deep learning offers a data-based alternative that connects inputs and outputs through probabilistic distribution transformation. Research applying deep learning to OD estimation nevertheless remains limited, because data must be transformed across representation spaces, here from a dynamic spatial-temporal space to a heterogeneous graph. To address this, we propose the Cyclic Graph Attentive Matching Encoder (C-GAME), based on a novel Graph Matcher with a double-layer attention mechanism. It realizes effective information exchange and establishes a coupling relationship across the underlying feature spaces. The proposed model achieves state-of-the-art results in experiments and offers a novel framework for cross-space inference tasks in prospective applications.

Keywords Origin-Destination estimation · Cyclic Graph Attentive Matching Encoder (C-GAME) · Graph Matcher · double-layer attention

1 Introduction

The Origin-Destination matrix (OD matrix), a significant concept in the transportation domain, describes the travel demand between pairs of spots. It can serve as input to microscopic and mesoscopic traffic simulation [1]. In traditional traffic domains, it helps assess the construction needs of transportation infrastructure and tackle the traffic assignment problem. With the development of Intelligent Transportation Systems (ITS), the OD matrix plays a key role in traffic management [2], contributing to smarter traffic light control policies and to pre-allocation services for taxis and car hailing [3]. Since the appearance of shared bikes, their pick-up and drop-off demand has received more and more attention [4]. We believe there is potential in mining the relationship between the OD of shared bikes and that of vehicles, since shared bikes, as a mobility mode, meet the public’s need for the last mile of travel.

With regard to estimation methods, previous attempts include the Kalman filter, Bayesian approaches, Generalized Least Squares (GLS), Maximum Likelihood (ML) and gradient-based solution techniques [5]. The essential problem of OD estimation lies in the following equation from [6]:

$$f_l(t) = \sum_{o,d,\tau} a_{lt}^{od\tau} \, d_{od}(\tau) \tag{1}$$

where $f_l(t)$ is the flow on link $l$ at timestep $t$, $d_{od}(\tau)$ denotes the OD flow between origin $o$ and destination $d$ departing at timestep $\tau$, and $a_{lt}^{od\tau}$ is the assignment matrix mapping $d_{od}(\tau)$ to $f_l(t)$, which depends on various parameters. OD estimation is equivalent to deriving $a_{lt}^{od\tau}$. Obtaining high-dimensional variables from low-dimensional information makes the problem under-determined [2] [7], so estimation quality depends on extra assumptions or on additional positioning or path information. On the one hand, path-based or sub-path-based information [2] [6] [12, 13], built on auxiliary data from roadside surveys, Bluetooth and mobile devices [2] [8–11], is used to formulate OD estimation.

arXiv:2111.14625v2 [cs.LG] 29 Jan 2022

On the other hand, three categories of assumptions are thought to matter in this data assimilation process:

(1) data reliability; (2) availability of a quantifiable model describing how the data map to the estimated OD matrix; (3) quality of the solution algorithm and optimization process [6]. First, data reliability refers to the completeness, accuracy and timeliness of observable data. To ensure quality, the data used in the model ought to be as simple and easily available as possible, which is why many researchers favor link traffic counts. However, the prior OD matrices widely used in many algorithms are a potential cause of instability and bias in the results [10] [14, 15]. To remove the need for them, attempts combining GLS and maximum entropy were made to fill in the non-observable variables [16]. Upholding the principle of a concise model, we aim to develop a model that uses only easily accessible traffic counts. Second, availability of a quantifiable model requires getting rid of unnecessary assumptions in the model. In terms of the assignment matrix, methods can be categorized as assignment-matrix based or assignment-matrix free [17]. Assignment-matrix based methods usually presume route choice principles such as system optimum (SO) and user equilibrium (UE), which are not guaranteed to be consistent with the actual dynamic assignments. Another viewpoint regards the assignment matrix as depicting the probabilities of route choices and spatial-temporal accessibility. Inspired by that, neural networks are perceived as excellent estimators for capturing probabilistic associations. Third, the performance of the solution is highly related to the designed algorithm, and OD estimation needs to be solved jointly as a bi-level model that is both nonlinear and nonconvex. To avoid dependency on such a model, we move our technical roadmap from model-based to data-based, making neural networks our choice.

In the era of ITS, deep learning has recently achieved great success in various traffic fields, including traffic flow forecasting [18–20] and intelligent traffic signal control [20–22]. To the best of our knowledge, however, the application of Deep Neural Networks (DNNs) to OD estimation is relatively limited. Viewing the OD matrix as a fully connected graph, the estimation problem is essentially the approximation of quantities distributed over the graph edges. The emerging Graph Neural Networks (GNNs) differ from Convolutional Neural Networks (CNNs) in that their topology cannot be represented suitably in Euclidean space. In the domain of graph generation, graph structure attracts more attention than the distribution of quantities on a graph: numerous models focus on whether links or points exist on a graph, or whether a graph belongs to a set of graph distributions [23–30]. Deduction from spatial-temporal space to quantities on a graph is not only a challenging issue, as transportation networks are highly complicated with heterogeneous vehicular flow [31], but also an attractive one, due to its wide potential applications from traffic networks to other social networks.

Following the discussion above, we simplify the input data to traffic counts on the links within a period, and presume that a stable probabilistic pattern maps the inputs to the outputs in the OD estimation problem. The data assimilation process is hard to converge with a traditional DNN framework, due to the heterogeneity of the inputs and the discreteness and sparsity of the outputs. Inspired by CycleGAN, we propose the Cyclic Graph Attentive Matching Encoder (C-GAME), a novel bi-directional “Encoder-Decoder” structure with an attention mechanism applied in the intermediate encoded layer. The main contributions include:

• We introduce the concept of coupling spatial-temporal link flows and graphical OD flows as two sides of complete routine trips.

• C-GAME is designed as a cyclic generation architecture integrated with double-layer attention: attention for variation similarity and structure attention. It regards variation similarities in different ranges as channels and couples the forward and backward propagation of the neural networks.

• To the best of our knowledge, we are the first to apply a pure deep learning method to capture the dynamic OD matrix from traffic counts without assistance from other algorithms, and we attain state-of-the-art (SOTA) results compared with benchmarks.

• The proposed framework can be extended to other deductions of data across different representation spaces, for example, deducing the intensity between pairs of points in social networks from the densities of message passing.

The rest of the paper is organized as follows. Section 2 reviews relevant work on OD estimation. Section 3 introduces the details of the proposed model, C-GAME. Section 4 covers data generation and model comparison. Conclusions are given in Section 5.

2 Related Work

Existing OD estimation methods can be divided into two categories: "forward path" and "reverse engineering" [6]. For the forward path, information about OD flows is mainly derived from traffic surveys or statistical methods based on historical census data. As an example, [32] incorporates trip generation, trip attraction and trip distribution into a statistical Bayesian model with Poisson and negative binomial distributions acting as the prior. Based on the traffic pattern implied by the causality between activity motives and travel flow, these methods establish corresponding statistical models to formulate OD flows with interpretable variables. [33] made use of the geographical window-based structural similarity index (GSSI) to cluster the structure of OD matrices and obtain the underlying travel pattern. In practice, traffic patterns are influenced by many factors, such as weather conditions, road network topology and traffic management, making the estimated model difficult to apply widely. Moreover, an adequate OD investigation is costly in time and manpower, while an insufficient investigation causes sample deviation. Most importantly, the dynamic OD matrix can be a key input for evaluating traffic management and policy measures [18], and it is difficult to estimate in real time by direct methods.

Reverse engineering, on the other hand, dynamically deduces the OD matrix from acquirable data and is widely favored. Inferred results originate from various data sources. Most of them relate to link properties, such as traffic counts and average speed [12, 34–36]. Because these data are easy to access in a timely manner in Intelligent Transportation Systems, deriving the OD matrix from them is quite attractive [37]. Data from mobile devices and GPS have also received increasing attention [10, 14, 38] due to their high penetration [9]. Other sources include automatic vehicle identification (AVI) [7, 13, 39], Bluetooth [2], probe vehicle data [40], video recordings [41], roadside surveys [8] and smart cards in public transportation [42]. Many of these sources are route-related, which makes OD matrix estimation much more straightforward. However, the completeness of such data has a great impact on the derived results, so estimation assisted by sporadic routing data is proposed in [12].

Except for a small number of delicately designed single-level models [43], the OD estimation problem is often formulated as a bi-level optimization model. The upper level attempts to estimate the OD matrix precisely, and the lower level considers the problem of dynamic traffic assignment (DTA) [44].

To deal with the upper-level issue, methods can be categorized as constrained optimization functions, iterative estimation in the state space, and gradient-based solution techniques. Constrained optimization functions include Generalized Least Squares (GLS), Maximum Entropy, Bayesian approaches and so forth. GLS was put forward by Cascetta and Bell to solve this problem by minimizing the Mahalanobis distance to the prior OD matrix [45, 46]. It treats the true OD matrix and traffic counts stochastically and minimizes the error term under the assumption of zero-mean error [47]. To compensate for the locality of temporal information, Lin et al. adopted GLS with time windows [48]. Yu Nie et al. proposed a decoupled GLS path flow estimator to overcome the bi-level model’s lack of guaranteed convergence [49]. Maximum Entropy, a concept that originated in physics and extended to other fields, was first formally presented by Shannon in information theory. Van Zuylen and Willumsen introduced this method into the OD estimation problem in different forms [47, 50]. Bayesian inference for the problem was proposed by Maher with a Gaussian prior distribution [51]. Madalin-Dorin extended the Bayesian approach by combining it with a Markov process, which was used to represent the access probability in road networks. Iterative estimation of the OD matrix in the state space mainly involves the Kalman filter and its variants. The Kalman filter, a classical algorithm developed by Rudolf E. Kálmán, uses a series of variables observed over time to estimate the joint probability distribution over the variables at each moment. It has extensive applications in various domains, from dynamic positioning to weather forecasting. Ashok viewed OD estimation as an autoregressive process and introduced the method into this subject [52]. Considering the nonlinearity of the relationship between OD distributions and link traffic counts, Chang et al. brought in the Extended Kalman filter (EKF) on highway networks and general networks [53, 54]. Vittorio Marzano introduced a quasi-dynamic assumption into EKF as QD-EKF and tested the results on real-size networks [55].

Gradient-based techniques include SPSA and neural networks. To ease the dependency on the complex relationship between OD flows and observable data, Simultaneous Perturbation Stochastic Approximation (SPSA), an assignment-matrix-free method, was brought into OD estimation [56]. This stochastic approximation algorithm can optimize systems with both OD and link flows together [57, 58]. Nevertheless, gradient methods share a common problem, and SPSA is no exception: they are sensitive to the initial parameters, the scale of variables, the objective function and the gradient. Athina Tympakianaki et al. improved the robustness of SPSA by applying clustering strategies, namely c-SPSA [17], and then extended c-SPSA with an alternative clustering strategy and hybrid gradient estimation [59]. Neural networks have received tremendous attention in various fields, while their applications to OD estimation are relatively limited due to the complexity of the problem. Gong proposed Hopfield Neural Networks aimed at trip matrix issues [60]. Krishnakumari et al. constructed 3D supply pattern data to allocate flows among the N shortest spatial-temporal paths in place of iterative optimization, with neural networks mapping speed and volume to trip attractions and trip generations [6]. Cao et al. captured the best underlying representations with an autoencoder and generated the OD matrix [14]. Ou et al. exploited a multiple-role mechanism, including learner, assigner and searcher, to learn traffic assignment and OD estimation respectively [61].

To capture features better and alleviate the curse of dimensionality in large-scale networks, auxiliary dimension reduction methods have also been put forward. Principal Component Analysis (PCA) is often utilized to reduce the dimensions of the inputs [36, 62] and of the state space [63]. Ou addressed the sparsity of observable OD matrices with non-negative Tucker decomposition [14]. Ren factorized a stacked 4-dimensional OD tensor with CP tensor decomposition and evaluated the results by prediction with ARIMA and SVR, showing better performance [64]. As for the lower-level model, the quality of DTA also affects the accuracy of OD estimation. Willumsen divided existing methods into proportional assignment, equilibrium assignment and stochastic assignment [65]. Besides, heuristic search algorithms such as Bee Colony Optimization are also employed to solve DTA problems [37].

Taking the diversity of models and data into account, Antoniou et al. proposed a general test framework and comparison benchmarks for different means of OD estimation [18]. However, they aimed mainly at algorithm-based solutions, so our experiment, with its newer model and more concise data, cannot be fitted into that framework. We therefore conduct experiments under a test framework built by ourselves; see Section 4 for details.

3 Methodology

3.1 Problem Definition

The goal of this research is to dynamically generate the Origin-Destination matrix based on observable link variables over several timesteps during a certain period. Define a road network $G$, made up of $n_p$ spots $P_1, P_2, \cdots, P_{n_p}$ serving as origins and destinations, and $n_l$ links $l_1, l_2, \cdots, l_{n_l}$ connecting pairs of spots. Continuous time is discretized into $n_t$ intervals of fixed duration $\tau$, labelled $t_1, t_2, \ldots, t_{n_t}$. The element $F(i, j)$ of the traffic counts matrix $F \in \mathbb{R}^{n_l \times n_t}$ is the total traffic count on link $l_i$ during interval $t_j$. The OD matrix $D \in \mathbb{R}^{n_p \times n_p}$ is a table holding the number of trips from one spot to another during the whole period $t_1 \sim t_{n_t}$.

Considering all trips in the traffic network as a set of routes $r_1, r_2, \ldots, r_{n_r}$, each of them is expressed as a chain

$$r_i : \left[ P_{\phi_o}, (l_{\alpha_1}, t_{\beta_1}), (l_{\alpha_2}, t_{\beta_2}), \ldots, (l_{\alpha_\eta}, t_{\beta_\eta}), P_{\phi_d} \right]$$

It can be seen from the definition that the OD matrix reflects the graph structure of the route set, while the traffic counts depict its spatial-temporal distribution. Information about graph structure and spatial-temporal distribution co-exists in the distribution of the route set, like two sides of a coin. The essence of OD estimation is to couple these two sides and bridge the graph space and the spatial-temporal space.

Construct a vector $R = [n_{r_1}, n_{r_2}, \ldots, n_{r_{n_r}}]^T$ to represent the distribution of the route set, with $n_{r_i}$ being the number of trips along route $r_i$. It covers all information included in the traffic counts matrix $F$ and the OD matrix $D$, as shown:

$$F(i, j) = \sum_{(l_i, t_j) \,\in\, r_k} n_{r_k} \tag{2}$$

$$D(i, j) = \sum_{(P_{\phi_o}, P_{\phi_d}) = (P_i, P_j)} n_{r_k} \tag{3}$$

Since each includes a subset of the information in $R$, $F$ and $D$ can relate their distributions by means of co-occurrence probability in the route set. The mapping process from $F$ and $D$ to $R$ can be captured by neural networks, as in Figure 1. From this perspective, the traditional bi-level optimization is transformed into bi-directional neural networks. However, the combination of spatial-temporal data and graph data causes the curse of dimensionality, and the vector $R$ is unknown and sparse, leading to a biased two-stage mapping and a heavy computational burden.
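As a concrete illustration of Equations (2) and (3), the sketch below aggregates a small, made-up route distribution into link counts F and OD counts D. The route tuples, the spot and link names and the helper `aggregate_routes` are hypothetical and only serve the example; they are not part of the proposed model.

```python
from collections import defaultdict

def aggregate_routes(routes):
    """Aggregate route counts into link counts F (Eq. 2) and OD counts D (Eq. 3).

    Each route is (origin, destination, chain, n_trips), where chain is a list
    of (link, interval) pairs visited by the route. This layout is an
    assumption made for the example.
    """
    F = defaultdict(int)
    D = defaultdict(int)
    for origin, destination, chain, n_trips in routes:
        D[(origin, destination)] += n_trips
        # A (link, interval) pair contributes once per route that contains it.
        for link, interval in set(chain):
            F[(link, interval)] += n_trips
    return dict(F), dict(D)

# Hypothetical route set: the first two routes share the OD pair (P1, P3).
routes = [
    ("P1", "P3", [("l1", 1), ("l2", 2)], 5),
    ("P1", "P3", [("l3", 1), ("l4", 2)], 2),
    ("P2", "P3", [("l2", 2)], 4),
]
F, D = aggregate_routes(routes)
print(F[("l2", 2)])     # count on link l2 in interval 2 (first and third routes)
print(D[("P1", "P3")])  # trips from P1 to P3 (first and second routes)
```

Both F and D are lossy summaries of R: different route distributions can yield the same pair (F, D), which is consistent with the under-determined character of the problem noted in the introduction.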

3.2 Overview

To address the problem, this research compresses the vector $R$ into a representational space with $n_f$ features, as shown in Figure 2. Each feature might be viewed as a subset of the route set, and the subsets generated from the two directions are distributed differently, expressed as $\{r_{\varepsilon_1}, r_{\varepsilon_2}, \ldots, r_{\varepsilon_v}\}$ and $\{r'_{\varepsilon_1}, r'_{\varepsilon_2}, \ldots, r'_{\varepsilon_v}\}$, respectively. A matching layer weighs the relations between the subsets from both sides.

C-GAME adopts a bi-directional Encoder-Decoder architecture with an intermediate matching layer, namely the Graph Matcher, which is responsible for finding the right matchings to convey information correctly from one side to the other. To reach this goal, we design a novel form of attention that gives higher pass rates to correct matching information and lower ones to wrong matchings. The learning process consists of gradient propagation and iterative updates of the Graph Match Layer, where the former is for message-passing learning and the latter for graph structure learning. By combining continuous and discrete layers, we expect to estimate complex OD matrices more precisely.


Figure 1: Mapping from Traffic Counts to OD Matrix by Route Distribution Vector

Figure 2: Mapping from Traffic Counts to OD Matrix by Embedded Feature Space


Figure 3: Forward Encoder

3.3 Methodology

3.3.1 Introduction of proposed architecture

C-GAME consists of three components: the forward Encoder, the inverse Encoder and the Graph Matcher, as shown in Figure 3. Both the forward and inverse Encoders use an Encoder-Decoder framework with a shared intermediate match layer. Without loss of generality, taking the forward Encoder as an example, we flatten the spatial and temporal dimensions into one dimension. An MLP then embeds the processed input into the representative space. The Graph Matcher captures the relationship between the forward embedded features and those of the inverse direction, and transforms the encoded features to feed into the Decoder.

3.3.2 Encoder-Decoder Frameworks

Embedding the complex, highly heterogeneous spatial-temporal input data into the representative space is a key problem in flow information extraction. Graph Convolution Networks (GCNs) and Recurrent Neural Networks (RNNs) can extract spatial and temporal dependencies, respectively. However, OD estimation requires learning from route information that is highly related in both space and time, that is, a hierarchical multi-hop graph in spatial-temporal space, so we connect all nodes in both the spatial and temporal dimensions. The Encoder part can be built from a two-layer MLP, which is expressed as:

$$h_x = \mathrm{LeakyReLU}\left(W_2 \left(\mathrm{LeakyReLU}(W_1 x + b_1)\right) + b_2\right) \tag{4}$$

where $W_1, W_2, b_1, b_2$ are learnable parameters of the MLP, $x$ is the input data, and $h_x$ is the embedded feature matrix with shape $[b, n_f]$. LeakyReLU, a frequently used activation layer, is an extended form of the Rectified Linear Unit.

Given the structure matching matrix $M \in \mathbb{R}^{n_f \times n_s}$ and the structure value matrix $V \in \mathbb{R}^{1 \times n_s}$ from the Graph Matcher Layer, the attention operation can be expressed by:

$$g_x = \operatorname{mean}_{n_s}\left(h_x \odot \Vert_b M \odot \Vert_{b,n_f} V\right) \tag{5}$$

where $\odot$ represents the Hadamard product, evaluated with the broadcast mechanism, $\Vert_b$ and $\Vert_{b,n_f}$ denote duplication along the batch dimension and along both the batch and embedded feature dimensions, respectively, $h_x$ is the result of the encoder and $g_x$ is the input of the decoder. More detail about the meaning and generation of $M$ and $V$ is given in the following part.

Figure 4: Graph Matcher

Table 1: Notations

Notation      Representation
$b$           batch size
$j$           step of updating $M$ and $V$
$p$           sub-step of updating $M$ in each step
$\lambda_m$   discount factor in the updating process of $M$
$\lambda_v$   discount factor in the updating process of $V$
$n_f$         number of features in $M$
$n_s$         number of structures in $V$
$M_j$         $M$ in the $j$-th step
$V_j$         $V$ in the $j$-th step

The Decoder also consists of a two-layer MLP, mapping the processed embedded feature vectors $g_x$ to the outputs:

$$y = \mathrm{LeakyReLU}\left(W_4 \left(\mathrm{LeakyReLU}(W_3 g_x + b_3)\right) + b_4\right) \tag{6}$$
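The forward pass of Equations (4) to (6) can be sketched for a single sample in plain Python. This is a minimal sketch under stated assumptions: the weights are tiny hand-picked values, the LeakyReLU slope of 0.01 is a common default rather than a value given in the paper, and the attention of Equation (5) is read as broadcasting $h_x$ along the structure dimension before averaging over $n_s$. All function names are hypothetical.

```python
def leaky_relu(v, slope=0.01):
    # slope=0.01 is an assumed default, not a value taken from the paper
    return [x if x > 0 else slope * x for x in v]

def linear(W, b, x):
    # W is a list of rows; returns W x + b
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def mlp2(x, W_a, b_a, W_b, b_b):
    # Two-layer MLP with LeakyReLU after each layer (form of Eq. 4 and Eq. 6)
    return leaky_relu(linear(W_b, b_b, leaky_relu(linear(W_a, b_a, x))))

def attend(h, M, V):
    # Eq. 5 for one sample: g[k] = mean over s of h[k] * M[k][s] * V[s]
    n_s = len(V)
    return [h[k] * sum(M[k][s] * V[s] for s in range(n_s)) / n_s
            for k in range(len(h))]

# Toy sizes: 3 flattened inputs, n_f = 2 features, n_s = 2 structures, 2 outputs
W1, b1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W3, b3 = [[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]
W4, b4 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]

x = [2.0, 1.0, 3.0]               # flattened traffic counts (made up)
h_x = mlp2(x, W1, b1, W2, b2)     # encoder, Eq. 4
M = [[1.0, 1.0], [0.5, 0.0]]      # structure matching matrix, n_f x n_s
V = [1.0, 0.5]                    # structure value matrix, 1 x n_s
g_x = attend(h_x, M, V)           # attention, Eq. 5
y = mlp2(g_x, W3, b3, W4, b4)     # decoder, Eq. 6
print(h_x, g_x, y)
```

When M and V contain only ones, `attend` returns its input unchanged, so the matching layer then acts as an identity on the embedded features.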

3.3.3 Graph Match Layer

For convenience, the notations in Table 1 are used to represent the relevant parameters and variables of the Graph Match Layer.

The structure matching matrix $M$ is used to match the forward and inverse embedded features $h_x$ and $h_y$. Meanwhile, the structure value matrix $V$ weighs each structure by the quality of its generation. At the beginning, both are initialized as matrices filled with 1, representing uniform pass rates.

Each step of updating $M$ consists of $n_s$ sub-steps. Similarity along the batch is calculated in every sub-step, following the logic that similar sub-structure sets vary in an analogous way. The similarities between the embedded vectors of the forward and backward directions are decomposed into matches over different time ranges. In the $p$-th sub-step, $p$ batches of embedded vectors are concatenated to calculate the corresponding column of the structure matching matrix $M_j$:


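$$M_j(:, p) = (1 - \lambda_m) \cdot M_j(:, p) + \lambda_m \cdot \frac{\sum_{p \cdot b} (h_x \odot h_y)}{\sqrt{\sum_{p \cdot b} (h_x \odot h_x)} \sqrt{\sum_{p \cdot b} (h_y \odot h_y)}} \tag{7}$$

where $M_j(:, p)$ denotes the $p$-th column of the structure matching matrix, $\odot$ is the Hadamard product, and $\sum_{p \cdot b}(\cdot)$ is the sum over the concatenated batch dimension. Correspondingly, the $p$-th column of the structure value matrix $V$ is updated as:

$$V_j(p) = (1 - \lambda_m) \cdot V_j(p) \tag{8}$$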

The structure value matrix $V$ measures the similarity between $h_y$ and $h'_x$, where $h'_x$ denotes the embedded vector $h_x$ operated on by each structure. It is given by:

$$V_j = \frac{\sum_{n_f} (\Vert_{n_s} h_x) \odot M_j \odot (\Vert_{n_s} h_y)}{\sqrt{\sum_{n_f} (\Vert_{n_s} h_x) \odot M_j} \sqrt{\sum_{n_f} (\Vert_{n_s} h_y)}} + \lambda_v V_j \tag{9}$$

where $\Vert_{n_s}$ duplicates the embedded feature vectors along the structure dimension, $\sum_{n_f}(\cdot)$ is the sum over the feature dimension, and $\sum_{n_f} (\Vert_{n_s} h_x) \odot M_j$ processes the embedded features $h_x$ with $n_s$ structures into $h'_x$. The last term represents the discounted weight value from the last sub-step. The broadcasting mechanism is used in the calculation.

The pseudo-code of the Graph Matcher is given in Algorithm 1.

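Algorithm 1: Graph Match Layer

Inputs: $h_{x,1}, h_{x,2}, \cdots, h_{x,n_s}$; $h_{y,1}, h_{y,2}, \cdots, h_{y,n_s}$; $\lambda_m$; $\lambda_v$
Initialize: $M_0 = \mathbf{1} \in \mathbb{R}^{n_f \times n_s}$; $V_0 = \mathbf{1} \in \mathbb{R}^{1 \times n_s}$

for $j = 1, 2, \cdots$ do
    for $p = 1, 2, \cdots, n_s$ do
        $M_j(:, p) \leftarrow (1 - \lambda_m) \cdot M_j(:, p) + \lambda_m \cdot \frac{\sum_{p \cdot b} (h_x \odot h_y)}{\sqrt{\sum_{p \cdot b} (h_x \odot h_x)} \sqrt{\sum_{p \cdot b} (h_y \odot h_y)}}$
        $V_j(p) \leftarrow (1 - \lambda_m) V_j(p)$
        $V_j \leftarrow \lambda_v V_j + \frac{\sum_{n_f} (\Vert_{n_s} h_{x,i}) \odot M_j \odot (\Vert_{n_s} h_{y,i})}{\sqrt{\sum_{n_f} (\Vert_{n_s} h_{x,i}) \odot M_j} \sqrt{\sum_{n_f} (\Vert_{n_s} h_{y,i})}}$
    end
end
return $M_j$, $V_j$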
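One sub-step of Algorithm 1 can be sketched as follows, covering only the column update of Equation (7) and the decay of Equation (8). The sketch assumes that the sum over the concatenated batch dimension is taken separately for each feature, so that the fraction becomes a per-feature cosine similarity; the zero-norm guard and all function names are additions for the example. The update of Equation (9) is left out.

```python
import math

def cosine_along_batch(hx, hy):
    """Per-feature cosine similarity between hx and hy along the batch axis.

    hx, hy: lists of rows (the concatenated batches), each row holding n_f
    embedded features.
    """
    n_f = len(hx[0])
    sims = []
    for k in range(n_f):
        dot = sum(rx[k] * ry[k] for rx, ry in zip(hx, hy))
        norm_x = math.sqrt(sum(rx[k] ** 2 for rx in hx))
        norm_y = math.sqrt(sum(ry[k] ** 2 for ry in hy))
        # Guard against all-zero features (an assumption, not in the paper)
        sims.append(dot / (norm_x * norm_y) if norm_x > 0 and norm_y > 0 else 0.0)
    return sims

def update_column(M, V, p, hx, hy, lam_m):
    """Apply Eq. 7 to column p of M and Eq. 8 to entry p of V, in place."""
    sims = cosine_along_batch(hx, hy)
    for k in range(len(M)):
        M[k][p] = (1 - lam_m) * M[k][p] + lam_m * sims[k]
    V[p] = (1 - lam_m) * V[p]

# Toy example with n_f = 2 features, n_s = 2 structures and 2 concatenated rows
M = [[1.0, 1.0], [1.0, 1.0]]
V = [1.0, 1.0]
hx = [[1.0, 1.0], [2.0, -1.0]]
hy = [[2.0, -1.0], [4.0, 1.0]]
update_column(M, V, 0, hx, hy, lam_m=0.5)
print(M, V)
```

In this toy case the first feature varies in the same way in both directions (similarity close to 1), so its entry in column 0 stays near 1, while the second feature varies in opposite ways (similarity close to -1), so its entry is pulled toward 0.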

3.3.4 Loss Function

The goal of model training is to make the estimated OD matrix $\hat{y}_t$ as close to the true OD matrix $y_t$ as possible. MSE loss and L1 loss are utilized in the experiments, respectively. MSE is a classical criterion that limits large deviations between estimates and true values, and L1 loss measures the intuitive difference in the number of vehicles.

$$\text{L1 Loss} = |y_t - \hat{y}_t| \tag{10}$$

$$\text{MSE Loss} = \left(\|y_t - \hat{y}_t\|\right)^2 \tag{11}$$

4 Experiments

4.1 Description

Due to the lack of complete real-time OD matrices for realistic networks, it is challenging to obtain training data. We therefore use the widely used traffic simulator SUMO in our research. We evaluate our model on a 6×6 grid network and a realistic traffic network. Both networks deploy several traffic signal control strategies at signalized intersections. Trips are stochastically generated from a pre-designed path set, taking the characteristics of origin-destination pairs into account. The pre-designed path set consists of all the non-detour paths between each origin-destination pair. Each vehicle is drawn from several vehicle classes in a particular proportion; these classes are normal car, slow car, fast car, bus and truck, whose parameters are set on the basis of measured data. In terms of driving behavior and car-following strategy, the intelligent driver model (IDM) is adopted to mimic the behavior of human drivers.

Figure 5: Grid Network

4.1.1 Scenario 1: Grid Networks

As shown in Figure 5, the grid network has 36 signalized intersections and 120 4-lane links, each 2 km long, and all OD flows are generated and absorbed around intersections. Each simulation round lasts 31000 simulated seconds, including a 1000-second buffer period for stabilizing the simulation; the information generated within the first 1000 s is therefore removed from the dataset. Traffic counts on every link are measured through the interface provided by SUMO, defined as the number of vehicles passing a particular link in 5 minutes. The dataset is then divided hourly, and 12 consecutive 5-minute slices are taken to estimate the OD demand in that hour. In total, 12292 samples are generated from 1600 epochs of simulation. Simulations are configured in random mode to ensure that the results of one epoch do not repeat those of another. After random shuffling, 80% of the data is used as the training set and 20% as the test set.
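The data preparation described above can be sketched as follows: drop the warm-up period, bin vehicle passages into 5-minute counts per link, group 12 consecutive slices into one hourly sample, and shuffle before the 80/20 split. The event-log format `(time_s, link_id)`, the use of non-overlapping hourly windows and all function names are assumptions made for the illustration; the SUMO interface itself is not used here.

```python
import random

def bin_counts(passages, links, warmup_s=1000, interval_s=300):
    """Count vehicle passages per link in fixed intervals after a warm-up.

    passages: iterable of (time_s, link_id) events (a hypothetical log format).
    Returns counts[t][i], the number of passages on links[i] in interval t.
    """
    index = {link: i for i, link in enumerate(links)}
    kept = [(t - warmup_s, link) for t, link in passages if t >= warmup_s]
    n_bins = max((int(t // interval_s) for t, _ in kept), default=-1) + 1
    counts = [[0] * len(links) for _ in range(n_bins)]
    for t, link in kept:
        counts[int(t // interval_s)][index[link]] += 1
    return counts

def hourly_samples(counts, slices_per_hour=12):
    """Group consecutive interval counts into non-overlapping hourly windows."""
    n = len(counts) // slices_per_hour
    return [counts[i * slices_per_hour:(i + 1) * slices_per_hour] for i in range(n)]

def shuffle_split(samples, train_frac=0.8, seed=0):
    """Shuffle samples and split them into training and test sets."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    cut = int(train_frac * len(samples))
    return [samples[i] for i in order[:cut]], [samples[i] for i in order[cut:]]

# Made-up passages: the event at 500 s falls in the warm-up and is dropped.
passages = [(500, "a"), (1000, "a"), (1299, "b"), (1300, "a")]
counts = bin_counts(passages, ["a", "b"])
windows = hourly_samples([[i] for i in range(30)])  # 30 slices, 2 full hours
train, test = shuffle_split(list(range(10)))
print(counts, len(windows), len(train), len(test))
```

Each hourly window is a list of 12 count vectors, which corresponds to one input sample before the spatial and temporal dimensions are flattened.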

4.1.2 Scenario 2: Haikou Networks

As shown in Figure 6, the proposed model has also been tested on a real-size experimental field, a downtown zone of 10 km × 6 km in Haikou city, including 2328 links and 1171 intersections, among which 359 links are monitored to obtain traffic counts. 31 traffic hotspots are selected as origins and destinations, including shopping malls, residential communities, schools and colleges, parks, etc. Each epoch runs from 5:20 am to 10:20 pm, and, allowing for the stabilizing process, data from 6 am to 10 pm is recorded and divided hourly as described above. Within this period, 6 types of traffic demand intensity are defined: morning, morning peak, noon, afternoon, evening peak and night. Based on surveys and observation, several urban crowd migration patterns are added to the simulation. On weekdays, traffic flows move from residential to commercial areas during the day and vice versa at night; at weekends, they move to shopping malls and parks in the morning and afternoon and back home at nightfall. In total we run 400 epochs of simulation, accumulating 16 h of data for each of 400 days. After normalization and random permutation, 90% of the data serve as the training set and 10% as the test set.

4.2 Performance Evaluation

Figure 6: Haikou Network

The metrics used to evaluate the estimation performance are as follows:

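a) Root Mean Squared Error

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{12}$$

b) Mean Absolute Error

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| \tag{13}$$

c) Accuracy

$$\mathrm{Acc} = 1 - \frac{|y_i - \hat{y}_i|}{|y_i|} \tag{14}$$

d) Coefficient of Determination ($R^2$)

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{15}$$

e) Variance Score (var)

$$\mathrm{var} = 1 - \frac{\mathrm{var}(y_i - \hat{y}_i)}{\mathrm{var}(y_i)} \tag{16}$$

where $N$ denotes the number of samples from the test sets, $y_i$ is the real value of the OD flows, and $\hat{y}_i$ is the value estimated by the model. $\bar{y}$ is the average of $y_i$.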

RMSE and MAE quantify the difference between the derived OD matrix and the actual one. Accuracy depicts the estimation precision: the closer it is to 1, the better the model performs. $R^2$ and var act as correlation coefficients, showing the ability of the estimated results to reflect the real data.
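The five metrics can be computed as in the sketch below, written for flat lists of true and estimated OD flows. One point is an assumption: the absolute-value bars in the accuracy formula are read here as the Euclidean norm over all samples, which the text does not state explicitly. The function names and the toy data are hypothetical.

```python
import math

def variance(v):
    # Population variance; the ratio in the variance score does not depend
    # on whether n or n - 1 is used in the denominator.
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def evaluate(y, y_hat):
    """Return RMSE, MAE, accuracy, R2 and variance score for two flat lists."""
    n = len(y)
    err = [a - b for a, b in zip(y, y_hat)]
    sq_err = sum(e * e for e in err)
    y_bar = sum(y) / n
    return {
        "rmse": math.sqrt(sq_err / n),
        "mae": sum(abs(e) for e in err) / n,
        # Assumption: |.| in the accuracy formula is the Euclidean norm
        "acc": 1 - math.sqrt(sq_err) / math.sqrt(sum(a * a for a in y)),
        "r2": 1 - sq_err / sum((a - y_bar) ** 2 for a in y),
        "var": 1 - variance(err) / variance(y),
    }

# Toy data: every OD flow is overestimated by exactly one vehicle.
y_true = [1.0, 2.0, 3.0, 4.0]
y_est = [2.0, 3.0, 4.0, 5.0]
scores = evaluate(y_true, y_est)
print(scores)
```

For this constant overestimate, RMSE and MAE both equal one vehicle and the variance score stays at 1, because the variance score ignores a constant bias, whereas $R^2$ drops to about 0.2. $R^2$ becomes negative once the squared error exceeds the spread of the true values around their mean.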

4.3 Benchmark Model

In the experiments, we take 2 commonly used methods and 2 neural network architectures considered suitable for the OD estimation problem as comparison benchmarks. SPSA optimizes the estimation of OD matrices and traffic counts simultaneously, and its extension c-SPSA runs the process separately for traffic conditions of different scales. Following [17], we set the number of clusters to 3. The autoregressive Kalman filter optimizes the estimation results iteratively, and the Extended Kalman Filter is used in our experiments due to the non-linearity of traffic assignment. On the deep learning side, Graph Convolutional Networks (GCNs) combined with Temporal Convolutional Networks (TCNs) have proven highly effective in traffic forecasting problems and are believed to perform extraordinarily well in extracting spatial-temporal information from graph-structured data. Hence, GCNs and TCNs are responsible for capturing spatial-temporal dependencies from traffic counts, and 2-layer MLPs then project the convolved vectors to the final outputs. The other network, CycleGAN, was proposed to solve the style transfer problem in computer vision. As mentioned in [20], the bi-level optimization of OD estimation can be viewed as a bi-directional framework, so CycleGAN transfers naturally to capturing the relationship between traffic counts and the demand matrix. To make it more comparable, the identical “Encoder-Decoder” structure is used for the generators, and we choose WGANs to ensure the stability of the generation process.

Figure 7: Thermodynamic Diagram for Scenario 1

4.4 Results and Analysis

4.4.1 Comparison between real OD and estimated OD

Figure7 and figure8 present the thermodynamic diagram of real and estimated OD matrices generated by proposedmodel and benchmarks in grid network and Haikou network, respectively. Colors of pixels quantify the intensities ofOD flows from origins to destinations. OD flows do not move from a site to the site itself, making the diagonal of ODmatrix 0. In the grid network, the 3 zero-value oblique lines around diagonal indicate that OD flows are not generatedbetween closely adjacent sites. Viewed as fully connected graph, OD matrix is a highly heterogeneous graph, that is forsome particular nodes, naming “heterogeneous points”, their intensities of OD flows are quite distinctive from theirneighbor pixels’, and predicting the existence of these points is challenging for many models. While as is often thecase, the information of these prominent points plays a significant and indispensable role in traffic management for it isessential for equitably allocating traffic resources. According to the results, we find that GCNs and SPSA model canonly exhibit macroscopic distribution of OD matrix and neglect nearly all of the heterogeneous in grid network whileSPSA is able to capture some of these points in Haikou network. The reasons of the phenomenon, we consider thateach epoch of simulation in grid network is independent and doesn’t exhibit a specific traffic demand pattern amongepochs, while in Haikou network SPSA is capable to capture the traffic demand mode in the same hour during variousdays. Additionally, the first network is relatively congested with over 30000 vehicles running on the network at peak

Figure 8: Thermodynamic Diagram for Scenario 2

Table 2: Measures of Performance

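| Metric | SPSA (Grid) | SPSA (Haikou) | EKF (Grid) | EKF (Haikou) | GCN (Grid) | CycleGAN (Grid) | CycleGAN (Haikou) | Proposed (Grid) | Proposed (Haikou) |
|---|---|---|---|---|---|---|---|---|---|
| RMSE | 11.3747 | 4.4824 | 17.4338 | 13.0741 | 16.2310 | 13.9392 | 8.3746 | 0.6487 | 4.0893 |
| MAE | 8.8071 | 3.4682 | 11.1313 | 10.1892 | 10.8456 | 10.5372 | 5.3074 | 0.3945 | 2.9136 |
| Accuracy | 0.4804 | 0.6469 | 0.4292 | 0.2663 | 0.2439 | 0.3774 | 0.5169 | 0.9773 | 0.7433 |
| R2 | * | * | * | * | * | * | * | 0.9966 | 0.5693 |
| var | 0.3429 | 0.7198 | * | * | * | 0.0213 | 0.4717 | 0.9979 | 0.8731 |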

time, which may also affect the performance of SPSA. Because traffic counts are only partially observed in the Haikou network, adjacency information is lacking, so the GCN model, which requires an adjacency matrix for training, is not adopted there. CycleGAN performs somewhat better than the previous two models, but it still misses some of the points and its estimates are not precise enough. The Kalman Filter iteration is composed of matrix multiplications, so the elements of the estimated matrices tend to become collectively too large or too small. Because of the intermediate graph matcher layers, information can be fully exchanged, and the demand matrix estimated by C-GAME is highly consistent with the true one. The discreteness and heterogeneity of the data are successfully reflected.

4.4.2 Measures of Performance

Table 2 shows the performance of the C-GAME model and the baseline methods on the 5 metrics mentioned above in the grid network and the Haikou network. An asterisk (*) means that the value is negative, indicating that the model performs poorly under that metric. C-GAME obtains the best results in both the grid network and the Haikou network under all metrics. The RMSE and MAE of C-GAME decrease to 0.65 and 0.39 in the grid network, which means that the estimation error of each OD pair is below one vehicle on average, while the error is around 3 vehicles on average in the real-size network. Hence it reaches the highest accuracy, 97.73% and 74.33%, in the respective networks. The performance decay of the proposed model in the real-size road network might be caused by the complicated network topology and the large probability space of path selection relative to the small number of training samples, which leads to a mismatch between the sample distributions of the training set and the test set. Overall, the proposed model is more precise than all of the baselines. As explained in the previous section, in the uncongested real-size network with a particular traffic pattern, SPSA and CycleGAN perform better than in the grid network, obtaining 65% and 52% accuracy, respectively. Considering the metrics that evaluate

Figure 9: Scatter plot of real and estimated OD flows and traffic counts in Scenario 1

the correlations between real data and estimations, the proposed model outperforms the baselines and reaches 99.66% (R2) and 99.79% (var) in the first scenario, owing to the intermediate layers that extract the variation relationship.
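The error and correlation metrics discussed above can be computed along the following lines. This is a minimal sketch, not the authors' evaluation code: the function name `od_metrics` is hypothetical, "var" in Table 2 is assumed here to denote the explained variance score, and the Accuracy metric is omitted because its definition is given in an earlier section. The second call shows how R2 becomes negative (reported as * in Table 2) when an estimate is worse than predicting the mean.

```python
import numpy as np

def od_metrics(y_true, y_pred):
    """RMSE, MAE, R2 and explained variance over all OD pairs."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # R2 compares the residual sum of squares with the spread of the truth.
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = float(1.0 - np.sum(err ** 2) / ss_tot)
    # Explained variance ignores a constant bias in the errors
    # (assumed meaning of "var" in Table 2).
    explained_var = float(1.0 - np.var(err) / np.var(y_true))
    return {"rmse": rmse, "mae": mae, "r2": r2,
            "explained_var": explained_var}

# Tiny 2x2 OD matrix with a zero diagonal (illustrative numbers only).
real_od = [[0, 2], [4, 0]]
good_est = [[0, 3], [3, 0]]
bad_est = [[10, 10], [10, 10]]

print(od_metrics(real_od, good_est))
print(od_metrics(real_od, bad_est)["r2"])  # negative R2
```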

In addition, Figure 9 and Figure 10 show the estimation capability of C-GAME on OD flows and traffic counts. Figure 9 reflects the capability of estimation in scenario 1: 3000 pairs of real and estimated values, namely state points, are randomly selected from the results and fitted to a straight line, whose slope and correlation are both close to 1.
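The sampling and line fitting behind such a scatter plot could look like the sketch below. It assumes an ordinary least-squares fit and the Pearson correlation coefficient, which the text does not specify; the function name `fit_state_points` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_state_points(real, est, n_samples=3000, seed=0):
    """Sample (real, estimated) pairs and fit a straight line to them."""
    rng = np.random.default_rng(seed)
    real = np.asarray(real, dtype=float).ravel()
    est = np.asarray(est, dtype=float).ravel()
    n = min(n_samples, real.size)
    # Draw state points without replacement.
    idx = rng.choice(real.size, size=n, replace=False)
    # Least-squares line: est ~ slope * real + intercept.
    slope, intercept = np.polyfit(real[idx], est[idx], deg=1)
    corr = np.corrcoef(real[idx], est[idx])[0, 1]
    return float(slope), float(intercept), float(corr)

# Synthetic example: estimates equal to the truth plus small noise.
rng = np.random.default_rng(1)
real_flows = rng.uniform(0.0, 50.0, size=5000)
est_flows = real_flows + rng.normal(0.0, 0.5, size=5000)
print(fit_state_points(real_flows, est_flows))
```

A slope and correlation close to 1, together with an intercept close to 0, indicate that the state points lie near the line of completely accurate estimation.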

Among the baselines, the table shows that SPSA and CycleGAN perform better in the Haikou network, so these two models are selected for comparison in Figure 10. The straight line represents completely accurate estimation, and the closer the state points are to the line, the better the model performs. The state points of C-GAME in both OD and traffic count estimation are generally distributed around the standard line, while the two benchmarks produce outliers that deviate from it, illustrating their poor accuracy on a few OD pairs. For the SPSA model, the larger the OD flows are, the more dispersed the results become, which is consistent with the inference that SPSA performs poorly in congested networks. To sum up, the proposed neural network C-GAME achieves good results in both test scenarios.

4.4.3 Loss Curve

The loss curve shows the rate of convergence, and we compare the proposed model with the benchmark neural networks. The total training process includes 100 epochs in both cases, and the loss curve shows that C-GAME converges quickly to a low loss value with small fluctuations.

5 Conclusion

This paper develops a novel neural network, C-GAME, to address OD estimation. It consists of Graph Matcher units that exchange information in the underlying space.

We construct experiments on a 6×6 grid network and a real-size network in Haikou with the help of the microscopic traffic simulator SUMO. Because deep learning has rarely been applied to this problem, two classic architectures that we consider suitable for it, GCN+TCN and CycleGAN, are adapted as benchmarks. In addition, the classical methods in traffic demand estimation, Simultaneous Perturbation Stochastic Approximation (SPSA) and the Kalman filter (KF), are also compared with the proposed model. The proposed model shows satisfactory performance, with the ability not only to learn general distributions but also to capture discrete high-demand elements in the OD matrix, which makes the neural network approach deployable for demand prediction in complicated urban road networks. In the principle of our model,

Figure 10: Scatter plot of real and estimated OD flows and traffic counts in Scenario 2

Figure 11: Loss Curve in Scenario 1 and Scenario 2

we consider that highly related states in the coupling space vary in the same direction in the representation space. Along this line, further research could be extended to more inference tasks.

Acknowledgments

The authors acknowledge support from the Center of High Performance Computing, Tsinghua University.

References

[1] Jaume Barceló et al. Fundamentals of traffic simulation, volume 145. Springer, 2010.

[2] Krishna NS Behara, Ashish Bhaskar, and Edward Chung. A novel methodology to assimilate sub-path flows in bi-level OD matrix estimation process. IEEE Transactions on Intelligent Transportation Systems, 2020.

[3] Lingbo Liu, Zhilin Qiu, Guanbin Li, Qing Wang, Wanli Ouyang, and Liang Lin. Contextualized spatial–temporal network for taxi origin-destination demand prediction. IEEE Transactions on Intelligent Transportation Systems, 20(10):3875–3887, 2019.

[4] Junchen Ye, Leilei Sun, Bowen Du, Yanjie Fu, Xinran Tong, and Hui Xiong. Co-prediction of multiple transportation demands based on deep spatio-temporal neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 305–313, 2019.

[5] Torgil Abrahamsson. Estimation of origin-destination matrices using traffic counts-a literature survey. 1998.

[6] Panchamy Krishnakumari, Hans van Lint, Tamara Djukic, and Oded Cats. A data driven method for OD matrix estimation. Transportation Research Procedia, 38:139–159, 2019.

[7] Xuesong Zhou and Hani S Mahmassani. Dynamic origin-destination demand estimation using automatic vehicle identification data. IEEE Transactions on Intelligent Transportation Systems, 7(1):105–114, 2006.

[8] Masao Kuwahara and Edward C Sullivan. Estimating origin-destination matrices from roadside survey data. Transportation Research Part B: Methodological, 21(3):233–248, 1987.

[9] Jingtao Ma, Huan Li, Fang Yuan, and Thomas Bauer. Deriving operational origin-destination matrices from large scale mobile phone data. International Journal of Transportation Science and Technology, 2(3):183–204, 2013.

[10] Md Shahadat Iqbal, Charisma F Choudhury, Pu Wang, and Marta C González. Development of origin–destination matrices using mobile phone call data. Transportation Research Part C: Emerging Technologies, 40:63–74, 2014.

[11] Patrick Bonnel, Mariem Fekih, and Zbigniew Smoreda. Origin-destination estimation using mobile network probe data. Transportation Research Procedia, 32:69–81, 2018.

[12] Katharina Parry and Martin L Hazelton. Estimation of origin–destination matrices from link counts and sporadic routing data. Transportation Research Part B: Methodological, 46(1):175–188, 2012.

[13] Wenming Rao, Yao-Jan Wu, Jingxin Xia, Jishun Ou, and Robert Kluger. Origin-destination pattern estimation based on trajectory reconstruction using automatic license plate recognition data. Transportation Research Part C: Emerging Technologies, 95:29–46, 2018.

[14] Yumin Cao, Keshuang Tang, Jian Sun, and Yangbeibei Ji. Day-to-day dynamic origin–destination flow estimation using connected vehicle trajectories and automatic vehicle identification data. Transportation Research Part C: Emerging Technologies, 129:103241, 2021.

[15] Martin L Hazelton. Inference for origin–destination matrices: estimation, prediction and reconstruction. Transportation Research Part B: Methodological, 35(7):667–676, 2001.

[16] Dietmar Bauer, Gerald Richter, Johannes Asamer, Bernhard Heilmann, Gernot Lenz, and Robert Kölbl. Quasi-dynamic estimation of OD flows from traffic counts without prior OD matrix. IEEE Transactions on Intelligent Transportation Systems, 19(6):2025–2034, 2017.

[17] Athina Tympakianaki, Haris N Koutsopoulos, and Erik Jenelius. c-SPSA: Cluster-wise simultaneous perturbation stochastic approximation algorithm and its application to dynamic origin–destination matrix estimation. Transportation Research Part C: Emerging Technologies, 55:231–245, 2015.

[18] Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926, 2017.

[19] Zhengchao Zhang, Meng Li, Xi Lin, Yinhai Wang, and Fang He. Multistep speed prediction on traffic networks: A deep learning approach considering spatio-temporal dependencies. Transportation Research Part C: Emerging Technologies, 105:297–322, 2019.

[20] Ling Zhao, Yujiao Song, Chao Zhang, Yu Liu, Pu Wang, Tao Lin, Min Deng, and Haifeng Li. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Transactions on Intelligent Transportation Systems, 21(9):3848–3858, 2019.

[21] Guanjie Zheng, Yuanhao Xiong, Xinshi Zang, Jie Feng, Hua Wei, Huichu Zhang, Yong Li, Kai Xu, and Zhenhui Li. Learning phase competition for traffic signal control. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1963–1972, 2019.

[22] Wendelin Böhmer, Vitaly Kurin, and Shimon Whiteson. Deep coordination graphs. In International Conference on Machine Learning, pages 980–991. PMLR, 2020.

[23] Julian Stier and Michael Granitzer. DeepGG: A deep graph generator. In International Symposium on Intelligent Data Analysis, pages 313–324. Springer, 2021.

[24] Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, Charlie Nash, William L Hamilton, David Duvenaud, Raquel Urtasun, and Richard S Zemel. Efficient graph generation with graph recurrent attention networks. arXiv preprint arXiv:1910.00760, 2019.

[25] Kai Lei, Meng Qin, Bo Bai, Gong Zhang, and Min Yang. GCN-GAN: A non-linear temporal link prediction model for weighted dynamic networks. In IEEE INFOCOM 2019-IEEE Conference on Computer Communications, pages 388–396. IEEE, 2019.

[26] Hongwei Wang, Jia Wang, Jialin Wang, Miao Zhao, Weinan Zhang, Fuzheng Zhang, Xing Xie, and Minyi Guo. GraphGAN: Graph representation learning with generative adversarial nets. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.

[27] Shirui Pan, Ruiqi Hu, Sai-fu Fung, Guodong Long, Jing Jiang, and Chengqi Zhang. Learning graph embedding with adversarial training methods. IEEE Transactions on Cybernetics, 50(6):2475–2487, 2019.

[28] Nicola De Cao and Thomas Kipf. MolGAN: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973, 2018.

[29] Aleksandar Bojchevski, Oleksandr Shchur, Daniel Zügner, and Stephan Günnemann. NetGAN: Generating graphs via random walks. In International Conference on Machine Learning, pages 610–619. PMLR, 2018.

[30] Bahare Fatemi, Layla El Asri, and Seyed Mehran Kazemi. SLAPS: Self-supervision improves structure learning for graph neural networks. arXiv preprint arXiv:2102.05034, 2021.

[31] Wei Ma, Xidong Pi, and Sean Qian. Estimating multi-class dynamic origin-destination demand through a forward-backward algorithm on computational graphs. Transportation Research Part C: Emerging Technologies, 119:102747, 2020.

[32] Konstantinos Perrakis, Dimitris Karlis, Mario Cools, Davy Janssens, Koen Vanhoof, and Geert Wets. A Bayesian approach for modeling origin–destination matrices. Transportation Research Part A: Policy and Practice, 46(1):200–212, 2012.

[33] Krishna NS Behara, Ashish Bhaskar, and Edward Chung. A DBSCAN-based framework to mine travel patterns from origin-destination matrices: Proof-of-concept on proxy static OD from Brisbane. Transportation Research Part C: Emerging Technologies, 131:103370, 2021.

[34] Constantinos Antoniou, Jaume Barceló, Martijn Breen, Manuel Bullejos, Jordi Casas, Ernesto Cipriani, Biagio Ciuffo, Tamara Djukic, Serge Hoogendoorn, Vittorio Marzano, et al. Towards a generic benchmarking platform for origin–destination flows estimation/updating algorithms: Design, demonstration and validation. Transportation Research Part C: Emerging Technologies, 66:79–98, 2016.

[35] Hu Shao, William HK Lam, Agachai Sumalee, and Martin L Hazelton. Estimation of mean and covariance of stochastic multi-class OD demands from classified traffic counts. Transportation Research Procedia, 7:192–211, 2015.

[36] Lorenzo Mussone and Matteo Matteucci. OD matrices estimation from link flows by neural networks and PCA. IFAC Proceedings Volumes, 39(12):165–170, 2006.

[37] Leonardo Caggiani, Mauro Dell’Orco, Mario Marinelli, and Michele Ottomanelli. A metaheuristic dynamic traffic assignment model for OD matrix estimation using aggregate data. Procedia-Social and Behavioral Sciences, 54:685–695, 2012.

[38] Jingtao Ma, Huan Li, Fang Yuan, and Thomas Bauer. Deriving operational origin-destination matrices from large scale mobile phone data. International Journal of Transportation Science and Technology, 2(3):183–204, 2013.

[39] Hing Keung William Lam, Hu Shao, Shuhan Cao, and Hai Yang. Origin-destination demand estimation models. In Encyclopedia of Transportation, pages 515–518. Elsevier Ltd., 2021.

[40] Peng Cao, Tomio Miwa, Toshiyuki Yamamoto, and Takayuki Morikawa. Bilevel generalized least squares estimation of dynamic origin–destination matrix for urban network with probe vehicle data. Transportation Research Record, 2333(1):66–73, 2013.

[41] Mihails Savrasovs and Irina Pticina. Methodology of OD matrix estimation based on video recordings and traffic counts. Procedia Engineering, 178:289–297, 2017.

[42] Marcela A Munizaga and Carolina Palma. Estimation of a disaggregate multimodal public transport origin–destination matrix from passive smartcard data from Santiago, Chile. Transportation Research Part C: Emerging Technologies, 24:9–18, 2012.

[43] Wei Shen and Laura Wynter. A new one-level convex optimization approach for estimating origin–destination demand. Transportation Research Part B: Methodological, 46(10):1535–1555, 2012.

[44] Chung-Cheng Lu, Xuesong Zhou, and Kuilin Zhang. Dynamic origin–destination demand flow estimation under congested traffic conditions. Transportation Research Part C: Emerging Technologies, 34:16–37, 2013.

[45] Ennio Cascetta. Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator. Transportation Research Part B: Methodological, 18(4-5):289–299, 1984.

[46] Michael GH Bell. The estimation of origin-destination matrices by constrained generalised least squares. Transportation Research Part B: Methodological, 25(1):13–22, 1991.

[47] HJ Van Zuylen. A method to estimate a trip matrix from traffic volume counts. In PTRC Summer Annual Meeting, 1978.

[48] Yongxuan Huang Yong Lin, Yuanli Cai. GLS model based dynamic origin-destination matrix estimation for traffic system. System Engineering Theory and Practice, 24(1):136–140.

[49] Yu Nie, HM Zhang, and WW Recker. Inferring origin–destination trip matrices with a decoupled GLS path flow estimator. Transportation Research Part B: Methodological, 39(6):497–518, 2005.

[50] LG Willumsen. Estimating the most likely OD matrix from traffic counts. In 11th Annual Conference of Universities Transport Studies Group, University of Southampton, United Kingdom, 1979.

[51] Michael J Maher. Inferences on trip matrices from observations on link volumes: a Bayesian statistical approach. Transportation Research Part B: Methodological, 17(6):435–447, 1983.

[52] Kalidas Ashok. Dynamic origin-destination matrix estimation and prediction for real-time traffic management system. In 12th International Symposium on Transportation and Traffic Theory, 1993, pages 465–484, 1993.

[53] Gang-Len Chang and Jifeng Wu. Recursive estimation of time-varying origin-destination flows from traffic counts in freeway corridors. Transportation Research Part B: Methodological, 28(2):141–160, 1994.

[54] G-L Chang and Xianding Tao. Estimation of dynamic OD distributions for urban networks. In Transportation and traffic theory (Lyon, 24-26 July 1996), pages 1–20, 1996.

[55] Vittorio Marzano, Andrea Papola, Fulvio Simonelli, and Markos Papageorgiou. A Kalman filter for quasi-dynamic OD flow estimation/updating. IEEE Transactions on Intelligent Transportation Systems, 19(11):3604–3612, 2018.

[56] Ramachandran Balakrishna and Haris N Koutsopoulos. Incorporating within-day transitions in simultaneous offline estimation of dynamic origin-destination flows without assignment matrices. Transportation Research Record, 2085(1):31–38, 2008.

[57] Ernesto Cipriani, Michael Florian, Michael Mahut, and Marialisa Nigro. A gradient approximation approach for adjusting temporal origin–destination matrices. Transportation Research Part C: Emerging Technologies, 19(2):270–282, 2011.

[58] Ramachandran Balakrishna, Moshe Ben-Akiva, and Haris N Koutsopoulos. Offline calibration of dynamic traffic assignment: simultaneous demand-and-supply estimation. Transportation Research Record, 2003(1):50–58, 2007.

[59] Athina Tympakianaki, Haris N Koutsopoulos, and Erik Jenelius. Robust SPSA algorithms for dynamic OD matrix estimation. Procedia Computer Science, 130:57–64, 2018.

[60] Zhejun Gong. Estimating the urban OD matrix: A neural network approach. European Journal of Operational Research, 106(1):108–115, 1998.

[61] Jishun Ou, Jiawei Lu, Jingxin Xia, Chengchuan An, and Zhenbo Lu. Learn, assign, and search: real-time estimation of dynamic origin-destination flows using machine learning algorithms. IEEE Access, 7:26967–26983, 2019.

[62] Mussone Lorenzo and Matteucci Matteo. OD matrices network estimation from link counts by neural networks. Journal of Transportation Systems Engineering and Information Technology, 13(4):84–92, 2013.

[63] Tamara Djukic, Gunnar Flötteröd, Hans Van Lint, and Serge Hoogendoorn. Efficient real time OD matrix estimation based on principal component analysis. In 2012 15th International IEEE Conference on Intelligent Transportation Systems, pages 115–121. IEEE, 2012.

[64] Jiangtao Ren and Qiwei Xie. Efficient OD trip matrix prediction based on tensor decomposition. In 2017 18th IEEE International Conference on Mobile Data Management (MDM), pages 180–185. IEEE, 2017.

[65] Luis G Willumsen. An entropy maximising model for estimating trip matrices from traffic counts. PhD thesis, University of Leeds, 1981.
