
RESEARCH ARTICLE

Large-Scale Transportation Network Congestion Evolution Prediction Using Deep Learning Theory

Xiaolei Ma1,2, Haiyang Yu1,2*, Yunpeng Wang1,2, Yinhai Wang3

1 School of Transportation Science and Engineering, Beijing Key Laboratory for Cooperative Vehicle Infrastructure, Systems, and Safety Control, Beihang University, Beijing, China, 2 Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, SiPaiLou #2, Nanjing, China, 3 Department of Civil and Environmental Engineering, University of Washington, Seattle, Washington, United States of America

* [email protected]

Abstract

Understanding how congestion at one location can cause ripples throughout a large-scale transportation network is vital for transportation researchers and practitioners to pinpoint traffic bottlenecks for congestion mitigation. Traditional studies rely on either mathematical equations or simulation techniques to model traffic congestion dynamics. However, most of these approaches have limitations, largely due to unrealistic assumptions and cumbersome parameter calibration processes. With the development of Intelligent Transportation Systems (ITS) and the Internet of Things (IoT), transportation data have become increasingly ubiquitous. This has triggered a series of data-driven studies investigating transportation phenomena. Among them, deep learning theory is considered one of the most promising techniques for handling very large volumes of high-dimensional data. This study attempts to extend deep learning theory into large-scale transportation network analysis. A deep Restricted Boltzmann Machine and Recurrent Neural Network architecture is utilized to model and predict traffic congestion evolution based on Global Positioning System (GPS) data from taxis. A numerical study in Ningbo, China is conducted to validate the effectiveness and efficiency of the proposed method. Results show that the prediction accuracy can reach as high as 88% in less than 6 minutes when the model is implemented in a Graphics Processing Unit (GPU)-based parallel computing environment. The predicted congestion evolution patterns can be visualized temporally and spatially through a map-based platform to identify vulnerable links for proactive congestion mitigation.

PLOS ONE | DOI:10.1371/journal.pone.0119044 March 17, 2015 1 / 17

OPEN ACCESS

Citation: Ma X, Yu H, Wang Y, Wang Y (2015) Large-Scale Transportation Network Congestion Evolution Prediction Using Deep Learning Theory. PLoS ONE 10(3): e0119044. doi:10.1371/journal.pone.0119044

Academic Editor: Jesus Gomez-Gardenes, Universidad de Zaragoza, SPAIN

Received: June 9, 2014

Accepted: January 9, 2015

Published: March 17, 2015

Copyright: © 2015 Ma et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are within the paper and its Supporting Information files.

Funding: This research is supported by the Beijing Nova Programme, National Natural Science Foundation of China (Grant No. 51308021, No. 51408019, and No. 71402011) and National Basic Research Program of China (2012CB725404). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

http://creativecommons.org/licenses/by/4.0/

Introduction

Traffic congestion costs billions of dollars each year due to lost time, wasted fuel, excessive air pollution, and reduced productivity. The 2012 Urban Mobility Report indicates that the annual average delay per person was 38 hours in 2011 for the 498 surveyed urban areas, which is equivalent to a 238% increase compared to that in 1982. Traffic congestion incurred a total of 5.5 billion hours of travel delays and 2.9 billion gallons of extra fuel consumption in 2009, which corresponds to a congestion cost of 121 billion dollars [1].

Diagnosing congestion onset and predicting congestion evolution patterns are considered strategic countermeasures to locate traffic bottlenecks and adopt proactive measures for congestion mitigation. Many research efforts have been made to achieve these goals [2, 3]. However, most previous studies tend to view congestion spots separately in a small-scale network. As mentioned by Yang [4], the number of network links in almost all existing traffic congestion prediction methods does not exceed 100. In addition, these studies rely on either mathematical equations or simulation techniques to depict network congestion evolution. This often results in suboptimality, since transportation activities involve human factors that are difficult to represent or model accurately using mathematics-driven approaches. Previous network-wide congestion studies mainly resort to either complex network theory [5–11] or visualization techniques [12] to understand the evolution of network-wide traffic congestion. In complex network theory, transportation networks can be abstracted as scale-free networks [9], and traffic flow dynamics over the network are generated based on a power law distribution [11]. However, these assumptions do not always hold in reality, and such studies lack sufficient traffic sensor data to validate their findings. Visualization techniques can intuitively display the spatial and temporal distribution of network congestion through a map-based platform, but are incapable of explaining the mechanism of congestion generation or predicting the future trend of congestion evolution.

Over the past decades, a tremendous number of traffic sensors have been deployed on existing freeway networks, generating a huge amount of data at relatively high time resolutions. The increasing availability of network data makes it possible to simultaneously examine traffic flows on a large-scale roadway network and observe the evolution of congestion on that network through data mining techniques. A transportation network consists of thousands of links whose traffic conditions change over time. This is equivalent to a high-dimensional space in which congestion propagates temporally and spatially, and it is challenging to model using traditional data mining approaches due to the curse of dimensionality: when the input dimensionality increases, the required training data grow exponentially [13]. The recent emergence of deep learning theory can address the curse of dimensionality through distributed representations, and thereby holds great promise for learning high-dimensional features from very large datasets. Compared to shallow learning architectures, deep learning is able to model complex non-linear phenomena using distributed and hierarchical feature representations [14, 15], and has achieved numerous successes in the domains of computer vision, speech recognition, natural language processing and music composition [16]. Deep learning theory has also begun to exhibit its superiority in predicting traffic flow over a single road segment [14]. However, to the best of our knowledge, no research has been conducted to apply deep learning theory to large-scale transportation network modeling and analysis. To assist transportation professionals in congestion diagnosis and operational strategy assessment, this study aims at developing a data-driven and network-wide congestion analysis paradigm based on traffic sensor measurements using deep learning theory.

The remainder of this paper is organized as follows: Section 2 presents a methodological framework to model network-wide traffic congestion, where GPS data are collected to depict the roadway traffic condition and a deep learning architecture combining a Recurrent Neural Network and a Restricted Boltzmann Machine (RNN-RBM) is introduced to predict the temporal-spatial congestion evolution pattern. To validate the effectiveness of the proposed algorithm, a numerical experiment with a city roadway network of over 500 links is undertaken in Section 3. The prediction results are demonstrated through a GIS platform to observe the future congestion evolution, followed by a sensitivity analysis with various congestion speed thresholds. In addition, a comparative study with conventional neural network and support vector machine methods is conducted to demonstrate the advantage of the proposed algorithm. Finally, discussion and conclusions are presented in the last section.

Methodology

Traffic condition on each link is classified into a binary state, where 1 represents congested and 0 represents uncongested. The link traffic condition changes over time, and essentially becomes a binary sequence with varying length. Considering the size of a typical transportation network, the network-wide congestion pattern is equivalent to a high-dimensional matrix arranged in time (number of time steps) and space (number of links). The key to a network-wide traffic congestion evolution prediction model is to understand how each matrix element evolves spatially and temporally. The initial step is to identify the traffic condition of each link using taxi GPS data. Missing and erroneous data are reviewed and corrected to ensure satisfactory quality. The processed GPS data are then aggregated into varying time intervals to represent network-wide traffic patterns. Inspired by a polyphonic music generation and transcription approach [17], a combination of Restricted Boltzmann Machine (RBM) and Recurrent Neural Network (RNN) is then utilized to learn the high-dimensional traffic congestion patterns. To take full advantage of the Graphics Processing Unit (GPU)-based parallel computing architecture, the proposed algorithm is implemented in the NVIDIA Compute Unified Device Architecture (CUDA) environment to accelerate the model learning procedure.

Transportation Network Representation

A GPS-equipped taxi can record its location and timestamp information, and the travel speed can be directly measured. For each link n, the traffic condition at time t is denoted as $c_n^t$: if the average speed is lower than a threshold (20 km/hour), the link is considered congested and $c_n^t$ is set to 1; otherwise it is set to 0. Due to sampling error, GPS-equipped taxis may not cover the entire network at every time interval. For links without GPS data, the vehicle speed is calculated from the historical records for the same link. Based on the above discussion, traffic congestion for a network with N links within T time intervals can be expressed as:

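$$
\begin{bmatrix}
c_1^1 & c_1^2 & \cdots & c_1^T \\
c_2^1 & c_2^2 & \cdots & c_2^T \\
\vdots & \vdots & \ddots & \vdots \\
c_N^1 & c_N^2 & \cdots & c_N^T
\end{bmatrix}
\tag{1}
$$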

where $c_n^t$ represents the traffic congestion condition on the nth link at time t, which is a binary value. The next step is to predict the elements in each row (link). Predicting a single row can be treated as a single sequence learning problem. When multiple rows (links) are predicted simultaneously, this is known as a high-dimensional sequence learning problem, for which a deep learning architecture with temporal processing capabilities is desired.
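As an illustration of the matrix in (Equation 1), the following minimal sketch thresholds link speeds into binary congestion states. The function name `congestion_matrix`, the `historical` fallback list and the toy speeds are assumptions made for this example only; the paper does not specify how historical records are stored.

```python
from typing import List, Optional


def congestion_matrix(
    speeds: List[List[Optional[float]]],
    historical: List[float],
    threshold_kmh: float = 20.0,
) -> List[List[int]]:
    """Return an N x T binary matrix in the layout of Equation 1.

    speeds[n][t] is the average taxi speed (km/hour) on link n in time
    interval t, or None when no GPS record covers that link.
    historical[n] is a hypothetical fallback speed for link n.
    """
    matrix = []
    for n, row in enumerate(speeds):
        # Fill gaps with the historical speed of the same link.
        filled = [historical[n] if v is None else v for v in row]
        # 1 = congested (speed below the threshold), 0 = uncongested.
        matrix.append([1 if v < threshold_kmh else 0 for v in filled])
    return matrix


# Toy example: 2 links, 3 time intervals.
speeds = [
    [35.0, 18.5, None],  # link 1
    [12.0, None, 22.0],  # link 2
]
C = congestion_matrix(speeds, historical=[25.0, 15.0])
```

Each row of `C` is the binary sequence of one link, and each column is a network-wide snapshot at one time interval.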

RNN-RBM: A deep learning architecture for high-dimensional temporal sequence prediction

Restricted Boltzmann Machine (RBM)

A Restricted Boltzmann Machine (RBM) is an energy-based model that includes one visible layer and one hidden layer [15]. As demonstrated in Fig. 1, the units in the visible layer and the units in the hidden layer are mutually connected.



In Fig. 1, $n_v$ and $n_h$ respectively represent the number of units in the visible layer and the hidden layer. $v = (v_1, v_2, \ldots, v_{n_v})^T$ represents the state vector for the visible layer, where $v_i$ is the state of the ith unit in the visible layer. $h = (h_1, h_2, \ldots, h_{n_h})^T$ represents the state vector for the hidden layer, where $h_i$ is the state of the ith unit in the hidden layer. $a = (a_1, a_2, \ldots, a_{n_v})^T$ represents the bias vector for the visible layer, where $a_i$ is the bias of the ith unit in the visible layer. $b = (b_1, b_2, \ldots, b_{n_h})^T$ represents the bias vector for the hidden layer, where $b_i$ is the bias of the ith unit in the hidden layer. $W = (w_{i,j}) \in \mathbb{R}^{n_h \times n_v}$ represents the weight matrix between the visible layer and the hidden layer, where $w_{i,j}$ is the weight connecting the ith unit in the hidden layer and the jth unit in the visible layer. The units in both hidden and visible layers are binary-valued.

Based on the above notations, the energy function for a given (v,h) is defined as:

$$E_\theta(v, h) = -a^T v - b^T h - h^T W v \tag{2}$$

The joint probability distribution function can be expressed as:

$$P_\theta(v, h) = \frac{1}{Z_\theta} e^{-E_\theta(v, h)} \tag{3}$$

Fig 1. RBM Structure.

doi:10.1371/journal.pone.0119044.g001

In (Equation 3), $\theta = (W, a, b)$, and $Z_\theta = \sum_{v,h} e^{-E_\theta(v, h)}$ is defined as the partition function that normalizes the probability function. The conditional probability that hidden unit $h_k$ is activated given the visible vector $v$, and the conditional probability that visible unit $v_k$ is activated given the hidden vector $h$, can be respectively calculated as:

$$P(h_k = 1 \mid v) = \mathrm{sigmoid}\Big(b_k + \sum_{i=1}^{n_v} w_{k,i} v_i\Big) \tag{4}$$

$$P(v_k = 1 \mid h) = \mathrm{sigmoid}\Big(a_k + \sum_{i=1}^{n_h} w_{i,k} h_i\Big) \tag{5}$$

Here sigmoid denotes the sigmoid activation function. Because there are no connections among units within the visible layer or within the hidden layer, both conditional probability functions factorize as:

$$P(h \mid v) = \prod_{k=1}^{n_h} P(h_k \mid v) \tag{6}$$

$$P(v \mid h) = \prod_{k=1}^{n_v} P(v_k \mid h) \tag{7}$$
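The energy function and the two conditional probabilities can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' code: it assumes the standard RBM form in which the hidden bias multiplies h, stores W as a list of $n_h$ rows of length $n_v$, and uses toy parameter values chosen only for the example.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def energy(v: List[int], h: List[int], W: List[List[float]],
           a: List[float], b: List[float]) -> float:
    """E(v, h) = -a^T v - b^T h - h^T W v, with W of shape n_h x n_v."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_k * h_k for b_k, h_k in zip(b, h))
    interaction = sum(
        h[k] * W[k][i] * v[i] for k in range(len(h)) for i in range(len(v))
    )
    return -visible_term - hidden_term - interaction


def p_hidden_given_visible(v, W, b) -> List[float]:
    """P(h_k = 1 | v) for every hidden unit k (Equation 4)."""
    return [
        sigmoid(b[k] + sum(W[k][i] * v[i] for i in range(len(v))))
        for k in range(len(b))
    ]


def p_visible_given_hidden(h, W, a) -> List[float]:
    """P(v_k = 1 | h) for every visible unit k (Equation 5)."""
    return [
        sigmoid(a[k] + sum(W[i][k] * h[i] for i in range(len(h))))
        for k in range(len(a))
    ]


# Toy RBM with 3 visible and 2 hidden units (values are illustrative).
W = [[0.5, -0.5, 0.0], [0.0, 1.0, -1.0]]
a = [0.0, 0.0, 0.0]
b = [0.0, 0.0]
E = energy([1, 0, 1], [1, 0], W, a, b)
p_h = p_hidden_given_visible([1, 0, 1], W, b)
p_v = p_visible_given_hidden([1, 0], W, a)
```

Because the units within a layer are conditionally independent, each list can be sampled element by element, which is what makes Gibbs sampling in an RBM inexpensive.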

To find the parameters θ, RBM is required to maximize the probability of training set V:

$$\arg\max_\theta \prod_{v \in V} P(v) \tag{8}$$

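This is equivalent to maximizing the log-likelihood of P(v). A common optimization method is gradient ascent on the log-likelihood (equivalently, gradient descent on its negative). The update rule is given in (Equation 9), where the derivative of ln P(v) is first calculated as in (Equation 10).

$$\theta = \theta + \eta \frac{\partial \ln P(v)}{\partial \theta} \tag{9}$$

$$\frac{\partial \ln P(v)}{\partial \theta} = -\left\langle \frac{\partial E(v, h)}{\partial \theta} \right\rangle_{P(h \mid v)} + \left\langle \frac{\partial E(v, h)}{\partial \theta} \right\rangle_{P(v, h)} \tag{10}$$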

where η is the learning rate, which controls the speed of convergence toward the optimal solution, and $\langle \cdot \rangle_P$ denotes the expectation with respect to the probability distribution P. Because solving (Equation 8) exactly is computationally intensive, Hinton [18] proposed the well-known Contrastive Divergence (CD) approach to train RBMs efficiently. The key of this algorithm is to perform n-step Gibbs sampling to approximate the expectation in (Equation 10), so that the gradient can be simplified into the following three equations:

$$
\frac{\partial \ln P(v)}{\partial \theta} =
\begin{cases}
\dfrac{\partial \ln P(v)}{\partial w_{i,j}} \approx P(h_i = 1 \mid v^{(0)})\, v_j^{(0)} - P(h_i = 1 \mid v^{(n)})\, v_j^{(n)} \\[2ex]
\dfrac{\partial \ln P(v)}{\partial a_i} \approx v_i^{(0)} - v_i^{(n)} \\[2ex]
\dfrac{\partial \ln P(v)}{\partial b_i} \approx P(h_i = 1 \mid v^{(0)}) - P(h_i = 1 \mid v^{(n)})
\end{cases}
\tag{11}
$$

The probability distribution function for RBM can be modified into a conditional probability distribution function that depends on the previous state of the RBM [19], and this leads to a variant of the RBM model for efficiently modeling complex time series [20].
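The Contrastive Divergence update for a plain (non-conditional) RBM can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Theano implementation: the function `cd_update`, the in-place list updates and the seeded random generator are choices made for this example, while the learning rate 0.05 is the value reported later in the paper.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, b) -> List[float]:
    """P(h_i = 1 | v); W has n_h rows and n_v columns."""
    return [sigmoid(b[i] + sum(W[i][j] * v[j] for j in range(len(v))))
            for i in range(len(b))]


def visible_probs(h, W, a) -> List[float]:
    """P(v_j = 1 | h)."""
    return [sigmoid(a[j] + sum(W[i][j] * h[i] for i in range(len(h))))
            for j in range(len(a))]


def sample(probs, rng) -> List[int]:
    return [1 if rng.random() < p else 0 for p in probs]


def cd_update(v0, W, a, b, lr=0.05, n_steps=1, rng=None) -> List[int]:
    """One CD-n parameter update in place; returns the sample v^(n)."""
    rng = rng or random.Random(0)
    ph0 = hidden_probs(v0, W, b)
    vn = list(v0)
    for _ in range(n_steps):
        # One Gibbs step: sample h from P(h | v), then v from P(v | h).
        h = sample(hidden_probs(vn, W, b), rng)
        vn = sample(visible_probs(h, W, a), rng)
    phn = hidden_probs(vn, W, b)
    # Gradient approximations from the three CD equations.
    for i in range(len(b)):
        for j in range(len(a)):
            W[i][j] += lr * (ph0[i] * v0[j] - phn[i] * vn[j])
    for j in range(len(a)):
        a[j] += lr * (v0[j] - vn[j])
    for i in range(len(b)):
        b[i] += lr * (ph0[i] - phn[i])
    return vn


# Toy run: 3 visible units, 2 hidden units, all parameters start at zero.
W = [[0.0] * 3 for _ in range(2)]
a = [0.0] * 3
b = [0.0] * 2
v_n = cd_update([1, 0, 1], W, a, b, lr=0.05, n_steps=2, rng=random.Random(1))
```

In practice n = 1 is a common choice, since a single Gibbs step is often enough to obtain a useful gradient direction at much lower cost than running the chain to equilibrium.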



Recurrent Neural Network (RNN)

A Recurrent Neural Network (RNN) is a special member of the neural network family that contains at least one feedback connection, acting as an internal state, from the neurons' outputs to their inputs. This loop structure grants the network the capability of temporal processing and sequence learning. Due to the RNN's short-term memory, it is widely used to model non-linear time series data [21]. Fig. 2 presents a State Space Neural Network (SSNN), which is a widely used RNN structure. Delay units are introduced to feed the previous hidden unit activations back into the neural network together with the inputs. x(t) and h(t) respectively represent the vectors in the input layer and the delay units at time t. h(t+1) and y(t+1) respectively represent the vectors in the hidden layer and the output layer at time t+1. $W_{IH}$ is the weight matrix from the input layer to the hidden layer, $W_{HH}$ is the weight matrix from the delay units to the hidden layer, and $W_{HO}$ is the weight matrix from the hidden layer to the output layer. A hidden layer connects the input layer and the output layer. The hidden states are delayed by 1 time step, and fed back to the network as additional inputs. This procedure can be described in (Equation 12) and (Equation 13).

$$h(t+1) = \sigma\big(W_{IH}\, x(t) + W_{HH}\, h(t)\big) \tag{12}$$

$$y(t+1) = \sigma\big(W_{HO}\, h(t+1)\big) \tag{13}$$

where σ(·) denotes the activation function, such as the sigmoid function, the hyperbolic tangent function or a linear function.
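The recurrence above can be illustrated with a short sketch. It assumes a one-step delay and a sigmoid activation for both layers; the helper names `matvec` and `ssnn_step` and the toy weights are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Matrix-vector product for a row-major list-of-lists matrix."""
    return [sum(m * x_i for m, x_i in zip(row, x)) for row in M]


def sigmoid_vec(z: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-z_i)) for z_i in z]


def ssnn_step(x_t: Vector, h_t: Vector, W_IH: Matrix, W_HH: Matrix,
              W_HO: Matrix) -> Tuple[Vector, Vector]:
    """Advance the state space network by one time step."""
    pre = [p + q for p, q in zip(matvec(W_IH, x_t), matvec(W_HH, h_t))]
    h_next = sigmoid_vec(pre)                  # hidden layer update
    y_next = sigmoid_vec(matvec(W_HO, h_next))  # output layer
    return h_next, y_next


# Toy network: 2 inputs, 1 hidden unit, 1 output (illustrative weights).
W_IH = [[1.0, -1.0]]
W_HH = [[0.0]]
W_HO = [[2.0]]

h = [0.0]  # delay units start at zero
outputs = []
for x in [[1.0, 1.0], [0.0, 1.0]]:
    h, y = ssnn_step(x, h, W_IH, W_HH, W_HO)
    outputs.append(y[0])
```

The delay units are simply the variable `h` carried from one loop iteration to the next, which is the feedback connection that gives the network its short-term memory.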

Fig 2. State Space Neural Network Structure.

Training an RNN is similar to training a traditional multilayer Feed Forward Neural Network (FFNN). By unfolding the RNN structure over time, each feedback loop is expanded into a single-layer feed-forward neural network at each time stamp, as shown in Fig. 3. In this case, the RNN can be efficiently trained using the well-known backpropagation algorithm over time, which is also known as BackPropagation Through Time (BPTT). However, BPTT has difficulty training sequences with long-term dependencies, since the unfolded network becomes a deep architecture with multiple layers, which results in vanishing gradient as well as exploding gradient issues [22]. Although a number of algorithms have been proposed to solve these issues, such as the long short-term memory network [23], the problem is still worth investigating in the context of deep learning theory.

RNN-RBM

Both RBM and RNN models have the capability of predicting a temporal sequence. A natural thought is to take advantage of the merits of both models. This motivates a deep architecture called the RNN-RBM model to depict the temporal dependencies in high-dimensional sequences [17]. The architecture of the RNN-RBM is given in Fig. 4:

Fig 3. BackPropagation Through Time for RNN.




The conditional RBM and the RNN are stacked to construct an RNN-RBM model. The conditional RBM is an extension of the conventional RBM, and it is designed to process temporal sequences by providing a feedback loop between the visible layer and the hidden layer. The bias values for both the visible layer and the hidden layer are updated based on the previous visible units [24]. A similar concept can be applied in the RNN-RBM model. $b_v^{(t)}$ and $b_h^{(t)}$ respectively represent the bias vectors for the visible layer and the hidden layer in the RBM model at time t, and are updated through the hidden units $u^{(t-1)}$ in the RNN model at time t-1. Weight matrices $W_{uv}$ and $W_{uh}$ are provided to connect the RNN model and the RBM model. The above procedure can be described in (Equation 14) and (Equation 15).

$$b_v^{(t)} = b_v + W_{uv}\, u^{(t-1)} \tag{14}$$

$$b_h^{(t)} = b_h + W_{uh}\, u^{(t-1)} \tag{15}$$

where $b_v$ and $b_h$ are the initial biases in the visible layer and the hidden layer of the RBM model. The RNN model is unfolded over time, and its hidden units $u^{(t)}$ are computed from the input layer $v^{(t)}$ and the previous hidden units $u^{(t-1)}$; these hidden units in turn supply the time-dependent biases of the RBM model. The activation of the hidden units in the RNN hidden layer can then be calculated as:

$$u^{(t)} = \mathrm{sigmoid}\big(b_u + W_{uu}\, u^{(t-1)} + W_{vu}\, v^{(t)}\big) \tag{16}$$

Fig 4. RBM-RNN Architecture.

The algorithm execution is summarized below:

Step 1: Generate the values of the hidden units in the RNN model using (Equation 16).

Step 2: Update the biases in the RBM model using (Equation 14) and (Equation 15) based on the estimated $u^{(t-1)}$ from Step 1, and calculate the RBM parameters.

Step 3: Calculate the log-likelihood gradient $\partial \ln P(v) / \partial \theta$ in the RBM model using (Equation 11).

Step 4: Propagate the estimated gradient into the RNN model, and update the weights $W_{uv}$ and $W_{vu}$ over time to train the RNN model for prediction.
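Steps 1 and 2 can be sketched for a single time step as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function `rnn_rbm_step`, the parameter dictionary and the toy dimensions are hypothetical, the visible bias is assumed to use $W_{uv}$ as described in the text, and the gradient computation of Steps 3 and 4 is omitted.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, x: Vector) -> Vector:
    return [sum(m * x_i for m, x_i in zip(row, x)) for row in M]


def add(x: Vector, y: Vector) -> Vector:
    return [p + q for p, q in zip(x, y)]


def rnn_rbm_step(v_t: Vector, u_prev: Vector,
                 p: Dict[str, list]) -> Tuple[Vector, Vector, Vector]:
    """Return (b_v^(t), b_h^(t), u^(t)) for one time step.

    Shapes: W_uv is n_v x n_u, W_uh is n_h x n_u,
            W_uu is n_u x n_u, W_vu is n_u x n_v.
    """
    # Time-dependent RBM biases driven by the previous RNN hidden state.
    b_v_t = add(p["b_v"], matvec(p["W_uv"], u_prev))
    b_h_t = add(p["b_h"], matvec(p["W_uh"], u_prev))
    # RNN hidden state for the current time step.
    pre = add(p["b_u"], add(matvec(p["W_uu"], u_prev), matvec(p["W_vu"], v_t)))
    u_t = [1.0 / (1.0 + math.exp(-z)) for z in pre]
    return b_v_t, b_h_t, u_t


# Toy dimensions: 2 visible units, 1 RBM hidden unit, 2 RNN hidden units.
params = {
    "b_v": [0.0, 1.0],
    "b_h": [0.2],
    "b_u": [0.0, 0.0],
    "W_uv": [[0.1, 0.2], [0.3, 0.4]],
    "W_uh": [[0.5, -0.5]],
    "W_uu": [[0.0, 0.0], [0.0, 0.0]],
    "W_vu": [[0.0, 0.0], [0.0, 0.0]],
}
b_v_t, b_h_t, u_t = rnn_rbm_step([1.0, 0.0], [1.0, 0.0], params)
```

The returned biases would then parameterize the RBM at time t, on which a Contrastive Divergence step can be run before the gradient is propagated back through the recurrence.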

Temporal-Spatial Congestion Evolution Prediction

As previously mentioned, the traffic condition on each roadway segment can be converted into a binary value denoting congestion status. Considering historical traffic patterns and the number of links in a large-scale traffic network, the congestion evolution prediction problem becomes a high-dimensional temporal sequence learning problem. The RNN-RBM model can be applied to predict the temporal-spatial network congestion evolution pattern. To implement the RNN-RBM model in an efficient manner, the mini-batch gradient descent optimization method is applied: the training set is divided into a number of small sample sets, and each sample set can be processed in parallel using the standard gradient descent optimization method. A GPU is designed on a parallel throughput architecture that can execute multiple tasks concurrently [25]. This feature is especially suitable for accelerating the computation of the RNN-RBM model. CUDA is a parallel computing platform developed by NVIDIA [26], and it offers a flexible programming library for users to manipulate the computational elements in GPUs.
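The mini-batch split described above can be sketched as a small generator. The batch size of 4 and the use of day indices as training samples are assumptions made only for illustration; the paper does not report the batch size it used.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def minibatches(samples: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive mini-batches; the last one may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])


# Hypothetical example: 22 training days split into batches of 4 days.
training_days = list(range(1, 23))
batches = list(minibatches(training_days, batch_size=4))
```

Each batch yields one gradient estimate, so the parameters are updated several times per pass over the training set, and the computation inside a batch is what a GPU can execute in parallel.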

Numerical Study

Model Development

To validate the effectiveness and efficiency of the proposed network-wide congestion prediction approach, a transportation network in Ningbo City, China is used for the numerical experiment. The network consists of 515 road links, on which approximately 4000 GPS-equipped taxis traveled from April 13, 2014 to May 9, 2014. The GPS data are updated every 2 minutes. A series of data quality control methods are applied to remedy missing or erroneous data, which yields 5,521,294 GPS records with speed information. Each GPS record is associated with a unique link ID, and this link ID corresponds to a specific geospatial polyline in a GIS platform. These GPS data are further aggregated into 5-minute, 10-minute, 30-minute and 60-minute time intervals to evaluate the network traffic congestion prediction performance. The data from the first 22 days (i.e. from April 13, 2014 to May 4, 2014) are utilized for training the RNN-RBM model, and the remaining data are used for testing.

Minimizing the cross-entropy error is set as the optimization objective during the RNN-RBM model training procedure. This is because the cross-entropy indicates the distance between the probability distributions of the computed outputs and the target outputs, and thus it is preferable to the mean squared error for a neural network classifier with binary values [27]. The cross-entropy error is defined in (Equation 17), where $c_n^t$ is the ground-truth congestion status and $\hat{c}_n^t$ is the estimated one.

$$\mathrm{Cross\text{-}Entropy\ Error\ (CEE)} = -\frac{1}{T} \sum_{n=1}^{N} \sum_{t=1}^{T} \left[ c_n^t \ln(\hat{c}_n^t) + (1 - c_n^t) \ln(1 - \hat{c}_n^t) \right] \tag{17}$$

In addition, both training accuracy and testing accuracy are presented to measure the algorithm effectiveness. The estimated congestion status for link n at time t is denoted as $\hat{c}_n^t$. According to (Equation 1), the ground-truth congestion status for link n at time t is denoted as $c_n^t$. Because both $\hat{c}_n^t$ and $c_n^t$ are binary values, the prediction accuracy can be defined as in (Equation 18), given a transportation network with N links within T time intervals:

$$\mathrm{Accuracy\ (Acc)} = \left(1 - \frac{\sum_{n=1}^{N} \sum_{t=1}^{T} \left| \hat{c}_n^t - c_n^t \right|}{NT}\right) \times 100\% \tag{18}$$

If a correct result refers to a link that is correctly identified as either congested or uncongested, then the sensitivity and specificity can be respectively defined as:

$$\mathrm{Sensitivity} = \frac{\text{number of links correctly identified as congested}}{\text{total number of congested links}} \tag{19}$$

$$\mathrm{Specificity} = \frac{\text{number of links correctly identified as uncongested}}{\text{total number of uncongested links}} \tag{20}$$
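The four measures can be computed from a ground-truth matrix and a prediction matrix as in the sketch below. It assumes that the cross-entropy is evaluated on predicted congestion probabilities, clipped by a small `eps` to avoid ln(0), and that sensitivity and specificity are reported as percentages; these choices and the toy matrices are assumptions for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def cross_entropy_error(c_true: Matrix, c_prob: Matrix, eps: float = 1e-12) -> float:
    """Cross-entropy over all links and time steps, scaled by 1/T."""
    T = len(c_true[0])
    total = 0.0
    for row_true, row_prob in zip(c_true, c_prob):
        for c, p in zip(row_true, row_prob):
            p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
            total += c * math.log(p) + (1 - c) * math.log(1.0 - p)
    return -total / T


def accuracy(c_true: Matrix, c_pred: Matrix) -> float:
    """Percentage of matrix entries predicted correctly."""
    n_links, n_steps = len(c_true), len(c_true[0])
    errors = sum(abs(p - c) for rt, rp in zip(c_true, c_pred)
                 for c, p in zip(rt, rp))
    return (1.0 - errors / (n_links * n_steps)) * 100.0


def class_rate(c_true: Matrix, c_pred: Matrix, label: int) -> float:
    """Percentage of entries with the given true label predicted correctly.

    Assumes the label occurs at least once in c_true.
    """
    pairs = [(c, p) for rt, rp in zip(c_true, c_pred) for c, p in zip(rt, rp)]
    total = sum(1 for c, _ in pairs if c == label)
    hits = sum(1 for c, p in pairs if c == label and p == label)
    return 100.0 * hits / total


# Toy example: 2 links, 4 time intervals.
c_true = [[1, 0, 1, 0], [0, 0, 1, 1]]
c_pred = [[1, 0, 0, 0], [0, 1, 1, 1]]
acc = accuracy(c_true, c_pred)
sens = class_rate(c_true, c_pred, label=1)  # congested entries
spec = class_rate(c_true, c_pred, label=0)  # uncongested entries
cee = cross_entropy_error(c_true, [[0.5] * 4, [0.5] * 4])
```

Accuracy alone can hide poor detection of the rarer congested state, which is why sensitivity and specificity are reported separately.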

Algorithm runtime is used to evaluate the efficiency of the proposed approach.

Parameters in the RNN-RBM model are set optimally as follows:

Number of hidden units in the RNN model = 100

Number of hidden units in the RBM model = 150

Learning rate for gradient descent optimization method = 0.05

The weight matrix W for the RNN-RBM model is initialized from a Normal distribution with mean = 0 and variance = 0.01, and the other matrices $W_{uu}$, $W_{uv}$, $W_{vu}$ and $W_{uh}$ are initialized from a Normal distribution with mean = 0 and variance = 0.0001. All the bias vectors in the model are initially set to zero.
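The initialization above can be sketched as follows. The number of visible units (one per network link, 515) and the matrix shapes are assumptions inferred from the model equations rather than values stated in the paper; `init_params` and `normal_matrix` are hypothetical helper names.

```python
import random
from typing import Dict, List

N_VISIBLE = 515     # assumption: one visible unit per network link
N_HIDDEN_RBM = 150  # hidden units in the RBM model
N_HIDDEN_RNN = 100  # hidden units in the RNN model


def normal_matrix(rows: int, cols: int, variance: float,
                  rng: random.Random) -> List[List[float]]:
    """Matrix of independent N(0, variance) draws."""
    std = variance ** 0.5  # random.gauss expects a standard deviation
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def init_params(seed: int = 0) -> Dict[str, list]:
    rng = random.Random(seed)
    return {
        "W": normal_matrix(N_HIDDEN_RBM, N_VISIBLE, 0.01, rng),
        "W_uu": normal_matrix(N_HIDDEN_RNN, N_HIDDEN_RNN, 0.0001, rng),
        "W_uv": normal_matrix(N_VISIBLE, N_HIDDEN_RNN, 0.0001, rng),
        "W_vu": normal_matrix(N_HIDDEN_RNN, N_VISIBLE, 0.0001, rng),
        "W_uh": normal_matrix(N_HIDDEN_RBM, N_HIDDEN_RNN, 0.0001, rng),
        "b_v": [0.0] * N_VISIBLE,
        "b_h": [0.0] * N_HIDDEN_RBM,
        "b_u": [0.0] * N_HIDDEN_RNN,
    }


params = init_params()
```

Note that a variance of 0.01 corresponds to a standard deviation of 0.1, and a variance of 0.0001 to a standard deviation of 0.01, which is what most random number libraries take as input.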

Result Analysis

GPS speed data with four different aggregation levels were tested using the RNN-RBM model, and the trained model was then utilized to predict the network-wide congestion evolution patterns from May 5, 2014 to May 9, 2014. Each model was run for 200 iterations to ensure the algorithm's convergence. The algorithm was implemented using Python Theano [28] and was executed on a desktop computer with an Intel i7 3.4GHz CPU, 8GB memory and an NVIDIA GeForce GTX650 GPU (2GB RAM). The comparison results are demonstrated in Table 1:

Table 1. Comparison of traffic congestion prediction performance with different data aggregation levels.

Aggregation Level   Training Accuracy (%)   Cross-Entropy Error   Runtime (seconds)   Testing Accuracy (%)
60 minutes          95.1                    -38.7                 354                 88.2
30 minutes          88.4                    -102.52               643                 80.8
10 minutes          80.2                    -172.3                1729                73.4
5 minutes           72.2                    -239.8                4642                68.9

doi:10.1371/journal.pone.0119044.t001

Results in Table 1 demonstrate that the overall prediction performance improves as the data aggregation level increases. This is probably due to less data fluctuation (i.e. traffic conditions do not change dramatically) as the time interval becomes longer. When the aggregation level reaches one hour, the training accuracy is 95%, and more than 88% of congestion conditions can be correctly identified, while the algorithm can be executed within 6 minutes. Both testing accuracy and training accuracy degrade when the data resolution becomes higher. However, the sensitivity and specificity do not increase monotonically as the data aggregation level becomes larger. For instance, only 43% of congested links can be correctly predicted when the aggregation level is set as 10 minutes. This implies that the data aggregation level influences the prediction outcomes, and should be carefully selected.

In addition, the CUDA-based GPU parallel computing engine demonstrates its capability of accelerating computationally intensive tasks by executing them concurrently. Compared with the CPU-based algorithm implementation, the proposed GPU-based framework improves the algorithm efficiency by approximately 5 times.

The training accuracy curves for the above four scenarios are plotted in Fig. 5. RNN-RBM models with varying data aggregation levels converge after 150 iterations, after which the training accuracy remains almost constant. This indicates that the RNN-RBM model is able to learn the network-wide traffic congestion patterns in a fast and reliable fashion.

Visualization

Based on the trained RNN-RBM models, the network congestion evolution patterns from May 5, 2014 to May 9, 2014 can be predicted. As previously mentioned, each binary congestion status can be mapped on a GIS platform using the unique link ID. To visualize the spatial and temporal onset and growth of network congestion, a series of dynamic GIS networks with colored links were created in Fig. 6A-Fig. 6D based on the predicted traffic congestion patterns on May 9, 2014.

Fig 5. Training Accuracy Changing Curves with Different Data Aggregation Levels.

Congested links (less than 20 km/hour) are marked in red, and uncongested links (equal to or greater than 20 km/hour) are marked in green. The traffic condition in the early morning was generally good, with only a small number of congested links, and the network became more and more congested during morning peak hours. Similarly, when most people commuted during evening peak hours, network-wide traffic congestion reached its highest level. Finally, traffic jams dissipated at midnight, since the majority of travelers engaged in few or no activities during that time. As presented in Fig. 6A-Fig. 6D, traffic congestion initially concentrated in the center of the network, and gradually spread to the network boundary. There are several links that consistently experience traffic congestion, and these should receive special attention for congestion mitigation in advance.

The temporal distribution of the number of congested links is summarized in Table 2 and visualized in Fig. 7. During the morning peak hour (9:00AM-10:00AM), 35.9% of road links experienced traffic congestion, while 41.4% of the network was congested during the evening peak hour (5:00PM-6:00PM). This indicates that the evening peak traffic is worse than the morning peak traffic, and causes more delay. People may tend to return home earlier than on other weekdays for weekend activities, and thus higher traffic volumes during Friday's evening peak hours can be observed.
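The percentages quoted above follow directly from the link counts in Table 2 and the network size of 515 links. The short sketch below reproduces them for a few selected rows; the dictionary layout and the helper `congested_share` are illustrative choices only.

```python
N_LINKS = 515  # network size reported in the numerical study

# Number of congested links per hour on May 9, 2014 (selected rows of Table 2).
congested_links = {
    "5:00-6:00": 53,
    "9:00-10:00": 185,
    "17:00-18:00": 213,
    "23:00-00:00": 94,
}


def congested_share(count: int, n_links: int = N_LINKS) -> float:
    """Percentage of links congested, rounded to one decimal as in Table 2."""
    return round(100.0 * count / n_links, 1)


shares = {hour: congested_share(n) for hour, n in congested_links.items()}
peak_hour = max(congested_links, key=congested_links.get)
```

For example, 185 / 515 gives 35.9% for the morning peak and 213 / 515 gives 41.4% for the evening peak.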

Fig 6. Predicted Network Congestion Evolution Patterns on May 09, 2014 with Varying Times of Day. (a) Spatial Distribution of Congestion from 5AM to 6AM; (b) Spatial Distribution of Congestion from 9AM to 10AM; (c) Spatial Distribution of Congestion from 5PM to 6PM; (d) Spatial Distribution of Congestion from 11PM to 12AM (Red line indicates congested traffic condition; green line indicates uncongested traffic condition).

Comparison

To further evaluate the advantages of the RNN-RBM algorithm for large-scale transportation network congestion prediction, a study was conducted comparing the RNN-RBM, Back Propagation Neural Network (BPNN) and Support Vector Machine (SVM) methods. To maintain a fair comparison environment, the same dataset and computing platform were adopted. The topology for BPNN was set as one input layer, one hidden layer with 10 processing units, and one output layer. For the SVM method, a Radial Basis Function (RBF) kernel was utilized with three adjustable parameters: cost C, width parameter g, and epsilon ε. These parameters were calibrated using 5-fold cross validation. The rolling horizon (i.e. the size of the moving window defining previous network congestion statuses) was set as 2 time steps. To reduce the effect of randomness, both BPNN and SVM were executed 10 times. Table 3 presents the performance of the three algorithms with the 60-minute data aggregation level.

Due to limited computational resources, only the 60-minute data aggregation level is utilized for testing. As demonstrated in Table 3, the RNN-RBM outperforms the other two algorithms in terms of efficiency and effectiveness: the execution time for RNN-RBM is only about 2.6% of that for BPNN, and about 2.4% of that for SVM. The gain in computational efficiency for RNN-RBM does not sacrifice accuracy. Compared with BPNN and SVM, the prediction accuracy for RNN-RBM increases by at least 17 percentage points. Moreover, both sensitivity and specificity for RNN-RBM are substantially higher than those for BPNN and SVM. This is owing to the deep and parallel architecture of RNN-RBM for learning multidimensional features in an efficient and effective manner.
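The runtime and accuracy comparisons can be checked against the values in Table 3 with a few lines of arithmetic. The dictionaries and helper names below are illustrative; the numbers themselves are copied from Table 3.

```python
# Runtime (seconds) and prediction accuracy (%) as reported in Table 3.
runtime_s = {"RNN-RBM": 354, "BPNN": 13498, "SVM": 14979}
accuracy_pct = {"RNN-RBM": 88.2, "BPNN": 69.7, "SVM": 71.0}


def runtime_ratio_pct(baseline: str) -> float:
    """RNN-RBM runtime as a percentage of the baseline runtime."""
    return 100.0 * runtime_s["RNN-RBM"] / runtime_s[baseline]


def accuracy_gain_points(baseline: str) -> float:
    """Accuracy gain of RNN-RBM over the baseline, in percentage points."""
    return accuracy_pct["RNN-RBM"] - accuracy_pct[baseline]


ratios = {name: round(runtime_ratio_pct(name), 1) for name in ("BPNN", "SVM")}
gains = {name: round(accuracy_gain_points(name), 1) for name in ("BPNN", "SVM")}
min_gain = min(gains.values())
```

The ratios come out at roughly 2.6% (BPNN) and 2.4% (SVM), and the smaller of the two accuracy gains is 17.2 percentage points, against SVM.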

Sensitivity Analysis

Congestion identification relies on the speed threshold below which traffic is considered congested. To understand how various speed thresholds influence the congestion pattern prediction performance, a sensitivity analysis was conducted to examine the prediction capability of the RNN-RBM model. Three different speed thresholds were adopted: 10 km/hour, 20 km/hour and 30 km/hour. When the calculated average GPS link speed is lower than the specified speed threshold, the link is considered congested. The GPS speed data were aggregated into 60-minute intervals, and followed the same model training and testing procedure as in the model development. The results are presented in Table 4:

Table 2. Statistics for number of congested links on May 9, 2014.

Time           Number of congested links   Percentage (%)
5:00–6:00      53                          10.3
6:00–7:00      72                          14.0
7:00–8:00      121                         23.5
8:00–9:00      162                         31.5
9:00–10:00     185                         35.9
10:00–11:00    168                         32.6
11:00–12:00    163                         31.7
12:00–13:00    168                         32.6
13:00–14:00    146                         28.3
14:00–15:00    160                         31.1
15:00–16:00    178                         34.6
16:00–17:00    183                         35.5
17:00–18:00    213                         41.4
18:00–19:00    189                         36.7
19:00–20:00    174                         33.8
20:00–21:00    153                         29.7
21:00–22:00    119                         23.1
22:00–23:00    111                         21.6
23:00–00:00    94                          18.3




Fig 7. Temporal distribution for number of congested links on May 9, 2014.

Table 3. Comparison of traffic congestion prediction performance for different algorithms with 60-minute data aggregation level.

| Algorithm | Runtime (seconds) | Prediction accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| RNN-RBM | 354 | 88.2 | 64.1 | 91.1 |
| BPNN | 13498 | 69.7 | 38.2 | 77.5 |
| SVM | 14979 | 71.0 | 36.6 | 80.3 |

Table 4. Sensitivity analysis of congestion evolution prediction performance with various speed thresholds.

| Speed threshold | Number of congested links | Training accuracy (%) | Runtime (seconds) | Testing accuracy (%) |
|---|---|---|---|---|
| 10 km/hour | 1895 | 98.6 | 290 | 93.8 |
| 20 km/hour | 13611 | 95.1 | 354 | 88.2 |
| 30 km/hour | 20238 | 84.0 | 419 | 79.9 |

There are a total of 59,225 traffic conditions from May 5, 2014 to May 9, 2014 for GPS data at the 60-minute aggregation level. When the speed threshold is set to 10 km/hour, only 1895 congested links are detected. Surprisingly, the model training and testing accuracies are not significantly higher than those with the 20 km/hour speed threshold, even though the number of congested links with the 20 km/hour threshold is approximately 7 times that with the 10 km/hour threshold. However, when the speed threshold increases to 30 km/hour, more than 30% of network links are congested. The training accuracy decreases to 84%, which is much lower than in the scenario with the 20 km/hour speed threshold. This is probably because traffic congestion temporal patterns fluctuate more strongly when the speed threshold is higher, which makes congestion evolution harder to predict. In practice, the congestion speed threshold should be determined carefully.
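The figures quoted in the sensitivity analysis can be re-derived from Table 4 and the stated total of 59,225 traffic conditions. The short sketch below does only that arithmetic; the dictionary layout and variable names are illustrative choices, and the training-testing gap is a derived quantity not reported by the authors.

```python
TOTAL_CONDITIONS = 59_225  # link-interval conditions, May 5 to May 9, 2014

# threshold (km/hour) -> (congested conditions, training accuracy %, testing accuracy %)
table4 = {
    10: (1895, 98.6, 93.8),
    20: (13611, 95.1, 88.2),
    30: (20238, 84.0, 79.9),
}

congested_share = {}
accuracy_gap = {}
for threshold, (n_congested, train_acc, test_acc) in table4.items():
    # Share of all traffic conditions labeled congested at this threshold.
    congested_share[threshold] = round(100 * n_congested / TOTAL_CONDITIONS, 1)
    # Difference between training and testing accuracy, in percentage points.
    accuracy_gap[threshold] = round(train_acc - test_acc, 1)

ratio_20_to_10 = round(table4[20][0] / table4[10][0], 1)
print(congested_share, accuracy_gap, ratio_20_to_10)
```

The congested share rises from a few percent at 10 km/hour to roughly a third at 30 km/hour, so the three scenarios differ considerably in class balance as well as in accuracy.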

Conclusion

Properly understanding the temporal and spatial patterns of congestion evolution is crucial to mitigating congestion effectively. This study proposed a data-driven method to predict the onset and evolution patterns of network traffic congestion. Based on a large volume of GPS taxi speed data, the onset, propagation and dissipation of network congestion can be modeled using a deep RNN-RBM architecture. To the best of our knowledge, this is the first study to apply deep learning theory to large-scale transportation network analysis. Compared with traditional equation-driven or simulation-driven network studies, the proposed method relies on fewer assumptions to model traffic congestion dynamics. A numerical experiment in Ningbo, China was conducted to validate the effectiveness and efficiency of the proposed method. A transportation network with 515 links was constructed, in which the speed information generated from 4000 taxis is mapped to links and converted into a binary value representing traffic congestion on each link. The RNN-RBM model is capable of learning high-dimensional temporal sequences accurately and efficiently, and is implemented in a CUDA-based parallel environment to accelerate computation. The training accuracy reaches 95.1% using the GPS data at the one-hour aggregation level, while the algorithm execution time is less than 6 minutes, and the trained model correctly predicts network-wide traffic congestion on more than 88% of links. The inferred network congestion patterns are further visualized on a GIS-based map platform to investigate the temporal and spatial evolution of traffic congestion. Compared with conventional data mining approaches (i.e., BPNN and SVM), the RNN-RBM algorithm demonstrates its superiority in terms of effectiveness and efficiency. In addition, a sensitivity analysis is conducted to understand how different speed thresholds affect the congestion prediction accuracy. The findings of this study are important for large-scale roadway network planning, operations, and investment decisions. Understanding which congested locations are autonomously generated and how congestion propagates over the transportation network will allow researchers and practitioners to focus limited resources on the primary congestion locations and adopt proactive countermeasures to mitigate congestion.
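To make the RBM building block of the architecture more concrete, the following is a minimal pure-Python sketch of one contrastive-divergence (CD-1) update for a binary RBM whose visible units stand for per-link congestion states. It is not the authors' CUDA-based implementation: the layer sizes, learning rate and toy input are assumptions, and the recurrent part of an RNN-RBM, which conditions the RBM biases on a recurrent hidden state, is deliberately omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, b_h):
    # W[i][j] connects visible unit i (a link) to hidden unit j.
    return [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b_h))]


def visible_probs(h, W, b_v):
    return [sigmoid(b_v[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b_v))]


def sample(probs, rng):
    # Draw independent Bernoulli samples from the given probabilities.
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, b_v, b_h, rng, lr=0.1):
    """Apply one CD-1 update in place and return the reconstruction probabilities."""
    p_h0 = hidden_probs(v0, W, b_h)   # positive phase
    h0 = sample(p_h0, rng)
    p_v1 = visible_probs(h0, W, b_v)  # one Gibbs step back to the visible layer
    v1 = sample(p_v1, rng)
    p_h1 = hidden_probs(v1, W, b_h)   # negative phase
    for i in range(len(v0)):
        for j in range(len(b_h)):
            W[i][j] += lr * (v0[i] * p_h0[j] - v1[i] * p_h1[j])
        b_v[i] += lr * (v0[i] - v1[i])
    for j in range(len(b_h)):
        b_h[j] += lr * (p_h0[j] - p_h1[j])
    return p_v1


rng = random.Random(0)
n_visible, n_hidden = 6, 3  # toy sizes; the study's network has 515 links
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b_v = [0.0] * n_visible
b_h = [0.0] * n_hidden
v0 = [1, 1, 0, 0, 1, 0]     # hypothetical congestion pattern for one interval
reconstruction = cd1_update(v0, W, b_v, b_h, rng)
```

In practice such updates would be vectorized and run on a GPU over many time intervals; the loops are kept explicit here only to show which quantities enter each update.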

Although the proposed method is promising for modeling and predicting large-scale transportation network congestion, there is still plenty of room for improvement in future research. For instance, pretraining techniques such as the Hessian-free optimization method should be considered to initialize the parameters more rationally. In addition, the spatial interaction among adjacent roadway segments should be taken into account to increase training and prediction accuracy.

Supporting Information

S1 Dataset. The dataset includes the raw data used for training and testing with data aggregation levels of 30 minutes and 60 minutes. The data collection period is from April 13, 2014 to May 9, 2014. Data from April 13 to May 4 are used for training models, and the remaining data are used for testing and prediction. For each CSV (Comma-Separated Values) file, the first row indicates the road link IDs in the entire network, and the other rows indicate the traffic conditions for each time interval, where 0 represents uncongested and 1 represents congested. The congestion threshold is set as 20 kilometers per hour. The algorithm execution results are saved in two CSV files. The file named “result.csv” records both the accuracy and the cross-entropy value for each iteration, and the other file, named “sequence.csv”, records the predicted traffic congestion patterns using the RNN-RBM model. (ZIP)

http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0119044.s001
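A file laid out as described (link IDs in the first row, one row of 0/1 states per time interval) could be read as in the sketch below. The sample text, the link IDs and the function name `read_congestion_csv` are hypothetical; the actual files in the archive may differ in details such as quoting or trailing columns.

```python
import csv
import io


def read_congestion_csv(text):
    """Parse CSV text: first row holds link IDs, each later row holds 0/1 states."""
    rows = list(csv.reader(io.StringIO(text)))
    link_ids = rows[0]
    states = [[int(cell) for cell in row] for row in rows[1:] if row]
    for row in states:
        if len(row) != len(link_ids):
            raise ValueError("row length does not match the number of link IDs")
    return link_ids, states


# Hypothetical miniature file: three links, two time intervals.
sample = "101,102,103\n0,1,0\n1,1,0\n"
link_ids, states = read_congestion_csv(sample)

# Share of congested links in each time interval.
congested_share = [sum(row) / len(row) for row in states]
print(link_ids, congested_share)
```

To read a file from disk instead, the same parsing can be applied to the text returned by `open(path, newline="").read()`.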

Author Contributions

Conceived and designed the experiments: XM Yinhai Wang. Performed the experiments: XM HY. Analyzed the data: XM HY. Contributed reagents/materials/analysis tools: Yunpeng Wang. Wrote the paper: XM.

References1. Schrank D, Eisele B, Lomax T. TTI’s 2012 urban mobility report. Proceedings of the 2012 annual urban

mobility report Texas A&M Transportation Institute, Texas, USA. 2012.

2. Aligawesa AK. The detection and localization of traffic congestion for highway traffic systems using hy-brid estimation techniques: Purdue University; 2009.

3. Long J, Gao Z, Ren H, Lian A. Urban traffic congestion propagation and bottleneck identification. Sci-ence in China Series F: Information Sciences. 2008; 51(7):948–64.

4. Yang S. On feature selection for traffic congestion prediction. Transportation Research Part C: Emerg-ing Technologies. 2013; 26:160–9.

5. Wang J, Li Y, Liu J, He K, Wang P. Vulnerability analysis and passenger source prediction in urban railtransit networks. PloS one. 2013; 8(11):e80178. doi: 10.1371/journal.pone.0080178 PMID: 24260355

6. Wang J, Wang L. Congestion analysis of traffic networks with direction-dependant heterogeneity. Phy-sica A: Statistical Mechanics and its Applications. 2013; 392(2):392–9.

7. Liu C, Zhang Q-p, Zhang X. Emergence and disappearance of traffic congestion in weight-evolving net-works. Simulation Modelling Practice and Theory. 2009; 17(10):1566–74.

8. Colak S, Schneider CM, Wang P, González MC. On the role of spatial dynamics and topology on net-work flows. New Journal of Physics. 2013; 15(11):113037.

9. Zhao L, Lai Y-C, Park K, Ye N. Onset of traffic congestion in complex networks. Physical Review E.2005; 71(2):026125.

10. Wang P, Hunter T, Bayen AM, Schechtner K, González MC. Understanding road usage patterns inurban areas. Scientific reports. 2012; 2.

11. Chen S, HuangW, Cattani C, Altieri G. Traffic dynamics on complex networks: a survey. MathematicalProblems in Engineering. 2011;2012.

12. Cheng T, Tanaksaranond G, Brunsdon C, Haworth J. Exploratory visualisation of congestionevolutions on urban transport networks. Transportation Research Part C: Emerging Technologies.2013; 36:296–306.

13. Bellman R, Bellman RE, Bellman RE, Bellman RE. Adaptive control processes: a guided tour: Prince-ton University Press Princeton; 1961.

14. HuangWH, Song GJ, Hong HK, Xie KQ. Deep Architecture for Traffic Flow Prediction: Deep Belief Net-works With Multitask Learning. Ieee Transactions on Intelligent Transportation Systems. 2014;15(5):2191–201.

15. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science.2006; 313(5786):504–7. PMID: 16873662

16. Bengio Y. Learning deep architectures for AI. Foundations and trends1 in Machine Learning. 2009;2(1):1–127.

17. Boulanger-Lewandowski N, Bengio Y, Vincent P. Modeling temporal dependencies in high-dimensionalsequences: Application to polyphonic music generation and transcription. arXiv preprintarXiv:12066392. 2012.

18. Hinton GE. Training products of experts by minimizing contrastive divergence. Neural computation.2002; 14(8):1771–800. PMID: 12180402

19. Längkvist M, Karlsson L, Loutfi A. A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognition Letters. 2014; 42:11–24.

20. Sutskever I, Hinton GE, Taylor GW. The recurrent temporal restricted boltzmann machine. Advances inNeural Information Processing Systems; 2009.



http://dx.doi.org/10.1371/journal.pone.0080178

http://www.ncbi.nlm.nih.gov/pubmed/24260355



21. Hüsken M, Stagge P. Recurrent neural networks for time series classification. Neurocomputing. 2003;50:223–35.

22. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult.Neural Networks, IEEE Transactions on. 1994; 5(2):157–66. PMID: 18267787

23. Hochreiter S, Schmidhuber J. Long short-term memory. Neural computation. 1997; 9(8):1735–80.PMID: 9377276

24. Sutskever I, Hinton GE, editors. Learning multilevel distributed representations for high-dimensional se-quences. International Conference on Artificial Intelligence and Statistics; 2007.

25. Schatz MC, Trapnell C, Delcher AL, Varshney A. High-throughput sequence alignment using GraphicsProcessing Units. BMC bioinformatics. 2007; 8(1):474.

26. Documentation CT. v6. 0. Santa Clara (CA, USA): NVIDIA Corporation. 2014.

27. Golik P, Doetsch P, Ney H, editors. Cross-entropy vs. squared error training: a theoretical and experi-mental comparison. INTERSPEECH; 2013.

28. Bergstra J, Breuleux O, Bastien F, Lamblin P, Pascanu R, Desjardins G, et al. Theano: a CPU andGPUmath expression compiler. Proceedings of the Python for scientific computing conference(SciPy); 2010.




