a bayesian modelling framework for individual passenger’s probabilistic route choice: a case study...


Upload: q-fu

Post on 17-Jul-2015


Page 1: A Bayesian modelling framework for individual passenger’s probabilistic route choice: a case study on the London Underground

3 CASE STUDY

(edited from the Standard Tube map, Transport for London)

UNDERGROUND STATIONS: VICTORIA (Orig) – LIVERPOOL STREET (Destn)

The Oyster in London & test network

Oyster Journey Time (OJT), in minutes:

$OJT = T_{EXT} - T_{ENT}$

where $T_{EXT}$ is the time-stamp of EXIT (end) and $T_{ENT}$ is the time-stamp of ENTRY (start).
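The journey-time definition above (exit time-stamp minus entry time-stamp) can be sketched in a few lines of Python. This is only an illustration: the function name `oyster_journey_time`, the time-stamp format and the example record are assumptions and do not come from the Oyster data set.

```python
from datetime import datetime

# Hypothetical time-stamp format; real smart-card records may be stored differently.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def oyster_journey_time(entry_stamp: str, exit_stamp: str) -> float:
    """Return OJT = T_EXT - T_ENT, in minutes."""
    t_ent = datetime.strptime(entry_stamp, TIME_FORMAT)
    t_ext = datetime.strptime(exit_stamp, TIME_FORMAT)
    return (t_ext - t_ent).total_seconds() / 60.0


# Made-up example record (entry at 08:03, exit at 08:29 on the same day).
ojt = oyster_journey_time("2011-06-27 08:03:00", "2011-06-27 08:29:00")
print(ojt)
```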

Frequency distribution of OJT in AM peak (07:00-10:00) (35,992 valid observations, 26/06/2011 – 31/03/2012)

Suppose that $c_r(t; \theta_r)$, for all $r$ ($r = 1, 2$), is
» Gaussian distribution
» Lognormal distribution

[Figure: histogram of OJT. x-axis: Oyster Journey Time (minutes), 15 to 50; y-axis: Frequency, 0 to 3500.]

Stats (in minutes): min. 17; max. 42; mean 26.3429; med. 26; stdev. 4.9664

Direct route (low service frequency)

Indirect route (high service frequency)

A Bayesian modelling framework for individual passenger’s probabilistic route choice: A case study on the London Underground

Qian Fu, Ronghui Liu and Stephane Hess, Institute for Transport Studies (ITS), University of Leeds, UK Email: [email protected]; Tel: +44 (0)113 343 1790; 34-40 University Road, Leeds LS2 9JT, UK

Presentation #14-5328, session 775

The 93rd Transportation Research Board Annual Meeting, Washington D.C., 12-16 January 2014

[Figure, six panels: estimated Lognormal mixture and estimated Gaussian mixture for each of the AM Peak, the PM Peak, and a whole weekday. x-axis: Oyster Journey Time (minutes), 15 to 50; y-axis: Probability Density, 0 to 0.1. Legend in each panel: Oyster data; Est. mixture; Route1 (Victoria - Central); Route2 (Circle).]

1 MOTIVATION & QUESTIONS

To understand passengers’ route choice behaviour:
» providing knowledge for revealing passenger-flow distributions and traffic congestion, etc.; and
» assisting public-transport managers in delivering a more effective transit service, especially during rush hours

Issues of data availability for model estimation

Data collection: small samples of individual passengers’ route choices
» costly, and hence a small sample size; and
» lack of accuracy or even loss of essential information, e.g. travel time

Smart-card data: from automatic fare collection systems implemented on local public transport, e.g. Oyster in London, SmarTrip in Washington D.C., Octopus in Hong Kong, and SPTC in Shanghai, to name but a few

Given a pair of O-D stations: time-stamps of the start and the end of a journey
» a sufficiently large sample of the smart-card users’ journey time between O-D stations; BUT …
» their detailed itineraries?
» each individual’s actual route choice?

RESEARCH QUESTIONS & POSSIBLE ANSWERS

Q1: Would there be a link that potentially relates a passenger’s route choice to his/her journey time observed from the smartcard data?

ANS1: A conditional probability, $\Pr(\mathrm{choice}_{qr} \mid t_q)$, $r = 1, \dots, N$, where $N$ is the number of alternative routes, $t_q$ is the observed journey time of passenger $q$, and $\mathrm{choice}_{qr}$ denotes the passenger $q$ choosing route $r$ (from his/her own route-choice set).

Q2: Given only the observed journey time, would it be possible to tell the most likely (or even the actual) route choice that the passenger made?

2 METHOD

ANSWERING Q2 … under BAYESIAN FRAMEWORK

$$\Pr(\mathrm{choice}_{qr} \mid t_q) = \frac{\Pr(\mathrm{choice}_{qr})\,\Pr(t_q \mid \mathrm{choice}_{qr})}{\Pr(t_q)}$$

The prior probability, $\Pr(\mathrm{choice}_{qr})$: How frequently is route $r$ used? It should be learnt, a priori, from history data.

$$\sum_{r=1}^{N} \Pr(\mathrm{choice}_{qr}) = 1$$

The likelihood function, $\Pr(t_q \mid \mathrm{choice}_{qr})$: the likelihood that the observed journey time would be $t_q$ given the evidence that route $r$ was actually chosen by the passenger $q$.

Based on the law of total probability,

$$\Pr(t_q) = \sum_{r=1}^{N} \Pr(\mathrm{choice}_{qr})\,\Pr(t_q \mid \mathrm{choice}_{qr})$$

so that

$$\Pr(\mathrm{choice}_{qr} \mid t_q) \propto \Pr(\mathrm{choice}_{qr})\,\Pr(t_q \mid \mathrm{choice}_{qr}), \qquad \sum_{r=1}^{N} \Pr(\mathrm{choice}_{qr} \mid t_q) = 1$$

If $\exists\, r^* \in \arg\max_{r} \Pr(\mathrm{choice}_{qr} \mid t_q)$, for all $r = 1, 2, \dots, N$, i.e. among $\Pr(\mathrm{choice}_{q1} \mid t_q), \Pr(\mathrm{choice}_{q2} \mid t_q), \dots, \Pr(\mathrm{choice}_{qN} \mid t_q)$, route $r^*$ could then be deemed the most likely route passenger $q$ might have chosen.

Mixture distribution of the journey time

Consider, on the given O-D, the overall observations of all the passengers’ journey time, $t$.

There are supposed to be $N$ sub-populations of the observed journey times, in conformity with the $N$ alternative routes, $r$.

$N$ component distributions, $c_r(t; \theta_r)$, where $r = 1, \dots, N$

A mixture distribution of journey time, $m(t; \Omega, \Theta)$:

$$m(t; \Omega, \Theta) = \sum_{r=1}^{N} \omega_r\, c_r(t; \theta_r), \quad \text{where } \sum_{r=1}^{N} \omega_r = 1$$

For simplicity, assume that all the passengers consider an identical route-choice set that contains all the $N$ alternative routes:

$$\Pr(\mathrm{choice}_{qr}) = \Pr(\mathrm{choice}_{r}) = \omega_r$$

$$\Pr(t_q \mid \mathrm{choice}_{qr}) = \Pr(t \mid \mathrm{choice}_{r}) = c_r(t; \theta_r)$$

In accordance with Bayesian framework,

$$\Pr(t_q) = \sum_{r=1}^{N} \Pr(\mathrm{choice}_{r})\,\Pr(t \mid \mathrm{choice}_{r}) = m(t; \Omega, \Theta)$$

Apply Expectation-Maximization (EM) algorithm (Dempster, Laird & Rubin, 1977)
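The mixture-estimation step can be illustrated with a minimal EM sketch for a Gaussian mixture of journey times. It is not the authors' implementation: the function names, the starting values, the fixed iteration count and the synthetic journey times are all assumptions made for illustration, and only the Gaussian case is shown.

```python
import math
import random


def normal_pdf(t, mean, sd):
    """Gaussian component density c_r(t; theta_r) with theta_r = (mean, sd)."""
    z = (t - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_gaussian_mixture(times, weights, means, sds, n_iter=100):
    """Estimate mixing probabilities, means and stdevs by EM (fixed iteration count)."""
    weights, means, sds = list(weights), list(means), list(sds)
    n_routes = len(weights)
    for _ in range(n_iter):
        # E-step: posterior Pr(choice_qr | t_q) for every observed journey time.
        resp = []
        for t in times:
            joint = [weights[r] * normal_pdf(t, means[r], sds[r]) for r in range(n_routes)]
            total = sum(joint)  # mixture density m(t), by the law of total probability
            resp.append([j / total for j in joint])
        # M-step: re-estimate each component from its posterior-weighted observations.
        for r in range(n_routes):
            n_r = sum(row[r] for row in resp)
            weights[r] = n_r / len(times)
            means[r] = sum(row[r] * t for row, t in zip(resp, times)) / n_r
            var = sum(row[r] * (t - means[r]) ** 2 for row, t in zip(resp, times)) / n_r
            sds[r] = math.sqrt(var)
    return weights, means, sds


# Synthetic journey times (minutes) from two made-up route sub-populations.
random.seed(0)
times = [random.gauss(22.0, 2.0) for _ in range(400)]
times += [random.gauss(29.0, 4.0) for _ in range(600)]

# Hypothetical starting values for the two routes.
weights, means, sds = em_gaussian_mixture(times, [0.5, 0.5], [20.0, 30.0], [3.0, 3.0])
print(weights, means, sds)
```

The E-step is exactly the Bayes formula of this section applied with the current parameter estimates, which is why the same posterior can later be reused for route matching.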

Estimated results (AM Peak)

                                   K-means clustering     Gaussian mixture       Lognormal mixture
Route Label                        Route1     Route2      Route1     Route2      Route1     Route2
Est. Mean (min)                    23.12      31.72       22.02      28.75       21.78      28.69
Est. Stdev. (min)                  2.39       3.18        1.83       4.51        1.78       4.43
Est. Mixing probability            62.55%     37.45%      35.77%     64.23%      34.02%     65.98%
Naive inference passenger prop.    62.55%     37.45%      42.60%     57.40%      35.36%     64.64%
Final inference passenger prop.    62.55%     37.45%      35.50%     64.50%      34.04%     65.96%

Indirect route (Victoria Line – Central Line): 22.50

Direct route (Circle Line only): 28.24

[Figure: x-axis: Oyster Journey Time (minutes), 15 to 45; y-axis: Posterior probability, 0 to 1. Curves: Route1 (Gaussian mixture); Route2 (Gaussian mixture); Route1 (Lognormal mixture); Route2 (Lognormal mixture).]

Posterior probabilities of an individual choosing an alternative route conditional on his/her OJT
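As a sketch of how such posterior probabilities follow from the estimates, the snippet below applies the Bayes formula with the AM-peak Gaussian-mixture values reported in the results table (mixing probabilities 35.77% and 64.23%, means 22.02 and 28.75 min, stdevs 1.83 and 4.51 min). The function names and the two example journey times are assumptions made for illustration.

```python
from statistics import NormalDist

# AM-peak Gaussian-mixture estimates copied from the results table.
ROUTES = ["Route1 (Victoria - Central)", "Route2 (Circle)"]
MIXING = [0.3577, 0.6423]
COMPONENTS = [NormalDist(22.02, 1.83), NormalDist(28.75, 4.51)]


def posterior(t_q):
    """Return Pr(choice_qr | t_q) for each route r, given a journey time in minutes."""
    joint = [w * c.pdf(t_q) for w, c in zip(MIXING, COMPONENTS)]
    evidence = sum(joint)  # Pr(t_q), by the law of total probability
    return [j / evidence for j in joint]


def most_likely_route(t_q):
    """Return the label of the route r* with the largest posterior probability."""
    post = posterior(t_q)
    return ROUTES[post.index(max(post))]


# Hypothetical observed journey times, in minutes.
for t_q in (22.0, 35.0):
    print(t_q, [round(p, 3) for p in posterior(t_q)], most_likely_route(t_q))
```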

Estimated results & route matching

Validation – Passenger-flow proportions (on a weekday):

(Survey data source: Rolling Origin and Destination Survey (RODS), Transport for London)

                             Direct route (Circle Line only)        Indirect route (Victoria Line – Central Line)
Time-band                    RODS      Gaussian    Lognormal        RODS      Gaussian    Lognormal
                                       mixture     mixture                    mixture     mixture
AM Peak (07:00-10:00)        51.89%    64.50%      65.96%           48.11%    35.50%      34.04%
PM Peak (16:00-19:00)        62.28%    64.20%      71.50%           37.72%    35.80%      28.50%
A whole day (05:34-00:30)    61.06%    61.02%      66.52%           38.94%    38.98%      33.48%
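To read the validation table more easily, one can compute the absolute gap, in percentage points, between each model's direct-route proportion and the RODS figure. The short sketch below only does arithmetic on the values in the table; the variable names are assumptions, and since each row's two routes sum to 100% the indirect-route gaps are identical.

```python
# Direct-route (Circle Line only) proportions, in percent, copied from the table.
rods = {"AM Peak": 51.89, "PM Peak": 62.28, "A whole day": 61.06}
models = {
    "Gaussian mixture": {"AM Peak": 64.50, "PM Peak": 64.20, "A whole day": 61.02},
    "Lognormal mixture": {"AM Peak": 65.96, "PM Peak": 71.50, "A whole day": 66.52},
}

# Absolute difference from the survey proportion, in percentage points.
gaps = {
    name: {band: round(abs(share - rods[band]), 2) for band, share in shares.items()}
    for name, shares in models.items()
}

for name, by_band in gaps.items():
    print(name, by_band)
```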

The average journey time of each of the alternative routes was calculated by aggregating the average travel time of every journey segment

Estimation & Results

Posterior probabilities

Survey results (AM Peak)

Avg. journey time (min):

(Data source: Transport for London)

PDFs of the estimated mixture distributions

4 FURTHER WORK

The next steps …

Journey time distribution of each alternative route

To further involve the timetable
» passengers’ arrival time at origin station (time of ENTRY)
» train’s scheduled departure time from platform

To explicitly specify/identify each individual’s perceived choice set

Potential applications

Applying to other comparable public transport networks with smartcard data

Understanding route choice behaviour:
» model estimation using the posterior probability estimates in the absence of actual route choices

ACKNOWLEDGEMENT

The authors appreciate funding support from China Scholarship Council - University of Leeds Scholarship, and would also like to express their gratitude to the staff of Transport for London for their continued support.