Download - FinalReport - utrc2.org · Final March 2018 University Transportation Research Center - Region 2 Report Performing Organization: Manhattan College The Spatial Effect of Socio-Economic

Final

March 2018

University Transportation Research Center - Region 2

ReportPerforming Organization: Manhattan College

The Spatial Effect of Socio-Economic Demographics on Transit Ridership: a Case Study in New York

Sponsor:University Transportation Research Center - Region 2

University Transportation Research Center - Region 2

The Region 2 University Transportation Research Center (UTRC) is one of ten original University Transportation Centers established in 1987 by the U.S. Congress. These Centers were established with the recognition that transportation plays a key role in the nation's economy and the quality of life of its citizens. University faculty members provide a critical link in resolving our national and regional transportation problems while training the professionals who address our transpor-tation systems and their customers on a daily basis.

The UTRC was established in order to support research, education and the transfer of technology in the �ield of transportation. The theme of the Center is "Planning and Managing Regional Transportation Systems in a Changing World." Presently, under the direction of Dr. Camille Kamga, the UTRC represents USDOT Region II, including New York, New Jersey, Puerto Rico and the U.S. Virgin Islands. Functioning as a consortium of twelve major Universities throughout the region, UTRC is located at the CUNY Institute for Transportation Systems at The City College of New York, the lead institution of the consortium. The Center, through its consortium, an Agency-Industry Council and its Director and Staff, supports research, education, and technology transfer under its theme. UTRC’s three main goals are:

Research

The research program objectives are (1) to develop a theme based transportation research program that is responsive to the needs of regional transportation organizations and stakehold-ers, and (2) to conduct that program in cooperation with the partners. The program includes both studies that are identi�ied with research partners of projects targeted to the theme, and targeted, short-term projects. The program develops competitive proposals, which are evaluated to insure the mostresponsive UTRC team conducts the work. The research program is responsive to the UTRC theme: “Planning and Managing Regional Transportation Systems in a Changing World.” The complex transportation system of transit and infrastructure, and the rapidly changing environ-ment impacts the nation’s largest city and metropolitan area. The New York/New Jersey Metropolitan has over 19 million people, 600,000 businesses and 9 million workers. The Region’s intermodal and multimodal systems must serve all customers and stakeholders within the region and globally.Under the current grant, the new research projects and the ongoing research projects concentrate the program efforts on the categories of Transportation Systems Performance and Information Infrastructure to provide needed services to the New Jersey Department of Transpor-tation, New York City Department of Transportation, New York Metropolitan Transportation Council , New York State Department of Transportation, and the New York State Energy and Research Development Authorityand others, all while enhancing the center’s theme.

Education and Workforce Development

The modern professional must combine the technical skills of engineering and planning with knowledge of economics, environmental science, management, �inance, and law as well as negotiation skills, psychology and sociology. And, she/he must be computer literate, wired to the web, and knowledgeable about advances in information technology. UTRC’s education and training efforts provide a multidisciplinary program of course work and experiential learning to train students and provide advanced training or retraining of practitioners to plan and manage regional transportation systems. UTRC must meet the need to educate the undergraduate and graduate student with a foundation of transportation fundamentals that allows for solving complex problems in a world much more dynamic than even a decade ago. Simultaneously, the demand for continuing education is growing – either because of professional license requirements or because the workplace demands it – and provides the opportunity to combine State of Practice education with tailored ways of delivering content.

Technology Transfer

UTRC’s Technology Transfer Program goes beyond what might be considered “traditional” technology transfer activities. Its main objectives are (1) to increase the awareness and level of information concerning transportation issues facing Region 2; (2) to improve the knowledge base and approach to problem solving of the region’s transportation workforce, from those operating the systems to those at the most senior level of managing the system; and by doing so, to improve the overall professional capability of the transportation workforce; (3) to stimulate discussion and debate concerning the integration of new technologies into our culture, our work and our transportation systems; (4) to provide the more traditional but extremely important job of disseminating research and project reports, studies, analysis and use of tools to the education, research and practicing community both nationally and internationally; and (5) to provide unbiased information and testimony to decision-makers concerning regional transportation issues consistent with the UTRC theme.

Project No(s): UTRC/RF Grant No: 49198-26-28

Project Date: March 2018

Project Title: The Spatial Effect of Socio-Economic Demographics on Transit Ridership: a Case Study in New York

Project’s Website: http://www.utrc2.org/research/projects/spatial-effect-socio-economic-demographics Principal Investigator(s): Matthew Volovski, Ph.D.Assistant Professor Department of Civil & Environmental EngineeringManhattan CollegeRiverdale, NY 10471Tel: (718) 862-7184Email: [email protected]

Performing Organization(s): Manhattan College

Sponsor(s):)University Transportation Research Center (UTRC)

To request a hard copy of our �inal reports, please send us an email at [email protected]

Mailing Address:

University Transportation Reserch CenterThe City College of New YorkMarshak Hall, Suite 910160 Convent AvenueNew York, NY 10031Tel: 212-650-8051Fax: 212-650-8374Web: www.utrc2.org

Board of DirectorsThe UTRC Board of Directors consists of one or two members from each Consortium school (each school receives two votes regardless of the number of representatives on the board). The Center Director is an ex-of icio member of the Board and The Center management team serves as staff to the Board.

City University of New York Dr. Robert E. Paaswell - Director Emeritus of NY Dr. Hongmian Gong - Geography/Hunter College

Clarkson University Dr. Kerop D. Janoyan - Civil Engineering

Columbia University Dr. Raimondo Betti - Civil Engineering Dr. Elliott Sclar - Urban and Regional Planning

Cornell University Dr. Huaizhu (Oliver) Gao - Civil Engineering Dr. Richard Geddess - Cornell Program in Infrastructure Policy

Hofstra University Dr. Jean-Paul Rodrigue - Global Studies and Geography

Manhattan College Dr. Anirban De - Civil & Environmental Engineering Dr. Matthew Volovski - Civil & Environmental Engineering

New Jersey Institute of Technology Dr. Steven I-Jy Chien - Civil Engineering Dr. Joyoung Lee - Civil & Environmental Engineering

New York Institute of Technology Dr. Nada Marie Anid - Engineering & Computing Sciences Dr. Marta Panero - Engineering & Computing Sciences

New York University Dr. Mitchell L. Moss - Urban Policy and Planning Dr. Rae Zimmerman - Planning and Public Administration

(NYU Tandon School of Engineering) Dr. John C. Falcocchio - Civil Engineering Dr. Kaan Ozbay - Civil Engineering Dr. Elena Prassas - Civil Engineering

Rensselaer Polytechnic Institute Dr. José Holguín-Veras - Civil Engineering Dr. William "Al" Wallace - Systems Engineering

Rochester Institute of Technology Dr. James Winebrake - Science, Technology and Society/Public Policy Dr. J. Scott Hawker - Software Engineering

Rowan University Dr. Yusuf Mehta - Civil Engineering Dr. Beena Sukumaran - Civil Engineering

State University of New York Michael M. Fancher - Nanoscience Dr. Catherine T. Lawson - City & Regional Planning Dr. Adel W. Sadek - Transportation Systems Engineering Dr. Shmuel Yahalom - Economics

Stevens Institute of Technology Dr. Sophia Hassiotis - Civil Engineering Dr. Thomas H. Wakeman III - Civil Engineering

Syracuse University Dr. Baris Salman - Civil Engineering Dr. O. Sam Salem - Construction Engineering and Management

The College of New Jersey Dr. Thomas M. Brennan Jr - Civil Engineering

University of Puerto Rico - Mayagüez Dr. Ismael Pagán-Trinidad - Civil Engineering Dr. Didier M. Valdés-Díaz - Civil Engineering

UTRC Consortium Universities

The following universities/colleges are members of the UTRC consortium under MAP-21 ACT.

City University of New York (CUNY)Clarkson University (Clarkson)Columbia University (Columbia)Cornell University (Cornell)Hofstra University (Hofstra)Manhattan College (MC)New Jersey Institute of Technology (NJIT)New York Institute of Technology (NYIT)New York University (NYU)Rensselaer Polytechnic Institute (RPI)Rochester Institute of Technology (RIT)Rowan University (Rowan)State University of New York (SUNY)Stevens Institute of Technology (Stevens)Syracuse University (SU)The College of New Jersey (TCNJ)University of Puerto Rico - Mayagüez (UPRM)

UTRC Key Staff

Dr. Camille Kamga: Director, Associate Professor of Civil Engineering

Dr. Robert E. Paaswell: Director Emeritus of UTRC and Distinguished Professor of Civil Engineering, The City College of New York

Dr. Ellen Thorson: Senior Research Fellow

Penny Eickemeyer: Associate Director for Research, UTRC

Dr. Alison Conway: Associate Director for Education/Associate Professor of Civil Enginering

Nadia Aslam: Assistant Director for Technology Transfer

Nathalie Martinez: Research Associate/Budget Analyst

Andriy Blagay: Graphic Intern

Tierra Fisher: Of ice Manager

Dr. Sandeep Mudigonda, Research Associate

Dr. Rodrigue Tchamna, Research Associate

Dr. Dan Wan, Research Assistant

Bahman Moghimi: Research Assistant; Ph.D. Student, Transportation Program

Sabiheh Fagigh: Research Assistant; Ph.D. Student, Transportation Program

Patricio Vicuna: Research AssistantPh.D. Candidate, Transportation Program

Membership as of January 2018

Page | 1

TECHNICAL REPORT STANDARD TITLE PAGE

1. Report No. 2.Government Accession No. 3. Recipient’s Catalog No.

4. Title and Subtitle 5. Report Date

The Spatial Effect of Socio-Economic Demographics on Transit Ridership:

a Case Study in New York

March, 2018

6. Performing Organization Code

7. Author(s) 8. Performing Organization Report No.

Matthew Volovski, PhD

9. Performing Organization Name and Address 10. Work Unit No.

Manhattan College

4513 Manhattan College Parkway

Riverdale, NY 10471

11. Contract or Grant No.

49198-26-28

12. Sponsoring Agency Name and Address 13. Type of Report and Period Covered

University Transportation Research Center

Marshak Hall - Science Building, Suite 910

The City College of New York

138th Street & Convent Avenue ,New York, NY 10031

14. Sponsoring Agency Code

15. Supplementary Notes

16. Abstract

Demand for vehicle and public transportation systems continues to increase in and around major urban centers.

This increase is especially pronounced during the morning and evening commutes and is further complicated by

the complex spatial interactions that influence the variation in system demand. In an effort to help agencies better

understand this variability and develop better demand forecasts this research investigated the underlying factors

impacting public transportation ridership regardless of transit mode, then uses this insight to estimate specific

models to help forecast changes in subway ridership. The spatial database for the case study consisted of social,

economic, and land use characteristics for the 2166 census tracts in the five boroughs of New York City, NY. The

spatial models were found to have a better overall model fit compared to their non-spatial counterparts. Moreover,

spatial dependence was found to be statistically significant in both models. Failure to account for spatial

dependence in estimating public transportation use at the census tract or station level could lead to biased,

inefficient or inconsistent parameter estimates. The completed research can help public agencies better address

resource allocation by identifying locations that are over or underperforming in terms of expected ridership or

identifying locations for network expansion.

17. Key Words 18. Distribution Statement

Public Transportation, Spatial Econometrics, Census

Tract

19. Security Classif (of this report) 20. Security Classif. (of this page) 21. No of Pages 22. Price

Unclassified Unclassified

47

Form DOT F 1700.7 (8-69)

Page | 2

Disclaimer

The contents of this report reflect the views of the authors, who are responsible for the facts and

the accuracy of the information presented herein. The contents do not necessarily reflect the

official views or policies of the UTRC or the Federal Highway Administration. This report does

not constitute a standard, specification or regulation. This document is disseminated under the

sponsorship of the Department of Transportation, University Transportation Centers Program, in

the interest of information exchange. The U.S. Government assumes no liability for the contents

or use thereof.

Page | 1

TABLE OF CONTENTS

List of Tables ...................................................................................................................................2

List of Figures ..................................................................................................................................3

Executive Summary .........................................................................................................................4

1 Introduction ..........................................................................................................................6

1.1 Background ...................................................................................................................... 6

1.2 Scope and Motivation....................................................................................................... 8

1.3 Review of Current Literature ........................................................................................... 9

2 Data ....................................................................................................................................12

2.1 Data Collection ............................................................................................................... 13

3 Methodology ......................................................................................................................18

3.1 Overview ........................................................................................................................ 18

3.2 Spatial Weights Matrix................................................................................................... 18

3.2.1 Moran’s I ................................................................................................................. 19

3.3 Models for Spatial Dependence and Spatial Heterogeneity ........................................... 19

3.3.1 Tests for Spatial Lag and Spatial Error ................................................................... 21

4 Results: Commuters Using Public Transportation .............................................................23

4.1 Spatial Autocorrelation .................................................................................................. 24

4.2 Model for Spatial Dependence ....................................................................................... 27

5 Results: Subway Ridership ................................................................................................32

5.1 Spatial Autocorrelation .................................................................................................. 34

5.2 Model for Spatial Dependence ....................................................................................... 37

6 Summary and Conclusions ................................................................................................40

7 Acknowledgements ............................................................................................................42

8 References ..........................................................................................................................43

Page | 2

LIST OF TABLES

Table 1 Descriptive statistics for select census tract variables ..................................................... 14

Table 2. Commuters Using Public Transportation Modeling Specification Results .................... 29

Table 3. Commuters Using Public Transit Modeling Specification Results ................................ 38

Page | 3

LIST OF FIGURES

Figure 1. Study Area: New York City Boroughs ............................................................................ 9

Figure 2. New York City Census Blocks ...................................................................................... 13

Figure 3. Annual New York City Subway Ridership by Borough ............................................... 15

Figure 4. New York City Subway Stations ................................................................................... 16

Figure 5. Change in Annual Subway Ridership 2011-2016 by Station ........................................ 17

Figure 6. Percent Change in Annual Subway Ridership 2011-2016 by Station ........................... 17

Figure 7. Quantile Plot for Percentage of Commuters Using Public Transportation ................... 24

Figure 8. LISA Cluster Map for Percentage of Commuters Using Public Transportation ........... 26

Figure 9. Moran’s I for 1st Order Queen Matrix (Percentage of Commuters Using Public

Transportation) ........................................................................................................... 27

Figure 10. Connectivity Frequency Distribution for 1st Order Queen Matrix (Percentage

of Commuters Using Public Transportation) ............................................................. 28

Figure 11. Subway Station Attributable Area ............................................................................... 33

Figure 12. Quantile Plot for Change in Subway Ridership 2011-2016 (in Millions) ................... 34

Figure 13. Example Neighbor Diagram (1.0 Mile Distance Matrix) ............................................ 35

Figure 14. Connectivity Frequency Distribution for 1.0 Mile Weights Matrix ............................ 35

Figure 15. Moran’s I for Change in Ridership 2011-2016 (1.0 Mile Distance Matrix) ............... 36

Figure 16. Moran’s I for Change in Ridership 2011-2016 (1.5 Mile Distance Matrix) ............... 37

Page | 4

EXECUTIVE SUMMARY

Demand for vehicle and public transportation systems continues to increase in and around

major urban centers. This increase is especially pronounced during the morning and evening

commutes and is further complicated by the complex spatial interactions that influence the

variation in system demand. In an effort to help agencies better understand this variability and

develop better demand forecasts this research investigated the underlying factors impacting

public transportation ridership regardless of transit mode, then uses this insight to estimate

specific models to help forecast changes in subway ridership. The spatial database for the case

study consisted of social, economic, and land use characteristics for all 2166 census tracts in

New York City, NY’s five boroughs. The data were used to estimate spatial econometric models

for the percentage of commuters using public transportation at the census-tract level and the

change in subway ridership between 2011 and 2016 at the subway station-level.

Analysis of the commuters indicate that census tracts with a higher average commute

time, greater employed population, higher per capita income and lower median household

income were found to have a higher percentage of commuters using public transportation.

Additionally, this percentage increased if neighboring census tracts had a greater commercial

space area or fewer buildings. Lastly, the percentage increased in a tract if it increased in

neighboring tracts and vice-versa. The may be reflecting social norms or stigma related to public

transportation versus personal vehicle ownership.

Results of the change in subway ridership between 2011 and 2016 indicate that subway

stations that serve more train lines or are in areas comprised of census tracts with a greater

number of tax units (residential, commercial, etc.) or lower mean household incomes

Page | 5

experienced a greater increase in ridership. Furthermore, subway stations located in areas

surrounded by census tracts with more commercial property or higher median family income are

also expected to have a greater increase in ridership. Lastly, ridership at a given station decreases

due to an increase in ridership at neighboring stations. This may indicate that a change in

ridership at a station is due, in part, to riders in a region changing which station they use instead

of riders shifting from alternative modes of transportation.

The spatial models were found to have a higher overall model fit compared to their non-

spatial counterparts. Moreover, spatial dependence was found to be statistically significant in

both models. Failure to account for spatial dependence in estimating public transportation use at

the census tract or station level could lead to biased, inefficient or inconsistent parameter

estimates. The completed research can help public agencies better address resource allocation by

identifying locations for network expansion or locations that are over or underperforming in

terms of expected ridership.

Page | 6

1 INTRODUCTION

1.1 Background

Municipalities invest in public transportation systems, in part, to combat increasing road

congestion. Investments in subway and bus rapid transit systems directly remove drivers from

the road network onto alternative transportation systems. Investments into on-road public

transportation, such as local and city public bus systems, help to alleviate congestion by

increasing vehicle occupancy rates and thereby reducing the average space in the network

consumer by each individual user. Investments in rail networks, such as region’s commuter rail

or city’s subways, can result in users shifting off of the road network for a portion of their trips.

The biggest impact can be expected during the busiest travel times, specifically during the

morning and evening commute. Analysis of public transit use and commuters is complicated by

the fact that public transit use in general and specifically for the purpose of commuting is not

constant over space. There are census tracts in New York City that have less than 10% of

commuters reporting using public transportation while other census tracts in the city report over

90%. In a city with easy access to public transportation (bus, subway, etc.), there is a need to

explore the influential factors that contribute to this variation in use. This information could help

public agencies better address resource allocation by identifying locations that are over or

underperforming in terms of expected ridership or identifying locations for network expansion.

The impact that land use and socioeconomic demographics have on public transportation

use for the purpose of commuting in not constant over space. Previous research on this topic

have implicitly ignored the direct and underlying spatial processes (Cervero, 1993; Kitamura et

al., 1997; Kyte et al; Kain and Liu 1999; Boarnet and Greenwald, 2000). In reality, it is observed

that passengers embarking or disembarking at a given transit stop live, work, and recreate in both

Page | 7

the immediate neighborhood as well as the adjacent neighborhoods. If public transportation use

is high in a given census tract it could indicate that access to public transportation is especially

convenient in the census tract which would could increase the percentage of commuters using

public transportation in neighboring tracts. There could also be a negative stigma associated with

public transportation in a given tract that could impact the decision making of those living in the

census tract as well as those living in neighboring census tracts. Vehicle ownership is seen by

some as a symbol of social status with public transportation perceived as a less desirable mode of

travel. These social norms can propagate out of a region and effect the decision making of

neighboring areas. These complex spatial interactions are crucial to understanding the observed

variation in public transportation use.

Page | 8

1.2 Scope and Motivation

This research combined big-data visualization with spatial econometric modeling. Data

collection included demographic and economic census data analyzed at the census tract-level

paired with comprehensive land use and valuation based on tax lot records. This enabled the

visualization of the complex interrelationships in the data. Spatial econometric models were used

to capture the complex spatial trends that characterize the relationship between the influential

factors and public transit use. The five boroughs of New York City (NYC) are used as a case

study (Figure 1). The research investigates two important aspects of public transportation use.

The first is public transportation used specifically for commuting regardless of transit mode.

These insights were then used to develop models to estimate the five-year change in subway

ridership. The underlying causes for variability in public transportation use and ridership are not

constant over space which, if left unaccounted for in statistical and econometric models, will

yield biased, inefficient, and inconsistent results (Anselin, 1988a; 2006; Anselin and Rey, 2014).

The impact of spatial dependence can be investigated using lagged independent variables, known

as cross-regressive terms, lagged dependent variables, and/or by applying a spatial process to the

error term. In the context of the current research, cross-regressive terms quantify the change in

public transportation use or subway ridership in a given census tract or at given subway station

due to the land use and socioeconomic characteristics of neighboring census tracts. The lagged

dependent variable captures the propensity for the public transportation use/subway ridership at

one location to be impacted by the use/ridership in neighboring locations. In doing so, the lagged

dependent variable accounts for global spillovers in that the use/ridership at one location is a

function of the use/ridership of its neighbors, which is a function of their neighbors’

use/ridership, and so on. The overall goal of the research is to develop efficient and unbiased

Page | 9

models for commuters’ use of public transportation and the five year change in subway ridership.

The resulting models can be used by public agencies and decision makers to identify locations

that are over or underperforming in terms of expected use/ridership for the purposes of resource

allocation while also providing the framework to identify the best geographic areas for network

expansion.

Figure 1. Study Area: New York City Boroughs

1.3 Review of Current Literature

Understanding the link between the socioeconomic conditions of a region and the

propensity for those who reside in the region to use public transportation informs transportation

policy-making at the state, regional, and local levels. Transportation agencies and policy makers

have long grappled with shrinking budgets precipitating the need to optimize investment while

Staten Island

Bronx

Queens

Brooklyn

Manhattan

Page | 10

ensuring an equitable distribution of the benefits of public transportation across user groups. In

response to this need, research has continuously sought to better quantify underlying causes for

variance in public transportation use in geographic regions. The body of literature on this topic

has implicitly assumed that the causes of variance are constant over space. However, if this

assumption is not reflected in the actual data then the resulting econometric models would have

the potential to lead to biased, inefficient, and inconsistent results (Anselin, 1988a; 1988b; 2006;

Anselin and Rey, 2014). People are not limited to the public transportation options provided in

the census tract in which they live, but are more likely to use public transportation options closer

to their homes. For this reason the propensity for the people living in a given area (census tract)

to use public transportation in general or a specific mode is informed by the characteristics of

their home and neighboring tracts.

Previous research has investigated the variation in transit use over time as a function of

economic and land-use characteristics for a limited number of locations (Boarnet and Greenwald,

2000; Kain and Liu 1999). Research focusing on larger cross-sectional studies of public

transportation for commuting and non-commuting trips (Cervero, 1993; Boarnet and Greenwald,

2000; Kitamura et al., 1997) have not accounted for spatial effects. Limited research has been

conducted that directly investigates the underlying spatial processes evident in the data. The

research that has addressed the issue focused on unobserved spatial process (spatial error) at the

state-level without accounting for local spillovers (Chakrabortya and Mishrab, 2013). In

additional to the limited research on public transportation, past research has used spatial

econometric analysis to determine the relationship between socioeconomic factors and vehicle

use and vehicle ownership (Badoe and Miller, 2000; Volovski, 2015). Some of the research

estimated the average individual or household vehicle-miles-traveled (for a zip-code or census

Page | 11

tract) as a function of local and lagged socioeconomic variables (Frank et al., 2000; Cook et al.,

2012). In addition to the limited past research on spatial modeling of vehicle use and ownership

data, there have been spatial econometric applications in other areas of transportation research,

most notably in transportation safety data analysis and modeling. Spatial autocorrelation

regression estimation techniques have been used to model crashes involving vehicles and

pedestrians (LaScala et al., 2000; Schneider et al., 2000), and vehicles only (Boarnet and

Greenwald, 2000; Li et al., 2007; Aguero-Valverde and Jovanis, 2008; Erdogan 2009).

Furthermore, research has shown the influence of socioeconomic characteristics on vehicle crash

rates across regions (Kirk et al., 2005; Stamatiadi and Puccini 1999).

Page | 12

2 DATA

Social, economic, and land use data were collected and analyzed. The data was used to

investigate two measures public transportation use in large metropolitan centers. The first

measure, commuter transit use, is defined as the percentage of commuters in a census tract that

that use public transportation as their primary means of travel to and from work. The second

measure, change in subway ridership, is defined as the five year change in subway ridership by

station based on annual ridership data from 2011 to 2016.

Social, economic, and land use data was aggregated at the census tract-level due to the

relative consistency across tracts in terms of key characteristics such as population size (between

2,000 and 8,000). High quality data for the 2166 census tracts (Figure 2) across the five boroughs

of New York City is available from regional, state, and national sources. New York City was

chosen as the case study location due to availability and accessibility of its public transportation

system. Its extensive network of public transportation modes includes the nation’s largest

subway system in terms of ridership, length, and number of stops and largest bus system in terms

of total ridership (APTA, 2017).

Page | 13

Figure 2. New York City Census Blocks

2.1 Data Collection

Data were obtained from multiple sources. Social and economic data, including

commuter mode choice, were obtained from the 2015 American Community Survey conducted

by the United States Census Bureau (U.S. Census, 2015). Land use data were obtained from the

2017 PLUTO database administered by the New York City Department of City Planning

(PLUTO, 2017). Subway ridership data were obtained from the Metropolitan Transportation

Authority. Table 1 provides a brief summary of the descriptive statistics for the factors that have

been shown to influence transit use.

Page | 14

Table 1 Descriptive statistics for select census tract variables

Average

Standard Deviation

1st Quartile

Median 3rd

Quartile Max

Percent commute with public transportation 0.547 0.166 0.434 0.572 0.678 1.000

(0.000 – 1.000)

Mean travel time to work 40.45 7.17 36.80 41.10 44.70 100.00

(minutes)

Mean household income 78,555 44,948 52,538 69,434 89,512 435,803

(in 2014 inflation adjusted dollars)

Median household income 58,543 28,919 38,608 53,868 73,044 250,000


Per capita income 31,488 24,732 18,364 24,746 34,403 247,852


Percent with health insurance 0.868 0.073 0.828 0.874 0.920 1.000

(0.000 – 1.000)

Building area 2.527 2.955 1.174 1.788 2.690 54.138

(gross area for all buildings in 1,000,000ft2)

Commercial area 0.841 2.117 0.136 0.300 0.623 26.210

(gross commercial area in 1,000,000ft2)

Land area 2.976 8.755 1.098 1.313 2.134 214.976

(land area in ft2)

Number of residential units 1,625 1,290 841 1,338 2,008 14,338

(total)

Total assessed value 160.21 418.27 31.09 52.92 99.06 6,800.97

(total of all tax lots in $1,000,000)

New York City subway ridership and station location data were obtained from two open-

sourced databases compiled by the New York City Metropolitan Transportation Authority (MTA

2015; 2017). Ridership data included the annual ridership by borough and average weekday and

weekend ridership for each subway station. The total subway ridership per borough is broken

down in Figure 3.

Page | 15

Figure 3. Annual New York City Subway Ridership by Borough

It is important to note that the ridership statistics reflect the number boarding at each

station not the total volume of riders at the station (entering, exiting, and pass-through). Figure 4

shows the location of the subway stations. The stations are spread across all the boroughs except

Staten Island. Station locations were obtained from the MTA station entrance database. This

dataset included; the coordinates (latitude and longitude) for all entrances to each station, the

coordinates of each station, the subway lines accessible at each station, and the ADA

accessibility of each entrance. Since ridership data was station specific, the location and

accessibility data obtained from the station entrance database had to be grouped by station before

being merged with the ridership data.

-

200

400

600

800

1,000

1,200

2011 2012 2013 2014 2015 2016

Annual

Ridership

(Millions)

Manhattan

Brooklyn

Queens

Bronx

Page | 16

Figure 4. New York City Subway Stations

It can be difficult to quantify the factors that influence subway ridership in a region

where the variation in ridership is immense. For instance, in 2016 the average ridership at the ten

busiest subway stations was over 32 million annual riders, whereas the average at the ten stations

with the lowest ridership was just over 250 thousand annual riders. Investigating the change in

ridership, in terms of magnitude or percent change, can reduce the variability in the data, thereby

allowing for a more nuanced analysis of the underlying social, economic, and land use factors

that influence subway ridership. The change in annual ridership and the percent change in annual

rider between 2011 and 2016 is illustrated in Figure 5 and Figure 6, respectively.

Page | 17

Figure 5. Change in Annual Subway Ridership 2011-2016 by Station

Figure 6. Percent Change in Annual Subway Ridership 2011-2016 by Station

Page | 18

3 METHODOLOGY

3.1 Overview

The social, economic, and land use data detailed in the previous section were used to

investigate two measures of public transportation use in large metropolitan centers. The first

measure, commuter transit use, is defined as the percentage of commuters in a census tract that

use public transportation as their primary means of travel to and from work. The second

measure, change in subway ridership, is defined as the change in subway ridership from 2011 to

2016. Its extensive network of public transportation modes includes the nation’s largest subway

system in terms of ridership, length, and number of stops and largest bus system in terms of total

ridership (APTA, 20170). Spatial econometric modeling techniques were used to investigate the

factors effecting transit use while accounting for spatial processes. All spatial econometric

modeling was completed using the spatial software GeoDa and GeodaSpace (Anselin et al.,

2006a).

3.2 Spatial Weights Matrix

The spatial weights matrix is used to define the connectivity between a location and its

neighbors. Connectivity can be defined by the form (rook, queen/king, k nearest neighbors,

distance) and the extent (order or number). Connectivity in 1st order rook matrix are all regions

that share an edge, a 1st order queen/king matrix is a rook matrix that includes regions that only

share a single vertex, elements in a k-nearest neighbor matrix all have the same number of

neighbors (k), and connectivity in a distance matrix is defined by the distance between the

regions, typically measure between the regions’ centroids (Anselin and Rey, 2014; Cliff and Ord,

1981).

Page | 19

3.2.1 Moran’s I

The Moran’s I is a measure of spatial autocorrelation of a given variable and is defined in

Equation 1. The null hypothesis is that the variable exhibits spatial randomness, the alternative is

spatial dependence. Moran’s I takes the form:

𝐼 =𝑁

∑ ∑ 𝑤𝑖𝑗𝑗𝑖

∑ ∑ 𝑤𝑖𝑗𝑗𝑖 (𝑋𝑖 − �̂�)(𝑋𝑗 − �̂�)

∑ (𝑋𝑖 − �̂�)2

𝑖

1

Where (xi – x) is the rate of region i centered on the mean for i≠j, N is the number of

regions, and wij is the weight between region i and j. The statistical significance of Moran’s I can

be determined using random permutations where the variable of interest is randomly reassigned

across the geographic units.

3.3 Models for Spatial Dependence and Spatial Heterogeneity

Spatial process models can take a variety of forms depending on which functional

components (dependent variable, independent variables, and/or error) have a spatial process

applied (Anselin, 1988a; 1988b; Anselin and Rey, 2014). Spatial error models can be estimated

to ensure that regression estimation is efficient in instances where spatially correlated error terms

are observed in the dataset (Anselin, 1988a). The spatial error model takes the form Anselin,

1988b; Anselin and Rey, 2014):

𝑦 = 𝛽𝑥 + 𝜀

𝜀 = 𝜆𝑊𝜀 + 𝜇

2

where the dependent variable, y, is a function of a vector of independent variables, x, and

a spatial error term 𝜀. Spatial autocorrelation is accounted for in the error term by introducing the

Page | 20

weights matrix, W. μ is a non-spatial error term. The estimated parameter λ can be tested to

determine if the spatial error is statistical significance.

Spatial dependence can be accounted for by using a spatial lag or a cross regressive

model. Failure to account for spatial dependence will result in biased and inconsistent parameter

estimates (Anselin, 1988b; Anselin, 2006). The dependent variable in a spatial lag model is

estimated as a function of the observation’s independent variables and its neighbor’s dependent

variable. In the context of the commuter transit use model it means that the dependent variable,

percentage of commuters using transit in a given census tract, is a function of the attributes of the

census tract and the percentage of commuters that use transit in neighboring census tracts. Cross-

regressive estimation allows for the dependent variable to be estimated as a function of the

attributes of the given census tract and the attributes of neighboring census tracts.

The Spatial Durbin Model incorporates both spatial lag and cross-regressive terms. It

takes the form (Anselin, 1988b; Anselin and Rey, 2014):

𝑦 = 𝜌𝑊𝑦 + 𝛽𝑥 + 𝛾𝑊𝑍 + 𝜇 3

where Wy is the spatial lag term, WZ is a vector of cross-regressive terms, and ρ and γ are

coefficients for the lagged dependent variable and lagged independent variables (cross-regressive

terms), respectively. It is important to note that regardless of the type of weights matrix chosen,

the lagged independent variable and spatial error term will account for global spillovers. The

endogeneity of the lagged dependent variable can be overcome with two-stage least squares

estimation (2SLS), a special case of instrumental variables (IV) (Anselin and Rey, 2014). The

General Spatial Durbin is a special case of the Spatial Durbin where spatial lag, spatial error, and

cross regression are all statistically significant.

Page | 21

3.3.1 Tests for Spatial Lag and Spatial Error

The Lagrange Multiplier (LM) and the robust LM test for spatial lag are used to

determine if spatial error, spatial lag, or spatial error and lag are statistically significant in the

data (Anselin, L.,1988c; Anselin et al., 1996; Anselin and Rey, 2014, ). The LM test for spatial

error determines if the spatial error coefficient is statically different from zero. Likewise, the LM

test for spatial lag determines the statistical significance of the spatial lag coefficient (Anselin,

2006). The null hypothesis for the LM test for error is that spatial error coefficient (λ) is equal to

zero. Equation 4 details the LM test for error.

H0: λ = 0

H𝐴: λ ≠ 0

for 𝑦 = 𝛽𝑥 + λW + ε

𝐿𝑀λ = [e′We

s2]

2

𝑇⁄ ~ 𝜒12

𝑇 = 𝑡𝑟[(𝑊′ + 𝑊)]

𝑠2 = 𝑒′𝑒/𝑛

4

Likewise, the null hypothesis for the LM test for lag is that the spatial lag coefficient (ρ)

is equal to zero. Equation 4 details the LM test for lag.

Page | 22

H0: λ = 0

H𝐴: λ ≠ 0

for 𝑦 = ρWy + 𝛽𝑥 + μ

𝐿𝑀ρ = [e′Wy

s2]

2

𝑛𝐽ρβ⁄ ~ 𝜒12

𝐽ρβ = [(𝑊𝑋𝛽)′𝑀(𝑊𝑋𝛽) + 𝑇𝑠2] 𝑛⁄ s2

𝑀 = 𝐼 − 𝑋(𝑋′𝑋)−1𝑋′

5

The LM test for spatial lag and the LM test for spatial error are unidirectional in that they

only test for the presence of one spatial process. The robust LM tests can be used to test for either

spatial lag or spatial error while accounting for the presence of the other. The robust LM test for

lag and error are provided in Equations 6 and 7, respectively

𝐿𝑀λ∗ = [e′We

s2− 𝑇(𝑛𝐽ρβ)

−1 e′Wy

s2]

2

𝑇[1 − 𝑇(𝑛𝐽ρβ)−1

⁄

[e′Wy

s2−

e′We

s2]

2

/[ 𝑛𝐽ρβ − 𝑇]

6

7

The Anselin-Kelejian (AK) test is a variant of the Moran’s I statistic applied to the

residuals of the estimation. If the AK test results are statistically significant it indicates that there

is spatial autocorrelation left unaccounted for in the developed model (Anselin and Kelejian,

1997).

Page | 23

4 RESULTS: COMMUTERS USING PUBLIC TRANSPORTATION

New York City is home to one of the Nation’s most robust public transportation

networks. Even though there is nearly uninhibited access to the network, the variability in its use

as a primary mode choice for commuters ranges from 10% to 90% across geographical units.

This section presents the results of spatial econometric modeling to examine the social,

economic, and land use characteristics that influence public transportation ridership for work

trips. A non-spatial model, estimation without spatial error or lag, was estimated to provide a

baseline for comparison. The non-spatial model results are presented in Table 2. The results

indicate that the percentage of commuters using public transit living in a census tract is a

function of the employed population, per capita income, mean travel time to work, and median

family income. The R-squared and adjusted R-squared values indicate that the non-spatial model

is explaining 32% of the variance exhibited in the census tract commuter transit data.

The non-spatial model has a corresponding multicollinearity condition number of 21.91.

This value is used as an indication of the degree to which explanatory variables show a linear

relationship. In statistics, it is generally agreed upon that multicollinearity should be addressed if

the condition number is greater than 30 (Anselin and Rey, 2014). The Jarque-Bera test for non-

normality of the error terms was significant at a 99% level of confidence (Jarque and Bera,

1980). Therefore, in order to test the residuals for homoscedasticity (consistency of the error

variance) a Koenker–Basset test, a variant of the Breusch-Pagan test (Breusch and Pagan, 1979),

was used because, unlike the Braush-Pagan test, it does not assume normality of the error terms.

The Koenker–Basset test value was significant at a 99% level of confidence. To account for

heterogeneity, White-adjusted standard errors that are robust to heteroskedasticity were used

(White, 1980). A General Spatial Durbin specification was estimated to obtain the coefficient for

Page | 24

the spatial error term, λ. It had an estimated coefficient of -0.042 and a corresponding 61% level

of confidence, well below any acceptable threshold to reject the null hypothesis of no spatial

error. Therefore, final model specification is the Spatial Durbin.

4.1 Spatial Autocorrelation

Spatial autocorrelation is a measure of the correlation a variable has with itself in space

(Anselin, 1988a; 1988b; Anselin and Rey, 2014). Preliminary analysis of spatial autocorrelation

can be completed by investigating a plot of the dependent variable over space to discern if there

appears to be spatial clustering of relatively higher or lower values (positive autocorrelation).

Figure 7 shows that there are clusters of census tracts with a higher percentage of commuters

using public transit in northern Manhattan, selected areas of the Bronx and Brooklyn, whereas

there are clusters of census tracts in Queens and Staten Island with relatively lower transit use.

Figure 7. Quantile Plot for Percentage of Commuters Using Public Transportation

0 - 42.45%

42.45% - 56.72%

56.72% - 67.44%

67.44% - 100.00%

Page | 25

When spatial autocorrelation is positive it indicates that a census tract with a higher

percentage of commuters using public transit correlates with neighboring tracts that also have a

higher percentage (or smaller values correlate to smaller neighbor values). Negative

autocorrelation occurs when greater values are correlated with smaller neighbor values (and vice-

versa). Spatial clustering is visible in the local indicator of spatial association (LISA) cluster map

shown in Figure 8. The LISA map is best defined by Anselin as: “The LISA for each observation

[say, a small region among a set of regions] gives an indication of significant spatial clustering of

similar values around that observation. The sum of LISAs for all observations is proportional to a

global indicator of spatial association” (Anselin, 1995; Anselin et al., 1996).

Autocorrelation is found to exist in nearly half of the census tracts (960 out of 2106).

Positive correlation was found in 917 tracts of which 503 were high-high spatial clustering and

414 were low-low spatial clustering. High-high clustering is primarily found in tracts spread

across northern Manhattan, southern Bronx, and northern Brooklyn. In these areas, a census tract

with a higher percentage of transit use is more likely to have neighboring census tracts with

higher transit use. There were only 43 instances of negative autocorrelation which means a

census tract with high transit use is more likely to have a neighbor with low transit use. There are

three neighbor-less census tracts which are located on islands with a small enough population to

require a single census tract (unlike an island such as Roosevelt Island that is comprised of

multiple census tracts).

Page | 26

Figure 8. LISA Cluster Map for Percentage of Commuters Using Public Transportation

The statistical measure, Moran’s I, is used to measure the spatial autocorrelation in the

dataset and can be compared to a randomized spatial set to test the hypothesis of the presence of

spatial autocorrelation (Cliff and Ord, 1981). A plot of the Moran’s I is presented in Figure 9.

The Moran’s I was calculated to be 0.58047 with a corresponding z-score of 45.36 and p-value

of 0.002 based on 999 random permutations. This indicated that there is spatial heterogeneity at a

99.8% level of confidence.

Page | 27

Figure 9. Moran’s I for 1st Order Queen Matrix (Percentage of Commuters Using Public

Transportation)

4.2 Model for Spatial Dependence

It was determined that the commuter transit use data exhibits spatial processes

demonstrated by a lagged dependent variable and two cross-regressive terms. The spatial

processes are best captured using a 1st order queen matrix. Figure 10 provides a histogram of the

1st-order queen connectivity of census tracts in NYC in terms of the number of neighbors each

tract has.

% of Commuters Using Public Transportation

Page | 28

Figure 10. Connectivity Frequency Distribution for 1st Order Queen Matrix (Percentage of

Commuters Using Public Transportation)

The resulting Spatial Durbin model is presented in Table 2. The results were used to

investigate the influence each variable had on the expected percentage of commuters using

public transportation in each census tract.

3 824

91

275

599575

341

141

5518 14 8 1 1 4 1

0

100

200

300

400

500

600

700

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Freq

uen

cy

Number of Neighbors

Page | 29

Table 2. Commuters Using Public Transportation Modeling Specification Results

Response Variable: Percent of Commuters using Public Transit in the Census Tract (0.000 – 1.000)

Non-spatial Spatial Durbin w/

White Adj. Std. Errors

Independent Variables Coefficient Coefficient

Constant 0.2007*** -0.1243***

Mean travel time to work (minutes) 0.0086*** 0.0068***

Employed population (age 16 and older in 1,000s) 0.0295*** 0.0162***

Per capita income (in $10,000s) 0.0410*** 0.0127***

Median family income (in 2014 inflation adjusted dollars) -0.0409*** -0.0120***

Total assessed value of all tax lots (in $100,000,000) -0.0293*** -0.0206***

Percent with health insurance (0.000 – 1.000) 0.0592** 0.1298***

Cross-Regressive Variables (1st Order Queen Weights Matrix)

Number of buildings (total for all tax lots) -0.0001***

Percent commercial floor area (com/total; 0.000 – 1.000) 0.1110***

Spatial Lag Variable (1st Order Queen Weights Matrix)

Percent of commuters using public transit (0.000 – 1.000) 0.5789***

Model Statistics

Number of Observations 2166 2166

R-squared 0.3248

Adjusted R-squared 0.3232

Pseudo R-squared 0.6993

Spatial Pseudo R-squared 0.5167

***99% Level of Confidence, **95% Level of Confidence, *90% Level of Confidence

The estimated coefficients were determined to be statistically significant at the 99% level

of confidence. The resulting model experienced an increased statistical fit compared to the non-

spatial model exhibited by a spatial pseudo R-squared value 0.5167.

The results show that census tracts with a higher average commute time have a higher

percentage of commuters using public transportation. This may indicate that more secluded

census tracts are better served by public transportation compared to alternative travel modes such

as driving your own vehicle. It was shown that census tracts with a greater employed population

Page | 30

have a higher percentage of commuters using public transit. Two variables were included in the

model to capture the relative wealth of the individuals in the census tract. An increase in per

capita income increases the expected percentage commuters relying on public transportation

whereas the median household income had an inverse relationship with the dependent variable.

Census tracts with a higher household income would be expected to have a greater number of

households that can afford at least one vehicle, which is the main alternative to public

transportation use for commuting. This relationship is supported by the total assessed value

variable. Census blocks with more valuable real estate would be expected to house a greater

percentage of commuters that could more easily afford the costs associated with car ownership in

the city. It would also be expected that these employed people would be more likely to have jobs

with flexible work hours allowing them to avoid rush hour traffic. An increase in health

insurance increases the percentage of commuter trips using public transportation. This may

indicate that there is a stigma or distrust of public agencies among certain subsets of the

population regardless of whether the agency provides health insurance or access to

transportation.

The framework identified two statistically significant cross-regressive terms. An increase

in the number of buildings in neighboring census tracts reduces the percentage of commuters

using public transportation. This may indicate that people feel safer walking to work if they are

in more built-up areas and therefore wouldn’t rely as heavily on the bus or subway. Also, the

number of buildings might translate to an increased number employment opportunities allowing

a larger percentage of commuters to walk to work as opposed to using public transportation. It

was determined that as the ratio of commercial floor space relative to total floor space of all

buildings increases in neighboring tracts, the percentage of commuters using public

Page | 31

transportation increases. Deliveries and customers arriving to these commercial spaces may

make driving through these census tracts difficult thereby increasing the percentage of

commuters using public transportation options to avoid driving to work.

Lastly, the lagged dependent variable was determined to be statistically significant. This

means that an increase in the percentage of commuters who use public transportation increases in

neighboring tracts increases the percentage of commuters who use public transportation in the

given tract. This impact propagates back from the neighbors’ neighboring tracts, and so on,

providing for global spillovers. These impacts were shown in the raw data quantile and Lisa

plots, Figure 7 and Figure 8, respectively. This means that use of public transportation is

increased in a region if its use is high in neighboring regions, perhaps reflecting social norms.

This also means that social pressures may reduce public transportation use if users feel that their

neighbors aren’t using public transportation. The hypothesis that these global effects were

instead a function of an unobserved underlying spatial process was rejected because the spatial

error coefficient was not statistically significant.

Page | 32

5 RESULTS: SUBWAY RIDERSHIP

A metropolis’s subway system is typically the most expensive public transportation

network. As such it is especially importation to understand the underlying causes of ridership

fluctuations so that demand can be accurately forecasted. The current research focused on the

five year change in ridership at subway stations between 2011 and 2016. The land use

characteristics in the immediate area of the subway station and the socioeconomic characteristics

of those individuals living in the area was used to investigate ridership change. The area

associated with each station was defined according to the shortest arc distance between a location

and subway station. Figure 11 illustrates this process. In the figure, all locations whose closest

subway station (black dots) are considered to be the station’s attributable area (blue polygon).

The socioeconomic and land use characteristics of the census tracts (gray polygons) were joined

with the subway data using GIS applications in the software R (R Core Team, 2017). These

census-tract level characteristics were weighted according the percentage of the attributable area

comprised by each tract. Some manual data cleaning was required to remove census tracts

attributed to a given subway station that did not have a way for the individuals of that census

tract to access the subway station (this typically occurred in the limited instances where a

station’s attributed area crossed a natural boarder such as water).

Page | 33

Figure 11. Subway Station Attributable Area

As is evident in Figure 11, the geographic area attributed to each subway station is much

greater as one moves east from the Bronx and Manhattan to Queens and Brooklyn due to a

decrease in station density. For this reason the subsequent subway ridership analysis was limited

to the boroughs of Manhattan and the Bronx. In these regions individuals have greater access to

the subway network with an increased likelihood of using a station that isn’t the closest to their

residence if it serves a different rail-line thereby reducing an individual’s overall trip duration. It

is therefore expected that spatial processes will be especially pronounced in these boroughs.

Page | 34

5.1 Spatial Autocorrelation

Figure 12 illustrates spatial distribution of the change in subway ridership between 2011

and 2016. A preliminary indication of spatial autocorrelation is evident in the appearance of

spatial clustering of higher values in mid and lower Manhattan and northwestern sections of the

Bronx.

Figure 12. Quantile Plot for Change in Subway Ridership 2011-2016 (in Millions)

Distance-based weights matrices were used to investigate at what distance are the lagged

independent and dependent variables significant. The first defined neighbors as subway stations

separated by less than 1.0 miles the second defined neighbors as stations separated by less than

1.5 miles. An example of 1.0 mile neighbor is provided in Figure 13 where the 8 stations

indicated by gray dots are defined as the neighbors of the station indicated with a black dot.

-0.896

0.033

0.210

0.500

:

:

:

:

0.033

0.210

0.500

7.259

Page | 35

Figure 13. Example Neighbor Diagram (1.0 Mile Distance Matrix)

Figure 14 provides the corresponding histogram of the 1.0 mile connectivity of subway

stations in NYC in terms of the number of neighbors for each station.

Figure 14. Connectivity Frequency Distribution for 1.0 Mile Weights Matrix

0

2

4

6

8

10

12

14

16

18

20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Freq

uen

cy

Number of Neighbors

Page | 36

A plot of the Moran’s I value for the change in subway ridership for 1.0 mile neighbor’s

weight matrix is presented in Figure 15. The Moran’s I was calculated to be 0.096 with a

corresponding z-score of 3.48 and p-value of 0.002 determined using 999 random permutations.

This indicated spatial heterogeneity at a 99.8% level of confidence. This process was repeated

for the 1.5 mile neighbor weights matrix (Figure 16) with a corresponding Moran’s I of 0.069

which is statistically significant at a 99.1% level of confidence.

Figure 15. Moran’s I for Change in Ridership 2011-2016 (1.0 Mile Distance Matrix)

Change in Subway Ridership

Page | 37

Figure 16. Moran’s I for Change in Ridership 2011-2016 (1.5 Mile Distance Matrix)

5.2 Model for Spatial Dependence

The change in subway ridership is best modeled using lagged dependent and independent

variables. It was determined that the spatial lag of the dependent variables was best captured

using a 1.0 mile distance-based weights matrix, whereas the lagged independent variable should

be weighted using a 1.5 mile distance based matrix. The LM test for spatial lag was significant at

a 90% level of confidence whereas the LM test for spatial error was not statistically significant.

The robust LM test for spatial lag and the robust LM for spatial error were both statically

significant at a 95% level of confidence. This indicates that the best model for the data is one that

incorporates spatial lag and may need to incorporate spatial error. A General Spatial Durbin

model (incorporating both spatial lag and error) was estimated using generalized methods of

moments. The spatial error coefficient was determined to be statically insignificant. Therefore

Change in Subway Ridership

Page | 38

the final model specification was a Spatial Durbin with three independent variables, two cross-

regressive variables and a lagged dependent variable. Results of the Anselin-Kelejian Test do not

indicate the presence of un-accounted for spatial processes at a statistically significant level.

Table 3 provides the results of the non-spatial and Spatial Durbin models. The non-spatial model

was calculated to illustrate the importance of the underlying spatial processes. The non-spatial

model had an adjusted r-squared value of 0.414. The inclusion of the lagged variables increased

the model fit as evidenced by the pseudo and spatial pseudo R-squared values of 0.471 and

0.461, respectively. What is important to note is the change in the estimated parameters in the

Spatial Durbin model compared to the non-spatial base case.

Table 3. Commuters Using Public Transit Modeling Specification Results

Response Variable: Change in Ridership from 2011-2016 (in Millions)

Non-spatial Spatial Durbin

Independent Variables Coefficient Coefficient

Constant -0.4111*** -0.3154***

Number of train lines 0.3191*** 0.2969***

Total units (in 1,000s) 0.1403 *** 0.0988**

Mean household income (in $10,000s) -0.0200 *** -0.0390***

Cross-Regressive Variables (1.0 Mile Weights Matrix)

Gross commercial area (in 1,000,000ft2) 0.1000**

Median family income (in $10,000s) 0.0370***

Spatial Lag Variable (1.5 Mile Weights Matrix)

Change in Ridership from 2011-2016 (in Millions) -0.7384*

Model Statistics

Number of Observations 185 185

R-squared 0.423

Adjusted R-squared 0.414

Pseudo R-squared 0.471

Spatial Pseudo R-squared 0.461

***99% Level of Confidence, **95% Level of Confidence, *90% Level of Confidence

Page | 39

Results of the Spatial Durbin model indicate that subway stations that serve more train

lines experienced a greater increase in ridership. This result is especially interesting because

variables that quantified the total ridership in 2011 and average ridership per line in 2016 were

both statistically insignificant. This means that it is not necessary the volume of initial

passengers that is best indication of future demand change, rather it is the extent of services, in

this case the number of available train lines (routes). The results also indicate that stations

located in areas comprised of census tracts with a greater number of tax units (residential,

commercial, etc.) and lower mean household incomes experienced a greater increase in ridership.

The result of the income is consistent with previous research and captures the greater reliance on

subway for lower income households. Two statistically significant cross-regressive variables

were identified. Subway stations located in areas surrounded by census tracts with more

commercial property or higher median family income are expected to have a greater increase in

ridership. Lastly the estimated parameter for the spatial lag variable was negative and statistically

significant. This indicates that the ridership at a given station decrease in response to an increase

in ridership at neighboring stations. This may indicate that a portion of the change in ridership at

a station is due to riders in a region changing which station they use instead of riders shifting

from alternative modes of transportation.

Page | 40

6 SUMMARY AND CONCLUSIONS

In an effort to help metropolitan planning agencies better understand current and future

demand of their public transportation networks this research investigates the underlying social,

economic, and land use factors that impact transit ridership regardless of transit mode, then uses

this insight to estimate specific models to help forecast changes in subway ridership. A spatial

dataset that covered all of the nearly 2,200 census tracts in the city was used as the basis of the

research. The dataset consisted of US Census social and economic data paired with New York

City Department of City Planning land use data and subway ridership data. The results showed

that the underlying causes for variability in public transportation ridership was not constant over

space. Moreover, the spatial processes were determined to be statistically significant in both the

commuter transit and subway ridership model. To account for these spatial processes (spatial

autocorrelation, heterogeneity, and dependence) the models were estimated using the Spatial

Durbin specification.

Census tracts with a higher average commute time, greater employed population, higher

per capita income and lower median household income were determined to have a greater

percentage of commuters using public transportation. Local spatial effects also played a

significant role in public transportation use for work trips. Census tracts were found to have a

greater percentage of commuters using public transportation if neighboring census tracts had

more commercial space or fewer buildings. The lagged dependent variable accounted for global

spatial effects. The significance of this variable paired with the insignificance of the spatial error

term shows that social norms and expectations can influence people’s mode choice.

Examination of subway ridership between 2011 and 2016 shows that subway stations

Page | 41

serving more train lines or are in areas comprised of census tracts with a greater number of tax

units (residential, commercial, etc.) or lower mean household incomes experienced a greater

increase in ridership over the study period. Furthermore, subway stations located in areas

surrounded by census tracts with more commercial property or higher median family income are

also expected to have a greater increase in ridership. Lastly, ridership at a given station decreases

due to an increase in ridership at neighboring stations. This may indicate that a change in

ridership at a station is due, in part, to riders in a region changing which station they use instead

of riders shifting from alternative modes of transportation. The completed research can help

public agencies better address resource allocation by identifying locations that are over or

underperforming in terms of expected ridership or identifying locations for network expansion.

Page | 42

7 ACKNOWLEDGEMENTS

The author would like to thank the Region 2 University Transportation Research Center

for supporting this research endeavor. The author would also like to thank Manhattan College

undergraduate research assistants Nicola Grillo and Conor Varga for their hard work and

contributions. The contents of this paper reflect the views of the author, who is responsible for

the facts and the accuracy of the data presented herein, and do not necessarily reflect the official

views or policies of any of the agencies included in the paper, nor do the contents constitute a

standard, specification, or regulation.

Page | 43

8 REFERENCES

Aguero-Valverde, J., & Jovanis, P. (2008). Analysis of Road Crash Frequency with Spatial

Models. Transportation Research Record: Journal of the Transportation Research Board,

55–63.

Anselin, L. (1988a). Spatial Econometrics: Methods and Models (Kluwer, Dordrecht).

Anselin, L. (1988b). Model Validation in Spatial Econometrics: A Review and Evaluation of

Alternative approaches, International Regional Science Review 11, 279-316.

Anselin, L. (1988c). Lagrange Multiplier Test Diagnostics for Spatial Dependence and Spatial

Heterogeneity, Geographical Analysis, No. 20, 1-17.

Anselin, L. (1995). Local Indicators of Spatial Association—LISA, Geographical Analysis, No.

27, 93-115

Anselin, L. (2006). Palgrave Handbook of Econometrics, Volume 1: Econometric Theory,

Chapter 26: Spatial Econometrics, Edited by Terence C. Mills, Kerry Patterson

Anselin, L., & Kelejian, H. (1997). Testing for Spatial Error Autocorrelation in the Presence of

Endogenous Regressors. International Regional Science, 435-444.

Anselin, L., and Rey, S. (2014). Modern Spatial Econometrics in Practice: A Guide to GeoDa,

GeoDaSpace

Anselin, L., Anil, Bera., Florax, R., & Yoon, M. (1996). Simple Diagnostic Tests for Spatial

Dependence, Regional Science and Urban Economics, 26, 77-104

Anselin, L., Syabri I., & Kho, Y., (2006). GeoDa: An Introduction to Spatial Data Analysis.

Geographical Analysis 38 (1), 5-22

Page | 44

APTA. (2017). Transit Ridership Report Fourth Quarter 2016. American Public Transportation

Association. http://www.apta.com/resources/statistics/Pages/ridershipreport.aspx.

Accessed 07-01-217

Badoe, D., & Miller, E., (2000).Transportation Land Use Interaction: Empirical Findings in

North America, and Their Implications for Modeling. Transportation Research Part D:

Transport and Environment.

Boarnet, M and Greenwald, M. (2000). Land use, urban design, and nonwork travel: reproducing

other urban areas’ empirical test results in Portland, Oregon. Transportation Research

Record: Journal of the Transportation Research Board, 1722, pp. 27-37

Breusch, T., & Pagan, A. (1979). Simple Test for Heteroscedasticity and Random Coefficient

Variation. Econometrica, 1287–1294

Cervero, R. (1993). Mixed Land-Use and Commuting: Evident from the American Housing

Survey, Transportation Research Part A: Policy and Practice. Vol. 30, No. 5, pp 361-

377.

Chakrabortya, A. and Mishrab, S. (2013). Land use and transit ridership connections:

Implications for state-level planning agencies. Land Use Policy. Volume 30, Issue 1, pp.

458-469

Cliff, A., & Ord, J. (1981). Spatial Processes: Models and Applications (Pion, London)

Cook, J., Alon, D., Sanchirico, J., & Williams, J., (2012). Driving Intensity in California:

Exploring Spatial Variation in VMT and its Relationship to the Built Environment.

Western Energy Policy Research Conference, Boise, ID

Erdogan, S. (2009). Explorative spatial analysis of traffic accident statistics and road mortality

among the provinces of Turkey. Journal of Safety Research, 341–351.

Page | 45

Frank, L., Stone, B., & Bachman, W., (2000). Linking Land Use with Household Vehicle

Emissions in the Central Puget Sound: Methodological Framework and Findings.

Transportation Research Part D: Transport and Environment, pp.173–196.

Jarque, C. M., & Bera, A. K. (1980). Efficient tests for normality, homoscedasticity and Serial

Independence of Regression Residuals. Economics Letters.

Kain, J. and Liu, Z. (1999). Secrets of success: assessing the large increases in transit ridership

achieved by Houston and San Diego transit providers. Transportation Research Part A:

Policy and Practice, vol. 33 pp. 601-624.

Kirk, A. J., Pigman, J. G., & Agent, K. R. (2005). Socioeconomic Analysis of Fatal Crashes.

Lexington: Kentucky Transportation Center.

Kitamura, R., Mokhtarian, M., and Laidet, L. (1997). A micro-analysis of land use and travel in

five neighborhoods in the San Francisco Bay Area. Transportation, vol. 24 (2), pp. 125-

158.

Kyte, M., Stoner, J., and Cryer, J. (1988). A time-series analysis of public transit ridership in

Portland, Oregon, 1971–1982 Transportation Research Part Al, 22 (5), pp. 345-359

LaScala, E. A., Gerber, D., & Gruenewald, P. J. (2000). Demographic and environmental

correlates of pedestrian injury collisions: a spatial analysis. Accident Analysis &

Prevention, 32(5), 651-658.

Li, L., Zhu, L., & Sui, D. (2007). A GIS Based Bayesion approach for analyzing spatial-temporal

patterns of intra-city vehicle crashes. Journal of Transport Geography, 274-285.

MTA. (2015). NYC Transit Subway Entrance and Exit Data. Metropolitan Transportation

Authority, https://data.ny.gov/Transportation/NYC-Transit-Subway-Entrance-And-Exit-

Data/i9wp-a4ja/data, Accessed: 10/01/2017

Page | 46

MTA. (2017). Introduction to Subway Ridership. Metropolitan Transportation Authority,

web.mta.info/nyct/facts/ridership/, Accessed: 10/01/2017

PLUTO (2017). PLUTO and MapPLUTO. New York City Department of City Planning.

http://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page,

Accessed: 07/01/2017

R Core Team. (2017). R: A Language and Environment for Statistical Computing, R Foundation

for Statistical Computing, Vienna, Austria. https://www.R-project.org

Schneider, R. J., Ryznar, R. M., & Khattak, A. J. (2000). An accident waiting to happen: a

spatial approach to. Accident Analysis and Prevention, 193–211.

Stamatiadis, N. & Puccini, G. (1999). Fatal Crash Rates in Southeastern United States: Why are

They Higher? Transportation Research Record: Journal of the Transportation Research

Board, 118-124.

U.S. Census. (2015). American Fact Finder. United States Census Bureau, U.S. Department of

Commerce, Washington D.C., www. factfinder.census.gov, Accessed: 07/1/2016

Volovski, M. (2015). Spatial analysis of passenger vehicle use and ownership and its impact on

the sustainability of highway infrastructure funding. Dissertation, Purdue University,

West Lafayette, Indiana 47906

White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct

Test for Heteroskedasticity. Econometrica 48 (4): 817–838.

Univ

ersi

ty T

rans

port

atio

n Re

sear

ch C

ente

r - R

egio

n 2

Fund

ed b

y the

U.S.

Dep

artm

ent o

f Tra

nspo

rtat

ion

Region 2 - University Transportation Research Center

The City College of New YorkMarshak Hall, Suite 910

160 Convent AvenueNew York, NY 10031Tel: (212) 650-8050Fax: (212) 650-8374

Website: www.utrc2.org

Download - FinalReport - utrc2.org · Final March 2018 University Transportation Research Center - Region 2 Report Performing Organization: Manhattan College The Spatial Effect of Socio-Economic

Top Related