D. Tischler, L. Zorn and E. Sall Page 1 of 18

Creating CountDracula: an Open Source Counts Management Tool

Daniel Tischler*
San Francisco County Transportation Authority
Tel: 415-593-1661 Fax: 415-522-4829 Email: [email protected]
1455 Market Street, 22nd Floor
San Francisco, CA 94103

Lisa Zorn
San Francisco County Transportation Authority
Tel: 415-593-1660 Fax: 415-522-4829 Email: [email protected]
1455 Market Street, 22nd Floor
San Francisco, CA 94103

Elizabeth Sall
San Francisco County Transportation Authority
Tel: 415-522-4810 Fax: 415-522-4829 Email: [email protected]
1455 Market Street, 22nd Floor
San Francisco, CA 94103

* indicates corresponding author

Submission Date: November 15, 2013 Word Count: 5,054 + 8 Figures x 250 = 7,054 Submitted for presentation at the 2014 Transportation Research Board Annual Meeting.

This paper is for the Transportation Planning Applications Committee to review.

TRB 2014 Annual Meeting Paper revised from original submittal.

Abstract

One of the most persistent problems faced by the San Francisco County Transportation Authority (the Authority) is that of handling a growing collection of counts. Traffic, pedestrian and bicycle counts have been collected by staff, consultants and sister agencies for numerous planning studies at various locations in San Francisco over the years. But how should these counts be organized? Some are in Excel workbooks of varying and spontaneous formats, others consist of scanned handwritten documents, and some exist only on paper.

Since the modeling team at the Authority has a continuous need for these counts in order to calibrate and validate the travel demand model as well as to inform model development, these counts have come under the team’s purview. After a couple of failed attempts to standardize Excel formats and directory structures, the modeling team decided to modernize its counts management system. The Authority first explored proprietary software products, but found these solutions to be too expensive, cumbersome, or inflexible. Instead, Authority staff embarked on developing CountDracula, an open source counts management tool. The aim of CountDracula is to make uploading, downloading and querying counts easy for Authority staff as well as other interested parties outside the organization. The CountDracula code base has been designed to be reusable by other agencies with similar needs, and it is built on GeoDjango, a geographic web framework.

CountDracula includes a web-based map user interface for visualizing the locations of counts (and as a side effect, the locations where more counts are needed), and it includes a query interface so that specific types of counts can be batch-downloaded (for example, midweek counts from the last three years). As it was developed by a modeling team, there is a specific emphasis on counts seamlessly interfacing with model transportation networks. Counts can also be uploaded using this interface, and moderated through an admin interface. This paper explains the development of CountDracula, and explores how CountDracula fits into the open data and open source movement.

Motivation and Background

Like many cities, San Francisco tries to adhere to data-driven decision-making about the most effective ways to invest in its transportation system. The city has a multitude of databases that represent the transit network (San Francisco Municipal Transportation Agency n.d.) as well as a state-of-the-art travel demand model (Outwater and Charlton 2008, Sall, Bent, et al. 2010, Zorn, Sall and Wu 2012) and citywide dynamic traffic assignment model (Sall, Erhardt, et al. 2013) to forecast various future scenarios. However, despite its location near Silicon Valley and possession of high-tech forecasting technology, San Francisco did not have any comprehensive system for measuring the use of its roadways by walking, cycling, or automobile. Specific projects had conducted numerous traffic counts, but there was no systematic or electronic way of storing these counts, and those that were stored in a central location lacked meta-data and context and were often rolled up into single numbers, thereby losing the richness of the original data. Answering seemingly simple questions such as “how bad was traffic in the Northwest part of San Francisco in 2005 compared to 2000” would have been a several-week-long task, and there was no hope of being able to do simple mappings of spatial and temporal trends across the years. The most accessible form of the count data was an Excel file used to validate the SF-CHAMP travel demand model, which was hardly user-friendly.

As the San Francisco County Transportation Authority (the Authority) looked towards being able to validate its Dynamic Traffic Assignment model (Parsons Brinckerhoff; San Francisco County Transportation Authority 2012) with 15-minute count data for the entire city, it became obvious that investment in a robust count database tool would not only save time for that specific model development project, but could also very easily be adapted to be useful to a variety of users, including transportation planners investigating travel patterns for a neighborhood study, consultants doing traffic impact analysis and environmental review, and, in the spirit of San Francisco’s open data policy (Board of Supervisors, City and County of San Francisco 2010), the general public.

Design Considerations

The Authority team developed a list of user requirements for the count database, née CountDracula, based on four user types: local agency transportation planner/engineer, travel modeler, traffic engineering consultant, and the general public. The goals of CountDracula are to:

(1) Store count data electronically in a single location (as opposed to emailing a dozen people to ask them to check their file cabinets).

(2) Allow universal access to the data via a web-based user interface.

(3) Query count data to find specific times, dates, or locations both efficiently (via application programming interface, or API, access) and intuitively (via a web-based graphical user interface, or GUI).

(4) Download data into commonly used data formats (such as the Universal Traffic Data Format, or UTDF) or structured data.

(5) Include varying levels of account permissions (to allow some users to write, and others to just read).

(6) Allow user uploads of data in a variety of formats that integrate into existing workflows.

(7) Maintain important meta-data such as count methodology, associated projects, requesting agency, etc.

The main technology consideration was to use existing tools as much as possible in order to minimize both the development and maintenance burden. The Authority team first sought an existing tool and considered several solutions (Midwestern Software Solutions n.d., Cambridge Systematics n.d., California Department of Transportation n.d.), but each failed to meet one or more of our requirements. After a thorough review of existing options, the Authority decided to create CountDracula, our own open-source count data management system. In particular, many existing systems lack API access to the data, so while it was easy to query a single count, it was more difficult to seamlessly link the data to our travel model and our DTA model, and to do flexible queries and data analysis (e.g. what are all the counts that have more than 10% trucks? or what is the typical standard deviation of the P.M. peak hour in the northwest part of San Francisco?).
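To illustrate the kind of flexible query that programmatic access makes possible, the following is a minimal sketch in plain Python. It is not taken from the CountDracula code base: the record layout, the field names (location, period, total, trucks) and the sample values are all hypothetical and chosen only for illustration.

```python
from statistics import stdev

# Hypothetical flat count records; field names and values are illustrative
# only and do not reflect CountDracula's actual schema or data.
counts = [
    {"location": "A", "period": "PM_PEAK", "total": 1200, "trucks": 150},
    {"location": "A", "period": "PM_PEAK", "total": 1100, "trucks": 90},
    {"location": "B", "period": "PM_PEAK", "total": 800, "trucks": 40},
    {"location": "B", "period": "AM_PEAK", "total": 950, "trucks": 120},
]


def truck_share(record):
    """Fraction of the counted vehicles that were trucks."""
    return record["trucks"] / record["total"]


def counts_with_truck_share_above(records, threshold):
    """Return the records whose truck share exceeds the threshold."""
    return [r for r in records if truck_share(r) > threshold]


def pm_peak_stdev(records):
    """Sample standard deviation of the P.M. peak totals."""
    totals = [r["total"] for r in records if r["period"] == "PM_PEAK"]
    return stdev(totals)


heavy_truck_counts = counts_with_truck_share_above(counts, 0.10)
pm_spread = pm_peak_stdev(counts)
```

With counts available as structured records, each of the two example questions reduces to a short filter or aggregate rather than a manual search through workbooks.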

Implementation

CountDracula utilizes existing tools as much as possible in order to maximize efficiency and minimize code maintenance. After implementing an initial version of CountDracula in pure Python that directly issued SQL commands to a PostgreSQL database, the development team realized that resources would be used more efficiently by leveraging existing tools and frameworks that already accomplish database manipulation, user-friendly account provisioning and an administrator interface, and webpage templating (custom views). After considering a few options, the Authority team settled on Django, a Python web framework with the following appealing features (Django Software Foundation n.d.):

● Django is open source. This means that the team can use it for free, and in turn, that other CountDracula users will also be able to use it for free. It also means that users can see “under the hood” to understand what is going on.

● Django has an active development community and it is backed by its own non-profit organization. This means that Django is well peer-reviewed and tested and that the CountDracula development team can rely on the Django community to do frequent maintenance and keep the code base relevant and up-to-date.

● Django is written in Python, which is highly readable and in which Authority staff are already proficient.

● Django has a high quality object-relational mapper, so that data models can be defined entirely in Python and writing SQL queries can be avoided.

● Django comes with an automatic admin interface, elegant URL design, a template system that allows us to customize views, and caching to improve performance.

● Django has an add-on, GeoDjango, which adds spatial data types, enables efficient spatial queries, and uses PostGIS (the spatial add-on to PostgreSQL).

Models

In Django, an object and its attributes are defined in “a model”; these typically map to a single database table. Models may also have relationships defined with other models. The CountDracula models and their attributes are as follows:

Node: This object represents an intersection or physical location of interest.

● point: The location of the intersection. This is a PointField, which is a GeoDjango field type.

StreetName: This object represents a textual street name in the network. As a convention, our import scripts store all StreetName fields as all upper case.

● street_name: This is the standard full street name, including spaces and an abbreviated generic suffix (St, Rd, Ave, etc.), e.g. “CESAR CHAVEZ ST”

● nospace_name: This is a version with the spaces stripped. This is used for matching in situations where the spacing is ambiguous, e.g. “CESARCHAVEZST”

● short_name: The street name without the generic suffix, e.g. “CESAR CHAVEZ”

● suffix: The generic suffix, abbreviated, e.g. “ST”

● nodes: The relationship between StreetName instances and Node instances is many-to-many. This enables look-ups in both directions: one can query an intersection node for the street names of those streets that meet at that intersection, or query a street name for all the nodes corresponding to the street.
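The four textual StreetName fields can all be derived from a single raw street name. The sketch below shows one way this might be done in plain Python; it is an illustration under stated assumptions, not CountDracula's import code. The function name street_name_fields and the KNOWN_SUFFIXES set are hypothetical, and a real suffix list would be longer.

```python
# Hypothetical, deliberately short list of generic suffixes.
KNOWN_SUFFIXES = {"ST", "AVE", "RD", "BLVD", "DR", "WAY"}


def street_name_fields(raw_name):
    """Derive the StreetName text fields from a raw street name.

    The name is upper-cased and runs of whitespace are collapsed,
    following the all-upper-case convention described above.
    """
    street_name = " ".join(raw_name.upper().split())
    parts = street_name.split(" ")
    if len(parts) > 1 and parts[-1] in KNOWN_SUFFIXES:
        short_name, suffix = " ".join(parts[:-1]), parts[-1]
    else:
        # No recognized suffix: treat the whole name as the short name.
        short_name, suffix = street_name, ""
    return {
        "street_name": street_name,
        "nospace_name": street_name.replace(" ", ""),
        "short_name": short_name,
        "suffix": suffix,
    }


fields = street_name_fields("Cesar  Chavez St")
```

Matching on nospace_name makes spellings such as "CESAR CHAVEZ ST" and "CESARCHAVEZ ST" resolve to the same street.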

TurnCountLocation: This object represents a location for a turn count.

● from_street: A reference to the StreetName instance describing the street from which this turn originates.

● from_dir: The direction going into the turn (one of N, S, E or W).

● to_street: A reference to the StreetName instance describing the street to which the turn is destined.

● to_dir: The direction coming out of the turn (one of N, S, E or W).

● intersection_street: A StreetName reference for identifying a cross street at the turn count intersection location, in case the turn movement is a through movement and the from_street is the same as the to_street.

● intersection: A Node reference for the turn count location.

TurnCount: This object represents a single turn count or an average of a set of turn counts. CountDracula does allow count averages, although they are not preferred, for cases where the raw count data is not available and only the average is available.

● location: A reference to the TurnCountLocation where this count was observed.

● count: The count itself. This is a decimal number because it may represent an average.

● count_date: The date that the count was collected. This can be null if the count is an average over multiple dates.

● count_year: The year that the count was collected; this is a required field.

● start_time: The start time for the count period.

● period_minutes: This is an integer and it represents the period of time over which the count took place.

● vehicle_type: This is an integer code representing the vehicle type.

● sourcefile: This is the name of the source file from which the count was imported. These files will be archived for the system, for situations when a count needs to be investigated for possible misinterpretation or error.

● project: A string for tracking the project for which the count was collected, if any.

● upload_user: A reference to the user (a built-in Django model) that uploaded the count data.

MainlineCountLocation: This object represents a location for a mainline count.

● on_street: A reference to the StreetName instance describing the street on which the mainline count is observed.

● on_dir: The direction of the link on which the count is observed (one of N, S, E or W).

● from_street: A reference to a StreetName instance describing a cross street upstream of the count location.

● from_int: A reference to a Node instance at the intersection of on_street and from_street.

● to_street: A reference to a StreetName instance describing a cross street downstream of the count location.

● to_int: A reference to a Node instance at the intersection of on_street and to_street.

MainlineCount: This object represents a single mainline count or an average of a set of mainline counts. These can be average counts, but raw counts are preferred.

● location: A reference to the MainlineCountLocation where this count was observed.

● count, count_date, count_year, start_time, period_minutes, vehicle_type, sourcefile, project, upload_user: These are the mainline versions of the same variables in the TurnCount model.

● reference_position: How far along the link the count was actually observed. A value of -1 represents unknown.
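The attributes shared by the two count models can be summarized as a small data structure. The following sketch uses plain Python dataclasses as a stand-in for the Django models, so it runs without a database. The class names and the validation rules (non-negative count, positive period, count_date consistent with count_year) are assumptions added for illustration and are not confirmed by the CountDracula source.

```python
from dataclasses import dataclass
from datetime import date, time
from decimal import Decimal
from typing import Optional


@dataclass
class CountRecord:
    """Stand-in for the fields common to TurnCount and MainlineCount."""
    count: Decimal                      # decimal, since it may be an average
    count_year: int                     # required
    start_time: time
    period_minutes: int
    vehicle_type: int                   # integer code
    count_date: Optional[date] = None   # may be null for multi-date averages
    sourcefile: str = ""
    project: str = ""

    def __post_init__(self):
        # These checks are illustrative assumptions, not documented rules.
        if self.count < 0:
            raise ValueError("count must be non-negative")
        if self.period_minutes <= 0:
            raise ValueError("period_minutes must be positive")
        if self.count_date is not None and self.count_date.year != self.count_year:
            raise ValueError("count_date and count_year disagree")


@dataclass
class MainlineCountRecord(CountRecord):
    """Adds the mainline-only field; -1 represents an unknown position."""
    reference_position: Decimal = Decimal("-1")


average_count = MainlineCountRecord(
    count=Decimal("12.5"),
    count_year=2010,
    start_time=time(17, 0),
    period_minutes=15,
    vehicle_type=0,
)
```

Keeping count_year required while count_date is optional lets an average over several dates still be filtered by year.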

Tables

Given the model definitions discussed above, Django automatically creates the PostGIS database tables to support them and their relationships. Additionally, Django creates some additional tables for users, groups, and permissions, as well as for sessions and logging.

Figure 1 Login screen for CountDracula; this functionality comes built-in to Django

User Interface

As previously mentioned, Django automatically creates a web-based admin interface for viewing and editing the data. This interface comes with a user login system, as well as basic edit screens for each model. Figure 1, Figure 2 and Figure 3 show some examples of these admin interface web pages.

Figure 2 This is the view of the StreetName instances.

Figure 3 The edit view for a StreetName instance.

On top of the basic Django administrator interface, CountDracula adds two additional views. The first is the map view, shown in Figure 4, which shows all of the available data on a map, giving users a more intuitive understanding of the location of counts. Mainline count locations are indicated by a line segment, while turn count locations are indicated by a node at the intersection of the movement. This view includes filtering widgets along the left-hand side, so that users can see where counts have been collected for various vehicle types or for a specific set of years. Additionally, users can enter an address or intersection, and filter count locations to those within a specific radius of that point (Figure 5). Finally, the raw counts can be downloaded as a comma-separated value file, for all mainline or turn count locations or for those locations filtered by the widgets.
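The radius filter can be sketched in plain Python using the haversine great-circle distance. This is an illustration of the idea only: a GeoDjango deployment would presumably push such a filter into PostGIS rather than compute it in Python, and the function names, the dictionary layout and the sample coordinates below are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def locations_within(locations, center, radius_m):
    """Keep the count locations within radius_m meters of center (lat, lon)."""
    lat0, lon0 = center
    return [
        loc for loc in locations
        if haversine_m(lat0, lon0, loc["lat"], loc["lon"]) <= radius_m
    ]


# Hypothetical count locations, for illustration only.
sample_locations = [
    {"name": "near", "lat": 37.7760, "lon": -122.4194},
    {"name": "far", "lat": 37.8044, "lon": -122.2712},
]
nearby = locations_within(sample_locations, (37.7749, -122.4194), 500)
```

The filtered list could then be written out with Python's csv module to produce the comma-separated download described above.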

Figure 4 CountDracula map view of count locations.

Figure 5 Map view with year- and location-based filtering of count locations.

Figure 6 Upload view

The second custom CountDracula view is the upload view, shown in Figure 6. This view allows logged-in users to upload a workbook of count data of a predetermined format, described in a later section. CountDracula parses this count data and stores it if no errors are found during processing. If errors are found, they are reported back to the user for correction. In the future, a process for moderation or validation of uploaded data could be implemented.

Data Population

Although CountDracula was written to be a generic counts management tool for any interested party with similar needs, in order to create an installation of the system that fit our needs, some San Francisco-specific scripts were created to import data. First, a script called insertSanFranciscoIntersectionsFromCube.py gets called to:

1. Read the Authority’s macroscopic static assignment network. This is in the Citilabs Cube Voyager .network format (Citilabs n.d.), but it could be easily adapted to read any format that exports a series of nodes (locations) and links (attributes for an ordered pair of nodes).

2. Add the nodes as new Node instances to CountDracula. Nodes that are outside of San Francisco (such as those in neighboring counties) are skipped.

3. Add the street names for links as StreetName instances to CountDracula. Additionally, the relationship between the StreetName instances and their corresponding Nodes is added. Links that are outside of San Francisco are skipped, as are links that are missing street names.
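The three steps above can be sketched as follows, using plain Python containers in place of the Cube network reader and the Django models. This is not the contents of insertSanFranciscoIntersectionsFromCube.py. The function names, the input layout and the is_in_san_francisco predicate are hypothetical, and treating a link as outside San Francisco when either of its end nodes is outside is an assumption.

```python
def import_network(nodes, links, is_in_san_francisco):
    """Build node and street-name structures from a generic network.

    nodes: dict mapping node id -> (x, y) coordinates
    links: iterable of (from_node, to_node, street_name) tuples
    is_in_san_francisco: caller-supplied predicate on (x, y)
    """
    # Step 2: keep only the nodes inside the city.
    kept_nodes = {nid: xy for nid, xy in nodes.items() if is_in_san_francisco(xy)}

    # Step 3: collect street names and their many-to-many node relationship.
    street_nodes = {}
    for a, b, name in links:
        if not name or not name.strip():
            continue  # link is missing a street name
        if a not in kept_nodes or b not in kept_nodes:
            continue  # assumed rule: any outside end node means an outside link
        street_nodes.setdefault(name.strip().upper(), set()).update((a, b))
    return kept_nodes, street_nodes


def streets_at_node(street_nodes, node_id):
    """Reverse look-up: the street names that meet at a node."""
    return sorted(name for name, ids in street_nodes.items() if node_id in ids)


# Hypothetical toy network: node 3 lies outside the city boundary.
toy_nodes = {1: (0, 0), 2: (1, 0), 3: (5, 0)}
toy_links = [
    (1, 2, "Market St"),
    (2, 3, "Bay Bridge"),
    (1, 2, ""),
    (2, 1, "market st"),
]
kept, streets = import_network(toy_nodes, toy_links, lambda xy: xy[0] < 3)
```

Because the reader only needs nodes and named links, the same function would work for any network format that can be exported into those two collections.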

Following this, several scripts are also run to import various types of data into CountDracula. One of them reads downloaded data from the Caltrans Performance Measurement System, or PeMS (California Department of Transportation n.d.). This data set is an aggregate of data for all of calendar year 2010, cleaned and restricted to non-holiday mid-weekdays, and it is already being used for 2010 vehicular traffic model validation of SF-CHAMP. A second set of legacy data is similarly imported by another script, which imports 2007-2008 traffic counts for major routes at county border locations collected for the Metropolitan Transportation Commission, the regional metropolitan planning organization.

The third import script is the most general, importing an Excel workbook file format that was devised to be a generic template representing how raw traffic counts are often stored. Figure 7 shows an example of this format; each separate day of counts is on its own worksheet and within each sheet, each row represents a time period. Blank lines separate sections for different vehicle types. This generic count format workbook is the type that is accepted by the current web-based upload mechanism discussed above.
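The blank-line-separated layout lends itself to a simple two-pass parse: split the sheet into sections, then read each section row by row. The sketch below operates on a worksheet already loaded as a list of rows, so it does not depend on any Excel library. The exact cell layout is an assumption, since only the general structure is described here: each section is assumed to begin with a header row holding a vehicle type label followed by movement names, and each following row is assumed to hold a start time followed by one count per movement.

```python
def split_sections(rows):
    """Split worksheet rows into sections separated by blank rows."""
    sections, current = [], []
    for row in rows:
        if all(cell in ("", None) for cell in row):
            if current:
                sections.append(current)
                current = []
        else:
            current.append(row)
    if current:
        sections.append(current)
    return sections


def parse_sheet(rows):
    """Return (vehicle_type, start_time, movement, count) tuples.

    Assumes the hypothetical layout described in the lead-in paragraph.
    """
    records = []
    for section in split_sections(rows):
        header, *data = section
        vehicle_type, movements = header[0], header[1:]
        for start_time, *values in data:
            for movement, value in zip(movements, values):
                records.append((vehicle_type, start_time, movement, value))
    return records


# Hypothetical worksheet contents for one day of counts.
sheet_rows = [
    ["All", "NBLT", "NBTH"],
    ["07:00", 3, 10],
    ["07:15", 4, 12],
    [None, None, None],
    ["Trucks", "NBLT", "NBTH"],
    ["07:00", 0, 1],
]
parsed = parse_sheet(sheet_rows)
```

An uploader built this way can collect parse errors per row and report them back to the user, which matches the error-reporting behavior described for the upload view.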

Deployment Environment

At the time of this writing, CountDracula requires the following tools to run:

● Django and GeoDjango, described in detail above

● PostgreSQL: an open source object-relational database system

● PostGIS: an open source add-on that spatially enables PostgreSQL with GIS functionality

● Apache: an open source HTTP server; this is necessary to provide the web-based user interface

● Python: the open source programming language on which Django, GeoDjango and CountDracula are built

Figure 7 Standardized movement count input spreadsheet.

● mod_wsgi: WSGI (the Web Server Gateway Interface) is a specification for web servers to communicate with web applications. mod_wsgi is a way for Apache to run a web application in Python, and it is the recommended way to get Django to work with Apache.

Additionally, the following Python modules are required:

● psycopg: a PostgreSQL adapter for Python

● xlrd: a Python module for reading Microsoft Excel files

● python-memcached: a Python client for memcached, a memory-based caching system used to improve performance

Creating Open Source Tools at a Public Agency

Why open source

Over the past few years, the Authority has increasingly been developing tools in the open source domain. This has been for a couple of reasons. Many of the Authority’s modeling projects are funded by sales tax dollars and Federal Highway Administration money, and many of them address common problems facing transportation agencies and modelers. By making these projects open source, the Authority team hopes that these public funds will not be spent doing something more than once, and that other interested municipalities and teams can fork or improve on these projects rather than starting from scratch. In turn, when other teams can spend resources improving these shared tools, everyone will benefit, including the Authority.

Additionally, creating open source projects has other advantages. It allows the Authority team to be flexible with consultants because these tools are not proprietary, nor do they include proprietary components. It also creates incentive for these projects to be less vendor-specific. Having an open source project encourages the development team to follow best practices for some tasks which are more easily neglected in a closed environment, such as thorough documentation and portability. Finally, even if other public agencies or interested parties do not use the codebase itself, developing the project in an open source environment means that the codebase and the process can still serve as a reference and help to expose the project’s lessons learned.

Why in-house development

While the Authority team does recognize that this is not typically work done in-house by other agencies, the decision to develop an initial version in-house was made for several reasons. First, the Authority team had appropriate in-house resources: many of the count data files had already been standardized for previous script work, and the programming work was minimal once the decision was made to use the Django (and GeoDjango) framework. Given this relatively low level of effort needed to get an initial version running, the overhead of writing a scope of work and hiring outside consultants would likely have been more work. Further, the consulting teams that were readily available to the Authority team did not necessarily have particular expertise in the technology involved in this project.

Second, since the first use of CountDracula was the DTA Anyway project, which was already being managed in-house, the Authority team had a unique understanding of its requirements. In addition, developing this initial version of CountDracula in-house allowed more flexibility in responding to changing demands and needs. For future development, the Authority team will seek outside assistance from collaborators at other agencies and/or from consultants to implement new features.

Complementary Projects

As mentioned previously, the initial impetus for the development of CountDracula was the development of San Francisco’s citywide DTA model, which benefited from a large selection of count data for model validation (over 1,100 counts were used for this, with 15-minute movement counts making up the majority). Model validation in general is an excellent use case for count data, and CountDracula complements these projects easily by giving users an API to the dataset, so scripts can algorithmically relate the counts to their modeled links and movements.

Other planning/modeling projects which could complement CountDracula include:

(1) Pedestrian/Bicycle models. When analyzing pedestrian and bicycle crash data, it is also useful to take exposure into account, which means that vehicle volumes as well as bicycle and pedestrian volumes must be analyzed as well. CountDracula will be helpful for both of these datasets.

(2) Transit service analysis. While transit vehicle data (such as automatic vehicle location data and automatic passenger count data) are necessary for transit service analysis, other information on local streets is also relevant. Conflict with auto vehicles, pedestrians and bicyclists likely has an effect on transit performance, and CountDracula could assist in this analysis.

(3) On-the-fly micro-simulation network creation. Acquiring detailed counts in the study area is crucial for creating an accurate micro-simulation network. By accessing count data from CountDracula, one could map these counts to a study area in an automated fashion (including automatic network balancing), which could lead to possibilities like on-the-fly micro-simulation creation.

The Future

CountDracula’s Future

Since CountDracula is an open source tool, the Authority team hopes that other users and collaborators will become involved and help define the future of CountDracula. Some priorities have been discussed in the context of San Francisco’s needs. For example, pedestrian counts are not currently included in CountDracula, but they are critical to the transportation landscape in San Francisco, and so the addition of pedestrian counts is a high priority. These counts are typically observed as crossings at intersections, so the counts and their locations would need to be defined in a new model. Other vehicular data could be included as well, such as vehicle speed data, enabling CountDracula to support Congestion Management Programs and other types of model validation.

There is also room for improvement in the admin interface of CountDracula, and many of the pages would benefit from a more map-based representation of points to check that imported data looks correct. The StreetName admin page, for example, would benefit from showing a map of all relevant Node instances rather than a textual list of them (Figure 2, above).

Additionally, the current representation of street direction could be re-examined. In San Francisco, where most blocks are short and most streets are straight and form a grid-like pattern, defining links using the simple four cardinal directions works reasonably well. However, in other cities where streets may wind and change direction, this may be too ambiguous and require a more nuanced handling. Indeed, in some parts of San Francisco, where the grid alignment is at an angle, the Authority team had to choose a convention and call all northwest-bound streets northbound instead of westbound, for example.
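A direction convention of this kind can be written down as a rule for snapping a compass bearing to one of the four cardinal directions. The sketch below is hypothetical and not taken from CountDracula: only the northwest-to-northbound choice comes from the convention described above, while the function name and the handling of the other three diagonal bearings are arbitrary assumptions.

```python
def cardinal_direction(bearing_degrees):
    """Snap a compass bearing (0 = north, clockwise) to N, E, S or W.

    An exact northwest bearing (315 degrees) is assigned to N, matching the
    convention described in the text. The remaining diagonal ties (45, 135
    and 225 degrees) are resolved arbitrarily as E, S and W respectively.
    """
    bearing = bearing_degrees % 360
    if bearing >= 315 or bearing < 45:
        return "N"
    if bearing < 135:
        return "E"
    if bearing < 225:
        return "S"
    return "W"


northwest_bound = cardinal_direction(315)
```

Winding streets remain ambiguous under any such rule, since a single street may fall into different direction bins along its length.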

Finally, CountDracula should be expanded to handle more input and output formats that are used by other municipalities, such as the Universal Traffic Data Format (UTDF). CountDracula also needs to have an easier installation process (such as one for a generic cloud server, like an Amazon EC2 instance) as well as a tutorial for typical setup and API usage. In this way, CountDracula instances will be easily set up and populated for other cities and regions.

Open Data Management’s Benefits

In the last five years, providing public access to data has been an emphasis of government on both the federal and local levels. On January 21, 2009, President Obama released the Transparency and Open Government memorandum, asking executive departments and agencies to “establish a system of transparency, public participation, and collaboration.” (Obama 2009) On October 21, 2009, then-Mayor Gavin Newsom responded on behalf of the city of San Francisco with Executive Directive 09-06 on Open Data, which stated: “This Directive will enhance open government, transparency, and accountability by improving access to City data that adheres to privacy and security policies. Data which often resides in technology systems is unique from information like documents, emails and calendars in that it is structured and can be used by other computer applications for analysis or new uses such as mapping.” (Newsom 2009) The San Francisco Board of Supervisors expanded on this directive by adding Section 22D into the Administrative Code in 2010 (with an amendment in 2013 to create the position of Chief Data Officer). Section 22D lists the benefits of an open data policy (Board of Supervisors, City and County of San Francisco 2010):

(1) enhanced government transparency and accountability;

(2) development of new analyses or applications based on the unique data the City provides;

Figure 8 CountDracula Process/User Diagram

(3) mobilization of San Francisco’s high-tech workforce to use City data to create useful civic tools 1 at no cost to the City; and 2

(4) creation of social and economic benefits based on innovation in how residents interact with 3 government stemming from increased accessibility to City data sets. 4

On the community side, groups have also jumped in to help fill this gap to connect data with those 5 interested in making use of the data (localdata n.d.). Open data policies and practices are gaining 6 momentum in San Francisco and elsewhere, and CountDracula supports this vision. 7

Hosting data with both an easy-to-use point-and-click front end and flexible, powerful API access to the database not only greatly expands the audience of potential users, but also alleviates a burden normally placed on public employees to provide timely responses to custom data requests.

Open Source Development’s Benefits

Many public agencies have similar needs for planning and analysis tools, and building and maintaining those tools internally leads to inefficient resource allocation and redundant expenditures. To the extent possible, public agencies should collaborate on tools that are likely to have shared benefits by embracing open source development policies, thereby making public funds go farther. Many agencies are doing this already by becoming involved in open source initiatives; see the referenced website for a partial list (Collaboration, Open Solutions, and Innovation n.d.). For an open source project to be successful, it is not sufficient to simply push the code to a public server. Agencies need to share tools, methods, and resources to develop the project, and joint funding mechanisms are required. To deal with the legal hurdles involved in this type of collaboration in the urban planning space, the Open Source Planning Initiative has been formed (Open Source Planning Initiative n.d.). The Mission Statement of this organization is as follows:

“The Open Source Planning Initiative supports innovation, development, and sharing in open source urban planning software. This foundation was formed out of an on-going need identified by urban planning practitioners across the nation to focus on innovation rather than rebuilding the same tools over and over again. This foundation will serve as an independent legal entity to which community members can contribute code, funding, and other resources, secure in the knowledge that their contributions will be maintained for public benefit.”

Future development of CountDracula will likely be performed under the Open Source Planning Initiative.

Use by Others

CountDracula should be directly transferable to any other jurisdiction or setting that adopts the same data structure and has a list of nodes with intersecting street names. The setup batch file included in the open-source codebase can be easily modified to point to local files and directories. Over time, as more jurisdictions use this tool, they can add more functionality to the system (e.g., more query types, more GUI features, more file types).
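To illustrate what a "list of nodes with intersecting street names" might look like for a new jurisdiction, the following is a minimal sketch and not CountDracula's actual code or schema. The node ids, street names, and the function names `build_intersection_index` and `find_intersection` are all hypothetical, chosen only to show how counts keyed by street names could be matched to network nodes.

```python
from collections import defaultdict

# Hypothetical node list for illustration only: each entry pairs a node id
# with the set of street names that meet at that node.
NODES = [
    (101, {"MARKET ST", "VAN NESS AVE"}),
    (102, {"MARKET ST", "10TH ST"}),
    (103, {"MISSION ST", "10TH ST"}),
]


def build_intersection_index(nodes):
    """Map each unordered pair of street names to the node ids where they meet."""
    index = defaultdict(list)
    for node_id, streets in nodes:
        ordered = sorted(streets)
        # Register every pair of streets at this node, so nodes where three
        # or more streets meet can be found from any two of their names.
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                index[frozenset((first, second))].append(node_id)
    return index


def find_intersection(index, street_a, street_b):
    """Return node ids for the intersection of two streets, ignoring case and order."""
    key = frozenset((street_a.strip().upper(), street_b.strip().upper()))
    return index.get(key, [])


if __name__ == "__main__":
    idx = build_intersection_index(NODES)
    print(find_intersection(idx, "Market St", "Van Ness Ave"))
```

Under these assumptions, a jurisdiction would only need to supply its own node list in place of `NODES`; the lookup itself does not depend on any San Francisco-specific data.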


Acknowledgements

Special thanks to former intern Varun Kohli for developing the initial version of CountDracula.

References

Board of Supervisors, City and County of San Francisco. "Ordinance amending the Administrative Code, Sections 22D.2 and 22D.3, relating to San Francisco's open data policies and procedures and establishing the position and duties of Chief Data Officer and Departmental Data Coordinators." March 28, 2013. http://sfbos.org/ftp/uploadedfiles/bdsupvrs/committees/materials/gao_032813_121017.pdf.

Board of Supervisors, City and County of San Francisco. "Ordinance No. 293-10: Addition of Section 22D to Administrative Code - Open Data Policy." October 28, 2010. http://www.sfbos.org/ftp/uploadedfiles/bdsupvrs/ordinances10/o0293-10.pdf.

California Department of Transportation. Caltrans Performance Measurement System (PeMS). n.d. http://pems.dot.ca.gov/.

Cambridge Systematics. "Traffic Information Management System: Managing Traffic Counts in New York City." n.d. http://www.camsys.com/traffic_count.htm.

Citilabs. "Citilabs Cube Voyager." n.d. http://www.citilabs.com/products/cube/cube-voyager.

Collaboration, Open Solutions, and Innovation. "Open Source - State & Local Govt." https://sites.google.com/site/cosiopengovt/. n.d. https://sites.google.com/site/cosiopengovt/home/open-source---state-local-govt.

Django Software Foundation. n.d. https://www.djangoproject.com/ (accessed 2013).

localdata. "About localdata." n.d. http://localdata.com/about.html.

Midwestern Software Solutions. "Traffic Count Database System." n.d. http://www.ms2soft.com/trafficcountdata.aspx.

Newsom, Gavin. "Executive Directive 09-06: Open Data." October 21, 2009. http://sfmayor.org/ftp/archive/209.126.225.7/executive-directive-09-06-open-data/index.html.

Obama, Barack. "Transparency and Open Government Memorandum." January 21, 2009. http://www.gpo.gov/fdsys/pkg/FR-2009-01-26/pdf/E9-1777.pdf.

Open Source Planning Initiative. n.d. http://osplanning.org/.

Outwater, Maren L., and Billy Charlton. "The San Francisco Model in Practice: Validation, Testing, and Application." Innovations in Travel Demand Modeling: Summary of a Conference. Volume 2: Papers, Number 42 in Transportation Research Board Conference Proceedings. 2008. 24-29.

Parsons Brinckerhoff; San Francisco County Transportation Authority. "San Francisco Dynamic Traffic Assignment Project 'DTA Anyway'." Final Calibration & Validation Report, San Francisco, 2012.


Sall, Elizabeth, Elizabeth Bent, Billy Charlton, Jesse Koehler, and Greg Erhardt. "Evaluating Regional Pricing Strategies in San Francisco--Application of the SFCTA Activity-Based Regional Pricing Model." Transportation Research Board 89th Annual Meeting. Washington DC: Transportation Research Board, 2010.

Sall, Elizabeth, Gregory Erhardt, Lisa Zorn, Renee Alsup, and Dan Tischler. "Modeling Every Hill, Bus, Traffic Signal, and Car - How San Francisco Collaboratively Built a Citywide Dynamic Traffic Assignment Model." 14th TRB National Transportation Planning Applications Conference. Columbus, OH, 2013.

San Francisco Municipal Transportation Agency. Raw AVL/GPS data. n.d. https://data.sfgov.org/Transportation/Raw-AVL-GPS-data/5fk7-ivit.

Zorn, Lisa, Elizabeth Sall, and Dan Wu. "Incorporating crowding into the San Francisco activity-based travel model." Transportation 39, no. 4 (2012): 755-771.
