Open Government Data and MongoDB

Posted on 21-Jan-2015


DESCRIPTION

Given at MongoDC on June 27, 2011.

TRANSCRIPT

Open Government Data & MongoDB

Luigi Montanez
luigi@sunlightfoundation.com

Question? @LuigiMontanez


Open Data + Open Source = Open Government


MongoDB enables open data


Opening Up Data

✴ Gather data from disparate sources
  ✴ Data dumps (SQL, fixed-width columns)
  ✴ Web scraping
  ✴ Text/PDF parsing
✴ Serve RESTful JSON APIs
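Fixed-width data dumps are one of the sources listed above. A minimal sketch, assuming a hypothetical column layout (name in characters 0 to 19, state in 20 to 21, amount in 22 to 31), of turning one such line into an object that can be served as JSON:

```javascript
// Hypothetical column layout for a fixed-width dump:
// [field name, start index, end index (exclusive), converter].
const LAYOUT = [
  ["name", 0, 20, String],
  ["state", 20, 22, String],
  ["amount", 22, 32, Number],
];

// Slice one fixed-width line into a plain object, trimming the space
// padding and applying each column's converter.
function parseFixedWidth(line, layout = LAYOUT) {
  const record = {};
  for (const [field, start, end, convert] of layout) {
    record[field] = convert(line.slice(start, end).trim());
  }
  return record;
}

// Build a sample line with padEnd/padStart so the columns line up exactly.
const line = "Barbara Lee".padEnd(20) + "CA" + "10000000".padStart(10);
console.log(JSON.stringify(parseFixedWidth(line)));
// {"name":"Barbara Lee","state":"CA","amount":10000000}
```

Keeping the layout as data rather than hard-coding offsets means each new dump format only needs a new table of columns.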


JSON

✴ Tree structure, not tabular
✴ Still relational
✴ JSON for data, XML for documents
✴ Closely resembles native data structures
✴ No manual parsing needed
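To illustrate "tree structure, still relational" and "no manual parsing", here is a small sketch using a hypothetical bill record (the field names and `hr1` ID are invented for the example). The relationships live in the nesting, and one built-in call produces native objects:

```javascript
// A hypothetical record: a bill with its sponsor and votes nested inside it.
// The tree shape still expresses relationships (bill -> sponsor, bill -> votes).
const raw =
  '{"bill_id": "hr1", "sponsor": {"name": "Barbara Lee", "state": "CA"},' +
  ' "votes": [{"chamber": "house", "passed": true}]}';

// JSON.parse returns ordinary JavaScript objects and arrays,
// so no hand-written parsing step is needed.
const bill = JSON.parse(raw);

console.log(bill.sponsor.state); // CA
console.log(bill.votes[0].passed); // true
console.log(Array.isArray(bill.votes)); // true
```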


Three Projects

✴ Poligraft
✴ Real-Time Congress API
✴ Open State Project


App design drives schema design

Text

{ "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com"}


{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": "................."
}


{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": ".................",
  "entities": [...]
}


{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": ".................",
  "entities": [
    {"name": "Barack Obama", "type": "politician"},
    ...
  ]
}


{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": ".................",
  "entities": [
    {
      "name": "Barack Obama",
      "type": "politician",
      "breakdown": {"indiv": "33", "pac": "67"},
      "top_industries": ["Lawyers/Lobbyists", "Finance/Insurance/Real Estate", "Misc. Business"]
    },
    ...
  ]
}


Natural Schemas
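The progression above can be sketched as a series of enrichment steps on one document. The helper names below are hypothetical and run on plain objects. In MongoDB, each step would typically be an update using the `$set` operator, with no table or migration added:

```javascript
// Hypothetical helpers mirroring the slides: each new app feature adds
// fields to the same document rather than creating a new table.
function addSource(doc, slug, sourceUrl, content) {
  return { ...doc, slug, source_url: sourceUrl, content };
}

function addEntities(doc, entities) {
  return { ...doc, entities };
}

// Embed per-entity contribution data (looked up by name) inside each
// entity, so one read returns everything the page needs.
function addContributions(doc, byName) {
  return {
    ...doc,
    entities: doc.entities.map((e) => ({ ...e, ...(byName[e.name] || {}) })),
  };
}

let doc = {
  title: "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
};
doc = addSource(doc, "EOsc", "http://www.politico.com/news/stories/0810/40534.html", "...");
doc = addEntities(doc, [{ name: "Barack Obama", type: "politician" }]);
doc = addContributions(doc, {
  "Barack Obama": { breakdown: { indiv: "33", pac: "67" } },
});

console.log(JSON.stringify(doc.entities[0]));
// {"name":"Barack Obama","type":"politician","breakdown":{"indiv":"33","pac":"67"}}
```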


Three Projects

✴ Poligraft
✴ Real-Time Congress API
✴ Open State Project

Real-Time Congress API

Credit: vgm8383 on Flickr

Android App: “Congress”

Politiwidgets


Requirements

✴ Aggregate lots of data
  Biographical, Bills, Votes, Earmarks, Video Clips, Floor Updates, Legislative Documents, Committee Schedules, Contributions, Interest Group Ratings
✴ Lightweight responses

{legislator: {
  in_office: true, title: "Rep", nickname: "", district: "9",
  bioguide_id: "L000551", govtrack_id: "400237", phone: "202-225-2661",
  website: "http://lee.house.gov/index.html", twitter_id: "",
  last_name: "Lee", name_suffix: "", last_updated: "2010/04/13 00:00:14 +0000",
  party: "D", chamber: "house", state: "CA",
  youtube_url: "http://www.youtube.com/RepLee",
  first_name: "Barbara", gender: "F",
  congress_office: "2444 Rayburn House Office Building",
  earmarks: {
    average_number: 20, total_amount: 10000000, average_amount: 22994535,
    total_number: 28, last_updated: "2010-03-18", fiscal_year: 2010
  },
  ...
}}

// limit selection to a subset of fields
db.people.find( { 'first_name' : 'john' }, { 'last_name' : 1, 'address' : 1 } );

// use dot-notation to dig into an object
db.people.find( { 'state': 'CA' }, { 'address.zip_code': 1 } );
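To make the projection semantics concrete, here is a rough emulation of an inclusion projection with dot-notation on a single plain object. It is a sketch, not MongoDB's implementation. Unlike MongoDB, it does not add `_id` automatically and does not handle arrays of subdocuments:

```javascript
// Read a value by following a list of keys; undefined if any step is missing.
function getPath(obj, keys) {
  let cur = obj;
  for (const k of keys) {
    if (cur === null || typeof cur !== "object" || !(k in cur)) return undefined;
    cur = cur[k];
  }
  return cur;
}

// Write a value at a list of keys, creating intermediate objects as needed.
function setPath(obj, keys, value) {
  let cur = obj;
  for (const k of keys.slice(0, -1)) {
    if (typeof cur[k] !== "object" || cur[k] === null) cur[k] = {};
    cur = cur[k];
  }
  cur[keys[keys.length - 1]] = value;
}

// Keep only the fields named in an inclusion projection like
// { last_name: 1, "earmarks.total_amount": 1 }; missing paths are skipped.
function project(doc, projection) {
  const out = {};
  for (const [path, include] of Object.entries(projection)) {
    if (!include) continue;
    const keys = path.split(".");
    const value = getPath(doc, keys);
    if (value !== undefined) setPath(out, keys, value);
  }
  return out;
}

const legislator = {
  last_name: "Lee", first_name: "Barbara", state: "CA", party: "D",
  earmarks: { total_amount: 10000000, total_number: 28, fiscal_year: 2010 },
};
console.log(JSON.stringify(project(legislator, { last_name: 1, "earmarks.total_amount": 1 })));
// {"last_name":"Lee","earmarks":{"total_amount":10000000}}
```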

{legislator: {
  last_name: "Lee", first_name: "Barbara", state: "CA",
  earmarks: {
    average_number: 20, total_amount: 10000000, average_amount: 22994535,
    total_number: 28, last_updated: "2010-03-18", fiscal_year: 2010
  }
}}

?sections=last_name,first_name,state,earmarks

{legislator: {
  last_name: "Lee", first_name: "Barbara", state: "CA",
  earmarks: { total_amount: 10000000, total_number: 28 }
}}

?sections=last_name,first_name,state,earmarks.total_amount,earmarks.total_number
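One plausible way for the API to honor `?sections=` is to translate the parameter directly into a projection document. The function below is a hypothetical sketch, not the Real-Time Congress API's actual code:

```javascript
// Hypothetical translation of a "sections" query parameter into a
// MongoDB-style inclusion projection document.
function sectionsToProjection(queryString) {
  const params = new URLSearchParams(queryString); // a leading "?" is ignored
  const sections = params.get("sections");
  const projection = {};
  if (!sections) return projection; // an empty projection returns every field
  for (const field of sections.split(",")) {
    const name = field.trim();
    if (name) projection[name] = 1;
  }
  return projection;
}

const projection = sectionsToProjection(
  "?sections=last_name,first_name,state,earmarks.total_amount,earmarks.total_number"
);
console.log(JSON.stringify(projection));
// {"last_name":1,"first_name":1,"state":1,"earmarks.total_amount":1,"earmarks.total_number":1}
```

The result could then be passed as the second argument to `find`, for example `db.legislators.find(query, projection)`, where the collection name is hypothetical. Because dotted section names map straight onto dot-notation paths, nested fields need no extra handling.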


Partial responses make payloads smaller
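A quick way to see the effect is to compare the serialized byte size of the full and partial legislator records. The full record below is a trimmed copy of the slide's example, which is itself elided, so actual savings would likely be larger:

```javascript
// Trimmed copy of the full legislator record from the slides.
const full = {
  legislator: {
    in_office: true, title: "Rep", district: "9", bioguide_id: "L000551",
    govtrack_id: "400237", phone: "202-225-2661",
    website: "http://lee.house.gov/index.html",
    last_name: "Lee", first_name: "Barbara", party: "D", chamber: "house", state: "CA",
    earmarks: {
      average_number: 20, total_amount: 10000000, average_amount: 22994535,
      total_number: 28, last_updated: "2010-03-18", fiscal_year: 2010,
    },
  },
};

// The partial response requested with ?sections=... on the slide.
const partial = {
  legislator: {
    last_name: "Lee", first_name: "Barbara", state: "CA",
    earmarks: { total_amount: 10000000, total_number: 28 },
  },
};

// Size of the JSON body in UTF-8 bytes.
const bytes = (obj) => new TextEncoder().encode(JSON.stringify(obj)).length;
console.log(`full: ${bytes(full)} bytes, partial: ${bytes(partial)} bytes`);
```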


Three Projects

✴ Poligraft
✴ Real-Time Congress API
✴ Open State Project


50 States = 50 Formats


Schemalessness allows for granular control


Custom Fields

✴ Traditional RDBMS
  ✴ Update the schema for new fields, run a migration, feel icky
  ✴ Create a custom_fields table
✴ MongoDB
  ✴ Just store it
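"Just store it" can still coexist with checks on the fields every state shares. A minimal sketch, assuming a hypothetical list of required bill fields and an invented state-specific field: the shared fields are validated, and anything extra passes through untouched:

```javascript
// Hypothetical common fields every state's bill record must have.
const REQUIRED = ["state", "bill_id", "title"];

// Check the shared fields, then keep the whole object as-is:
// state-specific extras need no schema change or migration.
function prepareBill(record) {
  const missing = REQUIRED.filter((f) => !(f in record));
  if (missing.length > 0) {
    throw new Error(`missing required fields: ${missing.join(", ")}`);
  }
  return { ...record };
}

// One state reports a field no other state has (hypothetical example).
const bill = prepareBill({
  state: "ca",
  bill_id: "AB 1",
  title: "Example bill",
  ca_urgency_clause: true,
});
console.log(Object.keys(bill));
```

The returned object could then be written as-is, for example with `insertOne`. With a relational schema, the same extra field would instead mean a migration or a row in a `custom_fields` table.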


Speaking JSON natively

Source → Scraped JSON → Python Transform → PostgreSQL

Source → Scraped JSON → MongoDB
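The extra step in the relational pipeline is the transform that splits a scraped tree into rows. The slide's transform was written in Python. This JavaScript sketch, using a hypothetical scraped bill and hypothetical table shapes, only illustrates the shape change that the MongoDB path skips:

```javascript
// A hypothetical scraped bill with nested actions.
const scraped = {
  bill_id: "AB 1",
  state: "ca",
  actions: [
    { date: "2011-01-03", action: "Introduced" },
    { date: "2011-02-10", action: "Referred to committee" },
  ],
};

// Relational path: split the tree into one row for a bills table and
// one row per action for a child table keyed by bill_id.
function toRows(bill) {
  const billRow = { bill_id: bill.bill_id, state: bill.state };
  const actionRows = bill.actions.map((a, i) => ({
    bill_id: bill.bill_id, seq: i, date: a.date, action: a.action,
  }));
  return { billRow, actionRows };
}

console.log(JSON.stringify(toRows(scraped).actionRows[0]));
// {"bill_id":"AB 1","seq":0,"date":"2011-01-03","action":"Introduced"}

// Document path: the scraped object is already the shape that gets stored.
```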


Three Projects

✴ Poligraft
✴ Real-Time Congress API
✴ Open State Project

Developer Happiness


Thanks!
sunlightlabs.com
@LuigiMontanez
