Big Open Data and Semantics for a Real-World Application Near You
Dr. Biplav Srivastava, IBM Research – India
Keynote Talk at AMECSE 2014 on 21 October 2014
The Distinguished Speakers Program is made possible by
For additional information, please visit http://dsp.acm.org/
About ACM
ACM, the Association for Computing Machinery, is the world’s largest educational and scientific computing society, uniting educators, researchers and
professionals to inspire dialogue, share resources and address the field’s challenges.
ACM strengthens the computing profession’s collective voice through strong leadership, promotion of the highest standards, and recognition of technical
excellence.
ACM supports the professional growth of its members by providing opportunities for life-long learning, career development, and professional
networking.
With over 100,000 members from over 100 countries, ACM works to advance computing as a science and a profession. www.acm.org
Real-World Applications of ICT: Ingredients
! Data – Available, Consumable with Semantics, Visualization / Analysis
! Access - APIs, Apps (Applications), Usability - Human Computer Interface
! Value – Providing benefits that matter to the people who need them most, in a timely and cost-efficient manner. Going beyond technology to process and people aspects.
Running Example – Data from Conference
! Data – Technical Program
! Access – Website
! Value – To participants, organizers and wider ecosystem
Thought: Can any real-world application immediately benefit from data created at this event?
Outline
! “Big Result”
  ! IBM’s Watson Q-A System: Intersection of Big Data, Analytics and Human Computer Interaction
! “Small Problem” – do it repeatedly and rapidly for key city services
  ! Data challenge: Make data available freely; Give semantics to data
    ! Open: World Wide Web Consortium, Data.gov movement
    ! Semantic: Linked Open Data, Ontologies
  ! Access - APIs: standards based access, composition
  ! Value - application challenge: Give benefit to citizens; create business opportunities
! Emerging Examples of Societal Applications with Analytical (AI) Techniques and Open Government Data
  ! Tourism: attract people to visit for new experiences and spend their money as well
  ! Traffic: make public transportation attractive for commuting even without physical sensors
  ! Corruption: predictable, uniform, public services
  ! Public Health (covered more later in panel): reduce disease impact
  ! Not covered: Environment, Water, Public Safety, Energy, …
! Call for action
  ! Make your data available in a usable manner
  ! Use more open data in your ongoing work (apps, research, monitoring, …)
  ! Build apps and make them available to citizens and other stakeholders
Big Result: Watson
Slides Courtesy: IBM Watson Team
Technical details: Ferrucci, D., et al. (2010), "Building Watson: An Overview of the DeepQA Project", AI Magazine 31 (3)
Want to Play Chess or Just Chat?
! Chess
  ! A finite, mathematically well-defined search space
  ! Limited number of moves and states
  ! All the symbols are completely grounded in the mathematical rules of the game
! Human Language
  ! Words by themselves have no meaning
  ! Only grounded in human cognition
  ! Words navigate, align and communicate an infinite space of intended meaning
  ! Computers cannot ground words to human experiences to derive meaning
IBM’s Watson is an emerging technology at the intersection of Big Data, Analytics and Human / Computer Interaction trends
Source: IBM's Watson: A HorizonWatching Trend Report
Wikipedia Definition: “Watson is an artificial intelligence computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project” – Wikipedia (link)
IBM Definition: “An application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, and Machine Learning technologies to the field of open domain question answering” – IBM (link)
“Built on IBM's DeepQA technology for hypothesis generation, massive evidence gathering, analysis, and scoring” – IBM (link)
AI Magazine: “DeepQA is an effective and extensible architecture that can be used as a foundation for combining, deploying, evaluating, and advancing a wide range of algorithmic techniques to rapidly advance the field of question answering (QA)” – AI Magazine (link)
Enabling Technology Areas
• Natural Language Processing
• Semantic Analysis
• Information Retrieval
• Automated Reasoning
• Machine Learning
Video: What is Watson? http://www.youtube.com/watch?v=dQmuETLeQcg
Easy Questions?
ln((12,546,798 * π)) ^ 2 / 34,567.46 = 0.00885
Select Payment where Owner=“David Jones” and Type(Product)=“Laptop”,
Owner | Serial Number
David Jones | 45322190-AK
Serial Number | Type | Invoice #
45322190-AK | LapTop | INV10895
Invoice # | Vendor | Payment
INV10895 | MyBuy | $104.56
David Jones = David Jones
Dave Jones ≠ David Jones
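The two "easy" questions above can be reproduced in a few lines. This is a minimal sketch, not part of the original slides: the table and column names are illustrative choices, and an in-memory SQLite database stands in for whatever store holds the three tables. The product type is queried exactly as stored ("LapTop"), because exact symbol matching is the point of the slide.

```python
import math
import sqlite3

# Arithmetic question from the slide: (ln(12,546,798 * pi))^2 / 34,567.46
value = math.log(12_546_798 * math.pi) ** 2 / 34_567.46

# The three tables from the slide in an in-memory database.
# Table and column names are illustrative, not from the talk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners   (owner TEXT, serial_number TEXT);
    CREATE TABLE products (serial_number TEXT, type TEXT, invoice TEXT);
    CREATE TABLE invoices (invoice TEXT, vendor TEXT, payment REAL);
    INSERT INTO owners   VALUES ('David Jones', '45322190-AK');
    INSERT INTO products VALUES ('45322190-AK', 'LapTop', 'INV10895');
    INSERT INTO invoices VALUES ('INV10895', 'MyBuy', 104.56);
""")

QUERY = """
    SELECT i.payment
    FROM owners o
    JOIN products p ON p.serial_number = o.serial_number
    JOIN invoices i ON i.invoice = p.invoice
    WHERE o.owner = ? AND p.type = ?
"""


def payments_for(owner, product_type):
    """Return all payments for an exact owner / product-type match."""
    return [row[0] for row in conn.execute(QUERY, (owner, product_type))]


david = payments_for("David Jones", "LapTop")  # exact symbols match
dave = payments_for("Dave Jones", "LapTop")    # human-obvious variant does not
```

The structured query succeeds only when every symbol matches exactly; "Dave Jones" returns nothing, which is the gap the next slide on hard questions picks up.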
Hard Questions? Computer programs are natively explicit, fast and exacting in their calculation over numbers and symbols… But Natural Language is implicit, highly contextual, ambiguous and often imprecise.
! Where was X born?
  Unstructured: One day, from among his city views of Ulm, Otto chose a water color to send to Albert Einstein as a remembrance of Einstein´s birthplace.
  Structured:
  Person | Birth Place
  A. Einstein | ULM
! X ran this?
  Unstructured: If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.
  Structured:
  Person | Organization
  J. Welch | GE
The Jeopardy! Challenge: A compelling and notable way to drive and measure the technology of automatic Question Answering along 5 Key Dimensions
Broad/Open Domain
Complex Language
High Precision
Accurate Confidence
High Speed
$600 In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus
$200 If you're standing, it's the direction you should look to check out the wainscoting.
$2000 Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that’s farthest north
$1000 The first person mentioned by name in ‘The Man in the Iron Mask’ is this hero of a previous book by the same author.
Basic Game Play
6 Categories: Technology; Classics; The Great Outdoors; Speak of the Dickens; Mind Your Manners; Before and After
5 Levels of Difficulty: $200, $400, $600, $800, $1000 in each category
• 1 of 3 Players Selects a Clue
• Host reads Clue out loud
  Example (TECHNOLOGY): ALL POLICEMEN CAN THANK STEPHANIE KWOLEK FOR HER INVENTION OF THIS POLYMER FIBER, 5 TIMES TOUGHER THAN STEEL
• All Players compete to answer
• 1st to buzz-in gets to answer
• IF correct
  • earns $ value
  • selects Next Clue
• IF wrong
  • loses $ value
  • other players buzz again (rebounds)
• Two Rounds Per Game + Final Question
• ONE Daily Double in First Round, TWO in 2nd Round
Broad Domain
[Figure: bar chart of how often each answer type occurs in the question sample. Y-axis: 0.00% to 3.00%. X-axis answer types: he, film, group, capital, woman, song, singer, show, composer, title, fruit, planet, there, person, language, holiday, color, place, son, tree, line, product, birds, animals, site, lady, province, dog, substance, insect, way, founder, senator, form, disease, someone, maker, father, words, object, writer, novelist, heroine, dish, post, month, vegetable, sign, countries, hat, bay]
Our Focus is on reusable NLP technology for analyzing volumes of as-is text. Structured sources (DBs and KBs) are used to help interpret the text.
We do NOT attempt to anticipate all questions and build specialized databases.
In a random sample of 20,000 questions we found 2,500 distinct types*. The most frequent occurs <3% of the time. The distribution has a very long tail. And for each of these types 1000’s of different things may be asked.
*13% are non-distinct (e.g., it, this, these or NA)
Even going for the head of the tail will barely make a dent
DeepQA: The Technology Behind Watson
Massively Parallel Probabilistic Evidence-Based Architecture
Generates and scores many hypotheses using a combination of 1000’s of Natural Language Processing, Information Retrieval, Machine Learning and Reasoning Algorithms.
These gather, evaluate, weigh and balance different types of evidence to deliver the answer with the best support it can find.
[Figure: DeepQA architecture. Pipeline: Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation) → Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval from Evidence Sources, Deep Evidence Scoring) → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence. Learned Models help combine and weigh the Evidence. Annotations: Multiple Interpretations; 100’s sources; 100’s Possible Answers; 1000’s of Pieces of Evidence; 100,000’s Scores from many Deep Analysis Algorithms; Balance & Combine.]
Example Question: IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE
Question Analysis: Keywords: 1698, comet, paramour, pink, … AnswerType(comet discoverer) Date(1698) Took(discoverer, ship) Called(ship, Paramour Pink) …
Primary Search: Related Content (Structured & Unstructured)
Candidate Answer Generation: Wilhelm Tempel, HMS Paramour, Isaac Newton, Halley’s Comet, Pink Panther, Christiaan Huygens, Peter Sellers, Edmond Halley, …
Evidence Retrieval
Evidence Scoring:
[0.58 0 -1.3 … 0.97]
[0.71 1 13.4 … 0.72]
[0.12 0 2.0 … 0.40]
[0.84 1 10.6 … 0.21]
[0.33 0 6.3 … 0.83]
[0.21 1 11.1 … 0.92]
[0.91 0 -8.2 … 0.61]
[0.91 0 -1.7 … 0.60]
Merging & Ranking: 1) Edmond Halley (0.85) 2) Christiaan Huygens (0.20) 3) Peter Sellers (0.05)
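The ranked output in the example above (Edmond Halley 0.85, Christiaan Huygens 0.20, Peter Sellers 0.05) is what makes the "Accurate Confidence" dimension usable in game play: a wrong answer loses money, so a player should answer only when confident enough. The sketch below illustrates that final decision step only. The threshold value and the function name are assumptions for illustration, not Watson's actual strategy.

```python
# Ranked candidates and confidences as shown in the example question.
ranked = [
    ("Edmond Halley", 0.85),
    ("Christiaan Huygens", 0.20),
    ("Peter Sellers", 0.05),
]

BUZZ_THRESHOLD = 0.50  # hypothetical value, not from the talk


def decide(candidates, threshold=BUZZ_THRESHOLD):
    """Return the top answer if its confidence clears the threshold, else None."""
    if not candidates:
        return None
    answer, confidence = max(candidates, key=lambda pair: pair[1])
    return answer if confidence >= threshold else None


choice = decide(ranked)          # confident enough to answer
abstain = decide(ranked, 0.90)   # a stricter threshold means staying silent
```

A well-calibrated confidence matters as much as the answer itself: with a poor estimate, no threshold separates the questions worth answering from the ones worth skipping.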
One Jeopardy! question can take 2 hours on a single 2.6 GHz core. Optimized and scaled out on a 2,880-core Power750 using UIMA-AS, Watson answers in 2-6 seconds.
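A quick back-of-the-envelope check of these figures, as a sketch only. It assumes the ideal case in which the work divides perfectly across cores with no coordination overhead, which real systems do not achieve.

```python
single_core_seconds = 2 * 60 * 60  # "2 hours" on a single core
cores = 2880                       # 2,880-core cluster

# Ideal linear scale-out: perfect division of work, zero overhead.
ideal_seconds = single_core_seconds / cores

# Overall speedup implied by the reported 2-6 second answer times.
speedup_at_2s = single_core_seconds / 2
speedup_at_6s = single_core_seconds / 6
```

The ideal figure of 2.5 seconds falls inside the reported 2-6 second range. The fastest answers imply a speedup (3,600x) larger than the core count, which suggests that the optimization work contributed alongside the scale-out; that reading is an inference from the numbers, not a claim made in the talk.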
[Figure: DeepQA pipeline, repeated. Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis and Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence. Annotations: Multiple Interpretations; 100s sources; 100s Possible Answers; 1000’s of Pieces of Evidence; 100,000’s scores from many simultaneous Text Analysis Algorithms.]
IBM’s Watson has been recognized as one of the most important technology achievements of 2011
“CIOs, business planners, enterprise architects, and strategy teams should familiarize themselves with its capabilities, and brainstorm ways in which human decision processes can be supported” – Gartner (link)
“The impact of Watson…will be felt far beyond the game show. This technology could have significant effect on business, government and society.” – TED (link)
Link: TED: Final Jeopardy and the Future of Watson
Video: The Future of Watson
“Much of the technology that IBM built for Watson can be deployed against other types of tasks besides winning a Jeopardy game, to make solutions for these tasks "smarter." This technology addresses all of the five A's of smart computing that we have identified, that is, Awareness, Analysis, Alternatives, Actions, and Auditability.” – Forrester (link)
“What is thinking? What is intelligence? What is the role that computers should and will play in our lives, and what are the boundaries between humans and computers? IBM's Watson demands that we reconsider each of these questions” – IDC (link)
Watson – Additional Information and Resources
• AI Magazine: Building Watson: An Overview of the DeepQA Project
• CIO Insight: IBM’s Watson: 11 Personal Apps
• eWeek: IBM’s Watson: The Future of Computing
• IDC: What is Watson: The IBM Jeopardy Challenge
• IBM’s Watson Portal: IBM Watson
• IBM: Watson press kit and Watson Facebook Page and IBM Research: The DeepQA Project
• NY Times: What is IBM’s Watson?
• PBS Video: Smartest Machine on Earth
• Time: 10 Questions for Watson's Human
• Twitter: @IBMWatson and hashtag #ibmwatson
• YouTube: Watson playlist
• Wikipedia: Watson
“We believe this will be an invaluable resource for our partnering physicians and will dramatically enhance the quality and effectiveness of medical care they deliver to our members.” – Wellpoint (link)
Small Problem
Data – Make data available freely; Give semantics to data
Access - APIs: standards based access, composition
Value – Give benefit to citizens; create business opportunities
Do it repeatedly and rapidly for core services
Big Data
! Volume
! Variety
! Velocity
! Veracity
! …
Cartoon critical of big data application, by T. Gregorius. http://upload.wikimedia.org/wikipedia/commons/thumb/b/b3/Big_data_cartoon_t_gregorius.jpg/220px-Big_data_cartoon_t_gregorius.jpg
Open Data
! Open data is the notion that data should not be hidden, but made available to everyone. The idea is not new.
  ! Scientific publications follow this: “standing on the shoulders of giants”
  ! Science stands for repeatability of results and hence, sharing
  ! The scientific community asserts that open data leads to increased pace of discovery. (See: Ray P. Norris, How to Make the Dream Come True: The Astronomers' Data Manifesto, At http://www.jstage.jst.go.jp/article/dsj/6/0/6_S116/_article, Accessed 2 Apr, 2012)
! Governments are the new source for open data
  ! Data.gov efforts world-wide; 300+ governmental bodies, including 20+ national agencies, including India, have opened data
  ! In India, additional movement is “Right to Information Act”
India: Right to Information Act
! Any citizen “may request information from a "public authority" (a body of Government or "instrumentality of State") which is required to reply expeditiously or within thirty days.”
  ! Passed by Parliament on 15 June 2005 and came fully into force on 13 October 2005. Citation: Act No. 22 of 2005
! Lauded and reviled
  ! Brought transparency
  ! Also,
    ! Increased bureaucracy
    ! Shortcomings in preventing corruption
! More information
  ! http://en.wikipedia.org/wiki/Right_to_Information_Act
  ! http://rti.gov.in
Running Example – Temperature at Conference Location
! Measurement System – Celsius, Fahrenheit, Kelvin, Color of spectrum, …
! Indoor or Outdoor
  ! Indoor – do we need to capture events happening inside?
  ! Outdoor – do we need to capture predicted weather?
! Location - Latitude, Longitude, Address, Part of building
! Measuring equipment details
! Data quality - refresh rates, default values when equipment broken
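The checklist above amounts to a schema: a bare number is not a usable temperature until unit, setting, location, equipment and quality are attached. A minimal sketch of such a record follows. All field names and the sample values are illustrative assumptions, not taken from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Unit(Enum):
    CELSIUS = "C"
    FAHRENHEIT = "F"
    KELVIN = "K"


@dataclass(frozen=True)
class TemperatureReading:
    value: Optional[float]   # None when equipment is broken, not a silent default
    unit: Unit               # measurement system
    indoor: bool             # indoor or outdoor
    latitude: float          # location
    longitude: float
    place: str               # address or part of building
    sensor_id: str           # measuring equipment details
    measured_at: datetime
    refresh_seconds: int     # data quality: refresh rate

    def in_celsius(self) -> Optional[float]:
        """Normalize to Celsius so readings from different sources compare."""
        if self.value is None:
            return None
        if self.unit is Unit.CELSIUS:
            return self.value
        if self.unit is Unit.FAHRENHEIT:
            return (self.value - 32.0) * 5.0 / 9.0
        return self.value - 273.15  # Kelvin


# Placeholder values for illustration only.
reading = TemperatureReading(
    value=77.0, unit=Unit.FAHRENHEIT, indoor=True,
    latitude=0.0, longitude=0.0, place="Hall A", sensor_id="sensor-1",
    measured_at=datetime(2014, 10, 21, 9, 0, tzinfo=timezone.utc),
    refresh_seconds=60,
)
```

Representing a broken sensor as a missing value rather than a default such as 0 is one possible design choice; it keeps faulty readings from being mistaken for real measurements.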
Data Quality in Public Data in India
! Right to Information
  ! Not even 1*
  ! Information available to requester, but no one else
! Data.gov.in
  ! 2-3*
  ! Available in CSV, etc., but not uniquely referenceable
! Open data movements are moving to linked data form for semantics
Linking of Open Data for Reusability
Source: http://5stardata.info/
Source: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
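The star ratings quoted for Indian public data refer to the 5-star open data scheme cited above, in which each star builds on the previous one. A minimal sketch of that cumulative rating follows. The class and field names are illustrative, and the two sample datasets encode the assessments from the previous slide rather than any independent measurement.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_open_licence: bool   # 1 star: on the web under an open licence
    structured: bool            # 2 stars: machine-readable structured data
    open_format: bool           # 3 stars: non-proprietary format such as CSV
    uses_uris: bool             # 4 stars: URIs identify things
    linked_to_other_data: bool  # 5 stars: linked to other data for context


def stars(dataset: Dataset) -> int:
    """Count stars cumulatively: a level counts only if all lower levels hold."""
    levels = [
        dataset.on_web_open_licence,
        dataset.structured,
        dataset.open_format,
        dataset.uses_uris,
        dataset.linked_to_other_data,
    ]
    count = 0
    for satisfied in levels:
        if not satisfied:
            break
        count += 1
    return count


# A reply visible only to the requester is not on the web at all.
rti_reply = Dataset(False, False, False, False, False)
# CSV on a portal, but items are not uniquely referenceable by URI.
csv_portal = Dataset(True, True, True, False, False)
```

Under these assumptions an RTI reply scores zero stars and a CSV download scores three, matching the "not even 1*" and "2-3*" assessments.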
Illustration: W3C Organization
! Abstract: This document describes a core ontology for organizational structures, aimed at supporting linked-data publishing of organizational information across a number of domains. It is designed to allow domain-specific extensions to add classification of organizations and roles, as well as extensions to support neighbouring information such as organizational activities.
1. Introduction
2. Conformance
3. Namespaces
4. Overview of ontology
5. Design notes
6. Notes on style
7. Organizational structure
  7.1 Class: Organization
    7.1.1 Property: subOrganizationOf 7.1.2 Property: transitiveSubOrganizationOf 7.1.3 Property: hasSubOrganization 7.1.4 Property: purpose 7.1.5 Property: hasUnit 7.1.6 Property: unitOf 7.1.7 Property: classification 7.1.8 Property: identifier 7.1.9 Property: linkedTo
  7.2 Class: FormalOrganization
  7.3 Class: OrganizationalUnit
  7.4 Notes on formal organizations
  7.5 Notes on organizational hierarchy
  7.6 Notes on organizational classification
8. Reporting relationships and roles
  8.1 Class: Membership
    8.1.1 Property: member 8.1.2 Property: organization 8.1.3 Property: role 8.1.4 Property: hasMembership 8.1.5 Property: memberDuring 8.1.6 Property: remuneration
  8.2 Class: Role
    8.2.1 Property: roleProperty
  8.3 Property: hasMember
  8.4 Property: reportsTo
  8.5 Property: headOf
  8.6 Discussion
9. Location
  9.1 Class: Site
    9.1.1 Property: siteAddress 9.1.2 Property: hasSite 9.1.3 Property: siteOf 9.1.4 Property: hasPrimarySite 9.1.5 Property: hasRegisteredSite 9.1.6 Property: basedAt
  9.2 Property: location
10. Projects and other activities
  10.1 Class: OrganizationalCollaboration
11. Historical information
  11.1 Class: ChangeEvent
    11.1.1 Property: originalOrganization 11.1.2 Property: changedBy 11.1.3 Property: resultedFrom 11.1.4 Property: resultingOrganization
A. Change history
B. Acknowledgments
C. References
  C.1 Normative references
  C.2 Informative references
http://www.w3.org/TR/vocab-org/
Usage of W3C’s Org Ontology – Community Directory
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dir: <http://dir.w3.org/directory/schema#> .
@prefix directory: <http://dir.w3.org/directory/orgtypes/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix gr: <http://purl.org/goodrelations/v1#> .
@prefix org: <http://www.w3.org/ns/org#> .

<> foaf:primaryTopic <#org> .

<#org> a org:Organization, dir:Organization, gr:BusinessEntity, vcard:Organization ;
    rdfs:label "International Business Machines" ;
    gr:legalName "International Business Machines" ;
    vcard:organization-name "International Business Machines" ;
    skos:prefLabel "International Business Machines" ;
    dir:isOrganizationType directory:commercial ;
    vcard:url <http://www.ibm.com> ;
    vcard:logo <http://upload.wikimedia.org/wikipedia/commons/thumb/5/51/IBM_logo.svg/200px-IBM_logo.svg.png> ;
    rdfs:comment """International Business Machines Corporation (NYSE: IBM), or IBM, is an American multinational technology and consulting corporation, with headquarters in Armonk, New York, United States. IBM manufactures and markets computer hardware and software, and offers infrastructure, hosting and consulting services in areas ranging from mainframe computers to nanotechnology.""" .

<#org> org:siteAddress <#address-1NewOrchardRoad+Armonk+UnitedStates> .

<#address-1NewOrchardRoad+Armonk+UnitedStates> a vcard:VCard, vcard:Address ;
    vcard:street-address "1 New Orchard Road " ;
    vcard:locality "Armonk " ;
    vcard:country-name "United States" ;
    vcard:region "New York" ;
    vcard:postal-code "10504-1722" .
Small Problem
Data – Make data available freely; Give semantics to data
Access - APIs: standards based access, composition
Value – Give benefit to citizens; create business opportunities
Do it repeatedly and rapidly for core services
Business
Source: Bessemer Venture Partners 2012
Business capabilities are being exposed via APIs and delivered as-a-service, allowing businesses to engage with clients and partners with speed, at scale
REST v/s Web Services?
REST
• supports limited integration styles, and involves fewer decisions on architectural alternatives
• This simplifies client-side integration steps (at the cost of lessening automation in system evolution); more focus on do-it-yourself
Source: Pautasso et al, RESTful Web Services vs. “Big” Web Services: Making the Right Architectural Decision, WWW 2008
Running Example – APIs for Temperature at Conference Location
! API examples ! Get temperature (input: current, last, input instant)
! Get temperature interval (input: day)
! Get average temperature (input: time range)
! REST or web-service
! Semantic annotation on input and output
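The three API examples above could be sketched as a small in-memory service. This is only one possible reading: the class and method names are hypothetical, "temperature interval" is interpreted here as the (min, max) range for a day, and all values are assumed to be in degrees Celsius. A real deployment would expose these over REST or a web service and carry the semantic annotations (unit, location) alongside each input and output.

```python
from datetime import date, datetime
from statistics import mean


class TemperatureService:
    """In-memory stand-in for a temperature API (hypothetical design)."""

    def __init__(self, readings):
        # readings: iterable of (datetime, degrees Celsius)
        self._readings = sorted(readings)

    def get_temperature(self, instant=None):
        """Latest reading at or before `instant`; most recent if omitted."""
        eligible = [r for r in self._readings
                    if instant is None or r[0] <= instant]
        return eligible[-1][1] if eligible else None

    def get_temperature_interval(self, day):
        """(min, max) for a calendar day, or None if nothing was recorded."""
        values = [v for t, v in self._readings if t.date() == day]
        return (min(values), max(values)) if values else None

    def get_average_temperature(self, start, end):
        """Mean of readings with start <= time <= end, or None if empty."""
        values = [v for t, v in self._readings if start <= t <= end]
        return mean(values) if values else None


# Illustrative sample data for the conference day.
service = TemperatureService([
    (datetime(2014, 10, 21, 8, 0), 22.0),
    (datetime(2014, 10, 21, 10, 0), 24.0),
    (datetime(2014, 10, 21, 12, 0), 27.0),
    (datetime(2014, 10, 21, 14, 0), 25.0),
])
```

For example, `service.get_temperature(datetime(2014, 10, 21, 11, 0))` returns the 10:00 reading, since that is the last value known at that instant.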
Every citizen is a potential city event sensor
• Citizen notices 311 event worth reporting
• Reports event using mobile
  • Launches mobile application
  • Browses recent already-reported events
  • Creates new event report
    • [Is pre-enabled or gets any needed credentials to report event]
    • Identifies service type for new event
    • Shares location using mobile device (coordinates)
    • Can add location annotations (road, district, city) and description
• Get confirmation of submission
• Get updates on service request
[Figure: Extreme Personalization = Location Intelligence + Empowered Citizen + Social Analytics]
ALLGOV SCENARIO: CROWDSOURCING 311* EVENT REPORTING
*311 data standard
• non-emergency events like graffiti, garbage, downed trees, abandoned cars, …; not threatening to human life
• 60+ cities support it world-wide; demo works on 4 (Chicago, Boston, Tucson – USA; Bonn – Germany), with backend tests on 10s more.
Browsing Services in One’s City: Mary M. can look at the 311 services her city provides.
On selecting the icon,
• She sees a small set of categories (health, building, traffic, cityimage, others) around which all the city’s services are grouped.
• She can look at a list of services and check out the agencies involved
  • If there has been a change in the agency responsible, or new services added for an agency, she can note that directly
Browsing Services in Other Cities: Her colleagues from another city are visiting. She may want to bring up a window (instantiate an app with the browse city pattern) to look at what that city offers its citizens. [Alternatively, if she is travelling to another city, she may be interested to know how that city does compared to hers, by which agency, etc.]
On selecting the icon,
• She sees a small set of familiar categories (health, building, traffic, cityimage, others) regardless of what the city calls its services
• She can look at a list of services and check out the agencies involved
If her city does something different, she can show that to her colleagues in her or other cities.
Application Pattern
! What is it?: A pattern is any application using APIs, with some information generalized (i.e., removed and parameterized)
! Business Value: A pattern
  ! standardizes the usage experience by promoting similar behavior (for users)
  ! simplifies application development by templatizing API interactions (for developers)
  ! serves as the organization’s memory of the best-practices in developing a class-of-applications even when the specific APIs may not be relevant (for business)
! Key Technical Issue
  ! What patterns should one build? Theoretically, there exists a trivial method to blindly generate a pattern from any application. Any pattern development process has to do better than this baseline.
  ! How should the patterns be used in practice?
    ! Building a tool-enabled process around Pattern-based programming
Application Pattern
! Approach followed in AllGov
  ! Common steps taken by a role player is a candidate pattern
  ! Common steps that can be executed in the same infrastructure is a candidate pattern
! Pattern 1: Browse city services pattern [User Role: Govt. Dept Admin; Environment: PRODUCTION system]
  ! find a city's services
  ! find a service's definition
  ! find services of a particular high-level category (example: building, graffiti, ...)
! Pattern 2: Create service request pattern [User Role: Developer; Environment: TEST system]
  ! Browse city services
  ! Browse raised city service requests
  ! Create a new service request
! Pattern 3: Create service request pattern [User Role: General citizen of a particular City; Environment: PRODUCTION system]
  ! Browse city services
  ! Browse raised city service requests
  ! Create a new service request
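One way to picture a pattern is as a function over API calls in which the city-specific details have been removed and turned into parameters. The sketch below is an illustration under assumptions, not the AllGov implementation: the `City` type, function names and example endpoint are hypothetical, the resource names (`services.json`, `requests.json`) and parameters follow my reading of the Open311 GeoReport v2 interface, and details such as API keys are omitted. It only builds the planned calls and performs no network access.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class City:
    name: str
    endpoint: str         # base URL of the city's Open311 server
    jurisdiction_id: str


def browse_city_services(city):
    """Pattern 1: identical steps for any city; only the City differs."""
    query = urlencode({"jurisdiction_id": city.jurisdiction_id})
    return [("GET", f"{city.endpoint}/services.json?{query}", None)]


def create_service_request(city, service_code, lat, long, description):
    """Patterns 2 and 3: browse services, browse requests, then create one."""
    query = urlencode({"jurisdiction_id": city.jurisdiction_id})
    body = urlencode({
        "jurisdiction_id": city.jurisdiction_id,
        "service_code": service_code,
        "lat": lat,
        "long": long,
        "description": description,
    })
    return browse_city_services(city) + [
        ("GET", f"{city.endpoint}/requests.json?{query}", None),
        ("POST", f"{city.endpoint}/requests.json", body),
    ]


# Hypothetical TEST-environment city; swapping in another City object
# reuses the same pattern against a PRODUCTION endpoint.
test_city = City("Testville", "https://open311.example.org/v2",
                 "testville.example.org")
calls = create_service_request(test_city, "001", 41.88, -87.63,
                               "Graffiti on wall")
```

Patterns 2 and 3 share the same steps and differ only in role and environment, which here reduces to passing a different `City` value.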
AllGov Scenario Deconstruction (flows)
[Figure: numbered flows (1, 2, 3) between Customer Mobile, AllGov, City Services and External IBM Client. Flow labels: browse events; get recent events; get service types; create request; Post location coordinates; Post details on Event, location; Request confirmation; Notify service completed. Pattern labels: P1, P1+; P2, P3.]
Why Tourism Matters
! Pros
  ! Promotes services jobs
  ! Helps upgrade infrastructure
  ! Gives alternative revenue source to government beyond traditional agriculture and manufacturing
  ! Helps take local culture world-wide
  ! Promotes country image
! Cons
  ! Can lead to environmental impact if not planned well
  ! Can dilute local traditions and culture if unplanned
World Tourism in Numbers
Key Points
• In 2013, >1 billion people stayed overnight in another city and spent > 1 trillion USD
• Of the oldest civilizations in the world (> 5K years), China, Egypt and India, only China is in the top-5 for both receiving and sending tourists, by numbers and money spent.
• Tourists go beyond language and history to spend their money for novel experiences
Key Points for Africa and Middle East
• In 2013, there were over 55.7 million international tourist arrivals to Africa, an increase of 5.4% over 2012.
• In 2013, there were over 51.5 million international tourist arrivals to the Middle East, a decrease of 0.2% over 2012.
• Top countries are individually getting more tourists than Africa or the Middle East as a whole (70-80M range v/s 50M-55M)
Tables Courtesy: http://en.wikipedia.org/wiki/World_Tourism_rankings (Accessed 20 Oct, 2014)
Top Cities Tourists Visit (by money spent)
Figure Courtesy: MasterCard 2014 Global Destination Cities Index, At http://newsroom.mastercard.com/digital-press-kits/mastercard-global-destination-cities-index-2014/
Top cities already earn from tourists what countries in the Middle East / Africa are planning to earn by 2020
Top Cities in MEA
Figure Courtesy: MasterCard 2014 Global Destination Cities Index, At http://newsroom.mastercard.com/digital-press-kits/mastercard-global-destination-cities-index-2014/
There is tremendous scope to grow if things are done differently
Possible Strategy to Promote Tourism
! Increase quality of experience for USPs using better information availability. Examples:
  ! Service quality – Information on what is happening and what to expect, when, at what cost; make it easy to consume offerings
  ! Remove barriers to travel and spending - Remove perception of lack-of-safety, increase transparency about supporting services like roads, hospitals, taxis
! Promote domestic tourism in addition to international tourism
  ! Helps natives inculcate service-industry culture, build capacity
City Concierge (CC): Serving People by Design
! Target users
  ! Citizens wanting to know more about their city
  ! Travellers planning to visit new cities with memorable experiences
  ! People (e.g., business, government) wanting to compare cities
! Group information along a small set of easy-to-follow categories
  ! We selected - Traffic, health, building, city image, others
  ! Easy to change to any set of categories
! Languages supported – English, Portuguese, Spanish, German
  ! Easy to extend to any
2nd place winner in Europe’s CitySDK App Hackathon in June 2014
Details: http://www.slideshare.net/biplavsrivastava/city-concierge-presentation10june2014
Serving People by Design ! Target users: Citizens, Travellers, People
Citizens, Travellers Most events – Helsinki Most open service requests - Lisbon
Check Services of Your Favorite City – Chicago, in example
Lisbon (in Portuguese) Bonn(in German)
People, Travellers Most city services – Lisbon; Traffic most common category in cities
CC Design Principles
! Focus on features that promote usage of city data
  ! Overcoming language barriers
  ! Overcoming API and data diversity barriers
  ! Highlight commonalities, promote comparison
! Follow standards
  ! CitySDK for tourism events upcoming
  ! Open 311 for city’s non-emergency services and service requests
! Programming level approach
  ! Overcome (City API) errors to stay useful
  ! Be resource efficient to promote mobile apps
  ! Standardize on output formats
Tourism Capacity Building with Smarter Transportation
Details:
• Making Public Transportation Schedule Information Consumable for Improved Decision Making, Raj Gupta, Biplav Srivastava, Srikanth Tamilselvam, In 15th International IEEE Annual Conference on Intelligent Transportation Systems (ITSC 2012), Anchorage, USA, Sep 16-19, 2012.
• City Notifications as a Data Source for Traffic Management, Pramod Anantharam, Biplav Srivastava, in 20th ITS World Congress 2013, Tokyo
Promoting Public Transportation: Before and After We Seek
Many cities around the world, and especially in India and emerging ones, are getting their transportation infrastructure in shape.
– They have multiple, fragmented, transportation agencies in a region (e.g., city)
– They do not have instrumentation on their vehicles, like GPS, to know about their operations in real-time
– Schedule of public transportation is widely available in semi-structured form. They are also beginning to invest in new, novel, sensing technologies
– Cities give SMS-based alerts about events on the road.
Our approach seeks to accelerate time-to-value for such cities.
Kind of Information | Today Available to Bus User | With IRL-Transit+ | Benefit
Bus Schedule (static) | Available online and pamphlets | Available from IT-enabled devices (low-cost phones, smart phones, web) | Increase accessibility
Bus Schedule Changes (dynamic) | No information | Infer from city updates | Increase information
Analytics (Bus Selection Decision Support) | No information | Will be available (Transit) | Increase information
Standardization of information | No support | Will be supported (SCRIBE, Transit) | Increase information’s interoperability
Background: Public Transportation Schedule Information
! Is widely available for public transportation agencies around the world
! Gives the basic, static, information about transportation service
! Usually in semi-structured format with varying semantics
! Can have errors, missing data
Basic Solution Steps
! Use the widely available schedule information from individual operators (agencies)
! Clean and consolidate it across agencies and modes to get a multi-modal view for the region
  ! Optionally: Convert it into a standard form
  ! Optionally: Enhance (fuse) it with any real-time updates about services for the region
! Perform what-if analysis on consolidated data
  ! Path finding using Dijkstra’s algorithm
  ! Analyses can be pre-determined, analyses can also be user-created and defined
! Make analysis results available as a service
  ! On any device
  ! To any subscriber
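The path-finding step above can be sketched with a textbook Dijkstra search over a consolidated multi-modal graph. This is a minimal illustration, not the IRL-Transit implementation: the stops, modes and travel times below are made up, and a real system would also weigh the preferences over modes, time and hops.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm.

    graph maps a stop to a list of (neighbour, minutes, mode) edges.
    Returns (total_minutes, legs) or None if target is unreachable.
    """
    best = {source: 0}
    previous = {}
    queue = [(0, source)]
    while queue:
        cost, stop = heapq.heappop(queue)
        if stop == target:
            break
        if cost > best.get(stop, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, minutes, mode in graph.get(stop, []):
            new_cost = cost + minutes
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = (stop, mode)
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in best:
        return None
    # Walk back from the target to recover the legs of the journey.
    legs, stop = [], target
    while stop != source:
        prev, mode = previous[stop]
        legs.append((prev, stop, mode))
        stop = prev
    return best[target], list(reversed(legs))


# Hypothetical consolidated schedule: edges from several agencies and modes.
graph = {
    "A": [("B", 10, "bus"), ("C", 4, "walk")],
    "B": [("D", 15, "metro")],
    "C": [("B", 3, "bus"), ("D", 30, "bus")],
    "D": [],
}
result = shortest_path(graph, "A", "D")
```

In this toy graph the best route mixes three modes (walk, bus, metro), which is only visible once the schedules of the separate agencies have been merged into one graph.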
Multi-Mode Commuting Recommender in Delhi And Bangalore
Highlights
• Published data of multiple authorities used; repeatable process
• Multiple modes searched
• Preference over modes, time, hops and number of choices supported; more extensions, like fare, possible
• Integration of results with map as future work; already done as part of other projects, viz. SCRIBE-STAT
Further Work*
! Invariant Inputs:
  ! The person
    ! has a vehicle (e.g., car), and
    ! can also walk short distances
  ! The city has taxis, buses, metros, autos, rickshaws
    ! Buses and metros have published routes, frequency and stops
    ! Autos and rickshaws can be available at stands, or opportunistically, on the road
    ! Taxis can be ordered over the phone
! Input:
  ! A person wants to travel from place A to B
  ! [Optional] City provides updates on ongoing events, some may affect traffic
! Output
  ! Suggest to the person which mode or combination of modes to select
! Observation: Using preferences over factors that matter to users to keep commuting convenient, while making best use of available public and para-transit commute methods
* City Notifications as a Data Source for Traffic Management, Pramod Anantharam, Biplav Srivastava, in 20th ITS World Congress 2013, Tokyo
Number of SMS messages for bus stops in Delhi for 2 years (Aug 2010 – Aug 2012)*
• 344 stops with updates • 3931 total stops
* using Exact Matching
IRL – Transit in Aug 2012
Key Points
• SMS message from city
• Event and location identified
• Impact assessed
• Impact used in search
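The key points above can be sketched as two small steps: find known stop names in the text of a city alert by exact matching, then add a delay to any route segment touching an affected stop. The stop list, messages and the 15-minute penalty are illustrative assumptions, not data or parameters from the project.

```python
import re


def find_stops(message, stop_names):
    """Exact (case-insensitive, whole-phrase) matching of stop names."""
    text = message.lower()
    found = []
    for name in stop_names:
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        if re.search(pattern, text):
            found.append(name)
    return found


def edge_cost(minutes, from_stop, to_stop, affected, penalty=15):
    """Add a hypothetical delay when either end of a segment is affected."""
    hit = from_stop in affected or to_stop in affected
    return minutes + (penalty if hit else 0)


# Illustrative stop list and messages (not actual alerts).
stops = ["ITO", "Kashmere Gate", "Moti Bagh"]
alert = "Traffic is heavy near ITO due to a procession. Kindly avoid the stretch."
variant = "Breakdown of a bus at Kashmiri Gate."

affected = set(find_stops(alert, stops))
missed = find_stops(variant, stops)  # spelling variant defeats exact matching
```

Exact matching is simple and precise but misses spelling variants such as the one in the second message, which may be one reason only a fraction of stops ever show updates.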
Increase Accessibility and Availability of Bus Information to Passengers
Kind of Information | Today Available to Bus Users | With Project in Bangalore | Mysore ITS (for reference)* | Benefit
Bus Schedule (static) | Available online and pamphlets | Available from low-cost phones (Spoken Web – Static) | Available online and pamphlets | Increase accessibility
Bus Schedule Changes (dynamic) | No information today | Will be available (Spoken Web - Human) | No information but in plan | Increase information
Bus Location | No information today | Will be available (GPS) | Will be available (GPS) | Increase information
Bus Condition | No information today | Will be available (Spoken Web - Human) | No information today | Increase information
Analytics (Bus Selection Decision Support) | No information today | Will be available (Transit) | No information but in plan | Increase information
Last-mile Connectivity to/from nearest stop | No information today | Will be available (Spoken Web - Human) | No information today | Increase information
Standardization of information | No support | Will be supported (SCRIBE, Transit) | Some support due to GPS | Increase information’s interoperability
* Opinion based on only public information
A Flexible Journey Plan
Our End Vision: Information to Commuters to Reach Destination in All Eventuality
Pilots running in Dublin, Ireland
Resources
! Tutorial on AI-Driven Analytics In Traffic Management, in conjunction with International Joint Conference on Artificial Intelligence (IJCAI-13), Biplav Srivastava, Akshat Kumar, at Beijing, China, Aug 3-5, 2013 (tutorial-slides).
! Tutorial on Traffic Management and AI, in conjunction with 26th Conference of Association for Advancement of Artificial Intelligence (AAAI-12), Biplav Srivastava, Anand Ranganathan, at Toronto, Canada, July 22-26, 2012 (tutorial-slides).
! Making Public Transportation Schedule Information Consumable for Improved Decision Making, Raj Gupta, Biplav Srivastava, Srikanth Tamilselvam, In 15th International IEEE Annual Conference on Intelligent Transportation Systems (ITSC 2012), Anchorage, USA, Sep 16-19, 2012.
! Mythologies, Metros & Future Urban Transport, by Prof. Dinesh Mohan, TRIPP, 2008.
! A new look at the traffic management problem and where to start, by Biplav Srivastava, In 18th ITS Congress, Orlando, USA, Oct 16-20, 2011.
! Arnott, Richard and K.A. Small, 1994, “The Economics of Traffic Congestion,” American Scientist, Vol. 82, No. 5, pp. 446-455.
! Chengri Ding and Shunfeng Song, Paradoxes of Traffic Flow and Congestion Pricing,
Tourism Capacity Building with Corruption Prevention
Details:
• A Computational Model for Corruption Assessment, Nidhi Rajshree, Nirmit V. Desai and Biplav Srivastava, IJCAI 2013 Workshop on Semantic Cities, Beijing, 2013 [Corruption-FormalModels]
• Open Government Data for Tackling Corruption – A Perspective, Nidhi Rajshree, Biplav Srivastava, in AAAI 2012 Workshop on Semantic Cities, Toronto, July 2012. [Area: Open data-Corruption]
Corruption “the misuse of public office for personal gains”
* Source: http://cpi.transparency.org/cpi2012/results/
Corruption afflicts both public and corporate services worldwide. It has a significant negative impact on the growth of economies and is therefore universally considered undesirable.
Corruption : “Monopoly + Discretion – Accountability” (Klitgaard, Robert E. Controlling corruption. Berkeley: U. of California Press, 1988)
A Nation’s Competitiveness and Corruption Perception Don’t Go Hand-in-Hand
For Promoting Tourism, Corruption Perception has to be Removed
Some Key Questions Related to Corruption
• Exchange of money: can a service for which the customer does not pay a fee (free service) be termed corrupt? Or conversely, can a corrupt practice only happen if the customer pays for a service?
• Human agents: can a service be corrupt if the agent delivering the service is not a human but an automated agent?
• Contention for resources: can corruption happen if delivering a service requires no contention for resources? Alternatively, if resources are scarce, will an objective way of allocating them help remove corruption?
Metamodel – Expressing Key Concepts for Corruption
[Figure: metamodel diagram relating Provider, Requestor, Person, Organization, Process, Activity, Task, Decision, Inputs, Outputs, Escalation, Process Instance, Activity Instance, Execution Time and Execution Cost, with multiplicities 0..1, 1, + and * on the associations.]
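To make the metamodel concepts concrete, the following Python sketch encodes them as dataclasses. The concept names come from the slide; how they are wired together here (which class holds which field, and the field types) is an assumption for illustration, as are the example activities and the time and cost values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Activity:
    name: str
    kind: str                              # "task" or "decision"
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    escalation: Optional[str] = None       # where to appeal, if anywhere


@dataclass
class Process:
    name: str
    provider: str                          # a person or an organization
    activities: List[Activity] = field(default_factory=list)


@dataclass
class ActivityInstance:
    activity: Activity
    execution_time_days: float
    execution_cost: float


@dataclass
class ProcessInstance:
    process: Process
    requestor: str
    steps: List[ActivityInstance] = field(default_factory=list)

    def total_time_days(self):
        return sum(step.execution_time_days for step in self.steps)

    def total_cost(self):
        return sum(step.execution_cost for step in self.steps)


# Hypothetical usage: two activities with made-up execution times.
validate = Activity("Validate docs", kind="task", inputs=["supporting documents"])
vetting = Activity("Vetting", kind="decision")
registration = Process("National Registration", provider="Registration Office",
                       activities=[validate, vetting])
run = ProcessInstance(registration, requestor="citizen A", steps=[
    ActivityInstance(validate, execution_time_days=1, execution_cost=0),
    ActivityInstance(vetting, execution_time_days=14, execution_cost=0),
])
decisions_without_escalation = [a.name for a in registration.activities
                                if a.kind == "decision" and a.escalation is None]
print(run.total_time_days(), decisions_without_escalation)
```

Separating the process definition from its instances lets the same model hold both the prescribed design (activities, escalation paths) and the observed behavior (time and cost per execution).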
Framework Evaluation, by Example
National Registration - Kenya
[Figure: swimlane process diagram. Lanes: Citizen, Registration Officer, Vetting Committee, Ch. Asst. Officer, NRB Officer.]
Steps:
1. Submit supporting documents
2. Validate docs
3. Vetting
4. Handover serialized App Form
5. Fill and submit application form
6. Take finger prints
7. Click photograph for ID card
8. Handover the waiting card
9. Receive waiting card and wait for processing
10. Submit documents to Chief
11. App signed and stamped by Chief Asst. Officer
12. Submit documents to NRB
13. Verify identity of the applicant
14. Process ID Card
15. Send ID card to the Registration Office
16. Receive ID Card from NRB
17. Collect ID Card
Document labels: Proof of birth, Proof of citizenship, Proof of residence; Additional proof of residence
Form labels: Form 101, Form 136 A, Form 136 C
Branch labels: Insufficient documents / Sufficient documents; Ancestral home town is a border district or age >> 18; Satisfied / Not satisfied
National Registration: Kenya, India (Aadhar), USA (Social Security)
Kenya
• The decision node, 3 - vetting, and the activity, 13 - verify identity, are discretionary with no clear mechanism on how to accomplish them.
• In contrast, the checks for documents having been submitted are objective.
• There is no Service Level Agreement (SLA) for the process.
• The ID process is monopolistic since only a single authority (registration office) can process it.
• The process has little reviewability and low visibility since there is no escalation mechanism.
India (Aadhar)
• 18 Proofs of Identity (PoI) and 33 Proofs of Address (PoA) documents are permitted for making the request.
• The process also allows discretion by allowing attested documents from high-level officials.
• The cost and time limits for the service are prescribed.
• The process, however, can only be handled by a single agency, creating a monopoly.
USA (Social Security)
• In SS, there is a clear list of documents proving US citizenship (or legal residence), age and identity.
• There is little room for discretion because no category allows a signed attestation by a high-level official to be acceptable.
• The cost and time limits for the service are prescribed.
• The process, however, can only be handled by a single agency, creating a monopoly.
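The comparison above can be turned into a rough indicator by applying Klitgaard's formula with one yes/no flag per term. This is only a sketch: the 0/1 encoding, the `Assessment` class, and the choice to treat "cost and time limits are prescribed" as the accountability signal are assumptions for illustration, not the computational model from the cited papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    service: str
    monopoly: bool        # only one authority can process the request
    discretion: bool      # some step has no objective criterion
    accountability: bool  # assumed proxy: cost and time limits are prescribed

    def risk(self):
        # Klitgaard: corruption = monopoly + discretion - accountability
        return int(self.monopoly) + int(self.discretion) - int(self.accountability)


# Flags read off the bullets above (a simplification of the slide's findings).
assessments = [
    Assessment("Kenya national registration", True, True, False),
    Assessment("India Aadhar", True, True, True),
    Assessment("USA Social Security", True, False, True),
]
ranked = sorted(assessments, key=lambda a: a.risk(), reverse=True)
for a in ranked:
    print(a.service, a.risk())
```

The score is only ordinal: it orders services by how many risk factors are present and says nothing about how much corruption actually occurs.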
Framework Evaluation, by Example
International Driving Permit (IDP)
[Figure: swimlane process diagram. Lanes: Citizen, Front Desk Officer, Inspector, Regional Transport Officer.]
Steps:
1. Submit supporting documents
2. Validate docs
3. Validate address
4. DL Address change process
5. Handover Appl Form
5. Fill and submit application form
6. Verify DL issuance date
7. Send applicant for DL Test
8. Verify applicant's driving skills
9. Send application to Regional Transport Officer
10. Stamp and sign the IDP
11. Send IDP to front desk officer
12. Receive IDP from Regional Transport Officer
13. Collect IDP
Document labels: Driver’s license, Passport, Air tickets, VISA
Form labels: Form CMV1
Branch labels: Insufficient documents; DL address not under RTO jurisdiction / DL address under RTO jurisdiction; Address has not changed / Address has changed; DL issued within 3 months / DL issued within more than 3 months; Satisfied / Not satisfied
International Driving License: India (IDP), USA (AAA)
India (IDP)
• The service execution cost is specified (Rs 500) but no service execution time is given.
• There is no escalation mechanism.
• The check whether all documents have been submitted is objective.
• The IDP is monopolistic since only a single authority (RTO) can process it.
• The process has little reviewability and low visibility since there is no escalation mechanism.
USA (AAA)
• The procedure involves filling a form online, visiting the office of an authorized agency with a valid state-issued driver’s license, photos and fees, and getting the permit.
• There are multiple agencies to process the request, and the prerequisite driver’s license can be verified objectively (e.g., with social security databases).
• No monopoly
• Objective criteria
Tackling Corruption
Tackling corruption pro-actively:
! Open Gov. Data
  ! Increases transparency, hence increasing the risk of being caught in the act of corruption
  ! Makes measurements by SLAs possible
! Process Redesign
  ! Ensures a robust process design, reducing corruption hotspots
  ! Reduce monopoly, discretion
! Automation
  ! Automation needs outcomes to be formally defined
  ! Reduces discretion, requires data (input, output, outcome) to be adequately captured
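As a small illustration of how open data makes SLA measurement possible, the sketch below computes the share of completed requests that finished within a prescribed time limit. The function name, the 30-day limit and the durations are hypothetical; no real published dataset is used.

```python
def sla_compliance(durations_days, sla_days):
    """Fraction of completed requests that finished within the SLA limit."""
    if not durations_days:
        raise ValueError("no completed requests to measure")
    within = sum(1 for days in durations_days if days <= sla_days)
    return within / len(durations_days)


# Hypothetical execution times (in days) as an agency might publish them.
published = [12, 30, 45, 20, 61]
rate = sla_compliance(published, sla_days=30)
print(f"{rate:.0%} of requests met the 30-day SLA")
```

The calculation is trivial once execution times per request are captured and released, which is the point of the slide: without published data the SLA cannot be checked by anyone outside the agency.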
Running Example – Potential Applications of Temperature at Conference Location (Over Time)
! External temperature
  ! Environment models, weather forecasting, pollution spread models, disease spread rates, …
! Internal temperature
  ! Energy management, security management, building management, traffic management, …
! Temperature is unrelated to the technical program. Imagine what could be enabled if the conference’s technical content were made machine consumable through APIs and used for real applications.
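To show what "machine consumable" could look like for the running example, this sketch wraps one temperature reading in a JSON record that an API could serve. The field names, the unit code and the reading value are assumptions for illustration; the talk does not prescribe a schema.

```python
import json
from datetime import datetime, timezone


def temperature_record(location, celsius, kind, when):
    """Build a plain, self-describing record for one reading (hypothetical schema)."""
    return {
        "location": location,
        "kind": kind,                                   # "external" or "internal"
        "temperature": {"value": celsius, "unit": "Cel"},
        "observedAt": when.isoformat(),
    }


# Hypothetical reading taken on the day of the keynote.
reading = temperature_record(
    "conference venue", 24.5, "internal",
    datetime(2014, 10, 21, 9, 0, tzinfo=timezone.utc),
)
payload = json.dumps(reading, sort_keys=True)
print(payload)
```

Carrying the unit and a timezone-aware timestamp inside the record is the minimum needed for another application to reuse the value without guessing; linking the keys to a shared vocabulary would be the next step toward linked open data.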
Call for Action
! Main message
  ! Use more open data in your research
  ! Build apps and make them available
! Specifics
  ! Governments should
    ! Come out with data sharing/disclosure policies, and
      ! Example: USA - US Executive Order 13556, Controlled Unclassified Information, at http://www.whitehouse.gov/the-pressoffice/2010/11/04/executive-order-controlled-unclassifiedinformation
      ! Example: India - National Data Sharing and Accessibility Policy (NDSAP) at http://dst.gov.in/NDSAP.pdf
    ! Come out with specific application licensing guidelines
    ! Implement them!
  ! Academia must
    ! Lead research in this area
    ! Make their own data available in linked open form (LOD)
  ! Industry and standardization bodies should help
    ! by documenting best practices
    ! building necessary tools
    ! using open standards, and
    ! reporting case studies.
Thank You
Thank You (English), Merci (French), Grazie (Italian), Gracias (Spanish), Obrigado (Portuguese), Danke (German), Multumesc (Romanian), Teşekkür ederim (Turkish)
Dr. Biplav Srivastava, [email protected], http://www.research.ibm.com/people/b/biplav/