Post on 26-Jan-2015
Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity
using semantics and Semantic Web
Keynote at SEBD2013, July 1, 2013, Italy & invited talks in Spain & Italy, June 2013.
The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State, USA
Pavan Kapanipathi
Pramod Anantharam
Amit Sheth
Cory Henson
Dr. T.K. Prasad
Maryam Panahiazar
Contributions by many, but Special Thanks to:
Hemant Purohit
2
2011
How much data?
48(2013)
500(2013)
http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/
3
1% of the data is used for analysis.
http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode
http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
4
Variety
Structured
Unstructured
Semi structured
Audio
Video
Images
5
Velocity
Fast Data
Rapid Changes
Real-Time/Stream Analysis
Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail
6
• Focus on verticals: advertising, social media, retail, financial services, telecom, and healthcare
– Aggregate data, focused on transactions, limited integration (limited complexity), analytics to find (simple) patterns
– Emphasis on technologies to handle volume/scale, and to a lesser extent velocity: Hadoop, NoSQL, MPP warehouse, …
– Full faith in the power of data (no hypothesis), bottom up analysis
Current Focus on Big Data
7
• What if your data volume gets so large and varied you don't know how to deal with it?
• Do you store all your data?
• Do you analyze it all?
• How can you find out which data points are really important?
• How can you use it to your best competitive advantage?
Questions typically asked on Big Data
http://www.sas.com/big-data/
8
http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/
Variety of Data Analytics Enablers
9
• Prediction of the spread of flu in real time during H1N1 2009
– Google tested a mammoth 450 million different mathematical models built on search terms, comparing their predictions against the actual flu cases; 45 important parameters were found
– The model was tested when the H1N1 crisis struck in 2009 and gave more meaningful and valuable real-time information than any official public health system [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]
• FareCast: predict the direction of air fares over different routes [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]
• NY city manholes problem [ICML Discussion, 2012]
• Netflix and Amazon’s recommendations, which of the sales people are more effective, …
Illustrative Big Data Applications
10
• Reaction and not innovation
• Correlations are not causations
– “we accept the likelihood of what will happen without knowing why it will happen” [1]
• Right answers are good but often are just not sufficient to make a person take an action (see [2])
• Big data cannot replace human judgment [3]: beyond what a data-based prediction says, people also use many subjective, social and soft cues when taking an action (past experience, both personal and from their social circle; personal preferences, e.g., willingness to be different, longevity vs. quality of life, …)
• Sometimes, better algorithms beat more data [4]
What is missing with current “Big Data” focused solutions?
[1] http://itsminfo.com/a-critique-of-pure-data-part-2/
[2] http://www.bbc.com/capital/story/20130612-will-big-data-drown-out-genius/
[3] http://www.matchcite.com/blog/blog/2012/july/big-data-cannot-replace-human-judgment.aspx
[4] http://allthingsd.com/20121128/better-algorithms-beat-more-data-and-heres-why/
11
• Current focus is mainly on serving business intelligence and targeted analytics needs, not complex individual and collective human needs (e.g., empowering humans in health, fitness and well-being; better disaster coordination) that are highly personalized/individualized/contextualized
– Incorporate real-world complexity: multi-modal and multi-sensory nature of the real world and human perception
– Need deeper understanding of data and its role in information (e.g., skew, coverage)
• Human involvement and guidance (directly or indirectly): leading to actionable information, understanding and insight right in the context of human activities
– Bottom-up & top-down processing: infusion of models and background knowledge (data + knowledge + reasoning)
What is missing?
12
Contextual
Information Smart Data
Makes Sense
Actionable or help decision support/making
13
Smart Data
Smart data makes sense out of Big data.
It provides value by harnessing the challenges posed by the volume, velocity, variety and veracity of big data, in turn providing actionable information and improving decision making.
Smart data relies on more than big data: it uses background knowledge, experiences, and advanced and contextualized reasoning, and is often highly personalized.
14
“OF human, BY human and FOR human”
Smart data is focused on the actionable value achieved by human involvement in the data creation, processing and consumption phases for improving the human experience.
Smart data is about improving human experience.
Another perspective on Smart Data
15
Descriptive
Exploratory
Inferential
Predictive
Causal
Improved Analytics CREATION
PROCESSING
EXPERIENCE & DECISION MAKING
Human Centric Computing
16
“OF human, BY human and FOR human”
Let’s discuss this perspective on Smart Data
17
Petabytes of Physical(sensory)-Cyber-Social Data every day!
More on PCS Computing: http://wiki.knoesis.org/index.php/PCS
‘OF human’ : Relevant Real-time Data Streams for Human Experience
18
“OF human, BY human and FOR human”
a perspective on Smart Data
Use of Prior Human-created Knowledge Models
19
‘BY human’: Involving Crowd Intelligence in data processing workflows
Crowdsourcing and Domain-expert guided Machine Learning Modeling
20
“OF human, BY human and FOR human”
a perspective on Smart Data
21
Detection of events, such as wheezing sound, indoor temperature, humidity,
dust, and CO2 level
Weather Application
Asthma Healthcare Application
Close the window at home during the day to avoid an influx of CO2 and so avoid asthma attacks at night
‘FOR human’ : Improving Human Experience
Population Level
Personal
Public Health
Action in the Physical World
22
Why do we care about Smart Data rather than Big Data?
24
Second-costliest hurricane in United States history; estimated damage $75 billion
90-115 mph winds
State of Emergency in New York
285 people killed on the track of Sandy
750,000 without power (NY)
Immense devastation and Human suffering
Big Data to Smart Data: Disaster Management example
http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html
25
20 million tweets with “sandy”, “hurricane” keywords between Oct 27th and Nov 1st
2nd most popular topic on Facebook during 2012
Social (Big) Data during Hurricane Sandy
• http://www.guardian.co.uk/news/datablog/2012/oct/31/twitter-sandy-flooding
• http://www.huffingtonpost.com/2012/11/02/twitter-hurricane-sandy_n_2066281.html
• http://mashable.com/2012/10/31/hurricane-sandy-facebook/
26
For information seeking
For timely information
For unique information
For unfiltered information
To determine disaster magnitude
To check in with family and friends
To self-mobilize
To maintain a sense of community
To seek emotional support and healing
Governments
Emergency management organizations
Journalists
Disaster responders
Public
BIG DATA TO SMART DATA: WHY? and FOR WHOM?
Fraustino et al. Social Media Use during Disasters: A Review of the Knowledge Base and Gaps. US Dept. of Homeland Security, START 2012.
27
Improving situational awareness - Timely delivery of necessary information to the right people
Improving coordination between resource seekers and suppliers
Detecting the magnitude of a disaster from people’s sentiments.
Many more challenges…
Can SNSs make Disaster Management easier by giving Actionable Information (Smart Data)?
http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html
28
Volume
Twitter hits half a billion tweets a day!
Challenges
Delivering the necessary/actionable information to the right people
http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US
29
Velocity
Volume
@ConEdison Twitter handle that the company had only set up in June gained an extra 16,000 followers over the storm. – Did the information reach everyone?
Challenges
Delivering the necessary/actionable information to the right people
Rate of Data Arrival
Approximately 7000 TPS
10 images per second on Instagram
http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf
30
http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US
Velocity
Variety
Volume
Semi Structured
Structured
Unstructured
Sensors
Linked Open Data
Wikipedia
Challenges
Delivering the necessary/actionable information to the right people
31
Velocity
Variety
Veracity
Volume
Challenges
Delivering the necessary/actionable information to the right people
http://www.buzzfeed.com/jackstuef/the-man-behind-comfortablysmug-hurricane-sandys
32
Velocity
Variety
Veracity
Volume
33
Value
- Makes Sense
- Actionable Information
- Decision support/making
Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/
Smart Data focuses on the
value
34
Value
- Makes Sense
- Actionable Information
- Decision support/making
Disaster Management
Victims
Timely and Contextual Information about
• Electricity, Food, Water, Shelter and donation offers related to the disaster.
Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/
35
Descriptive
Exploratory
Inferential
Predictive
Causal
Human Centric Computing
Improved Analytics Creation
Processing
Experience
Revisiting..
36
• Healthcare
– kHealth
– SemHealth
• Social event coordination
– Twitris
• Traffic monitoring
– kTraffic
Applications of Smart Data Analytics
37
The Patient of the Future
MIT Technology Review, 2012
http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/
38
To gain new insight in
patient care &
early indications of
disease
Smart Data in Healthcare
39
Sensing is a key enabler of the Internet of Things
BUT, how do we make sense of the resulting avalanche of sensor data?
50 Billion Things by 2020 (Cisco)
40
Parkinson’s disease (PD) data from The Michael J. Fox Foundation for Parkinson’s Research.
1https://www.kaggle.com/c/predicting-parkinson-s-disease-progression-with-smartphone-data
8 weeks of data from 5 sensors on a smart phone, collected for 16 patients, resulting in ~12 GB (with a lot of missing data).
Variety, Volume, Veracity, Velocity
Value:
Can we detect the onset of Parkinson’s disease?
Can we characterize the disease progression?
Can we provide actionable information to the patient?
Semantics: Representing prior knowledge of PD led to a focused exploration of this massive dataset
WHY Big Data to Smart Data: Healthcare example
41
Big Data to Smart Data Using a Knowledge Based Approach
ParkinsonMild(person) = Tremor(person) ∧ PoorBalance(person)
ParkinsonModerate(person) = MoveSlow(person) ∧ PoorSleep(person) ∧ MonotoneSpeech(person)
ParkinsonAdvanced(person) = Fall(person)
Control Group:
Movements of an active person have a good distribution over the X, Y, and Z axes
Audio is well modulated, with good variations in the energy of the voice
PD Patients:
Restricted movements by a PD patient can be seen in the acceleration readings
Audio is not well modulated, representing monotone speech
Declarative Knowledge of Parkinson’s Disease used to focus
our attention on symptom manifestations in sensor
observations
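The declarative rules above can be read as a small rule base over symptom flags. The following is a minimal Python sketch of that reading, not the authors' implementation: the `Symptoms` class and `parkinson_stage` function are hypothetical names, the flags are assumed to have already been derived from sensor observations, and checking the most severe rule first is an assumption the slide does not state.

```python
from dataclasses import dataclass


@dataclass
class Symptoms:
    """Symptom flags assumed to be extracted upstream from sensor data."""
    tremor: bool = False
    poor_balance: bool = False
    move_slow: bool = False
    poor_sleep: bool = False
    monotone_speech: bool = False
    fall: bool = False


def parkinson_stage(s: Symptoms):
    """Apply the three declarative rules; returns None when no rule fires.

    Assumption: the most severe matching rule wins.
    """
    if s.fall:  # ParkinsonAdvanced = Fall
        return "advanced"
    if s.move_slow and s.poor_sleep and s.monotone_speech:  # ParkinsonModerate
        return "moderate"
    if s.tremor and s.poor_balance:  # ParkinsonMild
        return "mild"
    return None


if __name__ == "__main__":
    print(parkinson_stage(Symptoms(tremor=True, poor_balance=True)))
```

Keeping the rules declarative makes it easy to see which sensor-derived symptoms matter, which is the sense in which prior knowledge focuses the exploration of the dataset.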
42
• 25 million people in the U.S. are diagnosed with asthma (7 million are children) [1].
• 300 million people suffer from asthma worldwide [2].
• Asthma-related healthcare costs alone are around $50 billion a year [2].
• 155,000 hospital admissions and 593,000 emergency department visits in 2006 [3].
[1] http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/
[2] http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html
[3] Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics, 123(Supplement 3), S131-S145.
Asthma: Severity of the problem
43
Asthma is a multifactorial disease with health signals spanning personal, public health, and population levels.
Real-time health signals from personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arriving continuously in fine grained samples potentially with missing information and uneven sampling frequencies.
Variety, Volume, Veracity, Velocity
Value:
Can we detect the asthma severity level?
Can we characterize asthma control level?
What risk factors influence asthma control?
What is the contribution of each risk factor?
Semantics: Understanding relationships between health signals and asthma attacks for providing actionable information
WHY Big Data to Smart Data: Healthcare example
44
Population Level
Personal
Public Health
Variety: Health signals span heterogeneous sources
Volume: Health signals are fine grained
Velocity: Real-time change in situations
Veracity: Reliability of health signals may be compromised
Value: Can I reduce my asthma attacks at night?
Decision support to doctors by providing them with deeper insights into patient asthma care
Asthma: Demonstration of Value
45
Sensordrone – for monitoring environmental air quality
Wheezometer – for monitoring wheezing sounds
Can I reduce my asthma attacks at night?
What are the triggers?
What is the wheezing level?
What is the propensity toward asthma?
What is the exposure level over a day?
What is the air quality indoors?
Commute to Work
Personal
Public Health
Population Level
Closing the window at home in the morning and taking an alternate route to the office may lead to reduced asthma attacks
Actionable Information
Asthma: Actionable Information for Asthma Patients
46
Personal, Public Health, and Population Level Signals for Monitoring Asthma
ICS= inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA= inhaled short-acting beta2-agonist ; *consider referral to specialist
Asthma Control and Actionable Information
Sensors and their observations for understanding asthma
47
Personal Level Signals
Societal Level Signals
(Personal Level Signals)
(Personalized Societal Level Signal)
(Societal Level Signals)
Societal Level Signals Relevant to the Personal Level
Personal Level Sensors
(kHealth**) (EventShop*)
Qualify
Quantify
Action Recommendation
What are the features influencing my asthma?
What is the contribution of each of these features?
How controlled is my asthma? (risk score)
What will be my action plan to manage asthma?
Storage
Societal Level Sensors
Asthma Early Warning Model (AEWM)
Query AEWM
Verify & augment domain knowledge
Recommended Action
Action Justification
Asthma Early Warning Model
*http://www.slideshare.net/jain49/eventshop-120721, ** http://www.youtube.com/watch?v=btnRi64hJp4
48
Population Level
Personal
Wheeze – Yes
Do you have tightness of chest? – Yes
Observations Physical-Cyber-Social System Health Signal Extraction Health Signal Understanding
<Wheezing=Yes, time, location>
<ChestTightness=Yes, time, location>
<PollenLevel=Medium, time, location>
<Pollution=Yes, time, location>
<Activity=High, time, location>
Wheezing
ChestTightness
PollenLevel
Pollution
Activity
Wheezing
ChestTightness
PollenLevel
Pollution
Activity
RiskCategory
<PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
.
.
.
Expert Knowledge
Background Knowledge
tweet reporting pollution level and asthma attacks
Acceleration readings from on-phone sensors
Sensor and personal observations
Signals from personal, personal spaces, and community spaces
Risk Category assigned by doctors
Qualify
Quantify
Enrich
Outdoor pollen and pollution
Public Health
Health Signal Extraction to Understanding
Well Controlled – continue
Not Well Controlled – contact nurse
Poorly Controlled – contact doctor
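The path from extracted health signals to a recommended action can be sketched in a few lines of Python. This is an illustrative sketch only: the numeric encodings (Low/Medium/High as 1/2/3, No/Yes as 0/1) are assumptions chosen to be consistent with the example vector on this slide, and `to_feature_vector` and `recommended_action` are hypothetical helper names. The risk category itself is assigned by doctors in the original workflow, so it is taken here as an input.

```python
# Assumed encodings, chosen to match the slide's example vector <2, 1, 1, 3, 1>.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}
YES_NO = {"No": 0, "Yes": 1}

# Feature order as listed on the slide.
FEATURE_ORDER = ["PollenLevel", "ChestTightness", "Pollution", "Activity", "Wheezing"]
ENCODERS = {
    "PollenLevel": LEVELS,
    "ChestTightness": YES_NO,
    "Pollution": YES_NO,
    "Activity": LEVELS,
    "Wheezing": YES_NO,
}

# Risk category to action mapping, as stated on the slide.
ACTIONS = {
    "Well Controlled": "continue",
    "Not Well Controlled": "contact nurse",
    "Poorly Controlled": "contact doctor",
}


def to_feature_vector(observations):
    """Encode qualitative health signals as a numeric feature vector."""
    return [ENCODERS[name][observations[name]] for name in FEATURE_ORDER]


def recommended_action(risk_category):
    """Look up the action for a doctor-assigned risk category."""
    return ACTIONS[risk_category]


if __name__ == "__main__":
    signals = {
        "Wheezing": "Yes",
        "ChestTightness": "Yes",
        "PollenLevel": "Medium",
        "Pollution": "Yes",
        "Activity": "High",
    }
    print(to_feature_vector(signals), recommended_action("Not Well Controlled"))
```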
49
… and do it efficiently and at scale
What if we could automate this sense making ability?
50
People are good at making sense of sensory input
What can we learn from cognitive models of perception?
• The key ingredient is prior knowledge
51
* based on Neisser’s cognitive model of perception
ObserveProperty
PerceiveFeature
Explanation
Discrimination
1
2
Perception Cycle*
Translating low-level signals into high-level knowledge
Focusing attention on those aspects of the environment that provide useful information
Prior Knowledge
52
To enable machine perception,
Semantic Web technology is used to integrate sensor data with prior knowledge on the Web
53
Prior knowledge on the Web
W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph
54
Prior knowledge on the Web
W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph
55
ObserveProperty
PerceiveFeature
Explanation1
Translating low-level signals into high-level knowledge
Explanation
Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
56
Explanation
Inference to the best explanation
• In general, explanation is an abductive problem, and hard to compute
Finding the sweet spot between abduction and OWL
• Single-feature assumption* enables use of OWL-DL deductive reasoner
* An explanation must be a single feature which accounts for all observed properties
Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
57
Explanation
Explanatory Feature: a feature that explains the set of observed properties
ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Observed Property Explanatory Feature
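Under the single-feature assumption, explanation reduces to finding every feature whose known properties include all observed properties. Below is a minimal Python sketch using plain sets instead of an OWL-DL reasoner. The feature-to-property edges in `KNOWLEDGE_BASE` are assumed for illustration (the slide's bipartite graph is not reproduced in this transcript), and `explanatory_features` is a hypothetical helper name.

```python
# Hypothetical bipartite knowledge base: feature -> properties it can account for.
# The edges below are assumptions for illustration only.
KNOWLEDGE_BASE = {
    "Hypertension": {"elevated blood pressure", "palpitations"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "clammy skin"},
}


def explanatory_features(observed, kb=KNOWLEDGE_BASE):
    """Return every single feature that accounts for all observed properties."""
    observed = set(observed)
    return {feature for feature, props in kb.items() if observed <= props}


if __name__ == "__main__":
    print(sorted(explanatory_features({"elevated blood pressure", "clammy skin"})))
```

With these assumed edges, observing only elevated blood pressure leaves all three features as candidate explanations, which is what motivates the discrimination step.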
58
Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features
ObserveProperty
PerceiveFeature
Explanation
Discrimination2
Focusing attention on those aspects of the environment that provide useful information
Discrimination
59
Discrimination
Expected Property: would be explained by every explanatory feature
ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Expected Property Explanatory Feature
60
Discrimination
Not Applicable Property: would not be explained by any explanatory feature
NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Not Applicable Property Explanatory Feature
61
Discrimination
Discriminating Property: is neither expected nor not-applicable
DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Discriminating Property Explanatory Feature
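The three property classes (expected, not applicable, discriminating) can be computed with set operations once the current explanatory features are known. This is a minimal Python sketch, not the OWL-based formulation itself: the edges in `KNOWLEDGE_BASE` are assumed for illustration, and `classify_properties` is a hypothetical helper name.

```python
# Hypothetical bipartite knowledge base: feature -> properties (edges assumed).
KNOWLEDGE_BASE = {
    "Hypertension": {"elevated blood pressure", "palpitations"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "clammy skin"},
}


def classify_properties(explanatory, kb=KNOWLEDGE_BASE):
    """Split all known properties into expected, not-applicable and discriminating.

    expected: explained by every explanatory feature
    not applicable: explained by no explanatory feature
    discriminating: everything else, i.e. worth observing next
    """
    all_properties = set().union(*kb.values())
    expected = {p for p in all_properties if all(p in kb[f] for f in explanatory)}
    not_applicable = {p for p in all_properties if not any(p in kb[f] for f in explanatory)}
    discriminating = all_properties - expected - not_applicable
    return expected, not_applicable, discriminating


if __name__ == "__main__":
    print(classify_properties({"Hyperthyroidism", "Pulmonary Edema"}))
```

Observing a discriminating property is the only step that can shrink the set of explanatory features, so attention is focused there.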
62
Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information
canary in a coal mine
Our Motivation
kHealth: knowledge-enabled healthcare
63
Qualities
- High BP
- Increased Weight
Entities
- Hypertension
- Hypothyroidism
kHealth
Machine Sensors
Personal Input
EMR/PHR
Comorbidity risk score e.g., Charlson Index
Longitudinal studies of cardiovascular risks
- Find correlations
- Validation
  - domain knowledge
  - domain expert
Parameterize the model
Risk Assessment Model
Current Observations
- Physical
- Physiological
- History
Risk Score (Actionable Information)
Model Creation
Validate correlations
Historical observations of each patient
Risk Score: from Data to Abstraction and Actionable Information
64
How do we implement machine perception efficiently on a resource-constrained device?
Use of OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time
• Runs out of resources with prior knowledge >> 15 nodes
• Asymptotic complexity: O(n^3)
65
intelligence at the edge
Approach 1: Send all sensor observations to the cloud for processing
Approach 2: downscale semantic processing so that each device is capable of machine perception
Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices', ISWC 2012.
66
Efficient execution of machine perception
Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning
0101100011010011110010101100011011011010110001101001111001010110001101011000110100111
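To illustrate the idea of replacing reasoner calls with bit operations, here is a minimal Python sketch that encodes each property as a bit mask over features and computes explanations with bitwise AND. It is a sketch of the general technique only, not the encoding from Henson et al.: the knowledge-base edges, the feature ordering and the helper names (`build_masks`, `explain`, `decode`) are all assumptions for illustration.

```python
# Hypothetical knowledge base: feature -> properties (edges assumed).
KNOWLEDGE_BASE = {
    "Hypertension": {"elevated blood pressure", "palpitations"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "clammy skin"},
}
FEATURES = ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]


def build_masks(kb, features):
    """For each property, set bit i when feature i can account for it."""
    index = {feature: i for i, feature in enumerate(features)}
    masks = {}
    for feature, props in kb.items():
        for prop in props:
            masks[prop] = masks.get(prop, 0) | (1 << index[feature])
    return masks


def explain(observed, masks, n_features):
    """AND the masks of all observed properties; surviving bits are explanations."""
    result = (1 << n_features) - 1  # start with every feature as a candidate
    for prop in observed:
        result &= masks.get(prop, 0)  # unknown property: no feature explains it
    return result


def decode(bits, features):
    """Translate a bit vector back into feature names."""
    return [f for i, f in enumerate(features) if (bits >> i) & 1]


if __name__ == "__main__":
    masks = build_masks(KNOWLEDGE_BASE, FEATURES)
    bits = explain({"elevated blood pressure", "clammy skin"}, masks, len(FEATURES))
    print(format(bits, "03b"), decode(bits, FEATURES))
```

Each observed property costs one AND over a fixed-width vector, which is the intuition behind the linear growth reported in the evaluation.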
67
O(n^3) < x < O(n^4) → O(n)
Efficiency Improvement
• Problem size increased from 10’s to 1000’s of nodes
• Time reduced from minutes to milliseconds
• Complexity growth reduced from polynomial to linear
Evaluation on a mobile device
68
Semantic Perception for smarter analytics: 3 ideas to take away
1 Translate low-level data to high-level knowledge
Machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making
2 Prior knowledge is the key to perception
Using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web
3 Intelligence at the edge
By downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices
69
• Real Time Feature Streams: http://www.youtube.com/watch?v=_ews4w_eCpg
• kHealth: http://www.youtube.com/watch?v=btnRi64hJp4
Demos
74
Smart Data in Social Media Analytics
To Understand the human social dynamics in real world events
75
0.5B Tweets per day
0.5B Users
60% on Mobile
5530 Tweets per second related to the Japan earthquake and tsunami
17000 Tweets per second
Twitter During Real-world Events of Interest
http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/
http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitter
http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/
76
Social Semantic
Web Application
Real time
Multi Faceted
Analysis
Insights of Important Events including disaster response
coordination
http://usatoday30.usatoday.com/news/politics/twitter-election-meter
http://twitris.knoesis.org/
77
State of the Art – Uni/Bi Dimensional Analysis During Elections
Topics
Sentiments
78
Twitris’ Dimensions of Integrated Semantic Analysis
Sheth et al. Twitris- a System for Collective Social Intelligence, ESNAM-2013
79
http://semanticweb.com/picking-the-president-twindex-twitris-track-social-media-electorate_b31249
http://semanticweb.com/election-2012-the-semantic-recap_b33278
80
[The screenshots of Twitris+ were taken on Nov. 6th 6 PM EST]
81
Twitris: Sentiment Analysis- Smart Answers with reasoning!
How was Obama doing in the first debate?
82
Red Color: Negative Topics
Green Color: Positive Topics
Twitris: Sentiment Analysis- Smart Answers with reasoning!
How was Obama doing in the second debate?
SMART DATA IS ABOUT ANALYSIS FOR REASONING (what caused the positive sentiment for Democrats) BEHIND THE REAL-WORLD ACTIONS (Democrats’ win)
http://knoesis.wright.edu/library/resource.php?id=1787
83
Top 100 influential users who talk about Barack Obama
Positive or Negative Influence
Twitris: Network Analysis
SMART DATA TELLS YOU HOW A SYSTEM CAN BE TWEAKED FOR THE DESIRED ACTIONS!
Could we engage with users (targeted) with extreme polarity leaning for Obama to spark an agenda in the whole
network of voters (ACTION)?
84
Twitris: Community Evolution
SMART DATA FOCUSES ON THE CAUSALITY OF CHANGES IN REAL-WORLD ACTIONS!
Romney
Obama
Evolution of influencer interaction networks for Romney vs. Obama topical communities, during U.S. Presidential Election 2012 debates
Before 1st debate
After 1st debate
After Hurricane Sandy
After 3rd debate
85
The Dead People mentioned in the event OWC
Twitris: Impact of Background Knowledge
86
How People from Different parts of the world talked
about US Election
Images and Videos Related to US Election
Twitris: Analysis by Location
87
What is Smart Data in the context of Disaster Management
ACTIONABLE: Timely delivery of right resources and information to the right people at right location!
Because everyone wants to Help, but DOESN’T KNOW HOW!
Join us for the Social Good!
http://twitris.knoesis.org
RT @OpOKRelief: Southgate Baptist Church
on 4th Street in Moore has food, water, clothes, diapers, toys, and more. If you can't go,call 794
Text "FOOD" to 32333, REDCROSS to 90999, or STORM to 80888 to donate $10
in storm relief. #moore #oklahoma
#disasterrelief #donate
Want to help animals in #Oklahoma? @ASPCA tells
how you can help: http://t.co/mt8l9PwzmO
CITIZEN SENSORS
RESPONSE TEAMS (including humanitarian
org. and ‘pseudo’ responders)
VICTIM SITE
Coordination of needs and offers
Using Social Media
Does anyone
know where to send a check to donate to the
tornado victims?
Where do I go to help out for
volunteer work around Moore? Anyone know?
Anyone know where to donate
to help the animals from the
Oklahoma disaster?
#oklahoma #dogs
Matched
Matched
Matched
Serving the need!
If you would like to volunteer today, help is desperately
needed in Shawnee. Call 273-5331 for more info
http://www.slideshare.net/hemant_knoesis/cscw-2012-hemantpurohit-11531612
88
Purohit et al. Framework to Analyze Coordination in Crisis Response, 2012.
Int’l collaboration in progress: with QCRI
89
Smart Data from Twitris system for Disaster Response Coordination
Which are the primary locations with most negative sentiments/emotions?
Who are all the people to engage with for better information diffusion?
Which are the most important organizations acting at my location?
Smart data provides actionable information and improves decision making through semantic analysis of Big Data.
Who are the resource seekers and suppliers? How can one donate?
90
Source: Purohit et al. 2013, Information Filtering and Management Model for Disaster Response Coordination
Disaster Response Coordination Framework
91
Disaster Response Coordination: Twitris Summary for Actionable Nuggets
Important tags to summarize Big Data flow
Related to Oklahoma tornado
Images and Videos Related to Oklahoma tornado
92
Disaster Response Coordination: Twitris Real-time information for needs
Incoming Tweets with need types to give quick idea of what is needed and where
currently #OKC
Legends for Different needs #OKC
(It is real-time widget for monitoring of needs, so will not be active after the event has passed) http://twitris.knoesis.org/oklahomatornado
93
Disaster Response Coordination: Influencers to engage with for specific needs
Influential users for the respective needs, with their interaction network on the right.
94
Really sparse Signal to Noise:
• 2M tweets during the first week after #Oklahoma-tornado-2013
– 1.3% as the highly precise donation requests to help
– 0.02% as the highly precise donation offers to help
• Anyone know how to get involved to help the tornado victims in Oklahoma?? #tornado #oklahomacity (OFFER)
• I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER)
Disaster Response Coordination: Finding Actionable Nuggets for Responders to act
• Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST)
• Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST)
For responders, most important information is the scarcity and availability of resources, can we mine it via Social Media?
95
• Features driven by the experience of domain experts at the responder organizations
• Examples,
– ‘I want to <donate/ help/ bring>’ for extraction of offering intention
– ‘tent house’ OR ‘cots’ for shelter need types
Disaster Response Coordination: Human Knowledge to drive information extraction
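The two expert-driven examples on this slide map naturally onto regular expressions. The following Python sketch shows one way to express them; it is an illustration under assumptions, not the Twitris implementation. The tag names `OFFER` and `SHELTER_NEED` and the helper `tag_tweet` are hypothetical.

```python
import re

# Pattern from the slide: 'I want to <donate/ help/ bring>' signals an offer.
OFFER_PATTERN = re.compile(r"\bI want to (donate|help|bring)\b", re.IGNORECASE)

# Pattern from the slide: 'tent house' OR 'cots' signals a shelter need.
SHELTER_PATTERN = re.compile(r"\b(tent house|cots)\b", re.IGNORECASE)


def tag_tweet(text):
    """Return the (hypothetical) tags whose expert pattern matches the tweet."""
    tags = []
    if OFFER_PATTERN.search(text):
        tags.append("OFFER")
    if SHELTER_PATTERN.search(text):
        tags.append("SHELTER_NEED")
    return tags


if __name__ == "__main__":
    print(tag_tweet("I want to donate to the Oklahoma cause shoes clothes even food if I can"))
```

Hand-written patterns like these trade recall for precision, which fits a setting where only a small fraction of tweets carry actionable requests or offers.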
96
• A knowledge-driven approach
– A rich inventory of metadata for tweets
– Semantic matching for needs (query) vs. offers (documents)
• Example,
– @bladesofmilford please help get the word out,we are accepting kid clothes to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz (REQUEST)
– I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER)
Disaster Response Coordination: Automatic Matching of needs and offers
Matching the competitive intentions (Needs and Offers) can relieve humans of the task of resource matchmaking for coordination.
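As a rough illustration of matching needs (queries) against offers (documents), the Python sketch below pairs a request with an offer whenever they mention the same resource type. It uses simple keyword lookup in place of the semantic matching described on the slide; the vocabulary in `RESOURCE_TERMS` and the helper names are assumptions for illustration.

```python
import re

# Hypothetical resource vocabulary; a real system would use a richer knowledge model.
RESOURCE_TERMS = {
    "clothing": {"clothes", "shoes"},
    "food": {"food", "water"},
    "shelter": {"cots", "tent", "shelter"},
}


def resource_types(text):
    """Return the resource types whose terms appear in the tweet text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {rtype for rtype, terms in RESOURCE_TERMS.items() if tokens & terms}


def match_needs_to_offers(requests, offers):
    """Pair each request with every offer that shares at least one resource type."""
    matches = []
    for request in requests:
        needed = resource_types(request)
        for offer in offers:
            shared = needed & resource_types(offer)
            if shared:
                matches.append((request, offer, sorted(shared)))
    return matches


if __name__ == "__main__":
    requests = ["@bladesofmilford please help get the word out,we are accepting kid "
                "clothes to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz"]
    offers = ["I want to donate to the Oklahoma cause shoes clothes even food if I can"]
    for request, offer, shared in match_needs_to_offers(requests, offers):
        print(shared)
```

Here the slide's example request and offer are matched on clothing, while the offer's extra mention of food goes unmatched because no request asks for it.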
97
Disaster Response Coordination: Engagement Interface for responders
What-Where-How-Who-Why Coordination
Influential users to engage with and resources for
seekers/supplies at a location, at a timestamp
Contextual Information for a
chosen topical tags
98
• Illustrative scenario: #Oklahoma-tornado 2013
Disaster Response Coordination: Anecdote for the value of Smart Data
FEMA asked us to quickly filter out gas-leak related data
Mining the data for smart nuggets to inform FEMA (Timely needs)
Engaged with the author of this information to confirm (Veracity)
e.g., All gas leaks in #moore were capped and stopped by 11:30 last night (at 5/22/2013 1:41:37)
A lot of tweets on ‘how to/where to’ assist (‘pseudo’ responders)
e.g., I want to go to Oklahoma this weekend & do what i can to help those people with food,cloths & supplies,im in the feel of wanting to help ! :)
99
An event is a dynamic topic that evolves and might later fork into several distinct events.
Smart Data analytics to capture rapidly evolving social data events
Social Media is the pulse of the populace, a true reflection of
events all over the globe!
100
Continuous Semantics
101
Heliopolis is a suburb of
Cairo.
Dynamic Model Creation
Continuous Semantics
102
Events
“Both Ahmadinejad & Mousavi declare victory in Iranian Elections.”
“situation in tehran University is so worrisome. police have attacked to girls dormitory #tehran #iranelection”
“Reports from Azadi Square - 4 people killed by police, people killed police who shot. More shots being fired #iranelections”
June 12 2009, June 13 2009, June 15 2009
Key phrases
Models
Ahmadinejad & Mousavi are politicians in Iran
Tehran University is a University in Iran
Azadi Square is a city square in Tehran
Dynamic Model Creation:
Example of how background knowledge helps in understanding the situation described in the tweets, while the knowledge model is also updated
103
How is Continuous Semantics a form of Smart Data Analytics?
Keeping the Background Knowledge abreast of the changes in the event
Smartly learning and adapting data acquisition (temporally apt Big Data, i.e. Fast Data)
In turn, providing temporally relevant Smart Data through analysis
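The loop described above (update the background knowledge, then use it to adapt data acquisition) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `EventModel` class, the substring-based relevance test, and the tiny knowledge base are hypothetical and are not the system presented in the talk. The example posts are shortened versions of the tweets on the Dynamic Model Creation slide.

```python
from dataclasses import dataclass, field


@dataclass
class EventModel:
    """Toy background-knowledge model for one evolving event (hypothetical)."""
    facts: set = field(default_factory=set)      # (subject, relation, object) triples
    keywords: set = field(default_factory=set)   # terms that drive data acquisition

    def add_fact(self, subject, relation, obj):
        """Record a fact and start tracking both entities as keywords."""
        self.facts.add((subject, relation, obj))
        self.keywords.update({subject.lower(), obj.lower()})

    def is_relevant(self, text):
        """A post counts as relevant if it mentions any tracked keyword."""
        lowered = text.lower()
        return any(k in lowered for k in self.keywords)


def update_model(model, posts, knowledge_base):
    """One iteration: filter posts with the current model, then extend the
    model with background facts about entities the relevant posts mention."""
    relevant = [p for p in posts if model.is_relevant(p)]
    for post in relevant:
        lowered = post.lower()
        for subject, relation, obj in knowledge_base:
            if subject.lower() in lowered:
                model.add_fact(subject, relation, obj)
    return relevant


# Hypothetical background knowledge, phrased like the facts on the slide.
knowledge_base = [
    ("Tehran University", "is a university in", "Iran"),
    ("Azadi Square", "is a city square in", "Tehran"),
]

model = EventModel(keywords={"iranelection"})   # assumed seed keyword
later_post = "police have attacked to girls dormitory #tehran"
relevant_before = model.is_relevant(later_post)  # seed keyword alone misses it

day_one = ["Reports from Azadi Square - more shots being fired #iranelections"]
update_model(model, day_one, knowledge_base)
relevant_after = model.is_relevant(later_post)   # "tehran" is now tracked
```

After the first batch, the model has learned that Azadi Square is in Tehran, so a later post tagged only with #tehran is picked up even though it lacks the seed keyword. A real system would need entity linking rather than substring matching.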
104
Smart Data Analytics in Traffic Management
To improve everyday life, which is so often entangled in the common problem of being stuck in traffic
105
By 2001 over 285 million Indians lived in cities, more than in all North American cities combined (Office of the Registrar General of India 2001) [1]
[1] The Crisis of Public Transport in India
[2] IBM Smarter Traffic
Modes of transportation in Indian Cities
Texas Transportation Institute (TTI) Congestion report in U.S.
Severity of the Traffic Problem
106
Vehicular traffic data from San Francisco Bay Area aggregated from on-road sensors (numerical) and incident reports (textual)
http://511.org/
Per-minute updates of speed, volume, travel time, and occupancy, resulting in 178 million link status observations, 738 active events, and 146 scheduled events, with many unevenly sampled observations collected over 3 months.
Variety, Volume, Veracity, Velocity
Value: Can we detect the onset of traffic congestion? Can we characterize traffic congestion based on events? Can we provide actionable information to decision makers?
Semantics: Representing prior knowledge of traffic leads to a focused exploration of this massive dataset
Big Data to Smart Data: Traffic Management example
107
[Figure: 511.org traffic monitoring sources: slow-moving traffic reports, link descriptions, scheduled events, and schedule information]
Heterogeneity in a Physical-Cyber-Social System
108
• Observation: Slow Moving Traffic
• Multiple Causes (Uncertain about the cause):
– Scheduled Events: music events, fair, theatre events, concerts, road work, repairs, etc.
– Active Events: accidents, disabled vehicles, break down of roads/bridges, fire, bad weather, etc.
– Peak hour: e.g. 7 am – 9 am OR 4 pm – 6 pm
• Each of these events may have a varying impact on traffic.
• A delay prediction algorithm should process multimodal and multi-sensory observations.
Uncertainty in a Physical-Cyber-Social System
110
• Internal observations
– Speed, volume, and travel time observations
– Correlations may exist between these variables across different parts of the network
• External events
– Accident, music event, sporting event, and planned events
– External events and internal observations may exhibit correlations
Modeling Traffic Events
111
Accident
Music event
Sporting event
Road Work
Theatre event
External events <ActiveEvents, ScheduledEvents>
Internal observations <speed, volume, travelTime>
Weather
Time of Day
Modeling Traffic Events
113
Domain Experts
cold
PoorVisibility
SlowTraffic
IcyRoad
Declarative domain knowledge
Causal knowledge
Linked Open Data
Cold (YES/NO) IcyRoad (ON/OFF) PoorVisibility (YES/NO) SlowTraffic (YES/NO)
1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0
Domain Observations
Domain Knowledge
Structure and parameters
Complementing Probabilistic Models with Declarative Knowledge
Correlations to causations using Declarative knowledge on the Semantic Web
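The split between structure (from declarative knowledge) and parameters (from observations) can be sketched as follows. This is a minimal illustration under stated assumptions: the parent sets in `PARENTS` are a hypothetical causal structure over the four variables named on the slide, the rows in `OBSERVATIONS` are made-up binary data, and add-one (Laplace) smoothing is a choice made here, not something the slides specify.

```python
from collections import Counter

# Hypothetical causal structure (child -> parents), of the kind declarative
# knowledge might supply. It is assumed here, not learned from the data.
PARENTS = {
    "Cold": [],
    "IcyRoad": ["Cold"],
    "PoorVisibility": [],
    "SlowTraffic": ["IcyRoad", "PoorVisibility"],
}

# Made-up binary observations, for illustration only.
OBSERVATIONS = [
    {"Cold": 1, "IcyRoad": 1, "PoorVisibility": 0, "SlowTraffic": 1},
    {"Cold": 1, "IcyRoad": 1, "PoorVisibility": 1, "SlowTraffic": 1},
    {"Cold": 1, "IcyRoad": 0, "PoorVisibility": 0, "SlowTraffic": 0},
    {"Cold": 0, "IcyRoad": 0, "PoorVisibility": 0, "SlowTraffic": 0},
    {"Cold": 0, "IcyRoad": 0, "PoorVisibility": 1, "SlowTraffic": 1},
    {"Cold": 0, "IcyRoad": 0, "PoorVisibility": 0, "SlowTraffic": 0},
]


def estimate_cpt(child, parents, rows, alpha=1.0):
    """Estimate P(child = 1 | parents) per observed parent configuration,
    using add-alpha smoothing over the two outcomes of a binary child."""
    ones = Counter()
    totals = Counter()
    for row in rows:
        key = tuple(row[p] for p in parents)
        totals[key] += 1
        ones[key] += row[child]
    return {key: (ones[key] + alpha) / (totals[key] + 2 * alpha) for key in totals}


# The structure fixes which conditional tables exist; the data fills them in.
cpts = {child: estimate_cpt(child, parents, OBSERVATIONS)
        for child, parents in PARENTS.items()}
icy_given_cold = cpts["IcyRoad"]        # keys are (Cold,) tuples
slow_given_causes = cpts["SlowTraffic"]  # keys are (IcyRoad, PoorVisibility) tuples
```

Because the structure is supplied up front, each table conditions only on the declared causes; parent configurations that never occur in the data simply have no entry in this sketch.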
114
• Declarative knowledge about various domains is increasingly being published on the web [1, 2].
• Declarative knowledge describes concepts and relationships in a domain (structure).
• Linked Open Data may be used to derive prior probabilities of events (parameters).
• We explored the use of declarative knowledge for structure using ConceptNet 5.
[1] http://conceptnet5.media.mit.edu/
[2] http://linkeddata.org/
Domain Knowledge
115
http://conceptnet5.media.mit.edu/web/c/en/traffic_jam
[Figure: ConceptNet 5 assertions about traffic, with is_a links to the categories Delay, ActiveEvent, ScheduledEvent, TimeOfDay, and BadWeather]
go to baseball game Causes traffic jam
traffic accident Causes traffic jam
traffic jam CapableOf slow traffic
traffic jam CapableOf occur twice each day
bad weather CapableOf slow traffic
road ice Causes accident
go to concert HasSubevent car crash
accident RelatedTo car crash
ConceptNet 5
116
Three Operations: Complementing graphical model structure extraction
Traffic data from sensors deployed on road network in San Francisco Bay Area (Traffic jam, Link Description, Scheduled Event)
Nodes shown: traffic jam, baseball game, time of day, bad weather, slow traffic
1. Add missing random variables
2. Add missing links
3. Add link direction
Knowledge from ConceptNet5
bad weather CapableOf slow traffic
go to baseball game Causes traffic jam
traffic jam CapableOf occur twice each day
traffic jam CapableOf slow traffic
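The three operations named on this slide (add missing random variables, add missing links, add link direction) can be sketched on a plain graph representation. This is a hedged illustration, not the authors' implementation: the `enrich` function, the `ALIASES` mapping from ConceptNet phrases to model variables, and the decision to read `Causes` and `CapableOf` as directed influence are all assumptions made here.

```python
# Assertions in (subject, relation, object) form, taken from the slide text.
ASSERTIONS = [
    ("go to baseball game", "Causes", "traffic jam"),
    ("bad weather", "CapableOf", "slow traffic"),
    ("traffic jam", "CapableOf", "slow traffic"),
]

# Assumption: these relations are read as influence from subject to object.
DIRECTED_RELATIONS = {"Causes", "CapableOf"}

# Assumption: hypothetical mapping from ConceptNet phrases to model variables.
ALIASES = {"go to baseball game": "baseball game"}


def enrich(nodes, undirected_edges, assertions, aliases):
    """Return (nodes, undirected, directed) after applying the three operations."""
    nodes = set(nodes)
    undirected = {frozenset(edge) for edge in undirected_edges}
    directed = set()
    for subject, relation, obj in assertions:
        if relation not in DIRECTED_RELATIONS:
            continue
        source = aliases.get(subject, subject)
        target = aliases.get(obj, obj)
        # Operation 1: add missing random variables.
        nodes.update({source, target})
        # Operations 2 and 3: add the link if it is missing and orient it,
        # replacing any undirected version of the same link.
        undirected.discard(frozenset((source, target)))
        directed.add((source, target))
    return nodes, undirected, directed


# Hypothetical structure as it might come out of purely statistical extraction:
# three variables and one link whose direction is unknown.
initial_nodes = {"baseball game", "traffic jam", "slow traffic"}
initial_edges = [("baseball game", "traffic jam")]

nodes, undirected, directed = enrich(initial_nodes, initial_edges, ASSERTIONS, ALIASES)
```

In this run, `bad weather` is added as a new variable, the existing `baseball game` and `traffic jam` link gains a direction, and two new directed links into `slow traffic` appear. A real pipeline would also have to check that the oriented graph stays acyclic.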
117
Structure extracted from traffic observations (sensors + textual) using statistical techniques
Nodes: Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed, volume
Enriched structure, which has link directions and new nodes such as “Bad Weather”, potentially leading to better delay predictions
Nodes: Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed, volume, Bad Weather
Enriched Probabilistic Models using ConceptNet 5
119
Take Away
• It is all about the human – not computing, not device
– Computing for human experience
• Whatever we do in Smart Data, focus on human-in-the-loop (empowering machine computing!):
– Of Human, By Human, For Human
– But in serving human needs, there is a lot more than what current big data analytics handle – variety, contextual, personalized, subjective, spanning data and knowledge across P-C-S dimensions
120
Acknowledgements
• Kno.e.sis team
• Funds: NSF, NIH, AFRL, Industry…
• Note:
• For images and sources, if not on slides, please see slide notes
• Some images were taken from Web search results; all such images belong to their respective owners, and we are grateful to the owners for the usefulness of these images in our context.
121
• OpenSource: http://knoesis.org/opensource
• Showcase: http://knoesis.org/showcase
• Vision: http://knoesis.org/node/266
• Publications: http://knoesis.org/library
• Smart Data Vision: http://wiki.knoesis.org/index.php/Smart_Data
Paper: Krishnaprasad Thirunarayan and Amit Sheth, Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications
References and Further Readings
122
Thanks …
Amit Sheth’s PhD students
Ashutosh Jadhav
Hemant Purohit
Vinh Nguyen
Lu Chen
Pavan Kapanipathi
Pramod Anantharam
Sujan Perera
Alan Smith
Pramod Koneru
Maryam Panahiazar
Sarasi Lalithsena
Cory Henson
Kalpa Gunaratna
Delroy Cameron
Sanjaya Wijeratne
Wenbo Wang
Kno.e.sis in 2012 = ~100 researchers (15 faculty, ~50 PhD students)
124
thank you, and please visit us at
http://knoesis.org
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
Smart Data