transforming big data into smart data: deriving value via harnessing volume, variety and velocity...

Post on 26-Jan-2015

DESCRIPTION

See instead the more recent version (ICDE2014 keynote): http://j.mp/ICDE-key
A video of a version of this talk: http://youtu.be/8RhpFlfpJ-A

Amit Sheth, "Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web," keynote at the 21st Italian Symposium on Advanced Database Systems, June 30 - July 03 2013, Roccella Jonica, Italy. Also given as invited talks at universities in Spain and Italy in June 2013.

Highlight: How to harness actionable Smart Data from voluminous Big Data with Velocity and Variety, using Semantics and the Semantic Web core to bring Human-Centric Computing into practice.

Abstract from: http://www.sebd2013.unirc.it/invitedSpeakers.html

Big Data has captured much interest in research and industry, with anticipation of better decisions, efficient organizations, and many new jobs. Much of the emphasis is on technology that handles volume, including storage and computational techniques to support analysis (Hadoop, NoSQL, MapReduce, etc.), and the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity. However, the most important feature of data, the raison d'etre, is neither volume, variety, velocity, nor veracity, but value. In this talk, I will emphasize the significance of Smart Data, and discuss how it can be realized by extracting value from Big Data. Accomplishing this task requires organized ways to harness and overcome the original four V-challenges; and while the technologies currently touted may provide some necessary infrastructure, they are far from sufficient. In particular, we will need to utilize metadata, employ semantics and intelligent processing, and leverage some of the extensive work that predates Big Data.

For Volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration, and discuss how this cannot simply be wished away using NoSQL. Lastly, for Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to dynamically create models of new objects, concepts, and relationships and use them to better understand new cues in the data that capture rapidly evolving events and situations.

Additional background at: http://knoesis.org/vision > SmartData and "Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications," http://www.knoesis.org/library/resource.php?id=1889

TRANSCRIPT

Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity

using semantics and Semantic Web

Keynote at SEBD2013, July 1, 2013, Italy & invited talks in Spain & Italy, June 2013.

The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State, USA

Pavan Kapanipathi

Pramod Anantharam

Amit Sheth

Cory Henson

Dr. T.K. Prasad

Maryam Panahiazar

Contributions by many, but Special Thanks to:

Hemant Purohit

2

2011

How much data?

48(2013)

500(2013)

http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/

3

1% of the data is used for analysis.

http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode
http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume

4

Variety

Structured

Unstructured

Semi structured

Audio

Video

Images

5

Velocity

Fast Data

Rapid Changes

Real-Time/Stream Analysis

Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail

6

• Focus on verticals: advertising, social media, retail, financial services, telecom, and healthcare

– Aggregate data, focused on transactions, limited integration (limited complexity), analytics to find (simple) patterns

– Emphasis on technologies to handle volume/scale, and to a lesser extent velocity: Hadoop, NoSQL, MPP warehouse, …

– Full faith in the power of data (no hypothesis), bottom up analysis

Current Focus on Big Data

7

• What if your data volume gets so large and varied you don't know how to deal with it?
• Do you store all your data?
• Do you analyze it all?
• How can you find out which data points are really important?
• How can you use it to your best competitive advantage?

Questions typically asked on Big Data

http://www.sas.com/big-data/

8

http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/

Variety of Data Analytics Enablers

9

• Prediction of the spread of flu in real time during H1N1 2009
– Google tested a mammoth 450 million different mathematical models to test the search terms, comparing their predictions against the actual flu cases; 45 important parameters were found
– The model was tested when the H1N1 crisis struck in 2009 and gave more meaningful and valuable real-time information than any official public health system [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]

• FareCast: predict the direction of air fares over different routes [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]

• NY city manholes problem [ICML Discussion, 2012]
• Netflix and Amazon’s recommendations, which of the sales people are more effective, …

Illustrative Big Data Applications

10

• Reaction and not innovation
• Correlations are not causations

– “we accept the likelihood of what will happen without knowing why it will happen” [1]

• Right answers are good but often are just not sufficient to make a person take an action (see [2])

• Big data cannot replace human judgment [3]: beyond what a data-based prediction says, people also use many subjective, social and soft cues in the process of taking an action (past experience, both personal and from their social circle; personal preferences, e.g., willingness to be different, longevity vs. quality of life, …)

• Sometimes, better algorithms beat more data [4]

What is missing with current “Big Data” focused solutions?

[1] http://itsminfo.com/a-critique-of-pure-data-part-2/
[2] http://www.bbc.com/capital/story/20130612-will-big-data-drown-out-genius/
[3] http://www.matchcite.com/blog/blog/2012/july/big-data-cannot-replace-human-judgment.aspx
[4] http://allthingsd.com/20121128/better-algorithms-beat-more-data-and-heres-why/

11

• Current focus is mainly on serving business intelligence and targeted analytics needs, not on serving complex individual and collective human needs (e.g., empowering humans in health, fitness and well-being; better disaster coordination) that are highly personalized/individualized/contextualized
– Incorporate real-world complexity: multi-modal and multi-sensory nature of real-world and human perception
– Need deeper understanding of data and its role in information (e.g., skew, coverage)

• Human involvement and guidance (directly or indirectly): leading to actionable information, understanding and insight right in the context of human activities
– Bottom-up & top-down processing: infusion of models and background knowledge (data + knowledge + reasoning)

What is missing?

12

Contextual

Information Smart Data

Makes Sense

Actionable or help decision support/making

13

Smart Data

Smart data makes sense out of Big data

It provides value by harnessing the challenges posed by the volume, velocity, variety and veracity of big data, in turn providing actionable information and improving decision making.

Smart data relies on more than big data: it uses background knowledge, experiences, advanced and contextualized reasoning, and is often highly personalized.

14

“OF human, BY human and FOR human”

Smart data is focused on the actionable value achieved by human involvement in data creation, processing and consumption phases for improving the human experience.

Smart data is about improving human experience.

Another perspective on Smart Data

15

Descriptive
Exploratory
Inferential
Predictive
Causal

Improved Analytics CREATION

PROCESSING

EXPERIENCE & DECISION MAKING

Human Centric Computing

16

“OF human, BY human and FOR human”

Let’s discuss this perspective on Smart Data

17

Petabytes of Physical(sensory)-Cyber-Social Data every day!

More on PCS Computing: http://wiki.knoesis.org/index.php/PCS

‘OF human’ : Relevant Real-time Data Streams for Human Experience

18

“OF human, BY human and FOR human”

a perspective on Smart Data

Use of Prior Human-created Knowledge Models

19

‘BY human’: Involving Crowd Intelligence in data processing workflows

Crowdsourcing and Domain-expert guided Machine Learning Modeling

20

“OF human, BY human and FOR human”

a perspective on Smart Data

21

Detection of events, such as wheezing sound, indoor temperature, humidity,

dust, and CO2 level

Weather Application

Asthma Healthcare Application

Close the window at home during the day to keep CO2 from gushing in, to avoid asthma attacks at night

‘FOR human’ : Improving Human Experience

Population Level

Personal

Public Health

Action in the Physical World

22

Why do we care about Smart Data rather than Big Data?

24

Second-costliest hurricane in United States history; estimated damage $75 billion

90-115 mph winds

State of Emergency in New York

285 people killed along the track of Sandy

750,000 without power (NY)

Immense devastation and Human suffering

Big Data to Smart Data: Disaster Management example

http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html

25

20 million tweets with “sandy”, “hurricane” keywords between Oct 27th and Nov 1st

2nd most popular topic on Facebook during 2012

Social (Big) Data during Hurricane Sandy

• http://www.guardian.co.uk/news/datablog/2012/oct/31/twitter-sandy-flooding

• http://www.huffingtonpost.com/2012/11/02/twitter-hurricane-sandy_n_2066281.html

• http://mashable.com/2012/10/31/hurricane-sandy-facebook/

26

For information seeking

For timely information

For unique information

For unfiltered information

To determine disaster magnitude

To check in with family and friends

To self-mobilize

To maintain a sense of community

To seek emotional support and healing

Governments

Emergency management organizations

Journalists

Disaster responders

Public

BIG DATA TO SMART DATA: WHY? and FOR WHOM?

Fraustino et al. Social Media Use during Disasters: A Review of the Knowledge Base and Gaps. US Dept. of Homeland Security, START 2012.

27

Improving situational awareness - Timely delivery of necessary information to the right people

Improving coordination between resource seekers and suppliers

Detecting the magnitude of a disaster from people's sentiments.

Many more challenges…

Can SNSs make Disaster Management easier? – Giving Actionable Information (Smart Data)

http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html

28

Volume

Twitter hits half a billion tweets a day!

Challenges

Delivering the necessary/actionable information to the right people

http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US

29

Velocity

Volume

@ConEdison Twitter handle that the company had only set up in June gained an extra 16,000 followers over the storm. – Did the information reach everyone?

Challenges

Delivering the necessary/actionable information to the right people

Rate of Data Arrival
Approximately 7000 TPS
10 images per second on Instagram

http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf

30

http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US

Velocity

Variety

Volume

Semi Structured

Structured

Unstructured

Sensors

Linked Open Data

Wikipedia

Challenges

Delivering the necessary/actionable information to the right people

31

Velocity

Variety

Veracity

Volume

Challenges

Delivering the necessary/actionable information to the right people

http://www.buzzfeed.com/jackstuef/the-man-behind-comfortablysmug-hurricane-sandys

32

Velocity

Variety

Veracity

Volume

33

Value

– Makes Sense
– Actionable Information
– Decision support/making

Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/

Smart Data focuses on the

value

34

Value

– Makes Sense
– Actionable Information
– Decision support/making

Disaster Management

Victims

Timely and Contextual Information about
• Electricity, Food, Water, Shelter and donation offers related to the disaster.

Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/

35

Descriptive
Exploratory
Inferential
Predictive
Causal

Human Centric Computing

Improved Analytics Creation

Processing

Experience

Revisiting..

36

• Healthcare
– kHealth
– SemHealth
• Social event coordination
– Twitris
• Traffic monitoring
– kTraffic

Applications of Smart Data Analytics

37

The Patient of the Future
MIT Technology Review, 2012

http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/

38

To gain new insight in patient care & early indications of disease

Smart Data in Healthcare

39

Sensing is a key enabler of the Internet of Things

BUT, how do we make sense of the resulting avalanche of sensor data?

50 Billion Things by 2020 (Cisco)

40

Parkinson’s disease (PD) data from The Michael J. Fox Foundation for Parkinson’s Research.

1 https://www.kaggle.com/c/predicting-parkinson-s-disease-progression-with-smartphone-data

8 weeks of data from 5 sensors on a smart phone, collected for 16 patients, resulting in ~12 GB (with a lot of missing data).

Variety, Volume, Veracity, Velocity

Value:
Can we detect the onset of Parkinson’s disease?
Can we characterize the disease progression?
Can we provide actionable information to the patient?

Semantics: Representing prior knowledge of PD led to a focused exploration of this massive dataset

WHY Big Data to Smart Data: Healthcare example

41

Big Data to Smart Data Using a Knowledge Based Approach

ParkinsonMild(person) = Tremor(person) ∧ PoorBalance(person)
ParkinsonModerate(person) = MoveSlow(person) ∧ PoorSleep(person) ∧ MonotoneSpeech(person)
ParkinsonAdvanced(person) = Fall(person)

Control Group:
– Movements of an active person have a good distribution over the X, Y, and Z axes
– Audio is well modulated, with good variations in the energy of the voice

PD Patients:
– Restricted movements by a PD patient can be seen in the acceleration readings
– Audio is not well modulated, representing monotone speech

Declarative Knowledge of Parkinson’s Disease used to focus our attention on symptom manifestations in sensor observations
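The three declarative rules on this slide can be sketched as plain boolean checks. This is only a minimal illustration: the `Symptoms` class and the `parkinson_stages` function are hypothetical names, and how each symptom flag would be derived from accelerometer or audio readings is not covered by the slides and is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symptoms:
    """Boolean symptom flags (assumed to be detected upstream from sensor data)."""
    tremor: bool = False
    poor_balance: bool = False
    move_slow: bool = False
    poor_sleep: bool = False
    monotone_speech: bool = False
    fall: bool = False


def parkinson_stages(s: Symptoms) -> List[str]:
    """Return every stage whose rule from the slide is satisfied."""
    stages = []
    if s.tremor and s.poor_balance:
        stages.append("ParkinsonMild")
    if s.move_slow and s.poor_sleep and s.monotone_speech:
        stages.append("ParkinsonModerate")
    if s.fall:
        stages.append("ParkinsonAdvanced")
    return stages


print(parkinson_stages(Symptoms(tremor=True, poor_balance=True)))
```

The rules are evaluated independently, so more than one stage can fire for the same person; deciding which stage wins in that case is a design choice the slide does not specify.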

42

• 25 million people in the U.S. are diagnosed with asthma (7 million are children) [1].
• 300 million people suffer from asthma worldwide [2].
• Asthma-related healthcare costs alone are around $50 billion a year [2].
• 155,000 hospital admissions and 593,000 emergency department visits in 2006 [3].

[1] http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/
[2] http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html
[3] Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics, 123(Supplement 3), S131-S145.

Asthma: Severity of the problem

43

Asthma is a multifactorial disease with health signals spanning personal, public health, and population levels.

Real-time health signals from personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arriving continuously in fine grained samples potentially with missing information and uneven sampling frequencies.

Variety, Volume, Veracity, Velocity

Value:
Can we detect the asthma severity level?
Can we characterize asthma control level?
What risk factors influence asthma control?
What is the contribution of each risk factor?

Semantics: Understanding relationships between health signals and asthma attacks for providing actionable information

WHY Big Data to Smart Data: Healthcare example

44

Population Level

Personal

Public Health

Variety: Health signals span heterogeneous sources
Volume: Health signals are fine grained
Velocity: Real-time change in situations
Veracity: Reliability of health signals may be compromised

Value: Can I reduce my asthma attacks at night?

Decision support to doctors by providing them with deeper insights into patient asthma care

Asthma: Demonstration of Value

45

Sensordrone – for monitoring environmental air quality

Wheezometer – for monitoring wheezing sounds

Can I reduce my asthma attacks at night?

What are the triggers?
What is the wheezing level?

What is the propensity toward asthma?

What is the exposure level over a day?

What is the air quality indoors?

Commute to Work

Personal

Public Health

Population Level

Closing the window at home in the morning and taking an alternate route to office may lead to reduced asthma attacks

Actionable Information

Asthma: Actionable Information for Asthma Patients

46

Personal, Public Health, and Population Level Signals for Monitoring Asthma

ICS= inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA= inhaled short-acting beta2-agonist ; *consider referral to specialist

Asthma Control and Actionable Information

Sensors and their observations for understanding asthma

47

Personal Level Signals

Societal Level Signals

(Personal Level Signals)

(Personalized Societal Level Signal)

(Societal Level Signals)Societal Level Signals

Relevant to the Personal Level

Personal Level Sensors

(kHealth**) (EventShop*)

Qualify
Quantify
Action Recommendation

What are the features influencing my asthma?
What is the contribution of each of these features?
How controlled is my asthma? (risk score)
What will be my action plan to manage asthma?

Storage

Societal Level Sensors

Asthma Early Warning Model (AEWM)

Query AEWM

Verify & augment domain knowledge

Recommended Action

Action Justification

Asthma Early Warning Model

*http://www.slideshare.net/jain49/eventshop-120721, ** http://www.youtube.com/watch?v=btnRi64hJp4

48

Population Level

Personal

Wheeze – Yes
Do you have tightness of chest? – Yes

Observations Physical-Cyber-Social System Health Signal Extraction Health Signal Understanding

<Wheezing=Yes, time, location>

<ChestTightness=Yes, time, location>

<PollenLevel=Medium, time, location>

<Pollution=Yes, time, location>

<Activity=High, time, location>

Wheezing

ChestTightness

PollenLevel

Pollution

Activity

Wheezing

ChestTightness

PollenLevel

Pollution

Activity

RiskCategory

<PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>

.

.

.

Expert Knowledge

Background Knowledge

tweet reporting pollution level and asthma attacks

Acceleration readings fromon-phone sensors

Sensor and personal observations

Signals from personal, personal spaces, and community spaces

Risk Category assigned by doctors

Qualify

Quantify

Enrich

Outdoor pollen and pollution

Public Health

Health Signal Extraction to Understanding

Well Controlled – continue
Not Well Controlled – contact nurse
Poorly Controlled – contact doctor
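The path from extracted health signals to a recommended action can be sketched as follows. The ordinal codes are inferred from the slide's example row <2, 1, 1, 3, 1> (Medium = 2, High = 3, Yes = 1); the codes for "Low" and "No", and all function and variable names, are assumptions made for illustration. The risk category itself is assigned by doctors on the slide, so this sketch only encodes observations and looks up the action for a given category.

```python
# Ordinal codes inferred from the slide's example row; "Low" and "No" are assumed.
LEVEL = {"Low": 1, "Medium": 2, "High": 3}
YES_NO = {"No": 0, "Yes": 1}

# Column order taken from the slide's feature tuple.
FEATURE_ORDER = ["PollenLevel", "ChestTightness", "Pollution", "Activity", "Wheezing"]
ENCODERS = {
    "PollenLevel": LEVEL,
    "Activity": LEVEL,
    "ChestTightness": YES_NO,
    "Pollution": YES_NO,
    "Wheezing": YES_NO,
}

# Action per control level, as listed on the slide.
ACTIONS = {
    "Well Controlled": "continue",
    "Not Well Controlled": "contact nurse",
    "Poorly Controlled": "contact doctor",
}


def to_feature_vector(observations):
    """Encode named observations into the numeric row used for modeling."""
    return [ENCODERS[name][observations[name]] for name in FEATURE_ORDER]


def recommend(control_level):
    """Look up the recommended action for a doctor-assigned control level."""
    return ACTIONS[control_level]


obs = {
    "Wheezing": "Yes",
    "ChestTightness": "Yes",
    "PollenLevel": "Medium",
    "Pollution": "Yes",
    "Activity": "High",
}
print(to_feature_vector(obs))            # [2, 1, 1, 3, 1]
print(recommend("Not Well Controlled"))  # contact nurse
```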

49

… and do it efficiently and at scale

What if we could automate this sense making ability?

50

People are good at making sense of sensory input

What can we learn from cognitive models of perception?
• The key ingredient is prior knowledge

51

* based on Neisser’s cognitive model of perception

Perception Cycle*

Observe Property
Perceive Feature

1 Explanation: Translating low-level signals into high-level knowledge
2 Discrimination: Focusing attention on those aspects of the environment that provide useful information

Prior Knowledge

52

To enable machine perception,

Semantic Web technology is used to integrate sensor data with prior knowledge on the Web

53

Prior knowledge on the Web

W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph

56

Explanation

Inference to the best explanation
• In general, explanation is an abductive problem, and hard to compute

Finding the sweet spot between abduction and OWL
• Single-feature assumption* enables use of OWL-DL deductive reasoner

* An explanation must be a single feature which accounts for all observed properties

Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building

57

Explanation

Explanatory Feature: a feature that explains the set of observed properties

ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Observed Property Explanatory Feature
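Under the single-feature assumption, explanation reduces to a subset test over the prior-knowledge graph, which can be sketched with ordinary sets. The slide names the properties and features; the edges between them come from a figure that is not in this transcript, so the edges below are assumed for illustration only (and are not medical guidance). `PRIOR_KNOWLEDGE` and `explanatory_features` are hypothetical names.

```python
# Prior knowledge as a bipartite graph: feature -> properties it accounts for.
# Node names are from the slide; the EDGES ARE ASSUMED for illustration.
PRIOR_KNOWLEDGE = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "clammy skin"},
}


def explanatory_features(observed, knowledge=PRIOR_KNOWLEDGE):
    """Features that account for every observed property (single-feature assumption)."""
    observed = set(observed)
    return sorted(f for f, props in knowledge.items() if observed <= props)


print(explanatory_features({"elevated blood pressure", "clammy skin"}))
# ['Hyperthyroidism', 'Pulmonary Edema']
```

This is a set-based stand-in for the OWL-DL class definition above, not the reasoner-based implementation the talk refers to.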

58

Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features

Observe Property
Perceive Feature
Explanation
2 Discrimination: Focusing attention on those aspects of the environment that provide useful information

Discrimination

59

Discrimination

Expected Property: would be explained by every explanatory feature

ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Expected Property Explanatory Feature

60

Discrimination

Not Applicable Property: would not be explained by any explanatory feature

NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn}

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Not Applicable Property Explanatory Feature

61

Discrimination

Discriminating Property: is neither expected nor not-applicable

DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Discriminating Property Explanatory Feature
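The three property classes used in discrimination (expected, not applicable, discriminating) can be sketched with set operations. As with any example built on this slide, the node names come from the slide but the edges of the graph are assumed, since the figure is not in the transcript; `PRIOR_KNOWLEDGE` and `discriminate` are hypothetical names.

```python
# Feature -> properties it accounts for. Node names are from the slide;
# the EDGES ARE ASSUMED for illustration and are not medical guidance.
PRIOR_KNOWLEDGE = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "clammy skin"},
}


def discriminate(observed, knowledge=PRIOR_KNOWLEDGE):
    """Split all known properties into (expected, not_applicable, discriminating)."""
    observed = set(observed)
    # Property sets of the explanatory features for the current observations.
    candidates = [props for props in knowledge.values() if observed <= props]
    if not candidates:
        raise ValueError("no single feature explains all observed properties")
    all_properties = set().union(*knowledge.values())
    expected = set.intersection(*candidates)                    # explained by every candidate
    not_applicable = all_properties - set().union(*candidates)  # explained by none
    discriminating = all_properties - expected - not_applicable
    return expected, not_applicable, discriminating


expected, not_applicable, discriminating = discriminate({"elevated blood pressure"})
print(sorted(discriminating))  # ['clammy skin', 'palpitations']
```

With these assumed edges, observing only elevated blood pressure leaves all three features as candidates, so checking for clammy skin or palpitations is what would narrow them down.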

62

Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information

canary in a coal mine

Our Motivation

kHealth: knowledge-enabled healthcare

63

Qualities-High BP-Increased Weight

Entities-Hypertension-Hypothyroidism

kHealth

Machine Sensors

Personal Input

EMR/PHR

Comorbidity risk score e.g., Charlson Index

Longitudinal studies of cardiovascular risks

- Find correlations- Validation - domain knowledge - domain expert

Parameterize the model

Risk Assessment Model

Current Observations-Physical-Physiological-History

Risk Score(Actionable Information)

Model CreationValidate correlations

Historical observations of each patient

Risk Score: from Data to Abstraction and Actionable Information

64

How do we implement machine perception efficiently on a resource-constrained device?

Use of OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time

• Runs out of resources with prior knowledge >> 15 nodes
• Asymptotic complexity: O(n^3)

65

intelligence at the edge

Approach 1: Send all sensor observations to the cloud for processing

Approach 2: downscale semantic processing so that each device is capable of machine perception

Henson et al. "An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices," ISWC 2012.

66

Efficient execution of machine perception

Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning

0101100011010011110010101100011011011010110001101001111001010110001101011000110100111
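The idea of replacing reasoner calls with bit vector operations can be sketched with Python integers as bit vectors. This is a simplified illustration in the spirit of the slide, not the encoding from the cited paper: each property gets one bit per feature, explanation is a bitwise AND over the observed properties, and discrimination compares each property row against the resulting mask. The edges in `ROWS` are assumed, and all names are hypothetical.

```python
FEATURES = ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]

# One bit vector per property; bit i is set if FEATURES[i] accounts for it.
# Node names are from the slides; the EDGES ARE ASSUMED for illustration.
ROWS = {
    "elevated blood pressure": 0b111,
    "clammy skin": 0b110,
    "palpitations": 0b010,
}


def explain(observed):
    """AND together the rows of all observed properties."""
    vec = (1 << len(FEATURES)) - 1  # start with every feature as a candidate
    for prop in observed:
        vec &= ROWS[prop]
    return vec


def decode(vec):
    """Turn a feature bit vector back into feature names."""
    return [name for i, name in enumerate(FEATURES) if (vec >> i) & 1]


def discriminating_properties(vec):
    """Properties explained by some, but not all, of the candidate features."""
    return [p for p, row in ROWS.items() if (row & vec) not in (0, vec)]


candidates = explain(["elevated blood pressure", "clammy skin"])
print(decode(candidates))                     # ['Hyperthyroidism', 'Pulmonary Edema']
print(discriminating_properties(candidates))  # ['palpitations']
```

Each step is a single pass over the rows with constant-time bitwise operations per row, which is the kind of linear behavior the efficiency slide describes.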

67

O(n^3) < x < O(n^4) → O(n)

Efficiency Improvement

• Problem size increased from 10’s to 1000’s of nodes
• Time reduced from minutes to milliseconds
• Complexity growth reduced from polynomial to linear

Evaluation on a mobile device

68

Semantic Perception for smarter analytics: 3 ideas to take away

1 Translate low-level data to high-level knowledge: Machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making

2 Prior knowledge is the key to perception: Using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web

3 Intelligence at the edge: By downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices

69

• Real Time Feature Streams: http://www.youtube.com/watch?v=_ews4w_eCpg

• kHealth: http://www.youtube.com/watch?v=btnRi64hJp4

Demos

74

Smart Data in Social Media Analytics

To Understand the human social dynamics in real world events

75

0.5B Tweets per day

0.5B Users

60% on Mobile

5530 Tweets per second related to the Japan earthquake and tsunami

17000 Tweets per second

Twitter During Real-world Events of Interest

http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/
http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitter
http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/

76

Social Semantic

Web Application

Real time

Multi Faceted

Analysis

Insights of Important Events including disaster response coordination

http://usatoday30.usatoday.com/news/politics/twitter-election-meter

http://twitris.knoesis.org/

77

State of the Art – Uni/Bi Dimensional Analysis During Elections

Topics

Sentiments

78

Twitris’ Dimensions of Integrated Semantic Analysis

Sheth et al. Twitris- a System for Collective Social Intelligence, ESNAM-2013

79

http://semanticweb.com/picking-the-president-twindex-twitris-track-social-media-electorate_b31249
http://semanticweb.com/election-2012-the-semantic-recap_b33278

80

[The screenshots of Twitris+ were taken on Nov. 6th 6 PM EST]

81

Twitris: Sentiment Analysis- Smart Answers with reasoning!

How was Obama doing in the first debate?

82

Red Color: Negative Topics
Green Color: Positive Topics

Twitris: Sentiment Analysis- Smart Answers with reasoning!

How was Obama doing in the second debate?

SMART DATA IS ABOUT ANALYSIS FOR REASONING (what caused the positive sentiment for Democrats) BEHIND THE REAL-WORLD ACTIONS (Democrats’ win)

http://knoesis.wright.edu/library/resource.php?id=1787

83

Top 100 influential users who talk about Barack Obama

Positive or Negative Influence

Twitris: Network Analysis

SMART DATA TELLS YOU HOW A SYSTEM CAN BE TWEAKED FOR THE DESIRED ACTIONS!

Could we engage with (targeted) users with an extreme polarity leaning for Obama to spark an agenda in the whole network of voters (ACTION)?

84

Twitris: Community Evolution

SMART DATA FOCUSES ON THE CAUSALITY OF CHANGES IN REAL-WORLD ACTIONS!

Romney

Obama

Evolution of influencer interaction networks for Romney vs. Obama topical communities, during U.S. Presidential Election 2012 debates

Before 1st debate

After 1st debate

After Hurricane Sandy

After 3rd debate

85

The Dead People mentioned in the event OWC

Twitris: Impact of Background Knowledge

86

How People from Different parts of the world talked about US Election

Images and Videos Related to US Election

Twitris: Analysis by Location

87

What is Smart Data in the context of Disaster Management

ACTIONABLE: Timely delivery of right resources and information to the right people at right location!

Because everyone wants to Help, but DOESN’T KNOW HOW!

Join us for the Social Good!

http://twitris.knoesis.org

RT @OpOKRelief: Southgate Baptist Church on 4th Street in Moore has food, water, clothes, diapers, toys, and more. If you can't go,call 794

Text "FOOD" to 32333, REDCROSS to 90999, or STORM to 80888 to donate $10 in storm relief. #moore #oklahoma #disasterrelief #donate

Want to help animals in #Oklahoma? @ASPCA tells how you can help: http://t.co/mt8l9PwzmO

CITIZEN SENSORS

RESPONSE TEAMS (including humanitarian org. and ‘pseudo’ responders)

VICTIM SITE

Coordination of needs and offers Using Social Media

Does anyone know where to send a check to donate to the tornado victims?

Where do I go to help out for volunteer work around Moore? Anyone know?

Anyone know where to donate to help the animals from the Oklahoma disaster? #oklahoma #dogs

Matched

Matched

Matched

Serving the need!

If you would like to volunteer today, help is desperately needed in Shawnee. Call 273-5331 for more info

http://www.slideshare.net/hemant_knoesis/cscw-2012-hemantpurohit-11531612

88

Purohit et al. Framework to Analyze Coordination in Crisis Response, 2012. Int’l Collaboration in-progress: with QCRI

89

Smart Data from Twitris system for Disaster Response Coordination

Which are the primary locations with most negative sentiments/emotions?

Who are all the people to engage with for better information diffusion?

Which are the most important organizations acting at my location?

Smart data provides actionable information and improves decision making through semantic analysis of Big Data.

Who are the resource seekers and suppliers? How can one donate?

91

Disaster Response Coordination: Twitris Summary for Actionable Nuggets

Important tags to summarize Big Data flow

Related to Oklahoma tornado

Images and Videos Related to Oklahoma tornado

92

Disaster Response Coordination: Twitris Real-time information for needs

Incoming Tweets with need types to give quick idea of what is needed and where

currently #OKC

Legends for Different needs #OKC

(It is real-time widget for monitoring of needs, so will not be active after the event has passed) http://twitris.knoesis.org/oklahomatornado

93

Disaster Response Coordination: Influencers to engage with for specific needs

Influential users for the respective needs, with their interaction network on the right.

94

Really sparse Signal to Noise:
• 2M tweets during the first week after #Oklahoma-tornado-2013
– 1.3% as the highly precise donation requests to help
– 0.02% as the highly precise donation offers to help

• Anyone know how to get involved to help the tornado victims in Oklahoma?? #tornado #oklahomacity (OFFER)

• I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER)

Disaster Response Coordination: Finding Actionable Nuggets for Responders to act

• Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST)

• Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST)

For responders, most important information is the scarcity and availability of resources, can we mine it via Social Media?
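To make the sparsity concrete, the percentages on this slide can be turned into approximate tweet counts. The sketch below only redoes the slide's arithmetic; the constant and function names are illustrative, and the 2M total is the slide's rounded figure.

```python
# Figures from the slide: about 2M tweets in the first week,
# 1.3% identified as donation requests and 0.02% as donation offers.
TOTAL_TWEETS = 2_000_000
REQUEST_RATE_BP = 130  # 1.3% expressed in basis points (1 bp = 0.01%)
OFFER_RATE_BP = 2      # 0.02% expressed in basis points


def count_from_rate(total: int, rate_bp: int) -> int:
    """Convert a rate in basis points into an approximate tweet count."""
    return total * rate_bp // 10_000


requests = count_from_rate(TOTAL_TWEETS, REQUEST_RATE_BP)
offers = count_from_rate(TOTAL_TWEETS, OFFER_RATE_BP)
print(requests, offers, requests // offers)  # 26000 400 65
```

Roughly 26,000 requests against about 400 offers, so requests outnumber offers by about 65 to 1 in this sample.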

95

• Features driven by the experience of domain experts at the responder organizations

• Examples:
– ‘I want to <donate/ help/ bring>’ for extraction of offering intention
– ‘tent house’ OR ‘cots’ for shelter need types

Disaster Response Coordination: Human Knowledge to drive information extraction
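The two example features on this slide can be sketched as simple patterns. This is a minimal illustration, not the Twitris implementation: the regular expressions, the feature names, and the second sample tweet are assumptions made for the example.

```python
import re

# Hypothetical patterns modeled on the slide's two examples.
OFFER_PATTERN = re.compile(r"\bI want to (donate|help|bring)\b", re.IGNORECASE)
SHELTER_PATTERN = re.compile(r"\b(tent house|cots)\b", re.IGNORECASE)


def extract_features(tweet: str) -> dict:
    """Return binary features for offering intention and shelter need."""
    return {
        "offer_intent": bool(OFFER_PATTERN.search(tweet)),
        "shelter_need": bool(SHELTER_PATTERN.search(tweet)),
    }


# First tweet is quoted in the deck; the second is invented for illustration.
offer_tweet = "I want to donate to the Oklahoma cause shoes clothes even food if I can"
shelter_tweet = "We need cots and blankets at the shelter tonight"

offer_features = extract_features(offer_tweet)
shelter_features = extract_features(shelter_tweet)
```

In practice such expert-driven patterns would be a few features among many that feed a classifier, rather than a classifier on their own.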

96

• A knowledge-driven approach
– A rich inventory of metadata for tweets
– Semantic matching for needs (query) vs. offers (documents)

• Example:
– @bladesofmilford please help get the word out,we are accepting kid clothes to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz (REQUEST)
– I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER)

Disaster Response Coordination: Automatic Matching of needs and offers

Matching the competitive intentions (Needs and Offers) can offload humans for the task of resource matchmaking for coordination.
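The query-versus-documents framing can be illustrated by treating a request as a query and ranking offers against it. The sketch below uses plain token overlap (Jaccard similarity) as a lexical stand-in for the semantic matching the slide describes; the stopword list, the function names, and the second offer are assumptions for the example.

```python
import re

# Small illustrative stopword list (an assumption, not from the talk).
STOPWORDS = {"a", "an", "the", "to", "in", "i", "we", "if", "can",
             "are", "and", "of", "for", "please", "even"}


def tokens(text: str) -> set:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    """Share of tokens the two sets have in common."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_offers(request: str, offers: list) -> list:
    """Score every offer against the request and sort best first."""
    query = tokens(request)
    scored = [(jaccard(query, tokens(offer)), offer) for offer in offers]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


REQUEST = ("@bladesofmilford please help get the word out,we are accepting kid "
           "clothes to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz")
OFFERS = [
    "Selling concert tickets for this weekend",  # invented, unrelated
    "I want to donate to the Oklahoma cause shoes clothes even food if I can",
]
ranked = rank_offers(REQUEST, OFFERS)
```

Here the donation offer ranks first because it shares "oklahoma" and "clothes" with the request. A semantic matcher would go further, for example by relating "kid clothes" to "shoes" through background knowledge.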

97

Disaster Response Coordination: Engagement Interface for responders

What-Where-How-Who-Why Coordination

Influential users to engage with and resources for seekers/supplies at a location, at a timestamp

Contextual Information for a chosen topical tags

98

• Illustrative scenario: #Oklahoma-tornado 2013

Disaster Response Coordination: Anecdote for the value of Smart Data

FEMA asked us to quickly filter out gas-leak related data

Mining the data for smart nuggets to inform FEMA (Timely needs)

Engaged with the author of this information to confirm (Veracity)

e.g., All gas leaks in #moore were capped and stopped by 11:30 last night (at 5/22/2013 1:41:37)

Lots of tweets on ‘how to/where to’ assist (‘pseudo’ responders)

e.g., I want to go to Oklahoma this weekend & do what i can to help those people with food,cloths & supplies,im in the feel of wanting to help ! :)

99

An event is a dynamic topic that evolves and might later fork into several distinct events.

Smart Data analytics to capture rapidly evolving social data events

Social Media is the pulse of the populace, a true reflection of events all over the globe!

100

Continuous Semantics

101

Heliopolis is a suburb of Cairo.

Dynamic Model Creation

Continuous Semantics

102

[Figure rows: Events, Key phrases, Models]

Events:
June 12 2009: “Both Ahmadinejad & Mousavi declare victory in Iranian Elections.”
June 13 2009: “situation in tehran University is so worrisome. police have attacked to girls dormitory #tehran #iranelection”
June 15 2009: “Reports from Azadi Square - 4 people killed by police, people killed police who shot. More shots being fired #iranelections”

Models:
Ahmadinejad & Mousavi are politicians in Iran
Tehran University is a University in Iran
Azadi Square is a city square in Tehran

Dynamic Model Creation:

Example of how background knowledge helps us understand the situation described in the tweets, while the knowledge model is updated at the same time

103

How is Continuous Semantics a form of Smart Data Analytics?

Keeping the Background Knowledge abreast of the changes in the event

Smartly learning and adapting data acquisition (Temporally apt Big Data, i.e. Fast Data)

In turn providing temporally relevant Smart Data through analysis
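The loop between background knowledge and data acquisition can be sketched as follows. This is a deliberately simplified illustration, not the system described in the talk: the dictionary-based model, the function name, and the substring matching are all assumptions; the entities and the tweet come from the Iran election example earlier in the deck.

```python
# Hypothetical background model: entity -> what the model knows about it.
MODEL = {
    "tehran university": "a University in Iran",
    "azadi square": "a city square in Tehran",
}


def update_tracked_terms(tracked: set, model: dict, tweet: str) -> set:
    """Add every model entity mentioned in the tweet to the tracked terms."""
    text = tweet.lower()
    mentioned = {entity for entity in model if entity in text}
    return tracked | mentioned


tracked = {"iranelection"}
tweet = ("Reports from Azadi Square - 4 people killed by police, people killed "
         "police who shot. More shots being fired #iranelections")
tracked = update_tracked_terms(tracked, MODEL, tweet)
```

After this step the crawler would also follow "azadi square", so acquisition adapts as the event evolves. A fuller version would also add newly discovered entities to the model itself.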

104

Smart Data Analytics in Traffic Management

To improve everyday life, which is entangled in our most common problem: getting stuck in traffic

105

By 2001 over 285 million Indians lived in cities, more than in all North American cities combined (Office of the Registrar General of India 2001) [1]

[1] The Crisis of Public Transport in India
[2] IBM Smarter Traffic

Modes of transportation in Indian Cities

Texas Transportation Institute (TTI) Congestion report in U.S.

Severity of the Traffic Problem

106

Vehicular traffic data from San Francisco Bay Area aggregated from on-road sensors (numerical) and incident reports (textual)

http://511.org/

Per-minute updates of speed, volume, travel time, and occupancy, resulting in 178 million link status observations, 738 active events, and 146 scheduled events, with many unevenly sampled observations collected over 3 months.

Variety, Volume, Veracity, Velocity

Value:
Can we detect the onset of traffic congestion?
Can we characterize traffic congestion based on events?
Can we provide actionable information to decision makers?

Semantics: Representing prior knowledge of traffic leads to a focused exploration of this massive dataset

Big Data to Smart Data: Traffic Management example

107

[Figure: 511.org sources (Traffic Monitoring, Schedule Information) with sample data labeled Slow moving traffic, Link Description, and Scheduled Event]

Heterogeneity in a Physical-Cyber-Social System

109

• Observation: Slow Moving Traffic
• Multiple Causes (Uncertain about the cause):
– Scheduled Events: music events, fair, theatre events, concerts, road work, repairs, etc.
– Active Events: accidents, disabled vehicles, break down of roads/bridges, fire, bad weather, etc.
– Peak hour: e.g. 7 am – 9 am OR 4 pm – 6 pm
• Each of these events may have a varying impact on traffic.
• A delay prediction algorithm should process multimodal and multi-sensory observations.

Uncertainty in a Physical-Cyber-Social System

110

• Internal observations
– Speed, volume, and travel time observations
– Correlations may exist between these variables across different parts of the network
• External events
– Accident, music event, sporting event, and planned events
– External events and internal observations may exhibit correlations

Modeling Traffic Events

111

External events <ActiveEvents, ScheduledEvents>: Accident, Music event, Sporting event, Road Work, Theatre event

Internal observations <speed, volume, travelTime>

Weather

Time of Day

Modeling Traffic Events

113

Domain Knowledge (Domain Experts, Declarative domain knowledge, Causal knowledge, Linked Open Data)

Model variables: Cold, PoorVisibility, SlowTraffic, IcyRoad

Domain Observations

Cold (YES/NO)   IcyRoad (ON/OFF)   PoorVisibility (YES/NO)   SlowTraffic (YES/NO)
1               0                  1                         1
1               1                  1                         0
1               1                  1                         1
1               0                  1                         0

Structure and parameters

Complementing Probabilistic Models with Declarative Knowledge

Correlations to causations using Declarative knowledge on the Semantic Web
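The split between structure and parameters can be sketched in a few lines. The sketch assumes that declarative knowledge has already supplied the structure (a single edge IcyRoad -> SlowTraffic) and estimates the parameters by counting; the observations are invented for illustration and are not the slide's data.

```python
# Hypothetical binary observations (1 = yes, 0 = no); not the slide's data.
OBSERVATIONS = [
    {"IcyRoad": 1, "SlowTraffic": 1},
    {"IcyRoad": 1, "SlowTraffic": 1},
    {"IcyRoad": 1, "SlowTraffic": 0},
    {"IcyRoad": 0, "SlowTraffic": 0},
    {"IcyRoad": 0, "SlowTraffic": 1},
    {"IcyRoad": 0, "SlowTraffic": 0},
]


def conditional_probability(rows, child, parent, parent_value):
    """Estimate P(child = 1 | parent = parent_value) by counting rows."""
    matching = [row for row in rows if row[parent] == parent_value]
    if not matching:
        return None  # no evidence for this parent value
    return sum(row[child] for row in matching) / len(matching)


p_slow_given_icy = conditional_probability(OBSERVATIONS, "SlowTraffic", "IcyRoad", 1)
p_slow_given_clear = conditional_probability(OBSERVATIONS, "SlowTraffic", "IcyRoad", 0)
```

With these made-up rows, slow traffic is estimated at 2/3 when the road is icy and 1/3 otherwise. The counts alone give only a correlation; the direction of the edge is the part contributed by declarative knowledge.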

114

• Declarative knowledge about various domains is increasingly being published on the web [1], [2].

• Declarative knowledge describes concepts and relationships in a domain (structure).

• Linked Open Data may be used to derive prior probabilities of events (parameters).

• Explored the use of declarative knowledge for structure using ConceptNet 5.

[1] http://conceptnet5.media.mit.edu/
[2] http://linkeddata.org/

Domain Knowledge

115

http://conceptnet5.media.mit.edu/web/c/en/traffic_jam

[Figure: ConceptNet 5 relations around "traffic jam", with concepts linked via is_a to the classes Delay, ActiveEvent, ScheduledEvent, TimeOfDay, and BadWeather]

go to baseball game Causes traffic jam
traffic accident Causes traffic jam
traffic jam CapableOf slow traffic
traffic jam CapableOf occur twice each day
bad weather CapableOf slow traffic
road ice Causes accident
go to concert HasSubevent car crash
accident RelatedTo car crash

ConceptNet 5

116

Three Operations: Complementing graphical model structure extraction

Traffic data from sensors deployed on road network in San Francisco Bay Area (Traffic jam, Link Description, Scheduled Event)

1. Add missing random variables
2. Add missing links
3. Add link direction

[Figure: graph over baseball game, traffic jam, time of day, bad weather, and slow traffic, shown after each operation]

Knowledge from ConceptNet5:
go to baseball game Causes traffic jam
bad weather CapableOf slow traffic
traffic jam CapableOf occur twice each day
traffic jam CapableOf slow traffic
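The three operations named on this slide (add missing random variables, add missing links, add link direction) can be sketched on a toy graph. This is an assumed simplification: the starting structure is invented, ConceptNet concept names are shortened to node names (for example "go to baseball game" becomes "baseball game"), and every triple is treated as a directed edge from subject to object.

```python
# Hypothetical starting structure learned from data: undirected links only.
NODES = {"baseball game", "time of day", "traffic jam", "slow traffic"}
UNDIRECTED = {
    frozenset({"baseball game", "traffic jam"}),
    frozenset({"time of day", "traffic jam"}),
}

# Triples from the slide, with concept names shortened to node names
# (this mapping is an assumption made for the example).
TRIPLES = [
    ("baseball game", "Causes", "traffic jam"),
    ("traffic jam", "CapableOf", "slow traffic"),
    ("bad weather", "CapableOf", "slow traffic"),
]


def enrich(nodes, undirected, triples):
    """Apply the three operations; return (nodes, directed, still_undirected)."""
    nodes = set(nodes)
    remaining = set(undirected)
    directed = set()
    for subject, _relation, obj in triples:
        nodes.update((subject, obj))                  # 1. add missing random variables
        remaining.discard(frozenset({subject, obj}))  # 3. an existing link gains a direction
        directed.add((subject, obj))                  # 2. add the link if it was missing
    return nodes, directed, remaining


nodes, directed, remaining = enrich(NODES, UNDIRECTED, TRIPLES)
```

In this toy run "bad weather" enters as a new variable, the link from traffic jam to slow traffic is added, the baseball game link becomes directed, and the time-of-day link stays undirected because no triple covers it.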

117

Structure extracted from traffic observations (sensors + textual) using statistical techniques

[Figure: nodes Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed, volume]

Enriched structure, which has link directions and new nodes such as “Bad Weather”, potentially leading to better delay predictions

[Figure: nodes Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed, volume, Bad Weather]

Enriched Probabilistic Models using ConceptNet 5

119

Take Away

• It is all about the human – not computing, not device
– Computing for human experience

• Whatever we do in Smart Data, focus on human-in-the-loop (empowering machine computing!):
– Of Human, By Human, For Human

– But in serving human needs, there is a lot more than what current big data analytics handle – variety, contextual, personalized, subjective, spanning data and knowledge across P-C-S dimensions

120

Acknowledgements

• Kno.e.sis team
• Funds: NSF, NIH, AFRL, Industry…

• Note:
• For images and sources, if not on slides, please see slide notes
• Some images were taken from Web search results, and all such images belong to their respective owners; we are grateful to the owners for the usefulness of these images in our context.

121

• OpenSource: http://knoesis.org/opensource
• Showcase: http://knoesis.org/showcase
• Vision: http://knoesis.org/node/266
• Publications: http://knoesis.org/library
• Smart Data Vision: http://wiki.knoesis.org/index.php/Smart_Data

Paper: Krishnaprasad Thirunarayan and Amit Sheth, Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

References and Further Readings

122

Thanks …

124

thank you, and please visit us at

http://knoesis.org

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA

Smart Data
