transforming big data into smart data: deriving value via harnessing volume, variety and velocity...

118
Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web Keynote at SEBD2013 , July 1, 2013, Italy & invited talks in Spain & Italy, June 2013. Put Knoesis Banner The Ohio Center of Excellence in Knowledge-enabled Computing ( Kno.e.sis ), Wright State, USA Pavan Kapanipa thi Pramod Ananthar am Amit Sheth Cory He nson Dr. T.K. Prasad Maryam Panahiaz ar Contributions by many, but Special Thanks to: Hemant Purohi t

Upload: amit-sheth

Post on 26-Jan-2015

110 views

Category:

Technology


2 download

DESCRIPTION

See instead more recent version (ICDE2014 keynote): http://j.mp/ICDE-key A video of a version of this talk: http://youtu.be/8RhpFlfpJ-A Amit Sheth, "Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web," keynote at the 21st Italian Symposium on Advanced Database Systems, June 30 - July 03 2013, Roccella Jonica, Italy. Also invited talks given in Universities in Spain and Italy in June 2013. Highlight: How to harness Smart Data that is actionable, from the Voluminous Big Data with Velocity and Variety-- using Semantics and the Semantic Web core to bring Human-Centric Computing in practice. Abstract from: http://www.sebd2013.unirc.it/invitedSpeakers.html Big Data has captured much interest in research and industry, with anticipation of better decisions, efficient organizations, and many new jobs. Much of the emphasis is on technology that handles volume, including storage and computational techniques to support analysis (Hadoop, NoSQL, MapReduce, etc), and the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity. However, the most important feature of data, the raison d'etre, is neither volume, variety, velocity, nor veracity -- but value. In this talk, I will emphasize the significance of Smart Data, and discuss how it is can be realized by extracting value from Big Data. To accomplish this task requires organized ways to harness and overcome the original four V-challenges; and while the technologies currently touted may provide some necessary infrastructure-- they are far from sufficient. In particular, we will need to utilize metadata, employ semantics and intelligent processing, and leverage some of the extensive work that predates Big Data. For Volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration, and discuss how this can not simply be wished away using NoSQL. Lastly, for Velocity, I will discuss somewhat more recent work on Continuous Semantics , which seeks to use dynamically created models of new objects, concepts, and relationships and uses them to better understand new cues in the data that capture rapidly evolving events and situations. Additional background at: http://knoesis.org/vision > SmartData and "Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications," http://www.knoesis.org/library/resource.php?id=1889 .

TRANSCRIPT

Page 1: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity

using semantics and Semantic Web

Keynote at SEBD2013, July 1, 2013, Italy & invited talks in Spain & Italy, June 2013.

Put Knoesis Banner

The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State, USA

Pavan Kapanipathi

Pramod Anantharam

Amit Sheth

Cory Henson

Dr. T.K. Prasad

Maryam Panahiazar

Contributions by many, but Special Thanks to:

Hemant Purohit

Page 2: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

2

2011

How much data?

48(2013)

500(2013)

http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/

Page 3: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

3

1% of the data is used for analysis.

http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explodehttp://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume

Page 4: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

4

Variety

Structured

Unstructured

Semi structured

Audio

Video

Images

Page 5: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

5

Velocity

Fast Data

Rapid Changes

Real-Time/Stream Analysis

Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail

Page 6: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

6

• Focus on verticals: advertising‚ social media‚ retail‚ financial services‚ telecom‚ and healthcare

– Aggregate data, focused on transactions, limited integration (limited complexity), analytics to find (simple) patterns

– Emphasis on technologies to handle volume/scale, and to a lesser extent velocity: Hadoop, NoSQL,MPP warehouse ….

– Full faith in the power of data (no hypothesis), bottom up analysis

Current Focus on Big Data

Page 7: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

7

• What if your data volume gets so large and varied you don't know how to deal with it?

• Do you store all your data?• Do you analyze it all?• How can you find out which data points are

really important?• How can you use it to your best competitive

advantage?

Questions typically asked on Big Data

http://www.sas.com/big-data/

Page 8: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

8http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/

Variety of Data Analytics Enablers

Page 9: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

9

• Prediction of the spread of flu in real time during H1N1 2009– Google tested a mammoth of 450 million different mathematical

models to test the search terms, comparing their predictions against the actual flu cases; 45 important parameters were founds

– Model was tested when H1N1 crisis struck in 2009 and gave more meaningful and valuable real time information than any public health official system [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]

• FareCast: predict the direction of air fares over different routes [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]

• NY city manholes problem [ICML Discussion, 2012]• Netflix and Amazon’s recommendations, which of the sales people are more effective, …

Illustrative Big Data Applications

Page 10: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

10

• Reaction and not innovation• Correlations are not causations

– “we accept the likelihood of what will happen without knowing why it will happen” [1]

• Right answers are good but often are just not sufficient to make a person take an action (see [2])

• Big data cannot replace human judgment [3] - Beyond what a data based prediction says, people also use many subjective, social and soft cues in the process of taking an action (past experience – personal and social circle, personal preferences- eg willingness to be different/ longevity vs quality of life,…)

• Sometime, better algorithms beat more data

What is missing with current “Big Data” focused solutions?

[1]: http://itsminfo.com/a-critique-of-pure-data-part-2/[2] http://www.bbc.com/capital/story/20130612-will-big-data-drown-out-genius/ [3] http://www.matchcite.com/blog/blog/2012/july/big-data-cannot-replace-human-judgment.aspx[4] http://allthingsd.com/20121128/better-algorithms-beat-more-data-and-heres-why/

Page 11: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

11

• Current focus mainly to serve business intelligence and targeted analytics needs, not to serve complex individual and collective human needs (e.g., empower human in health, fitness and well-being; better disaster coordination) that is highly personalized/individualized/contextualized– Incorporate real-world complexity: multi-modal and multi-sensory nature of

real-world and human perception– Need deeper understanding of data and its role to information (e.g., skew,

coverage)

• Human involvement and guidance (directly or indirectly): Leading to actionable information, understanding and insight right in the context of human activities– Bottom-up & Top-down processing: Infusion of models and background

knowledge (data + knowledge + reasoning)

What is missing?

Page 12: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

12

Contextual

Information Smart Data

Makes Sense

Actionable or help decision support/making

Page 13: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

13

Smart Data

Smart data makes sense out of Big data

It provides value from harnessing the challenges posed by volume, velocity,

variety and veracity of big data, in-turn providing actionable information and

improve decision making.

Smart data relies on more than big data –It uses background knowledge, experiences,

advanced and contextualized reasoning, and is often highly personalized.

Page 14: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

14

“OF human, BY human and FOR human”

Smart data is focused on the actionable value achieved by human involvement

in data creation, processing and consumption phases for improving the

human experience.Smart data is about improving human experience.

Another perspective on Smart Data

Page 15: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

15

DescriptiveExploratoryInferentialPredictive

Causal

Improved Analytics CREATION

PROCESSING

EXPERIENCE & DECISION MAKING

Human Centric Computing

Page 16: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

16

“OF human, BY human and FOR human”

Let’s discuss this perspective on Smart Data

Page 17: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

17Petabytes of Physical(sensory)-Cyber-Social Data everyday!

More on PCS Computing: http://wiki.knoesis.org/index.php/PCS

‘OF human’ : Relevant Real-time Data Streams for Human Experience

Page 18: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

18

“OF human, BY human and FOR human”

a perspective on Smart Data

Page 19: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

Use of Prior Human-created Knowledge Models

19

‘BY human’: Involving Crowd Intelligence in data processing workflows

Crowdsourcing and Domain-expert guided Machine Learning Modeling

Page 20: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

20

“OF human, BY human and FOR human”

a perspective on Smart Data

Page 21: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

21

Detection of events, such as wheezing sound, indoor temperature, humidity,

dust, and CO2 level

Weather Application

Asthma Healthcare Application

Close the window at home during day to avoid CO2 in

gush, to avoid asthma attacks at night

‘FOR human’ : Improving Human Experience

Population Level

Personal

Public Health

Action in the Physical World

Page 22: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

22

Why do we care about Smart Data rather than Big Data?

Page 23: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity

using semantics and Semantic Web

Keynote at SEBD2013, July 1, 2013, Italy & invited talks in Spain & Italy, June 2013.

Put Knoesis Banner

The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State, USA

Pavan Kapanipathi

Pramod Anantharam

Amit Sheth

Cory Henson

Dr. T.K. Prasad

Maryam Panahiazar

Contributions by many, but Special Thanks to:

Hemant Purohit

Page 24: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

24

Second-costliest hurricane in United States history estimated damage $75 billion

90-115 mph winds

State of Emergency in New York

285 people killed on the track of Sandy

750,000 without power (NY)

Immense devastation and Human suffering

Big Data to Smart Data: Disaster Management example

http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html

Page 25: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

25

20 million tweets with “sandy”, “hurricane” keywords between Oct 27th and Nov 1st

2nd most popular topic on Facebook during 2012

Social (Big) Data during Hurricane Sandy

• http://www.guardian.co.uk/news/datablog/2012/oct/31/twitter-sandy-flooding

• http://www.huffingtonpost.com/2012/11/02/twitter-hurricane-sandy_n_2066281.html

• http://mashable.com/2012/10/31/hurricane-sandy-facebook/

Page 26: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

26

For information seeking

For timely information

For unique information

For unfiltered information

To determine disaster magnitude

To check in with family and friends

To self-mobilize

To maintain a sense of community

To seek emotional support and healing

Governments

Emergency management organizations

Journalists

Disaster responders

Public

BIG DATA TO SMART DATA: WHY? and FOR WHOM?

Fraustino et al. Social Media Use during Disasters: A Review of the Knowledge Base and Gaps. US Dept. of Homeland Security, START 2012.

Page 27: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

27

Improving situational awareness - Timely delivery of necessary information to the right people

Improving coordination between resource seekers and suppliers

Detecting the magnitude of disaster by people sentiments.

Many more challenges…

Can SNS’s make Disaster Management easier – Giving Actionable Information (Smart Data)

http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html

Page 28: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

28

Volume

Twitter hits half a billion tweets a day!

Challenges

Delivering the necessary actionable/information to the right people

http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US

Page 29: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

29

Velocity

Volume

@ConEdison Twitter handle that the company had only set up in June gained an extra 16,000 followers over the storm. – Did the information reach everyone?

Challenges

Delivering the necessary/actionable information to the right people

Rate of Data ArrivalApproximately 7000 TPS10 images per second on instagram

http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf

Page 30: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

30

http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US

Velocity

Variety

Volume

Semi Structured

Structured

Unstructured

SensorsLinked Open Data

Wikipedia

Challenges

Delivering the necessary/actionable information to the right people

Page 31: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

31

Velocity

Variety

Veracity

Volume

Challenges

Delivering the necessary/actionable information to the right people

http://www.buzzfeed.com/jackstuef/the-man-behind-comfortablysmug-hurricane-sandys

Page 32: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

32

Velocity

Variety

Veracity

Volume

Page 33: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

33

Value

-Makes Sense-Actionable Information-Decision support/making

Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/

Smart Data focuses on the

value

Page 34: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

34

Value

-Makes Sense-Actionable Information-Decision support/making

Disaster Management

Victims

Timely and Contextual Information about• Electricity, Food, Water, Shelter and

donation offers related to the disaster.

Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/

Page 35: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

35

DescriptiveExploratoryInferentialPredictive

Causal

Human Centric Computing

Improved Analytics Creation

Processing

Experience

Revisiting..

Page 36: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

36

• Healthcare– kHealth– SemHeath

• Social event coordination– Twitris

• Traffic monitoring– kTraffic

Applications of Smart Data Analytics

Page 37: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

37

The Patient of the FutureMIT Technology Review, 2012

http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/

Page 38: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

38

To gain new insight in

patient care &

early indications of

disease

Smart Data in Healthcare

Page 39: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

39

Sensing is a key enabler of the Internet of Things

BUT, how do we make sense of the resulting avalanche of sensor data?

50 Billion Things by 2020 (Cisco)

Page 40: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

40

Parkinson’s disease (PD) data from The Michael J. Fox Foundation for Parkinson’s Research.

1https://www.kaggle.com/c/predicting-parkinson-s-disease-progression-with-smartphone-data

8 weeks of data from 5 sensors on a smart phone, collected for 16 patients resulting in ~12 GB (with lot of missing data).

Variety Volume

VeracityVelocity

ValueCan we detect the onset of Parkinson’s disease?Can we characterize the disease progression?Can we provide actionable information to the patient?

sem

antic

s Representing prior knowledge of PD led to a focused exploration of this massive dataset

WHY Big Data to Smart Data: Healthcare example

Page 41: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

41

Big Data to Smart Data Using a Knowledge Based Approach

ParkinsonMild(person) = Tremor(person) PoorBalance(person)∧ParkinsonModerate(person) = MoveSlow(person) PoorSleep(person) MonotoneSpeech(person)∧ ∧ParkinsonAdvanced(person) = Fall(person)

Control Group PD PatientsMovements of an active

person has a good distribution over X, Y, and

Z axis

Restricted movements bya PD patient can be seen

in the acceleration readings

Audio is well modulated with good variations in the energy of the voice

Audio is not well modulated represented a

monotone speech

Declarative Knowledge of Parkinson’s Disease used to focus

our attention on symptom manifestations in sensor

observations

Page 42: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

42

• 25 million people in the U.S. are diagnosed with asthma (7 million are children)1.

• 300 million people suffering from asthma worldwide2.

• Asthma related healthcare costs alone are around $50 billion a year2.

• 155,000 hospital admissions and 593,000 emergency department visits in 20063.

1http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/2http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html 3Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics,123(Supplement 3), S131-S145.

Asthma: Severity of the problem

Page 43: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

43

Asthma is a multifactorial disease with health signals spanning personal, public health, and population levels.

Real-time health signals from personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arriving continuously in fine grained samples potentially with missing information and uneven sampling frequencies.

Variety Volume

VeracityVelocity

Value

Can we detect the asthma severity level?Can we characterize asthma control level?What risk factors influence asthma control?What is the contribution of each risk factor?

sem

antic

s Understanding relationships betweenhealth signals and asthma attacksfor providing actionable information

WHY Big Data to Smart Data: Healthcare example

Page 44: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

44

Population Level

Personal

Public Health

Variety: Health signals span heterogeneous sourcesVolume: Health signals are fine grainedVelocity: Real-time change in situationsVeracity: Reliability of health signals may be compromised

Value: Can I reduce my asthma attacks at night?

Decision support to doctorsby providing them with

deeper insights into patientasthma care

Asthma: Demonstration of Value

Page 45: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

45

Sensordrone – for monitoring environmental air quality

Wheezometer – for monitoringwheezing sounds

Can I reduce my asthma attacks at night?

What are the triggers?What is the wheezing level?

What is the propensity toward asthma?

What is the exposure level over a day?

What is the air quality indoors?

Commute to Work

Personal

Public Health

Population Level

Closing the window at homein the morning and taking analternate route to office may

lead to reduced asthma attacks

Actionable Information

Asthma: Actionable Information for Asthma Patients

Page 46: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

46

Personal, Public Health, and Population Level Signals for Monitoring Asthma

ICS= inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA= inhaled short-acting beta2-agonist ; *consider referral to specialist

Asthma Control and Actionable Information

Sensors and their observations for understanding asthma

Page 47: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

47

Personal Level Signals

Societal Level Signals

(Personal Level Signals)

(Personalized Societal Level Signal)

(Societal Level Signals)Societal Level Signals

Relevant to the Personal Level

Personal Level Sensors

(kHealth**) (EventShop*)

Qualify QuantifyAction

Recommendation

What are the features influencing my asthma?What is the contribution of each of these features?

How controlled is my asthma? (risk score)What will be my action plan to manage asthma?

Storage

Societal Level Sensors

Asthma Early Warning Model (AEWM)

Query AEWM

Verify & augmentdomain knowledge

Recommended Action

Action Justification

Asthma Early Warning Model

*http://www.slideshare.net/jain49/eventshop-120721, ** http://www.youtube.com/watch?v=btnRi64hJp4

Page 48: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

48

Population Level

Personal

Wheeze – YesDo you have tightness of chest? –Yes

Observations Physical-Cyber-Social System Health Signal Extraction Health Signal Understanding

<Wheezing=Yes, time, location>

<ChectTightness=Yes, time, location>

<PollenLevel=Medium, time, location>

<Pollution=Yes, time, location>

<Activity=High, time, location>

Wheezing

ChectTightness

PollenLevel

Pollution

Activity

Wheezing

ChectTightness

PollenLevel

Pollution

Activity

RiskCategory

<PollenLevel, ChectTightness, Pollution,Activity, Wheezing, RiskCategory><2, 1, 1,3, 1, RiskCategory><2, 1, 1,3, 1, RiskCategory><2, 1, 1,3, 1, RiskCategory><2, 1, 1,3, 1, RiskCategory>

.

.

.

Expert Knowledge

Background Knowledge

tweet reporting pollution level and asthma attacks

Acceleration readings fromon-phone sensors

Sensor and personal observations

Signals from personal, personal spaces, and community spaces

Risk Category assigned by doctors

Qualify

Quantify

Enrich

Outdoor pollen and pollution

Public Health

Health Signal Extraction to Understanding

Well Controlled - continueNot Well Controlled – contact nursePoor Controlled – contact doctor

Page 49: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

49

… and do it efficiently and at scale

What if we could automate this sense making ability?

Page 50: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

50

People are good at making sense of sensory input

What can we learn from cognitive models of perception?• The key ingredient is prior knowledge

Page 51: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

51* based on Neisser’s cognitive model of perception

ObserveProperty

PerceiveFeature

Explanation

Discrimination

1

2

Perception Cycle*

Translating low-level signals into high-level knowledge

Focusing attention on those aspects of the environment that provide useful information

Prior Knowledge

Page 52: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

52

To enable machine perception,

Semantic Web technology is used to integrate sensor data with prior knowledge on the Web

Page 53: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

53

Prior knowledge on the Web

W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph

Page 54: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

54

Prior knowledge on the Web

W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph

Page 56: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

56

Explanation

Inference to the best explanation• In general, explanation is an abductive problem; and

hard to compute

Finding the sweet spot between abduction and OWL• Single-feature assumption* enables use of OWL-DL

deductive reasoner

* An explanation must be a single feature which accounts forall observed properties

Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building

Page 57: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

57

Explanation

Explanatory Feature: a feature that explains the set of observed properties

ExplanatoryFeature ≡ ssn:isPropertyOf∃ —.{p1} … ssn:isPropertyOf⊓ ⊓ ∃ —.{pn}

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Observed Property Explanatory Feature

Page 58: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

58

Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features

ObserveProperty

PerceiveFeature

Explanation

Discrimination2

Focusing attention on those aspects of the environment that provide useful information

Discrimination

Page 59: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

59

Discrimination

Expected Property: would be explained by every explanatory feature

ExpectedProperty ≡ ssn:isPropertyOf.{f∃ 1} … ssn:isPropertyOf.{f⊓ ⊓ ∃ n}

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Expected Property Explanatory Feature

Page 60: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

60

Discrimination

Not Applicable Property: would not be explained by any explanatory feature

NotApplicableProperty ≡ ¬ ssn:isPropertyOf.{f∃ 1} … ¬ ssn:isPropertyOf.{f⊓ ⊓ ∃ n}

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Not Applicable Property Explanatory Feature

Page 61: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

61

Discrimination

Discriminating Property: is neither expected nor not-applicable

DiscriminatingProperty ≡ ¬ExpectedProperty ¬NotApplicableProperty⊓

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Discriminating Property Explanatory Feature

Page 62: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

62

Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information

canary in a coal mine

Our Motivation

kHealth: knowledge-enabled healthcare

Page 63: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

63

Qualities-High BP-Increased Weight

Entities-Hypertension-Hypothyroidism

kHealth

Machine Sensors

Personal Input

EMR/PHR

Comorbidity risk score e.g., Charlson Index

Longitudinal studies of cardiovascular risks

- Find correlations- Validation - domain knowledge - domain expert

Parameterize the model

Risk Assessment Model

Current Observations-Physical-Physiological-History

Risk Score(Actionable Information)

Model CreationValidate correlations

Historical observations of each patient

Risk Score: from Data to Abstraction and Actionable Information

Page 64: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

64

How do we implement machine perception efficiently on aresource-constrained device?

Use of OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time

• Runs out of resources with prior knowledge >> 15 nodes• Asymptotic complexity: O(n3)

Page 65: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

65

intelligence at the edge

Approach 1: Send all sensor observations to the cloud for processing

Approach 2: downscale semantic processing so that each device is capable of machine perception

Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices, ISWC 2012.

Page 66: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

66

Efficient execution of machine perception

Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning

0101100011010011110010101100011011011010110001101001111001010110001101011000110100111

Page 67: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

67

O(n3) < x < O(n4) O(n)

Efficiency Improvement

• Problem size increased from 10’s to 1000’s of nodes• Time reduced from minutes to milliseconds• Complexity growth reduced from polynomial to

linear

Evaluation on a mobile device

Page 68: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

68

2 Prior knowledge is the key to perceptionUsing SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web

3 Intelligence at the edgeBy downscaling semantic inference, machine perception can

execute efficiently on resource-constrained devices

Semantic Perception for smarter analytics: 3 ideas to takeaway

1 Translate low-level data to high-level knowledgeMachine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making

Page 69: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

69

• Real Time Feature Streams: http://www.youtube.com/watch?v=_ews4w_eCpg

• kHealth: http://www.youtube.com/watch?v=btnRi64hJp4

Demos

Page 70: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

74

Smart Data in Social Media Analytics

To Understand the human social dynamics in real world events

Page 71: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

75

0.5B Tweets per day

0.5B Users

60% on Mobile

5530 Tweets per second related to the Japan earthquake and tsunami

17000 Tweets per second

Twitter During Real-world Events of Interest

http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitterhttp://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitterhttp://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/

Page 72: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

76

Social Semantic

Web Application

Real time

Multi Faceted

Analysis

Insights of Important Events including disaster response

coordination

http://usatoday30.usatoday.com/news/politics/twitter-election-meter

http://twitris.knoesis.org/

Page 73: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

77

State of the Art – Uni/Bi Dimensional Analysis During Elections

Topics

Sentiments

Page 74: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

78

Twitris’ Dimensions of Integrated Semantic Analysis

Sheth et al. Twitris- a System for Collective Social Intelligence, ESNAM-2013

Page 75: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

79http://semanticweb.com/picking-the-president-twindex-twitris-track-social-media-electorate_b31249http://semanticweb.com/election-2012-the-semantic-recap_b33278

Page 76: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

80

[The screenshots of Twitris+ were taken on Nov. 6th 6 PM EST]

/t

Page 77: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

81

Twitris: Sentiment Analysis- Smart Answers with reasoning!

How was Obama doing in the first debate?

Page 78: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

82

Red Color: Negative TopicsGreen Color: Positive Topics

Twitris: Sentiment Analysis- Smart Answers with reasoning!

How was Obama doing in the second debate?

SMART DATA IS ABOUT ANALYSIS FOR REASONING (what caused the positive sentiment for Democrats) BEHIND THE REAL-WORLD ACTIONS (Democrats’ win)

http://knoesis.wright.edu/library/resource.php?id=1787

Page 79: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

83

Top 100 influential users that talks about Barack Obama

Positive or Negative Influence

Twitris: Network Analysis

SMART DATA TELLS YOU HOW CAN A SYSTEM BE TWEAKED FOR THE DESIRED ACTIONS!

Could we engage with users (targeted) with extreme polarity leaning for Obama to spark an agenda in the whole

network of voters (ACTION)?

Page 80: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

84

Twitris: Community Evolution

SMART DATA FOCUSES ON THE CAUSALITY OF CHANGES IN REAL-WORLD ACTIONS!

Romney

Obama

Evolution of influencer interaction networks for Romney vs. Obama topical communities, during U.S. Presidential Election 2012 debates

Before 1st debate

After 1st debate

After Hurricane Sandy

After 3rd debate

Page 81: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

85

The Dead People mentioned in the event OWC

Twitris: Impact of Background Knowledge

Page 82: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

86

How People from Different parts of the world talked

about US Election

Images and Videos Related to US Election

Twitris: Analysis by Location

Page 83: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

87

What is Smart Data in the context of Disaster Management

ACTIONABLE: Timely delivery of right resources and information to the right people at right location!

Because everyone wants to Help, but DON’T KNOW HOW!

Page 84: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

Join us for the Social Good!

http://twitris.knoesis.org

RT @OpOKRelief: Southgate Baptist Church

on 4th Street in Moore has food, water, clothes, diapers, toys, and more. If you can't go,call 794

Text \"FOOD\" to 32333, REDCROSS to 90999, or STORM to 80888 to donate $10

in storm relief. #moore #oklahoma

#disasterrelief #donate

Want to help animals in #Oklahoma? @ASPCA tells

how you can help: http://t.co/mt8l9PwzmO

CITIZEN SENSORS

RESPONSE TEAMS (including humanitarian

org. and ‘pseudo’ responders)

VICTIM SITE

Coordination of needs and offers

Using Social MediaDoes anyone

know where to send a check to donate to the

tornado victims?

Where do I go to help out for

volunteer work around Moore? Anyone know?

Anyone know where to donate

to help the animals from the

Oklahoma disaster?

#oklahoma #dogs

Matched

Matched

Matched

Serving the need!

If you would like to volunteer today, help is desperately

needed in Shawnee. Call 273-5331 for more info

http://www.slideshare.net/hemant_knoesis/cscw-2012-hemantpurohit-1153161288Purohit et al. Framework to Analyze Coordination in Crisis Response, 2012. Int’l Collaboration

in-progress: with QCRI

Page 85: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

89

Smart Data from Twitris system for Disaster Response Coordination

Which are the primary locations with most negative sentiments/emotions?

Who are all the people to engage with for better information

diffusion?Which are the most important organizations acting at my

location?

Smart data provides actionable information and improve decision making through semantic analysis of Big Data.

Who are the resource seekers and suppliers? How can one donate?

Page 87: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

91

Disaster Response Coordination:Twitris Summary for Actionable Nuggets

Important tags to summarize Big Data flow

Related to Oklahoma tornado

Images and Videos Related to Oklahoma tornado

Page 88: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

92

Disaster Response Coordination:Twitris Real-time information for needs

Incoming Tweets with need types to give quick idea of what is needed and where

currently #OKC

Legends for Different needs #OKC

(It is real-time widget for monitoring of needs, so will not be active after the event has passed) http://twitris.knoesis.org/oklahomatornado

Page 89: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

93

Disaster Response Coordination:Influencers to engage with for specific needs

Influential users are respective needs and their interaction

network on the right.

Page 90: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

94

Really sparse Signal to Noise:• 2M tweets during the first week after #Oklahoma-tornado-2013

- 1.3% as the highly precise donation requests to help - 0.02% as the highly precise donation offers to help

• Anyone know how to get involved to help the tornado victims in Oklahoma??\#tornado #oklahomacity (OFFER)

• I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER)

Disaster Response Coordination:Finding Actionable Nuggets for Responders to act

• Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST)

• Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST)

For responders, most important information is the scarcity and availability of resources, can we mine it via Social Media?

Page 91: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

95

• Features driven by the experience of domain experts at the responder organizations

• Examples,– ‘I want to <donate/ help/ bring>’ for extraction of offering

intention– ‘tent house’ OR ‘cots’ for shelter need types

Disaster Response Coordination:Human Knowledge to drive information extraction

Page 92: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

96

• A knowledge-driven approach– A rich inventory of metadata for tweets– Semantic matching for needs (query) vs. offers (documents)

• Example, – @bladesofmilford please help get the word out,we are accepting kid clothes to send

to the lil angels in Oklahoma.Drop off @MilfordGreenPiz (REQUEST)– I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER)

Disaster Response Coordination:Automatic Matching of needs and offers

Matching the competitive intentions

(Needs and Offers) can offload humans for the

task of resource matchmaking for

coordination.

Page 93: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

97

Disaster Response Coordination:Engagement Interface for responders

What-Where-How-Who-Why Coordination

Influential users to engage with and resources for

seekers/supplies at a location, at a timestamp

Contextual Information for a

chosen topical tags

Page 94: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

98

• Illustrious scenario: #Oklahoma-tornado 2013

Disaster Response Coordination:Anecdote for the value of Smart Data

FEMA asked us to quickly filter out gas-leak related data

Mining the data for smart nuggets to inform FEMA (Timely needs)

Engaged with the author of this information to confirm (Veracity)

e.g., All gas leaks in #moore were capped and stopped by 11:30 last night (at 5/22/2013 1:41:37)

Lot of tweets for ‘how to/where to’ assist (‘pseudo’ responders)e.g., I want to go to Oklahoma this weekend & do what i can to help those people with food,cloths & supplies,im in the feel of wanting to help ! :)

Page 95: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

99

An event is a dynamic topic that evolves and might later fork into several distinct events.

Smart Data analytics to capture rapidly evolving social data events

Social Media is the pulse of the populace, a true reflection of

events all over the globe!

Page 96: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

100

Continuous Semantics

Page 97: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

101

Heliopolis is a suburb of

Cairo.

Dynamic Model Creation

Continuous Semantics

Page 98: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

102

Even

ts

“Both Ahmadinejad & Mousavi declare victory in Iranian Elections.”

“situation in tehran University is so worrisome. police have attacked to girls dormitory #tehran #iranelection”

“Reports from Azadi Square - 4 people killed by police, people killed police who shot. More shots being fired #iranelections”June 12 2009 June 13 2009 June 15 2009

Key p

hra

ses

Mod

els

Ahmadinejad & Mousavi are politicians in

Iran

Tehran University is a University in

Iran

Azadi Square is a city square in

Tehran

Dynamic Model Creation:

Example of how background knowledge help understand situation described in the tweets,

while also updating knowledge model also

Page 99: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

103

How is Continuous Semantics a form of Smart Data Analytics?

Keeping the Background Knowledge abreast with the changes of the event

Smartly learning and adapting data acquisition (Temporally apt Big Data, i.e. Fast Data)

In-turn providing temporally relevant Smart Data through analysis

Page 100: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

104

Smart Data Analytics in Traffic Management

To improve the everyday life entangled due to our most common problem of sticking in traffic

Page 101: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

105

By 2001 over 285 million Indians lived in cities, more than in all North American cities combined (Office of the Registrar General of India 2001)1

1The Crisis of Public Transport in India2IBM Smarter Traffic

Modes of transportation in Indian Cities

Texas Transportation Institute (TTI) Congestion report in U.S.

Severity of the Traffic Problem

Page 102: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

106

Vehicular traffic data from San Francisco Bay Area aggregated from on-road sensors (numerical) and incident reports (textual)

http://511.org/

Every minute update of speed, volume, travel time, and occupancy resulting in 178 million link status observations, 738 active events, and 146 scheduled events with many unevenly sampled observations collected over 3 months.

Variety Volume

VeracityVelocity

ValueCan we detect the onset of traffic congestion?Can we characterize traffic congestion based on events?Can we provide actionable information to decision makers?

sem

antic

s Representing prior knowledge of traffic lead to a focused exploration of this massive dataset

Big Data to Smart Data: Traffic Management example

Page 103: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

107

Slow moving traffic

Link Description

Scheduled Event

Scheduled Event

511.org

511.org

Schedule Information

511.org

Traffic Monitoring

Heterogeneity in a Physical-Cyber-Social System

Page 104: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

108

Heterogeneity in a Physical-Cyber-Social System

Page 105: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

109

• Observation: Slow Moving Traffic• Multiple Causes (Uncertain about the cause):

– Scheduled Events: music events, fair, theatre events, concerts, road work, repairs, etc.

– Active Events: accidents, disabled vehicles, break down of roads/bridges, fire, bad weather, etc.

– Peak hour: e.g. 7 am – 9 am OR 4 pm – 6 pm

• Each of these events may have a varying impact on traffic.• A delay prediction algorithm should process multimodal and

multi-sensory observations.

Uncertainty in a Physical-Cyber-Social System

Page 106: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

110

• Internal observations– Speed, volume, and travel time observations– Correlations may exist between these variables

across different parts of the network• External events

– Accident, music event, sporting event, and planned events

– External events and internal observations may exhibit correlations

Modeling Traffic Events

Page 107: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

111

Accident

Music event

Sporting eventRoad Work

Theatre event

External events<ActiveEvents, ScheduledEvents>

Internal observations<speed, volume, traveTime>

Weather

Time of Day

Modeling Traffic Events

Page 108: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

113

Domain Experts

cold

PoorVisibility

SlowTraffic

IcyRoad

Declarative domain knowledge

Causal knowledge

Linked Open Data

Cold (YES/NO) IcyRoad (ON/OFF) PoorVisibility (YES/NO) SlowTraffic (YES/NO)

1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0

Domain Observations

Domain Knowledge

Structure and parameters

Complementing Probabilistic Models with Declarative Knowledge

Correlations to causations using Declarative knowledge on the Semantic Web

Page 109: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

114

• Declarative knowledge about various domains are increasingly being published on the web1,2.

• Declarative knowledge describes concepts and relationships in a domain (structure).

• Linked Open Data may be used to derive priors probability of events (parameters).

• Explored the use declarative knowledge for structure using ConceptNet 5.

1http://conceptnet5.media.mit.edu/2http://linkeddata.org/

Domain Knowledge

Page 110: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

115http://conceptnet5.media.mit.edu/web/c/en/traffic_jam

Delay

go to baseball game

traffic jam

traffic accident

traffic jam

ActiveEvent

ScheduledEvent

Causestraffic jam

Causestraffic jam

CapableOfslow traffic

CapableOfoccur twice each day

Causes

is_a

bad weatherCapableOf

slow traffic

road iceCauses

accident

TimeOfDay

go to concertHasSubevent

car crash

accidentRelatedTo

car crash

BadWeather

Causes

Causes

is_ais_a

is_a is_a is_a

is_a

is_a

ConceptNet 5

Page 111: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

116

Traffic jam

Link Description

Scheduled Event

traffic jambaseball game

Add missing random variables

Time of day

bad weather CapableOf slow traffic

bad weather

Traffic data from sensors deployed on road network in San Francisco Bay Area

time of day

traffic jambaseball gametime of day

slow traffic

Three Operations: Complementing graphical model structure extraction

Add missing links bad weather

traffic jambaseball gametime of day

slow traffic

Add link directionbad weather

traffic jambaseball gametime of day

slow traffic

go to baseball game Causes traffic jam

Knowledge from ConceptNet5

traffic jam CapableOfoccur twice each daytraffic jam CapableOf slow traffic

Page 112: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

117

Scheduled Event

Active Event

Day of week Time of day

delay

Travel time

speed

volume

Structure extracted formtraffic observations (sensors + textual) using statistical techniques

Scheduled Event

Active Event

Day of week

Time of day

delayTravel time

speed

volume

Bad Weather

Enriched structure which has link directions and new nodes such as “Bad Weather” potentially leading to better delay predictions

Enriched Probabilistic Models using ConceptNet 5

Page 113: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

119

Take Away

• It is all about the human – not computing, not device– Computing for human experience

• Whatever we do in Smart Data, focus on human-in-the-loop (empowering machine computing!):– Of Human, By Human, For Human

– But in serving human needs, there is a lot more than what current big data analytics handle – variety, contextual, personalized, subjective, spanning data and knowledge across P-C-S dimensions

Page 114: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

120

Acknowledgements

• Kno.e.sis team• Funds: NSF, NIH, AFRL, Industry…

• Note:• For images and sources, if not on slides, please see slide notes• Some images were taken from the Web Search results and all such images belong

to their respective owners, we are grateful to the owners for usefulness of these images in our context.

Page 115: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

121

• OpenSource: http://knoesis.org/opensource• Showcase: http://knoesis.org/showcase • Vision: http://knoesis.org/node/266 • Publications: http://knoesis.org/library • Smart Data Vision:

http://wiki.knoesis.org/index.php/Smart_Data

Paper: Krishnaprasad Thirunarayan and Amit Sheth, Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

References and Further Readings

Page 116: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

122

Thanks …

Page 118: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

124

thank you, and please visit us at

http://knoesis.org

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled ComputingWright State University, Dayton, Ohio, USA

Smart Data