Industry Perspective: Big Data and Big Data Analytics David Barnes Program Director Emerging Internet Technologies IBM Software Group


Post on 09-Feb-2020


Industry Perspective: Big Data and Big Data Analytics

David Barnes
Program Director, Emerging Internet Technologies
IBM Software Group


What is Big Data?


The Adjacent Possible


Inexpensive disk + Increased processing power + Data Warehouse + The Web + X = Big Data

X = Sensors used to gather climate information, posts to social media sites, digital pictures and videos, transaction records, cell phone GPS signals, and more.


© 2010 IBM Corporation

161 exabytes of data were created in 2006, 3 million times the amount of information contained in all the books ever written.

In 2010 the number reached 988 exabytes.

IDC estimates that 1.8 zettabytes were created and replicated in 2011.
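The 2006 and 2010 figures above imply a steady yearly growth rate, which can be worked out with the usual compound-growth formula. The sketch below is only an illustration: the function name is hypothetical, and it assumes growth was constant across the four years, which the slide does not claim.

```python
def implied_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start_value into end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted on the slide, in exabytes.
created_2006_eb = 161
created_2010_eb = 988

rate = implied_annual_growth(created_2006_eb, created_2010_eb, years=2010 - 2006)
print(f"Implied growth 2006-2010: {rate:.1%} per year")
```

Under that constant-growth assumption the result comes out a little under 60% per year.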


Every day, people create the equivalent of 2.5 quintillion bytes of data from sensors, mobile devices, online transactions, and social networks.

Every month people send one billion Tweets and post 30 billion messages on Facebook.

90% (or more) of the world’s data is unstructured.


The true nature of information


Unstructured Data

• Is noisy

• Is often dirty

• Is often full of valuable information

The Big Data Imperative

Big Data has swept into every industry and business function.

Businesses need to put the power of Big Data analytics in the hands of their business employees; the "Data Scientist" label is somewhat misleading.

“Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers.” – McKinsey Global Institute

Big Data Business Patterns

Computational Journalism

Chief Legal Officer

Retail Business Planner

IT Systems Management

Pharma - Clinical Trials

Business Fraud Detection

Evidence Based Medicine

Web Archiving

. . .


Today’s Problem

Data is growing at a compound annual rate of 60% per year

Storage capacity continues to increase dramatically

Storage access speeds have not kept up

At a transfer speed of 500 MB/sec, 1 terabyte of data takes ~30 minutes to read from a single drive
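The single-drive figure above is plain division of data size by transfer rate. A minimal sketch of that arithmetic follows, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a sustained transfer rate with no seek or protocol overhead; the function and constant names are illustrative only.

```python
TERABYTE = 10**12  # decimal terabyte, in bytes (assumed unit convention)
MEGABYTE = 10**6   # decimal megabyte, in bytes


def read_time_seconds(data_bytes: float, bytes_per_second: float) -> float:
    """Time to stream data_bytes sequentially at a constant transfer rate."""
    if bytes_per_second <= 0:
        raise ValueError("transfer rate must be positive")
    return data_bytes / bytes_per_second


seconds = read_time_seconds(1 * TERABYTE, 500 * MEGABYTE)
print(f"{seconds:.0f} s, about {seconds / 60:.0f} minutes")
```

With these assumptions the read takes 2,000 seconds, which the slide rounds to roughly half an hour.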

Enter Map/Reduce

• Automates the mechanisms of large-scale distributed computation (i.e. work distribution, load balancing, replication, failure/recovery)

• Divide & Conquer: 1 terabyte split among 100 drives requires ~20 seconds to read

• The M/R parallel processing model provides a cost-effective framework for a new generation of analytic applications on unstructured or semi-structured data
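The divide-and-conquer figure and the shape of the Map/Reduce model can be illustrated with a toy, single-process sketch. It is not Hadoop or any IBM product: the three phases run sequentially in one Python process, whereas a real framework would spread the map calls across machines and handle replication and failure recovery. All function names and the sample chunks are hypothetical, and the read-time arithmetic assumes an ideal even split with no coordination overhead.

```python
from collections import defaultdict
from itertools import chain

# Ideal parallel read: 1 TB (10**12 bytes) split evenly across 100 drives,
# each streaming at 500 MB/sec (decimal units assumed).
per_drive_bytes = 10**12 / 100
parallel_seconds = per_drive_bytes / (500 * 10**6)


def map_phase(chunk: str):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for the slice of data held on one drive.
chunks = ["can I get food", "can I get gas", "are the bridges down"]
mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
counts = reduce_phase(shuffle(mapped))

print(f"Parallel read time: {parallel_seconds:.0f} s")
print(counts)
```

Because each map call only sees its own chunk, the chunks can be processed wherever the data already lives; only the much smaller (key, value) pairs need to move during the shuffle.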


Requirement: A New Class of Big Data Applications

Big Data analytics must be brought to the line-of-business user.

•Leverage easy-to-use manipulation metaphors

•Use natural language technologies for analytics

•Provide rich visualizations to quickly identify insights


Demo: Buyer Sentiment Analysis


© 2010 IBM Corporation SlideSharenomics - Rise of Social Economy

Social Media: Chilean Earthquake 2010

The 2010 Chilean earthquake was the fifth largest earthquake in recorded history

The affected areas suffered major devastation - buildings, airports, hospitals, prisons, bridges, and roads were severely damaged

Land-based communications systems suffered major outages

The wireless 3G infrastructure remained intact and operational


Social networking over wireless networks became the major form of communication

Extreme Blue students collected 226 million Tweets, then analyzed and categorized them by incident type and location

Tweets included questions such as "Can I get food?", "Can I get gas?" and "Are the bridges down?", as well as images

The results were visualized

Completed in ~12 weeks


Big Data = Volume, Variety and Velocity

• Volume - Scale from terabytes to zettabytes

• Variety - Relational and non-relational data types from an ever-expanding variety of sources

• Velocity - Streaming data and large volume data movement



The supercomputer is based on over 1,200 high-powered IBM System x servers and can perform 150 trillion calculations per second, equivalent to 30 million calculations per Danish citizen per second.

Vestas expects its data sets will grow to 20-plus petabytes over the next four years.



© 2011 IBM Corporation

Seton Healthcare Family
Reducing CHF readmission to improve care

Business Challenge
Seton Healthcare strives to reduce the occurrence of high cost Congestive Heart Failure (CHF) readmissions by proactively identifying patients likely to be readmitted on an emergent basis.

What’s Smart?
IBM Content and Predictive Analytics for Healthcare solution will help to better target and understand high-risk CHF patients for care management programs by:

• Utilizing natural language processing to extract key elements from unstructured History and Physical, Discharge Summaries, Echocardiogram Reports, and Consult Notes

• Leveraging predictive models that have demonstrated high positive predictive value against extracted elements of structured and unstructured data

• Providing an interface through which providers can intuitively navigate, interpret and take action

Smarter Business Outcomes

• Seton will be able to proactively target care management and reduce re-admission of CHF patients.

• Teaming unstructured content with predictive analytics, Seton will be able to identify patients likely for re-admission and introduce early interventions to reduce cost, mortality

IBM solution

• IBM Content and Predictive Analytics for Healthcare

• IBM Cognos Business Intelligence

• IBM BAO solution services

“IBM Content and Predictive Analytics for Healthcare uses the same type of natural language processing as IBM Watson, enabling us to leverage information in new ways not possible before. We can access an integrated view of relevant clinical and operational information to drive more informed decision making and optimize patient and operational outcomes.”
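The slide refers to predictive models with "high positive predictive value". As a reminder of what that metric measures, here is a minimal sketch: of the patients a model flags as likely readmissions, the fraction who actually are readmitted. The counts are invented purely for illustration and are not Seton results, and the function name is hypothetical.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = flagged patients who were readmitted / all flagged patients."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no patients were flagged, PPV is undefined")
    return true_positives / flagged


# Hypothetical counts, NOT taken from the Seton study.
flagged_and_readmitted = 45
flagged_not_readmitted = 5

ppv = positive_predictive_value(flagged_and_readmitted, flagged_not_readmitted)
print(f"PPV: {ppv:.0%}")
```

A high PPV matters for care management because each flagged patient triggers an intervention; false positives spend that effort on patients who would not have been readmitted.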

IBM Content and Predictive Analytics for Healthcare
The Seton CHF Readmission Solution

Raw Information

• Unstructured Data (Cerner Clinical Documentation: History and Physical, Discharge Summary, Echocardiogram)

• Structured Data (Avega Cost Data, DSS Admission History, DSS Procedure History, Cerner Clinical Events)

IBM Content and Predictive Analytics

• Content Analytics: Natural Language Processing; Medical Fact and Relationship Extraction (Annotation); Trend, Pattern, Anomaly, Deviation Analysis

• Predictive Analytics: Predictive Scoring and Probability Analysis

Analyzed and Visualized Information

Dynamic Multimode Interaction

• Search and Visually Explore (Mine)

• Monitor, Dashboard and Report (Cognos BI)

• Question and Answer*

• Custom Solutions

Health Integration Framework

• Data Warehouse and Model

• Master Data Management

• Advanced Case Management

• Business Analytics

• Partners (HLI)

• Specialized Research

IBM Watson for Healthcare

* Confirm hypotheses or seek alternative ideas with confidence based responses from learned knowledge

• Utilizing natural language processing to extract key elements from unstructured History and Physical and Discharge Summary

• Leveraging predictive models that have demonstrated high positive predictive value against extracted elements of structured and unstructured data

• Providing an interface through which providers can intuitively navigate, interpret and take action
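To give a feel for what "extracting key elements from unstructured notes" means, the sketch below pulls a left ventricular ejection fraction (LVEF) percentage out of free text with a regular expression. This is a deliberately naive stand-in and not how IBM Content Analytics works: real clinical NLP handles negation, ranges, abbreviations and context. The pattern, function name and sample notes are all assumptions made for illustration.

```python
import re
from typing import Optional

# Naive pattern: a cue term, a few non-digit characters, then a 1-2 digit percentage.
LVEF_PATTERN = re.compile(
    r"(?:LVEF|EF|ejection fraction)\D{0,20}?(\d{1,2})\s*%",
    re.IGNORECASE,
)


def extract_lvef(note: str) -> Optional[int]:
    """Return the first ejection-fraction percentage found in the note, if any."""
    match = LVEF_PATTERN.search(note)
    return int(match.group(1)) if match else None


# Invented sample notes, not real patient data.
notes = [
    "Echocardiogram shows ejection fraction of 35% with mild dilation.",
    "LVEF: 55 %",
    "Patient lives alone, denies tobacco use.",
]

for note in notes:
    print(extract_lvef(note), "<-", note)
```

Even this crude approach hints at why a value can be absent from coded fields yet recoverable from the narrative text of a report.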

What Really Causes Readmissions at Seton
Key Findings

The Data We Thought Would Be Useful … Wasn’t

• 113 candidate predictors from structured and unstructured data sources

• Structured data was less reliable than unstructured data, which increased the reliance on unstructured data

New Unexpected Indicators Emerged … Highly Predictive Model

• 18 accurate indicators or predictors (see next slide)

Predictor Analysis          % Encounters        % Encounters
                            Structured Data     Unstructured Data
Ejection Fraction (LVEF)    2%                  74%
Smoking Indicator           35% (65% Accurate)  81% (95% Accurate)
Living Arrangements         <1%                 73% (100% Accurate)
Drug and Alcohol Abuse      16%                 81%
Assisted Living             0%                  13%

97% at 80th percentile

49% at 20th percentile

Visualizing the Results: Readmissions Dashboard

The Cognos dashboard reporting system helps monitor the key clinical, operational and financial metrics. More importantly, it makes it possible to track down the top priority cases for case management.

1. Clinical Statistics: admission count, readmission count and readmission rate

2. Operational Statistic: counts of different length of stay periods

3. Financial Statistic: total direct cost by total admission and by readmission

4. Mortality: mortality rate

5. Average length of stay

6. Average direct cost by total admission and by readmission only

7. PA Model Score: distribution of propensity of readmission
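Several of the dashboard statistics are simple aggregates over encounter records. The sketch below shows how a few of them could be computed. It is not the Cognos implementation: the `Encounter` fields, the function name and the four sample records are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Encounter:
    """One hospital stay (hypothetical schema for illustration)."""
    length_of_stay_days: int
    direct_cost: float
    is_readmission: bool
    died: bool


def dashboard_metrics(encounters: List[Encounter]) -> Dict[str, float]:
    """Compute a few of the dashboard statistics from raw encounters."""
    total = len(encounters)
    if total == 0:
        raise ValueError("at least one encounter is required")
    readmission_count = sum(e.is_readmission for e in encounters)
    return {
        "admission_count": total,
        "readmission_count": readmission_count,
        "readmission_rate": readmission_count / total,
        "mortality_rate": sum(e.died for e in encounters) / total,
        "average_length_of_stay": sum(e.length_of_stay_days for e in encounters) / total,
        "average_direct_cost": sum(e.direct_cost for e in encounters) / total,
    }


# Invented sample data, not Seton figures.
encounters = [
    Encounter(4, 8000.0, False, False),
    Encounter(6, 12000.0, True, False),
    Encounter(3, 6000.0, False, False),
    Encounter(7, 14000.0, True, True),
]

metrics = dashboard_metrics(encounters)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

The model score distribution on the dashboard would come from a separate scoring step and is left out of this sketch.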


USC Annenberg School of Communications


InfoSphere Streams


Big Data Platform Vision
Bringing Big Data to the Enterprise

Big Data Solutions

• Client and Partner Solutions

Big Data User Environments

• Developers

• End Users

• Administrators

Big Data Enterprise Engines

• Internet Scale Analytics

• Streaming Analytics

Open Source Foundational Components

• Hadoop, MapReduce, HDFS, HBase, Pig, Lucene, Jaql

Agents

Integration

• Marketing: Unica

• Warehouse Appliances: Netezza

• Data Warehouse: InfoSphere Warehouse

• Database: DB2

• Analytics: SPSS

• Business Intelligence: Cognos

• Master Data Mgmt: InfoSphere MDM