Future of Data Hortonworks Data Platform and Hortonworks Data Flow Eric Thorsen, VP Industry Solutions

Upload: dinhquynh

Post on 04-Jun-2018

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Explosion of Data

Consumer Behavior

• Millennials now outnumber Baby Boomers as the dominant transactional generation and will constitute 50% of the workforce in the next few years and 75% by 2030

• 2.5 Billion Connected People on Social networks by 2020, 75 Billion Connected Devices by 2020

Big Data Trends

• The number of U.S. firms using big data has jumped 58 percentage points to 63% penetration

• 70% of firms now say that big data is of critical importance to them, up from only 21% in 2012. One of the fastest tech-adoption rates ever.

• The title of chief data officer (the C-suite manager of big data), which until recently didn’t even exist, is now found in 54% of companies surveyed.

Data Exploding with unprecedented data types

• Sensors, iBeacons, Weighted Shelves, Smart Hangers, Smart Bins, Smart Racks

• Social Media, Tweets, Mentions, Likes, Blogs

• Clickstream, Web logs, video feeds

• Server activity

“80% of the world’s data has been created in the last two years.” Ginni Rometty, IBM CEO – January 2014

Big Data Executive Survey 2016 – NewVantage Partners

DATA – More Volume and More Types

INCREASING DATA VARIETY AND COMPLEXITY

USER GENERATED CONTENT

MOBILE WEB

SMS/MMS

SENTIMENT

EXTERNAL DEMOGRAPHICS

HD VIDEO

SPEECH TO TEXT

PRODUCT/SERVICE LOGS

SOCIAL NETWORK

BUSINESS DATA FEEDS

USER CLICK STREAM

WEB LOGS

OFFER HISTORY

DYNAMIC PRICING

A/B TESTING

AFFILIATE NETWORKS

SEARCH MARKETING

BEHAVIORAL TARGETING

DYNAMIC FUNNELS

PAYMENT RECORD

SUPPORT CONTACTS

CUSTOMER TOUCHES

PURCHASE DETAIL

PURCHASE RECORD

SEGMENTATION

OFFER DETAILS

Data volume scale: GIGABYTES, TERABYTES, PETABYTES, EXABYTES

Data sources: ERP, CRM, WEB, BIG DATA

Traditional systems under pressure

Challenges

• Constrains data to app

• Can’t manage new data

• Costly to Scale

Business Value

Clickstream

Geolocation

Web Data

Internet of Things

Docs, emails

Server logs

Data volume by year: 2.8 Zettabytes (2012), 4.1 Zettabytes (2014), 40 Zettabytes (2020)

New data versus traditional data (ERP, CRM, SCM)

*Multiples of bytes: Kilobyte, Megabyte, Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte, Yottabyte (one zettabyte is 1,000,000,000,000,000,000,000 bytes)

Much of the new data exists in-flight between systems and devices as part of the Internet of Anything
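The three totals on this slide (2.8 zettabytes in 2012, 4.1 in 2014, 40 in 2020) imply a yearly growth rate that the slide does not spell out. The sketch below works it out as a compound annual growth rate. The CAGR framing and the function names are assumptions added for illustration; only the three totals come from the slide.

```python
# Implied compound annual growth rate (CAGR) between the data-volume
# totals quoted on the slide. Treating the growth as a constant yearly
# rate is an assumption made here; the slide only states the totals.

def cagr(start_value: float, end_value: float, years: int) -> float:
    """Constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1

# Zettabytes by year, as quoted on the slide.
volumes_zb = {2012: 2.8, 2014: 4.1, 2020: 40.0}

def growth_table(volumes: dict) -> dict:
    """CAGR for each consecutive pair of years, plus first year to last year."""
    years = sorted(volumes)
    pairs = list(zip(years, years[1:])) + [(years[0], years[-1])]
    return {(a, b): cagr(volumes[a], volumes[b], b - a) for a, b in pairs}

if __name__ == "__main__":
    for (a, b), rate in growth_table(volumes_zb).items():
        print(f"{a} -> {b}: {rate:.1%} per year")
```

Under that assumption the 2012 to 2020 totals correspond to growth of a little under 40% per year, with the 2014 to 2020 stretch growing faster than 2012 to 2014.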

DATA AT REST

DATA IN MOTION

ACTIONABLE INTELLIGENCE

Modern Data Applications

PERISHABLE INSIGHTS

HISTORICAL INSIGHTS

INTERNET OF ANYTHING

Hortonworks DataFlow

Hortonworks Data Platform

Hortonworks Delivers Connected Data Platforms

Hortonworks Connected Data Platforms and Solutions

Hortonworks Solutions

Enterprise Data Warehouse Optimization

Cyber Security and Threat Management

Internet of Things and Streaming Analytics

Hortonworks Connection

Subscription Support

SmartSense

Premier Support

Educational Services

Professional Services

Community Connection

Cloud

Hortonworks Data Cloud

AWS

HDInsight

Data Center

Hortonworks Data Suite

HDF

HDP

Holistic Customer Interaction Model

HDP and HDF Subscription

Operational Services

Applications

Support / “Break Fix”

Professional Services and Partner SIs

Configure, Manage and Upgrade

Components Included

Customer Proposal Components

Hortonworks Influences the Apache Community

We Employ the Committers

One third of all committers to the Apache® Hadoop™ project, and a majority in Apache NiFi and other important projects

Our Committers Innovate

and improve Connected Data Platforms

We Influence the Hadoop Roadmap

by communicating important requirements to the community through our leaders

APACHE HADOOP COMMITTERS

Social Mapping

Payment Tracking

Factory Yields

Defect Detection

Call Analysis

Machine Data

Product Design

M & A

Due Diligence

Next Product Recs

Cyber Security

Risk Modeling

Ad Placement

Proactive Repair

Disaster Mitigation

Investment Planning

Inventory Predictions

Customer Support

Sentiment Analysis

Supply Chain

Ad Placement

Basket Analysis

Segments

Cross-Sell

Customer Retention

Vendor Scorecards

Optimize Inventories

OPEX Reduction

Mainframe Offloads

Historical Records

Data as a Service

Public Data Capture

Fraud Prevention

Device Data Ingest

Rapid Reporting

Digital Protection

The Data Journey to a Golden Batch

Product Monitoring

Process Monitoring

Quality Analysis

Quality Data

Batch Genealogy

Time Series Data

MFG Data

Warranty Data

Yield Analysis

Product

Equipment

Production Line

Supply Chain

Customer

Process

Factory

Logistics

Business

Connected Car

Real-time Operations

Yield Optimization

Quality Optimization

Energy Management

Supply Chain Optimization

Proactive Repair

Inventory Predictions

Predictive Maintenance

OPEX Reduction

Demand Sensing

Mainframe Offloads

Device Data Ingest

Rapid Reporting

Digital Protection

Data as a Service

Fraud Prevention

Public Data Capture

INNOVATE

RENOVATE

EXPLORE, OPTIMIZE, TRANSFORM

ACTIVE ARCHIVE

ETL ONBOARD

DATA ENRICHMENT

DATA DISCOVERY

SINGLE VIEW

PREDICTIVE ANALYTICS

MANUFACTURING

Merck’s Journey

Improving Life Sciences Manufacturing Yields Presents a Complex Data Discovery Challenge

Vaccine manufacturing requires precise control of complex fermentation processes

Two batches of a vaccine, produced using an identical manufacturing process, can exhibit significant yield variances

Batches that fail quality standards can cost $1 million each

Merck analyzed one vaccine: 10 years of manufacturing data stored across 16 systems

Merck’s Journey

Scientific Search

Sensor Data Storage

Vaccine Yield Optimization

Innovate

Renovate

The Journey to the Golden Batch

Ten years of combined data amounted to 1 billion records

5.5 million batch comparisons

First-year yield boost of 40K more doses, $10M profit impact

McKinsey: 50% yield improvement

Epidemiology

DATA DISCOVERY

ACTIVE ARCHIVE

DATA DISCOVERY

DATA DISCOVERY

The Golden Batch

The Data Journey to Safe Roads

Risk Assessment

Due Diligence

Social Mapping

Product Design

M & A

Call Analysis

Sensor Data

Loss Control

Telematics

Customer Support

Claim Analysis

Market Segments

Customer Retention

Sentiment Analysis

Fraud Investigation

Risk Analysis

Cross-Sell

Channel Scorecards

Ad Placement

Cyber Security

Cat Models

Investment Planning

Risk Appetite

Risk Modeling

Loss Control

Claim Severity

Next Best Action

OPEX Reduction

Historical Records

Mainframe Offloads

Device Data Ingest

Rapid Reporting

Digital Protection

Data as a Service

Fraud Prevention

Public Data Capture

INNOVATE

RENOVATE

EXPLORE, OPTIMIZE, TRANSFORM

ACTIVE ARCHIVE

ETL ONBOARD

DATA ENRICHMENT

DATA DISCOVERY

SINGLE VIEW

PREDICTIVE ANALYTICS

INSURANCE

Fraud Mitigation

Solvency Analysis

Progressive’s Journey

Progressive Wanted to Ingest IoT Data to Predict Risk for its Usage-based Insurance Product

Progressive Snapshot offers usage-based insurance through an in-car sensor that transmits IoT driving data

Sensors collect up to six months of data from drivers and the data is archived for years, per regulatory requirements

Progressive’s existing systems were not scaling efficiently

It took 5–7 days to transform only 25% of available UBI data

Progressive’s Journey

Rewarding Safer Drivers and Improving Traffic Safety

Snapshot plug-in devices capture driving detail, 100% stored in HDP, ingested in 2-3 days

More than 12 billion miles driven stored

Through a web app, customers can review their own driving detail and improve their safety

Snapshot and usage-based insurance drove $2.6 billion in 2014 Progressive premiums, growing since then
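The before and after figures quoted for Progressive (5–7 days to transform 25% of the UBI data, versus 100% ingested in 2-3 days) can be turned into a rough throughput comparison. The sketch below is illustrative only. It assumes that both percentages refer to the same body of data and that "transform" and "ingest" describe comparable work, neither of which the slides state. The function names are hypothetical.

```python
# Rough throughput comparison from the figures quoted on the slides.
# Assumptions (not stated in the source): the "25%" and "100%" refer to
# the same body of usage-based insurance (UBI) data, and the earlier
# "transform" step is comparable to the later "ingest" step.

def share_per_day(share_processed: float, days: float) -> float:
    """Fraction of the available data processed per day."""
    return share_processed / days

def speedup_range(before_share, before_days, after_share, after_days):
    """Return (low, high) speedup given (min_days, max_days) ranges."""
    before_fast = share_per_day(before_share, before_days[0])
    before_slow = share_per_day(before_share, before_days[1])
    after_fast = share_per_day(after_share, after_days[0])
    after_slow = share_per_day(after_share, after_days[1])
    # Most conservative: slowest new rate against fastest old rate.
    # Most optimistic: fastest new rate against slowest old rate.
    return after_slow / before_fast, after_fast / before_slow

if __name__ == "__main__":
    low, high = speedup_range(0.25, (5, 7), 1.0, (2, 3))
    print(f"Implied speedup: {low:.1f}x to {high:.1f}x")
```

Under those assumptions the quoted figures imply processing roughly 7 to 14 times faster per day.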

Innovate

Renovate

Claims Notes Mining

Individual Driving Histories

Usage-Based Insurance (UBI)

Web Log Analysis

Online Ad Placement

Sensor Data Ingest

PREDICTIVE ANALYTICS

ACTIVE ARCHIVE

DATA DISCOVERY

DATA DISCOVERY

DATA DISCOVERY

ETL ONBOARD

Safe Roads

The Journey to Discover the Genetic Links to Cancer

Arizona State University’s Journey

A Genome Represents 20 Billion Rows of Data and Researchers Couldn’t Explore Enough Genetic Data to Understand How Genes Affect Cancer

Cancer is both complicated (the interplay between multiple biological processes) and complex (affected by biological, genetic, environmental and social factors)

Since each genome represents so much data, legacy platforms couldn't amass enough genomic data to explore cancer patterns across a broad genetic spectrum

This created a “lamp-posting” phenomenon, forcing a focus around incremental research clustered around genes known to influence cancer

ASU turned to HDP to store and process huge amounts of genomic data, to make that data broadly available to researchers and to do it all at a scalable cost

Arizona State University’s Journey

HDP’s Storage and Compute Efficiencies Allow Individual Researchers to Do the Work Previously Done By Entire Teams

ASU’s Next Generation Cyber Capability (NGCC) project combines HDP with high-performance computing, for genomic analysis in Apache Spark

The NGCC architecture follows President Obama’s “National Cancer Moonshot” guidelines, with a federated framework that encourages data sharing

One query against a table with 20 billion rows would time out before it could return results. In HDP it returned results in 1-2 minutes

“Now with HDP we have both the availability of data and the technical capability to analyze it. We are able to explore spaces where we simply couldn’t go before. It just wasn’t possible before having this technology. This has sped our time to insight infinitely in some cases. Some questions were not possible before, and now they return results in a day.”

-- Dr. Kenneth Buetow, Director of Computational Sciences and Informatics

The Journey to Better Health

Claims Optimization

Social Sentiment

Cohort Selection

Bill Shock

Physician Notes

Device Monitoring

R & D

Quality Benchmarks

Patient Experience

Seasonal Staffing

Net Promoter Score

Supply Chain

Sentiment Analysis

Patient Outreach

360° Patient View

Patient Throughput

Customer Churn Analysis

STARS Ratings

Genomics

Remote Monitoring

Drug Diversion

Census

Proactive Maintenance

Preventative Medicine

Inventory

Medication Safety

OPEX Reduction

Lab Notes Archive

Mainframe Offloads

Device Data Ingest

Rapid Reporting

Digital Protection

Data as a Service

Fraud Prevention

Real-time Decision Support

INNOVATE

RENOVATE

EXPLORE, OPTIMIZE, TRANSFORM

ACTIVE ARCHIVE

ETL ONBOARD

DATA ENRICHMENT

DATA DISCOVERY

SINGLE VIEW

PREDICTIVE ANALYTICS

HEALTHCARE

Care-path Best Practices

OR Optimization

HCAHPS Scores

Staffing Predictions

Proactive Outreach

Legacy System Data

Imaging Archive

Historical Patient Records

Improved Drug Yields

Mercy’s Journey

Mercy Medical System Sought a Data Lake for a Single View of its Patients – “One Patient, One Record”

Existing platform impeded the goal of enriching Epic data for 1 million patients across 35 hospitals and 500 clinics

Moving Epic EMR data to Clarity EDW took 24 hours and was “never going to enable real-time analytics”. Now that takes 3-5 minutes with HDP.

Improved billing processes resulted in $1M additional annual revenue from newly documented secondary diagnoses and care

PREDICTIVE ANALYTICS

Mercy’s Journey

Billing

Vital Signs

Single Patient Record

Lab Notes

Privacy Database

Medical Decision Support

Device Data Ingest

Preventive Care

Epic Enrichment

OPEX Efficiency

Epic EMR Replication

Better Health Through Data

Searches of free-text lab notes speed researcher insight from “never” to “seconds”

Ingest of ICU vital signs increased by 900X, letting clinicians respond more quickly

Mercy is building real-time tools to support surgical decisions and preventive care

Innovate

Renovate

Better Health

DATA DISCOVERY

SINGLE VIEW

DATA DISCOVERY

SINGLE VIEW

ACTIVE ARCHIVE

ACTIVE ARCHIVE

ACTIVE ARCHIVE

DATA ENRICHMENT

ETL ONBOARD

PREDICTIVE ANALYTICS

Partnerships

Hadoop for Retail

DATA REPOSITORIES

ANALYSIS

Single view of consumer

Targeted promotions

Recommendation engines

Basket analysis

Price optimization

Inventory optimization

Loyalty management

Path to purchase

Security

Operations

Governance & Integration

[Diagram: grid of nodes numbered 1 to N]

YARN: Data Operating System

Script, SQL, NoSQL, Stream, Search, In-Mem, Others

HDFS (Hadoop Distributed File System)

ERP

EDW

RDBMS

CRM

EMERGING & NON-TRADITIONAL SOURCES

SOCIAL MEDIA

BEACONS

SENSOR RFID

CLICKSTREAM

IN-STORE WIFI LOGS

SERVER LOGS

TRADITIONAL SOURCES

CRM, STORES, PRODUCT CATALOG, STAFFING PLANS

ERP, POS TRANSACTIONS, INVENTORY, WEB TRANSACTIONS

Interoperable with Leading Datacenter Technologies

Partners

Start Your Journey with Hortonworks in 6 Easy Steps

1. Schedule a use case workshop with the local Hortonworks team

2. Complete the Big Data Scorecard – hortonworks.com/get-started/big-data-scorecard

3. Download Hortonworks Sandbox – hortonworks.com/sandbox

4. Subscribe for Support – hortonworks.com/services/jumpstart/

5. Join Hortonworks Community Connection – hortonworks.com/community/

6. Follow the Hortonworks blog – hortonworks.com/blog/

[email protected]

@ericthorsen

Thank you!

+1 513-237-3811