© hortonworks inc. 2012 bridge to big data hadoop in the enterprise architecture jim walker october...

24
© Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

Upload: kaelyn-levit

Post on 15-Dec-2015

221 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Bridge to Big DataHadoop in the Enterprise Architecture

Jim Walker

October 23,2012

Strata/Hadoop World 2012

Page 1

Page 2: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Big Data: Organizational Game Changer

Page 2

Megabytes

Gigabytes

Terabytes

Petabytes

Purchase detail

Purchase record

Payment record

ERP

CRM

WEB

BIG DATA

Offer details

Support Contacts

Customer Touches

Segmentation

Web logs

Offer history

A/B testing

Dynamic Pricing

Affiliate Networks

Search Marketing

Behavioral Targeting

Dynamic Funnels

User Generated Content

Mobile Web

SMS/MMSSentiment

External Demographics

HD Video, Audio, Images

Speech to Text

Product/Service Logs

Social Interactions & Feeds

Business Data Feeds

User Click Stream

Sensors / RFID / Devices

Spatial & GPS Coordinates

Increasing Data Variety and Complexity

Transactions + Interactions + Observations

= BIG DATA

Page 3: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

What is a Data Driven Business?

• DEFINITIONBetter use of available data in the decision making process

• RULEKey metrics derived from data should be tied to goals

• PROVEN RESULTSFirms that adopt Data-Driven Decision Making have output and productivity that is 5-6% higher than what would be expected given their investments and usage of information technology*

Page 3

* “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?” Brynjolfsson, Hitt and Kim (April 22, 2011)

1110010100001010011101010100010010100100101001001000010010001001000001000100000100010010010001000010111000010010001000101001001011110101001000100100101001010010011111001010010100011111010001001010000010010001010010111101010011001001010010001000111

Page 4: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012Page 4

Page 5: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012Page 5

Page 6: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012Page 6

Page 7: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

optimize

optimize

optimize

optimize

optimize

optimize

optimize

optimize

optimize

optimize

Big Data: Optimize Outcomes at Scale

Sports Championships

Intelligence Detection

Finance Algorithms

Advertising Performance

Fraud Prevention

Retail / Wholesale Inventory turns

Manufacturing Supply chains

Healthcare Patient outcomes

Education Learning outcomes

Government Citizen services

Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation.

Page 7

Page 8: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Dashboards, Reports, Visualization, …

CRM, ERPWeb, MobilePoint of sale

Enterprise Big Data Flows

Page 8

Big DataPlatform

Business Transactions& Interactions

Business Intelligence & Analytics

UnstructuredData

Log files

DB data

Exhaust Data

Social Media

Sensors, devices

Classic Data Integration & ETL

Capture Big DataCollect data from all sources structured &unstructured

ProcessTransform, refine, aggregate, analyze, report

Distribute ResultsInteroperate and share data with applications/analytics

FeedbackUse operational data w/inthe big data platform

1 2 3 4

Page 9: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Univ. California - Irvine Medical Center

Optimizing patient outcomes while lowering costs

• Current system, Epic holds 22 years of patient data, across admissions and clinical information

– Significant cost to maintain and run system– Difficult to access, not-integrated, stand alone

• Apache Hadoop sunsets legacy system and augments new electronic medical records (EMR)

– Migrate legacy data to Apache Hadoop, replaced existing ETL and temporary databases and captures complete info

– Eliminate maint of legacy system for $500K annual savings– Integrate data with EMR and clinical front-end for Better

patient service and improved research

Page 9

• UC Irvine Medical Center is ranked among the nation's best hospitals by U.S. News & World Report for the 12th year

• More than 400 specialty and primary care physicians

• Opened in 1976

• 422-bed medical facility

Page 10: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Data Platform for Big Data

Data Platform Requirements for Big Data

Page 10

Capture

• Collect data from all sources - structured and unstructured data

• all speeds batch, async, streaming, real-time

Process

• Transform, refine, aggregate, analyze, report

Exchange

• Deliver data with enterprise data systems

• Share data with analytic applications and processing

Operate• Provision, monitor, diagnose, manage at scale• Reliability, availability, affordability, scalability, interoperability

OperatingSystems

VirtualPlatforms

CloudPlatforms

Big DataAppliances

Across all deployment models

Page 11: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Big DataTransactions, Interactions, Observations

Apache Hadoop & Big Data Use Cases

Page 11

Refine Explore Enrich

Business Case

Page 12: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

EnterpriseData Warehouse

Operational Data RefineryHadoop as platform for ETL modernization

Capture• Capture new unstructured data along with log

files all alongside existing sources• Retain inputs in raw form for audit and

continuity purposes

Process• Parse the data & cleanse• Apply structure and definition• Join datasets together across disparate data

sources

Exchange• Push to existing data warehouse for

downstream consumption• Feeds operational reporting and online systems

Page 12

Unstructured Log files

Refinery

Structure and join

Capture and archive

Parse & Cleanse

Refine Explore Enrich

DB data

Upload

Page 13: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

“Big Bank” Key Benefits

• Capture and archive– Retain 3 – 5 years instead of 2 – 10 days– Lower costs– Improved compliance

• Transform, change, refine– Turn upstream raw dumps into small list of “new, update, delete”

customer records– Convert fixed-width EBCDIC to UTF-8 (Java and DB compatible)– Turn raw weblogs into sessions and behaviors

• Upload– Insert into Teradata for downstream “as-is” reporting and tools– Insert into new exploration platform for scientists to play with

Page 13

Page 14: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

VisualizationToolsEDW / Datamart

Explore

Big Data Exploration & VisualizationHadoop as agile, ad-hoc data mart

Capture• Capture multi-structured data and retain inputs

in raw form for iterative analysis

Process• Parse the data into queryable format• Explore & analyze using Hive, Pig, Mahout and

other tools to discover value • Label data and type information for

compatibility and later discovery• Pre-compute stats, groupings, patterns in data

to accelerate analysis

Exchange• Use visualization tools to facilitate exploration

and find key insights• Optionally move actionable insights into EDW

or datamart

Page 14

Capture and archive

upload JDBC / ODBC

Structure and join

Categorize into tables

Unstructured Log files DB data

Refine Explore Enrich

Optional

Page 15: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

“Hardware Manufacturer” Key Benefits

• Capture and archive– Store 10M+ survey forms/year for > 3 years– Capture text, audio, and systems data in one platform

• Structure and join– Unlock freeform text and audio data– Un-anonymize customers

• Categorize into tables– Create HCatalog tables “customer”, “survey”, “freeform text”

• Upload, JDBC– Visualize natural satisfaction levels and groups– Tag customers as “happy” and report back to CRM database

Page 15

Page 16: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Online Applications

Enrich

Application EnrichmentDeliver Hadoop analysis to online apps

Capture• Capture data that was once

too bulky and unmanageable

Process• Uncover aggregate characteristics across data • Use Hive Pig and Map Reduce to identify patterns• Filter useful data from mass streams (Pig)• Micro or macro batch oriented schedules

Exchange• Push results to HBase or other NoSQL alternative

for real time delivery• Use patterns to deliver right content/offer to the

right person at the right time

Page 16

Derive/Filter

Capture

Parse

NoSQL, HBaseLow Latency

Scheduled & near real time

Unstructured Log files DB data

Refine Explore Enrich

Page 17: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

“Clothing Retailer” Key Benefits

• Capture– Capture weblogs together with sales order history, customer

master

• Derive useful information– Compute relationships between products over time

– “people who buy shirts eventually need pants”

– Score customer web behavior / sentiment– Connect product recommendations to customer sentiment

• Share– Load customer recommendations into HBase for rapid website

service

Page 17

Page 18: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Hadoop in Enterprise Data Architectures

Page 18

EDW

Existing Business Infrastructure

ODS &Datamarts

Applications &Spreadsheets

Visualization & Intelligence

Discovery Tools

IDE & Dev Tools

Low Latency/NoSQ

L

Web

WebApplications

Operations

Custom Existing

Templeton SqoopWebHDFS Flume

HCatalog

PigHBase

Hive

Ambari HAOozie

ZooKeeper

MapReduce HDFS

Big Data Sources (transactions, observations, interactions)

CRM ERPExhaust

Datalogs files

financialsSocial Media

New Tech

DatameerTableau

KarmasphereSplunk

Page 19: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Batch & ScheduledIntegration

Near Real-TimeIntegration

Existing Infrastructure

HDFS

Pig

Hadoop Integration Options

REST

Hive HBase

MapReduce

HCatalog

WebHDFS

Databases &Warehouses

Applications &Spreadsheets

Visualization & Intelligence

Flume

Logs &Files

Existing Infrastructure

HDFS

Pig Hive HBase

MapReduce

HCatalog

Databases &Warehouses

Applications &Spreadsheets

Visualization & Intelligence

Logs &Files

Data Integration (Talend, Informatica)

ODBC/JDBC

SQOOP

Page 20: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Sto

rage

What Is the Size of Your Cluster?

Open Source data management with scale-out storage & distributed processing

Page 20

HDFS• Distributed across “nodes”• Natively redundant• Name node tracks locations

Com

pute

Map Reduce• Splits a task across processors

“near” the data & assembles results• Self-Healing, High Bandwidth

Clustered Storage

Key Characteristics• Scalable

– Efficiently store & process petabytes– Linear scale driven by additional

processing and storage

• Reliable– Redundant storage– Failover across nodes and racks

• Flexible– Store all types of data in any format– Apply schema on analysis & sharing of data

• Economical– Use commodity hardware– Open source software guards

against vendor lock-in

Page 21: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Hadoop Balance Innovation & Stability

Page 21

time

rela

tive

% c

ust

om

ers

Th

e C

HA

SM

Customers want solutions & convenience

Customers want technology & performance

Innovators, technology enthusiasts

Early adopters, visionaries

Early majority,

pragmatists

Late majority, conservatives

Laggards, Skeptics

Source: Geoffrey Moore - Crossing the Chasm

www.hortonworks.com/moore

Page 22: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Where Does It Fit into Your Business?

Vertical Refine Explore Enrich

Retail & Web• Log Analysis• Ad Optimization• Cross Channel Analytics

• Social Network Analysis• Event Analytics

• Dynamic Pricing• Session & Content

Optimization• Recommendation Engines

Retail • Loyalty Program Optimization

• Brand and Sentiment Analysis

• Dynamic Pricing/Targeted Offer

Intelligence • Threat Identification • Person of Interest Discovery • Cross Jurisdiction Queries

Finance

• Risk Modeling & Fraud Identification

• Trade Performance Analytics

• Surveillance and Fraud Detection

• Customer Risk Analysis

• Real-time upsell, cross sales marketing offers

Energy • Smart Grid: Production Optimization

• Grid Failure Prevention• Smart Meters

• Individual Power Grid

Manufacturing • Supply Chain Optimization • Customer Churn Analysis• Dynamic Delivery• Replacement parts

Healthcare & Payer

• Electronic Medical Records (EMPI)

• Clinical Trials Analysis• Supply Chain Optimizations

• Insurance Premium Determination

Page 22

Page 23: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

1

• Simplify deployment to get started quickly and easily

• Monitor, manage any size cluster with familiar console and tools

• Only platform to include data integration services to interact with any data

• Metadata services opens the platform for integration with existing applications

• Dependable high availability architecture

• Tested at scale to future proof your cluster growth

Hortonworks Data Platform

Page 23

Reduce risks and cost of adoption Lower the total cost to administer and provision Integrate with your existing ecosystem

Page 24: © Hortonworks Inc. 2012 Bridge to Big Data Hadoop in the Enterprise Architecture Jim Walker October 23,2012 Strata/Hadoop World 2012 Page 1

© Hortonworks Inc. 2012

Next Steps?

• Expert role based training• Course for admins, developers

and operators• Certification program• Custom onsite options

Page 24

Download Hortonworks Data Platformhortonworks.com/download

1

2 Use the getting started guidehortonworks.com/get-started

3 Learn more… get support

• Full lifecycle technical support across four service levels

• Delivered by Apache Hadoop Experts/Committers

• Forward-compatible

Hortonworks Support

hortonworks.com/training hortonworks.com/support