© Hortonworks Inc. 2012
Bridge to Big Data: Hadoop in the Enterprise Architecture
Jim Walker
October 23, 2012
Strata/Hadoop World 2012
Big Data: Organizational Game Changer
Megabytes
Gigabytes
Terabytes
Petabytes
Purchase detail
Purchase record
Payment record
ERP
CRM
WEB
BIG DATA
Offer details
Support Contacts
Customer Touches
Segmentation
Web logs
Offer history
A/B testing
Dynamic Pricing
Affiliate Networks
Search Marketing
Behavioral Targeting
Dynamic Funnels
User Generated Content
Mobile Web
SMS/MMS
Sentiment
External Demographics
HD Video, Audio, Images
Speech to Text
Product/Service Logs
Social Interactions & Feeds
Business Data Feeds
User Click Stream
Sensors / RFID / Devices
Spatial & GPS Coordinates
Increasing Data Variety and Complexity
Transactions + Interactions + Observations
= BIG DATA
What is a Data Driven Business?
• DEFINITION: Better use of available data in the decision-making process
• RULE: Key metrics derived from data should be tied to goals
• PROVEN RESULTS: Firms that adopt Data-Driven Decision Making have output and productivity that is 5-6% higher than what would be expected given their investments and usage of information technology*
* “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?” Brynjolfsson, Hitt and Kim (April 22, 2011)
Big Data: Optimize Outcomes at Scale
• Sports: optimize championships
• Intelligence: optimize detection
• Finance: optimize algorithms
• Advertising: optimize performance
• Fraud: optimize prevention
• Retail / Wholesale: optimize inventory turns
• Manufacturing: optimize supply chains
• Healthcare: optimize patient outcomes
• Education: optimize learning outcomes
• Government: optimize citizen services
Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation.
Dashboards, Reports, Visualization, …
CRM, ERP
Web, Mobile
Point of sale
Enterprise Big Data Flows
Big Data Platform
Business Transactions & Interactions
Business Intelligence & Analytics
Unstructured Data
Log files
DB data
Exhaust Data
Social Media
Sensors, devices
Classic Data Integration & ETL
1. Capture Big Data: Collect data from all sources, structured & unstructured
2. Process: Transform, refine, aggregate, analyze, report
3. Distribute Results: Interoperate and share data with applications/analytics
4. Feedback: Use operational data within the big data platform
Univ. California - Irvine Medical Center
Optimizing patient outcomes while lowering costs
• Current system, Epic, holds 22 years of patient data across admissions and clinical information
– Significant cost to maintain and run system
– Difficult to access, not integrated, standalone
• Apache Hadoop sunsets legacy system and augments new electronic medical records (EMR)
– Migrate legacy data to Apache Hadoop, replacing existing ETL and temporary databases and capturing complete information
– Eliminate maintenance of legacy system for $500K annual savings
– Integrate data with EMR and clinical front-end for better patient service and improved research
• UC Irvine Medical Center is ranked among the nation's best hospitals by U.S. News & World Report for the 12th year
• More than 400 specialty and primary care physicians
• Opened in 1976
• 422-bed medical facility
Data Platform for Big Data
Data Platform Requirements for Big Data
Capture
• Collect data from all sources - structured and unstructured data
• At all speeds: batch, async, streaming, real-time
Process
• Transform, refine, aggregate, analyze, report
Exchange
• Deliver data with enterprise data systems
• Share data with analytic applications and processing
Operate
• Provision, monitor, diagnose, manage at scale
• Reliability, availability, affordability, scalability, interoperability
Operating Systems
Virtual Platforms
Cloud Platforms
Big Data Appliances
Across all deployment models
Big Data: Transactions, Interactions, Observations
Apache Hadoop & Big Data Use Cases
Refine Explore Enrich
Business Case
Enterprise Data Warehouse
Operational Data Refinery: Hadoop as platform for ETL modernization
Capture
• Capture new unstructured data along with log files, all alongside existing sources
• Retain inputs in raw form for audit and continuity purposes
Process
• Parse the data & cleanse
• Apply structure and definition
• Join datasets together across disparate data sources
Exchange
• Push to existing data warehouse for downstream consumption
• Feeds operational reporting and online systems
Unstructured Log files
Refinery
Structure and join
Capture and archive
Parse & Cleanse
Refine Explore Enrich
DB data
Upload
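The refinery's Process step (parse, cleanse, apply structure, join) can be sketched in a few lines. This is a minimal illustration in plain Python rather than Pig or Hive; the log layout, the field names and the customer table are hypothetical and do not come from the slides.

```python
from typing import Optional

# Hypothetical raw inputs: space-delimited web log lines and a customer
# table exported from a database. Both layouts are assumptions.
RAW_LOG_LINES = [
    "2012-10-23T10:00:01 c42 /products/shirts 200",
    "garbage line without enough fields",
    "2012-10-23T10:00:09 c7 /cart 200",
    "2012-10-23T10:00:15 c42 /checkout 500",
]
CUSTOMERS = {
    "c42": {"name": "Alice", "segment": "gold"},
    "c7": {"name": "Bob", "segment": "silver"},
}


def parse_line(line: str) -> Optional[dict]:
    """Parse one raw line into a structured record, or None if malformed."""
    parts = line.split()
    if len(parts) != 4 or not parts[3].isdigit():
        return None
    timestamp, customer_id, path, status = parts
    return {"timestamp": timestamp, "customer_id": customer_id,
            "path": path, "status": int(status)}


def refine(lines, customers):
    """Parse, cleanse (drop malformed rows and server errors), then join."""
    refined = []
    for line in lines:
        record = parse_line(line)
        if record is None or record["status"] >= 500:
            continue
        customer = customers.get(record["customer_id"])
        if customer is None:
            continue  # inner join: skip rows with no matching customer
        refined.append({**record, **customer})
    return refined


rows = refine(RAW_LOG_LINES, CUSTOMERS)
```

The raw lines stay untouched, in keeping with the "retain inputs in raw form" point; only the refined rows would be pushed to the warehouse.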
“Big Bank” Key Benefits
• Capture and archive
– Retain 3 – 5 years instead of 2 – 10 days
– Lower costs
– Improved compliance
• Transform, change, refine
– Turn upstream raw dumps into a small list of “new, update, delete” customer records
– Convert fixed-width EBCDIC to UTF-8 (Java and DB compatible)
– Turn raw weblogs into sessions and behaviors
• Upload
– Insert into Teradata for downstream “as-is” reporting and tools
– Insert into new exploration platform for scientists to play with
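The fixed-width EBCDIC to UTF-8 conversion mentioned above might look like the following sketch. It assumes the `cp037` code page (one of several EBCDIC variants shipped with Python) and a hypothetical 22-byte record layout; the bank's actual code page and layout are not given in the slides.

```python
# Hypothetical fixed-width layout: (field name, start offset, length in bytes).
LAYOUT = [("customer_id", 0, 6), ("name", 6, 10), ("action", 16, 6)]
RECORD_LENGTH = 22


def decode_records(raw: bytes, codec: str = "cp037"):
    """Yield one dict per fixed-width EBCDIC record, values as Python str."""
    if len(raw) % RECORD_LENGTH != 0:
        raise ValueError("input is not a whole number of records")
    for offset in range(0, len(raw), RECORD_LENGTH):
        record = raw[offset:offset + RECORD_LENGTH]
        yield {name: record[start:start + length].decode(codec).strip()
               for name, start, length in LAYOUT}


def to_utf8_lines(raw: bytes) -> bytes:
    """Re-encode every record as one tab-separated UTF-8 line."""
    lines = ["\t".join(rec[name] for name, _, _ in LAYOUT)
             for rec in decode_records(raw)]
    return ("\n".join(lines) + "\n").encode("utf-8")


# Build sample EBCDIC input by encoding padded text with the same codec.
sample = ("000042" + "ALICE".ljust(10) + "UPDATE"
          + "000007" + "BOB".ljust(10) + "DELETE").encode("cp037")
utf8_output = to_utf8_lines(sample)
```

Slicing on byte offsets before decoding works here because `cp037` is a single-byte code page; a multi-byte EBCDIC variant would need a different approach.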
Visualization Tools
EDW / Datamart
Explore
Big Data Exploration & Visualization: Hadoop as agile, ad-hoc data mart
Capture
• Capture multi-structured data and retain inputs in raw form for iterative analysis
Process
• Parse the data into queryable format
• Explore & analyze using Hive, Pig, Mahout and other tools to discover value
• Label data and type information for compatibility and later discovery
• Pre-compute stats, groupings, patterns in data to accelerate analysis
Exchange
• Use visualization tools to facilitate exploration and find key insights
• Optionally move actionable insights into EDW or datamart
Capture and archive
upload JDBC / ODBC
Structure and join
Categorize into tables
Unstructured Log files DB data
Refine Explore Enrich
Optional
“Hardware Manufacturer” Key Benefits
• Capture and archive
– Store 10M+ survey forms/year for > 3 years
– Capture text, audio, and systems data in one platform
• Structure and join
– Unlock freeform text and audio data
– Un-anonymize customers
• Categorize into tables
– Create HCatalog tables “customer”, “survey”, “freeform text”
• Upload, JDBC
– Visualize natural satisfaction levels and groups
– Tag customers as “happy” and report back to CRM database
Online Applications
Enrich
Application Enrichment: Deliver Hadoop analysis to online apps
Capture
• Capture data that was once too bulky and unmanageable
Process
• Uncover aggregate characteristics across data
• Use Hive, Pig and MapReduce to identify patterns
• Filter useful data from mass streams (Pig)
• Micro or macro batch oriented schedules
Exchange
• Push results to HBase or other NoSQL alternative for real time delivery
• Use patterns to deliver right content/offer to the right person at the right time
Derive/Filter
Capture
Parse
NoSQL, HBase
Low Latency
Scheduled & near real time
Unstructured Log files DB data
Refine Explore Enrich
“Clothing Retailer” Key Benefits
• Capture
– Capture weblogs together with sales order history, customer master
• Derive useful information
– Compute relationships between products over time
– “people who buy shirts eventually need pants”
– Score customer web behavior / sentiment
– Connect product recommendations to customer sentiment
• Share
– Load customer recommendations into HBase for rapid website service
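One way to read "compute relationships between products over time" is to count, per customer, which products were bought after which. The sketch below does that on a hypothetical in-memory order history; the data, the function names and the ranking rule are illustrative assumptions, not the retailer's actual method.

```python
from collections import Counter

# Hypothetical order history: (customer, day number, product).
ORDERS = [
    ("c1", 1, "shirt"), ("c1", 20, "pants"),
    ("c2", 3, "shirt"), ("c2", 30, "pants"), ("c2", 31, "belt"),
    ("c3", 5, "pants"),
]


def follow_on_counts(orders):
    """Count customers who bought `first` and, on a later day, `later`."""
    by_customer = {}
    for customer, day, product in orders:
        by_customer.setdefault(customer, []).append((day, product))
    counts = Counter()
    for purchases in by_customer.values():
        purchases.sort()
        seen_pairs = set()
        for i, (day, first) in enumerate(purchases):
            for later_day, later in purchases[i + 1:]:
                if later_day > day and later != first:
                    seen_pairs.add((first, later))
        counts.update(seen_pairs)  # each customer counts once per pair
    return counts


def recommendations(counts, product, top_n=3):
    """Rank follow-on products for `product` by how many customers bought them."""
    ranked = [(later, n) for (first, later), n in counts.items()
              if first == product]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return [later for later, _ in ranked[:top_n]]


counts = follow_on_counts(ORDERS)
shirt_recs = recommendations(counts, "shirt")
```

In the architecture described on the slide, a result like `shirt_recs` is what would be loaded into HBase, keyed by product or customer, so the website can look it up with low latency.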
Hadoop in Enterprise Data Architectures
EDW
Existing Business Infrastructure
ODS & Datamarts
Applications & Spreadsheets
Visualization & Intelligence
Discovery Tools
IDE & Dev Tools
Low Latency/NoSQL
Web
Web Applications
Operations
Custom Existing
Templeton, Sqoop, WebHDFS, Flume
HCatalog
Pig, HBase
Hive
Ambari, HA, Oozie
ZooKeeper
MapReduce, HDFS
Big Data Sources (transactions, observations, interactions)
CRM, ERP, exhaust data, log files, financials, social media
New Tech
Datameer, Tableau, Karmasphere, Splunk
Batch & Scheduled Integration
Near Real-Time Integration
Existing Infrastructure
HDFS
Pig
Hadoop Integration Options
REST
Hive HBase
MapReduce
HCatalog
WebHDFS
Databases & Warehouses
Applications & Spreadsheets
Visualization & Intelligence
Flume
Logs & Files
Existing Infrastructure
HDFS
Pig Hive HBase
MapReduce
HCatalog
Databases & Warehouses
Applications & Spreadsheets
Visualization & Intelligence
Logs & Files
Data Integration (Talend, Informatica)
ODBC/JDBC
SQOOP
What Is the Size of Your Cluster?
Open Source data management with scale-out storage & distributed processing
Storage: HDFS
• Distributed across “nodes”
• Natively redundant
• Name node tracks locations
Compute: MapReduce
• Splits a task across processors “near” the data & assembles results
• Self-healing, high-bandwidth clustered storage
Key Characteristics
• Scalable
– Efficiently store & process petabytes
– Linear scale driven by additional processing and storage
• Reliable
– Redundant storage
– Failover across nodes and racks
• Flexible
– Store all types of data in any format
– Apply schema on analysis & sharing of data
• Economical
– Use commodity hardware
– Open source software guards against vendor lock-in
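The "splits a task and assembles results" idea behind MapReduce can be illustrated with a single-process word count. This is only a sketch of the programming model in plain Python, with made-up function names; it does not use the Hadoop API, and on a real cluster each input split would be mapped on a node close to where HDFS stores it.

```python
from collections import defaultdict


def map_phase(split: str):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def run_job(splits):
    """Run map over every split, shuffle, then reduce each key."""
    mapped = []
    for split in splits:  # independent per split, so it can run in parallel
        mapped.extend(map_phase(split))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


word_counts = run_job(["big data big", "data platform"])
```

Because each map call sees only its own split and each reduce call sees only one key, the work can be spread over more nodes as data grows, which is the linear scaling the slide refers to.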
Hadoop Balance Innovation & Stability
[Figure: technology adoption curve. x-axis: time; y-axis: relative % customers; gap labeled “The CHASM”]
Customers want solutions & convenience
Customers want technology & performance
Innovators, technology enthusiasts
Early adopters, visionaries
Early majority, pragmatists
Late majority, conservatives
Laggards, Skeptics
Source: Geoffrey Moore - Crossing the Chasm
www.hortonworks.com/moore
Where Does It Fit into Your Business?
Vertical Refine Explore Enrich
Retail & Web
• Log Analysis
• Ad Optimization
• Cross Channel Analytics
• Social Network Analysis
• Event Analytics
• Dynamic Pricing
• Session & Content Optimization
• Recommendation Engines
Retail
• Loyalty Program Optimization
• Brand and Sentiment Analysis
• Dynamic Pricing/Targeted Offer
Intelligence
• Threat Identification
• Person of Interest Discovery
• Cross Jurisdiction Queries
Finance
• Risk Modeling & Fraud Identification
• Trade Performance Analytics
• Surveillance and Fraud Detection
• Customer Risk Analysis
• Real-time upsell, cross sales marketing offers
Energy
• Smart Grid: Production Optimization
• Grid Failure Prevention
• Smart Meters
• Individual Power Grid
Manufacturing
• Supply Chain Optimization
• Customer Churn Analysis
• Dynamic Delivery
• Replacement parts
Healthcare & Payer
• Electronic Medical Records (EMPI)
• Clinical Trials Analysis
• Supply Chain Optimizations
• Insurance Premium Determination
Hortonworks Data Platform
• Simplify deployment to get started quickly and easily
• Monitor and manage any size cluster with familiar console and tools
• Only platform to include data integration services to interact with any data
• Metadata services open the platform for integration with existing applications
• Dependable high availability architecture
• Tested at scale to future-proof your cluster growth
Reduce risks and cost of adoption
Lower the total cost to administer and provision
Integrate with your existing ecosystem
Next Steps?
1 Download Hortonworks Data Platform
hortonworks.com/download
2 Use the getting started guide
hortonworks.com/get-started
3 Learn more… get support
hortonworks.com/training
• Expert role-based training
• Course for admins, developers and operators
• Certification program
• Custom onsite options
Hortonworks Support
hortonworks.com/support
• Full lifecycle technical support across four service levels
• Delivered by Apache Hadoop Experts/Committers
• Forward-compatible