Ben Marden - Making Sense of Big Data
DESCRIPTION
Ben Marden of Hortonworks: presentation from our Big Data breakfast conference
TRANSCRIPT
Ben Marden
Hortonworks
Making sense of Big Data
© Hortonworks Inc. 2013
Hortonworks
Making sense of Big Data
Benedict Marden
June 2013
Hortonworks
• Why data driven business
• Who is Hortonworks
• Our Approach
• Hadoop Data Types
• Patterns of Use
• Summary
Why Data Driven Business?
Data driven decisions are better decisions – it's as simple as that. Using big data enables managers to decide on the basis of evidence rather than intuition. For that reason it has the potential to revolutionize management.
Harvard Business Review, October 2012
A Brief History of Apache Hadoop
Timeline axis: 2004, 2006, 2008, 2010, 2012, 2013
• Focus on INNOVATION. 2005: Yahoo! creates team under E14 to work on Hadoop
• Apache Project Established
• Focus on OPERATIONS. 2008: Yahoo team extends focus to operations to support multiple projects & growing clusters
• Yahoo! begins to operate at scale
• STABILITY. 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo
• Enterprise Hadoop
• Hortonworks Data Platform
Leadership that Starts at the Core
• Driving next generation Hadoop
– YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
• 420k+ lines authored since 2006
– More than twice nearest contributor
• Deeply integrating w/ecosystem
– Enabling new deployment platforms (ex. Windows & Azure, Linux & VMware HA)
– Creating deeply engineered solutions (ex. Teradata big data appliance)
• All Apache, NO holdbacks
– 100% of code contributed to Apache
Hortonworks Snapshot
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop
Develop Distribute Support
We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
Endorsed by Strategic Partners
Headquarters: Palo Alto, CA
Employees: 200+ and growing
Investors: Benchmark, Index, Yahoo
HDP: Enterprise Hadoop Distribution
HORTONWORKS DATA PLATFORM (HDP)
Diagram layers:
• HADOOP CORE: Distributed Storage & Processing
• PLATFORM SERVICES: Enterprise Readiness
• DATA SERVICES: Store, Process and Access Data
• OPERATIONAL SERVICES: Manage & Operate at Scale
• OS, Cloud, VM, Appliance
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
6 Key Hadoop DATA TYPES
1. Sentiment: Understand how your customers feel about your brand and products – right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Text: Understand patterns in text across millions of web pages, emails, and documents
Existing Data Architecture
Diagram layers:
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP)
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS
• OPERATIONAL TOOLS: MANAGE & MONITOR
• DEV & DATA TOOLS: BUILD & TEST
Next-Generation Data Architecture
Diagram layers:
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); ENTERPRISE HADOOP PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS; New Sources (web logs, email, sensors, social media)
• OPERATIONAL TOOLS: MANAGE & MONITOR
• DEV & DATA TOOLS: BUILD & TEST
Interoperating With Your Tools
Diagram layers:
• APPLICATIONS: Microsoft Applications
• DATA SYSTEMS: TRADITIONAL REPOS; HORTONWORKS DATA PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)
• DEV & DATA TOOLS
• OPERATIONAL TOOLS: Viewpoint
Hadoop Common Patterns of Use
• Big Data: Transactions, Interactions, Observations
• HORTONWORKS DATA PLATFORM: Refine, Explore, Enrich
• “Right-time” Access to Data: Batch, Interactive, Online
• Business Cases
Operational Data Refinery
Transform & refine ALL sources of data
Also known as Data Reservoir or Catch Basin
1. Capture
2. Process
3. Distribute & Retain
Diagram layers:
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich)
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
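The three refinery steps (Capture, Process, Distribute & Retain) can be illustrated with a small in-memory sketch. This is only a toy model of the flow, not Hadoop or HDP code: the `Refinery` class, its method names, the sample records, and the `warehouse` dictionary are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Refinery:
    """Toy stand-in for the refinery pattern; not a Hadoop or HDP API."""

    # "Retain": raw records are kept as captured, alongside any refined output.
    raw_store: list = field(default_factory=list)

    def capture(self, source_name, records):
        # Step 1: land records from any source unchanged, tagged with their origin.
        for record in records:
            self.raw_store.append({"source": source_name, "raw": record})

    def process(self):
        # Step 2: parse and aggregate; here, count events per item across sources.
        counts = Counter()
        for entry in self.raw_store:
            parts = entry["raw"].split()
            if len(parts) >= 2:  # skip malformed records
                counts[parts[1]] += 1
        return dict(counts)

    def distribute(self, refined, sinks):
        # Step 3: push the refined result to downstream systems.
        for sink in sinks:
            sink.update(refined)


refinery = Refinery()
refinery.capture("web_logs", ["view widget", "view gadget", "view widget"])
refinery.capture("pos_system", ["sale widget", "corrupt"])

refined = refinery.process()
warehouse = {}  # hypothetical stand-in for a traditional repository (EDW, RDBMS)
refinery.distribute(refined, [warehouse])
print(warehouse)
```

In this sketch the raw records stay in `raw_store` after processing, which mirrors the "Retain" half of step 3: the refined summary goes downstream while the original data remains available for later reprocessing.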
Big Data Exploration & Visualization
Leverage “data lake” to perform iterative investigation for value
1. Capture
2. Process
3. Explore & Visualize
Diagram layers:
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich)
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
Application Enrichment
Create intelligent applications
Collect data, create analytical models and deliver to online apps
1. Capture
2. Process & Compute
3. Deliver Model
Diagram layers:
• APPLICATIONS: Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); NOSQL; HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich)
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
Transferring Our Hadoop Expertise to You
The expert source for Apache Hadoop training & certification
• World class training programs designed to help you learn fast
– Role-based hands on classes with 50% lab time
• Expert consulting services
– Programs designed to transfer knowledge
• Industry leading Hadoop Sandbox program
– Fastest way to learn Apache Hadoop
– Multi-level tutorials for wide applicability
– Customizable and updateable
Summary
• Leading the Innovation in Core Hadoop
• Addressing the requirements for Enterprise usage
• Enabling interoperability of the ecosystem
• No lock-in. 100% Open Source.
• Best in industry support with flexible pricing model
• Find out more
– www.hortonworks.com
– http://hortonworks.com/hadoop-training/