Ben Marden - Making sense of Big Data


Upload: weareesynergy

Post on 10-May-2015


DESCRIPTION

Presentation by Ben Marden of Hortonworks at our Big Data breakfast conference

TRANSCRIPT

Page 1: Ben Marden - Making sense of Big Data

Ben Marden

Hortonworks

Making sense of Big Data

Page 2: Ben Marden - Making sense of Big Data

© Hortonworks Inc. 2013

Hortonworks: Making sense of Big Data

Benedict Marden

June 2013

Page 3: Ben Marden - Making sense of Big Data

Hortonworks

• Why data driven business
• Who is Hortonworks
• Our Approach
• Hadoop Data Types
• Patterns of Use
• Summary

Page 4: Ben Marden - Making sense of Big Data


Why Data Driven Business?


“Data-driven decisions are better decisions – it's as simple as that. Using big data enables managers to decide on the basis of evidence rather than intuition. For that reason it has the potential to revolutionize management.”

Harvard Business Review, October 2012

Page 5: Ben Marden - Making sense of Big Data


A Brief History of Apache Hadoop

Timeline (2004 to 2013)

• Focus on INNOVATION. 2005: Yahoo! creates team under E14 to work on Hadoop
• Focus on OPERATIONS. 2008: Yahoo team extends focus to operations to support multiple projects & growing clusters
• STABILITY. 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo

Milestones marked on the timeline: Apache Project Established; Yahoo! begins to operate at scale; Enterprise Hadoop; Hortonworks Data Platform

Page 6: Ben Marden - Making sense of Big Data


Leadership that Starts at the Core

• Driving next generation Hadoop
– YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
• 420k+ lines authored since 2006
– More than twice nearest contributor
• Deeply integrating w/ecosystem
– Enabling new deployment platforms (ex. Windows & Azure, Linux & VMware HA)
– Creating deeply engineered solutions (ex. Teradata big data appliance)
• All Apache, NO holdbacks
– 100% of code contributed to Apache

Page 7: Ben Marden - Making sense of Big Data


Hortonworks Snapshot


• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform

• We engineer, test & certify HDP for enterprise usage

• We employ the core architects, builders and operators of Apache Hadoop

• We drive innovation within Apache Software Foundation projects

• We are uniquely positioned to deliver the highest quality of Hadoop support

• We enable the ecosystem to work better with Hadoop

Develop Distribute Support

We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution

Endorsed by Strategic Partners

Headquarters: Palo Alto, CA
Employees: 200+ and growing
Investors: Benchmark, Index, Yahoo

Page 8: Ben Marden - Making sense of Big Data

HDP: Enterprise Hadoop Distribution

Hortonworks Data Platform (HDP): Enterprise Hadoop

• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability

[Diagram: HORTONWORKS DATA PLATFORM (HDP)]

• OPERATIONAL SERVICES: Manage & Operate at Scale
• DATA SERVICES: Store, Process and Access Data
• HADOOP CORE: Distributed Storage & Processing
• PLATFORM SERVICES: Enterprise Readiness
• OS, Cloud, VM, Appliance

Page 9: Ben Marden - Making sense of Big Data


6 Key Hadoop DATA TYPES

1. Sentiment: Understand how your customers feel about your brand and products, right now

2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website

3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines

4. Geographic: Analyze location-based data to manage operations where they occur

5. Server Logs: Research logs to diagnose process failures and prevent security breaches

6. Text: Understand patterns in text across millions of web pages, emails, and documents

Page 10: Ben Marden - Making sense of Big Data

Existing Data Architecture

[Diagram]

• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP)
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS
• OPERATIONAL TOOLS: MANAGE & MONITOR
• DEV & DATA TOOLS: BUILD & TEST

Page 11: Ben Marden - Making sense of Big Data

Next-Generation Data Architecture

[Diagram]

• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); ENTERPRISE HADOOP PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS; New Sources (web logs, email, sensors, social media)
• OPERATIONAL TOOLS: MANAGE & MONITOR
• DEV & DATA TOOLS: BUILD & TEST

Page 12: Ben Marden - Making sense of Big Data


Interoperating With Your Tools

[Diagram]

• APPLICATIONS: Microsoft Applications
• DATA SYSTEMS: TRADITIONAL REPOS; HORTONWORKS DATA PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)
• OPERATIONAL TOOLS: Viewpoint
• DEV & DATA TOOLS

Page 13: Ben Marden - Making sense of Big Data

Hadoop Common Patterns of Use

[Diagram]

• Big Data: Transactions, Interactions, Observations
• HORTONWORKS DATA PLATFORM: Refine, Explore, Enrich
• “Right-time” Access to Data: Batch, Interactive, Online
• Business Cases

Page 14: Ben Marden - Making sense of Big Data

Operational Data Refinery

Transform & refine ALL sources of data

Also known as Data Reservoir or Catch Basin

1. Capture
2. Process
3. Distribute & Retain

[Diagram]

• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
• DATA SYSTEMS: HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); TRADITIONAL REPOS (RDBMS, EDW, MPP)
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
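The three refinery steps on this slide (Capture, Process, Distribute & Retain) can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not Hortonworks or Hadoop code: the `Refinery` class, its method names, the source names and the sample records are all hypothetical, and in-memory lists stand in for the distributed storage a real platform would provide.

```python
from dataclasses import dataclass, field


@dataclass
class Refinery:
    """Toy stand-in for the refinery pattern; not an HDP or Hadoop API."""
    raw_store: list = field(default_factory=list)      # raw records, retained as-is
    refined_store: list = field(default_factory=list)  # cleaned, uniform records

    def capture(self, source_name, records):
        # Step 1 (Capture): land every record untouched, tagged with its source.
        for record in records:
            self.raw_store.append({"source": source_name, "payload": record})

    def process(self):
        # Step 2 (Process): refine raw records into one common shape,
        # dropping records whose payload is empty after trimming.
        self.refined_store = [
            {"source": r["source"], "text": str(r["payload"]).strip().lower()}
            for r in self.raw_store
            if str(r["payload"]).strip()
        ]

    def distribute(self, source_name):
        # Step 3 (Distribute & Retain): hand refined records to a downstream
        # system; raw_store is left in place so the originals are retained.
        return [r for r in self.refined_store if r["source"] == source_name]


refinery = Refinery()
refinery.capture("web_logs", ["GET /index.html ", "", "POST /cart"])  # new source
refinery.capture("pos", ["SALE 19.99"])                               # traditional source
refinery.process()
web_rows = refinery.distribute("web_logs")
```

Keeping the raw records after processing is the point of the "Retain" half of step 3: downstream systems receive the refined view while the originals stay available for later reprocessing.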

Page 15: Ben Marden - Making sense of Big Data

Big Data Exploration & Visualization

Leverage “data lake” to perform iterative investigation for value

1. Capture
2. Process
3. Explore & Visualize

[Diagram]

• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
• DATA SYSTEMS: HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); TRADITIONAL REPOS (RDBMS, EDW, MPP)
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications

Page 16: Ben Marden - Making sense of Big Data

Application Enrichment

Create intelligent applications

Collect data, create analytical models and deliver to online apps

1. Capture
2. Process & Compute
3. Deliver Model

[Diagram]

• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
• DATA SYSTEMS: HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); TRADITIONAL REPOS (RDBMS, EDW, MPP); NOSQL
• APPLICATIONS: Custom Applications, Enterprise Applications
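As a rough illustration of the enrichment pattern (collect data, build an analytical model, deliver it to an online application), the sketch below computes a simple "bought together" model from a batch of purchase baskets and publishes it as a key-value lookup. Everything here is an assumption made for the example: the basket data, the function names `build_model` and `deliver`, and the plain dict standing in for the NOSQL store on the slide. The slide does not prescribe any particular model.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Step 1 (Capture): hypothetical purchase baskets collected from source systems.
baskets = [
    ["laptop", "mouse"],
    ["laptop", "mouse", "bag"],
    ["laptop", "bag"],
    ["phone", "case"],
]


def build_model(baskets):
    # Step 2 (Process & Compute): count how often each ordered pair of
    # distinct items appears in the same basket.
    co_counts = defaultdict(Counter)
    for basket in baskets:
        for item, other in permutations(set(basket), 2):
            co_counts[item][other] += 1
    return co_counts


def deliver(co_counts, top_n=2):
    # Step 3 (Deliver Model): precompute item -> top related items so the
    # online app only does a key lookup. Ties are broken alphabetically.
    return {
        item: [
            other
            for other, _ in sorted(
                counts.items(), key=lambda kv: (-kv[1], kv[0])
            )[:top_n]
        ]
        for item, counts in co_counts.items()
    }


online_store = deliver(build_model(baskets))     # dict stands in for a NoSQL store
recommendations = online_store.get("laptop", [])  # what the online app would read
```

The heavy counting happens in batch, and the online application only reads a precomputed answer, which mirrors the split between "Process & Compute" and "Deliver Model" on the slide.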

Page 17: Ben Marden - Making sense of Big Data


Transferring Our Hadoop Expertise to You

The expert source for Apache Hadoop training & certification

• World class training programs designed to help you learn fast
– Role-based hands-on classes with 50% lab time
• Expert consulting services
– Programs designed to transfer knowledge
• Industry leading Hadoop Sandbox program
– Fastest way to learn Apache Hadoop
– Multi-level tutorials for wide applicability
– Customizable and updateable

Page 18: Ben Marden - Making sense of Big Data


Summary

• Leading the Innovation in Core Hadoop
• Addressing the requirements for Enterprise usage
• Enabling interoperability of the ecosystem
• No lock-in. 100% Open Source.
• Best in industry support with flexible pricing model
• Find out more
– www.hortonworks.com
– http://hortonworks.com/hadoop-training/