Apache Hadoop on the Open Cloud

Upload: hortonworks

Post on 26-Jan-2015

DESCRIPTION

Deck for our Apache Hadoop in the Open Cloud with Rackspace webinar.

TRANSCRIPT

Page 1: Apache Hadoop on the Open Cloud

© Hortonworks Inc. 2013

Quick Housekeeping

Q&A box is available for your questions

Webinar will be recorded for future viewing

Thank You for joining!

Page 2: Apache Hadoop on the Open Cloud

Apache Hadoop on the Open Cloud

Page 3: Apache Hadoop on the Open Cloud

Your Presenters

• Nirmal Ranganathan (@rnirmal)
– Software Developer @Rackspace
– Active contributor to various OpenStack projects, including Nova, Cinder and Trove
– Fun fact: Currently building the Rackspace Cloud Big Data Platform

• Steve Loughran (@steveloughran)
– Member of Technical Staff @hortonworks
– Hadoop committer since 2008
– Fun fact: “I break things, a lot.”

Page 4: Apache Hadoop on the Open Cloud

Today’s Topics

• Introduction
• Key Drivers for Hadoop
• Overview of Reference Architecture for Apache Hadoop-ready Infrastructure
• Behind-the-scenes look (demo) at Rackspace Cloud Big Data Platform
• Q&A

Page 5: Apache Hadoop on the Open Cloud

Drivers of Hadoop Adoption

Business Applications
Use Hadoop to extract insights that enable new customer value and competitive edge

Types of Big Data
• CRM, ERP
• Server log
• Clickstream
• Sentiment/Social
• Machine/Sensor
• Geo-locations

Modern Data Architecture
Complement your existing data systems: the right workload in the right place

Efficiency

Opportunity

Page 6: Apache Hadoop on the Open Cloud

Opportunity in types of data

1. Sentiment: Understand how your customers feel about your brand and products – right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Unstructured (text, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents

Value

Page 7: Apache Hadoop on the Open Cloud

New Types of Data = New Business Apps

• Unlock new OPPORTUNITY via analytic apps built around new types of data
– Sentiment (social media)
– Clickstream
– Machine / sensor data
– Geo / tracking data
– Web Logs
– Unstructured (video, pictures, free text)
• Business case driven
• LOB / Business IT oriented

[Diagram labels: ENTERPRISE HADOOP PLATFORM; Business Analytic App; New Sources (sentiment, clickstream, geo, sensor, …)]

Page 8: Apache Hadoop on the Open Cloud

© Hortonworks Inc. 2012 © Hortonworks Inc. 2013. Confidential and Proprietary.

Requirements for Enterprise Hadoop

1. Key Services: Platform, operational and data services essential for the enterprise
2. Skills: Leverage your existing skills: development, operations, analytics
3. Interoperable: Integrated with existing data center investments

Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots

[Diagram: HORTONWORKS DATA PLATFORM (HDP) stack. Layers: PLATFORM SERVICES, OPERATIONAL SERVICES, DATA SERVICES, CORE. Components shown: HDFS, YARN, MAP REDUCE, TEZ, HIVE & HCATALOG, PIG, HBASE, SQOOP, FLUME, NFS, WebHDFS, LOAD & EXTRACT, KNOX*, OOZIE, AMBARI, FALCON*. Deployment options: OS/VM, Cloud, Appliance.]

Page 9: Apache Hadoop on the Open Cloud

Requirements for Enterprise Hadoop

1. Key Services: Platform, operational and data services essential for the enterprise
2. Skills: Leverage your existing skills: development, operations, analytics
3. Interoperable: Integrated with existing data center investments

Develop: Java, C, C++, .NET, Python, Pig

Operate: Tools, Consoles, Scriptable APIs

Analyze: SQL, R, SAS, Excel

Page 10: Apache Hadoop on the Open Cloud

Requirements for Enterprise Hadoop

3. Interoperable: Integrated with existing data center investments

Integrate with
• Applications: Business Intelligence, Developer IDEs, Data Integration
• Systems: Data Systems & Storage, Systems Management
• Platforms: Operating Systems, Virtualization, Cloud, Appliances

[Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DEV & DATA TOOLS (BUILD & TEST); OPERATIONAL TOOLS (MANAGE & MONITOR); DATA SYSTEM REPOSITORIES (RDBMS, EDW, MPP); SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured)]

Page 11: Apache Hadoop on the Open Cloud

Hortonworks Apache Hadoop + OpenStack

Page 12: Apache Hadoop on the Open Cloud

Swift Filesystem for Hadoop: HADOOP-8545

• New Hadoop filesystem URL, swift://

• Read from, write to Swift object stores

• Local and Remote

• Anywhere you can use hdfs:// URLs
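
The shape of a `swift://` URL can be illustrated with a short sketch. It assumes the `swift://<container>.<service>/<path>` layout used by the Hadoop OpenStack module, where the host part names a Swift container and a configured service endpoint. The container, service and path names are hypothetical, and the helper is illustrative only, not part of Hadoop.

```python
from urllib.parse import urlparse


def split_swift_url(url):
    """Split a swift:// URL into (container, service, object_path).

    Assumes the swift://<container>.<service>/<path> convention.
    This helper is hypothetical and not part of Hadoop.
    """
    parsed = urlparse(url)
    if parsed.scheme != "swift":
        raise ValueError("not a swift:// URL: " + url)
    # The host part carries both the container and the service name.
    container, dot, service = parsed.netloc.partition(".")
    if not (container and dot and service):
        raise ValueError("expected <container>.<service>, got: " + parsed.netloc)
    # An empty path refers to the root of the container.
    return container, service, parsed.path or "/"


if __name__ == "__main__":
    # Hypothetical container "weblogs" on a service configured as "rackspace".
    print(split_swift_url("swift://weblogs.rackspace/2013/10/access.log"))
```

Because the scheme is the only part that differs from an `hdfs://` URL, a job that takes its input and output locations as URLs should, in principle, need no code changes to read from or write to Swift.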

Page 13: Apache Hadoop on the Open Cloud

Swift for Persistence – HDFS for Performance

[Diagram: four Hadoop VMs and five Swift servers; a file is shown split into block1, block2 and block3]
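
One way to read this slide is to keep the durable copy of the data in Swift and stage it into HDFS only while a job needs fast local reads. The sketch below illustrates that staging pattern under stated assumptions: it only builds `hadoop distcp` command lines and does not run them, and the container, service, NameNode host and directory layout are all hypothetical. Whether DistCp is the right copy tool for a given cluster is also an assumption.

```python
def distcp_command(source, destination):
    """Return the argv list for copying source to destination with DistCp."""
    return ["hadoop", "distcp", source, destination]


def staging_plan(swift_base, hdfs_base, dataset):
    """Build the stage-in and stage-out copies for one dataset.

    The input/output directory layout is an illustrative assumption.
    """
    swift_in = f"{swift_base}/{dataset}/input"
    swift_out = f"{swift_base}/{dataset}/output"
    hdfs_in = f"{hdfs_base}/{dataset}/input"
    hdfs_out = f"{hdfs_base}/{dataset}/output"
    return [
        # Stage in: pull the persistent copy from Swift into HDFS.
        distcp_command(swift_in, hdfs_in),
        # Stage out: push job results from HDFS back to Swift.
        distcp_command(hdfs_out, swift_out),
    ]


if __name__ == "__main__":
    plan = staging_plan(
        "swift://weblogs.rackspace",      # hypothetical container.service
        "hdfs://namenode:8020/staging",   # hypothetical NameNode address
        "clickstream",
    )
    for argv in plan:
        print(" ".join(argv))
```

The job itself would run between the two copies, reading and writing only HDFS paths, so the cluster can be discarded afterwards without losing data.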

Page 14: Apache Hadoop on the Open Cloud

Demo

Page 15: Apache Hadoop on the Open Cloud

Hadoop in the Cloud Use Cases

Page 16: Apache Hadoop on the Open Cloud

Advantages of using the cloud

Fast

Easy

Flexible

Page 17: Apache Hadoop on the Open Cloud

Development / POC Clusters

Page 18: Apache Hadoop on the Open Cloud

Dynamic Clusters

Page 19: Apache Hadoop on the Open Cloud

Growth Clusters

Page 20: Apache Hadoop on the Open Cloud

Your data is already in the Cloud

Page 21: Apache Hadoop on the Open Cloud

Cloud Big Data Platform

• Hortonworks Data Platform
• HDP 1.1
• HDP 1.3
• Pig, Hive, HCatalog
• Coming soon: HDP 2.0

Page 22: Apache Hadoop on the Open Cloud

Cloud Big Data Platform

•  Secure by default

•  Comes pre-optimized

•  Web UI, CLI, REST API

Page 23: Apache Hadoop on the Open Cloud

Built on OpenStack

Page 24: Apache Hadoop on the Open Cloud

Why an Open Platform matters

Sandbox on Rackspace Cloud

Sandbox VM

RAX Resell

Page 25: Apache Hadoop on the Open Cloud

Next Steps:

More about Rackspace Cloud Big Data Platform http://www.rackspace.com/cloud/big-data

Get started on Hadoop with Hortonworks Sandbox http://hortonworks.com/sandbox

Follow us: @hortonworks @Rackspace