Apache Hadoop on the Open Cloud
Deck for our “Apache Hadoop in the Open Cloud” webinar with Rackspace.
© Hortonworks Inc. 2013
Quick Housekeeping
Q&A box is available for your questions
Webinar will be recorded for future viewing
Thank You for joining!
Apache Hadoop on the Open Cloud
Your Presenters
• Nirmal Ranganathan (@rnirmal)
– Software Developer @Rackspace
– Active contributor to various OpenStack projects including Nova, Cinder and Trove
– Fun fact: Currently building the Rackspace Cloud Big Data Platform
• Steve Loughran (@steveloughran)
– Member of Technical Staff @hortonworks
– Hadoop committer since 2008
– Fun fact: “I break things, a lot.”
Today’s Topics
• Introduction
• Key Drivers for Hadoop
• Overview of Reference Architecture for Apache Hadoop-ready Infrastructure
• Behind-the-scenes look (demo) at the Rackspace Cloud Big Data Platform
• Q&A
Drivers of Hadoop Adoption
Business Applications: Use Hadoop to extract insights that enable new customer value and competitive edge
Types of Big Data:
• CRM, ERP
• Server log
• Clickstream
• Sentiment/Social
• Machine/Sensor
• Geo-locations
Modern Data Architecture: Complement your existing data systems, with the right workload in the right place
Efficiency
Opportunity
Opportunity in types of data
1. Sentiment: Understand how your customers feel about your brand and products, right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Unstructured (text, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
Value
New Types of Data = New Business Apps
• Unlock new OPPORTUNITY via analytic apps built around new types of data
– Sentiment (social media)
– Clickstream
– Machine / sensor data
– Geo / tracking data
– Web Logs
– Unstructured (video, pictures, free text)
• Business case driven
• LOB / Business IT oriented
[Diagram labels: Enterprise Hadoop Platform; Business Analytic App; New Sources (sentiment, clickstream, geo, sensor, …)]
© Hortonworks Inc. 2012 © Hortonworks Inc. 2013. Confidential and Proprietary.
Requirements for Enterprise Hadoop
1. Key Services: Platform, operational and data services essential for the enterprise
2. Skills: Leverage your existing skills in development, operations and analytics
3. Interoperable: Integrated with existing data center investments
[Diagram: Hortonworks Data Platform (HDP)]
• Service layers: Operational Services, Data Services, Core, Platform Services
• Components: HDFS, YARN, MapReduce, Tez, Hive & HCatalog, Pig, HBase, Oozie, Ambari, Falcon*, Knox*
• Load & Extract: Sqoop, Flume, NFS, WebHDFS
• Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
• Deployment targets: OS/VM, Cloud, Appliance
Requirements for Enterprise Hadoop
2. Skills: Leverage your existing skills in development, operations and analytics
• Develop: Java, C, C++, .NET, Python, Pig
• Operate: Tools, Consoles, Scriptable APIs
• Analyze: SQL, R, SAS, Excel
Requirements for Enterprise Hadoop
3. Interoperable: Integrated with existing data center investments
Integrate with:
• Applications: Business Intelligence, Developer IDEs, Data Integration
• Systems: Data Systems & Storage, Systems Management
• Platforms: Operating Systems, Virtualization, Cloud, Appliances
[Diagram]
• Applications: Business Analytics, Custom Applications, Packaged Applications
• Dev & Data Tools: Build & Test
• Data System: Repositories (RDBMS, EDW, MPP)
• Operational Tools: Manage & Monitor
• Sources: Existing Sources (CRM, ERP, Clickstream, Logs), Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
Hortonworks Apache Hadoop + OpenStack
Swift Filesystem for Hadoop: HADOOP-8545
• New Hadoop filesystem URL, swift://
• Read from, write to Swift object stores
• Local and Remote
• Anywhere you can use hdfs:// URLs
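To make the swift:// bullets concrete, here is a minimal Python sketch of how such a URL can be broken down. It assumes the swift://container.service/path layout, where the host part names a Swift container and a configured service; that layout, the function name parse_swift_url, the SwiftLocation type and the example container and service names are illustrative assumptions, not taken from the deck.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SwiftLocation:
    container: str    # Swift container holding the object
    service: str      # name of the configured Swift service (assumed layout)
    object_path: str  # path of the object inside the container


def parse_swift_url(url: str) -> SwiftLocation:
    """Split a swift:// URL into container, service and object path."""
    parts = urlparse(url)
    if parts.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {url!r}")
    # Assumed host layout: <container>.<service>
    container, sep, service = parts.netloc.partition(".")
    if not (container and sep and service):
        raise ValueError(f"expected swift://container.service/path, got {url!r}")
    return SwiftLocation(container, service, parts.path or "/")


# Hypothetical example URL; the container and service names are made up.
loc = parse_swift_url("swift://logs.rackspace/2013/10/access.log")
print(loc.container, loc.service, loc.object_path)
```

A job that accepts hdfs:// paths could run a check like this on its arguments before submission, so that a malformed swift:// location fails early rather than partway through a run.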
Swift for Persistence – HDFS for Performance
[Diagram labels: Hadoop VMs; Swift Servers; a file split into block1, block2, block3]
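One way to read this slide is as a staging pattern: data is kept in Swift between runs and copied into HDFS while a cluster is working on it. The Python sketch below only builds the command lines for such a round trip with hadoop distcp and does not execute anything. The helper name staging_commands, the input/output subdirectory convention and every URL shown are hypothetical.

```python
from typing import List


def staging_commands(swift_in: str, hdfs_work: str, swift_out: str) -> List[List[str]]:
    """Return argv lists for copying Swift -> HDFS, then results HDFS -> Swift."""
    for url in (swift_in, swift_out):
        if not url.startswith("swift://"):
            raise ValueError(f"expected a swift:// URL, got {url!r}")
    if not hdfs_work.startswith("hdfs://"):
        raise ValueError(f"expected an hdfs:// URL, got {hdfs_work!r}")
    work = hdfs_work.rstrip("/")
    return [
        # Stage input from the persistent store into the cluster's HDFS.
        ["hadoop", "distcp", swift_in, f"{work}/input"],
        # After the job has run, persist its output back to Swift.
        ["hadoop", "distcp", f"{work}/output", swift_out],
    ]


# Hypothetical locations, for illustration only.
commands = staging_commands(
    "swift://logs.rackspace/raw",
    "hdfs://namenode:8020/jobs/run1/",
    "swift://results.rackspace/run1",
)
for argv in commands:
    print(" ".join(argv))
```

Keeping the commands as argv lists rather than shell strings means they can be passed to subprocess.run without quoting problems if the sketch is extended to execute them.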
Demo
Hadoop in the Cloud Use Cases
Advantages of using the cloud
• Fast
• Easy
• Flexible
Development / POC Clusters
Dynamic Clusters
Growth Clusters
Your data is already in the Cloud
Cloud Big Data Platform
• Hortonworks Data Platform
• HDP 1.1
• HDP 1.3
• Pig, Hive, HCatalog
• Coming soon: HDP 2.0
Cloud Big Data Platform
• Secure by default
• Comes pre-optimized
• Web UI, CLI, REST API
Built on OpenStack
Why an Open Platform matters
• Sandbox on Rackspace Cloud
• Sandbox VM
• RAX Resell
Next Steps:
• More about Rackspace Cloud Big Data Platform: http://www.rackspace.com/cloud/big-data
• Get started on Hadoop with Hortonworks Sandbox: http://hortonworks.com/sandbox
Follow us: @hortonworks @Rackspace