hadoop on azure

Hadoop on Azure. Lynn Langit: Practitioner, Author, Instructor. June 2012, for SoCalCodeCamp

Upload: lynn-langit

Post on 01-Nov-2014

DESCRIPTION

Deck for SoCalCodeCamp, June 2012

TRANSCRIPT

Page 1: Hadoop on Azure

Hadoop on Azure

Lynn Langit

Practitioner, Author, Instructor

June 2012, for SoCalCodeCamp

Page 2: Hadoop on Azure

Hadoop = BigData?

• HUGE Hype factor in 2011 / 2012

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
• enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers

Page 3: Hadoop on Azure

Oracle Loader for Hadoop

SQL Server Connector for Hadoop

Page 4: Hadoop on Azure

Flavors of NoSQL

Page 5: Hadoop on Azure

Column Database

Wide, sparse column sets

Page 6: Hadoop on Azure

RDBMS vs. Hadoop

                      Traditional RDBMS          Hadoop
Data Size             Gigabytes (Terabytes)      Petabytes (Exabytes)
Access                Interactive and Batch      Batch, NOT Interactive
Updates               Read / Write many times    Write once, Read many times
Structure             Static Schema              Dynamic Schema
Integrity             High (ACID)                Low
Scaling               Nonlinear                  Linear
Query Response Time   Can be near immediate      Has latency (due to batch processing)
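One common reading of the "Dynamic Schema" row is schema-on-read: records are stored as written, and a structure is applied only when a job reads them. The sketch below illustrates that idea in plain Python. The sample records, the field names, and the helper `read_with_schema` are all hypothetical and are not taken from the deck.

```python
import json

# Hypothetical raw records, stored exactly as written; fields vary per line.
RAW_LINES = [
    '{"user": "ann", "clicks": 3}',
    '{"user": "bob", "clicks": 5, "country": "US"}',
    '{"user": "cy"}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: pick the requested fields, None if absent."""
    for line in lines:
        record = json.loads(line)
        yield tuple(record.get(name) for name in fields)


# The "schema" is chosen by the reader, not enforced when the data was written.
rows = list(read_with_schema(RAW_LINES, ["user", "clicks"]))
```

A static-schema RDBMS would instead reject or reshape the second and third records at write time.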

Page 7: Hadoop on Azure

What about the cloud?

Page 8: Hadoop on Azure

The reality…two pivots

Storage Methods
• SQL (RDBMS)
• Hadoop

Storage Locations
• On premises
• Cloud-hosted

Page 9: Hadoop on Azure

Demo - Setting up Your Cluster

Page 10: Hadoop on Azure

Cluster Allocation Process

Page 11: Hadoop on Azure

Working with Hadoop on Azure

Tools / Languages
• MapReduce
  – Map (query/format)
  – Reduce (aggregate)
  – plug-in for Eclipse (Java)
  – JavaScript
  – C# Streaming
• Pig (ETL -- Java)
• Hive (HQL Query)
• HBase tables
• Others
  – Mahout (analyze)
  – R (analyze)
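The Map (query/format) and Reduce (aggregate) roles listed above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the pattern, not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this example, and the `shuffle` step stands in for the grouping that the framework performs between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: aggregate all values emitted for one key."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
```

Calling `word_count(["big data", "Big Hadoop"])` groups both spellings of "big" under one key before the reduce step sums them.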

Page 12: Hadoop on Azure

Tasks – DBA vs. Hadoop on Azure

RDBMS                           Hadoop on Azure
Import Data                     Upload Data using FTP or import via Sqoop
Set up Security                 Set up Security
Scale Compute (up or out)       Add child nodes to the cluster
Perform a Backup                Monitor and replace failed nodes
Restore a Database              n/a
Clean up data via ETL           Execute a PIG job
Create an Index – query tune    Write a HIVE query (HQL)
Join Tables Together            Run MapReduce
n/a                             Monitor and manage running MapReduce jobs
Schedule a Job                  Schedule a (Cron) Job
Run Database Maintenance        Monitor space and resources used
Send an Email from SQL Server   Set up resource threshold alerts
Manage License costs            Manage usage time charges

Page 13: Hadoop on Azure

Demo - Basic Administration

Open Ports

Page 14: Hadoop on Azure

Demo - Basic Administration

Connect via RDP

Page 15: Hadoop on Azure

NameNode Utility – Top Level

Page 16: Hadoop on Azure

NameNode Utility – Drill Down

Page 17: Hadoop on Azure

Demo - Basic Administration

Configure connections to remote storage

Page 18: Hadoop on Azure

Configuring Upload from AWS S3

Page 19: Hadoop on Azure

Configuring Upload from Azure

Page 20: Hadoop on Azure

Using the Azure Storage Viewer

Page 21: Hadoop on Azure

Configuring Upload from DataMarket

Page 22: Hadoop on Azure

Asking Questions = MapReduce

Page 23: Hadoop on Azure

Samples

Page 24: Hadoop on Azure

Demo - MapReduce using Java

• WordCount example using AWS S3 data

Page 25: Hadoop on Azure

Demo - MapReduce using C# Streaming

• WordCount example
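The demo on this slide used C#. As a language-neutral illustration of the same Hadoop Streaming contract, here is a minimal Python sketch. It assumes the usual streaming conventions: the mapper reads text lines and emits tab-separated `key<TAB>value` lines, and the reducer receives those lines already sorted by key. The function names `mapper` and `reducer` are hypothetical, and in a real job each would read `sys.stdin` and print its output lines.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum counts per word; input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local simulation of the pipeline: map, sort by key, then reduce.
mapped = sorted(mapper(["Big data\n", "big Hadoop\n"]))
reduced = list(reducer(mapped))
```

The `sorted` call plays the part of the framework's sort between the two steps, which is what lets the reducer use `groupby` without holding all keys in memory.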

Page 26: Hadoop on Azure

Demo - MapReduce using JavaScript

• WordCount example

Page 27: Hadoop on Azure

Demo - Using HIVE

• WordCount example

Page 28: Hadoop on Azure

Demo - Using HIVE

Page 29: Hadoop on Azure

Monitoring Job Results
• In the portal
  – Main Console
    • Job icon (button) status summary
    • Job History
  – Interactive Console
    • JS quick feedback
    • JS detailed feedback (log)
• Using RDP
  – Map/Reduce tool

Page 30: Hadoop on Azure

Demo – Monitoring Job Status

Page 31: Hadoop on Azure

Download – ODBC for HIVE

• Includes add-in for Excel

Page 32: Hadoop on Azure

Demo - Hadoop Connector to Excel

Page 33: Hadoop on Azure

Connecting to PowerPivot

• Create an ODBC connection to HIVE
• Connect to ‘other data source’ in PowerPivot

Page 34: Hadoop on Azure

Real-World – Hadoop and…

Facebook runs on Hadoop & MySQL

Twitter runs on Hadoop (ran on FlockDB/graph)

Yahoo runs on Hadoop

LinkedIn runs on Hadoop & Voldemort

Klout runs Hadoop (on Azure) & HBase (Hive) & SQL Server SSAS BISM cubes

Page 35: Hadoop on Azure

Hadoop To-Do List

BigData = Hadoop
• Use Hadoop when business needs dictate

Hadoop on the cloud
• Quick and cheap
• Specialized use cases
• Behavioral data
• dev, test, training environments

Hadoop access technologies
• Learn Map/Reduce
• Use HIVE via Excel

Page 36: Hadoop on Azure

The Changing Data Landscape

Hadoop

RDBMS

Other Services

Page 37: Hadoop on Azure

TeachingKidsProgramming.org

Do a Recipe, Teach a Kid (Ages 10+)

SmallBasic or Java

Free Courseware (recipes)

Page 38: Hadoop on Azure

Toward Data Craftsmanship…

Follow me @LynnLangit

RSS my blog www.LynnLangit.com

Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions