hadoop on azure

Hadoop on Azure Lynn Langit Practitioner, Author, Instructor Nov 2012 – DevelopMentor / London

Upload: lynn-langit

Post on 15-Jan-2015


DESCRIPTION

Deck from DevelopMentor (DM), Nov 2012

TRANSCRIPT

Page 1: Hadoop on Azure

Hadoop on Azure

Lynn Langit

Practitioner, Author, Instructor

Nov 2012 – DevelopMentor / London

Page 2: Hadoop on Azure

Hadoop = BigData?

• HUGE Hype factor in 2011 / 2012

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license

• enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers

Page 3: Hadoop on Azure

Oracle Loader for Hadoop

SQL Server Connector for Hadoop

Page 4: Hadoop on Azure

Flavors of NoSQL

Page 5: Hadoop on Azure

Column Database

Wide, sparse column sets

Page 6: Hadoop on Azure

RDBMS vs. Hadoop

Attribute | Traditional RDBMS | Hadoop
Data Size | Gigabytes (Terabytes) | Petabytes (Exabytes)
Access | Interactive and Batch | Batch – NOT Interactive
Updates | Read / Write many times | Write once, Read many times
Structure | Static Schema | Dynamic Schema
Integrity | High (ACID) | Low
Scaling | Nonlinear | Linear
Query Response Time | Can be near immediate | Has latency (due to batch processing)

Page 7: Hadoop on Azure

What about the cloud?

Page 8: Hadoop on Azure

The reality…two pivots

Storage Methods
• SQL (RDBMS)
• Hadoop

Storage Locations
• On premises
• Cloud-hosted

Page 9: Hadoop on Azure

Demo - Setting up Your Cluster

Page 10: Hadoop on Azure

Cluster Allocation Process

Page 11: Hadoop on Azure

Working with Hadoop on Azure

Tools / Languages
• MapReduce
  – Map (query/format)
  – Reduce (aggregate)
  – plug-in for Eclipse (Java)
  – JavaScript
  – C# Streaming
• Pig (ETL -- Java)
• Hive (HQL Query)
• HBase tables
• Others
  – Mahout (analyze)
  – R (analyze)
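
The Map (query/format) and Reduce (aggregate) roles listed on this slide can be illustrated with a small in-memory sketch. This is plain Python, not Hadoop code and not the code from the demos: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the grouping step that a cluster performs across nodes is simulated here with a dictionary.

```python
from collections import defaultdict


def map_phase(line):
    """Map: format one input line into (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key (simulates the framework's step between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate every value collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

On a real cluster the map calls run in parallel on the nodes that hold the data, and only the grouped intermediate pairs travel to the reducers.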

Page 12: Hadoop on Azure

Tasks – DBA vs. Hadoop on Azure

RDBMS | Hadoop on Azure
Import Data | Upload Data using FTP or import via Sqoop
Set Up Security | Set Up Security
Scale Compute (up or out) | Add child nodes to the cluster
Perform a Backup | Monitor and replace failed nodes
Restore a Database | n/a
Clean up data via ETL | Execute a PIG job
Create an Index – query tune | Write a HIVE query (HQL)
Join Tables Together | Run MapReduce
n/a | Monitor and manage running MapReduce jobs
Schedule a Job | Schedule a (Cron) Job
Run Database Maintenance | Monitor space and resources used
Send an Email from SQL Server | Set up resource threshold alerts
Manage License costs | Manage usage time charges

Page 13: Hadoop on Azure

Demo - Basic Administration

Open Ports, Interactive, Remote…

Page 14: Hadoop on Azure

Demo - Basic Administration

Connect via RDP

Page 15: Hadoop on Azure

NameNode Utility – Top Level

Page 16: Hadoop on Azure

NameNode Utility – Drill Down

Page 17: Hadoop on Azure

Demo - Basic Administration

Page 18: Hadoop on Azure

Configuring Upload from Azure

Page 19: Hadoop on Azure

Using the Azure Storage Viewer

Page 20: Hadoop on Azure

Configuring Upload from MarketPlace

Page 21: Hadoop on Azure

Asking Questions = MapReduce

Page 22: Hadoop on Azure

Samples

Page 23: Hadoop on Azure

More Samples

Page 24: Hadoop on Azure

Demo - MapReduce using Java

• WordCount example

Page 25: Hadoop on Azure

Demo - MapReduce using C# Streaming

• WordCount example
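
Hadoop Streaming lets any executable act as a mapper or reducer: it reads lines on standard input and writes tab-separated key/value lines on standard output, and the reducer receives its input sorted by key. The transcript does not include the C# demo code, so the following is only a sketch of that contract in Python. The names `mapper`, `reducer`, and `run_local` are hypothetical, and `run_local` simulates the framework's sort step with an in-memory sort.

```python
import io
from itertools import groupby


def mapper(stdin, stdout):
    """Emit one 'word<TAB>1' line per word read from stdin."""
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word.lower()}\t1\n")


def reducer(stdin, stdout):
    """Sum the counts of consecutive lines that share a key (input must be sorted)."""
    rows = (line.rstrip("\n").split("\t", 1) for line in stdin)
    for word, group in groupby(rows, key=lambda row: row[0]):
        total = sum(int(count) for _, count in group)
        stdout.write(f"{word}\t{total}\n")


def run_local(text):
    """Simulate map -> sort -> reduce locally on a string of input text."""
    mapped = io.StringIO()
    mapper(io.StringIO(text), mapped)
    # Stand-in for the sort-by-key that the framework performs between phases.
    shuffled = "".join(sorted(mapped.getvalue().splitlines(keepends=True)))
    reduced = io.StringIO()
    reducer(io.StringIO(shuffled), reduced)
    return reduced.getvalue()


if __name__ == "__main__":
    print(run_local("b a\na b a\n"), end="")
```

Because the two functions only touch file-like objects, the same logic could be split into two scripts that read `sys.stdin` and write `sys.stdout` when submitted as a streaming job.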

Page 26: Hadoop on Azure

Demo - MapReduce using JavaScript

• WordCount example

Page 27: Hadoop on Azure

Demo - Using HIVE

• WordCount example

Page 28: Hadoop on Azure

Demo - Using HIVE

Page 29: Hadoop on Azure

Monitoring Job Results
• In the portal
  – Main Console
    • Job icon (button) status summary
    • Job History
  – Interactive Console
    • JS quick feedback
    • JS detailed feedback (log)
• Using RDP
  – Map/Reduce tool

Page 30: Hadoop on Azure

Demo – Monitoring Job Status

Page 31: Hadoop on Azure

Download – ODBC for HIVE

• Includes add-in for Excel

Page 32: Hadoop on Azure

Demo - Hadoop Connector to Excel

Page 33: Hadoop on Azure

Connecting to PowerPivot

• Create an ODBC connection to HIVE
• Connect to ‘other data source’ in PowerPivot

Page 34: Hadoop on Azure

Case Study - Klout

Page 35: Hadoop on Azure

Real-World – Hadoop and…

Facebook runs on Hadoop & MySQL

Twitter runs on Hadoop (ran on FlockDB/graph)

Yahoo runs on Hadoop

LinkedIn runs on Hadoop & Voldemort

Klout runs Hadoop (on Azure) & HBase (Hive) & SQL Server SSAS BISM cubes

Page 36: Hadoop on Azure

Hadoop To-Do List

BigData = Hadoop
• Use Hadoop when business needs designate
• Use other NoSQL if a better fit

Hadoop on the cloud
• Quick and cheap
• Specialized use cases
  – Behavioral data
  – dev, test, training environments

Hadoop access technologies
• Learn Map/Reduce
• Use HIVE via Excel
• Pay attention to Impala

Page 37: Hadoop on Azure

The Changing Data Landscape

Hadoop

RDBMS

Other Services

Page 38: Hadoop on Azure

www.TeachingKidsProgramming.org
• Free Courseware (recipes)
• Do a Recipe, Teach a Kid (Ages 10++)
• Java or Microsoft SmallBasic

Page 39: Hadoop on Azure

Toward Data Craftsmanship…

Follow me @LynnLangit

RSS my blog www.LynnLangit.com

Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions