hadoop on azure

Posted on 15-Jan-2015

DESCRIPTION

Deck from DevelopMentor (DM), Nov 2012

TRANSCRIPT

Hadoop on Azure

Lynn Langit

Practitioner, Author, Instructor

Nov 2012 – DevelopMentor / London

Hadoop = BigData?

• HUGE Hype factor in 2011 / 2012

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
• enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers

Oracle Loader for Hadoop

SQL Server Connector for Hadoop

Flavors of NoSQL

Column Database

Wide, sparse column sets

RDBMS vs. Hadoop

| | Traditional RDBMS | Hadoop |
|---|---|---|
| Data Size | Gigabytes (Terabytes) | Petabytes (Exabytes) |
| Access | Interactive and Batch | Batch – NOT Interactive |
| Updates | Read / Write many times | Write once, Read many times |
| Structure | Static Schema | Dynamic Schema |
| Integrity | High (ACID) | Low |
| Scaling | Nonlinear | Linear |
| Query Response Time | Can be near immediate | Has latency (due to batch processing) |

What about the cloud?

The reality…two pivots

Storage Methods
• SQL (RDBMS)
• Hadoop

Storage Locations
• On premises
• Cloud-hosted

Demo - Setting up Your Cluster

Cluster Allocation Process

Working with Hadoop on Azure

Tools / Languages
• MapReduce
  – Map (query/format)
  – Reduce (aggregate)
  – plug-in for Eclipse (Java)
  – JavaScript
  – C# Streaming
• Pig (ETL -- Java)
• Hive (HQL Query)
• HBase tables
• Others
  – Mahout (analyze)
  – R (analyze)
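The Map (query/format) and Reduce (aggregate) steps listed above can be illustrated with a small in-memory sketch in Python. This is not Hadoop code and is not taken from the deck's demos; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only mimic, on a single machine, what the framework would do across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop on Azure", "hadoop on the cloud"]))
```

On a real cluster the map and reduce functions would run on different nodes, and the grouping step would be handled by the framework rather than by a local dictionary.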

Tasks – DBA vs. Hadoop on Azure

| RDBMS | Hadoop on Azure |
|---|---|
| Import Data | Upload Data using FTP or import via Sqoop |
| Set up Security | Set up Security |
| Scale Compute (up or out) | Add child nodes to the cluster |
| Perform a Backup | Monitor and replace failed nodes |
| Restore a Database | n/a |
| Clean up data via ETL | Execute a PIG job |
| Create an Index – query tune | Write a HIVE query (HQL) |
| Join Tables Together | Run MapReduce |
| n/a | Monitor and manage running MapReduce jobs |
| Schedule a Job | Schedule a (Cron) Job |
| Run Database Maintenance | Monitor space and resources used |
| Send an Email from SQL Server | Set up resource threshold alerts |
| Manage License costs | Manage usage time charges |

Demo - Basic Administration

Open Ports, Interactive, Remote…

Connect via RDP

NameNode Utility – Top Level

NameNode Utility – Drill Down

Configuring Upload from Azure

Using the Azure Storage Viewer

Configuring Upload from MarketPlace

Asking Questions = MapReduce

Samples

More Samples

Demo - MapReduce using Java

• WordCount example

Demo - MapReduce using C# Streaming

• WordCount example
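A streaming-style WordCount follows a simple contract: the mapper emits tab-separated `key<TAB>value` lines, the framework sorts them by key, and the reducer sums runs of identical keys. The sketch below shows that contract in Python rather than the C# used in the demo, so it is a stand-in and not the demo's code. The names `mapper` and `reducer` are hypothetical, and `sorted()` simulates the framework's sort step locally.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum counts for each run of identical keys; input must be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Local simulation: sorted() stands in for the framework's sort/shuffle.
    sample = ["Hadoop on Azure", "hadoop on the cloud"]
    for record in reducer(sorted(mapper(sample))):
        print(record)
```

In an actual streaming job the mapper and reducer would typically be separate executables reading standard input and writing standard output; here they are plain generator functions so the flow can be tried without a cluster.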

Demo - MapReduce using JavaScript

• WordCount example

Demo - Using HIVE

• WordCount example

Monitoring Job Results
• In the portal
  – Main Console
    • Job icon (button) status summary
    • Job History
  – Interactive Console
    • JS quick feedback
    • JS detailed feedback (log)
• Using RDP
  – Map/Reduce tool

Demo – Monitoring Job Status

Download – ODBC for HIVE

• Includes add-in for Excel

Demo - Hadoop Connector to Excel

Connecting to PowerPivot

• Create an ODBC connection to HIVE
• Connect to ‘other data source’ in PowerPivot

Case Study - Klout

Real-World – Hadoop and…

Facebook runs on Hadoop & MySQL

Twitter runs on Hadoop (ran on FlockDb/graph)

Yahoo runs on Hadoop

LinkedIn runs on Hadoop & Voldemort

Klout runs Hadoop (on Azure) & HBase (Hive) & SQL Server SSAS BISM cubes

Hadoop To-Do List

BigData = Hadoop
• Use Hadoop when business needs dictate
• Use other NoSQL if a better fit

Hadoop on the cloud
• Quick and cheap
• Specialized use cases
  – Behavioral data
  – dev, test, training environments

Hadoop access technologies
• Learn Map/Reduce
• Use HIVE via Excel
• Pay attention to Impala

The Changing Data Landscape

Hadoop

RDBMS

Other Services

www.TeachingKidsProgramming.org
• Free Courseware (recipes)
• Do a Recipe, Teach a Kid (Ages 10++)
• Java or Microsoft SmallBasic

Toward Data Craftsmanship…

Follow me @LynnLangit

RSS my blog www.LynnLangit.com

Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions
