HDInsight Hadoop on Windows Azure


Upload: lynn-langit

Post on 06-May-2015


DESCRIPTION

Introduction to HDInsight Hadoop on Windows Azure services, including using the interactive console with JavaScript and running WordCount via other methods (Streaming, Hive, etc.).

TRANSCRIPT

Page 1: HDInsight Hadoop on Windows Azure

Hadoop on Azure

@LynnLangit

Page 2: HDInsight Hadoop on Windows Azure

Data Expertise / Lynn Langit

Practicing Architect

Cloud Deployments (Azure, AWS, Google)

Technical author / trainer

• Google Cloud Developer Series
• SQL Server 2012 Developer Series
• Cloudera Certified Developer
• 2 books on SQL Server BI

Industry awards

• Microsoft – MVP for SQL Server
• Google – GDE for Cloud Platform
• 10Gen – Master for MongoDB

Former MSFT FTE

4 years

Page 3: HDInsight Hadoop on Windows Azure

What is Hadoop?

HUGE Hype factor in 2011 / 2012

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license

• Uses HDFS storage to enable applications to work with thousands of nodes and petabytes of data
• Uses MapReduce to process the data
• Inspired by Google
  • MapReduce
  • Google File System

Page 4: HDInsight Hadoop on Windows Azure

What is HDInsight?

Hadoop on Windows, on Azure or on-premises

Microsoft worked with Hortonworks to port Hadoop to Windows (from Linux)

Page 5: HDInsight Hadoop on Windows Azure

Working with HDInsight

Page 6: HDInsight Hadoop on Windows Azure

RDBMS vs. Hadoop

Attribute           | RDBMS                   | Hadoop
Data Size           | Gigabytes (Terabytes)   | Petabytes (Exabytes)
Access              | Interactive and Batch   | Batch – NOT Interactive
Updates             | Read / Write many times | Write once, Read many times
Structure           | Static Schema           | Dynamic Schema
Integrity           | High (ACID)             | Low
Scaling             | Nonlinear               | Linear
Query Response Time | Can be near immediate   | Has latency (due to batch processing)

Page 7: HDInsight Hadoop on Windows Azure

Setting Up Your Cluster

Page 8: HDInsight Hadoop on Windows Azure

Configuration 1

Page 9: HDInsight Hadoop on Windows Azure

Configuration 2

Page 10: HDInsight Hadoop on Windows Azure

Pricing (during Preview)

Page 11: HDInsight Hadoop on Windows Azure

Demo

Page 12: HDInsight Hadoop on Windows Azure

Basic Administration

Connect via RDP

Page 13: HDInsight Hadoop on Windows Azure

NameNode Utility – Top Level

Page 14: HDInsight Hadoop on Windows Azure

NameNode Utility – Drill Down

Page 15: HDInsight Hadoop on Windows Azure

Understanding Storage

Page 16: HDInsight Hadoop on Windows Azure

Using the Azure Storage Viewer

Page 17: HDInsight Hadoop on Windows Azure

What is MapReduce?
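
As a rough illustration of the idea behind this slide, the three MapReduce stages (map, shuffle, reduce) can be sketched in memory for WordCount. This is a single-process sketch, not the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "The lazy dog"]))
```

The mapper never sees more than one line and the reducer never sees more than one key, which is what lets the framework spread the work over many machines.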

Page 18: HDInsight Hadoop on Windows Azure

MapReduce using Java

WordCount example

Page 19: HDInsight Hadoop on Windows Azure

MapReduce using C# Streaming

WordCount example
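
The slide's C# code is not in the transcript, so the following is only a sketch of the streaming contract, written in Python rather than C#. It assumes the usual Hadoop Streaming convention: the mapper reads raw lines on stdin and writes tab-separated key/value lines to stdout, the framework sorts those lines by key, and the reducer reads the sorted lines on stdin. The `map`/`reduce` command-line argument is a hypothetical convention for this sketch.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word found in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: the first argument selects the role.
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Locally, the sort that the framework performs between the two steps can be imitated with `sorted(mapper(lines))` before passing the result to `reducer`.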

Page 20: HDInsight Hadoop on Windows Azure

MapReduce using JavaScript

WordCount example

Page 21: HDInsight Hadoop on Windows Azure

Simple Output Graphing

WordCount example

Page 22: HDInsight Hadoop on Windows Azure

Using HIVE

Page 23: HDInsight Hadoop on Windows Azure

Understanding Pig

Load > Transform > Dump or Store
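
The load, transform, dump-or-store flow can be mimicked in a few lines of Python. This is an analogy only, not Pig Latin: the functions loosely mirror Pig's LOAD, FILTER, GROUP, DUMP, and STORE operators, and the `user<TAB>action` sample records are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def load(text):
    """LOAD: parse tab-separated 'user<TAB>action' rows into tuples."""
    return [tuple(row) for row in csv.reader(io.StringIO(text), delimiter="\t")]


def transform(rows):
    """Transform: keep 'click' rows (FILTER), then count them per user (GROUP)."""
    counts = defaultdict(int)
    for user, action in rows:
        if action == "click":
            counts[user] += 1
    return sorted(counts.items())


def dump(relation):
    """DUMP: print each result record to the console."""
    for record in relation:
        print(record)


def store(relation):
    """STORE: serialize the result as tab-separated text instead of printing it."""
    return "".join(f"{user}\t{count}\n" for user, count in relation)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = "ann\tclick\nbob\tview\nann\tclick\nbob\tclick\n"
    dump(transform(load(sample)))
```

Dump suits interactive inspection of a small result, while store is the form to use when the output feeds a later step.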

Page 24: HDInsight Hadoop on Windows Azure

Monitoring Job Results

In the portal
• Main Console
  • Job icon (button) status summary
  • Job History
• Interactive Console
  • JS quick feedback
  • JS detailed feedback (log)

Using RDP
• Map/Reduce tool
• Hadoop command prompt

Page 25: HDInsight Hadoop on Windows Azure
Page 26: HDInsight Hadoop on Windows Azure

Monitoring Job Status

Page 27: HDInsight Hadoop on Windows Azure

Download – ODBC for HIVE

Includes add-in for Excel

Page 28: HDInsight Hadoop on Windows Azure

Hadoop Connector to Excel

Page 29: HDInsight Hadoop on Windows Azure

Connecting to PowerPivot

Create an ODBC connection to HIVE

Connect to ‘other data source’ in PowerPivot

Page 30: HDInsight Hadoop on Windows Azure

Connecting with PowerQuery

Page 31: HDInsight Hadoop on Windows Azure

Pulling it Together - Klout

Page 32: HDInsight Hadoop on Windows Azure

Hadoop To-Do List

BigData = Hadoop
• Use Hadoop when business needs designate
• Use other NoSQL if a better fit

Hadoop on the cloud
• Quick and cheap
• Specialized use cases
• Behavioral data
• dev, test, training environments

Hadoop access technologies
• Learn Map/Reduce
• Use HIVE via Excel
• Pay attention to Impala

Page 33: HDInsight Hadoop on Windows Azure

www.TeachingKidsProgramming.org

Page 34: HDInsight Hadoop on Windows Azure

VOTE CONFIRM SHARE

Page 35: HDInsight Hadoop on Windows Azure

Keep Learning

@LynnLangit

YouTube – SoCalDevGal

Hire Me
• Architecture
• Best Practices
• Performance Tuning