“Introducing Hadoop on Azure: hello Map-Reduce!”

Joe Hummel, PhD
Visiting Researcher: U. of California, Irvine
Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago
Materials: http://www.joehummel.net/downloads.html
Email: [email protected]

Post on 06-Feb-2016

Page 1

“Introducing Hadoop on Azure: hello Map-Reduce!”

Joe Hummel, PhD
Visiting Researcher: U. of California, Irvine
Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago

Materials: http://www.joehummel.net/downloads.html
Email: [email protected]

Page 2

A little history…

Map-Reduce is from functional programming

    // function returns 1 if i is prime, 0 if not:
    let isPrime(i) = ...

    // sums 2 numbers:
    let sum(x, y) = return x + y

    // count the number of primes in 1..N:
    let countPrimes(N) =
        let L = [ 1 .. N ]          // [ 1, 2, 3, 4, 5, 6, ... ]
        let T = map isPrime L       // [ 0, 1, 1, 0, 1, 0, ... ]
        let count = reduce sum T    // 42
        return count
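
The pseudocode above maps directly onto JavaScript's built-in array methods. The following is a minimal sketch, not the presenter's code: the slide elides the body of isPrime, so the trial-division version here is an assumption made only to keep the example runnable.

```javascript
// Returns 1 if i is prime, 0 if not.
// Assumption: simple trial division; the slide does not show this body.
function isPrime(i) {
  if (i < 2) return 0;
  for (let d = 2; d * d <= i; d++) {
    if (i % d === 0) return 0;
  }
  return 1;
}

// Sums two numbers.
function sum(x, y) {
  return x + y;
}

// Counts the primes in 1..N with a map step followed by a reduce step.
function countPrimes(N) {
  const L = Array.from({ length: N }, (_, k) => k + 1); // [1, 2, 3, ..., N]
  const T = L.map(isPrime);                             // [0, 1, 1, 0, 1, 0, ...]
  return T.reduce(sum, 0);                              // total number of 1s
}

console.log(countPrimes(10)); // 4 (the primes 2, 3, 5, 7)
```

The map step is independent per element, which is what lets a framework such as Hadoop spread it across machines; only the reduce step has to bring results together.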

Page 3

A little more history…

Hadoop:
◦ Created by [company logo] to drive internet search
◦ Parallelism
◦ Data partitioning
◦ Fault tolerance

[Figure: BIG data → page hits]

Page 4

Hadoop today

Freely-available framework for big data
◦ http://hadoop.apache.org/

Based on concept of Map-Reduce:

[Figure: BIG data is split across many Map tasks, each applying the map function; a Reduce step combines the intermediate results into the final result R]

Page 5

Workflow

Data → Map → Sort → Merge → Reduce → R
(several Map and Sort tasks run side by side before the Merge)

Map output:     [ <key1,value>, <key4,value>, <key2,value>, … ]
Sort output:    [ <key1,value>, <key1,value>, … ]
Merge output:   [ <key1, [value,value,…]>, <key2, [value,value,…]>, … ]
Reduce output:  [ <key1, value>, <key2, value>, … ]
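
The four stages of the workflow can be imitated in a single process. This is only an illustrative sketch under stated assumptions: the helper names (runMap, sortByKey, mergeByKey, runReduce, mapReduce) and the sample input are hypothetical, and nothing here reflects how Hadoop itself is implemented.

```javascript
// Map phase: run mapFn on every record; mapFn emits <key, value> pairs.
function runMap(records, mapFn) {
  const pairs = [];
  const emit = (key, value) => pairs.push([key, value]);
  records.forEach((record) => mapFn(record, emit));
  return pairs;
}

// Sort phase: order pairs by key so equal keys sit next to each other.
function sortByKey(pairs) {
  return [...pairs].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

// Merge phase: turn sorted pairs into <key, [value, value, ...]> groups.
function mergeByKey(sortedPairs) {
  const groups = [];
  for (const [key, value] of sortedPairs) {
    const last = groups[groups.length - 1];
    if (last !== undefined && last[0] === key) {
      last[1].push(value);
    } else {
      groups.push([key, [value]]);
    }
  }
  return groups;
}

// Reduce phase: collapse each group's values into a single value.
function runReduce(groups, reduceFn) {
  return groups.map(([key, values]) => [key, reduceFn(key, values)]);
}

// Full workflow over several partitions (one Map + Sort per partition).
function mapReduce(partitions, mapFn, reduceFn) {
  const sortedRuns = partitions.map((p) => sortByKey(runMap(p, mapFn)));
  // A real merge would interleave the sorted runs; re-sorting the
  // concatenation yields the same key order and keeps the sketch short.
  const merged = mergeByKey(sortByKey(sortedRuns.flat()));
  return runReduce(merged, reduceFn);
}

// Made-up input: two partitions of IUCR codes.
const partitions = [["0486", "0820"], ["0486", "0890", "0486"]];
const counts = mapReduce(
  partitions,
  (code, emit) => emit(code, 1),
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(counts); // "0486" counted 3 times, "0820" and "0890" once each
```

Each intermediate value has the same shape as the corresponding list on the slide, so printing the result of runMap, sortByKey, or mergeByKey on its own shows what that stage contributes.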

Page 6

Data set for demo

We’ll be working with Chicago crime data…
◦ https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
◦ http://www.cityofchicago.org/city/en/narr/foia/CityData.html

[Figure: data file, 1 GB, 5M rows]

Page 7

Goal?

Compute top-10 crimes…

    IUCR    Count
    0486    366903
    0820    308074
    ...
    0890    166916

IUCR = Illinois Uniform Crime Reporting codes
https://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e

Page 8

Demo

Hadoop on Azure…

    // JavaScript version:

    var map = function (key, value, context) {
        var values = value.split(",");
        context.write(values[4], 1);    // field 4 is the IUCR code
    };

    var reduce = function (key, values, context) {
        var sum = 0;
        while (values.hasNext()) {
            sum += parseInt(values.next(), 10);
        }
        context.write(key, sum);
    };

Output:

    0486    366903
    0820    308074
    ...
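
To try the map and reduce functions without a cluster, they can be driven by a small local harness. The sketch below is an assumption-laden stand-in, not the Hadoop on Azure runtime: makeContext, makeIterator, and runLocally are hypothetical helpers that only imitate the context.write, hasNext, and next calls used on the slide, and the input rows are made up.

```javascript
// The map and reduce functions from the slide.
var map = function (key, value, context) {
  var values = value.split(",");
  context.write(values[4], 1); // field 4 is the IUCR code
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical stand-in for the runtime's context: collects written pairs.
function makeContext() {
  const out = [];
  return { out, write: (key, value) => out.push([key, value]) };
}

// Hypothetical stand-in for the runtime's value iterator.
function makeIterator(items) {
  let i = 0;
  return { hasNext: () => i < items.length, next: () => items[i++] };
}

// Runs map over every line, groups the pairs by key, then runs reduce.
function runLocally(lines) {
  const mapContext = makeContext();
  lines.forEach((line, offset) => map(offset, line, mapContext));

  const groups = new Map();
  for (const [key, value] of mapContext.out) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  const reduceContext = makeContext();
  for (const key of [...groups.keys()].sort()) {
    reduce(key, makeIterator(groups.get(key)), reduceContext);
  }
  return reduceContext.out;
}

// Made-up rows in the "ID,Case Number,Date,Block,IUCR,..." layout.
const sampleLines = [
  "1,HX000001,01/01/2012,001XX N STATE ST,0486",
  "2,HX000002,01/02/2012,002XX W LAKE ST,0820",
  "3,HX000003,01/03/2012,003XX S WABASH AVE,0486",
];
console.log(runLocally(sampleLines)); // 0486 appears twice, 0820 once
```

Note that a plain split(",") is only safe while no field before the IUCR column contains a quoted comma; real exports may need a proper CSV parser.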

Page 9

Hadoop++

Rich ecosystem around Hadoop
◦ Pig
◦ Hive
◦ HBase
◦ …

    // interactive Pig with explicit Map-Reduce functions:

    pig.from("CC-from-2001.txt").
        mapReduce("IUCR-Count.js", "IUCR, Count:long").
        orderBy("Count DESC").
        take(10).
        to("output-from-2001")

    // interactive Pig without explicit Map-Reduce:

    schema = "ID,Case Number,Date,Block,IUCR,..."
    pig.from("CC-from-2001.txt", schema).
        groupBy("IUCR").
        select("group, SUM($1.Count)").
        orderBy("Count DESC").
        take(10).
        to("output-from-2001")
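
The final two steps of either pipeline, ordering by count and keeping the first ten, amount to a sort followed by a slice. The plain JavaScript sketch below shows that idea only; topN is a hypothetical helper, not part of the Pig console API, and the rows are the three counts visible on the "Goal?" slide, deliberately shuffled.

```javascript
// Sorts rows by count, largest first, and keeps the first n rows.
// This mirrors orderBy("Count DESC") followed by take(n).
function topN(rows, n) {
  return [...rows].sort((a, b) => b.count - a.count).slice(0, n);
}

// The three rows shown on the "Goal?" slide, in shuffled order.
const rows = [
  { iucr: "0890", count: 166916 },
  { iucr: "0486", count: 366903 },
  { iucr: "0820", count: 308074 },
];

const top2 = topN(rows, 2);
console.log(top2.map((r) => r.iucr)); // 0486 first, then 0820
```

Copying the array before sorting leaves the caller's data untouched, which matters if the same rows feed more than one report.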

Page 10

Hadoop on Azure

Microsoft is offering free access to Hadoop
◦ Request invitation @ http://www.hadooponazure.com/

Hadoop connector for Excel
◦ Process data using Hadoop, analyze/visualize using Excel

Page 11

Big Data Processing, Cheap

PowerPivot

Freely-available plugin for Excel 2010
◦ http://www.powerpivot.com/

Turns Excel into an in-memory database
◦ More precisely, turns spreadsheet into an OLAP cube

Note:
◦ If you have 32-bit Excel, install 32-bit PowerPivot
◦ If you have 64-bit Excel, install 64-bit PowerPivot
◦ GBs of data will require 64-bit
◦ How to tell which version of Excel you have? File menu, Help…

Page 12

Demo

PowerPivot…
◦ Install
◦ PowerPivot menu
◦ PowerPivot Window
◦ Get Data...
◦ PivotTable…

Page 13

Compare and contrast

Approach   | Pros                                          | Cons                                                     | Target        | Scalable?
PowerPivot | No programming, built-in UI and visualization | Lack of scalability                                      | GBs           | Limited by RAM
LINQ       | Flexibility of analysis                       | Programming                                              | GBs, few TBs  | Limited by local resources
Hadoop     | Scalability, ease of programming              | Must fit into Map-Reduce framework; not necessarily fast | GBs, TBs, PBs | Yes! (via cluster or cloud)

Page 14

That’s it!

Page 15

Thank you for attending

Presenter: Joe Hummel
◦ Email: [email protected]
◦ Materials: http://www.joehummel.net/downloads.html

Keep an eye out for the final releases of:
◦ Hadoop on Azure
◦ Hadoop on Windows
◦ PowerView plugin for Excel 2013