hack reduce introduction


Upload: montrealouvert

Post on 21-Dec-2014


Page 1: Hack reduce introduction
Page 2: Hack reduce introduction

What is hack/reduce?

•A Home for the Big Data Community

•24/7 Access to Cluster Compute Power

•Regular Hackathons

Page 3: Hack reduce introduction
Page 4: Hack reduce introduction

hack/reduce

Boston’s Big Data Hackspace

hack/reduce

Montreal

Ottawa

Toronto

Boston

2011

2012

Page 5: Hack reduce introduction
Page 6: Hack reduce introduction
Page 7: Hack reduce introduction

Why should you care?

•Work with millions and billions of records

•Find patterns in Big Data sets

•Use data to detect, predict, forecast

•Extract new information from raw data

Page 8: Hack reduce introduction

APIs Suck

In Big Data there are:

•no requests,

•no predefined parameters,

•no structured responses.

You are free to intersect anything with anything.

You can analyse, mutate, group, split, reorder in any way you can imagine.

Page 9: Hack reduce introduction

What you can do today

•Access the hack/reduce GoGrid Cluster:

•240 Cores

•240GB of RAM

•10TB of Disk

Page 10: Hack reduce introduction

What you can do today

Use Hadoop to explore big open data sets, like:

•20 Years of the Federal Parliament Hansard

•Hourly Canadian Weather 1953 to 2001

•The 1881 Census: details about 4.3M people

•One Summer of Bixi Station Status Updates

Page 11: Hack reduce introduction
Page 12: Hack reduce introduction

What is Map/Reduce?

•Framework for distributed computing on large data sets on clusters of computers

•MapReduce was patented by Google

•Hadoop is an open-source implementation modeled on Google's design

•Michael Stonebraker hates it

Page 13: Hack reduce introduction

What is Map/Reduce?

•Map = function applied in parallel to every item in the dataset; it emits (key, value) pairs

•Reduce = function applied in parallel to the groups of values emitted by the Map function, one group per key
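
Between Map and Reduce the framework collects every value emitted under the same key, so that each Reduce call sees one complete group. The flow can be sketched on a single machine in Python, assuming a thread pool stands in for the cluster. The names `run_mapreduce`, `year_and_temp`, and `hottest`, and the sample readings, are hypothetical and made up for illustration; they are not part of Hadoop or of the hack/reduce datasets.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_mapreduce(records, mapper, reducer, workers=4):
    """Toy MapReduce driver: map in parallel, group by key, reduce in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: the mapper runs on every record independently.
        mapped = list(pool.map(lambda r: list(mapper(r)), records))

        # Shuffle phase: collect all values emitted under the same key.
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)

        # Reduce phase: the reducer runs on each group independently.
        keys = sorted(groups)
        results = pool.map(lambda k: reducer(k, groups[k]), keys)
        return dict(zip(keys, results))


def year_and_temp(record):
    """Mapper: emit (year, temperature) for one made-up weather reading."""
    date, temp = record
    yield date[:4], temp


def hottest(year, temps):
    """Reducer: keep the highest temperature seen for a year."""
    return max(temps)


# Made-up readings, for illustration only.
readings = [("1953-07-01", 24.5), ("1953-07-02", 27.1), ("1954-01-15", -12.0)]
result = run_mapreduce(readings, year_and_temp, hottest)
```

Because neither the mapper nor the reducer shares state between calls, a real cluster can run them on different machines; only the grouping step needs to move data around.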

Page 14: Hack reduce introduction

What is Map/Reduce?

map(String docId, String document):
    for each word w in document:
        emit(w, 1);

reduce(String word, Iterator counts):
    int sum = 0;
    for each count in counts:
        sum += count;
    emit(word, sum);
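
The word-count pseudocode above translates almost line for line into Python. This is a sequential sketch, not Hadoop code: `word_count` is a hypothetical stand-in for the framework, sorting the emitted pairs so that equal keys sit next to each other and can be grouped, and the input documents are made up.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(doc_id, document):
    """Emit (word, 1) for every word in the document."""
    for word in document.split():
        yield word, 1


def reduce_fn(word, counts):
    """Sum the counts emitted for one word."""
    return word, sum(counts)


def word_count(documents):
    """Sequential stand-in for the framework: map, sort by key, reduce."""
    pairs = [pair for doc_id, text in documents.items()
             for pair in map_fn(doc_id, text)]
    pairs.sort(key=itemgetter(0))  # sorting brings equal keys together
    return dict(
        reduce_fn(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


# Made-up input documents, for illustration only.
docs = {"d1": "big data big cluster", "d2": "big data"}
counts = word_count(docs)
```

Note that `map_fn` never looks at `doc_id`; as in the pseudocode, the input key is carried along only because the framework hands it over.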

Page 15: Hack reduce introduction

MapReduce: http://cluster-1-master.gg.hackreduce.net:50030

SSH: ssh -i hackreduce [email protected]

private key (“hackreduce”): http://bit.ly/X13pNh

wiki: http://github.com/hackreduce/Hackathon