hack reduce introduction
TRANSCRIPT
What is hack/reduce?
• A Home for the Big Data Community
• 24/7 Access to Cluster Compute Power
• Regular Hackathons
hack/reduce: Boston’s Big Data Hackspace
hack/reduce: Montreal, Ottawa, Toronto, Boston (2011–2012)
Why should you care?
• Work with millions and billions of records
• Find patterns in Big Data sets
• Use data to detect, predict, forecast
• Extract new information from raw data
APIs Suck
In Big Data there are:
• no requests,
• no predefined parameters,
• no structured responses.
You are free to intersect anything with anything.
You can analyse, mutate, group, split, reorder in any way you can imagine.
What you can do today
• Access the hack/reduce GoGrid Cluster:
  • 240 Cores
  • 240 GB of RAM
  • 10 TB of Disk
What you can do today
Use Hadoop to explore big Open Data sets, like:
• 20 Years of the Federal Parliament Hansard
• Hourly Canadian Weather, 1953 to 2001
• The 1881 Census: details about 4.3M people
• One Summer of Bixi Station Status Updates
What is Map/Reduce?
• A framework for distributed computing on large data sets across clusters of computers
• MapReduce was patented by Google
• Hadoop’s implementation is Googlesque
• Michael Stonebraker hates it
What is Map/Reduce?
• Map = a function applied in parallel to every item in the dataset
• Reduce = a function applied in parallel to each group of values emitted by the Map function
What is Map/Reduce?
map(String docId, String document):
  for each word w in document:
    emit(w, 1);

reduce(String word, Iterator counts):
  int sum = 0;
  for each count in counts:
    sum += count;
  emit(word, sum);
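The word-count pseudocode above can be sketched as a runnable single-machine simulation. This is only an illustration of the map → shuffle → reduce flow (the function names and the toy documents below are made up for the example); a real job on the Hadoop cluster would implement the same logic as Mapper and Reducer classes in Java:

```python
from collections import defaultdict

def map_phase(doc_id, document):
    # Map: emit a (word, 1) pair for every word in the document.
    for word in document.split():
        yield (word, 1)

def reduce_phase(word, counts):
    # Reduce: sum all the counts emitted for one word.
    yield (word, sum(counts))

def map_reduce(documents):
    # Shuffle: group every value emitted by Map under its key.
    # On a cluster, Map and Reduce calls run in parallel; here they run serially.
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for word, count in map_phase(doc_id, text):
            groups[word].append(count)
    result = {}
    for word, counts in groups.items():
        for key, total in reduce_phase(word, counts):
            result[key] = total
    return result

# Hypothetical toy input, just to show the shape of the output:
print(map_reduce({"doc1": "big data big cluster", "doc2": "big hadoop"}))
# {'big': 3, 'data': 1, 'cluster': 1, 'hadoop': 1}
```

The key property is that each Reduce call sees only one word's counts, so on a cluster the groups can be processed on different machines with no coordination.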
MapReduce: http://cluster-1-master.gg.hackreduce.net:50030
SSH: ssh -i hackreduce [email protected]
private key (“hackreduce”): http://bit.ly/X13pNh
wiki: http://github.com/hackreduce/Hackathon