crystalball - compute relative frequency in hadoop

Big Data Project on

Crystal Ball

Submitted By:

Sushil Sedai (984474)

Suvash Shah (984461)

Submitted to: Prof. Prem Nair

Pair approach (Mapper) – pseudo code

method map(docid id, doc d)
    for each term w in doc d do
        total = 0;
        for each neighbor u in Neighbor(w) do
            Emit(Pair(w, u), 1);
            total++;
        Emit(Pair(w, *), total);
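As a rough illustration of the mapper pseudo code above, here is a minimal plain-Java sketch with no Hadoop dependency. It is not the project's actual mapper. It assumes Neighbor(w) means the terms that follow an occurrence of w up to, but not including, the next occurrence of w; the slides do not define it, but this reading is consistent with the sample output. The class name PairMapperSketch, the helper neighbors, and the "(w,u)<TAB>count" encoding of emitted records are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class PairMapperSketch {
    // Assumed neighbor window: terms after position i, stopping at the
    // next occurrence of the same term (or at the end of the document).
    static List<String> neighbors(String[] terms, int i) {
        List<String> result = new ArrayList<>();
        for (int j = i + 1; j < terms.length && !terms[j].equals(terms[i]); j++) {
            result.add(terms[j]);
        }
        return result;
    }

    // Returns the emitted records in order, each encoded as "(w,u)\tcount".
    // The key "(w,*)" carries the marginal count for that occurrence of w.
    static List<String> map(String doc) {
        List<String> emitted = new ArrayList<>();
        String[] terms = doc.trim().split("\\s+");
        for (int i = 0; i < terms.length; i++) {
            int total = 0;
            for (String u : neighbors(terms, i)) {
                emitted.add("(" + terms[i] + "," + u + ")\t1");
                total++;
            }
            emitted.add("(" + terms[i] + ",*)\t" + total);
        }
        return emitted;
    }

    public static void main(String[] args) {
        for (String record : map("18 29 12 34 79 18 56 12 34 92")) {
            System.out.println(record);
        }
    }
}
```

Under this neighbor assumption, the single occurrence of 79 in that input emits five pairs followed by (79,*) with count 5.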

Pair approach (Mapper) – Java Code

Pair approach (Reducer) – pseudo code

method reduce(Pair p, Iterable<Int> values)
    if p.secondValue == *
        if p.firstValue is new
            currentvalue = p.firstValue;
            marginal = sum(values);
        else
            marginal += sum(values);
    else
        Emit(p, sum(values) / marginal);
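The reducer pseudo code above relies on the marginal key (w, *) reaching the reducer before any (w, u) key. The following plain-Java sketch simulates that ordering with a sorted map instead of Hadoop's shuffle. It is an illustration only, not the project's reducer: the class name PairReducerSketch, the "w,u" string encoding of keys, and the KEY_ORDER comparator are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class PairReducerSketch {
    // Keys are encoded as "w,u" (an assumption of this sketch). For a given w,
    // the marginal key "w,*" is ordered ahead of every other "w,u" key.
    static final Comparator<String> KEY_ORDER = (a, b) -> {
        String[] x = a.split(",");
        String[] y = b.split(",");
        int byLeft = x[0].compareTo(y[0]);
        if (byLeft != 0) return byLeft;
        if (x[1].equals(y[1])) return 0;
        if (x[1].equals("*")) return -1;
        if (y[1].equals("*")) return 1;
        return x[1].compareTo(y[1]);
    };

    // Walks the keys in sorted order, the way a single reducer would see them.
    static Map<String, Double> reduceAll(SortedMap<String, List<Integer>> shuffled) {
        Map<String, Double> out = new LinkedHashMap<>();
        String current = null;
        int marginal = 0;
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {
            String[] key = e.getKey().split(",");
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            if (key[1].equals("*")) {
                if (!key[0].equals(current)) {
                    current = key[0];   // new left term: restart the marginal
                    marginal = sum;
                } else {
                    marginal += sum;
                }
            } else {
                out.put(e.getKey(), (double) sum / marginal);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, List<Integer>> shuffled = new TreeMap<>(KEY_ORDER);
        shuffled.put("10,12", Arrays.asList(1));
        shuffled.put("10,34", Arrays.asList(1));
        shuffled.put("10,*", Arrays.asList(2));
        System.out.println(reduceAll(shuffled));
    }
}
```

In a real job, all keys that share the same left term must also be routed to the same reducer, otherwise the marginal and the pair counts would be split across reducers.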

Pair approach (Reducer) – Java Code

Pair approach - input

Mapper1 input

34 56 29 12 34 56 92 10 34 12

Mapper2 input

18 29 12 34 79 18 56 12 34 92

Pair approach – Output (Reducer1)

(10,12) 0.5

(10,34) 0.5

(12,10) 0.09090909090909091

(12,18) 0.09090909090909091

(12,34) 0.36363636363636365

(12,56) 0.18181818181818182

(12,79) 0.09090909090909091

(12,92) 0.18181818181818182

(18,12) 0.25

(18,29) 0.125

(18,34) 0.25

(18,56) 0.125

(18,79) 0.125

(18,92) 0.125

(29,10) 0.06666666666666667

(29,12) 0.26666666666666666

(29,18) 0.06666666666666667

(29,34) 0.26666666666666666

(29,56) 0.13333333333333333

(29,79) 0.06666666666666667

(29,92) 0.13333333333333333

(34,10) 0.08333333333333333

(34,12) 0.25

(34,18) 0.08333333333333333

(34,29) 0.08333333333333333

(34,56) 0.25

(34,79) 0.08333333333333333

(34,92) 0.16666666666666666

(56,10) 0.1

(56,12) 0.3

(56,29) 0.1

(56,34) 0.3

(56,92) 0.2

(92,10) 0.3333333333333333

(92,12) 0.3333333333333333

(92,34) 0.3333333333333333

Pair approach – Output (Reducer2)

(79,12) 0.2

(79,18) 0.2

(79,34) 0.2

(79,56) 0.2

(79,92) 0.2

Stripe approach (Mapper) – pseudo code

method map(docid id, doc d)
    for each term w in doc d do
        Stripe H;
        clear(H);
        for each neighbor u in Neighbor(w) do
            if H.containsKey(u)
                H{u} += 1;
            else
                H.add(u, 1);
        Emit(w, H);
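The stripe mapper pseudo code above can be sketched in plain Java, again without Hadoop and not as the project's actual code. Each occurrence of a term produces one stripe, here a HashMap from neighbor to count. The sketch assumes the neighbor window runs from the term to its next occurrence (the slides do not define Neighbor(w)); the class name StripeMapperSketch is invented.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StripeMapperSketch {
    // Returns one (term, stripe) record per term occurrence, in document order.
    static List<Map.Entry<String, Map<String, Integer>>> map(String doc) {
        List<Map.Entry<String, Map<String, Integer>>> emitted = new ArrayList<>();
        String[] terms = doc.trim().split("\\s+");
        for (int i = 0; i < terms.length; i++) {
            Map<String, Integer> stripe = new HashMap<>();
            // Assumed neighbor window: stop at the next occurrence of terms[i].
            for (int j = i + 1; j < terms.length && !terms[j].equals(terms[i]); j++) {
                stripe.merge(terms[j], 1, Integer::sum);
            }
            emitted.add(new SimpleEntry<String, Map<String, Integer>>(terms[i], stripe));
        }
        return emitted;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Map<String, Integer>> e : map("34 56 29 12 34 56 92 10 34 12")) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Compared with the pair approach, one record is emitted per term occurrence rather than one per (term, neighbor) pair, at the cost of larger values.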

Stripe approach (Mapper) – Java Code

Stripe approach (Reducer) – pseudo code

method reduce(Text key, Stripes [H1, H2, …])
    Stripe H = elementwiseSum(H1, H2, …);
    total = sumValues(H);
    for each Item h in H do
        h.secondValue /= total;
    Emit(key, H);
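A minimal plain-Java sketch of the stripe reducer logic above: merge all stripes received for a key element by element, then divide each count by the grand total. This is an illustration under assumptions, not the project's reducer; the class name StripeReducerSketch and the use of a TreeMap for deterministic ordering are choices made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.Arrays;

class StripeReducerSketch {
    // Merges the stripes for one key and converts counts to relative frequencies.
    static Map<String, Double> reduce(List<Map<String, Integer>> stripes) {
        Map<String, Integer> merged = new TreeMap<>();
        int total = 0;
        for (Map<String, Integer> stripe : stripes) {
            for (Map.Entry<String, Integer> e : stripe.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
                total += e.getValue();
            }
        }
        Map<String, Double> frequencies = new TreeMap<>();
        for (Map.Entry<String, Integer> e : merged.entrySet()) {
            // total is non-zero whenever merged has at least one entry
            frequencies.put(e.getKey(), (double) e.getValue() / total);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new HashMap<>();
        first.put("34", 1);
        first.put("12", 1);
        System.out.println(reduce(Arrays.asList(first)));
    }
}
```

Because the whole stripe for a key is available in one reduce call, no special marginal key or sort order is needed, unlike the pair approach.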

Stripe approach (Reducer) – Java Code

Stripe approach – input

Mapper1 input

34 56 29 12 34 56 92 10 34 12

Mapper2 input

18 29 12 34 79 18 56 12 34 92

Stripe approach – Output (Reducer1)

10 [ (34,0.5000) (12,0.5000) ]

12 [ (56,0.1818) (92,0.1818) (34,0.3636) (18,0.0909) (79,0.0909) (10,0.0909) ]

18 [ (56,0.1250) (92,0.1250) (34,0.2500) (79,0.1250) (29,0.1250) (12,0.2500) ]

29 [ (56,0.1333) (92,0.1333) (34,0.2667) (18,0.0667) (79,0.0667) (10,0.0667) (12,0.2667) ]

34 [ (56,0.2500) (92,0.1667) (18,0.0833) (79,0.0833) (29,0.0833) (10,0.0833) (12,0.2500) ]

56 [ (92,0.2000) (34,0.3000) (29,0.1000) (10,0.1000) (12,0.3000) ]

92 [ (34,0.3333) (10,0.3333) (12,0.3333) ]

Stripe approach – Output (Reducer2)

79 [ (56,0.2000) (92,0.2000) (34,0.2000) (18,0.2000) (12,0.2000) ]

Hybrid approach (Mapper) – pseudo code

method map(docid id, doc d)
    HashMap H;
    for each term w in doc d do
        for each neighbor u in Neighbor(w) do
            if H.contains(Pair(w, u))
                H{Pair(w, u)} += 1;
            else
                H.add(Pair(w, u), 1);
    for each Pair p in H do
        Emit(p, H(p));
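The hybrid mapper above emits pairs, like the pair approach, but first aggregates their counts in a local hash map. A plain-Java sketch of that idea follows; it is not the project's code. The class name HybridMapperSketch and the "w,u" key encoding are invented, and the neighbor window (up to the next occurrence of the same term) is an assumption, since the slides do not define Neighbor(w).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HybridMapperSketch {
    // Returns the aggregated (pair -> count) map for one document.
    // Each entry corresponds to one emitted record.
    static Map<String, Integer> map(String doc) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String[] terms = doc.trim().split("\\s+");
        for (int i = 0; i < terms.length; i++) {
            // Assumed neighbor window: stop at the next occurrence of terms[i].
            for (int j = i + 1; j < terms.length && !terms[j].equals(terms[i]); j++) {
                counts.merge(terms[i] + "," + terms[j], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : map("18 29 12 34 79 18 56 12 34 92").entrySet()) {
            System.out.println("(" + e.getKey() + ") " + e.getValue());
        }
    }
}
```

Aggregating before emitting means a pair that occurs several times in a document is sent once with its count instead of once per occurrence.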

Hybrid approach (Mapper) – Java Code

Hybrid approach (Reducer) – pseudo code

prev = null;
total = 0;
HashMap H;

method reduce(Pair p, Iterable<Int> values)
    if prev != null and p.firstValue != prev
        for each item h in H
            h.value /= total;
        Emit(prev, H);
        clear(H);
        total = 0;
    end if
    prev = p.firstValue;
    H.add(p.secondValue, sum(values));
    total += sum(values);

method close
    // for the last term
    for each item h in H
        h.value /= total;
    Emit(prev, H);
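The hybrid reducer above receives pairs but emits stripes: it accumulates the counts for one left term and flushes them when the left term changes, with a final flush at close. The plain-Java sketch below mimics that life cycle with explicit reduce and close calls. It is only an illustration, not the project's reducer; the class name HybridReducerSketch, the flush helper, and the output map are invented, and it assumes keys arrive sorted by left term.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HybridReducerSketch {
    private String prev = null;                      // left term being accumulated
    private int total = 0;                           // running marginal for prev
    private final Map<String, Integer> stripe = new TreeMap<>();
    final Map<String, Map<String, Double>> output = new LinkedHashMap<>();

    // Called once per pair key (w, u); keys are assumed to arrive sorted by w.
    void reduce(String w, String u, List<Integer> values) {
        if (prev != null && !prev.equals(w)) {
            flush();
        }
        prev = w;
        int sum = 0;
        for (int v : values) sum += v;
        stripe.put(u, sum);
        total += sum;
    }

    // Called after the last key so the final term is not lost.
    void close() {
        if (prev != null) {
            flush();
        }
    }

    // Normalizes the accumulated stripe and records it under prev.
    private void flush() {
        Map<String, Double> frequencies = new TreeMap<>();
        for (Map.Entry<String, Integer> e : stripe.entrySet()) {
            frequencies.put(e.getKey(), (double) e.getValue() / total);
        }
        output.put(prev, frequencies);
        stripe.clear();
        total = 0;
    }

    public static void main(String[] args) {
        HybridReducerSketch reducer = new HybridReducerSketch();
        reducer.reduce("10", "12", Arrays.asList(1));
        reducer.reduce("10", "34", Arrays.asList(1));
        reducer.reduce("92", "10", Arrays.asList(1));
        reducer.close();
        System.out.println(reducer.output);
    }
}
```

The running total is reset on every flush; without that reset, the marginal of one term would leak into the next.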

Hybrid approach (Reducer) – Java Code

Hybrid approach - Input

Mapper1 input

34 56 29 12 34 56 92 10 34 12

Mapper2 input

18 29 12 34 79 18 56 12 34 92

Hybrid approach – Output (Reducer1)

10 (12,0.5) (34,0.5)

12 (10,0.09090909) (18,0.09090909) (34,0.36363637) (56,0.18181819) (79,0.09090909) (92,0.18181819)

18 (12,0.25) (29,0.125) (34,0.25) (56,0.125) (79,0.125) (92,0.125)

29 (10,0.06666667) (12,0.26666668) (18,0.06666667) (34,0.26666668) (56,0.13333334) (79,0.06666667) (92,0.13333334)

34 (10,0.083333336) (12,0.25) (18,0.083333336) (29,0.083333336) (56,0.25) (79,0.083333336) (92,0.16666667)

56 (10,0.1) (12,0.3) (29,0.1) (34,0.3) (92,0.2)

92 (10,0.33333334) (12,0.33333334) (34,0.33333334)

Hybrid approach – Output (Reducer2)

79 (12,0.2) (18,0.2) (34,0.2) (56,0.2) (92,0.2)

Comparison

Apache Spark

Write a Java program on Spark to calculate the total number of students at MUM arriving in the different entries. The program should display the total number of students by country.

Spark - Java Code

Spark - input

2014 Feb Nepal 20

2014 Feb India 15

2014 Oct Italy 2

2014 July France 1

2015 Feb Nepal 10

2015 Feb India 25

2015 Oct Italy 7

Spark - Output

(France,1)

(Italy,9)

(Nepal,30)

(India,40)
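The totals above can be reproduced with a small plain-Java sketch that needs no Spark installation. It is not the project's Spark program. In a Spark job the same per-country sum is commonly written with mapToPair followed by reduceByKey on a JavaPairRDD. The class and method names here are invented, and the whitespace-separated record layout (year, month, country, count) is inferred from the sample input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StudentsByCountrySketch {
    // Sums the last field of each record, grouped by the country field.
    // Assumed record layout: "<year> <month> <country> <count>".
    static Map<String, Integer> totalByCountry(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4) {
                continue; // skip blank or malformed lines
            }
            totals.merge(fields[2], Integer.parseInt(fields[3]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "2014 Feb Nepal 20",
                "2014 Feb India 15",
                "2014 Oct Italy 2",
                "2014 July France 1",
                "2015 Feb Nepal 10",
                "2015 Feb India 25",
                "2015 Oct Italy 7");
        for (Map.Entry<String, Integer> e : totalByCountry(input).entrySet()) {
            System.out.println("(" + e.getKey() + "," + e.getValue() + ")");
        }
    }
}
```

The sketch prints countries in alphabetical order because it uses a TreeMap; the order of results from a distributed job is generally not guaranteed unless it is sorted explicitly.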

Tools Used

• VMPlayer Pro 7

• cloudera-quickstart-vm-5.4.0-0-vmware

• Eclipse Version: Luna Service Release 2 (4.4.2)

• Windows 8.1

Thank You
