CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequency of tweets
“25th CSI Karnataka Student Convention”
Map/Reduce Algorithm Performance Analysis in Computing Frequency of Tweets
Shravanthi U M & Nagashree N
Information Science and Engineering
Bangalore Institute of Technology, Bangalore
AGENDA
Data
Big Data
Twitter and Big Data
Classical Approach
Why Hadoop Framework
Map/Reduce
Our Proposed Approach
Conclusion
Q & A
It's All About Data
STRUCTURED
UNSTRUCTURED
Big Data
Data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time.
Big data sizes are a constantly moving target, currently ranging from a few dozen terabytes to many petabytes of data in a single data set.
Ex: web logs, social network data, Internet search indexes, etc.
“BigData”
Classical Approach
egrep _____ files[0-1000]
[Diagram: egrep processes read file0 through file1000 from a Remote FileSystem]
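A minimal sketch of the classical approach, assuming a single machine that opens every file in turn and scans it line by line. The function name `grep_files` and its return shape are hypothetical and chosen only for illustration; the slide itself shows only the egrep command.

```python
import re
from pathlib import Path


def grep_files(pattern, paths):
    """Scan every file in turn on one machine and return the matching lines."""
    regex = re.compile(pattern)
    matches = []
    for path in paths:
        # All data travels to this one process, file after file.
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if regex.search(line):
                    matches.append((Path(path).name, line.rstrip("\n")))
    return matches
```

Every byte of every file has to reach the one machine running the scan, which is why this approach slows down as the number and size of files grow.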
Hadoop Framework
Fault tolerance
Streaming data access - HDFS emphasizes high throughput.
Extreme scalability - HDFS will scale to petabytes; Example: at Facebook.
Portability - HDFS is portable across operating systems.
Write once, read many times
Locality of computation - move the program near to the data
HDFS
egrep _____ files[0-1000]
Move Computation to Data
[Diagram: files f0, f2, f3, through f1000 are spread across cluster nodes (40 nodes/rack); egrep runs next to file0 through file1000 instead of pulling them to one machine]
Map/Reduce
Map()
Input: Any file (e.g. documents)
Output: Stream of <key, value> pairs (e.g. <word, count> pairs)
Data Redistribution and Grouping
Reduce()
Input: All <key, value> pairs with the same key grouped (e.g. all <word, count> pairs where word = “the”)
Output: Anything (e.g. sum of counts for a specific word)
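The three stages can be sketched in plain Python for the word-count example. This is a single-process illustration, not Hadoop code; the function names `map_words`, `group_by_key`, `reduce_counts` and `word_count` are hypothetical.

```python
from collections import defaultdict


def map_words(document):
    """Map(): emit a <word, 1> pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1


def group_by_key(pairs):
    """Redistribution and grouping: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce(): sum the counts for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, grouping and reduce over a list of documents."""
    pairs = [pair for doc in documents for pair in map_words(doc)]
    return dict(reduce_counts(w, c) for w, c in group_by_key(pairs).items())


result = word_count(["the cat sat on the mat", "the dog"])
print(result["the"])  # 3
```

In a real cluster the map calls run on many nodes and the grouping step moves pairs across the network; here everything happens in one list to keep the data flow visible.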
Advantages:
Fine-grained Map and Reduce tasks
◦ Improved load balancing
◦ Faster recovery from failed tasks
Automatic re-execution on failure
◦ In a large cluster, some nodes are always slow or flaky
◦ Framework re-executes failed tasks
Locality optimizations
◦ Map-Reduce queries HDFS for locations of input data
◦ When possible, map tasks are scheduled close to the inputs (local access, local rack access, remote rack access)
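The locality preference (local access, then local rack access, then remote rack access) can be sketched as a ranking. This is an illustrative simplification, not Hadoop's actual scheduler; the `(rack, node)` tuples and the names `locality` and `pick_worker` are assumptions made for the example.

```python
NODE_LOCAL, RACK_LOCAL, OFF_RACK = 0, 1, 2


def locality(worker, replica):
    """Rank how close a worker is to a replica; lower is closer.

    Workers and replicas are hypothetical (rack, node) tuples.
    """
    if worker == replica:
        return NODE_LOCAL
    if worker[0] == replica[0]:  # same rack, different node
        return RACK_LOCAL
    return OFF_RACK


def pick_worker(free_workers, replicas):
    """Choose the free worker closest to any replica of the input block."""
    return min(free_workers,
               key=lambda w: min(locality(w, r) for r in replicas))


replicas = [("rack1", "n1"), ("rack2", "n5")]
print(pick_worker([("rack3", "n9"), ("rack1", "n2")], replicas))
```

With the replicas above, the worker on rack1 is preferred over the one on rack3 because it shares a rack with a copy of the data.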
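What did we do…
Python code to extract tweets using “twitter.Search” API
import re
import urllib.request

query = "AnnaHazare"  # search term; the slide used it unquoted
tweets = []
with open("tweets.txt", "w") as f:  # output file name is assumed
    for page in range(10):
        url = ("http://search.twitter.com/search.atom?lang=en&q=" + query
               + "&rpp=100&page=" + str(page))
        feed = urllib.request.urlopen(url).read().decode("utf-8")
        # Each <updated> element holds a timestamp from the feed
        tweettext = re.findall('<updated>(.*?)</updated>', feed)
        print("Got the Page No.", page + 1)
        for stamp in tweettext:
            tweets.append(stamp)
            f.write(stamp + "\n")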
Extracted DATA
Map/Reduce Implementation
<6/4/11, 1> <6/4/11, 1> <6/4/11, 1> <6/6/11, 1> <6/6/11, 1> <6/6/11, 1> <15/8/11, 1> <15/8/11, 1>
Reduce(): <6/4/11, 1> <6/4/11, 1> <6/4/11, 1> <6/4/11, 1> <6/4/11, 1>
Reduce(): <6/6/11, 1> <6/6/11, 1> <6/6/11, 1>
Reduce(): <15/8/11, 1> <15/8/11, 1> <15/8/11, 1>
Server 1
Final Result File
6/4/11 85
6/6/11 36
15/8/11 125
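A minimal sketch of this date-frequency job in plain Python. It assumes each extracted line is a timestamp of the form 2011-04-06T10:15:00Z (the slides do not show the exact format), and the names `map_tweet` and `reduce_dates` are hypothetical. It runs in one process rather than on a Hadoop cluster.

```python
from collections import Counter
from datetime import datetime


def map_tweet(timestamp):
    """Map(): turn one tweet timestamp into a <date, 1> pair."""
    # Assumed input format: 2011-04-06T10:15:00Z
    day = datetime.strptime(timestamp.strip(), "%Y-%m-%dT%H:%M:%SZ").date()
    return f"{day.day}/{day.month}/{day.year % 100}", 1


def reduce_dates(pairs):
    """Reduce(): sum the 1s for each date key."""
    totals = Counter()
    for date, count in pairs:
        totals[date] += count
    return dict(totals)


timestamps = [
    "2011-04-06T10:15:00Z",
    "2011-04-06T18:40:12Z",
    "2011-08-15T09:00:00Z",
]
frequency = reduce_dates(map_tweet(t) for t in timestamps)
print(frequency)  # {'6/4/11': 2, '15/8/11': 1}
```

The sample timestamps are made up for the illustration; the totals in the result file above come from the authors' own data.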
What’s UNIQUE…
Business Analytics - An approach worth considering to spot the popularity of a “New Product”
Sentiment Analysis
Thank You!