www.company.com MapReduce on Matlab By: Erum Afzal

Posted on 15-Jan-2016



MapReduce

MapReduce is a programming model developed at Google to simplify the processing of large data sets. For example, Google uses it to index websites.
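The indexing use case can be made concrete with a minimal single-process sketch of the three MapReduce phases (map, shuffle, reduce). It builds an inverted index, which maps each word to the documents that contain it. The function names and toy pages are hypothetical. A real deployment would run the same map and reduce functions on many machines.

```python
from collections import defaultdict


def map_doc(doc_id, text):
    """Map: emit one (word, doc_id) pair per distinct word in the document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def reduce_word(word, doc_ids):
    """Reduce: collapse all doc ids seen for a word into a sorted list."""
    return word, sorted(set(doc_ids))


def build_inverted_index(docs):
    # Shuffle: group intermediate values by key, as the framework would.
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        for word, value in map_doc(doc_id, text):
            groups[word].append(value)
    # Reduce each key group independently (these calls could run in parallel).
    return dict(reduce_word(w, ids) for w, ids in groups.items())


if __name__ == "__main__":
    pages = {"p1": "MapReduce on Matlab", "p2": "Matlab for data processing"}
    index = build_inverted_index(pages)
    print(index["matlab"])  # ['p1', 'p2']
```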

Matlab

• Matlab is software that provides a technical computing environment.

• It is used for numerical computation, simulation and data processing.

MapReduce on Matlab

• MapReduce on Matlab allows Matlab users to apply the MapReduce framework to their own data-processing requirements. Examples include data mining tasks and the analysis of dense, detailed digital images. Similarly, if Matlab files could be imported into a MapReduce framework, several Matlab functions could also be run on Hadoop.

Working of MapReduce

• With MapReduce, data can be processed by multiple processors in parallel. This allows it to:

• handle large volumes of input data;

• speed up processing by parallelizing tasks.

Continue…

Map: Each piece of input data, identified by a key and a value, is mapped to one or more intermediate key/value pairs.

Reduce: Each worker processes a part of the intermediate key/value pairs to generate the final key/value pairs.
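The phrase "each worker processes a part of the intermediate key/value pairs" is commonly realized by partitioning the intermediate keys so that all values for one key reach the same reduce worker. The sketch below assumes hash partitioning with `zlib.crc32`. It was chosen because Python's built-in `hash` of strings is randomized per process. The word-count map and reduce functions are hypothetical examples.

```python
import zlib
from collections import defaultdict


def word_count_map(key, value):
    """Map: one input (line_no, line) pair -> one or more (word, 1) pairs."""
    return [(word, 1) for word in value.split()]


def word_count_reduce(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def partition(key, num_workers):
    # Deterministic hash partitioning: the same key always goes to the same worker.
    return zlib.crc32(key.encode("utf-8")) % num_workers


def run(records, num_workers=3):
    # buckets[w][key] holds every intermediate value routed to reduce worker w.
    buckets = [defaultdict(list) for _ in range(num_workers)]
    for key, value in records:
        for k2, v2 in word_count_map(key, value):
            buckets[partition(k2, num_workers)][k2].append(v2)
    # Each worker reduces only its own part of the intermediate pairs.
    result = {}
    for bucket in buckets:
        for k2, vs in bucket.items():
            k3, v3 = word_count_reduce(k2, vs)
            result[k3] = v3
    return result
```

For instance, `run([(0, "a b a"), (1, "b c")])` yields the counts a: 2, b: 2, c: 1, regardless of how the keys are spread across workers.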

Working of Matlab

The Matlab Parallel Computing Toolbox provides a framework for writing programs that run on a cluster of computers. It enables a master computer to dispatch jobs to workers running on McGill's cluster.

• The master creates a MapReduce job and passes the user-defined Map and Reduce functions to the workers.

• At each worker, the input key/value pairs are fed into the map function to get intermediate key/value pairs.

• At each worker, the intermediate key/value pairs are fed into the reduce function to get the final key/value pairs, which form the output.
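The master/worker flow above can be imitated on a single machine, with a thread pool standing in for the cluster workers. This is only a sketch under that assumption. It does not use the Parallel Computing Toolbox, and `run_job`, `split` and the example functions are hypothetical names. The master splits the input and ships the user-defined map function to the workers. It then groups the intermediate pairs by key and ships the reduce function.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split(items, n):
    """Divide the list of input records into n roughly equal chunks."""
    return [items[i::n] for i in range(n)]


def run_job(records, map_fn, reduce_fn, num_workers=4):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Map phase: each worker applies map_fn to its own chunk of input pairs.
        def map_chunk(chunk):
            return [pair for k, v in chunk for pair in map_fn(k, v)]

        mapped = pool.map(map_chunk, split(records, num_workers))

        # The master groups intermediate values by key.
        groups = defaultdict(list)
        for chunk_out in mapped:
            for k, v in chunk_out:
                groups[k].append(v)

        # Reduce phase: workers apply reduce_fn to the key groups.
        reduced = pool.map(lambda kv: reduce_fn(*kv), groups.items())
        return dict(reduced)
```

Threads keep the sketch portable. Swapping in a process pool or real cluster workers would additionally require the map and reduce functions to be serializable.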

Continue…

Orthogonal Matching Pursuit

As an example, a sparse signal x can be stored by multiplying it with a measurement matrix A:

• y = Ax

• y can then be used to recover x using OMP.
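A compact NumPy sketch of textbook Orthogonal Matching Pursuit may help with the recovery step. Each iteration picks the column of A most correlated with the current residual. It then refits y by least squares on all columns chosen so far, and updates the residual. The function name, the sparsity argument `k` and the tolerance are assumptions of this sketch, not details from the slides.

```python
import numpy as np


def omp(A, y, k, tol=1e-10):
    """Orthogonal Matching Pursuit: estimate a k-sparse x from y = A x."""
    n = A.shape[1]
    residual = y.astype(float)
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = -1.0  # never re-select a chosen column
        support.append(int(np.argmax(correlations)))
        # Least-squares fit of y on the selected columns only.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x = np.zeros(n)
    x[support] = coef
    return x
```

In the compressed-sensing setting, A has fewer rows than columns (for example a random Gaussian matrix). Recovery then succeeds only when x is sufficiently sparse relative to the number of measurements.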

Application with MapReduce

In its traditional form, OMP becomes slow as A grows larger. The problem can be addressed by processing individual slices of A in parallel, which can be done using MapReduce.

Continue….

• OMP becomes slow as A grows larger in size. This problem can be solved by processing individual slices of A in parallel.

• The MapReduce method actually.
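One plausible reading of "processing individual slices of A in parallel" is to split the columns of A into blocks and compute the expensive correlation step A^T r block by block. Each map task returns the best column in its slice, and a reduce step keeps the global maximum. The column-slicing scheme, block size and function names below are assumptions for illustration, not necessarily the method used in the cited work.

```python
import numpy as np


def map_slice(A_slice, offset, residual):
    """Map: best (score, global column index) within one column slice of A."""
    scores = np.abs(A_slice.T @ residual)
    j = int(np.argmax(scores))
    return float(scores[j]), offset + j


def reduce_best(candidates):
    """Reduce: first candidate with the highest correlation score."""
    return max(candidates, key=lambda c: c[0])


def select_column(A, residual, block=4):
    # Each (slice, offset) pair is an independent task that could run on a worker.
    tasks = [(A[:, s:s + block], s) for s in range(0, A.shape[1], block)]
    candidates = [map_slice(S, off, residual) for S, off in tasks]
    return reduce_best(candidates)[1]
```

Because `max` returns the first maximal candidate and slices are visited in column order, the result matches `np.argmax` over the full correlation vector.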

Results

• MapReduce was implemented on Matlab and used to run Orthogonal Matching Pursuit.

• MapReduce on Matlab has the potential to improve the performance of numerous parallel processing algorithms by bringing the power of the MapReduce programming model to Matlab.

Singular Value Decomposition (SVD)

The Singular Value Decomposition (SVD) is a powerful matrix decomposition frequently used for dimensionality reduction. SVD is widely used for least-squares problems, for solving linear systems and for finding a low-rank representation of a matrix. A wide range of applications use SVD as their main algorithmic tool.
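The low-rank use can be illustrated with a short NumPy sketch of a truncated SVD. By the Eckart–Young theorem, keeping the k largest singular values gives the best rank-k approximation in both the spectral and Frobenius norms. The spectral-norm error then equals the (k+1)-th singular value. The function name is hypothetical.

```python
import numpy as np


def low_rank_approx(A, k):
    """Best rank-k approximation of A via the truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```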

Problem

• Finding patterns in large-scale graphs with millions or billions of edges is increasingly important in areas such as computer network security (intrusion detection), spam detection and web applications.

• One such setting is estimating the clustering coefficients and the transitivity ratio of a graph. This effectively translates into computing the number of triangles each node participates in and the total number of triangles in the graph, respectively.

• Triangles are a frequently used network statistic in the exponential random graph model and naturally appear in models of real-world network evolution. They have been used in several applications, such as spam detection, uncovering the hidden thematic structure of the web and link recommendation in online social networks.

• It is worth noting that triangles have a natural interpretation in social networks: "friends of friends are frequently friends themselves."
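For reference, the exact quantities that the spectral methods approximate can be computed directly on small graphs. The sketch below counts, for each node, the adjacent pairs among its neighbours, which is the number of triangles at that node. It derives the total triangle count, the local clustering coefficient and the transitivity ratio from those counts. The function names are hypothetical, and this brute-force approach is only practical for small or sparse graphs.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets; self-loops are ignored."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def triangles_per_node(adj):
    # A node closes one triangle for every pair of its neighbours that are adjacent.
    return {node: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            for node, nbrs in adj.items()}


def total_triangles(counts):
    # Every triangle is counted once at each of its three corners.
    return sum(counts.values()) // 3


def clustering_coefficient(node, counts, adj):
    """Local clustering: closed neighbour pairs / all neighbour pairs."""
    d = len(adj[node])
    return 0.0 if d < 2 else counts[node] / (d * (d - 1) / 2)


def transitivity(counts, adj):
    """Global transitivity ratio: 3 * triangles / connected triples."""
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return 3 * total_triangles(counts) / triples if triples else 0.0
```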

MATLAB implementation, rank-k approximation

function Delta = EigenTriangleLocal(A, k)
% A is the adjacency matrix, k is the required rank approximation
n = size(A, 1);
Delta = zeros(n, 1);             % preallocate space for Delta
opts.isreal = 1; opts.issym = 1; % specify that the matrix is real and symmetric
[u, l] = eigs(A, k, 'LM', opts); % compute the top k eigenvalues and eigenvectors of A
l = diag(l)';                    % eigenvalues as a row vector
for j = 1:n
    % estimated number of triangles that node j participates in
    Delta(j) = sum(l.^3 .* u(j,:).^2) / 2;
end
end
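For readers without MATLAB, the same rank-k estimate can be sketched in Python with NumPy. It relies on the identity that node j participates in (A^3)_jj / 2 = (1/2) Σ_i λ_i^3 u_ji^2 triangles. The sum is truncated to the k eigenvalues of largest magnitude, the analogue of `eigs(A, k, 'LM')`. For simplicity, this sketch uses a full `np.linalg.eigh` decomposition, so it does not reproduce the speed of a sparse eigensolver. For large sparse graphs, `scipy.sparse.linalg.eigsh` would be the usual substitute.

```python
import numpy as np


def eigen_triangle_local(A, k):
    """Estimate triangles per node from the top-k (by magnitude) eigenpairs of A."""
    A = np.asarray(A, dtype=float)
    vals, vecs = np.linalg.eigh(A)          # A must be real and symmetric
    top = np.argsort(-np.abs(vals))[:k]     # indices of the k largest |eigenvalues|
    lam, U = vals[top], vecs[:, top]
    # Delta_j = sum_i lambda_i^3 * U[j, i]^2 / 2, vectorized over all nodes j
    return (U ** 2) @ (lam ** 3) / 2
```

With k equal to the number of nodes, the estimate is exact. On the complete graph K4, for example, it returns 3 for every node.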

Summary of network data

Results

Continue….

• In this work, the EIGENTRIANGLE and EIGENTRIANGLELOCAL algorithms are proposed to estimate the total number of triangles and the number of triangles per node, respectively, in an undirected, unweighted graph. The special spectral properties that real-world networks frequently possess make both algorithms efficient for the triangle counting problem. To our knowledge, …

Fast Randomized Tensor Decompositions

• Many real-world problems involve multi-aspect data. For example, fMRI (functional magnetic resonance imaging) is one of the most popular neuroimaging techniques, and its scans produce multi-aspect data: voxels × subjects × trials × task conditions × timeticks. Monitoring systems produce three-way data: machine id × type of measurement × timeticks. Depending on the setting, the machine can be, for instance, a sensor (sensor networks) or a computer (computer networks). The large data volumes generated by personalized web search are frequently modeled as three-way tensors, i.e., users × queries × web pages.

• Analyzing all of the above is a time-consuming task.
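As a small illustration of modeling web-search data as a three-way tensor, the sketch below turns a hypothetical click log of (user, query, page) triples into a dense users × queries × web pages count tensor. The storage size is the product of the three dimensions, which is one reason such data is costly to analyze. Real systems would keep the tensor sparse. All names and the log format are assumptions.

```python
import numpy as np


def build_click_tensor(clicks, users, queries, pages):
    """Count (user, query, page) clicks into a dense users x queries x pages tensor."""
    u_idx = {u: i for i, u in enumerate(users)}
    q_idx = {q: i for i, q in enumerate(queries)}
    p_idx = {p: i for i, p in enumerate(pages)}
    T = np.zeros((len(users), len(queries), len(pages)))
    for u, q, p in clicks:
        T[u_idx[u], q_idx[q], p_idx[p]] += 1
    return T
```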

Problem

• Ignoring the multi-aspect nature of the data by flattening it into a two-way matrix and applying an exploratory analysis algorithm, e.g., singular value decomposition (SVD), is not optimal and typically hurts performance significantly.

• The same problem holds when applying, e.g., SVD to different two-way slices of the tensor, as observed by [94]. In contrast, multiway data analysis techniques succeed in capturing the multilinear structures in the data and thus achieve better performance than these approaches.
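"Flattening" a tensor into a matrix is usually done by mode-n unfolding. The fibres along one mode become the columns of a matrix, and the other modes are merged. The sketch below shows the operation and its inverse in NumPy. Each unfolding treats only one mode as rows, which is why a single SVD of one unfolding misses the joint multilinear structure. The column ordering here follows NumPy's row-major layout and may differ from the ordering used in the tensor literature. The function names are hypothetical.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: mode-n fibres of T become the columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def fold(M, mode, shape):
    """Inverse of unfold for a tensor of the given shape."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)
```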

Problem Solution

Tensor decompositions have been applied as a solution in many applications across different scientific disciplines. These include computer vision and signal processing, as well as neuroscience, time-series anomaly detection, psychometrics, graph analysis and data mining.

Algorithm 8 MACH-HOSVD
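The algorithm listing itself did not survive in this transcript. The NumPy sketch below is therefore a reconstruction for illustration, not a transcription of Algorithm 8. It assumes that MACH sparsifies the tensor by keeping each entry independently with probability p and rescaling it by 1/p, and that a truncated higher-order SVD (HOSVD) is then computed on the sparsified tensor. The function names and the `ranks` parameter are hypothetical.

```python
import numpy as np


def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def mode_dot(T, M, mode):
    """Multiply tensor T by matrix M along the given mode (T x_mode M)."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)


def mach_sparsify(T, p, seed=0):
    """Keep each entry with probability p, scaled by 1/p (unbiased for T)."""
    rng = np.random.default_rng(seed)
    keep = rng.random(T.shape) < p
    return np.where(keep, T / p, 0.0)


def hosvd(T, ranks):
    """Truncated HOSVD: leading left singular vectors of each unfolding + core."""
    factors = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    core = T
    for n, U in enumerate(factors):
        core = mode_dot(core, U.T, n)
    return core, factors


def reconstruct(core, factors):
    X = core
    for n, U in enumerate(factors):
        X = mode_dot(X, U, n)
    return X


def mach_hosvd(T, ranks, p, seed=0):
    # Sparsify first, so every SVD below works on a much sparser tensor.
    return hosvd(mach_sparsify(T, p, seed), ranks)
```

With p = 1 and full ranks, no entry is dropped and the reconstruction reproduces T exactly. Smaller p trades accuracy for speed.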

Results

Continue….

• Tensor decompositions are useful in many real-world problems. A simple randomized algorithm, MACH, is proposed. It is easily parallelizable and can be adapted to online streaming systems.

• This algorithm will be incorporated in the PEGASUS library, a graph and tensor mining system for handling large amounts of data.
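The parallel and streaming claims follow from the fact that each entry is kept or dropped independently of all the others. Assuming again that MACH keeps an entry with probability p and rescales it by 1/p, the sampling can be written as a one-pass filter over a stream of (i, j, k, value) records. It could then run as a map-only task on any number of workers. The record format and function name are hypothetical.

```python
import random


def mach_stream(entries, p, seed=None):
    """One pass over (i, j, k, value) tuples: keep each with probability p,
    scaling kept values by 1/p. Holds no state beyond the random generator."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    for i, j, k, value in entries:
        if rng.random() < p:
            yield i, j, k, value / p
```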

More Applications

• Comparing the Performance of Clusters, Hadoop, and Active Disks on Microarray Correlation Computations.

• Beyond Online Aggregation: Parallel and Incremental Data Mining with Online Map-Reduce (DRAFT).

• Map-Reduce for Machine Learning on Multicore.

References

• Charalampos E. Tsourakakis, "Data Mining with MapReduce: Graph and Tensor Algorithms with Applications", March 2010.

• Arjita Madan, "MapReduce on Matlab".