www.company.com mapreduce on matlab by: erum afzal
Post on 15-Jan-2016
TRANSCRIPT
MapReduce on Matlab
By: Erum Afzal
MapReduce
MapReduce is a programming model devised at Google to facilitate the processing of large data sets. For example, it is used at Google for indexing websites.
Matlab
• Matlab is software that provides a technical computing environment.
• It is used for numerical manipulation, simulations and data processing.
MapReduce on Matlab
• MapReduce on Matlab allows Matlab users to apply the MapReduce framework to their own data processing requirements, such as data mining tasks or the processing of dense, detailed digital images. Similarly, if Matlab files could be imported into the MapReduce framework, several Matlab functionalities could be processed on Hadoop as well.
Working of MapReduce
• With MapReduce, data can be processed using multiple processors in parallel. This makes it possible to:
• Handle large volumes of input data.
• Speed up processing due to parallelization of tasks.
Continue…
Map: Each piece of input data, identified by a key and a value, is mapped to one or more intermediate key/value pairs.
Reduce: Each worker processes a part of the intermediate key/value pairs to generate the final key/value pairs.
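The Map and Reduce steps can be illustrated with a minimal word-count sketch. It is written in plain Python rather than Matlab so it runs without any toolbox, and the names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

def map_fn(key, value):
    """Map: one input (key, value) pair -> one or more intermediate pairs."""
    # key is a document name (unused here), value is the document text.
    return [(word.lower(), 1) for word in value.split()]

def reduce_fn(key, values):
    """Reduce: all intermediate values for one key -> one final pair."""
    return (key, sum(values))

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Sequential driver: map every input, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

docs = [("doc1", "map reduce on matlab"), ("doc2", "matlab map")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

The grouping step between map and reduce is what lets each reduce call see every intermediate value that shares a key.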
Working of Matlab
The Matlab Parallel Computing Toolbox offers the framework to write programs for a cluster of computers. This enables a master computer to dispatch jobs to workers running on McGill’s cluster.
• The master creates a MapReduce job and passes the user-defined Map and Reduce functions to the workers.
• At each worker, the input key/value pairs are fed into the map function to get intermediate key/value pairs.
• At each worker, the intermediate key/value pairs are fed into the reduce function to get the final key/value pairs (the output).
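The master/worker flow can be sketched as follows. This is only an analogy under stated assumptions: a Python thread pool stands in for the cluster workers, and `master`, `square_map` and `sum_reduce` are hypothetical names, not part of the Matlab Parallel Computing Toolbox.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def square_map(key, value):
    # Hypothetical user-defined map: label each number even/odd, emit its square.
    return [("even" if value % 2 == 0 else "odd", value * value)]

def sum_reduce(key, values):
    # Hypothetical user-defined reduce: add up all values that share a key.
    return (key, sum(values))

def master(inputs, map_fn, reduce_fn, n_workers=4):
    """The master hands the user-defined functions to a pool of workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Workers apply the map function to the input key/value pairs.
        mapped = list(pool.map(lambda kv: map_fn(*kv), inputs))
        # The master groups the intermediate pairs by key.
        groups = defaultdict(list)
        for pairs in mapped:
            for k, v in pairs:
                groups[k].append(v)
        # Workers apply the reduce function to each group.
        reduced = list(pool.map(lambda kv: reduce_fn(*kv), groups.items()))
    return dict(reduced)

result = master(list(enumerate(range(1, 7))), square_map, sum_reduce)
print(result)
```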
Orthogonal Matching Pursuit
Example:
• A sparse signal x can be stored by multiplying it with a measurement matrix A, where y = Ax.
• y can then be used to recover x using OMP.
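For reference, the measurement model and the standard greedy step of OMP can be written out as below. The dimensions m and n, the residual r, the column a_j of A, and the selected index set S are notation assumed here; the slides do not define them.

```latex
% Measurement model: x is sparse, A has fewer rows than columns
y = A x, \qquad A \in \mathbb{R}^{m \times n}, \quad m < n

% Selection step: pick the column most correlated with the residual r (initially r = y)
j^\star = \arg\max_{j} \, \lvert a_j^\top r \rvert, \qquad S \leftarrow S \cup \{ j^\star \}

% Re-fit on the selected columns and update the residual
x_S = \arg\min_{z} \, \lVert y - A_S z \rVert_2, \qquad r = y - A_S x_S
```

The selection step touches every column of A in each iteration, which is why the cost grows with the size of A.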
Application with MapReduce
• OMP becomes slow as A grows larger in size. This problem can be solved by processing individual slices of A in parallel.
• The MapReduce method actually.
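One plausible way to process slices of A in parallel is to split the column-selection step of OMP: each map call scores the columns of one slice against the current residual, and the reduce call keeps the best candidate overall. The slides do not say how the slicing was actually done, so the sketch below is an assumption, and `map_slice`, `reduce_best` and the toy values are hypothetical.

```python
def correlation(column, residual):
    """Absolute inner product between one column of A and the residual."""
    return abs(sum(a * r for a, r in zip(column, residual)))

def map_slice(slice_start, columns, residual):
    """Map: find the best-matching column inside one slice of A."""
    best = max(range(len(columns)), key=lambda j: correlation(columns[j], residual))
    return (slice_start + best, correlation(columns[best], residual))

def reduce_best(candidates):
    """Reduce: keep the globally best (column index, score) pair."""
    return max(candidates, key=lambda c: c[1])

# A stored column-wise: 4 columns of length 3 (toy values).
A_columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
residual = [0.1, 2.0, 0.3]

slice_size = 2
slices = [(s, A_columns[s:s + slice_size]) for s in range(0, len(A_columns), slice_size)]
# Each map_slice call is independent, so the calls could run on separate workers.
candidates = [map_slice(s, cols, residual) for s, cols in slices]
best_index, best_score = reduce_best(candidates)
print(best_index, best_score)
```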
Results
• MapReduce was implemented on Matlab and was used to run Orthogonal Matching Pursuit.
• MapReduce on Matlab has the potential to improve the performance of numerous parallel processing algorithms by bringing the power of the MapReduce programming model to Matlab.
Singular Value Decomposition (SVD)
The Singular Value Decomposition (SVD) is a powerful matrix decomposition frequently used for dimensionality reduction. SVD is widely used in least squares problems, linear systems, and finding a low-rank representation of a matrix. A wide range of applications use SVD as their main algorithmic tool.
Problem
• Finding patterns in large-scale graphs with millions or billions of edges is of increasing interest, with applications in computer network security (intrusion detection, spamming) and in web applications.
• One such setting is the estimation of the clustering coefficients and the transitivity ratio of the graph, which effectively translates into computing the number of triangles that each node participates in, or the total number of triangles in the graph, respectively.
• Triangles are a frequently used network statistic in the exponential random graph model and naturally appear in models of real-world network evolution. They have been used in several applications, such as spam detection, uncovering the hidden thematic structure of the web, and link recommendation in online social networks.
• It is worth noting that in social networks triangles have a natural interpretation: “friends of friends are frequently friends themselves.”
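To make the counting problem concrete, the sketch below counts triangles exactly on a tiny graph. It uses the standard identity that the number of triangles through node i is half of entry (i, i) of A cubed, where A is the adjacency matrix. This is a brute-force baseline for illustration only; the function names and the toy graph are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def triangles_per_node(A):
    """Each triangle through node i gives two closed walks of length 3 from i."""
    A3 = matmul(matmul(A, A), A)
    return [A3[i][i] // 2 for i in range(len(A))]

# Undirected toy graph: triangle 0-1-2 plus a pendant edge 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
per_node = triangles_per_node(A)
total = sum(per_node) // 3  # every triangle is counted once at each of its 3 nodes
print(per_node, total)
```

Cubing the adjacency matrix is infeasible for graphs with millions of nodes, which is the motivation for approximate methods.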
MATLAB implementation, k-rank approx
function Delta = EigenTriangleLocal(A, k)
% A is the adjacency matrix, k is the required rank approximation.
% Delta(j) is the estimated number of triangles through node j.
n = size(A, 1);
Delta = zeros(n, 1);                % preallocate space for Delta
opts.isreal = 1; opts.issym = 1;    % specify that the matrix is real and symmetric
[u, l] = eigs(A, k, 'LM', opts);    % compute top k eigenvalues and eigenvectors of A
l = diag(l)';                       % eigenvalues as a row vector
for j = 1:n
    Delta(j) = sum(l.^3 .* u(j,:).^2) / 2;
end
end
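The loop in the function evaluates the following estimate. The notation is assumed here to match the code: λ_i is the i-th of the top k eigenvalues (the vector l) and u_{j,i} is entry j of the corresponding eigenvector (row j of u).

```latex
% Per-node estimate computed by the loop (exact when k = n)
\Delta_j \approx \frac{1}{2} \sum_{i=1}^{k} \lambda_i^{3} \, u_{j,i}^{2}

% Summing over all nodes and dividing by 3 gives the total number of triangles
\Delta(G) = \frac{1}{3} \sum_{j=1}^{n} \Delta_j \approx \frac{1}{6} \sum_{i=1}^{k} \lambda_i^{3}
```

The second line follows from the first because each eigenvector has unit norm, so the squared entries of each column of u sum to 1.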
Summary of network data
Results
• In this work the EIGENTRIANGLE and EIGENTRIANGLELOCAL algorithms have been proposed to estimate the total number of triangles and the number of triangles per node, respectively, in an undirected, unweighted graph. The special spectral properties which real-world networks frequently possess make both algorithms efficient for the triangle counting problem.
Fast Randomized Tensor Decompositions
• Many real-world problems involve multi-aspect data. For example, fMRI (functional magnetic resonance imaging) scans, one of the most popular neuroimaging techniques, result in multi-aspect data: voxels × subjects × trials × task conditions × timeticks. Monitoring systems result in three-way data: machine id × type of measurement × timeticks. Depending on the setting, the machine can be, for instance, a sensor (sensor networks) or a computer (computer networks). Large data volumes generated by personalized web search are frequently modeled as three-way tensors, i.e., users × queries × web pages.
• Processing all of the above is quite time-consuming.
Problem
• Ignoring the multi-aspect nature of the data by flattening it into a two-way matrix and applying an exploratory analysis algorithm, e.g., singular value decomposition (SVD), is not optimal and typically hurts performance significantly.
• The same problem holds when applying, e.g., SVD on different two-way slices of the tensor, as observed by [94]. In contrast, multiway data analysis techniques succeed in capturing the multilinear structures in the data, thus achieving better performance than the aforementioned ideas.
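What "flattening" means can be seen on a toy example. The sketch below unfolds a small three-way tensor along its first mode; `unfold_mode1` is a hypothetical helper, the values are made up, and the column ordering shown is only one of several conventions in use.

```python
def unfold_mode1(tensor):
    """Flatten an I x J x K nested-list tensor into an I x (J*K) matrix."""
    return [[x for row in slab for x in row] for slab in tensor]

# Toy 2 x 2 x 3 tensor, e.g. users x queries x web pages (values are made up).
T = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
M = unfold_mode1(T)
print(M)
```

After unfolding, the queries and web pages modes are merged into a single column index, so a matrix method applied to `M` no longer sees them as separate aspects of the data.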
Problem Solution
Tensor decompositions have found use as a solution in many applications across different scientific disciplines, especially computer vision and signal processing, as well as neuroscience, time series anomaly detection, psychometrics, graph analysis and data mining.
Algorithm 8 MACH-HOSVD
Results
• Tensor decompositions are useful in many real-world problems. A simple randomized algorithm, MACH, is proposed which is easily parallelizable and adaptable to online streaming systems.
• This algorithm will be incorporated in the PEGASUS library, a graph and tensor mining system for handling large amounts of data.
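The randomized idea can be sketched under one assumption: that the sparsification step of MACH keeps each tensor entry independently with probability p and rescales the kept entries by 1/p, after which an ordinary decomposition is run on the much sparser tensor. The slides do not show the algorithm itself, so `mach_sparsify`, the coordinate-dictionary storage and the toy tensor are hypothetical, and the decomposition step is omitted.

```python
import random

def mach_sparsify(tensor_entries, p, seed=0):
    """Keep each entry with probability p and rescale it by 1/p.

    tensor_entries maps an index tuple (i, j, k) to a value. Rescaling keeps
    the expected value of every entry equal to its original value.
    """
    rng = random.Random(seed)
    return {idx: val / p for idx, val in tensor_entries.items() if rng.random() < p}

# Toy 10 x 10 x 10 tensor of ones, stored in coordinate form.
entries = {(i, j, k): 1.0 for i in range(10) for j in range(10) for k in range(10)}
sparse = mach_sparsify(entries, p=0.1)
print(len(entries), len(sparse))
```

Each entry is processed independently of the others, which is consistent with the claim that the approach parallelizes easily and suits streaming input.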
More Applications
• Comparing the Performance of Clusters, Hadoop, and Active Disks on Microarray Correlation Computations.
• Beyond Online Aggregation: Parallel and Incremental Data Mining with Online Map-Reduce (DRAFT).
• Map-Reduce for Machine Learning on Multicore.
References
• Charalampos E. Tsourakakis, “Data Mining with MAPREDUCE: Graph and Tensor Algorithms with Applications”, March 2010.
• Arjita Madan, “MapReduce on Matlab”.