map reduce (part 3) - graz university of...

45
Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030) Mark Kr¨ oll Institute of Interactive Systems and Data Science, Graz University of Technology Nov. 26, 2018 Mark Kr¨ oll (ISDS, TU Graz) MapReduce Nov. 26, 2018 1 / 45

Upload: others

Post on 28-Mar-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Map Reduce (Part 3)Databases 2 (VU) (706.711 / 707.030)

Mark Kroll

Institute of Interactive Systems and Data Science,Graz University of Technology

Nov. 26, 2018

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 1 / 45

Page 2: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Outline

1 Problems Suited for MapReduce

2 MapReduce: ApplicationsMatrix-Vector MultiplicationInformation Retrieval

3 Hadoop EcosystemDesigning a Big Data SystemBig Data Storage Technologies

Slides are partially based onSlides “Mining Massive Datasets” by Jure Leskovec.

“Limitations and Challenges of HDFS and MapReduce” by Weets et al., 2015.

Tutorial on “’Data-Intensive Text Processing with MapReduce’ by Lin, 2009.

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 2 / 45

Page 3: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Not all problems are suited for MapReduce

problems at hand need to be formulated and implemented as MapReduce program, i.e.parallelizable according to the divide&conquer idea, for instance

I analyzing textual data (word count statistics)I matrix-vector multiplications (PageRank)I relational algebra operations (Database query primitives such as joins)

in addition, files need to be largeI due to task setup time and scheduling overhead

and should be rarely updated in placeI not suitable when managing online sales, for instance, the principal operations on Amazon

data involve responding to searches for products, recording sales, and so on; processes thatinvolve relatively li�le calculation and that change the database

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 3 / 45

Page 4: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

When MapReduce is not a suitable choice

when your processing requires a lot of data to be shu�led over the networkI remember the idea is to bring the algorithm to the data

real-time processing - when fast responses are needed

for complex algorithmsI for example, machine learning algorithms such as SVMs

processing graphs→ use dedicated frameworks such as Giraph

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 4 / 45

Page 5: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

When MapReduce is not a suitable choice

when lot of data needs to be sorted / shu�led, e.g. the map phase generates too many keysI shu�ling is very time-consuming o�en overloading the networkI reason is that the reducers fetch all intermediate data at once as soon as the last mapper

finishes→ led to strategies such as virtual shu�ling or predictive scheduling

when iterations are neededI for example clustering algorithms such as K-Means→ use frameworks such as Spark

handling streaming dataI MapReduce is best suited to batch process hugh amounts of data→ use frameworks such as Storm, Flink

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 5 / 45

Page 6: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

MapReduce: Typical Applications

matrix operations such as matrix-matrix and matrix-vector multiplications which fit nicelyinto the MapReduce programming model

I for example, to calculate the PageRank

another important class of operations that can use MapReduce e�ectively arerelational-algebra operations

I many operations on data can be described easily in terms of the common database-queryprimitives such as selection, union, intersection, joins

I which can be used for social network analysis, e.g. finding paths of di�erent lengths, countingfriends, etc.

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 6 / 45

Page 7: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Application 1: Matrix-Vector Multiplication

What do we need to do?Suppose we have an n× n matrixM, whose element in row i and column j will be denoted mij .Suppose we also have a vector v of length n, whose jth element is vj . Then the matrix-vectorproduct is the vector x of length n, whose ith element is given by

xi =n∑

j=1

mijvj

Outline a Map-Reduce program that calculates the vector x.

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 7 / 45

Page 8: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Matrix-Vector Multiplication

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 8 / 45

Page 9: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Matrix-Vector Multiplication

let us first assume that the vector v is large, but it still can fit into the memory

the matrix M and the vector v will be each stored in a file of the DFSassume that the row-column coordinates of a matrix element (indices) can be discovered

I for example, each value is stored as a triple (i, j,mij)

similarly, the position of vj can be discovered analogously

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 9 / 45

Page 10: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Matrix-Vector Multiplication

So how does the Map Function look like?

a map function applies to one element of the matrixMthe vector v is first read in its entirety and is available for all Map tasks at that computenode

from each matrix element mij the map function produces the key-value pair (i,mijvj)

all terms of the sum that make up the component xi of the matrix-vector product will getthe same key i

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 10 / 45

Page 11: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Matrix-Vector Multiplication

What about the Reduce Function?reduce function sums all the values associated with a given key i

result is a pair (i, xi)

thereby calculating one entry in the output vector x

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 11 / 45

Page 12: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Matrix-Vector Multiplication: Divide&Conquer

however, it might be that the vector v does not fit into main memoryI it is not required that the vector v fits into the memory at a compute node, but if it does not

there will be a very large number of disk accesses as we move pieces of the vector into mainmemory

alternatively we can divide the matrixM into vertical stripes of equal width and divide thevector into an equal number of horizontal stripes of the same height

→ use enough stripes so that the portion of the vector in one stripe can fit into mainmemory at a compute node

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 12 / 45

Page 13: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Matrix-Vector Multiplication: Divide&Conquer

Figure: Divide matrixM and vector v into stripes

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 13 / 45

Page 14: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Matrix-Vector Multiplication: Divide&Conquer

the ith stripe of matrixM multiplies only components from the ith stripe of the vector

can divide matrixM into one file for each stripe, and do the same for the vector veach Map task is assigned a chunk from one of the stripes in the matrix and gets the entirecorresponding stripe of the vector

Map and Reduce tasks can then act exactly as before

need to sum up once more the results of the stripes multiplication

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 14 / 45

Page 15: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Application 2: Information Retrieval

Information Retrieval (aka Search) in a Nutshellis the activity of obtaining information relevant to an information need

I for instance a search query you submit to Google

from a collection of information resourcesI for instance databases of texts, images or sounds

Information Retrieval is so to saythe Science of Searching for Information

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 15 / 45

Page 16: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Information Retrieval Process

taken from Eissa Alshari Semantic Arabic Information Retrieval Framework

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 16 / 45

Page 17: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Search Process - Step 1: Indexing

taken from h�ps://developer.apple.com/

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 17 / 45

Page 18: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Search Process - Step 2: Retrieval

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 18 / 45

Page 19: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Are Indexing or Retrieval MapReducable?

The Indexing ProblemScalability is critical

must be relatively fast, but need not be real time

fundamentally a batch operation

→ sounds like a Map-Reduce problem

The Retrieval Problemmust have sub-second response time

for the web, only need relatively few results

→ rather not a Map-Reduce problem

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 19 / 45

Page 20: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Index Construction with MapReduce

original purpose behind the MapReduce programming conceptMap over all documents

I emit term as key, (docNr., termFrequency) as valueI emit other information as necessary (e.g., term position)

Sort/shu�le: group postings by term

ReduceI gather and sort the postings (e.g., by docNr. or tf)I write postings to disk

→MapReduce does all the heavy li�ing

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 20 / 45

Page 21: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Index Construction with MapReduce: An Overview

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 21 / 45

Page 22: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

including information on term position:

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 22 / 45

Page 23: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Apache provides a nice so�ware stack

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 23 / 45

Page 24: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Parts of the Hadoop Eco System

Yarn (Yet Another Resource Negotiator)

is a cluster management system to run Big Data applications on a cluster data; not a dataprocessing platform itself but enables the platforms to run their code in a clusterenvironment

HBase

open source, non-relational, distributed database modeled a�er Google’s BigTable

Hive

data warehouse infrastructure built on top of Hadoop for providing data summarization,query, and analysis

Pig

high-level platform for creating programs that run on Apache Hadoop (language is calledPig Latin)

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 24 / 45

Page 25: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Data Processing: From Batch to Streaming . . .

Batch Processing involves operating over a large, static dataset and returning the result at a later time when thecomputation is complete.Stream Processing systems compute over data as it enters the system.The term Microbatch is frequently used to describe scenarios where batches are small and/or processed at small intervals.Even though processing may happen as o�en as once every few minutes, data is still processed a batch at a time.

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 25 / 45

Page 26: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Big Data Processing Frameworks (1)

Batch-only frameworks:I Apache Hadoop can be considered a processing framework with MapReduce as its default

processing engine. Hadoop was the first big data framework to gain significant traction in theopen-source community.

Stream-only frameworks:I Apache Storm focuses on extremely low latency and is perhaps the best option for workloads

that require near real-time processing. It can handle very large quantities of data with anddeliver results with less latency than other solutions. Its topologies are composed of (i)Streams, (ii) Spouts and (iii) Bolts

I Apache Samza is tightly tied to the Apache Kafka messaging system. While Kafka can beused by many stream processing systems, Samza is designed specifically to take advantage ofKafka’s unique architecture and guarantees. It uses Kafka to provide fault tolerance, bu�ering,and state storage

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 26 / 45

Page 27: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Big Data Processing Frameworks (2)

Hybrid frameworks:I Apache Spark is a next generation batch processing framework with stream processing

capabilities; Spark focuses primarily on speeding up batch processing workloads by o�eringfull in-memory computation and processing optimization.

I Apache Flink a stream processing framework that can also handle batch tasks. It considersbatches to simply be data streams with finite boundaries, and thus treats batch processing as asubset of stream processing. This stream-first approach to all processing has a number ofinteresting side e�ects. This stream-first approach has been called the Kappa architecture.

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 27 / 45

Page 28: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Java(-ish) is the Hadoop Language

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 28 / 45

Page 29: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

The Good, the Bad and the Ugly wrt. the Ecosystem

The GoodI Easy to write parallel, highly scalable applicationsI Stream and Batch processingI Seamless integration with other systems (e.g. RDBMS)

The BadI Rapid development, hard to keep overviewI 150 projects in (or near) Hadoop Eco System

F see https://hadoopecosystemtable.github.io/

The UglyI Maven dependency hell, if integrated with other systemsI Spark depends on > 50 libraries with a specific version!

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 29 / 45

Page 30: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

System Design: An Overview

usually boils down to picking the right components wrt. Input, Processing + OutputMark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 30 / 45

Page 31: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

System Design: An Overview

usually boils down to picking the right components wrt. Input, Processing + OutputMark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 31 / 45

Page 32: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

So let’s design a system that

1 receives 5 million sensor measurements per minuteI 100 bytes per record ∼ 8MB/sek

2 stores measurements for 3 yearsI ∼ 240 TB per year, 720 TB in total

3 creates daily reportsI for any given date

4 detects faulty sensors in real-timeI alert the maintainers via SMS

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 32 / 45

Page 33: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

System Design: Processing & Analyzing Sensor Data

. . . receives 5 million sensor measurements per minute

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 33 / 45

Page 34: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

System Design: Processing & Analyzing Sensor Data

. . . stores measurements for 3 years

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 34 / 45

Page 35: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

System Design: Processing & Analyzing Sensor Data

. . . creates daily reportsMark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 35 / 45

Page 36: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

System Design: Processing & Analyzing Sensor Data

. . . detects faulty sensors in real-timeMark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 36 / 45

Page 37: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Big Data Storage Technologies

File-based: HDFSI distributed, permanent file storageI tuned for large filesI no indexing

Key-Value based: HBaseI distributed Key/Value storeI fast look-upsI based on HDFS

Message-based: KafkaI distributed Producer/Consumer messaging systemI data partitioned in topicsI producer groups / consumer groups

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 37 / 45

Page 38: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

File-based: HDFS

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 38 / 45

Page 39: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

File-based: HDFS

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 39 / 45

Page 40: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Key-Value based: HBase

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 40 / 45

Page 41: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Key-Value based: HBase

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 41 / 45

Page 42: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Message-based: Kafka

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 42 / 45

Page 43: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Message-based: Kafka

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 43 / 45

Page 44: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

Map Reduce Lectures’ Recap:

Part 1:I Handling Big DataI Key Elements: MapReduce Framework & Distributed File System

Part 2:I Optimization: Maximizing ParallelismI Stragglers Problem & Input Data Skew

Part 3:I Suitability & ApplicationsI Hadoop Ecosystem & Storage Technologies

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 44 / 45

Page 45: Map Reduce (Part 3) - Graz University of Technologykti.tugraz.at/staff/rkern/courses/dbase2/map_reduce_part... · 2018-11-26 · Map Reduce (Part 3) Databases 2 (VU) (706.711 / 707.030)

The EndNext Lecture: “Beyond Map-Reduce”

Mark Kroll (ISDS, TU Graz) MapReduce Nov. 26, 2018 45 / 45