map-reduce design patterns - github pages

34
Venkatesh Vinayakarao (Vv) Map-Reduce Design Patterns Venkatesh Vinayakarao [email protected] http://vvtesh.co.in Chennai Mathematical Institute https://vvtesh.sarahah.com/ Finding patterns is the essence of wisdom. – Dennis Prager

Upload: others

Post on 16-Oct-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Map-Reduce Design Patterns - GitHub Pages

Venkatesh Vinayakarao (Vv)

Map-Reduce Design Patterns

Venkatesh [email protected]

http://vvtesh.co.in

Chennai Mathematical Institute

https://vvtesh.sarahah.com/

Finding patterns is the essence of wisdom. – Dennis Prager

Page 2: Map-Reduce Design Patterns - GitHub Pages

I have a story for you!Patterns here, Patterns there,

Patterns, Patterns everywhere…

Page 3: Map-Reduce Design Patterns - GitHub Pages

Are beauty and quality objective?

• Can we agree some things are beautiful and some are not?

235

Page 4: Map-Reduce Design Patterns - GitHub Pages

Christopher Alexander Asked… What is a good design?

236

Are beauty and quality objective?

What makes a good architectural design?

Page 5: Map-Reduce Design Patterns - GitHub Pages

Good Design

237

Beauty can be objectively measured.

Example: Symmetry is good

Cultural Anthropology: Within a culture, individuals agree what is good design, what is beautiful.

Page 6: Map-Reduce Design Patterns - GitHub Pages

238

Patterns

• Good design structures had similarities between them.

• Alexander called these similarities patterns.

• "Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over..."

Christopher Alexander, A Pattern Language: Towns/Buildings/Construction, 1977

A pattern is a solution to a problem in a context.

Page 7: Map-Reduce Design Patterns - GitHub Pages

239

A question

• Can you tell me one design that is absolutely symmetrical?

Equivalent ideas exist in software design

Page 8: Map-Reduce Design Patterns - GitHub Pages

240

Good Design

• What according to you are the two biggest factors that determine a good/bad design?

Page 9: Map-Reduce Design Patterns - GitHub Pages

241

Good and Bad Design

• What are the commonalities in what is viewed as good (and what is viewed as bad)?• A software system that is easy to maintain is considered

good• A fragile software system is considered bad

• A software system that is easy to understand is considered good• Obfuscated “spaghetti code” is considered bad

Page 10: Map-Reduce Design Patterns - GitHub Pages

242

Quiz

• Have you come across software patterns? Can you give one example?

Page 11: Map-Reduce Design Patterns - GitHub Pages

Singleton Pattern

243

public class CMI{

private static final CMI instance = new CMI();

private CMI() {

// private constructor }

public static CMI getInstance(){ return instance;

}

}

Page 12: Map-Reduce Design Patterns - GitHub Pages

Factory Pattern

244Source: https://dzone.com/articles/creational-design-pattern-series-factory-method-pa

Page 13: Map-Reduce Design Patterns - GitHub Pages

For More on Design Patterns

245

We shall now look at some Map-Reduce design patterns.

Page 14: Map-Reduce Design Patterns - GitHub Pages

Recap

246

Map

Re

du

ce

Shu

ffle

an

d S

ort

Map-Reduce Model

Hadoop Architecture

Page 15: Map-Reduce Design Patterns - GitHub Pages

Map-Reduce Processing

247

< Hello, 1>

< World, 1>

< Bye, 1>

< World, 1>

< Hello, 1>

< Hadoop, 1>

< Goodbye, 1>

< Hadoop, 1>

< Bye, 1>

< Hadoop, 1>

< Hadoop, 1>

< Hello, 1>

< Hello, 1>

< GoodBye,1>

< World, 1>

< World, 1>

< Bye, 1>

< Hadoop, 2>

< Hello, 3>

< GoodBye,1>

< World, 2>

Shu

ffle

an

d S

ort

Map

pe

r Re

du

ce

1 2 3

Combiner can be used to summarize locally per mapper.

Map

pe

r

Par

titi

on

er

Page 16: Map-Reduce Design Patterns - GitHub Pages

Input Splits

248

Note that a remote read may be required at block boundaries.

Page 17: Map-Reduce Design Patterns - GitHub Pages

A Hadoop Map-Reduce Developer

• Writes the “map” code

• Writes the “reduce” code

• Submits the map and reduce code to Hadoop framework.

249

job

Submit

Job Tracker

@Namenode, which datanodeshave the data?

Page 18: Map-Reduce Design Patterns - GitHub Pages

Submitting a Map-Reduce Job

hadoop jar

/usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input/usr/joe/wordcount/output

250

Page 19: Map-Reduce Design Patterns - GitHub Pages

Mapper

251

See https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html for details.

Page 20: Map-Reduce Design Patterns - GitHub Pages

Reducer

252

Page 21: Map-Reduce Design Patterns - GitHub Pages

Create a Job

253

Page 22: Map-Reduce Design Patterns - GitHub Pages

Submit Job to Hadoop

254

Page 23: Map-Reduce Design Patterns - GitHub Pages

Output

255

Page 24: Map-Reduce Design Patterns - GitHub Pages

Readings

256

Page 25: Map-Reduce Design Patterns - GitHub Pages

How Will You Implement These With Map Reduce?• Min/Max

• Average

• Count

• Median

• Filtering

• Top 10

• Convert key-values to hierarchy

• Partioning

• Sorting

257

Page 26: Map-Reduce Design Patterns - GitHub Pages

Summarization Pattern

258

A partitioner controls the logical grouping keys of the intermediate map output.

Page 27: Map-Reduce Design Patterns - GitHub Pages

Min/Max/Count

259

Page 28: Map-Reduce Design Patterns - GitHub Pages

Average

260

Page 29: Map-Reduce Design Patterns - GitHub Pages

Flex Your Brain!

• How will you compute the median?

261

Refer to Chapter 2 of MR Design Patterns Book.

Page 30: Map-Reduce Design Patterns - GitHub Pages

Counting Mappers

262

Global counters belong to job-tracker. Use responsibly.

Page 31: Map-Reduce Design Patterns - GitHub Pages

Filtering

263

No Reducer

Required.

Page 32: Map-Reduce Design Patterns - GitHub Pages

Filter Example

264

Page 33: Map-Reduce Design Patterns - GitHub Pages

Top 10 Pattern

• How will you determine the top 10 numbers in petabytes of numbers?

265

Page 34: Map-Reduce Design Patterns - GitHub Pages

Structure to Hierarchy

266

How to store this in RDBMS?