
Distributed Computing

Varun Thacker

Linux User’s Group Manipal

April 8, 2010


Outline

1. Introduction: LUG Manipal; Points To Remember
2. Distributed Computing: Technologies to be covered; Idea; Data!!; Why Distributed Computing is Hard; Why Distributed Computing is Important; Three Common Distributed Architectures
3. Distributed File System: GFS; What a Distributed File System Does; Google File System Architecture; GFS Architecture: Chunks; GFS Architecture: Master; GFS: Life of a Read; GFS: Life of a Write; GFS: Master Failure
4. MapReduce: Do We Need It?; Bad News!; MapReduce; Map Reduce Paradigm; MapReduce Paradigm; Working; Under the hood: Scheduling; Robustness
5. Hadoop: What is Hadoop; Who uses Hadoop?; Mapper; Combiners; Reducer; Some Terminology; Job Distribution
6. Contact Information
7. Attribution
8. Copying


Who are we?

Linux User’s Group Manipal

Life, Universe and FOSS!!

Believers of Knowledge Sharing

The most technologically focused “group” in the University

LUG Manipal is a non-profit “group”, alive only on voluntary work!!

http://lugmanipal.org


Points To Remember!!!

If you have problem(s), don’t hesitate to ask.

Slides are based on documentation, so discussions are really important; the slides are for later reference!!

Please don’t consider the sessions as classes (classes are boring!!)

Speaker is just like any person sitting next to you

Documentation is really important

Google is your friend

If you have questions after this workshop, mail me or come to LUG Manipal’s forums:

http://forums.lugmanipal.org


Distributed Computing


Technologies to be covered

Distributed computing refers to the use of distributed systems to solve computational problems.

A distributed system consists of multiple computers that communicate through a network.

MapReduce is a framework which implements the idea of distributed computing.

GFS is the distributed file system on which distributed programs store and process data at Google. Its free implementation is HDFS.

Hadoop is an open-source framework written in Java which implements the MapReduce technology.


Idea

While the storage capacities of hard drives have increased massively over the years, access speeds, the rate at which data can be read from a drive, have not kept up.

One-terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.

The obvious way to reduce the time is to read from multiple disks at once. Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in under two minutes.
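
A quick back-of-the-envelope check of those figures, as a small Python sketch (the 1 TB drive size and 100 MB/s transfer speed are the assumed values from the slide):

    # Back-of-the-envelope check of the read times quoted above (assumed values).
    TB = 10**12                    # 1 terabyte in bytes
    MB = 10**6                     # 1 megabyte in bytes

    data = 1 * TB                  # all the data on one drive
    speed = 100 * MB               # ~100 MB/s sequential read

    sequential = data / speed              # one drive reads everything
    parallel = (data / 100) / speed        # 100 drives, each holding 1/100th

    print(f"one drive : {sequential / 3600:.2f} hours")    # ~2.78 hours
    print(f"100 drives: {parallel / 60:.2f} minutes")      # ~1.67 minutes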


Data

We live in the data age. An IDC estimate put the size of the “digital universe” at 0.18 zettabytes in 2006.

And by 2011 there will be a tenfold growth to 1.8 zettabytes.

1 zettabyte is one million petabytes, or one billion terabytes.

The New York Stock Exchange generates about one terabyte of new trade data per day.

Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.

The Large Hadron Collider near Geneva produces about 15 petabytes of data per year.


Why Distributed Computing is Hard

Computers crash.

Network links crash.

Talking is slow (even Ethernet has around 300 microseconds of latency, during which time your 2 GHz PC can do 600,000 cycles).

Bandwidth is finite.

Internet scale: the computers and network are heterogeneous, untrustworthy, and subject to change at any time.


Why Distributed Computing is Important

Can be more reliable.

Can be faster.

Can be cheaper (a $30 million Cray versus 100 PCs at $1000 each).


Three Common Distributed Architectures

Hope: have N computers do separate pieces of work. Speed-up < N. Probability of failure = 1 − (1 − p)^N ≈ Np (p = probability of an individual crash).

Replication: have N computers do the same thing. Speed-up < 1. Probability of failure = p^N.

Master-servant: have 1 computer hand out pieces of work to N − 1 servants, and re-hand out pieces of work if servants fail. Speed-up < N − 1. Probability of failure ≈ p.
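
A quick numerical check of these failure probabilities (a Python sketch; the values p = 0.01 and N = 100 are assumptions, not from the slide):

    # Numerical check of the three architectures above, assuming
    # p = 0.01 (chance an individual machine crashes) and N = 100 machines.
    p, N = 0.01, 100

    hope = 1 - (1 - p) ** N       # job fails if any one of the N machines fails
    replication = p ** N          # job fails only if all N replicas fail
    master_servant = p            # roughly: the master is the single point of failure

    print(f"hope          : {hope:.3f}")            # ~0.63 (Np overestimates once Np is no longer small)
    print(f"replication   : {replication:.1e}")     # vanishingly small
    print(f"master-servant: {master_servant:.3f}")  # ~0.01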


GFS


What a Distributed File System Does

1. Usual file system stuff: create, read, move & find files.

2. Allow distributed access to files.

3. Files themselves are stored in a distributed manner.

If you just do #1 and #2, you are a network file system.

To do #3, it’s a good idea to also provide fault tolerance.


GFS Architecture


GFS Architecture: Chunks

Files are divided into 64 MB chunks (the last chunk of a file may be smaller).

Each chunk is identified by a unique 64-bit id.

Chunks are stored as regular files on local disks.

By default, each chunk is stored thrice, preferably on more than one rack.

To protect data integrity, each 64 KB block gets a 32-bit checksum that is checked on all reads.

When idle, a chunkserver scans inactive chunks for corruption.
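
As a rough illustration of the chunk and checksum bookkeeping above (a Python sketch; the 1 GB file size is an assumed example):

    # Rough illustration of the chunk/checksum bookkeeping above (assumed file size).
    import math

    CHUNK = 64 * 1024 * 1024       # 64 MB chunks
    BLOCK = 64 * 1024              # 64 KB checksum blocks

    file_size = 1_000_000_000      # a hypothetical 1 GB file

    chunks = math.ceil(file_size / CHUNK)     # last chunk may be smaller
    checksums_per_chunk = CHUNK // BLOCK      # 1024 32-bit checksums per full chunk
    replicas = 3 * chunks                     # each chunk stored thrice by default

    print(chunks, checksums_per_chunk, replicas)   # 15 1024 45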


GFS Architecture: Master

Stores all metadata (namespace, access control).

Stores (file → chunks) and (chunk → locations) mappings.

Clients get chunk locations for a file from the master, and then talk directly to the chunkservers for the data.

Advantage of a single master: simplicity.

Disadvantages of a single master:

Metadata operations are bottlenecked.

The maximum number of files is limited by the master’s memory.
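
A toy sketch of the two mappings the master keeps (illustrative Python only; the file and server names are made up):

    # Toy sketch of the master's two in-memory mappings (names are made up).
    file_to_chunks = {
        "/logs/2010-04-08": ["chunk-0001", "chunk-0002", "chunk-0003"],
    }
    chunk_to_locations = {
        "chunk-0001": ["chunkserver-a", "chunkserver-d", "chunkserver-k"],  # 3 replicas
        "chunk-0002": ["chunkserver-b", "chunkserver-e", "chunkserver-k"],
        "chunk-0003": ["chunkserver-a", "chunkserver-c", "chunkserver-f"],
    }

    # The client resolves a file to chunk locations once via the master,
    # then reads the data directly from the chunkservers.
    for chunk in file_to_chunks["/logs/2010-04-08"]:
        print(chunk, "->", chunk_to_locations[chunk])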


GFS: Life of a Read

Client program asks for 1 GB of file “A” starting at the 200 millionth byte.

Client GFS library asks the master for chunks 3, ..., 16387 of file “A”.

Master responds with all of the locations of chunks 2, ..., 20000 of file “A”.

Client caches all of these locations (with their cache time-outs).

Client reads chunk 2 from the closest location.

Client reads chunk 3 from the closest location.

...
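
A minimal sketch of that client-side read path (the helpers ask_master and read_from are hypothetical stand-ins for the GFS client library, not real API calls):

    # Minimal sketch of the client-side read path.
    # ask_master and read_from are hypothetical helpers, not the real GFS client API.
    CHUNK = 64 * 1024 * 1024

    def gfs_read(filename, offset, length, ask_master, read_from):
        first = offset // CHUNK
        last = (offset + length - 1) // CHUNK
        # One round trip to the master: chunk index -> list of replica locations.
        locations = ask_master(filename, first, last)    # {index: [location, ...]}
        data = b""
        for index in range(first, last + 1):
            closest = locations[index][0]                # pick the closest replica
            data += read_from(closest, filename, index)  # talk to the chunkserver directly
        # Trim to the exact byte range the caller asked for.
        skip = offset - first * CHUNK
        return data[skip : skip + length]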


GFS: Life of a Write

Client gets locations of chunk replicas as before.

For each chunk, the client sends the write data to the nearest replica.

This replica sends the data to the nearest replica to it that has not yet received the data.

When all of the replicas have received the data, then it is safe for them to actually write it.

Tricky details:

The master hands out a short-term (about 1 minute) lease for a particular replica to be the primary one.

This primary replica assigns a serial number to each mutation so that every replica performs the mutations in the same order.
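
A toy simulation of that write path (illustrative Python only, not the real GFS protocol or API):

    # Toy simulation of the write path above (illustrative only, not the GFS protocol).
    class Replica:
        def __init__(self, name):
            self.name = name
            self.buffered = None        # data pushed but not yet written
            self.log = []               # mutations applied, in order

        def receive(self, data, chain):
            self.buffered = data
            if chain:                   # forward along the chain of nearest replicas
                chain[0].receive(data, chain[1:])

        def apply(self, serial):
            self.log.append((serial, self.buffered))

    replicas = [Replica("near"), Replica("mid"), Replica("far")]
    primary, secondaries = replicas[0], replicas[1:]

    # 1. The client pushes the data to the nearest replica; each replica forwards it on.
    replicas[0].receive(b"record-42", replicas[1:])

    # 2. Once every replica holds the data, the primary assigns a serial number and
    #    all replicas apply the mutation in that order.
    serial = 1
    for replica in [primary, *secondaries]:
        replica.apply(serial)

    print([replica.log for replica in replicas])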


GFS: Master Failure

The master stores its state via periodic checkpoints and a mutation log.

Both are replicated.

Master election and notification are implemented using an external lock server.

The new master restores its state from the checkpoint and the log.


MapReduce


Do We Need It?

Yes: otherwise some problems are too big.

Example: 20+ billion web pages × 20 KB = 400+ terabytes.

One computer can read 30-35 MB/sec from disk.

That is about four months just to read the web.

Same problem with 1000 machines: < 3 hours.
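
The arithmetic behind those figures, as a quick Python sketch (20 KB per page and 35 MB/s are the assumed values from the slide):

    # Back-of-the-envelope arithmetic behind the slide above (assumed values).
    pages = 20e9                  # 20+ billion web pages
    page_size = 20e3              # ~20 KB per page
    total = pages * page_size     # ~4e14 bytes = 400+ TB

    read_rate = 35e6              # ~30-35 MB/s from a single disk
    one_machine = total / read_rate
    thousand_machines = one_machine / 1000

    print(f"total data    : {total / 1e12:.0f} TB")                    # ~400 TB
    print(f"one machine   : {one_machine / (86400 * 30):.1f} months")  # ~4-5 months
    print(f"1000 machines : {thousand_machines / 3600:.1f} hours")     # ~3 hours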


Bad News!

Bad news I:

communication and coordination

recovering from machine failure (all the time!)

debugging

optimization

locality

Bad news II: repeat for every problem you want to solve

Good News I and II: MapReduce and Hadoop!


MapReduce

A simple programming model that applies to many large-scale computing problems.

Hide the messy details in the MapReduce runtime library:

automatic parallelization

load balancing

network and disk transfer optimization

handling of machine failures

robustness

Therefore we can write application-level programs and let MapReduce insulate us from many concerns.


Map Reduce Paradigm

Read a lot of data.

Map: extract something you care about from each record.

Shuffle and Sort.

Reduce: aggregate, summarize, filter, or transform.

Write the results.


MapReduce Paradigm

Basic data type: the key-value pair (k,v).

For example, key = URL, value = HTML of the web page.

Programmer specifies two primary methods:

Map: (k, v) → <(k1, v1), (k2, v2), (k3, v3), ..., (kn, vn)>

Reduce: (k', <v'1, v'2, ..., v'n'>) → <(k', v''1), (k', v''2), ..., (k', v''n'')>

All v' with the same k' are reduced together.

(Remember the invisible “Shuffle and Sort” step.)
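
To make those types concrete, here is a tiny word-count example written as plain Python functions (a sketch of the paradigm itself, not Hadoop or Google MapReduce code):

    # Tiny word-count illustration of the Map and Reduce types above
    # (plain Python sketch of the paradigm, not Hadoop or Google MapReduce code).
    from collections import defaultdict

    def map_fn(key, value):
        # (k, v) = (document name, document text) -> [(word, 1), ...]
        return [(word, 1) for word in value.split()]

    def reduce_fn(key, values):
        # (k', [v'1, v'2, ...]) -> [(k', sum of counts)]
        return [(key, sum(values))]

    documents = {"doc1": "the quick brown fox", "doc2": "the lazy dog"}

    # Map phase: apply map_fn to every record.
    intermediate = []
    for k, v in documents.items():
        intermediate.extend(map_fn(k, v))

    # The "invisible" Shuffle and Sort: group all values by key.
    groups = defaultdict(list)
    for k, v in intermediate:
        groups[k].append(v)

    # Reduce phase: all v' with the same k' are reduced together.
    output = []
    for k in sorted(groups):
        output.extend(reduce_fn(k, groups[k]))

    print(output)   # [('brown', 1), ('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]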


Working


Under the hood: Scheduling

One master, many workers

Input data split into M map tasks (typically 64 MB in size)

Reduce phase partitioned into R reduce tasks (R = number of output files)

Tasks are assigned to workers dynamically

Master assigns each map task to a free worker

Considers locality of data to worker when assigning task

Worker reads task input (often from local disk!)

Worker produces R local files containing intermediate (k,v) pairs

Master assigns each reduce task to a free worker

Worker reads intermediate (k,v) pairs from map workers

Worker sorts & applies user’s Reduce op to produce the output

User may specify a Partition function: which intermediate keys go to which Reducer
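
A small sketch of that Partition step (the default in the MapReduce paper is hash(key) mod R; the crc32 hash here is just an illustrative, stable choice):

    # Sketch of the Partition step: hash(key) mod R decides which of the R
    # reduce tasks (and hence which output file) an intermediate key goes to.
    import zlib

    R = 4   # number of reduce tasks = number of output files

    def partition(key, num_reducers=R):
        # A stable hash (crc32 here) so the same key always lands on the same reducer.
        return zlib.crc32(key.encode("utf-8")) % num_reducers

    for key in ["the", "quick", "brown", "fox"]:
        print(key, "-> reducer", partition(key))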

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 30 / 42

Page 133: Distributed Computing

Robustness

One master, many workers.

Detect failure via periodic heartbeats.

On worker failure: re-execute that worker's completed and in-progress map tasks (completed map output lives on the failed machine's local disk, so it is lost).

Re-execute its in-progress reduce tasks (completed reduce output is already stored in the global file system).

Master assigns each map task to a free worker.

Master failure:

State is checkpointed to replicated file system.

New master recovers & continues.

Very robust: Google reports once losing 1,600 of 1,800 machines during a run, yet the job still finished fine.

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 31 / 42

Page 142: Distributed Computing

Hadoop

Hadoop

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 32 / 42

Page 143: Distributed Computing

What is Hadoop

Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license.

Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.

A Map/Reduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner.

The sorted map output is then fed as input to the reduce tasks.

The framework takes care of scheduling tasks, monitoring them and re-executing the failed tasks.

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 33 / 42

Page 148: Distributed Computing

Who uses Hadoop?

Adobe

AOL

Baidu - the leading Chinese language search engine

Cloudera, Inc - provides commercial support and professional training for Hadoop.

Facebook

Google

IBM

Twitter

Yahoo!

The New York Times, Last.fm, Hulu, LinkedIn

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 34 / 42

Page 159: Distributed Computing

Mapper

Mapper maps input key/value pairs to a set of intermediate key/value pairs.

The Hadoop Map/Reduce framework spawns one map task for each InputSplit generated by the InputFormat.

Output pairs do not need to be of the same types as input pairs.

Mapper implementations are passed the JobConf for the job.

The framework then calls the map method for each key/value pair.

Applications can use the Reporter to report progress.

All intermediate values associated with a given output key are subsequently grouped by the framework and passed to the Reducer(s) to determine the final output.

The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format.

The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.

Users can optionally specify a combiner to perform local aggregation of the intermediate outputs.
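
As a concrete illustration of these points, here is a hedged sketch of a Mapper written against the old org.apache.hadoop.mapred API named above (JobConf, OutputCollector, Reporter), assuming the usual word-count job; the class name is illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    // Called once per record of the InputSplit: key = byte offset in the file,
    // value = one line of text. Note the output types (Text, IntWritable)
    // differ from the input types (LongWritable, Text), as allowed above.
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            output.collect(word, ONE); // emit intermediate (word, 1) pair
        }
    }
}
```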

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 35 / 42

Page 170: Distributed Computing

Combiners

When the map operation outputs its pairs, they are already available in memory.

If a combiner is used, the map key-value pairs are not immediately written to the output.

They are collected in lists, one list per key.

When a certain number of key-value pairs have been written, this buffer is flushed by passing all the values of each key to the combiner's reduce method and outputting the key-value pairs of the combine operation as if they were created by the original map operation.

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 36 / 42

Page 174: Distributed Computing

Reducer

Reducer reduces a set of intermediate values which share a key to a smaller set of values.

Reducer implementations are passed the JobConf for the job.

The framework then calls the reduce(WritableComparable, Iterator, OutputCollector, Reporter) method for each <key, (list of values)> pair in the grouped inputs.

The reducer has 3 primary phases:

Shuffle: Input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers via HTTP.

Sort: The framework groups Reducer inputs by key (since different mappers may have output the same key) in this stage.

Reduce: In this phase the reduce method is called for each <key, (list of values)> pair in the grouped inputs.

The generated output is a new value.
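
Continuing the word-count example, here is a hedged sketch of the matching Reducer in the old org.apache.hadoop.mapred API; it sums the counts grouped under each word. Because the operation is associative and commutative, the same class could also be registered as the combiner (conf.setCombinerClass(...)) described on the Combiners slide.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    // Called once per <key, (list of values)> pair in the grouped, sorted input.
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum)); // final (word, total count) pair
    }
}
```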

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 37 / 42

Page 182: Distributed Computing

Some Terminology

Job – A “full program”: an execution of a Mapper and a Reducer across a data set.

Task – An execution of a Mapper or a Reducer on a slice of data.

Task Attempt – A particular instance of an attempt to execute a task on a machine.

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 38 / 42

Page 185: Distributed Computing

Job Distribution

MapReduce programs are contained in a Java “jar” file plus an XML file containing serialized program configuration options.

Running a MapReduce job places these files into HDFS and notifies the TaskTrackers where to retrieve the relevant program code.

Data Distribution: Implicit in design of MapReduce!
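
Tying the pieces together, here is a hedged sketch of the driver whose submission produces exactly those two artifacts: the job jar and the serialized XML configuration that the framework distributes. It assumes the WordCountMapper and WordCountReducer classes sketched on the earlier slides and the old org.apache.hadoop.mapred API; the job name and paths are illustrative.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class); // locates the jar containing this class
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(WordCountMapper.class);
        conf.setCombinerClass(WordCountReducer.class); // optional local aggregation (see Combiners)
        conf.setReducerClass(WordCountReducer.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));  // HDFS input directory
        FileOutputFormat.setOutputPath(conf, new Path(args[1])); // HDFS output directory

        JobClient.runJob(conf); // submit the job and wait for completion
    }
}
```

Packaged into a jar, such a driver is typically launched with something like "hadoop jar wordcount.jar WordCountDriver <input> <output>", after which the framework handles the distribution described above.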

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 39 / 42

Page 188: Distributed Computing

Contact Information

Varun Thacker

[email protected]

http://varunthacker.wordpress.com

Linux User’s Group Manipal

http://lugmanipal.org

http://forums.lugmanipal.org

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 40 / 42

Page 189: Distributed Computing

Attribution

Google

Under the Creative Commons Attribution-Share Alike 2.5 Generic license.

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 41 / 42

Page 190: Distributed Computing

Copying

Creative Commons Attribution-Share Alike 2.5 India License

http://creativecommons.org/licenses/by-sa/2.5/in/

Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 42 / 42