CSC 660: Advanced Operating Systems
Concurrent Programming

Page 1: CSC 660: Advanced OS Concurrent Programming

Page 2: Topics

1. Multi-core processors

2. Types of concurrency

3. Threads

4. MapReduce

Page 3: Multi-Core Processors

• Building multiple processors on one die.
• Multi-Core vs SMP:
  – Cheaper.
  – May share cache/bus.
  – Faster communication.

Intel Core 2 Duo Architecture

Page 4: Why Multi-Core?

Page 5: Amdahl's Law

The increase in performance of a computation due to an improvement of a proportion P of the computation by a factor S is given by

    Speedup = 1 / ((1 - P) + P/S)

Graph shows speedup for computations where 10%, 20%, 50%, or 100% of the computation is parallelizable.
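
A quick worked example (values assumed here, not from the slides): if P = 0.5 and the parallel portion runs on 4 cores (S = 4), then Speedup = 1 / (0.5 + 0.5/4) = 1 / 0.625 = 1.6. Even as S grows without bound, the speedup is capped at 1 / (1 - P) = 2, which is why the curves in the graph flatten out.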

Page 6: Concurrency vs Parallelism

Concurrency: Logically simultaneous processing. Does not require multiple processors.

Parallelism: Physically simultaneous processing. Does require multiple processors.

Page 7: Parallel Programming Paradigms

Shared Memory

• Communicate by altering shared variables.

• Requires synchronization.

• Faster.

• Harder to reason about.

Message Passing

• Communicate by exchanging messages.

• Doesn’t require synchronization.

• Safer.

• Easier to reason about.

Page 8: Shared Memory

Shared memory concurrency is threads + synchronization.

Synchronization types:
• Locks
• Semaphores
• Monitors
• Transactional Memory
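
As a small illustration of one of these primitives (my example, not from the original slides), the sketch below uses a POSIX counting semaphore to limit how many threads may be in a critical region at once; the thread count and slot count are arbitrary.

/* Sketch: POSIX counting semaphore limiting concurrent access (illustrative only). */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t slots;                  /* counting semaphore: number of free slots */

static void *worker(void *arg) {
    long id = (long)arg;
    sem_wait(&slots);                /* block until a slot is free */
    printf("thread %ld in critical region\n", id);
    sem_post(&slots);                /* release the slot */
    return NULL;
}

int main(void) {
    pthread_t t[8];
    sem_init(&slots, 0, 2);          /* at most 2 threads inside at once */
    for (long i = 0; i < 8; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 8; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&slots);
    return 0;
}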

Page 9: Threads

• Multiple paths of execution running in a single shared memory space.
  – Threads have local data, e.g. the stack.
  – Code and global data are shared.
  – Concurrent accesses must be synchronized.

• Types of threads:
  – pthreads (POSIX threads)
  – Java threads
  – .NET threads
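
A minimal pthreads sketch (an assumed example, not from the slides): two threads share the global counter below, so each increment is protected by a mutex; without the lock, updates could be lost to a race. The loop variable lives on each thread's own stack.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                          /* global data: shared by all threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg) {
    for (int i = 0; i < 100000; i++) {            /* i is thread-local (on this thread's stack) */
        pthread_mutex_lock(&lock);
        counter++;                                /* shared: must be synchronized */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, increment, NULL);
    pthread_create(&b, NULL, increment, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld\n", counter);           /* 200000 with the mutex in place */
    return 0;
}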

Page 10: Thread Implementation

Kernel threads
• The kernel supports and schedules threads.
• Blocking I/O blocks only one thread.

User threads (green threads)
• Co-operatively scheduled within a single kernel thread.
• Lightweight (faster to start and switch than kernel threads).
• Need to use non-blocking I/O.

Page 11: Why are threads hard?

Synchronization
• Must coordinate access to shared data with locks.

Deadlock
• Must always acquire locks in the same order (see the lock-ordering sketch below).
• Must know every path that accesses data and what other data each path requires.

Breaks modularity
• See deadlock.

Debugging
• Data and time dependencies.
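
The sketch below (illustrative, not from the original slides) shows the standard lock-ordering discipline: both threads need the same two mutexes, and every code path takes lock_a before lock_b. If one thread took them in the opposite order, each thread could end up holding one lock while waiting for the other, and the program would deadlock.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int balance_a = 100, balance_b = 100;      /* two accounts, each guarded by a lock */

static void transfer(int *from, int *to, int amount) {
    /* Every caller acquires lock_a before lock_b, regardless of transfer direction. */
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    *from -= amount;
    *to   += amount;
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
}

static void *t1(void *arg) { transfer(&balance_a, &balance_b, 10); return NULL; }
static void *t2(void *arg) { transfer(&balance_b, &balance_a, 20); return NULL; }

int main(void) {
    pthread_t x, y;
    pthread_create(&x, NULL, t1, NULL);
    pthread_create(&y, NULL, t2, NULL);
    pthread_join(x, NULL);
    pthread_join(y, NULL);
    printf("a = %d, b = %d\n", balance_a, balance_b);
    return 0;
}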

Page 12: Why are threads hard?

Performance
• Low concurrency if locks are coarse-grained.
• Complex code if locks are fine-grained.
• Performance cost of fine-grained locks.

Support
• Different OSes use different thread libraries.
• Many libraries are not thread-safe.
• Few debugging tools.

Page 13: Software Transactional Memory

Memory transactions
– A set of memory operations that executes atomically.
– Illusion of serial execution, like database transactions.
– Easy to code, like coarse-grained locking.
– Scalability comparable to fine-grained locking, without the deadlock issues.

Language support
– Haskell
– Fortress

Page 14: Synchronized vs Atomic Code

Page 15: Implementing STM

Data versioning
– A transaction works on a new copy of the data.
– The copy becomes visible only if the transaction succeeds.

Conflict detection
– A conflict occurs when two transactions work with the same data and at least one of them writes.
– Track read and write sets for each transaction.
– Pessimistic detection: checks during the transaction.
– Optimistic detection: checks after the transaction.
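
A toy illustration of the conflict rule above (assumed code, not the slides' implementation): each transaction records the addresses it read and wrote, and two transactions conflict when their footprints overlap and at least one of the overlapping accesses is a write.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_ACCESSES 16

/* Simplified transaction log: just the addresses read and written. */
struct txn {
    const void *reads[MAX_ACCESSES];  size_t nreads;
    const void *writes[MAX_ACCESSES]; size_t nwrites;
};

static bool contains(const void *const *set, size_t n, const void *addr) {
    for (size_t i = 0; i < n; i++)
        if (set[i] == addr) return true;
    return false;
}

/* Conflict: some address written by one transaction is read or written by the other. */
static bool conflicts(const struct txn *t1, const struct txn *t2) {
    for (size_t i = 0; i < t1->nwrites; i++)
        if (contains(t2->reads, t2->nreads, t1->writes[i]) ||
            contains(t2->writes, t2->nwrites, t1->writes[i]))
            return true;
    for (size_t i = 0; i < t2->nwrites; i++)
        if (contains(t1->reads, t1->nreads, t2->writes[i]))
            return true;
    return false;
}

int main(void) {
    int x = 0, y = 0;
    struct txn a = { .reads = { &x }, .nreads = 1, .writes = { &y }, .nwrites = 1 };
    struct txn b = { .reads = { &y }, .nreads = 1, .writes = { &x }, .nwrites = 1 };
    printf("conflict: %s\n", conflicts(&a, &b) ? "yes" : "no");   /* yes: a writes y, b reads y */
    return 0;
}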

Page 16: Message Passing

• Threads have no shared state.
• No need for synchronization.
• Communication via messages.

Message-passing languages
• Erlang
• E
• Oz
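
A minimal message-passing sketch in C (my example; the slide's point is language-level support as in Erlang): the two threads share no application data and communicate only by sending integers through a pipe, which serves as the message channel.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int channel[2];                    /* pipe: channel[0] = read end, channel[1] = write end */

static void *producer(void *arg) {
    for (int msg = 1; msg <= 5; msg++)
        write(channel[1], &msg, sizeof msg);          /* send a message */
    close(channel[1]);                                /* close the channel when done */
    return NULL;
}

static void *consumer(void *arg) {
    int msg;
    while (read(channel[0], &msg, sizeof msg) == sizeof msg)   /* receive until channel closes */
        printf("received %d\n", msg);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pipe(channel);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}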

Page 17: Erlang

Features
– No mutable state.
– Message-passing concurrency.
– Green threads (1000s of parallel connections).
– Fault tolerance (99.999+% availability).

Applications
– Telecommunications
– Air traffic control
– IM (ejabberd at jabber.org)

Page 18: Yaws vs Apache: Throughput vs Load

Page 19: MapReduce

Map
  Input:  (k1, v1)
  Output: list(k2, v2)

Reduce
  Input:  (k2, list(v2))
  Output: list(v2)

Similar to the map and reduce functions in Lisp.

Page 20: Example: wc

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));

Page 21: MapReduce Clusters

• Cluster consists of 1000s of machines.

• Machines are dual-processor x86 boxes running Linux with 2-4 GB of memory.

• Networking: 100 Mbps to 1 Gbps per machine.

• Storage: Local IDE hard disks, GoogleFS

• Users submit job to scheduling system.

Page 22: Execution Overview

1. MapReduce library in user program splits input files into M pieces of 16-64MB each.

2. The master copy and the worker copies of the program start. The master has M map tasks and R reduce tasks to assign to idle workers (M >> number of workers).

3. Map worker reads contents of input split, parses key/value pairs, and passes them to user-defined Map function.

4. Map workers periodically write their output pairs to local disk, partitioned into R regions (a partitioning sketch follows below). The locations of these pairs are passed back to the master.
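
The default partitioning function described in the Dean and Ghemawat paper (reference 4) is hash(key) mod R. The sketch below is illustrative C; the hash function is an arbitrary choice here, not the one Google used.

#include <stdio.h>

/* Simple string hash (djb2); the real implementation's hash is unspecified here. */
static unsigned long hash(const char *key) {
    unsigned long h = 5381;
    for (; *key; key++)
        h = h * 33 + (unsigned char)*key;
    return h;
}

/* Assign an intermediate key to one of R reduce regions: hash(key) mod R. */
static unsigned partition(const char *key, unsigned R) {
    return (unsigned)(hash(key) % R);
}

int main(void) {
    const char *keys[] = { "the", "quick", "brown", "fox" };
    unsigned R = 4;                                    /* number of reduce tasks */
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
        printf("%s -> region %u\n", keys[i], partition(keys[i], R));
    return 0;
}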

Page 23: Execution Overview

5. When a reduce worker is notified of these locations, it uses RPCs to read the map output pairs from the map workers' local disks, then sorts the data by key.

6. Reduce worker iterates over intermediate data. For each unique intermediate key, it passes key and set of intermediate values to user-defined Reduce function. Output of Reduce function sent to final output file.

7. When all Map and Reduce tasks are complete, the master wakes up the user program and the MapReduce call returns to the user code.

Page 24: Execution Overview

Page 25: Fault Tolerance

Worker failure
– The master pings workers periodically.
– If there is no response, the master marks the worker as failed.
– The master marks the worker's task as idle and reassigns it.

Master failure
– The current implementation fails if the master dies.
– The master writes periodic checkpoints.
– If it dies, the computation restarts from the last checkpoint.

Page 26: Backup Tasks

Stragglers
– Workers that take an unusually long time to complete their tasks.
– Often due to a failing disk that is slow.

Backup tasks
– When the MapReduce job is almost complete, the master schedules backup executions of the remaining tasks.
– A task is marked completed when either the original worker or the backup worker completes it.
– Increases computational requirements slightly.
– Improves execution time noticeably.

Page 27: Google Index Build

• The crawler returns 20 TB of data.

• The indexer runs 5-10 MapReduce operations.

• The code complexity of each operation is reduced significantly from the prior indexer.

• MapReduce is efficient enough that conceptually separate operations do not need to be combined just to reduce the number of passes over the data.

Page 28: References

1. Ali-Reza Adl-Tabatabai, Christos Kozyrakis, and Bratin Saha, "Multicore Programming with Transactional Memory," ACM Queue, 4(10), http://acmqueue.com/modules.php?name=Content&pa=printer_friendly&pid=444&page=1, 2007.

2. Gregory Andrews, Foundations of Multithreaded, Parallel, and Distributed Programming, Addison-Wesley, 2000.

3. Gregory Andrews and Fred Schneider, "Concepts and Notations for Concurrent Programming," ACM Computing Surveys, 15(1), 1983.

4. Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," OSDI '04: Sixth Symposium on Operating System Design and Implementation, December 2004.

5. John Ousterhout, "Why Threads Are a Bad Idea," http://home.pacbell.net/ouster/threads.pdf, 2002.

6. Abraham Silberschatz, Peter Baer Galvin, and Greg Gagne, Operating System Concepts, 6th edition, Wiley, 2003.

7. Herb Sutter, "The Free Lunch Is Over: A Fundamental Turn Towards Concurrency," Dr. Dobb's Journal, 30(3), March 2005.