
Upload: sawyer-jerome

Post on 14-Dec-2015


Cool Numbers

• 7 students got the lowest complexity: $O(n^{1/6})$. Well done!
  – Emily Vukovich
  – Priyank Malvania
  – Kunal Choudhary
  – Vincent Lo
  – Carl Lam
  – Jerry Zhang
  – Weicheng Cao

• Best solution: not just coding, but also thinking about the math of the problem
  – Typical of the best programs/algorithms

m4 Leaderboard

Now you’ve made the head TA angry!

Administration

• Course evaluations
  – Due April 13

• Response rate so far: 1.2%!
  – Please fill out
    • We read them all & take them very seriously
    • Completely redesigned course

What worked & what didn't?

• Free-form questions
  – 1 most interesting / important thing you learned
  – 1 least interesting thing: what would you remove?

Multi-Threading
• Program start: 1 "main" thread created
• Need more: construct threads

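#include <iostream>
#include <thread>   // C++11 feature
using namespace std;

void do_func_a () { }   // placeholder work for myThread
void do_func_b () { }   // placeholder work for main

void start_func_for_thread () {
   cout << "myThread lives!\n";
   do_func_a ();
}

int main () {
   cout << "Main thread starting.\n";
   thread myThread (start_func_for_thread);   // new thread starts running here
   cout << "Main thread continues!\n";
   do_func_b ();
   // No join (): myThread is still joinable when main returns, so its
   // destructor calls std::terminate () and the program aborts
   return 0;
}

Compile: g++ -std=c++11 -pthread main.cpp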

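Thread Timing

[Figure: timeline with one row per thread. main prints "Main thread starting", constructs myThread, prints "Main thread continues", calls do_func_b (), then exits the program. myThread prints "myThread lives!" between main's two messages, then calls do_func_a ().]

Output:
Main thread starting
myThread lives!
Main thread continues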

Another Possible Timing

[Figure: same two threads, but myThread prints "myThread lives!" only after main has printed "Main thread continues".]

Output:
Main thread starting
Main thread continues
myThread lives!

Possible Timing?

[Figure: main prints both messages, calls do_func_b (), and exits the program before myThread prints anything or reaches do_func_a ().]

Output:
Main thread starting
Main thread continues

Multi-Threading
#include <iostream>
#include <thread>   // C++11 feature
using namespace std;

void do_func_a () { }   // placeholder work for myThread
void do_func_b () { }   // placeholder work for main

void start_func_for_thread () {
   cout << "myThread lives!\n";
   do_func_a ();
}

int main () {
   cout << "Main thread starting.\n";
   thread myThread (start_func_for_thread);
   cout << "Main thread continues!\n";

   // main thread waits here until myThread finishes
   myThread.join ();
   do_func_b ();
   return 0;
}

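New Thread Timing

[Figure: main prints "Main thread starting", constructs myThread, prints "Main thread continues", then waits at join until myThread has printed "myThread lives!" and finished do_func_a (); only then does main call do_func_b () and exit the program.]

Output (one possible order; join only guarantees myThread finishes before do_func_b (), so "Main thread continues" may still print first):
Main thread starting
myThread lives!
Main thread continues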

Many Threads
#include <iostream>
#include <thread>
using namespace std;

#define NUM_THREADS 10

void call_from_thread (int tid) {
   cout << "Thread " << tid << ": Godfrey Hounsfield is #" << tid << endl;
}

int main () {
   thread myThread[NUM_THREADS];

   for (int i = 0; i < NUM_THREADS; i++)   // Launch threads
      myThread[i] = thread (call_from_thread, i);

   cout << "Main thread: Godfrey's the main man!\n";

   // Main thread waits for all threads to complete
   for (int i = 0; i < NUM_THREADS; i++)
      myThread[i].join ();

   return 0;
}

Can pass parameters to the thread start function

Output?
Thread Thread Thread Thread 12: Godfrey Hounsfield is #: Godfrey Hounsfield is #120: Godfrey Hounsfield is #0
Thread 4: Godfrey Hounsfield is #4
3: Godfrey Hounsfield is #3
Thread 6: Godfrey Hounsfield is #6
Thread 5: Godfrey Hounsfield is #5
Thread 7: Godfrey Hounsfield is #7
Thread 8: Godfrey Hounsfield is #8
Main thread: Godfrey's the main man!
Thread 9: Godfrey Hounsfield is #9

What happened?

Thread Synchronization

• Problem
  – cout is a global variable
  – All threads write to it without coordination / synchronization
  – Their output gets interleaved

• Solution
  – Only one thread should write to cout at a time
  – How?
    – mutex (mutual exclusion) variable
    – "lock" this variable
      • Only 1 thread gets the lock at a time
      • Only that thread can write to cout

Synchronizing Threads
#include <mutex>
mutex outputSync;   // Global variable: who controls cout?

void call_from_thread (int tid) {
   // Need to grab the cout control variable (mutex)
   // Will block (wait) on next line until we get the lock
   lock_guard<mutex> getTheOutput (outputSync);
   cout << "Thread " << tid << ": Godfrey Hounsfield is #" << tid << endl;
   // Lock will be automatically released by destructor
   // Then another thread can get it.
}

int main () {
   thread myThread[NUM_THREADS];

   for (int i = 0; i < NUM_THREADS; i++)   // Launch threads
      myThread[i] = thread (call_from_thread, i);

   cout << "Main thread: Godfrey's the main man!\n";
   . . .

New Output
Thread 0: Godfrey Hounsfield is #0
Thread 7: Godfrey Hounsfield is #7
Main thread: Godfrey's the main man!
Thread 9: Godfrey Hounsfield is #9
Thread 3: Godfrey Hounsfield is #3
Thread 4: Godfrey Hounsfield is #4
Thread 5: Godfrey Hounsfield is #5
Thread 6: Godfrey Hounsfield is #6
Thread 1: Godfrey Hounsfield is #1
Thread 2: Godfrey Hounsfield is #2
Thread 8: Godfrey Hounsfield is #8

Output not interleaved

Each thread grabs cout to output an entire line

Order in which threads execute still arbitrary

Find the Bug!
#include <mutex>
mutex outputSync;   // Global variable: who controls cout?

void call_from_thread (int tid) {
   lock_guard<mutex> getTheOutput (outputSync);
   cout << "Thread " << tid << ": Godfrey Hounsfield is #" << tid << endl;
}

int main () {
   thread myThread[NUM_THREADS];

   for (int i = 0; i < NUM_THREADS; i++)   // Launch threads
      myThread[i] = thread (call_from_thread, i);

   cout << "Main thread: Godfrey's the main man!\n";
   . . .

main is not respecting the mutex on cout! Its output can interleave with the helper threads' output.

Corrected Code
#include <mutex>
mutex outputSync;   // Global variable: who controls cout?

void call_from_thread (int tid) {
   lock_guard<mutex> getTheOutput (outputSync);
   cout << "Thread " << tid << ": Godfrey Hounsfield is #" << tid << endl;
}

int main () {
   thread myThread[NUM_THREADS];

   for (int i = 0; i < NUM_THREADS; i++)   // Launch threads
      myThread[i] = thread (call_from_thread, i);

   {
      lock_guard<mutex> getOutput (outputSync);
      cout << "Main thread: Godfrey's the main man!\n";
   }   // Destructor will release the lock here
   . . .

Why are these braces here?

Who is Godfrey Hounsfield?

1. A British Electrical Engineer

2. Knighted (Sir Godfrey)

3. Winner of the Nobel Prize in medicine

4. Inventor of the CAT scanner

Broad knowledge lets you innovate across disciplines!

Multithreading for Speed
• Vector dot product
  – $\text{prod} = v1 \cdot v2 = \sum_i v1[i] \times v2[i]$

#include <iostream>
#include <vector>
using namespace std;

double dotProd (const vector<float>& v1, const vector<float>& v2) {
   double prod = 0;
   for (size_t i = 0; i < v1.size (); i++) {
      prod += v1[i] * v2[i];
   }
   return (prod);
}

#define N 400000000
int main () {
   vector<float> v1 (N, 1), v2 (N, 2);   // N copies of 1 and N copies of 2
   cout << "dot product: " << dotProd (v1, v2) << endl;
}

Output: 8e8

Parallel Dot Product: Idea

[Figure: v1 is all 1s and v2 is all 2s (12 elements shown). Serial: one thread walks the whole vectors, and prod grows 0, 2, 4, 6, 8, ..., 24. Parallel on 4 CPUs: myThread[0] to myThread[3] each take one quarter of the vectors, and prod grows 0, 8, 16, 24.]

Parallel Dot Product

#define NUM_THREADS 4   // 4 CPUs on each UG machine

double dotProd (const vector<float>& v1, const vector<float>& v2) {
   double prod = 0;
   thread myThread[NUM_THREADS];

   // Send 1/4 of the vector to each of 4 threads
   for (int ithr = 0; ithr < NUM_THREADS; ithr++)
      myThread[ithr] = thread (dpHelper, ref(v1), ref(v2),
                               ithr * N/NUM_THREADS, (ithr+1) * N/NUM_THREADS,
                               ref(prod));

   // Wait for all threads to complete
   for (int ithr = 0; ithr < NUM_THREADS; ithr++)
      myThread[ithr].join ();
   return (prod);
}

ref (): thread won't pass an argument by reference unless you explicitly say so; otherwise it makes a copy per thread.

Parallel Dot Product

#define NUM_THREADS 4   // 4 CPUs on each UG machine

void dpHelper (const vector<float>& v1, const vector<float>& v2,
               int istart, int iend, double& prod) {
   // Every thread updates the same prod, with no synchronization
   for (int i = istart; i < iend; i++)
      prod += v1[i] * v2[i];
}

Output: 2e8 or 3.3e8 or 4.6e8, ??

Race condition! Many threads updating one variable.

prod += ... is a read, then a write! Some threads read an old value, add to it, and overwrite someone else's addition.

What's Really Happening

[Figure: same vectors (all 1s and all 2s); myThread[0] to myThread[3] run in parallel on 4 CPUs, all adding into the single shared prod. prod goes 0, 6, 10, 14: it should be 24, not 14!]

Unsynchronized updates: some additions are being lost/over-written.

Fix with Output Product per Thread

[Figure: myThread[0] to myThread[3] each accumulate into their own prod[0] to prod[3], each growing 0, 2, 4, 6; afterwards the main thread adds them into total, which grows 0, 6, 12, 18, 24.]

Only one thread ever reads/writes each of these variables.

No race condition, and no lost additions.

Fixed Code: Output Var per Thread

double dotProd (const vector<float>& v1, const vector<float>& v2) {
   double prod[NUM_THREADS] = {0};   // Partial dot products, must start at 0
   thread myThread[NUM_THREADS];

   for (int ithr = 0; ithr < NUM_THREADS; ithr++)
      myThread[ithr] = thread (dpHelper, ref(v1), ref(v2),
                               ithr * N/NUM_THREADS, (ithr+1) * N/NUM_THREADS,
                               ref(prod[ithr]));

   // Wait for all threads to complete
   for (int ithr = 0; ithr < NUM_THREADS; ithr++)
      myThread[ithr].join ();

   double total = 0;   // Now add up the complete total
   for (int ithr = 0; ithr < NUM_THREADS; ithr++)
      total += prod[ithr];

   return (total);
}

m4

• Do not try multi-threading until you have a good serial implementation!

• Could multi-thread:
  – Calculation of paths / path delays
  – Search for best solutions

• Need to privatize (duplicate per thread) some data structures
  – E.g. priority queue for wavefront
  – Best solution for optimizer?

• Or should you lock it?
• Or a combination?