Unlocking Energy - USENIX

Vasileios Trigonakis | 06.2016

Unlocking Energy

Babak Falsafi, Rachid Guerraoui, Javier Picorel, Vasileios Trigonakis

Conference Dilemma: To Sleep or Not to Sleep?

For the next 25 minutes

Do not sleep for such a short duration

Motivation

sleep

busy wait

locking

Energy Efficiency Through Synchronization (Locking)

Why lock-based synchronization?

1. Concurrent systems synchronize with locks

2. Locks are well-defined abstractions

lock() / unlock()

3. Locking strategies affect power consumption

Motivation

busy lock

lock(): wait! (waste time and energy)

[Figure: power and energy efficiency of sleeping vs. spinning, normalized to sleeping (y-axis 0 to 2)]

Lock Waiting Techniques

Locking is a good candidate for reducing energy consumption

Motivation

busy lock

sleeping

(blocking)

busy waiting

(spinning)

Java CopyOnWriteArrayList

Energy Efficiency = Throughput / Power
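The metric can be made concrete with a small calculation. The sketch below assumes throughput is measured in operations per second and power in watts, so the ratio comes out in operations per joule. The function name and the sample numbers in the usage note are illustrative and are not taken from the talk.

```c
#include <assert.h>

/* Energy efficiency as defined on the slide: throughput / power.
 * Assumed units: throughput in ops/s, power in watts (joules/s),
 * so the result is operations per joule. */
static double energy_efficiency(double throughput_ops_per_s, double power_watts)
{
    assert(power_watts > 0.0); /* a running machine always draws some power */
    return throughput_ops_per_s / power_watts;
}
```

For example, a hypothetical lock delivering 10 million ops/s at 100 W has the same energy efficiency as one delivering 5 million ops/s at 50 W.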

Q. Is sleeping energy-friendly?

Energy Efficiency By Improving Locking

1. Concurrent systems synchronize with locks

2. Locks are well-defined abstractions

3. Locking strategies affect power consumption

4. POLY: energy efficiency and throughput go hand in hand in the context of lock algorithms

Outline

• Motivation

• Improving the energy efficiency of systems

Target platform: 2-socket Intel Ivy Bridge

20 cores, 40 hyper-threads

busy lock

busy waiting hurts power consumption

sleeping saves power, but hurts throughput

spin-then-sleep “cleverly”

Observations

Power Consumption of Waiting

Busy waiting is very power hungry

1 lock, never released

Waiting

1. Sleeping power-friendly

2. Busy waiting

while (*lock != FREE) {}
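The loop on the slide is shorthand. In real C the lock word needs atomic access, otherwise the compiler may read it once and spin forever. A minimal sketch with C11 atomics follows; `FREE`, `BUSY`, and the function names are assumptions for illustration and are not code from the talk.

```c
#include <assert.h>
#include <stdatomic.h>

enum { FREE = 0, BUSY = 1 };

/* Busy wait: keep re-reading the lock word until it is FREE.
 * The core stays fully active the whole time. */
static void wait_until_free(atomic_int *lock)
{
    while (atomic_load_explicit(lock, memory_order_acquire) != FREE) {
        /* empty body, as on the slide */
    }
}

/* Test-and-test-and-set acquire built on the loop above. */
static void spin_lock(atomic_int *lock)
{
    for (;;) {
        wait_until_free(lock);
        int expected = FREE;
        if (atomic_compare_exchange_weak_explicit(lock, &expected, BUSY,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return; /* we changed FREE -> BUSY, so we own the lock */
    }
}

static void spin_unlock(atomic_int *lock)
{
    /* Releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(lock, memory_order_relaxed) == BUSY);
    atomic_store_explicit(lock, FREE, memory_order_release);
}
```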

[Figure: power (Watts, 0 to 160) vs. # threads (10 to 40); series: sleeping, busy waiting]

Observations

Reducing Power of Busy Waiting

Power consumption of busy waiting cannot be practically reduced

1 lock, never released

Busy Waiting

1. empty: 1 iteration / cycle

2. Intel docs: use pause

3. pause not ideal

4. mfence > pause

5. Still, mfence ~5% better

6. DVFS and mwait are not practical (details in the paper)

while (*lock != FREE) { ?? }
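The `??` in the loop body stands for the candidates listed above. The sketch below shows how the empty, pause, and mfence variants might be written with the x86 intrinsics `_mm_pause()` and `_mm_mfence()`. The enum, the function names, and the iteration cap are assumptions added for illustration, and on non-x86 targets the sketch falls back to a generic C11 fence. DVFS and mwait are left out because they need privileged or platform-specific support.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#if defined(__x86_64__) || defined(__SSE2__)
#include <immintrin.h>
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

enum wait_body { WAIT_EMPTY, WAIT_PAUSE, WAIT_MFENCE };

/* One iteration of the chosen loop body. */
static inline void wait_step(enum wait_body body)
{
#if SKETCH_X86
    if (body == WAIT_PAUSE)
        _mm_pause();   /* spin-loop hint recommended by the Intel docs */
    else if (body == WAIT_MFENCE)
        _mm_mfence();  /* full memory fence */
#else
    if (body != WAIT_EMPTY)
        atomic_thread_fence(memory_order_seq_cst); /* portable stand-in */
#endif
}

/* Spin while *lock is non-zero, for at most max_iters iterations.
 * Returns the number of iterations spent waiting. */
static unsigned long spin_wait(atomic_int *lock, enum wait_body body,
                               unsigned long max_iters)
{
    assert(lock != NULL);
    unsigned long i = 0;
    while (i < max_iters &&
           atomic_load_explicit(lock, memory_order_acquire) != 0) {
        wait_step(body);
        i++;
    }
    return i;
}
```

The iteration cap exists only so the sketch terminates in a test. The experiment on the slide spins on a lock that is never released.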

[Figure: power (Watts, 0 to 200) vs. # threads (10 to 40); series: empty, pause, mfence, DVFS, mwait]

[Figure: normalized throughput (0 to 1) of sleeping, unfair spin, and fair spin on MySQL In-Memory]

Sleeping Might Be Necessary (For Two Reasons)

Sleeping can reduce power consumption (and more)

Sleeping

1. Power consumption of waiting

2. Locks with multiprogramming (# threads > hw contexts)

[Figure: power (Watts, 0 to 150) vs. # threads (10 to 40); series: sleeping, busy waiting]

Observations

Latency: The Price of Sleeping

Frequent sleep/wake-up calls reduce throughput without saving energy

2 threads invoke futex

1 sleeps, 1 wakes up

Sleeping

1. Sleep call: release the context

2. Wake-up call: to hand over the lock

3. Turnaround latency ≈ lock handover latency
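The sleep and wake-up calls measured here go through the Linux futex system call. The sketch below wraps the two operations; the wrapper names are hypothetical, while `SYS_futex`, `FUTEX_WAIT_PRIVATE`, and `FUTEX_WAKE_PRIVATE` are the standard Linux interface. It is Linux-only and shows only the calls themselves, not the measurement harness used in the talk.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleep call: block while *addr still equals `expected`.
 * Returns 0 when woken, or -1 with errno set (EAGAIN if *addr
 * already differs from `expected`, so the caller never sleeps
 * on a stale value). */
static long futex_sleep(uint32_t *addr, uint32_t expected)
{
    assert(addr != NULL);
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected,
                   NULL, NULL, 0);
}

/* Wake-up call: wake at most n threads sleeping on addr.
 * Returns the number of threads actually woken. */
static long futex_wake(uint32_t *addr, int n)
{
    assert(addr != NULL);
    return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, n,
                   NULL, NULL, 0);
}
```

Both calls enter the kernel, and that kernel entry is where the latencies on this slide come from: the waker pays for the wake-up call, and the sleeper needs the full turnaround before it can run again.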

[Figure: latency (cycles, 0 to 8000) of the sleep call, the wake-up call, and the turnaround; annotations: large critical section, small critical section]

Reducing Fairness: Sleeping for Long Durations

Trade fairness for energy efficiency

Workload: passing a “token” from thread to thread

Spin-then-sleep

[Figure: Power Consumption, power (Watts, 0 to 150) vs. # threads (1 to 35); series: sleep, spin, unfair]

[Figure: Communication Throughput, throughput (millions of ops/s, 0 to 15) vs. # threads (1 to 35); series: sleep, spin, unfair]

unfair: 1000:1 spin-to-sleep ratio (while 2 threads spin, the rest sleep)

How can we use these results in designing locks?

Design locks

Problems of Pthread MUTEX lock

Pthread MUTEX does not take into account the sleep overheads

MUTEX

lock()
  For up to 100 attempts, spin with pause
  If still busy, sleep

unlock()
  Release in user space
  Wake up a thread

1. Spins less than sleep latencies

2. Spins with pause

3. Always wakes up a thread

MUTEXEE: An Optimized MUTEX Lock

MUTEXEE

lock()
  MUTEX:   for up to 100 attempts, spin with pause; if still busy, sleep
  MUTEXEE: for up to ~8000 cycles, spin with mfence; if still busy, sleep

unlock()
  MUTEX:   release in user space (lock->locked = 0); wake up a thread
  MUTEXEE: release in user space (lock->locked = 0); wait in user space (~300 cycles); wake up a thread
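The spin-then-sleep structure compared on this slide can be sketched as a small futex-based lock. This is an illustration under stated assumptions, not the MUTEXEE implementation from LOCKIN: the three-state lock word (0 free, 1 held, 2 held with possible sleepers) is a common futex mutex design swapped in for simplicity, the budgets count loop iterations rather than cycles, and a C11 fence stands in for pause or mfence. The sketch also assumes that the user-space wait in unlock() exists to let a spinning thread take the lock so the wake-up call can be skipped, which the slide does not spell out. All names are hypothetical, and the code is Linux-only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef struct {
    _Atomic uint32_t state;     /* 0 free, 1 held, 2 held + maybe sleepers */
    unsigned spin_iters;        /* lock(): spin budget before sleeping */
    unsigned unlock_wait_iters; /* unlock(): user-space wait before waking */
} sketch_mutex;

/* Budgets loosely mirror the slide; here they are iterations, not cycles. */
#define SKETCH_MUTEX_LIKE   { 0, 100, 0 }
#define SKETCH_MUTEXEE_LIKE { 0, 8000, 300 }

static void relax(void) { atomic_thread_fence(memory_order_seq_cst); }

static int try_acquire(sketch_mutex *m)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(&m->state, &expected, 1);
}

static void sketch_lock(sketch_mutex *m)
{
    assert(m != NULL);
    /* Phase 1: spin in user space for a bounded budget. */
    for (unsigned i = 0; i <= m->spin_iters; i++) {
        if (try_acquire(m))
            return;
        relax();
    }
    /* Phase 2: mark the lock contended and sleep until it is handed over. */
    while (atomic_exchange(&m->state, 2) != 0)
        syscall(SYS_futex, &m->state, FUTEX_WAIT_PRIVATE, 2, NULL, NULL, 0);
}

static void sketch_unlock(sketch_mutex *m)
{
    assert(m != NULL);
    /* Release in user space; done if nobody can be asleep. */
    if (atomic_exchange(&m->state, 0) != 2)
        return;
    /* Wait briefly: if another thread takes the lock, skip the wake-up,
     * but leave the word at 2 so that holder wakes the sleepers later. */
    for (unsigned i = 0; i < m->unlock_wait_iters; i++) {
        uint32_t s = atomic_load(&m->state);
        if (s == 2)
            return;
        if (s == 1 && atomic_compare_exchange_strong(&m->state, &s, 2))
            return;
        relax();
    }
    syscall(SYS_futex, &m->state, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}
```

A lock would be declared as `sketch_mutex m = SKETCH_MUTEXEE_LIKE;` and used with `sketch_lock(&m)` and `sketch_unlock(&m)`. With the `SKETCH_MUTEX_LIKE` budgets the unlock path always issues the wake-up call when sleepers may exist, matching the third MUTEX problem listed earlier.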

Performance of MUTEXEE over MUTEX

MUTEXEE fixes the problematic cases of MUTEX

One lock

MUTEXEE

[Figure: two panels, Throughput and Energy Efficiency, of MUTEXEE over MUTEX; axes: # threads vs. critical section size (cycles, 0 to 8000)]

Fairness: results and analysis in the paper

Evaluation: Improving Energy Efficiency of Systems Through Locks

Six modern software systems

– Overload pthread mutex

Average Throughput and Energy Efficiency

Locking can indeed be used to improve the energy efficiency of systems

Systems

KyotoCabinet

[Figure: average throughput and energy efficiency of TICKET and MUTEXEE, normalized to MUTEX (y-axis 0 to 1.5); labeled values: 1.26 and 1.28]

Results

1. Benefits: Avoid sleeping

2. Sleeping is sometimes necessary

3. Throughput-driven benefits

4. MUTEXEE >> MUTEX

Concluding Remarks

• An analysis of the energy efficiency of lock-based synchronization

– Energy efficiency of locks goes hand in hand with throughput

– MUTEXEE: an optimized MUTEX lock

• LOCKIN: https://github.com/LPD-EPFL/lockin

THANK YOU! QUESTIONS?

Conclusions

Locking can be used to improve the energy efficiency of systems