Energy Efficient Computing in the 21c


Upload: ian-phillips

Post on 06-May-2015

Category: Technology


DESCRIPTION

The early 21c has brought the power of the computer into the hands of the general population, and though these computers consume small amounts of energy they are so numerous that their Energy Efficiency will soon become a major issue. This presentation looks at modern Computing, the ways that Energy Efficiency is currently being enhanced, and the principles behind this.

TRANSCRIPT

Page 1: Energy Efficient Computing in the 21c

Energy Efficient Computing ... In the early 21C

Abstract: With the assistance of its global partners, ARM shipped 8.7 billion CPUs in 2012; a number which continues to grow at around 20% pa. The 40B we have shipped to date outnumber the total of PCs more than 50 times; and today more than 75% of the things connected to the Internet are ARM based. The dominant nature of Computing in the 21c is very different to that of the Mainframe era. It is sobering to think that if each of those 8.7B CPUs was to dissipate just 100 mW, then it would require the output of two modern power stations to drive them; with 2.4 next year, and 3 the year after that! So Electronic Systems are also defining where the real Energy Efficient Computing issue is! But with such a small footprint it must be easy to measure and manage power optimisation? An increasing percentage of these are immensely complex systems, running significant multi-tasking and multi-threaded operating systems on platforms which include multi-processor CPU/GPU configurations, and GB of memory. Whilst their minimum dissipations are a few uW, their peak power exceeds the silicon's ability to dissipate it; so the penalty for power-unaware software design is huge. What has been done to manage this in Electronic Systems design, and can any lessons be transferred to the Classic Computing domains?
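As a rough check on the abstract's arithmetic, the sketch below multiplies the quoted 2012 shipment volume by 100 mW and applies the quoted 20% annual growth. The station size (`STATION_WATTS`, about 435 MW) is an assumption chosen only so that the 2012 figure lands on the "two power stations" in the text; it is not a number from the talk.

```python
# Back-of-envelope check of the abstract's figures.
CPUS_2012 = 8.7e9       # CPUs shipped in 2012 (from the abstract)
WATTS_PER_CPU = 0.1     # 100 mW each (from the abstract)
GROWTH = 0.20           # ~20% per annum (from the abstract)
STATION_WATTS = 435e6   # ASSUMPTION: output of one "modern power station"


def fleet_power_watts(year_offset: int) -> float:
    """Dissipation of one year's shipments, year_offset years after 2012."""
    return CPUS_2012 * (1 + GROWTH) ** year_offset * WATTS_PER_CPU


def stations_needed(year_offset: int) -> float:
    """Number of assumed-size stations needed to supply that dissipation."""
    return fleet_power_watts(year_offset) / STATION_WATTS


for offset in range(3):
    megawatts = fleet_power_watts(offset) / 1e6
    print(2012 + offset, round(megawatts), "MW,",
          round(stations_needed(offset), 1), "stations")
```

With these assumptions the 2012 shipments alone come to about 870 MW, and the station count follows the 2, 2.4, "about 3" progression quoted in the abstract.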

Context: 1 hr talk at The Centre for Robotics and Neural Systems (CRNS) at University of Plymouth, Devon, UK.

The CRNS has a regular seminar series inviting national and international speakers.

http://www.tech.plym.ac.uk/SOCCE/CRNS/

SlideCast and pdf available via http://ianp24.blogspot.co.uk/

Opinions expressed are those of the author alone

Page 2: Energy Efficient Computing in the 21c

Prof. Ian Phillips, Principal Staff Eng’r, ARM Ltd

Visiting Prof. at ...

Contribution to Industry Award 2008

Centre for Robotics and Neural Systems Uo.Plymouth

1nov13

1v0

Page 3: Energy Efficient Computing in the 21c

Energy Efficient Computing ..?

Page 4: Energy Efficient Computing in the 21c

Energy Efficient Computing ..?

Page 5: Energy Efficient Computing in the 21c

Energy Efficient Computing ..?

Page 6: Energy Efficient Computing in the 21c

The Visible Face of Computing Today

Page 7: Energy Efficient Computing in the 21c

The Invisible Face of Computing Today

100s of billions of computers, each consuming mW!

Bringing Embedded Intelligence to the Consumer Market has changed the Face of Computing! (Again)

Page 8: Energy Efficient Computing in the 21c

Our 21c World ...

Page 9: Energy Efficient Computing in the 21c

Markets provide the Growth Drivers

[Chart: Millions of Units vs. year, 1960 to 2020]

1st Era: Select work tasks

2nd Era: Broad-based computing for specific tasks

3rd Era: Computing as part of our lives

Today: ~2% of our Energy Use goes on Computing and Electronics! ... Tomorrow: It could easily be 20%!

Page 10: Energy Efficient Computing in the 21c

ARM in the Digital World

[Timeline: 1998, 2012, 2020]

40+ billion CPUs to date

150+ billion CPUs cumulative by 2020

http://www.arm.com/

8.7B CPUs shipped in 2012 (growing ~20% pa)

75% of the things connected to the Internet today are ARM Powered! (Source: Gartner)

Page 11: Energy Efficient Computing in the 21c

Moore’s Law ...

[Chart (ITRS’99): approximate process geometry, from 100um down to 10nm, plotted with Transistors/Chip (M) and Transistor/PM (K)]

... x More Functionality on a Si Chip in 20 yrs!

Gordon Moore, co-founder of Intel (law stated in 1965)

http://en.wikipedia.org/wiki/Moore’s_law

Page 12: Energy Efficient Computing in the 21c

A Machine for Computing ...

Computing: A general term for algebraic manipulation of data ...

... State and Time are always factors (variable weight).

It can include phenomena ranging from human thinking to calculations with a narrower meaning. (Wikipedia)

Usually used to exercise analogies (models) of real-world situations; frequently in real time (fast enough to be a stabilising factor in a loop).

[Diagram: IN (x), Numerated Phenomena -> y = F(x, t, s) -> OUT (y), Processed Data/Information]

... So what part do Hardware and Software play? ... And what about Energy?

Page 13: Energy Efficient Computing in the 21c

Antikythera c87BC ... Planet Motion Computer

See: http://www.youtube.com/watch?v=L1CuR29OajI

Mechanical Technology

• Inventor: Hipparchos (c.190 BC – c.120 BC). Ancient Greek Astronomer, Philosopher and Mathematician.

• Single-Task, Continuous Time, Analogue Mechanical Computing (With backlash!)

Page 14: Energy Efficient Computing in the 21c

Orrery c1700 ... Planet Motion Computer

Mechanical Technology

• Inventor: George Graham (1674-1751). English Clock-Maker.

• Single-Task, Continuous Time, Analogue Mechanical Computing (With backlash!)

Page 15: Energy Efficient Computing in the 21c

Babbage's Difference Engine 1837

The difference engine consists of a number of columns, numbered from 1 to N. Each column is able to store one decimal number. The only operation the engine can do is add the value of a column n + 1 to column n to produce the new value of n. Column N can only store a constant, column 1 displays (and possibly prints) the value of the calculation on the current iteration.
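The column rule described above can be sketched in a few lines. This is only an illustration of the method of finite differences, not a model of the mechanism: the function name `difference_engine` and the low-to-high update order are assumptions made here.

```python
def difference_engine(columns, steps):
    """Tabulate a polynomial by repeated addition only.

    columns[0] is column 1 (the displayed result); columns[-1] is column N,
    which holds a constant. Each step adds column n+1 into column n.
    """
    cols = list(columns)
    results = [cols[0]]
    for _ in range(steps):
        # Work from column 1 upwards so each addition uses the value the
        # next column held at the start of this iteration.
        for n in range(len(cols) - 1):
            cols[n] += cols[n + 1]
        results.append(cols[0])
    return results


# Squares: f(0) = 0, first difference 1, constant second difference 2.
print(difference_engine([0, 1, 2], 5))
```

Seeding the columns with the initial value and differences of x^2 tabulates the squares without a single multiplication, which is the point of the machine.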

Computer for Calculating Tables: A Basic ALU Engine

(Re)construction c2000

Mechanical Technology

Page 16: Energy Efficient Computing in the 21c

“Enigma” c1940

Data Encryption/Decryption Computer

Mechanical Technology

Page 17: Energy Efficient Computing in the 21c

“Colossus” 1944

Code-Breaking Computer: A Data Processor

Valve/Mechanical Technology

Page 18: Energy Efficient Computing in the 21c

“Baby” 1947 (Reconstruction)

General Purpose, Quantised Time and Data, (Digital) Electronic Computing

Valve/Software Technology

Page 19: Energy Efficient Computing in the 21c

Signal Processing

• BTH Crystal Set, c1925: 1 Diode
• Tele-Verta Radio, c1945: 4 Valves, 1 Rectifier Valve
• Bush Radio, c1960: 7 Transistors, 1 Diode
• Evoke DAB Radio, c2005: 100 M Transistors, 2-3 Embedded Processors

Page 20: Energy Efficient Computing in the 21c

Radio as Computation ...

Vrf = Vi*100

Vlo = Cos(t*1^6)

Vif = Vrf*Vlo

Vro = 'Bandpass'(Vif*1000)

[Signal flow: Vi -> Vrf -> Vif (mixed with Vlo) -> Vro]

Single-Task (Embedded), Real-Time, Analogue (Close-Enough) Computing
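The slide's four expressions can be read as a small program. The sketch below is one hedged interpretation: it reads the oscillator term as an angular frequency of 1e6 rad/s, invents an unmodulated test carrier (`W_IN`, `A_IN`), and replaces the 'Bandpass' stage with an ideal filter that keeps only the difference tone. None of those values come from the talk.

```python
import math

GAIN_RF = 100.0   # Vrf = Vi*100 (from the slide)
GAIN_IF = 1000.0  # gain applied before the band-pass (from the slide)
W_LO = 1.0e6      # ASSUMPTION: local-oscillator angular frequency, rad/s
W_IN = 1.1e6      # hypothetical carrier angular frequency, rad/s
A_IN = 1.0e-3     # hypothetical aerial amplitude, volts


def v_i(t: float) -> float:
    """Unmodulated carrier at the aerial (illustrative input only)."""
    return A_IN * math.cos(W_IN * t)


def v_rf(t: float) -> float:
    return GAIN_RF * v_i(t)


def v_lo(t: float) -> float:
    return math.cos(W_LO * t)


def v_if(t: float) -> float:
    """Mixer: the product of the amplified signal and the oscillator."""
    return v_rf(t) * v_lo(t)


def v_if_two_tones(t: float) -> float:
    """The same signal written via cos(a)cos(b) = (cos(a-b) + cos(a+b)) / 2."""
    half = GAIN_RF * A_IN / 2.0
    return half * (math.cos((W_IN - W_LO) * t) + math.cos((W_IN + W_LO) * t))


def v_ro_ideal(t: float) -> float:
    """Output if an ideal band-pass kept only the difference tone."""
    return GAIN_IF * (GAIN_RF * A_IN / 2.0) * math.cos((W_IN - W_LO) * t)
```

The multiplication is the whole "computation": it moves the wanted signal to a fixed difference frequency, where a single fixed filter can select it.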

Page 21: Energy Efficient Computing in the 21c

Radio as Computation ...

Single-Task (Embedded), Real-Time, Analogue (Close-Enough) Computing

Valve Technology

Page 22: Energy Efficient Computing in the 21c

Radio as Computation ...

Single-Task (Embedded), Real-Time, Analogue (Close-Enough) Computing

Valve Technology

Transistor Technology

‘Integrated Circuit’ Technology

Page 23: Energy Efficient Computing in the 21c

Computing is Era and Application Related ...

Computing: Creating Useful Output from Input ...

Architecture: The way this is done on the day.

It is the Most Important Product Decision! (HW, SW, Digital, Analogue, Optics, Graphene, Mechanics, Steam, etc)

Page 24: Energy Efficient Computing in the 21c

Moore's Real Law: x2 Functionality Every 18mth!

Cascade of Technologies supporting Functional growth ...

... The ‘Law’ started with Wood ⇒ Stone ⇒ Bronze ⇒ Iron

[Chart: Functional Density (units), 10^0 to 10^12, vs. year, 1960 to 2020. Electronic era: 1975-2005; System era: 2003-2030]
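To get a feel for what "x2 every 18 months" compounds to, the sketch below evaluates 2^(months/18). It follows only from the doubling period quoted on this slide; the function names are illustrative, and the results need not match the figures drawn on any of the charts.

```python
import math


def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Multiplier after `years` if functionality doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


def years_to_reach(factor: float, doubling_months: float = 18.0) -> float:
    """Years needed to grow by `factor` at the same doubling rate."""
    return math.log2(factor) * doubling_months / 12.0


print(round(growth_factor(20)))          # growth over two decades
print(round(years_to_reach(1e6), 1))     # years for a million-fold gain
```

Under that assumption two decades give a little over ten-thousand-fold growth, and each further factor of ten takes roughly five years.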

Page 25: Energy Efficient Computing in the 21c

Computing in a Cool iCon ...

Page 26: Energy Efficient Computing in the 21c

‘A lot’ of Architecture in a Smart Phone ...

... Computation in many forms

Page 27: Energy Efficient Computing in the 21c

Take a Look Inside...

http://www.ifixit.com

The Control Board.

Level-1: Modules

Page 28: Energy Efficient Computing in the 21c

Inside The Control Board (a-side)

http://www.ifixit.com

Level-2: Sub-Assemblies

Visible Computing Contributors ...
• Samsung: Flash Memory - NV-MOS (ARM Partner)
• Cirrus Logic: Audio Codec - Bi-CMOS (ARM Partner)
• AKM: Magnetic Sensor - MEM-CMOS
• Texas Instruments: Touch Screen Controller and mobile DDR - Analogue-CMOS (ARM Partner)
• RF Filters - SAW Filter Technology

Invisible Computing Contributors ...
• OS, Drivers, Stacks, Applications, GSM, Security, Graphics, Video, Sound, etc
• Software Tools, Debug Tools, etc

Page 29: Energy Efficient Computing in the 21c

Inside The Control Board (b-side)

GPS, Bluetooth, EDR & FM

http://www.ifixit.com

Level-2: Sub-Assemblies

More Visible Computing Contributors ...
• A4 Processor. Spec: Apple; Design & Mfr: Samsung - Digital-CMOS (nm) ... Provides the iPhone 4 with its GP computing power. (Said to contain ARM A8 600 MHz CPU and other ARM IP)
• ST-Micro: 3-axis Gyroscope - MEM-CMOS (ARM Partner)
• Broadcom: Wi-Fi, Bluetooth, and GPS - Analogue-CMOS (ARM Partner)
• Skyworks: GSM - Analogue-Bipolar
• Triquint: GSM PA - Analogue-GaAs
• Infineon: GSM Transceiver - Anal/Digi-CMOS (ARM Partner)

Page 30: Energy Efficient Computing in the 21c

Level-3: Processor (Nvidia Tegra 3, Around 1B transistors)

NB: The Tegra 3 is similar to the A4/5, but not used in the iPhone

Page 31: Energy Efficient Computing in the 21c

Packing Technology into an iCon

• Analogue and Digital Design
• Embedded Software
• Mechanics, Plastics and Glass
• Micro-Machines (MEMs)
• Displays and Transducers
• Robotics and Test
• Knowledge and Know-How
• Research, Education and Training
• Components, Sub-Systems and Systems; Design, Assembly and Manufacture
• Metrology, Methodology and Tools

... Involving Many Specialist Businesses

... Round and Round the World ... Not-Least from Europe

Page 32: Energy Efficient Computing in the 21c

Architecting your Product

: Is the cumulative non-functional choices made to support the functional need
• A Good Architecture is the one that ‘survives’
• History is written by winners (2nd is for losers)

: Component Performance may be ‘poor’ as long as System Performance is ‘better’ for its use.

Architectural Options ...

: Business Model (Cost-of-Ownership, ROI), TTM (Productivity, History, IP-Availability, Know-How), Aesthetics (Power, Quality, Behaviour, Appearance)

: Analogue, Digital, Mechanical, Optical, RF, Software, Plastics, Metal-forming, Manufacturing, Glass, ...

: More than 99% of a Product is Reused from its Predecessor

... is assumed (working is expected!) ... It used to be the only consideration!

Page 33: Energy Efficient Computing in the 21c

Power Philosophy

Hardware Dissipates Power ...
• Choose Underlying Technology for best power efficiency.
• One size does not fit all (Products, Applications or Instances)

... Software Doesn’t (But it Tells Hardware To!)
• Chips can literally melt down under software ‘instruction’
• Make computing hardware power as ‘Activity’ dependent as possible: Zero Activity => Zero Power
• Make OS/Apps aware of the power/performance situation, and their options for controlling it (Need Indicators and Levers)

... Think System: It’s how the ‘box’ performs, not the components

Page 34: Energy Efficient Computing in the 21c

Core Power Management

For Processor and Peripheral Circuitry...
• Variable/Gated - Clock Domains
• Variable/Switched - Power Domains

Indicators and Levers
• Allow the software to see and influence what is going on

Principles of Core Power Efficiency...
• Minimise voltage/frequency (P = CV²f) so that the processor has just enough performance for the current application need
• Maximise ‘Activity Power’ dependence (Zero Activity => Zero Power)
• Management by the OS and the Application SW
• Apply to all on-chip and off-chip zones (not just the CPU)

... Methodology
• Retention Flops/Latches, Level Shifters, Power-Switch Cells, PLLs
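The "just enough performance" principle can be sketched as a lookup over voltage/frequency operating points using the slide's dynamic-power relation. Everything numeric here is hypothetical: the table `OPERATING_POINTS`, the capacitance `C_EFF`, and the helper names are illustrative and do not describe any real ARM part or OS governor.

```python
C_EFF = 1.0e-9  # hypothetical effective switched capacitance, farads

# Hypothetical (volts, hertz) operating points, slowest first.
OPERATING_POINTS = [(0.9, 400e6), (1.0, 800e6), (1.1, 1200e6), (1.25, 1600e6)]


def dynamic_power(volts: float, hz: float, c_eff: float = C_EFF) -> float:
    """Dynamic power P = C * V^2 * f, in watts."""
    return c_eff * volts ** 2 * hz


def pick_operating_point(required_hz: float, points=OPERATING_POINTS):
    """Return the lowest-power point that still meets the required frequency.

    If nothing is fast enough, fall back to the fastest point available.
    """
    feasible = [p for p in points if p[1] >= required_hz]
    if not feasible:
        return max(points, key=lambda p: p[1])
    return min(feasible, key=lambda p: dynamic_power(*p))


volts, hz = pick_operating_point(700e6)
print(volts, hz, dynamic_power(volts, hz))
```

Because voltage enters squared, dropping from the top point to the 800 MHz point in this made-up table halves the clock but cuts dynamic power by more than a factor of three.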

Page 35: Energy Efficient Computing in the 21c

Architectural Energy Efficiency - Parallelism

[Diagram: one Processor clocked at f between Input and Output, compared with two Processors each clocked at f/2 sharing the same Input and Output at rate f]

Single processor: Capacitance = C; Voltage = V; Frequency = f; Power = CV²f

Two processors: Capacitance = 2.2C; Voltage = 0.6V; Frequency = 0.5f; Power = (2.2 × 0.6 × 0.6 × 0.5)CV²f ≈ 0.4CV²f

To a limit determined by Amdahl’s or Gustafson’s Law ...
• Amdahl: Extracted parallelism from existing code (Reuse)
• Gustafson: Some needs only benefit from parallelism (Custom)

... Actual improvement is application specific.
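The slide's power ratio and the two scaling laws it names can be checked numerically. This is a minimal sketch: the function names and the 90% parallel fraction are illustrative, and the formulas are the textbook forms of Amdahl's and Gustafson's laws rather than anything specific to the talk.

```python
def relative_power(cap_ratio: float, volt_ratio: float, freq_ratio: float) -> float:
    """Power relative to the baseline CV^2f, given ratios of C, V and f."""
    return cap_ratio * volt_ratio ** 2 * freq_ratio


def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Fixed problem size: the serial part limits the gain from n processors."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def gustafson_speedup(parallel_fraction: float, n: int) -> float:
    """Problem size grows with n: the scaled speedup is nearly linear."""
    return (1.0 - parallel_fraction) + parallel_fraction * n


print(relative_power(2.2, 0.6, 0.5))   # the slide's two-processor case
print(amdahl_speedup(0.9, 2))          # hypothetical 90% parallel workload
print(gustafson_speedup(0.9, 2))
```

The power saving only materialises if the two half-speed processors really deliver the original throughput, which is exactly what the parallel fraction decides.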

Page 36: Energy Efficient Computing in the 21c

Architectural Energy Efficiency - Data

Moving Data takes significant Energy
• Becoming the dominant energy consumption in a system

Data Location
• Avoid moving or copying Data
• Energy ∝ DataVolume × Speed × Distance^n, with n > 2 (perhaps 3)

Bring the Processing to the Data
• Caching is good (depends on implementation)
• Write-back is better than write-through
• Local working memory is good (aka Software Caching)

... The Arrangement of your Data matters!
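The proportionality on this slide can be turned into a toy comparison. The sketch below is a relative model only: units are arbitrary, the distance exponent is read as 2 by default (the slide suggests it may be higher), the byte counts are invented, and the energy of the local processing itself is ignored.

```python
def relative_move_energy(volume_bytes: float, speed: float, distance: float,
                         exponent: float = 2.0) -> float:
    """Relative energy to move data: volume x speed x distance^exponent."""
    return volume_bytes * speed * distance ** exponent


RAW_BYTES = 1_000_000   # hypothetical block of raw sensor samples
SUMMARY_BYTES = 8       # hypothetical result after processing it in place

# Option A: ship the raw data across the system, then process it.
ship_raw = relative_move_energy(RAW_BYTES, speed=1.0, distance=10.0)
# Option B: process next to the data, then ship only the result.
ship_summary = relative_move_energy(SUMMARY_BYTES, speed=1.0, distance=10.0)

print(ship_raw / ship_summary)
```

In this model the saving from moving information instead of data is simply the ratio of the two volumes, while every tenfold increase in distance costs at least a hundredfold in energy.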

Page 37: Energy Efficient Computing in the 21c

All ARM Processors are Power Efficient

Page 38: Energy Efficient Computing in the 21c

Choose the Horses for the Course

... Delivering ~5x speed (Architecture + Process + Clock)

About 50MTr

About 50KTr

Page 39: Energy Efficient Computing in the 21c

Multicore ARM On-Chip ...

Heterogeneous Multicore Systems have been in ARM for a long time:

[Block diagram: Cortex™-A8, Mali™-400 MP and Cortex-M3 on a shared Interconnect with Memory; block roles: Application, UI & 3D Graphics, Power Manager]

Page 40: Energy Efficient Computing in the 21c

Coherent Multicore Cluster ...

Homogeneous Multicore cluster, as part of a heterogeneous system:

[Block diagram: several Cortex-A9 cores with Coherency Logic, plus Mali-400 MP and Cortex-M3, on a shared Interconnect; block roles: User Interface and 3D graphics, Power Manager]

Page 41: Energy Efficient Computing in the 21c

Multiple Clusters ...

Multiple Homogeneous Coherent Clusters

[Block diagram: two clusters of Cortex-A15 cores, each with Coherency Logic in L2 Cache, joined by a Coherent Interconnect]

Page 42: Energy Efficient Computing in the 21c

Computer On a Chip c2010 ...

Today’s Consumers require a pocket ‘Super-Computer’ ...
• Silicon Technology Provides a Billion transistors ...
• It will be supported with a few GB of memory ...

• Typically 10 Processors ...
  • 4 x A9 Processors (2x2)
  • 4 x MALI 400 Frag. Proc
  • 1 x MALI 400 Vertex Proc
  • 1 x MALI Video CoDec
• Software Stacks, OSs and Design Tools

• ARM Technology gives chip/system designers ...
  • Improved Productivity
  • Improved TTM
  • Improved Quality/Certainty

http://www.arm.com/

Page 43: Energy Efficient Computing in the 21c

CoreLink™ CCN-504 and DMC-520

[Block diagram: four Quad Cortex-A15 clusters, each with L2 cache, attached via ACE to the CoreLink™ CCN-504 Cache Coherent Network with Snoop Filter and 8-16MB L3 cache; two CoreLink™ DMC-520 memory controllers (x72 DDR4-3200, PHY); NIC-400 Network Interconnects serving Flash, GPIO, USB, SATA, PCIe, 10-40GbE, DPI, Crypto and DSPs; AHB; Interrupt Control; IO Virtualisation with System MMU]

• Up to 4 cores per cluster
• Up to 4 coherent clusters
• Integrated L3 cache
• Dual channel DDR3/4 x72
• Up to 18 AMBA interfaces for I/O coherent accelerators and IO
• Heterogeneous processors: CPU, GPU, DSP and accelerators
• Virtualized Interrupts
• Uniform System memory
• Peripheral address space

Page 44: Energy Efficient Computing in the 21c

Methodology As Well As Hardware

• C/C++ Development
• Middleware
• Debug & Trace
• Energy Trace Modules

Page 45: Energy Efficient Computing in the 21c

big.LITTLE Processing

For High-Performance systems...
• Tightly coupled combination of two ARM CPU clusters: Cortex-A15 and Cortex-A7 - functionally identical
• Same programmer's view, looks the same to OS and applications

big.LITTLE combines high-performance and low power
• Automatically selects the right processor for the right job
• Redefines the efficiency/performance trade-off

big: “Demanding tasks”

LITTLE: “Always on, always connected tasks”

[Bar charts, current smartphone vs. big.LITTLE: 30% of the Power (select use cases); >2x Performance]

Page 46: Energy Efficient Computing in the 21c

Fine-Tuned to Different Performance Points

LITTLE: Cortex-A7, Cortex-A53
• Simple, in-order, 8 stage pipelines
• Performance better than mainstream, high-volume smartphones (Cortex-A8 and Cortex-A9)
• Most energy-efficient applications processor from ARM

big: Cortex-A15, Cortex-A57
• Complex, out-of-order, multi-issue pipelines
• Up to 2x the performance of today’s high-end smartphones
• Highest performance in mobile power envelope

[Pipeline diagram labels: Queue, Issue, Integer]

Page 47: Energy Efficient Computing in the 21c

big.LITTLE Software

CPU Migration
• Migrate a single processor workload to the appropriate CPU
• Migration = save context then resume on another core
• Also known as Linaro “In Kernel Switcher”
• DVFS driver modifications and kernel modifications
• Based on standard power management routines
• Small modification to OS and DVFS, ~600 lines of code

big.LITTLE MP
• OS scheduler moves threads/tasks to appropriate CPU
• Based on CPU workload
• Based on dynamic thread performance requirements
• Enables highest peak performance by using all cores at once
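The load-based placement idea behind big.LITTLE MP can be sketched as a threshold rule with hysteresis. This is a toy illustration only, not the Linaro switcher or any real scheduler: the `Task` type, the `place` function and both thresholds are hypothetical.

```python
from dataclasses import dataclass

UP_THRESHOLD = 0.8    # hypothetical: move to the big cluster above this load
DOWN_THRESHOLD = 0.3  # hypothetical: move back to LITTLE below this load


@dataclass
class Task:
    name: str
    load: float              # recent utilisation, 0.0 to 1.0
    cluster: str = "LITTLE"  # tasks start on the low-power cluster


def place(task: Task) -> str:
    """Choose a cluster from recent load; the gap between thresholds
    stops a task bouncing between clusters on small load changes."""
    if task.cluster == "LITTLE" and task.load > UP_THRESHOLD:
        task.cluster = "big"
    elif task.cluster == "big" and task.load < DOWN_THRESHOLD:
        task.cluster = "LITTLE"
    return task.cluster


ui = Task("ui-render", load=0.9)
print(place(ui))   # heavy load: promoted to the big cluster
ui.load = 0.5
print(place(ui))   # moderate load: stays where it is
ui.load = 0.1
print(place(ui))   # light load: returns to LITTLE
```

Both clusters present the same programmer's view, so a rule like this only has to decide where a thread runs, never how it is compiled.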

Page 48: Energy Efficient Computing in the 21c

Bringing the Processing to the Data …

Press Claims:
• 288 server nodes in a 4U rack space. Public Source: http://www.engadget.com/2011/11/02/hp-and-calxedas-moonshot-arm-servers-will-bring-all-the-boys-to/
• Dell + Marvell, Copper
• BaiDu + Marvell, Baserock

Page 49: Energy Efficient Computing in the 21c

... Refining Data into Information

Page 50: Energy Efficient Computing in the 21c

Transferable Lessons to GP Software

Moving data is Power Expensive ...
• Don’t move data; use it locally (Cache it)
• Refine it once, use it often (Pre-Process it)

Your CPU Power is work-load independent ...
• So, get in; get the work done; and get out.
• Maximise the workload of your code; terminate when complete.

Make your Processing work-load dependent
• Use a Hypervisor and turn off (at least free) processors not in use.
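The "get in, get the work done, get out" advice can be illustrated with a two-state energy model. The wattages below are hypothetical, and the model assumes the processor can actually drop to its idle power whenever the software stops running.

```python
ACTIVE_WATTS = 2.0   # hypothetical power while executing code
IDLE_WATTS = 0.05    # hypothetical power while clock/power gated


def window_energy(busy_s: float, window_s: float,
                  active_w: float = ACTIVE_WATTS,
                  idle_w: float = IDLE_WATTS) -> float:
    """Joules used in a window: active while busy, idle for the remainder."""
    if not 0.0 <= busy_s <= window_s:
        raise ValueError("busy time must lie within the window")
    return active_w * busy_s + idle_w * (window_s - busy_s)


# A polling loop keeps the CPU active for the whole second.
polling = window_energy(busy_s=1.0, window_s=1.0)
# An event-driven design does 0.1 s of work, then lets the CPU idle.
event_driven = window_energy(busy_s=0.1, window_s=1.0)

print(polling, event_driven)
```

The useful work is the same in both cases; only the second design lets the hardware's activity-dependent power features take effect.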

Page 51: Energy Efficient Computing in the 21c

Society’s Challenges in the 21c

• Urbanisation (Smart Cities)
• Health (eHealth)
• Transport
• Energy (Smart Grid)
• Security
• Environment
• Food/Water
• Ageing Society
• Sustainability
• Digital Inclusion
• Economics

And whilst our technologies will be an essential part of all solutions, they cannot fix them without Society’s help and cooperation!

... Energy Efficient Computing will minimise the impact, not avert the challenges!

Having a great time!

Page 52: Energy Efficient Computing in the 21c

Conclusions

Putting the power of Computation into the hands of the masses has changed the face of Computing (again)
• Electronic Systems will become Essential to our Lives and the Economy

Power Efficient ES are a major issue to Society
• Which faces a future with them as a significant energy consumer in themselves

Power Efficiency must be architected into the System Hardware and Software from the beginning
• To realise the maximum potential out of your Silicon (Avoiding Dark Si)
• Architect & Design HW as efficiently as possible (reflecting the task)
• Strive for: No Work => No Power
• Equip HW with Indicators and Levers so the System/App can manage it

Bring Processing to the Data ...
• Don’t move Data; move Information
• Process data Locally
• Energy ∝ DataVolume × Speed × Distance^n, with n > 2 (perhaps 3)

Page 53: Energy Efficient Computing in the 21c

Computing at the heart of the 21c

ARM: Enabling the Creation of High-Performance Electronic Systems

• Productively, Economically and Reliably
• Through Hw/Sw Reuse Methodologies
• Based on a family of CPU/GPU cores