© 2018 Arm Limited Ian Smythe October 2018 Driving Performance Beyond Moore’s Law

Post on 06-May-2020


Page 1: Driving Performance Beyond Moore’s Law · test.armtechforum.com.cn/attached/article/A_5_Ian20181109171814… · © 2018 Arm Limited Ian Smythe October 2018 Driving Performance Beyond

Driving Performance Beyond Moore’s Law

Ian Smythe, October 2018

© 2018 Arm Limited

Page 2

Innovation continues to drive growth and performance demands on our compute devices

• 5G transformation
• Multi-day battery life
• Security
• Small to large screen

Untethered. Connected. Immersive.

Page 3

Last year we announced new DynamIQ processors

• >20% more mobile performance vs Cortex-A73
• Same sustained performance as Cortex-A73
• +40% infrastructure performance vs Cortex-A72

• Performance leadership in mobile
• Best possible power profile
• Improved performance in infrastructure

• Up to 2x more performance
• Up to 15% better power efficiency
• Up to 10x more configurable

• For advanced use cases
• Higher sustained performance
• Edge to cloud scalability


Page 5

Arm Cortex-A portfolio

Year of IP release, volume devices in the subsequent year

[Portfolio chart. Timeline: 2008 – 2013, 2014, 2015, 2016, 2017, 2018. Row labels: Cortex-A7x series, Cortex-A5x series, Cortex-A3x series. Legend: Arm big.LITTLE compatible; Armv7-A; Armv8-A.]

• Cortex-A9 (Armv7-A, 32-bit): Well-established, mid-range processor
• Cortex-A5/A7 (Armv7-A, 32-bit): Smallest and lowest power Armv7-A
• Cortex-A15/A17 (Armv7-A, 32-bit): Infrastructure performance; mobile efficiency
• Cortex-A57 (Armv8-A, 64/32-bit): Proven infrastructure performance
• Cortex-A72 (Armv8-A, 64/32-bit): For all applications
• Cortex-A35 (Armv8-A, 64/32-bit): Smallest, lowest power Armv8-A
• Cortex-A53 (Armv8-A, 64/32-bit): Balanced performance and efficiency
• Cortex-A73 (Armv8-A, 64/32-bit): For mobile and consumer
• Cortex-A32 (Armv8-A, 32-bit): Smallest, lowest power 32-bit Armv8-A
• Cortex-A55 (Armv8-A, 64/32-bit): Highest efficiency mid-range processor
• Cortex-A75 (Armv8-A, 64/32-bit): Ground-breaking performance for all markets
• Cortex-A76 (Armv8-A, 64/32-bit): Laptop-class performance with smartphone efficiency

Page 6

Designing for user experience from small screens to large

Energy savings and power efficiency

Efficient scheduling

• Power efficient cores for low demand tasks and background services

• Fast switching between cores

Power optimization

• Finer-grained speed control

• Autonomous memory power management

• Fast power on/sleep/off management

Page 7

Designing for user experience from small screens to large

System responsiveness

High single-thread performance for fantastic user-facing response

• App. launch and closing

• Web browsing

• Productivity applications

Scalability with area-efficient octa-core solution

• High CPU availability for performance and responsiveness

Page 8

Let’s take a closer look

Page 9

Arm Cortex-A76 CPU

Laptop-class performance, smartphone experience

• Built from the ground up with new microarchitecture capabilities

• Built on innovative DynamIQ technology

• Battery life that can outlast your work day

Longer Battery Life

Better energy efficiency

Performance without compromise

Increasing Productivity

Increased Machine Learning performance

Intelligent Computing

Page 10

Cortex-A76: Performance efficiency - focus on the user

Cortex-A76 CPU is focused on performance and also performance efficiency

Performance efficiency - extract significantly more performance than any other microarchitecture at similar complexity

Requires intense focus on every aspect of the microarchitecture

• More performance from every logic block

Focus on the end-user to enable sustained full-speed performance

• Yes, we also do well on benchmarks

Page 11

Cortex-A76: Front-end

Front-end built to hide latency at high bandwidth

Multi-level branch-target caches

Hybrid indirect predictor - unparalleled prediction capability

[Block diagram labels: Front-end (Branch prediction, Instruction Fetch, L1-ITLB, 64K I-Cache); Decode/Rename/Commit]

Page 12

Cortex-A76: Decode/Rename/Commit

4-instruction/cycle, power-optimized decode

High-density decode/rename

Dispatch to out-of-order core and commit unit

[Block diagram labels: Decode/Rename/Commit (Decode, Register rename, Dispatch, Commit, DQ); 4-8 instructions/cycle; 4 Mops/cycle; 8 uops/cycle dispatch; Execution core; L1 Data cache / MMU]

Page 13

Cortex-A76: Execution core

Uops dispatched to 120-entry issue queue capacity

Dual 128-bit ASIMD/FP execution pipelines

[Block diagram labels: Execution core with issue queues (IQ); Integer pipelines: Branch, ALU, ALU, ALU/MAC/DIV; ASIMD pipelines: FMUL/FADD/FDIV/ALU/IMAC and FMUL/FADD/ALU; L1 Data cache / MMU]

Page 14

Cortex-A76: Cache hierarchy and performance

Full cache hierarchy is co-optimized for latency and bandwidth

Sophisticated 4th generation prefetcher

256KB-512KB private L2 with 9-cycle LD-use

2MB-4MB DynamIQ L3 with 26-31 cycle LD-use
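
The load-use latencies above are quoted in cycles. As a rough illustration only, the sketch below converts them to nanoseconds. It assumes a 3.0 GHz clock (a hypothetical round figure; a later slide says only "3+ GHz in 7nm") and assumes the cycle counts are measured in that same clock domain, which the slides do not state. The helper name `cycles_to_ns` is invented for this example.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds at the given clock."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    # One cycle at f GHz lasts 1/f nanoseconds.
    return cycles / clock_ghz


ASSUMED_CLOCK_GHZ = 3.0  # assumption for illustration, not a figure from the slides

# Load-use cycle counts quoted on the slide.
LOAD_USE_CYCLES = [
    ("L2", 9),
    ("L3 (low end)", 26),
    ("L3 (high end)", 31),
]

for label, cycles in LOAD_USE_CYCLES:
    latency_ns = cycles_to_ns(cycles, ASSUMED_CLOCK_GHZ)
    print(f"{label}: {cycles} cycles is about {latency_ns:.1f} ns")
```

A different clock assumption scales every result by the same factor, so the ratio between the L2 and L3 figures does not depend on it.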

50% performance uplift from Cortex-A75

[Chart: Memory hierarchy bandwidth, Cortex-A76 vs. Cortex-A75, for L1 cache, L2 cache, L3 cache, and DRAM; vertical axis from 0 to 3]

Page 15

Accelerating the performance curve in any workload

Pushing the single-thread performance

• +25% more integer IPC than the Cortex-A75 CPU

• +35% higher ASIMD/FP performance

• +90% higher memory bandwidth

Boosting mobile experience

• +28% more Geekbench performance

• +35% more JavaScript performance

Enabling intelligence at the edge

• 3.9x more AI performance

Baseline IPC - frequency upside from here

[Chart: IPC comparison, iso-process and iso-frequency, for Cortex-A73, Cortex-A75, and Cortex-A76. Data labels in benchmark order: SPECINT2K6 1.58x, SPECFP2K6 1.79x, Geekbench v4 1.56x, JavaScript 1.77x, LMBench memcpy 2.44x, GEMM lowp 9.7x]

Page 16

Cortex-A76 CPU delivers premium performance

Chart panels: Peak single-thread performance; big.LITTLE performance 5W

Performance (relative scores based on AArch64 SpecInt2K6)

Cortex-A73 (16nm), Cortex-A75 (10nm), Cortex-A76 (7nm)

Configuration:

• Cortex-A73: 2.45 GHz, L1 64KB, L3 2MB

• Cortex-A75: 2.8 GHz, L1 64KB, L2 512KB, L3 2MB

• Cortex-A76: 3.3 GHz, L1 64KB, L2 512KB, L3 4MB

2x performance improvement

Chart data labels: 2.1x, 1.9x
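
The IPC comparison on the previous slide is made at equal process and frequency, with the note that frequency adds further upside. A first-order way to combine the two is speedup ≈ IPC ratio × clock ratio. The sketch below applies that to the +25% integer IPC figure and to the 2.8 GHz and 3.3 GHz clocks from the configuration above. It ignores cache-size, memory, and process effects, the function name `estimated_speedup` is invented for this example, and it is not how the chart values on this slide were produced.

```python
def estimated_speedup(ipc_ratio: float, old_ghz: float, new_ghz: float) -> float:
    """First-order single-thread speedup: IPC ratio multiplied by clock ratio."""
    if ipc_ratio <= 0 or old_ghz <= 0 or new_ghz <= 0:
        raise ValueError("all inputs must be positive")
    return ipc_ratio * (new_ghz / old_ghz)


# Inputs taken from the slides: +25% integer IPC, 2.8 GHz -> 3.3 GHz.
a76_vs_a75 = estimated_speedup(ipc_ratio=1.25, old_ghz=2.8, new_ghz=3.3)
print(f"Estimated Cortex-A76 vs Cortex-A75 integer speedup: {a76_vs_a75:.2f}x")
```

The model treats IPC as independent of frequency, which is only approximately true once memory latency is measured in a fixed number of nanoseconds rather than cycles.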

Page 17

Building for the premium experience for advanced process

• High-performance Cortex-A76 implementation: 3+ GHz in 7nm

• Increasing Cortex-A55 CPU private L2 cache

• Implementing 4MB L3 cache

• Optimized memory system

[Block diagram labels: Cortex-A76, Cortex-A55, DynamIQ Shared Unit, CoreLink CCI-550, DMC (x2), LPDDR4x, Memory System, Integrated TrustZone technology, Other IPs]

Page 18

It starts with an ecosystem

Page 19

Always On, Always Connected PCs powered by Snapdragon

Pace of innovation

• Miix 630

• NovaGo

• Envy x2

• Yoga C630

• And more…

[Image label: 835]

*Requires network connection and will support up to 20 hours of battery life

Credit: Qualcomm Technologies, Inc.

Page 20

Process Technology

Mobile leadership

[Chart: process node over time, 2008 to 2018. Snapdragon labels: 45nm Snapdragon S1; 28nm Snapdragon S4; 20nm Snapdragon 810; 14nm Snapdragon 820; 10nm 1st Gen Snapdragon 835; 10nm 2nd Gen Snapdragon 850. Other node labels: 45nm, 32nm, 22nm, 14/16nm, 14/16nm, 40nm, 28nm, 14nm, 12nm]

Credit: Qualcomm Technologies, Inc.

Page 21

Performance

Performance for scenarios that matter

Balancing power and performance

Sustained performance that doesn’t throttle

Credit: Qualcomm Technologies, Inc.

Page 22

Delivering on promises

[Chart: Cortex-A75 based system performance (relative to Cortex-A73 system). Data labels in benchmark order: Speedometer 1.4x, Geekbench single-core 1.3x, Geekbench multi-core 1.2x, WebXPRT 3 1.3x, TouchXPRT16 1.4x, MotionMark 1.0 1.4x]

Improvement across all benchmarks

Over 25% minimum performance uplift

Source: Shrout Research, measured on Lenovo C630 and HP Envy x2 devices
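
One common way to summarize a set of relative benchmark scores is the geometric mean. The sketch below applies it to the six data labels on this slide, assuming they pair with the benchmarks in the order listed (the transcript does not make the pairing explicit). The names `RELATIVE_SCORES` and `geometric_mean` are invented for this example, and the resulting summary figure is not a number from the slide.

```python
import math

# Cortex-A75 system score relative to the Cortex-A73 system.
# Pairing labels with benchmarks by order is an assumption.
RELATIVE_SCORES = {
    "Speedometer": 1.4,
    "Geekbench single-core": 1.3,
    "Geekbench multi-core": 1.2,
    "WebXPRT 3": 1.3,
    "TouchXPRT16": 1.4,
    "MotionMark 1.0": 1.4,
}


def geometric_mean(values) -> float:
    """Geometric mean of positive ratios, computed via logarithms."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


summary = geometric_mean(RELATIVE_SCORES.values())
print(f"Geometric mean uplift across {len(RELATIVE_SCORES)} benchmarks: {summary:.2f}x")
```

The geometric mean is used rather than the arithmetic mean because the inputs are ratios, so the result does not depend on which system is chosen as the baseline.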

Page 23

Extended battery life and thermal constraints on real-system

Running your apps longer

• 1.3x web browsing battery life improvement (relative to Cortex-A73 system)

Multi-day battery life

• 1.3x time improvement between charge (relative to Cortex-A73 system)

Source: Shrout Research, measured on Lenovo C630 and HP Envy x2 devices

Page 24

The evolution of the always on, always connected PC

Image created by Arm based off Shrout Research data: download the full whitepaper at www.shroutresearch.com

Page 25

The journey ahead

Page 26

Client Compute CPU roadmap

[Roadmap timeline, 2017 to 2020: Cortex-A73 (16nm), Cortex-A75 (10nm), Cortex-A76 (7nm), ‘Deimos’ (7nm), ‘Hercules’ (7nm and 5nm)]

Page 27

Path to compute performance leadership with efficiency

A performance trajectory surpassing Moore’s law

Unmatched year-over-year Arm CPU performance gains

[Chart: single-core performance estimates based on SPECINT2k6, starting in 2013. Arm Compute: Cortex-A15 (28nm), Cortex-A57 (20nm), Cortex-A72 (16nm), Cortex-A73 (16nm), Cortex-A75 (10nm), Cortex-A76 (7nm), Deimos (7nm), Hercules (5nm). Intel Core i5 U-series: Core i5-4300u (22nm), Core i5-6300u (14nm), Core i5-7300u (14nm). Annotation: 2.5x increase]

Page 28

Expanding the mobile experience

• Innovation on mobile from small screens to large is changing the user experience and continues to push growth

• The new premium IP delivers laptop-class performance

• Arm with its ecosystem is aligning itself to meet customer needs and get ready for 5G evolution for truly connected experiences

Page 29

Thank You · Danke · Merci · 谢谢 · ありがとう · Gracias · Kiitos · 감사합니다 · धन्यवाद · תודה
