TRANSCRIPT
Leveraging Advanced Physical IP to Deliver Optimized SoC Implementations at 40nm and Below
John Heinlein, Ph.D.
Vice President Marketing
ARM Physical IP Division
November 19, 2010
Consumer Driving Decade of Change
Connectivity Driving The Future
• Mainframe: 1MM+ Units
• Minicomputer: 10MM+ Units
• PC: 100MM+ Units
• Desktop Internet: 1B+ Units/Users
• Mobile Internet: 10B+ Units?
• The Internet of things: 100B+ Units
Convergence Driving SoC Requirements
ARM Compute Subsystem
• Comprehensive range of video standards on NEON™ and Mali-VE
• Software stack optimized for Cortex™ + Mali™ (slide labels: 1.1 & 2.0, 1.1, JSR237)
• Tools support for CoreSight™
• Graphics processor
• Video (Mali VE)
• Interconnect & memory controllers
• Foundation IP & POPs
• Coherency and virtualization
• Secure systems on TrustZone® (MobiCore)
Integration Complexity & Specifications
• 1 GHz or more
• 0.5 mW per MHz
• 1 mW standby power
• 400 MHz or more
• HD memory banks
• DFI 2.0 Compliant
• Data Rates > 1600 Mbps
• MBIST Support
• Software compatibility
Blocks shown: ARM Compute Subsystem, Graphics processor, Video, Interconnect & memory controllers, Foundation IP & POPs
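As a rough illustration of what the per-MHz figure implies, the sketch below multiplies the 0.5 mW per MHz target by a clock frequency. It assumes that dynamic power scales linearly with frequency and that the 1 GHz, 0.5 mW per MHz and 1 mW standby targets describe the same block; neither assumption is stated on the slide, and the function name is hypothetical.

```python
# Targets quoted on the slide.
MW_PER_MHZ = 0.5   # dynamic power target, mW per MHz
STANDBY_MW = 1.0   # standby power target, mW


def dynamic_power_mw(freq_mhz: float) -> float:
    """Dynamic power in mW, assuming linear scaling with clock frequency."""
    return MW_PER_MHZ * freq_mhz


if __name__ == "__main__":
    active = dynamic_power_mw(1000)  # 1 GHz operating point
    print(f"Active power at 1 GHz: {active:.0f} mW")
    print(f"Active-to-standby ratio: {active / STANDBY_MW:.0f}x")
```

Under these assumptions, a block meeting both targets would draw about half a watt when running at 1 GHz and roughly 500 times less in standby.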
Outline: Addressing SoC Challenges
• Innovation in ARM Artisan physical IP
• Advanced Process Technologies
• Co-developed optimizations for cores
• Bringing advanced technology to older geometries
World-class ARM SoC Implementations
ARM Offers Broadest Foundry Solution
Extensive Process Optimized IP Portfolio
designstart.arm.com
Fully Independent Physical IP Provider
ARM Physical IP: Logic, Memory, GPIO, DDR
Artisan Physical IP: Technical Leadership
ARM Leading Multi-channel Libraries
ARM multi-channel length advantages include:
• Finer grained tradeoffs between performance and leakage reduction
• Better low voltage operation, as long channel device RVT performance degrades less with lower voltages than HVT device
• Lower cost because multi channels require no additional mask layers
ARM footprint compatible MC libraries work:
• With standard synthesis and place & route tools
• Without the need for any additional design steps
Figure: Footprint Compatible Multi Channel Length Libraries (same footprint, SC12 or SC9)
• (1) Minimum channel length cell
• (2) Long channel length cell, pin compatible to (1)
Benefits of Multi-channel Libraries
• Provides larger leakage/performance spread
• Fine resolution for design trade-offs
Figure: leakage versus performance for LVT, RVT and HVT cells at channel lengths Lmin, Lmin+x and Lmin+x+y
New Market-leading Memory Products
New in 2010: Fundamental memory compiler redesign
Project Goals
• Deliver market leading PPA (Power, Performance and Area)
• Develop modular development methodology to speed time to market
Deploying now at 65nm, 55nm, 40nm and 28nm nodes
Example: Planned SMIC 40LL high density compilers superior area
Compiler instance: best area in sq µm (ARM vs. competitor) and result
• High density single port 8192x16: ARM 49600 vs. Competitor 62600, ARM 21% smaller
• High density single port 2048x16: ARM 14200 vs. Competitor 17100, ARM 17% smaller
• High density two port 32x34: ARM 3000 vs. Competitor 4500, ARM 33% smaller
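The "smaller" percentages in the area comparison can be reproduced from the quoted areas. The sketch below is only a cross-check of that arithmetic; the list layout and function name are illustrative, not part of the presentation.

```python
# (compiler instance, ARM area, competitor area) in sq um, as quoted on the slide.
INSTANCES = [
    ("High density single port 8192x16", 49600, 62600),
    ("High density single port 2048x16", 14200, 17100),
    ("High density two port 32x34", 3000, 4500),
]


def area_reduction_pct(arm_area: float, competitor_area: float) -> float:
    """Percentage by which the ARM instance is smaller than the competitor's."""
    return 100.0 * (competitor_area - arm_area) / competitor_area


if __name__ == "__main__":
    for name, arm, competitor in INSTANCES:
        print(f"{name}: ARM {area_reduction_pct(arm, competitor):.1f}% smaller")
```

Rounded to whole percentages, the results match the 21%, 17% and 33% figures on the slide.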
Physical IP New Feature Timeline
2005 • Integrated memory power management
2006 • DFM-aware layout • Power format support (UPF/CPF)
2007 • Extended memory test features
2008 • First commercially available multi-channel libraries
2009 • Streamlined high density and low Vdd memory compilers • Dual row programmable fail safe GPIOs
2010 • Lithography-aware layout • Processor optimization packs launch
2011-2012 • Variation tolerant: High performance & low power yield improvement technology
Advanced Process Technologies
ARM 32/28nm Platform
• "G" Platform and "LP" Platform
• Processes shown: 28HP, 28HPL, 28HPP, 32LP, 28LP, 28SLP
• 28nm Platform: Logic, Memory, GPIO, DDR
Common Platform 28nm LP Platform
• Rich foundry-sponsored foundation IP
Comprehensive 28LP Physical IP platform
• End-user licensable enhanced IP
• Multi-channel logic products for 9 and 12 track
• Market-leading memory compilers:
• Write Assist, Mixed Vt, flexible power gating
• Low voltage operation
Platform components
• Logic: 2 Libraries, Multi-Vt/Channel; Power Mgmt, ECO
• Memory: 10 Memory Compilers
• IO: GPIO
• Processor Optimization: Cortex-A9 Optimizations
Processor Optimization Pack for Cortex-A9
• Targets >1 GHz Cortex-A9 (worst case)
Product Availability and Silicon Validation
• IP available in Q4 2010
• Test chip tape out Q4 2010
Requires Major Commit to Si Validation
Validated ARM Physical IP and integrated SoC design flow
Broad range of experimentation needed for proven success:
• Early chips to decide architecture choices
• Silicon validation to ensure IP quality
• Prototype implementation for proven power / performance / area achievement
Test chips
• Explorer, IBM 32nm, Jul 2008
• Cortex-M3, IBM 32nm, Oct 2008
• Cortex-M3, IBM 32nm, Feb 2009
• Cortex-M3, GF 32nm, May 2009
• 32LP TC1a, IBM 32nm, Jun 2009
• 32LP TC1b, SEC 32nm, Jul 2009
• Cortex-M3, IBM 28nm, Aug 2009
• ARM Core Proto, SEC 32nm, Feb 2010
• 32LP TC2, SEC 32nm, Feb 2010
• TC2-shrink, IBM 28nm, Jun 2010
ARM TSMC 28nm HP Platform
Comprehensive 28nm Physical IP platform
• End-user licensable enhanced IP
• Multi-channel logic products for 9 and 12 track
• New architecture memory compilers:
• Mixed Vt, mixed channel length, flexible power gating
Platform components
• Logic: 2 Libraries, Multi-Vt/Channel; Power Mgmt, ECO
• Memory: 6 Memory Compilers
• IO: DDR2/3
• Processor Optimization: Cortex-A9 Optimizations
Processor Optimization Pack for Cortex-A9
• > 2GHz target for Cortex-A9 in development
Product Availability
• Evaluation libraries available now
• Alpha libraries in Q4 2010
• EAC production release in Q2 2011
• Testchip tape out in Q1 and Q2 2011
ARM Platform for TSMC 40LP Process
Comprehensive Physical IP Platform
• Multi-channel logic products for 9 and 12 track
• New architecture memory compilers:
• New low leakage mode
• Embedded scan chains for DFT
• Long channel usage reduces mask cost
Platform components
• Logic: 9 and 12 track standard cells, Multi-Vt and Multi-channel; Power Mgmt, ECO
• Memory: 5 Memory Compilers
• Interface: DDR 3/2 PHY; GPIO 2V5 Gox (In Dev)
• Processor Optimization: A9 Optimized IP; A5 Optimized IP
Performance Optimization Package for Cortex-A9 and Cortex-A5
• Targeting 1GHz Cortex-A9, available Jan 2011
• 600+ MHz target for Cortex-A5 in development
Product Availability and Silicon Validation
• IP available now (Production PDK)
• Test chip tape out Sept 2009
• Test chip report Jan 2011
Processor Optimization Packs
ARM Artisan
Processor Optimization Pack "POP"
• Optimize for performance or power
• Comprehensive implementation solution
• Silicon proven results
• Fast time to market
Achieve up to 25% increase in performance or more than 80% reduction in leakage power
POP contents for Cortex-A9
• Optimized Physical IP
• ARM Implementation Knowledge
• ARM Benchmarking
Where Do the Gains Come From?
Example: TSMC 40nm G improvements vs. baseline
Percentage of overall performance improvement by technique
• RTL: 11%
• Physical IP: 45%
• Flow: 19%
• Floorplan: 25%
Physical IP Optimizations: L1 Memories
Physical IP: 45% of overall performance improvement
• Architectural range optimized for L1 cache sizes & performance
• Low profile feature set
• IO tuned to processor requirements
• IO scan chain (increased testability / reduced pattern count)
• Power efficiency options
• GL bitcell option
+ Instance based characterisation for variety of operating conditions
Figure: memory instance pins D, Q, A, CEN, WEN, CLK, SI/SO/SE
Physical IP Optimizations: Logic
Physical IP: 45% of overall performance improvement
High Performance Macro: targeting 2GHz (LVt, OD)
40nm base libraries, including complex cells / varying drive strengths (over 1000 cells)
Supplemental "High Performance Kit"
Base 12-track library strain optimized for performance
+ Tapered cells added to library
+ Increase drive strength / beta ratio of cells for better stage gain
+ Optimized FFs (CLK to Q, small setup)
+ Additional AOI / OAI functions
Result: 60%+ hit rate on critical paths, resulting in additional 80MHz frequency
Processor Optimization Packs on 40G / 40LP
ARM ANNOUNCES PROCESSOR OPTIMIZATION PACK PRODUCT FAMILY FOR ARM CORTEX-A9
ARM delivers optimized Artisan physical IP enabling SoC designers to achieve up to 1.7 GHz performance on TSMC 40nm G process in worst case conditions
Cambridge, UK – November 9, 2010 – ARM today announced the immediate availability of the ARM® Cortex™-A9 Processor Optimization Packs ("POPs"). Processor Optimization Packs leverage ARM Artisan® physical IP to enable customers to achieve technology leading performance or power targets on their Cortex-A9 implementations in the shortest time to market. A silicon-proven POP is available now on TSMC 40nm G process technology. The Cortex-A9 POP on TSMC 40nm LP process technology will be available to customers in January 2011.
Cortex-A9 TSMC 40G – Speed Optimized
Figure: performance-optimized implementation steps, from Foundation IP to silicon
• Silicon Devices: 2GHz (typical silicon)
• Hard Macro Implementation (ARM Developed Hard Macro): 1.71GHz, OD, <3% LVt
• Implementation using Cortex-A9 Performance Optimization Pack (Enhanced IP: high performance kit, multi-channel libraries, fast cache instances, high speed standard cell architectures): 1.29GHz, RVt
• Implementation with Enhanced IP (multi-channel libraries, Multi-Vt, high speed memories): 950MHz, RVt
• Implementation with Foundation IP (standard physical IP): 850MHz (out of the box), RVt
Legend: Foundry Funded IP (Foundation IP), Partner Licensed IP, Partner Licensed Macro
- Worst case conditions; SS, 0.81V, 125C (Nominal); SS, 1.0V, 105C (Overdrive)
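To put the frequency steps in relative terms, the sketch below expresses each worst-case figure from the slide as a percentage uplift over the 850 MHz out-of-the-box result. The dictionary keys only echo the slide annotations, and the helper name is hypothetical; note that the 1.71 GHz figure is quoted at overdrive conditions, so it is not a like-for-like comparison with the nominal-voltage results.

```python
# Worst-case frequencies quoted on the slide, in MHz.
BASELINE_MHZ = 850  # "out of the box" result
STEPS_MHZ = {
    "950MHz, RVt": 950,
    "1.29GHz, RVt": 1290,
    "1.71GHz, OD, <3% LVt": 1710,
}


def uplift_pct(freq_mhz: float, baseline_mhz: float = BASELINE_MHZ) -> float:
    """Frequency gain over the baseline, as a percentage of the baseline."""
    return 100.0 * (freq_mhz - baseline_mhz) / baseline_mhz


if __name__ == "__main__":
    for label, freq in STEPS_MHZ.items():
        print(f"{label}: +{uplift_pct(freq):.1f}% over {BASELINE_MHZ} MHz")
```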
Cortex-A9 TSMC 40G – Power Optimized
Figure: leakage power improvement steps, from Foundation IP to silicon
• Silicon Devices: 800 MHz speed, 5mW power shut-off using PMK
• Hard Macro Implementation (ARM Developed Hard Macro): 0.66W, 50nm MC, RVT/HVT
Other leakage results shown, lowest first
• 1.15W, 50nm MC, RVT/HVT
• 1.25W, 50nm MC, RVT
• 3.83W, 40nm MC, RVT
• 3.96W, RVT
Implementation steps shown
• Implementation using Cortex-A9 Performance Optimization Pack (Enhanced IP: high performance kit, multi-channel libraries, fast cache instances, high speed standard cell architectures)
• Implementation with Enhanced IP (multi-channel libraries, Multi-Vt, high speed memories)
• Implementation with standard physical IP
• Implementation with Foundation IP
Legend: Foundry Funded IP (Foundation IP), Partner Licensed IP, Partner Licensed Macro
- All worst case conditions; SS, 0.81V, 125C
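The leakage figures on this slide can be compared against the 3.96W result to see how they relate to the "more than 80% reduction in leakage power" claim made for POPs earlier in the deck. The sketch below assumes that 3.96W is the baseline for that comparison, which the slide does not state explicitly; the names are illustrative.

```python
# Leakage power figures quoted on the slide, in watts.
BASELINE_W = 3.96  # assumed baseline: the highest (RVT) figure on the slide
RESULTS_W = {
    "3.83W, 40nm MC, RVT": 3.83,
    "1.25W, 50nm MC, RVT": 1.25,
    "1.15W, 50nm MC, RVT/HVT": 1.15,
    "0.66W, 50nm MC, RVT/HVT (hard macro)": 0.66,
}


def leakage_reduction_pct(leakage_w: float, baseline_w: float = BASELINE_W) -> float:
    """Leakage reduction relative to the baseline, as a percentage."""
    return 100.0 * (baseline_w - leakage_w) / baseline_w


if __name__ == "__main__":
    for label, watts in RESULTS_W.items():
        print(f"{label}: {leakage_reduction_pct(watts):.1f}% lower than {BASELINE_W} W")
```

With this baseline, only the 0.66W hard macro result exceeds an 80% reduction; the 1.15W and 1.25W results come out at roughly 70%.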
Cortex-A9 TSMC 40LP – Speed Optimized
Figure: performance-optimized implementation steps, from Foundation IP to silicon
• Silicon Devices: 1.15 GHz (typical silicon)
Worst-case results shown, highest first
• 1.0 GHz, LVt & OD
• 0.85 GHz, Mixed Vt
• 0.78 GHz, RVT
• 0.6 GHz
• 0.5 GHz (out of the box)
Implementation steps shown
• Implementation using Cortex-A9 Performance Optimization Pack (Enhanced IP: high performance kit, multi-channel libraries, fast cache instances, high speed standard cell architectures)
• Implementation with Enhanced IP (multi-channel libraries, Multi-Vt, high speed memories)
• Implementation with standard physical IP
• Implementation with Foundation IP
Legend: Foundry Funded IP (Foundation IP), Partner Licensed IP
- All worst case conditions; SS, 1.08/0.99V, 125C
Processor Optimization Pack for 32nm
ARM ANNOUNCES 32NM CORTEX-A9 OPTIMIZATIONS
World's first 32nm Cortex™-A9 core test chip implemented on Samsung HKMG process using ARM Processor Optimization Pack
Santa Clara, CA – November 9, 2010 – ARM today announced their newest optimization package for the ARM® Cortex™-A9 processor, targeting Samsung 32nm LP High-K Metal Gate (HKMG) process technology. This ARM Processor Optimization Pack (POP) provides a highly tuned foundation for implementing Cortex-A9 processors in low power, mobile applications. Based on ARM Artisan® optimized logic and memory physical IP, the POP is also supported by implementation knowledge and ARM benchmarking, providing a rich foundation for leading edge System-on-Chip (SoC) designs. The Processor Optimization Pack enables operation over 1 GHz, and is available for immediate licensing from ARM.
Cortex-A9 Samsung 32LP – Speed Optimized
Figure: performance-optimized implementation steps, from standard physical IP to silicon
• Silicon Devices: 1.47GHz (typical)
Worst-case results shown, highest first
• 1.24GHz, 10% LVt, OD
• 1.08GHz, 10% LVt
• 980MHz, RVt
• 795MHz (out of the box)
Implementation steps shown
• Implementation using Cortex-A9 Performance Optimization Pack
• Implementation with Enhanced IP: SC12MC LVT C30 Base Library, SC12MC LVT C30 HPK, SC12MC RVT C30 HPK, L1 Fast Cache Instances RVT
• Implementation with Enhanced IP: SC12MC RVT C30 HPK, L1 Fast Cache Instances RVT
• Implementation with Foundation IP: SC12MC RVT C30 Base Library, L1 instance from memory compiler SP-RF-HS
• Implementation with Standard Physical IP
Legend: Foundry Funded IP, Partner Licensed IP
- All worst case conditions; SS, 1.045/0.95V, -40C
32nm Cortex-A9 Silicon Results
• Speed-optimized ARM Cortex-A9 achieves 1.24 GHz on low power process
• Silicon proof-point of Common Platform 32nm low power (LP) High-K Metal Gate (HKMG) process technology
Figure: maximum frequency (GHz, WC and TYP) and dynamic power (mW) versus voltage
• Voltage 0.76V: frequency 654 MHz
• Voltage 0.9V: frequency 1080 MHz
• Voltage 1.045V: frequency 1240 MHz
Breadth of Portfolio
ARM Artisan Foundation IP
ARM Offers Broadest Foundry Solution
Extensive Process Optimized IP Portfolio
designstart.arm.com
Physical IP for Embedded Processors
Ultra Low Power
• Leakage-optimized
• Ultra high density standard cells
• Ultra low power memories
• Low power I/Os
Mainstream
• Area-optimized
• Ultra high density standard cells
• High density memories
High Performance
• Performance optimized
• High density standard cells
• High speed memories
Processes shown: TSMC CV011LP, TSMC CM011LP, TSMC CL011LVP, TSMC CE018FG, GF 180ULL, Tower 110G, TSMC CM011G_Hybrid, Grace 180G, TSMC 180G, GF 110G, SMIC Logic011, TSMC CL011G
designstart.arm.com
Thank you!