CAD Challenges For Designing A High Frequency Multi-Core SoC Implementation Of The First-Generation CELL Processor
Neeraj Paliwal, Senior Engineering Manager
Advanced Processor Development
IBM Corporation, Austin TX
Outline
Introduction
Design Goals
Design Challenges
CAD Methodology
CAD Methodology Details
Lessons Learned
Recommendation
Conclusion
Digital Media Applications
Design Goals
Design for natural human interaction
– Realism requires supercomputer attributes with extreme floating-point capabilities: 2 TFLOPS in the new PlayStation 3 system
Set a new performance standard
– Exploit parallelism while achieving high frequency: multiple high-frequency (HF) cores
Foster innovation in design and methodology
– Holistic design approach
– Scalability and flexibility through modular design
Design Challenges
Triple constraints
– Power
– Frequency
– Cost
Design trends
– SoC and giga-scale integration
– Multi-core on a chip
Time to market
System Trends Toward Integration
Increased integration is driving processors to take on many functions typically associated with systems
– Integration forces processor developers to address off-load and acceleration in the design of the processor
– Integration of bridge chip functionality
[Figure: a conventional system (processor, northbridge, southbridge, accelerator, memory, I/O) compared with a Cell processor attached to memory and I/O]
Giga Scale Integration
[Figure: functions traditionally built as separate chips, from hardwired-function ASICs to programmable processors (CPU, media, security, network, streaming and graphics processors, NIC, GPU), integrated on Cell as a 64b Power processor, synergistic processors, memory controller, I/O, security and configuration logic]
Need an innovative Design Methodology for High Frequency Multi-Core SoC
Implementation Challenges
Technology scaling
– Minimize cross-chip variations in delay and leakage
– Array bit cell stability, writability, yield
– Growing impact of wire RC vs. device speed
11 FO4 design (a cycle of about 11 fan-out-of-4 inverter delays) within an air-cooled power envelope
– Power, clock and signal distribution variation due to hot spots, inductance effects, etc.
– Multiple clock domains
– Intra-chip interconnections
– Global optimization with "triple constraints": frequency, power, cost (die size and yield)
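The slides do not spell out what the 11 FO4 target means in absolute terms. As a rough worked check, assuming the 11 FO4 figure covers the whole cycle (logic plus latch overhead) and taking the 3.2 GHz operating point quoted in the Conclusions, the implied delay per FO4 stage is roughly 28 ps. This is arithmetic on the slide's numbers, not a published device figure.

```python
def implied_fo4_ps(freq_ghz: float, fo4_per_cycle: float) -> float:
    """Delay of one FO4 stage implied by a clock frequency and an FO4-per-cycle budget.

    Assumes the budget covers the entire cycle, including latch overhead.
    """
    cycle_ps = 1000.0 / freq_ghz        # period 1/f, expressed in picoseconds
    return cycle_ps / fo4_per_cycle


# 3.2 GHz gives a 312.5 ps cycle; divided by 11 stages of FO4 delay
print(f"{implied_fo4_ps(3.2, 11):.1f} ps per FO4")
```

Read the other way, fixing the FO4 delay of the process shows how few gate delays each pipeline stage can afford, which is why wire RC and distribution variation weigh so heavily at this design point.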
Holistic Design Approach
Design
– Cover all aspects of the design: circuits, cores, chips, system, software
Development process
– Fast convergence: top down / bottom up; early design planning / final convergence
– Adaptability and scalability: long-duration projects need to allow for refinement of ideas
Organizational structure
– Build the best processor development team, spanning the globe
– Enable learning and adapt to changes in the market
Design Methodology Philosophy
Microarchitecture definition must go hand in hand with physical floorplan definition: wire delays are a major component of performance
"Divide and Conquer"
– Chip hierarchy: macros, units, islands, partitions and chip
– The macro is the lowest-level floorplannable object
– Physical partitioning is represented in RTL
– Each level of hierarchy is verified independently (DRC, LVS, equivalence checking)
Formal equivalence checking is required between RTL and schematic
– Latch points must match: no retiming
– Performed hierarchically up to the chip level
VHDL drives physical design
Derived data is audited
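Because latch points must match and retiming is not allowed, equivalence between RTL and schematic reduces to comparing the combinational logic between corresponding latches, and the first prerequisite is a one-to-one pairing of latches across the two views. The sketch covers only that pairing step, under an assumed name-mapping scheme (`name_map` and the instance names are hypothetical); it is not the checker used on Cell and does not compare the logic cones themselves.

```python
from typing import Dict, List, Optional, Set


def latch_mismatches(rtl_latches: Set[str], schematic_latches: Set[str],
                     name_map: Optional[Dict[str, str]] = None) -> List[str]:
    """List problems that prevent a one-to-one latch pairing between two views.

    name_map (hypothetical) translates schematic instance names to RTL names;
    unmapped names are assumed to be identical in both views.
    """
    name_map = name_map or {}
    paired: Dict[str, str] = {}        # RTL name -> schematic name
    errors: List[str] = []
    for sch in sorted(schematic_latches):
        rtl = name_map.get(sch, sch)
        if rtl in paired:              # two schematic latches claim one RTL latch
            errors.append(f"{sch} and {paired[rtl]} both map to {rtl}")
        paired[rtl] = sch
    errors += [f"missing in schematic: {n}" for n in sorted(rtl_latches - set(paired))]
    errors += [f"extra in schematic: {n}" for n in sorted(set(paired) - rtl_latches)]
    return errors


rtl = {"fxu.q_reg", "fxu.valid_reg"}
sch = {"L_q", "L_valid"}
print(latch_mismatches(rtl, sch, {"L_q": "fxu.q_reg", "L_valid": "fxu.valid_reg"}))  # []
```

An empty list means every state element has exactly one partner, so each latch-bounded logic cone can then be compared combinationally and the result rolled up hierarchically.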
Schematic Illustration of Design Hierarchy
[Figure: STI Development Process. Stages: high-level design, logic design, circuit/physical design & integration, verification, hardware validation, software development, global processes. Artifacts: customer reqs., business plan, design specs, RTL design, mfg. data, workloads, S/W dev. kit. Outputs: to manufacturing; sample hardware to customers.]
[Figure: STI Chip Design Flow. Design entry: chip/unit, custom, array and RLM VHDL. Tools and data shown: Portals, DADB, MESA, AWAN, Sim env (Fusion, Specman), testcases, Genesys, ProXGEN, BooleDozer, TestPat, TPG, Gatemaker, ChipBench or Cadence floorplan, PDSrtl placement, Cadence Route, EinsTimer, EinsTLT, 3DX, PDM, Cadence Composer, Cadence/GYM layout editor, PowerSpice, Ultrasim, TexPower, Device VIM, Phys VIM, Verity, ESPCV, ERIE LVS, Niagara DRC/LVS, SVV, Echk, CPAM, LAVA, global and macro noise, DCM rules, DCM timing rule, noise rules, power rule, design audit, merged layout, TECH.]
Design Data Management
Seven sites and 450+ designers
– Need a way to verify that every check has been run on every piece of data going onto the chip: this process is called Audit
– Over the course of chip development, snapshots of the chip data are needed so that different design teams can work with data of a known quality. A level can be created to identify that data: this process is called Promote
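The Audit and Promote ideas can be sketched as a bookkeeping problem: track which checks passed on which version of each piece of data, and only freeze a named level when nothing is missing. A minimal sketch, assuming checks are recorded per data version; the check names, `DesignData` fields and macro names are hypothetical, since the slide does not describe the actual STI data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical check names: the slide does not enumerate the real audit checks.
REQUIRED_CHECKS = frozenset({"DRC", "LVS", "equivalence", "timing", "noise"})


@dataclass
class DesignData:
    name: str                                       # e.g. a macro or unit
    version: int                                    # version of the data on disk
    passed: Set[str] = field(default_factory=set)   # checks passed on this version


def audit(items: List[DesignData],
          required: frozenset = REQUIRED_CHECKS) -> Dict[str, Set[str]]:
    """Map each item to the required checks it has not passed (clean audit -> {})."""
    return {d.name: set(required - d.passed) for d in items
            if not required <= d.passed}


def promote(items: List[DesignData], level: str,
            levels: Dict[str, Dict[str, int]]) -> None:
    """Freeze a named snapshot (name -> version) only if the audit is clean."""
    missing = audit(items)
    if missing:
        raise ValueError(f"cannot promote to {level!r}: {missing}")
    levels[level] = {d.name: d.version for d in items}


data = [DesignData("fxu_macro", 7, set(REQUIRED_CHECKS)),
        DesignData("lsu_macro", 3, {"DRC", "LVS"})]
print(audit(data))   # lsu_macro still lacks equivalence, noise and timing
```

Tying results to a version means a re-edited macro automatically drops out of a clean audit until its checks are rerun, which is the property a multi-site team needs before promoting a snapshot.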
Circuit Design Philosophy
Strict design guidelines to minimize design variations
– Layout topology checks and DFM rules for yield
– Circuit topology and electrical checks
– Global active clock pulse limiter for dynamic circuits
– Hold-time margin scales with clock path delay
Reduce design sensitivity to technology leakage
– Limited dynamic logic circuit usage
– No low-Vt devices
Array yield focus
– Array redundancy for bit cell stability fails
– Reduced cell stress during read
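The hold-margin rule can be illustrated as an early-mode check in which the margin grows with the clock path delay, on the reasoning that a longer clock path accumulates more variation-induced skew. A minimal sketch: the proportional form and the factor 0.05 are hypothetical, since the slides only state that the margin scales with clock path delay and is built into the latch timing rule.

```python
from typing import Iterable, List, Tuple

# Hypothetical scaling factor: margin in ps per ps of clock path delay.
HOLD_MARGIN_PER_PS_OF_CLOCK = 0.05


def hold_slack_ps(data_min_arrival_ps: float, hold_time_ps: float,
                  clock_path_delay_ps: float,
                  k: float = HOLD_MARGIN_PER_PS_OF_CLOCK) -> float:
    """Early-mode slack with a margin proportional to the clock path delay.

    Clock arrival times are not used directly; instead a longer clock path
    demands a larger built-in margin against skew.
    """
    required = hold_time_ps + k * clock_path_delay_ps
    return data_min_arrival_ps - required


def hold_violations(paths: Iterable[Tuple[str, float, float, float]]) -> List[str]:
    """paths: (name, data_min_arrival_ps, hold_time_ps, clock_path_delay_ps)."""
    return [name for name, d, h, c in paths if hold_slack_ps(d, h, c) < 0]


print(hold_violations([("short_path", 12.0, 10.0, 100.0),
                       ("ok_path", 40.0, 10.0, 100.0)]))
```

Folding the margin into the rule is what lets chip-level static timing ignore clock arrival times while still guarding against races on short paths.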
Clock Philosophy
Clock distribution using a grid-tree approach
– Minimal global clock skew: HOLD margin built into the latch timing rule
– Clock arrival times are not included in chip static timing, which eliminates the dependency on clock distribution analysis
– Clock distribution area is pre-allocated and tuned concurrently with unit integration
[Figure: main clock mesh]
Timing Practices: "Fast Convergence"
Macro partitioning encouraged to be on timing/latch boundaries
Unit/partition/chip-level static timing done early and often, with progressively improving accuracy
– Shell rules -> schematic-based rules -> layout-extracted rules
– Steiner routes -> add wire codes -> 3D extraction -> noise uplift
All latches treated as hard timing boundaries, no transparency
Transistor-level static timing required for all macros
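Treating every latch as a hard boundary means each latch-to-latch stage must fit in one cycle on its own, with no time borrowed through a transparent latch. A minimal late-mode sketch of that check, assuming the logic between latches is a DAG with per-arc delays; node names and delays are hypothetical, and this is generic longest-path static timing, not EinsTimer. It uses `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, float]  # (driver node, load node, delay in ps)


def latch_to_latch_slack(edges: Iterable[Edge], launch: Set[str],
                         capture: Set[str], cycle_ps: float,
                         setup_ps: float) -> Dict[str, float]:
    """Worst setup slack at each capturing latch input.

    Each latch is split into a launch node (its output) and a capture node
    (its input), so the graph stays acyclic. Launch nodes start at t = 0 every
    cycle: latches are hard boundaries, so no time crosses between stages.
    """
    preds: Dict[str, List[Tuple[str, float]]] = {}
    graph: Dict[str, Set[str]] = {}            # node -> its predecessors
    for u, v, delay in edges:
        preds.setdefault(v, []).append((u, delay))
        graph.setdefault(v, set()).add(u)
        graph.setdefault(u, set())

    arrival: Dict[str, float] = {}
    for node in TopologicalSorter(graph).static_order():
        if node in launch:
            arrival[node] = 0.0
        else:
            # latest arrival over all fan-in arcs (late-mode analysis)
            arrival[node] = max((arrival[u] + d for u, d in preds.get(node, [])),
                                default=float("-inf"))
    return {c: cycle_ps - setup_ps - arrival[c] for c in capture}


edges = [("L1.Q", "g1", 20.0), ("g1", "g2", 30.0),
         ("L2.Q", "g2", 10.0), ("g2", "L3.D", 15.0)]
print(latch_to_latch_slack(edges, {"L1.Q", "L2.Q"}, {"L3.D"}, 100.0, 5.0))
# {'L3.D': 30.0}
```

Because every stage is self-contained, the same computation can be rerun as the delay model sharpens (shell rules, then schematic rules, then extracted layout) without revisiting neighbouring stages.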
Hierarchical Timing Example
Timing at 4 levels of hierarchy:
– Unit (e.g., sfx)
– Island (e.g., SPU core)
– Partition (e.g., spc)
– Chip
The hierarchical approach breaks the larger problem down into manageable pieces (units)
The chip timing run times all paths across all hierarchies
Internal macro timing is closed via EinsTLT, but ALL paths are visible in the chip run
[Figure: nested timing hierarchy: chip, partition, island, units A and B, each unit containing macros]
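One way to picture how the hierarchy divides the timing work: each latch-to-latch path belongs to the deepest block that contains both of its endpoints, so intra-macro paths can close inside the macro while paths between units first appear at island level or above (and all of them are still seen in the flat chip run). The sketch assumes a hypothetical slash-separated naming scheme chip/partition/island/unit/macro/latch; all instance names are illustrative.

```python
from typing import List, Tuple

# Assumed (hypothetical) naming scheme: chip/partition/island/unit/macro/latch
LEVELS: Tuple[str, ...] = ("chip", "partition", "island", "unit", "macro")


def owning_level(start: str, end: str) -> Tuple[str, str]:
    """Return (level, instance) of the deepest block containing both path endpoints."""
    a, b = start.split("/")[:-1], end.split("/")[:-1]   # drop the latch name
    common: List[str] = []
    for x, y in zip(a, b):
        if x != y:
            break
        common.append(x)
    if not common:
        raise ValueError("endpoints are not on the same chip")
    return LEVELS[len(common) - 1], "/".join(common)


print(owning_level("cell/spc/spu/sfx/m1/L3", "cell/spc/spu/sfx/m2/L7"))
# ('unit', 'cell/spc/spu/sfx')
```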
Noise Analysis Example
Macro analysis: noise analysis with focus on transistors and wires
Unit/chip analysis: global analysis with focus on the behavior of wires
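For the wire-focused global analysis, a common first-order screen is the capacitive charge-sharing estimate: an aggressor swinging the full supply couples a glitch of about Vdd · Cc / (Cc + Cg) onto a floating victim. A minimal sketch of that screen; the 30% threshold and net values are hypothetical, and this is a pessimistic bound rather than the analysis the STI tools perform (a driven victim sees less noise, and real analysis also weighs slew and timing windows).

```python
from typing import Iterable, List, Tuple


def coupled_noise_v(vdd: float, c_coupling_ff: float, c_ground_ff: float) -> float:
    """Charge-sharing bound on the glitch at a floating (undriven) victim net.

    Assumes the aggressor makes a full-rail transition.
    """
    return vdd * c_coupling_ff / (c_coupling_ff + c_ground_ff)


def noisy_nets(nets: Iterable[Tuple[str, float, float]], vdd: float,
               limit_fraction: float = 0.3) -> List[str]:
    """nets: (name, coupling cap fF, ground cap fF); flag nets above the limit."""
    return [name for name, cc, cg in nets
            if coupled_noise_v(vdd, cc, cg) > limit_fraction * vdd]


print(noisy_nets([("bus0", 20.0, 80.0), ("bus1", 40.0, 60.0)], vdd=1.0))
```

Nets flagged by a crude screen like this are the candidates for the more detailed treatment, such as noise-aware rerouting or spacing.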
Power Management Practices
Dynamic power is controlled by fine-grain clock gating
Leakage power is managed by adding lower-Vt devices only where necessary
Accurate power estimation
– Macro level uses circuit simulation to generate a power rule (0-50% input switching)
– Partition/chip level uses behavioral simulation with specific workloads, combined with the macro-level power rules
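The two-level estimate can be sketched as follows: each macro's power rule is a small table of power versus input switching factor (characterized over 0-50%), and chip power sums each rule evaluated at the switching factor a workload simulation produces for that macro. A minimal sketch, assuming piecewise-linear interpolation between characterized points; the rule values are hypothetical and clock gating and leakage are not modelled separately here.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# (input switching factor, power in mW) pairs, sorted by switching factor,
# e.g. characterized by circuit simulation at 0%, 25% and 50% switching.
PowerRule = List[Tuple[float, float]]


def eval_rule(rule: PowerRule, activity: float) -> float:
    """Piecewise-linear interpolation of a macro power rule, clamped to its range."""
    xs = [x for x, _ in rule]
    a = min(max(activity, xs[0]), xs[-1])
    i = bisect_right(xs, a)
    if i == len(xs):               # exactly at (or clamped to) the top point
        return rule[-1][1]
    (x0, p0), (x1, p1) = rule[i - 1], rule[i]
    return p0 + (p1 - p0) * (a - x0) / (x1 - x0)


def chip_power_mw(rules: Dict[str, PowerRule], activity: Dict[str, float]) -> float:
    """Sum of macro power rules evaluated at workload-derived switching factors."""
    return sum(eval_rule(rules[m], activity[m]) for m in rules)


rule = [(0.0, 2.0), (0.25, 5.0), (0.5, 8.0)]   # hypothetical characterization
print(chip_power_mw({"m1": rule, "m2": rule}, {"m1": 0.25, "m2": 0.5}))  # 13.0
```

Splitting the work this way keeps expensive circuit simulation at the macro level, while the chip estimate only needs cheap behavioral activity counts per workload.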
Integration Flow
VHDL to finished layout
Common code and methodology infrastructure with RLM
Additional steps unique to unit construction
– Generate power buses
– Buffer planning/insertion
– Generate hierarchy design constraints
– Decap insertion
– Unit clock router, minimizing power
– Routing with noise awareness, wire bending
– Generate power and redundant vias
– Verification and analysis: extraction, timing, IREM, noise, methodology check, density check, yield rule check, DRC/LVS, Verity
Saved parameters for each design make rebuilds simple
– Existing designs are used as templates for new designs
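Saving build parameters per design is what turns a rebuild, or a new unit started from an existing one, into a rerun of the flow. A minimal sketch, assuming the parameters are kept as JSON; the `UnitBuildParams` fields are hypothetical examples loosely drawn from the construction steps listed above, not the actual STI data format.

```python
import json
from dataclasses import asdict, dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class UnitBuildParams:
    """Hypothetical per-unit parameters captured from a successful build."""
    unit: str
    vhdl_top: str
    power_grid_pitch_um: float
    decap_fill_pct: float
    noise_aware_routing: bool = True
    redundant_vias: bool = True


def save_params(p: UnitBuildParams, path: Path) -> None:
    path.write_text(json.dumps(asdict(p), indent=2))


def load_params(path: Path) -> UnitBuildParams:
    return UnitBuildParams(**json.loads(path.read_text()))


def from_template(template: UnitBuildParams, **overrides) -> UnitBuildParams:
    """Start a new unit from an existing design's settings, overriding only what differs."""
    return replace(template, **overrides)


template = UnitBuildParams("unit_a", "unit_a_top.vhdl", 20.0, 10.0)
new_unit = from_template(template, unit="unit_b", vhdl_top="unit_b_top.vhdl")
print(new_unit)
```

Keeping the record immutable and deriving new units through explicit overrides makes it easy to see exactly how a new design departs from its template.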
Hot Spot Analysis
Extensive thermal analysis early in the design cycle
Power maps created for use with package and heat sink models
Steady-state and transient thermal behavior simulated
Analysis fed back to the chip floorplan and thermal sensor design
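How a power map turns into a temperature map can be sketched with a crude steady-state model: the die is divided into tiles, each tile conducts laterally to its neighbours and vertically to ambient through a lumped package and heat-sink conductance, and the resulting linear system is solved by Jacobi relaxation. A minimal sketch under those assumptions (uniform conductances, adiabatic die edges, hypothetical values); transient behavior and a real package model are not captured.

```python
from typing import List


def steady_state_temps(power_w: List[List[float]], g_lateral: float,
                       g_vertical: float, t_ambient: float,
                       max_iters: int = 10000, tol: float = 1e-9) -> List[List[float]]:
    """Jacobi relaxation of a 2D thermal conductance grid (conductances in W/K).

    Each tile balances its own power against heat flowing to its 4 neighbours
    (g_lateral) and to ambient through package/heat sink (g_vertical).
    """
    rows, cols = len(power_w), len(power_w[0])
    t = [[t_ambient] * cols for _ in range(rows)]
    for _ in range(max_iters):
        new = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                nbrs = [t[rr][cc] for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < rows and 0 <= cc < cols]   # edges are adiabatic
                num = power_w[r][c] + g_vertical * t_ambient + g_lateral * sum(nbrs)
                new[r][c] = num / (g_vertical + g_lateral * len(nbrs))
        delta = max(abs(new[r][c] - t[r][c]) for r in range(rows) for c in range(cols))
        t = new
        if delta < tol:
            break
    return t


power_map = [[0.5] * 5 for _ in range(5)]
power_map[1][3] = 4.0            # hypothetical hot spot
temps = steady_state_temps(power_map, g_lateral=1.0, g_vertical=0.5, t_ambient=40.0)
peak = max((t, (r, c)) for r, row in enumerate(temps) for c, t in enumerate(row))
print(f"peak {peak[0]:.1f} C at tile {peak[1]}")
```

Even this coarse model shows why early analysis matters: the peak tile location is what drives thermal sensor placement and any floorplan changes.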
Hierarchical Verification
Top-down specification / bottom-up implementation
Test generation: provide simulation with good stimulus
Model build, simulation, and analysis
Formal verification
Test / Pervasive Design Practices
Distributed test functions
– LBIST engine for cores
– ABIST engine for arrays
Distributed debug features
– Common debug bus
– Centralized trace array
Centralized test and pervasive control
– Common strategy for logic debug and performance monitoring
– Monitor some activity externally
Early focus on design bring-up
– At-speed test (internal chip scan, ABIST, programmable LBIST)
– On-chip logic analyzer for debug
– On-chip performance monitor
– Isolate, start, stop, step controls for lab debug
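An LBIST engine is conventionally built from a pseudo-random pattern generator (an LFSR) that feeds scan chains and a multiple-input signature register (MISR) that compresses the responses into one signature compared against a simulated golden value. The slides do not describe Cell's LBIST internals, so this is a textbook sketch: a 16-bit maximal-length Fibonacci LFSR (taps 16/14/13/11) and a MISR built on the same feedback, with a hypothetical `toy_cut` standing in for the logic under test.

```python
from typing import Iterable, Iterator


def lfsr16_step(state: int) -> int:
    """One shift of a 16-bit Fibonacci LFSR, polynomial x^16 + x^14 + x^13 + x^11 + 1."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def prpg(seed: int, count: int) -> Iterator[int]:
    """Pseudo-random pattern generator: 'count' 16-bit patterns from a nonzero seed."""
    state = seed & 0xFFFF
    for _ in range(count):
        yield state
        state = lfsr16_step(state)


def misr16(responses: Iterable[int], seed: int = 0) -> int:
    """Multiple-input signature register: fold each 16-bit response into the state."""
    state = seed & 0xFFFF
    for r in responses:
        state = lfsr16_step(state) ^ (r & 0xFFFF)
    return state


def toy_cut(pattern: int) -> int:
    """Hypothetical stand-in for the combinational logic under test."""
    return (pattern ^ (pattern >> 1)) & 0xFFFF


golden = misr16(toy_cut(p) for p in prpg(0xACE1, 1000))
print(f"golden signature: {golden:#06x}")
```

Because the LFSR update is a linear bijection, a single flipped response bit can never cancel out, so any single-bit error is guaranteed to change the final signature.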
Lessons Learned
Lesson learned: Recommendation
– Data translation time: Open Access DB
– Early PDV planning: black box approach
– Layout automation: migration- and DFM-friendly layouts
– Synthesis to layout loop: physical/DFM-aware synthesis
– Hardware resources: Linux-based CAD flow for better ROI and TAT
– Communication: wiki-based documentation system
– Multiple sites and IT/OS issues: regression suite
Conclusions
The CELL processor, a multi-core design, was successfully implemented using
– Innovative design methodology
– Good design practices
– Rules for modularity and reuse
– Triple constraints for an optimum design point
Correct operation has been observed over a good frequency range (above 3.2 GHz)
Sony/SCEI announced the PS3 system in May 2005
Recommendations are being implemented in the next-generation chips
Acknowledgement
The Authors: Dac Pham (APDAC 2006 Presentation), Han-Werner Anderson, Erwin Behnen, Mark Bolliger, Sanjay Gupta, Peter Hofstee, Paul Harvey, Charles Johns, Jim Kahle, Atsushi Kameyama, John Keaty, Bob Le, Sang Lee, Tuyen Nguyen, John Petrovick, Mydung Pham, Juergen Pille, Stephen Posluszny, Mack Riley, Joseph Verock, James Warnock, Steve Weitzel, Dieter Wendel.
Deep collaboration and many contributions from the entire SONY-Toshiba-IBM team, who worked tirelessly side by side on the design of this processor.
The executive management teams of the three companies, who provided management insight and created the right business conditions for this project.
Thank You