Analyze system performance using IWB Interconnect Workbench Dave Huang [email protected] 1

Post on 08-Mar-2018

Page 1: Analyze system performance using IWB - T&VS Huang (Cade… · Cadence® Interconnect Workbench Pre-integration Cycle-accurate Performance Analysis and Verification System IP Data

Analyze system performance using IWB

Interconnect Workbench Dave Huang

[email protected]

1

Page 2

Information

•  This talk reflects my personal experience.

•  I am speaking on my own behalf.

Page 3

Interconnects Are at the Heart of Modern SoCs

Verification Challenges:

§  Checking system behavior

§  Point to point data integrity

§  Verify system behavior

§  Understanding system scenarios

§  Manage data flow from multiple protocols

§  Concurrent scenarios

§  Cover all system scenarios

§  Validate system performance

[Figure labels: Quad Core Cortex-A15 (two instances)]

Page 4

Analyzing Performance: Influence

[Figure: system diagram built around the CoreLink™ CCI-400 Cache Coherent Interconnect (128 bit @ up to 0.5 Cortex-A15 frequency). Block labels: Quad core Cortex-A15, Quad core Cortex-A7, Mali-T604 Graphics, Coherent I/O device, ADB-400, MMU-400, NIC-400 (Configurable: AXI4/AXI3/AHB/APB and Configurable: AXI4/AXI3/AHB), GIC-400, LCD, DMA, Other Slaves, DMC-400, PHY, DDR3/2, LPDDR2. Link labels: ACE, ACE-Lite, ACE-Lite + DVM, AXI4, 128b, Thin Link.]

Focusing on the performance of one path requires us to consider the other masters that may influence its delay.

Hardware influences: thin links, NIC-400 configuration, QoS, L2 cache speed, DDR controller speed.

Scenario influences: local traffic conflicts, ACE-Lite traffic, processor activity.

Modeling all these HW artifacts in TLM is impractical. Accurate performance analysis must therefore use cycle-accurate RTL models.

Page 5

Cadence® Interconnect Workbench

Pre-integration Cycle-accurate Performance Analysis and Verification

[Figure: flow diagram. Labels: System IP Data; Cadence VIP Library for AMBA®; CoreLink 400 System IP (RTL & IP-XACT); IP-specific Traffic Profiles; Interconnect Workbench Assembly; UVM Testbench; SoC Traffic Testbench; Manual SoC Testbench; Incisive; Performance Measurements; Interconnect Workbench Analysis & Debug; Performance GUI; Performance Analysis; Verification Closure; Tune Architecture.]

For Interconnect IP Integration

•  Performance of use case traffic loads

•  Verify configuration functionality

For SoC Integration

•  Validate performance in context of IPs

Benefits

Ø  Shorten performance tuning and analysis iteration loop from days to hours

Ø  Reduce testbench development time from weeks to hours

Page 6

Cadence® Interconnect Workbench

Automated Testbench Assembly for CoreLink 400 System IP

[Figure: testbench assembly flow. Labels: AMBA® Designer; CoreLink 400 System IP (RTL & IP-XACT); Cadence AMBA VIP Library; Architectural Information; User Configuration; Interconnect Workbench Assembly; UVM Testbench; Testsuite; vPlan; SimVision config; Scripts.]

Page 7

IWB Generate Operation (Verification Testbench)

[Figure: generated testbench. Labels: <prefix>_nic400_mp5x8_(env|tb); tb _nic400_mp5x8; clk; rst; five "S M" interface pairs and eight "M S" interface pairs. Legend: RTL-shell; ACTIVE Agent; PASSIVE Agent; AXI4™ Master/Slave Interface; AXI3 Master/Slave Interface; AHB-Lite Master/Slave Interface; APB Master/Slave Interface.]

Tool output shown on the slide:

######### IMPORTING THE DUT ###########
###### STARTING GENERATION FLOW #######
Starting IWB:
IWB: (c) Copyright 2012 Cadence...
######## IWB CONFIGURATION #########
library path set to project_libraries
library name set to iva_nic400_mp5x8
XML file path set to <...>/nic400_mp5x8.xml
target path set to fabric_target
package prefix set to iva
platform configured to UVM_E SIM
##### BUILDING THE HDL TESTBENCH ######
##### BUILDING THE UVM TESTBENCH ######

Page 8

IWB Generate Operation (Performance Testbench)

[Figure: generated testbench. Labels: <prefix>_nic400_mp5x8_(env|tb); tb _nic400_mp5x8; clk; rst; five "S M" interface pairs and eight "M S" interface pairs. Legend: RTL-shell; ACTIVE Agent; PASSIVE Agent; AXI4™ Master/Slave Interface; AXI3 Master/Slave Interface; AHB-Lite Master/Slave Interface; APB Master/Slave Interface.]

Tool output shown on the slide:

######### IMPORTING THE DUT ###########
###### STARTING GENERATION FLOW #######
Starting IWB:
IWB: (c) Copyright 2012 Cadence...
######## IWB CONFIGURATION #########
library path set to project_libraries
library name set to iva_nic400_mp5x8
XML file path set to <...>/nic400_mp5x8.xml
target path set to fabric_target
package prefix set to iva
platform configured to UVM_E SIM
##### BUILDING THE HDL TESTBENCH ######
##### BUILDING THE UVM TESTBENCH ######
### GENERATING VERIFICATION CONTENT ###
###### GENERATION FLOW COMPLETE #######

Generated outputs labeled on the slide: Verification Content; UVM e/SV Testbench; VIP Configuration; vPlan; (Perf) Test Suite; Performance Generator.

Page 9

Generate Interconnect Testbench

[Figure: testbench generation flow. Labels: CoreLink AMBA Designer; Cascaded Interconnect (NIC-400 and CCI-400, each drawn with master (M) and slave (S) ports); Generate; Interconnect Workbench; Testbench Generation; VIP Meta-data; Library; System Development Suite; Functional Verification Platform Incisive; Verification Computing Platform Palladium XP; HVL; RTL.]

Page 10

Generate Interconnect Testbench

[Figure: testbench generation flow, now showing the Generated Testbench and the IP-XACT input. Labels: CoreLink AMBA Designer; Cascaded Interconnect (NIC-400 and CCI-400, each drawn with master (M) and slave (S) ports); IP-XACT; Generate; Interconnect Workbench; Testbench Generation; VIP Meta-data; Library; Generated Testbench; System Development Suite; Functional Verification Platform Incisive; Verification Computing Platform Palladium XP; HVL; RTL.]

Page 11

Generate Interconnect Testbench

[Figure: testbench generation flow with the contents of the Generated Testbench expanded. Labels inside the Generated Testbench: VIP; Virtual Sequence; Routing Model; ICM; agents marked P and A. Other labels: CoreLink AMBA Designer; Cascaded Interconnect (NIC-400 and CCI-400, each drawn with master (M) and slave (S) ports); IP-XACT; Generate; Interconnect Workbench; Testbench Generation; VIP Meta-data; Library; System Development Suite; Functional Verification Platform Incisive; Verification Computing Platform Palladium XP; HVL; RTL; Performance Metrics; Verification Metrics.]

Page 12

Cadence® Interconnect Workbench

Automated Testbench with Normal IWB Flow

[Figure: testbench assembly flow. Labels: AMBA® Designer; CoreLink 400 System IP (RTL & IP-XACT); Cadence AMBA VIP Library; Architectural Information; User Configuration; Interconnect Workbench Assembly; UVM Testbench; Testsuite; vPlan; SimVision config; Scripts.]

Page 13

Cadence® Interconnect Workbench

Automated Testbench with New IWB Flow

[Figure: testbench assembly flow. Labels: Standard Format; Meta Data file; Cadence AMBA VIP Library; Architectural Information; User Configuration; Interconnect Workbench Assembly; UVM Testbench; Testsuite; vPlan; SimVision config; Scripts.]

Page 14

Why Is a New Flow Needed?

•  Modified AMBA buses are used to reduce power consumption and improve performance.

•  Each customer has specific features in its interconnect structure, such as memory interleaving and (AMBA + NoC).

•  IP-XACT can't handle customized buses and customer-specific interconnect structures.

Page 15

How to Create Real Transactions?

•  Most masters (multimedia IPs) generate periodic transactions in real operation

•  A specific traffic generator is required to create periodic transactions

•  The Traffic Synthesizer can mimic the behavior of a real master

[Figure labels: Traffic Synthesizer; AXI Protocol Abstractor; Read buffer; Write buffer]
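
To make "periodic transactions" concrete, here is a minimal Python sketch of evenly spaced bursts that sustain a target bandwidth. It is not the Traffic Synthesizer's implementation or API; the `Burst` record, the `periodic_bursts` function, and every parameter name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Burst:
    """One periodic transaction (hypothetical record, not an IWB type)."""
    issue_cycle: int
    address: int
    is_write: bool
    num_bytes: int


def periodic_bursts(bytes_per_sec: int, burst_bytes: int, clock_hz: int,
                    base_addr: int, is_write: bool,
                    duration_cycles: int) -> Iterator[Burst]:
    """Yield evenly spaced bursts that sustain bytes_per_sec on average."""
    bursts_per_sec = bytes_per_sec / burst_bytes
    period_cycles = clock_hz / bursts_per_sec  # may be fractional
    n = 0
    while True:
        # Truncate to a whole cycle; the average rate is still preserved.
        cycle = int(n * period_cycles)
        if cycle >= duration_cycles:
            return
        yield Burst(cycle, base_addr + n * burst_bytes, is_write, burst_bytes)
        n += 1


# Example: 64 MB/s of 64-byte read bursts on an assumed 400 MHz clock
# gives one burst every 400 cycles.
for burst in periodic_bursts(64_000_000, 64, 400_000_000,
                             0x8000_0000, False, 2000):
    print(burst.issue_cycle, hex(burst.address))
```

A real master would add jitter, frame boundaries, and buffer back-pressure; the sketch only captures the fixed issue period.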

Page 16

Scenarios for Performance Analysis

•  Why is the user interested in the worst case scenario?

–  To define the hardware specification

•  Why are various scenarios needed?

–  To look for optimized usage modes, considering DVFS and QoS

–  To search for an optimized interconnect structure. Various scenarios help the user find weak points in bandwidth and latency.

Page 17

Worst Case Scenario Example

Minchae

MPEG4 Video (How many f/s?)

Camera is working with Scaling (How many f/s?)

On Screen Characters with Rotating

3D Graphics with Scaling (How many f/s?)

How many windows are overlaid?

Page 18

Which Masters and Slaves Matter for Performance Analysis?

•  Typical wireless ARM based SoC

•  Multimedia Masters

–  GPU : 3D Graphics

–  MFC : MPEG4 Video

–  Display : Overlay Windows

–  CAMIF : Scaling, Rotating, Camera Interface

•  Performance of the “Memory Funnel” is key to system performance

–  Slave : Memory Controller

[Figure: Typical ARM based SoC. Labels: two CPU Clusters (four Cortex-A15 cores with L2 Cache, four Cortex-A7 cores with L2 Cache); Coherent SoC Interconnect; Non-coherent SoC Interconnect; Memory Funnel; Memory Controller; DDR3; GPU; MFC; CAMIF; Display; CSI; DSI; Peripheral Fabric; INTC; Timer; UART; SATA; USB3; USB2; Ethernet; System Boot.]

Page 19

Information for Traffic Synthesizer

Masters with both read and write traffic:

Master    Read fps  Read bytes/sec  Write fps  Write bytes/sec  arsize  arlength  awsize  awlength
Master2   5         10368000        30         62208000         4words  4         4words  4
Master21  10        100700160       10         20736000         4words  4         4words  4
Master22  10        100700160       10         20736000         4words  4         4words  4
Master23  10        100700160       10         20736000         4words  4         4words  4
Master31  30        302100480       10         100700160        4words  4         4words  4
Master34  5         10368000        30         62208000         4words  4         4words  4

Masters with traffic in one direction only (the transcript does not preserve whether these values sit in the read or the write columns):

Master    fps  bytes/sec  size    length
Master5   15   31104000   4words  4
Master6   15   31104000   4words  4
Master7   15   31104000   4words  4
Master8   15   31104000   4words  4
Master17  10   100700160  2words  4
Master18  10   100700160  2words  4
Master19  10   100700160  2words  4
Master24  15   31104000   4words  4
Master25  15   31104000   4words  4
Master28  20   41472000   4words  4
Master29  20   41472000   4words  4
Master30  20   41472000   4words  4
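
The fps and bytes/sec columns can be related with simple arithmetic. The Python sketch below works through the Master2 row; it assumes, without confirmation from the slides, that a word is 4 bytes, that "4words" is the data width of one beat, and that the length column is the number of beats per burst. The function names are illustrative only.

```python
def bytes_per_frame(bytes_per_sec: int, fps: int) -> float:
    """Data moved for one frame at the given frame rate."""
    return bytes_per_sec / fps


def bursts_per_sec(bytes_per_sec: int, words_per_beat: int,
                   beats_per_burst: int, bytes_per_word: int = 4) -> float:
    """Burst rate needed to sustain bytes_per_sec.

    bytes_per_word=4 is an assumption; the slides only say "4words".
    """
    burst_bytes = words_per_beat * beats_per_burst * bytes_per_word
    return bytes_per_sec / burst_bytes


# Master2 read side: 5 fps at 10368000 bytes/sec.
print(bytes_per_frame(10_368_000, 5))      # 2073600.0 bytes per frame

# Master2 write side: 62208000 bytes/sec in 4-word x 4-beat bursts.
print(bursts_per_sec(62_208_000, 4, 4))    # 972000.0 bursts per second
```

Under these assumptions, most rows work out to 2073600 bytes per frame, which equals 1920 x 1080 at one byte per pixel.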

Page 20

How to Analyze Performance

Page 21

Overview of IPA

Shows which slave is accessed most

Shows overall transaction data

Page 22

Maximum Latency in the Worst Scenario

Points to the maximum latency

Shows the detailed information

Shows the overlapped transactions
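
The analysis described on this slide (find the transaction with the maximum latency, then look at what overlapped it) can be sketched on exported data in Python. This is not how the IWB GUI computes it; the `Txn` record, its field names, and the sample values are all hypothetical.

```python
from typing import List, NamedTuple


class Txn(NamedTuple):
    """Hypothetical transaction record; cycle numbers are made up."""
    master: str
    start: int  # cycle the request was issued
    end: int    # cycle the last response beat arrived

    @property
    def latency(self) -> int:
        return self.end - self.start


def worst_transaction(txns: List[Txn]) -> Txn:
    """Return the transaction with the largest latency."""
    return max(txns, key=lambda t: t.latency)


def overlapping(target: Txn, txns: List[Txn]) -> List[Txn]:
    """Return the other transactions whose lifetime intersects target's."""
    return [t for t in txns
            if t is not target and t.start < target.end and target.start < t.end]


txns = [Txn("M2", 0, 40), Txn("M5", 10, 150),
        Txn("M17", 100, 120), Txn("M21", 160, 200)]
worst = worst_transaction(txns)
print(worst.master, worst.latency)                 # M5 140
print([t.master for t in overlapping(worst, txns)])  # ['M2', 'M17']
```

Transactions that overlap the worst one are the first candidates when looking for the traffic that caused the delay.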

Page 23

Compare with different QoS values - Same Master & Same Scenario

QoS value is High

QoS value is Low

Two different Runs
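
A comparison of two runs like this one can be reduced to a few summary numbers. The Python sketch below assumes each run has been exported as a plain list of latencies in cycles for the same master and scenario; the function names and the sample values are invented for illustration and are not results from the slides.

```python
from statistics import mean
from typing import Dict, List


def summarize(latencies: List[int]) -> Dict[str, float]:
    """Minimum, mean, and maximum latency of one run."""
    return {"min": min(latencies), "mean": mean(latencies), "max": max(latencies)}


def compare_runs(run_a: List[int], run_b: List[int]) -> Dict[str, float]:
    """Per-statistic difference (run_b minus run_a), in cycles."""
    a, b = summarize(run_a), summarize(run_b)
    return {key: b[key] - a[key] for key in a}


# Made-up latencies for the same master under two QoS settings.
high_qos_run = [20, 30, 40]
low_qos_run = [30, 60, 90]
print(compare_runs(high_qos_run, low_qos_run))
```

A positive difference means the second run was slower on that statistic.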

Page 24

Checks with User Definition - Latency

Added User's Checks

Violation Transactions

Each Violation Transaction
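
A user-defined latency check amounts to flagging every transaction whose latency exceeds a limit. The Python sketch below shows that idea on hypothetical exported records; it is not the IWB check syntax, and the dictionary keys and the limit value are assumptions.

```python
from typing import Dict, List


def latency_violations(txns: List[Dict[str, int]],
                       max_latency_cycles: int) -> List[Dict[str, int]]:
    """Return the transactions whose latency exceeds the user-defined limit."""
    return [t for t in txns if t["end"] - t["start"] > max_latency_cycles]


# Hypothetical records: id, issue cycle, completion cycle.
txns = [
    {"id": 0, "start": 0, "end": 80},
    {"id": 1, "start": 10, "end": 260},
    {"id": 2, "start": 50, "end": 150},
]
for txn in latency_violations(txns, max_latency_cycles=100):
    print(txn["id"], txn["end"] - txn["start"])  # prints: 1 250
```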

Page 25

Checks with User Definition - Bandwidth

Added User's Checks

Violation Transactions

Each Violation Transaction
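
One way to express a bandwidth check is to sum the bytes completed in fixed time windows and flag the windows that fall below a required minimum. The Python sketch below assumes that form; the slides do not say whether the tool's check is a minimum or a maximum, and the function name, window size, and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def bandwidth_violations(txns: Iterable[Tuple[int, int]], window_cycles: int,
                         min_bytes_per_window: int) -> List[int]:
    """Return indices of windows that moved fewer bytes than required.

    txns holds (completion_cycle, num_bytes) pairs.
    """
    per_window = defaultdict(int)
    last_window = -1
    for cycle, num_bytes in txns:
        window = cycle // window_cycles
        per_window[window] += num_bytes
        last_window = max(last_window, window)
    # Windows with no traffic at all count as zero bytes.
    return [w for w in range(last_window + 1)
            if per_window[w] < min_bytes_per_window]


# Hypothetical completions: 100-cycle windows, 128 bytes required in each.
txns = [(10, 64), (50, 64), (120, 64), (310, 64), (350, 64)]
print(bandwidth_violations(txns, window_cycles=100, min_bytes_per_window=128))
# prints [1, 2]
```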

Page 26

Checking Read Latency – Hit/Miss

Shows the detailed information

Cache Miss Latency

Cache Hit Latency

Page 27

Summary

Interconnect Workbench for SoC Interconnect Verification, Performance Analysis

•  Performance Measurement and Analysis for SoC Interconnect

–  Explore performance aspects across multiple simulations, multiple scenarios: QoS, Outstanding Transactions, Issuing Rate, etc.

–  To optimize interconnect: Topology, QoS Scheme, Transaction Buffer Depths, etc.

–  Visualize cycle-accurate performance against a variety of scenarios

–  Assess the effect of different traffic scenarios on performance

•  Automated Verification of SoC Interconnect

–  Quickly configure verification environment to the interconnect

–  Run out-of-the-box tests on the generated interconnect

–  Easily update environment to verify changes

•  Mimic Real Transactions with Traffic Synthesizer

–  Easily generate periodic transactions

–  Easily implement the worst case scenario and analyze the performance

Page 28

Q&A