accelerating system-level design and multi-processor systems … · 2004-12-05 · system...
TRANSCRIPT
© 2002© 2004 Altera Corporation
Accelerating System-Level Design and Multi-Processor Systems in FPGA
Agenda
Accelerating System-Level Design
− System Integration Issues
− SOPC Builder
− Interfacing to External Processors
− Summary
Multi-Processor Systems in FPGA
− Why Use Multiple Processors?
− Multi-Processor Architectures
− Design Considerations
− Software Development

System Integration Issues
Matching I/O Specifications
− Voltage Standards
− Timing Requirements
Connecting the Block Interfaces
− Control Signal Types
− Control Signal Assertion Levels
− Data Widths
− Transaction Timing
Domain Crossing
− Interface Domain
− Clock Domain
Time
− Develop
− Change Orders
− Verification

Embedded System Integration
[Figure: a processor (bus master, 32-bit) and an Ethernet controller (bus master, 32-bit) drive address and data buses through an address decoder, an interrupt controller, an arbiter, and per-slave width-match logic to Slave 1 (8-bit), Slave 2 (32-bit), Slave 3 (16-bit), Slave 4 (32-bit), and Slave 5 (64-bit).]

SOPC Builder: System Integration
Single Master
Multiple Slaves
Multi-Master
− Slave-Side Arbitration
− Optimize for Throughput
− Patch Panel Selection
Automatic Generation
− Datapath Logic
− Chip Selects
− Arbitration Logic
− Timing

SOPC Builder: Integration
[Figure: the same system with the glue logic collected into the SOPC Builder-generated Avalon™ switch fabric: address decoder, interrupt controller, wait-state generation, data multiplexing, and an arbiter plus width-match block in front of each slave. Masters: processor (32-bit) and Ethernet (32-bit). Slaves: Slave 1 (8-bit), Slave 2 (32-bit), Slave 3 (16-bit), Slave 4 (32-bit), Slave 5 (64-bit).]

Avalon Interface
Avalon Signal Types (All Signals Available in Positive or Negative Form)
− reset, chipselect, address, byteenable, read, readdata, write, writedata, data, waitrequest
− readyfordata, dataavailable, datavalid, flush, begintransfer, endofpacket, irq, irqnumber, clk, resetrequest
An Open Standard
− Simple, Synchronous Interface
− Switch Fabric Interconnect Controls Signal Timing, Matching Logic Levels
Many Signal Types Supported
− Peripheral Uses Only the Signals it Needs
Any Combination of Signals Possible
− Support for Arbitrary Setup Time, Hold Time & Wait States
Switch Fabric Generated in SOPC Builder
− Eliminates Tedious HDL Writing
− Increases Intellectual Property (IP) Re-Use
− Decreases Development Time
Avalon Switch Fabric
Peripherals Can Be Dedicated or Shared
− Shared Connections Arbitrated on the Slave-Side
Simultaneous Data Transfers Boost Throughput
[Figure: masters System CPU (Master 1) and DMA (Master 2) connect through the Avalon switch fabric to the slaves I/O 1, I/O 2, Program Memory, Data Memory (32-bit), and VGA Controller (8-bit); arbiters sit in front of the shared slaves.]

SOPC Builder: System Generation
Software Development-Kit Generation
− C Header Files
− Peripheral Drivers
− Software IDE
− Software Debuggers
− RTOS
Simulation Test Bench Generation
System Generation
− VHDL/Verilog HDL

SOPC Builder: Applications
[Figure: three application classes, Custom Microcontroller, Multi-Processor, and Companion Chip. Blocks shown include Ethernet I/F, RAM, UART, SPI, PCI, custom logic, a bus I/F, and an external CPU or DSP.]
Interfacing External Processors
Incumbent Processor or Digital Signal Processor
Add System Elements
− Peripherals
− Custom Logic
− Accelerators
SOPC Builder Glues it Together
Missing Element is the Interface
[Figure: an external processor connects through an External Processor Interface to the Avalon switch fabric, which serves the slave ports of a peripheral, custom IP, and an accelerator.]

Classes of Processor Interfaces
Complex & Simple Interfaces
− Complex
  PCI(X)(Express), Rapid I/O™ & HyperTransport™ Technologies
  Solved with Specialized Logic Blocks & Software Protocols
− Simple
  Common on Most Embedded Processors
  Address, Data, Chipselect, R/W Control, Etc.
  Simple Software Control of Memory-Mapped Peripherals
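The "simple software control of memory-mapped peripherals" point can be sketched in C. This is only an illustration: the register layout, the names `periph_regs_t`, `CTRL_START`, and `STATUS_DONE`, and the bit positions are all hypothetical. In a real system the pointer would be the address that the processor's chip select maps to the FPGA; here an ordinary struct stands in so the sketch can run on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block of a memory-mapped peripheral. */
typedef struct {
    volatile uint32_t ctrl;    /* control register (assumed layout) */
    volatile uint32_t status;  /* status register  (assumed layout) */
    volatile uint32_t data;    /* data register    (assumed layout) */
} periph_regs_t;

#define CTRL_START   (1u << 0)  /* assumed bit: start an operation */
#define STATUS_DONE  (1u << 0)  /* assumed bit: operation finished */

/* Write one word to the peripheral, then set the start bit.
 * 'volatile' keeps the compiler from reordering or dropping the stores. */
static void periph_start(periph_regs_t *regs, uint32_t value)
{
    assert(regs != 0);
    regs->data = value;
    regs->ctrl |= CTRL_START;
}

/* Return nonzero once the peripheral reports completion. */
static int periph_done(const periph_regs_t *regs)
{
    assert(regs != 0);
    return (regs->status & STATUS_DONE) != 0;
}
```

From the processor's point of view each load or store above becomes one bus cycle (address, data, chip select, read/write control), which is why no driver protocol beyond plain pointer access is needed for this class of interface.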
Processor Interface Issues
Bus Frequency, I/O Timing, & I/O Standard
− FPGAs Are Rich in I/O Standards
− FPGA Tools Provide I/O Timing
− High-Speed Busses Require High-Speed Design Techniques
  See High-Speed Design Seminar Notes
Control Signal Matching
Data Transfer Latency
Clock Domain Crossing

Simple Processor Interface
Maps FPGA to a Memory Region, SOPC Builder Decodes Each Element
Operates Synchronous to the Processor’s Bus Clock
Translates Processor Signal Types to Avalon Equivalents
[Figure: inside the FPGA, a CPU-to-Avalon I/F presents the external processor's address, data, and control signals as an Avalon master (M) on the Avalon switch fabric. Other masters include DMA_0; slaves (S) include MEM_0 to MEM_N, CUST_0 to CUST_N, and an external memory interface.]
Intel PXA255 Example
[Figure: an Intel PXA255 connects through a processor interface to the Avalon switch fabric in an Altera® FPGA. Peripherals shown: 10/100/1G EMAC or 802.11x MAC, backplane interface, USB 2.0/FireWire, Integrated Drive Electronics (IDE), visual display unit (LCD/VGA/SVGA), GPIOs (LEDs, pushbuttons), UART, motor drive (PWM/stepper), and a custom peripheral.]

VLIO as an Avalon Master Port
Intel PXA255 Variable Latency I/O (VLIO) Uses a Bi-Directional Data Path, RDY Signal to Add Wait States
Interface Separates DATA into Read Data & Write Data Paths
VLIO          Avalon
RDY           waitrequest
MA[25:0]      address
nPWE          write_n
MD[31:0]      writedata
nOE           read_n
MD[31:0]      readdata
DQM[3:0]      byteenable[3:0]
EXT_IRQ       irq
MEMCLOCK      clk
RESET         reset
[Figure: the PXA255 VLIO pins listed above enter the Altera FPGA interface block, which drives the corresponding Avalon signals; the bi-directional MD[31:0] bus splits into readdata[31:0] and writedata[31:0], and the Avalon address is drawn as address[n:2].]

Relevant Verilog Code to Implement
module ext_proc_if ( // port declarations
);
// signal declarations
assign a_write_n = e_pwe_n;
assign a_read_n  = e_oe_n;
assign a_addr    = {e_ma, 2'b0};
// RDY and waitrequest must agree in polarity; if the processor's ready
// signal is active-high, use the negative form of one of the two signals
assign e_rdy     = a_waitrequest;
assign a_be_n    = e_dqm_n;
// work out the bi-directional data bus:
// if output enable is low, drive the bus from the readdata path of the
// Avalon switch fabric, otherwise release it
assign e_data = (!e_oe_n && !e_cs_n) ? a_data_read : 'bz;
// assign the captured incoming data to the Avalon switch fabric write data path
assign a_data_write = a_write_data;
// level-sensitive capture of the write data while the chip is selected
// and write enable is low
always @(e_cs_n or e_pwe_n or e_data)
begin
  if (!e_cs_n && !e_pwe_n) a_write_data = e_data;
end
endmodule

EMIF to Avalon Master Port
Texas Instruments (TI) DSP External Memory Interface (EMIF) Uses a Bi-Directional Data Path
Interface Separates DATA into Read Data & Write Data Paths
EMIF            Avalon
ARDY            waitrequest
ADDRESS[21:2]   address
AWE_n           write_n
DATA[31:0]      writedata
ARE_n           read_n
DATA[31:0]      readdata
BE[3:0]         byteenable
EXT_IRQ         irq
E_CLOCK         clk
reset           reset
[Figure: the TI DSP EMIF pins listed above, plus the output enable AOE_n, enter the Altera FPGA interface block, which drives the corresponding Avalon signals; the bi-directional DATA[31:0] bus splits into readdata[31:0] and writedata[31:0], and the Avalon address is drawn as address[n:2].]

Relevant Verilog Code to Implement
module ext_proc_if ( // port declarations
);
// signal declarations
assign a_write_n = e_write_n;
assign a_read_n  = e_read_n;
assign a_addr    = {e_addr, 2'b0};
// ARDY and waitrequest must agree in polarity; if the processor's ready
// signal is active-high, use the negative form of one of the two signals
assign e_rdy     = a_waitrequest;
assign a_be_n    = e_be_n;
// work out the bi-directional data bus:
// if output enable and read are low, drive the bus from the readdata path
// of the Avalon switch fabric, otherwise release it
assign e_data = (!e_oe_n && !e_read_n && !e_cs_n) ? a_data_read : 'bz;
// assign the captured incoming data to the Avalon switch fabric write data path
assign a_data_write = a_write_data;
// level-sensitive capture of the write data while the chip is selected
// and write enable is low
always @(e_cs_n or e_write_n or e_data)
begin
  if (!e_cs_n && !e_write_n) a_write_data = e_data;
end
endmodule

Simple Processor Interface
Advantages
− Simple as Wires: 1 Logic Element (LE) to Control Data Bus I/O Direction
− Easy Access to Peripherals for the Incumbent Processor
− Limitless Options for Adding Components
Restrictions
− Processor Bus & FPGA Logic Operate on Single Clock Domain
− Processor Must Honor waitrequest

Removing Clock Domain Restrictions
State Machines Pass Transactions Between Clock Domains
Processor State Machine Initiates Transaction
− Waits for Avalon Master State Machine to Complete
Avalon Master State Machine Initiates Transaction with Avalon Switch Fabric
− Waits for Avalon Switch Fabric to Complete
[Figure: inside the FPGA, the External CPU - Avalon I/F contains a Processor State Machine facing the external processor's address, data, and control signals, and an Avalon Master State Machine facing the Avalon switch fabric. Other masters (M) include DMA_0; slaves (S) include MEM_0 to MEM_N and CUST_0 to CUST_N.]
State Machine Control Diagram
[Figure: two state diagrams.
External CPU or Digital Signal Processor (Host) SM Diagram: states H_Idle (start), H_Write (H_wait = 1), H_Read (H_wait = 1), H_Done (H_wait = 1), H_Finishing (H_wait = 0). Transition conditions shown: CS = 1 & Write = 1; CS = 1 & Read = 1; T_Done = 1; T_Done = 0; T_Idle = 1; T_Idle = 0; CS = 0.
Avalon Master SM Diagram: states T_Idle, T_Write, T_Read, T_Done. Transition conditions shown: H_Write = 1; H_Read = 1; Waitrequest = 1; Waitrequest = 0; H_Read OR H_Write = 1; H_Done OR H_Finishing OR H_Idle = 1.]

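The handshake between the two state machines can be modeled in a few lines of C. This is a sketch under stated assumptions, not the original design: the state names and conditions come from the diagram, but which arrow carries which condition is inferred (for example, that H_Write and H_Read leave on T_Done = 1, and that T_Done returns to T_Idle once the host is no longer in H_Write or H_Read). Both machines are stepped in lockstep here, so the model ignores the synchronizers a real clock-domain crossing needs. `simulate_write` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { H_IDLE, H_WRITE, H_READ, H_DONE, H_FINISHING } host_state_t;
typedef enum { T_IDLE, T_WRITE, T_READ, T_DONE } xfer_state_t;

/* H_wait output: asserted in H_Write, H_Read and H_Done (per the diagram). */
static bool host_wait(host_state_t h)
{
    return h == H_WRITE || h == H_READ || h == H_DONE;
}

/* Host (processor-side) next state; arrow assignment is an assumption. */
static host_state_t host_next(host_state_t h, xfer_state_t t,
                              bool cs, bool write, bool read)
{
    switch (h) {
    case H_IDLE:
        if (cs && write) return H_WRITE;
        if (cs && read)  return H_READ;
        return H_IDLE;
    case H_WRITE:
    case H_READ:
        return (t == T_DONE) ? H_DONE : h;         /* wait for the Avalon side */
    case H_DONE:
        return (t == T_IDLE) ? H_FINISHING : H_DONE;
    case H_FINISHING:
        return cs ? H_FINISHING : H_IDLE;          /* wait for CS to drop */
    }
    return H_IDLE;
}

/* Avalon master next state; arrow assignment is an assumption. */
static xfer_state_t xfer_next(xfer_state_t t, host_state_t h, bool waitrequest)
{
    switch (t) {
    case T_IDLE:
        if (h == H_WRITE) return T_WRITE;
        if (h == H_READ)  return T_READ;
        return T_IDLE;
    case T_WRITE:
    case T_READ:
        return waitrequest ? t : T_DONE;           /* honor waitrequest */
    case T_DONE:
        return (h == H_WRITE || h == H_READ) ? T_DONE : T_IDLE;
    }
    return T_IDLE;
}

/* Run one write and count the cycles for which H_wait is asserted.
 * slave_wait_cycles: cycles the slave holds waitrequest high. */
static int simulate_write(int slave_wait_cycles)
{
    host_state_t h = H_IDLE;
    xfer_state_t t = T_IDLE;
    bool cs = true;
    int waited = 0;

    for (int cycle = 0; cycle < 1000; cycle++) {
        bool waitrequest = (t == T_WRITE) && (slave_wait_cycles > 0);
        if (waitrequest) slave_wait_cycles--;

        /* Both machines sample the other's current state, then update. */
        host_state_t hn = host_next(h, t, cs, true, false);
        xfer_state_t tn = xfer_next(t, h, waitrequest);
        h = hn;
        t = tn;

        if (host_wait(h)) waited++;
        if (h == H_FINISHING) cs = false;          /* processor ends the cycle */
        if (h == H_IDLE) break;
    }
    return waited;
}
```

In this model every cycle the slave spends in waitrequest adds one cycle of H_wait on the processor side, on top of the fixed handshake overhead of the two machines.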
State Machine Control
Advantages
− Processor & FPGA Operation Decoupled
− Each Side Free to Operate at fMAX
Restrictions
− Adds Size & Complexity to Core
− Adds 3-5 Cycles of Latency per Transaction

Removing waitrequest Restriction
Fixed Timing Between Processor & Interface
− FIFO or DPRAM
− Additional Logic Required to Move Data In/Out of FIFO/DPRAM
FIFO-Based Interface
− Single Address, No Random Access
− Excellent for Data Streaming Applications
Dual-Port RAM-Based Interface
− Allows Fully Loaded Memory with Random Access
− Excellent for Multi-Processor Applications

FIFO-Based Interface
FIFO Used to Stream Data to/from a Fixed Destination
− Custom Accelerators
− Streaming Devices
Data Movement Controlled by Nios II Processor, DMA Controllers, Custom Logic
FIFO Depth Affects Data Throughput

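A software model of such a FIFO helps show why depth matters: once the FIFO is full the writer has to stall until the reader drains a word. The following is a minimal sketch; `fifo_t`, `fifo_put`, `fifo_get`, and the depth of 8 words are illustrative choices, not part of any Altera component.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 8u  /* assumed depth, chosen only for illustration */

typedef struct {
    uint32_t slots[FIFO_DEPTH];
    unsigned head;   /* index of the next word to read */
    unsigned count;  /* number of words currently stored */
} fifo_t;

/* Store one word; returns false when the FIFO is full,
 * which is the point where the writing side must stall. */
static bool fifo_put(fifo_t *f, uint32_t word)
{
    assert(f != 0);
    if (f->count == FIFO_DEPTH) return false;
    f->slots[(f->head + f->count) % FIFO_DEPTH] = word;
    f->count++;
    return true;
}

/* Fetch the oldest word; returns false when the FIFO is empty. */
static bool fifo_get(fifo_t *f, uint32_t *word)
{
    assert(f != 0 && word != 0);
    if (f->count == 0) return false;
    *word = f->slots[f->head];
    f->head = (f->head + 1u) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

There is a single entry point on each side and no addressing, which matches the "single address, no random access" property: a deeper FIFO only lets the writer run further ahead of the reader before `fifo_put` starts failing.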
[Figure: inside the FPGA, the CPU-to-Avalon I/F connects the external processor's address, data, and control signals to a Read FIFO and a Write FIFO, which appear as slaves (S) on the Avalon switch fabric. Masters (M) include DMA_0 and an accelerator; other slaves include CUST_0 to CUST_N.]

DPRAM-Based Interface
DPRAMs
− Random Access
− Mailbox Access
DPRAM Has Fixed Cycle Timing
Data Movement on DPRAM Slave Controlled by:
− Soft Embedded Processor
− DMA on FPGA
− IP Accelerator
[Figure: inside the FPGA, the CPU-to-Avalon I/F connects the external processor's address, data, and control signals to one port of a DPRAM, whose other port is a slave (S) on the Avalon switch fabric. Masters (M) include DMA_0 and an accelerator; other slaves include CUST_0 to CUST_N.]
Combination of Interfaces
Variable Cycle Interfaces
− Random Access Peripherals
− Control Variables
FIFO/DPRAM
− Low-Latency
− Streaming Data
Use External Processor Chip_Selects to Select the Interface
[Figure: inside the FPGA, a decoder in the CPU-to-Avalon I/F routes the external processor's address, data, and control signals either to an Avalon master on the switch fabric or to a FIFO/DPRAM slave used for streaming data. Other masters (M) include DMA_0; slaves (S) include MEM_0 to MEM_N and CUST_0 to CUST_N.]

Interface Bandwidth & Throughput
Determined by Your Processor Bus
− Width of Bus, Frequency of Bus, Number of Cycles per Transfer
Dedicated or Shared Peripheral
− Easy to Define in SOPC Builder
− Slave-Side Arbitration
Latency
− Processor: Inherent by Design
− Processor Interface: Depends on Type Used
− Avalon Switch Fabric: 0 Cycles, Combinatorial
− Peripherals: Defined by the Peripheral

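The three bus parameters combine into a simple upper bound: peak bytes per second = (bus width in bytes) × (bus frequency) / (cycles per transfer). A minimal sketch of that arithmetic follows; the function name `peak_bytes_per_second` is hypothetical and the figures in the usage note are illustrative, not measured values for any particular processor.

```c
#include <assert.h>
#include <stdint.h>

/* Peak interface bandwidth in bytes per second.
 * bus_width_bits:      width of the processor data bus
 * bus_hz:              bus clock frequency in Hz
 * cycles_per_transfer: bus cycles needed for one transfer (must be > 0) */
static uint64_t peak_bytes_per_second(unsigned bus_width_bits,
                                      uint64_t bus_hz,
                                      unsigned cycles_per_transfer)
{
    assert(cycles_per_transfer > 0);
    return ((uint64_t)(bus_width_bits / 8u) * bus_hz) / cycles_per_transfer;
}
```

For example, a 32-bit bus clocked at 100 MHz that needs 4 cycles per transfer peaks at 4 × 100,000,000 / 4 = 100,000,000 bytes per second. Interface latency and arbitration on shared slaves reduce the achieved throughput below this bound.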
Summary
Multiple Processor Interface Options
− Simple Processor Interface
− Asynchronous Interface with State Machines
− FIFO-Based Interface
− DPRAM-Based Interface
− Combination Interfaces
SOPC Builder Automatically Integrates Additional Components
− Converts Your Concepts into Systems in Minutes

Additional Resources
SOPC Builder
− Included in All Quartus II Products
Evaluation Nios II & IP Cores
− MegaCore IP Library CD
− Nios II Embedded Processor Evaluation Edition
www.altera.com
− Literature Section – SOPC Builder
− Complete List of Application Notes
− Wide Range of Tutorials

Multi-Processor Systems in FPGAs: Implementation & Debug
Multi-Processor Systems Address these Challenges
Increasing Application Requirements
− Single Processor: Faster Clock Rate, Higher Power
− Multi-Processors: Slower Clock Rate, Lower Power
− Execute Tasks in Serial or Parallel for Highest Performance
Expanding Product Functionality
− Custom-Fit Processors for Specific Functions
Simplify Application Software
− Limit Code to the Task at Hand

Multi-Processor Architectures
Multiple Independent Processors
Host with Off-Load Processors
Serially Linked Processors
Multi-Channel Processors

Multiple Independent Processors
Main System Processor
− Primary Application
Secondary Processors
− System Monitoring & Reporting Functions
[Figure: a main system processor with MAC, SDRAM controller, and I/O, connected to a PHY, an ASSP, flash, and SDRAM, alongside a monitoring processor with I/O and JTAG that watches the power supply, fan control, status display, and status sensors.]

Off-Load of Host Processor
Application or I/O Requirements Not Being Met by CPU: Add an Off-Load CPU
Additional Off-Load CPU(s) to Further Address Performance or Functional Requirements
[Figure: a general-purpose processor with its I/O, joined through a shared memory/resource to one or more off-load processors, each with its own I/O.]

Serial Processing
Application Divided into Sequential Tasks
− One Processor per Task
− Simplify Software
Processors Communicate Through Shared Data/Resources
[Figure: IN, then CPU 0, then CPU 1, then OUT, with shared data resources between the two CPUs.]

Multi-Channel Processing
Develop a Single Channel Solution
Replicate to More Channels
[Figure: an APEX 20K device holding four identical channels, each with a CPU, MAC, packet buffer, and DMA, together with user logic.]

Master/Slave Processor System
Master Processor
− Running Operating System
− Primary Application(s)
Slave Processors
− Operate Under Master Processor Control
Examples
− Load Balancing in Web Server
− Control Processor with Multiple Processor Engines for Packet Processing
[Figure: a master CPU controls Slave CPU 0, Slave CPU 1, and Slave CPU 2; each slave CPU has its own IN, and results go to OUT.]

Design Considerations
System Development Environment
− FPGAs
− Soft Embedded Processors
− System Development Tools
Inter-Processor Communication
Resource Sharing
Software
− Project Management
− Collective Run & Stop Control

Altera’s Product Offerings
[Figure: product families: High-Density FPGAs, Low-Cost FPGAs, CPLDs, Structured ASICs, Next Generation.]

Nios: Most Popular Soft Processor Ever
Over 14,000 Kits Shipped
− Over 4,000 Unique Nios® Licensees (Companies)
− More than All Other Embedded Soft Processors Combined *
EDN Hot 100 Product of 2003
Keys to Success
− Complete Kit
− Simple, Capable CPU
− Easy to Use
− Low Cost ($995)
− Perpetual License
− No Royalties
* Semico: Soft-Core Processor Licensees

Nios II Overview
Family of 3 Processor Cores
− Performance Over 200 DMIPS
− Cost as Low as 35¢ of Logic
Powerful New Software Development Tools
− IDE/Debugger, RTOS, TCP/IP Stack
Complete Portfolio of Development Kits
− Stratix®, Stratix II & Cyclone™ Kits
− Application-Specific Expansion Boards

Processor Cost vs. Performance
[Figure: plot of Performance (DMIPS, axis 0 to 300) against Cost of CPU Logic (axis $0.00 to $5.00) for Stratix, Cyclone, Stratix II, and HardCopy® Stratix devices; each device family has three points labeled e, s, and f.]

FPGA Processor Core Advantages
Processor?
− Cyclone Series & Nios II Economy
− 2,910 Logic Elements: CPU < 20% of Device
− 20 DMIPs
− As Low as 35¢
− Low-Cost Embedded Solution
How Many Processors?
− Stratix II EP2S180 & Nios II Fast
− 179,400 Logic Elements: CPU 1% of Device
− 220 DMIPs (each)
− Complex Embedded System-on-a-Chip
Host-Side Arbitration Scheme
[Figure: masters CPU 0, CPU 1, and CPU 2 reach the slaves (I/O 1, UARTs, SPIs, shared memories, and a custom accelerator peripheral) over a single shared bus controlled by one arbiter.]
Bottleneck
− Arbiter Determines which Master Has Access to Shared Bus

Slave Side Arbitration
SOPC Builder – Slave Side Arbitration
Simultaneous Operation for All Masters
[Figure: masters CPU 0, CPU 1, and CPU 2, each with its own local resources; slaves include data memories, shared memory, mailbox memory, shared I/O, and a custom accelerator peripheral, with a separate arbiter in front of each shared slave.]

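The difference from a single shared-bus arbiter can be sketched in C by giving every slave its own small arbiter. This is an illustrative model only: the round-robin policy, the `arbiter_t` type, and `arbiter_grant` are assumptions made for the sketch and are not taken from the SOPC Builder implementation.

```c
#include <assert.h>

#define NUM_MASTERS 3  /* CPU 0, CPU 1, CPU 2 in the figure */

/* One arbiter instance per shared slave. */
typedef struct {
    int last_grant;  /* master granted most recently */
} arbiter_t;

/* requests: bit i is set when master i wants this slave in this cycle.
 * Returns the granted master, or -1 when nobody is requesting.
 * Round-robin policy (assumed): start searching after the last winner. */
static int arbiter_grant(arbiter_t *a, unsigned requests)
{
    assert(a != 0);
    for (int i = 1; i <= NUM_MASTERS; i++) {
        int m = (a->last_grant + i) % NUM_MASTERS;
        if (requests & (1u << m)) {
            a->last_grant = m;
            return m;
        }
    }
    return -1;
}
```

Because each slave owns an arbiter, CPU 0 can be granted the shared memory in the same cycle in which CPU 1 is granted the shared I/O; masters only wait when they target the same slave at the same time.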
Inter-Processor Communications
Memory Management
− Access Control
− Data Integrity
− Amount of Memory
[Figure: CPU 0 and CPU 1 each read and write a data buffer in shared memory.]

Sharing Memory
Potential Problems with Shared Memory
− Processors Writing to the Same Buffer
[Figure: CPU 0 and CPU 1 both write to the same one of Buffers 1 to 3: write + write = Corrupt Data!]

Sharing Memory
Partition & Allocate Memory by Processor
Limit Access in Application Software
[Figure: Buffers 1 to 3 are divided between CPU 0 and CPU 1, with read and write access to each buffer assigned per processor.]

Sharing Memory – Mailboxes
Software Protocol Controls Exchange of Data
− Flags Signal Data Ready or Taken
Independent Data Buffers
− N * (N-1) Buffers Required
− N = Number of Processors
− Assumes All Shared
[Figure: CPU 0 and CPU 1 exchange messages through Buffer A and Buffer B in shared memory; each buffer holds data plus a flag, and each CPU writes one buffer and reads the other.]

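The flag protocol and the buffer count can be sketched in C. This is a single-threaded model under stated assumptions: `mailbox_t`, `mailbox_send`, `mailbox_receive`, and the 4-word message size are hypothetical, and a real multi-processor system would also have to account for caches and memory ordering, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MBOX_WORDS 4u  /* assumed message size */

/* One one-directional mailbox: a data buffer plus a ready/taken flag. */
typedef struct {
    volatile uint32_t flag;      /* 0 = taken (empty), 1 = data ready */
    uint32_t data[MBOX_WORDS];
} mailbox_t;

/* One buffer per ordered pair of processors: N * (N - 1). */
static unsigned mailbox_count(unsigned n_processors)
{
    assert(n_processors >= 1u);
    return n_processors * (n_processors - 1u);
}

/* Sender side: fails if the receiver has not taken the last message. */
static bool mailbox_send(mailbox_t *m, const uint32_t *msg)
{
    if (m->flag != 0u) return false;
    memcpy(m->data, msg, sizeof m->data);
    m->flag = 1u;                /* publish only after the data is in place */
    return true;
}

/* Receiver side: fails if no message is ready. */
static bool mailbox_receive(mailbox_t *m, uint32_t *msg)
{
    if (m->flag == 0u) return false;
    memcpy(msg, m->data, sizeof m->data);
    m->flag = 0u;                /* mark the buffer as taken */
    return true;
}
```

Since only the sender sets the flag and only the receiver clears it, each buffer has exactly one writer at a time; two processors therefore need 2 buffers and three processors need 6.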
Sharing Resources – Mutex
Controls Access to Shared Memory & Resources
[Figure: CPU 0 and CPU 1 reach Buffer 1, Buffer 2, a UART, or any other shared resource through a hardware mutex; the mutex structures record ownership of each shared resource.]

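The lock, try-lock, and unlock behaviour of such a mutex can be modeled in portable C. This is a stand-in, not the hardware mutex itself: it uses the C11 `atomic_flag` test-and-set in place of the hardware component, and the names `sw_mutex_t`, `sw_mutex_trylock`, `sw_mutex_lock`, `sw_mutex_unlock`, and `guarded_increment` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Software stand-in for a hardware mutex (assumption: atomic test-and-set). */
typedef struct {
    atomic_flag locked;
} sw_mutex_t;

/* Try to take the mutex; returns immediately with true on success. */
static bool sw_mutex_trylock(sw_mutex_t *m)
{
    return !atomic_flag_test_and_set(&m->locked);
}

/* Spin until the mutex is acquired. */
static void sw_mutex_lock(sw_mutex_t *m)
{
    while (!sw_mutex_trylock(m)) {
        /* busy-wait; another owner must call sw_mutex_unlock() */
    }
}

/* Release the mutex. */
static void sw_mutex_unlock(sw_mutex_t *m)
{
    atomic_flag_clear(&m->locked);
}

/* Typical use: touch the shared resource only while holding the mutex. */
static void guarded_increment(sw_mutex_t *m, int *shared_counter)
{
    sw_mutex_lock(m);
    (*shared_counter)++;
    sw_mutex_unlock(m);
}
```

The same pattern applies to any shared resource in the figure, such as a buffer or a UART: acquire, access, release. A processor that cannot afford to block would use the try-lock form and do other work when it returns false.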
Software Development Challenges
Software Development
Maintaining Hardware/Software Coherency
Partitioning Software
− Sorry, That’s Your Job

Nios II Integrated Development Environment (IDE)*
Features
− Project Management
− Editor & Compiler
− Debugger
− Flash Programmer
Single JTAG Communication Link
− HW/SW Download
− HW/SW Debug
SignalTap® II Embedded Logic Analyzer
JTAG Debug Module
− JTAG UART
Advanced Hardware Debug Features
− Hardware Break Points, Data Triggers, Trace
* Based on Eclipse Project Integrated Development Environment (IDE)

Nios II IDE: Project Management
Integrated Software Project Management
• System Library Customized to Hardware
• Painless Configuration
Software Templates
• Provide Easy-to-Use Examples
• Users May Add Their Own Templates
Included Software Components
• LWIP TCP/IP Stack
• MicroC/OS-II RTOS*
• Altera ZIP FS
• Nios II Run-Time Library
* MicroC/OS-II requires a separate license for product deployment!

Nios II IDE – Multi-Processors
Multiple Processor Support
System Library Customized to the SOPC Builder Project
− HW/SW Coherency
Assign Regions of a Shared Memory to Individual Processors
− Linker Script

Nios II Hardware Abstraction Layer
HAL Mailbox Routines
inter_processor_mbox_init()
− Initialize a Mailbox.
inter_processor_mbox_get()
− Copy a Message from Mailbox. Block Until Requested Bytes are Available.
inter_processor_mbox_tryget()
− Copy a Message from Mailbox. Fail if Requested Bytes are Not Available.
inter_processor_mbox_peek_item()
− Copy a Message from Mailbox. Fail if Requested Bytes are Not Available. Do Not Remove the Message from the Mailbox.
inter_processor_mbox_put()
− Copy a Message to Mailbox. Block Until the Message Can Be Placed in the Mailbox.
inter_processor_mbox_tryput()
− Copy a Message to Mailbox. Fail if the Message Will Not Fit in the Mailbox.
inter_processor_mbox_peek()
− Retrieve the Number of Bytes Available for Getting from the Mailbox.
HAL Mutex Routines
inter_processor_mutex_init()
− Initialize a Mutex & Associate it with the Supplied Hardware Component.
inter_processor_mutex_destroy()
− Destroy a Mutex, Disassociating it from the Supplied Hardware Component.
inter_processor_mutex_lock()
− Acquire the Lock for the Mutex. Block Until the Lock Can Be Acquired.
inter_processor_mutex_trylock()
− Acquire the Lock for the Mutex. Return if the Mutex is Unavailable.
inter_processor_mutex_unlock()
− Release the Lock for the Mutex.
Nios II IDE: Debugger
Integrated, Broad Capability Debug
Source View
• Breakpoints
• Code Position
Basic Debug
• Run Controls
• Stack View
• Active Debug Sessions
• Variables
• Registers
• Signals
Memory View
Debug Targets:
• Hardware (JTAG)
• Instruction Set Simulator
• Logic Simulator (ModelSim)
Debug Scope:
• HW/SW Breakpoints
• H/W Data Triggers
• Watchpoints
• Instruction Trace

Debug – Multi-Processor Support
Collective Launch & Terminate

Multi-Processor Systems Using FPGAs
Multi-Processors
− Address Increasing Application Requirements
− Expand Product Functionality
− Simplify Software Development
FPGAs, Nios II, SOPC Builder
− The Perfect-Fit Solutions

Additional Resources
Nios II & IP MegaCore® Functions
− Nios II Embedded Processor Evaluation Edition
− MegaCore IP Library CD
SOPC Builder
− Included in All Quartus® II Products
www.altera.com Literature Section
− Multi-Processor Systems with Nios II Application Note
− Nios II Handbook
− SOPC Builder Documentation

FPGAs for Multi-Processor Designs
SOPC Builder: The Perfect-Fit Solution
Performance
− Three Configurations
− Hardware Acceleration
− Custom Instructions
Flexibility
− Single or Multiple
− Standard Peripherals
− Custom Peripherals
Powerful Debug Tools
− On-Chip Processor Debug
− SignalTap II Logic Analyzer
− Performance Analysis Plug-Ins
Fastest Time to Market