Digital Design: An Embedded Systems Approach Using Verilog
Chapter 7 Processor Basics
Portions of this work are from the book, Digital Design: An Embedded Systems Approach Using Verilog, by Peter J. Ashenden, published by Morgan Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.
Digital Design — Chapter 7 — Processor Basics 2
Verilog
Embedded Computers
- A computer as part of a digital system
  - Performs processing to implement or control the system’s function
- Components
  - Processor core
  - Instruction and data memory
  - Input, output, and input/output controllers
    - For interacting with the physical world
  - Accelerators
    - High-performance circuit for specialized functions
  - Interconnecting buses

Memory Organization
- Von Neumann architecture
  - Single memory for instructions and data
- Harvard architecture
  - Separate instruction and data memories
  - Most common in embedded systems
[Figure: CPU, accelerator, instruction memory, data memory, input controller, output controller, I/O controller, …]

Bus Organization
- Single bus for low-cost, low-performance systems
- Multiple buses for higher performance
[Figure: CPU, accelerator, instruction memory, data memory, input controller, output controller, I/O controller]

Microprocessors
- Single-chip processor in a package
- External connections to memory and I/O buses
- Most commonly seen in general-purpose computers
  - E.g., Intel Pentium family, PowerPC, …

Microcontrollers
- Single chip combining
  - Processor
  - A small amount of instruction/data memory
  - I/O controllers
- Microcontroller families
  - Same processor, varying memory and I/O
- 8-bit microcontrollers
  - Operate on 8-bit data
  - Low cost, low performance
- 16-bit and 32-bit microcontrollers
  - Higher performance

Processor Cores
- Processor as a component in an FPGA or ASIC
- In FPGA, can be a fixed-function block
  - E.g., PowerPC cores in some Xilinx FPGAs
- Or can be a soft core
  - Implemented using programmable resources
  - E.g., Xilinx MicroBlaze, Altera Nios-II
- In ASIC, provided as an IP block
  - E.g., ARM, PowerPC, MIPS, Tensilica cores
  - Can be customized for an application

Digital Signal Processors
- DSPs are processors optimized for signal processing operations
  - E.g., audio, video, sensor data; wireless communication
- Often combined with a conventional core for processing other data
  - Heterogeneous multiprocessor

Instruction Sets
- A processor executes a program
  - A sequence of instructions, each performing a small step of a computation
- Instruction set: the repertoire of available instructions
  - Different processor types have different instruction sets
- High-level languages: more abstract
  - E.g., C, C++, Ada, Java
  - Translated to processor instructions by a compiler

Instruction Execution
- Instructions are encoded in binary
  - Stored in the instruction memory
- A processor executes a program by repeatedly
  - Fetching the next instruction
  - Decoding it to work out what to do
  - Executing the operation
- Program counter (PC)
  - Register in the processor holding the address of the next instruction
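
The fetch, decode, execute cycle can be sketched in Python. This is a toy machine, not the Gumnut: the instruction names (load, add, halt), the single accumulator, and the run function are all hypothetical, chosen only to show how the PC steps through instruction memory.

```python
def run(program, max_steps=100):
    """Execute a toy program given as a list of (opcode, operand) tuples."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # single 8-bit accumulator (hypothetical register)
    for _ in range(max_steps):
        if pc >= len(program):
            break
        opcode, operand = program[pc]   # fetch
        pc += 1                         # PC now holds the next address
        if opcode == "load":            # decode, then execute
            acc = operand & 0xFF
        elif opcode == "add":
            acc = (acc + operand) & 0xFF
        elif opcode == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc, pc


result, final_pc = run([("load", 5), ("add", 7), ("halt", 0)])
```

With this program the accumulator ends at 12 and the PC at 3, the address after the halt instruction.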

Data and Endian-ness
- Instructions operate on data from the data memory
- Byte: 8-bit data
  - Data memory is usually byte addressed
- 16-bit, 32-bit, 64-bit words of data
[Figure: 8-bit, 16-bit, and 32-bit data in byte-addressed memory]
    Address   Little endian                  Big endian
    m         16-bit data, least sig. byte   16-bit data, most sig. byte
    m + 1     16-bit data, most sig. byte    16-bit data, least sig. byte
    n         32-bit data, least sig. byte   32-bit data, most sig. byte
    n + 1     32-bit data                    32-bit data
    n + 2     32-bit data                    32-bit data
    n + 3     32-bit data, most sig. byte    32-bit data, least sig. byte
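
As a rough illustration of the two byte orders, the following Python sketch splits a word into bytes in order of increasing address. The function names and the sample value 0x12345678 are assumptions made for the example.

```python
def to_bytes_little(value, size):
    """Bytes in increasing address order, least significant byte first."""
    return [(value >> (8 * i)) & 0xFF for i in range(size)]


def to_bytes_big(value, size):
    """Bytes in increasing address order, most significant byte first."""
    return list(reversed(to_bytes_little(value, size)))


little = to_bytes_little(0x12345678, 4)   # stored at n, n + 1, n + 2, n + 3
big = to_bytes_big(0x12345678, 4)
```

Here little is [0x78, 0x56, 0x34, 0x12] and big is [0x12, 0x34, 0x56, 0x78].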

The Gumnut Core
- A small 8-bit soft core
  - Can be used in FPGA designs
- Instruction set illustrates features typical of 8-bit cores and processors in general
- Programs written in assembly language
  - Each processor instruction written explicitly
  - Translated to binary representation by an assembler
- Resources available on companion web site

Gumnut Storage
[Figure: Gumnut storage elements]
- General-Purpose Registers: r0 to r7 (r0 shown holding 0)
- Program Counter: PC
- Condition Code Registers: C (Carry), Z (Zero)
- Data Memory (256 × 8-bit, 8-bit addresses): locations 0 to 255
- Instruction Memory (4K × 18-bit, 12-bit addresses): locations 0 to 4095

Arithmetic Instructions
- Operate on register data and put result in a register
  - add, addc, sub, subc
  - Can have immediate value operand
- Condition codes
  - Z: 1 if result is zero, 0 if result is non-zero
  - C: carry out of add/addc, borrow out of sub/subc
- addc and subc include C bit in operation

Arithmetic Instructions
- Examples
    add r3, r4, r1
    add r5, r1, 2
    sub r4, r4, 1
- Evaluate 2x + 1; x in r3, result in r4
    add r4, r3, r3 ; double x
    add r4, r4, 1  ; then add 1
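
The effect of 8-bit arithmetic on the Z and C condition codes can be modelled in Python. This is a sketch of the rules stated on the slides, not the Gumnut implementation; add8, sub8, and two_x_plus_one are hypothetical helper names.

```python
def add8(a, b, carry_in=0):
    """8-bit add: returns (result, Z, C), with C the carry out of bit 7."""
    total = a + b + carry_in
    result = total & 0xFF
    return result, int(result == 0), int(total > 0xFF)


def sub8(a, b, borrow_in=0):
    """8-bit subtract: returns (result, Z, C), with C the borrow out."""
    diff = a - b - borrow_in
    result = diff & 0xFF
    return result, int(result == 0), int(diff < 0)


def two_x_plus_one(x):
    """Double x (held in r3) by adding it to itself, then add 1."""
    r3 = x & 0xFF
    r4, z, c = add8(r3, r3)   # double x
    r4, z, c = add8(r4, 1)    # then add 1
    return r4, z, c


small = two_x_plus_one(20)    # fits in 8 bits
wrapped = two_x_plus_one(200) # 401 does not fit, so the result wraps
```

For x = 20 the result is 41. For x = 200 the doubling overflows the byte, so the 8-bit result is 145 rather than 401.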

Logical Instructions
- Operate on register data and put result in a register
  - and, or, xor, mask (and not)
  - Operate bitwise on 8-bit operands
  - Can have immediate value operand
- Condition codes
  - Z: 1 if result is zero, 0 if result is non-zero
  - C: always 0

Logical Instructions
- Examples
    and r3, r4, r5
    or r1, r1, 0x80   ; set r1(7)
    xor r5, r5, 0xFF  ; invert r5
- Set Z if least-significant 4 bits of r2 are 0101
    and r1, r2, 0x0F  ; clear high bits
    sub r0, r1, 0x05  ; compare with 0101
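
The mask-then-compare idiom above can be checked with a small Python model. The function name low_nibble_is_0101 is an assumption, and Python integers stand in for the 8-bit registers.

```python
def low_nibble_is_0101(r2):
    """Return the Z flag after masking r2 and comparing with 0101."""
    r1 = r2 & 0x0F              # and r1, r2, 0x0F : clear high bits
    diff = (r1 - 0x05) & 0xFF   # sub r0, r1, 0x05 : result itself is discarded
    return int(diff == 0)       # Z is 1 only when the low nibble was 0101


z_match = low_nibble_is_0101(0xA5)      # low nibble 0101
z_no_match = low_nibble_is_0101(0x57)   # low nibble 0111
set_bit7 = 0x01 | 0x80                  # or  r1, r1, 0x80
inverted = 0x0F ^ 0xFF                  # xor r5, r5, 0xFF
```

The subtraction writes to r0, so only the condition codes carry the outcome of the comparison.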

Shift Instructions
- Logical shift/rotate register data and put result in a register
  - shl, shr, rol, ror
  - Count specified as a literal operand
- Condition codes
  - Z: 1 if result is zero, 0 if result is non-zero
  - C: the value of the last bit shifted/rotated past the end of the byte

Shift Instructions
- Examples
    shl r4, r1, 3
    ror r2, r2, 4
- Multiply r4 by 8, ignoring overflow
    shl r4, r4, 3
- Multiply r4 by 10, ignoring overflow
    shl r1, r4, 1  ; multiply by 2
    shl r4, r4, 3  ; multiply by 8
    add r4, r4, r1
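
A Python sketch of the shift-and-add multiplication, assuming shl behaves as described on the slides (zeros shifted in, C taken from the last bit shifted out). The helpers shl8 and times_ten are hypothetical names.

```python
def shl8(value, count):
    """Logical shift left of an 8-bit value: returns (result, Z, C)."""
    result = value & 0xFF
    carry = 0
    for _ in range(count):
        carry = (result >> 7) & 1       # bit about to leave the byte
        result = (result << 1) & 0xFF
    return result, int(result == 0), carry


def times_ten(r4):
    """Compute 10 * r4 as 2 * r4 + 8 * r4, ignoring overflow."""
    r1, _, _ = shl8(r4, 1)      # shl r1, r4, 1 : multiply by 2
    r4, _, _ = shl8(r4, 3)      # shl r4, r4, 3 : multiply by 8
    return (r4 + r1) & 0xFF     # add r4, r4, r1


exact = times_ten(25)       # 250 fits in a byte
overflowed = times_ten(30)  # 300 does not fit in a byte
```

For 25 the result is 250. For 30 the true product 300 exceeds 255, so the byte holds 44, which is what "ignoring overflow" means in practice.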

Memory Instructions
- Transfer data between registers and data memory
- Compute address by adding an offset to a base register value
- Load register from memory
    ldm r1, (r2)+5
- Store from register to memory
    stm r1, (r4)-2
- Use r0 if base address is 0
    ldm r3, 23 is the same as ldm r3, (r0)+23
- Condition codes not affected

Memory Instructions
- Increment a 16-bit integer in memory
  - Little-endian: address of lsb in r2, msb in next location
    ldm r1, (r2)    ; increment lsb
    add r1, r1, 1
    stm r1, (r2)
    ldm r1, (r2)+1  ; increment msb
    addc r1, r1, 0  ; with carry
    stm r1, (r2)+1
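
The carry propagation in this sequence can be traced with a Python model of the data memory. The function increment16 and the sample address 0x10 are assumptions for illustration; a list of 256 integers stands in for the 256 × 8-bit data memory.

```python
def increment16(memory, addr):
    """Increment a little-endian 16-bit value stored at addr and addr + 1."""
    low = memory[addr] + 1              # ldm, then add r1, r1, 1
    carry = low >> 8                    # C: carry out of the low byte
    memory[addr] = low & 0xFF           # stm r1, (r2)
    high = memory[addr + 1] + carry     # ldm, then addc r1, r1, 0
    memory[addr + 1] = high & 0xFF      # stm r1, (r2)+1


mem = [0] * 256
mem[0x10] = 0xFF    # least significant byte
mem[0x11] = 0x01    # most significant byte, so the value is 0x01FF
increment16(mem, 0x10)
value = mem[0x10] | (mem[0x11] << 8)
```

After the call the stored value is 0x0200: the low byte wraps to 0x00 and the carry is added into the high byte. The model relies on the slide's statement that memory instructions leave the condition codes unchanged, so C survives from the add to the addc.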

Input/Output Instructions
- I/O controllers have registers that govern their operation
  - Each has an address, like data memory
  - Gumnut has separate data and I/O address spaces
- Input from I/O register
    inp r3, 157 is the same as inp r3, (r0)+157
- Output to I/O register
    out r3, (r7) is the same as out r3, (r7)+0
- Condition codes not affected
- Further examples in Chapter 8

Branch Instructions
- Programs can evaluate conditions and take alternate courses of action
  - Condition codes (Z, C) represent outcomes of arithmetic/logical/shift instructions
- Branch instructions examine Z or C
  - bz, bnz, bc, bnc
  - Add a displacement to PC if condition is true
  - Specifies how many instructions forward or backward to skip
  - Counting from instruction after branch

Branch Example
- Elapsed seconds in location 100
  - Increment, wrapping to 0 after 59
    ldm r1, 100
    add r1, r1, 1
    sub r0, r1, 60  ; Z set if r1 = 60
    bnz +1          ; Skip to store if Z is 0
    add r1, r0, 0
    stm r1, 100
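
The same wrap-around logic, written as a Python sketch with the corresponding instruction beside each step. The function name tick is hypothetical, and a plain list models data memory.

```python
def tick(memory, addr=100):
    """Increment the seconds count at addr, wrapping to 0 after 59."""
    r1 = memory[addr]                   # ldm r1, 100
    r1 = (r1 + 1) & 0xFF                # add r1, r1, 1
    z = int(((r1 - 60) & 0xFF) == 0)    # sub r0, r1, 60 : Z set if r1 = 60
    if z:                               # bnz +1 skips the clear when Z is 0
        r1 = 0                          # add r1, r0, 0
    memory[addr] = r1                   # stm r1, 100


mem = [0] * 256
mem[100] = 58
tick(mem)
after_first = mem[100]
tick(mem)
after_second = mem[100]
```

Starting from 58, the first call stores 59 and the second wraps the count to 0.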

Jump Instruction
- Unconditionally skips forward or backward to specified address
  - Changes the PC to the address
- Example: if r1 = 0, clear data location 100 to 0; otherwise clear location 200 to 0
  - Assume instructions start at address 10
    10: sub r0, r1, 0
    11: bnz +2
    12: stm r0, 100
    13: jmp 15
    14: stm r0, 200
    15: ...

Subroutines
- A sequence of instructions that perform some operation
- Can call them from different parts of a program using a jsb instruction
- Subroutine returns with a ret instruction
[Figure: two jsb m instructions in a program, each entering the subroutine instructions at address m, which end with ret]

Subroutine Example
- Subroutine to increment second count
  - Address of count in r2
    ldm r1, (r2)
    add r1, r1, 1
    sub r0, r1, 60
    bnz +1
    add r1, r0, 0
    stm r1, (r2)
    ret
- Call to increment locations 100 and 102
    add r2, r0, 100
    jsb 20
    add r2, r0, 102
    jsb 20

Return Address Stack
- The jsb saves the return address for use by the ret
  - But what if the subroutine includes a jsb?
- Gumnut core includes an 8-entry push-down stack of return addresses
[Figure: the stack holding return addresses for the first and second calls, then for the first, second, and third calls]
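
A push-down stack of return addresses can be sketched as follows. The ReturnStack class and the sample addresses are hypothetical. Raising an error on a ninth nested call is an assumption of this model, since the slides do not say what the hardware does when the stack is full.

```python
class ReturnStack:
    """Model of a fixed-depth push-down stack of return addresses."""

    def __init__(self, depth=8):
        self.depth = depth
        self.entries = []

    def push(self, return_addr):
        """Called on jsb: save the address of the instruction after the call."""
        if len(self.entries) == self.depth:
            # Assumption: overflow is treated as an error in this model.
            raise OverflowError("return address stack is full")
        self.entries.append(return_addr)

    def pop(self):
        """Called on ret: the most recently saved address is used first."""
        return self.entries.pop()


stack = ReturnStack()
stack.push(6)                   # first call: jsb at address 5
stack.push(22)                  # second call: jsb at address 21, inside the subroutine
inner_return = stack.pop()      # ret from the nested subroutine
outer_return = stack.pop()      # ret from the first subroutine
```

The nested ret resumes at 22 and the outer ret at 6, so returns unwind in the reverse order of the calls.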

Miscellaneous Instructions
- Instructions supporting interrupts
  - See Chapter 8
    reti   Return from interrupt
    enai   Enable interrupts
    disi   Disable interrupts
    wait   Wait for an interrupt
    stby   Stand by in low power mode until an interrupt occurs

The Gumnut Assembler
- Gasm: translates assembly programs
  - Generates memory images for program text (binary-coded instructions) and data
  - See documentation on web site
- Write a program as a text file
  - Instructions
  - Directives
  - Comments
  - Use symbolic labels

Example Program
; Program to determine greater of value_1 and value_2

         text
         org 0x000           ; start here on reset
         jmp main

; Data memory layout

         data
value_1: byte 10
value_2: byte 20
result:  bss 1

; Main program

         text
         org 0x010
main:    ldm r1, value_1     ; load values
         ldm r2, value_2
         sub r0, r1, r2      ; compare values
         bc value_2_greater
         stm r1, result      ; value_1 is greater
         jmp finish
value_2_greater:
         stm r2, result      ; value_2 is greater
finish:  jmp finish          ; idle loop

Gumnut Instruction Encoding
- Instructions are a form of information
  - Can be encoded in binary
- Gumnut encoding
  - 18 bits per instruction
  - Divided into fields representing different aspects of the instruction
    - Opcodes and function codes
    - Register numbers
    - Addresses
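
Gumnut Instruction Encoding
[Figure: 18-bit instruction formats; field widths in bits are shown in parentheses]
    Arith/Logical Register:   1 1 1 0 (4) | rd (3) | rs (3) | rs2 (3) | unused (2) | fn (3)
    Arith/Logical Immediate:  0 (1) | fn (3) | rd (3) | rs (3) | immed (8)
    Shift:                    1 1 0 (3) | unused (1) | rd (3) | rs (3) | count (3) | unused (3) | fn (2)
    Memory, I/O:              1 0 (2) | fn (2) | rd (3) | rs (3) | offset (8)
    Branch:                   1 1 1 1 1 0 (6) | fn (2) | unused (2) | disp (8)
    Jump:                     1 1 1 1 0 (5) | fn (1) | addr (12)
    A further format with opcode bits 1 1 1 1 1 1 appears in the figure; its label and field layout did not survive.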

Encoding Examples
- Encoding for addc r3, r5, 24
  - Arithmetic immediate, fn = 001
  - Fields: 0 | fn | rd | rs | immed (widths 1, 3, 3, 3, 8)
  - Bits: 0 001 011 101 00011000
  - Instruction encoded by 05D18
- Encoding for bnc -4
  - Branch
  - Fields: 1 1 1 1 1 0 | fn | unused | disp (widths 6, 2, 2, 8)
  - Bits: 111110 11 00 11111100
  - Instruction encoded by 3ECFC
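
The two encodings can be reproduced by packing the fields into an 18-bit word. This Python sketch assumes the field positions implied by the widths in the format figure (1, 3, 3, 3, 8 for arithmetic immediate; 6, 2, 2, 8 for branch) and takes fn = 11 for bnc from the bit pattern in the example. The function names are hypothetical.

```python
def encode_arith_immed(fn, rd, rs, immed):
    """Pack 0 | fn(3) | rd(3) | rs(3) | immed(8) into 18 bits."""
    return (fn << 14) | (rd << 11) | (rs << 8) | (immed & 0xFF)


def encode_branch(fn, disp):
    """Pack 111110 | fn(2) | unused(2) | disp(8) into 18 bits."""
    # The displacement is signed, so it is masked to 8 bits (two's complement).
    return (0b111110 << 12) | (fn << 10) | (disp & 0xFF)


addc_word = encode_arith_immed(0b001, 3, 5, 24)   # addc r3, r5, 24
bnc_word = encode_branch(0b11, -4)                # bnc -4
addc_hex = f"{addc_word:05X}"
bnc_hex = f"{bnc_word:05X}"
```

Packing the fields this way gives 05D18 for addc r3, r5, 24 and 3ECFC for bnc -4. The leading hex digit of any branch is 3, because the top two bits of the opcode 111110 are both 1.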

Other Instruction Sets
- 8-bit cores and microcontrollers
  - Xilinx PicoBlaze: like Gumnut
  - 8051, and numerous like it
    - Originated as 8-bit microprocessors
    - Instructions encoded as one or more bytes
    - Instruction set is more complex and irregular
    - Complex instruction set computer (CISC)
    - Cf. reduced instruction set computer (RISC)
- 16-, 32- and 64-bit cores
  - Mostly RISC
  - E.g., PowerPC, ARM, MIPS, Tensilica, …

Instruction and Data Memory
- In embedded systems
  - Instruction memory is usually ROM, flash, SRAM, or combination
  - Data memory is usually SRAM
    - DRAM if large capacity needed
- Processor/memory interfacing
  - Gluing the signals together

Example: Gumnut Memory
[Figure: gumnut core with an instruction ROM (adr, dat_o, en) and a data SRAM (adr, dat_i, dat_o, en, we). Core signals: clk_i, rst_i; inst_adr_o, inst_dat_i, inst_cyc_o, inst_stb_o, inst_ack_i; data_adr_o, data_dat_i, data_dat_o, data_cyc_o, data_stb_o, data_we_o, data_ack_i. Two D flip-flops clocked by clk also appear.]

Example: Gumnut Memory
always @(posedge clk)  // Instruction memory
  if (inst_cyc_o && inst_stb_o) begin
    // Read the addressed instruction and acknowledge the access
    inst_dat_i <= inst_ROM[inst_adr_o[10:0]];
    inst_ack_i <= 1'b1;
  end
  else
    inst_ack_i <= 1'b0;

Example: Gumnut Memory
always @(posedge clk)  // Data memory
  if (data_cyc_o && data_stb_o)
    if (data_we_o) begin
      // Write: store the data and reflect it on the read port
      data_RAM[data_adr_o] <= data_dat_o;
      data_dat_i <= data_dat_o;
      data_ack_i <= 1'b1;
    end
    else begin
      // Read: return the addressed byte
      data_dat_i <= data_RAM[data_adr_o];
      data_ack_i <= 1'b1;
    end
  else
    data_ack_i <= 1'b0;

Example: Microcontroller Memory
[Figure: 8051 connected to an SRAM through a latch (D, LE, Q). 8051 signals: P0, P2, ALE, PSEN, RD, WR. SRAM signals: A(16), A(15..8), A(7..0), D, CE, WE, OE.]

32-bit Memory
- Four bytes per memory word
  - Little-endian: lsb at least address
  - Big-endian: msb at least address
[Figure: byte addresses laid out four per word]
    0  1  2  3
    4  5  6  7
    8  9 10 11
- Partial-word read
  - Read all bytes, processor selects those needed
- Partial-word write
  - Use byte-enable signals
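
Partial-word access can be modelled in Python by treating a 32-bit word as four byte lanes. The function names are hypothetical, and numbering lane 0 as the least significant byte is an assumption of this sketch (a little-endian view), not something the slides specify.

```python
def write_word(word, data, byte_enable):
    """Update only the byte lanes of word whose enable is set."""
    for lane in range(4):
        if byte_enable[lane]:
            mask = 0xFF << (8 * lane)
            word = (word & ~mask) | (data & mask)
    return word & 0xFFFFFFFF


def read_byte(word, lane):
    """Partial-word read: the whole word is read, then one lane is selected."""
    return (word >> (8 * lane)) & 0xFF


old = 0x11223344
new = write_word(old, 0xAABBCCDD, [True, False, False, True])
picked = read_byte(new, 3)
```

Only lanes 0 and 3 are enabled, so the stored word becomes 0xAA2233DD and the two middle bytes keep their old values.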

Example: MicroBlaze Memory
[Figure: MicroBlaze connected to four SSRAMs (each with A, D_in, D_out, en, wr, clk), one per byte of the data bus: bits 0:7, 8:15, 16:23, 24:31. MicroBlaze signals: Addr, Data_Write, Data_Read, AS, Read_Strobe, Write_Strobe, Byte_Enable(0) to Byte_Enable(3), Ready, Clk. Other labels: 2:16, +V.]

Cache Memory
- For high-performance processors
  - Memory access time is several clock cycles
  - Performance bottleneck
- Cache memory
  - Small fast memory attached to a processor
  - Stores most frequently accessed items, plus adjacent items
  - Locality: those items are most likely to be accessed again soon

Cache Memory
- Memory contents divided into fixed-sized blocks (lines)
  - Cache copies whole lines from memory
- When processor accesses an item
  - If item is in cache: hit - fast access
    - Occurs most of the time
  - If item is not in cache: miss
    - Line containing item is copied from memory
    - Slower, but less frequent
    - May need to replace a line already in cache
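
Hit and miss behaviour can be illustrated with a small Python model. The slides do not name a cache organization, so this sketch assumes a direct-mapped cache, in which each memory line can occupy only one cache slot. The class name, the sizes (4 lines of 4 bytes), and the address trace are all hypothetical.

```python
class DirectMappedCache:
    """Tracks which memory line each cache slot holds; data is not modelled."""

    def __init__(self, num_lines=4, line_size=4):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the line and return False."""
        block = address // self.line_size   # which memory line holds the item
        index = block % self.num_lines      # the one slot that line may use
        tag = block // self.num_lines       # identifies the line in that slot
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag              # replaces any line already there
        return False


cache = DirectMappedCache()
trace = [0, 1, 2, 3, 4, 0, 16, 0]
outcomes = [cache.access(a) for a in trace]
```

Addresses 1 to 3 hit because the miss on address 0 brought in the whole line, which is the benefit of storing adjacent items. Address 16 maps to the same slot as address 0 and evicts it, so the final access to 0 misses again.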

Fast Main Memory Access
- Optimize memory for line access by cache
- Wide memory
  - Read a line in one access
- Burst transfers
  - Send starting address, then read successive locations
- Pipelining
  - Overlapping stages of memory access
  - E.g., address transfer, memory operation, data transfer
- Double data rate (DDR), Quad data rate (QDR)
  - Transfer on both rising and falling clock edges

Summary
- Embedded computer
  - Processor, memory, I/O controllers, buses
- Microprocessors, microcontrollers, and processor cores
  - Soft-core processors for ASIC/FPGA
- Processor instruction sets
  - Binary encoding for instructions
  - Assembly language programs
- Memory interfacing