Post on 20-Dec-2015
![Page 1: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/1.jpg)
ARCES University of Bologna
Reconfigurable Architectures
Andrea Lodi
![Page 2: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/2.jpg)
SoC trends
• Increasing mask cost (~ 3M$)
• Increasing design complexity
• Increasing design time (~ 3M$)
• Rapidly changing communication standards
• Low-power design in wireless environment
• Increasing algorithmic complexity
requirements
![Page 3: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/3.jpg)
Product life cycle
[Figure: sales versus time across the growth, maturity, and decrease phases, with one curve labeled "time-to-market met" and one labeled "time-to-market failed"; the gap between them is marked as the loss.]
![Page 4: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/4.jpg)
Trends in wireless systems
• Increased on-chip transistor density
• Increased design complexity
[Figure: millions of transistors per chip (0 to 400) and technology (nm), 1997 to 2009.]
• Increased algorithmic complexity
• Low battery capacity growth
[Figure: algorithm complexity, Moore's law, and battery capacity, 1997 to 2009.]
• Demand for reusability and flexibility
• Demand for high performance and energy efficiency
![Page 5: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/5.jpg)
Digital architecture design space
![Page 6: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/6.jpg)
Parallelism in computation
• Thread level parallelism
• Instruction level parallelism (ILP)
• Pipeline (loop level)
• Fine-grain parallelism (bit/byte-level)
![Page 7: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/7.jpg)
Instruction level parallelism
[Figure: data-flow graph built from adders, multipliers, and a subtractor over the inputs a, b, c, d, e and the constant 3, shown next to its ASIC implementation.]
![Page 8: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/8.jpg)
Spatial vs. Temporal Computing
• Spatial (ASIC): $Ax^2 + Bx + C$
• Temporal (Processor): $(Ax + B)x + C$
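The two forms of the polynomial can be compared with a minimal C sketch. The function names `poly_direct` and `poly_horner` and the use of plain `int` operands are assumptions made for illustration, not part of the slides. The direct form contains three multiplications and two additions, some of which are independent and could be laid out side by side in hardware. The factored form needs only two multiplications and two additions, but each step depends on the previous one, which suits sequential execution.

```c
/* Direct form: A*x^2 + B*x + C.
 * x*x and B*x are independent, so hardware could compute them in parallel. */
int poly_direct(int a, int b, int c, int x)
{
    return a * x * x + b * x + c;
}

/* Factored (Horner) form: (A*x + B)*x + C.
 * Fewer operations, but every step waits for the previous result. */
int poly_horner(int a, int b, int c, int x)
{
    return (a * x + b) * x + c;
}
```

Both functions return the same value for the same arguments; they differ only in how the work is ordered.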
![Page 9: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/9.jpg)
Superscalar/VLIW processors
• FU limitations
• Register file size limitation
• Crossbar inefficiency
![Page 10: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/10.jpg)
Byte-level parallelism in processors
• MMX technology: 57 new instructions
• Byte and half-word parallel computation
• SIMD execution model
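The idea of byte-parallel computation can be imitated in portable C. The sketch below is not MMX code: it is a software emulation, under the assumption of four unsigned bytes packed into one 32-bit word, and the name `add_bytes_packed` is hypothetical. It adds the four byte lanes at once with wrap-around and keeps carries from crossing lane boundaries.

```c
#include <stdint.h>

/* Add four packed bytes lane by lane (each lane wraps modulo 256).
 * The low 7 bits of every lane are added normally; the top bit of each
 * lane is then fixed up with XOR so no carry leaks into the next lane. */
uint32_t add_bytes_packed(uint32_t a, uint32_t b)
{
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    uint32_t top  = (a ^ b) & 0x80808080u;
    return low7 ^ top;
}
```

A real SIMD instruction does the same job in one operation on dedicated lanes; the masking here only exists because a scalar adder has no lane boundaries.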
![Page 11: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/11.jpg)
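Bit-level parallelism
unsigned reverse (unsigned v) {
  unsigned x, r = 0;
  for (x = 0; x < WIDTH; x++) {
    r = r << 1;      /* make room for the next bit */
    r |= v & 1;      /* append the lowest bit of v */
    v = v >> 1;
  }
  return r;
}
[Figure: reverse in hardware, from v to r.]
int popcount (unsigned v) {
  int r = 0;
  while (v) {
    if (v & 1) r++;  /* count the lowest bit if set */
    v = v >> 1;
  }
  return r;
}
[Figure: popcount in hardware as a tree of adders, from v to r.]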
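The bit-serial popcount loop can also be written as a tree of additions, which is closer to what an adder tree in hardware does. The sketch below assumes a 32-bit operand, and the name `popcount_tree` is hypothetical. Each line adds neighbouring fields of the word in parallel, doubling the field width at every level.

```c
#include <stdint.h>

/* Count set bits with a five-level addition tree on a 32-bit word. */
uint32_t popcount_tree(uint32_t v)
{
    v = (v & 0x55555555u) + ((v >> 1) & 0x55555555u);  /* 16 sums of 2 bits */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);  /* 8 sums of 4 bits  */
    v = (v & 0x0F0F0F0Fu) + ((v >> 4) & 0x0F0F0F0Fu);  /* 4 sums of 8 bits  */
    v = (v & 0x00FF00FFu) + ((v >> 8) & 0x00FF00FFu);  /* 2 sums of 16 bits */
    v = (v & 0x0000FFFFu) + (v >> 16);                 /* final sum         */
    return v;
}
```

The loop version takes a number of iterations proportional to the word width, while the tree takes a number of levels proportional to its logarithm.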
![Page 12: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/12.jpg)
Pipeline parallelism
for (j=0; j<MAX; j++)
  b[j] = popcount(a[j]);
[Figure: the popcount adder tree from v to r, pipelined with registers between the adder stages.]
![Page 13: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/13.jpg)
FPGA
FPGA (Field-Programmable Gate Array) composed of 2 elements:
• Array of CLBs (configurable logic blocks), each composed of:
– 1 or a few small LUTs (4:1 or 3:1)
– Control logic: muxes controlled by configuration bits
– Dedicated computational logic (carry chain …)
• Configurable routing network connecting the CLBs, composed of:
– Wires of different lengths
– Connection blocks connecting CLBs to the routing network
– Switch blocks connecting routing wires
The LUT contents, together with the configuration bits that program the CLBs and the routing network, make up the FPGA configuration, which determines the function implemented.
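How a LUT turns configuration bits into logic can be shown with a small C model. It assumes a 4-input, 1-output LUT whose 16 configuration bits are stored in one 16-bit word; the name `lut4_eval` and this bit ordering are illustrative assumptions, not a description of any particular device.

```c
#include <stdint.h>

/* Evaluate a 4-input LUT: the four input bits form an index (0..15)
 * that selects one bit of the 16-bit configuration word. */
unsigned lut4_eval(uint16_t config, unsigned inputs)
{
    return (config >> (inputs & 0xFu)) & 1u;
}
```

With this model, the configuration word 0x8000 implements a 4-input AND (only index 15 returns 1) and 0x6996 implements a 4-input XOR. Changing the function means changing only the stored bits, not the circuit.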
![Page 14: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/14.jpg)
Configurable logic block
![Page 15: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/15.jpg)
Xilinx CLB
• Xilinx CLB 4000 series:
– 11 input, 4 output bits
– 3 LUTs
– Carry logic
– 2 output registers
![Page 16: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/16.jpg)
Configurable routing network
![Page 17: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/17.jpg)
Example
![Page 18: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/18.jpg)
Density Comparison
![Page 19: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/19.jpg)
FPGA vs. Processor
FPGA (computing in space)
• Parallel execution
• Configurable in $10^2$-$10^3$ cycles
• Fine-grained data
• Application-specific operators
• Large area (switches, SRAM)
• Entire applications don’t fit
• Slow synthesis, P&R tools
Processor (computing in time)
• Sequential execution
• Programmable every cycle
• Fixed-size operands
• Basic operators (ALU)
• Compact
• Handles complex control flow
• Fast compilers
![Page 20: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/20.jpg)
Reconfigurable processors
But:
• 90% of execution time is spent in computational kernels:
– FPGAs 10-100x speed-up over processors
– FPGAs 10-100x denser than processors (bit-ops/$\lambda^2$s)
• Reconfigurable processor: RISC + FPGA
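As a rough worked estimate of what these figures imply for a whole application, assume Amdahl's law applies, that the kernels account for a fraction $f = 0.9$ of execution time, and that only the kernels are accelerated by a factor $s$. The symbols $f$, $s$, and $S$ are introduced here for illustration and do not appear in the slides.

$$S = \frac{1}{(1 - f) + f/s}$$

$$s = 10:\quad S = \frac{1}{0.1 + 0.09} \approx 5.3 \qquad\qquad s = 100:\quad S = \frac{1}{0.1 + 0.009} \approx 9.2$$

Under these assumptions the remaining 10% of sequential code caps the overall gain below 10x, which is one reason to keep a processor next to the array for the non-kernel code.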
![Page 21: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/21.jpg)
Reconfigurable processor architecture
• Hybrid architectures:
– RISC processor
– FPGA
![Page 22: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/22.jpg)
Computational models
• RC Array: IO Processor/Interface logic
• Attached processor
– PipeRench, T-Recs
• ISA Extension
– Function unit: PRISC, OneChip, Chimaera
– Coprocessor: Garp, NAPA, Molen
![Page 23: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/23.jpg)
IO Processor/Interface Logic
• Logic used in place of
– ASIC environment customization
– external FPGA/PLD devices
• Looks like IO peripheral to processor
• Example
– protocol handling
– stream computation (compression, encryption)
– peripherals
– sensors, actuators
• Case for:
– Always have some system adaptation to do
– Modern chips have capacity to hold processor + glue logic
– Reduce part count
– Glue logic varies
– Many protocols, services
– Only need a few at a time
![Page 24: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/24.jpg)
Example: Interface/Peripherals
• Triscend E5
![Page 25: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/25.jpg)
Instruction Set Extension
• Instruction Bandwidth
– Processor can only describe a small number of basic computations in a cycle
• $I$ bits → $2^I$ operations
– This is a small fraction of the operations one could do even in terms of $w \times w \to w$ Ops
• $w2^{2^{(2w)}}$ operations
– Processor could have to issue $w2^{(2^{(2w)} - I)}$ operations just to describe some computations
– An a priori selected base set of functions could be very bad for some applications
![Page 26: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/26.jpg)
Instruction Set Extension
• Idea:
– provide a way to augment the processor’s instruction set
– with operations needed by a particular application
![Page 27: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/27.jpg)
Architectural Models for I.S.A. extension
PLEIADES (Zhang et al., 2000)
CPU surrounded by a collection of application-specific custom computing devices
• High performance
• Overdesigned for most applications
• Difficult to program
XTENSA (Tensilica Inc., 2002)
RISC CPU featuring application-specific function units optionally inserted in the processor pipeline
• Good performance
• Easy to program
• Configured at mask level
![Page 28: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/28.jpg)
Dynamic ISA Extension models
Standard processor coupled with embedded programmable logic, where application-specific functions are dynamically re-mapped depending on the performed algorithm
1: Coprocessor model
2: Function unit model
![Page 29: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/29.jpg)
Coprocessor model: Garp
• Explicit instructions moving data to and from the array
• High communication overhead (long latency array operations)
• Processor stalled each time the array is active
• Array performs at TASK level (very coarse grain)
• 10-20x on stream, feed-forward operations
• 2-3x when data dependencies limit pipelining
Callahan, Hauser, Wawrzynek, 2000
![Page 30: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/30.jpg)
Function unit model: PRISC
• Array fits in the RISC pipeline
• No communication overhead
• Some degree of parallelism between function units
• Gate array performs combinatorial instructions ONLY (very fine grain)
• Low speedup figures (2x/3x)
Razdan, Smith, 1994
![Page 31: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/31.jpg)
Function Unit Model: pros
• No communication overhead:
– Strict synergy between FPGA and other function units
– FPGA can be used frequently even for small functions
– Small reconfigurable array area
• Flow control handled by the core
• Memory access handled by the core
• Easy instruction set extension
• Configuration streams compiled from C
![Page 32: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/32.jpg)
EXTENDIBLE INSTRUCTION SET RISC ARCHITECTURE
32-bit load/store RISC architecture (5-stage pipeline)
VLIW Elaboration:
• Concurrent fetch and execution of two 32-bit instructions per cycle
• Fully bypassed, to minimize pipeline stalls (Average of 10/20% for most computational cores)
Set of specialized functional units:
• Multiply/MAC Unit
• Branch/Decrement Unit
• ALU featuring “MMX” byte-wide concurrent operations
Embedded reconfigurable device for dynamic ISA extension:
• DSP-oriented reconfigurable functional unit (PiCoGA)
• Fully configurable at execution time
• Elaboration and configuration controlled by asm instructions inserted in C source code
• PiCoGA used as a programmable data-path with independent pipeline structure
![Page 33: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/33.jpg)
XiRisc Architecture
![Page 34: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/34.jpg)
Dynamic Instruction Set Extension
![Page 35: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/35.jpg)
Dynamic Instruction Set Extension
…..
pgaload …..
…..
…..
pgaop $3,$4,$5
…...
…...
Add $8, $3
[Figure: register file and configuration memory.]
![Page 36: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/36.jpg)
PiCoGA Architecture
PiCoGA (Pipelined Configurable Gate Array): embedded datapath for dynamic I.S.A. extension
• Dynamically reconfigurable
• Structured in rows activated in data-flow fashion by the PiCoGA control unit
• Can hold a state
• pGA-op latency depends on the specific mapped function
• Functionality is determined from the DFG extracted from C code
[Figure: PiCoGA array of PicoRows (synchronous elements) with the PiCoGA control unit and the processor interface.]
![Page 37: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/37.jpg)
Pico-cell Description
• 4x32-bit input data from Reg File
• 2x32-bit output data to Reg File
[Figure: pico-cell (RLC) with input logic, 16x2 LUTs, carry chain, output logic and registers, enable (EN), loop-back, PiCoGA control unit signals, configuration bus, 12 global lines to/from the Reg File, input and output connect blocks, and a switch block.]
![Page 38: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/38.jpg)
Computing on PiCoGA
[Figure: a data flow graph mapped onto the PiCoGA as Pga_op1 and Pga_op2, with data in, data out, and the PiCoGA control unit.]
![Page 39: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/39.jpg)
Multi-context Array
• Four configuration planes are available, one of them executing
• Plane switch takes just 1 clock cycle
• While a plane is executing another may be reconfigured → No reconfiguration time overhead
[Figure: configuration cache holding Func. 1 to Func. n, feeding the PiCoGA.]
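The multi-context idea can be illustrated with a small software model. This is only a sketch under stated assumptions: the type `multi_context_array`, the functions `load_plane`, `switch_plane`, and `execute`, and the two sample operations are all hypothetical names, and a "mapped function" is modeled as an ordinary C function pointer. The only facts taken from the slide are the four planes, the single active plane, and the rule that a plane other than the executing one may be reloaded.

```c
#include <stddef.h>

#define NUM_PLANES 4  /* four configuration planes, as stated on the slide */

typedef int (*pga_function)(int, int);  /* hypothetical: one mapped function */

typedef struct {
    pga_function plane[NUM_PLANES];  /* NULL means "not configured" */
    int active;                      /* index of the executing plane */
} multi_context_array;

/* Sample operations standing in for two different configurations. */
int sample_add(int a, int b) { return a + b; }
int sample_sub(int a, int b) { return a - b; }

/* Reconfigure a plane in the background. Model assumption: the plane
 * that is currently executing cannot be overwritten. */
int load_plane(multi_context_array *mc, int idx, pga_function f)
{
    if (idx < 0 || idx >= NUM_PLANES || idx == mc->active)
        return -1;
    mc->plane[idx] = f;
    return 0;
}

/* Make an already configured plane the active one. */
int switch_plane(multi_context_array *mc, int idx)
{
    if (idx < 0 || idx >= NUM_PLANES || mc->plane[idx] == NULL)
        return -1;
    mc->active = idx;
    return 0;
}

/* Run the function held in the active plane. */
int execute(const multi_context_array *mc, int a, int b)
{
    return mc->plane[mc->active](a, b);
}
```

In this model, loading plane 1 while plane 0 is active leaves the results of `execute` unchanged until `switch_plane` is called, which mirrors the claim that reconfiguration can be hidden behind execution.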
![Page 40: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/40.jpg)
Architecture Flexibility
[Flowchart: a chain of yes/no decisions leading to one of two outcomes.]
• Parallelism to exploit? (Ex: Turbo Decod., Motion Est.)
• Bit-level operations? (Ex: DES, Reed-Solomon)
• MAC intensive? (Ex: FFT, Scalar product)
• Memory intensive? (Ex: DCT, Motion Est.)
Outcomes:
• Speed-up from pGA (5x – 100x)
• Speed-up from DSP instructions and VLIW (1.5x – 2x)
Improvements for a large number of Data & Signal Processing algorithms
![Page 41: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/41.jpg)
Programming XiRisc: Restrictions
• Fixed-point algorithms
• Variable size specification at the bit level
Not supported yet:
• Dynamic memory allocation
• Math library
• Operating System
![Page 42: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/42.jpg)
XiRisc Compilation Flow
[Figure: compilation flow with the blocks File.c, C compiler, profiler, PiCoGA op, configuration library, PiCoGA configurator, configuration bit stream, and software simulation.]
![Page 43: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/43.jpg)
Example: Motion Estimation
• Sum of Absolute Difference (SAD)
• High instruction-level and inter-iteration parallelism
![Page 44: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/44.jpg)
Data Flow Graph
• Pixel-pixel absolute difference: Abs (p1[i] – p2[i]), where p1[i], p2[i] are pixels
• Absolute difference sum tree
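The computation behind this data flow graph can be written as a short reference loop in C. This is a plain software sketch, not the PiCoGA mapping: the function name `sad`, the 8-bit pixel type, and the block length parameter `n` are assumptions made for illustration.

```c
#include <stdlib.h>

/* Sum of absolute differences between two pixel blocks of n pixels. */
unsigned sad(const unsigned char *p1, const unsigned char *p2, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* pixel-pixel absolute difference: Abs(p1[i] - p2[i]) */
        sum += (unsigned)abs((int)p1[i] - (int)p2[i]);
    }
    return sum;
}
```

Every absolute difference is independent of the others, and the additions can be grouped into a tree, which is the parallelism the slides refer to.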
![Page 45: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/45.jpg)
SAD
[Figure: Sum of Absolute Difference datapath with blocks labeled AD1 to AD4 and SAD8, reading from the register file and writing back to the register file.]
![Page 46: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/46.jpg)
Griffy Compiler
[Figure: Griffy compiler flow with the blocks high-level C compiler, DFG-based description, mapping, place & route, and configuration bits, plus an emulation function with latency and issue delay.]
![Page 47: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/47.jpg)
Performance evaluation
• Emulation function
• Latency and Issue-Delay back-annotation
• Profiling
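One way latency and issue delay could feed a cycle estimate is sketched below. It assumes the usual pipeline reading of the two terms: latency is the number of cycles from issue to result, and issue delay is the minimum number of cycles between two consecutive issues of the same operation. Both that interpretation and the function name `pga_cycles` are assumptions, not details given in the slides.

```c
/* Estimated cycles for n back-to-back operations on a pipelined unit:
 * the first result appears after `latency` cycles, and each further
 * operation adds one issue delay. */
unsigned long pga_cycles(unsigned long n, unsigned latency, unsigned issue_delay)
{
    if (n == 0)
        return 0;
    return (unsigned long)latency + (n - 1) * (unsigned long)issue_delay;
}
```

For long streams the issue delay dominates the estimate, while the latency matters mostly for isolated operations.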
![Page 48: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/48.jpg)
Motion Estimation: Results
Motion estimation:
• 16 SAD operations in parallel
• PiCoGA occupation: ~100%
• Speed-up: 7x (with respect to standard XiRisc)
MPEG preliminary result:
• H.261 standard QCIF (176x144): 10 frames/sec
![Page 49: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/49.jpg)
Reed-Solomon Encoder: Results
Encoder RS(15,9): 4-bit symbols
• PiCoGA occupation: ~25%
• Speed-up: 37x
• Throughput: 70.6 Mb/sec
Encoder RS(255,239), widely used: 8-bit symbols
• PiCoGA occupation: ~60%
• Speed-up: 135x
• Throughput: 187.1 Mb/sec
![Page 50: ARCES University of Bologna Reconfigurable Architectures Andrea Lodi](https://reader034.vdocuments.us/reader034/viewer/2022051516/56649d485503460f94a234e5/html5/thumbnails/50.jpg)
Speed-up and Power Consumption

| Algorithm | Energy consumption reduction (vs. std. XiRisc) | Speed-up (vs. std. XiRisc) |
|---|---|---|
| DES encryption | 89% | 13.5x |
| Turbo decoder | 75% | 11.7x |
| Motion prediction | 46% | 4.5x |
| Median filter | 60% | 7.7x |
| CRC | 49% | 4.3x |