![Page 1: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/1.jpg)
Philips Research, ICCD 1999
TriMedia CPU64
Application Domain and Benchmark Suite
A.K. Riemens
Philips Research
Eindhoven, The Netherlands
![Page 2: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/2.jpg)
Outline
• Introduction
• Approach
• Benchmark suite
• Example
• Results
![Page 3: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/3.jpg)
Design Problem
[Block diagram: VLIW CPU with I$ and D$, together with SDRAM, video-in, video-out, audio-in, audio-out, PCI bridge, serial I/O, timers and I2C I/O]
![Page 4: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/4.jpg)
Application Domain
• High volume consumer electronics products: future TV, home theatre, set-top box, etc.
• Embedded core
• Media processing: audio, video, graphics, communication
![Page 5: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/5.jpg)
Media Processing CPU
Design considerations:
• Cost (embedded in high volume products)
• Performance for the application domain
• Ease of use (programming model)
Benchmark suite needed
![Page 6: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/6.jpg)
Application: Natural Motion
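[Figure: along the picture-number axis, a picture at position n-1/2 lies between pictures n-1 and n and is reached with displacements -D/2 and D/2]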
![Page 7: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/7.jpg)
Outline
• Introduction
• Approach
• Benchmark suite
• Example
• Results
![Page 8: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/8.jpg)
Project Approach: Y-chart
[Diagram: Machine Description and Benchmark Applications feed the Compiler and Simulator, which produce Performance Numbers]
![Page 9: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/9.jpg)
Outline
• Introduction
• Approach
• Benchmark suite
• Example
• Results
![Page 10: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/10.jpg)
Terminology
Application domain
Application class
Application
Benchmark suite: set of all applications
![Page 11: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/11.jpg)
Benchmark Suite Characteristics
• Each application: typical for a class of applications within application domain
• The set covers a significant area of the application domain
• Each benchmark is sufficiently well tuned to the architecture to measure its performance
![Page 12: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/12.jpg)
The Benchmark Suite
• Data comm.: Viterbi decoding
• Audio coding: AC3 decode
• Video coding: MPEG2 encode, DVC decode
• Video proc.: Layered natural motion, Dynamic noise reduction, Peaking
• Graphics: 3D renderer library
![Page 13: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/13.jpg)
Processing Characteristics
• Signal rates: audio, video (at block and pixel rate)
• Basic data types: byte (8), half-words (16), words (32), float (32)
• Data access patterns: sample stream, bitstream, random access
• Data independent and dependent load
• Control processing and signal processing
![Page 14: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/14.jpg)
Application Development
[Diagram: an algorithm is implemented as Reference C, from which TM1000 optimized and CPU64 optimized versions are derived; remaining labels: simulation, exploration, IC, video, knowledge]
![Page 15: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/15.jpg)
Initial Design Considerations
• Goal: 6-8 times TM1000 performance
• Standard ANSI-C, reuse of code
• Utilize instruction and data parallelism
• Limited complexity (embedded core)
• Compatibility through recompilation
• VLIW architecture
![Page 16: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/16.jpg)
Initial Design Choices
• Double vector length: 32 to 64 bits
• Enriched media instruction set
![Page 17: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/17.jpg)
Instruction Set Considerations
• A machine operation must be sufficiently generic within the application domain
• Sufficiently powerful operations
• Limited number of operations
• Consistency and orthogonality
![Page 18: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/18.jpg)
Outline
• Introduction
• Approach
• Benchmark suite
• Example
• Results
![Page 19: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/19.jpg)
Vector Instruction Example
Sum of absolute differences (“ub_me”)
[Diagram: the eight byte lanes x0..x7 and y0..y7 are subtracted pairwise, the absolute values |xi - yi| are summed starting from 0, and the sum is the result z]
![Page 20: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/20.jpg)
Application Code Example
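int calc_sad(vec64ub *prv, vec64ub *cur, int s)
{
    vec64ub left, right;
    int i;
    int sad = 0;

    /* Accumulate over 8 vectors, taken s vectors apart in each block. */
    for (i = 0; i < 8; i++) {
        left = prv[i*s];
        right = cur[i*s];
        sad += ub_me(left, right);   /* SAD of one pair of vectors */
    }
    return sad;
}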
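The `ub_me` operation and the `calc_sad` loop can be emulated on a workstation in plain C. The sketch below is only a functional model under stated assumptions: a 64-bit vector of eight unsigned bytes is represented as a `uint64_t`, and the names `vec64ub_model`, `ub_me_model` and `calc_sad_model` are hypothetical, not the definitions of the CPU64 libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a vector of eight unsigned bytes, modelled as one uint64_t. */
typedef uint64_t vec64ub_model;

/* Functional model of a sum-of-absolute-differences operation:
 * adds |x_i - y_i| over the eight byte lanes of x and y. */
static int ub_me_model(vec64ub_model x, vec64ub_model y)
{
    int sum = 0;
    for (int lane = 0; lane < 8; lane++) {
        int xi = (int)((x >> (8 * lane)) & 0xFF);
        int yi = (int)((y >> (8 * lane)) & 0xFF);
        sum += (xi > yi) ? (xi - yi) : (yi - xi);
    }
    return sum;
}

/* SAD over 8 vectors per block; s is the stride between them, in vectors. */
static int calc_sad_model(const vec64ub_model *prv,
                          const vec64ub_model *cur, int s)
{
    assert(prv != NULL && cur != NULL && s > 0);
    int sad = 0;
    for (int i = 0; i < 8; i++)
        sad += ub_me_model(prv[i * s], cur[i * s]);
    return sad;
}
```

Because the lanes are summed, the result does not depend on how the eight bytes are ordered inside the 64-bit word, which is why this model needs no endianness assumption.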
![Page 21: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/21.jpg)
Outline
• Introduction
• Approach
• Benchmark suite
• Example
• Results
![Page 22: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/22.jpg)
Results: Natural Motion
Average gain: 2.7x (cycles)
[Figure: two bar charts of cycle counts for the parts labelled SAD, MED, SUB, UPC and SEG, one in CPU64 cycles x 10^7 and one in TM1000 cycles x 10^7, each with a 0 to 100% breakdown into instructions, cache stalls and nops]
![Page 23: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/23.jpg)
Natural Motion Dynamic Load
[Figure: CPU load (%) versus field number (0 to 110), load axis 0 to 150, with curves for tm1000 @ 125MHz and cpu64 @ 300MHz and annotations at 90% and 16%]
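The 2.7x average gain in cycles and the two clock frequencies quoted for this plot can be combined into a rough wall-clock estimate. This is a back-of-the-envelope sketch, assuming run time is simply cycle count divided by clock frequency; `wall_clock_speedup` is a hypothetical helper, not something from the slides.

```c
#include <assert.h>

/* Wall-clock speedup = (gain in cycles) x (new clock / old clock).
 * Assumption: run time equals cycles divided by clock frequency. */
static double wall_clock_speedup(double cycle_gain,
                                 double old_mhz, double new_mhz)
{
    assert(cycle_gain > 0.0 && old_mhz > 0.0 && new_mhz > 0.0);
    return cycle_gain * (new_mhz / old_mhz);
}
```

With the slide figures, `wall_clock_speedup(2.7, 125.0, 300.0)` evaluates to roughly 6.5, which lies inside the stated goal of 6 to 8 times TM1000 performance.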
![Page 24: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/24.jpg)
Summary
• Application domain:
  – Media processing for CE industry
• Benchmark suite:
  – Targeted to application domain
  – Optimized for class of processors
• Future products:
  – Gradual shift to software implementations of signal processing functions
![Page 25: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/25.jpg)
TriMedia CPU64
Architecture
J. T. J. van Eijndhoven
Philips Research
Eindhoven, The Netherlands
![Page 26: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/26.jpg)
Outline
• Introduction
• VLIW architecture
• Instruction set
• IDCT example
• Status & conclusion
![Page 27: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/27.jpg)
Application Target
• Processor core, to be embedded in different ICs and products
• Real time processing of media streams
• Cost-sensitive consumer electronics market
![Page 28: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/28.jpg)
Embedded Application
[Block diagram: VLIW CPU with I$ and D$, together with SDRAM, video-in, video-out, audio-in, audio-out, PCI bridge, serial I/O, timers and I2C I/O]
![Page 29: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/29.jpg)
Performance Target
Relative to TriMedia TM1000 (5-slot VLIW, 100 MHz, 32-bit datapath, media operations):
• 6x to 8x performance increase, to process for instance higher-resolution video streams or more tasks in parallel
• Not more than double the transistor count
![Page 30: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/30.jpg)
Efficiency
Good performance/silicon cost ratio by:
• Optimizing CPU architecture and media benchmark source code towards each other
• Solving resource conflicts at compile time
![Page 31: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/31.jpg)
Outline
• Introduction
• VLIW architecture
• Instruction set
• IDCT example
• Results
![Page 32: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/32.jpg)
Architectural Speedup
Not simply increasing the VLIW issue width:
• Diminishing gain of compiler-generated ILP
• Increasing implementation complexity (area, timing)
![Page 33: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/33.jpg)
Architectural Speedup
• Extended SIMD: uniform 64-bit design, vectors of 1-, 2-, and 4-byte elements (data throughput per cycle)
• New, extensive media instruction set (functionality per cycle)
• Improved cache control (prefetch, alloc)
![Page 34: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/34.jpg)
CPU64 Architecture
[Block diagram: VLIW instruction decode and launch; function units (FUs) connected through a bypass network to a multi-port register file of 128 words x 64 bits; 16 KB data cache and 32 KB instruction cache, each with an MMU; PC and exceptions; 64-bit memory bus and 32-bit peripheral bus]
![Page 35: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/35.jpg)
Architecture VLIW+SIMD
• Issues a 5-slot instruction every cycle
• Each slot supports a selection of FUs
• All FUs support vectorized (SIMD) data
• Double-slot FU allows powerful multi-argument, multi-result operations
• All FUs are pipelined, latency 1 to 4 (except floating point divide and sqrt)
![Page 36: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/36.jpg)
Outline
• Introduction
• VLIW architecture
• Instruction set
• IDCT example
• Results
![Page 37: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/37.jpg)
Instruction Format
Per operation, per slot:
• Up to 2 arguments, up to 1 result
• Extra register for guarding (conditional execution)
• Immediate argument size can be 32 bit
Instructions have a compressed variable-length format, decompressed during decode
![Page 38: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/38.jpg)
Operations
Intends to cover combinations of:
• 1-, 2-, 4-byte elements in an 8-byte vector
• Signed or unsigned type
• Clipped versus wrap-around arithmetic
• Set of basic functions (ld, st, add, mul, compare, shift, ...)
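The difference between clipped and wrap-around arithmetic can be illustrated with a per-lane add on eight unsigned bytes. This is a minimal workstation sketch, assuming a 64-bit vector is modelled as a `uint64_t`; `get_ub` and `ub_add_model` are hypothetical names and do not correspond to actual CPU64 operations.

```c
#include <assert.h>
#include <stdint.h>

/* Extract unsigned byte lane i (0..7) from a 64-bit vector. */
static unsigned get_ub(uint64_t v, int i)
{
    assert(i >= 0 && i < 8);
    return (unsigned)((v >> (8 * i)) & 0xFF);
}

/* Per-lane add of eight unsigned bytes.
 * clip != 0: each result saturates at 255 (clipped arithmetic).
 * clip == 0: each result wraps modulo 256 (wrap-around arithmetic). */
static uint64_t ub_add_model(uint64_t a, uint64_t b, int clip)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        unsigned sum = get_ub(a, i) + get_ub(b, i);
        if (clip && sum > 255u)
            sum = 255u;
        r |= (uint64_t)(sum & 0xFFu) << (8 * i);
    }
    return r;
}
```

For pixel data, clipping keeps a bright value at 255 when an addition overflows, whereas wrap-around would turn it into a dark value; this is one reason media instruction sets typically offer both variants.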
![Page 39: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/39.jpg)
Instruction Set

| Category | Cnt | Comment |
|---|---|---|
| Load/store ops | 39 | vectorized, endian-correct |
| Byte shuffles | 67 | vector type convert |
| Bit shifts | 48 | round, fields |
| Multiplies | 54 | round, clip, wrap around |
| Integer arith | 104 | clip, wrap around |
| Floating point | 59 | scalar and vectorized |
| Lookup table | 6 | vectorized |
| Special ops | 23 | MMU, cache, special regs |
| Branch | 10 | (un)interruptible, trap |
![Page 40: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/40.jpg)
Example: msp-multiply
Multiply to internal double precision integer
Each argument: 4-way 16-bit int
Round lower half into upper half
Return upper half
[Diagram: a rounding 1 is added in each of the four lanes]
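The steps listed for this multiply can be modelled per lane as: form the full 32-bit product, add a rounding constant, keep the upper 16 bits. The sketch below assumes signed 16-bit lanes and a rounding constant of 0x8000 (half of the lower half); both are assumptions, as are the names `get_h`, `floor_shift16` and `msp_mul_model`.

```c
#include <assert.h>
#include <stdint.h>

/* Extract signed 16-bit lane i (0..3) from a 64-bit vector. */
static int32_t get_h(uint64_t v, int i)
{
    assert(i >= 0 && i < 4);
    uint32_t u = (uint32_t)((v >> (16 * i)) & 0xFFFF);
    return (u & 0x8000u) ? (int32_t)u - 0x10000 : (int32_t)u;
}

/* floor(p / 65536) without relying on right shift of negative values. */
static int32_t floor_shift16(int32_t p)
{
    return (p >= 0) ? (p >> 16) : -((-p + 0xFFFF) >> 16);
}

/* Per lane: 16x16 -> 32-bit product, round, return the upper half. */
static uint64_t msp_mul_model(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        int32_t prod = get_h(a, i) * get_h(b, i);    /* at most 2^30 */
        int32_t hi = floor_shift16(prod + 0x8000);   /* round, keep top */
        r |= (uint64_t)((uint32_t)hi & 0xFFFFu) << (16 * i);
    }
    return r;
}
```

Interpreting the lanes as fixed-point fractions, this behaves like a rounded fractional multiply: for example 0x4000 times 0x4000 yields 0x1000 in this model.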
![Page 41: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/41.jpg)
Example: Transpose
2-slot ‘super-op’: takes 4 arguments, shuffles bytes, produces 2 results (2-dimensional filtering)
Arguments: (aa bb cc dd), (ee ff gg hh), (ii jj kk ll), (mm nn oo pp)
Results: (aa ee ii mm), (bb ff jj nn)
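The data movement of this transpose operation can be sketched as follows: four input vectors are treated as the rows of a 4x4 matrix of 16-bit elements, and two columns are returned as the two results. The struct `vec64h_model`, the function `transpose_pair_model` and its `first_col` parameter are hypothetical; the slide shows only the variant that yields the first two columns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one 64-bit vector holding four 16-bit elements. */
typedef struct { uint16_t h[4]; } vec64h_model;

/* rows: four argument vectors (matrix rows).
 * out:  two result vectors holding columns first_col and first_col + 1. */
static void transpose_pair_model(const vec64h_model rows[4], int first_col,
                                 vec64h_model out[2])
{
    assert(rows != NULL && out != NULL);
    assert(first_col >= 0 && first_col <= 2);
    for (int c = 0; c < 2; c++)
        for (int r = 0; r < 4; r++)
            out[c].h[r] = rows[r].h[first_col + c];
}
```

Under this model a full 4x4 transpose takes two such operations, one with `first_col` 0 and one with `first_col` 2.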
![Page 42: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/42.jpg)
Example: 4-way average
(MPEG 2-dimensional half-pixel average)
[Diagram: two rows of eight lane-wise adders; labels: add with extended precision, round, 1]
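A rounded four-way average of the kind used for MPEG half-pixel interpolation can be modelled per byte lane as (a + b + c + d + 2) >> 2, with the sum held in more than 8 bits. The sketch below assumes four separate input vectors of eight unsigned bytes; how the real operation receives its operands is not stated on the slide, and `lane_ub` and `ub_avg4_model` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Extract unsigned byte lane i (0..7) from a 64-bit vector. */
static unsigned lane_ub(uint64_t v, int i)
{
    assert(i >= 0 && i < 8);
    return (unsigned)((v >> (8 * i)) & 0xFF);
}

/* Per-lane rounded average of four vectors of eight unsigned bytes.
 * The intermediate sum (at most 1022) is kept in extended precision,
 * so it cannot overflow before the final shift. */
static uint64_t ub_avg4_model(uint64_t a, uint64_t b, uint64_t c, uint64_t d)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        unsigned sum = lane_ub(a, i) + lane_ub(b, i)
                     + lane_ub(c, i) + lane_ub(d, i);
        r |= (uint64_t)((sum + 2u) >> 2) << (8 * i);
    }
    return r;
}
```

Doing this in a single operation avoids the double rounding error that would arise from chaining two rounded two-way averages.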
![Page 43: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/43.jpg)
Branches
• Branch operations have 3 branch delay slots
• Compiler & scheduler try to fill these (profiling, loop unrolling, inlining, guarding)
• Branch units are properly pipelined
• Up to 3 branch ops in 1 instruction
• No branch prediction hardware
• Branches are preferred moments for interrupt servicing
![Page 44: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/44.jpg)
Memory Management Units
• Separate IMMU and dual-port DMMU
• Variable page size: 4K, 16K, …, 16M
• Indexed with 32-bit VA and 8-bit task ID
• Inter-task, X/W, and supervisor-mode protections
• TLBs are software managed through precise exceptions
![Page 45: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/45.jpg)
Outline
• Introduction
• VLIW architecture
• Instruction set
• IDCT example
• Results
![Page 46: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/46.jpg)
2D-IDCT Example
• IDCT code was tested early in the project, also for studying the programming model
• Mapped through our experimental C compiler and instruction scheduler
• Operates entirely on (vectors of) 16-bit data elements
• Simulation showed compliance with the IEEE 1180 accuracy requirements
![Page 47: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/47.jpg)
2D-IDCT Instructions
[Bar chart: 2D-IDCT execution time in cycles (axis 0 to 400), excluding data cache misses, for CPU64, TM-1000, TI 320C62, Altivec, PA-8000, Pentium I, Pentium II and NEC V830; three bars are marked OK, where OK = accuracy compliant]
![Page 48: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/48.jpg)
Outline
• Introduction
• VLIW architecture
• Instruction set
• IDCT example
• Status & conclusion
![Page 49: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/49.jpg)
Status
Now under construction at Philips Semiconductors, Sunnyvale, CA
• 7M transistors (target)
• 300 MHz clock (target)
• 0.18 µm technology
• 1.8 volt
• A couple of watts power
![Page 50: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/50.jpg)
Conclusion
• The 5-slot 64-bit VLIW provides a powerful architecture for media processing
• Performance gain of 6x to 8x over TM1000
• Rich and regular media instruction set
• Powerful multi-argument, multi-result ‘super-op’ naturally fits VLIW architecture
• Embedded DSP supports robust multi-tasking through MMU
![Page 51: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/51.jpg)
TriMedia CPU64
Application Development Environment
E.J.D. Pol
Philips Research
Eindhoven, The Netherlands
![Page 52: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/52.jpg)
Outline
• Introduction
• Machine description
• Vector models
• Compilation trajectories
• Optimization
• Conclusion
![Page 53: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/53.jpg)
Basic architecture
• CPU64 is based on TM1000 VLIW DSPCPU
• Application domain: digital media processing
• Register-rich
• Custom operations
• Applications show several levels of parallelism
![Page 54: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/54.jpg)
Parallelism
• MIMD
  – Implicit in program (C)
  – Compile-time scheduling (ILP)
• SIMD
  – Explicit in program (vector types)
  – Mixes well with MIMD
• Task level parallelism
![Page 55: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/55.jpg)
Instruction Set
• General-purpose (scalar) operations for control
• Special-purpose (custom, vector) operations for media processing
• 5 scalar types (8, 16, 32, 64 bit ints, 32 bit floats)
• 7 vector types (8, 16, 32 bit (un)signed ints, 32 bit floats)
![Page 56: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/56.jpg)
Goal
Make application development as simple as possible!
• Higher levels of parallelism
• More special-purpose hardware
• Range of CPUs will be available
![Page 57: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/57.jpg)
Outline
• Introduction
• Machine description
• Vector models
• Compilation trajectories
• Optimization
• Conclusion
![Page 58: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/58.jpg)
Traditional Compilation
[Diagram: App 1, App 2 and App 3 are compiled by Compiler A into Exe 1A, Exe 2A and Exe 3A for Machine A, and by Compiler B into Exe 1B, Exe 2B and Exe 3B for Machine B]
![Page 59: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/59.jpg)
Retargetable Compilation
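[Diagram: App 1, App 2 and App 3 are compiled by a single Compiler, which uses the Machine A Description to produce Exe 1A, Exe 2A and Exe 3A for Machine A, and the Machine B Description to produce Exe 1B, Exe 2B and Exe 3B for Machine B]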
![Page 60: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/60.jpg)
MD file contents
• The machine description (MD) describes one instance from a class of machines
• Contains information such as:
  – number & width of registers
  – number of issue slots
  – number & placement of function units
  – instruction latencies
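The kind of information listed above could be held in a structure like the one below. This is purely illustrative: the real MD file format is not shown in the slides, and `machine_desc`, its field names, `MD_MAX_SLOTS` and `md_is_consistent` are all hypothetical. A single maximum latency stands in for the per-instruction latencies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MD_MAX_SLOTS 8   /* arbitrary bound chosen for this sketch */

/* Hypothetical in-memory form of a machine description. */
typedef struct {
    int num_registers;              /* number of registers */
    int register_width_bits;        /* width of each register */
    int num_issue_slots;            /* operations issued per instruction */
    int fus_in_slot[MD_MAX_SLOTS];  /* function units placed in each slot */
    int max_latency_cycles;         /* simplification of latency data */
} machine_desc;

/* Basic sanity check a retargetable tool might run after reading an MD. */
static bool md_is_consistent(const machine_desc *md)
{
    assert(md != NULL);
    if (md->num_registers <= 0 || md->register_width_bits <= 0)
        return false;
    if (md->num_issue_slots <= 0 || md->num_issue_slots > MD_MAX_SLOTS)
        return false;
    for (int slot = 0; slot < md->num_issue_slots; slot++)
        if (md->fus_in_slot[slot] <= 0)
            return false;
    return md->max_latency_cycles >= 1;
}
```

The figures quoted elsewhere in these slides (128 registers of 64 bits, 5 issue slots) would fill the first three fields; the number of function units per slot is not given in the slides and would have to come from the actual MD file.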
![Page 61: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/61.jpg)
Y-Chart
[Diagram: Machine Description and Benchmark Applications feed the Compiler and Simulator, which produce Performance Numbers]
![Page 62: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/62.jpg)
Outline
• Introduction
• Machine description
• Vector models
• Compilation trajectories
• Optimization
• Conclusion
![Page 63: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/63.jpg)
Memory Endianness
• Endianness affects storage of scalars
• Programmability requires vectors to be subranges of scalar arrays
• C address arithmetic requires increasing address with increasing array index
![Page 64: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/64.jpg)
Register Endianness
• Endianness does not affect storage of scalars
• Specific vector elements are addressed by certain custom ops (if necessary)
• Endianness might affect ordering of vector elements in registers
![Page 65: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/65.jpg)
Endianness design choices
• Memory endianness is fixed for intra-element order (C requirement)
• Register endianness is fixed for both inter- and intra-element order, but it is irrelevant which (programming ease)
• Load/Store operations translate inter-element order between registers and memory
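The effect of these choices can be sketched with a model of a vector load: whatever byte order memory uses inside each element, the load delivers the same register contents, so code that works on registers does not need to know the memory endianness. The placement of element A[i] in bits 16*i to 16*i+15 is an assumption made for this sketch (the slides say the actual register order is irrelevant to the programmer), and `mem_endian` and `load_vec4h_model` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MEM_LITTLE_ENDIAN, MEM_BIG_ENDIAN } mem_endian;

/* Load four 16-bit elements A[0..3] from byte-addressed memory into a
 * 64-bit register model. Array index i always maps to increasing
 * addresses (mem[2*i], mem[2*i+1]); only the byte order inside an
 * element depends on the memory endianness. */
static uint64_t load_vec4h_model(const uint8_t *mem, mem_endian e)
{
    assert(mem != NULL);
    uint64_t reg = 0;
    for (int i = 0; i < 4; i++) {
        unsigned b0 = mem[2 * i];
        unsigned b1 = mem[2 * i + 1];
        unsigned elem = (e == MEM_LITTLE_ENDIAN) ? (b0 | (b1 << 8))
                                                 : ((b0 << 8) | b1);
        reg |= (uint64_t)elem << (16 * i);   /* assumed lane placement */
    }
    return reg;
}
```

A matching store would apply the inverse mapping, so that a load followed by a store leaves memory unchanged under either endianness.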
![Page 66: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/66.jpg)
Vector Load/Store operations
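[Diagram: a load/store moves the elements A[0] and A[1] between memory (byte addresses 0 to 3) and a register (bit numbers 31 to 0); one view lists the bytes as A[1].msb, A[1].lsb, A[0].msb, A[0].lsb and the other as A[0].lsb, A[0].msb, A[1].lsb, A[1].msb]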
![Page 67: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/67.jpg)
Outline
• Introduction
• Machine description
• Vector models
• Compilation trajectories
• Optimization
• Conclusion
![Page 68: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/68.jpg)
Software and Hardware Operations
• Hardware view of instruction set
  – Implement minimal, changeable set
• Software view of instruction set
  – Provide regular, orthogonal, stable set
• Two instruction sets were defined for these purposes
![Page 69: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/69.jpg)
Instruction set libraries
• Hardware operations library
  – untyped, minimal, changing
• Software operations library
  – vector-typed, orthogonal, stable
• Mapping in MD file and library
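The split between the two libraries can be pictured with a toy model. The Python sketch below is only an illustration: the operation `hw_quadadd16`, the type `Vec16x4` and the wrapper `sw_add` are hypothetical names, and the real libraries are not shown in the slides. It shows an untyped operation on raw 64-bit words and a typed software operation that maps onto it in one place.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1

def hw_quadadd16(a: int, b: int) -> int:
    """Hypothetical untyped hardware op: four wrap-around 16-bit adds on raw 64-bit words."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        lane_sum = ((a >> shift) + (b >> shift)) & 0xFFFF  # keep only this lane
        result |= lane_sum << shift
    return result & MASK64

@dataclass(frozen=True)
class Vec16x4:
    """Hypothetical software-library vector type: four unsigned 16-bit elements."""
    bits: int

    @classmethod
    def from_elements(cls, elements):
        bits = 0
        for lane, value in enumerate(elements):
            bits |= (value & 0xFFFF) << (16 * lane)
        return cls(bits)

    def elements(self):
        return [(self.bits >> (16 * lane)) & 0xFFFF for lane in range(4)]

def sw_add(x: Vec16x4, y: Vec16x4) -> Vec16x4:
    """Typed software op; only this mapping changes if the hardware op changes."""
    return Vec16x4(hw_quadadd16(x.bits, y.bits))

total = sw_add(Vec16x4.from_elements([1, 2, 3, 0xFFFF]),
               Vec16x4.from_elements([10, 20, 30, 1]))
```

Application code written against `sw_add` stays unchanged when the hardware operation behind it is renamed or replaced, which is the stability property the software library is meant to provide.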
![Page 70: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/70.jpg)
Library structure
[Figure: library structure. Source code: simulator, hw_ops, sw_ops, application. Executables: simulator and application (workstation executable), application (CPU64 executable).]
![Page 71: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/71.jpg)
Functional Development
[Figure: functional development. Source code (hw_ops, sw_ops, application) is compiled with gcc into an application workstation executable.]
![Page 72: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/72.jpg)
Code Tuning
[Figure: code tuning. Source code: simulator, hw_ops, application. Tools: gcc, tmcc. Executables: simulator (workstation executable) and application (CPU64 executable), linked by "interpretation".]
![Page 73: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/73.jpg)
Outline
• Introduction
• Machine description
• Vector models
• Compilation trajectories
• Optimization
• Conclusion
![Page 74: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/74.jpg)
General guidelines
• Design application architecture (data flow)
• Determine data formats (scalar/vector types)
• Arrange data locale (memory/cache/registers)
• Design control architecture (loop structure)
• Optimize computations (custom-ops)
• Fine tune cache behavior (prefetch/alloc)
![Page 75: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/75.jpg)
Outline
• Introduction
• Machine description
• Vector models
• Compilation trajectories
• Optimization
• Conclusion
![Page 76: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/76.jpg)
Conclusions
• Applications are written in C code only
• Machine Description supports changes in ISA
• Vector types keep code legible
• Endianness is not visible in application code
• Fast functional development track
• Accurate application tuning track
![Page 77: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/77.jpg)
TriMedia CPU64
Design Space Exploration
G.J. Hekstra
Philips Research
Eindhoven, The Netherlands
![Page 78: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/78.jpg)
Outline
• Introduction
• Tools
• Exploration
• Real numbers
• Conclusions
![Page 79: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/79.jpg)
Methodology: the Y-chart
[Figure: Y-chart. Machine Description and Benchmark Applications feed the Compiler and the Simulator, which produce Performance Numbers.]
![Page 80: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/80.jpg)
64-bit VLIW CPU
• Many parameters, large design space(s)
[Figure: CPU block diagram. Data cache (16 KB) and instruction cache (32 KB) with MMU blocks, connected to a 64-bit memory bus; multi-port register file of 128 words x 64 bits; functional units (FU) with bypass network; VLIW instruction decode and launch.]
![Page 81: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/81.jpg)
Multimedia benchmark
• Nine multimedia applications from:
  – Data communication
  – Audio coding
  – Video coding
  – Video processing
  – Graphics
• Representative
  – Applications
  – Code
  – Datasets
![Page 82: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/82.jpg)
Challenge for design space exploration
• Simulation time for the benchmark for a single machine is 18 hours
• The number of design points for functional unit configuration alone: 10^15
• Results in 2,000,000,000,000 years of computation time
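The computation time quoted on this slide follows from its other two figures. As a quick check, assuming the design points are simulated one after another on a single workstation and a year of 365 days, a few lines of Python reproduce the order of magnitude:

```python
HOURS_PER_YEAR = 24 * 365

design_points = 10**15      # functional unit configurations (from the slide)
hours_per_machine = 18      # full benchmark simulation time per machine

total_years = design_points * hours_per_machine / HOURS_PER_YEAR
print(f"{total_years:.2e} years")
```

The result is about 2 x 10^12 years, in line with the figure on the slide.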
![Page 83: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/83.jpg)
Outline
• Introduction
• Tools
• Exploration
• Real numbers
• Conclusions
![Page 84: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/84.jpg)
Essential tooling
• Retargetable toolchain
  – Core compiler, scheduler, simulator
• Design and experiment management
• DSE support tools
  – Machine generation, pseudosim, analysis
• Glue software
![Page 85: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/85.jpg)
Pseudo-simulation
Pseudosim: a retargetable pseudo-simulator
• Performs a cycle-accurate calculation of the number of instruction cycles, without doing the actual machine simulation
• Gathers other machine statistics, such as utilisation of slots, functional units, operations
• Time for the whole benchmark is reduced from 18 hours to 4 minutes
![Page 86: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/86.jpg)
Pseudo-simulation
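[Figure: pseudo-simulation flow. Once (18 hours): Benchmark applications and Machine description go through the Compiler and the Simulator, which yields statistics. Many times (4 minutes): the Compiler produces code for Machine description', and Pseudosim combines code and statistics into results.]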
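The slides do not spell out how Pseudosim computes cycle counts. One plausible reading, sketched below in Python purely as an assumption, is that the one-off full simulation records how often each basic block executes, and that each new machine only requires recompiling and then weighting the per-block schedule lengths by those counts. The function `pseudo_simulate` and all numbers in the example are hypothetical.

```python
def pseudo_simulate(block_counts, schedule_lengths):
    """Estimate instruction cycles without executing the program.

    block_counts:     basic block name -> number of executions, recorded once
                      by a full simulation (assumed machine independent).
    schedule_lengths: basic block name -> VLIW instructions in that block as
                      scheduled by the compiler for the machine under study.
    """
    return sum(count * schedule_lengths[block]
               for block, count in block_counts.items())

# Illustrative numbers only, not taken from the benchmark.
block_counts = {"loop_body": 1_000_000, "loop_exit": 1_000, "init": 1}

machine_a = {"loop_body": 12, "loop_exit": 5, "init": 40}  # baseline schedule
machine_b = {"loop_body": 9, "loop_exit": 5, "init": 42}   # variant machine

cycles_a = pseudo_simulate(block_counts, machine_a)
cycles_b = pseudo_simulate(block_counts, machine_b)

# Speed-up of the method itself: 18 hours versus 4 minutes per machine.
method_speedup = (18 * 60) / 4
```

Under this reading the expensive step (executing the benchmark) happens once, while each additional machine costs only a compile and a weighted sum, which matches the "once, 18 hours" versus "many times, 4 minutes" split in the figure.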
![Page 87: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/87.jpg)
Outline
• Introduction
• Tools
• Exploration
• Real numbers
• Conclusions
![Page 88: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/88.jpg)
Functional unit configuration
Problem:
• How many of each type of FU do I need?
• Where do I place them?
• How do I prevent simulating too many machine variations?
[Figure: grid of FU type versus slot]
![Page 89: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/89.jpg)
Naïve calculation of the design space size
• 24 types of FU’s: 31 configurations per type
• 6 types of super-op FU’s: 7 configurations per type
Size of design space: 31^24 × 7^6 ≈ 10^40
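Reading the size on this slide as 31^24 × 7^6 (each of the 24 FU types independently takes one of 31 configurations, each of the 6 super-op FU types one of 7), the order of magnitude can be checked with exact integer arithmetic. This is a small sanity check, not part of the original tooling:

```python
import math

fu_types, configs_per_fu_type = 24, 31
super_op_types, configs_per_super_op_type = 6, 7

# Independent choices multiply.
naive_size = (configs_per_fu_type ** fu_types
              * configs_per_super_op_type ** super_op_types)
order_of_magnitude = math.log10(naive_size)
print(f"about 10^{order_of_magnitude:.1f} design points")
```

The exponent lands between 40 and 41, consistent with the slide's rounded 10^40.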
![Page 90: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/90.jpg)
Not so naïve calculation of the design space size
• Assume relationship between FU types
• Best-guess lower and upper bounds on number of FU’s per type
• Reduce permutations
Size of design space: 10^15
![Page 91: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/91.jpg)
Systematic exploration
• It is not feasible to exhaustively compute all design points in the design space
• Therefore we probe, partition, and then explore the design space
• Careful set-up of experiments
![Page 92: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/92.jpg)
Reduction of the design space
• Observation: over 93% of operations are done in 30% of FU types
• Action: partition space
  – Exhaustive exploration of important FU’s
  – Greedy exploration of the remainder
[Figure: functional unit types split 70% / 30%, against operations split 7% / 93%]
![Page 93: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/93.jpg)
Exhaustive experiment
• Close to 3000 machine variations
• Only 4 machines survive
• Large performance variation due to placement
• Time taken = 200 hours
[Figure: "Exhaustive experiment", scatter plot of cycles (8.6 to 10, ×10^7) versus area (194 to 202)]
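The slide does not define what it means for a machine to "survive". A common criterion in area/time trade-off studies, assumed here for illustration only, is Pareto optimality: a machine survives if no other machine is at least as good in both area and cycles and strictly better in one. The Python sketch below uses the hypothetical function `pareto_front` and made-up data points.

```python
def pareto_front(machines):
    """Return names of machines not dominated in (area, cycles); smaller is better."""
    survivors = []
    for name, area, cycles in machines:
        dominated = any(
            other_area <= area and other_cycles <= cycles
            and (other_area < area or other_cycles < cycles)
            for _, other_area, other_cycles in machines
        )
        if not dominated:
            survivors.append(name)
    return survivors

# Illustrative (name, area, cycles) points, not measured data.
machines = [
    ("m1", 194.0, 9.8e7),
    ("m2", 196.0, 9.1e7),
    ("m3", 196.5, 9.4e7),  # dominated by m2
    ("m4", 199.0, 8.7e7),
    ("m5", 201.0, 8.9e7),  # dominated by m4
]
survivors = pareto_front(machines)
```

With a filter of this kind, thousands of evaluated machines can reduce to a handful of candidates, which is the pattern the slide reports.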
![Page 94: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/94.jpg)
Greedy experiments
• Fewer machine variations per experiment
• More machines survive
• Less performance variation due to placement
• Close to another 3000 machine variations over all greedy experiments
[Figure: "First greedy experiment", scatter plot of cycles (8.6 to 9.3, ×10^7) versus area (191 to 197)]
![Page 95: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/95.jpg)
Outline
• Introduction
• Tools
• Exploration
• Real numbers
• Conclusions
![Page 96: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/96.jpg)
Simulated machines
• 180 Gbyte data transferred
• 800 Mbyte compressed data stored
• 2T cycles simulated
• 10T operations issued
Experiment number
Probing 185
Exhaustive 3023
Greedy 2519
Total 5727
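The table can be cross-checked against the timing figures given earlier in the talk. The short Python check below assumes that each machine costs one 4-minute pseudo-simulation run and that this run dominates the time per machine; both are assumptions, not statements from the slides.

```python
machines = {"Probing": 185, "Exhaustive": 3023, "Greedy": 2519}
total_machines = sum(machines.values())

minutes_per_machine = 4  # pseudo-simulation time for the whole benchmark

exhaustive_hours = machines["Exhaustive"] * minutes_per_machine / 60
all_hours = total_machines * minutes_per_machine / 60
print(total_machines, round(exhaustive_hours, 1), round(all_hours, 1))
```

The sum matches the table's total, and roughly 201.5 hours for the exhaustive experiment is consistent with the 200 hours reported for it.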
![Page 97: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/97.jpg)
Outline
• Introduction
• Tools
• Exploration
• Real numbers
• Conclusions
![Page 98: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/98.jpg)
Summary and conclusions
• A full-fledged design space exploration was done for the 64-bit TriMedia VLIW CPU core
• The use of both pseudo-simulation and a systematic exploration has made this feasible
• The outcome is:
  – A range of machines in A-T space
  – Rules for functional unit placement
  – A flexible, integrated environment for continuation of DSE
![Page 99: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/99.jpg)
Summary and conclusions
• A large amount of time is spent in developing tools to enable the DSE
• The design of experiments and the analysis of results take up more time than the execution
• Performing a DSE stretches the capabilities of the tools to the maximum
• The pseudo-simulator is a powerful tool that supports the full design space offered by the machine description
![Page 100: Philips Research ICCD 1999 - 1 1 TriMedia CPU64 Application Domain and Benchmark Suite A.K. Riemens Philips Research Eindhoven, The Netherlands](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649eb55503460f94bbdcf3/html5/thumbnails/100.jpg)