IAF0042 · Arvo Toomsalu · ttu.ee
Lecture Notes
IAF0042
Arvo Toomsalu
Computer Architecture Introduction
Computer Model
[Figure: computer model, showing the CORE (processor subsystem and memory subsystem) and the I/O subsystem.]
Classical Architectures
Princeton or von Neumann architecture
[Figure: the CPU and a single MEMORY, which holds both data and instructions, are connected by one system bus.]
Harvard architecture
[Figure: the CPU is connected to an instruction MEMORY by the I bus and to a separate data MEMORY by the D bus.]
MEMORY SYSTEM
MSD (Memory Storage Devices)
Primary storage
  RAM: CAM, SRAM, DRAM, Molecular RAM
  ROM: ROM, PROM (OTP), RPROM, Flash ROM, EEPROM, [UVROM]
Secondary storage
  Magnetic: M tape, M disk, Magneto-optical
  Optical: CD-ROM, WORM (CD-R), CD-RW, DVD, Hologram Optical Disc
CAM – Content Addressable Memory (Associative Memory)
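A content-addressable memory is searched by content rather than by address: a word is presented and the memory reports the location or locations that hold it. The following is a minimal Python sketch of that behavior. The class name `ToyCAM` and its methods are hypothetical, and the sequential loop only stands in for the parallel comparison that real CAM hardware performs.

```python
class ToyCAM:
    """Toy model of a content-addressable (associative) memory."""

    def __init__(self, size):
        self.cells = [None] * size  # None marks an empty cell

    def write(self, address, word):
        """Ordinary addressed write, as in a RAM."""
        self.cells[address] = word

    def search(self, word):
        """Return every address whose cell holds `word`."""
        # Hardware compares all cells at once; this loop is a software stand-in.
        return [addr for addr, stored in enumerate(self.cells) if stored == word]


cam = ToyCAM(size=8)
cam.write(2, 0xBEEF)
cam.write(5, 0xCAFE)
cam.write(6, 0xBEEF)
matches = cam.search(0xBEEF)  # addresses of both cells that hold 0xBEEF
misses = cam.search(0x1234)   # no cell holds this word
```

Returning all matching addresses, rather than only the first, reflects that several cells may hold the same word.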
Memory Hierarchy
Widespread memory hierarchy model
A typical hierarchy consists of:
1. Register file;
2. Per-processor level 1 (L1) instruction and data cache;
3. On-chip, shared unified level 2 (L2) cache;
4. Off-chip level 3 (L3) cache;
5. Main memory;
6. Hard disc for virtual memory.
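The benefit of such a hierarchy can be illustrated with the standard average memory access time (AMAT) estimate, in which every level adds its miss rate times the cost of going one level further down. The sketch below is illustrative only: the function name `amat` and all latencies and miss rates are assumed values, not figures from these notes.

```python
def amat(levels, backing_latency):
    """Average memory access time for a chain of cache levels.

    levels: list of (hit_time, miss_rate) pairs, ordered from L1 downwards.
    backing_latency: access time of the level behind the last cache
                     (for example main memory).
    """
    time = backing_latency
    # Fold from the last cache level back to L1:
    # AMAT_i = hit_time_i + miss_rate_i * AMAT_(i+1)
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


# Assumed example values, in processor cycles.
caches = [(1, 0.05), (10, 0.20), (40, 0.50)]  # L1, L2, L3
average = amat(caches, backing_latency=200)
no_cache = amat([], backing_latency=200)      # every access goes to main memory
```

With these assumed numbers the average access cost stays close to the L1 hit time, although main memory is two hundred times slower than L1.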
[Figure: extended memory hierarchy model]
Internal (Architectural) Organization of Memories
DRAM – dynamic RAM
SDRAM – synchronous dynamic RAM
DDR-SDRAM – double-data-rate SDRAM
MDRAM – multi-bank DRAM
ESDRAM – enhanced SDRAM (cache-enhanced)
etc.
MULTIPROCESSOR SYSTEMS
Flynn-Johnson taxonomy
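                              SINGLE DATA STREAM   MULTIPLE DATA STREAM
SINGLE INSTRUCTION STREAM     SISD                 SIMD
MULTIPLE INSTRUCTION STREAM   MISD                 MIMD
SISD Architecture
[Figure: SISD organization with one CU, one EU and one MU; I marks the instruction stream and D the data stream.]
I – instructions; D – data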
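The classification depends only on two counts: the number of instruction streams and the number of data streams. A minimal sketch of that mapping follows; the function name `flynn_class` is hypothetical.

```python
def flynn_class(instruction_streams, data_streams):
    """Map the stream counts to the SISD / SIMD / MISD / MIMD label."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    instr = "S" if instruction_streams == 1 else "M"  # single or multiple instruction
    data = "S" if data_streams == 1 else "M"          # single or multiple data
    return f"{instr}I{data}D"


uniprocessor = flynn_class(1, 1)     # one instruction stream, one data stream
array_machine = flynn_class(1, 64)   # one instruction stream, many data streams
multiprocessor = flynn_class(4, 4)   # many instruction streams, many data streams
```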
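SIMD Architecture
[Figure: SIMD organization with one CU and several EUs, each EU with its own MU; the CU sends one instruction stream (I) to all EUs, and each EU has its own data stream (D).]
MISD Architecture
[Figure: MISD organization with several CUs, each issuing its own instruction stream (I) to its own EU; the EUs work on a data stream (D) from the MU.]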
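MIMD Architecture
[Figure: MIMD organization with several CU/EU/MU sets, each with its own instruction stream (I) and data stream (D).]
SM – Shared Memory
LM – Local Memory
UMA model
[Figure: processors (PR) reach the shared memories (SM) and the IO unit through a common system interconnect (bus, crossbar, network).]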
PR – processor; IO – input-output unit; SM – shared memory; LM – local memory;
GSM – global shared memory; CSM – cluster shared memory;
CIN – cluster interconnection network.
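NUMA model (cluster)
[Figure: clusters 1 to n, each containing processors (PR) and cluster shared memories (CSM) joined by a cluster interconnection network (CIN); the clusters reach the global shared memories (GSM) through the global interconnection network.]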
UMA versus NUMA
[Figure, UMA: four CPUs with caches reach four memory modules (MEM) through the interconnection network; memory latency is uniform.]
[Figure, NUMA: each CPU with its cache has a local MEM with short local memory latency; memory reached through the interconnection network has long latency.]
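The difference between the two models can be put into numbers with a simple weighted average: under NUMA the expected latency depends on the share of accesses served by local memory. The sketch below is a rough model under assumed values; the function name `average_latency` and all latencies are hypothetical.

```python
def average_latency(local_fraction, local_latency, remote_latency):
    """Expected memory latency when `local_fraction` of accesses are local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must lie in [0, 1]")
    return local_fraction * local_latency + (1.0 - local_fraction) * remote_latency


# Assumed latencies in nanoseconds.
UMA_LATENCY = 100   # uniform: every access costs the same
NUMA_LOCAL = 60     # short local memory latency
NUMA_REMOTE = 180   # long latency across the interconnection network

well_placed = average_latency(0.9, NUMA_LOCAL, NUMA_REMOTE)   # data mostly local
badly_placed = average_latency(0.2, NUMA_LOCAL, NUMA_REMOTE)  # data mostly remote
```

Under these assumptions NUMA is faster than UMA when data is placed close to the processor that uses it, and slower when most accesses are remote.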
Microprocessor system capabilities related to system processing include:
Cost-performance
Throughput (operations per time unit)
Resource sharing
Example
The Newisys ASIC implementation HORUS
Summary
Taxonomy of Mono- and Multiprocessor Organizations
Processor Organization
  Serial
    SISD: Uniprocessor
  Parallel
    SIMD: Vector processor; Array processor
    MISD
    MIMD
      Shared memory (tightly coupled): Symmetric multiprocessor (SMP); Nonuniform memory access (NUMA)
      Distributed memory (loosely coupled): Clusters
[The figure also carries the labels "Multi ALU" and "Overlapped operations".]
Literature
Arthur W. Burks, Herman H. Goldstine, John von Neumann. Preliminary Discussion of the Logical Design of an Electronic Computing Instrument.
Online: http://www.cs.unc.edu/~adyilie/comp265/vonNeumann.html
Network Processors
A network processor is a programmable CPU chip that is optimized for networking and
communications functions.
Two common approaches (a, b) to parallelism in network processors:
a. Input packets are distributed among multiple processing units to divide the load.
b. Input packets flow through a pipeline of processing elements.
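The two approaches can be sketched in a few lines of Python. This is a sequential simulation under stated assumptions, not a model of any real network processor: the packet format (`"destination:payload"`), the three stage functions and the address `10.0.0.1` are all hypothetical.

```python
def parse(packet):
    return {"raw": packet, "dst": packet.split(":")[0]}

def classify(packet):
    packet["queue"] = "high" if packet["dst"] == "10.0.0.1" else "low"
    return packet

def forward(packet):
    packet["sent"] = True
    return packet

STAGES = [parse, classify, forward]

def distribute(packets, units):
    """Approach (a): deal packets round-robin to identical processing units.

    Each unit performs all stages on its own packets.
    """
    per_unit = [[] for _ in range(units)]
    for index, packet in enumerate(packets):
        for stage in STAGES:
            packet = stage(packet)
        per_unit[index % units].append(packet)
    return per_unit

def pipeline(packets):
    """Approach (b): packets advance one stage per cycle through a pipeline.

    slots[i] holds the packet that has just completed stage i, so different
    packets occupy different stages during the same cycle.
    """
    slots = [None] * len(STAGES)
    pending = list(packets)
    done = []
    while pending or any(slot is not None for slot in slots):
        if slots[-1] is not None:               # packet leaves the last stage
            done.append(slots[-1])
        for i in range(len(STAGES) - 1, 0, -1):  # advance from back to front
            slots[i] = STAGES[i](slots[i - 1]) if slots[i - 1] is not None else None
        slots[0] = STAGES[0](pending.pop(0)) if pending else None
        done_this_cycle = None  # placeholder: one loop iteration is one cycle
    return done


packets = ["10.0.0.1:a", "10.0.0.2:b", "10.0.0.1:c", "10.0.0.3:d"]
by_unit = distribute(packets, units=2)
piped = pipeline(packets)
```

Both functions produce the same per-packet result; they differ in how the work is divided, by packet in (a) and by processing step in (b).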
Graphics Processor
A graphics processor (video card, graphics accelerator card, display adapter) is a
special-purpose microprocessor specifically designed to generate signals to drive a video
monitor.
In graphics applications, complex shapes and structures are formed through the
sampling, interconnection and rendering of simpler objects (primitives).
Graphics primitives may include lines, characters, areas (triangles and ellipses), and
shapes (polygons, spheres, cylinders and the like).
These primitives are formed by the interconnection of individual pixels.
In 3D graphics images there are three dimensions, including the dimension of depth
(Z dimension).
Modern computers typically produce graphical output using a sequence of tasks known as
a graphics pipeline.
NVIDIA GeForce 6800 Features
1. High performance;
2. Multiple small independent memory partitions for improved latency;
3. Early culling and clipping, which cull non-visible primitives at a high rate;
4. Rasterization that supports aliased and anti-aliased rendering of triangles, etc.;
5. Z-Cull, which allows high-speed removal of hidden surfaces;
6. Occlusion Query, which keeps a record of the number of fragments passing or failing the
depth test and reports it to the CPU.
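The depth test and the pass/fail counting mentioned for Occlusion Query can be illustrated in software. The sketch below is not the GeForce 6800 implementation; it assumes a plain Z-buffer with a "less than" comparison (a smaller z is closer to the viewer), and the function name `depth_test` and the fragment format are hypothetical.

```python
def depth_test(fragments, width, height, far=1.0):
    """Resolve fragments with a Z-buffer and count depth-test results.

    fragments: iterable of (x, y, z, color); a smaller z is closer to the viewer.
    Returns (color_buffer, passed, failed).
    """
    depth = [[far] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    passed = failed = 0
    for x, y, z, c in fragments:
        if z < depth[y][x]:   # closer than the stored depth: fragment is visible
            depth[y][x] = z
            color[y][x] = c
            passed += 1
        else:                 # hidden behind an earlier fragment
            failed += 1
    return color, passed, failed


frags = [
    (0, 0, 0.8, "red"),
    (0, 0, 0.3, "green"),  # in front of the red fragment
    (0, 0, 0.5, "blue"),   # behind the green fragment
    (1, 0, 0.9, "red"),
]
image, passed, failed = depth_test(frags, width=2, height=1)
```

An application can use such counts to skip drawing an object in detail when a cheap bounding shape produced no passing fragments.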
Pyramid3D Real-time Graphics Processor (TriTech Microelectronics, Inc.)
Multiprocessor architecture
Single-chip 3D graphics solution, which consists of:
• Geometry Processor
• Primitive Processor
• Pixel Processor
Multimedia Processors
Multimedia is media and content that combine different content forms: an integration of
multiple forms of media such as text, graphics, audio, video and communication.
Multimedia Applications Characteristics
The most important ones are:
• Real-time response.
• Processing of streaming data.
• Significant fine- and coarse-grained data parallelism.
• Data reorganization.
• Small loops.
• High memory bandwidth requirement. The applications process large data sets, putting a severe burden on the memory system.
• Small data types.
• Multimedia applications (MMAs) perform significantly more arithmetic operations than general-purpose applications (GPAs).
Classification of Processor Architectures that Support Multimedia
Dedicated multimedia processors
Dedicated processors are typically custom-designed architectures intended to perform
specific multimedia functions. Some advanced multimedia processors also provide
support for 2D and 3D graphics applications.
Designs of dedicated multimedia processors range from fully custom architectures with
minimal programmability, referred to as function specific architectures, to fully
programmable architectures.
A. Function specific architectures
Function specific dedicated multimedia architectures provide limited programmability,
because they use dedicated architectures for a specific encoding or decoding standard.
B1. Flexible programmable architectures
These processors offer moderate to high flexibility and are based on the coprocessor
concept as well as on parallel datapaths and deeply pipelined designs.
TI’s Multimedia Video Processor
B2. Adapted programmable architectures
These processors provide increased efficiency by adapting the architecture to the specific
requirements of video coding applications.
C-Cube’s VideoRISC processor
Modern advanced dedicated multimedia processors use SIMD and VLIW
architectural schemes and their variations to achieve very high parallelism.
Philips TriMedia CPU64
Philips TriMedia CPU64 TM1x00 with VLIW-core
The processor's main characteristics are:
1. A 5-issue VLIW architecture with a 32-bit word size;
2. 27 functional units;
3. Any operation can be guarded to provide conditional execution without branching;
4. Instruction set and functional units optimized with respect to media processing;
5. A single multi-ported register file with bypass network, allowing 1-cycle latency operations;
6. 32 kB, 8-way instruction cache;
7. 16 kB, 8-way, quasi-dual ported, data cache;
8. A variable-length (compressed) instruction set design.
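Guarded (predicated) execution, item 3 above, can be illustrated with a small software model: the operation always executes, and a guard value decides whether its result is committed. This is a conceptual sketch only, not TriMedia code; the helper `guarded` and both example functions are hypothetical.

```python
def guarded(guard, operation, old_value, *operands):
    """Execute `operation` unconditionally; commit the result only if `guard` is true."""
    result = operation(*operands)          # the functional unit runs regardless
    # Stand-in for the hardware write-enable: keep the old register value otherwise.
    return result if guard else old_value


def abs_diff_branching(a, b):
    """Reference version that uses a conditional branch."""
    if a > b:
        return a - b
    return b - a


def abs_diff_guarded(a, b):
    """Same result with two guarded operations and no control-flow branch."""
    g = a > b                              # compute the guard once
    r = 0
    r = guarded(g, lambda x, y: x - y, r, a, b)      # committed when a > b
    r = guarded(not g, lambda x, y: y - x, r, a, b)  # committed otherwise
    return r


example = abs_diff_guarded(3, 7)
```

Both subtractions are issued every time, which costs some work but lets a VLIW schedule them in parallel without a branch.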
Example
ZMS-08 Media Processor (ZiiLABS Pte Ltd.)
Typical Applications
o Web tablets
o Netbooks
o Connected TVs
o Portable infotainment
o Digital media hubs
o Point of service terminals
o Video conferencing systems
Main Features
o Blu-ray quality 1080p H.264 video decode
o 1080p H.264 video encode
o 720p H.264 video conferencing
o Multi-format media codecs
o ARM Cortex-A8 at 1 GHz
o Accelerated graphics and compositing
o Advanced image signal processing
o Rich peripheral integration and connectivity
Performance
o Blu-ray quality 1080p H.264 video decode at 40 Mbps
o Simultaneous 720p H.264 video encode and decode
o 1080p H.264 video encode
o ARM Cortex-A8 at 1 GHz
General-purpose (GP) processors
GP processors provide support for multimedia by including multimedia instructions
into the instruction set.
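A typical multimedia instruction treats one machine word as several small data items and operates on all of them at once, often with saturation instead of wrap-around. The sketch below emulates such a packed saturating add in plain Python. It does not reproduce any specific instruction set; the lane layout (four unsigned 8-bit lanes in a 32-bit word) and all function names are assumptions chosen for illustration.

```python
LANES = 4
LANE_BITS = 8
LANE_MAX = (1 << LANE_BITS) - 1  # 255, the largest unsigned 8-bit value


def pack(values):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MAX) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MAX for i in range(LANES)]


def packed_add_saturate(a, b):
    """Add corresponding lanes; clamp each sum at LANE_MAX instead of wrapping."""
    lanes = [min(x + y, LANE_MAX) for x, y in zip(unpack(a), unpack(b))]
    return pack(lanes)


pixels = pack([10, 200, 255, 0])
boost = pack([50, 100, 1, 0])
brighter = unpack(packed_add_saturate(pixels, boost))
```

Saturation matters for pixel data: a wrap-around add would turn a bright pixel (200 + 100) into a dark one (44) instead of clamping it to full brightness.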
Multimedia Processors Architecture Development Trends
There are three new architectural concepts:
1. Reconfigurable computing;
2. Simultaneous multithreading (SMT);
[Figure: SMT-based multimedia architecture]
3. Associative controlling.