The x86 Family
Assembly Language: x86 Family Architecture
Motaz K. Saad, Dept. of CS
Spring 2007
Overview
• General Concepts
• IA-32 Processor Architecture
• IA-32 Memory Management
• Components of an IA-32 Microcomputer
• Input-Output System
General Concepts
• Basic microcomputer design
• Instruction execution cycle
• Reading from memory
• How programs run
Basic Microcomputer Design
• Clock synchronizes CPU operations
• Control unit (CU) coordinates sequence of execution steps
• ALU performs arithmetic and bitwise processing
[Figure: computer block diagram. Input, storage, and output devices exchange data, information, and instructions with memory and the processor (control unit and arithmetic logic unit).]
Clock
• Synchronizes all CPU and bus operations
• Machine (clock) cycle measures the time of a single operation
• Clock is used to trigger events
[Figure: square-wave clock signal alternating between 1 and 0, with one cycle marked]
Instruction Execution Cycle
• Fetch
• Decode
• Fetch operands
• Execute
• Store output
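The five steps above can be traced with a toy interpreter. This is a minimal sketch, not a model of any real x86 processor: the three-field instruction format, the `ADD` and `HALT` opcodes, and the register names `r0` to `r2` are all invented for illustration.

```python
# Toy machine (hypothetical): each instruction is (opcode, dest, src).
def run(program, registers):
    """Run `program` and return the list of cycle steps that were taken."""
    trace = []
    ip = 0  # instruction pointer
    while True:
        instruction = program[ip]                # 1. fetch
        ip += 1
        trace.append("fetch")
        opcode, dest, src = instruction          # 2. decode
        trace.append("decode")
        if opcode == "HALT":
            break
        a, b = registers[dest], registers[src]   # 3. fetch operands
        trace.append("fetch operands")
        if opcode == "ADD":                      # 4. execute
            result = a + b
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        trace.append("execute")
        registers[dest] = result                 # 5. store output
        trace.append("store output")
    return trace


program = [("ADD", "r0", "r1"), ("ADD", "r0", "r2"), ("HALT", None, None)]
registers = {"r0": 1, "r1": 2, "r2": 3}
trace = run(program, registers)
```

Each `ADD` passes through all five steps, while `HALT` stops after decode because it has no operands to fetch.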
Cache Memory
• High-speed, expensive static RAM both inside and outside the CPU
  – Level-1 cache: inside the CPU
  – Level-2 cache: outside the CPU
• Cache hit: when data to be read is already in cache memory
• Cache miss: when data to be read is not in cache memory
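Hits and misses can be counted with a small simulation. The sketch below assumes a direct-mapped cache with four lines that each hold a single address; these parameters are chosen only to keep the example short and are not taken from the slides.

```python
def count_hits(addresses, num_lines=4):
    """Return (hits, misses) for a direct-mapped cache holding one address per line."""
    lines = [None] * num_lines   # address currently held by each cache line
    hits = misses = 0
    for addr in addresses:
        index = addr % num_lines          # which line this address maps to
        if lines[index] == addr:
            hits += 1                     # cache hit: data already in cache
        else:
            misses += 1                   # cache miss: fetch from main memory
            lines[index] = addr
    return hits, misses


hits, misses = count_hits([0, 1, 2, 0, 1, 2, 4, 0])
```

In this access pattern, address 4 maps to the same line as address 0 and evicts it, so the final access to 0 is a miss even though 0 was cached earlier.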
How a Program Runs
Multitasking
• OS can run multiple programs at the same time
• Multiple threads of execution within the same program
• Scheduler utility assigns a given amount of CPU time to each running program
• Rapid switching of tasks
  – gives the illusion that all programs are running at once
  – the processor must support task switching
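The time-slicing idea above can be sketched as a round-robin scheduler. This is only an illustration under simple assumptions: the task names, the time units, and the `round_robin` helper are hypothetical, and real operating-system schedulers are considerably more elaborate.

```python
from collections import deque


def round_robin(tasks, time_slice):
    """tasks maps a name to its remaining time units; returns the slices in run order."""
    queue = deque(tasks.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        used = min(time_slice, remaining)
        schedule.append((name, used))
        if remaining > used:
            # Task switch: the unfinished task goes to the back of the queue.
            queue.append((name, remaining - used))
    return schedule


schedule = round_robin({"editor": 3, "compiler": 5, "player": 2}, time_slice=2)
```

Every task gets a turn before any task gets a second one, which is what produces the impression that all programs run at once.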
IA-32 Processor Architecture
• Modes of operation
• Basic execution environment
• Floating-point unit
• Intel Microprocessor history
Modes of Operation
• Protected mode
  – native mode (Windows, Linux)
• Real-address mode
  – native MS-DOS
• System management mode
  – power management, system security, diagnostics
• Virtual-8086 mode
  – hybrid of Protected mode
  – each program has its own 8086 computer
Basic Execution Environment
• Addressable memory
• General-purpose registers
• Index and base registers
• Specialized register uses
• Status flags
• Floating-point, MMX, XMM registers
Addressable Memory
• Protected mode
  – 4 GB
  – 32-bit address
• Real-address and Virtual-8086 modes
  – 1 MB space
  – 20-bit address
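The sizes above follow directly from the address widths: an n-bit address can name 2^n distinct bytes. A minimal check, where the helper name `address_space_bytes` is made up for this example:

```python
MB = 2 ** 20
GB = 2 ** 30


def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


real_mode = address_space_bytes(20)       # real-address and virtual-8086 modes
protected_mode = address_space_bytes(32)  # protected mode
```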
X86 General-Purpose Registers
Named storage locations inside the CPU, optimized for speed.
• 32-bit general-purpose registers: EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI
• 16-bit segment registers: CS, SS, DS, ES, FS, GS
• EIP, EFLAGS
Accessing Parts of Registers
• Use 8-bit name, 16-bit name, or 32-bit name
• Applies to EAX, EBX, ECX, and EDX
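As an illustration for EAX, the sub-register names refer to fixed bit ranges: AX is the low 16 bits of EAX, AH is bits 8 to 15, and AL is bits 0 to 7. The sketch below models this with masks and shifts; the `split_eax` helper is hypothetical and only mimics what the hardware naming provides for free.

```python
def split_eax(eax):
    """Return the (AX, AH, AL) views of a 32-bit EAX value."""
    ax = eax & 0xFFFF        # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF    # high byte of AX
    al = ax & 0xFF           # low byte of AX
    return ax, ah, al


ax, ah, al = split_eax(0x12345678)
```

The upper 16 bits of EAX (0x1234 here) have no name of their own, which is why they do not appear in the result.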
Index and Base Registers
• Some registers have only a 16-bit name for their lower half: ESI (SI), EDI (DI), EBP (BP), ESP (SP)
Some Specialized Register Uses
• Segment
  – CS: code segment
  – DS: data segment
  – SS: stack segment
  – ES, FS, GS: additional segments
• EIP: instruction pointer
• EFLAGS
  – status and control flags
  – each flag is a single binary bit
• General-Purpose
  – EAX: accumulator
  – EBX: base register
  – ECX: loop counter
  – EDX: data register
  – ESP: stack pointer
  – ESI, EDI: index registers
  – EBP: extended frame pointer (stack)
Status Flags
• Carry
  – unsigned arithmetic out of range
• Overflow
  – signed arithmetic out of range
• Sign
  – result is negative
• Zero
  – result is zero
• Auxiliary Carry
  – carry from bit 3 to bit 4
• Parity
  – sum of 1 bits is an even number
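The flag definitions above can be made concrete for an 8-bit addition. This is a sketch of the arithmetic only, written in Python rather than assembly; the `add8_flags` helper and the flag abbreviations used as dictionary keys are choices made for this example.

```python
def add8_flags(a, b):
    """Add two 8-bit values; return (result, flags) where each flag is 0 or 1."""
    raw = a + b
    result = raw & 0xFF
    flags = {
        "CF": int(raw > 0xFF),                                 # unsigned out of range
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed out of range
        "SF": (result >> 7) & 1,                               # result is negative
        "ZF": int(result == 0),                                # result is zero
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),             # carry from bit 3 to bit 4
        "PF": int(bin(result).count("1") % 2 == 0),            # even number of 1 bits
    }
    return result, flags


result, flags = add8_flags(0x7F, 0x01)
```

Adding 1 to 0x7F gives 0x80: the unsigned sum fits in 8 bits, so Carry stays clear, but the signed value went from +127 to -128, so Overflow and Sign are set.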
Intel Microprocessor History
• Intel 8086, 80286
• IA-32 processor family
• P6 processor family
• CISC and RISC
Early Intel Microprocessors
• Intel 8080
  – 64K addressable RAM
  – 8-bit registers
  – CP/M operating system
  – S-100 bus architecture
  – 8-inch floppy disks!
• Intel 8086/8088
  – IBM-PC used 8088
  – 1 MB addressable RAM
  – 16-bit registers
  – 16-bit data bus (8-bit for 8088)
  – separate floating-point unit (8087)
The IBM-AT
• Intel 80286
  – 16 MB addressable RAM
  – Protected memory
  – several times faster than 8086
  – introduced IDE bus architecture
  – 80287 floating point unit
Intel IA-32 Family
• Intel386
  – 4 GB addressable RAM, 32-bit registers, paging (virtual memory)
• Intel486
  – instruction pipelining
• Pentium
  – superscalar, 32-bit address bus, 64-bit internal data path
Intel P6 Family
• Pentium Pro
  – advanced optimization techniques in microcode
• Pentium II
  – MMX (multimedia) instruction set
• Pentium III
  – SIMD (streaming extensions) instructions
• Pentium 4 and Xeon
  – Intel NetBurst micro-architecture, tuned for multimedia
CISC and RISC
• CISC: complex instruction set
  – large instruction set
  – high-level operations
  – requires microcode interpreter
  – examples: Intel 80x86 family
• RISC: reduced instruction set
  – simple, atomic instructions
  – small instruction set
  – directly executed by hardware
  – examples:
    • ARM (Advanced RISC Machines)
    • DEC Alpha (now Compaq)
IA-32 Memory Management
• Real-address mode
• Calculating linear addresses
• Protected mode
• Multi-segment model
• Paging
Real-Address mode
• 1 MB RAM maximum addressable
• Application programs can access any area of memory
• Single tasking
• Supported by MS-DOS operating system
Segmented Memory
Segmented memory addressing: the absolute (linear) address is formed by combining a 16-bit segment value with a 16-bit offset.
[Figure: the range of linear addresses, with one segment marked]
Calculating Linear Addresses
• Given a segment address, multiply it by 16 (append a hexadecimal zero), and add the offset
• Example: convert 08F1:0100 to a linear address
Adjusted segment value: 08F10
Add the offset: 0100
Linear address: 09010
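The calculation above is a shift and an add. A minimal sketch in Python, where the function name `linear_address` is chosen for this example and the result is masked to 20 bits to match the 1 MB real-mode address space:

```python
def linear_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit linear address."""
    # Shifting left by 4 bits multiplies by 16, i.e. appends one hexadecimal zero.
    return ((segment << 4) + offset) & 0xFFFFF


addr = linear_address(0x08F1, 0x0100)
```

Formatting the result with `format(addr, "05X")` gives the five-digit hexadecimal form used on the slide.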
Your turn . . .
What linear address corresponds to the segment/offset address 028F:0030?
028F0 + 0030 = 02920
Always use hexadecimal notation for addresses.
Your turn . . .
What segment addresses correspond to the linear address 28F30h?
Many different segment-offset addresses can produce the linear address 28F30h. For example:
28F0:0030, 28F3:0000, 28B0:0430, . . .
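To see how many segment-offset pairs map to one linear address, the pairs can be enumerated by brute force. This sketch ignores address wraparound at 1 MB, and the generator name `segment_offset_pairs` is invented for the example.

```python
def segment_offset_pairs(linear):
    """Yield every (segment, offset) pair of 16-bit values that maps to `linear`."""
    for segment in range(0x10000):
        offset = linear - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            yield segment, offset


pairs = list(segment_offset_pairs(0x28F30))
```

Because consecutive segment values start only 16 bytes apart while each segment spans 64 KB, this address is reachable through 4096 different pairs, including the three listed above.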
Protected Mode (1 of 2)
• 4 GB addressable RAM
  – (00000000 to FFFFFFFFh)
• Each program assigned a memory partition which is protected from other programs
• Designed for multitasking
• Supported by Linux & MS-Windows
Protected mode (2 of 2)
• Segment descriptor tables
• Program structure
  – code, data, and stack areas
  – CS, DS, SS segment descriptors
  – global descriptor table (GDT)
• MASM programs use the Microsoft flat memory model
Components of an IA-32 Microcomputer
• Motherboard
• Video output
• Memory
• Input-output ports
Motherboard
• CPU socket
• External cache memory slots
• Main memory slots
• BIOS chips
• Sound synthesizer chip (optional)
• Video controller chip (optional)
• IDE, parallel, serial, USB, video, keyboard, joystick, network, and mouse connectors
• PCI bus connectors (expansion cards)
Intel D850MD Motherboard
[Figure: board photo with labels for dynamic RAM, Pentium 4 socket, speaker, IDE drive connectors, mouse/keyboard/parallel/serial/USB connectors, AGP slot, battery, video, power connector, memory controller hub, diskette connector, PCI slots, I/O controller, firmware hub, and audio chip]
Source: Intel® Desktop Board D850MD/D850MV Technical Product Specification
Video Output
• Video controller
  – on motherboard, or on expansion card
  – AGP (accelerated graphics port technology)
• Video memory (VRAM)
• Video CRT display
  – uses raster scanning
  – horizontal retrace
  – vertical retrace
• Direct digital LCD monitors
  – no raster scanning required
Sample Video Controller (ATI Corp.)
• 128-bit 3D graphics performance powered by RAGE™ 128 PRO
• 3D graphics performance
• Intelligent TV-Tuner with Digital VCR
• TV-ON-DEMAND™
• Interactive Program Guide
• Still image and MPEG-2 motion video capture
• Video editing
• Hardware DVD video playback
• Video output to TV or VCR
Memory
• ROM
  – read-only memory
• EPROM
  – erasable programmable read-only memory
• Dynamic RAM (DRAM)
  – inexpensive; must be refreshed constantly
• Static RAM (SRAM)
  – expensive; used for cache memory; no refresh required
• Video RAM (VRAM)
  – dual ported; optimized for constant video refresh
• CMOS RAM
  – complementary metal-oxide semiconductor
  – system setup information
• See: Intel platform memory (Intel technology brief)
Input-Output Ports
• USB (universal serial bus)
  – intelligent high-speed connection to devices
  – up to 12 megabits/second
  – USB hub connects multiple devices
  – enumeration: computer queries devices
  – supports hot connections
• Parallel
  – short cable, high speed
  – common for printers
  – bidirectional, parallel data transfer
  – Intel 8255 controller chip
Input-Output Ports (cont)
• Serial
  – RS-232 serial port
  – one bit at a time
  – uses long cables and modems
  – 16550 UART (universal asynchronous receiver transmitter)
  – programmable in assembly language
Levels of Input-Output
• Level 3: Call a library function (C++, Java)
  – easy to do; abstracted from hardware; details hidden
  – slowest performance
• Level 2: Call an operating system function
  – specific to one OS; device-independent
  – medium performance
• Level 1: Call a BIOS (basic input-output system) function
  – may produce different results on different systems
  – knowledge of hardware required
  – usually good performance
• Level 0: Communicate directly with the hardware
  – may not be allowed by some operating systems
Displaying a String of Characters
When a HLL program displays a string of characters, the following steps take place:
ASM Programming Levels
ASM programs can perform input-output at each of the following levels:
Summary
• Central Processing Unit (CPU)
• Arithmetic Logic Unit (ALU)
• Instruction execution cycle
• Multitasking
• Floating Point Unit (FPU)
• Complex Instruction Set
• Real mode and Protected mode
• Motherboard components
• Memory types
• Input/Output and access levels
More Details about X86 Family Architecture
X86 family Generations
X86 Family
• 8086 and 8088 Microprocessors
• 80x86 architecture
  – Address bus: 20 bits (16 bits for the 8-bit chips); max. memory capacity: 1 MB
  – Internal structure is divided into a bus interface unit (BIU) and an execution unit (EU), so instruction fetch and execution can occur simultaneously
  – Length of internal registers expanded from 8 bits to 16 bits
  – H/W multiply and divide instructions built into the processor
  – Support for an external math coprocessor that performs floating-point operations in H/W as much as 100 times faster
Intel 8085 architecture : 8-bit data, 16-bit address
Internal architecture of 8086
PC Standard
For a 16-bit data bus, two 8-bit memory banks are required, which was expensive at the time.
In 1979, Intel announced the 8088 microprocessor, which is identical to the 8086 except for an external 8-bit data bus. Two memory accesses are needed to input a word.
IBM announced the IBM-PC, using the 8088 microprocessor and 16 KB of memory (expandable to 64 KB), with a clock speed of 4.77 MHz. This defined the PC standard.
80186 and 80188 Microprocessors
High-integration CPUs: include an 8086 (or 8088) core plus a clock generator, a programmable timer, an interrupt controller, a DMA controller, etc.
The instruction set is fully compatible with the 8086 and 8088, but includes 9 new instructions.
Used for IBM-PC compatibles and many embedded computers.
80286 Microprocessor
Processor of the IBM PC-AT. Provides two programming modes:
1) Real mode
  – functions exactly the same as the 8086
  – uses only the 20 least significant address lines (max. 1 MB)
  – faster than the 8086 due to redesign and a higher clock
2) Protected mode
  – 16 new instructions are added
  – supports a multi-program environment by giving each program a predetermined amount of memory (16 MB)
  – programs no longer have physical addresses, but are addressed by a segment selector
  – several programs can be loaded into memory at the same time, but protected from each other (*MS-DOS)
The 8086 and 80286 microprocessors.
80386 Microprocessor
New standard announced by Intel in 1985, with a commitment that successive microprocessor generations would remain compatible with this chip: Intel Architecture-32 (IA-32), through 2000.
Data bus and internal registers: 32 bits
Address bus: 32 bits, max. 4 GB of physical memory
Internal architecture of 80386
Internal registers (partly) of 80386
80386 supports two operating modes (like 80286)
1) Real Address Mode
  – used by MS-DOS
  – in this mode, the 80386 becomes a fast 8086
2) Protected Virtual Address Mode (Protected Mode)
  – On-board MMU manages 4 GB of memory
  – Each task is given a segment of memory governed by a descriptor register that defines the segment base address, the segment limit, and the attributes for the segment (code, data, read-only, etc.)
  – Uses paging: 4 KB pages can be swapped in and out of memory (using a disk) to allow a task to have a virtual memory space as large as 64 TB
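With 4 KB pages, the low 12 bits of a 32-bit linear address are the offset within a page. The sketch below also splits the remaining 20 bits into two 10-bit indexes, following the standard two-level IA-32 page-table layout; that split is not stated on the slide. The 64 TB figure is reproduced here as 2^14 segments of 4 GB each, which is the usual derivation and is likewise an assumption about what the slide intends. The function name `split_linear` is made up for the example.

```python
PAGE_SIZE = 4 * 1024  # 4 KB pages


def split_linear(addr):
    """Split a 32-bit linear address into (directory index, table index, page offset)."""
    directory = (addr >> 22) & 0x3FF    # top 10 bits
    table = (addr >> 12) & 0x3FF        # next 10 bits
    offset = addr & (PAGE_SIZE - 1)     # low 12 bits
    return directory, table, offset


parts = split_linear(0x12345678)

TB = 2 ** 40
virtual_space = (2 ** 14) * (2 ** 32)   # assumed: 16,384 segments x 4 GB each
```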
When operating with 64 KB of cache, the 386 achieves a hit rate of 93%, so the processor operates at full speed 93% of the time.
The instruction set of the 386 is 100% compatible with the older processors in the family.
14 new instructions are added and several others are modified. [Ex] data can be moved between the internal registers at a time.
80386SX: designed to ease the transition from 16- to 32-bit processors, with a 16-bit external data bus and a 24-bit address bus.
80486 Microprocessor
• Maintains compatibility with the older microprocessors
• Only 6 new instructions are added, to be used by OS S/W, not by application programs
• Redesigned using RISC concepts, so frequently used instructions execute in a single clock cycle
• New 5-stage instruction execution pipeline, so 5 instructions can be in execution at once
• On-board 8K cache and 80387 coprocessor
• Twice as fast as the 386 (20 MHz 387 = 40 MHz 386)
• 486SX: excludes the 80387; designed for low-end applications that do not require a coprocessor
486DX2 and DX4
DX2: the internal clock rate is twice the external clock. DX4: the internal clock rate is three times the external clock.
This allows less expensive components to be used on the computer system board, while the processor operates at its maximum data rate internally.
[Ex] 486DX2 66: 66 MHz (int. clock) & 33 MHz (ext. clock); 486DX4 100: 100 MHz (int. clock) & 33 MHz (ext. clock)
Overdrive processors: 486 system boards include an overdrive socket to allow users to upgrade a low-speed 486DX or 486SX with 486DX2- and DX4-style processors.
Pentium
Superscalar architecture: provides two instruction execution pipelines, each with its own ALU, address generation circuitry, and data cache interface, so it can execute two different instructions simultaneously.
Additional features:
• includes on-board cache (separate 8K instruction cache and data cache) and a coprocessor
• 8-stage instruction pipelines
• achieves 5~8 times the floating-point performance of the 486
• external data bus: 64 bits
• about twice as fast as the 486
Key features of the Pentium microprocessor. The execution unit has two pipelines allowing two instructions to be executed simultaneously.
MMX (Multimedia Extension): provides 3 architectural enhancements over the non-MMX Pentium
① 57 instructions are added for multimedia (audio, video, and graphic data) applications.
② SIMD (Single-Instruction stream, Multiple-Data stream) allows the same operation to be performed on multiple data items. Because many multimedia applications require large blocks of data to be manipulated, SIMD provides a significant performance enhancement.
③ Internal cache size is increased from 16K to 32K.
For general applications, performance improves by 10~20%. For multimedia applications, it improves by nearly 70%.
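The SIMD idea in ② can be illustrated by treating one 64-bit value as eight independent byte lanes. The sketch below models a packed byte addition where each lane wraps around modulo 256; real SIMD hardware processes all lanes in a single instruction, so the Python loop only reproduces the result, not the speed. The function name `packed_add_bytes` is chosen for this example.

```python
def packed_add_bytes(x, y):
    """Add eight unsigned bytes packed in 64-bit integers, each lane wrapping modulo 256."""
    result = 0
    for lane in range(8):
        shift = lane * 8
        a = (x >> shift) & 0xFF
        b = (y >> shift) & 0xFF
        # Mask the sum so a carry never spills into the neighbouring lane.
        result |= ((a + b) & 0xFF) << shift
    return result


total = packed_add_bytes(0x01020304050607FF, 0x0101010101010101)
```

The lowest lane overflows from 0xFF to 0x00 without disturbing the lane next to it, which is the difference between a packed add and an ordinary 64-bit add.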
Socket 7 : ZIF(zero insertion force) socket
Pentium chip: 296-pin PGA package. A heat sink and fan are mounted atop the chip, and the entire assembly plugs into a ZIF socket, the so-called Socket 7.
Socket 7 defines a platform that specifies the front side bus connection to the L2 cache, disk interface, video interface, and the ISA and PCI expansion buses.
Pentium processor with heat sink and fan mated to a Socket 7 connector.
Pentium Pro
6th-generation processors (Pentium Pro, Pentium II, Pentium III, and Celeron)
36 address lines: max. 64 GB memory
New features:
1. Inclusion of the L2 cache in the same package as the processor
2. New system board platforms called Socket 8 (Pro), Slot 1 & 2 (Pentium II, III, and Celeron), and Socket 370 (Pentium III and Celeron)
3. New instruction architecture based on Dynamic Execution
Two chips in one package: the Pentium Pro consists of two separate silicon dies, one for the processor and the other for the 256 KB L2 cache.
The Pentium Pro is two chips in one. The larger die is the processor, the smaller a 256K L2 cache. (Courtesy of Intel Corporation.)
Dynamic Execution: a new approach to processing S/W instructions that reduces idle processor time.
① Multiple Branch Prediction: the Pentium Pro can look as far as 30 instructions ahead to anticipate conditional branches, reducing wasted pipeline clocks.
② Data Flow Analysis: looks at upcoming S/W instructions to find the optimal sequence of processing.
③ Speculative Execution: allows instructions to be executed in a different order from that in which they entered the processor (“out-of-order execution”). The results of these instructions are stored as speculative results until their final states can be determined.
Superscalar Processor of Degree Three: the Pentium Pro has three instruction decoders and can execute three instructions simultaneously.
Internal Cache : L2 cache in the same package.
Pentium II
• The Pentium Pro was short-lived due to
  – the lack of MMX instructions
  – use of the expensive dual- and tri-cavity package
The Pentium II is a Pentium Pro with MMX technology, repackaged in a new single-edge contact (SEC) cartridge that is inserted in a “Slot 1 connector – 242 pins” or a “Slot 2 connector – 330 pins”.
Processor and L2 cache are mounted on a ceramic substrate (the silicon dies are separate).
Processor clock: 300 ~ 450 MHz; bus clock: 100 MHz
L1 (32 KB) & L2 (512 KB) with a 64-bit dedicated bus
Exploded view of single-edge contact (SEC) cartridge. (Courtesy of Intel Corporation.)
Installing the SEC cartridge into the retention mechanism. (Courtesy of Intel Corporation.)
Celeron
Pentium II without L2 cache (Pentium II SX ?)
Uses the Slot 1 connector without the plastic cover, a so-called “naked CPU”
Celeron A: includes a 128 KB L2 cache on the same die as the processor
  – Drawback: 66 MHz bus cycle
  – 370-pin PGA package (called Socket 370)
The Celeron processor is a Pentium II without the L2 cache. Later versions, called the Celeron A, include this cache on the same silicon die with the processor. (Courtesy of Intel Corporation.)
Pentium III
Higher clock speed : based on the Pentium II core, with a 600 MHz clock and an external bus frequency of 133 MHz
70 new streaming SIMD extensions (SSE) instructions :
- 50 to improve floating-point performance
- 12 to improve multimedia processing
- 8 to improve the efficiency of the L1 cache
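The idea behind SIMD (single instruction, multiple data) can be sketched in a few lines. The model below is not real SSE code: it uses plain Python to imitate one packed add acting on four single-precision values held in a 128-bit quantity. The helper names `pack_ps`, `unpack_ps`, and `add_ps` are invented for this illustration.

```python
import struct

LANES = 4  # four 32-bit floats fit in one 128-bit register


def pack_ps(values):
    """Pack four floats into 16 bytes, standing in for one 128-bit register."""
    if len(values) != LANES:
        raise ValueError("exactly four values are required")
    return struct.pack("<4f", *values)


def unpack_ps(register: bytes):
    """Return the four single-precision lanes of a 16-byte value."""
    return struct.unpack("<4f", register)


def add_ps(a: bytes, b: bytes) -> bytes:
    """Model of a packed add: every lane is added by one operation."""
    return pack_ps([x + y for x, y in zip(unpack_ps(a), unpack_ps(b))])


if __name__ == "__main__":
    r1 = pack_ps([1.0, 2.0, 3.0, 4.0])
    r2 = pack_ps([10.0, 20.0, 30.0, 40.0])
    print(unpack_ps(add_ps(r1, r2)))  # (11.0, 22.0, 33.0, 44.0)
```

A scalar instruction set would need four separate additions for the same work, which is the source of the floating-point speedup the slide mentions.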
![Page 81: The x86 Family](https://reader033.vdocuments.us/reader033/viewer/2022061219/54b984d24a7959ab058b456e/html5/thumbnails/81.jpg)
The Pentium III microprocessor with integrated L2 cache. This chip has more than 22 million transistors. (Courtesy of Intel Corporation.)
![Page 82: The x86 Family](https://reader033.vdocuments.us/reader033/viewer/2022061219/54b984d24a7959ab058b456e/html5/thumbnails/82.jpg)
Xeon Processors
Scalability : As processing demands increase, additional processors can be interconnected to keep pace.
- One of the advantages of the Pentium Pro is that it can support up to 4 processors in SMP (symmetric multiprocessing)
Pentium II Xeon processors can be scaled to 2, 4, 8 or more, and are used in high-end servers and workstations.
Pentium III Xeon processor : similar, but offers the streaming SIMD technology.
![Page 83: The x86 Family](https://reader033.vdocuments.us/reader033/viewer/2022061219/54b984d24a7959ab058b456e/html5/thumbnails/83.jpg)
P7 Itanium
IA-64 : 7th-generation processor architecture, code name = Merced
64-bit architecture : 128 64-bit registers & 128 82-bit floating-point registers (including hidden bits) [cf.] IA-32 : 10 32-bit reg., 8 fl.-pt. reg.
Explicit parallelism : instructions are packed in 128-bit bundles ready for execution. Each bundle consists of three 41-bit instructions and a 5-bit template. All three instructions are dispatched in parallel.
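The bundle arithmetic (3 x 41 + 5 = 128 bits) can be made concrete with a small packing routine. This is a sketch, not an IA-64 encoder: the slot values are arbitrary integers rather than real instructions, the function names are hypothetical, and placing the template in the low-order bits is an assumption of this example since the slide does not give the field order.

```python
SLOT_BITS = 41       # width of one instruction slot
TEMPLATE_BITS = 5    # width of the template field
SLOTS_PER_BUNDLE = 3

SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template: int, slots) -> int:
    """Combine a 5-bit template and three 41-bit slots into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int):
    """Split a 128-bit integer back into (template, (slot0, slot1, slot2))."""
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(SLOTS_PER_BUNDLE)
    )
    return template, slots


if __name__ == "__main__":
    packed = pack_bundle(0b10000, [1, 2, 3])
    print(packed.bit_length() <= 128)
    print(unpack_bundle(packed))
```

Filling every field with ones yields exactly 2^128 - 1, confirming that the four fields account for every bit of the bundle.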
![Page 84: The x86 Family](https://reader033.vdocuments.us/reader033/viewer/2022061219/54b984d24a7959ab058b456e/html5/thumbnails/84.jpg)
Speculation : preload data to minimize memory delays when data is needed
Predication : When a conditional branch instruction is encountered, Itanium follows both branch paths, then commits the results of the correct path only.
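Predication as described here can be imitated in ordinary code: evaluate the condition into a pair of predicates, compute both paths, and let only the path whose predicate is true write its result. The function below is a simplified model with invented names, computing an absolute difference. It says nothing about how the hardware schedules the two paths.

```python
def predicated_abs_diff(a: int, b: int) -> int:
    """Model predicated execution of: if a > b: r = a - b, else: r = b - a."""
    # The compare produces a predicate and its complement.
    p_true = a > b
    p_false = not p_true

    # Both paths are computed; neither has been committed yet.
    then_value = a - b
    else_value = b - a

    # Only the operation guarded by a true predicate commits its result.
    result = 0
    if p_true:
        result = then_value
    if p_false:
        result = else_value
    return result


if __name__ == "__main__":
    print(predicated_abs_diff(7, 3))  # 4
    print(predicated_abs_diff(3, 7))  # 4
```

The payoff in hardware is that no branch has to be predicted, so a wrong guess cannot stall the pipeline.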
Data bus : 128 bits
Address bus : 64 bits, max. 2^64 bytes of memory
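To put the address-bus figure in perspective, the size of the addressable space follows directly from the number of address bits. The snippet below is a small worked calculation; the function name is chosen for this example only, and the 32-bit line is included purely for comparison.

```python
GIB = 2 ** 30  # one gibibyte
EIB = 2 ** 60  # one exbibyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address of this width can express."""
    return 2 ** address_bits


if __name__ == "__main__":
    print(addressable_bytes(64))          # 18446744073709551616
    print(addressable_bytes(64) // EIB)   # 16 (EiB)
    print(addressable_bytes(32) // GIB)   # 4 (GiB), for comparison
```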
![Page 85: The x86 Family](https://reader033.vdocuments.us/reader033/viewer/2022061219/54b984d24a7959ab058b456e/html5/thumbnails/85.jpg)
80x86 Compatible Microprocessors
Second Sources : manufacture 80x86 u-P chips under license from Intel.
Clones and Look-Alikes
Pin-for-pin replacements with all of the same features as the Intel processor.
[Ex] AMD 386DX, 486DX4-100, Cyrix 5x86, etc.
![Page 86: The x86 Family](https://reader033.vdocuments.us/reader033/viewer/2022061219/54b984d24a7959ab058b456e/html5/thumbnails/86.jpg)
The AMD K7 or Athlon processor. It mates to a new proprietary socket called Slot A. (Courtesy of Advanced Micro Devices.)
![Page 87: The x86 Family](https://reader033.vdocuments.us/reader033/viewer/2022061219/54b984d24a7959ab058b456e/html5/thumbnails/87.jpg)
Measuring Processor Performance
Benchmark programs : used to measure the performance of a computer system (system benchmarks) or of a component in that system such as the processor, disk, video card, or main memory (component benchmarks).
Component-level Benchmarks
• Whetstone : used to measure the time to execute integer and floating-point arithmetic instructions and “if” statements. It includes a high percentage of floating-point operations and is mostly used to represent numerical programs.
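The general shape of a component-level benchmark is a fixed workload run under a timer. The sketch below is not Whetstone: the two kernels are invented stand-ins for a floating-point-heavy loop and an integer-only loop, each with an "if" statement. In Python the times mostly reflect interpreter overhead, so they say little about the underlying processor.

```python
import time


def float_kernel(iterations: int) -> float:
    """Floating-point arithmetic plus a conditional, repeated many times."""
    x = 1.0
    for i in range(1, iterations + 1):
        x = x * 1.000001 + 0.5 / i
        if x > 2.0:
            x -= 1.0
    return x


def int_kernel(iterations: int) -> int:
    """Integer-only arithmetic plus a conditional, repeated many times."""
    acc = 0
    for i in range(iterations):
        acc = (acc + i * 3) % 1000003
        if acc & 1:
            acc += 1
    return acc


def time_kernel(kernel, iterations: int) -> float:
    """Return the elapsed wall-clock seconds for one run of the kernel."""
    start = time.perf_counter()
    kernel(iterations)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 200_000
    print(f"float kernel: {time_kernel(float_kernel, n):.4f} s")
    print(f"int kernel:   {time_kernel(int_kernel, n):.4f} s")
```

Real benchmark suites fix the workload precisely and report a normalized score, so that results from different machines can be compared.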
![Page 88: The x86 Family](https://reader033.vdocuments.us/reader033/viewer/2022061219/54b984d24a7959ab058b456e/html5/thumbnails/88.jpg)
• Dhrystone : a synthetic benchmark consisting of 12 procedures with 94 statements and no floating-point operations.
• Microprocessor Benchmarks : developed for comparing the processing ability of the various u-P chips, e.g. Ziff-Davis’ CPUmark and Intel’s iCOMP index.
CPUmark : measures the speed of a PC’s processor subsystem, including the CPU, its internal and external caches, and system RAM. [Ex] Fig. 1-20 : CPUmark99 ratings for 80x86s
iCOMP : combines 4 industry standard benchmarks : CPUmark32, Norton SI32, SPEC95, and the Intel Media Benchmark (audio, video, image, 3-D, etc.).
![Page 89: The x86 Family](https://reader033.vdocuments.us/reader033/viewer/2022061219/54b984d24a7959ab058b456e/html5/thumbnails/89.jpg)
CPUmark is a benchmark that measures the speed of the processor and its internal cache.
![Page 90: The x86 Family](https://reader033.vdocuments.us/reader033/viewer/2022061219/54b984d24a7959ab058b456e/html5/thumbnails/90.jpg)
System-level Benchmarks
• Microcomputer Benchmarks : measure the speed of the processor while taking into account the effect of a slow disk or video subsystem.
Winstone : system-level, application-based benchmark to measure a PC’s overall performance when running today’s 32-bit applications on Windows 95, 98, NT. [Ex] Winstone 98 ratings for 80x86s
Performance Rating : Cyrix and AMD developed the P-rating (Processor Performance) system, which runs applications on a processor and compares the results to those of a Pentium u-P. [Ex] Table 1-2 : PR166 ~ 366 for AMD and Cyrix chips
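The comparison behind a P-rating can be sketched as a lookup: run the same benchmark on the candidate chip and on Pentium processors at several clock speeds, then report the fastest Pentium the candidate matches or beats. Everything numeric below is hypothetical, made up for illustration. The scores are not measured data, and the function name `p_rating` is not taken from any vendor tool.

```python
def p_rating(candidate_score: float, pentium_scores: dict):
    """Return the highest Pentium clock (MHz) whose score the candidate meets.

    pentium_scores maps a Pentium clock speed in MHz to its benchmark score.
    Returns None when the candidate is slower than every reference entry.
    """
    matched = [
        mhz for mhz, score in pentium_scores.items()
        if candidate_score >= score
    ]
    return max(matched) if matched else None


if __name__ == "__main__":
    # Hypothetical reference scores, for illustration only.
    reference = {133: 100.0, 150: 110.0, 166: 120.0, 200: 140.0}
    print(p_rating(125.0, reference))  # 166, i.e. a "PR166" part
```

A chip labelled PR166 is therefore one that performs at least as well as a 166 MHz Pentium on the chosen applications, whatever its own clock speed is.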
![Page 91: The x86 Family](https://reader033.vdocuments.us/reader033/viewer/2022061219/54b984d24a7959ab058b456e/html5/thumbnails/91.jpg)
Winstone 98 measures the performance of a PC system running typical Windows applications.