Ioana Burcea

Initial Observations of the Simultaneous Multithreading

Pentium 4 Processor

Nathan Tuck and Dean M. Tullsen

Agenda

SMT – proposed in research

Intel Hyper-threading

Methodology

- Benchmarks and experiments

Experimental Results

Questions?

SMT in Research

Up to 8 contexts – 8-way SMT

ICOUNT 2.8 fetching policy

Intel: Hyper-threading

SMT in real silicon – Intel Pentium 4

- Single vs. multithreaded mode

Methodology

Pentium 4 2.5 GHz

512 MB DRAM

RedHat 7.3

Linux 2.4.28smp

- Linux treats the system as a dual-processor

- It has a separate run queue for each virtual processor

Benchmarks

- SPEC CPU2000

- NAS parallel benchmarks

- SPLASH2 (modified input)

Speedup for Heterogeneous Workloads

T_SMT = total execution time / number of runs

Speedup = T_seq / T_SMT

Speedup per combination = S_bench_1 + S_bench_2

• At least 12 total jobs

• At least 3 runs for each job
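The speedup metric on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the timing values are hypothetical and are not taken from the paper.

```python
def smt_speedup(t_seq, total_smt_time, runs):
    """Per-benchmark speedup: sequential time divided by the average
    time per run when co-scheduled on the SMT processor."""
    t_smt = total_smt_time / runs  # T_SMT = total execution time / number of runs
    return t_seq / t_smt


def combination_speedup(s_bench_1, s_bench_2):
    """Speedup of a two-benchmark combination: the sum of both speedups."""
    return s_bench_1 + s_bench_2


# Hypothetical timings in seconds (illustrative only, not measured data).
s1 = smt_speedup(t_seq=90.0, total_smt_time=300.0, runs=3)  # 90 / 100
s2 = smt_speedup(t_seq=40.0, total_smt_time=150.0, runs=3)  # 40 / 50
print(round(combination_speedup(s1, s2), 2))
```

With this definition, a combined value above 1.0 means the pair finishes more work per unit time when co-scheduled than when the two benchmarks run one after the other.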

Static Partitioning of Resources

• SPECINT 83% on average

• SPECFP 85% on average

• eon 71%

• wupwise 72%

• mcf 93%

• art 97%

• swim 98%

Independent Threads

Parallel Multithreaded Speedup

SPLASH NAS

Synchronization and Communication Speed

Reading a value protected by a lock

- 37 million times per second

- 68 cycles = lock & read

Updating a value protected by a lock

- 14.6 million times per second

- 171 cycles = lock & update
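The cycle counts above follow from the operation rates if one assumes the 2.5 GHz clock listed on the Methodology slide. A minimal sketch of that conversion (the function name is hypothetical, and the rates on the slide are presumably rounded):

```python
CLOCK_HZ = 2.5e9  # assumption: the 2.5 GHz Pentium 4 from the Methodology slide


def cycles_per_op(ops_per_second, clock_hz=CLOCK_HZ):
    """Convert an operation rate into the average number of cycles per operation."""
    return clock_hz / ops_per_second


lock_and_read = cycles_per_op(37e6)      # 37 million reads per second
lock_and_update = cycles_per_op(14.6e6)  # 14.6 million updates per second
print(round(lock_and_read), round(lock_and_update))
```

Rounded to whole cycles, the two rates give 68 and 171, which matches the figures on the slide.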

Synchronization and Communication Speed (cont’d)

Loop

- result = independent computation

- computation that uses result – flow dependence

Independent computation: a loop that contains

- a load

- a float multiply

- a float add

Synchronization and Communication Speed (cont’d)

Heterogeneous vs. Homogeneous Workloads

Two self copies of SPEC

- Average speedup 1.11 < 1.20

Integer vs. integer 1.17

Float vs. float 1.20

Integer vs. float 1.21

Compiler Interaction

Baseline?

Questions?

Is resource partitioning a good approach?

IBM’s Power5 implementation?

Other implementations?
