Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Lei Yang & Shiliang Hu, Computer Sciences Department, University of Wisconsin - Madison

Upload: bryson

Post on 06-Jan-2016

DESCRIPTION

Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads. Lei Yang & Shiliang Hu, Computer Sciences Department, University of Wisconsin - Madison. Outline: TPC-W Benchmarks in Java; IBM RS6000 S80 Enterprise Server; Hardware Counters in S80; Experiment Results. (PowerPoint PPT presentation)

TRANSCRIPT

Page 1: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Lei Yang & Shiliang Hu
Computer Sciences Department, University of Wisconsin - Madison

Page 2: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Outline

• TPC-W Benchmarks in Java
• IBM RS6000 S80 Enterprise Server
• Hardware Counters in S80
• Experiment Results
• Problems and Future Work
• Conclusions

Page 3: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

TPC-W benchmark

• TPC-W is the TPC's newest benchmark for transactional web environments (e-commerce). It models an online bookstore similar to www.amazon.com, with three workload mixes:
– Browsing: 95% browsing, 5% transactions
– Shopping: 80% browsing, 20% transactions
– Ordering: 50% browsing, 50% transactions

• Transactional web environments combine:
– Web serving of static and dynamic content
– Online transaction processing (OLTP)
– Some decision support (DSS)
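As a small worked example of the three mixes, the sketch below splits a given number of web interactions into browsing and transactional ones using the percentages listed on this slide. The function name `split_interactions` and the figure of 1000 interactions are illustrative assumptions, not part of the benchmark or of the authors' setup.

```python
# Browsing/transaction split (in percent) for each TPC-W mix, as listed on the slide.
MIXES = {
    "Browsing": (95, 5),
    "Shopping": (80, 20),
    "Ordering": (50, 50),
}


def split_interactions(mix, total):
    """Return (browsing, transactions) counts for `total` interactions.

    Hypothetical helper: integer arithmetic keeps the two counts summing to `total`.
    """
    browse_pct, _ = MIXES[mix]
    browsing = total * browse_pct // 100
    return browsing, total - browsing


if __name__ == "__main__":
    for name in MIXES:
        print(name, split_interactions(name, 1000))
```

With 1000 interactions, the Ordering mix issues ten times as many transactions as the Browsing mix (500 against 50), which is why the database is exercised so differently across the three runs.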

Page 4: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

IBM RS6000 S80 Enterprise Server

• 6 RS64-III Pulsar processors (451MHz)
– 4-issue in-order superscalar microprocessor with on-chip 128KB L1 I-Cache, 128KB L1 D-Cache and 8MB L2 Cache.
– No branch prediction; aggressive early branch resolution.
– Coarse-grain 2-context multithreading.

• SMP system with a snooping-bus inter-processor connection.

• 8GB main memory, huge disk volumes, and very high-bandwidth I/O systems.

Page 5: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

System Configuration:

[Figure: system configuration diagram. Hardware: RS64-III processors, each with a Performance Monitor and a 32-bit control word, connected by a snooping bus. Operating system: AIX kernel with a kernel extension for the Performance Monitor. Software: Emulated Browser (on a Java Virtual Machine), SUN Java Web Server 2.0 running Java Servlets (on a Java Virtual Machine), and DB2 DBMS processes, linked by http and JDBC.]

Page 6: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Hardware Counters in S80

• 3 levels of objects can be counted, each with its own counting context:
- System-level counting: whole-system context
- Process / process group: process-level context
- Individual thread: thread-level context

• 3 major components:
- 8 built-in hardware counters in each RS64-III processor.
- Kernel extension to AIX 4.3.
- Performance Monitor API in the next release of AIX.

• Some problems with the current version of the PM API:
- Cannot count for an individual processor.
- Some listed events are not available.

Page 7: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Hardware Counters in S80: Countable Events

• Processor events
- Execution cycles and the number of instructions executed.

• Instruction mix events
- Pipeline M, S, B and S instructions executed.

• Branch events
- Conditional branch taken/not-taken (T/NT) events, unconditional branches, zero-cycle branches.

• Address translation events
- TLB/SLB and ERAT/IERAT miss and duration events.

• Cache events
- Cache misses and latencies for each of the L1 I-Cache, L1 D-Cache and L2 Cache.

• Bus and multi-processor bus snooping events
- Bus utilization, multi-processor bus snooping events.
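The results slides report derived metrics (CPI, miss counts per instruction, miss latency in cycles) rather than raw event counts. The sketch below shows one plausible way to derive them from the countable events listed here. The `CounterSample` type, its field names and the sample numbers are hypothetical; they are not the PM API's names, and the values are not measurements from this study.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw event counts over one measurement interval (hypothetical field names)."""
    cycles: int            # execution cycles
    instructions: int      # instructions executed
    l1d_misses: int        # L1 D-cache miss events
    l1d_miss_cycles: int   # total cycles accumulated by the miss-duration event


def cpi(s):
    """Cycles per instruction."""
    return s.cycles / s.instructions


def misses_per_instruction(s):
    """L1 D-cache misses normalized by instruction count."""
    return s.l1d_misses / s.instructions


def avg_miss_latency(s):
    """Average cycles per L1 D-cache miss (duration divided by miss count)."""
    return s.l1d_miss_cycles / s.l1d_misses


if __name__ == "__main__":
    # Illustrative numbers only.
    sample = CounterSample(cycles=1_200_000, instructions=1_000_000,
                           l1d_misses=10_000, l1d_miss_cycles=400_000)
    print(cpi(sample), misses_per_instruction(sample), avg_miss_latency(sample))
```

Normalizing by instruction count makes the three server components comparable even though they execute very different numbers of instructions in the same interval.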

Page 8: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: CPI for RBE, Java Web Server and DB2

[Figure: bar chart of CPI (y-axis, 0 to 1.6) for RBE, JWS and DB2 under the Browsing, Shopping and Ordering mixes.]

Page 9: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: CPU Cycle Counts

[Figure: three panels (Browsing Mix, Shopping Mix, Ordering Mix) plotting cycle counts (y-axis, 0 to 2 x 10^9) against time in seconds (x-axis, 0 to 200) for DB2, JWS and RBE.]

Page 10: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: Instruction Dispatch

• Browsing Mix

[Figure: three panels (DB2, JWS, RBE) plotting dispatch percentage (y-axis, 0 to 100%) against time in seconds (x-axis, 0 to 200), with legend entries 0 Instr, 1 Instr, 2 Instr, 3 Instr and 4 Instr.]

Page 11: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: Instruction Dispatch

• Shopping Mix

[Figure: three panels (DB2, JWS, RBE) plotting dispatch percentage (y-axis, 0 to 100%) against time in seconds (x-axis, 0 to 200), with legend entries 0 Instr, 1 Instr, 2 Instr, 3 Instr and 4 Instr.]

Page 12: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: Instruction Dispatch

• Ordering Mix

[Figure: three panels (DB2, JWS, RBE) plotting dispatch percentage (y-axis, 0 to 100%) against time in seconds (x-axis, 0 to 200), with legend entries 0 Instr, 1 Instr, 2 Instr, 3 Instr and 4 Instr.]
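The dispatch plots appear to show, for each sample, the percentage of cycles in which 0, 1, 2, 3 or 4 instructions were dispatched. Under that assumption, a dispatch histogram can be reduced to a single number, the mean instructions dispatched per cycle, as sketched below. The function name and the histogram values are illustrative and are not taken from the measurements.

```python
def mean_dispatch_per_cycle(percentages):
    """Mean instructions dispatched per cycle.

    percentages[k] is the percent of cycles in which k instructions were
    dispatched (k = 0..4 on a 4-issue machine). The entries must sum to 100.
    """
    total = sum(percentages)
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"percentages sum to {total}, expected 100")
    # Weighted average of the dispatch width, weighted by share of cycles.
    return sum(k * p for k, p in enumerate(percentages)) / 100.0


if __name__ == "__main__":
    # Illustrative histogram, not measured data: 40% of cycles dispatch nothing.
    histogram = [40.0, 30.0, 15.0, 10.0, 5.0]
    print(mean_dispatch_per_cycle(histogram))
```

A value well below 4 on a 4-issue processor indicates that most dispatch slots go unused, which is the kind of observation the conclusions draw about the RS64-III.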

Page 13: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: Instruction Mix

• Browsing Mix

[Figure: three panels (DB2, JWS, RBE) plotting instruction type percentage (y-axis, 0 to 100%) against time in seconds (x-axis, 0 to 200), with legend entries Logic, Arithmetic, Branch and LD/ST.]

Page 14: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: Instruction Mix

• Shopping Mix

[Figure: three panels (DB2, JWS, RBE) plotting instruction type percentage (y-axis, 0 to 100%) against time in seconds (x-axis, 0 to 200), with legend entries Logic, Arithmetic, Branch and LD/ST.]

Page 15: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: Instruction Mix

• Ordering Mix

[Figure: three panels (DB2, JWS, RBE) plotting instruction type percentage (y-axis, 0 to 100%) against time in seconds (x-axis, 0 to 200), with legend entries Logic, Arithmetic, Branch and LD/ST.]

Page 16: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: Branch Behavior

[Figure: two bar-chart panels (Shopping Mix and Browsing Mix) showing counts, on a scale of 10^9, of eight branch event types for DB2, JWS and RBE.]

Branch event types (x-axis):
1. Branches conditional taken
2. Branch to link register taken
3. Branch to counter taken
4. Absolute branches
5. Branches unconditional
6. Branches conditional not taken
7. Zero cycle branch not taken
8. Zero cycle branch taken

Page 17: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: Branch Behavior

[Figure: bar chart (Ordering Mix) showing counts, on a scale of 10^9 (y-axis, 0 to 12 x 10^9), of eight branch event types for DB2, JWS and RBE.]

Branch event types (x-axis):
1. Branches conditional taken
2. Branch to link register taken
3. Branch to counter taken
4. Absolute branches
5. Branches unconditional
6. Branches conditional not taken
7. Zero cycle branch not taken
8. Zero cycle branch taken

Page 18: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: Cache Behavior

[Figure: two bar-chart panels (Browsing Mix and Shopping Mix) showing miss latency in cycles (y-axis, 0 to 80) for DB2, JWS and RBE.]

Bar groups (x-axis):
1. L1 I-cache miss duration latency
2. L1 D-cache miss duration latency

Page 19: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: Cache Behavior

[Figure: two bar-chart panels (Shopping Mix and Ordering Mix) showing miss latency in cycles (y-axis, 0 to 80) for DB2, JWS and RBE.]

Bar groups (x-axis):
1. L1 I-cache miss duration latency
2. L1 D-cache miss duration latency

Page 20: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Results: Cache Behavior

[Figure: three bar-chart panels (Browsing Mix, Shopping Mix and Ordering Mix) showing miss counts per instruction (y-axis, 0 to 0.014) for DB2, JWS and RBE.]

Bar groups (x-axis):
1. L2 miss count per instruction
2. L1 I-cache miss count per instruction
3. L1 D-cache miss count per instruction

Page 21: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Problems & Future Work

• Problems:
- Large dataset.
- Are the network and server-end software the bottleneck?
- Hardware counters vs. simulations.

• Future work:
- Measurement of other transaction processing and web serving benchmarks for comparison.
- More architectural characterization, such as multithreaded processors, multiprocessor scaling and multiprocessor snooping-bus issues.

Page 22: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Conclusions

• Server-end software is critical for high-end servers:
- Network and server-end software are the bottleneck.
- This is true both for high-end commercial server systems and for other high-performance parallel computers designed for scientific or engineering computing.

• Preliminary performance characterization shows:
- CPU utilization is highly dependent on the application workload.
- The wide (4-issue) dispatch mechanism on the RS64-III appears to be used inefficiently.
- Branch instructions are second in frequency only to load and store instructions.
- The L2 cache miss rate is unreasonably low, and the L1 D-cache miss latency is considerably larger than that of the L1 I-cache.

Page 23: Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads

Acknowledgement

• Trey Cain for setting up Java TPC-W and for discussion

• Morris Marden for helping quiet the machine and for discussion

• Prof. Mikko Lipasti for guidance and support

• Everyone who helped us