Performance Model for Future Multicore Process Designs

Post on 30-Dec-2015


Performance Model for Future Multicore Process Designs

Yipkei Kwok

02/06/2008

A Non-Work-Conserving Operating System Scheduler For SMT Processors

• Authors: A. Fedorova et al.
• Calculates the optimal level of parallelism of SMT processors at run time
• Analytical model
• Estimates the workload's IPC for a given degree of concurrency
• First identifies the performance bottleneck
• Suppressing L2 misses improves performance the most

A Non-Work-Conserving Operating System Scheduler For SMT Processors

• Factors
  – N
  – perf_cache_CPI(N)
  – L2_RMR
  – L2_WMR
  – L2_WBR_R
  – L2_WBR_W
  – WSC
  – L2_MCOST
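To make the role of these factors concrete, here is a minimal Python sketch of how a scheduler could pick N once it has some way to estimate IPC. The slides do not give the model's formula, so `estimate_ipc` is a hypothetical stand-in supplied by the caller, the names `ModelInputs` and `optimal_n` are illustrative, and `toy_ipc` is invented purely to exercise the search. It is not the authors' model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelInputs:
    """Factors named on the slide; values would come from hardware counters."""
    perf_cache_cpi: Callable[[int], float]  # perf_cache_CPI(N)
    l2_rmr: float
    l2_wmr: float
    l2_wbr_r: float
    l2_wbr_w: float
    wsc: float
    l2_mcost: float


def optimal_n(inputs: ModelInputs, max_threads: int,
              estimate_ipc: Callable[[ModelInputs, int], float]) -> int:
    """Return the N in 1..max_threads with the highest estimated IPC.

    Ties are resolved in favour of the smaller N.
    """
    best_n = 1
    best_ipc = estimate_ipc(inputs, 1)
    for n in range(2, max_threads + 1):
        ipc = estimate_ipc(inputs, n)
        if ipc > best_ipc:
            best_n, best_ipc = n, ipc
    return best_n


def toy_ipc(inputs: ModelInputs, n: int) -> float:
    """Invented placeholder model (assumption): miss cost grows with n squared."""
    miss_cpi = (inputs.l2_rmr + inputs.l2_wmr) * inputs.l2_mcost * n * n
    return n / (inputs.perf_cache_cpi(n) + miss_cpi)


inputs = ModelInputs(perf_cache_cpi=lambda n: 4.0, l2_rmr=0.01, l2_wmr=0.0,
                     l2_wbr_r=0.0, l2_wbr_w=0.0, wsc=0.0, l2_mcost=100.0)
print(optimal_n(inputs, 4, toy_ipc))
```

With the toy numbers, running fewer threads than the hardware offers gives the best estimated IPC, which is the non-work-conserving behaviour the slide title refers to.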

A Non-Work-Conserving Operating System Scheduler For SMT Processors

• Two-phase scheduling
  – Preparation phase
    • Collect model inputs under full parallelism
    • With hardware counters
    • Until the retirement of the 100-millionth instruction
  – Optimization phase
    • Estimate optimal N
    • Enforce it
    • Until … …
  – New locality phase
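The two phases can be sketched as a small state machine. This is a hedged Python illustration, not the authors' scheduler: the class name, the `on_sample` and `on_new_locality_phase` hooks, and the counter dictionary are all hypothetical. The slides elide how the optimization phase ends, so restarting measurement when a new locality phase is reported is an assumption, and how such a phase is detected is left out entirely.

```python
from typing import Callable, Dict

# From the slide: preparation lasts until the 100-millionth retired instruction.
PREP_INSTRUCTIONS = 100_000_000


class TwoPhaseScheduler:
    def __init__(self, max_threads: int,
                 estimate_optimal_n: Callable[[Dict[str, float]], int]):
        self.max_threads = max_threads
        self.estimate_optimal_n = estimate_optimal_n  # hypothetical model hook
        self.phase = "preparation"
        self.retired = 0
        self.active_threads = max_threads  # full parallelism while measuring

    def on_sample(self, retired_delta: int, counters: Dict[str, float]) -> int:
        """Feed one hardware-counter sample; return the thread count to enforce."""
        if self.phase == "preparation":
            self.retired += retired_delta
            if self.retired >= PREP_INSTRUCTIONS:
                # Enough data collected: estimate N and start enforcing it.
                self.active_threads = self.estimate_optimal_n(counters)
                self.phase = "optimization"
        return self.active_threads

    def on_new_locality_phase(self) -> None:
        """Assumption: a new locality phase restarts the preparation phase."""
        self.phase = "preparation"
        self.retired = 0
        self.active_threads = self.max_threads


sched = TwoPhaseScheduler(max_threads=4, estimate_optimal_n=lambda counters: 2)
print(sched.on_sample(60_000_000, {}))  # still measuring under full parallelism
print(sched.on_sample(40_000_000, {}))  # threshold reached, estimated N enforced
```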

Limitations

• 3-56% improvement, but …
• Empirical model based on the UltraSPARC T1
• SMT only
  – But expandable with, hopefully, reasonable effort
• Once expanded, it could be used for performance prediction
• What is needed?
  – Extra factors?

What new factors?

• Depends on systems to model

• Shared-memory machine

• Threaded parallel workloads

• SMP of CMPs

• SMT per core

What new factors?

• Architecture
  – Homogeneous/heterogeneous cores
    • Difference in speed or functionality
  – Level of cache sharing
  – Interconnects

What new factors?

• Parameters
  – Number of cores
  – Cache size
  – Degree of set-associativity
  – Number of cores sharing a cache
  – Bus, ring, crossbar, tiny-network
  – Switching & flow mechanisms
  – Routing algorithms
  – Fault tolerance techniques
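One way to picture these parameters is as a single configuration record that a model or simulator would take as input. The Python sketch below is only illustrative: every name in it (`MachineParams`, `Interconnect`, the field names) is hypothetical, the free-form string fields are placeholders, and the check that cores divide evenly among shared caches is an assumption of a symmetric design rather than something the slides state.

```python
from dataclasses import dataclass
from enum import Enum


class Interconnect(Enum):
    BUS = "bus"
    RING = "ring"
    CROSSBAR = "crossbar"
    TINY_NETWORK = "tiny-network"


@dataclass(frozen=True)
class MachineParams:
    cores: int
    cache_size_bytes: int
    associativity: int        # degree of set-associativity
    cores_per_cache: int      # number of cores sharing a cache
    interconnect: Interconnect
    switching: str = ""       # switching & flow mechanism (free-form label)
    routing: str = ""         # routing algorithm (free-form label)
    fault_tolerance: str = "" # fault tolerance technique (free-form label)

    def __post_init__(self) -> None:
        if min(self.cores, self.cache_size_bytes,
               self.associativity, self.cores_per_cache) <= 0:
            raise ValueError("numeric parameters must be positive")
        # Assumption: symmetric design, cores split evenly among caches.
        if self.cores % self.cores_per_cache != 0:
            raise ValueError("cores must divide evenly among shared caches")

    def shared_caches(self) -> int:
        """Number of shared caches implied by the sharing degree."""
        return self.cores // self.cores_per_cache


params = MachineParams(cores=8, cache_size_bytes=4 * 1024 * 1024,
                       associativity=8, cores_per_cache=4,
                       interconnect=Interconnect.CROSSBAR)
print(params.shared_caches())
```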

What new factors?

• Protocols
  – Cache coherence protocol at dedicated/semi-shared cache

• Algorithms
  – Block replacement algorithm
  – Algorithms of cache coherence and data consistency protocols
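As an illustration of why the block replacement algorithm and the degree of set-associativity would matter to such a model, the following Python sketch counts misses for a tiny set-associative cache. LRU is chosen here only as a familiar example; the slides do not name a specific replacement policy, and the class and function names are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable


class SetAssociativeCache:
    """Minimal set-associative cache with LRU replacement (block addresses only)."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, block_addr: int) -> bool:
        """Access one block; return True on a hit, False on a miss."""
        cache_set = self.sets[block_addr % self.num_sets]
        if block_addr in cache_set:
            cache_set.move_to_end(block_addr)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)      # evict the least recently used block
        cache_set[block_addr] = None
        return False


def count_misses(num_sets: int, ways: int, trace: Iterable[int]) -> int:
    cache = SetAssociativeCache(num_sets, ways)
    for addr in trace:
        cache.access(addr)
    return cache.misses


trace = [0, 4, 0, 4]
print(count_misses(num_sets=4, ways=1, trace=trace))  # direct-mapped, 4 blocks
print(count_misses(num_sets=2, ways=2, trace=trace))  # 2-way, same capacity
```

Both configurations hold four blocks, yet on this trace the direct-mapped one keeps evicting the block it needs next, while the 2-way one does not. A performance model that ignores associativity would not capture that difference.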

Potential uses

• Performance prediction for future processors

• Scheduler

Does similar work exist?

• Multi2Sim (2007)
  – Framework simulating the system working as a whole
  – Yet, application-only simulation
  – Evaluates multicore-multithreaded processors
  – 3 major components simulated
    • Core
    • Cache hierarchy
    • Interconnect
  – Note: source code available

Enough?

• Limitations
  – Homogeneous cores
  – Topology
    • Bus only
    • With variable bus width, though
