Performance Model for Future Multicore Process Designs
Yipkei Kwok
02/06/2008
A Non-Work-Conserving Operating System Scheduler For SMT Processors
• Authors: A. Fedorova et al.
• Calculates the optimal level of parallelism of SMT processors at run time
• Analytical model
• Estimates the workload's IPC for a given degree of concurrency
• First identifies the performance bottleneck
• Suppressing L2 misses improves performance the most
A Non-Work-Conserving Operating System Scheduler For SMT Processors
• Factors
  – N
  – perf_cache_CPI(N)
  – L2_RMR
  – L2_WMR
  – L2_WBR_R
  – L2_WBR_W
  – WSC
  – L2_MCOST
A Non-Work-Conserving Operating System Scheduler For SMT Processors
• Two-phase scheduling
  – Preparation phase
    • Collect model inputs under full parallelism
    • With hardware counters
    • Until the 100-millionth instruction retires
  – Optimization phase
    • Estimate the optimal N
    • Enforce it
    • Until … …
  – New locality phase
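The two phases above can be sketched roughly as follows. This is a minimal illustration, not the scheduler from the paper: the slides do not give the analytical model's formula, so the estimator is passed in as a callable, and `toy_estimate_ipc`, `preparation_phase`, `choose_concurrency`, and the sample format are all hypothetical names and assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical container: counter name -> measured value (e.g. "L2_RMR": 0.01).
ModelInputs = Dict[str, float]


def preparation_phase(
    samples: Iterable[Tuple[int, ModelInputs]],
    budget: int = 100_000_000,
) -> ModelInputs:
    """Consume (instructions_retired, counters) samples taken under full
    parallelism until `budget` instructions have retired; return the most
    recent counter snapshot."""
    retired = 0
    latest: ModelInputs = {}
    for delta, counters in samples:
        retired += delta
        latest = counters
        if retired >= budget:
            break
    return latest


def choose_concurrency(
    inputs: ModelInputs,
    estimate_ipc: Callable[[int, ModelInputs], float],
    max_threads: int,
) -> int:
    """Optimization phase: pick the N in 1..max_threads with the highest
    estimated IPC. Enforcing N is left to the real scheduler."""
    return max(range(1, max_threads + 1), key=lambda n: estimate_ipc(n, inputs))


def toy_estimate_ipc(n: int, inputs: ModelInputs) -> float:
    """Placeholder model, NOT the paper's: n threads each run at a CPI that
    grows quadratically with n because of L2-miss contention."""
    contention = inputs["L2_RMR"] * inputs["L2_MCOST"]
    return n / (inputs["perf_cache_CPI"] + contention * n * n)


if __name__ == "__main__":
    snapshot = {"perf_cache_CPI": 1.0, "L2_RMR": 0.01, "L2_MCOST": 25.0}
    inputs = preparation_phase([(60_000_000, snapshot), (60_000_000, snapshot)])
    print(choose_concurrency(inputs, toy_estimate_ipc, max_threads=4))
```

With the toy model, running fewer threads than the hardware offers can score higher than full parallelism, which is the non-work-conserving behavior the slide describes.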
Limitations
• 3-56% improvement, but … ..
• Empirical model based on the UltraSPARC T1
• SMT only
  – But expandable with, hopefully, reasonable effort
• Once expanded, performance prediction
• What is needed?
  – Extra factors?
What new factors?
• Depends on systems to model
• Shared-memory machine
• Threaded parallel workloads
• SMP of CMPs
• SMT per core
What new factors?
• Architecture
  – Homogeneous/heterogeneous cores
    • Difference in speed or functionality
  – Level of cache sharing
  – Interconnects
What new factors?
• Params
  – #(cores)
  – Cache size
  – Degree of set-associativity
  – #(cores) sharing a cache
  – Bus, ring, crossbar, tiny-network
  – Switching & flow mechanisms
  – Routing algorithms
  – Fault tolerance techniques
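One way to picture how some of these parameters might feed a model is a small configuration record. The sketch below is purely illustrative: `MachineParams`, `Interconnect`, and the field names are hypothetical, and the 64-byte line size is an assumption that does not appear on the slide. Switching, routing, and fault-tolerance options are omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Interconnect(Enum):
    """Interconnect topologies named on the slide."""
    BUS = "bus"
    RING = "ring"
    CROSSBAR = "crossbar"
    TINY_NETWORK = "tiny-network"


@dataclass(frozen=True)
class MachineParams:
    cores: int                 # #(cores)
    cache_size_bytes: int      # size of each shared cache
    associativity: int         # degree of set-associativity
    cores_per_cache: int       # #(cores) sharing a cache
    interconnect: Interconnect
    line_size_bytes: int = 64  # assumption: not listed on the slide

    def __post_init__(self) -> None:
        # Reject configurations that cannot be laid out evenly.
        if self.cores % self.cores_per_cache != 0:
            raise ValueError("cores must divide evenly among shared caches")
        if self.cache_size_bytes % (self.associativity * self.line_size_bytes) != 0:
            raise ValueError("cache size must be a whole number of sets")

    @property
    def shared_caches(self) -> int:
        """Number of caches when every cache serves cores_per_cache cores."""
        return self.cores // self.cores_per_cache

    @property
    def sets_per_cache(self) -> int:
        """Sets = size / (ways * line size)."""
        return self.cache_size_bytes // (self.associativity * self.line_size_bytes)


if __name__ == "__main__":
    params = MachineParams(
        cores=8,
        cache_size_bytes=4 * 1024 * 1024,
        associativity=16,
        cores_per_cache=2,
        interconnect=Interconnect.CROSSBAR,
    )
    print(params.shared_caches, params.sets_per_cache)
```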
What new factors?
• Protocols
  – Cache coherence protocol at dedicated/semi-shared cache
• Algorithms
  – Block replacement algorithm
  – Algorithms of cache coherence and data consistency protocols
Potential uses
• Performance prediction for future processors
• Scheduler
Similar work exists?
• Multi2Sim (2007)
  – Framework simulating the system working as a whole
  – Yet, app-only simulation
  – Evaluates multicore-multithreaded processors
  – 3 major components simulated
    • Core
    • Cache hierarchy
    • Interconnect
  – Note: source code available
Enough?
• Limitations
  – Homogeneous core
  – Topology
    • Bus only
    • With variable bus width, though