A New Methodology for Studying Realistic Processors in Computer Science Degrees
Crispín Gómez, María E. Gómez and Julio Sahuquillo
DISCA, Technical University of Valencia
DSI, University of Castilla-La Mancha
Outline
• Motivation
• Simulator
• Proposed Methodology
• Case Study
• Conclusions
Motivation
• Rapid evolution of processor architecture:
• Teaching should cover everything from the basics to realistic, up-to-date concepts
[Figure: processor evolution: In-Order Execution, Superscalar, Out-Of-Order Execution, Multicore, Manycore; label: POWER]
Motivation
• Current designs are highly complex
• Complex out-of-order cores
• Multi-level memory hierarchy
• On-chip interconnection network
Simulator
• Multi2Sim: multicore and multithreaded
• x86 binary compatibility
• Application-only simulation
• Free simulator: open-source project
– http://www.multi2sim.org/
• Widely used in research
– Academia
– Industry
Simulator – Cores
• CPU: 6-stage pipelined processors with out-of-order execution
– The execution stage may be customized to be multicycle
• Speculative execution
• Three multithreading paradigms are supported:
– Coarse-grain, fine-grain, and simultaneous multithreading
• All microarchitectural parameters are customizable
– Type of branch predictor
– Issue width
– Etc.
• GPUs
Simulator – Memory Hierarchy
• Complete memory hierarchy
• Coherency: MOESI
• Flexible hierarchy: number of memory levels and of memory structures in each level
• Each memory structure is fully customizable
– Number of sets
– Number of ways
– Block size
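As a rough illustration of how these three parameters interact, the sketch below computes the capacity of a cache and splits an address into tag, set index, and block offset. It is not Multi2Sim code: the class name, method names, and the example values are hypothetical, and the sketch assumes the number of sets and the block size are powers of two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper, not part of Multi2Sim."""
    sets: int
    ways: int
    block_size: int  # in bytes

    def __post_init__(self):
        # Assumption: sets and block size are powers of two.
        for name in ("sets", "block_size"):
            value = getattr(self, name)
            if value <= 0 or value & (value - 1):
                raise ValueError(f"{name} must be a positive power of two")
        if self.ways <= 0:
            raise ValueError("ways must be positive")

    @property
    def capacity(self) -> int:
        """Total data capacity in bytes."""
        return self.sets * self.ways * self.block_size

    def split(self, address: int):
        """Return (tag, set index, block offset) for a byte address."""
        offset = address % self.block_size
        block = address // self.block_size
        return block // self.sets, block % self.sets, offset


# Example values chosen only for illustration.
l2 = CacheGeometry(sets=1024, ways=8, block_size=64)
print(l2.capacity)        # capacity in bytes
print(l2.split(0x12345))  # (tag, set index, offset)
```

Doubling the block size while keeping the capacity fixed halves either the number of sets or the number of ways, which is the kind of trade-off the configuration parameters expose.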
Simulator – Interconnection Network
• Interconnection network:
• Any topology can be implemented
• Routing based on forwarding tables (any routing algorithm can be used)
• Each network element is fully customizable
– Buffer size at switches
– Link bandwidth
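To make the idea of forwarding-table routing concrete, the following sketch fills one table per node using shortest paths found by breadth-first search, then follows the tables hop by hop. Shortest-path routing is only one possible choice, the function names and the 4-node ring are hypothetical, and the code assumes bidirectional links; it does not reproduce Multi2Sim's own configuration format.

```python
from collections import deque


def build_forwarding_tables(links):
    """links maps each node to its neighbours (links assumed bidirectional).

    Returns tables[node][destination] = next hop on a shortest path.
    """
    tables = {node: {} for node in links}
    for dest in links:
        # Breadth-first search outward from the destination: the node we
        # arrive from is the next hop towards `dest`.
        queue = deque([dest])
        seen = {dest}
        while queue:
            current = queue.popleft()
            for neighbour in links[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    tables[neighbour][dest] = current
                    queue.append(neighbour)
    return tables


def route(tables, src, dst):
    """Follow the forwarding tables from src to dst and return the path."""
    path = [src]
    while path[-1] != dst:
        path.append(tables[path[-1]][dst])
    return path


# Hypothetical 4-node ring used only as an example topology.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tables = build_forwarding_tables(ring)
print(route(tables, 0, 2))
```

Because the routing decision lives entirely in the tables, a different routing algorithm only changes how the tables are filled, not how packets are forwarded.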
Proposed Methodology
• Aims to engage students in processor architecture
• Realistic examples
• Increasing difficulty levels
• Shared use across several courses
• Develops basic skills for final projects, MS theses, or Ph.D. theses
• Based on progressive interaction with Multi2Sim
• 4 learning phases of increasing difficulty, driven by the simulator’s complexity
Proposed Methodology
• 1st phase: Modifying simulation parameters (in lab sessions)
– Configure the system components
– Launch simulations
– Analyze the effects of the parameters on system performance
Proposed Methodology
• 2nd phase: Modifying small pieces of code
– Very small and bounded fragments of source code
– Completely guided by the instructors
– Modification of a provided baseline
– Examples: branch predictor, prefetch mechanisms, …
– Final assignment of the course
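As an example of how small and bounded such a fragment can be, here is a minimal sketch of a classic 2-bit saturating-counter branch predictor. It is a generic textbook scheme written for illustration, not the predictor code shipped with Multi2Sim; the class name, table size, and synthetic trace are all assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address."""

    def __init__(self, entries=1024):
        self.entries = entries
        # Counter values: 0-1 predict not taken, 2-3 predict taken.
        self.counters = [1] * entries  # start at weakly not taken

    def _index(self, pc):
        return pc % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of branches in (pc, taken) pairs predicted correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Synthetic trace: a loop branch taken 9 times, then not taken, repeated.
trace = [(0x400, i % 10 != 9) for i in range(100)]
print(accuracy(TwoBitPredictor(), trace))
```

A student exercise of this size typically changes only the prediction and update rules, leaving the rest of the pipeline model untouched.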
Proposed Methodology
• 3rd phase: Implementing complete functionalities
– Consolidated simulator skills
– Development of functionalities from scratch
– Examples: memory controller, stream-buffer-based prefetcher, …
– Final project or MS thesis
– Some of this work has been published in top-level conferences
• 4th phase: Complete autonomy
– The students are in a privileged position to start a Ph.D.
Case study
• The methodology has been implemented at the UPV in two courses
• Advanced Processor Architectures
– Computer Science Degree and Master's Degree
• Networks on-chip
– Master's Degree
• We have defined several learning stages with the simulator
• Baseline system modeling
• Execution of standard benchmark suites
• Prefetching mechanisms implementation
Case study
• Baseline system modeling
• Detailed explanation of the configuration for
– Memory
– Cores
– Interconnection network
• Sample configuration files are used
Case study
• Benchmark execution
• Parallel (SPLASH-2)
• Multiprogrammed mixes (SPEC)
• Performance study (IPC, execution time, network latency) while varying the L2 block size
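The flavour of a block-size study can be sketched with a toy model. The code below is not a substitute for the simulator and does not measure IPC, execution time, or network latency; it only counts misses in a hypothetical direct-mapped cache for a synthetic sequential trace, to show how one parameter is swept while everything else is held constant.

```python
def miss_ratio(addresses, capacity, block_size):
    """Miss ratio of a direct-mapped cache of `capacity` bytes.

    Toy model: no write policy, no timing, a single cache level.
    """
    num_sets = capacity // block_size
    resident = [None] * num_sets  # block number currently held in each set
    misses = 0
    for address in addresses:
        block = address // block_size
        index = block % num_sets
        if resident[index] != block:
            resident[index] = block
            misses += 1
    return misses / len(addresses)


# Synthetic trace: sequential walk over a 64 KiB array in 8-byte steps.
trace = list(range(0, 64 * 1024, 8))

# Sweep the block size with the capacity fixed at 16 KiB.
for block_size in (32, 64, 128):
    ratio = miss_ratio(trace, capacity=16 * 1024, block_size=block_size)
    print(f"block size {block_size:4d} B -> miss ratio {ratio:.4f}")
```

For a purely sequential trace larger blocks always help; real benchmarks also expose the costs of larger blocks, such as extra traffic on the interconnection network, which is why the full simulator is needed for the actual study.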
Case study
• Prefetching mechanisms implementation
• A simple baseline prefetching mechanism is provided
– OBL (One Block Look-ahead) on L2 miss
• Modifications to this mechanism
– N-block sequential
– N-block with regular stride
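The three schemes differ only in which blocks they request after a miss, which the sketch below makes explicit. Addresses are expressed as block numbers, the function names are hypothetical, and the stride is assumed to be given as an argument; how a stride is detected, and how the requests are issued inside the simulator, is left out.

```python
def obl(miss_block):
    """One Block Look-ahead: prefetch the block after the missing one."""
    return [miss_block + 1]


def n_block_sequential(miss_block, n):
    """Prefetch the next n consecutive blocks."""
    return [miss_block + i for i in range(1, n + 1)]


def n_block_stride(miss_block, n, stride):
    """Prefetch n blocks spaced `stride` blocks apart."""
    return [miss_block + i * stride for i in range(1, n + 1)]


# Blocks requested after a miss on block 10 (example values only).
print(obl(10))
print(n_block_sequential(10, 3))
print(n_block_stride(10, 3, 4))
```

Seen this way, OBL is the special case of N-block sequential with n = 1, and N-block sequential is the special case of the strided scheme with a stride of one block.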
Case study
• Results
• This year, two final projects have been carried out, on memory controllers and prefetching
• Results from these projects are expected to be submitted to top-tier international conferences
• These projects are expected to evolve into MS theses
• These expectations are based on the previous year's experience, when results from the projects were accepted at the PACT and IPDPS conferences
Conclusions
• We have narrowed the gap between the theoretical content of Computer Architecture courses and real processors
• By using a CMP simulator that is well established in the international research community
• Methodology based on an increasing degree of difficulty
• The first steps are closely guided by instructors
• Students are encouraged to move on to more complex implementations
• Methodology + simulator = a good platform for future work, as the range of design choices is very wide