Experimental Lifecycle
© 2003, Carla Ellis
[Figure: the Experimental Lifecycle – a vague idea, explored through “groping around” experiences and initial observations, leads to a hypothesis and a model; the hypothesis drives an experiment; the experiment produces data, analysis, and interpretation; and these become the results and final presentation.]
A Systematic Approach
1. Understand the problem, frame the questions, articulate the goals.
   A problem well-stated is half-solved.
   • Must remain objective
   • Be able to answer “why” as well as “what”
2. Select metrics that will help answer the questions.
3. Identify the parameters that affect behavior:
   • System parameters (e.g., HW config)
   • Workload parameters (e.g., user request patterns)
4. Decide which parameters to study (vary).
[Figure: Experimental Lifecycle, annotating step 1 – understand the problem, frame the questions, articulate the goals (“a problem well-stated is half-solved”) – at the transition from vague idea to hypothesis.]
An Example
• Vague idea: there should be “interesting” interactions between DVS (dynamic voltage scaling of the CPU) and PADRAM (power-aware memory).
  – DVS: in soft real-time applications, slow down CPU speed and reduce supply voltage so as to just meet the deadlines.
  – PADRAM: when there are no memory accesses pending, transition the memory chip into a lower power state.
  – Intuition: DVS will affect the length of memory idle gaps.
Back of the Envelope
What information do you need to know?
• XScale range: 50 MHz, 0.65 V, 15 mW up to 1 GHz, 1.75 V, 2.2 W
• Fully active memory: 300 mW; nap: 30 mW with 60 ns extra latency
• E = P * t
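To make the tradeoff concrete, here is a minimal sketch of the E = P * t arithmetic using the numbers above. The 10-million-cycle task size and the fraction of time memory can nap are illustrative assumptions, not figures from the slides:

```python
# Back-of-the-envelope energy estimate, E = P * t.
# Slide numbers: XScale 50 MHz @ 15 mW vs. 1 GHz @ 2.2 W;
# memory 300 mW active, 30 mW napping.
# Task size and nap fraction are illustrative assumptions.

CPU_SETTINGS = {"50 MHz": (50e6, 0.015), "1 GHz": (1e9, 2.2)}  # (Hz, W)
P_MEM_ACTIVE, P_MEM_NAP = 0.300, 0.030                          # watts

def total_energy(freq_hz, p_cpu, cycles=10e6, nap_frac=0.5):
    t = cycles / freq_hz                                  # execution time (s)
    p_mem = nap_frac * P_MEM_NAP + (1 - nap_frac) * P_MEM_ACTIVE
    return (p_cpu + p_mem) * t                            # joules

for name, (f, p) in CPU_SETTINGS.items():
    print(f"{name}: {total_energy(f, p) * 1e3:.1f} mJ")
```

Even this crude model shows that the slowest CPU speed is not automatically the energy winner once memory power is counted: the slow run takes so long that memory energy dominates. That observation is exactly the hypothesis this example develops below.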
Power Aware Memory: RDRAM Power States

  State        Power    Exit latency
  ----------   ------   ------------
  Active       300 mW   (services read/write transactions)
  Standby      180 mW   +6 ns
  Nap           30 mW   +60 ns
  Power Down     3 mW   +6000 ns
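Given these numbers, a natural back-of-envelope question is how long an idle gap must be before dropping to a lower power state pays off. Here is a minimal sketch under a deliberately simple model (an assumption, not the slide’s policy): the chip burns active power for the full exit latency and does no useful work during it.

```python
# Break-even idle-gap length for an RDRAM low-power state.
# Simplified model: staying active for a gap of length g costs
# P_active * g; using the low state costs P_low * g plus
# P_active * t_exit for resynchronization.

P_ACTIVE = 0.300  # watts

def break_even_gap(p_low_w, exit_latency_s):
    # Solve P_active * g = p_low * g + P_active * t_exit for g.
    return P_ACTIVE * exit_latency_s / (P_ACTIVE - p_low_w)

for name, p, lat in [("Standby", 0.180, 6e-9),
                     ("Nap", 0.030, 60e-9),
                     ("Power Down", 0.003, 6000e-9)]:
    print(f"{name}: gap > {break_even_gap(p, lat) * 1e9:.0f} ns")
```

This is why the intuition on the earlier slide matters: if DVS stretches or shrinks memory idle gaps across these thresholds, it changes which power state the controller should choose.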
An Example
• Hypothesis: the best speed/voltage choice for DVS to minimize energy consumption when idle memory can power down is not necessarily the lowest speed that is able to meet the deadline – counter to the assumption made by most DVS studies.
An Example
• Restated hypothesis, to disprove: the best speed/voltage choice for DVS to minimize energy consumption when idle memory can power down is still the lowest speed that is able to meet the deadline – the assumption made by most DVS studies.
What can go wrong at this stage?
• Never understanding the problem well enough to crisply articulate the goals / questions / hypothesis.
• Getting invested in some solution before making sure a real problem exists. Getting invested in any desired result. Not being unbiased enough to follow proper methodology.
• Fishing expeditions (groping around forever).
• Having no goals, but building the apparatus first.
A Systematic Approach
1. Understand the problem, frame the questions, articulate the goals.
   A problem well-stated is half-solved.
   • Must remain objective
   • Be able to answer “why” as well as “what”
2. Select metrics that will help answer the questions.
3. Identify the parameters that affect behavior:
   • System parameters (e.g., HW config)
   • Workload parameters (e.g., user request patterns)
[Figure: Experimental Lifecycle, annotating step 2 (select metrics that will help answer the questions) and step 3 (identify the system and workload parameters that affect behavior).]
An Example
• System under test: CPU and memory.
• Metrics: total energy used by CPU + memory, CPU energy, memory energy, leakage, execution time, average memory gap.
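A small sketch of how these metrics might be collected per simulation run; the field names and the sample numbers are hypothetical, not from the study:

```python
# Hypothetical container for the example's metrics; names are assumptions.
from dataclasses import dataclass

@dataclass
class RunMetrics:
    cpu_energy_j: float    # CPU dynamic energy
    mem_energy_j: float    # memory energy across all power states
    leakage_j: float       # static/leakage energy
    exec_time_s: float     # execution time
    idle_gaps_s: list      # lengths of memory idle gaps

    @property
    def total_energy_j(self):
        return self.cpu_energy_j + self.mem_energy_j + self.leakage_j

    @property
    def avg_memory_gap_s(self):
        return sum(self.idle_gaps_s) / len(self.idle_gaps_s)

run = RunMetrics(0.003, 0.033, 0.001, 0.2, [1e-6, 3e-6, 2e-6])
print(run.total_energy_j, run.avg_memory_gap_s)
```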
Parameters Affecting Behavior
• Hardware parameters:
  – CPU voltage/speed settings
  – Processor model (e.g., in-order, out-of-order, issue width)
  – Cache organization
  – Number of memory chips and data layout across them
  – Memory power state transitioning policy
    • Threshold values
  – Power levels of power states
  – Transitioning times in & out of power states
• Workload: periods, miss ratio, memory access pattern
What can go wrong at this stage?
• Wrong metrics (they don’t address the questions at hand).
  – Choosing what everyone else uses, or what is easy to get.
• Not being clear about where the “system under test” boundaries are.
• Unrepresentative workload: not predictive of real usage; just what everyone else uses (adopted blindly) – or NOT what anyone else uses (no comparison possible).
• Overlooking significant parameters that affect the behavior of the system.
A Systematic Approach
4. Decide which parameters to study (vary) – see the sketch after this slide.
5. Select technique:
   • Measurement of a prototype implementation: How invasive? Can we quantify the interference of monitoring? Can we directly measure what we want?
   • Simulation: how detailed? Validated against what?
   • Repeatability
6. Select workload:
   • Representative?
   • Community acceptance
   • Availability
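Once the factors are chosen, the size of the experiment grows multiplicatively with their levels. A minimal sketch of enumerating a full-factorial design for factors like the example’s; the specific level values are illustrative assumptions, not the study’s actual settings:

```python
# Full-factorial enumeration of experimental runs.
# Factor levels below are illustrative assumptions.
from itertools import product

factors = {
    "cpu_speed_mhz": [50, 200, 600, 1000],
    "mem_policy":    ["base", "nap"],
    "benchmark":     ["adpcm", "gsm", "mpeg2"],  # MediaBench examples
}

runs = list(product(*factors.values()))
print(f"{len(runs)} parameter combinations")      # 4 * 2 * 3 = 24
for combo in runs[:3]:
    print(dict(zip(factors, combo)))
```

With repeated trials for statistical confidence, the run count multiplies again, which is why step 4 – deciding which parameters you can afford to vary – matters.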
[Figure: Experimental Lifecycle, annotating step 4 (decide which parameters to vary), step 5 (select technique), and step 6 (select workload) at the experiment stage.]
An Example
• Choice of workload: MediaBench applications (later iterations will also use a synthetic benchmark in which the miss ratio can be varied).
• Technique: simulation using SimpleScalar augmented with RDRAM memory, plus PowerAnalyzer.
• Factors to study:
  – CPU speed/voltage
  – Comparing the nap memory policy with the base case
What can go wrong at this stage?
• Choosing the wrong values for the parameters you aren’t going to vary; not considering the effect of other values (sensitivity analysis).
• Not choosing to study the parameters that matter most – the factors.
• Wrong technique.
• Wrong level of detail.
A Systematic Approach
7. Run experiments:
   • How many trials? How many combinations of parameter settings?
   • Sensitivity analysis on other parameter values.
8. Analyze and interpret data:
   • Statistics, dealing with variability, outliers
9. Data presentation
10. Where does it lead us next?
   • New hypotheses, new questions, a new round of experiments
[Figure: Experimental Lifecycle, annotating step 7 (run experiments), step 8 (analyze and interpret data), and step 9 (data presentation) at the data-analysis and results stages.]
An Example
[Figure: results for the DVS/PADRAM example; the graphs were not preserved in the transcript.]
What can go wrong at this stage?
• One trial – data from a single run when variation can arise.
• Multiple runs – reporting the average but not the variability (see the sketch below).
• Tricks of statistics.
• No interpretation of what the results mean.
• Ignoring errors and outliers.
• Overgeneralizing conclusions – omitting assumptions and limitations of the study.
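To make the variability point concrete, a minimal sketch of summarizing repeated trials with a mean and a 95% confidence interval rather than a bare average; the sample numbers are invented for illustration:

```python
# Report the mean plus a 95% confidence interval, not just the average.
# Sample data are invented for illustration.
from statistics import mean, stdev
from math import sqrt

trials = [23.4, 24.1, 22.8, 25.0, 23.7]   # e.g., energy in mJ per run

m, s, n = mean(trials), stdev(trials), len(trials)
half_width = 2.776 * s / sqrt(n)   # t-value for n-1 = 4 dof at 95%

print(f"{m:.1f} +/- {half_width:.1f} mJ (95% CI, n={n})")
```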
A Systematic Approach
7. Run experiments:
   • How many trials? How many combinations of parameter settings?
   • Sensitivity analysis on other parameter values.
8. Analyze and interpret data:
   • Statistics, dealing with variability, outliers
9. Data presentation
10. Where does it lead us next?
   • New hypotheses, new questions, a new round of experiments
[Figure: Experimental Lifecycle, annotating step 10 (what next?) as the results feed back into new vague ideas and hypotheses.]
An Example
• New hypothesis: different controller policies are appropriate at different speed settings.
  – Vary the miss ratio of the synthetic benchmark
  – Vary speed/voltage
Metrics
• Criteria to compare performance:
  – Quantifiable, measurable
  – Relevant to goals
  – Complete set reflects all possible outcomes:
    • Successful – responsiveness, productivity rate (throughput), resource utilization
    • Unsuccessful – availability (probability of failure mode) or mean time to failure
    • Error – reliability (probability of error class) or mean time between errors
Common Performance Metrics (Successful Operation)
• Response time
• Throughput (requests per unit of time): MIPS, bps, TPS
[Figure: timeline of a request – request starts, request ends, service begins, service completes, response back, next request starts – marking reaction time, response time, and think time.]
[Figure: throughput vs. load – throughput rises with load toward the nominal capacity, with the knee of the curve marking the usable capacity.]
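A minimal sketch of computing these two metrics from request timestamps; the event times and field names are invented for illustration:

```python
# Response time and throughput from request event timestamps (seconds).
# Timestamps and field names are invented for illustration.
requests = [
    {"start": 0.00, "response_back": 0.42},
    {"start": 0.50, "response_back": 1.10},
    {"start": 1.20, "response_back": 1.55},
]

resp_times = [r["response_back"] - r["start"] for r in requests]
elapsed = requests[-1]["response_back"] - requests[0]["start"]
throughput = len(requests) / elapsed   # completed requests per second

print(f"avg response time: {sum(resp_times) / len(resp_times):.2f} s")
print(f"throughput: {throughput:.2f} req/s")
```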
Discussion: Sampling of Metrics from Literature