• Engineering 1811.01
Analyzing Data 1
College of Engineering
Engineering Education Innovation Center
Analyzing Measurement Data
Rev: 20130604, MC
Example
Prediction: If the spring on the slingshot is pulled back 1 m, the softball will land 17 m downrange.
To confirm the prediction, data is collected from 20 trials.
Rev: 20120103, AM
1. Brockman, Jay B. "Data Analysis and Empirical Models." Introduction to Engineering: Modeling and Problem Solving. Hoboken, NJ: John Wiley & Sons, Inc., 2009. 226-228. Print.
Example
• Most values fall between 14 and 20 m.
• This data contains an outlier of 45.2 m.
Represent the Data with a Histogram
• First, determine an appropriate bin size.
• The bin size [k] can be assigned directly or can be calculated from a suggested number of bins [h]:

Number of data points [n]    Suggested number of bins [h]
Less than 50                 5 to 7
50 to 99                     6 to 10
100 to 250                   7 to 12
More than 250                10 to 20

• Let’s try the most commonly used formula first (bin size = range of the data divided by the number of bins):
$k = \dfrac{x_{\max} - x_{\min}}{h} = 4.43 \approx 5$
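The bin size calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the bin size is the data range divided by the chosen number of bins and then rounded up. The slingshot measurements are not listed in this transcript, so the sketch uses the simple 11-value data set from the mean and median slides; the helper names `bin_size` and `bin_counts` are hypothetical.

```python
import math

def bin_size(data, num_bins):
    """Return (raw, rounded-up) bin size: k = (max - min) / h."""
    raw = (max(data) - min(data)) / num_bins
    return raw, math.ceil(raw)

def bin_counts(data, start, size, num_bins):
    """Count the values in each bin [start + i*size, start + (i+1)*size)."""
    counts = [0] * num_bins
    for x in data:
        # Clamp the index so the largest value always lands in the last bin.
        i = min(int((x - start) // size), num_bins - 1)
        counts[i] += 1
    return counts

# Illustration data only: the simple data set from the mean/median slides.
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]
h = 5  # n < 50, so the table suggests 5 to 7 bins

raw_k, k = bin_size(data, h)
counts = bin_counts(data, min(data), k, h)
print(f"raw bin size = {raw_k:.2f}, rounded up to k = {k}")
print("counts per bin:", counts)
```

Rounding up rather than to the nearest integer is a design choice made here so that the chosen number of bins always covers the full range of the data.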
Histogram - Example
[Figure: histogram "Slingshot Data"; x-axis: Bin (0-19, 19-24, 24-29, 29-34, 34-39, 39-44, 44-49); y-axis: Frequency (0 to 18)]
Is this the best way to represent this data?
By changing our bin size [k], we can improve the representation.
Bin [m]    Frequency
0-19       17
19-24      2
24-29      0
29-34      0
34-39      0
39-44      0
44-49      1
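To see why this binning hides most of the detail, the frequency table can be drawn as a rough text histogram. The sketch below uses only the counts from the table; the function name `text_histogram` is a hypothetical helper for illustration.

```python
# Frequencies copied from the table above (bin label -> count).
frequency = {
    "0-19": 17, "19-24": 2, "24-29": 0, "29-34": 0,
    "34-39": 0, "39-44": 0, "44-49": 1,
}

def text_histogram(freq):
    """Return one line per bin, with one '#' per data point."""
    return [f"{label:>6} | {'#' * count} ({count})" for label, count in freq.items()]

lines = text_histogram(frequency)
print("\n".join(lines))

total = sum(frequency.values())
share_first_bin = frequency["0-19"] / total  # fraction of trials in the first bin
print(f"{share_first_bin:.0%} of the {total} trials fall in the first bin")
```

Almost every trial lands in a single bar, so the shape of the data between 14 and 20 m cannot be seen at this bin size.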
Histogram - Example
[Figure: histogram "Slingshot Data" with 1 m bins from 14 to 46; x-axis: Bin; y-axis: Frequency (0 to 7)]
All three histograms represent the exact same data set, but the bin width and number of bins for the two on this slide were selected manually.
Which one is most descriptive?
[Figure: histogram "Slingshot Data"; x-axis: Bin (14-15, 15-16, 16-17, 17-18, 18-19, 19-20, >20); y-axis: Frequency (0 to 7)]
Dealing with outliers
• Engineers must carefully consider any outliers when analyzing data.
• It is up to the engineer to determine whether the outlier is a valid data point or if it is invalid and should be discarded.
• Invalid data points can result from measurement errors or from recording the data incorrectly.
Characterizing the data
• Statistics allows us to characterize the data numerically as well as graphically.
• We characterize data in two ways:
  – Central Tendency
  – Variation
Central Tendency (Expected Value)
• Central tendency is a single value that best represents the data.
• But which number do we choose?
  – Mean
  – Median
  – Mode
– Note: For most engineering applications, mean and median are most relevant.
Central Tendency - Mean
$\text{Mean} = \dfrac{\sum x}{n} = \dfrac{369.3}{20} \approx 18.47$
Is the mean value a good depiction of the data?
How does the outlier affect the mean?
Central Tendency - Mean
Problem: Outliers may decrease the usefulness of the mean as a central value. Observe how outliers can affect the mean for this simple data set:
3 7 12 17 21 21 23 27 32 36 44
• Without outliers: mean = 243/11 ≈ 22.09
• Changing 3 to -112 (outlier: -112): mean = 128/11 ≈ 11.64
• Changing 44 to 212 (outlier: 212): mean = 411/11 ≈ 37.36
Solution: Look at the median.
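The effect of a single outlier on the mean can be checked directly. The sketch below uses the simple data set from this slide and Python's standard `statistics.mean`; the variable names are chosen here for illustration.

```python
from statistics import mean

base = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]
low_outlier = [-112] + base[1:]    # 3 replaced by -112
high_outlier = base[:-1] + [212]   # 44 replaced by 212

means = {
    "without outliers": mean(base),
    "3 changed to -112": mean(low_outlier),
    "44 changed to 212": mean(high_outlier),
}
for label, value in means.items():
    print(f"{label}: mean = {value:.2f}")
```

Changing just one of the eleven values moves the mean by more than ten units in either direction.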
Central Tendency - Median
n = 20 is an even number of data points, so the median is the average of the two middle values.
In this case, the two middle values are both 17.4 m, so the median is 17.4 m.
Which value looks like a better representation of the data? Mean (18.47) or median (17.4)? Why?
Central Tendency - Median
Using the simple data set, observe how the median reduces the impact of outliers on the central tendency.
3 7 12 17 21 21 23 27 32 36 44: Median = 21
-112 7 12 17 21 21 23 27 32 36 212: Median = 21
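The median rule (middle value of the sorted data, or the average of the two middle values when n is even) can be sketched as a small function. The function below is written for illustration; Python's standard `statistics.median` behaves the same way.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

original = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]
with_outliers = [-112, 7, 12, 17, 21, 21, 23, 27, 32, 36, 212]

print("original:     ", median(original))
print("with outliers:", median(with_outliers))

# Even n, for illustration only: dropping the last value leaves 10 points,
# so the two middle values are averaged.
print("first 10 only:", median(original[:-1]))
```

Because the outliers only change the ends of the sorted list, the middle value stays where it was.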
Central Tendency - Mean and Median
[Figure: scatter plot "Slingshot Distance Testing"; x-axis: Trial Number (0 to 25); y-axis: Distance Traveled [m] (10 to 50)]
Which value, the mean (18.47 m) or the median (17.4 m), is a better representation of the data?
Characterizing the data
• We can select a value of central tendency to represent the data, but is one number enough?
• It is also important to know how much variation there is in the data set.
• Variation refers to how the data is distributed around the central tendency value.
Variation
• As with central tendency, there are multiple ways to represent the variation of a set of data.
  – ± (“Plus, Minus”) gives the range of the values.
  – Standard Deviation provides a more sophisticated look at how the data is distributed around the central value.
Variation - Standard Deviation
Definition: how closely the values cluster around the mean; how much variation there is in the data
Equation:
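As a hedged sketch, the two standard textbook forms of the standard deviation are given below, with $\bar{x}$ denoting the mean and $n$ the number of data points. Which of the two the slide used is not recoverable from this transcript, so treating either one as the lecture's equation is an assumption.

```latex
% Population form (divide by n) and sample form (divide by n - 1)
\[
\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}
\qquad \text{or} \qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
\]
```

In both forms the quantity under the square root is the variance, the average squared distance of the values from the mean.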
Standard Deviation Example
$\text{mean} = \dfrac{\sum x}{n} = 18.47$
$\sigma = \sqrt{41.32} = 6.4281$
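The calculation can be sketched in Python as follows. The individual slingshot measurements are not listed in this transcript, so the sketch uses the simple 11-value data set from the mean and median slides. Whether the lecture divides by n or by n - 1 is not stated, so the hypothetical helper `std_dev` supports both and the choice is left as an assumption.

```python
from math import sqrt

def std_dev(values, sample=False):
    """Square root of the average squared distance from the mean.

    sample=False divides by n (population form);
    sample=True divides by n - 1 (sample form).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return sqrt(squared_deviations / (n - 1 if sample else n))

# Illustration data only: the simple data set from the mean/median slides.
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]
print(f"population form: {std_dev(data):.2f}")
print(f"sample form:     {std_dev(data, sample=True):.2f}")

# The last step of the slingshot example: variance 41.32 -> standard deviation.
slingshot_sigma = sqrt(41.32)
print(f"sqrt(41.32) = {slingshot_sigma:.4f}")
```

For small data sets the two forms give noticeably different results, which is why it matters to state which one is in use.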
Standard Deviation: Interpretation
These curves describe the distribution of students’ exam grades. The average value is 83%.
Which class would you rather be in?
[Figure: two grade distribution curves, labeled Curve A and Curve B]
Normal Distribution
• Data that is normally distributed occurs with greatest frequency around the mean.
• Normal distributions are also frequently referred to as Gaussian distributions or bell curves.
[Figure: bell curve centered on the mean; x-axis: Bins (-5 to 5); y-axis: Frequency]
Normal Distribution
Mean = Median = Mode
- About 68% of values fall within 1 standard deviation (SD) of the mean
- About 95% of values fall within 2 SDs of the mean
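These percentages can be checked numerically. For an ideal normal distribution, the fraction of values within k standard deviations of the mean is erf(k / √2); the short sketch below evaluates it with Python's standard `math.erf`. The function name `fraction_within` is a hypothetical helper.

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

Real measurement data only approximates a normal distribution, so observed fractions will usually differ somewhat from these ideal values.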
Other Distributions
• Skewed distributions
• Multimodal distribution
• Uniform distribution
[Figures: example curves for each distribution type]
What we’ve learned
• This lecture has introduced some basic statistical tools that engineers use to analyze data.
• Histograms are used to represent data graphically.
• Engineers use both central tendency and variation to numerically describe data.