HUDM4122 Probability and Statistical Inference
January 26, 2015
ASSISTments
• Did everyone get an account for the ASSISTments system?
• Did anyone have difficulties setting up an account?
• First homework is due in a week
Today
• Ch. 1 in Mendenhall, Beaver, & Beaver
• Variables and Variable Types
• Graphing Data
• Basic Exploratory Data Analysis
Variables
• What is a variable?
• “A variable is a characteristic that changes or varies over time and/or for different individuals or objects under consideration.” – MBB, p. 8
Which of these are examples of variables?
• GPA
• Shoe size
• Age
• Number of correct answers in ASSISTments
• Number of times a student gamed the system in ASSISTments
• Favorite vegetable
• Favorite type of pie
• Pi
What is a measurement?
• A measurement is the result of measuring a variable on a single experimental unit
– A person, if you are studying people
– A class, if you are studying classes
– A pizza, if you are studying pizzas
A measurement
• Person furthest towards my left in the front row, what is your name?
Now I have a measurement
A measurement
• Person furthest towards my right in the second row, what is your name?
Now I have data
• A set of measurements
• Note that in statistics classes and education journals, the word “data” is plural
• I only know one exception
Everyone repeat after me
• “My data are in this Excel file.”
• “Your data aren’t evidence for that conclusion.”
• “His data were hard to collect.”
However…
• I do not recommend insisting that data is plural in bars, on first dates, or at Thanksgiving dinner
Any questions or concerns?
Univariate Data
• A single variable is collected
Height
5’11”
5’11”
5’10”
5’6”
Bivariate Data
• Two variables are collected (for the same data point)
Height   Drum‐Playing Skill
5’11”    1
5’11”    2
5’10”    4
5’6”     8
Multivariate Data
• 3+ variables are collected
Name              Height   Drum‐Playing Skill
John Lennon       5’11”    1
Paul McCartney    5’11”    2
George Harrison   5’10”    4
Ringo Starr       5’6”     8
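The three layouts above can be sketched in Python as different views of the same records. This is a minimal illustration, not part of the course materials: the field names `height_in` and `drum_skill` and the helper `select` are hypothetical, and heights are converted to inches (5’11” = 71).

```python
# Each dictionary is one experimental unit (one row of the table above).
# Field names are hypothetical; heights are in inches (5'11" = 71).
beatles = [
    {"name": "John Lennon", "height_in": 71, "drum_skill": 1},
    {"name": "Paul McCartney", "height_in": 71, "drum_skill": 2},
    {"name": "George Harrison", "height_in": 70, "drum_skill": 4},
    {"name": "Ringo Starr", "height_in": 66, "drum_skill": 8},
]


def select(records, *variables):
    """Hypothetical helper: one tuple per unit, keeping only the named variables."""
    return [tuple(row[v] for v in variables) for row in records]


# Univariate: one variable per unit
heights = [row["height_in"] for row in beatles]
# Bivariate: two variables measured on the same unit, kept together as pairs
height_and_skill = select(beatles, "height_in", "drum_skill")
# Multivariate: three or more variables per unit
full_table = select(beatles, "name", "height_in", "drum_skill")

print(heights)           # [71, 71, 70, 66]
print(height_and_skill)  # [(71, 1), (71, 2), (70, 4), (66, 8)]
```

Keeping each pair together is what makes the data bivariate: sorting the heights on their own would break the link between a person's height and that same person's drum‐playing skill.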
Any questions or concerns?
Types of Variables
Quantitative/Numerical Data
• Data that can be expressed as numbers
What are some examples of numerical data?
Ordinal Data
• Refers to data where there is a known order, but either
– The data clearly aren’t numbers, or
– The spacing between values is not guaranteed to be equal
Examples of Ordinal Data
• Months of the year: January, February, March, April, …
• Agreement level: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree
• Quality of university: Highly selective, selective, somewhat selective, non‐selective
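One practical consequence of a known order: software does not know it unless you supply it. A minimal Python sketch using the agreement scale above; the names `AGREEMENT_LEVELS` and `level_position` are hypothetical, and the responses are made up for illustration.

```python
# Hypothetical constant: agreement levels in their known order, as listed above.
AGREEMENT_LEVELS = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]


def level_position(response):
    """Hypothetical helper: position of a response on the scale (0 = Strongly Agree).

    Positions record order only: the gap between two adjacent levels is not
    guaranteed to be the same size as the gap between two other levels.
    """
    return AGREEMENT_LEVELS.index(response)


# Hypothetical survey responses, for illustration only
responses = ["Neutral", "Strongly Disagree", "Agree", "Strongly Agree", "Disagree"]

alphabetical = sorted(responses)                  # ignores the scale's order
by_level = sorted(responses, key=level_position)  # respects the scale's order

print(alphabetical)  # ['Agree', 'Disagree', 'Neutral', 'Strongly Agree', 'Strongly Disagree']
print(by_level)      # ['Strongly Agree', 'Agree', 'Neutral', 'Disagree', 'Strongly Disagree']
```

Sorting alphabetically puts “Agree” before “Strongly Agree” and “Disagree” before “Neutral”, which is why an ordinal variable needs its order stated explicitly.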
Other examples of ordinal data?
Nominal data
• Values have no order or spacing
• Name
• State of Residence
– New Jersey is not greater than or less than New York
– Although my brother might disagree
Other Examples of Nominal Data?
Another name
• Nominal data are often also called categorical data
• Technically, ordinal data are also categorical, but the term is almost never used that way
Any questions or concerns?
Exploratory Data Analysis
• “Analyzing data sets to summarize their main characteristics”
• “Seeing what the data can tell us beyond the formal modeling or hypothesis testing task”
Goal
• Generate hypotheses
• Understand your data better
Often (but not always) done with graphs
Which of these is your favorite type of graph?
• Pie chart
• Bar graph
• Frequency histogram
• Line graph
• Scatterplot
• Stem‐and‐leaf plot
• Box plot
• Other
Pie Chart
• Take a set of categories whose shares add up to 100%
• Show the proportion that falls in each category
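Each slice's share of the circle equals that category's share of the whole, so a category holding p percent of the data gets a slice of p/100 × 360 degrees. A minimal Python sketch; the helper `pie_slices` is hypothetical and the vote counts are made up, not data from this class.

```python
import math


def pie_slices(counts):
    """Hypothetical helper: map each category to (percent of total, slice angle in degrees)."""
    total = sum(counts.values())
    return {
        category: (100 * n / total, 360 * n / total)
        for category, n in counts.items()
    }


# Hypothetical votes for "What is everyone's favorite pie?"
votes = {"Pumpkin": 10, "Apple": 8, "Cherry": 4, "Rhubarb": 2, "Banana Cream": 1}
slices = pie_slices(votes)

for pie, (percent, degrees) in slices.items():
    print(f"{pie:<13} {percent:5.1f}%  {degrees:6.1f} degrees")

# Every respondent is in exactly one category, so shares add to 100% and angles to 360
assert math.isclose(sum(p for p, _ in slices.values()), 100)
assert math.isclose(sum(d for _, d in slices.values()), 360)
```

If respondents could pick more than one pie, the counts would no longer add up to the number of people, and the slices would no longer represent a whole.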
Pie Chart: Example
[Pie chart: “What is everyone's favorite pie?” Slices: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream]
Interpret This Graph Please
[Pie chart: “What is everyone's favorite pie?” Slices: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream]
Never Ever Do This: Completely Visually Misleading
[Image shown under fair use, for critique]
Let’s make a pie chart
• Using the “your favorite graph” data
Any questions?
Alternative: Bar Graphs
[Bar graph: “What is everyone's favorite pie?” Bars for Pumpkin, Apple, Cherry, Rhubarb, Banana Cream; vertical axis from 0 to 30]
Interpret this graph please
[Bar graph: “What is everyone's favorite pie?” Bars for Pumpkin, Apple, Cherry, Rhubarb, Banana Cream; vertical axis from 0 to 30]
What are the advantages/disadvantages relative to pie chart?
[Bar graph: “What is everyone's favorite pie?” Bars for Pumpkin, Apple, Cherry, Rhubarb, Banana Cream; vertical axis from 0 to 30]
By the way: X and Y axes
[Bar graph: “What is everyone's favorite pie?” with the X axis (Pumpkin, Apple, Cherry, Rhubarb, Banana Cream) and the Y axis (0 to 30) labeled]
Strengths of bar graphs
• Categories don’t have to add to 100%
• Easier to see small differences between categories
• You can compare variables too
Two‐group bar graph
[Two‐group bar graph: “School Rankings”, comparing Midtown High and Harlem Success Academy on Football Team, Chess Team, and Spiderman Team; Y axis “Quality (Higher is Better)”, 0 to 60]
Let’s make a bar graph
• Using the “your favorite graph” data
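Before anything is drawn, the raw answers have to be tallied into one count per category. A minimal Python sketch that tallies answers with `collections.Counter` and prints a text bar graph; the helper `text_bar_graph` is hypothetical, the answers are made up, and a real graph would normally be drawn with a plotting tool.

```python
from collections import Counter

# Categories from the "favorite type of graph" question
CATEGORIES = ["Pie chart", "Bar graph", "Frequency histogram", "Line graph",
              "Scatterplot", "Stem-and-leaf plot", "Box plot", "Other"]


def text_bar_graph(categories, counts, symbol="#"):
    """Hypothetical helper: one text row per category; bar length equals its count."""
    width = max(len(c) for c in categories)
    return [f"{c:<{width}} | {symbol * counts[c]} {counts[c]}" for c in categories]


# Hypothetical raw answers, one per student
answers = ["Bar graph", "Pie chart", "Bar graph", "Scatterplot", "Line graph",
           "Bar graph", "Frequency histogram", "Pie chart", "Box plot"]
counts = Counter(answers)  # a Counter returns 0 for a category nobody chose

rows = text_bar_graph(CATEGORIES, counts)
for row in rows:
    print(row)
```

Categories with no votes still get a row (with an empty bar), so every option from the question stays visible on the graph.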
Any questions?
Some suggest always using bar graphs instead of pie charts
• “The only thing worse than a pie chart is several of them.” – Edward Tufte
• “Save the pies for dessert.” – Stephen Few
But they’re wrong
• Pie charts are good for representing part‐whole relationships in ways that are really easy to see
• Pie charts are good at representing overall proportions
Nice example (Gabrielle, 2013)
Any questions?
Frequency Histogram
• A type of bar graph
– But usually when people say “bar graph”, they do not mean “frequency histogram”
– Also: by convention, no space between bars
• X axis shows values or ranges of a quantitative variable
• Y axis shows how many data points have that value or range for the quantitative variable
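Building a frequency histogram starts with choosing the ranges (often called bins) for the X axis and counting the data points that fall in each. A minimal Python sketch; the helper `histogram_counts` is hypothetical, it assumes whole‐number scores, its default 5‐point ranges (51‐55, 56‐60, up to 96‐100) match the exam‐grade histograms in this lecture, and the grades are made up.

```python
def histogram_counts(values, low=51, high=100, width=5):
    """Hypothetical helper: count values in consecutive ranges of `width`.

    Ranges start at low, low + width, ... up to high. Assumes whole-number
    values; anything below low, or past the last range, is not counted.
    If the span does not divide evenly by width, the last range runs past high.
    """
    counts = {}
    for start in range(low, high + 1, width):
        end = start + width - 1
        counts[f"{start}-{end}"] = sum(start <= v <= end for v in values)
    return counts


# Hypothetical exam grades, for illustration only
grades = [58, 72, 75, 78, 81, 83, 84, 88, 89, 90, 92, 95, 97, 99, 100]
counts = histogram_counts(grades)

# Printed sideways: each row is one X-axis range, its bar length the frequency
for label, n in counts.items():
    print(f"{label:>6} | {'#' * n}")
```

Because the ranges sit next to each other on a number line rather than being separate categories, drawing the bars touching is a natural fit for the no‐space convention.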
Example from the book
[Figure from the book: “Visits to Starbucks”]
Another Example
[Frequency histogram: X axis “Exam Grade”, Y axis “Frequency” (0 to 18)]
Was this an easy exam or a hard exam?
[Frequency histogram: X axis “Exam Grade”, Y axis “Frequency” (0 to 18)]
Would you rather be in the blue class or the orange class?
[Two frequency histograms (the blue class and the orange class): X axis “Exam Grade” in ranges 51‐55, 56‐60, 61‐65, 66‐70, 71‐75, 76‐80, 81‐85, 86‐90, 91‐95, 96‐100; Y axis “Frequency” (0 to 18)]
By the way: outliers
[Frequency histogram: X axis “Exam Grade”, Y axis “Frequency” (0 to 18), with one value marked “OUTLIER”]
If there’s time, let’s make a frequency histogram
• Everybody: What’s your height in feet‐inches?
• (Example: I’m 5’9”)
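To put class heights on a histogram's X axis, each feet‐inches answer first has to become a single number. A minimal Python sketch, assuming answers are typed like 5'9" with straight or curly quote marks; the name `height_to_inches` is hypothetical, and the sample answers reuse the instructor's 5’9” and the four heights from the Beatles table rather than real class responses.

```python
import re
from collections import Counter

# Feet, a foot mark (' or ’), inches, and an optional inch mark (" or ”)
HEIGHT_PATTERN = re.compile(r"""^\s*(\d+)\s*['’]\s*(\d+)\s*["”]?\s*$""")


def height_to_inches(text):
    """Hypothetical helper: convert a height like 5'9" to total inches (5*12 + 9 = 69)."""
    match = HEIGHT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a feet-inches height: {text!r}")
    feet, inches = int(match.group(1)), int(match.group(2))
    return 12 * feet + inches


answers = ["5’9”", "5'11\"", "5'11\"", "5'10\"", "5'6\""]
inches = [height_to_inches(a) for a in answers]

# One count per whole inch: the frequencies a histogram would display
for height, n in sorted(Counter(inches).items()):
    print(f"{height // 12}'{height % 12}\" | {'#' * n}")
```

Raising an error on anything that does not match, rather than guessing, makes it easy to spot answers that need a follow‐up question.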
Any questions?
Line Graph
• Shows trends from left to right
• The trend is usually over time
• But it doesn’t have to be…
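Because a line graph reads from left to right, the points have to be connected in order of their X values; connected in the order they were recorded, the line can zigzag. A minimal Python sketch; the helper `describe_trend` is hypothetical, the (week, average score) pairs are made up, and labeling each segment up, down, or flat is just one simple way to describe a trend.

```python
def describe_trend(points):
    """Hypothetical helper: sort (x, y) points by x and label each connecting segment."""
    ordered = sorted(points)  # tuples sort by x first (then by y for ties)
    labels = []
    for (_, y0), (_, y1) in zip(ordered, ordered[1:]):
        if y1 > y0:
            labels.append("up")
        elif y1 < y0:
            labels.append("down")
        else:
            labels.append("flat")
    return ordered, labels


# Hypothetical (week, average homework score) pairs, recorded out of order
points = [(3, 78), (1, 70), (4, 78), (2, 74)]
ordered, labels = describe_trend(points)

print(ordered)  # [(1, 70), (2, 74), (3, 78), (4, 78)]
print(labels)   # ['up', 'up', 'flat']
```

Here the X axis is time (weeks), but any quantitative X variable works the same way once the points are sorted.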
Example Line Graph
[Image source: http://www.wilderdom.com/personality/L4‐1IntelligenceNatureVsNurture.html, used under Creative Commons License]
Example Line Graph (VanLehn, 2011)
(This graph shows perceptions, not data on effectiveness.)
Any questions?
Not going to discuss today
• Stem‐and‐leaf plot
• Very, very rare to see in actual use
• Quite poor for any sizable data set
• If you want to learn about them, see the book
Future Classes
• Scatterplot
• Box plot
Upcoming Classes
• 1/28 Describing Data with Numerical Measures
– Ch. 2
• 2/2 Describing Bivariate Data (Asgn. 1 due)
– Ch. 3
• 2/4 Introduction to Probability
– Ch. 4
Questions? Comments?