![Page 1: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/1.jpg)
Models for Continuous Variables
![Page 2: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/2.jpg)
[Figure: distribution of Longevity of Women (years), horizontal axis from 30 to 100]
Challenge
What to do about histograms describing distributions for continuous data, especially for large collections?
Tabulating each unique value is cumbersome.
Bin choices are arbitrary.
![Page 3: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/3.jpg)
[Figure: Longevity of Women (years), horizontal axis from 30 to 100]
Models for Continuous Populations
Distributions of continuous data are modeled with smooth curves (“density functions”).
Nonnegative.
Total area under the curve is exactly 1.
The area under the curve above an interval is equal to the probability of a result in that interval.
![Page 4: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/4.jpg)
Example – Female Longevity
Left skewed. Median > Mean.
Standard deviation 10 – 12
![Page 5: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/5.jpg)
Example – Female Longevity
If we want the probability a woman lives to at least 90 years of age…
![Page 6: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/6.jpg)
Example – Female Longevity
If we want the probability a woman lives to at least 90 years of age…
…we find the area under the curve over the interval extending from 90 to the right.
![Page 7: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/7.jpg)
Total area = 1
Total area = number of rectangles × size of one rectangle
Each rectangle is 2 × 0.004 = 0.008 units of area (2 years wide, e.g. from 32 to 34, by 0.004 density units tall).
1 = number of rectangles × 0.008
Number of rectangles = 1 / 0.008 = 125
![Page 8: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/8.jpg)
![Page 9: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/9.jpg)

| Interval | # Rectangles |
|---|---|
| 90 – 92 | 10.9 |
| 92 – 94 | 10.2 |
| 94 – 96 | 8.5 |
| 96 – 98 | 5.7 |
| 98 – 100 | 2.2 |
| 100+ | 0.0+ |
| Total | 37.5 |
![Page 10: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/10.jpg)
• Total of 37.5 rectangles
• Each rectangle is 0.008 area.
• The area under the curve is 37.5 × 0.008 = 0.30, or 37.5 / 125 = 0.30.
30.0% of women live to at least 90 years of age.
90 years is the 70th percentile.
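The rectangle-counting steps above can be sketched in a few lines of Python. The counts are the ones read from the table; the names `RECT_AREA`, `counts_90_plus`, and `area_from_rectangles` are illustrative assumptions, not part of the original slides.

```python
# Area of one grid rectangle: 2 years wide by 0.004 density units tall.
RECT_AREA = 2 * 0.004  # 0.008

# Rectangle counts under the curve for ages 90 and up, as tabulated.
counts_90_plus = {
    (90, 92): 10.9,
    (92, 94): 10.2,
    (94, 96): 8.5,
    (96, 98): 5.7,
    (98, 100): 2.2,
}

def area_from_rectangles(counts, rect_area=RECT_AREA):
    """Approximate an area as (number of rectangles) x (area of one rectangle)."""
    return sum(counts.values()) * rect_area

p_at_least_90 = area_from_rectangles(counts_90_plus)
print(round(p_at_least_90, 4))      # probability of living to at least 90
print(round(1 - p_at_least_90, 4))  # fraction below 90, the percentile rank of 90
```

The same function applies to any interval once its rectangle counts have been read off the graph.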
![Page 11: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/11.jpg)
Example – Female Longevity
What is the probability a woman dies between the ages of 60 and 70?
![Page 12: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/12.jpg)

| Interval | # Rectangles |
|---|---|
| 60 – 62 | 1.2 |
| 62 – 64 | 1.6 |
| 64 – 66 | 1.9 |
| 66 – 68 | 2.3 |
| 68 – 70 | 2.8 |
| Total | 9.8 |

• Total of 9.8 rectangles
• Each rectangle is 0.008 area.
• The area under the curve is 9.8 × 0.008 = 0.0784.
About 7.84% of women die between the ages of 60 and 70.
![Page 13: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/13.jpg)
Example – Female Longevity
Determine the median.
It’s below 90. And above 80.
![Page 14: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/14.jpg)
Example – Female Longevity
0.50 probability above M / 0.5 below M
0.5 area (under curve) below M; 0.5 above M
![Page 15: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/15.jpg)
Median Female Longevity
67.5 rectangles under curve below 86
67.5 × 0.008 = 0.54, so 86 is the 54th percentile.
86 is too high
![Page 16: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/16.jpg)
Median Female Longevity
58 rectangles under curve below 84
58 × 0.008 = 0.464, so 84 is the 46.4th percentile.
84 is too low
![Page 17: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/17.jpg)

| %-ile k | 46.4 | 50.0 | 54.0 |
|---|---|---|---|
| Value x | 84 | ??? | 86 |
![Page 18: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/18.jpg)

| %-ile k | 46.4 | 50.0 | 54.0 |
|---|---|---|---|
| Value x | 84 | 84.75 | 86 |

The median is about 84.75 (85) years of age.
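One way to fill in the unknown value between two known percentiles is straight-line interpolation. The sketch below assumes that method (the slides do not name the method used); `interpolate_percentile` is a hypothetical helper. On the tabulated pairs it lands near 84.9, slightly above the 84.75 quoted above, and either figure rounds to 85.

```python
def interpolate_percentile(target, lower, upper):
    """Linearly interpolate the value at percentile `target`.

    `lower` and `upper` are (percentile, value) pairs that bracket the target.
    """
    (k_lo, x_lo), (k_hi, x_hi) = lower, upper
    if not k_lo <= target <= k_hi:
        raise ValueError("target must lie between the two known percentiles")
    fraction = (target - k_lo) / (k_hi - k_lo)
    return x_lo + fraction * (x_hi - x_lo)

# 84 is the 46.4th percentile and 86 is the 54th; the median is the 50th.
median_estimate = interpolate_percentile(50.0, (46.4, 84), (54.0, 86))
print(round(median_estimate, 2))
```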
![Page 19: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/19.jpg)
Example – Female Longevity
To approximate the mean…
…use midpoints and probabilities.
Midpoint = 71 (rounded to nearest odd year)
Area = 3.4 rectangles
Probability = 3.4 × 0.008 = 0.0272
![Page 20: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/20.jpg)
Left skewed
Mode = 90
Median = 85
Mean = 83
![Page 21: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/21.jpg)
[Figure: density curve for Longevity of Women (years); horizontal axis from 30 to 100, vertical axis Density from 0.000 to 0.044; μ = 83.0 marked]
Example – Female Longevity
Mean = balance point
μ = 83.0
(Easy for symmetric distributions.)
![Page 22: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/22.jpg)
[Figure: density curve for Longevity of Women (years); horizontal axis from 30 to 100, vertical axis Density from 0.000 to 0.044; μ = 83.0 and σ = 11.0 marked]
Example – Female Longevity
The mean and standard deviation will generally be given, or follow from formulas.
μ = 83.0, σ = 11.0
![Page 23: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/23.jpg)
The curve models a population distribution for a continuous variable.
The model must capture the important information in the population of data.
If we think of the experiment that randomly selects a single item from the population and records a result, we call this curve a probability distribution.
![Page 24: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/24.jpg)
Example
Wait time (minutes) until seating at a restaurant is modeled by y = x / 50 over the range from x = 0 to 10.
[Figure: graph of y = x / 50 for x from 0 to 10; vertical axis from 0.00 to 0.20]
![Page 25: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/25.jpg)
Why is this a legitimate model for a continuous variable?
It’s nonnegative.
The total area is ½ b h = 0.5(10)(0.2) = 1.
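The two conditions can also be checked numerically. This is a minimal sketch, assuming the density is taken to be zero outside 0 to 10; `density` and `trapezoid_area` are illustrative names. The trapezoid rule is exact for a straight line, so the total should come out at 1 up to floating-point error.

```python
def density(x):
    """Wait-time density from the example: y = x / 50 on [0, 10], zero elsewhere."""
    return x / 50 if 0 <= x <= 10 else 0.0

def trapezoid_area(f, a, b, steps=1000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width

# Condition 1: the curve is never negative (checked on a fine grid).
grid = [i / 100 for i in range(0, 1001)]
is_nonnegative = all(density(x) >= 0 for x in grid)

# Condition 2: the total area under the curve is 1.
total_area = trapezoid_area(density, 0, 10)
print(is_nonnegative, round(total_area, 6))
```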
![Page 26: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/26.jpg)
Determine the probability of a wait less than 6 min.
If x = 6: y = 6/50 = 0.12.
The shaded area is 0.5(6)(0.12) = 0.36.
(6 is the 36th percentile.)
![Page 27: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/27.jpg)
Determine the probability of waiting longer than 8 min.
If x = 8, y = 8/50 = 0.16. The pink area is 0.5(8)(0.16) = 0.64. (So 8 is the 64th percentile.)
The yellow area is the probability of a result greater than 8:
1 – 0.64 = 0.36.
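Both calculations use the same triangle-area formula, which can be wrapped in one function. In this sketch, `cdf` is an assumed name for "area below x"; it follows the ½ b h computation used in the example.

```python
def cdf(x):
    """Area under y = x / 50 to the left of x: a triangle with base x and height x / 50."""
    if x <= 0:
        return 0.0
    if x >= 10:
        return 1.0
    return 0.5 * x * (x / 50)

p_less_than_6 = cdf(6)      # area to the left of 6
p_more_than_8 = 1 - cdf(8)  # complement of the area to the left of 8
print(round(p_less_than_6, 4), round(p_more_than_8, 4))
```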
![Page 28: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/28.jpg)
Determine the median wait.
6 is the 36th percentile; 8 is the 64th. The median is the 50th.
7 would be a good guess. However, the probability of a result less than 7 is 0.5(7)(0.14) = 0.49. The median is a bit above 7.
![Page 29: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/29.jpg)
Determine the median m.
Whatever m is: 0.5 m (m/50) = 0.5. So m² = 50.
The median is m = √50 ≈ 7.071.
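The same algebra gives any percentile, not only the median: setting 0.5 x (x/50) = p and solving gives x = √(100 p). A short sketch, with `wait_quantile` as a hypothetical helper name:

```python
import math

def wait_quantile(p):
    """Return the wait x whose area to the left is p, from 0.5 * x * (x / 50) = p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return math.sqrt(100 * p)

median_wait = wait_quantile(0.5)
print(round(median_wait, 3))
```

As a consistency check, `wait_quantile(0.36)` and `wait_quantile(0.64)` recover the 6- and 8-minute waits found earlier.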
![Page 30: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/30.jpg)
The mode is 10.00.
The median is m = 7.071.
The mean is μ = 6 + 2/3 = 6.667.
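The mean can be approximated with the midpoints-and-probabilities idea used for the longevity example: cut the range into narrow bins, weight each bin's midpoint by the bin's approximate probability, and add. This is a sketch under that assumption; `mean_by_midpoints` and the bin count of 1000 are illustrative choices.

```python
def density(x):
    """Wait-time density from the example: y = x / 50 on [0, 10], zero elsewhere."""
    return x / 50 if 0 <= x <= 10 else 0.0

def mean_by_midpoints(f, a, b, bins=1000):
    """Sum of midpoint x (approximate probability of the bin) over narrow bins."""
    width = (b - a) / bins
    total = 0.0
    for i in range(bins):
        mid = a + (i + 0.5) * width
        total += mid * f(mid) * width  # f(mid) * width approximates the bin's area
    return total

mean_wait = mean_by_midpoints(density, 0, 10)
print(round(mean_wait, 3))
```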
![Page 31: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/31.jpg)
Determine the probability of a result between 5.5 and 6.5.
Area below 6.5: 0.4225
Area below 5.5: 0.3025
Area between 5.5 and 6.5: 0.1200
If the result were rounded to the nearest whole number, the probability is 0.12 that it rounds to 6.
[Figure: graph of y = x / 50 for x from 0 to 10; vertical axis from 0.00 to 0.20]
![Page 32: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/32.jpg)
Determine the probability of a result between 4.5 and 5.5.
Area below 5.5: 0.3025
Area below 4.5: 0.2025
Area between 4.5 and 5.5: 0.1000
If the result were rounded to the nearest whole number, the probability is 0.10 that it rounds to 5.
[Figure: graph of y = x / 50 for x from 0 to 10; vertical axis from 0.00 to 0.20]
![Page 33: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/33.jpg)

| Result | Probability |
|---|---|
| Rounds to 0 (between 0.0 and 0.5) | |
| Rounds to 1 (between 0.5 and 1.5) | |
| Rounds to 2 (between 1.5 and 2.5) | |
| Rounds to 3 (between 2.5 and 3.5) | |
| Rounds to 4 (between 3.5 and 4.5) | |
| Rounds to 5 (between 4.5 and 5.5) | 0.1000 |
| Rounds to 6 (between 5.5 and 6.5) | 0.1200 |
| Rounds to 7 (between 6.5 and 7.5) | |
| Rounds to 8 (between 7.5 and 8.5) | |
| Rounds to 9 (between 8.5 and 9.5) | |
| Rounds to 10 (between 9.5 and 10.0) | |
![Page 34: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/34.jpg)

| Result | Probability |
|---|---|
| Rounds to 0 (between 0.0 and 0.5) | |
| Rounds to 1 (between 0.5 and 1.5) | 0.0200 |
| Rounds to 2 (between 1.5 and 2.5) | 0.0400 |
| Rounds to 3 (between 2.5 and 3.5) | 0.0600 |
| Rounds to 4 (between 3.5 and 4.5) | 0.0800 |
| Rounds to 5 (between 4.5 and 5.5) | 0.1000 |
| Rounds to 6 (between 5.5 and 6.5) | 0.1200 |
| Rounds to 7 (between 6.5 and 7.5) | 0.1400 |
| Rounds to 8 (between 7.5 and 8.5) | 0.1600 |
| Rounds to 9 (between 8.5 and 9.5) | 0.1800 |
| Rounds to 10 (between 9.5 and 10.0) | |

The probabilities filled in so far sum to 0.90.
![Page 35: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/35.jpg)
Determine the probability of a result between 0 and 0.5. (Would round to 0.)
Area below 0.5: 0.0025
Determine the probability of a result between 9.5 and 10.0. (Would round to 10.)
Area below 10.0: 1.0000
Area below 9.5: 0.9025
Area between 9.5 and 10.0: 0.0975
[Figure: graph of y = x / 50 for x from 0 to 10; vertical axis from 0.00 to 0.20]
![Page 36: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/36.jpg)

| Result | Probability |
|---|---|
| Rounds to 0 (between 0.0 and 0.5) | 0.0025 |
| Rounds to 1 (between 0.5 and 1.5) | 0.0200 |
| Rounds to 2 (between 1.5 and 2.5) | 0.0400 |
| Rounds to 3 (between 2.5 and 3.5) | 0.0600 |
| Rounds to 4 (between 3.5 and 4.5) | 0.0800 |
| Rounds to 5 (between 4.5 and 5.5) | 0.1000 |
| Rounds to 6 (between 5.5 and 6.5) | 0.1200 |
| Rounds to 7 (between 6.5 and 7.5) | 0.1400 |
| Rounds to 8 (between 7.5 and 8.5) | 0.1600 |
| Rounds to 9 (between 8.5 and 9.5) | 0.1800 |
| Rounds to 10 (between 9.5 and 10.0) | 0.0975 |

The probabilities sum to 1.0.
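The whole table comes from differences of areas below the half-unit cut points, with the two end intervals clipped to the range 0 to 10. A minimal sketch, using assumed names `cdf` and `rounding_probabilities`:

```python
def cdf(x):
    """Area under y = x / 50 to the left of x, with x clipped to the range [0, 10]."""
    x = min(max(x, 0.0), 10.0)
    return x * x / 100

def rounding_probabilities():
    """P(wait rounds to k) for k = 0..10: area between k - 0.5 and k + 0.5."""
    return {k: cdf(k + 0.5) - cdf(k - 0.5) for k in range(11)}

probs = rounding_probabilities()
for k, p in probs.items():
    print(k, round(p, 4))
print("sum", round(sum(probs.values()), 4))
```

Clipping inside `cdf` is what makes the intervals for 0 and 10 half as wide as the others.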
![Page 37: Models for Continuous Variables. Challenge What to do about histograms describing distributions for continuous data? Especially for large collections](https://reader036.vdocuments.us/reader036/viewer/2022062805/5697c0111a28abf838ccb7f2/html5/thumbnails/37.jpg)

| Rounded Value x | P(x) | Mean Computation |
|---|---|---|
| 0 | 0.0025 | 0 × 0.0025 = 0.0000 |
| 1 | 0.0200 | 1 × 0.0200 = 0.0200 |
| 2 | 0.0400 | 2 × 0.0400 = 0.0800 |
| 3 | 0.0600 | 3 × 0.0600 = 0.1800 |
| 4 | 0.0800 | 4 × 0.0800 = 0.3200 |
| 5 | 0.1000 | 5 × 0.1000 = 0.5000 |
| 6 | 0.1200 | 6 × 0.1200 = 0.7200 |
| 7 | 0.1400 | 7 × 0.1400 = 0.9800 |
| 8 | 0.1600 | 8 × 0.1600 = 1.2800 |
| 9 | 0.1800 | 9 × 0.1800 = 1.6200 |
| 10 | 0.0975 | 10 × 0.0975 = 0.9750 |

SUM = 6.6750
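The mean computation is a probability-weighted sum of the rounded values. The sketch below reuses the probabilities from the table; `rounded_pmf` and `discrete_mean` are illustrative names.

```python
# Probability that the wait rounds to each whole number, as tabulated.
rounded_pmf = {
    0: 0.0025, 1: 0.0200, 2: 0.0400, 3: 0.0600, 4: 0.0800, 5: 0.1000,
    6: 0.1200, 7: 0.1400, 8: 0.1600, 9: 0.1800, 10: 0.0975,
}

def discrete_mean(pmf):
    """Mean of a discrete distribution: the sum of value x probability."""
    return sum(x * p for x, p in pmf.items())

mean_rounded = discrete_mean(rounded_pmf)
print(round(mean_rounded, 4))
```

The rounded-value mean sits close to the continuous model's mean of 6.667; the small gap comes from rounding.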