CHAPTER 5 PROBABILITY AND STATISTICS
Definition of statistics: The mathematics of the collection, organization and interpretation of numerical data,
especially the analysis of a population by inference from sampling.
Let $P(A)$ denote the probability of an event $A$, which is a subset of a sample space $S$.
5.1 Rules of probability
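1. Complement rule: $P(A^c) = 1 - P(A)$
2. Addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
3. For disjoint events, $P(A \cap B) = 0$, thus $P(A \cup B) = P(A) + P(B)$
4. Product rule: If $A$ and $B$ are independent, then $P(A \cap B) = P(A)P(B)$.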
5.2 Conditional probability
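(a) If $A$ and $B$ are any events with $P(B) > 0$ then $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
(b) If $A$ and $B$ are any events with $P(A) > 0$ then $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$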
5.3 Multiplication rule
If $A$ and $B$ are any events then $P(A \cap B) = P(A \mid B)P(B) = P(B \mid A)P(A)$
5.4 Total probability rule
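If $E_1, E_2, \ldots, E_k$ are mutually exclusive and exhaustive events, then
$$P(A) = P(A \cap E_1) + P(A \cap E_2) + \cdots + P(A \cap E_k)$$
$$P(A) = P(A \mid E_1)P(E_1) + P(A \mid E_2)P(E_2) + \cdots + P(A \mid E_k)P(E_k)$$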
5.5 Bayes Theorem
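If $E_1, E_2, \ldots, E_k$ are mutually exclusive events, one of which occurs given that another event $A$ occurs, then
$$P(E_i \mid A) = \frac{P(A \mid E_i)P(E_i)}{\sum_{j=1}^{k} P(A \mid E_j)P(E_j)}$$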
Example 5.1 Three machines A, B and C produce similar car parts. Machine A produces 40% of the total output, and
machines B and C produce 25% and 35% respectively. The proportions of the output from each machine that do not
conform to the specification are 10% for A, 5% for B and 1% for C. What proportion of the parts that do not conform
to the specification are produced by machine A?
Solution
Let D represent the event that a particular part is defective. Then the overall proportion of defective parts is
$$P(D) = P(D \mid A)P(A) + P(D \mid B)P(B) + P(D \mid C)P(C) = (0.10)(0.40) + (0.05)(0.25) + (0.01)(0.35) = 0.056$$
Using Bayes theorem,
$$P(A \mid D) = \frac{P(D \mid A)P(A)}{P(D)} = \frac{0.04}{0.056} = 0.714$$
Example 5.2 Suppose that 0.1% of the people in a certain area have a disease D and that a mass screening test is
used to detect cases. The test gives either a positive result or a negative result for each person. In practice the test
gives a positive result with probability 99.9% for a person who has D and with probability 0.2% for a person who does
not. What is the probability that a person for whom the test is positive actually has the disease?
Solution
Let T represent the event that the test gives a positive result.
Then,
$$P(T) = P(T \mid D)P(D) + P(T \mid D^c)P(D^c) = (0.999)(0.001) + (0.002)(0.999) = 0.002997$$
Using Bayes theorem,
$$P(D \mid T) = \frac{P(T \mid D)P(D)}{P(T)} = \frac{0.000999}{0.002997} = 0.3333$$
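The total probability rule and Bayes theorem can be checked numerically. The following is a minimal sketch using the figures stated in Example 5.2; the function name `bayes_posterior` is an illustrative choice, not part of the notes.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(E_i | A) for each i, given P(E_i) and P(A | E_i).

    The events E_i are assumed mutually exclusive and exhaustive.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # total probability rule: P(A)
    return [j / total for j in joint]


# Example 5.2: E_1 = person has disease D, E_2 = person does not have D,
# A = the test gives a positive result T.
posterior = bayes_posterior(priors=[0.001, 0.999], likelihoods=[0.999, 0.002])
p_disease_given_positive = posterior[0]
print(round(p_disease_given_positive, 4))
```

Although the test is very accurate, only about one third of positive results come from people who have the disease, because the disease itself is rare.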
5.6 Random variables
A random variable (rv) has a sample space of possible numerical values together with a distribution of probabilities.
Examples: (a) the number of defectives in a process (b) number of successful projects.
Random variables can be discrete or continuous.
Discrete random variables and distributions
Definition
If $X$ is a discrete random variable, then $p(x) = P(X = x)$ is called a probability mass function or
probability distribution if, for each outcome $x$,
(a) $p(x) \ge 0$
(b) $\sum_x p(x) = 1$
Cumulative distribution functions
The cumulative distribution function, $F(x)$, for a discrete random variable $X$ with probability distribution
$p(x) = P(X = x)$ is
$$F(x) = P(X \le x) = \sum_{t \le x} P(X = t)$$
Properties of the cumulative distribution functions
$F(x)$ satisfies the following properties:
(a) $F(x) = P(X \le x) = \sum_{t \le x} P(X = t)$
(b) $0 \le F(x) \le 1$
(c) If $x \le y$, then $F(x) \le F(y)$
Mean of a discrete random variable
If $X$ is a discrete random variable with probability distribution $p(x) = P(X = x)$, then the mean or
expected value for $X$, which is denoted by $\mu_X$ or $E(X)$, is given by
$$\mu_X = E(X) = \sum_x x\,p(x)$$
Variance of a discrete random variable
If $X$ is a discrete random variable with probability distribution $p(x) = P(X = x)$, then the variance for
$X$, which is denoted by $V(X)$ or $\sigma_X^2$, is given by
$$V(X) = \sigma_X^2 = E(X - \mu_X)^2 = \sum_x (x - \mu_X)^2\,p(x)$$
Standard deviation of a discrete random variable
The standard deviation of a discrete random variable, denoted as $\sigma_X$, is the positive square root of the variance,
$\sigma_X = \sqrt{\sigma_X^2}$.
Example 5.3
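The number of successful projects $X$ per day obtained by a small engineering firm can be described by the
following probability distribution:
$$P(X = x) = \begin{cases} \dfrac{x}{10} & \text{for } x = 0, 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}$$
Find the cumulative distribution function for $X$. Find the mean and variance for the number of successful projects
per day.
Solution
The cumulative distribution function for $X$ is given by
$$F_X(x) = P(X \le x) = \sum_{t \le x} P(X = t)$$
For $x = 0$, $F(0) = P(X \le 0) = P(0) = \frac{0}{10} = 0$
For $x = 1$, $F(1) = P(X \le 1) = P(0) + P(1) = 0 + \frac{1}{10} = 0.1$
For $x = 2$, $F(2) = P(X \le 2) = P(0) + P(1) + P(2) = 0 + \frac{1}{10} + \frac{2}{10} = 0.3$
For $x = 3$, $F(3) = P(X \le 3) = P(0) + P(1) + P(2) + P(3) = 0 + \frac{1}{10} + \frac{2}{10} + \frac{3}{10} = 0.6$
For $x = 4$, $F(4) = P(X \le 4) = P(0) + P(1) + P(2) + P(3) + P(4) = 0 + \frac{1}{10} + \frac{2}{10} + \frac{3}{10} + \frac{4}{10} = 1.0$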
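The cumulative distribution function, mean and variance asked for in Example 5.3 follow directly from the definitions above. The sketch below applies them to the stated distribution $P(X = x) = x/10$; it uses exact fractions to avoid rounding, and the variable names are illustrative.

```python
from fractions import Fraction

# PMF of Example 5.3: P(X = x) = x/10 for x = 0, 1, 2, 3, 4.
pmf = {x: Fraction(x, 10) for x in range(5)}
assert sum(pmf.values()) == 1  # a valid PMF sums to one


def cdf(x):
    """F(x) = P(X <= x), the sum of P(X = t) over all t <= x."""
    return sum(p for t, p in pmf.items() if t <= x)


mean = sum(x * p for x, p in pmf.items())                     # E(X)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())   # E(X - mu)^2

print([float(cdf(x)) for x in range(5)])
print(float(mean), float(variance))
```

Under this distribution the mean works out to 3 successful projects per day and the variance to 1.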
5.7 Continuous random variables and distributions
Definition
If $X$ is a continuous random variable defined over a set of real numbers, then $f(x)$ is called a probability
density function, if
(a) $f(x) \ge 0$
(b) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
(c) $P(a \le X \le b) = \int_a^b f(x)\,dx$ where $X$ lies in the interval $[a, b]$
Cumulative distribution functions
The cumulative distribution function, $F(x)$, for a continuous random variable $X$ with probability density
function $f(x)$ is
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt \quad \text{for } -\infty < x < \infty$$
Properties of the cumulative distribution functions
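$F(x)$ satisfies the following properties:
(a) $P(X \le a) = \int_{-\infty}^{a} f(t)\,dt$ for $-\infty < a < \infty$
(b) $P(X \ge a) = \int_{a}^{\infty} f(t)\,dt$ for $-\infty < a < \infty$
(c) $P(a \le X \le b) = \int_{a}^{b} f(t)\,dt$ for $-\infty < a < b < \infty$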
Mean of a continuous random variable
If $X$ is a continuous random variable with probability density function $f(x)$, then the mean or expected value
for $X$, which is denoted by $\mu_X$ or $E(X)$, is given by
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$
Variance of a continuous random variable
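If $X$ is a continuous random variable with probability density function $f(x)$, then the variance for $X$, which is
denoted by $V(X)$ or $\sigma_X^2$, is given by
$$V(X) = \sigma_X^2 = E(X - \mu_X)^2$$
$$= \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x)\,dx$$
$$= \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu_X^2$$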
Standard deviation of a continuous random variable
The standard deviation of a continuous random variable, denoted as $\sigma_X$, is the positive square root of the variance,
$\sigma_X = \sqrt{\sigma_X^2}$.
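Example 5.4 Assume that the particle size of an air pollutant (in micrometers) can be described by the following
probability function:
$$f_X(x) = \begin{cases} \dfrac{3}{x^4} & \text{for } x > 1 \\ 0 & \text{otherwise} \end{cases}$$
(a) Show that $f(x)$ is a probability density function
(b) Find the cumulative distribution function
(c) Determine the mean and standard deviation
Solution
(a) $f(x)$ is a probability density function if it satisfies $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
Here
$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{1}^{\infty} \frac{3}{x^4}\,dx = \left[-\frac{1}{x^3}\right]_{1}^{\infty} = 1$$
Therefore $f(x)$ is a probability density function.
(b) The cumulative distribution function for $X$ is given by
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt = \int_{1}^{x} \frac{3}{t^4}\,dt = \left[-\frac{1}{t^3}\right]_{1}^{x} = 1 - \frac{1}{x^3} \quad \text{for } x > 1$$
and $F_X(x) = 0$ for $x \le 1$.
(c) The mean for $X$ is given by
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{1}^{\infty} x\,\frac{3}{x^4}\,dx = \int_{1}^{\infty} \frac{3}{x^3}\,dx = \left[-\frac{3}{2x^2}\right]_{1}^{\infty} = \frac{3}{2} \text{ micrometers}$$
The variance for $X$ is given by
$$V(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu_X^2 = \int_{1}^{\infty} x^2\,\frac{3}{x^4}\,dx - \left(\frac{3}{2}\right)^2 = \int_{1}^{\infty} \frac{3}{x^2}\,dx - \left(\frac{3}{2}\right)^2 = \left[-\frac{3}{x}\right]_{1}^{\infty} - \frac{9}{4} = 3 - \frac{9}{4} = \frac{3}{4} \text{ sq. micrometers}$$
The standard deviation is therefore $\sigma_X = \sqrt{3/4} \approx 0.866$ micrometers.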
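The integrals in Example 5.4 can also be checked numerically. The sketch below is one possible approach, not the method used in the notes: it maps the infinite range $[1, \infty)$ onto $(0, 1]$ with the substitution $u = 1/x$ and applies the midpoint rule. The helper `integrate_tail` is a hypothetical name introduced for illustration.

```python
def integrate_tail(g, n=100_000):
    """Approximate the integral of g over [1, inf).

    Uses the substitution u = 1/x (so dx = du / u^2 in magnitude) and the
    midpoint rule on (0, 1]. This is an illustrative helper, not a library call.
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h        # midpoint of the k-th subinterval
        x = 1.0 / u
        total += g(x) / (u * u) * h
    return total


def f(x):
    """Density from Example 5.4 on x > 1."""
    return 3.0 / x**4


area = integrate_tail(f)                                 # should be close to 1
mean = integrate_tail(lambda x: x * f(x))                # E(X)
second_moment = integrate_tail(lambda x: x * x * f(x))   # E(X^2)
variance = second_moment - mean**2
std_dev = variance**0.5
print(round(area, 4), round(mean, 4), round(variance, 4), round(std_dev, 4))
```

The numerical values agree with the closed-form results: total area 1, mean 3/2 and variance 3/4.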
5.8 Discrete distributions
Bernoulli distribution
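PMF $P(X = x) = p^x (1 - p)^{1 - x}$
Range $x = 0, 1$ and $0 \le p \le 1$
Mean $p$
Variance $p(1 - p)$
Binomial distribution
PMF $P(X = x) = \dbinom{n}{x} p^x (1 - p)^{n - x}$
Range $x = 0, 1, \ldots, n$ and $0 \le p \le 1$
Parameters $n$ and $p$
Mean $np$
Variance $np(1 - p)$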
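Example 5.5 Suppose a road is flooded with probability $p$ during a year and not more than one flood occurs
during a year. What is the probability that it will be flooded at least once during a five year period?
Solution Let $X$ be the number of years, out of five, in which a flood occurs. Assuming the years are independent,
$X$ has a binomial distribution with $n = 5$ and success probability $p$.
Then,
$$P(X \ge 1) = 1 - P(X = 0) = 1 - (1 - p)^5$$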
Poisson distribution
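PMF $P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$
Range $x = 0, 1, 2, \ldots$
Parameter $\lambda > 0$
Mean $\lambda$
Variance $\lambda$
If $n$ is large and $p$ is small, the binomial distribution can be approximated by the Poisson distribution with $\lambda = np$.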
Example 5.6 The number of flaws in a thin copper wire follows a Poisson distribution with a mean of 2.3 flaws per
mm. (a) Determine the probability of exactly two flaws in 1 mm of wire. (b) Determine the probability of ten flaws in
5 mm of wire.
Solution
(a) Let $X$ be the number of flaws in 1 mm of wire.
Given that $\lambda = 2.3$, thus
$$P(X = 2) = \frac{e^{-2.3} (2.3)^2}{2!} = 0.265$$
(b) Let $X$ be the number of flaws in 5 mm of wire. Then $X$ has a Poisson distribution with $\lambda = 5 \times 2.3 = 11.5$ flaws.
$$P(X = 10) = \frac{e^{-11.5} (11.5)^{10}}{10!} = 0.113$$
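The binomial and Poisson probability mass functions are simple to evaluate directly. The sketch below reproduces the two probabilities asked for in Example 5.6. The flood probability `p_flood = 0.1` is a hypothetical value chosen only to illustrate the "at least once" calculation; it is not taken from the notes.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)


# Example 5.6: 2.3 flaws per mm on average.
p_two_in_1mm = poisson_pmf(2, 2.3)
p_ten_in_5mm = poisson_pmf(10, 5 * 2.3)
print(round(p_two_in_1mm, 3), round(p_ten_in_5mm, 3))

# "At least once in five years" with a HYPOTHETICAL yearly probability.
p_flood = 0.1  # assumed for illustration only
p_at_least_one = 1 - binomial_pmf(0, 5, p_flood)
print(round(p_at_least_one, 5))
```

Scaling the Poisson mean by the length of wire (5 mm gives $\lambda = 11.5$) is what makes part (b) a one-line change from part (a).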
5.9 Continuous distribution
Normal distribution
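PDF $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^2\right\}$
Range $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$
Parameters $\mu$: location parameter, $\sigma$: scale parameter
If $X$ follows a normal distribution then $X \sim N(\mu, \sigma^2)$.
Also, $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$.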
5.10 Sample measures and parameter estimates
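Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Then the point
estimates for $\mu$ and $\sigma^2$ are
$\hat{\mu} = \bar{x}$ where
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
is the sample mean
[Figure: standard normal density $f(x)$ plotted against $z$, for $z$ from $-6$ to $6$]
and
$\hat{\sigma}^2 = s^2$ where
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
is the sample variance.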
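Thus if $X_1, X_2, \ldots, X_n$ is a random sample from $N(\mu, \sigma^2)$ then $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.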
5.11 Confidence interval for the mean based on the normal distribution
(1) Population variance is known
The $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ is given by
$$\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
where
(a) $\bar{X}$ is the sample mean.
(b) $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th quantile of the standard normal distribution which is given in Table 1.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution
with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ can either be small or large.
(2) Population variance is unknown
The $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ is given by
$$\bar{X} - z_{\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{S}{\sqrt{n}}$$
where
(a) $\bar{X}$ is the sample mean and $S$ is the sample standard deviation.
(b) $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th quantile of the standard normal distribution which is given in Table 1.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with
mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ is large.
Table 1: Cumulative distribution function for the standard normal distribution
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
The entries of Table 1 are $P(Z \le z) = \displaystyle\int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$.
Example 5.7 A study was conducted to determine the wind speed distribution in Penang. The following monthly wind speed data
(measured in m/s) were obtained.
15.42 12.85 10.28 13.36 15.42 20.56 16.28 25.70 15.42 9.25
10.28 9.25 8.22 11.31 14.91 16.45 13.36 15.42 13.36 12.85
11.31 11.31 12.85 11.82 14.39 15.42 16.96 21.59 15.42 15.42
12.85 12.85 11.82 14.39 12.34 24.67 12.85 20.05 27.24 22.62
Find a 90% confidence interval for the true mean wind speed in Penang.
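Solution
Let $\mu$ be the true mean wind speed (in m/s) in Penang.
Since the sample size is large ($n = 40$), the following confidence interval is used.
Thus the 90% confidence interval for the true population mean is given by
$$\bar{X} - z_{\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{S}{\sqrt{n}}$$
$$\bar{X} - z_{0.05} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{0.05} \frac{S}{\sqrt{n}}$$
$$14.953 - 1.65 \frac{4.489}{\sqrt{40}} \le \mu \le 14.953 + 1.65 \frac{4.489}{\sqrt{40}}$$
$$14.953 - 1.65(0.710) \le \mu \le 14.953 + 1.65(0.710)$$
$$14.953 - 1.172 \le \mu \le 14.953 + 1.172$$
$$13.781 \le \mu \le 16.125$$
Calculations
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{40}}{40} = \frac{15.42 + 10.28 + \cdots + 15.42 + 22.62}{40} = 14.953$$
$$S^2 = \frac{1}{39} \sum_{i=1}^{40} (X_i - \bar{X})^2 = \frac{(15.42 - 14.953)^2 + (10.28 - 14.953)^2 + \cdots + (22.62 - 14.953)^2}{39} = 20.149, \quad S = \sqrt{20.149} = 4.489$$
From Table 1, $z_{0.05} = 1.65$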
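The large-sample interval used in Example 5.7 can be reproduced from the raw data. The sketch below is one way to do it with the Python standard library. The function name `z_confidence_interval` is illustrative, and the optional `z` argument lets the rounded table value 1.65 be supplied in place of the exact quantile (about 1.645).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Monthly wind speed data (m/s) from Example 5.7.
wind = [
    15.42, 12.85, 10.28, 13.36, 15.42, 20.56, 16.28, 25.70, 15.42, 9.25,
    10.28, 9.25, 8.22, 11.31, 14.91, 16.45, 13.36, 15.42, 13.36, 12.85,
    11.31, 11.31, 12.85, 11.82, 14.39, 15.42, 16.96, 21.59, 15.42, 15.42,
    12.85, 12.85, 11.82, 14.39, 12.34, 24.67, 12.85, 20.05, 27.24, 22.62,
]


def z_confidence_interval(data, confidence, z=None):
    """Large-sample CI for the mean: xbar -/+ z * S / sqrt(n)."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    if z is None:
        # exact standard normal quantile instead of a table lookup
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


lower, upper = z_confidence_interval(wind, 0.90, z=1.65)
print(round(lower, 3), round(upper, 3))
```

The limits agree with the hand calculation to within rounding of the intermediate values.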
Example 5.8 The flow discharge of Sungai Kerian (measured in m³/s) was obtained at random. 50 readings were collected and the
mean flow discharge was found to be 3.512 m³/s with a standard deviation of 0.5 m³/s. Construct a 99% confidence
interval for the true mean flow discharge of Sungai Kerian.
Solution
Let $\mu$ be the true mean flow discharge of Sungai Kerian.
Since the sample size is large ($n = 50$), the following confidence interval is used.
Thus the 99% confidence interval for the true population mean is given by
$$\bar{X} - z_{\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{S}{\sqrt{n}}$$
$$\bar{X} - z_{0.005} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{0.005} \frac{S}{\sqrt{n}}$$
$$3.512 - 2.57 \frac{0.5}{\sqrt{50}} \le \mu \le 3.512 + 2.57 \frac{0.5}{\sqrt{50}}$$
$$3.512 - 2.57(0.071) \le \mu \le 3.512 + 2.57(0.071)$$
$$3.512 - 0.182 \le \mu \le 3.512 + 0.182$$
$$3.330 \le \mu \le 3.694$$
Calculations
$\bar{X} = 3.512$, $n = 50$, $S = 0.5$. From Table 1, $z_{0.005} = 2.57$
5.12 Confidence intervals for the mean based on the t distribution
The $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ is given by
$$\bar{X} - t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}}$$
where
(a) $\bar{X}$ is the sample mean.
(b) $S$ is the sample standard deviation.
(c) $t_{\alpha/2,\,n-1}$ is the $100(1 - \alpha/2)$th quantile of the t distribution with $n - 1$ degrees of freedom. The critical
values of the t distribution are given in Table 2.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a
normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ is small.
Example 5.9
The moisture content (measured in percentage) of clay in Batu Ferringhi was investigated. The following data was
obtained from a random sample.
1.81 2.00 2.74 3.56 2.13
4.64 3.64 4.62 4.47 3.12
Construct a 98% confidence interval for the true moisture content for clay by assuming that the sample is from a
normal distribution.
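Solution
Let $\mu$ be the true mean moisture content (in percentage) for clay.
Since the sample size is small ($n = 10$), the following confidence interval is used.
Thus the 98% confidence interval for the true population mean is given by
$$\bar{X} - t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}}$$
$$\bar{X} - t_{0.01,\,9} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{0.01,\,9} \frac{S}{\sqrt{n}}$$
$$3.273 - 2.821 \frac{1.091}{\sqrt{10}} \le \mu \le 3.273 + 2.821 \frac{1.091}{\sqrt{10}}$$
$$3.273 - 2.821(0.345) \le \mu \le 3.273 + 2.821(0.345)$$
$$3.273 - 0.973 \le \mu \le 3.273 + 0.973$$
$$2.300 \le \mu \le 4.246$$
Calculations
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{10}}{10} = \frac{1.81 + 4.64 + \cdots + 2.13 + 3.12}{10} = 3.273$$
$$S^2 = \frac{1}{9} \sum_{i=1}^{10} (X_i - \bar{X})^2 = \frac{(1.81 - 3.273)^2 + (4.64 - 3.273)^2 + \cdots + (3.12 - 3.273)^2}{9} = 1.190, \quad S = \sqrt{1.190} = 1.091$$
From Table 2, $t_{0.01,\,9} = 2.821$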
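Table 2: Critical values $t_{\alpha,\nu}$ for the t distribution with $\nu$ degrees of freedom
Columns give the upper-tail probability $\alpha$; rows give the degrees of freedom $\nu$.
ν 0.40 0.30 0.20 0.15 0.10 0.05 0.025 0.02 0.015 0.01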
1 0.325 0.727 1.376 1.963 3.078 6.314 12.706 15.895 21.205 31.821
2 0.289 0.617 1.061 1.386 1.886 2.920 4.303 4.849 5.643 6.965
3 0.277 0.584 0.978 1.250 1.638 2.353 3.182 3.482 3.896 4.541
4 0.271 0.569 0.941 1.190 1.533 2.132 2.776 2.999 3.298 3.747
5 0.267 0.559 0.920 1.156 1.476 2.015 2.571 2.757 3.003 3.365
6 0.265 0.553 0.906 1.134 1.440 1.943 2.447 2.612 2.829 3.143
7 0.263 0.549 0.896 1.119 1.415 1.895 2.365 2.517 2.715 2.998
8 0.262 0.546 0.889 1.108 1.397 1.860 2.306 2.449 2.634 2.897
9 0.261 0.543 0.883 1.100 1.383 1.833 2.262 2.398 2.574 2.821
10 0.260 0.542 0.879 1.093 1.372 1.813 2.228 2.359 2.528 2.764
11 0.260 0.540 0.876 1.088 1.363 1.796 2.201 2.328 2.491 2.718
12 0.259 0.539 0.873 1.083 1.356 1.782 2.179 2.303 2.461 2.681
13 0.259 0.538 0.870 1.080 1.350 1.771 2.160 2.282 2.436 2.650
14 0.258 0.537 0.868 1.076 1.345 1.761 2.145 2.264 2.415 2.625
15 0.258 0.536 0.866 1.074 1.341 1.753 2.131 2.249 2.397 2.603
16 0.258 0.535 0.865 1.071 1.337 1.746 2.120 2.235 2.382 2.584
17 0.257 0.534 0.863 1.069 1.333 1.740 2.110 2.224 2.368 2.567
18 0.257 0.534 0.862 1.067 1.330 1.734 2.101 2.214 2.356 2.552
19 0.257 0.533 0.861 1.066 1.328 1.729 2.093 2.205 2.346 2.540
20 0.257 0.533 0.860 1.064 1.325 1.725 2.086 2.197 2.336 2.528
21 0.257 0.532 0.859 1.063 1.323 1.721 2.080 2.189 2.328 2.518
22 0.256 0.532 0.858 1.061 1.321 1.717 2.074 2.183 2.320 2.508
23 0.256 0.532 0.858 1.060 1.320 1.714 2.069 2.177 2.313 2.500
24 0.256 0.531 0.857 1.059 1.318 1.711 2.064 2.172 2.307 2.492
25 0.256 0.531 0.856 1.058 1.316 1.708 2.060 2.167 2.301 2.485
26 0.256 0.531 0.856 1.058 1.315 1.706 2.056 2.162 2.296 2.479
27 0.256 0.531 0.855 1.057 1.314 1.703 2.052 2.158 2.291 2.473
28 0.256 0.530 0.855 1.056 1.313 1.701 2.048 2.154 2.286 2.467
29 0.256 0.530 0.854 1.055 1.311 1.699 2.045 2.150 2.282 2.462
30 0.256 0.530 0.854 1.055 1.310 1.697 2.042 2.147 2.278 2.457
40 0.255 0.529 0.851 1.050 1.303 1.684 2.021 2.123 2.250 2.423
60 0.254 0.527 0.848 1.046 1.296 1.671 2.000 2.099 2.223 2.390
120 0.254 0.526 0.845 1.041 1.289 1.658 1.980 2.076 2.196 2.358
∞ 0.253 0.524 0.842 1.036 1.282 1.645 1.960 2.054 2.170 2.326
5.13 Tests of hypotheses for the mean based on the normal distribution
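(1) Population variance is known
One tail tests: $H_0: \mu = d_0$ against $H_1: \mu > d_0$ (or $H_1: \mu < d_0$)
Two tail tests: $H_0: \mu = d_0$ against $H_1: \mu \ne d_0$
Test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{\sigma^2 / n}}$$
Rejection region
One tail tests: Reject $H_0$ if $Z > z_{\alpha}$ (or $Z < -z_{\alpha}$)
Two tail tests: Reject $H_0$ if $|Z| > z_{\alpha/2}$
Notes:
(a) $d_0$ is a constant.
(b) $\bar{X}$ is the sample mean.
(c) $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th quantile of the standard normal distribution which is given in Table 1.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with
mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ can either be small or large.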
(2) Population variance is unknown
One tail tests: $H_0: \mu = d_0$ against $H_1: \mu > d_0$ (or $H_1: \mu < d_0$)
Two tail tests: $H_0: \mu = d_0$ against $H_1: \mu \ne d_0$
Test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}}$$
Rejection region
One tail tests: Reject $H_0$ if $Z > z_{\alpha}$ (or $Z < -z_{\alpha}$)
Two tail tests: Reject $H_0$ if $|Z| > z_{\alpha/2}$
Notes:
(a) $d_0$ is a constant.
(b) $\bar{X}$ is the sample mean and $S$ is the sample standard deviation.
(c) $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th quantile of the standard normal distribution which is given in Table 1.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with
mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ is large.
Example 5.10
A research was done to determine the wind speed distribution in Penang. The following monthly wind speed data
(measured in m/s) was obtained.
15.42 12.85 10.28 13.36 15.42 20.56 16.28 25.70 15.42 9.25
10.28 9.25 8.22 11.31 14.91 16.45 13.36 15.42 13.36 12.85
11.31 11.31 12.85 11.82 14.39 15.42 16.96 21.59 15.42 15.42
12.85 12.85 11.82 14.39 12.34 24.67 12.85 20.05 27.24 22.62
Can you conclude that the mean wind speed in Penang is less than 12 m/s? Use $\alpha = 0.10$.
Solution
We will follow the six step procedure to solve this problem.
Step 1: Define the population parameter of interest.
Let $\mu$ be the true mean wind speed (in m/s) in Penang.
Since the sample size is large ($n = 40$), the following hypothesis test is used.
Step 2: Define the null and alternative hypotheses
$H_0: \mu = 12$
$H_1: \mu < 12$
Step 3: Calculate the test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}} = \frac{14.953 - 12}{\sqrt{20.149 / 40}} = \frac{2.953}{0.710} = 4.159$$
Calculations
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{40}}{40} = \frac{15.42 + 10.28 + \cdots + 15.42 + 22.62}{40} = 14.953$$
$$S^2 = \frac{1}{39} \sum_{i=1}^{40} (X_i - \bar{X})^2 = \frac{(15.42 - 14.953)^2 + (10.28 - 14.953)^2 + \cdots + (22.62 - 14.953)^2}{39} = 20.149, \quad S = 4.489$$
Step 4: Determine the rejection region
Reject $H_0$ if $Z < -z_{\alpha} = -z_{0.10} = -1.28$ (From Table 1).
Step 5: Result
Since $Z = 4.159$ is not less than $-1.28$, the null hypothesis cannot be rejected.
Step 6: Conclusion
At $\alpha = 0.10$, there is insufficient evidence to show that the true mean wind speed in Penang is less
than 12 m/s.
Example 5.11
The flow discharge of Sungai Kerian (measured in m³/s) was obtained at random. Fifty readings were collected and
the mean flow discharge was found to be 3.512 m³/s with a standard deviation of 0.5 m³/s. Show that the true mean
flow discharge at Sungai Kerian is not equal to 4 m³/s. Use $\alpha = 0.05$.
Solution
We will follow the six step procedure to solve this problem.
Step 1: Define the population parameter of interest.
Let $\mu$ be the true mean flow discharge of Sungai Kerian.
Since the sample size is large ($n = 50$), the following hypothesis test is used.
Step 2: Define the null and alternative hypotheses
$H_0: \mu = 4$
$H_1: \mu \ne 4$
Step 3: Calculate the test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}}$$
where $\bar{X} = 3.512$, $S^2 = 0.25$, $n = 50$
$$Z = \frac{3.512 - 4}{\sqrt{0.25 / 50}} = \frac{-0.488}{0.071} = -6.873$$
Step 4: Determine the rejection region
Reject $H_0$ if $Z < -z_{\alpha/2} = -z_{0.025} = -1.96$
or $Z > z_{\alpha/2} = z_{0.025} = 1.96$
(From Table 1)
Step 5: Result
Since $Z = -6.873 < -1.96$, the null hypothesis is rejected.
Step 6: Conclusion
At $\alpha = 0.05$, there is sufficient evidence to show that the true mean flow discharge of Sungai Kerian is not
equal to 4 m³/s.
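The large-sample test procedure of Section 5.13 can be written as a short function working from summary statistics. The sketch below is illustrative: the function name `z_test` and its `alternative` argument are assumptions, and it uses exact normal quantiles from the standard library rather than Table 1. The statistics it prints differ slightly from the hand calculations, which round the standard error before dividing.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, s, n, d0, alpha, alternative):
    """Large-sample test of H0: mu = d0.

    alternative is 'less', 'greater' or 'two-sided' (illustrative labels).
    Returns the test statistic and whether H0 is rejected at level alpha.
    """
    z = (xbar - d0) / (s / sqrt(n))
    std_normal = NormalDist()
    if alternative == "less":
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif alternative == "greater":
        reject = z > std_normal.inv_cdf(1 - alpha)
    else:
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    return z, reject


# Example 5.10: wind speed, H1: mu < 12, alpha = 0.10.
z_wind, reject_wind = z_test(14.953, sqrt(20.149), 40, 12, 0.10, "less")
# Example 5.11: flow discharge, H1: mu != 4, alpha = 0.05.
z_flow, reject_flow = z_test(3.512, 0.5, 50, 4, 0.05, "two-sided")
print(round(z_wind, 2), reject_wind)
print(round(z_flow, 2), reject_flow)
```

The decisions match the examples: the null hypothesis is not rejected for the wind speed data and is rejected for the flow discharge data.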
5.14 Test of hypothesis for the mean based on the t distribution
One tail tests: $H_0: \mu = d_0$ against $H_1: \mu > d_0$ (or $H_1: \mu < d_0$)
Two tail tests: $H_0: \mu = d_0$ against $H_1: \mu \ne d_0$
Test statistic
$$T = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}}$$
Rejection region
One tail tests: Reject $H_0$ if $T > t_{\alpha,\,n-1}$ (or $T < -t_{\alpha,\,n-1}$)
Two tail tests: Reject $H_0$ if $|T| > t_{\alpha/2,\,n-1}$
Notes:
(a) $d_0$ is a constant.
(b) $\bar{X}$ is the sample mean.
(c) $S$ is the sample standard deviation.
(d) $t_{\alpha/2,\,n-1}$ is the $100(1 - \alpha/2)$th quantile of the t distribution with $n - 1$ degrees of freedom. The critical values
of the t distribution are given in Table 2.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with
mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ is small.
Example 5.12
The moisture content (measured in percentage) of clay in Batu Ferringhi was investigated. The following data was
obtained from a random sample.
1.81 2.00 2.74 3.56 2.13
4.64 3.64 4.62 4.47 3.12
Is the mean moisture content greater than 3.0%? Use $\alpha = 0.05$.
Solution
We will follow the six step procedure to solve this problem.
Step 1: Define the population parameter of interest.
Let $\mu$ be the true mean moisture content (in percentage) for clay.
Since the sample size is small ($n = 10$), the following hypothesis test is used.
Step 2: Define the null and alternative hypotheses
$H_0: \mu = 3.0$
$H_1: \mu > 3.0$
Step 3: Calculate the test statistic
$$T = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}} = \frac{3.273 - 3.0}{\sqrt{1.190 / 10}} = \frac{0.273}{0.345} = 0.791$$
Calculations
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{10}}{10} = \frac{1.81 + 4.64 + \cdots + 2.13 + 3.12}{10} = 3.273$$
$$S^2 = \frac{1}{9} \sum_{i=1}^{10} (X_i - \bar{X})^2 = \frac{(1.81 - 3.273)^2 + (4.64 - 3.273)^2 + \cdots + (3.12 - 3.273)^2}{9} = 1.190, \quad S = 1.091$$
Step 4: Determine the rejection region
Reject $H_0$ if $T > t_{\alpha,\,n-1} = t_{0.05,\,9} = 1.833$ (From Table 2).
Step 5: Result
Since $T = 0.791$ is not greater than $1.833$, the null hypothesis cannot be rejected.
Step 6: Conclusion
At $\alpha = 0.05$, there is insufficient evidence to show that the true mean moisture content (in percentage) for clay is
greater than 3%.
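The t statistic for the moisture content data can be computed directly from the ten observations, using the standard error $S/\sqrt{n}$ with $n = 10$. The sketch below is illustrative. The critical value is hard-coded from Table 2 because the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(data, d0):
    """T = (xbar - d0) / (S / sqrt(n)) for a one-sample t test."""
    n = len(data)
    return (mean(data) - d0) / (stdev(data) / sqrt(n))


# Moisture content (%) of clay from Example 5.12.
moisture = [1.81, 2.00, 2.74, 3.56, 2.13, 4.64, 3.64, 4.62, 4.47, 3.12]

t_value = t_statistic(moisture, 3.0)
t_critical = 1.833  # t_{0.05, 9} read from Table 2
reject_h0 = t_value > t_critical  # upper-tail test, H1: mu > 3.0
print(round(t_value, 3), reject_h0)
```

The statistic falls well below the critical value, so the null hypothesis is not rejected.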
5.15 Sample correlation
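Correlation measures the linear relationship between two variables, $X$ and $Y$.
The sample correlation coefficient of $n$ pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, denoted by
$r$, is given by
$$r = \hat{\rho} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sqrt{\left[\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right]\left[\sum_{i=1}^{n} Y_i^2 - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n}\right]}}$$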
The strength of the linear relationship is determined by the following:
If $0.80 \le |r| \le 1.00$ then the relationship is very strong.
If $0.60 \le |r| \le 0.79$ then the relationship is strong.
If $0.40 \le |r| \le 0.59$ then the relationship is moderate.
If $0.20 \le |r| \le 0.39$ then the relationship is weak.
If $0.00 \le |r| \le 0.19$ then the relationship is very weak.
Example 5.13
The cost, Y of a manufacturing product usually depends on the lot size, X . The following data on the cost of the
manufacturing product and its lot size is given below:
Y 30 70 140 270 530 1000 2000 3000
X 1 5 10 25 50 100 250 500
Find the value of the correlation coefficient for the above data.
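Solution
The correlation coefficient between $Y$ and $X$ is given by
$$r = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sqrt{\left[\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right]\left[\sum_{i=1}^{n} Y_i^2 - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n}\right]}}$$
$$= \frac{2135030 - \dfrac{(941)(7040)}{8}}{\sqrt{\left[325751 - \dfrac{941^2}{8}\right]\left[14379200 - \dfrac{7040^2}{8}\right]}}$$
$$= \frac{1306950}{(463.75)(2860.8)} = \frac{1306950}{1326696} = 0.985$$
Therefore, there is a very strong linear relationship between cost and lot size.
Calculations
$n = 8$, $\sum_{i=1}^{8} X_i = 941$, $\sum_{i=1}^{8} Y_i = 7040$, $\sum_{i=1}^{8} X_i Y_i = 2135030$, $\sum_{i=1}^{8} X_i^2 = 325751.00$,
$\sum_{i=1}^{8} Y_i^2 = 14379200$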
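The computational form of the sample correlation coefficient translates directly into code. The sketch below applies it to the cost and lot size data of Example 5.13; the function name `sample_correlation` is illustrative.

```python
from math import sqrt


def sample_correlation(x, y):
    """Sample correlation r using the sums-of-products formula."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    return sxy / sqrt(sxx * syy)


# Example 5.13: lot size X and manufacturing cost Y.
lot_size = [1, 5, 10, 25, 50, 100, 250, 500]
cost = [30, 70, 140, 270, 530, 1000, 2000, 3000]

r = sample_correlation(lot_size, cost)
print(round(r, 3))
```

A value this close to 1 falls in the "very strong" band of the classification above.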
5.16 Simple linear regression
Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be $n$ pairs of random variables. Then the simple linear regression
model is given by
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
where
$Y_i$ is the dependent or response variable
$X_i$ is the independent or regressor or explanatory or predictor variable
$\beta_0$ is the intercept of the regression model
$\beta_1$ is the slope of the regression model
$\varepsilon_i$ is the random error term
Assumptions
The assumptions of the random error term are:
(a) $E(\varepsilon_i) = 0$
(b) $V(\varepsilon_i) = \sigma^2$ (a constant)
(c) The probability distribution is normal
(d) The random error terms are independent
Method of least squares
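The method of least squares can be used to estimate the values of the intercept ($\beta_0$) and slope ($\beta_1$) parameters.
This method minimizes the sum of squares of the random error term, that is
$$\min L = \min \sum_{i=1}^{n} \varepsilon_i^2 = \min \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$
Hence,
$$\frac{\partial L}{\partial \beta_0} = -2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$$
$$\frac{\partial L}{\partial \beta_1} = -2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) X_i = 0$$
Simplifying yields,
$$n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i$$
$$\hat{\beta}_0 \sum_{i=1}^{n} X_i + \hat{\beta}_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} Y_i X_i$$
Solving the two equations yields,
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \quad \text{and} \quad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} Y_i X_i - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}$$
where $\bar{Y} = \dfrac{\sum_{i=1}^{n} Y_i}{n}$ and $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$.
Thus the fitted or estimated regression model is
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i, \quad i = 1, 2, \ldots, n$$
$e_i = Y_i - \hat{Y}_i$ is called the residual.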
Example 5.14
The yield of a chemical process (in percentage) is hypothesized to be linearly related with the amount of catalyst (in
grams). Let Y denote the yield of the chemical process and X be the amount of catalyst. The data is given below.
X 0.9 1.4 1.6 1.7 1.8 2.0 2.1
Y 60.54 63.86 63.76 60.15 66.66 71.66 70.81
Fit a simple linear regression model.
Solution
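The following simple linear regression model is fitted
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, 7$$
where
$Y_i$ is the yield of the chemical process
$X_i$ is the amount of catalyst
By using the least squares method, the estimates for $\beta_0$ and $\beta_1$ are
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} Y_i X_i - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{760.170 - 751.5086}{19.87 - 18.8929} = \frac{8.6614}{0.9771} = 8.8644$$
And
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 65.3486 - 8.8644(1.643) = 65.3486 - 14.5642 = 50.7844$$
Therefore the fitted simple linear regression model is $\hat{Y}_i = 50.784 + 8.864 X_i$ for $i = 1, 2, \ldots, 7$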
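The least squares formulas for the slope and intercept can be applied to the catalyst data of Example 5.14 in a few lines. The sketch below is illustrative and the function name `fit_simple_linear_regression` is an assumption. It also computes the residuals $e_i = Y_i - \hat{Y}_i$, which sum to zero for a least squares fit with an intercept.

```python
def fit_simple_linear_regression(x, y):
    """Least squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    slope = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x**2 / n)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


# Example 5.14: amount of catalyst (g) and process yield (%).
catalyst = [0.9, 1.4, 1.6, 1.7, 1.8, 2.0, 2.1]
yield_pct = [60.54, 63.86, 63.76, 60.15, 66.66, 71.66, 70.81]

b0, b1 = fit_simple_linear_regression(catalyst, yield_pct)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(catalyst, yield_pct)]
print(round(b0, 2), round(b1, 2))
```

The estimates agree with the hand calculation up to rounding of the sample mean of $X$.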
Example 5.15
A study was conducted to determine the relationship between bridge pier scour depth, $D$, and discharge intensity,
$q$. A regression model of the form $D = \beta_0 q^{\beta_1}$ was proposed. The following data was obtained:
D q D q D q D q
35.67 52.51 12.62 11.99 20.73 25.56 11.48 13.22
31.71 52.04 9.76 10.33 11.24 7.39 8.71 11.21
17.84 22.58 8.54 8.36 8.80 6.71 4.94 2.61
14.63 8.51 13.87 8.24 12.44 13.28 10.07 13.21
12.71 11.15 11.60 6.29 9.20 6.49 5.50 1.62
13.72 13.75 19.51 22.03 9.76 6.42 7.13 7.72
12.88 14.31 11.89 11.15 11.42 7.78 6.85 4.68
19.35 9.20 13.72 18.59 11.22 11.85 4.00 3.40
11.92 8.60 11.89 13.66 10.47 9.78 4.07 4.00
14.98 11.43 12.80 15.99 9.48 7.48 4.08 3.18
Determine the simple linear regression model for this problem.
Solution
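The proposed model is given by
$$D = \beta_0 q^{\beta_1}$$
The above model can be transformed into a simple linear regression model by taking natural logarithms as follows:
$$\ln D = \ln\left(\beta_0 q^{\beta_1}\right)$$
$$\ln D = \ln \beta_0 + \ln q^{\beta_1}$$
$$\ln D = \ln \beta_0 + \beta_1 \ln q$$
Letting $Y_i = \ln D$, $\beta_0^* = \ln \beta_0$ and $X_i = \ln q$, we obtain the following linear regression model
$$Y_i = \beta_0^* + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, 40$$
The following data gives the new values for $Y_i = \ln D$ and $X_i = \ln q$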
$Y_i$ $X_i$ $Y_i$ $X_i$ $Y_i$ $X_i$ $Y_i$ $X_i$
3.57 3.96 2.54 2.48 3.03 3.24 2.44 2.58
3.46 3.95 2.28 2.34 2.42 2.00 2.16 2.42
2.88 3.12 2.14 2.12 2.17 1.90 1.60 0.96
2.68 2.14 2.63 2.11 2.52 2.59 2.31 2.58
2.54 2.41 2.45 1.84 2.22 1.87 1.70 0.48
2.62 2.62 2.97 3.09 2.28 1.86 1.96 2.04
2.56 2.66 2.48 2.41 2.44 2.05 1.92 1.54
2.96 2.22 2.62 2.92 2.42 2.47 1.39 1.22
2.48 2.15 2.48 2.61 2.35 2.28 1.40 1.39
2.71 2.44 2.55 2.77 2.25 2.01 1.41 1.16
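By using the least squares method, the estimates for $\beta_0^*$ and $\beta_1$ are
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} Y_i X_i - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{230.09 - 218.4492}{226.25 - 207.1615} = \frac{11.6408}{19.0885} = 0.6098$$
And
$$\hat{\beta}_0^* = \bar{Y} - \hat{\beta}_1 \bar{X} = 2.3997 - 0.6098(2.2757) = 2.3997 - 1.3877 = 1.012$$
Here $\hat{\beta}_0^* = \ln \hat{\beta}_0$
So $\hat{\beta}_0 = e^{\hat{\beta}_0^*} = e^{1.012} = 2.7511$
Therefore the fitted model is $\hat{D} = \hat{\beta}_0 q^{\hat{\beta}_1} = 2.7511\, q^{0.6098}$
for $i = 1, 2, \ldots, 40$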
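Calculations
$$\bar{Y} = \frac{\sum_{i=1}^{40} Y_i}{40} = \frac{3.57 + 3.46 + \cdots + 1.40 + 1.41}{40} = \frac{95.99}{40} = 2.3997$$
$$\bar{X} = \frac{\sum_{i=1}^{40} X_i}{40} = \frac{3.96 + 3.95 + \cdots + 1.39 + 1.16}{40} = \frac{91.03}{40} = 2.2757$$
$$\sum_{i=1}^{40} Y_i X_i = (3.57)(3.96) + (3.46)(3.95) + \cdots + (1.40)(1.39) + (1.41)(1.16) = 230.09$$
$$\frac{\left(\sum_{i=1}^{40} Y_i\right)\left(\sum_{i=1}^{40} X_i\right)}{n} = \frac{(95.99)(91.03)}{40} = \frac{8737.9697}{40} = 218.4492$$
$$\sum_{i=1}^{40} X_i^2 = 3.96^2 + 3.95^2 + 3.12^2 + \cdots + 1.22^2 + 1.39^2 + 1.16^2 = 15.69 + 15.62 + 9.72 + \cdots + 1.50 + 1.92 + 1.34 = 226.25$$
$$\frac{\left(\sum_{i=1}^{40} X_i\right)^2}{n} = \frac{91.03^2}{40} = \frac{8286.4609}{40} = 207.1615$$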
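The power-law fit of Example 5.15 can be reproduced from the summary sums listed in the calculations. The sketch below is illustrative: it works from those sums (not from the forty raw observations), estimates the slope and the log-intercept, and then back-transforms the intercept with the exponential. The function name `predict_scour_depth` is a hypothetical helper.

```python
from math import exp

# Summary sums for Y = ln D and X = ln q, as given in the calculations.
n = 40
sum_y = 95.99
sum_x = 91.03
sum_xy = 230.09
sum_xx = 226.25

# Least squares estimates on the log scale.
slope = (sum_xy - sum_y * sum_x / n) / (sum_xx - sum_x**2 / n)   # beta_1 hat
log_intercept = sum_y / n - slope * sum_x / n                    # ln(beta_0) hat
beta0 = exp(log_intercept)                                       # back-transform


def predict_scour_depth(q):
    """Fitted power-law model D = beta0 * q**slope (illustrative helper)."""
    return beta0 * q**slope


print(round(slope, 4), round(log_intercept, 3), round(beta0, 4))
```

Because the model is fitted on the log scale, the error term is multiplicative on the original scale of $D$.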