probability

Probability

Upload: zion

Post on 31-Jan-2016


DESCRIPTION

Probability. Principles of probability calculations. Probability values range from 0 to 1. Adding all probabilities of the sample yields 1. The probability that an event A will not occur is 1 minus the probability of A. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Probability

Probability

Page 2: Probability

Probability values range from 0 to 1.

The probabilities of all outcomes in the sample space add up to 1.

The probability that an event A will not occur is 1 minus the probability of A.

If two events are mutually exclusive, the probability that one or the other event occurs is the sum of their individual probabilities.

Principles of probability calculations
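The principles can be checked with a small sketch. It assumes a fair six-sided die as the sample space; the variable names are illustrative only. Note that the addition rule is applied to mutually exclusive events (a single throw cannot be both 5 and 6).

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so every outcome has probability 1/6.
sample_space = [1, 2, 3, 4, 5, 6]
p = {outcome: Fraction(1, len(sample_space)) for outcome in sample_space}

# All probabilities of the sample space add up to 1.
total = sum(p.values())

# Complement rule: P(not A) = 1 - P(A), here A = "throw a 6".
p_not_six = 1 - p[6]

# Addition rule for mutually exclusive events: P(5 or 6) = P(5) + P(6).
p_five_or_six = p[5] + p[6]

print(total, p_not_six, p_five_or_six)
```

Exact fractions are used instead of floats so that the sum is exactly 1 rather than a rounded approximation.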

Page 3: Probability

Simple probability

P(A) = 1/6 ≈ 0.1667

Sample space: 1,2,3,4,5,6

Page 4: Probability

Joint probability

If A and B are independent: P(A,B) = P(A) × P(B)

P(5,6) = 0.1667 × 0.1667 ≈ 0.0278
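The product rule can be cross-checked by enumerating the sample space. This is a sketch that assumes P(5,6) means "a 5 on the first throw and a 6 on the second throw" of a fair die.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two throws of a fair die.
outcomes = list(product(range(1, 7), repeat=2))

# Event: first throw shows 5 and second throw shows 6.
favourable = [o for o in outcomes if o == (5, 6)]
p_joint = Fraction(len(favourable), len(outcomes))

# Product rule for independent events: P(A, B) = P(A) * P(B).
p_product = Fraction(1, 6) * Fraction(1, 6)

print(p_joint, round(float(p_joint), 4))
```

Counting and multiplying give the same value, 1/36, which is what independence of the two throws implies.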

Page 5: Probability

Joint probability

(1) keep the dogs on the beach

VP → V NP PP

VP → V [NP PP]

Page 6: Probability

keep the dogs on the beach, with the PP attached to the VP: [VP V NP PP]

VP → V NP XP [.15]

keep: V NP XP [.81]

.15 × .81 = .12

Conditional probability

Page 7: Probability

keep the dogs on the beach, with the PP attached to the NP: [VP V [NP NP PP]]

VP → V NP XP [.15]

NP → NP PP [.14]

keep: V NP [.19]

.19 × .39 × .14 = .01

Conditional probability
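The two parses of "keep the dogs on the beach" can be compared by multiplying the probabilities given in the two calculations. The sketch below uses only the factors that appear in those calculations; the labels "VP attachment" and "NP attachment" are my reading of the two trees, not terms from the slides.

```python
import math

# Factors as they appear in the two calculations on the slides.
vp_attachment = [0.15, 0.81]        # PP attached to the verb phrase
np_attachment = [0.19, 0.39, 0.14]  # PP attached to the noun phrase

p_vp = math.prod(vp_attachment)
p_np = math.prod(np_attachment)

# The parse with the higher overall probability is the preferred reading.
preferred = "VP attachment" if p_vp > p_np else "NP attachment"
print(round(p_vp, 2), round(p_np, 2), preferred)
```

With these numbers the VP attachment is roughly twelve times more probable than the NP attachment.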

Page 8: Probability

Conditional probability

Page 9: Probability

Conditional probability

In a corpus including 12,000 nouns and 3,500 adjectives, 2,000 adjectives precede a noun. What is the likelihood that a noun occurs after an adjective?

$P(\mathrm{ADJ} \mid \mathrm{N}) = \frac{2000}{12000} \approx 0.1667$

Page 10: Probability

Conditional probability

What is the likelihood that an adjective precedes a noun?

$P(\mathrm{N} \mid \mathrm{ADJ}) = \frac{2000}{3500} \approx 0.5714$
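Both conditional probabilities come from the same count of adjective-noun sequences, divided by a different total. A minimal sketch with the corpus counts from the example (variable names are illustrative):

```python
# Corpus counts taken from the example.
nouns = 12_000
adjectives = 3_500
adjective_noun_pairs = 2_000  # adjectives that directly precede a noun

# P(ADJ | N): share of nouns that are preceded by an adjective.
p_adj_given_n = adjective_noun_pairs / nouns

# P(N | ADJ): share of adjectives that are followed by a noun.
p_n_given_adj = adjective_noun_pairs / adjectives

print(round(p_adj_given_n, 4), round(p_n_given_adj, 4))
```

The two values differ because the conditioning event differs: only one in six nouns follows an adjective, but more than half of the adjectives precede a noun.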

Page 11: Probability

Probability distribution

Page 12: Probability

Discrete probability distribution

Continuous probability distribution

Types of probability distributions

Page 13: Probability

Binomial distribution

Page 14: Probability

Bernoulli trial:

two possible outcomes on each trial

the outcomes are independent of each other

the probability of each outcome is constant across trials

Binomial distribution

Page 15: Probability

One flip: T H

Two flips: HH HT TH TT

Binomial distribution

Page 16: Probability

0 heads = TT

1 head = HT + TH

2 heads = HH

Binomial distribution

Page 17: Probability

Sample space    Random variable (number of heads)
HH              2
HT              1
TH              1
TT              0

Binomial distribution

Page 18: Probability

Cumulative outcome    Probability
0 = 1                 0.25
1 = 2                 0.50
2 = 1                 0.25

$\sum P(x) = 1$

Binomial distribution

Page 19: Probability

One flip: T H

Two flips: HH HT TH TT

Three flips: HHH HHT HTH HTT THH THT TTH TTT

Page 20: Probability

Sample space: HHH, HHT, HTH, THH, HTT, THT, TTH, TTT

Random variable (number of heads): 0, 1, 2, 3

0 heads: 1 / 8 = 0.125 (TTT)
1 head: 3 / 8 = 0.375 (HTT, THT, TTH)
2 heads: 3 / 8 = 0.375 (HHT, HTH, THH)
3 heads: 1 / 8 = 0.125 (HHH)
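The same distribution can be obtained by enumerating all coin-flip sequences. The sketch below assumes a fair coin and also compares the counts with the standard binomial formula, which the slides do not spell out; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

def heads_distribution(n_flips):
    """Enumerate all sequences of n fair coin flips and return P(number of heads)."""
    sequences = list(product("HT", repeat=n_flips))
    counts = Counter(seq.count("H") for seq in sequences)
    return {k: Fraction(counts[k], len(sequences)) for k in range(n_flips + 1)}

def binomial_pmf(k, n, p=Fraction(1, 2)):
    """Standard binomial formula: C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

three_flips = heads_distribution(3)
for k, prob in three_flips.items():
    print(k, prob, float(prob))
```

Calling `heads_distribution(2)` reproduces the 0.25 / 0.50 / 0.25 distribution for two flips.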

Page 21: Probability

Binomial distribution

Page 22: Probability

Poisson distribution

Page 23: Probability

Normal distribution

Page 24: Probability

The center of the curve represents the mean, median, and mode.

The curve is symmetrical around the mean.

The tails approach the x-axis but never touch it.

The curve is bell-shaped.

The total area under the curve is equal to 1 (by definition).

Normal distribution

Page 25: Probability

Normal distribution

Page 26: Probability

Standard normal distribution

1.96
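The value 1.96 marks the boundaries of the central 95% of the standard normal distribution. A short sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later) illustrates this:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between -1.96 and +1.96 standard deviations.
central_area = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Inverse lookup: the z-value that leaves 2.5% in the upper tail.
z_975 = standard_normal.inv_cdf(0.975)

print(round(central_area, 4), round(z_975, 2))
```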

Page 27: Probability

$z = \frac{x_1 - \bar{x}}{SD}$

z-scores

Page 28: Probability

z-scores

Two candidates took two different language tests. Candidate A scored 121 points, candidate B scored 177 points. In the first test (taken by candidate A) the mean was 92 and the standard deviation was 14; in the second test (taken by candidate B) the mean was 143 and the standard deviation was 21. Which of the two candidates performed better (compared with all other candidates)?

ZA = (121 – 92) / 14 = 2.07

ZB = (177 – 143) / 21 = 1.62

Candidate A performed better, because ZA is larger than ZB.
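The comparison of the two candidates can be written as a small function. This is a sketch; `z_score` is a hypothetical helper name, and the numbers are the ones from the exercise.

```python
def z_score(value, mean, sd):
    """Distance of a value from the mean, measured in standard deviations."""
    return (value - mean) / sd

z_a = z_score(121, mean=92, sd=14)
z_b = z_score(177, mean=143, sd=21)

# The candidate with the larger z-score did better relative to the other test takers.
better = "A" if z_a > z_b else "B"
print(round(z_a, 2), round(z_b, 2), better)
```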

Page 29: Probability

Central limit theorem

Page 30: Probability

Central limit theorem

6, 2, 5, 6, 2, 3, 1, 6, 1, 1, 4, 6, 6, 2, 2, 1, 1, 5, 1, 3

Mean of the 20 values: 64 / 20 = 3.2

Page 35: Probability

            X1   X2   X3   X4   M (mean)
Sample 1    6    2    5    6    4.75
Sample 2    2    3    1    6    3
Sample 3    1    1    4    6    3
Sample 4    6    2    2    1    2.75
Sample 5    1    5    1    3    2.5

Central limit theorem

Page 36: Probability

$\frac{4.75 + 3.0 + 3.0 + 2.75 + 2.5}{5} = 3.2$

Mean of the sample means
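The five sample means and their mean can be recomputed from the 20 values listed earlier. The sketch assumes the samples are simply consecutive groups of four values, which matches the table.

```python
from statistics import mean

values = [6, 2, 5, 6, 2, 3, 1, 6, 1, 1, 4, 6, 6, 2, 2, 1, 1, 5, 1, 3]

# Split the 20 values into five consecutive samples of four observations.
samples = [values[i:i + 4] for i in range(0, len(values), 4)]
sample_means = [mean(s) for s in samples]
mean_of_means = mean(sample_means)

print(sample_means, mean_of_means)
```

Because all samples have the same size, the mean of the sample means equals the mean of all 20 values.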

Page 37: Probability

The sample means are approximately normally distributed (even if the phenomenon in the parent population is not normally distributed).

[Figure: histogram of the sample means. x-axis: case (2.50 to 5.00); y-axis: frequency (0 to 12). Mean = 3.352, Std. Dev. = 0.44802, N = 25]

Central limit theorem
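The central limit theorem can be illustrated with a simulation. The setup below is hypothetical and not taken from the slides: 2,000 samples of 4 throws of a fair die, whose parent distribution is flat rather than normal. The seed is fixed only to make the run repeatable.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean(n_observations):
    """Mean of n throws of a fair die (a flat, non-normal parent distribution)."""
    return mean(rng.randint(1, 6) for _ in range(n_observations))

# Hypothetical setup: 2,000 samples of 4 throws each.
sample_means = [sample_mean(4) for _ in range(2000)]

# The mean of the sample means should lie close to the population mean of 3.5.
print(round(mean(sample_means), 2), round(stdev(sample_means), 2))
```

Plotting `sample_means` as a histogram would show the bell shape emerging around 3.5, even though single throws are uniformly distributed.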

Page 38: Probability

The mean of the individual sample means approaches the mean of the true population.

The sample means are normally distributed, even if the phenomenon under investigation is not normally distributed in the true population.

All parametric tests make use of the fact that the sample means are normally distributed (given a sufficient number of samples).

Central limit theorem

Page 42: Probability

population

sample

mean of this sample

distribution of many sample means

Page 43: Probability

How many samples do you need to assume that the sample means are normally distributed?

Are your data normally distributed?

Page 44: Probability

The distribution in the parent population (normal, slightly skewed, heavily skewed).

The number of observations in the individual sample.

The total number of individual samples.

Are your data normally distributed?

Page 45: Probability

Confidence intervals

Page 46: Probability

Confidence intervals indicate a range within which the mean (or other parameters) of the true population is located given the values of your sample and assuming a particular degree of certainty.

Confidence intervals

Page 47: Probability

The mean of the sample means

The SD of the sample means, i.e. the standard error

The degree of certainty with which you want to state the estimate

Confidence intervals

Page 48: Probability

$SD = \sqrt{\frac{\sum (x_n - \bar{x})^2}{N - 1}}$

Standard deviation

Page 53: Probability

Sample    Mean    Individual mean – mean of means    Squared
1         1.5     1.5 – 1.66 = –0.16                 0.0256
2         1.8     1.8 – 1.66 = 0.14                  0.0196
3         1.3     1.3 – 1.66 = –0.36                 0.1296
4         2.0     2.0 – 1.66 = 0.34                  0.1156
5         1.7     1.7 – 1.66 = 0.04                  0.0016

Mean of means: 8.3 / 5 = 1.66

Sum of squared differences: 0.292

Standard error

Page 54: Probability

$\sqrt{\frac{0.292}{5 - 1}} \approx 0.270$

Standard error
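The standard error calculation can be retraced step by step. This sketch follows the approach on the slides, where the standard error is the standard deviation of the sample means (with N – 1 in the denominator), and cross-checks the result against `statistics.stdev`.

```python
from statistics import mean, stdev

sample_means = [1.5, 1.8, 1.3, 2.0, 1.7]

mean_of_means = mean(sample_means)
squared_deviations = [(m - mean_of_means) ** 2 for m in sample_means]

# Standard error: SD of the sample means, with N - 1 in the denominator.
standard_error = (sum(squared_deviations) / (len(sample_means) - 1)) ** 0.5

# Cross-check: the library routine implements the same formula.
matches_library = abs(standard_error - stdev(sample_means)) < 1e-9

print(round(mean_of_means, 2), round(sum(squared_deviations), 3), round(standard_error, 3))
```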

Page 55: Probability

[degree of certainty] × [standard error] = x

[sample mean] +/– x = confidence interval

Confidence intervals

Page 56: Probability

95% degree of certainty = 1.96 [z-score]

Confidence interval of the first sample (mean = 1.5):

1.96 × 0.270 ≈ 0.53

1.5 +/- 0.53 = 0.97–2.03

We can be 95% certain that the population mean is located in the range between 0.97 and 2.03.

Confidence intervals
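The interval for the first sample can be reproduced with a short helper. This is a sketch; `confidence_interval` is a hypothetical function name, and 0.2701 is the standard error value used on the slides.

```python
def confidence_interval(center, standard_error, z=1.96):
    """Return (lower, upper) bounds: center -/+ z * standard error."""
    margin = z * standard_error
    return center - margin, center + margin

# First sample: mean = 1.5, standard error = 0.2701, 95% degree of certainty.
lower, upper = confidence_interval(1.5, 0.2701)
print(round(lower, 2), round(upper, 2))
```

Passing a different `z` (for example 2.58 for a 99% degree of certainty) widens the interval accordingly.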

Page 57: Probability

$\text{Standard error} = \frac{SD}{\sqrt{N}}$

Confidence intervals

Page 58: Probability

What is the 95% confidence interval of the following sample: 2, 5, 6, 7, 10, 12?

Mean: 7

SD: $\sqrt{\frac{(2-7)^2 + (5-7)^2 + (6-7)^2 + (7-7)^2 + (10-7)^2 + (12-7)^2}{6 - 1}} \approx 3.58$

Standard error: $3.58 / \sqrt{6} \approx 1.46$

Confidence I.: 1.46 × 1.96 ≈ 2.86

7 +/– 2.86 = 4.14–9.86
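The worked exercise can be verified in a few lines. The sketch follows the same steps (sample SD, standard error as SD divided by the square root of N, z-score of 1.96); variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

sample = [2, 5, 6, 7, 10, 12]

sd = stdev(sample)            # sample SD with N - 1 in the denominator
se = sd / sqrt(len(sample))   # standard error of the mean
margin = 1.96 * se            # 95% degree of certainty (z-score 1.96)
lower, upper = mean(sample) - margin, mean(sample) + margin

print(round(sd, 2), round(se, 2), round(margin, 2), round(lower, 2), round(upper, 2))
```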