1
Probability theory
LING 570
Fei Xia
Week 2: 10/01/07
2
Misc.
• Patas account and dropbox
• Course website, “Collect it”, and GoPost.
• Mailing list
  – Received message on Thursday?
• Questions about hw1?
3
Outline
• Quiz #1
• Unix commands
• Linguistics
• Elementary Probability theory: M&S 2.1
4
Quiz #1
Five areas: weight (class average)
• Programming: 4.0 (3.74)
  – Try Perl or Python
• Unix commands: 1.2 (0.99)
• Probability: 2.0 (1.09)
• Regular expression: 2.0 (1.62)
• Linguistics knowledge: 0.8 (0.71)
5
Results
• 9.0-10: 4
• 8.0-8.9: 8
• < 8.0: 8
6
Unix commands
• ls (list), cp (copy), rm (remove)
• more, less, cat: to view files
• cd, mkdir, rmdir, pwd
• chmod: to change file permissions
• tar, gzip: to archive/compress files
• ssh, sftp: to log in to a remote machine or transfer files
• man: to read a command's manual page
7
Unix commands (cont)
• Compilers and interpreters: javac, gcc, g++, perl, …
• ps, top: to list running processes
• which: to locate a command
• Pipe: cat input_file | eng_tokenizer.sh | make_voc.sh > output_file
• sort, uniq, awk, grep
  grep "the" voc | awk '{print $2}' | sort | uniq -c | sort -nr
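The last pipeline can be mimicked in Python for comparison. This is only a rough sketch: the format of `voc` is not given on the slide, so it assumes whitespace-separated fields, and `count_second_fields` and the sample lines are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_second_fields(lines: Iterable[str], pattern: str = "the") -> list[tuple[str, int]]:
    """Rough analogue of: grep "the" voc | awk '{print $2}' | sort | uniq -c | sort -nr"""
    counts = Counter()
    for line in lines:
        if pattern not in line:      # grep "the": keep lines containing the substring
            continue
        fields = line.split()        # awk splits on runs of whitespace
        # awk prints an empty string when $2 does not exist
        counts[fields[1] if len(fields) > 1 else ""] += 1
    # sort | uniq -c | sort -nr: count each distinct value, most frequent first
    return counts.most_common()


sample = ["1 the DT", "2 then RB", "3 the DT", "4 a DT"]  # hypothetical voc lines
print(count_second_fields(sample))  # [('the', 2), ('then', 1)]
```

Like `grep "the"`, the substring test also matches the line containing "then".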
8
Examples
• Set the permission of foo.pl so it is readable and executable by the user and the group.
  rwx rwx rwx => r-x r-x --- => 101 101 000 => chmod 550 foo.pl
• Move a file, foo.pl, from your home dir to /tmp:
  mv ~/foo.pl /tmp
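The rwx-to-octal step can be checked mechanically. A minimal sketch: `perm_to_octal` is a hypothetical helper, while `stat.filemode` is from Python's standard library.

```python
import stat


def perm_to_octal(perm: str) -> int:
    """Turn a 9-character permission string such as 'r-xr-x---' into a mode.

    Each position is one bit: a letter means 1, '-' means 0, so
    'r-x r-x ---' -> 101 101 000 -> octal 550.
    """
    if len(perm) != 9:
        raise ValueError("expected 9 characters, e.g. 'rwxr-x---'")
    mode = 0
    for ch in perm:
        mode = (mode << 1) | (ch != "-")  # shift in one bit per character
    return mode


mode = perm_to_octal("r-xr-x---")
print(oct(mode))                            # 0o550, i.e. chmod 550
print(stat.filemode(stat.S_IFREG | mode))   # -r-xr-x---
```

From within Python, `os.chmod("foo.pl", 0o550)` has the same effect as `chmod 550 foo.pl`.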
9
Linguistics: POS tags
• Open class: Noun, verb, adjective, adverb
  – Auxiliary verb/modal: can, will, might, …
  – Temporal noun: tomorrow
  – Adverb: adj+ly, always, still, not, …
• Closed class: Preposition, conjunction, determiner, pronoun, …
  – Conjunction: CC, coordinating (and); SC, subordinating (if, although)
  – Complementizer: that, …
10
Linguistics: syntactic structure
• Two kinds:
  – Phrase structure (a.k.a. parse tree)
  – Dependency structure
• Examples:
  – John said that he would call Mary tomorrow
11
Outline
• Quiz #1
• Unix commands
• Linguistics
• Elementary Probability theory
12
Probability Theory
13
Basic concepts
• Sample space, event, event space
• Random variable and random vector
• Conditional probability, joint probability, marginal probability (prior)
14
Sample space, event, event space
• Sample space (Ω): the set of all possible outcomes.
  – Ex: toss a coin three times: {HHH, HHT, HTH, HTT, …}
• Event: an event is a subset of Ω.
  – Ex: {HHT, HTH, THH} (exactly two heads)
• Event space (2^Ω, the power set of Ω): the set of all possible events.
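For three tosses these sets are small enough to enumerate. A sketch in Python (variable names are illustrative):

```python
from itertools import combinations, product

# Sample space for three coin tosses: all 2**3 = 8 outcomes.
omega = ["".join(t) for t in product("HT", repeat=3)]

# Event space: every subset of omega (the power set), 2**8 = 256 events.
event_space = [frozenset(c) for r in range(len(omega) + 1)
               for c in combinations(omega, r)]

# One event: "exactly two heads".
two_heads = frozenset(o for o in omega if o.count("H") == 2)

print(omega)              # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(len(event_space))   # 256
print(sorted(two_heads))  # ['HHT', 'HTH', 'THH']
```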
15
Probability function
• A probability function (a.k.a. a probability distribution) distributes a probability mass of 1 throughout the sample space Ω.
• It is a function P: 2^Ω → [0,1] such that:
  – $P(\Omega) = 1$
  – For any disjoint sets $A_j \in 2^{\Omega}$, $P\left(\bigcup_j A_j\right) = \sum_j P(A_j)$
- Ex: P({HHT, HTH, HTT})
= P({HHT}) + P({HTH}) + P({HTT})
16
The coin example
• The prob of getting a head is 0.1 for one toss. What is the prob of getting two heads out of three tosses?
• P(“Getting two heads”)
= P({HHT, HTH, THH})
= P(HHT) + P(HTH) + P(THH)
= 0.1*0.1*0.9 + 0.1*0.9*0.1+0.9*0.1*0.1
= 3*0.1*0.1*0.9 = 0.027
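The same calculation done by enumeration, assuming (as the multiplication above does) that the tosses are independent. Fractions keep the arithmetic exact, and `event_prob` applies the additivity axiom by summing over the disjoint outcomes in an event:

```python
from fractions import Fraction
from itertools import product

P_HEAD = Fraction(1, 10)  # P(head) for one toss, from the slide


def outcome_prob(outcome: str) -> Fraction:
    """Probability of one outcome like 'HHT', assuming independent tosses."""
    prob = Fraction(1)
    for toss in outcome:
        prob *= P_HEAD if toss == "H" else 1 - P_HEAD
    return prob


def event_prob(event) -> Fraction:
    """Additivity: P(event) is the sum over its (disjoint) outcomes."""
    return sum((outcome_prob(o) for o in event), Fraction(0))


omega = ["".join(t) for t in product("HT", repeat=3)]
two_heads = [o for o in omega if o.count("H") == 2]

print(event_prob(omega))      # 1, i.e. P(Ω) = 1
print(event_prob(two_heads))  # 27/1000 = 0.027
```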
17
Random variable
• The outcome of an experiment need not be a number.
• We often want to represent outcomes as numbers.
• A random variable X is a function X: Ω → ℝ.
  – Ex: the number of heads with three tosses:
X(HHT)=2, X(HTH)=2, X(HTT)=1, …
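Since X is just a function on Ω, it can be written as one; the event X=2 is then the set of outcomes that X maps to 2. A minimal sketch (the capitalized name `X` mirrors the slide's notation):

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]


def X(outcome: str) -> int:
    """Random variable: maps an outcome to the number of heads in it."""
    return outcome.count("H")


# The event "X = 2" is the set of outcomes that X maps to 2.
x_equals_2 = {o for o in omega if X(o) == 2}

print(X("HHT"), X("HTH"), X("HTT"))  # 2 2 1
print(sorted(x_equals_2))            # ['HHT', 'HTH', 'THH']
```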
18
The coin example (cont)
• X = the number of heads with three tosses
• P(X=2)
= P({HHT, HTH, THH})
= P({HHT}) + P({HTH}) + P({THH})
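Summing over outcomes gives the whole distribution of X. The sketch below assumes independent tosses with P(head) = p and checks the result against the binomial formula C(n,k) p^k (1-p)^(n-k), which is not on the slide but follows from counting the outcomes with k heads:

```python
from fractions import Fraction
from itertools import product
from math import comb


def pmf_num_heads(p: Fraction, n: int = 3) -> dict[int, Fraction]:
    """P(X = k) for X = number of heads in n independent tosses with P(head) = p.

    Brute force: add up the probabilities of all outcomes with k heads.
    """
    pmf = {k: Fraction(0) for k in range(n + 1)}
    for outcome in product("HT", repeat=n):
        k = outcome.count("H")
        pmf[k] += p ** k * (1 - p) ** (n - k)
    return pmf


pmf = pmf_num_heads(Fraction(1, 10))
print(pmf[2])  # 27/1000, matching the coin example
# Agrees with the closed form C(n, k) p^k (1-p)^(n-k):
print(all(pmf[k] == comb(3, k) * Fraction(1, 10) ** k * Fraction(9, 10) ** (3 - k)
          for k in range(4)))  # True
```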
19
Two types of random variables
• Discrete: X takes on only a countable number of possible values.
  – Ex: Toss a coin three times. X is the number of heads observed.
• Continuous: X takes on an uncountable number of possible values.
  – Ex: X is the speed of a car.
20
Common trick #1: Maximum likelihood estimation
• An example: toss a coin 3 times and get two heads. What is the probability of getting a head with one toss?
• Maximum likelihood (ML): θ* = arg max_θ P(data | θ)
• In the example,
  – P(X=2) = 3 * p * p * (1-p)
  – e.g., the prob is 3/8 when p=1/2, and 12/27 when p=2/3; since 3/8 < 12/27, p=2/3 makes the data more likely
  – The ML estimate is p = 2/3 (= number of heads / number of tosses)
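The comparison can be pushed further by evaluating the likelihood over many candidate values of p. A crude grid search is used here only for illustration; setting the derivative 6p − 9p² to 0 gives the same answer, p = 2/3:

```python
from fractions import Fraction


def likelihood(p: Fraction) -> Fraction:
    """P(X = 2 | p) for three tosses: 3 * p * p * (1 - p)."""
    return 3 * p * p * (1 - p)


print(likelihood(Fraction(1, 2)))  # 3/8
print(likelihood(Fraction(2, 3)))  # 4/9, i.e. 12/27

# Crude ML estimate: pick the grid value of p with the largest likelihood.
grid = [Fraction(i, 300) for i in range(301)]
p_ml = max(grid, key=likelihood)
print(p_ml)  # 2/3 = (number of heads) / (number of tosses)
```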
21
Random vector
• A random vector is a finite-dimensional vector of random variables: X=[X1,…,Xn].
• P(x) = P(x1,x2,…,xn) = P(X1=x1,…,Xn=xn)
• Ex: P(w1, …, wn, t1, …, tn), the joint probability of a word sequence and its tag sequence
22
Notation
• X, Y, Xi, Yi are random variables.
• x, y, xi are values.
• P(X=x) is written as P(x)
• P(X=x | Y=y) is written as P(x | y).
23
Three types of probability
• Joint prob: P(x,y)= prob of X=x and Y=y happening together
• Conditional prob: P(x | y) = prob of X=x given a specific value of Y=y
• Marginal prob: P(x) = prob of X=x regardless of the value of Y, i.e., summed over all possible values of Y.
24
An example
• There are two coins. Choose a coin and then toss it. Do that 10 times.
• Coin 1 is chosen 4 times: one head and three tails.
• Coin 2 is chosen six times: four heads and two tails.
• Let’s calculate the probabilities.
25
Probabilities
• P(C=1) = 4/10, P(C=2) = 6/10
• P(X=h) = 5/10, P(X=t) = 5/10
• P(X=h | C=1) = ¼, P(X=h | C=2) = 4/6
• P(X=t | C=1) = ¾, P(X=t | C=2) = 2/6
• P(X=h, C=1) = 1/10, P(X=h, C=2) = 4/10
• P(X=t, C=1) = 3/10, P(X=t, C=2) = 2/10
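All of these numbers can be recomputed from the counts. A sketch; the order of the trials is not given, so the list below is one arbitrary arrangement of the stated counts, and the helper names are hypothetical:

```python
from fractions import Fraction

# The 10 (coin, result) trials: coin 1 gives 1 head and 3 tails,
# coin 2 gives 4 heads and 2 tails. The order is arbitrary.
trials = [(1, "h")] + [(1, "t")] * 3 + [(2, "h")] * 4 + [(2, "t")] * 2
N = len(trials)


def count(coin=None, result=None):
    """Number of trials matching the given coin and/or result (None = any)."""
    return sum(1 for c, x in trials
               if coin in (None, c) and result in (None, x))


def p_coin(c):        # P(C=c)
    return Fraction(count(coin=c), N)


def p_result(x):      # P(X=x)
    return Fraction(count(result=x), N)


def p_joint(x, c):    # P(X=x, C=c)
    return Fraction(count(coin=c, result=x), N)


def p_cond(x, c):     # P(X=x | C=c)
    return Fraction(count(coin=c, result=x), count(coin=c))


print(p_coin(1), p_result("h"), p_cond("h", 1), p_joint("t", 2))  # 2/5 1/2 1/4 1/5
```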
26
Relation between different types of probabilities
P(X=h, C=1)
= P(C=1) * P(X=h | C=1)
= 4/10 * ¼ = 1/10
P(X=h)
= P(X=h, C=1) + P(X=h, C=2)
= 1/10 + 4/10 = 5/10
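The two relations, written out with the example's values:

```python
from fractions import Fraction

# Values from the two-coin example
p_c = {1: Fraction(4, 10), 2: Fraction(6, 10)}          # P(C=c)
p_h_given_c = {1: Fraction(1, 4), 2: Fraction(4, 6)}    # P(X=h | C=c)

# Joint = prior * conditional: P(X=h, C=c) = P(C=c) * P(X=h | C=c)
p_h_and_c = {c: p_c[c] * p_h_given_c[c] for c in p_c}

# Marginal = sum of the joints over all values of C
p_h = sum(p_h_and_c.values())

print(p_h_and_c[1], p_h_and_c[2], p_h)  # 1/10 2/5 1/2
```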
27
Common trick #2: Chain rule
$$P(A,B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$
$$P(A_1,\dots,A_n) = \prod_{i=1}^{n} P(A_i \mid A_1,\dots,A_{i-1})$$
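The n-variable chain rule can be verified numerically: compute each conditional from a joint table and multiply them back together. The joint distribution below is made up for illustration (any positive weights summing to 1 would do):

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over three binary variables (A1, A2, A3).
weights = [1, 2, 3, 4, 5, 6, 7, 12]  # sums to 40
joint = {a: Fraction(w, 40) for a, w in zip(product((0, 1), repeat=3), weights)}


def prob_prefix(prefix):
    """P(A1=prefix[0], ..., Ai=prefix[i-1]), summing out the remaining variables."""
    return sum(p for a, p in joint.items() if a[:len(prefix)] == prefix)


def chain_rule(a):
    """P(a1) * P(a2 | a1) * P(a3 | a1, a2), each conditional computed from the joint."""
    result = Fraction(1)
    for i in range(1, len(a) + 1):
        result *= prob_prefix(a[:i]) / prob_prefix(a[:i - 1])
    return result


print(all(chain_rule(a) == joint[a] for a in joint))  # True
```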
28
Common trick #3: joint prob → marginal prob
$$P(A) = \sum_{B} P(A,B)$$
$$P(A_1) = \sum_{A_2,\dots,A_n} P(A_1,A_2,\dots,A_n)$$
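Marginalizing is just summing a joint table over the variables you don't keep. A small sketch, with `marginal` a hypothetical helper that works for any number of variables, applied to the joint P(X, C) from the two-coin example:

```python
from collections import defaultdict
from fractions import Fraction


def marginal(joint, keep):
    """Sum a joint table over every variable whose position is not in `keep`."""
    result = defaultdict(Fraction)  # Fraction() is 0
    for assignment, p in joint.items():
        result[tuple(assignment[i] for i in keep)] += p
    return dict(result)


# Joint P(X, C) from the two-coin example, keyed by (x, c)
joint_xc = {("h", 1): Fraction(1, 10), ("t", 1): Fraction(3, 10),
            ("h", 2): Fraction(4, 10), ("t", 2): Fraction(2, 10)}

print(marginal(joint_xc, keep=(0,)))  # P(X): h -> 1/2, t -> 1/2
print(marginal(joint_xc, keep=(1,)))  # P(C): 1 -> 2/5, 2 -> 3/5
```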
29
Common trick #4: Bayes’ rule
$$P(B \mid A) = \frac{P(A,B)}{P(A)} = \frac{P(A \mid B)\,P(B)}{P(A)}$$
$$y^* = \arg\max_y P(y \mid x) = \arg\max_y \frac{P(x \mid y)\,P(y)}{P(x)} = \arg\max_y P(x \mid y)\,P(y)$$
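Bayes' rule applied to the two-coin example: the posterior probability that coin 1 was used given a head, and the argmax decision, which can drop P(x) because it is the same for every y:

```python
from fractions import Fraction

# Prior P(C=y) and likelihood P(X=x | C=y) from the two-coin example
prior = {1: Fraction(4, 10), 2: Fraction(6, 10)}
likelihood = {("h", 1): Fraction(1, 4), ("h", 2): Fraction(4, 6),
              ("t", 1): Fraction(3, 4), ("t", 2): Fraction(2, 6)}


def posterior(y, x):
    """P(y | x) = P(x | y) P(y) / P(x), with P(x) = sum over y' of P(x | y') P(y')."""
    evidence = sum(likelihood[(x, yy)] * prior[yy] for yy in prior)
    return likelihood[(x, y)] * prior[y] / evidence


def most_likely_coin(x):
    """argmax_y P(x | y) P(y); P(x) does not depend on y, so it is dropped."""
    return max(prior, key=lambda y: likelihood[(x, y)] * prior[y])


print(posterior(1, "h"))                             # 1/5
print(most_likely_coin("h"), most_likely_coin("t"))  # 2 1
```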
30
Independent random variables
• Two random variables X and Y are independent iff the value of X has no influence on the value of Y and vice versa.
• P(X,Y) = P(X) P(Y)
• P(Y|X) = P(Y)
• P(X|Y) = P(X)
• Our previous example: P(X, C) ≠ P(X) P(C), e.g., P(X=h, C=1) = 1/10 but P(X=h) P(C=1) = 5/10 * 4/10 = 2/10
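The independence test can be run on the two-coin joint table. A sketch comparing P(x, c) with P(x) P(c) for every pair of values; `independent` is a hypothetical helper:

```python
from fractions import Fraction
from itertools import product

joint = {("h", 1): Fraction(1, 10), ("t", 1): Fraction(3, 10),
         ("h", 2): Fraction(4, 10), ("t", 2): Fraction(2, 10)}
p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in "ht"}
p_c = {c: sum(p for (_, cc), p in joint.items() if cc == c) for c in (1, 2)}


def independent(joint, p_x, p_c):
    """X and C are independent iff P(x, c) = P(x) P(c) for every pair of values."""
    return all(joint[(x, c)] == p_x[x] * p_c[c] for x, c in product(p_x, p_c))


print(joint[("h", 1)], p_x["h"] * p_c[1])  # 1/10 1/5
print(independent(joint, p_x, p_c))        # False
```

Replacing the table with the products P(x) P(c) would, by construction, make the test return True.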
31
Conditional independence
Once we know C, the value of A does not affect the value of B and vice versa.
• P(A,B | C) = P(A|C) P(B|C)
• P(A|B,C) = P(A | C)
• P(B|A, C) = P(B |C)
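One way to get conditionally independent variables: pick a coin C, then toss that same coin twice (A and B). This model is hypothetical, with P(C) and P(head | C) borrowed from the two-coin example. Given C the tosses are independent, but without knowing C they are not: a first head makes coin 2 more likely, and so makes a second head more likely.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: pick coin C, then toss that same coin twice (A, B).
p_c = {1: Fraction(4, 10), 2: Fraction(6, 10)}   # P(C)
p_head = {1: Fraction(1, 4), 2: Fraction(4, 6)}  # P(head | C)


def p_toss(t, c):
    return p_head[c] if t == "h" else 1 - p_head[c]


# Joint P(A, B, C) = P(C) P(A | C) P(B | C): tosses independent given C.
joint = {(a, b, c): p_c[c] * p_toss(a, c) * p_toss(b, c)
         for a, b, c in product("ht", "ht", p_c)}


def P(a=None, b=None, c=None):
    """Probability of a partial assignment; None means 'summed out'."""
    return sum(p for (aa, bb, cc), p in joint.items()
               if a in (None, aa) and b in (None, bb) and c in (None, cc))


# P(A, B | C) = P(A | C) P(B | C) for every combination of values
cond_indep = all(P(a, b, c) / P(c=c) == (P(a=a, c=c) / P(c=c)) * (P(b=b, c=c) / P(c=c))
                 for a, b, c in product("ht", "ht", p_c))
# ...but P(A=h, B=h) != P(A=h) P(B=h)
print(cond_indep, P(a="h", b="h"), P(a="h") * P(b="h"))  # True 7/24 1/4
```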
32
Independence and conditional independence
• If A and B are independent, are they conditionally independent?
• Example:
  – Burglar, Earthquake
  – Alarm
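For the burglar/earthquake/alarm example, the answer is no. A sketch with made-up numbers (not from the lecture): B and E are independent by construction, but once the alarm is observed, learning that an earthquake happened lowers the probability of a burglary ("explaining away"), so B and E are not conditionally independent given A.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers, chosen for illustration only.
p_b = F(1, 1000)   # P(Burglar = 1)
p_e = F(2, 1000)   # P(Earthquake = 1)
p_alarm = {(1, 1): F(95, 100), (1, 0): F(94, 100),  # P(Alarm = 1 | B, E)
           (0, 1): F(29, 100), (0, 0): F(1, 1000)}


def bern(p, v):
    return p if v == 1 else 1 - p


# B and E are independent causes; A depends on both.
joint = {(b, e, a): bern(p_b, b) * bern(p_e, e) * bern(p_alarm[(b, e)], a)
         for b, e, a in product((0, 1), repeat=3)}


def P(b=None, e=None, a=None):
    """Probability of a partial assignment; None means 'summed out'."""
    return sum(p for (bb, ee, aa), p in joint.items()
               if b in (None, bb) and e in (None, ee) and a in (None, aa))


print(P(b=1, e=1) == P(b=1) * P(e=1))             # True: B and E independent
print(float(P(b=1, a=1) / P(a=1)))                # P(B=1 | A=1), roughly 0.37
print(float(P(b=1, e=1, a=1) / P(e=1, a=1)))      # P(B=1 | A=1, E=1), roughly 0.003
```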
33
Common trick #5: Independence assumption
$$P(A_1,\dots,A_n) = \prod_{i=1}^{n} P(A_i \mid A_1,\dots,A_{i-1}) \approx \prod_{i=1}^{n} P(A_i \mid A_{i-1})$$
34
An example
• P(w1 w2 … wn)
= P(w1) P(w2 | w1) P(w3 | w1 w2) * …
* P(wn | w1, …, wn-1)
≈ P(w1) P(w2 | w1) … P(wn | wn-1)
• Why do we make independence assumptions that we know are not true?
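A toy bigram version of the approximation, with each factor estimated by maximum likelihood (relative frequency). The corpus and the helper `bigram_prob` are hypothetical. It hints at one answer to the question above: the bigram factors need only counts of word pairs, whereas P(wn | w1, …, wn-1) would need counts of whole histories, which are rarely seen even in large corpora. Unseen pairs still get probability 0, as "the dog ran" shows.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy corpus, one sentence per string.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
sentences = [s.split() for s in corpus]

first = Counter(s[0] for s in sentences)  # counts of sentence-initial words
bigrams = Counter((s[i - 1], s[i]) for s in sentences for i in range(1, len(s)))
history = Counter(s[i - 1] for s in sentences for i in range(1, len(s)))


def bigram_prob(words):
    """P(w1) * prod_i P(wi | wi-1), each factor a relative-frequency estimate."""
    prob = Fraction(first[words[0]], len(sentences))
    for prev, cur in zip(words, words[1:]):
        if history[prev] == 0:
            return Fraction(0)  # never saw prev followed by anything
        prob *= Fraction(bigrams[(prev, cur)], history[prev])
    return prob


print(bigram_prob("the cat sat".split()))  # 1/3  (= 1 * 2/3 * 1/2)
print(bigram_prob("the dog ran".split()))  # 0    (pair "dog ran" never seen)
```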
35
Summary of elementary probability theory
• Basic concepts: sample space, event space, random variable, random vector
• Joint / conditional /marginal probability
• Independence and conditional independence
• Five common tricks:
  – Max likelihood estimation
  – Chain rule
  – Calculating marginal probability from joint probability
  – Bayes’ rule
  – Independence assumption
36
Outline
• Quiz #1
• Unix commands
• Linguistics
• Elementary Probability theory
37
Next time
• J&M Chapter 2
  – Formal language and formal grammar
  – Regular expression
• Hw1 is due at 3pm on Wed.