![Page 1: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/1.jpg)
Alan Ritter ◦ socialmedia-class.org
Social Media & Text Analysis lecture 7 - Paraphrase Identification
and Linear Regression
CSE 5539-0010 Ohio State UniversityInstructor: Alan Ritter
Website: socialmedia-class.org
![Page 2: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/2.jpg)
(Recap)
what is Paraphrase?“sentences or phrases that convey approximately the same meaning using different words” — (Bhagat & Hovy, 2012)
![Page 3: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/3.jpg)
(Recap)
what is Paraphrase?
wealthy richword
“sentences or phrases that convey approximately the same meaning using different words” — (Bhagat & Hovy, 2012)
![Page 4: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/4.jpg)
(Recap)
what is Paraphrase?
wealthy richword
the king’s speech His Majesty’s addressphrase
“sentences or phrases that convey approximately the same meaning using different words” — (Bhagat & Hovy, 2012)
![Page 5: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/5.jpg)
(Recap)
what is Paraphrase?
wealthy richword
the king’s speech His Majesty’s addressphrase
… the forced resignation of the CEO of Boeing,
Harry Stonecipher, for …sentence
… after Boeing Co. Chief Executive Harry Stonecipher was ousted from …
“sentences or phrases that convey approximately the same meaning using different words” — (Bhagat & Hovy, 2012)
![Page 6: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/6.jpg)
The Ideal
![Page 7: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/7.jpg)
(Recap)
Paraphrase Research'01
Novels'05 '13Bi-Text
'04News
'13* '14* Twitter
'11Video
XuRitter
Callison-Burch Dolan
Ji
XuRitter Dolan
Grishman Cherry
'12*Style
‘01Web
XuCallison-Burch
Napoles
'15*'16*Simple
80sWordNet
![Page 8: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/8.jpg)
Distributional Similarity
modified by adjectives objects of verbsadditional, administrative,
assigned, assumed, collective, congressional,
constitutional ...
assert, assign, assume, attend to, avoid, become,
breach ...
Decking Lin and Patrick Pantel. “DIRT - Discovery of Inference Rules from Text” In KDD (2001)
Lin and Panel (2001) operationalize the Distributional Hypothesis using dependency relationships to define
similar environments.
Duty and responsibility share a similar set of dependency contexts in large volumes of text:
![Page 9: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/9.jpg)
Bilingual Pivoting
... fünf Landwirte , weil
... 5 farmers were in Ireland ...
...
oder wurden , gefoltert
or have been , tortured
festgenommen
thrown into jail
festgenommen
imprisoned
...
... ...
...
Source: Chris Callison-Burch
word alignment
![Page 10: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/10.jpg)
Bilingual Pivoting
... fünf Landwirte , weil
... 5 farmers were in Ireland ...
...
oder wurden , gefoltert
or have been , tortured
festgenommen
thrown into jail
festgenommen
imprisoned
...
... ...
...
Source: Chris Callison-Burch
word alignment
![Page 11: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/11.jpg)
Bilingual Pivoting
... fünf Landwirte , weil
... 5 farmers were in Ireland ...
...
oder wurden , gefoltert
or have been , tortured
festgenommen
thrown into jail
festgenommen
imprisoned
...
... ...
...
Source: Chris Callison-Burch
word alignment
![Page 12: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/12.jpg)
Bilingual Pivoting
... fünf Landwirte , weil
... 5 farmers were in Ireland ...
...
oder wurden , gefoltert
or have been , tortured
festgenommen
thrown into jail
festgenommen
imprisoned
...
... ...
...
Source: Chris Callison-Burch
word alignment
![Page 13: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/13.jpg)
Bilingual Pivoting
... fünf Landwirte , weil
... 5 farmers were in Ireland ...
...
oder wurden , gefoltert
or have been , tortured
festgenommen
thrown into jail
festgenommen
imprisoned
...
... ...
...
Source: Chris Callison-Burch
word alignment
![Page 14: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/14.jpg)
Key Limitations of PPDB?
![Page 15: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/15.jpg)
Key Limitations of PPDB?
bug
insect, beetle,pest, mosquito,
fly
squealer, snitch, rat, mole
microphone, tracker, mic,
wire, earpiece, cookie
glitch, error, malfunction, fault, failure
bother, annoy, pester
microbe, virus, bacterium,
germ, parasite
Source: Chris Callison-Burch
word sense
![Page 16: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/16.jpg)
Another Key Limitation'01
Novels'05 '13Bi-Text
'04News
'13* '14*Twitter
'11Video
'12*Style
‘01Web
'15*'16*Simple
80sWordNet
only paraphrases, no non-paraphrases
![Page 17: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/17.jpg)
Paraphrase Identificationobtain sentential paraphrases automatically
Mancini has been sacked by Manchester City
Mancini gets the boot from Man City Yes!%
WORLD OF JENKS IS ON AT 11
World of Jenks is my favorite show on tvNo!$
Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)
(meaningful) non-paraphrases are needed to train classifiers!
![Page 18: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/18.jpg)
Also Non-Paraphrases'01
Novels'05 '13Bi-Text
'04News
'13* ’14* 17*Twitter
'11Video
'12*Style
‘01Web
'15*'16*Simple
80sWordNet
(meaningful) non-paraphrases are needed to train classifiers!
![Page 19: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/19.jpg)
News Paraphrase Corpus
Microsoft Research Paraphrase Corpus
(Dolan, Quirk and Brockett, 2004; Dolan and Brockett, 2005; Brockett and Dolan, 2005)
also contains some non-paraphrases
![Page 20: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/20.jpg)
Twitter Paraphrase Corpus
Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)
also contains a lot of non-paraphrases
![Page 21: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/21.jpg)
Alan Ritter ◦ socialmedia-class.org
Paraphrase Identification:
A Binary Classification Problem• Input:
- a sentence pair x- a fixed set of binary classes Y = {0, 1}
• Output: - a predicted class y ∈ Y (y = 0 or y = 1)
![Page 22: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/22.jpg)
Alan Ritter ◦ socialmedia-class.org
Paraphrase Identification:
A Binary Classification Problem• Input:
- a sentence pair x- a fixed set of binary classes Y = {0, 1}
• Output: - a predicted class y ∈ Y (y = 0 or y = 1)
negative (non-paraphrases)
![Page 23: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/23.jpg)
Alan Ritter ◦ socialmedia-class.org
Paraphrase Identification:
A Binary Classification Problem• Input:
- a sentence pair x- a fixed set of binary classes Y = {0, 1}
• Output: - a predicted class y ∈ Y (y = 0 or y = 1)
negative (non-paraphrases)
positive (paraphrases)
![Page 24: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/24.jpg)
Alan Ritter ◦ socialmedia-class.org
Paraphrase Identification:
A Binary Classification Problem• Input:
- a sentence pair x- a fixed set of binary classes Y = {0, 1}
• Output: - a predicted class y ∈ Y (y = 0 or y = 1)
![Page 25: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/25.jpg)
Alan Ritter ◦ socialmedia-class.org
Classification Method:
Supervised Machine Learning• Input:
- a sentence pair x- a fixed set of binary classes Y = {0, 1}- a training set of m hand-labeled sentence pairs
(x(1), y(1)), … , (x(m), y(m))
• Output:
- a learned classifier 𝜸: x → y ∈ Y (y = 0 or y = 1)
![Page 26: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/26.jpg)
Alan Ritter ◦ socialmedia-class.org
Classification Method:
Supervised Machine Learning• Input:
- a sentence pair x (represented by features)- a fixed set of binary classes Y = {0, 1}- a training set of m hand-labeled sentence pairs
(x(1), y(1)), … , (x(m), y(m))
• Output:
- a learned classifier 𝜸: x → y ∈ Y (y = 0 or y = 1)
![Page 27: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/27.jpg)
Alan Ritter ◦ socialmedia-class.org
(Recap) Classification Method: Supervised Machine Learning
• Naïve Bayes• Logistic Regression • Support Vector Machines (SVM) • …
![Page 28: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/28.jpg)
Alan Ritter ◦ socialmedia-class.org
(Recap)
Naïve Bayes• Cons:features ti are assumed independent given the class y
• This will cause problems:- correlated features ➞ double-counted evidence - while parameters are estimated independently - hurt classier’s accuracy
P(t1,t2,...,tn | y) = P(t1 | y) ⋅P(t2 | y) ⋅...⋅P(tn | y)
![Page 29: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/29.jpg)
Alan Ritter ◦ socialmedia-class.org
Classification Method: Supervised Machine Learning
• Naïve Bayes • Logistic Regression • Support Vector Machines (SVM) • …
![Page 30: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/30.jpg)
Logistic Regression• One of the most useful supervised machine
learning algorithm for classification!
• Generally high performance for a lot of problems.
• Much more robust than Naïve Bayes (better performance on various datasets).
![Page 31: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/31.jpg)
Let’s start with something simpler!
Before Logistic Regression
![Page 32: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/32.jpg)
Alan Ritter ◦ socialmedia-class.org
Paraphrase Identification:
Simplified Features
• We use only one feature:
- number of words that two sentences share in common
![Page 33: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/33.jpg)
Alan Ritter ◦ socialmedia-class.org
A very related problem of Paraphrase Identification:
Semantic Textual Similarity• How similar (close in meaning) two sentences are?
5: completely equivalent in meaning
4: mostly equivalent, but some unimportant details differ
3: roughly equivalent, some important information differs/missing
2: not equivalent, but share some details
1: not equivalent, but are on the same topic
0: completely dissimilar
![Page 34: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/34.jpg)
Alan Ritter ◦ socialmedia-class.org
A Simpler Model:
Linear RegressionSe
nten
ce S
imila
rity
(rat
ed b
y H
uman
)
0
1
2
3
4
5
#words in common (feature)0 5 10 15 20
![Page 35: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/35.jpg)
Alan Ritter ◦ socialmedia-class.org
A Simpler Model:
Linear Regression
• also supervised learning (learn from annotated data)
• but for Regression: predict real-valued output (Classification: predict discrete-valued output)
Sent
ence
Sim
ilarit
y
0
1
2
3
4
5
#words in common (feature)0 5 10 15 20
![Page 36: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/36.jpg)
Alan Ritter ◦ socialmedia-class.org
A Simpler Model:
Linear Regression
• also supervised learning (learn from labeled data)
• but for Regression: predict real-valued output (Classification: predict discrete-valued output)
Sent
ence
Sim
ilarit
y
0
1
2
3
4
5
#words in common (feature)0 5 10 15 20
threshold ➞ Classification
![Page 37: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/37.jpg)
Alan Ritter ◦ socialmedia-class.org
A Simpler Model:
Linear Regression
• also supervised learning (learn from labeled data)
• but for Regression: predict real-valued output (Classification: predict discrete-valued output)
Sent
ence
Sim
ilarit
y
0
1
2
3
4
5
#words in common (feature)0 5 10 15 20
threshold ➞ Classification
paraphrase{
![Page 38: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/38.jpg)
Alan Ritter ◦ socialmedia-class.org
A Simpler Model:
Linear Regression
• also supervised learning (learn from labeled data)
• but for Regression: predict real-valued output (Classification: predict discrete-valued output)
Sent
ence
Sim
ilarit
y
0
1
2
3
4
5
#words in common (feature)0 5 10 15 20
threshold ➞ Classification
paraphrase
non-paraphrase
{{
![Page 39: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/39.jpg)
Alan Ritter ◦ socialmedia-class.org
Training Set
- m hand-labeled sentence pairs (x(1), y(1)), … , (x(m), y(m)) - x’s: “input” variable / features - y’s: “output”/“target” variable
#words in common (x)
Sentence Similarity (y)
1 0
4 1
13 4
18 5
… …
![Page 40: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/40.jpg)
Alan Ritter ◦ socialmedia-class.org Source: NLTK Book
(Recap)
Supervised Machine Learningtraining set
![Page 41: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/41.jpg)
Alan Ritter ◦ socialmedia-class.org
Supervised Machine Learningtraining set
(also called) hypothesis
![Page 42: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/42.jpg)
Alan Ritter ◦ socialmedia-class.org
Supervised Machine Learningtraining set
(also called) hypothesis
#words in commonSentence Similarity
x(estimated)
y
![Page 43: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/43.jpg)
Alan Ritter ◦ socialmedia-class.org
Supervised Machine Learningtraining set
(also called) hypothesis
x(estimated)
y
![Page 44: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/44.jpg)
Alan Ritter ◦ socialmedia-class.org
hypothesis (h)
x(estimated)
y
training set
• How to represent h ?
hθ (x) = θ0 +θ1x
Linear Regression w/ one variable
Linear Regression:
Model Representation
![Page 45: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/45.jpg)
Alan Ritter ◦ socialmedia-class.org
- m hand-labeled sentence pairs (x(1), y(1)), … , (x(m), y(m)) 𝞱’s: parameters
#words in common (x)
Sentence Similarity (y)
1 0
4 1
13 4
18 5
… …
hθ (x) = θ0 +θ1x
Linear Regression w/ one variable:
Model Representation
Source: many following slides are adapted from Andrew Ng
![Page 46: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/46.jpg)
Alan Ritter ◦ socialmedia-class.org
- m hand-labeled sentence pairs (x(1), y(1)), … , (x(m), y(m))
- 𝞱’s: parameters
#words in common (x)
Sentence Similarity (y)
1 0
4 1
13 4
18 5
… …
hθ (x) = θ0 +θ1x
Linear Regression w/ one variable:
Model Representation
![Page 47: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/47.jpg)
Alan Ritter ◦ socialmedia-class.org Andrew'Ng'
0'
1'
2'
3'
0' 1' 2' 3'0'
1'
2'
3'
0' 1' 2' 3'0'
1'
2'
3'
0' 1' 2' 3'
Linear Regression w/ one variable::
Model Representation
![Page 48: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/48.jpg)
Alan Ritter ◦ socialmedia-class.org
Sent
ence
Sim
ilarit
y (r
ated
by
Hum
an)
0
1
2
3
4
5
#words in common (feature)0 5 10 15 20
• Idea: choose 𝞱0, 𝞱1 so that h𝞱(x) is close to y for training examples (x, y)
Linear Regression w/ one variable:
Cost Function
![Page 49: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/49.jpg)
Alan Ritter ◦ socialmedia-class.org
• Idea: choose 𝞱0, 𝞱1 so that h𝞱(x) is close to y for training examples (x, y)
Sent
ence
Sim
ilarit
y (r
ated
by
Hum
an)
0
1
2
3
4
5
#words in common (feature)0 5 10 15 20
Linear Regression w/ one variable:
Cost Function
![Page 50: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/50.jpg)
Alan Ritter ◦ socialmedia-class.org
• Idea: choose 𝞱0, 𝞱1 so that h𝞱(x) is close to y for training examples (x, y)
Sent
ence
Sim
ilarit
y (r
ated
by
Hum
an)
0
1
2
3
4
5
#words in common (feature)0 5 10 15 20
J(θ0,θ1) =12m
(hθ (x(i ) )− y(i ) )2
i=1
m
∑
minimizeθ0 , θ1
J(θ0,θ1)
squared error function:
Linear Regression w/ one variable:
Cost Function
![Page 51: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/51.jpg)
Alan Ritter ◦ socialmedia-class.org
Linear Regression• Hypothesis:
• Parameters:
• Cost Function:
• Goal:
hθ (x) = θ0 +θ1x
minimizeθ0 , θ1
J(θ0,θ1)
Andrew'Ng'
Hypothesis:'
Parameters:'
Cost'Func6on:'
Goal:'
Simplified'
𝞱0, 𝞱1
J(θ0,θ1) =12m
(hθ (x(i ) )− y(i ) )2
i=1
m
∑
![Page 52: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/52.jpg)
Alan Ritter ◦ socialmedia-class.org
Andrew'Ng'
Hypothesis:'
Parameters:'
Cost'Func6on:'
Goal:'
Simplified'
Linear Regression• Hypothesis:
• Parameters:
• Cost Function:
• Goal:
hθ (x) = θ0 +θ1x
minimizeθ0 , θ1
J(θ0,θ1)
Andrew'Ng'
Hypothesis:'
Parameters:'
Cost'Func6on:'
Goal:'
Simplified'
𝞱0, 𝞱1
hθ (x) = θ1x
J(θ1) =12m
(hθ (x(i ) )− y(i ) )2
i=1
m
∑minimize
θ1J(θ1)
𝞱1
Simplified
J(θ0,θ1) =12m
(hθ (x(i ) )− y(i ) )2
i=1
m
∑
![Page 53: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/53.jpg)
Alan Ritter ◦ socialmedia-class.org
0
1
2
3
0 1 2 3
hθ (x)
y
(for fixed 𝞱1, this is a function of x )
x
J(θ1)(function of the parameter 𝞱1 )
𝞱1=1
Hypothesis Cost Function
Q:
![Page 54: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/54.jpg)
Alan Ritter ◦ socialmedia-class.org
0
1
2
3
0 1 2 3
hθ (x)
y
(for fixed 𝞱1, this is a function of x )
x
J(θ1)(function of the parameter 𝞱1 )
𝞱1=1
Hypothesis Cost Function
![Page 55: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/55.jpg)
Alan Ritter ◦ socialmedia-class.org
0
1
2
3
0 1 2 3
hθ (x)
y1
2
3
0 0.5 1 1.5 2 2.5-0.5 0
(for fixed 𝞱1, this is a function of x )
𝞱1
J(𝞱1)
x
J(θ1)(function of the parameter 𝞱1 )
𝞱1=1
Hypothesis Cost Function
![Page 56: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/56.jpg)
Alan Ritter ◦ socialmedia-class.org
0
1
2
3
0 1 2 3
hθ (x)
y
(for fixed 𝞱1, this is a function of x )
x
J(θ1)(function of the parameter 𝞱1 )
𝞱1=0.5
Q:
![Page 57: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/57.jpg)
Alan Ritter ◦ socialmedia-class.org
0
1
2
3
0 1 2 3
hθ (x)
y1
2
3
0 0.5 1 1.5 2 2.5-0.5 0
(for fixed 𝞱1, this is a function of x )
𝞱1
J(𝞱1)
x
J(θ1)(function of the parameter 𝞱1 )
𝞱1=0.5
![Page 58: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/58.jpg)
Alan Ritter ◦ socialmedia-class.org
0
1
2
3
0 1 2 3
hθ (x)
y1
2
3
0 0.5 1 1.5 2 2.5-0.5 0
(for fixed 𝞱1, this is a function of x )
𝞱1
J(𝞱1)
x
J(θ1)(function of the parameter 𝞱1 )
minimizeθ1
J(θ1)
![Page 59: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/59.jpg)
Alan Ritter ◦ socialmedia-class.org
Andrew'Ng'
Hypothesis:'
Parameters:'
Cost'Func6on:'
Goal:'
Simplified'
Linear Regression• Hypothesis:
• Parameters:
• Cost Function:
• Goal:
hθ (x) = θ0 +θ1x
minimizeθ0 , θ1
J(θ0,θ1)
Andrew'Ng'
Hypothesis:'
Parameters:'
Cost'Func6on:'
Goal:'
Simplified'
𝞱0, 𝞱1
hθ (x) = θ1x
J(θ1) =12m
(hθ (x(i ) )− y(i ) )2
i=1
m
∑minimize
θ1J(θ1)
𝞱1
Simplified
J(θ0,θ1) =12m
(hθ (x(i ) )− y(i ) )2
i=1
m
∑
![Page 60: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/60.jpg)
Alan Ritter ◦ socialmedia-class.org
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x ) (function of the parameter 𝞱0, 𝞱1 )
𝞱1 𝞱0
J(𝞱1, 𝞱2)
J(θ0,θ1)
![Page 61: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/61.jpg)
Alan Ritter ◦ socialmedia-class.org
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )
J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )
Andrew'Ng'
(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'contour plot
![Page 62: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/62.jpg)
Alan Ritter ◦ socialmedia-class.org
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )
J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )
Andrew'Ng'
(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'
![Page 63: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/63.jpg)
Alan Ritter ◦ socialmedia-class.org
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )
J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )
Andrew'Ng'
(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'
![Page 64: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/64.jpg)
Alan Ritter ◦ socialmedia-class.org
Parameter Learning• Have some function
• Want
• Outline:
- Start with some 𝞱0, 𝞱1
- Keep changing 𝞱0, 𝞱1 to reduce J(𝞱1, 𝞱2) until we hopefully end up at a minimum
J(θ0,θ1)minθ0 , θ1
J(θ0,θ1)
![Page 65: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/65.jpg)
Alan Ritter ◦ socialmedia-class.org
1
2
3
0 0.5 1 1.5 2 2.5-0.5 0𝞱1
J(𝞱1)
minimizeθ1
J(θ1)
Gradient DescentSimplified
![Page 66: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/66.jpg)
Alan Ritter ◦ socialmedia-class.org
1
2
3
0 0.5 1 1.5 2 2.5-0.5 0𝞱1
J(𝞱1)
minimizeθ1
J(θ1)
θ1 := θ1 −α∂∂θ1
J(θ1)
learning rate
Gradient DescentSimplified
![Page 67: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/67.jpg)
Alan Ritter ◦ socialmedia-class.org
Andrew'Ng'
θ1"θ0"
J(θ0,θ1)"
Gradient Descent
minimizeθ0 ,θ1
J(θ0,θ1)
![Page 68: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/68.jpg)
Alan Ritter ◦ socialmedia-class.org
Andrew'Ng'
θ0"
θ1"
J(θ0,θ1)"
Gradient Descent
minimizeθ0 ,θ1
J(θ0,θ1)
![Page 69: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/69.jpg)
Alan Ritter ◦ socialmedia-class.org
repeat until convergence {
}
Gradient Descent
θ j := θ j −α∂∂θ j
J(θ0,θ1) simultaneous updatefor j=0 and j=1
learning rate
![Page 70: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/70.jpg)
Alan Ritter ◦ socialmedia-class.org
Linear Regression w/ one variable:
Gradient Descent
∂∂θ0
J(θ0,θ1) = ?∂∂θ1
J(θ0,θ1) = ?
hθ (x) = θ0 +θ1x
J(θ0,θ1) =12m
(hθ (x(i ) )− y(i ) )2
i=1
m
∑Cost Function
![Page 71: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/71.jpg)
Alan Ritter ◦ socialmedia-class.org
repeat until convergence {
}
θ0 := θ0 −α1m
(hθ (x(i ) )− y(i ) )
i=1
m
∑ simultaneous update 𝞱0, 𝞱1
Linear Regression w/ one variable:
Gradient Descent
θ1 := θ1 −α1m
(hθ (x(i ) )− y(i ) )
i=1
m
∑ ⋅ x(i )
![Page 72: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/72.jpg)
Alan Ritter ◦ socialmedia-class.org
𝞱1 𝞱0
J(𝞱1, 𝞱2)
Linear Regressioncost function is convex
![Page 73: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/73.jpg)
Alan Ritter ◦ socialmedia-class.org
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )
J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )
Andrew'Ng'
(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'
![Page 74: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/74.jpg)
Alan Ritter ◦ socialmedia-class.org Andrew'Ng'
(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )
J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )
![Page 75: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/75.jpg)
Alan Ritter ◦ socialmedia-class.org Andrew'Ng'
(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )
J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )
![Page 76: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/76.jpg)
Alan Ritter ◦ socialmedia-class.org
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )
J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )
Andrew'Ng'
(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'
![Page 77: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/77.jpg)
Alan Ritter ◦ socialmedia-class.org
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )
J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )
Andrew'Ng'
(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'
![Page 78: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/78.jpg)
Alan Ritter ◦ socialmedia-class.org
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )
J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )
Andrew'Ng'
(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'
![Page 79: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/79.jpg)
Alan Ritter ◦ socialmedia-class.org
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )
J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )
Andrew'Ng'
(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'
![Page 80: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/80.jpg)
Alan Ritter ◦ socialmedia-class.org
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )
J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )
Andrew'Ng'
(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'
![Page 81: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/81.jpg)
Alan Ritter ◦ socialmedia-class.org
hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )
J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )
Andrew'Ng'
(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'
![Page 82: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/82.jpg)
Alan Ritter ◦ socialmedia-class.org
Batch Gradient Descent
• Each step of gradient descent uses all the training examples
J(θ0,θ1) =12m
(hθ (x(i ) )− y(i ) )2
i=1
m
∑Cost Function
![Page 83: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/83.jpg)
Alan Ritter ◦ socialmedia-class.org
(Recap)
Linear Regression
• also supervised learning (learn from annotated data)
• but for Regression: predict real-valued output (Classification: predict discrete-valued output)
Sent
ence
Sim
ilarit
y
0
1
2
3
4
5
#words in common (feature)0 5 10 15 20
threshold ➞ Classification
![Page 84: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/84.jpg)
Alan Ritter ◦ socialmedia-class.org
(Recap)
Linear Regression
• also supervised learning (learn from annotated data)
• but for Regression: predict real-valued output (Classification: predict discrete-valued output)
Sent
ence
Sim
ilarit
y
0
1
2
3
4
5
#words in common (feature)0 5 10 15 20
threshold ➞ Classification
paraphrase{
![Page 85: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/85.jpg)
Alan Ritter ◦ socialmedia-class.org
(Recap)
Linear Regression
• also supervised learning (learn from annotated data)
• but for Regression: predict real-valued output (Classification: predict discrete-valued output)
Sent
ence
Sim
ilarit
y
0
1
2
3
4
5
#words in common (feature)0 5 10 15 20
threshold ➞ Classification
paraphrase
non-paraphrase
{{
![Page 86: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/86.jpg)
Alan Ritter ◦ socialmedia-class.org
𝞱1 𝞱0
J(𝞱1, 𝞱2)
(Recap)
Linear Regression• Hypothesis:
• Parameters:
• Cost Function:
• Goal:
hθ (x) = θ0 +θ1x
minimizeθ0 , θ1
J(θ0,θ1)
𝞱0, 𝞱1
J(θ0,θ1) =12m
(hθ (x(i ) )− y(i ) )2
i=1
m
∑
![Page 87: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/87.jpg)
Alan Ritter ◦ socialmedia-class.org
repeat until convergence {
}
θ j := θ j −α∂∂θ j
J(θ0,θ1) (simultaneous update for j=0 and j=1)
learning rate
(Recap)
Gradient Descent
![Page 88: Social Media & Text Analysissocialmedia-class.org/slides/lecture7_paraphrase... · 2020-03-24 · Alan Ritter socialmedia-class.org Social Media & Text Analysis lecture 7 - Paraphrase](https://reader034.vdocuments.us/reader034/viewer/2022042923/5f7110df5aa8cd20b265652a/html5/thumbnails/88.jpg)
Alan Ritter ◦ socialmedia-class.org
socialmedia-class.org
Next Class: - Logistic Regression