Discovering Context: Classifying tweets through a semantic transform based on Wikipedia
Yegin Genc, Yasuaki Sakamoto, and Jeffrey V. Nickerson

Uploaded by yegin-genc on 09-Jul-2015


DESCRIPTION

Presented at HCII2011

TRANSCRIPT

Page 1: Discovering Context

Discovering Context: Classifying tweets through a semantic transform based on Wikipedia

Yegin Genc, Yasuaki Sakamoto, and Jeffrey V. Nickerson

Page 2: Discovering Context

"So I'm told by a reputable person they have killed Osama Bin Laden. …"

“I hate how my phone has this stupid … spell check …”

Twitter can function as a large sensor system and increase our awareness of our surroundings.

Page 3: Discovering Context

Discovering Context: Classifying tweets through a semantic transform based on Wikipedia

Why classify?

Page 4: Discovering Context

"So I'm told by a reputable person they have killed Osama Bin Laden. …"

“I hate how my phone has this stupid … spell check …”

Terrorism (?) → important

Irritating technology → NOT IMPORTANT

Page 5: Discovering Context

How to classify?

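message m1 → transform → T(m1)

message m2 → transform → T(m2)

$d(m_1, m_2) \propto d(T(m_1), T(m_2))$

The distance between two messages is taken to be proportional to the distance between their transformed representations.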
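The transform-then-measure idea can be sketched as a higher-order function: given any transform T and any distance d on transformed values, it returns a distance on raw messages. This is a minimal sketch; the bag-of-words transform and the Jaccard distance in the example are illustrative stand-ins, not the Wikipedia transform described in the talk.

```python
from typing import Callable, FrozenSet, TypeVar

M = TypeVar("M")  # raw message type (e.g. a tweet string)
R = TypeVar("R")  # transformed representation type


def distance_via_transform(
    T: Callable[[M], R], d: Callable[[R, R], float]
) -> Callable[[M, M], float]:
    """Build a message distance defined as d(T(m1), T(m2))."""

    def message_distance(m1: M, m2: M) -> float:
        return d(T(m1), T(m2))

    return message_distance


# Illustrative stand-ins only: a bag-of-words transform and Jaccard distance.
def bag_of_words(message: str) -> FrozenSet[str]:
    return frozenset(message.lower().split())


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


tweet_distance = distance_via_transform(bag_of_words, jaccard_distance)
print(tweet_distance("Apple unveils iPad", "iPad unveils Apple"))  # 0.0
print(tweet_distance("Apple unveils iPad", "Aid for Haiti"))       # 1.0
```

Because T is a parameter, the same wrapper works whether T maps a tweet to a word set, a vector, or a Wikipedia page.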

Page 6: Discovering Context

A Two-Step Approach

STEP 1: FINDING WIKI PAGES
Tweet 1, Tweet 2, Tweet 3, ..., Tweet n are each mapped to a Wikipedia page: Wiki Page 1 (WP1), Wiki Page 2 (WP2), Wiki Page 3 (WP3), ..., Wiki Page n (WPn).

STEP 2: CALCULATING DISTANCE
Pairwise distances between the pages (d12, d13, d1n, d2n, d32, d3n, ...) are calculated.
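The two steps combine into a tweet-by-tweet distance matrix. In this sketch `find_page` (Step 1) and `page_distance` (Step 2) are hypothetical callables passed in by the caller; the toy lookup tables in the example are invented for illustration. Every off-diagonal cell is computed separately, so a non-symmetric page distance (d12 differing from d21) is also supported.

```python
from typing import Callable, List, Sequence


def tweet_distance_matrix(
    tweets: Sequence[str],
    find_page: Callable[[str], str],
    page_distance: Callable[[str, str], float],
) -> List[List[float]]:
    """Two-step distance matrix: map each tweet to a page, then compare pages."""
    # Step 1: map every tweet to a single Wikipedia page title.
    pages = [find_page(t) for t in tweets]
    n = len(pages)
    D = [[0.0] * n for _ in range(n)]
    # Step 2: fill every off-diagonal cell; d_ij and d_ji are computed
    # separately, so a non-symmetric page distance also works.
    for i in range(n):
        for j in range(n):
            if i != j:
                D[i][j] = page_distance(pages[i], pages[j])
    return D


# Hypothetical stand-ins for the two steps, for illustration only.
toy_pages = {"tweet a": "WP1", "tweet b": "WP1", "tweet c": "WP2"}
toy_dist = {("WP1", "WP1"): 0.0, ("WP1", "WP2"): 3.0, ("WP2", "WP1"): 3.0}

D = tweet_distance_matrix(
    ["tweet a", "tweet b", "tweet c"],
    find_page=toy_pages.__getitem__,
    page_distance=lambda p, q: toy_dist[(p, q)],
)
print(D)
```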

Page 7: Discovering Context

Step – 1: Finding Wiki Pages

Tweet 1 Word-Set (WS) = {Word11, Word12, Word13, ..., Word1n}

Each word yields its own candidate pages: Candidate Pages (word11), Candidate Pages (word12), Candidate Pages (word13), ..., Candidate Pages (word1n).

Wiki Page 1 = the candidate page with the maximum overlap between WS and the candidate page (CP) content.

Page 8: Discovering Context

Tweet: RT ashajayy Rest in peace JD Salinger Catcher in the Rye is one of my absolute favourite books Sad day

Words: Rest, peace, JD, Salinger, Catcher, Rye, absolute, favourite, books, Sad, day

Candidate Pages                            Hits
//en.wikipedia.org/wiki/J.D._Salinger      290
//en.wikipedia.org/wiki/J._D._Salinger     289
//en.wikipedia.org/wiki/books              145
//en.wikipedia.org/wiki/Doris_Day          138
//en.wikipedia.org/wiki/peace              131
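Step 1 (collect candidate pages for every word, then keep the page whose content overlaps most with the word set) can be sketched as below. This is a sketch under assumptions: `candidate_pages` and `page_words` are hypothetical lookups standing in for a Wikipedia search and a page-content fetch, overlap is taken as the count of shared words, and ties are broken alphabetically. The talk does not say how the "Hits" column was computed, and the toy tables here are invented, not real Wikipedia data.

```python
from typing import Callable, Iterable, Optional, Set


def find_wiki_page(
    words: Iterable[str],
    candidate_pages: Callable[[str], Iterable[str]],
    page_words: Callable[[str], Set[str]],
) -> Optional[str]:
    """Pick the candidate page whose content overlaps most with the tweet's word set."""
    ws = {w.lower() for w in words}             # tweet word set (WS)
    candidates: Set[str] = set()
    for w in ws:                                # candidate pages for each word
        candidates.update(candidate_pages(w))
    best: Optional[str] = None
    best_overlap = -1
    for page in sorted(candidates):             # sorted: deterministic tie-breaking
        overlap = len(ws & page_words(page))    # shared words between WS and the page
        if overlap > best_overlap:
            best, best_overlap = page, overlap
    return best


# Hypothetical toy lookups, for illustration only.
toy_candidates = {"salinger": ["J.D._Salinger"], "day": ["Doris_Day"], "peace": ["Peace"]}
toy_content = {
    "J.D._Salinger": {"salinger", "catcher", "rye", "books"},
    "Doris_Day": {"day"},
    "Peace": {"peace"},
}
words = ["Rest", "peace", "JD", "Salinger", "Catcher", "Rye",
         "absolute", "favourite", "books", "Sad", "day"]
page = find_wiki_page(words, lambda w: toy_candidates.get(w, []), toy_content.__getitem__)
print(page)  # J.D._Salinger
```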

Page 9: Discovering Context

Step – 2: Calculating the Distance

Figure: the links of Wiki Page 1 (WP1) and Wiki Page 2 (WP2) expanded level by level (L1, L2, L3). A path connecting the two link trees (WP1 L3, WP2 L2) is counted in steps 1, 2, 3, giving d12 = 3.
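A link-based page distance like the one in the figure can be sketched as a breadth-first search that counts link hops. Assumptions not confirmed by the slides: the distance is the shortest number of hops, the search runs from one page only (the figure expands links from both pages), and `max_depth=3` mirrors levels L1 to L3. The `links` dictionary is a hypothetical stand-in for real Wikipedia link data.

```python
from collections import deque
from typing import Dict, Iterable, Optional


def link_distance(
    start: str, goal: str, links: Dict[str, Iterable[str]], max_depth: int = 3
) -> Optional[int]:
    """Shortest number of link hops from start to goal, or None beyond max_depth."""
    if start == goal:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])          # (page, hops from start)
    while frontier:
        page, depth = frontier.popleft()
        if depth == max_depth:              # do not expand past the last level
            continue
        for nxt in links.get(page, ()):
            if nxt == goal:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


# Hypothetical toy link graph: WP1 -> A -> B -> WP2.
toy_links = {"WP1": ["A"], "A": ["B"], "B": ["WP2"]}
print(link_distance("WP1", "WP2", toy_links))  # 3
```

Searching from both pages at once (bidirectional BFS) would explore fewer pages on the real link graph, which is closer to what the figure depicts.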

Page 10: Discovering Context

Method

Tweets: T1 (Topic 1), T2 (Topic 1), T3 (Topic 2), ...

Distance Matrix
      T1    T2    T3
T1    0     d12   d13
T2    d21   0     d23
T3    d31   d32   0

MDS coordinates
      X     Y
T1    t1x   t1y
T2    t2x   t2y
T3    t3x   t3y

Discriminant Analysis → Accuracy Rate

The pipeline is run once per distance measure: SED yields D_SED and Acc. SED, LSA yields D_LSA and Acc. LSA, and Wikipedia yields D_WIKI and Acc. WIKI.
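The pipeline above (distance matrix, then MDS to two coordinates, then discriminant analysis scored by accuracy) can be sketched with NumPy. This is a minimal sketch under assumptions the slides do not confirm: classical (Torgerson) MDS, a linear discriminant with a pooled covariance matrix, and accuracy measured on the same points the classifier was fitted to. The six-point data set is a toy example, not the tweet data.

```python
import numpy as np


def classical_mds(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: n x n distance matrix -> n x k coordinates."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]           # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def lda_fit(X: np.ndarray, y: np.ndarray):
    """Linear discriminant analysis with a pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    resid = X - means[np.searchsorted(classes, y)]
    cov = resid.T @ resid / (len(X) - len(classes))
    log_priors = np.log([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.pinv(cov), log_priors


def lda_predict(model, X: np.ndarray) -> np.ndarray:
    classes, means, cov_inv, log_priors = model
    # Linear score per class: x' S^-1 mu - 0.5 mu' S^-1 mu + log(prior)
    scores = (X @ cov_inv @ means.T
              - 0.5 * np.sum(means @ cov_inv * means, axis=1)
              + log_priors)
    return classes[np.argmax(scores, axis=1)]


# Toy example: six points in two well-separated groups, labelled by topic.
pts = np.array([[0, 0], [1, 0], [0, 1], [10, 0], [11, 0], [10, 1]], dtype=float)
labels = np.array(["topic1"] * 3 + ["topic2"] * 3)
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

coords = classical_mds(D, k=2)
model = lda_fit(coords, labels)
accuracy = np.mean(lda_predict(model, coords) == labels)
print(accuracy)
```

Feeding in the SED, LSA, or Wikipedia distance matrix in place of the toy D would give one accuracy rate per measure; scikit-learn's `LinearDiscriminantAnalysis` could replace the hand-written classifier.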

Page 11: Discovering Context

Other Techniques

String Edit Distance (SED)

Minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into the other

kitten → sitten (substitution of 's' for 'k')

SED = 1
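The definition on this slide matches the Levenshtein distance. Assuming that is the variant used (the talk does not say, nor how it was normalised or applied to whole tweets), a standard dynamic-programming sketch keeps only one row of the edit table at a time:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))           # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                 # delete ca
                curr[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),    # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitten"))   # 1
print(levenshtein("kitten", "sitting"))  # 3
```

Because it counts characters, SED judges "iPad" and "Apple" as unrelated strings even when two tweets share a topic, which is consistent with its weak iPad score in the results.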

Latent Semantic Analysis (LSA)

Natural language processing technique that derives the similarity of documents from term occurrences, using a singular value decomposition of the term-document matrix
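A minimal LSA sketch: build a term-document count matrix, keep the top-k singular dimensions, and compare documents by cosine distance in that reduced space. Assumptions: raw counts with no tf-idf or other weighting, whitespace tokenisation, and k=2; the talk does not give its LSA settings, and the four short documents are invented examples.

```python
import numpy as np


def term_document_matrix(docs):
    """Raw term counts: rows are terms, columns are documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[row[w], j] += 1
    return A, vocab


def lsa_document_vectors(A, k=2):
    """Project documents onto the top-k singular dimensions (rank-k LSA space)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T      # one k-dimensional row per document


def cosine_distance(u, v):
    return 1.0 - float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


docs = ["apple ipad tablet", "apple unveils ipad", "aid haiti relief", "aid for haiti now"]
A, vocab = term_document_matrix(docs)
V = lsa_document_vectors(A, k=2)
print(cosine_distance(V[0], V[1]), cosine_distance(V[0], V[2]))
```

The two iPad documents land on the same direction (distance near 0), while an iPad and a Haiti document end up orthogonal (distance near 1). The resulting pairwise distances could fill the LSA distance matrix used in the method pipeline.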

Page 12: Discovering Context

Data

Without noise:
Category          Count
J.D. Salinger     15
iPad              15
Haiti             15
TOTAL             45

With noise:
Category          Count
J.D. Salinger     15
iPad              15
Haiti             15
Random            55
TOTAL             100

Sample tweets:

RT @ashajayy Rest in peace, JD Salinger. Catcher in the Rye is one of my absolute favourite books. Sad day.

@JMNelis I fear I may have killed him because I talked about how I hate "Catcher." (1/2)

'Catcher In The Rye' Author J.D. Salinger Dies At 91 - The author of The Catcher in the Rye died of natural causes,... http//ow.ly/16rETF

What Yall think about me buying a whole bunch of sour patch kids and giving them to haiti i bet they would be HAPPY!

Please ReTweet (http//caltweet.com/4gx ) - Lets ALL really AID Haiti

RT @UNC_Health_Care Video Want to help the #Haitian patients at #UNC Hospitals? Here's how. http//bit…

@Alitas_Way naw im kiddin but ma'am it really looks great on u

Please come to our Legal Studies Open House on Tuesday February 2nd from 6-730pm.Please call for exact location and to RSVP …

Most impressive stat for Warner is he holds the top 3 most passing yards in a superbowl. Three games three most passing yards in 40

iPad..not so appealing to me (Yet!) It's basically the MacBook&iPhone combined.I have both so don't think i'll be getting the iPad soon.

Have u seen it?Apple iPad Tablet Steve Jobs Unveils Visionary Computer http//bit.ly/9IslTP

The new Apple formula Hype

Page 13: Discovering Context

Tweets without noise: accuracy rate by category

Technique                  J. D. Salinger   iPad   Haiti
String Edit Distance       .67              .13    .60
Latent Semantic Analysis   .67              .73    .80
Wikipedia                  .93              .87    .80

Figure: MDS plots (Coordinate 1 vs. Coordinate 2) for SED, LSA, and Wiki distances; legend: J.D. Salinger, iPad, Haiti.

Page 14: Discovering Context

Tweets with noise: accuracy rate by category

Technique                  J. D. Salinger   iPad   Haiti
Latent Semantic Analysis   .60              .60    .20
Wikipedia                  .93              .87    .73

Figure: MDS plots (Coordinate 1 vs. Coordinate 2) for SED, LSA, and Wiki distances; legend: J.D. Salinger, iPad, Haiti, Random.

Page 15: Discovering Context

Conclusion

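The Wikipedia space shows promising results in measuring the similarity of short texts:

– Socially constructed

– Large space

– Robust to noise (with random tweets added, Wikipedia accuracy was unchanged for J.D. Salinger and iPad and fell only from .80 to .73 for Haiti)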

Page 16: Discovering Context

Future Work

• Adaptive classification

– What we treat as noise may contain useful information, depending on the context

• Improved mapping and distance calculations

• Utilizing other social aspects of Wikipedia

Page 17: Discovering Context

Thank you!

Q&A