Statistical Models of (Social) Networks
Andrew McCallum
Computer Science Department
University of Massachusetts Amherst
Joint work with
Xuerui Wang, Natasha Mohanty, Andres Corrada
Workplace effectiveness ~ Ability to leverage network of acquaintances
But filling Contacts DB by hand is tedious, and incomplete.
[Figure: Email Inbox + WWW → Contacts DB, filled automatically]
Managing and Understanding Connections of People in our Email World
System Overview
[Diagram components: Email; CRF; WWW; names; Contact Info and Person Name Extraction; Person Name Extraction; Name Coreference; Homepage Retrieval; Social Network Analysis; Keyword Extraction]
An Example
To: “Andrew McCallum” [email protected]
Subject ...
First Name: Andrew
Middle Name: Kachites
Last Name: McCallum
JobTitle: Associate Professor
Company: University of Massachusetts
Street Address: 140 Governor’s Dr.
City: Amherst
State: MA
Zip: 01003
Company Phone: (413) 545-1323
Links: Fernando Pereira, Sam Roweis,…
Key Words: Information extraction, social network,…
Search for new people
Summary of Results
Example keywords extracted

| Person | Keywords |
|---|---|
| William Cohen | Logic programming; Text categorization; Data integration; Rule learning |
| Daphne Koller | Bayesian networks; Relational models; Probabilistic models; Hidden variables |
| Deborah McGuinness | Semantic web; Description logics; Knowledge representation; Ontologies |
| Tom Mitchell | Machine learning; Cognitive states; Learning apprentice; Artificial intelligence |

Contact info and name extraction performance (25 fields)

| | Token Acc | Field Prec | Field Recall | Field F1 |
|---|---|---|---|---|
| CRF | 94.50 | 85.73 | 76.33 | 80.76 |
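As a quick consistency check on the CRF results for this slide, the field-level F1 can be recomputed from the reported field precision (85.73) and recall (76.33). This is a minimal sketch; the function name `f1_score` is illustrative and not from the talk.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Field precision and recall as reported for the CRF on this slide.
field_f1 = f1_score(85.73, 76.33)
print(round(field_f1, 2))  # 80.76
```

The recomputed value matches the reported Field F1.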
1. Expert Finding: When solving some task, find friends-of-friends with relevant expertise. Avoid “stove-piping” in large org’s by automatically suggesting collaborators. Given a task, automatically suggest the right team for the job. (Hiring aid!)
2. Social Network Analysis: Understand the social structure of your organization. Suggest structural changes for improved efficiency.
Outline
• Social Network Analysis with (Language) Attributes
– Roles and Topics (Author-Recipient-Topic Model)
– Groups and Topics (Group-Topic Model)
• Demo: Rexa, a Web portal for researchers
Clustering words into topics with Latent Dirichlet Allocation
[Blei, Ng, Jordan 2003]
Generative Process:
For each document:
  Sample a distribution over topics (Example: 70% Iraq war, 30% US election)
  For each word in doc:
    Sample a topic, z (Example: Iraq war)
    Sample a word from the topic, w (Example: “bombing”)
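The generative process on this slide can be sketched in a few lines of Python. This is an illustrative sketch only: the vocabulary, the two topic-word tables, and the symmetric Dirichlet parameter are made-up assumptions, and the helper names are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw one index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(topic_word_probs, vocab, alpha, n_words, rng):
    # Per document: sample a distribution over topics.
    theta = sample_dirichlet(alpha, rng)
    tokens = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)                # sample a topic, z
        w = sample_categorical(topic_word_probs[z], rng)  # sample a word, w
        tokens.append((z, vocab[w]))
    return theta, tokens


# Hypothetical toy vocabulary and topics, loosely echoing the slide's example.
vocab = ["bombing", "troops", "ballot", "campaign"]
topic_word_probs = [
    [0.5, 0.5, 0.0, 0.0],  # assumed "Iraq war" topic
    [0.0, 0.0, 0.5, 0.5],  # assumed "US election" topic
]
rng = random.Random(0)
theta, tokens = generate_document(topic_word_probs, vocab, [1.0, 1.0], 8, rng)
print(theta, tokens)
```

Each token carries its own topic draw, which is what lets a single document mix, say, 70% of one topic with 30% of another.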
Example topics induced from a large collection of text
[Tenenbaum et al]
• STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
• MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
• WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER
• DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN
• FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
• SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
• BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
• JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE
From LDA to Author-Recipient-Topic (ART)
Inference and Estimation
Gibbs Sampling:
- Easy to implement
- Reasonably fast
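To illustrate why Gibbs sampling is considered easy to implement for this model family, here is a minimal collapsed Gibbs sampler for plain LDA. It is a sketch under stated assumptions, not the ART sampler from the talk (which is not shown on the slide): the hyperparameter values, the function name `gibbs_lda`, and the toy corpus of word ids are all hypothetical.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                              # tokens per topic
    assignments = []

    # Random initialization of every token's topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional for each topic given all other tokens.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights, k=1)[0]
                # Record the new assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return assignments, doc_topic, topic_word


# Hypothetical toy corpus: word ids 0-1 and 2-3 tend to co-occur.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
assignments, doc_topic, topic_word = gibbs_lda(docs, n_topics=2, vocab_size=4)
print(doc_topic)
```

The whole sampler is count bookkeeping plus one weighted draw per token, which is the sense in which the method is simple to implement.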
Enron Email Corpus
• 250k email messages
• 23k people
Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: [email protected]
To: [email protected]
Subject: Enron/TransAlta Contract dated Jan 1, 2001
Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions.
DP
Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas [email protected]
Topics, and prominent senders / receivers discovered by ART
Topic names, by hand
Topics, and prominent senders / receivers discovered by ART
Beck = “Chief Operations Officer”
Dasovich = “Government Relations Executive”
Shapiro = “Vice President of Regulatory Affairs”
Steffes = “Vice President of Government Affairs”
Comparing Role Discovery
connection strength (A,B) =
distribution over authored topics
Traditional SNA
distribution over recipients
distribution over authored topics
Author-Topic
ART
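When each person is represented by a distribution (over recipients or over topics), role similarity between two people can be scored by comparing their distributions. The slide does not say which measure was used; the sketch below assumes Jensen-Shannon divergence, one common choice, and the three topic distributions are made-up values for hypothetical people.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical, 1 for disjoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical per-person topic distributions (not taken from the Enron results).
person_a = [0.7, 0.2, 0.1]
person_b = [0.6, 0.3, 0.1]
person_c = [0.1, 0.1, 0.8]

print(js_divergence(person_a, person_b))  # small: similar roles
print(js_divergence(person_a, person_c))  # larger: different roles
```

A lower divergence means the two people look more alike under the chosen representation, so the same function can be applied to recipient distributions or to topic distributions.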
Comparing Role Discovery: Tracy Geaconne, Dan McCarty

| Traditional SNA | Author-Topic | ART |
|---|---|---|
| Similar roles | Different roles | Different roles |

Geaconne = “Secretary”
McCarty = “Vice President”
Comparing Role Discovery: Tracy Geaconne, Rod Hayslett

| Traditional SNA | Author-Topic | ART |
|---|---|---|
| Different roles | Very similar | Not very similar |

Geaconne = “Secretary”
Hayslett = “Vice President & CTO”
Comparing Role Discovery: Lynn Blair, Kimberly Watson

| Traditional SNA | Author-Topic | ART |
|---|---|---|
| Different roles | Very different | Very similar |

Blair = “Gas pipeline logistics”
Watson = “Pipeline facilities planning”
McCallum Email Corpus 2004
• January - October 2004
• 23k email messages
• 825 people
From: [email protected]
Subject: NIPS and ....
Date: June 14, 2004 2:27:41 PM EDT
To: [email protected]
There is pertinent stuff on the first yellow folder that is completed either travel or other things, so please sign that first folder anyway. Then, here is the reminder of the things I'm still waiting for:
NIPS registration receipt.
CALO registration receipt.
Thanks,
Kate
McCallum Email Blockstructure
Four most prominent topics in discussions with ____?
Two most prominent topics in discussions with ____?

| Words | Prob |
|---|---|
| love | 0.030514 |
| house | 0.015402 |
| | 0.013659 |
| time | 0.012351 |
| great | 0.011334 |
| hope | 0.011043 |
| dinner | 0.00959 |
| saturday | 0.009154 |
| left | 0.009154 |
| ll | 0.009009 |
| | 0.008282 |
| visit | 0.008137 |
| evening | 0.008137 |
| stay | 0.007847 |
| bring | 0.007701 |
| weekend | 0.007411 |
| road | 0.00712 |
| sunday | 0.006829 |
| kids | 0.006539 |
| flight | 0.006539 |

| Words | Prob |
|---|---|
| today | 0.051152 |
| tomorrow | 0.045393 |
| time | 0.041289 |
| ll | 0.039145 |
| meeting | 0.033877 |
| week | 0.025484 |
| talk | 0.024626 |
| meet | 0.023279 |
| morning | 0.022789 |
| monday | 0.020767 |
| back | 0.019358 |
| call | 0.016418 |
| free | 0.015621 |
| home | 0.013967 |
| won | 0.013783 |
| day | 0.01311 |
| hope | 0.012987 |
| leave | 0.012987 |
| office | 0.012742 |
| tuesday | 0.012558 |
Pairs with highest rank difference between ART & SNA
5 other professors
3 other ML researchers
Role-Author-Recipient-Topic Models
Results with RART: People in “Role #3” in Academic Email
• olc: lead Linux sysadmin
• gauthier: sysadmin for CIIR group
• irsystem: mailing list CIIR sysadmins
• system: mailing list for dept. sysadmins
• allan: Prof., chair of “computing committee”
• valerie: second Linux sysadmin
• tech: mailing list for dept. hardware
• steve: head of dept. I.T. support
Roles for allan (James Allan)
• Role #3 I.T. support
• Role #2 Natural Language researcher
Roles for pereira (Fernando Pereira)
• Role #2 Natural Language researcher
• Role #4 SRI CALO project participant
• Role #6 Grant proposal writer
• Role #10 Grant proposal coordinator
• Role #8 Guests at McCallum’s house
ART: Roles but not Groups
Enron TransWestern Division

| Traditional SNA | Author-Topic | ART |
|---|---|---|
| Block structured | Not | Not |
Outline
• Social Network Analysis with (Language) Attributes
– Roles and Topics (Author-Recipient-Topic Model)
– Groups and Topics (Group-Topic Model)
• Demo: Rexa, a Web portal for researchers
Groups and Topics
• Input:
– Observed relations between people
– Attributes on those relations (text, or categorical)
• Output:
– Attributes clustered into “topics”
– Groups of people, varying depending on topic
Discovering Groups from Observed Set of Relations
Admiration relations among six high school students.
Student Roster
Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration
Acad(A, B) Acad(C, B)
Acad(A, D) Acad(C, D)
Acad(B, E) Acad(D, E)
Acad(B, F) Acad(D, F)
Acad(E, A) Acad(F, A)
Acad(E, C) Acad(F, C)
Adjacency Matrix Representing Relations
[Figure: 6×6 adjacency matrix of the Academic Admiration relation over students A–F. In roster order (A B C D E F) the group labels are G1 G2 G1 G2 G3 G3; reordering rows and columns by group (A C B D E F; G1 G1 G2 G2 G3 G3) exposes the block structure.]
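The adjacency-matrix view of the Academic Admiration relation can be reproduced with a short sketch. The relation pairs and the group labels (A, C in G1; B, D in G2; E, F in G3) are taken from the slides; the function and variable names are illustrative.

```python
students = ["A", "B", "C", "D", "E", "F"]
acad = [
    ("A", "B"), ("C", "B"), ("A", "D"), ("C", "D"),
    ("B", "E"), ("D", "E"), ("B", "F"), ("D", "F"),
    ("E", "A"), ("F", "A"), ("E", "C"), ("F", "C"),
]
groups = {"A": "G1", "B": "G2", "C": "G1", "D": "G2", "E": "G3", "F": "G3"}


def adjacency(order, relations):
    """Return a 0/1 matrix with rows = admirer, columns = admired, in the given order."""
    index = {name: i for i, name in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for src, dst in relations:
        matrix[index[src]][index[dst]] = 1
    return matrix


# Sorting rows and columns by group label puts members of a group side by side.
by_group = sorted(students, key=lambda s: (groups[s], s))
for name, row in zip(by_group, adjacency(by_group, acad)):
    print(name, groups[name], " ".join(map(str, row)))
```

In the reordered matrix every G1 row admires exactly the G2 columns, every G2 row the G3 columns, and every G3 row the G1 columns, so the ones form solid blocks.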
Group Model: Partitioning Entities into Groups
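Stochastic Blockstructures for Relations [Nowicki, Snijders 2001]
[Plate diagram: relations v (plate S²), group assignments g (plate S), group-pair parameters γ (plate G²), hyperparameters α and β; distributions shown: Beta, Dirichlet, Binomial, Multinomial]
S: number of entities
G: number of groups
Enhanced with arbitrary number of groups in [Kemp, Griffiths, Tenenbaum 2004]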
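A generative reading of the group model can be sketched as follows. The pairing of distributions with variables is an assumption based on the labels in the diagram (Dirichlet and Multinomial for group assignments, Beta and Binomial for relations); the hyperparameter values and the function name `sample_blockmodel` are hypothetical.

```python
import random


def sample_blockmodel(n_entities, n_groups, alpha=1.0, beta=(1.0, 1.0), seed=0):
    """Sample groups and a binary relation matrix from a simple stochastic blockmodel."""
    rng = random.Random(seed)

    # Group proportions from a symmetric Dirichlet (normalized Gamma draws).
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_groups)]
    total = sum(draws)
    proportions = [d / total for d in draws]

    # One group assignment per entity (S draws from a Multinomial).
    groups = rng.choices(range(n_groups), weights=proportions, k=n_entities)

    # One relation probability per ordered pair of groups (G x G Beta draws).
    gamma = [[rng.betavariate(*beta) for _ in range(n_groups)] for _ in range(n_groups)]

    # One binary relation per ordered pair of entities (S x S Bernoulli draws).
    relations = [
        [1 if rng.random() < gamma[groups[i]][groups[j]] else 0 for j in range(n_entities)]
        for i in range(n_entities)
    ]
    return groups, gamma, relations


groups, gamma, relations = sample_blockmodel(n_entities=6, n_groups=3)
print(groups)
for row in relations:
    print(row)
```

Inference runs this story in reverse: given only the relation matrix, recover the group assignments that best explain it.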
Two Relations with Different Attributes
Student Roster
Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration
Acad(A, B) Acad(C, B)
Acad(A, D) Acad(C, D)
Acad(B, E) Acad(D, E)
Acad(B, F) Acad(D, F)
Acad(E, A) Acad(F, A)
Acad(E, C) Acad(F, C)
Social Admiration
Soci(A, B) Soci(A, D) Soci(A, F)
Soci(B, A) Soci(B, C) Soci(B, E)
Soci(C, B) Soci(C, D) Soci(C, F)
Soci(D, A) Soci(D, C) Soci(D, E)
Soci(E, B) Soci(E, D) Soci(E, F)
Soci(F, A) Soci(F, C) Soci(F, E)
[Figure: two adjacency matrices. Academic Admiration, ordered A C B D E F, with groups G1 G1 G2 G2 G3 G3; Social Admiration, ordered A C E B D F, with groups G1 G1 G1 G2 G2 G2.]
Goal: Model relations and their (textual) attributes simultaneously to obtain better groups and more meaningful topics.
budget, funding, annual, cash
document, corrections, review, annual
The Group-Topic Model: Discovering Groups and Topics Simultaneously
[Plate diagram: the group model (relations v on plate S², group assignments g on plate S, group-pair parameters γ on plate G², hyperparameters α and β; Beta, Dirichlet, Binomial, Multinomial) combined with a topic model (topic t, words w on plate N_b, plates B and T, parameters φ and η; Dirichlet, Multinomial, Uniform). The group-side variables sit inside the T plate.]
Inference and Estimation
Gibbs Sampling:
- Many r.v.s can be integrated out
- Easy to implement
- Reasonably fast
We assume the relationship is symmetric.
Dataset #1: U.S. Senate
• 16 years of voting records in the US Senate (1989 – 2005)
• a Senator may respond Yea or Nay to a resolution
• 3423 resolutions with text attributes (index terms)
• 191 Senators in total across 16 years
S.543
Title: An Act to reform Federal deposit insurance, protect the deposit insurance funds, recapitalize the Bank Insurance Fund, improve supervision and regulation of insured depository institutions, and for other purposes.
Sponsor: Sen Riegle, Donald W., Jr. [MI] (introduced 3/5/1991)
Cosponsors (2)
Latest Major Action: 12/19/1991 Became Public Law No: 102-242.
Index terms: Banks and banking; Accounting; Administrative fees; Cost control; Credit; Deposit insurance; Depressed areas; and other 110 terms
Adams (D-WA), Nay Akaka (D-HI), Yea Bentsen (D-TX), Yea Biden (D-DE), Yea Bond (R-MO), Yea Bradley (D-NJ), Nay Conrad (D-ND), Nay ……
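To feed voting records into a relational model, each resolution can be turned into a symmetric relation between pairs of voters. The sketch below assumes the relation is "cast the same vote on this resolution", which is one natural reading and not stated on the slide; the votes are the seven listed above for S.543, and the function name is illustrative.

```python
from itertools import combinations

# Votes on S.543 as listed on the slide (first seven senators only).
votes = {
    "Adams": "Nay", "Akaka": "Yea", "Bentsen": "Yea", "Biden": "Yea",
    "Bond": "Yea", "Bradley": "Nay", "Conrad": "Nay",
}


def agreement_relation(votes):
    """Map each unordered pair of voters to True if they voted the same way."""
    return {
        (a, b): votes[a] == votes[b]
        for a, b in combinations(sorted(votes), 2)
    }


relation = agreement_relation(votes)
agreeing = [pair for pair, same in relation.items() if same]
print(len(relation), "pairs,", len(agreeing), "in agreement")
```

Storing each unordered pair once keeps the relation symmetric by construction.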
Topics Discovered (U.S. Senate)

Mixture of Unigrams

| Education | Energy | Military Misc. | Economic |
|---|---|---|---|
| education | energy | government | federal |
| school | power | military | labor |
| aid | water | foreign | insurance |
| children | nuclear | tax | aid |
| drug | gas | congress | tax |
| students | petrol | aid | business |
| elementary | research | law | employee |
| prevention | pollution | policy | care |

Group-Topic Model

| Education + Domestic | Foreign | Economic | Social Security + Medicare |
|---|---|---|---|
| education | foreign | labor | social |
| school | trade | insurance | security |
| federal | chemicals | tax | insurance |
| aid | tariff | congress | medical |
| government | congress | income | care |
| tax | drugs | minimum | medicare |
| energy | communicable | wage | disability |
| research | diseases | business | assistance |
Groups Discovered (US Senate)
Groups from topic Education + Domestic
Senators Who Change Coalition the most Dependent on Topic
e.g. Senator Shelby (D-AL) votes
• with the Republicans on Economic
• with the Democrats on Education + Domestic
• with a small group of maverick Republicans on Social Security + Medicare
Dataset #2: The UN General Assembly
• Voting records of the UN General Assembly (1990 - 2003)
• A country may choose to vote Yes, No or Abstain
• 931 resolutions with text attributes (titles)
• 192 countries in total
• Also experiments later with resolutions from 1960-2003
Vote on Permanent Sovereignty of Palestinian People, 87th plenary meeting
The draft resolution on permanent sovereignty of the Palestinian people in the occupied Palestinian territory, including Jerusalem, and of the Arab population in the occupied Syrian Golan over their natural resources (document A/54/591) was adopted by a recorded vote of 145 in favour to 3 against with 6 abstentions:
In favour: Afghanistan, Argentina, Belgium, Brazil, Canada, China, France, Germany, India, Japan, Mexico, Netherlands, New Zealand, Pakistan, Panama, Russian Federation, South Africa, Spain, Turkey, and other 126 countries. Against: Israel, Marshall Islands, United States. Abstain: Australia, Cameroon, Georgia, Kazakhstan, Uzbekistan, Zambia.
Topics Discovered (UN)

Mixture of Unigrams

| Everything Nuclear | Human Rights | Security in Middle East |
|---|---|---|
| nuclear | rights | occupied |
| weapons | human | israel |
| use | palestine | syria |
| implementation | situation | security |
| countries | israel | calls |

Group-Topic Model

| Nuclear Non-proliferation | Nuclear Arms Race | Human Rights |
|---|---|---|
| nuclear | nuclear | rights |
| states | arms | human |
| united | prevention | palestine |
| weapons | race | occupied |
| nations | space | israel |
Groups Discovered (UN)
The countries listed for each group are ordered by their 2005 GDP (PPP), and only 5 countries are shown for groups that have more than 5 members.
Do We Get Better Groups with the GT Model?
Baseline Model:
1. Cluster bills into topics using mixture of unigrams;
2. Apply group model on topic-specific subsets of bills.
GT Model:
1. Jointly cluster topics and groups at the same time using the GT model.
Agreement Index (AI) measures group cohesion. Higher, better.

| Datasets | Avg. AI for Baseline | Avg. AI for GT | p-value |
|---|---|---|---|
| Senate | 0.8198 | 0.8294 | <.01 |
| UN | 0.8548 | 0.8664 | <.01 |
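The slide does not give the formula for the Agreement Index. As a hedged sketch, the code below assumes the three-way definition used in the roll-call voting literature (commonly attributed to Hix, Noury and Roland), which scores 1 when a group votes unanimously and 0 when it splits evenly among Yes, No and Abstain. Whether the reported averages were computed exactly this way is an assumption, and the vote counts in the example are hypothetical.

```python
def agreement_index(yes: int, no: int, abstain: int) -> float:
    """Cohesion of one group on one vote: 1.0 if unanimous, 0.0 if split evenly three ways."""
    total = yes + no + abstain
    if total == 0:
        raise ValueError("at least one vote is required")
    largest = max(yes, no, abstain)
    return (largest - 0.5 * (total - largest)) / total


# Hypothetical vote counts for a single group on a single resolution.
print(agreement_index(10, 0, 0))  # unanimous group
print(agreement_index(3, 3, 3))   # evenly split group
print(agreement_index(8, 1, 1))   # mostly cohesive group
```

Averaging this index over groups and votes would give a single cohesion score per model, comparable in kind to the Baseline and GT columns above.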
Groups and Topics, Trends over Time (UN)
Outline
• Social Network Analysis with (Language) Attributes
– Roles and Topics (Author-Recipient-Topic Model)
– Groups and Topics (Group-Topic Model)
• Demo: Rexa, a Web portal for researchers
Previous Systems
[Diagram: Research Paper entity with a Cites relation]
More Entities and Relations
[Diagram: entities Research Paper, Person, University, Venue, Grant, Groups, Expertise; relation Cites]
Outline
• Examples of IE and Data Mining.
• Brief introduction of Conditional Random Fields
• Joint inference: Motivation and examples
– Joint Labeling of Cascaded Sequences (Belief Propagation)
– Joint Labeling for Transfer Learning (Piecewise Training & BP)
– Joint Labeling of Distant Entities (BP by Tree Reparameterization)
– Joint Co-reference Resolution (Graph Partitioning)
– Joint Segmentation and Co-ref (Sparse BP)
• Joint Topic Discovery and Social Network Analysis
– Roles and Topics (Author-Recipient-Topic Model)
– Groups and Topics (Group-Topic Model)
• Demo: Rexa, a Web portal for researchers
End of Talk
Summary
• Traditionally, SNA examines links, but not the language content on those links.
• Presented ART, a Bayesian network for messages sent in a social network: captures topics and role-similarity.
• RART explicitly represents roles.
• Additional work
– Group-Topic model discovers groups and clusters attributes of relations. [Wang, Mohanty, McCallum, LinkKDD 2005]