Complex Network Analysis Reveals Kernel-Periphery Structure in Web Search Queries
Query Representation and Understanding Workshop 2011 (QRU '11)
ACM SIGIR 2011, Beijing, China
Rishiraj Saha Roy and Niloy Ganguly
IIT Kharagpur, India
Monojit Choudhury
Microsoft Research India, India
Naveen Kumar Singh
NIT Durgapur, India
Language of Queries
Interaction between users and search engines over the years has resulted in the evolution of a distinct language for Web search queries
gprs config samsung focus at&t
samsung focus at&t gprs config
focus config at&t gprs samsung
April 19, 2023 Query Representation and Understanding 2011 (QRU '11) 2
Language of Queries
How can we begin to analyze this new language?
Complex Networks
Real-life networks are not easily explained by standard topologies
Applications to linguistics – word co-occurrences, consonant inventories, syntactic and semantic features, language dynamics
Complex Networks
Word co-occurrence networks: Interesting tool to discover fundamental properties of a language
Data
16.7 million entries sampled from Bing Query Logs from Australia (February – May 2009)
Courtesy: Microsoft India Development Center
Network Models for Queries
“gprs” “config” “samsung focus” “at&t”
“dell laptop” “extreme” “gaming” “config”
[Figure: co-occurrence network over the segments samsung focus, config, gprs, extreme, gaming, dell laptop, at&t. Labels: local co-occurrence, edge restriction, global co-occurrence.]
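As a rough illustration of the local co-occurrence idea, the sketch below links every pair of segments that appear in the same query and counts how often each pair occurs. The function name `build_cooccurrence_network` is hypothetical, and the edge-restriction step named on the slide is not specified here, so the sketch stops at raw co-occurrence counts.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(segmented_queries):
    """Map each unordered pair of segments to its co-occurrence count."""
    edge_weights = Counter()
    for segments in segmented_queries:
        # Every distinct pair of segments in one query co-occurs once.
        for a, b in combinations(sorted(set(segments)), 2):
            edge_weights[(a, b)] += 1
    return edge_weights


# The two segmented queries shown on the slide.
queries = [
    ["gprs", "config", "samsung focus", "at&t"],
    ["dell laptop", "extreme", "gaming", "config"],
]
edges = build_cooccurrence_network(queries)

# Degree = number of distinct neighbours of each segment.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
```

Here "config" is the only segment shared by both queries, so it ends up with the highest degree in this toy network.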
Two-Regime Power Law
Two-regime power law in degree distribution
Similar coefficients for queries and English
Kernel (K-Lex) and peripheral (P-Lex) lexicon distinction
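One way to look for two regimes in a degree distribution is to fit two straight lines in log-log space and pick the split with the smallest total squared error. The sketch below does that on synthetic data. It is not the fitting procedure used by the authors; the function names and the data are illustrative assumptions.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope, intercept and squared error for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def fit_two_regimes(degrees, counts):
    """Return (first degree of the second regime, slope of regime 1, slope of regime 2)."""
    xs = [math.log(k) for k in degrees]
    ys = [math.log(c) for c in counts]
    best = None
    # Try every split that leaves at least two points on each side.
    for i in range(2, len(xs) - 1):
        slope1, _, err1 = fit_line(xs[:i], ys[:i])
        slope2, _, err2 = fit_line(xs[i:], ys[i:])
        if best is None or err1 + err2 < best[0]:
            best = (err1 + err2, degrees[i], slope1, slope2)
    return best[1], best[2], best[3]


# Synthetic distribution (assumption): exponent -1 up to degree 8, then -3.
degrees = [1, 2, 4, 8, 16, 32, 64, 128]
counts = [1000.0 * k ** -1 if k <= 8 else 125.0 * (k / 8) ** -3 for k in degrees]
breakpoint_degree, slope_low, slope_high = fit_two_regimes(degrees, counts)
```

Under this reading, segments with degree above the breakpoint would be candidates for the kernel lexicon and the rest for the periphery; that mapping is an assumption of the sketch.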
Insights (1)
Differences in compositions of K-Lex and P-Lex
Heads and modifiers

K-Lex (popular segments)    P-Lex (rarer segments)
how to                      matthew brodrick
wiki                        accessories
free                        police officer
and                         who is
in australia                epson tx800
videos                      star trek next gen
real estate                 adams apple
difference between          harvard university
windows xp                  leukemia
K-Lex and P-Lex | Higher mean shortest paths | Less tight kernel | More k-p edges | Socio-cultural effects
Insights (2)
Higher mean shortest path in query networks
Peripheral units can independently form queries
More difficult to understand the context of a previously unseen unit
High surprise factor
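The mean shortest path mentioned above can be computed with a breadth-first search from every node. The sketch below is a minimal version for an unweighted, undirected network stored as an adjacency dictionary; the function name and the toy graph are hypothetical and are not taken from the query logs.

```python
from collections import deque


def mean_shortest_path(adjacency):
    """Average BFS distance over all ordered pairs of distinct, reachable nodes."""
    total = 0
    pairs = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Toy chain a - b - c (assumption, for illustration only).
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
toy_mean = mean_shortest_path(toy_graph)
```

Unreachable pairs are simply skipped here, which is one of several possible conventions for disconnected networks.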
[Figure: example query network with nodes airedale terrier, tumor, where, download, prison break.]
Insights (3)
Kernel is less tightly coupled
98% of edges run between kernel and periphery, while intra-kernel edges dominate in English
Socio-cultural factors govern kernel-periphery distinction (lyrics, movies, adelaide in K-Lex; code, accessories, delhi in P-Lex)
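Given a kernel set, the share of edges that run between kernel and periphery can be tallied directly. The sketch below assumes the kernel membership is already known; the function name and the edge list are hypothetical, and only the segment labels are borrowed from the slide.

```python
def edge_type_fractions(edges, kernel):
    """Fraction of edges that are kernel-kernel, kernel-periphery, periphery-periphery."""
    counts = {"kernel-kernel": 0, "kernel-periphery": 0, "periphery-periphery": 0}
    for a, b in edges:
        endpoints_in_kernel = (a in kernel) + (b in kernel)
        if endpoints_in_kernel == 2:
            counts["kernel-kernel"] += 1
        elif endpoints_in_kernel == 1:
            counts["kernel-periphery"] += 1
        else:
            counts["periphery-periphery"] += 1
    total = len(edges)
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}


# Kernel labels from the slide; the edges themselves are made up for illustration.
kernel = {"lyrics", "movies", "adelaide"}
toy_edges = [
    ("lyrics", "code"),
    ("movies", "delhi"),
    ("lyrics", "movies"),
    ("code", "accessories"),
]
fractions = edge_type_fractions(toy_edges, kernel)
```

On the real query network, the slide reports that the kernel-periphery share is about 98%.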