Aruna Balasubramanian, Yun Zhou, W Bruce Croft, Brian N Levine and Arun Venkataramani Department of Computer Science, University of Massachusetts, Amherst Web Search From a Bus

Post on 08-Jan-2018

Page 1:

Aruna Balasubramanian, Yun Zhou, W Bruce Croft, Brian N Levine and Arun Venkataramani

Department of Computer Science, University of Massachusetts, Amherst

Web Search From a Bus

Page 2:

Why web search from a bus?

Open access points are commonly available

Intermittent internet connectivity from vehicles is possible
• no subscription cost
• useful when no other connectivity is available

Web search is the 2nd most common web activity (survey by pewinternet.org)

Page 3:

Connectivity characteristics of testbeds

Goal: Build web search in the presence of frequent disconnections and small connectivity duration

Page 4:

Web search process

[Figure: <your favorite search engine> showing the progress messages "Retrieving web…", "Retrieving images…", "Retrieving…"]

Page 5:

Adapting to vehicular network

Page 6:

Why challenging?

Interactive
• several exchanges between user and search engine needed

Results imprecise
• response may not be relevant
• difficult to measure relevance

Thedu
• Proxy architecture: sustain interaction
• IR contribution: increase usefulness of returned response

Page 7:

Thedu proxy

Between vehicle and search engine

When proxy receives query request from vehicle
• retrieves URLs and snippets
• prefetches URL contents including images
• stores responses and maintains state

When vehicle connects to proxy
• downloads pending responses
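
The proxy behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the Thedu implementation: the class name `ProxySketch`, the `search` and `fetch` callables, and the per-contact `budget` are all hypothetical stand-ins for the search engine, the web fetch, and the limited connection time.

```python
from collections import defaultdict, deque


class ProxySketch:
    """Toy stand-in for the server-side proxy; all names are hypothetical."""

    def __init__(self, search, fetch):
        self.search = search   # callable: query -> list of (url, snippet)
        self.fetch = fetch     # callable: url -> page content
        self.pending = defaultdict(deque)   # vehicle id -> queued responses

    def on_query(self, vehicle_id, query):
        # Retrieve URLs and snippets, prefetch contents, store the responses.
        for url, snippet in self.search(query):
            response = {"query": query, "url": url,
                        "snippet": snippet, "content": self.fetch(url)}
            self.pending[vehicle_id].append(response)

    def on_connect(self, vehicle_id, budget):
        # Hand over at most `budget` pending responses during this contact;
        # whatever is left stays queued for the next connection.
        queue = self.pending[vehicle_id]
        sent = []
        while queue and len(sent) < budget:
            sent.append(queue.popleft())
        return sent
```

Because the queue lives on the proxy, a vehicle that disconnects mid-download simply picks up the remaining responses at its next contact.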

Page 8:

Client and proxy architecture

[Figure: Client-side (vehicle): user, web interface, store query, process response. Server-side (proxy): queries for vehicle, fetch URL/images, prioritize response, pending responses; the proxy contacts the search engine and web sites. Across the intermittent connectivity, the vehicle sends new queries and the proxy returns response bundles.]

Page 9:

How to prioritize?

Search engines use relevance scores to rank responses
• scores not comparable across queries

Even if a response is relevant, it may not be useful
• Query “chants 2007” needs only one response

Thedu
• Normalize relevance scores: comparable across queries
• Classify query type: to capture user intent

http://www.netlab.hut.fi/chants-2007/

Page 10:

Query-Type classification

Query-type classification
• Homepage query: “cnn”, “chants 2007”
• Non-homepage query: “Harry potter review”

Thedu classifies using URL, snippet and title field
• E.g., “chants 2007” on Google
• <url> http://www.netlab.hut.fi/chants-2007 </url>
• <snippet> Welcome to the home page of the ACM MobiCom workshop on Challenged Networks (CHANTS 2007). </snippet>
• <title> chants workshop </title>

Homepage                                      Non-homepage
Query terms occur in URL                      Query is in question form
All query terms occur in title or snippet     Top URL is wikipedia
Less than 3 words                             Length greater than 3 words
URL is root
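
The features above could be combined in many ways; the sketch below assumes a plain majority vote between the homepage and non-homepage indicators. The vote, the substring matching, the question-word list, and the function name `classify_query` are illustrative assumptions, not the classifier Thedu actually uses.

```python
from urllib.parse import urlparse

# Assumed list of words that mark a query as being in question form.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}


def classify_query(query, top_url, title, snippet):
    """Return "homepage" or "non-homepage" by a simple majority vote."""
    terms = query.lower().split()
    url = top_url.lower()
    parsed = urlparse(url)
    text = (title + " " + snippet).lower()

    homepage_votes = [
        any(t in url for t in terms),    # query terms occur in URL
        all(t in text for t in terms),   # all terms occur in title or snippet
        len(terms) < 3,                  # fewer than 3 words
        parsed.path in ("", "/"),        # URL is a site root
    ]
    is_question = query.strip().endswith("?") or (
        bool(terms) and terms[0] in QUESTION_WORDS)
    non_homepage_votes = [
        is_question,                        # query is in question form
        "wikipedia.org" in parsed.netloc,   # top URL is Wikipedia
        len(terms) > 3,                     # more than 3 words
    ]
    if sum(homepage_votes) > sum(non_homepage_votes):
        return "homepage"
    return "non-homepage"
```

With the “chants 2007” example, three homepage indicators fire (terms in URL, terms in snippet, short query) and no non-homepage indicator does, so the vote yields "homepage".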

Page 11:

Relevance score normalization

Modified language model framework

D: Document, Q: Query, C: Collection

[Equation not recovered. Its annotated parts: normalized score; Kullback-Leibler divergence (distance between Q and D); probability of word occurring in document; probability of word occurring in collection]
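
To make the ingredients concrete, here is a small sketch of the standard Kullback-Leibler divergence between a query model and a smoothed document model. It is not the normalized score itself: the Jelinek-Mercer smoothing, the weight `lam`, and the function name `kl_divergence` are assumptions chosen for illustration, and `collection_prob` is assumed to contain every query word.

```python
import math
from collections import Counter


def kl_divergence(query_terms, doc_terms, collection_prob, lam=0.5):
    """KL(Q || D) with a Jelinek-Mercer smoothed document model (assumed)."""
    q_counts = Counter(query_terms)
    d_counts = Counter(doc_terms)
    q_len, d_len = len(query_terms), len(doc_terms)
    kl = 0.0
    for word, count in q_counts.items():
        p_q = count / q_len                                # P(w | Q)
        p_d_ml = d_counts[word] / d_len if d_len else 0.0  # P(w | D), unsmoothed
        # Mix in the collection probability P(w | C) so that words missing
        # from the document do not lead to log of a zero denominator.
        p_d = lam * p_d_ml + (1 - lam) * collection_prob[word]
        kl += p_q * math.log(p_q / p_d)
    return kl
```

A document that uses the query words often gets a smaller divergence than one that never mentions them, which is the sense in which the divergence acts as a distance between Q and D.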

Page 12:

Thedu protocol

1. Sort responses in the order of normalized score

2. For response r for query q,

2a. Update [symbol missing]

2b. If q is homepage query and [condition missing], do not send

2c. Else send response to vehicle

[symbol missing]: expected relevance of all responses sent for a query q

[symbol missing]: probability that r is relevant for q
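
The update rule and the homepage stopping condition are not spelled out here, so the sketch below fills them with loudly labeled guesses: expected relevance is assumed to be the running sum of the relevance probabilities of responses already sent, and a homepage query is assumed to stop once that sum reaches a hypothetical `threshold`. The function name `prioritize` and the tuple layout are also illustrative.

```python
from collections import defaultdict


def prioritize(responses, homepage_queries, threshold=0.9):
    """responses: list of (query, response_id, normalized_score, prob_relevant).

    Returns the (query, response_id) pairs to send, in sending order.
    """
    expected = defaultdict(float)   # per-query expected relevance (assumed: a sum)
    to_send = []
    # Step 1: highest normalized score first, across all queries.
    for query, response_id, _score, prob in sorted(
            responses, key=lambda r: r[2], reverse=True):
        # Step 2b (assumed condition): a homepage query needs no more
        # responses once those already sent are likely to satisfy the user.
        if query in homepage_queries and expected[query] >= threshold:
            continue
        # Step 2c: send, then record this response's contribution (step 2a).
        to_send.append((query, response_id))
        expected[query] += prob
    return to_send
```

Under these assumptions a homepage query such as “chants 2007” stops after its first highly probable response, leaving the scarce connection time to other queries.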

Page 13:

Evaluation goals

What is the delay in getting search results?

How many results were relevant to the user?

Page 14:

Evaluation Tools

DieselNet

Indri search engine

TREC (Text Retrieval Conference)
• Predefined web data collection (10 GB)
• Predefined set of queries (100 homepage + 50 content)
• Relevance judgments (which documents are relevant for a query)

Thedu’s query-type classifier accuracy: 88%

Page 15:

Deployment on DieselNet

Page 16:

Thedu vs Proxy-less server

Thedu
• March 26 to March 30
• Bundle responses
• Returns responses in prioritized order
• Maintains state

Proxy-less server
• April 30 to May 5
• Bundle responses
• Returns responses as FIFO
• No state

Page 17:

Connectivity duration

Mean connection duration: 35 sec

Mean disconnection duration: 8 min
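
As a rough worked example of what these means imply, the snippet below estimates the share of time a bus is connected. It assumes connected and disconnected periods simply alternate with the stated mean durations, which is a simplification; `connected_fraction` is a hypothetical helper.

```python
def connected_fraction(mean_up_s, mean_down_s):
    """Long-run fraction of time connected, assuming alternating periods."""
    return mean_up_s / (mean_up_s + mean_down_s)


fraction = connected_fraction(35, 8 * 60)   # 35 s up, 480 s down
print(f"{fraction:.1%}")                    # prints 6.8%
```

Under that assumption a bus is online for about 7% of the time, which is why deciding what to send first during each short contact matters.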

Page 18:

Thedu vs Proxy-less architecture

[Figure panels: Thedu; Stateless proxy]

Page 19:

Delay until first relevant response

Page 20:

Extending Thedu

Can we use connectivity among buses to improve throughput?

Are we limited to academic search engines?
• Convince commercial search providers to provide relevance scores
• Or, assign scores based on ranking

Are users really happy with search results and delay?

traces.cs.umass.edu
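
One way to picture "assign scores based on ranking" is a reciprocal-rank mapping, sketched below. The 1/rank formula and the name `scores_from_ranks` are assumptions made purely for illustration; nothing here says which mapping would actually be used.

```python
def scores_from_ranks(ranked_urls):
    """Map a ranked result list to pseudo-scores using 1/rank (assumed)."""
    return {url: 1.0 / rank for rank, url in enumerate(ranked_urls, start=1)}
```

Such pseudo-scores depend only on position, so they are trivially comparable across queries but carry no information about how confident the engine was in each result.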

Page 21:

Simulation Results

Page 22:

Inter-meeting times