Peer-to-peer search that works, Djoerd Hiemstra
![Page 1: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/1.jpg)
PEER-TO-PEER SEARCH THAT WORKS
Djoerd Hiemstra
http://www.cs.utwente.nl/~hiemstra
Yandex, Moscow, 27 April 2011
![Page 2: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/2.jpg)
2/50
WHAT DOES A SEARCH ENGINE
LOOK LIKE?
?
![Page 3: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/3.jpg)
3/50
A DATA CENTER...?
Goose Creek, California
![Page 4: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/4.jpg)
4/50
A DATA CENTER...?
![Page 5: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/5.jpg)
5/50
A DATA CENTER...?
In Eemshaven...?
Biggest data center in Europe
100,000 servers, 19,000 m²
Uses electricity equal to 80,000 households
![Page 6: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/6.jpg)
6/50
A DATA CENTER...?
… where the * is Eemshaven?
Close to a power plant
Close to the sea (cooling!)
![Page 7: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/7.jpg)
7/50
WHAT ELSE DOES A SEARCH ENGINE
LOOK LIKE?
?
![Page 8: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/8.jpg)
8/50
A “BIG BROTHER” ?
![Page 9: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/9.jpg)
9/50
A “BIG BROTHER” ?
![Page 10: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/10.jpg)
10/50
NO REALLY, WHAT DOES A SEARCH
ENGINE LOOK LIKE?
?
![Page 11: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/11.jpg)
11/50
… FINDS WHAT YOU NEED ?
![Page 12: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/12.jpg)
12/50
… FINDS WHAT YOU NEED ?
![Page 13: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/13.jpg)
13/50
… FINDS WHAT YOU NEED ?
![Page 14: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/14.jpg)
14/50
SO, NOT NECESSARILY...
Green; environmentally friendly,
respecting privacy,
objective...
nor democratic.
![Page 15: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/15.jpg)
15/50
WHAT SHOULD A SEARCH ENGINE
LOOK LIKE?
?
![Page 16: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/16.jpg)
16/50
YOUR PERSONAL SYSTEM:
![Page 17: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/17.jpg)
17/50
PEER-TO-PEER SEARCH
![Page 18: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/18.jpg)
18/50
YOUR PERSONAL SYSTEM:
Each user brings processing power: as search consumer and search supplier
Green!
Democratic
No “big brother”
![Page 19: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/19.jpg)
19/50
PEER-TO-PEER SEARCH
Moscow
Results for “Moscow”
![Page 20: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/20.jpg)
20/50
PEER-TO-PEER SEARCH
RuSSIR
Go to peer 74
![Page 21: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/21.jpg)
21/50
PEER-TO-PEER SEARCH
RuSSIR
Go to peer 74
RuSSIR
Results for “RuSSIR”
![Page 22: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/22.jpg)
22/50
PEER-TO-PEER SEARCH
RuSSIR
Go to peer 2
![Page 23: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/23.jpg)
23/50
PEER-TO-PEER SEARCH
RuSSIR
Results for “RuSSIR”
RuSSIR
Go to peer 2
![Page 24: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/24.jpg)
24/50
OVERVIEW
1. Caching in P2P networks
2. Query-based sampling using snippets
3. Deep web querying
![Page 25: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/25.jpg)
25/50
P2P LOAD BALANCING BY CACHING
If you do not index documents, cache them!
Handles query bursts: (e.g., “michael jackson's death”)
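The redirects shown on the earlier slides ("Go to peer 74", then "Go to peer 2") can be read as result caching: once a peer has received results for a query, later askers are pointed to that peer instead of the original supplier. Below is a minimal sketch of that reading. The `Tracker` directory, the `Peer` class, the LRU eviction policy, and the placeholder results are all assumptions made for illustration, not the system described in the talk.

```python
from collections import OrderedDict


class Peer:
    """A peer with its own index and a bounded LRU cache of seen results."""

    def __init__(self, peer_id, cache_size=2):
        self.peer_id = peer_id
        self.cache_size = cache_size
        self.index = {}             # query -> results this peer supplies itself
        self.cache = OrderedDict()  # query -> results obtained from other peers

    def store(self, query, results):
        self.cache[query] = results
        self.cache.move_to_end(query)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def answer(self, query):
        if query in self.index:
            return self.index[query]
        if query in self.cache:
            self.cache.move_to_end(query)
            return self.cache[query]
        return None


class Tracker:
    """Hypothetical directory of who supplies and who last cached a query."""

    def __init__(self):
        self.origin = {}  # query -> id of the peer that indexes the documents
        self.cached = {}  # query -> id of the peer that most recently cached it


def search(tracker, peers, asker_id, query):
    """Return (id of the peer that answered, results)."""
    # Try the caching peer first so the origin is spared during query bursts.
    for target_id in (tracker.cached.get(query), tracker.origin.get(query)):
        if target_id is None:
            continue
        results = peers[target_id].answer(query)
        if results is not None:
            peers[asker_id].store(query, results)
            tracker.cached[query] = asker_id
            return target_id, results
    return None, []


peers = {i: Peer(i) for i in (2, 5, 74)}
peers[74].index["RuSSIR"] = ["result A", "result B"]  # placeholder results
tracker = Tracker()
tracker.origin["RuSSIR"] = 74

print(search(tracker, peers, 2, "RuSSIR")[0])  # 74
print(search(tracker, peers, 5, "RuSSIR")[0])  # 2
```

If the caching peer has already evicted the entry, the loop falls through to the origin peer, so a stale pointer costs one extra hop rather than a failed search.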
![Page 26: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/26.jpg)
26/50
QUERY LOG & CACHING POTENTIAL
![Page 27: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/27.jpg)
27/50
SHARE RATIOS
![Page 28: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/28.jpg)
28/50
CACHE SIZES
![Page 29: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/29.jpg)
29/50
EFFECT OF TEXT PROCESSING
![Page 30: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/30.jpg)
30/50
CHURN
![Page 31: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/31.jpg)
31/50
DISCUSSION
About 55 % from cache in ideal case
About 78 % from cache with subsumption, stemming, etc.
About 33 % from cache if bounded cache and churn (but no subsumption)
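Why text processing raises the share of queries answered from cache can be illustrated with a toy query log: queries that differ only in case, punctuation, or word order map to the same cache key once they are normalized. The sketch below assumes an unbounded cache and no churn. The `normalize` function is a simple stand-in (lowercasing, punctuation removal, term sorting) and not the subsumption or stemming used in the study, and the toy log is invented, so its ratios say nothing about the percentages above.

```python
import re


def normalize(query):
    """Assumed stand-in for text processing: lowercase, drop punctuation, sort terms."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    return " ".join(sorted(terms))


def hit_ratio(query_log, key=lambda q: q):
    """Fraction of queries whose cache key was already seen earlier in the log."""
    seen = set()
    hits = 0
    for query in query_log:
        k = key(query)
        if k in seen:
            hits += 1
        else:
            seen.add(k)
    return hits / len(query_log) if query_log else 0.0


# Invented toy log, for illustration only.
toy_log = [
    "michael jackson",
    "Michael Jackson",
    "jackson michael",
    "michael jackson!",
    "moscow weather",
    "weather moscow",
    "moscow weather",
]

print("exact match:", hit_ratio(toy_log))
print("normalized: ", hit_ratio(toy_log, normalize))
```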
![Page 32: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/32.jpg)
32/50
OVERVIEW
1. Caching in P2P networks
2. Query-based sampling using snippets
3. Deep web querying
![Page 33: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/33.jpg)
33/50
QUERY-BASED SAMPLING
Never download any documents
Instead, use the search results snippets to learn about documents
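A minimal sketch of the idea, under stated assumptions: a remote engine is probed with single-term queries, only the returned snippets are read, and the next query term is drawn at random from the vocabulary learned so far. `ToyEngine`, its snippet window, and the three sample documents are hypothetical stand-ins for a real search engine.

```python
import random
import re
from collections import Counter


class ToyEngine:
    """Stand-in for a remote search engine that only exposes result snippets."""

    def __init__(self, documents, snippet_words=6):
        self.documents = documents
        self.snippet_words = snippet_words

    def search(self, term, limit=3):
        snippets = []
        for doc in self.documents:
            words = doc.lower().split()
            if term in words:
                i = words.index(term)
                start = max(0, i - self.snippet_words // 2)
                snippets.append(" ".join(words[start:start + self.snippet_words]))
            if len(snippets) == limit:
                break
        return snippets


def sample_by_snippets(engine, seed_term, rounds=20, rng=None):
    """Build a term-frequency model of the engine from snippets only."""
    rng = rng or random.Random(0)
    model = Counter()
    term = seed_term
    for _ in range(rounds):
        for snippet in engine.search(term):
            model.update(re.findall(r"[a-z]+", snippet))
        if not model:
            break  # the seed term returned nothing, so there is nothing to learn from
        # Pick the next query term at random from the vocabulary learned so far.
        term = rng.choice(sorted(model))
    return model


docs = [
    "peer to peer search distributes the index over many small machines",
    "a cache of search results reduces the load on popular peers",
    "snippets summarise documents in the search results page",
]
model = sample_by_snippets(ToyEngine(docs), "search")
print(model.most_common(5))
```

The sampler never touches `engine.documents` directly; everything it learns comes through `search`, which is the point of the approach.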
![Page 34: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/34.jpg)
34/50
DO SAMPLES RESEMBLE THE FULL INDEX?
![Page 35: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/35.jpg)
35/50
DO SAMPLES RESEMBLE THE FULL INDEX?
![Page 36: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/36.jpg)
36/50
DO SAMPLES RESEMBLE THE FULL INDEX?
![Page 37: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/37.jpg)
37/50
CAN WE DO BETTER THAN RANDOM?
![Page 38: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/38.jpg)
38/50
CAN WE DO BETTER THAN RANDOM?
![Page 39: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/39.jpg)
39/50
DISCUSSION
1. Sampling snippets is as effective as sampling full documents
2. Can be done at no extra costs(!)
3. Random sampling is an effective strategy
![Page 40: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/40.jpg)
40/50
OVERVIEW
1. Caching in P2P networks
2. Query-based sampling using snippets
3. Deep web querying
![Page 41: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/41.jpg)
41/50
DEEP WEB QUERYING
Opportunity: while we are sending queries to search engines directly...
… we might as well search the deep web!
![Page 42: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/42.jpg)
42/50
YOUR TYPICAL DEEP WEB SITE
http://www.ns.nl
![Page 43: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/43.jpg)
43/50
NATURAL LANGUAGE QUERYING
![Page 44: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/44.jpg)
44/50
EASY TO SPECIFY
![Page 45: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/45.jpg)
45/50
USER STUDY
![Page 46: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/46.jpg)
46/50
USER STUDY
![Page 47: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/47.jpg)
47/50
USER STUDY
A = from
B = to
V = via
D = date
T = time
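To make the field legend concrete, here is a rough sketch of how a free-text travel query might be mapped onto the form fields A, B, V, D and T, regardless of the order in which a user types them. The trigger words (`from`, `to`, `via`, `on`, `at`) and the function `parse_trip_query` are assumptions for this example, not the system evaluated in the user study.

```python
import re

# Field letters follow the legend: A = from, B = to, V = via, D = date, T = time.
# The trigger words are assumptions made for this sketch.
TRIGGERS = {"from": "A", "to": "B", "via": "V", "on": "D", "at": "T"}


def parse_trip_query(text):
    """Map a free-text travel query onto form fields, in any field order."""
    pattern = r"\b(" + "|".join(TRIGGERS) + r")\b"
    # With a capturing group, re.split keeps the trigger words in the result:
    # [leading text, trigger, value, trigger, value, ...]
    parts = re.split(pattern, text.strip(), flags=re.IGNORECASE)
    fields = {}
    for trigger, value in zip(parts[1::2], parts[2::2]):
        value = value.strip(" ,")
        if value:
            fields[TRIGGERS[trigger.lower()]] = value
    return fields


print(parse_trip_query("from Enschede to Amsterdam via Utrecht at 10:30"))
print(parse_trip_query("to Amsterdam from Enschede on 27 April"))
```

A keyword split like this breaks on place names that contain a trigger word as a separate token, so it is only a starting point for handling the query variation between users.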
![Page 48: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/48.jpg)
48/50
DISCUSSION
1. Users like the interface
2. Users perform the tasks faster
3. Considerable query variation between subjects: No “one size fits all”!
![Page 49: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/49.jpg)
49/50
CONCLUSIONS
Peer-to-peer is a viable approach to large-scale search
Peer-to-peer search will make Google, Yahoo, Bing and Yandex irrelevant ;)
![Page 50: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/50.jpg)
50/50
PUBLICATIONS
Almer Tigelaar, Djoerd Hiemstra, and Dolf Trieschnigg, Search Result Caching in P2P Information Retrieval Networks, Proceedings of the 2nd Information Retrieval Facility Conference (IRFC), 2011.
Almer Tigelaar and Djoerd Hiemstra, Query-Based Sampling using Snippets, Proceedings of the SIGIR 2010 Workshop on Large-Scale Distributed Systems for Information Retrieval, 2010.
Kien Tjin-Kam-Jet, Dolf Trieschnigg, and Djoerd Hiemstra, Free-Text Search versus Complex Web Forms, Proceedings of the European Conference on Information Retrieval (ECIR), 2011.
![Page 51: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/51.jpg)
51/50
ACKNOWLEDGEMENTS
Netherlands Organization for Scientific Research
Almer Tigelaar
Kien Tjin-Kam-Jet
Dolf Trieschnigg
![Page 52: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/52.jpg)
52/50
![Page 53: Peer-to-peer search that works, Djoerd Hiemstra](https://reader034.vdocuments.us/reader034/viewer/2022051817/5486dceab4af9f690d8b52c0/html5/thumbnails/53.jpg)
53/50
“MAIL” RESULTS FROM YANDEX ?