geographic web information retrieval
![Page 1: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/1.jpg)
Geographic Web Information Retrieval
Alexander Markowetz, University of Marburg
Thomas Brinkhoff, FH Oldenburg
Bernhard Seeger, University of Marburg
![Page 2: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/2.jpg)
2
Current Situation In Web-IR
Everybody is online, but never seen
![Page 3: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/3.jpg)
3
Current Situation In Web-IR
Queries are too short
Result sets are too large
You can effectively block your competitors
Good results get buried
Smaller Results
Ways to drill the ice-berg
![Page 4: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/4.jpg)
4
Solutions
Personalized Search
Dynamic/Interactive Search
![Page 5: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/5.jpg)
5
Geographic Web-IR
Location is the most personal property
"All business is local"
People already use the web geographically
"Yoga Brooklyn"
"Linux usergroup Frankfurt"
And get poor results
We are going to make that a lot better
![Page 6: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/6.jpg)
6
How-Not-To
Semantic Web
"If just everybody included geographic markup in their web pages"
Two problems
Chicken-and-egg
Malicious webmasters (meta tags, anyone?)
Bottom line: the Semantic Web is for "B2B" situations only.
![Page 7: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/7.jpg)
7
How-To
Modify traditional IR techniques to extract geographic markers
Multigranular approach
Extending basic Web-IR
Map pages to geographic positions (footprints)
Aggregate and cluster them
Build applications
Geographic search
Geographic Web-mining
![Page 8: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/8.jpg)
8
Geocoding
Footprint: the geographic position of a webpage
Set of points and polygons, associated with some amplitude
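One way to picture a footprint as a data structure is sketched below. This is only an illustration under stated assumptions: the class names, the choice to attach one amplitude to each point and each polygon, and the example coordinates are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class WeightedPoint:
    position: Coord
    amplitude: float  # assumed meaning: strength of evidence for this location


@dataclass
class WeightedPolygon:
    ring: List[Coord]  # vertices of a simple polygon, in order
    amplitude: float


@dataclass
class Footprint:
    """Hypothetical footprint: points and polygons, each with an amplitude."""
    points: List[WeightedPoint] = field(default_factory=list)
    polygons: List[WeightedPolygon] = field(default_factory=list)

    def total_amplitude(self) -> float:
        return sum(p.amplitude for p in self.points) + sum(
            g.amplitude for g in self.polygons
        )


# Illustrative footprint: one strong point plus a weaker surrounding area.
page = Footprint(
    points=[WeightedPoint((50.81, 8.77), 0.7)],
    polygons=[
        WeightedPolygon([(50.7, 8.6), (50.7, 8.9), (50.9, 8.9), (50.9, 8.6)], 0.3)
    ],
)
```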
![Page 9: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/9.jpg)
9
Preliminaries
Basic IR assumptions can easily be extended to "geographic IR"
Radius-1 hypothesis
Radius-2 hypothesis (co-citation)
Intra-site hypothesis
Intra-subdomain
Intra-directory
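A rough sketch of how the two link hypotheses might be turned into location evidence, assuming the geographic reading is "linked and co-cited pages tend to be about the same place". The function names, the toy link graph, and the use of region names as votes are all hypothetical.

```python
from collections import Counter, defaultdict


def radius1_votes(page, out_links, known_region):
    """Regions of already geocoded pages that `page` links to directly."""
    return Counter(
        known_region[t] for t in out_links.get(page, ()) if t in known_region
    )


def radius2_votes(page, out_links, known_region):
    """Regions of geocoded pages co-cited with `page` by a common linking page."""
    in_links = defaultdict(set)
    for src, targets in out_links.items():
        for t in targets:
            in_links[t].add(src)
    votes = Counter()
    for citing in in_links.get(page, ()):
        for sibling in out_links[citing]:
            if sibling != page and sibling in known_region:
                votes[known_region[sibling]] += 1
    return votes


# Toy graph: "hub" links to a, b and x; x links to a. Only a and b are geocoded.
out_links = {"hub": ["a", "b", "x"], "x": ["a"]}
known_region = {"a": "Marburg", "b": "Marburg"}

r1 = radius1_votes("x", out_links, known_region)
r2 = radius2_votes("x", out_links, known_region)
```

Both counters could then feed into the footprint of the ungeocoded page, with whatever weighting the aggregation step uses.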
![Page 10: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/10.jpg)
10
Multigranularity
Information extraction on different levels
Domain
Subdomain
Directory
File
Need to aggregate
[Figure: hierarchy of levels: Dom, SDom, Dir, File]
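The levels and the aggregation step can be sketched as follows. This is a minimal illustration, not the authors' method: the naive "last two host labels" rule for the domain, the key formats, and the example URLs are assumptions.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

LEVELS = ("domain", "subdomain", "directory", "file")


def granularity_keys(url):
    """Map a URL to one key per level (domain rule is deliberately naive)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    domain = ".".join(host.split(".")[-2:])
    path = parts.path or "/"
    directory = path.rsplit("/", 1)[0] + "/"
    return {
        "domain": domain,
        "subdomain": host,
        "directory": host + directory,
        "file": host + path,
    }


def aggregate(evidence_by_url):
    """Sum per-page place evidence upwards into every enclosing level."""
    levels = {name: defaultdict(Counter) for name in LEVELS}
    for url, evidence in evidence_by_url.items():
        for level, key in granularity_keys(url).items():
            levels[level][key].update(evidence)
    return levels


# Hypothetical evidence: place-name counts extracted from three pages.
levels = aggregate({
    "http://www.example.de/shop/berlin/index.html": Counter({"Berlin": 2}),
    "http://www.example.de/shop/berlin/about.html": Counter({"Berlin": 1}),
    "http://blog.example.de/post.html": Counter({"Hamburg": 1}),
})
```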
![Page 11: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/11.jpg)
11
Sources
On all levels
Names of places
Zip codes
Area codes
On site level
Whois
Business directories
Links
Density over a given area
Radius-1 and Radius-2
Geospatial Mapping and Navigation of the Web, Kevin S. McCurley, 10th WWW, 2001
Computing Geographical Scopes of Web Resources, J. Ding, L. Gravano, and N. Shivakumar, VLDB 2000
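A hedged sketch of extracting the text-level markers (place names, zip codes, area codes) from page content. The tiny gazetteer and both regular expressions are assumptions tuned to German-style addresses and phone numbers; a real system would need far more robust rules.

```python
import re

# Hypothetical, tiny gazetteer of known place names (lowercase).
GAZETTEER = {"marburg", "frankfurt", "oldenburg"}

# Assumption: a zip code is five digits followed by a capitalized word.
ZIP_RE = re.compile(r"\b(\d{5})\s+[A-ZÄÖÜ]")

# Assumption: an area code starts with 0 and precedes the subscriber number.
AREA_CODE_RE = re.compile(r"\b(0\d{2,4})[ /-]\d{3,}")


def extract_markers(text):
    """Return the geographic markers found in a piece of page text."""
    words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)
    return {
        "places": sorted({w for w in words if w.lower() in GAZETTEER}),
        "zip_codes": ZIP_RE.findall(text),
        "area_codes": AREA_CODE_RE.findall(text),
    }


markers = extract_markers("Yoga Studio, 35037 Marburg, Tel. 06421 123456")
```

Requiring a capitalized word after the five digits keeps the area code in the phone number from being mistaken for a zip code in this example.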
![Page 12: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/12.jpg)
12
Geographic Search
A simple interface: not so exciting, but...
Key Words
City
Street
State
Area code
SEARCH
![Page 13: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/13.jpg)
13
Dynamic Geographic-IR
Replacing the "next" button
Closer | Continue | Wider
Next | Closer | Wider
Next | ½ mile | 1 mile | 2 miles | 5 miles | 10 miles | 25 miles | 100 miles
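The "closer / wider" interaction could be modelled as stepping along the ladder of radii listed on the slide. The function names and the result format (dicts with a `miles` field) are hypothetical.

```python
RADII_MILES = [0.5, 1, 2, 5, 10, 25, 100]  # radius steps taken from the slide


def step_radius(current, direction):
    """Return the radius after pressing 'closer', 'wider' or 'next'."""
    i = RADII_MILES.index(current)
    if direction == "closer":
        i = max(i - 1, 0)
    elif direction == "wider":
        i = min(i + 1, len(RADII_MILES) - 1)
    elif direction != "next":
        raise ValueError(f"unknown direction: {direction}")
    return RADII_MILES[i]


def page_of_results(results, radius, offset=0, size=10):
    """Results within `radius` miles, paged like a classic 'next' button."""
    within = [r for r in results if r["miles"] <= radius]
    return within[offset:offset + size]


# Hypothetical result list with precomputed distances to the user.
results = [
    {"url": "a", "miles": 0.3},
    {"url": "b", "miles": 4.0},
    {"url": "c", "miles": 60.0},
]
nearby = page_of_results(results, step_radius(2, "wider"))
```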
![Page 14: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/14.jpg)
14
Locality
Final ranking is a (linear) combination of importance and geographic distance.
Chances are Amazon will still rank first, no matter where you are
Amazon is a "global bully"
Idea: eliminate global bullies by computing importance differently
Give less weight to links that span a longer distance
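Both ideas on this slide can be sketched in a few lines. Everything specific here is an assumption: the conversion of distance into a proximity score, the decay function `1 / (1 + d / scale)`, the parameters `alpha` and `scale_km`, and the example coordinates. The slides only state that the ranking is a linear combination and that longer links should weigh less.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def final_score(importance, distance_km, alpha=0.5, scale_km=50.0):
    """Linear combination of importance and a distance-derived proximity."""
    proximity = 1.0 / (1.0 + distance_km / scale_km)
    return alpha * importance + (1.0 - alpha) * proximity


def local_importance(target, in_links, position, scale_km=50.0):
    """In-link count where each link weighs less the longer the distance it spans."""
    return sum(
        1.0 / (1.0 + haversine_km(position[src], position[target]) / scale_km)
        for src in in_links.get(target, ())
        if src in position
    )


# Hypothetical pages: one nearby linker and one on another continent.
position = {
    "shop": (50.81, 8.77),
    "neighbour": (50.80, 8.76),
    "far_away": (40.69, -73.99),
}
in_links = {"shop": ["neighbour", "far_away"]}
shop_importance = local_importance("shop", in_links, position)
```

With this weighting, a site whose links mostly arrive from far away accumulates little local importance, which is the intended effect on "global bullies".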
![Page 15: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/15.jpg)
15
Evaluation
Evaluating Web-IR is hard
Evaluating geo-Search is even harder
Mistakes are hard to find
![Page 16: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/16.jpg)
16
Impact of geo-IR
Next-generation search engine
Location-based services
For cellphones under UMTS
Move traffic from A&E
Local companies will get more traffic
Increase profits from AdWords
Smallest businesses will advertise online
Locally focused
The "leaflet industry" will shrink
![Page 17: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/17.jpg)
17
Geographic Web-Mining
The web reflects human society
Distorted
Delayed/ahead
A lot of interesting social questions can be answered by looking at a large web crawl
You can save time and money compared to door-to-door surveys
This is widely used
But most of these questions are of a geographic nature
![Page 18: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/18.jpg)
18
Example Queries
Where in Germany are vintage sneakers a trend?
Is there a fashion authority that is accepted in all regions of Germany?
Do Britney and Madonna have the same audience?
Draw a map of Germany with all sites about vintage sneakers.
Find all fashion sites that get a minimum of 1000 equally distributed links.
Map the areas in Germany where there are significantly more sites for B. than for M.
Precise Semantics?
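To show why the semantics need pinning down, here is one possible reading of "significantly more sites for B. than for M.": a plain ratio threshold per region with a minimum site count. This is an assumed definition, not a statistical test and not one given in the slides; the function name, parameters, and data are hypothetical.

```python
from collections import Counter


def regions_favouring(regions_a, regions_b, min_sites=5, ratio=2.0):
    """Regions where topic A has at least `min_sites` sites and at least
    `ratio` times as many sites as topic B.

    Each argument is an iterable with one region name per geocoded site.
    """
    count_a = Counter(regions_a)
    count_b = Counter(regions_b)
    return sorted(
        region
        for region, n in count_a.items()
        if n >= min_sites and n >= ratio * count_b[region]
    )


# Hypothetical geocoded sites for two topics.
sites_b = ["Hessen"] * 8 + ["Bayern"] * 6 + ["Berlin"] * 2
sites_m = ["Hessen"] * 3 + ["Bayern"] * 5 + ["Berlin"] * 9
favouring_b = regions_favouring(sites_b, sites_m)
```

Changing `min_sites` or `ratio` changes the resulting map, which is exactly the ambiguity the question on the slide points at.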
![Page 19: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/19.jpg)
19
Current Work
Older prototype
Metasearch on top of lycos.de
Screen-scrape & re-order
Whois only
Did very well
![Page 20: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/20.jpg)
20
Current Work
Current prototype for geographic search
Limited to Germany = .de domains
50,000,000 pages
Expected online by late summer
In co-operation with
Yen-Yu Chen
Xiaohui Long
Torsten Suel
Polytechnic University, Brooklyn
![Page 21: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/21.jpg)
21
Reinventing Web-IR
Nearly no (academic) work in geo-IR
Almost every aspect of Web-IR needs to be looked at again
Interfaces
Query processing
Index distribution
Link analysis
User profile analysis
Spam detection
Even: Other aspects of personalized search
Changes in the web
![Page 22: Geographic Web Information Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062308/5681329a550346895d993571/html5/thumbnails/22.jpg)
22
Thank you
Any questions?