location-based search: services, photos, web

Location-based search: services, photos, web Andrei Tabarcea Mohammad Rezaei 4.12.2013


Page 1: Location-based search: services, photos, web

Location-based search: services, photos, web

Andrei Tabarcea, Mohammad Rezaei

4.12.2013

Page 2: Location-based search: services, photos, web

Introduction

The goal is to find services, photos and points of interest close to the user’s location.
We call this “location-based search”.
We search our local database of photos and services, and we look for location information in web pages.

[Figure: search interface showing the keyword field, the user location and the results on a map]

Page 3: Location-based search: services, photos, web

MOPSI search

Mopsi Services Database

Mopsi Photo Collection

Mopsi Web Search

Combination of search results

User location, keyword

Mopsi search

Page 4: Location-based search: services, photos, web

Web interface
Input: (keyword, user location)
Output: array of results

[Figure: web interface with keyword field, search options, results list, results on map and user location]

Page 5: Location-based search: services, photos, web

Mobile interface

[Figure: mobile interface showing the user location and the search results]

Page 6: Location-based search: services, photos, web

Mopsi search (server workflow)
Input: (keyword, flagS, flagP, flagW, user location)
Output: (g_markersData) array of results

[Figure: web interface with keyword field, search options (flagS, flagP, flagW), results list, results on map and user location]

Page 7: Location-based search: services, photos, web

Location based search
Input: keyword, flagS, flagP, user location (lat, lon)
Output: list of results

Note: A service has a list of keywords and a title. A photo has just a description. Keyword search is therefore done on this information.

Notation:
S: service, P: photo
text(S): keywords and title of service
text(P): description of photo
flagS: search for services if true
flagP: search for photos if true

Page 8: Location-based search: services, photos, web

Overall flow

1. Start: update the keyword statistics and the keyword history.
2. Stage 1: if flagS is set (Y), do the local service search (Mopsi services) and display the results in the list.
3. Stage 2: if flagP is set (Y), do the photo search (Mopsi photos) and add the results to the list.
4. Stage 3: if flagW is set (Y), do the web search and add the results to the list and on the map.
5. End.

All results are displayed on the map.

When a keyword is searched:
- statistics: its count in the database is incremented; the keyword and the city are stored
- history: keyword, location, userid and time are stored

Page 9: Location-based search: services, photos, web

Local service search

1. Start: do the search on the server; nL = number of results.
2. If nL = 0 (N): end.
3. If nL > 0 (Y): cluster results with almost the same title and location, and take one of the similar results as the representative to display.
4. Sort the results by distance to the user location.
5. Display the results in the list (the list of results).
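The clustering step can be illustrated with a minimal Python sketch. The slides do not say how "almost the same" is measured, so everything here is an assumption: titles are compared with `difflib.SequenceMatcher`, locations with a small tolerance in degrees, and the thresholds `min_title_sim` and `max_deg` are hypothetical values.

```python
from difflib import SequenceMatcher

def cluster_results(results, min_title_sim=0.8, max_deg=0.0005):
    """Greedy de-duplication: a result is dropped when an earlier result
    (the cluster representative) has a similar title and a nearby location.
    Both thresholds are illustrative assumptions, not values from the slides."""
    representatives = []
    for r in results:
        for rep in representatives:
            sim = SequenceMatcher(None, r["title"].lower(), rep["title"].lower()).ratio()
            near = (abs(r["lat"] - rep["lat"]) < max_deg
                    and abs(r["lon"] - rep["lon"]) < max_deg)
            if sim >= min_title_sim and near:
                break  # r belongs to the cluster represented by rep
        else:
            representatives.append(r)  # r starts a new cluster
    return representatives
```

Only the first member of each cluster is kept, which corresponds to taking one of the similar results as the representative.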

Page 10: Location-based search: services, photos, web

Photo search

1. Start: do the search on the server; nP = number of results.
2. If nP = 0 (N): end.
3. If nP > 0 (Y): cluster results with almost the same title and location.
4. Cluster the results together with the local services that have almost the same title and location.
5. Sort the results by distance to the user location.
6. Add the results to the list.

Page 11: Location-based search: services, photos, web

Web search

1. Start: do the search on the server; nW = number of results.
2. If nW = 0 (N): end.
3. If nW > 0 (Y): cluster results with almost the same title and location.
4. Cluster the results together with the local services and photos that have almost the same title and location.
5. Sort the results by distance to the user location.
6. Add the results to the list.
7. Add the results on the map.

Page 12: Location-based search: services, photos, web

Filtering results: old solution
Fixed distance to user location: d

Find services where text(S) ≈ keyword AND dist(S,User) < d
Find photos where text(P) ≈ keyword AND dist(P,User) < d

Advantages:
- Simple
- Same time for any search

Disadvantages:
- Parameter d (the user can choose d, but it is still not automatic)
- There are many cases with “no results”
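A minimal Python sketch of the fixed-distance filter may help to see why "no results" is common. It rests on assumptions not stated in the slides: text(S) ≈ keyword is approximated by a case-insensitive substring test, dist is the haversine distance with a mean Earth radius of 6,371 km, and the item fields `text`, `lat` and `lon` are hypothetical names.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (assumed constant)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def search_fixed_radius(items, keyword, user, d):
    """Return items whose text contains the keyword and that lie closer than d meters."""
    kw = keyword.lower()
    return [it for it in items
            if kw in it["text"].lower()
            and haversine_m(user[0], user[1], it["lat"], it["lon"]) < d]
```

With a fixed d, a keyword whose only matches lie just outside the circle returns an empty list, which is the disadvantage noted above.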

Page 13: Location-based search: services, photos, web

Current solution: Binary search K-nearest services

Show all the results in 10 km

If the number of results is less than K, double the distance (until it covers the whole earth); once the number of results exceeds K, bisect the last distance interval.

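[Figure: circles of radius d, 2d and 4d around the user location; each marker is a photo or service with the required keyword; x is the final search distance]

Example with k=5:
- Number of results n in distance d: n=1 < k
- Double the distance: in 2d, n=2 < k
- In 4d, n=8 > k
- Now divide the distance in the colored area (between 2d and 4d):
  - In 3d, n=4 < k
  - In 3.5d, n=5 (=k)
So we have the 5 nearest results to the user location within distance x.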

Page 14: Location-based search: services, photos, web

Algorithm

d = 10000: initial distance (in meters)
K = 10: number of required results
delta_dist: minimum distance for dividing
ns: number of resulting services (res_S)
np: number of resulting photos (res_P)

res_S = services where text(S) ≈ keyword
res_P = photos where text(P) ≈ keyword
if ( ns+np > K )
    (res_S, res_P, dist) = extend_distance()
    (res_S, res_P, dist) = contract_distance(dist, K)
display (ns+np) services and photos

extend_distance()
    ns = 0; np = 0; dist = d
    while ( ns+np < K AND dist < earth_r*pi )
        res_S = services where text(S) ≈ keyword AND dist(S,User) < dist
        res_P = photos where text(P) ≈ keyword AND dist(P,User) < dist
        dist = dist*2
    dist = dist/2

[Figure: search circles of radius d, 2d and 4d; label Δ]

Page 15: Location-based search: services, photos, web

Algorithm (cont.)

contract_distance(dist, K)
    d1 = dist/2
    d2 = dist
    dist = (d1 + d2)/2
    delta = dist - d1
    ns = np = 0
    while ( ns+np != K AND delta > delta_dist AND dist > d )
        res_S = services where text(S) ≈ keyword AND dist(S,User) < dist
        res_P = photos where text(P) ≈ keyword AND dist(P,User) < dist
        if ( ns+np > K )
            d2 = dist
        else
            d1 = dist
        dist = (d1 + d2)/2
        delta = dist - d1
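The extend and contract phases can be tried out with a small Python sketch. It is not the Mopsi implementation: instead of database queries it works on a list `distances` holding the distance (in meters) from the user to every item that matches the keyword, the default `delta_dist=100.0` is an assumed value, and the bisection returns the smallest tested radius that holds at least k results when no radius with exactly k is found.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (assumed constant)

def count_within(distances, radius):
    """Stand-in for the two database queries: number of matches closer than radius."""
    return sum(1 for x in distances if x < radius)

def extend_distance(distances, k, d=10000.0):
    """Double the radius until at least k matches are inside or the whole Earth is covered."""
    dist = d
    while count_within(distances, dist) < k and dist < EARTH_RADIUS_M * math.pi:
        dist *= 2
    return dist

def contract_distance(distances, k, dist, d=10000.0, delta_dist=100.0):
    """Bisect between dist/2 and dist, looking for a radius with exactly k matches."""
    if dist <= d or count_within(distances, dist) <= k:
        return dist  # never shrink below the initial radius
    lo, hi = dist / 2, dist
    while hi - lo > delta_dist:
        mid = (lo + hi) / 2
        n = count_within(distances, mid)
        if n == k:
            return mid
        if n > k:
            hi = mid
        else:
            lo = mid
    return hi

def search_radius(distances, k, d=10000.0):
    """Radius to use for the final query; everything is shown when there are at most k matches."""
    if len(distances) <= k:
        return EARTH_RADIUS_M * math.pi
    return contract_distance(distances, k, extend_distance(distances, k, d), d)
```

With `distances = [5000, 15000, 22000, 28000, 33000, 36000, 37000, 39000]`, k=5 and d=10000, the counts follow the k=5 example (1, 2 and 8 results while doubling, then 4 at 3d and 5 at 3.5d), and the search stops at 35000 m.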

Page 16: Location-based search: services, photos, web

Simplifying distance calculation

Since there is no spatial distance function in MySQL:
Points with distance < d from the user location
Simplified: |lat-lat1| < Δlat AND |lon-lon1| < Δlon

[Figure: user location (lat1, lon1) with the points (lat1+Δlat, lon1) and (lat1, lon1+Δlon), each at distance d]

Δlat and Δlon? Given: lat1, lon1 and d (in meters)

Page 17: Location-based search: services, photos, web

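Δlat and Δlon?

Haversine distance: the distance d (in meters) between two points (lat1, lon1) and (lat2, lon2), where E is the Earth diameter (in meters), Δlat = lat2 - lat1 and Δlon = lon2 - lon1:

$$d = E \arcsin\left(\sqrt{\sin^2(\Delta lat/2) + \cos(lat_1)\cos(lat_2)\sin^2(\Delta lon/2)}\right)$$

For (lat1, lon1) and (lat1, lon1+Δlon), Δlat = 0:

$$d = E \arcsin\left(\sqrt{\cos^2(lat_1)\sin^2(\Delta lon/2)}\right)$$

$$\sin(d/E) = \cos(lat_1)\sin(\Delta lon/2)$$

$$\Delta lon = 2\arcsin\left(\frac{\sin(d/E)}{\cos(lat_1)}\right)$$

For (lat1, lon1) and (lat1+Δlat, lon1), Δlon = 0:

$$d = E \arcsin\left(\sqrt{\sin^2(\Delta lat/2)}\right)$$

$$\Delta lat = \frac{2d}{E}$$

Note: the case $\sin(d/E)/\cos(lat_1) > 1$ is treated separately, since the arcsine is not defined there.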
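The two closed forms translate directly into code. The following Python sketch computes Δlat and Δlon in degrees; the Earth diameter value and the fallback of π radians (all longitudes) when the arcsine argument reaches 1 are assumptions, since the slide's handling of that case is not recoverable here.

```python
import math

EARTH_DIAMETER_M = 2 * 6371000.0  # E in the formulas; mean-radius value is an assumption

def bounding_deltas(lat1_deg, d):
    """Return (delta_lat, delta_lon) in degrees for a search distance d in meters."""
    delta_lat = 2 * d / EARTH_DIAMETER_M
    ratio = math.sin(d / EARTH_DIAMETER_M) / math.cos(math.radians(lat1_deg))
    # Assumed fallback: near the poles or for very large d, accept every longitude.
    delta_lon = math.pi if ratio >= 1 else 2 * math.asin(ratio)
    return math.degrees(delta_lat), math.degrees(delta_lon)

def in_box(lat, lon, lat1, lon1, delta_lat, delta_lon):
    """The simplified test |lat-lat1| < delta_lat AND |lon-lon1| < delta_lon."""
    return abs(lat - lat1) < delta_lat and abs(lon - lon1) < delta_lon
```

For d = 10 km, Δlat is about 0.09 degrees at any latitude, while Δlon grows with latitude because meridians converge toward the poles. The box is a superset of the circle, so it can return points slightly farther than d.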

Page 18: Location-based search: services, photos, web

How to find location-information in web-pages?

Location-based web data mining

Page 19: Location-based search: services, photos, web

Mopsi web search

Web mining

Page 20: Location-based search: services, photos, web

Geo-referencing

Geo-referencing:

A geographic reference is an information entity that is discovered from the context and can be mapped to a geographic location

Strategies for geographic reference extraction:
– Gazetteer-based text matching
– Rule-based linguistic analysis
– Regular-expression based text matching
– Using host location
– Geographic meta-tags

Hu, Y. H., Lim, S., & Rizos, C. Georeferencing of Web Pages based on Context-Aware Conceptual Relationship Analysis. 2006

Page 21: Location-based search: services, photos, web

Ad-Hoc Georeferencing

The problem is how to extract and validate location data from semi-structured text.
Postal addresses are the most common location data found.
Our goal is to give geographical coordinates to services mentioned in web pages.
We call this method ad-hoc georeferencing.

<HTML>
<HEAD profile="http://geotags.com/geo">
<META name="geo.position" content="62.35;29.44">
<META name="geo.region" content="FI">
<META name="geo.placename" content="Joensuu">
<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="http://www.joensuu.fi/tkt/sivutyyli.css" type="text/css">
<TITLE>Pages of Pasi Fränti</TITLE>
</HEAD>

VS.

Page 22: Location-based search: services, photos, web

Location Information in Webpages

Site hosting information (owner address, server address etc.)

HTML tags (geo-tags, address-tags, vcards for Google Maps etc.)

Natural language descriptions

Addresses, postal codes, phone numbers

Page 23: Location-based search: services, photos, web

Site hosting information

domain: uef.fi
descr: ITÄ-SUOMEN YLIOPISTO (UNIV OF EASTERN FINLAND)
descr: 22857339
address: TIETOTEKNIIKKAKESKUS (IT-CENTRE)/Jarno Huuskonen
address: PL 1627
address: 70211
address: KUOPIO FINLAND
phone: +358 44 7162810
status: Granted
created: 26.5.2010
modified: 19.8.2011
expires: 26.5.2015
nserver: ns-secondary.funet.fi [Ok]
nserver: ns1.uef.fi [Ok]
nserver: ns2.uef.fi [Ok]
dnssec: no
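A record like this is a sequence of "key: value" lines with repeated keys, so the owner address can be collected with a few lines of Python. This is only a sketch under the assumption that the record text has already been obtained; `parse_whois` is a hypothetical helper name.

```python
def parse_whois(text):
    """Collect 'key: value' lines; repeated keys such as address accumulate in lists."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            record.setdefault(key.strip(), []).append(value.strip())
    return record

sample = """domain: uef.fi
address: PL 1627
address: 70211
address: KUOPIO FINLAND
phone: +358 44 7162810"""

record = parse_whois(sample)
postal_address = ", ".join(record["address"])
```

The resulting address describes the domain owner, which is not necessarily where the services mentioned on the pages are located.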

Page 24: Location-based search: services, photos, web

HTML tags

geo-tags, address-tags, vcards for Google Maps etc.

<HTML>
<HEAD profile="http://geotags.com/geo">
<META name="geo.position" content="62.35;29.44">
<META name="geo.region" content="FI">
<META name="geo.placename" content="Joensuu">
<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="http://www.joensuu.fi/tkt/sivutyyli.css" type="text/css">
<TITLE>Pages of Pasi Fränti</TITLE>
</HEAD>
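When a page carries geo meta tags, reading them is straightforward. The sketch below uses Python's standard `html.parser`; the class and function names are hypothetical, and it assumes that `geo.position` holds "lat;lon" separated by a semicolon, as in the example above.

```python
from html.parser import HTMLParser

class GeoMetaParser(HTMLParser):
    """Collects <meta name="geo.*" content="..."> pairs."""
    def __init__(self):
        super().__init__()
        self.geo = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <META> arrives as "meta".
        if tag != "meta":
            return
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        if name.startswith("geo.") and a.get("content"):
            self.geo[name] = a["content"]

def extract_geo(html):
    """Return the geo meta tags of a page, plus numeric lat/lon when geo.position is present."""
    parser = GeoMetaParser()
    parser.feed(html)
    parser.close()
    geo = dict(parser.geo)
    if "geo.position" in geo:
        lat, lon = geo["geo.position"].split(";")
        geo["lat"], geo["lon"] = float(lat), float(lon)
    return geo
```

Pages without such tags return an empty dictionary, which is the case the ad-hoc georeferencing approach is meant to cover.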

Page 25: Location-based search: services, photos, web

Natural language descriptions

Scouts' Youth Hostel (8.3 km from Joensuu Airport) Show map

Good, 7.4 Latest booking: January 23 Scouts’ Youth Hostel is located at the outfall of River Pielisjoki, 1.5 km from Joensuu city centre. It offers free Wi-Fi and rooms with shared bathroom and kitchen facilities. Olga Saint-Petersburg, Russia "Great price for the nice room. Friendly stuff, cozy atmosphere. But a bit loud."

from € 46

Page 26: Location-based search: services, photos, web

Postal addresses

Page 27: Location-based search: services, photos, web

Input:

• user location (lat, lon)

• keywords

Output: list of services containing:

• name/title

• website

• address (street, number, city)

• location (lat, lon)

• image

• other info (opening hours, telephone etc.)

Main idea:

• preprocess the search results of an external search engine (Google, Yahoo, Bing etc.) by detecting postal addresses in order to find the location

Mopsi search

Page 28: Location-based search: services, photos, web

Problems

- How to evaluate relevance?

- Mixed keyword meanings

- No relation between keywords and addresses

Page 29: Location-based search: services, photos, web

Mopsi Web Search Workflow

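[Diagram: components are the mobile application, the web user interface, the geo-referencing module and the geocoded street-name database. Data labels between the interfaces and the module: keyword and coordinates in, search results out. Data labels between the module and the database: address, keyword, coordinates.]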

Page 30: Location-based search: services, photos, web

Georeferencing module

[Diagram: inside the georeferencing module. Components: relevant municipalities detector, page parser, address and description detector, address validator, geocoded database. Data labels: keyword; coordinates; municipalities; municipalities list; <keyword, municipality> query; result links; word list; addresses; keyword, address, coordinates; results list; sorted results list.]

Page 31: Location-based search: services, photos, web

Proposed steps

1. Convert the user location (lat, lon) into a user address = Geocoding step

2. Search with the query "keyword+city" using an external search engine API and download the first k results (web pages) = Web page retrieval step

3. Detect addresses and additional information in the downloaded web pages = Data mining step

4. Rank the results (distance, relevance etc.) = Ranking step

5. Display the search results to the user

[Diagram: User (lat, lon, keywords) → 1. Geocoder → 2. Web page retrieval (web pages) → 3. Data mining (result list) → 4. Result ranking → 5. ranked result list]

Page 32: Location-based search: services, photos, web

1. Geocoding

[Diagram: User (lat, lon, keywords) → Geocoder → Web page retrieval (web pages) → Data mining (result list) → Result ranking → ranked result list]

Convert user location (lat, lon) into user address using:

Page 33: Location-based search: services, photos, web

2.Web page retrieval

[Diagram: User (lat, lon, keywords) → Geocoder → Web page retrieval (web pages) → Data mining (result list) → Result ranking → ranked result list]

Download k webpages from the query <keyword, city> using API of:

Page 34: Location-based search: services, photos, web

3.Data mining

[Diagram: User (lat, lon, keywords) → Geocoder → Web page retrieval (web pages) → Data mining (result list) → Result ranking → ranked result list]

Main idea: Find location information in HTML pages by detecting postal addresses

Steps:
1. Parse and segment the HTML page
2. Identify addresses and locations
3. Identify the services the addresses are pointing to (name/title) and retrieve extra information (photos, opening hours, telephone etc.)

Page 35: Location-based search: services, photos, web

3.1 Parsing HTML pages

- The current solution extracts an array of text from HTML pages
- We do not yet exploit the fact that the data comes from web pages
- Proposed future solution:
  - Segmentation of web pages using DOM trees
  - Detection of the address block
  - Nearest-neighbor search considering text and visual characteristics

Joen Pizza Special Y-tunnus 2129577-6 Käyntiosoite Koskikatu 17 80100 JOENSUU Postiosoite Koskikatu 17 80100 JOENSUU Puhelin: 013-220246 Virallinen toimiala Kahvila-ravintolat

Page 36: Location-based search: services, photos, web

Web page example - Homepage

Page 37: Location-based search: services, photos, web

DOM tree

blue: links (the A tag)
red: tables (TABLE, TR and TD tags)
green: dividers (DIV tag)
violet: images (the IMG tag)
yellow: forms (FORM, INPUT, TEXTAREA, SELECT and OPTION tags)
orange: linebreaks and blockquotes (BR, P, and BLOCKQUOTE tags)
black: HTML tag, the root node
gray: all other tags

Page 38: Location-based search: services, photos, web

DOM subtree

[Figure: DOM subtree from <html> and <body> through nested <table>, <tr>, <td> and <div> nodes down to the leaves "PizzaPojat Niinivaara", "Niinivaarantie 19", "80200 Joensuu", <br/> and "013 - 137 017"]

<table align="center">
 <tr>
  <td>
   <div id="footerleft">
    <h3>PizzaPojat Niinivaara</h3>
    <p>Niinivaarantie 19</p>
    <p>80200 Joensuu</p>
    <br />
    <p>013 - 137 017</p>
   </div>
  </td>
 </tr>
</table>

Page 39: Location-based search: services, photos, web

Web page example - Catalog

Bosbor kebab

Fiesta

Miami

Page 40: Location-based search: services, photos, web

Proposed implementation

[Figure: DOM subtree from <html> and <body> through nested <table>, <tr>, <td> and <div> nodes down to the leaves "PizzaPojat Niinivaara", "Niinivaarantie 19", "80200 Joensuu", <br/> and "013 - 137 017"]

1. Convert HTML pages to XHTML in order to use XQuery

2. Detect addresses and postal codes

3. Break the DOM tree into subtrees

4. Use heuristics and regular expressions to detect extra information from the subtree (service name, telephone, opening hours etc.)
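Steps 2 and 3 can be illustrated without XQuery. The following Python sketch is a stand-in, not the proposed implementation: it uses the standard `html.parser`, treats `div`, `td` and `li` as subtree roots, and assumes a Finnish-style five-digit postal code. It returns the text of the innermost block that contains a postal code, and it expects well-formed markup such as XHTML.

```python
import re
from html.parser import HTMLParser

POSTCODE = re.compile(r"\b\d{5}\b")   # five-digit postal code (assumption)
BLOCK_TAGS = {"div", "td", "li"}      # tags treated as subtree roots (assumption)

class AddressBlockParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = [[]]   # one list of text fragments per open block element
        self.blocks = []    # text fragments of blocks that contain a postal code

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append([])

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and len(self.stack) > 1:
            texts = self.stack.pop()
            if any(POSTCODE.search(t) for t in texts):
                self.blocks.append(texts)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].append(text)

def find_address_blocks(html):
    """Return one list of text fragments per innermost block containing a postal code."""
    parser = AddressBlockParser()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

For the PizzaPojat subtree this yields a single block holding the name, street, postal code with city, and telephone number, which is the input that the heuristics of step 4 would work on.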

Page 41: Location-based search: services, photos, web

3.2 Postal address detection

Rule-based pattern matching algorithm
Starting point: the detection of street names
Prefix trees are used for fast text matching of street names
An address-block candidate is constructed by detecting:
• street names and numbers
• postal codes
• municipality names

We will use the OpenStreetMap database for global detection

[Figure labels: street names, street numbers, city names, telephone numbers]

Page 42: Location-based search: services, photos, web

3.2 Postal address detection

AddressDetection(words)
    i = 0
    while i < count(words)
        set street, number, postcode, city as empty
        next = i + 1
        if words[i] is streetName
            street = words[i]
            j = i
            for m = i+1 to i+5
                if words[m] is number
                    number = words[m]; j = m
                    break
            for k = j+1 to j+5
                if words[k] is postcode
                    postcode = words[k]; j = k
                    break
            for k = j+1 to j+5
                if words[k] is city
                    city = words[k]; next = k+1
                    break
            if street is not empty AND number is not empty AND city is not empty
                candidate = (street, number, postcode, city)
        i = next

Example: Joen Pizza Special Y-tunnus: 2129577-6 Käyntiosoite: Koskikatu 17 80100 JOENSUU Puhelin: 013-220246 Virallinen toimiala: Kahvila-ravintolat

Detected: streetName = Koskikatu, number = 17, postcode = 80100, city = JOENSUU
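A runnable Python version of the same scan is sketched below. It simplifies in ways the slides do not: street names and cities are looked up in plain sets rather than prefix trees, street names are single tokens, and the number and postcode patterns are assumed regular expressions (up to four digits with an optional letter, and exactly five digits).

```python
import re

NUMBER = re.compile(r"^\d{1,4}[A-Za-z]?$")  # street number, e.g. "17" or "4b" (assumption)
POSTCODE = re.compile(r"^\d{5}$")           # five-digit postal code (assumption)

def detect_addresses(words, street_names, cities, window=5):
    """Scan tokens for street name, then number, optional postcode and city within a window."""
    candidates = []
    i = 0
    while i < len(words):
        nxt = i + 1
        if words[i].lower() in street_names:
            street, number, postcode, city = words[i], None, None, None
            j = i
            for m in range(i + 1, min(i + 1 + window, len(words))):
                if NUMBER.match(words[m]):
                    number, j = words[m], m
                    break
            for m in range(j + 1, min(j + 1 + window, len(words))):
                if POSTCODE.match(words[m]):
                    postcode, j = words[m], m
                    break
            for m in range(j + 1, min(j + 1 + window, len(words))):
                if words[m].lower() in cities:
                    city, nxt = words[m], m + 1  # resume scanning after the city
                    break
            if number and city:
                candidates.append((street, number, postcode, city))
        i = nxt
    return candidates
```

As in the pseudocode, the postcode is optional: a candidate is kept whenever street, number and city are all found.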

Page 43: Location-based search: services, photos, web

Prefix Trees

Invented by Fredkin (1960)

The prefix tree (or trie) is a fast ordered tree data structure used for retrieval

Root is associated with an empty string

All the descendants of a node have a common prefix of the string associated with that node

Some nodes can have associated values (usually they mark the end of a word)
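The properties listed above fit in a short Python sketch. The class and method names are illustrative; a node's `value` marks the end of a stored word, and the root corresponds to the empty string.

```python
class TrieNode:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = {}   # maps one character to the child node
        self.value = None    # set only on nodes that end a stored word

class Trie:
    def __init__(self):
        self.root = TrieNode()  # the root is associated with the empty string

    def insert(self, key, value=True):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value

    def _find(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, key):
        """True if key was inserted as a whole word."""
        node = self._find(key)
        return node is not None and node.value is not None

    def has_prefix(self, prefix):
        """True if at least one stored word starts with prefix."""
        return self._find(prefix) is not None
```

Lookup time depends on the length of the key rather than on the number of stored words, which is what makes the structure attractive for matching against a large list of names.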

Page 44: Location-based search: services, photos, web

Street-name prefix trees

Our solution is to detect street-names using prefix trees constructed from the gazetteer

A street-name prefix tree is built for each municipality used in the search

The user’s location and area of interest are known, therefore the prefix trees can be limited to the relevant municipalities

Prefix tree statistics              Finland   Singapore
Maximum tree depth                  34        14
Average tree depth                  12.7      7.4
Average tree width                  105       167
Average number of nodes per tree    2338      2335
Total size (MB)                     74.4      0.18

Page 45: Location-based search: services, photos, web

3.3 Retrieve extra information

- Title detection (or company detection) is a Named Entity Recognition problem
- Usually, the text before the address holds the relevant information
- There are other methods to investigate, such as using classifiers or the web page structure

Joen Pizza Special Y-tunnus: 2129577-6 Käyntiosoite: Koskikatu 17 80100 JOENSUU Postiosoite Koskikatu 17 80100 JOENSUU Puhelin: 013-220246 Virallinen toimiala: Kahvila-ravintolat

[Figure labels on the example: words before the address, address]

Page 46: Location-based search: services, photos, web

4. Ranking

[Diagram: User (lat, lon, keywords) → Geocoder → Web page retrieval (web pages) → Data mining (result list) → Result ranking → ranked result list]

Main criterion: distance from the user’s location

Future idea: relevance to user’s profile and history

Page 47: Location-based search: services, photos, web

Future ideas recap

– Use freely available geographical sources for extending the prototype to other regions

– Use geographical scope of a web page to improve address detection and disambiguation

– Use the structure of the HTML page and DOM tree semantic analysis for better data extraction

– Gather and tag a testing dataset for better evaluation of the algorithms