Business Identification: Spatial Detection
Alexander Darino
Week 5
2
Outline
• Recap of Previous Work
• Business Name Detection
• Business Name Matching
• Business Spatial Detection
• Weaknesses to Current Approach
• Alternatives to Current Approach
• Acknowledgements
3
Outline
Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses
Image → OCR → Detected Text
Nearby Businesses + Detected Text → Business Name Matching → Business Identification → Business Spatial Detection
Week 4 | Week 5
Previous Work
4
Image → Where Am I? → Latitude, Longitude
Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses
65
George S Aiken Co
Winghart's Burger & Whiskey Bar
Market Square
Bella Sera On the Square
Chipotle
NOLA
Las Velas
…
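The step from a latitude/longitude pair to a list of nearby businesses can be sketched as a radius filter. This is only a minimal illustration: the slides do not name the geocoding service used, so the sketch assumes a local list of business records with known coordinates. The function names, the 150 m radius, and the sample records ("Business A", "Business B") are all hypothetical placeholders.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    d_lat = radians(lat2 - lat1)
    d_lon = radians(lon2 - lon1)
    a = sin(d_lat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(d_lon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearby_businesses(lat, lon, businesses, radius_m=150.0):
    """Return names of businesses within radius_m of (lat, lon), nearest first."""
    hits = [(haversine_m(lat, lon, b["lat"], b["lon"]), b["name"]) for b in businesses]
    return [name for dist, name in sorted(hits) if dist <= radius_m]


# Placeholder records: the names and coordinates are made up for illustration.
BUSINESSES = [
    {"name": "Business A", "lat": 40.4411, "lon": -80.0025},
    {"name": "Business B", "lat": 40.4506, "lon": -80.0025},
]

print(nearby_businesses(40.4406, -80.0025, BUSINESSES))
```

Restricting the candidate list this way is what keeps the later name matching bounded to a small set of businesses.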
5
Business Name Detection
6
Business Name Detection
7
Business Name Detection
…
<line dy="95" dx="1573" y="420" x="11" value="1">
  <space dy="26" dx="9" y="379" x="11"/>
  <box dy="26" dx="9" y="379" x="11" value="0" weights="96" numac="1"/>
  <box dy="25" dx="6" y="406" x="11" value="J" weights="98,62" numac="2" achars="p"/>
  <box dy="19" dx="5" y="382" x="19" value="n" weights="96" numac="1"/>
  <space dy="5" dx="30" y="441" x="25"/>
  <box dy="5" dx="7" y="441" x="56" value="."/>
  <box dy="24" dx="5" y="401" x="57" value="."/>
  <box dy="13" dx="8" y="429" x="58" value="v" weights="98" numac="1"/>
  <box dy="26" dx="9" y="402" x="60" value="." weights="94" numac="1"/>
  <box dy="22" dx="5" y="406" x="67" value="0" weights="96" numac="1"/>
  <box dy="10" dx="12" y="444" x="71" value="."/>
</line>
…
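OCR output in this shape can be turned into tokens with a few lines of Python. The sketch below is an illustration, not the code used in the project. It assumes that `x`/`y` give a box position, `dx`/`dy` its size, that `weights` lists recognition weights with the best candidate first, and that `<space>` elements separate tokens. The slides do not document the attributes, so these readings are guesses. The sample is a shortened copy of the output above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the OCR output shown on the slide.
SAMPLE_LINE = """<line dy="95" dx="1573" y="420" x="11" value="1">
  <space dy="26" dx="9" y="379" x="11"/>
  <box dy="26" dx="9" y="379" x="11" value="0" weights="96" numac="1"/>
  <box dy="25" dx="6" y="406" x="11" value="J" weights="98,62" numac="2" achars="p"/>
  <box dy="19" dx="5" y="382" x="19" value="n" weights="96" numac="1"/>
  <space dy="5" dx="30" y="441" x="25"/>
  <box dy="5" dx="7" y="441" x="56" value="."/>
</line>"""


def parse_ocr_line(xml_text):
    """Group <box> characters into tokens, splitting on <space> elements."""
    tokens, current = [], []
    for elem in ET.fromstring(xml_text):
        if elem.tag == "space":
            if current:  # close the token collected so far
                tokens.append(current)
                current = []
        elif elem.tag == "box":
            weights = [int(w) for w in elem.get("weights", "").split(",") if w]
            current.append({
                "char": elem.get("value", ""),
                "box": tuple(int(elem.get(k)) for k in ("x", "y", "dx", "dy")),
                "weight": weights[0] if weights else None,  # assumed: first weight is the best
            })
    if current:
        tokens.append(current)
    return tokens


def token_text(token):
    """Concatenate the recognized characters of one token."""
    return "".join(ch["char"] for ch in token)


tokens = parse_ocr_line(SAMPLE_LINE)
print([token_text(t) for t in tokens])  # prints ['0Jn', '.']
```

Keeping each character's box alongside its value means the same structure can later be used to locate a matched name in the image.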
8
Business Name Matching
9
Business Name Matching
• Developed Confidence Attribution Algorithm
  – Confidence of OCR Token being Name Token
    • Example: Confidence of “ESTUANT” representing “RESTAURANT”
    • Point-based system
  – Confidence of Name appearing in Image
    • Sum of points of matching OCR Text
    • Use logarithmically-normalized points to determine business inclusion threshold
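The two confidence levels above can be sketched as follows. The slides do not give the actual point rules or the normalization formula, so everything concrete here is an assumption made for illustration: points are counted as the number of characters `difflib.SequenceMatcher` aligns between an OCR token and a name token, tokens below a hypothetical `min_ratio` score zero, and the sum is normalized as log(1 + points) / log(1 + maximum points). The 0.8 threshold is likewise a placeholder.

```python
from difflib import SequenceMatcher
from math import log


def token_points(ocr_token, name_token, min_ratio=0.6):
    """Points for an OCR token representing a name token (assumed rule:
    number of aligned characters, or 0 if the tokens are too dissimilar)."""
    matcher = SequenceMatcher(None, ocr_token.upper(), name_token.upper(), autojunk=False)
    if matcher.ratio() < min_ratio:
        return 0
    return sum(block.size for block in matcher.get_matching_blocks())


def name_confidence(ocr_tokens, business_name):
    """Sum the best points per name token, then log-normalize into [0, 1]."""
    name_tokens = business_name.split()
    points = sum(
        max((token_points(ocr, name) for ocr in ocr_tokens), default=0)
        for name in name_tokens
    )
    max_points = sum(len(name) for name in name_tokens)
    if max_points == 0:
        return 0.0
    return log(1 + points) / log(1 + max_points)


def matching_businesses(ocr_tokens, names, threshold=0.8):
    """Names whose confidence meets the inclusion threshold, best first."""
    scored = [(name_confidence(ocr_tokens, name), name) for name in names]
    return [name for conf, name in sorted(scored, reverse=True) if conf >= threshold]


print(token_points("ESTUANT", "RESTAURANT"))
print(matching_businesses(["BRUEGERS", "BAGELS", "x."], ["Bruegger's Bagels", "Chipotle"]))
```

Under these assumptions a fragmentary token such as “ESTUANT” still earns most of the points available for “RESTAURANT”, while garbage tokens such as "x." contribute nothing.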
10
Business Name Matching
11
12
Business Name Matching
13
14
Business Name Matching
15
Business Name Matching
Note: k is usually 2 or 3
16
Business Name Matching
17
Business Name Matching
Note: This originally did not appear because it did not exceed the confidence threshold. It now appears because it contributes to the Business Name Identification.
18
Business Spatial Identification
19
Business Spatial Identification
20
Business Spatial Identification
Aiken George S Co
Category: Food, Grocery
Address: 218 Forbes Ave, Pittsburgh, PA 15222
Phone: (412) 391-6358
Rating: 4.5/5 (2 Reviews)
21
Business Spatial Identification
22
Business Spatial Identification
23
Business Spatial Identification
Bruegger's Bagels
Category: Bagels
Address: Market Sq, Pittsburgh, PA 15222
Phone: (412) 281-2515
Rating: Not Rated
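The slides do not spell out how the image region for an identified business is computed. One plausible sketch, offered purely as an assumption, is to take the bounding boxes of the OCR tokens that contributed to the matched name and report their union. The `(x, y, dx, dy)` layout is a guess at the OCR output's attributes, and `union_box` is a hypothetical helper.

```python
def union_box(boxes):
    """Smallest (x, y, dx, dy) rectangle containing every input box.

    Each box is (x, y, dx, dy): assumed top-left corner plus width and height.
    """
    if not boxes:
        raise ValueError("at least one box is required")
    x_min = min(x for x, y, dx, dy in boxes)
    y_min = min(y for x, y, dx, dy in boxes)
    x_max = max(x + dx for x, y, dx, dy in boxes)
    y_max = max(y + dy for x, y, dx, dy in boxes)
    return (x_min, y_min, x_max - x_min, y_max - y_min)


# Boxes of two characters taken from the OCR sample earlier in the slides.
print(union_box([(11, 379, 9, 26), (19, 382, 5, 19)]))  # prints (11, 379, 13, 26)
```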
24
Weaknesses to Current Approach
25
Weaknesses to Current Approach
Lots of Garbage
26
Weaknesses to Current Approach
Fragmented Word Detection
27
Weaknesses to Current Approach
Fails with non-orthogonal perspective
Did I already mention lots of garbage?
28
Weaknesses to Current Approach
Fails with non-Roman text
Not scale-invariant
29
ALTERNATIVE APPROACHES
Two different
30
Alternative #1: Image Matching
Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses
Image → Match to Storefront Image → Business Identification → Business Spatial Detection
31
Alternative #1: Image Matching
32
Alternative #1: Image Matching
• Weaknesses
  – Storefront images aren’t always available for matching
  – Computationally Expensive
    • Hundreds of images to compare to
  – Nothing new
  – Boring!
33
Alternative #2: Template Matching
Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses → Render Templates of Business Names in Different Fonts → Template Images
Image + Template Images → Image Matching (e.g. SIFT, HAAR) → Business Spatial Detection, Business Identification
34
Alternative #2: Template Matching
• Tambellini (the same business name rendered eight times, each in a different font)
35
Alternative #2: Template Matching
OCR
• Not Scale Invariant
• Unbounded Search
• Fragmented Recognition
• Roman-only font
Alternative #2
• Scale Invariant
• Bounded Search
• Whole-word recognition
• All fonts
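The "bounded search" property can be sketched as a loop over nearby business names and fonts only. This is a rough outline under stated assumptions, not an implementation of the proposal: `render` and `match_score` are hypothetical callables standing in for a font renderer and for an image matcher (for example one built on SIFT or HAAR features), and the stubs at the bottom exist only to make the sketch runnable.

```python
from itertools import product


def best_template_match(image, names, fonts, render, match_score):
    """Score every (name, font) template against the image and return
    the best (score, name, font) triple, or None if there are no templates.

    render(name, font) -> template and match_score(image, template) -> float
    are supplied by the caller; both are placeholders in this sketch.
    """
    best = None
    for name, font in product(names, fonts):  # search space: len(names) * len(fonts)
        score = match_score(image, render(name, font))
        if best is None or score > best[0]:
            best = (score, name, font)
    return best


# Stub renderer and matcher, for illustration only.
def stub_render(name, font):
    return (name, font)


def stub_score(image, template):
    return 1.0 if image == template else 0.0


result = best_template_match(
    ("Tambellini", "sans"), ["Tambellini", "Chipotle"], ["serif", "sans"],
    stub_render, stub_score,
)
print(result)  # prints (1.0, 'Tambellini', 'sans')
```

Because the candidate names come from the nearby-business list, the number of templates stays small, in contrast to OCR, which can emit arbitrary text.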
36
Acknowledgements
• Subh
  – Provided several ideas regarding template matching using SIFT, HAAR features, etc.
Thank You