business identification: local neighborhood alexander darino
![Page 1: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/1.jpg)
Business Identification: Local Neighborhood
Alexander Darino
![Page 2: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/2.jpg)
Outline
• Where Am I? project obtains geolocation of camera from image
• Objective: Obtain the geolocation and address of businesses in image
– Assume business is nearby, e.g. < 100 m from camera
– Compare methods of obtaining this information
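A minimal sketch of the "nearby" assumption, using the haversine great-circle distance to keep only candidates within 100 m of the camera. The function names and the `(name, lat, lon)` tuple layout are illustrative assumptions, not part of the project described on the slides.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(camera, candidates, max_m=100.0):
    """Return names of candidates (name, lat, lon) within max_m metres of camera (lat, lon)."""
    lat0, lon0 = camera
    return [name for name, lat, lon in candidates
            if haversine_m(lat0, lon0, lat, lon) <= max_m]


# Hypothetical candidates: "A" is about 56 m north of the camera, "B" about 222 m north.
camera = (40.441127, -80.002822)
candidates = [("A", 40.441627, -80.002822), ("B", 40.443127, -80.002822)]
print(nearby(camera, candidates))
```

The 100 m cut-off is the slide's example value; a real system would likely tune it against the accuracy of the camera geolocation.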
![Page 3: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/3.jpg)
Outline
[Pipeline diagram: Latitude/Longitude → Geocoding / Reverse Geocoding → Nearby Businesses; Image → OCR → Detected Text; Business Name Matching → Business Identification]
![Page 4: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/4.jpg)
Outline
• This Week:
– Finding Local Businesses via Geocode Search
– Finding Local Addresses via Reverse Geocoding
– Extracting Identifying Text (i.e. store names) via Optical Character Recognition (OCR)
– Matching OCR Text to Business Names
• Next Steps/Weekend Objectives
• Acknowledgements
![Page 5: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/5.jpg)
Obtaining Business Names
✓
![Page 6: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/6.jpg)
Local Businesses: Geocode Search
• Used Three Place-Search APIs:
– Yelp API: detailed yellow page-type results
– Google Places API: "skinny" results plus a reference to more information
– CityGrid API: minimal yellow page-type results
  • Used by Yellow Pages, Super Pages
• At present, only interested in business names
• Aggregated names from all three APIs
• Example (next slide)
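The aggregation step could be sketched as below. This assumes each API's results have already been reduced to a plain list of names; the calls to Yelp, Google Places and CityGrid are deliberately left out. The `normalize` rule and the choice to collapse near-identical spellings are assumptions for illustration; the slides do not say whether duplicates were merged.

```python
import re


def normalize(name):
    """Case-fold and collapse punctuation/whitespace so near-identical listings compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", name.casefold()).strip()


def aggregate_names(*result_lists):
    """Merge name lists from several sources, keeping the first spelling seen of each name."""
    seen = {}
    for names in result_lists:
        for name in names:
            seen.setdefault(normalize(name), name)
    return list(seen.values())


# The names appear in the example results; which API returned which is made up here.
yelp = ["Bella Sera On the Square", "Starbucks Coffee"]
places = ["Bella Sera on the Square", "Giggles"]
citygrid = ["Giggles", "Las Velas"]
print(aggregate_names(yelp, places, citygrid))
```

Simple normalization only merges spellings that differ in case or punctuation; variants such as "Denham & Company Salon" and "Denham & Co Salon" would still need fuzzy matching.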
![Page 7: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/7.jpg)
Local Businesses: Geocode Search
40.441127247181797 -80.002821624487595
Denham & Company Salon
Ullrich's Shoe Repairing
Nicholas Coffee Co
Bella Sera On the Square
A & J Ribs
Starbucks Coffee
Jenny Lee Bakery
Galardi's 30 Minute Cleaners
Jimmy John's Gourmet Sandwiches
Charley's Grilled Subs
Fresh Corner
Lagondola Pizzeria & Restaurant
Camera Repair Service Inc
Pittsburgh Cigar Bar
Original Oyster House
MixStirs
1902 Tavern
Costanzo's
Pittsburgh Silver Llc
Graeme St
Galardi's 30 Minute Cleaners
Denham & Co Salon
Bruegger's Bagel Bakery
Nicholas Coffee Co
Market Square
Fat Tommy's Pizzeria
Mixstirs Cafe
Giggles
Rycon Construction Inc
Garbera, Dennis C, Dds - Emmert Dental Assoc
Bella Sera on the Square
Mancini's Bread Co
Las Velas
Ciao Baby
Washington Reprographics Inc
Highmark Life Insurance Co
Fischer, Donald R, Md - Highmark Life Insurance Co
Jimmy John's
Lynx Energy Partners Inc
Emmert Dental Assoc
![Page 8: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/8.jpg)
Local Businesses: Geocode Search
Results: 12 Success, 3 Partial
Q9: First Presbyterian Church [turns out it wasn't a cathedral] (SUCCESS)
Q28: Moe's (SUCCESS)
Q34: Bruegger's Bagels (SUCCESS)
Q35: Bruegger's, Tavern, Nicholas (SUCCESS)
Q42: Tavern, Nicholas, Costanzo's [in distance] (SUCCESS)
Q57: Tambellini (SUCCESS)
Q63: Benedum Center (SUCCESS)
Q141: Roberts/7-Eleven (PARTIAL - misses Roberts)
Q200: Goodyear (SUCCESS)
Q238: Far from Bruegger's, Tavern, Nicholas (PARTIAL - misses Tavern)
Q246: Some theater (can't read it) (SUCCESS)
Q249: George Aikens (SUCCESS)
Q260: Dogs Dun Wright, Cherrie's diner (SUCCESS)
Q300: Giggles, Bruegger's, Tavern (in distance) (SUCCESS)
Q318: Fifth Avenue Place, Wine & Spirits (PARTIAL - misses Wine & Spirits)
![Page 9: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/9.jpg)
Local Businesses: Geocode Search
• Strengths
– Aggregated results almost always found the business of interest
• Weaknesses
– Each API limits query result set size; this is why we aggregate
– Contacted Yelp, Google, CityGrid for extended API access
  • Heard back from CityGrid; conference call next week
– Only businesses listed
– Not all businesses listed
  • All but one "Partial" result were for unlisted businesses
• Limitations
– Have only tested on 15 Pittsburgh images; result quality for rural areas is unknown
![Page 10: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/10.jpg)
Local Businesses: Geocode Search
✓ ✓* ✓
* Implicitly verified: APIs can search by latitude/longitude OR address
![Page 11: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/11.jpg)
Local Addresses: Reverse Geocoding
• Used Two Reverse-Geocoding APIs
– Google: provides a range of addresses on the same road
  • Usually the road is correct, but sometimes it's slightly off
  • Sometimes the road is correct, but the actual address number is not in the range
– Bing: provides one or two proximate addresses
  • Rates its own confidence; even 'Medium' confidence results are very accurate
  • Address is never exact, but is almost always adjacent to the correct address
  • Results returned are never consistent: it returns one, the other, or both of the two addresses, regardless of confidence level
![Page 12: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/12.jpg)
Local Addresses: Reverse Geocoding
• Intent: Get up to ~500 nearby addresses
• No Address Search API Available
✓
✓*
✓✗
![Page 13: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/13.jpg)
Extracting Identifying Text: OCR
[Pipeline diagram: Latitude/Longitude → Geocoding / Reverse Geocoding → Nearby Businesses; Image → OCR → Detected Text; Business Name Matching → Business Identification]
![Page 14: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/14.jpg)
Extracting Identifying Text: OCR
• Given:
– List of nearby businesses (names, addresses, etc.)
– Image containing businesses with visible names
• Objective:
– Extract names of businesses from image
– Identify businesses located in image
  • Match names extracted from image to names in business list
![Page 15: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/15.jpg)
Extracting Identifying Text: OCR
• Used Two OCR APIs:
– GNU OCR (Ocrad)
– GOCR
• OCR APIs highly sensitive to:
– Font (only works well with roman font)
– Perspective
– Scale
– Binarization Threshold
– Dark on Light vs. Light on Dark (inversion)
![Page 16: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/16.jpg)
Extracting Identifying Text: OCR
• OCR API evaluations
– Ocrad: could not yield any meaningful data across over 200 scale/threshold/inversion combinations
– GOCR: produced good results across 10 scales, with and without inversion, using a threshold automatically determined by Otsu's method
• Examples of GOCR output (next slides)
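For reference, Otsu's method picks the binarization threshold that maximises the between-class variance of the grayscale histogram. The sketch below is a plain-Python illustration on a flat list of 8-bit pixel values. It is not the code used in the project, and the scale values passed to `preprocessing_variants` are hypothetical (the slides give only the count of 10). Rescaling and the call to GOCR itself are omitted.

```python
from itertools import product


def otsu_threshold(pixels):
    """Return the threshold t in 0..255 that maximises between-class variance.

    pixels is a sequence of integer gray levels in 0..255.
    Pixels <= t form the low class, pixels > t the high class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the low class so far
    sum0 = 0  # sum of gray levels in the low class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # low class still empty
        w1 = total - w0
        if w1 == 0:
            break  # high class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, threshold, invert=False):
    """Map pixels to 0/1; invert=True flips the result for light-on-dark text."""
    return [int((p > threshold) != invert) for p in pixels]


def preprocessing_variants(scales):
    """Enumerate every (scale, invert) combination to try before running OCR."""
    return list(product(scales, (False, True)))


pixels = [10] * 50 + [200] * 50           # toy bimodal "image"
t = otsu_threshold(pixels)
print(t, binarize([10, 200], t), binarize([10, 200], t, invert=True))
print(len(preprocessing_variants([0.5 + 0.25 * i for i in range(10)])))
```

With 10 scales and two inversion settings this yields 20 preprocessing variants per image, each binarized with its own automatically chosen threshold.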
![Page 17: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/17.jpg)
Extracting Identifying Text: OCR
![Page 18: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/18.jpg)
Extracting Identifying Text: OCR
n.c.......o.a...u..............oU..D.oa..e......_RuEGGE..KERy..J...w...........L........M.II.....c..
...i
.......l.
.J
.t...llt...lSHA.P.It..tllt.........._.l...Jy._.c_...._tt.._....t.._.r.........t.t_t.._.._.l..J.r.r.I.
![Page 19: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/19.jpg)
Extracting Identifying Text: OCR
![Page 20: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/20.jpg)
Extracting Identifying Text: OCR
u..........._nq......eoR.E.l.e...í....e...n....n....n.e.R.E...e....o._....E.R.E.IKE........I.ltlO.........rE..o......E.....I.K.E.o.....
J.n....c...E.R.E.I.E.......M..E.R.E...E...aJ...Gu.ge..geE.F.._.....E..gE.D...fUlI..lll.lll.IIi.l..Xl..
![Page 21: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/21.jpg)
Extracting Identifying Text: OCR
![Page 22: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/22.jpg)
Extracting Identifying Text: OCR
..e_..w.._......D.........uJ.....J.................n......n..........n_..r.l_d..J.ec.m._..n.......J.n.._...tn..ct..._.................D.u.v...e.n....u..
Y.._w.n.n....Jn.......G..o..r..._........J...ml.t..l.tt.l.._w....................._....l....t........j..ilI.i..
![Page 23: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/23.jpg)
Extracting Identifying Text: OCR
![Page 24: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/24.jpg)
Extracting Identifying Text: OCR
__.ncu_.l..._..._J...ne......._n._..v.....ra......d_..._.............i..n..UllREsT.unAN...r.c.....r...Tt.rJll......m...c.....n.......
..
.Jn.I..c...r.rESTAU.ANT.r.O....c.cc.
Note: Even though "Tambellini" is set in a roman font, it is too stretched to be picked up by GOCR
![Page 25: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/25.jpg)
Extracting Identifying Text: OCR
• Strengths
– Applicable to expected input of orthogonal images
– Output can be run through word similarity matching algorithms
• Weaknesses
– Only works well(-ish) for strictly roman font
• Limitations
– Will perform poorly for artistic fonts and business signs
• Conclusion
– By itself, OCR is not the best approach to business identification (poor recognition, franchises, perspective, etc.)
– OCR could be used as part of a business identification voting scheme
![Page 26: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/26.jpg)
Matching OCR Text to Business Names
[Pipeline diagram: Latitude/Longitude → Geocoding / Reverse Geocoding → Nearby Businesses; Image → OCR → Detected Text; Business Name Matching → Business Identification]
![Page 27: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/27.jpg)
Matching OCR Text to Business Names
• Fuzzy String Matching: TRE Package
– Approximate Regular Expression Matching
– Returns edit distance of matched text
• Filter OCR text
– Trimming
– Chunking
– Uselessness (i.e. fewer than two letters)
• Developing algorithm to rate confidence of business name appearing in image
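A rough sketch of the filter-then-match idea. It does not use the TRE package; a plain dynamic-programming approximate substring search stands in for it, returning the smallest edit distance between a business name and any part of an OCR chunk. The chunking rule (split on runs of two or more `.`/`_` filler characters) is an assumption about what the filler in the GOCR samples means, and all function names are illustrative.

```python
import re


def approx_substring_distance(pattern, text):
    """Smallest edit distance between pattern and any substring of text."""
    prev = [0] * (len(text) + 1)  # an empty pattern matches anywhere at cost 0
    for i, pc in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cost = 0 if pc == tc else 1
            curr[j] = min(prev[j - 1] + cost,  # match or substitution
                          prev[j] + 1,         # pattern character missing from text
                          curr[j - 1] + 1)     # extra character in text
        prev = curr
    return min(prev)


def filter_chunks(ocr_text, min_letters=2):
    """Chunk on filler runs, trim filler from the ends, drop chunks with too few letters."""
    chunks = (c.strip("._ \t\n") for c in re.split(r"[._\s]{2,}", ocr_text))
    return [c for c in chunks if sum(ch.isalpha() for ch in c) >= min_letters]


def best_match(name, chunks):
    """Return (chunk, errors) for the chunk closest to name, ignoring case, or None."""
    scored = [(approx_substring_distance(name.upper(), c.upper()), c) for c in chunks]
    if not scored:
        return None
    errors, chunk = min(scored)
    return chunk, errors


ocr_line = ".Jn.I..c...r.rESTAU.ANT.r.O....c.cc."  # one of the GOCR sample lines
chunks = filter_chunks(ocr_line)
print(chunks)
print(best_match("Restaurant", chunks))
```

Here "RESTAURANT" matches the chunk containing `rESTAU.ANT` with a single error, the kind of near miss a confidence score can then weigh.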
![Page 28: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/28.jpg)
Matching OCR Text to Business Names
$$\mathrm{Confidence}(\mathrm{Name}) = \frac{1}{\mathrm{OCR\ Matches}} \sum_{\mathrm{OCR}} \frac{\mathrm{Length}(\mathrm{OCR})}{1 + \mathrm{Errors}}$$
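The confidence formula on this slide is garbled in the transcript. Reading it as the average, over all OCR matches for a name, of the matched text length divided by one plus the number of errors, a small sketch would be the following. The `(ocr_text, errors)` pair layout and the zero value for "no matches" are assumptions.

```python
def confidence(matches):
    """Mean of Length(OCR) / (1 + Errors) over (ocr_text, errors) pairs; 0.0 if there are none."""
    if not matches:
        return 0.0
    return sum(len(text) / (1 + errors) for text, errors in matches) / len(matches)


# Hypothetical matches for one business name: a 10-character match with one
# error and a 4-character exact match.
print(confidence([("RESTAU.ANT", 1), ("REST", 0)]))  # (10/2 + 4/1) / 2 = 4.5
```

Under this reading, long matches with few errors raise the score, while short or error-heavy matches pull it down.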
![Page 29: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/29.jpg)
Matching OCR Text to Business Names
![Page 30: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/30.jpg)
Next Steps/Weekend Objectives
• Implement 'chunking' of OCR output
• Evaluate and refine algorithm against multiple inputs
• Detect location of text in image
![Page 31: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/31.jpg)
Acknowledgements
• Subh
– Directed us to the Ocrad and GOCR OCR packages
– Provided feedback on how to calibrate OCR packages to extract meaningful text (e.g. scaling, inversion, etc.)
![Page 32: Business Identification: Local Neighborhood Alexander Darino](https://reader035.vdocuments.us/reader035/viewer/2022062718/56649e875503460f94b8aca8/html5/thumbnails/32.jpg)
Thank You