GeoSpatial Search with Amazon CloudSearch
DESCRIPTION
Presented by Tom Hill, Amazon CloudSearch Solution Architect, at the LA Amazon CloudSearch User Meetup, this talk covers using location and distance as factors in search. It discusses techniques for measuring proximity as well as performance considerations.
TRANSCRIPT
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
GeoSpatial Search in Amazon CloudSearch
Tom Hill, January 30, 2013
Agenda
- What is GeoSpatial search?
- Why do we care?
- Computing distance
- GeoSpatial search in CloudSearch
What is GeoSpatial Search?
- Using location & distance as factors in search
What's this "Geo", anyway?
- Geographic
  • On the earth
- Spatial
  • Simple distance
How do we use distance?
- Limit to an Area
  • Box
  • Circle
  • Polygon*
- Sort by Distance
- Include distance in score
*Not yet!
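The three uses of distance listed above can be illustrated with a small in-memory sketch. This is not how CloudSearch evaluates them; it only shows the semantics of a box limit, a circle limit and a distance sort. The point shape `{name, lat, lon}` and all function names are hypothetical, and distance is computed with the haversine formula on a sphere of radius 6371 km.

```javascript
// Illustrative, in-memory versions of "limit to a box", "limit to a circle"
// and "sort by distance". The point shape {name, lat, lon} (degrees) and all
// function names are hypothetical; CloudSearch does not run this code.
const EARTH_RADIUS_KM = 6371; // mean Earth radius, spherical model

function toRad(deg) {
  return (deg * Math.PI) / 180;
}

// Great-circle distance in km using the haversine formula.
function distanceKm(lat1, lon1, lat2, lon2) {
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Box: keep points inside the lat/lon ranges (boxes crossing the 180th
// meridian are not handled).
function limitToBox(points, minLat, maxLat, minLon, maxLon) {
  return points.filter(
    (p) => p.lat >= minLat && p.lat <= maxLat && p.lon >= minLon && p.lon <= maxLon
  );
}

// Circle: keep points within radiusKm of the center.
function limitToCircle(points, centerLat, centerLon, radiusKm) {
  return points.filter(
    (p) => distanceKm(centerLat, centerLon, p.lat, p.lon) <= radiusKm
  );
}

// Sort: nearest point first (returns a new array).
function sortByDistance(points, lat, lon) {
  return [...points].sort(
    (a, b) => distanceKm(lat, lon, a.lat, a.lon) - distanceKm(lat, lon, b.lat, b.lon)
  );
}

const places = [
  { name: "A", lat: 0.0, lon: 0.0 },
  { name: "B", lat: 0.5, lon: 0.0 },
  { name: "C", lat: 2.0, lon: 2.0 },
];
const nearby = limitToCircle(places, 0, 0, 100); // A and B; C is about 314 km away
console.log(sortByDistance(nearby, 0.4, 0).map((p) => p.name)); // [ 'B', 'A' ]
```

Scoring with distance is the same idea as sorting, except the distance is blended with text relevance instead of used alone.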
Why Do You Care?
- People care about things near them.
  • Pizza, Classified Ads, etc.
  • Find a Doctor, Lawyer,…
- Mobile is a key driver
What can you do with CloudSearch?
Computing Distance
- Many formulas
  • Rectangular distance
  • Equirectangular projection
  • Spherical Law of Cosines
  • Haversine Formula
  • Vincenty's Formula
- Speed vs. accuracy
  • Speed: Rectangular distance
  • Accuracy: Vincenty's Formula
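A minimal sketch of four of the listed formulas, assuming a spherical earth with mean radius 6371 km (the radius used in the cosines expression at the end of the deck), inputs in degrees and results in kilometers. Vincenty's formula, which iterates on an ellipsoid, is omitted for brevity. The function names are illustrative.

```javascript
// Sketch of four distance formulas on a spherical earth.
// Inputs: latitude/longitude in degrees. Output: kilometers.
const R_KM = 6371; // mean Earth radius (assumed spherical model)

const rad = (deg) => (deg * Math.PI) / 180;

// "Rectangular": treat degrees as flat x/y coordinates, then scale to km.
function rectangularKm(lat1, lon1, lat2, lon2) {
  const dLat = rad(lat2 - lat1);
  const dLon = rad(lon2 - lon1);
  return R_KM * Math.sqrt(dLat * dLat + dLon * dLon);
}

// Equirectangular projection: shrink longitude by cos(mean latitude).
function equirectangularKm(lat1, lon1, lat2, lon2) {
  const x = rad(lon2 - lon1) * Math.cos(rad((lat1 + lat2) / 2));
  const y = rad(lat2 - lat1);
  return R_KM * Math.sqrt(x * x + y * y);
}

// Spherical law of cosines (clamped so rounding past +/-1 cannot give NaN).
function cosinesKm(lat1, lon1, lat2, lon2) {
  const c =
    Math.sin(rad(lat1)) * Math.sin(rad(lat2)) +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.cos(rad(lon2 - lon1));
  return R_KM * Math.acos(Math.min(1, Math.max(-1, c)));
}

// Haversine formula: numerically stable for short distances.
function haversineKm(lat1, lon1, lat2, lon2) {
  const dLat = rad(lat2 - lat1);
  const dLon = rad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R_KM * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Two points on the 60th parallel, 90 degrees of longitude apart.
for (const f of [rectangularKm, equirectangularKm, cosinesKm, haversineKm]) {
  console.log(f.name, f(60, 0, 60, 90).toFixed(1));
}
// rectangularKm 10007.5
// equirectangularKm 5003.8
// cosinesKm 4604.5
// haversineKm 4604.5
```

Along the equator all four agree; at 60°N the two flat approximations overshoot badly, which is the speed versus accuracy trade-off in practice.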
Why So Many Ways to Compute Distance?
The earth isn't flat! It isn't a sphere either.
Is the Earth Flat?
- If it's flat
  • distance = sqrt((lat1-lat2)^2 + (lon1-lon2)^2)
- If it's not
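For the case where the earth is not flat, one standard choice from the formulas listed earlier is the haversine formula. Treating the earth as a sphere of radius R (about 6371 km), with latitudes φ and longitudes λ in radians:

$$
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \arcsin\!\left(\sqrt{a}\right)
$$

The flat formula works in raw degrees and treats a degree of longitude as being as long as a degree of latitude. On a sphere a degree of longitude shrinks by a factor of cos φ, so the flat result drifts further from the truth away from the equator and over long distances.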
How Much Accuracy?
- "Pizza, 1 Mile"
- Haversine is more accurate
  • If you are a bird
- Any distance is approximate
Comparing Distance Computations
- Four computations: Haversine, Cosines, EquiRect, Rect
- Different computations, different results
- How accurate do you need to be?

Distances are in km; the error columns are the relative difference from the Haversine distance.

Route                                   Haversine     Cosines    EquiRect        Rect  CosErr  EquErr  RecErr
Fort Lauderdale, FL to Anniston, AL     994.79893   994.79893   995.25921  1044.40926   0.000   0.000   0.050
Fort Lauderdale, FL to San Diego, CA   3624.04339  3624.04339  3642.98321  4163.41737   0.000   0.005   0.149
Fort Lauderdale, FL to New Haven, CT   1812.54997  1812.54997  1814.38516  1871.77660   0.000   0.001   0.033
Fort Lauderdale, FL to Adak, AK        8175.93897  8175.93897  8817.96563 11107.90729   0.000   0.079   0.359
Bangor, ME to Adak, AK                 7244.45661  7244.45661  8008.74698 12015.96646   0.000   0.106   0.659
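Each error column can be recomputed from the distances as |d − d_haversine| / d_haversine. A small sketch that reproduces two of the first row's error values; the helper name is illustrative and the numbers are copied from the table.

```javascript
// Relative error of an approximate distance against a reference distance,
// as in the CosErr/EquErr/RecErr columns (reference = Haversine).
function relativeError(approx, reference) {
  return Math.abs(approx - reference) / reference;
}

// Values copied from the first row: Fort Lauderdale, FL to Anniston, AL (km).
const haversine = 994.79893;
const equiRect = 995.25921;
const rect = 1044.40926;

console.log(relativeError(equiRect, haversine).toFixed(3)); // 0.000
console.log(relativeError(rect, haversine).toFixed(3)); // 0.050
```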
GeoSpatial Search in Amazon CloudSearch
How to Compute Distance in CloudSearch?
- Rank expressions
  • Computations run for each matching document
- Can be used for
  • Sorting
  • Influencing scoring
Two Types of Rank Expressions
- Static rank expressions
  • Computation based on values in the index
- Query-time rank expressions
  • Allow including parameters at run time, e.g. latitude, longitude
Query Time Rank Expressions
- Define a rank expression
  • &rank-NAME=EXPRESSION
  • &rank-geo=sqrt(pow(lat-userlat,2)+pow(lon-userlon,2))
- Select the rank expression
  • &rank=NAME
  • &rank=geo
http://searchendpoint?q=creek&rank=geo&rank-geo=sqrt(pow(la1-123,2)+pow(lo1-456,2))
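The same request can be assembled programmatically. This sketch uses the standard `URLSearchParams` API, which percent-encodes the expression; that matters because a literal "+" in a query string is commonly decoded as a space. The endpoint is the slide's placeholder, `la1`/`lo1` are the slide's field names, and `buildGeoQuery` is a hypothetical helper. The user values are wrapped in parentheses so that a negative coordinate does not produce text like `la1--5`, which an expression parser may reject or misread.

```javascript
// Sketch: build the slide's request with URLSearchParams, which
// percent-encodes the expression. "http://searchendpoint" is the slide's
// placeholder; la1/lo1 are the slide's field names.
function buildGeoQuery(endpoint, query, userLat, userLon) {
  const params = new URLSearchParams();
  params.set("q", query);
  // Define an expression named "geo"; parentheses around the user values
  // keep negative coordinates unambiguous.
  params.set("rank-geo", `sqrt(pow(la1-(${userLat}),2)+pow(lo1-(${userLon}),2))`);
  // Select it by name for ranking.
  params.set("rank", "geo");
  return `${endpoint}?${params.toString()}`;
}

const url = buildGeoQuery("http://searchendpoint", "creek", 123, 456);
console.log(url); // the "+" in the expression is sent as %2B
console.log(new URL(url).searchParams.get("rank-geo"));
// sqrt(pow(la1-(123),2)+pow(lo1-(456),2))
```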
Influencing Scoring
- Include both text relevance and distance:
  &rank-geo=(1000-text_relevance)+sqrt(pow(la1-ulat,2)+pow(lo1-ulon,2))
- Relative weight
  • N * text_relevance + M * distance
  • That's where the art comes in.
  • It will vary by your application.
  • Test, tune, and test again.
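A sketch of how the two weights trade off, evaluated locally rather than in CloudSearch. It assumes, as the slide's expression implies, that text_relevance tops out at 1000 and that a smaller key ranks first, which is why relevance is subtracted from 1000. The weights `n` and `m`, the document shape and the helper names are hypothetical.

```javascript
// Sketch: blend text relevance and distance into one sort key where a
// smaller key ranks first, mirroring (1000 - text_relevance) + distance.
// Assumptions (hypothetical): text_relevance is at most 1000, distance is
// already computed per document, and n/m are weights you tune.
function combinedKey(doc, n, m) {
  return n * (1000 - doc.textRelevance) + m * doc.distance;
}

function rankDocs(docs, n, m) {
  return [...docs].sort((a, b) => combinedKey(a, n, m) - combinedKey(b, n, m));
}

const docs = [
  { id: "near-but-weak", textRelevance: 400, distance: 1 },
  { id: "far-but-strong", textRelevance: 900, distance: 50 },
];

console.log(rankDocs(docs, 1, 1).map((d) => d.id)); // [ 'far-but-strong', 'near-but-weak' ]
console.log(rankDocs(docs, 1, 20).map((d) => d.id)); // [ 'near-but-weak', 'far-but-strong' ]
```

Raising `m` relative to `n` lets proximity win over relevance; where the crossover should be is the application-specific tuning the slide describes.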
Storing Data
- CloudSearch supports unsigned integers, so latitude and longitude have to be converted to positive ranges:
  • latitude + 90
  • longitude + 180
- They also have to be stored as integers, so they need to be scaled:
  • latitude = (latitude + 90) * 100
  • longitude = (longitude + 180) * 100
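A minimal sketch of the conversion and its inverse, using the field names `latitude_90` and `longitude_180` that appear in the Querying Data appendix. It rounds instead of truncating, which is a choice made here, not something the slide specifies. With a scale of 100 the resolution is 0.01 degree, about 1.1 km of latitude; a larger scale gives finer positions at the cost of larger integers.

```javascript
// Sketch: map signed lat/lon (degrees) to the unsigned, scaled integers the
// slide describes, and back. SCALE = 100 keeps two decimal places.
const SCALE = 100;

function encodeLatLon(lat, lon, scale = SCALE) {
  return {
    latitude_90: Math.round((lat + 90) * scale), // 0 .. 180 * scale
    longitude_180: Math.round((lon + 180) * scale), // 0 .. 360 * scale
  };
}

function decodeLatLon(latInt, lonInt, scale = SCALE) {
  return { lat: latInt / scale - 90, lon: lonInt / scale - 180 };
}

console.log(encodeLatLon(35.39, -122.16)); // { latitude_90: 12539, longitude_180: 5784 }
```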
Performance
- Don't query the whole world
  • You can limit by literal or numeric fields.
  • Literals are more efficient for limits.
- Limit options
  • Literal: &bq=state:'CA' or &bq=zip:'94402'
  • Numeric: &bq=(and latitude:40..50 longitude:80..85)
  • Geohash
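Two of the limit options above can be sketched under stated assumptions. `boxBq` builds a numeric bq range around a point, but against the scaled `latitude_90`/`longitude_180` fields rather than the raw latitude/longitude of the slide's example. `geohash` is the standard geohash encoding (alternating longitude and latitude bisection bits, five bits per base-32 character); storing a short geohash in a literal field, here a hypothetical field named `geohash5`, lets a limit be an exact literal match in the same form as `state:'CA'`. Both ignore the poles and the 180th meridian, and a point near a geohash cell edge can have close neighbors in an adjacent cell, so a common refinement is to also match the eight surrounding cells.

```javascript
// Sketch of two limit options. Assumptions (hypothetical): fields latitude_90
// and longitude_180 hold (degrees + offset) * scale as on the Storing Data
// slide, and a literal field "geohash5" holds each document's 5-character
// geohash. Neither function handles the poles or the 180th meridian.
const KM_PER_DEGREE = (6371 * Math.PI) / 180; // ~111.19 km per degree of latitude

// Numeric limit: a box around (lat, lon) covering radiusKm, as a bq string.
function boxBq(lat, lon, radiusKm, scale = 100) {
  const dLat = radiusKm / KM_PER_DEGREE;
  // A degree of longitude shrinks by cos(latitude).
  const dLon = radiusKm / (KM_PER_DEGREE * Math.cos((lat * Math.PI) / 180));
  const lo = (deg, offset) => Math.floor((deg + offset) * scale);
  const hi = (deg, offset) => Math.ceil((deg + offset) * scale);
  return (
    `(and latitude_90:${lo(lat - dLat, 90)}..${hi(lat + dLat, 90)}` +
    ` longitude_180:${lo(lon - dLon, 180)}..${hi(lon + dLon, 180)})`
  );
}

// Geohash: interleave longitude/latitude bisection bits, 5 bits per char.
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

function geohash(lat, lon, precision) {
  let latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
  let hash = "";
  let bit = 0;
  let value = 0;
  let useLon = true; // geohash starts with a longitude bit
  while (hash.length < precision) {
    if (useLon) {
      const mid = (lonMin + lonMax) / 2;
      if (lon >= mid) { value = value * 2 + 1; lonMin = mid; } else { value *= 2; lonMax = mid; }
    } else {
      const mid = (latMin + latMax) / 2;
      if (lat >= mid) { value = value * 2 + 1; latMin = mid; } else { value *= 2; latMax = mid; }
    }
    useLon = !useLon;
    bit += 1;
    if (bit === 5) {
      hash += BASE32[value];
      bit = 0;
      value = 0;
    }
  }
  return hash;
}

console.log(boxBq(37.5, -122.3, 10));
console.log(`&bq=geohash5:'${geohash(42.6, -5.6, 5)}'`); // &bq=geohash5:'ezs42'
```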
Performance Measures
GeoMethod  TextRel  Limits   Queries  Seconds  QTimeMS  Threads  CompletedQ     AveHits
NONE       false    none          10   6.2255      622        1          10  8345450.00
CARTESIAN  false    none          10  15.6064     1560        1          10  8345450.00
EQUI       false    none          10  19.7106     1971        1          10  8345450.00
COSINES    false    none          10  27.4968     2749        1          10  8345450.00
HAVERSINE  false    none          10  31.2595     3125        1          10  8345450.00
NONE       false    Numeric       10   9.1758      917        1          10     3807.00
CARTESIAN  false    Numeric       10   9.0255      902        1          10     3807.00
EQUI       false    Numeric       10   9.1158      911        1          10     3807.00
COSINES    false    Numeric       10   9.8321      983        1          10     3807.00
HAVERSINE  false    Numeric       10   9.1272      912        1          10     3807.00
NONE       false    literal       10   0.8254       82        1          10     3781.00
CARTESIAN  false    literal       10   0.5936       59        1          10     3781.00
EQUI       false    literal       10   0.6173       61        1          10     3781.00
COSINES    false    literal       10   0.5916       59        1          10     3781.00
HAVERSINE  false    literal       10   0.6289       62        1          10     3781.00
QTimeMS is the average time per query in milliseconds.
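Why you don't want to query the whole world! With no limit, each query matched about 8.3 million documents and took roughly 0.6 seconds with no distance computation and up to 3.1 seconds with Haversine; with a literal limit, queries matched about 3,800 documents and took roughly 60 to 80 ms.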
Geo-Spatial Demo Application
Demo Application Structure
HTML page (JavaScript) → Ajax → Server (Tomcat) → CloudSearch
Demo Implementation
- JavaScript
  • Ajax
  • jQuery
  • Google Maps API
- Twitter Bootstrap
  • CSS
- Tomcat server
  • Java
  • Just for forwarding requests
  • Because browsers restrict cross-site Ajax requests, that's why.
Querying CloudSearch
    $.ajax({
      url: "searchx",
      data: {
        'q': currentQuery,
        'domain': "geoname25",
        'return-fields': returnFields.join(),
        "rank": "geo",
        "rank-geo": "Math.sqrt(Math.pow(Math.abs(doc.latitude_90 - 12539),2) + Math.pow(Math.abs(doc.longitude_180 - 5784),2))"
      },
      dataType: "json",
      success: function(data) {
        var hits = data['hits'];
        displaySearchResults(hits['hit'], hits['found']);
        populateMap(hits['hit']);
      }
    });
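The demo hard-codes the user's position as the already-scaled integers 12539 and 5784, which under the (value + offset) * 100 scheme from Storing Data correspond to about 35.39, -122.16. A hypothetical helper, `buildRankGeo`, can compute them from raw coordinates instead; the expression text and field names are copied from the demo.

```javascript
// Sketch: compute the demo's hard-coded numbers from a raw location instead
// of typing them in. The expression text and field names come from the demo;
// the (value + offset) * 100 scaling follows the Storing Data slide.
function buildRankGeo(userLat, userLon, scale = 100) {
  const lat = Math.round((userLat + 90) * scale);
  const lon = Math.round((userLon + 180) * scale);
  return (
    `Math.sqrt(Math.pow(Math.abs(doc.latitude_90 - ${lat}),2) + ` +
    `Math.pow(Math.abs(doc.longitude_180 - ${lon}),2))`
  );
}

console.log(buildRankGeo(35.39, -122.16));
// Math.sqrt(Math.pow(Math.abs(doc.latitude_90 - 12539),2) + Math.pow(Math.abs(doc.longitude_180 - 5784),2))
```

Its result would be passed as the "rank-geo" value in the `data` object of the Ajax call.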
Wrap Up
Thanks for Coming!
- Data
  • http://www.geonames.org/export/
- Slides
  • On the meetup group soon
- Sample code
  • Talk to me. ([email protected])
- Computations
  • http://www.movable-type.co.uk/scripts/latlong.html
  • Wikipedia
Querying Data
- Java/JavaScript
  • Fields are latitude_90, longitude_180
  • User location is userlat, userlon
- Simple distance
    rank = "Math.sqrt(Math.pow(Math.abs(latitude_90-(" + userlat + ")),2)+Math.pow(Math.abs(longitude_180-(" + userlon + ")),2))";
- Spherical Law of Cosines
    rank = "6371*Math.acos(Math.sin(" + userlat + ") * Math.sin(lat_rad/" + scale + ") + Math.cos(" + userlat + ") * Math.cos(lat_rad/" + scale + ") * Math.cos((lon_rad/" + scale + ") - " + userlon + ") )";
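The cosines expression passes `userlat` and `userlon` straight to `Math.sin` and `Math.cos`, so they have to be in radians, and `lat_rad`/`lon_rad` appear to hold radians multiplied by `scale`. The sketch below converts a location given in degrees before building the string, then checks the result locally by evaluating it as JavaScript. `buildCosinesRank` is hypothetical, and unlike the slide it wraps `userlon` in parentheses to keep a negative longitude unambiguous.

```javascript
// Sketch: build the law-of-cosines expression from a location in degrees.
// Math.sin/Math.cos expect radians, so userlat/userlon are converted first.
// Assumption: lat_rad/lon_rad hold radians multiplied by `scale`.
function buildCosinesRank(userLatDeg, userLonDeg, scale) {
  const userlat = (userLatDeg * Math.PI) / 180;
  const userlon = (userLonDeg * Math.PI) / 180;
  return (
    "6371*Math.acos(Math.sin(" + userlat + ") * Math.sin(lat_rad/" + scale +
    ") + Math.cos(" + userlat + ") * Math.cos(lat_rad/" + scale +
    ") * Math.cos((lon_rad/" + scale + ") - (" + userlon + ")) )"
  );
}

// Local check: evaluate the expression for a document one degree north.
const scale = 1000000;
const expr = buildCosinesRank(10, 20, scale);
const rankFn = new Function("lat_rad", "lon_rad", "return " + expr);
const latRad = Math.round(((11 * Math.PI) / 180) * scale);
const lonRad = Math.round(((20 * Math.PI) / 180) * scale);
console.log(rankFn(latRad, lonRad).toFixed(1)); // 111.2 (km)
```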
What CloudSearch Doesn't Do
Issues with assuming the earth is flat.