How to Tune Search Requests using Amazon CloudSearch

© 2012 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc. How to Tune Search Requests Tom Hill / [email protected]

Upload: amazon-web-services

Post on 20-Aug-2015


Page 1: How to Tune Search Requests using Amazon CloudSearch

How to Tune Search Requests

Tom Hill / [email protected]

Page 2: How to Tune Search Requests using Amazon CloudSearch

Agenda

• Query Processing
• Common Issues
• Benchmarking
• Analytics

Page 3: How to Tune Search Requests using Amazon CloudSearch

…query processing

Page 4: How to Tune Search Requests using Amazon CloudSearch

Query Processing

[Figure: a query passing through the Matching, Filtering, Ranking, and Sorting stages, illustrated with document IDs 564, 726, and 123.]

Page 5: How to Tune Search Requests using Amazon CloudSearch

Query Processing

• Matching
    • Text Fields
    • Literal Fields
• Filtering
    • Numeric terms, ranges
• Ranking
    • Score computation for each document
• Sorting
    • Based on rank computation, alphabetic, numeric
• Faceting

Page 6: How to Tune Search Requests using Amazon CloudSearch

Matching

• The starting point for results
• Full text matching with “text” fields
• Exact matching with “literal” fields

Page 7: How to Tune Search Requests using Amazon CloudSearch

Filtering

• UINT fields
    • Numbers
    • Numeric ranges
• After all of these you get results: <hits found="2432" start="0">

Page 8: How to Tune Search Requests using Amazon CloudSearch

Ranking

• Score can be
    • Text Relevance
    • Rank expression
• Done for every document that makes it past filtering

Page 9: How to Tune Search Requests using Amazon CloudSearch

Sorting

• Last step
• Again, cost is a function of match set size
• Sort by
    • Text Relevance
    • Rank Expression
    • Field Value

Page 10: How to Tune Search Requests using Amazon CloudSearch

Performance: Match Set Size

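[Figure: nested match sets, from largest to smallest: <all documents>, cat|dog, AND age:0..6, AND color:black. An arrow labeled "Increasing Performance" points toward the smaller sets.]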
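The effect of each added restriction can be sketched with a toy in-memory corpus. This is only an illustration of match set size, not how CloudSearch is implemented; the `Doc` class, its fields, and the sample data are invented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class MatchSetDemo {
    // Hypothetical document shape, invented for this illustration.
    static final class Doc {
        final String animal;
        final int age;
        final String color;

        Doc(String animal, int age, String color) {
            this.animal = animal;
            this.age = age;
            this.color = color;
        }
    }

    // Returns the documents that satisfy the predicate.
    static List<Doc> restrict(List<Doc> docs, Predicate<Doc> p) {
        List<Doc> out = new ArrayList<>();
        for (Doc d : docs) {
            if (p.test(d)) {
                out.add(d);
            }
        }
        return out;
    }

    // Made-up sample data.
    static List<Doc> sampleCorpus() {
        return Arrays.asList(
            new Doc("cat", 2, "black"),
            new Doc("dog", 5, "brown"),
            new Doc("dog", 9, "black"),
            new Doc("cat", 12, "white"),
            new Doc("fish", 1, "gold"));
    }

    public static void main(String[] args) {
        List<Doc> all = sampleCorpus();
        List<Doc> catOrDog = restrict(all, d -> d.animal.equals("cat") || d.animal.equals("dog"));
        List<Doc> young = restrict(catOrDog, d -> d.age >= 0 && d.age <= 6);
        List<Doc> black = restrict(young, d -> d.color.equals("black"));
        // Every later stage (ranking, sorting) only has to touch the last, smallest set.
        System.out.println("all=" + all.size() + " cat|dog=" + catOrDog.size()
            + " age:0..6=" + young.size() + " color:black=" + black.size());
    }
}
```

Each AND can only keep the set the same size or shrink it, which is why the innermost set in the figure is the cheapest to rank and sort.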

Page 11: How to Tune Search Requests using Amazon CloudSearch

…common issues

Page 12: How to Tune Search Requests using Amazon CloudSearch

Literals vs Uints

• Literals will tend to improve performance
• Uints will tend to take less space

Page 13: How to Tune Search Requests using Amazon CloudSearch

Query Restriction: Literals Vs. Uints

GeoMethod  TextRel  Limits   Queries  Seconds  QTimeMS  Threads  CompletedQ  AveHits

No Restriction
NONE       false             10        6.2255      622  1        10          8345450.00
CARTESIAN  false             10       15.6064     1560  1        10          8345450.00
EQUI       false             10       19.7106     1971  1        10          8345450.00
COSINES    false             10       27.4968     2749  1        10          8345450.00
HAVERSINE  false             10       31.2595     3125  1        10          8345450.00

UINT Restriction
NONE       false    Numeric  10        9.1758      917  1        10             3807.00
CARTESIAN  false    Numeric  10        9.0255      902  1        10             3807.00
EQUI       false    Numeric  10        9.1158      911  1        10             3807.00
COSINES    false    Numeric  10        9.8321      983  1        10             3807.00
HAVERSINE  false    Numeric  10        9.1272      912  1        10             3807.00

Literal Restriction
NONE       false    literal  10        0.8254       82  1        10             3781.00
CARTESIAN  false    literal  10        0.5936       59  1        10             3781.00
EQUI       false    literal  10        0.6173       61  1        10             3781.00
COSINES    false    literal  10        0.5916       59  1        10             3781.00
HAVERSINE  false    literal  10        0.6289       62  1        10             3781.00

Page 14: How to Tune Search Requests using Amazon CloudSearch

Negative Queries

• CloudSearch supports negative queries
    • &q=-amazon
• Can match all documents
    • if "doesntmatchanything" matches 0 docs, then -doesntmatchanything will match all docs.
• Matching all docs means lots of computations
    • Sorting, rank expressions, etc.

Page 15: How to Tune Search Requests using Amazon CloudSearch

Implicit Limits

• If the user doesn't give you restrictions, add some!
• Top items for their category
• Add implicit limits to broad queries
    • &bq=important:1
    • &bq=population:10000..
• Select the best
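One way an application might apply an implicit limit is sketched below, assuming the `q` and `bq` request parameters shown on the slides. The class and method names are hypothetical, and the choice of default restriction is up to the application.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ImplicitLimit {
    // Builds the query string for a search request. If the caller supplies no
    // boolean restriction of its own, the default restriction is used instead.
    static String buildQuery(String userTerms, String userBq, String defaultBq) {
        String bq = (userBq == null || userBq.trim().isEmpty()) ? defaultBq : userBq;
        return "q=" + encode(userTerms) + "&bq=" + encode(bq);
    }

    // URL-encodes a parameter value as UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // No user restriction, so the implicit population limit is applied.
        System.out.println(buildQuery("springfield", null, "population:10000.."));
    }
}
```

The default is only applied when the user gave no restriction, so explicit filters are never overridden.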

Page 16: How to Tune Search Requests using Amazon CloudSearch

Wildcard Queries

• Expands the query terms
    • a* => aardvark, aaron, … azimuth
    • Limited to first 2000 terms.
        • But that's still 2000 terms!
    • Match set is all docs that contain any one of the terms
• Match set gets big!

[Figure: the wildcard a* expands to the terms a, aaron, after, apple; each term points to its own list of documents (doc1, doc3, doc5, doc9, doc17, doc18, doc50, doc80, doc85, doc87, doc90, doc99, doc110, doc111, doc117), and the match set is the union of those lists.]
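The expansion can be sketched with a toy inverted index. This is an illustration of the idea only; the index contents and names below are invented, and the real service's internals are not described in these slides.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

class WildcardExpansion {
    // Expands a prefix against a sorted term dictionary and returns the union of
    // the posting lists of at most maxTerms matching terms.
    static Set<String> matchSet(SortedMap<String, Set<String>> index, String prefix, int maxTerms) {
        Set<String> docs = new TreeSet<>();
        int expanded = 0;
        // tailMap starts at the first term >= prefix; terms sharing the prefix are contiguous.
        for (Map.Entry<String, Set<String>> e : index.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix) || expanded == maxTerms) {
                break;
            }
            docs.addAll(e.getValue());
            expanded++;
        }
        return docs;
    }

    // Made-up postings for illustration.
    static SortedMap<String, Set<String>> sampleIndex() {
        SortedMap<String, Set<String>> index = new TreeMap<>();
        index.put("a", new TreeSet<>(Arrays.asList("doc1", "doc3")));
        index.put("aaron", new TreeSet<>(Arrays.asList("doc3", "doc99")));
        index.put("after", new TreeSet<>(Arrays.asList("doc5", "doc9")));
        index.put("apple", new TreeSet<>(Arrays.asList("doc17", "doc18")));
        index.put("banana", new TreeSet<>(Arrays.asList("doc80")));
        return index;
    }

    public static void main(String[] args) {
        SortedMap<String, Set<String>> index = sampleIndex();
        System.out.println("a*  -> " + matchSet(index, "a", 2000).size() + " docs");
        System.out.println("aa* -> " + matchSet(index, "aa", 2000).size() + " docs");
    }
}
```

A shorter prefix pulls in more terms, and every extra term can only add documents to the match set.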

Page 17: How to Tune Search Requests using Amazon CloudSearch

Wildcard Queries

• Stemming
    • cats* doesn't match cats
    • cats is stemmed to "cat", but wildcards are not stemmed
• Avoid negative wildcard queries
    • -cat works fine
        • but may take a while to execute
    • -cat123 may confuse you. It becomes:
        • -cat 123

Page 18: How to Tune Search Requests using Amazon CloudSearch

Boolean Queries

• Used to restrict your searches
    • Faceting (e.g. category)
    • Date
    • Security
• &bq=(and title:'potter' author:'rowling')
• Can improve performance
• Can slow performance
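Boolean expressions in the prefix form shown above are easy to get wrong when concatenated by hand, so a small helper can be useful. The following is a sketch; the class and method names are hypothetical, and the backslash escaping of quotes inside values is an assumption that should be checked against the CloudSearch documentation.

```java
class BooleanQuery {
    // Formats one field:'value' clause. Escaping quotes and backslashes with a
    // backslash is an assumption made for this sketch.
    static String term(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("'", "\\'");
        return field + ":'" + escaped + "'";
    }

    // Combines clauses so that all of them must match.
    static String and(String... clauses) {
        return combine("and", clauses);
    }

    // Combines clauses so that any one of them may match.
    static String or(String... clauses) {
        return combine("or", clauses);
    }

    private static String combine(String op, String... clauses) {
        StringBuilder sb = new StringBuilder("(").append(op);
        for (String c : clauses) {
            sb.append(' ').append(c);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(and(term("title", "potter"), term("author", "rowling")));
    }
}
```

Clauses nest, so `and(term(...), or(term(...), term(...)))` produces a parenthesized sub-expression. The resulting string still needs URL encoding before it is sent as the `bq` parameter.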

Page 19: How to Tune Search Requests using Amazon CloudSearch

Boolean Queries

• Effects of additional AND
• Effects of additional OR

[Figure: three Venn diagrams of the Color, Style, and Size match sets.]

Page 20: How to Tune Search Requests using Amazon CloudSearch

Rank Expressions Review

• Used to enhance search results
• Include non-text factors in scoring
    • Popularity (likes/upvotes)
    • Distance
    • Price/Profit

&rank-pop=((0.3*popularity)/10.0)+(0.7*text_relevance)&rank=pop

Page 21: How to Tune Search Requests using Amazon CloudSearch

Rank Expressions

• Execute once for each document in match set
    • Reduce your match set with text matches, literals, uints
    • Done after filters applied

[Figure labels: Faster! / Slower]

Page 22: How to Tune Search Requests using Amazon CloudSearch

Rank Expressions

• Avoid queries that match all (or many) documents
    • -foo
    • state:CA
• Precompute static parts of rank expression
    • sqrt(rating)+200*log10(doc.votes) + 20*log10(doc.sales) => "precomputed"
• Add implicit limits to broad searches
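Precomputing could look like the sketch below: the static part of the expression is evaluated once when the document is prepared for upload and stored in a field (called `precomputed` on the slide). The class and method names are hypothetical, and rounding the value to fit an unsigned integer field is an assumption of this sketch.

```java
class PrecomputedRank {
    // Evaluates the static part of the slide's rank expression once, at indexing time.
    static double staticScore(double rating, long votes, long sales) {
        return Math.sqrt(rating) + 200 * Math.log10(votes) + 20 * Math.log10(sales);
    }

    // Rounds the score and clamps it at zero so it fits an unsigned integer field.
    // Rounding loses the fractional part; scale the score first if that matters.
    static long toUintField(double score) {
        return Math.max(0L, Math.round(score));
    }

    public static void main(String[] args) {
        long precomputed = toUintField(staticScore(4.0, 100, 10));
        System.out.println("precomputed=" + precomputed);
    }
}
```

At query time the rank expression then only needs to read the stored value instead of evaluating a square root and two logarithms for every document in the match set.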

Page 23: How to Tune Search Requests using Amazon CloudSearch

Default Search Field

• Searches all text fields by default
    • Usually the right thing
    • If not, remove unneeded fields from it
• Changeable via the API
• Alternative: explicitly select one field to search
    • title:'harry potter'

Page 24: How to Tune Search Requests using Amazon CloudSearch

Target Slow Queries

• Slow searches affect other searches
• Optimizing the slowest search requests may speed up all of your searches.
• Slow queries can produce timeouts (507s)

Page 25: How to Tune Search Requests using Amazon CloudSearch

…benchmarking

Page 26: How to Tune Search Requests using Amazon CloudSearch

Testing Latency

• Hard to estimate
    • Depends on queries, usage patterns, data…
• Build a domain & test
    • It's the cloud, spin up & down as needed!
    • Use your own data
    • As close to real usage as you can
        • Log replay is good!
• A/B Test your changes

Page 27: How to Tune Search Requests using Amazon CloudSearch

Two Approaches

• Testing
    • Just run some queries
• Benchmarking
    • Run enough to have some statistical validity
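Once enough samples are collected, a mean and a few percentiles say more than any single timing. A minimal sketch follows; the class name is hypothetical, and the nearest-rank percentile used here is one of several common definitions.

```java
import java.util.Arrays;

class LatencyStats {
    // Arithmetic mean of the latency samples, in milliseconds.
    static double mean(long[] ms) {
        if (ms.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0;
        for (long v : ms) {
            sum += v;
        }
        return sum / ms.length;
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] ms, double p) {
        if (ms.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = ms.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] samples = {82, 59, 61, 59, 62}; // example query times in ms
        System.out.println("mean=" + mean(samples)
            + " p50=" + percentile(samples, 50)
            + " p95=" + percentile(samples, 95));
    }
}
```

High percentiles are worth tracking alongside the mean, because a handful of slow queries can hide behind a good average.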

Page 28: How to Tune Search Requests using Amazon CloudSearch

Testing Approaches

• Statistics provided
• Browser tools
    • Chrome
    • Firebug for Firefox

Page 29: How to Tune Search Requests using Amazon CloudSearch

Benchmarking

• Apache JMeter (or similar)
    • Well documented
    • Well tested
    • General
        • This is good, and bad
• Custom Code
    • Usually looks a lot like your application
    • So, not as much code as you might think
    • Flexible

Page 30: How to Tune Search Requests using Amazon CloudSearch

Custom Code

• Multithread it for realistic results
    • If it's simulating more than one client
• Make sure the client (tester) isn't the bottleneck
    • Benchmark it with searches stubbed
    • Avoid languages with global interpreter locks
• Personally, I use:
    • Java
    • Apache HttpClient

Page 31: How to Tune Search Requests using Amazon CloudSearch

Custom Code Example

    // Fragment: queue, threadCount, and the Runnable Consumer class are defined elsewhere.
    LinkedList<Thread> threads = new LinkedList<Thread>();
    int nQ = queue.size();
    long time = System.nanoTime();
    List<Consumer> consumers = new LinkedList<Consumer>();
    for (int i = 0; i < threadCount; i++) {
      // One consumer per thread, all pulling from the shared query queue.
      Consumer c1 = new Consumer(queue);
      consumers.add(c1);
      Thread t = new Thread(c1);
      threads.add(t);
      t.start();
    }
    // Wait for signal that we have processed all queries.
    for (Thread thread : threads) {
      thread.join();
    }
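The fragment above leaves out the consumer and the queue. A self-contained sketch of the same pattern is given below; it is not the presenter's code. The `SearchClient` interface, the `Result` class, and all other names are hypothetical, and passing in a stubbed client is one way to check that the tester itself is not the bottleneck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

class BenchmarkHarness {
    // Hypothetical abstraction over the search call; a real client would issue
    // an HTTP request and return the number of hits reported by the service.
    interface SearchClient {
        int search(String query) throws Exception;
    }

    static final class Result {
        final long completed;
        final long totalHits;
        final long elapsedNanos;

        Result(long completed, long totalHits, long elapsedNanos) {
            this.completed = completed;
            this.totalHits = totalHits;
            this.elapsedNanos = elapsedNanos;
        }
    }

    // Pulls queries from the shared queue until it is empty.
    static final class Consumer implements Runnable {
        private final Queue<String> queue;
        private final SearchClient client;
        private final AtomicLong completed;
        private final AtomicLong hits;

        Consumer(Queue<String> queue, SearchClient client, AtomicLong completed, AtomicLong hits) {
            this.queue = queue;
            this.client = client;
            this.completed = completed;
            this.hits = hits;
        }

        @Override
        public void run() {
            String q;
            while ((q = queue.poll()) != null) {
                try {
                    hits.addAndGet(client.search(q));
                    completed.incrementAndGet();
                } catch (Exception e) {
                    // In this sketch a failed query is simply not counted as completed.
                }
            }
        }
    }

    // Runs all queries on threadCount worker threads and reports totals.
    static Result run(List<String> queries, int threadCount, SearchClient client) {
        Queue<String> queue = new ConcurrentLinkedQueue<>(queries);
        AtomicLong completed = new AtomicLong();
        AtomicLong hits = new AtomicLong();
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(new Consumer(queue, client, completed, hits));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new Result(completed.get(), hits.get(), System.nanoTime() - start);
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            queries.add("query" + i);
        }
        // Stubbed search: measures the overhead of the harness alone.
        Result r = run(queries, 4, q -> 0);
        System.out.println("completed=" + r.completed + " elapsedMs=" + r.elapsedNanos / 1_000_000);
    }
}
```

Running it first with the stub and then with a real client gives a rough idea of how much of the measured time belongs to the harness.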

 

Page 32: How to Tune Search Requests using Amazon CloudSearch

Sample Data & Queries

• Sample Data – must be realistic
    • BAD: a123 b456 xyzzyx
    • Better: use Wikipedia, Project Gutenberg
    • Best: Your own data
• Queries
    • BAD: random words
    • Better: read words from test data
    • Best: Replay log files
    • Always check your number of responses in benchmarks.
        • It's easy to get fast queries, if you get no hits.
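Checking the number of responses can be automated by reading the `found` attribute of the `<hits>` element shown earlier in the deck. The sketch below uses a regular expression for brevity, which is an assumption that the response is formatted like the slide's example; a proper XML parser is more robust. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HitCount {
    // Matches the opening <hits> tag and captures the value of its found attribute.
    private static final Pattern FOUND = Pattern.compile("<hits\\s+found=\"(\\d+)\"");

    // Returns the value of the found attribute, or -1 if no <hits> element is present.
    static long found(String responseXml) {
        Matcher m = FOUND.matcher(responseXml);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String response = "<results><hits found=\"2432\" start=\"0\"></hits></results>";
        System.out.println("found=" + found(response));
    }
}
```

A benchmark run can then flag any query whose count is 0 or -1, so that fast but empty responses do not flatter the results.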

Page 33: How to Tune Search Requests using Amazon CloudSearch

…analytics

Page 34: How to Tune Search Requests using Amazon CloudSearch

Analytics & Metrics

• You have to know what to tune
• CloudSearch Metrics
• Custom Logging

Page 35: How to Tune Search Requests using Amazon CloudSearch

CloudSearch Metrics

• Top Queries
• Zero result queries
    • Lack of data?
    • Or query issues?

Page 36: How to Tune Search Requests using Amazon CloudSearch

Custom Logging

• Log all requests on your end
• Watch
    • Longest running queries
    • Failed Queries
    • HTTP error codes
• Track Changes over time

Page 37: How to Tune Search Requests using Amazon CloudSearch

Warning Signs

• Know your HTTP error codes
    • 500 series can be retried
    • May indicate server is overloaded
    • Long queries can tie up threads until timeout
        • More of an issue on small servers.
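Retrying a 500-series response might be handled as in the sketch below. The `Request` interface and the doubling delay are assumptions made for illustration, not anything the slides prescribe; backing off between attempts is meant to avoid adding load to a server that may already be overloaded.

```java
class RetryOn5xx {
    // Hypothetical abstraction: performs one request and returns the HTTP status code.
    interface Request {
        int send();
    }

    // Retries while the status is in the 500 series, doubling the delay after
    // each failed attempt. Returns the last status seen.
    static int sendWithRetry(Request request, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        int status = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = request.send();
            if (status < 500 || status > 599) {
                return status; // success or a non-retryable error
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return status; // stop retrying if the thread is interrupted
                }
                delay *= 2;
            }
        }
        return status;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stub that fails twice with 507, then succeeds.
        int status = sendWithRetry(() -> ++calls[0] < 3 ? 507 : 200, 5, 10);
        System.out.println("status=" + status + " attempts=" + calls[0]);
    }
}
```

Logging each retried status alongside the query helps with the tracking described under Custom Logging.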

Page 38: How to Tune Search Requests using Amazon CloudSearch

…wrap up

Page 39: How to Tune Search Requests using Amazon CloudSearch

Wrap Up

• 1) Limit match set size
• 2) Limit match set size :)
• Be aware of the cost of features
    • Test/Benchmark
• Resources
    • Slides will be on meetup group
    • http://jmeter.apache.org/
    • https://getfirebug.com/
    • http://en.wikipedia.org/wiki/Category:Load_testing_tools

Page 40: How to Tune Search Requests using Amazon CloudSearch

Thanks!

Tom Hill [email protected]