snapguide - cloudsearch
Sam Kimbrel, Software Engineer
Share what you know
Monday, April 1, 13
[email protected] • confidential do not distribute
What is Snapguide?
• 1.5 million uniques/month
• ~2000 reqs/min across app and web
• Python (Pyramid/uWSGI/nginx)
• MySQL/Redis
• Built primarily on AWS: EC2, RDS, S3, SQS, SNS, CloudSearch, CloudFront
Snapguide on CloudSearch
• Beta trial users after mentioning Solr on the phone (seriously!)
• Primary data set: guides
• Facets: guide topic, “featured” boolean, visibility/ACL flags
• “autocomplete” search (more later)
{
  "lang": "en",
  "fields": {
    "step_count": "14",
    "author_external_id": "qS878yliQ4mxg_9uHt2AZg",
    "author": "Claire Hesseltine",
    "items": [ "Preheat oven to 325 degrees Fahrenheit.", ... ],
    "title": "Make Brown Butter Sea Salt Cookies",
    "featured": 1,
    "summary": "The brown butter adds a nutty, caramel-like taste to these delicious cookies.",
    "topic": [ "desserts" ],
    "main_image_uuid": "43d201c8fd4b4833b83d3f95d112f1c1",
    "like_count": 761,
    "public": "true"
  },
  "version": 1364333310,
  "type": "add",
  "id": "9eabff97e32c4244a8205da3fba442e9"
}
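A document in this shape could be assembled roughly as follows. This is a minimal sketch, not Snapguide's actual code: `build_add_document`, `to_batch`, and the layout of the `guide` dict are hypothetical, only a subset of the fields is shown, and using a Unix timestamp as `version` is an assumption suggested by the value in the example above.

```python
import json
import time


def build_add_document(guide, version=None):
    """Build one "add" operation from already-extracted guide data.

    `guide` is a hypothetical dict; its keys are assumptions for this sketch.
    """
    if version is None:
        # Assumption: a Unix timestamp, so a later update carries a higher version.
        version = int(time.time())
    return {
        "type": "add",
        "id": guide["uuid"],
        "version": version,
        "lang": "en",
        "fields": {
            "title": guide["title"],
            "summary": guide["summary"],
            "author": guide["author"],
            "topic": list(guide["topics"]),
            "items": list(guide["items"]),
            "like_count": guide["like_count"],
            # Mirror the example document: 1/0 for featured, "true"/"false" for public.
            "featured": 1 if guide["featured"] else 0,
            "public": "true" if guide["public"] else "false",
        },
    }


def to_batch(documents):
    """Serialize a list of operations as one JSON array for upload."""
    return json.dumps(documents)
```

Keeping `featured` numeric lets it be used directly in arithmetic when ranking results.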
Queries
• Guide text search:
q=cookies
• Guide search with topic:
q=cookies&facet=topic&bq=topic:'desserts'
• “Typeahead”/suggestion search:
bq=(or 'paper flower' 'paper flower*')
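The three query shapes above can be generated with a few small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the search endpoint host is left out entirely, and input containing single quotes is not handled.

```python
from urllib.parse import urlencode


def text_query(terms):
    """Plain guide text search, e.g. q=cookies."""
    return urlencode({"q": terms})


def topic_query(terms, topic):
    """Text search restricted to one topic, with topic facet counts."""
    # Caveat: a topic containing a single quote would break the bq string.
    return urlencode({"q": terms, "facet": "topic", "bq": "topic:'%s'" % topic})


def typeahead_query(prefix):
    """Match the typed text exactly or as a prefix of a longer term."""
    return urlencode({"bq": "(or '%s' '%s*')" % (prefix, prefix)})
```

`urlencode` percent-encodes the quotes, parentheses, and asterisk, so the strings it returns are the URL-safe form of the queries shown on the slide.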
Result Ranking
• Use “Compare Rank Expressions”
• text_relevance is your friend
• Goals:
  • Boost popular/featured guides
  • Make title/summary matches worth more than item (supplies, step text) matches
min(cs.text_relevance({"weights": {"title": 2.5, "author": 1.5, "items": 0.1, "summary": 1.5}, "default_weight": 1}), 1000)
  + min(200, like_count / 10)
  + 100 * featured
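To see how the three terms of the expression above combine, here is the same arithmetic as a plain function. It is only an illustration: `rank_score` is a hypothetical name, the text relevance value is passed in rather than computed, and ordinary (non-integer) division is assumed.

```python
def rank_score(text_relevance, like_count, featured):
    """Combine weighted text relevance with popularity and featured boosts."""
    # Text relevance is capped at 1000.
    relevance = min(text_relevance, 1000)
    # Every 10 likes add one point, up to 200 points (reached at 2000 likes).
    popularity = min(200, like_count / 10)
    # Featured guides (featured == 1) get a flat 100-point boost.
    featured_boost = 100 * featured
    return relevance + popularity + featured_boost
```

For the cookie guide shown earlier (761 likes, featured), a made-up text relevance of 300 gives roughly 300 + 76 + 100 = 476. The caps keep a very popular guide from outranking a much better text match on likes alone.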
Offline index updates
• Extracting guide data to update document is slow
• Remove update from online web request process
• Internal-only API endpoints
• SQS
• queue_consumer daemon
Offline index updates
[Diagram: Web server, SQS, Queue consumer, Web server (dedicated to queues), CloudSearch, Snapguide DB/Redis]
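One plausible reading of the offline update pieces above (SQS, a queue consumer, a queue-dedicated web server, CloudSearch) is a loop like the one below. It is a sketch, not the actual queue_consumer daemon: an in-memory `queue.Queue` stands in for SQS, `fetch_document` stands in for a call to an internal-only API endpoint, `upload_batch` stands in for the CloudSearch document upload, and the `{"guide_id": ...}` message format is an assumption.

```python
import json
import queue


def consume_once(messages, fetch_document, upload_batch, batch_size=10):
    """Drain up to batch_size messages and push the matching documents.

    Returns the number of documents uploaded.
    """
    guide_ids = []
    while len(guide_ids) < batch_size:
        try:
            body = messages.get_nowait()
        except queue.Empty:
            break
        # Assumed message format: {"guide_id": "<id>"}
        guide_ids.append(json.loads(body)["guide_id"])
    if not guide_ids:
        return 0
    # A guide edited several times in a row only needs one index update.
    unique_ids = list(dict.fromkeys(guide_ids))
    # The slow extraction happens here, outside any user-facing request.
    documents = [fetch_document(guide_id) for guide_id in unique_ids]
    upload_batch(documents)
    return len(documents)
```

A real daemon would call something like this in a loop and delete each message from the queue only after the upload succeeds.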
Performance
but physical proximity (us-west-1) is awesome
Future work
• Add more domains (users, new features)
• Search-based suggestion engine
• Improved ranking/scoring — crawl our social graph