London Amazon CloudSearch Meetup, Jon Handler

53
© 2012 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc. London Amazon CloudSearch Meetup Jon Handler

Upload: hanzila

Post on 24-Feb-2016


DESCRIPTION

London Amazon CloudSearch Meetup, Jon Handler. Agenda: CloudSearch technical overview (Jon Handler, Amazon CloudSearch Solution Architect); NakedWines and CloudSearch (Matt Reid, Developer at NakedWines); Searching Wikipedia with Amazon CloudSearch (Iain Fletcher, Search Technologies)

TRANSCRIPT

Page 1: London Amazon CloudSearch  Meetup Jon Handler

© 2012 Amazon.com, Inc. and its affiliates.  All rights reserved.  May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.

London Amazon CloudSearch Meetup

Jon Handler

Page 2: London Amazon CloudSearch  Meetup Jon Handler

Agenda

CloudSearch technical overview (Jon Handler, Amazon CloudSearch Solution Architect)

NakedWines and CloudSearch (Matt Reid, Developer at NakedWines)

Searching Wikipedia with Amazon CloudSearch (Iain Fletcher, Search Technologies)

Building UI with CloudSearch (Stefan Olafsson, Co-Founder, Twigkit)

Page 3: London Amazon CloudSearch  Meetup Jon Handler

Page 4: London Amazon CloudSearch  Meetup Jon Handler

What is Search

Shoes

Page 5: London Amazon CloudSearch  Meetup Jon Handler

Do You Want Search With That?

Build your own – database, home-rolled, site search

Open source

Legacy enterprise search

Page 6: London Amazon CloudSearch  Meetup Jon Handler

Search Challenges

Complex, expertise required

Costly, often with up-front expenditure

Long time to market, innovation and experimentation are slowed

Operational overhead is undifferentiated work

Page 7: London Amazon CloudSearch  Meetup Jon Handler

Amazon CloudSearch

Pay for infrastructure you need when you need it

Low cost

No need to guess capacity

Experiment fast with low risk

We do the undifferentiated heavy lifting

Go global in minutes

Page 8: London Amazon CloudSearch  Meetup Jon Handler

Amazon CloudSearch Architecture

DNS / Load Balancing

SEARCH SERVICE
• Search API
• Console

DOCUMENT SERVICE
• DocSvc API
• Command Line Tools
• Console

CONFIG SERVICE
• AWS Query
• Config API
• Command Line Tools
• Console

Search Domain

Page 9: London Amazon CloudSearch  Meetup Jon Handler

Automatic Scaling

[Diagram: a grid of SEARCH INSTANCE boxes, each holding one copy of one index partition, running from Index Partition 1, Copy 1 through Index Partition n, Copy n. The grid is labeled DATA (Document Quantity and Size) and TRAFFIC (Search Request Volume and Complexity).]
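The grid on this slide can be read as partitions multiplied by copies. The short Python sketch below only enumerates that grid to make the instance count concrete; the function name and the example numbers are hypothetical, and how the service actually chooses the two dimensions is not described on the slide.

```python
from itertools import product


def search_instances(partitions, copies):
    """List every (partition, copy) pair in the grid shown on the slide."""
    return list(product(range(1, partitions + 1), range(1, copies + 1)))


# Hypothetical example: 3 index partitions, each kept in 2 copies.
grid = search_instances(partitions=3, copies=2)
print(len(grid), "search instances:", grid)
```

Under this reading, growing the data adds rows to the grid and growing the traffic adds columns, so the instance count is the product of the two.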

Page 10: London Amazon CloudSearch  Meetup Jon Handler

[Diagram: a grid of SEARCH INSTANCE boxes (Index Partition 1 to n, each with Copy 1 to n), labeled Compute, Storage, Load Balancing, Security.]

Page 11: London Amazon CloudSearch  Meetup Jon Handler

Page 12: London Amazon CloudSearch  Meetup Jon Handler

Text Search

Page 13: London Amazon CloudSearch  Meetup Jon Handler

Highly Relevant Results

Page 14: London Amazon CloudSearch  Meetup Jon Handler

Faceted Drilldown

Page 15: London Amazon CloudSearch  Meetup Jon Handler

Integer Range Searching

Page 16: London Amazon CloudSearch  Meetup Jon Handler

Complex Queries

Page 17: London Amazon CloudSearch  Meetup Jon Handler

Search Query Processing

[Diagram labels: Query, Matching, Filtering, Sorting, Ranking. Example document IDs 564, 726, 123 are shown under Query and again under Ranking.]

Page 18: London Amazon CloudSearch  Meetup Jon Handler

Reference Architecture

Page 19: London Amazon CloudSearch  Meetup Jon Handler

Create An Amazon CloudSearch Domain

Page 20: London Amazon CloudSearch  Meetup Jon Handler

Text fields for matching user terms

Result enabled to retrieve source data

Page 21: London Amazon CloudSearch  Meetup Jon Handler

Literal fields for Faceting

Facet enabled to retrieve facets

Search enabled for narrowing

Page 22: London Amazon CloudSearch  Meetup Jon Handler

Integer fields for ranking, narrowing
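The field slides (text, literal, integer) can be summarized as a small schema. The Python sketch below is only a way to write that summary down: the `Field` class, its flag names, and the choice of fields (borrowed from the Kindle example later in the deck) are hypothetical and are not the CloudSearch configuration API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    kind: str              # "text", "literal", or "integer", as named on the slides
    result: bool = False   # result enabled: value can be returned with hits
    facet: bool = False    # facet enabled: facet counts can be retrieved
    search: bool = False   # search enabled: literal field usable for narrowing


# Hypothetical schema for a product catalogue.
SCHEMA = [
    Field("title", "text", result=True),
    Field("description", "text", result=True),
    Field("category", "literal", facet=True, search=True),
    Field("price", "integer"),
]


def facet_fields(schema):
    """Names of the fields for which facets could be requested."""
    return [f.name for f in schema if f.facet]


print(facet_fields(SCHEMA))
```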

Page 23: London Amazon CloudSearch  Meetup Jon Handler

Configure the Domain

Page 24: London Amazon CloudSearch  Meetup Jon Handler

Data Preparation and Upload

[Diagram: Search Documents, Extract, SDF Batch, POST, Amazon CloudSearch]

Page 25: London Amazon CloudSearch  Meetup Jon Handler

CloudSearch SDF

[
  {
    "type": "add",
    "id": "b007oznzg0",
    "version": 1,
    "lang": "en",
    "fields": {
      "title": "Kindle Paperwhite",
      "description": "World's most advanced e-reader",
      "category": ["Electronics", "eBook Readers"],
      "price": 11900
    }
  },
  ...
]
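As an illustration of the SDF shape on this slide, the following Python sketch builds the same "add" operation and serializes a one-element batch. The helper name `make_add` is hypothetical; only the keys and values come from the slide.

```python
import json


def make_add(doc_id, version, fields, lang="en"):
    """Build one SDF 'add' operation with the keys shown on the slide."""
    return {
        "type": "add",
        "id": doc_id,
        "version": version,
        "lang": lang,
        "fields": fields,
    }


batch = [
    make_add("b007oznzg0", 1, {
        "title": "Kindle Paperwhite",
        "description": "World's most advanced e-reader",
        "category": ["Electronics", "eBook Readers"],  # multi-valued field
        "price": 11900,
    }),
]

# An SDF batch is a JSON array of operations.
sdf_json = json.dumps(batch, indent=2)
print(sdf_json)
```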

Page 26: London Amazon CloudSearch  Meetup Jon Handler

Document Service API

http(s)://<document service endpoint>/2011-02-01/documents/batch

Accept: application/json
Content-Length: 1176
Content-Type: application/json
Host: doc.imdb-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com

[
  {
    "type": "add",
    "id": "b007oznzg0",
    "version": 1,
    "lang": "en",
    "fields": {
      "title": "Kindle Paperwhite",
      "description": "World's most advanced e-reader",
      "category": ["Electronics", "eBook Readers"],
      "price": 11900
    }
  },
  { "type": "delete", "id": "tt0434409", "version": 1337648735 }
]
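A minimal Python sketch of the request on this slide, using only the standard library. It constructs the POST but does not send it; the host is the one printed on the slide, and the function name `build_batch_request` is hypothetical.

```python
import json
import urllib.request

API_VERSION = "2011-02-01"  # version segment shown in the slide's URL


def build_batch_request(doc_endpoint, batch):
    """Prepare (but do not send) a POST of an SDF batch to documents/batch."""
    body = json.dumps(batch).encode("utf-8")
    url = f"https://{doc_endpoint}/{API_VERSION}/documents/batch"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


batch = [
    {"type": "add", "id": "b007oznzg0", "version": 1, "lang": "en",
     "fields": {"title": "Kindle Paperwhite", "price": 11900}},
    {"type": "delete", "id": "tt0434409", "version": 1337648735},
]

req = build_batch_request(
    "doc.imdb-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com",
    batch,
)
print(req.get_method(), req.full_url, len(req.data), "bytes")
```

Passing `req` to `urllib.request.urlopen` would send it; the Content-Length header is filled in at that point from the body.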

Page 27: London Amazon CloudSearch  Meetup Jon Handler

Search Service API

http(s)://<search service endpoint>/2011-02-01/search?

Simple searches
• q= text

Boolean combination of fields
• bq= (or field:'value1' (and field:'value2' field:'value3'))

Faceting
• facet= comma separated list of facet fields

Pagination
• start=, size=

Customized ranking
• rank= sort results based on the rank expression provided
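The parameters listed on this slide can be assembled into a search URL as in the Python sketch below. The endpoint host and the field name `category` are placeholders chosen for illustration, and `build_search_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlencode


def build_search_url(search_endpoint, q=None, bq=None, facets=(),
                     start=0, size=10, rank=None):
    """Assemble a search URL from the parameters listed on the slide."""
    params = {}
    if q is not None:
        params["q"] = q                      # simple text search
    if bq is not None:
        params["bq"] = bq                    # boolean combination of fields
    if facets:
        params["facet"] = ",".join(facets)   # comma separated facet fields
    params["start"] = start                  # pagination offset
    params["size"] = size                    # page size
    if rank is not None:
        params["rank"] = rank                # rank expression or field to sort by
    return f"https://{search_endpoint}/2011-02-01/search?{urlencode(params)}"


# Placeholder endpoint; a real one is issued per search domain.
url = build_search_url(
    "search-example.us-east-1.cloudsearch.amazonaws.com",
    q="kindle",
    facets=["category"],
    size=20,
)
print(url)
```

`urlencode` takes care of escaping, which matters for `bq` values because they contain quotes, colons, and parentheses.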

Page 28: London Amazon CloudSearch  Meetup Jon Handler

Search Results

{
  "rank": "-text_relevance",
  "match-expr": "(label 'kindle paperwhite')",
  "hits": {
    "found": 204,
    "start": 0,
    "hit": [
      { "id": "sontsst12cf5f88b42" },
      { "id": "sopvopr12ab017f082" },
      { "id": "sorzrpw12ac468a13b" }
    ]
  },
  ...
}
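A short Python sketch of reading a response shaped like the one on this slide. The sample text reproduces the slide's values with the elided parts left out; `hit_ids` is a hypothetical helper.

```python
import json

# Response body with the fields shown on the slide (elided parts omitted).
response_text = """
{
  "rank": "-text_relevance",
  "match-expr": "(label 'kindle paperwhite')",
  "hits": {
    "found": 204,
    "start": 0,
    "hit": [
      {"id": "sontsst12cf5f88b42"},
      {"id": "sopvopr12ab017f082"},
      {"id": "sorzrpw12ac468a13b"}
    ]
  }
}
"""


def hit_ids(text):
    """Return (total matches, document ids on this page of results)."""
    hits = json.loads(text)["hits"]
    return hits["found"], [hit["id"] for hit in hits["hit"]]


found, ids = hit_ids(response_text)
print(f"{found} matches, showing {len(ids)} starting at offset 0: {ids}")
```

Note that `found` is the total number of matches, while `hit` holds only the current page, so the two counts usually differ.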

Page 29: London Amazon CloudSearch  Meetup Jon Handler

Customizing Ranking

Rank expressions
• Compute a score for each document
• &rank=<function>

E.g. recency based

Page 30: London Amazon CloudSearch  Meetup Jon Handler

Customizing Ranking With Queries

Define rank expressions in your query
• &rank-recency=text_relevance + (1 / (2012 - year)) * 100
• &rank=-recency

Uses
• A/B testing
• User-customized searches
• Geo-searching
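To see what the recency expression on this slide does to result order, the Python sketch below evaluates the same formula locally on three made-up documents. The documents and their relevance scores are hypothetical, and how the service itself evaluates or rounds the expression is not covered by the slide.

```python
def recency_score(text_relevance, year, current_year=2012):
    """text_relevance + (1 / (current_year - year)) * 100, as on the slide.

    Raises ZeroDivisionError when year == current_year, so the formula as
    written only makes sense for documents older than the current year.
    """
    return text_relevance + (1 / (current_year - year)) * 100


# Hypothetical documents with made-up relevance scores.
docs = [
    {"id": "a", "text_relevance": 200, "year": 2011},
    {"id": "b", "text_relevance": 250, "year": 1992},
    {"id": "c", "text_relevance": 200, "year": 2002},
]

# Highest score first, which is the ordering &rank=-recency asks for
# assuming the leading minus means descending.
ranked = sorted(
    docs,
    key=lambda d: recency_score(d["text_relevance"], d["year"]),
    reverse=True,
)
for d in ranked:
    print(d["id"], round(recency_score(d["text_relevance"], d["year"]), 2))
```

The bonus shrinks quickly with age: a one-year-old document gains 100 points, a ten-year-old document gains 10, so a recent document can overtake an older one with a higher text relevance.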

Page 31: London Amazon CloudSearch  Meetup Jon Handler

IMDB DATA DEMO

Page 32: London Amazon CloudSearch  Meetup Jon Handler

Pricing

Get started for just $2.40/day; $75/month

AWS Calculator http://calculator.s3.amazonaws.com/calc5.html

Free Trial

Page 33: London Amazon CloudSearch  Meetup Jon Handler

Wrap Up

Powerful search is a critical component of today's applications

Amazon CloudSearch makes adding search easy

Create a domain, POST documents, GET search results

Page 34: London Amazon CloudSearch  Meetup Jon Handler

Resources and Q&A

Amazon CloudSearch Overview Page
http://aws.amazon.com/cloudsearch/
• FAQs
• Community Forum
• Documentation & Getting Started Tutorial (IMDb)

Contact our EU business development team
• http://aws.amazon.com/contact-us

Page 35: London Amazon CloudSearch  Meetup Jon Handler

Thank You

Jon Handler / [email protected]

Page 36: London Amazon CloudSearch  Meetup Jon Handler

Searching Wikipedia with Amazon CloudSearch

Iain Fletcher / [email protected]

Page 37: London Amazon CloudSearch  Meetup Jon Handler

Search Engine Expertise

• Microsoft SharePoint/FAST
• Google Search Appliance
• Solr
• Amazon CloudSearch
• LucidWorks
• Attivio
• Exalead
• Autonomy
• MarkLogic
• elasticsearch
• Vivisimo
• Sinequa
• Hadoop
• Sphinx
• …

Page 38: London Amazon CloudSearch  Meetup Jon Handler

400+ Customers

Page 39: London Amazon CloudSearch  Meetup Jon Handler

Searching Wikipedia with Amazon CloudSearch

Iain Fletcher / [email protected]

Page 40: London Amazon CloudSearch  Meetup Jon Handler

Agenda

Project Background

High-level Architecture

Summary & Observations

Page 41: London Amazon CloudSearch  Meetup Jon Handler

Project Background

Amazon contracted with Search Technologies to help with beta-testing, prior to the launch of Amazon CloudSearch

Decision to use Wikipedia as a convenient data set for testing purposes

Page 42: London Amazon CloudSearch  Meetup Jon Handler

High-level Architecture

Page 43: London Amazon CloudSearch  Meetup Jon Handler

Indexing

Wikipedia provides content in a series of large XML files

Amazon CloudSearch ingests XML in a specified form

Various content processing tasks to perform
• Splitting into individual documents
• Date normalization
• Metadata extraction & mapping
• Cleanup, etc.

We used Aspire for these tasks
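The project used Aspire for these steps. Purely as an illustration of what splitting, date normalization, and metadata mapping involve, here is a Python stand-in using the standard library. The sample XML is a simplified, namespace-free imitation of a Wikipedia dump, and the output field names (`id`, `title`, `updated`, `body`) are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Simplified stand-in for a dump file; real dumps are far larger and namespaced.
SAMPLE_DUMP = b"""<mediawiki>
  <page>
    <title>London</title><id>1</id>
    <revision><timestamp>2012-10-01T12:30:00Z</timestamp>
      <text>  London is the capital of England.  </text></revision>
  </page>
  <page>
    <title>Wine</title><id>2</id>
    <revision><timestamp>2012-09-15T08:00:00Z</timestamp>
      <text>Wine is made from grapes.</text></revision>
  </page>
</mediawiki>"""


def normalize_date(timestamp):
    """Date normalization: ISO-8601 'Z' timestamp to integer epoch seconds."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def iter_documents(stream):
    """Splitting: stream the file and yield one dict per <page> element."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "page":
            continue
        yield {
            # Metadata extraction and mapping onto flat field names.
            "id": "wiki" + elem.findtext("id"),
            "title": elem.findtext("title"),
            "updated": normalize_date(elem.findtext("revision/timestamp")),
            # Cleanup: here just trimming whitespace.
            "body": (elem.findtext("revision/text") or "").strip(),
        }
        elem.clear()  # release the page's children so memory stays flat


docs = list(iter_documents(io.BytesIO(SAMPLE_DUMP)))
for doc in docs:
    print(doc)
```

Streaming with `iterparse` rather than loading the whole tree is the design choice that matters for files of this size.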

Page 44: London Amazon CloudSearch  Meetup Jon Handler

Aspire in Brief

Based on Apache Felix / OSGi
• Thread-safe, multi-threaded, distributable
• Any number of pipelines, conditional branching
• Plug-in components individually testable & upgradable
• In use with FAST ESP, FS4SP, Solr, Amazon CloudSearch, GSA
• Tested with Elasticsearch and SP 2013

Page 45: London Amazon CloudSearch  Meetup Jon Handler

XML Input

Page 46: London Amazon CloudSearch  Meetup Jon Handler

Indexing

Streaming Wikipedia dump files directly into CloudSearch

500 docs/second achieved without much effort
• Using 4 x XL instances of CloudSearch
• 1 x XL EC2 instance for Aspire
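Streaming documents into a search service usually means grouping them into batches before each upload. The Python sketch below shows one way to do that grouping and to estimate load time at the 500 docs/second quoted on the slide. It is not how Aspire does it; `max_bytes` is a hypothetical limit (check the service documentation for the real one), and the document count in the example is made up.

```python
import json


def batches(operations, max_bytes=1_000_000):
    """Group operations so each serialized batch stays under max_bytes.

    Sizes assume compact JSON; the estimate is slightly conservative. A
    single operation larger than max_bytes is still emitted on its own.
    """
    current, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for op in operations:
        encoded = len(json.dumps(op, separators=(",", ":")).encode("utf-8")) + 1
        if current and size + encoded > max_bytes:
            yield current
            current, size = [], 2
        current.append(op)
        size += encoded
    if current:
        yield current


def estimated_seconds(doc_count, docs_per_second=500):
    """Time to index doc_count documents at the throughput from the slide."""
    return doc_count / docs_per_second


# Hypothetical example: ten tiny documents, artificially small batch limit.
ops = [{"type": "add", "id": str(i), "version": 1, "lang": "en",
        "fields": {"title": "t"}} for i in range(10)]
groups = list(batches(ops, max_bytes=200))
print([len(g) for g in groups])
print(estimated_seconds(1_000_000) / 60, "minutes for a made-up 1,000,000 documents")
```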

Page 47: London Amazon CloudSearch  Meetup Jon Handler

Searching

Amazon CloudSearch provides a RESTful/XML interface for search purposes

For the Wikipedia project, we needed a UI
• Chose to use Twigkit
• Wrote a Java API for CloudSearch
• The Java API is freely downloadable (with source) at http://www.searchtechnologies.com/java-api-amazon-cloudsearch.html

Page 48: London Amazon CloudSearch  Meetup Jon Handler

Searching

Supports navigators and relevancy customization
• E.g. a “PageRank” style link analysis was performed

Limits set high: e.g. retrieve 500,000 results in a single list, delivered in just a few seconds
• Hugely useful for analysis applications

So, what does it look like?

Page 49: London Amazon CloudSearch  Meetup Jon Handler

wikipedia.searchtechnologies.com

Page 50: London Amazon CloudSearch  Meetup Jon Handler

wikipedia.searchtechnologies.com

Page 51: London Amazon CloudSearch  Meetup Jon Handler

Summary & Observations

A capable and scalable “raw” engine
• XML in, RESTful/XML out
• Easy to set up, much the same as an EC2 instance
• Elastic scalability

Page 52: London Amazon CloudSearch  Meetup Jon Handler

Summary & Observations

Cost effective
• From $75 per month, including management / maintenance

Extremely convenient
• Switch on / off at leisure
• Promotes experimentation & agility

Page 53: London Amazon CloudSearch  Meetup Jon Handler

Iain Fletcher / [email protected]

For further details, see Paul Nelson’s blog at www.searchtechnologies.com