Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Upload: michael-bohlig

Post on 15-Jan-2015

DESCRIPTION

Presentation on using Amazon CloudSearch with databases. What to use when? How can you use CloudSearch with a database? Tom Hill, Solutions Architect, Amazon CloudSearch

TRANSCRIPT

Page 1: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

© 2013 Amazon.com, Inc. and its affiliates.  All rights reserved.  May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.

Searching for Success

Amazon CloudSearch and Relational Databases

Page 2: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Agenda

Finding things
• Types of Databases

Making Choices

What is CloudSearch?

Combining CloudSearch with Relational

Sample Code

Page 3: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Finding Things

So Many Databases

Page 4: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Finding Your Information

Your users need to find things
• What do you use?

A Database!
• What Kind?

Page 5: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

It's a Big World Out There!

"Database" != "Relational Database"

Tons of relational databases
• Amazon RDS
• MySQL
• MSSQL
• Oracle

but…

Page 6: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Many Other Types

NoSQL databases
• Dynamo, Cassandra, CouchDB…

Graph databases
• Neo4J, Titan, …

Column oriented databases
• Redshift, Bigtable…

Text Search Engine
• CloudSearch, Lucene, Autonomy...

Page 7: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Text Search Engine

Good at text queries
• "Harry Potter and the Philosopher's Stone"

Harry Potter and the Philosopher's Stone

harry potter and the philosopher's stone

harry potter and the philosopher stone

harry potter philosopher stone
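The variants above can all find the same title because a text search engine normalizes text before matching. The following is a minimal sketch of that idea, not CloudSearch's actual text processing: the class name, the stopword list, and the possessive-stripping rule are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: a toy analyzer, not CloudSearch's real text processing.
final class QueryNormalizer {
    // Hypothetical stopword list chosen for this example.
    private static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("and", "the", "a", "of"));

    /** Lowercases, drops punctuation and stopwords, and strips a trailing possessive. */
    static List<String> normalize(String text) {
        List<String> terms = new ArrayList<>();
        // Split on anything that is not a letter, digit, or apostrophe.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            // "philosopher's" becomes "philosopher"; stray apostrophes are removed.
            String term = token.replaceAll("'s$", "").replace("'", "");
            if (!term.isEmpty() && !STOPWORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(normalize("\"Harry Potter and the Philosopher's Stone\""));
        System.out.println(normalize("harry potter philosopher stone"));
    }
}
```

Both calls in `main` produce the same list of terms, which is why the quoted, capitalized, and shortened forms of the query can match the same document.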

Page 8: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Text Search Engine

Basic element is the document

Documents are made of fields
"title" => "star wars"

Fields can be
• Missing
• Multi-valued
• Variable length

Page 9: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Text Search Engine

Documents are not "normalized"
• In a relational database
  • A movie table
  • A director table
  • An actor table
• In CloudSearch
  • One document per movie

Page 10: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Relational

ID  Title
1   Star Wars
2   Star Trek
3   Dark Star

ID  Actor
1   Zachary Quinto
2   Chris Pine
3   Zoë Saldana

ID  Director
1   J.J. Abrams
2   George Lucas
3   John Carpenter

Text Search Engine

ID  Document
1   title: star trek
    actor: chris pine, zachary quinto, zoe saldana
    director: j j abrams
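To make the contrast concrete, here is a minimal sketch of flattening relational rows into one document per movie. The in-memory maps stand in for the tables, and the movie-to-actor link map is an assumption: the slide does not show how movies are joined to actors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: maps stand in for the relational tables on the slide.
final class Denormalizer {
    /**
     * Builds one search document for a movie.
     * movieActors is a hypothetical join table: movie ID -> actor IDs.
     */
    static Map<String, Object> toDocument(int movieId,
                                          Map<Integer, String> titles,
                                          Map<Integer, String> actors,
                                          Map<Integer, List<Integer>> movieActors) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", titles.get(movieId));
        // A multi-valued field carries every related actor name directly.
        List<String> names = new ArrayList<>();
        for (int actorId : movieActors.getOrDefault(movieId, Collections.emptyList())) {
            names.add(actors.get(actorId));
        }
        doc.put("actor", names);
        return doc;
    }

    public static void main(String[] args) {
        Map<Integer, String> titles = new HashMap<>();
        titles.put(2, "Star Trek");
        Map<Integer, String> actors = new HashMap<>();
        actors.put(1, "Zachary Quinto");
        actors.put(2, "Chris Pine");
        Map<Integer, List<Integer>> movieActors = new HashMap<>();
        movieActors.put(2, Arrays.asList(2, 1));
        System.out.println(toDocument(2, titles, actors, movieActors));
    }
}
```

The join happens once, at indexing time, so a query never needs to touch more than one document.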

Page 11: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Relevance

Key differentiator for text search

Not "does this match?"
• "How WELL does this match?"

Includes multiple factors
• Term Frequency, Document Frequency, Proximity

Users can customize this
• Distance
• Popularity
• Field Weighting
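As a rough illustration of how term frequency and document frequency combine into a score, here is a minimal TF-IDF sketch. It is a textbook-style formula chosen for this example, not CloudSearch's ranking function, and it leaves out proximity and the custom factors listed above.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a textbook-style TF-IDF score, not CloudSearch's ranking.
final class TfIdfSketch {
    /** Sums tf * (1 + ln(N / df)) over the query terms that occur in the document. */
    static double score(List<String> queryTerms, List<String> doc, List<List<String>> corpus) {
        double total = 0.0;
        for (String term : queryTerms) {
            int tf = 0; // occurrences of the term in this document
            for (String t : doc) {
                if (t.equals(term)) tf++;
            }
            int df = 0; // number of documents that contain the term
            for (List<String> d : corpus) {
                if (d.contains(term)) df++;
            }
            if (tf > 0 && df > 0) {
                // Rare terms (low df) contribute more than common ones.
                total += tf * (1.0 + Math.log((double) corpus.size() / df));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("star", "wars"),
                Arrays.asList("star", "trek"),
                Arrays.asList("dark", "star"));
        List<String> query = Arrays.asList("star", "trek");
        for (List<String> doc : corpus) {
            System.out.println(doc + " -> " + score(query, doc, corpus));
        }
    }
}
```

All three titles match "star", but only one contains the rarer term "trek", so it scores highest. A plain "does this match?" test would treat the three as equal.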

Page 12: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Text is more than "War & Peace"

It's not just books & blog posts

Meta-data
• Author, Title, Category, Tags
• Can include numbers: counts, dates, latitude,…

Page 13: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Making Choices

Relational? CloudSearch?

Page 14: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Relational Database

Good at
• Exact matches
• Joins
• Atomic Transactions

Not so good at
• Relevance
  • How well does this match?
• Handling words

Page 15: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Text Search Engines

Good at finding
• Words, Phrases
• Relevance

Not so good at
• Joins
• Transactions

Page 16: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Options for Search

Can I just use a relational database?
• Yes.

Do I want to just use a relational database?
• Probably not

Page 17: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Simple Approach

Widely supported, easy

SELECT id, title FROM books WHERE title LIKE '%amazon%'

Does not perform well

Doesn't deal with multiple words
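To see why multiple words are awkward with LIKE, consider what an application has to generate for a multi-word query. The sketch below is an illustration under assumptions: the class and method names are hypothetical, and the `books` table and `title` column come from the slide's example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: builds a parameterized multi-word LIKE query.
final class LikeQueryBuilder {
    /** One "title LIKE ?" predicate per word, joined with AND. */
    static String buildSql(String userQuery) {
        String[] words = userQuery.trim().split("\\s+");
        StringBuilder sql = new StringBuilder("SELECT id, title FROM books WHERE ");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) sql.append(" AND ");
            sql.append("title LIKE ?");
        }
        return sql.toString();
    }

    /** Values to bind to the placeholders, in order. LIKE wildcards in the input are not escaped here. */
    static List<String> buildParams(String userQuery) {
        List<String> params = new ArrayList<>();
        for (String word : userQuery.trim().split("\\s+")) {
            params.add("%" + word + "%");
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(buildSql("harry potter"));
        System.out.println(buildParams("harry potter"));
    }
}
```

Every extra word adds another leading-wildcard predicate, which generally cannot use an ordinary index, and the result set still carries no notion of which rows match best.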

Page 18: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Text Extensions for Relational Databases

Vendor specific
SELECT id, title FROM books WHERE MATCH(title) AGAINST('Harry Potter' IN NATURAL LANGUAGE MODE)

• Use different index structures
• Typically MUCH less mature than relational code
• More manual processes
  • Scaling (if possible)
  • Managing
• Minimal relevance, no control

Page 19: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Appropriate Tools

VS

Page 20: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Options

Relational database
• Weak relevance
• Scaling & performance limits

Text Search Engine
• No transactions & locking
• No Joins

Both
• Some extra effort, then best of both worlds

Page 21: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

What is Amazon CloudSearch?

Page 22: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

CloudSearch

Fully-managed text search engine

High Performance

Automatically Scaling

Reliable, Resilient

Based on Amazon Product Search

Page 23: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Search Features

Faceting

Complex queries
• (and 'potter harry' (not author:'rowling'))

Configurable synonyms, stemming & stopwords

Custom Sorting/Ranking

Page 24: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Scaling

CloudSearch scales automatically
• Handle your spikes
• Plan for success, but don't spend until you need it
• Handle more data
• Scaling is seamless – no downtime

Page 25: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Automatic Scaling

[Diagram: a grid of search instances. Index partitions 1 to n are added as DATA (document quantity and size) grows; copies 1 to n of each partition are added as TRAFFIC (search request volume and complexity) grows.]

Page 26: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Easy to Use

REST API

Simple to add
• HTTP POST

Simple to query
• q=star trek

Simple to integrate
• JSON

[Diagram: documents and queries each reach CloudSearch over HTTP.]
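A query such as `q=star trek` has to be URL-encoded before it goes on the wire. Below is a minimal sketch of building a search request URL. The endpoint host in `main` is a made-up placeholder, and the `/2011-02-01/search` path is an assumption based on the API version current at the time of the talk; check the documentation for the API version your domain uses.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative only: assembles a search URL; it does not send the request.
final class SearchUrl {
    /**
     * searchEndpoint is the domain's search host name (hypothetical value below).
     * The API version path is an assumption for this sketch.
     */
    static String build(String searchEndpoint, String query) {
        try {
            return "http://" + searchEndpoint + "/2011-02-01/search?q="
                    + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("search-movies-example.us-east-1.cloudsearch.amazonaws.com", "star trek"));
    }
}
```

The response comes back as JSON, so any HTTP client plus a JSON parser is enough to integrate search into an application.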

Page 27: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Amazon CloudSearch Architecture

[Architecture diagram. Labels: DNS / Load Balancing; AWS Query; Search API; Doc Svc API; Config API; Console; Command Line Tools; Search Service; Document Service; Config Service; Search Domain.]

Page 28: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

What Can You Search For With CloudSearch?

Wine

Your college buddies

Curly hair products

Downton Abbey episodes

News in Bermuda

Playoff tickets

Online courses

Cat memes

Furniture

Doctor reviews

Take out food

Vacation rentals

Trademarks

African safaris

Kids arts & crafts

French dating/marriage

Online videos

Recipes

Weather insurance

Fashion news

Bollywood music

Stock art

And more!

Page 29: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Page 30: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Combining CloudSearch + Relational Database

Page 31: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Combining the Two

Best of both worlds
• Relational queries run on relational database
• Text queries run on CloudSearch

Downside: Complexity
• More moving parts
• Synchronization

Page 32: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Synchronization

Which one is the master?
• Usually the relational database

Updates
• All at once
• At regular intervals
• When data is available

Deletes
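Updates and deletes both reach the search index as explicit operations, so a row removed from the database must be turned into a delete for the index. The sketch below builds the two operation types as JSON strings. The field layout follows my understanding of the 2011-era document batch format (type, id, version, lang, fields) and should be treated as an assumption; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: hand-built JSON for add and delete operations.
final class SdfOperations {
    // Minimal escaping for this sketch; control characters are not handled.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** An add (or update) operation; a higher version is assumed to supersede a lower one. */
    static String add(String id, long version, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"type\":\"add\",\"id\":\"").append(escape(id))
          .append("\",\"version\":").append(version)
          .append(",\"lang\":\"en\",\"fields\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("}}").toString();
    }

    /** A delete operation for a document that no longer exists in the database. */
    static String delete(String id, long version) {
        return "{\"type\":\"delete\",\"id\":\"" + escape(id) + "\",\"version\":" + version + "}";
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "star trek");
        System.out.println(add("movie2", 1, fields));
        System.out.println(delete("movie2", 2));
    }
}
```

A synchronization job that only selects current rows never sees deleted ones, which is why deletes usually need their own tracking, for example a tombstone column or a change log.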

Page 33: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Dataflow

One source

Simultaneous updates

[Diagram components: Source, Loader, RDBMS, CloudSearch]

Page 34: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Dataflow

One source

Two loaders

[Diagram components: Source, Loader, RDBMS, Loader, CloudSearch]

Page 35: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Dataflow

One source

Log updates

Two loaders

[Diagram components: Source, Loader, RDBMS, Log, Loader, CloudSearch]

Page 36: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Dataflow

[Diagram components: three Sources, Loader, RDBMS, Log, Loader, CloudSearch]

Page 37: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Sample Code

Page 38: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Dataflow

One source

Two loaders

[Diagram components: Source, Loader, RDBMS, Loader, CloudSearch]

Page 39: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Java Example

Read from MySQL
• JDBC – Nothing special

Post to CloudSearch
• Apache HTTP Client

Page 40: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Libraries

Apache
• HTTP Client
• HTTP Core
• Commons Logging

AWS Java SDK

MySQL connector

Page 41: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Source Files

CloudSearchRDS
• Just does the setup for the demo

ExtractAndUpload
• Does the main work

Batcher
• Groups documents into batches

PosterHttp
• Posts to CloudSearch
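The demo's Batcher source is not shown in the slides. As a rough idea of what grouping documents into batches can look like, here is a minimal sketch that packs already-serialized JSON documents into JSON arrays under a byte limit. The class name, its methods, and the limit value are assumptions for illustration; check the service's current batch size limit before choosing a real value.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not the Batcher class from the demo.
final class SimpleBatcher {
    private final int maxBytes;
    private final List<String> batches = new ArrayList<>();
    private StringBuilder current = new StringBuilder();
    private int currentBytes = 0; // bytes written so far, excluding the closing bracket

    SimpleBatcher(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Adds one serialized JSON document, starting a new batch if this one would overflow. */
    void add(String jsonDoc) {
        int size = jsonDoc.getBytes(StandardCharsets.UTF_8).length;
        // +1 for the separator before the document, +1 for the closing bracket.
        if (currentBytes > 0 && currentBytes + size + 2 > maxBytes) {
            flush();
        }
        // A document larger than maxBytes still gets a batch of its own.
        current.append(currentBytes == 0 ? '[' : ',').append(jsonDoc);
        currentBytes += size + 1;
    }

    /** Closes the batch in progress, if any. Call once after the last document. */
    void flush() {
        if (currentBytes == 0) return;
        batches.add(current.append(']').toString());
        current = new StringBuilder();
        currentBytes = 0;
    }

    List<String> batches() {
        return batches;
    }

    public static void main(String[] args) {
        SimpleBatcher batcher = new SimpleBatcher(20);
        for (int i = 0; i < 3; i++) {
            batcher.add("{\"a\":1}");
        }
        batcher.flush();
        System.out.println(batcher.batches());
    }
}
```

Each finished batch is one HTTP POST body, so fewer, fuller batches mean fewer requests to the document service.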

Page 42: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Main Loop

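// stmt, names (a collection of column names), lastModified (a timestamp),
// and batcher are set up outside the code shown on this slide.
ResultSet rs = stmt.executeQuery("select * from movies");
ResultSetMetaData meta = rs.getMetaData();
// JDBC columns are numbered from 1.
for (int col = 1; col <= meta.getColumnCount(); col++) {
    names.add(meta.getColumnName(col));
}
while (rs.next()) {
    // Document version: the last-modified time in whole seconds.
    int version = (int) (lastModified.getTime() / 1000);
    // Copy every column of the row into one JSON document.
    JSONObject doc = new JSONObject();
    for (String name : names) {
        doc.put(name, rs.getString(name));
    }
    String id = rs.getString("id");
    if (batcher != null) {
        batcher.addDocument(doc, version, id);
    }
}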

Page 43: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

SQL

select * from movies;

select `key` as id, title as name from movies

Denormalizing may require multiple queries

Page 44: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Demo

Page 45: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Search: It's not just for Relational Data

You can pull data from
• S3
• Redshift
• Web
• Internal Documents
• And more…

And make it searchable

Page 46: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Indexing S3

ListObjectsRequest listObjectsRequest = new ListObjectsRequest().withBucketName(bucketName);
ObjectListing objectListing;
do {
    objectListing = s3client.listObjects(listObjectsRequest);
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        processObject(objectSummary);
    }
    listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());

Page 47: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Summary

Use the right tool!
• Text Search for Searching Text

CloudSearch is fully managed text search

Easy to get data from relational DB

Easy to load data into CloudSearch

Page 48: Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913

Next Step: Free Trial

One month (750 hours) free.

Set up an account

Give it a try!

Questions? • [email protected]