Using Amazon CloudSearch with Databases - CloudSearch Meetup 061913

Post on 15-Jan-2015

DESCRIPTION

Presentation on using Amazon CloudSearch with databases. What to use when? How can you use CloudSearch with a database? Tom Hill, Solutions Architect, Amazon CloudSearch

TRANSCRIPT

© 2013 Amazon.com, Inc. and its affiliates.  All rights reserved.  May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.

Searching for Success

Amazon CloudSearch and Relational Databases

Agenda

Finding things
• Types of Databases

Making Choices

What is CloudSearch?

Combining CloudSearch with Relational

Sample Code

Finding Things

So Many Databases

Finding Your Information

Your users need to find things
• What do you use?

A Database!
• What Kind?

It's a Big World Out There!

"Database" != "Relational Database"

Tons of relational databases
• Amazon RDS
• MySQL
• MSSQL
• Oracle

but…

Many Other Types

NoSQL databases
• Dynamo, Cassandra, CouchDB…

Graph databases
• Neo4j, Titan, …

Column oriented databases
• Redshift, Bigtable…

Text Search Engine
• CloudSearch, Lucene, Autonomy...

Text Search Engine

Good at text queries
• "Harry Potter and the Philosopher's Stone"

Harry Potter and the Philosopher's Stone

harry potter and the philosopher's stone

harry potter and the philosopher stone

harry potter philosopher stone
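One reason the query variants above can all find the same book is that a text search engine normalizes text before matching. A minimal sketch of that idea, assuming lowercasing, punctuation stripping, possessive removal and a tiny stopword list. The class name and the stopword list are illustrative and are not CloudSearch's actual analysis chain:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class QueryNormalizer {
    // Illustrative stopword list; a real engine makes this configurable.
    private static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    /** Reduces free text to the list of terms used for matching. */
    static List<String> normalize(String text) {
        List<String> terms = new ArrayList<>();
        // Lowercase, then split on anything that is not a letter, digit or apostrophe.
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            // Drop a trailing possessive ("philosopher's" -> "philosopher"),
            // then any remaining apostrophes.
            String term = raw.replaceAll("'s$", "").replace("'", "");
            if (!term.isEmpty() && !STOPWORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }
}
```

With this sketch, all five variants listed above normalize to the same four terms. It only handles ASCII letters; real analyzers also deal with accents, stemming and synonyms.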

Text Search Engine

Basic element is the document

Documents are made of fields
• "title" => "star wars"

Fields can be
• Missing
• Multi-valued
• Variable length

Text Search Engine

Documents are not "normalized"
• In a relational database
  • A movie table
  • A director table
  • An actor table
• In CloudSearch
  • One document per movie

Text Search Engine

Relational

Movie table
ID  Title
1   Star Wars
2   Star Trek
3   Dark Star

Actor table
ID  Actor
1   Zachary Quinto
2   Chris Pine
3   Zoë Saldana

Director table
ID  Director
1   J.J. Abrams
2   George Lucas
3   John Carpenter

Document

ID  Document
1   title: star trek
    actor: chris pine, zachary quinto, zoe saldana
    director: j j abrams
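The step from normalized tables to one document per movie can be sketched as below. This is an illustration only: the in-memory maps stand in for the movie, actor and director tables, and the two link maps (movie to actors, movie to director) are assumptions, since the slide does not show how the tables are joined.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Denormalizer {
    /** Builds one flat document for a movie from normalized lookup tables. */
    static Map<String, Object> toDocument(
            int movieId,
            Map<Integer, String> titles,
            Map<Integer, String> actors,
            Map<Integer, String> directors,
            Map<Integer, List<Integer>> movieActors,   // hypothetical link table
            Map<Integer, Integer> movieDirector) {     // hypothetical link table
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", titles.get(movieId));

        // A multi-valued field: every actor of the movie lands in one list.
        List<String> cast = new ArrayList<>();
        for (int actorId : movieActors.getOrDefault(movieId, Collections.emptyList())) {
            cast.add(actors.get(actorId));
        }
        doc.put("actor", cast);

        // Fields may be missing: no director row means no director field.
        Integer directorId = movieDirector.get(movieId);
        if (directorId != null) {
            doc.put("director", directors.get(directorId));
        }
        return doc;
    }
}
```

The join work happens once, at indexing time, so a query never needs to touch more than one document.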

Relevance

Key differentiator for text search

Not "does this match?"
• "How WELL does this match?"

Includes multiple factors
• Term Frequency, Document Frequency, Proximity

Users can customize this
• Distance
• Popularity
• Field Weighting
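To make term frequency and document frequency concrete, here is a minimal sketch of a classic tf-idf score. It is a textbook formula chosen for illustration, not the ranking function CloudSearch actually uses, and it ignores proximity and the custom factors listed above:

```java
import java.util.Collections;
import java.util.List;

class TfIdf {
    /**
     * Scores one document against the query terms as the sum of
     * tf * log(N / df), where tf is how often the term occurs in the
     * document, df is how many documents contain it, and N is the corpus size.
     */
    static double score(List<String> queryTerms, List<String> doc, List<List<String>> corpus) {
        double score = 0.0;
        for (String term : queryTerms) {
            int tf = Collections.frequency(doc, term);
            int df = 0;
            for (List<String> other : corpus) {
                if (other.contains(term)) {
                    df++;
                }
            }
            if (tf > 0 && df > 0) {
                score += tf * Math.log((double) corpus.size() / df);
            }
        }
        return score;
    }
}
```

For the three titles "star wars", "star trek" and "dark star", the query "star trek" scores "star trek" highest: "star" appears in every document and contributes nothing, while "trek" is rare and carries the whole score.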

Text is more than "War & Peace"

It's not just books & blog posts

Meta-data
• Author, Title, Category, Tags
• Can include numbers: counts, dates, latitude, …

Making Choices

Relational? CloudSearch?

Relational Database

Good at
• Exact matches
• Joins
• Atomic Transactions

Not so good at
• Relevance
  • How well does this match?
• Handling words

Text Search Engines

Good at finding
• Words, Phrases
• Relevance

Not so good at
• Joins
• Transactions

Options for Search

Can I just use a relational database?
• Yes.

Do I want to just use a relational database?
• Probably not.

Simple Approach

Widely supported, easy

SELECT id, title FROM books WHERE title LIKE '%amazon%';

Does not perform well

Doesn't deal with multiple words
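The multiple-word problem shows up as soon as you try to build the query in code. A minimal sketch, assuming the books table and title column from the query above (the class and method names are illustrative): every extra word needs its own LIKE predicate, and each predicate is another unindexed substring scan.

```java
import java.util.ArrayList;
import java.util.List;

class LikeQueryBuilder {
    /** Builds a parameterized query with one LIKE predicate per word. */
    static String buildSql(String userQuery) {
        String[] words = userQuery.trim().split("\\s+");
        StringBuilder sql = new StringBuilder("SELECT id, title FROM books WHERE ");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sql.append(" AND ");
            }
            sql.append("title LIKE ?");
        }
        return sql.toString();
    }

    /** Builds the matching parameter values, one %word% pattern per word. */
    static List<String> buildParams(String userQuery) {
        List<String> params = new ArrayList<>();
        for (String word : userQuery.trim().split("\\s+")) {
            // Note: % and _ inside the word are not escaped in this sketch.
            params.add("%" + word + "%");
        }
        return params;
    }
}
```

Even with this workaround there is no ranking: a title that contains both words once is indistinguishable from one that is an exact match.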

Text Extensions for Relational Databases

Vendor specific

SELECT id, title FROM books WHERE MATCH(title) AGAINST('Harry Potter' IN NATURAL LANGUAGE MODE);

• Use different index structures
• Typically MUCH less mature than relational code
• More manual processes
  • Scaling (if possible)
  • Managing
• Minimal relevance, no control

Appropriate Tools

VS

Options

Relational database
• Weak relevance
• Scaling & performance limits

Text Search Engine
• No transactions & locking
• No Joins

Both
• Some extra effort, then best of both worlds

What is Amazon CloudSearch?

CloudSearch

Fully-managed text search engine

High Performance

Automatically Scaling

Reliable, Resilient

Based on Amazon Product Search

Search Features

Faceting

Complex queries
• (and 'potter harry' (not author:'rowling'))

Configurable synonyms, stemming & stopwords

Custom Sorting/Ranking

Scaling

CloudSearch scales automatically
• Handle your spikes
• Plan for success, but don't spend until you need it
• Handle more data
• Scaling is seamless – no downtime

Automatic Scaling

[Diagram: a grid of search instances. One axis is DATA (document quantity and size), served by index partitions 1, 2, ... n; the other is TRAFFIC (search request volume and complexity), served by copies 1, 2, ... n of each partition.]

Easy to Use

REST API

Simple to add
• HTTP POST

Simple to query
• q=star trek

Simple to integrate
• JSON

[Diagram: documents and queries both reach CloudSearch over HTTP]
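A query such as q=star trek has to be URL-encoded before it goes on the wire. A minimal sketch of building the search URL, under stated assumptions: the endpoint host name and the API version are passed in by the caller, and the /<version>/search path layout follows the CloudSearch search API as commonly documented. Check the documentation for your domain before relying on it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SearchUrl {
    /**
     * Builds a search URL for a simple text query.
     * searchEndpoint is the domain's search host name (an assumption here,
     * taken from the domain's dashboard); apiVersion is the API version string.
     */
    static String build(String searchEndpoint, String apiVersion, String query) {
        try {
            return "http://" + searchEndpoint + "/" + apiVersion
                    + "/search?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java platform.
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

The response comes back as JSON, so the result can be handed to whatever JSON library the application already uses.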

Amazon CloudSearch Architecture

[Diagram: a search domain containing the SEARCH SERVICE, DOCUMENT SERVICE and CONFIG SERVICE, reached through DNS / load balancing. Access paths shown: Search API, Doc Svc API, Config API, AWS Query, command-line tools, console.]

What Can You Search For With CloudSearch?

Wine

Your college buddies

Curly hair products

Downton Abbey episodes

News in Bermuda

Playoff tickets

Online courses

Cat memes

Furniture

Doctor reviews

Take out food

Vacation rentals

Trademarks

African safaris

Kids arts & crafts

French dating/marriage

Online videos

Recipes

Weather insurance

Fashion news

Bollywood music

Stock art

And more!

Combining CloudSearch + Relational Database

Combining the Two

Best of both worlds
• Relational queries run on relational database
• Text queries run on CloudSearch

Downside: Complexity
• More moving parts
• Synchronization

Synchronization

Which one is the master?
• Usually the relational database

Updates
• All at once
• At regular intervals
• When data is available

Deletes
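Updating "at regular intervals" and handling deletes can be sketched as a planning step that runs on each sync. This is one possible approach under assumptions the slides do not state: each row carries a last-modified timestamp, and deleted rows are kept as tombstones with a deleted flag so the loader can see them. All names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SyncPlanner {
    /** A database row as the loader sees it (hypothetical shape). */
    static final class Row {
        final String id;
        final long lastModified;  // e.g. seconds since the epoch
        final boolean deleted;    // tombstone flag instead of a hard delete

        Row(String id, long lastModified, boolean deleted) {
            this.id = id;
            this.lastModified = lastModified;
            this.deleted = deleted;
        }
    }

    /** Returns the operations to send for rows changed since the last sync. */
    static List<String> plan(List<Row> rows, long lastSyncTime) {
        List<String> operations = new ArrayList<>();
        for (Row row : rows) {
            if (row.lastModified <= lastSyncTime) {
                continue;  // unchanged since the previous run
            }
            operations.add((row.deleted ? "delete:" : "add:") + row.id);
        }
        return operations;
    }
}
```

Without tombstones, a row that was hard-deleted in the database simply disappears from the query results, and the loader has no way to know the matching document should be removed from the index.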

Dataflow

One source

Simultaneous updates

[Diagram: Source, Loader, RDBMS, CloudSearch]

Dataflow

One source

Two loaders

[Diagram: Source, Loader, RDBMS, Loader, CloudSearch]

Dataflow

One source

Log updates

Two loaders

[Diagram: Source, Loader, RDBMS, Log, Loader, CloudSearch]

Dataflow

[Diagram: three Sources, Loader, RDBMS, Log, Loader, CloudSearch]

Sample Code

Dataflow

One source

Two loaders

[Diagram: Source, Loader, RDBMS, Loader, CloudSearch]

Java Example

Read from MySQL
• JDBC - nothing special

Post to CloudSearch
• Apache HTTP Client

Libraries

Apache
• HTTP Client
• HTTP Core
• Commons Logging

AWS Java SDK

MySQL connector

Source Files

CloudSearchRDS
• Just does the setup for the demo

ExtractAndUpload
• Does the main work

Batcher
• Groups documents into batches

PosterHttp
• Posts to CloudSearch
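The demo's Batcher source is not shown in the slides. As a rough idea of what "groups documents into batches" can mean, here is a minimal sketch that packs already-serialized JSON documents into JSON arrays without exceeding a byte limit. The class name SimpleBatcher and the limit parameter are hypothetical; CloudSearch does limit upload size, but the limit is passed in here rather than hard-coded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SimpleBatcher {
    private final int maxBatchBytes;
    private final List<String> current = new ArrayList<>();
    private final List<String> completed = new ArrayList<>();
    private int currentBytes = 2;  // room for the enclosing "[" and "]"

    SimpleBatcher(int maxBatchBytes) {
        this.maxBatchBytes = maxBatchBytes;
    }

    /** Adds one serialized document, closing the current batch first if it would overflow. */
    void addDocument(String docJson) {
        // +1 covers the comma separator (a slight overestimate for the first document).
        int size = docJson.getBytes(StandardCharsets.UTF_8).length + 1;
        if (!current.isEmpty() && currentBytes + size > maxBatchBytes) {
            flush();
        }
        current.add(docJson);
        currentBytes += size;
    }

    /** Closes the current batch, if any, as one JSON array. */
    void flush() {
        if (current.isEmpty()) {
            return;
        }
        completed.add("[" + String.join(",", current) + "]");
        current.clear();
        currentBytes = 2;
    }

    /** Returns the batches closed so far; call flush() first to include the last one. */
    List<String> batches() {
        return completed;
    }
}
```

Each completed batch is then a single HTTP POST body, which keeps the number of requests low when loading a large table. A single document larger than the limit still ends up in its own oversized batch; a real loader would reject it.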

Main Loop

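// Excerpt: stmt (an open JDBC Statement), names (a list of column names),
// lastModified and batcher are defined elsewhere in ExtractAndUpload.
ResultSet rs = stmt.executeQuery("select * from movies");
ResultSetMetaData meta = rs.getMetaData();

// Collect the column names once; each one becomes a document field name.
for (int col = 1; col <= meta.getColumnCount(); col++) {
    names.add(meta.getColumnName(col));
}

while (rs.next()) {
    // Document version, in seconds since the epoch.
    int version = (int) (lastModified.getTime() / 1000);
    JSONObject doc = new JSONObject();
    for (String name : names) {
        doc.put(name, rs.getString(name));
    }
    String id = rs.getString("id");
    if (batcher != null) {
        batcher.addDocument(doc, version, id);
    }
}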
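The loop above hands each row to the batcher with an id and a version. What gets posted is an "add" operation wrapping the row's fields. A minimal sketch of that wrapper, built by hand so it needs no JSON library. The envelope keys (type, id, version, lang, fields) are an assumption based on the document batch format of the CloudSearch API of that period; check the documentation for the API version you use.

```java
import java.util.Map;

class SdfDocument {
    /** Serializes one row as an "add" operation; null values are treated as missing fields. */
    static String toAddOperation(String id, int version, Map<String, String> fields) {
        StringBuilder json = new StringBuilder();
        json.append("{\"type\":\"add\",\"id\":\"").append(escape(id))
            .append("\",\"version\":").append(version)
            .append(",\"lang\":\"en\",\"fields\":{");
        boolean first = true;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (field.getValue() == null) {
                continue;  // fields can be missing, so skip SQL NULLs
            }
            if (!first) {
                json.append(',');
            }
            json.append('"').append(escape(field.getKey())).append("\":\"")
                .append(escape(field.getValue())).append('"');
            first = false;
        }
        return json.append("}}").toString();
    }

    /** Escapes backslashes and double quotes only; control characters are not handled. */
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

In real code a JSON library is the safer choice, as in the demo; the hand-rolled version is only meant to show the shape of the payload.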

SQL

select * from movies;

select key as id, title as name from movies

Denormalizing may require multiple queries

Demo

Search: It's not just for Relational Data

You can pull data from
• S3
• Redshift
• Web
• Internal Documents
• And more…

And make it searchable

Indexing S3

// List every object in the bucket, one page of results at a time.
ListObjectsRequest listObjectsRequest = new ListObjectsRequest().withBucketName(bucketName);
ObjectListing objectListing;
do {
    objectListing = s3client.listObjects(listObjectsRequest);
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        processObject(objectSummary);
    }
    // Continue from where the previous page ended.
    listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());

Summary

Use the right tool!
• Text Search for Searching Text

CloudSearch is fully managed text search

Easy to get data from relational DB

Easy to load data into CloudSearch

Next Step: Free Trial

One month (750 hours) free.

Set up an account

Give it a try!

Questions?
• TomHill@amazon.com
