Retail Reference Architecture: Product Catalog

Posted on 10-Nov-2014


DESCRIPTION

During this session we will cover the best practices for implementing a product catalog with MongoDB. We will cover how to model an item properly when it can have thousands of variations and thousands of properties of interest. You'll learn how to index properly, how to allow faceted search with millisecond response latency, and how to implement per-store, per-SKU pricing while still keeping a sane number of documents. We will also cover operational considerations, such as how to bring the data closer to users to cut down network latency.

TRANSCRIPT

One Catalog Service to rule them all

Antoine Girbal, Principal Solutions Engineer, MongoDB Inc. (@antoinegirbal)

Problem Statement

The many catalogs problem

1. One department in charge of master product data works hard at fitting the data into SQL tables

2. The resulting data sits in a SQL server with a couple of replicas. It's forbidden to hit it more than 100 times / sec

3. Other departments need to access the data far more often for their own services

4. Other departments need more information that is not available, since it did not fit in the long-devised, rigid SQL schema

5. ETLs and message buses are put in place for other teams to try to figure it out themselves…

6. Data becomes inconsistent, fragmented and out of date… The problem is visible both internally and to customers!

How many catalogs and catalog caches do you have?

[Diagram: the Product Department's Master Catalog feeds separate catalogs for the Online Store, Marketing and Departments 1, 3, 4 and 5 through ETLs and a message bus.]

Dozens of catalogs!

Goal: Single View of Product

• Single view of a product, one central catalog service

• Flexible schema containing all useful data

• High and sustained read volume: 100k reads / s

• Seamlessly absorbs write spikes during catalog updates

• Advanced indexing and querying

• Geographical distribution for HA and low latency

Agenda

1. MongoDB Overview

2. Catalog Service Architecture

3. Data Store Models

4. Product Search

MongoDB Overview

MongoDB is a great fit

• Holds complex JSON structures

• Dynamic schema for agility

• Complex querying and in-place updating

• Secondary, compound and geo indexing

• Full consistency, durability, atomic operations

• HA and geo-distribution via replication

• Near-linear scaling via sharding

• Overall, MongoDB is a unique fit!

MongoDB Strategic Advantages

[Diagram: the application at the center of four advantages: agile and flexible; horizontally scalable (sharding); highly available (replica sets); high performance and strong consistency.]

{ customer: "roger", date: new Date(), comment: "Spirited Away", tags: ["Tezuka", "Manga"] }

Build your data to fit your application

Relational:

CustomerID  First Name  Last Name  City
0           John        Doe        New York
1           Mark        Smith      San Francisco
2           Jay         Black      Newark
3           Meagan      White      London
4           Edward      Danields   Boston

Order Number  Store ID  Product       Customer ID
10            100       Tablet        0
11            101       Smartphone    0
12            101       Dishwasher    0
13            200       Sofa          1
14            200       Coffee table  1
15            201       Suit          2

MongoDB:

{ customer_id : 1,
  name : "Mark Smith",
  city : "San Francisco",
  orders: [ {
      order_number : 13,
      store_id : 10,
      date: "2014-01-03",
      products: [
        { SKU: 24578234, Qty: 3, Unit_price: 350 },
        { SKU: 98762345, Qty: 1, Unit_price: 110 }
      ]
    },
    { <...> }
  ]
}

Notions

RDBMS     MongoDB
Database  Database
Table     Collection
Row       Document
Column    Field

Catalog Service Architecture

Architecture Overview

[Diagram, components shown: Customer; Channels (Amazon, Ebay…); Stores (POS, Kiosk); Mobile (Smartphone, Tablet); Website; Contact Center; Social (Facebook, Twitter…); Web Servers; Application Servers; API Data and Service Integration; Information Management (Merchandising, Content, Inventory, Customer, Channel, Sales & Fulfillment, Insight, Social); Data Warehouse; Analytics; Supply Chain Management System; Suppliers; 3rd Party; In Network.]

Commerce Functional Components

[Diagram: commerce functional components, grouped as follows.]

• Customer's perspective: Research / Browse / Search, Select / Shopping Cart, Purchase / Checkout, Receive / Track, Use / Feedback / Maintain, Dialog / Assist

• Information layer: Look & Feel, Navigation, Customization, Personalization, Branding, Promotions, Chat, Ads

• Market / Offer: Guide, Offer, Semantic Search, Recommend, Rule-based Decisions, Pricing, Coupons

• Sell / Fulfill: Orders, Payments, Fraud Detection, Fulfillment, Business Rules

• Insight: Session Capture, Activity Monitoring

• Enterprise information management: Merchandising, Content, Inventory, Customer, Channel, Sales & Fulfillment, Insight, Social

Merchandising Components

[Diagram: Merchandising components backed by MongoDB: Item, Variant, Hierarchy, Pricing, Promotions, Ratings & Reviews, Calendar, Semantic Search, Localization.]

Merchandising - Architecture

[Diagram: a MongoDB data store (Items, Variants, Pricing, Promotions, Ratings & Reviews) and a search engine sit behind a Product Service API, which serves the Online Store, Marketing, Inventory, SCMS, Public API, …]

Data Store Models

Models - Product Page

[Screenshot: a product page annotated with Product images, General Information, List of Variants, External Information and Localized Description.]

Models - Overview

• Item: the overall product info (e.g. Levi's 501)

• Variant: a specific variant of an item (e.g. in black, size 6), which typically has a specific SKU / UPC

• Price: price information may vary based on the store, the variant, etc.

• Hierarchy: the item taxonomy

• Facet: facets to search products by

• Vendors: a given SKU may be available through several vendors if the site is a marketplace

> Don't try to fit it all in the same document!

[Diagram: one item, with dozens of colors and hundreds of sizes.]

• A single item may have thousands of variants

• Each variant can have hundreds of attributes

• Altogether, a single item can represent many MBs' worth of JSON text

• Don't try to fit everything into the same document!

• Use a schema that is natural and fits the API
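To make "many MBs' worth of JSON" concrete, here is a rough sketch in JavaScript (Node.js) that builds a hypothetical item with every variant embedded and measures its JSON size against MongoDB's 16 MB maximum document size. The variant and attribute counts are assumptions chosen for illustration, and JSON text length is only an approximation of BSON size.

```javascript
// Rough sketch: estimate how large one "item" document would get if every
// variant (with all its attributes) were embedded in it.
// The counts and attribute shapes below are hypothetical.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // MongoDB's maximum document size (16 MB)

function buildEmbeddedItem(variantCount, attrsPerVariant) {
  const variants = [];
  for (let v = 0; v < variantCount; v++) {
    const attrs = [];
    for (let a = 0; a < attrsPerVariant; a++) {
      attrs.push({ name: `Attribute ${a}`, value: `Value ${v}-${a}` });
    }
    variants.push({ sku: `sku${v}`, attrs });
  }
  return { _id: "054VA72303012P", name: "Example item", variants };
}

// JSON text length stands in for BSON size; the two are close, not identical.
function estimateJsonBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

const item = buildEmbeddedItem(2000, 300);
const bytes = estimateJsonBytes(item);
console.log(`~${(bytes / 1024 / 1024).toFixed(1)} MB of JSON,`,
  bytes > MAX_BSON_BYTES ? "over" : "under", "the 16 MB document limit");
```

With 2,000 variants of 300 attributes each, the embedded document lands well past the limit, which is one concrete reason to keep variants in their own collection.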

Models - Item Model

{
  "_id": "054VA72303012P", // the item id
  "desc": [ // item descriptions
    { "lang": "en", "val": "Give your dressy look a lift with ..." },
    ...
  ],
  "name": "Women's Kate Ivory Peep-Toe Stiletto Heel",
  "category": "/84700/80009/1282094266/1200003270", // hierarchy
  "brand": { "id": "2483510", "img": "http://...", "name": "Metaphor" },
  "assets": { // references to all assets
    "imgs": [
      { "img": { "width": 1900, "height": 1900, "src": "http://..." } },
      ...
    ]
  },
  "shipping": { /* shipping specs */ },
  "specs": { /* item specs */ },
  "attrs": [ // list of item attributes (facets)
    { "name": "Heel Height", "value": "High (2-1/2 to 4 in.)" },
    { "name": "Toe", "value": "Open toe" },
    ...
  ],
  "variants": { // quick info on the variants
    "cnt": 9,
    "attrs": [
      { "dispType": "DROPDOWN", "name": "Color" },
      { "dispType": "DROPDOWN", "name": "Shoe Size" },
      ...
    ]
  },
  "lastUpdated": 1400877254787 // keep track of updates
}

• Get item by id

db.definition.findOne( { _id: "301671" } )

• Get items from list of ids

db.definition.find( { _id: { $in: [ "301671", "301672" ] } } )

• Get items by department

db.definition.find({ category: { $regex: "^/84700/" } })

• Get items by category prefix

db.definition.find( { category: { $regex: "^/84700/80009/" } } )

• Secondary Indices

name, category, lastUpdated
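Because the category field holds a materialized path, department and category lookups are prefix matches. Below is a minimal sketch in JavaScript (Node.js), assuming paths shaped like "/84700/80009/…"; the helper names are hypothetical. It relies on MongoDB being able to use an index efficiently for a case-sensitive regex anchored with "^" (a prefix expression).

```javascript
// Sketch: build a left-anchored, case-sensitive $regex filter for a category
// path prefix, as in db.definition.find( { category: { $regex: "^/84700/" } } ).
// escapeRegex and categoryPrefixFilter are hypothetical helper names.

// Escape characters that have a special meaning in regular expressions.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function categoryPrefixFilter(pathPrefix) {
  // Force a trailing "/" so that "/84700" does not also match "/847001/...".
  const prefix = pathPrefix.endsWith("/") ? pathPrefix : pathPrefix + "/";
  return { category: { $regex: "^" + escapeRegex(prefix) } };
}

// The same pattern can be checked locally with a JavaScript RegExp.
const filter = categoryPrefixFilter("/84700/80009");
const re = new RegExp(filter.category.$regex);
console.log(filter.category.$regex);                        // ^/84700/80009/
console.log(re.test("/84700/80009/1282094266/1200003270")); // true
console.log(re.test("/84700/800090/1"));                    // false
```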


Models - Variant Model

{
  "_id": "05458452563", // the sku
  "name": "Width:Medium,Color:Ivory,Shoe Size:6.5",
  "itemId": "054VA72303012P", // reference to the item id
  "altIds": { "upc": "632576103580" },
  "assets": { // list of assets specific to variant
    "imgs": [
      { "width": 1900, "height": 1900, "src": "http://..." },
      { "width": 1900, "height": 1900, "src": "http://..." },
      ...
    ]
  },
  "attrs": [ // list of attributes specific to variant
    { "name": "Width", "value": "Medium" },
    { "name": "Color", "family": "White", "value": "Ivory" },
    { "name": "Size", "value": "6.5" },
    ...
  ],
  "lastUpdated": 1400877254787 // keep track of updates
}

• Get variant from SKU

db.variant.find( { _id: "05458452563" } )

• Get all variants for a product, sorted by SKU

db.variant.find( { itemId: "054VA72303012P" } ).sort( { _id: 1 } )

• Indices

itemId, lastUpdated
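The item/variant split can be sketched as a small transformation. This is a hypothetical illustration in JavaScript (Node.js), not the deck's import code: `splitItem`, the input shape, and the second SKU "05458452564" are assumptions. It produces one item document with quick variant info (as in "variants": { "cnt", "attrs" }) plus one variant document per SKU carrying an `itemId` reference.

```javascript
// Sketch: split a hypothetical monolithic product record into one item
// document plus one document per variant, mirroring the item and variant
// models above. splitItem and the input shape are assumptions.
function splitItem(product) {
  const { variants = [], ...itemFields } = product;

  // One document per variant: keyed by SKU, pointing back to its item
  // through itemId (the field indexed for "all variants of an item").
  const variantDocs = variants.map((v) => ({
    _id: v.sku,
    itemId: product._id,
    attrs: v.attrs,
  }));

  // The item keeps only quick info on its variants: how many there are and
  // which attribute names vary (e.g. Color, Shoe Size).
  const varyingNames = [...new Set(variants.flatMap((v) => v.attrs.map((a) => a.name)))];
  const itemDoc = {
    ...itemFields,
    variants: { cnt: variants.length, attrs: varyingNames.map((name) => ({ name })) },
  };
  return { itemDoc, variantDocs };
}

const { itemDoc, variantDocs } = splitItem({
  _id: "054VA72303012P",
  name: "Women's Kate Ivory Peep-Toe Stiletto Heel",
  variants: [
    { sku: "05458452563", attrs: [{ name: "Color", value: "Ivory" }, { name: "Shoe Size", value: "6.5" }] },
    { sku: "05458452564", attrs: [{ name: "Color", value: "Ivory" }, { name: "Shoe Size", value: "7" }] },
  ],
});
console.log(itemDoc.variants.cnt, variantDocs.map((v) => v._id));
```

The variant documents would go to the variant collection, where the `itemId` index serves "all variants for a product".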


Models - Hierarchy

{
  "_id": "1200003270", // the node id
  "name": "Women's Heels & Pumps",
  "count": 22305, // how many items in this category
  "parents": [ // list of parents
    "1282094266"
  ],
  "facets": [ // facets that exist for this category
    "Heel Height",
    "Toe",
    "Upper Material",
    "Width",
    "Shoe Size",
    "Color"
  ]
}

• Get hierarchy node by id

db.hierarchy.find( { _id: "1200003270" } )

• Get hierarchy node from parent id

db.hierarchy.find( { parents: "1282094266" } )

• Get departments (no parent)

db.hierarchy.find( { parents: null } )

• Secondary Indices

parents
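The `parents` links are enough to rebuild a category path like the item model's "/84700/80009/1282094266/1200003270". Below is a minimal sketch in JavaScript (Node.js), assuming each node has at most one parent (the first entry of `parents`), that departments have none, and that the nodes were already loaded, e.g. with db.hierarchy.find(). The nodes other than 1200003270 and its parent are hypothetical.

```javascript
// Sketch: rebuild a category path by following "parents" links from a leaf
// node up to its department. Single-parent hierarchy is an assumption.
function categoryPath(nodesById, leafId) {
  const ids = [];
  const seen = new Set();
  let node = nodesById.get(leafId);
  while (node) {
    if (seen.has(node._id)) throw new Error(`cycle detected at ${node._id}`);
    seen.add(node._id);
    ids.unshift(node._id);                 // prepend: we walk leaf -> root
    const parentId = (node.parents || [])[0];
    node = parentId ? nodesById.get(parentId) : undefined;
  }
  return "/" + ids.join("/");
}

// Hypothetical nodes; only 1200003270 and its parent appear on the slides.
const nodes = [
  { _id: "84700", parents: [] },
  { _id: "80009", parents: ["84700"] },
  { _id: "1282094266", parents: ["80009"] },
  { _id: "1200003270", name: "Women's Heels & Pumps", parents: ["1282094266"] },
];
const nodesById = new Map(nodes.map((n) => [n._id, n]));
console.log(categoryPath(nodesById, "1200003270")); // /84700/80009/1282094266/1200003270
```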


Models - per Store Pricing

Per-store pricing could result in billions of documents… unless it is built in a modular way:

• _id: concatenation of item and store.

  – Item: can be an item id or a variant id (SKU)

  – Store: can be a store group (online) or a store id.

{
  "_id": "skuSPM8824542513_1234/store123",
  "price": 69.99,
  "sale": { "salePrice": 42.72, "saleEndDate": "2050-12-31 23:59:59" },
  "lastUpdated": 1374647707394
}

• Get all prices for a given item

db.prices.find( { _id: /^item301671\// } )

• Get all prices for a given sku (price could be at item level)

db.prices.find( { _id: { $in: [ /^sku730223104376\//, /^item301671\// ] } } )

• Get minimum and maximum prices for a sku

db.prices.aggregate( [
  { $match: { _id: { $in: [ /^sku730223104376\//, /^item301671\// ] } } },
  { $group: { _id: null, min: { $min: "$price" }, max: { $max: "$price" } } }
] )

• Get price for a sku and store id (returns up to 4 prices)

db.prices.find( { _id: { $in: [ "sku730223104376/store1234",
                               "sku730223104376/sgroup0",
                               "item301671/store1234",
                               "item301671/sgroup0" ] } },
                { price: 1 } )
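The four-key lookup can be sketched in JavaScript (Node.js): one helper builds the candidate `_id` values in the "<item>/<store>" format, another picks a single price from whatever documents come back. The precedence order (SKU before item, store before store group) is an assumption, since the slides only state that up to 4 prices are returned; the helper names and the 59.99 price are hypothetical.

```javascript
// Sketch: compute the candidate _id values for a (sku, store) price lookup
// and pick one price from the documents returned. Precedence is assumed.
function priceKeys(sku, itemId, storeId, storeGroup) {
  return [
    `sku${sku}/store${storeId}`,     // most specific: this SKU in this store
    `sku${sku}/${storeGroup}`,       // this SKU in the store's group
    `item${itemId}/store${storeId}`, // whole item in this store
    `item${itemId}/${storeGroup}`,   // least specific: whole item in the group
  ];
}

// priceDocs would be the result of db.prices.find( { _id: { $in: keys } } ).
function resolvePrice(keys, priceDocs) {
  const byId = new Map(priceDocs.map((doc) => [doc._id, doc]));
  for (const key of keys) {
    const doc = byId.get(key);
    if (doc) return doc.price;       // first hit in precedence order wins
  }
  return null;                       // no price defined at any level
}

const keys = priceKeys("730223104376", "301671", "1234", "sgroup0");
const price = resolvePrice(keys, [
  { _id: "item301671/sgroup0", price: 69.99 },
  { _id: "sku730223104376/sgroup0", price: 59.99 },
]);
console.log(price); // 59.99
```

Because all four keys are exact `_id` values, the lookup stays a single indexed `$in` query no matter which level actually holds the price.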

Product Search

Search - Browse and Search products

[Screenshot: a product listing page annotated with "Browse by category", "Special Lists", "Filter by attributes" and "Lists hundreds of item summaries".]

By far the toughest page to get right and fast…

The previous page presents many challenges:

• Response within milliseconds for hundreds of items

• Faceted search on many attributes: category, brand, …

• Efficient sorting on several attributes: price, popularity

• Pagination, which requires deterministic ordering

> Search engines are built for this purpose!

Search - Traditional Architecture

[Diagram: the product data store is indexed into the product search engine. The application (#1) obtains search result IDs from the search engine, then (#2) obtains the objects by ID from a cache or the DB, where they are pre-joined into objects.]

The traditional architecture issues:

• 3 different systems to maintain: RDBMS, search engine, caching layer

• RDBMS schema is complex and static

• Applications need to speak many languages

Search - Architecture with MongoDB

[Diagram: MongoDB, the product data store holding ready-to-use product documents, is indexed into the search engine. Applications issue a single query to a Product API, which (#1) obtains search result IDs from the search engine and (#2) obtains the objects by list of IDs from MongoDB.]

Search - Mongo-Connector

[Diagram: Mongo Connector (#1) performs an initial dump of the MongoDB collections, then (#2) streams updates via the oplog, with translation and filtering, for indexing into the search engine.]

• Open-source project at https://github.com/10gen-labs/mongo-connector

• Python app that reads from MongoDB's oplog and publishes to a target of choice

• Supports initial sync by dumping the data

• Default connectors for Solr, Elastic Search and another MongoDB cluster

• Easily extensible to update other systems, such as SQL databases
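The core idea of such a connector, replaying oplog entries for one namespace into a target, can be sketched in a few lines. This is a simplified JavaScript (Node.js) illustration, not mongo-connector's code: it uses the oplog's `op` ("i", "u", "d"), `ns`, `o` and `o2` fields, writes into a `Map` standing in for a search index, and skips operator-style updates (e.g. `$set`), which a real connector must also handle.

```javascript
// Simplified sketch: replay oplog-style entries for one namespace into a
// target Map that stands in for a search index. Only inserts, full-document
// updates and deletes are applied; operator-style updates are skipped.
function applyOplog(entries, namespace, target) {
  for (const entry of entries) {
    if (entry.ns !== namespace) continue;          // filtering by namespace
    switch (entry.op) {
      case "i":                                     // o is the inserted document
        target.set(entry.o._id, entry.o);
        break;
      case "u": {                                   // o2 holds the _id, o the new content
        const isReplacement = !Object.keys(entry.o).some((k) => k.startsWith("$"));
        if (isReplacement) target.set(entry.o2._id, { ...entry.o, _id: entry.o2._id });
        break;
      }
      case "d":                                     // o holds the _id of the removed doc
        target.delete(entry.o._id);
        break;
      default:                                      // commands, no-ops, ...: ignored here
        break;
    }
  }
  return target;
}

const index = applyOplog([
  { op: "i", ns: "catalog.summary", o: { _id: "3ZZVA46759401P", name: "Women's Chic" } },
  { op: "i", ns: "catalog.items", o: { _id: "054VA72303012P" } },   // other namespace: filtered out
  { op: "u", ns: "catalog.summary", o2: { _id: "3ZZVA46759401P" }, o: { name: "Women's Chic - Black Velvet Suede" } },
], "catalog.summary", new Map());
console.log([...index.values()]);
```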

What is the data to index?

Search - More Searching

[Screenshot: search results annotated with "Images of the matching variants are displayed", "Facets for variants" and "Price and Rating".]

… more challenges:

• Attributes at the variant level: color, size, etc.

• Attributes from other docs: pricing, ratings, etc.

• Display the matching variant's image and details

• Thousands of matching variants for an item, yet a single item must still be displayed

• Indexing the data properly is a challenge

> Need for a single summary document per item

Search - Architecture

[Diagram: the MongoDB data store holds Summaries alongside Items, Variants, Pricing, Promotions, and Ratings & Reviews.]

Search - Summary Model

{
  "_id": "3ZZVA46759401P", // the item id
  "name": "Women's Chic - Black Velvet Suede",
  "dep": "84700", // useful as standalone for indexing
  "cat": "/84700/80009/1282094266/1200003270",
  "desc": { "lang": "en", "val": "This pointy toe slingback ..." },
  "img": { "width": 450, "height": 330, "src": "http://..." },
  "attrs": [ // global attributes, easily indexable by the search engine
    "heel height=mid (1-3/4 to 2-1/4 in.)",
    "brand=metaphor",
    "shoe size=6",
    "shoe size=6.5",
    ...
  ],
  "sattrs": [ // global attributes, not to be indexed
    "upper material=synthetic",
    "toe=open toe",
    ...
  ],
  "vars": [
    {
      "id": "05497884001",
      "img": [ /* images */ ],
      "attrs": [ /* list of variant attributes to index */ ],
      "sattrs": [ /* list of variant attributes not to index */ ]
    },
    ...
  ]
}
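A summary like the one above can be assembled from the item, its variants and a price. Below is a minimal sketch in JavaScript (Node.js); `buildSummary`, the input shapes, the 42.72 price and the `secondaryNames` set (which decides what goes to `sattrs`) are assumptions for illustration. Attributes become lower-case "name=value" strings, matching the model, so a search engine can index each one as a single keyword.

```javascript
// Sketch: assemble a summary document following the summary model above.
// buildSummary and secondaryNames are hypothetical.
function toAttrString({ name, value }) {
  return `${name}=${value}`.toLowerCase();
}

function buildSummary(item, variants, price, secondaryNames = new Set()) {
  const primary = (attrs) => attrs.filter((a) => !secondaryNames.has(a.name)).map(toAttrString);
  const secondary = (attrs) => attrs.filter((a) => secondaryNames.has(a.name)).map(toAttrString);
  return {
    _id: item._id,
    name: item.name,
    dep: item.category.split("/")[1],   // first path segment, e.g. "84700"
    cat: item.category,
    price,
    attrs: primary(item.attrs),
    sattrs: secondary(item.attrs),
    vars: variants.map((v) => ({
      id: v._id,
      attrs: primary(v.attrs),
      sattrs: secondary(v.attrs),
    })),
  };
}

const summary = buildSummary(
  {
    _id: "3ZZVA46759401P",
    name: "Women's Chic - Black Velvet Suede",
    category: "/84700/80009/1282094266/1200003270",
    attrs: [{ name: "Brand", value: "Metaphor" }, { name: "Toe", value: "Open toe" }],
  },
  [{ _id: "05497884001", attrs: [{ name: "Shoe Size", value: "6" }] }],
  42.72,
  new Set(["Toe"])
);
console.log(summary.attrs, summary.sattrs); // [ 'brand=metaphor' ] [ 'toe=open toe' ]
```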

Search - Using Solr

Let's use Solr…

Defining the schema in schema.xml

<fields>
  <!-- some of the core fields -->
  <field name="_id" type="string" indexed="true" stored="true" />
  <field name="name" type="text_general" indexed="true" stored="true" />
  <field name="cat" type="string" indexed="true" stored="true" />
  <field name="price" type="float" indexed="true" stored="true" />

  <!-- the full text to index -->
  <field name="desc.0.val" type="text_general" indexed="true" stored="true" />

  <!-- dynamic attributes for faceting -->
  <dynamicField name="attrs.*" type="string" indexed="true" stored="true" />

  <!-- some Solr-specific fields -->
  <field name="_version_" type="long" indexed="true" stored="true" />
  <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false" />
  <dynamicField name="*" type="ignored" multiValued="true" />
</fields>

Starting up the connector

> Keep it running; it will just stream the oplog.

$ mongo-connector -m ec2-54-80-63-229.compute-1.amazonaws.com:27017 \
      -t http://localhost:8983/solr \
      -d mongo_connector/doc_managers/solr_doc_manager.py \
      -n "catalog.summary" \
      --auto-commit-interval=60

Here -m points to MongoDB, -t to Solr, -n selects the target summary collection, and --auto-commit-interval=60 commits every minute.

Document in Solr looks like:

{
  "desc.0.val": "Our classic \"Flying Duck\" styled as a ...",
  "name": "Drake Waterfowl Duck Label SS T-Shirt Army Green",
  "attrs.1": "brand=Drake Waterfowl",
  "attrs.0": "style=t-shirts",
  "cat": "/84700/1200000239/1282094207/1200000817",
  "_id": "SPM10823491916",
  "_version_": 1479173524477182000,
  "timestamp": "2014-09-13T23:09:59.782Z"
}

Lists are flattened, which makes them difficult to use.

> Must use named fields to implement facets
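The field names in that Solr document come from flattening nested lists into dotted keys. The sketch below in JavaScript (Node.js) reproduces only that naming; it is not mongo-connector's own code, and it assumes plain JSON values (no Dates or other special types).

```javascript
// Sketch of the key naming Solr ends up with: nested arrays and objects are
// flattened into dotted field names such as "attrs.0" or "desc.0.val".
function flatten(value, prefix = "", out = {}) {
  if (value !== null && typeof value === "object") {
    // Arrays and objects both add one path segment per index or key.
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

console.log(flatten({
  _id: "SPM10823491916",
  attrs: ["style=t-shirts", "brand=Drake Waterfowl"],
  desc: [{ lang: "en", val: "Our classic ..." }],
}));
```

Because an attribute's position in the list becomes part of its field name, "brand=..." may land in attrs.1 in one document and attrs.0 in another, which is why facets need named fields.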

Search - Using Elastic Search

Let's use Elastic Search…

ElasticSearch understands the whole document right off the bat.

Just need to tell ES not to tokenize the facets ("string" below is the name of the default mapping type):

$ curl -XPOST localhost:9200/largecat3.summary -d '{
  "settings": { "number_of_shards": 1 },
  "mappings": {
    "string": {
      "properties": {
        "attrs": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'

> Everything else is indexed auto-magically!

Starting up the connector

> Keep it running; it will just stream the oplog.

$ mongo-connector -m ec2-54-80-63-229.compute-1.amazonaws.com:27017 \
      -t http://localhost:9200 \
      -d mongo_connector/doc_managers/elastic_doc_manager.py \
      -n "catalog.summary" \
      --auto-commit-interval=60

Here -m points to MongoDB, -t to Elastic Search, -n selects the target summary collection, and --auto-commit-interval=60 commits every minute.

Querying for documents, with facet info, works well:

$ curl -X POST "http://localhost:9200/largecat3.summary/_search?pretty=true" -d '
{
  "query": { "query_string": { "query": "Ipad" } },
  "facets": { "tags": { "terms": { "field": "attrs" } } }
}'

{
  "took" : 6,
  "hits" : {
    "total" : 151,
    "max_score" : 0.5892989,
    "hits" : [ {
      "_index" : "largecat3.summary",
      "_type" : "string",
      "_id" : "000000000000000012730000000000QAU-QR2442P",
      "_score" : 0.5892989,
      "_source" : { ... } // original JSON from MongoDB
    }, ... ]
  },
  "facets" : {
    "tags" : {
      "_type" : "terms",
      "total" : 1577,
      "terms" : [
        { "term" : "ring size=9", "count" : 120 },
        { "term" : "ring size=8", "count" : 120 },
        { "term" : "metal=sterling silver", "count" : 112 },
        ...
      ]
    }
  }
}

Search - Using MongoDB Indexing

How about MongoDB's indexes and full-text search?

The summary contains:

• department e.g. "Shoes"

• Fields to index

– Category path, e.g. "Shoes/Women/Pumps"

– Price

– List of Item Attributes, e.g. Brand = Guess

– List of Variant Attributes, e.g. Color = red

• Fields not to index

– List of Item Secondary Attributes, e.g. Style = Designer

– List of Variant Secondary Attributes, e.g. heel height = 4.0


• Get summary from item id

db.variation.find( { _id: "p301671" } )

• Get summary's specific variation from SKU

db.variation.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )

• Get summary by department, sorted by rating

db.variation.find( { department: "Shoes" } ).sort( { rating: 1 } )

• Get summary with mix of parameters

db.variation.find( {
  department: "Shoes",
  "vars.attrs": { "color": "Gray" },
  category: /^\/Shoes\/Women\//,
  price: { $gte: 65.99, $lte: 180.99 }
} )

• The following indices are used:

  – department + attr + category + _id

  – department + vars.attrs + category + _id

  – department + category + _id

  – department + price + _id

  – department + rating + _id

• _id used for pagination

• Can take advantage of index intersection

• With several attributes specified (e.g. color=red and size=6), which one is looked up?
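Using `_id` as a tie-breaker gives the deterministic ordering that pagination needs. Below is a minimal "seek" pagination sketch in JavaScript (Node.js), assuming a numeric sort field such as price and an index like department + price + _id; `nextPageFilter` and `nextPageInMemory` are hypothetical names. The first builds a MongoDB-style filter from the last document of the previous page; the second applies the same condition to an array so the idea can be checked without a database.

```javascript
// Sketch: seek pagination sorted by one field with _id as tie-breaker.
function nextPageFilter(baseFilter, sortField, lastDoc) {
  return {
    $and: [
      baseFilter,
      {
        $or: [
          { [sortField]: { $gt: lastDoc[sortField] } },
          { [sortField]: lastDoc[sortField], _id: { $gt: lastDoc._id } },
        ],
      },
    ],
  };
}

function compareBy(sortField) {
  return (a, b) =>
    a[sortField] - b[sortField] || (a._id < b._id ? -1 : a._id > b._id ? 1 : 0);
}

function nextPageInMemory(docs, sortField, lastDoc, pageSize) {
  const after = (d) =>
    !lastDoc ||
    d[sortField] > lastDoc[sortField] ||
    (d[sortField] === lastDoc[sortField] && d._id > lastDoc._id);
  return docs.filter(after).sort(compareBy(sortField)).slice(0, pageSize);
}

// Walk all pages of a small hypothetical result set with tied prices.
const docs = [
  { _id: "a", price: 10 }, { _id: "b", price: 10 }, { _id: "c", price: 5 },
  { _id: "d", price: 10 }, { _id: "e", price: 7 },
];
const seen = [];
let page = nextPageInMemory(docs, "price", null, 2);
while (page.length > 0) {
  seen.push(...page.map((d) => d._id));
  page = nextPageInMemory(docs, "price", page[page.length - 1], 2);
}
console.log(seen); // [ 'c', 'e', 'a', 'b', 'd' ]
```

Against MongoDB, the filter would be paired with .sort( { price: 1, _id: 1 } ).limit( pageSize ). Unlike skip(), this walks the index from where the previous page ended.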


Facet samples:

{ "_id" : "Accessory Type=Hosiery" , "count" : 14}

{ "_id" : "Ladder Material=Steel" , "count" : 2}

{ "_id" : "Gold Karat=14k" , "count" : 10138}

{ "_id" : "Stone Color=Clear" , "count" : 1648}

{ "_id" : "Metal=White gold" , "count" : 10852}

A single operation inserts or updates a facet count:

db.facet.update( { _id: "Accessory Type=Hosiery" },
                 { $inc: { count: 1 } }, true, false )

The facet with the lowest count is the most restrictive…

It should come first in the $all query!
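Ordering the requested facets by their stored counts is a small step on the application side. Below is a minimal sketch in JavaScript (Node.js); the `attrs` field name, the helper names, and treating unknown facets as count 0 are assumptions, and `facetCounts` would come from the facet collection (e.g. db.facet.find()). The sample counts are taken from the facet samples above.

```javascript
// Sketch: sort requested facet values by ascending count (most restrictive
// first) before building the $all clause. Unknown facets default to 0.
function orderFacetsByCount(requested, facetCounts) {
  const countOf = (facet) => facetCounts.get(facet) ?? 0;
  return [...requested].sort((a, b) => countOf(a) - countOf(b));
}

function allFacetsFilter(requested, facetCounts) {
  return { attrs: { $all: orderFacetsByCount(requested, facetCounts) } };
}

// Counts taken from the facet samples above.
const facetCounts = new Map([
  ["Gold Karat=14k", 10138],
  ["Stone Color=Clear", 1648],
  ["Metal=White gold", 10852],
]);
const filter = allFacetsFilter(["Metal=White gold", "Gold Karat=14k", "Stone Color=Clear"], facetCounts);
console.log(JSON.stringify(filter));
```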


Search - Comparing Solutions

• Search engine advantages:

  – Index size (~10x smaller than MongoDB's)

  – Indexing speed

  – Read speed, integrated cache

  – Support for all languages

  – Built-in faceted search, including facet counts

• MongoDB indexing advantages:

  – Built into the data store, no additional server / software needed

  – Single query to get the results

  – Can filter down to the matching variant entry and save computing

> Winner here is Elastic Search

Search - Benchmarking

Department  Category  Price  Primary attribute  Average (ms)  90th (ms)  95th (ms)
1           0         0      0                  2             3          3
1           1         0      0                  1             2          2
1           0         1      0                  1             2          3
1           1         1      0                  1             2          2
1           0         0      1                  0             1          2
1           1         0      1                  0             1          1
1           0         1      1                  1             2          2
1           1         1      1                  0             1          1
1           0         0      2                  1             3          3
1           1         0      2                  0             2          2
1           0         1      2                  10            20         35
1           1         1      2                  0             1          1

Closing Comments


Q & A Time

Thank You!
