retail reference architecture
![Page 1: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/1.jpg)
Retail Reference Architecture with MongoDB
Antoine Girbal, Principal Solutions Engineer, MongoDB Inc.
@antoinegirbal
![Page 2: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/2.jpg)
Introduction
![Page 3: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/3.jpg)
4
• Retail is far too broad to tackle with a single solution
• Retail data maps well to the document model
• It needs agility, performance and scaling
• Many (e-)retailers are already using MongoDB
• Let's define the best ways and places to use it!
Retail solution
![Page 4: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/4.jpg)
5
• Holds complex JSON structures
• Dynamic Schema for Agility
• complex querying and in-place updating
• Secondary, compound and geo indexing
• full consistency, durability, atomic operations
• Near linear scaling via sharding
• Overall, MongoDB is a unique fit!
MongoDB is a great fit
![Page 5: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/5.jpg)
6
MongoDB Strategic Advantages
Horizontally Scalable - Sharding
Agile, Flexible
High Performance & Strong Consistency
Application
Highly Available - Replica Sets
{ customer: "roger", date: new Date(), comment: "Spirited Away", tags: ["Tezuka", "Manga"] }
![Page 6: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/6.jpg)
7
build your data to fit your application
Relational

| CustomerID | First Name | Last Name | City |
|---|---|---|---|
| 0 | John | Doe | New York |
| 1 | Mark | Smith | San Francisco |
| 2 | Jay | Black | Newark |
| 3 | Meagan | White | London |
| 4 | Edward | Danields | Boston |

| Order Number | Store ID | Product | Customer ID |
|---|---|---|---|
| 10 | 100 | Tablet | 0 |
| 11 | 101 | Smartphone | 0 |
| 12 | 101 | Dishwasher | 0 |
| 13 | 200 | Sofa | 1 |
| 14 | 200 | Coffee table | 1 |
| 15 | 201 | Suit | 2 |

MongoDB

{ customer_id : 1,
  name : "Mark Smith",
  city : "San Francisco",
  orders: [
    { order_number : 13,
      store_id : 10,
      date: "2014-01-03",
      products: [
        { SKU: 24578234, Qty: 3, Unit_price: 350 },
        { SKU: 98762345, Qty: 1, Unit_price: 110 }
      ]
    },
    { <...> }
  ]
}
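The mapping from two relational tables to one customer document can be sketched in plain JavaScript. This is only an illustration, not code from the talk: `embedOrders` is a hypothetical helper, the camelCase row fields are assumptions, and the rows are a subset of the tables on this slide.

```javascript
// A few rows from the relational tables on this slide.
const customers = [
  { customerId: 0, firstName: "John", lastName: "Doe", city: "New York" },
  { customerId: 1, firstName: "Mark", lastName: "Smith", city: "San Francisco" },
];
const orders = [
  { orderNumber: 10, storeId: 100, product: "Tablet", customerId: 0 },
  { orderNumber: 13, storeId: 200, product: "Sofa", customerId: 1 },
  { orderNumber: 14, storeId: 200, product: "Coffee table", customerId: 1 },
];

// Hypothetical helper: fold a customer's order rows into a single document,
// so that one read returns the customer together with their orders.
function embedOrders(customer, allOrders) {
  return {
    customer_id: customer.customerId,
    name: `${customer.firstName} ${customer.lastName}`,
    city: customer.city,
    orders: allOrders
      .filter((o) => o.customerId === customer.customerId)
      .map((o) => ({
        order_number: o.orderNumber,
        store_id: o.storeId,
        product: o.product,
      })),
  };
}

const markDoc = embedOrders(customers[1], orders);
console.log(JSON.stringify(markDoc, null, 2));
```

The join that a relational query would run at read time is done once, when the document is built.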
![Page 7: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/7.jpg)
8
Notions
| RDBMS | MongoDB |
|---|---|
| Database | Database |
| Table | Collection |
| Row | Document |
| Column | Field |
![Page 8: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/8.jpg)
Retail Components Overview
![Page 9: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/9.jpg)
10
Information Management
Merchandising
Content
Inventory
Customer
Channel
Sales & Fulfillment
Insight
Social
Architecture Overview
Customer
Channels: Amazon, Ebay, …
Stores: POS, Kiosk, …
Mobile: Smartphone, Tablet
Website
Contact Center
API
Data and Service Integration
Social: Facebook, Twitter, …
Data Warehouse
Analytics
Supply Chain Management System
Suppliers
3rd Party
In Network
Web Servers
Application Servers
![Page 10: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/10.jpg)
11
Commerce Functional Components
Information Layer
Look & Feel
Navigation
Customization
Personalization
Branding
Promotions
Chat
Ads
Customer's Perspective
Research: Browse, Search
Select: Shopping Cart
Purchase: Checkout
Receive: Track
Use: Feedback, Maintain
Dialog: Assist
Market / Offer
Guide
Offer
Semantic Search
Recommend
Rule-based Decisions
Pricing
Coupons
Sell / Fulfill
Orders
Payments
Fraud Detection
Fulfillment
Business Rules
Insight: Session Capture, Activity Monitoring
Customer Enterprise
Information Management
Merchandising
Content
Inventory
Customer
Channel
Sales & Fulfillment
Insight
Social
![Page 11: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/11.jpg)
Merchandising
![Page 12: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/12.jpg)
13
Merchandising
Merchandising
MongoDB
Variant
Hierarchy
Pricing
Promotions
Ratings & Reviews
Calendar
Semantic Search
Item
Localization
![Page 13: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/13.jpg)
14
• Single view of a product, one central catalog service
• Read volume high and sustained, 100k reads / s
• Write volume spikes up during catalog update
• Advanced indexing and querying
• Geographical distribution and low latency
• No need for a cache layer, CDN for assets
Merchandising - principles
![Page 14: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/14.jpg)
15
Merchandising - requirements
| Requirement | Example Challenge | MongoDB |
|---|---|---|
| Single view of product | Blended description and hierarchy of product to ensure availability on all channels | Flexible document-oriented storage |
| High sustained read volume with low latency | Constant querying from online users and sales associates, requiring immediate response | Fast indexed querying, replication allows local copy of catalog, sharding for scaling |
| Spiky and real-time write volume | Bulk update of full catalog without impacting production, real-time touch update | Fast in-place updating, real-time indexing, sharding for scaling |
| Advanced querying | Find product based on color, size, description | Ad-hoc querying on any field, advanced secondary and compound indexing |
![Page 15: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/15.jpg)
16
Merchandising - Product Page
Product images
General Information
List of Variants
External Information
Localized Description
![Page 16: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/16.jpg)
17
> db.item.findOne()
{ _id: "301671", // main item id
department: "Shoes",
category: "Shoes/Women/Pumps",
brand: "Guess",
thumbnail: "http://cdn…/pump.jpg",
image: "http://cdn…/pump1.jpg", // larger version of thumbnail
title: "Evening Platform Pumps",
description: "These evening platform pumps put the perfect finishing touches on your most glamorous night-on-the-town outfit",
shortDescription: "Evening Platform Pumps",
style: "Designer",
type: "Platform",
rating: 4.5, // user rating
lastUpdated: new Date("2014-04-01"), // last update time
… }
Merchandising - Item Model
![Page 17: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/17.jpg)
18
• Get item by id
db.item.findOne( { _id: "301671" } )
• Get items from a list of item ids
db.item.find( { _id: { $in: [ "301671", "301672" ] } } )
• Get items by department
db.item.find( { department: "Shoes" } )
• Get items by category prefix
db.item.find( { category: /^Shoes\/Women/ } )
• Indices
department, category, lastUpdated (_id is indexed by default)
Merchandising - Item Definition
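The category prefix query relies on a regular expression anchored at the start of the string, which is the form that lets MongoDB walk an index on `category`. A minimal sketch of building such a filter from user input follows; `escapeRegExp` and `categoryPrefixQuery` are hypothetical helper names, not part of the deck.

```javascript
// Escape regex metacharacters so the category path is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter that matches every category starting with the given path.
// The leading ^ keeps the expression a prefix match.
function categoryPrefixQuery(prefix) {
  return { category: new RegExp("^" + escapeRegExp(prefix)) };
}

const categoryFilter = categoryPrefixQuery("Shoes/Women");
console.log(categoryFilter.category.test("Shoes/Women/Pumps")); // true
console.log(categoryFilter.category.test("Shoes/Men/Boots"));   // false
console.log(categoryFilter.category.test("Kids/Shoes/Women"));  // false

// In the shell the filter would be passed straight to find(), for example:
//   db.item.find(categoryFilter)
```

Appending a trailing "/" to the prefix would avoid also matching sibling categories that merely share the same leading characters.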
![Page 18: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/18.jpg)
19
> db.variant.findOne()
{
_id: "730223104376", // the sku
itemId: "301671", // references item id
thumbnail: "http://cdn…/pump-red.jpg", // variant specific
image: "http://cdn…/pump-red.jpg",
size: 6.0,
color: "Red",
width: "B",
heelHeight: 5.0,
lastUpdated: new Date("2014-04-01"), // last update time
…
}
Merchandising – Variant Model
![Page 19: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/19.jpg)
20
• Get variant from SKU
db.variant.find( { _id: "730223104376" } )
• Get all variants for an item, sorted by SKU
db.variant.find( { itemId: "301671" } ).sort( { _id: 1 } )
• Indices
itemId, lastUpdated
Merchandising – Variant Model
![Page 20: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/20.jpg)
22
Per store Pricing could result in billions of documents,
unless you build it in a modular way
Price: {
_id: "sku730223104376_store123",
currency: "USD",
price: 89.95,
lastUpdated: new Date("2014-04-01"), // last update time
…
}
_id: concatenation of item and store.
Item: can be an item id or sku
Store: can be a store group or store id.
Indices: lastUpdated
Merchandising – per store Pricing
![Page 21: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/21.jpg)
23
• Get all prices for a given item
db.prices.find( { _id: /^p301671_/ } )
• Get all prices for a given sku (price could be at item level)
db.prices.find( { _id: { $in: [ /^sku730223104376_/, /^p301671_/ ] } } )
• Get minimum and maximum prices for a sku
db.prices.aggregate( [ { $match: { _id: /^sku730223104376_/ } },
{ $group: { _id: 1, min: { $min: "$price" }, max: { $max: "$price" } } } ] )
• Get price for a sku and store id (returns up to 4 prices)
db.prices.find( { _id: { $in: [ "sku730223104376_store1234",
"sku730223104376_sgroup0",
"p301671_store1234",
"p301671_sgroup0" ] } }, { price: 1 } )
Merchandising – per store Pricing
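The last query can return up to four price documents, so the application still has to choose one. A possible sketch of that step is shown below. The precedence order (SKU before item, store before store group) is an assumption, as are the helper names `priceCandidateIds` and `pickPrice` and the 99.95 sample price.

```javascript
// Candidate _id values, most specific first (assumed precedence).
function priceCandidateIds(sku, itemId, storeId, storeGroup) {
  return [
    `sku${sku}_${storeId}`,
    `sku${sku}_${storeGroup}`,
    `p${itemId}_${storeId}`,
    `p${itemId}_${storeGroup}`,
  ];
}

// Return the first candidate that exists among the documents found.
function pickPrice(candidateIds, priceDocs) {
  const byId = new Map(priceDocs.map((doc) => [doc._id, doc]));
  for (const id of candidateIds) {
    if (byId.has(id)) return byId.get(id);
  }
  return null; // no price defined at any level
}

const ids = priceCandidateIds("730223104376", "301671", "store1234", "sgroup0");
// In the shell: db.prices.find({ _id: { $in: ids } }, { price: 1 })
// Made-up result set standing in for what the query returned:
const found = [
  { _id: "p301671_sgroup0", price: 99.95 },
  { _id: "sku730223104376_sgroup0", price: 89.95 },
];
const best = pickPrice(ids, found);
console.log(best._id, best.price);
```

Because only overrides need to be stored, most SKUs can fall back to an item-level or group-level price, which is what keeps the collection far below one document per SKU per store.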
![Page 22: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/22.jpg)
26
Merchandising – Browse and Search products
Browse by category
Special Lists
Filter by attributes
Lists hundreds of item summaries
Ideally a single query is issued to the database to obtain all items and metadata to display
![Page 23: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/23.jpg)
27
The previous page presents many challenges:
• Response within milliseconds for hundreds of items
• Faceted search on many attributes: category, brand, …
• Attributes at the variant level: color, size, etc, and the variation's image should be shown
• thousands of variants for an item, need to de-duplicate
• Efficient sorting on several attributes: price, popularity
• Pagination feature which requires deterministic ordering
Merchandising – Browse and Search products
![Page 24: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/24.jpg)
28
Merchandising – Browse and Search products
Hundreds of sizes
One Item
Dozens of colors
A single item may have thousands of variants
![Page 25: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/25.jpg)
29
Merchandising – Browse and Search products
Images of the matching variants are displayed
Hierarchy
Sort parameter
Faceted Search
![Page 26: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/26.jpg)
30
Merchandising – Traditional Architecture
Relational DB (System of Records)
Full Text Search Engine
Indexing
#1 obtain search results IDs
Application
Cache
#2 obtain objects by ID
Pre-joined into objects
![Page 27: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/27.jpg)
31
The traditional architecture issues:
• 3 different systems to maintain: RDBMS, Search engine, Caching layer
• search returns a list of IDs to be looked up in the cache, increases latency of response
• RDBMS schema is complex and static
• The search index is expensive to update
• Setup does not allow efficient pagination
Merchandising – Traditional Architecture
![Page 28: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/28.jpg)
32
MongoDB Data Store
Merchandising - Architecture
Summaries, Items, Pricing
Promotions, Variants, Ratings & Reviews
#1 Obtain results
![Page 29: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/29.jpg)
33
The summary relies on the following parameters:
• department e.g. "Shoes"
• An indexed attribute
– Category path, e.g. "Shoes/Women/Pumps"
– Price range
– List of Item Attributes, e.g. Brand = Guess
– List of Variant Attributes, e.g. Color = red
• A non-indexed attribute
– List of Item Secondary Attributes, e.g. Style = Designer
– List of Variant Secondary Attributes, e.g. heel height = 4.0
• Sorting, e.g. Price Low to High
Merchandising – Summary Model
![Page 30: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/30.jpg)
34
> db.summaries.findOne()
{ "_id": "p39",
"title": "Evening Platform Pumps 39",
"department": "Shoes", "category": "Shoes/Women/Pumps",
"thumbnail": "http://cdn…/pump-small-39.jpg", "image": "http://cdn…/pump-39.jpg",
"price": 145.99,
"rating": 0.95,
"attrs": [ { "brand" : "Guess"}, … ],
"sattrs": [ { "style" : "Designer"} , { "type" : "Platform"}, …],
"vars": [
{ "sku": "sku2441",
"thumbnail": "http://cdn…/pump-small-39.jpg.Blue",
"image": "http://cdn…/pump-39.jpg.Blue",
"attrs": [ { "size": 6.0 }, { "color": "Blue" }, …],
"sattrs": [ { "width" : "B"} , { "heelHeight" : 5.0 }, …],
}, … Many more skus …
] }
Merchandising – Summary Model
![Page 31: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/31.jpg)
35
• Get summary from item id
db.summaries.find( { _id: "p301671" } )
• Get summary's specific variation from SKU
db.summaries.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )
• Get summary by department, sorted by rating
db.summaries.find( { department: "Shoes" } ).sort( { rating: 1 } )
• Get summary with mix of parameters
db.summaries.find( { "department" : "Shoes",
"vars.attrs" : { "color" : "Gray" },
"category" : /^Shoes\/Women/,
"price" : { "$gte" : 65.99, "$lte" : 180.99 } } )
Merchandising - Summary Model
![Page 32: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/32.jpg)
36
Merchandising – Summary Model
• The following indices are used:
– department + attrs + category + _id
– department + vars.attrs + category + _id
– department + category + _id
– department + price + _id
– department + rating + _id
• _id used for pagination
• Can take advantage of index intersection
• With several attributes specified (e.g. color=red and size=6), which one is looked up?
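Ending each index with `_id` gives every result list a deterministic order, which range-based pagination can exploit. The sketch below shows one common way to do that; it is an assumption about how paging could be implemented, not the deck's own code, and `nextPageFilter` is a hypothetical helper.

```javascript
// Build the filter for the page that follows `last` (the final document of
// the previous page), assuming results are sorted by { [sortField]: 1, _id: 1 }.
function nextPageFilter(baseFilter, sortField, last) {
  if (!last) return { ...baseFilter }; // first page: nothing to skip yet
  return {
    ...baseFilter,
    $or: [
      // strictly after the last sort value...
      { [sortField]: { $gt: last[sortField] } },
      // ...or same sort value, with _id breaking the tie
      { [sortField]: last[sortField], _id: { $gt: last._id } },
    ],
  };
}

const base = { department: "Shoes" };
const firstPage = nextPageFilter(base, "price", null);
const secondPage = nextPageFilter(base, "price", { _id: "p39", price: 145.99 });
console.log(JSON.stringify(secondPage));

// In the shell (sketch):
//   db.summaries.find(secondPage).sort({ price: 1, _id: 1 }).limit(100)
```

Unlike `skip()`, this approach does not rescan the documents of earlier pages, so deep pages cost about the same as the first one.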
![Page 33: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/33.jpg)
37
Facet samples:
{ "_id" : "Accessory Type=Hosiery" , "count" : 14}
{ "_id" : "Ladder Material=Steel" , "count" : 2}
{ "_id" : "Gold Karat=14k" , "count" : 10138}
{ "_id" : "Stone Color=Clear" , "count" : 1648}
{ "_id" : "Metal=White gold" , "count" : 10852}
Single operation to insert / update (upsert):
db.facet.update( { _id: "Accessory Type=Hosiery" },
{ $inc: { count: 1 } }, true, false )
The facet with lowest count is the most restrictive…
It should come first in the query!
Merchandising – Facet
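Choosing which attribute to look up first can be sketched as a sort over the facet counts. The counts below are the samples from this slide; `orderBySelectivity` is a hypothetical helper, and treating unknown facets as least selective is an assumption.

```javascript
// Facet counts as maintained in the facet collection (samples from the slide).
const facetCounts = new Map([
  ["Gold Karat=14k", 10138],
  ["Stone Color=Clear", 1648],
  ["Metal=White gold", 10852],
]);

// Order the facets a user selected so the most restrictive one comes first.
// Facets with no recorded count are assumed to be the least selective.
function orderBySelectivity(selectedFacets, counts) {
  const countOf = (facet) =>
    counts.has(facet) ? counts.get(facet) : Number.MAX_SAFE_INTEGER;
  return [...selectedFacets].sort((a, b) => countOf(a) - countOf(b));
}

const ordered = orderBySelectivity(
  ["Metal=White gold", "Gold Karat=14k", "Stone Color=Clear"],
  facetCounts
);
console.log(ordered);
```

The first entry of `ordered` is the attribute that narrows the result set the most, so it is the natural candidate for the indexed part of the query.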
![Page 34: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/34.jpg)
38
Merchandising – Query stats
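| Department | Category | Price | Primary attribute | Average (ms) | 90th (ms) | 95th (ms) |
|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 2 | 3 | 3 |
| 1 | 1 | 0 | 0 | 1 | 2 | 2 |
| 1 | 0 | 1 | 0 | 1 | 2 | 3 |
| 1 | 1 | 1 | 0 | 1 | 2 | 2 |
| 1 | 0 | 0 | 1 | 0 | 1 | 2 |
| 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| 1 | 0 | 1 | 1 | 1 | 2 | 2 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| 1 | 0 | 0 | 2 | 1 | 3 | 3 |
| 1 | 1 | 0 | 2 | 0 | 2 | 2 |
| 1 | 0 | 1 | 2 | 10 | 20 | 35 |
| 1 | 1 | 1 | 2 | 0 | 1 | 1 |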
![Page 35: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/35.jpg)
Inventory
![Page 36: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/36.jpg)
42
Inventory – Traditional Architecture
Relational DB (System of Records)
Nightly Batches
Analytics, Aggregations, Reports
Caching Layer
Field Inventory
Internal & External Apps
Point-in-time Loads
![Page 37: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/37.jpg)
43
Opportunities Missed
• Can’t reliably detect availability
• Can't redirect purchasers to in-store pickup
• Can’t do intra-day replenishment
• Degraded customer experience
• Higher internal expense
![Page 38: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/38.jpg)
44
Inventory – Principles
• Single view of the inventory
• Used by most services and channels
• Read dominated workload
• Local, real-time writes
• Bulk writes for refresh
• Geographically distributed
• Horizontally scalable
![Page 39: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/39.jpg)
45
Inventory – Requirements
| Requirement | Challenge | MongoDB |
|---|---|---|
| Single view of inventory | Ensure availability of inventory information on all channels and services | Developer-friendly, document-oriented storage |
| High volume, low latency reads | Anytime, anywhere access to inventory data without overloading the system of record | Fast, indexed reads; local reads; horizontal scaling |
| Bulk updates, intra-day deltas | Provide window-in-time consistency for highly available services | Bulk writes; fast, in-place updates; horizontal scaling |
| Rapid application development cycles | Deliver new services rapidly to capture new opportunities | Flexible schema; rich query language; agile-friendly iterations |
![Page 40: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/40.jpg)
46
Inventory – Target Architecture
Relational DB (System of Records)
Analytics, Aggregations, Reports
Field Inventory
Internal & External Apps
Inventory
Assortments
Shipments
Audits
Products
Stores
Point-in-time Loads
Nightly Refresh
Real-time Updates
![Page 41: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/41.jpg)
47
Horizontal Scaling
Inventory – Technical Decisions
Store
Inventory
Schema
Indexing
![Page 42: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/42.jpg)
48
Inventory – Collections
Stores, Inventory, Products
Audits, Assortments, Shipments
![Page 43: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/43.jpg)
49
Stores – Sample Document
> db.stores.findOne()
{
  "_id" : ObjectId("53549fd3e4b0aaf5d6d07f35"),
  "className" : "catalog.Store",
  "storeId" : "store0",
  "name" : "Bessemer store",
  "address" : {
    "addr1" : "1st Main St",
    "city" : "Bessemer",
    "state" : "AL",
    "zip" : "12345",
    "country" : "US"
  },
  "location" : [ -86.95444, 33.40178 ],
  ...
}
![Page 44: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/44.jpg)
50
Stores – Sample Queries
• Get a store by storeId
db.stores.find({ "storeId" : "store0" })
• Get a store by zip code
db.stores.find({ "address.zip" : "12345" })
![Page 45: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/45.jpg)
51
What’s near me?
![Page 46: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/46.jpg)
52
Stores – Sample Geo Queries
• Get nearby stores sorted by distance
db.runCommand({ geoNear : "stores", near : { type : "Point", coordinates : [-82.8006, 40.0908] }, maxDistance : 10000.0, spherical : true })
![Page 47: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/47.jpg)
53
Stores – Sample Geo Queries
• Get the five nearest stores within 10 km
db.stores.find({ location : { $near : { $geometry : { type : "Point", coordinates : [-82.80, 40.09] }, $maxDistance : 10000 } } }).limit(5)
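The `$near` filter above can be built from a customer's position and a search radius. The following is a small sketch; `nearStoresQuery` is a hypothetical helper, and it assumes the `location` field carries a `2dsphere` index so that `$maxDistance` is expressed in meters.

```javascript
// Build a $near filter around a GeoJSON point.
// Note the GeoJSON coordinate order: longitude first, then latitude.
function nearStoresQuery(longitude, latitude, radiusKm) {
  return {
    location: {
      $near: {
        $geometry: { type: "Point", coordinates: [longitude, latitude] },
        $maxDistance: radiusKm * 1000, // meters
      },
    },
  };
}

const nearby = nearStoresQuery(-82.8, 40.09, 10);
console.log(JSON.stringify(nearby));

// In the shell (sketch): db.stores.find(nearby).limit(5)
```

Results of `$near` come back sorted from nearest to farthest, so `limit(5)` yields the five closest stores within the radius.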
![Page 48: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/48.jpg)
54
Stores – Indices
• { "storeId" : 1 }
• { "name" : 1 }
• { "address.zip" : 1 }
• { "location" : "2dsphere" }
![Page 49: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/49.jpg)
55
Inventory – Sample Document
> db.inventory.findOne()
{
  "_id": "5354869f300487d20b2b011d",
  "storeId": "store0",
  "location": [-86.95444, 33.40178],
  "productId": "p0",
  "vars": [
    { "sku": "sku1", "q": 14 },
    { "sku": "sku3", "q": 7 },
    { "sku": "sku7", "q": 32 },
    { "sku": "sku14", "q": 65 },
    ...
  ]
}
![Page 50: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/50.jpg)
56
Inventory – Sample Queries
• Get all items in a store
db.inventory.find({ storeId : "store100" })
• Get quantity for an item at a store
db.inventory.find({ "storeId" : "store100", "productId" : "p200" })
![Page 51: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/51.jpg)
57
Inventory – Sample Queries
• Get quantity for a sku at a store
db.inventory.find( { "storeId" : "store100", "productId" : "p200", "vars.sku" : "sku11736" }, { "vars.$" : 1 } )
![Page 52: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/52.jpg)
58
Inventory – Sample Update
• Increment / decrement inventory for an item at a store
db.inventory.update( { "storeId" : "store100", "productId" : "p200", "vars.sku" : "sku11736" }, { "$inc" : { "vars.$.q" : 20 } } )
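When the increment is negative (a sale or a reservation), the same update can be guarded so that stock never drops below zero. This is a sketch of one way to do that, not something shown in the deck: `reserveStock` is a hypothetical helper that moves the quantity check into the filter with `$elemMatch`.

```javascript
// Build a filter/update pair that decrements a SKU's quantity only if
// enough stock is left. The positional operator ($) in the update refers
// to the array element matched by $elemMatch in the filter.
function reserveStock(storeId, productId, sku, quantity) {
  return {
    filter: {
      storeId: storeId,
      productId: productId,
      vars: { $elemMatch: { sku: sku, q: { $gte: quantity } } },
    },
    update: { $inc: { "vars.$.q": -quantity } },
  };
}

const op = reserveStock("store100", "p200", "sku11736", 2);
console.log(JSON.stringify(op, null, 2));

// In the shell (sketch): db.inventory.update(op.filter, op.update)
// If no document is modified, the store did not have enough stock.
```

Because the check and the decrement happen in a single document update, two concurrent buyers cannot both take the last unit.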
![Page 53: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/53.jpg)
59
Inventory – Sample Aggregations
• Aggregate total quantity for a product
db.inventory.aggregate( [ { $match : { productId : "p200" } }, { $unwind : "$vars" }, { $group : { _id : "result", count : { $sum : "$vars.q" } } } ] )
{ "_id" : "result", "count" : 101752 }
![Page 54: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/54.jpg)
60
Inventory – Sample Aggregations
• Count variants in stock (quantity > 0) for a store
db.inventory.aggregate( [ { $match : { storeId : "store100" } }, { $unwind : "$vars" }, { $match : { "vars.q" : { $gt : 0 } } }, { $group : { _id : "result", count : { $sum : 1 } } } ] )
{ "_id" : "result", "count" : 29347 }
![Page 55: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/55.jpg)
61
Inventory – Sample Aggregations
• Aggregate total quantity for a store
db.inventory.aggregate( [ { $match : { storeId : "store100" } }, { $unwind : "$vars" }, { $group : { _id : "result", count : { $sum : "$vars.q" } } } ] )
{ "_id" : "result", "count" : 29347 }
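To make the `$unwind` and `$group` stages concrete, the same computations can be reproduced in plain JavaScript over a couple of in-memory documents. The first document reuses the quantities from the sample inventory document; the second one (`store1`) is made up for illustration, and `unwindVars` is a hypothetical stand-in for `$unwind`.

```javascript
const inventoryDocs = [
  { storeId: "store0", productId: "p0",
    vars: [ { sku: "sku1", q: 14 }, { sku: "sku3", q: 7 },
            { sku: "sku7", q: 32 }, { sku: "sku14", q: 65 } ] },
  { storeId: "store1", productId: "p0",           // made-up second store
    vars: [ { sku: "sku1", q: 0 }, { sku: "sku3", q: 5 } ] },
];

// Equivalent of { $unwind: "$vars" }: one output document per array element.
function unwindVars(docs) {
  return docs.flatMap((doc) => doc.vars.map((v) => ({ ...doc, vars: v })));
}

// $match on productId, $unwind, then $group with { $sum: "$vars.q" }.
const totalForProduct = unwindVars(
  inventoryDocs.filter((doc) => doc.productId === "p0")
).reduce((sum, doc) => sum + doc.vars.q, 0);

// $match on storeId, $unwind, $match q > 0, then $group with { $sum: 1 }.
const inStockAtStore1 = unwindVars(
  inventoryDocs.filter((doc) => doc.storeId === "store1")
).filter((doc) => doc.vars.q > 0).length;

console.log(totalForProduct); // 123
console.log(inStockAtStore1); // 1
```

The difference between the two `$group` accumulators is visible here: summing `vars.q` adds up units, while summing `1` counts variants.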
![Page 56: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/56.jpg)
63
![Page 57: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/57.jpg)
64
Inventory – Sample Geo-Query
• Get inventory for an item near a point
db.runCommand( { geoNear : "inventory", near : { type : "Point", coordinates : [-82.8006, 40.0908] }, maxDistance : 10000.0, spherical : true, limit : 10, query : { "productId" : "p200", "vars.sku" : "sku11736" } } )
![Page 58: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/58.jpg)
65
Inventory – Sample Geo-Query
• Get closest store with available sku
db.runCommand( { geoNear : "inventory", near : { type : "Point", coordinates : [-82.800672, 40.090844] }, maxDistance : 10000.0, spherical : true, limit : 1, query : { productId : "p200", vars : { $elemMatch : { sku : "sku11736", q : { $gt : 0 } } } } } )
![Page 59: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/59.jpg)
66
Inventory – Sample Geo-Aggregation
• Get count of inventory for an item near a point
db.inventory.aggregate( [
  { $geoNear: {
      near : { type : "Point", coordinates : [-82.800672, 40.090844] },
      distanceField : "distance",
      maxDistance : 10000.0,
      spherical : true,
      query : { productId : "p200", vars : { $elemMatch : { sku : "sku11736", q : { $gt : 0 } } } },
      includeLocs : "dist.location",
      num : 5 } },
  { $unwind : "$vars" },
  { $match : { "vars.sku" : "sku11736" } },
  { $group : { _id : "result", count : { $sum : "$vars.q" } } }
] )
![Page 60: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/60.jpg)
67
Inventory – Sample Indices
• { storeId : 1 }
• { productId : 1, storeId : 1 }
• Why not "vars.sku"?
– { productId : 1, storeId : 1, "vars.sku" : 1 }
• { productId : 1, location : "2dsphere" }
![Page 61: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/61.jpg)
68
Horizontal Scaling
Inventory – Technical Decisions
Store
Inventory
Schema
Indexing
![Page 62: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/62.jpg)
69
Inventory – Sharding Topology
Data centers: West DC, Central DC, East DC
Shards: Shard West, Shard Central, Shard East (one Primary per shard)
Legacy Inventory
![Page 63: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/63.jpg)
70
Inventory – Shard Key
• Choose shard key
– { productId : 1, storeId : 1 }
• Set up sharding
– sh.enableSharding("inventoryDB")
– sh.shardCollection( "inventoryDB.inventory", { productId : 1, storeId : 1 } )
![Page 64: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/64.jpg)
71
Inventory – Shard Tags
• Set up shard tags
– sh.addShardTag("shard0000", "west")
– sh.addShardTag("shard0001", "central")
– sh.addShardTag("shard0002", "east")
• Set up tag ranges
– Add new field: region
– sh.addTagRange("inventoryDB.inventory", { region : 0 }, { region : 100 }, "west" )
– sh.addTagRange("inventoryDB.inventory", { region : 100 }, { region : 200 }, "central" )
– sh.addTagRange("inventoryDB.inventory", { region : 200 }, { region : 300 }, "east" )
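How a `region` value maps to a tag can be checked with a few lines of JavaScript. The ranges are the ones from this slide; `tagForRegion` is a hypothetical helper that mirrors the tag-range semantics, where the lower bound is inclusive and the upper bound is exclusive.

```javascript
// Tag ranges from the slide: [min, max) per tag.
const tagRanges = [
  { min: 0, max: 100, tag: "west" },
  { min: 100, max: 200, tag: "central" },
  { min: 200, max: 300, tag: "east" },
];

// Return the tag whose range contains the region value, or null if the
// value falls outside every range (such chunks can live on any shard).
function tagForRegion(region) {
  const range = tagRanges.find((r) => region >= r.min && region < r.max);
  return range ? range.tag : null;
}

console.log(tagForRegion(0));   // west
console.log(tagForRegion(100)); // central
console.log(tagForRegion(299)); // east
console.log(tagForRegion(300)); // null
```

A store's region number therefore decides which data center holds its inventory documents, which is what keeps writes local.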
![Page 65: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/65.jpg)
Insight
![Page 66: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/66.jpg)
87
Insight
Insight
MongoDB
Advertising metrics
Clickstream
Recommendations
Session Capture
Activity Logging
Geo Tracking
Product Analytics
Customer Insight
Application Logs
![Page 67: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/67.jpg)
88
Many user activities can be of interest:
• Search
• Product view, like or wish
• Shopping cart add / remove
• Sharing on social network
• Ad impression, Clickstream
Activity Logging – Data of interest
![Page 68: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/68.jpg)
89
Will be used to compute:
• Product Map (relationships, etc)
• User Preferences
• Recommendations
• Trends …
Activity Logging – Data of interest
![Page 69: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/69.jpg)
90
Activity logging - Architecture
MongoDB
HVDF API
Activity Logging
User History
External Analytics: Hadoop, Spark, Storm, …
User Preferences
Recommendations
Trends
Product Map
Apps
Internal Analytics: Aggregation, MR
All user activity is recorded
MongoDB – Hadoop Connector
Personalization
![Page 70: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/70.jpg)
91
Activity Logging
![Page 71: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/71.jpg)
92
• Store and manage an incoming stream of data samples
– High arrival rate of data from many sources
– Variable schema of arriving data
– Control retention period of data
• Compute derivative data sets based on these samples
– Aggregations and statistics based on data
– Roll-up data into pre-computed reports and summaries
• Low latency access to up-to-date data (user history)
– Flexible indexing of raw and derived data sets
– Rich querying based on time + meta-data fields in samples
Activity Logging – Problem statement
![Page 72: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/72.jpg)
93
Activity logging - Requirements
| Requirement | MongoDB |
|---|---|
| Ingestion of 100ks of writes / sec | Fast C++ process, multi-threads, multi-locks. Horizontal scaling via sharding. Sequential IO via time partitioning. |
| Flexible schema | Dynamic schema, each document is independent. Data is stored in the same format and size as it is inserted. |
| Fast querying on varied fields, sorting | Secondary B-tree indexes can look up and sort the data in milliseconds. |
| Easy clean up of old data | Deletes are typically as expensive as inserts. Time partitioning makes deletes nearly free. |
![Page 73: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/73.jpg)
94
Activity Logging using HVDF
HVDF (High Volume Data Feed):
• Open source reference implementation of high volume writing with MongoDB https://github.com/10gen-labs/hvdf
• REST API server written in Java with the most popular libraries
• Public project, issues can be logged https://jira.mongodb.org/browse/HVDF
• Can be run as-is, or customized as needed
![Page 74: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/74.jpg)
95
Feed
High volume data feed architecture
Channel
Sample Sample Sample Sample
Source
Source
Processor
Inline Processing
Batch Processing
Stream Processing
Grouping by Feed and Channel
Sources send samples
Processors generate derivative Channels
![Page 75: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/75.jpg)
96
HVDF -- High Volume Data Feed engine
HVDF – Reference implementation
REST Service API
Processor Plugins
Inline
Batch
Stream
Channel Data Storage
Raw Channel Data
Aggregated Rollup T1
Aggregated Rollup T2
Query Processor Streaming spout
Custom Stream Processing Logic
Incoming Sample Stream
POST /feed/channel/data
GET /feed/channel/data?time=XXX&range=YYY
Real-time Queries
![Page 76: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/76.jpg)
97
{ _id: ObjectId(),
geoCode: 1, // used to localize write operations
sessionId: "2373BB…",
device: { id: "1234",
type: "mobile/iphone",
userAgent: "Chrome/34.0.1847.131"
},
userId: "u123",
type: "VIEW|CART_ADD|CART_REMOVE|ORDER|…", // type of activity
itemId: "301671",
sku: "730223104376",
order: { id: "12520185",
… },
location: [ -86.95444, 33.40178 ],
tags: [ "smartphone", "iphone", … ], // associated tags
timeStamp: Date("2014/04/01 …")
}
User Activity - Model
![Page 77: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/77.jpg)
98
Dynamic schema for sample data
Sample 1: { deviceId: XXXX, time: Date(…), type: "VIEW", … }
Channel
Sample 2: { deviceId: XXXX, time: Date(…), type: "CART_ADD", cartId: 123, … }
Sample 3: { deviceId: XXXX, time: Date(…), type: "FB_LIKE" }
Each sample can have variable fields
![Page 78: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/78.jpg)
99
Channels are sharded
Shard
Shard
Shard
Shard
Shard
Shard Key: customer_id
Sample: { customer_id: XXXX, time: Date(…), type: "VIEW" }
Channel
You choose how to partition samples
Samples can have dynamic schema
Scale horizontally by adding shards
Each shard is highly available
![Page 79: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/79.jpg)
100
Channels are time partitioned
Channel
Sample Sample Sample Sample Sample Sample Sample Sample
-2 days, -1 day, today
Partition 1, Partition 2, Partition N
Each partition is a separate collection
Partitioning keeps indexes manageable
The current partition is where all of the writes happen
Older partitions are read-only for best possible concurrency
Queries are routed only to the needed partitions
Efficient, space-reclaiming purging of old data
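A minimal sketch of the time-partitioning idea above, in plain JavaScript. The naming scheme (one collection per UTC day, such as "activity_20140401") and the function names are assumptions made for illustration; they are not taken from HVDF. It shows how a time-bound query maps to a small set of partitions, and how old data can be purged by dropping whole collections.

```javascript
// Hypothetical scheme: one collection per UTC day, e.g. "activity_20140401".
const DAY_MS = 24 * 60 * 60 * 1000;

// Name of the partition (collection) that holds samples for a given date.
function partitionName(channel, date) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${channel}_${y}${m}${d}`;
}

// Partitions that a query bounded by [from, to] has to touch.
function partitionsForRange(channel, from, to) {
  const names = [];
  // Start at midnight UTC of the first day in the range.
  let cursor = Math.floor(from.getTime() / DAY_MS) * DAY_MS;
  for (; cursor <= to.getTime(); cursor += DAY_MS) {
    names.push(partitionName(channel, new Date(cursor)));
  }
  return names;
}

// Partitions older than the retention window; each can be dropped whole.
// Zero-padded dates make the names sort chronologically as strings.
function expiredPartitions(existing, channel, now, retentionDays) {
  const oldestKept = partitionName(
    channel,
    new Date(now.getTime() - retentionDays * DAY_MS)
  );
  return existing.filter((name) => name < oldestKept);
}
```

Dropping a collection releases its storage in one operation, which is the reason purging by partition is cheaper than deleting documents one by one.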
![Page 80: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/80.jpg)
101
Dynamic queries on Channels
Channel
Sample Sample Sample Sample
App App App
Indexes
Queries Pipelines Map-Reduce
Create custom indexes on Channels
Use the full MongoDB query language to access samples
Use MongoDB aggregation pipelines to access samples
Use MongoDB inline map-reduce to access samples
Full access to field, text, and geo indexing
![Page 81: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/81.jpg)
102
North America - West
North America - East
Europe
Geographically distributed system
Channel
Sample Sample Sample Sample
Source
Source
Source
Source
Source
Source
Sample
Sample
Sample
Sample
Geo shards per location
Clients write to local nodes
Single view of channel available globally
![Page 82: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/82.jpg)
103
Insight
![Page 83: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/83.jpg)
104
Insight – Useful Data
Useful data for better shopping:
• User history (e.g. recently seen products)
• User statistics (e.g. total purchases, visits)
• User interests (e.g. likes videogames and SciFi)
• User social network
![Page 84: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/84.jpg)
105
Insight – Useful Data
Useful data for selling more:
• Cross-selling: people who bought this item tended to also buy these other items (e.g. bought an iPhone, then an iPhone case)
• Up-selling: people who looked at this item eventually bought these other items (an alternative product that may be better)
![Page 85: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/85.jpg)
106
• Get the recent activity for a user, to populate the "recently viewed" list
db.activities.find({ userId: "u123", time: { $gt: DATE }}).
sort({ time: -1 }).limit(100)
• Get the recent activity for a product, to populate the "N users bought this in the past N hours" list
db.activities.find({ itemId: "301671", time: { $gt: DATE }}).
sort({ time: -1 }).limit(100)
• Indices: time, userId + time, deviceId + time, itemId + time
• All queries should be time bound, since this is a lot of data!
Insight – User History
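The DATE placeholder in the queries above has to be computed by the application. The following is a small sketch of one way to do that; the helper name recentActivityQuery and the default limit of 100 are illustrative assumptions, and the field name time follows the queries on this slide.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Build the arguments for a time-bound lookup on one indexed field.
// `now` can be injected so that the cutoff is reproducible.
function recentActivityQuery(field, value, hours, now = new Date()) {
  const cutoff = new Date(now.getTime() - hours * HOUR_MS);
  return {
    filter: { [field]: value, time: { $gt: cutoff } }, // always time bound
    sort: { time: -1 }, // most recent first
    limit: 100,
  };
}

// Usage in the mongo shell (needs a running MongoDB, so it is not run here):
//   const q = recentActivityQuery("userId", "u123", 24);
//   db.activities.find(q.filter).sort(q.sort).limit(q.limit);
```

Because the filter is an equality on one field plus a range on time, it lines up with the compound indexes listed above (userId + time, itemId + time).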
![Page 86: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/86.jpg)
107
• Get the recent number of views, purchases, etc. for a user
db.activities.aggregate([
{ $match: { userId: "u123", time: { $gt: DATE } } },
{ $group: { _id: "$type", count: { $sum: 1 } } }
])
• Get the total recent sales for a user
db.activities.aggregate([
{ $match: { userId: "u123", time: { $gt: DATE }, type: "ORDER" } },
{ $group: { _id: "result", count: { $sum: "$totalPrice" } } }
])
• Get the recent number of views, purchases, etc. for an item
db.activities.aggregate([
{ $match: { itemId: "301671", time: { $gt: DATE } } },
{ $group: { _id: "$type", count: { $sum: 1 } } }
])
• Those aggregations are very fast, real-time
Insight – User Stats
![Page 87: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/87.jpg)
108
• Number of activities for unique visitors for the past hour. Calculation of uniques is hard for any system!
db.activities.aggregate([
{ $match: { time: { $gt: NOW-1H } } },
{ $group: { _id: "$userId", count: { $sum: 1 } } }
], { allowDiskUse: true })
• The aggregation above can have issues (single-shard final grouping, result not persisted). Map-Reduce is a better alternative here
var map = function() { emit(this.userId, 1); }
var reduce = function(key, values) { return Array.sum(values); }
db.activities.mapReduce(map, reduce,
{ query: { time: { $gt: NOW-1H } }, out: { replace: "lastHourUniques", sharded: true } })
db.lastHourUniques.find({ _id: "u123" }) // number of activities for a user
db.lastHourUniques.count() // total uniques
Insight – User Stats
![Page 88: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/88.jpg)
109
User Activity – Items bought together
Time to cross-sell!
![Page 89: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/89.jpg)
110
Let's simplify each activity recorded as the following:
{ userId: "u123", type: "order", itemId: 2, time: DATE }
{ userId: "u123", type: "order", itemId: 3, time: DATE }
{ userId: "u234", type: "order", itemId: 7, time: DATE }
Calculate items bought by a user with Map Reduce:
- Match activities of type "order" for the past 2 weeks
- map: emit the document by userId
- reduce: push all itemId in a list
- Output looks like { _id: "u123", items: [2, 3, 8] }
User Activity – Items bought together
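The first job can be sketched in plain JavaScript with a tiny in-memory stand-in for map-reduce. The runMapReduce helper is an assumption written for this example: unlike MongoDB, it passes the document and emit as arguments instead of using this and a global emit. The map emits the same shape that reduce returns, so partial results can be reduced again safely. MongoDB writes results as { _id, value }, which the slide shortens to { _id, items }.

```javascript
// Tiny in-memory stand-in for map-reduce: group emitted values by key,
// then call reduce only for keys that received more than one value.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Simplified activities, as on the slide (time omitted for brevity).
const activities = [
  { userId: "u123", type: "order", itemId: 2 },
  { userId: "u123", type: "order", itemId: 3 },
  { userId: "u234", type: "order", itemId: 7 },
  { userId: "u234", type: "view", itemId: 9 },
];

// map: emit one single-item list per order, keyed by user.
const map = (doc, emit) => emit(doc.userId, { items: [doc.itemId] });
// reduce: concatenate the lists; output has the same shape as the input values.
const reduce = (key, values) => ({ items: values.flatMap((v) => v.items) });

const itemsByUser = runMapReduce(
  activities.filter((a) => a.type === "order"), // the "query" step
  map,
  reduce
);
```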
![Page 90: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/90.jpg)
111
Then run a 2nd mapreduce job from the previous output to compute the number of occurrences of each item combination:
- query: go over all documents (1 document per userId)
- map: emit every combination of 2 items, starting with lowest itemId
- reduce: sum up the total.
- output looks like { _id: { a: 2, b: 3 } , count: 36 }
User Activity – Items bought together
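A minimal sketch of the second job's logic in plain JavaScript, run in memory rather than as a MongoDB map-reduce job. The function names are illustrative, and the numeric sort assumes numeric itemIds as in the simplified documents. Putting the lowest itemId first means that (2, 3) and (3, 2) are counted as the same combination.

```javascript
// Every unordered pair of distinct items, with the lower itemId in `a`.
function itemPairs(items) {
  const sorted = [...new Set(items)].sort((x, y) => x - y);
  const pairs = [];
  for (let i = 0; i < sorted.length; i++) {
    for (let j = i + 1; j < sorted.length; j++) {
      pairs.push({ a: sorted[i], b: sorted[j] });
    }
  }
  return pairs;
}

// "map" emits each pair once per user, "reduce" sums the emissions.
function countCombinations(userDocs) {
  const counts = new Map();
  for (const doc of userDocs) {
    for (const pair of itemPairs(doc.items)) {
      const key = `${pair.a}:${pair.b}`; // string key for the Map lookup
      const entry = counts.get(key) || { _id: pair, count: 0 };
      entry.count += 1;
      counts.set(key, entry);
    }
  }
  return [...counts.values()];
}

const combinations = countCombinations([
  { _id: "u123", items: [3, 2, 8] },
  { _id: "u234", items: [2, 3] },
  { _id: "u345", items: [7] }, // a single item produces no pair
]);
```

A user with n distinct items yields n(n-1)/2 pairs, so users with very long purchase lists dominate the cost of this job.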
![Page 91: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/91.jpg)
112
Then obtain the most popular combinations per item:
- Indexes created on { "_id.a": 1, count: 1 } and { "_id.b": 1, count: 1 }
- Query with a threshold:
- db.combinations.find({ "_id.a": 2, count: { $gt: 10 } }).sort({ count: -1 })
- db.combinations.find({ "_id.b": 2, count: { $gt: 10 } }).sort({ count: -1 })
Later we can create a more compact recommendation collection that includes popular combinations with weights, like:
{ itemId: 2, recom: [ { itemId: 32, weight: 36},
{ itemId: 158, weight: 23}, … ] }
User Activity – Items bought together
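One possible way to derive the compact recommendation documents from the combination counts is sketched below in plain JavaScript. The function name, the threshold, and the cap on recommendations per item are assumptions made for illustration. Each pair contributes to both of its items, which is what the two queries on _id.a and _id.b achieve against the combinations collection.

```javascript
// Turn pair counts into one document per item, strongest pairs first.
function buildRecommendations(combinations, minCount, maxPerItem) {
  const byItem = new Map();
  const add = (itemId, otherId, weight) => {
    if (!byItem.has(itemId)) byItem.set(itemId, []);
    byItem.get(itemId).push({ itemId: otherId, weight });
  };
  for (const { _id, count } of combinations) {
    if (count <= minCount) continue; // same idea as count: { $gt: 10 }
    add(_id.a, _id.b, count); // the pair is a recommendation for both items
    add(_id.b, _id.a, count);
  }
  return [...byItem].map(([itemId, recom]) => ({
    itemId,
    recom: recom.sort((x, y) => y.weight - x.weight).slice(0, maxPerItem),
  }));
}

const recommendations = buildRecommendations(
  [
    { _id: { a: 2, b: 32 }, count: 36 },
    { _id: { a: 2, b: 158 }, count: 23 },
    { _id: { a: 2, b: 40 }, count: 4 }, // below the threshold, dropped
    { _id: { a: 32, b: 158 }, count: 12 },
  ],
  10,
  2
);
```

With this shape, the recommendations for a product page can be fetched with a single lookup by itemId.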
![Page 92: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/92.jpg)
113
User Activity – Hadoop integration
Applications: CRM, ERP, Collaboration, Mobile, BI
Data Management: Operational, Analytical (RDBMS, EDW)
Infrastructure: OS & Virtualization, Compute, Storage, Network
Management & Monitoring
Security & Auditing
![Page 93: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/93.jpg)
114
Commerce
Applications powered by
• Products & Inventory
• Recommended products
• Customer profile
• Session management
Analysis powered by
• Elastic pricing
• Recommendation models
• Predictive analytics
• Clickstream history
MongoDB Connector for Hadoop
![Page 94: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/94.jpg)
115
Connector Overview
Data
Read/Write MongoDB
Read/Write BSON
Tools
MapReduce
Pig
Hive
Spark
Platforms
Apache Hadoop
Cloudera CDH
Hortonworks HDP
Amazon EMR
![Page 95: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/95.jpg)
116
Connector Features and Functionality
• Open source on GitHub: https://github.com/mongodb/mongo-hadoop
• Computes splits to read data
– Single Node, Replica Sets, Sharded Clusters
• Mappings for Pig and Hive
– MongoDB as a standard data source/destination
• Support for
– Filtering data with MongoDB queries
– Authentication
– Reading from Replica Set tags
– Appending to existing collections
![Page 96: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/96.jpg)
117
MapReduce Configuration
• MongoDB input
– mongo.job.input.format = com.mongodb.hadoop.MongoInputFormat
– mongo.input.uri = mongodb://mydb:27017/db1.collection1
• MongoDB output
– mongo.job.output.format = com.mongodb.hadoop.MongoOutputFormat
– mongo.output.uri = mongodb://mydb:27017/db1.collection2
• BSON input/output
– mongo.job.input.format = com.mongodb.hadoop.BSONFileInputFormat
– mapred.input.dir = hdfs:///tmp/database.bson
– mongo.job.output.format = com.mongodb.hadoop.BSONFileOutputFormat
– mapred.output.dir = hdfs:///tmp/output.bson
![Page 97: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/97.jpg)
118
Pig Mappings
• Input: BSONLoader and MongoLoader
data = LOAD 'mongodb://mydb:27017/db.collection' using com.mongodb.hadoop.pig.MongoLoader;
• Output: BSONStorage and MongoInsertStorage
STORE records INTO 'hdfs:///output.bson' using com.mongodb.hadoop.pig.BSONStorage;
![Page 98: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/98.jpg)
119
Hive Support
CREATE TABLE mongo_users (id int, name string, age int)
STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler"
WITH SERDEPROPERTIES("mongo.columns.mapping" = "_id,name,age")
TBLPROPERTIES("mongo.uri" = "mongodb://host:27017/test.users")
• Access collections as Hive tables
• Use with MongoStorageHandler or BSONStorageHandler
![Page 99: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/99.jpg)
Thank You!
Antoine Girbal
Principal Solutions Engineer, MongoDB Inc.
@antoinegirbal
![Page 100: Retail Reference Architecture](https://reader038.vdocuments.us/reader038/viewer/2022102608/55515a2ab4c905a8768b4b9f/html5/thumbnails/100.jpg)