prepare for peak holiday season with mongodb

Preparing for Peak Holiday Season: A Seamless Customer Experience!

Rebecca Bucnis

Global Business Architect & Strategist, MongoDB

@rebeccabucnis

Antoine Girbal

Principal Solutions Engineer, MongoDB

@antoinegirbal

1. How is Peak Season shaping up this year?

2. How does MongoDB scale to support your business?

3. How do you capture the holiday Digital Customer Experience with MongoDB?

3 Questions for this Session?

MongoDB Speakers

About Rebecca:

Rebecca Bucnis

Global Business Architect

- Business Strategy

- Using data for business value

- Former Retailer

Washington, DC

rebecca.bucnis@mongodb.com

@rebeccabucnis

About Antoine:

Antoine Girbal

Principal Solutions Engineer

- Original team of MongoDB

- Engineer

- Solution Designer

Palo Alto, CA

Antoine.girbal@mongodb.com

@antoinegirbal

• Consumers more positive

• Increased spending (+25%*)

• Extended holiday buying window (with fewer days) starts 6pm

What to expect - Holiday Season 2014

* From Accenture Holiday Survey Oct 2014 Study on US Consumer Holiday Spending Plans

• Cyber Monday bigger than “Black Friday”

• Amazon has opened “stores” for returns

• 58%* of shoppers will shop with on-line retailers only

What to expect - Holiday Season 2014

* From Accenture Holiday Survey Oct 2014 Study on US Consumer Holiday Spending Plans

• Consumers want the message right (*43% will defect when irrelevant)

• Price, Convenience, relevance & entertainment

• Collect immediate & longer term shopping behavior for action

The Opportunity - Holiday Season 2014

* From Gigya Personalization Study 2014 State of Consumer Privacy & Personalization

• A document model (holds mixed, variant data)

• Ability to add new & different data (agility)

• Ability to ask real-time questions based on right now update (complex queries & in-place updates)

• Geo-Location built-in

• Power of traditional databases (full consistency, durability, atomic operations)

• Near linear expansion (scaling via sharding)

• MongoDB is a unique fit for frictionless retail

The System of Engagement for Retail

“Global Product 360”

Themes: Up-to-date product details with minimal downtime; images, reviews; vendor and order management

Use Cases: Modern, Seamless Retail

Consolidated Customer View & Insight

Themes: Single View of Customer, Consumer 360; Activity Capture; Profiles for personalization

Use Cases: Modern, Seamless Retail

1. Detailed Product Information:
- Single View of Product Information – Catalog

2. Real-time Inventory and Fulfillment:
- Real-time Inventory
- Shopping Carts / Orders

3. Detailed Customer Views:
- User Activity Logging
- Integrating Customer Insights

4. Monitoring and Scaling:
- What to watch for and how to scale

Technical Deep Dive

Information Management

Merchandising

Content

Inventory

Customer

Channel

Sales & Fulfillment

Insight

Social

Architecture Overview

Architecture diagram (labels recovered from the slide):

Customer

Channels: Amazon, Ebay…

Stores: POS, Kiosk

Mobile: Smartphone, Tablet

Website

Contact Center

API: Data and Service Integration

Social: Facebook, Twitter…

Data Warehouse

Analytics

Supply Chain Management System

Suppliers: 3rd Party, In Network

Web Servers

Application Servers

Commerce Functional Components

Information Layer

Look & Feel: Navigation, Customization, Personalization, Branding, Promotions

Customer's Perspective: Research (Browse, Search); Select (Shopping Cart); Purchase (Checkout); Receive (Track); Use (Feedback, Maintain); Dialog (Assist)

Market / Offer: Semantic Search, Recommend, Rule-based Decisions, Pricing, Coupons

Sell / Fulfill: Orders, Payments, Fraud Detection, Fulfillment, Business Rules

Insight: Session Capture, Activity Monitoring

Customer Enterprise

Information Management: Merchandising, Content, Inventory, Customer, Channel, Sales & Fulfillment, Insight, Social

Deep Dive: Product Catalog

The many catalogs problem

1. One department in charge of master product works hard at fitting data into SQL tables

2. Resulting data sits in a SQL server with a couple replicas. It's forbidden to hit it more than 100 times / sec

3. Other departments need to access the data way more often for their own services

4. Other departments need more information that is not available since it did not fit in that long devised rigid SQL schema

5. ETLs and Message Buses are put in place so that other teams can try to figure it out themselves…

6. Data becomes inconsistent, fragmented, not up-to-date… The problem is visible both internally and to customers!

The many catalogs problem

Diagram: the Product Department's Master Catalog feeds, over a Message Bus, separate catalogs for the Online Store, Marketing, and Departments 1, 3, 4 and 5.

Dozens of catalogs!

How many Catalogs do you have?

Catalog Caches?

Message Buses and ETLs for them?

Too many catalogs problem

• Single view of a product, one central service

• Flexible schema containing all useful data

• Read volume high and sustained, 100k reads / s

• Can seamlessly take write spikes during catalog update

• Advanced indexing and querying

• Geographical distribution for HA and low latency

Goal: Single View of Product

MongoDB Data Store

Merchandising - Architecture

Items, Pricing, Promotions, Variants, Ratings & Reviews

Search Engine

Product Service API

Online Store, Marketing, Inventory, SCMS, Public API, …

• Item: the overall product info (e.g. Levi’s 501)

• Variant: a specific variant of an item (e.g. in black size 6) which typically has a specific SKU / UPC

• Price: price information may vary based on the store, the variant, etc

• Hierarchy: the item taxonomy

• Facet: facets to search products by

• Vendors: a given sku may be available through several vendors if the site is a marketplace

Models - Overview

{
  "_id": "054VA72303012P", // the item id
  "desc": [ // item descriptions
    { "lang": "en", "val": "Give your dressy look a lift with ..." },
    ...
  ],
  "name": "Women's Kate Ivory Peep-Toe Stiletto Heel",
  "category": "/84700/80009/1282094266/1200003270", // hierarchy
  "brand": { "id": "2483510", "img": "http://...", "name": "Metaphor" },
  "assets": { // references to all assets
    "imgs": [
      { "img": { "width": 1900, "height": 1900, "src": "http://..." } },
      ...
    ]
  },
  "shipping": { // shipping specs
  },
  "specs": { // item specs
  },
  "attrs": [ // list of item attributes (facets)
    { "name": "Heel Height", "value": "High (2-1/2 to 4 in.)" },
    { "name": "Toe", "value": "Open toe" },
    ...
  ],
  "variants": { // quick info on the variants
    "cnt": 9,
    "attrs": [
      { "dispType": "DROPDOWN", "name": "Color" },
      { "dispType": "DROPDOWN", "name": "Shoe Size" },
      ...
    ]
  },
  "lastUpdated": 1400877254787 // keep track of updates
}

Models - Item Model

Product Search – Traditional Architecture

Product Data Store Product Search

Indexing

#1 obtain search results IDs

Application

Cache

#2 obtain objects by ID from cache or DB

Pre-joined into objects

Product Search – New Architecture

Product Data Store Product Search

Indexing

#1 obtain search results IDs

Applications

#2 obtain objects by list of IDs

MongoDB

Ready-to-use product documents

Search Engine

Product API

Application issues single
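The second step of the new architecture (obtain objects by list of IDs) can be sketched in plain JavaScript. This is an illustration only: productStore and findByIds are hypothetical in-memory stand-ins for the product collection and for one batched lookup such as find({ _id: { $in: ids } }). Because an $in lookup does not promise any result order, the sketch re-orders the documents to follow the search engine's ranking.

```javascript
// Hypothetical in-memory stand-in for the product collection.
const productStore = new Map([
  ["A1", { _id: "A1", name: "Item one" }],
  ["B2", { _id: "B2", name: "Item two" }],
  ["C3", { _id: "C3", name: "Item three" }],
]);

// Stand-in for one batched lookup by ID list.
// It returns documents in store order, not in the order of `ids`.
function findByIds(ids) {
  const wanted = new Set(ids);
  return [...productStore.values()].filter((doc) => wanted.has(doc._id));
}

// Re-order the fetched documents to match the search ranking,
// dropping IDs that are no longer present in the store.
function hydrateSearchResults(rankedIds) {
  const byId = new Map(findByIds(rankedIds).map((doc) => [doc._id, doc]));
  return rankedIds.filter((id) => byId.has(id)).map((id) => byId.get(id));
}

const rankedDocs = hydrateSearchResults(["C3", "A1", "ZZ"]);
console.log(rankedDocs.map((doc) => doc.name));
```

The point of the design is that the application makes one round trip for the whole result page instead of one lookup per product.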

Deep Dive: Real-time Inventory and Fulfillment

Less than Real-Time Inventory

1. The Inventory system is centralized in a single SQL server

2. Latency to Inventory is too high, not accessible from individual stores or distribution centers

3. Stores / DCs need to manage their own local inventory, then ship the result once a day to the central system

4. Central inventory has no view of intra-day quantities. It does forecast and replenish with up to 24h delay

5. Opportunities are lost due to overstock / shortage

6. Sometimes products are sold based on quantities reported by a distant inventory. The product turns out not to be available, and customers are upset

Less than Real-Time Inventory

Inventory – Traditional Architecture

Relational DB, System of Records

Analytics, Aggregations, Reports

Caching Layer

Field Inventory

Internal & External Apps

Local view only

Once-a-day sync

Stale view

Suboptimal logic

• Single view of the inventory, one central service

• Used by most services and channels

• Read dominated workload

• Local, real-time writes

• Bulk writes for refresh

• Geographically distributed

• Horizontally scalable

Goal: Real-Time Inventory

MongoDB

Inventory – Target Architecture

Relational DB, System of Records

Analytics, Aggregations, Reports

Field Inventory

Internal & External Apps

Inventory

Assortments

Shipments

Audits

Orders

Stores

Point-in-time Loads

Nightly check

Real-time updates

Real-time view

Relevant dataset

Representing quantities …

Inventory Levels - Inventory

Solution: 1 document per SKU / store

> 100 million items x 1000 stores = 100 billion entries

{
  "_id": "SPM7597703608A/store0",
  "storeId": "store0",
  "location": [-86.95444, 33.40178],
  "q": 88,
  "ts": 1400877254787
}

Solution: 1 document per key / store grouping SKUs

_id: item id or hash of SKU, with store id

> Good for geo distribution, low number of docs

{
  "_id": "SPM7597703608/store0", // unique key
  "storeId": "store0",
  "location": [-86.95444, 33.40178],
  "geoCode": 1,
  "skus": [ // list of skus quantities
    { "id": "SPM7597703608A", "q": 88 },
    { "id": "SPM7597703608B", "q": 55 },
    { "id": "SPM7597703608C", "q": 104 },
    …
  ],
  "ts": 1400877254787
}
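A minimal sketch of how flat per-SKU stock rows could be folded into the grouped document shape above. The function name groupInventory is hypothetical, and the rule that an item id is the SKU minus its last character is an assumption read off the sample ids (SKU "SPM7597703608A", item "SPM7597703608"); a real catalog would supply that mapping.

```javascript
// Group flat per-SKU stock rows into one document per item and store.
// Assumption: item id = SKU without its final variant letter.
function groupInventory(rows, now = Date.now()) {
  const docs = new Map();
  for (const row of rows) {
    const itemId = row.sku.slice(0, -1);
    const key = `${itemId}/${row.storeId}`; // same "_id" layout as the slide
    if (!docs.has(key)) {
      docs.set(key, {
        _id: key,
        storeId: row.storeId,
        location: row.location,
        skus: [],
        ts: now,
      });
    }
    docs.get(key).skus.push({ id: row.sku, q: row.q });
  }
  return [...docs.values()];
}

const sampleRows = [
  { sku: "SPM7597703608A", storeId: "store0", location: [-86.95444, 33.40178], q: 88 },
  { sku: "SPM7597703608B", storeId: "store0", location: [-86.95444, 33.40178], q: 55 },
  { sku: "SPM7597703608A", storeId: "store1", location: [-82.8006, 40.0908], q: 3 },
];
console.log(JSON.stringify(groupInventory(sampleRows, 1400877254787), null, 2));
```

Three rows become two documents here, which is the "low number of docs" benefit the slide mentions.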

• Increment / decrement / set quantity for an item at a store, atomically

Inventory Updates - Quantities

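// decrement by 1: there is no $dec operator, so use $inc with a negative value
// without { multi: true } only the first matching store document is updated;
// match the full _id (e.g. "SPM7597703608/store0") to target one store
db.inventory.update(
  { "_id": { $regex: "^SPM7597703608/" }, "skus.id": "SPM7597703608A" },
  { "$inc": { "skus.$.q": -1 } })

// increment by 20
db.inventory.update(
  { "_id": { $regex: "^SPM7597703608/" }, "skus.id": "SPM7597703608A" },
  { "$inc": { "skus.$.q": 20 } })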

// use $set for setting …
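One way to keep a decrement from driving stock below zero is to put the availability check into the same atomic update. The sketch below only builds the filter and update documents; buildReserveUpdate is a hypothetical helper, not part of any MongoDB API. It relies on $elemMatch selecting the array element so that the positional $ in the update points at that same element.

```javascript
// Build a guarded decrement for one SKU at one store.
// The filter only matches when at least `quantity` units are in stock,
// so the update cannot take the quantity below zero.
function buildReserveUpdate(itemId, storeId, sku, quantity) {
  if (!Number.isInteger(quantity) || quantity <= 0) {
    throw new RangeError("quantity must be a positive integer");
  }
  return {
    filter: {
      _id: `${itemId}/${storeId}`,
      skus: { $elemMatch: { id: sku, q: { $gte: quantity } } },
    },
    update: {
      $inc: { "skus.$.q": -quantity }, // "$" = element matched by $elemMatch
      $set: { ts: Date.now() },
    },
  };
}

const reserve = buildReserveUpdate("SPM7597703608", "store0", "SPM7597703608A", 2);
console.log(JSON.stringify(reserve, null, 2));
```

In the shell this would be passed as db.inventory.update(reserve.filter, reserve.update). If the write result reports no modified document, the stock was insufficient and the sale should be refused.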

• Get closest stores with available SKU

Inventory Levels – Inventory

// requires a 2dsphere index on "location"
db.runCommand({
  geoNear: "inventory",
  near: { type: "Point", coordinates: [-82.8006, 40.0908] },
  maxDistance: 10000.0, // in meters, since "near" is a GeoJSON point
  spherical: true,
  limit: 10,
  query: {
    _id: { $regex: "^SPM7597703608/" },
    skus: { $elemMatch: { id: "SPM7597703608A", q: { $gt: 0 } } }
  }
})
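To make the semantics of the query concrete, here is a plain JavaScript sketch of the same selection done in memory: keep stores that have the SKU in stock, compute the great-circle distance, drop stores beyond the maximum distance, and sort nearest first. The function names and the sample stores are hypothetical, and the haversine formula with a mean Earth radius is an approximation, not a claim about how the server computes distances.

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters (approximation)

// Great-circle distance between two [longitude, latitude] pairs, in meters.
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Nearest store documents that hold the SKU with quantity > 0.
function nearestStoresWithStock(docs, near, sku, maxDistance, limit) {
  return docs
    .filter((doc) => doc.skus.some((s) => s.id === sku && s.q > 0))
    .map((doc) => ({ doc, dis: haversineMeters(near, doc.location) }))
    .filter((hit) => hit.dis <= maxDistance)
    .sort((x, y) => x.dis - y.dis)
    .slice(0, limit);
}

const sampleStores = [
  { _id: "SPM7597703608/storeFar", location: [-82.8006, 41.0908], skus: [{ id: "SPM7597703608A", q: 9 }] },
  { _id: "SPM7597703608/storeNear", location: [-82.8006, 40.1008], skus: [{ id: "SPM7597703608A", q: 4 }] },
  { _id: "SPM7597703608/storeEmpty", location: [-82.8006, 40.0908], skus: [{ id: "SPM7597703608A", q: 0 }] },
];
const hits = nearestStoresWithStock(sampleStores, [-82.8006, 40.0908], "SPM7597703608A", 10000, 10);
console.log(hits.map((h) => h.doc._id));
```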

How to keep reads / writes local with low latency?

How to stay available during network partition?

Inventory Updates – Availability

East DC, Central DC, West DC

Shard East, Shard Central, Shard West

Primary

Basic Setup: Writes go everywhere

• Basic shard key: { _id: 1 } // built as group key + store

• Shard key for "Geo-sharding": { geoCode: 1, _id: 1 }

• Alternative "Geo-sharding", more granular: { storeId: 1, _id: 1 }

Shard East, Shard Central, Shard West

Primary

Using tag-aware sharding: mostly local writes
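The routing idea behind tag-aware sharding can be sketched as a lookup from the leading shard-key field to a shard. This is a simulation in plain JavaScript, not the server's balancer: the tagRanges table and the geoCode values 1, 2 and 3 for East, Central and West are assumptions made for illustration (the slides only show geoCode: 1). In the mongo shell of that period, tag ranges are declared with sh.addShardTag and sh.addTagRange.

```javascript
// Hypothetical tag ranges on geoCode, the leading field of the shard key
// { geoCode: 1, _id: 1 }. Ranges are [min, max): min inclusive, max exclusive.
const tagRanges = [
  { min: 1, max: 2, shard: "ShardEast" },
  { min: 2, max: 3, shard: "ShardCentral" },
  { min: 3, max: 4, shard: "ShardWest" },
];

// Return the shard whose tag range covers the document's geoCode,
// or null when no tag range constrains where the document may live.
function shardFor(doc) {
  const range = tagRanges.find((r) => doc.geoCode >= r.min && doc.geoCode < r.max);
  return range ? range.shard : null;
}

console.log(shardFor({ _id: "SPM7597703608/store0", geoCode: 1 }));
```

With the primary of each shard placed in the matching data center, a store's writes stay in its own region, which is what "mostly local writes" refers to.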

Shopping Carts – Model

• Shopping cart fits naturally in 1 document

Shopping Carts – Model

{
  _id: ObjectId(…),
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  userId: "c12398",
  geoCode: 1,
  totalPrice: 1050.99,
  items: [
    { sku: "SPM7597703608A", quantity: 1, price: 799, storeId: "store100",
      name: "Apple Macbook Air", thumbnail: "http://…", … },
    { sku: "SPM7587703609C", quantity: 4, price: 20, storeId: "store100",
      name: "Oral-B Toothbrush", thumbnail: "http://…", … },
    …
  ]
}
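Because the whole cart lives in one document, adding an item and refreshing the total can be treated as a single change to that document. A minimal in-memory sketch follows; addToCart is a hypothetical helper, and summing in integer cents is a design choice of this sketch (to avoid floating-point drift), not something the slides prescribe.

```javascript
// Add an item to a cart document and recompute totalPrice.
// Lines with the same sku and storeId are merged by increasing quantity.
function addToCart(cart, item) {
  const existing = cart.items.find(
    (i) => i.sku === item.sku && i.storeId === item.storeId
  );
  if (existing) {
    existing.quantity += item.quantity;
  } else {
    cart.items.push({ ...item });
  }
  // Sum in integer cents, then convert back to a decimal amount.
  const cents = cart.items.reduce(
    (sum, i) => sum + Math.round(i.price * 100) * i.quantity,
    0
  );
  cart.totalPrice = cents / 100;
  cart.ts = new Date();
  return cart;
}

const demoCart = { userId: "c12398", geoCode: 1, totalPrice: 0, items: [] };
addToCart(demoCart, { sku: "SPM7597703608A", quantity: 1, price: 799, storeId: "store100" });
addToCart(demoCart, { sku: "SPM7587703609C", quantity: 4, price: 20, storeId: "store100" });
console.log(demoCart.totalPrice);
```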

Shard East, Shard Central, Shard West

Shopping Carts – Availability

Primary

1. Shops in West, cart written locally

2. Shops in East, same cart read locally

Travel

Replication

Primary

• Each shard has 1 replica in every DC

• Primary servers are distributed among DCs

• Local Cart insert / update:
  – Tag-aware Sharding using the geoCode field

• Local Cart lookup:
  – Tag-aware Sharding using the geoCode field

• Local Cart lookup for all regions:
  – Nearest Read Preference (closest replica)

Shopping Carts – Topology

Deep Dive: User Activity Logging

and Insight

Insights

Data Intelligence

Many user activities can be of interest:

• Search terms

• Product viewed, liked or wished

• Shopping cart add / remove

• Orders submitted

• Sharing on social network

• Ad impression, Clickstream

Insights – Data of interest

Data will be used to compute:

• User / Product History

• Product Map (relationships, etc)

• User Preferences

• Recommendations

• Trends

> This is the basis for Personalization

Insights – Data of interest

1. Originally system does not record user activity much, since it is too voluminous. It ends up forgotten in log files.

2. Attempts are made to store it in SQL, but expensive to achieve adequate write performance. Reporting across large data sets (TB+) does not work.

3. Activity is recorded to Data Warehouse system which provides good reporting but too expensive to scale.

4. Using technologies like Hadoop, good scaling and powerful reporting are achieved.

5. Still there is a lack of scalable front end Data Store for real time queries and aggregations from applications.

Insights – Today's Limitations

Insights – Traditional Architecture

External Analytics: Hadoop, Greenplum, Teradata

Log Processor

Activity Logs

SQL Data Store

Delays moving logs

Delays processing

Output limited by schema

Limited read capacity

• Store and manage large stream of data samples
  – High arrival rate from many sources
  – Variable schema
  – Control retention period of data

• Compute aggregations and derivative data sets
  – Aggregations and statistics based on data
  – Roll-up data into pre-computed reports and summaries

• Low latency access to up-to-date data
  – Flexible indexing of raw and derived data sets
  – Rich querying based on time + meta-data fields

Goal: Scalable and Powerful Insights
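The "roll-up data into pre-computed reports and summaries" goal can be illustrated with a small in-memory sketch that folds raw activity events into one summary per item and hour. The function name rollUpHourly and the summary shape (an _id of "itemId/hour" with a counts sub-document) are assumptions for illustration, not a schema from the deck; the event fields follow the user activity model used in this talk.

```javascript
// Fold raw activity events into hourly per-item summaries.
// Each summary counts events by activity type within one UTC hour.
function rollUpHourly(events) {
  const buckets = new Map();
  for (const e of events) {
    const hour = new Date(e.timeStamp);
    hour.setUTCMinutes(0, 0, 0); // truncate to the start of the hour
    const key = `${e.itemId}/${hour.toISOString()}`;
    if (!buckets.has(key)) {
      buckets.set(key, { _id: key, itemId: e.itemId, hour, counts: {} });
    }
    const counts = buckets.get(key).counts;
    counts[e.type] = (counts[e.type] || 0) + 1;
  }
  return [...buckets.values()];
}

const sampleEvents = [
  { itemId: "301671", type: "VIEW", timeStamp: new Date("2014-04-01T10:05:00Z") },
  { itemId: "301671", type: "VIEW", timeStamp: new Date("2014-04-01T10:40:00Z") },
  { itemId: "301671", type: "CART_ADD", timeStamp: new Date("2014-04-01T10:41:00Z") },
  { itemId: "301671", type: "VIEW", timeStamp: new Date("2014-04-01T11:02:00Z") },
];
console.log(JSON.stringify(rollUpHourly(sampleEvents), null, 2));
```

Reading a summary is then a single lookup by key, whatever the volume of raw events behind it.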

Insights – MongoDB Architecture

MongoDB

HVDF API

Activity Logging

User History

External Analytics: Hadoop, Spark, Storm

User Preferences

Recommendations

Trends

Product Map

Apps

Internal Analytics: Aggregation, MapReduce

All user activity is recorded

MongoDB – Hadoop Connector

Personalization

Insights

Insights – MongoDB + Hadoop

Applications, powered by MongoDB:

• Products & Inventory
• Recommended products
• Customer profile
• Session management

Analysis, powered by Hadoop:

• Elastic pricing
• Recommendation models
• Predictive analytics
• Clickstream history

MongoDB Connector for Hadoop

{ _id: ObjectId(),
  geoCode: 1, // used to localize write operations
  sessionId: "2373BB…",
  device: { id: "1234",
            type: "mobile/iphone",
            userAgent: "Chrome/34.0.1847.131" },
  userId: "u123",
  type: "VIEW|CART_ADD|CART_REMOVE|ORDER|…", // type of activity
  itemId: "301671",
  sku: "730223104376",
  order: { id: "12520185",
           … },
  location: [ -86.95444, 33.40178 ],
  tags: [ "smartphone", "iphone", … ], // associated tags
  timeStamp: new Date("2014/04/01 …") }

Insight – User Activity Model

• Recent activity for a user:
db.activities.find({ userId: "u123" }).sort({ timeStamp: -1 }).limit(100)

• Recent activity for a product:
db.activities.find({ itemId: "301671" }).sort({ timeStamp: -1 }).limit(100)

• Indices: userId + timeStamp, itemId + timeStamp, timeStamp

• All queries should be time bound for performance!

Insight – User History
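The advice to keep every query time bound can be built into a small helper so that callers cannot forget the lower bound. The sketch below only assembles the query parts; recentActivityQuery is a hypothetical helper, and it assumes the timeStamp field from the activity model.

```javascript
// Build a time-bounded "recent activity" query for a user or an item.
// field: "userId" or "itemId"; windowMs: how far back to look, in ms.
function recentActivityQuery(field, value, windowMs, now = new Date()) {
  return {
    filter: {
      [field]: value,
      timeStamp: { $gt: new Date(now.getTime() - windowMs) },
    },
    sort: { timeStamp: -1 }, // newest first
    limit: 100,
  };
}

const lastDay = recentActivityQuery("userId", "u123", 24 * 60 * 60 * 1000);
console.log(JSON.stringify(lastDay, null, 2));
```

In the shell the parts would be applied as find(filter).sort(sort).limit(limit); a compound index on the chosen field plus timeStamp lets the query stay within the bounded range.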

• Recent number of views, purchases, etc. for a user:
db.activities.aggregate([
  { $match: { userId: "u123", timeStamp: { $gt: DATE } } },
  { $group: { _id: "$type", count: { $sum: 1 } } }])

• Recent total sales for a user:
db.activities.aggregate([
  { $match: { userId: "u123", timeStamp: { $gt: DATE }, type: "ORDER" } },
  { $group: { _id: "result", count: { $sum: "$total" } } }])

• Recent number of views, purchases, etc. for an item:
db.activities.aggregate([
  { $match: { itemId: "301671", timeStamp: { $gt: DATE } } },
  { $group: { _id: "$type", count: { $sum: 1 } } }])

> Those aggregations are very fast, real-time

Insight – User Stats
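For readers less familiar with the aggregation pipeline, the first pipeline above (a $match followed by a $group with $sum: 1) is equivalent to the following in-memory loop. This is an explanatory sketch over a plain array, with a hypothetical function name; it says nothing about how the server executes the pipeline.

```javascript
// In-memory equivalent of:
//   { $match: { userId, timeStamp: { $gt: since } } },
//   { $group: { _id: "$type", count: { $sum: 1 } } }
function countByType(activities, userId, since) {
  const counts = {};
  for (const a of activities) {
    if (a.userId !== userId || !(a.timeStamp > since)) continue; // $match
    counts[a.type] = (counts[a.type] || 0) + 1; // $group + $sum: 1
  }
  return Object.entries(counts).map(([type, count]) => ({ _id: type, count }));
}

const sampleActivities = [
  { userId: "u123", type: "VIEW", timeStamp: new Date("2014-04-01T10:00:00Z") },
  { userId: "u123", type: "VIEW", timeStamp: new Date("2014-04-01T10:05:00Z") },
  { userId: "u123", type: "ORDER", timeStamp: new Date("2014-04-01T10:06:00Z") },
  { userId: "u999", type: "VIEW", timeStamp: new Date("2014-04-01T10:07:00Z") },
];
console.log(countByType(sampleActivities, "u123", new Date("2014-04-01T00:00:00Z")));
```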

• Map Reduce calculation of unique visitors:

var map = function() { emit(this.userId, 1); }

var reduce = function(key, values) { return Array.sum(values); }

db.activities.mapReduce(map, reduce,
  { query: { timeStamp: { $gt: new Date(Date.now() - 60 * 60 * 1000) } }, // last hour
    out: { replace: "lastHourUniques", sharded: true } })

// number of activities for a user (the emitted key is stored in _id)
db.lastHourUniques.find({ _id: "u123" })

// total uniques, immediate result
db.lastHourUniques.count()

Insight – User Stats
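The shape of the map-reduce output explains why the total number of uniques is just a count of the output collection. The sketch below simulates the job over a plain array. runMapReduce is a hypothetical helper, and unlike the shell (where emit is a global), it passes emit to the map function as an argument. As on the server, reduce is only called for keys that received more than one value.

```javascript
// Minimal in-memory simulation of a map-reduce job.
// Output documents have the form { _id: key, value: reducedValue }.
function runMapReduce(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // "this" is the document
  return [...emitted.entries()].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const mapByUser = function (emit) { emit(this.userId, 1); };
const sumValues = function (key, values) {
  return values.reduce((total, v) => total + v, 0);
};

const lastHour = [{ userId: "u1" }, { userId: "u2" }, { userId: "u1" }];
const uniques = runMapReduce(lastHour, mapByUser, sumValues);
console.log(uniques, "unique visitors:", uniques.length);
```

One output document per user id means the number of documents equals the number of unique visitors, and each document's value is that user's activity count.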

Monitoring and Scaling

Following are useful Monitoring tools:

• MongoDB Management Service (MMS)

• mongostat – console based

• mongotop – activity of each namespace

• iostat – disk activity

• Plugins for most popular frameworks (Munin, Nagios, Cacti, SNMP …)

> Without Monitoring, impossible to quickly troubleshoot and recover from downtime!

Monitoring Tips – Tools

Metrics to watch for:

• Data Size vs Disk Size

• Working Set Size vs RAM Size

• Disk IO

• Write Lock

> Account and test for highest possible traffic!

> MongoDB's support team is there to help!

Monitoring Tips – Metrics

Add replicas to:

• Reduce latency to users

• Add read capacity (data potentially stale)

• Increase data safety

> Adding / removing replicas is seamless

Replication Tips

If you are not sharding yet …

It may be time to shard

Switch to sharding with no downtime …

Just make sure you pick the right shard key!

MongoDB Support is there to help

Sharding Tips

Add shards to:

• Increase read / write IO capacity

• Increase Storage space

• Increase RAM space

• Bring a primary closer to users

> Shard add / remove takes time and capacity

> Scales mostly linearly but broadcast queries are sub-linear

Sharding Tips

Watch MMS Demo at https://www.youtube.com/watch?v=nSJiVXNsPHk

Closing Comments

1. How is Peak Season shaping up this year?

2. How do you scale your business with MongoDB?

3. How do you capture the holiday Digital Customer Experience with MongoDB?

3 Answers for this Season

1. Spending & confidence are back! Act fast!

2. Create single view services and scale using sharding

3. Capture high-volume activity logs now and for the rest of the season for “insight”

1. Assess your data and determine your monitoring gaps

2. Join us and Engage:

• MongoDB Days – London – November 19

• MongoDB Days – San Francisco – December 3

• MongoDB Meet-ups, MUG, Office Hours

3. Start one step at a time - with “prototype” capabilities

What’s Next?

Questions?

Thank You!

@antoinegirbal Antoine.girbal@mongodb.com

@rebeccabucnis Rebecca.bucnis@mongodb.com
