MongoDB and In-Memory Computing
![Page 1: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/1.jpg)
Elevate Your Enterprise Architecture with an In-Memory Computing Strategy
Dylan Tong
Principal Solutions [email protected]
![Page 2: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/2.jpg)
In-Memory Computing
How can we process data as fast as possible by leveraging in-memory speed at its best?
What are the possibilities if we could?
![Page 3: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/3.jpg)
High-frequency trading (HFT) is a program trading platform that uses powerful computers to transact a large number of orders at very fast speeds. It uses complex algorithms to analyze multiple markets and execute orders based on market conditions.
Typically, the traders with the fastest execution speeds are more profitable than traders with slower execution speeds.
Source: Investopedia
Speed Matters…
![Page 4: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/4.jpg)
Speed Matters…
Amazon found that it increased revenue by 1% for every 100ms of improvement. [Source: Amazon]
A 1-second delay in page load time equals 11% fewer page views, a 16% decrease in customer satisfaction, and 7% loss in conversions. [Source: Aberdeen Group]
A study found that 27% of the participants who did mobile shopping were dissatisfied due to the experience being too slow. [Source: Forrester Consulting]
![Page 5: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/5.jpg)
How Fast?

| Operation | Latency | Normalized to 1 s |
|---|---|---|
| RAM access | 100s ns | ~6 min |
| SSD access | 100s µs | ~6 days |
| HDD access | 10s ms | ~12 months |
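The normalized figures can be approximated with simple arithmetic. The sketch below assumes the scale maps one cycle of a 3 GHz CPU (about 0.33 ns) to 1 s; the slide does not state its baseline, so this is a guess that happens to fit the numbers. The three latencies are illustrative picks within the slide's ranges, not measurements.

```javascript
// Assumption: one cycle of a 3 GHz CPU is stretched to 1 "normalized" second,
// so every real nanosecond becomes 3 normalized seconds.
const CYCLES_PER_NS = 3;

const MINUTE = 60;
const DAY = 24 * 60 * MINUTE;
const MONTH = 30 * DAY; // a 30-day month, for rough comparison only

// Illustrative latencies in nanoseconds, chosen inside the slide's
// "100s ns", "100s µs" and "10s ms" ranges.
const latenciesNs = {
  "RAM access": 120,
  "SSD access": 150000,
  "HDD access": 10000000,
};

// Convert a real latency in nanoseconds to seconds on the normalized scale.
function normalizedSeconds(latencyNs) {
  return latencyNs * CYCLES_PER_NS;
}

// Render a duration in the largest unit that fits.
function humanize(seconds) {
  if (seconds >= MONTH) return (seconds / MONTH).toFixed(1) + " months";
  if (seconds >= DAY) return (seconds / DAY).toFixed(1) + " days";
  return (seconds / MINUTE).toFixed(1) + " min";
}

for (const [operation, ns] of Object.entries(latenciesNs)) {
  console.log(operation + ": " + humanize(normalizedSeconds(ns)));
}
```

With these inputs the results land in the same ballpark as the slide's rounded figures: minutes for RAM, days for SSD, and about a year for HDD.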
![Page 6: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/6.jpg)
Why Now?
Last 10 Years… “Generally affordable”
[Chart: average RAM price in $/GB for 2005, 2010, 2013 and 2015; y-axis from $0 to $200]

| Year | Average $/GB* |
|---|---|
| 2015 | $4.37 |
| 2013 | $5.5 |
| 2010 | $12.37 |
| 2005 | $189 |
| 2000 | $1,107 |
| 1995 | $30,875 |
| 1990 | $103,880 |
| 1985 | $859,375 |
| 1980 | $6,328,125 |

*http://www.statisticbrain.com/average-historic-price-of-ram/
![Page 7: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/7.jpg)
Why Now?
Last 5 Years… “An Option at Scale”
[Chart: average RAM price in $/GB for 2010, 2013 and 2015; y-axis from $0.00 to $14.00]

| Year | Average $/GB* |
|---|---|
| 2015 | $4.37 |
| 2013 | $5.5 |
| 2010 | $12.37 |
| 2005 | $189 |
| 2000 | $1,107 |
| 1995 | $30,875 |
| 1990 | $103,880 |
| 1985 | $859,375 |
| 1980 | $6,328,125 |

*http://www.statisticbrain.com/average-historic-price-of-ram/
![Page 8: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/8.jpg)
"This will process these data using algorithms for machine learning and artificial intelligence before sending the data back to the car.
The zFAS board will in this way continuously extend its capabilities to master even complex situations increasingly better," Audi stated. "The piloted cars from Audi thus learn more every day and with each new situation they experience."
Source: T3.com
The possibilities…
![Page 9: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/9.jpg)
Challenges: Scale
![Page 10: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/10.jpg)
Challenges: Cost Viability
= $34,777/yr. ~$1.74M/yr. for infrastructure to support 100TB
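The two figures on this slide imply a unit count. The check below assumes that $34,777/yr. is the price of a single infrastructure unit and that $1.74M/yr. is the total for 100 TB; the slide image that identified the unit is not part of this transcript, so the interpretation is a guess.

```javascript
// Figures taken from the slide (USD per year, and target capacity in TB).
const unitCostPerYear = 34777;
const totalCostPerYear = 1.74e6; // rounded on the slide
const targetCapacityTB = 100;

// Assumption: the total is simply (number of units) x (cost per unit).
const impliedUnits = Math.round(totalCostPerYear / unitCostPerYear);
const impliedTBPerUnit = targetCapacityTB / impliedUnits;

console.log("implied units:", impliedUnits);
console.log("implied TB per unit:", impliedTBPerUnit);
```

Under that reading, the total corresponds to about 50 units of roughly 2 TB each.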
![Page 11: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/11.jpg)
Challenges: Cost Viability
| Storage Type | Avg. Cost ($/GB) | Cost at 100 TB ($) |
|---|---|---|
| RAM | 5.00 | 500K |
| SSD | 0.47-1.00 | 47K to 100K |
| HDD | 0.03 | 3K |
http://www.statisticbrain.com/average-cost-of-hard-drive-storage/
http://www.myce.com/news/ssd-price-per-gb-drops-below-0-50-how-low-can-they-go-70703/
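The 100 TB column follows directly from the per-GB prices. A minimal sketch, assuming decimal terabytes (1 TB = 1,000 GB), which is what the slide's totals imply:

```javascript
// Average cost per GB in USD as [low, high], copied from the slide.
const costPerGB = {
  RAM: [5.0, 5.0],
  SSD: [0.47, 1.0],
  HDD: [0.03, 0.03],
};

const GB_PER_TB = 1000; // assumption: decimal terabytes

// Raw media cost in USD for a given capacity, rounded to whole dollars.
function costAtCapacity(pricePerGB, capacityTB) {
  return Math.round(pricePerGB * capacityTB * GB_PER_TB);
}

for (const [storageType, [low, high]] of Object.entries(costPerGB)) {
  const lowCost = costAtCapacity(low, 100);
  const highCost = costAtCapacity(high, 100);
  const label = lowCost === highCost ? "$" + lowCost : "$" + lowCost + " to $" + highCost;
  console.log(storageType + ": " + label);
}
```

These are media prices only. They leave out servers, power, and replication, so they are not comparable to a yearly infrastructure figure.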
![Page 12: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/12.jpg)
Challenges: Durability
Volatile Memory
• What happens when things fail, and what data may be lost?
• How does the system synchronize with your durable storage? Does it do this well, and is it simple to implement?
![Page 13: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/13.jpg)
Challenges: Design Still Matters
![Page 14: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/14.jpg)
on RAM
![Page 15: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/15.jpg)
Scenario: eCommerce Modernization Initiative
Business Problem: Customer experience is suffering during high-traffic events.
Technology Limitations:
• Too expensive to scale the system to support spike events.
• Scaling the system is hard, and engineering teams can’t react fast enough in the event of unexpected growth.
• Some caching solution is implemented, but it mostly helps only with read performance; synchronizing writes has been a development nightmare.
Business Problem: Lack of mobile customers in Europe and Asia has been attributed to latency issues.
Technology Limitations:
• Difficult to extend the data architecture globally, so the effort is on hold.
![Page 16: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/16.jpg)
Scenario: eCommerce Modernization Initiative
Business Problem: Below-industry conversion rate has been attributed partly to poor personalization.
Technology Limitations:
• Customer info is siloed across the enterprise, and it’s too complicated to bring this data together so effective models can be built to drive personalization.
• A “Big Data” project to bring data together to drive machine learning and cognitive capabilities in the platform failed: data scientists reported that the platform was too slow to develop on and that performance was impractical.
Business Problem: Business analysts have siloed views of the eCommerce channel, and information isn’t getting to them fast enough.
Technology Limitations:
• Related to the limitations above.
• Integrating data into the data warehouse is slow and hard to maintain.
![Page 17: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/17.jpg)
Scenario: eCommerce Modernization Initiative
[Diagram: current platform architecture]
Platform API
Platform Services
eCommerce Datastores (NoSQL, RDBMS):
• Orders
• Product Catalog
• Inventory
• Customer Data: Profile, Sessions, Carts, Personalization
Dependent External Data Sources and Integrations:
• CRM
• ERP
• PIM
• Data warehouse
• BI Tools
• …
![Page 18: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/18.jpg)
Silo Data-sources Problem
SLOW AND POOR SCALABILITY
[Diagram: siloed data sources]
• Customer Data: Profile, Sessions, Carts, Personalization
• Product Catalog
• NoSQL
• RDBMS
• CRM
• ERP
• PIM
• Partner Sources: Supplier databases…etc.
• Legacy: Mainframe
![Page 19: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/19.jpg)
Operational Single View
[Diagram: data sources and the operational single view]
Data sources:
• NoSQL
• RDBMS
• CRM
• ERP
• PIM
• Partner Sources: Supplier databases…etc.
• Legacy: Mainframe
Operational Single View:
• Customer Data: Profile, Sessions, Carts, Personalization
• Product Catalog
![Page 20: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/20.jpg)
Operational Single View
MongoDB Enterprise Data Hub
![Page 21: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/21.jpg)
Reference: MetLife Wall Presentation
![Page 22: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/22.jpg)
{
  product_name: 'Acme Paint',
  color: ['Red', 'Green'],
  size_oz: [8, 32],
  finish: ['satin', 'eggshell']
}
{
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  color: ['Heather Gray' … ],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}
{
  product_name: 'Mountain Bike',
  brake_style: 'mechanical disc',
  color: 'grey',
  frame_material: 'aluminum',
  no_speeds: 21,
  package_height: '7.5x32.9x55',
  weight_lbs: 44.05,
  suspension_type: 'dual',
  wheel_size_in: 26
}
Documents in the same product catalog collection in MongoDB
Dynamic Schema
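To illustrate what querying differently shaped documents looks like, here is a plain JavaScript sketch. It uses an in-memory array as a stand-in for the collection rather than a MongoDB server or driver, and the `hasValue` helper is hypothetical. It loosely mirrors how a MongoDB equality filter matches either a scalar field or an element of an array field.

```javascript
// An in-memory array stands in for the product catalog collection.
// Two of the slide's documents are reused; the T-shirt is left out
// because its color list is elided on the slide.
const catalog = [
  {
    product_name: "Acme Paint",
    color: ["Red", "Green"],
    size_oz: [8, 32],
    finish: ["satin", "eggshell"],
  },
  {
    product_name: "Mountain Bike",
    color: "grey",
    frame_material: "aluminum",
    no_speeds: 21,
    wheel_size_in: 26,
  },
];

// Hypothetical helper: true when the field equals the value, or when the
// field is an array that contains the value. Missing fields never match.
function hasValue(doc, field, value) {
  const stored = doc[field];
  if (stored === undefined) return false;
  return Array.isArray(stored) ? stored.includes(value) : stored === value;
}

const greyProducts = catalog.filter((doc) => hasValue(doc, "color", "grey"));
const satinProducts = catalog.filter((doc) => hasValue(doc, "finish", "satin"));

console.log(greyProducts.map((doc) => doc.product_name));
console.log(satinProducts.map((doc) => doc.product_name));
```

The same filter works across both documents even though `color` is an array in one and a string in the other, and `finish` exists in only one of them.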
![Page 23: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/23.jpg)
Flexible Data Model: facilitates agile development and continuous delivery methodologies
Scalability: scale-out dynamically as demand grows
Still Agile, Scalable and Simple
![Page 24: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/24.jpg)
In-Memory Storage Engine
High Performance:
• More predictable and lower latency on less in-memory infrastructure.
Infrastructure Optimization:
• Assign a data subset to the In-Memory SE via Zone Sharding.
• Optimize on cost vs. performance without silos.
Rich Query Capability:
• Full MongoDB Query and Indexing Support.
[Diagram: IN-MEMORY SE NODES and WIREDTIGER NODES]
![Page 25: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/25.jpg)
Session Data Geographically Localized, and with In-memory Engine Latency
Local Read/Write with Strong Consistency
[Diagram: an Update routed across WEST and EAST regions]
SHARD 1 TAG: WEST, IN_MEM
SHARD 2 TAG: WEST, WT
SHARD 3 TAG: EAST, IN_MEM
SHARD 4 TAG: EAST, WT
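The tagging scheme on this slide can be read as a routing rule. The sketch below is a plain JavaScript simulation of that rule, not MongoDB's balancer or any MongoDB API; in a real cluster, zones are attached to shard key ranges. The `type` and `region` fields and the rule that non-session data goes to WiredTiger shards are assumptions for illustration.

```javascript
// Shard tags as listed on the slide.
const shards = [
  { name: "SHARD 1", tags: ["WEST", "IN_MEM"] },
  { name: "SHARD 2", tags: ["WEST", "WT"] },
  { name: "SHARD 3", tags: ["EAST", "IN_MEM"] },
  { name: "SHARD 4", tags: ["EAST", "WT"] },
];

// Hypothetical routing rule: session data goes to the in-memory shard in the
// user's region; everything else goes to the WiredTiger shard in that region.
function pickShard(doc) {
  const engineTag = doc.type === "session" ? "IN_MEM" : "WT";
  const match = shards.find(
    (shard) => shard.tags.includes(doc.region) && shard.tags.includes(engineTag)
  );
  if (!match) {
    throw new Error("no shard tagged " + doc.region + ", " + engineTag);
  }
  return match.name;
}

console.log(pickShard({ type: "session", region: "WEST" }));
console.log(pickShard({ type: "order", region: "EAST" }));
```

A session for a West-region user resolves to the West in-memory shard, so reads and writes stay local to that region.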
![Page 26: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/26.jpg)
Durability and Fault-Tolerance:
• Mixed ReplicaSets allow data to be replicated from In-Memory SE to WT SE.
• Full High Availability: automatic fail-over, cross geography.
In-Memory Storage Engine
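One possible shape for a mixed replica set is two in-memory members plus a hidden WiredTiger member that persists the data. The configuration document below is a sketch: the set name and host names are placeholders, and the storage engine itself is not part of this document, since each `mongod` selects its engine at startup. The `configProblems` helper is hypothetical and only checks two basic rules.

```javascript
// Hypothetical replica set configuration document. Field names (_id, members,
// host, priority, hidden) are standard replica set configuration fields;
// the set name and hosts are placeholders.
const replicaSetConfig = {
  _id: "sessionsRS",
  members: [
    { _id: 0, host: "inmem-a.example.net:27017" }, // assumed to run the in-memory engine
    { _id: 1, host: "inmem-b.example.net:27017" }, // assumed to run the in-memory engine
    // Assumed to run WiredTiger: never primary, invisible to clients,
    // but it keeps a durable copy of the replicated data.
    { _id: 2, host: "wt-a.example.net:27017", priority: 0, hidden: true },
  ],
};

// Priority defaults to 1 when omitted.
function effectivePriority(member) {
  return member.priority === undefined ? 1 : member.priority;
}

// Hypothetical sanity check: hidden members must have priority 0, and at
// least one member must be eligible to become primary.
function configProblems(config) {
  const problems = [];
  for (const member of config.members) {
    if (member.hidden && effectivePriority(member) !== 0) {
      problems.push("member " + member._id + ": hidden members must have priority 0");
    }
  }
  const electable = config.members.filter((m) => effectivePriority(m) > 0);
  if (electable.length === 0) problems.push("no electable member");
  return problems;
}

console.log(configProblems(replicaSetConfig));
```

In the mongo shell, a document of this shape would typically be passed to `rs.initiate()`.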
![Page 27: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/27.jpg)
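Advanced Personalization
[Diagram: platform databases and external sources behind an operational unified view]
Platform Databases:
• NoSQL
• RDBMS
Dependent External Data Sources and Integrations:
• CRM
• ERP
• PIM
• Partner Sources: Supplier databases…etc.
• Legacy: Mainframe
Operational Unified View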
1. TRAIN/RE-TRAIN ML MODELS
2. APPLY MODELS TO REAL-TIME STREAM OF INTERACTIONS
3. DRIVE TARGETED CONTENT, RECOMMENDATIONS…ETC.
![Page 28: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/28.jpg)
Why Spark?
Speed. By exploiting in-memory optimizations, Spark has shown up to 100x higher performance than MapReduce running on Hadoop.
Simplicity. Easy-to-use APIs for operating on large datasets. This includes a collection of sophisticated operators for transforming and manipulating semi-structured data.
Unified Framework. Packaged with higher-level libraries, including support for SQL queries, machine learning, stream and graph processing. These standard libraries increase developer productivity and can be combined to create complex workflows.
![Page 29: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/29.jpg)
Operational Single View
+Spark Connector
• Native Scala connector, certified by Databricks
• Exposes all Spark APIs & libraries
• Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations
• Locality awareness to reduce data movement
![Page 30: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/30.jpg)
Locality Awareness
[Diagram: DRIVER PROGRAM (SPARK CONTEXT), CLUSTER MANAGER, and five Tasks]
![Page 31: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/31.jpg)
Operational Single View
+Spark Connector
Blend client data from multiple internal and external sources to drive real time campaign optimization
![Page 32: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/32.jpg)
MongoDB+Spark at China Eastern
180m fare calculations & 1.6 billion searches per day
Oracle database peaked at 200 searches per second.
Radically re-architect their fare engine to meet the required 100x growth in search traffic.
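The slide's numbers are roughly self-consistent, as a quick calculation shows. This sketch assumes the 1.6 billion searches per day is the load behind the "100x" figure, which the slide implies but does not state, and it spreads searches evenly over the day even though real traffic is burstier.

```javascript
// Figures taken from the slide.
const searchesPerDay = 1.6e9;
const legacyPeakPerSecond = 200; // Oracle database peak

const SECONDS_PER_DAY = 24 * 60 * 60;

// Average rate under the even-spread assumption.
const averagePerSecond = searchesPerDay / SECONDS_PER_DAY;

// How many times the legacy peak that average represents.
const growthFactor = averagePerSecond / legacyPeakPerSecond;

console.log("average searches per second:", Math.round(averagePerSecond));
console.log("multiple of legacy peak:", growthFactor.toFixed(1));
```

The average works out to about 18,500 searches per second, a bit over 90 times the old peak, which is in line with the stated 100x growth.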
![Page 33: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/33.jpg)
ETL
(Yesterday’s) Data at the Speed of Thought?
![Page 34: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/34.jpg)
BI Connector
db.orders.aggregate([
  { $group: { _id: null, total: { $sum: "$price" } } }
])
SELECT SUM(price) AS total
FROM orders
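Both statements compute the same thing: a single total over every order. The plain JavaScript sketch below spells that out on a few made-up rows. The sample data is hypothetical, and no MongoDB server or BI Connector is involved.

```javascript
// Hypothetical sample rows standing in for the orders collection/table.
const orders = [
  { _id: 1, price: 25 },
  { _id: 2, price: 10 },
  { _id: 3, price: 7.5 },
  { _id: 4 }, // no price: skipped, as $sum ignores non-numeric values and SQL SUM ignores NULL
];

// Sum the price field, skipping documents where it is not a number.
function sumPrice(rows) {
  return rows.reduce(
    (total, row) => (typeof row.price === "number" ? total + row.price : total),
    0
  );
}

// Same result shape as the $group stage: one document with _id null and a total.
const result = [{ _id: null, total: sumPrice(orders) }];

console.log(result);
```

The SQL form returns the same number as a single row with a `total` column.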
![Page 35: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/35.jpg)
Resources for You
Spark Connector
• Download: Spark Packages, GitHub
• Documentation
• Whitepaper: Turning Analytics into Real-Time Action
• Education: M233: Getting Started with Spark and MongoDB
In-Memory Storage Engine
• Download: Enterprise Server
• Documentation
BI Connector
• Download: BI Connector
• Documentation
![Page 36: MongoDB and In-Memory Computing](https://reader038.vdocuments.us/reader038/viewer/2022103105/5899afe01a28aba11e8b4771/html5/thumbnails/36.jpg)