![Page 2: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/2.jpg)
ToroDB
About $self and 8Kdata
![Page 3: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/3.jpg)
ToroDB
What do Stripe and Buffer have in common?
Use MongoDB for high load OLTP
Successful companies
They are Web Scale
![Page 4: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/4.jpg)
ToroDB
What else do they have in common?
Both ETL to a RDBMS
for Analytics
![Page 5: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/5.jpg)
ToroDB
MongoDB analytics performance: Github dataset
{ "id": "2489368070", "type": "PushEvent", "public": true, "created_at": "2015-01-01T00:00:00Z", "actor": { "id": 9152315, "login": "davidjhulse", "gravatar_id": "", "url": "https://api.github.com/users/davidjhulse", "avatar_url": "https://avatars.githubusercontent.com/u/9152315?" }, "repo": { "id": 28635890, "name": "davidjhulse/davesbingrewardsbot", "url": "https://api.github.com/repos/davidjhulse/davesbingrewardsbot" },
![Page 6: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/6.jpg)
ToroDB
MongoDB analytics performance: Github dataset
"payload": { "push_id": 536740396, "size": 1, "distinct_size": 1, "ref": "refs/heads/master", "head": "a9b22a6d80c1e0bb49c1cf75a3c075b642c28f81", "before": "86ffa724b4d70fce46e760f8cc080f5ec3d7d85f", "commits": [ { "sha": "a9b22a6d80c1e0bb49c1cf75a3c075b642c28f81", "author": { "email": "[email protected]", "name": "davidjhulse" }, "message": "Altered BingBot.jar\n\nFixed issue with multiple account", "distinct": true, "url": "https://api.github.com/repos/..." } ] } }
![Page 7: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/7.jpg)
ToroDB
MongoDB analytics performance: Github dataset
Mill
iseco
nds
0
300.000
600.000
900.000
1.200.000
1.500.000
1.800.000
2.100.000
2.400.000
2.700.000
Github dataset size (GB)
1 GB 10 GB 100 GBMongoDB
db.github.find({ type: 'PushEvent', 'actor.login': 'davidjhulse' });
![Page 8: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/8.jpg)
ToroDB
Why is it so slow?
• For most aggregated queries, the whole collection is scanned for every query.
• For every document, many keys and offsets are parsed and computed to find the possible keys. Worst case: all keys are scanned.
• Different documents with different in the same collection. Mixing apples and oranges!
![Page 9: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/9.jpg)
ToroDB
How MongoDB performs an aggregated query
![Page 10: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/10.jpg)
What if we had all our data in a RDBMS?
![Page 11: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/11.jpg)
ToroDB
How a RDBMS performs an aggregate query
![Page 12: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/12.jpg)
ToroDB
How a RDBMS performs an aggregate query
![Page 13: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/13.jpg)
ToroDB
Read I/O required to answer the query db.githubarchive.aggregate([ { $group: { _id: '$actor.login', events: { $sum: 1 } } }, { $sort: { events: -1 }}, { $limit: 10 } ])
SELECT count(*), login FROM actor GROUP BY login ORDER BY 1 DESC LIMIT 10;
![Page 14: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/14.jpg)
ToroDB
Read I/O required to answer the query (“iotop -o -a”)
Github Archive: top 10 actors (1,4GB dataset)
Disk
Rea
d (M
B)
0
125
250
375
500
MongoDB PostgreSQL
87,93
536,36MongoDB storageSize: 536.37 MB
MongoDB size: 1410.35 MB
Exactly 100% of the storageSize!
![Page 15: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/15.jpg)
ToroDB
What if we use a columnar store?
![Page 16: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/16.jpg)
ToroDB
And if we use a columnar store? (“iotop -o -a”)
Github Archive: top 10 actors (1,4GB dataset)
Disk
Rea
d (M
B)
0
125
250
375
500
MongoDB PostgreSQL PostgreSQL + cstore
6,587,93
536,36compressed, only 1 column read
![Page 17: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/17.jpg)
ToroDB
What about query performance?
Mill
iseco
nds
0
43.750
87.500
131.250
175.000
Github dataset size (GB)
1 GB 10 GB 100 GB
Mill
iseco
nds
0
300.000
600.000
900.000
1.200.000
1.500.000
1.800.000
2.100.000
2.400.000
2.700.000
Github dataset size (GB)
1 GB 10 GB 100 GB
PostgreSQLMongoDB
![Page 18: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/18.jpg)
ToroDB
What if we take this to the extreme
• What if we do an aggregate query for a 1:N relationship which is empty 99% of the time?
• MongoDB will still scan the whole collection
• A RDBMS will only scan an almost empty table
![Page 19: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/19.jpg)
ToroDB
Show me another benchmark! YASP dataset
db.getCollection('player_matches').aggregate([ { $match: { 'item_uses.key': 'quelling_blade' } }, { $group: { '_id': null, avg: { $avg: '$level' } } } ]);
SELECT AVG(matches.level) FROM uses INNER JOIN matches ON uses.did = matches.did WHERE uses.key = 'quelling_blade';
![Page 20: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/20.jpg)
ToroDB
Show me another benchmark! YASP dataset
YASP dataset. Avg level of users using 'quelling_blade' object
exec
utio
n tim
e (m
s)
0K
14K
28K
42K
56K
71K
85K
99K
113K
127K
MongoDB PostgreSQL
63,36
131.678,49
That's more than 2000x faster!
![Page 21: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/21.jpg)
ToroDB
So what about indexes?
• For most aggregated queries, indexes will not be used!
• You cannot plan in advance all the queries that will require an index.
•Worst, but common use case: its the users who craft the queries they want to make.
![Page 22: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/22.jpg)
ToroDB
Show me another benchmark! YASP dataset
YASP dataset. Avg level of users using 'quelling_blade' object
exec
utio
n tim
e (m
s)
0
8
16
24
32
MongoDB (Indexed) PostgreSQL (Indexed)
19,65
30,85
![Page 23: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/23.jpg)
ToroDB
I’m sold! Gimme SQL
• Significant performance improvement!
• No connectors required: native SQL!
• Design DDL, implement ETL
• Not real-time, not HA
Maybe not that fast…
![Page 24: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/24.jpg)
![Page 25: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/25.jpg)
ToroDB
ToroDB: MongoDB to relational
![Page 26: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/26.jpg)
ToroDB
ToroDB for MongoDB Analytics
• No need to design the SQL schema:
• ToroDB automagically does it for you!
• Real-time: insert in MongoDB, automatically shows up in ToroDB. Even if tables need created.
• Native SQL: your data ends in a RDBMS like PostgreSQL
![Page 27: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/27.jpg)
ToroDB
ToroDB works as a secondary on a replica set
ToroD
this is your SQL replica!
![Page 28: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/28.jpg)
ToroDB
ToroDB performance: yet another example :)
Github Archive 10GB dataset
Seco
nds
06
1218
2430
3642
4854
60
MongoDB ToroDB on PostgreSQLToroDB on Greenplum
2,467,901
58,519
![Page 29: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/29.jpg)
ToroDB
ToroDB main characteristics
• Works as a MongoDB secondary node
•No need to run drdl or define schema. Accepts any input document, even with type conflicts
•Query w/ native SQL (PostgreSQL as of today)
•Open source!
![Page 30: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/30.jpg)
ToroDB
ToroDB vs MongoDB BI Connector
ToroDB Mongo BI CLicense Open Source Proprietary
Query Language PostgreSQLPostgreSQL (v1) or reduced SQL set (v2)
Performance 100x faster than MongoDB
100x slower plus big (v1) or smaller (v2) connector overhead
NoSQL to SQL transformation Once (insert time) Many (per query)
Distributed analyticsWith Greenplum or CitusDB No
Columnar store and compression
Yes No
![Page 31: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/31.jpg)
ToroDB
ToroDB main use cases
Native SQL BI Connector
Native SQL BI
Connector
Data Integration Platform: SQL and NoSQL apps in the
same RDBMS
Live MongoDB
to RDBMS migration
Apps: Write data with
Mongo API, query with
SQL!
![Page 32: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/32.jpg)
ToroDB
Last but definitely not least
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY;
Consistent reads!
![Page 33: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/33.jpg)
ToroDB
ToroDB: coming to Github soon!
![Page 34: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/34.jpg)
ToroDB
Rate My Session!
![Page 35: ToroDB: All your MongoDB data are belong to SQL](https://reader031.vdocuments.us/reader031/viewer/2022022811/58710ed41a28abac6d8b549d/html5/thumbnails/35.jpg)
Let’s Talk!
Edificio 4B - Loft 33 Avda. Fuencarral, 44
Campus Empresarial Tribeca 28108 Alcobendas, Madrid (SPAIN)
(+34) 91 867 55 54