back to basics webinar 5: introduction to the aggregation framework
TRANSCRIPT
![Page 1: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/1.jpg)
MongoDB Europe 2016Old Billingsgate, London
15th November
mongodb.com/europe
Register with code JD20 for a 20% discount
![Page 2: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/2.jpg)
Back to Basics 2016 : Webinar 5
Introduction to the Aggregation Framework
Joe Drumgoole
Director of Developer Advocacy, EMEA
@jdrumgoole
V1.1
![Page 3: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/3.jpg)
3
Recap
• Webinar 1 – Introduction to NoSQL
– The different types of NoSQL databases
– MongoDB is a document database
• Webinar 2 – My First Application
– Creating databases and collections
– CRUD, Indexes and Explain
• Webinar 3 – Schema Design
– Dynamic schema
– Embedding approaches
• Webinar 4 – GeoSpatial and Text Indexes
![Page 4: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/4.jpg)
4
The Aggregation Framework
• An analytics engine for MongoDB
• What is analytics?
• Think of the two types of database, OLTP, OLAP
• OLTP : Online Transaction Processing
– airline booking,
– ATMs,
– Taxi booking
• OLAP : Online Analytical Processing
– Which tickets make us most money?
– When do we need to refill our ATMs?
– How many cabs do we need to service the West End of London?
![Page 5: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/5.jpg)
5
What Does This Look Like?
OLTP OLAP
![Page 6: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/6.jpg)
6
OLAP - There Be (Hadoop) Dragons Here
• OLAP queries are often table scans
• The output of the queries is often stored for future analysis and comparison
• Many customers are looking at Spark and Hadoop for OLAP but:
– Complexity is astronomical
– Focused on algorithmic analysis of data (you gotta write a program)
– Requires some significant knowledge of parallel alogrithms and parallel
processing
• The aggregation framework is a kinder gentler tool
• May do what you want in less time with less anguish
![Page 7: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/7.jpg)
7
The Aggregation Framework – A Processing Pipeline
Match Project Group SortLimit
• Think unix pipeline
• The output of one stage is passed to the input of the next stage
• Each stage performs one job
• Stages can be repeated
• The input is a single collection
![Page 8: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/8.jpg)
8
Pipeline Operators
• $match
Filter documents
• $project
Reshape documents
• $group
Summarize documents
• $out
Create new collections
• $sort
Order documents
• $limit/$skip
Paginate documents
• $lookup
Join two collections togther
• $unwind
Expand an array
![Page 9: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/9.jpg)
9
The MoT Data Set
![Page 10: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/10.jpg)
10
Example Document
{ "_id" : ObjectId("5759ee6e8684975e1098af68"),
"TestID" : 400,
"VehicleID" : "278",
"TestDate" : ISODate("2013-04-23T00:00:00Z"),
"TestClassID" : "4",
"TestType" : "N",
"TestResult" : "P",
"TestMileage" : 99284,
"Postcode" : "E",
"Make" : "AUDI",
"Model" : "A3",
"Colour" : "BLACK",
"FuelType" : "P",
"CylinderCapacity" : 1598,
"FirstUseDate" : ISODate("2003-11-11T00:00:00Z“) }
![Page 11: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/11.jpg)
11
Lets Use The Shell
MongoDB Enterprise > use vosa
switched to db vosa
MongoDB Enterprise > db.results2013.findOne()
{
"_id" : ObjectId("577294020cb23533dfbaac18"),
"TestID" : 17,
"VehicleID" : 28,
"TestDate" : ISODate("2013-05-02T00:00:00Z"),
"TestClassID" : "2",
"TestType" : "N",
"TestResult" : "P",
"TestMileage" : 46414,
"Postcode" : "BN",
"Make" : "SUZUKI",
"Model" : "UNCLASSIFIED",
"Colour" : "GREEN",
"FuelType" : "P",
"CylinderCapacity" : 398,
"FirstUseDate" : ISODate("1993-08-11T00:00:00Z")
}
![Page 12: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/12.jpg)
12
$limit
MongoDB Enterprise > db.results2013.aggregate([ { "$limit" :2 } ] )
{
"_id" : ObjectId("577294020cb23533dfbaac18"),
"TestID" : 17,
"VehicleID" : 28,
"TestDate" : ISODate("2013-05-02T00:00:00Z"),
"TestClassID" : "2",
"TestType" : "N",
"TestResult" : "P",
"TestMileage" : 46414,
"Postcode" : "BN",
"Make" : "SUZUKI",
"Model" : "UNCLASSIFIED",
…
![Page 13: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/13.jpg)
13
Let’s Make a Small Collection
> db.results2013.aggregate( [ { "$limit" : 10000 }, {"$out" : "results10k" } ] )
> db.results10k.count()
10000
> db.results10k.findOne()
{
"_id" : ObjectId("577294020cb23533dfbaac18"),
"TestID" : 17,
"VehicleID" : 28,
"TestDate" : ISODate("2013-05-02T00:00:00Z"),
"TestClassID" : "2",
"TestType" : "N",
"TestResult" : "P",
"TestMileage" : 46414,
"Postcode" : "BN",
"Make" : "SUZUKI",
"Model" : "UNCLASSIFIED",
"Colour" : "GREEN",
"FuelType" : "P",
"CylinderCapacity" : 398,
"FirstUseDate" : ISODate("1993-08-11T00:00:00Z")
}
![Page 14: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/14.jpg)
14
$match
…aggregate([ { "$limit" :2000 },
{ "$match" : { "FirstUseDate" : { "$ne" : "NULL" }}} ])
{ "_id" : ObjectId("577294020cb23533dfbaac18"), "TestID" : 17, "VehicleID" : 28,
"TestDate" : ISODate("2013-05-02T00:00:00Z"), "TestClassID" : "2", "TestType" :
"N", "TestResult" : "P", "TestMileage" : 46414, "Postcode" : "BN", "Make" :
"SUZUKI", "Model" : "UNCLASSIFIED", "Colour" : "GREEN", "FuelType" : "P",
"CylinderCapacity" : 398, "FirstUseDate" : ISODate("1993-08-11T00:00:00Z") }
{ "_id" : ObjectId("577294020cb23533dfbaac19"), "TestID" : 22, "VehicleID" : 33,
"TestDate" : ISODate("2013-06-07T00:00:00Z"), "TestClassID" : "2", "TestType" :
"N", "TestResult" : "P", "TestMileage" : 15605, "Postcode" : "PE", "Make" :
"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Colour" : "BLACK", "FuelType" : "P",
"CylinderCapacity" : 150, "FirstUseDate" : ISODate("1962-01-01T00:00:00Z") }
{ "_id" : ObjectId("577294020cb23533dfbaac1a"), "TestID" : 44, "VehicleID" : 49,
"TestDate" : ISODate("2013-08-09T00:00:00Z"), "TestClassID" : "4", "TestType" :
"N", "TestResult" : "PRS", "TestMileage" : 72694, "Postcode" : "SO", "Make" :
"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Colour" : "BLACK", "FuelType" : "P",
"CylinderCapacity" : 998, "FirstUseDate" : ISODate("2001-05-16T00:00:00Z") }
...
![Page 15: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/15.jpg)
15
$project (1 of 2)
ageinusecs = { "$subtract" : [ "$TestDate", "$FirstUseDate" ] }
ageinyears = { "$divide" :[ ageinusecs , (1000*3600*24*365) ] }
floorage = { "$floor" : ageinyears }
ispass = { "$cond" : [{"$eq": ["$TestResult","P"]},1,0]}
project = { "$project" : { "Make” :1,
"Model” :1,
"VehicleID" :1,
"TestResult” :1,
"Age” :floorage,
"pass” :ispass }}
![Page 16: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/16.jpg)
16
$project (2 of 2)
MongoDB Enterprise > db.nonulldates.aggregate( [ project ] )
{ "_id" : ObjectId("577294020cb23533dfbaac18"), "VehicleID" : 28, "TestResult" : "P", "Make" : "SUZUKI", "Model" : "UNCLASSIFIED", "Age" : 19, "pass" : 1 }
{ "_id" : ObjectId("577294020cb23533dfbaac19"), "VehicleID" : 33, "TestResult" : "P", "Make" : "UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 51, "pass" : 1 }
{ "_id" : ObjectId("577294020cb23533dfbaac1a"), "VehicleID" : 49, "TestResult" : "PRS", "Make" : "UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 12, "pass" : 0 }
{ "_id" : ObjectId("577294020cb23533dfbaac1b"), "VehicleID" : 54, "TestResult" : "P", "Make" : "NISSAN", "Model" : "MICRA GX", "Age" : 13, "pass" : 1 }
{ "_id" : ObjectId("577294020cb23533dfbaac1c"), "VehicleID" : 54, "TestResult" : "F", "Make" : "UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 13, "pass" : 0 }
{ "_id" : ObjectId("577294020cb23533dfbaac1d"), "VehicleID" : 63, "TestResult" : "P", "Make" : "UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 12, "pass" : 1 }
{ "_id" : ObjectId("577294020cb23533dfbaac1e"), "VehicleID" : 63, "TestResult" : "F", "Make" : "UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 12, "pass" : 0 }
{ "_id" : ObjectId("577294020cb23533dfbaac1f"), "VehicleID" : 93, "TestResult" : "P", "Make" : "BMW", "Model" : "318ti SE COMPACT", "Age" : 12, "pass" : 1 }
…
![Page 17: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/17.jpg)
17
$group
countMakes = { "$group" : {{ "_id" : "$Make"} , "total" : {"$sum" : 1 }}}
db.nonulldates.aggregate( [ countMakes ])
{ "_id" : "IVECO", "total" : 1 }
{ "_id" : "ISUZU", "total" : 1 }
{ "_id" : "YAMAHA", "total" : 1 }
{ "_id" : "OLDSMOBILE", "total" : 1 }
{ "_id" : "KAWASAKI", "total" : 1 }
{ "_id" : "MASERATI", "total" : 1 }
{ "_id" : "BENELLI", "total" : 1 }
{ "_id" : "BENTLEY", "total" : 3 }
{ "_id" : "AUDI", "total" : 26 }
{ "_id" : "SMART", "total" : 2 }
{ "_id" : "HARLEY-DAVIDSON", "total" : 1 }
…
![Page 18: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/18.jpg)
18
Summary
• A pipeline of operations
• Select, project, group, sort
• $out must appear last in an aggregation pipeline
• There are a range of accumulators (see the group by
documentation)
• Very powerful way to reshape and analyse data
• Shard aware to gain maxinum performance for large clusters
![Page 19: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/19.jpg)
Q&A
![Page 20: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/20.jpg)
![Page 21: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/21.jpg)
21
• This is slide content
![Page 22: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/22.jpg)
![Page 23: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/23.jpg)
![Page 24: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/24.jpg)
![Page 25: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/25.jpg)
25
![Page 26: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/26.jpg)
26
![Page 27: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/27.jpg)
LOREM
IPSUM
LOREM
IPSUM
LOREM
IPSUM
LOREM
IPSUM
Sollicitudin VenenatisLOREM
IPSUM
LOREM
IPSUM
LOREM
IPSUM
LOREM
IPSUM
Graphic Element Examples
![Page 28: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/28.jpg)
Porta Ultricies
Commodo Porta
Graph Examples
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
Category 1 Category 2 Category 3 Category 4
Series 1
Series 2
![Page 29: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/29.jpg)
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
Category 1 Category 2 Category 3 Category 4
Series 1
Series 2
![Page 30: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/30.jpg)
{
_id : ObjectId("4c4ba5e5e8aabf3"),
employee_name: "Dunham, Justin",
department : "Marketing",
title : "Product Manager, Web",
report_up: "Neray, Graham",
pay_band: “C",
benefits : [
{ type : "Health",
plan : "PPO Plus" },
{ type : "Dental",
plan : "Standard" }
]
}
Code/Highlight Example
![Page 31: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/31.jpg)
Aggregation Framework Agility Backup Big Data Briefcase
Buildings Business Intelligence Camera Cash Register Catalog
Chat Checkmark Checkmark Cloud Commercial Contract
Computer Content Continuous Development Credit Card Customer Success
![Page 32: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/32.jpg)
Data Center Data Variety Data Velocity Data Volume Data Warehouse Database
Dialogue Directory Documents Downloads Drivers Dynamic Schema
EDW Integration Faster Time to Market File Transfer Flexible Gear Hadoop
Health Check High Availability Horizontal Scaling Integrating into Infrastructure Internet of Things Iterative Development
![Page 33: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/33.jpg)
Life Preserver Line Graph Lock Log Data Lower Cost Magnifying Glass
Man Mobile Phone Meter Monitoring Music New Apps
New Data Types Online Open Source Parachute Personalization Pin
Platform Certification Product Catalog Puzzle Pieces RDBMS Realtime Analytics Rich Querying
![Page 34: Back to Basics Webinar 5: Introduction to the Aggregation Framework](https://reader034.vdocuments.us/reader034/viewer/2022052117/5873ecbe1a28abb1528b4763/html5/thumbnails/34.jpg)
Life Preserver RSS Scalability Scale Secondary Indexing Steering Wheel
Stopwatch Text Search Tick Data Training Transmission Tower Trophy
Woman World