mongo db mug_2012-02-07
DESCRIPTION
Data extraction for reporting on Mongo usingTRANSCRIPT
O Where Clause, Where Clause! Wherefore art thou Where Clause?
(a.k.a. Aggregation for Reporting)
Overview
• Discuss and demonstrate aggregating data • Specifically addresses reporting needs• Example study: Aggregating Video Game Stats
Disclaimer!
Kills
Sales
Player
Product
Dataset{
"_id" : ObjectId("50fc77ee364c74eba1afe1e3"),"fragdate" : ISODate("2012-12-24T00:00:19.901Z"),"gameId" : 1221,"gameName" : "Christmas Blitz","kill" : {
"_id" : ObjectId("50acfd45712e8bc7832ea7cb"),"username" : "player002","avatar" : "avatar.com/player002.png","displayname" : "Sniper the Clown","rank" : "Sniper","motto" : "If you run, you'll just die tired."
},"player" : {
"userid" : 1,"username" : "ArmyD00d1221","avatar" : "avatar.com/armyd00d1221.png","displayname" : "Army Grunt"
},"server" : "app01.fragzilla.com"
}
Aggregate the number of times each
player was killed
Report Details{
"_id" : ObjectId("50fc77ee364c74eba1afe1e3"),"fragdate" : ISODate("2012-12-24T00:00:19.901Z"),"gameId" : 1221,"gameName" : "Christmas Blitz","kill" : {
"_id" : ObjectId("50acfd45712e8bc7832ea7cb"),"username" : "player002","avatar" : "avatar.com/player002.png","displayname" : "Sniper the Clown","rank" : "Sniper","motto" : "If you run, you'll just die tired."
},"player" : {
"userid" : 1,"username" : "ArmyD00d1221","avatar" : "avatar.com/armyd00d1221.png","displayname" : "Army Grunt"
},"server" : "app01.fragzilla.com"
}
Only aggregate kills on these three players:• Sniper the Clown• Kurious Killer• My L1ttl3 P0wn13
Only on Dec 23, 2012Between 2pm and 10pm
Relational DB
IdfragDategameIDgameNameserverfkKilledfkPlayer
Kills IdusernameavatardisplayNamerankmotto
Killed
IdusernameavatardisplayNamerankmotto
Player
Could be the same table
Relational DB
IdfragDategameIDgameNameserverfkKilledfkPlayer
Kills IdusernameavatardisplayNamerankmotto
Killed
IdusernameavatardisplayNamerankmotto
Player
Could be the same table
SELECT tk.fragDate, k.id, count(k.id) FROM test.kills tkJOIN players p ON tk.fkPlayer = p.id JOIN killed k ON tk.fkKilled = k.idWHERE k.id IN (1,2,3)GROUP BY fragDate, k.id;
Sidenote: Exploration
• Software Engineering tends to have more clearly defined goals
• Report Engineering tends to have more clearly defined questions
Query From The Shell
Output
Next Step: Delimited Output
Display in Excel, R, Processing, etc
Aggregation: Big Picture
• Somewhere between Mongo Queries and Map/Reduce implementations
• Best suited for totaling and averaging functions• Similar functionality to SQL Group By clause
Complexity
MongoQueries
MongoAggregation Framework
Map/ReduceImplementations
Anatomy of Aggregation Framework
db.collection.aggregate( [ {do something},
{do something else},{do even more stuff}
])
Aggregate command
Pipeline Operators
Pipeline Operators
• Pipelines: transforms documents from the collection as they pass through– grep e server.log | less
• Expressions: produce output documents based on calculations performed on input documents
Pipelines:
$project$match$limit$skip$unwind$group$sort
Expressions$group Operators:$addToSet$first$last$max$min$avg$sum
Boolean Operators:$and$or$not
Comparison Operators:$cmp$eq$gt$lt$ne
Arithmetic Operators:$add$subtract$multiply$divide
String Operators:$strcasecmp$substr$toLower$toUpper
Date Operators:$year$month$hour
See http://docs.mongodb.org/manual/reference/aggregation/#aggregation-expression-operatorsFor an exhaustive list
Our Aggregation Query
Our Aggregation Query
All the magic goes between the []
Our Aggregation Query
$match:Provides a query-like interface to filter
documents out of the aggregation pipeline. The $match drops
documents that do not match the condition from the aggregation
pipeline, and it passes documents that match along the pipeline unaltered.
Our Aggregation Query
$project: Reshapes a document stream by renaming, adding, or
removing fields. Also use $project to create computed values or sub-
objects
Our Aggregation Query
$groupGroups documents together for the
purpose of calculating aggregate values based on a collection of
documents. Practically, group often supports tasks such as average page
views for each page in a website on a daily basis.
Our Aggregation Query
“numKills”: { $sum: “$numKills” }
Our Aggregation Query$sort
The $sort pipeline operator sorts all input documents and returns them to
the pipeline in sorted order.
{ $sort : { <sort-key> } }
Aggregation Output
{"result" : [
{"_id" : {"displayname" : "My L1ttl3 P0wn13","eventhour" : 21},"numKills" : 133},{"_id" : {"displayname" : "Kurious Killer","eventhour" : 21},"numKills" : 130},
// ******* Omitted for brevity *******{"_id" : {"displayname" : "Sniper the Clown","eventhour" : 2},"numKills" : 6}
],"ok" : 1
}
Produces a document with two fields: result and ok
Aggregation Output
{"result" : [
{"_id" : {"displayname" : "My L1ttl3 P0wn13","eventhour" : 21},"numKills" : 133},{"_id" : {"displayname" : "Kurious Killer","eventhour" : 21},"numKills" : 130},
// ******* Omitted for brevity *******{"_id" : {"displayname" : "Sniper the Clown","eventhour" : 2},"numKills" : 6}
],"ok" : 1
}
Recap: Aggregation Framework
db.collection.aggregate( [ {do something},
{do something else},{do even more stuff}
])
Aggregate command
Pipeline Operators
We’re not quite done…
We can’t really give something like this to our customers:
But if we had…
A database config.
But if we had…
To run our aggregation.
But if we had…
Inside a node server.
Q/A/Comments
Will [email protected]@wfbutton