mongodb hackathon 02
2
Before we start
Copyright 2013, Vivek A. Ganesan, All rights reserved
o A BIG thank you to our sponsors – Big Data Cloud
o Meeting Space
o Food + Drinks
o Consulting/Training
3
Agenda
o Review of Hackathon 01
o Data Modeling
o Indexing
o Aggregation
o Map/Reduce
4
Introduction
o This is a hackathon, not a class
o Which means we work on stuff together
o Please consult and help your team mates
o There will be labs (that’s when we learn!)
o Talk to your team mates
o Figure out what problem you want to solve
o Think about your data sets and how to model them in MongoDB
5
Review – MongoDB Basics
o MongoDB is a document-oriented NoSQL data store
o It saves data internally as Binary JSON (BSON)
o A mongo data store may hold multiple databases
o A database may have multiple collections (analog of tables)
o A collection is a container of documents
o Documents contain Key/Value pairs
o A default key of “_id” is inserted by MongoDB for all documents
o User can set the value of “_id” to anything they want
o Documents are schema-free
o No fixed structure to a collection
o A collection can have documents with different key/value pairs
6
Review – Shell and Clients
o A Mongo Shell is a CLI client to MongoDB
o Shell commands are JavaScript functions
o You can write your own JavaScript code within the shell
o You can also import JavaScript files using load()
o Mongo Shell looks for an initialization file : ~/.mongorc.js
o Setup global variables here
o To use your favorite editor within the Mongo shell :
o Set the environment variable EDITOR to your editor
o MongoDB supports clients in several programming languages :
o JS, Java, C, C++, C#, Scala, Python, Ruby, Perl and Erlang
7
Review – Mongo DB Objects
o Note : Mongo Shell commands are in blue and output is in green
o Mongo uses a hierarchical naming scheme for database objects
o The current database is always in the db object
o The db command prints the name of the current db
o A collection called “mycollection” in the current database :
o db.mycollection (Note : This is a mongodb object)
o Commands are methods invoked on objects
o E.g., to insert a document into the db.mycollection collection :
o db.mycollection.insert command
o E.g., to find documents in the db.mycollection collection :
o db.mycollection.find command
8
Review – Create
o First exercise :
o Create a new database called “blog”
o Create a collection called “users” and a collection called “posts”
o Solution to first exercise :
o use blog;
o db; => blog
o show collections; => system.indexes
o db.createCollection("users"); => { "ok" : 1 }
o db.createCollection("posts"); => { "ok" : 1 }
o show collections; => posts, system.indexes, users
9
Review – Insert
o Second Exercise :
o In the “users” collection :
o Insert a single document, {username: “admin”}
o In the “posts” collection :
o Insert ten posts using a loop
o Blog data : post_title, post_body and post_tags as CSV
o Solution to Second Exercise :
o db.users.insert({username : "admin"});
o for (var i = 1; i <= 10; i++) { db.posts.insert({post_title: "Title",
post_body: "Post Body", post_tags: "tag1,tag2,tag3,tag4,tag5"});
}
10
Review – Updates with modifier
o Third Exercise :
o In the “posts” collection :
o Update ten posts with an updated_at key and set it to the
current timestamp
o Solution to the Third Exercise :
o Note : MongoDB replaces the entire document for an
update call without a modifier (modifiers start with a
‘$’ symbol)
o db.posts.update({}, {$set : {updated_at: new Date()}},
false, true);
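The note above (replace vs. modifier) can be sketched in plain JavaScript without a running MongoDB. This is only an in-memory illustration: applyUpdate is a hypothetical helper, not a MongoDB API, and it models just the $set modifier.

```javascript
// In-memory sketch only: mimics how an update document is interpreted.
// applyUpdate is a hypothetical helper, not part of MongoDB.
function applyUpdate(doc, update) {
  const hasModifier = Object.keys(update).some((k) => k.startsWith("$"));
  if (!hasModifier) {
    // No modifier: the whole document is replaced, only _id survives.
    return Object.assign({ _id: doc._id }, update);
  }
  // Only $set is modelled here; MongoDB has many more modifiers.
  const result = Object.assign({}, doc);
  if (update.$set) {
    Object.assign(result, update.$set);
  }
  return result;
}

const post = { _id: 1, post_title: "Title", post_body: "Post Body" };
const replaced = applyUpdate(post, { updated_at: "2013-01-01" });
const modified = applyUpdate(post, { $set: { updated_at: "2013-01-01" } });

console.log(replaced); // post_title and post_body are gone
console.log(modified); // original fields kept, updated_at added
```

Without the $set wrapper the title and body would be lost, which is why the solution above uses a modifier.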
11
Review – Selective Updates
o Fourth Exercise :
o In the “posts” collection :
o Update the posts such that the first three posts have a “foo”
tag (use the cursor functionality to iterate)
o Solution to the Fourth Exercise :
o c = db.posts.find().limit(3);
o while ( c.hasNext() ) {
o post = c.next();
o post["post_tags"] = post["post_tags"] + ",foo";
o db.posts.save(post);
o }
12
Review – Mastering find
o In a Mongo Shell,
o Find all posts but extract only the post_title field
o db.posts.find({}, {post_title: 1, _id: 0});
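o List all posts in reverse order of creation (there is no created_on field; the default ObjectId _id begins with a creation timestamp)
o db.posts.find().sort({_id: -1});
o Do the same as above but paginate in sets of three (second page shown)
o db.posts.find().sort({_id: -1}).skip(3).limit(3);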
o Find all posts that contain a tag called “foo”
o db.posts.find({post_tags: /foo/});
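How sort, skip, limit and a projection combine into pages can be sketched in plain JavaScript. The array below is a made-up stand-in for db.posts, and pageOf is a hypothetical helper; numeric _id values are an assumption for readability.

```javascript
// In-memory stand-in for db.posts; documents are made up for illustration.
const posts = [];
for (let i = 1; i <= 10; i++) {
  posts.push({ _id: i, post_title: "Title " + i });
}

// Hypothetical helper: newest first, then skip/limit like the shell cursor.
function pageOf(docs, pageNumber, pageSize) {
  const skip = (pageNumber - 1) * pageSize;
  return docs
    .slice()                                    // copy, input stays untouched
    .sort((a, b) => b._id - a._id)              // like sort({_id: -1})
    .slice(skip, skip + pageSize)               // like skip(skip).limit(pageSize)
    .map((d) => ({ post_title: d.post_title })); // like {post_title: 1, _id: 0}
}

const firstPage = pageOf(posts, 1, 3);
const secondPage = pageOf(posts, 2, 3);
const lastPage = pageOf(posts, 4, 3);

console.log(firstPage);
console.log(secondPage);
console.log(lastPage);
```

Page n corresponds to skip((n - 1) * 3).limit(3), so the skip(3).limit(3) call above is page two.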
13
Review – Modifiers
o Fifth Exercise :
o Modify “posts” collection
o Change the post_tags field to an array instead of a CSV list
o c = db.posts.find();
o while ( c.hasNext() ) {
o post = c.next();
o post["post_tags"] = post["post_tags"].split(",");
o db.posts.save(post);
o }
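One thing to watch with the loop above: running it a second time would fail, because an array has no split method. A minimal sketch of a conversion that is safe to re-run, in plain JavaScript (tagsToArray is a hypothetical helper, not part of the original solution):

```javascript
// Hypothetical helper: returns tags as an array whether the field is
// still a CSV string or has already been converted.
function tagsToArray(tags) {
  if (Array.isArray(tags)) {
    return tags; // already converted, nothing to do
  }
  if (typeof tags !== "string" || tags === "") {
    return []; // missing or empty field becomes an empty array
  }
  return tags.split(",").map((t) => t.trim());
}

const fromCsv = tagsToArray("tag1,tag2,foo");
const again = tagsToArray(fromCsv);
const empty = tagsToArray("");

console.log(fromCsv);
console.log(again);
console.log(empty);
```

Inside the shell loop, this could replace the direct split call: post["post_tags"] = tagsToArray(post["post_tags"]).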
14
Data Modeling
o http://docs.mongodb.org/manual/core/data-modeling/
o When to reference?
o When it makes sense, e.g. many-to-many relationships
o When document size is a concern
o Some drivers may do this automatically
o When to embed?
o When it is “natural”, e.g. blog post and comments
o When there is a need for atomic operations
o When read performance is critical
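The two document shapes can be sketched as plain JavaScript objects. All field names and values here (comments, tag_ids, the tags collection) are assumptions made up for illustration, and resolveTags is a hypothetical helper showing the extra lookup that referencing requires.

```javascript
// Embedded: comments live inside the post, so one read returns everything
// and one update touches a single document.
const embeddedPost = {
  _id: 1,
  post_title: "Title",
  comments: [
    { author: "admin", text: "First comment" },
    { author: "admin", text: "Second comment" },
  ],
};

// Referenced: tags are a separate collection and the post stores only ids,
// which suits many-to-many relationships (a tag belongs to many posts).
const tags = [
  { _id: "t1", name: "mongodb" },
  { _id: "t2", name: "nosql" },
];
const referencedPost = { _id: 2, post_title: "Title", tag_ids: ["t1", "t2"] };

// Hypothetical helper: the second lookup a client (or driver) has to do.
function resolveTags(post, tagCollection) {
  return post.tag_ids
    .map((id) => tagCollection.find((t) => t._id === id))
    .filter((t) => t !== undefined); // ignore dangling references
}

const tagNames = resolveTags(referencedPost, tags).map((t) => t.name);
console.log(embeddedPost.comments.length);
console.log(tagNames);
```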
15
Lab 01 – Model your data set
o Break – 15 minutes
o Lab 01 – 45 minutes - With your team :
o Look at your data set and figure out how you will model it
o How would you bulk load the data?
o How would you handle errors while loading?
o Implement the schema for your data set
o Bulk load a small portion of your data set
o Verify the load and also run some sample queries
o Figure out what queries you would run frequently
16
Indexes
o http://docs.mongodb.org/manual/core/indexes/
o When to index?
o Improve find performance
o Improve sort performance
o Note : There is a performance impact for writes
o What to index?
o Depends on the query
o Usually, most frequently searched for fields
o Sometimes, fields in embedded documents as well
17
Types of Indexes and Options
o Unique indexes (_id has a unique index by default)
o Simple
o Compound Indexes
o Prefix order is important!
o Text indexes
o Sparse Indexes
o Multi-key indexes (for arrays)
o Geospatial and Geohaystack indexes
o Indexes can be built in the background (recommended!)
o Indexes can be named explicitly (definitely recommended!)
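Why prefix order matters for compound indexes can be sketched with a deliberately simplified model in plain JavaScript. usablePrefixLength is a hypothetical helper; the real query planner also weighs sorts, ranges and other factors, so treat this only as a rule of thumb for equality queries.

```javascript
// Simplified model: a compound index helps a query only through its
// leading fields. Count how many leading index fields the query uses.
function usablePrefixLength(indexFields, queryFields) {
  let n = 0;
  for (const field of indexFields) {
    if (!queryFields.includes(field)) {
      break; // the prefix ends at the first index field the query lacks
    }
    n += 1;
  }
  return n;
}

// Hypothetical compound index on (post_title, updated_at).
const index = ["post_title", "updated_at"];

const titleOnly = usablePrefixLength(index, ["post_title"]);
const dateOnly = usablePrefixLength(index, ["updated_at"]);
const both = usablePrefixLength(index, ["updated_at", "post_title"]);

console.log(titleOnly); // 1
console.log(dateOnly);  // 0
console.log(both);      // 2
```

In this model a query on updated_at alone gets no help from the index, while the order of fields inside the query itself does not matter.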
18
Lab 02 – Indexes
o Lab 02 – 30 minutes - With your team :
o Look at the frequent queries from Lab 01 and :
o Which would you index and why?
o What kind of indexes are needed?
o Since this is predominantly a read use case, index away
o Would you use the sparse index? For what and how?
o Would you use the geospatial index? For what and how?
o Would you use the TTL index? For what and how?
19
Aggregation
o Used for “group by”-like queries
o Aggregation Framework (introduced in 2.1)
o http://docs.mongodb.org/manual/aggregation/
o Simple count : db.posts.count();
o Using Aggregation Framework :
db.posts.aggregate([{ $group: { _id: null, count: {$sum: 1}}}]);
o Check the reference for comparison with SQL group by
o Still supports Map/Reduce (older approach and still relevant)
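What the $group stage with {$sum: 1} computes can be sketched in plain JavaScript over an in-memory array. groupCount is a hypothetical helper and the author field is an assumption added for illustration; passing null as the key mirrors _id: null, which puts every document in one group.

```javascript
// Made-up posts; "author" is a hypothetical field added for grouping.
const posts = [
  { post_title: "A", author: "admin" },
  { post_title: "B", author: "admin" },
  { post_title: "C", author: "guest" },
];

// Hypothetical helper: like {$group: {_id: <key>, count: {$sum: 1}}}.
function groupCount(docs, keyField) {
  const counts = new Map();
  for (const doc of docs) {
    const key = keyField === null ? null : doc[keyField];
    counts.set(key, (counts.get(key) || 0) + 1); // $sum: 1 adds one per doc
  }
  return Array.from(counts, ([key, count]) => ({ _id: key, count: count }));
}

const total = groupCount(posts, null);
const perAuthor = groupCount(posts, "author");

console.log(total);
console.log(perAuthor);
```

Grouping on null gives the same number as db.posts.count(); grouping on a field is the "group by" case.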
20
Lab 03 – Aggregation
o Lab 03 – 30 minutes - With your team :
o Figure out what aggregations to run on the data set :
o E.g., average rating per user?
o Or, average number of movies rated by all users?
o Write the queries for these aggregations and test them
o Are indexes helpful in aggregations? Why/Why not?
o Are you better off just doing these in your client code?
Why/Why not?
o When would you use pipelined aggregations?
21
Map/Reduce
o Scatter/Gather framework
o db.collection.mapReduce(map_fn, red_fn, {out: output_coll})
o http://docs.mongodb.org/manual/aggregation/
o Mapper – just emits key/value pairs
o Framework – Groups and sorts mapper output => Reducer
o Reducer – Applies a function on the input => Output Coll.
o Distributed computation framework for full table scans
o http://docs.mongodb.org/manual/tutorial/map-reduce-examples/
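The mapper / framework / reducer flow above can be sketched in plain JavaScript with no database. This is an in-memory imitation, not MongoDB's implementation: in the real shell the map function reads the document from this and calls a global emit, whereas here both are passed as arguments. The tag-count example and its data are made up for illustration.

```javascript
// In-memory imitation of the three phases: map, group/sort, reduce.
function mapReduce(docs, mapFn, reduceFn) {
  const grouped = new Map();
  const emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };
  for (const doc of docs) {
    mapFn(doc, emit); // mapper: just emits key/value pairs
  }
  return Array.from(grouped.keys())
    .sort() // framework: groups and sorts the mapper output
    .map((key) => {
      const values = grouped.get(key);
      // As in MongoDB, a key with a single value skips the reducer.
      const value = values.length === 1 ? values[0] : reduceFn(key, values);
      return { _id: key, value: value };
    });
}

const posts = [
  { post_tags: ["tag1", "foo"] },
  { post_tags: ["tag1"] },
  { post_tags: ["tag2", "foo"] },
];

const mapTags = (doc, emit) => doc.post_tags.forEach((tag) => emit(tag, 1));
const sumValues = (key, values) => values.reduce((a, b) => a + b, 0);

const tagCounts = mapReduce(posts, mapTags, sumValues);
console.log(tagCounts);
```

Because a reducer may be skipped or applied to partial results, it should return the same type it receives, as the sum does here.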
22
Lab 04 – Map/Reduce
o Lab 04 – 30 minutes - With your team :
o Go through the Map/Reduce examples
o Figure out what Map/Reduce functions you would use
o Implement these functions (on a small data set)
o Some things to think about :
o Can you use Map/Reduce to “seed” your
recommendations?
o Can you use incremental Map/Reduce to “update”
your recommendations? How would you do this?
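As one possible starting point for the incremental question: MongoDB's mapReduce accepts an out option of the form {reduce: "collection"}, which re-applies the reducer to merge new results with existing ones that share a key. The plain JavaScript sketch below imitates that merge in memory; mergeReduce and the sample counts are hypothetical.

```javascript
// Reducer must accept its own output as input for this to work.
const sumValues = (key, values) => values.reduce((a, b) => a + b, 0);

// Hypothetical helper: merge a fresh batch of results into earlier output,
// re-reducing wherever both sides have the same key.
function mergeReduce(previous, fresh, reduceFn) {
  const merged = new Map(previous.map((r) => [r._id, r.value]));
  for (const r of fresh) {
    if (merged.has(r._id)) {
      merged.set(r._id, reduceFn(r._id, [merged.get(r._id), r.value]));
    } else {
      merged.set(r._id, r.value);
    }
  }
  return Array.from(merged, ([key, value]) => ({ _id: key, value: value }));
}

// Made-up results: yesterday's output plus today's new batch.
const previous = [{ _id: "foo", value: 2 }, { _id: "tag1", value: 2 }];
const fresh = [{ _id: "foo", value: 1 }, { _id: "tag9", value: 4 }];

const updated = mergeReduce(previous, fresh, sumValues);
console.log(updated);
```

The fresh batch would come from running Map/Reduce over only the documents added since the last run, for example by filtering on a timestamp.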
23
Questions? Comments?
Thank You!
E-mail: [email protected] : onevivek