back to basics webinar 4: advanced indexing, text and geospatial indexes
TRANSCRIPT
MongoDB Europe 2016Old Billingsgate, London
15th November
Use my code JD20 for 20% off ticketsmongodb.com/europe
Back to Basics 2016 : Webinar 4
Advanced Indexing – Text and Geospatial Indexes
Joe DrumgooleDirector of Developer Advocacy, EMEA
@jdrumgoole
V1.1
3
Recap
• Webinar 1 – Introduction to NoSQL– The different types of NoSQL databases– What kind of database is MongoDB? A document database.
• Webinar 2 – My First Application– Creating databases and collections– CRUD operations– Indexes and Explain
• Webinar 3 – Schema Design– Dynamic schema– Embedding approaches– Examples
4
Indexing
• An efficient way to look up data by its value• Avoids table scans
5
Traditional Databases Use Btrees
• … and so does MongoDB
6
Queries, Inserts, Deletes O(Log(n) Time
7
Creating a Simple Index
db.coll.createIndex( { fieldName : <Direction> } )
Database Name
Collection Name
Command
Field Name to be indexed
Ascending : 1 Descending : -1
8
Two Other Kinds of Indexes
• Full Text Index– Allows searching inside the text of a field ( Lucene, Solr and Elastic
Search)• Geospatial Index
– Allows searching by location (e.g. people near me)• These indexes do not use Btrees
9
Full Text Indexes
• An “inverted index” on all the words inside a single field (only one text index per collection)
{ “comment” : “I think your blog post is very interesting and informative. I hope you will post more info like this in the future” }
>> db.posts.createIndex( { “comments” : “text” } )
MongoDB Enterprise > db.posts.find( { $text: { $search : "info" }} ){ "_id" : ObjectId(“…"), "comment" : "I think your blog post is very interesting and informative. I hope you will post more info like this in the future" }MongoDB Enterprise >
10
Results
MongoDB Enterprise > db.posts.getIndexes()...
{"v" : 1,"key" : {
"_fts" : "text","_ftsx" : 1
},"name" : "comment_text","ns" : "test.posts","weights" : {
"comment" : 1},"default_language" : "english","language_override" : "language","textIndexVersion" : 3
}
11
Dropping Text Indexes
• We drop text indexes by name rather than shapedb.posts.getIndexes()
{"v" : 1,"key" : {
"_fts" : "text","_ftsx" : 1
},"name" : "comment_text_text","ns" : "test.posts","weights" : {
"comment" : 5,"tags" : 10
},"default_language" : "english","language_override" : "language","textIndexVersion" : 3
}
12
Hence
MongoDB Enterprise > db.posts.dropIndex( "comment_text_tags_text" ){ "nIndexesWas" : 2, "ok" : 1 }MongoDB Enterprise >
• You can give an index an explict name to make this easier
MongoDB Enterprise > db.posts.createIndex( { "comments" : "text", "tags" : "text" }, { "name" : "text_index" } ){
"createdCollectionAutomatically" : false,"numIndexesBefore" : 1,"numIndexesAfter" : 2,"ok" : 1
}
13
On The Server
I INDEX [conn275] build index on: test.posts properties: { v: 1, key: { _fts: "text", _ftsx: 1 }, name: "comment_text", ns: "test.posts", weights: { comment: 1 }, default_language: "english", language_override: "language", textIndexVersion: 3 }}I INDEX [conn275] building index using bulk methodI INDEX [conn275] build index done. scanned 3 total records. 0 secs
14
More Detailed Example
>> db.posts.insert( { "comment" : "Red yellow orange green" } )>> db.posts.insert( { "comment" : "Pink purple blue" } )>> db.posts.insert( { "comment" : "Red Pink" } )
>> db.posts.find( { "$text" : { "$search" : "Red" }} ){ "_id" : ObjectId(“…”), "comment" : "Red yellow orange green" }{ "_id" : ObjectId( »…"), "comment" : "Red Pink" }>> db.posts.find( { "$text" : { "$search" : "Red Green" }} ){ "_id" : ObjectId(« …"), "comment" : "Red Pink" }{ "_id" : ObjectId(« …"), "comment" : "Red yellow orange green" }>> db.posts.find( { "$text" : { "$search" : "red" }} ) # <- Case Insensitve{ "_id" : ObjectId(“…"), "comment" : "Red yellow orange green" }{ "_id" : ObjectId(«…”), "comment" : "Red Pink" }>>
15
Using Weights
• We can assign different weights to different fields in the text index• E.g. I want to favour tags over comments in searching• So I increase the weight for the the tags field
>> db.blog.createIndex( { comment: "text", tags : "text” }, { weights: { comment: 5, tags : 10 }} )• Now searches will favour tags
16
$textscore
• Weights impact $textscore:
>> db.posts.find( { "$text" : { "$search" : "Red" }}, { score: { $meta: "textScore" }} ).sort( { score: { $meta: "textScore" } } ){ "_id" : …, "comment" : "hello", "tags" : "Red green orange", "score" : 6.666666666666666 }{ "_id" : …, "comment" : "Red Pink", "score" : 3.75 }{ "_id" : …, "comment" : "Red yellow orange green", "score" : 3.125 }>>
17
Other Parameters
• Language : Pick the language you want to search in e.g. – $language : Spanish
• Support case sensitive searching– $caseSensitive : True (default false)
• Support accented characters (diacritic sensitive search e.g. café is distinguished from cafe )– $diacriticSensitive : True (default false)
Geospatial Indexes
19
Geospatial Indexes
• MongoDB supports 2D Sphere indexes• Allows a user to represent location on the earth (which is a sphere)• Coordinates are stored in GeoJSON format• The Geospatial index supports subset of the GeoJSON operations• The index is based on a QuadTree representation• Index is based on WGS 84 standard
20
Coordinates
• Coordinates are represented as longitude, latitude• longitude
– Measured from Greenwich meridian in London (0 degrees) locations east (up to 180 degrees)
– For locations west we specify as negative • Latitude
– Measured from equator north and south (0 to 90 north, 0 to -90 south)• Coordinates in MongoDB are stored on Longitude/Latitude order• Coordinates in Google are stored in Latitude/Longitude order
21
2DSphere Versions
• Three versions of 2dSphere index in MongoDB• Version 1 : Up to MongoDB 2.4• Version 2 : From MongoDB 2.6 onwards• Version 3 : From MongoDB 3.2 onwards• We will only be talking about Version 3 in this webinar
22
Creating a 2dSphere Index
db.collection.createIndex ( { <location field> : "2dsphere" } )
• Location field must be coordinate or GeoJSON data
23
Example
>> db.test.createIndex( { loc : "2dsphere" } ){
"createdCollectionAutomatically" : false,"numIndexesBefore" : 1,"numIndexesAfter" : 2,"ok" : 1
}
24
Output
>> db.test.getIndexes()[
{"v" : 1,"key" : {
"loc" : "2dsphere"},"name" : "loc_2dsphere","ns" : "geo.test","2dsphereIndexVersion" : 3
}]>>
25
Use a Simple Dataset to investigate Geo Queries
• Lets search for restaurants in Manhattan• Using two candidate collections
– https://raw.githubusercontent.com/mongodb/docs-assets/geospatial/neighborhoods.json– https://raw.githubusercontent.com/mongodb/docs-assets/geospatial/restaurants.json
• Import them into MongoDB– mongoimport –c neighborhoods –d geo neighborhoods.json– mongoimport –c restaurants –d geo restaurants.json
26
Neighborhood Document
MongoDB Enterprise > db.neighborhoods.findOne(){
"_id" : ObjectId("55cb9c666c522cafdb053a1a"),"geometry" : {"coordinates" : [[[-73.94193078816193,40.70072523469547],
...[-73.94409591260093,40.69897295461309],
]
"type" : "Polygon"},"name" : "Bedford"
}
27
Restaurant Document
MongoDB Enterprise > db.restaurants.findOne(){
"_id" : ObjectId("55cba2476c522cafdb053adf"),"location" : {
"coordinates" : [-73.98241999999999,40.579505
],"type" : "Point"
},"name" : "Riviera Caterer"
}MongoDB Enterprise >
You can type this into google maps but
remember to reverse the coordinate order
28
Add Indexes
MongoDB Enterprise > db.restaurants.createIndex({ location: "2dsphere" }){
"createdCollectionAutomatically" : false,"numIndexesBefore" : 1,"numIndexesAfter" : 2,"ok" : 1
}MongoDB Enterprise > db.neighborhoods.createIndex({ geometry: "2dsphere" }){
"createdCollectionAutomatically" : false,"numIndexesBefore" : 1,"numIndexesAfter" : 2,"ok" : 1
}MongoDB Enterprise >
29
Use $geoIntersects to find our Neighborhood
• Assume we are at -73.93414657, 40.82302903• What neighborhood are we in? Use $geoIntersects
db.neighborhoods.findOne({ geometry: { $geoIntersects: { $geometry: { type: "Point", coordinates: [ -73.93414657, 40.82302903 ]}}}})
30
Results
{"geometry" : {
”coordinates" : [[
-73.9338307684026,40.81959665747723
], ...
[-73.93383000695911,40.81949109558767
] ]
"type" : "Polygon"},"name" : "Central Harlem North-Polo Grounds"
}
31
Find All Restaurants within 0.35 km
db.restaurants.find({ location: { $geoWithin: { $centerSphere: [ [ -73.93414657, 40.82302903 ], 5 / 6,378.1 ] } } })
Distance in km Divide by radius of earth to convert to radians
32
Results – (Projected)
{ "name" : "Gotham Stadium Tennis Center Cafe" }{ "name" : "Chuck E. Cheese'S" }{ "name" : "Red Star Chinese Restaurant" }{ "name" : "Tia Melli'S Latin Kitchen" }{ "name" : "Domino'S Pizza" }
• Without projection
{ "_id" : ObjectId("55cba2476c522cafdb0550aa"), "location" : { "coordinates" : [ -73.93795159999999, 40.823376 ], "type" : "Point" }, "name" : "Domino'S Pizza" }
33
Summary of Operators
• $geoIntersect: Find areas or points that overlap or are adjacent
• $geoWithin: Find areas on points that lie within a specific area• $geoNear: Returns locations in order from nearest to furthest
away
34
Summary
• Text Indexes : Full text searching of all the text items in a collection
• Geospatial Indexes : Search by location, by intersection or by distance from a point
35
Q & A
37
• This is slide content
41
42
LOREM IPSUM
LOREM IPSUM
LOREM IPSUM
LOREM IPSUM
Sollicitudin VenenatisLOREM IPSUM
LOREM IPSUM
LOREM IPSUM
LOREM IPSUM
Graphic Element Examples
Porta Ultricies
Commodo Porta
Graph Examples
Category 1 Category 2 Category 3 Category 40
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
Series 1Series 2
Category 1 Category 2 Category 3 Category 40
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
Series 1Series 2
{ _id : ObjectId("4c4ba5e5e8aabf3"), employee_name: "Dunham, Justin", department : "Marketing", title : "Product Manager, Web", report_up: "Neray, Graham", pay_band: “C", benefits : [ { type : "Health", plan : "PPO Plus" }, { type : "Dental", plan : "Standard" }
] }
Code/Highlight Example
Aggregation Framework Agility Backup Big Data Briefcase
Buildings Business Intelligence Camera Cash Register Catalog
Chat Checkmark Checkmark Cloud Commercial Contract
Computer Content Continuous Development Credit Card Customer Success
Data Center Data Variety Data Velocity Data Volume Data Warehouse Database
Dialogue Directory Documents Downloads Drivers Dynamic Schema
EDW Integration Faster Time to Market File Transfer Flexible Gear Hadoop
Health Check High Availability Horizontal Scaling Integrating into Infrastructure Internet of Things Iterative Development
Life Preserver Line Graph Lock Log Data Lower Cost Magnifying Glass
Man Mobile Phone Meter Monitoring Music New Apps
New Data Types Online Open Source Parachute Personalization Pin
Platform Certification Product Catalog Puzzle Pieces RDBMS Realtime Analytics Rich Querying
Life Preserver RSS Scalability Scale Secondary Indexing Steering Wheel
Stopwatch Text Search Tick Data Training Transmission Tower Trophy
Woman World