NOSQL DATABASES Please remember to read the NOSQL Distilled book and the Seven Databases book

Upload: johnathan-nichols

Post on 31-Dec-2015


Before we start
• The classification of the various NoSQL databases is imprecise and semi-controversial, and we have to be careful about reading too much into it.
• Rather than focusing on categorizing databases, we should be concerned with what they do, how they relate to each other with respect to functionality, and how they compare to SQL databases.

Key-value and key-document DBs
• Databases that access aggregate data
  • Key-value DBs know nothing about the structure of the aggregate
  • Key-document databases do know, but the interpretation of these aggregates happens outside the DB
  • Keep in mind that these two categories of databases overlap in practice
• Importantly, both of these categories focus on storing and retrieving individual aggregates, and not on interrelating (horizontally) multiple aggregates
• There is something similar to this in SQL DBs: highly denormalized tables

Important notions…
• It can be difficult to represent some domains as key-value or key-document databases, as the boundaries of aggregates might not be easy to determine.
• This basic data modeling issue has a lot of influence on the sort of database you should use.
• Relational databases don’t manipulate aggregates; they are aggregate-neutral for the most part, leaving the construction of aggregates to run time. However, we might have hidden, denormalized tables that make some commonly used aggregates much faster to materialize.

Key-value vs. key-document
• In key-value databases, we can only retrieve data via a key
• In key-document databases, we may be able to ask questions about the content of documents, but again, we are not cross-associating them
• Mongo is perhaps the most talked about key-document system, and so we will start there
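
The difference between the two access styles can be sketched in a few lines of JavaScript. This is only an illustration: kvStore, docStore, getByKey, and findByField are hypothetical in-memory stand-ins, not the API of any real database.

```javascript
// Hypothetical in-memory stand-ins; not the API of any real database.

// Key-value store: the value is an opaque blob, so lookup by key is the only access path.
const kvStore = new Map();
kvStore.set("order:1", JSON.stringify({ customerId: "99", total: 40 }));
kvStore.set("order:2", JSON.stringify({ customerId: "17", total: 15 }));

function getByKey(key) {
  return kvStore.get(key); // returns the stored blob as-is, or undefined
}

// Key-document store: the store can see inside each aggregate,
// so it can answer questions about document content.
const docStore = new Map();
docStore.set("order:1", { customerId: "99", total: 40 });
docStore.set("order:2", { customerId: "17", total: 15 });

function findByField(field, value) {
  return [...docStore.values()].filter((doc) => doc[field] === value);
}

const blob = getByKey("order:1");
const matches = findByField("customerId", "99");
console.log(typeof blob, matches.length);
```

In both cases each lookup returns whole aggregates; neither function relates one order to another.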

Installing Mongo
• Mongo
  • http://docs.mongodb.org/manual/installation
• A GUI
  • http://www.mongodb.org/display/DOCS/Admin+UIs

Mongo overview
• Document based
• Focuses on clusters for extremely large scaling
• Supports nested documents
• Uses JavaScript for queries
• No schema

Terminology
• A database consists of collections
• Collections are made up of documents
• A document is made up of fields
• There are also indices
• There are also cursors

When to use Mongo
• Medical records and other large document systems
• Read-heavy environments like analytics and mining
• Partnered with relational databases
  • Relational for live data
  • Mongo for huge, largely read-only archives
• Online applications
• Massively wide e-commerce

Mongo documents and queries
• Documents
  • Self-defining, with hierarchical structure
  • Like XML
  • Or JSON, which uses JavaScript object syntax to define docs in a human-readable form
• Documents can vary in structure, even in the same collection
• You can add attributes to new documents in a collection without having to change the existing ones in the collection
• Queries: db.order.find({"customerId": "99"})
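
A rough sketch of these ideas in plain JavaScript, with no database involved. The array named order and the find helper are assumptions made for illustration; the helper only mimics the equality-matching behaviour of a query like the one above and is not the Mongo shell's find.

```javascript
// A plain array stands in for the "order" collection (an assumption for illustration).
const order = [
  { _id: 1, customerId: "99", items: [{ sku: "A1", qty: 2 }] },
  { _id: 2, customerId: "42", items: [] },
];

// A new document can carry an attribute the older ones lack;
// nothing already in the collection has to change.
order.push({ _id: 3, customerId: "99", items: [], giftWrap: true });

// Equality match on every field of the query, similar in spirit to
// db.order.find({"customerId": "99"}) in the Mongo shell.
function find(collection, query) {
  return collection.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
}

const hits = find(order, { customerId: "99" });
console.log(hits.map((doc) => doc._id));
```

Note that customerId is stored as the string "99", so a query for the number 99 would not match in this sketch.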

Consistency and transactions
• There is a tailorable consistency command that can be used to set the level you want for updating replicas of documents
• No multi-document atomic transactions are supported (MongoDB added them later, in version 4.0)
• CAP theorem, which basically says that when the network partitions there is a tradeoff between availability and consistency
• You can embed references to other documents in a document, but this tends to create a “join effect”
  • DBRef is the convention for this
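
The "join effect" can be sketched as follows. The $ref and $id field names follow the DBRef convention; everything else (the db object, the collection contents, and the resolveRef helper) is hypothetical and kept in memory for illustration.

```javascript
// Two hypothetical collections held in memory, keyed by _id.
const db = {
  customers: new Map([["c99", { _id: "c99", name: "Ada" }]]),
  orders: new Map([
    ["o1", { _id: "o1", total: 40, customer: { $ref: "customers", $id: "c99" } }],
  ]),
};

// Following a reference is a second lookup done by the application:
// this is the "join effect" the slide mentions.
function resolveRef(database, ref) {
  const collection = database[ref.$ref];
  return collection ? collection.get(ref.$id) : undefined;
}

const orderDoc = db.orders.get("o1");
const customer = resolveRef(db, orderDoc.customer);
console.log(customer.name);
```

Every referenced document costs an extra round trip, which is why embedding is usually preferred when the data belongs to the same aggregate.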

Selectors
• Used for finding, counting, updating, and removing docs from collections
• {} is the null search and matches all documents
• We could run: {gender: 'f'}
• {field1: value1, field2: value2} creates an 'and' operation
• Also, less than, greater than, etc. (e.g., $gt)
• $exists, $or
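
To make the selector rules concrete, here is a small matcher written in plain JavaScript. It is an illustrative sketch covering only equality, $gt, $lt, $exists, and top-level $or; it is not MongoDB's implementation, and the people array and helper names are made up for the example.

```javascript
// Illustrative matcher for a small subset of selector syntax:
// equality, $gt, $lt, $exists, and top-level $or. Not MongoDB's implementation.
function matchesCondition(doc, field, condition) {
  const present = Object.prototype.hasOwnProperty.call(doc, field);
  const value = doc[field];
  const isOperatorObject =
    condition !== null && typeof condition === "object" && !Array.isArray(condition);
  if (!isOperatorObject) return present && value === condition; // plain equality
  return Object.entries(condition).every(([op, arg]) => {
    switch (op) {
      case "$gt": return present && value > arg;
      case "$lt": return present && value < arg;
      case "$exists": return present === arg;
      default: throw new Error(`unsupported operator: ${op}`);
    }
  });
}

function matches(doc, selector) {
  // Every top-level entry must hold, which gives the implicit 'and'.
  return Object.entries(selector).every(([key, condition]) =>
    key === "$or"
      ? condition.some((sub) => matches(doc, sub))
      : matchesCondition(doc, key, condition)
  );
}

const people = [
  { name: "Ann", gender: "f", age: 34 },
  { name: "Bob", gender: "m", age: 51 },
  { name: "Cy", age: 19 },
];
const select = (selector) => people.filter((p) => matches(p, selector)).map((p) => p.name);

console.log(select({}));                                   // null search: everyone
console.log(select({ gender: "f" }));                      // equality
console.log(select({ gender: "m", age: { $gt: 50 } }));    // implicit 'and'
console.log(select({ gender: { $exists: false } }));       // field is absent
console.log(select({ $or: [{ age: { $lt: 20 } }, { gender: "f" }] }));
```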

Some notes on Mongo
• There are a few GUIs that seem pretty good
  • Mongo-vision: http://code.google.com/p/mongo-vision/ (web page)
    • Needs Prudence as a web server
  • MongoVue: http://mongovue.com, but Windows only
  • RockMongo (web based): http://rockmongo.com/ (web page)
    • Needs an Apache web server
• Very easy to install, just download
  • http://docs.mongodb.org/manual/installation

Getting an Apache web server
• XAMPP for Windows (Mac version is way out of date)
• MAMP for Macs (on the App Store)
• WAMP for Windows (bitnami.org)
• All of these give you PHP and MySQL as well. If we have time, we will look at MySQL full text search.
• You might want to install PostgreSQL, too. There is a Bitnami stack. If there is time, we will look at PostgreSQL UDTs and full text search.

Another document DB: CouchDB
• Major focus: surviving network problems
• Engineered for web use
• No ad hoc querying; searching is via map reduce-based indices
• We will get back to CouchDB

Map Reduce
• Focus is on performing data operations on parallel hardware
• This is a paradigm, not a specific programmatic technique
• Each map reduce process has two phases
  • Convert a list into a desired sort of list with the map operator
  • Convert the new list into a small number of atomic values via a reduce operator
• This allows us to spread a process across a wide array of servers, with each server performing an independent map reduce process
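
The two phases can be shown on a single machine with JavaScript's built-in array methods. This is a minimal sketch of the paradigm only; the orders data is invented for the example and no distribution across servers is attempted here.

```javascript
// Map phase: turn a list of hypothetical order records into a list of amounts.
const orders = [
  { id: 1, amount: 40 },
  { id: 2, amount: 15 },
  { id: 3, amount: 25 },
];
const amounts = orders.map((o) => o.amount);

// Reduce phase: collapse the mapped list into a single atomic value.
const total = amounts.reduce((sum, amount) => sum + amount, 0);
console.log(total);
```

Because the map step treats each record independently, the input list can be cut into pieces and handled by different servers without coordination.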

Map reduce example, from Seven DBs
• Map phase: go through a list of items, find all that are related to Canada, and turn them into 1’s
• Reduce phase: compress this second list by adding up the 1’s to get the cardinality
• The first list could be spread across an array of machines, with the partial results being funneled into a smaller number of machines, and the final result computed on a single machine.
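
A small sketch of the Canada count, with the list already split into partitions. The records, the partitions array, and the mapPhase and reducePhase helpers are assumptions for illustration (this is not the code from the Seven Databases book); plain arrays stand in for separate machines.

```javascript
// Hypothetical records, already split across three "machines" (plain arrays here).
const partitions = [
  [{ city: "Toronto", country: "Canada" }, { city: "Lyon", country: "France" }],
  [{ city: "Vancouver", country: "Canada" }],
  [{ city: "Osaka", country: "Japan" }, { city: "Montreal", country: "Canada" }],
];

// Map: keep the items related to Canada and turn each into a 1.
const mapPhase = (items) => items.filter((i) => i.country === "Canada").map(() => 1);

// Reduce: add up the 1's.
const reducePhase = (values) => values.reduce((sum, v) => sum + v, 0);

// Each machine runs map and reduce on its own partition...
const partialCounts = partitions.map((p) => reducePhase(mapPhase(p)));

// ...and one final reduce combines the partial counts.
const canadaCount = reducePhase(partialCounts);
console.log(partialCounts, canadaCount);
```

The same reducePhase works at both levels because addition is associative, which is what lets the partial results be combined in any grouping.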