NoSQL Databases Introduction - UTN 2013
DESCRIPTION
This was one of the workshops that we gave at UTN to Computer Science students.
Agenda
Introduction
SQL overview
Why NoSQL?
Characteristics of NoSQL databases
Use Cases
A NoSQL database in action!
Summary
Introduction
A database is an organized collection of data. The data are
typically organized to model relevant aspects of reality in a way
that supports processes requiring this information.
Database management systems (DBMSs) are specially designed applications
that interact with the user, other applications, and the database
itself to capture and analyze data.
Formally, the term database refers to the data itself and
supporting data structures. Databases are created to operate on
large quantities of information by inputting, storing, retrieving,
and managing that information.
SQL Databases
Characteristics
SQL is an ANSI and ISO standard computer language for creating and manipulating databases.
SQL allows the user to create, update, delete, and retrieve data from a database.
SQL is very simple and easy to learn.
High Speed: SQL queries can be used to retrieve large amounts of records from a database quickly and efficiently.
Well Defined Standards Exist: SQL databases use a long-established standard, which has been adopted by ANSI and ISO. Non-SQL databases do not adhere to any clear standard.
No Coding Required: Using standard SQL, it is easier to manage database systems without having to write a substantial amount of code.
Transactions: ACID properties (Atomic, Consistent, Isolated, Durable)
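To make the transaction bullet concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names, and the transfer helper are hypothetical and exist only for illustration. It shows atomicity: a transfer that would violate a constraint is rolled back as a whole.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit is undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would leave alice with a negative balance
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, yet the final balances only reflect the first transfer, which is the "Atomic" part of ACID.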
What has happened?
Relational databases were introduced in the 1970s to allow applications to store data through a standard data modeling and query language (SQL). Since the rise of the web, the volume of data stored about users, objects, products and events has exploded. Data is also accessed more frequently, and is processed more intensively. For example, social networks create hundreds of millions of customized, real-time activity feeds for users based on their connections' activities.
In response to this demand, computing infrastructure and deployment strategies have also changed dramatically. Low-cost, commodity cloud hardware has emerged to replace vertical scaling on highly complex and expensive single-server deployments. And engineers now use agile development methods, which aim for continuous deployment and short development cycles, to allow for quick response to user demand for features.
NoSQL Databases
But... what's NoSQL?
A NoSQL database provides a
mechanism for storage and retrieval
of data that employs less constrained
consistency models than traditional
relational databases.
NoSQL systems are also referred to as
"Not only SQL" to emphasize that
some of them do in fact allow SQL-like
query languages to be used.
Characteristics
Large data volumes (such as Google's "big data")
Scalable replication and distribution
Potentially thousands of machines
Potentially distributed around the world
Queries need to return answers quickly
Mostly query, few updates
Asynchronous Inserts & Updates
Schema-less
ACID transaction properties are often relaxed in favor of BASE (Basically Available, Soft state, Eventual consistency).
CAP Theorem
Open source development
CAP Theorem
According to the theorem, a distributed
system cannot satisfy all three of these
guarantees at the same time: consistency,
availability, and partition tolerance.
Eventual consistency guarantees that if no
new updates are made to a given data item,
eventually all accesses to that item will
return the last updated value.
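The eventual consistency guarantee above can be sketched with a toy in-memory simulation. The Replica and Cluster classes are hypothetical, and the last-writer-wins version counter is an assumed conflict rule, not the mechanism of any particular database.

```python
class Replica:
    """One copy of the data; stores (version, value) per key."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def apply(self, key, version, value):
        # Last-writer-wins: keep only the entry with the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)


class Cluster:
    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # replication messages not yet delivered
        self.version = 0

    def write(self, key, value):
        # The write is acknowledged after reaching a single replica.
        self.version += 1
        self.replicas[0].apply(key, self.version, value)
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, self.version, value))

    def deliver_all(self):
        # Simulates background replication catching up (here, out of order).
        while self.pending:
            replica, key, version, value = self.pending.pop()
            replica.apply(key, version, value)


cluster = Cluster(3)
cluster.write("user:1", "Ana")
cluster.write("user:1", "Ana Maria")
stale = [r.read("user:1") for r in cluster.replicas]      # replicas disagree
cluster.deliver_all()                                      # no new updates arrive
converged = [r.read("user:1") for r in cluster.replicas]  # all return the last value
print(stale, converged)
```

Before delivery, two replicas have not seen the key at all; once updates stop and replication catches up, every replica returns the last written value.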
Taxonomy
The basic classification that most would
agree on is based on data model. A few
of these and their prototypes are:
Column: HBase, Accumulo
Document: MongoDB, Couchbase
Key-value: Dynamo, Riak, Redis, Cache, Project Voldemort
Graph: Neo4j, Allegro, Virtuoso
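As a rough illustration of how these data models differ, the sketch below represents the same hypothetical user with plain Python structures. The keys, field names, and layouts are assumptions made for the example and do not reproduce the storage format of any listed product.

```python
import json

# Key-value: the store sees an opaque value looked up by its key.
key_value = {"user:42": '{"name": "Ana", "city": "Cordoba"}'}

# Document: the store understands the nested structure and can query inside it.
document = {"_id": 42, "name": "Ana", "address": {"city": "Cordoba"}, "follows": [7]}

# Column family: row key -> column family -> column -> value.
column_family = {
    "42": {
        "profile": {"name": "Ana", "city": "Cordoba"},
        "social": {"follows:7": "1"},
    }
}

# Graph: nodes plus typed edges between them.
nodes = {42: {"name": "Ana"}, 7: {"name": "Luis"}}
edges = [(42, "FOLLOWS", 7)]

# With a key-value store, the application must decode the value itself.
name_from_kv = json.loads(key_value["user:42"])["name"]

# With a graph, a relationship query is a traversal over the edges.
followed = [nodes[dst]["name"] for src, rel, dst in edges if src == 42 and rel == "FOLLOWS"]
print(name_from_kv, followed)
```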
MapReduce
A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).
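The student example can be sketched in a single process as follows. The student names and the function names map_phase, shuffle, and reduce_phase are hypothetical; a real MapReduce framework would run these phases distributed across many machines.

```python
from collections import defaultdict
from functools import reduce

students = ["Ana Perez", "Luis Gomez", "Ana Diaz", "Maria Ruiz", "Luis Sosa", "Ana Vera"]


def map_phase(records):
    # Emit one (key, value) pair per student: (first name, 1).
    for full_name in records:
        yield full_name.split()[0], 1


def shuffle(pairs):
    # Group values by key: these are the "queues", one per first name.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Summarize each queue by adding up its entries.
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


name_frequencies = reduce_phase(shuffle(map_phase(students)))
print(name_frequencies)
```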
NoSQL is not a magic solution
Inconsistent APIs between NoSQL providers.
Denormalized data requires you to maintain your own data relationships
in code.
Limited operational tooling for DevOps / IT.
Lack of complicated queries requires joins / aggregations / filters to be
done in code (except for MapReduce).
In key-value stores, the whole value must be read or written to access any part of it.
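To show what "joins and aggregations done in code" can look like, here is a minimal sketch in which plain Python collections stand in for two NoSQL collections. The users and checkins data and both helper functions are hypothetical.

```python
# Two "collections" as they might come back from separate lookups.
users = {1: {"name": "Ana"}, 2: {"name": "Luis"}}
checkins = [
    {"user_id": 1, "venue": "Cafe"},
    {"user_id": 2, "venue": "Library"},
    {"user_id": 1, "venue": "Gym"},
]


def checkins_with_names(users, checkins):
    # Application-side join: roughly what a SQL JOIN on user_id would return.
    return [
        (users[c["user_id"]]["name"], c["venue"])
        for c in checkins
        if c["user_id"] in users
    ]


def count_by_user(rows):
    # Application-side aggregation: roughly a GROUP BY name with COUNT(*).
    counts = {}
    for name, _venue in rows:
        counts[name] = counts.get(name, 0) + 1
    return counts


joined = checkins_with_names(users, checkins)
counts = count_by_user(joined)
print(joined, counts)
```

In a relational database both steps would be a single query; here the application owns the logic and must keep it correct as the data shape changes.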
NoSQL Use Cases:
SAP uses MongoDB as a core component of SAP's platform-as-a-service
(PaaS) offering.
Foursquare uses MongoDB to store venues and user ‘check-ins’ into
venues, sharding the data over more than 25 machines on Amazon EC2.
MongoDB is used for back-end storage on the SourceForge front pages,
project pages, and download pages for all projects.
Codecademy is the easiest way to learn to code online.
Guardian.co.uk is a leading UK-based news website.
EA Sports: MongoDB is being used for the game feeds component.
NoSQL Use Cases:
AOL: “We selected Couchbase after evaluating several open source products to power our next-generation backend ad serving platform”.
Zynga’s FarmVille, Café World, Mafia Wars and other games have over 235 million active users per month. We rely on technology from Couchbase to make that possible.
In the PayPal Media Network Advertising Pipeline, Couchbase is used to build a scalable cross channel audience profiling, segmentation, identity mapping & frequency capping.
LinkedIn built a durable and scalable index for its metrics visualization engine using Couchbase.
Skyscanner scaled one of its flight search APIs from 100,000 searches a day to over 3 million, introducing Couchbase into its tech stack.
Other use cases
Netflix is using Amazon SimpleDB.
Twitter uses Cassandra, Hadoop, and HBase, among others.
Facebook and Instagram are both using Cassandra.
Google uses Bigtable (the design that Hadoop's HBase is modeled on).
LinkedIn uses Voldemort.
Etc
Summary
This is just the tip of the iceberg.
From now on, the rest is up to you!
SQL works great, but can't easily scale
for large data.
NoSQL works great, but doesn't fit
every case.
Use SQL + NoSQL
Thanks!
Backup
JSON
JSON, or JavaScript Object Notation, is a text-based open standard
designed for human-readable data interchange. Derived from the
JavaScript scripting language, JSON is a language for representing simple
data structures and associative arrays, called objects. Despite its
relationship to JavaScript, JSON is language-independent, with parsers
available for many languages.
Sample:
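A minimal, hypothetical sample (the field names and values are invented for illustration), parsed and re-serialized with Python's standard json module:

```python
import json

# A JSON object with a string, a number, an array, and a nested object.
text = """
{
  "name": "Ana",
  "age": 23,
  "courses": ["Databases", "Algorithms"],
  "address": {"city": "Cordoba"}
}
"""

student = json.loads(text)                        # JSON text -> Python dict
round_trip = json.dumps(student, sort_keys=True)  # Python dict -> JSON text

print(student["courses"][1])
print(round_trip)
```

Because JSON is language-independent, the same text could be parsed by equivalent libraries in other languages.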