NoSQL - The Sequel


Upload: avi-kapuya

Post on 09-Apr-2018


8/8/2019 NoSql - The Sequel

http://slidepdf.com/reader/full/nosql-the-sequel 1/2

Let’s start with a simple question: What is the real difference between NoSQL and SQL? In my view, the different access patterns provided by NoSQL and SQL result in very different scalability and performance.

NoSQL elements allow data access only in a narrow, predefined access pattern. For example, a DHT (Distributed Hash Table) is accessible via a hashtable API; given the exact key, the value is returned. The access pattern for other NoSQL data services is similarly narrow and well-defined, and as a result their scalability and performance are predictable and reliable.
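
To make the narrow access pattern concrete, here is a minimal sketch of a hashtable-style API. It is a toy, in-process stand-in rather than a real distributed service: the class name `ToyDHT`, the node count and the key format are all assumptions made for illustration.

```python
import hashlib


class ToyDHT:
    """Toy, in-process stand-in for a distributed hash table (hypothetical)."""

    def __init__(self, node_count=4):
        # Each "node" is just a dict here; a real DHT spreads these across machines.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key to decide which node owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        # Only an exact-key lookup exists; there is no query by value or by range.
        return self._node_for(key).get(key)


dht = ToyDHT()
dht.put("user:42", {"name": "Ada"})
found = dht.get("user:42")    # exact key: the value comes back
missing = dht.get("user:4")   # a partial key is simply a different key
```

Because every operation touches exactly one node chosen by the key, the cost of a call does not depend on how the rest of the data is laid out, which is where the predictability comes from.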

In SQL, the access pattern is not known in advance. The tables are modeled, assumptions are made regarding the access patterns, and these assumptions are translated into predefined optimizations like index definitions. SQL is by definition a generic language that allows access to data in various ways. The programmer also has limited control over the execution of SQL statements; mostly, the database engine is responsible for optimizing their execution. In other words, in SQL the data model does not enforce a specific way to work with the data. It is built with an emphasis on data integrity, simplicity, data normalization and abstraction, which are all extremely important for large, complex applications.
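
As a rough illustration of the SQL side, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns and the index name are assumptions invented for the example. It shows an index definition encoding an assumption about access patterns, the engine (not the programmer) deciding how to execute a query, and a second, unplanned query working against the same model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.5), ("alice", 7.25)],
)


def plan(sql):
    # Ask the engine how it intends to run the statement; the last column is the detail text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)


query = "SELECT total FROM orders WHERE customer = 'alice'"
plan_before = plan(query)

# The "assumption translated into a predefined optimization": lookups by customer are common.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_after = plan(query)

# A different access pattern nobody planned for still works against the same model.
totals = conn.execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

The query text is identical before and after the index is created; only the engine's plan changes, which is the limited control over execution described above.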

Why NoSQL

The NoSQL approach presents huge advantages over SQL databases because it allows one to scale an application to new levels. The new data services are based on truly scalable structures and architectures, built for the cloud, built for distribution, and are very attractive to the application developer. There’s no need for a DBA, no need for complicated SQL queries, and it is fast. Hooray, freedom for the people!

This is no small matter: a good programmer’s freedom to choose a data model, write a program or an application with familiar tools, reduce dependencies on other people, and test and optimize the code without guesswork or counting on a black box (the DB). No more "Yes, it’s slow on the test system, but someone will take care of it later by tuning the DB." These are all major advantages of the NoSQL movement.

And Why Not… 

There are some disadvantages to the NoSQL approach. These are less visible at the developer level, but are highly visible at the system, architecture and operational levels.

1. At the system level, data models are key. Not having a skilled authority to design a single, well-defined data model, regardless of the technology used, has its drawbacks. The data model may suffer from duplication of data objects (a non-normalized model). This can happen when different developers use different object models and map them differently to the persistency model. At the system level one must also understand the limitations of the chosen data service, whether in size, operations per second, concurrency model, etc.

2. At the architecture level, two major issues are interfaces and interoperability. Interfaces for the NoSQL data services are yet to be standardized. Even DHT, which is one of the simpler interfaces, still has no standard semantics for things such as transactions or a non-blocking API. Each DHT service comes with its own set of interfaces. Another big issue is how different data structures, such as a DHT and a binary tree, just as an example, share data objects. There are no intrinsic semantics for pointers in any of those services. In fact, there’s usually not even strong typing in these services; it’s the developer’s responsibility to deal with that. Interoperability is an important point, especially when data needs to be accessed by multiple services. A simple example: the back office works in Java and web serving works in PHP. Can the data be accessed easily from both domains? Clearly one can put web services in front of the data as a data access layer, but that complicates things even more, and reduces business agility, flexibility and performance while increasing development overhead.

3. Moving to the operational realm: here, in my experience, lies the toughest resistance, and rightfully so. The operational environment requires a set of tools that is not only scalable but also manageable and stable, be it on the cloud or on a fixed set of servers. When something goes wrong, it should not require going through the whole chain and up to the developer level to diagnose the problem. In fact, that is exactly what operations managers regard as an operational nightmare. Have you ever tried getting a developer to diagnose why a payment system is not functioning while he’s at a bar and a few beers in? I’m sure the developer’s date would be impressed by his dedication to his work, but that’s a pretty expensive way to impress someone :) Operations need to be systematic and self-contained. With the current NoSQL services available in the market, this is not easy to achieve, even in managed environments such as Amazon.
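
To illustrate the typing and interoperability burden from point 2, here is a minimal sketch of what "the developer's responsibility" can look like in practice. Everything in it is an assumption for illustration: the `Payment` record, the JSON envelope with a `type` tag, and the plain dict standing in for a key-value data service. JSON is used only because both Java and PHP can read it; it is one possible convention, not a standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Payment:
    payment_id: str
    amount_cents: int
    currency: str


def encode(record):
    # The store keeps opaque bytes, so the type tag has to travel with the data.
    envelope = {"type": type(record).__name__, "data": asdict(record)}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def decode_payment(raw):
    envelope = json.loads(raw.decode("utf-8"))
    # The service will not stop us from reading the wrong kind of object; we must check.
    if envelope.get("type") != "Payment":
        raise TypeError(f"expected Payment, got {envelope.get('type')!r}")
    return Payment(**envelope["data"])


store = {}  # stand-in for a key-value data service (hypothetical)
store["payment:1001"] = encode(Payment("1001", 2599, "USD"))
store["note:1"] = json.dumps({"type": "Note", "data": {"text": "hi"}}).encode("utf-8")

restored = decode_payment(store["payment:1001"])
```

Every team that touches the same keys has to agree on this envelope and reimplement the check in its own language, which is the kind of overhead a standardized interface would remove.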

So, how can we gain the major advantages of the NoSQL approach while keeping the advantages of the SQL approach?

SQL and NoSQL Joined:

A SQL database implementation that uses NoSQL infrastructure is a good solution: a SQL database that is scalable, manageable, cloud-ready, highly available and built entirely on NoSQL infrastructure, but that still provides all the advantages of a SQL database, such as interoperability, well-defined semantics and more.
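
One way to picture the idea is a thin table layer whose only storage primitive is exact-key get and put. The sketch below is a toy under stated assumptions, not a description of any real product: `KVStore`, `TableOnKV`, the key layout (`row/...` and `idx/...`) and the sample data are all hypothetical, and updates, deletes, transactions and SQL parsing are left out entirely.

```python
class KVStore:
    """Stand-in for a NoSQL key-value service: exact-key get/put only (hypothetical)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class TableOnKV:
    """Toy relational-style table stored entirely as key-value pairs."""

    def __init__(self, kv, name, primary_key, indexed_columns=()):
        self.kv = kv
        self.name = name
        self.primary_key = primary_key
        self.indexed_columns = tuple(indexed_columns)

    def _row_key(self, pk):
        return f"row/{self.name}/{pk}"

    def _index_key(self, column, value):
        return f"idx/{self.name}/{column}/{value}"

    def insert(self, row):
        pk = row[self.primary_key]
        self.kv.put(self._row_key(pk), dict(row))
        # Maintain one index entry per indexed column: value -> list of primary keys.
        for column in self.indexed_columns:
            key = self._index_key(column, row[column])
            pks = self.kv.get(key, [])
            if pk not in pks:
                self.kv.put(key, pks + [pk])

    def select_by_pk(self, pk):
        return self.kv.get(self._row_key(pk))

    def select_where(self, column, value):
        # Roughly "SELECT * FROM table WHERE column = value", served by two kinds of key lookups.
        if column not in self.indexed_columns:
            raise ValueError(f"no index on {column!r}; a full scan is not sketched here")
        pks = self.kv.get(self._index_key(column, value), [])
        return [self.kv.get(self._row_key(pk)) for pk in pks]


users = TableOnKV(KVStore(), "users", "id", indexed_columns=["city"])
users.insert({"id": 1, "name": "Ada", "city": "London"})
users.insert({"id": 2, "name": "Linus", "city": "Helsinki"})
users.insert({"id": 3, "name": "Alan", "city": "London"})
london_names = [row["name"] for row in users.select_where("city", "London")]
```

The extra index lookups hint at why such a layer would be slower than using the key-value service directly, while the caller sees a table-like interface instead of raw keys.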

This hybrid would not be as fast as a NoSQL service, but it may be good enough for the 80% of the market that needs stronger scalability and organic cloud behavior.

Such a solution would also allow migrating existing applications easily into cloud environments, thus protecting huge investments made by organizations in those applications.

It is my opinion that a SQL database built on NoSQL foundations can provide the highest value to customers who wish to be both agile and efficient while they grow.