no sql3 rmoug


Upload: chen-gwen-shapira

Posted on 26-Jan-2015


TRANSCRIPT

Page 1: No sql3 rmoug
Page 2

I'm from California, where mountain biking and startups were invented. My friends work at Facebook, eBay and LinkedIn, and often I'm the only DBA they will talk to. That is how I hear about how they decide when to use NoSQL.

Page 3

We are a managed service AND a solution provider of elite database and system administration skills in Oracle, MySQL and SQL Server.

Page 5

MySQL for front-end and ad serving

Oracle as a data warehouse

Hadoop for analytics and ETL

Hive as a more structured Hadoop frontend

Cassandra for mailbox search

While an excellent RDBMS such as Oracle can solve 90% of the problems, we need multiple special-purpose databases for the other 10%.

Every developer knows more than one language, and most of them will happily learn more if the job requires it. The good ones are “software engineers”, not “Java programmers”. We need to turn “Senior Oracle DBAs” into “Database Engineers”.

Page 6

* Marketing term. These days everything is NoSQL (including Oracle!)

* Anything from a file system to a cache can be called NoSQL

* Key-value stores, document stores, column stores; OLTP or DW; RAM or disk

Page 7

Some people say: why worry about scale before you have even 100 users?

Not true. Some startups, like eBay or LinkedIn, have a scale-or-fail business model from the beginning. They know that if they don't get 250M users, they will fail, so they plan for 250M from the beginning.

Initially, most NoSQL databases are easier for developers, thanks to simpler data models and easier access methods than JDBC+SQL. Eventually, though, NoSQL databases turn out to lack many of the services an RDBMS provides, which forces your developers to do more work.

Page 8

You can do primary-key access, range scans and group-by, but not joins.

You may be able to update a single row (“document”, “column family”) as an atomic operation, but that is the absolute limit.
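The sketch below shows those access patterns on a toy ordered key-value store. The `SortedKV` class and the `group_by_sum` helper are hypothetical, not any product's API.

* Primary-key lookups and range scans come from the sorted key order.
* Group-by is an aggregation over a scan.
* A "join" has to be written by the application, as one key lookup per row.

```python
from __future__ import annotations

import bisect
from collections import defaultdict


class SortedKV:
    """Toy ordered key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._keys: list[str] = []          # kept sorted for range scans
        self._rows: dict[str, dict] = {}

    def put(self, key: str, row: dict) -> None:
        # Replacing one whole row is the unit of atomicity in this toy model.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = dict(row)

    def get(self, key: str) -> dict | None:
        # Primary-key access.
        return self._rows.get(key)

    def scan(self, start: str, end: str) -> list[tuple[str, dict]]:
        # Range scan over [start, end) using the sorted key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def group_by_sum(rows: list[tuple[str, dict]], field: str, value: str) -> dict:
    # Group-by is just an aggregation the client runs over a scan.
    totals: dict = defaultdict(int)
    for _, row in rows:
        totals[row[field]] += row[value]
    return dict(totals)


orders = SortedKV()
orders.put("order:001", {"customer": "c1", "amount": 30})
orders.put("order:002", {"customer": "c2", "amount": 15})
orders.put("order:003", {"customer": "c1", "amount": 5})
customers = SortedKV()
customers.put("customer:c1", {"name": "Alice"})
customers.put("customer:c2", {"name": "Bob"})

in_range = orders.scan("order:001", "order:003")  # end key is exclusive
# ";" sorts right after ":", so this scans every key with the "order:" prefix.
totals = group_by_sum(orders.scan("order:", "order;"), "customer", "amount")
# No joins: the application fetches each customer itself, one lookup per row.
names = {k: customers.get("customer:" + r["customer"])["name"] for k, r in in_range}
print(totals, names)
```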

Page 10

Note that these are not traditional RDBMS problems:

* Checkout requires access by key only.

* Monitoring writes a lot and queries very little.

* PageRank and “People you might know” require quick updates, and the selects are done by offline batch jobs.

* Word completion is just set selection.
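Word completion as set selection can be sketched with a sorted word list. Every word that shares a prefix sits in one contiguous run, so a binary search finds where the run starts and a short scan collects the matches. The `complete` function and the word list are hypothetical.

```python
import bisect


def complete(sorted_words: list, prefix: str, limit: int = 5) -> list:
    """Return up to `limit` words from a sorted list that start with `prefix`."""
    start = bisect.bisect_left(sorted_words, prefix)  # first word >= prefix
    matches = []
    for i in range(start, len(sorted_words)):
        word = sorted_words[i]
        # Matches are contiguous, so stop at the first non-match or at the limit.
        if len(matches) == limit or not word.startswith(prefix):
            break
        matches.append(word)
    return matches


words = sorted(["data", "database", "datacenter", "date", "dba", "delete"])
print(complete(words, "data"))
```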

Page 11

Hadoop – so big it deserves its own presentation

Page 16

... or when node 3 crashes? You need to remap every single datapoint to a new node, causing lots of data copying and scanning. That is lots of extra work, and some of it may require locking.

Actually, when you add node #5, you only need to move 3000/5 datapoints, not 3000. Obviously, the more nodes you have, the bigger the advantage of a smarter partitioning scheme.
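The node-count arithmetic can be checked with a small Python sketch. It rests on two assumptions, and neither is necessarily the scheme shown on the slides:

* The naive scheme is `hash(key) mod number_of_nodes`.
* The smarter one is a consistent-hash ring with virtual nodes.

The sketch places 3000 keys on four nodes and counts how many change owner when node #5 joins.

```python
from __future__ import annotations

import bisect
import hashlib


def stable_hash(value: str) -> int:
    # Built-in hash() is salted per process, so use a fixed digest instead.
    return int(hashlib.md5(value.encode()).hexdigest()[:16], 16)


def modulo_owner(key: str, nodes: list[str]) -> str:
    # Naive partitioning: the owner depends on len(nodes), so adding a node reshuffles keys.
    return nodes[stable_hash(key) % len(nodes)]


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node owns many points on the ring, which evens out the load.
        for i in range(self.vnodes):
            bisect.insort(self._points, (stable_hash(f"{node}#{i}"), node))

    def owner(self, key: str) -> str:
        # A key belongs to the first point at or after its hash, wrapping around the ring.
        idx = bisect.bisect_left(self._points, (stable_hash(key), ""))
        return self._points[idx % len(self._points)][1]


keys = [f"key{i}" for i in range(3000)]
old_nodes = ["node1", "node2", "node3", "node4"]
new_nodes = old_nodes + ["node5"]

moved_modulo = sum(modulo_owner(k, old_nodes) != modulo_owner(k, new_nodes) for k in keys)

ring = HashRing(old_nodes)
before = {k: ring.owner(k) for k in keys}
ring.add("node5")
moved_ring = [k for k in keys if ring.owner(k) != before[k]]

print(f"modulo: {moved_modulo} of 3000 moved; ring: {len(moved_ring)} of 3000 moved")
```

Assume the hash values are uniformly distributed. Modulo placement keeps a key only when `h mod 4 == h mod 5`, which happens for 4 of every 20 residues, so roughly four out of five keys move. On the ring, roughly a fifth of the keys move, and every one of them moves to node5. A crashed node likewise hands only its own keys to its neighbors on the ring.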

Page 22

This includes subsequent access from independent processes.

Page 23

When you decide to go with a distributed and replicated model, there's an obvious question: what do I do when some of the nodes needed for the operation are not available (due to either network issues or crashes)?

1. Fail the operation.
2. Wait for the node to come back.
3. Perform the operation on the reachable nodes and update the missing node when it's back.
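The sketch below models options 1 and 3 as two write policies on a toy three-replica cluster. It assumes a single process, and the names (`Cluster`, `Unavailable`, `recover`) are hypothetical. Option 2 is simply the caller catching `Unavailable` and retrying once the node is back.

Under option 3, the write goes to the reachable replicas and a hint is kept for the missing one. The hint is replayed when that replica recovers. Here the hint list stands in for the node that would hold the hint in a real system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when the write cannot be accepted under the chosen policy."""


@dataclass
class Replica:
    up: bool = True
    data: dict[str, str] = field(default_factory=dict)


class Cluster:
    def __init__(self, names: list[str], hinted: bool) -> None:
        self.replicas = {name: Replica() for name in names}
        self.hinted = hinted  # False = option 1 (fail), True = option 3 (write + hint)
        self.hints: list[tuple[str, str, str]] = []  # (replica, key, value)

    def write(self, key: str, value: str) -> None:
        up = [n for n, r in self.replicas.items() if r.up]
        down = [n for n, r in self.replicas.items() if not r.up]
        if not up or (down and not self.hinted):
            raise Unavailable(f"replicas down: {down}")
        for name in up:
            self.replicas[name].data[key] = value
        for name in down:
            # Remember the write so the missing replica can be updated later.
            self.hints.append((name, key, value))

    def recover(self, name: str) -> None:
        replica = self.replicas[name]
        replica.up = True
        pending = [h for h in self.hints if h[0] == name]
        self.hints = [h for h in self.hints if h[0] != name]
        for _, key, value in pending:  # replay in the original order
            replica.data[key] = value


strict = Cluster(["a", "b", "c"], hinted=False)
strict.replicas["c"].up = False
try:
    strict.write("cart:42", "book")
    strict_ok = True
except Unavailable:
    strict_ok = False  # option 1; under option 2 the caller would retry later

relaxed = Cluster(["a", "b", "c"], hinted=True)
relaxed.replicas["c"].up = False
relaxed.write("cart:42", "book")
c_before = dict(relaxed.replicas["c"].data)
relaxed.recover("c")
print(strict_ok, c_before, relaxed.replicas["c"].data)
```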

Page 24

Writes don't get lost, because at least one node keeps them and attempts to communicate them to other nodes in the system

Page 25

Important: the application must know how to resolve conflicts. If you don't have a good method of resolving conflicts, don't do eventual consistency.
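The sketch below shows one way an application could resolve conflicts, under stated assumptions:

* Each version of a shopping cart carries a version vector recording how many updates it has seen from each replica.
* A version that has seen everything the other has seen wins outright.
* Truly concurrent versions are merged by taking the union of their items.

The cart data and the merge rule are hypothetical. The point is that the rule comes from the application, not from the database.

```python
from __future__ import annotations


def descends(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if version vector `a` already includes every update recorded in `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def resolve(
    v1: tuple[dict[str, int], frozenset[str]],
    v2: tuple[dict[str, int], frozenset[str]],
) -> tuple[dict[str, int], frozenset[str]]:
    """Pick the newer version, or merge the two if the updates were concurrent."""
    clock1, cart1 = v1
    clock2, cart2 = v2
    if descends(clock1, clock2):
        return v1
    if descends(clock2, clock1):
        return v2
    # Concurrent writes: neither replica saw the other's update.
    # Application-specific rule (an assumption here): the cart keeps every item.
    merged = {n: max(clock1.get(n, 0), clock2.get(n, 0)) for n in clock1.keys() | clock2.keys()}
    return merged, cart1 | cart2


base = ({"a": 1}, frozenset({"book"}))
on_a = ({"a": 2}, frozenset({"book", "pen"}))          # replica a adds a pen
on_b = ({"a": 1, "b": 1}, frozenset({"book", "mug"}))  # replica b adds a mug, concurrently

print(resolve(base, on_a))
print(resolve(on_a, on_b))
```

Union is a natural fit for a cart because adding items is the common case. Its known drawback is that an item deleted on one replica can reappear after the merge.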

Page 27

Key-Value store

Page 28

Storage nodes are the physical servers. They contain “partitions”. Keys are mapped to partitions. Partitions are grouped into “replication groups”, each containing a set number of partitions on separate servers and, if needed, separate data centers. All partitions in a replication group contain identical data.

One partition in a replication group is designated the “master”. Writes are done on the master only. If the master fails, a new master is elected in the group.

Client drivers keep track of the hash map: which key maps to which partition, which node is the master of each replication group, and the load on each node in the group. This allows the client to send each request to the right node.
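The sketch below shows what such a driver might cache and how it could route a request. It assumes a fixed partition count, an MD5-based key-to-partition hash, and least-loaded selection for reads. `ClientDriver`, `ReplicationGroup` and the topology values are hypothetical, not the real driver's API.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class ReplicationGroup:
    nodes: list[str]  # storage nodes holding identical copies of the group's partitions
    master: str       # the only node in the group that accepts writes


class ClientDriver:
    """Hypothetical driver that caches the cluster topology on the client side."""

    def __init__(self, num_partitions: int, partition_to_group: dict[int, str],
                 groups: dict[str, ReplicationGroup], load: dict[str, float]) -> None:
        self.num_partitions = num_partitions  # fixed in this sketch, so key->partition never changes
        self.partition_to_group = partition_to_group
        self.groups = groups
        self.load = load  # latest load figure seen for each storage node

    def partition_of(self, key: str) -> int:
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % self.num_partitions

    def route(self, key: str, write: bool) -> str:
        group = self.groups[self.partition_to_group[self.partition_of(key)]]
        if write:
            return group.master  # writes go to the master only
        # Assumption: reads may go to any copy, so pick the least-loaded node.
        return min(group.nodes, key=lambda node: self.load[node])

    def on_master_elected(self, group_id: str, new_master: str) -> None:
        # After a failover, update the cached topology so writes reach the new master.
        self.groups[group_id].master = new_master


groups = {
    "rg1": ReplicationGroup(nodes=["sn1", "sn2", "sn3"], master="sn1"),
    "rg2": ReplicationGroup(nodes=["sn4", "sn5", "sn6"], master="sn4"),
}
partition_to_group = {p: ("rg1" if p % 2 == 0 else "rg2") for p in range(10)}
load = {"sn1": 0.9, "sn2": 0.2, "sn3": 0.5, "sn4": 0.7, "sn5": 0.1, "sn6": 0.4}
client = ClientDriver(10, partition_to_group, groups, load)

key = "user:42"
print(client.partition_of(key), client.route(key, write=True), client.route(key, write=False))
```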

Page 31

The major key controls the location of the key. This means that all keys with the same major key are kept in the same partition, and can be updated in one transaction. It also means that many different major keys should be used to fully utilize all storage nodes.
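The sketch below shows why the major key matters. It assumes keys are `(major_path, minor_path)` tuples and that only the major path is hashed to pick a partition. `Store`, `partition_for` and `NUM_PARTITIONS` are hypothetical names, not the product's API. A batch of updates is accepted only when every operation shares one major key, so a batch never has to span partitions.

```python
from __future__ import annotations

import hashlib

NUM_PARTITIONS = 8  # hypothetical partition count


def partition_for(major: tuple[str, ...]) -> int:
    # Only the major key path is hashed, so the minor path never affects placement.
    digest = hashlib.md5("/".join(major).encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


class Store:
    """Toy store: each partition is a dict keyed by (major_path, minor_path)."""

    def __init__(self) -> None:
        self.partitions: list[dict] = [{} for _ in range(NUM_PARTITIONS)]

    def get(self, major: tuple[str, ...], minor: tuple[str, ...]):
        return self.partitions[partition_for(major)].get((major, minor))

    def execute(self, ops: list[tuple[tuple[str, ...], tuple[str, ...], object]]) -> None:
        """Apply several puts as one unit; they must all share one major key."""
        majors = {major for major, _, _ in ops}
        if len(majors) != 1:
            raise ValueError("a transaction may touch only one major key")
        # One major key means one partition, so no cross-partition coordination is needed.
        # Validation runs before any write, so this single-threaded toy is all-or-nothing.
        partition = self.partitions[partition_for(majors.pop())]
        for major, minor, value in ops:
            partition[(major, minor)] = value


store = Store()
user = ("user", "42")
store.execute([(user, ("profile",), "Alice"), (user, ("email",), "alice@example.com")])

try:
    store.execute([(user, ("plan",), "gold"), (("user", "43"), ("plan",), "free")])
    rejected = False
except ValueError:
    rejected = True

# Many distinct major keys spread across the partitions; a single one would not.
used = {partition_for(("user", str(i))) for i in range(1000)}
print(rejected, len(used))
```

The last lines also show the flip side. A thousand distinct major keys land on all eight partitions, while a store that used a single major key for everything would keep all of its data on one partition.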

Page 37

* New products = lots of bugs, few features. Oracle is at 11gR2. MS SQL Server is the equivalent of Oracle 8i (maybe 9?), MySQL is somewhere between 6 and 7, and NoSQL is between 2 and 3.

* Open source = no support

* Many companies decide to build their own: most of the algorithms are published, you can use existing code, there is no support anyway, and solving specific problems is easier
