vote no for mysql

Download Vote NO for MySQL

If you can't read please download the document

Upload: ulf-wendel

Post on 17-May-2015

2.689 views

Category:

Technology


6 download

DESCRIPTION

Vote NO for MySQL - Election 2012: NoSQL. Researchers predict a dark future for MySQL. Significant market loss to come. Are things that bad, is MySQL falling behind? A look at NoSQL, an attempt to identify different kinds of NoSQL stores, their goals and how they compare to MySQL 5.6. Focus: Key Value Stores and Document Stores. MySQL versus NoSQL means looking behind the scenes, taking a step back and looking at the building blocks.

TRANSCRIPT

  • 1. Ulf Wendel, MySQL/OracleVOTE NO for MySQLElection 2012: NoSQL

2. The speaker says...451 Research predicts* MySQL usage of 80% in 2012, and: Loss of 25% within five years Down to 55% in 2017 30%** looked into or use already NoSQLHmm, I think, I am looking for a new job. Are you looking forUlf WendelTM? I am for sale!* http://de.slideshare.net/mattaslett/mysql-vs-nosql-and-newsql-survey-results-13073043** MySQL users of the 451 Research sample 3. Swipe, tap, click, and zoom1234 - I am super FAST{"documents":"rock"}4567 - LIGTHNING fast{"mysql":"not"}7890 - [1][2][3][4][5]{"documents":"rock"} abcd - [1[a,b,c]],[2[d,e,f]]{"mysql":"not"} _de_- 1,281,2828,2,173,8 {"documents":"rock"} Key Value StoreDocument Database 010101001011010 101010110100101 101010010101010 101010110101011 101010110111101 Graph Database Big Data/Column-oriented 4. The speaker says...Will It Blend?* Will I need a new job? That is the question.Knifes of four different kinds of Not Only SQL are attackingMySQL.* http://www.youtube.com/user/Blendtec 5. Swipe, tap, click, and zoom Memcache, Redis, MongoDB, CouchDB, LevelDB, BerkeleyDB...Riak, ...Key Value Store Document Database Neo4j, FlockDB...BigTable, Hadoop, CassandraGraph DatabaseBig Data 6. The speaker says...Time for voting: Which systems are you using? Since when?Please note, LevelDB itself is a Key Value Store but Riakembeds it. Riak supports many storage engines and, thus itcan be bended in many directions. 7. A new era of databasesScaleable ElasticExplore the benefits Highly AvailableEasy To Use 8. The speaker says...Not Only SQL databases aim to be scalable. From one nodeto one tousand nodes in a bit. And, back depending on bothquery load and data size! Sharding built-in! Chewing gumsof the cloud area!Master down? No problem. Some dont do lame primarycopy. Paxos and others ensure the database clustersurvives the failure of nodes including primaries, if any.Hot on conferences: JavaScript, HTTP, JSON you name it -ingredients of todays web applications. No room for MySQL? 9. RDBMS are from the 80s, at best **** MYSQL 5.6+ BEYOND 3.23 ****MYSQL CLUSTER AUTO_SHARDING 99,999% HAMYSQL INNODB MEMCACHE PLUGINMYSQL CLUSTER NODE_JSREADYAre you ready to retire ?!?SYNTAX ERROR 10. The speaker says...80s at best? A claim from NoSQL folks... MySQL Clusterscales: 4 billion ACID tpm using 30 out of 48 possible nodes,auto sharding, condition push down, five-nines availability,synchronous replication, in-memory and on disk. Comingwith 7.3: foreign key support.The #1 Key-Value-Store for MySQL users is Memcache.MySQL 5.6 includes a Memcache plugin. Clients can accessInnoDB tables using SQL or Memcache protocol. This is thenext generation Memcache for MySQL: persistence andreplication included.BTW, Couchbase Server 2.0 = CouchDB + Membase. 11. thWelcome back 70 ?! IBM Information Management System R/360 TASK:197x IIMM MM SSSSS IIM M M MSS IIM M MSS IIM M M SSSS IIM MSS IIM MSS IIM MSSSSS (1) COUCHDB(2) MONGODB READY # --------------------------------------------------- abcdefghijklmnopqrstuvwxyz#+-*./%$():,;?&0123456789 12. The speaker says...How about this claim: document databases such asCouchDB and MongoDB have their roots in the 70th . Thehierarchical data model is older than the relational datamodel.Both databases manage documents in forests of trees. Theprimary operation is a simple scan on the tree. Additionallythe user can embed links to reference between trees.Relations are not an integral part of the data model. 13. SCNR - Curriculum vitae Until 2005: Senior Developer Lotus Notes Team. Lotus Notes is a documented-oriented, distributed database introduced in 1984. Since 2005: CouchDB Hero 14. NoSQL is better! 15. The speaker says...One or the other way, whether I have to search for a new jobor not, I need to take them serious. People use NoSQL!There must be something smart...For example, CouchDB: Speaks HTTP iPhone browser can connect directly Returns JSON iPhone browser can display it Server-side JavaScript app server built-in Replication with conflict detection sync mobile device 16. Swipe and zoom1234 - I am super FAST {"documents":"rock"}4567 - LIGTHNING fast {"mysql":"not"}Zoom!7890 - [1][2][3][4][5] abcd - [1[a,b,c]],[2[d,e,f]]Zoom! {"documents":"rock"}{"mysql":"not"} _de_- 1,281,2828,2,173,8{"documents":"rock"}Key Value Store Document Database010101001011010101010110100101 Swipe! Swipe!101010010101010101010110101011101010110111101Graph DatabaseBig Data/Table 17. The speaker says...We identified four flavours of Not Only SQL databases.Which ones are relevant to me when searching for a newemployer?Key-Value-Stores are popular. Document Databases arebeing considered for web applications. Click or tap to zoom!Graph databases and Big Data are beyond the scope of thepresentation: specialist tools for special purposes. Swipe! 18. Zoom! Key Value Store High Performance1234 - I am super FAST4567 - LIGTHNING fast Limited Search and Types7890 - [1][2][3][4][5] abcd - [1[a,b,c]],[2[d,e,f]] _de_- 1,281,2828,2,173,8 Key Value Store ScaleableLimited Persistence 19. The speaker says...A Key Value Store strikes for its simple data model which isthat of an associative array/hash. The data model ispoor at ad-hoc queries: loose the key and you lock your datain the treasure. But, it is fast. A need for speed has led tomany in-memory solutions in this class. A perfect model foruse as a cache. If used as a cache, persistence is oftensecondary. Generally speaking the data model is perfect forpartitioning/sharding. There are no operations coveringmultiple values, thus values can be distributed on multiplenodes to scale the system.Most operations are basic (CRUD). Redis stands out withcomplex data types and correspondig commands. 20. Zoom! Redis and MemcacheHigh Performance Sure: simple data model, in-memory, lightweight client protocols, non-blocking clients, pipelining, streaming...Limited Search and Types Redis: strings and abstract types/data structures but no checks. Maybe, Lua-scripting for search? Blocking taskScaleable Redis: single-threaded, lazy primary copy replication, sharding cluster plannedLimited Persistence Memcache. Redis: Snapshot + WAL (AOF) recovery time 21. The speaker says...Both Redis and Memcache are proven and popularsolutions. However, due to their late birth some featureshave limitations.For example, lack of multi-core CPU support. Whencomparing an operation on MySQL 5.0 and MySQL 5.6using only one concurrent client, dont be disappointed if ithas not become much faster. Focus is on scaling in the areaof 30...40 CPU threads.For example, persistence. Worst case recovery: WAL haschanges for the duration of the snapshot frequence. Replaytakes minutes, albeit no transactions to roll back...?Maybe we can bend MySQL? 22. MySQL 5.6 InnoDB MemcacheRDBMS and Key Value Store combined Benefits of a mature RDBMS High performance key lookup and SQL for ad-hoch querying Data hold once thus, no need to synchronize SQLMemcache Protocol MySQL 5.6 id | firstname | lastname -------------------------- 1 | Ulf| Wendel 2 | Nils | LagnerInnoDB 23. The speaker says...Lets make MySQL a happy dieter, or cut off some fat atleast! Then, support a proven, lightweight protocol with manyexisting language bindings for super fast access.Lets give MySQL users direct access to the inner workingsof MySQL. Inside MySQL we find a stable and CPU-efficientB-tree based storage called InnoDB. If needed, InnoDBautomatically builds a hash index behind the scenes eversince. In-memory performance is great because it has to be.Estimated 20% use 64 256 GB RAM with MySQL.Persistence is a given on top of it. However, SQL parsingand interpretation is slow. 24. Zoom! InnoDB MemcacheSQLMemcache Protocol MySQL 5.6CoreMemcached Plugin Storage Handler API InnoDB APIStandard MyISAM id | firstname | lastnameMemcache In-Memory-------------------------- Memory 1 | Ulf| Wendel Storage2 | Frank| Sons ... InnoDB 25. The speaker says...Ever since MySQL has separated the SQL layer from thestorage layer through the so-called internal Handler API,however the feature has not been exposed to the user formany years. Popular storage engines include InnoDB,Memory, MyISAM and NDB (MySQL Cluster). Since MySQL5.1 the storage engines can be implemented as a serverplugin. The Memcache plugin is another kind of serverplugin. It integrates memcached with MySQL.The plugin can store data both in InnoDB and the defaultmemcached way. As a result users can now access InnoDBtables using either SQL or Memcache protocol. 26. A Key Value Store on steroids?Read and write scale-out Within cluster: synchronous update anywhere (multi-master) Focus in-memory with full on-disk persistence Autosharding, Dual NoSQL and SQL interfaces 2002: 1 million reads per minute 2012: 1.2 billion write transactions per minute on 30 nodes 2012: 1 BN reads/m on 8 nodes, 4.3 BN reads/m on 30 nodes99,999% Availability Shared-nothing, all data stored 1..4x times, survives failures WAN replication: asynchronous with conflict detection 27. The speaker says...MySQL Cluster has reached 1.17 billion writes transactionsper minute on a 30 nodes setup in mid 2012. That is some3.9 million writes per second. You can have upto 48 nodesin a cluster. Clusters can be replicated over wide areanetworks, for example, to run them in different data centerson different locations. Just in case you worry about networklatency...You can access MySQL Cluster through a variety ofinterfaces. Among them are MySQL Server SQL nodes(ODBC, JDBC, .NET, ...), ClusterJ (JNI), LDAP, HTTP/REST(Apache mod-ndb) and Memcached. All of them internallyuse the NDB C++ API. 28. Zoom! MySQL Cluster Memcache SQL Memcache ProtocolMySQL Server / Cluster 7.2Memcached Storage Handler API ndb_engInnoDB NDB APIMyISAM id | firstname | lastname --------------------------Memory 1 | Ulf 2 | Nils| Wendel| Lagner ...MySQL Cluster (NDB) data node 29. The speaker says...Memcached support is one of the latest additions to the listof MySQL Cluster interface. Since MySQL Cluster 7.2 it ispossible to use MySQL Cluster as a storage engine for aMemcache server. This is quite similar to using MySQLCluster/NDB as a storage engine for MySQL. In both casesthe frontends wrap the main MySQL Cluster API which iscalled NDB API. Both frontends inherit all MySQL Clustergoodies.You can choose whether to run the Memcached, the MySQLCluster data nodes and the application on one machine (lowlatency) or on different ones (fail safety). Note, that with theInnoDB Memcache plugin you have Memcached andMySQL running in the same process. 30. Zoom! MySQL as a KVSTry the NoSQL APIs!High Performance1234 - I am super FAST SQL for ad-hoc querying4567 - LIGTHNING fast Limited Search and Types7890 - [1][2][3][4][5] abcd - [1[a,b,c]],[2[d,e,f]] _de_- 1,281,2828,2,173,8Threaded/Multi-Core,Key Value StoreScaleableReplication In-memory,Limited Persistenceon-disk with fast recovery 31. The speaker says...The InnoDB Memcache Plugin is certainly a step forward.MySQL is putting pressure on itself to modularize the serverallowing users to slim MySQL, to strip off features notneeded to get a certain job done.Users get more choices. If you want to combine a fast andlean client protocol with simple and fast access operationsbut cannot accept compromises on persistence orscalability, here you go.Cluster has been a speed monster ever since... 32. Transparent fast key access$mysqli = new mysqli("localhost", "usr", "pass", "test");$memcache = new memcached();$memcache->addServer("localhost", 11211);mysqlnd_memcache_set($mysqli, $memcache);$res1 = $mysqli->query("SELECT firstname FROM test WHERE id = 1");$res2 = $mysqli->query("SELECT * FROM test); mysqli PDO_MySQL mysqlMySQL native driver for PHP (mysqlnd)Plugin: PECL/mysqlnd_memcache SQL access Memcache access 33. The speaker says...PECL/mysqlnd_memcache is another free and open sourceplugin for the PHP mysqlnd library. Mysqlnd is the compiletime default C library used for all PHP MySQL APIs (mysqli,PDO_MySQL and mysql).Like other plugins it adds new features to all the APIs. Basedon a configurable regular expression the plugin turns a SQLaccess into a Memcache access. Due to the lightweightprotocol and direct access the Memcache access to MySQLis faster. No matter what protocol used by the library, theuser gets a standard result set in return. Simple to use.However, note that no meta data is available if a key accesshas been performed. 34. Ulfs take...Not a bad attempt at all... Go try! Go ask for more!A significant number of MySQL users is using Memcached Deploy only one data store instead of two Dual interface: can we skip a caching layer in our apps?A good first step, but looking for more Persistence for Memcache - more of a topic for Redis? No issues with warm-up or stampeding/slamming KVS is about performance, where is the proof @ 5.6...? Data model is about distribution/sharding, MySQL Cluster only? 35. The speaker says...Inside MySQL is some fine and stable technology... Onceinternals are exposed in a user friendly manner MySQL willbe kicking. 36. Zoom! Document DatabasesJavaScript/[J|B]SON, Map&Reduce {"documents":"rock"} Highly Available{"mysql":"not"} {"documents":"rock"}{"mysql":"not"} {"documents":"rock"} ScaleableDocument Database Easy To Use 37. The speaker says...Document Databases use a data model that seems asappealing as that of a Key Value Store. Think of a Key ValueStore that holds nothing but arbitrary documents.Documents are schema-free thus, you can code withoutthinking first... Nested documents are great for storingobjects of a programming language.Take CouchDB/MongoDB. All the ingredients of a modernweb database are there! MongoDB: Sharding, automatic HAfailover. CouchDB: Lazy primary copy, conflict detection,consistent hashing for sharding/clustering, ACIDtransactions. 38. Zoom! MongoDB and CouchDBJavaScript/[J|B]SON, Map&Reduce No portable and powerful query language, vendor lock-in, Map&Reduce: you write the search routine. MongoDB: no built-in validation of data, both: update anomalies, weak on relationsHighly Available Lazy primary copy with built-in fail-over (MongoDB) resp. conflict detection (CouchDB)Scaleable Sharding, CouchDB: compactation, single file on disk. MongoDB: M/R multi-threaded v8 JavaScript, really ?!, write locks, index in memory only 39. The speaker says...Search deserves a dedicated slide, more below. On paperthe high availability approach of both MongoDB andCouchDB looks good. Like with Key Value Stores their datamodel is great for sharding. This is a write-scale outapproach they share with MySQL Cluster. Scalability young systems, again. Nobody wants to waste disk space,manually compact files, limited to a single file on disk or relyon operating system cache managers to suit databaseneeds. Over the years MySQL got disk-efficient to makereads fast and learned partitioning to fine tune datadistribution on disk arrays. Mongo M/R jobs on multi-coresingle-threaded: https://jira.mongodb.org/browse/SERVER-4258 ? 40. Zoom! Rich query language?Projection (filter columns): SELECT [DISTINCT] attribute | arithmetric expression | aggregation functionDISTINCTarithmetric expressionaggregation function COUNT(*) SUM(column) (user to provide function) MAX(column) (user to provide function) MIN(column) (user to provide function) AVG(column) (user to provide function) Selection (filter rows): WHERE (see above)constant: attribute = | != | < | | &gt= constantattribute: attribute = | != | < | | &gt= attribute (see above)logical operators: ( attribute constant | attribute) and|or|not ()uncertainty: attribute LIKE constant (must be emulated with regular expressions)NULL: attribute IS NULLALL, ANY, IN, EXISTS (no subquery) Join (join tables) 41. The speaker says...The standardized query language (SQL) is a rich querylanguage, the MongoDB query language is not. Please see,http://blog.ulf-wendel.de/2012/searching-data-in-notonlymysql-databases-a-rich-query-language/ for details.Imagine you ever want to swap the database system. UsingMongoDB syntax means 100% vendor lock-in and significantefforts porting the application. As a developer, once youlearned SQL you are qualified for any job using any RDBMS.Same true for NoSQL...? Map&Reduce? Great batchprocessing for distributed systems! But, you write it. Youdefine the data access path. SQL: you say what, DB takescare of finding best possible physical access path. 42. Zoom! Unlimited, schema free!Support, why is my forum email not updated ?!?!> db.customer.save({name: "Ulf", email: "[email protected]"});> db.forum_user.save({name: "Ulf", email: "[email protected]"});Damn, John takes too much disk space...> db.movie.save({title: "Good", actors: ["John son of...", "Joe"]})> db.movie.save({title: "Good, really", actors: ["John..."]})Folks, is the old code with the typo still around?> db.forum_user.save({name: "Ulf", e-mail: "[email protected]"});Sales barking: we need to add validation to all clients!> db.prospects.save({name: "Ulf", age: "[email protected]"}); 43. The speaker says...The relational data model may be an academic one. Thismay be annoying at times. But, do not forget about its basicsuch as relations (qualified: 1:n, n:m) and the normal form(data duplication, update anomalies, ). Also recall thegoodies SQL has to offer: blue-prints, types, validation ofdata, validation of relations. As a PHP MySQL guy I vividlyremember STRICT_MODE and see PHP users asking fortypes to prevent errors...Data duplication has a serious side effect: disk spacerequirements. You need more disks ($) and you get slowerdata access because more data has to be read.What if MySQL schema changes would be cheap...? 44. MySQL 5.6: Online InnoDB DDLClassic ALTER TABLE Create temporary table, copy rows one-by-one Update indexes during copy, drop old, rename new Clients are blocked, disk space is wastedFirst step in 5.1/5.5: Fast Index Creation CREATE and DROP INDEX without copySecond step in 5.6: Online DDL Majority does not block SELECT, INSERT, UPDATE, DELETE Some changes in-place (without copy), faster copy otherwise 45. The speaker says...Online DDL speaks for itself. The improved ALTER TABLEgives you higher concurrency, lower disk and CPU usageand less purging of the buffer pool all resulting inperformance dips in the past. Please note, the overallruntime of ALTER table can increase. Recording concurrentDML changes and applying them at the end of an onlineDDL may reduce raw performance. However, concurrency isbetter.As a side-effect loading huge dumps has become faster.You can now create the table, load the data and addsecondary indexes later. Clients can start to access the tablewhile the secondary index is still being created. 46. Online DDL operationsPlease, check the manual for details! Quick summary:Concurrent DML and In-Place CREATE|DROP|ADD index Column changes: default, auto_increment, rename (name) Foreign keys: add, dropConcurrent DML but not In-Place Column changes: add, drop, NULL/NOT NULL Change ROW_FORMAT, KEY_BLOCK_SIZE 47. The speaker says...A new LOCK clause of the ALTER TABLE statement allowsyou to block DDL operations that reduce concurrency.Setting LOCK=NONE permits the execution of any ALTERTABLE statement that either blocks read or write operations.LOCK=SHARED blocks an ALTER TABLE statement whichprevents concurrent read operations.In sum: MySQL is not schema fee but has become morefriendly towards schema changes if need be.Next topics: High Availability and Replication 48. MySQL 5.6: GTID, easy failoverGlobal Transaction Identifier Easy to find most up-to-date slave for failover Emulation: PECL/mysqlnd_ms 1.2+, MySQL ProxyMySQL MasterLog 7, Pos 34, GTID M:1: UPDATE x=1Log 7, Pos 35, GTID M:2: UPDATE x=9 MySQL Slave 1 MySQL Slave 2 , GTID M:1: UPDATE x=1 , GTID M:1: UPDATE x=1 , GTID M:2: UPDATE x=9 49. The speaker says...A global transaction identifier is a cluster-wide uniquetransaction identifier. MySQL 5.6 can generate themautomatically and for older versions of MySQL you can usea client-side emulation provided by, for example, MySQLProxy and PECL/mysqlnd_ms.A GTID helps with failover in case of a master/primaryoutage. The most up-to-date slave/secondary should become the new master. Which one is the most current slavecan be checked comparing GTIDs. Unfortunatly, this is noautomatic failover. For automatic failover useDRBD/Pacemaker or other 3rd party tools. Dont miss thecommandline mysqlfailover utility of MySQL Workbench! 50. Group commit and parallel slave--slave-parallel-workers Before: multiple writer on master, one applier on slave Now: parallel worker for different databases, attn: reorderingMySQL MasterT1: UPDATE db1.table.x=1T2: UPDATE db2.table.x=9 SQL thread (coordinator)MySQL Slave 1Worker thread Worker threadT2: UPDATE db2.table.x=9 T1: UPDATE db1.table.x=1 51. The speaker says...MySQL 5.6 makes replication faster. Binary log writes canbe grouped together to significantly improve theperformance of the replication master. Less writes, lesswaits for the disks in tests we saw improvements between2x 4x.Slaves may catch up faster to the master in 5.6 as well.Transactions from different databases can be applied inparallel. The slave SQL thread acts as a coordinator for upto 1024 workers. Note, that transactions which do notoverlap, may be recorded in a different order in the slavesbinary log than in the master log. 52. BTW, PECL/mysqlnd_ms ...Replication and load balancing plugin for mysqlnd Supports all PHP APIs Supports all kinds of MySQL clusters Read-write splitting: automatic, SQL hints, user-controlled Load Balancing: Round-robin, random [once (sticky)], user, ... Transaction aware Lazy connections Quality-of-service: eventual/session/strong consistency Quality-of-service: max age Cache integration ... 53. The speaker says...By the way, we have released yet another free and opensource PHP mysqlnd plugin to make using clusters easier.Of course, I would love to see it used by everybody,because I believe in my own work. However, I do not expectthis to happen as cluster support is something most of youhave solved, one or the other way, years ago. If so, you stillmay want to read the documentation. You may getinspiration for improving your current solution.For example, how about the idea of quality-of-service ortransparent cache support, if eventual consistency is goodenough? 54. Zoom! JavaScript/[J|B]SON Proof of ConceptMySQL speaks HTTP and replies JSON. JavaScript (v8) runs inside MySQL. Map&Reduce jobs use the internal low-level high performance interfaces. it could be donehttp://de.slideshare.net/nixnutz/http-json-javascript-mapreduce-builtin-to-mysql 55. The speaker says...Like one can add a Memcached protocol interface toMySQL, one can add an HTTP interface. ReturningJSON/BSON and running Map&Reduce jobs inside MySQLis not out of reach... 56. Zoom! MySQL vs Docs DB JavaScript/[J|B]SON, Hmm...Map&ReduceReplication: Ok, add 3rd party Highly Available Cluster: beat it! Replication: Write limitScaleableCluster: beat it! Database and tooling: goodEasy To Use Need for ORM: hmm... 57. Ulfs take...Young contenders, beat MySQL on a single machine...Accept new data models Dont get emotional on the ER model: people love nested data!Push on (elastic) scalability and high availability Where is built-in automatic failover not just command line? Where is built-in write scaleout with MySQL Replication?Open up for innovation Developing server plugins must become easier Adding HTTP and other interfaces must become easier 58. The speaker says...The document data model must look as if it has been madefor certain applications so easy to dump arbitrary data thatrarely changes! This comes at a price (normalization,storage requirements, validation, ...).Limiting the discussion to a single machine (transactions,persistence, CPU scalability, memory efficiency, disk layout,query language, ...) MySQL is competitive. It takes manyyears of development fine tuning basic algorithms.But, NoSQL is also about massively distributed systems. 59. What can MySQL users do?Drive MySQL 5.6 to its limit before you worry...Ask yourself what you give up in case of a switch NoSQL is not SQL vs. something it is far more!Use MySQL and the ideas of the new generation Fast and lightweight client protocols Create independent data units for scaling over many machines Prebuild aggregates Batch processing in addition to ad-hoc querying 60. The speaker says...No big surprises here: ideas from the 80th, or has it been the70th? Who cares... Some things in NoSQL are new, othersare old favourites.Try to break things down to the basic concepts. That is hardin a world of temptations with each vendor focussing onpromoting his strengths. However, it really helps to take astep back and search for the basic concepts.Maybe, you have a senior in your company that can helpyou... 61. THE ENDContact: [email protected]