Five database trends - updated April 2015

#C15LV REMINDER Check in on the COLLABORATE mobile app Top 5 Trends in Database Technology Guy Harrison, Executive Director, Information Mgt R&D, Dell Software Group Session ID#: 995 @guyharrison

Upload: guy-harrison

Post on 09-Jun-2015





DESCRIPTION

Presentation given at Oracle OpenWorld 2014. Five trends in database technology, including big data, SSD, in-memory, NoSQL and column stores.

TRANSCRIPT

  • 1. REMINDER Check in on the COLLABORATE mobile app Top 5 Trends in Database Technology Guy Harrison, Executive Director, Information Mgt R&D, Dell Software Group Session ID#: 995 @guyharrison

2. Top 5 Trends in Database Technology. Guy Harrison, Executive Director, Information Mgt R&D, Dell Software Group
3. Introductions. Web: guyharrison.net Email: [email protected] Twitter: @guyharrison
4-6. (no text)
7. Dell and Quest: a brief history
8. But Seriously
9. 5 Database Technology Trends: The end of one size fits all; Big Data and Hadoop; NoSQL; Columnar architectures; The end of disk?
10. Trend #1: The end of one size fits all
11. History of databases (timeline, 1940-50 through 2000-2010): Pre-computer technologies (printing press, Dewey decimal system, punched cards); magnetic tape "flat" (sequential) files; magnetic disk; Indexed Sequential Access Method (ISAM); hierarchical model; IMS; network model; IDMS; ADABAS; relational model defined; System R; Oracle V2; Ingres; dBase; DB2; Informix; Sybase; SQL Server; Access; Postgres; MySQL; Cassandra; Hadoop; Vertica; Riak; HBase; Dynamo; MongoDB; Redis; VoltDB; Hana; Neo4J; Aerospike
12. Why? 3rd Platform drives new demands on the database: global high availability; data volumes; unstructured data; transaction rates; latency. A single architecture cannot meet all those demands.
13. It takes all sorts: Operational RDBMS (Oracle, SQL Server, ...); In-memory analytics (HANA, Exalytics ...); In-memory processing (Spark); Hadoop; Web DBMS (MySQL, Mongo, Cassandra); ERP & in-house CRM; Analytic/BI software (SAS, Tableau); Web server; Data warehouse RDBMS (Oracle, Teradata ...)
14. Oracle engineered systems
15. Trend #2: Big Data and Hadoop
16. The 3-4 Vs: Volume (terabytes, petabytes, exabytes, zettabytes); Velocity (transaction rates, user populations, machines); Variety (structured, unstructured, human generated, machine generated); Value
17.
The Industrial revolution of data
18. 2005
19. 2009
20. The instrumented human: Bluetooth (personal area network); 3G/WiFi (wide area network); GPS; storage; pulse, temp monitor; silent alarms; pedometer, sleep monitoring; compass; camera; mike/earphones; heads-up display; emotion/attention monitor
21. (no text)
22. The instrumented world
23. Big Data is the culmination of cloud, social and mobile
24-25. More Data: storing all data including machine generated and sol, social, community, demographic data in original format for ever. To More Effect: smarter use of data (data science) to achieve competitive or human benefit.
26. Pioneers of big data
27-31. (no text)
32. Google Software Architecture (circa 2005): Google Applications; Map Reduce; BigTable; Google File System (GFS)
33. [Diagram: Start, many Map tasks, Reduce]
34. Hadoop 1.0: Open Source Map-Reduce Stack
35. Hadoop at Yahoo. 2010 (biggest cluster): 4000 nodes, 16 PB disk, 64 TB of RAM, 32,000 cores. 2014: 16 clusters, 32,500 nodes.
36. (no text)
37. Hadoop family: SQOOP (RDBMS loader); Hive (query); Pig (scripting); Flume (log loader); Oozie (workflow manager); Hadoop File System (HDFS); Map Reduce / YARN; HBase (database); Zookeeper (locking)
38.
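The Map and Reduce stages shown in the slides above can be sketched in a few lines of Python. This is only a single-process illustration of the programming model, assuming a word-count job; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are made up for this sketch and are not Hadoop or Google APIs. A real cluster runs the map tasks on many nodes and performs the grouping step across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map over every split, group by key, then reduce each group."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Three hypothetical input splits standing in for files on a cluster.
    splits = ["big data and hadoop", "hadoop and map reduce", "big big data"]
    print(map_reduce(splits))
```

Because each call to `map_phase` depends only on its own split, the map work can be spread over as many machines as there are splits, which is the property the many Map boxes in the diagram illustrate.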
Economies: Exadata vs Hadoop $$/TB (hardware only): Exadata $4,911; Hadoop $750
39. Hadoop is the most concrete Big Data technology. Toad: your companion in the Big Data revolution
40-41. More Data: storing all data including machine generated and sol, social, community, demographic data in original format for ever. To More Effect: smarter use of data (data science) to achieve competitive or human benefit.
42. Big Data Analytics, AKA Data Science. Machine learning: programs that evolve with experience. Predictive analytics: programs that extrapolate from past to future. Collective intelligence: programs that use inputs from "crowds" to simulate intelligence.
43. (no text)
44. Collective Intelligence. "Siri call me an ambulance" / "From now on, I'll call you 'An Ambulance'. OK?"
45. Trend #3: NoSQL
46. [Diagram: web servers; Memcached servers; database servers sharded as Shard (A-F), Shard (G-O), Shard (P-Z); read-only slaves]
47. CAP Theorem says something has to give. CAP (Brewer's) Theorem says you can only have two out of three of Consistency, Partition Tolerance, Availability. Consistency: everyone always sees the same data. Availability: system stays up when nodes fail. Partition tolerance: system stays up when the network between nodes fails. [Diagram labels: "Oracle RAC lives here"; "NO GO"; "Most NoSQL lives here"]
48. Major influences on non-relational: Amazon Dynamo (eventually consistent transaction model; consistent hashing); Google BigTable (column family model for sparse distributed columnar data); OODBMS and XML DBs (paved the way for the document database)
49.
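Consistent hashing, listed above as one of Dynamo's contributions, can be illustrated with a small hash ring. The sketch below is a generic textbook version, not Dynamo's actual implementation: the class name `HashRing`, the virtual-node count of 64, and the choice of SHA-256 are all assumptions made for the example.

```python
import bisect
import hashlib


def _position(key):
    """Map a string to a point on the ring (any stable hash would do)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) tuples
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns several points so load spreads evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring entry at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (_position(key),))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"user{i}" for i in range(10)]
    before = {k: ring.node_for(k) for k in keys}
    ring.remove_node("node-b")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    print("keys that moved after removing node-b:", moved)
```

The point of the design is that removing a node relocates only the keys that node owned; with a plain `hash(key) % node_count` scheme, changing the node count would remap most keys.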
Amazon Dynamo Model
50. BigTable Data Model
    Single table:
      Name  Site             Counter
      Dick  Ebay             507,018
      Dick  Google           690,414
      Jane  Google           716,426
      Dick  Facebook         723,649
      Jane  Facebook         643,261
      Jane  ILoveLarry.com   856,767
      Dick  MadBillFans.com  675,230
    Relational (normalized) tables:
      NameId  Name
      1       Dick
      2       Jane

      SiteId  SiteName
      1       Ebay
      2       Google
      3       Facebook
      4       ILoveLarry.com
      5       MadBillFans.com

      NameId  SiteId  Counter
      1       1       507,018
      1       2       690,414
      2       2       716,426
      1       3       723,649
      2       3       643,261
      2       4       856,767
      1       5       675,230
    BigTable rows (one sparse row per name):
      Id 1, Name Dick: Ebay 507,018; Google 690,414; Facebook 723,649; (other columns) ...; MadBillFans.com 675,230
      Id 2, Name Jane: Google 716,426; Facebook 643,261; (other columns) ...; ILoveLarry.com 856,767
51. OODBMS - 1990s. The OODBMS Manifesto (Atkinson/Bancilhon/DeWitt/Dittrich/Maier/Zdonik, '90). "A relational database is like a garage that forces you to take your car apart and store the pieces in little drawers." Also, SQL is ugly. "An object database is like a closet which requires that you hang up your suit with tie, underwear, belt, socks and shoes all attached" (Dave Ensor). http://4.bp.blogspot.com/-IPgd1Tg8ByE/UkOzH-g1FmI/AAAAAAAACB0/QYg8kEVp5_0/s1600/db4o_vs_orm.png
52. Revenge of the Object Nerds: document databases. Structured documents, XML and JSON (JavaScript Object Notation), become more prevalent within applications. Web programmers start storing these in BLOBs in MySQL. Emergence of XML and JSON databases.
53. Graph database: Neo4J, Infinite Graph, FlockDB. Document, JSON based: MongoDB, CouchDB, RethinkDB. Document, XML based: MarkLogic, BerkeleyDB XML. Key value: MemcacheDB, Oracle NoSQL, Dynamo, Voldemort, DynamoDB, Riak. Table based: BigTable, Cassandra, HBase, HyperTable, Accumulo.
54. It's not a database, it's a key value store. http://browsertoolkit.com/fault-tolerance.png
55. No Means Yes!
56. Trend #4: Column-oriented DB
57.
Row orientation vs column orientation
    Table:
      ID    Name    DOB       Salary  Sales  Expenses
      1001  Dick    21/12/60  67,000  78980  3244
      1002  Jane    12/12/55  55,000  67840  2333
      1003  Robert  17/02/80  22,000  67890  6436
      1004  Dan     15/03/75  65,200  98770  2345
      1005  Steven  11/11/81  76,000  43240  3214
    Row oriented database (one row per block):
      Block  ID    Name    DOB       Salary  Sales  Expenses
      1      1001  Dick    21/12/60  67,000  78980  3244
      2      1002  Jane    12/12/55  55,000  67840  2333
      3      1003  Robert  17/02/80  22,000  67890  6436
      4      1004  Dan     15/03/75  65,200  98770  2345
      5      1005  Steven  11/11/81  76,000  43240  3214
    Column oriented database (one column per block):
      Block 1: Dick, Jane, Robert, Dan, Steven
      Block 2: 21/12/60, 12/12/55, 17/02/80, 15/03/75, 11/11/81
      Block 3: 67,000, 55,000, 22,000, 65,200, 76,000
      Block 4: 78980, 67840, 67890, 98770, 43240
      Block 5: 3244, 2333, 6436, 2345, 3214
58. Analytical Queries (same two layouts): SELECT SUM(salary) FROM salesperson
59. Compression (same two layouts). Row oriented database: poor compression ratio (low repetition). Column oriented database: good compression ratio (high repetition).
60.
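The effect of the two layouts on an analytical query such as SELECT SUM(salary) can be shown with a toy model that counts how many blocks each layout must read. This is a sketch under simplifying assumptions: one row per block in the row store, one column per block in the column store, and in-memory Python structures standing in for disk blocks. Unlike the slide's column layout, it also gives the ID column its own block. The function names are invented for the example.

```python
COLUMNS = ("id", "name", "dob", "salary", "sales", "expenses")

# The five salesperson rows from the slide, with salaries as integers.
ROWS = [
    (1001, "Dick", "21/12/60", 67000, 78980, 3244),
    (1002, "Jane", "12/12/55", 55000, 67840, 2333),
    (1003, "Robert", "17/02/80", 22000, 67890, 6436),
    (1004, "Dan", "15/03/75", 65200, 98770, 2345),
    (1005, "Steven", "11/11/81", 76000, 43240, 3214),
]

# Row orientation: each block holds every column of one row.
row_blocks = [dict(zip(COLUMNS, row)) for row in ROWS]

# Column orientation: each block holds one column for all rows.
column_blocks = {
    name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)
}


def sum_salary_row_store(blocks):
    """Return (total, blocks_read); every row block must be visited."""
    total = 0
    blocks_read = 0
    for block in blocks:
        blocks_read += 1
        total += block["salary"]
    return total, blocks_read


def sum_salary_column_store(blocks):
    """Return (total, blocks_read); only the salary block is visited."""
    return sum(blocks["salary"]), 1


if __name__ == "__main__":
    print("row store:   ", sum_salary_row_store(row_blocks))
    print("column store:", sum_salary_column_store(column_blocks))
```

Both functions return the same total, but the row store reads all five blocks while the column store reads one. The same grouping of like values into a single block is what gives a column store more repetition to exploit when compressing.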
Inserts (same row-oriented and column-oriented layouts): INSERT INTO salesperson
61. C-Store (Vertica) solution for inserts. Read Optimized Store: columnar, disk-based, highly compressed, bulk loadable. Write Optimized Store: row oriented, uncompressed, single row inserts. Asynchronous Tuple Mover. [Diagram labels: bulk sequential loads; continual parallel inserts; merged query]
62. Exadata Hybrid Columnar Compression (EHCC) Compression Unit (~