Unlocking big data with Hadoop + MySQL

Post on 21-Jan-2018

Category:

Software

TRANSCRIPT

  1. Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Unlocking New Big Data Insights with Hadoop & MySQL. Ricky Setyawan, MySQL Principal Consultant, ASEAN
  2. An Avalanche of Data
  3. Big Data: What It Is, What It Means. Volume, Variety, Velocity. Create Value.
  4. What's Changed? Enablers: digitization (nearly everything has a digital heartbeat); ability to store much larger data volumes (distributed file systems); ability to process much larger data volumes (parallel processing). Why is this different from BI/DW? In BI/DW, the business formulated the questions to ask upfront, and those questions drove what data was collected, the data model, and the query design. Big Data enables what-if analysis and real-time discovery.
  5. Big Data Adoption: Web Recommendations; Sentiment Analysis; Marketing Campaign Analysis; Customer Churn Modeling; Fraud Detection; Research and Development; Risk Modeling; Machine Learning.
  6. Leading Use-Case: On-Line Retail. Users; Browsing; Recommendations. Profile, Purchase History. Web Logs: Pages Viewed. Comments Posted. Social media updates. Preferences. Brands Liked. Recommendations.
  7. Why Hadoop? Scales to thousands of nodes and PB of structured and unstructured data. Combines data from multiple sources, schema-less. Runs queries against all of the data. Runs on commodity servers that handle both storage and processing. Data is replicated and self-healing. Initially just batch (Map/Reduce) processing; now being extended with interactive querying via Apache Drill, Cloudera Impala, Stinger, etc.
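
The batch (Map/Reduce) model mentioned on slide 7 can be illustrated with a minimal, single-process Python sketch. This is not Hadoop code: the log format, the sample data, and the function names are assumptions made for illustration. It only shows the map, shuffle, and reduce steps that Hadoop distributes across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical web-log lines in the form "<user> <page>" (format is an assumption).
LOG_LINES = [
    "alice /home",
    "bob /products",
    "alice /products",
    "carol /home",
    "bob /products",
]


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (page, 1) pair for every page view."""
    for line in lines:
        _user, page = line.split()
        yield page, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the counts collected for each page."""
    return {page: sum(counts) for page, counts in grouped.items()}


page_views = reduce_phase(shuffle_phase(map_phase(LOG_LINES)))
print(page_views)  # {'/home': 2, '/products': 3}
```

In a real cluster the map and reduce functions run in parallel on the nodes that hold the data, which is why the model scales with the number of commodity servers.
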
  8. Big Data Lifecycle: Better Decisions Using Big Data. ACQUIRE, ORGANIZE, ANALYZE, DECIDE. Create value from data.
  9. Big Data Lifecycle: Better Decisions Using Big Data. ACQUIRE: MySQL Database; MySQL Cluster; JSON Support; NoSQL Interfaces; MySQL Fabric. Create value from data.
  10. MySQL 5.7 Sysbench Benchmark: SQL Point Selects. 3x faster than MySQL 5.6; 1,600,000 QPS. [Chart: MySQL 5.7: Sysbench OLTP Read Only (SQL Point Selects). X-axis: Connections (8 to 1,024). Y-axis: Queries per Second (0 to 1,800,000). Series: MySQL 5.7, MySQL 5.6, MySQL 5.5.] Test system: Intel(R) Xeon(R) CPU E7-8890 v3, 4 sockets x 18 cores-HT (144 CPU threads), 2.5 GHz, 512 GB RAM, Linux kernel 3.16.
  11. Hybrid Database: Rock Solid Reliability + Flexibility (MySQL 5.7 JSON Support). Traditional RDBMS: proven, transactional, secure; complex JOINs and queries; extensive operational tools. NoSQL solutions: flexible, easy to use; schema-less document storage. Modern applications require agile development and operations with robust data protection and security. Hybrid database: no trade-offs, best of both worlds; ACID properties and reliability of an RDBMS plus flexible document management.
  12. MySQL NoSQL Interfaces: Fast, Flexible, Safe. Blazing-fast key/value queries. Fully transactional/ACID. NoSQL and SQL across the same data set. Combined with schema flexibility: online DDL.
  13. NoSQL Interfaces to MySQL Cluster. [Diagram labels: Clients, Application Layer, Data Layer, MySQL Cluster Data Nodes.]
  14. MySQL Cluster 7.4 NoSQL Performance: 200 Million NoSQL Reads/Second (FlexAsync benchmark). Memory-optimized tables; durable; mix with disk-based tables. Massively concurrent OLTP. Distributed joins for analytics. Parallel table scans for non-indexed searches. [Chart: FlexAsync Reads. X-axis: Data Nodes (2 to 32). Y-axis: Reads per second (0 to 250,000,000).]
  15. MySQL Cluster 7.4 SQL Performance: 2.5M SQL Statements/Second (DBT2 benchmark). Memory-optimized tables; durable; mix with disk-based tables. Massively concurrent OLTP. Distributed joins for analytics. Parallel table scans for non-indexed searches. [Chart: DBT2 SQL Statements per Second. X-axis: Data Nodes (2 to 16). Y-axis: SQL Statements/sec (0 to 3,000,000).]
  16. MySQL Fabric: Scale Out with Data Sharding + High Availability. Scale-out through sharding: read AND write; standard framework, no more custom solutions. HA out of the box: on top of replication; automatic failover; automatic routing. [Diagram labels: Application, SQL, MySQL Fabric Connector, MySQL Fabric, Master group, Read-slaves.]
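
The routing idea on slide 16 (shard by key, send writes to a group's master and reads to its read-slaves, promote a slave on failover) can be sketched in plain Python. This is not the MySQL Fabric API or its connector: the class names, server names, and the hash-based sharding scheme are all assumptions chosen to keep the example self-contained.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Group:
    """One HA group: a master for writes plus read-slaves (hypothetical model)."""
    master: str
    slaves: List[str]
    _cursor: int = 0

    def read_target(self) -> str:
        """Round-robin over the read-slaves; fall back to the master if none exist."""
        if not self.slaves:
            return self.master
        target = self.slaves[self._cursor % len(self.slaves)]
        self._cursor += 1
        return target

    def promote_slave(self) -> None:
        """Failover: the first read-slave becomes the new master."""
        if not self.slaves:
            raise RuntimeError("no slave available to promote")
        self.master = self.slaves.pop(0)


class ShardRouter:
    """Maps a shard key to a group by hashing it (assumed scheme, for illustration)."""

    def __init__(self, groups: List[Group]) -> None:
        self.groups = groups

    def group_for(self, shard_key: str) -> Group:
        digest = hashlib.sha256(shard_key.encode("utf-8")).hexdigest()
        return self.groups[int(digest, 16) % len(self.groups)]

    def route(self, shard_key: str, write: bool) -> str:
        group = self.group_for(shard_key)
        return group.master if write else group.read_target()


# Hypothetical server names.
router = ShardRouter([
    Group(master="db1-master", slaves=["db1-slave-a", "db1-slave-b"]),
    Group(master="db2-master", slaves=["db2-slave-a", "db2-slave-b"]),
])

write_server = router.route("customer:42", write=True)
read_server = router.route("customer:42", write=False)
print(write_server, read_server)
```

The application only supplies a shard key and whether the statement writes; which server receives it is decided by the router, which is the property the slide describes as automatic routing.
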
  17. Big Data Lifecycle: Better Decisions Using Big Data. ACQUIRE, ORGANIZE. Import Data: Apache Sqoop. Create value from data.
  18. Apache Sqoop. Apache TLP, part of the Hadoop project. Originally developed by Cloudera. Bulk data import and export between Hadoop (HDFS) and external data stores. JDBC connector architecture; supports plug-ins for specific functionality. Fast Path Connector developed for MySQL.
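
As a rough illustration of the bulk import described on slide 18, the Python helper below assembles a `sqoop import` command line without running it. The host, database, table, and HDFS path are hypothetical; the flags used (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--num-mappers`, `--direct`) are standard Sqoop 1 options, with `--direct` selecting the MySQL fast path.

```python
import shlex
from typing import List


def sqoop_import_command(
    jdbc_url: str,
    username: str,
    table: str,
    target_dir: str,
    num_mappers: int = 4,
    direct: bool = True,
) -> List[str]:
    """Build (but do not execute) a Sqoop 1 import command for one table."""
    command = [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC connection string for the source database
        "--username", username,
        "-P",                         # prompt for the password on the console
        "--table", table,
        "--target-dir", target_dir,   # destination directory in HDFS
        "--num-mappers", str(num_mappers),
    ]
    if direct:
        command.append("--direct")    # use the MySQL fast-path connector
    return command


# Hypothetical connection details, for illustration only.
cmd = sqoop_import_command(
    jdbc_url="jdbc:mysql://db.example.com/shop",
    username="report_user",
    table="orders",
    target_dir="/data/shop/orders",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Returning the command as a list makes it safe to hand to `subprocess.run` later on a machine where Sqoop is installed, without shell quoting problems.
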
  19. MySQL Applier for Hadoop. Real-time streaming of events from MySQL to Hadoop. Supports the move towards Speed of Thought analytics. Connects to the binary log and writes events to HDFS via the libhdfs library. Each database table is mapped to a Hive data warehouse directory. Enables the eco-system of Hadoop tools to integrate with MySQL data. Available for download now: labs.mysql.com
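
The table-to-directory mapping on slide 19 can be pictured with a small Python simulation. This is not the Applier itself, which reads the binary log and writes through libhdfs: here a local temporary directory stands in for HDFS, the events are hand-written tuples, and the data file name and comma delimiter are assumptions. Only the `/user/hive/warehouse` root reflects Hive's conventional default warehouse location.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence, Tuple

WAREHOUSE_ROOT = Path("user/hive/warehouse")  # Hive's conventional default, relative to the fake root
DATA_FILE_NAME = "datafile1.txt"              # assumed file name
FIELD_DELIMITER = ","                         # assumed delimiter


def table_directory(root: Path, database: str, table: str) -> Path:
    """Each table maps to <warehouse>/<database>.db/<table>/."""
    return root / WAREHOUSE_ROOT / f"{database}.db" / table


def apply_insert(root: Path, database: str, table: str, row: Sequence[object]) -> Path:
    """Append one inserted row as a delimited text line to the table's data file."""
    directory = table_directory(root, database, table)
    directory.mkdir(parents=True, exist_ok=True)
    data_file = directory / DATA_FILE_NAME
    with data_file.open("a", encoding="utf-8") as handle:
        handle.write(FIELD_DELIMITER.join(str(value) for value in row) + "\n")
    return data_file


# Simulated insert events: (database, table, row). Real events come from the binary log.
events: List[Tuple[str, str, Sequence[object]]] = [
    ("shop", "orders", (1, "alice", 19.99)),
    ("shop", "orders", (2, "bob", 5.5)),
    ("shop", "users", (1, "alice")),
]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for database, table, row in events:
        apply_insert(root, database, table, row)
    orders_file = table_directory(root, "shop", "orders") / DATA_FILE_NAME
    orders_lines = orders_file.read_text(encoding="utf-8").splitlines()

print(orders_lines)  # ['1,alice,19.99', '2,bob,5.5']
```

Because rows land as plain delimited text under the warehouse directory, a Hive table defined over that directory can query them, which is how other Hadoop tools get access to the MySQL data.
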
  20. MySQL Applier for Hadoop (labs.mysql.com)
  21. Big Data Lifecycle: Better Decisions Using Big Data. ANALYZE, DECIDE. Analyze; Export Data; Decide. Create value from data.
  22. Analyze Big Data in Hadoop
  23. MySQL Reporting Database for BI
  24. Summary. Create value from Big Data with MySQL. MySQL + Hadoop: widely deployed solution (80% of Hadoop project). Best of both worlds: SQL + NoSQL access; schema-less data management. Scale out and data sharding with MySQL Fabric. Tools and expertise to support you.