Post on 23-Jun-2020

CAN THE ELEPHANTS HANDLE THE NOSQL ONSLAUGHT?

by

SRIDHAR REDDY VORUGANTI

CSU ID: 2607043

ABSTRACT

• Traditional DBMSs under attack.

• NoSQL vs. SQL.

• Result (evaluation).

WHAT WE DISCUSS?

INTRODUCTION…

• The database community is currently at an unprecedented and exciting inflection point.

• RDBMSs are no longer the only viable alternative for data-driven applications.

• At the other end of the big data application spectrum are analytical decision support workloads that are characterized by complex queries on massive amounts of data.

• The results are shown for the sole purpose of providing relative comparisons for this paper, and should not be compared to official benchmark results.

BACKGROUND…

• Parallel Data Warehouse (PDW)

• Hive

• MongoDB

• PDW is a parallel database system.

• Two types of nodes: compute and control.

• Data is horizontally partitioned across the compute nodes.

• DMS (Data Movement Service) shuffles data between nodes.

• The control node handles post-processing and re-integration of results.
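As a rough illustration of the PDW bullets above, the sketch below hash-partitions rows across compute nodes, lets each node aggregate its own partition, and has a control step merge the partial results. The function names, the CRC32 hash, and the sample rows are all assumptions made for this example; this is not PDW's actual implementation.

```python
import zlib
from collections import defaultdict


def node_for(key, num_nodes):
    """Pick a compute node for a key using a deterministic hash (illustrative choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def partition_rows(rows, key_field, num_nodes):
    """Horizontal partitioning: every row lands on exactly one compute node."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[node_for(row[key_field], num_nodes)].append(row)
    return partitions


def local_sum(partition, group_field, value_field):
    """Work done on one compute node: aggregate only the local rows."""
    totals = defaultdict(float)
    for row in partition:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


def control_merge(partials):
    """Work done on the control node: re-integrate the partial aggregates."""
    merged = defaultdict(float)
    for partial in partials:
        for group, subtotal in partial.items():
            merged[group] += subtotal
    return dict(merged)


# Hypothetical sample data, not taken from the benchmark.
rows = [
    {"order_id": 1, "nation": "FRANCE", "price": 10.0},
    {"order_id": 2, "nation": "KENYA", "price": 7.5},
    {"order_id": 3, "nation": "FRANCE", "price": 15.0},
    {"order_id": 4, "nation": "KENYA", "price": 2.5},
]
partitions = partition_rows(rows, "order_id", num_nodes=4)
result = control_merge(local_sum(p, "nation", "price") for p in partitions)
```

The point of the design is that each node touches only its own rows; only the small partial aggregates travel to the control step.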

• Hive is an open-source data warehouse.

• HDFS is used for data storage.

• Queries are written in HiveQL.

• Supports multiple data storage formats.

• MongoDB is an open-source NoSQL database.

• Data is stored as documents grouped into collections.

• No schema is required.

• Supports auto-partitioning (sharding).

• Supports replica sets.
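To make the schema-free document model and auto-partitioning bullets more concrete, here is a minimal sketch of range-based routing on a partitioning key. The `RangeRouter` class, the split points, and the shard names are hypothetical; MongoDB's real sharding also splits and rebalances ranges automatically, which this sketch does not attempt.

```python
import bisect


class RangeRouter:
    """Routes a key to a shard using sorted split points (illustrative only)."""

    def __init__(self, split_points, shards):
        # N split points define N + 1 key ranges, one per shard.
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = sorted(split_points)
        self.shards = list(shards)

    def shard_for(self, key):
        # bisect_right makes each range inclusive of its lower bound.
        return self.shards[bisect.bisect_right(self.split_points, key)]


router = RangeRouter(["g", "p"], ["shard0", "shard1", "shard2"])

# Documents in one collection do not need to share the same fields.
docs = [
    {"_id": "alice", "age": 30},
    {"_id": "mallory"},
    {"_id": "zed", "tags": ["x", "y"]},
]
placement = {doc["_id"]: router.shard_for(doc["_id"]) for doc in docs}
```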

EVALUATION…

• Evaluation of RDBMS and a NoSQL system

• We use TPC-H to evaluate Microsoft’s PDW and Hive.

• We compare MongoDB with Microsoft SQL Server using the YCSB benchmark.

HARDWARE CONFIGURATION…

• 16 nodes connected by a 1 Gbit HP ProCurve Ethernet switch.

• Each node: 2.13 GHz CPU, 32 GB of main memory, and ten 300 GB SAS 10K RPM hard drives.

• When evaluating PDW and Hive, we used eight disks to store the data.

• For the YCSB experiments, eight nodes were used as servers.

SOFTWARE CONFIGURATION…

• Hive and Hadoop

• PDW

• MongoDB (Mongo-AS)

• Hive 0.7.1 and Hadoop 0.20.203

• RCFile format instead of text files

• JVM size 2GB.

• PDW

– Version AU3

– Maximum 24 GB memory.

• MongoDB

– Version 1.8.2

– “Global lock” for writes.

HIVE VS. PDW…

• Workload Description

• Data Layout

• Data Preparation and Load Times

• Experimental Evaluation

DATA LAYOUT…

• Hive: partitions and buckets

• PDW: partitions and replication
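The sketch below illustrates the general idea behind a partition-plus-bucket layout: the partition column value selects a directory, and a hash of the bucketing column selects one of a fixed number of buckets inside it. The table and column names, the CRC32 hash, and the `bucket_NNNNN` file naming are assumptions for illustration; the slides do not say which columns were used, and Hive's own hash function and file names differ.

```python
import zlib


def bucket_for(bucket_key, num_buckets):
    """Map a bucketing-column value to a bucket number (illustrative hash)."""
    return zlib.crc32(str(bucket_key).encode("utf-8")) % num_buckets


def layout_path(table, partition_col, partition_val, bucket_key, num_buckets):
    """Build a storage path: one directory per partition value, one file per bucket."""
    bucket = bucket_for(bucket_key, num_buckets)
    return f"{table}/{partition_col}={partition_val}/bucket_{bucket:05d}"


# Hypothetical choice of columns for a TPC-H style table.
path = layout_path("lineitem", "l_shipdate", "1995-01-01", bucket_key=42, num_buckets=8)
```

Rows that share a partition value and a bucket number end up together, so a query that filters on the partition column can skip whole directories, and a join on the bucketing column can pair up matching buckets.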

Data preparation steps

Hive

• Generate the TPC-H dataset.

• Create a Hive table for each TPC-H table.

• Load data in two phases:

– Data loaded to HDFS.

– Data converted to RCFile.

PDW

• TPC-H data is generated on the landing node.

• Specify schema and tables.

• Text files are split into multiple chunks.

• Chunks are loaded to the nodes.

EXPERIMENTAL EVALUATION…

QUERIES…

• Performance Analysis

– Query 5

– Query 19

• Scalability Analysis

– Query 1

– Query 22

• Query 5 (joins customer, orders, lineitem, supplier, nation and region)

• Query 19 (joins lineitem and part)

• Query 1

– Scans ‘lineitem’

• Query 22

– Scans customer table

– 4 sub-queries
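To show what a scan-dominated query such as Query 1 does, the sketch below runs a simplified version of its logic in memory: scan lineitem rows, keep those shipped on or before a cutoff date, and aggregate per (return flag, line status) group. It omits several of the aggregates in the real TPC-H query, and the sample rows and function name are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# TPC-H Q1 filters on ship date <= 1998-12-01 minus a number of days (90 here).
CUTOFF = date(1998, 12, 1) - timedelta(days=90)


def query1_like(lineitems, cutoff=CUTOFF):
    """Single pass over lineitem: filter by ship date, then group and aggregate."""
    groups = defaultdict(lambda: {"sum_qty": 0, "sum_disc_price": 0.0, "count": 0})
    for item in lineitems:
        if item["shipdate"] > cutoff:
            continue  # row fails the date predicate
        agg = groups[(item["returnflag"], item["linestatus"])]
        agg["sum_qty"] += item["quantity"]
        agg["sum_disc_price"] += item["extendedprice"] * (1 - item["discount"])
        agg["count"] += 1
    return dict(groups)


# Hypothetical rows, not generated by the TPC-H data generator.
lineitems = [
    {"returnflag": "A", "linestatus": "F", "quantity": 10,
     "extendedprice": 100.0, "discount": 0.25, "shipdate": date(1995, 1, 1)},
    {"returnflag": "A", "linestatus": "F", "quantity": 5,
     "extendedprice": 50.0, "discount": 0.0, "shipdate": date(1996, 1, 1)},
    {"returnflag": "N", "linestatus": "O", "quantity": 7,
     "extendedprice": 70.0, "discount": 0.5, "shipdate": date(1998, 1, 1)},
    {"returnflag": "N", "linestatus": "O", "quantity": 1,
     "extendedprice": 10.0, "discount": 0.0, "shipdate": date(1998, 11, 30)},
]
summary = query1_like(lineitems)
```

Because every row of the largest table is read once and there are no joins, this kind of query mainly stresses scan and aggregation throughput, which is why it suits a scalability analysis.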

MONGODB VS. SQL SERVER…

• Workload Description

– YCSB benchmark

• Read heavy and Read only

• Experimental Evaluation

– YCSB benchmark

• Update heavy, Read latest and Short ranges
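The five workload names above match the standard YCSB core workloads, which differ in their mix of operations. The sketch below generates an operation stream for each mix. The proportions are the defaults of the standard YCSB core workloads and are an assumption here, since the slides do not state the exact mixes used; `generate_ops` is a hypothetical helper, not part of YCSB.

```python
import random
from collections import Counter

# Assumed operation mixes (standard YCSB core workload defaults).
WORKLOADS = {
    "update_heavy": {"read": 0.50, "update": 0.50},
    "read_heavy": {"read": 0.95, "update": 0.05},
    "read_only": {"read": 1.00},
    "read_latest": {"read": 0.95, "insert": 0.05},
    "short_ranges": {"scan": 0.95, "insert": 0.05},
}


def generate_ops(workload, n, seed=0):
    """Draw n operation names according to the workload's proportions."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    mix = WORKLOADS[workload]
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n)


counts = Counter(generate_ops("read_heavy", 10_000))
```

A real benchmark driver would also choose which keys each operation touches (for example, favoring recently inserted records in the read-latest workload); that part is left out of this sketch.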

YCSB BENCHMARK…

CONCLUSIONS AND FUTURE WORK…

• We compared relational systems with popular NoSQL alternatives.

• We used the TPC-H benchmark and the YCSB benchmark.

• Our results show that the relational systems continue to provide a significant performance advantage over their NoSQL counterparts, but the NoSQL alternatives are competitive in some cases.

• Future work: expand the set of SQL and NoSQL systems evaluated and revisit the performance differences in a few years.

REFERENCES…

• http://hadoop.apache.org/

• http://mongodb.org/

• http://tpc.org/tpch/