Right-Sizing Your Big Data Infrastructure
Tom Lyon, Founder & Chief Scientist
For Strata + Hadoop World, Mar. 15, 2017
Cluster Out of Balance?
• Too little CPU, too much disk?
• Too little disk, too much CPU?
• How can you evolve the cluster balance as workloads change?
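One way to make "out of balance" measurable is to compare a cluster's disk-to-CPU ratio with the ratio its workload actually needs. The Python below is a minimal sketch under that assumption. The `Cluster` class, the `drive_delta` helper, and the TB-per-core target are hypothetical illustrations, not part of any DriveScale product or Hadoop API.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical summary of a cluster's resources (illustration only)."""
    cores: int       # total CPU cores across all servers
    drives: int      # total data drives attached to those servers
    drive_tb: float  # capacity of one drive, in TB

    @property
    def tb_per_core(self) -> float:
        """Current disk-to-CPU ratio, in TB per core."""
        return self.drives * self.drive_tb / self.cores


def drive_delta(cluster: Cluster, target_tb_per_core: float) -> int:
    """Drives to add (positive) or release (negative) to reach the target ratio.

    Rounds up, so the result never leaves the cluster below the target capacity.
    """
    needed_drives = math.ceil(target_tb_per_core * cluster.cores / cluster.drive_tb)
    return needed_drives - cluster.drives


if __name__ == "__main__":
    # Hypothetical cluster: 200 cores and 100 x 4 TB drives, i.e. 2.0 TB per core.
    c = Cluster(cores=200, drives=100, drive_tb=4.0)
    print(c.tb_per_core)        # 2.0
    print(drive_delta(c, 3.0))  # 50: a disk-heavy workload needs 50 more drives
    print(drive_delta(c, 1.0))  # -50: a compute-heavy workload strands 50 drives
```

When every server has a fixed number of drive bays, a non-zero delta can only be closed by changing hardware. The slides that follow argue for closing it in software instead.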
Too Many Silos? Too Many SKUs?
• Each type of cluster “wants” a different amount of disk per server:
  • Hadoop Data Lake
  • Dev/Test
  • HBase
  • Kafka
  • Cassandra
  • …
• Fixed silos per cluster type lead to madness:
  • No resource sharing
  • No elasticity
  • Too many server types / SKUs
Hadoop Storage Needs vs Supposed Solutions
DriveScale Confidential Information © 2016
Storage type               | Locality | Converged compute & storage | Replication | Extreme read BW | Erasure coding | Examples
Hadoop HDFS                | ✔        | ✔                           | ✔           | ✔               | ✖              |
NAS - Enterprise           | ✖        | ✖                           | ✔           | ✖               | ✔              | Isilon, Qumulo, Gluster
NAS - HPC                  | ✖        | ✖                           | ✖           | ✔               | ✖              | Lustre, GPFS
SAN/Block - External       | ✖        | ✖                           | ✔           | ✖               | ☐              | ScaleIO, Ceph, Datera, Cinder, AWS EBS
SAN/Block - Hyperconverged | ✖        | ✔                           | ✔           | ✖               | ☐              | Nutanix, ScaleIO, Robin
Object                     | ✖        | ✖                           | ✔           | ✖               | ✔              | AWS S3, Scality, Swift, EMC ECS
DriveScale is a rack-scale architecture that provides composable infrastructure on pooled commodity resources
Typical Server Rack Configuration
• Compute pool: Processor + Memory Servers
• 1U DriveScale Adapter (DA): Ethernet to SAS
• Storage pool: Disks in JBODs, connected via SAS to DAs
[Diagram: Rack Scale Architecture, with DriveScale Adapters]
• DriveScale composes Logical Nodes (software-defined physical nodes)
• Example: a logical node might consist of a dual-processor server and 12 drives spread across 2 JBODs
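The logical-node example above can be sketched as an allocation over a shared drive pool: take a compute server, pick free drives from a chosen number of JBODs, and bind them together. This is a minimal Python illustration assuming a simple in-memory pool. `Jbod`, `LogicalNode`, `compose_node`, and the even-spread policy are hypothetical; they are not DriveScale's actual interface or placement algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Jbod:
    """Hypothetical JBOD enclosure in the rack's shared storage pool."""
    name: str
    free_drives: list[str]  # IDs of drives not yet bound to any server


@dataclass
class LogicalNode:
    """Hypothetical logical node: one compute server plus its bound drives."""
    server: str
    drives: list[str] = field(default_factory=list)


def compose_node(server: str, pool: list[Jbod], n_drives: int, n_jbods: int) -> LogicalNode:
    """Bind n_drives free drives, spread as evenly as possible over the
    n_jbods enclosures with the most free drives, to one compute server."""
    if not 1 <= n_jbods <= len(pool):
        raise ValueError("n_jbods must be between 1 and the number of JBODs")
    # Prefer the enclosures with the most spare drives.
    chosen = sorted(pool, key=lambda j: len(j.free_drives), reverse=True)[:n_jbods]
    base, extra = divmod(n_drives, n_jbods)
    quotas = [base + (1 if i < extra else 0) for i in range(n_jbods)]
    # Check every enclosure first so a failed request leaves the pool unchanged.
    for jbod, quota in zip(chosen, quotas):
        if len(jbod.free_drives) < quota:
            raise RuntimeError(f"{jbod.name} has only {len(jbod.free_drives)} free drives")
    node = LogicalNode(server)
    for jbod, quota in zip(chosen, quotas):
        node.drives.extend(jbod.free_drives[:quota])
        del jbod.free_drives[:quota]  # these drives are now bound, no longer free
    return node


if __name__ == "__main__":
    pool = [
        Jbod("jbod-a", [f"a{i}" for i in range(24)]),
        Jbod("jbod-b", [f"b{i}" for i in range(24)]),
    ]
    # The slide's example: a dual-processor server plus 12 drives across 2 JBODs.
    node = compose_node("server-01", pool, n_drives=12, n_jbods=2)
    print(node.drives)                         # six drives from each enclosure
    print([len(j.free_drives) for j in pool])  # [18, 18]
```

Spreading a node's drives evenly is just one plausible policy (it limits how many of the node's drives share a single JBOD); a real placement policy could weigh other factors.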
DriveScale spans the data center and makes resources fungible
[Diagram: DriveScale Adapters across the data center, with resources grouped into Cluster 1 (Balanced), Cluster 2 (Data Lake), and Cluster 3 (Compute Heavy)]
The boundaries between clusters are “movable” in software
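Because a logical node is a software binding rather than a cabling decision, moving a cluster boundary amounts to re-binding resources. The following minimal Python sketch assumes an in-memory model in which each cluster name maps to its servers and the drive IDs bound to them. `move_server` and `move_drives` are hypothetical helpers, and the sketch ignores the data already on the drives, which a real deployment would need to handle before moving them.

```python
from __future__ import annotations

# Hypothetical model: cluster name -> {server name -> IDs of drives bound to it}.


def move_server(clusters: dict[str, dict[str, list[str]]], server: str, src: str, dst: str) -> None:
    """Reassign a compute server, with its bound drives, from cluster src to cluster dst."""
    drives = clusters[src].pop(server)  # KeyError if the server is not in src
    clusters.setdefault(dst, {})[server] = drives


def move_drives(
    clusters: dict[str, dict[str, list[str]]],
    n: int,
    src: tuple[str, str],
    dst: tuple[str, str],
) -> list[str]:
    """Unbind n drives from the (cluster, server) node src and bind them to dst."""
    src_drives = clusters[src[0]][src[1]]
    if not 0 <= n <= len(src_drives):
        raise ValueError(f"cannot move {n} of {len(src_drives)} drives")
    split = len(src_drives) - n  # take the last n drives
    moved = src_drives[split:]
    del src_drives[split:]
    clusters.setdefault(dst[0], {}).setdefault(dst[1], []).extend(moved)
    return moved


if __name__ == "__main__":
    clusters = {
        "balanced": {"s1": ["d1", "d2", "d3"]},
        "data_lake": {"s2": ["d4", "d5"]},
        "compute_heavy": {"s3": ["d6", "d7", "d8", "d9"]},
    }
    # Shift two drives from the compute-heavy cluster to the data lake...
    move_drives(clusters, 2, src=("compute_heavy", "s3"), dst=("data_lake", "s2"))
    # ...and move a whole server from the balanced cluster to the compute-heavy one.
    move_server(clusters, "s1", src="balanced", dst="compute_heavy")
    print(clusters)
```

The cluster names simply mirror the three example clusters on this slide (Balanced, Data Lake, Compute Heavy).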
DriveScale’s Core Value Propositions
Flexible and Responsive Physical Infrastructure
• Get the infrastructure that’s needed when it’s needed
• Repurpose resources on demand
Simplicity for Any Scale
• No changes required in the app stack
• Performance equivalent to direct-attached drives
• No loss of “data locality” information
Enterprise-Class Solution
• Highly available, secure, reliable
• Uses industry-standard servers and storage of your choice