odsc west tidalscale keynote slides
Post on 22-Jan-2017
286 Views
Preview:
TRANSCRIPT
Software-Defined ServersA Game Changer for Data Scientists
Gary SmerdonCEOTidalScale, Inc.
NVMe Flash 150 μs $150,000Flash Array 1 ms $1,500,000
Hard Drive 5 ms $7,500,000
TCP packet retransmit 2 s $2,000,000,000
The ProblemIn-Memory Computing is Key
Operation ProcessingLatency
In Human Terms
L1-L3 Cache 1-13 ns $1DRAM 50 ns $50
Memory Cliff
Variety & Volume: Data growing at 62% CAGR
GB TBPB
EB
ZB
Data
Vol
umes
Time
Velocity:Data value declines with ageBu
sines
s Val
ue
Age of Data (seconds)
Current Approaches
- Lower prediction accuracy
Algorithm
- Lower prediction accuracy
Sub Sample
Shard 4
Shard 3
Shard 2
Shard 1- Lower prediction accuracy- Time & Money
Single Huge Server
Easy to discover relationships
Store
A
Hard to discover relationships
Sharded
Store
BSt
ore C
Store
D
Department ADepartment BDepartment CDepartment DDepartment EDepartment FDepartment G
Product AProduct B
Product CProduct D
Seeing All the Data Uncovers Relationships
Recent Data is more Predictive
2005-2007 mortgage data would have predicted the 2008 mortgage crisis…but analysts used data only from 2004
Five-year Modeled Default Frequency Rate by Deal Vintage Year
Year
Five-
year
Cum
ulat
ive
Defa
ult F
requ
ency
Ra
te
2002 2003 2004 2005 2006 20070%
5%
10%
15%
20%
25%
30%
35% Actual Default Frequency Rate
Model with data through 2004Model with data through 2007
Model Accuracy Needs RAM: Decision Trees• Decision Trees model
error rates decline with data size & tree depth
• Learning time decreases with tree depth
• More data & greater tree depth consumes more RAM
Number of Observations:
Prediction Error Rate by Data Set Size & Decision Tree Depth
What If Servers were Software-Defined?
•In-memory performance at scale•As many cores as needed•Self optimizing•Everything just works•Uses standard hardware
Software-Defined Servers
Traditional VirtualizationVi
rtual
Phys
ical
Multiple virtual machines share a single physical
server
Virtual Machin
e
Virtual Machin
e
Virtual Machin
e
ApplicationOperating
System
100%, bit-for-bit unmodified
ApplicationOperating System
ApplicationOperating
System
Single virtual machine spans multiple physical servers
TidalScale: Software-Defined Servers
Application
Operating System
… HyperKernel HyperKernel HyperKernel
TidalScale Virtual Machine
100%, bit-for-bit unmodified
HyperKernel … HyperKernel HyperKernel HyperKernel HyperKernel
Application
Operating System
TidalScale Software-Defined Server
Flexible – Scales Up or Down Quickly
Seamless Scalability
HyperKernel … HyperKernel HyperKernel HyperKernel HyperKernel
Uses patented machine learning to transparently align resources
Application
Operating System
TidalScale Software-Defined Server
Machine Learning Driven Self-Optimization
Applications
Operating Systems
Virtual Machine
If it runs, it runs on a TidalScale Software-Defined Server
TidalScale Software-Defined Server HyperKernel … HyperKernel HyperKernel HyperKernel HyperKernel
100% CompatibleCo
ntai
ners
Use Case: Retail Analytics on TidalScalePerformance Comparisons (TPC-H “Powertest” in
Minutes)
Workload Size in GB
Min
utes
to P
roce
ss
1000
10
20
30
40
50
70
100 Amazon EC2
0 200 300 400 500 600 700 800 900 1,000
60
80
90
69.1
TEST FAILSTidalScale Software-Defined Server
22.033.7
Benchmark: Open Source R on TidalScale
• Version: Revolution R Open 8.0.3 with pryr, dplyr, mgcv, rpart, randomForest, FNN, Matrix, doparallel & foreach
• Data: CMS Public Use Dataset• In-memory footprints: 32GB-
680GB• Operations timed:
• Load• Join• GAM linear regression• GLM linear regression• Decision Tree• Random Forest (fixed seed)• K Nearest Neighbors
Open Source R Performance Comparisons
Tota
l Exe
cutio
n Ti
me
(Min
utes
)
1000
100,000
200,000
400,000
700,000
- 200 300 400 500 600
300,000
500,000
600,000
Bare Metal Server (128GB)
158 days
TidalScale Software-Defined Server (5 x 128GB nodes)
1,325 3,787
https://github.com/TidalScale/R_benchmark_test
Workload Size in GB
IBM & TidalScale
Software-Defined Server
15TB RAM & 400 cores
(1 rack, 20 machines)
Tomorrow’s Servers Today: A Game-Changer
“Software-defined Servers make it easy to run memory-intensive applications like data mining, machine learning and simulation.” Marc Jones, Director & Distinguished Engineer, IBM
“This is the way all servers will be built in the future.”
Gordon BellIndustry legend & 1st outside investor in TidalScale
top related