odsc west tidalscale keynote slides

16
Software-Defined Servers A Game Changer for Data Scientists Gary Smerdon CEO TidalScale, Inc.

Upload: chuck-piercey

Post on 22-Jan-2017

286 views

Category:

Technology


17 download

TRANSCRIPT

Page 1: ODSC West TidalScale Keynote Slides

Software-Defined ServersA Game Changer for Data Scientists

Gary SmerdonCEOTidalScale, Inc.

Page 2: ODSC West TidalScale Keynote Slides

NVMe Flash 150 μs $150,000Flash Array 1 ms $1,500,000

Hard Drive 5 ms $7,500,000

TCP packet retransmit 2 s $2,000,000,000

The ProblemIn-Memory Computing is Key

Operation ProcessingLatency

In Human Terms

L1-L3 Cache 1-13 ns $1DRAM 50 ns $50

Memory Cliff

Variety & Volume: Data growing at 62% CAGR

GB TBPB

EB

ZB

Data

Vol

umes

Time

Velocity:Data value declines with ageBu

sines

s Val

ue

Age of Data (seconds)

Page 3: ODSC West TidalScale Keynote Slides

Current Approaches

- Lower prediction accuracy

Algorithm

- Lower prediction accuracy

Sub Sample

Shard 4

Shard 3

Shard 2

Shard 1- Lower prediction accuracy- Time & Money

Page 4: ODSC West TidalScale Keynote Slides

Single Huge Server

Easy to discover relationships

Store

A

Hard to discover relationships

Sharded

Store

BSt

ore C

Store

D

Department ADepartment BDepartment CDepartment DDepartment EDepartment FDepartment G

Product AProduct B

Product CProduct D

Seeing All the Data Uncovers Relationships

Page 5: ODSC West TidalScale Keynote Slides

Recent Data is more Predictive

2005-2007 mortgage data would have predicted the 2008 mortgage crisis…but analysts used data only from 2004

Five-year Modeled Default Frequency Rate by Deal Vintage Year

Year

Five-

year

Cum

ulat

ive

Defa

ult F

requ

ency

Ra

te

2002 2003 2004 2005 2006 20070%

5%

10%

15%

20%

25%

30%

35% Actual Default Frequency Rate

Model with data through 2004Model with data through 2007

Page 6: ODSC West TidalScale Keynote Slides

Model Accuracy Needs RAM: Decision Trees• Decision Trees model

error rates decline with data size & tree depth

• Learning time decreases with tree depth

• More data & greater tree depth consumes more RAM

Number of Observations:

Prediction Error Rate by Data Set Size & Decision Tree Depth

Page 7: ODSC West TidalScale Keynote Slides

What If Servers were Software-Defined?

•In-memory performance at scale•As many cores as needed•Self optimizing•Everything just works•Uses standard hardware

Software-Defined Servers

Page 8: ODSC West TidalScale Keynote Slides

Traditional VirtualizationVi

rtual

Phys

ical

Multiple virtual machines share a single physical

server

Virtual Machin

e

Virtual Machin

e

Virtual Machin

e

ApplicationOperating

System

100%, bit-for-bit unmodified

ApplicationOperating System

ApplicationOperating

System

Page 9: ODSC West TidalScale Keynote Slides

Single virtual machine spans multiple physical servers

TidalScale: Software-Defined Servers

Application

Operating System

… HyperKernel HyperKernel HyperKernel

TidalScale Virtual Machine

100%, bit-for-bit unmodified

Page 10: ODSC West TidalScale Keynote Slides

HyperKernel … HyperKernel HyperKernel HyperKernel HyperKernel

Application

Operating System

TidalScale Software-Defined Server

Flexible – Scales Up or Down Quickly

Seamless Scalability

Page 11: ODSC West TidalScale Keynote Slides

HyperKernel … HyperKernel HyperKernel HyperKernel HyperKernel

Uses patented machine learning to transparently align resources

Application

Operating System

TidalScale Software-Defined Server

Machine Learning Driven Self-Optimization

Page 12: ODSC West TidalScale Keynote Slides

Applications

Operating Systems

Virtual Machine

If it runs, it runs on a TidalScale Software-Defined Server

TidalScale Software-Defined Server HyperKernel … HyperKernel HyperKernel HyperKernel HyperKernel

100% CompatibleCo

ntai

ners

Page 13: ODSC West TidalScale Keynote Slides

Use Case: Retail Analytics on TidalScalePerformance Comparisons (TPC-H “Powertest” in

Minutes)

Workload Size in GB

Min

utes

to P

roce

ss

1000

10

20

30

40

50

70

100 Amazon EC2

0 200 300 400 500 600 700 800 900 1,000

60

80

90

69.1

TEST FAILSTidalScale Software-Defined Server

22.033.7

Page 14: ODSC West TidalScale Keynote Slides

Benchmark: Open Source R on TidalScale

• Version: Revolution R Open 8.0.3 with pryr, dplyr, mgcv, rpart, randomForest, FNN, Matrix, doparallel & foreach

• Data: CMS Public Use Dataset• In-memory footprints: 32GB-

680GB• Operations timed:

• Load• Join• GAM linear regression• GLM linear regression• Decision Tree• Random Forest (fixed seed)• K Nearest Neighbors

Open Source R Performance Comparisons

Tota

l Exe

cutio

n Ti

me

(Min

utes

)

1000

100,000

200,000

400,000

700,000

- 200 300 400 500 600

300,000

500,000

600,000

Bare Metal Server (128GB)

158 days

TidalScale Software-Defined Server (5 x 128GB nodes)

1,325 3,787

https://github.com/TidalScale/R_benchmark_test

Workload Size in GB

Page 15: ODSC West TidalScale Keynote Slides

IBM & TidalScale

Software-Defined Server

15TB RAM & 400 cores

(1 rack, 20 machines)

Tomorrow’s Servers Today: A Game-Changer

“Software-defined Servers make it easy to run memory-intensive applications like data mining, machine learning and simulation.” Marc Jones, Director & Distinguished Engineer, IBM

Page 16: ODSC West TidalScale Keynote Slides

“This is the way all servers will be built in the future.”

Gordon BellIndustry legend & 1st outside investor in TidalScale