H2O World - Welcome to H2O World with Arno Candel

Welcome to H2O World Sri & H2O Team

Uploaded by: jo-fai-chow

Posted on 16-Apr-2017

Category: Software


TRANSCRIPT

Page 1: H2O World - Welcome to H2O World with Arno Candel

Welcome to H2O World

Sri & H2O Team

Page 2

Data Science is a Team Sport!

Culture Matters!

Page 3

Open Source Breeds Courage!

Community Matters!

Every generation needs to make its own history!

Page 4

Code is conversation with Customer!

Great Product Matters!

Page 5

Accuracy with Speed and Scale

[Diagram labels: HDFS, S3, SQL, NoSQL · Classification, Regression · Feature Engineering · In-Memory · Map Reduce / Fork Join · Columnar Compression · Deep Learning · PCA, GLM, Cox · Random Forest / GBM Ensembles · Fast Modeling Engine · Streaming, Nano Fast Java Scoring Engines · Matrix Factorization, Clustering · Munging]

Page 6

What's New in H2O-3

H2O-3 vs H2O-2:
• Total rewrite of the core in Java: built for data scientists AND developers!
• Unique Flow GUI (Notebook and more)
• REST Schemas for a self-describing API for all methods/algos
• New R client: cleaner, faster
• Sparkling Water: H2O is the Killer App on Spark
• Fully featured Python client (incl. Pipelines, scikit-learn look & feel)
• New expression parser & backend execution engine for R, Py, Flow
• New Algo: GLRM - Generalized Low Rank Modeling (unifies PCA, K-Means, Matrix Factorization, Imputation, etc.)
• New Solvers for GLM: Coordinate Descent and L-BFGS

continued…
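To give a feel for what a coordinate descent solver does, here is a minimal pure-Python sketch for the simplest case: a Gaussian GLM (linear regression) with an L1 penalty, updating one coefficient at a time with a soft-threshold step. This is not H2O's solver, which also handles other families, elastic-net mixing and distributed data; the names `soft_threshold` and `lasso_coordinate_descent` are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used by the L1 coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam=0.0, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent minimizing
    (1 / (2n)) * sum_i (y_i - b0 - x_i . b)^2 + lam * ||b||_1.
    X is a list of rows, y a list of targets. Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    # Center columns and target so the unpenalized intercept drops out.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # (1/n) * sum_i x_ij^2: the denominator of each coordinate update.
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    resid = yc[:]  # residual y - X beta, with beta = 0 initially
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: its coefficient stays 0
            # Correlation of column j with the partial residual (beta_j added back).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # converged: no coefficient moved noticeably in a full pass
    intercept = y_mean - sum(x_means[j] * beta[j] for j in range(p))
    return intercept, beta
```

With `lam=0.0` the updates converge to ordinary least squares; a larger `lam` shrinks coefficients toward zero and can set them exactly to zero, which is why coordinate descent suits L1-regularized GLMs.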

Page 7

What's New in H2O-3

Additional New Features:
• Grid Search for all Algorithms (R/Py/Flow)
• N-fold Cross-Validation for all Algorithms
• Early Stopping (check for convergence) for GBM/DRF/DL
• Stochastic GBM (row/col sampling)
• Distributions (Gaussian, Laplace, Poisson, Gamma, Tweedie) for GBM/DL
• Improved sparse data handling for DL
• Multi-node auto-tuning for DL
• Multinomial GLM
• Scalable Scatter Plots for numeric and categorical data
• Big-Big Joins ("distributed data.table") - in QA

…and many more!
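N-fold cross-validation trains N models, each one holding out a different fold for scoring, and averages the held-out metric. The pure-Python sketch below assumes a deterministic modulo fold assignment (row i goes to fold i mod N) and a trivial "predict the training mean" model so that it stays self-contained; `modulo_folds`, `cross_validate`, `fit_mean` and `predict_mean` are hypothetical names, not H2O functions. In H2O-3 itself, cross-validation is requested through an estimator's `nfolds` parameter.

```python
def modulo_folds(n_rows, nfolds):
    """Assign row i to fold i % nfolds (a deterministic 'modulo' assignment)."""
    return [i % nfolds for i in range(n_rows)]


def cross_validate(rows, nfolds, fit, predict):
    """rows: list of (x, y) pairs. Returns (per-fold MSE list, mean MSE)."""
    if not 2 <= nfolds <= len(rows):
        raise ValueError("nfolds must be between 2 and the number of rows")
    folds = modulo_folds(len(rows), nfolds)
    fold_mse = []
    for k in range(nfolds):
        # Train on every fold except k, then score on the held-out fold k.
        train = [r for r, f in zip(rows, folds) if f != k]
        holdout = [r for r, f in zip(rows, folds) if f == k]
        model = fit(train)
        errors = [(predict(model, x) - y) ** 2 for x, y in holdout]
        fold_mse.append(sum(errors) / len(errors))
    return fold_mse, sum(fold_mse) / nfolds


# Trivial illustrative model: always predict the mean target of the training rows.
def fit_mean(train):
    return sum(y for _, y in train) / len(train)


def predict_mean(model, x):
    return model
```

Swapping `fit_mean`/`predict_mean` for a real learner leaves the loop unchanged, which is the point of running the same procedure for every algorithm.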

Page 8

Convergence-Based Early Stopping in H2O

Before: training runs too long, but at least overwrite_with_best_model=true prevents overfitting (it returns the model with the lowest validation error).

Now: specify an additional convergence criterion, e.g. stopping_rounds=5, stopping_metric="MSE", stopping_tolerance=1e-3, to stop as soon as the moving average (length 5) of the validation MSE does not improve by at least 0.1% for 5 consecutive scoring events.

[Figure: Deep Learning with Higgs data. Two panels plot training error and validation error against training time / epochs, before and with early stopping; the best model (overwrite_with_best_model=true) is marked. Callouts: "Early stopping saves tons of time" and "Use Flow to inspect the model".]
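The stopping rule described above can be sketched in a few lines of plain Python. This is one reading of the slide's description (a moving average of length `stopping_rounds`, required relative improvement of `stopping_tolerance`), not H2O's internal implementation; `moving_averages`, `should_stop` and `train_with_early_stopping` are hypothetical helpers, and the error values in the usage line are made up for illustration.

```python
def moving_averages(values, k):
    """Simple moving averages of length k, one per position once k values exist."""
    return [sum(values[i - k + 1:i + 1]) / k for i in range(k - 1, len(values))]


def should_stop(history, stopping_rounds=5, stopping_tolerance=1e-3):
    """Decide whether to stop, given validation errors so far (lower is better,
    assumed non-negative, e.g. MSE).

    Stops when none of the last `stopping_rounds` moving averages improved on the
    best earlier moving average by at least the relative `stopping_tolerance`
    (1e-3 means 0.1%).
    """
    k = stopping_rounds
    avgs = moving_averages(history, k)
    if len(avgs) <= k:
        return False  # too few scoring events to compare against a baseline yet
    best_before = min(avgs[:-k])
    best_recent = min(avgs[-k:])
    # Stop if the recent best is not below the earlier best by the required margin.
    return best_recent > best_before * (1 - stopping_tolerance)


def train_with_early_stopping(validation_errors, **kwargs):
    """Replay per-scoring-event validation errors (a stand-in for real training)
    and return the 1-based scoring event at which training stops."""
    history = []
    for event, err in enumerate(validation_errors, start=1):
        history.append(err)
        if should_stop(history, **kwargs):
            return event
    return len(validation_errors)


# Hypothetical validation MSE values: steady gains, then a plateau.
print(train_with_early_stopping([1.0, 0.8, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5],
                                stopping_rounds=2))  # → 7
```

Comparing moving averages rather than raw scores means a single noisy scoring event can neither end training prematurely nor keep it running on its own.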

Page 9

What do these stickers mean?

I have H2O Installed

I have Python installed

I have R installed

I have the H2O World data sets

Pick up stickers or get install help at the information booth