scalable automl for time series forecasting using ray

24
Jul 9, 2020 Scalable AutoML for Time Series Forecasting using Ray Shengsheng Huang, Jason Dai Intel Corporation

Upload: others

Post on 05-Dec-2021

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

Scalable AutoML for Time Series Forecasting using Ray

Shengsheng Huang, Jason Dai

Intel Corporation

Page 2: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

• Background

• Scalable AutoML for Time Series on Ray

• Use Case Sharing & Learnings

Outline

Page 3: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

• Background

• analytics-zoo, time series, automl, rayonspark

• Scalable AutoML for Time Series on Ray

• Use Case Sharing & Learnings

Outline

Page 4: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

Seamless Scaling from Laptop to Distributed Big Data

Distributed, High-Performance

Deep Learning Framework for Apache Spark*

https://github.com/intel-analytics/bigdl

Unified Analytics + AI Platform for TensorFlow*, PyTorch*, Keras*, BigDL,

OpenVINO, Ray* and Apache Spark*

https://github.com/intel-analytics/analytics-zoo

*Other names and brands may be claimed as the property of others.

AI on Big Data

Page 5: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

Analytics Zoo

Recommendation

Distributed TensorFlow & PyTorch on Spark

Spark Dataframes & ML Pipelines for DL

RayOnSpark

InferenceModel

Models & Algorithms

End-to-end Pipelines

Time Series Computer Vision NLP

Unified Data Analytics and AI Platform

https://github.com/intel-analytics/analytics-zoo

ML Workflow AutoML Automatic Cluster Serving

Compute Environment

K8s Cluster Cloud

Python Libraries (Numpy/Pandas/sklearn/…)

DL Frameworks (TF/PyTorch/OpenVINO/…)

Distributed Analytics (Spark/Flink/Ray/…)

Laptop Hadoop Cluster

Powered by oneAPI

Page 6: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

Time Series Forecasting

• Problem Definition

• Given all history observations 𝑦1, … , 𝑦𝐭 , Predict

values of next 𝐡 steps, 𝑦𝐭+𝟏, … , 𝑦𝐭+𝒉

• Usually only lookback 𝐤 steps, 𝑦𝐭−𝒌+𝟏, … , 𝑦𝐭

• Applications

• demand prediction, network quality analysis,

predictive maintenance, AIOps

• Forecasting Methods

• Autoregression, Exponential Smoothing, ARIMA, …

• Machine Learning and Deep Learning methods

𝑦1 𝑦2 𝑦𝑡 𝑦𝑡+1 𝑦𝑡+ℎ𝑦𝑡−𝑘+1

lookback k steps predict h steps forward

Page 7: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

AutoML

Taking the Human out of Learning Applications : A Survey on Automated Machine Learning. Yao, Q., Wang, et. al

Page 8: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

RayOnSpark: Motivations

• Ray : A distributed framework for emerging AI applications

• Ray Tune, Ray Rllib, RaySGD, etc.

• https://github.com/ray-project/ray/

• RayOnSpark

• Directly run Ray programs on big data cluster

• Integrate Ray into Spark pipeline

Page 9: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

RayOnSpark: Architecture

https://ray.readthedocs.io/en/latest/https://analytics-zoo.github.io/master/#ProgrammingGuide/rayonspark/

• Ray provision with Spark

• Data Sharing between Ray & Spark

Page 10: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

RayOnSpark: How to Use

sc = init_spark_on_yarn(...)

ray_ctx = RayContext(sc=sc, ...)

ray_ctx.init()

#Ray code

@ray.remote

class TestRay():

def hostname(self):

import socket

return socket.gethostname()

actors = [TestRay.remote() for i in range(0, 100)]

print([ray.get(actor.hostname.remote()) \

for actor in actors])

ray_ctx.stop()

“RayOnSpark: Running Emerging AI Applications on Big Data Clusters with Ray and Analytics Zoo”https://medium.com/riselab/rayonspark-running-emerging-ai-applications-on-big-data-clusters-with-ray-and-analytics-zoo-923e0136ed6a

• Init SparkContext

• Init RayContext

• … run ray programs

• Stop RayContext

• Stop SparkContext

Page 11: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

• Background

• Scalable AutoML for Time Series on Ray

• architecture, execution flow, apis, etc.

• Use Case Sharing & Learnings

Outline

Page 12: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

Laptop Spark Cluster YARN Cluster K8s Cluster

Distributed TensorFlow & PyTorch on Spark

Spark Dataframes & ML Pipelines for DL

RayOnSpark

Model Inference

Reco

mm

end

ation

Co

mp

uter

Visio

n

NLP

Cluster Serving

Au

toM

L Fram

ewo

rk

Integrated Analytics and AI Pipelines

Time Series

Algo

rithm

s

Hyper-Parameter Tuning

Feature Generation Model Selection

An

alytics Zoo

Forecasting

……

AnomalyDetection

ML

Workflow

Built-in

Algorithms

and Models

Time Series Solution

User Models

Time Series Solution In Analytics Zoo

Rich models and algorithms

(neural-networks, hybrid, state-of-art)

AutoML

(automatic feature generation, model

selection, hyper-parameter tuning,

etc.)

Seamless scaling

(with integrated analytics and AI

pipelines)

Page 13: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

• AutoML Framework

• FeatureTransformer

• Model

• SearchEngine

• Pipeline

• Time Series upon AutoML

• TimeSequencePredictor

• TimeSequencePipeline

Software Stack

*Other names and brands may be claimed as the property of others.

https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and-

analytics-zoo-b79a6fd08139

Page 14: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

Training at Runtime

FeatureTransformer

Model

SearchEngine

Search presets

Workflow implemented in TimeSequencePredictor

trial

trial

trial

trial

…best model/parameters

trail jobs

Pipeline

with tunable parameters

with tunable parameters

configured with best parameters/model

Each trial runs a different combination of hyper parameters

Ray Tune

rolling, scaling, feature generation, etc.

Spark + Ray

Page 15: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

• Training a Pipeline

• fit (w/ automl)

• recipe

• distributed

• Using a Pipeline

• save/load

• evaluate/predict

• fit (incremental)

A glimpse of the APIs

Page 16: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

Project Zouwu: Time Series for Telco

Project Zouwu: application framework for building E2E time series analysis

• Use case - reference time series use cases

• Models - built-in models for time series analysis

AutoTS - AutoML support for building E2E time series analysis pipelines

Project Zouwu

Built-in Models

ML WorkflowAutoML Workflow

Integrated Analytics & AI Pipelines

use-case

models autots

https://github.com/intel-analytics/analytics-zoo/tree/master/pyzoo/zoo/zouwu

Page 17: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

• Background

• Scalable AutoML for Time Series on Ray

• Use Case Sharing & Learnings

Outline

Page 18: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 202018

Network Traffic KPI Forecasting in Project Zouwu

Use Case Description

Use aggregated traffic KPI’s (i.e. total bytes, average rate in Mbps/Gbps) in the past week to forecast the KPI in the next two hours.

2 ways to solve this problem using Zouwu

• Use built-in “Forecaster” models for training, and forecasting (example notebook link)

• Use “AutoTS” (with built-in AutoML support) to train an E2E Time Series Analysis Pipeline, and forecast (example notebook link)

Example result of network traffic average rate forecasting on the test period

Page 19: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

Spark-SQL

Data Loading

DataLoader

Data Source APIs

File, HTTP, Kafka

forked.

DRAM Store

customized.

FlashStore

tiering

Preprocess RDD of Tensor Model Code of TF

DL Training & Inferencing

Data Model

SIMD Acceleration

Time Series Based Network Quality Prediction in SK Telecom

https://databricks.com/session_eu19/apache-spark-ai-use-case-in-telco-network-quality-analysis-and-prediction-with-geospatial-visualization

Page 20: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

Forecast-based Analysis for AIOps at Neusoft

https://platform.neusoft.com/2020/01/17/xw-intel.html

https://platform.neusoft.com/2017/03/04/qt-baomaqiche.html

Page 21: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

Takeaways from Early Users

• Highlights

• Additional features allowed

• Less efforts in tuning

• Satisfactory accuracy

• Data quality matters

• Scale, scale, scale

• hundreds of thousands of cells X N KPIs

• millions of servers/containers X N metrics

Page 22: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

Future Work

• Handle massive time series

• Search algo, meta-learning, ensemble, …

• More models, features, …

• Automatic data preprocessing (e.g. missing data & outliers)

Page 23: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

Page 24: Scalable AutoML for Time Series Forecasting using Ray

Jul 9, 2020

• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit intel.com/performance.

• Intel does not control or audit the design or implementation of third-party benchmark data or websites referenced in this document. Intel encourages all of its customers to visit the referenced websites or others where similar performance benchmark data arereported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.

• Optimization notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

• Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com/benchmarks.

• Intel, the Intel logo, Intel Inside, the Intel Inside logo, Intel Atom, Intel Core, Iris, Movidius, Myriad, Intel Nervana, OpenVINO, Intel Optane, Stratix, and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

• *Other names and brands may be claimed as the property of others.

• © Intel Corporation

Legal Notices and Disclaimers