![Page 1: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/1.jpg)
AI WEBINAR
Date/Time: Tuesday, June 9 | 9 am PST
Kubernetes & AI with Run:AI, Red Hat & Excelero
Presenter: Omri Geller
CEO & Co-Founder
Your Host: Tom Leyden
VP Marketing
What’s next in technology and innovation?
Presenter: William Benton
Engineering Manager
Presenter: Gil Vitzinger
Software Developer
Kubernetes for AI Workloads
Omri Geller, CEO and co-founder, Run:AI
A Bit of History
Containers scale easily, are lightweight and efficient, can run any workload, are flexible, and can be isolated. But they need orchestration.
Bare Metal
Needed flexibility and better utilization
Virtual Machines
Reproducibility and portability
Containers
Enter Kubernetes
Track, Schedule and Operationalize
Execute Across Different Hardware
Create Efficient Cluster Utilization
Today, 60% of Those Who Deploy Containers Use K8s for Orchestration*
*Source: CNCF
Now let’s talk about AI
Computing Power Fuels Development of AI
Manual Engineering → Classical Machine Learning → Deep Learning
Artificial Intelligence is a Completely Different Ballgame
Experimentation R&D
New accelerators
Distributed computing
Data Science Workflows and Hardware Accelerators are Highly Coupled
Data scientists and hardware accelerators: constant hassles
• Workflow limitations
• Under-utilized GPUs
This Leads to Frustration on Both Sides
Data scientists are frustrated: speed and productivity are low
IT leaders are frustrated: GPU utilization is low
AI Workloads are Also Built on Containers
The container ecosystem for data science is growing
NGC: NVIDIA pre-trained models for AI experimentation in Docker containers
How Can We Bridge The Divide?
Kubernetes, the “De-facto” Standard for Container Orchestration
It lacks the following capabilities:
• Multiple queues
• Automatic queueing/de-queueing
• Advanced priorities & policies
• Advanced scheduling algorithms
• Affinity-aware scheduling
• Efficient management of distributed workloads
How is Experimentation Different?
Build vs. Training
Distinguishing Between Build and Training Workflows
Build
• Development & debugging
• Interactive sessions
• Short cycles
• Performance is less important
• Low GPU utilization
Training
• Training & hyperparameter optimization (HPO)
• Remote execution
• Long workloads
• Throughput is highly important
• High GPU utilization
Solution: Guaranteed Quotas
Fixed quotas
• Fits build workloads
• GPUs are always available
Guaranteed quotas
• Fits training workflows
• Users can go over quota
• More concurrent experiments
• More multi-GPU training
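The contrast between fixed and guaranteed quotas can be sketched as an allocation function. This is a minimal illustration under assumed semantics, not Run:AI's scheduler. Every project first receives GPUs up to its guaranteed quota, or up to its demand if that is smaller. Only under guaranteed quotas are idle GPUs then lent, round-robin, to projects that want more. The project names, quota sizes, and lending order are hypothetical, and the sketch assumes the quotas sum to no more than the cluster size.

```python
from typing import Dict


def allocate(total_gpus: int,
             quotas: Dict[str, int],
             demand: Dict[str, int],
             allow_over_quota: bool) -> Dict[str, int]:
    """Return the GPUs granted to each project (hypothetical policy).

    Fixed quotas (allow_over_quota=False): never exceed the quota.
    Guaranteed quotas (allow_over_quota=True): honour quotas first,
    then lend idle GPUs to projects whose demand exceeds their quota.
    """
    # Phase 1: each project gets min(demand, quota), so its share is guaranteed.
    granted = {p: min(demand.get(p, 0), q) for p, q in quotas.items()}
    free = total_gpus - sum(granted.values())
    if not allow_over_quota:
        return granted
    # Phase 2: hand out idle GPUs one at a time to projects still waiting.
    while free > 0:
        hungry = [p for p in quotas if granted[p] < demand.get(p, 0)]
        if not hungry:
            break
        for p in hungry:
            if free == 0:
                break
            granted[p] += 1
            free -= 1
    return granted


quotas = {"team-a": 4, "team-b": 4}
demand = {"team-a": 1, "team-b": 7}
print(allocate(8, quotas, demand, allow_over_quota=False))
print(allocate(8, quotas, demand, allow_over_quota=True))
```

With fixed quotas, three of the eight GPUs sit idle while team-b waits. With guaranteed quotas, team-b borrows them. A real scheduler would also have to reclaim lent GPUs when team-a's demand returns, for example by preempting over-quota jobs. This sketch only recomputes the allocation.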
Queueing Management Mechanism
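A queueing mechanism with multiple queues, priorities, and automatic de-queueing (the capabilities listed earlier as missing from Kubernetes) might look like the minimal sketch below. It uses assumed semantics and is not Run:AI's implementation. Each queue is FIFO. Whenever `schedule()` runs, the scheduler repeatedly starts the head job of the highest-priority queue whose head job fits in the free GPUs. The `Job` fields, queue names, and priority values are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class Job:
    name: str
    gpus: int


class MultiQueueScheduler:
    def __init__(self, total_gpus: int, priorities: Dict[str, int]):
        self.free_gpus = total_gpus
        self.priorities = priorities  # higher number = higher priority
        self.queues: Dict[str, Deque[Job]] = {q: deque() for q in priorities}
        self.running: List[Job] = []

    def submit(self, queue: str, job: Job) -> None:
        self.queues[queue].append(job)

    def release(self, job: Job) -> None:
        """Mark a running job as finished and return its GPUs to the pool."""
        self.running.remove(job)
        self.free_gpus += job.gpus

    def schedule(self) -> List[Job]:
        """De-queue every job that fits, always re-checking the highest-priority queue first."""
        started: List[Job] = []
        progress = True
        while progress:
            progress = False
            ordered = sorted(self.queues, key=lambda q: self.priorities[q], reverse=True)
            for q in ordered:
                queue = self.queues[q]
                if queue and queue[0].gpus <= self.free_gpus:
                    job = queue.popleft()
                    self.free_gpus -= job.gpus
                    self.running.append(job)
                    started.append(job)
                    progress = True
                    break  # start over from the highest-priority queue
        return started


s = MultiQueueScheduler(4, {"build": 2, "training": 1})
train, notebook = Job("train-1", 4), Job("notebook-1", 1)
s.submit("training", train)
s.submit("build", notebook)
print([j.name for j in s.schedule()])  # the notebook fits; the 4-GPU job waits
s.release(notebook)
print([j.name for j in s.schedule()])  # GPUs freed, so the training job is de-queued
```

One design choice matters here. A head job that does not fit lets lower-priority queues proceed, which keeps GPUs busy but can delay large multi-GPU jobs. A production scheduler would need a rule, such as reservation or gang scheduling, to prevent that starvation.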
Run:AI - Stitching it All Together
Run:AI - Applying HPC Concepts to Kubernetes
With the advantages of K8s, plus some concepts from the world of HPC & distributed computing, we can bridge the gap
Data Science teams gain productivity and speed
IT teams gain visibility and maximal GPU utilization
Run:AI - Kubernetes-Based Abstraction Layer
INTEGRABLE: Easily integrates with IT and Data Science platforms
MULTI-CLOUD: Runs on any public, private, or hybrid cloud environment
IT GOVERNANCE: Policy-based orchestration and queueing management
Run:AI
Utilize Kubernetes across IT to improve resource utilization
Speed up experimentation process and time to market
Easily scale infrastructure to meet needs of the business
From 28% to 73% utilization, 2X speed, and $1M savings
Challenge
28% AVERAGE GPU UTILIZATION: inefficient and underutilized resources
Solution
After implementing Run:AI’s platform:
73% AVERAGE GPU UTILIZATION
• Enabled 2x more experiments to run
• Saved $1M in additional GPU expenditures for 2020
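The utilization figures convert to capacity with simple arithmetic. On a fixed set of GPUs, busy GPU-hours scale with average utilization, so moving from 28% to 73% gives roughly 2.6 times as many GPU-hours of useful work on the same hardware. The 100-GPU cluster and 24-hour window below are hypothetical and only make the numbers concrete. They are not from the case study.

```python
def busy_gpu_hours(gpus: int, hours: int, utilization_pct: int) -> float:
    """GPU-hours actually spent computing at a given average utilization."""
    return gpus * hours * utilization_pct / 100


# Hypothetical cluster: 100 GPUs observed over one day.
before = busy_gpu_hours(100, 24, 28)
after = busy_gpu_hours(100, 24, 73)
print(before, after, round(after / before, 2))
```

How much of that extra capacity becomes additional experiments depends on the workload mix, so this ratio is a ceiling rather than a prediction.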
Run:AI at-a-Glance
Venture Funded
• Founded in 2018
• Backed by top VCs
• Offices in Tel Aviv, New York, and Boston
• Fortune 500 customers
• Top cloud and virtualization engineers
Thank you
NVMesh in Kubernetes
What is the NVMesh CSI Driver
● What is the NVMesh CSI Driver?
○ CSI: Container Storage Interface
○ NVMesh as a storage backend in Kubernetes
● Main Features
○ Static Provisioning
○ Dynamic Provisioning
○ Block and File System volumes
○ Access Modes (ReadWriteOnce, ReadWriteMany, ReadOnlyMany)
○ Extend volumes
○ Using NVMesh VPGs
CSI Driver Components
● NVMesh Management (REST API)
● NVMesh CSI Controller, alongside the Kubernetes Controller
● NVMesh CSI Node Driver and NVMesh Client on each node (three nodes shown)
● NVMesh Targets
Dynamic Provisioning & Attach Flow
● The user creates a Persistent Volume Claim (PVC)
● The Kubernetes Controller passes the request to the NVMesh CSI Controller
● The NVMesh CSI Controller asks NVMesh Management to create the volume on the NVMesh Targets
Dynamic Provisioning & Attach Flow
● The user creates a POD that uses the PVC
● Attach / Detach: through the NVMesh CSI Controller, the volume is attached on the node, where the NVMesh Client exposes it as /dev/nvmesh/v1
● The NVMesh CSI Node Driver performs the OS mount, the K8s internal mount, and the POD mount, so the user app PODs read and write data on the NVMesh Targets
Exposing NVMesh volume in a Pod
● NVMesh attach: the NVMesh Client exposes the volume on the node as /dev/nvmesh/v1
● CSI Stage Volume, once for each volume on the node: a FileSystem volume is formatted (mkfs) and mounted at kubelet/volume/mount; a Block volume is not formatted
● CSI Publish Volume, for each volume for each POD: a bind mount into kubelet/pod1/volumes/v1 for User App POD 1 and kubelet/pod2/volumes/v1 for User App POD 2
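The split between staging once per node and publishing once per pod amounts to reference counting on each node. The toy model below illustrates that under simplifying assumptions. The class, its methods, and the staging path layout are hypothetical. A real CSI node plugin implements the `NodeStageVolume`/`NodePublishVolume` (and unstage/unpublish) gRPC calls and runs real `mkfs` and `mount` commands, which this sketch only records as strings. Attach is also folded into staging here for brevity.

```python
from typing import Dict, List, Set


class NodeVolumeManager:
    """Toy model of node-side volume handling: records operations, performs no I/O."""

    def __init__(self) -> None:
        self.users: Dict[str, Set[str]] = {}  # staged volume -> pods that published it
        self.filesystem: Dict[str, bool] = {}  # staged volume -> FileSystem (True) or Block
        self.log: List[str] = []  # operations a real driver would perform

    def publish(self, volume: str, pod: str, filesystem: bool = True) -> None:
        device = f"/dev/nvmesh/{volume}"
        staging = f"kubelet/volume/{volume}/mount"  # simplified staging path
        if volume not in self.users:
            # Stage: once for each volume on the node.
            self.log.append(f"attach {device}")
            if filesystem:
                # A real driver would only format a blank device.
                self.log.append(f"mkfs {device}")
                self.log.append(f"mount {device} {staging}")
            self.users[volume] = set()
            self.filesystem[volume] = filesystem
        # Publish: for each volume, for each pod.
        source = staging if self.filesystem[volume] else device
        self.log.append(f"bind mount {source} kubelet/{pod}/volumes/{volume}")
        self.users[volume].add(pod)

    def unpublish(self, volume: str, pod: str) -> None:
        self.users[volume].discard(pod)
        self.log.append(f"unmount kubelet/{pod}/volumes/{volume}")
        if not self.users[volume]:
            # The last pod on this node is gone: unstage and detach.
            if self.filesystem[volume]:
                self.log.append(f"unmount kubelet/volume/{volume}/mount")
            self.log.append(f"detach /dev/nvmesh/{volume}")
            del self.users[volume]
            del self.filesystem[volume]


node = NodeVolumeManager()
node.publish("v1", "pod1")
node.publish("v1", "pod2")
print("\n".join(node.log))
```

Two pods sharing `v1` trigger a single format and mount but two bind mounts. Only when the second pod goes away is the volume unstaged.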
Usage Examples
A PVC requesting a 15Gi raw block volume from the nvmesh-raid10 storage class:
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: block-pvc
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block
  resources:
    requests:
      storage: 15Gi
  storageClassName: nvmesh-raid10
```
A StorageClass that provisions NVMesh volumes from a custom VPG:
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nvmesh-custom-vpg
provisioner: nvmesh-csi.excelero.com
parameters:
  vpg: your_custom_vpg
```
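To consume the raw block PVC `block-pvc`, a pod lists it under `volumeDevices` with a `devicePath` rather than under `volumeMounts`. This is how Kubernetes exposes claims with `volumeMode: Block`. The sketch below builds such a Pod manifest in Python and prints it as JSON, which `kubectl apply -f` also accepts. The pod name, container image, command, and device path are hypothetical placeholders.

```python
import json


def block_pod_manifest(pod_name: str, pvc_name: str, device_path: str) -> dict:
    """Pod that sees a volumeMode: Block PVC as a raw device at device_path."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [{
                "name": "app",
                "image": "busybox",  # placeholder image
                "command": ["sleep", "3600"],
                # Block volumes use volumeDevices/devicePath, not volumeMounts.
                "volumeDevices": [{"name": "data", "devicePath": device_path}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": pvc_name},
            }],
        },
    }


if __name__ == "__main__":
    manifest = block_pod_manifest("block-consumer", "block-pvc", "/dev/xvda")
    print(json.dumps(manifest, indent=2))
```

A FileSystem-mode PVC would instead appear under `volumeMounts` with a `mountPath`. With `ReadWriteMany` on a block volume, several pods see the same raw device, so coordinating their writes is left to the application.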
Summary
NVMesh Benefits for Kubernetes:
● Persistent storage that scales for stateful applications
● Predictable application performance – ensure that storage is not a bottleneck
● Scale your performance and capacity linearly
● Containers in a pod can access persistent storage presented to that pod, but with the freedom to restart the pod on an alternate physical node
● Choice of Kubernetes PVC access mode to match the storage to the application and file system requirements
William Benton, Engineering Manager and Senior Principal Engineer, Red Hat, Inc.
Machine learning discovery, workflows, and systems on Kubernetes
codifying problem and metrics
data collection and cleaning
feature engineering
model training and tuning
model validation
model deployment
monitoring, validation
Diagram: configuration, data collection, feature extraction, process management, analysis tools, monitoring, serving infrastructure, machine resource management, data verification
(Adapted from Sculley et al., “Hidden Technical Debt in Machine Learning Systems.” NIPS 2015)
Diagram, data engineers: events, databases, and file/object storage; federate; transform; archive
Diagram, data scientists: events, databases, and file/object storage; federate; transform; train models; developer UI
Diagram, application developers: events, databases, and file/object storage; federate; transform; train; archive; developer UI; models; management, web and mobile, and reporting applications
Diagram, data scientists, application developers, and data engineers together: events, databases, and file/object storage; federate; transform; train; archive; developer UI; models; management, web and mobile, and reporting applications
How Kubernetes can help
Immutable images
base image
configuration and installation recipes
user application code
979229b9
33721112 e8cae4f6 2bb6ab16 a8296f7e
a6afd91e 6b8cad3e
model in production on 16 July 2019
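Content addressing shows why a built image layer never changes. The sketch below is modeled loosely on the OCI image specification's ChainID. Each ID is a SHA-256 digest over the parent's ID plus the new layer's digest, so an ID pins down the entire stack beneath it. Real images hash layer tarballs; here the layer contents are short hypothetical strings, so the 8-character IDs printed will not match the ones on the slide.

```python
import hashlib
from typing import List


def digest(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def chain_ids(layers: List[str]) -> List[str]:
    """ChainID(0) = DiffID(0); ChainID(n) = sha256(ChainID(n-1) + " " + DiffID(n))."""
    ids: List[str] = []
    for content in layers:
        diff_id = "sha256:" + digest(content)
        if not ids:
            ids.append(diff_id)  # the bottom layer's chain ID is its own digest
        else:
            ids.append("sha256:" + digest(ids[-1] + " " + diff_id))
    return ids


# Hypothetical layer contents, bottom to top.
v1 = ["base image", "configuration and installation recipes", "user application code v1"]
v2 = v1[:2] + ["user application code v2"]

for a, b in zip(chain_ids(v1), chain_ids(v2)):
    print(a[7:15], b[7:15], "same" if a == b else "changed")
```

Changing only the application code yields a new top ID while the base and recipe layers keep theirs. That is what lets a deployed model image be identified and reproduced exactly later.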
Stateless microservices
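A model server is a stateless microservice when each response depends only on the request and on immutable model parameters. Under that condition, any replica can answer any request, and replicas can be added or replaced freely. The minimal sketch below assumes a hypothetical linear model whose weights are fixed at startup, as if baked into the image. It checks that round-robin load balancing across three replicas gives the same answers as a single replica.

```python
from itertools import cycle
from typing import Callable, Dict, List

Request = Dict[str, float]


def make_replica(weights: Dict[str, float], bias: float) -> Callable[[Request], float]:
    """Build a request handler that keeps no per-request state."""
    frozen = dict(weights)  # copy: parameters never change after startup

    def predict(features: Request) -> float:
        return bias + sum(w * features.get(name, 0.0) for name, w in frozen.items())

    return predict


# Hypothetical model parameters, as if loaded from an immutable image.
WEIGHTS = {"x": 2.0, "y": -1.0}
replicas = [make_replica(WEIGHTS, 0.5) for _ in range(3)]

requests: List[Request] = [{"x": 1.0, "y": 2.0}, {"x": 3.0}, {"y": 4.0}]
# Round-robin load balancing: whichever replica serves a request, the answer is the same.
balanced = [replica(req) for replica, req in zip(cycle(replicas), requests)]
single = [replicas[0](req) for req in requests]
print(balanced == single)
```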
Integration and deployment
OK!
base image
configuration and installation recipes
application code
Data drift
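Data drift means the distribution of production inputs moves away from the distribution the model was trained on. One common way to quantify drift for a single numeric feature is the two-sample Kolmogorov–Smirnov statistic, the largest gap between the two empirical CDFs. It is used here for illustration and is not necessarily what the talk has in mind. The sample values below are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest absolute gap between two empirical CDFs (0 = identical, 1 = disjoint)."""
    ref, cur = sorted(reference), sorted(current)

    def ecdf(sample: Sequence[float], x: float) -> float:
        return bisect_right(sample, x) / len(sample)

    # The supremum of the gap is attained at one of the observed points.
    return max(abs(ecdf(ref, x) - ecdf(cur, x)) for x in ref + cur)


training = [0.1, 0.2, 0.3, 0.4, 0.5]
production = [0.35, 0.45, 0.55, 0.65, 0.75]
print(ks_statistic(training, training), ks_statistic(training, production))
```

In practice, a monitoring job might compute such a statistic per feature on a schedule and alert when it crosses a threshold chosen for the application.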
On-demand discovery with the Open Data Hub
![Page 73: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/73.jpg)
![Page 74: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/74.jpg)
![Page 75: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/75.jpg)
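$$
\begin{bmatrix}
0 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
*
\begin{bmatrix}
0.13 & 0.13 \\
0.06 & 0.07 \\
0.07 & 0.06 \\
0.02 & 0.08 \\
0.17 & 0.11 \\
0.11 & 0.09 \\
0.04 & 0.18 \\
0.13 & 0.04 \\
0.13 & 0.21 \\
0.14 & 0.03
\end{bmatrix}
$$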
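The slide pairs a 10×10 binary matrix with a 10×2 matrix of weights and a `*`. The slide does not say what the `*` means. Reading it as matrix multiplication is an assumption here. Under that reading, each entry of the 10×2 product sums the weights selected by the ones in that row. Each column of the weight matrix also sums to 1, and the sketch checks that too.

```python
import math

# The 10x10 binary matrix from the slide.
A = [
    [0, 0, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 1],
]

# The 10x2 weight matrix from the slide.
W = [
    [0.13, 0.13], [0.06, 0.07], [0.07, 0.06], [0.02, 0.08], [0.17, 0.11],
    [0.11, 0.09], [0.04, 0.18], [0.13, 0.04], [0.13, 0.21], [0.14, 0.03],
]


def matmul(a, b):
    """Plain-Python product of an n x k matrix and a k x m matrix."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(cols)]
            for row in a]


# Each column of W sums to 1, up to floating-point rounding.
assert all(math.isclose(sum(col), 1.0) for col in zip(*W))

P = matmul(A, W)  # 10 x 2 result (assuming "*" is matrix multiplication)
```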
![Page 76: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/76.jpg)
more storage
sensitive data
more CPUs
better GPUs
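On Kubernetes, a request for more storage, more CPUs, or GPUs is expressed as resource requests and volume claims on a pod. Sensitive data such as credentials usually comes from a Secret. Below is a minimal sketch that builds such a Pod manifest as a Python dict. The image, claim, and Secret names are hypothetical, and the `nvidia.com/gpu` resource name assumes the NVIDIA device plugin is installed on the cluster.

```python
def training_pod(name, image, cpus, gpus, claim_name, secret_name):
    """Build a Pod manifest (as a dict) that requests CPUs, GPUs, and storage.

    All names passed in are placeholders for illustration.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "trainer",
                "image": image,
                "resources": {
                    "requests": {"cpu": str(cpus)},
                    # GPUs are an extended resource: setting the limit is
                    # enough, and Kubernetes uses it as the request too.
                    "limits": {"cpu": str(cpus), "nvidia.com/gpu": gpus},
                },
                # Expose the keys of a Secret as environment variables
                # instead of baking credentials into the image.
                "envFrom": [{"secretRef": {"name": secret_name}}],
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }],
            # More storage: mount an existing PersistentVolumeClaim.
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }
```

Serializing the result with `json.dumps` and piping it to `kubectl apply -f -` would submit it, because kubectl accepts JSON manifests as well as YAML.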
![Page 78: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/78.jpg)
PostgreSQL MariaDB Apache Spark SQL
Apache Kafka (via Strimzi)
Red Hat Ceph Storage
TensorFlow Serving PyTorch Serving Seldon
Spark Katib TFJob PyTorch
Argo Kubeflow Pipelines
OpenShift
JupyterHub Apache Superset
Grafana Prometheus
![Page 79: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/79.jpg)
codifying problem and metrics
feature engineering
model training and tuning
model validation
data collection and cleaning
model deployment
monitoring, validation
![Page 80: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/80.jpg)
3
feature engineering
model training and tuning
model validation
2
![Page 81: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/81.jpg)
codifying problem and metrics
feature engineering
model training and tuning
model validation
data collection and cleaning
model deployment
monitoring, validation
OpenShift Pipelines
![Page 82: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/82.jpg)
codifying problem and metrics
model validation
data collection and cleaning
model deployment
monitoring, validation
2 3
OpenShift Pipelines
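OpenShift Pipelines, which is based on Tekton, runs the tasks of a pipeline in an order derived from their declared dependencies. The sketch below illustrates that idea in plain Python with `graphlib.TopologicalSorter` (Python 3.9+). It does not show how Tekton itself is implemented. The stage names come from the slides, but the dependency edges and `run_stage` are hypothetical. The ongoing stages ("codifying problem and metrics", "monitoring, validation") are left out.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each stage maps to the set of stages that must
# finish before it can start.
STAGES = {
    "data collection and cleaning": set(),
    "feature engineering": {"data collection and cleaning"},
    "model training and tuning": {"feature engineering"},
    "model validation": {"model training and tuning"},
    "model deployment": {"model validation"},
}


def run_stage(name, log):
    """Hypothetical stage runner; a real pipeline would launch a task pod."""
    log.append(name)


def run_pipeline(stages):
    """Run every stage once, always after all of its dependencies."""
    log = []
    for stage in TopologicalSorter(stages).static_order():
        run_stage(stage, log)
    return log
```

`TopologicalSorter` raises `graphlib.CycleError` if the declared dependencies form a cycle, which mirrors the fact that pipeline tasks must form a directed acyclic graph.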
![Page 83: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/83.jpg)
REST endpoint
OpenShift Serverless
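A deployed model is typically exposed as a REST endpoint that a serverless platform can scale. OpenShift Serverless is based on Knative, which tells the container which port to listen on through the `PORT` environment variable. Below is a minimal sketch of such an endpoint as a standard-library WSGI app. The `predict` function and its weights are hypothetical stand-ins for a trained model, and 8080 is an assumed fallback port.

```python
import json
import os
from wsgiref.simple_server import make_server


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    weights = [0.5, -0.25, 1.0]  # hypothetical weights
    if len(features) != len(weights):
        raise ValueError(f"expected {len(weights)} features")
    return sum(w * x for w, x in zip(weights, features))


def app(environ, start_response):
    """WSGI app: POST {"features": [...]} as JSON, get {"prediction": ...}."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"use POST\n"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = json.loads(environ["wsgi.input"].read(length))
        result, status = {"prediction": predict(body["features"])}, "200 OK"
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing key, or wrong types become a 400.
        result, status = {"error": str(exc)}, "400 Bad Request"
    payload = json.dumps(result).encode()
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def serve():
    """Listen on $PORT as Knative expects (8080 assumed if it is unset)."""
    port = int(os.environ.get("PORT", "8080"))
    with make_server("", port, app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` from the container's entry point starts the server. Keeping `app` as a plain WSGI callable also means it can be tested without opening a socket.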
![Page 84: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/84.jpg)
Further resources
Open Data Hub web site: https://opendatahub.io
Contribute: https://github.com/opendatahub-io
Get involved: https://gitlab.com/opendatahub/opendatahub-community
ML workflows on OpenShift and Open Data Hub: https://bit.ly/ml-workflows-ocp
![Page 85: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/85.jpg)
![Page 86: Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed](https://reader034.vdocuments.us/reader034/viewer/2022050211/5f5d5d1eac7afb6a6827e41b/html5/thumbnails/86.jpg)
Thank you!