building an enterprise analytics platform with jupyter enterprise gateway · 2020. 7. 13. ·...
TRANSCRIPT
-
Building an Enterprise Analytics Platform with Jupyter Enterprise Gateway
github.com/jupyter-incubator/enterprise_gateway
Jupyter Notebook Platform Architecture
Enterprise Analytics Platform Characteristics
Shared Compute Resources• Enterprise Cloud, Public
Cloud or Hybrid
Distributed Consumers• Notebooks running local• Notebooks as services
Different Utilization Patterns• High number of idle
resources
Jupyter Notebook Platform Limitations
• Single-user sharing the same distributed filesystem and privileges
• Resources are limited to single node• No security, users can see and control each other’s
process
Jupyter Enterprise Gateway addresses enterprise requirements and use cases
Current Feature Set
Optimized Resource Allocation• Run Spark in YARN Cluster Mode to better utilize cluster resources• Supports YARN, IBM Spectrum Conductor, Kubernetes resource
managers• Pluggable architecture for additional resource managers
Enhanced Security• Enable TLS for all socket communications• Any HTTP communication should be encrypted (SSL)
Multiuser support with user impersonation • Enhance security and sandboxing by enabling user impersonation
when running kernels.• Individual HDFS home folder for each notebook user.• Use the same user ID for notebook and batch jobs.
Distributed Kernels == Optimized Resources
8 8 8 8
-20
30
80
4 Nodes 8 Nodes 12 Nodes 16 Nodes
Cluster Size (32GB Nodes)
MAXIMUM NUMBER OF SIMULTANEOUS KERNELS
16
32
48
64
0
20
40
60
80
4 Nodes 8 Nodes 12 Nodes 16 NodesCluster Size (32GB Nodes)
MAXIMUM NUMBER OF SIMULTANEOUS KERNELS
• Notebook UI runs in the browser
• The Notebook Server serves the ’Notebooks’
• Kernels interpret/execute cell contents
• Are responsible for code execution
• Abstracts different languages
J U P Y T E R E N T E R P R I S E G A T E W A Y
RemoteKernel
Manager
DistributedProcess Proxy
YARN ClusterProcess Proxy
KubernetesProcess Proxy
Conductor Cluster
Process Proxy
J U P Y T E R N O T E B O O K
NB2KG Lab
Jupyter Kernel Gateway
Jupyter Notebook
Spectrum Conductor
SupportedPlatforms
Jupyter Client
Integrated with the Jupyter stack
Remote Kernel Creation & ShutdownRemote Kernelspec{"language": "python","display_name": "Python on Kubernetes","process_proxy": {"class_name": "enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy","config": {"image_name": "elyra/kubernetes-kernel-py:dev"
}},"env": {},"argv": ["python","/usr/local/share/jupyter/kernels/python_kubernetes/scripts/launch_kubernetes.py","{connection_file}","--RemoteProcessProxy.response-address","{response_address}","--RemoteProcessProxy.spark-context-initialization-mode","none"
]}
User Notebook w/ NB2KG
EnterpriseGateway
Resource Manager Kernel
Request Kernel POST /api/kernels Submit via RM RM selects node
Connection Info
Request Kernel InfoKernel Info Reply
Locate via RM
POST Response
Kernel shut down
Shutdown KernelDELETE /api/kernels/
Shutdown message sent to kernel
Terminate via RM
Confirm via RMDELETE Response
Kern
el
Crea
tion
Kern
el
Shut
dow
n
Kernel ReadyWebsocket Est.