© 2017 Continuum Analytics - Confidential & Proprietary
Revolutionizing Data Science Package Management with conda
Travis Oliphant, Co-founder, President & Chief Data Scientist
Kale Franz, conda Tech Lead
July 25, 2017
Travis Oliphant
• President, Chief Data Scientist & Co-Founder of Continuum Analytics
• Ph.D. from the Mayo Clinic; BS & MS degrees in mathematics and electrical engineering from Brigham Young University
• Primary developer of the NumPy package
• Founding contributor to the SciPy package
• Author of Guide to NumPy
Kale Franz
• Lead developer of the conda project
• Ph.D. in electrical engineering from Princeton University
• Domain expert in the field of quantum cascade lasers
• Made infrared semiconductor lasers for NASA's Jet Propulsion Laboratory
• Previously backend developer and DevOps tech lead at 23andMe
What is Anaconda?
The Most Popular Data Science Ecosystem with Over 4.5M Users
conda: The core of Anaconda Distribution
• Written in Python, language agnostic
• Multi-platform (Windows, macOS, Linux)
• Powering the data science ecosystem
• No admin privileges required
The Story of conda—Why conda?
Over 5 years & 4.5 Million Users
Reproducibility
Environment isolation
Upgrade management
Python, R, Scala
Project environment management
Multi-platform (Windows, macOS, Linux)
Enterprise Support: Migration & Dependency Management
Python 2 / 3 Data Science Package Manager
Pre-compiled binary
C, C++, Fortran, Java…
Dependency Management
Why conda?
• Trusted by enterprises to manage their data science environments
• The package manager loved by data scientists
• Integrates alongside existing workflows & packaging solutions, such as pip & Docker
• Flexible packaging solution for your custom in-house needs
• Simplifies data science software packaging and deployment, and helps data science projects reach their users
Trusted by Enterprises & Industry Leaders
“Packaging is hard. Just use conda.”
Dharhas Pothina, US Army Corps of Engineers ERDC, AnacondaCON 2017
Cross-Platform Package Manager
Linux: yum (rpm), apt-get (dpkg)
macOS: macports, homebrew, fink
Windows: chocolatey, npackd
Cross-Platform: conda (sophisticated light-weight environments included!)
http://conda.pydata.org
Conda with pip (vs. virtualenv): Different Package Managers for Different Needs
conda                                  | pip (+virtualenv)
multi-language & general environments  | Python packages & environments
handles environments natively          | virtualenv
installs binaries                      | compiles from source; some binary support (wheels)
conda install pip;                     | N/A
conda envs support pip dependencies    |
conda + Docker: Better Together
[Diagram. Data Science Development: a desktop / laptop with conda env 1, conda env 2, and conda env 3, running Analysis 1, Analysis 2, and Analysis 3. Data Science Deployment: a server with a Docker container holding the same three conda envs and analyses.]
Anaconda Enterprise: Leveraging conda + Docker for Data Science Project Deployment
[Diagram. Data Science Development: a laptop with Project 1, Project 2, and Project 3. Data Science Development & Deployment: Anaconda Enterprise with Project 1, Project 2, and Project 3, alongside Container 1, Container 2, Container 3, and Container 4.]
The conda Community
Creating & Curating conda Packages for Data Scientists
PACKAGES
• repo.continuum.io: curated Anaconda packages
• Conda Forge (anaconda.org/conda-forge): curated by the community
• Anaconda Cloud (anaconda.org): uploaded by users & organizations
• Anaconda Enterprise (on-premise repository): curated by your organization
conda install <package>
The Future of conda
conda Everywhere! From Arduino, Mobile Devices & IoT to the Mainframe
Explore conda
What is conda not?
• configuration management
• process management
What is conda?
• packages
• environments
Enabling Environments
• portability
  system-level package management that’s not tied to hard-coded system paths
• multiple, composable environments
  multiple instances of otherwise-colliding software, functionally isolated on the same system
• preferential use of hard links
  being respectful with disk usage
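The hard-link point above can be illustrated with a minimal Python sketch. This is not conda's own code: the helper name `link_package_file`, the `pkgs`/`envs` directory layout, and the file name `libfoo.so` are assumptions made for illustration. The sketch only shows why hard-linking one cached file into several environments avoids storing its contents more than once.

```python
import os
import tempfile


def link_package_file(cache_path, env_path):
    """Hypothetical helper: expose a cached file inside an environment via a hard link."""
    os.makedirs(os.path.dirname(env_path), exist_ok=True)
    # os.link adds a second directory entry for the same on-disk data;
    # no bytes are copied.
    os.link(cache_path, env_path)


def demo():
    with tempfile.TemporaryDirectory() as root:
        # One copy of the file in an (assumed) package cache directory.
        cached = os.path.join(root, "pkgs", "libfoo.so")
        os.makedirs(os.path.dirname(cached))
        with open(cached, "wb") as f:
            f.write(b"\0" * 4096)

        # Two (assumed) environments that both need the same file.
        env_files = [
            os.path.join(root, "envs", name, "lib", "libfoo.so")
            for name in ("analysis1", "analysis2")
        ]
        for path in env_files:
            link_package_file(cached, path)

        # All three paths refer to the same underlying file.
        same_file = all(os.path.samefile(cached, p) for p in env_files)
        link_count = os.stat(cached).st_nlink
        return same_file, link_count
```

Calling `demo()` returns whether every environment path resolves to the cached file, together with the file's link count. Hard links require the cache and the environments to live on the same filesystem, which is one reason the slide describes their use as preferential rather than guaranteed.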
Power and Flexibility for Users and Sysadmins Alike
• natively multi-user
  fully functional within the limited privileges of a non-privileged user
  admin/root users enabled with extensive configuration and enforcement capabilities
  https://www.continuum.io/blog/developer-blog/conda-configuration-engine-power-users
Layers of Process Isolation
isolation in-depth (outermost to innermost layer; the outer layers provide more secure isolation)
• bare metal
• hypervisor
• virtual machine
• container (e.g. Docker)
• chroot
• conda env (functional isolation)
• python virtualenv (pure-python isolation)
• application
Conda is Platform-Universal
• conda anywhere, and everywhere
  no mooring to any single platform, OS, or language
  a package management abstraction layer
  great complement to containers and Docker
Spectrum of Package Managers
[Diagram: the package managers yum / apt-get, conda, and pip placed along a range of package types: OS kernel, system packages, Python sdists, everything on PyPI. Callouts: "pip install python3?" and "pip install postgresql?"]
Enforcer of Safety and Correctness
• pre-compiled packages only
  will never require a compiler in production, or unexpectedly invoke one on you
• environment integrity
  disk-mutating operations are wrapped in a transaction and rolled back in the event of errors
• environment correctness
  conda enforces compatibility of packages within environments
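The environment-integrity idea above (wrap disk changes in a transaction, roll back on error) can be sketched in a few lines of Python. This is a simplified illustration of the general technique, not conda's implementation: the `FileTransaction` class, its methods, and the file names in `demo()` are all hypothetical.

```python
import os
import tempfile


def _restore(path, original):
    """Put a file's previous contents back."""
    with open(path, "wb") as f:
        f.write(original)


def _remove_if_exists(path):
    """Delete a file that the transaction created."""
    if os.path.exists(path):
        os.remove(path)


class FileTransaction:
    """Hypothetical sketch: record an undo step per write, replay them on failure."""

    def __init__(self):
        self._undo = []  # undo callables, in the order the writes happened

    def write_file(self, path, data):
        if os.path.exists(path):
            with open(path, "rb") as f:
                original = f.read()
            self._undo.append(lambda: _restore(path, original))
        else:
            self._undo.append(lambda: _remove_if_exists(path))
        with open(path, "wb") as f:
            f.write(data)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Roll back in reverse order so the directory ends up as it started.
            for undo in reversed(self._undo):
                undo()
        return False  # never swallow the original error


def demo():
    with tempfile.TemporaryDirectory() as env:
        existing = os.path.join(env, "numpy.txt")
        with open(existing, "wb") as f:
            f.write(b"1.12")
        new = os.path.join(env, "scipy.txt")
        try:
            with FileTransaction() as tx:
                tx.write_file(existing, b"1.13")
                tx.write_file(new, b"0.19")
                raise RuntimeError("simulated failure mid-install")
        except RuntimeError:
            pass
        with open(existing, "rb") as f:
            return f.read(), os.path.exists(new)
```

After the simulated failure, `demo()` reports the original contents of the pre-existing file and that the newly created file is gone. A real package manager would also have to handle links, directories, and crashes during the rollback itself, which this sketch ignores.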
Channels for User Empowerment
• the channel is a component of a package’s identity
  first-class citizen in package specifications
• easy package building
  with conda-build, a dedicated tool with engaged and dedicated code contributors
• channels enable community
  independent contributors, independent packaging communities, and corporations
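To make "the channel is a component of a package's identity" concrete, here is a small Python sketch that splits a specification of the form `channel::name=version` into its parts. The `PackageSpec` type and `parse_spec` function are hypothetical and written only for illustration; conda's own specification parser accepts many more forms than the three handled here.

```python
from typing import NamedTuple, Optional


class PackageSpec(NamedTuple):
    """Hypothetical container for the parts of a package specification."""
    channel: Optional[str]
    name: str
    version: Optional[str]


def parse_spec(spec: str) -> PackageSpec:
    """Split 'channel::name=version'; channel and version are both optional.

    Only the simple 'name', 'name=version', and 'channel::name[=version]'
    forms are handled in this sketch.
    """
    channel, sep, rest = spec.partition("::")
    if not sep:
        # No '::' present, so the whole string is the name[=version] part.
        channel, rest = None, spec
    name, _, version = rest.partition("=")
    return PackageSpec(channel, name, version or None)


def same_package(a: PackageSpec, b: PackageSpec) -> bool:
    """Treat the channel as part of identity: same name from another channel differs."""
    return (a.channel, a.name) == (b.channel, b.name)
```

For example, `parse_spec("conda-forge::numpy=1.13")` yields the channel `conda-forge`, the name `numpy`, and the version `1.13`, and `same_package` treats `numpy` from two different channels as two different packages.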
Conda is a Community
CONDA-FORGE OMNIA
The Anaconda Package Ecosystem
• the Anaconda repositories are the gold standard
  growing and evolving along with the software we distribute
  backed by dedicated engineers who are packaging professionals
Next Steps
DOWNLOAD Anaconda or Miniconda: continuum.io/downloads
GET the conda cheatsheet: bit.ly/2tKXe4G
READ more about conda in our developer blog: continuum.io/blog/developer-blog/