Revolutionizing Data Science Package Management with conda


Page 1: Revolutionizing Data Science Package Management with conda

© 2017 Continuum Analytics - Confidential & Proprietary

Revolutionizing Data Science Package Management with conda

Travis Oliphant, Co-founder, President & Chief Data Scientist

Kale Franz, conda Tech Lead

July 25, 2017

Page 2

Travis Oliphant

• President, Chief Data Scientist & Co-Founder of Continuum Analytics

• Ph.D. from the Mayo Clinic; BS & MS degrees in mathematics and electrical engineering from Brigham Young University

• Primary developer of the NumPy package

• Founding contributor of the SciPy package

• Author of Guide to NumPy

Page 3

Kale Franz

• Lead developer of the conda project

• Ph.D. in electrical engineering from Princeton University

• Domain expert in the field of quantum cascade lasers

• Made infrared semiconductor lasers for the NASA Jet Propulsion Laboratory

• Previously backend developer and DevOps tech lead at 23andMe

Page 4

What is Anaconda?

Page 5

The Most Popular Data Science Ecosystem with Over 4.5M Users

Page 6

conda: The core of Anaconda Distribution

• Written in Python, yet language-agnostic

• Multi-platform (Windows, macOS, Linux)

• Powering the data science ecosystem

• No admin privileges required

Page 7

The Story of conda: Why conda?

Page 8

Over 5 years & 4.5 Million Users

Page 9

Python 2 / 3 Data Science Package Manager

• Reproducibility

• Environment isolation

• Upgrade management

• Python, R, Scala

• Project environment management

• Multi-platform (Windows, macOS, Linux)

• Enterprise Support: Migration & Dependency Management

• Pre-compiled binaries

• C, C++, Fortran, Java…

• Dependency Management
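At its core, dependency management means checking that every installed package's requirements are satisfied by the rest of the environment. The Python sketch below illustrates that check under a deliberately simplified assumption: constraints use only `>=`, `<=`, `==`, `>`, `<` on dotted numeric versions, joined by commas. The function names and the package metadata are hypothetical; conda's real solver handles far richer specifications and searches for a consistent set rather than merely reporting conflicts.

```python
import operator

# Simplified constraint operators (an assumption for this sketch; conda's
# real match specifications support a much richer grammar).
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '1.13.3' into (1, 13, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, constraint):
    """Check a version against a comma-separated constraint like '>=1.10,<2'."""
    for clause in constraint.split(","):
        clause = clause.strip()
        # Try two-character operators first so '>=' is not mistaken for '>'.
        for symbol in sorted(_OPS, key=len, reverse=True):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not _OPS[symbol](parse_version(version), bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported constraint clause: {clause!r}")
    return True


def find_conflicts(environment):
    """List unmet requirements as (package, dependency, constraint) triples.

    `environment` maps a package name to (version, {dependency: constraint}).
    """
    conflicts = []
    for name, (_version, depends) in environment.items():
        for dep, constraint in depends.items():
            if dep not in environment:
                conflicts.append((name, dep, "missing"))
            elif not satisfies(environment[dep][0], constraint):
                conflicts.append((name, dep, constraint))
    return conflicts


# Hypothetical environment: dataframelib needs a newer arraylib than installed.
env = {
    "python": ("3.6.1", {}),
    "arraylib": ("1.13.1", {"python": ">=3.5,<3.7"}),
    "dataframelib": ("0.20.3", {"arraylib": ">=1.14", "python": ">=3.5"}),
}
print(find_conflicts(env))  # [('dataframelib', 'arraylib', '>=1.14')]
```

A package manager that enforces correctness would refuse to commit an environment for which a check like this reports anything.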

Page 10

Why conda?

• Trusted by enterprises to manage their data science environments

• The package manager loved by data scientists

• Integrates alongside existing workflows & packaging solutions, such as pip & Docker

• Flexible packaging solution for your custom in-house needs

• Simplifies data science software packaging and deployment, helping data science projects reach their users

Page 11

Trusted by Enterprises & Industry Leaders

“Packaging is hard. Just use conda”

Dharhas Pothina, US Army Corps of Engineers ERDC, AnacondaCON 2017

Page 12

Cross-Platform Package Manager

• Linux: yum (rpm), apt-get (dpkg)

• macOS: macports, homebrew, fink

• Windows: chocolatey, npackd

• Cross-Platform: conda, with sophisticated light-weight environments included!

http://conda.pydata.org

Page 13

Conda with pip (vs. virtualenv): Different Package Managers for Different Needs

• Scope: conda manages multi-language & general environments; pip (+virtualenv) manages Python packages & environments

• Environments: conda handles environments natively; pip relies on virtualenv

• Binaries: conda installs binaries; pip compiles from source, with some binary support (wheels)

• Interoperability: conda install pip works, and conda envs support pip dependencies; the reverse is N/A for pip (+virtualenv)

Page 14

conda + Docker: Better Together

Data Science Development: a desktop / laptop hosting conda env 1, conda env 2, and conda env 3 for Analysis 1, Analysis 2, and Analysis 3

Data Science Deployment: a server running a Docker container that hosts conda env 1, conda env 2, and conda env 3 for Analysis 1, Analysis 2, and Analysis 3

Page 15

Anaconda Enterprise: Leveraging conda + Docker for Data Science Project Deployment

Data Science Development: Project 1, Project 2, and Project 3 on a laptop

Data Science Development & Deployment: Project 1, Project 2, and Project 3 in Anaconda Enterprise, running in Containers 1 to 4

Page 16

The conda Community

Page 17

Creating & Curating conda Packages for Data Scientists

• repo.continuum.io: curated Anaconda packages

• Conda Forge (anaconda.org/conda-forge): curated by the community

• Anaconda Cloud (anaconda.org): uploaded by users & organizations

• Anaconda Enterprise (on-premise repository): curated by your organization

conda install <package>
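When several of these package sources are configured as channels, `conda install <package>` has to decide which one a package comes from. The Python sketch below models one simple policy, assumed for illustration: channels are searched in priority order, the first channel that carries the package wins even if a later channel has a newer version, and the newest version within that channel is taken. The function names and channel contents are hypothetical; real conda makes this decision inside its solver, and the behavior is configurable (for example through the `channel_priority` setting).

```python
def version_key(version):
    """Sort key for simple dotted numeric versions such as '2.1.0'."""
    return tuple(int(part) for part in version.split("."))


def pick_package(name, channels):
    """Return (channel, version) for `name`, honoring channel order.

    `channels` is an ordered list of (channel_name, index) pairs, highest
    priority first; each index maps package names to available versions.
    The first channel that carries the package wins, and within that
    channel the newest version is chosen.
    """
    for channel_name, index in channels:
        versions = index.get(name)
        if versions:
            return channel_name, max(versions, key=version_key)
    raise LookupError(f"{name!r} was not found in any configured channel")


# Hypothetical channel contents; "my-org" stands in for an on-premise repository.
channels = [
    ("my-org", {"analysis-tools": ["1.2.0"]}),
    ("conda-forge", {"analysis-tools": ["1.3.0"], "plotlib": ["2.0.1", "2.1.0"]}),
    ("defaults", {"plotlib": ["2.2.0"]}),
]
print(pick_package("analysis-tools", channels))  # ('my-org', '1.2.0')
print(pick_package("plotlib", channels))         # ('conda-forge', '2.1.0')
```

Preferring the higher-priority channel over a newer version elsewhere is what lets an organization pin its own curated builds ahead of community ones.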

Page 18

The Future of conda

Page 19

conda Everywhere! From Arduino, Mobile Devices & IoT to the Mainframe

Page 20

Explore conda

Page 21

What is conda?

• packages

• environments

What is conda not?

• configuration management

• process management

Page 22

Enabling Environments

• portability: system-level package management that’s not tied to hard-coded system paths

• multiple, composable environments: multiple instances of otherwise-colliding software, functionally isolated on the same system

• preferential use of hard links: being respectful with disk usage
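Preferring hard links means that when the same package is installed into several environments, each environment's file can point at the single copy extracted in the package cache instead of duplicating its data. The Python sketch below shows the general hard-link-with-copy-fallback technique; the function names and the directory layout (`pkgs/demo-1.0`, `envs/a`, `envs/b`) are hypothetical, and conda itself chooses among several link types and also handles details this sketch ignores.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_or_copy(src, dst):
    """Place src at dst, preferring a hard link and falling back to a copy.

    Returns "hardlink" or "copy" so the caller can see which path was taken.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.link(src, dst)  # same inode: no extra file data written to disk
        return "hardlink"
    except OSError:
        # e.g. cache and environment on different filesystems, or a
        # filesystem without hard-link support
        shutil.copy2(src, dst)
        return "copy"


def install_from_cache(cache_dir, env_dir):
    """Materialize every file of an extracted cached package into env_dir."""
    results = {}
    for src in sorted(cache_dir.rglob("*")):
        if src.is_file():
            rel = src.relative_to(cache_dir)
            results[rel.as_posix()] = link_or_copy(src, env_dir / rel)
    return results


def demo():
    """Install one cached package into two environments (hypothetical layout)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "pkgs" / "demo-1.0"
        (cache / "lib").mkdir(parents=True)
        (cache / "lib" / "demo.py").write_text("x = 1\n")
        modes = [install_from_cache(cache, root / "envs" / name) for name in ("a", "b")]
        a = root / "envs" / "a" / "lib" / "demo.py"
        b = root / "envs" / "b" / "lib" / "demo.py"
        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        return modes, b.read_text(), same_inode


print(demo())
```

On a typical single-filesystem setup both environments report "hardlink" and share one inode, so the second environment costs almost no additional disk space.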

Page 23

Power and Flexibility for Users and Sysadmins Alike

• natively multi-user: fully functional within the limited privileges of a non-privileged user, while admin/root users are enabled with extensive configuration and enforcement capabilities

https://www.continuum.io/blog/developer-blog/conda-configuration-engine-power-users
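One way to picture how an administrator's enforcement coexists with per-user configuration is a stack of layers in which later layers override earlier ones, unless an earlier layer has frozen a key. The Python sketch below is loosely modeled on conda's layered `.condarc` search path and its `#!final` marker; the function name, the layer format, and the `internal-mirror` channel are hypothetical, and real conda configuration involves more sources and more merge rules (for example, for list-valued settings).

```python
def merge_config(layers):
    """Merge configuration layers listed from lowest to highest precedence.

    Each layer is (values, final_keys): a dict of settings plus the keys that
    layer freezes. A later layer overrides an earlier one, unless an earlier
    layer has already frozen that key.
    """
    merged, frozen = {}, set()
    for values, final_keys in layers:
        for key, value in values.items():
            if key not in frozen:
                merged[key] = value
        frozen.update(final_keys)
    return merged


# Hypothetical layers: a sysadmin's system-wide file, then a user's own file.
system_layer = (
    {"channels": ["internal-mirror"], "ssl_verify": True},
    {"channels", "ssl_verify"},  # keys the administrator freezes
)
user_layer = (
    {"channels": ["conda-forge"], "ssl_verify": False, "always_yes": True},
    set(),
)

print(merge_config([system_layer, user_layer]))
# {'channels': ['internal-mirror'], 'ssl_verify': True, 'always_yes': True}
```

The user still controls unfrozen preferences such as `always_yes`, while the frozen channel list and SSL setting stay under the administrator's control.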

Page 24

Layers of Process Isolation

• bare metal

• hypervisor

• virtual machine

• container (e.g. Docker)

• chroot

• conda env

• python virtualenv

• application

The layers combine for isolation in-depth, ranging from pure-python isolation (python virtualenv) and functional isolation (conda env) toward more secure isolation in the lower layers.

Page 25

Conda is Platform-Universal

• conda anywhere, and everywhere: no mooring to any single platform, OS, or language

• a package management abstraction layer

• a great complement to containers and Docker

Page 26

Spectrum of Package Managers

From the OS kernel and system packages (yum / apt-get), through conda, to everything on PyPI and Python sdists (pip)

pip install python3? pip install postgresql?

Page 27

Enforcer of Safety and Correctness

• pre-compiled packages only: conda will never require a compiler in production, or unexpectedly invoke one on you

• environment integrity: disk-mutating operations are wrapped in a transaction and rolled back in the event of errors

• environment correctness: conda enforces compatibility of packages within environments
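The environment-integrity guarantee rests on a familiar pattern: record how to undo each step as it happens, and replay those undo steps in reverse if anything fails. The Python sketch below applies that pattern to plain file writes; the `FileTransaction` class and the demo file names are hypothetical, and conda's own transaction machinery works at the level of whole packages and is considerably more involved.

```python
import shutil
import tempfile
from pathlib import Path


class FileTransaction:
    """Write files under a prefix; undo every completed write if any step fails.

    Simplified: only file contents are restored; directories created along
    the way are left in place.
    """

    def __init__(self, prefix):
        self.prefix = Path(prefix)
        self.backup_dir = Path(tempfile.mkdtemp(prefix="txn-backup-"))
        self.undo = []  # callables that reverse completed writes

    def write(self, relpath, text):
        target = self.prefix / relpath
        if target.exists():
            # Back up the current file so it can be restored on rollback.
            backup = self.backup_dir / str(len(self.undo))
            shutil.copy2(target, backup)
            self.undo.append(lambda t=target, b=backup: shutil.copy2(b, t))
        else:
            # A brand-new file is simply deleted on rollback.
            self.undo.append(lambda t=target: t.unlink() if t.exists() else None)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Reverse completed steps newest-first to restore the prior state.
            for step in reversed(self.undo):
                step()
        shutil.rmtree(self.backup_dir, ignore_errors=True)
        return False  # never swallow the original exception


def demo(fail):
    """Run a two-file update, optionally failing midway (hypothetical files)."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp)
        (prefix / "config.txt").write_text("old\n")
        try:
            with FileTransaction(prefix) as txn:
                txn.write("config.txt", "new\n")
                txn.write("bin/tool", "#!/bin/sh\n")
                if fail:
                    raise RuntimeError("simulated failure mid-install")
        except RuntimeError:
            pass
        return (prefix / "config.txt").read_text(), (prefix / "bin" / "tool").exists()


print(demo(fail=False))  # ('new\n', True)
print(demo(fail=True))   # ('old\n', False)
```

After the simulated failure, the prefix looks as it did before the operation started: the modified file is restored and the newly added one is gone.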

Page 28

Channels for User Empowerment

• the channel is a component of a package’s identity: a first-class citizen in package specifications

• easy package building: with conda-build, a dedicated tool with engaged code contributors

• channels enable community: independent contributors, independent packaging communities, and corporations
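Because the channel is part of a package's identity, a specification can name it explicitly; conda accepts specs such as `conda-forge::numpy`. The Python sketch below parses a simplified `channel::name version` form into a small record to show why `conda-forge::numpy` and `defaults::numpy` count as different packages. `PackageRef` and `parse_spec` are hypothetical names, not conda's API, and the parser recognizes only these three parts.

```python
from typing import NamedTuple, Optional


class PackageRef(NamedTuple):
    channel: Optional[str]  # None means "whichever configured channel provides it"
    name: str
    version: Optional[str]  # None means "any version"


def parse_spec(spec):
    """Parse 'channel::name version', where channel and version are optional.

    Only these three parts are recognized; conda's real spec grammar is richer.
    """
    channel = None
    if "::" in spec:
        channel, spec = spec.split("::", 1)
    parts = spec.split(None, 1)
    version = parts[1].strip() if len(parts) > 1 else None
    return PackageRef(channel, parts[0], version)


a = parse_spec("conda-forge::numpy 1.13.*")
b = parse_spec("defaults::numpy")
print(a)  # PackageRef(channel='conda-forge', name='numpy', version='1.13.*')
print(b)  # PackageRef(channel='defaults', name='numpy', version=None)

# Same package name, different channels: two distinct identities.
print((a.channel, a.name) == (b.channel, b.name))  # False
```

Treating (channel, name) as the key is what lets independent packaging communities publish builds of the same project side by side without colliding.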

Page 29

Conda is a Community

conda-forge, Omnia

Page 30

The Anaconda Package Ecosystem

• the Anaconda repositories are the gold standard: growing and evolving along with the software we distribute, and backed by dedicated engineers who are packaging professionals

Page 31

Next Steps

DOWNLOAD Anaconda or Miniconda: continuum.io/downloads

GET the conda cheatsheet: bit.ly/2tKXe4G

READ more about conda in our developer blog: continuum.io/blog/developer-blog/