clean code in jupyter notebooks

Post on 14-Apr-2017

Category:

Software

@KNerush @Volodymyrk

Clean Code in Jupyter notebooks, using Python

5th of July, 2016

Volodymyr (Vlad) Kazantsev

Head of Data @ Product Madness

Product Manager

MBA @LBS

Graphics programming

Writes code for money since 2002

Math degree

Kateryna (Katya) Nerush

Mobile Dev @ Octopus Labs

Dev Lead in Finance

Data Engineer

Web Developer

Writes code for money since 2003

CS degree

Why do we end up with messy IPython notebooks?

[Diagram: Coding, Stats, Business]

Who are Data Scientists, really?

[Diagram: Coding, Stats, Business]

“In a nutshell, coding is telling a computer to do something using a language it understands.”

Data Science with Python

It is not going to production anyway!

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand” - Martin Fowler, 1999

WTF! How am I supposed to validate this??

Sorry, but how can I calculate 7-day retention?

From Prototype to ... The Data Science Spiral

Ideas & Questions

Data Analysis

Insights

Impact

You do it for your own good..

Re-run all A/B test analyses for the last months, by tomorrow

Part 2: What can Data Scientists learn from Software Engineers?

Robert C. Martin, a.k.a. “Uncle Bob”

https://cleancoders.com/

“Clean Code”?

“Pleasingly graceful and stylish in appearance or manner” - Bjarne Stroustrup, inventor of C++

“Clean code reads like well-written prose” - Grady Booch, creator of UML

“.. each routine turns out to be pretty much what you expected” - Ward Cunningham, inventor of Wiki and co-inventor of XP

One does not simply start writing clean code..

“First make it work, then make it right, then make it fast and small” - Kent Beck, co-inventor of XP and TDD

“Leave the campground cleaner than you found it” - The Boy Scouts of America, applied to programming by Uncle Bob

- Runs all the tests

- Contains no duplicate code

- Expresses all ideas...

- Minimizes classes and methods

Ron Jeffries, author of Extreme Programming Installed

“I'm not a great programmer; I'm just a good programmer with great habits.” - Kent Beck

“There are only two hard problems in Computer Science: cache invalidation and naming things” - Phil Karlton

long_descriptive_names

Avoid: x, i, stuff, do_blah()

Pronounceable and Searchable

revenue_per_payer vs. arpdpu

Avoid encodings, abbreviations, prefixes, suffixes, if possible: bonus_points_on_iphone vs. cns_crm_dip

Add meaningful context: daily_revenue_per_payer

Don’t be lazy. Spend time naming and renaming things.
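
A minimal sketch of the naming advice on this slide, in plain Python. The purchase records and both function names are hypothetical, invented only to contrast cryptic names with descriptive ones.

```python
# Hypothetical purchase records for one day: (user_id, revenue)
purchases = [("u1", 4.99), ("u2", 9.99), ("u1", 0.99)]


# Avoid: the reader has to reverse-engineer what x, s and r mean.
def calc(x):
    s = {i[0] for i in x}
    r = sum(i[1] for i in x)
    return r / len(s)


# Prefer: pronounceable, searchable names that carry their context.
def daily_revenue_per_payer(daily_purchases):
    payers = {user_id for user_id, _ in daily_purchases}
    total_revenue = sum(revenue for _, revenue in daily_purchases)
    return total_revenue / len(payers)
```

Both functions compute the same number; only the second one can be found with a text search and read aloud in a meeting.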

“each routine turns out to be pretty much what you expected” - Ward Cunningham

Small

Do one thing

One Level of Abstraction

Have only a few arguments (one is best)

Less important in Python, with named arguments.
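
A small sketch of these rules, using the 7-day retention question from earlier in the talk. The function names, the data layout and the definition of retention (active exactly N days after install) are assumptions made for illustration, not the speakers' code.

```python
from datetime import date, timedelta


def was_active_on_day(install_date, active_dates, day):
    """Return True if the user was active exactly `day` days after install."""
    return install_date + timedelta(days=day) in active_dates


def retention_rate(users, day=7):
    """Return the share of users active on `day`.

    `users` maps user_id -> (install_date, set of active dates).
    """
    retained = sum(
        was_active_on_day(install_date, active_dates, day=day)
        for install_date, active_dates in users.values()
    )
    return retained / len(users)


# Hypothetical data: u1 came back on day 7, u2 did not.
users = {
    "u1": (date(2016, 7, 1), {date(2016, 7, 2), date(2016, 7, 8)}),
    "u2": (date(2016, 7, 1), {date(2016, 7, 3)}),
}
day_7_retention = retention_rate(users, day=7)
```

Each function does one thing at one level of abstraction, and the named argument `day=7` keeps the call readable even with more than one parameter.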

Use good names

Avoid obvious comments.

Dead Commented-out Code

ToDo, licenses, history, markup for documentation and other nonsense

But there are exceptions..

“When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous” - Kent Beck
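
One way to read this advice, sketched in Python. The user record and the helper names are hypothetical; the point is only that a well-named function can replace an explanatory comment.

```python
# Hypothetical user record, for illustration only.
user = {"total_revenue": 4.99, "days_since_install": 12}

# Before: the condition needs a comment to be understood.
# payer with at least a week of history
needs_comment = user["total_revenue"] > 0 and user["days_since_install"] >= 7


# After: the names say what the comment used to say.
def is_payer(user):
    return user["total_revenue"] > 0


def has_week_of_history(user):
    return user["days_since_install"] >= 7


self_explaining = is_payer(user) and has_week_of_history(user)
```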

// When I wrote this, only God and I understood what I was doing
// Now, God only knows

// sometimes I believe compiler ignores all my comments

/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}

“Long functions are where classes are trying to hide” - Robert C. Martin

Small

Do one thing

SOLID, Design Patterns, etc.

Code conventions

A team should produce code in one style, as if it were written by one person

Team conventions over language conventions, over personal ones

Automate style formatting

Part 3: How to write Clean Code in Python?

(i.e. this is not Java)

PEP 8 -- Style Guide for Python Code

● Indentation
● Tabs or Spaces?
● Maximum Line Length
● Should a line break before or after a binary operator?
● Blank Lines
● Imports
● Comments
● Naming Conventions

Example:

Good:

foo = long_function_name(var_one, var_two,
                         var_three, var_four)

Bad:

foo = long_function_name(var_one, var_two,
    var_three, var_four)

https://www.python.org/dev/peps/pep-0008/

Google Python Style Guide

https://google.github.io/styleguide/pyguide.html

My favourite!

This is not Java or C++

Functions are first-class objects

Duck-typing as an interface

No setters/getters

itertools, zip, enumerate

etc.
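
The points on this slide can be sketched in a few lines of Python. All class, function and variable names below are hypothetical and exist only to show each idiom once.

```python
from itertools import chain


# Functions are first-class objects: pass them around like any value.
def apply_to_all(transform, values):
    return [transform(value) for value in values]


# Duck typing: anything with a read() method works, no interface needed.
class InMemorySource:
    def read(self):
        return [3, 1, 2]


def load_sorted(source):
    return sorted(source.read())


# No setters/getters: use plain attributes, and a property where a value
# is derived, so callers never write get_conversion().
class Experiment:
    def __init__(self, users, payers):
        self.users = users
        self.payers = payers

    @property
    def conversion(self):
        return self.payers / self.users


# zip, enumerate and itertools instead of manual index bookkeeping.
names = ["control", "variant"]
conversions = [Experiment(100, 5).conversion, Experiment(100, 8).conversion]
report = [
    f"{rank}. {name}: {conversion:.0%}"
    for rank, (name, conversion) in enumerate(zip(names, conversions), start=1)
]
all_values = list(chain([1, 2], [3]))
```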

Part 4: How to write Clean Python Code in Jupyter Notebook?

Typical structure of the ipynb

1. Imports

2. Get Data

3. Transform Data

4. Modelling

5. Visualisation

6. Making sense of the data

How big should a notebook file be?

Hypothesis - Data - Interpretation

Keep your notebooks small!

(4-10 cells each)

Tip 1: break a fat notebook into many small ones

Example:

1_data_preparation.ipynb

df.to_pickle('clean_data_1.pkl')

2_linear_model.py

df = pd.read_pickle('clean_data_1.pkl')

3_ensemble.py

df = pd.read_pickle('clean_data_1.pkl')

Tip 2: shared library

Data access

Common plotting functionality

Report generation

Misc. utils

acme_data_utils/
    data_access.py
    plotting.py
    setup.py
    tests/

Tip 3: Don’t just be pythonic. Be IPythonic

Don’t hide the “secret sauce” inside an imported module

BAD:

Good:
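
A hypothetical sketch of the contrast, assuming a shared module such as `acme_data_utils` exists. The function `magic_uplift` and the revenue figures are invented for illustration and are not part of any real library.

```python
# Hypothetical daily revenue figures, for illustration only.
daily_revenue = [120.0, 135.0, 90.0, 150.0]


# BAD: imagine this lived in a shared module and the notebook cell only said
#     uplift = acme_data_utils.magic_uplift(daily_revenue)
# The reader cannot see how the number was produced.
def magic_uplift(values):
    return (values[-1] - values[0]) / values[0]


# Good: the decisive steps are spelled out in the notebook itself;
# only generic plumbing (data access, plotting) is imported.
first_day_revenue = daily_revenue[0]
last_day_revenue = daily_revenue[-1]
uplift = (last_day_revenue - first_day_revenue) / first_day_revenue
```

The result is the same in both cases; what changes is whether a reviewer can validate the calculation without leaving the notebook.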

“Clean code reads like well-written prose” - Grady Booch

A good Jupyter notebook reads like well-written prose

How big should one Cell be?

Tip 4: each cell should have one logical output

One “idea - execution - output” triplet per cell

Import Cell: expected output is no import errors

CMD+SHIFT+P

Tip 5: write tests .. in jupyter notebooks

https://pypi.python.org/pypi/pytest-ipynb
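
A minimal sketch of a test cell, using plain `assert` statements only. It does not rely on the conventions of the pytest-ipynb plugin linked above, and `conversion_rate` is a hypothetical function standing in for whatever the notebook analyses.

```python
# Cell 1: the function under analysis (hypothetical).
def conversion_rate(payers, users):
    if users == 0:
        return 0.0
    return payers / users


# Cell 2: a test cell; its one expected output is "no AssertionError".
def test_conversion_rate():
    assert conversion_rate(5, 100) == 0.05
    assert conversion_rate(0, 0) == 0.0


test_conversion_rate()
```

Like the import cell, a test cell is considered successful when it runs and prints nothing.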

Tip 6: ..to the cloud

Code Smells .. in ipynb

- Cells can’t be executed in order (with Run All and Restart & Run All)

- Prototype (check ideas) code is mixed with “analysis” code

- Debugging cells

- Copy-paste cells

- Duplicate code (in general)

- Multiple notebooks that re-implement the same function
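
The first smell in this list can be checked mechanically. A notebook file is JSON, and each code cell stores the `execution_count` it had when it last ran. The sketch below, a rough check rather than a complete linter, assumes that a notebook saved right after Restart & Run All numbers its non-empty code cells 1, 2, 3, ... from top to bottom. The function name and the two sample notebooks are hypothetical.

```python
import json


def cells_ran_in_order(notebook_json):
    """Return True if executed code cells are numbered 1, 2, 3, ... top to bottom."""
    notebook = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in notebook["cells"]
        if cell["cell_type"] == "code" and cell.get("source")
    ]
    return counts == list(range(1, len(counts) + 1))


def make_notebook(execution_counts):
    """Build a tiny notebook JSON string with one code cell per count (test helper)."""
    cells = [{"cell_type": "markdown", "source": ["# Title"]}]
    cells += [
        {"cell_type": "code", "execution_count": count, "source": ["1 + 1"]}
        for count in execution_counts
    ]
    return json.dumps({"cells": cells})


clean_notebook = make_notebook([1, 2, 3])
messy_notebook = make_notebook([7, 3, 12])
```

For a real file, the same check would read the text of the `.ipynb` file and pass it to `cells_ran_in_order`.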

Tip 7: Run a notebook from another notebook!

analysis.ipynb

Make a Data Product from notebooks!

Summary: How to organise a Jupyter project

1. Notebook should have one Hypothesis-Data-Interpretation loop

2. Make a multi-project utils library

3. A good Jupyter notebook reads like well-written prose

4. Each cell should have one and only one output

5. Write tests in notebooks

6. Deploy a shared Jupyter server

7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.
