luigi future

Posted on 07-Jul-2015


DESCRIPTION

Luigi is a workflow manager in Python that I've open-sourced. These are slides about Luigi's future from a meetup on July 31.

TRANSCRIPT

July 29, 2014

Luigi

The past, the present, the future


The history

2

The long story builder (2009-2010)

XML madness. Only used for a single project (my Master’s thesis)

3

The long story builder 2 (2010-2011)

Everything in Python, but insane amounts of boilerplate

4

Why luigi?

We wanted to do everything in Python, not XML

5


How do we use it at Spotify?

6

Blah

7

The things we got right

8


Everything is a directed acyclic graph

Makefile style: tasks specify what they depend on, not what depends on them

9
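To make the Makefile-style idea concrete, here is a minimal Python sketch, not Luigi's implementation: each task only lists what it requires, and a depth-first walk derives the run order from those edges. The `Task`, `Extract`, `Clean`, `Report` and `build` names are hypothetical, and tasks are identified by class name only (Luigi also takes parameters into account).

```python
class Task:
    """Hypothetical minimal task: override requires() and run()."""

    def requires(self):
        return []  # no upstream dependencies by default

    def run(self, log):
        log.append(type(self).__name__)  # record that this task ran


class Extract(Task):
    pass  # knows nothing about who depends on it


class Clean(Task):
    def requires(self):
        return [Extract()]


class Report(Task):
    def requires(self):
        return [Clean(), Extract()]


def build(task, log, done=None, visiting=None):
    """Run every requirement before the task itself, each at most once."""
    done = set() if done is None else done
    visiting = set() if visiting is None else visiting
    name = type(task).__name__  # simplification: identity = class name
    if name in done:
        return log
    if name in visiting:
        raise ValueError(f"cycle detected at {name}")  # not a DAG
    visiting.add(name)
    for dep in task.requires():
        build(dep, log, done, visiting)
    visiting.discard(name)
    task.run(log)
    done.add(name)
    return log


order = build(Report(), [])
print(order)  # ['Extract', 'Clean', 'Report']
```

Extract is shared by Clean and Report but runs once, and no task ever names its downstream consumers.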


Do everything in Python

Dependencies often involve algebra that is hard to express in XML

10
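As an illustration of the kind of dependency “algebra” that is a one-liner in Python, the sketch below has a weekly task depend on the seven daily tasks ending at its date. `DailyLogs` and `WeeklyReport` are hypothetical plain dataclasses standing in for tasks; only the idea of a `requires()` method is borrowed from Luigi.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyLogs:
    """Hypothetical task: one day's worth of logs."""
    date: datetime.date


@dataclass(frozen=True)
class WeeklyReport:
    """Hypothetical task depending on the `window_days` days ending at `date`."""
    date: datetime.date
    window_days: int = 7

    def requires(self):
        # Date arithmetic like this is trivial in Python but awkward in XML.
        return [DailyLogs(self.date - datetime.timedelta(days=i))
                for i in range(self.window_days)]


deps = WeeklyReport(datetime.date(2014, 7, 29)).requires()
print(deps[0].date, deps[-1].date)  # 2014-07-29 2014-07-23
```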


Centralized scheduler

Overview of everything that’s currently running/scheduled

11

(Diagram: Luigi worker 1 and Luigi worker 2, with tasks A, B, C and F, both connected to the Luigi central planner.)
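A toy, in-memory version of a central scheduler might look like the following. The class and its methods (`add_task`, `get_work`, `mark_done`, `overview`) are hypothetical, not Luigi's scheduler API, and the task graph is made up to echo the diagram, where both workers know about A and C. Because both workers ask one planner for work, a shared task is handed out only once.

```python
class CentralScheduler:
    """Toy in-memory planner; hypothetical API, not Luigi's scheduler."""

    def __init__(self):
        self.deps = {}    # task id -> set of task ids it requires
        self.status = {}  # task id -> "PENDING", "RUNNING" or "DONE"
        self.owner = {}   # task id -> worker currently running it

    def add_task(self, task_id, deps=()):
        # Several workers may register the same task; keep a single entry.
        self.deps.setdefault(task_id, set()).update(deps)
        self.status.setdefault(task_id, "PENDING")
        for dep in deps:
            self.deps.setdefault(dep, set())
            self.status.setdefault(dep, "PENDING")

    def get_work(self, worker):
        # Hand out one pending task whose dependencies are all done.
        for task_id in sorted(self.status):
            ready = all(self.status[d] == "DONE" for d in self.deps[task_id])
            if self.status[task_id] == "PENDING" and ready:
                self.status[task_id] = "RUNNING"
                self.owner[task_id] = worker
                return task_id
        return None  # nothing runnable right now

    def mark_done(self, task_id):
        self.status[task_id] = "DONE"
        self.owner.pop(task_id, None)

    def overview(self):
        # Snapshot of everything running/scheduled, as on the slide.
        return dict(self.status)


s = CentralScheduler()
s.add_task("A", ["B", "C"])  # registered by worker 1
s.add_task("F", ["A", "C"])  # registered by worker 2 (C is shared)
first = s.get_work("worker-1")   # "B"
second = s.get_work("worker-2")  # "C", not a second copy of B
idle = s.get_work("worker-2")    # None: A still waits on B and C
snapshot = s.overview()
s.mark_done("B")
s.mark_done("C")
third = s.get_work("worker-1")   # "A" is now runnable
```

In this run worker 1 gets B and worker 2 gets C; A only becomes available once both are done, and F in turn waits for A.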


Triggering jobs locally is trivial

If the only way to run things is remotely, debugging is super hard. Running things locally makes it a lot easier: no messing around with paths and configuration. (This has a flip side; more on this later.)

12


It’s a library more than a framework

Avoid the “Hollywood principle” (“don’t call us, we’ll call you”) and make it easy to customize, etc.

13

The hairy parts…

14


Execution is tied to scheduling

You can’t send a task off to run “in the cloud” and walk away; the process that schedules it also has to stay around to execute it

15


Visualization is pretty rudimentary

See how nice Driven looks, for instance

16


Scheduling isn’t tied to triggering

Need to rely on crontab, etc. Could borrow some of the nice parts of Chronos

17


What are some ideas for the future?

18


Separate scheduling and execution

Schedule something to run later/somewhere else.

A recent baby step towards this is a very simple fix for running modules dynamically:

    $ luigi --module MyModule MyTask --foo xyz --bar 123

The next step would be to do something like:

    $ luigi --module MyModule MyTask --foo xyz --bar 123 --execute-remotely

A full implementation would include a bunch of command line options to probe status, kill tasks, etc.

19
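A sketch of what a dynamic-module entry point has to do: split the command line into module, task and parameters, then import the module by name. This is a hypothetical stand-in, not Luigi's own parser (which derives parameters from the Task class); `--execute-remotely` is treated as the flag proposed on the slide, and `parse_command_line` and `load_task_class` are illustrative names.

```python
import importlib


def parse_command_line(argv):
    """Split `--module M Task --key value ...` into its parts.

    Hypothetical sketch: every other --key takes exactly one value, and
    --execute-remotely is the flag proposed on the slide.
    """
    module, task, remote, params = None, None, False, {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--execute-remotely":
            remote = True
            i += 1
        elif arg.startswith("--"):
            if i + 1 >= len(argv):
                raise ValueError(f"missing value for {arg}")
            key, value = arg[2:].replace("-", "_"), argv[i + 1]
            if key == "module":
                module = value
            else:
                params[key] = value
            i += 2
        else:
            task = arg  # the bare word is the task name
            i += 1
    return module, task, params, remote


def load_task_class(module_name, class_name):
    """Import a module by name at run time and fetch the class from it."""
    return getattr(importlib.import_module(module_name), class_name)


argv = "--module MyModule MyTask --foo xyz --bar 123".split()
parsed = parse_command_line(argv)
# ('MyModule', 'MyTask', {'foo': 'xyz', 'bar': '123'}, False)
```

With `--execute-remotely` appended, the last element becomes `True`; a dispatcher could then hand the parsed task to the central scheduler instead of running it in-process.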


Separate scheduling and execution (2)

20

(Diagram: the Luigi central scheduler connected to many workers.)
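One way to picture the split between scheduling and execution is a shared work queue: the submitter enqueues a task and returns immediately, and a pool of workers executes whatever arrives. In this minimal sketch, threads and a `queue.Queue` stand in for remote machines and the central scheduler (an assumption for illustration only); `submit` and `worker` are hypothetical names.

```python
import queue
import threading


def worker(jobs, results, name):
    """Pull jobs until a None sentinel arrives; the submitter isn't involved."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        task_name, fn, arg = job
        results.put((task_name, fn(arg), name))
        jobs.task_done()


def submit(jobs, task_name, fn, arg):
    # "Schedule and go away": submitting returns immediately.
    jobs.put((task_name, fn, arg))


jobs, results = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(jobs, results, f"worker-{i}"))
           for i in range(4)]
for t in threads:
    t.start()

for n in range(10):
    submit(jobs, f"square-{n}", lambda x: x * x, n)

for _ in threads:
    jobs.put(None)  # one sentinel per worker, queued after all real jobs
for t in threads:
    t.join()

finished = {}
while not results.empty():
    task_name, value, _ = results.get()
    finished[task_name] = value
print(finished["square-3"])  # 9
```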


On-the-fly dependencies

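    import luigi

    class MyTask(luigi.Task):
        def run(self):
            # OtherTask is another Task defined elsewhere.
            # Yielding it declares the dependency at run time;
            # this could replace requires()
            other_input = yield OtherTask()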

21
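To show how `yield`-based dependencies can work mechanically, here is a toy runner: when `run()` is a generator, the runner executes each yielded task first and sends its result back in. This is not Luigi's code; `execute` is a hypothetical name, and sending back a plain value rather than file targets is a simplification.

```python
import inspect


def execute(task, trace):
    """Run `task`; if run() is a generator, satisfy each yielded dependency."""
    outcome = task.run()
    if inspect.isgenerator(outcome):
        sent = None
        try:
            while True:
                dep = outcome.send(sent)     # resume until the next `yield`
                sent = execute(dep, trace)   # run the dependency first
        except StopIteration:
            pass  # run() finished
    trace.append(type(task).__name__)
    return task.output()


class OtherTask:
    """Hypothetical upstream task with a plain run()."""

    def run(self):
        self.data = "other data"  # stands in for writing an output file

    def output(self):
        return self.data


class MyTask:
    """Hypothetical task that discovers its dependency while running."""

    def run(self):
        other = yield OtherTask()  # could replace requires()
        self.data = other.upper()

    def output(self):
        return self.data


trace = []
result = execute(MyTask(), trace)
print(result, trace)  # OTHER DATA ['OtherTask', 'MyTask']
```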


Built in crontab-replacement

    import datetime

    import luigi

    @luigi.schedule  # the decorator proposed on this slide
    class MyTask(luigi.Task):
        # Note: this default is evaluated once, when the module is imported
        param = luigi.DateParameter(default=datetime.date.today())

        def run(self):
            ...

The @luigi.schedule decorator would then:

1. Register that my_module.MyTask should be scheduled (by telling the central planner?)
2. Trigger it continuously from somewhere (central planner?)

22
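The proposed decorator could be approximated with a registry plus a periodic trigger, as in this minimal sketch. `schedule`, `SCHEDULED`, `trigger` and the `done` set are hypothetical stand-ins for the central planner. The date is computed when `trigger` runs rather than at import time, and re-triggering on the same day is a no-op because the task is already complete.

```python
import datetime

SCHEDULED = []  # hypothetical registry; the central planner would own this


def schedule(cls):
    """Hypothetical stand-in for the proposed @luigi.schedule decorator."""
    SCHEDULED.append(cls)  # step 1: register the task class
    return cls


@schedule
class MyTask:
    def __init__(self, date):
        self.date = date

    def key(self):
        return (type(self).__name__, self.date)

    def complete(self, done):
        return self.key() in done

    def run(self, done):
        done.add(self.key())  # stands in for producing the day's output


def trigger(today, done):
    """Step 2: one tick of the trigger loop; returns the tasks it ran."""
    ran = []
    for cls in SCHEDULED:
        task = cls(date=today)  # date chosen per tick, not at import time
        if not task.complete(done):
            task.run(done)
            ran.append(task)
    return ran


done = set()
day = datetime.date(2014, 7, 29)
counts = [len(trigger(day, done)),
          len(trigger(day, done)),  # same day again: already complete
          len(trigger(day + datetime.timedelta(days=1), done))]
print(counts)  # [1, 0, 1]
```

A real trigger loop would call `trigger(datetime.date.today(), done)` periodically, for example with a sleep between ticks.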


ETA for tasks

Using a persistent task history database, you could train a simple k-nearest-neighbors (k-NN) regressor to predict how long a task will run

Then use this with the dependency graph to predict when any task will finish

23
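A minimal sketch of both steps. Since the target (seconds) is continuous, this uses k-NN regression: average the durations of the k most similar past runs. The finish-time estimate then takes the longest path through the dependency graph. The feature vectors, the sample numbers and the function names are illustrative assumptions, and the sketch assumes an acyclic graph and enough workers that independent tasks run in parallel.

```python
import math


def knn_duration(history, features, k=3):
    """k-NN regression: mean duration of the k most similar past runs.

    history: list of (feature_vector, duration_seconds) pairs, e.g. rows
    from a task-history database; the choice of features is up to you.
    """
    nearest = sorted(history, key=lambda row: math.dist(row[0], features))[:k]
    return sum(duration for _, duration in nearest) / len(nearest)


def finish_times(durations, deps, now=0.0):
    """Predicted finish time per task: its duration after its slowest dep."""
    memo = {}

    def finish(task):
        if task not in memo:
            start = max((finish(d) for d in deps.get(task, ())), default=now)
            memo[task] = start + durations[task]
        return memo[task]

    return {task: finish(task) for task in durations}


# Hypothetical history: one feature (say, input size in GB) -> seconds.
history = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
estimate = knn_duration(history, (2.0,), k=3)
print(estimate)  # 20.0

eta = finish_times({"A": 10.0, "B": 20.0, "C": estimate}, {"C": ["A", "B"]})
print(eta)  # {'A': 10.0, 'B': 20.0, 'C': 40.0}
```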

More features in the central planner

Kill a task, re-launch a task, launch a new task

24
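The three commands can be framed as state transitions that the central planner validates before applying. This is a hypothetical sketch; the state names and which transitions are allowed are assumptions, not Luigi's task statuses.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"  # would be set by a worker picking the task up
    DONE = "done"
    FAILED = "failed"
    KILLED = "killed"


# command -> (states it may be issued from, resulting state); hypothetical.
COMMANDS = {
    "launch": ({None}, State.PENDING),  # None = task not known yet
    "kill": ({State.PENDING, State.RUNNING}, State.KILLED),
    "relaunch": ({State.FAILED, State.KILLED}, State.PENDING),
}


class Planner:
    def __init__(self):
        self.tasks = {}  # task id -> State

    def command(self, name, task_id):
        allowed_from, new_state = COMMANDS[name]
        current = self.tasks.get(task_id)
        if current not in allowed_from:
            raise ValueError(f"cannot {name} {task_id!r} while {current}")
        self.tasks[task_id] = new_state
        return new_state


planner = Planner()
states = [planner.command("launch", "A"),
          planner.command("kill", "A"),
          planner.command("relaunch", "A")]
```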


Support for other languages

Luigi is written in Python, but the RPC is language-agnostic.

25
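Because the workers talk to the scheduler over HTTP, any language that can send a request and encode JSON could act as a client. The sketch below encodes and decodes one such call. The `/api/<method>` path and the `data` field are illustrative assumptions rather than a specification of Luigi's wire format.

```python
import json
from urllib.parse import parse_qs, urlencode


def encode_call(method, **kwargs):
    """Client side: build the path and form body for one scheduler call."""
    path = f"/api/{method}"  # illustrative endpoint layout (assumption)
    body = urlencode({"data": json.dumps(kwargs, sort_keys=True)})
    return path, body


def decode_call(path, body):
    """Server side: recover the method name and JSON arguments."""
    method = path.rsplit("/", 1)[-1]
    payload = json.loads(parse_qs(body)["data"][0])
    return method, payload


path, body = encode_call("add_task", task_id="MyTask(foo=xyz)", worker="w1")
decoded = decode_call(path, body)
print(decoded)  # ('add_task', {'task_id': 'MyTask(foo=xyz)', 'worker': 'w1'})
```

Only standard JSON and URL form encoding are involved, which is why a client written in another language could produce the same body.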

Happy plumbing!

26

Questions?

27
