Building TaxBrain: Numba-Enabled Financial Computing on the Web

T.J. Alumbaugh, July 25, 2015. © 2015 Continuum Analytics

Upload: talumbau

Post on 16-Aug-2015


TRANSCRIPT

Page 1: Building TaxBrain: Numba-enabled Financial Computing on the Web

Building TaxBrain: Numba-Enabled Financial Computing on the Web

T.J. Alumbaugh

July 25, 2015

© 2015 Continuum Analytics-

Page 2: Building TaxBrain: Numba-enabled Financial Computing on the Web


Agenda

1. Background

2. Tax Calculator

3. Numba to the rescue

4. Webapp demo

5. Deployment

6. Lessons Learned

7. Future Work

8. Acknowledgements

Page 3: Building TaxBrain: Numba-enabled Financial Computing on the Web

BACKGROUND

Page 4: Building TaxBrain: Numba-enabled Financial Computing on the Web

Open Source Policy Center

A community with the goal of making policy analysis more trustworthy, accessible, and innovative by harnessing open-source methods to build cutting-edge economic models.

Page 5: Building TaxBrain: Numba-enabled Financial Computing on the Web

Open Source Policy Center - Motivation

• Computational economic models massively influence which policy ideas become law.

• What happens to the government’s budget, who will gain/lose, how will people’s behavior change, what happens to the economy?

• Existing models are proprietary software and most policymakers and the public don’t have access.

- Limited access inhibits creative policy solutions or wide participation in policy debates -> Bad for democracy

- Limited transparency inhibits external review and stifles innovation -> Bad for economy

Page 6: Building TaxBrain: Numba-enabled Financial Computing on the Web

TAX CALCULATOR

Page 7: Building TaxBrain: Numba-enabled Financial Computing on the Web

Tax-Calculator Python package: taxcalc

• Implementation of the Federal income tax code for 2013 – 2024ish.

• Performs a “microsimulation” calculation

– What is the effect on revenue if we raise/lower the capital gains tax? Raise/lower the maximum taxable income for SS? Raise/lower the highest income tax rate? Increase/decrease the Earned Income Tax Credit? Do all at the same time?

Page 8: Building TaxBrain: Numba-enabled Financial Computing on the Web

taxcalc computes revenue estimates

[Diagram: Sample tax returns + Tax code parameters -> taxcalc -> Revenue projection]

Page 9: Building TaxBrain: Numba-enabled Financial Computing on the Web

Sample Tax Returns: the Public Use File

• Public Use File (PUF) is a licensed dataset made available by the Statistics of Income (SOI) branch of the IRS (your tax dollars at work!)

• ~150,000 sample tax returns, with privacy-enhancing modifications, weighted to be statistically similar to the ~120,000,000 tax returns filed every year
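How a weighted sample stands in for the full population can be sketched in a few lines. With ~150,000 records representing ~120,000,000 returns, the average weight works out to roughly 800 real returns per sample record. The records, field names (`tax`, `weight`), and numbers below are invented for illustration and are not PUF data; the only idea taken from the slide is that each sample return carries a weight.

```python
# Toy weighted sample: each record represents `weight` real tax returns.
# All values are invented for illustration; this is not PUF data.
sample_returns = [
    {"tax": 1_000.0, "weight": 900.0},
    {"tax": 5_000.0, "weight": 700.0},
    {"tax": 20_000.0, "weight": 400.0},
]


def weighted_total(records, field):
    """Estimate a population total as sum(value * weight) over the sample."""
    return sum(r[field] * r["weight"] for r in records)


# Number of real returns the sample stands for, and the revenue estimate.
represented_returns = sum(r["weight"] for r in sample_returns)
estimated_revenue = weighted_total(sample_returns, "tax")
print(represented_returns, estimated_revenue)
```

Any aggregate reported for the population (revenue, counts of filers, and so on) would be computed this way, with the weights doing the scaling.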

Page 10: Building TaxBrain: Numba-enabled Financial Computing on the Web

Your policy reform in action!

[Diagram: PUF + Default Tax code parameters -> taxcalc -> Status quo Revenue projection; PUF + User-specified Tax code parameters -> taxcalc -> Revenue projection for user-defined policy. The difference between the two projections is Δ.]

Δ = your policy effect!

Page 11: Building TaxBrain: Numba-enabled Financial Computing on the Web

Your policy reform in action!

[Same diagram as the previous slide: the status quo projection and the projection for the user-defined policy differ by Δ.]

Δ = your policy effect!

(Actually, there is Δ1, …, Δ10 because we do this for 10 budget years)
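The status quo versus reform comparison can be sketched as two runs of the same calculation with different parameters, repeated once per budget year. Everything in this sketch is hypothetical: `toy_revenue` is a flat-tax placeholder rather than taxcalc logic, and the incomes, weights, rates, and growth factor are made up.

```python
def toy_revenue(incomes, weights, rate, growth, year):
    """Hypothetical flat-tax revenue for one budget year (not taxcalc logic)."""
    factor = (1.0 + growth) ** year  # crude income growth over time
    return sum(rate * income * factor * w for income, w in zip(incomes, weights))


# Invented sample records and policy parameters.
incomes = [30_000.0, 80_000.0, 250_000.0]
weights = [1_000.0, 500.0, 100.0]
baseline_rate, reform_rate = 0.20, 0.22

# One delta per budget year: reform projection minus status quo projection.
deltas = []
for year in range(10):
    status_quo = toy_revenue(incomes, weights, baseline_rate, 0.03, year)
    reform = toy_revenue(incomes, weights, reform_rate, 0.03, year)
    deltas.append(reform - status_quo)

print(len(deltas))
```

Because each year's delta depends only on the sample and the two parameter sets, the ten calculations are independent of one another.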

Page 12: Building TaxBrain: Numba-enabled Financial Computing on the Web

NUMBA TO THE RESCUE

Page 13: Building TaxBrain: Numba-enabled Financial Computing on the Web

Numba helps humans read fast Python code

Turning this…

Page 14: Building TaxBrain: Numba-enabled Financial Computing on the Web

Numba helps humans read fast Python code

Into this…

Page 15: Building TaxBrain: Numba-enabled Financial Computing on the Web

But wait – there’s more!

<img>numba_logo.png</img>

Page 16: Building TaxBrain: Numba-enabled Financial Computing on the Web

Take advantage of common patterns

Nearly every function in taxcalc operates on columns of a DataFrame. There are ~150 different columns; most functions take 10-30 arguments and return 5-15 values. That’s a lot of typing.

Columns A, B, C -> for i in range(x): … expressions with A[i], B[i], C[i], etc. All the tax logic goes here!
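The loop-over-columns pattern looks roughly like the sketch below when written by hand for Numba. The function name, its arguments, and the tax rule are toy assumptions, not taxcalc code; `njit` is Numba's real nopython decorator, and the fallback exists only so the example still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba.
    def njit(func):
        return func


@njit
def toy_tax(income, deduction, rate, out):
    """Explicit per-record loop over column arrays; toy logic, not tax law."""
    for i in range(income.shape[0]):
        taxable = income[i] - deduction[i]
        if taxable < 0.0:
            taxable = 0.0
        out[i] = rate * taxable
    return out


# Each array plays the role of one DataFrame column.
income = np.array([10_000.0, 50_000.0, 120_000.0])
deduction = np.array([12_000.0, 12_000.0, 24_000.0])
tax = toy_tax(income, deduction, 0.25, np.empty_like(income))
print(tax)
```

With ~150 columns and functions taking 10-30 arguments, writing the indexing and the output arrays by hand for every function is the boilerplate the slide refers to.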

Page 17: Building TaxBrain: Numba-enabled Financial Computing on the Web

Custom decorator: @iterate_jit

• We handle the boilerplate by making custom wrapping functions at import time (and jitting the result)

• Caller calls function like this:

SSBenefits(params, records)

Page 18: Building TaxBrain: Numba-enabled Financial Computing on the Web

Custom decorator: @iterate_jit

Function definition looks like this:

Page 19: Building TaxBrain: Numba-enabled Financial Computing on the Web

Custom decorator: @iterate_jit

• Creates/Applies a wrapper ‘for’ loop function to the given function

• Jits that resulting function

• Shuffles the right arguments in and out of the DataFrames

• ~ SAS-like programming interface to leverage experience of tax modeling community

Act now while supplies last!!
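The idea behind such a decorator can be illustrated with a much simplified sketch. This is not the taxcalc implementation: `iterate_sketch`, its `outputs` argument, and `toy_benefits` are hypothetical names, plain dicts of lists stand in for DataFrames, and the JIT step is left out entirely so that only the wrapping and argument shuffling are shown.

```python
import inspect


def iterate_sketch(outputs):
    """Hypothetical, simplified stand-in for @iterate_jit (no JIT step)."""

    def decorator(func):
        # Argument names of the scalar function decide what gets looked up.
        arg_names = list(inspect.signature(func).parameters)

        def wrapper(params, records):
            n = len(next(iter(records.values())))
            results = {name: [None] * n for name in outputs}
            for i in range(n):
                # Scalars come from params, per-record values from records.
                args = [
                    params[name] if name in params else records[name][i]
                    for name in arg_names
                ]
                values = func(*args)
                if len(outputs) == 1:
                    values = (values,)
                for name, value in zip(outputs, values):
                    results[name][i] = value
            # Write the computed columns back into the records.
            records.update(results)
            return records

        return wrapper

    return decorator


@iterate_sketch(outputs=["taxable", "tax"])
def toy_benefits(income, threshold, rate):
    """Per-record logic written in scalar style (toy rules, not tax law)."""
    taxable = max(0.0, income - threshold)
    return taxable, rate * taxable


params = {"threshold": 20.0, "rate": 0.5}
records = {"income": [10.0, 50.0]}
toy_benefits(params, records)
print(records)
```

The author of `toy_benefits` writes only scalar logic, while the caller passes two containers, which matches the `SSBenefits(params, records)` calling style shown earlier.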

Page 20: Building TaxBrain: Numba-enabled Financial Computing on the Web

How can we get non-coders to use taxcalc and do their own policy microsimulation?

Page 21: Building TaxBrain: Numba-enabled Financial Computing on the Web


www.ospc.org

Page 22: Building TaxBrain: Numba-enabled Financial Computing on the Web

TAXBRAIN DEMO

Page 23: Building TaxBrain: Numba-enabled Financial Computing on the Web

The TaxBrain architecture

• Django

• Celery: One budget year “delta” is an asynchronously executed ‘task’

• Redis for message brokering

Page 24: Building TaxBrain: Numba-enabled Financial Computing on the Web

Option 1: Use Heroku for everything

- Add-ons: RedisGreen for redis

- Additional dynos for computational work

[Diagram: Web dyno (Gunicorn serving Django app), Redis, Worker dyno (Celery worker node)]

Page 25: Building TaxBrain: Numba-enabled Financial Computing on the Web

Option 1: Use Heroku for everything

Result: we found that only PX dynos can handle heavy workloads with a high memory watermark. They are also expensive and didn’t perform up to our expectations.

Page 26: Building TaxBrain: Numba-enabled Financial Computing on the Web

Option 2: Heroku + AWS

• Heroku for web, AWS for workers

[Diagram: Gunicorn serving Django app sends one request per budget year (year 0 … year n) over HTTP to a pool of AWS nodes]

Split the calculation over budget years and recombine when work is done
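The split-and-recombine step can be sketched with the standard library alone. Here a local thread pool stands in for the AWS worker nodes, and `one_year_delta` is a hypothetical placeholder for one budget year's calculation; the actual deployment sends HTTP requests to remote workers instead.

```python
from concurrent.futures import ThreadPoolExecutor


def one_year_delta(year):
    """Hypothetical stand-in for one worker's job: one budget year's delta."""
    return 1_000.0 * (1.03 ** year)


def run_reform(num_years=10, max_workers=4):
    # One independent task per budget year; map() yields results in year order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one_year_delta, range(num_years)))


deltas = run_reform()
print(len(deltas))
```

Since no year depends on another, adding workers shortens the wall-clock time until the slowest year finishes, and a failed year can be retried on its own.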

Page 27: Building TaxBrain: Numba-enabled Financial Computing on the Web

AWS Worker nodes

• Flask + Redis + Celery + taxcalc

• State-less API to do one year’s budget calculation

– Flask endpoints for:

• start work + get ticket: POST “START_JOB”

• Is this ticket done yet? GET “QUERY_RESULT”

• Provide the answer for this ticket: GET “GET_RESULT”

• Deployed with salt, services running with systemd

• Cheap enough to have surplus workers so we can provide graceful degradation of service

• TIP: For numba-ized work, use a threaded worker pool for celery, not the default pool of worker processes

celery -A webapp.apps.taxbrain.tasks worker -P eventlet -l info
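The three-step ticket flow can be modelled in-process without any web framework. The class and method names below are hypothetical, and a thread pool stands in for the Celery queue; the real workers expose these steps as Flask endpoints.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class TicketedWorker:
    """Hypothetical model of the START_JOB / QUERY_RESULT / GET_RESULT flow."""

    def __init__(self, compute, max_workers=2):
        self._compute = compute
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def start_job(self, payload):
        """Like POST START_JOB: queue the work and hand back a ticket."""
        ticket = uuid.uuid4().hex
        self._jobs[ticket] = self._pool.submit(self._compute, payload)
        return ticket

    def query_result(self, ticket):
        """Like GET QUERY_RESULT: is this ticket done yet?"""
        return self._jobs[ticket].done()

    def get_result(self, ticket):
        """Like GET GET_RESULT: return the answer (waits if still running)."""
        return self._jobs.pop(ticket).result()


# The compute function is a placeholder for one year's budget calculation.
worker = TicketedWorker(lambda payload: payload["year"] * 2)
ticket = worker.start_job({"year": 3})
answer = worker.get_result(ticket)
print(answer)
```

The caller holds only a ticket, so the web app can poll several workers at once and collect each year's answer as it becomes available.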

Page 28: Building TaxBrain: Numba-enabled Financial Computing on the Web

All of the code that runs ospc.org is now public!

• js, css, Django templates

• Distributed task execution

Page 29: Building TaxBrain: Numba-enabled Financial Computing on the Web

All of the code that runs ospc.org is now public!

http://www.github.com/OpenSourcePolicyCenter/webapp_public

Page 30: Building TaxBrain: Numba-enabled Financial Computing on the Web

DEPLOYMENT

Page 31: Building TaxBrain: Numba-enabled Financial Computing on the Web

We use Heroku for deployment

• It’s hard to beat:

git push heroku master

• TIP: Custom Heroku buildpacks that use conda: https://github.com/kennethreitz/conda-buildpack/

Page 32: Building TaxBrain: Numba-enabled Financial Computing on the Web

Deploying TaxBrain

• taxcalc changes rapidly

• Policy analysts want anyone to be able to reproduce their results

• Whatever you produce on TaxBrain should be easily reproducible on a local machine on any platform

Page 33: Building TaxBrain: Numba-enabled Financial Computing on the Web

Deploying TaxBrain

1. git tag + git archive -> updates package __version__ through versioneer

2. conda build and upload packages to anaconda.org

3. Deploy with Heroku (git push heroku master) (latest version automatically used at deployment with conda install)

conda install -c ospc taxcalc

Page 34: Building TaxBrain: Numba-enabled Financial Computing on the Web

LESSONS LEARNED

Page 35: Building TaxBrain: Numba-enabled Financial Computing on the Web

Lessons Learned

• Heroku for ‘compute’ is expensive for memory/compute intensive applications

• Git tag + versioneer + conda build + anaconda.org = transparent cross-platform deployment, reproducible results and public history of changes

Page 36: Building TaxBrain: Numba-enabled Financial Computing on the Web

Lessons Learned

• Formulate your work as state-less operations

– BAD (form Ax=b, apply pre-conditioner, solve with GMRES, return x, use x)

– GOOD (partition problem into N smaller Ax=b problems, give those to a pool of workers, assemble answer after all work is done)

– This may not be the least amount of computational work
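The GOOD pattern can be sketched for the simplest case, where the large system is assumed to decouple into independent blocks. The two 2x2 systems below are invented for illustration, and a local thread pool stands in for the pool of workers.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def solve_block(block):
    """Stateless unit of work: solve one small, independent system."""
    a, b = block
    return np.linalg.solve(a, b)


# Two independent 2x2 systems (assumed decoupled for this illustration).
blocks = [
    (np.array([[2.0, 0.0], [0.0, 4.0]]), np.array([2.0, 8.0])),
    (np.array([[1.0, 1.0], [1.0, -1.0]]), np.array([3.0, 1.0])),
]

# Hand the blocks to the pool, then assemble the answer once all are done.
with ThreadPoolExecutor(max_workers=2) as pool:
    pieces = list(pool.map(solve_block, blocks))

x = np.concatenate(pieces)
print(x)
```

Each call to `solve_block` needs nothing beyond its own input, so any worker can take any block and a lost block can simply be resubmitted.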

Page 37: Building TaxBrain: Numba-enabled Financial Computing on the Web

FUTURE WORK/ACKNOWLEDGEMENTS

Page 38: Building TaxBrain: Numba-enabled Financial Computing on the Web

Future Work

• Dynamic scoring macroeconomic model

• Healthcare models, Social Security, etc.

• Visualization of TaxBrain results (embedded Bokeh plots, D3, etc.)

• Lots of improvements to OSPC.org & taxcalc. Open issues on GitHub!

You!

Page 39: Building TaxBrain: Numba-enabled Financial Computing on the Web

Acknowledgements

• Matt Jensen, Managing Director OSPC

• Zach Risher, Web Dev

• Fellow Continuum developers:

– Jake Lyons, Theo Lekkas, Andrew Farrell, Kevin Colton

Page 40: Building TaxBrain: Numba-enabled Financial Computing on the Web

THANKS FOR LISTENING!

git clone http://www.github.com/OpenSourcePolicyCenter/Tax-Calculator

git clone http://www.github.com/OpenSourcePolicyCenter/webapp_public

Email: [email protected]

@talumbau