Post on 16-Aug-2015

Building TaxBrain: Numba-Enabled Financial Computing on the Web

T.J. Alumbaugh

July 25, 2015

© 2015 Continuum Analytics-

Agenda

1. Background

2. Tax Calculator

3. Numba to the rescue

4. Webapp demo

5. Deployment

6. Lessons Learned

7. Future Work

8. Acknowledgements

BACKGROUND

Open Source Policy Center

A community with the goal of making policy analysis more trustworthy, accessible, and innovative by harnessing open-source methods to build cutting edge economic models.

Open Source Policy Center - Motivation

• Computational economic models massively influence which policy ideas become law.

• What happens to the government’s budget, who will gain/lose, how will people’s behavior change, what happens to the economy?

• Existing models are proprietary software and most policymakers and the public don’t have access.

- Limited access inhibits creative policy solutions or wide participation in policy debates -> Bad for democracy

- Limited transparency inhibits external review and stifles innovation -> Bad for the economy

TAX CALCULATOR

Tax-Calculator Python package: taxcalc

• Implementation of the Federal income tax code for 2013 – 2024ish.

• Performs a “microsimulation” calculation

– What is the effect on revenue if we raise/lower the capital gains tax? Raise/lower the maximum taxable income for SS? Raise/lower the highest income tax rate? Increase/decrease the Earned Income Tax Credit? Do all at the same time?

taxcalc computes revenue estimates

Sample tax returns + Tax code parameters -> taxcalc -> Revenue projection

Sample Tax Returns: the Public Use File

• Public Use File (PUF) is a licensed dataset made available by the Statistics of Income (SOI) branch of the IRS (your tax dollars at work!)

• ~150,000 sample tax returns, with privacy-enhancing modifications, weighted to be statistically similar to the ~120,000,000 tax returns filed every year
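
To make the weighting idea concrete, here is a minimal sketch with made-up numbers (not PUF data). Each sample return carries a weight saying how many real returns it stands for, so a population total is a weighted sum. The field names `tax` and `weight` are assumptions for illustration only.

```python
# Toy sample: three returns standing in for a population of 1,000 filers.
# All numbers and field names are invented for illustration.
sample = [
    {"tax": 1000.0, "weight": 500.0},
    {"tax": 5000.0, "weight": 400.0},
    {"tax": 40000.0, "weight": 100.0},
]


def weighted_total(returns, field):
    """Scale a per-return amount up to the population it represents."""
    return sum(r[field] * r["weight"] for r in returns)


population = sum(r["weight"] for r in sample)  # number of filers represented
total_tax = weighted_total(sample, "tax")      # population-level tax total
print(population, total_tax)
```

The same weighted sum works for any per-return quantity, which is why a small sample can approximate totals for the full filing population.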

Your policy reform in action!

PUF + Default tax code parameters -> taxcalc -> Status quo revenue projection

PUF + User-specified tax code parameters -> taxcalc -> Revenue projection for user-defined policy

Δ = your policy effect!

(Actually, there are Δ1, …, Δ10 because we do this for 10 budget years.)
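
The comparison above can be sketched in a few lines. This is only an illustration under stated assumptions: `revenue` is a hypothetical stand-in for taxcalc (a flat tax above an exemption, with incomes growing 2% a year), and the sample, rates, and parameter names are invented.

```python
# Hypothetical stand-in for taxcalc: flat tax on income above an exemption.
def revenue(sample, params, year_index):
    growth = 1.02 ** year_index  # assumed income growth, for illustration
    total = 0.0
    for income, weight in sample:
        taxable = max(income * growth - params["exemption"], 0.0)
        total += weight * params["rate"] * taxable
    return total


sample = [(20000.0, 600.0), (80000.0, 400.0)]  # (income, weight) pairs
status_quo = {"rate": 0.20, "exemption": 10000.0}
reform = {"rate": 0.22, "exemption": 10000.0}

# One delta per budget year: reform projection minus status quo projection
deltas = [
    revenue(sample, reform, year) - revenue(sample, status_quo, year)
    for year in range(10)
]
print(len(deltas))
```

Both runs use the same sample returns, so each delta isolates the effect of the parameter change for that year.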

NUMBA TO THE RESCUE

Numba helps humans read fast Python code

Turning this….

Into this…

But wait – there’s more!

Take advantage of common patterns

Nearly every function in taxcalc operates on columns of a DataFrame. There are ~150 different columns, and most functions take 10-30 arguments and return 5-15 values. That’s a lot of typing.

Columns A, B, C: for i in range(x): … expressions with A[i], B[i], C[i], etc. All the tax logic goes here!
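
A minimal sketch of that loop-over-columns pattern follows. The function `toy_tax`, its columns, and the 25% rate are hypothetical, not taken from taxcalc. The import is wrapped so the same loop falls back to plain Python lists when numpy or numba is not installed.

```python
try:
    import numpy as np
    from numba import njit
    make_column = np.array
except ImportError:
    # Assumption: without numpy/numba, run the same loop on plain lists
    make_column = list

    def njit(func):
        return func


@njit
def toy_tax(income, deduction, tax):
    # Explicit loop over records; the tax logic reads like ordinary Python
    for i in range(len(income)):
        taxable = income[i] - deduction[i]
        if taxable > 0.0:
            tax[i] = 0.25 * taxable
        else:
            tax[i] = 0.0


income = make_column([50000.0, 8000.0, 120000.0])
deduction = make_column([10000.0, 10000.0, 20000.0])
tax = make_column([0.0, 0.0, 0.0])
toy_tax(income, deduction, tax)  # fills the output column in place
print([float(t) for t in tax])
```

The branchy per-record logic stays readable, and the jitted loop avoids the mask-and-slice style that vectorized array code would need.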

Custom decorator: @iterate_jit

• We handle the boilerplate by making custom wrapping functions at import time (and jitting the result)

• Caller calls function like this:

SSBenefits(params, records)

Function definition looks like this:

• Creates/Applies a wrapper ‘for’ loop function to the given function

• Jits that resulting function

• Shuffles the right arguments in and out of the DataFrames

• ~ SAS-like programming interface to leverage experience of tax modeling community

Act now while supplies last!!
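
The bullets above can be sketched as a small decorator. This is a simplified, hypothetical stand-in, not taxcalc's `@iterate_jit`: the names `iterate_sketch` and `ToyBenefits` and all parameters are invented, `records` is a plain dict of lists standing in for the DataFrame, and the loop stays in plain Python with only the per-record function jitted (the approach described above jits the generated loop itself).

```python
import inspect
from types import SimpleNamespace

try:
    from numba import njit
except ImportError:
    def njit(func):  # assumption: fall back to plain Python without numba
        return func


def iterate_sketch(*output_names):
    """Hypothetical, simplified stand-in for @iterate_jit."""
    def decorator(func):
        arg_names = list(inspect.signature(func).parameters)
        kernel = njit(func)  # jit the per-record logic

        def wrapper(params, records):
            n_rows = len(next(iter(records.values())))
            outputs = {name: [0.0] * n_rows for name in output_names}
            for i in range(n_rows):
                # A column name gives this row's value; anything else
                # is looked up as a policy parameter
                args = [records[n][i] if n in records else getattr(params, n)
                        for n in arg_names]
                for name, value in zip(output_names, kernel(*args)):
                    outputs[name][i] = value
            records.update(outputs)  # shuffle results back into the table
            return records
        return wrapper
    return decorator


@iterate_sketch("taxable_benefits")
def ToyBenefits(benefits, other_income, threshold, inclusion_rate):
    if other_income > threshold:
        taxable_benefits = inclusion_rate * benefits
    else:
        taxable_benefits = 0.0
    return (taxable_benefits,)


params = SimpleNamespace(threshold=25000.0, inclusion_rate=0.5)
records = {"benefits": [12000.0, 18000.0],
           "other_income": [10000.0, 40000.0]}
ToyBenefits(params, records)  # same call shape as SSBenefits(params, records)
print(records["taxable_benefits"])
```

The function body only mentions scalar names, and the wrapper matches those names to columns or parameters, which is what removes the boilerplate from each tax function.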

How can we get non-coders to use taxcalc and do their own policy microsimulation?

www.ospc.org

TAXBRAIN DEMO

The TaxBrain architecture

• Django

• Celery: One budget year “delta” is an asynchronously executed ‘task’

• Redis for message brokering

Option 1: Use Heroku for everything

- Add-ons: RedisGreen for Redis

- Additional dynos for computational work

Web dyno: Gunicorn serving Django app

Redis

Worker dyno: Celery worker node

Result: we found that only PX dynos can handle heavy workloads with a high memory watermark. They are also expensive and didn’t perform up to our expectations.

Option 2: Heroku + AWS

• Heroku for web, AWS for workers

Gunicorn serving Django app -> (HTTP) -> AWS nodes (year 0 … year n)

Split the calculation over budget years and recombine when work is done
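
A rough sketch of the split-and-recombine step follows. It uses a local thread pool as a stand-in for the AWS worker nodes; the real app sends the work over HTTP to Celery-backed workers. `one_year_delta` is a hypothetical placeholder for one budget year's calculation, and its numbers are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def one_year_delta(year_index):
    """Hypothetical placeholder for one budget year's calculation."""
    return {"year": year_index, "delta": 1000.0 * (year_index + 1)}


def run_all_years(n_years, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each year is independent, so the calls can run in any order
        results = list(pool.map(one_year_delta, range(n_years)))
    # Recombine: index the per-year answers by budget year
    return {r["year"]: r["delta"] for r in results}


combined = run_all_years(10)
print(sorted(combined))
```

Because no year depends on another, adding workers shortens the wall-clock time without changing the answer.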

AWS Worker nodes

• Flask + Redis + Celery + taxcalc

• State-less API to do one year’s budget calculation

– Flask endpoints for:

• Start work + get ticket: POST “START_JOB”

• Is this ticket done yet? GET “QUERY_RESULT”

• Provide the answer for this ticket: GET “GET_RESULT”

• Deployed with salt, services running with systemd

• Cheap enough to have surplus workers so we can provide graceful degradation of service

• TIP: For numba-ized work, use a threaded worker pool for celery, not the default pool of worker processes

celery -A webapp.apps.taxbrain.tasks worker -P eventlet -l info
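
The three-endpoint ticket flow can be sketched without any web framework. This is an in-memory stand-in under stated assumptions: the class `TicketService` and its method names are hypothetical, a thread pool replaces Celery and Redis, and plain method calls replace the Flask routes.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class TicketService:
    """In-memory stand-in for the three endpoints (no Flask, Redis, Celery)."""

    def __init__(self, work_function, max_workers=2):
        self._work = work_function
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def start_job(self, payload):
        """START_JOB: queue the work and hand back a ticket right away."""
        ticket = uuid.uuid4().hex
        self._jobs[ticket] = self._pool.submit(self._work, payload)
        return ticket

    def query_result(self, ticket):
        """QUERY_RESULT: is this ticket done yet?"""
        return self._jobs[ticket].done()

    def get_result(self, ticket):
        """GET_RESULT: return the answer (blocks if still running)."""
        return self._jobs.pop(ticket).result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


# Client side: start, poll, then fetch
service = TicketService(lambda payload: payload["year"] * 2)
ticket = service.start_job({"year": 3})
while not service.query_result(ticket):
    time.sleep(0.01)
answer = service.get_result(ticket)
service.shutdown()
print(answer)
```

Returning a ticket immediately keeps the web request short; the caller polls until the one-year calculation has finished.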

All of the code that runs ospc.org is now public!

• js, css, Django templates

• Distributed task execution

http://www.github.com/OpenSourcePolicyCenter/webapp_public

DEPLOYMENT

We use Heroku for deployment

• It’s hard to beat:

git push heroku master

• TIP: Custom Heroku buildpacks that use conda: https://github.com/kennethreitz/conda-buildpack/

Deploying TaxBrain

• taxcalc changes rapidly

• Policy analysts want anyone to be able to reproduce their results

• Whatever you produce on TaxBrain should be easily reproducible on a local machine on any platform

Deploying TaxBrain

1. git tag + git archive -> updates package __version__ through versioneer

2. conda build and upload packages to anaconda.org

3. Deploy with Heroku (git push heroku master) (latest version automatically used at deployment with conda install)

conda install -c ospc taxcalc

LESSONS LEARNED

Lessons Learned

• Heroku for ‘compute’ is expensive for memory/compute intensive applications

• Git tag + versioneer + conda build + anaconda.org = transparent cross-platform deployment, reproducible results and public history of changes

Lessons Learned

• Formulate your work as state-less operations

– BAD (form Ax=b, apply pre-conditioner, solve with GMRES, return x, use x)

– GOOD (partition problem into N smaller Ax=b problems, give those to a pool of workers, assemble answer after all work is done)

– This may not be the least amount of computational work

FUTURE WORK/ACKNOWLEDGEMENTS

Future Work

• Dynamic scoring macroeconomic model

• Healthcare models, Social Security, etc.

• Visualization of TaxBrain results (embedded Bokeh plots, D3, etc.)

• Lots of improvements to OSPC.org & taxcalc. Open issues on GitHub!

You!

Acknowledgements

• Matt Jensen, Managing Director OSPC

• Zach Risher, Web Dev

• Fellow Continuum developers:

– Jake Lyons, Theo Lekkas, Andrew Farrell, Kevin Colton

THANKS FOR LISTENING!

git clone http://www.github.com/OpenSourcePolicyCenter/Tax-Calculator

git clone http://www.github.com/OpenSourcePolicyCenter/webapp_public

Email: tj.alumbaugh@continuum.io

Twitter: @talumbau
