parallel processing in apache spark may 17, 2016 · pdf fileparallel processing in apache...

45
©2016 FI Consulting. All rights reserved. Rapid Monte Carlo Simulation for Forecasting, Stress Testing, and Scenario Analysis May 17, 2016 Parallel Processing in Apache Spark

Upload: hoangcong

Post on 06-Feb-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Rapid Monte Carlo Simulation for Forecasting,

Stress Testing, and Scenario Analysis

May 17, 2016

Parallel Processing in Apache Spark

Page 2: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Agenda

1 2 3 6

IntroductionsMonte Carlo

Examples

Monte Carlo Advice

7

Q&A

4

VasicekAnalysis

5

Apache Spark Implementation

Code Improvement

Page 3: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Introductions

FI Consulting (FI) specializes in providing financial

institution clients with custom analytics, model

validation and advisory services, and development

support for financial and analytic software. FI

focuses on diverse sectors of the credit market

including mortgages, consumer finance, and small

business loans.

Maxwell Consulting provides deep expertise in the

most complicated sectors of the financial world,

whether analyzing the valuation of a derivative

transaction or advising on the quality of a firm's

risk management. Maxwell serves diverse

audiences including executives, legal counsel,

traders, auditors, and jurors.

ROBERT CHANG, FRM

JOE PIMBLEY, FRM

Page 4: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Poll Questions

Page 5: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Monte Carlo Advice

1 2 3 6

Monte Carlo Advice

74 5

Page 6: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Monte Carlo Advice

More precision in results

Run more “experiments”

Permits greater “realism” in adding

features

Examples: Real-time risk

management; nested MC methods

Value of MC Simulation increases when calculations run faster

Page 7: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Monte Carlo Advice (cont.)

Hardware – e.g., high-performance

processors in parallel

Software – e.g., optimized code

Disadvantage of “off-the-shelf” MC

products

Almost always possible to get faster speed

Page 8: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Monte Carlo Examples

1 2 3 6

Monte Carlo Examples

74 5

Page 9: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Monte Carlo Example #1: Stochastic Models

Yield y with Standard Deviation s, with Random Variable ε N(0,1)

“Mean Reversion” SDE:

“Normal” SDE:

“Log-Normal” SDE:

Page 10: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Monte Carlo Example #2: Simulation for Collateral Defaults

Assets Liabilities Terms

Set Initial Conditions

Page 11: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Monte Carlo Example #2: Simulation for Collateral Defaults

Assets Liabilities Terms

Set Initial Conditions

Take a Step Forward in Time

Change yield curve and other market variables Check for asset defaults Record scheduled asset payments Make scheduled liability payments

Page 12: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Monte Carlo Example #2: Simulation for Collateral Defaults

Assets Liabilities Terms

Set Initial Conditions

Take a Step Forward in Time

Change yield curve and other market variables Check for asset defaults Record scheduled asset payments Make scheduled liability payments

Return to Step 2 if haven’t reached maturity

Record the Loss to Investors (if any)

Page 13: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Monte Carlo Example #2: Simulation for Collateral Defaults

Assets Liabilities Terms

Set Initial Conditions

Take a Step Forward in Time

Change yield curve and other market variables Check for asset defaults Record scheduled asset payments Make scheduled liability payments

Return to Step 2 if haven’t reached maturity

Record the Loss to Investors (if any)

Estimate Density Functions for Losses to Each Tranche

Page 14: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Monte Carlo Example #3: Simulation for Bank Asset Defaults

Banking Simulator at Bankmaxwell.azurewebsites.net

Page 15: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Monte Carlo Example #4: Vasicek Default Model

Consider Portfolio of “Infinite” Number of Loans

Each Loan has Identical Default Probability

Each Loan has Identical Correlation to all Other Loans

Vasicek: Analytical Solution for Portfolio Loss Distribution

Page 16: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Vasicek Analysis

1 2 3 6 74

VasicekAnalysis

5

Page 17: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Vasicek Analysis

Real Project: Analysis of Auto Loan Pool Losses

With Many Pools, MLE Estimation of PD & Correlation

Need a Test for the MLE Estimators

Build Monte Carlo Algorithm

Page 18: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Vasicek Analysis

Imagine M Loans in Each of the N Pools

Monte Carlo Simulation to get Default Fraction each Pool

Check MLE Estimators given “known” PD & Correlation

Also a Test of Vasicek and Numerical Methods

Pool 1 Pool 2 Pool 3 Pool N…

Page 19: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Vasicek Analysis

The Vasicek distribution describes the probability density function for the

fraction of defaulted loans within an infinitely diversified portfolio. Simple but

restrictive assumptions specify a single default probability (PD) common to each

loan and a single correlation parameter linking the behavior of all loans.

Variance test: the numerical integration and Monte Carlo simulation are two

viable methods to compute the variance of this Vasicek distribution - very

important for understanding the risk of the loan portfolio!

Pool Test: Create an arbitrary number of pools with an arbitrary number of

obligors per pool. Apply Monte Carlo simulation to determine fraction of defaults

in each pool. Then compare to the analytical Vasicek distribution.

Page 20: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Vasicek Analysis

Visual Studio .Net Web Application at

Webjoe.azurewebsites.net/Vasicek_Variance

Page 21: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Code Improvement

1 2 3 6 74 5

Code Improvement

Page 22: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Code Improvement

Page 23: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Code Improvement

With 2,000 Pools and 10,000

Obligors per Pool, the Inner

Loop Generates RVs for

Default Determination of 20

Million Loans

Page 24: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Code Improvement

Look Carefully at Each Line of

Code in the Inner Loop to

Reduce “Expensive”

Calculations – ex: redundant

Sqr evaluations

Page 25: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Code Improvement

Look Carefully at Each Line of

Code in the Inner Loop to

Reduce “Expensive”

Calculations – ex: Normal CDF

determinations

Page 26: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Code Improvement

Look Carefully at Each Line of

Code in the Inner Loop to

Reduce “Expensive”

Calculations – ex: calls to

costly Subroutines

Page 27: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Code Improvement

Making this Change Reduces

Execution Time from 120 s to

117 s – Small Saving.

Page 28: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Code Improvement

Change: Reduces

Execution Time from

117 to 28 – Big Saving!

Page 29: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Code Improvement

Page 30: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Code Improvement

Change: Reduces

Execution Time from

28 to 8.4 – Big Saving!

Page 31: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Code Improvement

Change: “Fixing” the Random Value Routine has Little Benefit

End Result is 15x Speed Improvement by Altering the Monte Carlo Inner Loop

Page 32: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Apache Spark Implementation

1 2 3 6 74 5

Apache Spark Implementation

Page 33: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Serial vs. Parallel Programming

A Serial Program consists of a sequence of instructions, where

each instruction executes one after the other.

In a Parallel Program, the processing is broken up into parts, each

of which could be executed concurrently on a different processor.

Page 34: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

What is MapReduce?

A map job, takes a set of data and converts it into another set of data,

where individual elements are broken down into key/value pairs.

A reduce job takes the output from a map as input and merges together

these values to form a possibly smaller set of values.

Page 35: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Spark

Spark is an open-source software solution that performs rapid calculations on

in-memory distributed datasets.

Spark’s uses Resilient Distributed Datasets (RDDs). RDDs can be

automatically recomputed on failure and are resilient and fault-tolerant.

Page 36: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Parallel Processing of Monte Carlo Samples

Page 37: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Parallel Processing of Monte Carlo Samples

Page 38: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Parallel Processing of Monte Carlo Samples

Page 39: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Additional Monte Carlo Draws to Vary LGD

Page 40: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Additional Monte Carlo Draws to Vary LGD

Page 41: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Running Monte Carlo in the Cloud

Page 42: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Running Monte Carlo in the Cloud (cont.)

Page 43: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Running Monte Carlo in the Cloud (cont.)

Page 44: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Wrap-Up and Q&A!

1 2 3 6 7

Q&A

4 5

Page 45: Parallel Processing in Apache Spark May 17, 2016 · PDF fileParallel Processing in Apache Spark ©2016 FI Consulting. All rights reserved. Agenda 1 2 3 6 Introductions Monte Carlo

©2016 FI Consulting. All rights reserved.

Wrap-Up and Q&A!

To learn more, please contact us!

Robert Chang, Model Validation Lead

[email protected]

FI Consulting at www.ficonsulting.com

Joe Pimbley, Principal

[email protected]

Maxwell Consulting at www.maxwell-consulting.com