#AnalyticsX Copyright © 2016, SAS Institute Inc. All rights reserved.
Analytics in Fair Lending and Regulatory Environments
Deanna Neal, First Vice-President, Corporate Compliance, SunTrust Bank
Jeff Morrison, First Vice-President, Corporate Compliance, SunTrust Bank
© 2016 SunTrust Banks, Inc. SunTrust is a federally registered trademark of SunTrust Banks, Inc.
NEW OPPORTUNITIES IN STATISTICS: NAVIGATING THE REGULATORY ENVIRONMENT IN FAIR BANKING
Traditional Statistics
SAS programming
Regression Analysis
Clustering
Text Mining
The views expressed in this presentation are those of the authors and do not represent the opinion of SunTrust Banks, Inc. or its subsidiaries.
NAVIGATING THE REGULATORY ENVIRONMENT IN FAIR BANKING
• The Fair Housing Act (FHA) was enacted in 1968. It prohibits discrimination by anyone involved in certain residential real estate transactions, including the following activities:
• Marketing
• Originating
• Pricing
• Underwriting
• Purchasing
• Selling
• Brokering
• Appraising
Fair and Responsible Banking
• The Fair Housing Act and U.S. Department of Housing and Urban Development (HUD) regulations prohibit discrimination based on the following customer characteristics:
• Race
• Color
• National Origin
• Religion
• Sex
• Familial Status
• Handicap or Disability, including parental or family leave
• Sexual orientation and gender identity are not explicitly covered, but housing discrimination against LGBTQ persons may be covered if it is based on nonconformity with gender stereotypes (discrimination on the basis of sex), fear of HIV/AIDS (discrimination on the basis of disability), or assumptions about marital or familial status.
• The Equal Credit Opportunity Act (ECOA) was enacted in 1974 to prohibit creditors from discriminating in any credit transaction, such as:
• Mortgages
• Consumer loans (automobile, home equity lines/loans, credit cards)
• Small business loans
• Overdraft credit, including courtesy overdrafts
• Loan modifications
• ECOA prohibits discrimination on the basis of:
• Race
• Color
• National Origin
• Religion
• Sex, including maternity or family leave
• Marital Status
• Age (provided the applicant can legally contract)
• Receipt of income from public assistance
• Exercise of rights under the Consumer Credit Protection Act
• Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010
• Among other effects, the Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010 (Dodd-Frank Act) expanded the prohibition on unfair and deceptive acts and practices (UDAP) to include abusive acts and practices (UDAAP). The CFPB enforces UDAAP, while the prudential regulators and the FTC enforce UDAP.
• Unfair, deceptive, or abusive acts and practices (UDAAPs) can cause significant financial injury to consumers, erode consumer confidence, and undermine the financial marketplace.
• Under the Dodd-Frank Act, it is unlawful for any provider of consumer financial products or services or a service provider to engage in any unfair, deceptive, or abusive acts or practices.
• Consumer complaints play an important role in regulatory reviews and detection of unfair, deceptive, or abusive practices, as do high volumes of charge-backs or refunds for products or services.
PROXY FOR RACE / ETHNICITY FOR NON-MORTGAGE PRODUCTS
• Official CFPB document: http://files.consumerfinance.gov/f/201409_cfpb_report_proxy-methodology.pdf
• This methodology uses Bayes' rule to combine (weight) last-name frequencies with census tract and block group demographic information into probabilities of an individual being a member of each of six racial or ethnic categories.
• The CFPB provides the 2010 Census tables and its implementation code in the public domain.
• Requires the data to be geocoded using standard address formats, along with the last names of the primary borrower and co-borrower.
Bayesian Improved Surname Geocoding (BISG), as implemented by the CFPB
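The Bayesian step behind BISG can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CFPB's code: it assumes you already have, for one applicant, P(race | surname) from a surname list and P(geography | race), the share of each group's national population living in the applicant's census area. The posterior is proportional to the product of the two, renormalized over the six categories. The function `bisg_posterior`, the category keys, and all numbers are hypothetical.

```python
# Hypothetical category keys mirroring the deck's pr_* naming.
CATEGORIES = ["white", "black", "hispanic", "api", "aian", "mult_other"]

def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography information with Bayes' rule.

    Assumes surname and geography are conditionally independent given
    race/ethnicity, which is the core BISG assumption.
    """
    unnormalized = {
        r: p_race_given_surname[r] * p_geo_given_race[r] for r in CATEGORIES
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("surname and geography give zero joint support")
    # Renormalize so the six probabilities sum to 1.
    return {r: v / total for r, v in unnormalized.items()}

# Illustrative (made-up) inputs for one applicant.
surname = {"white": 0.50, "black": 0.30, "hispanic": 0.10,
           "api": 0.05, "aian": 0.02, "mult_other": 0.03}
# Made-up share of each group's national population living in this area.
geo = {"white": 0.0001, "black": 0.0004, "hispanic": 0.0001,
       "api": 0.0001, "aian": 0.0001, "mult_other": 0.0002}

posterior = bisg_posterior(surname, geo)
print({r: round(p, 3) for r, p in posterior.items()})
```

Here the geography term shifts probability toward the group that is over-represented in the applicant's area, even though the surname alone favored a different group.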
EXAMPLE OF GEOCODING / BISG IMPUTATION PROCESS
Data needed: Name and Address Information for Geocoding and BISG Imputation
Data returned: BISG Imputation and Gender
Note: Information is fictitious and for illustrative purposes only
FAIR LENDING STATISTICAL REVIEW OF UNDERWRITING, PRICING
Purpose
• Check for potential disparate impact on minorities due to underwriting or pricing decisions
Methodology
• Outlier Review, Random Sample, or Matched Pairs
• Logistic Regression, Linear Regression
• Means Test
• For underwriting, dependent variable = Probability of Decline
• For pricing, dependent variable = APR or discretionary component of pricing
• Implement BISG (Bayesian Improved Surname Geocoding)
Statistical Results
• Difference in Average Decline Rate or Average Price between White applicants and minority or other protected-class applicants
• Results from Regression Analysis
FAIR LENDING STATISTICAL REVIEW OF UNDERWRITING POLICIES
• Create a Decline Event
• assign each applicant a value of 1 if Initial Credit Decision = “Decline”
• assign a value of 0 if “Approve” or “Incomplete”; otherwise, remove the application from the sample
• Calculate BISG probabilities for Whites, Blacks, Hispanics, Asians/Pacific Islanders, American Indians/Alaska Natives, and Multi-race using geocoding, surname lists, and the CFPB computer program (pr_black, pr_hispanic, etc., with $\sum_{i=1}^{6} \mathrm{pr}_i = 1$)
• Assign gender using name lists or geocoding software (WIZ, Social Security or Census name lists, etc.)
• Apply statistical methods to the decline event, using underwriting control variables (custom score, FICO, etc.) along with the BISG data. Control variables should include a quantitative mapping of credit policy (FICO minimums, etc.).
Methodology
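The decline-event coding above can be expressed as a small function. This is a sketch assuming decision labels arrive as strings; the field names (`id`, `decision`, `decline`) and the “Withdrawn” status are hypothetical.

```python
def decline_event(initial_decision):
    """Return 1 for a decline, 0 for an approval or incomplete, None to drop."""
    decision = initial_decision.strip().lower()
    if decision == "decline":
        return 1
    if decision in ("approve", "incomplete"):
        return 0
    return None  # any other status is removed from the sample

# Hypothetical applications, including one status that should be dropped.
applications = [
    {"id": 1, "decision": "Decline"},
    {"id": 2, "decision": "Approve"},
    {"id": 3, "decision": "Incomplete"},
    {"id": 4, "decision": "Withdrawn"},
]

sample = []
for app in applications:
    event = decline_event(app["decision"])
    if event is not None:
        sample.append({**app, "decline": event})

print(sample)
```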
FAIR LENDING STATISTICAL REVIEW OF UNDERWRITING, PRICING
• Decline Event for Underwriting Decision
• Declines = ~20k
• Approvals = ~50k
• Simple Means Test – Significant Differences?
• Use BISG to assign membership in demographic group
• Decline Means for Blacks (0.9), Whites (0.6)
• Decline Means for Hispanics (0.8), Whites (0.6)
• Decline Means for Asians (0.7), Whites (0.6)
• Regression Results – Potential Issues
• Without Controls (BISG only)
• With Controls (Custom Score, FICO minimums, Joint Status, Gender, Overrides, DTI, etc.)
Findings / Results
Illustrative Example
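One way to turn BISG probabilities into group decline means is to weight each applicant's outcome by their probability of group membership, instead of assigning each applicant to a single group. The deck does not say which approach was used, so treat this as an assumption; `weighted_decline_rate` and the data below are hypothetical.

```python
def weighted_decline_rate(rows, group):
    """Probability-weighted decline rate for one BISG group.

    Each applicant contributes to the group in proportion to pr_<group>.
    """
    key = f"pr_{group}"
    weight = sum(r[key] for r in rows)
    if weight == 0:
        raise ValueError(f"no probability mass for group {group!r}")
    return sum(r[key] * r["decline"] for r in rows) / weight

# Made-up applicants: BISG probabilities (two groups shown) and decline events.
rows = [
    {"pr_white": 0.9, "pr_black": 0.1, "decline": 0},
    {"pr_white": 0.2, "pr_black": 0.8, "decline": 1},
    {"pr_white": 0.6, "pr_black": 0.4, "decline": 1},
    {"pr_white": 0.8, "pr_black": 0.2, "decline": 0},
]

gap = weighted_decline_rate(rows, "black") - weighted_decline_rate(rows, "white")
print(f"difference in weighted decline rates: {gap:.2f}")
```

A difference in means like this only flags a potential issue; the regressions that follow test whether it persists once credit variables are controlled for.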
EXAMPLE OF POTENTIAL DISPARITY: BISG ONLY (OMIT BISG FOR WHITES)
Note: These results are fictitious and for illustrative purposes only
Logistic Regression: BISG Only
Parameter       Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept       0.877      0.0128           755.6719          <.0001
pr_black        4.221      0.045            3357.222          <.0001
pr_hispanic     3.988      0.0387           899.5309          <.0001
pr_api          0.322      0.0511           24.8603           <.0001
pr_aian         0.3989     0.7851           39.0191           <.0001
pr_mult_other   1.987      0.3161           43.1765           <.0001
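As a self-contained illustration of what a logistic fit like this computes, the sketch below estimates coefficients by Newton-Raphson in plain Python. It is not the authors' code, and in practice one would use standard statistical software. The helpers `solve` and `fit_logistic` and the toy data are hypothetical. For the BISG-only model, each row would hold an intercept of 1.0 followed by pr_black, pr_hispanic, pr_api, pr_aian and pr_mult_other, with pr_white omitted as the reference group because the six probabilities sum to 1.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic(rows, y, iterations=25):
    """Fit logistic regression coefficients by Newton-Raphson.

    Each row starts with 1.0 for the intercept, followed by the predictors.
    """
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        gradient = [0.0] * k                       # X'(y - p)
        information = [[0.0] * k for _ in range(k)]  # X'WX, W = p(1 - p)
        for x, outcome in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            w = p * (1.0 - p)
            for i in range(k):
                gradient[i] += (outcome - p) * x[i]
                for j in range(k):
                    information[i][j] += w * x[i] * x[j]
        step = solve(information, gradient)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Toy check with one binary predictor: the intercept should equal the log-odds
# of decline when x = 0, and the slope the log-odds ratio between x = 1 and 0.
rows = [[1.0, 0.0] for _ in range(4)] + [[1.0, 1.0] for _ in range(4)]
y = [1, 0, 0, 0, 1, 1, 1, 0]
intercept, slope = fit_logistic(rows, y)
print(round(intercept, 4), round(slope, 4))
```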
EXAMPLE OF POTENTIAL DISPARITY: ADD “CONTROL” VARIABLES
Note: These results are fictitious and for illustrative purposes only
Logistic Regression: BISG + Control (Credit) Variables
Parameter               Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept               4.9        1.2457           8.1804            0.0042
pr_black                0.2443     0.083            252.1146          <.0001
pr_hispanic             2.112      0.0723           105.7006          <.0001
pr_api                  1.908      0.0951           15.1032           0.0001
pr_aian                 0.223      1.5671           10.5922           0.0011
pr_mult_other           0.211      0.5553           4.5436            0.033
Gender_Female           1.987      0.0696           2.5155            0.1127
Gender_Missing          0.1888     0.0958           3.8828            0.0488
Gender_Joint            -0.5427    0.0584           86.3135           <.0001
Average Credit Score    -2.7035    0.0377           5151.755          <.0001
Minimum FICO            -9.2124    1.1383           65.4997           <.0001
Debt to Income          7.6398     0.7895           93.6313           <.0001
Channel Dummy           15.7177    139.8            0.0126            0.9105
Override                -0.7318    0.5161           2.0105            0.1562
Year_dum                0.072      0.0597           1.4522            0.2282
Self Employment Dummy   0.1368     0.1971           0.4816            0.4877
QTR1                    0.0328     0.0876           0.1403            0.7079
QTR2                    -0.2214    0.077            8.2657            0.004
QTR3                    -0.0224    0.064            0.1223            0.7266
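Each Wald chi-square in these tables is the squared ratio of a coefficient to its standard error, compared against a chi-square distribution with one degree of freedom. The sketch below uses only the standard library (the helper name `wald_test` is hypothetical). With the Override row's estimate and standard error, it reproduces, up to rounding, the reported chi-square (2.0105) and p-value (0.1562).

```python
import math

def wald_test(estimate, std_error):
    """Wald chi-square (1 degree of freedom) and its p-value for one coefficient."""
    chi_sq = (estimate / std_error) ** 2
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value

# Override row: estimate -0.7318, standard error 0.5161.
chi_sq, p = wald_test(-0.7318, 0.5161)
print(f"Wald chi-square = {chi_sq:.4f}, Pr > ChiSq = {p:.4f}")
```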
CLASSIFICATION ACCURACY: PREDICTED DECLINE EVENT
[Figure: ROC curves (True Positives vs. False Positives) for two models: No Control Variables (poor information content) and With Underwriting Controls (excellent information content)]
Legend: Area Under Curve (AUC) • .90-1 = excellent (A) • .80-.90 = good (B) • .70-.80 = fair (C) • .60-.70 = poor (D) • .50-.60 = fail (F)
http://gim.unmc.edu/dxtests/roc3.htm
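The area under the ROC curve can be computed directly as the probability that a randomly chosen declined application receives a higher model score than a randomly chosen approved one, with ties counted as half. This is a minimal sketch with hypothetical scores. `auc` and `auc_grade` are illustrative helpers, and `auc_grade` simply encodes the letter-grade legend above.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))

def auc_grade(value):
    """Map an AUC to the letter grades in the legend."""
    for cutoff, grade in ((0.90, "A"), (0.80, "B"), (0.70, "C"), (0.60, "D")):
        if value >= cutoff:
            return grade
    return "F"

declined = [0.9, 0.8, 0.4]  # hypothetical model scores for actual declines
approved = [0.7, 0.3, 0.2]  # hypothetical model scores for actual approvals
value = auc(declined, approved)
print(round(value, 3), auc_grade(value))
```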
CASE STUDY – REQUESTED LOAN (PRICING: APR)
Note: These results are fictitious and for illustrative purposes only
No Control Variables
Variable    Coefficient   Significance   FIT
Black       0.1748        <.0001         0.010
Hispanic    0.1945        <.0001         0.010
Asian       0.2192        <.0001         0.010
Female      .             .              .

With Pricing Control Variables
Variable    Coefficient   Significance   FIT
Black       -0.0732       0.15           0.78
Hispanic    0.014         0.05           0.78
Asian       0.0812        0.43           0.78
Female      -0.0296       <.0001         0.78
• Regression analysis shows disparities above the acceptable threshold for all protected classes when no controls other than BISG probabilities are used.
• Adding pricing control variables, such as credit quality and collateral information, substantially reduces the disparities.
Coefficient Comparison
CASE STUDY – REQUESTED LOAN
Matched Pair #   Protected Class   FICO   Term   APR     Loan to Value
11               White             652    72     1.99%   99.38
11               Black             625    72     2.45%   101.52
25               White             675    72     1.84%   103.08
25               Hispanic          673    72     2.00%   103.3
55               White             723    72     2.99%   111.23
55               Asian             721    72     2.41%   111.66
77               White             798    84     2.50%   93.45
77               Black             839    84     2.15%   89.05
78               White             867    84     2.41%   106.84
78               Hispanic          814    75     3.33%   111.01
• Key variables used to identify similarly situated protected-class applicants:
• FICO
• Term
• Loan to Value
• Conduct Manual File Review
• Manually review the applications in each matched pair to determine whether factors not captured in the data file could explain the difference in markup.
Pricing
Matched Pair Review
Note: Above results are fictitious and for illustrative purposes only
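Finding similarly situated pairs on the key variables above can be automated before the manual file review. In this sketch the matching rule is an assumption, not the authors' criteria: same term, FICO within 30 points, LTV within 5 points, closest combined scaled distance, and matching with replacement. The helper `find_match` is hypothetical, and the control pool reuses the fictitious White rows from the table.

```python
def find_match(applicant, controls, max_fico_gap=30, max_ltv_gap=5.0):
    """Return the closest control-group loan with the same term, or None.

    Closeness is the sum of the FICO and LTV gaps, each scaled by its tolerance.
    Tolerances are hypothetical; matching is done with replacement.
    """
    best, best_dist = None, float("inf")
    for c in controls:
        if c["term"] != applicant["term"]:
            continue
        fico_gap = abs(c["fico"] - applicant["fico"])
        ltv_gap = abs(c["ltv"] - applicant["ltv"])
        if fico_gap > max_fico_gap or ltv_gap > max_ltv_gap:
            continue
        dist = fico_gap / max_fico_gap + ltv_gap / max_ltv_gap
        if dist < best_dist:
            best, best_dist = c, dist
    return best

# White applicants from the (fictitious) table serve as the control pool.
whites = [
    {"fico": 652, "term": 72, "apr": 1.99, "ltv": 99.38},
    {"fico": 675, "term": 72, "apr": 1.84, "ltv": 103.08},
    {"fico": 723, "term": 72, "apr": 2.99, "ltv": 111.23},
    {"fico": 798, "term": 84, "apr": 2.50, "ltv": 93.45},
    {"fico": 867, "term": 84, "apr": 2.41, "ltv": 106.84},
]
applicant = {"fico": 625, "term": 72, "apr": 2.45, "ltv": 101.52}
match = find_match(applicant, whites)
if match is not None:
    print(f"APR difference: {applicant['apr'] - match['apr']:.2f} points")
```

Applicants with no acceptable match (for example, a term that no control loan shares) return `None` and would need a different comparison or a direct file review.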
•Traditional – official business need
•Fair Lending – regulatory screening tool
•Traditional – forward looking
•Fair Lending – forensic in nature
•Traditional – weight of evidence, binning
•Fair Lending – quantifying policy variables
•Traditional – outlier analysis
•Fair Lending – outlier analysis (DFBETAs, Cook’s D)
•Traditional – model validation (hold-out), stress testing
•Fair Lending – forensic file reviews
Fair Lending vs. Traditional “Models / Tools or Controls”
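Cook's distance, one of the outlier diagnostics listed above, measures how much all fitted values move when a single observation is dropped. For a simple linear regression with p = 2 parameters it has the closed form D_i = e_i² h_ii / (p s² (1 - h_ii)²), where e_i is the residual, h_ii the leverage, and s² the residual variance. The sketch below computes it in plain Python on hypothetical data; the helper name `cooks_distance` is illustrative, and in practice these values come from the regression software's influence diagnostics.

```python
from statistics import fmean

def cooks_distance(x, y):
    """Cook's distance for each observation in a simple OLS fit of y on x."""
    n, p = len(x), 2  # two parameters: intercept and slope
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e * e / (p * s2) * h / (1.0 - h) ** 2 for e, h in zip(resid, leverage)]

# Hypothetical data: the last point is far off the trend of the others.
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
d = cooks_distance(x, y)
worst = max(range(len(d)), key=d.__getitem__)
print(f"most influential observation: index {worst}, D = {d[worst]:.2f}")
```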
TEXT MINING AND COMPLAINT ANALYSIS (PROOF OF CONCEPT APPROACH)
COMPLAINT ANALYSIS USING TEXT MINING
Purpose
• Determine Feasibility of Predicting Complaint Resolutions
• Potentially Facilitate Work Priority
• Sampling for Review
Methodology
• Closed Complaint Data
• Clean & Organize Complaint Narratives
• Determine Additional Predictors
• K-Nearest Neighbor (KNN) Classification
Preliminary Results
• Suggest Prediction Variables
• “Word Clouds” of Complaint Narratives
• KNN Classification
• Accuracy
• Supporting Paper, Implementation
COMPLAINT PATTERN NARRATIVES (EXAMPLES ARE FICTITIOUS)
Example where “NO ACTION REQUIRED” was found
Customer wants to submit a complaint because she has been waiting since 04/15 for her refinance to close, and it is taking too long for the mortgage to close. The loan officer is XXX.
Example where “POTENTIAL EXCEPTION” was found
The customer complained that a branch employee refused to cash a check. The customer is a long-term client and noticed that checks were cashed for white clients while she was being refused service. The customer is upset and complains that the teller was rude and disrespectful.
WORD CLOUDS BY CLASSIFICATION
[Figure: word clouds of complaint narratives; panels: Classification = No Action, Classification = Action]
WORD FREQUENCY BY CLASSIFICATION
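Word clouds and word-frequency charts like these start from token counts per classification. This is a minimal sketch, assuming simple lower-casing, alphabetic tokens, and a tiny hypothetical stop-word list. The narratives are made up in the spirit of the fictitious examples, and `tokenize` and `word_frequency_by_class` are illustrative names.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "to", "for", "is", "was", "her",
              "she", "of", "that", "it"}

def tokenize(text):
    """Lower-case a narrative, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def word_frequency_by_class(complaints):
    """Count tokens separately for each classification label."""
    counts = {}
    for label, text in complaints:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts

complaints = [
    ("No Action", "Customer is waiting for the refi to close and it is taking too long."),
    ("Action", "Teller refused to cash a check and was rude; checks were cashed for other clients."),
    ("Action", "Customer was refused service at the branch."),
]
freq = word_frequency_by_class(complaints)
print(freq["Action"].most_common(3))
```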
CREATION / IDENTIFICATION OF ADDITIONAL PREDICTORS
Classification Outcome: “No Action” Required vs. “Action” Required
Some Information Content (AUC = .68)
KNN – NEAREST NEIGHBOR CLASSIFICATION
• Each observation is mapped to a point in a feature space, and the distances between points are measured as Euclidean distances.
• Each point is assigned to the class that is most common among its nearest neighboring data points.
• Here, point “c” is closer to the “o” points than to the “a” points, so it is classified as an “o”.
• Here, 2 out of 3 votes are cast for group “o”.
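The voting rule described above can be sketched with k = 3 and Euclidean distance. The 2-D coordinates are hypothetical stand-ins, and `knn_classify` is an illustrative name. For complaint narratives, each text would first have to be converted to a numeric vector (for example, word counts), a step not shown here.

```python
import math
from collections import Counter

def knn_classify(point, training, k=3):
    """Majority vote among the k training points closest in Euclidean distance."""
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors mirroring the slide's "o" and "a" groups.
training = [
    ((1.0, 1.0), "o"),
    ((1.5, 2.0), "o"),
    ((4.0, 4.0), "a"),
    ((4.5, 3.5), "a"),
    ((2.5, 2.5), "a"),
]
c = (1.7, 1.7)  # its 3 nearest neighbors are two "o" points and one "a" point
print(knn_classify(c, training))
```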
MODEL CORRECTLY CLASSIFIES ALMOST ALL OF THE TEST “NO ACTION” GROUP AND ABOUT HALF OF THE TEST “ACTION” GROUP
MODELING DATA WAS SPLIT INTO 2 PARTITIONS
• Two-thirds of the data were used to train (calibrate) the model
• One-third of the data was used to test (i.e., validate) the model
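The two-partition split and the per-class accuracy reported in the slide title can be sketched as follows. The seed, the record stand-ins, and the helper names `train_test_split` and `per_class_accuracy` are hypothetical. The deck does not say whether the split was stratified. Because “Action” complaints are the minority class, splitting each class separately would keep their share equal in both partitions.

```python
import random
from collections import Counter

def train_test_split(records, train_share=2 / 3, seed=2016):
    """Shuffle a copy of the records and split into train and test partitions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]

def per_class_accuracy(actual, predicted):
    """Share of each actual class that was predicted correctly (per-class recall)."""
    totals, correct = Counter(actual), Counter()
    for a, p in zip(actual, predicted):
        if a == p:
            correct[a] += 1
    return {label: correct[label] / n for label, n in totals.items()}

complaints = list(range(30))  # stand-ins for 30 closed complaints
train, test = train_test_split(complaints)
print(len(train), len(test))
```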
CORRECTLY CLASSIFYING ACTION COMPLAINTS AS ACTION
• The model apparently picks up on complaint narratives that are more complex and wordy, classifying them as “Action” events, all other things remaining equal.
• Complaints that contain High Risk Terms, Tier 2, Email, and CRT
USING PROBABILITY OF “ACTION NEEDED” TO RANK ORDER COMPLAINTS
Similar to a credit score, we can compute the probability of each classification and rank order complaints from high to low to select samples for review and auditing. In this example, about 75% of the “action needed” complaints fell in the first two deciles (top 20%) when ranked by probability.
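The capture rate described above, the share of all “action needed” complaints found in the top-ranked slice, can be computed by sorting on the predicted probability. This is a sketch with made-up probabilities and outcomes; the helper name `capture_rate` is hypothetical.

```python
def capture_rate(probabilities, actual_action, top_share=0.20):
    """Share of all 'action needed' complaints found in the top-ranked slice."""
    ranked = sorted(zip(probabilities, actual_action),
                    key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * top_share))  # e.g. top two deciles
    total_actions = sum(actual_action)
    if total_actions == 0:
        raise ValueError("no 'action needed' complaints to capture")
    return sum(flag for _, flag in ranked[:cutoff]) / total_actions

# Hypothetical model probabilities and actual outcomes (1 = action needed).
probs = [0.95, 0.10, 0.85, 0.20, 0.30, 0.05, 0.60, 0.15, 0.40, 0.25]
flags = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
rate = capture_rate(probs, flags)
print(f"captured in top 20%: {rate:.0%}")
```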