Analytics, Big Data and the Cloud II Conference - Kiribatu Labs

Learn how insurers predict risk and how you can apply it to your predictive analytics project Pawel Brzeminski, Founder & CEO [email protected] May 15, 2013 Analytics, Big Data, and The Cloud II Edmonton

Upload: pawel-brzeminski

Post on 12-Aug-2015

TRANSCRIPT

Learn how insurers predict risk and how you can apply it to your predictive analytics project

Pawel Brzeminski, Founder & CEO [email protected]

May 15, 2013 Analytics, Big Data, and The Cloud II

Edmonton

The Company

KIRIBATU LABS: Discovering Knowledge Assets

Kiribatu is a predictive analytics company, founded in 2009, with 6 employees.

We serve the Canadian financial sector, predominantly Property & Casualty (P&C) insurance.

Predictive analytics, huh?

Goal-driven ANALYSIS of a large data set to PREDICT human behavior

If speed was important to you…

YOUR insurance premium is calculated by methods designed 40-50 years ago

VS.

Risk assessment in Insurance

A vast majority of Canadian insurers (as of May 2013) still use outdated premium rating formulas created in the 1960s and 1970s.

Only a handful of Canadian insurance companies are sophisticated predictive analytics users.

Leaders are decimating their competition.

Where to start?

Source: By Phil McElhinney from London (Jeremy Wariner) (http://creativecommons.org/licenses/by-sa/2.0)

How to identify an opportunity for a predictive analytics project?

Questions to ask while starting

Data is already collected (or can be easily acquired): transactional data, customer data, sensor-generated data, usage data, etc.

There is a clear objective to predict something: future price, failure rate, customer risk, customer profitability, customer retention, etc.

Well-defined functional settings are a great place to start. We focused on optimizing a Risk Sharing Pool (RSP) problem.

Typically, the SMEs (Subject Matter Experts) are making decisions based on their experience and “gut feeling” (senior underwriters, in our case).

Significant ROI is expected. The investment in analytics can be small, but usually it is not trivial.

Example

A Risk Sharing Pool is a construct used by Canadian insurers to optimize their risk assessment.

Insurers put their highest risks (a primary driver and a vehicle) in the pool to avoid paying for the claims, but they forfeit the premium.

Insurers retain the risks they deem profitable on their book of business, so they can collect the premium and make a profit.

Challenge

Can we effectively predict future claims on policies?

The model would need to predict claims that will occur up to 12 months in advance.
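To make the 12-month horizon concrete, here is a minimal sketch of how a training label could be built for one policy snapshot. The function name, the 365-day approximation of "12 months" and the sample dates are assumptions for illustration; the talk does not describe how the target was actually constructed.

```python
from datetime import date, timedelta

# Assumption: "up to 12 months in advance" is approximated as 365 days.
HORIZON = timedelta(days=365)

def has_future_claim(snapshot_date, claim_dates, horizon=HORIZON):
    """True if any claim falls after the snapshot and within the horizon."""
    window_end = snapshot_date + horizon
    return any(snapshot_date < d <= window_end for d in claim_dates)

snapshot = date(2012, 1, 1)
label_inside = has_future_claim(snapshot, [date(2012, 6, 1)])   # inside the window
label_late = has_future_claim(snapshot, [date(2013, 3, 1)])     # beyond 12 months
label_past = has_future_claim(snapshot, [date(2011, 12, 1)])    # before the snapshot
print(label_inside, label_late, label_past)
```

Only claims that occur after the snapshot count, so the label never leaks information the model would not have at scoring time.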

Introducing Underwriting Score

The predictive model generates an Underwriting (UW) Score. The UW Score is a number between 1 and 1000.

High UW Score = high profitability = low risk

Low UW Score = low profitability = high risk

The UW Score is a highly accurate predictor of future claims on a policy. It will be used to assess which risks are placed in the pool and which are not.
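A minimal sketch of how a score like this could drive the pool decision. The cut-off value, the `Risk` class and the policy IDs are hypothetical; the talk does not state which score separates pooled risks from retained ones.

```python
from dataclasses import dataclass

# Hypothetical cut-off, chosen only for illustration.
POOL_THRESHOLD = 300

@dataclass
class Risk:
    policy_id: str
    uw_score: int  # 1..1000, higher = more profitable = lower risk

def place_risk(risk, threshold=POOL_THRESHOLD):
    """Return 'pool' for low-scoring (high-risk) risks, otherwise 'retain'."""
    if not 1 <= risk.uw_score <= 1000:
        raise ValueError(f"UW Score out of range: {risk.uw_score}")
    return "pool" if risk.uw_score < threshold else "retain"

risks = [Risk("A-001", 120), Risk("A-002", 640), Risk("A-003", 915)]
decisions = {r.policy_id: place_risk(r) for r in risks}
print(decisions)
```

In practice the threshold would likely be tuned against pool capacity rules and the retrospective gain, rather than fixed up front.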

4 Key Modeling Steps

Data Preparation

Rating Factor Analysis

Model Development

Gain Assessment

Data Preparation

• Policy & claims data profiling, understanding and verification

• Data cleansing (filling missing values, outlier removal)

• Data transformation

• Data normalization (inflation & claim development factors)

• Data enrichment with 3rd party data (demographic, econometric – Census Canada, VICC, CLEAR, etc.)
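The cleansing and normalization bullets can be sketched roughly as follows. All factor values, the plausibility cap and the field names are made up for illustration; real inflation indices and claim development factors would come from actuarial tables. Note that this sketch replaces implausible values with the median instead of dropping the record, which is only one of several ways to handle outliers.

```python
from statistics import median

# Hypothetical factors, for illustration only.
INFLATION_TO_2013 = {2010: 1.06, 2011: 1.04, 2012: 1.02}
DEVELOPMENT_FACTOR = {2010: 1.00, 2011: 1.05, 2012: 1.20}
MAX_PLAUSIBLE_KM = 100_000  # assumed cap on annual distance driven

records = [
    {"year": 2010, "annual_km": 15_000, "claim_paid": 1_000.0},
    {"year": 2011, "annual_km": None, "claim_paid": 0.0},
    {"year": 2012, "annual_km": 200_000, "claim_paid": 2_000.0},
    {"year": 2012, "annual_km": 25_000, "claim_paid": 0.0},
]

def prepare(rows):
    """Fill missing/implausible km with the median; bring claims to one basis."""
    plausible = [r["annual_km"] for r in rows
                 if r["annual_km"] is not None and r["annual_km"] <= MAX_PLAUSIBLE_KM]
    fill_value = median(plausible)
    cleaned = []
    for r in rows:
        km = r["annual_km"]
        if km is None or km > MAX_PLAUSIBLE_KM:
            km = fill_value
        # Develop the paid amount, then adjust it for inflation.
        normalized = r["claim_paid"] * DEVELOPMENT_FACTOR[r["year"]] * INFLATION_TO_2013[r["year"]]
        cleaned.append({**r, "annual_km": km, "claim_normalized": round(normalized, 2)})
    return cleaned

cleaned = prepare(records)
for row in cleaned:
    print(row)
```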

Rating Factor Analysis

• Statistical analysis of each data element for its propensity to claim

• Rating factors with high correlations are included in the final predictive model(s)

• Often, new powerful rating factors are discovered in this step (very useful for Underwriting)
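One simple way to screen rating factors is to correlate each candidate with a claim indicator and rank by strength. The sketch below uses a plain Pearson correlation on toy data; the factor names and values are invented, and the talk does not say which statistic was actually used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Toy policies with hypothetical rating factors.
policies = [
    {"driver_age": 19, "prior_convictions": 2, "vehicle_age": 3, "had_claim": 1},
    {"driver_age": 45, "prior_convictions": 0, "vehicle_age": 3, "had_claim": 0},
    {"driver_age": 52, "prior_convictions": 0, "vehicle_age": 3, "had_claim": 0},
    {"driver_age": 23, "prior_convictions": 1, "vehicle_age": 3, "had_claim": 1},
    {"driver_age": 38, "prior_convictions": 0, "vehicle_age": 3, "had_claim": 0},
    {"driver_age": 60, "prior_convictions": 0, "vehicle_age": 3, "had_claim": 0},
]

factors = ["driver_age", "prior_convictions", "vehicle_age"]
target = [p["had_claim"] for p in policies]

# Rank factors by absolute correlation with the claim indicator.
ranked = sorted(
    ((f, pearson([p[f] for p in policies], target)) for f in factors),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: {r:+.3f}")
```

On real data the correlations are far weaker than in this toy set, so the ranking matters more than the absolute values.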

Model Development

• Algorithm selection (genetic algorithms, neural networks, logistic regression, SVM)

• Time-wise training and testing data set split

• Model parameterization, generation and evaluation
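A time-wise split means the model is trained on older transactions and tested on newer ones, so it never sees the future. A minimal sketch, assuming each row carries an `effective_date` field (a hypothetical name) and an arbitrary cut-off date:

```python
from datetime import date

def time_wise_split(rows, cutoff):
    """Rows before the cutoff go to training; rows on or after it go to testing."""
    train = [r for r in rows if r["effective_date"] < cutoff]
    test = [r for r in rows if r["effective_date"] >= cutoff]
    return train, test

transactions = [
    {"policy_id": "P1", "effective_date": date(2010, 3, 1)},
    {"policy_id": "P2", "effective_date": date(2011, 7, 15)},
    {"policy_id": "P3", "effective_date": date(2012, 1, 1)},
    {"policy_id": "P4", "effective_date": date(2012, 9, 30)},
]

train, test = time_wise_split(transactions, cutoff=date(2012, 1, 1))
print(len(train), "training rows,", len(test), "test rows")
```

Unlike a random split, this mirrors how the model is used in production, where it scores policies written after the training data ends.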

 

RSP Gain Assessment

• Calculation of UW Scores on test data set

• Retrospective underwriting gain assessment on historical data sets
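A retrospective gain assessment can be sketched as a comparison between what was actually pooled and what the UW Score would have pooled. The sample history, field names and the rule "pool the same number of risks, lowest scores first" are assumptions; the real assessment would also reflect pool rules and expenses that this sketch ignores.

```python
def retained_result(policies, pooled_ids):
    """Premium minus claims over the risks kept on the insurer's own book."""
    return sum(p["premium"] - p["claims"] for p in policies if p["id"] not in pooled_ids)

# Hypothetical historical policies with their known outcomes.
history = [
    {"id": 1, "uw_score": 150, "premium": 1200, "claims": 9000, "was_pooled": False},
    {"id": 2, "uw_score": 820, "premium": 900, "claims": 0, "was_pooled": True},
    {"id": 3, "uw_score": 640, "premium": 1000, "claims": 0, "was_pooled": False},
    {"id": 4, "uw_score": 210, "premium": 1500, "claims": 4000, "was_pooled": True},
]

# What the underwriters actually pooled.
actual_pool = {p["id"] for p in history if p["was_pooled"]}

# What the score would have pooled: the same number of risks, lowest scores first.
by_score = sorted(history, key=lambda p: p["uw_score"])
score_pool = {p["id"] for p in by_score[:len(actual_pool)]}

actual_result = retained_result(history, actual_pool)
score_result = retained_result(history, score_pool)
gain = score_result - actual_result
print(f"actual: {actual_result}, score-based: {score_result}, gain: {gain}")
```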

Results

Source: “Improving P&C Insurance Risk Management and Policy Pricing with Predictive Analytics”, Pawel Brzeminski, September 2011, http://www.kiribatulabs.com/resources.php.

UW Score = 1000 – Risk Score

4 Key Challenges

Extremely low correlations / data set imbalance: 98% of policy transactions do not have any claims; only 2% have claims.

Bad, bad data: drivers recorded as driving 200,000 km per year (that's over 500 km per day, 365 days a year).

Over-fitting: certain features do not generalize well in a time-wise data split.

Data sparsity: Motor Vehicle Abstract (MVA) data, which contains convictions, suspensions and reinstatements, is not always available.
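Two of these challenges are easy to put numbers on. The sketch below checks the km-per-day arithmetic and shows what inverse-frequency class weights look like for a 98/2 split. The 300 km/day plausibility limit and the weighting formula are assumptions for illustration, not the approach described in the talk.

```python
from collections import Counter

def km_per_day(annual_km):
    """Average daily distance implied by an annual figure."""
    return annual_km / 365

def is_implausible(annual_km, max_km_per_day=300):
    """Flag records whose implied daily distance exceeds an assumed limit."""
    return km_per_day(annual_km) > max_km_per_day

def inverse_frequency_weights(labels):
    """Weight each class by n / (number_of_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}

daily = km_per_day(200_000)
print(f"200,000 km/year is about {daily:.0f} km/day")

# 98% of transactions without claims (0), 2% with claims (1).
labels = [0] * 98 + [1] * 2
weights = inverse_frequency_weights(labels)
print(weights)
```

With these weights the two claim transactions carry as much total weight as the 98 claim-free ones, which keeps a model from simply predicting "no claim" for everyone.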

5 Key Breakthroughs

Policy transactions collapsed into single vectors: individual risk assessment for each vehicle on a policy.

Instance sampling and weighting: dealing with data set imbalance and bad data.

Custom model quality metric: aggregation of the highest claims in the top 5% of all transactions really moved the needle.

Risk assessment per insurance coverage: different data elements are important for each coverage. For instance, liability coverage and comprehensive coverage are completely different products that behave very differently.

Prediction of profitability: include written premiums in a 2nd level model.
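The custom quality metric is not defined in the talk. One possible reading is "what share of all claim dollars lands in the 5% of transactions the model rates as riskiest". The sketch below implements that interpretation on toy data; the function name, the rounding rule and the numbers are all assumptions.

```python
def top_fraction_claim_capture(uw_scores, claim_amounts, fraction=0.05):
    """Share of total claim dollars found in the lowest-scoring fraction of transactions."""
    n = len(uw_scores)
    k = max(1, round(n * fraction))
    # Lowest UW Score first, since a low score means high predicted risk.
    riskiest = sorted(range(n), key=lambda i: uw_scores[i])[:k]
    total = sum(claim_amounts)
    if total == 0:
        return 0.0
    return sum(claim_amounts[i] for i in riskiest) / total

# 20 toy transactions: scores 50, 100, ..., 1000; only two of them have claims.
uw_scores = list(range(50, 1050, 50))
claim_amounts = [8000, 2000] + [0] * 18

capture = top_fraction_claim_capture(uw_scores, claim_amounts)
print(f"Claim dollars captured in the riskiest 5%: {capture:.0%}")
```

A metric like this rewards the model for concentrating the largest losses at the bottom of the score range, which is what matters for deciding what goes into the pool.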

Homework

Where can I apply predictive analytics in my business?

Questions? Always happy to have a coffee

Pawel Brzeminski, Founder & CEO [email protected]

780-232-2634

http://ca.linkedin.com/pub/pawel-brzeminski/0/523/555

@pawelwb