contribution of data science to the solvency 2 regulatory framework · 2020-04-21 · insurance...

Strictly Confidential

Insurance Data Science Conference 2018, Cass Business School, University of London

Contribution of Data Science to the Solvency 2 regulatory framework:

SFCR automated analysis

Please read the important disclaimer at the end of this presentation

Aurélien Couloumy, Head of Data Science


I. Introduction – Data Science business cases and Solvency 2


Data science could help all parties absorb the Solvency 2 (S2) framework

Reduce calculation time within processes

Improve assumptions and modelling

Exploit results quickly and automate controls

Give intuition and agility

Business cases:

- Risk profile + ML*
- Model points + ML
- Lapse rate + ML
- Asset allocation + ML
- BSCR + ML
- ORSA + ML

ML* = Machine learning


I. Introduction – Context of the study

Context: create a process to collect, interpret and use SFCRs, drawing on:

- Crawling & parsing
- Text mining & NLP
- ML & DL
- Solvency 2 knowledge

Constraints:

- Evolving methodology
- Understandable process
- Light and open technology

Goals:

1. Automate and save time
2. Create value for other applications
3. Analyse and control easily


II. Overall process – Structure

1. Collect on websites
2. Document management
3. Taxonomies creation
4.1 Meta data; 4.2 Quantitative data; 4.3 Qualitative data
5. Results gathering
6. Applications


III. Results – SFCR context

Study carried out on the previous reporting exercise (publication on 18 May 2017)

Overall, across Europe, every company played its part, but in a heterogeneous way

Many consulting firms offer analysis reports that ignore qualitative aspects

An estimated 2400 reports are available on the web

Application of the process to this SFCR context


III. Results – Collect on websites

Extraction of the full list of (re)insurers' websites

Around 1200 SFCRs collected, covering both solo and group reports

Approximate figures shown on the map (not all countries are shown):

- France ≈ 220
- Italy ≈ 70
- Germany ≈ 160
- Spain ≈ 80
- UK ≈ 180
- Benelux ≈ 100
- Slovakia ≈ 15
- Poland ≈ 25
- Norway ≈ 20
- Sweden ≈ 30
- Finland ≈ 10
- Portugal ≈ 25
- Greece ≈ 20
- Ireland ≈ 60
- Austria ≈ 25
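A minimal sketch of the collection step, assuming that SFCRs are published as PDF links on insurer web pages. The keyword list, the class and function names and the example URL are illustrative assumptions, not the tooling used in the study (the appendix cites rvest and Rcrawler).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical keywords; real SFCR links are labelled in many languages.
KEYWORDS = ("sfcr", "solvency", "solvabilit")


class PdfLinkParser(HTMLParser):
    """Collect the targets of <a> tags that point to PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(".pdf"):
            self.links.append(urljoin(self.base_url, href))


def candidate_sfcr_links(html, base_url):
    """Return absolute PDF URLs whose address contains an SFCR-like keyword."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return [u for u in parser.links if any(k in u.lower() for k in KEYWORDS)]


page = """
<a href="/docs/SFCR_2016.pdf">SFCR 2016</a>
<a href="/docs/annual-report.pdf">Annual report</a>
<a href="/contact.html">Contact</a>
"""
print(candidate_sfcr_links(page, "https://insurer.example"))
```

Filtering on the URL alone is deliberately crude: it misses reports with uninformative file names, which is one reason a document management step is still needed afterwards.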




III. Results – Document management

25% to 28% of documents rejected based on the modification date

5% to 7% of documents rejected based on the number of pages

2% to 3% of documents rejected based on the document size

26% to 31% of documents rejected on average

Supervised machine learning models can also be used to build a topic recognition model and to classify whether documents are genuine SFCRs

[Figure: example documents marked Ok or Ko]
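The three meta data checks (modification date, number of pages, document size) could be sketched as follows. The thresholds, the `PdfMeta` record and the sample documents are invented for illustration; the study does not state its cut-off values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PdfMeta:
    url: str
    modified: date
    pages: int
    size_kb: float


# All thresholds below are illustrative assumptions, not the study's values.
EARLIEST_MODIFIED = date(2017, 1, 1)
MIN_PAGES = 20
MIN_SIZE_KB = 100.0


def rejection_reasons(meta):
    """Return the list of meta data checks that a document fails."""
    reasons = []
    if meta.modified < EARLIEST_MODIFIED:
        reasons.append("modification date")
    if meta.pages < MIN_PAGES:
        reasons.append("number of pages")
    if meta.size_kb < MIN_SIZE_KB:
        reasons.append("document size")
    return reasons


# Made-up sample documents.
docs = [
    PdfMeta("a.pdf", date(2017, 5, 18), 85, 2400.0),
    PdfMeta("b.pdf", date(2015, 3, 2), 60, 1800.0),
    PdfMeta("c.pdf", date(2017, 5, 10), 4, 60.0),
]
kept = [d.url for d in docs if not rejection_reasons(d)]
print(kept)
```

Returning the reasons rather than a boolean makes it possible to report rejection rates per criterion, in the spirit of the percentages quoted on this slide.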




III. Results – Taxonomies creation

Several automated approaches are used to define the taxonomies and inspect the results

Taxonomies cover several characteristics of interest: LoB, risks, regulators, outsourcers, software, countries, modelling assumptions, reinsurance, operational risk, management actions, etc.
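One simple way to apply a taxonomy to report text is dictionary matching, sketched below. The categories and terms are hypothetical examples; the taxonomies actually built in the study are not listed in the slides, and the automated approaches mentioned may go well beyond exact phrase matching.

```python
import re

# Hypothetical taxonomy: category -> terms. The study's real lists are not given.
TAXONOMY = {
    "Risks": ["market risk", "underwriting risk", "operational risk"],
    "Reinsurance": ["quota share", "stop loss", "excess of loss"],
}


def tag_text(text, taxonomy=TAXONOMY):
    """Count whole-phrase, case-insensitive matches of each taxonomy term."""
    counts = {}
    for category, terms in taxonomy.items():
        found = {}
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            n = len(re.findall(pattern, text, flags=re.IGNORECASE))
            if n:
                found[term] = n
        counts[category] = found
    return counts


sample = ("Market risk is the main driver. A quota share treaty and a "
          "stop loss cover reduce underwriting risk; market risk remains.")
print(tag_text(sample))
```

Exact matching is transparent and easy to audit, but it misses synonyms, inflections and translations, so the term lists would need to be maintained per language.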




III. Results – Taxonomies creation

Supervised machine learning models are defined to study the topics and architecture of each document (at page and paragraph level)

Lists of metrics (SCRs, turnover, number of employees, etc.) and table templates (QRTs) are defined for extraction from the documents




III. Results – Data

3 data streams:

Meta data: company name; URL; author; creation/modification date; number of pages; file size; encryption level; language; etc.

Quantitative data: balance sheet; own funds; technical provisions; SCRs; LAC; capital add-on; other items (SCR sub-modules, number of employees, number of clients, turnover); etc.

Qualitative data: LoB; top management actions; outsourcers, software, auditor; currency; accounting information; reinsurers; reinsurance methods; etc.




III. Results – Meta Data

Meta data can be used for different studies:

"Good and bad students" can be identified through several charts (calendar and versioning)


III. Results – Qualitative Data

Text analysis of the document database makes it possible to understand market practices and market connections

Many other uses are possible: modelling assumptions, risk appetite approach, asset strategies, accounting methods, top management actions, etc.


III. Results – Quantitative data

Collect numerical values such as SCR, MCR, number of clients, turnover, etc., as well as tables (QRTs such as S.02.01 and S.25.01)

[Figure: "Bonds repartition" (bond breakdown), stacked bars from 0% to 100%, one bar per company from ACORIS MUTUELLES to SURAVENIR ASSURANCES; series: obligations_de_gouvernements (government bonds), obligations_entreprises (corporate bonds), titres_structures (structured notes), titres_garantis (collateralised securities)]
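As a rough illustration of how numerical values might be picked up from running text, the sketch below uses a regular expression to find an amount after the labels SCR and MCR and to normalise a few unit spellings. The pattern, the unit table and the sample sentence are assumptions for illustration only; the sketch handles English-style numbers (1,234.5) and would need extending for other formats and for currency detection.

```python
import re

# Multipliers for a few unit spellings; an illustrative, incomplete list.
UNITS = {"k": 1e3, "thousand": 1e3, "m": 1e6, "million": 1e6,
         "bn": 1e9, "billion": 1e9}

PATTERN = re.compile(
    r"\b(?P<label>SCR|MCR)\b"                    # metric name
    r"[^\d\n]{0,40}?"                            # short gap without digits
    r"(?P<number>\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+(?:\.\d+)?)"  # 1,234.5 or 1234.5
    r"\s*(?:EUR|€)?\s*"                          # optional currency after the number
    r"(?P<unit>thousand|million|billion|bn|k|m)?\b",
    flags=re.IGNORECASE,
)


def extract_metrics(text):
    """Return {label: amount} using the first match found for each label."""
    results = {}
    for match in PATTERN.finditer(text):
        label = match.group("label").upper()
        value = float(match.group("number").replace(",", ""))
        unit = (match.group("unit") or "").lower()
        results.setdefault(label, value * UNITS.get(unit, 1.0))
    return results


# Made-up sentence in the style of an SFCR summary.
text = "The SCR amounts to EUR 1,234.5 million and the MCR is 410 million."
print(extract_metrics(text))
```

A pattern like this also matches ratios (for example "SCR ratio of 150%"), so in practice each hit would need a plausibility check against the QRT tables.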


IV. Application – Case 1

Case 1: unsupervised machine learning model to create custom clusters

Goals and motivation:

o Allow comparisons between companies based on custom criteria

o For example: build clusters based on balance sheet and SCR, and show how reinsurance strategies discriminate between them
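The slides do not name the clustering algorithm. As one possible instance, the sketch below runs plain k-means on two made-up features per company (an SCR-to-assets ratio and a ceded premium share). The feature choice, the value of k and every number are assumptions for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm); the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: (SCR / total assets, share of premiums ceded to reinsurers).
companies = [
    (0.10, 0.05), (0.12, 0.08), (0.11, 0.06),   # low cession
    (0.30, 0.40), (0.28, 0.45), (0.33, 0.38),   # high cession
]
labels, centroids = kmeans(companies, k=2)
print(labels)
```

With real data the features would normally be standardised first, since k-means is sensitive to scale, and the seeding would be randomised or repeated rather than taken from the first rows.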


IV. Application – Case 2

Case 2: supervised machine learning to predict SCR values

Goals and motivation:

o Create a proxy that estimates SCR values from a few variables

o For example: offer customers a tool that gives them more intuition about their SCR values and the importance of drivers such as reinsurance
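To make the idea of a proxy concrete, here is a deliberately simple stand-in: a one-variable least-squares line that estimates SCR from technical provisions. The study's actual models are not described in the slides (the appendix lists gbm and randomForest among the packages), and all figures below are invented toy values.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy training set, in EUR million: technical provisions -> SCR.
provisions = [100.0, 200.0, 300.0, 400.0]
scr = [12.0, 22.0, 32.0, 42.0]

intercept, slope = fit_line(provisions, scr)


def scr_proxy(technical_provisions):
    """Estimate SCR from technical provisions with the fitted line."""
    return intercept + slope * technical_provisions


print(round(scr_proxy(250.0), 2))
```

A realistic proxy would use several drivers (for example reinsurance cession and asset mix), a held-out test set, and a model able to capture non-linear effects.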


V. Limits

Taxonomy creation requires time and material

Scraping and parsing techniques are currently the subject of heated debate

SFCR content (both quantitative and qualitative) is volatile

Performance on a single machine is limited and requires discipline

Other constraints:

- Currency and unit of amounts are not always easy to capture

- Companies set up protections that make documents hard to find

- The database is still too small to apply large ML models


V. Conclusion

Goals: 1. Automate and save time; 2. Create value for applications; 3. Analyse and control easily

We have been able to create a process to extract different types of information (properties, labels, numbers) from more than 1000 SFCRs

We have created an automated and scalable process, using open source material, that produces structured datasets in less than 6 hours

We have developed different business cases that demonstrate the potential of collecting unstructured data

General remark: publications are not so easy to compare, even though the main characteristic of these documents is to be "comparable"


Contact

Reacfin SA - Place de l’Université, 25 B-1348 Louvain-la-Neuve (Belgium) - 0032 0 10 84 07 50 - www.reacfin.com

About us

Reacfin is a consulting firm focused on setting up top quality, tailor-made risk management frameworks and offering state-of-the-art actuarial and financial techniques, methodologies and risk strategies.

Thank you!

Do you have questions?

Aurélien Couloumy, Head of Data Science
M FR +33 6 26 13 09 97
M BE +32 486 81 62 [email protected]

Website: http://www.reacfin.com/

Follow us on LinkedIn: https://www.linkedin.com/company/reacfin/


Appendix – IT framework

IT languages: R and Python

Packages/libraries (not exhaustive):

- R: tm, tidytext, udpipe, stringr, quanteda, etc.; rvest, Rcrawler, etc.; caret, gbm, randomForest, FactoMineR, etc.
- Python: nltk, word2vec, doc2vec, etc.; tensorflow, scikit-learn, etc.; pytesseract, opencv, etc.

Database & data visualization: local files (CSV/txt files) and remote ones (AWS)
