contribution of data science to the solvency 2 regulatory framework · 2020-04-21 · insurance...
TRANSCRIPT
Strictly Confidential
Insurance Data Science Conference 2018Cass Business School, University of London
Contribution of Data Science to the Solvency 2 regulatory framework:
SFCR automated analysis
Please read the important disclaimer at the end of this presentation
Aurélien Couloumy, Head of Data [email protected]
Strictly Confidential
I. Introduction – Data Science business cases and Solvency 2
P. 2
Use of Data Science could help facilitating S2 framework absorption by all parties
Reduce calculationtime within processes
Improve hypothesisand modeling
Exploit results quicklyand automate controls
Give intuition and agility
Risk profile+
ML
Model points+
ML*
Lapse rate+
ML
Asset allocation+
ML
BSCR+
ML
ORSA+
ML
ML* = Machine learning
Strictly Confidential
I. Introduction – Context of the study
P. 3
Context:
Constraints:
Goals:
Evolvingmethodology
Understandableprocess
Light and open technology
Automate and save time
Create value for other applications21
Analyse and control easily 3
Crawling& parsing
Text mining& NLP
ML& DL
Solvency 2 knowledge
Create a process to collect, interpret and use SFCRs
Strictly Confidential
<<<<<<
<<<<<<
Results gathering
Applications
II. Overall process – Structure
P. 4
<
<<<<<
<<<<<<
<<<<<< <<<<<< <<<<<<
1.
2.
3.
4.1 4.2 4.3
5.
6.
Collect on websites
Document management
Taxonomies creation
Meta data Quanti. data Quali. data
Strictly ConfidentialP. 5
III. Results – SFCR context
Study realized on the former exercise (publication of 18th of May 2017)
Globally, in Europe, each company has played his role, but inan heterogeneous way
A lot of consulting propose analysis reports without considering qualitative aspects
Estimation of around 2400 reports available on the web
Application of the process to this SFCR context
Strictly ConfidentialP. 6
France≈ 220
ITALY≈ 70
GERMANY≈ 160
SPAIN≈ 80
All countries are not available on the map
Extraction of the whole list of (re)insurers websites
UK≈ 180
BENELUX≈ 100
SLOVAKIA≈ 15
POLAND≈ 25
NORWAY≈ 20
SWEDEN≈ 30
FINLAND≈ 10
PORTUGAL≈ 25 GREECE
≈ 20
IRELAND≈ 60
AUSTRIA≈ 25
Around 1200 SFCR collected
III. Results – Collect on websites
Both Solo andGroup reports
<<<<<<
<<<<<<
Results gathering
Applications
<
<<<<<
<<<<<<
<<<<<< <<<<<< <<<<<<
1.
2.
3.
4.1 4.2 4.3
5.
6.
Collect on websites
Document management
Taxonomies creation
Meta data Quanti. data Quali. data
Strictly ConfidentialP. 7
25% to 28% of rejected documents thanks to the modification date
5% to 7% of rejected documents thanks to the number of pages
2% to 3% of rejected documents thanks to the document size
26% to 31% of rejected document
on average
We can also use supervised machine learningmodels to define a topic recognition model and to classify if documents are real SFCR
Ko
Ok
Ko
Ok
Ok
Ko
III. Results – Document management
<<<<<<
<<<<<<
Results gathering
Applications
<
<<<<<
<<<<<<
<<<<<< <<<<<< <<<<<<
1.
2.
3.
4.1 4.2 4.3
5.
6.
Collect on websites
Document management
Taxonomies creation
Meta data Quanti. data Quali. data
Strictly Confidential
Several automated approaches to define the taxonomies and observe results
P. 8
Definition of several characteristics of study: LoB, Risks, Regulators, Outsourcer, Software,Countries, Modelling hypothesis, Reinsurance, Op. risk , Action Management, etc.
III. Results – Taxonomies creation
<<<<<<
<<<<<<
Results gathering
Applications
<
<<<<<
<<<<<<
<<<<<< <<<<<< <<<<<<
1.
2.
3.
4.1 4.2 4.3
5.
6.
Collect on websites
Document management
Taxonomies creation
Meta data Quanti. data Quali. data
Strictly ConfidentialP. 9
III. Results – Taxonomies creation
Definition of supervised machine learning models to study the topics / the architecture of document (page / paragraph level)
Definition of metrics lists (SCRs, turnover, number of emloyees, etc.) and table templates (QRTs) we would like to pick up in the documents
<<<<<<
<<<<<<
Results gathering
Applications
<
<<<<<
<<<<<<
<<<<<< <<<<<< <<<<<<
1.
2.
3.
4.1 4.2 4.3
5.
6.
Collect on websites
Document management
Taxonomies creation
Meta data Quanti. data Quali. data
Strictly Confidential
3 Data streams
P. 10
Meta data
Quantitative data
Qualitative data Company name ; URL ; Author; Creation/modification date; Page number; File size ; Encryptions level ; Language; Etc.
Balance sheet; Own funds ; Technical provisions SCRs LAC, capital add-one; Other:
o SCR sub modules;o Employees number;o Client number ;o Turnover.
Etc.
LoB; Top management actions; Outsourcers, software, auditor; Currency; Accounting information; Reinsurers ; Reinsurance methods Etc.
III. Results – Data
<<<<<<
<<<<<<
Results gathering
Applications
<
<<<<<
<<<<<<
<<<<<< <<<<<< <<<<<<
1.
2.
3.
4.1 4.2 4.3
5.
6.
Collect on websites
Document management
Taxonomies creation
Meta data Quanti. data Quali. data
Strictly Confidential
Use of meta data for different studies:
P. 11
« Good and bad students » observed through several charts (calendar and versioning):
<<<<<<
<<<<<<
Results gathering
Applications
<
<<<<<
<<<<<<
<<<<<< <<<<<< <<<<<<
1.
2.
3.
4.1 4.2 4.3
5.
6.
Collect on websites
Document management
Taxonomies creation
Meta data Quanti. data Quali. data
III. Results – Meta Data
Strictly ConfidentialP. 12
A text analysis on the documentary data base allows to understandmarket practices or market connections:
<<<<<<
<<<<<<
Results gathering
Applications
<
<<<<<
<<<<<<
<<<<<< <<<<<< <<<<<<
1.
2.
3.
4.1 4.2 4.3
5.
6.
Collect on websites
Document management
Taxonomies creation
Meta data Quanti. data Quali. data
III. Results – Qualitative Data
And many other possibilities: modelling hypothesis, risk appetite approach, assetstrategies, accounting methods, top management actions, etc.
Strictly Confidential
Collect numerical values such as SCR, MCR, number of clients, turnover,etc. but also of tables(QRTs like S.02.01 and S.25.01)
P. 13
0%10%20%30%40%50%60%70%80%90%
100%
ACO
RIS
MU
TUEL
LES
AFI
-ESC
A
ALT
IMA
ASS
URA
NCE
S
AN
IPS
ASS
URA
NCE
MU
TUEL
LE…
AVI
VA A
SSU
RAN
CES
BARC
LAYS
VIE
CAIS
SE M
UTU
ELLE
…
CCM
O M
UTU
ELLE
CEN
TRE
MU
TUAL
ISTE
…
CNP
ASS
URA
NCE
S
ENTR
ENO
US
EURO
P A
SSIS
TAN
CE
GA
N A
SSU
RAN
CES
GA
RAN
CE
HU
MAN
IS A
SSU
RAN
CES
ICAR
E A
SSU
RAN
CE
LA P
REVO
YAN
CE…
MFP
REC
AU
TIO
N
MU
TAM
E &
PLU
S
MU
TAM
I
MU
TAV
IE S
E
MU
TUEL
LE D
ES M
ETIE
RS…
MU
TUEL
LE D
ES…
MU
TUEL
LE D
ES…
MU
TUEL
LE D
U G
ROU
PE…
MU
TUEL
LE E
PARG
NE…
MU
TUEL
LE G
ENER
ALE
DE…
MU
TUEL
LE L
E LI
BRE
CHO
IX
MU
TUEL
LE M
IEU
X-ET
RE
MU
TUEL
LE N
ATIO
NA
LE…
MU
TUEL
LE N
ATIO
NA
LE…
MU
TUEL
LE S
AIN
T-…
MU
TUEL
LE S
OLI
MU
T…
MU
TUEL
LES
DU
PA
YS…
OCI
ANE
OPT
EVEN
PRED
ICA
- PR
EVO
YAN
CE…
SMPS
(SO
CIET
E…
SO L
YON
MU
TUEL
LE
SOLI
MU
T M
UTU
ELLE
DE…
SURA
VEN
IR A
SSU
RAN
CES
Bonds repartition
Name obligations obligations_de_gouvernements obligations_entreprises
obligations_de_gouvernements obligations_entreprises
titres_structures titres_garantis
<<<<<<
<<<<<<
Results gathering
Applications
<
<<<<<
<<<<<<
<<<<<< <<<<<< <<<<<<
1.
2.
3.
4.1 4.2 4.3
5.
6.
Collect on websites
Document management
Taxonomies creation
Meta data Quanti. data Quali. data
III. Results – Quantitative data
Strictly Confidential
IV. Application – Case 1
P. 14
Case 1: Non supervised machine learningmodel to create custom clusters
Goals and motivation:
o Allow to make comparison betweencompanies based on custom criteria
o For example: make cluster based on balance sheet and SCR and show discrimination due to reinsurancestrategies
Strictly Confidential
IV. Application – Case 2
P. 15
Case 2: supervised machine learning to predictestimated SCR values
Goals and motivation:
o Allow to create a proxy to estimate SCR values based on few variables
o For example: propose to customers a toolthat could help them to have more intuition regarding their SCR values and importance of drivers such as reinsurance
Strictly ConfidentialP. 16
Taxonomies creationrequires time and material
Scraping and parsing techniques are currently a subject of noisy debate
SFCR content (both quantitative and qualitative) is volatile
Performance on a simple machine is limited and requires discipline
Other constraints:
- Currency and unit for amounts are not always easy to capture
- Companies set up protection not to find documents easily
- Database is still small to apply large ML models
V. Limits
Strictly Confidential
We have been able to create a process to extract different types of information (properties, labels, numbers) from more than 1000 SFCR
General remark: it is not so easy to compare publications even if the main characterictic of these documents is to be « comparable »
P. 17
V. Conclusion
Automate and save time
Create valuefor applications
2
1Analyse and control easily
3
We have created an automated and scalable process using open source material that allows to produce structured datasets in less than 6 hours
We have developed different business cases that demonstrate all the potential of collecting unstructured data
Strictly Confidential
Contact
Reacfin SA - Place de l’Université, 25 B-1348 Louvain-la-Neuve (Belgium) - 0032 0 10 84 07 50 - www.reacfin.com
P. 18
About us
Reacfin is a consulting firm focused on setting up top quality, tailor-maderisk management frameworks and offering state-of-art actuarial andfinancial techniques, methodologies and risk strategies.
Thank you !
Do you have questions ?
Aurélien Couloumy, Head of Data ScienceM FR +33 6 26 13 09 97M BE +32 486 81 62 [email protected]
Follow us on Linkedin:
Strictly Confidential
IT Language:
Packages/Libraries (not exhaustive):
Database & Data visualization:
Appendix – IT framework
P. 19
Local files (CSV/txt files) and remote ones (AWS)
- tm, tidytext, udpipe, stringr, quanteda, etc.- rvest, rcrawler, etc.- caret, gbm, randomforest, factominer, etc.
- nltk, word2vec, doc2vec, etc.- tensorflow, scikit learn, etc.- pytesseract, opencv, etc.