TRANSCRIPT
2018 Predictive Analytics Symposium
Session 26: Industry Best Practices for Data Protection
SOA Antitrust Compliance Guidelines · SOA Presentation Disclaimer
Industry Best Practices for Data Protection
Brad Lipic
Vice President, Head of Data Strategy
Global Research and Data Analytics (GRDA)
September 20, 2018
SOA Antitrust Compliance Guidelines
Active participation in the Society of Actuaries is an important aspect of membership. While the positive contributions of professional societies and associations are well-recognized and encouraged, association activities are vulnerable to close antitrust scrutiny. By their very nature, associations bring together industry competitors and other market participants.
The United States antitrust laws aim to protect consumers by preserving the free economy and prohibiting anti-competitive business practices; they promote competition. There are both state and federal antitrust laws, although state antitrust laws closely follow federal law. The Sherman Act is the primary U.S. antitrust law pertaining to association activities. The Sherman Act prohibits every contract, combination or conspiracy that places an unreasonable restraint on trade. There are, however, some activities that are illegal under all circumstances, such as price fixing, market allocation and collusive bidding.
There is no safe harbor under the antitrust law for professional association activities. Therefore, association meeting participants should refrain from discussing any activity that could potentially be construed as having an anti-competitive effect. Discussions relating to product or service pricing, market allocations, membership restrictions, product standardization or other conditions on trade could arguably be perceived as a restraint on trade and may expose the SOA and its members to antitrust enforcement procedures.
While participating in all SOA in-person meetings, webinars, teleconferences or side discussions, you should avoid discussing competitively sensitive information with competitors and follow these guidelines:
• Do not discuss prices for services or products or anything else that might affect prices.
• Do not discuss what you or other entities plan to do in particular geographic or product markets or with particular customers.
• Do not speak on behalf of the SOA or any of its committees unless specifically authorized to do so.
• Do leave a meeting where any anticompetitive pricing or market allocation discussion occurs.
• Do alert SOA staff and/or legal counsel to any concerning discussions.
• Do consult with legal counsel before raising any matter or making a statement that may involve competitively sensitive information.
Adherence to these guidelines involves not only avoidance of antitrust violations, but avoidance of behavior which might be so construed. These guidelines only provide an overview of prohibited activities. SOA legal counsel reviews meeting agendas and materials as deemed appropriate, and any discussion that departs from the formal agenda should be scrutinized carefully. Antitrust compliance is everyone's responsibility; however, please seek legal counsel if you have any questions or concerns.
Presentation Disclaimer
Presentations are intended for educational purposes only and do not replace independent professional judgment. Statements of fact and opinions expressed are those of the participants individually and, unless expressly stated to the contrary, are not the opinion or position of the Society of Actuaries, its cosponsors or its committees. The Society of Actuaries does not endorse or approve, and assumes no responsibility for, the content, accuracy or completeness of the information presented. Attendees should note that the sessions are audio-recorded and may be published in various media, including print, audio and video formats without further notice.
Agenda
Data Trends
Data Protection Trends
Supportive Frameworks and Tools for Data Protection
A presentation of two halves
Establish a Defense…
…and enable an Offense (Value Creation)
2018 Trends to Follow
Personal Data is Constantly Being Collected
What data could we use?
[Chart: Estimated average number of customer interactions per year, by industry: Life, Non-Life, Motor, Health, Retail, Telco, Banking, Technology Companies, Social Media]
Source: McKinsey & Company, presented in Transforming Life Insurance with Design Thinking
A wide variety of information is being accumulated and used

Basic Demographics: Age; Gender; Marital status; Number of children; Where you live
Socio-Economic: Occupation; Salary; Benefit amount; House price; Education; Car ownership; Assets; Credit behaviour
Behavioral: Smoker status; Alcohol consumption; Driving behaviour; Avocations; Social media usage; Insurance purchasing; Investment choices; Previous claims history; Sleep; Activity; Diet
Health and Biometrics: Disease history; Family history; Prescription history; Heart rate / HRV / etc.; Blood pressure; BMI; Genetic data; 'Omic data; Well-being; Stress; Depression; Care provider
Linking Data Sources
Unique insights and robust solutions to business problems do not exist within a single data source… neither should our analyses and solutions
There's Something About Mary…
Data sources surrounding Mary:
• Traditional U/W Data
• Traditional Policy Data
• Traditional Distribution Data
• Rx & Wellness Data
• Financial & Credit Data
• Lifestyle and Psychographic Data
• Health and Disease States
• MVR Data
• Social & IoT Data
Datasets, in combination, can create unique insights about Mary
Without PII, we are limited in our ability to understand the multifaceted forces that predict future life events for Mary (and others)
PII is required for linkages
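To make the linkage point concrete, here is a minimal Python sketch (not from the presentation; the SSN-style key, field names, and values are all hypothetical) showing how a shared PII key is what allows separate datasets to be combined into one view of Mary:

```python
# Hypothetical sources, each keyed on the same PII element (an SSN here).
policy_data = {"123-45-6789": {"face_amount": 250_000, "issue_year": 2015}}
rx_data     = {"123-45-6789": {"statin_use": True}}
credit_data = {"123-45-6789": {"credit_tier": "A"}}

def link_on_pii(pii_key: str, *sources: dict) -> dict:
    """Merge every source's record for one individual, keyed on PII."""
    combined = {}
    for source in sources:
        combined.update(source.get(pii_key, {}))
    return combined

print(link_on_pii("123-45-6789", policy_data, rx_data, credit_data))
# {'face_amount': 250000, 'issue_year': 2015, 'statin_use': True, 'credit_tier': 'A'}
```

Strip the key from any one source and its attributes can no longer be attached to Mary's record, which is exactly the limitation described above.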
Insurance Value Chain and Applications

Pre-sale: New rating factors; Competitive pricing strategy; Agent assessment; Propensity to buy; Distributor quality management; Propensity to complete purchase
Underwriting: Underwriting risk segmentation; Underwriting triage; Fraud / non-disclosure protection; Determine underwriting ratings
In-force management: Multivariate experience analysis; Cross-sell and up-sell; Proactive lapse management; Customer lifetime value
Claims: Claims triage; Claim management; Claim validation; Fraud / non-disclosure protection

[Chart legend: Level of Demand: High / Medium / Low; Structured/Unstructured Data]
Data Protection Trends and Regulatory Climates
In the News: Data Breaches
According to Forbes, the average cost of a data breach incident in 2018 was highest in the U.S., at $7.91 million.
https://news.sky.com/story/yahoo-fined-250000-over-2014-data-breach-11402490
https://www.esecurityplanet.com/network-security/att-hit-with-record-breaking-25-million-data-breach-fine.html
https://digitalguardian.com/blog/hilton-was-fined-700k-data-breach-under-gdpr-it-would-be-420m
Regulations driving change around the globe
• GDPR: General Data Protection Regulation (European Union)
• APPI: Act on the Protection of Personal Information (Japan)
• Cyber Security Law (China)
• PDPO: The Personal Data (Privacy) Ordinance (Hong Kong)
• APEC CBPR: Asia-Pacific Economic Cooperation's Cross-Border Privacy Rules
• POPI: Protection of Personal Information Act (South Africa)
• Shine the Light: California's Shine the Light privacy law
Medical Data Was King → Motor Vehicle Records Validated → New Forms of Data Hit the Market

Supportive Frameworks and Tools
Privacy by Design: A Framework Developed by Ann Cavoukian
Privacy is to be taken into account throughout the design and operation of IT systems, networked infrastructure, and business practices.
1. Proactive not reactive; preventative not remedial: take action in advance of the event, not after the fact
2. Privacy as the default setting: responsibility rests with IT system design, not the individual
3. Privacy embedded into design: core and integral, not an add-on with potentially diminished utility
4. Full functionality (positive-sum, not zero-sum): a win-win for all legitimate business purposes, not a trade-off between privacy and security
5. End-to-end security (full lifecycle protection): data onboarding through destruction, and all steps in between, are included
6. Visibility and transparency (keep it open): ensure the system is subject to ongoing compliance and validation to stakeholders
7. Respect for user privacy (keep it user-centric): empower the user with strong privacy defaults and end-user options
Data Protection Capabilities
• Catalog: provides visibility into the data available in an ecosystem, sourced from internal and external sources. This may include origin or storage source, data content, usage, or other elements the organization deems important to see and track.
• Detect and Classify: analyzes the data set to identify fields that could contain PII, as well as combinations of fields that could be used to identify an individual. Data is then classified according to defined sensitive fields.
• Master Person Index: evaluates two or more data records containing the same or similar data elements to determine whether they are for the same individual.
• Protect: protects sensitive data through a variety of techniques, including anonymization, pseudonymization and redaction, and enables only authorized users to see specific data elements in the clear.
Capability: Catalog
Benefits
• Enhances the data onboarding process
• Provides visibility and single source of truth for data, sources and projects using them
• Opportunity to cross-leverage sources
• Identifies gaps for future data acquisitions
• Enables regulatory compliance efforts
Risks
• Potential security concern, given the catalog itself could be exposed
• Potential financial impact at different capability levels
Measures of Success
• # of sources
• % of sources in catalog
• Date range of data available
• Geographies available
• Attributes available by geo or demographic
• Where PII is located
• Time spent cataloging
Ability to register and track data sources
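As an illustration only (not a tool referenced in the session), a catalog at its simplest is a registry of source metadata. The sketch below uses hypothetical fields that mirror the measures of success above:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class CatalogEntry:
    """One registered data source; field names are illustrative."""
    name: str
    origin: str                     # internal system or external vendor
    date_range: tuple[date, date]   # coverage window of the data
    geographies: list[str]
    contains_pii: bool
    projects_using: list[str] = field(default_factory=list)

catalog: dict[str, CatalogEntry] = {}

def register(entry: CatalogEntry) -> None:
    """Register a source so teams can discover and cross-leverage it."""
    catalog[entry.name] = entry

register(CatalogEntry("rx_history", "External Vendor X",
                      (date(2010, 1, 1), date(2018, 6, 30)),
                      ["US"], contains_pii=True))
print(len(catalog))  # feeds the '# of sources' measure above
```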
Capability: Detect & Classify
Benefits
• Enables regulatory compliance (e.g. GDPR)
• Provides visibility into location of PII in applications and data sets
• Increases auditability
• Creates organizational consensus and clarity on PII definitions
Risks
• Mishandling of PII due to vague definitions
• Hefty regulatory fines due to regulatory noncompliance
• Adoption resistance depending on level of maturity
• Decreased productivity due to additional process steps
Measures of Success
• # of data sets classified
• # of classified fields/columns
• Adoption by business units
• Dollars fined
• # of classifications based on regulatory entity
• # of instances estimated vs. observed
• Location of PII
Ability to identify fields or combinations of fields that contain or constitute PII / SPII
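A minimal sketch of the detection idea, assuming regex scanning of sampled column values; the patterns, the 80% match threshold, and all names are illustrative choices rather than any specific product's behavior. Real detectors also weigh column names, context, and combinations of fields:

```python
import re

# Illustrative PII patterns; production rule sets are far richer.
PII_PATTERNS = {
    "ssn":   re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$"),
    "phone": re.compile(r"^\(?\d{3}\)?[ -]?\d{3}-?\d{4}$"),
}

def classify_column(values: list[str]) -> str:
    """Label a column as a PII type if most sampled values match a pattern."""
    if not values:
        return "non-pii"
    for label, pattern in PII_PATTERNS.items():
        hits = sum(bool(pattern.match(v)) for v in values)
        if hits >= 0.8 * len(values):   # tolerate some dirty records
            return label
    return "non-pii"

print(classify_column(["123-45-6789", "987-65-4321"]))  # -> ssn
```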
Capability: Master Person Index
Benefits
• Reduces time spent on manual linkage
• Manages jumbo risks / retention limits
• Builds trust in assigning aliases across different partners and different data sets
• Increases data quality
Risks
• Could lead to inefficient use of partner data sets if incorrectly linked
• Potential data redundancy
• Potential flaws in procedures, selection bias
Measures of Success
• Reduction in record linkage time
• % of false positives or false negatives
• % accuracy based on samples taken
• # of new partnerships acquired over time
• % of enterprise adoption
• # of linkable datasets
Evaluates two or more data records containing the same or similar data elements to determine whether they are for the same individual
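The sketch below illustrates the matching idea with a weighted-agreement score; real master person indexes typically use probabilistic (Fellegi-Sunter style) or machine-learned linkage, and the weights and threshold here are invented purely for illustration:

```python
# Illustrative weights: an exact national-ID match carries most of the
# evidence; name and DOB agreement contribute the rest.
WEIGHTS = {"ssn": 0.6, "name": 0.25, "dob": 0.15}

def match_score(a: dict, b: dict) -> float:
    """Sum the weights of identifying fields on which both records agree."""
    return sum(w for f, w in WEIGHTS.items()
               if a.get(f) and a.get(f) == b.get(f))

def same_person(a: dict, b: dict, threshold: float = 0.7) -> bool:
    return match_score(a, b) >= threshold

rec1 = {"ssn": "123-45-6789", "name": "john doe", "dob": "1983-10-12"}
rec2 = {"ssn": "123-45-6789", "name": "jonathan doe", "dob": "1983-10-12"}
print(same_person(rec1, rec2))  # True: SSN + DOB agreement outweighs name drift
```

Tuning the threshold trades the two error measures above against each other: lower thresholds raise false positives, higher ones raise false negatives.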
Capability: Protect
Benefits
• Minimizes risk and engenders trust
• PII/SPII unintended disclosure risk is reduced
• Enterprise-wide implementation would ensure GDPR compliance
Risks
• Removal of fields could impede analysis
• Adoption resistance
• If policy definitions are too lenient, the tool may not be leveraged efficiently
• Enterprise integration risk due to the potential number of touchpoints
Measures of Success
• Relies on the detection capability to quantify protected data
• Data risk assessment reduction
• # of supported applications
• # of fields
• # of users
• % of enterprise adoption
• # of new external partners acquired over time
Ability to protect PII from unnecessary exposure in the enterprise pipeline
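As a minimal illustration of field-level protection with role-based access (the roles, field list, and masking style are assumptions, not the presenter's implementation):

```python
# Fields treated as sensitive, and the roles allowed to see them in the clear.
SENSITIVE_FIELDS = {"national_id", "name", "dob"}
AUTHORIZED_ROLES = {"claims"}   # analysts receive masked views

def protected_view(record: dict, role: str) -> dict:
    """Return the record as-is for authorized roles, redacted otherwise."""
    if role in AUTHORIZED_ROLES:
        return dict(record)
    return {k: ("***REDACTED***" if k in SENSITIVE_FIELDS else v)
            for k, v in record.items()}

record = {"national_id": "123456789", "name": "John Doe",
          "dob": "1983-10-12", "bmi": 24.5}
print(protected_view(record, "analyst"))
# {'national_id': '***REDACTED***', 'name': '***REDACTED***',
#  'dob': '***REDACTED***', 'bmi': 24.5}
```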
A Few Protection Methods
Data Protection and Anonymized Alternatives

Anonymization
• Webster: remove identifying information from something so that the original cannot be known
• Distinguishing features: one-way; permanently removes the personally identifiable information with no way of getting back to it

Differential Privacy
• Webster: (none)
• In 'data practice': a method that seeks to maximize accuracy while minimizing the ability to identify data subjects
• Distinguishing features: adds random noise to the data while retaining meaningful aggregate statistics; can be applied to the "ride-along" data attributes

Tokenization
• Webster: symbol representation, or distinguishing feature
• In 'data practice': random generation of a value for plain text, which is then stored in a mapping database
• Distinguishing features: can switch the data between the "masked token" and "in the clear" as it moves through workflows

Pseudonymization
• Webster: use of a fictitious name
• In 'data practice': processing of data that can no longer be attributed to a specific data subject without the use of additional information
• Distinguishing features: adds a field with a pseudonym associated with the identity; the preferred method when needing to link data sources on the same individual identity
Anonymization

Raw Input:
National ID: 123456789 | Name: John Doe | DOB: 10/12/1983 | HIV Indicator: Y | Smoker Indicator: Y | BMI: 24.5

Protected Output (identifying fields removed):
HIV Indicator: Y | Smoker Indicator: Y | BMI: 24.5

Advantages: low risk of re-identification; theoretically impossible to get back to the identity of an individual (if anonymized properly)
Disadvantages: static; cannot merge with other data sources
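A minimal sketch of the one-way removal shown above, using the slide's field names; note there is deliberately no way to recover the dropped identifiers:

```python
# Fields to strip permanently; once dropped, the record cannot be re-identified
# from this output alone (the one-way property noted above).
IDENTIFYING_FIELDS = {"national_id", "name", "dob"}

def anonymize(record: dict) -> dict:
    """Return only the non-identifying, analytic attributes."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}

raw = {"national_id": "123456789", "name": "John Doe", "dob": "1983-10-12",
       "hiv_indicator": "Y", "smoker_indicator": "Y", "bmi": 24.5}
print(anonymize(raw))
# {'hiv_indicator': 'Y', 'smoker_indicator': 'Y', 'bmi': 24.5}
```

The disadvantage follows directly: with no key left in the output, this record can never be merged with another source for the same person.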
Differential Privacy – The Issue

National ID | Name | HIV Indicator
123 | John | Y
456 | Jane | Y
789 | Bob | Y
111 | Betsy | N
333 | Zach | N

Public Aggregate Analysis: 3 out of 5 people have HIV
What about Bob? Suppose an adversary obtains the aggregate analysis and all data records except for Bob's:

National ID | Name | HIV Indicator
123 | John | Y
456 | Jane | Y
789 | Bob | ?
111 | Betsy | N
333 | Zach | N

Because the aggregate says 3 out of 5 people have HIV, and only two of the known records are positive, the adversary can determine that Bob has HIV.
Differential Privacy – A Solution

Before Differential Privacy:
National ID | Name | HIV Indicator
123 | John | Y
456 | Jane | Y
789 | Bob | Y
111 | Betsy | N
333 | Zach | N
… additional people …

Aggregate Analysis: 600 out of 1,000 people have HIV

Differential Privacy Applied: adds random noise to a returned query via a mathematical function with a specific privacy parameter, ε

After Differential Privacy:
Agg. Analysis #1: 574 out of 960 have HIV (59.8%)
Agg. Analysis #2: 589 out of 980 have HIV (60.1%)
Differential privacy ensures the two analyses above are differentially private

Advantages: minimizes the risk of identifying individuals by stitching together disparate pieces of information
Disadvantages: must take into account the number of queries/analyses that will be conducted to ensure differential privacy is retained
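For illustration, the standard way to realize this is the Laplace mechanism: a counting query has sensitivity 1 (adding or removing one person changes the count by at most 1), so adding Laplace noise with scale 1/ε yields ε-differential privacy for a single query. The sketch below is a minimal standard-library Python version; the records and choice of ε are hypothetical, and, per the disadvantage above, each additional query consumes privacy budget:

```python
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float) -> float:
    """Epsilon-differentially-private count of records matching `predicate`.

    Counting queries have sensitivity 1, so Laplace noise with scale
    1/epsilon suffices for one query.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical records echoing the slide's example (~1,000 people, 600 positive)
people = [{"name": "John", "hiv": "Y"}, {"name": "Jane", "hiv": "Y"},
          {"name": "Bob", "hiv": "Y"}, {"name": "Betsy", "hiv": "N"},
          {"name": "Zach", "hiv": "N"}] * 200

# Two runs return slightly different noisy counts near the true 600
print(private_count(people, lambda r: r["hiv"] == "Y", epsilon=0.1))
print(private_count(people, lambda r: r["hiv"] == "Y", epsilon=0.1))
```

The noise defeats the differencing attack on Bob: subtracting the known records from a noisy aggregate no longer pins down his value.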
Tokenization

In-force Data:
National ID: 123456789 | Name: John Doe | DOB: 10/12/1983 | HIV Indicator: Y | Smoker Indicator: Y | BMI: 24.5

User: Analyst (tokenized view):
National ID: 128*$ksdk | Name: 893fdz*& | DOB: 90djg24 | HIV Indicator: Y | Smoker Indicator: Y | BMI: 24.5

User: Claims (in the clear):
National ID: 123456789 | Name: John Doe | DOB: 10/12/1983 | HIV Indicator: Y | Smoker Indicator: Y | BMI: 24.5

Advantages: protects the identity of the individual when knowing it is not necessary; software can tokenize/de-tokenize as data flows through different systems and users
Disadvantages: tokens by themselves are not consistent for the same data subject coming from different data lineages
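A minimal sketch of the token/de-token flow, assuming an in-memory dict stands in for the mapping database and the `secrets` module supplies the random tokens; everything here is illustrative:

```python
import secrets

# In-memory stand-in for the token vault (real systems use a hardened
# mapping database with its own access controls).
vault: dict[str, str] = {}

def tokenize(value: str) -> str:
    """Replace a sensitive value with a random token; record the mapping."""
    token = secrets.token_urlsafe(8)   # random, not derived from the value
    vault[token] = value
    return token

def detokenize(token: str) -> str:
    """Look the original value back up; restricted to authorized workflows."""
    return vault[token]

t = tokenize("123456789")
print(t)               # an opaque random token, like the analyst's view above
print(detokenize(t))   # '123456789' back in the clear, like the claims view
```

Because each call generates a fresh random token, the same national ID tokenized in two separate pipelines yields two different tokens, which is exactly the cross-lineage inconsistency noted as a disadvantage.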
Pseudonymization

Raw Input (two sources describing the same person):
Name: John Doe | DOB: 10/12/1983 | HIV Indicator: Y
Name: Johnathon Doe | YOB: 1983 | Smoker Indicator: N

Protected Output (linked by a shared pseudonym):
Pseudonym: 123ABC | DOB: (removed) | HIV Indicator: Y
Pseudonym: 123ABC | YOB: (removed) | Smoker Indicator: N

Advantages: enables merging of de-identified datasets via common pseudonyms; dynamic ability to merge new sources to existing sources over time
Disadvantages: requires significant investment in an identity management service to link data subjects; requires IT and role-based separation of duties to minimize re-identification
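One common way to implement this (an assumption for illustration, not necessarily the presenter's method) is for the identity management service to derive a stable pseudonym from an identifying attribute with a secret key, so two de-identified sources can still be merged on the same person:

```python
import hashlib
import hmac

# The key must be held only by the identity management service; anyone
# without it cannot reverse or regenerate pseudonyms (the separation-of-
# duties requirement above). Key and field names are illustrative.
SECRET_KEY = b"held-by-identity-management-service"

def pseudonym(national_id: str) -> str:
    """Derive a stable pseudonym via keyed HMAC; same input, same output."""
    return hmac.new(SECRET_KEY, national_id.encode(),
                    hashlib.sha256).hexdigest()[:8]

source_a = {"national_id": "123456789", "hiv_indicator": "Y"}
source_b = {"national_id": "123456789", "smoker_indicator": "N"}

pid = pseudonym(source_a.pop("national_id"))
merged = {"pseudonym": pid, **source_a}
if pseudonym(source_b.pop("national_id")) == pid:
    merged.update(source_b)
print(merged)
# {'pseudonym': '…', 'hiv_indicator': 'Y', 'smoker_indicator': 'N'}
```

Unlike per-pipeline tokenization, the keyed derivation is deterministic, which is what makes cross-source linkage possible, and also why the key itself must be protected so strictly.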
A Structure for Focus on Data
Two key components ensure (i) data governance/control and (ii) value maximization

Process: New data-driven initiative identified → Identify data manager for initiative → Document data specs & intended use cases → Privacy impact & data plan

Data Organization:
• Isolated virtual server: sensitive data
• Analytics sandbox: de-identified data

Value Maximization: explore new data sources; augment relevant data; link data

Access governance controls apply throughout
In Closing…
Why does data protection matter to a Data Science capability?
• Prevent inappropriate use:
  - Damage to reputation
  - Loss of business (current and future)
  - Substantial fines, damages, and costs
• Engender trust from data contributors:
  - Keep the data flowing, and therefore 'things to do'