TRANSCRIPT
2018 Predictive Analytics Symposium
Session 26: Industry Best Practices for Data Protection
SOA Antitrust Compliance Guidelines · SOA Presentation Disclaimer
Industry Best Practices for Data Protection
Brad Lipic
Vice President, Head of Data Strategy
Global Research and Data Analytics (GRDA)
September 20, 2018
SOA Antitrust Compliance Guidelines
Active participation in the Society of Actuaries is an important aspect of membership. While the positive contributions of professional societies and associations are well-recognized and encouraged, association activities are vulnerable to close antitrust scrutiny. By their very nature, associations bring together industry competitors and other market participants.
The United States antitrust laws aim to protect consumers by preserving the free economy and prohibiting anti-competitive business practices; they promote competition. There are both state and federal antitrust laws, although state antitrust laws closely follow federal law. The Sherman Act is the primary U.S. antitrust law pertaining to association activities. The Sherman Act prohibits every contract, combination or conspiracy that places an unreasonable restraint on trade. There are, however, some activities that are illegal under all circumstances, such as price fixing, market allocation and collusive bidding.
There is no safe harbor under the antitrust law for professional association activities. Therefore, association meeting participants should refrain from discussing any activity that could potentially be construed as having an anti-competitive effect. Discussions relating to product or service pricing, market allocations, membership restrictions, product standardization or other conditions on trade could arguably be perceived as a restraint on trade and may expose the SOA and its members to antitrust enforcement procedures.
While participating in all SOA in-person meetings, webinars, teleconferences or side discussions, you should avoid discussing competitively sensitive information with competitors and follow these guidelines:
• Do not discuss prices for services or products or anything else that might affect prices.
• Do not discuss what you or other entities plan to do in particular geographic or product markets or with particular customers.
• Do not speak on behalf of the SOA or any of its committees unless specifically authorized to do so.
• Do leave a meeting where any anticompetitive pricing or market allocation discussion occurs.
• Do alert SOA staff and/or legal counsel to any concerning discussions.
• Do consult with legal counsel before raising any matter or making a statement that may involve competitively sensitive information.
Adherence to these guidelines involves not only avoidance of antitrust violations, but avoidance of behavior which might be so construed. These guidelines only provide an overview of prohibited activities. SOA legal counsel reviews meeting agendas and materials as deemed appropriate, and any discussion that departs from the formal agenda should be scrutinized carefully. Antitrust compliance is everyone's responsibility; however, please seek legal counsel if you have any questions or concerns.
Presentation Disclaimer
Presentations are intended for educational purposes only and do not replace independent professional judgment. Statements of fact and opinions expressed are those of the participants individually and, unless expressly stated to the contrary, are not the opinion or position of the Society of Actuaries, its cosponsors or its committees. The Society of Actuaries does not endorse or approve, and assumes no responsibility for, the content, accuracy or completeness of the information presented. Attendees should note that the sessions are audio-recorded and may be published in various media, including print, audio and video formats without further notice.
Agenda
Data Trends
Data Protection Trends
Supportive Frameworks and Tools for Data Protection
A presentation of two halves
Establish a Defense…
…and enable an Offense (Value Creation)
2018 Trends to Follow
Personal Data is Constantly Being Collected
What data could we use?
[Chart: Estimated average number of customer interactions per year, by industry: Life, Non-Life, Motor, Health, Retail, Telco, Banking, Technology Companies, Social Media]
Source: McKinsey & Company, presented in Transforming Life Insurance with Design Thinking
A wide variety of information is being accumulated and used

Basic Demographics: Age; Gender; Marital status; Number of children; Where you live
Socio-Economic: Occupation; Salary; Benefit amount; House price; Education; Car ownership; Assets; Credit behaviour
Behavioral: Smoker status; Alcohol consumption; Driving behaviour; Avocations; Social media usage; Insurance purchasing; Investment choices; Previous claims history; Sleep; Activity; Diet
Health and Biometrics: Disease history; Family history; Prescription history; Heart rate / HRV / etc.; Blood pressure; BMI; Genetic data; 'Omic data; Well-being; Stress; Depression; Care provider
Linking Data Sources
Unique insights and robust solutions to business problems do not exist within a single data source… neither should our analyses and solutions
There's Something About Mary…
Data sources surrounding Mary:
• Traditional U/W Data
• Traditional Policy Data
• Traditional Distribution Data
• Rx & Wellness Data
• Financial & Credit Data
• Lifestyle and Psychographic Data
• Health and Disease States
• MVR Data
• Social & IoT Data
Datasets, in combination, can create unique insights about Mary
Without PII, we are limited in our ability to understand the multifaceted forces that predict future life events for Mary (and others)
PII is required for linkages
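To make the linkage point concrete, here is a minimal Python sketch (not from the presentation; the SSN-style key, field names, and values are all hypothetical) showing how a shared PII key is what allows separate datasets to be combined into one view of Mary:

```python
# Hypothetical sources, each keyed on the same PII element (an SSN here).
policy_data = {"123-45-6789": {"face_amount": 250_000, "issue_year": 2015}}
rx_data     = {"123-45-6789": {"statin_use": True}}
credit_data = {"123-45-6789": {"credit_tier": "A"}}

def link_on_pii(pii_key: str, *sources: dict) -> dict:
    """Merge every source's record for one individual, keyed on PII."""
    combined = {}
    for source in sources:
        combined.update(source.get(pii_key, {}))
    return combined

print(link_on_pii("123-45-6789", policy_data, rx_data, credit_data))
# {'face_amount': 250000, 'issue_year': 2015, 'statin_use': True, 'credit_tier': 'A'}
```

Strip the key from any one source and its attributes can no longer be attached to Mary's record, which is exactly the limitation described above.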
Insurance Value Chain and Applications

Pre-sale: New rating factors; Competitive pricing strategy; Agent assessment; Propensity to buy; Distributor quality management; Propensity to complete purchase
Underwriting: Underwriting risk segmentation; Underwriting triage; Fraud / non-disclosure protection; Determine underwriting ratings
In-force management: Multivariate experience analysis; Cross-sell and up-sell; Proactive lapse management; Customer lifetime value
Claims: Claims triage; Claim management; Claim validation; Fraud / non-disclosure protection

[Chart legend: Level of Demand: High / Medium / Low; Structured/Unstructured Data]
Data Protection Trends and Regulatory Climates
In the News: Data Breaches
According to Forbes, the average cost of a data breach incident in 2018 was highest in the U.S., at $7.91 million.
https://news.sky.com/story/yahoo-fined-250000-over-2014-data-breach-11402490
https://www.esecurityplanet.com/network-security/att-hit-with-record-breaking-25-million-data-breach-fine.html
https://digitalguardian.com/blog/hilton-was-fined-700k-data-breach-under-gdpr-it-would-be-420m
Regulations driving change around the globe
• GDPR: General Data Protection Regulation (European Union)
• APPI: Act on the Protection of Personal Information (Japan)
• Cyber Security Law (China)
• PDPO: The Personal Data (Privacy) Ordinance (Hong Kong)
• APEC CBPR: Asia-Pacific Economic Cooperation's Cross-Border Privacy Rules
• POPI: Protection of Personal Information Act (South Africa)
• Shine the Light: California's Shine the Light privacy law
Medical Data Was King → Motor Vehicle Records Validated → New Forms of Data Hit the Market

Supportive Frameworks and Tools
Privacy by Design: A Framework Developed by Ann Cavoukian
Privacy is to be taken into account throughout the design and operation of IT systems, networked infrastructure, and business practices.
1. Proactive not reactive; preventative not remedial: take action in advance of the event, not after the fact
2. Privacy as the default setting: responsibility rests with IT system design, not the individual
3. Privacy embedded into design: core and integral, not an add-on with potentially diminished utility
4. Full functionality (positive-sum, not zero-sum): a win-win for all legitimate business purposes, not a trade-off between privacy and security
5. End-to-end security (full lifecycle protection): data onboarding through destruction, and all steps in between, are included
6. Visibility and transparency (keep it open): ensure the system is subject to ongoing compliance and validation to stakeholders
7. Respect for user privacy (keep it user-centric): empower the user with strong privacy defaults and end-user options
Data Protection Capabilities
• Catalog: provides visibility into the data available in an ecosystem, sourced from internal and external sources. This may include origin or storage source, data content, usage, or other elements the organization deems important to see and track.
• Detect and Classify: analyzes the data set to identify fields that could contain PII, as well as combinations of fields that could be used to identify an individual. Data is then classified according to defined sensitive fields.
• Master Person Index: evaluates two or more data records containing the same or similar data elements to determine whether they are for the same individual.
• Protect: protects sensitive data through a variety of techniques, including anonymization, pseudonymization and redaction, and enables only authorized users to see specific data elements in the clear.
Capability: Catalog
Benefits
• Enhances the data onboarding process
• Provides visibility and single source of truth for data, sources and projects using them
• Opportunity to cross-leverage sources
• Identifies gaps for future data acquisitions
• Enables regulatory compliance efforts
Risks
• Potential security concern, given the catalog itself could be exposed
• Potential financial impact at different capability levels
Measures of Success
• # of sources
• % of sources in catalog
• Date range of data available
• Geographies available
• Attributes available by geo or demographic
• Where PII is located
• Time spent cataloging
Ability to register and track data sources
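As an illustration only (not a tool referenced in the session), a catalog at its simplest is a registry of source metadata. The sketch below uses hypothetical fields that mirror the measures of success above:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class CatalogEntry:
    """One registered data source; field names are illustrative."""
    name: str
    origin: str                     # internal system or external vendor
    date_range: tuple[date, date]   # coverage window of the data
    geographies: list[str]
    contains_pii: bool
    projects_using: list[str] = field(default_factory=list)

catalog: dict[str, CatalogEntry] = {}

def register(entry: CatalogEntry) -> None:
    """Register a source so teams can discover and cross-leverage it."""
    catalog[entry.name] = entry

register(CatalogEntry("rx_history", "External Vendor X",
                      (date(2010, 1, 1), date(2018, 6, 30)),
                      ["US"], contains_pii=True))
print(len(catalog))  # feeds the '# of sources' measure above
```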
Capability: Detect & Classify
Benefits
• Enables regulatory compliance (e.g. GDPR)
• Provides visibility into location of PII in applications and data sets
• Increases auditability
• Creates organizational consensus and clarity on PII definitions
Risks
• Mishandling of PII due to vague definitions
• Hefty regulatory fines due to regulatory noncompliance
• Adoption resistance depending on level of maturity
• Decreased productivity due to additional process steps
Measures of Success
• # of data sets classified
• # of classified fields/columns
• Adoption by business units
• Dollars fined
• # of classifications based on regulatory entity
• # of instances estimated vs. observed
• Location of PII
Ability to identify fields or combinations of fields that contain or constitute PII / SPII
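A minimal sketch of the detection idea, assuming regex scanning of sampled column values; the patterns, the 80% match threshold, and all names are illustrative choices rather than any specific product's behavior. Real detectors also weigh column names, context, and combinations of fields:

```python
import re

# Illustrative PII patterns; production rule sets are far richer.
PII_PATTERNS = {
    "ssn":   re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$"),
    "phone": re.compile(r"^\(?\d{3}\)?[ -]?\d{3}-?\d{4}$"),
}

def classify_column(values: list[str]) -> str:
    """Label a column as a PII type if most sampled values match a pattern."""
    if not values:
        return "non-pii"
    for label, pattern in PII_PATTERNS.items():
        hits = sum(bool(pattern.match(v)) for v in values)
        if hits >= 0.8 * len(values):   # tolerate some dirty records
            return label
    return "non-pii"

print(classify_column(["123-45-6789", "987-65-4321"]))  # -> ssn
```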
Capability: Master Person Index
Benefits
• Reduces time spent on manual linkage
• Manages jumbo risks / retention limits
• Builds trust in assigning aliases across different partners and different data sets
• Increases data quality
Risks
• Could lead to inefficient use of partner data sets if incorrectly linked
• Potential data redundancy
• Potential flaws in procedures, selection bias
Measures of Success
• Reduction in record linkage time
• % of false positives or false negatives
• % accuracy based on samples taken
• # of new partnerships acquired over time
• % of enterprise adoption
• # of linkable datasets
Evaluates two or more data records containing the same or similar data elements to determine whether they are for the same individual
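The sketch below illustrates the matching idea with a weighted-agreement score; real master person indexes typically use probabilistic (Fellegi-Sunter style) or machine-learned linkage, and the weights and threshold here are invented purely for illustration:

```python
# Illustrative weights: an exact national-ID match carries most of the
# evidence; name and DOB agreement contribute the rest.
WEIGHTS = {"ssn": 0.6, "name": 0.25, "dob": 0.15}

def match_score(a: dict, b: dict) -> float:
    """Sum the weights of identifying fields on which both records agree."""
    return sum(w for f, w in WEIGHTS.items()
               if a.get(f) and a.get(f) == b.get(f))

def same_person(a: dict, b: dict, threshold: float = 0.7) -> bool:
    return match_score(a, b) >= threshold

rec1 = {"ssn": "123-45-6789", "name": "john doe", "dob": "1983-10-12"}
rec2 = {"ssn": "123-45-6789", "name": "jonathan doe", "dob": "1983-10-12"}
print(same_person(rec1, rec2))  # True: SSN + DOB agreement outweighs name drift
```

Tuning the threshold trades the two error measures above against each other: lower thresholds raise false positives, higher ones raise false negatives.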
Capability: Protect
Benefits
• Minimizes risk and engenders trust
• PII/SPII unintended disclosure risk is reduced
• Enterprise-wide implementation would ensure GDPR compliance
Risks
• Removal of fields could impede analysis
• Adoption resistance
• If policy definitions are too lenient, the tool may not be leveraged efficiently
• Enterprise integration risk due to the potential number of touchpoints
Measures of Success
• Relies on the detection capability to quantify protected data
• Data risk assessment reduction
• # of supported applications
• # of fields
• # of users
• % of enterprise adoption
• # of new external partners acquired over time
Ability to protect PII from unnecessary exposure in the enterprise pipeline
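As a minimal illustration of field-level protection with role-based access (the roles, field list, and masking style are assumptions, not the presenter's implementation):

```python
# Fields treated as sensitive, and the roles allowed to see them in the clear.
SENSITIVE_FIELDS = {"national_id", "name", "dob"}
AUTHORIZED_ROLES = {"claims"}   # analysts receive masked views

def protected_view(record: dict, role: str) -> dict:
    """Return the record as-is for authorized roles, redacted otherwise."""
    if role in AUTHORIZED_ROLES:
        return dict(record)
    return {k: ("***REDACTED***" if k in SENSITIVE_FIELDS else v)
            for k, v in record.items()}

record = {"national_id": "123456789", "name": "John Doe",
          "dob": "1983-10-12", "bmi": 24.5}
print(protected_view(record, "analyst"))
# {'national_id': '***REDACTED***', 'name': '***REDACTED***',
#  'dob': '***REDACTED***', 'bmi': 24.5}
```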
A Few Protection Methods
Data Protection and Anonymized Alternatives

Anonymization
• Webster: remove identifying information from something so that the original cannot be known
• Distinguishing features: one-way; permanently removes the personally identifiable information with no way of getting back to it

Differential Privacy
• Webster: (none)
• In 'data practice': a method that seeks to maximize accuracy while minimizing the ability to identify data subjects
• Distinguishing features: adds random noise to the data while retaining meaningful aggregate statistics; can be applied to the "ride-along" data attributes

Tokenization
• Webster: symbol representation, or distinguishing feature
• In 'data practice': random generation of a value for plain text, which is then stored in a mapping database
• Distinguishing features: can switch the data between the "masked token" and "in the clear" as it moves through workflows

Pseudonymization
• Webster: use of a fictitious name
• In 'data practice': processing of data that can no longer be attributed to a specific data subject without the use of additional information
• Distinguishing features: adds a field with a pseudonym associated with the identity; the preferred method when needing to link data sources on the same individual identity
Anonymization

Raw Input:
National ID: 123456789 | Name: John Doe | DOB: 10/12/1983 | HIV Indicator: Y | Smoker Indicator: Y | BMI: 24.5

Protected Output (identifying fields removed):
HIV Indicator: Y | Smoker Indicator: Y | BMI: 24.5

Advantages: low risk of re-identification; theoretically impossible to get back to the identity of an individual (if anonymized properly)
Disadvantages: static; cannot merge with other data sources
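A minimal sketch of the one-way removal shown above, using the slide's field names; note there is deliberately no way to recover the dropped identifiers:

```python
# Fields to strip permanently; once dropped, the record cannot be re-identified
# from this output alone (the one-way property noted above).
IDENTIFYING_FIELDS = {"national_id", "name", "dob"}

def anonymize(record: dict) -> dict:
    """Return only the non-identifying, analytic attributes."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}

raw = {"national_id": "123456789", "name": "John Doe", "dob": "1983-10-12",
       "hiv_indicator": "Y", "smoker_indicator": "Y", "bmi": 24.5}
print(anonymize(raw))
# {'hiv_indicator': 'Y', 'smoker_indicator': 'Y', 'bmi': 24.5}
```

The disadvantage follows directly: with no key left in the output, this record can never be merged with another source for the same person.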
Differential Privacy – The Issue

National ID | Name | HIV Indicator
123 | John | Y
456 | Jane | Y
789 | Bob | Y
111 | Betsy | N
333 | Zach | N

Public Aggregate Analysis: 3 out of 5 people have HIV
What about Bob? Suppose an adversary obtains the aggregate analysis and all data records except for Bob's:

National ID | Name | HIV Indicator
123 | John | Y
456 | Jane | Y
789 | Bob | ?
111 | Betsy | N
333 | Zach | N

Because the aggregate says 3 out of 5 people have HIV, and only two of the known records are positive, the adversary can determine that Bob has HIV.
Differential Privacy – A Solution

Before Differential Privacy:
National ID | Name | HIV Indicator
123 | John | Y
456 | Jane | Y
789 | Bob | Y
111 | Betsy | N
333 | Zach | N
… additional people …

Aggregate Analysis: 600 out of 1,000 people have HIV

Differential Privacy Applied: adds random noise to a returned query via a mathematical function with a specific privacy parameter, ε

After Differential Privacy:
Agg. Analysis #1: 574 out of 960 have HIV (59.8%)
Agg. Analysis #2: 589 out of 980 have HIV (60.1%)
Differential privacy ensures the two analyses above are differentially private

Advantages: minimizes the risk of identifying individuals by stitching together disparate pieces of information
Disadvantages: must take into account the number of queries/analyses that will be conducted to ensure differential privacy is retained
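For illustration, the standard way to realize this is the Laplace mechanism: a counting query has sensitivity 1 (adding or removing one person changes the count by at most 1), so adding Laplace noise with scale 1/ε yields ε-differential privacy for a single query. The sketch below is a minimal standard-library Python version; the records and choice of ε are hypothetical, and, per the disadvantage above, each additional query consumes privacy budget:

```python
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float) -> float:
    """Epsilon-differentially-private count of records matching `predicate`.

    Counting queries have sensitivity 1, so Laplace noise with scale
    1/epsilon suffices for one query.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical records echoing the slide's example (~1,000 people, 600 positive)
people = [{"name": "John", "hiv": "Y"}, {"name": "Jane", "hiv": "Y"},
          {"name": "Bob", "hiv": "Y"}, {"name": "Betsy", "hiv": "N"},
          {"name": "Zach", "hiv": "N"}] * 200

# Two runs return slightly different noisy counts near the true 600
print(private_count(people, lambda r: r["hiv"] == "Y", epsilon=0.1))
print(private_count(people, lambda r: r["hiv"] == "Y", epsilon=0.1))
```

The noise defeats the differencing attack on Bob: subtracting the known records from a noisy aggregate no longer pins down his value.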
Tokenization

In-force Data:
National ID: 123456789 | Name: John Doe | DOB: 10/12/1983 | HIV Indicator: Y | Smoker Indicator: Y | BMI: 24.5

User: Analyst (tokenized view):
National ID: 128*$ksdk | Name: 893fdz*& | DOB: 90djg24 | HIV Indicator: Y | Smoker Indicator: Y | BMI: 24.5

User: Claims (in the clear):
National ID: 123456789 | Name: John Doe | DOB: 10/12/1983 | HIV Indicator: Y | Smoker Indicator: Y | BMI: 24.5

Advantages: protects the identity of the individual when knowing it is not necessary; software can tokenize/de-tokenize as data flows through different systems and users
Disadvantages: tokens by themselves are not consistent for the same data subject coming from different data lineages
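A minimal sketch of the token/de-token flow, assuming an in-memory dict stands in for the mapping database and the `secrets` module supplies the random tokens; everything here is illustrative:

```python
import secrets

# In-memory stand-in for the token vault (real systems use a hardened
# mapping database with its own access controls).
vault: dict[str, str] = {}

def tokenize(value: str) -> str:
    """Replace a sensitive value with a random token; record the mapping."""
    token = secrets.token_urlsafe(8)   # random, not derived from the value
    vault[token] = value
    return token

def detokenize(token: str) -> str:
    """Look the original value back up; restricted to authorized workflows."""
    return vault[token]

t = tokenize("123456789")
print(t)               # an opaque random token, like the analyst's view above
print(detokenize(t))   # '123456789' back in the clear, like the claims view
```

Because each call generates a fresh random token, the same national ID tokenized in two separate pipelines yields two different tokens, which is exactly the cross-lineage inconsistency noted as a disadvantage.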
Pseudonymization

Raw Input (two sources describing the same person):
Name: John Doe | DOB: 10/12/1983 | HIV Indicator: Y
Name: Johnathon Doe | YOB: 1983 | Smoker Indicator: N

Protected Output (linked by a shared pseudonym):
Pseudonym: 123ABC | DOB: (removed) | HIV Indicator: Y
Pseudonym: 123ABC | YOB: (removed) | Smoker Indicator: N

Advantages: enables merging of de-identified datasets via common pseudonyms; dynamic ability to merge new sources to existing sources over time
Disadvantages: requires significant investment in an identity management service to link data subjects; requires IT and role-based separation of duties to minimize re-identification
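One common way to implement this (an assumption for illustration, not necessarily the presenter's method) is for the identity management service to derive a stable pseudonym from an identifying attribute with a secret key, so two de-identified sources can still be merged on the same person:

```python
import hashlib
import hmac

# The key must be held only by the identity management service; anyone
# without it cannot reverse or regenerate pseudonyms (the separation-of-
# duties requirement above). Key and field names are illustrative.
SECRET_KEY = b"held-by-identity-management-service"

def pseudonym(national_id: str) -> str:
    """Derive a stable pseudonym via keyed HMAC; same input, same output."""
    return hmac.new(SECRET_KEY, national_id.encode(),
                    hashlib.sha256).hexdigest()[:8]

source_a = {"national_id": "123456789", "hiv_indicator": "Y"}
source_b = {"national_id": "123456789", "smoker_indicator": "N"}

pid = pseudonym(source_a.pop("national_id"))
merged = {"pseudonym": pid, **source_a}
if pseudonym(source_b.pop("national_id")) == pid:
    merged.update(source_b)
print(merged)
# {'pseudonym': '…', 'hiv_indicator': 'Y', 'smoker_indicator': 'N'}
```

Unlike per-pipeline tokenization, the keyed derivation is deterministic, which is what makes cross-source linkage possible, and also why the key itself must be protected so strictly.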
A Structure for Focus on Data
Two key components ensure (i) data governance/control and (ii) value maximization

Process: New data-driven initiative identified → Identify data manager for initiative → Document data specs & intended use cases → Privacy impact & data plan

Data Organization:
• Isolated virtual server: sensitive data
• Analytics sandbox: de-identified data

Value Maximization: explore new data sources; augment relevant data; link data

Access governance controls apply throughout
In Closing…
Why does data protection matter to a Data Science capability?
• Prevent inappropriate use:
  - Damage to reputation
  - Loss of business (current and future)
  - Substantial fines, damages, and costs
• Engender trust from data contributors:
  - Keep the data flowing, and therefore 'things to do'