Post on 22-May-2020


Discover, identify, and classify personal data in Microsoft Azure

Personal data discovery, identification, and classification are essential to a successful security, governance, compliance, and personal data privacy strategy. Azure customers who collect data from their users must be able to identify personal data and understand where it’s located in order to keep it secure.

Azure provides a rich diversity of data storage possibilities and multiple tools that can help customers

identify, classify, and search for personal data in their Azure environments, hosted applications, and

external sources.

This article provides guidance on how to discover, identify, and classify personal data in several Azure

tools and services, including using Azure Data Catalog, Azure Active Directory, SQL Database, Power

Query for Hadoop clusters in Azure HDInsight, Azure Information Protection, Azure Search, and SQL

queries for Azure Cosmos DB.

Scenario

A U.S.-based sports company collects a variety of personal and other data from their customers and employees, maintains it in multiple databases, and stores it in several different locations in their Azure environment. In addition to selling sports equipment, they also host and manage registration for elite athletic events around the world, including in the EU.

Since the company hosts many international bicycling tours every year and has contingent staff in locations around the globe, a couple of the data sets are quite large. The company also has developer-built applications that are used by both customers and employees.

Problem statement

The company wants to address the following issues:

Customer and employee personal data must be classified/distinguished from the other data the company collects in order to ensure proper access and security.

The data admin needs to easily discover the location of customer personal data across various areas of the Azure environment.

Customer and employee personal data that appears in shared documents and email communications must be classified and labelled to help ensure that it’s kept secure.

The company’s app developers need a way to easily search for customer and employee personal data in their web and mobile apps.

Developers also need to query their document database for personal data.

Company goals

Data sources and assets that include personal data must be registered so they can be tagged/annotated and searched in Azure Data Catalog.

All customer and employee personal data must be tagged/annotated in Azure Data Catalog so it can be found easily. Ideally customer and employee personal data are tagged/annotated separately.

Personal data from customer and employee user profiles and work information residing in Azure Active Directory must be easily located.

Personal data residing in multiple SQL databases must be easily queried.

Some of the company’s large data sets are managed through Azure HDInsight and stored in Hadoop. They must be imported into Excel so they can be queried for personal data.

Personal data shared in documents and email communications must be classified, labelled, and kept secure with Azure Information Protection.

The company’s app developers must be able to discover customer and employee personal data in the apps they’ve built, which they can do with Azure Search.

Developers must be able to find personal data in their document database.

Solutions

The following Azure tools can help you with personal data identification, classification, and discovery.

Azure Data Catalog: data classification, annotation, and discovery

Azure Data Catalog is a metadata catalog that helps enterprise organizations manage and track data

sources/assets. The first step is to register them. The next step is to classify all personal data and tag or

annotate it so it’s easier to find. Finally, you can discover personal data through searching and filtering.

Once you’ve located your data, you can use its location to connect to it with the application or tool of

your choice, such as Excel or SQL Server Management Studio.

In order to use the catalog, you must be the owner or co-owner of an Azure subscription and you must

be signed in with an Azure Active Directory user account.

Note: You can only have one data catalog per organization/Azure Active Directory domain.

Data can be classified, annotated and discovered in Azure Data Catalog either manually or through a

REST API.

How do I manually register, tag/annotate, and discover/search personal data sources, assets, and objects?

The following steps are an overview of how to register, annotate, and discover/search for data in Azure

Data Catalog. The links in these steps take you to an Azure Data Catalog tutorial with exercises that

provide more specific guidance. The exercises focus on a fictional company called AdventureWorks.

Instructions earlier in the tutorial show you how to load the actual AdventureWorks database and

provide detailed background information.

You can do the exercises or just use the information as a guideline for working with your own data.

1. Register data sources/assets

In order to search for and identify personal data with Azure Data Catalog, you need to register

your data source/assets first. Once you sign in, you’ll launch the registration tool, choose a data

source to register and register specific data objects. You can also add tags to help enable search.

Once registered, the data source or asset remains in its existing location, but a copy of the

metadata is added to Azure Data Catalog, which allows the user to more easily discover personal

data.

You can categorize data assets that contain personal information during registration with a tag

that distinguishes them as such. You can tag customer and employee personal data separately,

too. For example, tag “name,” “Social Security number,” “ID number,” and any others as

“customer personal data,” “employee personal information,” or “sensitive customer data.”

Then they’ll be discoverable with a Data Catalog search. Tags are not preset. You can use any tag

name you want.

To learn how to register your data assets, follow the instructions in the Register data assets

section of the tutorial.

There is also a how-to page that provides more information about registering, discovering,

annotating and searching data in Azure Data Catalog. For more information, visit Register data

sources in Azure Data Catalog, which is part of a larger documentation site for the service (the

full tutorial can be found under Get Started with Azure Data Catalog on this same site).

Once you’ve registered your data sources/assets/objects, you can further tag (annotate) them

and discover/search for them.

2. Annotate data sources/assets

When registering your data source/assets in step 1, you have a chance to add tags to help

categorize and identify data objects. The annotate data steps show you how to do this after

your data source/assets are registered.

The tutorial shows you how to tag data assets, but doesn’t specifically discuss personal data.

You can use a data tag like “customer personal data,” “employee personal information,” or

“sensitive customer data” to identify all fields that contain personal data, such as “name”,

“Social Security number,” “ID number” and others. You can also add tags for experts, users, or

glossary items, or add tags or descriptions at the column level.

In addition, you can add information that shows users how to request access to the data

source/asset and documentation for your assets.

To learn how to annotate/tag your data assets, follow the instructions in the Annotate data

assets section of the tutorial.

For more information, visit How to annotate data sources.

3. Discover/search for data sources/assets

Personal data assets can be discovered in Azure Data Catalog through searching and filtering.

Basic search will match terms and annotations (tags), and filtering allows you to choose tags,

source type, and other specific identifiers to complement the basic search.

To learn how to discover data, follow the instructions in the Discover data assets section of the

tutorial. You can find personal data by doing a search for the specific tag(s) you set up to identify it,

for example “customer personal data,” “employee personal information,” or “sensitive customer

data.”

For more information, visit How to discover data sources in Azure Data Catalog.
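
The register, annotate, and search flow in steps 1 to 3 can be pictured with a small in-memory model. This is only a sketch of the idea and not the Azure Data Catalog API: the `Asset` class, the `search_by_tag` helper, and the asset and column names are all hypothetical. The tag names are the example tags used in this article.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Asset:
    """A registered data asset: only metadata is kept, never the data itself."""
    name: str
    source: str
    # Column-level tags: column name -> set of free-form tags.
    column_tags: dict[str, set[str]] = field(default_factory=dict)

    def tag(self, column: str, *tags: str) -> None:
        """Register a column and attach zero or more tags to it."""
        self.column_tags.setdefault(column, set()).update(tags)


def search_by_tag(catalog: list[Asset], tag: str) -> list[tuple[str, str]]:
    """Return sorted (asset name, column) pairs tagged with `tag`, ignoring case."""
    wanted = tag.lower()
    hits = []
    for asset in catalog:
        for column, tags in asset.column_tags.items():
            if wanted in {t.lower() for t in tags}:
                hits.append((asset.name, column))
    return sorted(hits)


customers = Asset("Customer", "SQL Database")
customers.tag("Name", "customer personal data")
customers.tag("SSN", "customer personal data", "sensitive customer data")
customers.tag("OrderTotal")  # registered, but carries no personal-data tag

staff = Asset("Employee", "SQL Database")
staff.tag("Name", "employee personal information")

catalog = [customers, staff]
print(search_by_tag(catalog, "customer personal data"))
```

Keeping customer and employee tags separate, as the company goals suggest, means each group can be found with its own search.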

Azure Data Catalog doesn’t give you access to the data itself; it only helps you track and locate it. Once you’ve located your data, you can connect to it by using the application or tool of your choice, such as Excel or SQL Server Management Studio. For more information, visit the Connect to data assets section of the tutorial.

Learn more

Azure Data Catalog

What is Azure Data Catalog

Get started with Azure Data Catalog

Supported data sources in Azure Data Catalog

How do I register, tag (annotate), and discover/search personal data sources, assets, and objects using the Azure Data Catalog REST API?

To learn how to do this, visit the Azure Data Catalog REST API documentation, which includes sections

on registering, annotating, and searching.

Azure Active Directory: data discovery

Azure Active Directory is Microsoft’s cloud-based, multi-tenant directory and identity management

service. You can locate customer and employee user profiles and user work information that contain

personal data in your Azure Active Directory (AAD) environment by using the Azure portal.

This is particularly helpful if you want to find or change personal data for a specific user. You can also

add or change user profile and work information. You must sign in with an account that’s a global admin

for the directory.

How do I locate or view user profile and work information?

1. Sign in to the Azure portal with an account that's a global admin for the directory.

2. Select More services, enter Users and groups in the text box, and then select Enter.

3. On the Users and groups blade, select Users.

4. On the Users and groups - Users blade, select a user from the list, and then, on the blade for the selected user, select Profile to view user profile information that might contain personal data.

5. If you need to add or change user profile information, you can do so, and then, in the command bar, select Save.

6. On the blade for the selected user, select Work Info to view user work information that may

contain personal data.

7. If you need to add or change user work information, you can do so, and then, in the command bar, select Save.
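
The portal steps above are the method this article describes. For scripted lookups, one alternative that this article does not cover is the Microsoft Graph `users` endpoint. The sketch below only builds the request URL and does not send it. The helper name `profile_request_url`, the sample user name, and the choice of fields are assumptions, and acquiring an access token is out of scope here.

```python
from urllib.parse import quote

GRAPH_USERS = "https://graph.microsoft.com/v1.0/users"

# Profile and work-info properties that may hold personal data.
# Which of these matter in your directory is an assumption to verify.
PERSONAL_FIELDS = [
    "displayName", "givenName", "surname", "mail", "mobilePhone",
    "jobTitle", "department", "officeLocation",
    "streetAddress", "city", "country",
]


def profile_request_url(user_id: str, fields=PERSONAL_FIELDS) -> str:
    """Build a GET URL that asks for selected properties of one user."""
    # Keep "@" readable so a user principal name stays recognizable.
    return f"{GRAPH_USERS}/{quote(user_id, safe='@')}?$select={','.join(fields)}"


print(profile_request_url("adele@contoso.com"))
```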

Learn more

Azure Active Directory

What is Azure Active Directory?

Azure SQL Database: data discovery

Azure SQL Database is a cloud database that helps developers build and maintain applications. Personal

data can be found in Azure SQL Database using standard SQL queries. Azure SQL elastic query (preview)

enables users to perform cross-database queries.

A detailed SQL database tutorial explains many aspects of using a SQL database, including how to build

one and how to run data queries. The following is a summary of the information available in the tutorial

with links to specific sections.

How do I build a SQL database?

There are three ways to do it:

An Azure SQL database can be created in the Azure portal. In the tutorial, you’ll use a specific set

of compute and storage resources within a resource group and logical server. You’ll use sample

data from a fictitious company called AdventureWorks. You’ll also create a server-level firewall

rule. To learn how to do this, visit the Create an Azure SQL database in the Azure portal

tutorial.

A SQL database can also be created in the Azure Cloud Shell CLI, a browser-based command line

tool. The tool is available in the Azure portal and can be run directly from there. In this tutorial,

you’ll launch the tool, define script variables, create a resource group and logical server, and

configure a server firewall rule. Then you’ll create a database with sample data. To learn how to

create your database this way, visit the Create a single Azure SQL database using the Azure CLI

tutorial.

Note: Azure CLI is commonly used by Linux admins and developers. Some users find it easier and more intuitive than PowerShell, which is your third option.

Finally, you can create a SQL database using PowerShell, which is a command line/script tool

used to create and manage Azure and other resources. In this tutorial, you’ll launch the tool,

define script variables, create a resource group and logical server, and configure a server firewall

rule. Then you’ll create a database with sample data.

The tutorial requires the Azure PowerShell module version 4.0 or later. Run Get-Module -ListAvailable AzureRM to find your version. If you need to install or upgrade, see Install Azure PowerShell module.

To learn how to create your database this way, visit the Create a single Azure SQL database using PowerShell tutorial.

Note: Windows admins tend to use PowerShell, but some of them prefer Azure CLI.

How do I search for personal data in SQL database in the Azure portal?

You can use the built-in query editor tool inside the Azure portal to search for personal data. You’ll log in

to the tool using your SQL server admin login and password, and then enter a query.

Step 5 of the tutorial shows an example query in the query editor pane, but it doesn’t focus on personal or sensitive information (it also combines data from two tables and creates aliases for the source columns in the data set being returned).

[Screenshot: the query from Step 5 and the results pane that’s returned]

If your table were called MyTable, a sample query for personal information might return name, Social Security number, and ID number, and would look like this (the square brackets are needed because the column name contains a space):

SELECT Name, SSN, [ID number] FROM MyTable;

You’d run the query and then see the results in the Results pane.
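
To try the shape of such a query without an Azure subscription, the sketch below runs it against an in-memory SQLite database as a stand-in for Azure SQL Database. The table name MyTable, its columns, and the sample row are made up for illustration. SQLite also accepts square brackets around a column name that contains a space, so the query text can stay the same.

```python
import sqlite3

# In-memory stand-in for a real database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Name TEXT, SSN TEXT, [ID number] TEXT, Sport TEXT)"
)
conn.execute(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    ("Pat Example", "000-00-0000", "C-1001", "Cycling"),
)

# Select only the columns that hold personal data.
rows = conn.execute("SELECT Name, SSN, [ID number] FROM MyTable").fetchall()
for row in rows:
    print(row)
conn.close()
```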

For more information on how to query a SQL database in the Azure portal, visit the Query the SQL

database section of the tutorial.

How do I search for data in SQL database with tools such as SQL Server Management Studio, Visual Studio Code, .NET, Python or others?

You can search for data with your preferred tool using the Azure portal, Azure CLI, or Azure PowerShell.

For more information, visit the Next steps section of the tutorial, choose your preferred tool, and then

choose the Azure resource management tool you’d like to use.

How do I search for data across multiple databases?

SQL elastic query (preview) enables you to perform cross-database and multiple database queries and

return a single result. The tutorial overview includes a detailed description of scenarios and explains the

difference between vertical and horizontal database partitioning. Horizontal partitioning is called

“sharding.”

The Next steps section includes links to more detailed tutorials that explain how to get started, syntax,

and sample queries for both types of elastic queries.

To get started, visit the Azure SQL Database elastic query overview (preview) page.

For more detailed tutorials and additional information, visit the tutorial’s Next steps section.
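
Elastic query itself is configured inside Azure SQL Database and is covered by the tutorials linked above. As a rough local analogy only, SQLite's ATTACH DATABASE lets one connection query tables that live in two separate databases and return a single result. This is a different mechanism from elastic query, and the database, table, and column names below are hypothetical.

```python
import sqlite3

# "main" plays the customer database; a second in-memory database is attached as "hr".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS hr")

conn.execute("CREATE TABLE main.customers (name TEXT, email TEXT)")
conn.execute("CREATE TABLE hr.employees (name TEXT, email TEXT)")
conn.execute("INSERT INTO main.customers VALUES ('Pat Example', 'pat@example.com')")
conn.execute("INSERT INTO hr.employees VALUES ('Sam Sample', 'sam@example.com')")

# One query, two databases, one combined result set.
query = """
    SELECT 'customer' AS kind, name, email FROM main.customers
    UNION ALL
    SELECT 'employee' AS kind, name, email FROM hr.employees
    ORDER BY kind
"""
people = conn.execute(query).fetchall()
print(people)
conn.close()
```

The layout here resembles vertical partitioning, where different tables live in different databases. With horizontal partitioning (sharding), the same table would be split by rows across databases instead.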

Learn more

Azure SQL Database

What is SQL Database?

Power Query (for importing Azure HDInsight Hadoop clusters): data discovery for large data sets

Hadoop is an open source Apache storage and processing service for large data sets, which are analyzed

and stored in Hadoop clusters. Azure HDInsight allows users to work with Hadoop clusters in Azure.

Power Query is an Excel add-in that, among other things, helps users discover data from different

sources.

Personal data associated with Hadoop clusters in Azure HDInsight can be imported to Excel with Power

Query. Once the data is in Excel you can use a query to identify it.

How do I use Excel Power Query to import Hadoop clusters in Azure HDInsight into Excel?

An HDInsight tutorial will walk you through this entire process. It explains prerequisites, and includes a

link to a Get started with Azure HDInsight tutorial. Instructions cover Excel 2016 as well as 2013 and

2010 (steps are slightly different for the older versions of Excel). If you don’t have the Excel Power Query

add-in, the tutorial shows you how to get it. You’ll start the tutorial in Excel and will need to have an

Azure Blob storage account associated with your cluster.

To learn how to do this, visit the Connect Excel to Hadoop by using Power Query tutorial.

Azure Information Protection: personal data classification for documents and email

Azure Information Protection can help Azure customers apply labels to classify and ensure the

protection of internally or externally shared documents and email communications that contain

customer or employee personal information. Rules and conditions can be defined automatically or

manually, by administrators or by users. For example, if a user is saving a document that includes credit

card information, he or she would see a label recommendation that was configured by the

administrator.
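
The credit card example can be pictured as a rule made of a condition and a recommended label. The sketch below is not how Azure Information Protection detects content; it is a minimal illustration that assumes a regular expression plus the Luhn checksum as the condition. The label name "Confidential" and the function names are hypothetical.

```python
import re

# Candidate card numbers: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def recommend_label(text: str):
    """Return a label name if the text appears to contain a card number, else None."""
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return "Confidential"
    return None


print(recommend_label("Card on file: 4111 1111 1111 1111"))
print(recommend_label("Race bib 1234, stage 5"))
```

The checksum step filters out digit runs that merely look like card numbers, which keeps the recommendation from firing on order numbers or phone numbers of similar length.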

How do I try it?

If you’d like to give Azure Information Protection a try to see if it might be a fit for your organization,

visit the Quickstart tutorial. It walks you through five basic steps—from installation to configuring policy

to seeing classification, labelling, and sharing in action—and should take less than a half hour.

How do I deploy it?

If you’d like to deploy Azure Information Protection for your organization, visit the deployment roadmap

for classification, labelling, and protection.

Is there anything else I should know?

For complementary information that will help you think through how to set it up, visit the Ready, set,

protect! blog post. And check the Learn more links listed below for more on Azure Information

Protection.

Learn more

Azure Information Protection

An introduction to the service

What is Azure Information Protection?

A thorough explanation of the service that includes links to all of the how-to documentation

What is Azure Rights Management?

An explanation of the protection technology that Azure Information Protection uses

Azure Information Protection: Ready, set, protect!

A friendly blog post that complements the how-to information and might help you think through how to

approach setting it up

Azure Information Protection Deployment roadmap

A step-by-step guide for those who are ready to deploy the service

Quickstart tutorial for Azure Information Protection

A 20-minute tutorial for those who want to give the service a try

Azure Information Protection Documentation homepage

Requirements for Azure Information Protection

Azure Search: data discovery for developer apps

Azure Search is a cloud search solution for developers, and provides a rich data search experience for your applications. Azure Search allows you to locate data across user-defined indexes, sourced from Azure Cosmos DB, Azure SQL Database, Azure Blob Storage, Azure Table storage, or custom customer JSON data. You can also structure Lucene queries using the Azure Search REST API to search for personal data types or the personal data of specific individuals. Features include full text search, simple query syntax, and Lucene query syntax. Visit the following links for more information:

Azure Search

An introduction to the service

Azure Search documentation links

What is Azure Search?

How full text search works in Azure Search

Simple query syntax in Azure Search

Lucene query syntax in Azure Search
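
As a sketch of what a Lucene query for one person's data could look like through the REST API, the helper below assembles a Search Documents request without sending it. The service name, index name, field names, and the `api_version` default are assumptions to replace with your own values; `queryType` set to `full` is what selects Lucene syntax instead of the simple syntax.

```python
import json


def build_search_request(service: str, index: str, api_key: str,
                         lucene_query: str, api_version: str = "2020-06-30"):
    """Return (url, headers, body) for a Search Documents POST; nothing is sent."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/search?api-version={api_version}")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    body = {
        "search": lucene_query,
        "queryType": "full",  # "full" = Lucene syntax; "simple" is the default
        "select": "id,firstName,lastName,email",  # hypothetical field names
        "top": 10,
    }
    return url, headers, json.dumps(body)


url, headers, body = build_search_request(
    "my-service", "customers", "<query-key>", 'lastName:"Example" AND country:DE'
)
print(url)
print(body)
```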

Query Azure Cosmos DB data with SQL: data discovery

Azure Cosmos DB is a scalable, globally distributed database service. You can query your Azure Cosmos DB data with SQL to find customer and employee personal information.

How do I use SQL to query data?

To begin with the basics, visit the Azure Cosmos DB: How to query using SQL tutorial. The tutorial provides a sample document and two sample SQL queries and results.

For more in-depth guidance on building SQL queries, visit SQL queries for Azure Cosmos DB DocumentDB API.
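
For a sense of what such a query could look like when searching for one person's data, the sketch below builds a parameterized query specification as a plain dictionary. It does not connect to Azure Cosmos DB. The document field names and the helper name are hypothetical, and the `query` plus `parameters` shape is, to my understanding, the form that the service's SQL query interface accepts.

```python
def personal_data_query(last_name: str) -> dict:
    """Build a parameterized SQL query spec; the field names are hypothetical."""
    return {
        "query": (
            "SELECT c.id, c.firstName, c.lastName, c.email "
            "FROM c WHERE c.lastName = @lastName"
        ),
        # Passing the value as a parameter avoids splicing user input into the SQL text.
        "parameters": [{"name": "@lastName", "value": last_name}],
    }


spec = personal_data_query("Example")
print(spec["query"])
print(spec["parameters"])
```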

If you’re new to Azure Cosmos DB and would like to learn how to create a database, add a collection,

and add data, visit the Azure Cosmos DB: Build a DocumentDB API web app Quickstart tutorial. If you’d

like to do this in a language other than .NET, such as Java or Python, just choose your preferred language

once you get to the site.

More information

Azure Data Catalog documentation

SQL Database Query Editor available in Azure portal

What is Azure Search?

Search query overview: Query your Azure Search index