data done right

75
Data Done Right Administrators Brian Wiebe: Technical Engagement Manager, salesforce.com Ezra Kenigsberg: Data Architect, salesforce.com

Upload: salesforce

Post on 01-Nov-2014

2.448 views

Category:

Documents


0 download

DESCRIPTION

You probably already know that managing data in Salesforce can be a formidable task. But you might not know that it doesn't have to be! In this session, we'll focus on strategies to help you with key data tasks such as data migration, managing large data volumes, org merges, and data consolidations.

TRANSCRIPT

Page 1: Data Done Right

Data Done RightAdministrators

Brian Wiebe: Technical Engagement Manager, salesforce.comEzra Kenigsberg: Data Architect, salesforce.com

Page 2: Data Done Right

Safe HarborSafe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.

The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year ended January 31, 2010. This documents and others are available on the SEC Filings section of the Investor Information section of our Web site.

Any unreleased services or features referenced in this or other press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.

TinyURL.com/SalesforceSafeHarbor

Page 3: Data Done Right

Purpose

To walk through three big data issues that can help

make you an even-better administrator.

Page 4: Data Done Right

This Session… Other Sessions…

Practical demos—things you

can do TODAY

Required– Data Loader

– Microsoft Excel

– A decent text editor

(I use Notepad++)

Optional– Cloud Converter

– Synchronizer

(requires Microsoft Access)

Bigger-picture data strategy

Professional third-party tools

Page 5: Data Done Right

Overview

1. Introduction

5 min

2. Moving Data

15 min

3. Cleaning Data

15 min

4. Working with Large Data Volumes

15 min

5. Q&A

until they kick us out

Prior to making any major changes to your org:

BACK UP!

Page 6: Data Done Right

Ezra Kenigsberg

salesforce.com

Page 7: Data Done Right

Moving Data

Page 8: Data Done Right

The Scenario

The scenario we’re walking through:

Gotta import new records by tomorrow

We’re creating a repeatable, documented process– “Just load it” fails the hit-by-a-bus test…

• …is difficult to audit after the fact

• …may not be reversible if I’ve made a mistake

Page 9: Data Done Right

Links and Tools

Useful links:

developer.force.com/consultants

EzraKenigsberg.com

Our tools:

Required– Data Loader

– Microsoft Excel

– A decent text editor (I use Notepad++)

Optional– Cloud Converter

– Synchronizer (requires Microsoft Access)

Page 10: Data Done Right

Useful Links: developer.force.com/consultants

Dedicated pages for• Data Migration• Large Data Volumes• many others

Page 11: Data Done Right

Useful Links: EzraKenigsberg.com

Dedicated sections for• Handy Tools• Reference Links• Presentations• Requests for Salesforce & Data Loader Improvements

Page 12: Data Done Right

Three Key Steps

Three steps:

1. How should I map my data?

2. How can I automate the generating of CSVs?

3. How can I load data in an auditable way?

HOW do I

generate? load?map?

HOW do I

generate? load?map?

Page 13: Data Done Right

Moving Data #1: How Do I Map?

Create a mapping file:

1. Create list of source fields in source file/s

2. Create list of API field names (not the UI labels!)– Get them with Data Loader or Cloud Converter

3. Match source fields to API field names

HOW do I

map?

Page 14: Data Done Right

Grab the Source Field Names

Transposed in Excel

Excel 2003:Edit | Paste Special | Transpose

Excel 2007 or 2010:Home | Paste | Paste Special | Transpose

1

2

HOW do I

map?

Page 15: Data Done Right

Grab the API Field Names (slide 1 of 2)

2

Model Metrics’ free utility “Cloud Converter” is a straightforward

way to export metadata

1

HOW do I

map?

Page 16: Data Done Right

Grab the API Field Names (slide 2 of 2)

2

1

...and a row for

every field

“Cloud Converter” exports a tab for every object...

HOW do I

map?

Page 17: Data Done Right

Build Out Mapping File

HOW do I

map?

Page 18: Data Done Right

Index Column

An index columnfacilitates latersorting

HOW do I

map?

Page 19: Data Done Right

A Column of Arrows

Arrows remind people of the DIRECTION of data

HOW do I

map?

Page 20: Data Done Right

Not Every Field Has to Map

Not every column has to map

HOW do I

map?

Page 21: Data Done Right

Keep it Simple!

The most common cause of bad data maps?

Too much stuff!

Put in only the columns you’ll UPDATE and USE

HOW do I

map?

Page 22: Data Done Right

Moving Data #2: How Do I Generate CSVs?

Import legacy data into a tool that enables REUSE– Raw data comes in, ready-to-load data comes out

– Minimize manual steps. Tools I’m not so thrilled with:• Microsoft Excel

• Import Wizard

– Each subsequent load becomes a straightforward process• Test-then-Production

• Follow-up loads

Which tool should I use?– Good question! Can you wait eight slides?

HOW do I

generate?

Page 23: Data Done Right

Moving Data #3: How Do I Load?

Loading Best Practices:

1. Group work by folder

2. Keep all files together

3. Auto-Match

4. Upsert!

HOW do I

load?

Page 24: Data Done Right

Folder Naming

YYYY-MM-DD#NNforces folders

to sort inchronological

order

HOW do I

load?

Page 25: Data Done Right

Keep All Files Together

Loading Best Practices:

1. Group work by folder

2. Keep all files together

3. Auto-Match

4. Upsert!

HOW do I

load?

Page 26: Data Done Right

Folder Contents (1 of 3)

Save source file, success files, and error files all in same directory

HOW do I

load?

Page 27: Data Done Right

Folder Contents (2 of 3)

When loading errors:• Copy error file and• Use as source for next load

1

2

HOW do I

load?

Page 28: Data Done Right

Folder Contents (3 of 3)HOW do I

load?

Page 29: Data Done Right

Dummy-Proof the Auto-Match

Loading Best Practices:

1. Group work by folder

2. Keep all files together

3. Auto-Match

4. Upsert!

HOW do I

load?

Page 30: Data Done Right

Perfect Auto-Match

With a file created this way, Auto-Match gets every field mapped

HOW do I

load?

Page 31: Data Done Right

The Smarter Way to Load Data: Upsert!

Loading Best Practices:

1. Group work by folder

2. Keep all files together

3. Auto-Match

4. Upsert!

HOW do I

load?

Page 32: Data Done Right

export UPSERT UPSERT

Why UPSERT and EXTERNAL IDs are Great

With Upsert & External IDs

export map update insert

map map update insert

a

c

a

c

a

c

a

c

a

c

cc

a

c

a

c

aa

c

Without Upsert & External IDs

a

c

a

Page 33: Data Done Right

HOW do I

generate? load?map?

Synchronizer is an OPEN SOURCE application that is NOT

SUPPORTED by Salesforce

Synchronizer has functions that facilitate all these tasks:– Mapping

– Generating CSVs

– Loading data

HOW do I

Synchronize?

How Synchronizer Automates All This

Page 34: Data Done Right

Synchronizer Walkthrough

In this walkthough, we’ll:

1. Import legacy data into Synchronizer

2. Map the data

3. Document the data map

4. Migrate the data into Salesforce

5. Review the files created

HOW do I

Synchronize?

Page 35: Data Done Right

Synchronizer Review: Import and Map

One-button CSV importer (not perfect, but fast and simple):

Grab API fields, then tie legacy fields to API fields:

HOW do I

Synchronize?

Page 36: Data Done Right

Synchronizer Review: Create Data Map

One-click data map generator:

HOW do I

Synchronize?

Page 37: Data Done Right

Synchronizer Review: Migrate

One-screen UI for loading data

HOW do I

Synchronize?

Page 38: Data Done Right

Synchronizer Review: Organize Files

Folder and file discipline:

Reimporting of success and error files for use in future loads

HOW do I

Synchronize?

Page 39: Data Done Right

Synchronizer Review: Other Goodies

Some other Synchronizer functions not covered:

Ability to run multiple steps in sequence

Scheduling

Mass-create tasks– Creating Users

– Assigning Users to Groups

Custom reports– Storage usage by User, by object

HOW do I

Synchronize?

Page 40: Data Done Right

Best Practices

1. Build a mapping file

2. Leverage a tool to generate CSVs

3. Use loading Best Practices

Get Synchronizer and help make it better!

HOW do I

generate? load?map?

Page 41: Data Done Right

Brian Wiebe

salesforce.com

Page 42: Data Done Right

Cleaning Data

Page 43: Data Done Right

43

What is Data Quality?Combination of Processes, Policies and Tools

Involves Governance, Enforcement, Prevention Goal is not perfection

What are the typical Issues?Duplicates (Account, Contact), Incomplete information,

Stale or Untouched data, Inconsistent values, Incorrect linkages

What are the typical causes?Not part of Budget, Unmeasurable problem

No Action Plan, No Ownership, Lack of Training, Non-optimized salesforce.com

Key Data Quality ConceptsDefining a Broad Topic

Page 44: Data Done Right

The Full Data Quality Lifecycle

AssessAssess

Cle

anse

•Train users•Enforce processes •Monitor on-going

quality

Data Protect

Protect

Our 3-step, iterative process quickly identifies problems, fixes them and helps you maintain high data quality over time

Data Cleanse

Data QualityAssessment•Profile data•Analyze results• Identify problems

and next steps

•Standardize & Cleanse

•Supplement & Enrich

•Test & Load

Page 45: Data Done Right

Data Quality Assessment

Page 46: Data Done Right

Data Quality Assessment

Page 47: Data Done Right

Project Planning

Strong Sponsorship– Committed Involvement & Availability (The DQ Assessment helps justify this)– Appreciation, Awareness & Understanding of Data complexities

Limiting Scope / Phased Approach– ACHIEVABLE Goals– Define critical quick-win items for Phase 1 (focus on biggest issues for end-users)

Test, Test, Test– Leverage your Sandbox Environment– Data Quality cleansing is a “destructive” process

Plan for End-user involvement– Data Quality is an iterative process – and MUST involve end-user buy-in and input

If the foundation is off..

Page 48: Data Done Right

Begin Governance & Stewardship– Involve IT and Business users– Monitoring Data Quality Dashboards – report back monthly– Use Salesforce features (e.g., Data Validation Rules, Conditional Workflow field

updates, Analytic Snapshots for trending)

Archive un-used Data– Data must be USEFUL to the Business and must be justifiable– Candidates for archiving : last updated > 1 year ago, no child records, Missing

Core Required Fields

Correct existing data– Users who have left company and STILL own records, Find/Replace picklist

values, Apply Naming Standards

Data Quality Solution Considerations

Page 49: Data Done Right

Identify and remove Dupes– Low hanging fruit:

• Simple dupes: e.g.,) match on a unique key like email address• Flag dupes for merging in Salesforce

– Leverage available de-dupe tools• Complex definition of dupe: e.g.,) fuzzy matching on name+address

– Define your rules (matching rules, merging rules)

Enrich & Append– Enrich your existing data– Add NEW data for known companies– 3rd party data vendors – helpful in creating Account hierarchies, helpful

for accurate Contact Info – especially at various levels in the Company

Data Quality Solution Considerations

Page 50: Data Done Right

Limit points of Entry– List Imports restricted to certain profiles– Control data being entered – without overwhelming users– Leverage Sales Intelligence Tools, Dupe Prevention, Address

Validation

Automation / Integration– Integrations, Master Data Management– Nightly Batch Updates

Data Quality Solution Considerations

Page 51: Data Done Right

Cleansing Environment

Staging

Staging

Production

• Transform & Re-model• Cleanse & Standardize• Enrich & De-dupe• Iterate• Validate with Business

Users

Page 52: Data Done Right

Company Name & Address

Enrich (Optional)Enrich (Optional)

Acme Inc HQAcme UK

Hierarchy Data

Demographics

3

Names

StandardizeStandardize

US, U.S. U.S.A USA

acme incorpAcme Inc

Addresses

Postal Standards

Identify, Match & Score

4

De-dupeDe-dupe

J. Smith, John Smith 80%

Re-parent Child Records

Account: Division, Opportunity, Contact

Merge

J. Smith, John Smith John Smith

Find & Replace

2

CleanseCleanse

Acme-Widgets-453

Hot HighCold Low

Data Transformation

Naming Conventions

Mergers, acquisitions, spin-offs

Archiving & Filtering

Load to Sandbox

5

ValidateValidate

Load to Production

Validate & Modify

1

Cleansing Process

Page 53: Data Done Right

Safeguard your cleansed data and prevent future deterioration.

TrainTrain

• User Training• Naming Conventions• Address Conventions• Dupe. Prevention Process• Data Importing Policies

• Required Fields• Default Values• Data Validation Rules• Workflow Field Updates• Web-to-Lead Restrictions

• Data Quality Dashboards• Data Quality

Reassessment• AppExchange Tools

EnforceEnforce MonitorMonitor

Protect Your Data

Page 54: Data Done Right

What tools do I use?

Page 55: Data Done Right

The AppExchange The Trusted Cloud Computing Marketplace

1000+Pre-Integrated Apps

300+ Services

4000+Customer Reviews

Page 56: Data Done Right

200+ Free Apps to Get You Started

• Reports & dashboards to end-to-end templates

• Fully customizable

Page 57: Data Done Right

AppExchange Tools Worth Checking Out!

Cloud Converter (Free)

Synchronizer (Free)

Jigsaw for Salesforce (Paid)

CRM Fusion (Paid)

Data Quality Dashboard (EE edition)

Page 58: Data Done Right

Data Quality Analysis Dashboard (EE Edition)

All reports pull from just TWO formula fields.

Page 59: Data Done Right

59

The Formula Field

You can EXPAND these formulas to include YOUR custom fields.

Page 60: Data Done Right

Brian WiebeEzra Kenigsberg

salesforce.com

Page 61: Data Done Right

Managing Large Data Volumes (LDV)

Page 62: Data Done Right

What Do We Mean by Large Data Volumes?

You know you’ve got scale when …– 1,000s of Users

– 1,000,000s of records for a single Object

– Role or Territory hierarchy > dozens of levels

– Public Groups nested > 5 levels deep

– A single User, Queue, Role, Public Group, or Territory:• Owning 10,000s of records

• Seeing 10,000s of records as a result of sharing

– 10,000s of Public Groups

– 1,000s of Territories

These are NOT hard limits—only useful guides to proceed carefully!

Talk to your Account Executive

Page 63: Data Done Right

Where Would We Proceed Carefully?

User Interface– Reports, Dashboards, List Views

– Searches

API– Queries

– Integration

Synchronizing with end-user (Outlook, Mobile)

apps

Page 64: Data Done Right

What Options are Available?

1. Segment

2. Optimize sharing

3. Leverage indexes & skinny tables

4. Move data asynchronously

Page 65: Data Done Right

1) Segmenting with Divisions Think millions

Use data access patterns

– Does everyone really look at

everything?

Acts like DB partitions

– Breaks up big objects

– Aim for <1M / division

Used for performance of search,

reports, dashboards, and list

views

Not a security measure!

Global Division

… SOHO SMB MSB Enterprise Gov …

By Geography

By Responsibility

Example: Financial Services Customer13M Clients / ~500 Branches = 26,000 records per Division

Page 66: Data Done Right

Arc

hive

1) Segmenting with Tiered Data Think tens of millions

Focus users on active data– Open cases

– Warm leads

– Recent history

Segregate inactive data

Can be used with Divisions

The Archive Data table...– ...doesn’t have to be custom

– ...doesn’t have to be Salesforce

Can use Analytic Snapshots

Example: Financial Services Customer145M Activities - 90M Legacy = 55M55M / ~500 Branches = 110,000

Active DataStandard or Custom ObjectStandard functionality

Batch ApexScheduled Apex

Archive DataStandard or Custom ObjectSubset of columnsReport focused

Page 67: Data Done Right

2) Optimize Sharing

Use private sharing strategically

Enforce ownership to prevent data

concentration– “Super-owner” individuals

Streamline hierarchies– Limit depth of nested groups

– Roles, Groups, Territories, etc.

Leverage all capabilities– Apex Managed Sharing for custom

objects

Page 68: Data Done Right

3) Leverage Custom Indexes

Standard Indexes: Created Date, Last Modified Date, Division, Record Type

Administrators can index fields bydesignating them as External IDs

Custom Indexes are availablethrough Support– Multi-column custom indexes also supported

– Can be applied based on use case, impact and priority

– Salesforce.com working on automatically detecting the need

Example: Large Japanese Insurance Co.Custom object with 10M records queried regularly by 50,000+ users80x boost in query perf. due to multi-column Custom Index

Page 69: Data Done Right

3) A Word About Skinny Tables

Database

Innovation– No work required

– Managed by

salesforce.com

2-10x performance

for some analytics

Also available

through Support

Name Address 1 ST Comments SUM

John 1 Terracotta Ln CA Need follow up here. 500

John 1 Terracotta Ln CA Need follow up here. 500

John 1 Terracotta Ln CA Need follow up here. 500

John 1 Terracotta Ln CA Need follow up here. 500

John 1 Terracotta Ln CA Need follow up here. 500

John 1 Terracotta Ln CA Need follow up here. 500

John 1 Terracotta Ln CA Need follow up here. 500

Name ST SUM

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

BaseTable

Ski

nn

yTa

ble

Faster Reports(more rows fit in memory)

Faster Reports(more rows fit in memory)

Fewerrowsper

fetch

Morerowsper

fetch

Page 70: Data Done Right

Data streamed to temporary storageData streamed to

temporary storage

ClientClientProcessing

ThreadProcessing

Thread

Processing Thread

Processing Thread

Processing Servers

Processing Servers

JobJob

Data Batches

Data Batches

Dequeue batch

Dequeue batch

Insert/updateInsert/update

Save resultsSave

results

Send all data

Send all data

Check StatusCheck Status

Retrieve ResultsRetrieve Results

4) Bulk Up—Asynchronously

Job updated in Admin Setup

Job updated in Admin Setup

Dataset processed in parallel

Dataset processed in parallel

ResultsResults

Page 71: Data Done Right

Bulk API

The “go-to” option for tens of thousands of records

and up

Up to 10,000 records in a batch file

Asynchronous loading, tracked in Salesforce’s

Walkthrough time!

Upsert legacy data into Salesforce—FAST

Example: American Insurance Co.230 million records processed in 33 hours,14 hours ahead of schedule

Page 72: Data Done Right

Q&A

Page 73: Data Done Right

Q&A

Links:

developer.force.com/consultants

ezrakenigsberg.com

Post-Session Questions?• Brian Wiebe

Technical Engagement Manager (West)[email protected]

• Ezra KenigsbergData Architect (Midwest) [email protected]

Page 74: Data Done Right

D I S C O V E R

Visit Customer Success Team at Campground

Discover

Training

Learning Paths

Experience

Product

Demos

Learn about Customer

Resources

the products, services and resources

Meet Success Experts

S U C C E S S

Find us at the Customer Success Team area of salesforce.com Campground at Moscone North

Learn about how to win prizes including 10 iPads & more!

that help you achieve

Page 75: Data Done Right

How Could Dreamforce Be Better? Tell Us!

Log in to the Dreamforce app to submit

surveys for the sessions you attendedUse the

Dreamforce Mobile app to submit

surveysEvery session survey you submit is

a chance to win an iPod nano!

OR