aen007 kenigsberg 091807

Post on 22-Nov-2014

1.504 Views

Category:

Economy & Finance

1 Downloads

Preview:

Click to see full reader

DESCRIPTION

Full session information and video available at Successforce.com.

TRANSCRIPT

Applied Data Quality

Ezra Kenigsberg

Data Architect

salesforce.com

Admin III: Expanding into new areas

Safe Harbor Statement

“Safe harbor” statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements including but not limited to statements concerning the potential market for our existing service offerings and future offerings. All of our forward looking statements involve risks, uncertainties and assumptions. If any such risks or uncertainties materialize or if any of the assumptions proves incorrect, our results could differ materially from the results expressed or implied by the forward-looking statements we make.

The risks and uncertainties referred to above include - but are not limited to - risks associated with possible fluctuations in our operating results and cash flows, rate of growth and anticipated revenue run rate, errors, interruptions or delays in our service or our Web hosting, our new business model, our history of operating losses, the possibility that we will not remain profitable, breach of our security measures, the emerging market in which we operate, our relatively limited operating history, our ability to hire, retain and motivate our employees and manage our growth, competition, our ability to continue to release and gain customer acceptance of new and improved versions of our service, customer and partner acceptance of the AppExchange, successful customer deployment and utilization of our services, unanticipated changes in our effective tax rate, fluctuations in the number of shares outstanding, the price of such shares, foreign currency exchange rates and interest rates.

Further information on these and other factors that could affect our financial results is included in the reports on Forms 10-K, 10-Q and 8-K and in other filings we make with the Securities and Exchange Commission from time to time. These documents are available on the SEC Filings section of the Investor Information section of our website at www.salesforce.com/investor. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements, except as required by law.

Purpose

To give you data techniques that can help make you a better administrator

This session… Other sessions…

Practical demos—things you can do TODAY

Tactical ideas and technical detail

Data Loader, Excel,Excel Connector, Access

Bigger-picture data strategy

Third-party tools

Outside this session:Please use these resources

Successforce.com

Salesforce.com’s Data Services group Brian Wiebe, Program Director Regional Data Architects

Product demos downstairs Developer Network: salesforce.com/developer

This Year’s Sessions: Data Data Everywhere! Data Quality: Who Doesn’t Want It?

Last Year’s Sessions: xrl.us/DataDataData xrl.us/ExcelConnectorDemo xrl.us/SystemOverload

Overview

1. De-duplicating existing data 20 min

2. Managing data loads better 20 min

3. Keeping data clean 5 min

4. Q&A 10 min

WHAT’S HOW do I

my data? match?

De-duplicatinga dupe?

WHAT’S HOW do I

my data? match?

I’ve gotta de-dupe the worst records by tomorrow Four questions:

1. What qualifies as a duplicate?

2. What’s my data look like?

3. How will I determine matches?

4. How will I merge records?

a dupe? merge?

merge?

Typical questions: Should we track individual physical customer locations? Should we merge all Accounts in a single country/region? Is it okay to merge clearly different records? Do we need different definitions of “dupe” for different business

units/regions?

. . . answers to these questions determineHierarchy Depth

Records per Customer

WHAT’S

a dupe?

A dupe’s definition determines the dataset!

De-duplicating1: What’s a dupe?

De-duplicating1: What’s a dupe? (cont’d)

PROBLEM: Too many records in org Successive de-dupe runs

were not eliminating as many records as expected

SOLUTION: Make records more

generalized by1. merging obvious

duplicates

AND2. merging different physical-

location records within US into unified records

WHAT’S

a dupe?

Fortune 500 company8,000+ EmployeesOperates in 30+ countries

Case Study:

Some hierarchy possibilities:1. One-and-only-one Account record2. Single parent Account and many children

(no grandparents or grandchildren)3. Multi-tier

Multi-tier example: D&B implemented within Salesforce Global Ultimate one across the entire world

o Domestic Ultimate one per country• Branch multiple physical locations

– Subsidiary one per physical location

De-duplicating1: What’s a dupe? (cont’d)

DecreasingDecreasingEase-of-Ease-of-

MaintenanceMaintenance

WHAT’S

a dupe?

For this example: We’ll de-dupe to the physical-location level We’ll allow hierarchy to be multi-tier We’ll use Excel Connector

• Following steps apply regardless of de-dupe tool

De-duplicating1: What’s a dupe? (cont’d)

WHAT’S

a dupe?

And now. . . a tool

Excel Connector Work with Salesforce data directly in Excel:

Insert, Update, Extract, Delete Too slow for large data volumes (typ. more than hundreds) No longer being updated, not supported by salesforce.com

(alas) xrl.us/ExcelConnector

Dreamforce session “Excel Connector: Your Golden Ticket to Clean Data” xrl.us/ExcelConnectorDemo

Apex tools (including Excel Connector) xrl.us/ApexTools

WHAT’S

my data?

De-duplicating2: What’s my data?

First stop: Names Examine the toughest cases to get a feel for all data

How do apparent dupes’ names seem to match?• Perfectly?

• Systematically?

• Randomly?

Sort Filter

WHAT’S

my data?

Initial data set

Look for a tough case, then sort and filter to get…

WHAT’S

my data?

(1) Sorted and (2) filtered on name

…a set of records to work with

WHAT’S

my data?

De-duplicating3: How do I match?

Look to other fields after Name1. Account numbers/IDs2. Phone numbers

• Scrub to raw numbers

3. Addresses• Select address fields that look promising• Concatenate as needed; Name & ZIP, Street & ZIP

HOW do I

match?

Numbers/IDs

…IDs/Numbers don’t help here

HOW do I

match?

De-duplicating3: How do I match? (cont’d)

Look to other fields after Name1. Account numbers/IDs2. Phone numbers

• Scrub to raw numbers3. Addresses

• Select address fields that look promising• Concatenate as needed; Name & ZIP, Street & ZIP

HOW do I

match?

Scrubbing phone numbersHOW do I

match?

Scrubbing phone numbers (cont’d)Easier to line up matches

HOW do I

match?

De-duplicating3: How do I match?

Look to other fields after Name1. Account numbers/IDs2. Phone numbers

• Scrub to raw numbers

3. Addresses• Select address fields that look promising• Concatenate as needed; Street & ZIP, Name & ZIP

HOW do I

match?

Concatenating address parts

…consistent addresses …consistent namesBetter when we have…

HOW do I

match?

De-duplicating3: How do I match?

Use these techniques Scrubbing Native & Third-Party Concatenating Native & Third-Party Address-standardizing Third-Party Address lookup Third-Party Fuzzy matching Third-Party Automated scoring Third-Party

Widespread merging requires a third-party tool CRMFusion’s DemandTools Trillium’s Diamond Data Informatica’s Data Quality Microsoft’s SQL Server 2005, while not specifically adapted to Salesforce,

provides fuzzy matching Many other solutions downstairs!

HOW do I

match?

De-duplicating4: How do I merge?

Designate a Winner rec with all proper data Modify records:

Third-Party: create “ID” field and “Winner/Loser” field Salesforce: modify Name field

• Winner:“merge” & generic Account Name & “winner”

• Loser:“merge” & generic Account Name

HOW do I

merge?

Modifying names in SalesforceHOW do I

merge?

Merging in the Salesforce UIHOW do I

merge?

Merging in the Salesforce UI (cont’d)HOW do I

merge?

De-duplicatingKey lessons

1. Establish what qualifies as a duplicate before starting2. Use a tough case to get a feel for the data3a. Determine which other fields can be used

Account numbers/IDs Phone numbers Addresses

3b. Determine which methods can be used Scrubbing Native & Third-Party Concatenating Native & Third-Party Address-standardizing Third-Party Address lookup Third-Party Fuzzy matching Third-Party Automated scoring Third-Party

4. Designate a Winner record and merge Losers into it

a dupe?

WHAT’S HOW do I

my data? match? merge?

Managing data loads better

Gotta import new records by tomorrow We’re creating a repeatable, documented process

“Just load it”…• …is difficult to audit after the fact

• …may not be reversible if I’ve made a mistake

Best practices for loading

Managing data loads better (cont’d)

Three steps:1. How should I map my data?

2. How can I automate the generating of CSVs?

3. How can I load data in an auditable way?

HOW do I

generate? load?map?

HOW do I generate? load?map?

And now. . . a tool

Data Loader Insert, Update, UPSERT, Extract, Delete via CSVs Supported by salesforce.com xrl.us/DataLoader

Apex tools (including Data Loader) xrl.us/ApexTools

HOW do I map?

And now. . . a tool (cont’d)HOW do I

map?

Relational Database: Data stored in tables Capable of importing CSVs Capable of creating joins

between tables

And now. . . a tool (cont’d)HOW do I

map?

Query functionality: Queries/views stored separately

from tables Queries/views use SQL (Structured

Query Language) Bonus: toggling between SQL and

graphical interfaces

Managing data loads better1: How do I map?

Create a mapping file Create list of fields in legacy file/s Create list of API field names (not the UI field labels!)

• Get them with Data Loader

Tie Force.com API field names to legacy fields

HOW do I map?

Grab the source field names

1

2

Transposed using Excel’s“Edit | Paste Special | Transpose”command

HOW do I map?

Grab the Force.com API field names

1

2Transposed using Excel’s“Edit | Paste Special | Transpose”command

HOW do I map?

Build out mapping file

An index columnfacilitates latersorting Not every

column has to map

Comment column is useful for concatenates, lookups

HOW do I map?

Managing data loads better2: How do I generate CSVs?

Import legacy data into Access Reusable tool – Great for subsequent loads

• Test-then-production• Pilot-then-phase 1

Change all fields to Text Concatenate strings to create query

HOW do I generate?

Importing into Access

1

2

HOW do I generate?

Managing data loads better2: How do I generate CSVs?

Import legacy data into Access Reusable tool – Great for subsequent loads

• Test-then-production

• Pilot-then-phase 1

Change all fields to Text Concatenate strings to create query

HOW do I generate?

Access table design

After import, change field types to Text

HOW do I generate?

Managing data loads better2: How do I generate CSVs?

Import legacy data into Access. Why? Reusable tool – Great for subsequent loads

• Test-then-production

• Pilot-then-phase 1

Change all fields to Text Concatenate strings to create query

HOW do I generate?

Concatenate strings to create query

Concatenation formula produces clauses suitable for pasting in SQL SELECT query

HOW do I generate?

Customize SQL where needed

Customized, concatenated DESCRIPTION field (as specified in mapping document)

Lookups to other tables would also require customization

HOW do I generate?

Managing data loads better3: How do I load?

Save query as CSV Folder naming:

Use Date & Task format: [YYYY]-[MM]-[DD]#[NN] [Object] [Task] [Desc]

Save source, success, error files in appropriate folder

Note how Auto-Match works with CSV

HOW do I load?

Folder naming

YYYY-MM-DD#NNYYYY-MM-DD#NNforces foldersforces folders

to sort into sort inchronologicalchronological

orderorder

HOW do I load?

Managing data loads better3: How do I load?

Save query as CSV Folder naming:

Use Date & Task format: [YYYY]-[MM]-[DD]#[NN] [Object] [Task] [Desc]

Save source, success, error files in appropriate folder

Note how Auto-Match works with CSV

HOW do I load?

Folder contents

Save source file, success files, and error files all in same directory

HOW do I load?

1

2

Folder contents (cont’d)

When loading errors:• Copy error file and• Use as source for next load

HOW do I load?

Managing data loads3: How do I load?

Save query as CSV Folder naming:

Use Date & Task format: [YYYY]-[MM]-[DD]#[NN] [Object] [Task] [Desc]

Save source, success, error files in appropriate folder

Note how Auto-Match works with CSV

HOW do I load?

Perfect auto-match

With a file created this way, Auto-Match gets every field mapped

HOW do I load?

Managing data loadsKey lessons

1. Build a mapping file to document how data gets transformed

2. Leverage a tool (in this example, Access) to generate CSVs Concatenating strings can help save time mapping

3. Use a folder & file name discipline to make your work self-documenting and easy-to-follow Use Date & Task format:

[YYYY]-[MM]-[DD]#[NN] [Object] [Task] [Desc]

HOW do I generate? load?map?

Keeping data clean

Record DeDup 1.4xrl.us/RecordDeDupV14

Data Quality Dashboardsxrl.us/DataQualityDashboardsV1

Custom Reports(go crazy!)

Session FeedbackLet us know how we’re doing!

Please score the session from 5 to 1 (5=excellent,1=needs improvement) in the following categories:

Overall rating of the session Quality of content Strength of presentation delivery Relevance of the session to your organization

We strive to improve, thank you for filling out our survey.

Additionally, please score each individual speaker on: Overall delivery of session

Q&A

Post-Session Questions?Ezra KenigsbergData Architect (Midwest) ekenigsberg@salesforce.com

Links: Successforce.com salesforce.com/developer Previous Dreamforce presentations:

xrl.us/DataDataData xrl.us/ExcelConnectorDemo xrl.us/SystemOverload

Data tools: xrl.us/ApexTools xrl.us/DataLoader xrl.us/ExcelConnector

Keeping data clean: xrl.us/RecordDeDupV14 xrl.us/DataQualityDashboardsV1

Extras:Ezra’s Toughest Data Projects

De-duplication Data normalization/denormalization

Mapping a table of addresses or phone numbers to specific fields according to rules

Data flow-through “If value is not in table A, use value in table B”

Forcing free text entries to behave like external keys Dealing with typos, whitespace

Attributing record ownership

Extras:Manage your time! Make shortcuts!

Get good with keyboard shortcuts We all know Alt+Tab… Alt+Spacebar and its subsequent accelerators Use accelerator keys…

• …in naming app & internet shortcuts• …in navigating menus

Learn individual apps’ keystrokes Shortcuts for apps and sites

Populate your Start Menu or Taskbar• Sort your Programs menu often

Ctrl+Alt+(letter) Use your Bookmarks/Favorites

Shortcuts for Salesforce tabs Shortcuts for Salesforce setup links Users want to go to a specific page?

• Help them put it on their Start Menus, Taskbars, or Desktops

Be curious!

…after

before…

Extras:Ezra’s favorite keystrokes

User-defined app shortcuts: Ctrl+Alt+(letter) Windows app menu: Alt+Spacebar Show desktop toggle: Windows+D Menu navigation: Alt, then letters Excel select row: Shift+Spacebar Excel select column: Ctrl+Spacebar Excel insert: Ctrl+Shift+Plus Excel delete: Ctrl+Minus Excel numformats: Ctrl+Shift+(number) Excel hide (unhide) row: Ctrl+(Shift)+9 Excel hide (unhide) column: Ctrl+(Shift)+0 Excel edit: F2

Extras:Other cool data tricks

Use FixID() to convert 15-char IDs to 18-char IDs Funnction available in Excel Connector

Use Pivot Tables to quickly list and count unique values Skip unneeded Wizard steps

Fill in blank cells in a table with a killer combo: Go To… Special | Blanks Ctrl+Enter

top related