
Managing a Data Cleansing Process for Materials or Services

Edition II

A practical guide to cleansing master data using international data quality standards designed for the military

ECCMA white paper by

    Peter R. Benson

    2013-06-04


    Introduction

If you are reading this it may be because you are evaluating a data cleansing project proposal, or because you are about to embark on a data cleansing project, either using in-house resources or contracting the project out to a service provider. You may also be evaluating a range of software applications designed to make the project easier. Of course, you could also be reading this because you are in the middle of a data cleansing project and nothing is going according to plan: the project is late, over budget or, worse, not delivering the quality of data you expected.

You may be staring into the abyss of a go-live date that at first appeared manageable but now appears impossible, with a process that is grinding to a halt as the number of problems and exceptions continues to grow. Even worse, you may have cut corners to meet your go-live date only to be faced with an ever increasing number of complaints about the changes you have made, or not made, to the item names or descriptions in your ERP system. No one can find what they are looking for, and purchasing is filling in the gaps with free-text purchase orders. With end users and senior managers all pointing to the quality of the data as the root cause of the failure, why did you get involved, and how was it possible for such a simple project to go so horribly wrong?

The answer lies in the fact that data cleansing, or cataloging, looks deceptively simple; surely anyone can do it. In fact, data cleansing is like any other process: while it can be successfully accomplished on a very small scale by anyone with some common sense, it still takes a process and skill to be successful. Very much like cooking, there is an enormous difference between cooking for the family, cooking professionally for hundreds of guests, and designing, building and managing a food manufacturing process designed to consistently, efficiently and economically turn out millions of quality items.

Cooking is a good analogy: first you need a description of the end product, then you need the right ingredients, the right process and the right tools. Of course this is not all; it also takes experience and skill, if not a natural gift. As with most processes, specialization and industrialization have allowed companies to develop tools that increase the speed and reduce the cost of data cleansing, but in respect of quality, in-house data cleansing will always deliver better quality data, for the same reason Machiavelli gave when he explained why a citizen army is always better than hired mercenaries: mercenaries fight for money and citizens fight to protect their homes and their families; the motivation is different.

Data cleansing (cleaning) or cataloging has come a very long way over the last ten years, and as the original author of the UNSPSC (United Nations Standard Products and Services Code) I am honored to have watched the industry grow, not only in size but in sophistication. The purpose of this document is to provide an insight into the process of data cleansing, to make it easier to evaluate data cleansing proposals and to make it possible for you to manage a data cleansing project with confidence.


    Show me the money

Regardless of where or how it is done, data cleansing costs money, and justifying the cost is the first step in any data cleansing project. The most common justification for data cleansing is cost avoidance or cost reduction through part standardization or supplier rationalization. These are realistic goals that can be estimated with a reasonable degree of accuracy, but they are frustratingly hard to sell to upper management.

Reducing costs, while clearly necessary and vital to profitability, is intrinsically hard to sustain over time simply because of the law of diminishing returns. Once the low-hanging fruit has been identified and harvested, an ever increasing effort is required for an ever decreasing yield. The challenge is to demonstrate that the sustained effort necessary to maintain quality data contributes to revenue and profit in a significant, measurable way, and this is hard to do on cost savings alone.

Master data plays a key role in almost every aspect of a business, from identifying prospective customers to making a sale, creating and delivering a product or service, and paying suppliers, contractors and employees, not forgetting calculating and paying management bonuses and shareholder dividends. Rather than measuring savings in maintenance costs, it is simply more effective to focus on the potential for increased output and reduced unit cost through reduced downtime and improvements in production, or better still to convert increased output and reduced unit cost into cash flow or return on capital. Focusing on the impact quality master data has on growth and profitability is far more attractive, and for good reason: the return on investment (ROI) increases over time.

In a steel plant I worked with, the IT and purchasing managers were trying to justify their data cleansing project based on savings in maintenance costs and improving the requisition-to-order process. Given the size of the plant and the quality of the data, the expected savings were significant both in terms of money and time, but they were unable to get the project passed by senior management. The reason was simple: a quick analysis of finished product cost showed that maintenance cost, including labor, represented a mere 0.15% of total finished product cost. It was understandable that it should not have been on management's high priority list. But downtime was clearly on management's radar, as total output and unit production costs are directly and immediately impacted by downtime. The correlation between maintenance and downtime was clearly understood, so all it required was to point out that the predicted cost of improving data quality and the requisition-to-order process would be covered by reducing downtime by 2 minutes per day! With better data, perhaps we could find the root causes of the problems and reduce downtime by 5 minutes or even 15 minutes, which would represent an ROI of 750%. With a full order book and senior managers struggling to increase output, restating the data cleansing project benefits in terms of potential increased output capacity made the project an immediate top priority. But it also did something else: it created a clear understanding that maintaining data quality would be key to maintaining output, and acceptance of the need for sustained funding for the data quality program.
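The arithmetic behind this anecdote is easy to reproduce. The sketch below (Python) uses hypothetical plant figures, since the paper gives none; it assumes, as the story does, that the project cost equals the value of two minutes per day of recovered output, and it expresses ROI as benefit over cost:

```python
# All figures are hypothetical; the white paper gives no actual plant numbers.
HOURLY_OUTPUT_VALUE = 50_000.0  # value of finished product per plant-hour
OPERATING_DAYS = 350            # operating days per year

def value_of_downtime_saved(minutes_per_day: float) -> float:
    """Annual value of the output recovered by cutting daily downtime."""
    return (minutes_per_day / 60.0) * HOURLY_OUTPUT_VALUE * OPERATING_DAYS

# Per the anecdote: the project cost is covered by 2 minutes/day of downtime.
project_cost = value_of_downtime_saved(2)

for minutes in (2, 5, 15):
    benefit = value_of_downtime_saved(minutes)
    print(f"{minutes:>2} min/day -> ${benefit:,.0f}/year, ROI {benefit / project_cost:.0%}")
# 15 minutes/day gives the 750% ROI quoted in the text.
```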


    Fundamental principles:

Data is a critical asset to all organizations and, as can be seen in the following illustration of the different types of data and how they are related, master data plays a key role.

[Figure: the types of data held by an organization and how they are related]

    master data

data held by an organization that describes the entities that are both independent and fundamental for that organization, and that it needs to reference in order to perform its transactions

EXAMPLE: A credit card transaction is related to two entities that are represented by master data. The first is the issuing bank's credit card account, identified by the credit card number, where the master data contains information required by the issuing bank about that specific account. The second is the accepting bank's merchant account, identified by the merchant number, where the master data contains information required by the accepting bank about that specific merchant.

NOTE 1 Master data typically includes records that describe customers, products, employees, materials, suppliers, services, shareholders, facilities, equipment, and rules and regulations.

NOTE 2 The determination of what is considered master data depends on the viewpoint of the organization.

    NOTE 3 The term "entity" is used in the general sense, not as used in information modeling.

    [ISO 8000-110]


A master data record in an ERP system includes many different data elements controlled and managed by different business functions. Some are general or "basic" data elements and some are function specific. Items that are stocked or inventoried will need minimum stock levels and reorder quantities, as well as lead times. Every item needs a name, and almost everything is going to need a price and a purchase order description, although it is regrettably common to see that, when this is mandatory, the item name is simply copied into the purchase order description field.

Data cleansing, as its name implies, is a process of transformation; it consists of taking one set of names and descriptions and creating another set of names and descriptions.

Data cleansing is the process of improving the quality of the names and descriptions of an item, typically in an ERP application.

The data cleansing process can be broken down into two steps: in the first step the original name and descriptions are deconstructed and then enriched to create a structured master data record, which in the second step is used to build new names and descriptions.

The process of building the structured master data record is called cataloging, and the process used to transform a structured master data record into descriptions is called rendering.

Both the cataloging and rendering processes are driven by rules. The rules for cataloging are contained in data requirements (DR), also known as cataloging templates or identification guides; these are the actual data quality standards. The rules for rendering are contained in rendering guides (RG), or description rules.

When the process we know today as data cleansing was originally being developed, contractors would perform the transformation from the original item names and descriptions to the new item names and descriptions without providing their customers with copies of the cataloging templates, the structured master data, or the rules used for creating the new names and descriptions. This is rarely the case today, as most customers realize that without the rules and the structured master data they can never be independent of the contractor.
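As a concrete illustration of the two steps, here is a minimal sketch in Python; the legacy description, the record and the rendering rule are all invented for illustration:

```python
# Hypothetical example of the two-step process described above.
legacy = "VLV BALL SS 1/4IN 2500PSI"

# Step 1 (cataloging): deconstruct and enrich into a structured record
# of property-value pairs.
structured = {
    "CLASS": "VALVE,BALL",
    "BODY MATERIAL": "STEEL COMP 316",
    "PIPE SIZE": "1/4 INCH",
    "MAX PRESSURE": "2500 PSI",
}

# Step 2 (rendering): build a new description from the structured record.
def render(record: dict, properties: list) -> str:
    values = ", ".join(record[p] for p in properties)
    return f"{record['CLASS']}: {values}"

print(render(structured, ["BODY MATERIAL", "PIPE SIZE", "MAX PRESSURE"]))
# VALVE,BALL: STEEL COMP 316, 1/4 INCH, 2500 PSI
```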


ISO 22745 was designed as an international standard to ensure that the data needed to cleanse master data could be preserved independently of any software application and easily exchanged between data cleansing applications or services, encouraging the competition that has resulted in better quality at a lower cost.

ISO/TS 22745-10 is the international standard for representing an open technical dictionary.

ISO/TS 22745-20 is the international standard for the maintenance of an open technical dictionary.

ISO/TS 22745-30 is the international standard for representing computer-processable data requirements using an open technical dictionary.

ISO/TS 22745-35 is the international standard for requesting data that meets specified data requirements.

ISO/TS 22745-40 is the international standard for the exchange of structured master data using an open technical dictionary.

ISO/WD 22745-45 is the international standard for representing computer-processable rendering guides using an open technical dictionary.

ISO 8000 was designed as a standard to be used specifically for contracting for quality master data.

    Cataloging

The first step in the journey to better descriptions is cataloging. This is the process of describing something, anything, and it is indeed an ancient art: Aristotle was struggling with descriptions over 2,350 years ago when he wrote Categories (http://classics.mit.edu/Aristotle/categories.html). Cataloging identifies the discrete characteristics of something in the form of property-value pairs, where the property provides the meaning of the value. Examples of properties are height, weight and material. Properties are used to represent characteristics, and the first rule of cataloging is that the properties must be explicitly defined; this is typically done using a dictionary. While properties are used to define the meaning of values, it is often useful to further define values in terms of their data types: a date, a measure, a numeric value and a text string are examples of data types.

Many items may be described using the same properties with different values. For example, you can describe many individuals using the properties of name, date of birth and place of birth; the properties remain the same, only the values change. A group of items that can be described using the same properties is called a "class". A class name is therefore nothing more than the name for a group of properties; typically it will be used in naming the item and it will also be used in its descriptions.
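In data terms, a class is just a shared property list, and items are assignments of values to those properties. A toy sketch of the individuals example (names and values invented):

```python
# A class is a named set of properties; items assign values to them.
PERSON = {"NAME", "DATE OF BIRTH", "PLACE OF BIRTH"}  # the class definition

individuals = [
    {"NAME": "A. Smith", "DATE OF BIRTH": "1970-01-01", "PLACE OF BIRTH": "London"},
    {"NAME": "B. Jones", "DATE OF BIRTH": "1982-07-14", "PLACE OF BIRTH": "Boston"},
]

# Both records use exactly the properties of PERSON; only the values differ.
assert all(set(item) == PERSON for item in individuals)
```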



Although it is possible to have several data requirements for a single class, this is not common, and users often use the class name as the name of the data requirement; this is a common source of confusion. While we are on the subject of sources of confusion, if you look back at the example of a structured master data record you will see that the class is included in the list of properties. Yes, the class is indeed a property, and the class name of a material should not be confused with the class name of a classification.


Data quality

"Lex parsimoniae", the principle of the law of thrift that has become known as Ockham's razor, applies to cataloging: the fewer properties you need to uniquely describe an item, the better.

Choosing the right data requirement is the key to successful cataloging. What you are looking for is just enough data to be useful. The definition of data quality is the degree to which the data complies with the data requirement. You need to set the bar high enough to achieve your goals but no higher, as this will incur unnecessary additional costs.

    Data that exceeds the data requirement is not better data; it is just more expensive data.
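Because data quality is defined as the degree of compliance with the data requirement, it can be measured mechanically. A minimal sketch, with illustrative property names:

```python
# Quality as the fraction of mandatory properties that are populated.
def compliance(record: dict, mandatory: list) -> float:
    present = sum(1 for p in mandatory if record.get(p) not in (None, ""))
    return present / len(mandatory)

mandatory = ["CLASS", "BODY MATERIAL", "PIPE SIZE", "MAX PRESSURE"]
record = {"CLASS": "VALVE,BALL", "PIPE SIZE": "1/4 INCH"}

print(f"compliance: {compliance(record, mandatory):.0%}")  # compliance: 50%
```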

The best way to define your initial data requirements is to perform a scoping study. Performed correctly, a scoping study can identify your initial dictionary, your initial data requirements and your initial description rules. The emphasis is on "initial" because your dictionary, data requirements and description rules will evolve over time as you become more familiar with the role data plays in your company.

Duplicate and substitutable items

Your initial data requirements will need to be set to allow you to identify duplicate and substitutable items. The concepts of duplicate and substitutable are different.

Duplication applies to items of production. Duplicate items are created when a single item is given multiple numbers by a manufacturer or supplier, or when items are manufactured to a standard specification. Most buyers cringe at the very thought that a manufacturer or supplier should use different part numbers for the same item, but they do. The reason some suppliers advertise that you will not find the same item at a lower price elsewhere is that they know the manufacturer issued the part number specifically for them, and the exact same item has a different part number when it is sold through a different supplier. If this was not bad enough, part numbers and model numbers have become brands in their own right, so a manufacturer may keep the part number or model number while making what is, in their opinion, a small insignificant change in features or design. To you, these features or designs may be important. If you have ever ordered a replacement part only to find it no longer fits, you will understand the nature of the problem and why it is always safer to order something using a full specification.

Substitution applies to items of supply. Substitutable items are created when several items are given a single number by a buyer. Identifying substitutable items is very important to buyers; it is the primary method of reducing price and risk by leveraging competition.

    The true skill of a cataloger lies in their ability to understand and identify substitutable items.

True duplicate items are easier to identify and safer to group under a single material record but, as we have seen, the part number alone is not a reliable indication of duplication. Duplication should always be determined by comparing characteristic data.
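A mechanical aid for this comparison is to group records by their characteristic data: records sharing identical characteristic values are candidates for merging, subject to the careful review described above. A sketch with invented records:

```python
from collections import defaultdict

# Candidate substitutes share identical characteristic data; identification
# data (part numbers) is deliberately ignored.
items = [
    {"id": "MAT-001", "CLASS": "BOLT", "MATERIAL": "STEEL", "LENGTH": "50 MM"},
    {"id": "MAT-002", "CLASS": "BOLT", "MATERIAL": "STEEL", "LENGTH": "50 MM"},
    {"id": "MAT-003", "CLASS": "BOLT", "MATERIAL": "BRASS", "LENGTH": "50 MM"},
]
CHARACTERISTICS = ("CLASS", "MATERIAL", "LENGTH")

groups = defaultdict(list)
for item in items:
    groups[tuple(item[p] for p in CHARACTERISTICS)].append(item["id"])

candidates = {key: ids for key, ids in groups.items() if len(ids) > 1}
print(candidates)  # {('BOLT', 'STEEL', '50 MM'): ['MAT-001', 'MAT-002']}
```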


In the following structured master data record both the characteristic and the identification data are shown:

[Figure: example structured master data record showing characteristic data alongside identification data]

Items that share the same characteristics are substitutable, and you can merge the records by grouping all the identification data under a single material; however, considerable care needs to be taken in determining which items are duplicates or substitutable. The characteristic data for each item needs to be considered carefully, as does the set of characteristics that you consider to be fundamental. Removing characteristics will cause materials to become substitutable, and adding characteristics will cause materials that were substitutable to become different.

The skill in cataloging is to be able to use both the characteristic and the identification data appropriately. If you are trying to buy one tire you should use the part number (identification data), but if you are requesting quotes for hundreds of tires you should use the characteristic data.


External identification data is copyright

The types of identification data can include a NATO Stock Number (NSN) issued by a NATO cataloging bureau, a Supplier Part Number (SPN), a Manufacturer Part Number (MPN) or a Standard Reference Number (SRN). Other types not shown include a Drawing Reference Number (DRN) or a Global Item Number (GIN) issued by GS1, an international association of retail item numbering organizations. Finally, it is also common to include Buyer Material Numbers (BMN) or Stock Keeping Unit numbers (SKU), which are the identifiers typically issued by other buyers. This is common when there are multiple business units within a group or when a group of companies shares a common catalog.

Your passport number, your vehicle identification number, your vehicle registration number, your club membership number, your telephone number, a taxpayer identifier; these are all familiar identifiers. Just as every master data record you create will have an internal identifier, external identifiers are typically the internal identifiers of other organizations.

An identifier is created by an "author". All identifiers are copyright; they are the legal property of their author, and the author is the "authoritative" source of the characteristic data that was used to assign the identifier. Taking this into consideration, it also follows that you should never use an external identifier as your internal identifier, and you must exercise great care in how you use external identifiers. Including external identifiers in your master data is an acceptable use, as is using external identifiers as internal search parameters, but you must be careful to clearly identify the source of any characteristic data or other external identifiers that are retrieved as the result of a search using an external identifier.

This is often hard to follow, and an example helps. A D-U-N-S Number issued by Dun and Bradstreet (D&B) is essentially a proprietary product number; it identifies a collection of data that belongs to D&B. While you can store the D-U-N-S Numbers that were assigned to your trading partners by D&B, you should only use these numbers to buy data from D&B. What you cannot do without a license is use a D-U-N-S Number to look up or distribute data that did not come from D&B. When you think of it, this is eminently reasonable; imagine a third party selling credit data or address verification data using your internal customer or vendor identifier. While it is acceptable to include D-U-N-S Numbers as organization identifiers in your vendor or customer master data and to display this number in reports, you should not allow the use of the D-U-N-S Number as a lookup field, even within your organization, if you do not have a license to do so (to my knowledge only one organization, the federal government, has been granted a license to use D-U-N-S Numbers for public search of data that does not belong to D&B).


    The data cleansing building blocks

ISO 8000 defines quality master data as portable data that meets requirements.

This idea of standardized cataloging originated in 1950 with the development of the NATO Cataloging System (NCS) and is at the heart of all cataloging and data cleansing today. The principle is very simple: the master data must conform to a specified data requirement, and both the master data and the data requirements are coded using a common dictionary; this makes the master data portable.

    Structured master data

The structured master data is the key to the data cleansing process; it is composed of identification data and characteristic data. The identification data can be a manufacturer's model number, a supplier's part number or even a drawing number or a standards reference number. What is important to remember is that identification data are third-party identifiers controlled by third parties, so knowing who assigned the identifier is as important as the identifier itself. An item identifier combined with the identifier of the organization that issued the item identifier is called an item reference.

Characteristic data is data that describes the physical characteristics or performance characteristics of an item. NATO refers to this as the fit, form and function of an item.

Cataloging is the process of creating a structured master data record that conforms to a data requirement.

Both identification and characteristic data are represented in the form of property-value pairs, where the property gives meaning to the value. The definitions of the properties, as well as any other concepts used in cataloging, are contained in the dictionary.

[Figure: a structured master data record is composed of identification data and characteristic data]


Data requirements

Data requirements are the data quality standards against which the structured master data is measured.

The data requirements are the cataloging rules; they define what data elements are "required" or optional. Comparing the structured master data against the data requirements is how you identify the missing data. As we will see, data requirements play an important role in data acquisition when they are used to create requests for missing data or for data validation.

In advanced data requirements you can also assign a data type to a property, as well as validation rules such as a limited list of codes, a specific unit of measure or a numeric range. You can also define a "mask", for example if you want a string to be uppercase, or to set the number of digits after a decimal or a specific date format. These advanced features of data requirements are typically used in designing data capture systems.
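Such rules amount to machine-checkable constraints attached to each property. A minimal sketch of what they might look like; the rule set itself is invented for illustration:

```python
import re

# Invented rules of the kind an advanced data requirement might carry.
RULES = {
    "BODY MATERIAL": {"type": str, "codes": {"STEEL COMP 316", "BRASS"}},
    "MAX PRESSURE":  {"type": float, "unit": "PSI", "range": (0, 10_000)},
    "DATE UPDATED":  {"type": str, "mask": r"^\d{4}-\d{2}-\d{2}$"},
}

def validate(prop: str, value) -> bool:
    rule = RULES[prop]
    if not isinstance(value, rule["type"]):
        return False
    if "codes" in rule and value not in rule["codes"]:
        return False  # value not in the limited list of codes
    if "range" in rule and not rule["range"][0] <= value <= rule["range"][1]:
        return False  # value outside the permitted numeric range
    if "mask" in rule and not re.match(rule["mask"], value):
        return False  # value does not match the required format
    return True

print(validate("MAX PRESSURE", 2500.0))        # True
print(validate("DATE UPDATED", "2013-05-21"))  # True
print(validate("BODY MATERIAL", "IRON"))       # False
```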

Items that are described using the same characteristics are said to be members of a class. For example, all the bolts in your structured master data could be described using the same properties of TYPE, MATERIAL and LENGTH; if this were the case, all the materials would belong to the class BOLT. It is possible for a class to have several data requirements, for example one for engineering that documents the data needed by engineering and one for procurement that documents the data needed for purchasing, but in practice most companies create a single data requirement and name the data requirement after the class. This is actually not a good practice, and I recommend that data requirements be simply identified using a reference number. The following is an example of two data requirements:

Data Requirement reference | DR1 | DR2
Date updated | 2013-05-21 | 2013-05-21
Class Name | VALVE,BALL | BEARING

Characteristic data
Mandatory Property 1 | THREAD CLASS | TYPE
Mandatory Property 2 | BODY MATERIAL | INSIDE DIAMETER
Mandatory Property 3 | PIPE SIZE | OUTSIDE DIAMETER
Mandatory Property 4 | CONNECTION STYLE | WIDTH
Mandatory Property 5 | MAX PRESSURE |
Optional Property 6 | | LOAD CAPACITY
Optional Property 7 | | SPEED RATING
Optional Property 8 | | SEALING METHOD
Optional Property 9 | |
Optional Property 10 | |

Identification data
Mandatory Property 11 | MANUFACTURER REFERENCE (PREFERRED) | MANUFACTURER REFERENCE (PREFERRED)
Optional Property 12 | NATO STOCK NUMBER |


The dictionary

The only difference between an open technical dictionary and the dictionary you used at school is that the concepts defined in the dictionary are given a concept identifier; this makes it easier to link the concepts used in your structured master data and your data requirements to the definitions in your dictionary.

Replacing concept names with their identifiers is called concept encoding, and its purpose is to make sure your data is unambiguous. Computers love encoded data; the identifiers are shorter than words and much faster to process, but of course, in order for humans to make sense of encoded data, it needs to be decoded.

The role of the dictionary is to encode data and to decode data. One of the significant benefits of using a dictionary and concept encoding is that, by using a multilingual dictionary, it is possible to encode using one language and decode using another. In practice this means that it is possible to catalog in English, for example, and render descriptions in French, automatically. This process has been used successfully by companies creating multilingual ERP descriptions, with one company successfully using items cataloged in English to create item names and descriptions in twenty-nine languages.
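A minimal sketch of concept encoding and multilingual decoding; the French terms and the shortened identifiers are illustrative (real eOTD identifiers carry the 0161-1# prefix and #1 suffix mentioned below):

```python
# A tiny multilingual dictionary keyed by concept identifier.
DICTIONARY = {
    "01-1145956": {"en": "VALVE,BALL", "fr": "ROBINET A TOURNANT SPHERIQUE"},
    "02-014725":  {"en": "BODY MATERIAL", "fr": "MATERIAU DU CORPS"},
}
TERM_TO_ID = {entry["en"]: cid for cid, entry in DICTIONARY.items()}

def encode(term: str) -> str:
    """Replace an English term with its concept identifier."""
    return TERM_TO_ID[term]

def decode(concept_id: str, lang: str) -> str:
    """Replace a concept identifier with the term in the requested language."""
    return DICTIONARY[concept_id][lang]

cid = encode("VALVE,BALL")
print(cid, "->", decode(cid, "fr"))  # 01-1145956 -> ROBINET A TOURNANT SPHERIQUE
```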

You can build a dictionary using a spreadsheet with columns for concept identifier, term and definition. ISO 22745-10 is the international standard for representing an open technical dictionary; it is a very useful model for dictionaries used to render complex, defined-length names and descriptions or multilingual names and descriptions.

The following example dictionary was created as a subset of the ECCMA Open Technical Dictionary (eOTD). The concept identifiers have been abbreviated by removing the leading 0161-1# and the trailing #1, which are required constants when exchanging standard-compliant concept identifiers. In the eOTD not all the terms are in capitals, nor are all the definitions in mixed case, so these were converted to make the dictionary look more attractive.

Every company must maintain its own dictionary; creating it as a subset of the eOTD simply makes the task easier.


Concept identifier | Concept type | Term | Abbrev | Definition
01-1142515 | Class | BEARING | | A device that supports and allows a rotating shaft to run without damage by reducing friction
01-1145956 | Class | VALVE,BALL | | A device utilizing a ball of varying configurations connected to a shaft which rotates to control or block flow in a pipe
02-095207 | Property | TYPE | | A generic type, or configuration of the object
02-014725 | Property | BODY MATERIAL | | The chemical compound or mechanical mixture properties of which the body is fabricated
02-005366 | Property | INSIDE DIAMETER | ID | The length of a straight line which passes through the center of an item, and terminates at the inside circumference
02-006986 | Property | OUTSIDE DIAMETER | OD | The length of a straight line which passes through the center of a circular figure or body, and terminates at the outside circumference
02-010188 | Property | WIDTH | W | A measurement of the shortest dimension of the item, in distinction from length
02-016927 | Property | LOAD CAPACITY | LC | The weight the item can accommodate
02-101753 | Property | SPEED RATING | SR | The maximum safe operating speed or rotational speed (rpm)
02-019192 | Property | SEALING METHOD | | The means by which the item is sealed
02-128590 | Property | MANUFACTURER REFERENCE (PREFERRED) | MR(P) | A preferred reference consisting of the manufacturer name and manufacturer-assigned part number
02-128594 | Property | NATO STOCK NUMBER | NSN | A number issued by a NATO codification bureau identifying an item of supply
02-128591 | Property | SUPPLIER REFERENCE | SR | A reference consisting of a supplier name and supplier-assigned part number
02-024128 | Property | THREAD CLASS | | A numeric-alpha designator indicating the pitch-diameter tolerance and the external or internal location of the thread
02-007268 | Property | PIPE SIZE | | Designates the size of the pipe
02-024592 | Property | CONNECTION STYLE | | The style designation indicating the configuration that most nearly corresponds to the appearance of the connection
02-093392 | Property | MAX PRESSURE | | The maximum operating pressure that the packing is designed to withstand
05-003934 | UOM | INCH | " | A unit of linear measure equal to one twelfth of a foot (2.54 cm)
07-000255 | VALUE | STEEL COMP 316 | SS | See industry standard


Rendering guides

Rendering guides are a recent addition to a cataloger's tool kit; they are covered in ISO/WD 22745-45, the most recent addition to the cataloging standards. A rendering guide is an extension of the data requirement; it specifies the sequence of properties that should be used in a name or description, as well as how the property-value pairs of the characteristic and identification data should be represented in the name or description.

Most typically there are generic rules that apply to all names or descriptions, followed by rules that apply to one or more classes, and finally there may be a rule that applies to a specific item. The following is an example of general rendering rules, followed by two rendering guides for the same item class; one is for the item name and the second is for the purchase order description.

GENERAL RULES | SEPARATORS
CLASS NAME - CHARACTERISTIC DATA | ": "
PROPERTY VALUE PAIRS | ", "
PROPERTY NAME - PROPERTY VALUE | "="
CHARACTERISTIC DATA - IDENTIFICATION DATA | "; "

SPECIFIC DESCRIPTION RULES
The rule specifies the order of the properties (P1...Pn) and whether the property name (N) is to be suppressed (S) or abbreviated (A), or whether the property value (V) is to be abbreviated (A).

DESCRIPTION RULE REFERENCE | RG1 | RG2
BASED ON DATA REQUIREMENT REF | DR1 | DR1
DESCRIPTION TYPE | ITEM NAME | PURCHASE ORDER
ITEM CLASS | VALVE,BALL | VALVE,BALL
RULE | CN: P2NSVA, P3NSVA, P5NSV | CN: P3=V, P5=V, P2=V; P11=V
EXAMPLE | VALVE,BALL: SS, ¼", 2500PSI | VALVE,BALL: SIZE=1/4 INCH, MAX PRESSURE=2500PSI, BODY MATERIAL=STEEL COMP 316; MANUFACTURER REFERENCE (PREFERRED)=PARKER:4Z-MB4LPFA-SSP
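A sketch of how a rendering engine might apply a rule like RG1 above: property names are suppressed and selected values are abbreviated via the dictionary. This is a simplified reading of the rule syntax, not a reference implementation:

```python
# Simplified renderer for a rule like RG1: class name first, then selected
# property values with names suppressed and some values abbreviated.
ABBREVIATIONS = {"STEEL COMP 316": "SS", "1/4 INCH": '1/4"'}  # from the dictionary

record = {
    "CLASS": "VALVE,BALL",
    "BODY MATERIAL": "STEEL COMP 316",  # P2
    "PIPE SIZE": "1/4 INCH",            # P3
    "MAX PRESSURE": "2500PSI",          # P5
}

def render_name(rec: dict, steps: list) -> str:
    """steps: (property, abbreviate_value) pairs taken from the rendering guide."""
    parts = []
    for prop, abbreviate in steps:
        value = rec[prop]
        parts.append(ABBREVIATIONS.get(value, value) if abbreviate else value)
    return f"{rec['CLASS']}: " + ", ".join(parts)

# RG1 reads: CN, then P2 and P3 with values abbreviated, then P5 as-is.
print(render_name(record, [("BODY MATERIAL", True), ("PIPE SIZE", True),
                           ("MAX PRESSURE", False)]))
# VALVE,BALL: SS, 1/4", 2500PSI
```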


The following is an example of a structured master data record containing descriptions rendered from the characteristic and identification data.

[Figure: example structured master data record with rendered names and descriptions]


Classifications

The key to classifications is to understand that they are derived from the characteristic data.

If a classification is provided without the characteristic data from which the classification was derived, the classification cannot be verified.

All classifications are designed for a specific purpose, so it is common to need more than one classification. Some of the classifications will be internal classifications, for example for spend analysis, and some will be external, third-party managed classifications.

Some classifications can be derived from the item class, and many third-party classifications can be automatically assigned using one of the many eOTD commercial classification lookup tables maintained by ECCMA. In some cases the class is not sufficient to determine the classification, and another property must be used in conjunction with, or instead of, the class. An example of this is the customs tariff code (HTS), where the material typically determines the classification.

If the classification changes, these lookup tables or other classification rules will need to be reapplied to update the classifications. For this reason it is not recommended to maintain classifications in a master data record unless they are regularly used in search or reporting functions.

While I am the original author of the UNSPSC, responsible not only for its name and design rules but also for the process used to create and maintain it, perhaps it is only fitting that I also recognize its weakness and, in doing so, the weakness of all classifications. As the name implies, a classification is an organization of classes. In a hierarchical classification, classes are grouped into superclasses, themselves grouped into superclasses. These groups of classes are called the "nodes" of a classification and they are also given names.


Hierarchical classifications are referred to as tree structures, with root and leaf nodes or with parent and child nodes.

As we started to define classes more precisely, as collections of characteristics, it followed that a superclass should include the characteristics of all of its subclasses, and that a subclass should not contain a characteristic that was not inherited from its parent class. When you apply this logic, many of the hierarchical classifications used in procurement and spend analysis, such as UNSPSC, eClass, CPV or NIGP, start to break down.

The UNSPSC was designed as a standard spend analysis classification specifically for the purchasing card industry. The plan was to encourage merchants to adopt Level III credit card processing, in which each line item has a description and a supplier-assigned UNSPSC commodity classification. The objective was to provide better accounting for what was being bought with a company purchasing card, and to actually decline the card at the point of sale when the corporate accounts department flagged a UNSPSC code as "decline" for a specific individual or group of individuals. In theory at least, this would solve the problem of the high number of refusals caused by using the much more generic merchant classification (which is what is still used today). Beyond the merchants' unwillingness to pay the extra cost required to implement Level III credit card transactions, the concept was flawed in that it relied on the seller classifying what they sold. Over and above the challenge of suppliers using different versions of the classification, we quickly found that suppliers wanted to classify what they were selling in as many UNSPSC classifications as possible, even going to the extent of giving the same product many different names and bar codes in order to be listed under different classifications, if only one was allowed per product. We also found that the codes used by sellers to classify their products were not the codes the buyers wanted the items to be classified under. Largely abandoned for the purpose of managing purchasing card transactions, the UNSPSC then became used by suppliers as a catalog classification.

Luckily, just as we were developing the UNSPSC and others were developing yet more material or service classifications, high-speed, high-relevance text search was coming into its own. Today, these third-party managed classifications are of limited value, and most companies realize that they need to manage multiple classifications, not only to satisfy their customers but also their own internal requirements. More important, a classification can never replace a good description, and buyers now realize that they need to obtain the characteristic data from their suppliers so that they can name, describe and classify items in whatever manner suits their internal operational requirements.


Rendering names and descriptions

Descriptions are created (rendered) from both the characteristic and identification data in the structured master data record.

The difference between names and descriptions that were manually entered and those that were automatically rendered can be observed in the consistency of the terminology and the formatting of the names and descriptions. It is possible for manually entered data to be consistent, particularly in a small, well-disciplined group with minimal turnover, but this is rarely the case, and a quick glance at most material master data will identify inconsistencies in the use of terminology, as well as in the formatting of names and descriptions. It is very expensive to build a search engine that is tolerant of a lack of consistency in names and descriptions; a simple space or a change in a character can cause items to be missed. To a computer, "ORING", "O RING", "O-RING", "O'RING" and "O/RING" are very different.
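Even a crude normalization pass shows how much of this variation is pure formatting; the sketch below collapses the O-ring variants to a single key (a search-side workaround, and a poor substitute for rendering the names consistently in the first place):

```python
import re

# Collapse punctuation and whitespace so the O-ring variants compare equal.
def normalize(name: str) -> str:
    return re.sub(r"[^A-Z0-9]", "", name.upper())

variants = ["ORING", "O RING", "O-RING", "O'RING", "O/RING"]
assert len({normalize(v) for v in variants}) == 1  # all reduce to "ORING"
```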

Automating the process of creating names and descriptions through the application of rendering guides not only creates consistency in the use of terminology and formats but also consistency across the names and descriptions of all the items that belong to the same class. Automating this process also allows changes to be made to a large number of items very quickly.

Item names and descriptions need not be static; in fact, the ability to change them as required is one of the more useful features of an item master. What does not change is the material number.

Item names and descriptions must be useful, and both requirements and rendering preferences change over time, so it is important to be able to respond to requests to change item names or descriptions. It is important to create a culture and a process where users can recommend changes and be confident that they will be acted upon; otherwise they will find a workaround, typically adding another item record, the same item but with the name or description they asked for. The following is a workflow for resolving issues that arise when a name or description is "not acceptable".


Workflow: name or description is not acceptable

1. Is all the required data in the description? If yes, change the description rules and render the new name or description. If no, continue.
2. Is the required data in the master data record? If yes, change the description rules and render the new name or description. If no, continue.
3. Does the data requirement include the property or coded value? If yes, add the data to the master data record and render the new name or description. If no, continue.
4. Is the property or coded value in the dictionary? If yes, add the property or coded value to the data requirement, add the data to the master data record and render. If no, first add the property or coded value to the dictionary, then add it to the data requirement, add the data to the master data record and render.

In every path the result is a new name and description.


    Data cleansing workbook:

The data requirement (cataloging template) is the data quality standard that drives the data cleansing process; the terminology used in the data requirements must be defined in a dictionary; and finally there are the rendering guides. In addition to these three resources you will need an organization lookup table that lists the names of the organizations used in the identification data to form the item references, as well as in both the characteristic and identification data to identify the source of the data (provenance). While it is possible to include organization data in the dictionary, it is easier to manage this data separately in an organization table.

The following are the tables that you will need in your data cleansing workbook:

1. Data Requirements
2. Dictionary
3. Rendering Guides
4. Organization Identifiers

    The scoping study

A scoping study is a critical part of any data cleansing project. The objective of the study is to provide an analysis sufficient to define the quantity and quality of the source data, as well as a framework for measuring the quantity and quality of the data to be delivered and the anticipated level of effort required to perform the data cleansing task. Undertaking a data cleansing project without a scoping study is like building a house without a plan. It can be done, but the result is rarely predictable even if the frustration is.

Data Cleansing Workbook (.xlsx), available from www.ECCMA.org


The source data for the scoping study is the material or service master, as well as the vendor master and preferably three years of purchase order line item detail.

The purchase order data is used to identify the active materials and vendors and to provide an estimate of the level of duplication. The analysis will also identify the all-important "free text" transactions. These are items where the material or service master reference is missing.

The next step in the analysis is the identification of key vendors. The level of effort required to obtain the characteristic data from which the new descriptions will be created will depend on the level of cooperation from the suppliers, and to a large degree this will depend on the nature of the relationship. Everything else being equal, the more money you spend with a vendor, the more likely they are to pay attention to your request for data, but how you ask and what you ask for can also make a huge difference. The nicer you are and the more specific the request, the better.

[Figure: prioritization matrix grouping materials by number of suppliers and by risk]

The next step is to group materials into data cleansing strategies and priorities. The groups on the right of the matrix, where there are a high number of suppliers, are the groups where vendor rationalization will yield the most return, with group six being the top priority. The groups on the left of the matrix are high-risk categories, with group five being the top risk group. Whenever there is a low number of suppliers, the emphasis should be on contracts as well as on monitoring the financial well-being of your suppliers; conversely, when the number of suppliers is high, close monitoring of market trends is the best strategy.


    Outsourced data cleansing

The data cleansing industry typically categorizes material master and service master data according to one of the following quality levels. The purpose of the data cleansing process is to move data from Level 1 to Level 4.

Level 1: The item is identified in terms of its source of supply and has a reference number sufficient to successfully place an order for the item from that source.

Level 2: The item is identified and has been assigned a class sufficient to allow the classification of the item for spend analysis.

Level 3: The item is identified and partially described: some of the properties specified in the data requirement have been provided. The data is useful for some of the business functions.

Level 4: The item is identified and all mandatory properties in the data requirement have been provided; it can be competitively sourced based solely on its description, and the data meets the requirements needed to support all known business functions.

    Phase 1: taking data from level 1 to level 2

Source Data Extraction

Before the data cleansing process can begin, data must be extracted from the source systems. These are typically ERP systems, but they can also be specialized procurement, production, inventory or maintenance planning systems; basically anywhere there is a name or a description of an item.

The data entering the process should be at a minimum Level 1 data; if there is insufficient data to purchase the item from a known source, then it is very unlikely that the data cleansing process will be able to improve the description.



The reference number specified in Level 1 can, however, be very source specific, as in "Joe, can you send me 100 boxes of those small screws we buy from you?". As long as Joe understands what you mean by "small screw", it is considered a valid Level 1 reference, at least in the context of ordering screws from Joe. In some instances, where the item is a true commodity that can be obtained from many sources, the reference may be a standard or a published specification, such as a military item specification (MIL-SPEC).

Typically, the data cleansing service provider will supply a data extraction template, and a good service provider will spend some time getting to know your system to ensure that whatever data you have that may be useful to the process is not overlooked.

Many data cleansing service providers will ask for the vendor master, and this is actually a good sign, as it indicates that they will probably be looking to see if they can find some of the item descriptions in your suppliers' electronic catalogs. You should consider setting ground rules regarding communicating with your suppliers.

Before the data cleansing work begins, a good service provider will look to see if there are any flags in the master data that indicate obsolete items. A good service provider will also ask for a twelve or twenty-four month extract of your purchase order transaction file, and they will use this to identify high-value and frequently purchased items, as these need to be prioritized.

Reference Data Extraction

Typically, descriptions contain manufacturer or vendor names, part numbers or other reference data, such as references to standards or drawings. This process analyzes the descriptions and extracts potential reference data.

If the original Level 1 description was "HOSE, 1/4" X 250 FT. GRAINGER 5W553", the extracted reference data would be GRAINGER 5W553. At this stage there is no way of knowing whether it is a supplier or a manufacturer reference.
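A sketch of this extraction step; the organization list and pattern are illustrative, whereas production extraction works from a curated organization table like the one described in the data cleansing workbook:

```python
import re

# Illustrative: match a known organization name followed by a part number.
KNOWN_ORGS = ["GRAINGER", "PARKER", "MCMASTER"]
pattern = re.compile(rf"({'|'.join(KNOWN_ORGS)})\s+([A-Z0-9-]+)")

description = 'HOSE, 1/4" X 250 FT. GRAINGER 5W553'
match = pattern.search(description)
if match:
    org, part = match.groups()
    print(org, part)  # GRAINGER 5W553 (supplier or manufacturer still unknown)
```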

Potential Duplicate Identification Based on Reference Data

This is a largely automated process that does not yield a significant number of duplicates, but a good service provider will include it in the file preparation work. This process includes de-duplication of items that have identical descriptions and the identification of items that have similar reference data. The items with similar reference data are marked as potential duplicates, and they are reviewed either by the customer or by one of the service provider's domain experts. At this stage what is looked for are obvious duplicates, so duplicate identification is based on very close matching of reference data.

Class Assignment

This is a process in which descriptions are analyzed and the item is assigned a class from the eOTD. This will bring line items to Level 2 quality data. The assignment of a class is based on the original description and is not definitive. The class may be modified when the data required by the data requirement is extracted, the item is researched, or the item is physically inspected during walk-down.


Class assignment is a direct replacement for the older UNSPSC classification; the process is similar and largely automated, but considerably more accurate and reliable. Assigning the UNSPSC or other commercial classifications (CPV, eClass, NIGP) to an item once an eOTD class has been assigned is simply a matter of applying a table lookup, and it is a completely automated process.

If the original Level 1 description was "HOSE, 1/4" X 250 FT. GRAINGER 5W553", searching for the class concept of HOSE would have resulted in the eOTD class concept 0161-1#01-087529#1, associated with the term "HOSE" and the definition "A flexible pipe, of rubber, plastic, etc. may be reinforced, designed to convey liquid, air, gas, etc."

    Level 2 analysis

Potential Duplicate Identification Based on Class and Reference Data

Under this process, items are grouped by class, and the combination of the class and the reference data is used to identify potential duplicates. In NATO this is referred to as the SCREENING process. Partial reference data matching within a class is an efficient and reliable way to identify potential duplicates. The items identified as a result of the process are marked as potential duplicates, and a report is generated. Duplicate resolution itself is a separate process that requires physical verification followed by resolution in the master data and procurement records. The report of potential duplicates based on class and reference data is typically one of the first indicators of the benefits that can be expected from a data cleansing project.

Combining the report with unit prices and minimum stock levels will provide a reliable indication of the savings that can be expected from an inventory rationalization project.

Combining the report with the purchase order transaction file will provide a reliable indication of the savings that can be expected from a vendor rationalization project.

Spend Analysis Classification Mapping

As we saw under classifications, most companies develop and maintain a number of spend classifications, for example a spend classification that rolls up to the chart of accounts and another that groups items by procurement specialty. If these classifications were created as groupings of the root class taken from the eOTD, then the assigned eOTD class can be automatically mapped to other classifications, just as the eOTD class is mapped to the UNSPSC, the CPV and several other commercial classifications, making the addition of third-party classifications an automated process.

For example, the eOTD concept 0161-1#01-087529#1 "Hose" would be classified as "40.14.20.00 Hoses" in UNSPSC version UNv120901, "37-11-01-90 Hose (w/o conn., unspecified)" in eClass version 8.0, "44165100 Hoses" in CPV-2007, or "460-00 HOSE, ACCESSORIES, AND SUPPLIES: INDUSTRIAL, COMMERCIAL, AND GARDEN" in NIGP.
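Because these mappings are maintained as lookup tables, adding a third-party classification is a single table lookup. A sketch using the mapping values quoted above:

```python
# Lookup table from an eOTD class to third-party classifications; the values
# are the ones quoted above (a real table covers the whole dictionary).
CLASS_MAP = {
    "0161-1#01-087529#1": {  # HOSE
        "UNSPSC UNv120901": "40.14.20.00 Hoses",
        "eClass 8.0": "37-11-01-90 Hose (w/o conn., unspecified)",
        "CPV-2007": "44165100 Hoses",
        "NIGP": "460-00 HOSE, ACCESSORIES, AND SUPPLIES: "
                "INDUSTRIAL, COMMERCIAL, AND GARDEN",
    }
}

def classify(eotd_class: str, scheme: str) -> str:
    return CLASS_MAP[eotd_class][scheme]

print(classify("0161-1#01-087529#1", "CPV-2007"))  # 44165100 Hoses
```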


    Phase 2: taking data from level 2 to level 4

Develop Data Requirements

This is by far the most critical part of the data cleansing process. Many data cleansing companies differentiate themselves by their domain expertise, but you need to own and control the data requirements used to clean your data. The development and validation of the data requirements used to clean your data represent one of your major investments in the data cleansing process.

These data requirements will be used to verify the quality of your master data and to maintain your master data going forward, and they will be the key component in your ability to request the data you need from your suppliers. It cannot be stressed too highly that a data cleansing process that does not provide you with access to these data requirements is to be avoided. You can ask that your data requirements be either registered and published in the ECCMA Data Requirement Registry (eDRR) or given to you in a form that you can use with any commercial off-the-shelf cataloging or data cleansing software program.

The preferred format is eOTD-i-xml, an ISO 22745-30 compliant format. If you have decided to work with a company that uses proprietary data requirements, you must negotiate a license to use these data requirements after the data cleansing project is completed, as they are an integral part of your master data.

The following is an example of a data requirement that includes data types and specified units of measure.

Class: HOSE (0161-1#01-087529#1)

INTERIOR DIAMETER | numeric, mm | mandatory
LENGTH | numeric, m | mandatory
MATERIAL | text | optional
COLOR | text | optional

The development of the data requirements will determine the cost of the data cleansing project. While there is a cost associated with poor quality master data that does not support the needs of a business, it is also possible to over-specify data requirements and, as a result, spend more than is necessary on data cleansing.

As we saw earlier, data requirements change over time and according to need. The best way to deal with this is to work to satisfy the most obvious and well known of your current data requirements and accept that, as new requirements are identified, some of the descriptions will need to be reworked.

It is better to start with simple data requirements, as these will not only lower the cost of the data cleansing project but also make it much easier to keep the project on track.


Value Extraction

This process consists of analyzing the original item descriptions, using the data requirements as a guide, and extracting properties and their associated values. Value extraction is considered complete when all mandatory properties and their values, as specified in the data requirements, are populated.

Value extraction is a semi-automated process; it requires a high degree of domain and technical expertise, as well as quality control. Despite what some service providers claim, you cannot extract data that is not there, though you can of course research missing data; that is a much more expensive proposition.
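In practice, the automated part is little more than pattern matching against the original free text, guided by the data requirement; whatever the patterns miss goes to a domain expert. A minimal sketch, assuming legacy descriptions follow rough conventions such as "ID 25MM" (the patterns and materials list are illustrative, not a real extraction rule set):

```python
import re

# Illustrative extraction patterns for the HOSE requirement. Real legacy
# descriptions are far less regular, which is why extraction remains
# semi-automated and every extracted value must still be validated.
PATTERNS = {
    "INTERIOR DIAMETER": re.compile(r"\bID\s*(\d+(?:\.\d+)?)\s*MM\b", re.I),
    "LENGTH":            re.compile(r"\b(\d+(?:\.\d+)?)\s*M\b", re.I),
    "MATERIAL":          re.compile(r"\b(RUBBER|PVC|NYLON)\b", re.I),
}

def extract(description: str) -> dict:
    """Extract property values from a legacy description; properties the
    patterns cannot find are left out for manual research."""
    values = {}
    for prop, pattern in PATTERNS.items():
        match = pattern.search(description)
        if match:
            values[prop] = match.group(1)
    return values

print(extract("HOSE RUBBER ID 25MM X 10M BLACK"))
# {'INTERIOR DIAMETER': '25', 'LENGTH': '10', 'MATERIAL': 'RUBBER'}
```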

If you are considering cleansing your master data, it is because the existing data is incomplete or unreliable, so it follows that relying on the data extracted from these descriptions may not be a very good idea. Extraction can still serve a purpose: even incorrect data is useful in the validation process, because tests have shown that, in responding to requests for data, respondents are much more willing to correct errors than they are to fill in a blank field. The bottom line is that extracted data must always be validated.

An additional task often associated with data extraction is value standardization. Value standardization consists of establishing preferred units of measure and converting all values to these units. Creating consistent metric units by setting the position of the decimal is useful and without risk. Conversions between metric and imperial measurements are a risky business, and best practice is to clearly indicate that a value is the result of a conversion.
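A sketch of both halves of that advice: metric-to-metric rescaling applied silently, imperial-to-metric conversion flagged on the value. The preferred units, factors and field names are illustrative.

```python
# Preferred unit per property, with scale factors from commonly seen
# source units. Metric rescaling is exact and safe; imperial conversion
# is flagged so the origin of the value stays visible.
PREFERRED = {"INTERIOR DIAMETER": "mm", "LENGTH": "m"}
FACTORS = {
    ("cm", "mm"): 10.0,     # metric rescaling: exact
    ("mm", "mm"): 1.0,
    ("m",  "m"):  1.0,
    ("in", "mm"): 25.4,     # imperial conversion: flag it
    ("ft", "m"):  0.3048,
}
IMPERIAL = {"in", "ft"}

def standardize(prop: str, value: float, unit: str) -> dict:
    target = PREFERRED[prop]
    result = {"value": value * FACTORS[(unit, target)], "uom": target}
    if unit in IMPERIAL:
        # Best practice: clearly indicate the value is a conversion.
        result["converted_from"] = f"{value} {unit}"
    return result

print(standardize("INTERIOR DIAMETER", 1.0, "in"))
# {'value': 25.4, 'uom': 'mm', 'converted_from': '1.0 in'}
```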

Cataloging at Source (C@S)

Cataloging at Source (C@S) is a process that was developed and extensively tested by NATO as a replacement for the traditional military cataloging method. It is at the heart of the development of ISO 22745 as an international standard for cataloging and of ISO 8000, the international standard for data quality.

The traditional method of cataloging in the military was for the buyers to request that the suppliers or manufacturers provide technical specifications and drawings, which were then used by military cataloguers to create a NATO Stock Number (NSN) record, essentially the military equivalent of your master data record.


[Figure: structure of an NSN (item of supply) record, built from an NCAGE code and part number (identification data for the item of production, Segment C) together with: the identification guide and item name (Segment A); fit, form and function characteristic data (Segment V, coded, and Segment M, clear text); material management data (Segment H); name and address data; and packaging data (Segment W).]

Beyond manufacturer resistance and supplier inability to provide what amounted to unspecified data, the cost of extracting the data from source documents was prohibitive. Cataloging at source took a different approach by specifying exactly what data was needed. The process was extensively tested and demonstrated a substantial improvement in the quality of the data provided and in the speed with which it was provided; as a result, it lowered the cost of cataloging by 75%.

In March 2011, this resulted in the inclusion of the following clause in the standard that specifies the information exchange requirements for most material management functions commonly performed in supporting international projects:

    The Contractor shall supply identification and characteristic data in accordance with ISO 8000-110:2009 on any of the selected items covered in his contract. Following an initial codification request as specified in section 3.2, the NATO Codification Bureau (NCB) shall present a list of the required properties in accordance with the US Federal Item Identification Guides. (The US Federal Item Identification Guides are data requirements.)

The process also demonstrated that suppliers and manufacturers welcomed the change: for the first time they were given visibility of exactly what data their customer wanted or needed, and they preferred being asked for data to the alternative, where they had no visibility of what data was being collected or from where it was being obtained.

ISO 22745 was developed to support the cataloging at source process and to create what has become known as the data supply chain.


Creating and managing a data supply chain is the single most important development in data cleansing. It is a recognition that the characteristic data essential to creating a structured master data record originates from outside the organization. Cataloging at source has, to a large degree, replaced the data extraction and research function performed by contractors, and it is the largest single contributor to reducing the cost of cataloging.

If your service provider is using automated web search tools such as web robots (also known as web wanderers, crawlers, or spiders), you should require written confirmation that they are doing so ethically and legally. They should have a written policy in which they expressly agree to adhere to the robot exclusion rules defined in the robots.txt file on the target web site and to respect the rules governing the use of a third party's web site. These automated programs are used by search engines such as Google, Yahoo and Bing to index web content. Unfortunately, spammers also use them to scan for email addresses, and many companies use them to obtain data improperly; this is not only frowned upon but can be illegal and can be considered industrial espionage. If these automated search agents are not managed properly they can also seriously disrupt the operation of a third party's website. Remember, the data cleansing company is working for you, and they are conducting research as your agent, so you should care about how they do their work.
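Checking the robot exclusion rules is straightforward; Python's standard library ships a robots.txt parser. A minimal sketch, with "example.com" and the crawler's user-agent name as placeholders:

```python
from urllib import robotparser

# Verify that a URL may be fetched before our research crawler requests
# it; "example.com" and the user-agent string are placeholders.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetches and parses the site's robots.txt

url = "https://example.com/products/hoses"
if rp.can_fetch("DataCleansingResearchBot", url):
    print("Allowed to fetch", url)
else:
    print("robots.txt disallows", url, "- skip it")
```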

    The following is a work flow that details the cataloging at source process.


[Figure: Cataloging at source work flow. Starting from the service or material item record in an ISO 8000-120 master data warehouse, the flow asks: is the description sufficient to assign a class (if not, contact the buyer or supplier or conduct on-line research to determine the class); is the data sufficient to order the item from a known supplier (if not, contact the buyer to obtain it); does the supplier master data contain a technical point of contact email (if not, contact the supplier to obtain one); and does a data requirement exist in the registry (if not, create one). If the supplier has an ISO 22745 catalog, an ISO 22745 request for data is sent; otherwise the technical point of contact is emailed a request for data with a URL to an on-line form. If no reply is received, on-line research or data extraction follows. The resulting data is added to the ISO 8000-120 master data warehouse.]


    Leveraging level 4 data

Generate Standardized Descriptions

One of the major benefits of level 4 data is the ability to generate descriptions automatically and programmatically. Descriptions can be generated in any of the languages supported by the dictionary. Descriptions are generated using a rendering guide that specifies the data elements to be included, as well as their order. The rendering guide will also specify the overall length of the description and where abbreviations should be used. The software required to auto-generate descriptions can be very sophisticated, and both the item name and all the descriptions should be rendered from the structured master data record.

The major advantage of this process is the dynamic nature of descriptions. If the data requirement for a class changes and a new property is introduced, new descriptions can easily be generated for all the items in the class.
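A minimal sketch of guide-driven rendering. The guide layout below is illustrative only; the normative format for description rules is ISO 22745-45.

```python
# Illustrative rendering guide: ordered properties, per-property
# abbreviations, and a maximum description length. Real rules would be
# stated in conformance with ISO 22745-45.
GUIDE = {
    "order": ["INTERIOR DIAMETER", "LENGTH", "MATERIAL", "COLOR"],
    "abbrev": {"INTERIOR DIAMETER": "ID", "LENGTH": "LG"},
    "max_length": 40,
}

def render(item_name: str, record: dict, guide: dict) -> str:
    """Render a short description from a structured master data record."""
    parts = [item_name]
    for prop in guide["order"]:
        if prop in record:
            label = guide["abbrev"].get(prop, prop)
            value, uom = record[prop]
            parts.append(f"{label} {value}{uom or ''}")
    return ",".join(parts)[:guide["max_length"]]

record = {
    "INTERIOR DIAMETER": (25, "MM"),
    "LENGTH": (10, "M"),
    "MATERIAL": ("RUBBER", None),
}
print(render("HOSE", record, GUIDE))  # HOSE,ID 25MM,LG 10M,MATERIAL RUBBER
```

Because the description is a pure function of the structured record and the guide, changing the guide and re-running the renderer regenerates every description in the class, which is exactly the dynamic behavior described above.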

Potential Duplicate Identification Based on Characteristic Data

This process combines the reference data, the class and the characteristic data to perform a sophisticated analysis that results in a probability of duplication. Used by experienced domain experts, this process can be extremely efficient at identifying potential duplicates with a very high degree of confidence.
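A minimal sketch of scoring one candidate pair: a shared class is the precondition, and the score is the proportion of matching values over the properties both records populate. A real implementation would normalize values first and weight discriminating properties more heavily.

```python
def duplicate_score(a: dict, b: dict) -> float:
    """Crude probability-style score that two records of the same class
    describe the same item, based on shared characteristic data."""
    if a["class"] != b["class"]:
        return 0.0
    common = set(a["values"]) & set(b["values"])
    if not common:
        return 0.0
    matches = sum(1 for prop in common if a["values"][prop] == b["values"][prop])
    return matches / len(common)

rec1 = {"class": "0161-1#01-087529#1",
        "values": {"INTERIOR DIAMETER": "25 mm", "LENGTH": "10 m",
                   "MATERIAL": "rubber"}}
rec2 = {"class": "0161-1#01-087529#1",
        "values": {"INTERIOR DIAMETER": "25 mm", "LENGTH": "10 m",
                   "COLOR": "black"}}
print(duplicate_score(rec1, rec2))  # 1.0 on the two properties both populate
```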

Competitive Sourcing

Competitive sourcing is one of the primary purposes of cleansing data. The better the specification, the higher the response to your Request for Quote (RFQ) and the easier it is to analyze the replies. Many suppliers will not respond to an incomplete technical specification because they know that, even if they get the order, there is a high probability that the item they supply will be incorrect and be returned.

Automating the generation and analysis of RFQs is relatively straightforward; in fact it was one of the very first systems I designed. It was called Jade (I cannot remember why) and it was driven by a very primitive supplier master and item master. The system generated detailed RFQs, which were sent out first by mail and then by telex through a special network. This was in the days before email, when in the UK it was illegal to connect a fax machine to the British Telecom network; you could only lease the equipment, which required a dedicated line at a combined cost of $250 per month! Of course BT also owned all the telephones and all the answering machines. BT would never have given up on their goldmine if it had not been for massive civil disobedience. I was actually fined and threatened with permanent disconnection for plugging an answering machine purchased in the US into the BT network; luckily times have changed.

Physical Verification

Physical verification is not typically quoted as part of a standard data cleansing process, but it is recommended for many items identified as potential duplicates. If it is undertaken, it is common to include a physical stock check and photographs of the items. Although it is obvious, it is good practice to include a ruler in the picture for scale (surprisingly, many contractors forget this).


Potential Duplicate Resolution

Great care needs to be taken in resolving potential duplicates. The first step is to determine that the items are truly duplicates. Even an identical manufacturer part number cannot be relied upon as conclusive proof of duplication. Part numbers are very useful search strings and in many instances they have become recognizable brands, so manufacturers and suppliers may keep a part number even when they make changes to the materials or components, and this often results in changes in fit, form or function. Physical verification of the items to determine that they are true duplicates is highly recommended, but it is important to keep in mind that a common duplication problem is counterfeiting, where the two items share the same visible physical characteristics but may be substantially different in terms of their performance characteristics.

Resolution of duplication consists of selecting one or more items to be marked as deprecated in the item master. This means that the item number is no longer to be used, but it is not deleted, as deleting an item from the master data would make it impossible to report on the historical records. Once deprecated, the item number should no longer be available for requisition, but there may still be open purchase orders and these should be given time to work through the system.

If the item is inventoried, then the physical inventory should be consolidated as soon as the duplication has been confirmed. It is a good idea to leave the empty bin in place with a note including the new item number and location. Dating the note will allow the bin to be safely reused when needed.
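A minimal sketch of deprecating rather than deleting, assuming a simple in-memory item master (the field names are illustrative): the record keeps its history, gains a pointer to the surviving item number, and is dated.

```python
from datetime import date

# Illustrative item master: the duplicate is flagged, not deleted, so
# historical transactions still resolve, and the surviving item number
# is recorded for redirection.
item_master = {
    "10004711": {"status": "active", "replaced_by": None},
    "10009322": {"status": "active", "replaced_by": None},  # confirmed duplicate
}

def deprecate(item_no: str, surviving_item_no: str) -> None:
    record = item_master[item_no]
    record["status"] = "deprecated"  # blocked for new requisitions
    record["replaced_by"] = surviving_item_no
    record["deprecated_on"] = date.today().isoformat()

deprecate("10009322", "10004711")
print(item_master["10009322"])
# {'status': 'deprecated', 'replaced_by': '10004711', 'deprecated_on': '...'}
```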

The savings attributed to duplicate identification and resolution are typically measured as the reduction in inventory of the highest priced item plus the annual savings. The lower the inventory turn and the higher the price differential, the greater the savings. While the savings due to a reduction in inventory would normally be shown as a balance sheet item, it is not uncommon for the excess inventory to be "consumed". This reduces expenditure to below normal in the short term, so it is very important to be aware that expenditure will recover once the excess inventory is absorbed.
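A worked example of that measure, with invented figures: two duplicate item numbers are consolidated onto the lower-priced one.

```python
# Invented illustrative figures: item A and item B are duplicates;
# A is the higher-priced item and its stock is consolidated into B.
price_a, stock_a = 120.0, 40   # higher-priced duplicate: unit price, on-hand qty
price_b = 95.0                 # surviving, lower-priced item
annual_demand = 200            # units per year across both item numbers

inventory_reduction = price_a * stock_a               # one-time balance sheet effect
annual_savings = (price_a - price_b) * annual_demand  # recurring price differential

print(f"Inventory reduction: {inventory_reduction:,.2f}")  # 4,800.00
print(f"Annual savings:      {annual_savings:,.2f}")       # 5,000.00
```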

    The growth of commodities

Commodities were traditionally materials of uniform quality, defined by a standard that can be referenced in the contract and produced in large quantities by many different producers. In order to be traded as a commodity, the compliance of the item with the standard needs to be capable of being independently verified. One of the benefits of standards is commoditization: from the buyer's perspective it increases competition, and from the supplier's perspective it increases market size. While the number of commodities traded on the commodities markets has grown, this growth has been eclipsed by the growth in the commoditization of intangibles in the form of financial instruments such as derivatives. The lesson learned from the commoditization of intangible assets is the critical role played by identifiers and the associated characteristic data. These lessons apply to the commoditization of services.


    Materials vs. Services

In many companies the expenditure on materials is decreasing and the expenditure on services is increasing; this reflects a growing sophistication in the supply chain that needs to be matched by an increase in the ability to reliably contract for services. A service can be described using the same process used to describe tangible materials: a service will be assigned a service number, assigned a class, and described using characteristic data. Contracting for an intangible service relies just as much on the specification as does contracting for any tangible item. The difference is that the characteristics of a tangible item are typically its physical or performance characteristics, while the characteristics that describe a service are typically its tangible outputs, described as the deliverables.

Best practice is to avoid including materials with services when the materials have their own material numbers. This can be a challenge when contracting for maintenance services that include the replacement of materials such as motors and valves. Reconciling purchase orders may be more difficult, but without this effort spend analysis can be very frustrating.

    Contracting for data cleansing services

The quality and consistency of data cleansing services continue to improve, to a large degree because the quality of the delivered data can now be objectively and independently measured as the degree to which the data meets the data requirements. All master data cleansing should include or be preceded by a scoping study that defines the number of records to be cleansed, the number of item classes, the data requirement by class and, where appropriate, the priority by item class. Throughout the data cleansing project the customer should actively monitor the dictionary, the data requirements and the description rules, all of which should be clearly defined as deliverables. Changes in the data requirements during the project should be documented in "change orders" as they will impact project cost and duration.


The following is a recommended specification of the deliverables that should be included in a master data cleansing contract:

    The master data delivered pursuant to this contract shall be ISO 8000-120 compliant:

1. The master data shall be provided in ISO 22745-40 compliant Extensible Markup Language (XML).

2. The provenance of the property values shall be identified in accordance with ISO 8000-120, using an ECCMA Global Organization Registry (eGOR) identifier to identify the source of the data, and shall be dated with the date the data was obtained (see the sketch after this list).

3. Identification data (for example part numbers, drawing numbers, standard specifications) shall be in the form of a reference, where the organization that issued the identifier shall itself be identified using an ECCMA Global Organization Registry (eGOR) identifier.

4. The master data shall comply with agreed data requirements that shall be delivered in XML in compliance with ISO 22745-30 or registered in the ECCMA Data Requirements Registry (eDRR).

5. Property values that are rendered from other property values (for example rendered descriptions) shall identify the rules used in rendering, and the rules shall be stated in conformance with ISO 22745-45.

6. If a classification is provided, all the characteristics used in assigning the classification must be included in the characteristic data.

7. The master data, the data requirements and the description rules shall be encoded using concept and terminology identifiers from the ECCMA Open Technical Dictionary (eOTD), an ISO 22745 compliant open technical dictionary that supports free identifier resolution.
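To make clause 2 concrete, here is a minimal sketch using Python's standard XML module of what provenance on a property value amounts to: who supplied the value (an eGOR organization identifier) and when it was obtained. The element and attribute names and the eGOR identifier are illustrative placeholders, not the normative ISO 22745-40 schema.

```python
import xml.etree.ElementTree as ET

# Illustrative (non-normative) element names: every property value
# carries the organization that supplied it and the date it was obtained.
value = ET.Element("propertyValue", {
    "property": "INTERIOR DIAMETER",
    "value": "25",
    "uom": "mm",
})
ET.SubElement(value, "provenance", {
    "sourceOrg": "eGOR:0123456",   # placeholder eGOR identifier
    "dateObtained": "2013-05-29",
})
print(ET.tostring(value, encoding="unicode"))
```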

For the avoidance of doubt, the following data must be provided in an application-neutral format without the inclusion of proprietary tags:

1. The dictionary (including all classes, attributes, units of measure and coded values, with any and all terminology necessary to render descriptions)

2. The data requirements (cataloging templates)

3. The description rendering rules

Statement of intellectual property: Contractor hereby warrants that data delivered pursuant to this contract is free from any and all claims to intellectual property, be it in the form of copyright, patent or trade secret, that would restrict customer from using or redistributing the delivered data.