Bloor Research on Data Migration


  • 7/28/2019 Bloor Research on Data Migration

    1/13

    Data Migration

    A White Paper by Bloor Research
    Author: Philip Howard
    Publish date: May 2011


    Data migration projects are undertaken because they will support business objectives. There are costs to the business if it goes wrong or if the project is delayed, and the most important factor in ensuring the success of such projects is close collaboration between the business and IT.

    Philip Howard

    Free copies of this publication have been

    sponsored by

    © 2011 Bloor Research. A Bloor White Paper.

    Executive introduction

    In 2007, Bloor Research conducted a survey into the state of the market for data migration. At that time, there were few tools or methodologies available that were targeted specifically at data migration and it was not an area of focus for most vendors. As a result, it was not surprising that 84% of data migration projects ran over time, over budget, or both. During the spring of 2011, Bloor Research re-surveyed the market to discover what lessons have been learned since 2007. While a detailed analysis of the survey results will not be available until later in the year, it is encouraging that now only a minority of migration projects are not completed on time and within budget.

    This paper will discuss why data migration is important to your business and why the actual process of migration needs to be treated as a business issue, what lessons have been learned in the last few years, the initial considerations that organisations need to bear in mind before undertaking a data migration, and best practices for addressing these issues.


    Data migration is a business issue

    If you're migrating from one application environment to another, implementing a new solution, or consolidating multiple databases or applications onto a single platform (perhaps following an acquisition), you're doing it for business reasons. It may be to save money or to provide business users with new functionality or insights into business trends that will help to drive the business forward. Whatever the reasons, it is a business issue.

    Historically, certainly in 2007, data migration was regarded as risky. Today, as our latest results indicate (nearly 62% of projects are delivered on time and on budget), there is much less risk involved in data migration, provided that projects are approached in the proper manner. Nevertheless, there are risks involved. Figure 1 illustrates some of the costs associated with overrunning projects, as attested by respondents in our 2011 survey. Note that these are all directly related to business costs.

    So, there are risks, and 30% of data migration projects are delayed because of concerns over these and other issues. These delays, which average approximately four months (but can be a year or more), postpone the accrual of the business benefits that are driving the migration in the first place. In effect, delays cost money, which is why it is so important to take the risk out of migration projects.

    It is worth noting that companies were asked about the top three factors affecting the success of their data migration projects. By far, the most important factor was business engagement, with 72% of organisations quoting this as a top three factor and over 50% stating this as the most important factor. Conversely, over one third of companies quoted lack of support from business users as a reason for project overruns.

    To summarise: data migration projects are undertaken because they will support business objectives. There are costs to the business if it goes wrong or if the project is delayed, and the most important factor in ensuring the success of such projects is close collaboration between the business and IT. Whether this means that the project should be owned by the business (treated as a business project with support from IT) or whether it should be delegated to IT with the business overseeing the project is debatable, but what is clear is that it must involve close collaboration.

    Figure 1: Costs to the business of overrunning projects


    Lessons learned

    Figure 3: Diagrammatic representation of a business entity

                              Used in 2007   Used in 2011
    Data profiling tool           10%            72%
    Data cleansing tool           11%            75%
    Formal methodology            72%            94%
    In-house methodology          76%            41%

    Figure 2: Increased use of tools and methodologies

    As previously noted, there has been a significant reduction in the number of overrunning or abandoned projects. What drove this dramatic shift? There are a number of areas that have significantly changed since 2007, which are summarised in Figure 2.

    This shows a very significant uptake in the use of data profiling and data cleansing tools, as well as more companies now using a formal methodology. With respect to methodology, there has been a significant move towards formalised approaches that are supplied by vendors and systems integrators rather than developed in-house. While we cannot prove a causal relationship between these changes and the increased number of on-time and on-budget projects, these figures are, at the very least, highly suggestive.

    Since these are lessons that have already been learned we will not discuss these elements in detail, but they are worth discussing briefly.

    Data profiling

    Data profiling is used to identify data quality errors, uncover relationships that exist between different data elements, discover sensitive data that may need to be masked or anonymised, and monitor data quality on an on-going basis to support data governance. One important recommendation is that data profiling be conducted prior to setting your budgets and timescales. This is because profiling enables identification of the scale of the data quality, masking and relationship issues that may be involved in the project and thus enables more accurate estimation of project duration and cost. Exactly half of the companies using data profiling tools use them in this way, and they are more likely to have projects that run to time and on budget (68% compared to 55%).

    While the use of data profiling to discover errors in the data is fairly obvious, its use with respect to relationships in the data may not be. Figure 3 shows a business entity for an estate agent, showing relevant relationships. If you migrate this data to a new environment then the relationship between the property and the land registry, for example, needs to remain intact, otherwise the relevant application won't work.

    Note that this is a business-level representation of these relationships. Data profiling tools need to be able to represent data at this level so that they support use by business analysts and domain experts, as well as in entity-relationship diagrams used by IT. Ideally, they should enable automatic conversion from one to the other to enable the closest possible collaboration.
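
    As a minimal sketch of what checking that a relationship has remained intact might look like, the following Python function lists records whose reference has no counterpart. The record layouts, the field names (property_id, title_no) and the sample values are hypothetical, chosen only to echo the estate agent example; a profiling tool would normally discover such relationships rather than have them hard-coded.

```python
def find_orphans(child_rows, parent_rows, child_key, parent_key):
    """Return child rows whose key has no matching parent row."""
    parent_ids = {row[parent_key] for row in parent_rows}
    return [row for row in child_rows if row[child_key] not in parent_ids]


# Hypothetical sample data: properties that reference land registry titles.
land_registry = [{"title_no": "T100"}, {"title_no": "T200"}]
properties = [
    {"property_id": 1, "title_no": "T100"},
    {"property_id": 2, "title_no": "T200"},
    {"property_id": 3, "title_no": "T999"},  # no matching registry entry
]

orphans = find_orphans(properties, land_registry, "title_no", "title_no")
print(orphans)  # lists the property whose title_no is "T999"
```

    Running the same check against both the source and the migrated data gives a simple indication of whether the migration itself broke any references.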

    Data quality

    Data quality is used to assure accurate data and to enrich it. It is only in recent years that more attention has been paid to data quality; yet many companies still ignore it. For example, the GS1 Data Crunch report, published in late 2009, compared the product catalogues of the UK's four leading supermarket chains with the same details at their four leading suppliers. It found that 60% of the supermarkets' records were duplicates, resulting in overstocking and wastage for some products and under-stocking and missed sales for others. The total cost of this poor data quality for these four companies over 5 years was estimated at £1bn. As another and more generic proof point, a Forbes Insights survey published in 2010 stated that data-related problems cost the majority of companies more than $5 million annually. One-fifth estimate losses in excess of $20 million per year.

    Poor data quality can be very costly and can break or impair a data migration project just as easily as broken relationships. For example, suppose that you have a customer email address with a typo such as a '0' instead of an 'O'. An existing system may accept such errors, but the new one may apply rules more rigorously and not support such errors, meaning that you cannot access this customer record. In any case, the end result will be that this customer will never receive your email marketing campaign.

    According to survey respondents in 2011, poor data quality or lack of visibility into data quality issues was the main contributor (53.4%) to project overruns, where these occurred.
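
    The email example can be illustrated with a small sketch. Both rules below are assumptions made for illustration: the lenient pattern stands in for a legacy system that accepts almost anything containing an "@" and a dot, and the strict pattern stands in for a target system that insists on a letters-only top-level domain. Neither is a complete email validator.

```python
import re

# Lenient rule (hypothetical legacy system): anything with an "@" and a dot.
LENIENT = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Stricter rule (hypothetical target system): top-level domain must be letters only.
STRICT = re.compile(r"^[^@\s]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}$")


def rejected_by_target(emails):
    """Return addresses the lenient rule accepts but the strict rule rejects."""
    return [e for e in emails if LENIENT.match(e) and not STRICT.match(e)]


# Hypothetical sample values; the second has a zero typed in place of a letter.
emails = ["jane@example.com", "john@example.c0m", "no-at-sign.example.com"]
print(rejected_by_target(emails))
```

    Running a comparison of this kind during profiling, before cut-over, shows which existing records the new system would refuse.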

    Methodology

    Having an appropriate migration methodology is quoted in our survey as a critical factor by more respondents than any other except business engagement. It is gratifying to see that almost all organisations now use a formal methodology. For overrunning projects, more than a quarter blamed the fact that there was no formal methodology in place.

    In terms of selecting an appropriate methodology for use with data migration, it should be integrated with the overall methodology for implementing the target business application. As such, the migration is a component of the broader project, and planning for data migration activities should occur as early as possible in the implementation cycle.


    Pre-migration decisions

    There are various options that need to be considered before commencing a project. In particular:

    Who will have overall control of the project? Is it the business leader with support from IT or IT with support from the business?

    Who owns the data? Owners will need to be identified for all key data identities, and they will need to be aware of what is expected of them and their responsibilities during the migration project. This is critical: we have previously discussed the importance of business engagement and it is at this level that it is most important. Selected tools should support business-IT collaboration so that business users can work within the project using business-level constructs and terminology, which can be automatically translated into IT terms. This will assist in ensuring business engagement and will also facilitate agile user acceptance testing.

    Will older data be archived rather than migrated to the new system? Almost 60% of the migrations in our most recent survey involved archival of historic data from earlier systems. In most cases, it is less expensive to archive historic data than to migrate and maintain it in the new system, which reduces total cost of ownership. It also reduces the scale requirements for the migrated system. A key feature of archiving is that you need to understand relationships in the data (using data profiling), so it often makes sense to combine archiving and migration into a single project. Setting and monitoring retention policies is also a concern as it is necessary to comply with legal requirements. This monitoring may be included as part of data governance (see next section).

    What development processes will be employed? Will it be an agile methodology with frequent testing or a more traditional approach? Agile approaches should include user acceptance testing as well as functional and other test types: the earlier that mismatches can be caught the better.

    Does the project involve any sensitive data? According to our survey results, around 44% of projects involve sensitive data. If sensitive data is an issue, it will need to be discovered (profiling tools can do this) and then masked. In terms of masking, simple scrambling techniques may not be enough. For example, if you scramble a postal code randomly this will be fine if the application is only testing for a correct data format, but if it checks the validity of the code then you will need to use more sophisticated masking algorithms. It must also be borne in mind that relationships may need to be maintained during the masking process, for similar reasons. For example, a patient with a particular disease and treatment cannot be randomly masked because there is a relationship between diseases and treatments that may need to be maintained. Manual masking methods were used by most companies in our survey (though more than 10% simply ignored the issue, thereby breaking the law), but this will only be appropriate for the most simple masking issues, where neither validity nor relationships are an issue.

    How will the project be cut over? Is this to be a 'turn the old system off and the new system on, over a long weekend' cut-over? If the cut-over is to run over an extended period, best practice suggests that there should be a freeze on changes to the source system(s) during that time. Will a more iterative approach, perhaps with parallel systems running, be employed? Or will a zero-downtime approach be used? If systems are running in parallel after the migration is completed, then criteria need to be defined that determine when the old system can be turned off. Procedures then need to be in place to monitor those criteria so that the old system is turned off as soon as possible. Failure to define appropriate criteria can lead to old systems being maintained for far too long, which is a waste of time, money and resources.

    Is the migration to be phased? For example, if migrating to a new application and a new database it might be decided to migrate the application first and the database later. Or, if the new environment will host much more detailed data than the current one, a decision to migrate just the current data and then add in the additional data as a second phase may be best. This can be a useful approach if you are migrating between two very different environments.

    Who will you partner with? The evidence from our 2011 survey suggests that projects that are managed using in-house resources are least likely to overrun. If you do not have such internal expertise then a third party will be needed to assist in the migration project and to provide a methodology. Results suggest that software vendors are better able to do this than systems integrators (59% no overruns compared to 52%), perhaps because of their in-depth knowledge of the integration tools that they supply and/or their functional knowledge of target applications. A further consideration will be the degree of knowledge transfer the third party will be able to provide, allowing you to build your own internal expertise to support future projects. In addition, go-live is still a very critical and disruptive event that is often associated with the implementation of a new business application and new processes for the organisation. Experienced third party support for the change management processes involved can be critical. Results suggest that software vendors with strong consultancy capabilities are better able to do this than pure-play systems integrators. Other major factors in choosing a partner are experience with mission-critical and enterprise-scale migrations, the ability to drive business-IT collaboration and the change management that may involve, technical expertise with the tools to be used, functional and domain expertise with respect to the business processes involved, and a suitable methodology (as previously discussed).
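
    To make the masking question above more concrete, here is a minimal sketch of deterministic, format-preserving masking. It is an illustration under stated assumptions, not a recommended product technique: the function name mask_value, the key and the sample postal code are all hypothetical. It keeps the layout of a value (letters stay letters, digits stay digits) and always maps the same input to the same output, so a masked column can still be used to join records. It does not guarantee that a masked postal code is a real one; an application that checks validity would need substitution from a list of genuine codes instead.

```python
import hashlib
import hmac
import string


def mask_value(value, secret):
    """Mask letters and digits deterministically, keeping the original layout."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]
        if ch.isdigit():
            masked.append(string.digits[b % 10])
        elif ch.isalpha() and ch.isascii():
            letters = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            masked.append(letters[b % 26])
        else:
            masked.append(ch)  # keep spaces and punctuation as they are
    return "".join(masked)


SECRET = b"demo-only-secret"  # hypothetical key; real keys belong outside source code
masked = mask_value("AB1 2CD", SECRET)  # hypothetical postal code
print(masked, masked == mask_value("AB1 2CD", SECRET))
```

    Because the mapping is keyed, the same value masked in two different tables stays consistent, which addresses the key-matching side of the relationship problem. Semantic relationships, such as the one between a disease and its treatment, need rules of their own.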

    Data governance

    As noted, it is imperative to ensure good data quality in order to support a data migration project. However, it is all too easy to think of data quality as a one-off issue. It isn't. Data quality is estimated to deteriorate by between 1% and 1.5% per month. Conduct a one-off data quality project now and in three years you will be back where you started. If investing in data quality to support data migration, plan to monitor the quality of this data on an on-going basis (see box) and remediate errors as they arise.
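
    The effect of the monthly deterioration rate can be worked through with a short calculation. This is only a sketch under an assumption that is not stated in the survey: that the quoted rate applies each month to the records that are still accurate, so that it compounds. The function name is hypothetical.

```python
def share_still_accurate(monthly_decay, months):
    """Fraction of records still accurate, assuming the decay compounds monthly."""
    return (1 - monthly_decay) ** months


low_decay = share_still_accurate(0.01, 36)    # 1% per month over three years
high_decay = share_still_accurate(0.015, 36)  # 1.5% per month over three years
print(f"Still accurate after 36 months: {high_decay:.1%} to {low_decay:.1%}")
```

    On that assumption, somewhere between roughly 58% and 70% of records cleansed today would still be accurate after three years, which means that a third or more of the benefit of a one-off exercise would have been lost.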

    In addition to addressing the issues raised in the previous section, consider using data migration as a springboard for a full data governance programme, if one is not used already. Having appropriate data governance processes in place was the third most highly rated success factor in migration projects in our 2011 survey, after business involvement and migration methodology. While we do not have direct evidence for this, we suspect that part of the reason for this focus is that data governance programmes encourage business engagement. This happens through the appointment of data stewards and other representatives within the business that become actively involved not only in data governance itself, but also in associated projects such as data migration.

    In practice, implementing data governance will mean extending data quality to other areas, such as monitoring violations of business rules and/or compliance (including retention policies for archived data) as well as putting appropriate processes in place to ensure that data errors are remediated in a timely fashion. Response time for remediation can also be monitored and reported within your data governance dashboard.

    In this context, it is worth noting that regulators are now starting to require ongoing data quality control. For example, the EU Solvency II regulations for the insurance sector mandate that data should be 'accurate, complete and appropriate' and maintained in that fashion on an ongoing basis. The proposed MiFID II regulations in the financial services sector use exactly the same terminology; and we expect other regulators to adopt the same approach. It is likely that we will see the introduction of Sarbanes-Oxley II on the same basis. All of this means that it makes absolute sense to use data migration projects as the kick-off for ongoing data governance initiatives, both for potential compliance reasons and, more generally, for the good of the business.

    To monitor data quality bear in mind that 100% accurate data is not obtainable and, even if it was, it would be prohibitively expensive to maintain. It is therefore important to set targets for things such as number of duplicate records, records with missing data, records with invalid data, and so on.

    These key performance indicators can be presented by means of a dashboard together with trend indicators that support efforts towards continuous improvement.

    Data profiling tools can not only be used to monitor data quality, but also to calculate key performance indicators and present these as described.
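
    The indicators mentioned in the box can be sketched in a few lines of Python. This is a simplified illustration, not a description of how any particular profiling tool works: the function names, the field names, the validation rule and the target values are all hypothetical.

```python
def quality_kpis(records, key_field, required_fields, validators):
    """Count duplicate, incomplete and invalid records in a list of dicts."""
    seen = set()
    duplicates = missing = invalid = 0
    for rec in records:
        key = rec.get(key_field)
        if key in seen:
            duplicates += 1  # same key has already appeared
        seen.add(key)
        if any(rec.get(f) in (None, "") for f in required_fields):
            missing += 1
        if any(rec.get(f) not in (None, "") and not check(rec[f])
               for f, check in validators.items()):
            invalid += 1
    return {"duplicates": duplicates, "missing": missing, "invalid": invalid}


def breaches(kpis, targets):
    """Return the KPIs whose count exceeds the agreed target."""
    return {name: count for name, count in kpis.items() if count > targets[name]}


# Hypothetical customer records and targets.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 51},               # missing email
    {"id": 2, "email": "b@example.com", "age": 51},  # duplicate id
    {"id": 3, "email": "c@example.com", "age": -4},  # invalid age
]
validators = {"age": lambda v: 0 <= v <= 120}
kpis = quality_kpis(records, "id", ["email", "age"], validators)
targets = {"duplicates": 0, "missing": 1, "invalid": 0}
print(kpis, breaches(kpis, targets))
```

    Recording these counts at regular intervals gives the trend line that a dashboard would display against each target.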


    Initial steps

    We have discussed what needs to be considered before you begin. Here, we will briefly highlight the main steps involved in the actual process of migration. There are basically four considerations:

    1. Development: develop data quality and business rules that apply to the data and also define transformation processes that need to be applied to the data when loaded into the target system.

    2. Analysis: this involves complete and in-depth data profiling to discover errors in the data and relationships, explicit or implicit, which exist. In addition, profiling is used to identify sensitive data that needs to be masked or anonymised. The analysis phase also includes gap analysis, which involves identifying data required in the new system that was not included in the old system. A decision will need to be made on how to source this. In addition, gap analysis will be required on data quality rules because there may be rules that are not enforced in the current system, but compliance will be needed in the new one. By profiling the data and analysing data lineage, it is also possible to identify legacy tables and columns that contain data, but which are not mapped to the target system within the data migration process.

    3. Cleansing: data will need to be cleansed and there are two ways to do this. The existing system can be cleansed and then the data extracted as a part of the migration, or the data can be extracted and then cleansed. The latter approach will mean that the migrated system will be cleansed, but not the original. This will be fine for a short duration big-bang cut-over, but may cause problems for parallel running for any length of time. In general, a cleansing tool should be used that supports both business and IT-level views of the data (this also applies to data profiling) in order to support collaboration and enable reuse of data quality and business rules, thereby helping to reduce cost of ownership.

    4. Testing: it is a good idea to adopt an agile approach to development and testing, treating these activities as an iterative process that includes user acceptance testing. Where appropriate, it may be useful to compare source and target databases for reconciliation.

    These are the main four elements, but they are by no means the only ones. A data integration tool (ETL: extract, transform and load), for example, will be needed in order to move the data and this should also have the collaborative and reuse capabilities that have been outlined.
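
    The reconciliation mentioned under testing can be sketched as a comparison of row fingerprints between source and target. This is a minimal illustration only: the function names, the column names and the sample rows are hypothetical, and it assumes that any expected transformations have already been applied to the source rows so that like is compared with like.

```python
import hashlib


def row_fingerprint(row, columns):
    """Hash the chosen columns of a row into a comparable fingerprint."""
    joined = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows, key, columns):
    """Compare two row sets by key and report missing, extra and changed keys."""
    src = {r[key]: row_fingerprint(r, columns) for r in source_rows}
    tgt = {r[key]: row_fingerprint(r, columns) for r in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Hypothetical rows from the old and new systems.
source = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
target = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Rob"}, {"id": 4, "name": "Dee"}]
report = reconcile(source, target, "id", ["name"])
print(report)
```

    An empty report in all three categories is one possible acceptance criterion for a test cycle.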

    A further consideration is that whatever tools are used, they should be able to maintain a full audit trail of what has been done. This is important because part of the project may need to be rolled back if something goes wrong and an audit trail provides the documentation of what precisely needs to be rolled back. Further, such an audit trail provides documentation of the completed project and it may be required for compliance reasons with regulations such as Sarbanes-Oxley. Other important capabilities include version management, how data is masked, and how data is archived.
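
    How an audit trail supports rollback can be shown with a small sketch. The class below is hypothetical and deliberately simplified: it keeps the trail in memory and treats the target system as a dictionary of tables, whereas a real tool would persist the trail and work against a database.

```python
from datetime import datetime, timezone


class AuditTrail:
    """Append-only log of migration changes that can be undone step by step."""

    def __init__(self):
        self.entries = []

    def record(self, step, table, key, before, after):
        """Log one change; `before` is None when the row was newly inserted."""
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "step": step, "table": table, "key": key,
            "before": before, "after": after,
        })

    def rollback(self, store, step):
        """Undo every change logged for `step`, newest first."""
        for e in reversed([e for e in self.entries if e["step"] == step]):
            if e["before"] is None:
                store[e["table"]].pop(e["key"], None)      # row was inserted
            else:
                store[e["table"]][e["key"]] = e["before"]  # row was updated


# Hypothetical target system holding one customer row.
store = {"customer": {1: {"name": "ann"}}}
trail = AuditTrail()

trail.record("cleanse", "customer", 1, dict(store["customer"][1]), {"name": "Ann"})
store["customer"][1] = {"name": "Ann"}
trail.record("cleanse", "customer", 2, None, {"name": "Bob"})
store["customer"][2] = {"name": "Bob"}

trail.rollback(store, "cleanse")
print(store)
```

    The entries are retained after the rollback, so the trail continues to document what was attempted as well as what was undone.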

    Clearly it will be an advantage if all tools can be provided by a single vendor running on a single platform.


    Summary

    Data migration doesn't have to be risky. As our research shows, the adoption of appropriate tools, together with a formal methodology, has led, over the last four years, to a significant increase in the successful deployment of timely, on-cost migration projects. Nevertheless, there are still a substantial number of projects that overrun: more than a third. What is required is careful planning, the right tools and the right partner. However, the most important factor in ensuring successful migrations is the role of the business. All migrations are business issues and the business needs to be fully involved in the migration, before it starts and on an on-going basis, if it is to be successful. As a result, a critical factor in selecting relevant tools will be the degree to which those tools enable collaboration between relevant business people and IT.

    Finally, data migrations should not be treated as one-off initiatives. It is unlikely that this will be the last migration, so the expertise gained during the migration process will be an asset that can be reused in the future. Once data has been cleansed as a part of the migration process, it represents a more valuable resource than it was previously, because it is more accurate. It will make sense to preserve the value of this asset by implementing an on-going data quality monitoring and remediation programme and, preferably, a full data governance project.

    Further Information

    Further information about this subject is available from http://www.BloorResearch.com/update/2085


    Bloor Research overview

    Bloor Research is one of Europe's leading IT research, analysis and consultancy organisations. We explain how to bring greater Agility to corporate IT systems through the effective governance, management and leverage of Information. We have built a reputation for 'telling the right story' with independent, intelligent, well-articulated communications content and publications on all aspects of the ICT industry. We believe the objective of telling the right story is to:

    Describe the technology in context to its business value and the other systems and processes it interacts with.

    Understand how new and innovative technologies fit in with existing ICT investments.

    Look at the whole market and explain all the solutions available and how they can be more effectively evaluated.

    Filter 'noise' and make it easier to find the additional information or news that supports both investment and implementation.

    Ensure all our content is available through the most appropriate channel.

    Founded in 1989, we have spent over two decades distributing research and analysis to IT user and vendor organisations throughout the world via online subscriptions, tailored research services, events and consultancy projects. We are committed to turning our knowledge into business value for you.

    About the author

    Philip Howard
    Research Director - Data

    Philip started in the computer industry way back in 1973 and has variously worked as a systems analyst, programmer and salesperson, as well as in marketing and product management, for a variety of companies including GEC Marconi, GPT, Philips Data Systems, Raytheon and NCR.

    After a quarter of a century of not being his own boss Philip set up what is now P3ST (Wordsmiths) Ltd in 1992 and his first client was Bloor Research (then ButlerBloor), with Philip working for the company as an associate analyst. His relationship with Bloor Research has continued since that time and he is now Research Director. His practice area encompasses anything to do with data and content and he has five further analysts working with him in this area. While maintaining an overview of the whole space Philip himself specialises in databases, data management, data integration, data quality, data federation, master data management, data governance and data warehousing. He also has an interest in event stream/complex event processing.

    In addition to the numerous reports Philip has written on behalf of Bloor Research, Philip also contributes regularly to www.IT-Director.com and www.IT-Analysis.com and was previously the editor of both 'Application Development News' and 'Operating System News' on behalf of Cambridge Market Intelligence (CMI). He has also contributed to various magazines and written a number of reports published by companies such as CMI and The Financial Times.

    Away from work, Philip's primary leisure activities are canal boats, skiing, playing Bridge (at which he is a Life Master) and walking the dog.


    Copyright & disclaimer

    This document is copyright 2011 Bloor Research. No part of this publication may be reproduced by any method whatsoever without the prior consent of Bloor Research.

    Due to the nature of this material, numerous hardware and software products have been mentioned by name. In the majority, if not all, of the cases, these product names are claimed as trademarks by the companies that manufacture the products. It is not Bloor Research's intent to claim these names or trademarks as our own. Likewise, company logos, graphics or screen shots have been reproduced with the consent of the owner and are subject to that owner's copyright.

    Whilst every care has been taken in the preparation of this document to ensure that the information is correct, the publishers cannot accept responsibility for any errors or omissions.


    2nd Floor, 145-157 St John Street
    LONDON, EC1V 4PY, United Kingdom

    Tel: +44 (0)207 043 9750
    Fax: +44 (0)207 043 9748
    Web: www.BloorResearch.com

    email: [email protected]
