pallas architecture (uk)

Upload: david-walker

Post on 30-May-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/14/2019 Pallas Architecture (UK)

    1/24

    Business Intelligence at Ahold Netherlands

    Pallas

    the Albert Heijn Data Warehouse

    A Description of its Architecture

    Lidwine van As

    Wouter van Aerle

    October 2004

  • 8/14/2019 Pallas Architecture (UK)

    2/24

    Pallas - The Albert Heijn Data Warehouse Page: 2

    Table of content:

    1. Introduction ______________________________________________________________ 3

    1.1 Work Method and Organisation of Documents____________________________________ 3

    1.2 Explaining the Choices _______________________________________________________ 3

    1.3 Reference___________________________________________________________________ 4

    1.4 About the Authors ___________________________________________________________ 4

    2. Rationale_________________________________________________________________ 5

    2.1 Defining the Problem_________________________________________________________ 5

    2.2 Solution ____________________________________________________________________ 5

    3. A Birds-eye View of Pallas __________________________________________________ 6

    4. Functional Capability_______________________________________________________7

    4.1 Point of Departure ___________________________________________________________ 7

    4.2 Process Orientation __________________________________________________________ 8

    4.3 Quality of the Data___________________________________________________________ 9

    4.4 Performance Capability for Reporting _________________________________________ 10

    4.5 Other Performance Capabilities_______________________________________________ 10

    5. Technology ______________________________________________________________ 12

    5.1 Infrastructure______________________________________________________________ 12

    5.2 Architecture _______________________________________________________________ 13

    5.2.1 Overall Architecture _____________________________________________________________ 135.2.2 Transaction Repository ___________________________________________________________ 14

    5.2.3 Data Marts_____________________________________________________________________ 15

    5.3 Data Staging Areas (DSA)____________________________________________________ 165.3.1 Processing _____________________________________________________________________ 17

    5.4 Interfaces and Interaction with Other Systems___________________________________ 18

    6. Approach________________________________________________________________20

    6.1 Organisational Measures_____________________________________________________ 206.1.1 Current Services ________________________________________________________________ 20

    6.1.2 Metadata ______________________________________________________________________ 20

    6.1.3 New Services __________________________________________________________________ 21

    6.2 Methods, Techniques and Documentation_______________________________________ 21

    7. Costs and Benefits ________________________________________________________22

    7.1 General Situation ___________________________________________________________ 22

    7.2 Costs _____________________________________________________________________ 23

    7.3 Benefits ___________________________________________________________________ 23

    7.4 Future Prospects ___________________________________________________________ 24

  • 8/14/2019 Pallas Architecture (UK)

    3/24

    Pallas - The Albert Heijn Data Warehouse Page: 3

    1. INTRODUCTIONThis description of the architecture is the entry from Albert Heijn Competence Centre for

    Business Intelligence (CC-BI) to the Dutch Championship Architecture competition for2004. The IEEE recommended practice for architecture descriptions, IEEE 1471, has been

    used as guiding principle.

    1.1 Work Method and Organisation of DocumentsIn compliance with IEEE 1471, the primary stakeholders of the data warehouse and their

    concerns have been described. The concerns are then translated into and recapitulated

    according to 4 angles of approach:

    Stakeholders Concerns ApproachesUser What processes are supported? What

    functional capability is offered? What is

    the quality level of the data?

    Functional

    Capability

    Owner What does the data warehouse cost? What

    does it produce? What are its prospects for

    the future?

    Costs and Benefits

    Internal Accounting Service What is the quality level of the data? How

    are security and privacy guaranteed? How

    is trace ability guaranteed?

    Functional

    Capability

    IT Management How do we control and reduce costs? Approach

    DWH technical team

    (development team, DWHarchitect, management team)

    How is the data warehouse (DWH)

    organised? How is the demand forfunctional capability translated into a

    technical solution?

    Technology,

    Approach

    IT Management How does the data warehouse fit in with

    AHs IT environment?

    Technology

    The chapter on Rationale explains the reason for the systems existence; after that the

    chapter on A Birds-eye View of Pallas provides an overall description of the system.

    Then the chapters on Functional Capability, Technology, Approach and Costs and

    Benefits elaborate further from the various angles of approach.

    1.2 Explaining the ChoicesTaking into account that in 2000 the original architectural design for Pallas filled

    73 pages, the present document is inevitably briefly worded. Not all subjects can

    be treated with the same degree of depth. So a selection has to be made from

    among the supply of information.

    The most important basic principle that was applied here is that what is uniquewas preferred above what is trivial.For that reason, little attention was spent onthe privacy and security aspects of the data warehouse, since no measures havebeen taken in that territory that are essentially different from what one may expect

    from any system of this type.

  • 8/14/2019 Pallas Architecture (UK)

    4/24

    Pallas - The Albert Heijn Data Warehouse Page: 4

    1.3 Reference[1] "The Data Warehouse Life Cycle Toolkit", Ralph Kimball et al., ISBN 0-471-25547-

    5, Wiley, 1998

    [2] Kimball Design Tip #59: "The Surprising Value of Data Profiling", September 2004

    [3] "Corporate Information Factory, 2nd

    edition", W.H. Inmon et al, ISBN 0-471-39961-2,

    Wiley, 2000

    1.4 About the AuthorsLidwine van As ([email protected]) is a self-employed IT consultant working in the

    field of software development and business intelligence. She has been involved as a data

    warehouse architect in the Pallas project practically from its inception.

    Wouter van Aerle ([email protected]) started working in 2001 for AlbertHeijn in CC-BI. As an information analyst, he was involved in augmenting the data

    warehouse with bonus card information and in creating a separate data mart for bonus

    card analysis. In addition to this, he took part in many other new developments. He is

    currently occupied with increasing further the professionalism of the information analysis

    specialisation within the CC-BI.

    mailto:[email protected]:[email protected]
  • 8/14/2019 Pallas Architecture (UK)

    5/24

    Pallas - The Albert Heijn Data Warehouse Page: 5

    2. RATIONALE2.1 Defining the ProblemIt was ascertained in 1999 that the quality of the management information then provided at

    Albert Heijn was felt to be far from satisfactory. The situation was characterised by poor

    controllability, incompleteness, contradictory figures, delayed availability and the lack of

    a single, integrated method for approaching the sources of information.

    Apart from that, new developments in the business, such as farther-reachingdifferentiation, made improvements desirable. This led to an increasing need forinformation with a greater level of detail (branch and check-out level) than it was possibleto provide at that time. Because of their closed character, the management informationsystems then available could not provide this or were not equipped for the volume of datathat could be expected.

    That is why the Pallas project was launched in 1999 with the following objective:

    To create a solution extending over the whole of Albert Heijn for the problem relating to

    the provision of information within the organisation, to raise this to a higher level and to

    lay the foundation for further expansion.

    An additional objective derived from this, was adding value by protecting and supportingthe complete value chain and facilitating the comprehensive use of the availableinformation. As well as this, the new solution had to call a halt to the veritable tower ofBabel-like confusion that was the consequence of using several types of informationenvironments in juxtaposition. Establishing one central source of historical data would

    make it possible to guarantee a common reference framework for information andbusiness definitions or one copy of the truth.A final objective was to derive cost advantages by reusing the environment for otherAhold operating companies.

    2.2 SolutionTo achieve these objectives, an overall information environment was developed to

    support the decision-making processes in the AH organisation. It would bring together

    information from all the business processes and would be called Pallas, the enterprise

    data warehouse.

    The following starting points and supplementary conditions for developing the

    environment were formulated to increase the chances that the new environment would

    be successful: The data warehouse was to be business-driven: the information that the business

    needs was the primary drivers for developing and expanding the data warehouse. A pragmatic approach was to be used without losing sight of the theory.

    One way in which this was done was to seek a connection with alreadyexisting best practices in the field of data warehousing.

    Think big, act small: The final objective is an enterprise-wide data warehouse,but it is an enormous chore to get this set up all at once. Instead of doing that, thefirst step was to lay the foundation (the architecture), and then build in small

    increments. Proven technology was to be the preferred choice.

  • 8/14/2019 Pallas Architecture (UK)

    6/24

    Pallas - The Albert Heijn Data Warehouse Page: 6

    3. A BIRDS-EYE VIEW OF PALLASThe Pallas architecture is constructed around a central data warehouse, the Transaction

    Repository (TR). The TR is filled from the different source systems within Albert Heijn.The data marts from which the end users can request data are generated from the TR; they

    are customised to suit the support needed by specific business demands and processes.

    Pallas has a delay of a day when compared to the source systems. The source information

    is supplied every night and is loaded and processed in the TR and the data marts so that

    they are available the next day for reporting and analysis. Every week approximately 600

    million records are loaded into the data warehouse. The feeding systems will gradually

    switch over to supplying data in real time, which will make the data warehouses

    response duration increasingly shorter. In the meantime, reports are available that can

    display the turnover as of ten minutes ago.

    After the foundation for the architecture was laid in 2000, Pallas was developed andelaborated incrementally. Although the enlargements were triggered in the first place by,

    and were implemented for the benefit of, the solutions to specific business issues, the idea

    of integration always lay at the basis of the work: the added information was to be

    reintegrated into the information already available. In this way it would be possible to

    procure extra insight into the underlying business processes. The kinds of information

    now available extend over nearly all elements of the retail value chain, from the

    distribution centre (DC) deliveries to check-out counter transactions. The information is

    used to support nearly all of the business processes in the company. Thanks to the rapid

    availability of the information and the high degree of detail, the data warehouse can

    provide both management and operational reports.

    Since the start of the project, the Pallas users' group has grown relentlessly. Every week

    1800 individual users of all levels, from the board of directors to the shop floor and the DC

    employees, use the reports and the analysis environments.

  • 8/14/2019 Pallas Architecture (UK)

    7/24

    Pallas - The Albert Heijn Data Warehouse Page: 7

    4. FUNCTIONAL CAPABILITY4.1 Point of DepartureThe objective of a business intelligence environment is simple: providing the right

    information, at the right time to the right person, in the right way.Making management

    and other information available to end users is what determines the fundamental

    functional capacity of the BI environment. Seen from a functional perspective, this can be

    done in several ways: from statistical reports to extensive opportunities for searching the

    data. Seen from a technical perspective, there are innumerable alternatives for achieving

    all of this. The different ranges of applicability have been classified in a display sheet.

    Figure 1: Display Sheet depicting end-user functional capability

    As can be seen on this display sheet, all end-user functional capability has been put intooperation. The choice was made to achieve each type of functional capability (reporting,analysis, and the like) with standard front-end tools that could offer both out-of-the-boxreporting capability as well as modules for constructing ones own specific types ofreports (such as score cards). This approach determines a priori what is and what is notfunctionally possible. After all, the available functional components and modalities of thestandard tools determine the way in which data can be reported and made available. Aswell as this, the classification is a good means of communication when co-ordinatinginformation needs with users and the way in which these needs will be made operational.

  • 8/14/2019 Pallas Architecture (UK)

    8/24

    Pallas - The Albert Heijn Data Warehouse Page: 8

    4.2 Process OrientationEach type of business (merchandising, fulfilment, replenishment, etc) has its own unique

    information requirements. This has been anticipated in the architectural options made atthe TR and data mart levels. Relevant choices in this context include such items as the

    application of conformed dimensions and taking one data mart per business process as

    a starting point. (In some cases, applying this principle meant that exactly the same data

    had to be included in several different data marts. Take, for instance, the example of sales

    information that is required for the management of several different primary processes

    such as store operation, merchandising, and the like.) The use of tooling supports the

    creation of several different reporting environments that can all be handled via one portal.

    Thanks to this, it was possible to create a dedicated reporting environment for each

    supported process or focus for attention as component of the chosen tool. This

    environment could be fed its own underlying data.

    The following bus matrix [1] gives an impression of the primary processes that Pallasopens up and supports, and the data mart dimensions that are related to this.

    Time Article Branch Sales

    Formula

    Campaign Distribution

    Centre

    Customer Employee Supplier

    Brand mgmt /

    CRM

    Format mgmt

    Merchandising

    Replenishment

    Fulfilment store

    Fulfilment

    logistics

    Figure 2: Bus matrix: achieved sourcing and support with regard to primary processes.

    Formal and navigation characteristics have been standardised across the various reporting

    environments. This gives each report and each environment the same look and feel. Thisaffords a uniform and thus peaceful presentation for the user, and also creates a sense

    of recognition regarding what information does and does not originate with the data

    warehouse.

    Besides this, the management of data definitions ensures a uniformity of informationacross the various environments; for instance, the name for the monetary value of sales inone environment can be Turnover, while another environment calls it CustomerSales, because the underlying definitions differ. The data definitions are documented ina central location and are accessible for the end user, so that the latter can interpret theinformation offered in a correct manner. Information that has the same definition, willalso have the same name in each environment, be it at the back-end side, at the database

    level, at the front-end side or in reports and cubes.

  • 8/14/2019 Pallas Architecture (UK)

    9/24

    Pallas - The Albert Heijn Data Warehouse Page: 9

    4.3 Quality of the DataA primary driver for the added value and effectiveness of the data warehouse is the

    quality of the data stored in it1

    . Designing and developing high-quality data was afundamental principle from the very beginning. The following measures were applied to

    accomplish this:

    1. Drafting KDDs2 with respect to the source systems and source data: since a datawarehouse is fed from source systems, the quality of its content is determined to alarge degree by the quality of the source systems. The KDDs then encompass thefollowing:

    Original source: data is only transferred from the source that created the data, andnot from other systems that, for efficiency reasons, also have access to this data butnot to its source.

    The highest possible level of detail: information is opened up and stored in thehighest possible level of detail that is available. Thanks to this, the potential of thedata warehouse is not unnecessarily restricted by the data that has been incorporated,while in the future, perhaps, more detailed data could be required.

    Complete source: when a source table is opened up, the whole source table issourced and not only the data that is required at that moment.

    2. It is always presupposed that sources are contaminated. Consequently, a fulltechnical and functional source analysis is performed on every new source up to theattribute level (data profiling [2]). This provides insight in how a source is actually

    being used. Acknowledged issues are tackled in one of the following manners:

    The source owner is requested to remedy the issue mentioned. A control is made of the ETL process to prevent the issue from arising. Each

    prevention is logged. In this way, for instance, a report can be generated on thenumber of times that a specific issue has been resolved.

    In addition, there is a customised follow-up3on the issue3. Because the TR is the only placeholder for all data, duplications are immediately

    apparent. When the same type of information arises in various sources, it is collected

    in the TR into one and the same table if this is applicable. On this score, the TRproves its added value as data integrator. This means that the condition of the stockof articles in distribution centres and branches can be stored separately in theoperational systems; modalities are created in the TR to combine data so thatcomprehensive reports can be drafted on the total stock position of the article.

    1 Efficiency aspects are established in advance by the shape of the output used. It could be stated

    that the same decision or steering measure could be made with either the data sheet or a

    dashboard/ exception report, provided that the underlying data is the same and is of the same

    quality. However, the latter is not determined by the tools, but by the back-end of the data

    warehouse.2 Key Design Decisions: fundamental decisions with respect to the functional capability and thearchitecture. See the chapter on Approach3 Reject, Ignore, Correct, Suspend

  • 8/14/2019 Pallas Architecture (UK)

    10/24

    Pallas - The Albert Heijn Data Warehouse Page: 10

    4. In each data mart, reference data has the same modelling and the same content. Thisprinciple of conformed dimensions [1] imposes a uniform pattern on all information.Moreover, conformed dimensions offer the prospect of integrating all data relating tothe same subject (article, branch, customer, and the like). One challenge on this scoreis the way of tackling informal stratifications of things like articles or branches.There appears to be a variety of alternative groupings that are not supported by asource system. A strict policy is pursued with respect to this: the data is included inthe data warehouse only when a formal definition and a formal source are availablefor this type of stratification.

    5. Once the software has been constructed to open up new information, a verification isperformed on the data processing. This check examines whether everything that hasbeen supplied has successfully arrived in the data warehouse and the data mart.

    4.4 Performance Capability for ReportingIn conformity with the display sheet, the output in the form of reports on the data

    warehouse (Reporting) makes up the bulk of the functional capability that is supplied

    from the Pallas architecture. The potential for finding content for this type of functional

    capability varies from Excel-style spreadsheets with large amounts of detailed

    information up to and including pushed exception reports, balanced scorecards and

    management dashboards with incorporated information. All these options are supported

    by one and the same standard tool. To keep the number and variety of reports

    manageable, a taxonomy of the reports has been drafted; newly developed reports must

    be classifiable within this taxonomy, otherwise they will not be developed. However, it is

    still difficult to foresee specific functional requirements with these standard forms. This is

    due to procedure-based reasons more than to technological reasons: to a certain degree

    standard report forms require standardised and structured procedures but the proceduresare far from always being standardised. Nevertheless, one positive effect is that the use of

    the BI environment for the business has provided an occasion to address this subject.

    4.5 Other Performance CapabilitiesThe second level in the display sheet is Analysis. Within Pallas, this is understood as

    meaning posing a series of ad-hoc question to the database where each following question

    is determined by the answers to the preceding questions. This requires a first-rate

    performance from the environment; the functional capability in question is made

    operational within Pallas with the aid of (M)OLAP capability (multi-dimensional cubes).

    Target groups for this type of functional capability are the knowledge employees in thePlanning & Control, Market Research and similar support departments.As well as this, a limited amount of statistical analysis also takes place. A proof-of-conceptis also performed with data-mining technology.

  • 8/14/2019 Pallas Architecture (UK)

    11/24

    Pallas - The Albert Heijn Data Warehouse Page: 11

    Two forms of functional capability that are not directly related to the BI, but which Pallas

    does provide, deserve a brief glance:

    The delivery of information: Pallas is not only the platform for managementinformation within Albert Heijn, the chosen data warehouse architecture has alsomade it the central storage place for historical information. This characteristic,combined with the fact that the data is of a high quality, makes Pallas the idealsupplier of historical information. This is effectuated as interfaces leading back tothe operational systems. We can thus speak ofclosing the loop: Pallas supplies therequired information to operational applications that need historical information forthe implementation of their task. This means that such tasks as predicting volumesfor automatic deliveries to stores, predicting campaigns and designing shelf plans forstores can be supported.

    Operational reporting: because the selected standard tool for reporting can also beused on operational databases, it was decided to use this front-end tool to standardise

    all reporting (not only management information, but also operational reports).

    This makes it possible to dismantle the many varieties of other reporting tools that are

    often included in standard packages. In practice this means creating a supplementary

    reporting environment, this time, however, in the operational system instead of in the data

    warehouse. One example of the use of such environments is in the logistics systems. A

    high degree of transparency is guaranteed for the user: with the same portal, tools and

    layout, he/she can retrieve both management and operational reports.

  • 8/14/2019 Pallas Architecture (UK)

    12/24

    Pallas - The Albert Heijn Data Warehouse Page: 12

    5. TECHNOLOGY5.1 InfrastructureThe largest part of the Pallas data warehouse is run under AIX on an IBM p670 with 161.45-GHz processors. Beside this, one other performance-intensive data mart runs, alsounder AIX, on an IBM S85 with 8 750 MHz processors. All data is stored in an EMCdisk cabinet that can hold more than 11 TB.

    Oracle is used as RDBMS. An ETL tool is used to transform and load the data into thedata warehouse; this was preferable to hand encoded procedures due to management-related concerns, given the expected amount of processing jobs. We chose InformaticaPowercenter. The co-operation and scheduling of Powercenter workflow is establishedand managed in control-M (BMC); a part of the scheduling and the workflow

    management will gradually be shifted to the Powercenter Workflow Manager.

    The end users do not retrieve information from the data warehouse directly, but via end-user tools. Microstrategy is used for standard reporting. Most users view their reports in aweb browser via Microstrategy Web, or have these delivered as PDF files, Excelspreadsheets or SMS messages sent by Microstrategy Narrowcaster. A few web servershave been set up using Microsoft IIS to support Microstrategy Web. These web serverscan balance the load of user requests among one another.

    Hyperion Essbase is available for ad-hoc analyses, with as front-ends Hyperion Analyzerand Temtec Executive Viewer.

  • 8/14/2019 Pallas Architecture (UK)

    13/24

    Pallas - The Albert Heijn Data Warehouse Page: 13

    5.2 Architecture

    5.2.1

    Overall Architecture

    The Pallas architecture is constructed around a central data warehouse, the TransactionRepository (TR). This architecture is actually a hybrid intermediate form between theInmons [3] Corporate Information Factory on one side and on the other the datawarehouse from Kimball [1], which is based on dimensional data marts. The TR is filledwith source information, this is then routed to the front portal data staging area) fromwhere it is processed further. The data marts are built from the TR. Multi-dimensionalcubes for ad-hoc analysis can be generated from the data marts.

    Figure 3: Overall Architecture of Pallas

  • 8/14/2019 Pallas Architecture (UK)

    14/24

    Pallas - The Albert Heijn Data Warehouse Page: 14

    5.2.2 Transaction RepositoryThe Transaction Repository (TR) functions as the central one copy of the truth, the

    storage place for historical information at the most detailed possible level. The TR isoptimised for bulk loading and queries on detailed information and, in principle, end users

    do not retrieve information directly from it. The ultimate horizon for the TR is 5 years,

    with a database size of approximately 2.5 TB (raw data).

    With a view to controllability, it is not desirable to change the TR model frequently: thecentral source for historical information should not need to be completely redesigned,implemented and migrated every six months. Therefore when the TR model was beingdesigned, we had in mind the greatest possible independence for the source systems, onone side, and the business requirements of the moment, on the other. A generic manner ofmodelling based on the relational model, was used for reference data; reference data werestored according to the following pattern:

    Figure 4: PRODUCT Modelling subject area (simplified)

  • 8/14/2019 Pallas Architecture (UK)

    15/24

    Pallas - The Albert Heijn Data Warehouse Page: 15

    The entity PRODUCT constitutes the heart of the PRODUCT subject area: the domain in

    the TR where all reference data relating to sales articles are stored. This entity contains

    only unchangeable PRODUCT attributes; it is also the place where the mapping between

    the operational keys and surrogate keys takes place. Only the surrogate keys are used

    within the data warehouse. All changeable attributes of PRODUCT are stored inPRODUCT_HIST. This table contains several snapshots of PRODUCT, as it were,

    provided with an indication of the time; each snapshot is stored with a starting and ending

    date for the situation concerned. This also contains relations with reference data such as

    unit of measure and brand name. In this way, the history is stored even when the source

    does not do this itself.

    PRODUCT groupings are stored in PRODUCT_GROUP. A link to the table

    PRODUCT_GROUP_TYPE establishes the type of grouping that is stored. The table

    PRODUCT GROUP IN PGRP_HIST is used to link groupings to one another

    hierarchically. The table PRODUCT IN PRODUCT GR HIST situates a product in a

    hierarchy. The last two tables again also contain a starting and ending date, because

    hierarchies can change over time.

    Thanks to this way of modelling it is possible to add and expand hierarchies and

    hierarchy levels without consequences for the modelling.

    5.2.3 Data MartsBusiness Requirements

    While the TR has a permanent character, the data marts are oriented exclusively tofulfilling a demand for information from the business, without attention for such

    requirements as being future-safe, reusable, etc. The idea behind this is that the TR servesas an unchangeable historical source, while the way in which the end user regards theinformation can change, depending on the support that he/she needs to perform his/hertask. If the end users way of looking at the data changes, this can mean that a data martmust be completely remodelled and regenerated. The data marts thus have a much moretransitory character than does the TR. As was indicated in the chapter on FunctionalCapability, when modelling is done to suit the users needs, an attempt is made toconstruct the data marts in such a way that they can act as a whole when supporting agiven business process.

    Another difference between the TR and data marts is that the data marts usually containno detailed information, but contain primarily information that has been collected. The

    detailed information from the TR serves only as a basis for composing the views that theuser needs to support his/her work. In this way the details of check-out transactions thatare stored in the TR can serve as a basis for the composition of the turnover informationfor each branch in each quarter. The user is only interested in the collected information,thus the data mart need only contain the aggregation. (One exception is the data mart usedfor bonus card analysis, where the analysis takes place at the check-out transaction level.)The time horizon for each data mart is also determined by the desires of the user forwhom it has been developed. The data mart with the broadest historical coverage containsinformation going back (collected over) 3 years, but there is also a data mart with a 6-week history. Beside this the cubes have technical restrictions on the amount ofinformation that they can store.

  • 8/14/2019 Pallas Architecture (UK)

    16/24

    Pallas - The Albert Heijn Data Warehouse Page: 16

    Tool-determined Requirements; Dimensional Modelling

    In addition to the business drive, the requirements that are imposed by the end-user toolare also relevant to the modelling of the data marts. Most end-user tools require adimensional modelling (star diagrams), thus we follow the Kimball [1] dimensionalmodelling method for the data marts. Beside this Microstrategy places several specificdemands on the dimensional model. For instance, the Microstrategys performance issignificantly improved when the hierarchies of the dimensions in the model are explicitlymodelled (snowflaking). On this point, there is a divergence from the Kimballmodelling, which prescribes far-reaching de-normalisation. The performance of thesnowflake is improved still further by allowing the keys from the higher levels to beinherited by lower levels in the hierarchy so that the number of joins needed to reach ahierarchical level is kept to a minimum. For instance, the entity ARTICLE contains thekeys of all the higher-situated levels. (For that matter, it is also true here that shouldMicrostrategy be replaced by another end-user tool, this would probably impose suchdifferent requirements on the modelling of the data marts that these would have to beremodelled and regenerated. The separation between the TR and the data marts and the

    independent modelling of the TR means that this can soon be achieved without posingany problem.) In the data mart, the users always see the facts from todays perspective,whether these are yesterdays facts or facts from two years ago. This means that the factsin the data mart are always collated according to the most recently known version of thedimensions (in Kimball-terms: a type 1 approach of slowly changing dimensions isfollowed). Although the TR can be used to generate other perspectives, there has thusfar been no business demand for this. Thus, the data marts have not yet made any use ofthe history dimension available in the TR.

    The generation of cubes takes place on the basis of the data marts, and not the TR, becausethe construction of the dimensions in the cubes is the same as those in the data marts. Oncethey have been generated in the data marts, they are immediately ready in the correct

    format for the cubes.

    All the dimensions in the data marts are designed as conformed dimensions, so thatthe information in the different star diagrams can be compared and related to oneanother (drilling across).

    5.3 Data Staging Areas (DSA)Source files are received in the first DSA, and the source data is transformed into themodel of the TR format. This is where quality controls, cleaning and enrichment of thedata take place. The initial DSA consists both of files and of tables. In the second DSA, it

    is decided which factual information must be loaded onward (see the section on Deltas),and the conformed dimensions are constructed on the basis of the last state of thereference data in the TR and are transferred as a whole to the data marts.

    The DSA tables are a physical part of the TR database.

  • 8/14/2019 Pallas Architecture (UK)

    17/24

    Pallas - The Albert Heijn Data Warehouse Page: 17

    5.3.1 ProcessingThe information from the source systems are usually batch processed at night. That keepsthe processor capacity during the day available for producing reports on the data marts

    and making analyses in the cubes; moreover, some information can only be supplied afterthe close of the day. The processing is subdivided into several main flows, each of whichloads a particular type of data: check-out transactions, logistic movements, condition ofthe stock, etc. The main flows are, in their turn, composed of approximately 1500Powercenter jobs. Managing the underlying dependencies and starting these flows is donein Control-M.

    Information can be used in several data marts; there is thus an m:n correspondencebetween data flows and data marts. In the first place, source information is loaded onwardin one flow from the source via the TR to the data marts, where they were needed.Sometimes for reasons of performance, the information is loaded from other data martsinstead of from the TR, because the tables needed were already present in the desired

    basic form in another data mart. This means that the various flows become increasinglyinterwoven, which had an unfavourable effect on the underlying dependencies and theexpandability of the environment.

    That is why a new approach was developed in which it is possible to work with semi- finished products. This means that the information flows are cut in two: first theinformation from all flows is processed in the TR and the underlying DSA, where the so-called semi-finished products are produced. These semi-finished products then serve asthe basis for constructing the separate data marts. Shared basic tables now arrive in theDSA.

    Figure 5:ETL processing by means of semi-finished products

    A procedure for undoing the interweaving and partially reorganising the standing

    environment according to the semi-finished product principle has already started and will

    run until the end of 2004.

  • 8/14/2019 Pallas Architecture (UK)

    18/24

    Pallas - The Albert Heijn Data Warehouse Page: 18

    Deltas

    Some sources can deliver changes (corrections) to factual information that has already

    been loaded. A smart delta mechanism is used to process this correction: it sends a

    signal when movements in the factual information have taken place. It ascertains whichinformation has changed since the last processing run or which has been newly supplied.

    This is used to create a To Do list that indicates which facts must be updated. The ETL

    processing uses the To Do list to determine which information must then be passed on for

    loading into the data marts. Altered information is first removed from the tables and then

    entered anew. The fact tables are thus constructed incrementally instead of always being

    completely recreated anew each time.

    This not only reduces the total amount of processing in comparison with dumb bulk

    loading, it also maintains a record of the administration and the logging of information that

    has changed.

    Re-incorporation

    The use of the todays perception raises an issue in the case of incorporated information.After all, this is incorporated on the basis of the perception at the time of theincorporation; factual information that has not been changed is not reloaded, so changesin the corresponding reference information (adjustment of the perception) is notimplemented in the incorporations. The picture of the reality that the incorporationsprovide becomes gradually less and less accurate. This applies to incorporations that areincorporated by means of dimensions in which entities can be assigned a differentgrouping, for instance in the article dimension, when articles are shifted from oneassortment group to another.That is why these incorporations must regularly be reconstructed anew. Given the

    magnitude of the available history in some data marts, it is not practicable to include re-incorporation as element of the nightly processing: the re-incorporation would simplyrequire too much time. When the time arrives for re-incorporation, a copy of the basictables is made and the re-integration process is started to run in the background. When theintegration has been completed, the old and new integration tables are swapped.

    5.4 Interfaces and Interaction with Other SystemsBy definition, a data warehouse that has been created to integrate information from othersystems has links to these other systems.

    In integrating information from different systems, Pallas has had an important advantagefrom the very start with respect to many of its fellow data warehouses: within AlbertHeijn the source systems were already in an highly standardised and uniform statebecause a corporate data model had been in use. That meant that for each type of sourcedata there was, in principle, only one source system in use; delivery of similar types ofdata from source systems that differed in terms of modelling and had dissimilar datadefinitions was thus not an issue.

    In 2000, it was decided to remodel the AH application landscape. To anticipate better theflexibility and differentiation demanded by business requirements, it was decided toswitch over to a service-oriented approach. Applications were henceforth set up asindependent services each of which had its own data store and could communicate withother applications non-synchronically in an event-driven manner by means of messages

    sent over a message broker.

  • 8/14/2019 Pallas Architecture (UK)

    19/24

    Pallas - The Albert Heijn Data Warehouse Page: 19

    It is clear that this also had repercussions on the delivery of source data to Pallas:although the majority of the data would still be delivered as an interface file in a batchprocess, a growing number of applications delivered their source data in near-real-time inthe form of messages, to the extent that this was practicable and meaningful. Althoughthis made new demands on the data warehouses reception and loading mechanism, it didoffer new potential. In the meantime, Pallas has linked up to the message flow for check-out transactions: each check-out transaction in the store is forwarded in near-real-time toheadquarters where it is processed in various relevant systems, including the datawarehouse. The data warehouse collects the real-time information in an ODS (operationaldata store) and sends this onward to the TR in small batches for loading every tenminutes. This is called trickle feed: the information trickles, as it were, into the datawarehouse. This not only gave Pallas more room in its nightly batch window after all,the more information that trickles in during the day, the less there is that need to beprocessed at night it also meant that users could be provided with up-to-dateinformation in near-real-time The area of focus in shifting all or part of the processing today-time hours is the balance between the time gained by loading during the day and theusers experience of reporting performance. After all, if the loading is done during the

    day, this can be at the expense of the processor capacity that is available for runningreports.

  • 8/14/2019 Pallas Architecture (UK)

    20/24

    Pallas - The Albert Heijn Data Warehouse Page: 20

    6. APPROACH6.1 Organisational MeasuresWithin Albert Heijn, the development and maintenance of Pallas is entrusted to theCompetence Centre Business Intelligence (CC-BI). This is a component unit of AH IT,Albert Heijns IT department. In addition to this, the CC-BI also carries out BI activitiesfor various other of Aholds operating companies on the basis of existing infrastructureand working methods. The CC-BI is divided into two sub-departments: Operations &Improvements (O&I) and Projects. The activities performed included all the activities thatoccur from DBA activities to end-user support; only the management of the machinery issituated elsewhere. Depending on the number of current projects, there are between 30and 50 people working in the department. The CC-BI supplies services to its users. Aservice can be seen as a logical whole of information on offer, including the complete

    flows leading up to it, from source to report.

    6.1.1 Current ServicesThe delivery of existing services is provided by O&I. O&I is primarily responsible for thedaily operation and the maintenance of the standing architecture and BI solutions; Thisalso includes implementing cost-saving improvements to the architecture andinfrastructure. A Service Level Agreement is entered into with each users' group for theservice that it receives; this is to help provide an orientation for its management. Thisestablishes agreements on such items as the expected time at which the requested reportswill be available each morning and the maximum down time of a service. In addition,

    O&I supplies support and training for the end users of the data warehouse and providesQuick Services: upon request, information that the users cannot access themselves butwhich is stored in the data warehouse can be supplied via reports or cubes.

    Finally, standards and directives for developing new solutions are drafted and monitoredwithin O&I and the Pallas Delivery Framework is also managed there. The developmentprocesses and deliverables for new projects are defined within this framework.

    6.1.2 MetadataMetadata (information about the information) is indispensable to the proper management

    of the environment. Two types of metadata are particularly interesting in this context. The

    first is dynamic metadata, metadata about the process: how many reports are run, howheavily taxed the machinery is during the day, what service level has been attained, etc.

    The data warehouse, as standardised environment for reporting, reports information on its

    own operation. For this purpose, the systems own tooling registers facts and dimensions

    in the TR. This is then processed into a separate data mart from which reporting is also

    done using standard tools. A second type of metadata for supporting the operations

    consists of statistical metadata; information on the structure of the data warehouse.

    Various services can have certain components (machines, databases, ETL processes, etc.)

    in common, thus the failure or modification of one such component can have an impact

    on several services. Setting out clearly the mutual dependencies between services and

    evaluating the impact of changes and service interruptions on the various services is still

    primarily manual work. The endeavour is to automate this further with the help of an

    integrated metadata solution.

  • 8/14/2019 Pallas Architecture (UK)

    21/24

    Pallas - The Albert Heijn Data Warehouse Page: 21

    6.1.3 New ServicesThe Projects sub-department is organised into project teams that develop new services onthe basis of the existing environment and architecture. These project teams do not work in

    isolation: O&I employees are involved in projects as reviewers to guarantee the tie-in ofnewly developed solutions with the existing environment. Ultimately O&I determineswhether and when a new solution is put into production.

    The central framework for each project team is the Pallas Delivery Framework. Becauseeach project or release usually has the same basic elements data logistics and front-endfunctional capability a high degree of standardisation in the activities is attainable.Although a lot of room must still be reserved for specific solutions, everyone operates onthe basis of same reference framework. This is also necessary because everyone is in factworking on the same environment.

    6.2

    Methods, Techniques and Documentation

    Fundamental decisions with respect to the functional capability and the architecture ofthe data warehouse are recorded as Key Design Decisions (KDDs).Originally createdas a medium for formalising decision-making, they form a good record and referencefor the principles that lie at the foundation of Pallas.

    Besides this, the directives and best practices are formulated and established for dailydevelopment practices. After experimenting with various types of recording, design

    patterns have recently been introduced for this purpose. The design patters have beendrafted for processing facts with an unknown reference, for the way of logging dataquality problems and for the processing of messages delivered in real time.

    Analysis: information analysis for BI applications appears to be a difficult matter.Ultimately, it is not very difficult to design the desired output in the shape of reportsonce it has been established for which management or decision-making processes theinformation is needed. However, analysing, modelling and describing all this is stillundeveloped territory. Ideas and experiences within the CC-BI are being developedinto best practices, including varieties of prototyping, solution scenarios and menucards. A menu card is a representation method in which an operational or otherprocess is analysed; which management or other information is needed for eachprocess step is determined as is the source that can provide the information.

    In any case, experience has shown that the analysis methods used for normal systemdevelopment are inadequate. For instance, it has been established that Use Case

    Engineering (RUP/UML) offers hardly any grip, since each use case would amount toa Print Report. Experiences with function point analysis have also shown that thecharacter of information analysis and functional design for BI applications isfundamentally different than for transactional systems.

  • 8/14/2019 Pallas Architecture (UK)

    22/24

    Pallas - The Albert Heijn Data Warehouse Page: 22

    7. COSTS AND BENEFITS7.1 General SituationWhile drafting a general cost/benefit analysis for IT is difficult, the job is particularlythorny for BI applications. After all, how do you determine the benefits of a new,improved BI environment? Ultimately it comes down to quantifying the effect of betterdecision making and steering, which seems to be an impossible exercise. Added to this isthat investments must often be assessed in comparison to one another. Think of thecomparative assessment that the management of a company must make when deciding toinvest either in logistics or in BI: it is probably easier to make credible for the first thatthere are tangible benefits than it is to do so for the second.

    As was already indicated in the rationale, several considerations played a role in Albert

    Heijns decision to invest in BI:

    Controllability: There was a high degree of fragmentation in the old solutions: amultiplicity of technical solutions, each of which was managed by a different ITdepartment. From the perspective of management, it was desirable to consolidate allthis and make it uniform. The cost/benefit consideration for this consideration couldbe expressed in terms of lower management costs.

    Necessity: each company has the need of the functional capability to provide itsmanagement and steering information. As the complexity of the organisationincreases, the demand for management and steering information also increases. It wasevident that future developments within Albert Heijn (including the differentiation

    strategy) would pose demands that could no longer be met by the then currentsolutions: so something new was necessary. Within the framework of theseconsiderations it was important to set an acceptable investment level.

    Ideology: it was believed also at the level of the board of directors that creatingone integrated information environment would provide many advantages for thecompany in terms of data quality, integration and combination of information,availability of detailed data, the one copy of the truth principle and the like. Withoutbeing able to quantify this in hard figures, there was a strong conviction that thissolution could lay the foundation for future benefits. This last consideration inparticular contributed to the decision to make the actual in depth investment.It is understandable that the combination of considerations was decisive in the ultimatechoice to make the investment.

  • 8/14/2019 Pallas Architecture (UK)

    23/24

    Pallas - The Albert Heijn Data Warehouse Page: 23

    7.2 CostsIn the sphere of action the rule of thumb was used that more than 70% of the development

    costs would be made in the back-end development, so specifically in the development ofthe ETL processing (excluding the hardware and licence costs). This rule of thumb alsoapplied to Pallas; for this reason much steering was placed on this part of the systemdevelopment. One of the measures that were taken to control costs was ,in the case ofopening up sources, to read in the entire source table and not only the attributes for whichthere were specific requirements. The additional costs of adding this later proved to besubstantially higher than the extra costs of including them in the initial development. Inaddition, considerable attention was devoted to data quality analysis, ETL design andintegration testing. Inevitably, there were costs attached to learning from experience andincrements that were too large were initially chosen in some sub-areas. In general,however, practice showed that the foundation that had been laid was solid and the futuresecure.

    Just as with any other information system, Pallas is continually expanding and there is apermanent demand for changes. To prevent uncontrolled proliferation, a steering groupwas created. In addition to keeping the total cost of Pallas under control, it also had thetask of keeping watch over the quality and coherence of the content of the BIenvironment as a whole.

    7.3 BenefitsOn the benefit side, the advantages were mainly found in the reusability of elementsof the data warehouse architecture. Relevant elements in this context are specifically:

    knowledge standard tools approach and methodology.

    Several architectural variants were developed as spin-offs from Pallas by combining the

    above mentioned components. These could be used to service different types of desires in

    the area of management information. By different types we mean smaller-scale, with a

    smaller scope and lower demands for integration and business processes Within the CC-

    BI, these were referred to as the Ferrari and the Volkswagen: the competence centre has

    the architecture, the knowledge, the tools and the approach in house to build a Ferrari, but

    there is no reason why these could not also be used to build a Volkswagen.

    These alternative variants have thus far been used for other of Aholds operatingcompanies, including Gall&Gall, Ahold Vastgoed, albert.nl and the holding companyitself. For instance, the data warehouse solution for Gall&Gall was created at a relativelyvery low cost. The management costs are proportionately low, certainly when they arecompared with a scenario in which Gall&Gall would have kept control over thedevelopment and management. Strictly speaking, these benefits are not for Albert Heijn,but for Ahold as concern. This certainly also played a role in the considerations whendeciding to make the initial investments.

  • 8/14/2019 Pallas Architecture (UK)

    24/24

    For Albert Heijn, the integration and combination of information has opened up newopportunities. For instance, in the past there was little, if any, insight in the financialeffects of write-offs in stores: the information that was relevant for determining this wasspread over various systems and could not be combined due to their differing datadefinitions. Only after the information was combined and given one data definition inPallas, did it become possible for the first time to take structural action to stem the flowof write-offs.

    Another important advantage from which Albert Heijn benefited directly was the leadtime for delivering information on an ad-hoc basis. Many examples could be given inwhich the integrated collection of information and the speed with which this could beaccessed have proven their cogency:

    Setting out clearly the hoarding behaviour after major calamities (11 September,Iraq war)

    Supporting the initial price offensive in the fall of 2003 Interim evaluation of specific campaigns

    In addition to this, it has become evident that preserving the history of both reference andfactual information offers considerable added value. This not only produces better insightinto the historical behaviour of campaigns, the condition of the stock and the turnover, forinstance, but the historical behaviour of reference data, such as the development ofpurchasing prices over the years, can also be analysed.

    7.4 Future ProspectsNow that a large part of the information available in source systems is accessible from thedata warehouse, attention on the supply side will shift more to filtering out the golden

    nuggets from the enormous amount of information that is available: less is more. Userswill, for instance, be more frequently informed via automatic alerts and proactiveexception reports based on business rules. It is as if intelligence were being added to theinformation.

    On the demand side, initiatives have been launched to give third parties, such assuppliers, access to elements of the data warehouse. In house, the integration aspect willcome to play a greater role: while in past years the primary objective in opening up mostsubject areas was to support separate business processes, in the future the demand forcomprehensive information will increase over the entire chain. This movement is visible,for instance, on the replenishment side, where after the demand from separate businessrequirements for supporting DC replenishment and supplying stores, the demand now

    arises for information on stock movements throughout the whole chain. Thanks to thechosen approach and architecture, this type of information is in most cases alreadypresent in an integrated form in the enterprise data warehouse. On this point, too, thechoices made with regard to the approach and architecture will continue to yield anincreasing benefit.