data warehouse design

34
Data Warehouse Design Ines.beltaief.54 [email protected]

Upload: ines-beltaief

Post on 27-Jan-2017

149 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Data warehouse design

Data Warehouse

Design

[email protected]

Page 2: Data warehouse design

Outline1.Context2.Introduction3.Problematic4.Papers5.Comparaison6.Discussion7.Conclusion

InesBeltaief
Page 3: Data warehouse design

Context In this work ,I’ll present the Data

wherehouse concept, Different methods of modeling and advantages of each proposed methods.

For this, I worked on 3 papers talking about The X-META (a methodology for data warehouse design with metadata management) & gave conceptual data models for dw design.

Page 4: Data warehouse design

Introduction In computing, a data warehouse (DW or DWH), also known

as an enterprise data warehouse (EDW), is a system used for reporting and data analysis.

Date Warehousing is a powerful tool for supporting decision-making processes in modern corporations. However, developing a Data Warehouse (DW) is a complex and costly activity. It requires strategies, which should be specific to the characteristics and needs of the organization where it will be introduced.

Page 5: Data warehouse design

ProblematicHow to design a Data wharehouse ?

What are the method studied ?

Which are the steps to design a DW?

Page 6: Data warehouse design

Papers A DataWarehouse Conceptual Data Model :

Enrico Franconi and Anand Kamble ( Faculty of Computer Science, Free University of Bozen-Bolzano, Italy )

X-META: A Methodology for Data Warehouse Design with Metadata Management : Liane Carneiro - Angelo Brayner ( University of Fortaleza – UNIFOR Fortaleza - Ceará, Brazil )

A comprehensive Method for Data warehouse design : Sergio Lujan - Mora and Juan Trujillo ( Departement of Software and Computing Systems University of Alicante Spain )

Page 7: Data warehouse design

Paper 1 : X-META

Abstract:

the contributions of this paper are twofold. First of

all, it proposes amethodology, which

addresses the development of DW projects. Second, it

presents a strategy to cope with metadata

management integrated to the DW

development process.

Related Works:

the existent methodologies are

normally addressed to generic environments.

Thus, they donot address DW development in

organizations that need a methodology adapted to their characteristics and

expectations

Page 8: Data warehouse design

Metadata Creation & ManagementMetadata is “Data that provides information about other data". Two types of metadata exist:  structural metadata & descriptive metadata. Structural metadata is data about the

containers of data. Descriptive metadata uses individual

instances of application data or the data content.

Page 9: Data warehouse design

Metadata Creation & ManagementMetadata should exist throughout the development process and during all the useful life of the DW. Therefore, metadata, in a DW project, play a key role. It is a big challenge to many DW projects, because there exists much heterogeneity among tools and products for creating and managing metadata in a Data Warehounsing environment.

Page 10: Data warehouse design

X-META: A METHODOLOGY FOR DW DESIGN

The key idea of the proposed methodology is to start with a DW pilot project in order to introduce experience in constructing a DW into the internal team and to prove the viability and the DW importance to an organization.

The overall methodology lifecycle is divided into 5 major phases, as depicted in figure 1.

Page 11: Data warehouse design

The X-Meta methodology defines three distinct iteration types, each one with its own specific purpose:

(3) DM/DW Project – In this level, many pilot projects may have already been developed and many uncertainties have already been eliminated. Besides, the development process in this iteration type, which corresponds to DM/DW projects, uses the same methodology used in the pilot project iteration type.

(2) Pilot Project – it allows the incremental and evolutionary development of pilot projects, each of which with its own purpose, for instance: testing products, acquiring experience, developing metadata repositories, etc. Each pilot project represents an iteration, which begins in the Planning phase and uses one or more input from the DM/DW Construction Cycle phase, depending on the evolutionary level and on the project goal.

(1) First prototype – it has the main goal of allowing the insertion of the DW technology in the organization. This iteration, which is executed only once in the DW development process, uses only the Introduction phase in the development of the first pilot project in the organization;

X-META: A METHODOLOGY FOR DW DESIGN

Page 12: Data warehouse design

A presented example of possible projects to be developed in an organization using the methodology presented in this paper. Each spiral represents a project and each project uses a specific iteration type.

X-META: A METHODOLOGY FOR DW DESIGN

Page 13: Data warehouse design

Metadata modeling It has the goal of defining and controlling the activities related to

the metadata, supplying corporative directions for its creation, use and maintenance.

there is a list of activities to be performed in this module. Those activities have the goal of producing a metadata model that will be used during the construction of a metadata repository and during the metadata management phase:

Define the metadata management team and its responsibilities; Define and classify metadata types that will be stored;

Define user types a repository and the equivalent access levels; Determine the metadata sources in the organization

Define and construct the metadata model and metadata flow.

Page 14: Data warehouse design

Metadata infrastructureThe metadata repository is an infrastructure component of the DW environment that is useful to all other environment components, working as a tool to help in the integration. The goal of this module is to define the general metadata infrastructure in the organization. The activities belonging to this module are: Construction of an architecture for the metadata (centralized,

decentralized, distributed); Evaluation of metadata tools in order to select the best one to

achieve the project goals. Physical implementation of the metadata repository; Defining security procedures.

Page 15: Data warehouse design

Abstract : The goal of the work is to extend the standard ER conceptual

data model, as defined in the database textbooks, with constructs which allow the modelling of multi-dimensional aggregated entities together with their interrelationships with the other parts of the conceptual schema.

This work is also based on a similar preliminary work done by one author on the use of Description Logics as a mean to give precise semantics to a data warehouse conceptual data model and to study its computational properties.

Paper 2 : A DataWarehouse Conceptual Data Model

InesBeltaief
Page 16: Data warehouse design

A Data Warehouse Conceptual Schema

Page 17: Data warehouse design

Explication the entity Calls represents a basic cube whose dimensions are Date,

Destination, Source, which are restricted to the basic levels Day, Point, and again Point, respectively.

A first extension of the language can be seen with the simple aggregated

entities, non-dimensional aggregations—Weekday and Customertype, which represent dimensional levels built from the basic dimensional entities Day

and Point, respectively. A simple aggregations aggregate the collections of objects that are in the

extension of the aggregated entities. So, in our example, since the entities Mon, . . ., Sun form a partition of the entity Day, the Weekday entity denotes exactly seven objects, one for all the Mondays, one for all the Tuesdays,etc.

Page 18: Data warehouse design

On the other hand, the aggregated entity Customertype denotes exactly two objects, consumer and business. In this way, by interleaving partitioning and simple aggregations, we are able to construct level hierarchies starting from some basic dimensional level.

A second extension of the language is the multidimensional aggregated entity, by the entity Calls-by-Weekday-and-Customertype. This entity denotes all the cells of a cube whose coordinates are the weekdays of the date of the calls, and the customer types of the originators of the calls.

A multi-dimensional aggregated entity is an entity itself in the ER diagram, and it can have attributes and can be part of further relationships or constraints.

Explication

Page 19: Data warehouse design

Paper 3 : A Comprehensive Method For DW Design

Abstract:Various methods and appraches have been presented for designing different parts of DW,but, no general and standard method exists to date for dealing with the whole design of a DW .In this paper we fill this gap by presenting a method based on UML allows the user to tackle all DW design phases and steps from the the operational data source to the final implementation and including the definition of ETL Processes .

Related Works:

Different case of studies of data marts (DM) are presented.The

MD modeling is based in the use of the schema and it’s different

varaitions.Most recently , a method for the DW is proposed which basn a MD

model called IDEA and it proposes a set of steps to

adress the conceptual , logical and physical design of a DW.

InesBeltaief
Molecular dynamics (MD) is a computer simulation method for studying the physical movements of atoms and molecules
Page 20: Data warehouse design

Overview of the method

We have adopted the OO paradigm because it is semantically richer that others and it offers numerous advantages.

The design of a DW is a joint effort of DW developpers ( Technical user ) and final

user ( the user who are only interested in the business content ). Therefore, a powerfull method with the

correspond models is needed and we believe the OO paradigm is the best approach for the DW design .

Page 21: Data warehouse design

Design for a data wharehouseOverview of the method

Page 22: Data warehouse design

Design for a data wharehouseThe architecture of DW is depicted as various layers ( every layer is derived from the previous layer ) .

Following this consideration , we consider that the developpement of a DW can be structered into an integrated model with four different shemas :o Operational Data Shemas (ODS) : Defines the structure of the

operational and external data sources.o DW Conceptual Schema (DWCS) : Defines the conceptual shema

of DW .o DW Storage Shema (DWSS) : Defines the physical storage of the

DW depending on the target platform .o Business model (BM) : it defines the different ways or view of

accessing the DW from final user’ point of view .

Overview of the method

Page 23: Data warehouse design

Two shcema mapping are also needed in ordre to obtain a global and integrated DW design approach that covers the necessary shcemas:

ETL Process : Defines the mapping between the ODS & DWCS. Exportation Process : Defines the mapping between the DWCS &

DWSS.

Design for a data wharehouseOverview of the method

This method accomplish each one of the shcemas and mapping in an integrated manner : We use a modeling notion based on the UML.

Page 24: Data warehouse design

Data warehouse conceptual schema (DWCS) : The most important feature of the MD paradigm is dividing data into facts and dimensions, to provide data on a suitable level of granuliraty, hierarchies are defined on the dimensions.

Overview of the method

StarSchema1

StarSchema2

Dimension1

Dimension2

Fact 1

Dimension2

Dimension2Level 1

Dimension2Level 3

Dimension 2Level 2

The UML Profile includes the use of the UML package , in this way , when modeling complex and large DW systems, we are not restricted to use flat UML class diagrams ans , therefore, cluttered diagrams are avoided.

Design for a data wharehouse

Page 25: Data warehouse design

LeveL 1: Model Definition . A package represent a star schema of a conceptual MD level. A depency between two package indicates that the star schemas share at least one dimension.

LeveL 2: Star Schemas Definition . A package represents a fact or a dimension of a star schema. A depency between two dimension package indicates that the package share at least one level of a dimension hierarchy.

LeveL 3: Dimension/fact definition. A package from the second level is exploded into a set of classes that represent the hierarchy levels in a dimension package ,or the whole star schema in the case of the fact package.

Overview of the methodDesign for a data wharehouse

Page 26: Data warehouse design

How to apply the Method

We propose a set of steps to guide the design of a DW following our approach .The UML diagram is devided into two swimlanes depending ion who leads the activities : DW final user ( final users guide the work of DW designers and administrators ) and DW designers & administrators ( they don’t need the help of final users because all the needed informations has been previously gathered ).

Activities where the schemas are created

Activities where the mapping between

schemas are created

Page 27: Data warehouse design

Different substeps can be achieved during requirements gathering ; specifically, the designer has to :

1) Analysis :

*Determine the desired data

format,level & elements users

wish

*Classify different

summaries.*Define acces

control & security rules.

*Help the final user to

understand wht they do not know

they need whereas others

are less common .

How to apply the Method

Page 28: Data warehouse design

DWCS : two “extreme” strategies can be adopted in this activity : top-down (definition of DWCS based on the final user requirements) , bottom-down ( definition of DWCS based on the available data sources ).

We suggest to adopt a combined solution : the DW is designed from the final user’s requirements, but bearing in mind the available data source. ETL process : it’s defined as a mapping between the data source and

the DWCS. this activity and the previous one define a cycle, because during the definition of the ELT process some errors in the DWCS can be detected and , therefore, the DWCS may be modified. Define DMs (BM): different models in the BM are defined from the

final user’s initiale requirements and the DWCS, the BM can be implemented as real or virtual DMs.

2) Design :How to apply the Method

Page 29: Data warehouse design

Define Storage (DWSS) : the target platform is selected ( relational, OO..) and the corresponding logical schema (DWSS) is defined ; the query performance can be improved by simplifying the data schema( so that it only contains the essential data ) or by the definition of summaries ( aggregates ) based on the final user’s requirement .

Define exportation process : the mapping between DWCS and DWSS are manually or automatically defined .

Implement Reports & queries : requested by the final users and implemented in the query tool used.

3) Implementation :

How to apply the Method

Page 30: Data warehouse design

Validate DW : the solution obtained ( the DW built ) is checked against the existing problem (final user requirements ) . If any discrepancy exists, some corrective actions can be taken and the process can return to one of the previous activities.

4) Test :How to apply the Method

Page 31: Data warehouse design

No Pratique application yet

Based on Uml modeling

A tutorial is availableonline.

Uses multiple extended EER or UML diagramsTogether.

The proposed methodology is being used to construct a DW for a department of the Fortaleza City Hall.

New concept

Compariason

1 2 3

Page 32: Data warehouse design

This work present a global data warehouse design method that is based on UML for designing the different DW schemas and the corresponding transformation in an integrated manner .

The goal of the work is to extend the standard ER conceptual data model, as defined in the database textbooks, with constructs which allow the modelling of multi-dimensional aggregated entities together with their interrelationships with the other parts of the conceptual Schema.

X-Meta addresses the problem of developing a first DW project in corporations, which do not have staff with practical experience in such development. Furthermore, it integrates metadata creation and management to the DW developing process.

Discussion

1 2 3

Complicated Notions Brief explanation

with many missing parts

Complete and Clear work

InesBeltaief
EER : Enhanced Entity-Relationship Model, a high-level conceptual data model extended from the Entity-Relationship Model
Page 33: Data warehouse design

Data warehousing is a business analyst's dream — all the informations about the organization's activities gathered in one place, open to a single set of analytical tools. But how do you make the dream a reality?

You have to plan your data warehouse system. You must understand what questions users will ask it because the purpose of a data warehouse system is to provide decision-makers the accurate, timely information they need to make the right choices.

Conclusion

Page 34: Data warehouse design

Thanks For You Attention