data warehouse design
Post on 27-Jan-2017
149 Views
Preview:
TRANSCRIPT
Data Warehouse
Design
Ines.beltaief.54Inesbeltaief8@gmail.com
Outline1.Context2.Introduction3.Problematic4.Papers5.Comparaison6.Discussion7.Conclusion
Context In this work ,I’ll present the Data
wherehouse concept, Different methods of modeling and advantages of each proposed methods.
For this, I worked on 3 papers talking about The X-META (a methodology for data warehouse design with metadata management) & gave conceptual data models for dw design.
Introduction In computing, a data warehouse (DW or DWH), also known
as an enterprise data warehouse (EDW), is a system used for reporting and data analysis.
Date Warehousing is a powerful tool for supporting decision-making processes in modern corporations. However, developing a Data Warehouse (DW) is a complex and costly activity. It requires strategies, which should be specific to the characteristics and needs of the organization where it will be introduced.
ProblematicHow to design a Data wharehouse ?
What are the method studied ?
Which are the steps to design a DW?
Papers A DataWarehouse Conceptual Data Model :
Enrico Franconi and Anand Kamble ( Faculty of Computer Science, Free University of Bozen-Bolzano, Italy )
X-META: A Methodology for Data Warehouse Design with Metadata Management : Liane Carneiro - Angelo Brayner ( University of Fortaleza – UNIFOR Fortaleza - Ceará, Brazil )
A comprehensive Method for Data warehouse design : Sergio Lujan - Mora and Juan Trujillo ( Departement of Software and Computing Systems University of Alicante Spain )
Paper 1 : X-META
Abstract:
the contributions of this paper are twofold. First of
all, it proposes amethodology, which
addresses the development of DW projects. Second, it
presents a strategy to cope with metadata
management integrated to the DW
development process.
Related Works:
the existent methodologies are
normally addressed to generic environments.
Thus, they donot address DW development in
organizations that need a methodology adapted to their characteristics and
expectations
Metadata Creation & ManagementMetadata is “Data that provides information about other data". Two types of metadata exist: structural metadata & descriptive metadata. Structural metadata is data about the
containers of data. Descriptive metadata uses individual
instances of application data or the data content.
Metadata Creation & ManagementMetadata should exist throughout the development process and during all the useful life of the DW. Therefore, metadata, in a DW project, play a key role. It is a big challenge to many DW projects, because there exists much heterogeneity among tools and products for creating and managing metadata in a Data Warehounsing environment.
X-META: A METHODOLOGY FOR DW DESIGN
The key idea of the proposed methodology is to start with a DW pilot project in order to introduce experience in constructing a DW into the internal team and to prove the viability and the DW importance to an organization.
The overall methodology lifecycle is divided into 5 major phases, as depicted in figure 1.
The X-Meta methodology defines three distinct iteration types, each one with its own specific purpose:
(3) DM/DW Project – In this level, many pilot projects may have already been developed and many uncertainties have already been eliminated. Besides, the development process in this iteration type, which corresponds to DM/DW projects, uses the same methodology used in the pilot project iteration type.
(2) Pilot Project – it allows the incremental and evolutionary development of pilot projects, each of which with its own purpose, for instance: testing products, acquiring experience, developing metadata repositories, etc. Each pilot project represents an iteration, which begins in the Planning phase and uses one or more input from the DM/DW Construction Cycle phase, depending on the evolutionary level and on the project goal.
(1) First prototype – it has the main goal of allowing the insertion of the DW technology in the organization. This iteration, which is executed only once in the DW development process, uses only the Introduction phase in the development of the first pilot project in the organization;
X-META: A METHODOLOGY FOR DW DESIGN
A presented example of possible projects to be developed in an organization using the methodology presented in this paper. Each spiral represents a project and each project uses a specific iteration type.
X-META: A METHODOLOGY FOR DW DESIGN
Metadata modeling It has the goal of defining and controlling the activities related to
the metadata, supplying corporative directions for its creation, use and maintenance.
there is a list of activities to be performed in this module. Those activities have the goal of producing a metadata model that will be used during the construction of a metadata repository and during the metadata management phase:
Define the metadata management team and its responsibilities; Define and classify metadata types that will be stored;
Define user types a repository and the equivalent access levels; Determine the metadata sources in the organization
Define and construct the metadata model and metadata flow.
Metadata infrastructureThe metadata repository is an infrastructure component of the DW environment that is useful to all other environment components, working as a tool to help in the integration. The goal of this module is to define the general metadata infrastructure in the organization. The activities belonging to this module are: Construction of an architecture for the metadata (centralized,
decentralized, distributed); Evaluation of metadata tools in order to select the best one to
achieve the project goals. Physical implementation of the metadata repository; Defining security procedures.
Abstract : The goal of the work is to extend the standard ER conceptual
data model, as defined in the database textbooks, with constructs which allow the modelling of multi-dimensional aggregated entities together with their interrelationships with the other parts of the conceptual schema.
This work is also based on a similar preliminary work done by one author on the use of Description Logics as a mean to give precise semantics to a data warehouse conceptual data model and to study its computational properties.
Paper 2 : A DataWarehouse Conceptual Data Model
A Data Warehouse Conceptual Schema
Explication the entity Calls represents a basic cube whose dimensions are Date,
Destination, Source, which are restricted to the basic levels Day, Point, and again Point, respectively.
A first extension of the language can be seen with the simple aggregated
entities, non-dimensional aggregations—Weekday and Customertype, which represent dimensional levels built from the basic dimensional entities Day
and Point, respectively. A simple aggregations aggregate the collections of objects that are in the
extension of the aggregated entities. So, in our example, since the entities Mon, . . ., Sun form a partition of the entity Day, the Weekday entity denotes exactly seven objects, one for all the Mondays, one for all the Tuesdays,etc.
On the other hand, the aggregated entity Customertype denotes exactly two objects, consumer and business. In this way, by interleaving partitioning and simple aggregations, we are able to construct level hierarchies starting from some basic dimensional level.
A second extension of the language is the multidimensional aggregated entity, by the entity Calls-by-Weekday-and-Customertype. This entity denotes all the cells of a cube whose coordinates are the weekdays of the date of the calls, and the customer types of the originators of the calls.
A multi-dimensional aggregated entity is an entity itself in the ER diagram, and it can have attributes and can be part of further relationships or constraints.
Explication
Paper 3 : A Comprehensive Method For DW Design
Abstract:Various methods and appraches have been presented for designing different parts of DW,but, no general and standard method exists to date for dealing with the whole design of a DW .In this paper we fill this gap by presenting a method based on UML allows the user to tackle all DW design phases and steps from the the operational data source to the final implementation and including the definition of ETL Processes .
Related Works:
Different case of studies of data marts (DM) are presented.The
MD modeling is based in the use of the schema and it’s different
varaitions.Most recently , a method for the DW is proposed which basn a MD
model called IDEA and it proposes a set of steps to
adress the conceptual , logical and physical design of a DW.
Overview of the method
We have adopted the OO paradigm because it is semantically richer that others and it offers numerous advantages.
The design of a DW is a joint effort of DW developpers ( Technical user ) and final
user ( the user who are only interested in the business content ). Therefore, a powerfull method with the
correspond models is needed and we believe the OO paradigm is the best approach for the DW design .
Design for a data wharehouseOverview of the method
Design for a data wharehouseThe architecture of DW is depicted as various layers ( every layer is derived from the previous layer ) .
Following this consideration , we consider that the developpement of a DW can be structered into an integrated model with four different shemas :o Operational Data Shemas (ODS) : Defines the structure of the
operational and external data sources.o DW Conceptual Schema (DWCS) : Defines the conceptual shema
of DW .o DW Storage Shema (DWSS) : Defines the physical storage of the
DW depending on the target platform .o Business model (BM) : it defines the different ways or view of
accessing the DW from final user’ point of view .
Overview of the method
Two shcema mapping are also needed in ordre to obtain a global and integrated DW design approach that covers the necessary shcemas:
ETL Process : Defines the mapping between the ODS & DWCS. Exportation Process : Defines the mapping between the DWCS &
DWSS.
Design for a data wharehouseOverview of the method
This method accomplish each one of the shcemas and mapping in an integrated manner : We use a modeling notion based on the UML.
Data warehouse conceptual schema (DWCS) : The most important feature of the MD paradigm is dividing data into facts and dimensions, to provide data on a suitable level of granuliraty, hierarchies are defined on the dimensions.
Overview of the method
StarSchema1
StarSchema2
Dimension1
Dimension2
Fact 1
Dimension2
Dimension2Level 1
Dimension2Level 3
Dimension 2Level 2
The UML Profile includes the use of the UML package , in this way , when modeling complex and large DW systems, we are not restricted to use flat UML class diagrams ans , therefore, cluttered diagrams are avoided.
Design for a data wharehouse
LeveL 1: Model Definition . A package represent a star schema of a conceptual MD level. A depency between two package indicates that the star schemas share at least one dimension.
LeveL 2: Star Schemas Definition . A package represents a fact or a dimension of a star schema. A depency between two dimension package indicates that the package share at least one level of a dimension hierarchy.
LeveL 3: Dimension/fact definition. A package from the second level is exploded into a set of classes that represent the hierarchy levels in a dimension package ,or the whole star schema in the case of the fact package.
Overview of the methodDesign for a data wharehouse
How to apply the Method
We propose a set of steps to guide the design of a DW following our approach .The UML diagram is devided into two swimlanes depending ion who leads the activities : DW final user ( final users guide the work of DW designers and administrators ) and DW designers & administrators ( they don’t need the help of final users because all the needed informations has been previously gathered ).
Activities where the schemas are created
Activities where the mapping between
schemas are created
Different substeps can be achieved during requirements gathering ; specifically, the designer has to :
1) Analysis :
*Determine the desired data
format,level & elements users
wish
*Classify different
summaries.*Define acces
control & security rules.
*Help the final user to
understand wht they do not know
they need whereas others
are less common .
How to apply the Method
DWCS : two “extreme” strategies can be adopted in this activity : top-down (definition of DWCS based on the final user requirements) , bottom-down ( definition of DWCS based on the available data sources ).
We suggest to adopt a combined solution : the DW is designed from the final user’s requirements, but bearing in mind the available data source. ETL process : it’s defined as a mapping between the data source and
the DWCS. this activity and the previous one define a cycle, because during the definition of the ELT process some errors in the DWCS can be detected and , therefore, the DWCS may be modified. Define DMs (BM): different models in the BM are defined from the
final user’s initiale requirements and the DWCS, the BM can be implemented as real or virtual DMs.
2) Design :How to apply the Method
Define Storage (DWSS) : the target platform is selected ( relational, OO..) and the corresponding logical schema (DWSS) is defined ; the query performance can be improved by simplifying the data schema( so that it only contains the essential data ) or by the definition of summaries ( aggregates ) based on the final user’s requirement .
Define exportation process : the mapping between DWCS and DWSS are manually or automatically defined .
Implement Reports & queries : requested by the final users and implemented in the query tool used.
3) Implementation :
How to apply the Method
Validate DW : the solution obtained ( the DW built ) is checked against the existing problem (final user requirements ) . If any discrepancy exists, some corrective actions can be taken and the process can return to one of the previous activities.
4) Test :How to apply the Method
No Pratique application yet
Based on Uml modeling
A tutorial is availableonline.
Uses multiple extended EER or UML diagramsTogether.
The proposed methodology is being used to construct a DW for a department of the Fortaleza City Hall.
New concept
Compariason
1 2 3
This work present a global data warehouse design method that is based on UML for designing the different DW schemas and the corresponding transformation in an integrated manner .
The goal of the work is to extend the standard ER conceptual data model, as defined in the database textbooks, with constructs which allow the modelling of multi-dimensional aggregated entities together with their interrelationships with the other parts of the conceptual Schema.
X-Meta addresses the problem of developing a first DW project in corporations, which do not have staff with practical experience in such development. Furthermore, it integrates metadata creation and management to the DW developing process.
Discussion
1 2 3
Complicated Notions Brief explanation
with many missing parts
Complete and Clear work
Data warehousing is a business analyst's dream — all the informations about the organization's activities gathered in one place, open to a single set of analytical tools. But how do you make the dream a reality?
You have to plan your data warehouse system. You must understand what questions users will ask it because the purpose of a data warehouse system is to provide decision-makers the accurate, timely information they need to make the right choices.
Conclusion
Thanks For You Attention
top related