informatica easy learning online training
TRANSCRIPT
Data Warehousing Concepts
• What is Data Warehousing?
• Dimensional Data Model
• Star Schema
• Snowflake Schema
• Slowly Changing Dimension
• Conceptual Data Model
• Logical Data Model
• Physical Data Model
• Conceptual, Logical, and Physical Data Model
• Data Integrity
• What is OLAP
• MOLAP, ROLAP, and HOLAP
What is Data Warehousing?
Different people have different definitions for a data warehouse. The most popular definition came from Bill Inmon, who provided the following:
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.
It is also a process of transforming data into information and making it available to users in a timely enough manner to make a difference.
To summarize ...
• OLTP Systems are used to “run” a business
• The Data Warehouse helps to “optimize” the business
Corporate Data
It includes:
• human resource data
• financial data
• facilities data
• sales data
• expenses on marketing data
• production planning cost
• manufacturing cost
• service delivery cost
• inventory management
• shipping and payment data
What is enterprise-wide corporate data?
How is Business Intelligence applied in retail banking or the retail industry?
KPIs (Key Performance Indicators)
A KPI can be used as a performance measurement tool.
KPIs in retail banking:
• Total cash deposits held in a month
• Average annual deposit held
• Average number of deposits per retail bank growth
• Average withdrawals made by each depositor
• Ratio of active to dormant depositors
• Average number of default borrowers in a year
• Average number of credit cards issued by the retail bank
• Rate of borrowing risk
• Rate of default risk
• Average number of customers served in a day
• Average number of closed bank accounts
KPIs in the retail industry:
• Sales compared to Budget & Target
• Sales compared to last year (or any other period)
• Wage cost recovery
• Average sale per customer/transaction
• Units per customer/transaction
• Sales per hour
• Sales & Gross Margin
KPIs (Key Performance Indicators)
Examples of common departmental KPIs:
• Sales Growth: Analyze the pace at which your organization's sales revenue is growing and use that information in strategic decision-making.
• Marketing: Analyze the pace at which your organization's sales revenue is growing and use that information in strategic decision-making.
• Financial: Measures your organization's financial health by analyzing readily available resources that could be used to meet any short-term obligations.
Data Warehousing
Data Warehousing Architecture
Data Warehousing Environment
Data Quality
Common data quality problems:
• Duplicate data
• Inconsistent values
• Missing data
• Unexpected use of fields
• Impossible or wrong values
Validations for Data Cleansing
• Data-type constraints
• Range constraints
• Mandatory constraints
• Unique constraints
• Set-membership constraints
• Foreign-key constraints
• Regular expression patterns
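The validation types listed for data cleansing can be illustrated with a minimal Python sketch. The record fields (customer_id, name, age, state, store_id, zip), the allowed values, and the limits are all hypothetical and chosen only for illustration; a real warehouse would usually enforce such rules in the ETL tool or the database.

```python
import re

# Hypothetical rules; field names, allowed values, and limits are assumptions.
VALID_STATES = {"CA", "NY", "TX"}        # set-membership constraint
KNOWN_STORE_IDS = {1, 2, 3}              # stands in for a foreign-key lookup
ZIP_PATTERN = re.compile(r"\d{5}")       # regular expression pattern


def validate(record, seen_ids):
    """Return a list of constraint violations found in one record."""
    errors = []
    # Mandatory constraint: name must be present and non-empty.
    if not record.get("name"):
        errors.append("mandatory: name is missing")
    # Data-type constraint: age must be an integer.
    age = record.get("age")
    if not isinstance(age, int):
        errors.append("data-type: age is not an integer")
    # Range constraint: checked only when the type is valid.
    elif not 0 <= age <= 120:
        errors.append("range: age is out of bounds")
    # Unique constraint: customer_id must not repeat across records.
    customer_id = record.get("customer_id")
    if customer_id in seen_ids:
        errors.append("unique: duplicate customer_id")
    seen_ids.add(customer_id)
    # Set-membership constraint: state must be one of the allowed codes.
    if record.get("state") not in VALID_STATES:
        errors.append("set-membership: unknown state")
    # Foreign-key constraint: store_id must exist in the referenced set.
    if record.get("store_id") not in KNOWN_STORE_IDS:
        errors.append("foreign-key: unknown store_id")
    # Regular expression pattern: zip must be exactly five digits.
    if not ZIP_PATTERN.fullmatch(str(record.get("zip", ""))):
        errors.append("pattern: zip is not five digits")
    return errors
```

Calling validate on each incoming record with a shared seen_ids set returns an empty list for clean records and one message per violated constraint otherwise.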
Views to build a warehouse
• The top-down view
• The data source view
• The data warehouse view
• The business query view
Which approach is better for designing a data warehouse?
Top Down Approach
Bottom Up Approach
Data Warehousing Design
• Requirement Gathering
• Physical Environment Setup
• Data Modeling
• ETL
• OLAP Cube Design
• Front End Development
• Report Development
• Performance Tuning
• Query Optimization
• Quality Assurance
• Rolling out to Production
• Production Maintenance
• Incremental Enhancements
Why Data Warehousing?
Need to see the daily, weekly, monthly, and quarterly profit of each store.
Need to compare sales and profit across various time periods.
Need to compare sales across various time bands of the day.
Need to know which product is in greater demand at which location.
Need to study the trend of sales by time period of the day over the week, month, and year.
Need to know on which day sales are highest.
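Questions like these come down to aggregating the same facts along different dimensions. The sketch below is a minimal plain-Python illustration; the sales rows, store names, and profit figures are made up, and a real warehouse would answer these questions with queries against fact and dimension tables.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (store, sale date, profit). Values are illustrative only.
sales = [
    ("Store 1", date(2012, 1, 1), 120.0),
    ("Store 1", date(2012, 1, 2), 80.0),
    ("Store 2", date(2012, 1, 1), 200.0),
    ("Store 1", date(2012, 2, 1), 50.0),
]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def profit_by_store_and_month(rows):
    """Total profit per (store, year, month)."""
    totals = defaultdict(float)
    for store, day, profit in rows:
        totals[(store, day.year, day.month)] += profit
    return dict(totals)


def best_weekday(rows):
    """Name of the weekday with the highest total profit."""
    totals = defaultdict(float)
    for _, day, profit in rows:
        totals[WEEKDAYS[day.weekday()]] += profit
    return max(totals, key=totals.get)


monthly = profit_by_store_and_month(sales)
top_day = best_weekday(sales)
```

The same rows feed both summaries; only the grouping key changes, which is the idea behind analyzing one set of measures by several dimensions.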
Phases of Data Warehousing Project
1. Identify and collect requirements
Need to see the daily, weekly, monthly, and quarterly profit of each store.
Need to compare sales and profit across various time periods.
Need to compare sales across various time bands of the day.
Need to know which product is in greater demand at which location.
Need to study the trend of sales by time period of the day over the week, month, and year.
Need to know on which day sales are highest.
Who collects the requirements?
Requirements are collected by business analysts and leads.
2. Design the dimensional model
Pharmacy_Claims_Fact (fact table): Drug_Id (FK), Org_Id (FK), Practitioner_Id (FK), Product_Id (FK), Time_ID (FK), Claim_status_Id (FK), Provider_Id (FK), Subscriber_id (FK), Demographic_key (FK), InsuranceType_Id (FK), Incurred_Date, Claim_Date, Claim_Settled_Date, Days_Supply, Dispensing_Fee, Incentive_Savings_Amount, Incentive_Fee_Paid_Amount, Amount_Claimed, Amount_Paid, Amount_Pending, Amount_Adjusted, CoPayment_Amount, CoInsurance_Amount, Deductible, Refill_Indicator, Claim_Production_Key, Claim_Production_Txn_No, Status_Change_Date, Last_Record_Flag
Practitioner: Practitioner_Id, Practitioner_Name, Practitioner_Type, practioner_type_desc, Qualification, Specialisation, ssn, Medical_Assoc_Enroll_No
Organisation: Org_Id, Org_prod_id, Org_Name, Address, City, County, State, Zip, Industry_Classification
Subscriber: Subscriber_id, Subscriber_prod_key, Member_prod_key, Member_Name, Date_of_Birth, Subscriber_type, Address, City, County, State, Zip, Hobby1, Hobby2, Smoker_YN, Alcoholic_YN, Pre_Existing_Ailments
Demographics: Demographic_key, Age_group, Income_group, Race, Country_of_birth, Marital_status, Gender, Citizenship_status
Provider: Provider_Id, Provider_Name, Provider_Type, Address, City, County, State, Zip, Service_Area, Netwrok_Provider
Insurance_Type: InsuranceType_Id, InsuranceType_Name, InsuranceType_Desc
Product: Product_Id, Product_Name, Product_Category, LoB
Claim_Status: Claim_status_Id, Claim_Status_Reason, Claim_stat_catg
Time: Time_ID, Day, Week, Month, Quarter, Year, Season
Drugs: Drug_Id, Drug_Name_Generic, Drug_Name_Trade, National_Drug_Code, Drug_Description, Drug_Category, Formulary, Manufacturer
The data model is designed by data modelers.
3. Create and Maintain the tables
The database is maintained by DBAs.
4. Loading the data into Data Warehouse and Data Marts
This is handled by the ETL team.
What is ETL?
ETL stands for Extract, Transform, Load. Informatica is an ETL application.
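To make the three ETL stages concrete, here is a minimal sketch in plain Python using only the standard library. It is not Informatica code; the CSV layout, the sales table, and the cleanup rules are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; layout and column names are assumptions.
SOURCE_CSV = """store,product,units
Store 1, burger ,3
Store 1,Fries,2
Store 2,BURGER,5
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalize product names, cast units."""
    return [
        (r["store"].strip(), r["product"].strip().title(), int(r["units"]))
        for r in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (store TEXT, product TEXT, units INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
load(conn, transform(extract(SOURCE_CSV)))
total_burgers = conn.execute(
    "SELECT SUM(units) FROM sales WHERE product = 'Burger'"
).fetchone()[0]
```

Because the transform step normalizes " burger " and "BURGER" to the same value, the final query counts both rows together. An ETL tool expresses the same flow as mappings between sources and targets rather than as hand-written functions.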
5. Develop Reports / Dashboards
This is handled by the reporting team.
6. Testing ETL Mappings and Reports / Dashboards
This is handled by the QA department.
7. Deploying to production and ongoing maintenance
This is handled by the production department.
Where do we fit after learning this training?
We can work as:
1. ETL Developer
2. ETL Administrator
3. ETL Tester
Data Modeling
What is Data Modeling?
• Data model defines relationships between data
• Dimensional data model is most often used in data warehousing systems.
• Data modeling is the process of learning about the data.
Data models are designed by data modelers.
What is Dimensional Modeling?
• It helps us store the data
Goals and benefits of Dimensional Modeling:
• Faster data retrieval
• Better understandability
• Extensibility
It has 2 distinct categories:
• Dimensions
• Measures
Scenarios of Dimensional Data Modeling
McDonald’s client: I want to store information on how many burgers and fries are sold per day from a single McDonald’s outlet.
What is a dimension and what is a measure in this example?
Step1: Identify the Dimensions
1. Food (e.g., burgers and fries)
2. Store (McDonald’s)
3. Some specific day
Step2: Identify the measures
Number of burgers/fries sold is a measure.
The fact table captures the data that measures the organization's business operations.
Step3: Identify the attributes or properties of dimensions
Food dimension:
KEY  NAME
1    Burger
2    Fries

Store dimension:
KEY  NAME
1    Store 1
2    Store 2
...  ...

Day dimension:
KEY  DAY
1    01 Jan 2012
2    02 Jan 2012
3    03 Jan 2012
...  ...
Step 4: Identify the granularity of the measures
What is meant by "Granularity"?
Granularity refers to the lowest (or most granular) level of information stored in any table
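The dimensions, measure, and granularity from the McDonald’s scenario can be sketched as a small star schema. This is a minimal illustration using Python's built-in sqlite3 module; the table and column names (dim_food, dim_store, dim_day, fact_sales, units_sold) and the unit counts are assumptions, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per food item, store, and day.
CREATE TABLE dim_food  (food_key  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_day   (day_key   INTEGER PRIMARY KEY, day  TEXT);

-- Fact table. Grain: one row per food item, per store, per day.
CREATE TABLE fact_sales (
    food_key   INTEGER REFERENCES dim_food(food_key),
    store_key  INTEGER REFERENCES dim_store(store_key),
    day_key    INTEGER REFERENCES dim_day(day_key),
    units_sold INTEGER,                      -- the measure
    PRIMARY KEY (food_key, store_key, day_key)
);

INSERT INTO dim_food  VALUES (1, 'Burger'), (2, 'Fries');
INSERT INTO dim_store VALUES (1, 'Store 1'), (2, 'Store 2');
INSERT INTO dim_day   VALUES (1, '01 Jan 2012'), (2, '02 Jan 2012'), (3, '03 Jan 2012');

-- Illustrative unit counts only.
INSERT INTO fact_sales VALUES (1, 1, 1, 40), (2, 1, 1, 25), (1, 1, 2, 35), (1, 2, 1, 50);
""")

# Units sold per day and food item for a single outlet (store_key = 1).
rows = conn.execute("""
    SELECT d.day, f.name, SUM(s.units_sold)
    FROM fact_sales s
    JOIN dim_food f ON f.food_key = s.food_key
    JOIN dim_day  d ON d.day_key  = s.day_key
    WHERE s.store_key = 1
    GROUP BY d.day_key, f.food_key
    ORDER BY d.day_key, f.food_key
""").fetchall()
```

The composite primary key on the fact table encodes the chosen grain: there can be at most one row per food item, store, and day, so nothing finer than a daily total is stored.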
Step 5: History Preservation (Optional)
History preservation can be handled by designing the dimension tables as "slowly changing dimensions".
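One common way to preserve history in a dimension is to keep a new row for every change and mark the old row as expired, often called a Type 2 slowly changing dimension. The slides do not say which type is meant, so the sketch below is only one possible reading; the function name apply_scd2, the row layout, and the city values are hypothetical.

```python
from datetime import date


def apply_scd2(history, key, new_attrs, today):
    """Record a new version of a dimension row, keeping older versions.

    history is a list of dicts with keys: key, attrs, start_date, end_date.
    The current version of a row is the one whose end_date is None.
    """
    current = next(
        (r for r in history if r["key"] == key and r["end_date"] is None),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return history              # nothing changed, nothing to record
        current["end_date"] = today     # expire the old version
    history.append(
        {"key": key, "attrs": new_attrs, "start_date": today, "end_date": None}
    )
    return history


store_dim = []
apply_scd2(store_dim, 1, {"name": "Store 1", "city": "Austin"}, date(2012, 1, 1))
apply_scd2(store_dim, 1, {"name": "Store 1", "city": "Dallas"}, date(2012, 6, 1))
```

After the second call the dimension holds two rows for the same store: the expired Austin version and the current Dallas version, so facts recorded before the change can still be reported against the old attributes.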
Entities:
Entities are the things about which you want to store information.
For example: EMPLOYEE
Cardinalities:
The cardinality shows how much of one side of the relationship belongs to how much of the other side of the relationship.
For example:
• How many customers belong to 1 sale?
• How many sales belong to 1 customer?
• How many sales take place in 1 shop?
Customers --> Sales: 1 customer can buy something several times
Sales --> Customers: 1 sale is always made by 1 customer at a time
Customers --> Products: 1 customer can buy multiple products
Products --> Customers: 1 product can be purchased by multiple customers
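Cardinalities like these can be checked against sample data by counting how many values on one side relate to each value on the other. The sketch below is a minimal illustration; the sale lines, customer names, and the helper max_cardinality are hypothetical.

```python
from collections import defaultdict

# Hypothetical sale lines: (sale_id, customer, product).
sale_lines = [
    (101, "Alice", "Burger"),
    (101, "Alice", "Fries"),
    (102, "Alice", "Burger"),
    (103, "Bob", "Burger"),
]

customers_per_sale = defaultdict(set)
sales_per_customer = defaultdict(set)
products_per_customer = defaultdict(set)
customers_per_product = defaultdict(set)

for sale_id, customer, product in sale_lines:
    customers_per_sale[sale_id].add(customer)
    sales_per_customer[customer].add(sale_id)
    products_per_customer[customer].add(product)
    customers_per_product[product].add(customer)


def max_cardinality(mapping):
    """Largest number of related values seen for any single key."""
    return max(len(values) for values in mapping.values())


one_customer_per_sale = max_cardinality(customers_per_sale) == 1
many_sales_per_customer = max_cardinality(sales_per_customer) > 1
many_products_per_customer = max_cardinality(products_per_customer) > 1
many_customers_per_product = max_cardinality(customers_per_product) > 1
```

In this sample each sale has exactly one customer while a customer has several sales, which is a one-to-many relationship; customers and products relate many-to-many.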
Scenarios of Dimensional Data Modeling for Banking
Scenarios of Dimensional Data Modeling for Retail Banking
Event 1 - Set up Banks and Branches
Event 2 - Create new Customer
Event 3 - Set up New Account
Event 4 - Issue Credit Card
Event 5 - Customer makes Deposit
Event 6 - Customer uses Card
Event 7 - Bank Issues Statement
Event 8 - Customer closes Account
Types of OLAP Servers
There are four types of OLAP servers:
• Relational OLAP (ROLAP)
• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
• Specialized SQL Servers
OLTP v/s OLAP
OLTP Data Model
Snowflake Schema
Star Schema
Informatica