![Page 1: Cash Registers & Satellites Briefing to the 2006 NOAATech Conference November 2, 2005 Stan Cutler Stanley.cutler@noaa.gov 301-457-5210 x 163 Mitretek Systems/NESDIS/OSD](https://reader035.vdocuments.us/reader035/viewer/2022070415/5697bf851a28abf838c87638/html5/thumbnails/1.jpg)
Cash Registers & Satellites Briefing to the 2006 NOAATech Conference
November 2, 2005
Stan Cutler
Stanley.cutler@noaa.gov, 301-457-5210 x 163
Mitretek Systems/NESDIS/OSD
2
Purpose
Improve communication between NOAA’s developers and the wider community of data management professionals
– Introduce vocabulary
– Identify NOAA applications that can be described using a common vocabulary
3
Agenda
I. Universal Data Management Challenges
II. Notional Data Warehouse Architecture
III. Data Modeling Approaches
– Relational
– Dimensional
4
I. Universal Data Management Challenges
5
Data Mining Example: “Market Basket Analysis”
Decisions:
1) Move the beer display closer to the diaper display
2) On Thursdays, sell beer & diapers at full price
Rationale:
1) When men bought diapers on Thursdays and Saturdays, they also tended to buy beer
2) Men typically did their weekly grocery shopping on Saturdays
3) On Thursdays, they only bought a few items
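The rationale above rests on how often items appear together in the same basket. A minimal Python sketch, assuming the standard support and confidence measures from association-rule mining (the briefing does not say which technique the retailer actually used) and a handful of invented baskets, shows how a rule such as “diapers → beer” could be scored. The `transactions` data and the functions `support`, `confidence`, and `frequent_pairs` are hypothetical names for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets invented for illustration; not data from the briefing.
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate P(consequent | antecedent): support of both together / support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def frequent_pairs(baskets, min_support):
    """Count every item pair and keep those whose support meets the threshold."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


print(frequent_pairs(transactions, min_support=0.4))
print(confidence({"diapers"}, {"beer"}, transactions))
```

With these baskets, two of the three diaper baskets also contain beer, so the rule’s confidence is 2/3. Tagging each basket with a day of the week would let the same counts be split by Thursday and Saturday.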
6
Many Disciplines Mine Their Data
– Law Enforcement: optimal deployment
– Health Care: coverage risks
– E-Commerce: pop-up/link selection
– Medicine: gene/disease associations
– Etc.
Data Management Goal: Develop systems in which the data and procedures are configured to answer questions that are important to the enterprise
7
NOAA’s Future
– Integrating Global Environmental Observations and Data Management
– Ensuring Sound, State-of-the-Art Research
– Developing, Valuing, and Sustaining a World-Class Workforce
We are not unique. Any enterprise that collects large amounts of data has the same kinds of challenges and goals.
8
We can find valuable expertise outside the NOAA community
– Ask the same kinds of questions as others challenged with similar problems
– Understand the constructs and vocabulary:
  – Architectures
  – Data modeling
9
II. Notional Data Warehouse Architecture
10
“Hub and Spoke Architecture”
[Diagram: External Data and Internal Data are brought into a Data Staging Area, where they are transformed & “cleansed”, then loaded into an application-neutral Data Warehouse (the hub). The warehouse feeds application-specific “Data Marts” (the spokes), which use “OLAP” technologies.]
“ETL” = Extract, Transform and Load
“OLAP” = Online Analytical Processing
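The extract, transform & cleanse, and load steps can be sketched with Python’s built-in `sqlite3` module. This is an illustrative sketch only: the source rows, the cleansing rules, and the names `warehouse_sales` and `marketing_mart` are hypothetical, and a production data mart would usually live in its own store rather than as a view inside the warehouse database.

```python
import sqlite3

# Hypothetical internal source rows: (day, store, product, amount as text).
internal_sales = [
    ("2005-11-02", " Store 7 ", "diapers", "12.50"),
    ("2005-11-02", "store 7", "beer", "8.00"),
    ("2005-11-03", "Store 9", "beer", None),  # dirty row: missing amount
]


def extract(rows):
    """Extract: copy raw source rows unchanged into the staging area."""
    return list(rows)


def cleanse(rows):
    """Transform & cleanse: normalize store names, drop rows with no amount."""
    clean = []
    for day, store, product, amount in rows:
        if amount is None:
            continue
        clean.append((day, store.strip().title(), product, float(amount)))
    return clean


def load(conn, rows):
    """Load: write cleansed rows into the application-neutral warehouse (hub)."""
    conn.execute("CREATE TABLE warehouse_sales (day TEXT, store TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)", rows)


def build_mart(conn):
    """Spoke: an application-specific data mart derived from the hub."""
    conn.execute(
        "CREATE VIEW marketing_mart AS "
        "SELECT store, product, SUM(amount) AS total "
        "FROM warehouse_sales GROUP BY store, product"
    )


conn = sqlite3.connect(":memory:")
load(conn, cleanse(extract(internal_sales)))
build_mart(conn)
print(conn.execute("SELECT * FROM marketing_mart ORDER BY product").fetchall())
```

Here the cleansing step merges “ Store 7 ” and “store 7” into one store and drops the row with no amount, so the mart reports two product totals for Store 7.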
11
Retail Application: Hub and Spoke Architecture
[Diagram: Sales Data and external Customer Lists pass through a Data Staging Area, where they are transformed & cleansed, into an application-neutral Data Warehouse. The warehouse feeds application-specific OLAP Data Marts for Marketing, Floor Management, Human Resources, Real Estate, and Accounting.]
12
Notional NOAA Hub and Spoke Architecture
[Diagram: Data from CLASS and other satellite archives passes through a Data Staging Area (a “Rich Inventory”?), where it is transformed & cleansed, into an application-neutral Data Warehouse. The warehouse feeds NOAA applications (Data Marts using OLAP): Climate Prediction, Weather Forecast, Ecosystems Management, Commerce & Transportation, and External Customers. ESPC and the Data Centers also appear in the diagram.]
13
III. Data Modeling Approaches
14
“Relational” Vocabulary
“Relational” technologies
– Relational Data Base Management Systems (RDBMS)
  • COTS products (INFORMIX, DB2, ORACLE, MS/SS, etc.)
  • Proprietary data management/manipulation software
– RDBMS extensions (most COTS products are built on an RDBMS)
  • GUIs, CASE tools, COOP, application generators, security, etc.
“Relational” data models: an evolutionary approach to data base design
  • Conceptual Entity Relationship Diagrams (ERDs) are used to identify data requirements, relationships, and rules
    – Diagrams
    – Data dictionaries
  • Logical ERDs are used to normalize (eliminate redundancies)
  • Physical models are the table schemas entered into the RDBMS
Online Transaction Processing (OLTP), e.g., CLASS
15
Entity Relationship Diagram (ERD)
[Diagram: four entity boxes, each listing a key followed by its attributes, connected by relationship lines. Labels: Entity, Relationship, Attributes, Cardinality (1, Many, or 0), Entity Class.]
The foundation of all OLTP systems, such as CLASS
Attributes, entities, and relationships are described in the data dictionary
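The ERD constructs map directly onto physical table definitions. A minimal sketch using Python’s `sqlite3`, assuming two hypothetical entities (a satellite that carries zero or more instruments; the names and rows are invented and do not come from CLASS), shows keys, attributes, a one-to-many relationship expressed as a foreign key, and the RDBMS rejecting a row that would break referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Entity: satellite (key + attribute).
conn.execute("""
    CREATE TABLE satellite (
        satellite_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL
    )""")

# Entity: instrument. Relationship: each instrument belongs to exactly one
# satellite, and a satellite carries 0..many instruments.
conn.execute("""
    CREATE TABLE instrument (
        instrument_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        satellite_id  INTEGER NOT NULL REFERENCES satellite(satellite_id)
    )""")

conn.execute("INSERT INTO satellite VALUES (1, 'Sat-A')")
conn.execute("INSERT INTO instrument VALUES (10, 'Imager', 1)")


def insert_orphan():
    """Try to attach an instrument to a satellite that does not exist."""
    try:
        conn.execute("INSERT INTO instrument VALUES (11, 'Sounder', 99)")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


print(insert_orphan())
```

SQLite needs the `PRAGMA` line to enforce foreign keys; most server RDBMSs enforce a declared foreign key without it.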
16
Object Models “inherit” ERD constructs
[Diagram: three ERD entity boxes (key plus attributes) beside an Object Class box that has a key, attributes, and Behavior.]
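An object class keeps an entity’s key, attributes, and relationships and adds behavior. A hypothetical Python sketch using dataclasses (the classes `Satellite` and `Instrument` and the methods `carry` and `instrument_names` are illustrative names, not taken from the briefing):

```python
from dataclasses import dataclass, field


@dataclass
class Instrument:
    instrument_id: int  # key, as in the ERD entity
    name: str           # attribute


@dataclass
class Satellite:
    satellite_id: int   # key
    name: str           # attribute
    # One-to-many relationship, held as a list of related objects.
    instruments: list[Instrument] = field(default_factory=list)

    # Behavior: the part an object class adds to the ERD entity.
    def carry(self, instrument: Instrument) -> None:
        self.instruments.append(instrument)

    def instrument_names(self) -> list[str]:
        return [i.name for i in self.instruments]


sat = Satellite(1, "Sat-A")
sat.carry(Instrument(10, "Imager"))
print(sat.instrument_names())
```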
17
Pros & Cons of systems based on Relational models
Strengths
– Referential integrity
– Data locking
– Fast look-up and retrieval
– GUIs
Weaknesses
– Entity proliferation
– Users don’t understand them
– Complex code must be written to accumulate multiple instances (hard to use for data mining)
18
Dimensional Data Models
Fact – An instance of numeric data
Dimension – A foreign key
Fact Table
– Key is a concatenation of foreign keys (dimensions)
– An instance can have dozens of foreign keys
– Millions of instances (rows) are often required
Programmers’ revenge on Data Base Administrators
– Break many relational “rules”
– Re-invented often
19
A “Dimensional” Data Model for Retailing
Who (buys, sells)
– Customer (age, gender, marital status, occupation, etc.)
– Sales person (same attributes as Customer, plus training, etc.)
– Cash register
What (products) – Brand, color, size, type, etc.
When – Time of day, day of week, season
Where – Store (location, size, type), shelf
Why – Promotions, advertising, discounts, economic trends
How much (was spent) – Per product, per total sale
20
Classical Star Schema: Point of Sale
[Diagram: a central FACT table joined to seven dimension tables by their keys.]
– FACT: Time_key, Customer_key, Store_key, Clerk_key, Promo_key, Product_key, Register_key, Dollars Sold, Units Sold, Dollars Cost
– Clerk Dimension: Clerk_key, Clerk Name, Job Grade, etc.
– Register Dimension: Register_key, Location, Type, etc.
– Promo Dimension: Promo_key, Promo Name, Price Type, Ad Type, etc.
– Product Dimension: Product_key, Description, Brand, Sub Category, Category, Dept, Flavor, Package Type
– Time Dimension: Time_key, Day of Week, Fiscal Period
– Customer Dimension: Customer_key, Customer Name, Purchase Profile, etc.
– Store Dimension: Store_key, Store Name, Address, Floor Type, etc.
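A reduced version of the point-of-sale star schema can be tried in SQLite. This sketch keeps only the Time, Product, and Store dimensions, uses lower-case column names adapted from the slide, and loads invented rows. The fact table’s primary key is the concatenation of its dimension foreign keys, and the query is a typical star join: it groups by dimension attributes and sums a fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, day_of_week TEXT, fiscal_period TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT, brand TEXT, category TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT);

-- Fact table: the key is the concatenation of dimension foreign keys;
-- the remaining columns are numeric facts.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    store_key    INTEGER REFERENCES store_dim(store_key),
    dollars_sold REAL,
    units_sold   INTEGER,
    dollars_cost REAL,
    PRIMARY KEY (time_key, product_key, store_key)
);

-- Invented sample rows.
INSERT INTO time_dim VALUES (1, 'Thursday', 'FY06-Q1'), (2, 'Saturday', 'FY06-Q1');
INSERT INTO product_dim VALUES (1, 'Diapers 40ct', 'BrandX', 'Baby'), (2, 'Beer 6pk', 'BrandY', 'Beverage');
INSERT INTO store_dim VALUES (1, 'Store 7');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 25.0, 2, 18.0),
    (1, 2, 1, 16.0, 2, 10.0),
    (2, 1, 1, 60.0, 5, 45.0),
    (2, 2, 1, 48.0, 6, 30.0);
""")

# Star join: constrain/group by dimension attributes, aggregate the facts.
STAR_QUERY = """
SELECT t.day_of_week, p.category, SUM(f.dollars_sold) AS dollars
FROM sales_fact f
JOIN time_dim t    ON f.time_key = t.time_key
JOIN product_dim p ON f.product_key = p.product_key
GROUP BY t.day_of_week, p.category
ORDER BY t.day_of_week, p.category
"""
rows = conn.execute(STAR_QUERY).fetchall()
print(rows)
```

Every question about sales is answered the same way: pick the dimension attributes, join them to the one fact table, and aggregate.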
21
Snowflake Schema: Point of Sale
[Diagram: the same central FACT table and dimensions, but the Product Dimension is normalized (“snowflaked”) into further tables.]
– FACT: Time_key, Customer_key, Store_key, Clerk_key, Promo_key, Product_key, Register_key, Dollars Sold, Units Sold, Dollars Cost
– Register Dimension: Register_key, Location, Type, etc.
– Clerk Dimension: Clerk_key, Clerk Name, Job Grade, etc.
– Promo Dimension: Promo_key, Promo Name, Price Type, Ad Type, etc.
– Product Dimension: Product_Type_PK, Product_Type_Desc
– Time Dimension: Time_key, Day of Week, Fiscal Period
– Customer Dimension: Customer_key, Customer Name, Purchase Profile, etc.
– Store Dimension: Store_key, Store Name, Address, Floor Type, etc.
– Tables normalized out of the Product Dimension: Sub-Type (Sub-Type_PK, Sub-Type-Desc; four instances), Model (Model-Num_PK, Model-Desc; two instances), Brand (Brand-ID_PK, Maker-Desc; two instances)
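Snowflaking trades redundancy for joins. In this hypothetical sketch (table names adapted from the slide’s Brand-ID/Maker-Desc and Sub-Type tables; the rows are invented), the brand and sub-type attributes live in their own tables, so totaling sales by maker takes one more join than it would in a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: descriptive attributes move into their own tables.
CREATE TABLE brand    (brand_id INTEGER PRIMARY KEY, maker_desc TEXT);
CREATE TABLE sub_type (sub_type_id INTEGER PRIMARY KEY, sub_type_desc TEXT);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    description TEXT,
    brand_id    INTEGER REFERENCES brand(brand_id),
    sub_type_id INTEGER REFERENCES sub_type(sub_type_id)
);
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    dollars_sold REAL
);

-- Invented sample rows.
INSERT INTO brand VALUES (1, 'Maker A'), (2, 'Maker B');
INSERT INTO sub_type VALUES (1, 'Lager'), (2, 'Ale');
INSERT INTO product_dim VALUES (1, 'Beer 6pk', 1, 1), (2, 'Beer 12pk', 1, 2), (3, 'Beer 6pk', 2, 1);
INSERT INTO sales_fact VALUES (1, 10.0), (2, 20.0), (3, 5.0), (1, 10.0);
""")

# Reaching a snowflaked attribute costs one extra join per normalized level.
SNOWFLAKE_QUERY = """
SELECT b.maker_desc, SUM(f.dollars_sold)
FROM sales_fact f
JOIN product_dim p ON f.product_key = p.product_key
JOIN brand b       ON p.brand_id = b.brand_id
GROUP BY b.maker_desc
ORDER BY b.maker_desc
"""
rows = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(rows)
```

Storing each maker description once saves space and avoids update anomalies; the price is the extra join on every query that needs it.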
22
Metadata in Dimensional Modeling
NOAA usage:
– If it’s not a fact
– If it’s not a key
– It’s metadata
Conventional Dimensional usage:
– If it’s not a fact
– If it’s not a key
– It’s documentation
BUT
– If it’s a key
– It’s metadata (because it describes the fact)
23
Dimensional Models for NOAA
Which – Satellite, instrument
When – Orbit, UTC, season, decade, epoch, etc.
Where – Geospatial coordinates
Who – User affiliation, developer affiliation
FACT: How much? – Temperature, moisture, radiance, color, etc.
24
A NOAA Star Schema?
[Diagram: a central FACT TABLE joined to six dimension tables by their keys.]
– FACT TABLE: Time_key (fk), Location_key (fk), Altitude_key (fk), Product_key (fk), Satellite_key (fk), Instrument_key (fk), Temperature
– Altitude Dimension: Altitude_key, Distance above sea level (SL), etc.
– Satellite Dimension: Satellite_key, Name, Position
– Instrument Dimension: Instrument_key, Name, Description
– Product Dimension: Product_key, Product Name, Description, System, Sub System, etc.
– Time Dimension: Time_key, UTC of observation, UTC of receipt, Local time of observation, Orbit_Id, etc.
– Location Dimension: Location_key, Geo-coordinates of observation, etc.
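The same pattern applied to satellite observations might look like the sketch below. It rests on stated assumptions: only four of the slide’s six dimensions are modeled (Altitude and Product are left out for brevity), the satellite and instrument names, coordinates, orbit numbers, and temperatures are invented, and temperature is assumed to be stored in kelvin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE satellite_dim  (satellite_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE instrument_dim (instrument_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE time_dim       (time_key INTEGER PRIMARY KEY, utc_of_obs TEXT, orbit_id INTEGER);
CREATE TABLE location_dim   (location_key INTEGER PRIMARY KEY, lat REAL, lon REAL);

-- Fact table: one temperature (assumed kelvin) per combination of dimension keys.
CREATE TABLE temperature_fact (
    time_key       INTEGER REFERENCES time_dim(time_key),
    location_key   INTEGER REFERENCES location_dim(location_key),
    satellite_key  INTEGER REFERENCES satellite_dim(satellite_key),
    instrument_key INTEGER REFERENCES instrument_dim(instrument_key),
    temperature_k  REAL,
    PRIMARY KEY (time_key, location_key, satellite_key, instrument_key)
);

-- Invented sample rows.
INSERT INTO satellite_dim  VALUES (1, 'Sat-A'), (2, 'Sat-B');
INSERT INTO instrument_dim VALUES (1, 'Sounder');
INSERT INTO time_dim       VALUES (1, '2005-11-02T00:00Z', 101), (2, '2005-11-02T01:40Z', 102);
INSERT INTO location_dim   VALUES (1, 38.9, -77.0);
INSERT INTO temperature_fact VALUES
    (1, 1, 1, 1, 280.0),
    (2, 1, 1, 1, 282.0),
    (1, 1, 2, 1, 279.0);
""")

# Mean temperature and observation count per satellite.
QUERY = """
SELECT s.name, AVG(f.temperature_k), COUNT(*)
FROM temperature_fact f
JOIN satellite_dim s ON f.satellite_key = s.satellite_key
GROUP BY s.name
ORDER BY s.name
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Swapping `satellite_dim` for `time_dim` or `location_dim` in the join answers “when” or “where” questions with the same query shape.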
25
Pros & Cons of systems based on dimensional models
Strengths
– Very few “entity types” needed
– Decision Support Systems (DSS)
  • End users construct complex queries by selecting dimensions from a GUI
  • Statistical analysis of very large data bases
– Artificial Intelligence (AI)
  • Automated scheduling of continuous executions
  • System identifies (“discovers”) new relationships
  • Discoveries shape successive executions
Weaknesses
– Development cost
– Storage
– Operational cost: requires much “care and feeding”
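“End users construct complex queries by selecting dimensions” can be illustrated with a small query builder. In this hypothetical sketch, `DIMENSIONS` stands in for the choices a GUI would offer; only whitelisted dimension names are accepted, so user input never becomes raw SQL. The schema and sales figures are invented.

```python
import sqlite3

# Hypothetical whitelist: GUI label -> (dimension table, alias, join key, attribute).
DIMENSIONS = {
    "day": ("time_dim", "t", "time_key", "day_of_week"),
    "category": ("product_dim", "p", "product_key", "category"),
}


def build_query(selected):
    """Build a star-join SQL string that sums sales by the chosen dimensions."""
    selected = list(dict.fromkeys(selected))  # drop duplicate picks, keep order
    unknown = [d for d in selected if d not in DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimension(s): {unknown}")
    columns, joins = [], []
    for name in selected:
        table, alias, key, attribute = DIMENSIONS[name]
        columns.append(f"{alias}.{attribute}")
        joins.append(f"JOIN {table} {alias} ON f.{key} = {alias}.{key}")
    sql = "SELECT " + ", ".join(columns + ["SUM(f.dollars_sold)"])
    sql += " FROM sales_fact f " + " ".join(joins)
    if columns:
        grouping = ", ".join(columns)
        sql += f" GROUP BY {grouping} ORDER BY {grouping}"
    return sql


# Tiny invented star schema to run the generated queries against.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day_of_week TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales_fact (time_key INTEGER, product_key INTEGER, dollars_sold REAL);
INSERT INTO time_dim VALUES (1, 'Thursday'), (2, 'Saturday');
INSERT INTO product_dim VALUES (1, 'Baby'), (2, 'Beverage');
INSERT INTO sales_fact VALUES (1, 1, 25.0), (1, 2, 16.0), (2, 1, 60.0), (2, 2, 48.0);
""")

print(conn.execute(build_query(["day"])).fetchall())
```

Because every dimension joins to the same fact table in the same way, adding a GUI choice only means adding one entry to the whitelist; a normalized OLTP schema would need a hand-written join path for each new question.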
26
False Dichotomy: Relational “vs.” Dimensional
Relational and dimensional systems are not mutually exclusive
– Data warehouses usually extract fact tables from relational data bases
– Data warehouse capabilities are available as extensions to RDBMSs
The choice depends on the business
– Feasibility: Is the application data good enough for ETL?
– ROI: Does the business benefit outweigh the cost?
27
SUMMARY:
NOAA’s data mining challenge is similar to that of other enterprises
A world-wide community of IT professionals uses a particular vocabulary to address the challenge
Relational technologies & models are the essential first step
Dimensional technologies & models come next