data administration data warehouse implementation 5/26/04

28
Data Administration Data Warehouse Implementation 5/26/04

Upload: acton-barlow

Post on 31-Dec-2015

29 views

Category:

Documents


0 download

DESCRIPTION

Data Administration Data Warehouse Implementation 5/26/04. DW Current Resources. Servers – IBM RS6000 S7A with 4 engines, 4GB memory, 200GB disk space, running AIX 4.3.0 - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Data Administration Data Warehouse Implementation 5/26/04

Data Administration

Data Warehouse Implementation

5/26/04

Page 2: Data Administration Data Warehouse Implementation 5/26/04

DW Current Resources• Servers –

IBM RS6000 S7A with 4 engines, 4GB memory, 200GB disk space, running AIX 4.3.0

Dell PowerEdge 2550 with two 1.4 GHZ processors, 4GB of RAM, four 36GB disk drives and running Windows 2000 server operating system.

• Data – 100GB disk space used to house FRS, FES, HRS and SIS Oracle extracts. This is essentially a “data dump” of the IDMS data, and additional “views” still need to be created.

• Staff – DBA - Henry 80%, Song 20%, Chen 10% DA - Kisil 70%, Cheesman 50%, Claunch 50%, Carter

100%

Page 3: Data Administration Data Warehouse Implementation 5/26/04

DW Tasks DBA – Extract data, design Oracle DB, load data, and

production support (i.e. monitor system and DB performance, enforce security, schedule backups, etc.)

Data Administration – User interface, develop requirements document for all DW projects, evaluate data quality, create DB views, develop specialized reports, test, train users, and coordinate projects.

Both – Infrastructure design (with Systems staff), and tool evaluation (ETL, OLAP and desktop reporting) with help from the C/S group.

Page 4: Data Administration Data Warehouse Implementation 5/26/04

5/26/2004

WebFOCUS (Reporting)

Operational Enterprise

Ad Hoc and Operational Reports

Tell me what happened?

EIS

Tell me everything I need to know and what is important, but do it quickly and easily!

Current DWE

Data Warehouse Cleansed Subset of Detail Data Subset of Summary

Data Multiple Years of Data Periodic Updates Strategic

External Data Census Data, Benchmark, Salary Surveys, Economic

Data

Data Staging Area Extract Data Transform Data Quality Assurance Create Metadata

Source Data IDMS

Oracle Flat files

Ad Hoc Query Repository Copy of Source Data Operational Daily Updates All Elements Minimum Number of Years of

Data

Data Mart #1 Course Management

Subset of DW Summarized in specific

manner Tactical

Metadata

SAS Data Mining Server

Tell me what may happen, or what is interesting?

Give me information to help me achieve specific goals!

Tell me what happened and why?

Legend: 1) Wide black border indicates physical servers. 2) Narrow border indicates no decision on if it

will be a separate physical server. 3) Red border indicates under development. 4) Gray background indicates BI or analytical

software servers.

Page 5: Data Administration Data Warehouse Implementation 5/26/04

5/26/2004

WebFOCUS (Reporting)

Operational Enterprise

Ad Hoc and Operational Reports

Tell me what happened?

EIS

Tell me everything I need to know and what is important, but do it quickly and easily!

Original Plan for DWE

Data Warehouse Cleansed Subset of Detail Data Subset of Summary

Data Multiple Years of Data Periodic Updates Strategic

External Data Census Data, Benchmark, Salary Surveys, Economic

Data

Data Staging Area Extract Data Transform Data Quality Assurance Create Metadata

Source Data IDMS

Oracle Flat files

Ad Hoc Query Repository Copy of Source Data Operational Daily Updates All Elements Minimum Number of Years of

Data

Data Mart #2 Resource Management

Data Mart #1 Course Management

Subset of DW Summarized in specific

manner Tactical

Metadata

SAS Data Mining Server

OLAP Server

Tell me what may happen, or what is interesting?

Give me information to help me achieve specific goals!

Tell me what happened and why?

Legend: 1) Wide black border indicates physical servers. 2) Narrow black border indicates no decision on

if it will be a separate physical server. 3) Gray background indicates BI or analytical

software servers.

Page 6: Data Administration Data Warehouse Implementation 5/26/04

DW Terms• Source Data: Operational data from internal systems, such as IDMS (FES, FRS, HRS,

SIS), Oracle, etc. •  External Data: Data from systems external to the University, such as economic and

census data collected by the government.

•  Data Staging Area: Storage and processing area for data extracted from the internal and external systems prior to loading into the Warehouse, Data Marts or Ad Hoc Query Repository. Some of the data will remain un-cleansed and an exact replica of the data in the online systems, for subsequent loading into the Ad Hoc Query Repository. Other data will be cleansed and transformed before being moved to the Data Warehouse and Data Marts for analysis. Some data will be located in multiple places and in multiple forms and aggregations.

• Metadata: A term used for data that describes or specifies other data. It is used to define all of the characteristics of data required to build databases and applications, and to support knowledge workers and information producers. This includes information currently in the Data Inventory (the element name, meaning, format, domain values), and additional information such as business integrity rules, relationships, owner, etc.

Page 7: Data Administration Data Warehouse Implementation 5/26/04

DW Terms• Ad Hoc Query Repository: A collection of enterprise data from multiple sources,

used to do ad hoc and operational reporting where the need to use the most current and un-standardized source data is a requirement. The Repository will typically contain only one or two years of the most recent data, unless regulatory or statutory requirements dictate otherwise. (Also known as an Operational Data Store or ODS.)

• Data Warehouse: An enterprise-wide, cross-functional, cross-organizational database typically comprised of data extracted, cleansed and/or summarized from multiple online transaction processing systems, and other stores of data (Purdue University; Stanford University). It is designed for query and analysis, typically contains historical data, and is used to present information to support decision-making, tactical and strategic business processes. A data warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later be used. In general, a data warehouse tends to be a strategic, but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need. (Improving Data Warehouse and Business Information Quality, Larry P. English, 1999.)

Page 8: Data Administration Data Warehouse Implementation 5/26/04

DW Terms• Data Mart: A subset of enterprise data from the Data Warehouse that is

summarized and stored in an optimal fashion for analysis and presentation of information to support trend analysis and tactical decisions and processes. Data Marts are typically designed based on an analysis of user needs to answer specific questions in the pursuit of specific goals. The scope can be that of a complete data subject such as Student, or of a particular business area or line of business, such as Enrollment. (Improving Data Warehouse and Business Information Quality, Larry P. English, 1999.)

• On-Line Analytical Processing (OLAP): A category of software technology

that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various "what-if" data model scenarios. This is achieved through use of an OLAP Server. (http://www.moulton.com/olap/olap.glossary.html) Functionality includes multi-dimensional analysis, slicing, drill-down and rotation.

Page 9: Data Administration Data Warehouse Implementation 5/26/04

DW Terms (Continued)

• Data Mining: A class of database applications that look for hidden patterns in a group of data. For example, data mining software can help retail companies find customers with common interests. The term is commonly misused to describe software that presents data in new ways. True data mining software doesn't just change the presentation, but actually discovers previously unknown relationships among the data. (http://www.webopedia.com/TERM/d/data_mining.html)

Page 10: Data Administration Data Warehouse Implementation 5/26/04

DW Terms (Continued)

• Executive Information System (EIS): An application developed to provide senior management direct access to information relevant to an organization’s goals and performance. These applications are developed to gather, analyze and integrate internal and external data to provide management with insight into key performance indicators, potential problems, and changes in the environment. Typical features include extensive use of graphics, simple navigational controls, automatic replacement of report contents, drill-down analysis, trend analysis capabilities, exception reporting or alerts, graphical charts with links to underlying reports, provision of data from multiple sources, and the highlighting of information an executive feels is critical. (The Data Warehouse Lifecycle Toolkit, Ralph Kimball, et al.)

Page 11: Data Administration Data Warehouse Implementation 5/26/04

What is a

Decision Support System

EIS

Data Mart

Data Warehouse

Operational Data Store

Covansys

High Level Summarized Data For Top Executives (“Pre-programmed DASHBOARD”)

Addresses Specific Subject Area

Collection Of Integrated Subject Oriented Databases (Historical)

Time-Current, Integrated Databases (Tactical-Power Users)

Components of a Decision Support System

Page 12: Data Administration Data Warehouse Implementation 5/26/04

How do we get there?

• Educate users. • Develop detailed requirements documents,

including “Information Value Chains” for all goals which the DW/DM is expected to address.

• Data Mart approach.• Phased implementation.• Additional resources.

Page 13: Data Administration Data Warehouse Implementation 5/26/04

Educate Users • Basics – “What is a Data Warehouse?” Create a

“single-source-of-truth.” “What it’s not!” (It is not all the data, with daily updates and online storage.)

• Change in culture – “Let’s make better decisions based on objective analysis of data.”

• Set realistic expectations - No silver bullet. It can help you make better decisions, but you still have to be responsible for implementing those decisions.

• Focus on institutional goals – “What is it we need to achieve? What metrics do we need to evaluate our progress in attaining goals?”

• Importance of business sponsors – Make timely business decisions and support IT requests for additional funds.

Page 14: Data Administration Data Warehouse Implementation 5/26/04

Course Management (I.V.C.)

Business Functions and Goals Optimize course offerings to meet student need.

Improvement Opportunities Increase number of high demand courses/sections Increase maximum enrollment in sections Eliminate or reduce frequency of low demand courses Improve course meeting patterns and delivery mode Performance Measures

# and % decrease of students who do not get any section of the course requested

# and % decrease of low demand courses # and % increase in enrollment % usage of classroom capacity % decrease in length of time to graduate # and % increase in courses taught through

preferred mode Business Questions What are the characteristics of high/low demand courses? What characteristics of the student are related to demand? What courses can be eliminated? Which courses should/can be moved to smaller/larger facilities? What impact does the meeting time and location have on demand? What improvements can be made with/without additional money?

Data Model Data Mart/ Warehouse (American Management Systems, Inc.)

Courses

Student

Enrollment

Defines

Economic Data

Course Demand

College Budgets

Degree Reqs.

Facilities

Available Faculty

Page 15: Data Administration Data Warehouse Implementation 5/26/04

Enrollment Management (I.V.C.)Business Functions and Goals Increase student enrollment

Improvement Opportunities Increase student retention Increase number of new students Increase number of transfers Performance Measures

# and % increase of returning, new and transfer students by term

# and % increase of applicants to college eligible by KY county

# and % increase of admits to applicants # and % increase of enrollments to admits Ratios and % increase for FT/PT, Res/Non-Resident

Business Questions What are the characteristics of students who do/do not return? What are the characteristics of students who apply, but do not enroll? What are the characteristics of students who do/do not transfer? What policies or practices could affect these student outcomes? What are the external factors affecting our enrollment?

Data Model Data Mart/ Warehouse (American Management Systems, Inc.) Courses

Applicant Student

Enrollment

Defines

Financial Aid KY Population Data

State Aid Programs Data

Economic Data Course Demand

Page 16: Data Administration Data Warehouse Implementation 5/26/04

Data Warehouse Subject Areas To be added: Hospital data Note: Asset Management includes facilities, network and properties management.

Staff Applicant

s

Faculty Applicants

Assets (Tangible)

Vendors

Research

Faculty

Courses

Room

Scheduling

Students

Alumni

Staff

Applicants

Prospects

Department/ College

Donors

Accounts (Funding)

Grants

Page 17: Data Administration Data Warehouse Implementation 5/26/04

Course Management DM – Subject Areas

• Accounts (dollars for faculty and supplies)• Assets (facilities and equipment)• Benchmark Data (what and when courses offered,

degree requirements)• Courses (meeting pattern, teaching mode, requirements)• Department/College (who owns the course, degree

offered and requirements)• Economic Data (salary and demand by profession) • Faculty (distribution of effort, availability to teach)• Faculty Applicants (who and how many)• Population Data (who, where and what volume)• Staff (instructional support)• Students (classification, course demand)

Page 18: Data Administration Data Warehouse Implementation 5/26/04

Enrollment Management DM – Subject Areas

• Accounts (dollars for services, financial aid)• Assets (facilities and equipment, housing)• Benchmark Data (enrollment patterns)• Courses (meeting pattern, teaching mode, requirements,

professor)• Department/College (who owns the course, degree

requirements, degrees offered)• Economic Data (general condition of economy, hot professions) • Faculty (quality, student evaluations)• Population Data (who, where and what volume)• Staff (instructional support, student service area)• Students (demographics, course demand, GPA)• Student Prospects (demographics, contacts)• Student Applicants (demographics, contacts)

Page 19: Data Administration Data Warehouse Implementation 5/26/04

Project Schedule

Phase I, 2001-02 Infrastructure & PlanningPhase II, 2002-03 Creation of Data Mart

Phase III, 2003-04 Creation of Additional Data Marts

Page 20: Data Administration Data Warehouse Implementation 5/26/04

Phase I – Infrastructure and Planning (2001-

2002) 1) IDMS Data Dump to Oracle – 100% complete.

2) WebFOCUS Implementation – Completion 12/02.

3) Data Mining Tools for IR staff – Purchased in 2002. Implementation in progress. Target installation completion 3/03.

4) Create Views for “Data Dump” (Ad Hoc Reporting Repository) - Target completion 4Q 02-03.

5) Establish Enterprise Standards for Key Data – Analysis and recommendations are ongoing.

6) Identify and Prioritize Data Mart Development – Course Management Data Mart top priority for Data Stewards.

Page 21: Data Administration Data Warehouse Implementation 5/26/04

Phase I – Infrastructure and Planning (2001-

2002) (Continued) 7) GASB – Phase I completed 8/02.8) CPE – Six years of data loaded in

Oracle. 9) Review Desktop Reporting Tools –

Ongoing review and testing of: • Brio• Crystal Reports• SAS• WebFOCUS

Page 22: Data Administration Data Warehouse Implementation 5/26/04

Phase II – Creation of Data Mart (2002-2003)

1) Select and Purchase ETL Tools – Choices:• Ascential

• IBM

• Informatica

• Oracle

• SAS

2) Course Management DM – Requirements phase. Target completion 4/03.

3) Phase II of GASB. Target completion 7/03.

Page 23: Data Administration Data Warehouse Implementation 5/26/04

Phase III – Creation of Additional Data Marts

(2003-2004) 1) Create Metadata – Dependent on ETL implementation.

2) Data Marts – Complete Course Management DM by 7/1/04, and complete requirements for Resource Management DM. (Others to consider: Enrollment Mgt., Department Mgt., Diversity & Equity Initiatives, Grant Management, and Research Management.).

3) Evaluate External Data Needs - Identify external data needed for the Data Marts, locate source(s), estimate cost and request funds for 04-05.

Page 24: Data Administration Data Warehouse Implementation 5/26/04

Phase ??

1) Create Remaining Data Marts.

2) Develop OLAP applications.

3) Develop EIS.

4) Purchase and Load External Data.

Page 25: Data Administration Data Warehouse Implementation 5/26/04

New DWE Resources• The existing server will need to be upgraded, and

additional servers will need to be purchased for the Ad Hoc Reporting Repository, the Data Staging Area, the Data Mining and/or OLAP tools, and for one or more of the Data Marts. Size, type and cost will vary depending on function.

• Storage – Additional disk space for DW server, and for other servers using Enterprise Storage System. Exact amounts for online storage and for archiving will be identified during the project requirements phase.

Page 26: Data Administration Data Warehouse Implementation 5/26/04

New Resources (Cont.)• Training –

Infrastructure – “How do you design a DW environment?Staff training on DWOracle (or other) DB training

Tools – Analysis and reporting tools. Onsite visits to other universities.

• Consulting on Infrastructure Design – Covansys, Gavroshe, Oracle, IBM, etc.

• Staff – Additional staff required for database administration, institutional reporting, EIS development, ETL use, data modeling, etc.

Page 27: Data Administration Data Warehouse Implementation 5/26/04

DWE Critical Issues• Personnel Resources – Need to keep people focused on DW in light

of competing projects. (Given the current budget situation, this will be difficult.)

• Training – IT and users.

• Consulting – Assistance for infrastructure planning.

• Culture Change – Focus on goals rather than data elements. Tell us what, not how! Value in making decision based on data. Proof of concept.

• Requirements Gathering – Needs to be thorough, and heavily relies on timely user decisions. Defining who the “users” are on any given project is critical.

• Budget/Funding – For additional hardware, software and people.• Business Sponsor – The Data Warehouse is not another IT

project. It is an enterprise initiative!

Page 28: Data Administration Data Warehouse Implementation 5/26/04

Data Administration

QUESTIONS?