what is a data warehouse? and why are so many schools setting them up? richard goerwitz

37
What is a Data What is a Data Warehouse? Warehouse? And Why Are So Many And Why Are So Many Schools Setting Them Schools Setting Them Up? Up? Richard Goerwitz Richard Goerwitz

Upload: beatrix-craig

Post on 23-Dec-2015

218 views

Category:

Documents


0 download

TRANSCRIPT

What is a Data Warehouse?What is a Data Warehouse?

And Why Are So Many And Why Are So Many Schools Setting Them Up?Schools Setting Them Up?

Richard GoerwitzRichard Goerwitz

What Is a Data Warehouse?What Is a Data Warehouse?

Nobody can agreeNobody can agree So I’m not actually going to define a DWSo I’m not actually going to define a DW Don’t feel cheated, thoughDon’t feel cheated, though By the end of this talk, you’llBy the end of this talk, you’ll

• Understand key concepts that underlie all Understand key concepts that underlie all warehouse implementations (“talk the talk”)warehouse implementations (“talk the talk”)

• Understand the various components out of Understand the various components out of which DW architects construct real-world data which DW architects construct real-world data warehouseswarehouses

• Understand what a data warehouse project Understand what a data warehouse project looks likelooks like

Why Are Schools Setting UpWhy Are Schools Setting UpData Warehouses?Data Warehouses?

A data warehouse makes it easier to:A data warehouse makes it easier to:• Optimize classroom, computer lab usageOptimize classroom, computer lab usage• Refine admissions ratings systemsRefine admissions ratings systems• Forecast future demand for courses, majorsForecast future demand for courses, majors• Tie private spreadsheet data into central repositoriesTie private spreadsheet data into central repositories• Correlate admissions and IR data with outcomes such as:Correlate admissions and IR data with outcomes such as:

GPAsGPAs Placement ratesPlacement rates Happiness, as measured by alumni surveysHappiness, as measured by alumni surveys

• Notify advisors when extra help may be needed based onNotify advisors when extra help may be needed based on Admissions data (student vitals; SAT, etc.)Admissions data (student vitals; SAT, etc.) Special events: A-student suddenly gets a C in his/her majorSpecial events: A-student suddenly gets a C in his/her major Slower trends: Student’s GPA falls for > 2 semesters/termsSlower trends: Student’s GPA falls for > 2 semesters/terms

• (Many other examples could be given!)(Many other examples could be given!) Better information = better decisionsBetter information = better decisions

• Better admission decisionsBetter admission decisions• Better retention ratesBetter retention rates• More effective fund raising, etc.More effective fund raising, etc.

Talking The TalkTalking The Talk To think and communicate usefully about data warehouses To think and communicate usefully about data warehouses

you’ll need to understand a set of common terms and you’ll need to understand a set of common terms and concepts:concepts:• OLTPOLTP• ODSODS• OLAP, ROLAP, MOLAPOLAP, ROLAP, MOLAP• ETLETL• Star schemaStar schema• Conformed dimensionConformed dimension• Data martData mart• CubeCube• MetadataMetadata

Even if you’re not an IT person, pay heed:Even if you’re not an IT person, pay heed:• You’ll have to communicate with IT peopleYou’ll have to communicate with IT people• More importantly:More importantly:

Evidence shows that IT will only build a successful warehouse if Evidence shows that IT will only build a successful warehouse if youyou are intimately involved!are intimately involved!

OLTPOLTP

OLTP = OLTP = online transaction processingonline transaction processing The process of moving data around to The process of moving data around to

handle day-to-day affairshandle day-to-day affairs• Scheduling classesScheduling classes• Registering studentsRegistering students• Tracking benefitsTracking benefits• Recording payments, etc.Recording payments, etc.

Systems supporting this kind of activity Systems supporting this kind of activity are called are called transactional systemstransactional systems

Transactional SystemsTransactional Systems Transactional systems are optimized primarily for Transactional systems are optimized primarily for

the the here and nowhere and now• Can support many simultaneous usersCan support many simultaneous users• Can support heavy read/write accessCan support heavy read/write access• Allow for constant changeAllow for constant change• Are big, ugly, and often don’t give people the data they Are big, ugly, and often don’t give people the data they

wantwant As a result a lot of data ends up in shadow databasesAs a result a lot of data ends up in shadow databases Some ends up locked away in private spreadsheetsSome ends up locked away in private spreadsheets

Transactional systems don’t record all previous Transactional systems don’t record all previous data statesdata states

Lots of data gets thrown away or archived, e.g.:Lots of data gets thrown away or archived, e.g.:• Admissions dataAdmissions data• Enrollment dataEnrollment data• Asset tracking data (“How many computers did we Asset tracking data (“How many computers did we

support each year, from 1996 to 2006, and where do we support each year, from 1996 to 2006, and where do we expect to be in 2010?”)expect to be in 2010?”)

Simple Transactional DatabaseSimple Transactional Database

Map of Microsoft Map of Microsoft Windows Update Windows Update Service (WUS) Service (WUS) back-end databaseback-end database• Diagrammed using Diagrammed using

Sybase Sybase PowerDesignerPowerDesigner

Each green box is a Each green box is a database “table”database “table”

Arrows are “joins” or Arrows are “joins” or foreign keysforeign keys

This is This is simplesimple for an for an OLTP back endOLTP back end

More Complex ExampleMore Complex Example Recruitment Plus back-end Recruitment Plus back-end

databasedatabase Used by many admissions Used by many admissions

officesoffices Note again:Note again:

• Green boxes are tablesGreen boxes are tables• Lines are foreign key Lines are foreign key

relationshipsrelationships• Purple boxes are viewsPurple boxes are views

Considerable expertise is Considerable expertise is required to report off this required to report off this database!database!

Imagine what it’s like for Imagine what it’s like for even more complex even more complex systemssystems• ColleagueColleague• SCT Banner (over 4,000 SCT Banner (over 4,000

tables)tables)

The “Reporting Problem”The “Reporting Problem”

Often we require OLTP data as a snapshot, in a Often we require OLTP data as a snapshot, in a spreadsheet or spreadsheet or reportreport

Reports require Reports require queryingquerying back-end OLTP support back-end OLTP support databasesdatabases

But OLTP databases are often very complex, and But OLTP databases are often very complex, and typically typically • Contain many, often obscure, tablesContain many, often obscure, tables• Utilize cryptic, unintuitive field/column namesUtilize cryptic, unintuitive field/column names• Don’t store all necessary historical dataDon’t store all necessary historical data

As a result, reporting becomes a problem –As a result, reporting becomes a problem –• Requires special expertiseRequires special expertise• May require modifications to production OLTP systemsMay require modifications to production OLTP systems• Becomes harder and harder for staff to keep up!Becomes harder and harder for staff to keep up!

WorkaroundsWorkarounds Ways of working around the reporting Ways of working around the reporting

problem include:problem include:1.1. Have OLTP system vendors do the workHave OLTP system vendors do the work

• Provide canned reportsProvide canned reports• Write reporting GUIs for their productsWrite reporting GUIs for their products

2.2. Hire more specialistsHire more specialists• To create simplified To create simplified viewsviews of OLTP data of OLTP data• To write reports, create snapshotsTo write reports, create snapshots

3.3. Periodically copy data from OLTP systems to Periodically copy data from OLTP systems to a place wherea place where

• The data is easier to understandThe data is easier to understand• The data is optimized for reportingThe data is optimized for reporting• Easily pluggable into reporting toolsEasily pluggable into reporting tools

ODSODS

ODS = ODS = operational data storeoperational data store ODSs were an early workaround to the “reporting ODSs were an early workaround to the “reporting

problem”problem” To create an ODS youTo create an ODS you

• Build a separate/simplified version of an OLTP systemBuild a separate/simplified version of an OLTP system• Periodically copy data into it from the live OLTP systemPeriodically copy data into it from the live OLTP system• Hook it to operational reporting toolsHook it to operational reporting tools

An ODS can be an integration point or real-time An ODS can be an integration point or real-time “reporting database” for an operational system“reporting database” for an operational system

It’s not enough for full enterprise-level, cross-It’s not enough for full enterprise-level, cross-database database analytical processinganalytical processing

OLAPOLAP

OLAP = OLAP = online analytical processingonline analytical processing OLAP is the process of creating and OLAP is the process of creating and

summarizing historical, multidimensional summarizing historical, multidimensional datadata• To help users understand the data betterTo help users understand the data better• Provide a basis for informed decisionsProvide a basis for informed decisions• Allow users to manipulate and explore data Allow users to manipulate and explore data

themselves, easily and intuitivelythemselves, easily and intuitively More than just “reporting”More than just “reporting” Reporting is just one (static) product of Reporting is just one (static) product of

OLAPOLAP

OLAP Support DatabasesOLAP Support Databases

OLAP systems require support databasesOLAP systems require support databases These databases typicallyThese databases typically

• Support fewer simultaneous users than Support fewer simultaneous users than OLOLTTPP back endsback ends

• Are structured simply; i.e., Are structured simply; i.e., denormalizeddenormalized• Can grow largeCan grow large

Hold snapshots of data in OLTP systemsHold snapshots of data in OLTP systems Provide history/time depth to our analysesProvide history/time depth to our analyses

• Are optimized for read (not write) accessAre optimized for read (not write) access• Updated via periodic batch (e.g., nightly) Updated via periodic batch (e.g., nightly) ETLETL

processesprocesses

ETL ProcessesETL Processes ETL = extract, transform, loadETL = extract, transform, load

• ExtractExtract data from various sources data from various sources• TransformTransform and clean the data from those sources and clean the data from those sources• LoadLoad the data into databases used for analysis and the data into databases used for analysis and

reportingreporting ETL processes are coded in various waysETL processes are coded in various ways

• By hand in SQL, UniBASIC, etc.By hand in SQL, UniBASIC, etc.• Using more general programming languagesUsing more general programming languages• In semi-automated fashion using specialized ETL tools In semi-automated fashion using specialized ETL tools

like Cognos Decision Streamlike Cognos Decision Stream Most institutions do hand ETL; but note well:Most institutions do hand ETL; but note well:

• Hand ETL is slowHand ETL is slow• Requires specialized knowledgeRequires specialized knowledge• Becomes extremely difficult to maintain as code Becomes extremely difficult to maintain as code

accumulates and databases/personnel change!accumulates and databases/personnel change!

Where Does the Data Go?Where Does the Data Go?

What sort of a database do the ETL What sort of a database do the ETL processes dump data into?processes dump data into?

Typically, into very simple table Typically, into very simple table structuresstructures

These table structures are:These table structures are:• DenormalizedDenormalized• Minimally branched/hierarchizedMinimally branched/hierarchized• Structured into Structured into star schemasstar schemas

So What Are Star Schemas?So What Are Star Schemas?

Star schemas are collections of data arranged Star schemas are collections of data arranged into star-like patternsinto star-like patterns• They have They have fact tablesfact tables in the middle, which contain in the middle, which contain

amounts, measures (like counts, dollar amounts, GPAs)amounts, measures (like counts, dollar amounts, GPAs)• DimensionDimension tables around the outside, which contain tables around the outside, which contain

labels and classifications (like names, geocodes, majors)labels and classifications (like names, geocodes, majors)• For faster processing, For faster processing, aggregate fact tablesaggregate fact tables are are

sometimes also used (e.g., counts pre-averaged for an sometimes also used (e.g., counts pre-averaged for an entire term)entire term)

Star schemas shouldStar schemas should• Have descriptive column/field labelsHave descriptive column/field labels• Be easy for users to understandBe easy for users to understand• Perform well on queriesPerform well on queries

A Very Simple Star SchemaA Very Simple Star Schema

Data Center UPSData Center UPSPower OutputPower Output

Dimensions:Dimensions:PhasePhaseTimeTimeDateDate

Facts:Facts:VoltsVoltsAmpsAmpsEtc.Etc.

A More Complex Star SchemaA More Complex Star Schema Freshman survey Freshman survey

data (HERI/CIRP)data (HERI/CIRP)

Dimensions:Dimensions:• QuestionsQuestions• Survey yearsSurvey years• Data about test Data about test

takerstakers Facts:Facts:

• Answer (text)Answer (text)• Answer (raw)Answer (raw)• Count (1)Count (1)

OopsOops• Not a starNot a star• Snowflaked!Snowflaked!Oops, answers should have been placed in their

own dimension (creating a “factless fact table”). I’ll demo a better version of this star later!

Data MartsData Marts One definition:One definition:

• One or more star schemas that present data on a single One or more star schemas that present data on a single or related set of business processesor related set of business processes

Data marts should Data marts should notnot be built in isolation be built in isolation They need to be connected via dimensional They need to be connected via dimensional

tables that aretables that are• The same or subsets of each otherThe same or subsets of each other• Hierarchized the same way internallyHierarchized the same way internally

So, e.g., if I construct data marts for…So, e.g., if I construct data marts for…• GPA trends, student major trends, enrollmentsGPA trends, student major trends, enrollments• Freshman survey data, senior survey data, etc.Freshman survey data, senior survey data, etc.

……I connect these marts via a conformed I connect these marts via a conformed studentstudent dimensiondimension• Makes correlation of data across star schemas intuitiveMakes correlation of data across star schemas intuitive• Makes it easier for OLAP tools to use the dataMakes it easier for OLAP tools to use the data• Allows nonspecialists to do much of the workAllows nonspecialists to do much of the work

Simple Data Mart ExampleSimple Data Mart ExampleUPSUPSBattery starBattery star

By batteryBy batteryRun-timeRun-time% charged% chargedCurrentCurrent

Input starInput starBy phaseBy phase

VoltageVoltageCurrentCurrent

Output starOutput starBy phaseBy phase

VoltageVoltageCurrentCurrent

Sensor starSensor starBy sensorBy sensor

TempTempHumidityHumidity

Note conformed date, time dimensions!

CIRP Star/Data MartCIRP Star/Data Mart CIRP CIRP

Freshman Freshman survey datasurvey data

Corrected Corrected from a from a previous previous slideslide

Note the Note the CirpAnswer CirpAnswer dimensiondimension

Note student Note student dimension dimension (ties in with (ties in with other marts)other marts)

CIRP Mart in Cognos BI 8CIRP Mart in Cognos BI 8

ROLAP, MOLAPROLAP, MOLAP

ROLAP = OLAP via direct relational queryROLAP = OLAP via direct relational query• E.g., against a (materialized) viewE.g., against a (materialized) view• Against star schemas in a warehouseAgainst star schemas in a warehouse

MOLAP = OLAP via multidimensional MOLAP = OLAP via multidimensional database (MDB)database (MDB)• MDB is a special kind of databaseMDB is a special kind of database• Treats data kind of like a big, fast spreadsheetTreats data kind of like a big, fast spreadsheet• MDBs typically draw data in from a data MDBs typically draw data in from a data

warehousewarehouse Built to work best with Built to work best with star schemasstar schemas

The term The term data cubedata cube means different things to means different things to different peopledifferent people

Various definitions:Various definitions:1.1. A star schemaA star schema2.2. Any DB view used for Any DB view used for

reportingreporting3.3. A three-dimensional A three-dimensional

array in a MDBarray in a MDB4.4. Any multidimensional Any multidimensional

MDB array (really a MDB array (really a hyperhypercube)cube)

Which definition do you Which definition do you suppose is technically suppose is technically correct?correct?

Data CubesData Cubes

MetadataMetadata Metadata = Metadata = data about datadata about data In a data warehousing context it can mean many In a data warehousing context it can mean many

thingsthings• Information on data in source OLTP systemsInformation on data in source OLTP systems• Information on ETL jobs and what they do to the dataInformation on ETL jobs and what they do to the data• Information on data in marts/star schemasInformation on data in marts/star schemas• Documentation in OLAP tools on the data they Documentation in OLAP tools on the data they

manipulatemanipulate Many institutions make metadata available via Many institutions make metadata available via

data malls or warehouse portals, e.g.:data malls or warehouse portals, e.g.:• University of New MexicoUniversity of New Mexico• UC DavisUC Davis• Rensselear Polytechnic InstituteRensselear Polytechnic Institute• University of IllinoisUniversity of Illinois

Good ETL tools automate the setup of Good ETL tools automate the setup of malls/portals!malls/portals!

The Data WarehouseThe Data Warehouse OK now we’re experts in terms like OLTP, OLAP, OK now we’re experts in terms like OLTP, OLAP,

star schema, metadata, etc.star schema, metadata, etc. Let’s use some of these terms to describe how a Let’s use some of these terms to describe how a

DW works:DW works:• Provides ample metadata – data about the dataProvides ample metadata – data about the data• Utilizes easy-to-understand column/field namesUtilizes easy-to-understand column/field names• Feeds multidimensional databases (MDBs)Feeds multidimensional databases (MDBs)• Is updated via periodic (mainly nightly) ETL jobsIs updated via periodic (mainly nightly) ETL jobs• Presents data in a simplified, denormalized formPresents data in a simplified, denormalized form• Utilizes star-like fact/dimension table schemasUtilizes star-like fact/dimension table schemas• Encompasses multiple, smaller data “marts”Encompasses multiple, smaller data “marts”• Supports OLAP tools (Access/Excel, Safari, Cognos BI)Supports OLAP tools (Access/Excel, Safari, Cognos BI)• Derives data from (multiple) back-end OLTP systemsDerives data from (multiple) back-end OLTP systems• Houses historical data, and Houses historical data, and cancan grow very big grow very big

A Data Warehouse is Not…A Data Warehouse is Not… Vendor and consultant proclamations Vendor and consultant proclamations

aside, a data warehouse is not:aside, a data warehouse is not:• A projectA project

With a specific end dateWith a specific end date• A product you buy from a vendorA product you buy from a vendor

Like an ODS (such as SCT’s)Like an ODS (such as SCT’s) A canned “warehouse” supplied by iStrategyA canned “warehouse” supplied by iStrategy Cognos ReportNetCognos ReportNet

• A database schema or instanceA database schema or instance Like OracleLike Oracle SQL ServerSQL Server

• A cut-down version of your live transactional A cut-down version of your live transactional databasedatabase

Kimball & Caserta’s DefinitionKimball & Caserta’s Definition

According to Ralph Kimball and Joe According to Ralph Kimball and Joe Caserta, a data warehouse is:Caserta, a data warehouse is:

A system that extracts, cleans, conforms, and A system that extracts, cleans, conforms, and delivers source data into a delivers source data into a dimensional data dimensional data storestore and then supports and implements and then supports and implements querying and analysis for the purpose of querying and analysis for the purpose of decision making.decision making.

Another def.: The union of all the enterprise’s data martsAnother def.: The union of all the enterprise’s data marts Aside: The Kimball model is not without some critics:Aside: The Kimball model is not without some critics:

• E.g., Bill E.g., Bill InmonInmon

Example Data Warehouse (1)Example Data Warehouse (1)

This one is This one is RPI’sRPI’s

5 parts:5 parts:• SourcesSources• ETL stuffETL stuff• DW properDW proper• Cubes etc.Cubes etc.• OLAP appsOLAP apps

Example Data Warehouse (2)Example Data Warehouse (2)

Caltech’s DWCaltech’s DW Five Parts:Five Parts:

• Source systemsSource systems• ETL processesETL processes• Data martsData marts• FM/metadataFM/metadata• Reporting and Reporting and

analysis toolsanalysis tools• Note: They’re Note: They’re

also customers also customers of Cognos!of Cognos!

So Where is Colorado College?So Where is Colorado College?

Phil Goldstein (Educause Center for Applied Phil Goldstein (Educause Center for Applied Research fellow) identifies the major deployment Research fellow) identifies the major deployment levels:levels:• Level 1: Transactional systems onlyLevel 1: Transactional systems only• Level 2a: ODS or single data mart; no ETLLevel 2a: ODS or single data mart; no ETL• Level 2: ODS or single data mart with ETL toolsLevel 2: ODS or single data mart with ETL tools• Level 3a: Warehouse or multiple marts; no ETL; OLAPLevel 3a: Warehouse or multiple marts; no ETL; OLAP• Level 3b: Warehouse or multiple marts; ETL; OLAPLevel 3b: Warehouse or multiple marts; ETL; OLAP• Level 3: Enterprise-wide warehouse or multiple marts; Level 3: Enterprise-wide warehouse or multiple marts;

ETL tools; OLAP toolsETL tools; OLAP tools Goldstein’s study was just released in late 2005Goldstein’s study was just released in late 2005 It’s very good; based on real survey dataIt’s very good; based on real survey data Which level is Colorado College at?Which level is Colorado College at?

Implementing a Data WarehouseImplementing a Data Warehouse In many organizations IT people want to huddle and work In many organizations IT people want to huddle and work

out a warehousing plan, but in factout a warehousing plan, but in fact• The purpose of a DW is decision supportThe purpose of a DW is decision support• The primary audience of a DW is therefore College decision The primary audience of a DW is therefore College decision

makersmakers• It is College decision makers therefore who must determineIt is College decision makers therefore who must determine

ScopeScope PriorityPriority ResourcesResources

Decision makers can’t make these determinations without Decision makers can’t make these determinations without an understanding of data warehousesan understanding of data warehouses

It is therefore imperative that key decision makers first be It is therefore imperative that key decision makers first be educated about data warehouseseducated about data warehouses• Once this occurs, it is possible toOnce this occurs, it is possible to

Elicit requirements (a critical step that’s often skipped)Elicit requirements (a critical step that’s often skipped) Determine priorities/scopeDetermine priorities/scope Formulate a budgetFormulate a budget Create a plan and timeline, with real milestones and deliverables!Create a plan and timeline, with real milestones and deliverables!

Is This Really a Good Plan?Is This Really a Good Plan? Sure, according to Phil Goldstein (Educause Center for Sure, according to Phil Goldstein (Educause Center for

Applied Research)Applied Research) He’s conducted extensive surveys on “academic analytics” He’s conducted extensive surveys on “academic analytics”

(= business intelligence for higher ed)(= business intelligence for higher ed) His four recommendations for improving analytics:His four recommendations for improving analytics:

1.1. Key decisionmakers must Key decisionmakers must leadlead the way the way2.2. Technologists must Technologists must collaboratecollaborate

• Must collect requirementsMust collect requirements• Must form strong partnerships with functional sponsorsMust form strong partnerships with functional sponsors

3.3. IT must IT must buildbuild the needed infrastructure the needed infrastructure• Carleton violated this rule with Cognos BICarleton violated this rule with Cognos BI• As we discovered, without an ETL/warehouse infrastructure, As we discovered, without an ETL/warehouse infrastructure,

success with OLAP is elusivesuccess with OLAP is elusive4.4. Staff must Staff must traintrain and develop deep analysis skills and develop deep analysis skills

Goldstein’s findings mirror closely the advice of industry Goldstein’s findings mirror closely the advice of industry heavyweights – Ralph Kimball, Laura Reeves, Margie Ross, heavyweights – Ralph Kimball, Laura Reeves, Margie Ross, Warren Thornthwaite, etc.Warren Thornthwaite, etc.

Isn’t a DW a Huge Undertaking?Isn’t a DW a Huge Undertaking?

Sure, it can be hugeSure, it can be huge Don’t hold on too tightly to the big-Don’t hold on too tightly to the big-

sounding word, “warehouse”sounding word, “warehouse” Luminaries like Ralph Kimball have shown Luminaries like Ralph Kimball have shown

that a data warehouse can be built that a data warehouse can be built incrementallyincrementally• Can start with just a few data martsCan start with just a few data marts• Targeted consulting help will ensure proper, Targeted consulting help will ensure proper,

extensible architecture and tool selectionextensible architecture and tool selection

What Takes Up the Most Time?What Takes Up the Most Time?

You may be surprised You may be surprised to learn what DW step to learn what DW step takes the most timetakes the most time

Try guessing which:Try guessing which:• HardwareHardware• Physical database setupPhysical database setup• Database designDatabase design• ETLETL• OLAP setupOLAP setup

Acc. to Kimball & Caserta, ETL will eat up 70% of the time.Other analysts give estimates ranging from 50% to 80%.

The most often underestimated part of the warehouse project!

0

10

20

30

40

50

60

70

80

90

1st Qtr 2nd Qtr 3rd Qtr 4th Qtr

East

West

North

Hardware

Database

ETL

Schemas

OLAP tools

Eight Month Initial DeploymentEight Month Initial DeploymentStepStep DurationDuration

Secure, configure networkSecure, configure network 1 day1 day

Deploy physical “target” DBDeploy physical “target” DB 4 days4 days

Learn/deploy ETL toolLearn/deploy ETL tool 28 days28 days

Choose/set up modeling toolChoose/set up modeling tool 21 days21 days

Design initial data martDesign initial data mart 7 days7 days

Design ETL processesDesign ETL processes 28 days28 days

Hook up OLAP toolsHook up OLAP tools 7 days7 days

Publicize, train, trainPublicize, train, train 21 days21 days

StepStep DurationDuration

Begin educating decision makersBegin educating decision makers 21 days21 days

Collect requirementsCollect requirements 14 days14 days

Decide general DW designDecide general DW design 7 days7 days

Determine budgetDetermine budget 3 days3 days

Identify project rolesIdentify project roles 1 day1 day

Eval/choose ETL toolEval/choose ETL tool 21 days21 days

Eval/choose physical DBEval/choose physical DB 14 days14 days

Spec/order, configure serverSpec/order, configure server 20 days20 days

ConclusionConclusion Information is held in transactional systemsInformation is held in transactional systems

• But transactional systems are complexBut transactional systems are complex• They don’t talk to each other well; each is a siloThey don’t talk to each other well; each is a silo• They require specially trained people to report off ofThey require specially trained people to report off of

For normal people to explore institutional data, data in For normal people to explore institutional data, data in transactional systems needs to betransactional systems needs to be• Renormalized as star schemasRenormalized as star schemas• Moved to a system optimized for analysisMoved to a system optimized for analysis• Merged into a unified whole in a Merged into a unified whole in a data warehousedata warehouse

Note: This process must be led by “customers”Note: This process must be led by “customers”• Yes, IT people must build the infrastructureYes, IT people must build the infrastructure• But IT people aren’t the main customersBut IT people aren’t the main customers

So who are the customers?So who are the customers?• Admissions officers trying to make good admission decisionsAdmissions officers trying to make good admission decisions• Student counselors trying to find/help students at riskStudent counselors trying to find/help students at risk• Development offers raising funds that support the CollegeDevelopment offers raising funds that support the College• Alumni affairs people trying to manage volunteersAlumni affairs people trying to manage volunteers• Faculty deans trying to right-size departmentsFaculty deans trying to right-size departments• IT people managing software/hardware assets, etc….IT people managing software/hardware assets, etc….