building data marts – a sprint not a marathon (forward intelligence) v5


Upload: david-waters

Post on 18-Feb-2017


Page 1: Building Data Marts – a Sprint Not A Marathon (Forward Intelligence) v5

A Sprint Not A Marathon

Building Data Marts

Page 2: A Sprint Not A Marathon

•  Iterative Data Warehousing
•  Lightweight Data Warehousing
•  Agile Data Warehousing

Page 3: A Sprint Not A Marathon

•  In 1986 Dr Fred P Brooks Jr published a journal paper titled “No Silver Bullet”
•  Two promising techniques that Brooks identified are “Rapid Prototyping” and “Software Requirements Refinement”
•  Since the late 1990s rapid prototyping and requirements refinement have been two of the major innovations of the agile software development methodologies

Page 4: A Sprint Not A Marathon

•  Ralph Kimball’s dimensional modelling design patterns
•  Ralph Kimball’s ETL design patterns
•  SAS BI Enterprise Toolset

Page 5: A Sprint Not A Marathon

•  Lean Software Development
   •  Eliminate waste
   •  Amplify learning
   •  Decide as late as possible
   •  Deliver as fast as possible
   •  Empower the team
   •  Build integrity in
   •  See the whole
•  Mary Poppendieck’s work based on Toyota’s lean manufacturing


Page 7: A Sprint Not A Marathon

•  Manifesto for Agile Software Development
   •  Individuals and interactions over processes and tools
   •  Working software over comprehensive documentation
   •  Customer collaboration over contract negotiation
   •  Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.

http://agilemanifesto.org/

Page 8: A Sprint Not A Marathon

•  What it is Not
   •  Lack of processes and tools
   •  Lack of documentation
   •  Lack of contract negotiation
   •  Lack of planning


Page 10: A Sprint Not A Marathon

WHO?

Page 11: A Sprint Not A Marathon – Who?

•  Griffith University
   •  37,786 Students (Full-time and Part-time)
   •  3,563 Staff (Full-Time Equivalent)
   •  5 Campuses (Nathan, Gold Coast, Mt Gravatt, Logan, South Bank)
   •  8,847 International Students from 119 countries
   •  38 Research Centres
   •  268 Undergraduate Programs
   •  382 Postgraduate Programs
   •  104 Research Programs
   •  SAS Enterprise BI Server 9.1 and SAS Strategy Management

Page 12: A Sprint Not A Marathon – Who?

•  UNSW – University of New South Wales
   •  Formed in 1949
   •  46,302 Students
   •  11,592 International Students
   •  9,408 Degrees and Diplomas Bestowed
   •  8,737 Full-Time Staff
   •  8,645 Part-Time Staff
   •  965 Casual Staff
   •  Member of the Group of Eight (Go8)
   •  Ranked 47 in Times Higher Education-QS World University Rankings
   •  SAS Enterprise BI Server 9.2

Page 13: A Sprint Not A Marathon

WHY?

Page 14: A Sprint Not A Marathon – Why?

•  Software development is expensive
•  Risk of failure is high
•  Industry surveys state that the majority of software projects fail

Page 15: A Sprint Not A Marathon – Why?

•  Software delivery fails to meet schedule
•  Software delivery fails to meet budget
•  Software delivery fails to meet business or user needs

Page 16: A Sprint Not A Marathon – Why?

•  Fundamentally, agile attempts to solve all three of these dilemmas (budget, schedule and meeting user needs) through the use of lightweight iterative processes
•  Agile focuses first on those requirements that are most useful to the business users (the business users choose what gets delivered first)

Page 17: A Sprint Not A Marathon – Why?

•  Anti-Agile
   •  BIG BANG: one massive release after two years’ work
   •  If You Build It They Will Come: let’s build something and see if someone uses it

Page 18: A Sprint Not A Marathon

HOW?

Page 19: A Sprint Not A Marathon – How?

•  People
•  Processes
•  Tools

Page 20: A Sprint Not A Marathon – How?

•  People
   •  Best People You Can Get
   •  Well Trained (on tools, methods and business skills)
   •  Collocated (users, ETL developers, report writers)
   •  Business and Technologists

Page 21: A Sprint Not A Marathon – How?

•  Processes
   •  Design Processes (user involvement)
   •  Short Iterations (two to three months)
   •  Useful Documentation (no useless documentation)
   •  Migration and Version Control, Issue Tracking

Page 22: A Sprint Not A Marathon – How?

•  Tools
   •  SAS Enterprise BI Toolset
   •  Ralph Kimball’s dimensional modelling design patterns
   •  Ralph Kimball’s ETL design patterns
•  Kimball Articles
   1.  The 38 Subsystems of ETL, by Ralph Kimball, December 4, 2004
   2.  Kimball University: The Subsystems of ETL Revisited, by Bob Becker, October 21, 2007 (34 Subsystems)

Page 23: A Sprint Not A Marathon – How?

•  Tools – All of these are provided by SAS Data Integration Studio
   •  01 Data Profiling
   •  02 Change Data Capture
   •  03 Extract System
   •  04 Data Cleansing System
   •  05 Error Event Tracking
   •  06 Audit Dimension Creation
   •  07 Deduplication
   •  08 Data Conformance
   •  09 Slowly Changing Dimension (SCD) Manager
   •  10 Surrogate Key Generator
   •  11 Hierarchy Manager
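To make subsystems 09 and 10 concrete, here is a minimal sketch of a Type 2 slowly changing dimension with its own surrogate key generator. It is written in Python purely for illustration; the deck builds these with SAS Data Integration Studio, and nothing below reflects how that tool implements them. The class names, the `valid_from`/`valid_to` columns, the high-date marker and the sample student data are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import Dict, List, Optional

# Assumed convention: an open-ended valid_to marks the current row.
HIGH_DATE = date(9999, 12, 31)


@dataclass
class DimRow:
    surrogate_key: int          # warehouse-owned key, no business meaning
    natural_key: str            # key from the source system
    attributes: Dict[str, str]  # tracked descriptive attributes
    valid_from: date
    valid_to: date = HIGH_DATE

    @property
    def is_current(self) -> bool:
        return self.valid_to == HIGH_DATE


class Type2Dimension:
    """Hypothetical in-memory dimension table with Type 2 history."""

    def __init__(self) -> None:
        self.rows: List[DimRow] = []
        self._next_key = count(start=1)  # surrogate key generator

    def current(self, natural_key: str) -> Optional[DimRow]:
        for row in self.rows:
            if row.natural_key == natural_key and row.is_current:
                return row
        return None

    def apply(self, natural_key: str, attributes: Dict[str, str],
              load_date: date) -> DimRow:
        existing = self.current(natural_key)
        if existing is not None:
            if existing.attributes == attributes:
                return existing            # nothing changed, keep the row
            existing.valid_to = load_date  # expire the old version
        new_row = DimRow(next(self._next_key), natural_key,
                         dict(attributes), load_date)
        self.rows.append(new_row)
        return new_row


dim = Type2Dimension()
first = dim.apply("S1", {"campus": "Nathan"}, date(2010, 1, 1))
same = dim.apply("S1", {"campus": "Nathan"}, date(2010, 2, 1))
moved = dim.apply("S1", {"campus": "Logan"}, date(2010, 3, 1))
```

After the three loads the dimension holds two rows for student `S1`: the expired Nathan row and a current Logan row with a new surrogate key, so existing facts keep pointing at the version that was true when they were loaded.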

Page 24: A Sprint Not A Marathon – How?

•  Tools – All of these are provided by SAS Data Integration Studio
   •  12 Special Dimensions Manager
   •  13 Fact Table Builders
   •  14 Surrogate Key Pipeline
   •  15 Multi-Valued Bridge Table Builder
   •  16 Late Arriving Data Handler
   •  17 Dimension Manager
   •  18 Fact Table Provider
   •  19 Aggregate Builder
   •  20 OLAP Cube Builder
   •  21 Data Propagation Manager
   •  22 Job Scheduler (LSF in Management Console)
   •  23 Backup System (Metadata Backup Tools Provided)
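As a rough illustration of subsystem 14, the sketch below swaps the natural keys on incoming fact rows for dimension surrogate keys and routes unmatched keys to a placeholder member, which is one common way to cope with late-arriving dimension data (subsystem 16). This is a Python sketch under stated assumptions, not the SAS Data Integration Studio implementation; the function name, the `_sk` column suffix, the `-1` unknown key and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed key of a placeholder "unknown" member in every dimension.
UNKNOWN_KEY = -1

FactRow = Dict[str, object]
Lookup = Dict[str, int]  # natural key -> surrogate key


def surrogate_key_pipeline(
    fact_rows: Iterable[FactRow],
    lookups: Dict[str, Lookup],
) -> Tuple[List[FactRow], List[Tuple[str, object]]]:
    """Replace each natural-key column named in `lookups` with a
    `<column>_sk` surrogate key column. Returns the transformed rows
    and a list of (column, natural_key) pairs that found no match."""
    loaded: List[FactRow] = []
    unmatched: List[Tuple[str, object]] = []
    for row in fact_rows:
        out = dict(row)  # copy so the input rows are left untouched
        for column, lookup in lookups.items():
            natural = out.pop(column)
            key = lookup.get(natural, UNKNOWN_KEY)
            if key == UNKNOWN_KEY:
                unmatched.append((column, natural))
            out[column + "_sk"] = key
        loaded.append(out)
    return loaded, unmatched


lookups = {
    "student_id": {"S1": 1, "S2": 2},
    "program_code": {"P10": 7},
}
facts = [
    {"student_id": "S1", "program_code": "P10", "credit_points": 10},
    {"student_id": "S9", "program_code": "P10", "credit_points": 20},
]
loaded, unmatched = surrogate_key_pipeline(facts, lookups)
```

The second row still loads, with `student_id_sk` set to the unknown member, and `unmatched` records `("student_id", "S9")` so the gap can be reported and corrected once the dimension row arrives.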

Page 25: A Sprint Not A Marathon – How?

•  Tools – All of these are provided by SAS Data Integration Studio
   •  24 Recovery and Restart
   •  25 Version Control
   •  26 Version Migration
   •  27 Workflow Monitor
   •  28 Sorting
   •  29 Lineage and Dependency
   •  30 Problem Escalation
   •  31 Paralleling and Pipelining
   •  32 Security
   •  33 Compliance Manager (auditing of data access)
   •  34 Metadata Repository

Page 26: A Sprint Not A Marathon – How?

•  Supporting Processes
   •  Version Control – Subversion/SVN
   •  Infrastructure Control – ITIL
   •  Testing – Reconciliation, Performance, Regression
   •  Bug Tracking – FlySpray
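The reconciliation testing mentioned above can be automated with simple control totals. The sketch below compares row counts and a summed measure between a source extract and the loaded target, which is one common form of reconciliation check; the deck does not say which checks were actually used. It is a Python illustration only, and the function names, the `amount` column and the sample rows are assumptions.

```python
from decimal import Decimal
from typing import Dict, Iterable, List

Row = Dict[str, object]


def control_totals(rows: Iterable[Row], measure: str) -> Dict[str, Decimal]:
    """Row count and sum of one measure, using Decimal to avoid
    floating-point drift in the comparison."""
    materialised = list(rows)
    total = sum((Decimal(str(r[measure])) for r in materialised), Decimal("0"))
    return {"row_count": Decimal(len(materialised)), "total": total}


def reconcile(source: Iterable[Row], target: Iterable[Row],
              measure: str) -> List[str]:
    """Return a description of every control total that differs;
    an empty list means source and target reconcile."""
    src = control_totals(source, measure)
    tgt = control_totals(target, measure)
    return [
        f"{name}: source={src[name]} target={tgt[name]}"
        for name in src
        if src[name] != tgt[name]
    ]


source_rows = [{"amount": "10.50"}, {"amount": "4.50"}]
clean = reconcile(source_rows, list(source_rows), "amount")
broken = reconcile(source_rows, source_rows[:1], "amount")
```

Run after every load, a check like this doubles as a regression test: `clean` is empty, while `broken` reports both the missing row and the changed total.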

Page 27: A Sprint Not A Marathon – How?

•  What Documentation?
   •  Design Documentation – Office Suite
   •  Handover Documentation – WIKI
   •  Change Management Documentation – WIKI
      •  Technical How To Documents and End User Documentation

Page 28: A Sprint Not A Marathon

AND?

Page 29: A Sprint Not A Marathon – Summary

•  Agile works if
   •  Staff are talented, well trained, well funded and appreciated
   •  Software is well tested and delivered rapidly
   •  Customers are engaged, interactive and regarded as capable
   •  Software can rapidly adapt to change
•  Agile Requires
   •  Good design (i.e. easily extendable)
   •  Version Control
   •  Infrastructure Control
   •  Automagical Testing
   •  Bug Tracking
•  Good Tools
   •  SAS Data Integration Studio
   •  SAS Web Report Studio
   •  SAS has end-to-end metadata
   •  Design Patterns

Page 30: A Sprint Not A Marathon – Summary

•  Agile Data Warehousing: Delivering World-Class Business Intelligence Systems Using Scrum and XP
•  Ralph Hughes (2008)

Page 31: References – Agile

•  Poppendieck, Mary; Poppendieck, Tom. 2003, Lean Software Development: An Agile Toolkit, Addison-Wesley, Boston.
•  Beck, Kent. 1999, Embracing Change with Extreme Programming, IEEE Computer, Vol. 32, no. 10, pp. 70-77.
•  Boehm, Barry. 1985, Spiral Model of Software Development and Enhancement, Proc. Int’l Workshop Software Process and Software Environments; also ACM Software Eng. Notes, Aug. 1986, pp. 22-42.
•  Boehm, Barry; Turner, Richard. 2003, Using Risk to Balance Agile and Plan-Driven Methods, IEEE Computer, Vol. 36, no. 6, pp. 57-66.
•  Brooks, Fred P Jr. 1987, No Silver Bullet: Essence and Accidents of Software Engineering, Proc. IFIP, IEEE CS Press, 1987, pp. 1069-1076; reprinted in Computer, Apr. 1987, pp. 10-19.
•  Cockburn, Alistair. 2004, Crystal Clear: A Human-Powered Methodology for Small Teams, Addison-Wesley Professional.
•  Cockburn, Alistair. 2001, Agile Software Development, Addison-Wesley Professional.
•  Cockburn, Alistair; Highsmith, Jim. Boehm, Barry (Editor) 2001a, Agile Software Development: The People Factor, IEEE Computer, Vol. 34, no. 11, pp. 131-133.
•  Highsmith, Jim; Cockburn, Alistair. 2001, Agile Software Development: The Business of Innovation, IEEE Computer, Vol. 34, no. 9, pp. 120-122, doi:10.1109/2.94710
•  Kent Beck, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler, James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern, Brian Marick, Robert C. Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland, Dave Thomas. 2001, The Agile Manifesto, http://agilemanifesto.org/ Accessed 22 August 2010.
•  Larman, Craig; Basili, Victor R. 2003, Iterative and Incremental Development: A Brief History, IEEE Computer, Vol. 36, no. 6, pp. 47-56.
•  Royce, Dr Winston W. 1970, Managing the Development of Large Software Systems, IEEE WESCON, August 1970, pp. 1-9.
•  Salo, Outi. 2006, Enabling Software Process Improvement in Agile Software Development Teams and Organisations. Ph.D Thesis, Espoo 2006. VTT Publications 618. 149 p. + app. 96 p.

Page 32: References – Design Patterns

•  http://www.ralphkimball.com/
•  The 38 Subsystems of ETL, by Ralph Kimball, December 4, 2004
•  Kimball University: The Subsystems of ETL Revisited, by Bob Becker, October 21, 2007
•  Alexander, Christopher et al (1977). A Pattern Language: Towns, Buildings, Construction. Oxford University Press, USA, 1216. ISBN 0195019199.
•  Gamma, Erich; Richard Helm, Ralph Johnson, and John Vlissides (1995). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley. ISBN 0-201-63361-2.
•  Ralph Kimball, Margy Ross, Warren Thornthwaite, Joy Mundy, Bob Becker (2008). The Data Warehouse Lifecycle Toolkit, 2nd Edition: Practical Techniques for Building Data Warehouse and Business Intelligence Systems. John Wiley & Sons.

Page 33: Questions?

•  Contact Details
   •  David M Waters
   •  Mobile: 0408 074 082
   •  Email: [email protected]
   •  Web: www.forwardintelligence.com.au
   •  LinkedIn: www.linkedin.com/in/davidmwaters