  • 8/11/2019 Week 1 Introduction Presentation


    Effective Data Preparation

    Week 1: Introduction

    Data integration used to be done solely in data warehousing

    operations. Now, predictive analysts must do it themselves,

    to make sure that it is done right!




    Assigned Readings for all students

    Introductions by other students

    Grading Rubric document

    Preface, Introduction, and Chapter 1 in the text

    Assigned Readings for students without SDM

    (STATISTICA Data Miner)

    Procedure for downloading and installing SDM

    Inserting and Connecting Nodes document

    SDM Help topic on the SDM interface

    Chapter 10 in the text




    Getting Acquainted & Oriented

    Welcome to Effective Data Preparation!

    This course will be presented in the form of an

    extended tutorial for preparing and modeling

    the KDD-Cup 1998 data set.

    Each week will have 2-3 assignments to be completed.

    Most assignments will be file submissions.

    Each week will require up to 15 hours of work.

    There will be several quizzes.




    Getting Acquainted & Oriented

    Get acquainted with other students. Learn

    where they work, and what their goals are.

    If you have not done so before, read the

    documents on downloading and installing

    SDM, and the document on connecting SDM nodes.

    Become familiar with the Grading Rubric

    document, so you will know what to expect in

    this class.




    Getting to Know Everyone

    Introduce yourself in the Welcome Messages


    Respond to at least one welcome message by

    a student. This assignment will be graded as a

    pass/fail task.

    You can help each other with the tasks in this

    course. Relationships are important!




    STATISTICA Data Miner

    We will be using the same STATISTICA Data Miner (SDM) Version 12

    software package from StatSoft, which was used in the Intro course.


    If you have not done so before (or have to do it again), read

    the document:


    Download and install Version 12 of SDM

    If you have questions about, or problems with, the download

    and installation, contact [email protected]




    A New Look for STATISTICA

    If you used SDM Version 10 in one of the previous

    Introduction courses, you will notice a big change in

    the modeling canvas.

    Version 10 divided the canvas into 4 parts

    Version 12 has no divisions on the canvas

    This means you can put your modeling icons anywhere you

    want to!

    I am a big fan of this change.




    Become Familiar with SDM

    1. Read the Data Miner Workspace document in the SDM

    Help File

    Click on Help on the top menu

    Click the Index tab, and enter Data Miner Workspace

    Scroll down to the Data Miner Workspace

    document, and read it.

    Read Chapter 10 (pages 214-234) in the text

    about STATISTICA Data Miner.




    Become Familiar with SDM

    2. Read the document: Inserting and

    Connecting Nodes.doc

    3. Become familiar with the SDM Help file.

    This facility is a marvelous way to learn about

    almost anything related to predictive

    analytics in general, and SDM in particular.




    The 10 Steps of Data Preparation

    1. Data access and extraction

    2. Data Integration

    3. Data Cleansing

    4. Data conditioning

    5. Missing value imputation

    6. New Variable derivation

    7. Variable Selection

    8. Algorithm selection

    9. Preliminary model building

    10. Feedback to earlier data preparation activities
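
    As a rough illustration only, steps 3, 5, and 6 above can be sketched in a few lines of Python. This is not how the course performs them (the course uses SDM), and the records, field names, and the 0-120 age range below are hypothetical assumptions, not values from the KDD-Cup 1998 data set.

```python
from statistics import median

# Hypothetical toy records; the field names are illustrative only and are
# not taken from the KDD-Cup 1998 data set.
raw = [
    {"id": 1, "age": 34, "gifts": 3, "total": 45.0},
    {"id": 2, "age": None, "gifts": 0, "total": 0.0},
    {"id": 3, "age": -5, "gifts": 2, "total": 20.0},
    {"id": 4, "age": 61, "gifts": 4, "total": 100.0},
]


def cleanse(rows):
    """Step 3: mark impossible ages as missing (assumed valid range 0-120)."""
    out = []
    for r in rows:
        r = dict(r)  # copy so the raw records stay untouched
        if r["age"] is not None and not (0 <= r["age"] <= 120):
            r["age"] = None
        out.append(r)
    return out


def impute(rows):
    """Step 5: fill missing ages with the median of the observed ages."""
    observed = [r["age"] for r in rows if r["age"] is not None]
    fill = median(observed)
    return [dict(r, age=fill if r["age"] is None else r["age"]) for r in rows]


def derive(rows):
    """Step 6: derive an average gift amount, guarding against zero gifts."""
    return [
        dict(r, avg_gift=r["total"] / r["gifts"] if r["gifts"] else 0.0)
        for r in rows
    ]


prepared = derive(impute(cleanse(raw)))
for row in prepared:
    print(row)
```

    The order matters in this sketch: cleansing runs before imputation so that the invalid age does not distort the median used to fill the gaps.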





    (in a data preparation course?)

    Role of modeling in data preparation.

    Additional tasks?

    An iterative loop!

    Practice in model building will be useful in general.




    Is All of this Stuff Necessary?

    You will be tempted to omit a step

    This is not a good idea.

    You can create a model without all steps, but

    The bottom line is

    The motto of SEAL Team 6 is: The more blood, sweat, and

    tears you put into training, the less you will shed in battle.

    The same is true with data preparation.





    That is all that there is in Week 1.

    There will be more to do in Week 2, and

    particularly in Week 3; the work is not evenly

    distributed across the weeks.

    You may take the Week 1 test now.

    When you finish the test, you can continue to

    Week 2 assignments.

