8/11/2019 Week 1 Introduction Presentation
extension.uci.edu
Effective Data Preparation
Week 1: Introduction
Data integration used to be done solely in data warehousing
operations. Now, predictive analysts must do it themselves,
to make sure that it is done right!
-
Introduction
Assigned Readings for all students
Introductions by other students
Grading Rubric document
Preface, Introduction, and Chapter 1 in the text
Assigned Readings for students without SDM
(STATISTICA Data Miner)
Procedure for downloading and installing SDM
Inserting and Connecting Nodes document
SDM Help topic on the SDM interface
Chapter 10 in the text
-
Getting Acquainted & Oriented
Welcome to Effective Data Preparation!
This course will be presented in the form of an
extended tutorial for preparing and modeling
the KDD-Cup 1998 data set.
Each week will have 2-3 assignments to be completed.
Most assignments will be file submissions.
Each week may require up to 15 hours of work.
There will be several quizzes.
-
Getting Acquainted & Oriented
Get acquainted with other students. Learn
where they work, and what their goals are.
If you have not done so before, read the
documents on downloading and installing
SDM, and the document on inserting and
connecting nodes in SDM.
Become familiar with the Grading Rubric
document, so you will know what to expect in
this class.
-
Getting to Know Everyone
Introduce yourself in the Welcome Messages
Forum
Respond to at least one welcome message by
a student. This assignment will be graded as a
pass/fail task.
You can help each other with the tasks in this
course. Relationships are important!
-
STATISTICA Data Miner
We will be using the same STATISTICA Data Miner (SDM) Version 12
software package from StatSoft, which was used in the Intro
course.
If you have not done so before (or have to do it again), read
the document:
PROCEDURE_for_DOWNLOADING_STATISTICA_Data_Miner_software.doc
Download and install Version 12 of SDM
If you have questions about or problems with the download
and installation, contact [email protected]
-
A New Look for STATISTICA
If you used SDM Version 10 in one of the previous
Introduction courses, you will notice a big change in
the modeling canvas
Version 10 divided the canvas into 4 parts
Version 12 has no divisions on the canvas
This means you can put your modeling icons anywhere you
want to!
I am a big fan of this change.
-
Become Familiar with SDM
1. Read the Data Miner Workspace document in the SDM
Help File
Click on Help on the top menu
Click the Index tab, and enter Data Miner Workspace
Scroll down to the Data Miner Workspace
document, and read it.
Read Chapter 10 (pages 214-234) in the text
about STATISTICA Data Miner.
-
Become Familiar with SDM
2. Read the document: Inserting and
Connecting Nodes.doc
3. Become familiar with the SDM Help file.
This facility is a marvelous way to learn about
almost anything related to predictive
analytics in general, and SDM in particular.
-
The 10 Steps of Data Preparation
1. Data access and extraction
2. Data Integration
3. Data Cleansing
4. Data conditioning
5. Missing value imputation
6. New Variable derivation
7. Variable Selection
8. Algorithm selection
9. Preliminary model building
10. Feedback to earlier data preparation activities
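A few of the steps above (integration, cleansing, missing value imputation, and new variable derivation) can be sketched in a few lines of code. This is only an illustration in plain Python, not the SDM workflow used in this course. The records, field names, and helper functions are all invented for the example, and median imputation is just one simple choice among many.

```python
from statistics import median

# Hypothetical toy records standing in for two source tables.
donors = [
    {"id": 1, "age": 34, "state": "CA"},
    {"id": 2, "age": None, "state": "ca "},  # missing age, messy code
    {"id": 3, "age": 51, "state": "NY"},
    {"id": 4, "age": -5, "state": "TX"},     # impossible age
]
gifts = [
    {"id": 1, "n_gifts": 4, "total": 100.0},
    {"id": 2, "n_gifts": 2, "total": 30.0},
    {"id": 3, "n_gifts": 0, "total": 0.0},
    {"id": 4, "n_gifts": 5, "total": 250.0},
]


def integrate(left, right, key):
    """Step 2: inner-join two record lists on a shared key."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def cleanse(rows):
    """Step 3: normalize state codes and blank out impossible ages."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the input records are not modified
        row["state"] = row["state"].strip().upper()
        if row["age"] is not None and not (0 <= row["age"] <= 120):
            row["age"] = None
        cleaned.append(row)
    return cleaned


def impute_median(rows, field):
    """Step 5: fill missing values with the median of the observed ones."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def derive(rows):
    """Step 6: add average gift size, guarding against division by zero."""
    return [
        {**r, "avg_gift": r["total"] / r["n_gifts"] if r["n_gifts"] else 0.0}
        for r in rows
    ]


prepared = derive(impute_median(cleanse(integrate(donors, gifts, "id")), "age"))
for row in prepared:
    print(row)
```

Note that the order matters: the impossible age is blanked out during cleansing so that it does not distort the median used for imputation.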
-
Modeling (in a data preparation course?)
Role of modeling in data preparation.
Additional tasks?
An iterative loop!
Practice in model building will be useful in general.
-
Is All of this Stuff Necessary?
You will be tempted to omit a step
This is not a good idea
You can create a model without all steps, but
The bottom line is
The motto of SEAL Team 6 is: The more blood, sweat and
tears you put into training, the less you will shed in battle.
The same is true with data preparation.
-
Finished!
That is all that there is in Week 1.
There will be more to do in Week 2, and
particularly in Week 3; the work is not evenly
distributed across the weeks.
You may take the Week 1 test now.
When you finish the test, you can continue to
Week 2 assignments.