Page 1: Chapter 13 Data Mining - 2 -

Chapter 13 Data Mining


- 2 - (c) 2000 Dr. Ralph Bergmann and Prof. Dr. Michael M. Richter, Universität Kaiserslautern

Recommended References
• This lecture assumes some knowledge of learning systems. We recommend:
– P. Langley: Elements of Machine Learning. Morgan Kaufmann 1996.

– T.M. Mitchell: Machine Learning. McGraw Hill 1997.

– R. Bergmann: Slides on “Lernende Systeme” (Learning Systems); also: M.M. Richter: Lernende Systeme, lecture notes, Kaiserslautern.

– R. Bergmann, S. Stahl: Similarity Measures for Object-Oriented Case Representations. Proceedings of the European Workshop on Case-Based Reasoning, 1998.

• Data Mining references:
– P. Adriaans, D. Zantinge: Data Mining. Addison Wesley 1996.

– Th. Reinartz: Focusing Solutions for Data Mining. Springer Lecture Notes in AI 1623, 1998.

– S.M. Weiss, N. Indurkhya: Predictive Data Mining. Morgan Kaufmann 1997.


Data Mining, Learning and Performance (1)

• The ultimate goal is to achieve optimal performance of some process P.

• Its meaning is given by the user's utility.

• Achieving optimal performance requires certain knowledge. This knowledge may be implicit in the available data and has to be made usable, i.e. it has to be learned.

• For learning one needs to know:
– What precisely are the goals?
– How is the achievement of the goals measured?
– How should one react if the goals are not achieved?


Data Mining, Learning and Performance (2)

The performance of the process P is tested in experiments, which generate certain data D. These data are the input to some evaluation function F.

[Figure: the user's view on the performance of P compared with the formal evaluation function F for P]

• The user's view of the performance and the result of the evaluation should coincide.

• Often this coincidence can only be approximated.


Data Mining, Learning and Performance (3)

[Figure: process P and knowledge K → experiment → generated data D → evaluation result → data mining (analyze data and evaluation result) → analysis result → improved process P’ and knowledge K’]




KDD: Knowledge Discovery in Data Bases

• Knowledge Discovery in Data Bases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad).

• Data Mining is often used as a synonym for KDD but is sometimes restricted to a crucial step in KDD:

• the step of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data.


KDD Phases









Requirement Analysis for KDD Processes



[Figure: application requirements and system context characteristics]


Data Mining and the Pre-Sales Process
• The purpose of data mining for the pre-sales process is to obtain knowledge that allows the supplier to attract more customers from the intended target groups.

• The knowledge obtained can concern
– the market in general
– the market with respect to certain products
– the behavior of certain customer classes: Marketing Campaign Management (how do customers react to marketing actions?) and Basket Analysis (what do customers typically buy?)
– individual customers and their behavior

• The general strategy of a company for data mining is given by the strategic model, which in turn is influenced by feedback from the results obtained.
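A simple form of basket analysis counts which products are bought together. The following is a minimal Python sketch, assuming each basket is a list of item names; it uses pair co-occurrence counting (the first step of association-rule mining such as Apriori, not a method prescribed in these slides), and the function name and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support of all baskets.

    baskets: list of baskets, each a list of purchased item names
    min_support: minimum fraction of baskets (0..1) that must contain the pair
    """
    pair_counts = Counter()
    for basket in baskets:
        # Count each unordered pair at most once per basket, ignoring quantities.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    return {pair: count / n for pair, count in pair_counts.items()
            if count / n >= min_support}


# Hypothetical baskets.
baskets = [["bread", "milk"], ["bread", "milk", "butter"],
           ["beer", "bread"], ["milk", "bread"]]
print(frequent_pairs(baskets, 0.5))  # {('bread', 'milk'): 0.75}
```

The support threshold decides how "typical" a combination must be; lowering it returns more, but rarer, pairs.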


Data Mining and the Sales Process

• The purpose of data mining for the sales process is to obtain knowledge that allows the supplier to improve the quality of its processes so that customers who have contacted the supplier
– are guided efficiently through the sales process
– decide in favor of a purchase

• This includes
– offering the products appropriately
– offering adequate alternatives
– guiding the customer efficiently through the dialogue

• This influences the diagnostic model and the action model.


Data Mining in the After Sales Process

• The purpose of data mining for the after-sales process is to obtain knowledge that allows customer questions and complaints to be handled more efficiently.

• The goals are
– improve recognition of the reasons for calls
– avoid repeated calls
– arrive at solutions efficiently

• Useful knowledge is mainly contained in experiences; therefore collecting experiences is central.

• Experiences are best stored as cases in CBR.


The Starting Point: Data (1)

• Data have a certain quality: there are correctness and completeness problems.

• It is essential to address data quality: if you feed garbage into the system, you get garbage out!
– the insights obtained from the data lead to incorrect consequences (wrong data)
– the insights are too general to be useful (incomplete data)


Starting Point: Data (2)

• Data may be noisy

• Incorrect data
– wrong values for attributes
– incorrect classification
– duplicate data

• Incomplete data
– missing values for some attributes
– missing attributes
– missing objects

• Data not usable
– free text that is difficult to cope with
– terminology not understood
– not suitable for the intended goals
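Some of these problems can be detected mechanically before any mining starts. A minimal Python sketch, assuming records are dictionaries and that a missing value is stored as `None`; the function name and the example records are hypothetical. It finds only missing values, missing attributes and exact duplicates; wrong values or wrong classifications need domain knowledge.

```python
from collections import Counter


def quality_report(records, required_attributes):
    """Count some of the data-quality problems listed above in dict records."""
    missing_values = Counter()      # attribute present but value is None
    missing_attributes = Counter()  # attribute key absent from the record
    for record in records:
        for attr in required_attributes:
            if attr not in record:
                missing_attributes[attr] += 1
            elif record[attr] is None:
                missing_values[attr] += 1
    # Exact duplicates: identical attribute-value sets occurring more than once.
    occurrences = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(n - 1 for n in occurrences.values() if n > 1)
    return {"missing_values": dict(missing_values),
            "missing_attributes": dict(missing_attributes),
            "duplicates": duplicates}


# Hypothetical customer records.
records = [{"id": 1, "age": 30, "city": "KL"},
           {"id": 2, "age": None, "city": "KL"},
           {"id": 3, "city": "KL"},
           {"id": 1, "age": 30, "city": "KL"}]
print(quality_report(records, ["id", "age", "city"]))
```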


Starting Point: Data (3)

• Knowledge management task:
• Quality management!
• Data sampling
– Define the goals
– Quality is more important than quantity
– Make use of existing information sources to ensure the completeness of the base
– Create your own sources
– Data have to arrive in time: data that are too old are not useful (updating problem)
• See chapter 15.


Data for What Knowledge?
• The way data are obtained depends on the type of knowledge one is interested in.
• We distinguish three main types:

– Knowledge about some market. This will influence the strategic model, the diagnostic model and the action model of the supplier.

– Knowledge about individual customers. It is used to treat the customer individually, e.g. by making special offers.

– Knowledge about technical objects: their quality, how to explain their operation, etc.

• Different types of knowledge are connected with different
– goals of the supplier
– data sources


Data Warehouse

Idea: Store knowledge like physical objects. This allows access, delivery and manipulation as for physical objects.

Data warehouse:
• Access to knowledge for immediate use
• Makes knowledge available for improving quality

The data warehouse is managed by the knowledge manager.


From Data to Knowledge (1)

Data: facts
Information: description, definition, perspective
Knowledge: strategy, practice, method
Wisdom: insight, moral

[Figure: the levels are linked by the questions “What, when, where, who?” and “How, why? Implications”; moving up requires understanding relations, then understanding models, rules and patterns, then understanding principles]


From Data to Knowledge (2)

• Data are raw products
• Information pieces are semifinished products
• Knowledge and wisdom are high-quality products

But: when knowledge is used, access to current data and information is necessary. How can this be done?



From Data to Knowledge (3)

It is a knowledge management task to provide, for each application of knowledge, the current data needed:

[Figure: task to perform; knowledge applied; data needed]


From Data to Knowledge (4)

• Only explicit knowledge can be used directly
• Explicit knowledge is directly formulated:
– prescriptions, rules, norms
– suggestions, ways to behave
– general laws, exceptions
– hierarchical relations
– properties, constraints
– . . .


From Data to Knowledge (5)

• Implicit knowledge cannot be used directly
• Implicit knowledge is
– contained in data and information
– often hidden and difficult to discover
– not directly applicable
– silent (tacit) knowledge


From Data to Knowledge (6)

Implicit knowledge:
• Sales statistics contain implicit knowledge about customer preferences
• Databases about accidents contain implicit knowledge about dangerous situations
• Test data contain implicit knowledge about quality


From Data to Knowledge (7)

• Data and pieces of information have to be correct (or exact tolerances have to be given)

• Knowledge does not have to be totally correct in order to be useful:
– probabilities, evidences
– heuristics
– rules of thumb
– vague statements (“this is not reliable”, “the weather there is not nice in November”)
– fuzzy statements

• A correct statement about a complex situation may even be useless because it is too complicated



• Wisdom is usually regarded as a very advanced type of knowledge

• It refers to the understanding of basic background principles

• Only in the exact sciences can it be expressed in precise terms

• Wisdom is relevant for the strategic model (which is mainly informal)


Make Knowledge Explicit (1)

• General properties of products need to be represented differently in different situations:

• Vacations in Tirol are
– nice and warm (for persons from Alaska)
– nice and cool (for persons from Brazil)

• A car
– is good and speedy on small and hilly roads (Germany)
– is comfortable (USA)


Make Knowledge Explicit (2)

• Use the properties of a product in order to
– guarantee the satisfaction of different safety regulations
– satisfy different types of demands
– respect different types of sensitivities

• Describe these properties in different ways
• For such purposes one has to extract the specific views from the overall knowledge


Reliability of Knowledge (1)

[Figure: extension of knowledge by source, where darkness indicates reliability: knowledge obtained by direct retrieval, by logical deduction, by approximative reasoning, by CBR, and by learning and data mining]

This assumes that the underlying data and information bases are reliable.


Reliability of Knowledge (2)

• This schema is only a rough and general indication.
• Success in applications depends heavily on, e.g.,
– correctness, amount and typicality of the data
– adequate choice of the specific method and the precision with which it is applied
– the number of experiments carried out
– testing of the results

• Therefore success depends on the effort invested.

• There is again the utility question: costs of obtaining knowledge versus gain from applying knowledge


Sources of Data
• General analysis, public domain
– accessible to everyone but often widely distributed and hard to collect

• General analysis, performed by the company itself or some paid institution
– expensive, but can be tailored to the needs of the company

• History of customers
– requires customers who buy regularly
– has to be updated regularly

• Internal analysis of customer behavior
– reaction to changes of
• prices
• dialogue strategies etc.

• Cases
– collected experiences, failure statistics etc.


History of Customers
• Knowledge about the behavior of individual customers should in general not be obtained by asking personal questions but rather automatically.

• One possibility is to do this at the cash register if the customer pays with a customer card or credit card. In e-commerce this can be done when the customer orders directly over the net.

• There may be certain legal restrictions.
• The history can contain, among others,
– main products ordered and their quantities
– times or events when ordered (weekend, holidays, time of the

• The history should contain (if possible) information about the customer (for the description of customer classes)
– age, sex, profession, place of residence, ...


Cases (1)

• In the after-sales process, histories have to be recorded when they are available; they are the material for the cases.

• Often there are not enough cases available to cover all or most of the relevant problem situations.

• In this situation artificial cases can be created by varying relevant parameters.

• Both collecting and creating cases require some a priori understanding of the tasks to be performed.

• To build a CBR system one has to define the four containers: vocabulary, case base, similarity measure and solution transformation.


Cases (2)

• There are commercial systems like CBR-Works that support the collection and representation of cases (see also chapters 3 and 12).

• A general methodology for developing CBR systems for applications in the help desk area is described in
– R. Bergmann, S. Breen, M. Göker, M. Manago, S. Wess: Developing Industrial Case-Based Reasoning Applications: The INRECA Methodology. Springer Lecture Notes in AI 1612, 1999.


From Data to Information Using Knowledge

Raw data become valuable information by using knowledge.

Customer: Company X, architects
PC component: Matrox G100?

Company X: 1x PC Dual-Pentium XL437, sold 4/97; 2x ML 649 (P233/124/9,6), sold 5/97
SW: high-end CAD & 3D visualization, TCP/IP network, …

G100: entry-level graphics card, AGP slot necessary, very good price/performance ratio, limited 3D power

“The G100 is of little use for Company X because the architects use high-end 3D graphics software. The G100 is an entry-level graphics card and additionally needs an AGP slot, which is not built into the current hardware configuration of the PCs.”


Three Main Phases

• Measurement: Collects numerical data about the intended utility

• Evaluation: Extracts statements about the utility from the data (excellent, good, sufficient, improved, insufficient, ...)

• Sensitivity Analysis: Extracts influence factors responsible for the result of the evaluation.

• The learning and data mining tools can
– use the results of all three phases
– improve these phases



• The utility is often only informal and implicit in the head of the user.

• The measurement problem is
– to map it onto quantitative magnitudes
– to define procedures that measure these quantities.

• The measurement procedures are often difficult to define and expensive.

• The parameters in the procedures have to be named precisely so that the procedure can be applied repeatedly (as, e.g., in the exact sciences)



• The evaluation of the measured data has to close the gap between the data and the utility of the user:
– the evaluation predicate should (at least ideally) coincide with the predicate that the user assigns to the performance (see also the relation between similarity and utility in chapter 6).

• The evaluation should contain a statement about its reliability, e.g.
– tolerances for errors
– error probabilities
– confidence intervals

• The reliability depends heavily on the input data (volume, representativeness, correctness, noise, etc.)
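As an illustration of attaching a reliability statement to an evaluation, the sketch below computes a confidence interval for the share of successful outcomes (for example, successfully finished dialogues). It is a minimal Python sketch assuming independent trials; it uses the normal-approximation (Wald) interval, which is a choice made here and not taken from the slides, and the function name and counts are hypothetical.

```python
import math


def success_rate_interval(successes, trials, z=1.96):
    """Normal-approximation (Wald) confidence interval for a success proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    # Clip to [0, 1] because the approximation can leave the valid range.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical measurement: 80 of 100 dialogues finished successfully.
print(success_rate_interval(80, 100))  # roughly (0.7216, 0.8784)
```

The Wald interval is unreliable for small samples or rates near 0 or 1 (it collapses to a single point at 0 or 1 successes out of n); a Wilson interval is a common alternative there. Either way, the width shrinks with the square root of the data volume, which is one concrete form of the dependence on "volume" noted above.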


Sensitivity Analysis
• This is the most difficult and the most important phase.

• The evaluation is given as a function Ev(d1, ..., dn), where the di are data obtained by the measurement.

• The data di in turn are an indirect consequence of parameters p1, ..., pm that can be directly influenced by the person who designs the process (or product etc.) being evaluated:
– Ev(d1, ..., dn) = Influence(p1, ..., pm)
– where the function Influence is in general unknown.

• We call a parameter pi an (important) influence factor if small variations of pi result in large variations of Influence(..., pi, ...).

• The determination of influence factors is the basis for learning improvements of the object under consideration.
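The definition of an influence factor can be turned into a simple one-at-a-time test: vary one parameter slightly, keep the others fixed, and measure how much the evaluation changes. A minimal Python sketch, assuming the unknown function Influence can be queried (in practice through experiments or a stand-in model); the function name, step size and threshold are hypothetical choices, and this local finite-difference check is only one of many sensitivity-analysis techniques.

```python
def influence_factors(influence, params, rel_step=0.01, threshold=1.0):
    """Rank parameters by how strongly a small change shifts the evaluation.

    influence: callable mapping a list of parameter values to an evaluation value
    params: current parameter values p1, ..., pm
    Returns (index, sensitivity) pairs sorted by sensitivity, and the indices
    whose sensitivity exceeds threshold.
    """
    base = influence(params)
    scores = []
    for i, p in enumerate(params):
        step = rel_step * abs(p) if p != 0 else rel_step
        varied = list(params)
        varied[i] = p + step
        # Change of the evaluation per unit change of parameter i.
        scores.append((i, abs(influence(varied) - base) / step))
    scores.sort(key=lambda s: s[1], reverse=True)
    important = [i for i, s in scores if s > threshold]
    return scores, important


# Hypothetical stand-in for the unknown Influence function.
model = lambda p: 5 * p[0] + 0.1 * p[1] + p[2] ** 2
print(influence_factors(model, [1.0, 1.0, 0.1]))
```

Because each parameter is varied alone, interactions between parameters are not detected; that is the price of the simplicity of this sketch.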


QMCB: Quantitative Models of Consumer Behavior

• Goal: the calculation and prediction of meaningful market diagnostics on the basis of data.

• A possible approach: integration of statistical methods and models as well as econometric models in a knowledge-based system.

• Tasks:
– descriptive (a posteriori) analysis of data
– model-based simulation of future buying behavior

• The different types of tasks require special data representations for useful evaluations.


Different Types of Forecasts

• The types vary with respect to the knowledge they contain and the usefulness of the prognosis. From the QMCB one should be able to compute directly (examples):
– market share of a product
– product purchase probability, expectation and variance
– brand purchase probability, expectation and variance
– heterogeneity in purchase rates

• Indirect consequences:
– relative product attraction
– relative brand attraction
– etc.


Example System: KVASS (1)

• KVASS (KaufVerhaltensAnalyse und SimulationsSystem, i.e. purchase behavior analysis and simulation system) is an example of a model- and knowledge-based data analysis system.
– Reference: R. Decker: Knowledge Based Selection and Application of Quantitative Models of Consumer Behavior. In: Information Systems and Data Analysis (ed. H.H. Bock, W. Lenski, M.M. Richter), Springer Verlag 1994, pp. 405-414.

• Basic idea: model data with a predefined set of descriptors. These are essentially attributes with their domains, e.g.
– estimation method: {undefined, least squares, ..., moments}
– type of recording: {undefined, diary, ..., interview}
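The descriptor idea amounts to attributes with fixed domains. A minimal Python sketch of checking a data description against such domains; it is not KVASS itself, the domains contain only the values named above (the values elided by "..." are not filled in), and defaulting unset descriptors to "undefined" is an assumption suggested by that value appearing in both domains. All names are hypothetical.

```python
# Hypothetical descriptor domains modelled on the two examples above;
# the values elided by "..." in the original domains are not filled in.
DESCRIPTORS = {
    "estimation method": {"undefined", "least squares", "moments"},
    "type of recording": {"undefined", "diary", "interview"},
}


def complete_description(description, descriptors=DESCRIPTORS):
    """Check a data description against the descriptor domains.

    Unknown descriptors or values outside a domain raise ValueError;
    descriptors that are not given default to "undefined".
    """
    for name, value in description.items():
        if name not in descriptors:
            raise ValueError(f"unknown descriptor: {name}")
        if value not in descriptors[name]:
            raise ValueError(f"{value!r} is not in the domain of {name}")
    return {name: description.get(name, "undefined") for name in descriptors}


print(complete_description({"type of recording": "diary"}))
```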


Example System: KVASS (2)

• Classes of descriptors are:
– essential aspects for a general description (type of recording, market share etc.)
– temporal aspects (periods for data collection etc.)
– information on the models used for computation (e.g. estimation method)
– technical descriptors for interpretation of the representation (e.g. ordinal, nominal etc.)
– Combinations of descriptors allow complex situations to be represented; these can be translated into more understandable relational representations (see chapter 4).


Example System: KVASS (3)

• The system essentially describes a measurement procedure, i.e. the first phase.

• The purpose is not to evaluate the success of a product or process of the company.

• The correctness condition is that the results provided by the system's analysis coincide with reality.

• On the other hand, the results of the system are important for the sensitivity analysis concerning the success or failure of processes or products designed by the company.


Causal Analysis (1)
• Causal analysis is a kind of sensitivity analysis. Task: make causal relations explicit.

• Suppose the Xi are activities and the Yi are sales results. Notation:
– arrow Xi → Yi labeled +: positive influence
– arrow Xi → Yi labeled -: negative influence
– no arrow: neutral

• Initial situation: a suspected model of the influence.

• Either experiment (variation of the Xi and measurement of the Yi) or analysis across several companies.

• Data analysis: e.g. by analyzing the covariance structure.

• Result: a revised and refined model.


Causal Analysis (2)

• Example (artificially created):
– X1: effort in catalogues
– X2: effort in dynamic forms
– X3: effort in recording and applying customer histories
– Y1: return from book sales
– Y2: return from high-tech product sales

• Initial model based on qualitative knowledge:

[Figure: initial causal graph from X1, X2, X3 to Y1, Y2 with + labels; the arrows were lost in extraction]


Causal Analysis (3)

Revised model:

[Figure: revised causal graph with + labels; the arrows were lost in extraction]

One way to arrive at a refined quantitative model is to assume a linear model (which may be justified by some knowledge). This leads to the linear equations

$$Y_1 = a_{11} X_1 - a_{13} X_3$$
$$Y_2 = a_{21} X_1 + a_{22} X_2 + a_{23} X_3$$

The solutions for the coefficients $a_{ik}$ determine a quantitative model.
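Given repeated observations of the activities and the sales results, the coefficients can be estimated by ordinary least squares. A minimal pure-Python sketch that solves the normal equations by Gaussian elimination; the function name and the observations are hypothetical, and in practice a numerical library routine such as `numpy.linalg.lstsq` would be the more robust choice. For the Y1 equation one passes only the X1 and X3 columns; if the revised model is right, the fitted X3 coefficient comes out negative, i.e. it equals -a13.

```python
def fit_linear(xs, ys):
    """Least-squares coefficients a for y ≈ sum_k a[k] * x[k] (no intercept).

    xs: list of observations, each a list of activity levels [X1, ..., Xm]
    ys: list of observed sales results Y for the same observations
    Solves the normal equations (X^T X) a = X^T y by Gaussian elimination.
    """
    m = len(xs[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in xs) for j in range(m)]
         + [sum(x[i] * y for x, y in zip(xs, ys))]
         for i in range(m)]
    for col in range(m):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("observations do not determine all coefficients")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    a = [0.0] * m
    for i in reversed(range(m)):
        a[i] = (A[i][m] - sum(A[i][j] * a[j] for j in range(i + 1, m))) / A[i][i]
    return a


# Hypothetical noise-free observations of (X1, X2, X3) for the Y2 equation,
# generated with a21 = 2.0, a22 = 0.5, a23 = 1.5.
xs = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 1, 0]]
ys = [2.0 * x1 + 0.5 * x2 + 1.5 * x3 for x1, x2, x3 in xs]
print(fit_linear(xs, ys))  # approximately [2.0, 0.5, 1.5]
```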


Quality Management: Internal Analysis

• As the first step, the goals of the analysis have to be defined:
– Where are the weak points?
– What has to be improved or optimized?
– Where are improvements possible?

• This is part of the requirements analysis
• Further steps include
– identify groups of objects with similar quality characteristics
– identify properties of these groups
– describe these groups
– draw conclusions for quality improvements


Example: Quality Analysis for Dialogues (1)

• Classification of dialogues (evaluation by the user):
– successfully finished
– quit because no adequate product was available
– quit for unknown reasons: this is the failure class.

• Measurement:
– has to collect data that arise during the dialogue
– these data may not be recorded during an ordinary dialogue, e.g.
• which questions raised by the customer dealt with a certain property type of the product
• which actions were performed by customers from a certain customer class
– the quality of the measured data has to be considered


Quality Analysis for Dialogues (2)

• The evaluation is simple because it is the same as the user's.

• The sensitivity analysis has two phases here:
– (1) Describe the evaluation result in terms of measured quantities and determine the influence factors of this description.
– (2) Describe the evaluation result in terms of factors that define the dialogue.

• The first phase already involves a learning step:
– The classification of the dialogue in terms of measured quantities has to be learned. This classification approximates the real classes obtained from the evaluation.


Quality Analysis for Dialogues (3)

• The analysis in the first phase is based on the dialogue situations and additionally measured data.

• Typical candidates for interesting data for classifying types of situations are
– length of the dialogue
– terms that are not understood
– customer questions (how often? typical ones?)
– etc.

• The selection of these candidates depends on a hypothesis for a preliminary dependency model. Data mining and learning methods are used to refine and correct this model.


Quality Analysis for Dialogues (4)
• The result allows a prognosis of the dialogue class from the occurrence of dialogue situations that are important influence factors (but here in terms of measured data!), in particular a description of failure situations, i.e. situations that lead with high probability to a failure dialogue.

• The description of the failure situations is refined in order to
– discover dependencies between influence factors
– in particular, obtain definitions of earliest failure situations in dialogues, i.e. the earliest situations in the dialogue that will lead to a failure.

• The earliest failure situations give rise to the second phase of the sensitivity analysis.
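Once a classifier for the failure class has been learned, an earliest failure situation can be located by scanning the dialogue prefixes. A minimal Python sketch, assuming the learned classifier is available as a function returning a failure probability for a dialogue prefix; the function names, the probability threshold and the toy estimator are hypothetical.

```python
def earliest_failure_situation(dialogue, failure_probability, threshold=0.8):
    """Index of the first situation after which failure is predicted, or None.

    dialogue: sequence of dialogue situations (any representation)
    failure_probability: learned classifier mapping a dialogue prefix to the
        estimated probability that the dialogue ends in the failure class
    """
    for t in range(1, len(dialogue) + 1):
        if failure_probability(dialogue[:t]) >= threshold:
            return t - 1
    return None


# Hypothetical toy estimator: each term the customer did not understand
# raises the predicted failure probability by 0.3.
estimator = lambda prefix: min(1.0, 0.3 * sum(s == "unknown_term" for s in prefix))
dialogue = ["greeting", "question", "unknown_term", "unknown_term",
            "unknown_term", "quit"]
print(earliest_failure_situation(dialogue, estimator))  # 4
```

The threshold trades early warnings against false alarms; the situations found this way are the starting point for the second phase, which asks why the dialogue reached them.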


Quality Analysis for Dialogues (5)

• Second phase: analysis of the reasons for reaching earliest failure situations, mainly:
– Which elements in the strategy are responsible?
– Weak points of the knowledge base (e.g. wrong prices for products)?

• These reasons can be influenced directly when the dialogue is designed.

• Consequences of the analysis (learned results):
– improved knowledge base
– possible changes of the strategy
– possible disadvantages of changes

• Final recommendations: update



• The dialogue and the situations can be given in a (possibly object-oriented) attribute-value representation. Some virtual attributes (like the length of dialogues) can be useful; they contain valuable knowledge.

• One way to proceed is to use cluster analysis techniques and machine learning algorithms (e.g. CN2, C4.5) for learning the classification.

• Another way is to consider the database as a case base and start with an initial similarity measure, which is improved during the development of a CBR system for the classification and the improvement suggestions.
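Decision-tree learners of the C4.5 family choose the attributes to test by how much they reduce the uncertainty about the class. A minimal Python sketch of that criterion (information gain, as used by ID3; C4.5 additionally divides by the split information to obtain the gain ratio), assuming dialogues are given as attribute-value dictionaries; the attribute names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in class entropy obtained by splitting on one attribute."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical dialogues described by two (virtual) attributes.
rows = [{"long": True, "unknown_terms": True},
        {"long": True, "unknown_terms": False},
        {"long": False, "unknown_terms": True},
        {"long": False, "unknown_terms": False}]
labels = ["failure", "success", "failure", "success"]
for attr in ("long", "unknown_terms"):
    print(attr, information_gain(rows, labels, attr))
```

In this toy data the attribute "unknown_terms" separates the classes completely (gain 1 bit) while "long" carries no information (gain 0), so a tree learner would test "unknown_terms" first; such attributes are candidates for influence factors.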


Learning Informal Concepts

• Many concepts in e-commerce, in particular in connection with CRM and customer classes, are of an informal character for which no direct formal equivalent exists.

• Computer support requires a formal notion that approximates the informal concept as well as possible.

• Such formal versions have to be learned, and the learning process requires data mining activities that are again based on studies of customers and their behavior.

• It has to be taken into account that informal concepts are usually not stable over time.


The Correctness Problem
• The correctness problem for the statement that two expressions are logically equivalent reduces to a formal proof.
• How can one “prove” that an informal and a formal concept are equivalent?
– Formal systems do not have access to informal notions.
– Humans usually have difficulty comparing both types of notions because this refers to a broad scope of intended uses.

• What is required is a kind of Turing test that decides whether a human who uses the informal version and a machine that uses the formal version refer to the same concept.

• The ordering principle is that the test does not deal with the concept itself but with partial orderings related to the concept.


The Ordering Principle and a Turing Test (1)

Suppose a partial ordering “<” is associated with the concept C. The partial ordering then again has two versions: formal and informal. The Turing test refers to these two versions of “<”:

[Figure: informal human version of C compared with the formal version of C]

The goal is that, when variations of the arguments of < are presented, the human says “up” if and only if the formal system says “up”.
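The test criterion can be checked against recorded judgements: present pairs of variations, note whether the human says "up", and compare with the formal ordering. A minimal Python sketch, assuming human judgements have already been collected as booleans; the function name and data are hypothetical. The "if and only if" condition corresponds to an agreement of exactly 1.0, and the fraction shows how close an approximate formal version comes.

```python
def ordering_agreement(pairs, human_says_up, formal_ordering):
    """Fraction of presented variations on which human and formal ordering agree.

    pairs: list of (y, z) argument pairs presented in the test
    human_says_up: recorded human judgements, True if the human says "up" from y to z
    formal_ordering: callable (y, z) -> True if the formal system says "up"
    """
    if len(pairs) != len(human_says_up):
        raise ValueError("one human judgement per pair is required")
    agree = sum(formal_ordering(y, z) == h for (y, z), h in zip(pairs, human_says_up))
    return agree / len(pairs)


# Hypothetical test run with numeric arguments and "up" meaning "z > y".
pairs = [(1, 2), (3, 1), (2, 5)]
print(ordering_agreement(pairs, [True, False, True], lambda y, z: z > y))  # 1.0
```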



The Ordering Principle and a Turing Test (2)

Concept to grasp: typical lion
Formal version uses the ordering: quotient of length/height
Human: aesthetic property

[Figure: lion examples ordered from worse to better]

The partial ordering approximates the concept C in the sense that the semantics of y < z is: z is more typical for C than y is.


The Ordering Principle and a Turing Test (3)

• Advantages of the ordering principle:
– The equivalence of formal and informal concepts can be effectively validated by Turing tests, i.e. by experiments.

– If several orderings are involved, this can be done for all of them.

– The search for a formal counterpart of an informal concept can be performed in an approximative way, and partial validation is possible.

• The formal partial ordering is what has to be learned.
• The learning process is an approximation process that aims to pass the Turing test sufficiently well.

The Learning Scenario

• (1) The informal concept C on a set U is regarded as a fuzzy set for which a set of prototypes $P \subseteq U$ is known.
  (2) An informal relation $r_x(y, z)$ states “y is more similar to x than z is”.
• The object to be learned is a similarity measure $\mathrm{sim}: U \times P \to [0,1]$.
• Turing test: the relation induced for x by the formal similarity measure (y is ranked before z iff $\mathrm{sim}(y, x) > \mathrm{sim}(z, x)$) agrees with $r_x$.
• We decompose the approach into two basic steps:
  – a first step to obtain a suitable representation language: concept learning;
  – a second step for learning the similarity measure: subsymbolic learning.
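The Turing test of this scenario can be sketched as counting, over recorded triples (x, y, z), how often the relation induced by a candidate similarity measure reproduces the informal judgment r_x(y, z). This is a minimal sketch assuming the informal relation is available as a finite set of recorded judgments; `induced`, `agreement`, the toy measure `sim` and the data are hypothetical.

```python
from typing import Callable, Dict, Hashable, Tuple

Sim = Callable[[Hashable, Hashable], float]  # sim(u, p) with u in U, p in P, value in [0, 1]


def induced(sim: Sim, x: Hashable, y: Hashable, z: Hashable) -> bool:
    """Formal counterpart of r_x(y, z): y is more similar to x than z is."""
    return sim(y, x) > sim(z, x)


def agreement(sim: Sim,
              judgments: Dict[Tuple[Hashable, Hashable, Hashable], bool]) -> float:
    """Share of recorded triples (x, y, z) on which sim reproduces r_x(y, z)."""
    if not judgments:
        return 1.0
    hits = sum(induced(sim, x, y, z) == r for (x, y, z), r in judgments.items())
    return hits / len(judgments)


# Toy data: objects are numbers in [0, 10]; the prototype is x = 5.
def sim(u: float, p: float) -> float:
    return 1.0 - abs(u - p) / 10.0


r = {(5, 4, 8): True, (5, 9, 6): False, (5, 2, 7): True}
print(agreement(sim, r))  # 2 of 3 triples agree
```

Learning the similarity measure then amounts to adjusting `sim` until this agreement is sufficiently high.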


Learning of Weights

• Learning similarities is an example of subsymbolic learning and often reduces to learning weights. We distinguish:
  – global weights:
    $$\mathrm{sim}(q, c) = \sum_i w_i \cdot \mathrm{sim}_i(q_i, c_i)$$
  – prototype-specific weights $w_{i,c}$ (relevance matrix):
    $$\mathrm{sim}(q, c) = \sum_i w_{i,c} \cdot \mathrm{sim}_i(q_i, c_i)$$
• Changing the weights changes the relevance of the features.
• The error function is determined by the Turing test.
• Learning procedures can be supervised or unsupervised.
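Both weighted forms can be sketched in a few lines. The sketch assumes numeric attributes scaled to [0, 1] and weights that sum to 1, so that the global similarity stays in [0, 1]; the local measure `dist_sim` and the dictionary layout of the relevance matrix are hypothetical choices for illustration.

```python
from typing import Callable, Dict, Sequence

LocalSim = Callable[[float, float], float]  # local measure sim_i(q_i, c_i) in [0, 1]


def weighted_sim(q: Sequence[float], c: Sequence[float],
                 w: Sequence[float], local: Sequence[LocalSim]) -> float:
    """Global weights: sim(q, c) = sum_i w_i * sim_i(q_i, c_i)."""
    return sum(wi * si(qi, ci) for wi, si, qi, ci in zip(w, local, q, c))


def prototype_sim(q: Sequence[float], c: Sequence[float], c_id: str,
                  relevance: Dict[str, Sequence[float]],
                  local: Sequence[LocalSim]) -> float:
    """Prototype-specific weights: sim(q, c) = sum_i w_{i,c} * sim_i(q_i, c_i).

    relevance[c_id] is the row of the relevance matrix that belongs to case c.
    """
    return weighted_sim(q, c, relevance[c_id], local)


def dist_sim(a: float, b: float) -> float:
    """Hypothetical local measure for attributes scaled to [0, 1]."""
    return 1.0 - abs(a - b)


local = [dist_sim, dist_sim]
q, c = [0.0, 1.0], [0.5, 1.0]
print(weighted_sim(q, c, [0.5, 0.5], local))                   # 0.75
print(prototype_sim(q, c, "c1", {"c1": [0.75, 0.25]}, local))  # 0.625
```

The prototype-specific variant only swaps in a different weight vector per case, which is why learning it means filling in the whole relevance matrix rather than a single vector.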


Learning of Weights with/without Feedback

• Many algorithms are known for both learning types.
• Learning without feedback for retrieval / reuse:
  – Use the distribution of cases in the case base to determine the relevance of attributes.
• Learning with feedback:
  – The result (correct or incorrect choice of cases / classification) leads to a change of the weights.

[Figure: example in which attribute A1 is more important than A2]

Learning of Weights without Feedback

• Determination of class-specific weights:
  – Binary coding of the attributes by
    • discretizing real-valued attributes,
    • transforming each symbolic attribute into n binary attributes.
  – Let
    • $w_{ik}$ be the weight of attribute i for class k,
    • class(c) be the class (solution) of case c,
    • $c_i$ be attribute i of case c.
  – Put $w_{ik} = P(\mathrm{class}(c) = k \mid c_i)$, the conditional probability that the class of a case is k given that attribute i is present.
  – The probabilities are estimated from samples of the case base.
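A minimal sketch of the estimation by relative frequencies, assuming the attributes are already binary coded (1 = attribute present) and each case carries its class label. The function name `class_weights` and the toy case base are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Case = Tuple[Sequence[int], str]  # (binary-coded attributes c_1..c_n, class(c))


def class_weights(cases: List[Case]) -> Dict[int, Dict[str, float]]:
    """Estimate w_ik = P(class(c) = k | c_i = 1) by relative frequencies.

    Classes never seen together with attribute i are omitted (implicit weight 0).
    """
    present = Counter()           # present[i]: number of cases with c_i = 1
    joint = defaultdict(Counter)  # joint[i][k]: number of cases with c_i = 1 and class k
    for attrs, label in cases:
        for i, value in enumerate(attrs):
            if value == 1:
                present[i] += 1
                joint[i][label] += 1
    return {i: {k: n / present[i] for k, n in joint[i].items()} for i in present}


case_base = [([1, 0], "A"), ([1, 1], "A"), ([1, 0], "A"), ([1, 1], "B"), ([0, 1], "B")]
w = class_weights(case_base)
print(w[0])  # {'A': 0.75, 'B': 0.25}
```

With small case bases these frequency estimates are noisy; a smoothing scheme could be added, but that goes beyond what the slide specifies.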

Learning of Weights with Feedback

• A correct or incorrect classification leads to a correction of the weights: $w_{ik} := w_{ik} + \Delta w_{ik}$
• There are several ways to adapt the weights.
• Approach of Salzberg (1991) for binary attributes:
  – Positive feedback (i.e. correct classification):
    • the weight of attributes with the same values increases;
    • the weight of attributes with different values decreases.
  – Negative feedback (i.e. wrong classification):
    • the weight of attributes with the same values decreases;
    • the weight of attributes with different values increases.
  – The increment $\Delta w_{ik}$ remains constant.
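The update rule above can be sketched for the weights of one class, assuming the comparison is between the query and the retrieved case that produced the classification. The function name `salzberg_update` and the default increment `delta = 0.125` are hypothetical; this follows the rule as stated on the slide, not any published implementation.

```python
from typing import List, Sequence


def salzberg_update(w: Sequence[float], query: Sequence[int], case: Sequence[int],
                    correct: bool, delta: float = 0.125) -> List[float]:
    """One feedback step with a constant increment delta (weights of one class).

    correct=True : attributes where query and case agree gain delta, others lose delta.
    correct=False: attributes where they agree lose delta, others gain delta.
    """
    sign = 1.0 if correct else -1.0
    return [wi + sign * delta if qi == ci else wi - sign * delta
            for wi, qi, ci in zip(w, query, case)]


w = [0.5, 0.5, 0.5]
query, retrieved = [1, 0, 1], [1, 1, 1]
print(salzberg_update(w, query, retrieved, correct=True))   # [0.625, 0.375, 0.625]
print(salzberg_update(w, query, retrieved, correct=False))  # [0.375, 0.625, 0.375]
```

Because the increment is constant, repeated feedback can drive weights below zero; a practical variant would probably bound or renormalize them, which is an additional design choice not covered by the slide.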

Summary

• Relations between data mining and KDD.
• Relations between data mining, learning and performance.
• The way from data to knowledge.
• Making knowledge explicit.
• Collecting cases and building a CBR system.
• Examples:
  – quantitative models of consumer behavior (external analysis);
  – causal analysis (external analysis);
  – quality analysis for dialogues (internal analysis).
• Learning of informal concepts can be reduced to learning of similarity measures.
