csi5388 data sets – part ii experiments with artificial data sets

CSI5388 Data Sets – Part II

Experiments with Artificial Data Sets

Assessing the Impact of Changing Environments on Classifier Performance

(Rocio Alaiz-Rodriguez and Nathalie Japkowicz)

Purpose of the Work

Direct purpose: To test the hypothesis by David Hand (2006) that simple classifiers are more robust to changing environments than complex ones.

Indirectly: Demonstrating the feasibility and value of generating artificial, but realistic domains.

More generally: Proposing an alternative to the use of the UCI domains.

Specific hypotheses under review

Preliminaries: Different kinds of changing environments:
• Population Drift: p(d|x) remains unchanged, but p(x) differs from training to testing set. Also known as: covariate shift or sample selection bias.
• Class Definition Change: p(x) does not change, but p(d|x) varies from training to testing set. Also known as: concept drift or functional relation change.
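
The two kinds of change can be written out side by side. The block below is only a sketch in our own notation: the subscripts tr and te (training and testing set) do not appear in the slides, and the last line simply applies the product rule p(x, d) = p(d|x) p(x).

```latex
% Population drift (covariate shift): only the input distribution changes.
\[
p_{\mathrm{tr}}(d \mid x) = p_{\mathrm{te}}(d \mid x),
\qquad
p_{\mathrm{tr}}(x) \neq p_{\mathrm{te}}(x)
\]

% Class definition change (concept drift): only the labelling rule changes.
\[
p_{\mathrm{tr}}(x) = p_{\mathrm{te}}(x),
\qquad
p_{\mathrm{tr}}(d \mid x) \neq p_{\mathrm{te}}(d \mid x)
\]

% In either case the joint distribution seen at test time differs
% from the one seen at training time.
\[
p_{\mathrm{te}}(x, d) = p_{\mathrm{te}}(d \mid x)\, p_{\mathrm{te}}(x)
\;\neq\;
p_{\mathrm{tr}}(d \mid x)\, p_{\mathrm{tr}}(x) = p_{\mathrm{tr}}(x, d)
\]
```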

Hypotheses under review:
• Hypothesis 1: When a population drift, a class definition change, or both occur, can we generally observe a drop in performance for all kinds of classifiers?
• Hypothesis 2: Do simpler classifiers maintain their performance more reliably than more complex ones in such cases?

Our Experimental Framework I

Our domain is a simulated medical domain that gives the prognosis of patients infected with the flu. Each patient is described as follows:
• Patient’s age [Infant, Teenager, YoungAdult, Adult, OldAdult, Elderly]
• Severity of flu symptoms [Light, Medium, Strong]
• Patient’s general health [Good, Medium, Poor]
• Patient’s social position [Rich, MiddleClass, Poor]
• Class: NormalRemission, Complications

Our Experimental Framework II

In order to make the problem interesting for classifiers, we assumed that the features are not independent of one another.

We also assumed that certain feature values were irrelevant.

Attribute Dependency Graph

Our Experimental Framework III

We used different distributions to model the various features and the class:

Age: we assumed a region with negative growth which, according to the Population Reference Bureau, contains many people in the Adult and Young Adult categories, and few people in the other categories. We used uniform distributions to model the first five categories, and an exponential distribution for the elderly category (which is not bounded upwards).

Social Status: Normal distribution centered on the middle class (= 2), with variance 0.75. [Poor = 3 and Rich = 1]

Severity of Flu Symptoms: (= severity of the virus strain) Same distribution as Social Status.
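
As an illustration of the Social Status model, here is a minimal Python sketch that draws from a normal distribution with mean 2 and variance 0.75 and maps the draw onto the three categories. The slides do not say how the continuous draw was turned into a category, so the rounding and clipping step is an assumption, as are the names `sample_ordinal` and `sample_social_status`.

```python
import math
import random

# Ordinal coding taken from the slide: Rich = 1, MiddleClass = 2, Poor = 3.
SOCIAL_STATUS = {1: "Rich", 2: "MiddleClass", 3: "Poor"}

MEAN = 2.0                  # centered on the middle class
STD_DEV = math.sqrt(0.75)   # the slide gives the variance, not the std. dev.


def sample_ordinal(rng: random.Random) -> int:
    """Draw one value in {1, 2, 3} from a discretized normal distribution.

    ASSUMPTION: rounding to the nearest integer and clipping to [1, 3]
    is our choice; the slides do not state how draws were discretized.
    """
    value = round(rng.gauss(MEAN, STD_DEV))
    return min(3, max(1, value))


def sample_social_status(rng: random.Random) -> str:
    """Draw one social-status label using the coding from the slide."""
    return SOCIAL_STATUS[sample_ordinal(rng)]


if __name__ == "__main__":
    rng = random.Random(0)
    counts = {name: 0 for name in SOCIAL_STATUS.values()}
    for _ in range(10_000):
        counts[sample_social_status(rng)] += 1
    print(counts)  # MiddleClass should be the most frequent category
```

Since Severity of Flu Symptoms uses the same distribution, `sample_ordinal` could serve for it as well, although the slides do not give the numeric coding of the severity levels.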

Our Experimental Framework IV

General Health: We used an unobserved binary variable called “delicate person” whose probability increases with age, and generated rules based on the assumptions that (1) delicate people have worse general health than other members of the population; and (2) poorer people have worse general health than the richer members of society.
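
The role of the unobserved variable can be sketched in Python as follows. This is not the rule set used in the experiments: the probabilities in `P_DELICATE` and the one-level-per-risk-factor rule in `general_health` are hypothetical, chosen only so that they respect the two stated assumptions and the requirement that the probability of being delicate increases with age.

```python
import random

# HYPOTHETICAL probabilities: the slides only say that the probability
# of being a "delicate person" increases with age.
P_DELICATE = {
    "Infant": 0.05,
    "Teenager": 0.10,
    "YoungAdult": 0.15,
    "Adult": 0.25,
    "OldAdult": 0.40,
    "Elderly": 0.60,
}

HEALTH_LEVELS = ["Good", "Medium", "Poor"]


def sample_delicate(age_group: str, rng: random.Random) -> bool:
    """Draw the unobserved 'delicate person' flag for one patient."""
    return rng.random() < P_DELICATE[age_group]


def general_health(delicate: bool, social_status: str) -> str:
    """HYPOTHETICAL rule: each risk factor lowers health by one level.

    (1) delicate people get worse health than non-delicate people;
    (2) people with Poor social status get worse health than the others.
    """
    penalty = int(delicate) + int(social_status == "Poor")
    return HEALTH_LEVELS[penalty]


if __name__ == "__main__":
    rng = random.Random(0)
    delicate = sample_delicate("Elderly", rng)
    print(delicate, general_health(delicate, "MiddleClass"))
```

Because the flag is never written to the data set, a classifier only sees its effect through the dependency between age and general health.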

Class: The class labels were assigned automatically and abided by the following general principles:
• In case of stronger flu symptoms, the chances of complications are greater.
• Infants and Elderly have greater chances of complications.
• People with poorer general health are more susceptible to complications.

Our Experimental Framework V

Class Distribution, or probability of normal remission as a function of age, severity of the flu symptoms and general health. When two lines appear, the discontinuous one applies to instances with poor social status.

Changes to the Testing Set I

Population Drifts with full representation:
• Developing Population (DP): High birth and death rates.
• Zero Growth Population (ZGP): Similar birth and death rates.
• Season changes (NGP/W): Winter: Flu symptoms get stronger.
• Season changes (NGP/SW): Soft Winter: Flu symptoms get milder.
• Season changes (NGP/DW): Drastic Winter: Flu symptoms get stronger and general health declines.
• Population is much poorer (NGP/P).
• Population is much poorer and the winter is drastic (NGP/P+DW).

Changes to the Testing Set II

Population Drifts with Non-Represented cases:
• We considered several situations where one or two population groups are not represented.

Class Definition Changes:
• More Complications (MC): the probability of normal remission decreases for certain ages, social statuses and flu symptoms.
• Fewer Complications (FC): the age group for which the probability of normal remission is high is widened.

Evaluation Measure

In order to measure the changes in performance caused by environmental changes, we introduce a new metric, called performance Deterioration (pD), defined as follows:

\[
pD =
\begin{cases}
\dfrac{E_{test} - E_{ideal}}{E_{0} - E_{ideal}}, & \text{if } E_{test} \le E_{0} \\
1, & \text{otherwise}
\end{cases}
\]

with $E_{0}$ representing the error rate of the trivial classifier, $E_{test}$ the classifier’s error rate on the test set, and $E_{ideal}$ the classifier’s error rate when trained and tested on data abiding by the same distribution.
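
The metric is straightforward to compute. The sketch below follows the definition: the excess error over the ideal setting, divided by the gap between the trivial classifier and the ideal setting, capped at 1 once the test error exceeds that of the trivial classifier. The function name, the argument names and the handling of the degenerate case where the trivial and ideal error rates coincide are our own choices; the slides do not cover that case.

```python
def performance_deterioration(e_test: float, e_ideal: float, e_0: float) -> float:
    """Compute pD from three error rates, each expected in [0, 1].

    e_test:  error rate on the (changed) test set
    e_ideal: error rate when training and test data share one distribution
    e_0:     error rate of the trivial classifier
    """
    if e_test > e_0:
        # Worse than the trivial classifier: pD is capped at 1.
        return 1.0
    if e_0 == e_ideal:
        # ASSUMPTION: the slides do not define pD when the denominator is 0.
        raise ValueError("pD is undefined when E_0 equals E_ideal")
    # Not clipped below 0: the definition does not say what happens
    # when e_test < e_ideal, so the raw ratio is returned.
    return (e_test - e_ideal) / (e_0 - e_ideal)


if __name__ == "__main__":
    # Hypothetical error rates, for illustration only.
    print(performance_deterioration(e_test=0.30, e_ideal=0.10, e_0=0.50))
    print(performance_deterioration(e_test=0.60, e_ideal=0.10, e_0=0.50))
```

A value of 0 means the classifier did as well under the changed conditions as in the ideal setting, and a value of 1 means it did no better than the trivial classifier.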

Results I: Population Drifts with Full Representation

Verification of our hypotheses:
(a) A drop in performance is observed for all classifiers.
(b) Simpler classifiers suffer much more.

Results II: Population Drifts with Non-Represented Cases

Verification of our hypotheses:
(a) A drop in performance is observed for all classifiers.
(b) Simpler classifiers suffer slightly more.

Results III: Class Definition Changes

Verification of our hypotheses:
(a) A drop in performance is observed for all classifiers.
(b) Simpler classifiers don’t necessarily suffer more than complex ones. (SimpleNN does not; 1R does.)

Summary

Our results show that the trend hypothesized by David Hand does happen in some cases, but does not happen in others.

In all cases, however, complex classifiers that generally obtain lower error rates in the original scenario (with no changing conditions) remain the best choice, since their performance remains higher than that of the simple classifiers even though their performance deteriorations are sometimes equivalent.

Given the dearth of data sets representing changing environments, none of the results we present here could have been obtained had we not generated artificial though realistic data sets simulating various conditions.

Future Work

Develop a systematic way to generate realistic artificial data sets that could replace, or at least supplement, the UCI domains.

Find a way to verify the realistic nature of these data sets.

Rather than generating data sets from intuition, as we have done here, start from actual real-world data sets and expand them artificially.
