Induction Systems
Nima Hazar
Amin Dehesh
Several induction algorithms have been developed that vary in the methods employed to build the decision tree or set of rules.
ID3 is a general-purpose rule induction algorithm developed by Quinlan (1979), and today it is used in most expert system shells.
ID3
Predicting weather: temperature, wind direction, condition of sky, barometric pressure.
ID3 (cont.)
Chooses the most important issue first.
No-data result.
Excludes irrelevant factors.
ID3 Features
ID3 Algorithm
The central choice in ID3 is selecting which attribute to test at each node in the tree.
ID3 defines a statistical property, called information gain, that measures how well a given attribute separates the training examples according to their target classification.
ID3 Algorithm (cont.)
Characterizes the purity of an arbitrary collection of examples.
Suppose S is a collection of 14 examples: 9 positive and 5 negative ([9+,5-]).
Entropy
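0 log 0 is defined to be 0.
Entropy is 0 if all members of S belong to the same class.
Entropy is 1 when the collection contains an equal number of positive and negative examples.
If the target attribute can take on c different values, the entropy generalizes to:
$$Entropy(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$$
where p_i is the proportion of S belonging to class i.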
Entropy (cont.)
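The entropy properties listed on these slides can be checked numerically. The following is a minimal Python sketch, not code from the presentation: the function name `entropy` and the idea of passing per-class counts are assumptions made for illustration. It applies the standard definition, the sum over classes of -p_i log2 p_i with 0 log 0 taken as 0, to the [9+,5-] collection mentioned above.

```python
import math

def entropy(class_counts):
    """Entropy (in bits) of a collection described by its per-class counts.

    `class_counts` is a hypothetical input format chosen for this sketch,
    e.g. [9, 5] for 9 positive and 5 negative examples.
    """
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count == 0:
            continue  # convention: 0 * log2(0) is treated as 0
        p = count / total
        result -= p * math.log2(p)
    return result

print(round(entropy([9, 5]), 3))  # 0.94 for the [9+,5-] collection
print(entropy([7, 7]))            # 1.0: equal numbers of positive and negative
print(entropy([14, 0]))           # 0.0: all members belong to the same class
```

Passing more than two counts covers the case where the target attribute takes on more than two values.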
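The information gain, Gain(S, A), of an attribute A relative to a collection of examples S is defined as:
$$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$$
Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which attribute A has value v.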
Information Gain
An example:
Information Gain (cont.)
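The worked example from this slide did not survive in the transcript, so the sketch below uses assumed numbers: the Wind split of the widely used 14-example PlayTennis data set (Weak: [6+,2-], Strong: [3+,3-]). The function names and the count-based input format are also assumptions for illustration. It computes the gain as the entropy of S minus the size-weighted entropy of each subset S_v.

```python
import math

def entropy(class_counts):
    """Entropy in bits from per-class counts; zero counts are skipped (0 log 0 = 0)."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

def information_gain(parent_counts, subset_counts):
    """Gain(S, A): entropy of S minus the weighted entropy of each subset S_v.

    parent_counts: class counts for S, e.g. [9, 5]
    subset_counts: one class-count list per value v of attribute A
    """
    total = sum(parent_counts)
    remainder = sum(
        (sum(counts) / total) * entropy(counts) for counts in subset_counts
    )
    return entropy(parent_counts) - remainder

s = [9, 5]                       # S = [9+,5-]
wind_subsets = [[6, 2], [3, 3]]  # assumed split: Weak = [6+,2-], Strong = [3+,3-]
print(round(information_gain(s, wind_subsets), 3))  # 0.048
```

A low gain such as this one means the attribute does little to separate the positive examples from the negative ones.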
Playing Tennis:
An ID3 Example
The best attribute to test first is “Outlook”.
An ID3 Example (cont.)
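The table behind this example is not preserved in the transcript. The sketch below assumes it was the standard 14-example PlayTennis table found in machine learning textbooks; the rows, the attribute order, and the helper names are all assumptions for illustration. It computes the information gain of each attribute and picks the largest.

```python
import math
from collections import Counter

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]

# Assumed data: the standard PlayTennis examples; the last column is the target.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, attr_index):
    """Information gain of the attribute stored in column `attr_index`."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {name: gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
for name, value in gains.items():
    print(f"{name}: {value:.3f}")
print("best:", best)  # best: Outlook
```

On this assumed data, Outlook has the largest gain, which matches the choice stated on the slide.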
An ID3 Example (cont.)
This process continues for each new leaf until either of two conditions is met:
1. Every attribute has already been included along this path through the tree.
2. The training examples associated with this leaf node all have the same target attribute value.
An ID3 Example (cont.)
An ID3 Example (cont.)
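The recursive procedure and its two stopping conditions can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in any shell: the data is the standard PlayTennis table (assumed, since the slide images are missing), the nested-dict tree format and the function names are invented here, and falling back to the majority label when the attributes run out is one common choice that the slides do not specify.

```python
import math
from collections import Counter

COLUMNS = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
# Assumed data: the standard 14-example PlayTennis table.
RAW = """Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No"""
ROWS = [dict(zip(COLUMNS, line.split())) for line in RAW.splitlines()]

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    return -sum((n / len(rows)) * math.log2(n / len(rows)) for n in counts.values())

def gain(rows, attr, target):
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attributes, target):
    """Return a leaf label, or a nested dict {attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    # Stopping condition 2: all examples share the same target value.
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping condition 1: every attribute was already used on this path.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # assumed fallback: majority label
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, remaining, target)
    return {best: branches}

def classify(tree, example):
    """Walk the tree; raises KeyError for an attribute value not seen in training."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]
    return tree

tree = id3(ROWS, COLUMNS[:-1], "PlayTennis")
print(tree)
```

On the assumed data the root is Outlook, the Overcast branch is a pure "Yes" leaf, and the Sunny and Rain branches are split further on Humidity and Wind respectively.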
Determine objective
Determine decision factors: attribute nodes of the decision tree
Determine decision factor values: attribute values of the decision tree
Determine solutions: list of final decisions that the system can make
Developing an induction expert system
Form example set: problem knowledge
Create decision tree
Test the system
Revise the system
Developing an induction expert system (cont.)
Predict the outcome of a football game.
Objective: a football game prediction system that can predict if our team will win or lose its next game
Decision factors: location of the game, weather, our own team’s record, and the opponent’s record
Football game prediction system
Decision factor values:
Solution: a simple binary decision
Football game prediction system (cont.)
Examples: go back over the first 8 games.
Football game prediction system (cont.)
Decision tree: ID3 algorithm
Football game prediction system (cont.)
Testing: the next 8 games
Football game prediction system (cont.)
Revising: after discussion, another decision factor is added: the team’s health, with the values {poor, average, good}.
Football game prediction system (cont.)
Football game prediction system (cont.)
Attempts to determine the influence of various system elements on the system’s behavior.
Some shells, like 1STCLASS, permit you to deactivate decision factors or examples without removing them.
Advantages:
◦ Impact of decision factors.
◦ Examples from different sources.
Sensitivity Study
Sensitivity Study
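The idea of deactivating decision factors or examples and observing the effect can be sketched in a few lines. This is a hypothetical Python illustration, not a description of how 1STCLASS works: the function `rank_factors`, the factor names `factor_a` and `factor_b`, and the six toy examples are all invented here. It ranks the active factors by information gain and shows how the ranking changes when a factor is switched off.

```python
import math
from collections import Counter

# Hypothetical toy examples, invented for this sketch only.
EXAMPLES = [
    {"factor_a": "x", "factor_b": "p", "outcome": "yes"},
    {"factor_a": "x", "factor_b": "q", "outcome": "yes"},
    {"factor_a": "x", "factor_b": "p", "outcome": "yes"},
    {"factor_a": "y", "factor_b": "q", "outcome": "no"},
    {"factor_a": "y", "factor_b": "q", "outcome": "no"},
    {"factor_a": "y", "factor_b": "p", "outcome": "no"},
]

def entropy(labels):
    return -sum((n / len(labels)) * math.log2(n / len(labels))
                for n in Counter(labels).values())

def gain(examples, factor, target):
    labels = [e[target] for e in examples]
    remainder = 0.0
    for value in {e[factor] for e in examples}:
        subset = [e[target] for e in examples if e[factor] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def rank_factors(examples, factors, target, inactive_factors=(), inactive_examples=()):
    """Rank active factors by information gain.

    Deactivated factors and example indices are skipped, not deleted,
    so they can be switched back on for the next run.
    """
    rows = [e for i, e in enumerate(examples) if i not in inactive_examples]
    active = [f for f in factors if f not in inactive_factors]
    return sorted(active, key=lambda f: gain(rows, f, target), reverse=True)

FACTORS = ["factor_a", "factor_b"]
print(rank_factors(EXAMPLES, FACTORS, "outcome"))
# ['factor_a', 'factor_b']
print(rank_factors(EXAMPLES, FACTORS, "outcome", inactive_factors={"factor_a"}))
# ['factor_b']
```

Comparing the rankings, or the trees induced from them, with and without a given factor or batch of examples gives a rough picture of that element's impact.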
Advantages of Induction
Discovers rules from examples.
Avoids knowledge elicitation problem.
Can produce new knowledge.
Can uncover critical decision factors.
Can eliminate irrelevant decision factors.
Can uncover contradictions.
Contradictions
Example: pump diagnosis system
Sometimes a contradiction is acceptable.
Indicates a problem:
◦ A bad example.
◦ Inadequate decision factors or values.
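A contradiction in an example set means that two examples agree on every decision factor value but disagree on the outcome. The check can be sketched as below. The pump examples from the slide are not preserved, so the factor names, values, and faults here are hypothetical, as is the function name `find_contradictions`.

```python
from collections import defaultdict

def find_contradictions(examples, factors, target):
    """Map each combination of factor values to its outcomes,
    keeping only combinations that lead to more than one outcome."""
    outcomes = defaultdict(set)
    for example in examples:
        key = tuple(example[f] for f in factors)
        outcomes[key].add(example[target])
    return {key: sorted(vals) for key, vals in outcomes.items() if len(vals) > 1}

# Hypothetical pump examples, invented for illustration only.
examples = [
    {"pressure": "low", "noise": "yes", "fault": "worn impeller"},
    {"pressure": "low", "noise": "yes", "fault": "blocked inlet"},
    {"pressure": "normal", "noise": "no", "fault": "none"},
]

result = find_contradictions(examples, ["pressure", "noise"], "fault")
print(result)
# {('low', 'yes'): ['blocked inlet', 'worn impeller']}
```

A reported combination points either to a bad example or to a missing decision factor or value that would tell the two cases apart.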
Disadvantages of Induction
Often difficult to choose good decision factors.
Difficult to understand rules.
Applicable only for classification problems.
Expert Systems Developed Through Induction
AQ11
WILLARD
Transformer Fault Diagnosis
Customer Support
VAX-VMS Operating System Tuning
Predicting Stock Market Behavior
AQ11
Developed in 1980 for diagnosing soybean diseases.
Capable of identifying 15 diseases.
630 examples of diseased soybean plants.
Uses 35 decision factors.
A special example selection program, ESEL, was used to select 290 considerably different examples.
AQ11 (cont.)
AQ11 formulated a set of rules for classifying a new example into one of the 15 categories.
The developers also built a rule-based system from the knowledge of a plant pathologist.
The remaining 340 examples were used to compare the two systems.
The rule-based system scored 71.8%. AQ11 scored 97.6%.
WILLARD
Developed to aid in forecasting thunderstorms in the United States, NSSFC (1984).
Uses 140 examples of thunderstorm weather data.
30 modules, each with a single decision tree.
Characterizes the forecast as none, approaching, slight, moderate, or high, with a probability range.
Was developed using the RULEMASTER induction tool.
Was tested for one week in the spring of 1984, in Texas, Oklahoma, and Colorado.
5 thunderstorms passed through this region.
WILLARD's forecasts were found to compare favorably with those made by an expert meteorologist.
WILLARD (cont.)
Transformer Fault Diagnosis
Designed to detect early signs of transformer faults in order to avoid potential failure.
An expert system for evaluating gas-in-oil test results.
Built from knowledge contained in past examples, using the RULEMASTER rule induction tool.
Contains 27 modules, each with its own induced rule.
Tested using data from 900 tests. 90% agreed with the expert’s analysis.
Customer Support
SCREENIO, a software product by NORCOM, allows the user to design IBM PC screens for Realia COBOL programs.
NORCOM developed an expert system to aid their support personnel.
Based on 9 months of data on typical customer problems.
Uses 9 decision factors and was developed in 1 day using 1STCLASS.
Is used by support personnel who ask the customer for values for each of the decision factors.
Made a major improvement in customer support responsiveness and efficiency.
Customer Support (cont.)
An expert system to help tune the VAX-VMS operating system (1984).
Tuning is a complex and dynamic task. Over 150 parameters must be set by the system manager.
Adjustments and modifications are required in response to changes in configuration and loading.
The developed system collects data on present system performance and generates a summary report.
VAX-VMS Operating System Tuning
Interacts with the system manager by asking questions that lead to a recommended action.
Recommendations:
◦ Adjusting system parameters
◦ Adjusting user authorization values
◦ Redistributing or reducing user demand
◦ Purchasing new hardware
◦ …
VAX-VMS Operating System Tuning (cont.)
Was developed using the induction tool TIMM.
The effectiveness is measured by comparing the performance before and after the recommended actions are taken.
Has made the management of the operating system a more efficient task.
VAX-VMS Operating System Tuning (cont.)
Predicting stock market behavior is a difficult challenge. Analysts use historical data analysis techniques, which carry a degree of uncertainty.
Braun (1987) developed an expert system based on ID3 to improve the reliability of stock market prediction.
Uses 20 decision factors, whose values were determined from the Wall Street Journal for the period from March 1981 to April 1983.
Predicting Stock Market Behavior
3 different results:
◦ Bullish: upward trend
◦ Bearish: downward trend
◦ Neutral: either call is too risky
The system predicted correctly 64.4% of the time.
The expert analyst predicted correctly 60.2% of the time.
The analyst was impressed with the general structure of the decision tree and with the system's predictions.
Predicting Stock Market Behavior (cont.)