project 2 data mining part 1
Project II: Data Mining a Mushroom Dataset
Group 1
Raymond Borges
Jarilyn Hernandez
The Mushroom Dataset
Data Set Characteristics: Multivariate
Number of Instances: 8124
Area: Life
Attribute Characteristics: Categorical
Number of Attributes: 22
Date Donated: 1987
This data set includes descriptions of hypothetical samples
corresponding to 23 species of gilled mushrooms in the
Agaricus and Lepiota Family.
Each species is identified as definitely edible, definitely
poisonous, or of unknown edibility and not recommended.
This latter class was combined with the poisonous one.
Mushroom Dataset
22 independent attributes
1 class attribute (Can you eat it?)
Edible: 4,208 (51.8%)
Poisonous: 3,916 (48.2%)
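The class percentages follow directly from the counts. Below is a minimal Python sketch of that arithmetic. The names `counts` and `class_percentages` are illustrative and do not come from the project. The sketch also shows that the classes are close to balanced: always guessing "edible" would be right only about 51.8% of the time. That is the trivial baseline the rules below improve on.

```python
# Class counts as given on the slide (UCI Mushroom dataset).
counts = {"edible": 4208, "poisonous": 3916}

def class_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each class's share of all instances, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

print(sum(counts.values()))       # 8124
print(class_percentages(counts))  # {'edible': 51.8, 'poisonous': 48.2}
```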
Mushroom Dataset
22 Attributes Total
18 intrinsic to the mushroom (cap, gills, stalk, veil, ring, and spore print)
4 others: habitat, population, bruises, odor
Odor Attribute, 1R Learner
The simplest rule: 98.52% accuracy
A = almond
C = creosote
F = foul
L = anise
M = musty
N = none
P = pungent
S = spicy
Y = fishy
[Figure: 1R rule on odor, by odor code a, c, f, l, m, n, p, s, y]
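1R (Holte's "One Rule") builds one rule per attribute, mapping each attribute value to its most frequent class. It then keeps the attribute whose rule makes the fewest training errors; on this dataset the slides report that it picks odor, at 98.52% accuracy. Below is a minimal Python sketch of that idea. It is not necessarily the tool the project used, and it skips the numeric discretization that full OneR implementations perform, which this all-categorical dataset does not need. The rows are hypothetical toy data using UCI's single-letter codes (odor n = none, f = foul, a = almond; habitat g = grasses, w = woods).

```python
from collections import Counter, defaultdict

def one_r(rows, attributes, label="class"):
    """1R: for each attribute, map every value to its majority class,
    count the training errors, and keep the attribute with the fewest."""
    best = None
    for attr in attributes:
        # Tally class labels separately for each value of this attribute.
        tallies = defaultdict(Counter)
        for row in rows:
            tallies[row[attr]][row[label]] += 1
        # Each value predicts its most common class.
        rule = {value: c.most_common(1)[0][0] for value, c in tallies.items()}
        # Errors are the rows whose class differs from their value's prediction.
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in tallies.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    attr, rule, errors = best
    return attr, rule, 1 - errors / len(rows)

# Hypothetical toy rows, not the real dataset.
rows = [
    {"odor": "n", "habitat": "g", "class": "e"},
    {"odor": "n", "habitat": "w", "class": "e"},
    {"odor": "f", "habitat": "g", "class": "p"},
    {"odor": "f", "habitat": "w", "class": "p"},
    {"odor": "a", "habitat": "w", "class": "e"},
    {"odor": "n", "habitat": "g", "class": "p"},
]
attr, rule, acc = one_r(rows, ["odor", "habitat"])
print(attr, rule, acc)  # odor wins: 1 error out of 6, versus 2 for habitat
```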
J48 Tree: 100% Classification
[Figure: J48 decision tree. The root splits on odor (almond, anise, creosote, fishy, foul, musty, none, pungent, spicy); further splits use spore-print-color (black, brown, buff, chocolate, green, orange, purple, white, yellow), gill-size (broad, narrow), gill-spacing (close, crowded, distant), and population (abundant, clustered, numerous, scattered, several, solitary). Leaves: E = Edible, P = Poisonous.]
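J48 is Weka's implementation of C4.5, which chooses each split by gain ratio. Gain ratio is the information gain from splitting on an attribute, divided by the entropy of that attribute's own value distribution (the split information). The Python sketch below computes only that split criterion. Full C4.5 adds tree growing, pruning, and missing-value handling, none of which is shown here. The function names and toy rows are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, label="class"):
    """C4.5-style gain ratio: information gain from splitting on attr,
    divided by the split information (entropy of attr's values)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[label])
    n = len(rows)
    # Weighted entropy left in the class labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([r[label] for r in rows]) - remainder
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy rows: odor separates the classes perfectly, habitat not at all.
rows = [
    {"odor": "f", "habitat": "g", "class": "p"},
    {"odor": "f", "habitat": "w", "class": "p"},
    {"odor": "n", "habitat": "g", "class": "e"},
    {"odor": "n", "habitat": "w", "class": "e"},
]
print(gain_ratio(rows, "odor"), gain_ratio(rows, "habitat"))  # 1.0 0.0
```

An attribute that cleanly separates the classes scores highest, which is consistent with odor appearing at the root of the tree above.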
Simplest Rule Set (Benchmark)
A mushroom is classified as poisonous if any of these rules fires. Each accuracy figure is cumulative, covering rules 1 through n applied together:
1. Odor = not almond, anise, or none
(120 poisonous cases missed, 98.52% accuracy)
2. Spore-print-color = green
(48 cases missed, 99.41% accuracy)
3. Odor = none and stalk-surface-below-ring = scaly and stalk-color-above-ring = not brown
(8 cases missed, 99.90% accuracy)
4. Habitat = leaves and cap-color = white, or alternatively population = clustered and cap-color = white
(100% accuracy)
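The rule set can be applied as an ordered checklist. Below is a minimal Python sketch. It assumes each mushroom is a dict keyed by the attribute names used in the rules, with full-word values as on the slides rather than UCI's one-letter codes. `is_poisonous` and `accuracy_after_misses` are hypothetical helpers. The second function reproduces the cumulative accuracy figures from the missed-case counts over all 8,124 instances.

```python
def is_poisonous(m: dict) -> bool:
    """Apply the benchmark rules in order; the first rule that fires means poisonous.
    Assumes full-word attribute values, as written on the slides."""
    # Rule 1: any odor other than almond, anise, or none.
    if m["odor"] not in {"almond", "anise", "none"}:
        return True
    # Rule 2: green spore print.
    if m["spore-print-color"] == "green":
        return True
    # Rule 3: no odor, scaly stalk surface below the ring, stalk above the ring not brown.
    if (m["odor"] == "none"
            and m["stalk-surface-below-ring"] == "scaly"
            and m["stalk-color-above-ring"] != "brown"):
        return True
    # Rule 4: leaf habitat with a white cap
    # (the slide's alternative is population == "clustered" with a white cap).
    if m["habitat"] == "leaves" and m["cap-color"] == "white":
        return True
    return False

def accuracy_after_misses(missed: int, total: int = 8124) -> float:
    """Accuracy in percent when `missed` poisonous cases are still classified edible."""
    return round(100 * (1 - missed / total), 2)

sample = {"odor": "none", "spore-print-color": "white",
          "stalk-surface-below-ring": "smooth", "stalk-color-above-ring": "white",
          "habitat": "woods", "cap-color": "brown"}
print(is_poisonous(sample))                               # False
print([accuracy_after_misses(k) for k in (120, 48, 8)])   # [98.52, 99.41, 99.9]
```

Because the rules are checked in order, `is_poisonous({"odor": "foul"})` returns `True` as soon as rule 1 fires, without reading any other attribute.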
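Habitat Insights
[Figure: edible vs. poisonous mushrooms by habitat: woods, grasses, leaves, meadows, paths, urban, waste]
Mushrooms from waste habitats are safe in this dataset, but stay away from those found on paths.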
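Per-habitat rates like the ones behind this chart come from a simple group-by. Below is a minimal Python sketch. It assumes rows are dicts with a `class` field of "e" or "p". The function name and toy rows are hypothetical and are not the project's data. The same function works for any categorical attribute, such as population.

```python
from collections import Counter

def poisonous_rate_by(rows, attr, label="class", poisonous="p"):
    """Percentage of poisonous rows for each value of `attr`, rounded to one decimal."""
    totals, bad = Counter(), Counter()
    for r in rows:
        totals[r[attr]] += 1
        if r[label] == poisonous:
            bad[r[attr]] += 1
    return {v: round(100 * bad[v] / totals[v], 1) for v in totals}

# Hypothetical toy rows, not the real dataset.
rows = [
    {"habitat": "waste", "class": "e"},
    {"habitat": "waste", "class": "e"},
    {"habitat": "paths", "class": "p"},
    {"habitat": "paths", "class": "p"},
    {"habitat": "paths", "class": "e"},
    {"habitat": "woods", "class": "p"},
    {"habitat": "woods", "class": "e"},
]
print(poisonous_rate_by(rows, "habitat"))  # {'waste': 0.0, 'paths': 66.7, 'woods': 50.0}
```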
Population Insights
[Figure: edible vs. poisonous mushrooms by population: abundant, clustered, numerous, scattered, several, solitary]
Mushrooms travel safer in groups
Information Knowledge
[Figure: Population Data % Rates vs. Mushrooms: % poisonous and % edible for each population value (abundant, clustered, numerous, scattered, several, solitary)]
[Figure: Poisonous/Edible Ratio vs. Mushroom Population Density; x-axis: mushroom density, y-axis: poisonous/edible ratio (%); points labeled solitary, several, scattered, numerous, clustered, abundant]
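The ratio chart divides the poisonous count by the edible count for each population value. Below is a minimal Python sketch. The density ordering is an assumption, taken from the order in which the chart's labels appear (solitary through abundant), and the names and toy rows are hypothetical. A value with no edible mushrooms would make the ratio undefined, so the sketch reports infinity for it rather than dividing by zero.

```python
from collections import Counter

# Assumed sparse-to-dense ordering (hypothetical), following the chart's label order.
DENSITY_ORDER = ["solitary", "several", "scattered", "numerous", "clustered", "abundant"]

def poison_edible_ratio(rows, attr="population", label="class"):
    """Poisonous/edible ratio, in percent, for each value present, in DENSITY_ORDER."""
    counts = Counter((r[attr], r[label]) for r in rows)
    result = []
    for v in DENSITY_ORDER:
        p, e = counts[(v, "p")], counts[(v, "e")]
        if p == 0 and e == 0:
            continue  # value does not occur in the data
        result.append((v, round(100 * p / e, 1) if e else float("inf")))
    return result

# Hypothetical toy rows, not the real dataset.
rows = [
    {"population": "several", "class": "p"},
    {"population": "several", "class": "p"},
    {"population": "several", "class": "e"},
    {"population": "abundant", "class": "e"},
    {"population": "abundant", "class": "e"},
]
print(poison_edible_ratio(rows))  # [('several', 200.0), ('abundant', 0.0)]
```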
Conclusions
If it stinks (any odor other than almond, anise, or none), don't eat it: 98.52% accuracy
If it doesn't stink and its spore-print color is not green, the two rules together are 99.41% accurate on this dataset
Odor and spore-print color may be the best attributes statistically, but they are not the most practical ones in the field
Future Work
Use more easily identified attributes to classify mushrooms, producing a simpler method of visual classification
Eliminate nonvisual attributes
Focus on visual-cue attributes, e.g. habitat, population, cap, and stalk
Compare the two methods