![Page 1: 1 Using Bayesian networks for Water Quality Prediction in Sydney Harbour Ann Nicholson Shannon Watson, Honours 2003 Charles Twardy, Research Fellow School](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d2b5503460f94a0040e/html5/thumbnails/1.jpg)
1
Using Bayesian networks for Water Quality Prediction in Sydney Harbour
Ann Nicholson
Shannon Watson, Honours 2003
Charles Twardy, Research Fellow
School of Computer Science and Software Engineering
Monash University
2
Overview
Representing uncertainty
Introduction to Bayesian Networks
» Syntax, semantics, examples
The knowledge engineering process
Sydney Harbour Water Quality Project 2003
Summary of other BN research
3
Sources of Uncertainty
Ignorance
Inexact observations
Non-determinism
AI representations
» Probability theory
» Dempster-Shafer
» Fuzzy logic
4
Probability theory for representing uncertainty
Assigns a numerical degree of belief between 0 and 1 to facts
» e.g. “it will rain today” is T/F
» P(“it will rain today”) = 0.2 is a prior (unconditional) probability
Posterior probability (conditional)
» P(“it will rain today” | “rain is forecast”) = 0.8
Bayes’ Rule: $P(H \mid E) = \dfrac{P(E \mid H) \times P(H)}{P(E)}$
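As a worked illustration of Bayes’ rule, the sketch below recovers the posterior of 0.8 from the prior of 0.2. The two likelihoods, P(forecast | rain) = 0.8 and P(forecast | no rain) = 0.05, are hypothetical values chosen for the illustration. They do not come from the slide.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' rule for a binary hypothesis H and evidence E.

    P(H|E) = P(E|H) P(H) / P(E), where
    P(E) = P(E|H) P(H) + P(E|not H) (1 - P(H)).
    """
    p_evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)
    return likelihood * prior / p_evidence


# Hypothetical forecast reliability (not from the slide):
# P(forecast | rain) = 0.8, P(forecast | no rain) = 0.05
p_rain_given_forecast = posterior(prior=0.2, likelihood=0.8, likelihood_if_not=0.05)
print(round(p_rain_given_forecast, 3))  # prints 0.8
```

Here P(E) = 0.8 × 0.2 + 0.05 × 0.8 = 0.2, so P(H | E) = 0.16 / 0.2 = 0.8. Evidence that is equally likely under both hypotheses leaves the prior unchanged.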
5
Bayesian networks
A Bayesian Network (BN) represents a joint probability distribution graphically, as a directed acyclic graph (DAG)
Nodes: random variables, e.g.
» R: “it is raining”, discrete values T/F
» T: temperature, continuous or discrete variable
» C: colour, discrete values {red, blue, green}
Arcs indicate conditional dependencies between variables
P(A,S,T) can be decomposed as P(A)P(S|A)P(T|A) (A is the parent of both S and T)
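In general, a BN encodes the joint distribution as a product of one conditional distribution per node given its parents. The A, S, T case below assumes, as the decomposition above implies, that A has no parents and is the only parent of S and T. The parameter count assumes all three variables are binary.

$$
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)
\qquad\Rightarrow\qquad
P(A, S, T) = P(A)\,P(S \mid A)\,P(T \mid A)
$$

With binary variables, the full joint table needs $2^3 - 1 = 7$ free parameters. The factored form needs $1 + 2 + 2 = 5$: one for $P(A)$ and two each for $P(S \mid A)$ and $P(T \mid A)$.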
6
Bayesian networks
Conditional Probability Distribution (CPD)
– Associated with each variable
– Probability of each state given parent states
Example network: Flu → Te → Th
» Flu: “Jane has the flu”
» Te: “Jane has a high temp”
» Th: “Thermometer temp reading”
» Flu → Te models a causal relationship; Te → Th models possible sensor error
P(Flu=T) = 0.05
P(Te=High|Flu=T) = 0.4, P(Te=High|Flu=F) = 0.01
P(Th=High|Te=H) = 0.95, P(Th=High|Te=L) = 0.1
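A minimal Python sketch of how these three CPDs define the full joint distribution, using the chain-rule factorisation P(Flu, Te, Th) = P(Flu) P(Te | Flu) P(Th | Te). Encoding each variable as a Boolean (True = T/High) and the helper names are assumptions made for the sketch.

```python
from itertools import product

# CPDs from the slide; True means Flu=T, Te=High, Th=High
P_FLU = 0.05                          # P(Flu=T)
P_TE_HIGH = {True: 0.4, False: 0.01}  # P(Te=High | Flu)
P_TH_HIGH = {True: 0.95, False: 0.1}  # P(Th=High | Te)


def bern(p_true, value):
    """Probability of a Boolean value, given the probability that it is True."""
    return p_true if value else 1.0 - p_true


def joint(flu, te, th):
    """P(Flu, Te, Th) = P(Flu) * P(Te | Flu) * P(Th | Te)."""
    return bern(P_FLU, flu) * bern(P_TE_HIGH[flu], te) * bern(P_TH_HIGH[te], th)


# Enumerate all 2^3 = 8 entries of the joint distribution
table = {w: joint(*w) for w in product([True, False], repeat=3)}
# Marginal P(Te=High): sum the joint over Flu and Th
p_te_high = sum(p for (flu, te, th), p in table.items() if te)
print(round(sum(table.values()), 6), round(p_te_high, 4))
```

The eight entries sum to 1. The marginal P(Te=High) = 0.05 × 0.4 + 0.95 × 0.01 = 0.0295.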
7
BN inference
Evidence: observation of a specific state
Task: compute the posterior probabilities for the query node(s) given the evidence
[Figure: diagnostic, predictive, intercausal (Flu and TB as causes of Te) and mixed inference on the Flu/Te/Th network]
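Inference by enumeration is the simplest, exponential-time way to answer such queries. It sums the joint over every world consistent with the evidence and then normalises. The sketch below applies it to the flu CPDs from the previous slide to show predictive and diagnostic inference. The function names are illustrative, and practical BN software uses more efficient exact or approximate algorithms.

```python
from itertools import product

P_FLU = 0.05                          # P(Flu=T)
P_TE_HIGH = {True: 0.4, False: 0.01}  # P(Te=High | Flu)
P_TH_HIGH = {True: 0.95, False: 0.1}  # P(Th=High | Te)
NAMES = ("Flu", "Te", "Th")


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(flu, te, th):
    return bern(P_FLU, flu) * bern(P_TE_HIGH[flu], te) * bern(P_TH_HIGH[te], th)


def query(target, evidence):
    """P(target=True | evidence) by enumerating all worlds consistent with the evidence."""
    numerator = denominator = 0.0
    for world in product([True, False], repeat=len(NAMES)):
        values = dict(zip(NAMES, world))
        if any(values[var] != val for var, val in evidence.items()):
            continue  # this world contradicts the evidence
        p = joint(*world)
        denominator += p
        if values[target]:
            numerator += p
    return numerator / denominator


predictive = query("Th", {"Flu": True})  # P(Th=High | Flu=T)
diagnostic = query("Flu", {"Th": True})  # P(Flu=T | Th=High)
print(round(predictive, 3), round(diagnostic, 3))  # prints 0.44 0.176
```

With these numbers, a high thermometer reading raises P(Flu=T) only from 0.05 to about 0.18. The low prior for flu and the thermometer's 0.1 false-reading rate keep the diagnostic posterior small.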
8
BN software
Commercial packages: Netica, Hugin, Analytica (all with demo versions)
Free software: SMILE, GeNIe, JavaBayes (see Appendix B, Korb & Nicholson, 2004)
Example running Netica software
9
Decision networks
Extension to basic BN for decision making
» Decision nodes
» Utility nodes
$EU(\mathit{Action}) = \sum_{o} P(o \mid \mathit{Action}, E)\, U(o)$
» choose the action with the highest expected utility
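A small sketch of the expected-utility calculation. The decision (whether to post a pollution warning), the probability of pollution and the utility values are all hypothetical. In this example the outcome probabilities do not depend on the action, so each action's EU is a weighted sum of its own utilities over the same distribution.

```python
def expected_utility(outcome_probs, utility):
    """EU(Action) = sum over outcomes o of P(o | Action, E) * U(o)."""
    return sum(p * utility[o] for o, p in outcome_probs.items())


def best_action(outcome_probs_by_action, utility_by_action):
    """Return the action with the highest expected utility, plus all EUs."""
    eu = {a: expected_utility(outcome_probs_by_action[a], utility_by_action[a])
          for a in utility_by_action}
    return max(eu, key=eu.get), eu


# Hypothetical decision: post a pollution warning or not, given P(polluted | E) = 0.3
p_polluted = 0.3
probs = {"polluted": p_polluted, "clean": 1.0 - p_polluted}
outcome_probs = {"warn": probs, "no_warning": probs}  # a warning does not change the water
utilities = {
    "warn": {"polluted": 0.0, "clean": -10.0},         # hypothetical cost of an unnecessary warning
    "no_warning": {"polluted": -100.0, "clean": 0.0},  # hypothetical cost of exposing swimmers
}
action, eu = best_action(outcome_probs, utilities)
print(action, eu)
```

EU(warn) = 0.7 × (−10) = −7 and EU(no_warning) = 0.3 × (−100) = −30, so warning is chosen even though pollution is less likely than not.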
Example
10
Elicitation from experts
Variables
» important variables? values/states?
Structure
» causal relationships?
» dependencies/independencies?
Parameters (probabilities)
» quantify relationships and interactions?
Preferences (utilities)
11
Expert Elicitation Process
These stages are done iteratively
Stops when further expert input is no longer cost effective
Process is difficult and time consuming
Current BN tools
» inference engine
» GUI
Next generation of BN tools?
[Figure: BN expert, BN tools and domain expert]
12
Knowledge discovery
There is much interest in automated methods for learning BNs from data
» parameters, structure (causal discovery)
Computationally complex problem, so current methods have practical limitations
» e.g. limit the number of states, require variable ordering constraints, do not specify all arc directions
Evaluation methods
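Parameter learning, the easier of the two tasks, can be sketched as counting. For complete discrete data, the maximum-likelihood CPT entry is the relative frequency of each child state within each parent configuration. The sketch adds an optional symmetric Dirichlet (Laplace) pseudo-count `alpha`. The variable names and the ten cases are hypothetical, and this is not the method of any particular tool mentioned here.

```python
from collections import Counter


def learn_cpt(cases, child, parent, alpha=1.0):
    """Estimate P(child | parent) from complete discrete cases by counting.

    alpha is a symmetric Dirichlet (Laplace) pseudo-count per child state;
    alpha=0 gives the maximum-likelihood (relative frequency) estimate.
    """
    child_states = sorted({case[child] for case in cases})
    parent_states = sorted({case[parent] for case in cases})
    counts = Counter((case[parent], case[child]) for case in cases)
    cpt = {}
    for p in parent_states:
        n_parent = sum(counts[(p, s)] for s in child_states)
        denom = n_parent + alpha * len(child_states)
        cpt[p] = {s: (counts[(p, s)] + alpha) / denom for s in child_states}
    return cpt


# Hypothetical cases: rainfall category and whether bacteria exceeded a threshold
cases = (
    [{"Rain": "High", "Bacteria": "Unsafe"}] * 3
    + [{"Rain": "High", "Bacteria": "Safe"}] * 1
    + [{"Rain": "Low", "Bacteria": "Unsafe"}] * 1
    + [{"Rain": "Low", "Bacteria": "Safe"}] * 5
)
mle = learn_cpt(cases, "Bacteria", "Rain", alpha=0.0)
smoothed = learn_cpt(cases, "Bacteria", "Rain", alpha=1.0)
print(mle["High"]["Unsafe"], smoothed["Low"]["Unsafe"])  # prints 0.75 0.25
```

The pseudo-count keeps unseen parent/child combinations from receiving probability zero. That matters when data are sparse.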
13
Knowledge Engineering for Bayesian Networks (KEBN)
1. Building the BN
» variables, structure, parameters, preferences
» combination of expert elicitation and knowledge discovery
2. Validation/Evaluation
» case-based, sensitivity analysis, accuracy testing
3. Field Testing
» alpha/beta testing, acceptance testing
4. Industrial Use
» collection of statistics
5. Refinement
» updating procedures, regression testing
14
The KEBN process
15
Quantitative KE process
16
Water Quality for Sydney Harbour
Water Quality for recreational use
Beachwatch / Harbourwatch Programs
Bacteria samples used as pollution indicators
Many variables influence bacterial levels: rainfall, tide, wind, sunlight, temperature, pH, etc.
17
Past studies
Hose et al. used a multidimensional scaling model of Sydney Harbour
» low predictive accuracy; unable to handle the noisy bacteria samples; explained 63% of bacteria variability (Port Jackson)
Ashbolt and Bruno:
» agree with Hose et al., plus wind effects, sunlight hours, tide
Crowther et al. (UK):
» rainfall, tide, sampling times, sunshine, wind
» explained 53% of bacteria variability
Other models developed by the USEPA to model estuaries:
» QUAL2E: steady-state receiving water model
» WASP: time-varying dispersion model
» EFDC: 3D hydrodynamic model
The EPA in Sydney is interested in a model that applies causal knowledge of the domain
18
EPA Guidelines
Inputs: rainfall Today (T), Yesterday (Y), Day Before Yesterday (DBY); output: Pollution
IF T > 4 THEN Likely
ELSE IF T ≤ 4 AND Y ≤ 4 AND DBY ≤ 4 THEN Unlikely
ELSE IF T 4 AND Y 4 AND DBY 4 THEN Unlikely for 24h flushing, but Likely for 48h flushing
ELSE Likely for all other results
19
Stages of Project
Preparation of EPA data (rainfall only)
Hand-craft simple networks for rainfall data
Comparison of hand-crafted networks with a range of learners (using Weka software)
Using CaMML to learn a BN on an extended data set
Timeline: 2003 Honours project, 2003/04 Summer Vacation project
20
EPA Data
Database 1:
» E. coli and Enterococci counts (cfu/100mL), with thresholds 150 and 35 respectively
» 60 water samples each year since 1994 at 27 sites in Sydney Harbour
» Fields: Enterococci, E. coli, Raining, Sunny, Drain running, temperature, time of sample, direction of sampling run, date, site name, beach code
Database 2:
» Rainfall readings (mm) at 40 locations around Sydney
21
Data Preparation
New file format: Date BeachCode Entc Ecoli D1 D2 D3 D4 D5 D6
D1 = rainfall on day of collection
D6 = rainfall 5 days previously
Rainfall data had many missing entries
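One way to build the D1 to D6 columns from daily gauge records is sketched below. It assumes, as on the Evaluation slide, that rainfall is averaged over all gauges. A gauge with no reading for a day is treated as missing rather than as zero. The data layout (one dict per gauge, keyed by date) and the function name are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def rainfall_lags(sample_day, gauges, n_days=6):
    """Return [D1, ..., Dn] for one water sample.

    D1 is rainfall on the collection day and Dk is rainfall (k - 1) days earlier,
    each averaged over the gauges that have a reading for that day.
    A day on which every gauge is missing becomes None rather than 0.
    """
    lags = []
    for k in range(n_days):
        day = sample_day - timedelta(days=k)
        readings = [g[day] for g in gauges if g.get(day) is not None]
        lags.append(mean(readings) if readings else None)
    return lags


# Hypothetical gauges: dicts mapping date -> rainfall (mm); absent or None = missing entry
d0 = date(2003, 1, 10)
gauge_a = {d0: 2.0, d0 - timedelta(days=1): 10.0}
gauge_b = {d0: 4.0, d0 - timedelta(days=1): None}
print(rainfall_lags(d0, [gauge_a, gauge_b]))
# prints [3.0, 10.0, None, None, None, None]
```

Keeping missing days as None leaves the choice of how to treat them (drop the case, impute, or leave the node unobserved) to a later step.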
22
Rainfall BNs
Hand-crafted BNs to predict bacteria using rainfall only
Started with a deterministic BN that implemented the EPA guidelines
Looked at varying the number of previous days' rainfall used to predict bacteria
Investigated various discretisations of variables
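Discretisation can be sketched as mapping each continuous value to a bin, given sorted cut points. The cut point of 4 echoes the EPA guideline rule. The second cut point (10) and the state labels are hypothetical. A value equal to a cut point goes into the lower bin, which matches the guideline's "T > 4" test.

```python
from bisect import bisect_left


def discretise(value, cut_points, labels):
    """Map a continuous value to a discrete state.

    cut_points must be sorted ascending and len(labels) == len(cut_points) + 1.
    bisect_left puts a value equal to a cut point into the lower bin,
    so with cut point 4 the upper bin means "> 4".
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_left(cut_points, value)]


# Cut point 4 echoes the guideline rule; 10 and the labels are hypothetical
print(discretise(4.0, [4.0], ["Low", "High"]))                        # prints Low
print(discretise(12.5, [4.0, 10.0], ["Light", "Moderate", "Heavy"]))  # prints Heavy
```

Trying different cut-point lists with the same function is one way to compare discretisations.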
23
EPA Guidelines as BN
24
Davidson BN: 1 day rainfall
25
Davidson BN: 6 days rainfall
26
Evaluation
Split data 50-50 training/testing
10-fold cross validation
Measures: predictive accuracy & information reward
Also looked at ROC curves (correct classification vs false positives)
Using Weka: Java environment for machine learning tools and techniques
Small data: 4 beaches: Chinamans, Edwards, Balmoral (all Middle Harbour), Clifton (Port Jackson)
Using 6 days' rainfall averaged from all rain gauges
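The following is a plain k-fold split, sketched to show what 10-fold cross validation does. It shuffles the cases once, partitions them into k folds, and lets each fold serve once as the test set. Unlike Weka's cross validation, which stratifies folds by class when the class is nominal, this sketch does not stratify. The function name and seed are illustrative.

```python
import random


def k_fold_splits(n, k=10, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross validation over n cases.

    Cases are shuffled once with a fixed seed; fold sizes differ by at most one.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_splits(25, k=10))
print([len(test) for _, test in splits])
```

Every case appears in exactly one test fold, so averaging a measure over the k folds uses every case once for testing.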
27
Predictive accuracy
Examine each joint observation in the sample
Add any available evidence for the other nodes
Update the network
Use the value with the highest probability as the predicted value
Compare the predicted value with the actual value
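Once the posterior for the target node has been computed for each case, the procedure above reduces to an argmax-and-compare loop. In this sketch the posteriors are given directly as dictionaries with hypothetical values. Ties go to the first state listed.

```python
def predictive_accuracy(posteriors, actuals):
    """Fraction of cases whose most probable predicted state equals the actual state.

    posteriors: one dict {state: probability} per case, computed after entering
    that case's evidence for the other nodes. Ties go to the first state listed.
    """
    if len(posteriors) != len(actuals) or not actuals:
        raise ValueError("need one posterior per actual value")
    hits = sum(1 for dist, actual in zip(posteriors, actuals)
               if max(dist, key=dist.get) == actual)
    return hits / len(actuals)


# Hypothetical posteriors for the target node in three test cases
posteriors = [
    {"safe": 0.8, "unsafe": 0.2},
    {"safe": 0.4, "unsafe": 0.6},
    {"safe": 0.7, "unsafe": 0.3},
]
print(predictive_accuracy(posteriors, ["safe", "safe", "unsafe"]))  # prints 0.3333333333333333
```

Only the most probable state matters here, so this measure ignores how well calibrated the probabilities are.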
28
Information Reward
Rewards calibration of probabilities
Zero reward for just reporting priors
Unbounded below for a bad prediction
Bounded above by a maximum that depends on priors
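IR = 0
For each state i of the target variable (p[i]: predicted probability; prior[i]: prior probability)
  If i == correct state
    IR += log( p[i] / prior[i] )
  else
    IR += log( (1 - p[i]) / (1 - prior[i]) )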
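A runnable Python sketch of the information reward, written in the prior-relative form. Dividing by each state's prior probability is what gives zero reward for reporting priors. The sketch uses base-2 logarithms, which is an assumption; the base only rescales the reward. It returns negative infinity when the correct state is given probability 0. A learner's score over a test set would then typically be averaged over cases.

```python
import math


def information_reward(posterior, prior, actual):
    """Information reward (in bits) for one case.

    posterior, prior: dicts {state: probability} over the same states,
    with every prior probability strictly between 0 and 1.
    Correct state i:  log2(p[i] / prior[i])
    Other states j:   log2((1 - p[j]) / (1 - prior[j]))
    """
    reward = 0.0
    for state, p in posterior.items():
        q = prior[state]
        ratio = p / q if state == actual else (1.0 - p) / (1.0 - q)
        if ratio == 0.0:
            return -math.inf  # certain and wrong: unbounded below
        reward += math.log2(ratio)
    return reward


prior = {"safe": 0.75, "unsafe": 0.25}
print(information_reward(prior, prior, "safe"))                             # prints 0.0
print(information_reward({"safe": 0.9, "unsafe": 0.1}, prior, "safe") > 0)  # prints True
print(information_reward({"safe": 0.0, "unsafe": 1.0}, prior, "safe"))      # prints -inf
```

The maximum is reached when the correct state gets probability 1. It equals log2(1/prior[correct]) plus log2(1/(1 - prior[j])) for each other state j, so it depends on the priors as stated above.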
29
Evaluation: Weka learners
Naïve Bayes
J48 (version of C4.5)
CaMML: causal BN learner, using the MML metric
AODE
TAN
Logistic
“Davidson” BN: 6 days previous rainfall
» With and without adaptation of parameters (case learning)
“Guidelines” BN: 3 days previous rainfall
» Deterministic rule
» With adaptation of parameters (case learning)
Pr=1/3 Pr=1/3 Pr=1/3
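The "case learning" adaptation can be sketched as Dirichlet-style count updating of one CPT row. The starting distribution is weighted by an "experience" pseudo-count, and each observed case adds one count. This is a simplified sketch, not necessarily the exact algorithm of the BN software used. Starting from a hypothetical deterministic row, it shows how case learning turns 0/1 entries into probabilities.

```python
class AdaptiveRow:
    """One CPT row (the distribution for one parent configuration) that adapts
    to observed cases by Dirichlet-style count updating."""

    def __init__(self, probs, experience=1.0):
        # The starting distribution counts as `experience` pseudo-observations
        self.counts = {s: p * experience for s, p in probs.items()}

    def observe(self, state):
        """Add one observed case for this parent configuration."""
        self.counts[state] += 1.0

    def probs(self):
        total = sum(self.counts.values())
        return {s: c / total for s, c in self.counts.items()}


# Hypothetical deterministic row: "no rain on any of the 3 days -> pollution Unlikely"
row = AdaptiveRow({"Unlikely": 1.0, "Likely": 0.0}, experience=1.0)
for observed in ["Likely", "Unlikely", "Unlikely"]:
    row.observe(observed)
print(row.probs())  # prints {'Unlikely': 0.75, 'Likely': 0.25}
```

A deterministic row assigns probability 0 to outcomes that do occur, which is heavily penalised by information reward. Adapted rows avoid this.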
30
Results
| Learner | Pred. Accuracy | Info Reward |
|---|---|---|
| Prior | 0.758 | 0 |
| Naïve Bayes | 0.760 | -0.729 |
| J48 | 0.791 | 0.125 |
| CaMML | 0.764 | 0.122 |
| AODE | 0.769 | 0.128 |
| TAN | 0.775 | -1.459 |
| Logistic | 0.787 | 0.128 |
| Davidson | 0.757 | -0.272 |
| Davidson CL | 0.776 | 0.033 |
| Guidelines (det) | 0.530 | -2.318 |
| Guidelines CL | 0.776 | 0.058 |
31
Results: ROC Curves
32
Results: area under ROC Curves
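| Learner | Area under ROC curve |
|---|---|
| Perfect | 0.999 |
| AODE | 0.733 |
| Logistic | 0.729 |
| CaMML | 0.718 |
| J48 | 0.689 |
| Naïve Bayes | 0.679 |
| Davidson CL | 0.645 |
| Guidelines CL | 0.643 |
| Guidelines | 0.637 |
| Davidson | 0.620 |
| TAN | 0.561 |
| Prior | 0.496 |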
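The area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. This is the Mann-Whitney statistic, and it gives a compact way to compute the area. A predictor that gives every case the same score gets 0.5, close to the Prior row above. The scores and labels in the example are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve as the Mann-Whitney statistic: the probability that
    a random positive case scores higher than a random negative one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted P(event) and true labels (1 = event)
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # prints 0.75
print(auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # prints 0.5
```

The pairwise loop is quadratic in the number of cases. A sort-based rank computation gives the same value faster on large test sets.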
33
Results: ROC Curves
For ~20% false-positive rate, can get ~60% of events
For ~45% false-positive rate, can get ~75% of events
For ~60% false-positive rate, can get ~80% of events
Implications?
» Using current guidelines, if we accept a 45% false-positive rate, we get a 60% hit rate
» Can either keep that false-positive rate and get an extra 15% of events
» Or keep the same hit rate at about half the false-positive rate
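Each point on an ROC curve comes from one decision threshold on the predicted probability of a pollution event. The sketch computes the false-positive rate and hit rate for a given threshold; sweeping the threshold traces out the curve. The scores and labels are hypothetical.

```python
def roc_point(scores, labels, threshold):
    """(false-positive rate, hit rate) when an event is predicted for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    return fp / n_neg, tp / n_pos


scores = [0.9, 0.8, 0.4, 0.3]  # hypothetical predicted P(event)
labels = [1, 0, 1, 0]          # hypothetical outcomes (1 = event)
for t in (0.85, 0.5, 0.35):
    print(t, roc_point(scores, labels, t))
```

Lowering the threshold can only raise both rates. Choosing an operating point is therefore a trade-off between missed events and false alarms, as in the implications above.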
34
Example of CaMML BN
35
Future Directions?
36
37
Early BN-related projects
DBNs for discrete monitoring (PhD, 1992)
Approximate BN inference algorithms based on a mutual information measure for relevance (with Nathalie Jitnah, 1996-1999)
Plan recognition: DBNs for predicting users' actions and goals in an adventure game (with David Albrecht, Ingrid Zukerman, 1997-2000)
DBNs for ambulation monitoring and fall diagnosis (with biomedical engineering, 1996-2000)
Bayesian Poker (with Kevin Korb, 1996-2003)
38
Knowledge Engineering with BNs
Seabreeze prediction: joint project with Bureau of Meteorology
» Comparison of existing simple rule, expert-elicited BN, and BNs from Tetrad-II and CaMML
ITS for decimal misconceptions
Methodology and tools to support the knowledge engineering process
» Matilda: visualisation of d-separation
» Support for sensitivity analysis
Written a textbook:
» Bayesian Artificial Intelligence, Kevin B. Korb and Ann E. Nicholson, Chapman & Hall / CRC, 2004. www.csse.monash.edu.au/bai/book
39
Current BN-related projects
BNs for Epidemiology (with Kevin Korb, Charles Twardy)
» ARC Discovery Grant, 2004
» Looking at Coronary Heart Disease data sets
» Learning hybrid networks: continuous and discrete variables
BNs for supporting the meteorological forecasting process (DSS’2004) (with PhD student Tal Boneh, K. Korb, BoM)
» Building a domain ontology (in Protege) from expert elicitation
» Automatically generating BN fragments
» Case studies: fog, hailstorms, rainfall
Ecological risk assessment
» Goulburn Water, native fish abundance
» Sydney Harbour Water Quality
40
Open Research Questions
Methodology for combining expert elicitation and automated methods
» expert knowledge used to guide search
» automated methods provide alternatives to be presented to experts
Evaluation measures and methods
» may be domain dependent
Improved tools to support elicitation
» reduce reliance on BN expert
» e.g. visualisation of d-separation
Industry adoption of BN technology