The bycatch of Bayes Nets
Kerrie Mengersen QUT
Australia
Australian Research Council Centre of Excellence
Mathematical & Statistical Frontiers:
Big Data, Big Models, New Insights
7 year horizon
6 Universities
7 Partner Organisations
18 CIs, 8 PIs, 23 AIs, 18 RAs, 40 PhDs
Bayesian Research and Applications Group (BRAG)
Our vision: To engage in world-class, relevant fundamental and collaborative statistical research, training and application through Bayesian (and other) modelling + fast computation + translation
Bayesian stats + food security
• Process modelling for plant biosecurity
• Conservation
• Surveillance design
• “Intelli-sensing”, e.g. satellite data and UAVs
Spiralling whitefly (Aleurodicus dispersus)
Countries where spiralling whitefly has been detected. Administrative regions within some countries are shown when documented. Source (CABI 2004, Monteiro et al. 2005, CABI 2006). Personal communications (J.H. Martin, 2008, B.M. Waterhouse, 2008)
The Problem
• Major tropical plant pest
• Lives on 100+ hosts
• Restricts market access to other states
Information
• Literature: characteristics, growth, spread
• Detectability (inspectors)
• Surveillance data (> 30 000 records)
Scope of modelling: local, district and statewide
• Data Model: Pr(data | incursion process and data parameters) – How data is observed given underlying pest extent
• Process Model: Pr(incursion process | process parameters) – Potential extent given epidemiology / ecology
• Parameter Model: Pr(data and process parameters) – Prior distribution to describe uncertainty in detectability, exposure, growth …
• The posterior distribution of the incursion process (and parameters) is related to the prior distribution and data by:
$$\Pr(\text{process}, \text{parameters} \mid \text{data}) \propto \Pr(\text{data} \mid \text{process}, \text{parameters})\,\Pr(\text{process} \mid \text{parameters})\,\Pr(\text{parameters})$$
Hierarchical Bayesian model
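The three levels can be illustrated with a small grid approximation. This is a hypothetical sketch that is much simpler than the whitefly model. It assumes one infestation probability `theta` shared by all surveyed sites (process model), a known probability `d` that an infested site is detected (data model), and a flat prior on `theta` (parameter model). The latent infestation status of each site is summed out, so each inspection yields a detection with probability `theta * d`.

```python
# Grid-approximation sketch of a three-level hierarchical model.
# Hypothetical assumptions: one infestation probability theta for all sites,
# a known detection probability d, and a flat prior on theta.

def posterior_theta(n_sites, n_detected, d, grid_size=101):
    """Return (grid, posterior weights) for theta given detection counts."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size  # parameter model: flat prior
    unnormalised = []
    for theta, prior_weight in zip(grid, prior):
        # Process + data model with the latent infestation state summed out:
        # Pr(detection at a site | theta) = Pr(infested) * Pr(detected | infested).
        # The binomial coefficient is constant in theta, so it is omitted.
        p_detect = theta * d
        likelihood = p_detect ** n_detected * (1.0 - p_detect) ** (n_sites - n_detected)
        unnormalised.append(likelihood * prior_weight)
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]


# Hypothetical survey: 50 sites inspected, 3 detections, detectability 0.6.
grid, post = posterior_theta(n_sites=50, n_detected=3, d=0.6)
posterior_mean = sum(t * w for t, w in zip(grid, post))
print(f"posterior mean of theta: {posterior_mean:.3f}")
```

The data model includes imperfect detection, so the posterior mean of `theta` lies above the raw detection rate of 3/50.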
Early Warning Surveillance
Priors
Surveillance data
Posterior learning:
• modest reduction in area freedom
• large reduction in estimated extent
• residual “risk” maps to target surveillance
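Posterior learning about area freedom depends partly on how repeated negative surveillance lowers the probability that the pest is present. A simple special case makes this concrete. The sketch assumes (hypothetically) independent surveys, a per-survey detection probability `sensitivity` when the pest is present, and no false positives. Under those assumptions, Bayes' rule gives the probability that the pest is still present after `n_negative` negative surveys.

```python
def prob_present_after_negatives(prior, sensitivity, n_negative):
    """Posterior Pr(pest present | n negative surveys), by Bayes' rule.

    Assumes independent surveys and perfect specificity (a survey of a
    pest-free area is always negative).
    """
    # Pr(all n surveys miss the pest | present)
    miss_all = (1.0 - sensitivity) ** n_negative
    # Pr(all negative | absent) = 1, so the denominator is
    # prior * miss_all + (1 - prior)
    return prior * miss_all / (prior * miss_all + (1.0 - prior))


# Hypothetical prior Pr(present) = 0.3 and per-survey sensitivity 0.2.
for n in (0, 1, 5, 10):
    print(n, round(prob_present_after_negatives(0.3, 0.2, n), 3))
```

Each additional negative survey lowers the probability of presence. With an imperfect survey, however, that probability never reaches zero.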
Invasion Parameter Estimates
Useful for local management
Observation parameter estimates
Also learn about:
• Host suitability
• Inspector efficiency
Conservation and food security
Modelling complex systems
Economic, human impact, government, biology, social, environment, external factors, unknowns
“There's so much talk about the system. And so little understanding.”
Robert Pirsig
Zen and the Art of Motorcycle Maintenance
“Move away from indicators reported separately towards methods based on understanding complexity and emergence.”
Tony Morton
Systems Models
Bayesian Networks
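Nodes E and F are the parents of node G.
Pr(G | E, F)
E | F | G = normal | G = high
yes | low | 0.4 | 0.6
yes | medium | 0.2 | 0.8
yes | high | 0.1 | 0.9
no | low | 0.5 | 0.5
no | medium | 0.6 | 0.4
no | high | 0.4 | 0.6
Pr(F)
low | 0.7
medium | 0.2
high | 0.1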
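The two tables for Pr(G | E, F) and Pr(F) are enough to answer two kinds of query, provided we assume that F does not depend on E. That assumption is needed because only a marginal distribution is given for F. No prior is given for E, so the sketch conditions on E rather than averaging over it. Predictive reasoning sums F out. Diagnostic reasoning applies Bayes' rule in the reverse direction, from G back to F.

```python
# CPT for G given its parents E and F, copied from the table above:
# only Pr(G = high | E, F) is stored, since Pr(G = normal | E, F) = 1 - that.
P_G_HIGH = {
    ("yes", "low"): 0.6, ("yes", "medium"): 0.8, ("yes", "high"): 0.9,
    ("no", "low"): 0.5, ("no", "medium"): 0.4, ("no", "high"): 0.6,
}
P_F = {"low": 0.7, "medium": 0.2, "high": 0.1}  # assumed independent of E


def p_g_high_given_e(e):
    """Predictive: Pr(G = high | E = e), summing F out."""
    return sum(P_F[f] * P_G_HIGH[(e, f)] for f in P_F)


def p_f_given_g_high_and_e(e):
    """Diagnostic: Pr(F | G = high, E = e) by Bayes' rule."""
    joint = {f: P_F[f] * P_G_HIGH[(e, f)] for f in P_F}
    z = sum(joint.values())
    return {f: v / z for f, v in joint.items()}


print(round(p_g_high_given_e("yes"), 2))  # 0.7*0.6 + 0.2*0.8 + 0.1*0.9 -> 0.67
print(round(p_g_high_given_e("no"), 2))   # -> 0.49
print({f: round(p, 3) for f, p in p_f_given_g_high_and_e("yes").items()})
# -> {'low': 0.627, 'medium': 0.239, 'high': 0.134}
```

Observing G = high shifts belief towards medium and high F, compared with the prior Pr(F).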
Why BNs?
• Be able to model the system:
– Include many diverse factors and their interactions
– Bring together disparate knowledge, including data, model outputs, expert information, etc.
– Include costs, benefits, utility
• Use the model to:
– Identify key drivers
– Explore scenarios of change (“what if…?”)
– Identify critical control points
– Suggest optimal strategies for improved outcomes
– Understand impact of management and policy decisions
Systems models (BNs) related to food security
• Conservation
• Water quality
• Recycled water and health
• Dairy sustainability
• Plant biosecurity risk
Study 1: Viability of the wild cheetah population in Namibia
Human Factors Subnetwork
Biological Factors Subnetwork
Ecological Factors Subnetwork
Combined “Object Oriented” BN (OOBN)
Study 2: Sustainability scorecard: measuring the complex interactions of sustainability
Collaboration with Dairy Australia
Aim: to develop a sustainability scorecard to measure Triple Bottom Line (TBL – economic, social and environmental) performance of agricultural systems.
– Key Dairy Stakeholder Review
– 2009 Dairy Sustainability Project
– 2011 Materiality Survey (NetBalance)
– 2007/08 Australian Dairy Manufacturing Industry Sustainability Report (DMSC)
– Stakeholder TBL reports
Vital Capital Survey, SAFE framework, DairySAT, Fonterra Sustainability Indicators, Unilever Sustainable Code, Nestle, Lactalis / Parmalat / Pauls, Danone Sustainability Report, Dutch Dairy Farming, RISE, GRI
Sustainability Measurement Review
Dairy Scorecard – Conceptual BN
Social Farm
Economic Farm
Environmental Farm
Measurement of indicator
Initial Sustainability at the Farm
Using the quantified BN submodels & putting them together gives the initial predictive scores for sustainability at the farm level
• Now able to ask questions of the model, e.g.
1. If we improve social sustainability, how will it affect overall sustainability at the farm level?
What if …?
High: 20% 39%, Medium: 48% 33%, Low: 32% 28%
What if …?
2. If we improve sustainability at the farm level, what is the effect on the TBL?
H,M,L: 25%, 39%, 26% / 70%, 18%, 12%
H,M,L: 25%, 62%, 13% / 48%, 48%, 4%
H,M,L: 5%, 51%, 43% / 13%, 60%, 27%
Economic
Social
Environmental
Sustainability scorecard
Indicator | Farm | Factory | Market | Rating
Economic
Commodity prices | 0.8 | 0.6 | 0.6 | 0.7
Legal and administrative environment | 0.0 | 0.1 | 0.8 | 0.3
Access to capital and labour | 0.6 | 0.8 | 0.8 | 0.7
Profitability | 0.3 | 0.8 | 0.7 | 0.6
Workforce capability | 0.1 | 0.8 | 0.6 | 0.5
Economic sustainability rating | 0.4 | 0.6 | 0.7 | 0.6
Social
Lifestyle and community | 0.0 | 0.5 | 0.1 | 0.3
Health and well being | 0.6 | 0.9 | 0.8 | 0.7
Value and contribution | 0.1 | 0.6 | 0.6 |
Product, safety and production | 0.8 | 0.0 | 0.0 | 0.1
Social relevance | 0.6 | 0.8 | 0.8 | 0.7
Social sustainability rating | 0.4 | 0.6 | 0.3 | 0.5
Environment
Energy, effluent and water | 0.6 | 0.2 | 0.2 | 0.3
Materials, suppliers and transport | 0.2 | 0.8 | 0.6 |
Products and services | 0.8 | 0.8 | 0.2 | 0.6
Biodiversity | 0.2 | 0.8 | 0.6 | 0.6
Compliance | 0.2 | 0.0 | 0.0 | 0.1
Environment sustainability rating | 0.4 | 0.5 | 0.2 | 0.4
Dairy industry sustainability rating | 0.4 | 0.6 | 0.4 | 0.5
Study 3: Water quality: initiation of Lyngbya in Moreton Bay
The policy questions
What is the overall scientific consensus about the drivers of Lyngbya?
What management actions should be taken to reduce Lyngbya in Moreton Bay, Australia?
INITIATION MODEL (probability of each node state, %)
Temperature: Low 49.5, High 50.5 (19.6 ± 9)
Light Quantity: Optimal 20.0, SubOptimal 80.0
Light Quality: Poor 10.0, Borderline 40.0, High 50.0
Wind direction: North 21.0, SE 24.0, Other 55.0
Wind Speed: Low 59.9, High 40.1
Ground Water Amount: Low 73.1, High 26.9
Rain - Present: Low 62.0, Medium 26.0, High 12.0 (142 ± 190)
Dissolved Fe Concentration: Low 56.7, High 43.3
Dissolved P Concentration: Low 62.1, High 37.9 (199 ± 300)
Dissolved N Concentration: Low 49.6, High 50.4
Dissolved Organics: Low 51.0, High 49.0
Sediment Nutrient Climate: NonReducing 58.4, Reducing 41.6
Avail nutrient pool (dissolved): Enough 33.6, Not enough 66.4
Land Run-off Load: Low 51.6, High 48.4
Tide: Spring 50.0, Neap 50.0
Bottom Current Climate: Low 48.0, High 52.0
Turbidity: Low 45.4, High 54.6
Light Climate: Inadequate 71.3, Adequate 28.7 (20.7 ± 12)
Point Sources: Low 26.3, Medium 30.1, High 43.7
No. of previous dry days: Low 10.0, Medium 50.0, High 40.0 (75.6 ± 110)
Air: Low 57.4, High 42.6
Particulates (Nutr): Low 45.1, High 54.9 (2.8 ± 3.3)
Bloom Initiation: No 76.4, Yes 23.6
Most influential factors
1. Available Nutrient Pool
2. Bottom Current Climate
3. Sediment Nutrients
4. Dissolved Iron
5. Dissolved Phosphorus
6. Light
7. Temperature
MANAGEMENT ACTIONS
“What-if” scenarios
Factor | Change in P(Bloom) (%) | Range of P(Bloom)
Available Nutrient Pool | 77 | 3% - 80%
Bottom Current Climate | 28 | 15% - 43%
Sediment Nutrient Climate | 17 | 21% - 38%
Dissolved Fe | 16 | 21% - 37%
Dissolved P | 15 | 23% - 38%
Light Climate | 14 | 18% - 32%
Temperature | 14 | 21% - 35%
Dissolved N | 13 | 22% - 35%
Rain – present | 10 | 25% - 35%
Light Quantity | 9 | 21% - 30%
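In every row, the change in P(Bloom) equals the width of the range. For the available nutrient pool, for example, 80 - 3 = 77. This is a one-at-a-time "swing" measure. The sketch below recomputes and ranks the swings from the published ranges only. It does not re-run the BN, and the factor labels are simply copied from the table.

```python
# Range of P(Bloom) (%) over the states of each factor, copied from the table.
BLOOM_RANGE = {
    "Available Nutrient Pool": (3, 80),
    "Bottom Current Climate": (15, 43),
    "Sediment Nutrient Climate": (21, 38),
    "Dissolved Fe": (21, 37),
    "Dissolved P": (23, 38),
    "Light Climate": (18, 32),
    "Temperature": (21, 35),
    "Dissolved N": (22, 35),
    "Rain - present": (25, 35),
    "Light Quantity": (21, 30),
}

# Swing = largest minus smallest P(Bloom) as the factor moves across its states.
swing = {factor: high - low for factor, (low, high) in BLOOM_RANGE.items()}

# Rank factors by swing; sorted() stays stable with reverse=True,
# so tied factors keep their table order.
ranked = sorted(swing.items(), key=lambda item: item[1], reverse=True)
for factor, change in ranked:
    print(f"{factor:<26} {change:>3}")
```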
From Science to Management
Study 4: Recycled Water and Health Handbook
Study 5: “Beyond Compliance”
An integrated approach to pest risk management
STDF – WTO funded project
5 SEA partners + OC: + QUT
Mumford et al.
• Production Chain
• Decision Support
• Control Point BN (CP-BN)
1: Production chain: exporting Malaysian jackfruit to China
Decision support spreadsheet
Key factor | Score | Uncertainty
A2.01 Overall rating - Entry | Unlikely | Low
A2.02 Overall rating - Establishment | Moderately unlikely | Low
A2.03 Overall rating - Spread | Moderate | Low
A2.04 Overall rating - Impact | Minor | Low
A2.05 How easy is it to detect the key organisms on the commodity / pathway? | Easy | Medium
A2.06 How easy is it to identify the key organisms? | With some difficulty | Medium
A2.07 How well organised is the sector at risk in the importing country? | Mod. well organised | Medium
A2.08 What is the estimated prevalence of the pest in the area where commodity is cultivated? | High | Low
Decision support spreadsheet
Risk management measures available (automatically read in from Table B2)
Measure | 1.1 a) What is its potential contribution to risk reduction? | 1.1 b) Uncertainty | 1.2 a) The measure can be verified? | 1.2 b) Uncertainty
Sterile insect technique (SIT) | Very high | Low | Very easy | Very low
Pesticides spray program | High | Medium | Easy | Low
Male annihilation, utilizing the attraction of males to methyl eugenol baits | High | Low | With some difficulty | Low
Culling of over-crowded and disease infested fruits | High | Low | Easy | Low
Bagging of fruits 14 days after fruit set | Very high | Low | Easy | Low
 | High | Low | |
 | High | Low | |
Efficacy and verification graphics: bar charts on a 0 to 1 scale over efficacy ratings (VH, H, M, L, VL) and verification ratings (VE, E, SD, D, VD)
CP-BN
Economics add-on
• The final target node gives the probability of infestation at the point of export. This must be sufficiently low to comply with the requirements of the dragon fruit importer concerned.
• We also need to include the equally important issues of loss to fruit production due to this infestation, and costs of control or preventive measures
• That is, what is the net value of the crop?
Economics: adding costs via utility nodes
Economics: adding losses via utility nodes
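Utility nodes attach a value to each outcome, so control options can be compared on expected net value. The sketch below is minimal, and every option name and number in it is hypothetical. Each option has a cost and a resulting probability of infestation at export; in a CP-BN, that probability would come from the final target node. Options that fail a hypothetical importer threshold are excluded first, and the expected net value is then maximised over the remaining options.

```python
# Hypothetical decision sketch: all option names and numbers are invented.
OPTIONS = {
    # option: (control cost per ha, Pr(infestation at export | option))
    "no control": (0.0, 0.20),
    "spray": (150.0, 0.04),
    "bagging": (300.0, 0.02),
}
CROP_VALUE = 5000.0        # value per ha of a compliant crop (hypothetical)
LOSS_IF_INFESTED = 5000.0  # value lost per ha if the crop is infested (hypothetical)
MAX_P_INFESTATION = 0.05   # hypothetical importer compliance threshold


def expected_net_value(option):
    """Sum of utility nodes: crop value - expected loss - control cost."""
    cost, p_infested = OPTIONS[option]
    return CROP_VALUE - p_infested * LOSS_IF_INFESTED - cost


# Keep only options whose infestation probability meets the threshold.
compliant = [o for o in OPTIONS if OPTIONS[o][1] <= MAX_P_INFESTATION]
best = max(compliant, key=expected_net_value)
for option in OPTIONS:
    print(option, expected_net_value(option), option in compliant)
print("best compliant option:", best)
```

Under these invented numbers, the cheaper measure wins even though it controls the pest less well, because its lower cost outweighs its slightly higher expected loss. A fuller model would treat losses and costs as uncertain quantities.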
J. Holt, A. W. Leach, S. Johnson, D. M. Tu, D. T. Nhu, N. T. Anh, L. N. Quang, M. M. Quinlan, P. J. L. Whittle, K. Mengersen and J. D. Mumford (in prep.) Bayesian networks to compare pest control interventions on commodities along agricultural production chains.
Methods Questions
1. How to elicit information from experts?
2. How to combine information from multiple experts?
3. How to assess the validity and reliability of a BN?
4. How to incorporate uncertainty into BNs?
5. How to combine BNs?
1. Eliciting expert information
• Train experts prior to elicitation
• Elicit using the “outside-in” method:
– Extrema: absolute lower and upper limits
– Quantiles: realistic limits (L, U) + uncertainty/sureness around these bounds
– Mode: most plausible value
• Record as count, percentage or multiplicative factor
• Encode via least squares as normal, lognormal, extended beta, etc.
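The least-squares encoding step can be sketched for the normal case. Each elicited point is a pair of (probability level, value). For a normal distribution, the p-quantile is mu + sigma * z_p, so fitting by least squares on the standard normal quantiles z_p is just a linear regression. The probability levels attached to L and U below (5% and 95%) are a hypothetical choice. The mode is treated as the 50% point, because the mode of a normal distribution equals its median. A lognormal fit could apply the same regression to log values; an extended beta fit would need a numerical optimiser instead.

```python
from statistics import NormalDist


def fit_normal_least_squares(elicited):
    """Fit (mu, sigma) to elicited quantiles by least squares.

    elicited: list of (probability level, value) pairs. Minimises
    sum_i (value_i - (mu + sigma * z_i))**2, where z_i is the standard
    normal quantile at probability level i. Needs two or more distinct levels.
    """
    z = [NormalDist().inv_cdf(p) for p, _ in elicited]
    x = [value for _, value in elicited]
    n = len(z)
    z_bar, x_bar = sum(z) / n, sum(x) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    sigma = s_zx / s_zz         # regression slope
    mu = x_bar - sigma * z_bar  # regression intercept
    return mu, sigma


# Hypothetical expert: realistic limits L = 10 (5%) and U = 30 (95%), mode 20.
mu, sigma = fit_normal_least_squares([(0.05, 10.0), (0.5, 20.0), (0.95, 30.0)])
print(round(mu, 2), round(sigma, 2))
```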
2. Combining expert judgements
– Delphi method
– Pooling
– Modelling
Pooling
1. Average expert opinions for each node and propagate the averages through the network
2. Average after transforming probability to log odds
3. Propagate the opinions through the network for each expert and average the outputs across experts
Average = linear or geometric, weighted or unweighted
Add a random effect for between-expert deviations
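Options 1 and 3 give different answers whenever the network output is a non-linear function of the node probabilities. The sketch uses a hypothetical two-node network A → B with Pr(B | not A) = 0, so that Pr(B) = Pr(A) × Pr(B | A). It also uses two hypothetical experts. It implements linear pooling and log-odds pooling (option 2). For a binary node, log-odds pooling is the same as normalised geometric pooling.

```python
import math


def linear_pool(probs, weights=None):
    """Weighted arithmetic mean; weights (if given) should sum to 1."""
    w = weights or [1.0 / len(probs)] * len(probs)
    return sum(wi * p for wi, p in zip(w, probs))


def log_odds_pool(probs, weights=None):
    """Average on the log-odds scale, then back-transform.

    For a binary node this equals the normalised geometric pool
    prod(p**w) / (prod(p**w) + prod((1 - p)**w)).
    """
    w = weights or [1.0 / len(probs)] * len(probs)
    pooled_log_odds = sum(wi * math.log(p / (1.0 - p)) for wi, p in zip(w, probs))
    return 1.0 / (1.0 + math.exp(-pooled_log_odds))


# Hypothetical experts, each giving (Pr(A), Pr(B | A)).
experts = [(0.2, 0.9), (0.6, 0.3)]

# Option 1: pool each node, then propagate the pooled values.
p_a = linear_pool([a for a, _ in experts])
p_b_given_a = linear_pool([b for _, b in experts])
pool_then_propagate = p_a * p_b_given_a

# Option 3: propagate each expert's network, then pool the outputs.
propagate_then_pool = linear_pool([a * b for a, b in experts])

print(round(pool_then_propagate, 3), round(propagate_then_pool, 3))  # 0.24 0.18
print(round(log_odds_pool([a for a, _ in experts]), 3))
```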
Modelling
• Random effects model
• Measurement error model
• Item response model
• Can obtain estimates of combined probabilities, node differences, expert differences
Probability in node l = overall effect + node effect + expert effect
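When every expert assesses every node, a fixed-effects version of this decomposition can be computed directly. The sketch makes two hypothetical assumptions: the effects are additive on the log-odds scale, and three experts have given invented probabilities for three nodes. For a balanced table like this, the least-squares estimates are simply the row and column means. A random-effects or measurement-error fit would also treat the expert effects as draws from a distribution and shrink them towards zero.

```python
import math

# Hypothetical elicited probabilities: rows = experts, columns = nodes.
P = [
    [0.10, 0.40, 0.70],
    [0.20, 0.50, 0.80],
    [0.05, 0.30, 0.60],
]


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


Y = [[logit(p) for p in row] for row in P]
n_experts, n_nodes = len(Y), len(Y[0])

# Least-squares estimates for logit(p[e][l]) = overall + node[l] + expert[e]
# in a balanced table: grand mean, then column and row mean deviations.
overall = sum(sum(row) for row in Y) / (n_experts * n_nodes)
node_effect = [sum(Y[e][l] for e in range(n_experts)) / n_experts - overall
               for l in range(n_nodes)]
expert_effect = [sum(row) / n_nodes - overall for row in Y]

# Combined probability per node: overall + node effect, expert effects removed.
combined = [inv_logit(overall + node_effect[l]) for l in range(n_nodes)]
print([round(p, 3) for p in combined])
print([round(b, 2) for b in expert_effect])
```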
3. Validity and reliability of a BN
Psychometric approach:
• Nomological: sits well within current academic thought
• Face: valid representation of the underlying system
• Content: includes all potentially relevant factors
• Concurrent: related measures in time/space vary similarly
• Convergent: theoretically related measures match
• Discriminant: theoretically unrelated measures are different
Pitchforth, 2013
4. Incorporating uncertainty
• Add prior distributions to nodes
• Propagate populations through the BN
(Donald et al. ANZJS 2015)
Prob. gastroenteritis (95% CI) = 0.030 (0.026, 0.034)
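This idea can be sketched with Monte Carlo sampling. Each uncertain conditional probability is drawn from its prior, each draw is pushed through the network, and the resulting population of outputs is summarised. The two-node network, the Beta priors and their parameters are hypothetical, and they are not those of the gastroenteritis model. The interval is a simple nearest-rank percentile approximation.

```python
import random


def propagate(n_draws=10_000, seed=1):
    """Monte Carlo propagation of parameter uncertainty through a
    hypothetical network: exposure -> illness, with Pr(ill | not exposed) = 0."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_draws):
        p_exposed = rng.betavariate(20, 180)           # hypothetical prior, mean 0.10
        p_ill_given_exposed = rng.betavariate(30, 70)  # hypothetical prior, mean 0.30
        outputs.append(p_exposed * p_ill_given_exposed)  # Pr(ill) for this draw
    outputs.sort()
    mean = sum(outputs) / n_draws
    lower = outputs[int(0.025 * (n_draws - 1))]  # approx. 2.5th percentile
    upper = outputs[int(0.975 * (n_draws - 1))]  # approx. 97.5th percentile
    return mean, lower, upper


mean, lower, upper = propagate()
print(f"Pr(ill) = {mean:.3f} (95% interval {lower:.3f}, {upper:.3f})")
```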
5. Combining BNs
Many perspectives = many potential models
How to combine outputs?
Model averaging approach:
– Obtain an estimate of goodness of fit for each BN
– Generate probabilities or ‘data’ from each BN
– Obtain a weighted average of the desired measures
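The model-averaging recipe can be sketched under a hypothetical assumption: each candidate BN has been scored by its log marginal likelihood on a common data set, and each has produced an estimate of the same query probability. When prior model weights are equal, weights proportional to exp(score) are posterior model probabilities. Other scores, such as -BIC/2, could be substituted. All model names and numbers below are invented.

```python
import math

# Hypothetical candidate BNs: estimate of the same query probability and a
# goodness-of-fit score (log marginal likelihood on shared data).
models = {
    "expert_bn": {"estimate": 0.24, "log_evidence": -120.0},
    "data_bn": {"estimate": 0.30, "log_evidence": -118.5},
    "hybrid_bn": {"estimate": 0.27, "log_evidence": -119.0},
}


def model_weights(models):
    """Normalised weights proportional to exp(log_evidence)."""
    best = max(m["log_evidence"] for m in models.values())
    # Subtract the maximum before exponentiating for numerical stability.
    raw = {name: math.exp(m["log_evidence"] - best) for name, m in models.items()}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}


weights = model_weights(models)
averaged = sum(weights[name] * m["estimate"] for name, m in models.items())
print({name: round(w, 3) for name, w in weights.items()}, round(averaged, 3))
```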
How to combine structures?
TBC…
Conclusion: Why BNs? Because sometimes the solutions are not where we are looking