visualizing inference in large bayesian networks (ucsd m.sc. project)
DESCRIPTION
In this project we address the challenge of viewing and using Bayesian networks as their structural size and complexity grow. We introduce two new visualization methods, inference diffs and relevance filtering, to enable visual analysis of information flow in these networks and direct comparison of two evidence configurations simultaneously. We implement and discuss the performance of these visualization methods on two modestly large networks built from real-world data.
TRANSCRIPT
M.Sc. Project UCSD 2013
Clifford Champion
<cchampio@cs>
Adviser: Prof. Charles Elkan
VISUALIZING INFERENCE
IN LARGE BAYESIAN
NETWORKS
December 10th, 2013
What and Why
Designing the Visualization
Implementation and Results
B-Vis, F-AI
Traffic and Census data sets
Conclusion and Q&A
OUTLINE
WHAT AND WHY
2002: the indexed size of the internet was about 167 TB.
2002: > 330 TB of human-generated email was created.
2010: 50 billion user photos stored in Facebook .
2010: 130 TB of logs generated daily by Facebook.
2010: 2.5 PB of Walmart customer and transaction data .
2013: Over 50 GB of Tweets created daily on Twitter.
2013: eBay stores 40 PB dedicated to “deep” analysis.
“BIG DATA”
HOW DO WE USE ALL THIS DATA?
Image credit: http://commons.wikimedia.org/wiki/User:Shervinafshar
To quote Edward Tufte
“often the most effective way to describe, explore, and summarize a
set of numbers – even a very large set – is to look
at pictures of those numbers” (emphasis added)
“data graphics can be both the simplest
[and] most powerful of methods”
Visualizations help reveal interesting facts
and abstract relationships
Impossible or inefficient if using tabular data alone
In software applications, visualizations are a navigational tool
DATA VISUALIZATION WILL BE ESSENTIAL
Bayesian networks can be an important tool for “big data”
“Information flow” in Bayesian networks can be an opaque
concept
D-separation alone is not informative enough
Is there more beneath the surface?
Visualizing Bayesian networks well has been a largely
neglected goal
WHY THIS PROJECT?
A graphical model of random variables
On a scale of 0 to 1, how likely is rain today? (e.g.)
The edges of the graph define conditional (in)dependencies
between variables (nodes)
Can represent statistical, causal, and/or latent variables
What is the life expectancy of a non-smoker living in South America?
A car that won’t start can be caused by a dead battery. But being late
to work won’t cause a car to not start.
HMMs and topic clustering
Queryable: evidence goes in, new beliefs come out
If we know Winter=TRUE, what do we believe of Rain=TRUE?
BAYESIAN NETWORKS:
IN A NUTSHELL
Every random variable Y has a conditional probability
distribution P(Y | X1, …, Xm(Y)), given its m(Y) parents.
For our purposes, stored as a conditional probability table (CPT).
If Y has no parents, its probability distribution simplifies to P(Y).
Marginal distributions, e.g. P(Y) or P(Z), are
easily recovered/computed.
To create a Bayesian network, you train it (machine learning)
and/or hand-craft it (expert interviews)
BAYESIAN NETWORKS:
IN A NUTSHELL
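A minimal sketch of CPT storage and marginal recovery, in Python for illustration (the project code itself is F#); the Winter/Rain variables and all probabilities here are toy values, not from a trained model:

```python
# Prior for a parentless variable: P(Winter)
p_winter = {True: 0.4, False: 0.6}

# CPT for Rain given its single parent Winter: P(Rain | Winter)
p_rain_given_winter = {
    True:  {True: 0.7, False: 0.3},   # row for Winter=True
    False: {True: 0.2, False: 0.8},   # row for Winter=False
}

def marginal_rain():
    """Recover the marginal P(Rain) by summing out the parent:
    P(Rain=r) = sum over w of P(Rain=r | Winter=w) * P(Winter=w)."""
    return {
        r: sum(p_rain_given_winter[w][r] * p_winter[w] for w in p_winter)
        for r in (True, False)
    }

print(marginal_rain())  # P(Rain=True) = 0.7*0.4 + 0.2*0.6 = 0.4
```

With a single parent the sum has only two terms; in general the marginal sums over all parent permutations in the CPT.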
VISUAL DESIGN
Top-down causal ordering
Regularly the unstated choice for small networks
Difficult to satisfy in large/complex networks
Edge crossings are avoided
Also difficult to satisfy in
large/complex networks
THE STATE OF THE ART
Image credit: Koller, Daphne, and Nir Friedman. Probabilistic graphical
models: principles and techniques. The MIT Press, 2009.
THE STATE OF THE ART
Conditional probability tables
Michele Cossalter, Ole J. Mengshoel, and Ted Selker. “Visualizing and Understanding Large-Scale Bayesian Networks.” The AAAI-11 Workshop
on Scalable Integration of Analytics and Visualization, 2011.
CPT heatmaps
A natural representation only for variables with a single parent.
THE STATE OF THE ART
Chiang, Chih-Hung, et al. Visualizing Graphical Probabilistic Models. Technical Report 2005-017, UML CS, 2005.
Marginal distributions via embedded bar charts
THE STATE OF THE ART
Bayes Server (http://bayesserver.com)
Marginal distributions via shading
Binary variables only
Parent influence via hue blending
At most 2 parents, maybe 3
THE STATE OF THE ART
Zapata-Rivera, Juan-Diego, Eric Neufeld, and Jim E. Greer.
"Visualization of Bayesian belief networks." Proceedings of IEEE
Visualization’99, Late Breaking Hot Topics. 1999.
Williams, Lloyd, and Robert St Amant. "A Visualization Technique
for Bayesian Modeling." Proc. of IUI. Vol. 6. 2006.
Partition and fish-eye
User-driven
THE STATE OF THE ART
Sundararajan, Priya Krishnan, Ole J. Mengshoel, and Ted Selker. "Multi-focus and multi-window techniques for interactive network
exploration." IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2013.
Design before code
Wireframes and mockups produce high-quality “what-if’s”
General principles:
Maximize the “data-ink ratio” (Tufte)
Don’t distort the data, don’t mislead the viewer (Tufte)
Maximize readability and cleanliness (me)
Goals specific to Bayesian networks
Present the “basics” clearly and conveniently
The effects of evidence should be stupidly obvious
Should scale to large networks
Shoshin (初心, “beginner’s mind”)
PHILOSOPHIES & OBJECTIVES
Photo credit: zeze57@flickr
What are the variables of the model?
What is the structure of the model?
STRUCTURE
STRUCTURE
Low contrast (no information beyond strokes/shading)
Single, capital letter for variable names;
subscripts if needed
Legend ordering matches
the layout’s vertical order
Structural view is
zoomable, scrollable
STRUCTURE
What are the event spaces?
EVENT SPACES
Each event space receives a
color mapping
Categorical spaces jump between
contrasting hues
Ordered spaces step through
similar hues
Legend is augmented accordingly
EVENT SPACES
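The hue rules above can be sketched as follows, in Python for illustration; the hue band, saturation, and value used here are assumptions, not B-Vis's actual palette:

```python
import colorsys

def event_space_colors(n, ordered):
    """Assign one RGB color per value of an n-valued event space.

    Ordered spaces step through similar, nearby hues (a narrow band),
    so adjacent values read as a progression. Categorical spaces jump
    between contrasting hues spread around the full hue circle.
    Illustrative parameters only."""
    if ordered:
        # Narrow band of similar hues (here: cyan toward blue)
        hues = [0.55 + 0.25 * i / max(n - 1, 1) for i in range(n)]
    else:
        # Maximally spread, contrasting hues
        hues = [i / n for i in range(n)]
    return [colorsys.hsv_to_rgb(h % 1.0, 0.8, 0.9) for h in hues]

categorical_palette = event_space_colors(4, ordered=False)
ordered_palette = event_space_colors(4, ordered=True)
```

The same palette is reused everywhere a variable's event space appears, which is what makes the later pie/ring comparisons readable.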
What are the probability distributions captured in the model?
DISTRIBUTIONS
DISTRIBUTIONS
Distributions are embedded into each node
Pie chart slices are proportional to marginal probability masses
DISTRIBUTIONS
P(A)
P(V)
P(T1)
P(X)
P(T2)
What about evidence?
What about the effects of evidence?
SEEING INFERENCE
SEEING INFERENCE
Evidence nodes receive a black border
Query (non-evidence) nodes’ embedded
distributions updated to reflect
posterior distributions
SEEING INFERENCE
Let V=v
P(T1 | V=v, A=a)
P(X | V=v, A=a)
P(T2 | V=v, A=a)
Let A=a
What just happened? Difficult to see the change.
We need a way to perform a direct comparison.
Let E1 and E2 be evidence sets*, e.g. E1 = (A=a) and E2 = (A=b, V=v)
Compute the posterior distributions separately
Visualize the posterior distributions together
Inspired by code diffs in software engineering
* The word “set” is being abused here.
SEEING INFERENCE
AN “INFERENCE DIFF”
Inner “pie” is posterior for E1
Outer “ring” is posterior for E2
Seeing the difference
Consistent event space coloring
Consistent event space ordering
Changes in area and color mass
easy to spot
Evidence in E1 and E2
distinguished by black borders
around pie and/or ring
AN “INFERENCE DIFF”
Evidence
in E1
Evidence
in E2
P(X | E2)   P(X | E1)
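The diff itself is just a pairing of two posteriors over a shared event space, one drawn as the inner pie (E1) and one as the outer ring (E2). A minimal Python sketch, with hypothetical posteriors for a variable X:

```python
def inference_diff(posterior_e1, posterior_e2):
    """Pair two posteriors over the same event space so they can be
    rendered as inner pie (E1) and outer ring (E2); also report the
    per-value change for quick inspection. Illustrative sketch only."""
    assert posterior_e1.keys() == posterior_e2.keys()
    return {
        v: (posterior_e1[v], posterior_e2[v], posterior_e2[v] - posterior_e1[v])
        for v in posterior_e1
    }

# Hypothetical posteriors for X under evidence sets E1 and E2
p_x_e1 = {'low': 0.5, 'medium': 0.3, 'high': 0.2}
p_x_e2 = {'low': 0.2, 'medium': 0.3, 'high': 0.5}

diff = inference_diff(p_x_e1, p_x_e2)
# diff['high'] pairs (P(high|E1), P(high|E2), change): mass moved toward 'high'
```

Because both rings use the consistent event-space coloring and ordering, a shift in mass shows up directly as a mismatch in colored arc lengths.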
What if there are too many variables?
What about when I don’t know which variables to look for?
SCALING TO LARGER NETWORKS
RELEVANCE FILTERING
RELEVANCE FILTERING
Emphasize the variables with most change; diminish the rest
Use KL divergence to quantify change, on [0, +∞)
Call the top C% most changed
variables the “relevant” variables
Shrink & fade nodes of
irrelevant variables
Shorten and fade edges with
irrelevant variables
Reduces the canvas size needed
Facilitates discovery in
large models
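The scoring-and-filtering step can be sketched in Python as below; the epsilon smoothing, direction of the divergence, and tie-breaking are assumptions, not necessarily what B-Vis does:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) between two distributions over a shared discrete
    event space; ranges over [0, +inf). eps guards against log(0)."""
    return sum(p[v] * math.log((p[v] + eps) / (q[v] + eps)) for v in p)

def relevant_variables(posteriors_e1, posteriors_e2, top_fraction=0.2):
    """Rank variables by how much their posterior changed between the
    two evidence sets, and call the top fraction 'relevant'. The rest
    would be shrunk and faded in the view. Sketch only."""
    ranked = sorted(
        posteriors_e1,
        key=lambda x: kl_divergence(posteriors_e1[x], posteriors_e2[x]),
        reverse=True,
    )
    keep = max(1, round(top_fraction * len(ranked)))
    return set(ranked[:keep])
```

With top_fraction=0.2 this reproduces the "top 20%" setting used in the traffic and census examples that follow.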
IMPLEMENTATION AND
RESULTS
Structure and CPT learning (F#)
Structure space search using edge operations and BIC scoring
Dynamic programming algorithm, memoizing local scores
OO Design: Types defined for network, random variable, distribution, event space, etc.
Immutable type design made life easier and computations faster
Inference (F#)
Approximate inference using Markov Chain Monte Carlo / Gibbs sampling
Visualization Tool (C#)
Adopted a variation of the Model-View-Controller paradigm
Independent threads for learning, layout, inference, and UI/rendering
Used Microsoft WPF for vector graphics and user-input handling
Used open-source Graph# for Sugiyama graph layout
All source shared at https://github.com/duckmaestro/F-AI
SOFTWARE IMPLEMENTATION
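The inference step above uses MCMC/Gibbs sampling. A compact Python sketch of the idea (the actual implementation is F#, and this toy Cloudy → Rain → WetGrass chain with made-up CPTs is not one of the project's networks):

```python
import random

# Toy chain Cloudy -> Rain -> WetGrass, with illustrative CPTs
p_cloudy = {True: 0.5, False: 0.5}
p_rain = {True: {True: 0.8, False: 0.2},    # P(Rain | Cloudy)
          False: {True: 0.1, False: 0.9}}
p_wet = {True: {True: 0.9, False: 0.1},     # P(WetGrass | Rain)
         False: {True: 0.2, False: 0.8}}

def gibbs(n_samples=20000, burn_in=2000, seed=0):
    """Estimate P(Rain=True | WetGrass=True) by Gibbs sampling: hold the
    evidence fixed and repeatedly resample each free variable from its
    full conditional (its own CPT times the CPTs of its children)."""
    rng = random.Random(seed)
    cloudy, rain = True, True   # arbitrary start; WetGrass=True is evidence
    hits = 0
    for i in range(n_samples):
        # Resample Cloudy with weight P(c) * P(rain | c)
        w = {c: p_cloudy[c] * p_rain[c][rain] for c in (True, False)}
        cloudy = rng.random() < w[True] / (w[True] + w[False])
        # Resample Rain with weight P(r | cloudy) * P(WetGrass=True | r)
        w = {r: p_rain[cloudy][r] * p_wet[r][True] for r in (True, False)}
        rain = rng.random() < w[True] / (w[True] + w[False])
        if i >= burn_in and rain:
            hits += 1
    return hits / (n_samples - burn_in)
```

For these toy CPTs the exact posterior works out to 0.405 / 0.515 ≈ 0.786, so the estimate should land near that after burn-in.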
Traffic flow measurements from the San Francisco bay area
highway system
32 sensor locations
4 discretized values of traffic flow amount from low to high
4415 examples
Acquired by Krause and Guestrin; reprocessed by Shahaf et al.
Model Training
Entire data set used
Uniform Dirichlet prior
Parent limit of 2
EXAMPLE: SAN FRANCISCO TRAFFIC
NETWORK
TRAFFIC NETWORK
“Traffic” Bayesian network
visualized in B-Vis.
Must zoom out a bit to see
entire model of this size.
Pictured with no evidence
configured.
INFERENCE DIFF OF TRAFFIC NETWORK
E1 = (empty)
E2 = (A4=‘medium’)
Relevance filtering: top 20%.
Reduction in overall space
requirement allows us to see
entire structure even while
zoomed in.
The impact of this evidence
diminishes as it propagates
through this model.
1990 U.S. Census
68 attributes
Each attribute has categorical or discretized values
Each attribute has between 2 and 17 values
2.4 million examples
Discretized by Meek et al.; hosted by UCI Machine Learning
Repository
Model Training
First 10,000 randomly chosen examples
Uniform Dirichlet prior
Parent limit of 3
EXAMPLE: 1990 U.S. CENSUS
CENSUS NETWORK
“Census” Bayesian network
visualized in B-Vis.
Must zoom quite far out to
see the entire model on screen.
INFERENCE DIFF OF CENSUS NETWORK
E1 = (empty)
E2 = (‘income4’=‘yes’)
‘income4’
TOP 20% RELEVANT VARIABLES
‘income4’ := Interest, dividends, or rental income in prior year.
Relevant variables (in no special order):
year of immigration ('immigr')
place of birth ('pob')
Hispanic heritage ('hispanic')
relationship to the homeowner ('relat2')
whether living in a subfamily ('subfam1')
number of subfamilies ('subfam2')
whether working on a farm ('income3')
whether the individual served in the military during no major war or conflict ('othrserv')
their ancestry ('ancstry2')
their means of transportation to work ('means')
their status in the job market ('avail')
employment status of parents ('remplpar')
‘income4’
LET’S DRILL DOWN
Let’s inspect
‘means’:
Increased
likelihood of no
daily commute
Decreased
likelihood of
bike, train, and
other non-auto
means of
commute
INFORMATION FLOW SURPRISE
Parents of ‘income4’ were
irrelevant in this inference
diff: ‘income6’, ‘rpincome’,
‘rearning’.
Relevance filtering reveals
that, in general, the greatest
impact of evidence can be
“far away”: a snowball effect.
‘income4’
‘rpincome’
‘income6’
‘rearning’
PARTING THOUGHTS
Finding the greatest difference between two medical
treatments – a Bayesian network as a causal model
E1 = ( Age=38,
HasConditionX=True,
do(MedicationA=True),
do(MedicationB=False) )
E2 = ( Age=38,
HasConditionX=True,
do(MedicationA=False),
do(MedicationB=True) )
An inference diff with relevance filtering could clearly
and visually present the greatest expected differences in
prognosis and side-effects.
OTHER USES OF INFERENCE DIFFS
Layout stability is important
Better layout algorithms exist, and may be
customizable with relevance filtering in mind
Unused visual modalities have untapped potential
Node shape, additional evidence set rings, etc.
Large event spaces / continuous event spaces
Adaptive color space folding? / Density pie chart?
Alternative measures of “relevance”
User-specifiable event space value importance
Other graphical models
Dynamic Bayesian networks? Conditional random fields?
CHALLENGES & FUTURE WORK
Visualizations can help reveal insights
Visualizations can communicate dense information efficiently
We introduced inference diffs for direct comparison of
posterior beliefs in Bayesian networks
We extended inference diffs with relevance filtering
to assist users in locating interesting phenomena in large networks
IN CONCLUSION
Thanks! Q & A
APPENDIX
UX question: is there an easy way to assign evidence?
Radial drag-drop menu
Keeps with pie chart motif
Drag outside inward to
assign evidence
Drag inside outward to
remove evidence
ASSIGNING EVIDENCE
Dropping on the
ring assigns
evidence in E2
Dropping on the
center assigns
evidence in E1
CONDITIONAL PROBABILITY TABLES
Use vertical space, not horizontal space
Event space color mapping reused
for probability masses and for parent
permutations
DISCOVERY AND NAVIGATION
Goldfarb, Doron, et al. "Art History on Wikipedia, a Macroscopic Observation."arXiv preprint arXiv:1304.5629 (2013).