States of Protest
David Van Brackle, Matt Kirsch, Eli Sakov, Chris Serrano
The Problem
• Can we predict likelihood/number of protests?
• Can we do this with data from the ICEWS project?
– ICEWS = International Crisis Early Warning System
– LM-ATL project
– Data consists of aggregations of events
Events
1. Ingesting information from freeform text (e.g. news stories)
2. Extracting information about “Things That Happened” (Events)
3. Coding those events in a standard way

Source Event Target
LKA 012 KOR
USA 013 PHL
JAP 010 CHN
AUS 010 NZL

Useful For:
• Statistical models. Aggregated event counts are used as model inputs
• Aiding analysts. Events can be analyzed in any number of ways and viewed on timelines, maps, etc.
Aggregations
• ICEWS has over 11,000 ways of aggregating events
– By source/target
– By event type
– By Goldstein code (Friendly vs Hostile)
• Part of our problem was to select the right aggregations

Start, End, GOVtALLhighhostilityct, GOVtALLeventstotals, ALLtGOVeventstotals, Protests
12/31/2000, 1/6/2001, 0.0, -3.0, -2.0, 6
1/7/2001, 1/13/2001, 2.0, -14.6, -5.0, 2
1/14/2001, 1/20/2001, 0.0, -6.3, -5.7, 1
1/21/2001, 1/27/2001, 4.0, -67.8, 0.1, 8
1/28/2001, 2/3/2001, 5.0, -73.7, -39.5, 6
2/4/2001, 2/10/2001, 5.0, 6.8, -2.6, 5
2/11/2001, 2/17/2001, 3.0, -28.0, -33.0, 4
2/18/2001, 2/24/2001, 0.0, -9.6, -15.9, 9
2/25/2001, 3/3/2001, 14.0, -147.2, -33.2, 9
3/4/2001, 3/10/2001, 0.0, -22.2, -34.2, 3
3/11/2001, 3/17/2001, 2.0, 5.0, 26.0, 2
3/18/2001, 3/24/2001, 26.0, -436.2, -15.7, 3
3/25/2001, 3/31/2001, 0.0, -65.6, 13.0, 2
4/1/2001, 4/7/2001, 0.0, 13.8, -31.4, 4
4/8/2001, 4/14/2001, 4.0, -31.0, -18.9, 8
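
The code later in the deck calls a parserow helper that is never shown. A minimal sketch of what such a helper might look like for rows in the format above. The object name, the String keys, and the column-to-key mapping (hostile from GOVtALLhighhostilityct, govtall from GOVtALLeventstotals, alltgov from ALLtGOVeventstotals) are assumptions read off the CSV header, not the authors' code:

```scala
// Hypothetical stand-in for the deck's parserow helper (not shown in the slides).
object RowParser {
  // Parses one weekly CSV row into the four values the model constrains.
  // The mapping from columns to keys is an assumption based on the header names.
  def parseRow(line: String): Map[String, Double] = {
    val cols = line.split(",").map(_.trim)
    require(cols.length == 6, s"expected 6 columns, got ${cols.length}")
    Map(
      "hostile"  -> cols(2).toDouble, // GOVtALLhighhostilityct
      "govtall"  -> cols(3).toDouble, // GOVtALLeventstotals
      "alltgov"  -> cols(4).toDouble, // ALLtGOVeventstotals
      "protests" -> cols(5).toDouble  // Protests
    )
  }
}
```

The Start and End dates are ignored here because the model indexes weeks by row position.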
Day 1: Design the Model
• Examine/Understand the available data
– Use BayesDB to look at some correlations and dependencies
• Talk about the problem
• Design the states and connections
• Plan: Learn parameters, then test against new data
The Data - Protests
The Data - Others
The Model
[Diagram with the node set drawn twice: GOVALL Verbal, GOVALL Material, GOVALL Arrests, ALLALL Attitude, ALLGOV Attitude, ALLGOV Protests, Anger, Fear, Fatigue, Trigger, Consecutive]
Day 2: Build the Model
Abstract Weekly Model
First Weekly Model
Middle Weekly Model
Abstract Parameters
Prior Parameters
Learned Parameters
Overall Model
• It’s an array instead of sequential Universes, because we want to learn the same parameters over all weeks
Run the model twice
• Once with Prior Parameters (distributions) to learn
• Once with Learned Parameters (Constants) to predict
Day 3: Simplify the Model
[Diagram with the node set drawn twice: GOVALL Attitude, GOVALL Violence, ALLGOV Attitude, ALLGOV Protests, Sentiment, Trigger, Consecutive]
Model Code
abstract class WeeklyModel {
  val trigger : Element[Double]
  val sentiment : Element[Double]
  val govtall : Element[Double]
  val hostile : Element[Double]
  val alltgov : Element[Double]
  val protests : Element[Double]
  val consecutiveHighProtest : Element[Double]
}

class FirstWeeklyModel extends WeeklyModel {
  val trigger = Constant(0.0)
  val sentiment = Normal( 0.0, 15.0 )
  val govtall = Normal( -257.8, 273.477 )
  val hostile = Exponential( 0.04 )
  val alltgov = Normal( -117.2, 175.0 )
  val protests = Exponential( 0.04 )
  val consecutiveHighProtest = Normal( 30.0, 10.0 )
}
Parameter Code
abstract class Parameters {
  val sent2sent : Element[Double]
  val govtall2sent : Element[Double]
  val hostile2sent : Element[Double]
  val alltgov2sent : Element[Double]
  val consecutiveHighProtest2sent : Element[Double]
  val sent2protests : Element[Double]
  val protests2govtall : Element[Double]
  val alltgov2govtall : Element[Double]
  val protests2hostile : Element[Double]
  val govtall2alltgov : Element[Double]
  val hostile2alltgov : Element[Double]
  val highProtestThreshold : Element[Double]
  val trigger2sent : Element[Double]
  def toList() : List[Element[Double]]
}

class PriorParameters extends Parameters {
  val sent2sent = Cauchy( 1.0, 0.3 )
  val govtall2sent = Cauchy( -1.0, 2.0 )
  val hostile2sent = Cauchy( 1.0, 2.0 )
  val alltgov2sent = Cauchy( -1.0, 2.0 )
  val consecutiveHighProtest2sent = Cauchy( 30.0, 10.0 )
  val sent2protests = Cauchy( 30.0, 10.0 )
  val protests2govtall = Cauchy( -30.0, 10.0 )
  val alltgov2govtall = Cauchy( 1.0, 2.0 )
  val protests2hostile = Cauchy( 1.0, 2.0 )
  val govtall2alltgov = Cauchy( 1.0, 3.0 )
  val hostile2alltgov = Cauchy( 5.0, 5.0 )
  val highProtestThreshold = Cauchy( 30.0, 10.0 )
  val trigger2sent = Cauchy( 10.0, 3.0 )
  def toList() = List( sent2sent, govtall2sent, hostile2sent, alltgov2sent,
    consecutiveHighProtest2sent, sent2protests, protests2govtall, alltgov2govtall,
    protests2hostile, govtall2alltgov, hostile2alltgov, highProtestThreshold, trigger2sent )
}
More Model Code
class MiddleWeeklyModel( previous : WeeklyModel, params : Parameters ) extends WeeklyModel {
  def linear( prevs : List[Element[Double]], parms : List[Element[Double]] ) = {
    val mean = prevs.zip(parms)
      .map( pair => Apply( pair._1, pair._2, (a:Double, b:Double) => (a*b) ) )
      .foldLeft(Constant(0.0): Element[Double])((a,b) => (a++b))
    Chain( mean, (m:Double) => Normal(m, 3.0) )
  }

  def adjust( base : Element[Double], value : Element[Double], scale : Element[Double] ) = {
    val mean = Apply( base, value, scale, (b:Double, v:Double, s:Double) => (b+v*s) )
    Chain( mean, (m:Double) => Normal(m, 3.0) )
  }

  val trigger = Select( 0.00001 -> 1.0, 0.99999 -> 0.0 )

  val consecutiveHighProtest =
    If( Apply( previous.protests, params.highProtestThreshold, (a:Double, b:Double) => (a>b) ),
        previous.consecutiveHighProtest++Constant(1.0),
        Constant(0.0) )

  val sentiment = linear(
    List( previous.sentiment, previous.govtall, previous.alltgov, previous.hostile,
          consecutiveHighProtest, trigger ),
    List( params.sent2sent, params.govtall2sent, params.alltgov2sent, params.hostile2sent,
          params.consecutiveHighProtest2sent, params.trigger2sent ) )
  val govtall = linear(
    List( previous.protests, previous.alltgov ),
    List( params.protests2govtall, params.alltgov2govtall ) )
  val alltgov = linear(
    List( previous.govtall, previous.hostile ),
    List( params.govtall2alltgov, params.hostile2alltgov ) )
  val hostile = adjust( previous.hostile, previous.protests, params.protests2hostile )
  val pmean = Apply( sentiment, params.sent2protests, (a:Double, b:Double) => (a*b) )
  val protests = Chain( pmean, (x:Double) => Normal( x, 3.0 ) )
}
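
To see what one weekly transition computes, here is a library-free sketch that propagates only the means, dropping the Normal(mean, 3.0) noise around each value. It is an illustration under that simplification, not the Figaro model itself. Week, Params, linearMean and stepMean are hypothetical names, and the trigger is passed in as a plain number:

```scala
// Illustrative only: deterministic means of the weekly transition, no Figaro, no noise.
object WeeklyStep {
  final case class Week(sentiment: Double, govtall: Double, hostile: Double,
                        alltgov: Double, protests: Double, consecutive: Double)

  final case class Params(sent2sent: Double, govtall2sent: Double, hostile2sent: Double,
                          alltgov2sent: Double, consecutive2sent: Double,
                          sent2protests: Double, protests2govtall: Double,
                          alltgov2govtall: Double, protests2hostile: Double,
                          govtall2alltgov: Double, hostile2alltgov: Double,
                          highProtestThreshold: Double, trigger2sent: Double)

  // Weighted sum of previous values: the mean that linear(...) wraps in a Normal.
  def linearMean(prevs: Seq[Double], weights: Seq[Double]): Double =
    prevs.zip(weights).map { case (v, w) => v * w }.sum

  // Means of next week's variables given last week's values.
  def stepMean(prev: Week, p: Params, trigger: Double): Week = {
    // Count consecutive weeks above the threshold; reset to zero otherwise.
    val consecutive =
      if (prev.protests > p.highProtestThreshold) prev.consecutive + 1.0 else 0.0
    val sentiment = linearMean(
      Seq(prev.sentiment, prev.govtall, prev.alltgov, prev.hostile, consecutive, trigger),
      Seq(p.sent2sent, p.govtall2sent, p.alltgov2sent, p.hostile2sent,
          p.consecutive2sent, p.trigger2sent))
    Week(
      sentiment = sentiment,
      govtall = linearMean(Seq(prev.protests, prev.alltgov),
                           Seq(p.protests2govtall, p.alltgov2govtall)),
      hostile = prev.hostile + prev.protests * p.protests2hostile,
      alltgov = linearMean(Seq(prev.govtall, prev.hostile),
                           Seq(p.govtall2alltgov, p.hostile2alltgov)),
      protests = sentiment * p.sent2protests,
      consecutive = consecutive)
  }
}
```

Note that protests depends only on the current sentiment, so every other variable influences the protest count indirectly, through sentiment.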
Overall Model
class OverallModel( params : Parameters, size : Int, training : Boolean ) {
  val length = size
  val parms = params
  val weeks : Array[WeeklyModel] = new Array[WeeklyModel](size)
  weeks(0) = new FirstWeeklyModel
  weeks(1) = new MiddleWeeklyModel( weeks(0), params )
  if( training )
    for( i <- 2 until size ) {
      weeks(i) = new MiddleWeeklyModel( weeks(i-1), params )
    }
  val dist = new NormalDistribution( 0.0, 20.0 )

  def constrainValue( observed : Double, actual : Double ) = {
    dist.logDensity( observed-actual )
  }

  def populate( i : Int, values : HashMap[Symbol,Double], istrigger : Boolean ) : Unit = {
    println( "Populating row " + i )
    weeks(i).govtall.setLogConstraint(x => constrainValue(values('govtall),x))
    weeks(i).alltgov.setLogConstraint(x => constrainValue(values('alltgov),x))
    weeks(i).hostile.setLogConstraint(x => constrainValue(values('hostile),x))
    weeks(i).protests.setLogConstraint(x => constrainValue(values('protests),x))
    weeks(i).trigger.setLogConstraint(x => constrainValue(if(istrigger) 1.0 else 0.0, x))
  }
}
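
The populate method does not fix each variable to its observed value. It attaches a soft constraint that penalizes the gap between the observed and sampled values. A library-free sketch of that penalty, assuming the deck's NormalDistribution( 0.0, 20.0 ) takes a mean and a standard deviation (as the Apache Commons Math class of that name does). SoftConstraint and normalLogDensity are hypothetical names:

```scala
// Illustrative only: the log-density penalty behind constrainValue, written out by hand.
object SoftConstraint {
  // Log density of a zero-mean Normal with standard deviation sd, evaluated at x.
  def normalLogDensity(x: Double, sd: Double): Double =
    -0.5 * math.log(2 * math.Pi) - math.log(sd) - (x * x) / (2 * sd * sd)

  // Larger gaps between observed and sampled values give a lower (more negative) score.
  // sd = 20.0 is assumed to match the deck's NormalDistribution( 0.0, 20.0 ).
  def constrainValue(observed: Double, actual: Double, sd: Double = 20.0): Double =
    normalLogDensity(observed - actual, sd)
}
```

With sd = 20, a sampled value that is 20 units off costs only 0.5 in log density relative to an exact match, so the constraint is quite loose for small counts such as weekly protests.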
Training
def train( model : OverallModel ) : Unit = {
  val lines = fromFile("/home/peval/Desktop/PakData.csv").getLines.toList
  // Constrain week i-1 with data line i; the listed rows are flagged as trigger weeks.
  for( i <- 1 until model.length ) {
    println( "Populating row " + i )
    model.populate( i-1, parserow( lines(i) ), i==19 || i==168 || i==280 || i==610 )
  }
  val alg = MetropolisHastings( ProposalScheme.default, model.parms.toList:_* )
  alg.messageTimeout = new Timeout( 2, TimeUnit.MINUTES )
  alg.start()
  // Runs until killed: every 5 minutes, pause sampling and print the current
  // expectation of each parameter as a "= Constant(...)" line.
  while(true) {
    Thread.sleep( 300000 )
    alg.stop()
    println( "------------------------------" )
    model.parms.toList.foreach { p =>
      println( "= Constant(" + alg.expectation(p, (x:Double)=>x) + ")" )
    }
    alg.resume()
  }
}
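
Training hands the whole model to Figaro's MetropolisHastings. As a rough illustration of what such a sampler does for a single parameter, here is a library-free random-walk Metropolis-Hastings sketch on a toy problem: one weight w in y = w * x + noise, with a Cauchy prior. Everything here (MhSketch, the toy data, the noise scale of 3.0 treated as a standard deviation, the step size) is an assumption made for the example. It does not reproduce Figaro's algorithm or ProposalScheme.default:

```scala
import scala.util.Random

// Illustrative only: random-walk Metropolis-Hastings for one weight, no Figaro.
object MhSketch {
  // Log density of a Cauchy(location, scale) prior.
  def logCauchy(x: Double, location: Double, scale: Double): Double = {
    val z = (x - location) / scale
    -math.log(math.Pi * scale * (1.0 + z * z))
  }

  // Log density of Normal(mean, sd) at x, up to an additive constant.
  def logNormal(x: Double, mean: Double, sd: Double): Double = {
    val z = (x - mean) / sd
    -0.5 * z * z
  }

  // Unnormalised log posterior of w given the toy data.
  def logPosterior(w: Double, xs: Seq[Double], ys: Seq[Double]): Double =
    logCauchy(w, 1.0, 2.0) +
      xs.zip(ys).map { case (x, y) => logNormal(y, w * x, 3.0) }.sum

  // Returns the mean of the samples kept after burn-in.
  def posteriorMean(xs: Seq[Double], ys: Seq[Double], iterations: Int,
                    burnIn: Int, stepSd: Double, seed: Long): Double = {
    val rng = new Random(seed)
    var current = 1.0 // start at the prior's location
    var currentLp = logPosterior(current, xs, ys)
    var sum = 0.0
    var kept = 0
    for (i <- 0 until iterations) {
      val proposal = current + rng.nextGaussian() * stepSd
      val proposalLp = logPosterior(proposal, xs, ys)
      // Symmetric proposal: accept with probability min(1, p(proposal) / p(current)).
      if (math.log(rng.nextDouble()) < proposalLp - currentLp) {
        current = proposal
        currentLp = proposalLp
      }
      if (i >= burnIn) { sum += current; kept += 1 }
    }
    sum / kept
  }
}
```

For example, with xs = 1.0 to 20.0 and ys = 2 * xs, calling MhSketch.posteriorMean(xs, ys, 20000, 5000, 0.1, 42L) should land close to 2.0. The real model samples thirteen such parameters jointly across hundreds of weeks, which is a much harder problem for this kind of sampler.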
More Parameter Code
class LearnedParameters extends Parameters {
  val sent2sent = Constant(1.3677075485827346)
  val govtall2sent = Constant(43.816705101176794)
  val hostile2sent = Constant(1.3705215838506093)
  val alltgov2sent = Constant(-0.30241081905766826)
  val consecutiveHighProtest2sent = Constant(42.377031937327615)
  val sent2protests = Constant(30.869314384095745)
  val protests2govtall = Constant(-19.542948727635462)
  val alltgov2govtall = Constant(2.574871223029797)
  val protests2hostile = Constant(-5.9124152162747405)
  val govtall2alltgov = Constant(29.32142628155293)
  val hostile2alltgov = Constant(3.272390543018787)
  val highProtestThreshold = Constant(23.695876311150947)
  val trigger2sent = Constant(12.330984097687368)

  def toList() = List()
}
Predicting
def predict( model : OverallModel, i : Int, values : HashMap[Symbol,Double] ) : Unit = {
  println( "Specifying alg" )
  // Sample next week's protests given the constraints currently set on week 0.
  val alg = MetropolisHastings( 50000, ProposalScheme.default, model.weeks(1).protests )
  alg.start()
  alg.stop()
  println( "predicted: " + alg.expectation(model.weeks(1).protests, (x:Double)=>x) +
           ", actual: " + values.get('protests) )
  alg.kill()
  // Constrain week 0 with this week's observed values before the next prediction.
  model.populate( 0, values, i==19 || i==168 || i==280 || i==610 )
}

def main( args : Array[String] ) : Unit = {
  val learned = new LearnedParameters
  // Two-week model: week 0 is observed, week 1 is predicted.
  val learnedmodel = new OverallModel( learned, 2, false )
  val lines = fromFile("/home/peval/Desktop/PakData.csv").getLines.toList
  learnedmodel.populate( 0, parserow(lines(104)), false )
  for( i <- 1 until 100 ) {
    predict( learnedmodel, i, parserow(lines(i+104)) )
  }
}
Day 4: What Went Wrong?
• Results were terrible. Why?
– Too many parameters?
– Wrong distributions?
– Wrong model?
– What we’re looking for isn’t in the data?
– Quality/stability of data? (Data comes from NLP)
– Wrong algorithm? (Had to use M-H, it was the only one that would run)
– Need more time to tweak/fiddle?
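
The deck does not say how "terrible" was measured. One hypothetical yardstick, sketched here as an assumption and not as the authors' method, is to compare the model's mean absolute error against a persistence baseline that predicts each week's protest count as the previous week's count. Baseline, meanAbsoluteError and persistenceError are illustrative names:

```scala
// Illustrative only: a naive baseline to compare model predictions against.
object Baseline {
  // Mean absolute error between two equally long, non-empty sequences.
  def meanAbsoluteError(predicted: Seq[Double], actual: Seq[Double]): Double = {
    require(predicted.nonEmpty && predicted.length == actual.length)
    predicted.zip(actual).map { case (p, a) => math.abs(p - a) }.sum / predicted.length
  }

  // Persistence baseline: predict week t+1 as the observed value of week t.
  def persistenceError(series: Seq[Double]): Double =
    meanAbsoluteError(series.init, series.tail)
}
```

Applied to the 15 weekly protest counts in the sample table (6, 2, 1, 8, 6, 5, 4, 9, 9, 3, 2, 3, 2, 4, 8), the persistence baseline has a mean absolute error of 36/14, about 2.57 protests per week. A model whose error is not clearly below such a baseline is adding little.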
Lessons Learned
• Just because a model is “intuitive” doesn’t mean it’s right
• Getting over Scala was a little challenging, but Figaro was actually easy (for me) to use
– Was able to make wholesale changes to the model easily
– Learning and prediction were so similar that the code was simple and reusable