7/27/2019 Recorded Future – A White Paper on Temporal Analytics
http://slidepdf.com/reader/full/recorded-future-a-white-paper-on-temporal-analytics 1/19
… on traditional text search, using various algorithms but really looking at individual documents in isolation.
Google changed that with its public debut in 1998. Google’s second-generation search engine is based on ideas from an experimental search engine called BackRub. At its heart is the PageRank algorithm, which is the core of Google’s success (together with clever advertising-based revenue models!). The main idea of the PageRank algorithm is to analyze links between web pages, and to rank a page based on the number of links pointing to it and (recursively) the rank of the pages pointing to it. This use of explicit link analysis has proven to be tremendously useful and surprisingly robust (even though Google continuously has to tweak its algorithms to combat attempts to manipulate the ranking).
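The following minimal power-iteration sketch in Python makes the PageRank idea concrete. The damping factor of 0.85, the iteration count and the four-page toy web are illustrative assumptions. This is the classic textbook formulation of the idea described above, not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank computed by power iteration.

    `links` maps each page to the list of pages it links to. Pages without
    outgoing links ("dangling" pages) spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Toy web: A links to B and C, B to C, C to A, and D to C.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C has the most inbound links and ends up with the highest rank. Nothing links to D, so it keeps only the baseline share (1 − 0.85)/4.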
Recorded Future is developing a third-generation analytics engine, which goes beyond explicit link analysis and adds implicit link analysis by looking at the “invisible links” between documents that talk about the same, or related, entities and events. We do this by separating the documents and their content from what they talk about: the “canonical” entities and events (yes, this model is heavily inspired by Plato and his distinction between the real world and the world of ideas).
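The “invisible links” idea can be sketched as a projection of a document–entity graph. Two documents become implicitly linked when they mention the same canonical entity. The documents and their entity sets below are hypothetical stand-ins for the output of entity extraction. The sketch only handles identical entities, not merely related ones.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical output of entity extraction: document id -> canonical entities.
doc_entities = {
    "doc1": {"Barack Obama", "Hillary Clinton", "Haiti"},
    "doc2": {"Hillary Clinton", "Ban Ki-Moon", "Port au Prince"},
    "doc3": {"Haiti", "Port au Prince"},
    "doc4": {"Hu Jintao"},
}


def implicit_links(doc_entities):
    """Link each pair of documents that mention at least one common entity.

    Returns {(doc_a, doc_b): set of shared entities}, with doc_a < doc_b.
    """
    docs_by_entity = defaultdict(set)
    for doc, entities in doc_entities.items():
        for entity in entities:
            docs_by_entity[entity].add(doc)
    links = defaultdict(set)
    for entity, docs in docs_by_entity.items():
        for a, b in combinations(sorted(docs), 2):
            links[(a, b)].add(entity)
    return dict(links)


for pair, shared in sorted(implicit_links(doc_entities).items()):
    print(pair, sorted(shared))
```

None of the documents contains a hyperlink, yet doc1, doc2 and doc3 become connected through the entities they share. doc4 stays isolated.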
Documents contain references to these canonical entities and events, and we use these references to rank canonical entities and events based on the number of references to them, the credibility of the documents (or document sources) containing these references, and several other factors (for example, co-occurrence of different events and entities in the same or in related documents is also used for ranking). This ranking measure, called momentum, is our aggregate judgment of how interesting or important an entity or event is at a certain point in time. Note that the momentum measure of course changes over time, reflecting a dynamic world.
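The white paper does not publish the momentum formula. The function below is a purely hypothetical sketch of some of the factors named above. It counts references to an entity, weights each one by a credibility score for its source, and discounts older references exponentially, so the score changes as time moves on. The credibility table, the half-life and the default weight for unknown sources are all assumptions, and co-occurrence is left out.

```python
from datetime import date

# Hypothetical credibility weights per source, between 0 and 1.
CREDIBILITY = {"New York Times": 0.9, "Some blog": 0.3}

# Extracted references: (canonical entity, source, publication date).
references = [
    ("Hillary Clinton", "New York Times", date(2010, 3, 16)),
    ("Hillary Clinton", "Some blog", date(2010, 3, 17)),
    ("Ban Ki-Moon", "Some blog", date(2010, 3, 10)),
]


def momentum(entity, as_of, refs, half_life_days=7.0, default_credibility=0.1):
    """Credibility-weighted reference count with exponential time decay.

    A reference published `half_life_days` before `as_of` counts half as much
    as one published on `as_of`; references published after `as_of` are ignored.
    """
    score = 0.0
    for ref_entity, source, published in refs:
        age = (as_of - published).days
        if ref_entity != entity or age < 0:
            continue
        weight = CREDIBILITY.get(source, default_credibility)
        score += weight * 0.5 ** (age / half_life_days)
    return score


for day in (date(2010, 3, 15), date(2010, 3, 17), date(2010, 3, 31)):
    print(day, round(momentum("Hillary Clinton", day, references), 3))
```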
In addition to extracting event and entity references, Recorded Future also analyzes the “time and space dimension” of documents: references to when and where an event has taken place, or even when and where it will take place, since many documents actually refer to events expected to take place in the future. We are also adding more components, e.g. sentiment analysis, which determines what attitude an author has towards his/her topic, and how strong that attitude is: the affective state of the author.
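The tiny lexicon-based sketch below gives a rough illustration of what sentiment analysis computes. It returns both a polarity (the attitude) and a strength. The lexicon, the one-word negation rule and the use of mean absolute score as “strength” are assumptions made for this example. The paper does not say which technique Recorded Future uses.

```python
# Tiny hypothetical lexicon: word -> polarity in [-1, 1].
LEXICON = {"good": 0.5, "great": 0.8, "excellent": 1.0,
           "bad": -0.5, "terrible": -0.9, "crisis": -0.6}
NEGATORS = {"not", "no", "never"}


def sentiment(text):
    """Return (polarity, strength) for a piece of text.

    polarity: mean score of the sentiment-bearing words (sign = attitude).
    strength: mean absolute score (how strongly feelings are expressed).
    A negator flips the score of the word directly after it.
    """
    tokens = [t.strip(".,;:!?\"'").lower() for t in text.split()]
    scores = []
    for i, token in enumerate(tokens):
        if token in LEXICON:
            score = LEXICON[token]
            if i > 0 and tokens[i - 1] in NEGATORS:
                score = -score
            scores.append(score)
    if not scores:
        return 0.0, 0.0
    return sum(scores) / len(scores), sum(abs(s) for s in scores) / len(scores)


print(sentiment("The results were not good."))
print(sentiment("A good plan, but a terrible outcome."))
```

The second example shows why polarity and strength are reported separately. Mixed feelings give a weak overall polarity even though the text is strongly charged.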
The semantic text analyses needed to extract entities, events, time, location, sentiment, etc. can be seen as an example of a larger trend towards creating “the semantic web”.
The time and space analysis described above is the first way in which Recorded Future can make predictions about the future: by aggregating weighted opinions about the likely timing of future events using algorithmic crowdsourcing. In addition, we can use statistical models to predict future happenings based on historical records of chains of events of similar kinds.
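One simple way to aggregate weighted opinions about when a future event will happen is to take a weighted median over the predicted dates. Each opinion can be weighted by, for example, the credibility of its source. This technique is assumed for illustration and is not the paper's stated method. The dates and weights in the example are invented.

```python
from datetime import date


def weighted_median_date(predictions):
    """Combine (predicted date, weight) opinions into a single estimate.

    Returns the weighted median: the earliest predicted date at which the
    cumulative weight reaches half of the total weight. Unlike a weighted
    mean, one outlying prediction cannot drag the estimate arbitrarily far.
    """
    ordered = sorted(predictions)
    total = sum(weight for _, weight in ordered)
    cumulative = 0.0
    for predicted, weight in ordered:
        cumulative += weight
        if cumulative >= total / 2:
            return predicted
    raise ValueError("no predictions given")


# Invented opinions about when an event will happen, weighted by source credibility.
opinions = [
    (date(2010, 3, 23), 0.9),
    (date(2010, 3, 25), 0.5),
    (date(2010, 6, 1), 0.2),
]
print(weighted_median_date(opinions))
```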
The combination of automatic event/entity/time/location extraction, implicit link analysis for novel ranking algorithms, and statistical prediction models forms the basis for Recorded Future’s temporal analytics engine. Our mission is not to help our customers find documents, but to enable them to understand what is happening in the world.
Recorded Future and Business Intelligence
There has been a long path of innovation in systems for business intelligence, trying to help decision makers in companies and organizations make better, data-driven decisions. We’d like to think of these in three generations as well:
• First-generation business intelligence (BI) tools were all about reporting and OLAP cubes, typically taking historical financial, sales, and manufacturing information and organizing it for analysis. Very helpful, but very focused on providing a view of the world in the rear-view mirror.
• Second-generation business intelligence was all about real time: hooking into real-time data sources as well as real-time user interaction, allowing decision makers both to look at very timely data and to adjust and interact with such views at a high pace.
• Third-generation business intelligence, we would like to believe, will be all about looking outside corporations and generating data and analytics for decision making based on the world, not just old historical enterprise data. This is Recorded Future.
Recorded Future at Work
To illustrate these ideas, we’ll present a simple example. Assume we have a set of different sources from the net, as illustrated in this picture:
From these sources, we harvest documents, either from RSS feeds or through other forms of web harvesting. An example data set might contain the following documents with short text snippets in them:
Our analysis first detects the entities mentioned in the documents, and decides which entity category they belong to (in this example, blue for Companies, orange for Persons, and green for Cities).
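A gazetteer lookup is a very simple stand-in for entity detection: a dictionary of known names, each mapped to a category. The sketch below only shows the input and output of this step. The gazetteer is hypothetical, and real linguistic analysis also has to handle unknown names, ambiguity and inflected forms.

```python
import re

# Hypothetical gazetteer: surface form -> entity category.
GAZETTEER = {
    "Hillary Clinton": "Person",
    "Ban Ki-Moon": "Person",
    "Port au Prince": "City",
    "New York Times": "Company",
}


def tag_entities(text):
    """Return (name, category, start offset) for every gazetteer match, in text order."""
    hits = []
    for name, category in GAZETTEER.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((name, category, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


print(tag_entities("Hillary Clinton to meet with Ban Ki-Moon in Port au Prince on March 23rd"))
```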
Next, events involving these entities are detected; in this example, five different kinds of events:
These are the canonical events. We now add event references/instances derived from the different documents (and the same for entity instances, but for the sake of graphical clarity these are not included in these pictures):
Once this analysis is completed, we can actually dispose¹ of the original texts, since we have completed the transition from the text domain to the data domain:
¹ We do keep references to the original documents, but we do not store any copy of the actual text.
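The split between canonical events and their instances can be sketched as grouping extracted instances under a canonical key. In this hypothetical model each instance keeps only a document URL, not the text. The canonicalization rule (same event type and same set of entities) is an assumption. A real system would also need to consider time and fuzzier matches.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventInstance:
    event_type: str         # e.g. "PersonTravel"
    entities: frozenset     # canonical entities involved in the event
    document_url: str       # reference to the source document; no text is kept
    time: str               # time (interval) the document attributes to the event


def canonical_key(instance):
    """Hypothetical rule: same event type and same entities -> same canonical event."""
    return (instance.event_type, instance.entities)


def group_instances(instances):
    """Map each canonical key to the list of instances that support it."""
    canonical = defaultdict(list)
    for instance in instances:
        canonical[canonical_key(instance)].append(instance)
    return dict(canonical)


instances = [
    EventInstance("PersonTravel", frozenset({"Hillary Clinton", "Haiti"}),
                  "http://example.com/a", "2010-03-22/2010-03-28"),
    EventInstance("PersonTravel", frozenset({"Haiti", "Hillary Clinton"}),
                  "http://example.com/b", "2010-03-22/2010-03-28"),
    EventInstance("Meeting", frozenset({"Hillary Clinton", "Ban Ki-Moon"}),
                  "http://example.com/c", "2010-03-23"),
]
for key, support in group_instances(instances).items():
    print(key[0], sorted(key[1]), len(support), "instance(s)")
```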
System Architecture
The Recorded Future system contains many components, which are summarized in the following diagram:
The system is centered around the database, which contains information about all canonical events and entities, together with information about event and entity references (sometimes also called instances), the documents containing these references, and the sources from which these documents were obtained.
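A minimal relational sketch of such a database can be built with Python's built-in sqlite3 module. The table and column names are hypothetical, and canonical entities and events share one table here for brevity. The query shows how many instances, from how many distinct sources, support each canonical item.

```python
import sqlite3

SCHEMA = """
CREATE TABLE source    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE document  (id INTEGER PRIMARY KEY, url TEXT NOT NULL,
                        source_id INTEGER NOT NULL REFERENCES source(id));
-- Canonical entities and events; kind is 'entity' or 'event'.
CREATE TABLE canonical (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, name TEXT NOT NULL);
-- Instances (references) tie a canonical item to a document that mentions it.
CREATE TABLE instance  (id INTEGER PRIMARY KEY,
                        canonical_id INTEGER NOT NULL REFERENCES canonical(id),
                        document_id INTEGER NOT NULL REFERENCES document(id));
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO source VALUES (?, ?)",
               [(1, "New York Times"), (2, "Some blog")])
db.executemany("INSERT INTO document VALUES (?, ?, ?)",
               [(1, "http://example.com/a", 1), (2, "http://example.com/b", 2)])
db.executemany("INSERT INTO canonical VALUES (?, ?, ?)",
               [(1, "entity", "Hillary Clinton"),
                (2, "event", "Person Travel of Hillary Clinton to Haiti")])
db.executemany("INSERT INTO instance VALUES (?, ?, ?)",
               [(1, 1, 1), (2, 1, 2), (3, 2, 1)])

# How many instances, from how many distinct sources, support each canonical item?
rows = db.execute("""
    SELECT c.name, COUNT(*), COUNT(DISTINCT d.source_id)
    FROM canonical AS c
    JOIN instance AS i ON i.canonical_id = c.id
    JOIN document AS d ON d.id = i.document_id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
for name, n_instances, n_sources in rows:
    print(name, n_instances, n_sources)
```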
There are five major blocks of system components working with this database:
- Harvesting – in which text documents are retrieved from various sources on the net and stored in the database (temporarily for analysis, longer term only if permitted by terms of use and IPR legislation).
- Linguistic analysis – in which the retrieved texts are analyzed to detect event and entity instances, time and location, text sentiment, etc. This is the step that takes us from the text domain to the data domain. It is also the only language-dependent component of the system; as we add support for more languages, new modules are introduced here. We use industry-leading linguistics platforms for some of the underlying analyses, and combine them with our own analysis tools when necessary.
- Refinement – in which data is analyzed to obtain more information; this includes calculating the momentum of entities, events, documents and even sources (see next section), calculation of sentiment, synonym detection, and ontology analysis.
- Data analysis – in which different statistical and AI-based models are applied to the data to detect anomalies and to generate predictions about the future, based either on actual statements in the texts or on other models for generalizing trends or hypothesizing from previous examples.
- User experience – the different user interfaces to the system, including the web interface, overview dashboard, alert mechanisms, and the API for interfacing with other systems.
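The data analysis block might apply statistical models such as the trailing-window z-score sketched below. It flags days on which the number of references to an entity jumps far above its recent history. The window size, the threshold and the sample counts are assumptions. The paper does not specify which anomaly detectors are used.

```python
from statistics import mean, stdev


def anomalies(daily_counts, window=7, threshold=3.0):
    """Flag days whose reference count is far above the trailing window.

    Returns the indices i where daily_counts[i] exceeds the mean of the
    previous `window` days by more than `threshold` standard deviations.
    """
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            sigma = 1.0  # avoid division by zero on a perfectly flat history
        if (daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Invented daily reference counts for one entity; day 7 has a burst of coverage.
counts = [5, 6, 5, 7, 6, 5, 6, 30, 6, 5]
print(anomalies(counts))
```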
Momentum
To find relevant information in the sea of data produced by our system, we need some relevance measure. To this end, we have developed “momentum”, a relevance measure for events and entities which takes into account the flow of information about an entity/event, the credibility of the sources from which that information is obtained, the co-occurrence with other events and entities, and so on. Momentum is, for example, used to present results in most-relevant order, and can also be utilized to find similarities between different events and entities.
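One hypothetical way to use momentum for finding similar entities is to compare their momentum time series, for instance with cosine similarity. The series below are invented. Cosine similarity is a swapped-in technique chosen for illustration rather than the method the paper describes.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equally long momentum time series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(target, series):
    """Name of the entity whose momentum series is closest to the target's."""
    others = (name for name in series if name != target)
    return max(others, key=lambda name: cosine_similarity(series[target], series[name]))


# Invented daily momentum values for three hypothetical entities.
momentum_series = {
    "Entity A": [0.1, 0.5, 0.9, 0.4],
    "Entity B": [0.2, 1.0, 1.8, 0.8],
    "Entity C": [0.9, 0.1, 0.1, 0.9],
}
print(most_similar("Entity A", momentum_series))
```

Cosine similarity compares the shape of the curves rather than their level. Entity B, which rises and falls with Entity A at twice the magnitude, counts as the most similar.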
User Experience
End users interact with Recorded Future through a series of rich user experiences. The analytics query interface allows users to specify events (such as “Person Travel”), entities (such as “Hu Jintao”) and time intervals (such as “2009” or “Anytime in the Future”).
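The three query dimensions (event type, entity, time interval) map naturally onto a filter over event instances. In this sketch an open-ended interval models “Anytime in the Future”. The `Instance` type, the `query` function and the sample data are hypothetical and do not reflect Recorded Future's actual query interface.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Instance:
    event_type: str
    entities: tuple
    start: date   # first day of the interval in which the event takes place
    end: date     # last day of that interval


def query(instances: List[Instance], event_type: str, entity: str,
          after: Optional[date] = None, before: Optional[date] = None) -> List[Instance]:
    """Instances of `event_type` involving `entity` whose interval overlaps [after, before].

    None means unbounded, so after=<today>, before=None models "Anytime in the Future".
    """
    result = []
    for inst in instances:
        if inst.event_type != event_type or entity not in inst.entities:
            continue
        if after is not None and inst.end < after:
            continue
        if before is not None and inst.start > before:
            continue
        result.append(inst)
    return result


# Made-up sample instances, for illustration only.
instances = [
    Instance("PersonTravel", ("Hu Jintao",), date(2009, 11, 15), date(2009, 11, 18)),
    Instance("PersonTravel", ("Hu Jintao",), date(2010, 4, 12), date(2010, 4, 14)),
    Instance("Meeting", ("Hu Jintao",), date(2010, 4, 13), date(2010, 4, 13)),
]
in_2009 = query(instances, "PersonTravel", "Hu Jintao", date(2009, 1, 1), date(2009, 12, 31))
future = query(instances, "PersonTravel", "Hu Jintao", after=date(2010, 3, 17))
print(len(in_2009), len(future))
```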
The results can then be analyzed in several different views (details, charts, timelines):
Videos showing the use of the system are available at: http://www.youtube.com/recordedfuture
Finally, end users can easily subscribe to email alerts (called Futures) corresponding to interesting queries. Live visualizations with up-to-date data from Recorded Future can also be embedded in blogs, etc.
Futures
Futures are a way of storing analytic questions and having Recorded Future monitor them with respect to the continuous flow of data from the world. Any query in Recorded Future can be turned into a Future at the click of a green button:
When a Future is defined, the frequency of updates can be specified (and of course changed later), and the Future can also be shared with others.
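A Future can be modelled as a stored query plus an update frequency. The `Future` class and `run_due` helper below are hypothetical. They show only the scheduling logic implied by the text, namely when a stored query is due to be re-run. Email delivery and sharing are reduced to a comment and a plain list field.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Future:
    """Hypothetical model of a Future: a stored query re-run on a schedule."""
    query: str
    every: timedelta                                  # update frequency; may be changed later
    shared_with: list = field(default_factory=list)   # other users receiving the alerts
    last_run: datetime = datetime.min                 # never run yet

    def is_due(self, now: datetime) -> bool:
        return now - self.last_run >= self.every


def run_due(futures, now):
    """Return the queries of all Futures due at `now`, marking them as run."""
    ran = []
    for future in futures:
        if future.is_due(now):
            ran.append(future.query)   # a real system would evaluate and email results here
            future.last_run = now
    return ran


daily = Future("PersonTravel, Hu Jintao, anytime in the future", timedelta(days=1))
weekly = Future("Meeting, Ban Ki-Moon, anytime in the future", timedelta(weeks=1),
                shared_with=["colleague@example.com"])
start = datetime(2010, 3, 17, 8, 0)
print(run_due([daily, weekly], start))
```

Because the frequency is an ordinary field, changing it later only means assigning a new `timedelta` to `every`.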
Futures are then delivered as they are detected by Recorded Future, in a rich email format which works well on both large and small screen devices:
API
Developers can access Recorded Future data and analytics through a web services API (documentation available at http://code.google.com/p/recordedfuture). Queries to the system are expressed using JSON (http://json.org/) and results are provided as JSON or CSV text. The API can be used to interface Recorded Future with statistics software such as R (http://www.r-project.org/) or visualization software such as Spotfire (http://spotfire.tibco.com/), as well as with proprietary analytics applications.
Examples of applications of the Recorded Future API include:
• Algorithmic trading – using the Recorded Future data stream to enhance automated trading/risk decision making, e.g. by monitoring the momentum and sentiment development of companies in a portfolio.
• Media monitoring – building new applications that monitor social as well as traditional media coverage of a company, industry sector, organization, or country.
• Dashboards – using the Recorded Future data stream to display novel, externally oriented indicators of the world, like the following very simple example:
• Geographical information accessed through the API can easily be used to present results in 3rd-party applications such as Google Maps and Google Earth.
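Google Earth reads KML files, so one way to pass geographic results to it is to write them out as KML placemarks. The sketch below uses only Python's standard `xml.etree.ElementTree`. The input format of (name, latitude, longitude) tuples is an assumption, and the coordinates for Port au Prince are approximate.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)   # serialize KML as the default namespace


def to_kml(places):
    """Build a KML document with one Placemark per (name, latitude, longitude).

    KML writes coordinates as "longitude,latitude", so the order is swapped here.
    """
    def tag(name):
        return f"{{{KML_NS}}}{name}"

    kml = ET.Element(tag("kml"))
    document = ET.SubElement(kml, tag("Document"))
    for name, lat, lon in places:
        placemark = ET.SubElement(document, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = name
        point = ET.SubElement(placemark, tag("Point"))
        ET.SubElement(point, tag("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Approximate coordinates, for illustration.
print(to_kml([("Port au Prince", 18.54, -72.34)]))
```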
A Final Word
Recorded Future brings a paradigm shift to analytics by focusing on time as an essential aspect of the analyst’s work. Sophisticated linguistic and statistical analyses, combined with innovative user interfaces and a powerful API, bring new opportunities to both human analysts and developers of 3rd-party analytics systems. We continuously develop all these aspects of our system to bring new tools into the analysts’ hands; the future has only just begun!
"Thus, what enables the wise sovereign and the good general to strike and conquer, and achieve things beyond the reach of ordinary men, is foreknowledge."
(from The Art of War by Sun Tzu, Section 13)
WHITE PAPER ADDENDUM
Plato, the Cave, and Recorded Future
Staffan Truvé, Ph.D.
To understand the philosophy behind Recorded Future, it is helpful to consider Plato’s famous “cave allegory”:
Plato imagines a group of people who have lived chained in a cave all of their lives, facing a blank wall. The people watch shadows projected on the wall by things passing in front of a fire behind them, and begin to ascribe forms to these shadows. According to Plato, the shadows are as close as the prisoners get to seeing reality. He then explains how the philosopher is like a prisoner who is freed from the cave and comes to understand that the shadows on the wall are not constitutive of reality at all, as he can perceive the true form of reality rather than the mere shadows seen by the prisoners. (en.wikipedia.org/wiki/Allegory_of_the_Cave)
(image from www.thatmarcusfamily.org/philosophy/Course_Websites/Phil_Math/Photos/Cave.jpg)
What we read in newspapers, blogs, etc. is not unlike the shadows on the wall of the cave. We get reports about events in the real world, and attempt to use that information to get an idea of what is really happening. As good analysts, we naturally consult several sources and weigh together the information obtained from them, always keeping in mind that some sources are more credible than others and should therefore be given higher weight. We call the evidence we get from different reports “event instances”, and we refer to the real-world events they report on as “canonical events”.
A canonical event, in our system, is a representation of a particular happening in the real world. For example, assume we read the following statement in the New York Times:
“Barack Obama said yesterday that Hillary Clinton will be travelling to Haiti next week”
This statement describes two events: a canonical “Quotation” event and a canonical “Person Travel” event.
The Quotation event refers to a canonical entity, “Barack Obama”, and a statement, “Hillary Clinton will be travelling to Haiti next week”. It has an associated time, “yesterday”.
The “Person Travel” event includes references to two canonical entities, “Hillary Clinton” and “Haiti”, and has an associated time, “next week”.
Note that “yesterday” and “next week” are relative times, and to place them on an absolute time axis we need to know when the entire statement was uttered. Let us assume that the statement was uttered on Wednesday, March 17th. Then we might represent the statement pictorially in the following way²:
² Note that “next week” is culturally dependent: in the US, weeks begin on Sundays, whereas in many other countries they begin on Mondays!
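Resolving relative times can be sketched in a few lines. The year 2010 is an assumption: the text gives only “Wednesday, March 17th”, and March 17 fell on a Wednesday in 2010. Only “yesterday” and “next week” are handled. The hypothetical `week_starts_on` parameter captures the cultural difference noted in the footnote.

```python
from datetime import date, timedelta


def resolve(expression, uttered, week_starts_on="monday"):
    """Map a relative time expression to an absolute (first day, last day) interval.

    Only "yesterday" and "next week" are handled. `week_starts_on` is "monday"
    (common in many countries) or "sunday" (the US convention).
    """
    if expression == "yesterday":
        day = uttered - timedelta(days=1)
        return day, day
    if expression == "next week":
        # date.weekday() numbers Monday as 0 and Sunday as 6.
        offsets = {"monday": uttered.weekday(), "sunday": (uttered.weekday() + 1) % 7}
        start_of_this_week = uttered - timedelta(days=offsets[week_starts_on])
        start_of_next_week = start_of_this_week + timedelta(days=7)
        return start_of_next_week, start_of_next_week + timedelta(days=6)
    raise ValueError(f"unsupported expression: {expression!r}")


uttered = date(2010, 3, 17)   # assumed year; March 17, 2010 was a Wednesday
print(resolve("yesterday", uttered))
print(resolve("next week", uttered, week_starts_on="monday"))
print(resolve("next week", uttered, week_starts_on="sunday"))
```

With the statement uttered on Wednesday, March 17, “yesterday” resolves to March 16. “Next week” is March 22–28 with Monday-start weeks and March 21–27 with Sunday-start weeks.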
In our system, this statement will be represented in the following way:
We have three canonical entities: Barack Obama and Hillary Clinton, which are Person entities [blue rectangles], and Haiti, a Location entity [green rectangle].
There are two canonical events [red ovals]: “Quotation by Barack Obama” and “Person Travel of Hillary Clinton to Haiti”.
Furthermore, there are instances of these events [pink ovals], which are tagged with the time or time interval during which they are expected to have occurred or to occur.
The Quotation instance also has a reference to the text of the quote and to the instance of the event referenced in the quote.
Finally, both instances refer to the text fragment representing the original statement, and the fragment refers to its source, the New York Times.
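The representation just described is a small graph, which can be sketched as nodes plus (subject, relation, object) edges. Node ids and relation names such as `instance_of` and `found_in` are hypothetical. The absolute dates on the instances assume that the statement was made on March 17, 2010 and that weeks start on Monday. The helper traces a canonical event back to the sources that report it.

```python
# Nodes: id -> (node type, label). Ids and relation names are hypothetical.
nodes = {
    "obama": ("Person", "Barack Obama"),
    "clinton": ("Person", "Hillary Clinton"),
    "haiti": ("Location", "Haiti"),
    "quotation": ("Canonical event", "Quotation by Barack Obama"),
    "travel": ("Canonical event", "Person Travel of Hillary Clinton to Haiti"),
    "quotation_1": ("Event instance", "2010-03-16"),
    "travel_1": ("Event instance", "2010-03-22/2010-03-28"),
    "quote_text": ("Quote", "Hillary Clinton will be travelling to Haiti next week"),
    "fragment": ("Text fragment", "Barack Obama said yesterday that Hillary Clinton "
                                  "will be travelling to Haiti next week"),
    "nyt": ("Source", "New York Times"),
}

# Edges as (subject, relation, object) triples.
edges = [
    ("quotation", "speaker", "obama"),
    ("travel", "traveller", "clinton"),
    ("travel", "destination", "haiti"),
    ("quotation_1", "instance_of", "quotation"),
    ("travel_1", "instance_of", "travel"),
    ("quotation_1", "quote", "quote_text"),
    ("quotation_1", "refers_to", "travel_1"),
    ("quotation_1", "found_in", "fragment"),
    ("travel_1", "found_in", "fragment"),
    ("fragment", "published_by", "nyt"),
]


def objects(subject, relation):
    return [o for s, r, o in edges if s == subject and r == relation]


def subjects(relation, obj):
    return [s for s, r, o in edges if r == relation and o == obj]


def sources_of(event):
    """Names of the sources whose text fragments contain instances of `event`."""
    found = set()
    for instance in subjects("instance_of", event):
        for fragment in objects(instance, "found_in"):
            for source in objects(fragment, "published_by"):
                found.add(nodes[source][1])
    return found


print(sources_of("travel"))
```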
Multiple text documents, retrieved from different sources, can of course be used to gather evidence of the same canonical event, i.e., to provide different instances of the canonical event. Several different canonical events, and their instances, will also refer to the same entities. To extend our example, let’s add the statement:
“Hillary Clinton to meet with Ban Ki-Moon in Port au Prince on March 23rd”
The representation of our “world knowledge” will then be updated to:
Is this all we know? Not really! Recorded Future also maintains an ontology³, with additional information about canonical entities and their relationships. In this particular example, the following information can be found in our database:
³ Ontology is the philosophical study of the nature of being, existence or reality in general, as well as the basic categories of being and their relations. Traditionally listed as a part of the major branch of philosophy known as metaphysics, ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences. (http://en.wikipedia.org/wiki/Ontology)
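Here is a sketch of how extracted events can be combined with the ontology. A hypothetical `LOCATED_IN` relation reveals that a meeting in Port au Prince on March 23rd falls inside Hillary Clinton’s travel to Haiti “next week”, so the second statement corroborates the first. The ontology fragment, the dictionary-based event records and the resolved dates (March 22–28, 2010, assuming Monday-start weeks) are all assumptions for illustration.

```python
from datetime import date

# Hypothetical ontology fragment: place -> the place that contains it.
LOCATED_IN = {"Port au Prince": "Haiti", "Haiti": "Caribbean"}


def is_within(place, region):
    """True if `place` is `region` or is (transitively) located in it."""
    while place is not None:
        if place == region:
            return True
        place = LOCATED_IN.get(place)
    return False


# Events extracted from the two statements (dates resolved, assuming 2010
# and Monday-start weeks for "next week").
travel = {"person": "Hillary Clinton", "destination": "Haiti",
          "start": date(2010, 3, 22), "end": date(2010, 3, 28)}
meeting = {"participants": {"Hillary Clinton", "Ban Ki-Moon"},
           "location": "Port au Prince", "date": date(2010, 3, 23)}


def corroborates(meeting, travel):
    """Does the meeting place the traveller at the destination within the travel interval?"""
    return (travel["person"] in meeting["participants"]
            and is_within(meeting["location"], travel["destination"])
            and travel["start"] <= meeting["date"] <= travel["end"])


print(corroborates(meeting, travel))
```

March 23 also falls within March 21–27, so the conclusion holds under the Sunday-start convention as well.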
Combining the information derived from the analyzed text and the ontology gives us the following picture for this minimal example. In the real Recorded Future database, there are millions of event instances. This should give you an idea of how the richness of Recorded Future data can help you in analyzing events in the real world!
Additional reading on our blogs:
Company Updates: http://blog.recordedfuture.com
Government & Intelligence examples: http://www.AnalysisIntelligence.com
Finance & Statistics examples: http://www.PredictiveSignals.com