Bigdatafordevelopment:LIRNEasia’sexperiences
SriganeshLokanathanh>p://lirneasia.net/projects/bd4d/
ThisworkwascarriedoutwiththeaidofagrantfromtheInternaGonalDevelopmentResearchCentre,CanadaandtheDepartmentforInternaGonalDevelopmentUK
UniversityofDhakaDhaka,3rdFebruary2017
Catalyzing policy change through research to improve people’s lives in the emerging Asia Pacific by facilitating their use of hard and soft infrastructures through the use of knowledge, information and technology.
LIRNEasiaisaregionalnon-profitthinktank.Ourmissionisthatof
Wherewework
SomedevelopmentproblemsofinteresttoLIRNEasia...
• MorepeopleliveinciGesthaninruralareassince2008– HowcanwemakeciGesmorelivable?– IstherearoleofICTs,notjustmoreroads,transit,etc.?
• InfecGousdiseasesareposingthreats– Canwemakebe>erdecisionsreallocaGngscarceresources?
• GovernmentsareflyingblindwithalackofGmelydatatobe>ertargetexpenditures,assessprograms,orachieveSDGs– Aretherewaystoremedythis?
4
ComprehensivecoverageofpopulaGonneeded.Sourcesofdata?
• AdministraGvedata– E.g.,digiGzedmedicalrecords,insurancerecords,taxrecords
• CommercialtransacGons(transacGon-generateddata)– E.g.,Stockexchangedata,banktransacGons,creditcardrecords,
supermarkettransacGonsconnectedbyloyaltycardnumber
• OnlineacGviGes/socialmedia– E.g.,onlinesearchacGvity,onlinepageviews,blogs/FB/twi>erposts
• Sensorsandtrackingdevices– E.g.,roadandtrafficsensors,climatesensors,equipment&
infrastructuresensors,mobilephonescommunicaGngwithbasestaGons,satellite/GPSdevices
5
MobileNetworkBigDataisonlyopGonatthisGme
6
Sources: https://www.gsmaintelligence.com/; http://www.itu.int/en/ITU-D/Statistics/Documents/publications/misr2015/MISR2015-w5.pdf; Facebook advertising portal; http://data.worldbank.org/indicator/SP.POP.TOTL
ButwhatkindofMNBDisappropriaterightnowforSriLanka?
• StreetbumpisaBostoncrowdsourcing+bigdataapplicaGonthatusesthenaturalmovementofciGzenstoimprovestreetmaintenance– Datageneratedfromanapp
downloadedtoasmartphone“mounted”inacar
• CanStreetbumpbetransplantedinColomboatthisGme?– Featurephones>>
Smartphones
• “Somethingbe>erthannothing”maynotapply– Biastowardroadstraversedby
smartphoneownersèIncondiGonsoflimitedresources,mayskewresourceallocaGon
7
Bigdatausedintheresearch• MulGplemobileoperatorsinSriLankaprovidedfourdifferenttypesof
meta-data– CallDetailRecords(CDRs):Recordsofcalls,SMS,Internetaccess– AirGmerechargerecords– NoVisitorLocaGonRegistry(VLR)data,becausetheyarewri>enover¬
stored
• DatasetsdonotincludeanyPersonallyIdenGfiableInformaGon– Allphonenumbersarepseudonymized– LIRNEasiadoesnotmaintainanymappingsofidenGfierstooriginalphone
numbers• Historical,notrealGme;thereforeanalyzedinbatchmodeinahardware
stackcosGng<USD30k• Cover50-60%ofusers;veryhighcoverageinWestern(whereColombo
thecapitalcityinlocated)&Northern(mostaffectedbycivilconflict)Provinces,basedoncorrelaGonwithcensusdata
8
• NowalsousingCCTVfootageaswellassatelliteimagery
Mobilenetworkbigdata+otherdataèrich,Gmelyinsightsthatserveprivateaswellaspublicpurposes
9
Construct Behavioral Variables
1. Mobility variables 2. Social variables 3. Consumption variables
Other Data Sources
1. Data from Dept. of Census & Statistics
2. Transportation data 3. Health data 4. Financial data 5. Etc.
Dual purpose insights
Private purposes
1. Mobility & location based services
2. Financial services 3. Richer customer
profiles 4. Targeted
marketing 5. New VAS
Public purposes
1. Transportation & Urban planning
2. Crises response + DRR
3. Health services 4. Poverty mapping 5. Financial inclusion
Mobile network big data (CDRs, Internet access usage, airtime recharge
records)
EXAMPLESOFONGOINGWORK
10
MNBDdatacangiveusgranular&high-frequencyesGmatesofpopulaGondensity
11
DSD population density from 2012 census
DSD population density estimate from MNBD
Voronoi cell population density estimate from MNBD
Popula9ondensitychangesinColomboregion:weekday/weekendPicturesdepictthechangeinpopula9ondensityatapar9cular9merela9vetomidnight
12
Wee
kday
Su
nday
Decrease in Density Increase in Density
Time 18:30 Time 12:30 Time 06:30
Ourfindingscloselymatchresultsfromexpensive&infrequenttransportaGonsurveys;arecheaper&canbeproducedasneeded
13
46.9%ofcity’sdayGmepopulaGoncomesfromoutside.PotenGalconfiguraGonsofaMetropolitanCorporaGon
HomeDSD Popula9on Percentagecontribu9ontoColombo’sday9me
popula9on
ColomboCity(2DSDs)
555,031
53.1
Total 555,031 53.1
HomeDSD Popula9on Percentagecontribu9ontoColombo’sday9me
popula9on
ColomboCity(2DSDs)
555,031
53.1
Maharagama 195,355
3.7
Kolonnawa 190,817
3.5
Kaduwela 252,057
3.3
SriJ’puraKo>e
107,508
2.9
Total 1,300,768 66.5
HomeDSD Popula9on Percentagecontribu9ontoColombo’sday9me
popula9on
ColomboCity(2DSDs)
555,031
53.1
Maharagama 195,355
3.7
Kolonnawa 190,817
3.5
Kaduwela 252,057
3.3
SriJ’puraKo>e
107,508
2.9
Dehiwala 87,834
2.6
Kesbewa 244,062 2.5
Wa>ala 174,336
2.5
Kelaniya 134,693
2.1
Ratmalana 95,162 2.0
Moratuwa 167,160 1.8
Total 2,204,015 79.9
HomeDSD Popula9on Percentagecontribu9ontoColombo’sday9me
popula9on
ColomboCity(2DSDs)
555,031
53.1
Maharagama 195,355
3.7
Kolonnawa 190,817
3.5
Kaduwela 252,057
3.3
SriJ’puraKo>e
107,508
2.9
Dehiwala 87,834
2.6
Kesbewa 244,062 2.5
Wa>ala 174,336
2.5
Total 1,807,000 74.1
Wecanunderstandtheextantoflocalizedtravel
15
Each colored area represents a region where the density of travel within it is greater than that across its borders: useful for
decisions on last-mile solutions for multi-modal transport
UnderstandingtrafficcondiGons
• CDRdataisn’tgoodenoughtounderstandtrafficcondiGons
• CCTVfootagegivesusabe>erchanceofunderstandingtrafficflow(volume,speeds)
16
Haar-feature classification Deep learning based classifier
WeexploitdiurnalbasestaGonsignaturesto…..
• WecanusethisinsighttogroupbasestaGonsintodifferentgroups,usingunsupervisedmachinelearningtechniques
17
TypeY:?TypeX:?
…..understandlandusepa>erns
18
Highly commercial
Highly residential
ButusingjustMNBDgetsusonlysofar
• Analysescurrentlybeingimprovedusingmachinevisiontechniquesonsatellitedata,aswellasusingclassifiersfromFoursquaredata
• 4differentsourcesofdataarebeingusedtocollecGvelygiveabe>erGmelypictureoflanduse– ExisGnggroundtruthsurveydata(dataotenoldandgeographically
sparse)– MNBD– Satelliteimagery– Foursquare
19
Finalresultswillhelpabe>ercomparisonofrealityagainstplans
20
2013MNBDanalyses 2020UDAplan 2000UDAlandusesurvey
2010SurveyDept.LandUseMap
WearealsoleveragingmobilitytoinfereconomicacGvity
Economicac9vity=
(numberofworkers)x(produc9vityperworker)
Observed Mustbeinferred
• WeassumemoreproducGveregionsaremorea>racGvedesGnaGons • CommuGngpa>ernsemergefromthetrade-offbetweena>racGvenessofaworkplaceandthecostofgewngthere
21
ExampleofcommuGngflowsfromoneoriginlocaGon
22
Biyagama Export
Processing Zone
HighCommuGngFlows
LowCommuGngFlows
23
WecandevelopnewproxymeasuresofeconomicacGvity
HighEsGmatedWage
LowEsGmatedWage
EsGmatedlog(wage)
Intuition: log(wage) is estimated as employment in excess of what is predicted by distance to residential population.
24
Nightlights Householddata IndustrialData
GeographicvariaGon
TimevariaGon yearly quarterly/2-3yrs/decade yearly/decade
RelevantvariablesEduca9on,
(un)employment,skilllevels
Employment,capitalintensity
Idealfor: ImprovingMeasure
Improving&ValidaGon
IncorporaGngotherdatacangivefurtherinsights
Householddata:Census/HIES/LFSIndustrialdata:ASI,IndustrialCensus
BenefitofanimprovedframeworkformodelingeconomicacGvity
• IncreasethecoverageofexisGngsurveys(bothtemporalandgeographic)– Bycalibra9ngwithhousehold,industrycensusandsurveydata,whenavailable
– Then,mobiledatacanbeusedtopredict/extrapolateforGmeperiodsandregionswithoutsurveydata
• Cancaptureinformaleconomicac9vity– Otherresearchsuggestsinformaleconomyisalmost30%ofGDPinSriLanka
25
HowdocommuniGesmatchexisGngadministraGvedivisions?
26 The 9 provinces The 11 detected
communities
WithsomeexcepGons,boundariesofcommuniGesdifferfromexisGngadministraGvedivisions
27
• Northern(1),Uva(10)andSouthern(11)communiGesmostsimilartoexisGngprovincialboundaries;but11takesEmbilipiGyaandKataragama
• Colombodistrictisclusteredasasinglecommunity(7)• GampahamergeswithcoastalbeltofNorthWesternProvince
(2)andKalutara(8)isitsowncommunity– WhatdoesthismeanforWesternProvinceMegapolis?
• Bawcaloa&AmparadistrictsoftheEasternProvincemergewiththePolonnaruwadistrictofNorthCentralProvincetoformitsowndisGnctcommunity(6)– PossiblyreflecGveofeconomiclinkagessincethisisthericebelt
ofSriLanka– Doeseconomicsoverrideethnicity?
Moredifferencesappearwhenwezoominfurther• Theli>oralregionsform
theirowndisGnctsub-communiGes
• ThenorthernpartofColombocityformsacommunitywithWa>ala,acrosstheKelaniriver
• Ingeneral,riversnolongerformnaturalboundariesofcommuniGes
• SuchworkcanpossiblyinformelectoralreformprocessinSriLankaandtheworkoftheDelimitaGonCommission
28 Bridge
PredicGngspaGalspreadofdengueinSriLanka
• SuchanalyseswillbeveryimportantshouldanycasesofzikabedetectedinSriLanka
• IniGalsresultsarebeingimprovedfurtherinpartnershipwiththeUniversityofMoratuwaandtheEpidUnitoftheMinistryofHealth
29
Comparing predicted & actual dengue outbreaks for Dehiwala MOH region in 2014
Week
# of
cas
es
POLICYIMPLICATIONS
30
ImplicaGonsforpublicpolicy• PopulaGonmapsandmobility/migraGonpa>ernsare
essenGalpolicytools– Includedinmostcensusandhouseholdsurveys
• Improve&extendmeasurement– Largesamplesizes– Veryfrequentmeasurement
• Moreprecisemeasures– CommuGng/seasonal/long-termmigraGon
• Improveplanningfor“spiky”eventsthatcreatepressureongovernmentandprivatelysuppliedservices– Humanitarianresponsetodisasters– Specialevents/holidays
31
ImplicaGonsforpublicpolicy• Urban&transportaGonplanning
– CanidenGfyhighvolumetransportcorridorstoprioriGzeforprovisionofmasstransit
– Mapdefactomunicipalboundaries
• Healthpolicy– Mobilitypa>ernsthatcanhelprespondtospreadofinfecGous
diseases(e.g.,dengue,malaria,zika,etc.)
• Almostreal-Gmemonitoringofurbanlanduse– Cancomplementcostlyandinfrequentlandusesurveys
• Providebe>ermeasuresofundetectedinformaleconomicacGvity
32
CHALLENGE1:POLICYIMPACT
33
Fromsupply-pushtodemand-pull
34
Event
2014Oct • FoundingChairhasone-on-onemeeGngwithSecretary,UrbanDevelopmentèpresentaGontourbandevelopmentprofessionalsinNovember2014agreedupon,butpostponedduetoearlyannouncementofPresidenGalElecGon
2015Jan • BigDatateamconductsapubliclectureorganizedbyInsGtuteofEngineersSriLanka(IESL)
2015Feb • EmailcontactmadewithnewDGofUDAwithoffertobriefonLA’songoingresearch(meeGngsplannedbutdon’thappen)
• FirstmediainteracGonsinSriLanka
2015May • BigDataTeamLeadermakes5-minutepresentaGonatWorkshoponimplementaGonoftransportaGonmasterplanforMinistryofInternalTransportorganizedbyUoM’sDept.ofTransport&LogisGcs.A>endedbydomainspecialists,academics,researchers,officialsfromUDA,RDA,etc.
2015Jun-Sept
• BigDataTeamLeaderinvitedtojoinplanningteamfortheWesternRegionMegapolisPlanningProjecttoprovideinsightsfromMNBD.Ournamesuggestedbyoneofthea>endeesintheMay2015event,whowasappointedtoleadoneofthecommi>eesworkingontheWRMPP
• Intotal9meeGngswerea>ended,culminaGnginapresentaGontoallthecommi>eesoninsightsfromLIRNEasiabigdataresearchrelatedtourbanandtransportaGonplanning
2015Aug • BigDataTeamLeaderpresentsatWorkshoponIntegratedLandUseTransportModelingPracGcesinSriLankaandaroundtheWorldorganizedbyUoM’sTransportaGonEngineeringDivision.A>endedbydomainspecialists,academics,researchers,officialsfromUDA,RDA,etc.
Polic
y En
light
enm
ent
/ Sup
ply-
push
D
eman
d-Pu
ll
35
Sri Lanka’s highest-circulation English newspaper
Sunday April, 12, 2015
Fromsupply-pushtodemand-pull
36
Event
2015Sept • BigDataTeamLeadera>endslaunchofWorldBankReporton“LeveragingUrbanizaGoninSouthAsia”
• SidediscussionswithDGofUDAonLIRNEasia’songoingresearchleadstoone-on-onemeeGngwithDGthefollowingdaytobriefhimonLIRNEasiaresearch
2015Dec • UDA,UDA’sProfessionalsAssociaGon,&YoungPlannersForumofInsGtuteofTownPlanners,SriLankaorganizespecialsessionforLAtopresentongoingresearch.
• EchelonMagazinereportonMegapolisplansincludechartsgivenbyUDAonsourcelocaGonsofColombo’sdayGmepopulaGondevelopedbyLIRNEasia(withoutacknowledgement)
2016Jan • BigDatateaminvitedtomakepresentaGontoSriLankaStrategicCiGesDevelopmentProjectworkingonKandy
2016Feb • DGUDArequestsaddiGonalmobilityandland-useinsightsonKandy
2016Feb • SriLankaStrategicCiGesDevelopmentProjectreachesouttoLIRNEasiaforinsightsonfoottrafficinKandy.Ourdatanotsuitablebutwebrainstormpossiblemethodologies
2016Aug • UDArequestsaddiGonalfiner-grainedmobilityinsightsforspecificareasinWesternProvince
Polic
y En
light
enm
ent
/ Sup
ply-
push
Dem
and-
Pull
WesternRegionalMegapolisProject(WRMP)
37
Source: Interview with Western Region Megapolis Authority in Echelon magazine (December 2015, pp. 63)
Lessons• Nodemandforinsightswhenwestartedin2012;buthadwe
notstartedontheresearchwhenwedid,noresultswouldhavebeenavailablewhenthepolicywindowopened
• Givendifficultyofassessingqualityofbig-dataresearch,thecredibilityofLIRNEasiahasbeenofvalue– MoreworkneedstobedonetomakepotenGalusersofthisresearch
moreknowledgeable
• Supply-pushapproachhelpedcreatethecondiGonsforessenGaldemand-pull– However,poliGcalchangesandappointmentswhichwereoutsideour
controlwerecriGcalincreaGngdemand
38
CHALLENGE2:HUMANRESOURCES
39
DatascienGstsareinshortsupply;NeededaremulG-disciplinaryteams
• WeprioriGzeanalyGcalthinkingoverknowledgeofbigdatatoolsinourrecruitmentinterviews
• TeammembershavedifferentspecialGes• Staffandcollaboratorshaveamixofcomputer-scienceskills,staGsGcs,anddomainknowledge
40
41
• DanajaMaldeniya(ComputerScience,StaGsGcs)
– NowatUofMichiganbutsGllcollaboraGng
• CDAthuraliya(ComputerScience)• DedunuDhananjaya(SotwareEngineer,
SystemsAdministrator)– Movedtoprivatesector,butsGllcollaboraGng
• IsuruJayasooriya(ComputerScience,MachineVision)
• KaushalyaMadhawa(ComputerScience,StaGsGcs)
– NowatTokyoInsGtuteofTechnology
• KeshandeSilva(ComputerScience,AgentBasedSimulaGons)
• LasanthaFernando(ComputerScience)• MadhushiBandara(ComputerScience)
– NowatUofNewSouthWalesbutsGllcollaboraGng
• NisansadeSilva(ComputerScience)– NowatUofOregon,butsGllcollaboraGng
• RobertGalyean(MathemaGcs,Physics)• Prof.RohanSamarajiva(PublicPolicy)• SriganeshLokanathan(PublicPolicy,
ComputerScience)• ShaznaZuhyle(ResearchManager)• ThavishaGomez(ResearchManager)
Staff Collaborators • Prof.AmalKumarage(Dept.of
Transport&LogisGcs,UoM)– TransportaGon,UrbanPlanning
• DrAmalShehanPerera(Dept.ofCSE,UoM)
– DataMining
• FieldsofView– IndianresearchinsGtutespecializingin
gamesandsimulaGonsforpublicpolicyissues
• GabrielKreindler(MIT)– Economics,StaGsGcs
• ProfJoshuaBlumenstock(UCBerkley,SchoolofInformaGon)
– DataScience,Economics,StaGsGcs
• ProfMoinulZaber(UofDhaka)– DataScience,PublicPolicy
• Shibasaki&SekimotoLaboratory,UniversityofTokyo
– BigDataforDevelopmentresearchpracGce
• YuheiMiyauchi(MIT)– Economics,StaGsGcs
Flowthrough:We’realwayshiring• Acomputer-sciencegraduatedoesnotthinkofworkingina
think-tank;otenlookingtojoinasotwarefirm– LIRNEasiahasdonepresentaGonsinpublicforumsandatuniversiGes
tobroadenhorizons
• Keysellingpointistheworkthatwedoandourpartnerships– Rewardingtoseeresearchbeingused– GoodopportuniGesforpublicaGonandconferencepapers– LIRNEasiaencouragesindividualresearcherstobuildtheir“brands”– IdealforadmissionintogoodPhDprograms;currentfunded
placements• UOregon• UMichigan• TokyoInsGtuteofTechnology• UNewSouthWales
42
Privacy• ThehighertheresoluGonofyourinsights,thegreatertheprivacyimplicaGons
• Mixingnon-personaldatawithothersourcescanrevealpersonala>ributes
• Informedconsentismeaninglessinabigdataworld
• Peopleotendon’tknowwhattheirgeneralizableprivacyneedsareorhowtheirpreferencesmightevolve
43
Weshouldalsobeaskingthefollowing:
• Shouldweconcentrateondefiningandenforcingprivacyrightsorreducingharms?
• Whenitcomestodevelopingeconomies,shouldweworryaboutinclusionorexclusioninbigdata?
• WhataboutcompeGGon?
44
SelectedPublicaGons&Reports• Lokanathan,S.,Kreindler,G.,deSilva,N.D.,Miyauchi,Y.,Dhananjaya,D.,&Samarajiva,R.
(2016).UsingMobileNetworkBigDataforInformingTransportaGonandUrbanPlanninginColombo.Informa*onTechnologies&Interna*onalDevelopment
• Samarajiva,R.,Lokanathan,S.,Madhawa,K.,Kriendler,G.,&Maldeniya,D.(2015).Bigdatatoimproveurbanplanning.EconomicandPoli*calWeekly,VolL.No.22,May30
• Maldeniya,D.,Lokanathan,S.,&Kumarage,A.(2015).Origin-DesGnaGonmatrixesGmaGonforSriLankausingmobilenetworkbigdata.13thInternaGonalConferenceonSocialImplicaGonsofComputersinDevelopingCountries.Colombo
• Kreindler,G.&Miyauchi,Y.(2015).CommuGngandProducGvity:QuanGfyingUrbanEconomicAcGvityusingCellPhoneData.LIRNEasia
• Lokanathan,S&Gunaratne,R.L.(2015).MobileNetworkBigDataforDevelopment:DemysGfyingtheUsesandChallenges.Communica*ons&Strategies.
• Lokanathan,S.(2014).TheroleofbigdataforICTmonitoringandfordevelopment.InMeasuringtheInforma*onSociety2014.InternaGonalTelecommunicaGonUnion.
Moreinforma9on:hdp://lirneasia.net/projects/bd4d/
45