Welcome to DS504/CS586: Big Data Analytics
![Page 1: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/1.jpg)
Welcome to DS504/CS586: Big Data Analytics
Application I
Prof. Yanhua Li
Time: 6:00pm–8:50pm R
Location: AK232
Fall 2016
![Page 2: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/2.jpg)
• 18 critiques; next Thursday we have the last critique.
  – Already graded 8 of them.
  – Plan to grade 1-2 more.
![Page 3: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/3.jpg)
• Grading
  – Projects (40%)
    • Project 1 (10%)
    • Project 2 (30%)
      – Final reports in the discussion forum (by 11:59pm 12/13);
      – Self-and-peer evaluation form for project 2 (by 11:59pm 12/13);
  – Written work (30%):
    • Critiques + Project reports (20%)
    • Quiz (10%, with 5% each)
  – Oral work (30%):
    • Presentation
![Page 4: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/4.jpg)
• Final Project Presentation
  – 22 minutes each group (including Q&A)
  – Schedule:
    • 12/15 Thu
![Page 5: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/5.jpg)
Next Class: Summary and Discussion
• Review of the semester
• Plus the last critique/review
![Page 6: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/6.jpg)
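Artificial Neural Networks (ANN)

| X1 | X2 | X3 | Y |
|----|----|----|---|
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 0 | 0 | 0 | 0 |

[Figure: inputs X1, X2, X3 feed a black box that produces the output Y.]
Output Y is 1 if at least two of the three inputs are equal to 1.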
![Page 7: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/7.jpg)
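Artificial Neural Networks (ANN)

| X1 | X2 | X3 | Y |
|----|----|----|---|
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 0 | 0 | 0 | 0 |

[Figure: the black box opened up. Input nodes X1, X2, X3 connect to an output node (Σ) through links weighted 0.3, 0.3, 0.3; the output node has threshold t = 0.4 and produces Y.]

$$Y = I(0.3X_1 + 0.3X_2 + 0.3X_3 - 0.4 > 0), \quad \text{where } I(z) = \begin{cases} 1 & \text{if } z \text{ is true} \\ 0 & \text{otherwise} \end{cases}$$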
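As a quick check of the formula on this slide, the following minimal Python sketch evaluates the threshold unit on all eight input combinations and compares it with the "at least two inputs equal 1" rule. The function names `indicator` and `black_box` are illustrative only and are not from the slides.

```python
from itertools import product


def indicator(condition):
    """I(z): 1 if z is true, 0 otherwise."""
    return 1 if condition else 0


def black_box(x1, x2, x3, w=0.3, t=0.4):
    """Threshold unit from the slide: Y = I(0.3*X1 + 0.3*X2 + 0.3*X3 - 0.4 > 0)."""
    return indicator(w * x1 + w * x2 + w * x3 - t > 0)


# Compare the unit with the majority rule on every row of the truth table.
for x1, x2, x3 in product([0, 1], repeat=3):
    majority = indicator(x1 + x2 + x3 >= 2)
    assert black_box(x1, x2, x3) == majority
```

One active input gives 0.3 - 0.4 < 0, while two or three active inputs give a positive sum, which is why equal weights of 0.3 with t = 0.4 reproduce the table.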
![Page 8: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/8.jpg)
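Artificial Neural Networks (ANN)
• Model is an assembly of inter-connected nodes and weighted links
• Output node sums up each of its input values according to the weights of its links
• Compare the output node's sum against some threshold t
[Figure: input nodes X1, X2, X3 connect through links weighted w1, w2, w3 to an output node (Σ) with threshold t, which produces Y.]
Perceptron Model:

$$Y = I\Big(\sum_i w_i X_i - t > 0\Big) \quad \text{or} \quad Y = \operatorname{sign}\Big(\sum_i w_i X_i - t\Big)$$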
![Page 9: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/9.jpg)
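General Structure of ANN
[Figure: a layered network with an input layer (x1 to x5), a hidden layer, and an output layer (y). Inset: neuron i receives inputs I1, I2, I3 over links weighted wi1, wi2, wi3, forms the sum Si with threshold t, and applies the activation function g(Si) to produce the output Oi.]
Training ANN means learning the weights of the neurons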
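To make "learning the weights" concrete, here is a minimal sketch that trains a single perceptron on the three-input truth table from the earlier ANN slides. The slides do not state a training rule, so the classic perceptron update rule is used here as an assumption; the names `predict` and `train_perceptron` and the learning rate are likewise illustrative.

```python
from itertools import product


def predict(weights, t, x):
    """Perceptron output: 1 if sum_i w_i * x_i - t > 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) - t > 0 else 0


def train_perceptron(samples, lr=1.0, epochs=200):
    """Classic perceptron rule (assumed, not from the slides):
    on each mistake, move the weights toward the input and shift the threshold."""
    weights, t = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            err = y - predict(weights, t, x)  # +1, 0, or -1
            if err:
                mistakes += 1
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                t -= lr * err
        if mistakes == 0:  # every sample classified correctly
            break
    return weights, t


# Training data: Y is 1 if at least two of the three inputs are 1.
samples = [(x, 1 if sum(x) >= 2 else 0) for x in product([0, 1], repeat=3)]
weights, t = train_perceptron(samples)
```

Because the majority function is linearly separable, this rule stops making mistakes after a finite number of updates. Multi-layer networks such as the one in the figure need a different procedure (for example back-propagation), which this sketch does not cover.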
![Page 10: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/10.jpg)
Real-world problems are always messy
• Multiple models
• Key features
• Data sparsity
![Page 11: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/11.jpg)
• What do we do to solve a classification/inference/prediction problem?
  – Data cleaning
  – Feature selection
  – Inference model
  – Evaluation
• An example of how to solve a real-world application problem
![Page 12: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/12.jpg)
U-Air: When Urban Air Quality Meets Big Data
Authors: Yu Zheng, Microsoft Research Asia
![Page 13: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/13.jpg)
Background
• Air quality
  – NO2, SO2
  – Aerosols: PM2.5, PM10
• Why it matters
  – Healthcare
  – Pollution control and dispersal
• Reality
  – Building a measurement station is not easy
  – A limited number of stations (poor coverage)
Beijing only has 22 air quality monitor stations in its urban areas (50km x 40km)
[Figure: map of air quality monitor stations]
![Page 14: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/14.jpg)
2 PM, June 17, 2013
![Page 15: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/15.jpg)
Challenges
• Air quality varies by location non-linearly
• Affected by many factors
  – Weather, traffic, land use…
  – Subtle to model with a clear formula
[Figure: A) Beijing (8/24/2012 - 3/8/2013). Distribution of the deviation of PM2.5 between stations S12 and S13; x-axis: deviation (0 to 480), y-axis: proportion (0.00 to 0.30); annotation: >35%.]
![Page 16: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/16.jpg)
We do not really know the air quality of a location without a monitoring station!
![Page 17: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/17.jpg)
Challenges
• Existing methods do not work well
  – Linear interpolation
  – Classical dispersion models
    • Gaussian Plume models and Operational Street Canyon models
    • Many parameters are difficult to obtain: vehicle emission rates, street geometry, the roughness coefficient of the urban surface…
  – Satellite remote sensing
    • Suffers from clouds
    • Does not reflect ground air quality
    • Varies with humidity, temperature, location, and season
  – Outsourced crowd sensing using portable devices
    • Limited to a few gases: CO2 and CO
    • Sensors for detecting aerosols (PM10, PM2.5) are not portable
    • A long sensing process, 1-2 hours
[Figure: aerosol sensor; 30,000+ USD, 10 ug/m3, 202×85×168 (mm)]
![Page 18: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/18.jpg)
Inferring Real-Time and Fine-Grained air quality throughout a city using Big Data
Meteorology, Traffic, Human Mobility, POIs, Road networks
Historical air quality data, Real-time air quality reports
![Page 19: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/19.jpg)
![Page 20: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/20.jpg)
Applications
• Location-based air quality awareness
  – Fine-grained pollution alert
  – Routing based on air quality
• Deploying new monitoring stations
• A step towards identifying the root cause of air pollution
[Figure: B) Shanghai, map of monitoring stations S1 to S10]
![Page 21: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/21.jpg)
Difficulties
• 1. How to identify features from each kind of data source
• 2. Incorporate multiple heterogeneous data sources into a learning model
  – Spatially-related data: POIs, road networks
  – Temporally-related data: traffic, meteorology, human mobility
• 3. Data sparseness (little training data)
  – Limited number of stations
  – Many places to infer
![Page 22: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/22.jpg)
Methodology Overview
• Partition a city into disjoint grids
• Extract features for each grid from its affecting region
  – Meteorological features
  – Traffic features
  – Human mobility features
  – POI features
  – Road network features
• Co-training-based semi-supervised learning model for each pollutant
  – Predict the AQI labels
  – Data sparsity
  – Two classifiers
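The first step, partitioning a city into disjoint grids, can be sketched as a mapping from a position to a grid identifier. This is only an illustration under stated assumptions: positions are given as planar offsets in km from the south-west corner of the city's bounding box, and the 1 km cell size is inferred from the dataset slide (50 km by 50 km, 2500 grids for Beijing). The helper name `grid_id` is hypothetical.

```python
def grid_id(x_km, y_km, width_km=50.0, height_km=50.0, cell_km=1.0):
    """Map a point (offsets in km from the south-west corner) to a grid id.

    Grids are numbered row by row, starting at 0 in the south-west corner.
    The 50 x 50 km extent and 1 km cells are assumptions for illustration.
    """
    if not (0 <= x_km < width_km and 0 <= y_km < height_km):
        raise ValueError("point lies outside the city bounding box")
    cols = int(width_km // cell_km)  # number of grids per row
    col = int(x_km // cell_km)
    row = int(y_km // cell_km)
    return row * cols + col


# Two points inside the same 1 km cell share a grid id; neighbours do not.
assert grid_id(0.2, 0.7) == grid_id(0.9, 0.1)
assert grid_id(1.5, 0.2) != grid_id(0.5, 0.2)
```

Every feature in the following slides (Fm, Ft, Fh, Fp, Fr) is then computed per grid. The slides extract features from a grid's "affecting region", whose exact extent is not given here, so the sketch stops at the grid id.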
![Page 23: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/23.jpg)
Meteorological Features: Fm
• Rainy, Sunny, Cloudy, Foggy
• Wind speed
• Temperature
• Humidity
• Barometer pressure
[Figure: AQI of PM10, August to Dec. 2012 in Beijing. Legend: Good, Moderate, Unhealthy-S, Unhealthy, Very Unhealthy.]
![Page 24: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/24.jpg)
Traffic Features: Ft
• Distribution of speed by time: F(v)
• Expectation of speed: E(v)
• Standard deviation of speed: D(v)
[Figure: maps (scale label: km) of the traffic features 0≤v<20, 20≤v<40, v≥40, E(v), and D(v). Legend: Good, Moderate, Unhealthy-S, Unhealthy, Very Unhealthy.]
GPS trajectories generated by over 30,000 taxis from August to Dec. 2012 in Beijing
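A minimal sketch of how the three traffic features could be computed for one grid and one time slot, given the speeds observed there. The speed bins follow the figure labels (0≤v<20, 20≤v<40, v≥40); the unit (km/h), the use of the population standard deviation, and the name `traffic_features` are assumptions for illustration, not details from the slides.

```python
import statistics


def traffic_features(speeds):
    """Return F(v) as three bin proportions, plus E(v) and D(v).

    speeds: non-negative speed observations for one grid and time slot
    (assumed to be in km/h).
    """
    n = len(speeds)
    if n == 0:
        raise ValueError("need at least one speed observation")
    return {
        "F(0<=v<20)": sum(1 for v in speeds if 0 <= v < 20) / n,
        "F(20<=v<40)": sum(1 for v in speeds if 20 <= v < 40) / n,
        "F(v>=40)": sum(1 for v in speeds if v >= 40) / n,
        "E(v)": statistics.fmean(speeds),   # expectation of speed
        "D(v)": statistics.pstdev(speeds),  # standard deviation of speed
    }


features = traffic_features([10, 30, 30, 50])
```

For the four sample speeds, one falls in the slow bin, two in the middle bin, and one in the fast bin, and the mean speed is 30.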
![Page 25: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/25.jpg)
Human Mobility Features: Fh
• Human mobility implies
  – Traffic flow
  – Land use of a location
  – Function of a region (like residential or business areas)
• Features:
  – Number of arrivals $f_a$ and leavings (departures) $f_l$
[Figure: A) AQI of PM10, B) AQI of NO2, each plotted over fa and fl. Legend: Good, Moderate, Unhealthy-S, Unhealthy, Very Unhealthy. Annotation: parks vs factories.]
![Page 26: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/26.jpg)
Extracting Traffic/Human Mobility Features
• Offline spatio-temporal indexing
• ta: arrival time
• Traj: trajectory ID
• Ii: the index of the first GPS point (in the trajectory) entering a grid
• Io: the index of the last GPS point (in the trajectory) entering the grid
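One way to picture the index records listed above is the sketch below, which walks a single trajectory and emits one (ta, Traj, Ii, Io) record per visit to a grid. It rests on assumptions the slide does not spell out: GPS points are (timestamp, x, y) tuples, `cell_of` is a hypothetical function mapping a position to a grid id, and Io is taken as the last consecutive point that still lies in the grid.

```python
from collections import defaultdict


def index_trajectory(traj_id, points, cell_of):
    """Build per-grid visit records for one trajectory.

    points: list of (timestamp, x, y) in time order.
    cell_of: function (x, y) -> grid id (hypothetical helper).
    Returns {grid id: [{"ta", "traj", "Ii", "Io"}, ...]}.
    """
    index = defaultdict(list)
    cells = [cell_of(x, y) for _, x, y in points]
    start = 0  # index of the first point of the current visit
    for i in range(1, len(points) + 1):
        # Close the visit when the trajectory ends or moves to another grid.
        if i == len(points) or cells[i] != cells[start]:
            index[cells[start]].append(
                {"ta": points[start][0], "traj": traj_id, "Ii": start, "Io": i - 1}
            )
            start = i
    return index


# A taxi that crosses from grid (0, 0) into (1, 0) and back again.
points = [(0, 0.2, 0.2), (10, 0.8, 0.4), (20, 1.3, 0.5), (30, 1.7, 0.6), (40, 0.5, 0.5)]
index = index_trajectory("taxi-1", points, lambda x, y: (int(x), int(y)))
```

With records like these, the number of arrivals in a grid during a time window is a count of records whose ta falls in that window, and the points between Ii and Io give the speeds used for the traffic features.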
![Page 27: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/27.jpg)
POI Features: Fp
• Why POI
  – Indicate the land use and the function of the region
  – The traffic patterns in the region
• Features
  – Numbers of POIs over categories
  – Portion of vacant places
  – The changes in the number of POIs
• Factories, shopping malls,
• hotel and real estates
• Parks, decoration and furniture markets
![Page 28: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/28.jpg)
Road Network Features: Fr
• Why road networks
  – Have a strong correlation with traffic flows
  – A good complement to traffic modeling
• Features:
  – Total length of highways $f_h$
  – Total length of other (low-level) road segments $f_r$
  – The number of intersections $f_s$ in the grid's affecting region
[Figure: panels for fh, fr, and fs]
![Page 29: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/29.jpg)
Semi-Supervised Learning Model
• Philosophy of the model
  – States of air quality
    • Temporal dependency in a location
    • Geo-correlation between locations
  – Generation of air pollutants
    • Emission from a location
    • Propagation among locations
  – Two sets of features
    • Spatially-related
    • Temporally-related
[Figure: stations s1 to s4 and a location l drawn at times t1, t2, and ti (axes: Time, Geospace). Legend: a location with AQI labels; a location to be inferred; temporal dependency; spatial correlation.]
[Figure: Co-Training. Spatial Classifier with spatial features: POIs (Fp), road networks (Fr). Temporal Classifier with temporal features: meteorologic (Fm), traffic (Ft), human mobility (Fh).]
![Page 30: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/30.jpg)
Co-Training-Based Learning Model
• Spatial classifier
  – Model the spatial correlation between AQI of different locations
  – Using spatially-related features
  – Based on a BP neural network
• Input generation
  – Select n stations to pair with
  – Perform m rounds
[Figure: input generation and ANN. Stations l1 to lk (with Fp, Fr, and AQI label c) are paired with the location lx (Fpx, Frx) through blocks D1 and D2, giving the inputs ∆P1x, ∆R1x, d1x, c up to ∆Pkx, ∆Rkx, dkx, c. The ANN has weights w11 to wpq, w'11 to w'qr, w1 to wr and biases b1 to bq, b'1 to b'r, b''.]
![Page 31: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/31.jpg)
Co-Training-Based Learning Model
• Temporal classifier
  – Model the temporal dependency of the air quality in a location
  – Using temporally related features
  – Based on a Linear-Chain Conditional Random Field (CRF)
[Figure: linear-chain CRF with states Yt-1, Yt, Yt+1; at each time step t-1, t, t+1 the observations are Fm, Ft, and Fh for that step.]
![Page 32: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/32.jpg)
Learning Process
[Figure: training in the co-training framework. Labels: Labeled data, Unlabeled data, Temporally-related features (linear-chain CRF), Spatially-related features (input generation and ANN), Training, Inference.]
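The learning process can be sketched as a generic co-training loop: two classifiers, each seeing its own feature view, take turns labeling the unlabeled examples they are most confident about and add them to the shared labeled set. In the sketch below a tiny nearest-centroid classifier stands in for both the ANN-based spatial classifier and the CRF-based temporal classifier, so it illustrates only the loop, not the U-Air models. All names, the toy data, and the parameters `rounds` and `per_round` are assumptions.

```python
import math


def centroids(X, y):
    """Mean feature vector per class (stand-in for training a classifier)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def predict_proba(cents, x):
    """Class probabilities: softmax over negative distances to the centroids."""
    scores = {c: -math.dist(x, m) for c, m in cents.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def co_train(view_a, view_b, labels, rounds=10, per_round=2):
    """labels[i] is a class, or None for unlabeled data. Returns filled-in labels."""
    labels = list(labels)
    for _ in range(rounds):
        for view in (view_a, view_b):  # the two classifiers take turns
            known = [i for i, l in enumerate(labels) if l is not None]
            unknown = [i for i, l in enumerate(labels) if l is None]
            if not unknown:
                return labels
            cents = centroids([view[i] for i in known], [labels[i] for i in known])
            scored = []
            for i in unknown:
                proba = predict_proba(cents, view[i])
                best = max(proba, key=proba.get)
                scored.append((proba[best], i, best))
            scored.sort(reverse=True)  # most confident predictions first
            for _, i, best in scored[:per_round]:
                labels[i] = best  # becomes training data for the other view
    return labels


# Toy data: one labeled example per class, six unlabeled ones.
view_a = [[0.0], [0.5], [1.0], [0.2], [9.0], [9.5], [10.0], [9.8]]  # "spatial" view
view_b = [[1.0], [1.2], [0.8], [1.1], [5.0], [5.3], [4.9], [5.1]]   # "temporal" view
labels = ["G", None, None, None, "U", None, None, None]
result = co_train(view_a, view_b, labels)
```

On this well-separated toy data every unlabeled example ends up with the label of its cluster. The point of the design is that each classifier's confident predictions enlarge the training set of the other, which is how co-training copes with having few labeled stations.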
![Page 33: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/33.jpg)
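Inference Process
[Figure: the spatially-related features pass through input generation and the ANN, and the temporally-related features pass through the linear-chain CRF. Each classifier outputs one probability per AQI class, and the two outputs are multiplied.]
The two classifiers output the probability vectors

$$\langle p_{c_1}, p_{c_2}, \ldots, p_{c_n} \rangle \quad \text{and} \quad \langle p'_{c_1}, p'_{c_2}, \ldots, p'_{c_n} \rangle,$$

one from the spatial classifier (SC) and one from the temporal classifier (TC). The inferred class is

$$c = \arg\max_{c_i \in \mathcal{C}} \left( p_{c_i} \times p'_{c_i} \right) = \arg\max_{c_i \in \mathcal{C}} \left( P^{SC}_{c_i} \times P^{TC}_{c_i} \right)$$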
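The combination rule on this slide, picking the class whose two probabilities have the largest product, fits in a few lines of Python. The function name `fuse`, the class labels G, M, S, U (borrowed from the confusion-matrix slide), and the example probabilities are made up for illustration.

```python
def fuse(p_sc, p_tc):
    """Return the class c_i that maximizes P_SC(c_i) * P_TC(c_i).

    p_sc, p_tc: dicts mapping each AQI class to the probability given by
    the spatial classifier (SC) and the temporal classifier (TC).
    """
    classes = sorted(p_sc.keys() & p_tc.keys())  # sorted for a stable tie-break
    return max(classes, key=lambda c: p_sc[c] * p_tc[c])


# Hypothetical outputs for one grid at one time step.
p_sc = {"G": 0.50, "M": 0.30, "S": 0.15, "U": 0.05}
p_tc = {"G": 0.20, "M": 0.60, "S": 0.10, "U": 0.10}
label = fuse(p_sc, p_tc)
```

Here the spatial classifier alone would answer G, but the product for M (0.30 × 0.60) exceeds the product for G (0.50 × 0.20), so the fused answer is M.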
![Page 34: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/34.jpg)
Evaluation
• Datasets

| Data sources | | Beijing | Shanghai | Shenzhen | Wuhan |
|---|---|---|---|---|---|
| POI | 2012 Q1 | 271,634 | 321,529 | 107,061 | 102,467 |
| POI | 2012 Q3 | 272,109 | 317,829 | 107,171 | 104,634 |
| Road | #. Segments | 162,246 | 171,191 | 45,231 | 38,477 |
| Road | Highways | 1,497 km | 1,963 km | 256 km | 1,193 km |
| Road | Roads | 18,525 km | 25,530 km | 6,100 km | 9,691 km |
| Road | #. Intersec. | 49,981 | 70,293 | 32,112 | 25,359 |
| AQI | #. Stations | 22 | 10 | 9 | 10 |
| AQI | Hours | 23,300 | 8,588 | 6,489 | 6,741 |
| AQI | Time spans | 8/24/2012-3/8/2013 | 1/19/2013-3/8/2013 | 2/4/2013-3/8/2013 | 2/4/2013-3/8/2013 |
| Urban size (grids) | | 50×50 km (2500) | 50×50 km (2500) | 57×45 km (2565) | 45×25 km (1165) |

[Figure: maps of the monitoring stations. A) Beijing, B) Shanghai, C) Shenzhen, D) Wuhan]
![Page 35: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/35.jpg)
Evaluation
• Ground Truth
  – Remove a station
  – Cross cities
• Baselines
  – Linear and Gaussian Interpolations
  – Classical Dispersion Model
  – Decision Tree (DT)
  – CRF-ALL
  – ANN-ALL
![Page 36: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/36.jpg)
Evaluation
• Does every kind of feature count?

| Features | PM10 Precision | PM10 Recall | NO2 Precision | NO2 Recall |
|---|---|---|---|---|
| Fm | 0.572 | 0.514 | 0.477 | 0.454 |
| Ft | 0.341 | 0.36 | 0.371 | 0.35 |
| Fh | 0.327 | 0.364 | 0.411 | 0.483 |
| Fp+Fr | 0.441 | 0.443 | 0.307 | 0.354 |
| Fm+Ft | 0.664 | 0.675 | 0.634 | 0.635 |
| Fm+Ft+Fp+Fr | 0.731 | 0.734 | 0.701 | 0.691 |
| Fm+Ft+Fp+Fr+Fh | 0.773 | 0.754 | 0.723 | 0.704 |
![Page 37: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/37.jpg)
Evaluation
• Overall performance of the co-training
[Figure: precision (0.65 to 0.80) against the number of iterations (0 to 160) for SC, TC, and Co-Training.]
[Figure: accuracy (0.0 to 0.8) on PM10 and NO2 for U-Air, Linear, Gaussian, Classical, DT, CRF-ALL, and ANN-ALL.]
![Page 38: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/38.jpg)
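Evaluation
• Confusion matrix of Co-Training on PM10 (rows: ground truth; columns: predictions)

| Ground truth | G | M | S | U | Recall |
|---|---|---|---|---|---|
| G | 3789 | 402 | 102 | 0 | 0.883 |
| M | 602 | 3614 | 204 | 0 | 0.818 |
| S | 41 | 200 | 532 | 50 | 0.646 |
| U | 0 | 22 | 70 | 219 | 0.704 |
| Precision | 0.855 | 0.853 | 0.586 | 0.814 | 0.828 |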
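The recall and precision figures can be recomputed from the counts in the confusion matrix. The sketch below reads rows as ground truth and columns as predictions, an interpretation that the printed values bear out; the final 0.828 then matches the overall accuracy (the diagonal divided by the total). The variable names are illustrative.

```python
classes = ["G", "M", "S", "U"]
# Rows: ground truth; columns: predictions (counts from the slide).
matrix = [
    [3789, 402, 102, 0],
    [602, 3614, 204, 0],
    [41, 200, 532, 50],
    [0, 22, 70, 219],
]

n = len(classes)
# Recall: correct predictions divided by the row total (all true members of the class).
recall = [round(matrix[i][i] / sum(matrix[i]), 3) for i in range(n)]
# Precision: correct predictions divided by the column total (everything predicted as the class).
precision = [round(matrix[i][i] / sum(row[i] for row in matrix), 3) for i in range(n)]
# Accuracy: sum of the diagonal divided by the number of samples.
accuracy = round(sum(matrix[i][i] for i in range(n)) / sum(map(sum, matrix)), 3)

assert recall == [0.883, 0.818, 0.646, 0.704]
assert precision == [0.855, 0.853, 0.586, 0.814]
assert accuracy == 0.828
```

The weakest cell is the precision for S (0.586): 204 samples whose ground truth is M and 102 whose ground truth is G are predicted as S.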
![Page 39: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/39.jpg)
Evaluation
• Performance of Spatial classifier

| Cities | PM2.5 Prec. | PM2.5 Rec. | PM10 Prec. | PM10 Rec. | NO2 Prec. | NO2 Rec. |
|---|---|---|---|---|---|---|
| Beijing | 0.764 | 0.763 | 0.762 | 0.745 | 0.730 | 0.749 |
| Shanghai | 0.705 | 0.725 | 0.702 | 0.718 | 0.715 | 0.706 |
| Shenzhen | 0.740 | 0.737 | 0.710 | 0.742 | 0.732 | 0.722 |
| Wuhan | 0.727 | 0.723 | 0.731 | 0.739 | 0.744 | 0.719 |
![Page 40: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/40.jpg)
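Evaluation
• Efficiency study

| Procedures | Step | Time (ms) |
|---|---|---|
| Feature extraction (per grid) | Ft & Fh | 53.2 |
| Feature extraction (per grid) | Fp | 28.8 |
| Feature extraction (per grid) | Fr | 14.4 |
| Inference (per grid) | SC | 21.5 |
| Inference (per grid) | TC | 13.1 |
| Total | | 131 |

• Single grid: 131 ms
• Inferring the AQIs for entire Beijing in about 5 minutes (2,500 grids × 131 ms ≈ 328 s)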
![Page 41: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/41.jpg)
Conclusion
• Infer fine-grained air quality with
  – Real-time and historical air quality readings from existing stations
  – Other data sources: meteorology, POIs, road network, human mobility, and traffic condition
• Co-Training-based semi-supervised learning approach
  – Deal with data sparsity by learning from unlabeled data
  – Model the spatial correlation among the air quality of different locations
  – Model the temporal dependency of the air quality in a location
• Results
  – 0.82 with traffic data (co-training)
  – 0.76 if only using spatial classifier
![Page 42: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:](https://reader034.vdocuments.us/reader034/viewer/2022051920/600d1c819e4d2c32e06299d9/html5/thumbnails/42.jpg)
Questions?