The Performance Impacts of Machine Learning Design Choices for Gridded Solar Irradiance Forecasting
Features work from “Evaluating Statistical Learning Configurations for Gridded Solar Irradiance Forecasting”, Solar Energy, Under Review.
David John Gagne II, NCAR
Sue Ellen Haupt, NCAR
Amy McGovern, University of Oklahoma
John Williams, The Weather Company, an IBM Business
Seth Linden, NCAR
Doug Nychka, NCAR
Motivation: Solar Irradiance
Solar irradiance predictions are needed for sites without historical data (Source: http://www.adventurecats.org/cat-tales/maine-coon-deaf-sailors-ears-sea/)
• Solar electricity generation continues to grow rapidly and decrease in cost
• Accurate solar irradiance predictions needed by electric utilities to balance supply with expected demand
• Solar power is being generated more at sites that do not have observations or historical records of irradiance
• Contributions
  • Developed a Gridded Atmospheric Forecasting System (GRAFS) for solar irradiance
  • Evaluated different machine learning model configurations for predictive accuracy at unobserved sites for day-ahead solar irradiance forecasts
Solar Forecasting Ingredients
• Position of sun in sky
• Scattering by atmosphere & aerosols
• Cloud cover effects
• Precipitation
• Non-meteorological obstructions
[Diagram: solar factors, from Gagne (2014). Labels: sun position; panel orientation; cloud properties (height, coverage, transparency); aerosols and water vapor (turbidity); panel obstructions; panel temperature; shading; precipitation]
Solar Data
• NOAA Global Forecast System (GFS)
  • Interpolated to 4 km grid
  • 3-hourly output interpolated in time to hourly output
  • Variables: solar irradiance, temperature, cloud cover, sun angles, spatial statistics
• Evaluation period: June-August 2015
• Oklahoma Mesonet (McPherson et al. 2007)
  • Sites record solar irradiance every 5 minutes with a Li-Cor pyranometer
  • Hourly-averaged irradiance and clearness index computed from raw observations
  • Clearness index: ratio of observed irradiance to top-of-atmosphere irradiance
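The hourly averaging and clearness index described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the hourly top-of-atmosphere irradiance has already been computed from solar position, that both series use the same units (for example W/m²), and that night hours (zero top-of-atmosphere irradiance) are simply marked as missing. The function name and arguments are hypothetical.

```python
def hourly_clearness_index(irradiance_5min, toa_hourly, obs_per_hour=12):
    """Average 5-minute irradiance to hourly means, then divide by
    top-of-atmosphere (TOA) irradiance to get the clearness index."""
    hourly_means = []
    clearness = []
    for hour, toa in enumerate(toa_hourly):
        # Twelve 5-minute observations make up one hour.
        chunk = irradiance_5min[hour * obs_per_hour:(hour + 1) * obs_per_hour]
        mean_irradiance = sum(chunk) / len(chunk)
        hourly_means.append(mean_irradiance)
        # Assumption: the ratio is undefined at night (TOA = 0), so store None.
        clearness.append(mean_irradiance / toa if toa > 0 else None)
    return hourly_means, clearness


# Usage: one daylight hour at a constant 300 W/m^2, then one night hour.
observations = [300.0] * 12 + [0.0] * 12
means, kt = hourly_clearness_index(observations, toa_hourly=[600.0, 0.0])
```

Working with a ratio rather than raw irradiance removes most of the predictable dependence on sun position, which is presumably why the models are trained to predict clearness index.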
Machine Learning Configurations: Solar
• Mesonet stations randomly split into “training” and “testing” sites
• Evaluation period split into training and testing days: every 3rd day used for testing
• Models: Random Forest, Gradient Boosting, Lasso Linear Regression
• Multi Site Training
  • One machine learning model fitted with all training sites’ data
  • Applied at testing sites using input data collocated with site
• Single Site Training
  • Separate machine learning models fitted at each training site
  • Predictions made at training sites and interpolated to testing sites with Cressman interpolation (Cressman 1959)
  • Similar to approach used by Gridded MOS (Glahn et al. 2009)
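For the single-site approach, the interpolation step can be illustrated with the classic Cressman weight function, w = (R² − d²) / (R² + d²) for d < R and 0 otherwise. The sketch below is a single-pass version under stated assumptions: site coordinates are in a flat projected plane, the radius of influence R is chosen by the user, and the slides do not say which radius or how many passes were used. Function names are hypothetical.

```python
import math


def cressman_weight(distance, radius):
    """Cressman weight: 1 at the target point, falling to 0 at the radius."""
    if distance >= radius:
        return 0.0
    r2, d2 = radius ** 2, distance ** 2
    return (r2 - d2) / (r2 + d2)


def cressman_interpolate(train_sites, target_xy, radius):
    """Weighted mean of training-site values around one target location.

    train_sites: iterable of (x, y, value) tuples.
    Returns None when no training site lies within the radius.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in train_sites:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        weight = cressman_weight(distance, radius)
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None


# Usage: two training sites with predicted clearness index 0.2 and 0.8.
sites = [(0.0, 0.0, 0.2), (2.0, 0.0, 0.8)]
midpoint = cressman_interpolate(sites, (1.0, 0.0), radius=2.0)
at_site = cressman_interpolate(sites, (0.0, 0.0), radius=2.0)
far_away = cressman_interpolate(sites, (10.0, 10.0), radius=2.0)
```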
[Flowchart: data processing and training pipeline. Inputs: NWP model output; Oklahoma Mesonet 5-minute solar irradiance. Steps: extract input variables at each site; calculate neighborhood statistics for each variable; calculate solar position and clear sky irradiance at each site and time; calculate clearness index and hourly means; match model and observation data; split sites into training and testing sets; train machine learning models to predict clearness index (training site data); apply machine learning models at testing sites (testing site data)]
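The site and day split in this pipeline might look like the sketch below. The fraction of stations held out for testing and the position of the test day within each three-day cycle are not given in the slides; `test_fraction`, `seed`, and the choice of the third day in each cycle are assumptions made for illustration.

```python
import random


def split_sites_and_days(site_ids, days, test_fraction=0.25, seed=0):
    """Randomly split sites, and hold out every 3rd day for testing."""
    shuffled = list(site_ids)
    random.Random(seed).shuffle(shuffled)  # reproducible random site split
    n_test = max(1, round(test_fraction * len(shuffled)))
    test_sites = set(shuffled[:n_test])
    train_sites = set(shuffled[n_test:])

    days = list(days)
    # Assumption: days at positions 2, 5, 8, ... (every 3rd day) are test days.
    test_days = days[2::3]
    train_days = [day for i, day in enumerate(days) if i % 3 != 2]
    return train_sites, test_sites, train_days, test_days


# Usage: 20 hypothetical stations and the 92 days of June-August.
stations = [f"site{i:02d}" for i in range(20)]
train_sites, test_sites, train_days, test_days = split_sites_and_days(
    stations, range(92)
)
```

Splitting by site as well as by day means the test scores measure skill at locations the models never saw, which matches the stated goal of forecasting at unobserved sites.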
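Gradient Boosting Regression
• Stagewise, additive decision tree ensemble
• Initial tree predicts exact value, subsequent trees predict residuals of total predictions from all previous trees
• Used by top 4 finishers of AMS Solar Energy Prediction Contest
[Diagram: three example trees summed into one prediction. Tree 1 splits on Irradiance > 500? (leaf values 0.1, 0.8); tree 2 splits on Temperature > 30? (leaf values -0.1, 0.3) and is scaled by 0.1; tree 3 splits on Dewpoint > 2? (leaf values 0.05, -0.03) and is scaled by 0.1]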
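To make the stagewise idea concrete, here is a small pure-Python sketch of boosting with one-split trees (stumps). It is a simplification under stated assumptions, not the configuration used in the study: it starts from the mean of the targets rather than an initial tree, fits residuals under squared error (the least-squares variant), and uses depth-1 trees. All function names are hypothetical.

```python
def fit_stump(rows, residuals):
    """Best single split under squared error: (feature, threshold, left, right).

    Assumes at least one feature takes more than one distinct value.
    """
    best = None
    for j in range(len(rows[0])):
        for t in sorted(set(r[j] for r in rows))[:-1]:
            left = [res for r, res in zip(rows, residuals) if r[j] <= t]
            right = [res for r, res in zip(rows, residuals) if r[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - lv) ** 2 for v in left)
                   + sum((v - rv) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]


def predict_stump(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_boosting(rows, targets, n_trees=30, learning_rate=0.1):
    """Each new stump is fitted to the residuals of the running prediction."""
    initial = sum(targets) / len(targets)
    predictions = [initial] * len(targets)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(targets, predictions)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        # Add only a fraction (the learning rate) of each tree's correction.
        predictions = [p + learning_rate * predict_stump(stump, r)
                       for p, r in zip(predictions, rows)]
    return {"initial": initial, "stumps": stumps, "learning_rate": learning_rate}


def predict_boosting(model, row):
    total = model["initial"]
    for stump in model["stumps"]:
        total += model["learning_rate"] * predict_stump(stump, row)
    return total


# Usage on a toy step function of the first feature.
rows = [[i / 40, (i * 7 % 40) / 40] for i in range(40)]
targets = [0.8 if r[0] > 0.5 else 0.1 for r in rows]
model = fit_boosting(rows, targets)
mean_target = sum(targets) / len(targets)
mse_baseline = sum((y - mean_target) ** 2 for y in targets) / len(targets)
mse_boosted = sum((y - predict_boosting(model, r)) ** 2
                  for y, r in zip(targets, rows)) / len(targets)
```

A smaller learning rate shrinks each correction further, so more trees are needed to reach the same fit.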
Detailed Configuration
• Random Forest
  • Default: 500 trees, min samples split 10, features = sqrt
  • Short Trees: max depth 3
  • All Features: features = all
• Gradient Boosting
  • Default: loss = “lad”, 500 trees, max depth 5, features = sqrt, learning rate = 0.1
  • Least Squares: loss = “ls”
  • Big Trees: min samples split = 10
  • All Features: features = “all”
  • Slow Learning Rate: learning rate = 0.01
• Lasso Linear Regression
  • Top 16 variables by F-Score, Alpha = 0.5
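One way to read these configurations is as small overrides of a default parameter set. The sketch below assumes the models are scikit-learn estimators (the slides do not name the library). Keyword names follow scikit-learn's RandomForestRegressor, GradientBoostingRegressor and Lasso, and the slide's loss names “lad” and “ls” are written as "absolute_error" and "squared_error", their names in recent scikit-learn releases. `variant` and `CONFIGS` are hypothetical names, and the dictionaries only hold parameters; no estimator is built here.

```python
def variant(base, **overrides):
    """Copy a default parameter set and change only the named parameters."""
    config = dict(base)
    config.update(overrides)
    return config


RF_DEFAULT = dict(n_estimators=500, min_samples_split=10, max_features="sqrt")
GB_DEFAULT = dict(loss="absolute_error", n_estimators=500, max_depth=5,
                  max_features="sqrt", learning_rate=0.1)

CONFIGS = {
    "Random Forest": RF_DEFAULT,
    "Random Forest Short Trees": variant(RF_DEFAULT, max_depth=3),
    # Assumption: max_features=None means "consider all features" at each split.
    "Random Forest All Features": variant(RF_DEFAULT, max_features=None),
    "Gradient Boosting": GB_DEFAULT,
    "Gradient Boosting Least Squares": variant(GB_DEFAULT, loss="squared_error"),
    "Gradient Boosting Big Trees": variant(GB_DEFAULT, min_samples_split=10),
    "Gradient Boosting All Features": variant(GB_DEFAULT, max_features=None),
    "Gradient Boosting Slow Learning Rate": variant(GB_DEFAULT, learning_rate=0.01),
    # The top-16 F-score selection would be a separate step before the Lasso,
    # for example SelectKBest(f_regression, k=16) if scikit-learn is used.
    "Lasso Linear Regression": dict(alpha=0.5),
}
```

Under the scikit-learn assumption, each dictionary could be passed as keyword arguments, for example `GradientBoostingRegressor(**CONFIGS["Gradient Boosting"])`.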
Solar: GFS Clearness Index Error
• Gradient Boosting: Optimizes with MAE, tree depth of 5, samples subset of features
• Gradient Boosting Least Squares: Uses MSE instead of MAE
• Gradient Boosting All Features: Evaluates all input features
• Gradient Boosting Slow Learning Rate: Uses a learning rate of 0.01 instead of 0.1
• Gradient Boosting Big Trees: Allows trees to grow to minimize training samples in each branch
• Random Forest: fully grown trees, evaluates subset of features
• Random Forest All Features: evaluates all features
• Random Forest Short Trees: tree depth of 3
• Linear Regression: Lasso with top 16 variables
• Raw GFS: Downward shortwave irradiance
• Persistence: Interpolated irradiance at test sites based on observations from 24 hours before
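The persistence baseline and the MAE score can be sketched as below. This covers only the 24-hour time shift and the error metric; the spatial interpolation of observations to the test sites is left out, and the function names are hypothetical.

```python
def persistence_forecast(hourly_values, lag_hours=24):
    """Forecast each hour with the value observed lag_hours earlier.

    Hours without an earlier observation are returned as None.
    """
    return [hourly_values[i - lag_hours] if i >= lag_hours else None
            for i in range(len(hourly_values))]


def mean_absolute_error(forecasts, observations):
    """MAE over the hours where both forecast and observation exist."""
    pairs = [(f, o) for f, o in zip(forecasts, observations)
             if f is not None and o is not None]
    return sum(abs(f - o) for f, o in pairs) / len(pairs)


# Usage: a steadily rising 3-day series, so persistence is always 24 too low.
observed = [float(hour) for hour in range(72)]
forecast = persistence_forecast(observed)
mae = mean_absolute_error(forecast, observed)
```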
GFS Solar Distributions
GFS Forecast Distributions
GFS Solar Station Errors
Next Steps: Deep Learning
• Investigating the use of deep learning models for weather feature and regime identification
• Goal: Train models to recognize multiscale features in NWP output
• Potential application for improved solar irradiance mean and variability forecasts based on weather regime
• Many other weather and climate applications
Generative Adversarial Networks
Deep Convolutional Generative Adversarial Network architecture from Radford et al. (2016)
Unsupervised method of learning complex feature representations from data
Requires 2 deep neural networks
• Discriminator: determines which samples are from the training set and which are not
• Generator: Creates synthetic examples similar to training data to fool discriminator
Both networks have a “battle of wits” either to the death or until the discriminator is fooled often enough
Advantages
• Unsupervised pre-training: learn features without needing a large labeled dataset
• Dimensionality reduction: reduce image to smaller vector
• Learns sharper, more detailed features than autoencoder models
• Do not need to specify a complex loss function
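The contest between the two networks can be expressed through their training losses. The sketch below uses the standard binary cross-entropy GAN losses with the common non-saturating generator loss; the slides do not state which loss variant was used, so treat this as an illustration. Inputs are discriminator outputs, assumed to be probabilities strictly between 0 and 1, and the function names are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Low when the discriminator scores generated samples near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Usage: a discriminator that cannot tell real from generated (all 0.5),
# compared with one that separates them well.
fooled = discriminator_loss([0.5] * 4, [0.5] * 4)
confident = discriminator_loss([0.9] * 4, [0.1] * 4)
gen_when_fooled = generator_loss([0.5] * 4)
gen_when_caught = generator_loss([0.1] * 4)
```

When the discriminator outputs 0.5 for everything, it has been fooled as often as chance allows, which is the stopping condition the slide describes informally.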
Preliminary Results: Mean Sea Level Pressure
• Trained on 4096 GEFS pressure forecasts
• Produces “realistic” pressure fields after 100 epochs of training
• Generator uses 100-value vector as input
• Each input adjusts different parts of the field
Summary
• Developed gridded statistical forecasting system for solar irradiance
• Evaluated different machine learning models and configurations on their ability to predict irradiance at multiple sites
• Gradient Boosting consistently showed lowest errors
• All machine learning models underestimated cloud cover frequency
• ML models had lower errors at sites with fewer clouds
• Generative Adversarial Networks show potential for extracting information from weather data
Acknowledgements
• Rich Loft
• Tom Hamill
• The Oklahoma Mesonet
Contact Me
• Email: [email protected]
• Twitter: @DJGagneDos
• Github: github.com/djgagne