

Big Data and Deep Learning Platform for Terabyte-Scale Renewable Datasets

Yang Weng
School of Elec., Comp. and Energy Eng., Arizona State University, Tempe, United States

Abhishek Kumar
School of Comp., Information and Decision Sys. Eng., Arizona State University, Tempe, United States

Muhammad B. Saleem
School of Elec., Comp. and Energy Eng., Arizona State University, Tempe, United States

Baosen Zhang
Dept. of Elec. Engineering, University of Washington, Seattle, United States

Abstract: Many renewable resources cover diverse geographical areas, and it is increasingly important to analyze large sets of data to understand their spatial and temporal behaviors. In this paper, we propose and demonstrate a data platform to efficiently manipulate and visualize data on the scale of terabytes. As the application of interest, we focus on visualization and forecasting of wind power over large geographic areas at various spatial and temporal resolutions. In particular, we show how to balance the amount of data used and the need for computational efficiency in real-time applications. The main dataset we use is the recently released terabyte wind dataset by NREL.

Index Terms: Big data analytics, cloud platform, feature extraction, forecasting, and renewable generation

I. INTRODUCTION

Recent and rapid deployment of renewables is dramatically changing electrical system planning and operations [1]. One of the most prominent transformations is the availability of data [2]. Because of the advancement of new sensors, communication infrastructure, and the penetration of renewables, the amount of data in power systems has grown exponentially in recent years and is expected to continue growing [3], [4], [5]. With vast amounts of data being generated, researchers and engineers need to address how to efficiently and quickly extract useful information from large datasets to improve the reliability, efficiency, and flexibility of the grid [6], [7], [8].

In this paper, we focus on wind data and propose a big data analytics platform to overcome two challenges that arise when datasets reach the terabyte scale: visualization and forecasting [9]. Visualization is important because it is often useful to visually inspect both static information and dynamic changes in wind power generation over a large geographic area. By providing real-time visualization along with real-time data analytics, power system engineers and operators can quickly and intuitively assess the state of the system. In a large system such as the United States, millions of data points need to be visualized in real time, resulting in nontrivial computational and data processing challenges [10], [11]. In this paper, we show how to use state-of-the-art hardware and software to build a platform that enables real-time visualization of wind data across the entire United States over various time periods.

In addition to visualization, forecasting is of vital importance to the operation of power systems, especially under high penetration of renewables [12], [13], [14]. In the area of wind power forecasting, many algorithms have been introduced. For example, [12] uses support vector regression on historical traces. [15] uses finite-state Markov chains to account for the diurnal non-stationarity and the seasonality of wind. [16] constructs uncertainty sets for wind power generation. For a more comprehensive review, interested readers can refer to [13], [17] and the references within. In addition to using historical data of a single wind site, larger datasets allow us to use historical information over multiple locations to improve forecasting performance by better capturing the spatiotemporal correlations. However, finding a tractable stochastic model becomes a primary bottleneck. For example, a natural approach is to group data based on location, but locations that are nearby in latitude and longitude need not exhibit similar behaviors because of complex terrain.

In this paper, we work with 4.5 terabytes of wind energy time-series data from the National Renewable Energy Laboratory (NREL) [18]. At this scale, explicit modeling techniques such as the finite-state Markov chain method in [15] are no longer able to capture the dynamics in the data (without exponentially growing the number of states). To capture complex long-term dynamics, we make use of recurrent neural networks (RNNs) [19]. These are essentially generalizations of deep neural networks that keep a “state” in memory. In particular, we adopt a modified version of the RNN called the long short term memory network (LSTM) [20], [21]. These networks have been used extensively in natural language processing and machine vision applications and achieve some of the best performance [22].

Due to the computational complexity of the deep learning involved in LSTM, this forecasting method requires special techniques to process a large amount of data [23]. Therefore, we deploy a graphics processing unit (GPU)-based computer, conduct feature selection, and group data based on 3-dimensional nearest neighbors tailored to the wind dataset. Numerical comparison over the dataset shows good performance in terms of mean absolute percentage error and mean square error.

The rest of the paper is organized as follows: Section II describes the dataset, data analytics, and the visualization platform; Section III defines the forecasting task, data preprocessing, and the proposed deep learning method; Section IV illustrates the numerical comparison results; and Section V concludes the paper.


II. DATA PREPROCESSING AND ANALYTICS

A. Data Description

We use the recently released wind dataset by NREL, which includes wind speed and wind power data and covers most of the continental United States (with some coastal areas) [18]. The total size of the data is approximately 4.5 terabytes (TB). Fig. 1 shows the visualization of the data at two spatial scales. Fig. 1a displays the data across the United States, where each green dot is a location with wind speed and power data. Our platform allows robust zoom operations (less than 1 second per zoom), and Fig. 1b shows the clustered data in a local region after zooming in several times to the west of New York City.

(a) Visualization of wind power generation data across the United States (U.S.). Each green dot is a location with wind speed and power information.
(b) Zoomed-in view of Western New York.
Figure 1. Illustration of the data visualization platform.

On this map, each dot represents one location. In total, the dataset consists of 126,693 points (wind sites), which comprise existing and probable locations for wind energy development, each with 7 years of data. More specifically, data at 5-minute resolution is available from Jan 1, 2007, to Jan 1, 2014 at each location. The data is stored in the form of NetCDF4 files (.nc files). Each file corresponds to a single site. The contents of each NetCDF4 file include the site's attributes, dimensions, and variables. The attributes of a file include information such as the latitude and longitude of the corresponding point, the starting time of the time series, and the time duration between successive points in the time series, which is 5 minutes. The dimensions contain the size of the time series, which is 736,416 observations in 7 years. The variables include time series of wind speed, wind direction, power, air density, air temperature, and air pressure. Some of the variables are described in Table I.
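The stated series length can be checked directly from the date range and the 5-minute sampling interval. The following is a minimal sketch; the helper name `count_observations` is ours and is not part of the dataset tooling.

```python
from datetime import datetime, timedelta


def count_observations(start: datetime, end: datetime, step: timedelta) -> int:
    """Number of samples in the half-open interval [start, end) at a fixed step."""
    return (end - start) // step


# Jan 1, 2007 to Jan 1, 2014 spans 2,557 days (2008 and 2012 are leap years),
# and each day holds 288 five-minute samples.
n_obs = count_observations(
    datetime(2007, 1, 1), datetime(2014, 1, 1), timedelta(minutes=5)
)
print(n_obs)  # 736416
```

The result matches the 736,416 observations per site reported for the dataset.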

To make the data more amenable to analysis and power system operations (e.g., the forecasting application described in Section III), features often need to be extracted from the raw data. Our platform provides a convenient framework for feature selection.

In Fig. 2, we plot the average wind power vs. time, the type of wind farms, and the relationship between wind speed and wind power. Focusing on the bottom right figure, we see that the wind speed-power relationship can be clustered into a few groups (indicated by different colors). Within each group, there is a strong linear relationship between wind speed and power. This observation could allow planners to build quick intuition from wind speed data without working through potentially complicated speed-to-power conversion functions. Our platform enables one to visualize many additional relationships as defined by the user.

B. Data Processing Platform

To provide the real-time visualization above and to run the forecasting algorithms that follow, some special hardware and software is currently required, e.g., to host virtual environments (VE). For hardware, we deploy a GPU-based server, the Precision Tower 7000 Series (7910). Its CPU is the Intel Xeon Processor E5-2687W-v4 (12c, 3.0GHz, 3.5GHz Turbo, 2400MHz, 30MB, 160W) with 12 cores. The server's GPU is an NVIDIA Quadro M4000 8GB (4 DP). Instead of processing sequentially, a graphics processing unit (GPU) is structured as a massively parallel architecture. It consists of thousands of smaller cores that operate at a lower frequency but are more efficient. These cores are designed to handle many tasks simultaneously. Therefore, a GPU can be used as a co-processor to which the CPU offloads computationally intensive and time-consuming sub-tasks, significantly accelerating the overall computation. Such cooperation has been used to accelerate engineering applications such as deep learning [24].

On the software side, we use a GPU-based platform for visualization. This is because, for terabyte-level computing and visualization, it is hard to process the data fast enough with CPUs. CPUs are processing cores that are optimized for serial tasks. When a large amount of data has similar processing requirements, executing commands in a serial fashion tremendously increases the processing time.

For the visualization platform, an almost continuous animation is achieved by updating the display using bitmap copies. The effect of the animation is to make the system appear to “come to life”. The trouble facing the current energy industry is that many data exploration and analytical platforms were designed to accommodate millions of rows in a reasonable timeframe. With datasets at the four-terabyte level, however, there are billions of rows of data. Processing them with sub-second latency is not possible using current software. To overcome this challenge, we use MapD with an SQL database that leverages the parallel processing power of GPUs [25]. It can query billions of rows in milliseconds, hundreds to thousands of times faster than existing CPU databases. Due to its utilization of the GPU, MapD can display billions of data points within seconds, allowing engineers and analysts to visually gain insight from massive datasets.

TABLE I. DATA AVAILABILITY FOR EACH LOCATION

| Type | Total number | Range (approximate) | Starting time | Ending time | Units |
|---|---|---|---|---|---|
| wind speed | 736,416 observations (7 years) | 0 to 50 km/h | 1st Jan 2007 00:00 | 1st Jan 2014 00:00 | m/s |
| wind direction | 736,416 observations (7 years) | 0 to 360 degrees | 1st Jan 2007 00:00 | 1st Jan 2014 00:00 | degrees (bearing) |
| power | 736,416 observations (7 years) | 0 to 16 MW | 1st Jan 2007 00:00 | 1st Jan 2014 00:00 | MW |
| air density | 736,416 observations (7 years) | 1.1 to 1.3 kg/m3 | 1st Jan 2007 00:00 | 1st Jan 2014 00:00 | kg/m3 |
| air temperature | 736,416 observations (7 years) | 185 to 330 K | 1st Jan 2007 00:00 | 1st Jan 2014 00:00 | Kelvin |
| air pressure | 736,416 observations (7 years) | 99,500 to 104,000 Pascal (Pa) | 1st Jan 2007 00:00 | 1st Jan 2014 00:00 | Pa |

Figure 2. Examples of different data analytic features. The top figure plots the average power vs. time, the lower left figure shows different types of wind sites in the United States, and the bottom right figure plots the wind speed-power relationship.
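The kind of query a zoom operation issues can be illustrated with a bounding-box aggregation in SQL. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a CPU stand-in for the GPU database, and the table name `wind`, its columns, and the toy rows are hypothetical rather than the platform's actual schema or data.

```python
import sqlite3

# In-memory CPU database standing in for the GPU-backed SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wind (site_id INTEGER, lat REAL, lon REAL, power_mw REAL)"
)
toy_rows = [  # made-up readings, for illustration only
    (1, 44.63, -96.41, 7.0),
    (1, 44.63, -96.41, 8.0),
    (2, 44.65, -96.22, 6.0),
    (3, 40.70, -74.00, 3.0),
]
conn.executemany("INSERT INTO wind VALUES (?, ?, ?, ?)", toy_rows)


def sites_in_view(conn, lat_min, lat_max, lon_min, lon_max):
    """Average power per site inside the current map viewport."""
    query = """
        SELECT site_id, AVG(power_mw)
        FROM wind
        WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?
        GROUP BY site_id
        ORDER BY site_id
    """
    return conn.execute(query, (lat_min, lat_max, lon_min, lon_max)).fetchall()


result = sites_in_view(conn, 44.0, 45.0, -97.0, -96.0)
print(result)  # [(1, 7.5), (2, 6.0)]
```

Each zoom narrows the bounding box, so only the rows inside the viewport are scanned and aggregated; a GPU database evaluates the same filter and aggregation in parallel across rows.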

For computing, we also leverage GPU-based operations. For example, OpenCL is a widely adopted language for GPU computing, and Nvidia CUDA is the most widely used GPU computing platform [26]. We structured our code so that a significant share of the terabyte-level wind generation data processing is assigned to the GPU. The CPU handles the remainder of the code, where fast serial execution is needed.

III. WIND POWER FORECASTING AND DATA ANALYTICS

We define the wind generation forecasting task as follows: given the historical generation $x_i$, temperature $t_i$, wind speed $w_i$, latitude $la_i$, and longitude $lo_i$, etc., where $i$ is the time index, find the forecasted power $y_{5\text{-min}}$, $y_{1\text{-hour}}$, and $y_{1\text{-day}}$ for the 5-minute, 1-hour, and 1-day forecasting horizons.
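One way to turn this definition into supervised training pairs is a sliding window over the 5-minute series. The sketch below is an assumption about the preprocessing, not the exact procedure used in our experiments; the names `HORIZON_STEPS`, `make_samples`, and `lookback` are hypothetical.

```python
# Forecast horizons expressed in 5-minute steps.
HORIZON_STEPS = {"5-min": 1, "1-hour": 12, "1-day": 288}


def make_samples(series, lookback, horizon_steps):
    """Pair each window of `lookback` past values with the value
    observed `horizon_steps` steps after the end of the window."""
    samples = []
    last_start = len(series) - lookback - horizon_steps
    for start in range(last_start + 1):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon_steps - 1]
        samples.append((window, target))
    return samples


# Toy series 0, 1, ..., 9: three past values predict the next one.
toy = make_samples(list(range(10)), lookback=3, horizon_steps=HORIZON_STEPS["5-min"])
print(toy[0])   # ([0, 1, 2], 3)
print(len(toy)) # 7
```

The same function yields the 1-hour and 1-day tasks by passing 12 or 288 steps; additional features such as temperature or wind speed would be windowed in the same way.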

A. A Three Dimensional Nearest Neighbors Approach

One could use the historical data from all locations to forecast the wind power generation at a single site. However, such a prediction algorithm is both computationally time-consuming and inefficient. Geographically distant locations usually have much less impact on the targeted area, so it is useful to consider ways of grouping the information so that a smaller set of data can be analyzed. For example, the North American Electric Reliability Council (NERC) uses the flow-gate idea to consider a localized set. This method relies on the intuition that, with the same topology, locations with close latitude and longitude usually produce similar wind speeds. Therefore, a smaller 2D distance implies that the associated wind generation data, as well as other features, are similar to each other with high probability. Thus, a group of nearest neighbors is obtained instead of a single historical data point (K-nearest neighbors). The tuning parameter K can be chosen by cross-validation [23].

However, for wind power forecasting, the altitude (hub height) is also a critical factor [27], [28]. For example, wind tends to flow at the same speed at the same altitude, especially in mountainous areas. Fig. 3a shows such a terrain, where wind may flow around the mountain. Fig. 3b shows a result from the dataset, where points that are farther apart in longitude and latitude have a higher correlation. The information on these three points is listed in Table II and Table III.

(a) Example of complex terrain.
(b) Three wind sites; deeper color indicates higher correlation.
Figure 3. Altitude (or hub height) is an important factor correlating wind power sites. In (b), two locations farther away have strong correlations.

TABLE II. ALTITUDE

| Point | Latitude | Longitude | Altitude | Average power |
|---|---|---|---|---|
| A | 44.63 | −96.41 | 509.7 | 7.63 |
| B | 44.65 | −96.22 | 391.7 | 6.64 |
| C | 44.83 | −96.54 | 506.9 | 7.51 |

TABLE III. AN EXAMPLE

| Pair | Distance (km) | Pearson correlation coefficient |
|---|---|---|
| AB | 15.2 | 0.892 |
| AC | 24.5 | 0.943 |
| BC | 32.2 | 0.889 |

Therefore, we modify the existing K-nearest neighbor (K-NN) algorithm to include altitude information as well:

$$
\hat{s} = \arg\min_{|s| = K} d(s), \qquad d(s) = \sum_{k \in s} \left\| l_{\text{cur}} - l_k \right\|_2^2, \qquad k \in \mathbb{N},\ k \le Q, \tag{1}
$$

i.e., minimizing the sum distance function $d(s)$. Here $Q$ is the total number of data points in the database, $\mathbb{N}$ is the set of natural numbers, and $k$ is the index of a particular data point. Accordingly, $l_k$ is the measurement set associated with index $k$, and $K$ is the cardinality of the set $s$. During the search step, the algorithm simply looks for an index set $s$ with $K$ elements, representing the group of pseudo-measurement vectors that are nearest to the current measurement $l_{\text{cur}}$. Since the dataset does not have altitude information, we make use of the Google Maps Elevation API¹ to obtain the altitude at each of the possible wind sites. Then, we solve (1) to obtain clusters based on 3-dimensional distances.

The flowchart in Fig. 4 describes the solution procedure for (1). Fig. 5 shows an example of the clustering algorithm applied to the offshore locations just east of New York.
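A brute-force version of the search in (1) can be sketched as follows, using the three sites of Table II. Because the objective is a sum of per-site squared distances, picking the K individually nearest sites minimizes it. Several details are assumptions made for illustration: altitude is taken to be in meters, horizontal offsets use a local flat-Earth approximation, and `altitude_weight` is a hypothetical scaling factor that controls how strongly the vertical axis counts relative to the horizontal ones.

```python
import math

EARTH_RADIUS_KM = 6371.0


def squared_distance_3d(a, b, altitude_weight=1.0):
    """Squared distance in km^2 between (lat_deg, lon_deg, altitude_m) points.

    Horizontal offsets use an equirectangular approximation, which is
    adequate for sites tens of kilometers apart.
    """
    mean_lat = math.radians((a[0] + b[0]) / 2.0)
    north = math.radians(b[0] - a[0]) * EARTH_RADIUS_KM
    east = math.radians(b[1] - a[1]) * EARTH_RADIUS_KM * math.cos(mean_lat)
    up = altitude_weight * (b[2] - a[2]) / 1000.0  # meters to km (assumed unit)
    return north ** 2 + east ** 2 + up ** 2


def k_nearest(current, sites, k, altitude_weight=1.0):
    """Names of the k sites with the smallest 3-D distance to `current`."""
    ranked = sorted(
        sites,
        key=lambda name: squared_distance_3d(current, sites[name], altitude_weight),
    )
    return ranked[:k]


site_a = (44.63, -96.41, 509.7)
sites = {"B": (44.65, -96.22, 391.7), "C": (44.83, -96.54, 506.9)}

print(k_nearest(site_a, sites, k=1))                       # ['B']
print(k_nearest(site_a, sites, k=1, altitude_weight=200))  # ['C']
```

With unweighted metric distances, site B is the nearest neighbor of A because it is closer horizontally. Once the altitude difference is weighted heavily enough, site C, which sits at nearly the same altitude as A and is more strongly correlated with it in Table III, becomes the nearest neighbor instead.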

Figure 4. Nearest neighbor flowchart.

Figure 5. Clustering of the wind sites off the coast of New York. Here K = 3.

B. Long Short Term Memory Networks

As the mapping from historical data on different features (wind generation, wind density, etc.) to the wind generation in the future is highly nonlinear, deep learning is a promising method to deal with the nonlinearity. The core of deep learning is a neural network, where hidden neuron $j$ maps the input to the hidden-layer output via $h_j = \phi\left(\sum_i w_{ij} x_i\right)$, or $h = \phi(Wx)$ in matrix form, where $\phi$ is the activation function, e.g., a sigmoid or tanh function. Then, the output is computed from the hidden layer via $y = Vh$. The activation function captures the nonlinearity.
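The forward pass of such a single-hidden-layer network can be written in a few lines. This is a minimal sketch in plain Python with no bias terms, as in the text; the function name `dense_forward` and the toy weights are ours.

```python
import math


def dense_forward(x, W, V, phi=math.tanh):
    """Compute h = phi(W x) and y = V h for matrices given as lists of rows."""
    h = [phi(sum(w * x_i for w, x_i in zip(row, x))) for row in W]
    y = [sum(v * h_j for v, h_j in zip(row, h)) for row in V]
    return h, y


# Two inputs, two hidden neurons, one output (toy weights).
W = [[0.5, 0.0], [0.0, 0.0]]
V = [[2.0, 1.0]]
h, y = dense_forward([1.0, 3.0], W, V)
```

Here the first hidden neuron outputs tanh(0.5), the second outputs tanh(0) = 0, and the network output is twice the first hidden value.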

¹ https://developers.google.com/maps/documentation/elevation/start


However, to describe wind power, it is crucial for memory to exist between neighboring neurons to capture the time dependence of the physical process. The recurrent neural network, shown in Fig. 6a, is designed for this purpose. For such a model, the hidden neuron input-output relationship changes to $h_t = \phi(W x_t + U h_{t-1})$, where the new output depends not only on the current input $x_t$ but also on the previous output of the neuron $h_{t-1}$, which builds up the memory.
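The memory effect of this recurrence, where the previous hidden output is fed back alongside the new input, can be seen in a small sketch. The function names and toy weights below are ours, and biases are omitted as in the text.

```python
import math


def rnn_step(x_t, h_prev, W, U, phi=math.tanh):
    """One recurrent step: phi(W x_t + U h_prev), with matrices as lists of rows."""
    h_t = []
    for w_row, u_row in zip(W, U):
        pre = sum(w * x for w, x in zip(w_row, x_t))
        pre += sum(u * h for u, h in zip(u_row, h_prev))
        h_t.append(phi(pre))
    return h_t


def rnn_run(inputs, h0, W, U):
    """Apply rnn_step over a sequence and return every hidden state."""
    h = h0
    history = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W, U)
        history.append(h)
    return history


# A single neuron sees an impulse at the first step and zero input afterwards.
states = rnn_run([[1.0], [0.0]], h0=[0.0], W=[[1.0]], U=[[0.5]])
```

Although the second input is zero, the second hidden state is nonzero because it still carries the first input through the feedback term.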

(a) Recurrent neural network (RNN).
(b) Long short term memory network (LSTM).
Figure 6. Comparison between an RNN and an LSTM network [29].

In such a model, the RNN needs some past context to predict the current output, but the issue is how much. Sometimes little context is sufficient, but it is entirely possible for the gap between the relevant information and the point where it is needed to become very large. This is the issue of long-term dependencies, which is almost impossible to capture using classical linear or Markov-based systems [30]. In theory, RNNs should be capable of learning memories of arbitrary length, but in practice, they do not seem to learn them [31]. In the mid-90s, a variation of the recurrent net with so-called Long Short-Term Memory units, or LSTMs, was proposed as a solution to the vanishing gradient problem. Unlike the RNN, the LSTM in Fig. 6b introduces a state $C$ and three nonlinear “gate” mechanisms, each with its own weight matrices. On the left-hand side, it adds a forget gate for the past information: $f_t = \sigma_1(W_f x_t + U_f h_{t-1})$. Then, it creates an input gate to decide what new information is stored: $i_t = \sigma_2(W_i x_t + U_i h_{t-1})$. Next, it uses $f_t$, $i_t$, and the candidate state $\tilde{C}_t = \tanh(W_C x_t + U_C h_{t-1})$ to update the state, $C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t$, which is shown on the right-hand side of Fig. 6b. Finally, the output of the neuron is $h_t = \sigma_3(W_o x_t + U_o h_{t-1}) \times \tanh(C_t)$. In summary, the LSTM keeps two pieces of information as it propagates through time: a hidden state as the memory and the previous time-step output. During propagation, the LSTM accumulates information over time through its forget, input, and output gates.
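The gate computations can be traced with a scalar LSTM cell. The sketch below follows the common formulation in which each gate has its own weights; the dictionary layout of `params` is a hypothetical convenience, all three gate nonlinearities are taken to be sigmoids, and biases are omitted as in the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step.

    `params` maps each of "f", "i", "c", "o" to a pair (w, u) of input and
    recurrent weights for the forget gate, input gate, candidate state,
    and output gate respectively.
    """
    f_t = sigmoid(params["f"][0] * x_t + params["f"][1] * h_prev)
    i_t = sigmoid(params["i"][0] * x_t + params["i"][1] * h_prev)
    c_tilde = math.tanh(params["c"][0] * x_t + params["c"][1] * h_prev)
    c_t = f_t * c_prev + i_t * c_tilde   # keep part of the old state, add new
    o_t = sigmoid(params["o"][0] * x_t + params["o"][1] * h_prev)
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


# With all weights zero every gate equals 0.5 and the candidate state is 0,
# so the cell simply halves its previous state.
zero = {name: (0.0, 0.0) for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, params=zero)
```

In this degenerate case the new cell state is 1.0 and the output is 0.5 tanh(1.0), which makes the roles of the forget and output gates easy to verify by hand.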

Remark 1. There are various deep learning methods, such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) networks. In this work, we choose LSTM. This is because power systems have periodic patterns, e.g., a yearly pattern. Therefore, memorizing and utilizing historical data that are far away from the current operating condition is important. We choose LSTM for this work because of its ability to retain long-term information.

IV. NUMERICAL RESULTS

In this section, we present several numerical results. We note that the LSTM network may overfit the training data. Therefore, we use dropout as a regularization method, where input and recurrent connections to LSTM units are probabilistically excluded from activation and weight updates while training a network [32]. This has the effect of reducing overfitting and improving model performance. Finally, we shuffle the training data during preprocessing to improve the accuracy.
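The two training-time devices mentioned here can be sketched in plain Python. The dropout below is the common "inverted" variant, which rescales the surviving values so their expected sum is unchanged; this variant, the function name, and the seeds are assumptions for illustration and not a description of the exact training configuration.

```python
import random


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and
    rescale the survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
activations = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(activations, rate=0.5, rng=rng)

# Shuffle training samples so consecutive batches are not time-ordered.
samples = list(range(10))
shuffled = samples[:]
random.Random(1).shuffle(shuffled)
```

At inference time dropout is disabled, which corresponds to calling `dropout` with `rate=0.0` and leaves the values unchanged.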

A. Tracking

We show that the LSTM method can accurately track the shape and the magnitude of the wind power output for various forecast time horizons. For example, Fig. 7 shows the tracking performance for both 5-minute and 1-hour-ahead predictions.

B. Error Analysis

Next, we report the numerical errors using our approach. Fig. 8 shows the bar plot of root mean square percentage error and mean absolute percentage error for day-ahead prediction. We compare with the internal method used by NREL [18] and provided in the dataset. Table IV shows that our error is consistently smaller for all error metrics.

TABLE IV. ERROR COMPARISON WITH THE METHOD IN [18].

| Forecasting horizon | Result type | Metric | Error |
|---|---|---|---|
| 1 hour ahead | [18]’s method | RMSE | 2.35 |
| 1 hour ahead | LSTM method | RMSE | 1.78 |
| 1 hour ahead | [18]’s method | MAE | 1.42 |
| 1 hour ahead | LSTM method | MAE | 1.01 |
| 1 day ahead | [18]’s method | RMSE | 6.52 |
| 1 day ahead | LSTM method | RMSE | 3.42 |
| 1 day ahead | [18]’s method | MAE | 4.70 |
| 1 day ahead | LSTM method | MAE | 2.50 |

Next, we have implemented two of the most popular existing wind power forecasting algorithms: linear regression [33] and weighted averaging [34]. Fig. 9 shows the comparison between these two methods and our proposed method for both 1-hour-ahead and day-ahead forecasting. The proposed LSTM method consistently outperforms the other two methods.
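For reference, the error metrics named in this section follow their standard definitions, sketched below on a toy example. The function names are ours, and the percentage metrics assume that no actual value is zero.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def rmspe(actual, predicted):
    """Root mean square percentage error (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * math.sqrt(
        sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / n
    )


actual = [10.0, 20.0]      # toy values, not taken from the dataset
predicted = [12.0, 16.0]
```

For these toy values the absolute errors are 2 and 4, so MAE is 3 and RMSE is the square root of 10; both relative errors are 20%, so MAPE and RMSPE are both 20.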

Remark 2. From the numerical analysis above, the forecast errors given by the LSTM are much smaller than those of conventional methods. This result is due to two factors: (1) the LSTM remembers long-term patterns over a long historical data period, e.g., 7 years, and (2) our big data analysis uses not only the data at the location of interest but also the data from many neighboring locations that have similar geographical features.
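The size of the improvement can be read off Table IV as a relative error reduction. The short calculation below uses only the values reported in that table; the dictionary layout and function name are ours.

```python
# (baseline error from [18], LSTM error), copied from Table IV.
table_iv = {
    ("1 hour ahead", "RMSE"): (2.35, 1.78),
    ("1 hour ahead", "MAE"): (1.42, 1.01),
    ("1 day ahead", "RMSE"): (6.52, 3.42),
    ("1 day ahead", "MAE"): (4.70, 2.50),
}


def reduction_percent(baseline, proposed):
    """Relative error reduction of `proposed` with respect to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


reductions = {
    key: round(reduction_percent(base, lstm), 1)
    for key, (base, lstm) in table_iv.items()
}
for key, value in reductions.items():
    print(key, value)
```

The reductions are about 24% (RMSE) and 29% (MAE) for 1-hour-ahead forecasts, and about 48% (RMSE) and 47% (MAE) for day-ahead forecasts, so the relative gain is larger at the longer horizon.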

(a) 5-minute estimation. (b) 1-hour estimation.
Figure 7. Tracking between the predicted and the actual realized wind power time series for both 5-minute and 1-hour lead times.

(a) Bar plot of RMSPE. (b) Bar plot of MAPE.
Figure 8. Histogram bar plot for RMSPE and MAPE. Most of the errors are fairly low.

(a) Hour-ahead error in %. (b) Day-ahead error in MW.
Figure 9. Forecasting error comparison between our proposed method based on LSTM and linear regression and weighted averaging.

C. Error Visualization over the U.S.

A key capability of our platform is that it allows engineers to view a finer-grained picture than just the average error. In Fig. 10a, we show the mean absolute error across the U.S. By direct observation, the mountainous areas in the central U.S. turn out to be harder to forecast, which is consistent with altitude being an important factor. In Fig. 10b, we show the error versus wind power production size grouped by state. From this figure, we immediately identify that some states have both larger average powers and smaller estimation errors (e.g., North Dakota and New Hampshire), while others are not as attractive from an error point of view (e.g., District of Columbia).

(a) MAPE across the U.S.
(b) Forecasting error versus average power in different states.
Figure 10. Visualization of forecast errors of various locations in the U.S.

V. CONCLUSION

In this paper, we demonstrate a big data analytics platform using about 4.5 terabytes of wind data from NREL. This platform leverages GPU computing and deep neural networks for real-time visualization and large-scale forecasting. Specifically, feature extraction and selection are used in the feature space for data reduction, and altitude information is used to develop a three-dimensional clustering algorithm for identifying nearest neighbors. A special type of deep neural network, called the long short term memory network, is implemented for wind forecasting 5 minutes, 1 hour, and 1 day ahead. Our analytic approach is robust in the sense that it allows system operators to identify the necessary feature information, so that different types of features can be used in different applications of interest.

REFERENCES

[1] Y. Weng and R. Rajagopal, “Probabilistic baseline estimation via Gaussian process,” pp. 1–5, 2015.

[2] Y. Weng, R. Negi, and M. D. Ilic, “Historical data-driven state estimation for electric power systems,” pp. 97–102, 2013.

[3] D. Liu, G. Li, R. Fan, and G. Guo, “Research about big data platform of electrical power system,” in International Conference on Industrial IoT Technologies and Applications. Springer, 2016, pp. 36–43.

[4] N. Yu, S. Shah, R. Johnson, R. Sherick, M. Hong, and K. Loparo, “Big data analytics in power distribution systems,” in Innovative Smart Grid Technologies Conference (ISGT), 2015 IEEE Power & Energy Society. IEEE, 2015, pp. 1–5.

[5] Y. Weng, C. Faloutsos, M. D. Ilic, and R. Negi, “Speed up of data-driven state estimation using low-complexity indexing method,” pp. 1–5, 2014.

[6] M. E. Dyson, S. D. Borgeson, M. D. Tabone, and D. S. Callaway, “Using smart meter data to estimate demand response potential, with application to solar energy integration,” Energy Policy, vol. 73, pp. 607–619, 2014.

[7] M. Kezunovic, L. Xie, and S. Grijalva, “The role of big data in improving power system operation and protection,” in IEEE Bulk Power System Dynamics and Control-IX Optimization, Security and Control of the Emerging Power Grid (IREP), 2013 IREP Symposium, 2013, pp. 1–9.

[8] Y. Weng, R. Negi, C. Faloutsos, and M. D. Ilic, “Robust data-driven state estimation for smart grid,” IEEE Transactions on Smart Grid, vol. 8, no. 4, pp. 1956–1967, 2017.

[9] X. Zhang, G. He, S. Lin, and W. Yang, “Economic dispatch considering volatile wind power generation with lower-semi-deviation risk measure,” pp. 140–144, 2011.

[10] P. Cuffe and A. Keane, “Visualizing the electrical structure of power systems,” IEEE Systems Journal, vol. 11, no. 3, pp. 1810–1821, Sep. 2017.

[11] T. J. Overbye and J. D. Weber, “New methods for the visualization of electric power system information,” in IEEE Symposium on Information Visualization 2000. INFOVIS 2000. Proceedings, 2000, pp. 131–16c.

[12] O. Kramer, F. Gieseke, and B. Satzger, “Wind energy prediction and monitoring with neural computation,” Neurocomputing, vol. 109, pp. 84–93, 2013.

[13] A. M. Foley, P. G. Leahy, A. Marvuglia, and E. J. McKeogh, “Current methods and advances in forecasting of wind power generation,” Renewable Energy, vol. 37, no. 1, pp. 1–8, 2012.

[14] Y. Weng, R. Negi, and M. D. Ilic, “Graphical model for state estimation in electric power systems,” pp. 103–108, 2013.

[15] M. He, L. Yang, J. Zhang, and V. Vittal, “A spatio-temporal analysis approach for short-term forecast of wind farm generation,” IEEE Transactions on Power Systems, vol. 29, no. 4, pp. 1611–1622, July 2014.

[16] Y. Dvorkin, M. Lubin, S. Backhaus, and M. Chertkov, “Uncertainty sets for wind power generation,” IEEE Transactions on Power Systems, vol. 31, no. 4, pp. 3326–3327, 2016.

[17] Y. Zhang, J. Wang, and X. Wang, “Review on probabilistic forecasting of wind power generation,” Renewable and Sustainable Energy Reviews, vol. 32, pp. 255–270, 2014.

[18] Vaisala, Inc., “Final report on the creation of the wind integration national dataset (WIND) toolkit and API,” NREL, Tech. Rep., 2016.

[19] R. Socher, C. C. Lin, C. Manning, and A. Y. Ng, “Parsing natural scenes and natural language with recursive neural networks,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 129–136.

[20] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” 1999.

[21] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.

[22] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2016.

[23] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, Feb. 2009.

[24] P. Kennedy, “Nvidia deep learning / AI GPU value comparison,” 2017, https://www.servethehome.com/nvidia-deep-learning-ai-gpu-value-comparison-q2-2017/.

[25] T. Mostak, “MapD,” https://www.mapd.com/, 2017.

[26] B. Gaster, L. Howes, D. R. Kaeli, P. Mistry, and D. Schaa, Heterogeneous Computing with OpenCL. Morgan Kaufmann, 2012.

[27] B. G. Brown, R. W. Katz, and A. H. Murphy, “Time series models to simulate and forecast wind speed and wind power,” Journal of Climate and Applied Meteorology, vol. 23, no. 8, pp. 1184–1195, 1984.

[28] D. Siuta, G. West, R. Stull, and T. Nipen, “Calibrated probabilistic hub-height wind forecasts in complex terrain,” Weather and Forecasting, vol. 32, no. 2, pp. 555–577, 2017.

[29] C. Olah, “Understanding LSTM networks,” http://colah.github.io/posts/2015-08-Understanding-LSTMs/, 2015.

[30] Y. Bengio and P. Frasconi, “An input output HMM architecture,” in Advances in Neural Information Processing Systems, 1995, pp. 427–434.

[31] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in International Conference on Machine Learning, 2013, pp. 1310–1318.

[32] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[33] G. Sideratos and N. D. Hatziargyriou, “An advanced statistical method for wind power forecasting,” IEEE Transactions on Power Systems, vol. 22, no. 1, pp. 258–265, 2007.

[34] C. Monteiro, R. Bessa, V. Miranda, A. Botterud, J. Wang, G. Conzelmann et al., “Wind power forecasting: state-of-the-art,” Argonne National Laboratory (ANL), Tech. Rep., 2009.