
566 Journal of Chemical Engineering of Japan Copyright © 2021 The Society of Chemical Engineers, Japan

Journal of Chemical Engineering of Japan, Vol. 54, No. 10, pp. 566–575, 2021

Modeling and Optimization of NOx Emission from a 660 MW Coal-Fired Boiler Based on the Deep Learning Algorithm

Yingnan Wang, Ruibiao Xie, Wenjie Liu, Guotian Yang and Xinli Li

School of Control and Computer Engineering, North China Electric Power University, Beijing, 102206, Beijing, China

Keywords: Recurrent Neural Network, LSTM, GRU, GWO, NOx Emission, Model Optimization

With increasingly strict environmental protection policies, restrictions on NOx emissions are becoming more stringent. This paper focuses on modeling and optimizing NOx emission for a coal-fired boiler with advanced deep learning approaches. Three types of deep recurrent neural network models, including the recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU), are developed to model the relationship between operational parameters and NOx emission of a 660 MW boiler. The hyperparameters of the models are selected by grid search, and the effects of the hyperparameters on the prediction results are analyzed. Compared with the traditional back propagation neural network (BP), support vector machine (SVM) and deep belief network (DBN) models, the deep recurrent neural network models have higher prediction accuracy. The experimental results show that the GRU-based NOx prediction model has the best prediction performance among the proposed models. Then, the predicted NOx emission is used as the objective for searching the optimal parameters for the boiler combustion through the grey wolf optimization (GWO) algorithm. The searching process of GWO is convergent. According to the simulation results, the declines in the NOx emissions in the two selected cases were 19.49% and 17.96%, which are reasonable achievements for the boiler combustion process.

Introduction

Coal-fired thermal power plants across China, which generated nearly 67% of national electricity in 2018 (BP, 2018), are integral to the reliability and capacity of the current electrical grid. However, a drawback of coal firing is the pollutants released from the combustion process. Among them, NOx causes severe environmental problems, such as acid rain and photochemical smog, and also harms human health (Skalska et al., 2010). Increasingly stringent NOx emission regulations limit the maximum acceptable emission values (PRC, 2019), which pushes manufacturers to innovate by introducing new means to reduce NOx emissions. Since no additional modification of the boiler is needed, the combustion optimization technique has drawn increasing attention from the research community.

Research Paper. Received on January 8, 2021; accepted on July 30, 2021. DOI: 10.1252/jcej.21we004. Correspondence concerning this article should be addressed to Y. Wang (E-mail address: [email protected]). ORCiD ID of R. Xie is https://orcid.org/0000-0002-2297-8478

Combustion optimization is implemented by advanced control strategies that use models of the combustion process to perform optimization, which is a demonstrated method of reducing NOx emission (Wang et al., 2018b). First, a NOx emission prediction model is established based on boiler operation parameters. On this basis, the operation parameters are optimized and adjusted to achieve low-NOx combustion. Thus, a reliable and accurate NOx emission prediction model is critical to solving the problem.

The formation mechanism of NOx in the furnace is very complex; it involves a variety of chemical reactions and thermal phenomena, such as turbulence and turbulence transfer processes, combustion, heat, and mass transfer (Xie et al., 2020). The construction of an ideal mechanism model to predict the dynamics of NOx emissions is still challenging. Data-driven approaches, which map the relationships between the input and output using only data, provide a new way to solve this problem, and many scholars have focused their efforts on this field. Some scholars (Liukkonen et al., 2011; Krzywański and Nowak, 2017) have introduced the artificial neural network (ANN) to model NOx emissions of pulverized coal-fired power plants. Ilamathi et al. (2013) developed an ANN model for NOx emission prediction based on experimental data from a 210 MW pulverized coal-fired boiler and, in combination with a genetic algorithm (GA), obtained an optimum level of operating conditions corresponding to low NOx emission. Tuttle et al. (2019) mapped the relationship between the operational parameters and NOx emission through a genetic algorithm-optimized ANN model. Support vector machines (SVM) and their variants are also popular for modeling NOx emission (Li et al., 2013; Lv et al., 2013, 2015; Wei et al., 2013; Tan et al., 2016b). Lv et al. (2013) obtained data over one week from a 660 MW coal-fired boiler under stable operating conditions and proposed a novel least squares support vector machine (LSSVM)-based ensemble learning method to predict NOx emission. Tan et al. (2016b) developed an SVM model combined with principal component analysis (PCA) for NOx emission prediction, and data acquired from a 1000 MW coal-fired power plant were used to validate the model. Besides, extreme learning machines (ELMs) (Li and Niu, 2016; Tan et al., 2016a) and the Gaussian process (GP) (Wang et al., 2018a) have been introduced to construct predictive models relating operating parameters to NOx emission. These modeling methods are mainly concerned with evaluating model and optimization performance for NOx emissions under steady-state conditions. Only a few studies have attempted to develop dynamic models with data-driven methods (Liukkonen et al., 2012; Smrekar et al., 2013; Ahmed et al., 2015; Safdarnejad et al., 2019). Tang et al. (2018) developed a DBN model to predict the key variables of a boiler-turbine unit. Data preprocessing was first carried out using the Pearson correlation coefficient and principal component analysis. Then, a multi-layer DBN was used to predict the three key parameters of steam pressure, steam flow and active power for the boiler-turbine unit. Shi et al. (2019) constructed an ANN model using computational fluid dynamics (CFD) simulation data and historical operation data to describe the boiler operation and emission characteristics. A conventional genetic algorithm was then used for combustion optimization to obtain a better boiler air distribution solution.

However, an important shared characteristic of all these studies is that the optimal order of delay must be determined in advance, which is not simple work. Moreover, using delay-related content as model input makes the input dimension increase rapidly, which increases the difficulty of modeling NOx emission.

Considering that NOx emission is nonlinear, delayed, and time-sequential, the recurrent neural network (RNN) is very suitable for constructing the dynamic NOx emission model of a coal-fired boiler. The RNN introduces the concept of time series into the design of the network structure, making it more adaptive in modeling time series data (Mocanu et al., 2016; Zhang et al., 2018). As a well-known deep learning model, the RNN provides more powerful feature representation and mapping capabilities for modeling time sequences, because it has more hidden layers to learn the relationship between input and output vectors (Wen et al., 2020). Besides, RNNs with the gated recurrent unit (GRU) and the long short-term memory (LSTM) unit are chosen as network structures to process long-term time series data in this study. Only a few studies have used RNNs, LSTMs, or GRUs to handle time-series data for predicting NOx emission. Tan et al. (2019) and Yang et al. (2020) applied LSTM to the NOx emission prediction of coal-fired boilers based on historical operation data. LSTM is an important variant of RNN, which tracks the dynamic behavior of time series through its unique network structure. Cheng et al. (2018) presented a coal-fired boiler combustion optimization system in which an LSTM network was first employed to extract boiler operating features to model the boiler efficiency and pollutant emission, and then a deep Q-network (DQN) was used as the online optimizer to search for optimal operating parameters.

This study proposes a deep learning-based approach for predicting the NOx emission of a coal-fired boiler. Historical operation data from a power station are used to train and test the prediction models. RNN, LSTM, and GRU are each applied to establish NOx emission prediction models, and the performance of the three networks is compared and analyzed. Based on the NOx emission predicted by the selected GRU model, a combustion optimization scheme is proposed to minimize NOx emission during the combustion process, with the grey wolf optimization (GWO) algorithm used as the optimizer.

The rest of the paper is organized as follows. In Section 1, the boiler and observational data are introduced. Section 2 illustrates RNN, LSTM, GRU, the proposed model architecture, and the model evaluation criteria. The optimization of combustion is also proposed in Section 2. Section 3 presents the experimental results and discussion. Finally, we conclude our work in the Conclusions.

1.　Boiler Description and Data Preparation

1.1　Structure of boiler

The research object of this paper is a coal-fired 660 MW boiler with ultra-supercritical parameters, which adopts an opposed-wall-firing mode. The boiler, which belongs to one unit of the Heqi power plant in Henan Province, China, has a 19.08 × 19.08 m² cross-section furnace and a height of 65.1 m. A total of six layers of burners are arranged (three layers on each of the front and back walls), and each layer has six low-NOx swirl burners. The layers on the front wall are A, B, and C from bottom to top; correspondingly, the layers on the back wall are D, E, and F. Each medium-speed coal mill provides pulverized coal for the six burners in the same layer. Two layers of over fire air (OFA) nozzles are arranged above the swirl burners on the front and back walls, respectively. The schematic diagram of the boiler is shown in Figure 1.

Fig. 1　Schematic diagram of the boiler

1.2　Data preparation

The data are acquired from the distributed control system (DCS) of the power plant, with the boiler load varying from 300 to 660 MW so that the data coverage is comprehensive. The sampling interval of the data is 5 s, and more than 10,000 operating data samples of the boiler are collected. Figure 2 shows the relationship between the unit load and the NOx concentration at the SCR inlet. As shown in Figure 2, the selected samples cover normal operation over the full load range. Many factors affect the formation of NOx, such as coal quality, air/coal ratio, and temperature in the main combustion zone. Besides, for a given boiler, the combustion operation has a great influence on NOx emissions. Combustion optimization mainly adjusts the operating parameters to achieve low NOx emission. Considering the basic knowledge of NOx formation and the suggestions of the engineers, 25 input variables related to the boiler operation are selected. The list of variables is presented in Table 1.

2.　Methodology

2.1　Recurrent Neural Network

The RNN, first developed in the 1980s (LeCun et al., 2015), is specialized for processing sequential data. It has an internal self-looped structure, which offers dynamic behavior by building the temporal relationship between the latest states and the previous states. In other words, the states produced by an RNN at time $t-1$ have an impact on the states produced at time $t$. At the same time, every layer of the RNN shares its parameter values across all time steps of the sequence. This characteristic makes it possible to process sequences of variable length, i.e., to scale to much longer or shorter sequences.

The traditional RNN has a chain structure of repeating modules, as shown in Figure 3, which helps the RNN "remember" previous information. The output at time $t$ is determined by both the input at time $t$ and the hidden state at time $t-1$, as given by the following equations.

$$h_t = \tanh(U x_t + W h_{t-1} + b_h) \tag{1}$$

$$o_t = V h_t + b_o \tag{2}$$

Herein, $x_t$ denotes the input vector at time $t$, $h_{t-1}$ represents the hidden cell state at time $t-1$, $o_t$ denotes the output at time $t$, $b_h$ and $b_o$ are the bias vectors, $W$, $U$ and $V$ are the weight matrices for the corresponding layers of the RNN, and $\tanh(x)$ is a nonlinear activation function.

The gradient of the RNN can be calculated by back propagation through time (BPTT). However, the RNN cannot efficiently learn patterns with long-term dependencies because of vanishing or exploding gradients. This problem can be addressed by the structure of LSTMs or GRUs (Hochreiter and Schmidhuber, 1997; Kolen and Kremer, 2001).
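As an illustration of Eqs. (1) and (2), the following minimal sketch runs a plain RNN over a short sequence using only the Python standard library. It is not the authors' implementation: the helper names, the zero initial hidden state, and the toy weights and sizes are all assumptions chosen for readability.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(x_t, h_prev, U, W, b_h):
    """Eq. (1): h_t = tanh(U x_t + W h_{t-1} + b_h)."""
    pre = [u + w + b for u, w, b in zip(matvec(U, x_t), matvec(W, h_prev), b_h)]
    return [math.tanh(p) for p in pre]

def rnn_output(h_t, V, b_o):
    """Eq. (2): o_t = V h_t + b_o."""
    return [v + b for v, b in zip(matvec(V, h_t), b_o)]

def rnn_forward(xs, U, W, V, b_h, b_o):
    """Apply the same weights at every time step; return the last output."""
    h = [0.0] * len(b_h)  # initial hidden state, assumed to be zero
    for x_t in xs:
        h = rnn_step(x_t, h, U, W, b_h)
    return rnn_output(h, V, b_o)

# Toy sizes (illustrative only): 2 inputs, 2 hidden units, 1 output.
U = [[0.5, -0.2], [0.1, 0.3]]
W = [[0.0, 0.4], [-0.3, 0.2]]
V = [[1.0, -1.0]]
b_h = [0.0, 0.1]
b_o = [0.05]
sequence = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(rnn_forward(sequence, U, W, V, b_h, b_o))
```

Because the weights are shared across time steps, the same `rnn_forward` call works for a sequence of any length, which is the property the text refers to.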

Fig. 2　Variations of unit load and NOx concentration

Table 1　The inputs of the models

| Inputs | Units | Descriptions |
|---|---|---|
| Load | MW | The load of the boiler |
| Tar | t/h | Total air rate |
| C1, C2, C3, C4, C5, C6 | t/h | Six coal feeder rates |
| P1, P2, P3, P4, P5, P6 | t/h | Six primary air flow rates |
| S1, S2, S3, S4, S5, S6 | t/h | Six secondary air flow rates |
| OFA1, OFA2, OFA3, OFA4 | t/h | Four OFA flow rates |
| O2 | % | Oxygen concentration at the outlet of the furnace |

Fig. 3　Structure of RNN

2.2　Long Short-Term Memory network

LSTM is a special kind of RNN that models sequential data (Graves, 2012). In order to solve the vanishing gradient problem of the RNN, LSTM replaces the RNN cells in a hidden layer with LSTM cells on the basis of the original network structure and resets the calculation node. As shown in Figure 4, the structure of the LSTM cell is more complicated. On the basis of the RNN, the hidden layer adds self-connected storage and multiplication units, which provide reading, writing and judging functions for the neural network.

Fig. 4　Structure of LSTM

The LSTM cell has three control gates: a forget gate, an input gate, and an output gate. As shown in Figure 4, the input vector $x_t$, the output of the previous LSTM unit $h_{t-1}$ and the cell state of the previous LSTM unit $c_{t-1}$ are input to the LSTM through the three gate structures. Each gate processes the input information and determines whether it is activated according to its logical function. Among them, the forget gate $f$ discards information that the memory cell considers unimportant, the input gate $i$ identifies the new information to be retained in the cell state and updates it with the activation functions, and the output gate $o$ determines the output according to the state of the cell. The process can be expressed by the following formulas.

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{3}$$

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{4}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{5}$$

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{6}$$

$$h_t = o_t \odot \tanh(c_t) \tag{7}$$

Herein, $c_t$ denotes the cell state of the LSTM at time $t$, $h_t$ is the output of the calculation unit at time $t$, $W$ and $U$ denote the parameter matrices, $b$ is the bias vector, $\odot$ denotes element-wise multiplication, and $i_t$, $f_t$ and $o_t$ denote the input, forget and output gates, respectively. The input, forget and output gates are each connected to a multiplicative unit to control the input and output of information and the state of each LSTM cell. $\sigma$ and $\tanh$ are nonlinear activation functions, where $\sigma$ is the sigmoid function.
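To make the gate arithmetic in Eqs. (3)–(7) concrete, here is a minimal single-step LSTM cell written with the Python standard library. It is a sketch under stated assumptions rather than the authors' Keras model: the function names, the dictionary layout of the parameters, and the constant toy weights are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(Wm, Um, b, x, h):
    """Return W x + U h + b for list-of-rows matrices W and U."""
    return [sum(w * xi for w, xi in zip(wr, x)) +
            sum(u * hi for u, hi in zip(ur, h)) + bi
            for wr, ur, bi in zip(Wm, Um, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p maps a gate name to its (W, U, b) triple."""
    i_t = [sigmoid(v) for v in affine(*p["i"], x_t, h_prev)]    # Eq. (3)
    f_t = [sigmoid(v) for v in affine(*p["f"], x_t, h_prev)]    # Eq. (4)
    g_t = [math.tanh(v) for v in affine(*p["c"], x_t, h_prev)]  # candidate state
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]        # Eq. (5)
    o_t = [sigmoid(v) for v in affine(*p["o"], x_t, h_prev)]    # Eq. (6)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]          # Eq. (7)
    return h_t, c_t

def toy_params(n_in, n_hid, value=0.1):
    """Constant toy weights shared by all gates (illustrative only)."""
    W = [[value] * n_in for _ in range(n_hid)]
    U = [[value] * n_hid for _ in range(n_hid)]
    b = [0.0] * n_hid
    return {k: (W, U, b) for k in ("i", "f", "c", "o")}

params = toy_params(n_in=1, n_hid=2)
h, c = [0.0, 0.0], [0.0, 0.0]
for x in ([1.0], [0.5], [-0.5]):
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The additive update of `c_t` in Eq. (5) is what lets information persist over many steps: when the forget gate is close to 1, the previous cell state passes through almost unchanged.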

2.3　Gated Recurrent Unit network

The gated recurrent unit (GRU), proposed by Cho et al. (2014), is an important variant of LSTM. As shown in Figure 5, GRU replaces the forget gate and the input gate of LSTM with a single update gate, whose main function is to control how much information from the current input $x_t$ is retained. In addition, GRU has a reset gate, which determines the influence of the previous output $h_{t-1}$ on the current input $x_t$. Unlike LSTM, GRU has only two control gates, which makes it faster to compute than LSTM.

The input vector $x_t$ and the cell state $h_{t-1}$ are fed to the update gate $z_t$ and the reset gate $r_t$. Then, the tanh function creates a new candidate vector $\tilde{h}_t$, which, together with the output vector of the previous moment $h_{t-1}$, yields the output $h_t$. The process can be expressed using the following equations.

$$z_t = \sigma(W_z h_{t-1} + U_z x_t + b_z) \tag{8}$$

$$r_t = \sigma(W_r h_{t-1} + U_r x_t + b_r) \tag{9}$$

$$\tilde{h}_t = \tanh(W_h (r_t \odot h_{t-1}) + U_h x_t + b_h) \tag{10}$$

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{11}$$

Herein, $h_t$ is the output of the GRU unit at time $t$, $W$ and $U$ denote the weight parameter matrices, $b$ is the bias vector, $\sigma$ and $\tanh$ represent the activation functions, $\odot$ denotes element-wise multiplication, and $z_t$ and $r_t$ denote the update and reset gates, respectively.
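The GRU update in Eqs. (8)–(11) can be sketched in the same style. The code below is an illustrative standard-library version, not the Keras layer used in the study; the helper names and toy weights are assumptions. The `n_params` helper simply counts the weights implied by the equations as written here (three weight sets for GRU, four for LSTM), which is one way to see why GRU is cheaper per step; library implementations may count biases differently.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(Wm, Um, b, h, x):
    """Return W h + U x + b (W acts on the state, U on the input)."""
    return [sum(w * hi for w, hi in zip(wr, h)) +
            sum(u * xi for u, xi in zip(ur, x)) + bi
            for wr, ur, bi in zip(Wm, Um, b)]

def gru_step(x_t, h_prev, p):
    """One GRU step; p maps 'z', 'r', 'h' to (W, U, b) triples."""
    z_t = [sigmoid(v) for v in affine(*p["z"], h_prev, x_t)]       # Eq. (8)
    r_t = [sigmoid(v) for v in affine(*p["r"], h_prev, x_t)]       # Eq. (9)
    gated = [r * h for r, h in zip(r_t, h_prev)]                   # r_t * h_{t-1}
    h_cand = [math.tanh(v) for v in affine(*p["h"], gated, x_t)]   # Eq. (10)
    return [(1.0 - z) * h + z * hc
            for z, h, hc in zip(z_t, h_prev, h_cand)]              # Eq. (11)

def toy_params(n_in, n_hid, value=0.1):
    """Constant toy weights shared by all gates (illustrative only)."""
    W = [[value] * n_hid for _ in range(n_hid)]
    U = [[value] * n_in for _ in range(n_hid)]
    b = [0.0] * n_hid
    return {k: (W, U, b) for k in ("z", "r", "h")}

def n_params(n_weight_sets, n_in, n_hid):
    """Weights and biases implied by the equations for one recurrent layer."""
    return n_weight_sets * (n_hid * n_hid + n_hid * n_in + n_hid)

h = [0.0, 0.0]
for x in ([1.0], [0.5], [-0.5]):
    h = gru_step(x, h, toy_params(1, 2))
print(h)
print("GRU:", n_params(3, 25, 512), "LSTM:", n_params(4, 25, 512))
```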

2.4　Structure of NOx emission models

In order to illustrate the effectiveness of deep networks in modeling NOx emission, RNN, LSTM and GRU neural networks are each employed to construct the prediction model. The proposed model has a three-layer network structure comprising an input layer, a hidden layer and an output layer.

In this study, the NOx emission prediction framework is built using the Keras library with a TensorFlow backend. All the code is written in Python under the Linux system. Unlike the traditional ANN, the RNN is trained with the BPTT algorithm. Adam is a common gradient-based optimization method whose aim is to minimize the loss function; here it updates the weights using the gradients obtained by BPTT. The Adam optimizer usually converges quickly, needs little memory, and is well suited to problems involving large datasets and many parameters. Therefore, the Adam optimizer is applied to optimize the weight parameters of each layer in the model. The loss function is defined as

$$f = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left[ y'(i) - y(i) \right]^2} \tag{12}$$

where, following the notation of Section 2.5, $y'(i)$ and $y(i)$ are the predicted and measured NOx emissions of sample $i$, and $m$ is the number of samples.

The hyperparameters that affect the prediction accuracy of the model are the learning rate, the time step and the number of hidden layer nodes. Both the LSTM unit and the GRU can adapt to the time dependence in time series data. However, they may produce different prediction performance on the same data. The NOx emission models based on RNN, LSTM and GRU are established separately, and the grid search algorithm (Mirjalili et al., 2014) is used to find the optimal hyperparameters of each model.
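The grid search over the three hyperparameters can be sketched as an exhaustive loop over all combinations. This is only an outline of the idea: `toy_rmse` is a hypothetical placeholder for "train the model with these settings and return its RMSE", and the candidate values for the number of hidden units are assumed, since the text does not list them.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Try every combination in `grid`; lower scores are better."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_rmse(params):
    """Placeholder objective (hypothetical); a real run would train a model."""
    return (abs(params["learning_rate"] - 0.001) * 1000
            + abs(params["hidden_units"] - 512) / 512
            + abs(params["time_step"] - 32) / 32)

grid = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "hidden_units": [128, 256, 512],  # assumed candidate values
    "time_step": [1, 4, 8, 16, 24, 32, 48, 64, 128],
}
best, score = grid_search(toy_rmse, grid)
print(best, score)
```

With this grid the loop evaluates 3 × 3 × 9 = 81 configurations, which shows why the cost of grid search grows quickly as more hyperparameters or candidate values are added.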

2.5　Evaluation metrics

The following metrics are employed to evaluate the prediction performance of the NOx models.

$$\delta_{\mathrm{MAE}} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - y'_i \right| \tag{13}$$

$$\delta_{\mathrm{MAPE}} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - y'_i}{y_i} \right| \times 100\% \tag{14}$$

$$\delta_{\mathrm{RMSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - y'_i \right)^2} \tag{15}$$

Herein, $\delta_{\mathrm{MAE}}$ is the mean absolute error (MAE), $\delta_{\mathrm{MAPE}}$ is the mean absolute percentage error (MAPE), $\delta_{\mathrm{RMSE}}$ is the root mean square error (RMSE), $y_i$ is the real NOx emission, $y'_i$ is the NOx emission predicted by the neural network, and $n$ is the total number of test samples.

Fig. 5　Structure of GRU
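For reference, Eqs. (13)–(15) translate directly into a few lines of Python. The sample values below are made up purely to exercise the functions and are not data from the study.

```python
import math

def mae(y_true, y_pred):
    """Eq. (13): mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Eq. (14): mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((y - p) / y)
                       for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Eq. (15): root mean square error."""
    return math.sqrt(sum((y - p) ** 2
                         for y, p in zip(y_true, y_pred)) / len(y_true))

measured = [100.0, 200.0, 400.0]   # made-up NOx values in ppm
predicted = [110.0, 190.0, 400.0]
print(mae(measured, predicted), mape(measured, predicted), rmse(measured, predicted))
```

Note that MAPE divides by the measured value, so it is undefined when a measured value is zero; RMSE penalizes large individual errors more heavily than MAE.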

2.6　Adjusting operational parameters with gray wolf optimization

By constructing a NOx emission model, the correlation between operating parameters and NOx emission can be obtained. On this basis, the NOx emission can be reduced by optimizing and adjusting the operation parameters. The prediction model for NOx emissions has 25 inputs. However, not all inputs of the NOx emission model can be treated as optimization variables. In the production process of a coal-fired boiler, the load and coal properties are determined by the unit operating conditions and are not considered adjustable parameters. According to the experience of the boiler operators, the operating parameters to which NOx emission is most sensitive are selected in this study. The four-layer OFA flow rates are selected as the variable parameters for optimization and adjustment, and the adjustment range of the variable parameters is carefully determined.

In this paper, we use the gray wolf optimization (GWO) algorithm as a tool to optimize boiler operation parameters. The GWO algorithm was recently developed by Mirjalili et al. (2014). It is a novel heuristic algorithm inspired by the hierarchy and hunting behavior of gray wolves. The GWO algorithm has the advantages of a simple structure, few adjustable parameters and easy implementation. It includes a self-adaptive convergence factor and an information feedback mechanism, which can achieve a balance between local search and global search. Therefore, the GWO algorithm performs well in terms of precision and convergence speed.

In the design of GWO, all wolves are divided into four groups. The optimal solution is considered the α wolf. Consequently, the second optimal solution is set as the β wolf and the third is called the δ wolf. The remaining candidate solutions are assumed to be ω wolves. The first three groups, the α, β and δ wolves, guide the ω wolves to hunt toward the target. The hunting process of gray wolves includes approaching and encircling the prey, and the equations are as follows.

$$D = \left| C \cdot X_p(t) - X(t) \right| \tag{16}$$

$$X(t+1) = X_p(t) - A \cdot D \tag{17}$$

Herein, $D$ indicates the distance between the individual and the prey, $t$ is the current iteration, $C$ and $A$ are coefficient vectors, $X_p(t)$ denotes the position vector of the prey and $X(t)$ denotes the position vector of a single gray wolf. $A$ and $C$ are calculated by Eqs. (18) and (19).

$$A = 2a \cdot r_1 - a \tag{18}$$

$$C = 2 r_2 \tag{19}$$

Herein, $r_1$ and $r_2$ are random numbers in $[0, 1]$, and $a$ is the convergence factor.

$$a = 2 - 2 t_i / t_{\max} \tag{20}$$

Herein, $t_{\max}$ represents the maximum number of iterations and $i = 1, 2, \ldots, N$. When $|A| > 1$, the gray wolf group expands the search space to find better food, which corresponds to global search. When $|A| < 1$, the gray wolf group shrinks the encircling circle, which corresponds to local search.

The gray wolves track the prey position as follows.

$$\begin{cases} D_\alpha = \left| C_1 \cdot X_\alpha - X(t) \right| \\ D_\beta = \left| C_2 \cdot X_\beta - X(t) \right| \\ D_\delta = \left| C_3 \cdot X_\delta - X(t) \right| \end{cases} \tag{21}$$

Herein, $D_\alpha$, $D_\beta$ and $D_\delta$ denote the distances between α, β, δ and the other individuals, respectively, $X_\alpha$, $X_\beta$ and $X_\delta$ denote the current positions of α, β and δ, respectively, and $C_1$, $C_2$ and $C_3$ are coefficients.

During the hunting process, the remaining ω wolves move toward α, β and δ.

$$\begin{cases} X_1 = X_\alpha - A_1 \cdot D_\alpha \\ X_2 = X_\beta - A_2 \cdot D_\beta \\ X_3 = X_\delta - A_3 \cdot D_\delta \end{cases} \tag{22}$$

$$X(t+1) = (X_1 + X_2 + X_3) / 3 \tag{23}$$

The above equations define the final position of the ω wolf. $A$ and $C$ are random vectors and adaptive vectors, respectively.

The algorithm steps are as follows.

a) Initialize random wolves within the upper and lower limits of the wolf variables.
b) Calculate the fitness value of each wolf.
c) Select the three best wolves and save them as α, β and δ.
d) Apply Eqs. (20)–(22) to update the other wolves ω.
e) Update parameters a, A and C.
f) If the end condition is not met, go to step b).
g) Output the α wolf position.
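The steps above can be sketched as a compact standard-library implementation. This is an illustrative version, not the authors' code: it keeps the three best positions seen so far as α, β and δ, clips new positions to the bounds, and uses `toy_nox` as a hypothetical stand-in for the trained GRU predictor. The bounds of 20 to 80 t/h for the four OFA flow rates are made up for the example.

```python
import random

def gwo(fitness, lower, upper, n_wolves=30, max_iter=100, seed=0):
    """Minimize `fitness` inside box bounds; returns (best_position, history)."""
    rng = random.Random(seed)
    dim = len(lower)
    # a) random initial wolves inside the bounds
    wolves = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
              for _ in range(n_wolves)]
    leaders, history = [], []
    for t in range(max_iter):
        # b), c) evaluate fitness; keep the three best positions seen so far
        leaders = sorted(leaders + [list(w) for w in wolves], key=fitness)[:3]
        history.append(fitness(leaders[0]))
        a = 2.0 - 2.0 * t / max_iter                  # e) Eq. (20)
        # d) move each wolf toward alpha, beta and delta
        for w in wolves:
            for d in range(dim):
                guided = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a    # Eq. (18)
                    C = 2.0 * rng.random()            # Eq. (19)
                    D = abs(C * leader[d] - w[d])     # Eq. (21)
                    guided += leader[d] - A * D       # Eq. (22)
                # Eq. (23), then clip back into the allowed range
                w[d] = min(max(guided / 3.0, lower[d]), upper[d])
    # g) final evaluation and output of the alpha wolf
    leaders = sorted(leaders + [list(w) for w in wolves], key=fitness)[:3]
    history.append(fitness(leaders[0]))
    return leaders[0], history

def toy_nox(ofa):
    """Hypothetical objective with its minimum at 60 t/h in every layer."""
    return 120.0 + sum((q - 60.0) ** 2 for q in ofa) / 100.0

best, history = gwo(toy_nox, lower=[20.0] * 4, upper=[80.0] * 4)
print(best, history[-1])
```

Because `a` falls linearly from 2 toward 0, early iterations allow |A| > 1 (wide exploration) while late iterations force |A| < 1 (contraction around the leaders), matching the global and local search phases described in the text.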

The entire process of the system is shown in Figure 6. The deep learning models are established to determine the NOx emission characteristics, and the optimal operation parameters for minimizing NOx emission are obtained based on the selected model and the GWO optimization framework.

3.　Results and Discussion

3.1　Model hyperparameters

The collected 8,000 groups of data were divided into a training set and a test set. Among them, 7,000 groups were selected to build the NOx emission model, and the remaining 1,000 groups were used as the test set to verify the performance of the model. The prediction accuracy of the deep learning models is affected by the hyperparameters. The learning rate and the number of hidden layer neurons can be determined first. In order to determine appropriate values, grid search is used to tune the neural network. By observing the influence of each parameter on the prediction accuracy, a more appropriate value can be inferred. RMSE is selected as the evaluation index, as shown in Figure 7. Under the same model parameters, the LSTM and GRU models predict better than the RNN. Moreover, RMSE decreases initially and then basically stabilizes as the learning rate decreases. When the learning rate is 0.01, RMSE increases with the number of hidden layer neurons (model complexity). When the learning rate is 0.001 or 0.0001, on the contrary, RMSE decreases as the number of hidden layer neurons increases.

Fig. 6　Framework of NOx emission prediction and optimization

Fig. 7　Influence of parameters on prediction accuracy

The timestep limits the depth to which the recurrent neural network is unrolled over time and determines the length of the input data. It is worth noting that LSTM and GRU treat the timestep differently from the traditional RNN in time series prediction: they can better handle long-range sequences through the gate structures in their memory cells. In order to better examine the behavior of the NOx emission prediction models on time series, this paper makes a comparison at different timesteps. The predicted results of each model under different timesteps are shown in Table 2. In the study, timesteps of 1, 4, 8, ..., 128 are used to find the optimal timestep and to study the impact of the timestep on NOx prediction accuracy. As shown in Figure 8, as the timestep increases, the RMSE of the RNN model decreases at the beginning but increases slowly after reaching the minimum. For the LSTM and GRU models, however, the RMSEs first decrease and then stabilize as the timestep increases. This indicates that LSTM and GRU ignore the additional timesteps; once a large enough timestep is set, the prediction precision of the model is no longer sensitive to it.

Table 2　Prediction results with different timesteps (RMSE [ppm])

| Timestep | RNN | LSTM | GRU |
|---|---|---|---|
| 1 | 11.453 | 3.571 | 3.306 |
| 4 | 6.662 | 2.746 | 3.279 |
| 8 | 3.251 | 2.932 | 2.85 |
| 16 | 5.011 | 2.671 | 2.737 |
| 24 | 5.534 | 2.61 | 2.63 |
| 32 | 5.736 | 2.921 | 2.348 |
| 48 | 5.664 | 2.651 | 2.601 |
| 64 | 5.269 | 2.394 | 2.604 |
| 128 | 4.777 | 2.801 | 3.041 |

Besides, when the hyperparameters are the same, the GRU and LSTM models have higher prediction accuracy than the RNN model. Compared with the LSTM model, the accuracy of the GRU model is not significantly improved, but the computing time is shortened. This indicates that GRU predicts NOx emission well, mainly because NOx emission data form a time series and the formation mechanism is relatively complex. Therefore, combined with historical data and other boiler parameters, the GRU model can achieve a good nonlinear fit. Considering the complexity of the model and the cost of computing, the optimal hyperparameters of each model are listed in Table 3.
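The role of the timestep can be illustrated by how input windows are cut from the raw records. The sketch below assumes a sliding window in which each target is paired with the most recent `time_step` rows of inputs, followed by a chronological split; the text does not spell out the exact windowing the authors used, so both helpers are assumptions.

```python
def make_windows(features, targets, time_step):
    """Pair each target with the `time_step` most recent feature rows."""
    X, y = [], []
    for end in range(time_step, len(features) + 1):
        X.append(features[end - time_step:end])
        y.append(targets[end - 1])
    return X, y

def chronological_split(X, y, n_train):
    """Keep time order: the first n_train windows train, the rest test."""
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

# Tiny made-up record: 10 samples with 2 input variables each.
features = [[i, 2 * i] for i in range(10)]
targets = [float(i) for i in range(10)]

X, y = make_windows(features, targets, time_step=4)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y, n_train=5)
print(len(X), len(train_X), len(test_X))
```

A larger timestep gives each prediction more history but reduces the number of windows by `time_step - 1` and lengthens every input sequence, which is consistent with the trade-off between accuracy and computing cost discussed above.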

Fig. 8　RMSE of the RNN-based models with different timesteps

Table 3　Hyperparameters of models

| Hyperparameter | RNN | LSTM | GRU |
|---|---|---|---|
| Learning rate | 0.001 | 0.001 | 0.001 |
| Number of hidden layer neurons | 512 | 512 | 512 |
| Time step | 8 | 64 | 32 |

3.2　NOx emission prediction

In order to compare the performance of each model more intuitively, the prediction results of the deep models are shown in Figure 9 and Figure 10. Figure 9 compares the predicted and measured NOx emissions for the training set, and Figure 10 shows the results for the test set. Ranked from best to worst performance, the models are the GRU, the LSTM and the RNN. To evaluate the predictive performance of the models, the three performance indexes introduced in Section 2.5 are used. The detailed metrics for the different models are shown in Table 4. According to the calculations, the GRU model has the highest accuracy, with MAPE = 0.38%, MAE = 1.282 ppm and RMSE = 2.348 ppm. Moreover, the GRU models the time series data more efficiently than the LSTM, as the computing times in Table 4 show. This indicates that the GRU model can more accurately describe the nonlinear relationship between the input variables and the output variables.

Fig. 9　Prediction results of each model in the training data. (a) and (b) are RNN, (c) and (d) are LSTM, (e) and (f) are GRU

Table 4　Performance comparison of different models using the testing set

| Model | RMSE [ppm] | MAE [ppm] | MAPE [%] | Time [s] |
|---|---|---|---|---|
| GRU | 2.348 | 1.282 | 0.38 | 350.468 |
| LSTM | 2.394 | 1.228 | 0.47 | 421.002 |
| RNN | 3.251 | 1.826 | 0.61 | 253.428 |

Fig. 10　Prediction results of each model in the testing data. (a) and (b) are RNN, (c) and (d) are LSTM, (e) and (f) are GRU

It is worth noting that the data used in the study are continuous and contain dynamic processes. This enables the established dynamic model to adapt to the dynamic changes of unit operating conditions, which cannot be achieved by a steady-state model. This advantage is very important for the application of dynamic modeling in power plants, because current units cannot maintain a stable operating state.

3.3　Comparisons with other methods

In order to evaluate the performance of the deep neural networks in the prediction of NOx emissions, BP, SVM and DBN, which have been successfully and widely used in NOx emission modeling, are also used in this study to establish comparative models. The widely used BP neural network is a feed-forward network, which can be considered a nonlinear mapping of the input pattern to the output pattern. A three-layer network with ReLU hidden neurons is selected for the model, and the number of neurons was determined by repeated trials. As in most SVM modeling research, the generalization performance of the model mainly depends on two parameters, namely the generalization parameter C and the kernel function parameter γ (Zhou et al., 2012). Particle swarm optimization (PSO) is used to optimize these two parameters, so as to ensure high-precision prediction results and optimal generalization performance.

The results show that the BP model gave a mean absolute error of 5.256 ppm, a mean absolute percentage error of 1.87% and a root mean square error of 6.054 ppm. For the SVM model, the mean absolute error is 8.102 ppm, the mean absolute percentage error is 3.4% and the root mean square error is 9.677 ppm. For the DBN model, the mean absolute error is 2.006 ppm, the mean absolute percentage error is 0.64% and the root mean square error is 5.375 ppm. For better comparison, Figure 11 shows the prediction error distribution of the testing data for the various models. Specifically, Figure 12 shows the prediction errors of RNN, LSTM and GRU in the training data and testing data. The deep learning models, including GRU, LSTM, RNN and DBN, produce fewer errors than the traditional models, and the majority of the errors for these four models are within 5%. In particular, the absolute error of the GRU model is concentrated in a small range. The RMSE, MAE and MAPE of GRU are the smallest among the models, as shown in Figure 13. Compared with the traditional BP, SVM and DBN models, the deep recurrent neural network models clearly perform better on the NOx emission of thermal power plants.

3.4　Combustion optimization

Before optimization is applied, the objective function must be defined. The purpose of this study is to reduce NOx emission by modifying boiler operating parameters. According to the above analysis, the NOx emission model based on GRU obtained the best prediction performance. If the NOx emission function expressed by the GRU model is taken as the fitness function, the optimum operating parameters can be found using the GWO algorithm. The parameters are set as follows: the number of search agents is 30 and the maximum number of iterations is 100. The goal is to minimize NOx emission. To verify the optimization performance, the commonly used PSO algorithm is also applied in the study. PSO can solve complex nonlinear problems with fast computing speed and has a wide application range. Its parameters are set as follows: the population size is 50, the maximum number of iterations is 100, the acceleration coefficients c1 and c2 are both equal to 2 and remain unchanged during the search, and the lower and upper bounds of the inertia weight factor ω are 0.4 and 0.9, respectively.
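For comparison with GWO, a basic PSO using the settings listed above (50 particles, 100 iterations, c1 = c2 = 2, inertia weight between 0.4 and 0.9) might look like the sketch below. The linear decrease of the inertia weight from 0.9 to 0.4 is an assumption, as the text gives only its bounds, and `toy_nox` with its 20 to 80 t/h limits is again a hypothetical stand-in for the GRU-based fitness function.

```python
import random

def pso(fitness, lower, upper, n_particles=50, max_iter=100,
        c1=2.0, c2=2.0, w_min=0.4, w_max=0.9, seed=0):
    """Minimize `fitness` inside box bounds; returns (best_position, history)."""
    rng = random.Random(seed)
    dim = len(lower)
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = list(pbest[g]), pbest_val[g]
    history = [gbest_val]
    for t in range(max_iter):
        w = w_max - (w_max - w_min) * t / max_iter  # assumed linear decrease
        for k in range(n_particles):
            for d in range(dim):
                vel[k][d] = (w * vel[k][d]
                             + c1 * rng.random() * (pbest[k][d] - pos[k][d])
                             + c2 * rng.random() * (gbest[d] - pos[k][d]))
                # move, then clip the position back into the bounds
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lower[d]), upper[d])
            val = fitness(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = list(pos[k]), val
                if val < gbest_val:
                    gbest, gbest_val = list(pos[k]), val
        history.append(gbest_val)
    return gbest, history

def toy_nox(ofa):
    """Hypothetical objective with its minimum at 60 t/h in every layer."""
    return 120.0 + sum((q - 60.0) ** 2 for q in ofa) / 100.0

best, history = pso(toy_nox, lower=[20.0] * 4, upper=[80.0] * 4)
print(best, history[-1])
```

Compared with GWO, this PSO carries more tunable settings (two acceleration coefficients and an inertia schedule), which is one reason the text describes GWO as having simpler parameters.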

Table 5 shows the optimization results. For each case, the first row shows the parameters of the normal production process, the second row shows the adjustable parameters optimized by the GWO algorithm under normal combustion conditions, and the third row shows the adjustable parameters optimized by the PSO algorithm under the same conditions. According to the optimization results, with GWO the predicted NOx emissions in the two cases are reduced to approximately 124.6 ppm and 112.4 ppm, and the declines are 38.8 ppm and 43.9 ppm, respectively. For the PSO algorithm, the NOx emissions are reduced to 129.4 ppm and 119.4 ppm, with reductions of 33.9 ppm and 36.9 ppm, respectively.

Fig. 11 Prediction error distribution of the testing data for various models

Fig. 12 Prediction errors of the training data and testing data for various models: (a) Comparison chart in the training data, (b) Comparison chart in the testing data

Fig. 13　Comparison of various models in error performance metrics
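The declines quoted above follow directly from the NOx column of Table 5. The short sketch below redoes that arithmetic; the numbers are copied from Table 5, while the dictionary layout and the function name `decline` are illustrative choices.

```python
# Predicted NOx emission in ppm, copied from Table 5.
nox_ppm = {
    "Case 1": {"Original": 163.387, "GWO": 124.566, "PSO": 129.432},
    "Case 2": {"Original": 156.331, "GWO": 112.384, "PSO": 119.387},
}

def decline(case, method):
    """Absolute NOx decline in ppm relative to the original operating point."""
    row = nox_ppm[case]
    return row["Original"] - row[method]

for case in nox_ppm:
    gwo = decline(case, "GWO")
    pso = decline(case, "PSO")
    print(f"{case}: GWO decline {gwo:.3f} ppm, PSO decline {pso:.3f} ppm, "
          f"difference {gwo - pso:.3f} ppm in favour of GWO")
```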

The search process is shown in Figure 14. Both GWO and PSO can be used to reduce NOx emissions. However, the optimization results of GWO are superior to those of PSO: GWO provides higher quality and more stable solutions. GWO also has good convergence, few parameters and fast calculation speed, which makes it suitable for online application. In addition, the results show that the upper OFA flow rates should be increased when conditions permit. OFA originates from air-staged combustion technology, which is a low-NOx combustion technology. In this study, the OFA is drawn from the secondary air, so changing the OFA flow rates may affect the secondary air. NOx emission is reduced when the OFA flow rates increase.
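For comparison, a PSO baseline with the settings given in this section (population size 50, 100 iterations, c1 = c2 = 2, ω between 0.4 and 0.9) might look like the sketch below. The linear decrease of ω from 0.9 to 0.4, the velocity clamp, the 0 to 100 t/h bounds and the `surrogate_nox` objective are assumptions made for illustration; the text does not specify them.

```python
import random

# Hypothetical bounds for the four OFA flow rates in t/h (assumption).
LOWER = [0.0, 0.0, 0.0, 0.0]
UPPER = [100.0, 100.0, 100.0, 100.0]

def surrogate_nox(x):
    """Hypothetical stand-in for the GRU NOx model (assumption, not fitted)."""
    target = [80.0, 90.0, 45.0, 40.0]
    return 100.0 + 0.01 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))

def pso_minimize(fitness, lower, upper, n_particles=50, max_iter=100,
                 c1=2.0, c2=2.0, w_min=0.4, w_max=0.9, seed=0):
    rng = random.Random(seed)
    dim = len(lower)
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    pbest_score = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = list(pbest[g]), pbest_score[g]
    history = []
    for it in range(max_iter):
        # Assumed schedule: inertia weight falls linearly from w_max to w_min.
        w = w_max - (w_max - w_min) * it / max(1, max_iter - 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v_max = upper[d] - lower[d]          # assumed velocity clamp
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            score = fitness(pos[i])
            if score < pbest_score[i]:
                pbest[i], pbest_score[i] = list(pos[i]), score
                if score < gbest_score:
                    gbest, gbest_score = list(pos[i]), score
        history.append(gbest_score)
    return gbest, gbest_score, history

pso_x, pso_score, pso_history = pso_minimize(surrogate_nox, LOWER, UPPER)
print("best OFA setting:", [round(v, 3) for v in pso_x])
print("predicted NOx:", round(pso_score, 3))
```

Plotting `pso_history` against the best score per iteration of a GWO run on the same objective gives a convergence comparison of the kind shown in Figure 14.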

Conclusions

In this study, the GRU model has been successfully established to predict the NOx emissions of a 660 MW ultra-supercritical coal-fired power plant using historical operating data. GWO is employed in combustion control to find the best OFA distribution scheme under the specific inputs, and the NOx emission can finally be reduced by adjusting the corresponding parameters to the optimized values. The prediction performance of the model and the process of optimizing combustion are analyzed. The major conclusions are as follows:

1. Three kinds of recurrent neural networks (RNN, LSTM and GRU) have been used to model NOx emissions. The comparison of the prediction results of the three models shows that GRU obtains the best performance for predicting NOx emission. The GRU and LSTM models have higher prediction accuracy than RNN. Moreover, GRU has achieved high efficiency in time series data modeling.

2. Compared with BP and SVM, which are commonly used in NOx modeling, the recurrent neural networks have better prediction performance. This is mainly because the traditional models cannot retain previous information or learn from time series data, which limits their ability to predict long-term time series data such as NOx emissions.

3. On the basis of the NOx emission model, the GWO algorithm is used to optimize the operation parameters. By adjusting the corresponding parameters to their optimal values, NOx emission can be reduced. According to the simulation results, NOx reductions of 19.49% and 17.96% are achieved for the two selected cases.
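The point made in conclusion 2, that a recurrent unit carries information from earlier time steps while a static model does not, can be illustrated with a single scalar GRU cell. The sketch below uses one common form of the GRU update (Cho et al., 2014), in which the update gate z weights the previous state; the weights in `PARAMS` are arbitrary untrained values and the input sequences are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Arbitrary, untrained scalar weights (assumption, for illustration only).
PARAMS = {"wz": 0.5, "uz": 0.5, "bz": 0.0,
          "wr": 0.5, "ur": 0.5, "br": 0.0,
          "wh": 1.0, "uh": 1.0, "bh": 0.0}

def gru_step(x, h_prev, p=PARAMS):
    """One GRU update for scalar input x and scalar hidden state h_prev."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])   # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])   # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    return z * h_prev + (1.0 - z) * h_cand                  # new hidden state

def run_sequence(xs, h0=0.0):
    h = h0
    for x in xs:
        h = gru_step(x, h)
    return h

# Two hypothetical sequences that differ only in their first element.
h_a = run_sequence([1.0, 0.0, 0.2])
h_b = run_sequence([-1.0, 0.0, 0.2])
print("final state, sequence A:", round(h_a, 4))
print("final state, sequence B:", round(h_b, 4))
```

Both sequences end with the same input, yet the final hidden states differ because the state still reflects the first element. A static model that sees only the current input would produce identical outputs for the two cases.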

Acknowledgement

This work is supported by the Fundamental Research Funds for the Central Universities, China (No. 2018QN052).

Literature Cited

Ahmed, F., H. J. Cho, J. K. Kim, N. U. Seong and Y. K. Yeo; “A Real-Time Model Based on Least Squares Support Vector Machines and Output Bias Update for the Prediction of NOx Emission from Coal-Fired Power Plant,” Korean J. Chem. Eng., 32, 1029–1036 (2015)

BP; BP Statistical Review of World Energy, https://www.bp.com/en/global/corporate/energy-economics.html, (2018)

Cheng, Y., Y. Huang, B. Pang and W. Zhang; “ThermalNet: A Deep Reinforcement Learning-Based Combustion Optimization System for Coal-Fired Boiler,” Eng. Appl. Artif. Intell., 74, 303–311 (2018)

Cho, K., B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk and Y. Bengio; “Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation,” arXiv preprint arXiv:1406.1078 (2014)

Graves, A.; Supervised Sequence Labelling with Recurrent Neural Networks, Springer-Verlag, Berlin, Germany (2012)

Hochreiter, S. and J. Schmidhuber; “Long Short-Term Memory,” Neural Comput., 9, 1735–1780 (1997)

Ilamathi, P., V. Selladurai, K. Balamurugan and V. Sathyanathan; “ANN–GA Approach for Predictive Modeling and Optimization of NOx Emission in a Tangentially Fired Boiler,” Clean Technol. Environ. Policy, 15, 125–131 (2013)

Kolen, J. F. and S. C. Kremer; A Field Guide to Dynamical Recurrent Networks, Wiley-IEEE Press, Piscataway, U.S.A. (2001)

Krzywański, J. and W. Nowak; “Neurocomputing Approach for the Prediction of NOx Emissions from CFBC in Air-Fired and Oxygen-Enriched Atmospheres,” Journal of Power Technologies, 97, 75–84 (2017)

Table 5　Combustion optimization result

Case    Method    OFA1 [t/h]  OFA2 [t/h]  OFA3 [t/h]  OFA4 [t/h]  NOx [ppm]
Case 1  Original  53.133      64.977      56.344      65.583      163.387
        GWO       82.761      93.411      48.773      43.624      124.566
        PSO       74.398      68.365      40.03       50.972      129.432
Case 2  Original  29.473      79.652      95.398      69.014      156.331
        GWO       81.577      59.424      44.133      39.715      112.384
        PSO       74.398      68.365      40.03       50.972      119.387

Fig. 14 Comparison of various models in error performance metrics: (a) Optimization process for Case 1, (b) Optimization process for Case 2

LeCun, Y., Y. Bengio and G. Hinton; “Deep Learning,” Nature, 521, 436–444 (2015)

Li, G. and P. Niu; “Combustion Optimization of a Coal-Fired Boiler with Double Linear Fast Learning Network,” Soft Comput., 20, 149–156 (2016)

Li, G., P. Niu, W. Zhang and Y. Liu; “Model NOx Emissions by Least Squares Support Vector Machine with Tuning Based on Ameliorated Teaching–Learning-Based Optimization,” Chemom. Intell. Lab. Syst., 126, 11–20 (2013)

Liukkonen, M., M. Heikkinen, T. Hiltunen, E. Hälikkä, R. Kuivalainen and Y. Hiltunen; “Artificial Neural Networks for Analysis of Process States in Fluidized Bed Combustion,” Energy, 36, 339–347 (2011)

Liukkonen, M., E. Hälikkä, T. Hiltunen and Y. Hiltunen; “Dynamic Soft Sensors for NOx Emissions in a Circulating Fluidized Bed Boiler,” Appl. Energy, 97, 483–490 (2012)

Lv, Y., J. Liu, T. Yang and D. Zeng; “A Novel Least Squares Support Vector Machine Ensemble Model for NOx Emission Prediction of a Coal-Fired Boiler,” Energy, 55, 319–329 (2013)

Lv, Y., T. Yang and J. Liu; “An Adaptive Least Squares Support Vector Machine Model with a Novel Update for NOx Emission Prediction,” Chemom. Intell. Lab. Syst., 145, 103–113 (2015)

Mirjalili, S., S. M. Mirjalili and A. Lewis; “Grey Wolf Optimizer,” Adv. Eng. Softw., 69, 46–61 (2014)

Mocanu, E., P. H. Nguyen, M. Gibescu and W. L. Kling; “Deep Learning for Estimating Building Energy Consumption,” Sustainable Energy, Grids and Networks, 6, 91–99 (2016)

PRC; Technical Guideline for the Development of National Air Pollutant Emission Standards, http://www.mee.gov.cn/ywgz/fgbz/bz/bzwb/wrfzjszc/201812/t20181225_685872.shtml, (2019)

Safdarnejad, S. M., J. F. Tuttle and K. M. Powell; “Dynamic Modeling and Optimization of a Coal-Fired Utility Boiler to Forecast and Minimize NOx and CO Emissions Simultaneously,” Comput. Chem. Eng., 124, 62–79 (2019)

Shi, Y., W. Zhong, X. Chen, A. Yu and J. Li; “Combustion Optimization of Ultra Supercritical Boiler Based on Artificial Intelligence,” Energy, 170, 804–817 (2019)

Skalska, K., J. S. Miller and S. Ledakowicz; “Trends in NOx Abatement: A Review,” Sci. Total Environ., 408, 3976–3989 (2010)

Smrekar, J., P. Potočnik and A. Senegačnik; “Multi-Step-Ahead Prediction of NOx Emissions for a Coal-Based Boiler,” Appl. Energy, 106, 89–99 (2013)

Tan, P., J. Xia, C. Zhang, Q. Fang and G. Chen; “Modeling and Reduction of NOx Emissions for a 700 MW Coal-Fired Boiler with the Advanced Machine Learning Method,” Energy, 94, 672–679 (2016a)

Tan, P., C. Zhang, J. Xia, Q. Fang and G. Chen; “NOx Emission Model for Coal-Fired Boilers Using Principle Component Analysis and Support Vector Regression,” J. Chem. Eng. Japan, 49, 211–216 (2016b)

Tan, P., B. He, C. Zhang, D. Rao, S. Li, Q. Fang and G. Chen; “Dynamic Modeling of NOx Emission in a 660 MW Coal-Fired Boiler with Long Short-Term Memory,” Energy, 176, 429–436 (2019)

Tang, Z., Y. Wang, Y. He, X. Wu and S. Cao; “Modeling of Boiler–Turbine Unit with Two-Phase Feature Selection and Deep Belief Network,” J. Chem. Eng. Japan, 51, 865–873 (2018)

Tuttle, J. F., R. Vesel, S. Alagarsamy, L. D. Blackburn and K. Powell; “Sustainable NOx Emission Reduction at a Coal-Fired Power Station through the Use of Online Neural Network Modeling and Particle Swarm Optimization,” Control Eng. Pract., 93, 104167 (2019)

Wang, C., Y. Liu, S. Zheng and A. Jiang; “Optimizing Combustion of Coal Fired Boilers for Reducing NOx Emission Using Gaussian Process,” Energy, 153, 149–158 (2018a)

Wang, F., S. Ma, H. Wang, Y. Li and J. Zhang; “Prediction of NOx Emission for Coal-Fired Boilers Based on Deep Belief Network,” Control Eng. Pract., 80, 26–35 (2018b)

Wei, Z., X. Li, L. Xu and Y. Cheng; “Comparative Study of Computational Intelligence Approaches for NOx Reduction of Coal-Fired Boiler,” Energy, 55, 683–692 (2013)

Wen, L., K. Zhou and S. Yang; “Load Demand Forecasting of Residential Buildings Using a Deep Learning Model,” Electr. Power Syst. Res., 179, 106073 (2020)

Xie, P., M. Gao, H. Zhang, Y. Niu and X. Wang; “Dynamic Modeling for NOx Emission Sequence Prediction of SCR System Outlet Based on Sequence to Sequence Long Short-Term Memory Network,” Energy, 190, 116482 (2020)

Yang, G., Y. Wang and X. Li; “Prediction of the NOx Emissions from Thermal Power Plant Using Long-Short Term Memory Neural Network,” Energy, 192, 116597 (2020)

Zhang, X.-Y., F. Yin, Y.-M. Zhang, C.-L. Liu and Y. Bengio; “Drawing and Recognizing Chinese Characters with Recurrent Neural Network,” IEEE Trans. Pattern Anal. Mach. Intell., 40, 849–862 (2018)

Zhou, H., J. P. Zhao, L. G. Zheng, C. L. Wang and K. F. Cen; “Modeling NOx Emissions from Coal-Fired Utility Boilers Using Support Vector Regression with Ant Colony Optimization,” Eng. Appl. Artif. Intell., 25, 147–158 (2012)
