
Detection of Cyberattacks in a Water Distribution System Using Machine Learning Techniques

Patric Nader, Institut Charles Delaunay (CNRS), Université de Technologie de Troyes, France, [email protected]

Paul Honeine, LITIS Lab, Université de Rouen, France, [email protected]

Pierre Beauseroy, Institut Charles Delaunay (CNRS), Université de Technologie de Troyes, France, pierre.[email protected]

Abstract: Cyberattacks threatening industrial processes and critical infrastructures have become increasingly complex, sophisticated, and hard to detect. These cyberattacks may cause serious economic losses and may impact the health and safety of employees and citizens. Traditional Intrusion Detection Systems (IDS) cannot detect new types of cyberattacks that do not exist in their databases. Therefore, IDS need complementary help to provide maximum protection to industrial systems against cyberattacks. In this paper, we propose to use machine learning techniques, in particular one-class classification, to bring this complementary help to IDS in detecting cyberattacks and intrusions. One-class classification algorithms have been used in many data mining applications where the available samples in the training dataset belong to a single class. We propose a simple one-class classification approach based on a new novelty measure, namely the truncated Mahalanobis distance in the feature space. The tests are conducted on a real dataset from the primary water distribution system in France, and the proposed approach is compared with other well-known one-class approaches.

Index Terms: Cyberattack detection, kernel methods, Mahalanobis distance, one-class classification.

I. INTRODUCTION

The security of industrial processes and critical infrastructures has gained a lot of attention in the past few years with the growth of cyberattacks threatening these infrastructures [1], [2]. Industrial processes are controlled via supervisory systems, which enable the operators to perform centralized monitoring of field sites over long-distance communication networks. The interconnection of industrial systems with other corporate networks has exposed critical infrastructures to new sources of cyberthreats, and has opened new ways for carrying out cyberattacks against these facilities. These cyberattacks could have severe consequences on the corresponding physical processes [3]. Several intentional cyberattacks have been carried out against industrial processes in the past few years. In 2000, the Maroochy water services attack resulted in the release of one million liters of untreated sewage into local parks and rivers [4]. In 2003, the Slammer worm penetrated Ohio's nuclear power plant and disabled the safety monitoring system for nearly five hours [5]. In 2010, the complex malware Stuxnet was discovered in Iran, targeting the PLCs connected to nuclear centrifuges used for enriching uranium [6]. In 2012, the malware Gauss was discovered in the Middle East, with the largest number of infections in Lebanon. This malware was designed to intercept session data and steal credentials from several Lebanese banks, e.g., Bank of Beirut, Byblos Bank, and Fransabank [7].

ISBN: 978-1-4673-7504-7 ©2016 IEEE

Although traditional Intrusion Detection Systems frequently update their databases of known attacks, new complex cyberattacks are generated every day to circumvent security systems and to make their detection nearly impossible [8]. This is why researchers have been developing and deploying various IDS to restrict the impact of cyberattacks on infrastructures and to limit economic losses and loss of human life. Some examples of IDS-based techniques are the collaborative intrusion detection approach using a centralized server proposed in [9], the detection of intrusions by monitoring the state evolution of the studied system presented in [10], and the approach based on critical state analysis for detecting a particular type of cyberattack against a given industrial installation described in [11]. Unfortunately, prior knowledge of the physical process and of its different critical states is mandatory to build the detection rules of these techniques. Statistical methods have also been investigated for cyberattack detection, such as the Bayesian network implemented in [12], the moving average and the Kalman filter used for intrusion detection in [13], [14], and the probability density estimation adopted for anomaly detection in [15]. The main drawback of these approaches is that they operate only on a predefined model-based system and require prior knowledge of the different types of attacks for accurate detection. These difficulties restrict the use of parametric model-based approaches and highlight the potential role of machine learning techniques in detecting cyberattacks, thus providing the needed complementary help to traditional IDS.

Machine learning techniques have been widely used in data mining to discover hidden regularities and patterns in data [16]. In particular, one-class classification algorithms have gained a lot of interest in applications where the only available data belong to a single class, as in industrial systems [17], [18]. One-class classifiers learn the normal behavior modes of the studied system, and develop decision functions to accept normal samples and reject outliers [19]. Schölkopf et al. proposed in [20] the one-class Support Vector Machines (one-class SVM), in which the mapped data are separated from the origin with maximum margin using a hyperplane. Tax et al. introduced in [21] the Support Vector Data Description (SVDD), which estimates the hypersphere with minimum radius enclosing most of the training data. Both approaches are computationally expensive, since they require solving constrained quadratic programming problems. The slab Support Vector Machine (slab SVM), described in [22], [23], aims at finding two parallel hyperplanes enclosing the samples instead of the single hyperplane of the standard SVM. The constrained quadratic programming problem of the SVM remains unchanged in this approach. The "Robust SVM" algorithm, described in [24], is less sensitive to outliers than the standard SVM, yet it still requires solving a constrained quadratic programming problem. A fast one-class approach was introduced in [25] to overcome the drawbacks of existing algorithms, but the use of the Euclidean distance in the decision function of the classifier leads to high sensitivity to outliers.

In this paper, we propose a fast one-class classification approach in order to overcome the drawbacks of the aforementioned algorithms. The one-class classifier of the proposed approach is defined by the hypersphere enclosing the training samples in the feature space, and we estimate the center of this hypersphere without solving any quadratic programming problem, following the work of [25]. In contrast to that work, where the Euclidean distance was used, we propose a new novelty measure based on the truncated Mahalanobis distance in the feature space. The Mahalanobis distance is a multivariate dissimilarity that takes into account the scatter of the data in that space [26]. The truncated Mahalanobis distance proposed in this paper uses only the most relevant axes in the feature space. The remainder of this paper is organized as follows. Section II introduces kernel methods, Section III describes the proposed one-class approach, Section IV discusses the results on the real dataset, and Section V provides the conclusion and future work.

II. KERNEL METHODS WITH MACHINE LEARNING

Machine learning techniques study the relations within the samples of a training dataset, and build decision functions that generalize to new "unseen" samples, i.e., samples that do not exist in the training dataset [27]. Kernel methods rely on mapping the samples from the input space into a reproducing kernel Hilbert space (RKHS), where linear algorithms are applied to the mapped samples in order to estimate the hidden relations existing within the samples in the input space [28]. Therefore, machine learning techniques with kernel methods provide a powerful way of detecting hidden relations using linear algorithms in the feature space [16]. In practice, only the pairwise inner product between the mapped samples is needed [29]. This inner product is computed directly from the input data using a kernel function, without any explicit knowledge of the mapping function.


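Consider a training dataset $x_i$, for $i = 1, 2, \ldots, n$, in a $d$-dimensional input space $\mathcal{X}$. Let $\phi(x)$ be the mapping function from the input space $\mathcal{X}$ into a higher-dimensional RKHS $\mathcal{H}$ of some given kernel function $\kappa$. Let $K$ be the $n \times n$ kernel matrix with entries

$$\kappa_{ij} = \kappa(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle_{\mathcal{H}}.$$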

The kernel matrix plays an important role in learning algorithms, since it gathers all the information needed on the pairwise inner products between the mapped samples. The main advantage of using such kernel functions is that classification algorithms can be constructed in inner product spaces without computing the coordinates of the data in that space, and thus without any explicit knowledge of the mapping function $\phi$. This key idea is known as the kernel trick. Next, we detail the proposed one-class classification algorithm.
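As a concrete illustration of the kernel trick, the following minimal sketch builds a kernel matrix directly from input samples, without ever forming $\phi$. It assumes the Gaussian kernel that the paper adopts in Section IV; the function name `gaussian_kernel_matrix`, the toy data, and the value `sigma=1.0` are hypothetical choices made only for this example.

```python
import numpy as np

def gaussian_kernel_matrix(X, Y, sigma):
    """Return the matrix of kappa(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # toy data: n = 5 samples, d = 3 (illustrative only)
K = gaussian_kernel_matrix(X, X, sigma=1.0)
```

Every quantity used later (centering, eigenvectors, distances) can be written in terms of such kernel evaluations alone.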

III. THE PROPOSED ONE-CLASS APPROACH

In the proposed approach, the one-class classifier is defined by the hypersphere enclosing most of the training samples in the feature space. We set the empirical center of the data, namely $c_n$, as the center of this hypersphere, where $c_n$ takes the following form:

$$c_n = \frac{1}{n} \sum_{i=1}^{n} \phi(x_i).$$

We estimate $c_n$ without solving any quadratic programming problem, following the work of [25]. The radius of the hypersphere is the threshold used to classify new samples as outliers or normal ones. The novelty measure proposed in this paper is the truncated Mahalanobis distance, which is used in the decision function of the classifier to compute the radius of the hypersphere, instead of the Euclidean distance, which is very sensitive to the presence of outliers in the training dataset. The main advantage of the Mahalanobis distance is that it takes into account the covariance in each feature direction and the different scaling of the coordinate axes [26].
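For reference, the Euclidean baseline mentioned above can be evaluated with kernel values only, since expanding the squared norm gives $\|\phi(x) - c_n\|^2 = \kappa(x, x) - \frac{2}{n}\sum_i \kappa(x_i, x) + \frac{1}{n^2}\sum_{i,j} \kappa(x_i, x_j)$. The sketch below implements this expansion; it illustrates the Euclidean distance to the empirical center, not the measure proposed in this paper. The function name `sq_dist_to_center` is hypothetical, and the linear kernel is used only because it allows a direct check against the explicit computation.

```python
import numpy as np

def sq_dist_to_center(k_xx, k_x, K):
    """Squared Euclidean distance in feature space between phi(x) and c_n.

    k_xx : kappa(x, x)
    k_x  : vector with entries kappa(x_i, x), i = 1..n
    K    : n x n kernel matrix of the training samples
    """
    return k_xx - 2.0 * k_x.mean() + K.mean()

# Sanity check with a linear kernel, for which phi is the identity map.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
x = rng.normal(size=4)
d2 = sq_dist_to_center(x @ x, X @ x, X @ X.T)
explicit = np.sum((x - X.mean(axis=0)) ** 2)
```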

The center of the hypersphere $c_n$ depends on all the training samples in the feature space. In order to provide a sparse approach, we propose to approximate $c_n$ with a sparse center $c_{\mathcal{A}}$ using some support vectors. By analogy to the SVDD approach, the support vectors are the samples furthest from the center, lying on and outside the hypersphere. The sparse center depends on these support vectors, and only these samples are taken into account in computing the truncated Mahalanobis distance in the feature space. Once the sparse center $c_{\mathcal{A}}$ is estimated, the radius/threshold becomes the truncated Mahalanobis distance in the feature space between $c_{\mathcal{A}}$ and any sample lying on the hypersphere, as illustrated in Fig. 1. Samples that lie outside this hypersphere are considered outliers. In the following, we detail how to compute the truncated Mahalanobis distance in the feature space and how to estimate the sparse center $c_{\mathcal{A}}$.

Fig. 1. An illustration of the proposed one-class classification approach, where the hypersphere classifier encloses most of the training samples in the feature space. The center of this hypersphere is the sparse center $c_{\mathcal{A}}$, which approximates the empirical center $c_n$ of the mapped data. (Labels in the figure: support vectors, a mapped sample, and $c_n$.)

A. Computing the Mahalanobis distance in the feature space

In order to identify the samples furthest from the center of the hypersphere, the first step in the proposed approach is to compute the Mahalanobis distance in the feature space between all the training samples and the empirical center $c_n$. The Mahalanobis distance between a sample $\phi(x)$ and $c_n$ is defined as follows:

$$\|\phi(x) - c_n\|_{\Sigma}^{2} = (\phi(x) - c_n)^{\top} \Sigma^{-1} (\phi(x) - c_n), \qquad (1)$$

where $\Sigma$ is the covariance matrix of the samples in the feature space, namely

$$\Sigma = \frac{1}{n} \sum_{i=1}^{n} (\phi(x_i) - c_n)(\phi(x_i) - c_n)^{\top}.$$

Since such learning algorithms are constructed in inner product spaces without any knowledge of the mapping function $\phi$, the covariance matrix $\Sigma$ cannot be evaluated explicitly in terms of the mapped samples $\phi(x)$. In order to overcome this problem, we use the singular value decomposition of the covariance matrix, namely

$$\Sigma = V^{\top} D V,$$

where $V$ represents the matrix of eigenvectors $v^k$ of $\Sigma$, and $D$ the diagonal matrix with the corresponding eigenvalues $\lambda_k$, for $k = 1, 2, \ldots, n$. Each pair $(v^k, \lambda_k)$ satisfies

$$\lambda_k v^k = \Sigma v^k.$$

It is easy to see from the definition of $\Sigma$ that each eigenvector is a linear combination of the training samples $\phi(x_i)$ in the feature space, centered at $c_n$, namely:

$$v^k = \sum_{i=1}^{n} \alpha_i^k (\phi(x_i) - c_n).$$

By incorporating this expression of $v^k$ in the eigendecomposition of $\Sigma$, namely $\lambda_k v^k = \Sigma v^k$, the coefficients $\alpha_i^k$ are given by solving the eigendecomposition problem

$$n \lambda_k \alpha^k = \tilde{K} \alpha^k,$$

where the matrix $\tilde{K}$, with entries $\tilde{\kappa}(x_i, x_j)$ (see footnote 1), is the centered version of $K$.
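The centering of the kernel matrix and the eigenproblem $n \lambda_k \alpha^k = \tilde{K} \alpha^k$ can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, `numpy.linalg.eigh` returns unit-norm coefficient vectors (no particular normalization from the paper is applied here), and the linear kernel is used only so that the result can be compared with the eigenvalues of an explicit covariance matrix.

```python
import numpy as np

def center_kernel_matrix(K):
    """Centered kernel: K - 1K - K1 + 1K1, with 1 the n x n matrix of entries 1/n."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

def covariance_eigenpairs(K):
    """Solve n * lambda_k * alpha^k = K_tilde alpha^k, sorted by decreasing lambda_k."""
    n = K.shape[0]
    mu, alphas = np.linalg.eigh(center_kernel_matrix(K))  # mu_k = n * lambda_k
    order = np.argsort(mu)[::-1]
    return mu[order] / n, alphas[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
lambdas, alphas = covariance_eigenpairs(X @ X.T)   # linear kernel, for checking only
cov_eigs = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False, bias=True)))[::-1]
```

With the linear kernel, the three leading values in `lambdas` coincide with the eigenvalues of the ordinary covariance matrix of `X`, which is the property the kernelized eigenproblem relies on.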

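Since $V$ is an orthogonal matrix, the inverse of the covariance matrix can be expressed as follows:

$$\Sigma^{-1} = V^{\top} D^{-1} V.$$

Next, equation (1) takes the form $\|\phi(x) - c_n\|_{\Sigma}^{2} = a^{\top} D^{-1} a$, having:

$$a = V (\phi(x) - c_n),$$

where each entry $a_k$ of $a$ is associated to an eigenvector $v^k$, with:

$$a_k = (v^k)^{\top} (\phi(x) - c_n) = \sum_{i=1}^{n} \alpha_i^k \tilde{\kappa}(x_i, x).$$

Finally, the Mahalanobis distance in equation (1) is computed in the feature space as follows:

$$\|\phi(x) - c_n\|_{\Sigma}^{2} = \sum_{k=1}^{n} \lambda_k^{-1} \Big( \sum_{i=1}^{n} \alpha_i^k \tilde{\kappa}(x_i, x) \Big)^{2}. \qquad (3)$$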
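The kernelized Mahalanobis distance can be checked numerically with a short sketch. It assumes that the eigenvectors $v^k$ are scaled to unit norm, i.e. $\|\alpha^k\|^2 = 1/(n\lambda_k)$, which is an assumption of this example (the normalization adopted later in the paper is not applied here), and it discards numerically null directions through a hypothetical tolerance `tol`. The function name `mahalanobis_sq` is also hypothetical.

```python
import numpy as np

def mahalanobis_sq(K, k_x, tol=1e-10):
    """Squared Mahalanobis distance between phi(x) and c_n from kernel values only.

    K   : n x n kernel matrix of the training samples
    k_x : vector with entries kappa(x_i, x) for the tested sample x
    """
    n = K.shape[0]
    # Centered kernel matrix and centered kernel vector kappa_tilde(x_i, x).
    K_c = K - K.mean(axis=0, keepdims=True) - K.mean(axis=1, keepdims=True) + K.mean()
    k_c = k_x - k_x.mean() - K.mean(axis=1) + K.mean()
    mu, A = np.linalg.eigh(K_c)          # K_c a = mu a, with mu = n * lambda
    keep = mu > tol * mu.max()           # drop numerically null directions
    mu, A = mu[keep], A[:, keep]
    lambdas = mu / n
    alphas = A / np.sqrt(mu)             # makes each v^k a unit vector (assumption)
    proj = alphas.T @ k_c                # <phi(x) - c_n, v^k> for each k
    return np.sum(proj**2 / lambdas)

# Check against the explicit Mahalanobis distance, using a linear kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
x = rng.normal(size=3)
d2 = mahalanobis_sq(X @ X.T, X @ x)
diff = x - X.mean(axis=0)
explicit = diff @ np.linalg.solve(np.cov(X, rowvar=False, bias=True), diff)
```

For a nonlinear kernel no explicit counterpart exists, which is why the distance has to be evaluated through the coefficients $\alpha_i^k$ and the centered kernel.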

B. Truncated Mahalanobis distance

As detailed in the previous section, the Mahalanobis distance involves all the eigenvectors of the covariance matrix. We propose to "truncate" it by selecting a subset of eigenvectors, which corresponds to projecting the samples onto the subspace spanned by these eigenvectors only. Instead of using all the eigenvectors $v^k$ of the covariance matrix for the projection, we follow the Kernel Principal Component Analysis approach [30], where only the eigenvectors associated with the largest eigenvalues are taken into consideration; the remaining ones are considered to be associated with noise. The Mahalanobis distance is thus approximated by the truncated Mahalanobis distance in the feature space.
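The truncation amounts to keeping a partial sum over the leading eigen-directions. The sketch below makes this explicit; the number of retained eigenvectors `m` is a hypothetical tuning parameter (the text does not state how it is chosen), and the toy values of `lambdas` and `proj` are made up purely to show the mechanics.

```python
import numpy as np

def truncated_mahalanobis_sq(proj, lambdas, m):
    """Keep only the m directions with the largest eigenvalues.

    proj    : projections <phi(x) - c_n, v^k>, one per eigenvector
    lambdas : matching eigenvalues lambda_k of the covariance matrix
    m       : number of retained eigenvectors (hypothetical tuning parameter)
    """
    order = np.argsort(lambdas)[::-1][:m]   # indices of the m largest eigenvalues
    return np.sum(proj[order] ** 2 / lambdas[order])

lambdas = np.array([0.1, 4.0, 1.0, 0.01])   # illustrative values only
proj = np.array([0.2, 2.0, 1.0, 0.05])
full = truncated_mahalanobis_sq(proj, lambdas, m=4)
top2 = truncated_mahalanobis_sq(proj, lambdas, m=2)
```

Because every term of the sum is non-negative, the truncated distance never exceeds the full one; the directions that are dropped are those with small eigenvalues, whose $1/\lambda_k$ weights would otherwise amplify noise.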

We also adopt the kernel whitening normalization of the eigenvectors proposed in [31], where the variance of the mapped data is constant in all directions. This normalization rescales the training data to have unit variance in each feature direction, and it is achieved as follows:

$$(n \lambda_k)^2 \|\alpha^k\|^2 = 1 \implies \|\alpha^k\| = \frac{1}{n \lambda_k}, \quad \text{for all } k = 1, 2, \ldots, n.$$

Footnote 1: The kernel function $\tilde{\kappa}(x_i, x_j) = \tilde{\kappa}_{ij}$ is the centered version of $\kappa_{ij} = \kappa(x_i, x_j)$, and it is computed as follows:

$$\tilde{\kappa}_{ij} = \kappa_{ij} - \frac{1}{n} \sum_{r=1}^{n} \kappa_{ir} - \frac{1}{n} \sum_{r=1}^{n} \kappa_{rj} + \frac{1}{n^2} \sum_{r,s=1}^{n} \kappa_{rs}.$$
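The whitening normalization of the coefficient vectors can be illustrated with the sketch below, which rescales the unit-norm eigenvectors of the centered kernel matrix so that $n \lambda_k \|\alpha^k\| = 1$ and then projects the training samples. The function name `whitened_coefficients`, the choice `m=3`, and the centered linear kernel are assumptions made for the example only.

```python
import numpy as np

def whitened_coefficients(K_c, m):
    """Return the m leading coefficient vectors scaled so that (n*lambda_k)*||alpha^k|| = 1."""
    mu, A = np.linalg.eigh(K_c)           # mu_k = n * lambda_k, unit-norm columns in A
    order = np.argsort(mu)[::-1][:m]      # m largest eigenvalues
    return A[:, order] / mu[order], mu[order] / K_c.shape[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
Xc = X - X.mean(axis=0)
K_c = Xc @ Xc.T                           # centered linear kernel, for illustration
alphas, lambdas = whitened_coefficients(K_c, m=3)
train_proj = K_c @ alphas                 # projections of the training samples
n = K_c.shape[0]
```

With this scaling, the projected training samples have the same variance along every retained direction (here $1/n$ with the population variance convention), so no direction dominates the distance merely because of its scale.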

C. The sparse center

After computing the Mahalanobis distance in the RKHS between all the training samples and the center $c_n$, the sparse center $c_{\mathcal{A}}$ is made to depend only on the samples furthest from $c_n$, by analogy to the SVDD approach. The sparse center is a linear combination of these samples, known as the support vectors, and its expression is given as follows:

$$c_{\mathcal{A}} = \sum_{j \in \mathcal{A}} \beta_j \phi(x_j),$$

where $\mathcal{A} \subset \{1, 2, \ldots, n\}$, and $|\mathcal{A}|$ denotes the number of support vectors among the training dataset.

Approximating the center $c_n$ with the sparse center $c_{\mathcal{A}}$ is achieved by minimizing the Mahalanobis distance between $c_n$ and $c_{\mathcal{A}}$ as follows:

$$\arg\min_{\beta_j} \Big\| \frac{1}{n} \sum_{i=1}^{n} \phi(x_i) - \sum_{j \in \mathcal{A}} \beta_j \phi(x_j) \Big\|_{\Sigma}^{2}.$$

The partial derivative of this cost function with respect to each $\beta_j$ is computed and set to zero for each feature direction. This boils down to the following linear system:

$$\sum_{j \in \mathcal{A}} \beta_j \kappa(x_j, x_k) = \frac{1}{n} \sum_{l=1}^{n} \kappa(x_l, x_k), \quad \text{for } k \in \mathcal{A}.$$

The coefficients $\beta_j$ are computed using the matrix notation:

$$\beta = K_{\mathcal{A}}^{-1} \boldsymbol{k}, \qquad (4)$$

where the entries of the kernel matrix $K_{\mathcal{A}}$ are $\kappa(x_j, x_k)$ for $j, k \in \mathcal{A}$, and $\boldsymbol{k}$ is the column vector with entries $\frac{1}{n} \sum_{l=1}^{n} \kappa(x_l, x_k)$, for $k \in \mathcal{A}$. In order to avoid a non-invertible (singular) matrix $K_{\mathcal{A}}$, one can include a regularization parameter $\nu$, namely $\beta = (K_{\mathcal{A}} + \nu I)^{-1} \boldsymbol{k}$.

In order to test new samples and classify them as outliers or normal ones, we fix a radius/threshold $R$ based on a predefined number of outliers $n_{\text{out}}$. The decision function for any new sample $x$ is to evaluate the squared Mahalanobis distance between $\phi(x)$ and the sparse center $c_{\mathcal{A}}$ as follows:

$$\|\phi(x) - c_{\mathcal{A}}\|_{\Sigma}^{2} = \sum_{k=1}^{m} \lambda_k^{-1} \Big( \sum_{i=1}^{n} \alpha_i^k \kappa(x_i, x) - \sum_{i=1}^{n} \sum_{j \in \mathcal{A}} \alpha_i^k \beta_j \kappa(x_i, x_j) - \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i^k \kappa(x_j, x) + \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{l \in \mathcal{A}} \alpha_i^k \beta_l \kappa(x_j, x_l) \Big)^{2},$$

where $m$ denotes the number of retained eigenvectors. If this squared distance is greater than the squared radius $R^2$, the sample is considered an outlier; otherwise, it is considered a normal sample. The above Mahalanobis distance is also truncated by projecting the training samples onto the subspace spanned by the eigenvectors associated with the largest eigenvalues of the covariance matrix.

IV. EXPERIMENTAL RESULTS

The proposed one-class classification algorithm is tested on a real dataset from a drinking water distribution plant of Suez, which represents the primary drinking water distribution system in France. The real dataset corresponds to the final step of the drinking water distribution process, before water gets to consumers. The control system of this part of the plant contains primary and secondary storage tanks, six pumps to move the drinking water from the primary to the secondary tank, a sensor that provides the input water flow rate into the primary storage tank, a sensor that shows the water level in the primary tank, a sensor that provides the output water flow rate from the primary storage tank into the secondary storage tank, and a sensor that displays the pressure of the output water flow.

Each input sample of this real dataset has 10 attributes. These attributes represent the states of the six pumps (0 or 1), the input flow rate into the primary storage tank (in m³/h), the water level in the primary storage tank (in m), the output flow rate into the secondary storage tank (in m³/h), and the pressure of the output flow (in bar). In order to test the performance of the IDS existing in the water plant, 4 types of cyberattacks were simulated and injected into the normal network traffic. The first attack changes the states of the pumps in the received packets, the second one increases the output water flow rate and its pressure, the third attack modifies the input water flow rate, and the fourth one randomly changes some of the 10 attributes of each new sample.

The Gaussian kernel, which is the most common kernel for one-class classification problems, is used in this paper. The expression of this kernel is given by

$$\kappa(x_i, x_j) = \exp\Big( -\frac{\|x_i - x_j\|_2^2}{2\sigma^2} \Big),$$

where $x_i$ and $x_j$ are two input samples, and $\|\cdot\|_2$ represents the $\ell_2$-norm in the input space. The bandwidth parameter $\sigma$ is computed as proposed in [18], namely

$$\sigma = \frac{d_{\max}}{\sqrt{2M}},$$

where $d_{\max}$ refers to the maximal distance between any two samples in the input space, and $M$ represents the upper bound on the number of outliers among the training dataset. We compare the results of the proposed one-class classification approach with three other approaches, namely SVDD, slab SVM and robust SVM. The fraction of support vectors in these approaches is fixed at 10% of the training samples. The sparse center in the proposed approach also depends on 10% of the training samples.
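To make the overall procedure of Section III concrete, the following sketch chains its steps with the Gaussian kernel: truncated eigendecomposition, selection of the furthest 10% of samples as support vectors, regularized sparse-center coefficients, and a threshold set from a predefined number of outliers. It is a sketch under stated assumptions, not the authors' implementation. The names `fit`, `score`, `is_outlier` and the defaults for `m`, `nu`, `n_out` and `sigma` are hypothetical; the eigenvectors are scaled to unit norm; the vector $\boldsymbol{k}$ is taken to have entries $\frac{1}{n}\sum_l \kappa(x_l, x_k)$ for $k \in \mathcal{A}$; and the data are synthetic.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def score(model, X_new):
    """Truncated squared Mahalanobis distance between phi(x) and the sparse center."""
    X, K, sv, beta = model["X"], model["K"], model["sv"], model["beta"]
    k_new = gaussian_kernel(X, X_new, model["sigma"])   # kappa(x_i, x), shape (n, n_new)
    g = K[:, sv] @ beta                                 # sum_{j in A} beta_j kappa(x_i, x_j)
    inner = k_new - g[:, None] - k_new.mean(0, keepdims=True) + g.mean()
    proj = model["alphas"].T @ inner                    # one row per retained eigenvector
    return np.sum(proj**2 / model["lambdas"][:, None], axis=0)

def fit(X, sigma, m, sv_fraction=0.10, n_out=5, nu=1e-6):
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    K_c = K - K.mean(0, keepdims=True) - K.mean(1, keepdims=True) + K.mean()
    mu, A = np.linalg.eigh(K_c)
    top = np.argsort(mu)[::-1][:m]                # m largest eigenvalues (truncation)
    lambdas = mu[top] / n
    alphas = A[:, top] / np.sqrt(mu[top])         # unit-norm eigenvectors (assumption)
    # Truncated distance of every training sample to the empirical center c_n.
    d_train = np.sum((K_c @ alphas) ** 2 / lambdas, axis=1)
    # Support vectors: the samples furthest from c_n.
    n_sv = max(1, int(round(sv_fraction * n)))
    sv = np.argsort(d_train)[::-1][:n_sv]
    # Regularized sparse-center coefficients: beta = (K_A + nu I)^{-1} k.
    k_vec = K[:, sv].mean(axis=0)                 # (1/n) sum_l kappa(x_l, x_k), k in A
    beta = np.linalg.solve(K[np.ix_(sv, sv)] + nu * np.eye(n_sv), k_vec)
    model = dict(X=X, K=K, sigma=sigma, alphas=alphas, lambdas=lambdas, sv=sv, beta=beta)
    # Threshold: exactly n_out training samples fall outside the hypersphere.
    model["R2"] = np.sort(score(model, X))[-(n_out + 1)]
    return model

def is_outlier(model, X_new):
    return score(model, X_new) > model["R2"]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))               # synthetic stand-in for normal samples
model = fit(X_train, sigma=1.0, m=5, n_out=5)
flags = is_outlier(model, X_train)
```

Training involves one eigendecomposition and one small linear solve of size $|\mathcal{A}|$, with no quadratic program, which is consistent with the low training cost that motivates the approach.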

The one-class classification algorithms are tested on nearly 10,000 samples related to the aforementioned attacks and to the normal functioning mode of the drinking water distribution plant. The detection rates of the one-class approaches are given in Table I. The results show that these approaches have good detection rates, and the best results are achieved with the proposed one-class algorithm, which matches or outperforms the other approaches for every type of attack (it ties with Slab SVM on Attack 3).

TABLE I
DETECTION RATES (IN %) ON THE DRINKING WATER DISTRIBUTION DATASET.

| | SVDD | Robust SVM | Slab SVM | Proposed approach (this paper) |
|---|---|---|---|---|
| Attack 1 | 88.8 | 81.5 | 92.6 | 100 |
| Attack 2 | 84.6 | 80.7 | 73.1 | 88.8 |
| Attack 3 | 86.9 | 82.6 | 91.3 | 91.3 |
| Attack 4 | 76.4 | 58.8 | 70.5 | 82.3 |

We also investigate the time consumption of these one-class algorithms, and Table II shows the estimated training time for each approach. The proposed one-class approach is the fastest algorithm with only 1.2 seconds; it is roughly 17 to 19 times faster than Robust SVM and SVDD, and about 100 times faster than Slab SVM. Finally, the estimated time to test a new sample is given in Table III. The proposed approach takes only 0.004 seconds for each new sample; it is twice as fast as Robust SVM, about 6 times faster than SVDD, and about 8 times faster than Slab SVM. These results matter for applying the algorithm in real-world scenarios: the proposed approach can process about 250 samples per second, compared to only 125 samples for Robust SVM, 43 samples for SVDD, and 32 samples for Slab SVM.

TABLE II
ESTIMATED TRAINING TIME (IN SECONDS) OF EACH APPROACH.

| SVDD | Robust SVM | Slab SVM | Proposed approach (this paper) |
|---|---|---|---|
| 22.4 | 20.6 | 119.6 | 1.2 |

TABLE III
ESTIMATED TIME (IN SECONDS) TO TEST A NEW SAMPLE.

| SVDD | Robust SVM | Slab SVM | Proposed approach (this paper) |
|---|---|---|---|
| 0.023 | 0.008 | 0.031 | 0.004 |
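The speed-up factors and throughputs quoted in the text follow directly from Tables II and III; the short script below redoes that arithmetic. Only the timing values come from the tables, while the variable names are chosen for this example.

```python
# Timing values copied from Tables II and III (in seconds).
train_s = {"SVDD": 22.4, "Robust SVM": 20.6, "Slab SVM": 119.6, "Proposed approach": 1.2}
test_s = {"SVDD": 0.023, "Robust SVM": 0.008, "Slab SVM": 0.031, "Proposed approach": 0.004}

ref = "Proposed approach"
train_speedup = {name: t / train_s[ref] for name, t in train_s.items()}
test_speedup = {name: t / test_s[ref] for name, t in test_s.items()}
throughput = {name: 1.0 / t for name, t in test_s.items()}   # samples per second

for name in train_s:
    print(f"{name:18s} train x{train_speedup[name]:6.1f}  "
          f"test x{test_speedup[name]:5.2f}  {throughput[name]:6.1f} samples/s")
```

The throughput is simply the reciprocal of the per-sample test time, so it inherits the rounding of the values reported in Table III.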

V. CONCLUSION

In this paper, we reviewed the security of industrial processes and critical infrastructures. We showed that the massive use of Information and Communication Technologies has opened new ways for carrying out cyberattacks against these infrastructures, and that the complexity of the cyberattacks has made the task extremely difficult for traditional intrusion detection systems. We investigated machine learning techniques, in particular one-class classification, in order to provide the necessary help to IDS in detecting cyberattacks. We proposed a fast one-class classification approach, in which the classifier is defined by the hypersphere enclosing the training samples in the feature space. We used only a small fraction of the training dataset to estimate the sparse center of this hypersphere, and we proposed a new novelty measure based on the Mahalanobis distance in the RKHS. We tested this approach on a real dataset and compared its results with well-known one-class approaches. The proposed approach achieved the highest detection rates, and it was the fastest algorithm in terms of both the estimated training time and the time to test new samples.

For future work, an extension of this approach to an online algorithm should be considered for real-world applications, since the proposed approach showed a high processing performance (about 250 samples per second). Furthermore, this approach could play an important role in helping IDS detect malicious cyberattacks against industrial processes. Therefore, we should investigate the best ways to integrate this type of approach into traditional IDS in order to provide better protection for critical infrastructures against cyberattacks.

ACKNOWLEDGMENT

The authors would like to thank Suez for providing the water distribution system real dataset.

REFERENCES

[1] S. A. Boyer, SCADA: Supervisory Control And Data Acquisition, 4th ed. USA: International Society of Automation, 2009.

[2] D.-J. Kang, J.-J. Lee, S.-J. Kim, and J.-H. Park, "Analysis on cyber threats to SCADA systems," in Transmission Distribution Conference Exposition: Asia and Pacific, Oct 2009, pp. 1-4.

[3] K. Stouffer, J. Falco, and K. Scarfone, "SP 800-82: Guide to industrial control systems (ICS) security: Supervisory control and data acquisition (SCADA) systems, distributed control systems (DCS), and other control system configurations such as programmable logic controllers (PLC)," National Institute of Standards & Technology, Gaithersburg, MD, United States, Tech. Rep., 2011.

[4] J. Slay and M. Miller, "Lessons learned from the Maroochy water breach," in Critical Infrastructure Protection, 2007, pp. 73-82.

[5] H. Christiansson and E. Luiijf, "Creating a European SCADA security testbed," in Critical Infrastructure Protection, ser. IFIP International Federation for Information Processing, E. Goetz and S. Shenoi, Eds. Springer US, 2007, vol. 253, pp. 237-247.

[6] R. Langner, "Stuxnet: Dissecting a cyberwarfare weapon," Security Privacy, IEEE, vol. 9, no. 3, pp. 49-51, 2011.

[7] K. L. Expert, "Gauss: Abnormal distribution," Kaspersky, Tech. Rep., August 2012.

[8] P. W. Oman and M. Phillips, "Intrusion detection and event monitoring in SCADA networks," in Critical Infrastructure Protection, 2007, pp. 161-173.

[9] P. Gross, J. Parekh, and G. Kaiser, "Secure selecticast for collaborative intrusion detection systems," in 3rd International Workshop on Distributed Event-Based Systems (DEBS'04), Edinburgh, Scotland, UK, 2004, pp. 50-55.

[10] I. Fovino, A. Carcano, T. De Lacheze Murel, A. Trombetta, and M. Masera, "Modbus/DNP3 state-based intrusion detection system," in Advanced Information Networking and Applications (AINA), 24th IEEE International Conference on, April 2010, pp. 729-736.

[11] A. Carcano, A. Coletta, M. Guglielmi, M. Masera, I. Fovino, and A. Trombetta, "A multidimensional critical state analysis for detecting intrusions in SCADA systems," Industrial Informatics, IEEE Transactions on, vol. 7, no. 2, pp. 179-186, May 2011.

[12] J. Bigham, D. Gamez, and N. Lu, "Safeguarding SCADA systems with anomaly detection," in MMM-ACNS, ser. Lecture Notes in Computer Science, V. Gorodetsky, L. J. Popyack, and V. A. Skormin, Eds., vol. 2776. Springer, 2003, pp. 171-182.

[13] N. Ye, Q. Chen, and C. Borror, "EWMA forecast of normal system activity for computer intrusion detection," Reliability, IEEE Transactions on, vol. 53, no. 4, pp. 557-566, Dec 2004.

[14] F. Knorn and D. Leith, "Adaptive Kalman filtering for anomaly detection in software appliances," in INFOCOM Workshops, IEEE, April 2008, pp. 1-6.

[15] T. Veracini, S. Matteoli, M. Diani, and G. Corsini, "An anomaly detection architecture based on a data-adaptive density estimation," in Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), 3rd Workshop on, June 2011, pp. 1-4.

[16] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. New York, USA: Cambridge University Press, 2004.

[17] P. Nader, P. Honeine, and P. Beauseroy, "Intrusion detection in SCADA systems using one-class classification," in Proc. 21st European Conference on Signal Processing (EUSIPCO), Marrakech, Morocco, 9-13 September 2013, pp. 1-5.

[18] --, "$\ell_p$-norms in one-class classification for intrusion detection in SCADA systems," Industrial Informatics, IEEE Transactions on, vol. 10, no. 4, pp. 2308-2317, Nov 2014.

[19] S. S. Khan and M. G. Madden, "A survey of recent trends in one class classification," in Proceedings of the 20th Irish conference on Artificial intelligence and cognitive science, ser. AICS'09. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 188-197.

[20] B. Schölkopf, J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Comput., vol. 13, no. 7, pp. 1443-1471, Jul. 2001.

[21] D. M. J. Tax and R. P. W. Duin, "Support vector data description," Mach. Learn., vol. 54, no. 1, pp. 45-66, Jan. 2004.

[22] B. Schölkopf, J. Giesen, and S. Spalinger, "Kernel methods for implicit surface modeling," in Advances in Neural Information Processing Systems 17. MIT Press, 2005, pp. 1193-1200.

[23] M. Eigensatz, J. Giesen, and M. Manjunath, "The solution path of the slab support vector machine," in The 20th Canadian Conference on Computational Geometry, Mcgill University. CCCG, 2008, pp. 211-214.

[24] M. Amer, M. Goldstein, and S. Abdennadher, "Enhancing one-class support vector machines for unsupervised anomaly detection," in Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description (ODD), August 11-14, New York, USA, 2013, pp. 8-15.

[25] Z. Noumir, P. Honeine, and C. Richard, "On simple one-class classification methods," in Proc. IEEE International Symposium on Information Theory, MIT, Cambridge (MA), USA, 1-6 July 2012.

[26] P. C. Mahalanobis, "On the generalised distance in statistics," in Proceedings National Institute of Science, India, vol. 2, Apr. 1936, pp. 49-55.

[27] R. Herbrich, Learning Kernel Classifiers: Theory and Algorithms. Cambridge, MA, USA: MIT Press, 2001.

[28] J. P. Vert, K. Tsuda, and B. Schölkopf, "A primer on kernel methods," Kernel Methods in Computational Biology, pp. 35-70, 2004.

[29] N. Aronszajn, "Theory of reproducing kernels," Trans. Amer. Math. Soc., vol. 68, no. 3, pp. 337-404, 1950.

[30] H. Hoffmann, "Kernel PCA for novelty detection," Pattern Recognition, vol. 40, no. 3, pp. 863-874, 2007.

[31] D. M. J. Tax and P. Juszczak, "Kernel whitening for one-class classification," in Pattern Recognition with Support Vector Machines, First International Workshop, Niagara Falls, Canada, August 10, 2002, pp. 40-52.
