

Project Deliverable

D3.1 Security Metrics and Measurements

Project Number: 700692
Project Title: DiSIEM – Diversity-enhancements for SIEMs
Programme: H2020-DS-04-2015
Deliverable type: Report
Dissemination level: PU
Submission date: 30.11.2017 (M15)
Responsible partner: FCiências.ID
Editor: Ana Respício
Revision: 1.0

The DiSIEM project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 700692.


Editor: Ana Respício, FCiências.ID

Contributors:
Ana Respício, FCiências.ID
João Alves, FCiências.ID
Luis Miguel Ferreira, FCiências.ID
Frances Buontempo, City
Ilir Gashi, City
Pedro Rodrigues, EDP
Susana González, ATOS


Executive Summary

In this report we review and evaluate the current state of the art in industry and academia for classical security metrics and summarise those that are most suitable for integration with SIEMs. We focus on the metrics that allow Security Operation Centre (SOC) operatives to assess risk and the effectiveness of the security protection systems and controls that they use, and hence help them make decisions regarding operations security. In addition, we also consider metrics to support managerial security analysis and decision-making. We compiled a set of metrics and gathered information from the industrial partners in the DiSIEM project (all large multinationals working in different industrial sectors: energy, travel bookings and infrastructure) on the types of metrics that they use in their SOCs. Based on the analysis of the responses from the industrial partners, we identify which security metrics are candidates to be further assessed for integration in the components to be developed in DiSIEM.

Beyond security metrics, the report reviews the state of the art in risk assessment by SIEMs and proposes a hierarchical model to assess multi-level security risk. This model aims to provide decision-making support at different levels of management, considering the valuation of assets as well as the interdependencies among them.

In addition to providing and analysing classical metrics that may be useful for assessing the security performance of individual protection systems, or of the full set of protection systems that an organisation uses, we also present a review of security “diversity” measures – i.e., how similar or different security protection systems are from each other in their ability to detect attacks or avoid common vulnerabilities. There are fewer reports in the literature regarding diversity measures compared with metrics for individual components. However, they are very important to help organisations decide how to choose amongst the different protection systems available, which combination of protection systems yields the best protection for a given scenario and/or time period, and how these diverse protection systems should be configured in a defence-in-depth architecture.

The metrics presented in this report will form a valuable input to the later deliverables in work package 3 that are concerned with probabilistic assessment of diversity for security, as well as to the components that will be developed in the work-package on Visual Analysis Platform (WP5).


Table of Contents

1 Introduction
1.1 Organization of the Document
2 DiSIEM Security Metrics
2.1 Related work
2.1.1 Definition and purpose
2.1.2 Where, how and what to collect
2.1.3 Good vs bad metric
2.1.4 Categorization, classification and taxonomies
2.2 Methodology for definition of a SMs program
2.3 Identification of security metrics to generate
2.4 Questionnaire to assess the utility of the SMs
2.5 Analysis from DiSIEM partners input
2.5.1 The SMs most and least used by the Security Information teams
2.5.2 The SMs with the highest and lowest utility values
2.5.3 SMs already in use by the partners
2.5.4 SMs already in use produced by the SIEMs
2.5.5 Partners’ observations
2.6 Summary
3 DiSIEM multi-level risk assessment
3.1 Definitions and concepts
3.1.1 Definitions
3.1.2 Risk assessment process
3.1.3 Risk treatment
3.2 Related work
3.2.1 Alien Vault and XL-SIEM
3.2.2 ArcSight Solution
3.2.3 IBM QRadar
3.2.4 Splunk
3.2.5 Atos Risk Assessment Engine
3.3 Scientific literature review
3.4 A Multi-Level Model for Risk Assessment in SIEM
3.4.1 Structure of the model
3.4.2 Characteristics of the layers
3.4.3 Types of dependencies
3.4.4 Identification of assets and dependencies
3.4.5 Risk evaluation
3.5 Summary
4 Diversity metrics
4.1 Introduction
4.2 Idealised scenario of diversity assessment when the data is labelled
4.3 Empirical assessment
4.4 Practical issues with diversity assessment with unlabelled data
4.5 Techniques to help labelling data
4.6 Time-based metrics for diversity assessment
5 Summary and Conclusions
6 References
7 Appendix – Metrics


7.1 People/Management
7.1.1 Governance
7.1.2 Security values
7.1.3 Assets/Business values
7.2 Processes
7.2.1 Incidents and vulnerabilities status
7.2.2 Threat detection
7.2.3 Security status
7.3 Technology
7.3.1 Performance
7.3.2 Compliance status
7.3.3 Coverage


List of Figures

Figure 1 – Classification of Security Metrics by their input types – retrieved from (Julisch, 2009)
Figure 2 – Taxonomy for business-level Security Metrics, from (Savola, 2007)
Figure 3 – Taxonomy for Security Metrics for information security management in the organization, retrieved from (Savola, 2007)
Figure 4 – Taxonomy for SMs
Figure 5 – Distribution of the metrics by average utility value
Figure 6 – Number of metrics already used by each partner with information provided or not by their SIEM systems
Figure 7 – SIEM coverage of the SMs already in use (in percentage)
Figure 8 – How to compute the relevance variable in ArcSight and its possible values, from (Thiele, 2014)
Figure 9 – Severity level possible values, from (Thiele, 2014)
Figure 10 – Priority scores, from (Thiele, 2014)
Figure 11 – IBM QRadar vulnerabilities scan results, from (IBM, 2017d)
Figure 12 – Atos Risk Assessment using XL-SIEM and Risk Assessment Engine
Figure 13 – Types of dependencies between assets
Figure 14 – Assets and dependencies discovery process


List of Tables

Table 1 – Business functions and their purpose – derived from (CIS, 2010)
Table 2 – Metrics Categorization – derived from (CIS, 2010)
Table 3 – Consortium members involved with the questionnaire
Table 4 – List of terms and definitions extracted from (ISO/IEC, 2011)
Table 5 – Analysis of qualitative and quantitative methods
Table 6 – AlienVault and XL-SIEM priority scale
Table 7 – AlienVault and XL-SIEM reliability scale
Table 8 – Model Confidence score possibilities
Table 9 – Levels of importance of an asset
Table 10 – Qualitative and quantitative values proposed by Splunk (2017d)
Table 11 – Inputs required for risk assessment in the reviewed SIEMs and focus of the risk evaluation
Table 12 – Properties for classifying incidents according to ArcSight


1 Introduction

In this report we define security metrics to assess security characteristics that are of interest for operational and managerial security decision-making. We also discuss measures of how best to combine multiple defences given a threat environment, which involves understanding how the strengths and weaknesses of diverse defences add up to the total strength of the system.

We reviewed and evaluated the current state-of-the-art approaches in industry and academia for security metrics and identified those most suitable for integration with SIEMs, focusing on risk and effectiveness, to support the different C-level executives in their security decision-making processes. We also consulted the industrial partners in the project to verify which of these metrics they use in their environments. We then present a hierarchical model to compute multi-level risk (risk in the context of each C-level) that considers an adequate valuation of the assets and the interdependencies among them.

We paid special attention to “diversity” measures – i.e., how similar or different security protection systems, vulnerabilities, attacks etc., are from each other – which is a key theme of DiSIEM. These types of diversity metrics are less studied in the literature compared with metrics for individual components, but are very important to be able to more accurately assess the overall security delivered by a combination of protection systems. Assessment of diversity for security will be presented in greater detail in a subsequent deliverable, D3.2, which will be submitted in month 18 of the project.

1.1 Organization of the Document

Chapter 2 reviews the state of the art for security metrics, defines those most appropriate to integrate with SIEMs and assesses their utility in the context of the consortium.

Chapter 3 introduces concepts of risk assessment in information security and provides an overview of the state of the art in risk evaluation capabilities in existing SIEM systems. The chapter also proposes a hierarchical model for multi-level risk assessment.

Chapter 4 presents the state of the art in the assessment of diversity for security, the measures that are of interest when assessing diversity, and related empirical and experimental studies on assessing diversity using multiple security protection systems.

Chapter 5 summarises the conclusions from the report and discusses ideas for further work within the project.


2 DiSIEM Security Metrics

This chapter reviews the related literature and proposes a well-structured Security Metrics system with a precise definition and purpose, organised according to a taxonomy of SOC capabilities. A list of metrics to be implemented in the DiSIEM context is proposed. These metrics aim at monitoring the risk and security status of the organization according to the different SOC capabilities and the different SIEM systems adopted in the Consortium. In addition, the adequacy of these metrics is assessed within the Consortium’s operational context by means of a questionnaire.

2.1 Related work

2.1.1 Definition and purpose

Bowen et al. (2007) define Security Metrics (SMs) as “tools to monitor the accomplishment of goals and objectives by quantifying the level of security and the efficiency and effectiveness of the security controls, by analysing the adequacy of security activities, and by identifying possible improvement actions.”

Security metrics provide a framework for evaluating the security built into commercially available products or services. They enrich the knowledge of the organization’s security and can provide information about the organization’s strengths, weaknesses and risks, giving a global view of the organization’s security status.

One of the problems related to the development of Security Metrics (SM) is the difficulty of defining them correctly. A badly defined metric leads to misinterpretation, which can lead to misleading evaluations and, by consequence, to a wrong risk assessment. Therefore, instead of an improvement of security and a risk reduction, the opposite might be obtained.

Jansen (2009) and Jaquith (2007) state that a Security Metric is based on quantifiable measures and is a way to put numbers around information security activities. SMs are a subset of metrics: the quantifiable measures must be security-related, linearly ordered, and obtained through well-defined measurement methods. Payne (2006) goes further and separates measurements from metrics, stating that measurements are the raw data collected, while metrics are objective or subjective interpretations of those measurements for human consumption, but always simple and precise. Metrics, when well-tailored, can be an efficient tool for security managers to observe the effectiveness of their security programs and their components. With the knowledge gathered through metrics, security managers can answer questions such as “are we more secure today than we were before?”, “are we secure enough?”, or even “how secure are we?” Other authors link Security Metrics with measuring risk levels and countermeasure decision-making.


Julisch (2009) defines SMs as valid and precise functions whose return values are inversely related to the vulnerability of the measured system. SMs are used to identify the adequacy of controls, to provide a baseline for comparison purposes, to evaluate the security that has been built in, and to provide financial information; this supports better information-security-related decisions. In the same work, Julisch also shows that this definition is consistent with the field of software quality metrics. Jansen (2009), Jaquith (2007) and Payne (2006) share the same view of the purpose of SMs. Muthukrishnan and Palaniappan (2016), Rathbun and Homsher (2009), and Tashi and Ghernaouti-Helie (2008) argue that SMs play a vital role in any organization. The purpose of SMs is to provide an understanding of security risks, to discover potential problems in the system, to detect failures in IT controls and weaknesses of the security infrastructure, and to measure the performance of countermeasures and processes, thus facilitating decision-making. In addition, SMs strive to offer a quantitative and objective basis for security assurance for strategic support, quality assurance, and tactical oversight, and provide more information for asset accountability. These criteria can be achieved with models and algorithms applied to a collection of measured data.

In the DiSIEM context, security metrics are the final step of measurement, provided by the SIEM, and deliver information about the organisation’s security status, thus giving cybersecurity managers the information they need to support sound decision-making regarding the enhancement of security.

2.1.2 Where, how and what to collect

The Security Information team has a diverse set of sources that can provide useful raw data to generate the SMs, yet where, how and what to collect can be challenging questions. These sources can also give wrong information, so it is necessary to take care and know exactly how to answer these three “simple” questions. The correct answers will help to discard data that is unnecessary or unusable for SM computation. Berinato (2005) explains how to get the data, one of the questions above. Network tools can help gather data: running scans to find devices and their network IPs provides network coverage; running vulnerability scans on all the devices discovered helps to identify flaws in security patching; and testing password strength identifies weak passwords. These are some basic examples of how to get good data for SMs.

The SOC team needs to secure and monitor all the organization’s workstations, devices, assets and network flows. Executing this job correctly can be difficult because of the sheer volume of records that the organization generates. Vaarandi and Pihelgas (2014) describe the use of security logs for SMs. A key method for obtaining meaningful data is to filter and remove duplicates, reducing the large amount of unnecessary data. This method is crucial for organizations that use different logs containing the same data. Understanding what information each log provides, what data are needed for computing a given SM, and which logs provide them helps the security manager to select the appropriate sources for collecting these data before implementing the SM. In this case, correlation is also a good tool because it matches the same valuable information across different logs.


Vaarandi and Pihelgas (2014) also explain the importance of correlating the different logs. It is also necessary to be aware of incorrectly collected data, such as false positives raised by IDS alarm logs, and discard them whenever possible. The main goal, therefore, is to obtain reliable and understandable measurements, selecting only what is important and in accordance with the organization’s security objectives (Vaughn et al., 2003). The same work also states that governmental metrics should address upward and organizational reporting, whereas commercial metrics are more focused on answering questions about how strong the security perimeter is, what the return on investment (ROI) is, and so on. Five characteristics should be kept in mind when selecting metrics: Correctness and Effectiveness, Leading Versus Lagging Indicators, Organizational Security Objectives, Qualitative and Quantitative Properties, and Measurements of the Large Versus the Small (Jansen, 2009). All these properties should be considered upfront to facilitate and enhance the selection phase. Besides, to select an SM it is essential to know what a good metric and a bad metric are and to establish some properties.
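To make the filtering and correlation step concrete, the minimal sketch below deduplicates raw log records and drops entries flagged as IDS false positives before they feed a metric; the record fields, the false-positive flag and the correlation key are illustrative assumptions, not taken from any specific SIEM or log format.

```python
from collections import Counter

# Illustrative raw log records; field names are assumed, not from a specific SIEM.
raw_logs = [
    {"source": "firewall",  "event_id": "FW-1001", "host": "10.0.0.5", "false_positive": False},
    {"source": "ids",       "event_id": "IDS-77",  "host": "10.0.0.5", "false_positive": True},
    {"source": "firewall",  "event_id": "FW-1001", "host": "10.0.0.5", "false_positive": False},  # duplicate
    {"source": "antivirus", "event_id": "AV-3",    "host": "10.0.0.9", "false_positive": False},
]

def clean(records):
    """Remove exact duplicates and known false positives before metric computation."""
    seen, kept = set(), []
    for r in records:
        key = (r["source"], r["event_id"], r["host"])
        if r["false_positive"] or key in seen:
            continue
        seen.add(key)
        kept.append(r)
    return kept

# Simple correlation example: count, per host, how many records each source
# contributes, which helps match the same information arriving via different logs.
cleaned = clean(raw_logs)
per_host_sources = Counter((r["host"], r["source"]) for r in cleaned)
print(len(cleaned), "usable records;", dict(per_host_sources))
```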

2.1.3 Good vs bad metric

Collecting valuable data is important; however, if the generation and selection of metrics is done without care, all the data collected will produce useless and meaningless SMs. To prevent this, it is necessary to differentiate a good metric from a bad metric. Jaquith (2007) describes a list of criteria for good and bad metrics, so that security managers can check which category their metrics belong to. Good metrics should satisfy five criteria: 1) consistently measured, without subjective criteria; 2) cheap to gather, preferably in an automated way; 3) expressed as a cardinal number or percentage, not as a qualitative label like “high”, “medium” or “low”; 4) expressed using at least one unit of measure, such as “defects”, “hours”, or “dollars”; 5) contextually specific, and relevant enough to decision-makers that they can act. As for bad metrics, in the same work Jaquith considers those that are inconsistently measured (usually because they rely on subjective judgements that vary from person to person) and those that cannot be gathered cheaply, as is typical of labour-intensive surveys and one-off spreadsheets. In addition, bad metrics do not express results with cardinal numbers and units of measure; instead, they rely on qualitative high/medium/low ratings, traffic lights, and letter grades.

Payne (2006) uses an acronym to help security managers know whether a metric is good to use: “Good metrics are those that are SMART, i.e. Specific, Measurable, Attainable, Repeatable, and Time-dependent”. SMs that are SMART indicate the degree to which the system is far from or close to its security goals.

2.1.4 Categorization, classification and taxonomies

A taxonomy of SMs is a scheme that helps with the classification and management of the organization’s SMs. With a well-defined taxonomy, the metrics that are created are expected to be more efficient and useful to the organization. An SM that does not fall under the classification should be discarded, for the simple reason that it is not necessary or will not be useful.


If the team thinks that a metric does not fit under the classification of the taxonomy but is nevertheless important, then the taxonomy should be revised. Taxonomies improve cooperation within and between teams, even if they belong to different departments. The classification of metrics may vary among organizations, even if they use the same methodology. Jaquith (2007) states that standards can be used as a guide to build frameworks, yet organizations should not misuse taxonomies: taxonomies must be created according to the organization’s structure. CIS (2010) presents twenty metric definitions, organised around seven business functions and their respective metrics. Table 1 presents these business functions and their purposes, based on the information available in (CIS, 2010).

Function – Purpose

Incident Management – Determining how well the organization detects, identifies, handles and recovers from security incidents.

Vulnerability Management – Determining how well the organization manages its security exposure by identifying and mitigating known vulnerabilities.

Patch Management – Determining how well the organization is able to maintain the patch state of its systems.

Configuration Management – Presenting the configuration state of the organization’s systems.

Change Management – Assessing how changes to the system configuration can affect the security of the organization.

Application Security – Determining the reliability of the security model of business applications to operate as the organization intended.

Financial Metrics – Evaluating the investment made in information security.

Table 1 – Business functions and their purpose – derived from (CIS, 2010)

CIS (2010) also categorizes Security Metrics into three hierarchical levels, based on their purpose and audience. Table 2 presents these categories with their functionality and audience.


Metric Category – Functionality – Audience

Management Metrics – Provide information about the performance of business functions and their impact on the organization – Business Management

Operational Metrics – Improve the tasks of business functions and provide a better understanding of them – Security Management

Technical Metrics – Provide technical details and can support the other metrics – Security Operation

Table 2 – Metrics Categorization – derived from (CIS, 2010)

Julisch (2009) also created a taxonomy, which is shown in Figure 1. The purpose was to create a new type of classification for security metrics. This classification, unlike the previous ones, is based on the input data analysed by the SM. Input data was chosen as the basis of the new classification because it has a particularly large influence on the validation, accuracy, and precision of an SM.

Figure 1 – Classification of Security Metrics by their input types – retrieved from (Julisch, 2009)


Based on the evaluation of several proposed taxonomies, Savola (2007) proposes a high-level information security metrics taxonomy which covers metrics for organizational information security and product development. Figure 2 and Figure 3 display two examples of the proposed taxonomies: Figure 2 illustrates a two-level taxonomy for business-level SMs, while Figure 3 shows a more detailed, three-level taxonomy for SMs for information security management. The number of levels depends on the level of detail the organization wants to work with.

Figure 2 – Taxonomy for business-level Security Metrics, from (Savola, 2007)

Figure 3 – Taxonomy for Security Metrics for information security management in the organization, retrieved from (Savola, 2007)

2.2 Methodology for definition of a SMs program

To understand SMs, it is essential to have a proper definition of them, and they must be aligned with the goals of the information security team. For the information security team, security metrics are the final step of measurement and deliver information about the security status of the infrastructure (and other related information), providing substantiated information that allows the cybersecurity manager to make sound decisions, resulting in the enhancement of security. This enhancement can be achieved by changing policies, countermeasures or resource allocation.

SMs should be structured to accomplish their purpose, so that the organization is able to discard unsuitable SMs. The methodology for defining an SMs program must be well defined, to avoid repeating steps and to reduce the time spent in the creation and maintenance of SMs.


The methodology adopted to guide the establishment of the SMs program is based on Payne (2006), which includes the following key steps:

1. Define the metrics program goal(s) and objectives
2. Decide which metrics to generate
3. Develop strategies for generating the metrics
4. Establish benchmarks and targets
5. Determine how the metrics will be reported
6. Create an action plan and act on it, and
7. Establish a formal program review/refinement cycle.

The Payne (2006) methodology aims to promote an understanding of the purpose of the SMs program, of the outcomes it should drive, and of the responsibilities involved in managing the program.

In the DiSIEM context, each organization will be responsible for steps 4, 6 and 7. Therefore, in this task we focused on steps 1, 2, and 3. Step 5 will be approached in DiSIEM WP5.

In step 1, we set up one goal sufficiently broad to suit different types of organisations.

The security metrics program aims to establish and provide metrics that measure the maturity of an organisation regarding its Information Security operations, including a way to communicate how efficiently and effectively the organisation is controlling security risks and managing its security capabilities.

For this goal, the following objectives were mapped:

1) To provide a global view of the organization’s security risk;

2) To support individual risk assessment of the monitored infrastructure and their subsystems;

3) To allow assessing the security status of the appliances and operation;

4) To yield information about the SOC team effectiveness; and

5) To support decisions regarding which investments (monetary, staff, work labour) are worthy, considering the organization’s security status and possible loss scenarios.

To organise the metrics to generate, we created a taxonomy which divides the SOC capabilities hierarchically into the three main categories commonly used in cybersecurity: People/Management, Processes and Technology, as illustrated in Figure 4.


Figure 4 – Taxonomy for SMs

The target audience for the People/Management category are the C-level managers. The metrics for the Management category are divided into three subcategories:

– Governance, providing information about the administration and management over the workers and external providers within the organization network.

– Security values, which contains metrics about the investment made in security and the return from enhancing it, and

– Assets/Business values, which relates to the value of the assets and the business to the organization, and the cost of their loss.

The Processes category focuses on providing more manageable control over the incidents and vulnerabilities, the communications and the state of security in the organization. The Processes category is divided into three subcategories:

– Incidents and vulnerabilities status, which includes metrics about incidents and vulnerabilities detection and resolution;

– Threat detection, which contains metrics about detection of anomalies and abnormal behaviour; and


– Security status, which is devoted to measures about the system and subsystems.

The final category is Technology, which focuses on the correct and incorrect operation of the cybersecurity tools. The Technology category contains a set of metrics to assess the Performance, Coverage and Compliance Status of security tools used to monitor the assets, detect anomalies or malicious behaviours.
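As a simple illustration of how this taxonomy could be encoded for later tooling (for example, to tag each SM with its category), the sketch below represents the three categories and their subcategories of Figure 4 as a nested dictionary; the encoding itself is only an assumption, not part of the DiSIEM components.

```python
# Hypothetical encoding of the SM taxonomy of Figure 4 as a nested structure.
SM_TAXONOMY = {
    "People/Management": ["Governance", "Security values", "Assets/Business values"],
    "Processes": ["Incidents and vulnerabilities status", "Threat detection", "Security status"],
    "Technology": ["Performance", "Compliance status", "Coverage"],
}

def category_of(subcategory: str) -> str:
    """Return the top-level category a subcategory belongs to, or raise if unknown."""
    for category, subs in SM_TAXONOMY.items():
        if subcategory in subs:
            return category
    raise KeyError(f"Unknown subcategory: {subcategory}")

# Example: classify a metric tagged with the "Threat detection" subcategory.
print(category_of("Threat detection"))  # -> "Processes"
```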

To carry on with the second step of the methodology, “Decide which metrics to generate”, Payne (2006) suggests a top-down and a bottom-up approach. The top-down approach, which was adopted, starts by listing the objectives of the security program and proceeds to identify specific metrics that would help determine whether those objectives are being achieved. Finally, one should determine the measurements needed to generate those metrics.

The next section is devoted to the identification of the metrics to generate in our SMs program.

2.3 Identification of security metrics to generate

We reviewed the literature on security metrics and, considering the objectives established for the adoption of SMs and the created taxonomy, we selected a total of 63 security metrics that are appropriate for SOC teams and SIEM systems. Of these, 16 SMs are in the Management category, 25 in the Processes category, and 22 in the Technology category. Of the 63 metrics, 15 are variations of existing ones and 2 are new metrics with 6 variations. The remaining metrics were adopted or adapted from (ArcSight, 2010; Berinato, 2005; Butler, 2009; Cain and Couture, 2011; Chuvakin, 2014; Cornell, 2015; Gordon, 2015; Kotenko et al., 2013).

To proceed to step 3 of the methodology, “Develop strategies for generating the metrics”, it is necessary to identify means for collecting the needed data and deriving the metrics. Often these means are not yet available and would need to be developed and implemented. Therefore, it is required to specify: the source of the data, the frequency of data collection and of metric generation, and who will be responsible for raw data accuracy, data compilation into measurements, and generation of the metric. In our case, as the project proposes to develop means to generate the metrics at different organisations with diverse needs, together with the identification of the metrics we tried to understand whether it was possible to obtain each metric through the SIEM or through a third party and whether it would be possible to generate the required data. In addition, a frequency of generation was also proposed.

Appendix 7 presents the survey of all 63 SMs identified. For each SM, the sources for the data/measurements were identified, as well as the frequency of computation/monitoring. The resulting survey, together with an assessment questionnaire, was disseminated to the DiSIEM consortium and answered by the Consortium’s industrial partners. The analysis of the answers will allow selection of which SMs should be implemented, which sources can produce the required data for their calculation, and whether the SIEM (or third parties) can produce the required information from their sources.

Two new managerial metrics, going beyond the state of the art, are proposed. Both are devoted to assessing the efficacy of the SOC: the PETVI – SOC’s Percentage of effort Time to resolve Vulnerabilities and Incidents (7.1.1.3) and the ERVIDENT – Efficacy of resolution of incidents and vulnerabilities (7.1.1.8 and 7.1.1.11).

The PETVI has two versions: the PETVI(a) – SOC’s Percentage of effort Time to resolve Vulnerabilities and Incidents, which considers only vulnerabilities and incidents opened and closed in the current month; and the PETVI(b) – SOC’s Percentage of effort Time to resolve Vulnerabilities and Incidents, which considers vulnerabilities and incidents that were opened in the current or previous months but were closed in the current month.

In addition, the PETVI has two other variations, considering only vulnerabilities or only incidents: the PETrV and the PETrI, respectively.

The ERVIDENT is derived in two main variations: the Efficacy of resolution of incidents and vulnerabilities (a) opened and resolved in the current month, versus (b) opened within the current or preceding months but resolved in the current month. For a more disaggregated measurement, the metric can focus only on vulnerabilities, the ERV (7.1.1.9), or only on incidents, the ERIDENT (7.1.1.10).

The results of the PETVI, ERVIDENT and their variations should be kept in history to be compared over time, thus allowing observation and correlation of the teams’ effort and efficacy in resolving vulnerabilities and incidents.
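The exact formulas for PETVI and ERVIDENT are given in Appendix 7, which lies outside this section. Purely as an illustration, the sketch below shows one plausible reading of the monthly ratios, in which ERVIDENT relates resolved items to opened items and PETVI expresses SOC effort on resolved items as a percentage of total SOC time; the ticket fields, helper names and formulas are assumptions, not the definitions used in the appendix.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Ticket:
    kind: str                  # "vulnerability" or "incident" (assumed field)
    opened: date
    closed: Optional[date]     # None while still open
    effort_hours: float = 0.0  # SOC effort spent on this ticket (assumed field)

def same_month(d: date, year: int, month: int) -> bool:
    return d.year == year and d.month == month

def ervident(tickets, year, month, include_backlog=False):
    """Assumed reading of ERVIDENT: resolved / opened in the reference month.
    (a) include_backlog=False: only items opened AND closed in the month.
    (b) include_backlog=True: items closed in the month, regardless of when opened."""
    opened = [t for t in tickets if same_month(t.opened, year, month)]
    if include_backlog:
        closed = [t for t in tickets if t.closed and same_month(t.closed, year, month)]
    else:
        closed = [t for t in opened if t.closed and same_month(t.closed, year, month)]
    return len(closed) / len(opened) if opened else None

def petvi(tickets, year, month, total_soc_hours):
    """Assumed reading of PETVI(a): share of SOC effort spent on tickets
    opened and closed in the reference month."""
    work = sum(t.effort_hours for t in tickets
               if same_month(t.opened, year, month)
               and t.closed and same_month(t.closed, year, month))
    return 100.0 * work / total_soc_hours if total_soc_hours else None

# Example: two tickets handled in November 2017.
tickets = [
    Ticket("incident", date(2017, 11, 2), date(2017, 11, 10), effort_hours=6),
    Ticket("vulnerability", date(2017, 10, 20), date(2017, 11, 5), effort_hours=4),
]
print(ervident(tickets, 2017, 11))                        # (a): 1 closed / 1 opened -> 1.0
print(ervident(tickets, 2017, 11, include_backlog=True))  # (b): 2 closed / 1 opened -> 2.0
print(petvi(tickets, 2017, 11, total_soc_hours=160))      # 6 of 160 hours -> 3.75
```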

2.4 Questionnaire to assess the utility of the SMs

To assess the utility of the selected SMs for the DiSIEM Consortium, the industrial partners were asked to fill in a questionnaire covering mainly which metrics were already in use, the ability to generate each metric, and the utility of each metric. This supports the decision on which metrics should be considered for implementation and integration in DiSIEM components.

The partners were asked to answer the following questions, for each of the surveyed metrics:

1. Do you already use this metric?
2. Does your SIEM (or third-party sources) provide this metric? If yes, please explain how. (If you are already using that SM, please describe how the data are collected, whether it is through the SIEM or third-party sources. In case of being through the SIEM, please explain which technology you use to provide information for the SM, for example, Firewall, IPS, antivirus logs, etc.)
3. Can you generate the input data? If yes, explain how.
4. On a scale of 1 – 5, with 5 being the most acceptable/useful for your organisation, rank the metric according to interest and utility.


5. What should be the frequency? (Continuously, Hourly, Daily, Weekly, Monthly or Yearly)
6. Do you have any suggestion or observation for this metric?
7. Do you propose to adopt instead, or in addition, a different but related metric?

The following section analyses the results of the questionnaire.

2.5 Analysis from DiSIEM partners input

This section summarises the analysis of the questionnaire results. The purpose of the analysis is to understand which security metrics are the most used and the most useful, whether the different SIEM systems can provide the required information, which resources other than the SIEMs are used to provide data for the security metrics, and what the most useful frequency for generating the security metrics is.

After this phase, we can discard and/or enhance the security metrics to make them more accurate with respect to the reality and work processes of the security teams.

The consortium members that were involved with this questionnaire are specified in Table 3.

Partner – Function
FCiências.ID – Creation and analysis of the questionnaire
EDP – Answered
AMADEUS – Answered
ATOS – Answered

Table 3 – Consortium members involved with the questionnaire

2.5.1 The SMs most and least used by the Security Information teams

The most frequently used metrics, used by all three Information Security teams (EDP, AMADEUS, ATOS), are: AC – Asset Criticality; NKVS – Number of known vulnerabilities and not resolved by severity; NVTA – Number of vulnerabilities identified by tested asset.

The AC has an average utility value of 4, where the maximum value for utility is 5 (4/5), and should be calculated monthly. Only the AMADEUS SIEMs (QRadar/Splunk) can provide this metric. For computing the AC Security Metric, ATOS uses a dashboard that requires manual value insertion; an enhancement of this metric would be its automation.

The NKVS and the NVTA have an average utility value of 4 (4/5). The NKVS metric should be calculated weekly (as an average of the answers Monthly, Daily, and Monthly), and the NVTA should be calculated monthly. Only the ATOS SIEM provides both metrics, as their computation is a feature of the XL-SIEM, which presents a pie-chart dashboard classifying the detected vulnerabilities by severity, with percentages and absolute numbers.


The data for these metrics are retrieved by a manual scan of the servers, executed by the administrator from the OSSIM dashboard (OpenVAS is used as the vulnerability scanner), and from pentest campaign results and the internal ticketing system.

For the NVTA metric, XL-SIEM provides a list of reports for scan jobs in a dashboard with a classification by hosts.

Twelve metrics aren’t used by any of the three Information Security teams: PUA - Privileged Users Activity; BV - Business Value; RRSO - Rate of return for security operations; AS - Attack Surface; TAFBU - Top access failures by business unit; IUH - Installation of unauthorized hardware; IUS - Installation of unauthorized software; DTD - Detection to Decision; PL - Patch Latency; PS - Patch Status; AStatus - Antivirus Status; UCC - Unusual configuration changes made in the FW, VPN, WAP and Domain.

2.5.2 The SMs with the highest and lowest utility values

The Security Metrics that received the highest utility score (5) from at least one partner are listed below. For each metric, its average utility and the partner that scored it highest are also indicated, in the form (average; highest score – partner).

PUA - Privileged Users Activity (4; 5 – AMADEUS);
PETmV - SOC’s Percentage of effort time to resolve Vulnerabilities (4; 5 – AMADEUS);
CU - Cost of Updates (4; 5 – AMADEUS);
AC - Asset Criticality (4; 5 – AMADEUS);
BV - Business Value (4; 5 – AMADEUS);
RRSO - Rate of return for security operations (4; 5 – AMADEUS);
MTTR - Mean Time to Remediate (a known vulnerability and a reported incident) (4; 5 – AMADEUS);
MTTRV(a) - Mean time to resolve a vulnerability (4; 5 – AMADEUS);
MTTRI(a) - Mean time to resolve an incident (4; 5 – AMADEUS);
AOKVS - Age of the oldest known vulnerability and not resolved by severity (4; 5 – AMADEUS);
NKVS - Number of known vulnerabilities and not resolved by severity (4; 5 – AMADEUS);
NKUV - Number of known unresolved vulnerabilities by vulnerability type (4; 5 – AMADEUS);
NVMC - Number of vulnerabilities cases by month in each severity category (3; 5 – AMADEUS);
NVR - Number of vulnerabilities cases by responsible (3; 5 – AMADEUS);
NATM - Number of assets tested by month (3; 5 – AMADEUS);
NVTA - Number of vulnerabilities identified by tested asset (4; 5 – AMADEUS);
NVIM - Number of vulnerabilities identified and reported incidents, by month (3; 5 – AMADEUS);
NRIM - Number of reported incidents by month (4; 5 – AMADEUS);
NRIVM - Number of resolved incidents and vulnerabilities by month (4; 5 – AMADEUS);
ERVIDENT (a) - Efficacy of resolution of incidents and vulnerabilities (4; 5 – AMADEUS);
ERV - Efficacy of resolution of vulnerabilities (3; 5 – AMADEUS);
ERI - Efficacy of resolution of incidents (3; 5 – AMADEUS);
ERVIDENT (b) - Efficacy of resolution of incidents and vulnerabilities (3; 5 – AMADEUS);
PIS - Percentage of infected Systems (4; 5 – AMADEUS);
TMA - Top malware activity (4; 5 – AMADEUS);
TEE - Top Egress Event (4; 5 – AMADEUS);
TIE - Top Ingress Event (4; 5 – AMADEUS);
TFA - Top Foreign attacks (3; 5 – AMADEUS);
TFC - Top Foreign Countries (3; 5 – AMADEUS);
AS - Attack Surface (4; 5 – AMADEUS);
FE - Firewall Entry (4; 5 – AMADEUS);
TAFD - Top access failures by destination (3; 5 – AMADEUS);
IUH - Installation of unauthorized hardware (4; 5 – AMADEUS);
IUS - Installation of unauthorized software (4; 5 – AMADEUS);
SHS - Security "Health" Score (4; 5 – AMADEUS);
EPS - Events per second (4; 5 – AMADEUS);
PE - Peak Event (4; 5 – AMADEUS);
NE - Normal Event (3; 5 – AMADEUS);
CELV - Changes of the event log (4; 5 – AMADEUS);
TE - Top events (4; 5 – AMADEUS);
PAM - Percentage of assets modelled (3; 5 – AMADEUS);
PDM - Percentage of devices monitored (3; 5 – AMADEUS);
DTD - Detection to Decision (4; 5 – AMADEUS);
SEU - SIEM resource usage (4; 5 – AMADEUS);
RH - Rules handled (3; 5 – AMADEUS);
ID/PA - Intrusion Detection / Prevention Activity (3; 5 – AMADEUS);
QF - Quiet Feeds (3; 5 – AMADEUS);
PL - Patch Latency (3; 5 – AMADEUS);
PS - Patch Status (4; 5 – AMADEUS);
ACover - Antivirus Coverage (3; 5 – AMADEUS);
AStatus - Antivirus Status (3; 5 – AMADEUS);
TUS/PAS - Top unusual scans / probe activities by source (3; 5 – AMADEUS);
DUAC - Devices with unauthorized or anomalous communications (4; 5 – AMADEUS);
UCC - Unusual configuration changes made in the FW, VPN, WAP and Domain (4; 5 – AMADEUS);
TDT - Top dropped traffic by DMZ and FW (3; 5 – AMADEUS).

The ERVIDENT is one of the new metrics proposed here, and only EDP currently uses it. Though AMADEUS does not use it, they consider it of high utility for their organization (5). Concerning the frequency of generation for ERVIDENT, the partners have different opinions: EDP considers that this metric should be computed monthly, while AMADEUS considers that it should be generated yearly.

From the list of metrics for which at least one partner gave the highest utility value, eleven are not used by any of the partners:

PUA – Privileged Users Activity; BV – Business Value; RRSO – Rate of return for security operations; AS – Attack Surface; IUH – Installation of unauthorized hardware; IUS – Installation of unauthorized software; DTD – Detection to Decision; PL – Patch Latency; PS – Patch Status; AStatus – Antivirus Status; UCC – Unusual configuration changes made in the FW, VPN, WAP and Domain.

In the current state, the SIEMs (and other third parties) in DiSIEM cannot calculate the eleven SMs listed above, with the exception of the PUA, which can be computed by AMADEUS’s SIEM.

The Security Metric with the lowest utility value (2 out of 5) is: TAFBU - Top access failures by business unit

None of the partners uses this metric, nor do their SIEMs provide it.

Figure 5 illustrates the distribution of the number of metrics for each averaged utility value. Most of the SMs (35) received an average utility of 4 out of 5. No SM is considered without utility for the organizations, though TAFBU received a low utility value of 2.


Figure 5 – Distribution of the metrics by average utility value

2.5.3 SMs already in use by the partners

The analysis of which metrics are already used by each partner reveals that AMADEUS already uses more than half (38) of the metrics surveyed, followed by EDP, which uses about half (31), and, lastly, ATOS, which uses only 5, as illustrated by Figure 6.

Figure 6 – Number of metrics already used by each partner with information provided or not by their SIEM systems

2.5.4 SMs already in use produced by the SIEMs

From the SMs already in use by the respondents, we identified which are produced by their SIEMs. Figure 7 displays graphically, for each partner, the percentage of the metrics already in use that are produced by their SIEMs.



Figure 7 – SIEM coverage of the SMs already in use (in percentage)

2.5.5 Partners’ observations

The questionnaires revealed only two suggestions. One observes that the insertion of the AC value is currently done manually and should be automated. The other concerns the UA – User Activity metric, which at ATOS reports login attempts distributed by type of user; ATOS suggested that, in this case, it could also be useful to know the origin IP address.

2.6 Summary

This chapter reviewed classical security metrics, proposed a security metrics taxonomy and provided a set of security metrics suitable for integration with SIEMs. An assessment of these metrics was made by the industrial partners. The security metrics scored as being of maximum utility for at least one of the partners are the first candidates to be implemented in the components to be developed in work package WP5 on Visual Analysis. To decide which of these metrics will be implemented, we will proceed with a deeper analysis of the requirements for their generation as well as of the feasibility of implementation.

[Figure 7 chart values – percentage of SMs already in use provided by each partner’s SIEM: EDP 29%, ATOS 80%, AMADEUS 63%.]


3 DiSIEM multi-level risk assessment

This chapter presents a hierarchical model to assess multi-level security risk. This model aims at providing support for different levels of decision making concerning security operation and management: SOC analysts, middle-level IT managers and senior managers. The model considers a valuation of assets as well as the interdependencies among them, to account for risk spreading across the monitored infrastructure.

3.1 Definitions and concepts

This section starts by introducing concepts regarding risk, the risk assessment process and how assets of an organization are analysed and evaluated.

3.1.1 Definitions

The ISO/IEC 27005:2011 standard has become the de facto guideline to support information security risk management (ISO/IEC, 2011). With the purpose of presenting the most relevant definitions and concepts in the field, to support the remainder of this chapter, Table 4 presents several risk management definitions extracted from this standard.

Term/expression – Definition

risk – effect of uncertainty on objectives
level of risk – magnitude of a risk, expressed in terms of the combination of consequences and their likelihood
consequence – outcome of an event affecting objectives
likelihood – chance of something happening
event – occurrence or change of a particular set of circumstances
risk criteria – terms of reference against which the significance of a risk is evaluated
risk management – coordinated activities to direct and control an organisation with regard to risk
risk assessment – overall process of risk identification, risk analysis, and risk evaluation
risk identification – process of finding, recognising and describing risks
risk analysis – process to comprehend the nature of risk and to determine the level of risk
risk evaluation – process of comparing the results of risk analysis with risk criteria to determine whether the risk and/or its magnitude is acceptable or tolerable
risk treatment – process to modify risk
control – measure that is modifying risk

Table 4 – List of terms and definitions extracted from (ISO/IEC, 2011)


According to the standard, an event is the occurrence or change of a particular set of circumstances and, in the information security context, an event is often referred to as an incident. It is relevant to note, however, that in the context of SIEMs the SIEM identifies an event whenever a set of pre-established rules that point to an incident is verified. This means the system may produce a false positive: the event is identified without a real security incident having occurred. Moreover, the SOC analysts must assess whether the event is a true positive; only in that case do we consider the occurrence of an incident, which we understand as an event that effectively compromised the security of an asset.

3.1.2 Risk assessment process

The process of assessing risk is based on analysing and evaluating threats and adverse situations. Here, analysing consists of identifying those hazards, determining how frequently they can happen and what consequences they may cause, so as to form a clear picture of the risk inherent to an asset or organization.

There are three approaches to follow in risk assessment: qualitative, quantitative or semi-quantitative. The difference between qualitative and quantitative methods is based on the scale. The qualitative method uses a scale of qualifying attributes (e.g., Very Low, Low, Medium, High, Very High), while a quantitative approach uses a numerical scale (e.g., 0,1,2,3,4,5,6,7,8,9) to define the possible consequences and their likelihood of happening.

A semi-quantitative approach uses a scale in which a range of numerical values maps to a single qualifying attribute (e.g., 0 and 1 map to Low, 2 and 3 map to Medium, 4 and 5 correspond to High) (ISO/IEC, 2011; NIST, 2012). Each approach has its own advantages and disadvantages, as summarised in Table 5.

Method – Advantages – Disadvantages

Qualitative – Simple; agile – Inexact; partial treatment of information

Quantitative – More precise; complete treatment of information – More complex; more vulnerable to errors in treating information

Table 5 – Analysis of qualitative and quantitative methods

Evaluating the threats and adverse situations is a process that should consider socioeconomic and environmental factors, promoting decisions based on a comparison with the risk acceptance criteria, i.e. the maximum level of risk that the organization is willing to accept.


3.1.3 Risk treatment

To control the risk of hazards that may occur, a set of possible risk-reducing measures is created (Whitman, 2014). Control strategies are ways of responding to the identified risks and can be of five types: Defence, Transferal, Mitigation, Acceptance and Termination (Whitman, 2014).

The defence control aims to prevent the exploitation of the vulnerability. The transferal strategy, of which insurance policies are an example, has the objective of transferring the risk of an asset to another asset or entity. The mitigation strategy focuses on reducing the damage caused when the vulnerability is exploited. Acceptance of a risk might happen when the impact is considered to be low or the cost of mitigation too high. Finally, the termination strategy attempts to eliminate the vulnerable asset after an assessment of its importance.

The mechanisms chosen to be applied rarely eliminate the risk of a threat completely, thus leaving residual risk. The residual risk is the portion of risk that remains after the implementation of control mechanisms, due to several factors like cost, necessity, feasibility, or others.

3.2 Related work

The section starts with a review of the risk assessment process in the SIEM solutions considered in the project, namely AlienVault, XL-SIEM, HP ArcSight, IBM QRadar, and Splunk, and an analysis of their respective risk evaluation processes. An overview of the Atos Risk Assessment Engine, used by XL-SIEM, is also provided. Finally, the section reviews scientific literature related to improving the communication among different stakeholders of an organization, and to the impact that one asset's risk can have on other assets.

The process of risk scoring in state-of-the-art SIEMs varies between scoring events, as AlienVault, XL-SIEM (AlienVault, 2017) and ArcSight (HP, 2017a) do, and scoring assets based on their vulnerabilities, as done by IBM QRadar (IBM, 2017a) and Splunk (Splunk, 2017a). However, for the SIEMs discussed below to compute a risk score, it is necessary to provide crucial information about the criticality of the assets and other attributes.

3.2.1 AlienVault and XL-SIEM

AlienVault's SIEM solutions, OSSIM (open source) and USM (commercial version), are created by the AlienVault enterprise and divided into three main components: the Sensor, the Logger, and the SIEM. There are two types of SIEM provided by AlienVault: the Open Source Security Information and Event Management (OSSIM) (AlienVault, 2017a), which is free, and the Unified Security Management (USM) (AlienVault, 2017b), which is the most complete one and, for that reason, the one we decided to review. AlienVault USM is available as a software or hardware appliance and offers deployment flexibility that allows an all-in-one or a distributed implementation, according to the needs of the organization.

Atos XL-SIEM is built on top of AlienVault's open source SIEM OSSIM and integrates a set of Java processes. XL-SIEM was described in Deliverable 2.1 of the project (DiSIEM Consortium, 2017) and its risk evaluation process is the same as in AlienVault.

Risk score evaluation in these SIEMs is done for each event and the inputs and parameters, set up by the security expert, consist of the asset value, the nature and impact of the potential threat to which it is subjected, as well as the reliability of the data used to identify the attack. The risk score is defined using an integer scale of 0 to 10 based on Equation (1) (AlienVault, 2017).

Risk = (ASSET_Value ∗ PRIORITY ∗ RELIABILITY) / 25    (1)

where ASSET_Value ≤ 5, PRIORITY ≤ 5 and RELIABILITY ≤ 10.

The values for all the required parameters, ASSET_Value, PRIORITY and RELIABILITY should be manually given by an expert.

The ASSET_Value parameter is specified using an integer scale between 0 and 5. Unfortunately, the system does not seem to provide a method or suggestion for how to classify this parameter. For this reason, some aspects should be noted about the asset evaluation.

When AlienVault or XL-SIEM evaluates the risk score of an event, it looks up the manually inserted value of the asset in question. If no value has been inserted, the SIEM uses the value assigned to the network where the asset is located. The values of the networks and respective components are manually inserted in the SIEM. Thereafter, the SIEM assumes that the value of the asset is the network value until the security expert changes it. If the asset does not belong to a network, or it is simply not possible to determine another value, the default value of 2 is used.

Another aspect to consider is that for an event having multiple assets involved, the SIEM will use the asset value from the most valuable asset, even if the most valuable asset has a default calculated value.
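For illustration, the following minimal Python sketch computes the event risk score of Equation (1), including the asset-value fallback described above (network value, then the default of 2) and the use of the most valuable involved asset. Function and variable names are ours, not AlienVault or XL-SIEM APIs.

```python
# Minimal sketch of the AlienVault/XL-SIEM event risk score, Equation (1).
# All names are illustrative; the fallback logic follows the description above.

DEFAULT_ASSET_VALUE = 2  # used when neither the asset nor its network has a value

def resolve_asset_value(asset_value=None, network_value=None):
    """Return the asset value, falling back to the network value, then to the default."""
    if asset_value is not None:
        return asset_value
    if network_value is not None:
        return network_value
    return DEFAULT_ASSET_VALUE

def event_risk(asset_values, priority, reliability):
    """Equation (1): integer-scaled inputs, risk on a 0-10 scale."""
    asset_value = max(asset_values)  # the most valuable asset involved in the event
    return (asset_value * priority * reliability) / 25

# Example: one asset valued 4, another inheriting its network's value 3.
values = [resolve_asset_value(asset_value=4), resolve_asset_value(network_value=3)]
print(event_risk(values, priority=3, reliability=6))  # (4 * 3 * 6) / 25 = 2.88
```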

The PRIORITY parameter focuses on the nature and impact of a threat to the asset and takes an integer value between 0 and 5, as presented in Table 6.

Level number – Qualitative description

0 – Not important
1 – Very low
2 – Low
3 – Average
4 – Important
5 – Very important

Table 6 – AlienVault and XL-SIEM priority scale


Lastly, the RELIABILITY parameter measures the likelihood of the attack actually happening. Its scale differs from the previous ones, going from 0 to 10 in steps of 1, as shown in Table 7.

Level number – Qualitative description

0 – False positive
1 – 10% chance of attack
2 – 20% chance of attack
3 – 30% chance of attack
4 – 40% chance of attack
5 – 50% chance of attack
6 – 60% chance of attack
7 – 70% chance of attack
8 – 80% chance of attack
9 – 90% chance of attack
10 – Real attack

Table 7 – AlienVault and XL-SIEM reliability scale

3.2.2 ArcSight Solution

ArcSight (HP, 2017a) is a SIEM solution created by Hewlett-Packard (HP) (HP, 2017b). The ArcSight SIEM Platform is an integrated set of products for collecting, analysing, and managing enterprise event information.

Risk evaluation in ArcSight is based on a priority indicator computed for each of the generated events (Thiele, 2014; Jäger, 2014). This indicator expresses the priority with which the event should be investigated to determine whether it is a threat trying to exploit a vulnerability.

The priority formula has four distinct parameters that must be set by the security expert: Model Confidence, Relevance, Severity and Asset Criticality.

The Model Confidence variable concerns the level of knowledge available about the target asset (asset under evaluation), measuring the level to which the target asset was already modelled and/or scanned before.

All possibilities of the Model Confidence score are described in Table 8.

The Relevance variable depends on the target asset having an exploitable vulnerability, where the event represents an action that might exploit it, and on the state of the port being attacked (whether it is open or not). The Relevance variable starts at the highest score (10) by default and, depending on the conditions mentioned above, it may decrease or increase; if it would exceed the upper limit, it is capped at the highest possible value. If the action in the event is a port scan the score is decreased by 5, and the same happens for a vulnerability scan. If the port is open, the score is increased by 5, and the same occurs if there is a vulnerability that can be exploited.


Score – Description

0 – Target is not modelled at all; target asset id is not populated
4 – Target asset id is present, but it hasn't been scanned for open ports or vulnerabilities
8 – Target asset is either scanned for open ports or vulnerabilities, but not for both
10 – Target asset is scanned for both open ports and vulnerabilities

Table 8 – Model Confidence score possibilities

To give better insight into this variable: if the action in the event is a scan, of a port or of vulnerabilities, the relevance for the system decreases, whereas an open port or an exploitable vulnerability increases the score.

For example, if the action on the event is a scan, the relevance is set down to 5, being classified as "Partially Relevant". However, if in addition the port is open and/or there is a vulnerability, the relevance value is set to 10, and the Relevance is classified as "Highly Relevant".

Figure 8 illustrates the computation of the possible values of this variable.

Figure 8 – How to compute the relevance variable in ArcSight and its possible values, from (Thiele, 2014)

The Relevance (R) and the Model Confidence (MC) variables are related to each other in Score_RMC, obtained by Equation (2):

Score_RMC = R / ((R + MC) − (R ∗ MC)/10)    (2)

where R ∈ {0, 5, 10} and MC ∈ {0, 4, 8, 10}.

The Severity variable (S) considers if the target has already been compromised, as well as if prior activity from this source has been detected. The score associated to this variable is represented by Equation (3):


Score_S = 1 + (SeverityLevel ∗ 3)/100    (3)

where the variable SeverityLevel takes one of the values described in Figure 9.

Figure 9 – Severity level possible values, from (Thiele, 2014)

The Recognition value (1) applies when it can be said with certainty that the asset is not compromised. The Suspicious value (3) should be used when there is a possibility that the asset is compromised. The Compromised value (3) indicates that the asset is indeed compromised, but the attacker cannot yet do anything, or it is unclear whether the attacker can do anything. It is worth emphasising that the Suspicious and Compromised scores are equal: if the asset might be compromised, it is advisable to assume that it is when evaluating severity. The remaining two scores, Hostile (5) and Infiltrators (6), correspond to situations where the attacker can jeopardise the system, with or without further damage.

The last variable, Asset Criticality (AC), measures the impact of an asset based on its importance in the context of the organization, and can take one of the six levels specified in Table 9.

Level – Description

0 – Unknown
2 – Very Low
4 – Low
6 – Medium
8 – High
10 – Very High

Table 9 – Levels of importance of an asset

Considering the importance of an asset, it is possible to compute the asset criticality score, Score_AC, using Equation (4):

Score_AC = ((AssetImportance − 8)/10) ∗ 0.2    (4)

where AssetImportance ∈ {0, 2, 4, 6, 8, 10}.

Considering all variables, the final score for Priority is computed by Equation (5):

Priority = Score_RMC ∗ Score_S ∗ Score_AC    (5)
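The following Python sketch puts Equations (2)–(5) together, exactly as reconstructed above; the function names and the example values are illustrative and do not correspond to ArcSight's internal implementation.

```python
# Sketch of the ArcSight priority computation, Equations (2)-(5) as reconstructed above.
# Function names and example values are illustrative, not ArcSight internals.

def score_rmc(relevance, model_confidence):
    """Equation (2); Relevance in {0, 5, 10}, Model Confidence in {0, 4, 8, 10}."""
    denominator = (relevance + model_confidence) - (relevance * model_confidence) / 10
    return 0.0 if denominator == 0 else relevance / denominator

def score_severity(severity_level):
    """Equation (3); SeverityLevel is one of the values in Figure 9."""
    return 1 + (severity_level * 3) / 100

def score_asset_criticality(asset_importance):
    """Equation (4); AssetImportance is one of the levels of Table 9 (0, 2, ..., 10)."""
    return ((asset_importance - 8) / 10) * 0.2

def priority(relevance, model_confidence, severity_level, asset_importance):
    """Equation (5): product of the three partial scores."""
    return (score_rmc(relevance, model_confidence)
            * score_severity(severity_level)
            * score_asset_criticality(asset_importance))

# Example: fully modelled and scanned asset (MC = 10), highly relevant event (R = 10),
# "Hostile" severity level (5), "Very High" asset criticality (10).
print(priority(10, 10, 5, 10))
```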


The Priority score is displayed in the ArcSight console similarly to the one exhibited in Figure 10.

Figure 10 – Priority scores, from (Thiele, 2014)

3.2.3 IBM QRadar

IBM's SIEM solution is the IBM QRadar Security Intelligence Platform (IBM, 2017b), hereafter referred to as IBM QRadar. IBM QRadar can be deployed as hardware or software and can be implemented either all-in-one or distributed across the organization.

Risk evaluation in IBM QRadar is performed by the Vulnerability Manager module. It relies on finding vulnerabilities in the network, and each risk score is associated with the asset where the vulnerability was found.

To score each vulnerability found, IBM QRadar uses the Common Vulnerability Scoring System (CVSS) base score (First, 2017). The risk score of each asset is then given by the sum of the scores of all vulnerabilities identified in that asset (IBM, 2015).

It is important to mention that multi-level assets can exist and, therefore, a multi-level vulnerability score can be created. This means that there are super assets, which are simply sets of other assets, for example a network. A network can be considered a super asset whose constituent assets are the lower-level assets, such as personal computers, workstations, firewalls, and routers.
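As an illustration of this scoring scheme, the sketch below sums CVSS base scores per asset and aggregates them for a super asset; the assumption that a super asset's score is simply the sum over its constituent assets is one plausible reading of the multi-level scoring described here, and the names and data are purely illustrative.

```python
# Sketch of QRadar-style scoring as described above: an asset's risk is the sum of the
# CVSS base scores of its vulnerabilities; a super asset (e.g. a network) is assumed
# here to aggregate the scores of its constituent assets.

def asset_risk(cvss_base_scores):
    """Risk of a single asset: sum of the CVSS base scores of its vulnerabilities."""
    return sum(cvss_base_scores)

def super_asset_risk(assets):
    """Risk of a super asset, assumed to be the sum over its lower-level assets."""
    return sum(asset_risk(scores) for scores in assets.values())

network = {"workstation-01": [7.5, 5.3], "firewall-01": [4.3]}  # asset -> CVSS base scores
print(asset_risk(network["workstation-01"]))  # 12.8
print(super_asset_risk(network))              # 17.1
```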

Figure 11 shows an example of an interface of the IBM QRadar Vulnerability Manager module, where a set of assets and super assets (sets of assets), together with the respective risk scores, is displayed. The Vulnerabilities field shows the number of distinct vulnerabilities, and the Vulnerabilities Instances field shows the total number of vulnerabilities without differentiation (IBM, 2017c).


Figure 11 – IBM QRadar vulnerabilities scan results, from (IBM, 2017d)

3.2.4 Splunk

Splunk Enterprise provides a SIEM solution designated as Splunk (Splunk, 2017a). Splunk evaluates the risk of objects relying on a risk analysis framework (Splunk, 2017b), where an object represents an asset of the organization and can be of three types: 'system', 'user', or unspecified asset, known as 'other' (Splunk, 2017c).

Whenever a correlation search is run to detect a specific behaviour, an alert is triggered if that behaviour has occurred. The alert can be qualified as a notable event, a risk modifier, or both simultaneously (Splunk, 2017c).

A notable event is an event that becomes a task that must be assigned, reviewed and closed by a security expert. A risk modifier is an event that has, at least, an object, an object type, a risk modifier score, and a description of the event.

For each object, the Total Risk Score, which represents the total risk of an object for the organization, is the sum of all risk modifiers’ scores for that object, as represented by Equation (6).

Total Risk Score_obj = Σ Risk Modifier Scores_obj    (6)

The score of each risk modifier depends on three parameters: the BaseRiskScore, the Threat List Weight, and the Event Count Threat, as shown in Equation (7).

Risk Score = BaseRiskScore + Σ_i (Threat List Weight_i ∗ Event Count Threat_i)    (7)


The parameter BaseRiskScore sets the minimum risk score that a risk modifier will have, in case of a correlation search triggering an alert, and it is classified qualitatively and quantitatively without a defined scale. Nevertheless, Splunk advises to use a standard range, as shown in Table 10 (Splunk, 2017d).

Qualitative value – Quantitative value

Info – 20
Low – 40
Medium – 60
High – 80
Critical – 100

Table 10 – Qualitative and quantitative values proposed by Splunk (2017d)

The BaseRiskScore is set up by the security expert when a new correlation search is created. Each time the correlation search triggers, the resulting risk modifiers receive the Base Risk Score defined for that search (Splunk, 2017d).

The Threat List Weight parameter reflects the importance, priority, or reputation of each threat list used by the correlation searches to investigate the behaviours. The weight of these lists has no upper limit and is also set by the security expert when creating a new Threat List to be used by the correlation searches (Splunk, 2017e).

Finally, the Event Count Threat parameter represents the number of events that have been matched with the Threat List in a specific object to affect the score of the risk modifier.

Since none of the parameters has an upper limit, the Total Risk Score of an object has no upper limit either, the only restriction being the physical limit of the processors of the machines that handle this computation (Splunk, 2017d).
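The sketch below illustrates Equations (6) and (7); the function names and example numbers are ours and do not correspond to Splunk configuration keys or saved searches.

```python
# Sketch of the Splunk risk framework scores, Equations (6) and (7).
# Names and numbers are illustrative, not Splunk configuration keys.

def risk_modifier_score(base_risk_score, threat_matches):
    """Equation (7); threat_matches is a list of (threat_list_weight, event_count_threat)."""
    return base_risk_score + sum(weight * count for weight, count in threat_matches)

def total_risk_score(risk_modifier_scores):
    """Equation (6): total risk of an object is the sum of its risk modifiers' scores."""
    return sum(risk_modifier_scores)

# Example: an object hit by two correlation searches.
m1 = risk_modifier_score(60, [(2, 5)])          # "Medium" base score, one threat list
m2 = risk_modifier_score(80, [(1, 3), (4, 2)])  # "High" base score, two threat lists
print(total_risk_score([m1, m2]))               # 70 + 91 = 161
```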

3.2.5 Atos Risk Assessment Engine

On top of the basic risk assessment procedure provided by AlienVault risk score evaluation (described in section 3.2.1), XL-SIEM can use the Atos Risk Assessment Engine (RAE) component to perform a complete risk assessment combining technical and business aspects of an organization in order to better evaluate the economic impact of an incident and deliver a risk assessment report.

Atos Risk Assessment Engine (WISER, 2016b) evaluates the risk faced by the company by executing an algorithm based on a set of machine-readable risk pattern model rules: a qualitative model, based on the DEXi software (Bohanec, 2016), and a quantitative one. The Risk Assessment Engine performs this evaluation in near real-time using the following main inputs:

- The company business and ICT profile, as well as information about the targets or assets in the infrastructure. This information is provided by the user through a questionnaire and the configuration of the Risk Assessment Engine graphical dashboard. For each target, the user needs to assign a value (from a minimum of 1 to a maximum of 10) to the level required for that target concerning the security properties Confidentiality, Integrity and Availability. Additionally, the user can indicate the loss values (in euros per incident) for a typical loss scenario and for the worst scenario. These quantitative values can also be suggested automatically by the Risk Assessment Engine, based on default values and on information provided by the user through the business questionnaire.

- Events and alarms received in real time from the monitoring infrastructure. This monitoring information is provided by the Atos XL-SIEM component. The reception of these events and alarms in real time triggers the algorithm and updates the risk assessment report.

- Vulnerabilities detected when a vulnerability scan is executed on the monitored infrastructure. A change in the received vulnerabilities can also trigger a re-evaluation of the risk models.

Figure 12 shows the risk assessment cycle involving the usage of the XL-SIEM component for monitoring the target infrastructure and the RAE component for evaluating the risk model algorithms. This is how it operates in the context of the WISER project (https://www.cyberwiser.eu/), in which the RAE component was developed.

Figure 12 – Atos Risk Assessment using XL-SIEM and Risk Assessment Engine

The information provided by the inputs described above (monitoring, testing, and business and ICT profile information) is transformed in the Risk Assessment Engine into what are called "indicators". These indicators are used to instantiate the risk models that have been loaded in advance to be executed by the Risk Assessment Engine algorithm.

The model rules can include generic risk patterns based on well-known libraries such as the Common Attack Pattern Enumeration and Classification (CAPEC) and the Open Web Application Security Project (OWASP), but it is also possible to add specific risk models developed for a particular organization. In the WISER project (WISER, 2016a), ten common cyber risk patterns were defined for use in the Risk Assessment Engine, ranging from common attacks such as denial of service to specific attacks on web servers such as SQL injection or buffer overflows.

Consequently, the final risk assessment provided by the Atos Risk Assessment Engine in the Cyber Risk Assessment Report is the result of the analysis and aggregation of a set of risk models that evaluate the business/ICT profile of the company, the vulnerabilities detected in the selected assets, and the alarms generated by the XL-SIEM in the real-time monitoring of the infrastructure. This risk assessment report also includes proposed mitigation measures associated with each risk model and grouped by the targets affected by the risk.

Summary

This section reviewed how risk evaluation is performed in different SIEM solutions, namely those under study in the DiSIEM project: the XL-SIEM, HP ArcSight, QRadar and Splunk solutions. During this review, we observed that most SIEM solutions assess the risk either of events or of assets, the latter depending on the events that occurred on an asset.

Table 11 summarizes the SIEM solutions considered, their respective inputs, and an indication of the level at which risk is assessed: the asset level or the event level.

Solution – Inputs – Risk evaluated by

IBM QRadar – Vulnerability ID; Vulnerability score; Asset ID – Asset (may consider dependencies)

AlienVault and XL-SIEM – Asset value; Priority; Reliability – Event (does not consider dependencies)

ArcSight – Model Confidence; Relevance; Severity; Asset Criticality – Event (does not consider dependencies)

Splunk – Base Risk Score; Threat List Weight – Asset (does not consider dependencies)

Table 11 – Inputs required for risk assessment in the reviewed SIEMs and focus of the risk evaluation


Approaches for risk evaluation that consider dependencies of elements in the infrastructure are clearly more adequate, though more complex to implement, as components are not isolated and risk propagates.

Approaches that focus on evaluating the risk of events are more suitable to support the SOC operation, because they provide information about what is happening in the real-time context. Evaluating the risk of assets allows focusing more on the infrastructure and the assets, concerning their risk exposure, vulnerabilities and incidents, and less on the real-time context of the operation.

Apart from reviewing basic risk assessment provided by the SIEMs we also reviewed a more complex process offered by XL-SIEM, using the Atos Risk Assessment Engine. This process evaluates risks considering previously defined patterns of risk.

3.3 Scientific literature review

The literature on hierarchical security risk assessment is limited. RiskM is a multi-perspective modelling method for fostering and facilitating the communication and collaboration among stakeholders during the IT risk assessment process (Strecker et al., 2011). This method is supported by a modelling language which represents all key concepts, objects and relationships between them, such as Risk, Impact Measure, and Uncertainty, so that the method remains comprehensible. The multi-perspective view is divided into three different perspectives: IT Operations, Business Process, and Strategic level. Each perspective represents a different stakeholder and a different level of abstraction of the IT risk.

The method also has a process model covering the three main phases of risk assessment (risk identification, risk analysis, and risk evaluation), indicating how each phase should proceed to obtain the best view, not just of each phase, but also of the IT structure of an entity.

RiskM recommends a two-phased process to evaluate risk, which starts with a bottom-up approach and is completed by a top-down stage. We have adapted this two-phase concept to identify the assets of the organization.

A methodology that addresses risk dependencies and their impact on IT projects during an IT management process is presented in (Kwan, 2010). The author concluded that current methodologies address risk management in IT poorly because they consider risks as consequences of independent events, thus leading to an inadequate identification and management of these risks. To solve this problem, a new management methodology was proposed. This methodology redefines the risk management process and defines processes to evaluate, react to, monitor, and control risk dependencies, by introducing a novel set of practices and types of dependencies. A dependency is a relationship between two different risks, each characterised by an Impact and a Probability of occurrence, and can be of three different types.


In addition, (Kwan, 2010) presents a set of three methods to calculate the effects of the risk dependencies, namely the conservative method, the optimistic method, and the weighted method, as well as new metrics to monitor and control the risks.

The analysis of the scientific literature revealed that there is a gap between current risk assessment methods and a multi-level risk assessment that allows a more precise understanding of risk.

3.4 A Multi-Level Model for Risk Assessment in SIEM

This section presents the general concepts of the model. We begin by introducing the structure of the model followed by the description of the characteristics of each layer, where we indicate the type of assets each layer contains. Then, we describe the possible types of dependencies that might exist between the elements of the model.

3.4.1 Structure of the model

The structure of this model is divided hierarchically in three levels of decision making and it has three main objectives: calculate assets' risk, supply additional information on each asset, and support the decision-making process.

To calculate the assets' risk, the model divides the assets into three layers: hosts, applications, and services. The approach to assess the risk is a bottom-up approach, meaning that to be able to assess a service, it is necessary to assess all hosts and applications that are supporting it first.

The layers of the model were designed to map different levels of decision making, which must be considered in isolation due to the nature and complexity of each. The objective in this hierarchical structuring and mapping is to enhance the decision-making support at each level.

The lowest, operational level of decision making coincides with the hosts' layer, where the concern is with more technical details about the hosts, their management, and the IT infrastructure itself. The level of decision making that coincides with the applications' layer is very similar to the previous one, although the abstraction from the IT technical details and infrastructure is more evident, which allows the focus to start shifting towards business aspects. However, most C-level managers are more concerned with the business and less focused on operational technical issues.

Since there is a necessity of having assets described in a sufficiently abstract manner, at a strategic level, to improve the communication between the security managers and C-Level managers, this model includes a strategic layer of decision making for business functions, which is represented by the services' layer.


The model can facilitate the communication between security analysts and managers, and C-level managers, and can also improve the process of decision making for each layer. By providing a risk score for each asset in each layer, managers can determine which risks should be treated with more or less priority at that layer, thus improving the efficacy of management of that layer.

The risk assessment for each asset has three strands: vulnerabilities, dependencies, and incidents. The vulnerabilities strand assesses the security anomalies intrinsic to the asset itself, while the dependencies strand assesses the impact of other related assets to the asset currently under evaluation. Finally, the incidents strand assesses the impact of events with an abnormal pattern.

The assessment of risk was not based on a probabilistic model, due to the difficulty of determining the likelihood of a vulnerability being exploited. Instead, we used a model based on numerically scoring the severity of vulnerabilities and incidents, as well as the asset valuation, to assess the risk score of each asset. Such numerical scores, reflecting the severity of a vulnerability or an incident, are commonly used by organisations to assess them and to prioritise the vulnerability resolution process.

Complementarily, Chapter 4 will address risk assessment models based on probability estimation.

3.4.2 Characteristics of the layers

The proposed model aims at assessing risk in different levels of an organization, creating a global and detailed vision of the security of the information systems and the respective assets. The model has a hierarchical structure being composed of three layers of assets: Hosts, Applications, and Services, where this last layer has a holistic view of the other ones.

The Hosts layer, the lowest level layer, consists of the set of all physical assets. These physical assets can be servers or virtualized servers, personal computers, routers, switches, firewalls, and others. At the Applications layer, the set of assets includes all kind of software, e.g., middleware, web services, or websites, which supports the organization operation and business, as well as its non-profit services. Lastly, the Services layer represents the abstract assets that characterize a set of actions or functions that are supported by applications and hosts, to maintain the objectives of the organization.

3.4.3 Types of dependencies

A dependency is a relationship between two assets that can be either unidirectional or bidirectional. Since the model is hierarchically divided into layers, a dependency can be intra-layer or inter-layer: an intra-layer dependency is between assets on the same layer, for instance when a host receives information from others, and an inter-layer dependency is between an asset and another from a lower-level layer, for instance when an application executes on a given host. A dependency can also be seen from the opposite perspective: if asset A depends on asset B, then asset B supports asset A.


Figure 13 represents the three types of dependencies considered between assets: part (a) represents a one directional intralayer dependency, part (b) a bi-directional intralayer dependency, while part (c) represents an interlayer dependency.

Figure 13 – Types of dependencies between assets

3.4.4 Identification of assets and dependencies

The process to identify the assets and the dependencies between them is divided into two phases: a bottom-up phase and a top-down phase.

The purpose of the bottom-up phase is to identify assets supported by applications that have vulnerabilities or had incidents, or supported by hosts that have vulnerabilities or had incidents. To accomplish this purpose, the phase has three steps.

The first step is to identify hosts that have vulnerabilities or a history of incidents, based on a list of vulnerabilities/incidents. The second step consists in finding all applications that are supported by hosts identified as vulnerable. Finally, the third step is to identify all services that are supported by applications identified in the previous step. By the end of the third step, all services that are supported by vulnerable assets should have been identified.

The top-down phase aims at identifying the remaining applications and hosts that support the services that were identified in the previous phase. This phase has three steps as well and the first one is to identify all applications that are supporting each service. The second one is to find all hosts that are supporting each application. Finally, the third step is to identify all assets and their dependencies.

This process can be extended to identify all assets, with or without vulnerabilities or past incidents, from the bottom-up phase. This extension is not indispensable to implement the model, but applying it guarantees that all assets that have vulnerabilities/incidents, or relationships with vulnerable assets, are identified, creating a realistic risk setting. Figure 14 describes the two phases in detail, in six steps, and a sketch of the process is given after the figure.



Figure 14 – Assets and dependencies discovery process
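The following Python sketch illustrates the two-phase discovery process of Figure 14, assuming that dependencies are available as a simple mapping from each asset to the assets that support it; the asset names and data structures are hypothetical and not part of any DiSIEM component.

```python
# Illustrative sketch of the two-phase asset and dependency discovery process.
# "supported_by" maps each asset to the set of assets that support it.

supported_by = {
    "billing-service": {"billing-app"},
    "billing-app": {"host-01", "host-02"},
    "intranet-service": {"wiki-app"},
    "wiki-app": {"host-03"},
}
flagged_hosts = {"host-01"}  # hosts with vulnerabilities or a history of incidents

def invert(dependencies):
    """Turn 'supported_by' into 'supports': asset -> assets that depend on it."""
    supports = {}
    for asset, supporters in dependencies.items():
        for supporter in supporters:
            supports.setdefault(supporter, set()).add(asset)
    return supports

supports = invert(supported_by)

# Bottom-up phase: from flagged hosts, climb to the applications and services they support.
flagged_apps = {a for h in flagged_hosts for a in supports.get(h, set())}
flagged_services = {s for a in flagged_apps for s in supports.get(a, set())}

# Top-down phase: from those services, collect every application and host supporting them.
in_scope = set(flagged_services)
frontier = set(flagged_services)
while frontier:
    frontier = {s for a in frontier for s in supported_by.get(a, set())} - in_scope
    in_scope |= frontier

print(sorted(flagged_services))  # ['billing-service']
print(sorted(in_scope))          # ['billing-app', 'billing-service', 'host-01', 'host-02']
```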

3.4.5 Risk evaluation

In the context of DiSIEM, we adopted a quantitative approach to risk assessment as this type of approach allows a better differentiation of critical situations of assets requiring treatment (resolution of vulnerabilities).

For each asset, regardless of its layer, a risk score should be computed. This score includes two components: intrinsic, and imported risk. The intrinsic risk is the amount of risk originated from the existing issues on the asset itself, while the imported risk is the amount of risk inherited from other assets due to the dependencies from them. In order to weigh both types of risk and to grant a total risk of the asset, the assessment is based on three variables: vulnerabilities, dependencies, and incidents.

The vulnerabilities variable represents the risk of an asset regarding the vulnerabilities it presents and is part of the intrinsic risk. The dependencies variable assesses the risk of assets supporting the asset, and corresponds to the imported risk. The incidents variable evaluates the impact of events having occurred on the asset that can jeopardize its security and can be classified as intrinsic risk as well.

Consider 𝐽 the set of existing assets. The risk score of a generic asset 𝑗, 𝑗 ∈ 𝐽 is computed by a weighted sum of three risk components (variables), as represented by Equation (8).


RiskScore_j = WeightedSum(VV_j, DV_j, IV_j)    (8)

where

VV_j = value of the vulnerabilities variable for asset j,

DV_j = value of the dependencies variable for asset j,

IV_j = value of the incidents variable for asset j, j ∈ J.

The model makes several assumptions, which are described in the following. Any risk score value, including the score value of any of the involved variables, should lie within an interval established in advance, with zero as the minimum score value and a predefined value as the maximum, e.g., 10, 100, or 200. This maximum risk score is set by the organization and defines a risk scale that is linear.

The function 𝑊𝑒𝑖𝑔ℎ𝑡𝑒𝑑𝑆𝑢𝑚 in the equation indicates that each variable has a specific weight attributed where the sum of all variables' weights is equal to 1.

The risk score of a service should only consider the value of the dependencies variable, since services do not have vulnerabilities or incidents of their own. For the Hosts and Applications layers, all variables are considered.
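As a minimal illustration, the sketch below computes Equation (8) as a weighted sum; the weight values are illustrative (the model only requires that the weights sum to 1), and the numeric inputs reuse the worked examples given later in this section.

```python
# Minimal sketch of Equation (8). Weights are illustrative; the model only requires
# that they sum to 1.

def risk_score(vv, dv, iv, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the vulnerabilities, dependencies and incidents variables."""
    w_vv, w_dv, w_iv = weights
    return w_vv * vv + w_dv * dv + w_iv * iv

def service_risk_score(dv):
    """Services have no vulnerabilities or incidents: only the dependencies variable counts."""
    return risk_score(0.0, dv, 0.0, weights=(0.0, 1.0, 0.0))

print(risk_score(18.75, 8.7, 33.3))  # a host or application, on a 0-100 scale
print(service_risk_score(25.0))      # a service
```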

3.4.5.1 Vulnerabilities variable score

Equation (9) represents the score of VV_j, the vulnerabilities variable for asset j, where j ∈ J. The equation considers the sum of the scores of all vulnerabilities present on the asset, weighted to give a different importance to the highest scored vulnerability. The sum is normalised to the interval [0;UL], where UL is the upper limit of the scale interval and MaxScoreV is the maximum risk score value possible for the vulnerabilities variable of an asset, considering all the vulnerabilities in the asset.

VV_j = (w_o ∗ (Σ_{i ∈ Vulns(Asset_j), i ≠ h} VulnScore_ij) + w_h ∗ VulnScore_hj) ∗ UL / MaxScoreV    (9)

where

Vulns(Asset_j) is the set of indexes of open vulnerabilities in asset j, j ∈ J;

VulnScore_ij is the risk score of vulnerability i in asset j, i ∈ Vulns(Asset_j), j ∈ J;

h is the index of the highest scored vulnerability in Vulns(Asset_j), j ∈ J;

w_h and w_o are the weights for the highest scored vulnerability and for the sum of the scores of the others (that are not the highest), respectively, with w_h + w_o = 1;


𝑀𝑎𝑥𝑆𝑐𝑜𝑟𝑒𝑉 is the maximum total risk score of vulnerabilities on an asset; and

𝑈𝐿 is the upper limit of the risk scale interval.

The weights w_h and w_o provide flexibility to the model, because they allow more or less importance to be given to the highest scored vulnerability in comparison to the others. This differentiated level of importance makes it possible to focus only on the most severe vulnerability, by setting w_h = 1 and w_o = 0. To give equal importance to all vulnerabilities, these weights should be set equal.

Scoring a vulnerability

To evaluate the risk score of an open vulnerability in an asset, VulnScore_ij, for vulnerability i in asset j, i ∈ Vulns(Asset_j), j ∈ J, three elements are considered: the vulnerability severity score, the vulnerability persistence, and the valuation of the asset where the vulnerability exists.

The vulnerability persistence quantifies the amount of time during which the vulnerability has remained open. While the vulnerability remains open, attackers have the opportunity to probe it in more diverse ways, making exploitation more likely. This factor is used to increase the risk score in situations where a vulnerability is open for a long period. The increased risk should alert the security managers to pay special attention and resolve that vulnerability.

The persistence of vulnerability i in asset j, i ∈ Vulns(Asset_j), given by Equation (10), measures the relative amount of time the vulnerability has been open, compared with the maximum amount of time the organisation conceives as acceptable for a vulnerability to remain without resolution. If a value greater than 1 is obtained, it is capped at 1, so the persistence varies in ]0;1].

VulnPersistence_ij = min(NoD_ij / MaxNoD, 1)    (10)

where

𝑁𝑜𝐷𝑖𝑗 = number of days the vulnerability 𝑖 is open in asset 𝑗, 𝑁𝑜𝐷𝑖𝑗 ≥ 1, and

𝑀𝑎𝑥𝑁𝑜𝐷 = maximum number of days a vulnerability can remain open.

Equation (11) presents the calculation of the vulnerability score, relating the vulnerability severity score, the vulnerability persistence, and the business value of the asset.

VulnScore_ij = VulnSeverity_i ∗ (1 + VulnPersistence_ij) ∗ BusValue_j    (11)

where VulnSeverity_i is the vulnerability severity score according to the vulnerability scoring system adopted; VulnPersistence_ij is the persistence of vulnerability i in asset j; and BusValue_j is the business value of asset j, in the system for evaluation of assets used by the organisation.

As the purpose of the vulnerability persistence component is to stress the risk of vulnerabilities with a long persistence, a factor of (1 + VulnPersistence_ij) is used to ensure that VulnScore_ij ≥ VulnSeverity_i.

Regarding the business value of the asset, it should be expressed in a quantitative scale adequate for the organisation, assuming that an asset with a low business value is scored with the value 1.

Organisations often rely on qualitative scales to categorise their assets and, in those cases, a corresponding quantitative scale has to be established. Any number of levels can be used, as long as the scale is consistent with the one used by the organisation for its whole risk assessment process. The value of the asset should reflect the relevance and the number of business processes supported by that asset. Here, the business value also contemplates the direct and indirect business impact of a security incident involving the asset.

Example of application

Consider an organisation that adopted as scoring system the CVSS v3.0 base score (First, 2017). Let us suppose that one of its hosts has the OpenSSL Heartbleed vulnerability (CVE-2014-0160) open for 3 months. The organisation established that the maximum number of days a vulnerability can remain open is 365 (one year). The persistence for this vulnerability is then VulnPersistence = 90/365 ≈ 0.25. As the CVSS base score of CVE-2014-0160 is 7.5, in our model this severity score is increased by 25%, leading to a value of 9.375. Finally, if the valuation of the asset was scored with the value 2, this vulnerability score doubles to VulnScore = 18.75. Obviously, the semantics of this score depend entirely on the organisation's definition of its risk assessment criteria. Moreover, the definition of a maximum time allowed for resolving a vulnerability sets a limit above which the criticality is so high that no further differentiation is needed.

Whenever in Equation (10) the persistence reaches the maximum of 1, because the vulnerability is open for one year or more, the factor weighting persistence in Equation (11) also reaches the maximum value of 2.
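The following short sketch implements Equations (10) and (11) and reproduces the Heartbleed example above; the only difference is that the persistence is not rounded to 0.25, so the result is approximately 18.7 rather than exactly 18.75.

```python
# Sketch of Equations (10) and (11), reproducing the Heartbleed example above.

def vuln_persistence(days_open, max_days_open):
    """Equation (10): relative time the vulnerability has been open, capped at 1."""
    return min(days_open / max_days_open, 1.0)

def vuln_score(severity, days_open, max_days_open, business_value):
    """Equation (11): severity stressed by persistence and scaled by the asset's business value."""
    return severity * (1 + vuln_persistence(days_open, max_days_open)) * business_value

# CVE-2014-0160 (CVSS base score 7.5), open for 90 of a maximum of 365 days,
# on an asset with business value 2.
print(vuln_score(7.5, 90, 365, 2))
```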

Normalising the vulnerabilities variable to the common risk scale

The MaxScoreV parameter represents the maximum total risk score for vulnerabilities on an asset, which corresponds to an extreme situation the organisation can conceive, beyond which the risk cannot be tolerated. This value is obtained from several parameters defined by the organisation:

- the maximum score value for which the organisation accepts to maintain a critical vulnerability without resolution, e.g., a vulnerability with a score of 8 in a score range of 0 to 10, e.g. in CVSS (First, 2010);
- the maximum number of critical vulnerabilities that can exist simultaneously in an asset, without resolution;
- the maximum time the organisation tolerates the presence of a critical vulnerability without resolution (as a portion of one year); and
- the maximum value for assets valuation, in a numerical representation of the upper limit of the scale.

Equation (12) represents the MaxScoreV.

MaxScoreV = MaxSV ∗ 2 ∗ MaxNV ∗ MaxBVA    (12)

where

𝑀𝑎𝑥𝑆𝑉 = maximum score value of a vulnerability that is kept without resolution;

𝑀𝑎𝑥𝑁𝑉 = maximum number of open vulnerabilities with the score 𝑀𝑎𝑥𝑆𝑉 or higher;

𝑀𝑎𝑥𝐵𝑉𝐴 = maximum value for business valuation of assets.

In Equation (12), the factor 2 represents the maximum factor for persistence.
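The sketch below combines Equations (9) and (12) to normalise an asset's vulnerability scores to the common risk scale; the weights, the tolerance parameters and the example scores are illustrative choices, not values prescribed by the model.

```python
# Sketch of Equations (9) and (12): normalising the vulnerabilities variable to the
# common risk scale.

def max_score_v(max_sv, max_nv, max_bva):
    """Equation (12); the factor 2 is the maximum persistence factor."""
    return max_sv * 2 * max_nv * max_bva

def vulnerabilities_variable(vuln_scores, ul, max_score, w_h=0.7, w_o=0.3):
    """Equation (9): the highest VulnScore is weighted by w_h, the sum of the others by w_o."""
    if not vuln_scores:
        return 0.0
    highest = max(vuln_scores)
    others = sum(vuln_scores) - highest
    return (w_o * others + w_h * highest) * ul / max_score

# e.g. tolerate at most 5 open vulnerabilities scored 8 or more, on assets valued up to 4.
msv = max_score_v(max_sv=8, max_nv=5, max_bva=4)  # 320
print(vulnerabilities_variable([18.75, 6.0], ul=100, max_score=msv))
```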

3.4.5.2 Dependencies variable score

The dependencies variable DV_j, j ∈ J, where J is the set of assets, aggregates the risk scores of the assets supporting asset j. These values are already expressed on the risk score scale. Equation (13) represents the calculation of the dependencies variable score.

DV_j = w_a ∗ (Σ_{i ∈ Deps(Asset_j), i ≠ h} RiskScore_i ∗ relBusValue_i) + w_h ∗ RiskScore_h ∗ relBusValue_h    (13)

where

Deps(Asset_j) is the set of assets supporting asset j, j ∈ J;

RiskScore_i is the risk score of asset i, as computed by Equation (8);

h is the index of the asset with the highest risk score in Deps(Asset_j);

w_h and w_a are the weights for the highest scored asset and for the sum of the scores of the others (that are not the highest), respectively, with w_h, w_a ≥ 0 and w_h + w_a = 1; and

relBusValue_i is the relative contribution of asset i to the total business value of the assets supporting asset j,

relBusValue_i = BusValue_i / Σ_{k ∈ Deps(Asset_j)} BusValue_k

where BusValue_i is the business value of asset i, in the assets evaluation system of the organization.

For the dependencies variable, two components are considered: the risk scores of the supporting assets, weighted by the corresponding relative business value, and the maximum risk score of the supporting assets. Again, these components are weighted.

Example of application

Consider asset XYZ, which is supported by assets X, Y and Z, contributing 20%, 30% and 50%, respectively, of the total business value of the assets supporting XYZ. Consider that assets X, Y and Z have risk scores of 5, 40 and 0, respectively. The parameters w_h and w_a have values 0.7 and 0.3, respectively. The value of the dependencies component for the risk of XYZ is DV_XYZ = 0.3 ∗ 5 ∗ 0.2 + 0.7 ∗ 40 ∗ 0.3 = 8.7.
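The following sketch implements Equation (13) and reproduces the XYZ example above; the data structure holding the supporting assets is an illustrative choice.

```python
# Sketch of Equation (13), reproducing the XYZ example above (result 8.7).
# The dictionary maps each supporting asset to (risk score, relative business value).

def dependencies_variable(supporting, w_h=0.7, w_a=0.3):
    """The highest-risk supporting asset is weighted by w_h, the others by w_a."""
    if not supporting:
        return 0.0
    h = max(supporting, key=lambda a: supporting[a][0])
    others = sum(score * rel for a, (score, rel) in supporting.items() if a != h)
    score_h, rel_h = supporting[h]
    return w_a * others + w_h * score_h * rel_h

print(dependencies_variable({"X": (5, 0.2), "Y": (40, 0.3), "Z": (0, 0.5)}))  # 8.7
```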

3.4.5.3 Incidents variable score

The evaluation of the incidents variable for asset j, j ∈ J, includes two parts, as represented by Equation (14): the score of the incidents that occurred in the current month and the history of the past three months, weighted to suit the reality of the organization.

IV_j = WeightedSum(CurrentMonthScore_j, PreviousMonthsScore_j)    (14)

This variable represents the amount of risk that is associated with incidents in an asset. An asset that suffers many severe incidents propagates a certain level of risk, either because it is very vulnerable or because it is attractive for attacks, and this risk should be considered and inherited by the assets that depend on it. The current month component is the sum of the scores of the incidents that occurred in the current month in asset j, with a scale conversion applied to the sum using the intended scale and the maximum risk possible for incidents on an asset. This process is similar to the computation of the vulnerabilities variable and corresponds to the CurrentMonthScore_j factor in Equation (14).

Equation (15) describes thoroughly the process of assessing the current month score.

CurrentMonthScore_j = (w_o ∗ (Σ_{i ∈ Incs(Asset_j), i ≠ h} IncScore_ij) + w_h ∗ IncScore_hj) ∗ UL / MaxScoreI    (15)

Where

𝐼𝑛𝑐𝑠(𝐴𝑠𝑠𝑒𝑡𝑗) is the set of incidents in asset 𝑗, in the current month, 𝑗 ∈ 𝐽;

𝐼𝑛𝑐𝑆𝑐𝑜𝑟𝑒𝑖𝑗 is the risk score of incident 𝑖 in asset 𝑗, 𝑖 ∈ 𝐼𝑛𝑐𝑠(𝐴𝑠𝑠𝑒𝑡𝑗), 𝑗 ∈ 𝐽;

h is the index of the highest scored incident in 𝐼𝑛𝑐𝑠(𝐴𝑠𝑠𝑒𝑡𝑗), 𝑗 ∈ 𝐽;

𝑤ℎ and 𝑤𝑜 are the weights for the highest scored incident and for the sum scores of the others, respectively; with 𝑤ℎ, 𝑤𝑜 ≥ 0 and 𝑤ℎ + 𝑤𝑜 = 1;


MaxScoreI is the maximum total risk score of incidents occurring in an asset in a one-month period, and

𝑈𝐿 is the upper limit of the risk scale interval.

As with the vulnerabilities variable, using differentiated weights for the highest scored incident and for the others allows more importance to be given to the most severe incident.

Normalising the incidents variable to the common risk scale

MaxScoreI represents the possible maximum risk score for incidents in a month, similarly to the MaxScoreV parameter in Equation (9). The MaxScoreI parameter depends on the highest score value of an incident in the incidents scoring system in use by the organization, and on the maximum number of incidents with that highest score that could occur in a single month – in an extreme situation. Equation (16) represents MaxScoreI.

MaxScoreI = MaxSI ∗ MaxNI    (16)

where

MaxSI = maximum score value of an incident in the incidents scoring system;

MaxNI = maximum number of incidents with the score MaxSI that can happen in a single month.

The historical component PreviousMonthsScore_j assesses the impact of previous incidents and depends on the risk scores of incidents in the previous three months, as represented by Equation (17).

PreviousMonthsScore_j = WeightedSum(FMScore_j, SMScore_j, TMScore_j)    (17)

where

FMScore_j is the incidents risk score for asset j in the previous month, i.e., the value of IV_j one month ago;

SMScore_j is the incidents risk score for asset j two months ago, i.e., the value of IV_j two months ago; and

TMScore_j is the incidents risk score for asset j three months ago, i.e., the value of IV_j three months ago.

Scoring an incident

The assessment of incidents can be done using different scoring systems. Each organisation can adopt its own system. In practice, we propose to score an incident using the product in Equation (18).


IncScore = OperationalImpact ∗ ConsequenceSeverity ∗ SecClassification    (18)

The variables involved in this equation are common properties for assessing incidents and can take values according to, for instance, the ones proposed by ArcSight, as presented in Table 12.

Property – Possible values

Operational Impact – 0: No Impact; 1: No Immediate Impact; 2: Low Priority Impact; 3: High Priority Impact; 4: Immediate Impact

Consequence Severity – 0: None; 1: Insignificant; 2: Marginal; 3: Critical; 4: Catastrophic

Security Classification – 1: None; 2: Insignificant; 3: Marginal; 4: Critical

Table 12 – Properties for classifying incidents according to ArcSight

Example of application

Consider an organisation that adopted as its incidents scoring system the one described in Table 12. Consider an incident Inc with Operational Impact scored as 4, Consequence Severity scored as 4 and a Security Classification score of 4. This incident receives the maximum incident score, i.e., IncScore = 4 ∗ 4 ∗ 4 = 64. This is also the value of MaxSI. If the maximum number of incidents with this maximum score that can occur in a month, MaxNI, is 3, the value of the possible maximum risk score for incidents in a month, MaxScoreI, is 192. Let us also assume that the scale for risk scoring is [0;100].

Now consider that incident Inc, evaluated above, was the only incident that occurred in asset XYZ in the current month, and that the model parameters are configured so that the single incident receives the full weight. The risk component for the current month incidents score is then CurrentMonthScore_XYZ = 64 ∗ 100/192 = 33.3.

If asset XYZ did not suffer any incidents in the previous three months, then this value, weighted accordingly, is the final score for incidents in this asset; otherwise, the weighted component of the previous incidents' scores should be added.
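Finally, the sketch below implements Equations (15), (16) and (18) and reproduces the incident example above; the weights are set so that the single incident counts fully (w_h = 1), matching the example, and all names are illustrative.

```python
# Sketch of Equations (15), (16) and (18), reproducing the incident example above.

def incident_score(operational_impact, consequence_severity, sec_classification):
    """Equation (18), with property values taken from Table 12."""
    return operational_impact * consequence_severity * sec_classification

def max_score_i(max_si, max_ni):
    """Equation (16)."""
    return max_si * max_ni

def current_month_score(incident_scores, ul, max_score, w_h=1.0, w_o=0.0):
    """Equation (15): highest incident weighted by w_h, the sum of the others by w_o."""
    if not incident_scores:
        return 0.0
    highest = max(incident_scores)
    others = sum(incident_scores) - highest
    return (w_o * others + w_h * highest) * ul / max_score

inc = incident_score(4, 4, 4)            # 64, the maximum score under Table 12
msi = max_score_i(max_si=inc, max_ni=3)  # 192: at most 3 such incidents in one month
print(current_month_score([inc], ul=100, max_score=msi))  # 33.3...
```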


3.5 Summary

This chapter reviewed risk assessment in state-of-the-art SIEMs and proposed a model for multi-level risk assessment. This model is innovative with respect to the current state of the art of risk assessment in SIEMs, as it goes further than assessing events, allows risk to be evaluated on assets beyond applications and hosts, and considers dependencies and the business value of assets to the company.

The model considers three layers which are related to decision making at different levels of security management: services, applications and hosts. SOC analysts will be more concerned with assessing risk at the hosts and applications level, while assessing risk at the services level will provide executive managers with a clearer and broader view of the security status of the organisation. The model considers knowledge about the monitored infrastructure, taking into account which assets are interdependent thus allowing risk to propagate along the network, as well as information from the SIEM (incidents and, in some cases, vulnerabilities) and from third-parties (vulnerabilities). It also feeds itself as it considers a history of the risk associated with incidents.

The model hierarchy was conceived to make it possible to assess the security risk of the organisation as a whole or to analyse it by functional areas or business units. In addition, the possibility of comparing the amount of risk in different applications as well as services will enhance the SOC's capability to communicate risk to the upper levels of security management decision making. It will therefore be easier to identify which applications need special consideration with regard to monitoring and risk treatment, thus contributing to an improved security status. To conclude, the model will contribute to strengthening the security analytics capabilities.

This model will be revised and calibrated in the following tasks of WP3, based on feedback from task T3.1, whose results are presented in this report. In the Visual Analysis Platform work package (WP5), we will focus on the visualisation of the model results, on how they can best be conveyed to decision makers, and on their integration in the envisaged components.


4 Diversity metrics

In this chapter we discuss metrics that are of interest when several diverse security protection devices are used to protect a system, and the assessor wishes to find out how much better (or worse) the overall system security is when using diverse systems. (“Protection” here may mean directly stopping attacks or delivering to users of a SIEM reliable alerts of attacks.) This involves understanding how the strengths and weaknesses of diverse defences add up to the total strength of the system.

4.1 Introduction

This assessment problem first attracted attention in the 1980s about "design diversity" for software in novel, critical uses like fly-by-wire aircraft and nuclear reactor safety systems. Replicating, but diversifying, components is an appealing way of improving dependability and security: diversity reduces the risk of redundant components (including security defences) sharing the same systematic weaknesses, whether against accidental faults or attacks. But it soon became apparent that assessing gains from diversity would be difficult. Early, naive claims of statistical independence between failures of diverse redundant components proved unjustifiable. This realisation prompted more sophisticated probabilistic modelling of software design diversity, allowing for, and trying to quantify, correlation between the failures of diverse components. The Centre for Software Reliability (CSR), at City, University of London, has applied these approaches for modelling diversity not only to software reliability and safety, but also to common-mode failures due to any causes, human-machine systems, diverse arguments to improve confidence in dependability claims, and diversity for security. CSR has produced the bulk of novel research results on diversity for dependability in the last 30 years1, including:

- "Conceptual" models that – beyond proving that independent development does not ensure independent failures – inform decisions about how to pursue diversity effectively;

- Methods for assessing the reliability of a delivered system taking into account its use of diversity, avoiding the dangerous simplification of assuming failure independence;

- Empirical contributions, both running experiments and assessing data collected by others, which have informed the modelling work and demonstrate wide variations in the effectiveness of diversity in various contexts. Examples include studies of the effects of diversity (probability of common failure, and thus dependability or security gains to be achieved by applying diversity):

- between various ways of organising the process of developing, testing and deploying systems comprised of functionally redundant, complex software;

- between programming languages and program structure;

- in current off-the-shelf software: DBMSs, anti-virus products, operating systems;

- between human operators and decision support systems meant to reduce their error rates;

- between alternative ways of verifying software.

1 For an overview of this research and a list of publications see http://www.csr.city.ac.uk/diversity/.

In a sense, the most important result has been that judgments based solely on intuition about the efficacy of diversity are often wrong (and often dangerously so). Common a priori generalisations about gains from diversity are unjustifiable. Hence the importance of the “conceptual” models mentioned above. Especially when higher reliability is desired than can be demonstrated by feasible measurements, probabilistic models (1) clarify how to use the evidence and (2) give designers insight about which means applied to pursue diversity actually help to make it effective.

An important part of design for security is defence in depth: “layers” of defence that reduce the probability of successful attack. E.g., an attack needs to pass one (or more) firewalls and go undetected by one or more Intrusion Detection Systems (IDS) before it can exploit a vulnerability in a protected host to attack an asset. Guidance documents now advocate defence in depth as an obvious need; but their qualitative guidance ignores decision problems. The important questions are not about defence in depth being "a good idea", but about e.g. whether these specific three layers would improve security more than those two; and about – if possible – quantifying these security gains.

Crucially, these questions about defence in depth again concern, in fact, diversity: layered defences should be diverse in their weaknesses. Any attack that happens to defeat one defence should with high probability be stopped or detected by some other one2. Diversity and defence in depth are two facets of the same defensive design approach, which this research is meant to support.

In discussing metrics for diversity, one needs to be aware that the word "diversity" is commonly used, confusingly, for both the means used and the results sought: pursuing "diversity" between parts of a system (e.g., using – as layers of defence or as redundant sensors – components produced by different manufacturers using different design principles) is the means through which one pursues some useful results of "diversity" in their behaviour – specifically, a tendency not to fail in the same situations. The two forms of "diversity" are obviously related, but in non-obvious and sometimes counter-intuitive ways: one can expect two identical IDSs analysing the same traffic to routinely fail to detect the same attacks and give false alarms on the same innocuous traffic; using diverse IDSs will normally reduce the probability of such "common failures"; but it is not a priori clear by how much.

2 Some authors use the term "defence in breadth" for defences that are complementary in being meant for different types of attacks ("covering a broader front"), rather than "happening to" differ in how well they cope with them. In practice there is no sharp boundary but a range of "weaknesses" of defences, from intentionally not covering certain attacks to accidentally doing so, with deterministic or random effects. We will study effects of diversity on any combinations of these issues.


Ideally, the design of systems built from diverse parts (this "design" meaning, e.g., choosing sets of IDSs and specifying how their results are to be "aggregated") should be driven by measures of "diversity in the results". However, metrics of "diversity in the means" (e.g., whether two IDSs are based on different principles so as to be likely to have different weaknesses, implemented in ways that are not likely to cause similar bugs, developed by vendors not likely to have used the same faulty software components, etc.) are useful for designing such systems when hard data about diversity "in the results" are inadequate for driving decisions. In security, although we can get these hard data, this is expensive and always affected by the knowledge that measures of effectiveness against past attacks are imperfect predictors of effectiveness for future attacks: therefore, even when we have these hard data about "diversity in the results" we will probably still need to give some weight to metrics about "diversity in the means". E.g. even if we discovered that two IDSs have so far always only failed in exactly the same circumstances, and thus provided no useful "diversity in the results", we may want to keep them both because we know them to be radically diverse (e.g. we suspect they have backdoors, but inserted at the behest of two different and reciprocally hostile governments).

Issues of measuring or estimating diversity arise in at least these scenarios:

- helping a designer or operator of multi-layer defences to choose a good configuration of them, by documenting the effects of alternative configurations on overall effectiveness. Metrics allow comparison of different configurations, perhaps turning sensors and tools on or off, changing their configuration or, where this is not possible, recombining and changing the rulesets and configurations used, or observing improvement over time;

- deciding what "aggregation rules" a SIEM should use to condense the outputs of multiple alert systems (e.g. IDSs) into a single "alert/no alert" or "green/yellow/red" or similar piece of information for an operator (hiding from the operator the complexity of the underlying pattern of discordant outputs from these various sources);

- given a decision that the SIEM should reveal to the operators at least some of these multiple sources of alerts, so that human judgement can be used to resolve difficult situations, deciding how they should be presented in a convenient aggregate form. Operators may be easily misled by raw, complex patterns, e.g. tend to trust a majority among the IDSs even though that majority can be wrong and could be guessed to be so. Additional information might be given to help discriminate majorities due to lack of diversity.

SIEMs already expect diversity from the sensors, and are able to "correlate" or "aggregate"3 event reports from firewalls, anti-virus and IDS software, possibly a signature-based IDS run alongside an anomaly-based tool, as well as system and application logs. In contrast, running several functionally identical sensors in tandem produces a different form of diversity, which can also increase the reliability of a system. However diversity is achieved, the aim is to reduce the likelihood of problems. Modelling can be used to assess the effectiveness of diversity in a system, even when it cannot produce accurate forecasts, for example due to the difficulty in measuring parameters for the models.

3 In this text, the verbs "aggregate", "correlate", "vote" and "adjudicate" are used as approximate synonyms. They are technical terms from different areas of research, with differences of meaning that are irrelevant for the present purposes.

Given two or more IDSs (or any other defence components), useful measures are then the probabilities of joint failures among all the components: for instance, given components A, B and C, the probability of A giving a false negative error while B and C respond correctly, of A and B giving an FN while C responds correctly, and so on. Sometimes it is useful to summarise such measures into, e.g., measures of correlation among the failures of these different systems (again divided between FN and FP) and covariance between their coverage of different classes of attacks. This allows aggregation rules to be derived for a SIEM receiving the outputs of various detectors, taking into account the likelihood of failure for specific circumstances.
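As a minimal illustration, the following Python sketch estimates such joint false-negative probabilities from a labelled record of attack events; the data layout and names are ours, chosen only for illustration:

from itertools import product

# Each row: (missed_by_A, missed_by_B, missed_by_C) for one attack event,
# True meaning that detector gave a false negative on that event.
observations = [
    (False, False, False),
    (True,  False, False),
    (True,  True,  False),
    (False, False, True),
    # ... one entry per observed attack
]

def joint_fn_probabilities(obs):
    """Estimate the probability of every joint false-negative pattern."""
    counts = {pattern: 0 for pattern in product([False, True], repeat=3)}
    for row in obs:
        counts[row] += 1
    n = len(obs)
    return {pattern: c / n for pattern, c in counts.items()}

for pattern, p in joint_fn_probabilities(observations).items():
    print(pattern, round(p, 3))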

By estimating these measures for categories of attack types and conditions of use (high network load, etc.), rather than aggregate measures over all these combined, more accurate estimations can be derived. In (Littlewood & Strigini, 2004) we stated “Much of the chance for security system designers to make well-guided choices depends on better empirical measurements of the actual effectiveness of their various defence methods and components. However, these will never be very accurate. Yet, design decisions should at least be consistent with the information that one does have. Explicit probabilistic modelling seems the only way for ensuring this consistency.”

Current SIEMs can generate metrics including reports of events per second along with other high-level statistics. Though this tells a company the volume of traffic analysed, it sheds no light on the performance of the overall protection system (i.e. the combination of the different protection systems such as IDSs, firewalls, anti-virus products, etc.). By digging into the number of alerts that truly merited further investigation, in conjunction with a count of security breaches that were missed by the detection tools, a bigger picture can be formed and the performance of the overall protection system assessed. Ideally, an improvement in performance would be hoped for over time. So-called "reliability growth models" (RGMs) (also referred to as security growth models when applied to security events) can be used to estimate the future security of the system, provided the events have been collected at the same level of granularity, ordered in time and labelled (i.e. for events that were alerted, whether they were security incidents or not (true positives or false positives), and for events that were not alerted, whether they should have been alerted or not (false negatives and true negatives)). The predictive accuracy of RGMs for combinations of detectors as well as individual detectors will be investigated, highlighting how the observed trends include the combined effects of the attack-defence arms race as well as evolution in the levels of diversity between detectors.
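As a sketch of the kind of reliability growth modelling referred to here (not the specific models that will be investigated), the fragment below fits a simple Goel-Okumoto mean-value function to the cumulative count of observed security failures over time and extrapolates it; the data points are invented:

import numpy as np
from scipy.optimize import curve_fit

# Cumulative number of security failures (e.g. missed attacks) observed by day t;
# illustrative data only.
t = np.array([5, 10, 20, 40, 60, 90, 120], dtype=float)
cumulative_failures = np.array([3, 6, 10, 15, 18, 20, 21], dtype=float)

def goel_okumoto(t, a, b):
    """Mean value function mu(t) = a * (1 - exp(-b * t))."""
    return a * (1.0 - np.exp(-b * t))

(a_hat, b_hat), _ = curve_fit(goel_okumoto, t, cumulative_failures, p0=(25.0, 0.01))

# Expected number of further failures in the next 30 days.
forecast = goel_okumoto(150.0, a_hat, b_hat) - goel_okumoto(120.0, a_hat, b_hat)
print(f"a={a_hat:.1f}, b={b_hat:.4f}, expected failures in next 30 days: {forecast:.2f}")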


4.2 Idealised scenario of diversity assessment when the data is labelled

Defence tools, in particular signature-based IDSs, have historically been known to generate many false positives, and recent empirical measurements (Shittu, 2015) confirm this fact. Furthermore, tools can give false negatives, meaning security problems can be missed. With labelled data, indicating how many alerts are true problems rather than noise, and how many events have been missed or are truly innocuous, an assessment can be made of the effectiveness of an IDS or other security controls, and of their combinations.
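A minimal sketch of such an assessment for a single detector, assuming each labelled event records the ground truth and whether the detector alerted (names and data are illustrative):

def detector_rates(events):
    """events: iterable of (is_attack, alerted) pairs for one detector."""
    events = list(events)
    tp = sum(1 for attack, alert in events if attack and alert)
    fn = sum(1 for attack, alert in events if attack and not alert)
    fp = sum(1 for attack, alert in events if not attack and alert)
    tn = sum(1 for attack, alert in events if not attack and not alert)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn,
            "sensitivity": sensitivity, "specificity": specificity}

labelled = [(True, True), (True, False), (False, False), (False, True), (False, False)]
print(detector_rates(labelled))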

4.3 Empirical assessment

Diverse IDSs have been assessed in the literature. As noted, defence in depth applies to an architecture with multiple security systems. The possible configurations of the diverse protection systems can be assessed to determine observed performance in terms of false negatives and false positives (Algaith, 2017), in order to estimate future performance. Algaith et al. consider conventional statistical measures for a binary classification system, finding sensitivity (true positive rate) and specificity (true negative rate), using 36 combinations of IDSs against three applications. Receiver operating characteristic (ROC) curves were plotted, using one point for each configuration, to find an optimal configuration. In the study, it was noted that optimality may take the cost of missed events (false negatives) or uninteresting alerts (false positives) into account. It is generally true that 1-out-of-2 systems – where the system gives an alert if and only if at least one of its two IDSs does – are better at detecting attacks, while 2-out-of-2 systems – where the system gives alerts only when both IDSs do – are better at correctly indicating benign traffic. This work has been extended to consider systems composed of more than two IDSs. For instance, other systems considered include 2-out-of-3, 3-out-of-5, and so on. As a simple example, rules could take into account whether certain patterns of "differences of opinion" between IDSs tend to be indicative of an attack or of innocuous traffic (e.g. "when IDSs A and B 'vote' alert while IDS C does not, this is likely to be an innocuous packet").
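The following sketch illustrates how 1-out-of-2 and 2-out-of-2 combinations of two IDSs could be scored on the same labelled events, reusing the detector_rates() helper sketched above; the event data are invented:

def combine_1oo2(a_alert, b_alert):
    return a_alert or b_alert        # alert if at least one IDS alerts

def combine_2oo2(a_alert, b_alert):
    return a_alert and b_alert       # alert only if both IDSs alert

# Each row: (is_attack, IDS A alerted, IDS B alerted), with labelled ground truth.
events = [(True, True, False), (True, False, False),
          (False, True, False), (False, False, False)]

for name, rule in [("1oo2", combine_1oo2), ("2oo2", combine_2oo2)]:
    combined = [(attack, rule(a, b)) for attack, a, b in events]
    print(name, detector_rates(combined))   # detector_rates() as sketched above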

In addition to assessment, labelled data can aid with decision support. In a specific operational environment, the SOC operators would want to use all the security systems at their disposal. The question is then what advice we can provide on the "adjudication" mechanism that is optimal for a given scenario of events (i.e. which configuration minimises the loss for a given system). Assuming that the organisation is able to cost the various events of interest then, based on previous observations, we can propose an adjudication mechanism that minimises these losses. The optimal adjudicator in this case can be much more refined than just using a conventional 1-out-of-N or r-out-of-N configuration mentioned above. For example, previous observations could indicate that when IDSs A, B and C alert at the same time on a given event, and all other protection systems remain silent, then this is more likely to be a false positive. When an event of the same type is observed in the future, the optimal adjudicator would "advise" the SOC operator that this is more likely a false positive and hence can be ignored. Importantly, these rules may not coincide with "common sense": empirical measures can, e.g., indicate situations in which a minority tends to be right and a majority wrong, with fine-grain discrimination that simpler rules like "adopt the majority view" cannot deliver.
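One way such an adjudicator could be derived from labelled history is sketched below: for every observed pattern of detector outputs it compares the expected cost of raising an alert with the expected cost of staying silent, under assumed unit costs for false positives and false negatives (all names, data and costs are illustrative):

from collections import defaultdict

COST_FP = 1.0    # assumed cost of investigating an uninteresting alert
COST_FN = 50.0   # assumed cost of missing a real attack

def learn_adjudicator(history):
    """history: iterable of (pattern, is_attack) pairs, where pattern is the
    tuple of detector outputs, e.g. (True, True, False) for IDSs A, B and C."""
    counts = defaultdict(lambda: [0, 0])              # pattern -> [benign, attack]
    for pattern, is_attack in history:
        counts[pattern][1 if is_attack else 0] += 1
    advice = {}
    for pattern, (benign, attack) in counts.items():
        p_attack = attack / (benign + attack)
        cost_alert = (1 - p_attack) * COST_FP         # expected cost of alerting
        cost_silent = p_attack * COST_FN              # expected cost of staying silent
        advice[pattern] = cost_alert <= cost_silent   # True = advise "alert"
    return advice

history = ([((True, True, False), False)] * 100 + [((True, True, False), True)]
           + [((False, False, True), True)] * 5)
print(learn_adjudicator(history))
# {(True, True, False): False, (False, False, True): True} - the "majority" pattern
# is advised to be treated as a likely false positive, the minority one as an alert.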

In addition to providing diversity assessment and decision support to the operators, we will also investigate visualisation techniques to display this information clearly to SOC operatives who may lack expertise in statistical measures and techniques, or simply to reduce the cognitive overload that may result from presenting too much information. We could configure an adjudicator that hides diversity from the operator, delivering just the benefits of the diversity of sensors, and we will research how to present diverse streams of detector outputs in a helpful, non-misleading way.

Furthermore, as mentioned previously, with labelled data Reliability Growth Models can be applied to predict the time to the next false positive or false negative, or their rate of occurrence, hence allowing security growth (or decay) to be predicted.

4.4 Practical issues with diversity assessment with unlabelled data

It is straightforward to compare data from tools that give alerts at the same level (e.g. at the Networking layer, or Layer 3 of the ISO OSI model), but some tools provide alerts at different levels (e.g. layer 3 from a network IDS, and layer 7 from a host resident anti-virus). It is therefore important to “normalise” data from different tools in such a way that allows “like for like” comparisons. If alerts are at the packet level from one tool and at the connection level from another, or even per file or user, then comparing measures such as sensitivity, specificity and accuracy could be misleading. The data must be transformed to the same “units”, and then counted per packet, connection or time period.
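As an illustration of such a normalisation step, the sketch below rolls packet-level alerts up to the connection level (identified here by a 5-tuple) so that they can be compared with connection-level alerts from another tool; the field layout is ours:

from collections import defaultdict

# Packet-level alerts: (src_ip, src_port, dst_ip, dst_port, proto, alerted)
packet_alerts = [
    ("10.0.0.5", 51234, "192.0.2.7", 443, "tcp", True),
    ("10.0.0.5", 51234, "192.0.2.7", 443, "tcp", False),
    ("10.0.0.9", 40000, "192.0.2.7", 80, "tcp", False),
]

def to_connection_level(packets):
    """A connection counts as alerted if any of its packets was alerted."""
    connections = defaultdict(bool)
    for src, sport, dst, dport, proto, alerted in packets:
        key = (src, sport, dst, dport, proto)
        connections[key] = connections[key] or alerted
    return dict(connections)

print(to_connection_level(packet_alerts))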

4.5 Techniques to help labelling data

With labelled data it is possible to generate the metrics highlighted in the sections above and perform the types of assessment highlighted there (sensitivity, specificity, etc.). Many journal papers report these metrics for artificial datasets, for example using honeypots, where the nature of the traffic was known, or else likely to be an attack. With the high volumes of real-world data generated in a SIEM, thought must be given to how data can be labelled and to the level at which the metrics will be calculated. Labelling terabytes of packet data is impractical. Proxy measures could be used, such as a count of the occasions on which the SOC team had to investigate a situation that had not been alerted by the SIEM, as a measure of false negatives. Again, we must be careful that the metrics are at the same level of analysis, so if the false negatives are being counted at a given level, the false positives should also be counted at the same level. These measures then allow us to calculate the false positive and false negative rates, which, together with the losses associated with these events, can then be used in an ROC-type analysis to select diverse configurations (or an adjudication mechanism) that minimise the losses. Furthermore, the labels themselves may not truly reflect ground truths, so some uncertainty in the labels could be taken into account; for example see (Lowne, 2010). They conclude, "Lack of labelled feedback is not incongruent to adaptive classification." They show how dynamically adaptive classification systems can be built even without fully labelled input data.
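As a small illustration of working with such proxy measures, the sketch below combines, per month and at a single level of analysis (connections), the count of alerts the SOC dismissed as uninteresting (a proxy for false positives) and the count of investigations not triggered by the SIEM (a proxy for false negatives), and turns them into a rate and an assumed expected loss; all counts and costs are invented:

# Per-month proxy counts, all at the connection level of analysis.
periods = [
    {"connections": 1_200_000, "dismissed_alerts": 340, "missed_investigations": 4},
    {"connections": 1_050_000, "dismissed_alerts": 290, "missed_investigations": 2},
]

COST_FP = 0.5      # assumed analyst cost per uninteresting alert
COST_FN = 10_000   # assumed cost per missed incident

for p in periods:
    fp_rate = p["dismissed_alerts"] / p["connections"]
    fn_count = p["missed_investigations"]
    loss = p["dismissed_alerts"] * COST_FP + fn_count * COST_FN
    print(f"FP rate: {fp_rate:.2e} per connection, "
          f"missed incidents: {fn_count}, expected loss: {loss:.0f}")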

4.6 Time-based metrics for diversity assessment

In some previous work, we studied the diversity in detection capabilities of anti-virus products (Stankovic, 2011). Apart from assessing the diversity between the products we also assessed the evolution of detection capabilities over time. No one anti-virus tool detected every malware instance, and sometimes individual tools would cease to spot a problem when their rule-base was updated. Usually, those viruses missed by such regression were caught by other tools – diverse setups improved detection capability over time. We also analysed the “at-risk time” for a system (periods of time when a malware is not detectable by an anti-virus product). We found that this time-at-risk can be significantly shortened when using diverse anti-virus products. Hence, on a SIEM, apart from metrics on the diversity that exists between products at any one point in time, it is useful to have metrics to track the evolution of the diversity between the different protection systems, and hence assess which combination of products is most useful for reducing the “at risk” time that a system may be in.
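As an illustration of the "at-risk time" idea, the sketch below computes, for each malware sample, the number of days from its appearance until at least one of the deployed anti-virus products could detect it, showing how a diverse deployment can shorten this window; products, dates and samples are invented:

from datetime import date

# First date on which each product could detect each malware sample
# (None = never detected within the observation window); illustrative data.
first_detection = {
    "malware_1": {"AV_A": date(2017, 3, 2), "AV_B": date(2017, 3, 10)},
    "malware_2": {"AV_A": None,             "AV_B": date(2017, 4, 1)},
}
appearance = {"malware_1": date(2017, 3, 1), "malware_2": date(2017, 3, 20)}

def at_risk_days(sample, products):
    """Days from malware appearance until the earliest detection by any product."""
    dates = [d for p, d in first_detection[sample].items() if p in products and d]
    if not dates:
        return None  # never detected by this set of products in the window
    return (min(dates) - appearance[sample]).days

for products in (["AV_A"], ["AV_A", "AV_B"]):
    print(products, [at_risk_days(s, products) for s in first_detection])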


5 Summary and Conclusions

In this report we presented a review of the current state of the art in industry and academia for security metrics. We summarised those metrics that are most suitable for integration with SIEMs. We focused on those metrics that allow SOC operatives to assess risk and the effectiveness of the security protection systems and controls that they use, and hence help them with decisions regarding operations security. In addition, we also provided metrics supporting managerial security decision-making. Two new metrics were proposed.

We presented an analysis of data we gathered from the industrial partners in the DiSIEM project regarding the metrics that they use in their SOCs. Based on the analysis of the responses from the industrial partners we identified a set of security metrics for which we will assess the feasibility of further integration in the components to be developed in the project.

We reviewed the state of the art on risk assessment in current SIEM solutions and then proposed a hierarchical model to assess multi-level security risk. This model aims at providing support for different levels of decision making concerning security operations and management: SOC analysts, middle-level IT management and senior management.

Finally, we presented a review of security “diversity” measures – i.e., how similar or different security protection systems are from each other in their ability to detect attacks, or avoid common vulnerabilities. These measures are important to help organisations with decisions on how to choose amongst different protection systems available, which combination of protection systems is yielding the best protection for a given scenario.

The metrics presented in this report will form a valuable input to the later deliverables in work package 3, which will be more concerned with probabilistic assessment of diversity for security. Deliverable 3.2 will present a detailed analysis of developed, evaluated, and validated probabilistic models built from the ideas presented in this deliverable. Deliverable 3.3 will update and refine the models and metrics deployed to the industrial partners based on those described above.

The security metrics, the risk assessment model and the diversity metrics presented here will be updated and revised for integration in the components that will be developed in the Visual Analysis Platform (WP5).


6 References

AlienVault. (2017). AlienVault USM Appliance User Guide. Retrieved July 24, 2017, from https://www.alienvault.com/documentation/resources/pdf/usm-appliance-user-guide.pdf

AlienVault. (2017a). AlienVault OSSIM. Retrieved July 24, 2017, from https://www.alienvault.com/products/ossim.

AlienVault. (2017b). AlienVault Unified Security Management. Retrieved July 24, 2017, from https://www.alienvault.com/products.

ArcSight. (2010). WhitePaper: Security Operations Metrics Definitions for Management and Operations Teams [White paper]. HP ArcSight, 44(0), 0–7.

Berinato, S. (2005). A few good information security metrics. Retrieved October 24, 2016, from http://www.csoonline.com/article/2118152/metrics-budgets/a-few-good-information-security-metrics.html

Bohanec, M. (2017). DEXi: A program for multi-attribute decision making. Retrieved May 2017, from http://kt.ijs.si/MarkoBohanec/dexi.html.

Bowen, P., Hash, J., & Wilson, M. (2007). Information security handbook: a guide for managers. In NIST SPECIAL PUBLICATION 800-100, NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY.

Butler, J. M. (2009). Benchmarking security information event management (SIEM) [White paper]. A SANS Whitepaper.

Cain, C. I. and Couture, E. (2011). Establishing a Security Metrics Program. Technical report, GIAC Enterprises.

Chuvakin, A. (2014). On SIEM Tool and Operation Metrics. Retrieved October 28, 2016, from http://blogs.gartner.com/anton-chuvakin/2014/06/17/on-siem-tool-and-operation-metrics/

CIS. (2010). The CIS Security Metrics - v1.1.0.

Cornell, C. (2015). Five Metrics You Should Be Recording for Incident Response. Retrieved October 28, 2016, from https://swimlane.com/five-metrics-for-incident-response/

DiSIEM Consortium. (2017). In-depth Analysis of SIEMs Extensibility. DiSIEM Project Deliverable2.1. February 2017.

First. (2017). Common Vulnerability Scoring System SIG. Retrieved October 22, 2017, from https://www.first.org/cvss/

Gordon, S. (2015). Operationalizing Information Security Putting the top 10 SIEM best Practices to Work - Process, Metrics and Technology Considerations


HP. (2017a). ArcSight ESM. Retrieved July 24, 2017, from https://software.microfocus.com/en-us/software/siem-security-information-event-management

HP. (2017b). Hewlett-Packard. Retrieved July 24, 2017, from http://www.hp.com.

IBM. (2015). Asset risk levels and vulnerability categories, IBM QRadar Security Intelligence Platform 7.2.6. Retrieved 24 July, 2017, from http://www.ibm.com/support/knowledgecenter/en/SS42VS_7.2.6/com.ibm.qradar.doc/c_qvm_view_scan_rslthosts_.html.

IBM. (2017a). IBM QRadar SIEM. Retrieved 24 July, 2017, from http://www-03.ibm.com/software/products/en/qradar-siem

IBM. (2017b). Security Intelligence. Retrieved 24 July, 2017, from https://www.ibm.com/security/security-intelligence/qradar/

IBM. (2017c). Scan investigations, IBM QRadar Security Intelligence Platform 7.2.6. Retrieved 24 July, 2017, from http://www.ibm.com/support/knowledgecenter/SS42VS_7.2.6/com.ibm.qradar.doc/c_qvm_scan_invest.html.

IBM. (2017d). QRadar Vulnerability Manager Security Software Integrations. Retrieved 24 July, 2017, from https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W746177d414b9_4c5f_9095_5b8657ff8e9d/page/Bigfix%20&%20Qradar%20Vulnerability%20Manager%20Security%20Software%20Integration/version/8f66dcf2-eff0-49d9-943b-cf1a35208928

ISO/IEC. (2011). ISO/IEC 27005:2011 - Information Security Risk Management. International Organization for Standardization and International Electrotechnical Commission.

Jäger, T. (2014). Asset and network modeling in HP ArcSight ESM and Express, Hewlett-Packard [PowerPoint Slides]. Retrieved 26 June, 2017, from https://h41382.www4.hpe.com/gfs-shared/downloads-304.pdf.

Jansen, W. (2009). Directions in security metrics research. Technical report, National Institute of Standards and Technology (NIST).

Julisch, K. (2009). A unifying theory of security metrics with applications. IBM Research–Zurich.

Kotenko, I. and Novikova, E. (2014). Visualization of security metrics for cyber situation awareness. In Availability, Reliability and Security (ARES), 2014 Ninth International Conference on, pages 506–513. IEEE.

Kotenko, I., Polubelova, O., Saenko, I., and Doynikova, E. (2013). The ontology of metrics for security evaluation and decision support in SIEM systems. In Availability, Reliability and Security (ARES), 2013 Eighth International Conference on, pages 638–645. IEEE.

Kwan, T.W. (2010). A risk management methodology with risk dependencies. The Hong Kong Polytechnic University. (People's Republic of China).

Muthukrishnan, S. M., & Palaniappan, S. (2016, May). Security metrics maturity model for operational security. In Computer Applications & Industrial Electronics (ISCAIE), 2016 IEEE Symposium on (pp. 101-106). IEEE.

NIST. (2012). NIST Special Publication 800-30 - Guide for Conducting Risk Assessments. NIST.

Payne, S. C. (2006). A guide to security metrics. SANS Security Essentials GSEC Practical Assignment Version 1.2e..

Rathbun, D., & Homsher, L. (2009). Gathering security metrics and reaping the rewards. SANS Institute, Oct.

Savola, R. M. (2007). Towards a taxonomy for information security metrics. In Proceedings of the 2007 ACM workshop on Quality of protection, pages 28–30. ACM.

Splunk. (2017a). Analytics-Driven SIEM Solutions. Retrieved 26 June, 2017, from https://www.splunk.com/en_us/solutions/solution-areas/security-and-fraud/siem-security-information-and-event-management.html

Splunk. (2017b). Risk Analysis Framework in Splunk ES. Retrieved 26 June, 2017, from http://dev.splunk.com/view/enterprise-security/SP-CAAAFBD

Splunk. (2017c). Analyze risk in Splunk Enterprise Security, S. Enterprise Editor. Retrieved 26 June, 2017, from http://docs.splunk.com/Documentation/ES/latest/User/RiskScoring

Splunk. (2017d). Download a threat intelligence feed from the Internet in Splunk Enterprise Security. Retrieved 26 June, 2017, from http://docs.splunk.com/Documentation/ES/4.7.2/Admin/Downloadthreatfeed

Splunk. (2017e). Risk Analysis with Enterprise Security 3.1. Retrieved 10 September, 2017, from https://www.splunk.com/blog/2014/08/12/risk-analysis-with-enterprise-security-3-1.html

Strecker, S., Heise, D., & Frank, U. (2011). RiskM: A multi-perspective modeling method for IT risk assessment. Information Systems Frontiers, 13(4), 595-611.

Tashi, I. and Ghernaouti-Helie, S. (2008). Efficient security measurements and metrics for risk assessment. In Internet Monitoring and Protection, 2008. ICIMP’08. The third International Conference on, pages 131–138. IEEE


Thiele, F. (2014). ArcSight Priority Formula [PowerPoint slides]. Protect 2014, Hewlett-Packard. Retrieved 26 June, 2017, from http://h41382.www4.hpe.com/gfs-shared/downloads-340.pdf.

Vaarandi, R. and Pihelgas, M. (2014). Using security logs for collecting and reporting technical security metrics. In Military Communications Conference (MILCOM), 2014 IEEE, pages 294–299. IEEE.

Vaughn, R. B., Henning, R., and Siraj, A. (2003). Information assurance measures and metrics-state of practice and proposed taxonomy. In System Sciences. Proceedings of the 36th Annual Hawaii International Conference on, pages 10. IEEE.

Whitman, M. E. (2014). Management of Information Security, 4th Edition. Cengage Learning.

WISER. (2016a). WISER D3.1 - CYBER RISK PATTERNS. May, 2016.

WISER. (2016b). WISER D5.2 – WISER REAL-TIME ASSESSMENT INFRASTRUCTURE. October, 2016.


7 Appendix – Metrics

We can divide the SOC capabilities into three main sectors: People/Management, where we evaluate the SOC team's work, training, structure and response to incidents, as well as the management process; Processes, where we monitor incidents, vulnerability cases, and incident analysis and resolution; and Technology, where we analyse the network infrastructure, vulnerability tracking, the SIEM infrastructure and log management. Security metrics must follow the same perspective and direction, so we structure all the gathered metrics into these three main topics.

7.1 People/Management

7.1.1 Governance

7.1.1.1 UA - User Activity (ArcSight, 2010)

Definition: The UA metric identifies the top users (usually the top 10) with the highest number of failed login attempts. This metric helps to detect (patterns of) malicious activity. Input data: the events containing the users and their failed login attempts. Output: a list of the ten users with the highest number of failed attempts and the corresponding counts. Suggested frequency: Daily.
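A minimal sketch of how this metric could be computed from authentication events exported by the SIEM; the field names are illustrative:

from collections import Counter

# Authentication events exported from the SIEM: (username, outcome)
events = [("alice", "failure"), ("bob", "failure"), ("alice", "failure"),
          ("carol", "success"), ("alice", "success")]

failed_logins = Counter(user for user, outcome in events if outcome == "failure")
for user, count in failed_logins.most_common(10):   # top 10 users by failed attempts
    print(user, count)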

7.1.1.2 PUA - Privileged Users Activity (ArcSight, 2010; Gordon, 2015)

Definition: The PUA metric computes the top privileged users with the highest number of logins. This metric helps the SOC team to detect abnormal activity from these users. Input data: the logs containing the successful login attempts of privileged users. Output: a list of the top 10 privileged users with the highest number of logins and the corresponding counts. Suggested frequency: Daily.

7.1.1.3 PETVI(a) - SOC’s Percentage of effort Time to resolve Vulnerabilities and Incidents

Definition: The PETVI metric calculates, as a percentage, the SOC team's effort time spent resolving vulnerabilities and incidents. This metric can be used together with the Efficacy metric (7.1.1.8) to obtain a view of the SOC team's performance. Input data: the total SOC team work time and the time the team devoted to resolving vulnerabilities and incidents. Output: percentage of the team's effort time spent resolving vulnerabilities and incidents. Suggested frequency: Monthly.


7.1.1.4 PETrV - SOC’s Percentage of effort time to resolve Vulnerabilities

Definition: The PETrV metric is a sub-metric of PETVI(a) (7.1.1.3), focusing only on the percentage of effort time spent resolving vulnerabilities. Input data: the total SOC team work time and the time spent resolving vulnerabilities. Output: percentage of the team's effort time spent resolving vulnerabilities. Suggested frequency: Monthly.

7.1.1.5 PETrI – SOC’s Percentage of effort time to resolve incidents

Definition: The PETrI metric is a sub-metric of PETVI(a) (7.1.1.3), focusing only on the percentage of effort time spent resolving incidents. Input data: the total SOC team work time and the time spent resolving incidents. Output: percentage of the team's effort time spent resolving incidents. Suggested frequency: Monthly.

7.1.1.6 PETVI(b) - SOC’s Percentage of effort Time to resolve Vulnerabilities and resolve Incidents

Definition: This metric is similar to PETVI(a) (7.1.1.3). Unlike PETVI(a), which considers only the vulnerabilities and incidents created and resolved in the current month, PETVI(b) considers the vulnerabilities and incidents resolved this month that were opened this month or in previous months. Input data: all the team's work time and the time spent resolving the vulnerabilities and incidents of that period (month). Output: the team's effort time spent resolving vulnerabilities and incidents in that period (month). Suggested frequency: Monthly.

7.1.1.7 NVR – Number of vulnerabilities cases by responsible

Definition: This metric measures the total number of vulnerability cases assigned to each responsible (owner) of an asset/application. It can be changed to provide only the number of vulnerability cases still to be resolved. Input data: the vulnerability cases of each responsible. Output: total number of vulnerability cases for each responsible. Suggested frequency: Monthly.

7.1.1.8 ERVIDENT (a) - Efficacy of resolution of incidents and vulnerabilities opened and resolved in the current month

Definition: This metric calculates the efficacy, in that month, of the SOC team and other teams involved in resolving incidents and vulnerabilities. The results should be kept as a history so they can be compared over the following months, thus allowing the team's line of effort to be observed and correlated. Input data: all the cases opened until that month and the cases closed in that month. Output: the ratio between the total cases resolved and the total opened cases for that period.

Ef = RC / TotalC,

where: Ef – efficacy of the team in resolving cases; RC – resolved cases in that period; TotalC – total cases in that period (cases resolved that month + cases open that month). Suggested frequency: Monthly.
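A minimal sketch of this efficacy computation, assuming each case carries its opening month and (optional) resolution month; names and data are illustrative:

def efficacy(cases, month):
    """cases: iterable of (opened_month, resolved_month or None) tuples.
    Returns Ef = RC / TotalC for the given month."""
    resolved = sum(1 for opened, res in cases if res == month)
    open_cases = sum(1 for opened, res in cases
                     if opened <= month and (res is None or res > month))
    total = resolved + open_cases
    return resolved / total if total else float("nan")

cases = [("2017-09", "2017-10"), ("2017-10", None), ("2017-10", "2017-10")]
print(efficacy(cases, "2017-10"))   # 2 resolved / (2 resolved + 1 open) = 0.67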

7.1.1.9 ERV - Efficacy of resolution of vulnerabilities

Definition: This metric calculates the efficacy, in that month, of the SOC team and other teams involved in resolving vulnerabilities. The results should be kept as a history so they can be compared over the following months, thus allowing the team's line of effort to be observed and correlated. Unlike the previous metric (7.1.1.8), this metric only concerns the teams' efficacy in vulnerability resolution. Input data: all the vulnerabilities of that period. Output: the ratio between the total number of vulnerabilities resolved and the total number of vulnerabilities opened in that period.

Ef = RC / TotalC,

where: Ef – efficacy of the team in vulnerability resolution; RC – resolved vulnerabilities in that period; TotalC – total number of vulnerabilities in that period (resolved vulnerabilities + opened vulnerabilities). Suggested frequency: Monthly.

7.1.1.10 ERIDENT - Efficacy of resolution of incidents

Definition: This metric calculates the efficacy, in that month, of the SOC team and other teams involved in resolving incidents. The results should be kept as a history so they can be compared over the following months, thus allowing the team's line of effort to be observed and correlated. Unlike metric 7.1.1.8, this metric only concerns the teams' efficacy in incident resolution. Input data: all the incidents of that period. Output: the ratio between the total number of incidents resolved and the total number of incidents opened in that period.

Ef = RC / TotalC,

where: Ef – efficacy of the team in resolving incidents; RC – resolved incidents in that period; TotalC – total incidents in that period (resolved incidents + opened incidents). Suggested frequency: Monthly.

7.1.1.11 ERVIDENT (b) - Efficacy of resolution of incidents and vulnerabilities resolved in the current month

Definition: This metric considers the incidents and vulnerabilities which were opened and closed in that month, and calculates the efficacy of the SOC team and other teams involved in their resolution. The results should be kept as a history so they can be compared over the following months, thus allowing the team's line of effort to be observed and correlated. Input data: all the cases of that period. Output: the ratio between the total cases resolved and the total opened cases for that period.

Ef = RC / TotalC,

where: Ef – efficacy of the team; RC – resolved cases in that period; TotalC – total cases in that period (resolved cases + opened cases). Suggested frequency: Monthly.

7.1.1.12 TAFBU - Top access failures by business unit (Gordon, 2015)

Definition: This metric focuses on determining the top ten access failures by business unit. Input data: a list of events. Output: the top ten access failures by business unit. Suggested frequency: Daily.

7.1.2 Security values

7.1.2.1 CU - Cost of Updates (ArcSight, 2010)

Definition: The CU metric calculates the cost of updates: the number (and required time) of signature, policy, application and other software updates. This metric helps the security manager to explain the effort and amount of time involved in updating the various security devices and agents. Input data: the number of software updates, the average length of time of each update over the period, and the unit cost of work and updates. Output: total cost of updates. Suggested frequency: Monthly.


7.1.2.2 RRSO - Rate of return for security operations (derived from (ArcSight, 2010))

Definition: The purpose of this metric is to show the importance of investment in security operations by comparing the overall cost of security operations to the losses due to security incidents. It is the percentage ratio between the total cost of incident resolution and the cost of security operations. Input data: the total number of incidents resolved, the average cost per incident and the total cost of security operations. Output: the percentage ratio between the total cost of incident resolution and the cost of security operations. Suggested frequency: Monthly.

7.1.3 Assets/Business values

7.1.3.1 AC - Asset Criticality (Kotenko et al., 2013)

Definition: The AC metric calculates a value to represent the impact for the organization resulting from the loss of an asset. The value can be quantitative or qualitative (Low, Medium, High). As assets we consider hosts (middleware, firewalls, IPS, IDS, databases) and applications. Input data: a list of the assets and information about them (what they supply to the company, their value for the company, their dependencies, etc.). Output: a value for the criticality of the asset. Suggested frequency: Monthly, or when there are changes in the list of assets.

7.1.3.2 BV - Business Value (Kotenko et al., 2013)

Definition: The BV metric calculates a value to represent the impact from the loss of a business or a service to the organization. The value can be quantitative or qualitative (Low, Medium, High). Input data: a list of businesses or services and information about them (their value for the company, their dependencies on other businesses, services, applications, etc.). Output: a value representing the impact of the loss associated with a business or service. Suggested frequency: Monthly, or when changes in businesses/services occur.

7.2 Processes

7.2.1 Incidents and vulnerabilities status

7.2.1.1 MTTR - Mean Time to Remediate (a known vulnerability and a reported incident) (derived from (Kotenko et al., 2013))

Definition: MTTR is the average time the team spends to resolve a 'problem' (here the 'problem' is the disjunction of known vulnerabilities and reported incidents). It allows assessing the efficiency of vulnerability/incident resolution and provides the manager with quantifiable information to request, if necessary, more personnel or equipment to improve (reduce) the MTTR. Input data: the date of discovery of every known vulnerability/incident and the date of its resolution. Output: the average number of days (or another unit of time) that the team spends to resolve a 'problem'. Suggested frequency: Daily.
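A minimal sketch of the MTTR computation over resolved problems, assuming discovery and resolution dates are available; dates are illustrative:

from datetime import date

# (discovery_date, resolution_date) for each resolved vulnerability or incident.
problems = [
    (date(2017, 10, 2), date(2017, 10, 6)),
    (date(2017, 10, 10), date(2017, 10, 11)),
    (date(2017, 10, 12), date(2017, 10, 20)),
]

def mttr_days(resolved_problems):
    """Average number of days between discovery and resolution."""
    durations = [(res - disc).days for disc, res in resolved_problems]
    return sum(durations) / len(durations)

print(mttr_days(problems))   # (4 + 1 + 8) / 3 = 4.33... days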

7.2.1.2 MTTRV(a) - Mean time to resolve a vulnerability

Definition: MTTRV is a particular case of MTTR and measures the average time taken to resolve a known vulnerability. Input data: the date of discovery of every known vulnerability and the date of its resolution. Output: the average number of days (or another unit of time) that the team spends to resolve a known vulnerability. Suggested frequency: Daily.

7.2.1.3 MTTRI(a) - Mean time to resolve an incident

Definition: MTTRI is another particular case of MTTR and consists in measuring the average time for the resolution of incidents. Input data: dates of the reported incidents and their resolution dates. Output: average number of days (can be other type of time measurement) that the organization spends to resolve an incident. Suggested frequency: Daily.

7.2.1.4 AOKVS - Age of the oldest known vulnerability and not resolved by severity

Definition: This metric computes the age of the oldest unresolved known vulnerability in each category of vulnerability severity. It helps the team to assess and prioritise action on vulnerabilities according to their age and severity category. Input data: the dates of all vulnerabilities which are known and not resolved. Output: the number of days (or minutes, hours, months, etc.) of the oldest unresolved vulnerability for each severity type. Suggested frequency: Daily.

7.2.1.5 NKVS – Number of known vulnerabilities and not resolved by severity

Definition: This metric measures the status of existing vulnerabilities by their severity levels, counting the total number of known unresolved vulnerabilities for each severity category. Input data: the dates of all the known unresolved vulnerabilities for each severity category. Output: the number of unresolved vulnerabilities by severity level. Suggested frequency: Daily.


7.2.1.6 NKUV – Number of known unresolved vulnerabilities by vulnerability type

Definition: The metric measures the total number of known unresolved vulnerabilities for each vulnerability type. With these values, the team can manage their efforts and resources to resolve vulnerabilities by knowing how many vulnerabilities are open, by vulnerability type. Input data: dates of open vulnerabilities for each vulnerability type. Output: total number of known unresolved vulnerabilities by vulnerability type. Suggested frequency: Daily.

7.2.1.7 NVMC – Number of vulnerabilities cases by month in each severity category

Definition: This metric lets the team know and report, for each month, the number of vulnerabilities identified in each severity category. Input data: the vulnerability cases identified in the current month for each severity category. Output: the total number of vulnerability cases identified in the current month for each severity category. Suggested frequency: Monthly.

7.2.1.8 NVTA – Number of vulnerabilities identified by tested asset

Definition: This metric computes the number of vulnerabilities identified for each tested asset. This metric can be changed to the number of vulnerabilities identified in each tested asset, by their severity category. A correlation of a set of results of this metric will show the most vulnerable tested assets. Input data: vulnerabilities identified for each tested asset. Output: total number of vulnerabilities for each tested asset. Suggested frequency: Monthly.

7.2.1.9 NVIM – Number of vulnerabilities identified and reported incidents, by month

Definition: This metric calculates the total number of 'problems' (vulnerabilities and incidents). It counts the total number of cases, regardless of whether they have already been resolved or are still to be resolved. The metric can be changed to count the total number of vulnerabilities identified and incidents reported which are not yet resolved, or those which are already resolved. These two additional metrics can be changed to provide the results for each month or for the global scenario. Input data: all the cases of the month. Output: the total number of cases, opened and closed, of the month. Suggested frequency: Monthly.

7.2.1.10 NRIM – Number of reported incidents by month

Definition: This metric is a particular case of the previous one. It counts the total number of reported incidents for each month. This metric can be changed to count only the number of reported incidents which are still to be resolved, or those which are already resolved. The metric can also generate sub-metrics for each type of incident (phishing, malicious attack, unauthorized access, etc.). Input data: the reported incidents of the month. Output: the total number of reported incidents of the month. Suggested frequency: Monthly.

7.2.1.11 NRIRO – Number of reported incidents at each region of operation

Definition: This metric computes the number of reported incidents for each region of operation. Input data: reported incidents. Output: total number of incidents reported by each region of operation. Suggested frequency: Monthly.

7.2.1.12 NRIVM – Number of resolved incidents and vulnerabilities by month

Definition: This metric calculates the number of incidents and vulnerabilities resolved by the SOC team and other teams. Input data: the incident and vulnerability cases resolved/closed (true positives) in that month. Output: the total number of cases resolved. Suggested frequency: Monthly.

7.2.2 Threat detection

7.2.2.1 TMA - Top malware activity (ArcSight, 2010; Gordon, 2015)

Definition: The TMA metric computes the top malware detected in the organization by criticality. Input data: reported incidents with malware activity. Output: the top (e.g. top five or top ten) malware activities detected inside the organization and their criticality. Suggested frequency: Daily.

7.2.2.2 TUS/PAS - Top unusual scans / probe activities by source (Gordon, 2015)

Definition: The unusual scans/probe activities metric calculates, by source, the top 10 (by occurrence) classified unusual scans and/or probe activities. This metric can be used to select which sources should be added to the blacklist. Input data: The events containing scans and probes. Output: a list of the top ten sources and their unusual scans and/or probe activities. Suggested frequency: Monthly.

7.2.2.3 ACC - Attacks classified by their criticality (derived from (Gordon, 2015))

Definition: This metric calculates the number of attacks made, by their criticality, against the vulnerable systems. It provides a view of the attackers and of how vulnerable the organization is. Input data: reports of incidents and/or events. Output: the number of attacks made, by their criticality, against vulnerable systems. Suggested frequency: Monthly.

7.2.2.4 TEE - Top Egress Event (ArcSight, 2010)

Definition: The TEE metric considers the SIEM events and calculates the top ten source IPs, destination IPs and destination ports for events with malicious activity leaving the organization, i.e. originating from within the organization. This metric helps the SOC team to analyse and identify patterns of malicious activity originating from the organization. Input data: communication events leaving the organization, provided by the SIEM, containing the source IPs, destination IPs and destination ports. Output: a list of the top 10 source IPs, destination IPs and destination ports for events leaving the organization. Suggested frequency: Daily.

7.2.2.5 TIE - Top Ingress Event (ArcSight, 2010)

Definition: The TIE metric is similar to the TEE metric and also uses the SIEM events. It calculates the top 10 source IPs, destination IPs and destination ports with malicious intent, focusing on communications in which the source is the internet and the destination is the organization. This metric helps the SOC team to analyse and identify patterns of malicious activity. Input data: events of communication from the internet to the organization, provided by the SIEM and containing the source IP, destination IP and destination port. Output: a list of the top 10 communication events from the internet to the organization. Suggested frequency: Daily.

7.2.2.6 TFA - Top Foreign attacks (ArcSight, 2010)

Definition: The TFA calculates the top 10 most severe attacks originating from foreign countries. Input data: security events (attacks) that lead to an incident, their severity levels and their origins. Output: a list of the top 10 most severe attacks. Suggested frequency: Daily.

7.2.2.7 TFC - Top Foreign Countries (ArcSight, 2010)

Definition: The TFC metric calculates the top 10 destination countries for communications from the organization and the top 10 source countries for traffic incoming to the organization. Input data: events containing the source and destination country. Output: the top ten destination countries and the top ten source countries. Suggested frequency: Daily.


7.2.2.8 FE - Firewall Entry (ArcSight, 2010)

Definition: The FE metric calculates the top external blocked sources which exceeded the permitted number of blocked sessions. Input data: the blocked IPs. Output: the top 10 external sources by blocked/permitted ratio. Suggested frequency: Daily.

7.2.2.9 TAFD - Top access failures by destination (Gordon, 2015)

Definition: This metric calculates the top ten destination access failures. Input: A list of events. Output data: The top ten access failures by destination (IP address or hostname). Suggested frequency: Daily.

7.2.3 Security status

7.2.3.1 PIS - Percentage of infected Systems (derived from (ArcSight, 2010; Kotenko et al., 2013))

Definition: The PIS metric tracks the occurrences of systems (or assets) infected by malware or affected by vulnerabilities. It calculates the percentage of infected systems in the organization, by malware infection or by independent vulnerabilities. Input data: the systems' names and their security status (infected or clean). Output: the percentage of infected systems, by malware infection or vulnerability type. Suggested frequency: Monthly.

7.2.3.2 NATM – Number of assets tested by month

Definition: This metric computes the number of assets which were tested to verify if they had vulnerabilities. Input data: assets tested in the respective month. Output: total number of assets tested for that month. Suggested frequency: Monthly.

7.2.3.3 AS - Attack Surface (Kotenko et al., 2013)

Definition: The AS metric estimates the potential for an attack to occur through the system's resources and their interdependencies. These resources can be critical or non-critical entry/exit points, channels, vulnerable subsets/applications of the system, untrusted data items sent, etc. The risk of the system is directly connected with the attack surface: if the attack surface increases, the risk also increases. Each resource contributes to the attack surface value through its Damage Potential-Effort Ratio. Input data: the names of the resources, the dependencies between them and the associated risk.

Output: the attack surface value, which can take one of two forms: (1) a risk value for the system (considering the risk of its sub-systems), expressed quantitatively or qualitatively; or (2) the percentage of the system that could be compromised by a possible attack, given the vulnerabilities and dependencies of the system and its sub-systems and the effort and damage of an attack. Suggested frequency: Monthly.
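
The per-resource contribution can be illustrated with the following sketch, which simply sums Damage Potential-Effort Ratios over resources; the field names and the flat aggregation (which ignores interdependencies) are simplifying assumptions.

def attack_surface(resources):
    # resources: list of dicts with numeric 'damage_potential' and 'effort'
    # scores per entry/exit point, channel or data item (illustrative structure).
    # Each resource contributes its damage potential-effort ratio to the total.
    return sum(r["damage_potential"] / r["effort"]
               for r in resources if r["effort"] > 0)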

7.2.3.4 SHS - Security "Health" Score (ArcSight, 2010)

Definition: The SHS metric computes a weighted sum of several statistics: antivirus statistics and logs, ingress and egress security events, cases opened (incidents and vulnerabilities), and security statistics about the system's devices and services. It provides a green/yellow/red indicator summarising the attacks and/or malicious activity affecting the IT devices/services. Input data: a list of the devices and their security events (attacks and/or malicious activity, which may or may not have been prevented). Output: a visual display of the IT devices/services' security status. Suggested frequency: Daily.
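
A minimal sketch of the weighted-sum and colour-mapping idea is shown below; the normalisation of the statistics to [0, 1] and the yellow/red thresholds are assumptions, since the deliverable does not fix them.

def security_health_score(statistics, weights, yellow=0.5, red=0.8):
    # statistics and weights: dicts keyed by statistic name (e.g. antivirus
    # alerts, ingress/egress events, open cases); statistic values are assumed
    # to be normalised to [0, 1].
    total_weight = sum(weights[name] for name in statistics)
    score = sum(weights[name] * value for name, value in statistics.items())
    score = score / total_weight if total_weight else 0.0
    # Map the weighted score to the green/yellow/red indicator.
    if score >= red:
        return score, "red"
    if score >= yellow:
        return score, "yellow"
    return score, "green"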

7.3 Technology

7.3.1 Performance

7.3.1.1 EPS - Events per second (SANS, 2009)

Definition: The EPS metric calculates the average EPS collected by the SIEM. This metric helps monitor the performance of all the SIEM infrastructure's components, aiding in the detection of overloaded and unresponsive components. Input data: EPS for each SIEM device and collector. Output: average (daily) EPS for each SIEM device, for each collector, and for the whole SIEM infrastructure. Suggested frequency: Daily.

7.3.1.2 PE - Peak Event (SANS, 2009)

Definition: The PE metric calculates the average of the peak event (PE) rate for each SIEM device, each collector and the SIEM overall. The PE metric provides quantifiable information about the performance of the devices under extreme conditions. By adding the Peak Event of each device, or the Peak Event of each collector, the security manager obtains an overall PE perspective. Input data: the event flow of each device under extreme conditions; for a more accurate average value, at least 90 days of input data are necessary. Output: the maximum number of events per second under extreme conditions, for each SIEM device, each collector, and the SIEM itself. Suggested frequency: Daily.

7.3.1.3 NE - Normal Event (SANS, 2009)

Definition: The NE metric characterises the normal behaviour of each SIEM device, each collector and the SIEM itself, in terms of events received. The NE metric offers quantifiable information about the performance of the devices under normal activity. By adding the Normal Event of each device, or the Normal Event of each collector, the security manager obtains an overall NE perspective. Input data: the event flow of each device and collector under normal activity; for a more accurate average value, at least 90 days of input data are necessary. Output: the number of events per second in a normal state of operation, for each SIEM device, each collector, and the SIEM itself. Suggested frequency: Daily.

7.3.1.4 CELV - Changes of the event log volume (Cornell, 2015)

Definition: This metric determines whether a device is sending an abnormal number of events, enabling the team to detect and resolve problems with the device, the connector, or the communication between the two more quickly. It uses the three metrics above (EPS, PE and NE) to identify such devices. Input data: the results of the EPS, PE and NE metrics. Output: the names of the devices sending an abnormal number of events. Suggested frequency: Daily.
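
One possible realisation, assuming per-device dictionaries of current EPS, the NE baseline and the PE value, and an illustrative 50% tolerance band around the baseline, is sketched below.

def abnormal_devices(current_eps, normal_eps, peak_eps, tolerance=0.5):
    # All arguments are dicts keyed by device name. A device is flagged when
    # its current rate exceeds its historical PE or deviates from its NE
    # baseline by more than the tolerance band.
    flagged = []
    for device, eps in current_eps.items():
        ne = normal_eps.get(device, 0.0)
        pe = peak_eps.get(device, float("inf"))
        if eps > pe or eps < ne * (1 - tolerance) or eps > ne * (1 + tolerance):
            flagged.append(device)
    return flagged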

7.3.1.5 EM - Events Management (derived from (ArcSight, 2010))

Definition: The EM metric calculates the number of raw events, uncorrelated events, correlated events and annotated events managed within the SIEM infrastructure. By combining these values the security manager can extract two valuable pieces of information: 1) the importance and performance of the SIEM in the organization; 2) the ability of the SIEM to reduce the volume of raw events to uncorrelated events and then correlate those events. With this metric it is possible to analyse, over time, whether the SIEM's performance is improving or degrading. The second piece of information allows checking whether the analysts are performing the proper follow-up of cases, by annotating them and associating them with events of interest. Input data: all the events managed within the SIEM. Output: the total number of raw events, uncorrelated events, correlated events and annotated events managed within the SIEM. Suggested frequency: Monthly.

7.3.1.6 DTD - Detection to Decision (Cornell, 2015)

Definition: The DTD metric calculates the time it takes for an event/activity to be detected and processed through the detection tools, SIEM infrastructure, etc., before it reaches the analyst. To calculate the required time, the DTD metric uses the timestamps associated with the events. Input data: events with timestamps.

Output: the time taken for an event to be detected and processed before it reaches the analyst. Extra: average DTD, minimum DTD and maximum DTD. Suggested frequency: Monthly.
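
A minimal sketch of the timestamp arithmetic, assuming each event exposes ISO 8601 timestamps detected_at (sensor time) and presented_at (time the event reached the analyst queue), both illustrative field names:

from datetime import datetime
from statistics import mean

def detection_to_decision(events):
    # Per-event delay in seconds between detection and presentation to the analyst.
    delays = [
        (datetime.fromisoformat(e["presented_at"]) -
         datetime.fromisoformat(e["detected_at"])).total_seconds()
        for e in events
    ]
    if not delays:
        return [], 0.0, 0.0, 0.0
    # Also report the average, minimum and maximum DTD.
    return delays, mean(delays), min(delays), max(delays)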

7.3.1.7 SEU - SIEM resource usage (derived from (Chuvakin, 2014))

Definition: The SEU metric is an indicator of the amount of CPU, RAM and disk resources used by the SIEM. The security manager can create alerts when the values are too high (or too low). Input data: the SIEM's resource usage readings. Output: the amount of CPU, RAM and disk resources used by the SIEM. Suggested frequency: Monthly.
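
If the metric is collected locally on the SIEM host, a sketch using the third-party psutil package could look as follows; the alert thresholds are illustrative values to be chosen by the security manager.

import psutil  # third-party package: pip install psutil

def siem_resource_usage(disk_path="/", cpu_alert=90.0, ram_alert=90.0, disk_alert=90.0):
    # Sample CPU, RAM and disk usage on the SIEM host.
    usage = {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "ram_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage(disk_path).percent,
    }
    thresholds = {"cpu_percent": cpu_alert, "ram_percent": ram_alert, "disk_percent": disk_alert}
    # Raise a simple alert for every resource above its threshold.
    alerts = [name for name, value in usage.items() if value >= thresholds[name]]
    return usage, alerts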

7.3.1.8 RH - Rules handled (derived from (Chuvakin, 2014))

Definition: The RH metric compares the total number of rules fired with the number of rules handled (acknowledged) by the analysts, providing information about rule-handling capacity (rules fired vs. alerts handled) and about how the rules are processing the events. Input data: the rules fired. Output: the total number of rules fired vs. the total number of rules handled by the analysts. Suggested frequency: Monthly.

7.3.1.9 ID/PA - Intrusion Detection / Prevention Activity (ArcSight, 2010)

Definition: The ID/PA metric calculates statistics related to the IDS/IPS systems and their effectiveness in intrusion detection. Input data: a list of events and incidents. Output: the total number of attacks detected, by priority, and the number of attacks blocked (IPS only). Suggested frequency: Daily.

7.3.1.10 QF - Quiet Feeds (ArcSight, 2010)

Definition: The QF metric calculates the number of feeds that are not providing any information, which can occur due to an interruption or discontinuity of the information supplied to the feed. With this information, the manager can discard the useless feeds. Input data: the feeds and the information provided by each feed. Output: the feeds (or the number of feeds) not providing information. Suggested frequency: Monthly.

7.3.2 Compliance status

7.3.2.1 PL - Patch Latency (Berinato, 2008)

Definition: The PL metric calculates the time between a patch's release and the successful deployment of that patch in the organization.

A patch discovery service should be used to obtain the criticality of each missing patch and to calculate the time between the release of each missing patch and the date of the scan, thus determining how long each missing patch has been available for each device. Input data: the result of the patch discovery scan. Output: a list, by patch criticality, of the time during which the organization was left unpatched. Suggested frequency: Monthly.
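
A minimal sketch of the latency computation, assuming the patch discovery scan yields, for each missing patch, its name, criticality and release date (an illustrative structure):

from datetime import date

def patch_latency(missing_patches, scan_date):
    # missing_patches: list of dicts with the (assumed) keys 'name',
    # 'criticality' and 'released' (a datetime.date).
    by_criticality = {}
    for patch in missing_patches:
        days_unpatched = (scan_date - patch["released"]).days
        by_criticality.setdefault(patch["criticality"], []).append(
            (patch["name"], days_unpatched)
        )
    # A list, per criticality level, of how long each missing patch has been available.
    return by_criticality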

7.3.2.2 PS - Patch Status (derived from (ArcSight, 2010))

Definition: The PS metric calculates the percentage of systems that have the latest patches (operating system or application) installed. At the time of an attack or imminent crisis, it is useful for the organization to detect the systems that do not have the latest patch for the vulnerability at stake. Input data: the result of the patch discovery scan. Output: percentage of systems without the latest patches. Suggested frequency: Monthly.

7.3.2.3 AStatus - Antivirus Status (derived from ArcSight, 2010, and Gordon, 2015)

Definition: The AStatus metric computes the risk in the organization concerning the antivirus policy status, verifying which of the installed antivirus products do not have the latest released policies and signatures. Input data: the latest released policies and signatures available, and the result of a scan listing the installed antivirus products with their policies and signatures. Output: percentage of installed antivirus products that do not have the latest configurations, policies and signatures. Suggested frequency: Monthly.

7.3.2.4 DUAC - Devices with unauthorized or anomalous communications (Gordon, 2015)

Definition: This metric computes a list of devices with unauthorized or anomalous communications. By displaying these devices, the SOC team can react more quickly and effectively, thus allowing high-quality monitoring of the devices. An improvement of this metric would consider the criticality of the devices. Input data: a list of events. Output: the devices with unauthorized or anomalous communications. Suggested frequency: Daily.

7.3.2.5 UCC - Unusual configuration changes made in the FW, VPN, WAP and Domain (derived from (Gordon, 2015))

Definition: This metric determines the latest (five, for instance) unusual configuration changes in the four types of security devices (FW, VPN, WAP and Domain). It improves the SOC team's monitoring of security changes. Input data: the events containing those unusual configuration changes.

Output: the latest unusual configuration changes in the four types of security devices. Suggested frequency: Continuously.

7.3.2.6 IUH - Installation of unauthorized hardware (derived from (Cain et al., 2011))

Definition: The IUH metric calculates three factors related to unauthorized hardware/devices: the average number of hours an unauthorized device is plugged into the network, the total number of unauthorized devices connected to the organization's network and, lastly, the unauthorized-device threat level. Calculating the threat level requires a set of steps. The first is a list of unauthorized hardware/devices and their STL (Security Threat Level), where one is the least severe threat and five the most severe. Then a device discovery scan is made to identify the unauthorized devices and when they were installed. After this process, the threat level calculation begins.

The devices with the same STL are grouped. For each group, the STL is multiplied by the number of unauthorized devices in the group (the group's length) and the group's average hours on the network is added. The risk score is then the sum of the group threat levels multiplied by the equalizer control threat level (TL). The TL is a score, chosen by the organization, expressing the threat of having unauthorized devices connected to the network; it should be between one and ten. The computation is expressed by the following formula:

Devices Threat Level (Risk Score) = ([(STL_Level 1 × D_UNAUTH) + AHN] + [(STL_Level 2 × D_UNAUTH) + AHN] + [(STL_Level 3 × D_UNAUTH) + AHN]) × TL

where:
D_UNAUTH = number of unauthorized devices discovered in a given period (per STL group);
STL = Security Threat Level, on a scale from 1 to 5, 5 being the most severe (considering the device's threat/risk for the organization);
AHN = Average Hours on Network (per STL group);
TL = Threat Level (equalizer control chosen by the organization).
Input data: the unauthorized device types and their STLs for the organization, and the result of the device discovery scan. Output: total number of unauthorized devices, average number of hours an unauthorized device is plugged in, and the unauthorized-device threat level. Suggested frequency: Monthly.
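
A sketch of the computation in Python, assuming each discovered unauthorized device carries its STL and its hours on the network (illustrative field names), is given below.

from collections import defaultdict

def devices_threat_level(unauthorized_devices, tl):
    # unauthorized_devices: list of dicts with the (assumed) keys 'stl' (1-5)
    # and 'hours_on_network'; tl is the organization-chosen Threat Level (1-10).
    groups = defaultdict(list)
    for device in unauthorized_devices:
        groups[device["stl"]].append(device["hours_on_network"])
    total = 0.0
    for stl, hours in groups.items():
        # (STL x number of devices in the group) + average hours on the network.
        total += (stl * len(hours)) + (sum(hours) / len(hours))
    # The sum over groups is multiplied by the equalizer Threat Level.
    return total * tl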

7.3.2.7 IUS - Installation of unauthorized software (derived from (Cain et al., 2011))

Definition: Similar to the IUH metric, the IUS metric calculates three factors related to unauthorized software: the average number of hours an unauthorized piece of software has been installed, the total number of unauthorized software installations and, lastly, the unauthorized-software threat level. Calculating the threat level requires a set of steps. The first is a list of unauthorized software and their STL (Security Threat Level), where one is the least severe threat and five the most severe. Then a

software discovery scan is made to identify the unauthorized software and determine when it was installed. After this process, the threat level calculation begins. The software with the same STL is grouped; for each group, the STL is multiplied by the number of unauthorized software installations in the group (the group's length) and the group's average hours on the network is added. The results of all groups are summed together and the final sum is multiplied by the equalizer Threat Level (TL), chosen by the organization so that the result falls in the interval [1, 10], with ten being the most severe threat level. The computation is expressed by the following formula:

Software Threat Level (Risk Score) = ([(STL_Level 1 × S_UNAUTH) + AHN] + [(STL_Level 2 × S_UNAUTH) + AHN] + [(STL_Level 3 × S_UNAUTH) + AHN]) × TL

where:
S_UNAUTH = number of unauthorized software installations in a given period (per STL group);
STL = Security Threat Level, on a scale from 1 to 5, 5 being the most severe (considering the software's threat/risk for the organization);
AHN = Average Hours on Network (per STL group);
TL = Threat Level (equalizer control chosen by the organization).
Input data: the unauthorized software types and their STLs for the organization, and the result of the software discovery scan. Output: total number of unauthorized software installations, average number of hours unauthorized software has been installed, and the unauthorized-software threat level. Suggested frequency: Monthly.

7.3.3 Coverage

7.3.3.1 TE - Top events (ArcSight, 2010)

Definition: The TE metric determines the most severe events received by the SIEM. It helps the team detect the severe events, their types and the sources providing them, and helps analyse whether the SIEM's priority formula is classifying events correctly. Input data: events and their severities as classified by the SIEM. Output: a list of the most severe events received by the SIEM. Suggested frequency: Daily.

7.3.3.2 PAM - Percentage of assets modelled (derived from (ArcSight, 2010))

Definition: The PAM metric calculates the percentage of assets being tracked by the SIEM (or other security technology), providing a view of the security team's monitoring surface. Input data: All organization’s assets and the assets being tracked by the SIEM. Output: Percentage of assets being tracked by the SIEM. Suggested frequency: Weekly and/or Monthly.

7.3.3.3 PDM - Percentage of devices monitored (derived from (ArcSight, 2010))

Definition: The PDM metric calculates the percentage of devices (or data feeds) being fed into the SIEM, by type. It can be used to track the devices that should be feeding the SIEM and, from those, identify which are not providing events due, for example, to bad configuration. Input data: all the devices which should be feeding the SIEM and the archive of events. Output: percentage of the devices that are actually feeding the SIEM. Suggested frequency: Monthly.

7.3.3.4 ACover - Antivirus Coverage (derived from (ArcSight, 2010; Gordon, 2015))

Definition: The ACover metric computes the risk in the organization concerning the devices which do not have antivirus installed and/or the latest antivirus definition files. Input data: The result of the scan to determine which devices have antivirus installed and/or the latest virus definition files and a list of all the devices in the organization. Output: The percentage of the devices which do not have antivirus installed and/or the latest virus definition files. Suggested frequency: Monthly.

7.3.3.5 TDT - Top dropped traffic by DMZ and FW (Gordon, 2015)

Definition: This metric provides two points of view. One is the list of traffic categories with the largest number of communications dropped by the DMZ and FW. The other is to monitor whether the DMZ and FW are dropping traffic that should be forwarded. Input data: a list of events. Output: top 10 dropped traffic categories for the DMZ and FW. Suggested frequency: Daily.