Managing Billions of Logs Every Day: Fast In, Smart Out

White Paper

Post on 06-Aug-2020

The most practical approach to managing high-volume log data is a solution platform outside of the traditional database architecture. IT operations can comprise hundreds or thousands of log sources, ranging from network devices to critical applications to operating systems. Each of these generates log messages of bewildering variety, in non-standard formats and staggering volumes. This presents a significant and complex technical challenge to organizations and to their vendors of security information and event management (SIEM) solutions. These systems must capture all of the data, analyze it for critical alerts in real time, and then securely archive the unaltered logs to meet legal chain-of-custody requirements, while also indexing them for subsequent search and reporting, both in present formats and in those not yet defined.

These requirements must be met in a cost-effective solution that does not create a data storage explosion. EventTracker reconciles all of these disparate objectives in a scalable, efficient software application delivered as a virtual appliance or on physical servers. This white paper describes the unique advantages of EventTracker’s architecture.

EventTracker describes its design criteria as “Fast In, Smart Out”. Unstructured data is efficiently received, analyzed against configurable rules for alerting and correlation, and archived as flat files without the need for a relational database or further pre-processing, manipulation or normalization. This approach allows for very fast input of log data, including data in new or custom formats. EventTracker then creates a sparse-matrix metadata index associated with each log archive, which dramatically improves performance during extraction and display of data for log searches and reporting, hence “Fast In, Smart Out”.

For security, the archive files are compressed on the file system and a SHA-1 checksum is generated and striped over each archive file. Compression provides exceptional storage efficiency, while the indices provide fast reads. The architecture uses a write-once-read-many implementation, ensuring that once data is committed to the archive it cannot be altered without detection.

EventTracker also includes the industry’s largest pre-defined log knowledge libraries, which provide automated interpretation of log data, presented in easy-to-understand language for alerting, search, dashboards and reporting.

Challenges

SIEM implementations address distinct challenges in the enterprise network, including log volume, variable log formats, secure data retention, and widely disparate use cases such as real-time alerting, access, correlation, analysis, reporting, forensics, and long-term secure data management.

[Figure: Large Volumes of Unstructured Log Data. SIEM and log management challenges: no logging standards; volumes in the millions, billions or trillions; thousands of vendor log formats.]

Log Volume

Event and audit log messages are generated by thousands of log sources, including computing platforms and applications, and networking, storage and security devices. The volume of log messages often varies with the time of day, peaking at shift changes or as critical applications launch across the enterprise. EventTracker manages the inbound volume and interprets each log message in real time as it is received.

UNIX- or Linux-based systems, firewalls, routers and switches generally push syslog and SNMP (Simple Network Management Protocol) messages over protocols such as UDP or TCP to export log data to third-party receivers (the SIEM). Microsoft Windows servers and workstations, and many other systems, write audit events to local disk. These events are collected either by pulling (i.e., polling) them from the source or by pushing them to receivers. In addition, some Windows audit events are written to text or other formats outside of the EVT/EVTX log files. EventTracker supports polling with proper credentials or, preferably, real-time transmission using the EventTracker Windows Agent.

EventTracker includes complete facilities to install, configure, upgrade and uninstall EventTracker agents from the management console. An MSI package is also available for distribution via Microsoft SMS, KACE or similar software distribution tools. [See “Log Collection” below for more information on EventTracker Agents.]

Log Collection

The most verbose log sources in a typical enterprise include security, network, infrastructure and application sources, which generate hundreds of millions (even billions) of event logs each hour or day. This enormous body of continuously generated information is the basis for providing and assessing IT security, and is instrumental in demonstrating regulatory compliance using a SIEM.

The lack of standardization in the industry yields a proliferation of formats and transmission methods. These range from traditional UNIX/Linux/network syslog and Simple Network Management Protocol (SNMP) traps to Open Database Connectivity (ODBC) and proprietary interfaces such as Check Point® Software’s OPSEC API, the VMware API, and the aforementioned Windows EVT/EVTX formats.

EventTracker can operate “agent-optional”. Because some log sources (e.g., Windows) do not natively send their logs off the platform or host in a standard manner, an MS gold-logo certified agent is available. The agent runs as a persistent, low-overhead, silent service on monitored Windows systems and provides a wealth of advanced features.

Batch Capture

Often, audit log data is generated as text or XML files containing many individual event messages (e.g., from Bluecoat proxy devices, web servers, Java application data, or Apache log4j). EventTracker supports the bulk load of this data directly into the EventVault archive. In this process, EventTracker can invoke a third-party plug-in to process the raw data before it is archived. For example, Apache logs may be processed by AWStats, a popular statistics package that generates a variety of web activity reports in HTML. These reports can be displayed in the various EventTracker Reports Consoles for a unified view without having to first process them in EventTracker.

Data Decisions

Given today’s data volumes, a significant technical challenge is the comprehensive processing of every log as it is received. Some SIEM applications attempt to manage this by first normalizing log data. Systems that normalize each inbound message to pre-determined metadata values (e.g., user = “x”) by performing lookups and data replacement retain only a subset of the message contents and discard the original log information. That information might later be required for audit reporting or forensic search operations, so discarding it may compromise legal or enforcement processes downstream.

EventTracker does not normalize log messages on input, but processes and retains them in their original format. This approach sustains high data input rates and does not discard information to populate a pre-defined RDBMS schema. Processing includes receipt, parsing, auto-identification, categorization, alerting, indexing and archiving.
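
The idea of extracting fields for alerting while leaving the original message untouched can be sketched as follows. This is a minimal illustration, not EventTracker's parser: the `StoredEvent` type, the `ingest` function and the `user=` pattern are all hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Hypothetical pattern: pulls a "user=<name>" or "user: <name>" token out of a message.
USER_PATTERN = re.compile(r"\buser[=:]\s*(?P<user>[\w.\\-]+)", re.IGNORECASE)


@dataclass(frozen=True)
class StoredEvent:
    raw: str                # the original message, kept byte-for-byte
    fields: Dict[str, str]  # metadata derived from it, stored alongside


def ingest(raw_message: str) -> StoredEvent:
    """Derive metadata from a log message without altering the message itself."""
    fields: Dict[str, str] = {}
    match = USER_PATTERN.search(raw_message)
    if match:
        fields["user"] = match.group("user")
    # Unrecognized formats are still retained; they simply carry no extracted fields.
    return StoredEvent(raw=raw_message, fields=fields)
```

A message in an unknown format is stored exactly as received, so it remains available for later search or forensics even though no rule matched it on input.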

Secure Log Storage

Log data archives must be secure, as required by PCI-DSS, FISMA, HIPAA and other internal security and third-party compliance standards. The EventTracker EventVault is a file-based log archive which stripes a SHA-1 checksum on each .cab file, rendering it tamper-evident. This checksum is regenerated and compared each time an archive is accessed, and file integrity checks may also be scheduled at regular intervals. EventTracker server hardening guidelines provide detailed recommendations on access control restrictions and encryption configured on the OS to lock down access and protect your log data over time.
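
The regenerate-and-compare check described above can be illustrated with a short sketch. It assumes the checksum is kept in a sidecar file next to the archive, which is a simplification for the example; the paper states only that the checksum is striped on each .cab file, and the function names here are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def seal_archive(path: Path) -> Path:
    """Record the archive's checksum in a sidecar file (assumed layout)."""
    sidecar = path.with_name(path.name + ".sha1")
    sidecar.write_text(sha1_of_file(path))
    return sidecar


def verify_archive(path: Path) -> bool:
    """Recompute the checksum and compare it with the recorded value."""
    sidecar = path.with_name(path.name + ".sha1")
    recorded = sidecar.read_text().strip()
    return hmac.compare_digest(recorded, sha1_of_file(path))
```

Calling `verify_archive` on every access, and again from a scheduled job, gives the two kinds of integrity check the section describes: any change to the archive after sealing makes the comparison fail.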

Compatibility

EventTracker’s open, flexible log management framework can receive and process any log from any source. EventTracker is an MS gold-logo certified software application, available as a virtual appliance or instantiated on physical Windows servers or in the “cloud”. Log archives are standard MS .cab files on the file system. There is no requirement for a costly SQL or Oracle RDBMS or for database administrators.

EventTracker Architecture

The EventTracker baseline architecture, as shown in the diagram below, provides two primary methods of data collection: real-time or direct log file transfer (batch). Transfers via syslog (TCP or UDP), SNMP v1 or v2, or via EventTracker Agents (available for Windows and Solaris BSM) are fully supported. The optional EventTracker Windows agent is also able to gather log data from Check Point devices using the OPSEC LEA interface and from VMware via its XML API. MS SQL trc, flat files, CSV, W3C, text and XML formats are also supported in real time, along with any selected events from the Windows event log. Log file transfers are also supported via FTP, SFTP or SCP, as is Apache log4j.

Scalability

EventTracker is implemented as a set of distributed software modules that communicate via IP when operating with the multi-server Collection Master / Collection Point architecture (see diagram below). Up to 20 Virtual Collection Points, each of which includes the EventTracker Receiver, Processor and Archiver logic, may be instantiated to monitor specific ports (e.g., 514, 14505) and optionally write to unique disks or spindles for multi-threaded processing. This approach provides superb scalability on generic server-class hardware.

Note: Commodity servers running EventTracker’s Collection Master / Collection Point can fully process billions of inbound logs, with peak loads configurable to 100,000+ events per second. Multi-server implementations based on this architecture can accommodate even the highest-volume operations.
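
As a quick consistency check on the figures in the note, the relationship between events per second and events per day is simple arithmetic. The helper names below are illustrative only, and the calculation assumes a rate sustained over a full 24 hours, which a peak rate by definition is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_day(events_per_second: int) -> int:
    """Daily event count if a given rate were sustained for 24 hours."""
    return events_per_second * SECONDS_PER_DAY


def average_eps(daily_events: int) -> float:
    """Average events per second implied by a daily total."""
    return daily_events / SECONDS_PER_DAY


print(events_per_day(100_000))            # 8640000000
print(round(average_eps(1_000_000_000)))  # 11574
```

In other words, one billion logs a day corresponds to an average of roughly 11,600 events per second, and a 100,000 events-per-second ceiling leaves headroom for the time-of-day peaks described earlier.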

Disk Utilization

Log data is voluminous and much of it is of minimal long-term value. It is sometimes hard to predict accurately which data will be useful. Consequently, careful organizations opt to retain all of it. This can very easily turn into a storage and disk utilization nightmare. Efficient storage of log data over variable retention periods is a critical evaluation factor. Bringing 1 KB of raw data into a traditional relational database results in 12 KB to 15 KB of new storage, caused by the construction of tables and other overhead.

In contrast, EventTracker provides significant storage efficiencies due to a very low data storage factor of 0.12 to 0.28, which includes metadata (indices) as well as the compressed flat-file log data. No daily maintenance is required. Data received in real time is processed for alerts against rules, and also by the correlation engine and behavior modules. Data arriving via file transfer is processed per rules (which can include processing via a third-party plug-in). All data is compressed, archived and signed with a SHA-1 checksum within the EventVault. An indexing process develops metadata for the newly created archive, which is stored as an XML reference and associated with the target archived file.
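
The effect of the two storage factors quoted above can be made concrete with a small calculation. The daily volume and retention period below are assumptions chosen purely for illustration; only the factors (0.12 to 0.28 for the flat-file archive, 12 to 15 for a relational database) come from the text.

```python
def retention_storage_gb(raw_gb_per_day: float, retention_days: int, factor: float) -> float:
    """Disk needed to retain logs, given a storage factor (stored bytes per raw byte)."""
    return raw_gb_per_day * retention_days * factor


RAW_GB_PER_DAY = 100.0  # assumed raw log volume, for illustration only
RETENTION_DAYS = 365    # assumed retention period, for illustration only

for label, low, high in [
    ("flat-file archive", 0.12, 0.28),
    ("relational database", 12.0, 15.0),
]:
    low_gb = retention_storage_gb(RAW_GB_PER_DAY, RETENTION_DAYS, low)
    high_gb = retention_storage_gb(RAW_GB_PER_DAY, RETENTION_DAYS, high)
    print(f"{label}: {low_gb:,.0f} to {high_gb:,.0f} GB")
```

Under these assumed inputs, a year of logs needs about 4.4 to 10.2 TB in the compressed archive, against about 438 to 548 TB in a database with the overhead quoted above.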

EventTracker agent features include USB monitoring, selectable or scheduled/compressed event transfer, traffic caching and encryption, system monitoring, and management of application log files outside the Windows EVT/EVTX event logs. There are strong advantages to installing EventTracker agents, including source/type filtering, caching, access to the underlying platform for security, real-time caching and protocol stack, and access to log files in other locations. EventTracker can centrally deploy, configure and remove remote agents; alternatively, agents are available as an MSI package for distribution via other methods. However, there are cases when the use of agents is not possible or desirable. In such cases, the platform can be polled periodically over the network for new log entries. Agents are a must in cases where the platform is closed, binary and/or does not send the log data off-platform (e.g., IBM i-Series or Solaris under BSM).

A receiver process, which is part of EventTracker, listens on configurable ports for specific protocols (e.g., syslog over UDP or TCP, SNMP, or real-time streams from the EventTracker Windows Agent or Solaris BSM agent). Data is written to cache files on disk as soon as it is received. Once 50 MB of data has been received or sixty minutes have elapsed, the cache is compressed and indexed in preparation for archival.
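
The size-or-age rollover rule in the paragraph above can be sketched as follows. This is a simplified in-memory illustration rather than EventTracker's receiver: the `RolloverCache` class is hypothetical, it buffers messages in memory instead of in cache files on disk, it uses zlib in place of the .cab format, and it omits the indexing step.

```python
import time
import zlib
from typing import Callable, List


class RolloverCache:
    """Buffers messages and seals the buffer once it is large enough or old enough."""

    def __init__(
        self,
        max_bytes: int = 50 * 1024 * 1024,     # 50 MB threshold from the text
        max_age_seconds: float = 60 * 60,      # sixty-minute threshold from the text
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self.sealed: List[bytes] = []          # compressed, ready-to-archive blobs
        self._buffer: List[bytes] = []
        self._size = 0
        self._opened_at = clock()

    def append(self, message: bytes) -> bool:
        """Store a message; return True if this append triggered a rollover."""
        self._buffer.append(message)
        self._size += len(message)
        return self.maybe_roll()

    def maybe_roll(self) -> bool:
        """Seal the current buffer if either threshold has been reached."""
        too_big = self._size >= self.max_bytes
        too_old = self.clock() - self._opened_at >= self.max_age_seconds
        if not self._buffer or not (too_big or too_old):
            return False
        self.sealed.append(zlib.compress(b"\n".join(self._buffer)))
        self._buffer = []
        self._size = 0
        self._opened_at = self.clock()
        return True
```

In this sketch the age threshold is only evaluated when `append` or `maybe_roll` is called, so a quiet source would need a periodic timer calling `maybe_roll` to guarantee the sixty-minute bound.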

Processing

Log data received in real time is processed through up to three rules-based functions. First, the log message is matched against alerting and behavior rules. A positive match triggers the configured notifications and/or remedial actions. Notification methods include e-mail, SMS text, CTI, pager, or forwarding as an SNMP trap or syslog message. Remedial actions can be triggered as a local script on the master console or at the optional agent host. Second, the log message is processed by the correlation engine, which maintains a cache of events. A positive match results in the generation of an EventTracker composite log message, which can combine elements from the source logs in the correlation rule. The new log message is used to trigger alerts as described earlier. Finally, the contents of the log message are reviewed by the Behavior Analysis module against its rule set. Any new or out-of-the-ordinary condition results in the generation of an appropriate log message, which may be configured to trigger the aforementioned alert conditions.
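
The second stage, a correlation engine that keeps a cache of recent events and emits a composite message on a match, can be sketched with a single hypothetical rule: several failed logins for the same user within a time window. The class, the rule name and the thresholds are assumptions for illustration and do not describe EventTracker's rule language.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class FailedLoginCorrelator:
    """Emits a composite event when one user fails to log in repeatedly in a short window."""

    def __init__(self, threshold: int = 3, window_seconds: float = 60.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        # Cache of recent failure timestamps, kept per user.
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, timestamp: float, user: str, outcome: str) -> Optional[dict]:
        """Feed one event; return a composite event if the rule fires, else None."""
        if outcome != "failure":
            return None
        events = self._recent[user]
        events.append(timestamp)
        # Drop cached events that have fallen out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        if len(events) < self.threshold:
            return None
        composite = {
            "type": "composite",
            "rule": "repeated-login-failure",
            "user": user,
            "count": len(events),
            "first_seen": events[0],
            "last_seen": timestamp,
        }
        events.clear()  # start fresh so the same burst is not reported twice
        return composite
```

The returned dictionary plays the role of the composite log message: it combines elements of the source events and can itself be passed to the alerting rules from the first stage.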

Data Management

EventTracker’s scalable architecture captures the original raw log message in its native form. This is critical when faced with chain-of-custody questions for Human Resources, civil or criminal issues. While reports and dashboards may be configured to show only relevant pieces of the log messages, the complete original log/event is always accessible. Storage uses compressed flat files (standard .cab) on the Windows file system. There is no lock-in to proprietary formats. Lastly, the use of standard platforms and formats maximizes interoperability within the enterprise architecture. Long-term storage can be on-line, near-line or off-line, using standard Windows file management and back-up capabilities.

Reporting

Hundreds of pre-defined reports are available within EventTracker, grouped under Security, Compliance (includes SANS Consensus Audit Guidelines - IT controls) and Operations bundles. Reports may be generated in PDF, Word, HTML or XLS/XLSX formats. Users interact with a simple point-and-click interface to specify reporting parameters. These are used to consult the index data and determine the relevant archive files. This approach can yield up to a 10x improvement in performance compared with the brute-force approach that would be needed if indexing were absent. The technique is particularly effective when reporting on exceptions (the needle in the haystack). EventTracker also includes the powerful concept of FLEX reports, where the user has complete control over report design. A simple point-and-click interface allows the user to define log message filters, parsing rules and the output format. Results are usually generated in Excel format, which allows for further post-processing. This technique is especially powerful in generating reports for hitherto unknown log formats. Reports can be created very quickly and scheduled for regular generation and delivery.
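
The way an index lets a report skip irrelevant archives can be illustrated with a small sketch. It assumes each archive's metadata records a time range and a set of terms seen in that archive; the paper describes the real index only as sparse-matrix metadata stored as an XML reference, so the `ArchiveIndex` structure and `candidate_archives` function are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ArchiveIndex:
    archive_name: str
    start: int             # epoch seconds of the first event in the archive
    end: int               # epoch seconds of the last event in the archive
    terms: FrozenSet[str]  # lower-cased terms seen anywhere in the archive


def candidate_archives(
    indexes: Iterable[ArchiveIndex],
    start: int,
    end: int,
    required_terms: Iterable[str],
) -> List[str]:
    """Return only the archives that could contain matches for the report."""
    wanted = frozenset(term.lower() for term in required_terms)
    return [
        idx.archive_name
        for idx in indexes
        if idx.start <= end and idx.end >= start  # time ranges overlap
        and wanted <= idx.terms                   # every required term occurs
    ]
```

Only the archives returned here need to be decompressed and scanned. For an exception report, where the wanted terms are rare, most archives are eliminated by the index alone, which is where the gain over a brute-force scan comes from.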

Summary

The demands that the dynamic enterprise places on a comprehensive SIEM system can be too great for a system based on the traditional RDBMS. With its highly scalable distributed file architecture, EventTracker can meet these demands at a lower total cost of ownership than other systems which lack EventTracker’s “Capacity on Demand” architecture.

When examining SIEM products, administrators should consider the following:

Is RDBMS licensing required?

Will the SIEM “fill up”, and will the input capacity and storage scale without additional appliance/hardware purchases?

Is the application logic separated from the storage back-end?

What precise disk storage will be required to support your volumes and retention requirements over the next 2-3 years? What is included in the vendor’s “Data Explosion” numbers, which are often provided as EPS (events per second)? Do the projections include all storage overhead, or just the data in the RDBMS?

Does the system continue to collect data during backup, defragmentation and other management scenarios?

Are agents optional? How many EPS can they handle? What happens to dropped messages?

Is TCP supported for syslog messages to ensure delivery/receipt?

Does the system utilize proprietary protocols or formats?

Does the system capture and retain log data from unrecognized network devices?

What data is stored: all fields of the data captured, or just a standard subset?

Is auto-discovery inherent to minimize pre-configuration effort?

How many unique log formats are fully supported out-of-the-box?

Does the architecture affordably support distributed WANs and VLANs? Does each site need a physical collector appliance?

About EventTracker

EventTracker’s advanced security solutions protect enterprises and small businesses from data breaches and insider fraud, and streamline regulatory compliance. The company’s EventTracker platform comprises SIEM, vulnerability scanning, intrusion detection, behavior analytics, a honeynet deception network and other defense-in-depth capabilities within a single management platform. The company complements its state-of-the-art technology with 24/7 managed services from its global security operations center (SOC) to ensure its customers achieve desired outcomes: safer networks, better endpoint security, earlier detection of intrusion, and relevant and specific threat intelligence. The company serves the retail, hospitality, healthcare, legal, banking and financial services, utilities and government sectors.

EventTracker is a division of Netsurion, a leader in remotely-managed IT security services that protect multi-location businesses’ information, payment systems and on-premises public and private Wi-Fi networks. www.eventtracker.com.