
  • 1. Introduction

    2. The case for a distributed data-collection architecture

    3. Architecture

    4. Validation

    5. Conclusion

  • From the operator's perspective, reliable data gathered at strategic network locations makes all the difference for a number of applications, such as:

    • quality and SLA monitoring

    • network planning

    • security purposes (threat identification and profiling, violation of terms of use, etc.)

  • However...

    Operators still follow classic approaches, based on a limited number of standalone probes and filters positioned at strategic points in their own core networks, which gather information about specific trends and patterns.

  • Dial-up

    • slow, temporary dial-up connections
    • traditional applications
    • point-to-point connections
    • single-sourced attack patterns
    • ISP concerns ended at the POP

    Triple-Play + xDSL/Cable

    • widespread broadband (capacity, number of connections)
    • clients frequently have their own LANs
    • P2P, IPTV, VoD, VoIP, multimedia content, messaging…
    • ISP has devices on the customers' LANs (set-top boxes, IP phones, alarm devices, etc.)
    • distributed attacks (botnets, DDoS…)

    [Figure: evolution of access networks, from dial-up (kbps) through xDSL/cable (Mbps) to triple-play/PON (Gbps)]

  • The classic model does not scale properly with the sheer increase in traffic volume and diversity, a consequence of access technologies such as DSL and optical fiber, and of applications such as IPTV, video-on-demand, voice and P2P.

  • Idea: recruit the customers' gateways to participate in the data-collection effort.

    Each router has the necessary means and resources to support an embedded data-collection mechanism.

    Data and events detected by the users’ gateways are transmitted to the ISP level, where they are subjected to correlation and further processing.
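The reporting path described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the collector address and the event fields are assumptions, and the real system transports events in IDMEF rather than JSON.

```python
import json
import socket

# Hypothetical sketch: the customer's gateway packages a locally detected
# event and pushes it to an ISP-side collector for correlation.
# "collector.isp.example", port 5000 and the event fields are illustrative.

def encode_event(source_ip, classification):
    """Serialize a locally detected event for upstream reporting."""
    return json.dumps({"src": source_ip, "class": classification}).encode()

def report(event_bytes, collector=("collector.isp.example", 5000)):
    """Push one encoded event from the gateway to the ISP-level collector."""
    with socket.create_connection(collector, timeout=5) as conn:
        conn.sendall(event_bytes)
```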

  • Traditional vs. Distributed

    • Traditional: ISP restricted to its own network.
      Distributed: users define the ISP's scope of influence; the minimum ISP reach is extended to the CPE boundaries, and the customer LAN can be monitored by the CPE, if the user allows it.

    • Traditional: frequently requires dedicated equipment.
      Distributed: it is possible to use already available network equipment: the broadband routers the subscribers already installed and paid for.

    • Traditional: each probe deals with a sheer amount of traffic.
      Distributed: each probe deals with a much smaller traffic flow, making it possible to apply fine-grained processing techniques.

    • Traditional: captured data and event correlation scopes are limited to a global perspective.
      Distributed: the monitoring system can access and infer information at two distinct infrastructure levels: the microscopic (subscriber) level and the macroscopic (operator) level.

    • Traditional: traffic monitoring is possible at the ISP infrastructure, with scalability limitations.
      Distributed: traffic monitoring at the CPE level.

  • This coordinated operation model allows the monitoring system to access and infer information at two distinct levels:

    • microscopic (subscriber) level

    • macroscopic (operator) level

    making it capable of detecting trends otherwise impossible for a device operating autonomously (like standalone probes, in the classic model).

    Events are encoded and transported using the Intrusion Detection Message Exchange Format (IDMEF, [RFC4765]).
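As an illustration of the encoding, a bare-bones IDMEF alert can be built as below. Only a small subset of the RFC 4765 schema is shown, and the message id, timestamp and classification values are made up for the example.

```python
import xml.etree.ElementTree as ET

# Minimal sketch of an IDMEF (RFC 4765) message such as a gateway could emit.
# The namespace is the one defined by RFC 4765; all field values here are
# illustrative placeholders.
IDMEF_NS = "http://iana.org/idmef"

def build_alert(messageid, classification, create_time):
    """Build a bare-bones IDMEF-Message carrying a single Alert."""
    msg = ET.Element(f"{{{IDMEF_NS}}}IDMEF-Message", version="1.0")
    alert = ET.SubElement(msg, f"{{{IDMEF_NS}}}Alert", messageid=messageid)
    created = ET.SubElement(alert, f"{{{IDMEF_NS}}}CreateTime")
    created.text = create_time
    ET.SubElement(alert, f"{{{IDMEF_NS}}}Classification", text=classification)
    return ET.tostring(msg, encoding="unicode")
```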

  • It is not feasible to implement a testbed with the adequate size and scale to evaluate the solution.

    Solutions: simulation or analytical validation.

    The main concern is the scalability of the solution.

    Approach: analyze and characterize/profile traffic in the access network; evaluate the performance of the solution; extrapolate the results.

  • Several alternatives
    ◦ Simulation
      Problem: adequate data is not available to perform a reliable simulation study.
    ◦ Use third-party traffic traces
      Some traces are anonymized, usage patterns are unknown, and some traces are obsolete.

    Chosen alternative:
    ◦ Laboratory trace collection based on the usage patterns of regular network users.

  • Healthy Home User
    • 3 clients
    • Web and P2P traffic
    • network traffic is always initiated from the LAN

  • Vulnerable Home User
    • 3 clients
    • 1 honeypot (binds ports 1...1023)
    • Web and P2P traffic
    • network traffic may be initiated from both networks: LAN and WAN

  • Trace results: Regular vs. Honey Pot

                                                         Regular   Honey Pot
    Number of security events (per hour)                      29       1 686
    Percentage of external addresses involved in
    security events (potential attack sources)             0.19%       0.32%
    Number of security events per 100 MB of traffic            7         102
    Average traffic rate (kbps)                              150         447
    Record time (hours)                                     7.33           8
    Total number of external addresses                    50 960     278 433

  • Performance measurements with 10, 100, 1000 and 10000 messages

                        Home Gateway                     ISP Server
    Processor           Intel Pentium 4 CPU 3.00 GHz     Intel Pentium 4 CPU 3.00 GHz
    Memory              494 MB                           2 GB
    Caches L1/L2        16 KB / 2 MB                     16 KB / 2 MB
    Operating system    Ubuntu Linux                     Ubuntu Linux
    Network             Intel 82541GI Gigabit            3Com 3c905C-TX/TX-M
                        Ethernet Controller

  • The system maintains the same event-processing capacity, regardless of the number of events, across all message sets.
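A micro-benchmark in the spirit of the measurement above could look like this. It is a sketch under assumptions: the sample message and the processing step (parsing a tiny XML event) stand in for the real event handling, and the batch sizes mirror the 10/100/1000/10000 message sets.

```python
import time
import xml.etree.ElementTree as ET

# Illustrative micro-benchmark: process batches of N messages and report
# per-message throughput, to check that the rate stays roughly constant
# across batch sizes. SAMPLE and process() are stand-ins, not the real system.
SAMPLE = "<Alert><Classification text='port-scan'/></Alert>"

def process(msg):
    # Stand-in for event handling: parse the event and read its class.
    return ET.fromstring(msg).find("Classification").get("text")

def benchmark(batch_sizes=(10, 100, 1000, 10000)):
    """Return messages-per-second throughput for each batch size."""
    rates = {}
    for n in batch_sizes:
        start = time.perf_counter()
        for _ in range(n):
            process(SAMPLE)
        elapsed = time.perf_counter() - start
        rates[n] = n / elapsed
    return rates
```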

  • The IDMEF traffic produced by the solution is very low, even when reporting 100% of the events to the ISP.

  • Crossing the trace results with the scalability results
    ◦ 3 scenarios:
      every client is healthy
      80% healthy and 20% vulnerable (the most probable scenario)
      every client is vulnerable
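As a back-of-the-envelope illustration of the extrapolation, the per-trace event densities reported earlier (7 security events per 100 MB for a healthy user, 102 for a vulnerable one) can be mixed according to each scenario. The weighting function below is an illustration of the idea, not the paper's exact model.

```python
# Illustrative extrapolation across the three scenarios, mixing the
# security-event densities measured in the traces (events per 100 MB).
HEALTHY = 7      # healthy home user, from the trace results
VULNERABLE = 102  # vulnerable (honeypot) user, from the trace results

def events_per_100mb(vulnerable_fraction):
    """Expected event density for a given mix of client populations."""
    return (1 - vulnerable_fraction) * HEALTHY + vulnerable_fraction * VULNERABLE

for name, frac in [("all healthy", 0.0), ("80/20 mix", 0.2), ("all vulnerable", 1.0)]:
    print(f"{name}: {events_per_100mb(frac):.1f} events per 100 MB")
# The 80/20 mix yields 0.8 * 7 + 0.2 * 102 = 26.0 events per 100 MB.
```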

  • The concept allows for the creation of a distributed data-collection infrastructure leveraging already existing (and paid-for) resources.

    It can be used to support real-time monitoring and reaction mechanisms.

    It can be extended for other purposes, but some questions remain: what about user privacy?