Fault and Intrusion Tolerant (FIT) Event Broker & BFT-SMaRt A. Casimiro, D. Kreutz, A. Bessani, J. Sousa, I. Antunes, P. Veríssimo University of Lisboa, Portugal Meeting PT, November 27, 2012


TRANSCRIPT

Page 1

Fault and Intrusion Tolerant (FIT) Event Broker & BFT-SMaRt

A. Casimiro, D. Kreutz, A. Bessani, J. Sousa, I. Antunes, P. Veríssimo
University of Lisboa, Portugal

Meeting PT, November 27, 2012

Page 2

Cloud Infrastructures

[Figure: cloud infrastructure diagram. Labels: SAN, VPN, IP Network, Processing farm, Storage farm, Switching and Routing, Monitoring Tools and Control Engines. Arrows labeled "Events" and "Control" connect the infrastructure blocks and the monitoring tools.]

Alert! Cloud infrastructures are one of the new hot targets of attacks!

Page 3

Example scenario: Portugal Telecom Cloud Computing Infrastructure

SmartCloud product
First and main problem:
  Centralized monitoring approach
  Diversity of monitoring tools (ArcSight, Pulse, SCOM)

[Figure: monitoring data flow. Labels: Agentless, Agent-Based, Agent with ArcSight, Monitoring Probe, ArcSight (engine), ArcSight or other tool. Arrows labeled "Events" connect the probes and agents to the tools.]

Problems: (a) faults and attacks; (b) diversity is hard to achieve in practice.

Page 4

The TRONE approach

Fault and Intrusion Tolerant (FIT) Event Broker
Automated Failure Diagnosis
Multi-homing for fast reconfiguration

[Figure: TRONE architecture. Labels: SCTP, Resource Manager, Replicated Brokers, FIT event broker, Console, Router (twice), Cloud servers, Failure diagnosis, Subscribe, Publish, plus numbered markers 1 to 3.]

Page 5

FIT Event Broker: Goals and challenges

Overarching goals:
  To provide support for trustworthy and resilient monitoring of cloud/datacenter infrastructures
  To achieve improved Quality of Protection without neglecting Quality of Service (performance) needs

Some specific challenges:
  Deal with large flows of information (events)
  Support different kinds of events (e.g. different criticality)
  Low intrusiveness and easy integration

Page 6

FIT Event Broker: Assumptions

System entities:
  Probes, event collectors/brokers, consoles
  Some event processing may be done by collectors

Fully connected network
  E.g., all the entities lie in the same monitoring VLAN

Partially synchronous system
  Clocks may be used to timestamp events

Faults
  Some FIT brokers may crash or fail in a Byzantine way
  We do not require/enforce clients (probes/consoles) to be correct
    If this is a problem for monitoring, then it must also be solved

Page 7

FIT Event Broker: Baseline design options

Topic-based Publish-Subscribe paradigm
  Good fit for the considered scenarios

State Machine Replication
  Active replication is better for Byzantine fault tolerance
  Up to f out of n replicas of a FIT Broker may fail in a Byzantine way

Public-key cryptography
  Client authentication, to avoid attacks from malicious probes

Event channels with support for QoP and QoS
  Differentiated fault-tolerance support (e.g. crash only or BFT)

Page 8

FIT Event Broker: High-level architectural view

Page 9

FIT Event Broker: Interface

Create event channel (In: TAG and CLASS)
Destroy event channel (In: TAG)
Register to channel (In: TAG)
Publish event (In: EVENT)
Subscribe to channel (In: TAG)
Receive event (Out: EVENT)
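The six operations listed above can be pictured with a small single-process sketch. It is not the FIT broker's code and involves no replication: the class name `InMemoryBroker`, the method names, the explicit publisher and subscriber identifiers, the `"BFT"` class value, and the representation of an EVENT as a dict with `tag` and `payload` keys are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Hypothetical, unreplicated stand-in for the six interface operations."""

    def __init__(self):
        self.channels = {}                   # TAG -> CLASS
        self.publishers = defaultdict(set)   # TAG -> registered publisher ids
        self.subscribers = defaultdict(set)  # TAG -> subscriber ids
        self.queues = defaultdict(deque)     # subscriber id -> pending events

    def _require(self, tag):
        if tag not in self.channels:
            raise KeyError(f"unknown channel {tag!r}")

    def create_channel(self, tag, channel_class):
        # Create event channel (In: TAG and CLASS)
        if tag in self.channels:
            raise ValueError(f"channel {tag!r} already exists")
        self.channels[tag] = channel_class

    def destroy_channel(self, tag):
        # Destroy event channel (In: TAG)
        self._require(tag)
        del self.channels[tag]
        self.publishers.pop(tag, None)
        self.subscribers.pop(tag, None)

    def register(self, publisher, tag):
        # Register to channel (In: TAG)
        self._require(tag)
        self.publishers[tag].add(publisher)

    def subscribe(self, subscriber, tag):
        # Subscribe to channel (In: TAG)
        self._require(tag)
        self.subscribers[tag].add(subscriber)

    def publish(self, publisher, event):
        # Publish event (In: EVENT); the event is assumed to carry its TAG
        tag = event["tag"]
        self._require(tag)
        if publisher not in self.publishers[tag]:
            raise PermissionError(f"{publisher!r} is not registered on {tag!r}")
        for subscriber in sorted(self.subscribers[tag]):
            self.queues[subscriber].append(event)

    def receive(self, subscriber):
        # Receive event (Out: EVENT); returns None when nothing is pending
        queue = self.queues[subscriber]
        return queue.popleft() if queue else None
```

A probe would call `register` and then `publish`, while a console would call `subscribe` and then poll `receive`. In the replicated broker these calls would travel through the replication layer instead of touching local dictionaries.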

Page 10

FIT Event Broker: Internal state

From the SMR perspective, it is important to identify the relevant state that must be kept consistent across replicas:
  Data related to the broker configuration
    Existing channels and their CLASS
    Registered publishers and subscribers
  Data related to events
    Events that are ready to be delivered

[Figure labels: Agreement protocol, TAG-based Filter, Subscription Table, Output queues (S1, S2, S3)]

Subscription Table

TAG   SUBSCRIBER   STATUS
T1    S1, S2       OK
T2    S3           OK

All client input that affects the state of the FIT broker (e.g. channel and subscription data, some events) must be handled as a state machine command
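Why such input has to be a state machine command can be seen in a minimal sketch: if every replica applies the same commands in the same agreed order through a deterministic function, all replicas end in the same state. The command encoding (`op`, `tag`, `subscriber`, `event`), the state layout, and the function names below are assumptions for illustration, and the agreed order is simply given as a Python list rather than produced by an agreement protocol.

```python
import hashlib
import json


def new_state():
    # Hypothetical layout: channel classes, subscription table, output queues
    return {"channels": {}, "subscriptions": {}, "queues": {}}


def apply_command(state, cmd):
    """Apply one ordered command deterministically (no clocks, no randomness)."""
    op = cmd["op"]
    if op == "create":
        state["channels"].setdefault(cmd["tag"], cmd["class"])
        state["subscriptions"].setdefault(cmd["tag"], [])
    elif op == "subscribe":
        subs = state["subscriptions"].get(cmd["tag"])
        if subs is not None and cmd["subscriber"] not in subs:
            subs.append(cmd["subscriber"])
            state["queues"].setdefault(cmd["subscriber"], [])
    elif op == "publish":
        # TAG-based filter: only subscribers of this TAG get the event
        for subscriber in state["subscriptions"].get(cmd["tag"], []):
            state["queues"][subscriber].append(cmd["event"])
    return state


def run_replica(log):
    """Replay an agreed command log from the initial state."""
    state = new_state()
    for cmd in log:
        apply_command(state, cmd)
    return state


def digest(state):
    """Canonical hash of the state, useful for comparing replicas."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


# Command log that rebuilds the subscription table shown on the slide
log = [
    {"op": "create", "tag": "T1", "class": "BFT"},
    {"op": "create", "tag": "T2", "class": "CRASH"},
    {"op": "subscribe", "tag": "T1", "subscriber": "S1"},
    {"op": "subscribe", "tag": "T1", "subscriber": "S2"},
    {"op": "subscribe", "tag": "T2", "subscriber": "S3"},
    {"op": "publish", "tag": "T1", "event": "e1"},
]
replica_a = run_replica(log)
replica_b = run_replica(log)
print(digest(replica_a) == digest(replica_b))
```

If one replica applied the publish command before the subscriptions, its output queues would differ from the others, which is the divergence that agreement on order is meant to prevent.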


Page 11

BFT-SMaRt: Overview

Java-based platform for BFT SMR, available at http://code.google.com/p/bft-smart/
Actively being developed and improved in our group

BFT SMR “common” features
  State machine programming model
  n ≥ 3f+1 replicas required

A small step away from being a commercial product

Advanced features
  Replica recovery (state transfer)
  Reconfigurations
  Extensible API: e.g. custom voter
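The n ≥ 3f+1 requirement listed above translates into simple arithmetic. The helper names below are hypothetical and only restate the inequality; they are not part of BFT-SMaRt.

```python
def min_replicas(f):
    """Smallest n that satisfies n >= 3f + 1 for f Byzantine faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faults(n):
    """Largest f for which n >= 3f + 1 still holds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


for f in (1, 2, 3):
    print(f"f={f} needs at least n={min_replicas(f)} replicas")
```

Tolerating a single Byzantine broker therefore takes four replicas, and adding a fifth or sixth replica does not raise the tolerated number of faults until the seventh is in place.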


Page 12

BFT-SMaRt: Service invocation

[Figure labels: PROBE; FIT Broker state; Agreement on order performed by SMaRt]

Page 13

BFT-SMaRt: Execution and response

Commands are delivered to the FIT broker, which updates the state/queues and replies
Voting on client side
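Client-side voting can be sketched as follows, assuming the classic rule that a reply is accepted once f+1 replicas report the same value: since at most f replicas are faulty, at least one of those replies comes from a correct replica. The function name `vote` and the reply format are hypothetical, and the voter actually used by BFT-SMaRt may apply a different rule or threshold.

```python
from collections import Counter


def vote(replies, f):
    """Return the value reported by at least f + 1 replicas, or None.

    replies maps a replica id to its (hashable) reply, e.g. bytes.
    """
    counts = Counter(replies.values())
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= f + 1 else None


# One faulty replica (id 2) out of four cannot change the accepted reply
replies = {0: b"ok", 1: b"ok", 2: b"forged", 3: b"ok"}
print(vote(replies, f=1))
```

Returning `None` when no value has enough support lets the caller keep waiting for further replies rather than accept a value that a faulty replica could have produced alone.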

Page 14

BFT-SMaRt: Implementation & Evaluation

The FIT Broker is currently being implemented…
…and integrated with BFT-SMaRt

Evaluation:
  Throughput
    Aim is to deal with 40K events/sec
  Resilience
    Measure performance under attack
    Verify recovery and reconfiguration capabilities

A simple demo is available

[Figure labels: Publisher, Subscriber, ServiceProxy, ServiceProxyObject.invoke, SMaRt, FIT Broker Replica]

Page 15

BFT-SMaRt: Implementation & Evaluation

Preliminary results available [DAIS 2012]

[Figure: throughput for up to 100 channels]

Page 16

Summary

FIT Event Broker – Event dissemination support
  For easier deployment of multiple monitoring tools
  Manage which events are propagated, to which consoles, with which QoS

BFT-SMaRt – Byzantine fault-tolerant replication
  First usable implementation of BFT replication
  Leading edge worldwide
  Resilience against malicious attacks with small overhead

Portugal Telecom’s cloud infrastructure is being used as a real use case for application and evaluation of the work