Global Intrusion Detection Using Distributed Hash Tables
Jason Skicewicz, Laurence Berland, Yan Chen
Northwestern University 6/2004
Current Architecture
Intrusion Detection Systems
• Vulnerable to attack
• Many false responses
• Limited network view
• Varying degrees of intelligence
Centralized Data Aggregation
• Generally done manually
• Post-mortem global view
• Not real time!
Sensor Fusion Centers
Sensor fusion centers (SFCs) aggregate information from sensors throughout the network
• More global view
• Larger information pool
• Still vulnerable to attack
• Overload potential if there are multiple simultaneous attacks
Can’t we leverage all the participants?
Distributed Fusion Centers
Different fusion centers for different anomalies
An attacker must attack all fusion centers, or know more about fusion center assignments
Still needs to be manually set up and routed to
What if things were redundant and self-organizing?
What is DHT
A DHT, or distributed hash table, is a peer-to-peer system where the location of a resource or file is found by hashing its key
DHTs include Chord, CAN, Pastry, and Tapestry
A DHT attempts to spread the keyspace across as many nodes as possible
Different DHTs use different topologies
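To make "found by hashing its key" concrete, here is a minimal sketch of key placement. It assumes a Chord-style identifier ring purely for illustration (CAN, used later in these slides, maps keys to coordinates instead). The names `key_id` and `responsible_node`, the 16-bit identifier space, and the node ids are all hypothetical.

```python
import hashlib
from bisect import bisect_left

def key_id(key: str, bits: int = 16) -> int:
    """Hash a key into a small identifier space (SHA-1 reduced to `bits` bits)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

def responsible_node(key: str, node_ids: list[int], bits: int = 16) -> int:
    """Return the first node id at or after the key's id, wrapping around the ring."""
    ring = sorted(node_ids)
    idx = bisect_left(ring, key_id(key, bits))
    return ring[idx % len(ring)]

# Hypothetical node ids; any participant can compute the owner locally.
nodes = [1000, 20000, 41000, 60000]
owner = responsible_node("worm-probe:445", nodes)
```

Because every participant applies the same hash, they all agree on which node owns a key without consulting a central directory.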
CAN
CAN is based on a multi-reality n-dimensional toroid for routing (Ratnasamy et al.)
CAN
Each reality is a complete toroid, providing full redundancy
The network covers the entire address space and dynamically splits the space among nodes
Reports are routed across the CAN, so you don’t need to connect directly to the fusion center
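The routing idea can be sketched on a toy torus. This is a simplification under stated assumptions, not CAN itself: a single reality, two dimensions, and equal-size zones on a fixed 4x4 grid (real CAN zones are split dynamically as nodes join). The function names `key_to_point`, `torus_distance`, and `route` are hypothetical.

```python
import hashlib

def key_to_point(key: str, dims: int = 2, grid: int = 4) -> tuple[int, ...]:
    """Hash a key to a cell on a dims-dimensional torus with `grid` cells per axis."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return tuple(digest[i] % grid for i in range(dims))

def torus_distance(a, b, grid: int = 4) -> int:
    """Hop distance between two cells when every axis wraps around."""
    return sum(min(abs(x - y), grid - abs(x - y)) for x, y in zip(a, b))

def route(start, target, grid: int = 4):
    """Greedy routing: each hop moves to the neighbouring cell closest to the target."""
    path = [start]
    current = start
    while current != target:
        neighbours = []
        for axis in range(len(current)):
            for step in (-1, 1):
                cell = list(current)
                cell[axis] = (cell[axis] + step) % grid  # wrap around the torus
                neighbours.append(tuple(cell))
        current = min(neighbours, key=lambda c: torus_distance(c, target, grid))
        path.append(current)
    return path

# A sensor at cell (0, 0) reports to whichever cell the key hashes to.
target = key_to_point("worm-probe:445")
path = route((0, 0), target)
```

Each node only needs to know its immediate neighbours, which is why a sensor never has to connect directly to the fusion center.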
GIDS over DHT
Fusion centers are organized on a distributed hash table
• Peer-to-peer
• Self-organized
• Decentralized
• Resilient
We use Content Addressable Network (CAN)
• Highly redundant
• N-dimensional toroid enhances reachability
DIDS diagram
[Diagram: network IDS (NIDS) and host IDS sensors on the Internet connect to a peer-to-peer CAN. An infected machine sends a worm probe; the NIDS and the IDS on the probed host both report to the fusion center, and the CAN directs the reports to that fusion center.]
Reporting Information
Fusion centers need enough information to make reasonable decisions
ID systems all have different proprietary reporting formats
Fusion centers would be overloaded with data if full packet dumps were sent
We need a concise, standardized format for reporting anomalies
Symptom Vector
Standardized set of information reported to fusion centers
IDS plugins could be written to produce these vectors and connect to the CAN
Flexibility for reporting more details
Symptom Vector
<src_addr,dst_addr,proto,src_port,dst_port,payload,event_type,lower_limit,upper_limit>
• Payload: Payload specifies some descriptor of the actual packet payload. This is most useful for worms. Two choices we’ve considered so far are a hash of the contents, or the size in bytes
• Event_type: A code specifying an event type such as a worm probe or a SYN flood
• Based on the event_type, upper_limit and lower_limit are two numerical fields available for the reporting IDS to provide more information
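The symptom vector could be represented as a small record type. The sketch below keeps the field names and order from the slide; the numeric event codes, the field types, and the example values are assumptions made for illustration.

```python
from dataclasses import astuple, dataclass
from typing import Union

# Hypothetical event codes; the slides do not define actual values.
WORM_PROBE = 1
SYN_FLOOD = 2

@dataclass(frozen=True)
class SymptomVector:
    src_addr: str
    dst_addr: str
    proto: str
    src_port: int
    dst_port: int
    payload: Union[int, str]  # descriptor: size in bytes, or a content hash
    event_type: int
    lower_limit: float        # meaning depends on event_type
    upper_limit: float        # meaning depends on event_type

# Example report with made-up values, using payload size as the descriptor.
report = SymptomVector(
    src_addr="10.0.0.5", dst_addr="192.168.1.20", proto="tcp",
    src_port=40312, dst_port=445, payload=5020,
    event_type=WORM_PROBE, lower_limit=0, upper_limit=0,
)
wire_format = astuple(report)  # flat 9-tuple in the order given on the slide
```

An IDS plugin would fill in one such record per anomaly instead of shipping a full packet dump.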
Payload Reporting
Hash: a semi-unique string produced by performing mathematical transformations on the content
• Uniquely identifies the content
• Cannot easily be matched based on “similarity” so it’s hard to spot polymorphic worms
Size: the number of bytes the worm takes up
• Non-unique: two worms could be of the same size, though we’re doing research to see how often that actually occurs
• Much easier to spot polymorphism: simple changes cause no or only small changes in size
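The trade-off between the two descriptors can be seen on a toy example. The byte strings below are invented stand-ins for a worm and a slightly mutated variant, and SHA-256 is just one possible choice of hash; neither comes from the slides.

```python
import hashlib

def payload_hash(payload: bytes) -> str:
    """Content hash: changes completely if even one byte changes."""
    return hashlib.sha256(payload).hexdigest()

def payload_size(payload: bytes) -> int:
    """Size descriptor: the number of bytes in the payload."""
    return len(payload)

# Made-up payloads: same exploit body, different padding.
original = b"\x90" * 64 + b"EXPLOIT-BODY" + b"A" * 100
variant = b"\x90" * 64 + b"EXPLOIT-BODY" + b"B" * 104

hashes_match = payload_hash(original) == payload_hash(variant)
size_gap = abs(payload_size(original) - payload_size(variant))
```

Here the hashes share nothing recognizable, while the sizes differ by only a few bytes, so a fusion center comparing sizes can still group the two reports.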
Routing Information
DHTs are traditionally used for peer-to-peer file sharing networks
• Locate content based on name, hash, etc.
• Not traditionally used to locate resources
We develop a routing vector in place of traditional DHT addressing methods, and use it to locate the appropriate fusion center(s)
Routing Vector
Based on the anomaly type
Generalized to ensure similar anomalies go to the same fusion center, while disparate anomalies are distributed across the network for better resource allocation
Worm routing vector: <dst_port,payload,event_type,lower_limit,upper_limit>
Routing Vector
Worm routing vector avoids using less relevant fields such as source port or IP addresses
Designed to utilize only information that will be fairly consistent across any given worm
Used to locate the fusion center, which receives the full symptom vector for detailed analysis
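A minimal sketch of how a worm routing vector might be derived from a symptom vector and used to pick a fusion center. Hashing to an index modulo the number of fusion centers is an assumed stand-in for actual CAN routing, and `worm_routing_vector` and `fusion_center_for` are hypothetical names.

```python
import hashlib
from typing import NamedTuple

class RoutingVector(NamedTuple):
    dst_port: int
    payload: int
    event_type: int
    lower_limit: int
    upper_limit: int

def worm_routing_vector(symptom: dict) -> RoutingVector:
    """Keep only the fields that stay fairly consistent across one worm."""
    return RoutingVector(*(symptom[name] for name in RoutingVector._fields))

def fusion_center_for(rv: RoutingVector, num_centers: int) -> int:
    """Assumed placement rule: hash the routing vector to a fusion center index."""
    text = "|".join(str(value) for value in rv)
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_centers

# Two reports of the same worm seen by different sensors (made-up values).
report_a = {"src_addr": "10.0.0.5", "dst_addr": "192.168.1.20", "proto": "tcp",
            "src_port": 40312, "dst_port": 445, "payload": 5020,
            "event_type": 1, "lower_limit": 0, "upper_limit": 0}
report_b = dict(report_a, src_addr="172.16.3.9", dst_addr="192.168.7.1", src_port=51877)

center_a = fusion_center_for(worm_routing_vector(report_a), num_centers=16)
center_b = fusion_center_for(worm_routing_vector(report_b), num_centers=16)
```

Since source addresses and source ports are excluded, both reports produce the same routing vector and land on the same fusion center, which then receives the full symptom vectors.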
Size and the boundary problem
Assume a CAN with several nodes. Each is allocated a range of sizes, say in blocks of 1000 bytes.
Assume node A has range 4000-5000 and node B has range 5000-6000
If a polymorphic worm has size ranging between 4980 and 5080, the information is split between A and B
Solution? Have information sent across the boundary. Node A sends copies of anything with size >4900 to node B and node B sends copies of anything with size <5100 to A
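The boundary rule can be sketched as follows. The block size of 1000 bytes and the 100-byte margin come from the example above; numbering nodes by `size // 1000` (so node 4 plays the role of A and node 5 the role of B) and treating ranges as half-open are assumptions, as is the function name `nodes_receiving`.

```python
BLOCK = 1000   # bytes per node range, as in the example
MARGIN = 100   # overlap implied by the >4900 and <5100 thresholds

def primary_node(size: int) -> int:
    """Node whose range contains this size (assumes half-open ranges)."""
    return size // BLOCK

def nodes_receiving(size: int) -> list[int]:
    """Primary node plus any neighbour that gets a copy across the boundary."""
    owner = primary_node(size)
    receivers = [owner]
    offset = size % BLOCK
    if offset < MARGIN and owner > 0:
        receivers.append(owner - 1)   # just above a boundary: copy to lower neighbour
    if offset > BLOCK - MARGIN:
        receivers.append(owner + 1)   # just below a boundary: copy to upper neighbour
    return sorted(receivers)

# Sizes from the polymorphic worm example: both ends reach nodes 4 (A) and 5 (B).
low_end = nodes_receiving(4980)
high_end = nodes_receiving(5080)
```

With this rule both A and B see every report in the 4980 to 5080 range, so either one can fuse the full picture, at the cost of duplicating reports near each boundary.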
To DHT or not to DHT
DHT automatically organizes everything for us
DHT ensures anomalies are somewhat spread out across the network
DHT routes in real time, without substantial prior knowledge of the anomaly
DHT is redundant, making an attack against the sensor fusion center tricky at worst and impossible to coordinate at best
Simulating the system
We build a simple array of nodes, and have them generate the symptom and routing vectors as they encounter anomalies
Not yet complete, work in progress
Demonstrates that information is fused appropriately, and that multiple simultaneous anomalies do not interfere with each other
Further Work
Complete paper (duh)
Add CAN to simulation to actually route
Include real-world packet dumps in the simulation
Test on more complex topologies?