Building a Secure and Resilient Network Infrastructure
Dan Massey, Colorado State University
7 October 04 [email protected]
Outline
Changes in the Infrastructure Environment
Using Internet Worm Attacks To Motivate:
Secure and Resilient BGP Communication
Path Vector Algorithm Convergence
Network Fault Identification
New Challenges in Authentication
DNS Security
Original Infrastructure Goals
The original designs assumed that:
Hardware is unreliable: servers/routers will fail.
Network links are unreliable: connections will fail.
Data transport is unreliable: bit errors will occur.
The goal was to build protocols that:
Provide functionality despite all of the above.
Scale to extremely large size.
Tremendously successful in this respect:
BGP routing protocol: 150K+ routes, 20K+ autonomous systems.
DNS naming protocol: 1G records in 60M zones.
The Infrastructure Today
Success and growth to large scale add:
Implementation and under-specification errors
Configuration errors by diverse administrators
Complex interactions and the challenge of scale
Intentional attacks
The Internet works today because:
The robust original design masks many problems.
Clever operational tricks keep the system afloat.
– Ex: AOL BGP TTL hack (RFC 3682, GTSM) to protect routers from DDoS
“For every type of animal there is a most convenient size, and a large change in size inevitably carries with it a change of form.” (J. B. S. Haldane, “On Being the Right Size”)
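The TTL trick cited on this slide (RFC 3682) rests on one observation: a packet sent with TTL 255 loses one unit per forwarding router, so a distant attacker cannot make a spoofed packet arrive with a high TTL. A minimal sketch, assuming the peer always sends with TTL 255; `accept_bgp_packet` and its `radius` parameter are hypothetical names, not any router's configuration syntax.

```python
MAX_TTL = 255  # assumption: the sending peer always uses the maximum TTL


def accept_bgp_packet(received_ttl: int, radius: int = 0) -> bool:
    """Return True if the packet can only have come from within `radius` hops.

    `radius` (hypothetical parameter) is the number of intermediate routers
    allowed between the peers: 0 means a directly connected neighbor.
    Every forwarding router decrements the TTL by one, so a remote sender
    cannot make a packet arrive with a TTL this high.
    """
    return received_ttl >= MAX_TTL - radius


if __name__ == "__main__":
    print(accept_bgp_packet(255))            # directly connected peer: True
    print(accept_bgp_packet(255 - 5))        # spoofer 5 routers away: False
    print(accept_bgp_packet(254, radius=1))  # one router in between allowed: True
```

The check is cheap enough to run before any BGP or TCP processing, which is why it helps against DDoS aimed at the routing session.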
Changing the Form of the Internet
We need to recognize current design successes. The Internet generally works today.
Includes millions of already deployed systems.
Provides a laboratory for large-scale system problems.
New challenges require a new approach to design. Essential to add resilience and security.
But this does not imply we must start from scratch.
New solutions must either be incrementally deployable or must prove the necessity for a fresh start.
Slammer Worm After 30 Minutes (graph by CAIDA)
BGP Routing Infrastructure
Internet’s global routing protocol: connects Autonomous Systems (AS).
Path vector routing protocol: announces the path of ASes used to reach a destination.
Routes adapt to link changes and route policies.
Routes do not adapt to traffic load.
Recent worm attacks should have no impact on BGP.
[Figure: AS 1 originates prefix P; AS 2 announces prefix P with path 2,1 and AS 3 announces prefix P with path 3,1.]
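The path vector behavior in this figure (prepend your AS, reject paths that loop through you, pick a path) can be sketched as follows. This is a simplified model, not BGP's decision process: it assumes shortest-path preference with lowest-neighbor tie-breaking, `advertise` and `best_path` are hypothetical helpers, and AS 4 in the usage is hypothetical.

```python
from typing import Dict, Optional, Tuple

Path = Tuple[int, ...]  # AS path: nearest AS first, origin AS last


def advertise(my_as: int, path: Path) -> Path:
    """Prepend our own AS number before announcing the path to neighbors."""
    return (my_as,) + path


def best_path(my_as: int, learned: Dict[int, Path]) -> Optional[Path]:
    """Choose among {neighbor AS: path heard from that neighbor}.

    Paths that already contain `my_as` are loops and are rejected. Among the
    rest, the shortest path wins; ties go to the lowest neighbor AS (an
    assumption made only to keep the example deterministic).
    """
    candidates = [
        (len(path), neighbor, path)
        for neighbor, path in learned.items()
        if my_as not in path
    ]
    if not candidates:
        return None
    return min(candidates)[2]


if __name__ == "__main__":
    origin = (1,)                  # AS 1 originates prefix P
    via_2 = advertise(2, origin)   # (2, 1)
    via_3 = advertise(3, origin)   # (3, 1)
    print(best_path(4, {2: via_2, 3: via_3}))      # (2, 1)
    print(best_path(2, {3: advertise(3, via_2)}))  # None: (3, 2, 1) loops through AS 2
```

Note that nothing here looks at traffic load, which is the slide's point: a worm that floods links should not, by itself, change these choices.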
BGP Updates During Nimda Worm
[Figure: BGP update volume during the Nimda worm; labeled components: measurement artifacts, routing changes, total attack.]
BGP Measurement Artifacts
BGP peers establish a TCP session and send the full route table (120K+ routes); afterward, updates are sent only if routes change.
Our results show frequent session resets between ISP routers and the monitoring point.
Monitoring-point sessions cross multiple systems in the Internet, and each reset adds 120K updates.
But there are very few ISP-to-ISP session resets.
Our work in [1] presents rules to remove session reset artifacts.
[Figure: timeline showing an initial table (120K+ routes), route changes, and another initial table (120K+ routes) after a session reset.]
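One way to picture the artifact problem: after a reset, announcements that merely repeat what the monitor already held are not routing changes. The sketch below is a simplified, hypothetical filter, not the rules from [1]. It compares only prefix and AS path and ignores prefixes that silently disappear during the reset; `split_reset_dump` is a hypothetical name, and the prefixes and AS numbers in the usage are made up.

```python
from typing import Dict, List, Tuple

Route = Tuple[str, Tuple[int, ...]]  # (prefix, AS path)


def split_reset_dump(
    table_before_reset: Dict[str, Tuple[int, ...]],
    dump_after_reset: List[Route],
) -> Tuple[List[Route], List[Route]]:
    """Split the table re-sent after a session reset into (artifacts, changes).

    An announcement identical to the route held before the reset is a
    session-reset artifact; anything else is treated as a real change.
    """
    artifacts: List[Route] = []
    changes: List[Route] = []
    for prefix, path in dump_after_reset:
        if table_before_reset.get(prefix) == path:
            artifacts.append((prefix, path))
        else:
            changes.append((prefix, path))
    return artifacts, changes


if __name__ == "__main__":
    before = {"10.0.0.0/8": (701, 1), "192.0.2.0/24": (701, 2)}
    dump = [("10.0.0.0/8", (701, 1)), ("192.0.2.0/24", (701, 3, 2))]
    artifacts, changes = split_reset_dump(before, dump)
    print(len(artifacts), len(changes))  # 1 1
```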
What Our Analysis Shows: BGP Advertisements on 9/18/2001
[Pie chart: BGP advertisements broken down into BGP table exchange, duplicate advertisements, new announcements, withdraws, and implicit withdraws.]
A substantial percentage of the BGP messages during the worm attack were not about route changes.
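The advertisement categories in this chart can be assigned mechanically by comparing each update with the route previously held for the prefix. A minimal sketch, assuming only the AS path is compared (a real duplicate test would compare all path attributes) and that table-exchange updates after session resets are identified separately; `classify` and `UpdateType` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

Path = Tuple[int, ...]


class UpdateType(Enum):
    NEW_ANNOUNCEMENT = "new announcement"
    DUPLICATE = "duplicate advertisement"
    IMPLICIT_WITHDRAW = "implicit withdraw"  # a new path replaces the old one
    WITHDRAW = "withdraw"


def classify(rib: Dict[str, Path], prefix: str, path: Optional[Path]) -> UpdateType:
    """Classify one update from a peer and apply it to `rib` in place.

    `path is None` stands for an explicit withdrawal of `prefix`.
    """
    previous = rib.get(prefix)
    if path is None:
        rib.pop(prefix, None)
        return UpdateType.WITHDRAW
    rib[prefix] = path
    if previous is None:
        return UpdateType.NEW_ANNOUNCEMENT
    if previous == path:
        return UpdateType.DUPLICATE
    return UpdateType.IMPLICIT_WITHDRAW


if __name__ == "__main__":
    rib: Dict[str, Path] = {}
    for update in [(701, 1), (701, 1), (701, 3, 1), None]:
        print(classify(rib, "192.0.2.0/24", update).value)
    # new announcement, duplicate advertisement, implicit withdraw, withdraw
```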
FRTR: Improving Peer Communication
BGP updates are not (topology) event driven:
Session resets trigger high-volume surges.
– Govindan shows cascade failures can result.
The lifetime of invalid routes is unbounded:
A router never recovers (until a reset) if an update is somehow lost.
– Despite TCP, we found cases of “lost” withdrawals.
An attacker can poison a route with one update.
Soft state (periodic re-announcement) is too costly…
FRTR uses periodic Bloom filter digests:
Digests quickly confirm state after a session reset.
Periodic digests bound the lifetime of faults (with high probability).
Co-author Keyur Patel (Cisco) is exploring Cisco development.
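A Bloom filter digest of the kind FRTR uses can be sketched as below. This is an illustrative sketch, not FRTR's encoding from [2]: routes are plain strings, bit positions come from salted SHA-256 hashes, and `BloomDigest` and `check_against_digest` are hypothetical names. The sender builds the digest over its table; the receiver drops any route that fails the membership test (reliable, because a Bloom filter has no false negatives) and compares counts to suspect missing routes.

```python
import hashlib
from typing import Iterable, List, Tuple


class BloomDigest:
    """A Bloom filter over route strings, built by the sending peer."""

    def __init__(self, num_bits: int, num_hashes: int, salt: bytes = b"") -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.salt = salt  # a new salt per period re-randomizes false positives
        self.bits = bytearray((num_bits + 7) // 8)
        self.count = 0    # number of routes summarized by this digest

    def _positions(self, route: str) -> Iterable[int]:
        for i in range(self.num_hashes):
            data = self.salt + i.to_bytes(4, "big") + route.encode()
            yield int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % self.num_bits

    def add(self, route: str) -> None:
        for pos in self._positions(route):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def might_contain(self, route: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(route))


def check_against_digest(receiver_routes: List[str], digest: BloomDigest) -> Tuple[List[str], bool]:
    """Return (stale routes to remove, whether some routes appear to be missing)."""
    stale = [r for r in receiver_routes if not digest.might_contain(r)]
    kept = len(receiver_routes) - len(stale)
    return stale, kept < digest.count


if __name__ == "__main__":
    digest = BloomDigest(num_bits=1024, num_hashes=4, salt=b"period-1")
    for route in ["10.0.0.0/8 701 1", "192.0.2.0/24 701 2"]:
        digest.add(route)
    # The receiver missed one route and still holds a stale one (a "lost" withdrawal).
    stale, missing = check_against_digest(["10.0.0.0/8 701 1", "198.51.100.0/24 701 9"], digest)
    print(stale, missing)
```

A stale route can still slip through as a false positive; the next digest, with a different salt, gets another chance to catch it.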
FRTR Performance
For each route at the receiver, check it against the digest; the Bloom filter produces no false negatives.
Compare digest totals to detect missing routes.
False positives are possible, with a known rate; adding salts reduces the chance of repeated false positives.
Overhead is a function of digest size and frequency; work with Cisco suggests a 1.3% overhead increase.
Complete details to appear in [2] (DSN 2004).
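The "known rate" of false positives follows from the standard Bloom filter approximation p ≈ (1 - e^(-kn/m))^k for n routes, m bits, and k hash functions. The sketch below evaluates it for a 120K-route table, assuming 8 bits per route and k = 5 (illustrative parameters, not FRTR's). It also assumes fresh salts make successive digests independent, so a stale route survives r periods with probability p^r; the function names are hypothetical.

```python
import math


def false_positive_rate(n_routes: int, m_bits: int, k_hashes: int) -> float:
    """Standard Bloom filter approximation (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_routes / m_bits)) ** k_hashes


def stale_route_survival(p: float, periods: int) -> float:
    """Chance a stale route passes `periods` digests with independent salts."""
    return p ** periods


if __name__ == "__main__":
    n = 120_000
    m = 8 * n  # 960,000 bits, about 120 KB per digest
    p = false_positive_rate(n, m, k_hashes=5)
    print(f"{p:.4f}")                           # about 0.0217
    print(f"{stale_route_survival(p, 3):.1e}")  # about 1.0e-05
```

Doubling the digest size or sending it more often lowers the fault lifetime further, which is the size/frequency trade-off the slide refers to.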
What Our Analysis Shows (2): BGP Advertisements on 9/18/2001
[Pie chart: BGP table exchange, duplicate advertisements, new announcements, withdraws, and implicit withdraws.]
What about the 60% not due to table exchange…
FRTR eliminates bursts.
A Closer Look at the Route Changes
[Figure: route changes split into actual path changes and updates with no path change.]
This is the U.S. DoD, as explained in [3] (DISCEX 2003).
Improving Path Vector Convergence
Infocom 02 [4] uses consistency to detect invalid paths:
Reject path <x1, x2, …, xn, r1, r2, …, rm> if r1 is a direct neighbor and r1’s path is not <r1, r2, …, rm>.
Adjusted to account for policy and implemented in BGP.
Infocom 03 [Afek, et al.] quickly flushes invalid paths:
BGP requires updates to be separated by a minimum interval.
Send a withdraw (to flush the route) if blocked by the interval.
Our recent work [5] attaches a new attribute, Root Cause Notification (RCN):
It identifies the failed link and includes a sequence number.
It allows any route relying on the failed link to be rejected.
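Both the consistency assertion and RCN amount to extra checks before a router accepts a path. The sketch below combines them under simplifying assumptions. It ignores policy (which [4] had to account for), represents a withdrawn neighbor route as `None`, and treats links as undirected. It also assumes each root-cause notification carries a link, a sequence number, and an up/down flag. `PathChecker` and its methods are hypothetical names, not the encodings from [4] or [5].

```python
from typing import Dict, FrozenSet, Optional, Sequence, Tuple

Path = Tuple[int, ...]


class PathChecker:
    def __init__(self) -> None:
        # Current path heard from each direct neighbor (None = withdrawn).
        self.neighbor_paths: Dict[int, Optional[Path]] = {}
        # Latest root-cause notification per link: link -> (sequence number, is_up).
        self.link_state: Dict[FrozenSet[int], Tuple[int, bool]] = {}

    def consistent(self, path: Path) -> bool:
        """Consistency assertion: if r1 = path[i] (i >= 1) is a direct
        neighbor, r1's own current path must equal path[i:]."""
        for i in range(1, len(path)):
            r1 = path[i]
            if r1 in self.neighbor_paths and self.neighbor_paths[r1] != path[i:]:
                return False
        return True

    def root_cause(self, a: int, b: int, seq: int, is_up: bool) -> bool:
        """Record an RCN for link a-b; ignore older or duplicate notifications."""
        link = frozenset((a, b))
        known = self.link_state.get(link)
        if known is not None and seq <= known[0]:
            return False
        self.link_state[link] = (seq, is_up)
        return True

    def avoids_failed_links(self, path: Sequence[int]) -> bool:
        """RCN check: reject any path that relies on a link reported down."""
        for a, b in zip(path, path[1:]):
            state = self.link_state.get(frozenset((a, b)))
            if state is not None and not state[1]:
                return False
        return True

    def accept(self, path: Path) -> bool:
        return self.consistent(path) and self.avoids_failed_links(path)


if __name__ == "__main__":
    checker = PathChecker()
    checker.neighbor_paths = {2: None, 3: (3, 2, 1)}  # neighbor 2 withdrew its route
    print(checker.consistent((3, 2, 1)))  # False: neighbor 2 no longer has (2, 1)
    checker.neighbor_paths[2] = (2, 1)
    checker.root_cause(2, 1, seq=7, is_up=False)
    print(checker.accept((3, 2, 1)))      # False: relies on failed link 2-1
    print(checker.root_cause(1, 2, seq=6, is_up=True))  # False: older notification
```

The sequence number is what lets a router tell a fresh "link down" from a delayed copy of an old one, so stale notifications cannot resurrect or kill routes.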
Simulation Results
What Our Analysis Shows (3): BGP Advertisements on 9/18/2001
[Pie chart: BGP table exchange, duplicate advertisements, new announcements, withdraws, and implicit withdraws.]
Can’t eliminate the actual topology dynamics…
FRTR eliminates bursts.
RCN improves convergence.
Identifying the Source of Faults
It is believed that worm attacks caused edge instability, but core links remained up. Can we prove (or disprove) this claim?
The Fault Identification Problem:
BGP monitoring points collect gigabytes of data from an ad hoc selection of monitoring points.
The underlying Internet topology is not known, but the data does include path information.
What can you conclude regarding faults?
Pursuing two parallel solutions:
Enhance the protocol to include fault data (RCN).
Design tools and algorithms to automate fault identification.
The LinkRank Analysis Toolset
LinkRank [6] was developed for analyzing BGP data.
It assigns each AS-AS link a weight based on the number of prefixes and records aggregate rank changes over time.
The figure shows the graph from AS 6539. Note that all links leaving AS 701 show a route loss.
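A LinkRank-style weighting can be sketched from a single monitor's table: each AS-AS link's weight is the number of prefixes whose paths cross it, and rank changes are weight differences between snapshots. This is a hypothetical sketch, not the LinkRank tool [6]. It treats links as directed from the monitor toward the origin; the prefix labels and every AS other than 6539 and 701 are made up.

```python
from collections import Counter
from typing import Dict, Tuple

Link = Tuple[int, int]
Table = Dict[str, Tuple[int, ...]]  # prefix -> AS path seen by the monitor


def link_weights(table: Table) -> Counter:
    """Weight of each AS-AS link = number of prefixes whose path uses it."""
    weights: Counter = Counter()
    for path in table.values():
        for link in zip(path, path[1:]):
            weights[link] += 1
    return weights


def rank_changes(before: Table, after: Table) -> Dict[Link, int]:
    """Per-link weight change between two snapshots (only links that changed)."""
    wb, wa = link_weights(before), link_weights(after)
    return {link: wa[link] - wb[link] for link in set(wb) | set(wa) if wa[link] != wb[link]}


if __name__ == "__main__":
    before = {"p1": (6539, 701, 1), "p2": (6539, 701, 2), "p3": (6539, 3, 2)}
    after = {"p1": (6539, 3, 1), "p2": (6539, 3, 2), "p3": (6539, 3, 2)}
    changes = rank_changes(before, after)
    print(changes[(6539, 701)], changes[(6539, 3)])  # -2 2
```

In this toy case every link touching AS 701 loses weight, the same signature the slide's figure shows.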
Combining Multiple Views
Previous snapshots suggested a failure at AS 701.
The view from other points shows that all BGP monitors saw a shift away from AS 701.
NANOG confirmed a corresponding failure event.
LinkRank has been successfully applied to several Internet events.
Formalizing the Results
LinkRank relies heavily on human intuition; we are investigating algorithms to automate detection.
The Fault Identification Problem:
Given only path vector routing table snapshots, can you find the minimum set of link changes that explains the snapshots?
Can you find a representation of all possible changes that explain the snapshots?
Results for shortest path policies appear in [7].
Work in progress on other policies and partial link failures.
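To make the minimum-explanation question concrete, here is a brute-force sketch under a deliberately simplified model (failures only, and not the shortest-path formulation of [7]). A set of failed links explains two snapshots if every path that changed used at least one failed link and no path in the new snapshot uses a failed link. `min_explanation` is a hypothetical helper; it is exponential in the number of candidate links, so it only suits tiny examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Optional, Set, Tuple

Path = Optional[Tuple[int, ...]]  # None means the destination is unreachable
Link = FrozenSet[int]


def links(path: Path) -> Set[Link]:
    """Undirected links along a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])} if path else set()


def min_explanation(old: Dict[Hashable, Path], new: Dict[Hashable, Path]) -> Optional[Set[Link]]:
    """Smallest set of failed links consistent with the two snapshots.

    `old` and `new` map a (monitor, destination) key to the observed path.
    Returns None if no set of link failures explains the change.
    """
    still_used = set().union(*(links(p) for p in new.values()))
    changed = [key for key in old if old[key] != new.get(key)]
    candidates = sorted(
        set().union(*(links(old[key]) for key in changed)) - still_used, key=sorted
    )
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            # Every changed path must have lost at least one of its old links.
            if all(links(old[key]) & chosen for key in changed):
                return chosen
    return None


if __name__ == "__main__":
    old = {(1, 4): (1, 2, 4), (1, 5): (1, 2, 5)}
    new = {(1, 4): (1, 3, 4), (1, 5): (1, 3, 5)}
    print(min_explanation(old, new))  # {frozenset({1, 2})}
```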
Lessons From The Worm Attacks
The worm shows the complexity of BGP dynamics:
Need to stabilize peer communication (FRTR).
Need to improve path vector convergence (RCN).
Would like to identify the real source topology events.
But we must not forget that the ultimate goal of routing is to deliver packets.
– Route updates are only a means toward this goal.
The worm attack was not directed against routing.
Infrastructure Faults and Attacks
[Figure: a BGP monitor observes a route originating 192.26.92/24, the prefix containing c.gtld-servers.net (192.26.92.30).]
BGP and DNS provide no authentication: faults and attacks can misdirect traffic.
One (of many) examples observed from BGP logs: ISPs announced the new path for 20 minutes to 3 hours.
The server could have replied with false DNS data.
“Cryptography is like magic fairy dust: we just sprinkle it on our protocols and it makes everything secure.”
- See IEEE Security and Privacy Magazine, Jan 2003
Secure DNS Query and Response
[Figure: an end-user queries a caching DNS server, which queries the authoritative DNS servers for www.darpa.mil.]
Response: www.darpa.mil = 192.5.18.195, plus an (RSA) signature by darpa.mil.
An attacker cannot forge this answer without the darpa.mil private key.
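The sign-then-verify relationship on this slide can be illustrated with textbook RSA. This is a toy sketch with a deliberately tiny, insecure key (p = 61, q = 53). Real DNSSEC signatures use much larger keys, padding, and a canonical wire-format encoding of the RRset. The record string and the names `rrset_digest`, `sign`, and `verify` are hypothetical.

```python
import hashlib

# Toy textbook-RSA key, an assumption for illustration only: far too small to be secure.
N = 61 * 53  # public modulus, 3233
E = 17       # public exponent (what the darpa.mil KEY record would publish)
D = 2753     # private exponent: E * D = 46801 = 1 (mod (61 - 1) * (53 - 1))


def rrset_digest(rrset: str) -> int:
    """Hash the record text and reduce it into the toy modulus."""
    return int.from_bytes(hashlib.sha256(rrset.encode()).digest(), "big") % N


def sign(rrset: str) -> int:
    """Done by the zone owner (darpa.mil) with the private exponent."""
    return pow(rrset_digest(rrset), D, N)


def verify(rrset: str, signature: int) -> bool:
    """Done by the resolver using only the public exponent."""
    return pow(signature, E, N) == rrset_digest(rrset)


if __name__ == "__main__":
    record = "www.darpa.mil. A 192.5.18.195"
    sig = sign(record)
    print(verify(record, sig))            # True
    print(verify(record, (sig + 1) % N))  # False: a tampered signature fails
```

Because the caching server only relays `record` and `sig`, it cannot alter the answer without the check failing; only the holder of `D` can produce a new valid signature.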
There is no magic fairy dust
What To Take Away
A new look at the Internet infrastructure:
Scaling up has profound implications beyond bigger numbers/tables.
Data reveals interesting problems and provides a large-scale systems lab.
Challenges remain in improving the system:
But we can build backward-compatible changes into the infrastructure (ex: FRTR and RCN).
Need to develop general approaches to resilient design of large-scale systems (Internet, sensor nets, etc.).
References Cited
1. Observation and Analysis of BGP Behavior under Stress, L. Wang, X. Zhao, D. Pei, R. Bush, D. Massey, A. Mankin, S. F. Wu, and L. Zhang, Proceedings of the SIGCOMM Internet Measurement Workshop, 2002.
2. FRTR: A Scalable Mechanism to Restore Routing Table Consistency, L. Wang, D. Massey, K. Patel, and L. Zhang, to appear in IEEE Dependable Systems and Networks (DSN), July 2004.
3. Understanding BGP Behavior Through a Study of DoD Prefixes, X. Zhao, M. Lad, D. Pei, L. Wang, D. Massey, S. F. Wu, and L. Zhang, Proceedings of DISCEX III, April 2003.
4. Improving BGP Convergence with Consistency Assertions, D. Pei, L. Wang, X. Zhao, D. Massey, L. Zhang, and A. Mankin, Proceedings of IEEE INFOCOM 2002.
5. BGP-RCN: Improving BGP Convergence Through Root Cause Notification, D. Pei, M. Azuma, N. Nguyen, J. Chen, D. Massey, and L. Zhang, UCLA Department of Computer Science Technical Report CSD TR-030047, October 2003.
6. Link-Rank: A Graphical Tool for Capturing BGP Routing Dynamics, M. Lad, D. Massey, and L. Zhang, to appear in Network Operations and Management Symposium (NOMS), April 2004.
7. An Algorithmic Approach to Identifying Link Failures, M. Lad, A. Nanavati, D. Massey, and L. Zhang, to appear in 10th Pacific Rim Dependable Computing Symposium (PRDC), March 2004.
8. DNS Security Introduction and Requirements, R. Arends, R. Austein, M. Larson, D. Massey, and S. Rose, Work in Progress, IETF DNS EXT Working Group, February 2004.
Acknowledgements
Funding Sources
FNIISC Project: August 2000 - May 2004
– DARPA Fault Tolerant Networks
– PI: USC/ISI (Dan Massey), UCLA (Lixia Zhang), UC Davis (S. Felix Wu)
Beyond BGP Project: October 2002 - September 2005
– NSF Special Projects in Networking
– PI: USC/ISI (Dan Massey) and UCLA (Lixia Zhang)
FMESHD Project: July 2000 - December 2003
– DARPA Fault Tolerant Networks
– PI: USC/ISI (Dan Massey); subcontract to NAI (Russ Mundy)
With Thanks to Collaborators and Graduate Students
Lixia Zhang and Felix Wu
Lan Wang, Dan Pei, and Mohit Lad (UCLA & USC/ISI interns)
Naheed Vora (USC & USC/ISI intern)
Revised DNS Key Management
mil DNS Server
darpa.mil DNS Server
darpa.mil NS records
www.darpa.mil A record
www.darpa.mil SIG(A) by key 2
darpa.mil KEY (pub key 1)
darpa.mil KEY (pub key 2)
darpa.mil SIG(A) by key 1
darpa.mil DS record (hash of pubkey 1)
darpa.mil SIG(DS) by mil private key
Can change the mil key without notifying darpa.mil.
Can change key 2 without notifying .mil.
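The reason key 2 can change without notifying .mil is that the parent only vouches for a hash of key 1. A simplified sketch, assuming the DS digest is SHA-256 over the zone name and key bytes (real DS records, per RFC 4034, hash the owner name in wire format plus the key's RDATA). `ds_digest` and `parent_must_update` are hypothetical helpers, and the key bytes are placeholders.

```python
import hashlib
from typing import Iterable


def ds_digest(zone: str, public_key: bytes) -> str:
    """Simplified DS record: a hash of the child's key, stored at the parent."""
    return hashlib.sha256(zone.encode() + public_key).hexdigest()


def parent_must_update(ds_at_parent: str, zone: str, child_keys: Iterable[bytes]) -> bool:
    """True if no key now published by the child matches the parent's DS record."""
    return all(ds_digest(zone, key) != ds_at_parent for key in child_keys)


if __name__ == "__main__":
    key1, key2 = b"pub key 1", b"pub key 2"
    ds = ds_digest("darpa.mil", key1)  # signed by the mil private key
    # Replacing key 2 (the key that signs the A records) leaves the DS valid.
    print(parent_must_update(ds, "darpa.mil", [key1, b"new pub key 2"]))  # False
    # Replacing key 1 breaks the chain until mil publishes a new DS record.
    print(parent_must_update(ds, "darpa.mil", [b"new pub key 1", key2]))  # True
```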
Next-Step DNS Security Activities
Co-editor of the IETF specification [8].
– Last call workshop completed last month; cleaning up minor issues and nits.
Dept. of Homeland Security DNSSEC Group:
– Group of 10 advising DHS on DNS security deployment strategies.
– Need operational policies for end systems.
Investigating Resilient DNS:
– Real security is more than authentication.
– Joint work with Amir/Terzis and Zhang/Wu.
– NSF ITR Proposal just completed (hours ago :)
DNS Key Roll-Over
mil DNS Server
darpa.mil DNS Server
darpa.mil KEY (pub key 1)
darpa.mil KEY (pub key 2)
darpa.mil SIG(A) by key 1
darpa.mil DS record (hash of pubkey 1)
darpa.mil SIG(DS) by mil private key
darpa.mil KEY (pub key 3)
darpa.mil SIG(A) by key 3
darpa.mil DS record (hash of pubkey 3)
darpa.mil SIG(DS) by mil private key
Objective: Replace KEY 1 with new KEY 3
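The ordering of this roll-over matters because resolvers may cache the DS record or the key set from the previous step. The sketch below checks a proposed sequence of states under simplifying assumptions. It tracks only keys hashed into DS records (key 2 is left out), represents each DS record by the name of the key it hashes, and assumes every such key also signs the key set. `rollover_is_safe` is a hypothetical helper.

```python
from typing import FrozenSet, List, Tuple

# (keys published at darpa.mil, keys whose hashes appear in DS records at mil)
State = Tuple[FrozenSet[str], FrozenSet[str]]


def validates(published: FrozenSet[str], ds: FrozenSet[str]) -> bool:
    """A chain of trust exists if some DS record matches a published key."""
    return bool(published & ds)


def rollover_is_safe(states: List[State]) -> bool:
    for i, (keys, ds) in enumerate(states):
        if not validates(keys, ds):
            return False
        if i > 0:
            prev_keys, prev_ds = states[i - 1]
            # A resolver may still cache the previous DS set or the previous key set.
            if not (validates(keys, prev_ds) and validates(prev_keys, ds)):
                return False
    return True


if __name__ == "__main__":
    k1, k3 = frozenset({"key1"}), frozenset({"key3"})
    both = k1 | k3
    print(rollover_is_safe([(k1, k1), (both, both), (k3, k3)]))  # True
    print(rollover_is_safe([(k1, k1), (k3, k3)]))                # False: no overlap
```

Publishing key 3 and its DS record alongside key 1 before withdrawing key 1, as in the slide, is exactly the overlap that makes the middle step safe.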
Multi-Origin AS (MOAS) Routing Announcements
MOAS exists in current BGP operation: some due to operational need, some due to faults.
Blind acceptance of MOAS is dangerous: an open door for traffic hijacking.
BGP-based Solution Example
Example configuration (AS 59):
    router bgp 59
     neighbor 1.2.3.4 remote-as 52
     neighbor 1.2.3.4 send-community
     neighbor 1.2.3.4 route-map setcommunity out
    !
    route-map setcommunity
     match ip address 18.0.0.0/8
     set community 59:MOAS 58:MOAS additive
[Figure: ASes 52, 58, and 59 around prefix 18.0.0.0/8, with announcements 18/8, PATH<58>, MOAS{58,59}; 18/8, PATH<59>, MOAS{58,59}; 18/8, PATH<52>, MOAS{52,58}; and 18/8, PATH<4>, MOAS{4,58,59}.]
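The MOAS-list idea in this example can be checked mechanically. Each origin attaches the list of ASes allowed to originate the prefix, and a receiver raises an alarm when an origin is missing from its own list or when the lists disagree. A minimal sketch of that consistency check, assuming announcements arrive as (AS path, MOAS list) pairs and that a route without a list implies a single origin; `moas_conflict` is a hypothetical name. The usage reuses announcements from the figure.

```python
from typing import FrozenSet, Iterable, Tuple

Announcement = Tuple[Tuple[int, ...], FrozenSet[int]]  # (AS path, MOAS list)


def moas_conflict(announcements: Iterable[Announcement]) -> bool:
    """Return True if announcements for one prefix carry inconsistent MOAS lists."""
    seen_lists = set()
    for path, moas in announcements:
        origin = path[-1]
        effective = moas if moas else frozenset({origin})  # no list: sole origin
        if origin not in effective:
            return True  # the origin does not even list itself
        seen_lists.add(effective)
    return len(seen_lists) > 1


if __name__ == "__main__":
    good = [((58,), frozenset({58, 59})), ((59,), frozenset({58, 59}))]
    print(moas_conflict(good))  # False
    bad = good + [((52,), frozenset({52, 58}))]
    print(moas_conflict(bad))   # True: {52, 58} disagrees with {58, 59}
```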
BGP False Origin Detection: Simulation Results
[Figure panels: (a) one origin AS; (b) two origin ASes.]
BGP Updates During Slammer Worm
Constructing Fault Graphs
A monitor observes a shift from the red path to the blue path. (Other monitors reveal node 5.)
Convert to a fault graph:
Combine all topology data.
A greedy algorithm selects "core" faults near the root.
A recursive search finds alternates for each core fault.
The result is a lower fault graph.
A set of edges is an explanation iff it is a cut in the fault graph; the minimum explanation is the minimum cut.
Extends to multiple views; used to analyze LinkRank data.
[Figure: left, a topology with nodes 1 through 7, a monitor, and a destination; right, the derived fault graph over nodes 1, 2, 4, 5, and 7 between a source and a sink.]
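Because a minimum explanation is a minimum cut, standard max-flow machinery applies. The sketch below finds a minimum edge cut with unit capacities using the Edmonds-Karp (shortest augmenting path) method. The fault graph in the usage is a made-up toy rather than the one in the figure, and `min_cut` is a hypothetical helper.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def min_cut(edges: List[Edge], source: str, sink: str) -> List[Edge]:
    """Minimum set of fault-graph edges separating `source` from `sink`."""
    residual: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        residual[u][v] += 1  # each fault-graph edge has capacity 1
        residual[v][u] += 0  # create the reverse residual entry

    def reachable_from_source() -> Dict[str, str]:
        """BFS over edges with spare capacity; returns a parent map."""
        parent: Dict[str, str] = {source: source}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, capacity in residual[u].items():
                if capacity > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break
        v = sink  # push one unit of flow along the shortest augmenting path
        while v != source:
            u = parent[v]
            residual[u][v] -= 1
            residual[v][u] += 1
            v = u

    # Original edges leaving the source side of the final residual graph form a min cut.
    side = reachable_from_source()
    return sorted((u, v) for u, v in edges if u in side and v not in side)


if __name__ == "__main__":
    graph = [("source", "a"), ("source", "b"), ("a", "c"), ("b", "c"), ("c", "sink")]
    print(min_cut(graph, "source", "sink"))  # [('c', 'sink')]
```

Here a single edge explains everything the monitors saw, matching the slide's idea of picking a "core" fault near the root.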
Infrastructure Security Enhancements
BGP and DNS lack authentication: it is easy to insert false BGP routes or reply with false DNS data.
S-BGP (BBN) and soBGP (Cisco) propose adding public key authentication to BGP:
Verify that the origin is authorized to announce the prefix, and verify each link in the AS path.
– Is this path authentication the right goal?
Requires a heavyweight PKI structure.
DNSSEC adds authentication to DNS:
Further along than the BGP approaches.
Provides lessons for BGP authentication.