High-Performance Hardware Monitors to Protect Network Processors from Data Plane Attacks
Kekai Hu, Harikrishnan Chandrikakutty, Deepak Unnikrishnan, Tilman Wolf, and Russell Tessier
Department of Electrical and Computer Engineering, University of Massachusetts Amherst

Outline
• The problem: software attacks on network routers
  – Routers now include programmable processors
• Our solution: include a monitor for the processor
• Software to generate the monitor information
• Implementation on DE4 NetFPGA board
• Experimental results

Computer Networks
• Networks provide connectivity between end-systems
• Success of the Internet: hourglass architecture
• Success is also a problem: diverse apps, diverse systems
• Changing requirements for the network layer
[Figure: layered protocol stack with example protocols]
  Application layer: HTTP, TLS/SSL, DNS, BGP, SIP, ...
  Transport layer: UDP, TCP, ...
  Network layer: IP
  Link layer: Ethernet, DSL, FDDI, ...
  Physical layer: 1000BASE-T, SONET/SDH, 802.11a/b/g/n, RS-232, ...

Network Architecture
• Extensions to current Internet
• Customization of data plane
• Requires router systems with ability to adapt
[Figure: processing functions by network location]
  End system: IP security; TCP termination
  Server: content-based switching; firewall; SSL termination; IP security
  Access router: access concentration (cable, DSL, wireless); network address translation; policy-based QoS; monitoring and billing; firewall
  Edge router: packet classification; QoS (DiffServ); monitoring and billing
  Core router: multiprotocol label switching; QoS-aware routing; monitoring

Programmable Router
• Network processor
  – General-purpose processing capability in data path
  – Packet processing in software
• High-performance processing hardware
  – Scalability through high levels of parallelism
• Example: 40-core network processors
• Key challenge:
  – Security due to software processing
[Figure: protocol processing on packet processor. Labels: router with ports and switching fabric; network processor with network interface, I/O, interconnect, and multiple processing engines; packets.]

Exploit of Vulnerable Network Processor
• Vulnerability in network processor can be exploited
• "In-network" denial of service attack (new type of attack)
• Key questions
  – Can such vulnerabilities occur in packet processing code? (Yes, we show one example.)
  – Can vulnerabilities be exploited? (Yes, for von Neumann and for Harvard architectures.)
[Figure: in-network denial of service attack. Labels: malicious packet; unprotected software-based router; vulnerable packet processor; packet forwarding software.]

Header Insertion Exploit
• Stack smashing during header insertion
  – Control flow can be changed to attack code
• Attack code: infinite transmission loop
  – Devastating denial-of-service attack
• Prototype implementation
  – Custom network processor on NetFPGA
• Harvard arch.: possible, but less exciting
[Figure: stack layout of the current frame (generate CM header), drawn from low address to high address with the direction of stack growth. Labels: new stack pointer; new pkt buffer; local var; previous frame ptr; return address; current frame ptr; original packet hdr; packet payload (attack code).]
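
Attack model
• Congestion management protocol
  – Inserts a custom header
• Integer overflow vulnerability
  – len1 = 12
  – len2 = 65534
  – sum = 10 (the 16-bit sum wraps around: 12 + 65534 = 65546, and 65546 mod 65536 = 10)
[Figure: the CM application turns the original packet (Link Hdr, IP Hdr, UDP Hdr, payload) into a new packet that also carries the CM Hdr; len1 and len2 mark the two lengths that are added.]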
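
The wraparound behind this vulnerability can be reproduced in a few lines. The sketch below is illustrative only: it assumes the two lengths are added in a 16-bit unsigned field, and the function names and the 1500-byte buffer size are hypothetical, not taken from the CM code. It uses len1 = 12 and len2 = 65534, the pair for which a 16-bit sum wraps to 10.

```python
MASK16 = 0xFFFF  # assumption: lengths are held in 16-bit unsigned fields


def add_u16(a, b):
    """Add two lengths the way a 16-bit unsigned register would (modulo 2**16)."""
    return (a + b) & MASK16


def unsafe_fits(len1, len2, buf_size):
    """Vulnerable bounds check: trusts the wrapped 16-bit sum."""
    return add_u16(len1, len2) <= buf_size


def safe_fits(len1, len2, buf_size):
    """Bounds check on the true sum, so wraparound cannot hide an oversized copy."""
    return len1 + len2 <= buf_size


len1, len2 = 12, 65534
wrapped = add_u16(len1, len2)          # 65546 mod 65536
BUF_SIZE = 1500                        # hypothetical buffer size
print(wrapped, unsafe_fits(len1, len2, BUF_SIZE), safe_fits(len1, len2, BUF_SIZE))
```

The vulnerable check accepts the packet because it only sees the wrapped value, while the copy that follows would use the real, much larger length.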

Defense Mechanism: Hardware Monitor
• Hardware monitor co-located with each processor core
  – Core reports hash of each executed instruction
• Monitoring graph represents correct behavior
  – Obtained from offline analysis of binary
  – Deviations trigger reset
• Change of software easy
  – Just need matching monitoring graph
[Figure: offline analysis and runtime operation.
  Offline analysis: processing code binary, NFA monitoring graph, NFA-to-DFA transformation, DFA monitoring graph.
  Runtime operation: the network processor (core, instruction memory with processing code, data memory with packet buffer, network interface) sends the hash of each processing instruction to the hardware monitor (comparison logic, monitoring memory holding the monitoring graph), which returns a reset/recovery signal.]

Offline Analysis of Processing Binary
• Executed instruction reported by core as 4-bit hash
  – Hash combines address, opcode, registers
  – Hash allows for compact representation of information
• Monitoring graph
  – Each instruction represented as a state
  – Edges correspond to execution of instruction
  – Control-flow operations lead to multiple possible next states
Example listing:
  […]
  49c: 97c20010 lhu v0,16(s8)
  4a0: 00000000 nop
  4a4: 2c420033 sltiu v0,v0,51
  4a8: 1440000a bnez v0,4d4
  4ac: 00000000 nop
  4b0: 3c026666 lui v0,0x6666
  4b4: 34430191 ori v1,v0,0x191
  4b8: 97c20010 lhu v0,16(s8)
  […]
[Figure: monitoring graph for the listing. States 49c, 4a0, 4a4, 4a8, 4ac, 4b0, 4b4, 4b8, and 4d4; edge hash values shown: 0, 7, 11, 10, 10, 7, 3, 6.]
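
As a rough illustration of this offline step, the sketch below turns the listing excerpt into a small monitoring graph. Several things here are assumptions rather than details from the slides: `hash4` is a made-up XOR-fold (the slides only say the hash combines address, opcode, and registers), each edge is labeled with the hash of the instruction it leads to, and the branch at 4a8 is modeled with a MIPS branch delay slot so that the two successors hang off 4ac. The instruction at 4d4 is not shown in the excerpt, so its edge label is left as `None`.

```python
def hash4(addr, word):
    """Hypothetical 4-bit hash: XOR of all nibbles of (address XOR instruction word)."""
    x = (addr ^ word) & 0xFFFFFFFF
    h = 0
    while x:
        h ^= x & 0xF
        x >>= 4
    return h


# Instruction words copied from the listing excerpt.
WORDS = {
    0x49C: 0x97C20010, 0x4A0: 0x00000000, 0x4A4: 0x2C420033, 0x4A8: 0x1440000A,
    0x4AC: 0x00000000, 0x4B0: 0x3C026666, 0x4B4: 0x34430191, 0x4B8: 0x97C20010,
}

# Control-flow successors (assumed: the bnez takes effect after its delay slot).
SUCCESSORS = {
    0x49C: [0x4A0], 0x4A0: [0x4A4], 0x4A4: [0x4A8], 0x4A8: [0x4AC],
    0x4AC: [0x4B0, 0x4D4], 0x4B0: [0x4B4], 0x4B4: [0x4B8],
}


def build_graph(words, successors):
    """Map each state to a list of (edge hash, next state) pairs."""
    graph = {}
    for state, targets in successors.items():
        edges = []
        for target in targets:
            # The label is unknown when the target instruction is not in the excerpt.
            label = hash4(target, words[target]) if target in words else None
            edges.append((label, target))
        graph[state] = edges
    return graph


graph = build_graph(WORDS, SUCCESSORS)
for state in sorted(graph):
    print(hex(state), [(label, hex(nxt)) for label, nxt in graph[state]])
```

Because the hash has only 16 values, two successors of one state can end up with the same label, which is what makes the resulting graph non-deterministic.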

NFA-to-DFA conversion
• Problem: non-deterministic finite automaton (NFA)
  – State may have two next states with same edge value due to hash
  – Implementation would need to keep track of multiple states
• Solution: NFA-to-DFA conversion (powerset construction)
  – Well-known algorithm [Hopcroft and Ullman, 1976]
  – Deterministic finite automaton (DFA) requires only one state
[Figure: example NFA with states 1 to 6 and edges labeled a to f, in which two edges labeled c leave the same state; in the equivalent DFA the two targets are merged into the combined state {3,5}.]
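
A minimal sketch of the powerset construction for automata without epsilon moves is given below. The example NFA is only loosely modeled on the figure (two edges labeled c leaving state 2); its exact edges, and the function name `nfa_to_dfa`, are assumptions made for illustration.

```python
from collections import deque


def nfa_to_dfa(nfa, start):
    """Powerset construction (no epsilon moves).

    nfa:   dict mapping state -> list of (label, next_state)
    start: initial NFA state
    Returns a dict mapping each reachable DFA state (a frozenset of NFA
    states) to a dict {label: next DFA state}.
    """
    dfa = {}
    queue = deque([frozenset([start])])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        # Collect, per label, every NFA state reachable from any member state.
        moves = {}
        for state in current:
            for label, nxt in nfa.get(state, []):
                moves.setdefault(label, set()).add(nxt)
        dfa[current] = {label: frozenset(t) for label, t in moves.items()}
        for target in dfa[current].values():
            if target not in dfa:
                queue.append(target)
    return dfa


# Assumed example: state 2 has two outgoing edges with the same label "c".
NFA = {
    6: [("a", 1)],
    1: [("b", 2)],
    2: [("c", 3), ("c", 5)],
    3: [("d", 4)],
    4: [("e", 5)],
    5: [("f", 6)],
}

dfa = nfa_to_dfa(NFA, start=6)
for state, edges in dfa.items():
    print(sorted(state), {label: sorted(t) for label, t in edges.items()})
```

In this example the ambiguous c edge leads to the single DFA state {3,5}, so a monitor following the DFA only ever has to track one current state.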

Implementation of DFA Monitor
• Each state may have up to 16 next states
  – Most states only have one or two next states
• Requirements
  – Compact representation
  – Fast processing of each hash value (single memory access)
• Idea: grouping of next states in contiguous memory
  – Trick: determine offset into memory based on order of hash value
[Figure: example DFA with states a to h and hash-labeled edges, shown before and after its next states are grouped into group 1, group 2, and group 3.]

Implementation of DFA Monitor
• DFA monitor system
  – Memory keeps all valid hash values in one bit vector
  – Next state based on offset of group and position of hash value
[Figure: DFA monitor datapath and state machine memory.]
  State machine memory (one entry per state):
    number of next states | offset in state group | valid hash values on outgoing edges
    2                     | 0                     | 0000 0000 1000 0100
    2                     | 1                     | 0000 1000 0000 1000
    1                     | 1                     | 0000 0000 1000 0000
    1                     | 0                     | 0000 0010 0000 0000
    3                     | 0                     | 0000 0000 0010 0101
    ...                   | ...                   | ...
    1                     | 0                     | 0000 0010 0000 0000
    ...                   | ...                   | ...
  Group base addresses: group 1 = 0x0000, group 2 = 0x0002, group 3 = 0x0006, ..., group 16
  Datapath: the 32-bit processor instruction passes through a 4-bit hash function and one-hot encoding (16 bits); hash comparison against the 16-bit valid-hash vector produces k, the position of the matching hash among the valid hash values; the next-state address is formed from the group base address, the offset, and k using the -1, mult, and add units; a 1-bit reset/recovery signal is raised on a mismatch.
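
The lookup can be sketched in software as follows. This is a reconstruction under stated assumptions, not the authors' hardware: the address formula `group_base[n] + offset * n + (k - 1)` is inferred from the figure labels (group base address, -1, mult, add, k), bit i of the valid-hash vector is assumed to stand for hash value i, and the small memory contents are a made-up example that only borrows the group base addresses from the slide.

```python
def rank_of_hash(valid_bits, h):
    """Return k, the 1-based position of hash h among the set bits of
    valid_bits (counting from bit 0), or 0 if h is not a valid hash."""
    if not (valid_bits >> h) & 1:
        return 0
    below = valid_bits & ((1 << h) - 1)   # set bits with a lower hash value
    return bin(below).count("1") + 1


def next_state(memory, group_base, state, h):
    """One monitor step: a single memory read, then address arithmetic.
    Returns the next state address, or None to signal reset/recovery."""
    num_next, offset, valid_bits = memory[state]
    k = rank_of_hash(valid_bits, h)
    if k == 0:
        return None                       # hash not allowed in this state
    return group_base[num_next] + offset * num_next + (k - 1)


# Group n holds the next-state sets of all states with n successors.
GROUP_BASE = {1: 0x0000, 2: 0x0002, 3: 0x0006}

# Hypothetical memory: address -> (number of next states, offset, valid hashes).
MEMORY = {
    0: (2, 0, 0b0000_0000_1000_0100),     # hashes 2 and 7 lead to addresses 2 and 3
    1: (3, 0, 0b0000_0000_0010_0101),     # hashes 0, 2, 5 lead to addresses 6, 7, 8
    2: (1, 0, 0b0000_0010_0000_0000),     # hash 9 leads to address 0
    3: (1, 1, 0b0000_0000_1000_0000),     # hash 7 leads to address 1
}

print(next_state(MEMORY, GROUP_BASE, 0, 7))   # valid hash
print(next_state(MEMORY, GROUP_BASE, 0, 3))   # invalid hash, monitor would reset
```

Storing only a count, an offset, and a 16-bit vector per state keeps each entry small, and the next state never has to be searched for: it follows from one read plus a multiply and two additions.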

Altera DE4 NetFPGA Infrastructure
• Open source port of NetFPGA 1G [1]
• Networking research platform
  – Altera DE4 board
  – 5x resources compared to NetFPGA 1G
  – PCI Express, 8GB DDR2
• Configurable designs
  – Reference router
  – Packet generator
[Figure: DE4 board block diagram. Labels: four 1 GigE ports; Stratix IV; SSRAM; DDR RAM; JTAG; SOPC system with four TSE MACs, JTAG master bridge, custom interface, and reference router or packet generator; PCIe; host computer.]

Altera DE4 NetFPGA reference router
• Complete IPv4 router
  – Forward packets on all four 1 Gbps interfaces
• Design components
  – Input queues
  – Input arbiter
  – Output port lookup
  – Output queues
• Prototype system replaces output port lookup module
[Figure: reference router pipeline. MAC RxQ and CPU RxQ input queues feed the input arbiter, followed by output port lookup and the output queues (MAC TxQ and CPU TxQ).]

Single-core system architecture
• Single-core network processor system integrated with the Altera DE4 NetFPGA reference router pipeline

Harvard memory architecture
• Separate physical memory space for instruction and data
  – General memory error techniques for code-injection attacks will not be successful
  – Attacks need to be generated with existing library functions
[Figure: CPU with separate address and data connections to the instruction memory (instructions) and to the data memory (data, stack).]

Experimental Setup
• Altera DE4 reference router with prototype system
• Altera DE4 packet generator
  – Generate packets
  – Capture forwarded packets
• Evaluation metrics
  – Attack detection
  – Throughput
  – Resource utilization

Offline analysis
• MIPS-GCC compiler generates binary and branch information
• Powerset construction to perform NFA-to-DFA transformation
• Memory initialization file is loaded into memory
[Figure: offline tool flow. Benchmark source code, MIPS-GCC compiler, instruction info and branch info, NFA-to-DFA conversion module, memory generator module, memory initialization file.]

Evaluation
• Monitoring speed
  – Single memory access
  – Lookup into fixed-size register file
• Memory size of monitor
  – More states due to NFA-to-DFA conversion
  – More states due to multiple entries in memory for certain states
  – In practice, overhead is below 10%
• Very fast and compact hardware monitor

Benchmarks
• NpBench [1]
  – Modern network applications
  – Three specific functional groups
    • Traffic management and quality of service group
    • Security and media processing group
    • Packet processing group
[1] B.K. Lee and L.K. John, "NpBench: a benchmark suite for control plane and data plane applications for network processors," in Proc. of 21st International Conference on Computer Design, pp. 226-233, 13-15 Oct. 2003.

Prototype Implementation on FPGA
• Small overhead compared to processor core (4096 states):
• Correct operation: attack packet detected and dropped

Attack with Defense in Place
• Attack packet dropped, router continues to operate

Throughput
• Throughput performance of the network processor with security monitor
  – CM protocol and IPv4 application

Multicore Monitor
• Dynamic workloads pose problem for hardware monitor
  – Processing may differ between packets
  – Monitors need to match processing
• Mapping between processors and monitors
  – 1-to-1 mapping requires frequent reload of monitor
  – Any-to-any mapping costly to implement
  – Clusters with n-to-m mapping provide balance
• Interconnect is configured dynamically depending on workload
  – Mapping between core and monitor
[Figure: mapping options between cores and monitors: one monitor per core (1-to-1), and clusters in which several cores share a larger set of monitors (n-to-m).]
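
The trade-off between reloading monitors and sharing them inside a cluster can be illustrated with a toy allocator. Everything in this sketch is hypothetical: the class `MonitorCluster`, its reuse-before-reload policy, and the application names are illustrative assumptions and are not described in the slides.

```python
class MonitorCluster:
    """Toy model of one cluster: cores request a monitor whose loaded
    monitoring graph matches the application they are about to run."""

    def __init__(self, num_monitors):
        self.loaded = [None] * num_monitors   # graph held by each monitor
        self.busy = [False] * num_monitors    # monitor currently attached to a core
        self.reloads = 0                      # number of graph (re)loads performed

    def acquire(self, app):
        """Return the index of a monitor for `app`, or None if all are busy."""
        idle = [i for i, b in enumerate(self.busy) if not b]
        if not idle:
            return None
        # First choice: an idle monitor that already holds the matching graph.
        for i in idle:
            if self.loaded[i] == app:
                self.busy[i] = True
                return i
        # Otherwise load the graph, preferring a monitor that is still empty.
        i = next((j for j in idle if self.loaded[j] is None), idle[0])
        self.loaded[i] = app
        self.reloads += 1
        self.busy[i] = True
        return i

    def release(self, index):
        """Detach a monitor from its core; its graph stays loaded for reuse."""
        self.busy[index] = False


cluster = MonitorCluster(num_monitors=3)
a = cluster.acquire("ipv4")   # loads the ipv4 graph
b = cluster.acquire("cm")     # loads the cm graph
cluster.release(a)
c = cluster.acquire("ipv4")   # reuses the already loaded ipv4 graph
print(a, b, c, cluster.reloads)
```

With more monitors than cores in a cluster, recently used graphs can stay resident, which is the balance the n-to-m mapping aims for.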

System Architecture of Clustered System
• Multiple cores can access multiple monitors
  – Dynamic configuration of crossbar
• Secure loading of monitors through external interface
[Figure: clustered system. Labels: network interface; n processors on an inter-core interconnect; external memory; one crossbar per cluster connecting processors to m monitors; control processor with control signals; AES; centralized monitor memory; external interface.]

Cluster Design
• Simple implementation of clustered monitor
  – Dynamic configuration through programming of demultiplexers
[Figure: cluster with four NP cores and six monitors. Each NP core turns its 32-bit instruction into a 4-bit hash (Hash_1 to Hash_4) and uses a 3-bit monitor select to choose among Monitor_1 to Monitor_6; each monitor uses a 2-bit proc select to choose among Hash_1 to Hash_4 and returns a 1-bit reset/recover signal to its core.]

Dual-Ported Monitor Implementation
• Memory of monitor can be shared between two monitors
  – Effective use of dual-ported memory
  – Two monitoring graphs can be used in parallel
[Figure: two monitors (Monitor 1, Monitor 2) sharing one dual-ported memory that holds Monitoring Graph 1 and Monitoring Graph 2. Each monitor has its own 32-bit processor instruction input, 4-bit hash function, one-hot encoding (16 bits), hash comparison, next-state select, graph select, and 1-bit reset/recover output; each memory port supplies the next state and valid hashes.]

Prototype Implementation on FPGA
• Resource cost for a multi-core system (4 cores, 6 monitors):

Conclusions
• Current and future Internet needs to meet new demands
• Programmable routers provide packet processing platform
  – Systems problem: security vulnerabilities
  – Attacks can be launched within data plane (i.e., not control access)
  – Monitor-based hardware defense mechanism is effective
• Hardware monitor design and prototype
  – Uses compact DFA (less than 10% more states than NFA)
  – Verification with single memory access per instruction
  – Defense shown for Harvard architecture attack
• Technique extended to multicore network processors
• Promising defense against attacks on network infrastructure