Tilman Wolf
Department of Electrical and Computer Engineering
University of Massachusetts Amherst
Network Services in the Next-Generation Internet
Need for Clean-Slate Network Architecture
Limitations of current architecture
• Fixed TCP/IP stack
• Hardware implementation of forwarding
• Extensions are “hacks”
  − Firewalls, intrusion detection systems, network address translation
Need for new network architecture
• Support for more heterogeneity
  − End systems: cell phones, PDAs, RFID tags, sensors
  − Routers: wireless infrastructure, ad-hoc networks
• Support for new networking paradigms
  − Data access: content distribution, content addressable networks
  − Protocols: multipath routing, network coding
End system: IP security, TCP termination
Server: content-based switching, firewall, SSL termination, IP security
Access router: access concentration (cable, DSL, wireless), network address translation, policy-based QoS, monitoring and billing, firewall
Edge router: packet classification, QoS (DiffServ), monitoring and billing
Core router: multiprotocol label switching, QoS-aware routing, monitoring
Network Virtualization
Virtualization of router system
• Common hardware (“substrate”)
• Coexistence of multiple virtual networks
• Specialized networks deployed as separate protocol stacks (“slices”)
Programmability in data plane
• Deployment of new protocols through software
Questions
• How to deploy new functionality in empty slice?
  − Per-connection functionality and network-wide functionality
• How to manage processing resources in substrate?
[Figure: substrate router hosting parallel “protocol stacks”]
Outline
Introduction
Network Services
• Architecture
• Routing with network services
Packet processing systems
• Runtime management
Conclusions
Flexibility vs. Manageability
Customization in network architecture
• What is the right level of flexibility?
Two extremes
• ASIC implementation of IP router
  − All packets are handled the same way
  − No flexibility
• Active networks
  − Packet processing can be programmed
  − Too much flexibility – very difficult to manage
Our approach: balanced combination
• Set of well-defined protocol processing features (“network services”)
  − E.g., reliability, security, scheduling, …
• Custom combination of services provides flexibility
[Figure: network services (reliability, transcoding, SSL privacy, caching, flow control, QoS scheduling, anycast, multicast, IDS) drawn alongside the application, transport, and network layers]
Network Service Architecture
New communication abstraction
• Custom composition of functions along end-to-end path
• Expressed as sequence of “network services”
Benefits
• End-system application can choose most suitable features
• Network can control placement of services
• Programmable routers implement network services
[Figure: end-systems attached to a service-enabled network containing Service 1 and Service 2; labeled “connection request”]
Related Work
Protocol stack composition on end-system (“vertical”)
• Configurable protocol stacks [Bhatti & Schlichting, SIGCOMM 1995]
• Configurable protocol heaps [Braden, Faber & Handley, CCR 2003]
• NCSU SILO project [Dutta et al., ICC 2007]
Custom network processing (“horizontal”)
• Active networks [Tennenhouse & Wetherall, SIGCOMM 1996]
• Modular routers: Click [Kohler et al., TOCS 2000]
• Programmable routers and network processors
Substrate systems
• Router virtualization: VINI (Princeton), SPP (Washington University)
• Forwarding substrate: OpenFlow (Stanford), PoMO (Univ. of Kentucky)
Our focus: abstractions for horizontal composition
Network Service Architecture
Hierarchical inter-network and intra-network design
• Autonomous System abstraction
• Match with administrative boundaries of Internet
Control plane
• Connection setup
• Routing algorithm
Data plane
• Forwarding
• Packet processing
[Figure: service-enabled network between end-systems, with service controllers in the control plane and service nodes in the data plane]
Connection Setup
Interface to applications
• API similar to Berkeley sockets
• Service specification determines sequence of requested services
Example:
*:*>>compression(LZ)>>decompression(LZ)>>192.168.1.1:80
• Connection to 192.168.1.1 port 80
• Compression (Lempel-Ziv) on path
Options
• Parameters necessary for service (e.g., LZ)
• Constraints on service placement (e.g., sending LAN, receiving LAN)
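The service specification string above can be split into its parts with a few lines of code. The following is a minimal sketch, not the prototype's parser: the names `Service`, `parse_endpoint`, and `parse_spec` are hypothetical, and only flat specifications (no nested services, no placement constraints) are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """One requested network service and its parameters (hypothetical type)."""
    name: str
    params: list = field(default_factory=list)


def parse_endpoint(text):
    """Split 'host:port' into a (host, port) pair; '*' stays a wildcard string."""
    host, _, port = text.rpartition(":")
    return host, port


def parse_spec(spec):
    """Parse 'src>>svc(params)>>...>>dst' into (source, services, destination)."""
    parts = spec.split(">>")
    if len(parts) < 2:
        raise ValueError("specification needs a source and a destination")
    services = []
    for item in parts[1:-1]:
        # 'compression(LZ)' -> name 'compression', params ['LZ']
        name, _, rest = item.partition("(")
        params = [p for p in rest.rstrip(")").split(",") if p]
        services.append(Service(name, params))
    return parse_endpoint(parts[0]), services, parse_endpoint(parts[-1])


src, services, dst = parse_spec(
    "*:*>>compression(LZ)>>decompression(LZ)>>192.168.1.1:80"
)
```

Here `src` is the wildcard source, `services` lists compression and decompression in path order, and `dst` is the host and port of the connection.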
[Figure: message sequence over time t between end-system, service controller, service node 1, and service node 2: connection request, mapping, service setup, resource allocation, setup ack, connection ack, then data transmission with service processing at the service nodes]
Multiparty Interests
Connection setup can be influenced by multiple parties:
• Sender (connection to destination)
• Receiver (e.g., use of proxy)
• Network service provider (e.g., monitoring)
Explicit addition of services
• Sender: *:*>>128.119.85.114:80
• Receiver: *:*>>proxy>>128.119.85.114:80
• Network service provider: *:*>>monitoring>>*.*
• Combined: *:*>>monitoring>>proxy>>128.119.85.114:80
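One way to picture the explicit addition of services is as a merge of the three specifications. The sketch below assumes a simple ordering policy (provider services first, then receiver services, then sender services) that is chosen only because it reproduces the combined specification shown above; the slides do not state the actual merge rule, and `split_spec` and `combine` are hypothetical helper names.

```python
def split_spec(spec):
    """Return (source, [services], destination) for a flat '>>' specification."""
    parts = spec.split(">>")
    return parts[0], parts[1:-1], parts[-1]


def combine(sender, receiver, provider):
    """Merge three flat specifications into one service sequence.

    Assumed policy (not stated in the slides): provider services run first,
    then receiver services, then any services the sender asked for.
    """
    src, sender_services, dst = split_spec(sender)
    _, receiver_services, _ = split_spec(receiver)
    _, provider_services, _ = split_spec(provider)
    chain = provider_services + receiver_services + sender_services
    return ">>".join([src] + chain + [dst])


combined = combine(
    "*:*>>128.119.85.114:80",          # sender: plain connection
    "*:*>>proxy>>128.119.85.114:80",   # receiver: insert a proxy
    "*:*>>monitoring>>*.*",            # provider: monitor all traffic
)
```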
Service Routing Problem
Interesting problem at connection setup
• Determine path and select nodes to perform service
• How can a node decide best path?
  − Better to perform service locally?
  − Better to defer to downstream node?
  − Which direction to route connection?
Assumption: single cost metric
• Otherwise NP-complete
Centralized solution
• Global view necessary
• Limited scalability
Distributed solution
• Dynamic programming
[Figure: candidate routes from source s to destination t, with open choices (“?”) of where to perform the service]
Distributed Service Matrix Routing
Similar to Distance Vector routing
• Each neighbor announces cost of best path to each destination
• Each node adds cost to neighbor and picks best router (Bellman-Ford)
Distributed Service Matrix Routing (DSMR)
• Expand vector to include service: “service matrix”
  − Periodic service matrix exchange
  − Service matrices stabilize eventually
• Each node can determine best path
  − Handle service locally OR
  − Send to neighbor with lowest cost
• Challenge: each service combination requires columns in matrix
  − Exponential growth of matrix with number of services
Service matrix (rows: destinations; columns: services)
destination   no service   S1    S2    ...   S1S2   S2S1
v1            3            6     9     ...   11     10
v2            4            7     5     ...   15     13
...           ...          ...   ...   ...   ...    ...
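The distance-vector style update behind the service matrix can be sketched as a small relaxation loop. This is an illustrative sketch under stated assumptions, not the DSMR protocol itself: it tracks one requested service sequence toward one destination (one column per remaining suffix of that sequence), runs centrally rather than by message exchange, and the names `dsmr`, `links`, and `service_cost` as well as the example topology are hypothetical.

```python
import math


def dsmr(links, service_cost, dest, sequence):
    """Return cost[node][k]: least cost from node to dest with sequence[k:] left."""
    nodes = sorted({n for edge in links for n in edge})
    neighbors = {n: {} for n in nodes}
    for (a, b), c in links.items():
        neighbors[a][b] = c
        neighbors[b][a] = c
    n_states = len(sequence) + 1
    cost = {n: [math.inf] * n_states for n in nodes}
    cost[dest][n_states - 1] = 0  # at the destination with nothing left to do
    changed = True
    while changed:  # each pass stands in for one round of matrix exchange
        changed = False
        for n in nodes:
            for k in range(n_states - 1, -1, -1):
                best = cost[n][k]
                # Option 1: send to the neighbor that announces the lowest cost.
                for nb, link in neighbors[n].items():
                    best = min(best, link + cost[nb][k])
                # Option 2: handle the next service locally, if offered here.
                if k < len(sequence):
                    svc = sequence[k]
                    if svc in service_cost.get(n, {}):
                        best = min(best, service_cost[n][svc] + cost[n][k + 1])
                if best < cost[n][k]:
                    cost[n][k] = best
                    changed = True
    return cost


# Hypothetical four-node example: S1 is cheap at b and expensive at a.
links = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 2, ("b", "t"): 2}
service_cost = {"a": {"S1": 5}, "b": {"S1": 1}}
cost = dsmr(links, service_cost, dest="t", sequence=["S1"])
```

In this example `cost["s"][1]` is the plain shortest-path cost to t, and `cost["s"][0]` is the cost when S1 must be performed on the way, which routes through b even though the path through a is shorter.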
Approximate DSMR
Use information from single service only
• Matrix lists node where service is performed
Routing of multiple services
• Allocate best node for last service
• Find best path for second-to-last service to that node
• Repeat for all services
Upper bound on path
[Figure: routes from s to t through services S1, S2, S3, showing the least-cost path for the services sequence, the least-cost path for one service (given by service matrix), and the approximate least-cost path (and upper bound)]
Service matrix with service node (entries: cost, node where service is performed)
destination   no service   S1      S2      ...
v1            3            6,v3    9,v4    ...
v2            4            7,v5    5,v4    ...
...           ...          ...     ...     ...
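The back-to-front placement can be sketched as follows. This is one reading of the approximation under stated assumptions: the single-service matrix entry is computed here from all-pairs shortest paths rather than learned by exchange, every lookup is taken from the source's point of view, and the names `shortest_paths`, `best_node`, `approximate_route`, and the example topology are hypothetical.

```python
import math
from itertools import product


def shortest_paths(nodes, links):
    """All-pairs shortest path costs (Floyd-Warshall) over undirected links."""
    d = {(a, b): (0 if a == b else math.inf) for a, b in product(nodes, nodes)}
    for (a, b), c in links.items():
        d[a, b] = d[b, a] = min(d[a, b], c)
    for k, i, j in product(nodes, nodes, nodes):  # k varies slowest
        d[i, j] = min(d[i, j], d[i, k] + d[k, j])
    return d


def best_node(d, service_cost, src, dst, svc):
    """Single-service matrix entry: (cost, node) for svc on the way src -> dst."""
    return min(
        (d[src, v] + costs[svc] + d[v, dst], v)
        for v, costs in service_cost.items()
        if svc in costs
    )


def approximate_route(d, service_cost, src, dst, sequence):
    """Place services from last to first; return (placement, total cost)."""
    target, placement = dst, []
    for svc in reversed(sequence):
        _, node = best_node(d, service_cost, src, target, svc)
        placement.append((svc, node))
        target = node  # the next (earlier) service is routed toward this node
    placement.reverse()
    hops = [src] + [n for _, n in placement] + [dst]
    total = sum(d[a, b] for a, b in zip(hops, hops[1:]))
    total += sum(service_cost[n][svc] for svc, n in placement)
    return placement, total


# Hypothetical example: S2 is only offered at a; S1 is offered at a and b.
links = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 2, ("b", "t"): 2}
service_cost = {"a": {"S1": 4, "S2": 1}, "b": {"S1": 1}}
d = shortest_paths(["s", "a", "b", "t"], links)
placement, total = approximate_route(d, service_cost, "s", "t", ["S1", "S2"])
```

Because each step uses only single-service information, the resulting cost is an upper bound on the least-cost path for the whole sequence; in this small example the two happen to coincide.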
Prototype Implementation
Emulab prototype
• 12 Autonomous Systems
• 60 nodes
Service routing
• Centralized within AS
• Approximate DSMR between ASs
149,760 connections
• All possible source-destination pairs
• All possible service combinations
Evaluation of DSMR
Correctness
• 6 of 149,760 connections failed
Convergence time
• Service matrices converge
• Time increases with network size
Approximate DSMR
• Works well for small number of services
• Inefficiency grows with number of services
[Figure: path length of approximation over optimal route]
Evaluation of DSMR
Connection setup time with Distributed Service Routing Protocol (DSRP)
• Compared to TCP
• Setup time less than 2× longer
Evaluation summary
• Routing with service constraints can be solved efficiently
• Distributed algorithm is scalable when using approximation
Example Scenario: IPTV Distribution
Heterogeneous receivers present challenge with live IPTV
• Current solution: overlay with transcoding on end-systems
[Figure: HDTV source (1080p) sends through the network to an HDTV display (1080p) and to low-quality displays (H.263, H.261); video transcoding (1080p to H.263, 1080p to H.261) takes place on end-systems]
Example Scenario: IPTV Distribution
Transcoding in network when using network service
[Figure: HDTV source (1080p) sends to an HDTV display (1080p) and to low-quality displays (H.263, H.261); transcoding (1080p to H.263, 1080p to H.261) takes place inside the network]
Example Scenario: IPTV Distribution
Prototype implementation
• Emulab simulation
Service request
• *:*>>monitor(bandwidth)>>multicast(192.168.1.1,videotranscode(1080p,H.264)>>monitor(bandwidth)>>192.168.2.17)>>*:5000
Also prototyped on real router system
• Cisco ISR with AXP
• Single core processor insufficient
How to design a good packet processing system?
Outline
Introduction
Network Services
• Architecture
• Routing with network services
Packet processing systems
• Runtime management
Conclusions
Programmable Router
Flexibility through programmability
• General-purpose processing capability in data path
• Packet processing in software
High-performance processing hardware
• E.g., network processors
Scalability through high level of parallelism
[Figure: router with ports connected by a switching fabric; each port contains a network processor in which packets pass between the network interface, I/O, and multiple processing engines over an interconnect network; services run on the packet processor]
Programming of Packet Processors
Programming is challenging
• Distribution of processing onto multiple processors
  − Run-to-completion model often not feasible
• Limited instruction store on embedded packet processors
• Contention for shared data structures
• System components on MPSoC are tightly coupled
• Simple code, but repetitiveness amplifies small problems
Typical solution: offline optimization
• Simulation to identify performance bottlenecks
• Manual adjustment
  − Code, thread and processor allocation, memory management
• Repeat
Offline optimization cannot handle dynamic environment
• Change in network traffic, network services, slice allocation, etc.
Runtime System for Packet Processors
Adaptation of task allocation to processing resources
• Runtime profiling to obtain usage statistics
• Task mapping to adapt to current requirements
Current focus: processing (not memory)
[Figure: offline programming and configuration (user, implementation of Click elements, possible data-path network services, Click configuration, graph of schedulable Click elements) and runtime management (task mapping, installation of Click configuration on packet processing hardware, update of profiling information) for a heterogeneous multi-core packet processing system with processor cores and a hardware accelerator handling packets]
Workload Representation
Granularity of representation is important
• Too coarse: not easily distributed
• Too fine-grained: scalability problem
Good balance: Click modular router
• Directly translatable into implementation
Task Mapping Problem
Which Click element (“task”) should run on which processor?
Challenges
• Different task “sizes” (in terms of processing requirements)
• Different task utilization
• Communication cost of inter-processor transfers
Leads to packing problem
• Computationally hard to solve
Our approach
• Simplify problem by creating tasks of equal size
[Figure: task mapping of tasks t1 ... tT (some replicated) onto a packet processing system with N processors and M threads connected by an interconnect]
Task Replication
Profiling provides runtime information
• Task utilization
• Task processing time
Compute: “work” per task
• work = (utilization) × (processing time)
Replicate tasks with highest work
• Replication reduces utilization
• Reduced utilization reduces work
Benefits of replication
• Task work more balanced
  − Simplifies mapping problem
• Larger number of tasks
  − Allows scaling to large number of processors
[Figure: task replication turns the task sequence t(i-1), t(i), t(i+1) into replicas: one copy of t(i-1), three copies of t(i), two copies of t(i+1)]
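The replication rule can be illustrated with a short sketch. It assumes that replicating a task splits its utilization evenly across the replicas and that replication stops at a fixed task budget; both are simplifying assumptions made for illustration, and the function name `replicate`, the budget parameter `max_tasks`, and the profiling numbers are hypothetical.

```python
import heapq


def replicate(tasks, max_tasks):
    """tasks: {name: (utilization, processing_time)}. Returns {name: replicas}.

    Repeatedly replicates the task with the highest per-replica work,
    where work = utilization * processing_time.
    """
    replicas = {name: 1 for name in tasks}
    # Max-heap on per-replica work (negated because heapq is a min-heap).
    heap = [(-u * t, name) for name, (u, t) in tasks.items()]
    heapq.heapify(heap)
    while heap and sum(replicas.values()) < max_tasks:
        _, name = heapq.heappop(heap)
        replicas[name] += 1
        u, t = tasks[name]
        # Assumption: utilization is shared evenly among the replicas.
        heapq.heappush(heap, (-(u / replicas[name]) * t, name))
    return replicas


# Hypothetical profile: (utilization, processing time) per task.
profile = {
    "classify": (1.0, 2.0),   # work 2
    "ipsec": (0.5, 40.0),     # work 20
    "forward": (1.0, 1.0),    # work 1
}
replicas = replicate(profile, max_tasks=6)
```

With this profile the expensive task is the only one replicated, which narrows the spread between the largest and smallest per-task work.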
Task Replication
Example: Click configuration with 23 tasks
• IP forwarding and IPSec as network services
[Figure: task work in the example configuration; annotations: 155× difference, 13.5× difference]
Task Mapping
Simple greedy algorithm
• Co-locate tasks with maximum utilization edges
• When processor “full” then switch to next
Runtime adaptation
• Update profiling information
• Update replication
• Update mapping
• Update NP configuration
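The greedy mapping step can be sketched as follows. This is one possible reading of the two bullets, assuming equal-size tasks (as produced by replication) and a fixed number of task slots per processor; the names `greedy_map`, `edges`, and `capacity` and the example values are hypothetical.

```python
def greedy_map(tasks, edges, n_processors, capacity):
    """Map equal-size tasks to processors; returns {task: processor index}.

    edges: {(a, b): utilization} for each edge between two tasks.
    """
    mapping = {}
    load = [0] * n_processors
    current = 0

    def place(task):
        nonlocal current
        if task in mapping:
            return
        while load[current] >= capacity:  # processor "full": switch to next
            current += 1
            if current == n_processors:
                raise ValueError("not enough processor capacity")
        mapping[task] = current
        load[current] += 1

    # Visit edges from highest to lowest utilization so that heavily
    # communicating tasks tend to share a processor.
    for (a, b), _ in sorted(edges.items(), key=lambda e: e[1], reverse=True):
        place(a)
        place(b)
    for task in tasks:  # tasks that have no edges at all
        place(task)
    return mapping


tasks = ["t1", "t2", "t3", "t4"]
edges = {("t1", "t2"): 0.9, ("t2", "t3"): 0.2, ("t3", "t4"): 0.8}
mapping = greedy_map(tasks, edges, n_processors=2, capacity=2)
```

In this example the lightly used edge between t2 and t3 is the one that crosses processors, so the inter-processor transfer cost stays low.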
System Evaluation
Our runtime system vs. SMP Click on 4-core Xeon
• For various scenarios we observe up to 1.32× higher throughput
What’s Next?
Trends continue
• Trend toward more programmability in networks
• Trend toward more embedded cores per chip
• Trend toward system usability
  − Cisco QuantumFlow (40 cores, 4 threads) programmable in ANSI-C
Question: homogeneous or heterogeneous MPSoC?
• Homogeneity simplifies programmability
• Hardware accelerators perform better
• How to find balance in next-generation Internet?
[Figure: packet processing system implementation over time t, from today's Internet architecture to the next-generation Internet architecture, between general-purpose processor (slow path processing) and specialized hardware (fast path processing); the future position is marked “?”]
What’s Next?
Question: is there a better packet processor design?
• High overhead for managing packet processing context
• Hardware support for context management
• Processor core sees simple instruction and memory address space
Question: correct service composition semantics?
• How can service specifications be verified or composed automatically?
• Can enumeration of all service properties be avoided?
[Figure: service processor datapath. Address shifters combine the processor's instruction and data addresses with ServiceTag[7..0] and FlowTag[23..0] to select regions of instruction memory (service 1, service 2, service 5), local data memory (per-service and per-flow state, e.g., service 1, service 5, flow 17, flow 9), and packet memory (packet of flow 17, packet of flow 9). Signals: PKT_IN[31..0], PKT_OUT[31..0], PKT_DONE, FLOW_EN, STATE_EN, data[31..0], addr[19..0]]
Outline
Introduction
Network Services
• Architecture
• Routing with network services
Packet processing systems
• Runtime management
Conclusions
Conclusions
Next-generation Internet needs to meet many demands
• Flexibility is key to avoid ossification
Network services implement new features
• Routing with services is important control-plane problem
• Distributed Service Matrix Routing provides effective solution
Programmable routers provide packet processing platform
• Runtime system for network processors necessary for adaptation
• Mapping of processing tasks to hardware resources
Exciting time for networking research
• New network architecture and applications
• “Clean slate” designs allow for creative contributions
Acknowledgements
Graduate students
• Xin Huang
• Sivakumar Ganapathy
• Shashank Shanbhag
• Qiang Wu
Sponsors
• National Science Foundation
• Intel Research Council
Publications
Network service architecture
• Tilman Wolf, “Service-centric end-to-end abstractions in next-generation networks,” in Proc. of Fifteenth IEEE International Conference on Computer Communications and Networks (ICCCN), Arlington, VA, Oct. 2006, pp. 79-86.
• Sivakumar Ganapathy and Tilman Wolf, “Design of a network service architecture,” in Proc. of Sixteenth IEEE International Conference on Computer Communications and Networks (ICCCN), Honolulu, HI, Aug. 2007.
• Xin Huang, Sivakumar Ganapathy, and Tilman Wolf, “A scalable distributed routing protocol for networks with data-path services,” in Proc. of 16th IEEE International Conference on Network Protocols (ICNP), Orlando, FL, Oct. 2008.
• Shashank Shanbhag and Tilman Wolf, “Implementation of end-to-end abstractions in a network service architecture,” in Proc. of Fourth Conference on emerging Networking EXperiments and Technologies (CoNEXT), Madrid, Spain, Dec. 2008.
Runtime management of packet processors
• Tilman Wolf, Ning Weng, and Chia-Hui Tai, “Run-time support for multi-core packet processing systems,” IEEE Network, vol. 21, no. 4, pp. 29-37, July 2007.
• Qiang Wu and Tilman Wolf, “Dynamic workload profiling and task allocation in packet processing systems,” in Proc. of IEEE Workshop on High Performance Switching and Routing (HPSR), Shanghai, China, May 2008.
• Xin Huang and Tilman Wolf, “Evaluating dynamic task mapping in network processor runtime systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 8, pp. 1086-1098, Aug. 2008.
• Qiang Wu and Tilman Wolf, “On runtime management in multi-core packet processing systems,” in Proc. of ACM/IEEE Symposium on Architectures for Networking and Communication Systems (ANCS), San Jose, CA, Nov. 2008.
• Qiang Wu and Tilman Wolf, “Runtime resource allocation in multi-core packet processing systems,” in Proc. of IEEE Workshop on High Performance Switching and Routing (HPSR), Paris, France, June 2009.
Service processor design
• Qiang Wu and Tilman Wolf, “Design of a network service processing platform for data path customization,” in Proc. of The Second ACM SIGCOMM Workshop on Programmable Routers for Extensible Services of TOmorrow (PRESTO), Barcelona, Spain, August 2009.
Verification of service composition
• Shashank Shanbhag, Xin Huang, Santosh Proddatoori, and Tilman Wolf, “Automated service composition in next-generation networks,” in Proc. of The International Workshop on Next Generation Network Architecture (NGNA), Montreal, Canada, June 2009.