TRANSCRIPT
The Congestion Manager
Hari Balakrishnan (MIT LCS), Srinivasan Seshan (CMU)
http://nms.lcs.mit.edu/
draft-ietf-ecm-cm-01.txt
July 31, 2000 48th IETF (Pittsburgh) ECM WG 2
CM architecture
• Integrates congestion management across all applications (transport protocols & user-level apps)
• Exposes API for application adaptation, accommodating ALF applications
• This draft: sender-only module
[Figure: CM architecture — TCP1, TCP2, UDP, SCTP, and applications (HTTP, RTP/RTCP, NNTP, ...) share a single Congestion Manager above IP via the CM API]
Outline
• Draft overview (“tutorial” for slackers!)
– Terminology
– System components
– Abstract CM API
– Applications
• Issues for discussion
Assumptions & terminology
• Application: Any protocol that uses CM
• Well-behaved application: Incorporates application-level receiver feedback, e.g., TCP (ACKs), RTP (RTCP RRs), …
• Stream
– Group of packets with five fields in common: [src_addr, src_port, dst_addr, dst_port, ip_proto]
• Macroflow
– Group of streams sharing the same congestion control and scheduling algorithms (a “congestion group”)
Architectural components
• CM scope is per-macroflow; not on data path
• Congestion controller algorithm MUST be TCP-friendly (see the Floyd document)
• Scheduler apportions bandwidth to streams
[Figure: CM internals — a congestion controller and a scheduler inside the CM, exposing the API to the streams on a macroflow]
Congestion Controller
• One per macroflow
• Addresses two issues:
– WHEN can macroflow transmit?
– HOW MUCH data can be transmitted?
• Uses app notifications to manage state
– cm_update() from streams
– cm_notify() from IP output whenever packet sent
• Standard API for scheduler interoperability
– query(), notify(), update()
• A large number of controllers are possible
Scheduler
• One per macroflow
• Addresses one issue:
– WHICH stream on macroflow gets to transmit
• Standard API for congestion controller interoperability
– schedule(), query_share(), notify()
– This does not presume any scheduler sophistication
• A large number of schedulers are possible
Sharing
• All streams on macroflow share congestion state
• What should granularity of macroflow be?
– [Discussed in November ’99 IETF]
– Default is all streams to given destination address
– Grouping & ungrouping API allows this to be changed by an application program
Abstract CM API
• State maintenance
• Data transmission
• Application notification
• Querying
• Sharing granularity
State maintenance
• stream_info is a platform-dependent data structure, containing: [src_addr, src_port, dst_addr, dst_port, ip_proto]
• cm_open(stream_info) returns stream ID, sid
• cm_close(sid) SHOULD be called at the end
• cm_mtu(sid) gives path MTU for stream
• Add call for sid ---> stream_info (so non-apps can query too)
Data transmission
• Two API modes, neither of which buffers data
• Accommodates ALF-oriented applications
• Callback-based
• Application controls WHAT to send at any point in time
Callback-based transmission
[Figure: the application issues 1. cm_request() to the CM; the CM later invokes the 2. cmapp_send() callback]
• Useful for ALF applications
• TCP too
– On a callback, decide what to send (e.g., retransmission), independent of previous requests
Synchronous transmission
• Applications that transmit off a (periodic) timer loop
– Send callbacks wreck timing structure
• Use a different callback
• First, register rate and RTT thresholds
– cm_setthresh() per stream
• cmapp_update(newrate, newrtt, newrttdev) when values change
• Application adjusts period, packet size, etc.
Application notification
• Tell CM of successful transmissions and congestion
– cm_update(sid, nrecd, nlost, lossmode, rtt)
– nrecd, nlost since last cm_update call
– lossmode specifies type of congestion as bit-vector: CM_PERSISTENT, CM_TRANSIENT, CM_ECN
• Should we define more specifics?
Notification of transmission
• cm_notify(stream_info, nsent) from IP output routine
– Allows CM to estimate outstanding bytes
• Each cmapp_send() grant has an expiration– max(RTT, CM_GRANT_TIME)
• If app decides NOT to send on a grant, SHOULD call cm_notify(stream_info, 0)
• CM congestion controller MUST be robust to broken or crashed apps that forget to do this
Querying
• cm_query(sid, rate, srtt, rttdev) fills values
– Note: CM may not maintain rttdev, so consider removing this?
• Invalid or non-existent estimate signaled by negative value
Sharing granularity
• cm_getmacroflow(sid) returns mflow identifier
• cm_setmacroflow(mflow_id, sid) sets macroflow for a stream
– If mflow_id is -1, new macroflow created
• Iteration over flows allows grouping
– Each call overrides previous mflow association
• This API sets grouping, not sharing policy
– Such policy is scheduler-dependent
– Examples include proxy destinations, client prioritization, etc.
Example applications
• TCP/CM
– Like RFC 2140, TCP-INT, TCP sessions
• Congestion-controlled UDP
• Real-time streaming applications
– Synchronous API, esp. for audio
• HTTP server
– Uses TCP/CM for concurrent connections
– cm_query() to pick content formats
Linux implementation
[Figure: Linux implementation — in-kernel CM (congestion controller, scheduler, macroflows) exposing a kernel API used by TCP and UDP-CC, with cm_notify() hooked into ip_output(); a user-level library, libcm.a, implements the API for application streams over system calls (e.g., ioctl), with a control socket delivering cmapp_*() callbacks and carrying stream requests and updates]
Server performance
[Graph: CPU seconds to send 200K packets vs. packet size (0–1600 bytes); curves for cmapp_send(), buffered UDP-CC, and TCP and TCP/CM each with and without delayed ACKs]
Security issues
• Incorrect reports of losses or congestion; absence of reports when there’s congestion
• Malicious application can wreck other flows in macroflow
• These are all examples of “NOT-well-behaved applications”
• RFC 2140 has a list
– Will be incorporated in next revision
– Also, draft-ietf-ipsec-ecn-02.txt has relevant stuff
Issues for discussion
• Prioritization to override cwnd limitation
• cm_request(num_packets)
– Request multiple transmissions in a single call
• Reporting variances
– Should all CM-to-app reports include a variance?
• Reporting congestion state
– Should we try to define “persistent” congestion?
• Sharing policy interface
– Scheduler-dependent (many possibilities)
Overriding cwnd limitations
• Prioritization
– Suppose a TCP loses a packet due to congestion
– Sender calls cm_update()
– This causes CM to cut window
– Now, outstanding exceeds cwnd
– What happens to the retransmission?
• Solution(?)
– Add a priority parameter to cm_request()
– At most one high-priority packet per RTT?
A more complex cm_request()?
• Issue raised by Joe Touch
– cm_request(num_packets)
• Potential advantage: higher performance due to fewer protection-boundary crossings
• Disadvantage: makes internals complicated
• Observe that:
– Particular implementations MAY batch together libcm-to-kernel calls, preserving simple app API
– Benefits may be small (see graph)
Reporting variances
• Some CM calls do not include variances, e.g., no rate-variance reported
• There are many ways to calculate variances
– These are perhaps better done by each application (e.g., by a TCP)
• The CM does not need to maintain variances to do congestion control
• In fact, our implementation of CM doesn’t even maintain rttdev...
Semantics of congestion reports
• CM_PERSISTENT
– Persistent congestion (e.g., TCP timeouts)
– Causes CM to go back into slow start
• CM_TRANSIENT: Transient congestion, e.g., three duplicate ACKs
• CM_ECN: ECN echoed from receiver
• Should we more precisely define when CM_PERSISTENT should be reported?
– E.g., no feedback for an entire RTT (“window”)
Sharing policy
• Sender talking to a proxy receiver
– See, e.g., MUL-TCP
• Client prioritization & differentiation
• These are scheduler issues
– Particular schedulers may provide interfaces for these and more
– The scheduler interface specified here is intentionally simple and minimalist
• Vern will talk more about the scheduler
Future Evolution
• Support for non-well-behaved applications
– Likely use of separate headers
• Policy interfaces for sharing
• Handling QoS-enabled paths
– E.g., delay- and loss-based divisions
• Aging of congestion information for idle periods
• Expanded sharing of congestion information
– Within cluster and across macroflows