Network Redundancy Elimination
Junxiao Shi, 2013-11-05
Neil T. Spring and David Wetherall. 2000. A protocol-independent technique for eliminating redundant network traffic. SIGCOMM Comput. Commun. Rev. 30, 4 (August 2000), 87-95. DOI=10.1145/347057.347408 http://doi.acm.org/10.1145/347057.347408
slides ©yoursunny.com 2013, CreativeCommons BY-NC 3.0
Problem
Back in 2000, home Internet access was slow
modem data rate: 33.6 Kbps or 56 Kbps
round-trip latency: >100 ms
2 minutes to load a webpage
Today, the Internet isn’t always fast
Satellite link (e.g. Iridium)
◦ high latency
◦ 2.4KB/s
◦ $1.35 per minute
2G cellular data (e.g. H2O Wireless)
◦ high latency
◦ low bandwidth
◦ $0.30 per MB
Web content is redundant
(screenshots of http://quotes.wsj.com/index/CN/SHCOMP during a trading day: the quote changes, but the rest of the page remains the same)
Web content is often not cached
Web authors don’t want you to cache their content, because:
◦ Content is dynamic. A stock price may change at any time; news articles are posted throughout the day.
◦ Content is personalized. Your Facebook homepage is different from anyone else’s.
◦ Access counts must be accurate. Advertising revenue is calculated per thousand impressions.
(figure: response headers of http://www.dailyfinance.com/)
To the naïve user -
Design
Architecture
(diagram: sender cache → bandwidth-constrained channel → receiver cache. The sender converts repeated strings into tokens; the receiver reconstructs the original packet. Contents of both caches must be consistent. Works at the network layer, protocol-independent.)
The Cache
Cache: holds the most recent packets
◦ admission policy: admit all
◦ replacement policy: FIFO
Indexed by representative fingerprints of the packets it holds
◦ maps each fingerprint to the most recent packet in which it appears
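A minimal sketch of such a cache follows. It assumes packets are Python `bytes`, that capacity is counted in bytes, and that the caller supplies each packet's representative fingerprints as (offset, fingerprint) pairs. The class and method names are hypothetical, and removing stale index entries lazily on lookup is one possible design choice, not necessarily the paper's.

```python
from collections import OrderedDict


class PacketCache:
    """FIFO packet cache indexed by representative fingerprints (illustrative sketch)."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.size = 0
        self.next_id = 0
        self.packets = OrderedDict()   # packet id -> bytes, oldest first
        self.index = {}                # fingerprint -> (packet id, window offset)

    def add(self, packet: bytes, fingerprints) -> int:
        """Admit every packet; `fingerprints` is an iterable of (offset, fp) pairs."""
        pid = self.next_id
        self.next_id += 1
        self.packets[pid] = packet
        self.size += len(packet)
        # Later packets overwrite earlier entries, so each fingerprint maps
        # to the most recent packet in which it appears.
        for offset, fp in fingerprints:
            self.index[fp] = (pid, offset)
        # FIFO replacement: evict the oldest packets until the cache fits,
        # always keeping the packet just added.
        while self.size > self.capacity and len(self.packets) > 1:
            _, old_packet = self.packets.popitem(last=False)
            self.size -= len(old_packet)
        return pid

    def lookup(self, fp):
        """Return (packet, offset) for a fingerprint, or None if absent or evicted."""
        entry = self.index.get(fp)
        if entry is None:
            return None
        pid, offset = entry
        packet = self.packets.get(pid)
        if packet is None:
            del self.index[fp]   # stale entry pointing at an evicted packet
            return None
        return packet, offset
```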
Representative fingerprints
1. Calculate rolling Rabin fingerprints over every sequence of β bytes, mod M.
2. Select the fingerprints whose last γ bits are zero as representative fingerprints.
Rabin fingerprints are not cryptographically secure, so the algorithm must not assume they are collision-free.
Rabin fingerprints are used for finding similar documents, not for chunking.
(diagram labels: window size β; select one in 2^γ fingerprints; fingerprint space M)
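The two steps above can be sketched as follows. This swaps in a Rabin-Karp style polynomial rolling hash modulo a prime as a stand-in for true Rabin fingerprints (which use polynomial arithmetic over GF(2)); `BASE` and `MODULUS` are hypothetical choices, not the paper's. Each slide of the window costs O(1): subtract the outgoing byte, shift, add the incoming byte. Representative fingerprints are those whose low-order γ bits are zero, so roughly one in 2^γ is kept.

```python
BASE = 257                # hypothetical polynomial base
MODULUS = (1 << 61) - 1   # hypothetical prime modulus standing in for the fingerprint space M


def rolling_fingerprints(data: bytes, beta: int):
    """Yield (start, fp) for every window data[start:start + beta]."""
    if len(data) < beta:
        return
    top = pow(BASE, beta - 1, MODULUS)   # weight of the byte leaving the window
    fp = 0
    for b in data[:beta]:
        fp = (fp * BASE + b) % MODULUS
    yield 0, fp
    for start in range(1, len(data) - beta + 1):
        outgoing = data[start - 1]
        incoming = data[start + beta - 1]
        # Remove the outgoing byte, shift, then append the incoming byte.
        fp = ((fp - outgoing * top) * BASE + incoming) % MODULUS
        yield start, fp


def representative_fingerprints(data: bytes, beta: int = 64, gamma: int = 5):
    """Keep only fingerprints whose low-order gamma bits are zero (about 1 in 2**gamma)."""
    mask = (1 << gamma) - 1
    return [(start, fp) for start, fp in rolling_fingerprints(data, beta) if fp & mask == 0]
```

The default parameters follow the β = 64, γ = 5 values listed under the implementation parameters.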
Sender process
1. generate representative fingerprints
2. look up the fingerprints in the cache index
3. verify that there is no collision
4. expand each match to the left and to the right, byte by byte
5. convert the matched regions into tokens
6. add the packet to the cache, evicting the oldest packet if necessary
7. send the encoded, smaller packet
token format:
◦ the fingerprint
◦ # bytes expanded to the left
◦ # bytes expanded to the right
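A minimal sketch of these sender steps, under several assumptions: a polynomial hash stands in for Rabin fingerprints, `BETA = 16` and `GAMMA = 2` are hypothetical small values so short examples find matches (not the β = 64, γ = 5 used in the paper), the cache holds a fixed number of packets, and the encoded packet is a Python list of literal `bytes` chunks and `Token` tuples rather than a real wire format. `Sender`, `Token`, and `rep_fps` are hypothetical names.

```python
from typing import NamedTuple

BETA, GAMMA = 16, 2              # hypothetical small values so short demos find matches
BASE, MOD = 257, (1 << 61) - 1   # hypothetical polynomial hash standing in for Rabin fingerprints


class Token(NamedTuple):
    fp: int     # representative fingerprint of the matched window
    left: int   # bytes expanded to the left of the window
    right: int  # bytes expanded to the right of the window


def rep_fps(data: bytes) -> dict:
    """Map window start -> fingerprint for windows whose last GAMMA bits are zero.
    Hashes every window from scratch (O(n * BETA)) for brevity instead of rolling."""
    out = {}
    for s in range(len(data) - BETA + 1):
        h = 0
        for b in data[s:s + BETA]:
            h = (h * BASE + b) % MOD
        if h & ((1 << GAMMA) - 1) == 0:
            out[s] = h
    return out


class Sender:
    def __init__(self, max_packets: int = 8):
        self.max_packets = max_packets
        self.packets = {}   # packet id -> bytes; dict insertion order gives FIFO order
        self.index = {}     # fingerprint -> (packet id, window start)
        self.next_id = 0

    def encode(self, pkt: bytes) -> list:
        """Return a list of literal bytes chunks and Tokens, then cache pkt."""
        out, lit, i = [], 0, 0
        fps = rep_fps(pkt)
        while i <= len(pkt) - BETA:
            hit = self.index.get(fps[i]) if i in fps else None
            old = self.packets.get(hit[0]) if hit else None  # None if evicted
            j = hit[1] if hit else 0
            # Verify the bytes really match: fingerprints may collide.
            if old is not None and old[j:j + BETA] == pkt[i:i + BETA]:
                left = 0  # expand left, but not into bytes already encoded
                while left < i - lit and left < j and old[j - left - 1] == pkt[i - left - 1]:
                    left += 1
                right = 0  # expand right until either packet ends or the bytes differ
                while (i + BETA + right < len(pkt) and j + BETA + right < len(old)
                       and old[j + BETA + right] == pkt[i + BETA + right]):
                    right += 1
                if i - left > lit:
                    out.append(pkt[lit:i - left])  # literal bytes before the match
                out.append(Token(fps[i], left, right))
                i = lit = i + BETA + right
            else:
                i += 1
        if lit < len(pkt):
            out.append(pkt[lit:])
        self._add(pkt, fps)  # cache only after encoding, as in the steps above
        return out

    def _add(self, pkt: bytes, fps: dict) -> None:
        pid = self.next_id
        self.next_id += 1
        self.packets[pid] = pkt
        for start, fp in fps.items():
            self.index[fp] = (pid, start)  # the most recent packet wins
        while len(self.packets) > self.max_packets:
            del self.packets[next(iter(self.packets))]  # evict the oldest packet
```

Since every match covers at least `BETA` bytes, each token here replaces more bytes than the 12-octet token penalty assumed in the trace analysis.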
Receiver process
1. look up tokens in the cache index
2. reconstruct the original packet
3. generate representative fingerprints
4. add the packet to the cache, evicting the oldest packet if necessary
5. deliver the original packet
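A minimal sketch of these receiver steps, assuming the encoded packet arrives as a list of literal `bytes` chunks and `Token(fp, left, right)` tuples, and that the receiver uses exactly the same hash and parameters as the sender (a polynomial hash standing in for Rabin fingerprints; `BETA = 16` and `GAMMA = 2` are hypothetical small values). `Receiver`, `Token`, `rep_fps`, and `CacheInconsistency` are hypothetical names. An unknown fingerprint raises an exception, which is how cache inconsistency would surface here.

```python
from typing import NamedTuple

BETA, GAMMA = 16, 2              # hypothetical; must match the sender's parameters
BASE, MOD = 257, (1 << 61) - 1   # hypothetical polynomial hash standing in for Rabin fingerprints


class Token(NamedTuple):
    fp: int     # representative fingerprint of the matched window
    left: int   # bytes expanded to the left of the window
    right: int  # bytes expanded to the right of the window


class CacheInconsistency(Exception):
    """Raised when a token names a fingerprint the receiver does not have."""


def rep_fps(data: bytes) -> dict:
    """Map window start -> fingerprint for windows whose last GAMMA bits are zero."""
    out = {}
    for s in range(len(data) - BETA + 1):
        h = 0
        for b in data[s:s + BETA]:
            h = (h * BASE + b) % MOD
        if h & ((1 << GAMMA) - 1) == 0:
            out[s] = h
    return out


class Receiver:
    def __init__(self, max_packets: int = 8):
        self.max_packets = max_packets
        self.packets = {}   # packet id -> bytes; dict insertion order gives FIFO order
        self.index = {}     # fingerprint -> (packet id, window start)
        self.next_id = 0

    def decode(self, encoded: list) -> bytes:
        parts = []
        for part in encoded:
            if isinstance(part, Token):
                hit = self.index.get(part.fp)
                old = self.packets.get(hit[0]) if hit else None
                if old is None:
                    # Unknown or evicted fingerprint: the two caches have diverged,
                    # e.g. a packet cached by the sender was lost on the channel.
                    raise CacheInconsistency(f"unknown fingerprint {part.fp:#x}")
                j = hit[1]
                parts.append(old[j - part.left:j + BETA + part.right])
            else:
                parts.append(part)
        pkt = b"".join(parts)
        # Fingerprint and cache the packet exactly as the sender did, keeping caches in sync.
        pid = self.next_id
        self.next_id += 1
        self.packets[pid] = pkt
        for start, fp in rep_fps(pkt).items():
            self.index[fp] = (pid, start)
        while len(self.packets) > self.max_packets:
            del self.packets[next(iter(self.packets))]  # evict the oldest packet
        return pkt
```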
Cache consistency
Contents of the sender cache and the receiver cache must be consistent.
Why might the caches become inconsistent?
◦ The network channel isn’t reliable. A packet that entered the sender cache but was lost on the channel will not be present in the receiver cache.
How can cache inconsistency be detected?
◦ Fingerprints! If there are no collisions, receiving an unrecognized fingerprint indicates that the caches are inconsistent.
What happens if the caches are inconsistent?
◦ The receiver cannot reconstruct the original packet.
Implementation
Trace analyzer
The algorithm is implemented as a user-level process that analyzes a packet trace.
Parameters
Fingerprint space: M = 2^60
◦ collisions are extremely unlikely
Penalty for each matching region: 12 octets
◦ represents the space needed for the token
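The penalty turns into simple arithmetic: each matching region saves its length minus the 12-octet token. A small sketch (the function name is hypothetical):

```python
from typing import Iterable

TOKEN_PENALTY = 12  # octets charged per matching region, as in the trace analysis


def net_savings(match_lengths: Iterable[int], penalty: int = TOKEN_PENALTY) -> int:
    """Bytes saved after charging the token penalty for every matched region."""
    return sum(length - penalty for length in match_lengths)
```

For example, a packet with one 64-byte match and one 200-byte match saves (64 − 12) + (200 − 12) = 240 bytes, so a hypothetical 1500-byte packet would shrink to 1260 bytes.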
Window size β and fingerprint selection rate (one in 2^γ)
◦ large β: better “quality” of matches, but less potential byte savings
◦ small β: worse “quality” of matches (shorter matches in more recent packets)
◦ small γ: more likely to find a match, but a larger index (= less memory for cached packets)
◦ large γ: less likely to find a match, less memory used by the index
◦ values used: γ = 5, β = 64
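The γ trade-off can be estimated with back-of-the-envelope arithmetic. Assuming fingerprints are roughly uniform, each of the L − β + 1 windows in an L-byte packet is selected with probability 1/2^γ, so the expected number of index entries per packet is (L − β + 1) / 2^γ. The 1500-byte packet size below is an assumed example, not a figure from the paper.

```python
def expected_index_entries(packet_len: int, beta: int = 64, gamma: int = 5) -> float:
    """Expected representative fingerprints per packet, assuming uniform fingerprints."""
    windows = max(packet_len - beta + 1, 0)   # number of beta-byte windows
    return windows / 2 ** gamma
```

With β = 64 and γ = 5, a 1500-byte packet yields 1437 / 32 ≈ 44.9 index entries; lowering γ to 3 raises that to about 179.6, which is the memory cost of finding matches more often.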
Performance
45 Mbps on a PC with a Pentium III-550 and 1 GB of memory
This work is designed for slow links.
Follow-up work
Later work building on this technique:
◦ universal redundancy elimination
◦ SmartRE: coordinated network-wide redundancy elimination
◦ EndRE: end-system redundancy elimination
Traffic Analysis
How much redundancy is there?
Amount of redundancy
Internet => corporate: 30% redundant
corporate => Internet: 50% redundant
with just 1 MB of memory for cache + index: at least 10% redundant
Redundancy by protocol
(chart: traffic amount (%) and redundant traffic, by protocol: HTTP, RTSP, Napster, Lotus, HTTPS, FTP-data, NNTP, DNS, ASF, AOL, SMTP, POP, Telnet, Other)
HTTP, Telnet, POP, and ASF have a high percentage of repeated strings.
HTTPS, FTP-data, Napster, RTSP, and NNTP have a low percentage of repeated strings.
Because the redundancy elimination algorithm is protocol-independent, it can save bytes on non-Web traffic too.
Comparison with HTTP caching
(chart: traffic (%) with Squid, gzip, Squid+gzip, RE, and Squid+RE)
Redundancy elimination works better than HTTP caching and compression.