Network Redundancy Elimination

Network Redundancy Elimination, JUNXIAO SHI, 2013-11-05. Neil T. Spring and David Wetherall. 2000. A protocol-independent technique for eliminating redundant network traffic. SIGCOMM Comput. Commun. Rev. 30, 4 (August 2000), 87-95. DOI=10.1145/347057.347408 http://doi.acm.org/10.1145/347057.347408. Slides ©yoursunny.com 2013, CreativeCommons BY-NC 3.0

Uploaded by shi-junxiao; posted on 19-Dec-2014


Page 1: Network Redundancy Elimination

Network Redundancy Elimination
JUNXIAO SHI, 2013-11-05

Neil T. Spring and David Wetherall. 2000. A protocol-independent technique for eliminating redundant network traffic. SIGCOMM Comput. Commun. Rev. 30, 4 (August 2000), 87-95. DOI=10.1145/347057.347408 http://doi.acm.org/10.1145/347057.347408

slides ©yoursunny.com 2013, CreativeCommons BY-NC 3.0


Problem


Back in 2000, home Internet access was slow

Modem data rate: 33.6 kbps or 56 kbps

Round-trip latency: > 100 ms

2 minutes to load a web page


Today, the Internet isn’t always fast

Satellite link (e.g. Iridium)
◦ high latency
◦ 2.4 KB/s
◦ $1.35 per minute

2G cellular data (e.g. H2O Wireless)
◦ high latency
◦ low bandwidth
◦ $0.30 per MB


Web content is redundant

Screenshots of http://quotes.wsj.com/index/CN/SHCOMP taken during one trading day: the quote changes, but the rest of the page stays the same.


Web content is often not cached

Web authors don’t want you to cache their content, because:
◦ Content is dynamic. A stock price may change at any time, and news articles are posted throughout the day.
◦ Content is personalized. Your Facebook homepage is different from anyone else’s.
◦ Access counts must be accurate. Advertising revenue is calculated per thousand impressions.

Response headers of http://www.dailyfinance.com/


To the naïve user -


Design


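Architecture

[Diagram: a sender cache and a receiver cache at the two ends of a bandwidth-constrained channel]
◦ the sender converts repeated strings into tokens
◦ the receiver reconstructs the original packet
◦ the contents of both caches must be consistent
◦ works at the network layer, protocol-independent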


The Cache
The cache holds the most recent packets:
◦ admission policy: admit all
◦ replacement policy: FIFO

It is indexed by the representative fingerprints of the packets it holds:
◦ each fingerprint maps to the most recent packet in which it appears
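The cache and its index can be sketched in Python as follows. This is a minimal sketch under assumptions that are not on the slides: capacity is counted in payload bytes, the class and method names are hypothetical, and each cached packet keeps its list of (offset, fingerprint) pairs so that eviction can remove index entries that still point at it.

```python
from collections import OrderedDict


class PacketCache:
    """Hypothetical FIFO packet cache indexed by representative fingerprints.
    Capacity is measured in payload bytes (an assumption)."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.size = 0
        self.next_id = 0
        self.packets = OrderedDict()  # packet id -> (payload, fingerprints), oldest first
        self.index = {}               # fingerprint -> (packet id, offset), most recent occurrence

    def add(self, payload: bytes, fingerprints) -> int:
        """Admit every packet; fingerprints is a list of (offset, fp) pairs."""
        pid = self.next_id
        self.next_id += 1
        self.packets[pid] = (payload, fingerprints)
        self.size += len(payload)
        for offset, fp in fingerprints:
            self.index[fp] = (pid, offset)  # a newer packet overrides an older entry
        # FIFO replacement; the packet just added is always kept
        while self.size > self.capacity and len(self.packets) > 1:
            self._evict_oldest()
        return pid

    def _evict_oldest(self) -> None:
        pid, (payload, fingerprints) = self.packets.popitem(last=False)
        self.size -= len(payload)
        for _, fp in fingerprints:
            # drop only entries that still point at the evicted packet
            if self.index.get(fp, (None,))[0] == pid:
                del self.index[fp]

    def lookup(self, fp: int):
        """Return (payload, offset) of the most recent packet containing fp, or None."""
        entry = self.index.get(fp)
        if entry is None:
            return None
        pid, offset = entry
        return self.packets[pid][0], offset
```

Because every packet is admitted and the oldest is evicted first, any index entry left in place refers to a packet that is still cached.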


Representative fingerprints
1. Calculate rolling Rabin fingerprints over every sequence of β bytes, mod M.
2. Select the fingerprints ending with γ zero bits as representative fingerprints.

Rabin fingerprints are not cryptographically secure, so the algorithm must not assume they are collision-free.

Rabin fingerprints are used here for finding similar documents, not for chunking.

window size: β; select one in 2^γ fingerprints; fingerprint space: M
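The two selection steps can be sketched with a rolling polynomial hash of the form RF = (t1·p^(β−1) + t2·p^(β−2) + … + tβ) mod M, a common Rabin-Karp-style formulation. This is a sketch, not the paper's code: M = 2^60 follows the Parameters slide, while the multiplier p = 1000003 and the function name are hypothetical choices.

```python
M = 2 ** 60        # fingerprint space M (M = 2^60 on the Parameters slide)
P = 1_000_003      # multiplier p: a hypothetical choice


def representative_fingerprints(data: bytes, beta: int = 64, gamma: int = 5):
    """Return (offset, fingerprint) for each β-byte window whose fingerprint
    ends in γ zero bits, updating the fingerprint in O(1) per byte."""
    if len(data) < beta:
        return []
    mask = (1 << gamma) - 1
    top = pow(P, beta - 1, M)         # p^(β-1) mod M: weight of the outgoing byte
    fp = 0
    for b in data[:beta]:             # fingerprint of the first window
        fp = (fp * P + b) % M
    selected = []
    for start in range(len(data) - beta + 1):
        if start > 0:
            # slide by one byte: drop data[start-1], append data[start+beta-1]
            fp = ((fp - data[start - 1] * top) * P + data[start + beta - 1]) % M
        if (fp & mask) == 0:          # low γ bits are zero: about one window in 2^γ
            selected.append((start, fp))
    return selected
```

The fingerprint depends only on the β bytes in the window, so the same bytes yield the same fingerprint at any offset in any packet. This is what lets a repeated region be found even when it has shifted.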


Sender process
1. Generate representative fingerprints.
2. Look up the fingerprints in the cache index.
3. Verify there is no collision.
4. Expand each match to the left and to the right, byte by byte.
5. Convert matched regions into tokens.
6. Add the packet to the cache, evicting the oldest packet if necessary.
7. Send the encoded, smaller packet.

Token format:
• the fingerprint
• # bytes expanded to the left
• # bytes expanded to the right
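The sender steps can be sketched as one Python function. None of the following assumptions come from the slides. A plain dict (fingerprint → (packet, offset)) stands in for the cache and never evicts. Fingerprints are recomputed directly for each window instead of rolling. The multiplier P is hypothetical. The encoded packet is a Python list of literal byte strings and (fingerprint, left, right) tuples, where right counts bytes matched beyond the β-byte window. A real encoder would also serialize this list.

```python
from typing import Dict, List, Tuple, Union

M = 2 ** 60        # fingerprint space
P = 1_000_003      # hypothetical multiplier p

Token = Tuple[int, int, int]          # (fingerprint, bytes expanded left, bytes expanded right)
Index = Dict[int, Tuple[bytes, int]]  # fingerprint -> (cached packet, window offset)


def fingerprints(data: bytes, beta: int, gamma: int) -> List[Tuple[int, int]]:
    """(offset, fp) of each β-byte window whose fp ends in γ zero bits.
    Computed directly per window (not rolling) to keep the sketch short."""
    selected = []
    for i in range(len(data) - beta + 1):
        fp = 0
        for b in data[i:i + beta]:
            fp = (fp * P + b) % M
        if fp % (1 << gamma) == 0:
            selected.append((i, fp))
    return selected


def encode(pkt: bytes, index: Index, beta: int = 64, gamma: int = 5) -> List[Union[bytes, Token]]:
    """Replace regions already in the cache with tokens, then cache pkt."""
    fps = fingerprints(pkt, beta, gamma)
    out: List[Union[bytes, Token]] = []
    pos = 0                                   # first byte of pkt not yet emitted
    for off, fp in fps:
        if off < pos or fp not in index:
            continue                          # already covered, or no candidate match
        cached, coff = index[fp]
        if cached[coff:coff + beta] != pkt[off:off + beta]:
            continue                          # fingerprint collision: bytes differ
        left = 0                              # expand left, never into emitted bytes
        while (off - left > pos and coff - left > 0
               and pkt[off - left - 1] == cached[coff - left - 1]):
            left += 1
        right = 0                             # expand right beyond the β-byte window
        while (off + beta + right < len(pkt) and coff + beta + right < len(cached)
               and pkt[off + beta + right] == cached[coff + beta + right]):
            right += 1
        if off - left > pos:
            out.append(pkt[pos:off - left])   # literal bytes before the match
        out.append((fp, left, right))
        pos = off + beta + right
    if pos < len(pkt):
        out.append(pkt[pos:])                 # trailing literal bytes
    for off, fp in fps:                       # admit pkt; the newest occurrence wins
        index[fp] = (pkt, off)
    return out
```

With β = 16 and γ = 0 (every window selected, which keeps the example deterministic), encoding `p1 = bytes(range(200))` against an empty index returns `[p1]`. Encoding `b"HEADER" + p1[10:150] + b"TAIL"` afterwards yields the literal `b"HEADER"`, one token covering the 140 repeated bytes, and the literal `b"TAIL"`.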


Receiver process
1. Look up the tokens in the cache index.
2. Reconstruct the original packet.
3. Generate representative fingerprints.
4. Add the packet to the cache, evicting the oldest packet if necessary.
5. Deliver the original packet.
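The receiver steps can be sketched the same way. The sketch assumes the sender emits a list of literal byte strings and (fingerprint, left, right) tuples, with right counting bytes beyond the β-byte window. A plain dict stands in for the FIFO cache. The multiplier P is hypothetical and must equal the sender's. An unknown fingerprint raises KeyError, which is the cache-inconsistency case.

```python
from typing import Dict, List, Tuple, Union

M = 2 ** 60        # fingerprint space
P = 1_000_003      # hypothetical multiplier p; must match the sender's

Token = Tuple[int, int, int]          # (fingerprint, bytes expanded left, bytes expanded right)
Index = Dict[int, Tuple[bytes, int]]  # fingerprint -> (cached packet, window offset)


def fingerprints(data: bytes, beta: int, gamma: int) -> List[Tuple[int, int]]:
    """Same selection rule as the sender (computed directly, not rolling)."""
    selected = []
    for i in range(len(data) - beta + 1):
        fp = 0
        for b in data[i:i + beta]:
            fp = (fp * P + b) % M
        if fp % (1 << gamma) == 0:
            selected.append((i, fp))
    return selected


def decode(encoded: List[Union[bytes, Token]], index: Index,
           beta: int = 64, gamma: int = 5) -> bytes:
    """Rebuild the original packet from literals and tokens, then cache it."""
    parts = []
    for item in encoded:
        if isinstance(item, bytes):
            parts.append(item)                      # literal bytes pass through
            continue
        fp, left, right = item
        if fp not in index:                         # unrecognized fingerprint
            raise KeyError(f"unknown fingerprint {fp:#x}; caches are inconsistent")
        cached, coff = index[fp]
        parts.append(cached[coff - left:coff + beta + right])
    pkt = b"".join(parts)
    for off, fp in fingerprints(pkt, beta, gamma):  # same admission as the sender
        index[fp] = (pkt, off)
    return pkt
```

Because the receiver fingerprints and caches the reconstructed packet with the same rule as the sender, both indexes stay identical as long as no packet is lost.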


Cache consistency
The contents of the sender cache and the receiver cache must be consistent.

Why might the caches be inconsistent?
◦ The network channel isn’t reliable. A packet that entered the sender cache but was lost on the channel will not be present in the receiver cache.

How can cache inconsistency be detected?
◦ Fingerprints! If there are no collisions, receiving an unrecognized fingerprint indicates that the caches are inconsistent.

What happens if the caches are inconsistent?
◦ The receiver cannot reconstruct the original packet.
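The detection rule can be illustrated with a toy simulation in which each side remembers only the representative fingerprints of its most recent packets. The class name, the fingerprint values (0xA1 and so on), and the capacity of 8 packets are all hypothetical. No collisions are assumed, as on the slide.

```python
from collections import deque


class RecentFingerprints:
    """Hypothetical model of one side's cache: the representative fingerprints
    of its n most recent packets; deque(maxlen=n) drops the oldest (FIFO)."""

    def __init__(self, n: int):
        self.packets = deque(maxlen=n)

    def add(self, fps) -> None:
        self.packets.append(frozenset(fps))

    def knows(self, fp: int) -> bool:
        return any(fp in pkt for pkt in self.packets)


def receive(receiver: RecentFingerprints, token_fps, packet_fps) -> bool:
    """Return False when a token names a fingerprint the receiver never cached:
    assuming no collisions, the caches are inconsistent."""
    if not all(receiver.knows(fp) for fp in token_fps):
        return False                  # the original packet cannot be reconstructed
    receiver.add(packet_fps)          # the reconstructed packet enters the cache
    return True


sender, receiver = RecentFingerprints(8), RecentFingerprints(8)
sender.add({0xA1, 0xA2})                        # packet A enters the sender cache...
print(receive(receiver, [], {0xA1, 0xA2}))      # ...and arrives: True
sender.add({0xB1})                              # packet B is cached, then lost on the channel
sender.add({0xA1, 0xB1, 0xC1})                  # packet C repeats strings from A and B
print(receive(receiver, [0xA1, 0xB1], {0xA1, 0xB1, 0xC1}))  # False: 0xB1 is unknown
```

In this run the sender has also cached packet C, which the receiver never admits, so a single loss can leave the two caches diverging further.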


Implementation


Trace analyzer
The algorithm is implemented as a user-level process that analyzes a trace.


Parameters
Fingerprint space: M = 2^60
◦ collisions are almost impossible

Penalty for each matching region: 12 octets
◦ represents the space needed for the token

Window size β and fingerprint selection frequency 2^γ
◦ large β: better “quality” of matches, fewer potential bytes saved
◦ small β: worse “quality” of matches (shorter matches in more recent packets)
◦ small γ: more likely to find a match, but a larger index (= less memory for cached packets)
◦ large γ: less likely to find a match, less memory usage
◦ chosen values: γ = 5, β = 64
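The 12-octet penalty equals the size of one plausible token layout: a 64-bit fingerprint field (enough for any value below M = 2^60) plus two 16-bit expansion counts. This layout is an assumption, since the slides give only the total. The sketch also estimates how many windows a packet contributes to the index. It assumes fingerprints behave like uniform random values and uses an illustrative 1500-byte packet (a typical Ethernet MTU).

```python
import struct

# Hypothetical token layout: 8-byte fingerprint, then 16-bit left and right
# expansion counts, network byte order, no padding.
TOKEN = struct.Struct("!QHH")


def pack_token(fp: int, left: int, right: int) -> bytes:
    return TOKEN.pack(fp, left, right)


def net_saving(match_len: int) -> int:
    """Bytes saved by replacing a matched region with one token."""
    return match_len - TOKEN.size


def expected_representatives(packet_len: int, beta: int = 64, gamma: int = 5) -> float:
    """Average number of selected windows: one in 2^γ of the β-byte windows."""
    windows = max(packet_len - beta + 1, 0)
    return windows / 2 ** gamma


print(TOKEN.size)                      # 12
print(net_saving(64))                  # 52
print(expected_representatives(1500))  # 44.90625
```

Every match covers at least the β-byte window, so with β = 64 even the shortest match saves 64 − 12 = 52 bytes. With γ = 5, a full-size packet adds only about 45 entries to the index.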


Performance
45 Mbps on a PC with a Pentium Ⅲ-550 and 1 GB of memory

This work is designed for slow links.


Follow-up work
Later work building on this technique:
◦ universal redundancy elimination
◦ SmartRE: coordinated network-wide redundancy elimination
◦ EndRE: end-system redundancy elimination


Traffic Analysis
How much redundancy is there?


Amount of redundancy
◦ Internet => corporate: 30% redundant
◦ corporate => Internet: 50% redundant
◦ with just 1 MB of memory for cache + index: at least 10% redundant


Redundancy by protocol
[Figure: traffic amount (%) and redundant traffic for HTTP, RTSP, Napster, Lotus, HTTPS, FTP-data, NNTP, DNS, ASF, AOL, SMTP, POP, Telnet, and Other]

HTTP, Telnet, POP, and ASF have a high percentage of repeated strings.

HTTPS, FTP-data, Napster, RTSP, and NNTP have a low percentage of repeated strings.

The redundancy elimination algorithm is protocol-independent, so it can save bytes on non-Web traffic too.


Comparison with HTTP caching

[Figure: traffic (%) with Squid, gzip, Squid+gzip, RE, and Squid+RE]

Redundancy elimination reduces traffic more than HTTP caching (Squid) and compression (gzip).
