Shift-based Pattern Matching for Compressed Web Traffic

Authors: Anat Bremler-Barr, Yaron Koral, Victor Zigdon

Publisher: IEEE HPSR, 2011

Presenter: Kai-Yang, Liu

Date: 2011/11/2

INTRODUCTION

• Two-thirds of the top 1000 most popular sites, such as Yahoo!, Google, MSN, YouTube and Facebook, use HTTP compression to speed up their content downloads.

The GZIP Algorithm

• LZ77 compression: a series of bytes (characters) can be compressed if the same series has already appeared earlier in the data. The algorithm replaces each repeated string with a (distance, length) pair.

For example, the text ‘abcdefgabcde’ can be compressed to ‘abcdefg(7,5)’. LZ77 refers to such a pair as a “pointer” and to uncompressed bytes as “literals”.

• Huffman coding: reduces the symbol coding size by encoding frequent symbols with fewer bits.
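The LZ77 pointer example above can be reproduced with a short decoder. This is a minimal sketch, not GZIP's actual bit-level format: the token list (bytes literals mixed with (distance, length) tuples) and the name lz77_decode are assumptions made for illustration only.

```python
def lz77_decode(tokens):
    """Expand illustrative LZ77 tokens into bytes.

    Each token is either a bytes literal or a (distance, length) pointer.
    This token layout is a hypothetical simplification, not the GZIP format.
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            distance, length = tok
            start = len(out) - distance
            if distance <= 0 or start < 0:
                raise ValueError("pointer reaches before the start of the data")
            # Copy byte by byte so that overlapping pointers
            # (length > distance) also expand correctly.
            for i in range(length):
                out.append(out[start + i])
        else:
            out.extend(tok)
    return bytes(out)


# The example from the slide: 'abcdefg(7,5)' expands to 'abcdefgabcde'.
print(lz77_decode([b"abcdefg", (7, 5)]))  # b'abcdefgabcde'
```

The pointer (7, 5) means "go back 7 bytes and copy 5", which is why the decoder only needs the bytes it has already produced.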

INTRODUCTION

• Recent work (the ACCH algorithm) presents a technique for pattern matching on compressed traffic that decompresses the traffic and then uses data from the decompression phase to accelerate the matching process.

• We present the Shift-based Pattern matching for Compressed traffic algorithm (SPC), which accelerates the Modified Wu-Manber (MWM) algorithm on compressed traffic.

THE MODIFIED WU-MANBER ALGORITHM

• MWM trims all patterns to their m-byte prefix, where m is the size of the shortest pattern.

• MWM uses a predefined group of B bytes to determine the shift value.

• MWM starts by precomputing two tables: a skip shift table, called ShiftTable, and a patterns hash table, called Ptrns.

• The scan is performed using a virtual scan window of size m. The shift value is determined by indexing the ShiftTable with the B-byte suffix of the scan window.
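The table construction and scan loop described above can be sketched as follows. This is a simplified Wu-Manber-style illustration, not the authors' implementation: the function names are hypothetical, Python dictionaries stand in for ShiftTable and Ptrns, and keying Ptrns by the m-byte prefix is an assumption.

```python
def build_tables(patterns, B=2):
    """Precompute a shift table and a pattern table (illustrative sketch)."""
    m = min(len(p) for p in patterns)  # size of the shortest pattern
    if B > m:
        raise ValueError("B must not exceed the shortest pattern length")
    default_shift = m - B + 1  # used when a B-byte block occurs in no prefix
    shift_table = {}
    ptrns = {}  # assumption: keyed by the m-byte prefix
    for p in patterns:
        prefix = p[:m]
        ptrns.setdefault(prefix, []).append(p)
        for end in range(B, m + 1):
            block = prefix[end - B:end]
            shift = m - end  # distance from this block to the prefix end
            if shift < shift_table.get(block, default_shift):
                shift_table[block] = shift
    return m, default_shift, shift_table, ptrns


def mwm_scan(text, patterns, B=2):
    """Return (offset, pattern) pairs found in text."""
    m, default_shift, shift_table, ptrns = build_tables(patterns, B)
    matches = []
    end = m  # exclusive end of the scan window
    while end <= len(text):
        # Index the shift table with the B-byte suffix of the window.
        shift = shift_table.get(text[end - B:end], default_shift)
        if shift == 0:
            start = end - m
            # Verify every full pattern that shares this m-byte prefix.
            for p in ptrns.get(text[start:end], []):
                if text.startswith(p, start):
                    matches.append((start, p))
            shift = 1
        end += shift
    return matches


print(mwm_scan(b"xxabcdexxbcdxx", [b"abcde", b"bcd"]))
# [(2, b'abcde'), (3, b'bcd'), (9, b'bcd')]
```

With m = 3 and B = 2 the largest possible shift is only 2, which shows why a short minimum pattern length limits how much of the input the scan can skip.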

SHIFT-BASED PATTERN MATCHING FOR COMPRESSED TRAFFIC (SPC)

• The bytes referred to by the pointers were already scanned; hence, if we have prior knowledge that an area does not contain patterns, we can skip scanning most of it.

• Observe that even if no patterns were found when the referred area was scanned, patterns may still occur at the boundaries of the pointer.

• The general method of the algorithm is to combine two techniques: it scans the uncompressed portions of the data using MWM and skips most of the data represented by the LZ77 pointers.
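The boundary observation above can be illustrated with a small helper. This is not the SPC algorithm itself, only a sketch of which m-byte scan windows cannot be skipped for a single pointer; the name windows_to_rescan and its parameters are hypothetical, and it assumes that the referred area was scanned earlier without finding a match.

```python
def windows_to_rescan(ptr_start, ptr_len, m):
    """Start offsets of m-byte windows that straddle a pointer boundary.

    ptr_start and ptr_len describe where the pointer's bytes land in the
    decompressed data. A window that lies entirely inside the pointer is a
    copy of a window already scanned in the referred area, so it is omitted.
    """
    ptr_end = ptr_start + ptr_len  # exclusive
    # Windows that begin before the pointer and reach into it.
    left = range(max(0, ptr_start - m + 1), ptr_start)
    # Windows that begin inside the pointer and reach past its end.
    right = range(max(ptr_start, ptr_end - m + 1), ptr_end)
    return sorted(set(left) | set(right))


# 'abcdefg(7,5)': the pointer covers offsets 7..11 of 'abcdefgabcde'.
text = b"abcdefgabcde"
print(windows_to_rescan(7, 5, 3))  # [5, 6, 10, 11]

# The referred area 'abcde' does not contain 'gab', yet the decompressed
# text does, at a window that crosses the left boundary of the pointer.
print(text.find(b"gab"))  # 6
```

The sketch ignores details that a real scanner must handle, such as patterns longer than m that extend past the pointer and referred areas that did contain matches.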

EXPERIMENTAL RESULTS

• Data Set: We collected HTTP pages encoded with GZIP, taken from a list constructed from the Alexa website, which maintains web traffic metrics and top-site lists.

• Pattern Set: Our pattern sets were gathered from two different sources: ModSecurity, an open source web application firewall, and Snort, an open source network intrusion prevention system.

SPC Characteristics Analysis

• To understand the impact of B and m, we examined the skip ratio, Sr, the percentage of characters the algorithm skips.

• The Snort pattern set contains many short patterns, specifically 410 distinct patterns of length ≤ 3, 539 of length 4 and 381 of length 5. Because m is the size of the shortest pattern, these short patterns force a small m and limit the achievable shift.

• To circumvent this problem, we inspected the containing rules. Most of the short patterns can be eliminated by using a longer pattern within the same rule or by relying on specific flow parameters.

EXPERIMENTAL RESULTS (Skip Ratio)

EXPERIMENTAL RESULTS (Throughput)

EXPERIMENTAL RESULTS (Storage)
