

freescale.com

Brochure

A Quick Guide to Freescale Regular Expressions and Stateful Rules

Freescale™ and the Freescale logo are trademarks of Freescale Semiconductor, Inc. All other product or service names are the property of their respective owners. This product incorporates Power dummy legal text © Freescale Semiconductor, Inc. 2007

Document Number: BRREGEXSTATERLLG REV 0

Learn More: For current information about Freescale products and documentation, please visit www.freescale.com.

Variables

There are a number of read-write and read-only variables available to rule writers.

Read-write variables are:

Type                    Name      Range                                     Size
Session Variables       SRV[x]    x = 1..15                                 1 byte
                        SRV[x:y]  x = 1..14, y = 2..15, (y-x+1) = 1..8      (y-x+1) bytes
Session Flags           SF[x]     x = 1..16                                 1 bit
Global Variables        GV[x]     x = 1..16                                 1 byte
                        GV[x:y]   x = 1..15, y = 2..16, (y-x+1) = 1..8      (y-x+1) bytes
Temporary Flags         TF[x]     x = 1..16                                 1 bit
Temporary Variables *   TV[x]     x = 1..16                                 1 byte
                        TV[x:y]   x = 1..15, y = 2..16, (y-x+1) = 1..8      (y-x+1) bytes

* Note: Only available in rules without context.
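
The ranges in the read-write variable table can be checked mechanically. The following Python sketch is illustrative only: `variable_size` is a hypothetical helper, not part of any Freescale tool, and it assumes that flags (SF, TF) have no [x:y] form because the table lists none.

```python
import re

# Highest valid index per variable type, taken from the table above.
MAX_INDEX = {"SRV": 15, "SF": 16, "GV": 16, "TF": 16, "TV": 16}
# ASSUMPTION: flags are single bits and have no [x:y] range form.
FLAG_TYPES = {"SF", "TF"}

_REF = re.compile(r"^(SRV|SF|GV|TF|TV)\[(\d+)(?::(\d+))?\]$")


def variable_size(ref: str) -> str:
    """Return the size of a reference such as 'GV[3]' or 'SRV[2:5]'.

    Raises ValueError if the reference is outside the documented ranges.
    """
    m = _REF.match(ref)
    if m is None:
        raise ValueError(f"not a read-write variable reference: {ref!r}")
    kind, x = m.group(1), int(m.group(2))
    top = MAX_INDEX[kind]
    if m.group(3) is None:
        # Single element: x = 1..top
        if not 1 <= x <= top:
            raise ValueError(f"index out of range in {ref!r}")
        return "1 bit" if kind in FLAG_TYPES else "1 byte"
    if kind in FLAG_TYPES:
        raise ValueError(f"flags have no range form: {ref!r}")
    y = int(m.group(3))
    width = y - x + 1
    # Range form: x = 1..top-1, y = 2..top, (y-x+1) = 1..8
    if not (1 <= x <= top - 1 and 2 <= y <= top and 1 <= width <= 8):
        raise ValueError(f"range out of bounds in {ref!r}")
    return f"{width} byte" + ("" if width == 1 else "s")
```

For example, `variable_size("SRV[1:3]")` yields "3 bytes", while `variable_size("GV[1:9]")` is rejected because the range spans nine bytes.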

Read-only variables indicate information related to the specific pattern matched when the corresponding event happens. Where applicable, the first byte of either the SUI or work-unit is considered to be at position or offset 1. Read-only variables are:

Name  Size     Description
$T    4 bytes  A generic tag assigned to the pattern that invoked this reaction.
$I    1 byte   Indicates whether the match that invoked this reaction is inconclusive.
$P    4 bytes  The position within the SUI at which the trigger byte is found.
$Nl   1 byte   The number of bytes to the left of the trigger byte that are matched by the pattern.
$Nr   1 byte   The number of bytes to the right of the trigger byte that are matched by the pattern.
$N    1 byte   The number of bytes matched by the pattern.
$M    4 bytes  The position within the work-unit of the rightmost byte of the match.
$Sc   6 bytes  The number of bytes completely scanned (i.e. scanned and not held as residue) prior to the current SUI.
$Si   6 bytes  The number of bytes initially scanned (i.e. scanned and possibly held as residue) prior to the current work-unit.
$R    1 byte   The number of bytes of residue that are prepended to the current work-unit.
$Ob   4 bytes  The position within the SUI where a line break character (LF or CR) last occurred. A value of 0 indicates that a line break has not yet been detected in the current SUI.
$Ox   4 bytes  The position within the SUI where an extended character (i.e. with bit 7 set) last occurred. A value of 0 indicates that an extended character has not yet been detected in the current SUI.
$X    8 bytes  A generic 64-bit value captured by the DXE during the pattern match that invoked this reaction.
$Xn   1 byte   The number of character positions that matched and contributed to the value $X.
               - A value of 0 indicates no successful capture match.
               - For string captures, a value greater than 8 indicates an overflow.
               - For binary captures, a value greater than 64 indicates an overflow.
               - For octal captures, a value greater than 21 indicates an overflow.
               - For hexadecimal captures, a value greater than 16 indicates an overflow.
               - For decimal captures, a value greater than 16 indicates an overflow.
$Y    8 bytes  A second generic 64-bit value captured by the DXE during the pattern match that invoked this reaction.
$Yn   1 byte   The number of character positions that matched and contributed to the value $Y.
               - A value of 0 indicates no successful capture match.
               - For string captures, a value greater than 8 indicates an overflow.
               - For binary captures, a value greater than 64 indicates an overflow.
               - For octal captures, a value greater than 21 indicates an overflow.
               - For hexadecimal captures, a value greater than 16 indicates an overflow.
               - For decimal captures, a value greater than 16 indicates an overflow.
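
The overflow thresholds listed for $Xn and $Yn can be summarised as a small lookup. This Python sketch only restates the limits above; the function name `capture_status` and the capture-type labels are hypothetical and do not come from the Freescale tools.

```python
# Largest count that still fits the 64-bit register, per capture type,
# as listed for $Xn and $Yn above.
OVERFLOW_LIMIT = {
    "string": 8,
    "binary": 64,
    "octal": 21,
    "hexadecimal": 16,
    "decimal": 16,
}


def capture_status(kind: str, count: int) -> str:
    """Classify a $Xn/$Yn value for the given capture type."""
    if count == 0:
        return "no capture"   # 0 means no successful capture match
    if count > OVERFLOW_LIMIT[kind]:
        return "overflow"     # more positions matched than the register holds
    return "ok"
```

A rule writer would typically test the count before trusting the captured value, for example treating `capture_status("hexadecimal", 17)` as an overflow.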

Actions and Control Constructs

Once an event is matched in a desired state, various actions can take place, optionally under the control of a number of program flow constructs. The available actions are:

change state
    Syntax:   next_state <state_name>
    Example:  next_state LOGGED_IN

exit processing
    Syntax:   exit
    Example:  exit

variable assignment
    Syntax:   target = <src_operand>                     (assign)
                     = <src_operand> + <src_operand>     (add)
                     = <src_operand> - <src_operand>     (subtract)
                     = <src_operand> & <src_operand>     (bitwise AND)
                     = <src_operand> | <src_operand>     (bitwise OR)
                     = <src_operand> << <src_operand>    (shift left)
                     = <src_operand> >> <src_operand>    (shift right)
    Examples: GV[1] = GV[1] + 1
              SRV[1] = 0xab & GV[4]
              GV[1] = GV[1] - SRV[1]

report
    Syntax:   Either “report” or “write”:
              report { <report item> <report item> ... }
              write <write item>:<width>
              Report or write items are any read-write variable, read-only variable, fixed value or string.
              The width of a report item can optionally be specified via <item>:<width>.
    Examples: report { GV[1] SRV[1:3] 15:4 ‘A string’:32 }
              write $X:2
              write 0x58:8

if/else
    Syntax:   if (<condition>) { <action_1> ... } else { <action_1> <action_2> ... }
              The else part is optional.
    Example:  STATE LOGGED_IN:
              EVENT “login”
              if (GV[1] == 1) { report {0x0001} } else { report {0x0000} }

while
    Syntax:   while (<condition>) { action }

Glossary

Data Examination Engine (DXE)
    The hardware engine within the Pattern Matcher that performs the actual regular expression evaluation. This is the second stage of the three-stage pipeline that implements the core pattern matching functionality.

Deflate Engine
    The hardware engine within the Pattern Matcher that performs decompression of the incoming data that has previously been compressed using the DEFLATE Compressed Data format.

Fingerprint
    A set of contiguous symbols extracted from the pattern to represent the pattern in the KES stage.

Inconclusive Match
    An inconclusive match is reported to software when the Pattern Matcher has started to match a pattern, but was unable to reach a conclusion on whether a complete match has occurred or not.

Key Element Scanner (KES)
    The hardware engine within the Pattern Matcher that performs the fingerprint matching. This is the first stage of the three-stage pipeline that implements the core pattern matching functionality.

Pattern Matcher
    The hardware that implements the regular expression pattern matching functionality. It consists of the DMA Engine, the Deflate Engine, and the core pattern matching engines (the Key Element Scanner Engine, the Data Examination Engine, and the Stateful Rule Engine).

Pattern Set
    An exclusive grouping of patterns that are to be searched for simultaneously.

Pattern Subset
    A non-exclusive grouping of patterns within a given pattern set that are to be searched for simultaneously, with or without other subsets.

Residue
    Scanned data, from the previous work-unit, prepended to the current work-unit of the same stream. This is used to detect patterns that cross work-unit boundaries.

Session
    A logical grouping of one or more input streams, typically both directions of a flow, representing a network conversation. Stateful rules are applied per session.

Stateful Rule
    User-defined instructions that are executed by the Pattern Matcher when specified pattern matching events occur.

Stateful Rule Engine (SRE)
    The hardware engine within the Pattern Matcher that performs the stateful rule execution. This is the third stage of the three-stage pipeline that implements the core pattern matching functionality.

Stream
    When used in the context of input data, a stream is the ordered set of bytes or packets in a single direction of a specified flow.

String Under Inspection (SUI)
    The input string of bytes to be searched by the Pattern Matcher, including the residue. An SUI is composed of the contents of the work-unit, prepended with the residue from the previous SUI on this stream (if residue is enabled).

Trigger Byte
    The rightmost byte of the fingerprint.

Work-Unit
    The input string of bytes to be searched by the Pattern Matcher, excluding the residue.
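
The relationship between work-unit, residue and SUI can be illustrated in software. The Python sketch below is a simplified model, not a description of the hardware: it assumes a fixed residue length chosen by the caller (`residue_len`), a literal byte pattern instead of a regular expression, and a hypothetical function name `scan_stream`. Positions are 1-based, following the convention used for the read-only variables.

```python
def scan_stream(work_units, pattern: bytes, residue_len: int):
    """Return 1-based stream positions where `pattern` starts.

    Each SUI is the previous residue followed by the new work-unit, so a
    pattern split across two work-units can still be found.
    """
    found = []
    residue = b""
    scanned = 0  # bytes of the stream consumed before the current work-unit
    for unit in work_units:
        sui = residue + unit
        sui_start = scanned - len(residue)  # 0-based stream offset of sui[0]
        pos = sui.find(pattern)
        while pos != -1:
            end = pos + len(pattern)
            # Skip matches lying wholly inside the residue: they were
            # already reported while scanning the previous work-unit.
            if end > len(residue):
                found.append(sui_start + pos + 1)
            pos = sui.find(pattern, pos + 1)
        scanned += len(unit)
        # ASSUMPTION: keep a fixed number of trailing bytes as residue.
        residue = sui[-residue_len:] if residue_len else b""
    return found
```

With `residue_len=5`, the pattern `b"login:"` split across the work-units `b"xxlog"` and `b"in:yy"` is found at stream position 3; with residue disabled (`residue_len=0`) it is missed.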

Freescale Regex Line Options:

A Freescale Regex line is of the following format:

<exprName> /expression/[options]

# Regex Line Examples

IDS_pattern /^Location\x3a\s*URL\s*\x3a/smi

SPAM_pattern /U\.?S\.?(D|D\.)? *\$ *(\d+,\d+,\d+|\d+\.\d+\.\d+|(\d+\.\d+|\d+) *milli?on)/

VIRUS_pattern /\x56\xBE\x00\x04\x00\x04\x57\x56\x55\x6A\x6B\xFF\x59\x59/

tag=0x00000833

Character Representation

Character/Sequence                 Meaning                                  Example
Any printable character except     Match literally                          a matches a
  . * ? + [ ] ( ) { } ^ $ \ |
\ (backslash) followed by any of   A backslash escapes special characters   \$ matches $
  . * ? + [ ] ( ) { } ^ $ \ |        to suppress their special meaning
\a                                 Alert (bell), x07                        \a matches hex 07
\e                                 ESC character, x1B                       \e matches hex 1B
\n                                 New line, x0A                            \n matches hex 0A
\r                                 Carriage return, x0D                     \r matches hex 0D
\f                                 Form feed, x0C                           \f matches hex 0C
\t                                 Horizontal tab, x09                      \t matches hex 09
\nnn                               Character specified by a three-digit     \101 matches A
                                     octal code
\xmm                               Character specified by a one- or         \x41 matches A
                                     two-digit hexadecimal code
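
Most of the escapes above behave the same way in other regex engines, which makes them easy to try out. The Python sketch below uses the standard `re` module purely as a stand-in; it is not the Freescale engine. Two rows are left out on purpose: Python's `re` rejects `\e`, and it requires exactly two hex digits after `\x`. The names `EXAMPLES` and `check_examples` are hypothetical.

```python
import re

# (pattern, character it should match), taken from the table above.
EXAMPLES = [
    (r"a", "a"),
    (r"\$", "$"),
    (r"\a", "\x07"),
    (r"\n", "\x0a"),
    (r"\r", "\x0d"),
    (r"\f", "\x0c"),
    (r"\t", "\x09"),
    (r"\101", "A"),   # three-digit octal code
    (r"\x41", "A"),   # two-digit hexadecimal code
]


def check_examples():
    """Return (pattern, matched?) for every table row tried."""
    return [(p, re.fullmatch(p, ch) is not None) for p, ch in EXAMPLES]
```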

Anchors

^ (caret)
    Meaning:  Match the position of the start of work-unit, the start of stream, or after any newline, depending on the setting of the m and stream options.
    Example:  ^abc matches abc def, does not match xxabcdef

$ (dollar)
    Meaning:  Match the position of the end of work-unit, the end of stream, or before a newline, depending on the setting of the m and stream options.
    Example:  def$ matches abcdef, does not match abcdefxx
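
The anchor examples can be reproduced with Python's `re` module, used here only as an approximation of the Freescale behaviour. The sketch assumes that the m option corresponds to `re.MULTILINE` and treats the whole input string as one work-unit; the stream option has no equivalent here. The helper name `anchored` is hypothetical.

```python
import re


def anchored(pattern: str, text: str, multiline: bool = False) -> bool:
    """Report whether `pattern` matches anywhere in `text`.

    ASSUMPTION: the 'm' option is modelled with re.MULTILINE.
    """
    flags = re.MULTILINE if multiline else 0
    return re.search(pattern, text, flags) is not None
```

Without the m-like flag, `^abc` does not match in "xx\nabc"; with `multiline=True` it does, because the caret then also matches after a newline.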

Grouping, Conditional and Control

Sequence   Meaning                                                                            Example
(...)      Group subpattern                                                                   (abc) matches abc
(...|...)  Alternation. Match subpatterns on either side of | (pipe)                          (aaa|bbb|ccc) matches aaa, bbb or ccc
*          Match 0 or more times, as many times as possible (greedy match)                    ab* matches a in xxxaxx, abb in xxxabbxx
+          Match 1 or more times, as many times as possible                                   ab+ matches abb in xxxabbxx
?          Match 0 or 1 times, as many times as possible                                      ab? matches a in xxxaxx, ab in xxxabbxx
{n}        Match exactly n times                                                              a{3} matches aaa
{n,}       Match at least n times, as many times as possible                                  ab{3,} matches abbbbb in abbbbbx
{x,y}      Match at least x times but no more than y times, and as many times as possible     ab{1,3} matches abbb in xabbbbbbx
*?         Match 0 or more times, as few times as possible                                    ab*? matches a in xxxabbxx
+?         Match 1 or more times, but as few times as possible                                ab+? matches ab in xxxabbxx
??         Match 0 or 1 time, but as few times as possible                                    ab?? matches a in xxxaxx or xxxabbxx
{n,}?      Match at least n times, but as few times as possible                               ab{3,}? matches abbb in abbbbbbx
{x,y}?     Match at least x times, no more than y times and as few times as possible          ab{1,3}? matches ab in xabbbbbbx
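
The difference between greedy and lazy quantifiers is easiest to see by running the examples. The Python sketch below replays the example column with the standard `re` module, which uses the same quantifier syntax; it is a stand-in for experimentation, not the Freescale engine, and `first_match` is a hypothetical helper.

```python
import re

# (pattern, input, expected match), taken from the examples above.
EXAMPLES = [
    (r"ab*", "xxxaxx", "a"),
    (r"ab*", "xxxabbxx", "abb"),
    (r"ab+", "xxxabbxx", "abb"),
    (r"ab?", "xxxabbxx", "ab"),
    (r"a{3}", "aaa", "aaa"),
    (r"ab{3,}", "abbbbbx", "abbbbb"),
    (r"ab{1,3}", "xabbbbbbx", "abbb"),
    (r"ab*?", "xxxabbxx", "a"),       # lazy: stops as early as possible
    (r"ab+?", "xxxabbxx", "ab"),
    (r"ab??", "xxxabbxx", "a"),
    (r"ab{3,}?", "abbbbbbx", "abbb"),
    (r"ab{1,3}?", "xabbbbbbx", "ab"),
]


def first_match(pattern: str, text: str):
    """Return the text of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None
```

Each greedy form consumes as many b characters as its bounds allow, while the corresponding lazy form (with the trailing ?) stops at the minimum.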

Character Classes and Shorthands

[...] (characters inside square brackets)
    Meaning:  Character construct to match a single character listed or contained within a listed range. Note that the rules inside a character class are not identical to those outside.
    Example:  [a] matches a

Any character except ^-[]\ inside [...]
    Meaning:  Add to the possible matches for that class
    Example:  [123] matches 1, 2 or 3

\ (backslash) followed by any of ^-[]\ inside [...]
    Meaning:  Special meaning of special characters suppressed
    Example:  [\^\-\]\\] matches ^, -, ] or \

- (hyphen) inside [...], except immediately after the opening [
    Meaning:  Match the specified range of characters. Match hyphen literally if placed right after the opening [
    Example:  [a-zA-Z] matches any lowercase or uppercase letter

[^...], ^ (caret) immediately after the opening [
    Meaning:  Negated character class, matches a single character not listed and not contained within a listed range
    Example:  [^123] matches 0, 4, 5, 6, 7, 8, 9, a…z, A…Z, !@#$%^...

\b inside [...]
    Meaning:  Backspace, hex 08
    Example:  [\b] matches backspace

. (dot)
    Meaning:  Match any character, excluding \n (newline) unless the Regex line option “s” is set
    Example:  . matches a, b, c, …1, 2, 3, #, $, …

\w
    Meaning:  Word character, shorthand for [a-zA-Z0-9_]
    Example:  \w matches a, b, …, A, B, …, 0, 1, …9, _

\W
    Meaning:  Non-word character, shorthand for [^a-zA-Z0-9_]
    Example:  \W matches any character that is not a, b ... A, B, ... 0, 1 ... 9, _

\d
    Meaning:  Digit character, shorthand for [0-9]
    Example:  \d matches 0, 1, 2, …, 9

\D
    Meaning:  Non-digit character, shorthand for [^0-9]
    Example:  \D matches any character that is not 0, 1, 2, …, 9

\s
    Meaning:  White space character, shorthand for [\n\r\f\t ]
    Example:  \s matches any white space character

\S
    Meaning:  Non-white space character, shorthand for [^\n\r\f\t ]
    Example:  \S matches any character that is not a white space

Freescale Regular Expressions and Stateful Rules

This quick guide is a reference for the Regular Expressions (Regex) and Stateful Rules supported by the Freescale hardware-based Pattern Matcher, a great vehicle to accelerate content security and other applications such as intrusion detection/prevention, antivirus, anti-spam, application classification and content filtering. For a complete description, please consult “Pattern Matcher 1.1 Software User Guide.”


# Define Stateful Rule Matching the Protocol Exchange
STATEFUL_RULE: HTTP_Recognizer
RESET_STATE:
    EVENT “http_request”
        next_state AWAIT_response
STATE AWAIT_response:
    EVENT “http_response”
        # report HTTP traffic observed
        report {0x00000001}
        next_state RESET_STATE

Stateful Rule Structure

The structure of the general multi-state stateful rule is as follows. Note that there can be fewer or more states, events and actions than depicted.

STATEFUL_RULE: <rule_name>
RESET_STATE:
    EVENT “<regex_name_1>”
        action_1
STATE <state_name_2>:
    EVENT “<regex_name_2>”
        action_1
        action_2
    EVENT “<regex_name_3>”
        action_1
        action_2
STATE <state_name_3>:
    EVENT “<regex_name_4>”
        action_1
        action_2
    EVENT END_OF_SUI
        action_1
        action_2

States

A general stateful rule has two or more states and always starts with the “RESET_STATE”.

Events

Events are either matches of specific regexes or END_OF_SUI (String Under Inspection).

Data Capture

The Pattern Matcher supports capturing and storing data from the input data stream. The captured data can be subsequently used within Stateful Rules. The syntax for the capture portion of the Regex is as follows:

(?<var><modifier>[<subexpression>])

Where

• (? … ) is the escape sequence

• <var> specifies the register in which the captured data is to be stored. Valid values are $X and $Y, both 64-bit registers.

• <modifier> specifies how the data should be interpreted and stored:

    L   Interpreted as a literal, stored Left-justified
    R   Interpreted as a literal, stored Right-justified
    #   Interpreted as a literal, hashed before being stored
    B   Interpreted as an ASCII-encoded Binary number, numeric value stored
    O   Interpreted as an ASCII-encoded Octal number, numeric value stored
    D   Interpreted as an ASCII-encoded Decimal number, numeric value stored
    H   Interpreted as an ASCII-encoded Hexadecimal number, numeric value stored

• <subexpression> is the capture expression which, when the overall expression is matched, will cause the data matching this subexpression to be stored in the <var> register according to the <modifier> value.

Example 1: exp /login:(?$XL[A-Za-z0-9]{1,8})/

This instructs the Pattern Matcher to search for a string consisting of “login:” followed immediately by 1 to 8 alphanumeric characters. The “(?$XL” construct instructs the engine to store into the $X register the portion of the match that falls within the parentheses (in this example, the 1 to 8 alphanumeric characters), with the capture starting from the left edge of the data matching the expression.

Example 2: exp /LENGTH=(?$YH[0-7][0-9A-Fa-f]{2})/

This instructs the Pattern Matcher to search for a string consisting of “LENGTH=” followed immediately by an ASCII-encoded hexadecimal number from 000 to 7FF, and to store the number into the $Y register.
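
The two capture examples can be approximated in software to see what ends up in the registers. The Python sketch below is an emulation under stated assumptions, not the hardware behaviour: ordinary capture groups replace the (?$XL…) and (?$YH…) constructs, and unused bytes of a left-justified literal are assumed to be zero, which the guide does not specify. Both function names are hypothetical.

```python
import re


def capture_left_justified(text: str):
    """Emulate Example 1: return ($X as 8 bytes, $Xn) or None."""
    m = re.search(r"login:([A-Za-z0-9]{1,8})", text)
    if m is None:
        return None
    raw = m.group(1).encode("ascii")
    # ASSUMPTION: the literal is padded on the right with zero bytes.
    return raw.ljust(8, b"\x00"), len(raw)


def capture_hex(text: str):
    """Emulate Example 2: return ($Y as an integer, $Yn) or None."""
    m = re.search(r"LENGTH=([0-7][0-9A-Fa-f]{2})", text)
    if m is None:
        return None
    digits = m.group(1)
    # The H modifier stores the numeric value, not the ASCII digits.
    return int(digits, 16), len(digits)
```

For the input "LENGTH=7FF" the second function yields the value 2047 with a count of 3, while "LENGTH=800" does not match because the first digit is limited to 0 through 7.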

Stateful Rules

Stateful Rules provide the basic constructs to build Finite State Machine-based applications such as tracking application layer protocol states, parsing application layer messages and matching complicated patterns.

Stateful Rule Example

The syntax and semantics of Stateful Rules are very intuitive. The following example is intended to give readers a feel of how stateful rules can be used. This Stateful Rule classifies HTTP traffic by matching HTTP request and response message patterns and the stateful relationship between them.

# Define Regex Signatures of HTTP Request and Response
http_request /^(GET|POST)\s.*?HTTP\/1\.\d$/
http_response /^HTTP\/1\.\d\s200\sOK$/
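
The behaviour of the HTTP_Recognizer rule in this guide can be modelled as a two-state machine. The Python sketch below is a software analogy only: it feeds one message line at a time, uses Python's `re` in place of the Pattern Matcher, and assumes that an event with no handler in the current state is simply ignored, which the guide does not state. The function name `run_http_recognizer` is hypothetical.

```python
import re

# The two signatures from the example, written for Python's re module.
HTTP_REQUEST = re.compile(r"^(GET|POST)\s.*?HTTP/1\.\d$")
HTTP_RESPONSE = re.compile(r"^HTTP/1\.\d\s200\sOK$")


def run_http_recognizer(lines):
    """Return the list of report values emitted for the given lines."""
    state = "RESET_STATE"
    reports = []
    for line in lines:
        if state == "RESET_STATE" and HTTP_REQUEST.search(line):
            # EVENT "http_request": next_state AWAIT_response
            state = "AWAIT_response"
        elif state == "AWAIT_response" and HTTP_RESPONSE.search(line):
            # EVENT "http_response": report, then return to RESET_STATE
            reports.append(0x00000001)
            state = "RESET_STATE"
        # ASSUMPTION: any other line leaves the state unchanged.
    return reports
```

A request line followed by a "200 OK" status line produces one report; a response seen without a preceding request produces none, which is the stateful relationship the rule is meant to capture.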

Option                   Description
s                        Dot matches new line in addition to other characters
i                        Case-insensitive match
m                        ^ and $ match at line breaks
tag=0x<hex value>        32-bit tag value returned in the match report
tag=<decimal value>
set=<set value>          Select mutually exclusive pattern set from 0 to 255
subsets=0x<hex value>    Select pattern subsets, 16-bit mask per pattern set
subsets=<decimal value>
stream                   Treat anchors relative to the stream. Default is relative to the work-unit.
noreport                 Flag to suppress pattern match report generation
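
For experimenting with patterns off-target, a Regex line of the form <exprName> /expression/[options] can be loosely translated to Python. The sketch below is an approximation with a hypothetical helper, `compile_regex_line`. It handles only the single-letter options s, i and m, assumed to correspond to `re.DOTALL`, `re.IGNORECASE` and `re.MULTILINE`; the tag, set, subsets, stream and noreport options have no Python counterpart and are rejected.

```python
import re

# ASSUMPTION: nearest Python equivalents of the single-letter options.
FLAG_FOR_OPTION = {"s": re.DOTALL, "i": re.IGNORECASE, "m": re.MULTILINE}


def compile_regex_line(line: str):
    """Split '<exprName> /expression/[options]' and compile the expression."""
    name, rest = line.split(None, 1)
    if not rest.startswith("/"):
        raise ValueError("expression must start with '/'")
    # The expression ends at the last '/', so escaped '\/' inside it is kept.
    body, sep, options = rest[1:].rpartition("/")
    if not sep:
        raise ValueError("expression must end with '/'")
    flags = 0
    for ch in options.strip():
        if ch not in FLAG_FOR_OPTION:
            raise ValueError(f"option not handled by this sketch: {ch!r}")
        flags |= FLAG_FOR_OPTION[ch]
    return name, re.compile(body, flags)
```

Applied to the IDS_pattern example line, this yields a case-insensitive pattern that matches "location: url :" as well as the capitalised form.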