freescale.com
Brochure
A Quick Guide to Freescale Regular Expressions and Stateful Rules
Freescale™ and the Freescale logo are trademarks of Freescale Semiconductor, Inc. All other product or service names are the property of their respective owners. This product incorporates Power dummy legal text. © Freescale Semiconductor, Inc. 2007
Document Number: BRREGEXSTATERLLG REV 0
Learn More: For current information about Freescale products and documentation, please visit www.freescale.com.
Variables
There are a number of read-write and read-only variables available to rule writers.
Read-write variables are:
Type                   Name      Range                                   Size
Session Variables      SRV[x]    x = 1..15                               1 byte
                       SRV[x:y]  x = 1..14, y = 2..15, (y-x+1) = 1..8    (y-x+1) bytes
Session Flags          SF[x]     x = 1..16                               1 bit
Global Variables       GV[x]     x = 1..16                               1 byte
                       GV[x:y]   x = 1..15, y = 2..16, (y-x+1) = 1..8    (y-x+1) bytes
Temporary Flags        TF[x]     x = 1..16                               1 bit
Temporary Variables*   TV[x]     x = 1..16                               1 byte
                       TV[x:y]   x = 1..15, y = 2..16, (y-x+1) = 1..8    (y-x+1) bytes
* Note: Temporary variables are only available in rules without context.
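The index ranges in the read-write variable table can be encoded as a small lookup. The Python sketch below shows how such a lookup can check a reference like SRV[1:9] before it is used.

It is an illustration only. The helper name `variable_size` and its return convention are hypothetical and are not part of any Freescale tool: it returns the size in bits, or None for an invalid reference.

```python
from typing import Optional

# name -> (valid single indices, size of one element in bits), transcribed from the table.
SINGLE = {
    "SRV": (range(1, 16), 8),  # session variables, 1 byte each
    "SF": (range(1, 17), 1),   # session flags, 1 bit each
    "GV": (range(1, 17), 8),   # global variables, 1 byte each
    "TF": (range(1, 17), 1),   # temporary flags, 1 bit each
    "TV": (range(1, 17), 8),   # temporary variables, 1 byte each
}
# name -> (valid x, valid y) for the multi-byte NAME[x:y] form; flags have no such form.
MULTI = {
    "SRV": (range(1, 15), range(2, 16)),
    "GV": (range(1, 16), range(2, 17)),
    "TV": (range(1, 16), range(2, 17)),
}

def variable_size(name: str, x: int, y: Optional[int] = None) -> Optional[int]:
    """Return the size in bits of NAME[x] or NAME[x:y], or None if the reference is invalid."""
    if y is None:
        if name not in SINGLE:
            return None
        indices, bits = SINGLE[name]
        return bits if x in indices else None
    if name not in MULTI:
        return None
    xs, ys = MULTI[name]
    width = y - x + 1  # number of bytes spanned
    if x in xs and y in ys and 1 <= width <= 8:
        return 8 * width
    return None
```

For example, `variable_size("SRV", 1, 8)` gives 64 bits. `variable_size("SRV", 1, 9)` is rejected because a range may span at most 8 bytes.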
Read-only variables provide information about the specific pattern match that triggered the corresponding event. Where applicable, the first byte of either the SUI or the work-unit is considered to be at position (offset) 1. Read-only variables are:
Name  Size     Description
$T    4 bytes  A generic tag assigned to the pattern that invoked this reaction.
$I    1 byte   Indicates whether the match that invoked this reaction is inconclusive.
$P    4 bytes  The position within the SUI at which the trigger byte is found.
$Nl   1 byte   The number of bytes to the left of the trigger byte that are matched by the pattern.
$Nr   1 byte   The number of bytes to the right of the trigger byte that are matched by the pattern.
$N    1 byte   The number of bytes matched by the pattern.
$M    4 bytes  The position within the work-unit of the rightmost byte of the match.
$Sc   6 bytes  The number of bytes completely scanned (i.e. scanned and not held as residue) prior to the current SUI.
$Si   6 bytes  The number of bytes initially scanned (i.e. scanned and possibly held as residue) prior to the current work-unit.
$R    1 byte   The number of bytes of residue that are prepended to the current work-unit.
$Ob   4 bytes  The position within the SUI where a line break character (LF or CR) last occurred. A value of 0 indicates that a line break has not yet been detected in the current SUI.
$Ox   4 bytes  The position within the SUI where an extended character (i.e. with bit 7 set) last occurred. A value of 0 indicates that an extended character has not yet been detected in the current SUI.
$X    8 bytes  A generic 64-bit value captured by the DXE during the pattern match that invoked this reaction.
$Xn   1 byte   The number of character positions that matched and contributed to the value $X:
               - A value of 0 indicates no successful capture match.
               - For string captures, a value greater than 8 indicates an overflow.
               - For binary captures, a value greater than 64 indicates an overflow.
               - For octal captures, a value greater than 21 indicates an overflow.
               - For hexadecimal captures, a value greater than 16 indicates an overflow.
               - For decimal captures, a value greater than 16 indicates an overflow.
$Y    8 bytes  A second generic 64-bit value captured by the DXE during the pattern match that invoked this reaction.
$Yn   1 byte   The number of character positions that matched and contributed to the value $Y. A value of 0 and the overflow limits have the same meanings as for $Xn.
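Positions are counted from 1, and an SUI is the residue followed by the work-unit. Under those rules, the position variables can be converted into one another.

The Python sketch below derives these conversions from the definitions in the table. The helper names and the example register values are hypothetical. The identity $Si = $Sc + $R is inferred from the definitions, not stated by this guide.

```python
def sui_to_work_unit(p: int, r: int) -> int:
    """1-based work-unit position of 1-based SUI position p, given $R = r residue bytes."""
    if p <= r:
        raise ValueError("position lies in the residue, not in the current work-unit")
    return p - r

def sui_to_stream(p: int, sc: int) -> int:
    """1-based stream position of SUI position p; $Sc = sc bytes were completely scanned before this SUI."""
    return sc + p

def work_unit_to_stream(m: int, si: int) -> int:
    """1-based stream position of work-unit position m; $Si = si bytes were scanned before this work-unit."""
    return si + m

def match_span_in_sui(p: int, nl: int, nr: int) -> tuple:
    """(first, last) SUI positions of a match: $Nl bytes left and $Nr bytes right of the trigger byte at $P."""
    return p - nl, p + nr

# Hypothetical register values for one match: $R=3, $Sc=100, $Si=103, $P=10, $Nl=2, $Nr=4.
first, last = match_span_in_sui(10, 2, 4)   # SUI positions 8 and 14
m = sui_to_work_unit(last, 3)               # 11, the value $M would be expected to hold
print(sui_to_stream(last, 100), work_unit_to_stream(m, 103))  # 114 114
```

Both paths give the same stream position only when $Si equals $Sc plus $R. That agreement makes a convenient sanity check when interpreting real match reports.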
Actions and Control Constructs
Once an event is matched in a desired state, various actions can take place, optionally under the control of a number of program flow constructs. The available actions are:
change state
    Syntax:  next_state <state_name>
    Example: next_state LOGGED_IN
exit processing
    Syntax:  exit
    Example: exit
variable assignment
    Syntax:  target = <src_operand>                     (assign)
             target = <src_operand> + <src_operand>     (add)
             target = <src_operand> - <src_operand>     (subtract)
             target = <src_operand> & <src_operand>     (bitwise AND)
             target = <src_operand> | <src_operand>     (bitwise OR)
             target = <src_operand> << <src_operand>    (shift left)
             target = <src_operand> >> <src_operand>    (shift right)
    Example: GV[1] = GV[1] + 1
             SRV[1] = 0xab & GV[4]
             GV[1] = GV[1] - SRV[1]
report
    Syntax:  report { <report item> <report item> ... }
             write <write item>:<width>
    Either "report" or "write" can be used. Report and write items can be any read-write variable, read-only variable, fixed value or string. The width of an item can optionally be specified as <item>:<width>.
    Example: report { GV[1] SRV[1:3] 15:4 'A string':32 }
             write $X:2
             write 0x58:8
if/else
    Syntax:  if (<condition>) { <action_1> ... } else { <action_1> <action_2> ... }
             The else part is optional.
    Example: STATE LOGGED_IN:
             EVENT "login"
             if (GV[1] == 1) { report {0x0001} } else { report {0x0000} }
while
    Syntax:  while (<condition>) { action }
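The following Python sketch applies the three example assignments to a dictionary of variables, to make the assignment semantics concrete.

It assumes the result is truncated to the width of the target: modulo 256 for a 1-byte variable such as GV[1]. This guide does not say how the Pattern Matcher handles overflow or underflow. Treat that wraparound, the helper `assign`, and the initial values as hypothetical.

```python
import operator
from typing import Dict, Optional, Union

Operand = Union[str, int]  # a variable name such as "GV[1]" or an integer constant

# Binary operators allowed in variable assignments.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "&": operator.and_,
    "|": operator.or_,
    "<<": operator.lshift,
    ">>": operator.rshift,
}

def assign(variables: Dict[str, int], target: str, left: Operand,
           op: Optional[str] = None, right: Optional[Operand] = None,
           width_bytes: int = 1) -> None:
    """Evaluate `target = left [op right]` in place.

    ASSUMPTION: the result is taken modulo 2**(8 * width_bytes), i.e. it wraps
    to the target width; the guide does not specify overflow behavior.
    """
    def value(operand: Operand) -> int:
        return variables[operand] if isinstance(operand, str) else operand

    result = value(left) if op is None else OPS[op](value(left), value(right))
    variables[target] = result % (1 << (8 * width_bytes))

# The three examples from the table, starting from hypothetical initial values.
regs = {"GV[1]": 5, "GV[4]": 0x0F, "SRV[1]": 0}
assign(regs, "GV[1]", "GV[1]", "+", 1)         # GV[1] = GV[1] + 1      -> 6
assign(regs, "SRV[1]", 0xAB, "&", "GV[4]")     # SRV[1] = 0xab & GV[4]  -> 0x0B
assign(regs, "GV[1]", "GV[1]", "-", "SRV[1]")  # GV[1] = GV[1] - SRV[1] -> 6 - 11 wraps to 251
```

For a multi-byte target such as SRV[1:2], pass `width_bytes=2` so the assumed wraparound uses 16 bits.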
Glossary
Data Examination Engine (DXE): The hardware engine within the Pattern Matcher that performs the actual regular expression evaluation. This is the second stage of the three-stage pipeline that implements the core pattern matching functionality.
Deflate Engine: The hardware engine within the Pattern Matcher that decompresses incoming data that was previously compressed using the DEFLATE Compressed Data format.
Fingerprint: A set of contiguous symbols extracted from the pattern to represent the pattern in the KES stage.
Inconclusive Match: A match reported to software when the Pattern Matcher has started to match a pattern but was unable to reach a conclusion on whether a complete match has occurred.
Key Element Scanner (KES): The hardware engine within the Pattern Matcher that performs the fingerprint matching. This is the first stage of the three-stage pipeline that implements the core pattern matching functionality.
Pattern Matcher: The hardware that implements the regular expression pattern matching functionality. It consists of the DMA Engine, the Deflate Engine, and the core pattern matching engines (the Key Element Scanner, the Data Examination Engine, and the Stateful Rule Engine).
Pattern Set: An exclusive grouping of patterns that are to be searched for simultaneously.
Pattern Subset: A non-exclusive grouping of patterns within a given pattern set that are to be searched for simultaneously, with or without other subsets.
Residue: Scanned data from the previous work-unit, prepended to the current work-unit of the same stream. Residue is used to detect patterns that cross work-unit boundaries.
Session: A logical grouping of one or more input streams, typically both directions of a flow, representing a network conversation. Stateful rules are applied per session.
Stateful Rule: User-defined instructions that the Pattern Matcher executes when specified pattern matching events occur.
Stateful Rule Engine (SRE): The hardware engine within the Pattern Matcher that performs the stateful rule execution. This is the third stage of the three-stage pipeline that implements the core pattern matching functionality.
Stream: When used in the context of input data, a stream is the ordered set of bytes or packets in a single direction of a specified flow.
String Under Inspection (SUI): The input string of bytes to be searched by the Pattern Matcher, including the residue. An SUI consists of the contents of the work-unit, prepended with the residue from the previous SUI on this stream (if residue is enabled).
Trigger Byte: The rightmost byte of the fingerprint.
Work-Unit: The input string of bytes to be searched by the Pattern Matcher, excluding the residue.
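The Residue, SUI and Work-Unit entries describe how a pattern split across two work-units can still be found. The idea is to prepend the tail of the previous SUI to the next work-unit.

The Python sketch below illustrates that idea. It assumes a fixed number of trailing bytes is held as residue. The real Pattern Matcher decides for itself how much residue to keep, so `residue_len` and the helper `build_suis` are hypothetical.

```python
from typing import Iterable, List, Tuple

def build_suis(work_units: Iterable[bytes], residue_len: int) -> List[Tuple[bytes, int]]:
    """Return (SUI, R) pairs, where each SUI is the held residue followed by the work-unit.

    ASSUMPTION: the last residue_len bytes of each SUI are held as residue for the
    next SUI on the same stream. R is the residue length, analogous to $R.
    """
    suis = []
    residue = b""
    for unit in work_units:
        sui = residue + unit
        suis.append((sui, len(residue)))
        residue = sui[-residue_len:] if residue_len > 0 else b""
    return suis

units = [b"xxlo", b"ginyy"]
# Neither work-unit contains "login" on its own, but the second SUI does.
for sui, r in build_suis(units, residue_len=4):
    print(sui, r, b"login" in sui)
```

Without residue (`residue_len=0`), each SUI is just its work-unit, and the boundary-crossing pattern is missed.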
Freescale Regex Line Options:
A Freescale Regex line is of the following format:
<exprName> /expression/[options]
# Regex Line Examples
IDS_pattern /^Location\x3a\s*URL\s*\x3a/smi
SPAM_pattern /U\.?S\.?(D|D\.)? *\$ *(\d+,\d+,\d+|\d+\.\d+\.\d+|(\d+\.\d+|\d+) *milli?on)/
VIRUS_pattern /\x56\xBE\x00\x04\x00\x04\x57\x56\x55\x6A\x6B\xFF\x59\x59/tag=0x00000833
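The regex line format splits into a name, an expression and an options string. The Python sketch below is a hypothetical helper, not a Freescale tool. It splits a regex line and, for quick experiments only, compiles the expression with Python's `re` module. The s, i and m letters map to `re.DOTALL`, `re.IGNORECASE` and `re.MULTILINE`.

Other options such as tag= are returned as text and not interpreted. Python's regex dialect only approximates the Pattern Matcher's. For example, Python's `\s` also matches a vertical tab.

```python
import re
from typing import Tuple

# Python flags standing in for the single-letter regex line options.
LETTER_FLAGS = {"s": re.DOTALL, "i": re.IGNORECASE, "m": re.MULTILINE}

def parse_regex_line(line: str) -> Tuple[str, str, str]:
    """Split '<exprName> /expression/[options]' into (name, expression, options)."""
    name, rest = line.split(None, 1)
    rest = rest.strip()
    if not rest.startswith("/"):
        raise ValueError("expression must start with '/'")
    close = rest.rfind("/")  # the last slash closes the expression; options contain none
    if close == 0:
        raise ValueError("missing closing '/'")
    return name, rest[1:close], rest[close + 1:]

def compile_with_python(line: str):
    """Approximate a regex line with Python's re (only the s, i and m letters are mapped)."""
    _name, expression, options = parse_regex_line(line)
    flags = 0
    if options and set(options) <= set(LETTER_FLAGS):
        for letter in options:
            flags |= LETTER_FLAGS[letter]
    return re.compile(expression, flags)

ids = compile_with_python(r"IDS_pattern /^Location\x3a\s*URL\s*\x3a/smi")
print(bool(ids.search("location:  url:")))  # True, because of the i option
```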
Character Representation
Any printable character except . * ? + [ ] ( ) { } ^ $ \ |
    Match literally. Example: a matches a
\ (backslash) followed by any of . * ? + [ ] ( ) { } ^ $ \ |
    A backslash escapes special characters to suppress their special meaning. Example: \$ matches $
\a
    Alert (bell), hex 07. Example: \a matches hex 07
\e
    ESC character, hex 1B. Example: \e matches hex 1B
\n
    New line, hex 0A. Example: \n matches hex 0A
\r
    Carriage return, hex 0D. Example: \r matches hex 0D
\f
    Form feed, hex 0C. Example: \f matches hex 0C
\t
    Horizontal tab, hex 09. Example: \t matches hex 09
\nnn
    Character specified by a three-digit octal code. Example: \101 matches A
\xmm
    Character specified by a one- or two-digit hexadecimal code. Example: \x41 matches A
Anchors
^ (caret)
    Match the position of the start of the work-unit, the start of the stream, or the position after any newline, depending on the setting of the m and stream options. Example: ^abc matches abc def, does not match xxabcdef
$ (dollar)
    Match the position of the end of the work-unit, the end of the stream, or the position before a newline, depending on the setting of the m and stream options. Example: def$ matches abcdef, does not match abcdefxx
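Python's `re` module reproduces the anchor examples above if the subject string stands in for a single work-unit. Here `re.MULTILINE` plays the role of the m option.

This is only a stand-in, and the helper `anchored_search` is hypothetical. It does not model the stream option. Python's `$` also matches just before a trailing newline even without `re.MULTILINE`, which may differ from the Pattern Matcher.

```python
import re

def anchored_search(pattern: str, subject: str, m_option: bool = False) -> bool:
    """Search subject (standing in for one work-unit); re.MULTILINE approximates the m option."""
    flags = re.MULTILINE if m_option else 0
    return re.search(pattern, subject, flags) is not None

print(anchored_search(r"^abc", "abc def"))    # True
print(anchored_search(r"^abc", "xxabcdef"))   # False
print(anchored_search(r"def$", "abcdef"))     # True
print(anchored_search(r"def$", "abcdefxx"))   # False
# With the m option, ^ also matches right after a newline.
print(anchored_search(r"^abc", "xx\nabcdef"))                 # False
print(anchored_search(r"^abc", "xx\nabcdef", m_option=True))  # True
```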
Grouping, Conditional and Control
(...)
    Group subpattern. Example: (abc) matches abc
(...|...)
    Alternation: match the subpattern on either side of | (pipe). Example: (aaa|bbb|ccc) matches aaa, bbb or ccc
*
    Match 0 or more times, as many times as possible (greedy match). Example: ab* matches a in xxxaxx, abb in xxxabbxx
+
    Match 1 or more times, as many times as possible. Example: ab+ matches abb in xxxabbxx
?
    Match 0 or 1 times, as many times as possible. Example: ab? matches a in xxxaxx, ab in xxxabbxx
{n}
    Match exactly n times. Example: a{3} matches aaa
{n,}
    Match at least n times, as many times as possible. Example: ab{3,} matches abbbbb in abbbbbx
{x,y}
    Match at least x times but no more than y times, as many times as possible. Example: ab{1,3} matches abbb in xabbbbbbx
*?
    Match 0 or more times, as few times as possible (lazy match). Example: ab*? matches a in xxxabbxx
+?
    Match 1 or more times, as few times as possible. Example: ab+? matches ab in xxxabbxx
??
    Match 0 or 1 times, as few times as possible. Example: ab?? matches a in xxxaxx or xxxabbxx
{n,}?
    Match at least n times, as few times as possible. Example: ab{3,}? matches abbb in abbbbbbx
{x,y}?
    Match at least x times, no more than y times, as few times as possible. Example: ab{1,3}? matches ab in xabbbbbbx
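The greedy and lazy quantifier examples can be checked against Python's backtracking `re` module, which gives the same leftmost matches for every example in the table. This is a cross-check only. It shows Python's behavior and says nothing about how the hardware DXE arrives at its result.

```python
import re

# (pattern, subject, expected leftmost match) taken from the examples above.
EXAMPLES = [
    (r"(aaa|bbb|ccc)", "bbb", "bbb"),
    (r"ab*", "xxxaxx", "a"),
    (r"ab*", "xxxabbxx", "abb"),
    (r"ab+", "xxxabbxx", "abb"),
    (r"ab?", "xxxabbxx", "ab"),
    (r"a{3}", "aaa", "aaa"),
    (r"ab{3,}", "abbbbbx", "abbbbb"),
    (r"ab{1,3}", "xabbbbbbx", "abbb"),
    (r"ab*?", "xxxabbxx", "a"),
    (r"ab+?", "xxxabbxx", "ab"),
    (r"ab??", "xxxabbxx", "a"),
    (r"ab{3,}?", "abbbbbbx", "abbb"),
    (r"ab{1,3}?", "xabbbbbbx", "ab"),
]

def check_examples():
    """Return the examples whose Python match differs from the expected text (empty if all agree)."""
    mismatches = []
    for pattern, subject, expected in EXAMPLES:
        match = re.search(pattern, subject)
        found = match.group(0) if match else None
        if found != expected:
            mismatches.append((pattern, subject, found))
    return mismatches

print(check_examples())  # []
```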
Character Classes and Shorthands
[...] (characters inside square brackets)
    Character class: match a single character that is listed or contained within a listed range. Note that the rules inside a character class are not identical to those outside. Example: [a] matches a
Any character except ^ - [ ] \ inside [...]
    Add to the possible matches for that class. Example: [123] matches 1, 2 or 3
\ (backslash) followed by any of ^ - [ ] \ inside [...]
    The special meaning of the special character is suppressed. Example: [\^\-\]\\] matches ^, -, ] or \
- (hyphen) inside [...], except immediately after the opening [
    Match the specified range of characters. A hyphen placed right after the opening [ is matched literally. Example: [a-zA-Z] matches any lowercase or uppercase letter
[^...], ^ (caret) immediately after the opening [
    Negated character class: match a single character that is not listed and not contained within a listed range. Example: [^123] matches 0, 4, 5, 6, 7, 8, 9, a…z, A…Z, !@#$%^...
\b inside [...]
    Backspace, hex 08. Example: [\b] matches backspace
. (dot)
    Match any character except \n (newline), unless the regex line option "s" is set. Example: . matches a, b, c, …, 1, 2, 3, #, $, …
\w
    Word character, shorthand for [a-zA-Z0-9_]. Example: \w matches a, b, …, A, B, …, 0, 1, …, 9, _
\W
    Non-word character, shorthand for [^a-zA-Z0-9_]. Example: \W matches any character that is not a, b, …, A, B, …, 0, 1, …, 9, _
\d
    Digit character, shorthand for [0-9]. Example: \d matches 0, 1, 2, …, 9
\D
    Non-digit character, shorthand for [^0-9]. Example: \D matches any character that is not 0, 1, 2, …, 9
\s
    White space character, shorthand for [\n\r\f\t ]. Example: \s matches any white space character
\S
    Non-white space character, shorthand for [^\n\r\f\t ]. Example: \S matches any character that is not a white space
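When prototyping with another regex engine, the Freescale shorthands are safest written out as the explicit classes listed in the table. Python's own `\s` and `\w` are Unicode-aware and therefore broader.

The Python sketch below contrasts the two. The dictionary `FSL_CLASSES` and both helpers are hypothetical names chosen for this illustration.

```python
import re

# Freescale shorthands written out as the explicit classes listed in the table.
FSL_CLASSES = {
    r"\w": r"[a-zA-Z0-9_]",
    r"\W": r"[^a-zA-Z0-9_]",
    r"\d": r"[0-9]",
    r"\D": r"[^0-9]",
    r"\s": r"[\n\r\f\t ]",
    r"\S": r"[^\n\r\f\t ]",
}

def fsl_matches(shorthand: str, ch: str) -> bool:
    """True if the single character ch matches the Freescale meaning of the shorthand."""
    return re.fullmatch(FSL_CLASSES[shorthand], ch) is not None

def python_matches(shorthand: str, ch: str) -> bool:
    """True if ch matches Python's own (Unicode-aware) meaning of the same shorthand."""
    return re.fullmatch(shorthand, ch) is not None

# Vertical tab (hex 0B) is outside the Freescale \s class but inside Python's \s.
print(fsl_matches(r"\s", "\v"), python_matches(r"\s", "\v"))  # False True
# Non-ASCII letters are not word characters in the Freescale \w class.
print(fsl_matches(r"\w", "é"), python_matches(r"\w", "é"))    # False True
```

Inside a bracketed class, Python also treats `\b` as backspace (hex 08), matching the [\b] row of the table.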
Freescale Regular Expressions and Stateful Rules
This quick guide is a reference for the Regular Expressions (Regex)
and Stateful Rules supported by the Freescale hardware-based
Pattern Matcher, a great vehicle to accelerate content security
and other applications such as intrusion detection/prevention,
antivirus, anti-spam, application classification and content filtering.
For a complete description, please consult “Pattern Matcher 1.1
Software User Guide.”
# Define Stateful Rule Matching the Protocol Exchange
STATEFUL_RULE: HTTP_Recognizer
RESET_STATE:
EVENT "http_request"
next_state AWAIT_response
STATE AWAIT_response:
EVENT "http_response"
# report HTTP traffic observed
report {0x00000001}
next_state RESET_STATE
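HTTP_Recognizer behaves like a two-state finite state machine. The Python sketch below replays a sequence of event names (names of regexes that matched) through a transcription of the rule and collects the reported values.

It assumes that an event with no handler in the current state is ignored and leaves the state unchanged. The table layout `RULE` and the function `run_rule` are hypothetical and are not a Pattern Matcher API.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# state -> event -> (value to report, or None, and the next state)
RULE: Dict[str, Dict[str, Tuple[Optional[int], str]]] = {
    "RESET_STATE": {"http_request": (None, "AWAIT_response")},
    "AWAIT_response": {"http_response": (0x00000001, "RESET_STATE")},
}

def run_rule(events: Iterable[str], state: str = "RESET_STATE") -> Tuple[List[int], str]:
    """Feed events through RULE and return (reported values, final state)."""
    reports: List[int] = []
    for event in events:
        handler = RULE[state].get(event)
        if handler is None:
            continue  # ASSUMPTION: unhandled events are ignored
        report, state = handler
        if report is not None:
            reports.append(report)
    return reports, state

print(run_rule(["http_request", "http_response"]))  # ([1], 'RESET_STATE')
print(run_rule(["http_response"]))                  # ([], 'RESET_STATE')
```

A response seen without a preceding request is not reported. The rule classifies the exchange, not the individual messages.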
Stateful Rule Structure
The structure of the general multi-state stateful rule is as follows. Note that there can be fewer or more states, events and actions than depicted.
STATEFUL_RULE: <rule_name>
RESET_STATE:
EVENT "<regex_name_1>"
action_1
STATE <state_name_2>:
EVENT "<regex_name_2>"
action_1
action_2
EVENT "<regex_name_3>"
action_1
action_2
STATE <state_name_3>:
EVENT "<regex_name_4>"
action_1
action_2
EVENT END_OF_SUI
action_1
action_2
States
A general stateful rule has two or more states and always starts with RESET_STATE.
Events
An event is either a match of a specific regex or END_OF_SUI (the end of the String Under Inspection).
Data Capture
The Pattern Matcher supports capturing and storing data from the input data stream. The captured data can subsequently be used within Stateful Rules.
The syntax for the capture portion of the Regex is as follows:
(?<var><modifier>[<subexpression>])
Where
• (? … ) is the escape sequence
• <var> specifies the register in which the captured data is to be
stored. Valid values are $X and $Y, both 64-bit registers.
• <modifier> specifies how the data should be interpreted and stored:
  L  Interpreted as a literal, stored left-justified
  R  Interpreted as a literal, stored right-justified
  #  Interpreted as a literal, hashed before being stored
  B  Interpreted as an ASCII-encoded binary number, numeric value stored
  O  Interpreted as an ASCII-encoded octal number, numeric value stored
  D  Interpreted as an ASCII-encoded decimal number, numeric value stored
  H  Interpreted as an ASCII-encoded hexadecimal number, numeric value stored
• <subexpression> is the capture expression which, when the overall
expression is matched, will cause the data matching this subexpression
to be stored in the <var> register according to the <modifier> value.
Example 1: exp /login:(?$XL[A-Za-z0-9]{1,8})/
This tells the Pattern Matcher to search for a string consisting of “login:” followed immediately by 1 to 8 alphanumeric characters. The “(?$XL” construct tells the engine to store the portion of the match that falls within the parentheses into the $X register. In this example, that portion is the 1 to 8 alphanumeric characters, and the L modifier stores them left-justified, starting from the left edge of the captured data.
Example 2: exp /LENGTH=(?$YH[0-7][0-9A-Fa-f]{2})/
This tells the Pattern Matcher to search for a string consisting of “LENGTH=” followed immediately by an ASCII-encoded hexadecimal number from 000 to 7FF, and to store the numeric value of that number into the $Y register.
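The Python sketch below models the capture syntax and modifiers. It rewrites each (?$X<modifier> or (?$Y<modifier> group as a Python named group, then converts the captured text into a 64-bit register value. The character count plays the role of $Xn/$Yn.

Several parts are assumptions and are labelled as such:
- For L, the first captured byte goes into the most significant byte of the register.
- The hash used by # is not described in this guide, so it is not modelled.
- Overflowing captures return None, because the register contents after an overflow are not specified.

The helper names and the sample subject strings are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# $Xn/$Yn overflow limits from the read-only variable table (L, R and # are string captures).
MAX_CHARS = {"L": 8, "R": 8, "#": 8, "B": 64, "O": 21, "D": 16, "H": 16}
BASES = {"B": 2, "O": 8, "D": 10, "H": 16}
# Matches the opening "(?$X<modifier>" or "(?$Y<modifier>" of a capture group.
CAPTURE_OPEN = re.compile(r"\(\?\$([XY])([LR#BODH])")

def to_register(modifier: str, text: str) -> Tuple[Optional[int], int]:
    """Return (register value, character count) for captured text.

    The value is None on overflow and for '#', whose hash function is not modelled.
    """
    n = len(text)
    if modifier == "#" or n > MAX_CHARS[modifier]:
        return None, n
    if modifier in BASES:
        return int(text, BASES[modifier]), n
    raw = text.encode("ascii")
    # ASSUMPTION: left-justified puts the first captured byte in the most significant byte.
    padded = raw.ljust(8, b"\0") if modifier == "L" else raw.rjust(8, b"\0")
    return int.from_bytes(padded, "big"), n

def run_capture(expression: str, subject: str) -> Dict[str, Tuple[Optional[int], int]]:
    """Rewrite capture groups as Python named groups, search subject, and fill the registers."""
    modifiers: Dict[str, str] = {}

    def to_named_group(m) -> str:
        modifiers[m.group(1)] = m.group(2)
        return f"(?P<{m.group(1)}>"

    match = re.search(CAPTURE_OPEN.sub(to_named_group, expression), subject)
    if match is None:
        return {}
    return {reg: to_register(mod, match.group(reg)) for reg, mod in modifiers.items()}

print(run_capture(r"LENGTH=(?$YH[0-7][0-9A-Fa-f]{2})", "LENGTH=7FF"))  # {'Y': (2047, 3)}
```

Example 1 applied to the hypothetical subject "login:alice" gives a character count of 5. Under the left-justification assumption, the register holds the bytes of "alice" followed by three zero bytes.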
Stateful Rules
Stateful Rules provide the basic constructs to build Finite State Machine-based applications such as tracking application layer protocol states, parsing application layer messages and matching complicated patterns.
Stateful Rule Example
The syntax and semantics of Stateful Rules are very intuitive. The following example is intended to give readers a feel for how stateful rules can be used.
This Stateful Rule classifies HTTP traffic by matching HTTP request and response message patterns and the stateful relationship between them.
# Define Regex Signatures of HTTP Request and Response
http_request /^(GET|POST)\s.*?HTTP\/1\.\d$/
http_response /^HTTP\/1\.\d\s200\sOK$/
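The two HTTP signatures can be tried out with Python's `re` module to see which event each one would raise. The sketch below assumes each work-unit holds exactly one message line without its line terminator, so ^ and $ anchor to that line. Real traffic is segmented differently.

`SIGNATURES` and `events_for` are hypothetical names for this illustration. Python's regex dialect only approximates the Pattern Matcher's.

```python
import re
from typing import Iterable, List

# The two signatures above, compiled with Python's re as a stand-in for the Pattern Matcher.
SIGNATURES = {
    "http_request": re.compile(r"^(GET|POST)\s.*?HTTP\/1\.\d$"),
    "http_response": re.compile(r"^HTTP\/1\.\d\s200\sOK$"),
}

def events_for(work_units: Iterable[str]) -> List[str]:
    """Return, in order, the name of each signature that matches each work-unit."""
    events = []
    for unit in work_units:
        for name, pattern in SIGNATURES.items():
            if pattern.search(unit):
                events.append(name)
    return events

print(events_for(["GET /index.html HTTP/1.1", "HTTP/1.1 200 OK", "HTTP/1.1 404 Not Found"]))
# ['http_request', 'http_response']
```

The 404 status line raises no event, because http_response only accepts a 200 OK status.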
Option        Description
s             Dot matches new line in addition to other characters
i             Case-insensitive match
m             ^ and $ match at line breaks
tag=0x<hex value>, tag=<decimal value>
              32-bit tag value returned in the match report
set=<set value>
              Select a mutually exclusive pattern set, from 0 to 255
subsets=0x<hex value>, subsets=<decimal value>
              Select pattern subsets; 16-bit mask per pattern set
stream        Treat anchors relative to the stream. Default is relative to the work-unit.
noreport      Flag to suppress pattern match report generation
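The value ranges in the options table can be checked mechanically. The Python sketch below validates option tokens that have already been split apart.

This guide does not say how several options are separated on a regex line, so the token list is an assumption. So are the function name `parse_options` and the layout of the returned dictionary.

```python
from typing import Dict, List, Union

def _number(text: str) -> int:
    """Parse 0x<hex value> or <decimal value>."""
    return int(text, 16) if text.lower().startswith("0x") else int(text, 10)

# Allowed ranges from the options table: 32-bit tag, set 0..255, 16-bit subsets mask.
LIMITS = {"tag": (0, 0xFFFFFFFF), "set": (0, 255), "subsets": (0, 0xFFFF)}

def parse_options(tokens: List[str]) -> Dict[str, Union[str, int, bool]]:
    """Validate pre-split option tokens and collect them into a dictionary."""
    options: Dict[str, Union[str, int, bool]] = {}
    for token in tokens:
        if "=" in token:
            key, value = token.split("=", 1)
            if key not in LIMITS:
                raise ValueError(f"unknown option {key!r}")
            number = _number(value)
            low, high = LIMITS[key]
            if not low <= number <= high:
                raise ValueError(f"{key}={value} is out of range")
            options[key] = number
        elif token in ("stream", "noreport"):
            options[token] = True
        elif token and set(token) <= set("smi"):
            options["flags"] = token
        else:
            raise ValueError(f"unknown option {token!r}")
    return options

print(parse_options(["smi", "tag=0x00000833", "set=3"]))
# {'flags': 'smi', 'tag': 2099, 'set': 3}
```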