Barracuda Email Security Gateway

How should I use regular expressions with my Barracuda Spam Firewall?

  • Type: Knowledgebase
  • Date changed: 10 months ago
Solution #00003273

Scope:
All Barracuda Spam Firewalls, all firmware versions.

Answer:

What follows is a comprehensive guide to assist you in creating and using regular expressions on the Barracuda Spam Firewall.

Regular expressions can be daunting at first. A regex can be described as "math for the written word." This can be as simple as 3+5 or as complicated as differential calculus. However, if all you need to catch are simple words or phrases, chances are your custom rules can be constructed in a WYSIWYG (What You See Is What You Get) fashion. If you want to block all mail that has "hot girls" in it, the regular expression hot girls will solve your problem.

A common application of regex technology is to enforce company policy (for example, blocking foul language) or other fairly static requirements (e.g. whitelisting the subject of company newsletters). Using it to combat the latest spam trends is not a good idea unless you are going to maintain the list and remove the entries that are no longer needed.

Each regular expression is evaluated on a line-by-line basis. When regular expressions are saved, they are compared against an internal dictionary. Any collisions against the dictionary are presented as warnings, which are simply meant to make you aware of the collision. If an expression is not valid, you will get an error instead.

There are several things that may not be apparent at first glance when using regexes on the Barracuda Spam Firewall. These are important and should be taken into account when deciding how to implement your filters.
  • The Barracuda Spam Firewall is case insensitive. It forces all of your regexes to lowercase before comparing them against incoming mail in a case-insensitive fashion. This means that SPOTLIGHT and spotlight will match the same thing.
    This also means that \W doesn't do what you expect. It is converted into \w before being compared, so it matches the opposite of what you intended. Caveat: the forced lowercasing currently does not apply to extended ASCII characters, meaning that À will not be translated to à.
  • The Barracuda Spam Firewall does partial word matching unless otherwise specified. This means that the regular expression cialis is not what you want (unless you want to block specialist and socialism as well). This can work for or against you, so be aware of it. If you want to force matches against whole words, use word boundaries with the special \b character, like this: \bcialis\b. Word boundaries are covered in greater detail below.
  • There is some text preprocessing going on. The filters you write are not compared against the raw message body (and this is a good thing). Some parsing and conversion is handled ahead of time. The most common occurrence of this is =20 (the quoted-printable encoding of a space), which is converted to a space.
    This also shows up where matches are made against foreign language characters that don't fit in the standard ASCII character set. If you look at the raw source of one of these messages you won't see a Д (you see a =E4 instead). The Barracuda Spam Firewall understands these and does the conversion, so having Д as a filter will catch those messages. Note: the exact translation may depend on the character set defined.
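The behaviors described above can be approximated off the appliance when experimenting with a rule. The following is a minimal Python sketch, not the Barracuda Spam Firewall's actual implementation: the helper names `lowercase_rule`, `firewall_match`, and `decode_quoted_printable` are hypothetical, and Python's `re` and `quopri` modules stand in for whatever the firewall does internally.

```python
import quopri
import re


def lowercase_rule(rule: str) -> str:
    """Mimic the forced lowercasing of a rule (assumed from the description)."""
    return rule.lower()


def firewall_match(rule: str, line: str) -> bool:
    """Approximate a rule check: lowercase the rule, then search the line
    case-insensitively, allowing a match anywhere (partial matching)."""
    return re.search(lowercase_rule(rule), line, re.IGNORECASE) is not None


def decode_quoted_printable(raw: bytes, charset: str = "latin-1") -> str:
    """Decode quoted-printable sequences such as '=20' before matching."""
    return quopri.decodestring(raw).decode(charset)


# Case insensitivity: both spellings behave the same.
print(firewall_match("SPOTLIGHT", "In the spotlight"))     # True
# Lowercasing a rule turns \W (non-word) into \w (word character).
print(lowercase_rule(r"\W"))                                # \w
# Partial matching: 'cialis' also hits 'specialist'.
print(firewall_match("cialis", "ask a specialist"))        # True
# Word boundaries restrict the match to the whole word.
print(firewall_match(r"\bcialis\b", "ask a specialist"))   # False
# Preprocessing: '=20' becomes a space; '=E4' depends on the charset.
print(decode_quoted_printable(b"hot=20girls"))              # hot girls
print(decode_quoted_printable(b"=E4", "koi8-r"))            # Д
```

Note that Python lowercases extended characters such as À, which the article says the firewall does not, so this sketch is not a faithful model for rules containing them.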
There are three different pages in the Barracuda Spam Firewall's interface that allow you to enter and use regular expressions.
  • The Block/Accept > Subject Filtering page. These rules are only compared against the subject of the email.
  • The Block/Accept > Body Filtering page. These rules are compared against every MIME part of the message that the Barracuda Spam Firewall knows how to process. This includes (but is not limited to) txt, html, rtf, attached html documents, and forwarded messages.
  • The Block/Accept > Header Filtering page. These rules are compared against the headers of each message. Keep in mind that since content filtering is done before the scoring of the message, attempting to block everything that triggers a specific spam scoring rule on the Barracuda Spam Firewall won't work.
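To picture which text each of the three rule types sees, the sketch below splits a sample message into subject, header, and body text using Python's standard `email` package. It is only an illustration under stated assumptions: the sample message, the `rule_targets` and `rule_hits` helpers, and the choice to treat only `text/*` parts as body text are all hypothetical, and the firewall's own parsing may differ.

```python
import re
from email import message_from_string
from email.policy import default

RAW = (
    "From: news@example.com\n"
    "Subject: Company Newsletter\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain\n"
    "\n"
    "Plain text part\n"
    "--b1\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>HTML part</p>\n"
    "--b1--\n"
)


def rule_targets(raw_message: str) -> dict:
    """Return the text that each rule type would be compared against."""
    msg = message_from_string(raw_message, policy=default)
    return {
        "subject": [str(msg["Subject"] or "")],
        "header": [f"{name}: {value}" for name, value in msg.items()],
        # Assumption: only text/* MIME parts are treated as body text here.
        "body": [part.get_content() for part in msg.walk()
                 if part.get_content_maintype() == "text"],
    }


def rule_hits(rule: str, texts: list) -> bool:
    """Check a rule line by line, case-insensitively, with partial matching."""
    return any(re.search(rule, line, re.IGNORECASE)
               for text in texts for line in text.splitlines())


targets = rule_targets(RAW)
print(rule_hits("newsletter", targets["subject"]))      # True
print(rule_hits("newsletter", targets["body"]))         # False
print(rule_hits("html part", targets["body"]))          # True
print(rule_hits(r"^from: news@", targets["header"]))    # True
```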
There are three basic types of characters that are used when constructing regex:
  • Normal characters: letters and numbers, the ASCII character set. If it's not mentioned as a metacharacter or special character, it probably falls in here.
  • Metacharacters: characters that modify or expand the usage of other characters. Here are some of the available metacharacters:
    ^ - Start of line. At the start of a [...] statement, it negates the set, matching any character not listed in the [...] statement.
    $ - End of line.
    . - Any single character (excluding the newline character).
    | - Logical OR operator.
    {...} - Repetition statement. Can be a fixed number, like {5}, or a range, like {2,6}.
    [...] - Explicit characters to match.
    (...) - Logical grouping of contents.
    * - Zero to twenty repetitions of preceding character.
    + - One to twenty repetitions of preceding character.
    ? - Zero or one repetitions of preceding character.
    \ - Escape character, which can be used to match a literal instance of the above characters.
  • Special characters: all of these are a backslash followed by a letter. Here are some of the available special characters:
    \b - Word boundary, the position between a word character [a-z_0-9] and a non-word character [^a-z_0-9]. Boundaries are also found at the start and end of lines, when applicable.
    \t - Tab.
    \n - Newline.
    \s - Any blank character, including space, tab, newline, carriage return, form feed, and vertical tab.
    \d - Any digit, shorthand for [0-9].
    \w - Any word character, shorthand for [a-z0-9_].
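Because metacharacters carry special meaning, a literal phrase that contains them has to be escaped before it is used as a rule. The sketch below shows one way to do that in Python. The `escape_literal` helper is hypothetical and covers only the metacharacters listed above; Python's `re` module is used merely to demonstrate the effect, not to reproduce the firewall's engine.

```python
import re

# The metacharacters named in the list above.
METACHARACTERS = set("^$.|{}[]()*+?\\")


def escape_literal(phrase: str) -> str:
    """Prefix every metacharacter with a backslash so it matches literally."""
    return "".join("\\" + ch if ch in METACHARACTERS else ch for ch in phrase)


phrase = "win $100 (today only!)"
rule = escape_literal(phrase)
print(rule)                                              # win \$100 \(today only!\)

# Unescaped, '$' means end of line, so the phrase never matches itself.
print(re.search(phrase, phrase) is not None)             # False
# Escaped, the rule matches the literal text.
print(re.search(rule, phrase, re.IGNORECASE) is not None)  # True
```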

Character classes
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
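The flattened equivalents given in the list above can be checked mechanically. The nested and && forms follow Java-style syntax, which Python's `re` module does not support, so this sketch (with a hypothetical `matched_chars` helper) tests only the flattened classes against ordinary set operations on the lowercase ASCII alphabet.

```python
import re
import string

ALPHABET = set(string.ascii_lowercase)


def matched_chars(char_class: str) -> set:
    """Return the lowercase ASCII letters that a bracket expression matches."""
    pattern = re.compile(char_class)
    return {ch for ch in ALPHABET if pattern.fullmatch(ch)}


# Union: [a-d[m-p]] is equivalent to [a-dm-p].
print(matched_chars("[a-dm-p]") == set("abcd") | set("mnop"))   # True
# Intersection: [a-z&&[def]] is equivalent to [def].
print(matched_chars("[def]") == ALPHABET & set("def"))          # True
# Subtraction: [a-z&&[^bc]] is equivalent to [ad-z].
print(matched_chars("[ad-z]") == ALPHABET - set("bc"))          # True
# Subtraction: [a-z&&[^m-p]] is equivalent to [a-lq-z].
print(matched_chars("[a-lq-z]") == ALPHABET - set("mnop"))      # True
```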

POSIX character classes (US-ASCII only)
\p{Lower} A lower-case alphabetic character: [a-z]
\p{Upper} An upper-case alphabetic character: [A-Z]
\p{ASCII} All ASCII: [\x00-\x7F]
\p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit} A decimal digit: [0-9]
\p{Alnum} An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}]
\p{Blank} A space or a tab: [ \t]
\p{Cntrl} A control character: [\x00-\x1F\x7F]
\p{XDigit} A hexadecimal digit: [0-9a-fA-F]
\p{Space} A whitespace character: [ \t\n\x0B\f\r]
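When testing a rule in a tool that lacks \p{...} support (Python's `re` module is one), the POSIX names can be expanded to the explicit bracket expressions listed above. The sketch below is a hypothetical helper covering a subset of the classes; it only handles \p{...} used outside of a [...] expression and is not a statement about how the firewall itself evaluates these classes.

```python
import re

# Explicit equivalents taken from the list above (subset).
POSIX_CLASSES = {
    "Lower": "[a-z]",
    "Upper": "[A-Z]",
    "Alpha": "[a-zA-Z]",
    "Digit": "[0-9]",
    "Alnum": "[a-zA-Z0-9]",
    "Blank": r"[ \t]",
    "XDigit": "[0-9a-fA-F]",
    "Space": r"[ \t\n\x0B\f\r]",
}


def expand_posix(rule: str) -> str:
    """Replace each \\p{Name} with its explicit bracket expression.
    Raises KeyError for a class name that is not in POSIX_CLASSES."""
    return re.sub(r"\\p\{(\w+)\}", lambda m: POSIX_CLASSES[m.group(1)], rule)


rule = expand_posix(r"\p{Upper}\p{Digit}{3}")
print(rule)                                       # [A-Z][0-9]{3}
print(re.fullmatch(rule, "A123") is not None)     # True
print(re.fullmatch(rule, "a123") is not None)     # False
```

The last result reflects case-sensitive matching in Python; on a case-insensitive system such as the one described in this article, upper-case and lower-case classes would presumably behave alike.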

Line terminators

A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:

A newline (line feed) character ('\n'),
A carriage-return character followed immediately by a newline character ("\r\n"),
A standalone carriage-return character ('\r'),
A next-line character ('\u0085'),
A line-separator character ('\u2028'), or
A paragraph-separator character ('\u2029').
If UNIX_LINES mode is activated, then the only line terminators recognized are newline characters.

The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.

By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence.
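The effect of the MULTILINE and DOTALL modes can be tried out with any engine that offers comparable flags. The sketch below uses Python's `re.MULTILINE` and `re.DOTALL` as stand-ins; the `found` helper is hypothetical. One difference to keep in mind is that Python's `re` treats only '\n' as a line terminator, not the full list given above.

```python
import re

TEXT = "hi there\nnice day"


def found(pattern: str, text: str, flags: int = 0) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text, flags) is not None


# Without MULTILINE, ^ and $ only match at the ends of the whole input.
print(found(r"^nice", TEXT))                    # False
print(found(r"there$", TEXT))                   # False
# With MULTILINE, ^ and $ also match around the line terminator.
print(found(r"^nice", TEXT, re.MULTILINE))      # True
print(found(r"there$", TEXT, re.MULTILINE))     # True
# Without DOTALL, '.' does not match the newline; with DOTALL it does.
print(found(r"there.nice", TEXT))               # False
print(found(r"there.nice", TEXT, re.DOTALL))    # True
```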

Additional Notes:

Here are some simple examples that might help.
sex will match:
sex
sexy
Essex
sexist
sexton
sexual
Sussex
And so on.
^Hi there will match any line starting with:
hi there
nice day$ will match any line ending in:
nice day
part.time will match:
part-time
part_time
partstimer
part times
partДtime
And so on, but not:
parttime
There has to be a single something in between the words.
the (good|bad|ugly) dog will match:
the good dog
the bad dogs
the ugly doggy
the go{1,3}d cat will match:
the god cat
the good cats
the goood caterpillar
free c[aàáâãäåÀÁÂÃÄÅ]sh will match:
free cash
free cAsh
free càsh
free cÃsh
And so on. Note that the Barracuda Spam Firewall doesn't know how to force the extended ASCII character set to lowercase, so you need to specify both cases, like ä and Ä.
my [a-z]* feet will match:
my fast feet
my large feet
my supercalifragilistic feet
And so on. It will not match:
my supercalifragilisticexpialidocious feet
Because "supercalifragilisticexpialidocious" contains more than 20 characters.
win \$\d+ now will match:
win $7 now
win $100 now
win $654000 now
\bsexy?\b will match:
sex
sexy
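The examples above can be replayed in a small test harness before a rule is entered on the appliance. This is a rough sketch only: the `to_python_rule` and `rule_matches` helpers are hypothetical, Python's `re` engine stands in for the firewall's, and the twenty-repetition limit on * and + is emulated by rewriting them as {0,20} and {1,20}. The rewrite ignores the case of * or + inside a [...] expression. The "sexist" check is an extra case that follows from the word-boundary rule rather than one listed in the article.

```python
import re


def to_python_rule(rule: str) -> str:
    """Rewrite unescaped * and + as bounded repetitions (0-20 and 1-20)."""
    out = []
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch == "\\" and i + 1 < len(rule):
            out.append(rule[i:i + 2])   # keep escaped pairs such as \$ intact
            i += 2
            continue
        if ch == "*":
            out.append("{0,20}")
        elif ch == "+":
            out.append("{1,20}")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def rule_matches(rule: str, line: str) -> bool:
    """Case-insensitive partial match, approximating the described behavior."""
    return re.search(to_python_rule(rule), line, re.IGNORECASE) is not None


CASES = [
    ("sex", "Essex", True),
    ("^Hi there", "hi there, friend", True),
    ("part.time", "part-time", True),
    ("part.time", "parttime", False),
    ("the (good|bad|ugly) dog", "the ugly doggy", True),
    ("the go{1,3}d cat", "the goood caterpillar", True),
    ("my [a-z]* feet", "my supercalifragilistic feet", True),
    ("my [a-z]* feet", "my supercalifragilisticexpialidocious feet", False),
    (r"win \$\d+ now", "win $654000 now", True),
    (r"\bsexy?\b", "sexy", True),
    (r"\bsexy?\b", "sexist", False),
]

for rule, line, expected in CASES:
    result = rule_matches(rule, line)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{status}: {rule!r} vs {line!r} -> {result}")
```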

To find ID BARRACUDA101us use expression: BARRACUDA101us
To find all IDs starting with Barr use expression: Barr.*
To find all IDs ending with Z use expression: .*Z
To find all IDs with 10 somewhere in the text use expression: .*10.*
To find all IDs with a number somewhere in the text use expression: .*\d.*

Link to This Page:
https://campus.barracuda.com/solution/50160000000H8SvAAK