Barracuda Web Security Gateway

How should I use regular expressions with my Barracuda Web Filter?

  • Type: Knowledgebase
  • Date changed: 2 years ago

Solution #00006365



Applies to all Barracuda Web Security Gateway appliances on all versions of firmware.

What follows is a comprehensive guide to assist you in creating and using regular expressions ('regex') on the Barracuda Web Filter.

Regular expressions can be daunting at first. A 'regex' can be described as "math for the written word." This can be as simple as 3+5 or as complicated as differential calculus. However, if all you need to catch are simple words or phrases, chances are your custom rules can be constructed in a WYSIWYG (What You See Is What You Get) fashion. If you want to block all mail that has "hot girls" in it, the regular expression hot girls will solve your problem.

A common application of regex technology is to enforce company policy (e.g. blocking foul language) or other fairly static requirements (e.g. whitelisting the subject of company newsletters). Using it to combat the latest malware trends is not a good idea unless you are going to maintain the list and remove the entries that are no longer needed.

Each regular expression is evaluated on a line-by-line basis. When regular expressions are saved, they are compared against an internal dictionary. Any collisions with the dictionary are presented as warnings, which are informational only. If a regular expression cannot be processed at all, you will get an error instead.

There are several things that may not be apparent at first glance when using regexes on the Barracuda Web Filter. These are important and should be taken into account when deciding how to implement your filters.

  1. The Barracuda Web Filter is case insensitive. It will force all of your regexes to lowercase before comparing them against incoming mail in a case insensitive fashion. This means that SPOTLIGHT and spotlight will match the same thing. This also means that \W doesn't do what you expect. It is converted into \w before being compared (and will match the opposite of what you intended, since the backslash forces a literal interpretation of the converted lowercase w). Caveat: this is currently not the case with extended ASCII characters, meaning that À will not be translated to à.
  2. The Barracuda Web Filter does partial word matching unless otherwise specified. This means that the regular expression cialis is not what you want (unless you want to block specialist and socialism as well). This can work for or against you, so keep it in mind. If you want to force matches against whole words, use word boundaries by using the special \b character like this: \bcialis\b. Word boundaries are covered in greater detail below.
  3. There is some text pre-processing going on. The filters you write are not compared against the raw URL (and this is a good thing). There is some parsing and conversion that is handled ahead of time. The most common occurrence of this is =20, which is converted to a space. This also shows up where matches are made against foreign language characters that don't fit in the standard ASCII character set. If you look at the raw source of one of these messages you won't see an ä (you see a =E4 instead). The Barracuda Web Filter understands these and does the conversion so that having ä as a filter will catch those messages. Note: the exact translation may depend on the character set defined.
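The first two points can be tried out offline before a rule is saved. The following is a minimal sketch in Python, not the appliance's actual engine: the helper name barracuda_like_search is hypothetical, and it only approximates the behaviour described above by lowercasing both the regex and the text and then searching anywhere in the text.

```python
import re

def barracuda_like_search(pattern: str, text: str) -> bool:
    """Approximation only: lowercase the regex and the text, then
    look for a match anywhere in the text (partial matching)."""
    return re.search(pattern.lower(), text.lower()) is not None

# Case insensitivity: SPOTLIGHT and spotlight behave the same.
print(barracuda_like_search("SPOTLIGHT", "In the spotlight"))

# Partial word matching: cialis also hits "specialist".
print(barracuda_like_search("cialis", "ask a specialist"))

# Word boundaries restrict the match to the whole word.
print(barracuda_like_search(r"\bcialis\b", "ask a specialist"))
print(barracuda_like_search(r"\bcialis\b", "buy cialis now"))

# \W is lowercased into \w, so it now matches word characters.
print(barracuda_like_search(r"\W", "abc"))
```

Note that lowercasing the whole regex in this sketch also turns other uppercase escapes (for example \D or \S) into their lowercase forms, which is consistent with the \W caveat above.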

There are three different pages in the Barracuda Web Filter's interface that allow you to enter and use regular expressions.

  1. On the Block/Accept->URL Patterns page. Use this page to create blacklists and whitelists for URLs that contain specific patterns or keywords. You can create URL pattern filters for either unauthenticated or authenticated users. For example:

    http(s|)://.*\.uk/

    http(s|)://.*\.ru/

    http(s|)://.*\.au/

  2. On the Block/Accept->Exceptions page. Use a URL Pattern Exception to override other Content Filter, Domain, Application, URL Pattern or MIME type policies configured elsewhere on the Barracuda Web Filter.
  3. On the Advanced->Proxy page. Enter regular expressions against which to match domains that you want to exempt from proxy authentication. This is necessary for applications and domains that do not support Kerberos or NTLM authentication.
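Before entering URL patterns on any of these pages, it can help to check them against a few sample URLs. The sketch below uses Python's re module as a stand-in for the appliance, and assumes the example patterns are meant to read http(s|)://.*\.uk/ and so on, with (s|) matching either "s" or nothing. The helper name first_match and the example.* URLs are made up for illustration.

```python
import re

# Example country-code patterns, assuming "(s|)" means "s or nothing".
PATTERNS = [
    r"http(s|)://.*\.uk/",
    r"http(s|)://.*\.ru/",
    r"http(s|)://.*\.au/",
]

def first_match(url: str):
    """Return the first pattern that matches the URL, or None."""
    for pattern in PATTERNS:
        if re.search(pattern, url.lower()):
            return pattern
    return None

print(first_match("https://www.example.co.uk/news"))   # .uk pattern
print(first_match("http://example.ru/"))               # .ru pattern
print(first_match("https://example.com/"))             # None

# Caveat: ".*" also spans the path, so this .com URL matches the .uk pattern.
print(first_match("https://example.com/files.uk/readme"))
```

The last case shows why a pattern built on .* can be broader than intended: nothing in it ties \.uk/ to the host name.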

There are three basic types of characters that are used when constructing a regex:

1. Normal characters: letters and numbers, the ascii character set. If it is not mentioned as a metacharacter or special character, it probably falls in here.

2. Metacharacters: characters that modify or expand the usage of other characters. Here are some of the available metacharacters:
^ - Start of line. If at the start of a [...] statement, it will match the opposite of what is contained in the [...] statement.
$ - End of line.
. - Any single character (excluding the newline character).
| - Logical OR operator.
{...} - Repetition statement. Can be a fixed number, like {5}, or a range, like {2,6}.
[...] - Explicit characters to match.
(...) - Logical grouping of contents.
* - Zero to twenty repetitions of preceding character.
+ - One to twenty repetitions of preceding character.
? - Zero or one repetitions of preceding character.
\ - Escape character, which can be used to match a literal instance of the above characters.
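The metacharacters above can be exercised with a small table of checks. This is an illustrative sketch using Python's re module, which shares this basic syntax; it does not reproduce appliance-specific behaviour such as the twenty-repetition limit, and the sample strings are invented for the demonstration.

```python
import re

# (regex, sample text, whether a match is expected)
CHECKS = [
    (r"^hi there", "hi there, friend", True),     # ^ anchors to line start
    (r"^hi there", "oh, hi there", False),
    (r"nice day$", "have a nice day", True),      # $ anchors to line end
    (r"part.time", "part-time work", True),       # . is exactly one character
    (r"part.time", "parttime work", False),
    (r"the (good|bad|ugly) dog", "the bad dogs", True),   # grouping and OR
    (r"the go{1,3}d cat", "the goood caterpillar", True), # 1 to 3 repetitions
    (r"the go{1,3}d cat", "the gooood cat", False),       # 4 is too many
    (r"[^0-9]", "12345", False),                  # ^ inside [...] negates
    (r"colou?r", "color", True),                  # ? makes the u optional
    (r"win \$5", "win $5 now", True),             # \ escapes the literal $
]

results = [(regex, text, re.search(regex, text) is not None)
           for regex, text, _ in CHECKS]

for regex, text, matched in results:
    print(f"{regex!r:28} {text!r:26} {matched}")
```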

3. Special characters: all of these are a backslash, followed by a letter. Here are some of the available special characters:
\b - Word boundary, between [a-z_0-9] and [^a-z_0-9]. These are found at the start and end of lines, when applicable.
\t - Tab.
\n - Newline.
\s - Any blank character, including space, tab, newline, carriage return, form feed, and vertical tab.
\d - Any digit, shorthand for [0-9].
\w - Any word character, shorthand for [a-z0-9_].
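A short sketch of the special characters follows, again using Python's re module as a stand-in. The re.ASCII flag is an assumption made here so that \w, \d and \b follow the [a-z0-9_] definitions given above rather than Python's Unicode defaults, and re.IGNORECASE approximates the filter's case insensitivity.

```python
import re

FLAGS = re.ASCII | re.IGNORECASE  # ASCII-only \w, \d, \b; ignore case

# \d: runs of digits
digits = re.findall(r"\d+", "win $654000 now", FLAGS)

# \w: runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", "free_cash 4 you!", FLAGS)

# \s: any blank character, including tab and newline
fields = re.split(r"\s+", "a\tb\nc  d", flags=FLAGS)

# \b: word boundary, so "essex" does not count as the word "sex"
inside_word = re.search(r"\bsexy?\b", "essex county", FLAGS)
whole_word = re.search(r"\bsexy?\b", "too sexy for", FLAGS)

print(digits)       # ['654000']
print(words)        # ['free_cash', '4', 'you']
print(fields)       # ['a', 'b', 'c', 'd']
print(inside_word)  # None
print(whole_word.group(0))  # sexy
```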

Additional notes:
Here are some simple examples that might help.

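sex will match any text that contains those three letters in a row, including longer words that contain them, because partial word matching is the default.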

^Hi there will match any line starting with:
hi there

nice day$ will match any line ending in:
nice day

part.time will match:
part times
And so on, but not:
parttime
There has to be a single character in between the words.

the (good|bad|ugly) dog will match:
the good dog
the bad dogs
the ugly doggy

the go{1,3}d cat
will match:
the god cat
the good cats
the goood caterpillar

free c[aàáâãäåÀÁÂÃÄÅ]sh will match:
free cash
free cAsh
free càsh
free cÃsh

And so on. Note that the Barracuda Web Filter doesn't know how to force lowercase the extended ascii character set - so you would need to specify both cases, like ä and Ä.
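The effect of ASCII-only lowercasing can be sketched as follows. The helpers ascii_lower and matches are hypothetical names, and the code is only a model of the behaviour described in the note above, not the filter's implementation.

```python
import re

def ascii_lower(s: str) -> str:
    """Lowercase A-Z only; leave extended characters such as À untouched."""
    return "".join(ch.lower() if ch.isascii() else ch for ch in s)

def matches(pattern: str, text: str) -> bool:
    # Model: both sides are lowercased, but only within plain ASCII.
    return re.search(ascii_lower(pattern), ascii_lower(text)) is not None

both_cases = "free c[aàáâãäåÀÁÂÃÄÅ]sh"
lower_only = "free c[aàáâãäå]sh"

print(matches(both_cases, "FREE CÃSH"))  # True: Ã is listed explicitly
print(matches(lower_only, "free cAsh"))  # True: plain A is lowercased
print(matches(lower_only, "free cÃsh"))  # False: Ã is never folded to ã
```

This is why the example regex lists both ä and Ä: under this model, an uppercase extended character in the text only matches if it appears in the character class itself.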

my [a-z]* feet will match:
my fast feet
my large feet
my supercalifragilistic feet

And so on. It will not match:
my supercalifragilisticexpialidocious feet
Because "supercalifragilisticexpialidocious" contains more than 20 characters.
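The twenty-repetition limit can be modelled by rewriting * and + as bounded repetitions before testing a regex elsewhere. The sketch below is one possible way to do that in Python; cap_repeats is a hypothetical helper, the limit of 20 is taken from the metacharacter list above, and how the appliance enforces the limit internally is not known. The parser is deliberately simple: it skips escaped characters and the contents of [...] but does not handle every corner of regex syntax.

```python
import re

MAX_REPEAT = 20  # limit stated for * and + in the metacharacter list

def cap_repeats(pattern: str) -> str:
    """Rewrite unescaped * and + (outside [...]) as {0,20} and {1,20}."""
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # keep escaped pairs such as \$ intact
            i += 2
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        if not in_class and ch == "*":
            out.append("{0,%d}" % MAX_REPEAT)
        elif not in_class and ch == "+":
            out.append("{1,%d}" % MAX_REPEAT)
        else:
            out.append(ch)
        i += 1
    return "".join(out)

capped = cap_repeats("my [a-z]* feet")
print(capped)  # my [a-z]{0,20} feet

# 20 letters between the words: still matches.
print(re.search(capped, "my supercalifragilistic feet") is not None)
# 34 letters between the words: exceeds the cap, no match.
print(re.search(capped, "my supercalifragilisticexpialidocious feet") is not None)

print(cap_repeats(r"win \$\d+ now"))  # win \$\d{1,20} now
```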

win \$\d+ now will match:
win $7 now
win $100 now
win $654000 now

\bsexy?\b will match:
sex
sexy
but only as whole words, because of the word boundaries.
