Tech Note: Understanding the rules.MailRules file
IS 7.1 marks the introduction of several new facilities to help the admin deal with the problem of unsolicited email (SPAM). These facilities and their use are outlined in the tech note Securing Internet Services 7.1. One of these facilities, the rules.MailRules file, allows the admin to use a scripting language to customize how IS processes incoming SMTP mail messages. This document is intended to help explain the rules.MailRules file to admins, many of whom do not use scripting languages every day.
SMTP email messages are transmitted in a format called RFC-822 (now actually superseded by RFC-2822) which specifies the layout of the message. An RFC-822 message typically contains a series of lines referred to as the header fields (or just headers for short), followed by a single empty line, followed by the message body. RFC-822 headers take the following form:
Subject: Gone fishin'
^      ^ ^-- The data for this header, in this case "Gone fishin'"
|      |-- The colon; there is always a colon following any header name
|-- An RFC-822 header name, in this case Subject
It is important to note that while RFC-822 specifies many headers (for example To, From, Subject, Date) it is also possible to extend the standard set of headers with experimental headers which must begin with an X-, for example:
X-OriginalArrivalTime: 11 Feb 2003 00:38:15.0656 (UTC) FILETIME=[DD6C4E80:01C2D165]
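For readers who think better in code, here is a minimal Python sketch of the header layout described above. It is only an illustration, not how IS parses mail: the function names split_header and is_experimental are made up for this note, and the sketch ignores details such as headers folded across several lines.

```python
def split_header(line: str) -> tuple[str, str]:
    """Split one header line into (name, data) at the first colon."""
    name, colon, data = line.partition(":")
    if not colon:
        raise ValueError(f"not a header line (no colon): {line!r}")
    return name.strip(), data.strip()


def is_experimental(name: str) -> bool:
    """Experimental (extension) header names begin with X-."""
    return name.upper().startswith("X-")


if __name__ == "__main__":
    print(split_header("Subject: Gone fishin'"))
    print(is_experimental("X-OriginalArrivalTime"))
```

Note that only the first colon separates the name from the data, so header data that itself contains colons (such as a time of day) stays intact.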
If you are interested in the complete list of legal RFC-822 headers, you can always read the specification (available in many places, such as http://www.faqs.org/rfcs/rfc822.html). For those who are more interested in the practical applications of these headers, it is possible to learn a fair bit just by using the Client's "View->Show Internet Headers" option on some Internet mail you've received.
So, what does this all have to do with the rules.MailRules file? Simply stated, the rules.MailRules file contains a series of lines, each of which names an RFC-822 header to examine, a test on the content of that header, and an action to perform if the test succeeds. Before we get any further into what you can do with the rules.MailRules file, here's a list of things it cannot do:
- It cannot script IS's handling of the SMTP conversation (RFC-821/2821) other than to NDN a message.
- It cannot decode MIME (RFC-2045+) or script handling of MIME parts (rules.AttachmentBlock can block MIME attachments).
- It cannot examine the body of an RFC-822 message.
So the script contained in the rules.MailRules file can examine RFC-822 headers and react to them. Let's have a look at what a rule looks like and then delve into writing rules. Every line in a rules.MailRules file is one of 3 types:
1) a blank line - blank lines have no meaning and are just used to make the file more readable
2) a comment line - any line starting with the "#" character; these lines have no script meaning and are again used for readability
3) a rules line - the "meat" of the file, these lines define the SMTP rules for the system
Each rules line takes the same basic form, a header to perform this test on, a test to perform, and an action to take:
X-Mailer: "Millennium Mailer" SET $spamlevel += 75
^       ^ ^                   ^-- The action to take, in this case add 75 to the spamlevel variable
|       | |-- The test, in this case: Does the X-Mailer header contain the data "Millennium Mailer"?
|       |-- There is always a colon separating the header from the test
|-- The header on which to execute this rule
Given that each rules line breaks down into these same parts, let's look at each part to see what sorts of things it can contain.
The Header part
The main purpose of the header part is to say which RFC-822 header this rule should be run on. As such, it can contain any RFC-822 header or experimental header (X- header) that you might find in incoming SMTP mail. It would typically contain things like To, From, Subject, Date, Return-Path, etc. but there are a couple of special characters that can appear in this field:
'^' - This means "apply this rule before any headers are processed", which is useful for initializing script variables.
'*' - This means "apply this rule on every RFC-822 header", useful for scanning all headers for a particular value.
'' - (empty) This means "apply this rule after the last header has been processed", useful for reacting to data tallied by other rules.
The header part cannot contain variables, conditionals, or function calls. It can only contain an RFC-822 header name or one of the 3 special characters listed above.
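To make the line types and the header part concrete, here is a small Python sketch. It is an illustration under assumptions, not the IS parser: the function names are invented for this note, and it assumes that leading whitespace before a "#" is ignored, which this note does not state.

```python
def classify_line(line: str) -> str:
    """Return 'blank', 'comment', or 'rule' for one line of a rules file."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if stripped.startswith("#"):  # assumption: leading whitespace is ignored
        return "comment"
    return "rule"


def split_rule(line: str) -> tuple[str, str]:
    """Split a rules line into (header part, test and action) at the first colon."""
    header, colon, rest = line.partition(":")
    if not colon:
        raise ValueError(f"rules line has no colon: {line!r}")
    return header.strip(), rest.strip()


def describe_header_part(header: str) -> str:
    """Say when a rule with this header part is applied."""
    if header == "^":
        return "before any headers are processed"
    if header == "*":
        return "on every RFC-822 header"
    if header == "":
        return "after the last header has been processed"
    return f"on each {header} header"


if __name__ == "__main__":
    header, rest = split_rule('X-Mailer: "Millennium Mailer" SET $spamlevel += 75')
    print(header, "->", describe_header_part(header))
```

The empty header part falls out naturally here: a line that begins with the colon yields an empty string before it.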
The Colon part
Always stick a colon after the header part. In other words (with apologies to MP): "There shall be one colon separating the header part from the test part. No more. No less. One shalt be the number thou shalt count, and the number of the counting shall be one. Four shalt thou not count, nor either count thou three, nor two. Five is right out. Once the number one, being the first number, be reached, then, proceed to the test part."
The Test part
The condition tests in the rules scripting language fall into three basic categories:
1) A simple expression. The simple expression is a string in quotes which means "Does this string (not case sensitive) occur in the data of the RFC-822 header?". The string can contain some simple wildcard characters: '?' meaning "match any single character" and '*' meaning "match any group of characters". The quoted string can have a NOT in front of it meaning reverse the sense of the test. For example:
If the following RFC-822 header was processed by the following rules, what would happen?
Date: Tue, 11 Feb 2003 16:27:41 -0500
Date: "Feb 2003" SPAM # Condition is true, message would be marked as SPAM
Date: "*viagra*" SPAM # Condition is false (no viagra here), message would not be marked as SPAM
Date: "Tue, 11 Feb 2003 16:27:41 -0500" SPAM # Condition is true, message would be marked as SPAM
Date: NOT "200?" SPAM # Condition is false (reversed by NOT), message would not be marked as SPAM
Date: "*Feb*" SPAM # Condition is true, message would be marked as SPAM
Date: "July 2003" SPAM # Condition is false (no July here), message would not be marked as SPAM
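If you'd like to experiment with simple expressions outside of IS, the following Python sketch mimics the behavior described above. It is an approximation, not the IS implementation: the function name simple_match is made up, and the sketch assumes the pattern may match anywhere in the header data ("occurs in"), with '?' and '*' as the only special characters.

```python
import re


def simple_match(pattern: str, data: str, negate: bool = False) -> bool:
    """Case-insensitive 'does this pattern occur in the data' test.

    '?' matches any single character and '*' matches any group of
    characters; negate=True plays the role of a leading NOT.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    found = re.search("".join(parts), data, re.IGNORECASE | re.DOTALL) is not None
    return found != negate


if __name__ == "__main__":
    date = "Tue, 11 Feb 2003 16:27:41 -0500"
    print(simple_match("Feb 2003", date))
    print(simple_match("200?", date, negate=True))
```

Running the six Date rules above through this sketch gives the same true/false results listed in their comments.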
2) A regular expression (regexp). A regexp is a well known (in UNIX-land) way of describing string pattern matching. You can find out more gory details by typing "man grep" at a UNIX command prompt or by hitting http://netbsd.gw.com/cgi-bin/man-cgi?egrep+1+NetBSD-current on the web. For our purposes, I'll say that regexp allows a finer level of control over matching than simple expressions (above) and has the additional capability of allowing the data matched to be stored away in a variable for later use. Here are some example regexp rules with explanations:
# store away X-Mailer string for later reference
X-Mailer: regexp:"\\(.+\\)" SET $xmailerstring = "\\1"
# ()'s are needed to set up a backreference, \ is needed to escape the (, and the other \ is needed to escape the first \
# .+ means match any one or more characters; since this is in ()'s, we can use \1 in the action area (an extra \ is needed there to escape the \)
# check Received: headers for SPAMmers
Received: regexp:"\\([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+\\)" SET $IP = "\\1"
# we want a backreference, so \\()'s are needed, the pattern is any cluster of digits followed by . (\\. in the code) followed by digits, ., etc
# we then store this IP address pulled from the header in $IP
Received: IF (@isspamip($IP)) NDN 550 "Sorry, your message has triggered a SPAM block, please contact the postmaster"
# check $IP against IP blocking list
# check if first part of domain name is all numeric
From: regexp:".*@[0-9]+\\..*" SPAM
# match any characters up until the @, then if all the characters up until the first . after that are digits, this is SPAM
# check if subject is all capitals (with numbers and some punctuation)
Subject: regexp:"[A-Z\\?\\!\\.\\,0-9]+" SPAM
# if the subject is entirely made up of A-Z (uppercase only), ?, !, ., 0-9, and , then this is SPAM
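The doubled backslashes can make these rules hard to read, so it may help to try the same patterns in a language you can run at a prompt. The Python sketch below translates two of the rules above. Keep in mind this is a translation under assumptions: Python's re syntax uses plain ( ) for groups where the rules file uses \\( \\), the sample header values are invented, and the function names are made up for this note.

```python
import re

# Same idea as the Received rule: capture the first dotted group of four numbers.
IP_PATTERN = re.compile(r"([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)")

# Same idea as the From rule: only digits between the @ and the first dot.
NUMERIC_DOMAIN = re.compile(r".*@[0-9]+\..*")


def first_ip(received_data: str):
    """Return the first IP-looking string in a Received header, or None."""
    match = IP_PATTERN.search(received_data)
    return match.group(1) if match else None  # group(1) plays the role of \1


def has_numeric_domain(from_data: str) -> bool:
    """True if the first part of the domain name is all numeric."""
    return NUMERIC_DOMAIN.search(from_data) is not None


if __name__ == "__main__":
    # Hypothetical header data, for illustration only.
    print(first_ip("from mail.example.com ([192.0.2.10]) by is.com; Tue, 11 Feb 2003 16:27:41 -0500"))
    print(has_numeric_domain("Bob <bob@12345.example.com>"))
```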
3) A conditional expression. All of this string pattern matching stuff is great, but there are times in the rules scripting language when it is useful to test the state of a variable or invoke a built in function. This is what conditional expressions are used for. Conditional expressions always take the form "IF (expression)" where the expression is either a variable test or a call to a built in function. Here are some examples of conditional expressions taken from the shipping rules.MailRules file along with explanations:
# if IP is in trusted IP list, skip rules processing
^: IF (@istrustedip($senderip)) DONE
# before any headers are processed (the ^), if the sender's ip (built in variable) is in the trusted list, take the action DONE (skip further processing)
# check for from names in SPAM address list
From: if (@isspamaddress($from)) SET $spamlevel += 101 AND $spamtests += "FROM_IN_SPAM_FILTERS;"
# when processing the From header, if the from address is in the SPAM address list, set SPAM level to extreme and add the reason to the spamtests variable
# set up a crossposting variable for later tests, this shows that the condition can be (1) meaning always
: IF (1) SET $xpost = $#BCC + $#To + $#Cc
# after all headers have been processed (the "" before the :), always (IF (1)) set $xpost to the sum of the built in #To, #Cc, and #Bcc variables
# NDN extreme level SPAM if admin has chosen this option
: IF ($spamlevel > $HighSpamMax && $XtremeCausesNDN == 1) NDN 550 "Sorry, your message has triggered a SPAM block, please contact the postmaster"
# after all headers have been processed (the "" before the :), if the SPAM level is above the high threshold and admin has set XtremeCausesNDN
# then NDN the message with an SMTP 550 error
The Action part
The action part is what gives the rules system its teeth. Once an RFC-822 header has matched a rule and the test part is met, it is the action which executes and alters IS's processing of this header in some way. There are 8 actions the rules system can take, which can be used to alter the message that arrives, reject the message altogether, or change the behavior of the rules system. Here are the 8 actions broken down by type, with a short description of what each one does:
1) Rules that alter the delivered message
SPAM - marks the message with priority JUNK and sets the bit indicating it was a machine generated message
DISCARDHEADER - discards the current header, it will not appear in the Client's "View->Show Internet Headers" option
INJECT - injects an RFC-822 header into the message as delivered in FirstClass, it will appear in the Client's "View->Show Internet Headers" option
REPLACE - replaces an RFC-822 header in the message as delivered in FirstClass, it will appear in the Client's "View->Show Internet Headers" option
2) Rules that reject the message
DISCARDMESSAGE - NDN (with 552 Delivery Failed.) the message without delivering it to the recipients
NDN - generates an SMTP error which causes the sending server to NDN the message, this action optionally takes a parameter of the form "550 Bad thing happened." which must contain both a numeric code and associated text
3) Rules that affect rules processing
SET - used to set variables for processing by later rules
DONE - stop any further processing of rules for this message
The examples in the previous sections show the usage of several of these actions. We'll include a few more examples here:
# set the icon of a message in the low SPAM range
: IF ($LowSpamMin <= $spamlevel && $spamlevel <= $LowSpamMax) INJECT "X-FC-Icon-ID: 23050"
# after all headers have been processed, if the $spamlevel variable is in the LOW range, then add an X-FC-Icon-ID header (which sets the icon in FC)
# NOTE: there are many X-FC headers which affect message attributes such as formID, IconID, and forms data.
# These can be discovered by seeing what IS renders when a message is sent out through SMTP
# Admin settable variables are defined here
^: IF (1) SET $CrosspostLimit=15 AND $CrosspostIncr=5 AND $XpostSpamLevel=5 AND $XpostSpamIncrVal=5 AND $XtremeCausesNDN=0
# before any headers are processed (the ^), set a variety of variables that affect later processing
# in many cases the admin can simply change these values to make the default rules.MailRules file behave differently
# if the admin treats the subject block list as naughty words, this rule would reject inappropriate content without delivering it
Subject: IF (@inblocklist($subject)) DISCARDMESSAGE
# when the Subject header arrives, check if any of the text is in the subject block list, and NDN the message if it is
An important part in understanding the rules.MailRules file is understanding the SMTP conversation that delivers a message and at what point in that conversation various rules related events occur. What follows is a timeline of significant events in the transmission of a simple RFC-822 message into IS via SMTP:
Internet Services                             Data flow direction         Other SMTP server
listening on SMTP port
                                              <-------------------------  connects to IS machine
accepts connection (now knows IP addr)        ------------------------->
sends "220 is.com FirstClass v7.1..."         ------------------------->
                                              <-------------------------  sends SMTP command "HELO I am server.com"
sends "250 HELO" (knows server name)          ------------------------->  knows it has reached the correct server
                                              <-------------------------  issues SMTP command "MAIL FROM: <user@server.com>"
if sender OK, issues "250 OK"                 ------------------------->  moves on to recipients
if sender bad, issues "501 bad sender"        ------------------------->  retries or gives up
                                              <-------------------------  issues SMTP command "RCPT TO: <user@is.com>"
if recipient OK, issues "250 OK"              ------------------------->  moves on to transferring RFC-822 message
if recipient bad, issues "501 bad recipient"  ------------------------->  retries another recipient or gives up
                                              <-------------------------  issues SMTP command "DATA"
at this point, ^ rules are run
sends "354 Send message"                      ------------------------->  starts sending RFC-822 message headers
                                              <-------------------------  sends "To: user@is.com"
at this point, To and * rules are run
                                              <-------------------------  sends "From: user@server.com"
at this point, From and * rules are run
                                              <-------------------------  sends "Subject: Hello world"
at this point, Subject and * rules are run
                                              <-------------------------  sends "<CRLF>" (a blank line signifying end of headers)
at this point, "" rules are run
                                                                          starts sending body
                                              <-------------------------  sends "Hi User"
                                              <-------------------------  sends "How are you?"
                                              <-------------------------  sends "Love, User."
                                              <-------------------------  sends "<CRLF>.<CRLF>" (a blank line '.' then another blank line)
sends "250 Message Accepted"                  ------------------------->  hangs up connection and we're done
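The order in which rules fire in the timeline above can be summarized in a few lines of Python. This is only a sketch of the ordering: the function name rule_phases is invented for this note, the message is assumed to arrive as a list of lines with the line endings already removed, and the sketch assumes the "" rules still run if the message has no body.

```python
def rule_phases(message_lines):
    """Yield, in order, the header parts whose rules apply as a message arrives.

    Each item is a tuple of header parts: ("^",) before any headers,
    (name, "*") for each header line, and ("",) once the headers end.
    """
    yield ("^",)
    for line in message_lines:
        if line == "":  # the blank line that ends the headers
            break
        name = line.partition(":")[0].strip()
        yield (name, "*")
    yield ("",)


if __name__ == "__main__":
    message = [
        "To: user@is.com",
        "From: user@server.com",
        "Subject: Hello world",
        "",
        "Hi User",
    ]
    for phase in rule_phases(message):
        print(phase)
```

Nothing after the blank line is ever looked at, which matches the earlier point that rules cannot examine the message body.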
Now that we know when things happen during the SMTP conversation, it might be helpful to run a sample message through a sample rules file and see what happens. Let's say we have the following rules.MailRules file:
# If the message is from a trusted IP, we're done
^: IF (@istrustedip($senderip)) DONE
# Admin settable variables are defined here
^: IF (1) SET $SpamMax=50
# check for SPAMmers in Received headers
Received: regexp:"\\([0-9][0-9]*\\.[0-9][0-9]*\\.[0-9][0-9]*\\.[0-9][0-9]*\\)" SET $IP = "\\1"
Received: IF (@isspamip($IP)) NDN
#check subject
Subject: IF (@inblocklist($subject)) SET $spamlevel += 50
Subject: " " SET $spamlevel += 25
Subject: IF (@allcaps($subject)) SET $spamlevel += 25
# an errors-to makes something less likely to be SPAM
Errors-To: "*@*" SET $spamlevel -= 20 AND $spamtests += "-ERRORS_TO;"
# If any header says Viagra, this is junk
*: "Viagra" SET $spamlevel += 25
# rules to deal with SPAM level, processed at the end of the headers
: IF ($spamlevel >= $SpamMax) NDN 550 "Sorry, your message has triggered a SPAM block, please contact the postmaster"
Now suppose the following message is fed in. Here is what will happen:
Other mail server connects to IS
SMTP pleasantries are exchanged up to DATA command
^ rules are run, so SpamMax is now 50, and if this other mail server is at a trusted IP address all rules processing stops
To: user@is.com arrives on SMTP channel, * rule runs, "user@is.com" is compared to "Viagra", nothing happens
From: user@is.com arrives on SMTP channel, * rule runs, "user@is.com" is compared to "Viagra", nothing happens
Subject: HI THERE!! arrives on SMTP channel
rule "Subject: IF (@inblocklist($subject)) SET $spamlevel += 50" runs, nothing happens
rule "Subject: " " SET $spamlevel += 25" runs, $spamlevel is now 25
rule "Subject: IF (@allcaps($subject)) SET $spamlevel += 25" runs, $spamlevel is now 50
* rule runs, "HI THERE!!" is compared to "Viagra", nothing happens
"<CRLF>" blank line arrives triggering "" rules
rule ": IF ($spamlevel >= $SpamMax) NDN 550 "Sorry, your message has triggered a SPAM block, please contact the postmaster"" runs
since $spamlevel is 50, message is NDNed
Other mail server sees "550 Sorry, your..." and gives up.
Another message comes in and process starts over with a fresh set of variables...
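The arithmetic in the walkthrough above can be checked with a short Python sketch. It is a simplified stand-in for the sample rules, not the rules engine itself: it leaves out the trusted IP and Received rules, takes the block list as a plain list of words, and the all-capitals test is a guess at what @allcaps does (every letter in the subject is uppercase). The function name score_sample_message is made up for this note.

```python
def score_sample_message(headers, block_list=(), spam_max=50):
    """Return (spamlevel, ndn) for a list of (header name, data) pairs."""
    spamlevel = 0
    for name, data in headers:
        if name.lower() == "subject":
            # Subject: IF (@inblocklist($subject)) SET $spamlevel += 50
            if any(word.lower() in data.lower() for word in block_list):
                spamlevel += 50
            # Subject: " " SET $spamlevel += 25
            if " " in data:
                spamlevel += 25
            # Subject: IF (@allcaps($subject)) SET $spamlevel += 25  (assumed meaning)
            letters = [c for c in data if c.isalpha()]
            if letters and all(c.isupper() for c in letters):
                spamlevel += 25
        # Errors-To: "*@*" SET $spamlevel -= 20
        if name.lower() == "errors-to" and "@" in data:
            spamlevel -= 20
        # *: "Viagra" SET $spamlevel += 25
        if "viagra" in data.lower():
            spamlevel += 25
    # : IF ($spamlevel >= $SpamMax) NDN 550 ...
    return spamlevel, spamlevel >= spam_max


if __name__ == "__main__":
    sample = [("To", "user@is.com"), ("From", "user@is.com"), ("Subject", "HI THERE!!")]
    print(score_sample_message(sample))
```

With the sample message the score reaches 50, which meets $SpamMax, so the message is NDNed. Adding an Errors-To header with an address in it would pull the score back under the limit.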
Understanding variables and functions in rules
Since the rules.MailRules file that ships with IS uses variables to implement its features, it is important to understand what they are and how they are used. Variables are placeholders which can store some data for the duration of the processing of a single message. They can be tested using conditional expressions, and assigned using the "SET" action. There are two kinds of variables in the rules system:
built-in - these are defined by the rules system and contain data about the message being processed, some can be SET to alter the message's attributes
user-defined - these can be defined by any rule writer by putting $<sometext> into your rule
One important thing to remember about either type of variable is that it must be set to some value before it is used. For example, if you write a rule that says:
:IF ($myvar > 50) NDN 550 "Bad doggie"
...without having done a rule that says:
^: IF (1) SET $myvar = 51
...then your rule will never execute, since $myvar has no value.
The same goes for built-in variables. For example, using $#To before the To: header has been received will return zero.
Functions give the rule author access to built in functionality of IS, such as testing to see if an address is in the filters list. They can be used in conditional expressions and they return TRUE or FALSE, allowing the rule author to perform an action based on the result of the function. Rule authors cannot define their own functions. What follows is a list of the built-in variables and functions, along with a brief description of each:
@inblocklist(<string> or <variable>[, <case>]) // <case> is "yes", "no", "true", "false"
// default is "yes"
@seenheader(<string> or <variable>)
@istrustedip(<string> or <variable>)
@istrustedaddress(<string> or <variable>)
@isspamip(<string> or <variable>)
@isspamaddress(<string> or <variable>)
@islocaladdress(<string> or <variable>)
// the following behave as in IS script statements
@split(...)
@substr(...)
@length(...)
@indexof(...)
@upper(...)
@lower(...)
@rand()
// built in variables
// Name             Values                              Readonly/Readwrite
$MachineGenerated   "1", "0"                            Readwrite
$Priority           "Normal", "Urgent", "Bulk", "Junk"  Readwrite
$IsNewsArticle      "1", "0"                            Readonly
$IsSpammer          "1", "0"                            Readwrite
$MessageID          <contents of Message-ID header>     Readonly
$Subject            <contents of subject header>        Readwrite
$From               <contents of From: header>          Readonly
$Sender             <contents of MAIL FROM:>            Readonly
$#To                <number of To: recipients>          Readonly
$#Cc                <number of Cc: recipients>          Readonly
$#BCC               <number of BCC recipients>          Readonly
$HaveReplyTo        "1", "0"                            Readonly
$HaveResentReplyTo  "1", "0"                            Readonly
$SenderIP           <IP address of sending SMTP host>   Readonly
$MyIP               <IP address of this host>           Readonly
$Authenticated      "1", "0"                            Readonly
$AuthCanRelay       "1", "0"                            Readonly
What does the default rules.MailRules file do?
Many admins may opt to run the default rules.MailRules file or to just tweak the rules to suit their needs. For these people it is very important to understand the default rules, what each one does, and where they can tweak things easily. In order to help accomplish this, we're including the entire text of the default rules.MailRules file in blue below. Additional comments and explanations will be added in red:
The first 2 rules are applied before any RFC-822 headers are received. These rules are designed to allow trusted sites and IP addresses to bypass rules processing. If you don't want this behavior, simply comment out these lines by adding a '#' in front of each.
# If the message is from a trusted address or site, we're done <==== Comment line
^: IF (@istrustedaddress($sender)) DONE <==== Before headers are processed, if $Sender is a trusted address, skip further rules processing
$Sender is a predefined variable containing the contents of SMTP "Mail From" command
^: IF (@istrustedip($senderip)) DONE <==== Before headers are processed, if $SenderIP is a trusted IP address, skip further rules processing
$SenderIP is a predefined variable containing the IP address of the sending SMTP server
The next 2 lines actually contain a commented out (inactive) rule. This rule, if commented back in, would run when the RFC-822 "From" header appears and would test to see if the address it contained was a trusted address. This rule is commented out since it is not uncommon for SPAMmers to "fake" that their message is from a local or trusted sender by putting a local address in this header. An admin could reactivate this rule by removing the '#' if they felt that this test was worthwhile for their site.
# The From address is more easily spoofed, so less trustworthy <==== Comment line
#From: if (@istrustedaddress($from)) DONE <==== Commented out rule, this is inactive unless the '#' is removed
The 7 lines that follow set up variables that the admin can use to control the operation of some of the rules. The idea is that the admin can tweak various thresholds and limits, thereby changing when a rule kicks in and how it affects the SPAM score.
# Admin settable variables are defined here <==== Comment line
^: IF (1) SET $CrosspostLimit=15 AND $CrosspostIncr=5 <==== These two variables affect the cross post rules. The idea is that the more recipients a message has,
the more likely it is to be SPAM. The CrosspostLimit says that any message with over 15 total
recipients is likely to be SPAM. The CrosspostIncr says that for every 5 recipients thereafter
the message becomes even more suspicious. An admin can increase these values to decrease the
chances of crossposting causing a message to be marked as SPAM. IMPORTANT: if you use crossposting
as an indicator of SPAM, you will need to add any mailing lists you want to receive to your trusted
addresses list, since they often contain large numbers of recipients.
^: IF (1) SET $XpostSpamLevel=5 AND $XpostSpamIncrVal=5 <==== These two variables also affect the cross post rules, but these affect the SPAM score that the rules
generate. XpostSpamLevel is the amount that will be added to the SPAM score as soon as the initial
cross post limit is exceeded. XpostSpamIncrVal is the amount that will be added to the SPAM score
for each group of $CrosspostIncr additional recipients. So, using the values from these variables
and the line above, a 12 recipient message would get no SPAM score, a 16 recipient message would
score 5, a 22 recipient message would score 10, and a 100 recipient message would score 90.
^: IF (1) SET $XtremeCausesNDN=0 <==== When the SPAM score of a message exceeds HighSpamMax (defined below) then the SPAM warning
level of the message is said to be Extreme. This variable allows the admin to define the handling of
Extreme level messages. As set by default (0) this variable allows the rules system to deliver
Extreme score SPAM to users with indicators similar to High score SPAM messages. If this variable
is set to 1, Extreme score SPAM will be NDNed by IS and will not be delivered to the user.
# Changing these variables requires consideration of the various values <==== Comment line
# of the individual spam tests <==== Comment line
^: IF (1) SET $LowSpamMin=10 AND $LowSpamMax=25 <==== These variables define the range of SPAM scores that will be treated as a Low level SPAM warning.
The default settings say that SPAM scores below 10 will be treated as no SPAM score at all, while
scores in the 10-25 range will be treated as Low, getting the Low warning icon, an X-SPAM-Warning
header, and an X-SPAM-Level header.
^: IF (1) SET $MedSpamMax=50 AND $HighSpamMax=100 <==== These variables define the range of SPAM scores that will be treated as a Medium and High level SPAM
warning. The default settings say that SPAM scores between 25-50 will be treated as Medium level
SPAM, while scores in the 50-100 range will be treated as High. Each of these ranges marks the
message as JUNK, marks it as machine generated, gives it the appropriate warning icon, an
appropriate X-SPAM-Warning header, and an X-SPAM-Level header.
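To see how the admin settable variables above interact, here is a Python sketch of the cross post scoring and the warning levels. It is a reading of the descriptions above rather than the actual rules: the cross post formula is chosen because it reproduces the four example scores given (0, 5, 10 and 90), and the handling of scores that sit exactly on a boundary (25, 50, 100) is an assumption. The function names are made up for this note.

```python
def crosspost_score(recipients, limit=15, incr=5, level=5, incr_val=5):
    """SPAM score contributed by the total number of recipients.

    limit, incr, level and incr_val stand in for $CrosspostLimit,
    $CrosspostIncr, $XpostSpamLevel and $XpostSpamIncrVal.
    """
    if recipients <= limit:
        return 0
    extra_groups = (recipients - limit) // incr
    return level + extra_groups * incr_val


def warning_level(score, low_min=10, low_max=25, med_max=50, high_max=100):
    """Map a SPAM score to a warning level name (boundary handling assumed)."""
    if score < low_min:
        return "None"
    if score <= low_max:
        return "Low"
    if score <= med_max:
        return "Medium"
    if score <= high_max:
        return "High"
    return "Extreme"


if __name__ == "__main__":
    for recipients in (12, 16, 22, 100):
        score = crosspost_score(recipients)
        print(recipients, score, warning_level(score))
```

Raising limit or incr in this sketch lowers the scores, which is the effect the text above describes for an admin who increases $CrosspostLimit or $CrosspostIncr.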
The following 2 rules combine to scan the Received header for an IP address and then check it against the SPAM IP list. The theory is that if a known SPAMmer is in the list of hosts that routed this message, then it is likely SPAM.
# <==== Comment line
# rules to set SPAM level <==== Comment line
# <==== Comment line
# check received headers for SPAMmers <==== Comment line
Received: regexp:"\\([0-9][0-9]*\\.[0-9][0-9]*\\.[0-9][0-9]*\\.[0-9][0-9]*\\)" SET $IP = "\\1" <==== Use a regexp to scan the Received header for an IP
address and store it in $IP. The '\\' construct is used
throughout to escape the '\' character. The '\(' gets the
round brackets escaped properly, since round brackets