Time format strings in IS templates
Here's a full list of the % specifiers in IS time format strings:
a - abbrev weekday name
A - full weekday name
b - abbrev month name
B - full month name
c - fixed format date and time (Mon,22 Feb,99 3:15 PM)
#c - alternate fixed format date and time (Monday,22 February,1999 12:43:54 AM)
d - day of month as decimal number - 0 padded
#d - alternate, no 0 padding
D - calendar date link format YYYYMMDD
e - day of month as decimal number
H - hour as decimal number (0 padded) - 24 hour clock
#H - alternate, no 0 padding
i - hour as decimal number - 12 hour clock
I - hour as decimal number - 12 hour clock - 0 padded
#I - alternate, no 0 padding
j - day of year as decimal number (0 padded) - 001-366
#j - alternate, no 0 padding
m - month as decimal number (0 padded)
#m - alternate, no 0 padding
M - minute as decimal number (0 padded)
#M - alternate, no 0 padding
N - date as number of seconds
p - AM/PM for 12 hour clock
P - AM/PM for 12 hour clock, numeric form
S - second as decimal number (0 padded)
#S - alternate, no 0 padding
W - week number of year as decimal number (0 padded) - Monday based
#W - alternate, no 0 padding
U - week number of year as decimal number - Sunday based 0 padded
#U - alternate, no 0 padding
w - weekday as decimal number
x - fixed format date (22 Feb,99)
#x - alternate fixed format date (22 February,1999)
X - fixed format time (6:23:33)
y - year without century as a decimal number, 0 padded
#y - alternate, no 0 padding
Y - year with century as a decimal number, 0 padded
Z - Timezone as numeric
z - Timezone as name
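
To make the difference between the 0 padded forms and the "#" alternates concrete, here is a minimal Python sketch that expands a handful of the numeric specifiers listed above. It is not IS code: the function name expand_is_time_format is hypothetical, the spelling of the alternate form as %#d in a template is an assumption, and only a small subset of the list is covered.

```python
import re
from datetime import datetime

# Value getter and zero-padded width for each specifier this sketch covers.
_FIELDS = {
    "d": (lambda t: t.day, 2),
    "H": (lambda t: t.hour, 2),
    "I": (lambda t: (t.hour % 12) or 12, 2),
    "j": (lambda t: t.timetuple().tm_yday, 3),
    "m": (lambda t: t.month, 2),
    "M": (lambda t: t.minute, 2),
    "S": (lambda t: t.second, 2),
    "y": (lambda t: t.year % 100, 2),
}

def expand_is_time_format(fmt, when):
    """Expand %X (0 padded) and %#X (no padding) for the subset in _FIELDS."""
    def repl(match):
        alternate, code = match.group(1), match.group(2)
        if code not in _FIELDS:
            return match.group(0)  # leave unsupported specifiers untouched
        getter, width = _FIELDS[code]
        value = str(getter(when))
        return value if alternate else value.zfill(width)
    return re.sub(r"%(#?)([A-Za-z])", repl, fmt)

stamp = datetime(1999, 2, 5, 3, 7, 9)
padded = expand_is_time_format("%d/%m/%y %H:%M:%S", stamp)
unpadded = expand_is_time_format("%#d/%#m/%#y %#H:%#M:%#S", stamp)
print(padded)
print(unpadded)
```

Specifiers outside the table are passed through unchanged, so the sketch can be tried on a full template without mangling the parts it does not understand.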
HTML mail handling in FC7 Internet Services
One of the features of the FC7 IS release is improved handling of Internet email containing HTML body content. To explain what that means in practice, I'll start things off with a bit of a primer about how Internet mail is handled in FirstClass.
Overview
Internet mail arrives at IS over the SMTP protocol. The format it is in is generically referred to as RFC-822. A further advancement which is now pretty universal is MIME (Multipurpose Internet Mail Extensions). So, MIME messages come in over SMTP and are converted by IS into the FirstClass message format. The RFC header information (which includes things like To:, Subject:, Date:, etc.) is converted into the corresponding FirstClass fields and the body is punched in. We also store the full set of RFC headers in a special field in the message, which is currently used to implement the client's "Show Internet headers" feature.
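
As a rough illustration of the header conversion described above, the following Python sketch parses an RFC-822 message with the standard email package, maps a few well-known headers to named fields, and keeps the complete header set alongside them. The function name convert, the field names in the returned dictionary, and the sample message are all made up for the example; this is not how IS itself stores messages.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch\n"
    "Date: Mon, 22 Feb 1999 15:15:00 -0500\n"
    "X-Mailer: ExampleMailer 1.0\n"
    "\n"
    "See you at noon.\n"
)

def convert(raw):
    """Map common RFC-822 headers to fields and keep all headers verbatim."""
    msg = message_from_string(raw)
    return {
        "to": msg["To"],
        "subject": msg["Subject"],
        "date": parsedate_to_datetime(msg["Date"]),
        "body": msg.get_payload(),
        # Every header is preserved, including ones with no mapped field.
        "raw_headers": [f"{name}: {value}" for name, value in msg.items()],
    }

record = convert(RAW)
print(record["subject"], record["date"].isoformat())
print(record["raw_headers"])
```

The X-Mailer header has no mapped field in this sketch, but it survives in raw_headers, which is the same idea as keeping the full RFC headers in a special field.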
When IS is called on to "render" a message to a protocol that expects RFC-822 format (such as POP3 and SMTP), it opens the FirstClass message and converts the data it finds into RFC format. This rendering is an accurate portrayal of the FirstClass message, but if the message originally came in through the Internet, then some data is lost (like special RFC headers that have no FirstClass equivalent). A future enhancement in this area will be for IS to reintroduce the RFC headers it has stored aside and "blend" them into the message it is rendering.
Traditionally with IS, many of the accuracy issues have revolved around this RFC header translation, but in recent years, with the increasing popularity of HTML messages, the body has become more of an issue. Pre-FC7 versions of IS had the ability to take any text message body and inject it into the message body of the FirstClass message being created. If IS discovered HTML content, it would attach it to the message, where it could be double-clicked and viewed in a web browser. FC7 introduced the ability for IS to convert incoming HTML message bodies into FirstClass styled message bodies.
Outbound
IS's handling of outbound mail is not significantly altered from previous versions. As the snap below shows, there has always been an option in IS to select the format of the message body, with HTML being one of the choices. This has traditionally been (and to some degree remains) frowned on by Internet watchdog types, who rightly point out that such messages cannot be read properly at all destinations, but HTML mail is certainly so prevalent now that some admins will turn this on. IS does not generate multipart/alternative MIME (where there are alternate bodies, one in text, one in HTML) when this option is turned on, so we are considered more "rude" than mailers which do. We will be addressing this in a future release.
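
For readers who have not seen the alternate-body MIME structure mentioned above, here is a small Python sketch that builds one with the standard email package. It only shows what such a message looks like; the addresses and text are placeholders, and it says nothing about how IS composes outbound mail.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Status update"

# Plain text body first, so text-only readers have something to display.
msg.set_content("The build is green.")
# Adding an HTML alternative turns the message into multipart/alternative.
msg.add_alternative(
    "<html><body><p>The build is <b>green</b>.</p></body></html>",
    subtype="html",
)

content_type = msg.get_content_type()
part_types = [part.get_content_type() for part in msg.iter_parts()]
print(content_type, part_types)
```

A receiving mailer picks whichever part it can display best, which is why this form is considered more polite than sending HTML alone.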
Inbound
FC7 IS introduces the ability for admins to select from one of three ways of handling inbound HTML messages, which is done on the Advanced Mail form with the Inbound HTML field.
The three choices are:
• Add as attachment - legacy choice, works like 6.1 IS
• Decode into message body - IS translates the HTML into FC styles and discards the HTML
• Decode and add as attachment - IS translates the HTML into FC styles and attaches the original HTML
This all looks pretty obvious, but there are some subtle issues with the translation of inbound HTML which are useful for admins to understand. There are really two kinds of HTML message that by their nature are handled a bit differently. The first type is MIME encapsulated HTML. This type of HTML message is generated by most email programs which allow users to compose HTML mail (like OE, Netscape, etc.). In this type of HTML message, the RFC message which arrives contains not only the complete message, but also all of the graphics and other elements that are embedded in the page. This type of message generally decodes very well, since all of the content "travels" with the message. IS generates MIME encapsulated HTML, and most business to business email takes this form.
The other type of message really represents the mailing of a web page to a destination. In this form of message, the HTML body contains the "skeleton" of the message, and all of the "embedded" content (such as images) is represented as links back to a web site. This format takes advantage of the way that browsers work, so that when the HTML is loaded in the browser, it automatically goes off and fetches the rest of the content. This type of message is most commonly used by automated mailing systems and is very popular with spammers, since it generates "traffic" to their site whenever such a message is opened. Even more sinister is that these messages can contain links which, in conjunction with applications running on a spammer's web server, can track information about those who open these messages. IS makes no attempt to "pull" the embedded content of this type of message, and so the resulting message in a person's mailbox often appears to be poorly formatted and littered with links. While the user experience on this type of message is slightly poorer, there are significant security and performance advantages to this approach.
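
One way to tell the two kinds of HTML message apart is to look at where the image references point. The Python sketch below assumes that MIME encapsulated HTML refers to its attached parts with cid: URLs (the usual MIME convention, not something stated in this article) and that the "web page" style uses http or https links. The class and function names are hypothetical, and this is not a description of the IS decoder.

```python
from html.parser import HTMLParser

class ImageSourceCollector(HTMLParser):
    """Collect <img> sources, split into embedded (cid:) and remote links."""

    def __init__(self):
        super().__init__()
        self.embedded = []
        self.remote = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if src.lower().startswith("cid:"):
            self.embedded.append(src)   # content travels with the message
        elif src.lower().startswith(("http://", "https://")):
            self.remote.append(src)     # content must be fetched from a site

def classify_images(html):
    collector = ImageSourceCollector()
    collector.feed(html)
    collector.close()
    return collector.embedded, collector.remote

SAMPLE = (
    "<html><body>"
    '<img src="cid:logo@example.com">'
    '<img src="http://www.example.com/track.gif?id=123">'
    "</body></html>"
)
embedded, remote = classify_images(SAMPLE)
print("embedded:", embedded)
print("remote:", remote)
```

The remote entry in the sample carries a query string, which is the kind of link that lets a web server record who opened the message. Not fetching it is what gives the security advantage described above.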
Regular expressions and the HeaderMatch document
The HeaderMatch document uses UNIX style regular expressions to match various HTTP headers against a pattern. The regular expression meta-characters IS supports in order to make up the patterns to match against are:
\ The backslash is an escape character. It indicates that the next character should be interpreted literally, instead of as a wildcard (e.g., '\.' means a literal period, not a wildcard character). Use \\ to indicate a literal backslash.
. This is a character wild card. It will match any one character (except the end of line character).
? This means 'match zero or one of the immediately preceding character or class'.
* This means 'match zero or more of the immediately preceding character or class'.
+ This means 'match one or more of the immediately preceding character or class'.
^ This meta-character has a dual meaning.
- If this is the first character in a pattern, it means that the first matched character must also be the first character in the line.
- If this occurs immediately before a character class, it negates that class.
The first meaning takes precedence over the second (e.g., a regular expression ^[a-z] matches only if the first character of the line is a lowercase letter).
If you wish to use both meanings, you can use two ^ signs (e.g., a regular expression ^^[0-9] matches if the first character of the line is not a digit).
$ If this is the last character in a pattern, it means that, in order to match, the last matched character must also be the last character of the line.
[ This indicates the beginning of a character class (see below)
] This indicates the end of a character class (see below)
Other Any character other than the ones above is interpreted literally (e.g., a pattern '^This' would match only if the beginning of the line had the literal characters 'T' 'h' 'i' 's').
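
The meta-characters above can be tried out with any regular expression engine that shares these basics. The sketch below uses Python's re module purely as a stand-in; it is not the IS matcher, and the sample patterns and lines are invented for the example. One known difference: Python has no ^^ form and writes a negated class with the ^ inside the brackets, so that case is left out here.

```python
import re

# Each entry: (pattern, line to test, whether a match is expected).
CASES = [
    (r"a\.b", "a.b", True),        # \. is a literal period
    (r"a\.b", "axb", False),
    (r"a.b", "axb", True),         # . matches any one character
    (r"colou?r", "color", True),   # ? = zero or one of the preceding
    (r"ab*c", "ac", True),         # * = zero or more of the preceding
    (r"ab+c", "ac", False),        # + = one or more of the preceding
    (r"^This", "This is it", True),
    (r"^This", "Is This it", False),
    (r"end$", "the end", True),
    (r"end$", "endless", False),
]

results = [re.search(pattern, line) is not None for pattern, line, _ in CASES]
all_as_expected = all(got == want for got, (_, _, want) in zip(results, CASES))

for (pattern, line, _), got in zip(CASES, results):
    print(f"{pattern!r:12} {line!r:14} {'match' if got else 'no match'}")
```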
Character classes:
A character class is a way of defining a range of characters allowed to match. They occur within square brackets. In these classes you may place any collection of literal characters (e.g. a class [aeiou] would match any lowercase vowel) or you can use a hyphen to designate a range of characters (e.g. [a-z] would match any lowercase letter). Note that when designating ranges you must ensure that the first character's numeric value is less than the last character's numeric value. For standard alphanumeric characters this is fairly easy, since they obey the expected English conventions of the ASCII table (e.g. a < b < c ... < z), but this is not the case for extended characters. Consult the appropriate charset table to determine ordering.
All pattern matching is case-sensitive. If you need case-insensitivity you will have to use character classes (e.g. [Uu][Nn][Ii][Xx] would match any case variant of "UNIX"). This also applies to character classes (e.g. to match any letter you would need the class [A-Za-z]).
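
The character class rules above, including the per-letter trick for case-insensitivity, behave the same way in most engines. The following sketch again uses Python's re module as a stand-in rather than the IS matcher. Note the one spelling difference, which is an adaptation for Python: where this article writes ^^[0-9] for "first character is not a digit", Python puts the negating ^ inside the brackets.

```python
import re

vowel = re.compile(r"[aeiou]")            # any one lowercase vowel
starts_lower = re.compile(r"^[a-z]")      # line starts with a lowercase letter
starts_non_digit = re.compile(r"^[^0-9]") # Python spelling of ^^[0-9]
any_case_unix = re.compile(r"[Uu][Nn][Ii][Xx]")

def matches(pattern, line):
    return pattern.search(line) is not None

checks = {
    "vowel in 'grep'": matches(vowel, "grep"),
    "vowel in 'sky'": matches(vowel, "sky"),
    "lowercase start 'apple'": matches(starts_lower, "apple"),
    "lowercase start 'Apple'": matches(starts_lower, "Apple"),
    "non-digit start 'x1'": matches(starts_non_digit, "x1"),
    "non-digit start '1x'": matches(starts_non_digit, "1x"),
    "any-case UNIX in 'I use Unix'": matches(any_case_unix, "I use Unix"),
    "literal UNIX in 'unix'": re.search(r"UNIX", "unix") is not None,
}

for label, result in checks.items():
    print(f"{label}: {result}")
```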
If you need more information on this topic, find a copy of O'Reilly's "UNIX In A Nutshell" (I'm sure QA has a copy of this somewhere, or if not, the dev library should have one) and look up the "grep" command, or find a UNIX box and enter "man grep" on the command line.