How Internet Services Works
Internet Services (IS) is a series of protocol modules that extend the FirstClass server’s functionality to encompass popular Internet protocols. It is implemented as a single program that uses Centrinity’s cross-platform portability libraries as a base, FCP as a communication backbone, and Centrinity's cooperative multi-tasking kernel at its core. Since all of the protocol modules included in IS are Internet related, IS is able to share a large amount of common code for address translation, content translation, and Internet networking. This Internet functionality "toolkit" is made up of C++ objects that can be bound together in various configurations to perform such tasks as translating Internet protocols to/from FCP.
IS was designed to meet a set of requirements that were specified to maximize scalability, security and reliability, reduce coding effort, and minimize code size. These requirements are:
1) IS must translate between Internet protocols and FCP, in both directions, with high fidelity
2) Translation must be done in real time, without using local disk storage
3) IS must handle multiple conversations simultaneously
4) IS needs server-like reliability
5) IS must be portable across platforms
The high-fidelity translation requirement is straightforward: it is a feature with universal appeal. In addition, since data comes in from the Internet, is stored in FirstClass format, and is then rendered back out again, any effort that raises the fidelity of the translations improves the experience for those who use Internet protocols exclusively to access the FirstClass data store.
The requirement for no local disk storage is more subtle. It is really a prerequisite of all Protocol Modules, and it rests on arguments about scalability, security, and reliability. Almost all non-trivial gateways need message storage of some kind, and some of the most difficult code to produce is the code that safely and efficiently manages disk storage. When Protocol Modules are written to use the FirstClass data store, rather than implementing their own schemes, they are less complex to write (faster to market, fewer bugs, more reliable), they inherit a strong disk subsystem (high throughput, less chance of data loss), and they get a permissions subsystem (highly secure, directory driven) for free.
The multiple conversation requirement is really just a statement of what the market expects. While a simple SMTP gateway could get away with a one-at-a-time design, the FirstClass server is positioned as a message switch, a web server, and a directory server. Software of this sort is expected to handle large numbers of concurrent requests.
The reliability requirement follows a similar argument. Unless IS could boast reliability similar to the FirstClass server's, it would be difficult to deploy the product as an Internet server. Those who use Internet protocols to access their data want high levels of service in terms of both speed and availability.
Since IS is an extension of the server's functionality, it makes sense that it should run on all platforms that the server can run on, hence the cross-platform requirement.
With a grasp of what the IS designers were trying to accomplish, we have the foundation to move on and look at the product's design. As mentioned before, IS is built on Centrinity’s cross-platform portability layer. This not only provided us with a large amount of optimized and debugged code, it also allowed us to concentrate our efforts on core Internet functionality instead of platform implementation details. The need to handle large numbers of simultaneous connections led to the choice of Centrinity’s multi-tasking kernel as IS's tasking model. High-fidelity translation is accomplished through the use of document translation classes: C++ objects that can be bound together in different configurations to translate FCP content to and from the various Internet content formats.
To better understand how IS works, we can examine how an Internet protocol connection is serviced by IS. In order to service the connection, an object that understands the Internet protocol in question is needed. As well, an object is needed to manage the FCP connection to the server. These objects are then bound together with a content translation “stack”. The entire lot is run as a task under the multi-tasking kernel.
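The binding described above can be sketched roughly as follows. This is a minimal Python illustration rather than IS's actual C++ code: every name in it (`TranslationStack`, `FcpConnection`, `service_connection`, `run_round_robin`) is hypothetical, the translators are trivial string functions, and a Python generator stands in for a task running under the cooperative multi-tasking kernel.

```python
from typing import Callable, Iterator, List

# Hypothetical names throughout; IS itself is written in C++.
Translator = Callable[[str], str]


class TranslationStack:
    """Chain of translators applied in order to each piece of content."""

    def __init__(self, translators: List[Translator]) -> None:
        self.translators = translators

    def translate(self, content: str) -> str:
        for step in self.translators:
            content = step(content)
        return content


class FcpConnection:
    """Stand-in for the object that manages the FCP connection."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, content: str) -> None:
        self.sent.append(content)


def service_connection(lines: List[str], stack: TranslationStack,
                       fcp: FcpConnection) -> Iterator[None]:
    """One task: take protocol input, translate it, pass it to FCP.

    Each `yield` hands control back to the scheduler, which is how a
    cooperative kernel interleaves many conversations.
    """
    for line in lines:
        fcp.send(stack.translate(line))
        yield


def run_round_robin(tasks: List[Iterator[None]]) -> None:
    """Toy cooperative scheduler: step each task in turn until all finish."""
    pending = list(tasks)
    while pending:
        for task in list(pending):
            try:
                next(task)
            except StopIteration:
                pending.remove(task)


stack = TranslationStack([str.strip, str.upper])
fcp_a, fcp_b = FcpConnection(), FcpConnection()
run_round_robin([
    service_connection([" helo ", " quit "], stack, fcp_a),
    service_connection([" list "], stack, fcp_b),
])
```

The point of the sketch is the shape: a protocol-side input, a translation stack in the middle, an FCP-side object at the other end, and the whole unit scheduled as one task among many.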
While IS does have a few local files, most of the data it needs is actually downloaded from the server during IS’s initial connection. This data falls into two general categories: configuration data and Internet alias information. The configuration data is represented by the IS forms stored on the server, and is used to control how IS works, most significantly which protocols it will support and how much simultaneous traffic it will handle. The Internet alias data is exported from the server’s directory, and is stored by IS for use in translating between FirstClass addresses and Internet aliases. It is important to note that IS does not know about every deliverable address on the server, just those with Internet aliases. Any addresses it does not have an alias for are programmatically translated between FirstClass and Internet formats by IS.
As mentioned above, the basic IS model is to convert data between an Internet connection and an FCP connection. Examining the way these FCP connections are managed gives insight into how IS works. The first connection IS makes as it starts up is the configuration connection. This connection is used by IS to get the data from the IS configuration forms, and to get the alias information from the server's directory. When all configuration data has come down, this connection is logged out. After startup, IS establishes its permanent connections, which stay logged in for the entire time IS is running. The first of these is referred to as the gateway connection. This connection is used to feed in both monitor data and all incoming SMTP, NNTP, and POP3 importer messages. The next two permanent connections are the SMTP and NNTP connections, which are used to transmit all outgoing messages for these protocols. If one of these protocols is disabled, the corresponding connection will not be made. Note that the three permanent connections correspond to the Gateway Services (see Protocol Modules section for details on types of services) portions of IS, and that each one is logged in as the Internet gateway.
The remaining FCP connections used by IS are transient, and are logged in and out to service Internet connections to the Client and Directory Services protocol modules. These connections are typically logged in as a particular user, although "anonymous" connections of various types do log in as the Internet gateway.
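As a small illustration of the rules above, the following Python sketch works out which permanent connections would be held open and which identity a transient connection logs in under. The function names and the `"Internet gateway"` identity string are hypothetical, and the sketch assumes the gateway connection is always made because it carries monitor data.

```python
from typing import Dict, List, Optional

# Hypothetical label for the identity used by gateway-level logins.
GATEWAY_IDENTITY = "Internet gateway"


def permanent_connections(enabled: Dict[str, bool]) -> List[str]:
    """Return the permanent FCP connections IS would hold open.

    Assumption: the gateway connection is always present; the SMTP and
    NNTP connections exist only when the matching protocol is enabled.
    """
    connections = ["gateway"]
    for protocol in ("SMTP", "NNTP"):
        if enabled.get(protocol, False):
            connections.append(protocol)
    return connections


def transient_login(user: Optional[str]) -> str:
    """Pick the login identity for a transient connection.

    A named user logs in as that user; an anonymous connection
    (user is None) logs in as the Internet gateway.
    """
    return user if user else GATEWAY_IDENTITY
```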
Let's look at each major component of IS, what it does, and how it relates to the other components. The components can be divided into sections based on their general role in IS.
Support modules
These parts of IS perform functions useful to all of IS, and can be thought of as the framework that the rest of IS is built on. The Configuration module is used to manage IS's initial connection to the server. It collects and stores the IS configuration forms data for later retrieval by other IS modules. It also collects the incoming alias information, which it passes to the Name Translator. The Name Translator is the interface module used to store and access the alias information retrieved from the server at startup. It provides functions that allow the other parts of IS to translate addresses between FirstClass and Internet formats. The Resolver is used when IS needs to query a DNS server to resolve a domain name. It accesses DNS servers using the UDP-based resolver protocol and caches the results based on parameters found in the records and defaults chosen by the administrator. Note that IS does not resolve domain names this way when making a connection to a site; instead, it uses the Resolver to retrieve MX records, which are useful in resolving Internet email addresses. The Dispatcher is IS's "listener", accepting and distributing incoming TCP connections on behalf of the rest of IS.
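The Resolver's caching behavior can be sketched as below. This is a hedged Python illustration, not the IS implementation: `MxCache`, its `lookup` callback, and the `default_ttl` parameter are hypothetical names, the real DNS query is replaced by an injected function, and the rule that the administrator's default applies only when a record carries no TTL is an assumption. Sorting by preference follows the standard MX convention that lower preference values are tried first.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

MxRecord = Tuple[int, str]  # (preference, mail host)
Lookup = Callable[[str], Tuple[List[MxRecord], Optional[float]]]


class MxCache:
    """Cache MX answers until their time-to-live expires."""

    def __init__(self, lookup: Lookup, default_ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup            # hypothetical stand-in for the DNS query
        self._default_ttl = default_ttl  # administrator-chosen default (assumed)
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[MxRecord]]] = {}

    def resolve(self, domain: str) -> List[MxRecord]:
        now = self._clock()
        entry = self._entries.get(domain)
        if entry is not None and entry[0] > now:
            return entry[1]              # still fresh: no DNS traffic
        records, ttl = self._lookup(domain)
        if ttl is None:
            ttl = self._default_ttl      # record carried no usable TTL
        ordered = sorted(records)        # lowest preference value first
        self._entries[domain] = (now + ttl, ordered)
        return ordered
```

Passing the clock in as a parameter keeps the expiry logic easy to exercise without waiting for real time to pass.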
Document translation
These modules represent the translation stacks that bind Internet protocol handlers to FCP handlers. There are two major categories of translator: RFCe - RFC encoders which translate FirstClass message and document content to Internet content, and RFCd - RFC decoders which translate Internet content to FirstClass formats. These modules are made up of dozens of translators which understand content types that IS supports, like MIME, HTTP, etc.
Gateway Services
These modules implement IS's three Gateway Services: SMTP, NNTP, and the POP3 importer. The Gateway Server manages IS's gateway session to the server. This is the connection that all monitor and incoming message data is sent across. This module is referred to as a server because it accepts a login from the FirstClass server and presents it with a "desktop". It is from this "desktop" that the FirstClass server retrieves incoming email, conference content, and monitor data. The SMTP Server implements IS's SMTP server, handling incoming SMTP connections and routing data in through the connection managed by the Gateway Server. The NNTP Server does the same for the NNTP protocol. The SMTP Client manages the FCP connection through which outbound SMTP messages flow. It detects new arrivals in the Internet gateway mailbox and hands each queued message off to the SMTP Message Agent, which handles delivery of outbound SMTP messages. The NNTP Client and Message Agent perform similar functions for NNTP. The POP3 Client module manages server requests to schedule POP3 collection, creating a POP3 Message Agent for each entry in the POP3 mailbox forms. The POP3 Message Agent handles collection of POP3 messages from other servers, routing the data it finds into the server through the Gateway Server connection.
Client Services
These modules implement IS's three Client Services: HTTP, FTP, and POP3. The Internet Client is analogous to the Gateway Server, but for Client and Directory Services. It manages the transient FCP connections required by these services. The POP3, HTTP, and FTP Servers handle incoming protocol connections of the appropriate type, using the Internet Client to manage the connection to the server.
Directory Services
These modules implement IS's two Directory Services: Finger and LDAP. The Finger and LDAP Servers handle incoming protocol connections of the appropriate type, using the Internet Client to manage the connection to the server.
In addition to the major functional components of IS which are listed above, IS also has a toolkit framework that allows us to build very powerful features into our protocol modules. Listed below are some of the major framework components, with a brief description of how they are used to add features to IS:
Connection Filters
This component allows administrators to add lists of addresses, domain names, and IP masks which IS can query through a high-speed lookup mechanism. Entries can be combined to block or allow addresses in any combination. This is currently in use in our SMTP Server component to provide blocking of SMTP delivery and relay privileges, with "trusted" address overrides. IS uses this technology in every protocol module to provide relief from denial-of-service attacks and an additional level of security against unauthorized access, a sort of built-in firewall.
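A minimal sketch of the block-with-trusted-override idea follows, using Python's standard `ipaddress` module. The `ConnectionFilter` class and its precedence rule (a trusted entry always wins over a blocked one) are assumptions made for illustration. The sketch covers IP addresses and masks only, not domain names, and its linear scan is not the high-speed lookup mechanism the text refers to.

```python
import ipaddress
from typing import Iterable


class ConnectionFilter:
    """Decide whether a peer address may connect (hypothetical API)."""

    def __init__(self, blocked: Iterable[str],
                 trusted: Iterable[str] = ()) -> None:
        # Each entry is a single address or a CIDR mask such as "10.0.0.0/8".
        self._blocked = [ipaddress.ip_network(entry) for entry in blocked]
        self._trusted = [ipaddress.ip_network(entry) for entry in trusted]

    def allows(self, address: str) -> bool:
        ip = ipaddress.ip_address(address)
        # Assumed precedence: a "trusted" entry overrides any block.
        if any(ip in net for net in self._trusted):
            return True
        return not any(ip in net for net in self._blocked)


relay_filter = ConnectionFilter(blocked=["10.0.0.0/8"], trusted=["10.1.2.3"])
```

With this configuration, `relay_filter.allows("10.9.9.9")` is false, while the trusted host `10.1.2.3` and any address outside the blocked range are allowed.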
Smart Caching
The Name Translator and HTTP modules of IS both require very high-speed access to data which is not stored locally in IS. To meet these requirements, our toolkit includes Smart Caching, a process in which IS caches data it has accessed from the server and then notifies the server of its interest in any changes to that data. Since the server is managing the data, and is aware of IS's interest, it can notify IS whenever a change occurs. The Name Translator uses this facility to get address updates from the server as the admin changes a user's alias. The HTTP Server takes advantage of this to store web content in cache, knowing that it can invalidate the entry when notified by the server. This method of caching provides performance beyond competing web server products, which must query the file system on each request to see if their cache has become stale.
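The notification-driven invalidation described above might look like this in outline. It is a sketch under stated assumptions: `Server` and `SmartCache` are hypothetical stand-ins, the real exchange happens over FCP rather than through Python callbacks, and registering interest as a side effect of each read is a simplification.

```python
from typing import Callable, Dict, List


class Server:
    """Stand-in for the FirstClass server: owns the data, notifies watchers."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Callable[[str], None]]] = {}
        self.reads = 0  # counts round trips, to show what the cache saves

    def read(self, key: str, on_change: Callable[[str], None]) -> str:
        """Return the value and register the caller's interest in changes."""
        self.reads += 1
        self._watchers.setdefault(key, []).append(on_change)
        return self._data[key]

    def write(self, key: str, value: str) -> None:
        """Store a new value and notify everyone who registered interest."""
        self._data[key] = value
        for notify in self._watchers.pop(key, []):
            notify(key)


class SmartCache:
    """Cache that trusts its entries until the server says otherwise."""

    def __init__(self, server: Server) -> None:
        self._server = server
        self._entries: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key not in self._entries:
            self._entries[key] = self._server.read(key, self._invalidate)
        return self._entries[key]

    def _invalidate(self, key: str) -> None:
        self._entries.pop(key, None)
```

Because the cache never has to ask whether an entry is stale, a cache hit costs no round trip to the server at all.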
HTML Templates
IS's HTML templating subsystem provides any IS site with the ability to highly customize the web display of FirstClass data store content. Templates are defined on a per-site and per-object basis, with future plans to allow definition on a per-conference basis. The template is composed of standard HTML with additional "keywords" to control the embedding of FirstClass object content. As an object is opened, the data from the FirstClass message store is merged on-the-fly with the appropriate template to produce a customized view of the object.
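The merge step can be illustrated with Python's standard `string.Template`. Note the assumptions: the `$subject` and `$body` placeholders use Python's syntax, not the IS keyword syntax (which this article does not show), and `OBJECT_TEMPLATE` and `render` are hypothetical names.

```python
import html
from string import Template
from typing import Dict

# Standard HTML plus placeholders; the "$name" form is Python's, used
# here only as a stand-in for IS's own template keywords.
OBJECT_TEMPLATE = Template(
    "<html><body><h1>$subject</h1><p>$body</p></body></html>"
)


def render(template: Template, fields: Dict[str, str]) -> str:
    """Merge object fields into the template, escaping them for HTML."""
    escaped = {name: html.escape(value) for name, value in fields.items()}
    return template.substitute(escaped)


page = render(OBJECT_TEMPLATE, {"subject": "Q&A", "body": "Hello"})
```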
Server-Based Monitoring, Notification, and Logging
Each component of IS needs the ability to keep the admin informed about what's going on. The toolkit provides interfaces that allow IS modules to send real-time monitor data, email notifications, or log file entries to the server. These capabilities are key to making IS a faceless service, administered by the admin as part of the server.
Central Data Store Search
IS is one of a very limited number of web servers that provide searching without an external program and that work right out of the box on all content. This capability is provided through a toolkit interface to the built-in search capabilities of the FirstClass server. Any protocol with search capabilities (like IMAP4) gets equal access to this capability. While IS can support external search engines, there are a number of good reasons to use the built-in facility, including:
1) Ease of setup - setting up a 3rd party search engine on your web site is complex.
2) Real time - typical web search engines take static content and index it to create a search database, whereas ours updates instantly. Create a FirstClass document and it is immediately searchable, with no need to manually feed in the page or wait until a "spider" notices the change.
3) Hierarchical - typical search engines flatten your site into a database of keywords. If you're in the Tech Support area and you search for "price", you get all kinds of unrelated hits about product prices, 3rd party prices, and, mixed in somewhere, the price of paid tech support. With our search, you get a site search that allows all content to be examined, and a local search that allows you to search within the content relevant to where you are. If you do a local search in the Tech Support area on "price", you get the price for paid tech support, with no extraneous "noise" hits.
4) Contextual - a normal search engine gives you a link to the "hit" document. If you are interested in all of the related information, you can do another search, or if you're very clever you can edit the URL and find the index page at this level of the web site. With IS's search, each hit returns a context link (in folder) which allows you to see what part of the larger data store this document is part of.
5) Hit control - with a standard search engine, the indexer decides what is relevant and what comes up first for a given keyword or search string. You get methods for tweaking the behavior, but a lot of it is under program control. With IS's built-in search, the webmaster controls the order and type of hits. The first level of control is server permissions, making certain places "unsearchable" to prevent noise. The next level comes from organizing content to make local searches more powerful. Finally, the IS search occurs in a deterministic manner, based on the user's sort order preference. This means that the webmaster can place "anchor" documents that will return first in most searches, controlling the "ranking" of hits.
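The difference between a site search and a local search, together with the context link returned for each hit, can be sketched as follows. The `search` function, the path layout, and the sample documents are invented for illustration, and sorting hits by path merely stands in for the user's sort order preference.

```python
from typing import List, Tuple

Document = Tuple[str, str]  # (path in the data store, text content)


def search(docs: List[Document], term: str,
           scope: str = "/") -> List[Tuple[str, str]]:
    """Return (hit path, containing folder) pairs in a deterministic order.

    `scope` limits the search to one part of the hierarchy; "/" searches
    the whole site. The containing folder plays the role of the context
    link for each hit.
    """
    needle = term.lower()
    hits = [path for path, text in docs
            if path.startswith(scope) and needle in text.lower()]
    return [(path, path.rsplit("/", 1)[0] + "/") for path in sorted(hits)]


DOCS: List[Document] = [
    ("/Products/Pricing", "Product price list"),
    ("/Tech Support/Paid Support", "The price of paid tech support"),
    ("/Tech Support/FAQ", "Common questions"),
]

site_hits = search(DOCS, "price")
local_hits = search(DOCS, "price", scope="/Tech Support/")
```

Here the site search returns both the product and the support documents, while the local search returns only the paid support document along with its `/Tech Support/` context folder.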
Central Directory Services
Through the directory subsystem of IS, all Protocol Modules inherit the ability to address and authenticate using the server directory. There are many benefits to this approach, including: 1) no need to add users separately for each protocol; 2) a strong core permissions system applied to all access through Internet protocols; 3) the ability to produce a "view" of the server address space which has been filtered appropriately for the user involved.
Content and Address Translators
IS's translation facilities are generic, and available to all protocol modules. This means that the FirstClass to HTML document converter developed for the HTTP server can be (and already has been) reused to provide HTML message bodies in SMTP or POP3 messages. This system is extensible and is designed to allow additional content translators (e.g. WordPerfect to HTML) to be added as the need to publish additional content types arises.