1.0 The Misc.writing Birthday Robot.

This robot reads HTML pages, posts news via NNTP and sends birthday cards. The application might be a bit silly, but it demonstrates how much you can accomplish with the standard Tcl commands and packages and a little bit of extra code.

Pat Marcello (one of the readers of the misc.writing newsgroup) has collected a list of the names and birthdays of regular posters to the misc.writing newsgroup. Given the list of birthdays, it seems like a clever idea to post birthday wishes for the folks born in a given month.

Unfortunately, given the deadlines and schedules that a writer sometimes has to deal with, Pat doesn't always have the time to post birthday greetings.

At this point, I had the idea of writing a robot to take care of the task. The robot could examine the birthdays on Pat's site, find the folks with birthdays in a given month, generate a post from a set of templates, and send out electronic birthday cards.

To send electronic birthday cards, the robot must browse to a Web-based electronic card server and register the recipient for a birthday card. The server then notifies the recipient by e-mail that a card is waiting for them. The robot will only send cards to people for whom it can identify an E-Mail address.

This chapter discusses how this project was designed and implemented.

This project had several design goals:

  1. Minimize the requirements for human data processing.
  2. Use existing tools whenever and wherever possible.
  3. Modularize all new functionality for easy inclusion in other projects.
  4. Make the robot testable without affecting the misc.writing newsgroup or spamming innocent bystanders.

1.1 Requirements

The requirements for this robot are reasonably simple:

  1. Access the birthday listing on Pat's homepage and extract the relevant dates and people.

    Getting a list of dates and people could also be done by having Pat supply the robot with a pre-formatted list, but that would violate the goal of minimizing the human interaction. Having all the information in one human readable file simplifies the data maintenance.

  2. Post a message to the misc.writing newsgroup to wish these folks a happy birthday.

  3. Acquire E-Mail addresses for the birthday boys and girls.

  4. Send an E-Mail Birthday Card to the appropriate folks.

1.2 Specifications

As is normal, the simple requirements get a bit more complex when we start nailing down the specifications.

  1. The robot will access the appropriate HTML page, parse that page, and extract the names of readers with birthdays in the current month.
    1. The robot will access the birthday page using HTTP protocols.
    2. The birthday page URL may be a hardcoded value within the script.
    3. The robot must parse HTML documents.
    4. The robot must determine the current month.
    5. The robot must select the appropriate entries from the HTML page based on the current month.

  2. The robot will access a newsserver and generate a posting.
    1. The robot will interact with the newsserver using NNTP protocol.
    2. The robot must have a method for generating unique messages that contain a list of names and dates.
    3. The subject will not change from posting to posting.
    4. The robot will post a news message at the beginning of each month.

  3. The robot must be able to map from a given name to an E-Mail address.
    1. Only currently active members of misc.writing will receive cards.

      "Active members" are defined as people who have posted within the news pruning period of the MSEN.com news server.

    2. The robot must be able to parse NNTP replies.

      The information from the NNTP overview is adequate for finding an email address.

    3. The robot should be careful to not send a card to someone who does *not* have a birthday this month. (Errors of omission are preferred to errors of commission.)

  4. The robot must interact with an electronic card server.
    1. An existing card server will be used (no new card server software will be written.)
    2. The card server must be free.
    3. A single card server may be used.
    4. The cardserver URL may be hardcoded within the script.
    5. The robot will interact with the card server using HTTP protocol.
    6. The cards will have a single message.
    7. Different cards will be selected randomly.
    8. Initially, the robot will send cards when it posts the news message. Eventually (undefined phase 2), the robot should send cards on the birthday.

  5. The HTML page accessed, newsgroups, etc. must be configurable to allow test versions of these objects to be inserted into the normal program flow.

1.3 Design

Parts of the design and implementation are made easier by the tools that Tcl provides. Specifically: an HTTP library already exists (in Tcl revision 8.0 and newer), which makes the HTTP interaction code simpler; the Tcl channel based I/O generalization makes it easy to interact via sockets (for the NNTP engine); and the regular expression and string support within Tcl make parsing the HTML files relatively easy.

The subsystem design for this application splits into four segments:

  1. Interacting with an HTTP server.
  2. Interacting with an NNTP server.
  3. Parsing messages.
  4. Creating and Manipulating the birthday messages.

The next sections will discuss the design of these subsystems, followed by discussion of the implementation of the subsystems.

1.3.1 Interacting with an HTTP server.

The HTTP protocol is a relatively simple stateless protocol: an HTTP server responds to a request on port 80 by returning an HTML document, which may be the requested document, or an HTML page with an error message.

The Tcl HTTP library includes several commands for interacting with an HTTP server. This application uses these procedures:

Syntax: ::http::geturl url ?options? 

::http::geturl Send a request to a URL and receive the return page. The geturl procedure returns a tag which can be used with other ::http:: commands to manipulate the HTML data.

This command is slightly misnamed in that it can perform a GET, POST or HEAD operation.

This command supports several options. The one used in this project is:
-query Execute a POST operation, instead of the default GET operation.

url The URL to interact with. The IP address portion of this may be a raw dot-format IP address (10.123.45.67), or a site name (example.com).

Syntax: ::http::data  tag  

data Return the HTML page associated with a tag.
tag A tag returned by ::http::geturl to identify the HTML page.

Syntax: ::http::formatQuery key1 val1 ?key2 val2? ...  

formatQuery Format a set of key and value pairs into a string for use with the ::http::geturl -query command.

Return the new string to the calling process.

Spaces in strings will be replaced by + symbols, and <BR> entries will become newlines in the output.

key val A key and value pair. These will be translated into &key=value in the returned string.
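The three commands described above can be combined as in the following minimal sketch. The helper names fetchPage and postForm, the form field names, and the addresses are hypothetical, and error handling is left out. The ::http::cleanup call releases the state that geturl keeps for each request.

```tcl
package require http

# Fetch a page with a GET operation and return its body.
proc fetchPage {url} {
    set tag [::http::geturl $url]
    set body [::http::data $tag]
    # Release the state array that geturl allocated for this request.
    ::http::cleanup $tag
    return $body
}

# Send a list of key/value pairs with a POST operation and return the reply.
proc postForm {url pairs} {
    set query [eval ::http::formatQuery $pairs]
    set tag [::http::geturl $url -query $query]
    set body [::http::data $tag]
    ::http::cleanup $tag
    return $body
}

# Building the query string needs no network access.
set query [::http::formatQuery to reader@example.org month May]
puts $query
```

Neither helper is called here, so the sketch runs without a network connection; only the query string is built and printed.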

1.3.2 Interacting with the NNTP news server

The NNTP protocol (RFC 977) is similar to several other Internet protocols such as SMTP and POP that use an interaction pattern of prompt-command-reply.

The command may be a single line or multiple lines terminated by a line containing a single period. The reply to these commands may also be a single line, or multiple lines terminated by a line with a single period. The reply will always include a 3 digit status code. These codes use defined values for reporting command success or failure.

Here is an example of an interaction with an NNTP server:


 $> telnet news.example.com nntp
 Trying 204.42.224.2...
 Connected to news.example.com
 Escape character is '^]'.
 200 news.example.com DNEWS Version 4.5j, S0, posting OK 
 group comp.lang.tcl
 211 146 93718 93888 comp.lang.tcl selected
 xover 93720 93721
 224 data follows
 93720   Re: Sound Playback in Tcl/Tk!  [was: Re: Idea for new extension]
 Nat Pryce <np2@doc.ic.ac.uk>     Thu, 10 Sep 1998 15:06:57 +0100
 <01bddcc1$f4b76360$0400000a@crossbow.lcdmultimedia.com>     2788    38 
     Xref: 
 .
 group nosuch.group.exists
 411 no such news group

The status digits 200, 211 and 224 are all success codes (indicated by the initial digit 2), and the 411 status is a failure (indicated by the initial digit 4).
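As a rough illustration of how the first digit of the status code can drive a program, the sketch below classifies a reply line. The procedure name replyClass and the class names are hypothetical; the meaning of the leading digits follows RFC 977 (1 informative, 2 success, 3 success so far with more input expected, 4 and 5 failure).

```tcl
# Classify an NNTP reply line by the first digit of its status code.
proc replyClass {reply} {
    if {![regexp {^([1-5])[0-9][0-9]} $reply whole first]} {
        error "malformed reply: $reply"
    }
    switch -- $first {
        1 { return informative }
        2 { return success }
        3 { return continue }
        default { return failure }
    }
}

# Replies taken from the session shown above.
foreach reply {
    {200 news.example.com DNEWS Version 4.5j, S0, posting OK}
    {211 146 93718 93888 comp.lang.tcl selected}
    {411 no such news group}
} {
    puts "[string range $reply 0 2] -> [replyClass $reply]"
}
```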

While the actual commands vary between the different protocols, the prompt-command-reply pattern and the meaning of the status codes remain consistent. This makes it possible to design the NNTP interaction package as two sets of procedures:

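  1. A low-level set of procedures handles the interaction with an Internet server. This set of procedures must be able to open a connection, send a command, retrieve the reply, and check the status code.

  2. A set of procedures implements the NNTP protocol using the low-level procedures. This set of procedures must be able to issue the NNTP commands the robot needs and parse the replies.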

1.3.3 Parsing the Data

Most of the new code in this application was written to parse the HTML pages and news postings. Parsing a set of data to represent the information to a human is a slightly different problem from parsing a page to extract information. In the former case, a program is trying to extract presentation information from an HTML page or a news article. The robot, however, is trying to extract specific items of information from this data.

To make the information accessible to a robot, the parse engines extract the data into nested lists or associative arrays. The structure of the list, and the names used for the associative array indices can enable the robots to extract the information they need from the parsed data.

1.3.3.1 Parsing an HTML page.

The html_library.tcl package (discussed in chapter 10) is fine for displaying HTML documents, but does not try to parse non-presentation information from the document. Rather than try to adapt the html_library package, I wrote a parse engine that uses simple heuristics to extract particular sets of information from an HTML page.

The HTML parse engines will convert a table into a set of lists, and will convert an HTML page with <input...> fields into an associative array.

The various types of <input...> tags (radio, hidden, etc) are denoted with a naming convention that merges the value of the type= field and the value of the name= field.

1.3.3.2 Parsing a News Article

A Usenet News article consists of a header with several required fields and zero or more optional fields, and a message body.

The news parser will convert a news article (or an xover article overview) into an associative array. The indices of the array are the names of the fields (from, subject, etc), and the contents of each array index are the contents of that field. The body of an article is placed in the body index.
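The sketch below shows one way an xover overview line could be turned into an associative array. It is not the parser from nntp.tcl: the procedure name parseOverviewLine and the index names are assumptions, and the field order (article number, subject, from, date, message-id, references, byte count, line count, separated by tabs) is the usual overview format described in RFC 2980.

```tcl
# Split one tab-separated overview line into the array named by arrayName.
proc parseOverviewLine {line arrayName} {
    upvar 1 $arrayName fields
    set names {number subject from date message-id references bytes lines}
    set values [split $line "\t"]
    foreach name $names value [lrange $values 0 7] {
        set fields($name) $value
    }
    # Pull a bare address out of the From field, if one is present.
    set fields(email) ""
    regexp {[^ <>()"]+@[^ <>()"]+} $fields(from) fields(email)
    return [array names fields]
}

# The overview entry from the NNTP session shown earlier.
set line [join {
    93720
    {Re: Sound Playback in Tcl/Tk!}
    {Nat Pryce <np2@doc.ic.ac.uk>}
    {Thu, 10 Sep 1998 15:06:57 +0100}
    {<01bddcc1$f4b76360$0400000a@crossbow.lcdmultimedia.com>}
    {}
    2788
    38
} "\t"]

parseOverviewLine $line article
puts "$article(from) -> $article(email)"
```

Keeping the extracted address in its own index means the code that maps a name to an E-Mail address never has to look at the raw From field again.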

1.3.4 Creating and Manipulating birthday messages

Creating and manipulating the birthday messages is a fairly simple operation compared to interacting with HTTP and NNTP servers or parsing HTML pages.

The technique used in this subsystem is to create several boilerplate messages which contain fields to be replaced with the appropriate data for each month. The two replacement fields are the month, and the list of people with birthdays this month.

The Tcl regsub command can replace all occurrences of one string with another, which makes the message subset of the package quite simple.
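A minimal sketch of the substitution, assuming placeholders written as %MONTH% and %NAMES% (the placeholder syntax and the procedure name fillTemplate are hypothetical). Because regsub gives & and \ a special meaning in the replacement text, the sketch escapes them first so that a name such as "Tom & Jerry" is inserted literally.

```tcl
# Replace the %MONTH% and %NAMES% placeholders in a boilerplate message.
proc fillTemplate {template month names} {
    # Escape \ and & so regsub treats the replacement text literally.
    set month [string map {\\ \\\\ & \\&} $month]
    set names [string map {\\ \\\\ & \\&} $names]
    regsub -all {%MONTH%} $template $month template
    regsub -all {%NAMES%} $template $names template
    return $template
}

set template "Happy birthday to everyone born in %MONTH%: %NAMES%. Enjoy %MONTH%!"
puts [fillTemplate $template May "Tom & Jerry"]
```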

1.4 Implementation Details

The misc.writing birthday robot is implemented in several script files. The primary flow and the interaction with the birthday URL is contained in birth_bot.tcl and the interaction with the electronic card server is implemented in sendCard.tcl. The general purpose parsing and string manipulation commands are contained in other files as shown below.

The program files used by the birthday Robot are:

birth_bot.tcl The main entry point and functions specific to this robot
sendCard.tcl Contains a procedure to send an E-Mail birthday card via the vietsandiego.com E-Mail card server.
nntp.tcl Contains procedures to interact with an NNTP server.
IP_proto.tcl Contains procedures that provide low level interactions with a server that uses the prompt-response style of Internet communication protocol.
parse.tcl Contains procedures that parse an HTML page.
listx.tcl Contains procedures with extended list commands.
stringx.tcl Contains procedures with extended string commands.

The procedures implemented in nntp.tcl and parse.tcl are maintained within namespaces to protect the applications that use these packages from namespace pollution and collisions.

The procedures in listx.tcl, stringx.tcl, and IP_proto.tcl are not embedded within namespaces. These procedures are designed to be merged into other packages, and the other packages will create any required namespaces.

Because these procedures are expected to be embedded within a namespace, the code uses variable instead of global for the state variables, and uses the namespace current command to wrap callback procedures.

In this application the listx.tcl and stringx.tcl procedures are nested inside the parse namespace, and the IP_proto.tcl procedures are nested inside the nntp namespace.
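The effect of loading namespace-free code inside a namespace can be sketched without any of the robot's files. Here a script held in a variable stands in for a file such as listx.tcl, and the names cardbot, bump and callbackFor are hypothetical. In the real packages the script would be loaded with source or package require inside the namespace eval.

```tcl
# Stand-in for a helper file that does not declare a namespace of its own.
set helperScript {
    proc bump {} {
        # variable (not global) binds to the enclosing namespace's variable.
        variable counter
        incr counter
    }
    proc callbackFor {procName} {
        # Fully qualify a procedure name for use in an event callback.
        return [namespace current]::$procName
    }
}

namespace eval ::cardbot {
    variable counter 0
}
# Evaluating the script here defines its procedures inside ::cardbot.
namespace eval ::cardbot $helperScript

puts [::cardbot::bump]
puts [::cardbot::callbackFor bump]
```

The helper procedures end up as ::cardbot::bump and ::cardbot::callbackFor, and nothing is added to the global namespace.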

1.4.1 Interacting with the HTTP server

There are two sets of HTTP interactions in this package:

  1. Getting the HTML page with the list of birthdays.
  2. Sending a birthday card to folks for whom E-Mail addresses can be found.

1.4.1.1 Getting the HTML page with birthday list.

Getting an HTML page with the ::http:: package uses the HTTP GET operation. This is simple enough that for most applications you can just put the code inline.

Because I would need to modify the page to test the package behavior, I put the code to get the URL into a separate procedure, with a check to see if a test file or the actual URL should be loaded. This lets me modify a local copy of the HTML page and load that for testing. Loading the local copy is also significantly faster, which doesn't hurt in the development and debugging phases.

As work on this robot progressed I discovered that HTML pages generated by Microsoft Publisher 97 can have the high order bit turned on for some space characters. This changes the space character from a hex value of 0x20 to the value 0xA0.

The Tcl string commands understand that a space is 0x20, and treat 0xA0 as a non-space character. This can make parsing words tricky. Fortunately, the Tcl binary and regsub commands make it easy to replace all the 0xA0 bytes with 0x20 bytes. Having a separate function for loading pages provided a centralized location to apply this fix to the data before parsing it.


proc getPage {url} {
    global birthdayBot

    if {![string match "" $birthdayBot(testPage)]} {
        set pg [exec cat $birthdayBot(testPage)]
    } else {
        set id [::http::geturl $url]
        set pg [::http::data $id]
    }

    #
    # Strip out the 0xA0 characters and replace them with real spaces
    # before we let this page be used.
    #

    set sub [binary format H a0]
    regsub -all $sub $pg " " pg

    return $pg
}
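For readers on systems without a cat command, the test page can also be read with the Tcl file commands. The sketch below is an alternative, not the code used in the robot: the procedure names fixSpaces and loadTestPage are hypothetical, string map stands in for the binary and regsub pair, and the page is assumed to be encoded as ISO 8859-1.

```tcl
# Replace every 0xA0 character with an ordinary space.
proc fixSpaces {pg} {
    return [string map [list \u00a0 " "] $pg]
}

# Read a local copy of the page without relying on an external cat command.
proc loadTestPage {fileName} {
    set chan [open $fileName r]
    # Assumption: the saved page uses the ISO 8859-1 (Latin-1) encoding.
    fconfigure $chan -encoding iso8859-1
    set pg [read $chan]
    close $chan
    return [fixSpaces $pg]
}

puts [fixSpaces "May\u00a015"]
```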

1.4.1.2 Interacting with the E-Mail card server

The Email Birthday Card server interactions use both the HTTP GET (to get the card description form) and POST (to send the contents of the card to the server) operations.

The flow of the card sending procedure is shown below. The flow within the sendCard procedure is shown on the left, and the calls to other packages are on the right.

The flow shown above is used when talking with the www.vietsandiego.com E-Mail card server. The www.vietsandiego.com card server follows a fairly common pattern for HTML pages that need to get user input before they perform some action.

The HTML form requests that the user fill in several INPUT tags, and then send the data to the server with a HTTP POST operation. The input values are checked and if all fields are valid a preview is presented to the user. If the user confirms the preview with another POST operation, the server performs the requested operation.

Since a number of sites use a sequence of events similar to this (or simpler), I generalized the procedure so that it would be easy to use to talk with other sites. The sendCard procedure is actually a generic "Fill in a form and ship it" procedure, with most of the information about what the forms contain in the code that invokes sendCard.

1.4.1.2.1 Filling in the form

An HTML form includes one or more <INPUT...> tags. Each of these tags has an identifier defined with the NAME=... attribute.

The code that calls sendCard must know the values of the NAME attributes, and must call sendCard with the attribute names and values as call arguments. The sendCard procedure accepts attribute/value pairs as arguments in the form attribute1 value1 attribute2 value2. The sendCard procedure uses these attribute names as the names for the Tcl variables that contain values to be assigned to those INPUT tags.

For example, if the HTML form includes an input tag resembling:

<INPUT NAME=reply_to>

The invocation of sendCard would resemble:

sendCard http://www.example.com card.html reply_to robot@example.org

Within the sendCard procedure, a variable named reply_to would be assigned the value robot@example.org.

The implementation of sendCard uses the args argument definition to allow any number of variable name and value pairs to be assigned.


proc sendCard {baseUrl cardUrl args} {

    foreach {var val} $args {
        eval [list set $var $val]
        lappend required $var
    }
    ...
}

There are two things to notice in this code snippet:

The code below shows how the INPUT tags are processed. The indices of the associative array variable fields are named using the pattern type.name where type is the value in the TYPE=... attribute of the HTML tag, and name is the value of the NAME=... attribute of the HTML tag.

For each INPUT tag the NAME attribute is extracted, and that name is used to get the value from the local variable with that name. The attribute name and the value in the variable are appended to the list of key/value pairs that will be used as an argument to formatQuery. Once the value has been extracted from a local variable, the local variable is unset.

After the INPUT tags have all been processed, the list of required fields is checked against the local variables. If any of these local variables still exist, it means that something has been changed in the HTML form, and these attribute/value pairs are no longer valid. The HTML form should be examined and the call arguments modified to reflect the new HTML form.


    #
    # Get the input fields, and assign the values from the command line
    #  Unset the variables as they are used to mark which fields are
    #  identified.
    #

    foreach field [array names fields input.*] {
        set name [lindex [split $field "."] 1]
        lappend msg $name [set $name]
        eval unset $name
    }

    #
    # Check that the expected required fields were found in the page.
    #  This is a sanity check that the card server hasn't changed the
    #  page beyond recognition.
    #

    foreach req $required {
        if {[info exists $req]} {
            error "$req not found in page - "
        }
    }
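The assign, consume and check pattern can be tried in isolation with a sketch like the one below. The procedure buildMessage is hypothetical: it takes a plain list of field names in place of the parsed fields array, and it adds a check for fields that have no supplied value. Because the attribute names become local variables, a form field with the same name as one of the procedure's own variables (msg or name, for instance) would collide with it.

```tcl
# Pair each form field name with the value supplied for it, and complain
# about values that match no field on the page.
proc buildMessage {fieldNames args} {
    set required {}
    foreach {var val} $args {
        set $var $val
        lappend required $var
    }

    set msg {}
    foreach name $fieldNames {
        if {![info exists $name]} {
            error "no value supplied for form field $name"
        }
        lappend msg $name [set $name]
        # Unset the variable to mark this value as used.
        unset $name
    }

    # Any variable that still exists was never requested by the page.
    foreach req $required {
        if {[info exists $req]} {
            error "$req not found in page"
        }
    }
    return $msg
}

puts [buildMessage {reply_to message} \
        reply_to robot@example.org message "Happy birthday"]
```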

The code that invokes sendCard needs to provide values for all <INPUT.. and <TEXTAREA tags, but does not need to provide values for multiple choice buttons like <INPUT TYPE=RADIO...> tags. The sendCard procedure will make a random choice from the available values for RADIO type INPUT tags.
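Choosing one of the available RADIO values at random takes only a line or two of Tcl. This is a sketch rather than the code in sendCard, and the procedure name pickOne and the card names are hypothetical.

```tcl
# Return one randomly chosen element of a list.
proc pickOne {choices} {
    set index [expr {int(rand() * [llength $choices])}]
    return [lindex $choices $index]
}

puts [pickOne {card1 card2 card3}]
```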

1.4.1.2.2 Formatting the POST operation

Once the values for the attributes have been determined, the sendCard procedure formats a POST command and sends it to the www.vietsandiego.com host with the ::http::geturl site -query queryString command.

The site to receive this POST operation is extracted from the ACTION attribute field of the FORM tag.

The queryString is generated with the ::http::formatQuery procedure. This procedure accepts keyword/value pairs and turns them into an HTTP query string, with the appropriate &, = and + punctuation marks inserted.

For example:


% ::http::formatQuery count 2 string "two words"
count=2&string=two+words

Note that the quotes (or braces) are necessary to group multiple words into a single argument for this command.

The foreach field [array names... loop shown in the previous code snippet collects all the attributes and values into a single list. This list is passed to formatQuery to format it into proper form for a POST operation with the command:

set query [eval ::http::formatQuery $msg]

Note that if formatQuery is invoked as

::http::formatQuery $msg

there is only a single argument being passed to formatQuery, and it is formatted as a single string. By using the eval command, the $msg variable is substituted before formatQuery is invoked. This splits the list elements into separate arguments as required by the formatQuery procedure.
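Tcl 8.5 and later offer the {*} expansion syntax as an alternative to eval for this job. The sketch below shows both forms producing the same string; the keys and values are made up for the illustration.

```tcl
package require http

set msg [list count 2 name Pat]

# eval re-parses the command line, so each list element becomes an argument.
set viaEval [eval ::http::formatQuery $msg]

# {*} expands the list into separate arguments without a second parse.
set viaExpand [::http::formatQuery {*}$msg]

puts $viaEval
puts $viaExpand
```

The {*} form avoids a second round of substitution, which matters if a value is not a well-formed list element.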

1.4.2 Parsing the HTML pages

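The HTML parsing code uses different algorithms to extract different types of information from an HTML page. The different algorithms are required because the HTML tags come in two basic flavors: self contained tags such as <INPUT...>, and open/close style tags such as <TABLE> and </TABLE> that mark the start and end of a block of information.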

1.4.2.1 Parsing self contained HTML tags

The robot uses the procedure ExtractFormInfo to extract the information from an HTML form. ExtractFormInfo is defined in the file parse.tcl.

Syntax: ::parse::ExtractFormInfo text actions fields  

::parse::ExtractFormInfo Extracts the data that defines a form and returns the values in a list of actions (if more than one <FORM ACTION=...> definition exists), and an associative array that describes the <INPUT...> and <TEXTAREA...> tags.

The indices of the associative array are named as: type.name, in which type is the type of <INPUT...> item that this entry represents, and name is the value of the NAME= field in this item.

The value associated with these indices depends on the type of the item.

radio.name This entry contains a list of possible return values for the radio item.

input.name This entry contains the maximum length for this item. This value is obtained from the SIZE=... field for <INPUT> tags, and by multiplying the ROWS and COLS attributes for <TEXTAREA...> tags.

text The HTML page to extract form information from.

actions The name of a Tcl variable to receive the list of action URLs.

fields The name of an associative array variable to receive the parsed <INPUT...> and <TEXTAREA...> information.

The ExtractFormInfo code loops through the HTML page, finding the first match to a regular expression, extracting the required flag = value fields from the HTML tag, and then removing the HTML tag from the text.

The values for attributes can be extracted from the string and the tag can be removed from the text with a set of regexp and regsub commands as shown in the example below.

The processing loop in ExtractFormInfo resembles this:


while {[regexp -nocase {<input[^>]*>} $page inputTag]} {

    # extract the interesting attributes

    regexp -nocase {name[ ]*=[ ]*([^ ]*)} $inputTag full name
    regexp -nocase {type[ ]*=[ ]*([^ ]*)} $inputTag full type
    regexp -nocase {size[ ]*=[ ]*([^ ]*)} $inputTag full size
    regexp -nocase {value[ ]*=[ ]*([^ ]*)} $inputTag full value

    # Remove this tag from further consideration

    regsub $inputTag $page "" page
    ...
}
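A self-contained variation on this loop is sketched below. It is not the ExtractFormInfo code: the procedure name listInputTags is hypothetical, it returns only the type.name index strings, and it assumes that a tag with no TYPE attribute is recorded as type input. It removes each tag by position with string replace. Unlike the regsub step, this does not treat the tag text as a regular expression, so characters such as ? or [ in a tag cannot stop the tag from being removed.

```tcl
# Return a list of type.name strings for the <input> tags in a page.
proc listInputTags {page} {
    set tags {}
    while {[regexp -nocase -indices {<input[^>]*>} $page span]} {
        foreach {first last} $span break
        set tag [string range $page $first $last]

        # Assumption: a tag with no TYPE attribute is recorded as "input".
        set name ""
        set type input
        regexp -nocase {name[ ]*=[ ]*"?([^ ">]*)} $tag full name
        regexp -nocase {type[ ]*=[ ]*"?([^ ">]*)} $tag full type
        lappend tags [string tolower $type].$name

        # Remove this tag by position so the loop always makes progress.
        set page [string replace $page $first $last]
    }
    return $tags
}

set page {<FORM ACTION=send.cgi><INPUT NAME=reply_to>
<input type="radio" name="card" value="1?"></FORM>}
puts [listInputTags $page]
```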

The ExtractFormInfo procedure returns data in two variables that are provided by the calling process: a list of possible URLs that can be posted to, and an associative array describing the <INPUT...> and <TEXTAREA...> tags.

The list of cgi scripts that can be invoked by this page is obtained from the <form...> tags. It is simply returned as a list. If there are multiple <form...> tags, the calling procedure needs to know what cgi scripts might be named to determine which URL to respond to.

The type of input tag is encoded into the name of the indices of the associative array in which the possible values are returned. The values for a radio type input are returned as a list of possible choices, and the maximum size is returned as the value for input and textarea.

1.4.2.2 Parsing open/close style HTML tags

The HTML tags that mark the start and end of information, like the <TABLE> tag, are more difficult to parse. While the contents of an INPUT tag are a small, defined set, a TABLE entry may contain any valid HTML or text string. A TABLE entry might even have another TABLE embedded within it.

Tables are extracted from an HTML page with the ExtractAllTables procedure, defined in parse.tcl.

Syntax: ::parse::ExtractAllTables text 

::parse::ExtractAllTables Converts tables in an HTML page into a set of lists. Each table is a list composed of lists. Each list entity within a table is a row, and each list entity within a row is a column.

If a table has an embedded table, the embedded table is a list.

Any text not included within a table is discarded.

The <table...>, <td...>, <tr...> and <th...> strings are discarded.

text The portion of an HTML page, containing one or more tables to convert into a list.

The ExtractAllTables function extracts all the tables from an HTML page, and returns them as a set of lists in which each table is a list, and each row is a list within the table-list, and each column is a list within a row-list.

For instance, the simple table:


  <TABLE>
    <TR>
      <TD> row-1_column-1
      <TD> row-1_column-2
    <TR>
      <TD> row-2_column-1
      <TD> row-2_column-2
  </TABLE>

Would be converted into a list resembling:


{{row-1_column-1} {row-1_column-2}} {{row-2_column-1} {row-2_column-2}}

In this case, the whole list is the table, the two list entities are the two row-lists, and each row-list entity contains two list entities for the two columns.
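For a single table with no nested tables, the conversion can be sketched in a few lines. This is not the ExtractAllTables code: the procedure names splitOnTag and tableToList are hypothetical, only <TR> and <TD> tags are handled, and the text is assumed not to contain the control character \x01 that serves as a temporary delimiter.

```tcl
# Split text at every <tagName ...> tag, discarding text before the first one.
proc splitOnTag {text tagName} {
    regsub -all -nocase "<$tagName\[^>\]*>" $text "\x01" text
    return [lrange [split $text "\x01"] 1 end]
}

# Convert one flat (non-nested) table into a list of row-lists.
proc tableToList {tableText} {
    # Discard the <table...> and </table> markers themselves.
    regsub -all -nocase {</?table[^>]*>} $tableText "" tableText
    set table {}
    foreach row [splitOnTag $tableText tr] {
        set cols {}
        foreach col [splitOnTag $row td] {
            lappend cols [string trim $col]
        }
        lappend table $cols
    }
    return $table
}

set html {
  <TABLE>
    <TR>
      <TD> row-1_column-1
      <TD> row-1_column-2
    <TR>
      <TD> row-2_column-1
      <TD> row-2_column-2
  </TABLE>
}
puts [tableToList $html]
```

In this sketch a cell's content is returned as a plain string; a cell containing spaces would appear in braces when the list is printed.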

Because one table can be embedded within another table, and a form could have multiple consecutive tables, the obvious techniques for quickly extracting the information from the page with regexp or string procedures don't work. I used a variant of the classical compiler technique of searching for the right hand terminator (</TABLE>) and then backtracking to find the corresponding left hand initiator (<TABLE>).

The code that does this starts by stripping out any text before the first <TABLE> and after the last </TABLE>, and then parsing the tables from the beginning, parsing the inner-most tables before outer tables when there are tables within tables.

After removing the extra text, the function looks for the first </TABLE>, and creates a temporary string composed of the text before the first </TABLE> marker.

The next step is to find the last <TABLE> marker in the temporary string, and extract the text between the start and end TABLE markers. This extracts a complete table from the text.

Once a table is extracted from the text, it is split into rows, and each row is split into columns with the splitMulti command (discussed in the next section).

The resulting list is saved, and the table is replaced in the original page with a token denoting which saved list belongs here. The parsing loop then continues to process the text until there are no remaining </TABLE> markers in the text.

When all the tables have been converted to tokens, the text between tokens is deleted, and the tokens are replaced with the appropriate lists. The new list generated by replacing the tokens with lists is returned to the calling procedure.
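The search for the first </TABLE> followed by a backward search for the nearest <TABLE> can be sketched as follows. The procedure names and the @TABLEn@ token format are hypothetical, and the sketch only collects the raw text of each table, innermost first, rather than converting it to lists.

```tcl
# Return the start and end index of the innermost complete table,
# or an empty list if no table remains.
proc innermostTable {html} {
    set lower [string tolower $html]
    set end [string first "</table>" $lower]
    if {$end < 0} { return {} }
    # The matching start tag is the last <table before that terminator.
    set start [string last "<table" [string range $lower 0 $end]]
    if {$start < 0} { return {} }
    # "</table>" is 8 characters long, so it ends at $end + 7.
    return [list $start [expr {$end + 7}]]
}

# Collect every table, replacing each one with a token as it is found.
proc extractTablesInnerFirst {html} {
    set found {}
    while {[llength [set span [innermostTable $html]]]} {
        foreach {start end} $span break
        lappend found [string range $html $start $end]
        set html [string replace $html $start $end "@TABLE[llength $found]@"]
    }
    return $found
}

set page "<TABLE><TR><TD><table><tr><td>inner</table></TABLE>"
foreach table [extractTablesInnerFirst $page] {
    puts $table
}
```

The outer table is returned with a token in place of the inner table, which is the point at which the saved inner list would later be substituted.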

1.4.2.3 New string and list procedures

Tcl has a powerful set of string and list manipulation commands, but it lacks a few features that I needed in order to parse the HTML pages. I added these procedures in listx.tcl and stringx.tcl.

The listx procedures include:

The stringx procedures are mostly shortcuts to using regular expressions. They include:

These procedures simplified writing the parsing procedures in parse.tcl.

1.4.3 Interacting with the NNTP Server

The NNTP interaction code is contained in two files. IP_proto.tcl contains the low level connection opening and closing and dialog procedures, and nntp.tcl contains the code to implement the NNTP commands and parse the resulting input into data structures.

1.4.3.1 The IP_proto.tcl Procedures

The procedures in IP_proto.tcl open a socket connection to a server and handle the primitive functions of sending a command, retrieving a response, and checking the status return.

The IP_proto procedures include:

IPOpen Open a connection to a server
sendText Send a string of text (a single line or multiple lines) to a server and optionally wait for the response.

Most of the code in IP_proto.tcl is straightforward. The interesting procedures are the sendText and GetInput procedures.

To make life simpler for the package using the IP_proto procedures, the prompt-command-reply sequence is generalized into a single sendText command that waits until the reply has been received and then returns the reply text to the procedure that invoked sendText.

The two wrinkles in this interaction are that a reply may be one or more lines of text, and a reply may take an unknown length of time to arrive, depending on network conditions, load on the server, etc.

The reply text generated in response to a command may be a single line (as shown in the previous NNTP example), or multiple lines with a line containing a single period to mark the end of the reply text. The expected number of lines in a reply is known to the code that creates a command string, but is not defined in the low level communication protocol. The code that creates the command must tell the low level code which kind of reply (ONELINE or MULTILINE) to expect, so that the low level code knows when a complete reply has been received.

The sendText procedure accepts an argument to define whether the expected reply will be a single line or multiple lines. Since most of the interactions with a server have single line responses, this argument defaults to a single line.


    proc sendText {text {style ONELINE}} {
        variable protocolIPstate

        set protocolIPstate(mode) $style
        ...

Because the IP_proto procedures may be used with Tk applications, the code cannot block while waiting for a reply.

The vwait command is used to coordinate the sending of a command and the availability of the reply. A fileevent is defined to invoke GetInput to read input when data becomes available.

The fileevent is defined when the socket is opened with the command:


        fileevent $protocolIPstate(channel) readable\
          "[namespace current]::GetInput $protocolIPstate(channel)"

This fileevent command causes GetInput to be invoked whenever there is data available on the socket. The GetInput procedure appends any input to the global state variable protocolIPstate(input), and sets the variable protocolIPstate(isWriteable) when a reply is complete, as shown in the following example.

The GetInput procedure is invoked from the event loop whenever input is available. Because of this event-driven invocation, GetInput has very little state information and can't distinguish between the first line of data (which should replace any previously received lines) and subsequent lines of data (which should be appended to the received text). Clearing the protocolIPstate(input) variable is the responsibility of the procedure that initiates a command sequence (the sendText procedure).

After removing error detecting code, the GetInput procedure resembles this:


    proc GetInput {chan} {
        variable protocolIPstate  

        set line [gets $chan]

        switch "$protocolIPstate(mode)" {
            MULTILINE {
                if {$line == "."} {
                    set protocolIPstate(isWriteable) 1
                }
            }
            ONELINE {
                set protocolIPstate(isWriteable) 1
            }
            default {puts "NO MATCH TO: $protocolIPstate(mode)"}
        }
        append protocolIPstate(input) "$line\n"
    }
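The cooperation between vwait and an event-driven reader can be tried without a news server. In the sketch below, timer events stand in for lines arriving on a socket. The namespace demoProto and the procedures acceptLine and waitForReply are hypothetical stand-ins for GetInput and the waiting half of sendText, not the IP_proto.tcl code.

```tcl
namespace eval ::demoProto {
    variable state
    array set state {mode ONELINE input "" done 0}

    # Event handler: collect one line and flag the reply as complete
    # when the current mode says it is.
    proc acceptLine {line} {
        variable state
        append state(input) "$line\n"
        switch -- $state(mode) {
            MULTILINE { if {$line eq "."} { set state(done) 1 } }
            ONELINE   { set state(done) 1 }
        }
    }

    # Clear the old reply, then run the event loop until a reply is complete.
    proc waitForReply {mode} {
        variable state
        set state(mode) $mode
        set state(input) ""
        set state(done) 0
        vwait [namespace current]::state(done)
        return $state(input)
    }
}

# Timer events play the part of data arriving from the server.
after 10 [list ::demoProto::acceptLine "224 data follows"]
after 20 [list ::demoProto::acceptLine "93720 (overview data)"]
after 30 [list ::demoProto::acceptLine "."]

set reply [::demoProto::waitForReply MULTILINE]
puts -nonewline $reply
```

While vwait is waiting, the event loop keeps running, so a Tk interface in the same application would stay responsive.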

The control flow when a command is sent from a generic nntp.tcl procedure resembles this:

1.4.3.2 The nntp.tcl Procedures

The procedures in nntp.tcl implement the NNTP communication protocol, as described in section 1.3.2. The procedures understand the format of the various NNTP replies, and can parse that data into a set of associative arrays or return the raw replies to the calling program.

The nntp.tcl procedures are:

Open Opens a connection to the NNTP news server.
Group Sends a group command to the server, selecting the newsgroup to process.
Xover Sends an xover command to the server, which retrieves an overview of the available articles.
Article Sends an article command to the server, which retrieves an article from the server.
Post Sends a post command to the server, which posts an article to a newsgroup.
SearchOverView Searches the overview and returns a list of articles or a list of the positions in the overview that match the search criteria.

The Group, Xover, Article and Post commands implement the NNTP commands of the same name. The Open command is not part of the NNTP specification since obtaining a connection to the server is not part of the protocol, but this functionality is required to make a useful package.

The SearchOverView procedure is also not a part of the NNTP communication protocol. This procedure implements the fairly common activity of searching for postings by author, subject, crossreference, etc.

As mentioned previously, nntp commands are implemented in a namespace (::nntp::), while the low level routines are not. This allows the low level routines to be included within the ::nntp:: namespace by placing the package require protocolIP command inside the namespace eval section of the nntp.tcl script. Not embedding the low level protocol procedures in a namespace makes the nntp code slightly simpler, but still hides these commands and variables from the top level application.

1.5 Behavior of the misc.writing Birthday Robot.

The previous sections describe all the subsystems of the robot. The control flow of the robot is straightforward, as shown below. The flow within (birth_bot.tcl) is on the left, and calls out to other libraries and packages are on the right.

As the previous discussion has shown, the tricky parts of this robot (parsing the HTML pages) are extracted into separate procedures. If these procedures find any unexpected condition they generate an error, which will abort the task. At this point a human must fix things.

Since there are many places in which the robot has to deal with human generated data (the HTML pages at vietsandiego.com, postings to the newsgroup, etc.) the robot makes no attempt to recover from unexpected conditions. Humans are just too good at coming up with unexpected ways to confuse a poor, simple robot.

1.6 Summary

There is plenty of room for more work with this robot.

However, at this point, the robot works, and that's pretty good for a robot written with less than 1000 lines of code.

This chapter reproduced with permission from "Tcl/Tk for Real Programmers" (ISBN: 0122612051) published by Academic Press Professional. No further reproduction is permitted without permission from the author.