Misc.writing Birthday Robot.
This robot reads HTML pages, posts news via NNTP and sends birthday
cards. The application might be a bit silly, but it demonstrates how
much you can accomplish with the standard Tcl commands and packages
and a little bit of extra code.
Pat Marcello (one of the readers of the misc.writing
newsgroup) has collected a list of the names and birthdays of regular
posters to the misc.writing newsgroup. Given the list of
birthdays, it seems like a clever idea to post birthday wishes for the
folks born in a given month.
Unfortunately, given the deadlines and schedules that a writer sometimes has to deal with, Pat doesn't always have the time to post birthday greetings.
At this point, I had the idea of writing a robot to take care of the task. The robot could examine the birthdays on Pat's site, find the folks with birthdays in a given month, generate a post from a set of templates, and send out electronic birthday cards.
To send electronic birthday cards, the robot must browse to a Web-based electronic card server and register the recipient for a birthday card. The server then notifies the recipient by E-Mail that a card is waiting for them. The robot will only send cards to people for whom it can identify an E-Mail address.
This chapter discusses how this project was designed and implemented.
This project had several design goals:
misc.writing news group or spamming innocent bystanders.
The requirements for this robot are reasonably simple:
Getting a list of dates and people could also be done by having Pat supply the robot with a pre-formatted list, but that would violate the goal of minimizing the human interaction. Having all the information in one human readable file simplifies the data maintenance.
misc.writing newsgroup to
wish these folks a happy birthday.
As is normal, the simple requirements get a bit more complex when we start nailing down the specifications.
HTML page, parse that page,
and extract the names of readers with birthdays in the current month.
HTTP protocols.
HTML documents
HTML page
based on the current month.
"Active members" are defined as people who have posted within the news pruning period of the MSEN.com news server.
NNTP replies.
The information from the NNTP overview is adequate for finding an email address.
HTTP protocol.
HTML page accessed, newsgroups, etc must be
configurable to allow test versions of these objects to be inserted
into the normal program flow.
Parts of the design and implementation are made easier by the tools
that Tcl provides. Specifically: an HTTP library already
exists (in Tcl revision 8.0 and newer), which makes the
HTTP interaction code simpler; the Tcl channel based I/O
generalization makes it easy to interact via sockets (for the
NNTP engine); and the regular expression and string
support within Tcl make parsing the HTML files relatively
easy.
The subsystem design for this application splits into four segments:
HTTP server.
NNTP server.
The next sections will discuss the design of these subsystems, followed by discussion of the implementation of the subsystems.
HTTP server.
The HTTP protocol is a relatively simple stateless protocol:
an HTTP server responds to a request on socket 80 by
returning an HTML document, which may be the requested
document, or an HTML page with an error message.
The Tcl HTTP library includes several commands for interacting
with an HTTP server. This application uses these
procedures:
Syntax: ::http::geturl url ?options?
::http::geturl
    Send a request to a URL and receive the returned page. The geturl
    procedure returns a tag which can be used with other ::http::
    commands to manipulate the returned data.
    This command is slightly misnamed in that it can perform an HTTP
    POST as well as a GET. It supports several options; the one used
    in this project is -query.
url
    The URL to interact with. The host portion of this may be a raw
    dot-format IP address (10.123.45.67) or a site name (example.com).
Syntax: ::http::data tag
::http::data
    Return the page data associated with a tag.
tag
    A tag returned by ::http::geturl to identify the page.
Syntax: ::http::formatQuery key1 val1 ?key2 val2? ...
::http::formatQuery
    Format a set of key and value pairs into a string for use with the
    ::http::geturl -query option, and return the new string to the
    caller. Spaces in strings will be replaced by +.
key val
    A key and value pair. These will be translated into key=value in
    the returned string, with the pairs separated by &.
NNTP news server
The NNTP protocol (RFC 977) is similar to several other
Internet protocols such as SMTP and POP
that use an interaction pattern of
prompt-command-reply.
The command may be a single line or multiple lines
terminated by a line containing a single period. The reply
to these commands may also be a single line, or multiple lines
terminated by a line with a single period. The reply will always
include a 3 digit status code. These codes use defined values
for reporting command success or failure.
Here is an example of an interaction with an NNTP server:
$> telnet news.example.com nntp
Trying 204.42.224.2...
Connected to news.example.com
Escape character is '^]'.
200 news.example.com DNEWS Version 4.5j, S0, posting OK
group comp.lang.tcl
211 146 93718 93888 comp.lang.tcl selected
xover 93720 93721
224 data follows
93720 Re: Sound Playback in Tcl/Tk! [was: Re: Idea for new extension]
Nat Pryce <np2@doc.ic.ac.uk> Thu, 10 Sep 1998 15:06:57 +0100
<01bddcc1$f4b76360$0400000a@crossbow.lcdmultimedia.com> 2788 38
Xref:
.
group nosuch.group.exists
411 no such news group
The status digits 200, 211 and 224 are all success codes (indicated by the initial digit 2), and the 411 status is a failure (indicated by the initial digit 4).
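The first-digit convention is easy to capture in code. The following is a minimal sketch, not code from nntp.tcl or IP_proto.tcl: the procedure names replyCode and replyClass are hypothetical. The text above only covers the 2 and 4 cases; the treatment of 1, 3 and 5 follows the status classes defined in RFC 977.

```tcl
# Hypothetical helpers, not taken from nntp.tcl or IP_proto.tcl.

# Return the three digit status code at the start of a reply,
# or an empty string if the reply does not start with one.
proc replyCode {reply} {
    if {[regexp {^[0-9][0-9][0-9]} $reply code]} {
        return $code
    }
    return ""
}

# Classify a reply by the first digit of its status code.
proc replyClass {reply} {
    switch -glob -- [replyCode $reply] {
        1?? { return info }
        2?? { return ok }
        3?? { return continue }
        4?? -
        5?? { return failure }
        default { return unknown }
    }
}
```

With these helpers, a caller can test [replyClass $reply] after each command instead of comparing against individual codes.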
While the actual commands vary between the different protocols, the
prompt-command-reply pattern and the meaning of the status
codes remain consistent. This makes it possible to design the
NNTP interaction package as two sets of procedures:
prompt-command-reply sequences.
Most of the new code in this application was written to parse the HTML
pages and news postings. Parsing a set of data to represent the information
to a human is a slightly different problem from parsing a page to
extract information. In the former case, a program is trying to
extract presentation information from an HTML page
or a news article. The robot, however, is trying to extract specific
items of information from this data.
To make the information accessible to a robot, the parse engines extract the data into nested lists or associative arrays. The structure of the list, and the names used for the associative array indices can enable the robots to extract the information they need from the parsed data.
HTML page.
The html_library.tcl package (discussed in chapter 10) is
fine for displaying HTML documents, but does not try to
parse non-presentation information from the document. Rather than try
to adapt the html_library package, I wrote a parse engine
that uses simple heuristics to extract particular sets of information
from an HTML page.
The HTML parse engines will convert a table into a set
of lists, and will convert an HTML page with
<input...> fields into an associative array.
The various types of <input...> tags (radio,
hidden, etc) are denoted with a naming convention that merges
the value of the type= field and the value of the
name= field.
A Usenet News article consists of a header with several required fields and zero or more optional fields, and a message body.
The news parser will convert a news article (or an xover
article overview) into an associative array. The indices of the
array are the names of the fields (from, subject, etc),
and the contents of each array index are the contents of that field.
The body of an article is placed in the body index.
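As an illustration of this layout, here is a minimal sketch of such a parser. It is not the robot's own parser: the procedure name parseArticle is hypothetical, and it assumes that header names are folded to lower case and that indented lines continue the previous header field.

```tcl
# Hypothetical sketch of a news article parser.
# Fills the caller's array with one index per header field plus "body".
proc parseArticle {text arrayName} {
    upvar 1 $arrayName art
    catch {unset art}
    set inHeader 1
    set last ""
    set body {}
    foreach line [split $text \n] {
        if {!$inHeader} {
            lappend body $line
        } elseif {$line eq ""} {
            # The first blank line separates the header from the body.
            set inHeader 0
        } elseif {[regexp {^[ \t]} $line] && $last ne ""} {
            # An indented line continues the previous header field.
            append art($last) " " [string trim $line]
        } elseif {[regexp {^([^:]+):[ \t]*(.*)$} $line match name value]} {
            set last [string tolower $name]
            set art($last) $value
        }
    }
    set art(body) [join $body \n]
}
```

After a call such as parseArticle $text art, the sender is available as $art(from) and the message text as $art(body).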
Creating and manipulating the birthday messages is a fairly simple
operation compared to interacting with HTTP and
NNTP servers or parsing HTML pages.
The technique used in this subsystem is to create several boilerplate messages which contain fields to be replaced with the appropriate data for each month. The two replacement fields are the month, and the list of people with birthdays this month.
The Tcl regsub command can replace all occurrences of one
string with another, which makes the message subset of the package
quite simple.
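A sketch of this substitution is shown below. The marker strings %MONTH% and %NAMES% and the procedure name fillTemplate are assumptions made for the example; the markers used in the robot's real templates are not shown in this chapter. The extra regsub protects & and \ in the data, since regsub treats those characters specially in a replacement string.

```tcl
# Hypothetical sketch: fill a boilerplate message with the month
# and the list of people who have birthdays in that month.
proc fillTemplate {template month names} {
    foreach {marker value} [list %MONTH% $month %NAMES% [join $names ", "]] {
        # Escape & and \ so regsub inserts the value literally.
        regsub -all {[\\&]} $value {\\&} value
        regsub -all $marker $template $value template
    }
    return $template
}
```

For example, fillTemplate "Happy birthday to %NAMES%!" June {Ann Bob} produces a greeting with both names joined by a comma.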
The misc.writing birthday robot is implemented in several script
files. The primary flow and the interaction with the birthday URL is
contained in birth_bot.tcl and the interaction with the
electronic card server is implemented in sendCard.tcl. The
general purpose parsing and string manipulation commands are contained
in other files as shown below.
The program files used by the birthday robot are:
birth_bot.tcl
    The main entry point and functions specific to this robot.
sendCard.tcl
    Contains a procedure to send an E-Mail birthday card via the
    vietsandiego.com E-Mail card server.
nntp.tcl
    Contains procedures to interact with an NNTP server.
IP_proto.tcl
    Contains procedures that provide low level interactions with a
    server that uses the prompt-response style of Internet
    communication protocol.
parse.tcl
    Contains procedures that parse an HTML page.
listx.tcl
    Contains procedures with extended list commands.
stringx.tcl
    Contains procedures with extended string commands.
The procedures implemented in nntp.tcl and
parse.tcl are maintained within namespaces to protect the
applications that use these packages from namespace pollution and
collisions.
The procedures in listx.tcl, stringx.tcl, and
IP_proto.tcl are not embedded within namespaces.
These procedures are designed to be merged into other packages, and the
other packages will create any required namespaces.
Because these procedures are expected to be embedded within a namespace,
the code uses variable instead of global for
the state variables, and uses the namespace current
command to wrap callback procedures.
In this application the listx.tcl and
stringx.tcl procedures are nested inside the
parse namespace, and the IP_proto.tcl
procedures are nested inside the nntp namespace.
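The effect of loading namespace-neutral code inside a namespace can be seen in a small experiment. This is only a sketch: the helper script is held in a variable instead of a file, and the names helperScript, countCalls and callbackName are hypothetical. With real files, the namespace eval body would contain a source command instead.

```tcl
# A stand-in for a file such as listx.tcl: the procedures are
# defined without naming any namespace.
set helperScript {
    proc countCalls {} {
        variable calls      ;# lives in whichever namespace holds the proc
        if {![info exists calls]} { set calls 0 }
        return [incr calls]
    }
    proc callbackName {} {
        # A fully qualified name, safe to hand to fileevent or after.
        return [namespace current]::countCalls
    }
}

# Embed the same code in two packages. Each gets private copies
# of the procedures and of the state variable.
namespace eval parse $helperScript
namespace eval nntp  $helperScript
```

Because the code uses variable and namespace current, the two copies keep separate state, and nothing is added to the global namespace.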
HTTP server
There are two sets of HTTP interactions in this package:
HTML page with the list of birthdays.
HTML page with birthday list.
Getting an HTML page with the ::http::
package uses the HTTP GET operation. This
is simple enough that for most applications you can just put the code
inline.
Because I would need to modify the page to test the package behavior, I
put the code to get the URL into a separate procedure, with a check to
see if a test file or the actual URL should be loaded. This lets me
modify a local copy of the HTML page and load that for testing.
Loading the local copy is also significantly faster, which doesn't
hurt in the development and debugging phases.
As work on this robot progressed I discovered that HTML
pages generated by Microsoft Publisher 97 can have the high order bit
turned on for some space characters. This changes the space character
from a hex value of 0x20 to the value 0xA0.
The Tcl string commands understand that a space is 0x20,
and treat 0xA0 as a non-space character. This can make
parsing words tricky. Fortunately, the Tcl binary and
regsub commands make it easy to replace all the
0xA0 bytes with 0x20 bytes. Having a
separate function for loading pages provided a centralized location to
apply this fix to the data before parsing it.
proc getPage {url} {
    global birthdayBot
    if {![string match "" $birthdayBot(testPage)]} {
        # Testing: read a local copy of the page instead of the URL.
        set fd [open $birthdayBot(testPage) r]
        set pg [read $fd]
        close $fd
    } else {
        set id [::http::geturl $url]
        set pg [::http::data $id]
    }
    #
    # Strip out the 0xA0 characters and replace them with real spaces
    # before we let this page be used.
    set sub [binary format H2 a0]
    regsub -all $sub $pg " " pg
    return $pg
}
The Email Birthday Card server interactions use both the HTTP
GET (to get the card description form) and POST (to
send the contents of the card to the server) operations.
The flow of the card sending procedure is shown below. The flow within
the sendCard procedure is shown on the left, and the
calls to other packages is on the right.

The flow shown above is used when talking with the
www.vietsandiego.com E-Mail card server. The
www.vietsandiego.com card server follows a fairly common
pattern for HTML pages that need to get user input before
they perform some action.
The HTML form requests that the user fill in several
INPUT tags, and then send the data to the server with a
HTTP POST operation. The input values are checked and if
all fields are valid a preview is presented to the user. If the user
confirms the preview with another POST operation, the
server performs the requested operation.
Since a number of sites use a sequence of events similar to this (or
simpler), I generalized the procedure so that it would be easy to use
to talk with other sites. The sendCard procedure is
actually a generic "Fill in a form and ship it" procedure, with most of
the information about what the forms contain in the code that
invokes sendCard.
An HTML form includes one or more
<INPUT...> tags. Each of these tags has an identifier
defined with the NAME=... attribute.
The code that calls sendCard must know the values of the
NAME attributes, and must call sendCard with
the attribute names and values as call arguments. The
sendCard procedure accepts attribute/value pairs as
arguments in the form attribute1 value1 attribute2 value2.
The sendCard procedure uses these attribute names as the
names for the Tcl variables that contain values to be assigned to those
INPUT tags.
For example, if the HTML form includes an input tag
resembling:
<INPUT NAME=reply_to>
The invocation of sendCard would resemble:
sendCard http://www.example.com card.html reply_to robot@example.org
Within the sendCard procedure, a variable named
reply_to would be assigned the value
robot@example.org.
The implementation of sendCard uses the args
argument definition to allow any number of variable name and value
pairs to be assigned.
proc sendCard {baseUrl cardUrl args} {
    set required {}
    foreach {var val} $args {
        eval [list set $var $val]
        lappend required $var
    }
    ...
}
There are two things to notice in this code snippet:
The list command is used with the arguments to eval. The eval command
will concatenate its arguments into a string before evaluating them.
If the arguments are not grouped with the list
command, multiple word arguments will become multiple words, and the
command will produce unexpected results (usually an error).
Each variable name is appended to the list
required. This list of required attributes will be used
as a sanity check to confirm that the expected attributes were found in
the HTML form.
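The difference that list makes is easy to demonstrate. This is a small stand-alone sketch; the variable names greeting, failed and message are chosen for the example only.

```tcl
set var greeting
set val "happy birthday"

# Without list, eval concatenates its arguments and sees
#   set greeting happy birthday
# which has one word too many, so the command fails.
set failed [catch {eval set $var $val} message]

# With list, the value stays a single word:
#   set greeting {happy birthday}
eval [list set $var $val]
```

After this runs, failed is 1 and greeting holds the full two word value. In this simple case a plain set $var $val would also work; the eval form matters when the command itself is being assembled from pieces.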
The code below shows how the INPUT tags are processed. The
indices of the associative array variable fields are
named using the pattern type.name where
type is the value in the TYPE=...
attribute of the HTML tag, and name is
the value of the NAME=... attribute of the
HTML tag.
For each INPUT tag the NAME attribute is
extracted, and that name is used to get the value from the local
variable with that name. The attribute name and the value in the
variable are appended to the list of key/value pairs that will be used
as an argument to formatQuery. Once the value has been
extracted from a local variable, the local variable is unset.
After the INPUT tags have all been processed, the list of
required fields is checked against the local variables. If any of
these local variables still exist, it means that something has been
changed in the HTML form, and these attribute/value pairs
are no longer valid. The HTML form should be examined and
the call arguments modified to reflect the new HTML form.
#
# Get the input fields, and assign the values from the command line
# Unset the variables as they are used to mark which fields are
# identified.
#
foreach field [array names fields input.*] {
    set name [lindex [split $field "."] 1]
    lappend msg $name [set $name]
    unset $name
}
#
# Check that the expected required fields were found in the page.
# This is a sanity check that the card server hasn't changed the
# page beyond recognition.
#
foreach req $required {
    if {[info exists $req]} {
        error "$req not found in page - "
    }
}
The code that invokes sendCard needs to provide values for
all <INPUT.. and <TEXTAREA tags, but does
not need to provide values for multiple choice buttons like
<INPUT TYPE=RADIO...> tags. The sendCard
procedure will make a random choice from the available values for
RADIO type INPUT tags.
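Picking a random value from a list of choices takes only a couple of lines. The following is a sketch of one way to do it; the procedure name randomChoice is hypothetical, and the code that sendCard actually uses is not shown in this chapter.

```tcl
# Hypothetical helper: return one element of a list, chosen at random.
proc randomChoice {choices} {
    # rand() returns a value in [0,1), so the index is always valid
    # for a non-empty list.
    set index [expr {int(rand() * [llength $choices])}]
    return [lindex $choices $index]
}
```

Given the list of values collected for a RADIO group, randomChoice returns the value to include in the query.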
POST operation
Once the values for the attributes have been determined, the
sendCard procedure formats a POST command and
sends it to the www.vietsandiego.com host with the
::http::geturl site -query queryString
command.
The site to receive this POST operation is
extracted from the ACTION attribute field of the
FORM tag.
The queryString is generated with the
::http::formatQuery procedure. This procedure accepts
keyword/value pairs and turns them into a query string, with
the appropriate &, = and +
punctuation marks inserted.
For example:
% ::http::formatQuery count 2 string "two words"
count=2&string=two+words
Note that the quotes (or braces) are necessary to group multiple words into a single argument for this command.
The foreach field [array names... loop shown in the
previous code snippet collects all the attributes and values into a
single list. This list is passed to formatQuery to format
it into proper form for a POST operation with the command:
set query [eval ::http::formatQuery $msg]
Note that if formatQuery is invoked as
::http::formatQuery $msg
there is only a single argument being passed to
formatQuery, and it is formatted as a single string. By
using the eval command, the $msg variable is
substituted before formatQuery is invoked. This splits
the list elements into separate arguments as required by
the formatQuery procedure.
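The following sketch compares the eval form with argument expansion. It assumes Tcl 8.5 or later, where the {*} syntax is available; the variable names are chosen for the example only.

```tcl
package require http

# A list of attribute names and values, as the foreach loop builds it.
set msg [list count 2 name robot]

# The form used in this chapter: eval splits the list into arguments.
set viaEval [eval ::http::formatQuery $msg]

# Tcl 8.5 and later: {*} expands the list into separate arguments
# without passing the text through the parser a second time.
set viaExpand [::http::formatQuery {*}$msg]
```

Both variables end up holding the same query string. The {*} form is the safer habit in newer code, since the list elements are never re-parsed as script text.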
HTML pages
The HTML parsing code uses different algorithms to extract
different types of information from an HTML page. The
different algorithms are required because the HTML tags
come in two basic flavors:
Tags like <IMG src=foo.gif> and
<INPUT name=foo> are self contained. The
text associated with this tag doesn't have any information for
these robots.
Tags that define entities like tables and lists contain the useful information in the text between the tags. The presentation instructions embedded in these tags don't matter to the robot.
HTML tags
The robot uses the procedure ExtractFormInfo to extract
the information from an HTML form.
ExtractFormInfo is defined in the file
parse.tcl.
Syntax: ::parse::ExtractFormInfo text actions fields
::parse::ExtractFormInfo
    Extracts the data that defines a form and returns the values in a
    list of actions (if more than one <FORM ACTION=...> definition
    exists), and an associative array that describes the <INPUT...>
    and <TEXTAREA...> tags.
    The indices of the associative array are named using the pattern
    type.name. The value associated with these indices depends on the
    type of the item.
text
    The HTML page to extract form information from.
actions
    The name of a Tcl variable to receive the list of action URLs.
fields
    The name of an associative array variable to receive the parsed
    <INPUT...> and <TEXTAREA...> information.
The ExtractFormInfo code loops through the
HTML page, finding the first match to a regular expression,
extracting the required flag = value fields from the
HTML tag, and then removing the HTML tag from
the text.
The values for attributes can be extracted from the string and the tag
can be removed from the text with a set of regexp and regsub commands
as shown in the example below.
The processing loop in ExtractFormInfo resembles this:
while {[regexp -nocase {<input[^>]*>} $page inputTag]} {
    # Extract the interesting attributes. A value ends at a space
    # or at the closing > of the tag.
    regexp -nocase {name[ ]*=[ ]*([^ >]*)} $inputTag full name
    regexp -nocase {type[ ]*=[ ]*([^ >]*)} $inputTag full type
    regexp -nocase {size[ ]*=[ ]*([^ >]*)} $inputTag full size
    regexp -nocase {value[ ]*=[ ]*([^ >]*)} $inputTag full value
    # Remove this tag from further consideration. Matching the pattern
    # again avoids treating the tag text as a regular expression.
    regsub -nocase {<input[^>]*>} $page "" page
    ...
}
The ExtractFormInfo procedure returns data in two
variables that are provided by the calling process: a list of possible
URLs that can be posted to, and an associative array
describing the <INPUT... and
<TEXTAREA... tags.
The list of cgi scripts that can be invoked by this page is obtained
from the <form...> tags. It is simply returned as a
list. If there are multiple <form...> tags, the
calling procedure needs to know what cgi scripts might be named to
determine which URL to respond to.
The type of input tag is encoded into the name of the indices of the
associative array in which the possible values are returned. The
values for a radio type input are returned as
a list of possible choices, and the maximum size is returned as
the value for input and textarea.
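To make the shape of the returned array concrete, here is a much reduced sketch of the idea. It is not the ExtractFormInfo source. The names getAttr and extractInputs are hypothetical, the sketch assumes Tcl 8.5 or later, and it assumes that a tag with no TYPE attribute is stored under an index of the form input.name, to match the array names fields input.* loop shown earlier.

```tcl
# Hypothetical helper: return the value of one attribute of a tag.
# Handles attr=value and attr="value"; returns default if absent.
proc getAttr {tag attrName {default ""}} {
    set quoted   [format {\m%s\s*=\s*"([^"]*)"} $attrName]
    set unquoted [format {\m%s\s*=\s*([^\s">]+)} $attrName]
    if {[regexp -nocase -- $quoted $tag match value] ||
        [regexp -nocase -- $unquoted $tag match value]} {
        return $value
    }
    return $default
}

# Hypothetical sketch: fill the caller's array with one index per
# input field, named type.name.
proc extractInputs {page arrayName} {
    upvar 1 $arrayName fields
    while {[regexp -nocase -indices {<input[^>]*>} $page span]} {
        lassign $span first last
        set tag [string range $page $first $last]
        set type [string tolower [getAttr $tag type input]]
        set name [getAttr $tag name]
        if {$type eq "radio"} {
            # Collect every VALUE offered for this radio group.
            lappend fields($type.$name) [getAttr $tag value]
        } else {
            # Other fields record their SIZE (empty if none is given).
            set fields($type.$name) [getAttr $tag size]
        }
        # Remove this tag so the loop finds the next one.
        set page [string replace $page $first $last]
    }
}
```

For a form with one text field named reply_to and two radio buttons named card, the array would contain the indices input.reply_to and radio.card, with the second holding a two element list of choices.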
HTML tags
The HTML tags that mark the start and end of information,
like the <TABLE> tag, are more difficult to parse.
While the contents of an INPUT tag are a small, defined
set, a TABLE entry may contain any valid
HTML or text string. A TABLE entry might
even have another TABLE embedded within it.
Tables are extracted from an HTML
page with the ExtractAllTables procedure, defined in
parse.tcl.
Syntax: ::parse::ExtractAllTables text
::parse::ExtractAllTables
    Converts tables in an HTML page into a set of lists. Each table is
    a list composed of lists. Each list entity within a table is a
    row, and each list entity within a row is a column.
    If a table has an embedded table, the embedded table is a list.
    Any text not included within a table is discarded. The
    <table...>, <td...>, <tr...> and <th...> strings are discarded.
text
    The portion of an HTML page, containing one or more tables, to
    convert into a list.
The ExtractAllTables function extracts all the tables from
an HTML page, and returns them as a set of lists in which
each table is a list, and each row is a list within the table-list, and
each column is a list within a row-list.
For instance, the simple table:
<TABLE>
<TR>
<TD> row-1_column-1
<TD> row-1_column-2
<TR>
<TD> row-2_column-1
<TD> row-2_column-2
</TABLE>
Would be converted into a list resembling:
{{row-1_column-1} {row-1_column-2}} {{row-2_column-1} {row-2_column-2}}
In this case, the whole list is the table, the two list entities are the two row-lists, and each row-list entity contains two list entities for the two columns.
Because one table can be embedded within another table, and a form could
have multiple consecutive tables, the obvious techniques for quickly
extracting the information from the page with regexp or
string procedures don't work. I used a variant of
the classical compiler technique of searching for the right hand
terminator (</TABLE>) and then backtracking to find the
corresponding left hand initiator (<TABLE>).
The code that does this starts by stripping out any text before the
first <TABLE> and after the last
</TABLE>, and then parsing the tables from the
beginning, parsing the inner-most tables before outer tables when
there are tables within tables.
After removing the extra text, the function looks for the first
</TABLE>, and creates a temporary string composed of
the text before the first </TABLE> marker.
The next step is to find the last <TABLE> marker in
the temporary string, and extract the text between the start and end
TABLE markers. This extracts a complete table from the text.
Once a table is extracted from the text, it is split into rows, and
each row is split into columns with the splitMulti command
(discussed in the next section).
The resulting list is saved, and the table is replaced in the original
page with a token denoting which saved list belongs here. The parsing
loop then continues to process the text until there are no remaining
</TABLE> markers in the text.
When all the tables have been converted to tokens, the text between tokens is deleted, and the tokens are replaced with the appropriate lists. The new list generated by replacing the tokens with lists is returned to the calling procedure.
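The steps above can be sketched in a few dozen lines. This is an illustration of the technique, not the ExtractAllTables source: the names extractTables and tableToList, the @@TABLEn@@ token format, and the stand-in splitMulti are all assumptions, and the sketch assumes that the character \x01 and the token text never occur in the page.

```tcl
# Stand-in for splitMulti: split text wherever the pattern matches.
proc splitMulti {regularExpression text} {
    regsub -all -nocase -- $regularExpression $text \x01 text
    return [split $text \x01]
}

# Convert the text between <TABLE...> and </TABLE> into a list of rows.
proc tableToList {inner saved} {
    regsub -all -nocase {</t[drh]>} $inner "" inner
    set rows {}
    # Element 0 is the text before the first <TR>, so skip it.
    foreach rowText [lrange [splitMulti {<tr[^>]*>} $inner] 1 end] {
        set cols {}
        foreach cell [lrange [splitMulti {<t[dh][^>]*>} $rowText] 1 end] {
            set cell [string trim $cell]
            # A cell holding only a token is an embedded table.
            if {[regexp {^@@TABLE([0-9]+)@@$} $cell match n]} {
                set cell [lindex $saved $n]
            }
            lappend cols $cell
        }
        lappend rows $cols
    }
    return $rows
}

proc extractTables {page} {
    set saved {}
    while {1} {
        set lower [string tolower $page]
        # Find the first right hand terminator...
        set end [string first "</table>" $lower]
        if {$end < 0} break
        # ...then backtrack to the nearest left hand initiator.
        set start [string last "<table" $lower $end]
        if {$start < 0} { error "</TABLE> without <TABLE>" }
        set tagEnd [string first ">" $page $start]
        set inner [string range $page [expr {$tagEnd + 1}] [expr {$end - 1}]]
        lappend saved [tableToList $inner $saved]
        # Replace the whole table with a token naming the saved list.
        set token "@@TABLE[expr {[llength $saved] - 1}]@@"
        set page [string replace $page $start [expr {$end + 7}] $token]
    }
    # Only top level tables still have tokens in the page text.
    set tables {}
    foreach {match n} [regexp -all -inline {@@TABLE([0-9]+)@@} $page] {
        lappend tables [lindex $saved $n]
    }
    return $tables
}
```

Because the first terminator always belongs to an innermost table, embedded tables are converted before the tables that contain them, and an outer cell that holds a token receives the already converted list.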
string and list procedures
Tcl has a powerful set of string and list manipulation commands, but
it lacks a few features that I needed in order to parse the
HTML pages. I added these procedures in
listx.tcl and stringx.tcl
The listx procedures include:
splitMulti regularExpression text
The parse engines need to split text into a list at multiple-character
strings, as well as the single character splits supported by
the split command.
This procedure allows the parse engines to convert a table into a list
of rows by splitting on the <TR>, and to convert
each row into a list of columns by splitting on
<TD>.
lsearchAll list globPattern
The lsearchAll command is similar to lsearch,
except that it returns a list of the locations of all the matches
of a pattern in a list instead of the location of the first match.
trimList list ?trimValue?
The trimList command will trim each entry in a list. This
is the equivalent of
foreach element $list {
    set l [string trim $element]
    lappend newList $l
}
getListElement list globPattern
The getListElement procedure merges the lsearch
and lindex commands to find a list element and return the
element instead of the index.
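For readers who want to experiment, here are minimal sketches of two of these procedures. They follow the descriptions above but are not the listx.tcl source; in particular, returning an empty string when nothing matches is an assumption. Tcl 8.4 and later also offer lsearch -all and lsearch -inline, which cover much of the same ground.

```tcl
# Sketch: return the positions of every element matching a glob pattern.
proc lsearchAll {list globPattern} {
    set positions {}
    set index 0
    foreach element $list {
        if {[string match $globPattern $element]} {
            lappend positions $index
        }
        incr index
    }
    return $positions
}

# Sketch: return the first matching element instead of its index.
# Returns an empty string if nothing matches (an assumption).
proc getListElement {list globPattern} {
    set index [lsearch -glob $list $globPattern]
    if {$index < 0} {
        return ""
    }
    return [lindex $list $index]
}
```

With the list {apple banana avocado}, lsearchAll with the pattern a* returns the positions 0 and 2, and getListElement with the pattern b* returns banana.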
The stringx procedures are mostly shortcuts to
using regular expressions. They include:
stringx_FindAll regularExpression text
As with lsearchAll, this procedure returns a list
of all the matches to a regular expression in a string.
stringx_Count regularExpression text
Returns the number of times a regular expression occurs in a string.
stringx_Before regularExpression text
Returns the text that comes before a regular expression.
stringx_After regularExpression text
Returns the text that comes after a regular expression.
stringx_Between regularExpression1 regularExpression2 text
Returns the text that comes between two regular expressions.
These procedures simplified writing the parsing procedures in
parse.tcl.
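Each of these shortcuts is a thin wrapper around regexp. The sketches below follow the descriptions above but are not the stringx.tcl source. Returning an empty string when the pattern does not match is an assumption, since the chapter does not say what the real procedures do in that case.

```tcl
# Sketch: all matches of a regular expression in a string.
proc stringx_FindAll {regularExpression text} {
    return [regexp -all -inline -- $regularExpression $text]
}

# Sketch: number of matches of a regular expression in a string.
proc stringx_Count {regularExpression text} {
    return [regexp -all -- $regularExpression $text]
}

# Sketch: the text before the first match ("" if there is no match).
proc stringx_Before {regularExpression text} {
    if {![regexp -indices -- $regularExpression $text span]} {
        return ""
    }
    return [string range $text 0 [expr {[lindex $span 0] - 1}]]
}

# Sketch: the text after the first match ("" if there is no match).
proc stringx_After {regularExpression text} {
    if {![regexp -indices -- $regularExpression $text span]} {
        return ""
    }
    return [string range $text [expr {[lindex $span 1] + 1}] end]
}

# Sketch: the text between a match of the first expression and the
# next match of the second one.
proc stringx_Between {regularExpression1 regularExpression2 text} {
    return [stringx_Before $regularExpression2 \
                [stringx_After $regularExpression1 $text]]
}
```

For example, stringx_Between {<b>} {</b>} applied to the text a<b>bold</b>c returns bold.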
NNTP Server
The NNTP interaction code is contained in two files.
IP_proto.tcl contains the low level connection opening and
closing and dialog procedures, and nntp.tcl contains the
code to implement the NNTP commands and parse the
resulting input into data structures.
IP_proto.tcl Procedures
The procedures in IP_proto.tcl open a socket connection
to a server and handle the primitive functions of sending a command,
retrieving a response, and checking the status return.
The IP_proto procedures include:
IPOpen
    Open a connection to a server.
sendText
    Send a string of text (a single line or multiple lines) to a
    server and optionally wait for the response.
Most of the code in IP_proto.tcl is straightforward. The
interesting procedures are the sendText and
GetInput procedures.
To make life simpler for the package using the IP_proto
procedures, the prompt-command-reply sequence is
generalized into a single sendText command that waits
until the reply has been received and then returns the
reply text to the procedure that invoked
sendText.
The two wrinkles in this interaction are that a reply may be one or more lines of text, and a reply may take an unknown length of time to arrive, depending on network conditions, load on the server, etc.
The reply text generated in response to a
command may be a single line (as shown in the previous
NNTP example), or multiple lines with a line containing a
single period to mark the end of the reply text. The
expected number of lines in a reply is known to the code
that creates a command string, but is not
defined in the low level communication protocol. The code that
creates the command must tell the low level code which kind of
reply (ONELINE or MULTILINE) to expect,
so that the low level code knows when a complete reply
has been received.
The sendText procedure accepts an argument to define
whether the expected reply will be a single line or multiple lines.
Since most of the interactions with a server have single line
responses, this argument defaults to a single line.
proc sendText {text {style ONELINE}} {
    variable protocolIPstate
    set protocolIPstate(mode) $style
    ...
Because the IP_proto procedures may be used with Tk
applications, the code can not block while waiting for a reply.
The vwait command is used to coordinate the sending of a
command and the availability of the reply. A fileevent
is defined to invoke GetInput to read input when data
becomes available.
The fileevent is defined when the socket is opened with
the command:
fileevent $protocolIPstate(channel) readable \
    [list [namespace current]::GetInput $protocolIPstate(channel)]
This fileevent command causes GetInput to be
invoked whenever there is data available on the socket. The
GetInput procedure appends any input to the global state
variable protocolIPstate(input), and sets the variable
protocolIPstate(isWriteable) when a reply is complete,
as shown in the following example.
The GetInput procedure is invoked from the event loop
whenever input is available. Because of this event-driven invocation,
GetInput has very little state information and can't
distinguish between the first line of data (which should replace any
previously received lines) and subsequent lines of data (which should
be appended to the received text). Clearing the
protocolIPstate(input) variable is the responsibility of
the procedure that initiates a command sequence (the
sendText procedure).
After removing error detecting code, the GetInput procedure
resembles this:
proc GetInput {chan} {
    variable protocolIPstate
    set line [gets $chan]
    switch "$protocolIPstate(mode)" {
        MULTILINE {
            if {$line == "."} {
                set protocolIPstate(isWriteable) 1
            }
        }
        ONELINE {
            set protocolIPstate(isWriteable) 1
        }
        default {puts "NO MATCH TO: $protocolIPstate(mode)"}
    }
    append protocolIPstate(input) "$line\n"
}
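The way sendText, vwait and the fileevent handler cooperate can be tried out without a news server. The following is a self-contained sketch, not the IP_proto.tcl source. Everything in it is hypothetical: the demo namespace, the state array, the toy server procedures accept and answer, and the five second timeout, which the chapter does not mention. It assumes that the loopback address 127.0.0.1 is usable.

```tcl
namespace eval demo {
    variable state
    array set state {channel "" mode ONELINE input "" ready 0}

    proc connect {host port} {
        variable state
        set state(channel) [socket $host $port]
        fconfigure $state(channel) -buffering line -blocking 0
        fileevent $state(channel) readable \
            [list [namespace current]::GetInput $state(channel)]
    }

    # Collect reply lines and flag when the reply is complete.
    proc GetInput {chan} {
        variable state
        if {[gets $chan line] < 0} {
            if {[eof $chan]} {
                fileevent $chan readable {}
                set state(ready) eof
            }
            return
        }
        append state(input) "$line\n"
        if {$state(mode) eq "ONELINE" || $line eq "."} {
            set state(ready) 1
        }
    }

    # Send a command, then run the event loop until the reply is in.
    proc sendText {text {style ONELINE}} {
        variable state
        set state(mode) $style
        set state(input) ""
        set state(ready) 0
        puts $state(channel) $text
        flush $state(channel)
        set timer [after 5000 \
            [list set [namespace current]::state(ready) timeout]]
        vwait [namespace current]::state(ready)
        after cancel $timer
        return $state(input)
    }
}

# A toy server so that the sketch can run on its own.
proc accept {chan addr port} {
    fconfigure $chan -buffering line
    fileevent $chan readable [list answer $chan]
}
proc answer {chan} {
    if {[gets $chan line] < 0} { close $chan; return }
    if {$line eq "list"} {
        puts $chan "215 list follows\nalt.test\nmisc.writing\n."
    } else {
        puts $chan "200 ok"
    }
}

set server [socket -server accept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $server -sockname] 2]
demo::connect 127.0.0.1 $port
set one  [demo::sendText "mode reader"]
set many [demo::sendText "list" MULTILINE]
```

Because vwait runs the event loop, the toy server and the client handler take turns inside one process. The single line reply ends up in one, and the four line reply, including its terminating period, in many.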
The control flow when a command is sent from a generic nntp.tcl
procedure resembles this:

The procedures in nntp.tcl implement the NNTP
communication protocol, as described in section 1.3.2. The procedures
understand the format of the various NNTP replies, and can
parse that data into a set of associative arrays or return the raw
replies to the calling program.
The nntp.tcl procedures are:
Open
    Opens a connection to the NNTP news server.
Group
    Sends a group command to the server, selecting the newsgroup to
    process.
Xover
    Sends an xover command to the server, which retrieves an overview
    of the available articles.
Article
    Sends an article command to the server, which retrieves an article
    from the server.
Post
    Sends a post command to the server, which posts an article to a
    newsgroup.
SearchOverView
    Searches the overview and returns a list of articles or a list of
    the positions in the overview that match a search criterion.
The Group, Xover, Article and Post commands
implement the NNTP commands of the same name. The
Open command is not part of the NNTP
specification since obtaining a connection to the server is not part of
the protocol, but this functionality is required to make a useful
package.
The SearchOverView procedure is also not a part of the
NNTP communication protocol. This procedure implements the
fairly common activity of searching for postings by author,
subject, crossreference, etc.
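A search of this kind can be sketched as follows. This is not the SearchOverView source: the procedure names, the array index names and the glob style matching are assumptions. The field order is the usual one for xover replies (article number, subject, author, date, message id, references, byte count, line count), with the fields separated by tabs.

```tcl
# Hypothetical sketch: split one overview line into an array.
proc parseOverviewLine {line arrayName} {
    upvar 1 $arrayName item
    set names {number subject from date message-id references bytes lines}
    foreach name $names value [split $line \t] {
        # Extra trailing fields (such as Xref:) have no name here.
        if {$name ne ""} {
            set item($name) $value
        }
    }
}

# Hypothetical sketch: return the article numbers whose given field
# matches a glob pattern, ignoring case.
proc searchOverview {overview field globPattern} {
    set hits {}
    foreach line $overview {
        parseOverviewLine $line item
        if {[string match -nocase $globPattern $item($field)]} {
            lappend hits $item(number)
        }
    }
    return $hits
}

# Hypothetical sketch: pull an E-Mail address out of a From field.
proc emailOf {from} {
    if {[regexp {[^ <>()"]+@[^ <>()"]+} $from address]} {
        return $address
    }
    return ""
}
```

Here overview is a list of overview lines, one per article. A call such as searchOverview $overview from *Pryce* returns the numbers of the articles posted by that author, and emailOf extracts the address part of a From field.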
As mentioned previously, nntp commands are implemented in
a namespace (::nntp::), while the low level routines are
not. This allows the low level routines to be included within the
::nntp:: namespace by placing the package require
protocolIP command inside the namespace eval
section of the nntp.tcl script. Not embedding the
low level protocol procedures in a namespace makes the nntp
code slightly simpler, but still hides these commands and variables
from the top level application.
misc.writing Birthday Robot.
The previous sections describe all the subsystems of the robot. The
control flow of the robot is straightforward, as shown below. The flow
within (birth_bot.tcl) is on the left, and calls out to
other libraries and packages are on the right.

As the previous discussion has shown, the tricky parts of this robot
(parsing the HTML pages) are extracted into separate
procedures. If these procedures find any unexpected condition they
generate an error, which will abort the task. At this point a human must
fix things.
Since there are many places in which the robot has to deal with human
generated data (the HTML pages at
vietsandiego.com, postings to the newsgroup, etc.) the robot makes
no attempt to recover from unexpected conditions. Humans are
just too good at coming up with unexpected ways to confuse a poor,
simple robot.
There is plenty of room for more work with this robot.
dejanews for the E-Mail addresses, instead of limiting itself
to the postings on my ISP.
However, at this point, the robot works, and that's pretty good for a robot written with fewer than 1000 lines of code.
This chapter reproduced with permission from "Tcl/Tk for Real Programmers" (ISBN: 0122612051) published by Academic Press Professional. No further reproduction is permitted without permission from the author.