Lab 3 Perl CGI

As in the preceding lab, the following is excerpted from Jacqueline D. Hamilton's awesome online class at http://www.cgi101.com. (She has now published this in book form and I highly recommend it.) In particular, this material comes from http://www.cgi101.com/class/ch4/text.html.

Introduction

Now that you have learned some Perl, we will use it to process the data submitted from web forms.

When form data is sent to your CGI, the web browser encodes the data before sending it. Alphanumeric characters are sent as themselves; spaces are converted to plus signs (+); other characters - like tabs, quotes, etc. - are converted to "%HH" - a percent sign and two hexadecimal digits representing the ASCII code of the character. This is called URL encoding. Here's a table of some commonly encoded characters:

    Normal Character    URL Encoded String
    \t (tab)            %09
    \n (newline)        %0A
    /                   %2F
    ~                   %7E
    :                   %3A
    ;                   %3B
    @                   %40
    &                   %26
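
To see where the values in the table come from, here is a minimal sketch that goes in the encoding direction. The url_encode subroutine is a made-up helper for illustration (it is not part of the original lesson), and it assumes plain ASCII input and that the characters . * _ - are left unencoded, as browsers typically do.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the lesson): encode a string roughly the
# way a browser encodes form data. Assumes ASCII input.
sub url_encode {
    my ($text) = @_;
    # Replace everything except letters, digits, space and . * _ - with %HH.
    $text =~ s/([^A-Za-z0-9 .*_-])/sprintf("%%%02X", ord($1))/eg;
    # Spaces become plus signs.
    $text =~ tr/ /+/;
    return $text;
}

print url_encode('joe smith'), "\n";         # joe+smith
print url_encode('kira@cgi101.com'), "\n";   # kira%40cgi101.com
print url_encode('a/b&c'), "\n";             # a%2Fb%26c
```

Your CGI script normally never needs to do this itself; the browser does it before the data arrives.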
    
    
In order to do anything useful with the data, your CGI script must decode these. Fortunately, this is pretty easy to do in Perl, using the substitute and translate commands. Perl has powerful pattern matching and replacement capabilities; it can match the most complex patterns in a string, using regular expressions. But it's also quite capable of the most simple replacements. The basic syntax for substitutions is:

    $mystring =~ s/pattern/replacement/;

This command replaces the first match of "pattern" with "replacement" in the scalar variable "$mystring". Notice the operator is a =~ (an equal sign followed by a tilde) - this is a special operator for Perl, telling it that it's about to do a pattern match or replacement on the variable to its left. Here's an example of how it works:

    $greetings = "Hello. My name is xnamex.\n";
    $greetings =~ s/xnamex/Bob/;
    print $greetings;
The above code will print out "Hello. My name is Bob." Notice the substitution has replaced "xnamex" with "Bob" in the $greetings string.
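
One detail worth trying for yourself: a plain s/// only replaces the first match. The short sketch below (the variable names and sample text are made up for illustration) shows the difference the g modifier makes, which matters later when every %HH sequence in a value has to be decoded.

```perl
use strict;
use warnings;

my $once = "xnamex, meet xnamex.";
my $all  = $once;

$once =~ s/xnamex/Bob/;     # replaces only the first match
$all  =~ s/xnamex/Bob/g;    # the g modifier replaces every match

print "$once\n";   # Bob, meet xnamex.
print "$all\n";    # Bob, meet Bob.
```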

A similar but slightly different command is the translate command:

    $mystring =~ tr/searchlist/replacementlist/;
This command translates every character in "searchlist" to its corresponding character in "replacementlist", for the entire value of $mystring. One common use of this is to change the case of all characters in a string:

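    $lowerc =~ tr/A-Z/a-z/;
This results in $lowerc being translated to all lowercase letters. A-Z and a-z are character ranges. Unlike a regular expression, tr does not use character classes, so no square brackets are needed; brackets in the lists would simply be treated as literal characters.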
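
Here is a small sketch of tr using plain character ranges, which is all it needs. The variable names and sample string are invented for the example. It also shows a handy side effect: tr returns the number of characters it matched.

```perl
use strict;
use warnings;

my $text = "Hello World";

# Copy $text into $lower, then lowercase the copy.
(my $lower = $text) =~ tr/A-Z/a-z/;

# With an empty replacement list the string is left unchanged, so tr
# can be used just to count matching characters.
my $vowels = ($text =~ tr/aeiouAEIOU//);

print "$lower\n";    # hello world
print "$vowels\n";   # 3
```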

Decoding Form Data

With the POST method, form data is sent in an input stream from the server to your CGI script. To get this data, store it, and decode it, we'll use the following block of code (part of a more complete script that can be used with the forms page):

    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $FORM{$name} = $value;
    }
    
Let's look at each part of this. First, we read the input stream using this line:

    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
The input stream is coming in over STDIN (standard input), and we're using Perl's read function to store the data into the scalar variable $buffer. You'll also notice the third argument to the read function, which specifies the number of bytes to read; we want to read exactly CONTENT_LENGTH bytes, a value the server sets as an environment variable.
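
If you'd like to try read without a web server, the sketch below stands in for STDIN with an in-memory filehandle. That substitution is an assumption made purely for testing (the real script reads STDIN), and the sample data and variable names are invented.

```perl
use strict;
use warnings;

# Pretend this is the body of a POST request.
my $body = 'fname=joe&lname=smith';

# Open a filehandle that reads from the string instead of STDIN.
open(my $in, '<', \$body) or die "Can't open in-memory filehandle: $!";

my $buffer;
# read returns the number of bytes actually read.
my $bytes = read($in, $buffer, length($body));
close($in);

print "$bytes\n";    # 21
print "$buffer\n";   # fname=joe&lname=smith
```

In the CGI script, length($body) corresponds to $ENV{'CONTENT_LENGTH'}.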

Next we split the buffer into an array of pairs:

    @pairs = split(/&/, $buffer);
Form data pairs are separated by & signs when they are transmitted - for example, fname=joe&lname=smith. Now we'll use a foreach loop to further split each pair on the equal sign:

    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair);
The next line translates every "+" sign back to a space:

    $value =~ tr/+/ /;
Next is a rather complicated regular expression that substitutes every %HH hex pair back to its equivalent ASCII character, using the pack() function. For now we'll just use it to parse the form data:

    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
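
To take some of the mystery out of that line, here is a sketch that runs its pieces one at a time. The sample value is made up. The parentheses in the pattern capture the two hex digits into $1, the e modifier tells Perl to run the replacement as code, and g repeats the substitution for every match.

```perl
use strict;
use warnings;

# hex() turns two hex digits into a number; pack("C", ...) turns that
# number into the single character with that code.
print hex("40"), "\n";              # 64
print pack("C", hex("40")), "\n";   # @

my $value = 'kira%40cgi101.com%2Fhome';
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
print "$value\n";                   # kira@cgi101.com/home
```
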
Finally, we store the values into a hash called %FORM:

        $FORM{$name} = $value;
    }
The keys of %FORM are the form input names themselves. So, for example, if you have three text fields in the form - called name, email-address, and age - you could refer to them in your script by using $FORM{'name'}, $FORM{'email-address'}, and $FORM{'age'}.
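
The whole decoding loop can be tried from the command line before you put it in a CGI script. The sketch below wraps it in a subroutine called parse_form, which is a hypothetical name chosen for this example, and feeds it a made-up input string instead of reading STDIN. It also treats a missing value as an empty string, an addition the lesson's loop does not have.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the decoding loop, so it can be run
# without a web server.
sub parse_form {
    my ($buffer) = @_;
    my %form;
    foreach my $pair (split(/&/, $buffer)) {
        my ($name, $value) = split(/=/, $pair);
        $value = '' unless defined $value;   # field sent with no value
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $form{$name} = $value;
    }
    return %form;
}

my %FORM = parse_form('name=Joe+Smith&email-address=joe%40example.com&age=30');

# Sorted here only so the output order is predictable.
foreach my $key (sort keys %FORM) {
    print "$key = $FORM{$key}\n";
}
```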

Let's try it. Create a new CGI script with the following, call it post.cgi (or post.pl), save it, and make it executable with chmod (for example, chmod 755 post.cgi):

    #!/usr/bin/perl
    
    print "Content-type:text/html\n\n";
    
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $FORM{$name} = $value;
    }
    
    print "<html><head><title>Form Output</title></head><body>";
    print "<h2>Results from FORM post</h2>\n";
    
    foreach $key (keys(%FORM)) {
        print "$key = $FORM{$key}<br>";
    }
    
    print "</body></html>";
    
Source code: http://www.cgi101.com/class/ch4/post.txt

This code can be used to handle almost any form, from a simple guestbook form to a more complex order form. Whatever variables you have in your form, this CGI will print them out, along with the data that was entered.

Let's test the script. Create an HTML form with the fields listed below:

    <form action="post.cgi" method="POST">
             Your Name:  <input type="text" name="name">
         Email Address:  <input type="text" name="email">
                   Age:  <input type="text" name="age">
        Favorite Color:  <input type="text" name="favorite_color">
    <input type="submit" value="Send">
    <input type="reset" value="Clear Form">
    </form>
    
Source code: http://www.cgi101.com/class/ch4/post.html

Enter some data into the fields, and press "Send" when finished. The output will be the variable names of these text boxes, plus the actual data you typed into each field.

Tip: If you've had trouble getting the boxes to align on your form, try putting <pre> tags around the input fields. Then you can line them up with your text editor, and the result is a much neater looking form. The reason for this is that most web browsers use a fixed-width font (like Monaco or Courier) for preformatted text, so aligning forms and other data is much easier in a preformatted text block than in regular HTML. This will only work if your text editor is also using a fixed-width font! Another way to align input boxes is to put them all into a table, with the input name in the left column, and the input box in the right column.


A Form-to-Email CGI

Most people using forms want the data emailed back to them, so, let's write a form-to-mail CGI. First you'll need to figure out where the sendmail program lives on the Unix system you're on. (For cgi101.com, it's in /usr/sbin/sendmail. If you're not sure where yours is, try doing "which sendmail" or "whereis sendmail"; usually one of these two commands will yield the location of the sendmail program.)

Copy your post.cgi to a new file named mail.cgi. Now the only change will be to the foreach loop. Instead of printing to standard output (the HTML page the person sees after clicking submit), you want to print the values of the variables to a mail message. So, first, we must open a pipe to the sendmail program:

    $mailprog = '/usr/sbin/sendmail';
    open (MAIL, "|$mailprog -t");
The pipe causes all of the output we print to that filehandle (MAIL) to be fed directly to the sendmail program as if it were standard input to that program.

You also need to specify the recipient of the email, with either of these:

    $recipient = 'nullbox@cgi101.com';
    $recipient = "nullbox\@cgi101.com";
Perl will complain if you use an "@" sign inside a double-quoted string or a print <<EndHTML block. You can safely put an @-sign inside a single-quoted string, like 'nullbox@cgi101.com', or you can escape the @-sign in other strings by using a backslash. For example, "nullbox\@cgi101.com".
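
A quick way to convince yourself that the two forms are the same string is the sketch below. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $single  = 'nullbox@cgi101.com';    # single quotes: nothing is interpolated
my $escaped = "nullbox\@cgi101.com";   # double quotes: the @ must be escaped

print "same\n" if $single eq $escaped;   # prints "same"
```

Without the backslash, Perl would try to interpolate an array named @cgi101 into the double-quoted string.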

You don't need to include the comments in the following code; they are just there to show you what's happening.

    #!/usr/bin/perl
    
    print "Content-type:text/html\n\n";
    
    # parse the form data.
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;  
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $FORM{$name} = $value;
    }
    
    # where is the mail program?
    $mailprog = '/usr/sbin/sendmail';
    
    # change this to your own email address
    
    $recipient = 'nullbox@cgi101.com';
     
    # this opens an output stream and pipes it directly to the 
    # sendmail program.  If sendmail can't be found, abort nicely 
    # by calling the dienice subroutine (see below)
    
    open (MAIL, "|$mailprog -t") or dienice("Can't access $mailprog!\n");
    
    # here we're printing out the header info for the mail 
    # message. You must specify who it's to, or it won't be 
    # delivered:
    
    print MAIL "To: $recipient\n";
    
    # Reply-to can be set to the email address of the sender, 
    # assuming you have actually defined a field in your form
    # called 'email'.
    
    print MAIL "Reply-to: $FORM{'email'} ($FORM{'name'})\n";
    
    # print out a subject line so you know it's from your form cgi.
    # The two \n\n's end the header section of the message.  
    # anything you print after this point will be part of the 
    # body of the mail.
    
    print MAIL "Subject: Form Data\n\n";
    
    # here you're just printing out all the variables and values, 
    # just like before in the previous script, only the output 
    # is to the mail message rather than the followup HTML page.
    
    foreach $key (keys(%FORM)) {
        print MAIL "$key = $FORM{$key}\n";
    }
    
    # when you finish writing to the mail message, be sure to 
    # close the filehandle so it actually gets mailed.
    
    close(MAIL);
    
    # now print something to the HTML page, usually thanking 
    # the person for filling out the form, and giving them a 
    # link back to your homepage
    
    print <<EndHTML;
    <h2>Thank You</h2>
    Thank you for writing.  Your mail has been delivered.<p>
    Return to our <a href="index.html">home page</a>.
    </body></html>
    EndHTML
    
    # The dienice subroutine, for handling errors.
    sub dienice {
        my($errmsg) = @_;
        print "<h2>Error</h2>\n";
        print "$errmsg<p>\n";
        print "</body></html>\n";
        exit;
    }
    
    

Now let's test the new script. Here's the form again, only the action this time points to mail.cgi:

    <form action="mail.cgi" method="POST">
           Your Name: <input type="text" name="name">
       Email Address: <input type="text" name="email">
                 Age: <input type="text" name="age">
      Favorite Color: <input type="text" name="favorite_color">
    <input type="submit" value="Send">
    <input type="reset" value="Clear Form">
    </form>
    

Save it, enter some data into the form, and press "Send". If the script runs successfully, you'll get email in a few moments with the results of your post. (Remember to change $recipient in the script to your email address!)

Sending Mail to More Than One Recipient

What if you want to send the output of the form to more than one email address? Simple: just add the desired addresses to the $recipient line, separated by commas:

    $recipient = 'kira@cgi101.com, kira@io.com, webmaster@cgi101.com';
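
If the list gets long, one option is to keep the addresses in an array and build the string with join. This is only a sketch of an alternative; the @addresses name is made up, and the single-string version above works just as well.

```perl
use strict;
use warnings;

# Hypothetical array of recipients; single quotes keep the @ signs literal.
my @addresses = ('kira@cgi101.com', 'kira@io.com', 'webmaster@cgi101.com');

# join glues the list elements together with ", " between them.
my $recipient = join(', ', @addresses);

print "To: $recipient\n";
```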


Subroutines

In the above script we used a new structure: a subroutine called "dienice." As in many languages, a subroutine is a block of code, separate from the main program, that only gets run if it's directly called. In the above example, dienice only runs if the main program can't open sendmail. Rather than aborting and giving you a server error (or worse, NO error), you want your script to give you some useful data about what went wrong; dienice does that by printing the error message, closing the HTML tags, and exiting from Perl. There are several ways to call a subroutine:

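    &subname;
    &subname(args);
    subname;
    subname(args);
The &-sign before the subroutine name is optional when you call it with parentheses. The bare form, subname; with no & and no parentheses, only works if the subroutine has already been declared earlier in the file. args are values to pass into the subroutine.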

Subroutines are useful for isolating blocks of code that are reused frequently in your script. The structure of a subroutine is as follows:

    sub subname {
        ...code to execute...
    }
A subroutine can be placed anywhere in your CGI, though for readability it's usually best to put them at the end, after your main code. You can also include and use subroutines from different files and modules.

You can pass data into your subroutines. For example:

    mysub($a,$b,$c);
This passes the scalar variables $a, $b, and $c to the mysub subroutine. The data being passed (the arguments) are sent as a list. The subroutine accesses the list of arguments via the special array "@_". You can then assign the elements of that array to special temporary variables:

    sub mysub {
        my($tmpa, $tmpb, $tmpc) = @_;
        ...code to execute...
    }
Notice the my in front of the variable list? my is a Perl keyword that limits the scope of a variable or list of variables to the enclosing block, which here is the subroutine. This keeps your temporary variables visible only to the subroutine itself (where they're actually needed and used), rather than to the entire script (where they're not needed).
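
Here is a complete, runnable sketch of the same idea. The add_three subroutine is invented for this example; it copies its arguments out of @_ into my variables and returns their sum.

```perl
use strict;
use warnings;

sub add_three {
    # Private copies of the arguments, visible only inside this subroutine.
    my ($tmpa, $tmpb, $tmpc) = @_;
    return $tmpa + $tmpb + $tmpc;
}

my $total = add_three(1, 2, 3);
print "$total\n";   # 6

# $tmpa does not exist out here. Under "use strict", referring to it
# at this point would be a compile-time error.
```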

We'll be using the dienice subroutine throughout the rest of the book, as a generic catch-all error-handler.


Resources

Visit http://www.cgi101.com/ to learn more about CGI programming in Perl. Also see Network Programming in Perl for another application area of Perl.