Adventures with Perl

The perl programming language has been around for almost nine years (since 1987). I first heard about it from my friends who did UNIX system administration. They liked perl because it was much easier to write system administration scripts in perl than in csh or (the horror) sh. But since I avoid system administration work whenever I can, and since csh and awk scripts had filled my needs, I did not take the time to learn perl.

Although I did not know perl, I started to notice that its use was spreading beyond system administration. When I started to work on compilers for hardware design languages (hardware design languages like Verilog and VHDL are used to simulate and design integrated circuits), I found a whole community of perl users. In logic synthesis, a hardware design language compiler reads in a high level description of the hardware and produces a digital circuit net list that specifies digital cells and their interconnections. These net lists contain a great deal of information, including names from the user's design. Although there are standards for net lists, there tend to be enough differences between formats and naming conventions that the users of these compilers write perl scripts to massage their net lists. They also write perl scripts to extract information from these net lists. Although these perl scripts started out as quick hacks to get a job done, in some cases they metastasized into in-house applications.

In the last two years or so there has been an explosion in the popularity of perl. Perl allows people to add computation and other dynamic functionality to their Web pages by calling perl scripts via the Common Gateway Interface (CGI). Anything that can be executed on the Web server (e.g., compiled C or C++, an awk script, etc.) can be invoked via CGI. According to Developing CGI Applications with Perl by John Deep and Peter Holfelder:

Perl's ease of use and strong pattern-matching and string-manipulation properties have made it very popular as a CGI application language.

I was starting to feel that I should know something about this perl language, so one day, while browsing the computer section of the Stanford University bookstore, I picked up a copy of Programming perl by Larry Wall and Randal L. Schwartz, published by O'Reilly & Associates. The book sat on my shelf for a few months until I came up with a task that looked too complicated for an awk script. My first inclination was to write a C or C++ program to do the job, but the task seemed like the perfect excuse to learn perl and add yet another buzzword to my resume.

The rest of this Web page discusses the perl script. I decided to write this Web page for two reasons.

  1. It took a solid day of wading through Programming perl to get the perl script written, debugged and tested. From long experience I have found it valuable to keep examples like this around for future reference. Putting the example on the Web gives me easy access to it in the future.

  2. Two of the most important things needed when learning to program in a new language, or to write software for a new system, are documentation that outlines the basics and sample code. Although Wall and Schwartz's book Programming perl includes a chapter titled Real Perl Programs that would be particularly useful for system administrators, I did not find any examples quite like the text processing application I had in mind. I wish that I had had an example like this when I started, and my hope is that someone else might find this a useful reference. This discussion is not intended to replace a book on perl. I assume that the reader already has such a book for reference and knows how to construct algorithms. I don't claim to be an expert perl programmer, nor do I claim that my perl code is the most efficient way to do the job in perl. So if you're tempted to flame me because my perl code is dumb, send it to /dev/null and write your own Web page.

A Problem for Perl

The compiler that I am currently working on produces code for a pseudo-instruction set. This instruction set is shared by an assembler and an interpreter. Since we are experimenting with our implementation, on occasion a new instruction will be added or an existing instruction will be modified. The compiler has a file named opc.table that defines the instructions used by the compiler's code generation phase. We were generating the other files by hand from the opc.table file, which is an awkward and error-prone process. So I decided to write a perl script that would automatically generate two files:

  1. opc.h

    This file contains a list of #defines that associate op-code names with op-code values. This file also contains the typedef for the op-code data structure used by the assembler and the interpreter.

  2. opc.C

    This file contains the initialization of the op-code table that is used by the assembler and the interpreter.

Creating opc.h and opc.C

The perl script takes two file arguments: the path to the opc.table input file and, following a -o flag, the output file name. Since two files are created, a .h and a .C file, only the root of the output name is given (for example, opc to generate opc.h and opc.C). The command line to execute the script is shown below.

perl perl_script input_file -o output_file

The script starts with a simple command line check to verify that there is the correct number of arguments and that the second argument is "-o". Perl does not have an argc (command line argument count) variable, although it does have an @ARGV array, which corresponds to argv in C (except that $ARGV[0] is the first argument, not the program name). Prefixing an array name with $# gives the index of the array's last element, so with the default base index of zero the number of elements in @ARGV is $#ARGV + 1.


  $argcnt = $#ARGV + 1;
  if ($argcnt != 3 || $ARGV[1] ne "-o") {
     print "usage: perl perl_script input_file -o output_file\n";
  }
  else {
    $infile = $ARGV[ 0 ];
    $outfile = $ARGV[ 2 ];
    .... 
  }
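For comparison, here is a minimal perl 5 sketch of the same check. The subroutine name parse_args is hypothetical, and building the .h and .C names from the root is an assumption about what the elided part of the script does. Testing scalar(@args), which is the element count, avoids the $#ARGV + 1 arithmetic.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): check the
# "input_file -o output_root" arguments and return the input file plus
# the .h and .C names built from the root, or an empty list on bad usage.
sub parse_args {
    my @args = @_;
    # scalar(@args) is the element count; $#args is the index of the last one.
    return () unless scalar(@args) == 3 && $args[1] eq "-o";
    my ($infile, $root) = ($args[0], $args[2]);
    return ($infile, "$root.h", "$root.C");
}

# In the script itself the call would be:
#   my @files = parse_args(@ARGV)
#       or die "usage: perl perl_script input_file -o output_file\n";
my @files = parse_args("opc.table", "-o", "opc");
print "@files\n";    # opc.table opc.h opc.C
```

Using die rather than print for the usage message also sends the message to standard error and gives the script a non-zero exit status.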

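The opc.table file is opened and read into the @optable array. Reading a file handle in an array (list) context returns all of the file's lines, one line per array element. Since the file contents are now stored in memory, the input file can be closed.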

      open(INFILE_HANDLE, $infile) || die "can't open $infile: $!\n";
      @optable = <INFILE_HANDLE>;   # one array element per line
      close(INFILE_HANDLE);
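Under perl 5 (assumed here, since lexical file handles and the three-argument form of open are perl 5 features) the same step can be wrapped in a small subroutine. The name read_table is hypothetical. Reading a file handle in list context returns all of the remaining lines, each still ending in its newline.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line of the table file into an array.
sub read_table {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "can't open $path: $!\n";
    my @lines = <$fh>;    # list context: one element per line, newline kept
    close($fh);
    return @lines;
}

# Usage in the script: my @optable = read_table($infile);
```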

Part of the opc.table file is shown below. This file may also contain C-style comments (e.g., lines beginning with /*) and blank lines.

OPC op_imm =	{0,		"imm",1,	{IMM}                   };
OPC op_addr =	{0,		"addr",1,	{IMM} 		        };
OPC op_slice =	{0,		"slice",2,	{SAL,SAL}		};
OPC op_exit =	{0,		"exit",0,	{0}			};
OPC op_call =	{0,		"call",1,	{IMM}			};
OPC op_trigger = {0,		"trigger",1,	{SAL}			};
OPC op_sys_call = {0,		"sys_call",0,	{0}                   	};
OPC op_arg =	{0,		"arg",1,	{SAL}			};

The opc.h file, which is created from opc.table, contains a set of #defines that associate a symbol name with an op-code value. The defines generated from the table lines above are shown below:

#define OP_IMM 	 1
#define OP_ADDR 	 2
#define OP_SLICE 	 3
#define OP_EXIT 	 4
#define OP_CALL 	 5
#define OP_TRIGGER 	 6
#define OP_SYS_CALL 	 7
#define OP_ARG 	 8

The while loop below steps through the @optable array and calls the create_define subroutine for each array element (each element holds one line from opc.table). Note the "&" prefix, which marks create_define as a call to a user-defined subroutine.

      $linecnt = 0;
      $opcnt = 1;  # this is a global
      while ($optable[$linecnt]) {
         &create_define( $optable[ $linecnt ] );
         ++$linecnt;
      } # while

The while loop could have been written more simply as

      for ($i = 0; $i < $#optable + 1; $i++) {
         &create_define( $optable[ $i ] );
      } # for

or

      foreach $line ( @optable ) {
         &create_define( $line );
      } # foreach
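The for and foreach versions are also safer. The while loop stops at the first element that perl treats as false (the empty string or "0"), not at the end of the array. Lines read from a file keep their trailing newline, so this rarely matters here, but the small sketch below, which uses a made-up three-element array, shows how many elements each loop visits.

```perl
use strict;
use warnings;

# Made-up sample data: the middle element is an empty string.
my @optable = ("OPC op_imm = ...\n", "", "OPC op_arg = ...\n");

# The while form: stops at the first false element.
my $linecnt = 0;
while ($optable[$linecnt]) {
    ++$linecnt;
}

# The foreach form: visits every element.
my $visited = 0;
foreach my $line (@optable) {
    ++$visited;
}

print "while: $linecnt, foreach: $visited\n";   # while: 1, foreach: 3
```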

The create_define subroutine is shown below.

sub create_define {
  local($line) = @_;
  local($type, $name, $eq, $zero, $ascii, $args, $close);

  ($type, $name, $eq, $zero, $ascii, $args, $close) = split(/\s+/, $line);
  if ($type ne "/*" && $type ne "") {
     if (! &is_skip_op( $name )) {
       $name =~ tr/a-z/A-Z/;
       print OUTFILE_HANDLE_H "#define ", $name," \t ", $opcnt, "\n";
       ++$opcnt; # this is a global
     }
  }
} # create_define
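For readers using perl 5, here is a minimal sketch of the same logic written with my variables. It returns the #define line instead of printing to a global file handle, and takes the op-code counter as an argument instead of using a global, which makes it easy to try out. The names define_line and %skip are hypothetical, and uc does the same upper-casing as tr/a-z/A-Z/ for these ASCII names.

```perl
use strict;
use warnings;

my %skip = ();    # hypothetical: op names to leave out of opc.h

# Hypothetical perl 5 variant of create_define (not the original code).
# Returns the #define for one opc.table line, or undef for comment lines,
# blank lines and skipped ops.
sub define_line {
    my ($line, $opcnt) = @_;
    my ($type, $name) = split(/\s+/, $line);
    return undef unless defined $type && $type ne "" && $type ne "/*";
    return undef if $skip{$name};
    return "#define " . uc($name) . " \t " . $opcnt . "\n";
}

my $opcnt = 1;
foreach my $line ("OPC op_imm =\t{0,\t\t\"imm\",1,\t{IMM}\t};\n",
                  "\n",
                  "OPC op_addr =\t{0,\t\t\"addr\",1,\t{IMM}\t};\n") {
    my $define = define_line($line, $opcnt);
    next unless defined $define;    # the counter only advances for real ops
    print $define;
    ++$opcnt;
}
```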

The assignment

  local($line) = @_;

assigns the subroutine argument list (which consists of one argument) to the local variable $line. The statement

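  ($type, $name, $eq, $zero, $ascii, $args, $close) = split(/\s+/, $line);

"splits" $line into a list of fields separated by white space in the original line and assigns the fields, in order, to the variables on the left. Note that \s denotes a "white space character" and \s+ means "one or more white space characters". When a line is blank or begins with white space, $type comes out empty, which is why create_define compares $type with "".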

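The =~ operator binds a string to a pattern operation: it applies the match, substitution or translation on its right-hand side to the variable on its left-hand side. Substitution (s///) and translation (tr///) modify that variable in place, a bit like the op-equal operators (such as +=) in C. The statement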

       $name =~ tr/a-z/A-Z/;

maps (translates) lower case characters into upper case characters.
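The split, tr and =~ statements explained above can be tried on one of the lines from the opc.table excerpt. This perl 5 sketch also shows s/// (which, like tr///, modifies the variable in place) and a pattern match, which only tests the string. The s/// and match statements are illustrative additions, not part of the original script.

```perl
use strict;
use warnings;

# A line from the opc.table excerpt (tabs as in the original file).
my $line = "OPC op_sys_call = {0,\t\t\"sys_call\",0,\t{0}\t};\n";

# split: one field per run of white space, assigned left to right.
my ($type, $name, $eq, $zero, $ascii, $args, $close) = split(/\s+/, $line);
print "type=$type name=$name ascii=$ascii args=$args\n";
# type=OPC name=op_sys_call ascii="sys_call",0, args={0}

# A leading run of white space produces an empty first field.
my @fields = split(/\s+/, "   indented line\n");
print scalar(@fields), " fields, first is '$fields[0]'\n";   # 3 fields, first is ''

# =~ binds the variable on the left to the operation on the right.
my $count = ($name =~ tr/a-z/A-Z/);   # translate in place; returns the count
print "$name $count\n";               # OP_SYS_CALL 9

$ascii =~ s/,$//;                     # substitute: drop the trailing comma
if ($ascii =~ /^"(\w+)",(\d+)$/) {    # match: only tests, with captures
    print "op=$1 nargs=$2\n";         # op=sys_call nargs=0
}
```

If the leading empty field is unwanted, split(' ', $line), with a single space as the pattern, skips leading white space the way awk does.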

The opc.table file contains some operator definitions that are only used internally by the compiler. These must be skipped when creating the opc.h and opc.C files. The create_define subroutine calls is_skip_op to check whether the current operator definition is one of the operators that should be skipped. Note that in perl the last statement does the same thing as the break statement in C. It exits while loops as well as for loops; the exception is a do { ... } while construct, which perl does not treat as a true loop, so last cannot be used to leave it.

sub is_skip_op {
  local($name) = @_;
  local($answer, $found, $i);

  $found = 0;
  for ($i = 0; $i < $#skip_op + 1; $i++) {
    if ($skip_op[ $i ] eq $name) {
        $found = 1;
        last;  # exit the loop
    }
  }
  $answer = $found;  
}  # is_skip_op

Note that is_skip_op is a function that returns a boolean value. Perl subroutines return the value from the last expression executed (in this case, the assignment).
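The loop in is_skip_op can also be written as a while loop, since last exits a while loop just as it exits a for loop. A minimal perl 5 sketch follows; the contents of @skip_op are hypothetical, because the article does not show the real list.

```perl
use strict;
use warnings;

# Hypothetical list of internal-only ops; the real list is not shown.
my @skip_op = ("op_internal1", "op_internal2");

sub is_skip_op {
    my ($name) = @_;
    my $found = 0;
    my $i = 0;
    while ($i < scalar(@skip_op)) {
        if ($skip_op[$i] eq $name) {
            $found = 1;
            last;          # leaves the while loop, like break in C
        }
        ++$i;
    }
    return $found;
}

print is_skip_op("op_internal2"), " ", is_skip_op("op_imm"), "\n";   # 1 0
```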

When I first wrote this perl script, I thought that is_skip_op would be perfect for perl's associative arrays. For example:

#
# This use of an associative array does not work correctly.
#
sub is_skip_op {
  local($name) = @_;
  local($answer, $found);

  $found = 0;
  if ($skip_op{ $name }) {
    $found = 1;
  }
  $answer = $found;  
}  # is_skip_op

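Unfortunately, the code above does not work as written. The problem is not the lookup itself: an associative array returns the undefined value (which tests false) for an element that is not present. The likely culprit is that %skip_op and @skip_op are two separate variables in perl. The list of names to skip lives in the ordinary array @skip_op, but the code above consults the associative array %skip_op, which stays empty unless each name is stored in it as a key with a true value.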
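Here is a minimal perl 5 sketch of an associative-array version that does work. The associative array %skip_op is filled in from the list @skip_op before it is consulted (the op names are hypothetical). A lookup of a key that is not present returns the undefined value, which tests false.

```perl
use strict;
use warnings;

# Hypothetical internal-only ops; the real list is not shown in the article.
my @skip_op = ("op_internal1", "op_internal2");

# %skip_op is a different variable from @skip_op, so it has to be filled in:
# each op name becomes a key whose value is 1.
my %skip_op;
foreach my $op (@skip_op) {
    $skip_op{$op} = 1;
}

sub is_skip_op {
    my ($name) = @_;
    # A key that is not present gives undef, which tests false.
    return $skip_op{$name} ? 1 : 0;
}

print is_skip_op("op_internal2"), " ", is_skip_op("op_imm"), "\n";   # 1 0
```

The associative-array lookup takes roughly constant time however many ops are skipped, while the loop version searches the list element by element.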

The rest of the code in the perl script creates the op-code data structure initialization file, opc.C. Part of the opc.table file is shown below, along with the associated lines of opc.C that are created by the perl script. Note that opc.C has an extra field that follows the argument count after the quoted string. For most operators this field is set to one. However, for selected operators (OP_EXIT, OP_SYS_CALL, OP_BR, OP_BT and OP_BF) this field is set to zero.

opc.table

OPC op_imm =	{0,		"imm",1,	{IMM}                   };
OPC op_addr =	{0,		"addr",1,	{IMM} 		        };
OPC op_slice =	{0,		"slice",2,	{SAL,SAL}		};
OPC op_exit =	{0,		"exit",0,	{0}			};
OPC op_call =	{0,		"call",1,	{IMM}			};
OPC op_trigger = {0,		"trigger",1,	{SAL}			};
OPC op_sys_call = {0,		"sys_call",0,	{0}                   	};
OPC op_arg =	{0,		"arg",1,	{SAL}			};
OPC op_br =     {0,		"br",1,		{IMM}			};
OPC op_bt = 	{0,		"bt",2,		{SAL,IMM}		};
OPC op_bf = 	{0,		"bf",2,	        {SAL,IMM}	        };

opc.C

{OP_IMM,	"imm",1,1,	{IMM}	}, /* 1 */
{OP_ADDR,	"addr",1,1,	{IMM}	}, /* 2 */
{OP_SLICE,	"slice",2,1,	{SAL,SAL}	}, /* 3 */
{OP_EXIT,	"exit",0,0,	{0}	}, /* 4 */
{OP_CALL,	"call",1,1,	{IMM}	}, /* 5 */
{OP_TRIGGER,	"trigger",1,1,	{SAL}	}, /* 6 */
{OP_SYS_CALL,	"sys_call",0,0,	{0}	}, /* 7 */
{OP_ARG,	"arg",1,1,	{SAL}	}, /* 8 */
{OP_BR,	"br",1,0,	{IMM}	}, /* 9 */
{OP_BT,	"bt",2,0,	{SAL,IMM}	}, /* 10 */
{OP_BF,	"bf",2,0,	{SAL,IMM}	}, /* 11 */
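One way to produce these lines is with a pattern match instead of split. The sketch below is an assumption about how this could be done, not the original script: the name c_init_line, the regular expression, and the rule for the extra field (zero for exit, sys_call, br, bt and bf, as in the sample output above, and one otherwise) are all illustrative, and the tab placement only approximates the sample.

```perl
use strict;
use warnings;

# Assumption: ops whose extra field is 0, read off the sample opc.C output.
my %zero_extra;
foreach my $op (qw(exit sys_call br bt bf)) {
    $zero_extra{$op} = 1;
}

# Hypothetical helper: build one opc.C initializer line from an opc.table
# line, or return undef if the line is not an OPC definition.
sub c_init_line {
    my ($line, $opcnt) = @_;
    return undef
        unless $line =~ /^OPC\s+(\w+)\s*=\s*\{0,\s*"(\w+)",(\d+),\s*(\{[^}]*\})/;
    my ($name, $ascii, $nargs, $args) = ($1, $2, $3, $4);
    my $extra = $zero_extra{$ascii} ? 0 : 1;
    return sprintf("{%s,\t\"%s\",%d,%d,\t%s\t}, /* %d */\n",
                   uc($name), $ascii, $nargs, $extra, $args, $opcnt);
}

print c_init_line("OPC op_bt = \t{0,\t\t\"bt\",2,\t\t{SAL,IMM}\t\t};\n", 10);
```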


Useful Command Line Arguments

I debugged this perl script the old-fashioned way, with print statements. When I was almost done, I noticed that there is a command line switch, -d, which runs the perl interpreter in debug mode. There is also a flag, -w, which, in theory, prints warnings about problems that may be in your perl script. Unfortunately, the -w flag does not catch the use of a variable before it has been given a value, an error that is particularly easy to make in a language like perl. For example, in the code fragment below the variables $x and $y are local to the block. The perl compiler does not notice that $y is read before it is given a value.

  else {
    # An example of a use before a definition
    local($x, $y);

    $x = $y;
    $x++;
    $y = $x;
  }
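To see this for yourself, here is a self-contained perl 5 version of the fragment (a bare block replaces the else, and my replaces local). Perl runs it to completion with no error: $y is still undefined when it is copied into $x, and the auto-increment operator treats the undefined value as 0, so $x and $y both end up as 1.

```perl
use strict;
use warnings;

my $result;
{
    # An example of a use before a definition
    my ($x, $y);

    $x = $y;      # $y has never been given a value (it is undef)
    $x++;         # auto-increment treats undef as 0
    $y = $x;
    $result = $y;
}
print "result = $result\n";   # result = 1
```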

Perl for Windows NT

The version of perl that is documented in Programming perl by Wall and Schwartz is version 4. From looking around on the Web, it appears that perl has a dedicated following. Since the source for perl is freely available (under either the GNU General Public License or the Artistic License), people have been busily adding features to perl. Version 5 of perl is now available. There are a number of sites that distribute perl for UNIX, and most corporate UNIX networks will have perl. Perl for Windows NT can be ftp'ed from ftp://ftp.microsoft.com/bussys/winnt/winnt-public/reskit/nt351/.

Ian Kaplan, August 1996