Polymorphic Types in C++

Introduction

An experienced C programmer can master the basic syntax of C++ in a few months. Mastering the application of C++ is another matter. Proper use of C++ is directly related to software engineering. I've often had the experience of knowing what I wanted to do in C++ but I've been unsure of how to best use the language to implement the design I had in mind. This article discusses what I think of as polymorphic data types. A polymorphic data type is an object that has a base type with a set of data fields that all objects of that type share. Types derived from that base type each have data fields that only apply to that type.

Most C++ books discuss inheritance, which is the foundation for a polymorphic type. However, in most cases the examples given are trivial. The objective of this article is to provide real examples showing polymorphic types and inheritance can be used. Since most of my work recently involves compilers, I will use examples from compiler implementation. However any large complex program usually requires similar types.

An Example in C

The code generation phase of a compiler must have data structures that can be used to represent instructions and instruction operands. For a RISC (Reduced Instruction Set Computer) instruction set usually has instructions with four times of operands:

  1. Register operand. For example add r1, r2, r3 where the contents of registers r2 and r3 are added and the result is placed in r1.
  2. Register index operand. For example ld r1, 12(r30) where the contents of the memory location pointed to by (12 + r30) is loaded into r1.
  3. Immediate operand. For example li r1, 42 where the immediate value 42 is loaded into register r1.
  4. Label operand. For example br some_label where the program branches to the address some_label.

Objects of type operand may have common fields. But each derived type (e.g., register, register index, immediate and label) will have fields that are unique to that type. Instructions and operands must be allocated dynamically, since the compiler cannot know in advance how many instructions will be generated. As a result, operands are usually referenced via a pointer. The code that processes operands must be able to determine what kind of operand it is (e.g., register, index, immediate or label). To catch errors in the program logic illegial referenced to an operand field must print an error or assert. For example, if the special fields for a register operand are referenced for an immediate operand, the program should assert.

An example of a polymorphic operand type, written in C, is shown below. All objects of type operand share the fields used, defined and kind.


   typedef enum 
   { 
       bad_opr_type, opr_index, opr_reg, opr_imm, opr_label
   } opr_kind;


    typedef struct {
	Boolean used;      // operand used flag
        Boolean defined;
	opr_kind kind;  // union "tag"
	union {
            reg_index      index;
	    registor       reg;
	    unsigned short imm;
            branch_target  target;
	} u;
    } operand;

The kind field is used to store the object type (e.g., opr_index, opr_reg, opr_imm or opr_label. Macros might be used to reference the various sub-type fields. For example

#define get_opr_imm( pOpr )  ((pOpr)->u.imm)

One problem with this macro is that it does not check the kind field before accessing the union. If there is an error in the program logic and the object pointed to by pOpr is not an immediate operand, the error will not be caught when the field is accessed. Although a more complex macro could be used, C++ provides a much more elegant way to implement the operand object and its sub-types.

The skeletons for the operand base class and the operand_imm, operand_reg, operand_rindex and operand_label derived classes are shown below. Each of the classes derived from operand will inherit any member variables or member functions defined in the parent class. Note that the variable kind (used in the structure above) is no longer needed, since the virtual function get_kind() can return the opr_kind for each derived class.

   typedef enum 
   { 
       bad_opr_type, opr_index, opr_reg, opr_imm, opr_label
   } opr_kind;

   /*
    * operand base class
    */
   class operand {
   
   private:
     unsigned int used         :  1,    // operand referenced
                  defined      :  1,    // operand used as a destination
                  unused_flags : 30;
   public:
     virtual opr_kind get_kind(void)         { return bad_opr_type; }
   };  // operand 
   
   
   class operand_imm : public operand {
   private:
     short int imm;
   public:
     opr_kind get_kind(void)         { return opr_imm; }
   }; // operand_imm
   
   
   class operand_reg : public operand {
   private:
     registor reg;
   public:
     opr_kind get_kind(void)         { return opr_reg; }
   };  // operand_reg
   
   
   class operand_rindex : public operand {
   private:
       registor base_reg;
       short int offset;
   public:
     opr_kind get_kind(void)         { return opr_index; }
   };  // operand_rindex
   
   
   class operand_label : public operand {
   private:
       char *label;
   public:
     opr_kind get_kind(void)         { return opr_label; }
   };  // operand_label

The operand_reg and operand_rindex classes make use of the registor class. For complete definitions for the registor and operand classes, follow the links below. These classes also make use of the POSIX include files

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

A pointer to any of the operand derived classes is compatible with a pointer to the base class. For example, the gen_tri function below, which could be used to generate a three operand instruction has three operand pointer arguments. Pointers to any of the derived classes are type compatible with these arguments.

void gen_tri( OPERATOR op, 
              operand *dest, operand *opr1, operand *opr2, 
              basic_block *bb)
{
  ...
} // gen_tri

Polymorphic types constructed in this manner make heavy use of virtual functions. The operand base class includes a virtual function for every class function in the derived classes. If a class function is used incorrectly (an operand_reg class function is used when the object is an operand_imm), there will be an assertion. This helps catch logic errors in the program.

All instances of the operand class contain a pointer to the virtual function table. So there is little extra memory overhead. However there is some call overhead, since all virtual function calls are indirect, through the virtual function pointer. In most cases this extra overhead is insignificant.

Objects derived from the operand type must be allocated with new so that the virtual function table pointer is properly initialized. If pool allocation is used (as it would be in most compilers), the new operator for the operand type must be overridden. This is done in the operand base class. See Overloading New in C++ for a discussion of how new can be overloaded in C++. The pool class referenced in the overloaded version of new takes the form

class pool {
public:    
  void *GetMem( unsigned int num_bytes )    
  {
     void *pVoid;

     // allocate memory from the memory pool
     return pVoid;
  }
};

Ian Kaplan
Last updated May 1998


back to Notes on Software and Software Engineering

back to home page