This web page publishes a C++ program that generates a cascade multiplier. The design for this multiplier comes from Joseph J.F. Cavanagh's excellent book on computer arithmetic, Digital Computer Arithmetic: Design and Implementation, 1984. Sadly this book is out of print. My copy of this book was printed before books were commonly printed on acid free paper and the pages are starting to yellow. Although this book is almost twenty years old, computer arithmetic remains the same. The only thing that has changed is the density of VLSI chips. Hardware modules like the cascade multiplier, discussed here, consumed too much hardware area in some cases. Given current chip density, a 32x32 multiplier can be included in a microprocessor.
I implemented this multiplier because I wanted a large structural Verilog test case for Quickturn's SimServer hardware accelerated behavioral simulation system. I was also fascinated by the tiling and recursive nature of the multiplier. The multiplier implemented here is not the most efficient in its use of hardware nor the fastest in terms of logic levels. Cavanagh shows how 4X4 adder slices can be combined with a look ahead carry tree (a so called Wallace tree) to build larger multipliers, which are faster than this multiplier for larger bit widths.
The figure below shows what I have been referring to as a cascade multiplier. The multiplier shown here takes two 4-bit operands (e.g., its a 4 X 4 multiplier) and generates an 8-bit result. The FA elements in the diagram are full adders. This version of the multiplier does unsigned multiplication. A slight variation on this design will perform signed integer multiplication.
Figure 3.36, A 4 X 4 NMM (nonadditive multiply module), pg 192, Digital Computer Arithmetic by Joseph J.F. Cavanagh, 1984
I'm not sure why Cavanagh refers to this as a nonadditive multiply module, since it is clearly using addition. The cascade multiplier generation program generates structural Verilog plus a small behavioral Verilog testbench. This test bench is only useful for testing small multiplier, since it tests all combinations of operands. The automaticly generated Verilog multiplier is shown below. It has been edited slightly for HTML formatting and readability (e.g., the module names are shown in bold, etc..)
module full_addr( a, b, c_in, c_out, sum ); input a, b, c_in; output sum, c_out; wire a1, a2, a3, a4; wire c1, c2, c3, c4; assign a1 = (~a) & (~b) & c_in; assign a2 = (~a) & b & (~c_in); assign a3 = a & (~b) & (~c_in); assign a4 = a & b & c_in; assign sum = a1 | a2 | a3 | a4; assign c1 = (~a) & b & c_in; assign c2 = a & (~b) & c_in; assign c3 = a & b & (~c_in); assign c4 = a & b & c_in; assign c_out = c1 | c2 | c3 | c4; endmodule module mult( A, B, rslt ); input [ 3:0] A, B; output [ 7:0] rslt; wire a00, a01, a02, a03; wire b00, b01, b02, b03; wire r00, r01, r02, r03, r04, r05, r06, r07; wire c00, c01, c02, c03, c04, c05, c06, c07; wire c08, c09, c10, c11; assign a00 = A[ 0]; assign a01 = A[ 1]; assign a02 = A[ 2]; assign a03 = A[ 3]; assign b00 = B[ 0]; assign b01 = B[ 1]; assign b02 = B[ 2]; assign b03 = B[ 3]; assign rslt[ 0] = r00; assign rslt[ 1] = r01; assign rslt[ 2] = r02; assign rslt[ 3] = r03; assign rslt[ 4] = r04; assign rslt[ 5] = r05; assign rslt[ 6] = r06; assign rslt[ 7] = r07; assign r00 = a00 & b00; full_addr fa000 (a01 & b00, a00 & b01, 1'b0, c00, s00); full_addr fa001 (a02 & b00, a01 & b01, 1'b0, c01, s01); full_addr fa002 (a03 & b00, a02 & b01, 1'b0, c02, s02); assign r01 = s00; full_addr fa003 (s01 , a00 & b02, c00, c03, s03); assign r02 = s03; full_addr fa004 (s02 , a01 & b02, c01, c04, s04); full_addr fa005 (a03 & b01, a02 & b02, c02, c05, s05); full_addr fa006 (s04 , a00 & b03, c03, c06, s06); assign r03 = s06; full_addr fa007 (s05 , a01 & b03, c04, c07, s07); full_addr fa008 (a03 & b02, a02 & b03, c05, c08, s08); full_addr fa009 (s07 , 1'b0, c06, c09, s09); assign r04 = s09; full_addr fa010 (s08 , c09, c07, c10, s10); assign r05 = s10; full_addr fa011 (a03 & b03, c10, c08, c11, s11); assign r06 = s11; assign r07 = c11; endmodule module testbench; reg [ 3:0] A, B; wire [ 7:0] rslt; mult mult_inst (A, B, rslt); initial begin : init parameter FALSE = 0; parameter TRUE = 1; reg [ 4:0] i, j; reg passed; passed = TRUE; for (j = 1; j < 16; j = j + 1) begin for (i = 1; i < 16; i = i + 1) begin A = i; B = j; #1; if (rslt != i * j) begin passed = FALSE; $display("test failed for %d x %d", i, j ); end // if end // for i end // for j if (passed) $display("test passed"); else $display("test failed"); $finish; end endmodule
The C++ code that generates the cascade multiplier is shown below. Elsewhere on the bearcave.com web pages I've been critical of the fact that most programmers do not document their software. For most large projects I try to make sure that my software is clearly documented. But like a lot of programmers, I'm embarrassed to say that I'm guilty of not providing much documentation for small hacks like this one. Even small hacks can be useful later, so this is clearly not good practice.
The source code below has been formatted for HTML. You an download the original, unformatted code here
/* Generate Verilog for a combinatoric multiplier This program generates the Verilog for a combinatoric multiplier that is built out of cascaded full adders. The program is given a command line argument for the width of the multiplier operands. It generates the full adder module, the multiplier and the associated test bench, which tests the multiplier through the complete range. This means that for values larger than 8 the verilog generated takes a long time to simulate on Verilog XL. The multipler in created by this program is based on the cascade multiplier described in "Digital Computer Arithmetic" by Joseph J.F. Cavanagh, McGraw-Hill, 1984 To compile on Solaris CC genmult.C -o genmult To run genmultFor example genmult 4 Author: Ian Kaplan Copyright and use: You are granted the use of this software without limitation as long as you include Copyright Ian Kaplan, iank@bearcave.com */ #include <ctype.h> #include <string.h> #include <stdio.h> enum { #ifndef FALSE FALSE = 0, #endif #ifndef TRUE TRUE = 1, #endif bogus_filler }; int get_size( char *num_str ) { int strok, len, i; int size; size = 0; len = strlen( num_str ); strok = TRUE; // check to make sure that all the characters in // the string are digits. for (i = 0; i < len; i++) { if (! isdigit( num_str[i] )) { strok = FALSE; break; } } // if the string is OK, convert it to an integer if (strok) { sscanf( num_str, "%d", &size ); } return size; } // get_size void gen_adder(FILE *fp) { fprintf(fp, "\n"); fprintf(fp, "module full_addr( a, b, c_in, c_out, sum );\n"); fprintf(fp, "input a, b, c_in;\n"); fprintf(fp, "output sum, c_out;\n"); fprintf(fp, "\n"); fprintf(fp, "wire a1, a2, a3, a4;\n"); fprintf(fp, "wire c1, c2, c3, c4;\n"); fprintf(fp, "\n"); fprintf(fp, " assign a1 = (~a) & (~b) & c_in;\n"); fprintf(fp, " assign a2 = (~a) & b & (~c_in);\n"); fprintf(fp, " assign a3 = a & (~b) & (~c_in);\n"); fprintf(fp, " assign a4 = a & b & c_in;\n"); fprintf(fp, " assign sum = a1 | a2 | a3 | a4;\n"); fprintf(fp, "\n"); fprintf(fp, " assign c1 = (~a) & b & c_in;\n"); fprintf(fp, " assign c2 = a & (~b) & c_in;\n"); fprintf(fp, " assign c3 = a & b & (~c_in);\n"); fprintf(fp, " assign c4 = a & b & c_in;\n"); fprintf(fp, " assign c_out = c1 | c2 | c3 | c4;\n"); fprintf(fp, "\n"); fprintf(fp, "endmodule\n"); fprintf(fp, "\n"); } // get_adder void gen_header( unsigned int mult_size, FILE *fp = stdout ) { fprintf(fp, "module mult( A, B, rslt );\n"); fprintf(fp, "input [%2d:0] A, B;\n", mult_size-1 ); fprintf(fp, "output [%3d:0] rslt;\n\n", (mult_size * 2)-1 ); } // gen_header void gen_end( FILE *fp ) { fprintf(fp, "endmodule\n"); } // gen_end void gen_mult( unsigned int mult_size, FILE *fp = stdout ) { int i, j, addr_cnt, rslt_cnt; fprintf(fp, " assign r00 = a00 & b00;\n"); addr_cnt = 0; for (i = 0; i < mult_size - 1; i++) { fprintf(fp, " full_addr fa%03d (", addr_cnt ); fprintf(fp, "a%02d & b00, a%02d & b01, 1'b0, c%02d, s%02d", i + 1, i, addr_cnt, addr_cnt ); fprintf(fp, ");\n"); addr_cnt++; } fprintf( fp, " assign r01 = s00;\n\n"); rslt_cnt = 2; for ( j = 2; j < mult_size; j++ ) { for (i = 0; i < mult_size - 1; i++) { if (i < mult_size - 2) { fprintf(fp, " full_addr fa%03d (", addr_cnt ); // module full_adder( a, b, c_in, sum, c_out ); fprintf(fp, "s%02d , a%02d & b%02d, c%02d, c%02d, s%02d", addr_cnt - (mult_size-2), i, j, addr_cnt - (mult_size-1), addr_cnt, addr_cnt ); fprintf(fp, ");\n"); if (i == 0) { fprintf( fp, " assign r%02d = s%02d;\n", rslt_cnt, addr_cnt ); } } else { fprintf(fp, " full_addr fa%03d (", addr_cnt ); fprintf(fp, "a%02d & b%02d, a%02d & b%02d, c%02d, c%02d, s%02d", i+1, j - 1, i, j, addr_cnt - (i+1), addr_cnt, addr_cnt ); fprintf(fp, ");\n"); } addr_cnt++; } rslt_cnt++; fprintf(fp, "\n"); } for (i = 0; i < mult_size - 2; i++) { if (i == 0) { fprintf(fp, " full_addr fa%03d (", addr_cnt ); // module full_adder( a, b, c_in, sum, c_out ); fprintf(fp, "s%02d , 1'b0, c%02d, c%02d, s%02d", addr_cnt - (mult_size-2), addr_cnt - (mult_size-1), addr_cnt, addr_cnt ); fprintf(fp, ");\n"); } else { fprintf(fp, " full_addr fa%03d (", addr_cnt ); // module full_adder( a, b, c_in, sum, c_out ); fprintf(fp, "s%02d , c%02d, c%02d, c%02d, s%02d", addr_cnt - (mult_size-2), addr_cnt -1, addr_cnt - (mult_size-1), addr_cnt, addr_cnt ); fprintf(fp, ");\n"); } fprintf( fp, " assign r%02d = s%02d;\n", rslt_cnt, addr_cnt ); addr_cnt++; rslt_cnt++; } fprintf(fp, " full_addr fa%03d (", addr_cnt ); fprintf(fp, "a%02d & b%02d, c%02d, c%02d, c%02d, s%02d", mult_size-1, mult_size-1, addr_cnt-1, addr_cnt - (mult_size-1), addr_cnt, addr_cnt ); fprintf(fp, ");\n"); fprintf(fp, "\n"); fprintf( fp, " assign r%02d = s%02d;\n", mult_size + (mult_size -2), addr_cnt ); fprintf( fp, " assign r%02d = c%02d;\n", mult_size + (mult_size - 1), addr_cnt ); } // gen_mult void gen_vars(int mult_size, char *name, FILE *fp ) { int i, semi; for (i = 0; i < mult_size; i++) { semi = FALSE; if ((i % 8) == 0) { if (i > 0) { fprintf(fp, ";\n"); semi = TRUE; } fprintf( fp, "wire "); } else if (i < mult_size ) fprintf(fp, ", "); fprintf( fp, "%s%02d", name, i ); } if (! semi) fprintf(fp, ";\n"); } void gen_assign( unsigned int mult_size, char *dest, char *src, FILE *fp ) { int i; for (i = 0; i < mult_size; i++) { fprintf(fp, " assign %s%02d = %s[%2d];\n", dest, i, src, i ); } } // gen_assign void gen_assign2( unsigned int mult_size, char *dest, char *src, FILE *fp ) { int i; for (i = 0; i < mult_size; i++) { fprintf(fp, " assign %s[%2d] = %s%02d;\n", dest, i, src, i ); } } // gen_assign void gen_assigns( unsigned int mult_size, FILE *fp ) { gen_vars(mult_size, "a", fp); fprintf(fp, "\n"); gen_vars(mult_size, "b", fp); fprintf(fp, "\n"); gen_vars(mult_size * 2, "r", fp ); fprintf(fp, "\n"); gen_vars(mult_size * (mult_size-1), "c", fp); fprintf(fp, "\n"); gen_assign(mult_size, "a", "A", fp ); fprintf(fp, "\n"); gen_assign(mult_size, "b", "B", fp ); fprintf(fp, "\n"); gen_assign2(mult_size * 2, "rslt", "r", fp ); fprintf(fp, "\n"); } // gen_assigns void gen_mult_module( unsigned int mult_size, FILE *fp ) { gen_header( mult_size, fp ); gen_assigns( mult_size, fp ); gen_mult( mult_size, fp ); gen_end( fp ); } // gen_mult_module void gen_testbench( unsigned int mult_size, FILE *fp ) { fprintf(fp, "\n"); fprintf(fp, "module testbench;\n"); fprintf(fp, "reg [%2d:0] A, B;\n", mult_size-1); fprintf(fp, "wire [%2d:0] rslt;\n", (mult_size * 2)-1); fprintf(fp, "\n"); fprintf(fp, " mult mult_inst (A, B, rslt);\n"); fprintf(fp, "\n"); fprintf(fp, " initial\n"); fprintf(fp, " begin : init\n"); fprintf(fp, " parameter FALSE = 0;\n"); fprintf(fp, " parameter TRUE = 1;\n"); fprintf(fp, " reg [%2d:0] i, j;\n", mult_size); fprintf(fp, " reg passed;\n"); fprintf(fp, "\n"); fprintf(fp, " passed = TRUE;\n"); fprintf(fp, " for (j = 1; j < %2d; j = j + 1)\n", 1 << mult_size); fprintf(fp, " begin\n"); fprintf(fp, " for (i = 1; i < %2d; i = i + 1)\n", 1 << mult_size ); fprintf(fp, " begin\n"); fprintf(fp, " A = i;\n"); fprintf(fp, " B = j;\n"); fprintf(fp, " #1;\n"); fprintf(fp, " if (rslt != i * j)\n"); fprintf(fp, " begin\n"); fprintf(fp, " passed = FALSE;\n"); fprintf(fp, " $display(\"test failed for %%d x %%d\", i, j );\n"); fprintf(fp, " end // if\n"); fprintf(fp, " end // for i\n"); fprintf(fp, " end // for j\n"); fprintf(fp, "\n"); fprintf(fp, " if (passed)\n"); fprintf(fp, " $display(\"test passed\");\n"); fprintf(fp, " else\n"); fprintf(fp, " $display(\"test failed\");\n"); fprintf(fp, " $finish;\n"); fprintf(fp, " end\n"); fprintf(fp, "\n"); fprintf(fp, "endmodule\n"); fprintf(fp, "\n"); } // gen_testbench void gen_verilog( unsigned int mult_size, FILE *fp = stdout ) { gen_adder( fp ); gen_mult_module( mult_size, fp ); gen_testbench( mult_size, fp ); } // gen_verilog int main(int argc, char *argv[] ) { int status; char *progname; progname = argv[ 0 ]; status = 0; if (argc != 2) { printf("usage: %s <multiplier size>\n", progname ); status = 1; } else { unsigned mult_size; mult_size = get_size( argv[ 1 ] ); if (mult_size > 0) { gen_verilog( mult_size ); } else { printf("usage: size should be greater than zero\n", progname ); status = 1; } } return status; }
Ian Kaplan
July 2001
back to VLSI Design and Simulation