Code Generation

11.0 Introduction

The code generation phase translates the intermediate representation into "code". In these modules, the final code is assembly language code which is then assembled, linked and executed. Another options is object code which is output by many production-quality compilers. Still another method is to interpret the intermediate representation to execute the program directly.

The preparation for code generation phase decides how to allocate registers and memory. Thhe code generation phase translates the intermediate representation to assembly language using these registers and memory assignments.


11.1 Preparation for Code Generation

To prepare for code generation, the compiler decides where values of variables and expressions will reside during execution. The preferred location is a register since instructions execute faster when the data referred to in operands reside in registers. Ultimate storage is often a memory location, and due to the scarcity of registers, even intermediate results may need to be assigned memory locations also.

Various methods have been developed for good use of registers. One technique is to store all loop variables in registers (until there are no more registers) since statements inside loops may execute more than once.

Another easily implemented technique is to store the variables used the most in registers.

For machines with a stack and a stack pointer, operations involving the stack are generally quicker than those involving an access to (non-stack) memory. Thus, another code generation technique is to push all variables in a procedure onto the stack before the procedure is executed, access them from the stack (and two registers).

Careful assignment of expressions and variables to registers can increase the efficiency of the resulting compiler. In this module, we will generate code as though there were only one available register.


11.2 Generation of Directives from the Symbol Table

The symbol table is used to generate directives. Thus if A is entered in the symbol table with class equal to variable, and attribute type equal to integer, then the directive which allocates space for an integer is generated.

Example 1 shows directives for integers for a number of machines.

Example 1 Directives from a symbol table entry for a name with class = variable, type = integer, and character string A

 	On an 86-family:	A	DW	?
 	On a M68000:		A	DS.L
 


11.3 Code Generation from Abstract Syntax Trees

A simple code generator can generate code from an abstract syntax tree merely by walking the tree.

Consider the following abstract syntax tree for the two assignment statements:

X1 := A + BB * 12;

X2 := A/2 + BB * 12;

from Module 1:

Using a tree walk which generates code for the subtrees first, then applies the parent operator, the following code can be easily produced:

Load BB, R1
Mult #12, R1
Store R1, Temp1
Load A, R1
Add Temp1, R1
Store R1, Temp2
Load Temp2, R1
Store R1, X1
Load A, R1
Div R1, #2
Store R1, Temp3
Load BB, R1
Mult #12, R1
Store T4, R1
Load T3, R1
Add T4, R1
Store T5, R1
Load T5,R1
Store R1,X1

For two leaf nodes, the method shown loads the left-most into a register. The right-most leaf node could just as well have been the one put into the register.

Exercise 3 asks the reader to write the algorithm for this method.

Notice that the code continually stores the value in the register into memory in order to reuse it for the next computation. This is called a register spill.

The example in this section generates code for assignment statements whose right-hand sides are expressions. The next section discusses code generation strategies for various other language constructs.


11. 4 Standard Code Generation Strategies

A code generator can be written to recognize standard "templates":

  1. Assignment Statements

    A tree pattern of the form:

    generates a Move instruction:

    where aPlace represents the register, stack position or memory location assigned to a.

  2. Arithmetic Operations

    Suppose Op represents an arithmetic operation and consider an abstract syntax tree for the statement t = a Op b:

    One possible code sequence is:

    This is the method used in the example of the previous section. Of course, some machines require that special registers be used for operations such as multiplication and division.

    Example 2

  3. IF Statements

    IF statements can be represented by an abstract syntax tree such as:

    A typical code sequence is:

    Example 3 illustrates this for the statement:

    In Example 3, aPlace and bPlace mey refer to the variables a and b themselves, if the machine allows two memory operands. For machines such as the 86-family (PC-clones) which require that one operand be in a register, the instruction:

    can be replaced by:

    if neither a nor b have been previously assigned to a register.

  4. Loops

    Loops are just conditionals whose code is repeated. Consider the loop:

     	LOOP While condition DO
     	   Statements
     	ENDLOOP		
     
     

    An abstract syntax tree is:

    A reasonable code sequence is:

    Example 4 shows such a loop:


    11.5 Code Generation Techniques

         True code generator generators are still evolving, although much research has been devoted to this topic. Retargetable code generators or table-driven code generators are becoming more common. They enable a code generator to be created for a new machine with relative ease by separating the code generation algorithm (the driver) from the machine description. This is similar to the front-end generators which we saw in earlier chapters.


    11.6 Register Allocation

    To prepare for code generation, the compiler decides where values of variables and expressions will reside during execution. The preferred location is a register since instructions execute faster when the data referred to in operands reside in registers. Ultimate storage is often a memory location, and due to the scarcity of registers, even intermediate results may need to be assigned memory locations also.

         Memory allocation techniques are discussed in Module 12; in this chapter, we will address the issue of register allocation.


    11.6.1 Register Allocation vs. Assignment

    The term register allocation is used for two tasks: (1) register allocation itself which decides which program values shall reside in registers and (2) register assignment which picks the specific register in which these values will reside.

         Some compilers make a tentative allocation, then try an assignment, and if necessary reallocate.


    11.6.2 Register Allocation Schemes

    A good code generator will generate code that minimizes accesses to main memory; it will try to keep as much currently active data (values of variables and expressions) in registers as possible.

         There are two general approaches to register allocation. The first divides the registers to be allocated into two classes. The first class is those globally allocated: those to be allocated for the whole program or for a whole subprogram or perhaps a loop, and the second class is those used for temporary values and computations within a straight-line (no branches from IF's or loops) sequence of code.

         The second method is to do all allocation on a global basis, without dividing the registers into two classes.

         Simple global register allocation allocates registers for variables in inner loops first since that is generally where a program spends a lot of its time. Of course, this same register should be used for the variables if it also appears in an outer loop. After registers have been allocated globally, at least one (and often more) register is kept free for holding temporary results.


    11.6.3 Register Allocation by Usage Counts

    A slightly more sophisticated method for global register allocation is called usage counts. (This is a heuristic. A heuristic method is one that usually, but not always, makes things better.) In this method, registers are allocated first to the variables that are used the most.

    Example 5

    Consider the following loop:

     	LOOP: X = 2 * E
     	      Z = Y + X + 1
     	      IF some condition THEN
     	        Y = Z + Y
     	        D = Y - 1
     	      ELSE Z = X - Y
     	        D = 2
     	      ENDIF
     	        X = Z + D
                     Z = X
             ENDLOOP
     
    Here, there are five references to X, and Z, five references to Y, three references to D, and one to E. Thus, if there are three registers, a reasonable approach would be to allocate X and Z to two of them, saving the third for local computations. The resulting code would be (something like):

     	Load	X,R1 
     	Load	Z,R2 
     Loop:	Load	E,R3 
     	Mult	#2,R3 
     	Store	R3,X 
     	Copy	R1,R3 
     	Add	Y,R3
     	Add	#1,R3
     	Store   R3,Z
     	IF some condition  THEN
      	  Copy  R2,R3
     	  Add	Y,R3
               Store R3,Y
     	  Load	Y,R3
     	  Sub   #1,R3
     	  Store R3,D
     	ELSE Copy R1,R3
     	  Sub	Y,R3
     	  Store R3,Z
     	  Load  #2,R3
         	  Store R3,D
             ENDIF
     	  Copy  R2,R3
               Add   D,R3
               Store R3,X  
               Store R3,Z
     {ENDLoop}
     

    11.6.4 Register Allocation by Graph Coloring

    Using data flow of uses and definitions of variables, set up use-def chains:

    Two program quantities cannot share the same register if their use-def chains overlap.

    Called Overlapping lifetimes

    Called an Interference Graph

    If n registers, and graph can be colored with n colors, registers can be assigned.

    An NP-Complete Problem


    11.6.5 Register Assignment and Reassignment

    Register Allocation prioritizes which variables to be kept in registers

    Register Assignment (the other side) Which actual registers to use

    May depend on the machine

    Registers allocated and then freed

    Register spill: when we remove a value we'd really like to keep in a register, but can't