13.1.2 Compiler Issues in RISC Machines

Some of the issues that follow are also common to CISC machines and to recent languages such as Ada. They are mentioned here because compiler designers for RISC architectures have found them particularly important.

Pipelining

Pipelining traditionally has been the process of fetching instructions or data while another instruction is executing. It becomes increasingly important on RISC processors. Because all computations must take place in registers, and loads and stores access memory, it is important for the compiler to reorder code when it can so that the CPU is executing while a load or store is taking place. Since it is estimated that 1/3 of a program's executable instructions involve loads and stores, this is an especially important task. Pipelining has also been used to try to sustain an execution rate of more than one instruction/cycle.

Register Usage

Because RISC computations take place in registers and are one cycle or less, but a load takes more than one cycle, a technique called scoreboarding has been developed. When an instruction is begun whose result will affect a register, that register is marked on a scoreboard. This mark is erased when the instruction concludes (and the register has its new data). Other instructions check the scoreboard, and if they need the results from that register, the CPU waits. A clever compiler can keep these CPU waiting periods to a minimum.

If all program quantities are kept in registers, a speed-up of as much as 1/3 can be expected from the program. A large number of registers facilitates this, but even these must be used effectively.

Because of the large number of registers, it is often possible to keep each operand in a register and store the result in yet another register, thus allowing both operands to remain in a register for a possible future use.

A technique called value tracking keeps a list of program quantities whose values are currently in registers. A set of ordered pairs (Value, Register) is maintained, and when the value is needed for another computation, it is fetched from the register.

Registers may be organized into a file of (possibly overlapping) register sets. When these sets are managed by the hardware, they are called register windows. Register windows allow for fast storing and restoring of information in procedure calls. These register windows are organized into a circular queue rather than the traditional stack which stores and restores for each call. A new procedure call (automatically) increments a pointer so that the outgoing parameter registers become the ingoing parameter registers for the new procedure. A return operation reverses this process.

Optimization and Code Generation Techniques

Since loads and stores may slow execution, it is important for the code generator to minimize them. Peephole optimization techniques can remove redundant loads or unnecessary stores.

A two-cycle instruction such as a branch has to compute an address. Execution cannot continue until after the address is computed. A technique known as delayed branching moves the branch instruction back one instruction so that the statement preceding the branch is executed at the same time that the address is being calculated. Another technique is to execute the instruction after a conditional branch at the same time as the address fetch (hoping that the branch doesn't take place).

Other standard optimizations, such as computing common subexpressions once, and good register allocation, such as that performed by coloring, can improve RISC programs by higher percentages. One performance estimate for a RISC compiler vs. the same compiler on a CISC machine found the improvement on the RISC machine to be about 50% with nowhere near that much improvement on the CISC machine.

Run-time checks can sometimes be eliminated by the compiler. Checking that a variable X1 in the expression X1 = A + BB * 12 is within its declared bounds of say 0 .. 255, is not needed at run time if the compiler knows that A is declared as 0 .. 100 and B is declared as 0 .. 10. Of course, checks for A and B are still needed unless a similar dependence can also be found at compile time. On RISC machines, this sort of compiler simulation can cost as much as 20% in compile time, but may remove as many as 70% of the run-time checks.

Inlining is the replacement of a procedure call with the code of the called procedure. Even with register windowing, this can produce a program which runs faster at the expense of a (perhaps huge) increase in code space. Inlining is most productive when there are lots of calls to very short procedures.

Because the typical RISC architecture does not have microcode, the translated code is "bare-bones", exposing code sequences to optimizations which may have been hidden on CISC machines.

Debugger Interaction

Production-quality compilers produce code in the object file format of the machine. It is this structure to which the debugger refers. If there is a one-one correspondence between a source program construct and its place in the object file, then debugging is facilitated. In heavily optimized code, this may not be the case.

Debugging of optimized code is a challenge in any architecture-CISC or RISC. If a computation, say a loop invariant one, has been moved, and a programmer steps to that instruction point, the expected instruction is not there. Ideally, the debugger should try to figure out what the compiler has done and either undo it or inform the programmer. This may mean that the compiler should leave this information in the object file for the debugger.

Another problem arises from the register window situation. If a variable has been kept in a register during one procedure execution, and the programmer asks to see that variable's value when executing another procedure (and another register window), the debugger has to be informed that the variable is no longer in that register.

Debugging inlined code (see above) is also a difficult task. A bug in a procedure has now been reproduced as many times as the inlined procedure is called. Compilers, ideally, should leave information in the object module's symbol file, indicating that a piece of code had really only been written by the programmer once.