13.1.2 Compiler Issues in RISC Machines
Some of the issues that follow are also common to CISC machines and to recent languages such as Ada. They are mentioned here because compiler designers for RISC
architectures have found them particularly important.
Pipelining
Pipelining traditionally has been the process of fetching instructions or data while
another instruction is executing. It becomes increasingly important on RISC processors. Because all computations must take place in registers, and loads and stores
access memory, it is important for the compiler to reorder code when it can so that
the CPU is executing while a load or store is taking place. Since it is estimated that
1/3 of a program's executable instructions involve loads and stores, this is an especially important task. Pipelining has also been used to try to sustain an execution rate
of more than one instruction/cycle.
Register Usage
Because RISC computations take place in registers and are one cycle or less, but a
load takes more than one cycle, a technique called scoreboarding has been developed. When an instruction is begun whose result will affect a register, that register is
marked on a scoreboard. This mark is erased when the instruction concludes (and the
register has its new data). Other instructions check the scoreboard, and if they need
the results from that register, the CPU waits. A clever compiler can keep these CPU
waiting periods to a minimum.
If all program quantities are kept in registers, a speed-up of as much as 1/3 can be
expected from the program. A large number of registers facilitates this, but even
these must be used effectively.
Because of the large number of registers, it is often possible to keep each operand
in a register and store the result in yet another register, thus allowing both operands
to remain in a register for a possible future use.
A technique called value tracking keeps a list of program quantities whose values
are currently in registers. A set of ordered pairs (Value, Register) is maintained, and
when the value is needed for another computation, it is fetched from the register.
Registers may be organized into a file of (possibly overlapping) register sets.
When these sets are managed by the hardware, they are called register windows.
Register windows allow for fast storing and restoring of information in procedure
calls. These register windows are organized into a circular queue rather than the traditional stack which stores and restores for each call. A new procedure call (automatically) increments a pointer so that the outgoing parameter registers become the
ingoing parameter registers for the new procedure. A return operation reverses this
process.
Optimization and Code Generation Techniques
Since loads and stores may slow execution, it is important for the code generator to
minimize them. Peephole optimization techniques can remove redundant loads or
unnecessary stores.
A two-cycle instruction such as a branch has to compute an address. Execution
cannot continue until after the address is computed. A technique known as delayed
branching moves the branch instruction back one instruction so that the statement
preceding the branch is executed at the same time that the address is being calculated.
Another technique is to execute the instruction after a conditional branch at the same
time as the address fetch (hoping that the branch doesn't take place).
Other standard optimizations, such as computing common subexpressions once,
and good register allocation, such as that performed by coloring, can improve RISC
programs by higher percentages. One performance estimate for a RISC compiler vs.
the same compiler on a CISC machine found the improvement on the RISC machine
to be about 50% with nowhere near that much improvement on the CISC machine.
Run-time checks can sometimes be eliminated by the compiler. Checking that a
variable X1 in the expression X1 = A + BB * 12 is within its declared bounds of say
0 .. 255, is not needed at run time if the compiler knows that A is declared as 0 .. 100
and B is declared as 0 .. 10. Of course, checks for A and B are still needed unless a
similar dependence can also be found at compile time. On RISC machines, this sort
of compiler simulation can cost as much as 20% in compile time, but may remove as
many as 70% of the run-time checks.
Inlining is the replacement of a procedure call with the code of the called procedure. Even with register windowing, this can produce a program which runs faster at
the expense of a (perhaps huge) increase in code space. Inlining is most productive
when there are lots of calls to very short procedures.
Because the typical RISC architecture does not have microcode, the translated
code is "bare-bones", exposing code sequences to optimizations which may have
been hidden on CISC machines.
Debugger Interaction
Production-quality compilers produce code in the object file format of the machine.
It is this structure to which the debugger refers. If there is a one-one correspondence
between a source program construct and its place in the object file, then debugging is
facilitated. In heavily optimized code, this may not be the case.
Debugging of optimized code is a challenge in any architecture-CISC or RISC.
If a computation, say a loop invariant one, has been moved, and a programmer steps
to that instruction point, the expected instruction is not there. Ideally, the debugger
should try to figure out what the compiler has done and either undo it or inform the
programmer. This may mean that the compiler should leave this information in the
object file for the debugger.
Another problem arises from the register window situation. If a variable has been
kept in a register during one procedure execution, and the programmer asks to see
that variable's value when executing another procedure (and another register window), the debugger has to be informed that the variable is no longer in that register.
Debugging inlined code (see above) is also a difficult task. A bug in a procedure
has now been reproduced as many times as the inlined procedure is called. Compilers, ideally, should leave information in the object module's symbol file, indicating
that a piece of code had really only been written by the programmer once.
|