|
13.2.3 Compiling for Array Processors and Supercomputers

Array processors (sometimes called vector machines) and supercomputers use parallelism to increase performance. Typically, these machines are used to perform high-precision arithmetic calculations on large arrays. In addition, they may need to operate in real time or close to real time. Supercomputers are also used for general-purpose computing, while array processors are special-purpose (often peripheral) processors that operate solely on vectors. Discovering data dependencies is one of the major issues for these architectures.

Data Dependencies

Data dependence checking is important for detecting opportunities to schedule serial code in parallel. Strictly speaking, the problem is one of concurrency. For example, the following statements cannot be executed at the same time:

    X := Y + 1
    Z := X + 2

Since X's value is needed to compute Z, the second statement depends on the first. The following statements could be executed in parallel, because neither uses a value the other computes:

    X := Y + 1
    Z := Y + 2

Parallelization, like optimization, has a potentially higher payoff in loops. The following loop can be changed to execute in parallel:

    FOR I := 1 TO AHighNumber
        A[I] := A[I] + B[I]
    ENDFOR

For two processors (and assuming AHighNumber is even) this could become:

    FOR I := 1 TO AHighNumber BY 2
        A[I] := A[I] + B[I]
        A[I + 1] := A[I + 1] + B[I + 1]
    ENDFOR

This is called loop unrolling. Since each statement within the loop uses and modifies a separate element of the array, the two statements can be executed in parallel. The loop also executes half as many test-and-branch instructions, since it now counts by 2.

Some machines have numerous processors. On a machine with 64 processors, the following

    FOR I := 1 TO N * 64
        A[I] := A[I] + B[I]
    ENDFOR

might become

    FOR I := 0 TO N - 1 DO
        FOR J := 64 * I + 1 TO 64 * I + 64 DO
            A[J] := A[J] + B[J]
        ENDFOR
    ENDFOR

The 64 iterations of the inner loop can now be executed simultaneously, one on each of the 64 processors.
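The index arithmetic of the 64-processor transformation above can be checked with a small Python sketch. This is only an illustration under stated assumptions: the function names `add_serial` and `add_strip_mined` are hypothetical, the arrays are 0-based Python lists rather than the 1-based arrays of the pseudocode, and the inner loop still runs serially here. The point is that both versions visit exactly the same elements, and that each inner iteration touches a different element.

```python
def add_serial(a, b):
    """Reference version: one element per loop iteration."""
    out = list(a)
    for i in range(len(out)):
        out[i] = out[i] + b[i]
    return out


def add_strip_mined(a, b, width=64):
    """Same computation, split into strips of `width` elements.

    Assumes len(a) is a multiple of `width`, mirroring the N * 64
    bound in the text. On a parallel machine each iteration of the
    inner loop could go to a separate processor; here it is serial.
    """
    if len(a) % width != 0:
        raise ValueError("length must be a multiple of width")
    out = list(a)
    for i in range(len(out) // width):                  # I := 0 TO N - 1
        for j in range(width * i, width * i + width):   # one strip (0-based)
            out[j] = out[j] + b[j]                      # independent of other j
    return out
```

For example, calling both functions on two lists of length 128 returns identical results, which is what allows a compiler to substitute one loop form for the other.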
The statement in the following loop contains a data dependency and cannot be effectively parallelized:

    FOR I := 1 TO AHighNumber
        A[I] := A[I - 1] + B[I]
    ENDFOR

Here, the value computed in one iteration of the loop is used in the next iteration, so the iterations must run in order and the loop cannot be unrolled and processed in parallel.

Debugger Interaction

Producing information in the object module that the debugger can use is important, since there may no longer be a one-to-one correspondence between the code written by the programmer and the code that is scheduled for parallel execution. If code is scheduled after the compiler has produced it, there is not even a one-to-one correspondence between the code produced by the compiler and the code being executed under the debugger. In this case, the scheduler should leave a trail for the debugger to follow. |
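One way to picture such a trail is a table that maps each scheduled position back to a source line. The following Python sketch is a toy model, not a description of any real scheduler or debugger format: the `Stmt` record, the `schedule_with_trail` function, and the `(step, slot)` key are all hypothetical names chosen for illustration. It places each statement in the earliest step that respects its data dependencies on earlier statements and records where every source line ended up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    line: int     # source line number, as the programmer sees it
    dest: str     # variable written
    srcs: tuple   # variables read


def schedule_with_trail(stmts):
    """Greedy scheduler (illustrative only).

    Returns (steps, trail): `steps` is a list of lists of statements
    that may run together; `trail` maps (step, slot) -> source line.
    """
    last_write = {}   # variable -> step of its most recent write
    last_read = {}    # variable -> latest step that reads it
    steps, trail = [], {}
    for s in stmts:
        step = 0
        for v in s.srcs:                 # a value must be written before use
            if v in last_write:
                step = max(step, last_write[v] + 1)
        if s.dest in last_write:         # keep writes to one variable in order
            step = max(step, last_write[s.dest] + 1)
        if s.dest in last_read:          # do not overwrite before earlier reads
            step = max(step, last_read[s.dest] + 1)
        while len(steps) <= step:
            steps.append([])
        steps[step].append(s)
        trail[(step, len(steps[step]) - 1)] = s.line
        last_write[s.dest] = step
        for v in s.srcs:
            last_read[v] = max(last_read.get(v, -1), step)
    return steps, trail
```

Given the pair `X := Y + 1` and `Z := X + 2`, this sketch puts the statements in two separate steps; given `X := Y + 1` and `Z := Y + 2`, it puts both in the same step. In either case the trail lets a debugger report the original source line for whatever it finds at a given step and slot.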