Optimization: Introduction and Control Flow Analysis

8.0 Introduction

The optimization phase makes changes to the intermediate representation so that the code generator will put out better but equivalent code.

The term optimization is a misnomer. The problem of transforming code so that it is the best it can be in all cases is undecidable; it cannot be solved by an algorithm on a computer.

We can, however, improve code in most cases.

There are machine-dependent optimizations, such as good allocation of registers, and machine-independent optimizations, such as computing constants: evaluating constants at compile-time, rather than generating code to compute them, saves execution time.

Optimization can consume a lot of compile-time, so we want the most payoff for the least effort. In particular, the compiler needs to guess where the most frequently executed parts of the program are.

8.1 Comparison of Algorithms

Optimization can't speed up a really slow algorithm by large factors. Figure 1 shows, to within a constant factor, the amount of data which can be processed relative to a base value. If 1000 "widgets" can be processed in linear time, then only 140 can be processed using an algorithm which is O(n log n), etc.

Suppose we are able to optimize a program to produce a speed-up of a factor of 10 (an ambitious improvement).

If we let S = problem size, then after a speed-up by a factor of 10:


 

 	Complexity		Problem Size

 	Linear (n)		10S
 	n log n			10S (for large S)
 	n^2			3.16S
 	n^3			2.15S
 	2^n			S + 3.3

 

We can conclude from this that bad algorithms are not helped much by optimization.
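To see where these figures come from, note that for an algorithm of complexity f, a tenfold speed-up lets us handle the largest problem size S' satisfying f(S') = 10 f(S). The short Python computation below, offered only as a sketch, reproduces the entries in the table.

import math

# A tenfold speed-up lets us solve the largest S' with f(S') = 10 * f(S).
print("n   : S' = 10 S")
print("n^2 : S' = %.2f S" % math.sqrt(10))      # 3.16 S
print("n^3 : S' = %.2f S" % (10 ** (1 / 3)))    # 2.15 S
print("2^n : S' = S + %.1f" % math.log2(10))    # S + 3.3
# For n log n, S' log S' = 10 S log S, so S' approaches 10 S for large S.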

However, clever optimizations of clever code may not improve things either. Knuth (1971) presented an amusing discussion in which he referred to an algorithm he had developed many years earlier which translated a complicated expression into fewer instructions than other people's algorithms did. Years later, he realized that programmers rarely write such complicated expressions, and it isn't worth the compiler's time to optimize for them.

8.2 What to Optimize

We want to optimize things that happen frequently. There are two senses of "frequently": (1) how frequently a piece of code is written by programmers and (2) how frequently a piece of code is executed when the program runs.

For (1), Knuth (1971) shows empirically that simple assignment statements with one operation and one to three variables are the most often used statements:

 	A := B 	A := A OP 1 	A := A OP k 	A := B OP k 	A := B OP C 
Here, A, B, and C are variables and k is a constant. Follow-up studies have continued to confirm this. It does little good to optimize for things programmers rarely write.

For (2), code in loops tends to be executed the most; thus optimizations concentrate on loops.

8.3 Issues in Optimization

The most important optimization principle (Principle 1) is correctness: the meaning of a program must not be changed.

The task is to find frequently occurring code sequences, perform improvements to them, and forget the rest.

Sometimes the questions are asked: Why optimize? Can't programmers just be taught to write good code? The answer is (rarely) yes. For large programs which execute for many hours or days, programmers do rewrite them to make them as efficient as possible. The more common answer, however, is no. Programmers should be able to use the features of a language to implement an algorithm without worrying about its efficiency. They should be optimizing for readability and maintainability. Example 1 shows a common case.


 EXAMPLE 1 Two-dimensional arrays

 	LOOP

 	  A[I,J] := . . .

 	  . . .

 	  . . .

 	   := A[I,J]

 	  . . .

 	ENDLOOP

 
An optimizing compiler is expected to optimize such high-level features; here, the address computation for A[I,J], which the programmer never sees at the source level, should be performed once rather than at each reference.
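To make the hidden cost concrete, here is a hypothetical Python sketch of the address arithmetic implied by each occurrence of A[I,J], assuming a row-major layout and 4-byte elements; an optimizing compiler computes this address once per iteration and reuses it for both the store and the load.

def element_address(base, i, j, n_cols, elem_size=4):
    # Address of A[i, j] for a row-major two-dimensional array at 'base'.
    # Each textual use of A[I,J] implies a multiplication and two additions.
    return base + (i * n_cols + j) * elem_size

# Both references to A[I,J] in the loop can share one address computation.
addr = element_address(base=0x1000, i=3, j=7, n_cols=10)
print(hex(addr))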

Optimizations most frequently aim at time. Syntactically (that is, looking at the code), this shows up as minimizing the number of operations and the number of memory accesses. Operations can be computed faster if their operands are in registers, and constants can be computed at compile-time.
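As a small illustration of compile-time evaluation, the Python sketch below folds an operation whose operands are both constants into a simple assignment. The tuple form of the quadruples is an assumption made for illustration, not the text's exact intermediate representation.

def fold_constants(quads):
    # Each quadruple is (op, arg1, arg2, result); constant operands are ints.
    ops = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}
    folded = []
    for op, a1, a2, res in quads:
        if op in ops and isinstance(a1, int) and isinstance(a2, int):
            folded.append((':=', ops[op](a1, a2), None, res))   # computed now
        else:
            folded.append((op, a1, a2, res))
    return folded

print(fold_constants([('*', 60, 60, 'T1'), ('+', 'T1', 'x', 'T2')]))
# [(':=', 3600, None, 'T1'), ('+', 'T1', 'x', 'T2')]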

Optimizations next most frequently aim at space. Minimizing memory accesses may produce less code as well as code that executes faster, since machine instructions that reference memory generally require more bytes.

There are also space-time tradeoffs. For example, replacing two instances of the same code by a subroutine may save code space, but takes more time because of the overhead of jumping to the subroutine, passing arguments and returning from the subroutine.

An optimization should speed up a program by a measurable amount. Also, we don't optimize in heavy debugging environments (student environments are heavy debugging environments). It takes too much compiler time for environments where a program is compiled many times (until it is correct) and run once or twice.

On the other hand, as we shall see, certain so-called peephole optimizations are so easy that they should be done by all compilers.

8.4 Optimization Phases

Optimization can be performed on the tree. This is called high-level optimization. Even with today's programming standards -- no transfer into the middle of loops, etc. -- Principle 1 makes these high-level optimizations unsafe. We will see why when we look at a formal definition of a loop. More aggressive, as well as correct, optimizations are made on a graphical intermediate representation of a program.

The phase which represents the pertinent, possible flow of control is often called control flow analysis. If this representation is graphical, then a flow graph depicts all possible execution paths. Control flow analysis simplifies the data flow analysis. Data flow analysis is the process of collecting information about the modification, preservation, and use of program "quantities" -- usually variables and expressions.

Once control flow analysis and data flow analysis have been done, the next phase, the improvement phase, improves the program code so that it runs faster or uses less space. This phase is sometimes termed optimization. Thus, the term optimization is used for this final code improvement, as well as for the entire process which includes control flow analysis and data flow analysis.

Optimization algorithms attempt to remove useless code, eliminate redundant expressions, move invariant computations out of loops, etc.

The next section describes control flow analysis. We begin by creating the control flow graph.

8.5 Control Flow Analysis

Control flow analysis proceeds in two steps: (1) determine the basic blocks -- sequences of straight-line code -- and (2) build a flow graph by taking branches into account.

8.5.1 Basic Blocks

A basic block is a sequence of intermediate representation constructs (quadruples, abstract syntax trees, whatever) which allows no flow of control in or out of the block except at the top or bottom. Figure 3 shows the structure of a basic block.

We will use the term statement for the intermediate representation and show it in quadruple form because quadruples are easier to read than IR in tree form.

Leaders

A basic block consists of a leader and all code before the next leader. We define a leader to be (1) the first statement in the program (or procedure), (2) the target of a branch, identified most easily because it has a label, and (3) the statement after a "diverging flow": the statement after a conditional or unconditional branch. Figure 4 shows this pictorially.

Basic blocks can be built during parsing, if it is assumed that all labels are referenced, or after parsing, without that assumption. Example 2 shows the outline of a FOR-loop and its basic blocks.
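As a sketch of how the leader rules translate into code, the Python routine below partitions a list of quadruples into basic blocks. The tuple representation of quadruples and the names of the branch operations are assumptions made for illustration only.

def find_basic_blocks(quads):
    # Each quadruple is (label, op, ...); 'label' is None when unlabeled.
    branch_ops = {'goto', 'if_goto'}
    leaders = {0}                                     # rule 1: first statement
    for i, (label, op, *rest) in enumerate(quads):
        if label is not None:                         # rule 2: branch target
            leaders.add(i)
        if op in branch_ops and i + 1 < len(quads):   # rule 3: after a branch
            leaders.add(i + 1)
    leaders = sorted(leaders)
    return [quads[start:end]
            for start, end in zip(leaders, leaders[1:] + [len(quads)])]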

Since a basic block consists of straight-line code, it computes a set of expressions. Many optimizations are really transformations applied to basic blocks and to sequences of basic blocks.

Although this program might be translated in different ways, the discussion here is still valid.


Once we have computed basic blocks, we can create the control flow graph.

8.5.2 Building a Flow Graph

A flow graph shows all possible execution paths. We will use this information to perform optimizations.

Formally, a flow graph is a directed graph G, with N nodes and E edges, in which every node is reachable from the initial node. Example 3 shows the control flow graph for Example 2. The rules for building such a graph follow.

More formally, we create nodes and arcs as shown in Figure 5. Successors and predecessors are defined as in Figure 5.

These are all the edges in a flow graph. (Convince yourself of this.)
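The Python sketch below adds the edges under the usual conventions: a block ending in a conditional branch gets an edge to the branch target and to the fall-through block, a block ending in an unconditional branch gets an edge to its target only, and any other block falls through to the next block in textual order. It assumes the same illustrative quadruple form as the basic-block sketch above.

def build_flow_graph(blocks):
    # Map each labeled leader to the number of its block.
    label_to_block = {blk[0][0]: i for i, blk in enumerate(blocks)
                      if blk[0][0] is not None}
    successors = {i: [] for i in range(len(blocks))}
    for i, blk in enumerate(blocks):
        label, op, *args = blk[-1]                            # last statement
        if op in ('goto', 'if_goto'):
            successors[i].append(label_to_block[args[-1]])    # branch target
        if op != 'goto' and i + 1 < len(blocks):
            successors[i].append(i + 1)                       # fall-through
    return successors

Predecessor lists, needed later for dominator computation, are obtained by reversing these edges.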

8.5.3 Techniques for Building Flow Graphs

Building a control flow graph involves finding predecessors and successors.

Consider the first situation of Figure 6, where basic block A has been built and a leader is identified for the beginning of basic block B. We cannot build the tail of the arrow to basic block B until we find the diverging flow that leads there. Backpatching, that is, returning later and filling in the tail information, will define the source of this arc.

The second picture in Figure 6 represents the opposite situation: we know we have an arc, but haven't found the leader to which the arc should point; the tail of the arc is known, but its head is not. Here, we can allocate a data structure for the basic block and then backpatch when its leader is found.

8.5.4 Data Structures for Building Control Flow Graphs

Basic blocks may be created and destroyed as the program is modified in the optimization process. Thus, space must be made available for growing collections of blocks. Storage reclamation is essential to handle optimization of large programs.

Since the structure of the flow graph varies with time, the data structure for blocks might keep pointers to lists of successors.

Since statements within the block change with time, it is reasonable to keep the intermediate representation as a linked list and reclaim storage of unused statements. Another possibility is to allocate dynamically on a heap and then garbage collect (see a data structures text if the terms heap and garbage collection are not familiar) when the heap runs out. See Module 12 for a discussion of garbage collection.
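One hypothetical shape for such a block record, in Python, keeps the statements in a list that can grow and shrink and keeps the graph structure as successor and predecessor lists; the fields and names are illustrative only.

from dataclasses import dataclass, field

@dataclass
class BasicBlock:
    # Contents and connections both change as the program is optimized.
    statements: list = field(default_factory=list)    # intermediate-code statements
    successors: list = field(default_factory=list)    # blocks control may reach next
    predecessors: list = field(default_factory=list)  # blocks that may reach this one

    def remove_statement(self, stmt):
        # Python reclaims the storage automatically; with explicit storage
        # management this is where the statement's space would be freed.
        self.statements.remove(stmt)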

8.5.5 DAGs

The previous sections looked at the control flow graph whose nodes are basic blocks. DAGs, on the other hand, create a useful structure for the intermediate representation within the basic blocks.

A directed graph with no cycles, called a DAG (Directed Acyclic Graph), is used to represent a basic block. Thus, the nodes of a flow graph are themselves graphs!

We can create a DAG instead of an abstract syntax tree by modifying the operations for constructing the nodes. If the node already exists, the operation returns a pointer to the existing node. Example 4 shows this for the two-line assignment statement example.

8.5.6 Data Structure for a DAG

As Example 4 shows, each node may have more than one pointer to it. We can represent a DAG internally by a process known as value-numbering. Each node is numbered, and this number is entered whenever the node is reused. This is shown in Example 5.


 EXAMPLE 5 Value-numbering

 	Node		Left Child	Right Child
 	1. X1
 	2. a
 	3. bb
 	4. 12
 	5. *		(3)		(4)
 	6. +		(2)		(5)
 	7. :=		(1)		(6)
 	8. X2
 	9. 2
 	10. /		(2)		(9)
 	11. +		(10)		(5)
 	12. :=		(8)		(11)

 

8.5.7 Benefits of DAGs

When a DAG is created, common subexpressions are detected. Also, creating a DAG makes it easy to see variables and expressions which are used or defined within a block.
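A Python sketch of the node-construction operation described in Section 8.5.5: before building a new node it looks up the (operator, left child, right child) triple and, if the node already exists, returns its value number, which is exactly how common subexpressions are detected. The representation is an assumption for illustration and mirrors the shape, though not the exact numbering, of Example 5.

class DAGBuilder:
    def __init__(self):
        self.nodes = []     # nodes[i] is (op, left, right); its value number is i + 1
        self.lookup = {}    # (op, left, right) -> value number

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key in self.lookup:            # node already exists: reuse it
            return self.lookup[key]
        self.nodes.append(key)
        self.lookup[key] = len(self.nodes)
        return self.lookup[key]

dag = DAGBuilder()
a, bb, twelve = dag.node('a'), dag.node('bb'), dag.node(12)
product = dag.node('*', bb, twelve)                    # bb * 12
dag.node(':=', dag.node('X1'), dag.node('+', a, product))
# A second use of bb * 12 returns the same value number:
assert dag.node('*', bb, twelve) == product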

DAGs also produce good code at code generation time. In Module 11, we will see a code generation algorithm which produces good (but not optimal!) code from a DAG.

Example 6 shows a bubble sort procedure and a set of quadruples to which it might be translated. Quadruples are an intermediate representation consisting of a single operation, up to two operands and a result. They are easier for humans to read than an abstract syntax tree. The quadruples shown here are in "assignment statement" form.


 EXAMPLE 6 IR for a bubble sort procedure

 	FOR I := 1 TO n - 1 DO

 	        FOR J := 1 TO I DO

 		IF A[J] > A[J+1] THEN

 		    BEGIN

 			Temp := A[J]

 			A[J] := A[ J + 1 ]

 			A[ J + 1 ] := Temp

 		    END

 
 	
 		

 		I := 1

 		GOTO ITest

 	ILoop:	J := 1

 		GOTO JTest

 	JLoop:	T1 := 4 * J

 		T2 := A [T1] 		; A[J]

 		T3 := J + 1

 		T4 := 4 * T3

 		T5 := A[T4] 		; A[ J + 1 ]

 		IF T2 <= T5 GOTO JPlus

 		T6 := 4 * J

 		Temp := A[T6] 		; Temp := A[J]

 		T7 := J + 1

 		T8 := 4 * T7

 		T9 := A[T8] 		; A[ J + 1 ]

 		T10 := 4 * J

 		A[T10] := T9 		; A[J] := A[ J + 1 ]

 		T11 := J + 1

 		T12 := 4 * T11

 		A[T12] := Temp		; A [ J + 1 ] := Temp

 	JPlus:	J := J + 1

 	JTest:	IF J <= I GOTO JLoop

 	IPlus:	I := I + 1

 	ITest:	IF I <= n - 1 GOTO ILoop

 

 	T1 := addr(List)

 	T2 := T1 + I

 	temp := (T2)

 

 	T1 := addr (List)

 	T2 := T1 + I

 	(T2) := Temp

 
Example 7 shows the basic blocks and the control flow graph for the procedure in Example 6.

Example 8 shows a DAG for block 3 of Example 7.

8.6 Loops

Programs spend the bulk of their time in loops. Thus, the payoff in loop optimization is potentially high.

There are two main types of loop optimizations: (1) movement of loop-invariant computations outside of the loop and (2) elimination of excess induction variables, the variables which are a linear function of the loop index. Another optimization often performed within loops is reduction in strength, which replaces certain multiplications by additions.
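As a small illustration of strength reduction (a hypothetical example, not the text's), the two Python loops below compute the same sequence of array offsets; the second maintains the offset as an additional induction variable and replaces the multiplication 4 * J by an addition.

n = 10

offsets_mul = []
for j in range(1, n + 1):
    offsets_mul.append(4 * j)          # T1 := 4 * J, a multiplication each time

offsets_add = []
t1 = 0
for j in range(1, n + 1):
    t1 = t1 + 4                        # T1 := T1 + 4, an addition each time
    offsets_add.append(t1)

assert offsets_mul == offsets_add      # the loops produce identical offsets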

Example 9 shows invariant code movement. The details of how this optimization is found and implemented are given in the next two modules. In this module we focus on the importance of being able to identify a loop.

EXAMPLE 9 Invariant code motion

 	Before code motion:

 	FOR index := 1 TO 10000 DO
 	BEGIN
 		x := y * z ;
 		j := index * 3 ;
 	END
 	. . .

 	After code motion:

 	t := y * z ;
 	FOR index := 1 TO 10000 DO
 	BEGIN
 		x := t ;
 		j := index * 3 ;
 	END
 	. . .

8.6.1 Defining a Loop

In order to have a method for finding loops suitable for optimization (not all loops are), we need to define what a loop is.

Intuitively, cycles in flow graphs are loops, but to perform optimizations we need a second property:

8.6.2 Headers

A node n is the header of a set of nodes S if every path from the start node to a node in S first goes through n.

A cycle must have a header to be considered a loop. Optimizations such as code motion cannot be done safely if there is more than one entry to the loop. Also, the header concept makes the algorithms for loop-invariant code motion, etc. conceptually simpler.

Example 10 shows a flow graph with two cycles, one of which is a loop and one of which is not a loop.


Finding Loops

Although the preceding definition is helpful for us humans to find a loop, we will need some more definitions in order to write an algorithm to find a loop.

An efficient algorithm for finding a loop involves finding dominators, depth-first search, and depth-first ordering.

8.6.3 Dominators

A node d dominates a node n, written d DOM n, if every path from the start node to n goes through d. This means that a node dominates itself. Example 11 shows dominators for the graph of Example 10.

Note that for a node d to dominate a node n, d must be n itself or must dominate every predecessor of n. We use this fact to create an algorithm to find the dominators in a control flow graph. We use the notation Dom(n) for the set of dominators of a node n.
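A Python sketch of such an algorithm, following the standard iterative approach: Dom(start) is just {start}, every other Dom(n) starts as the set of all nodes, and each Dom(n) is repeatedly replaced by {n} together with the intersection of Dom(p) over the predecessors p of n until nothing changes. The successor-map representation of the flow graph is an assumption made for illustration.

def dominators(successors, start):
    # 'successors' maps each node to a list of its successors.
    nodes = set(successors)
    preds = {n: set() for n in nodes}
    for n, succs in successors.items():
        for s in succs:
            preds[s].add(n)

    dom = {n: set(nodes) for n in nodes}    # initially, every node dominates n
    dom[start] = {start}

    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            if preds[n]:
                new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            else:
                new = {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom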

This algorithm for computing dominators does not indicate in what order to visit the nodes, just that all nodes are visited. In a later section, we will discuss an order for which this algorithm converges rapidly. Example 12 shows dominators being calculated for the graph of Example 11.

8.6.4 Natural Loops

We use dominators to find loops. Since a loop contains a cycle, and the header dominates all the nodes in the loop, there must be an arc in the loop whose head dominates its tail.

For this reason, we search for an arc whose head dominates its tail. This is called a back edge. Loops must have a back edge.

The natural loop of the back edge is defined to be the smallest set of nodes that includes the back edge and has no predecessors outside the set except for the predecessors of the header. Natural loops are the loops for which we find optimizations.
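Given a back edge from a node tail to its header, the natural loop can be collected by walking backwards through predecessors from the tail and stopping at the header. A Python sketch, under the same flow-graph representation assumed earlier:

def natural_loop(predecessors, tail, header):
    # Every node added can reach 'tail' without passing through 'header'.
    loop = {header, tail}
    worklist = [tail]
    while worklist:
        node = worklist.pop()
        for p in predecessors.get(node, []):
            if p not in loop:
                loop.add(p)
                worklist.append(p)
    return loop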

8.6.5 Depth-First Order and Depth-First Spanning Trees

Intuitively, we begin at the start node and attempt to get as far away from it as possible. As we proceed, we can make (yet another!) tree from the arcs of the flow graph. This is called a depth-first spanning tree (DFST).

A depth-first order is defined to be the reverse of the order in which we last visit the nodes of the control flow graph when we create the DFST. The following algorithm creates a depth-first spanning tree.
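A Python sketch of such an algorithm, assuming the successor-map representation used earlier: it builds the spanning tree from the edges followed to unvisited nodes and records the depth-first order as the reverse of the order in which nodes are last visited.

def depth_first_order(successors, start):
    visited = set()
    tree_edges = []          # edges of the depth-first spanning tree
    postorder = []           # nodes in the order they are last visited

    def dfs(node):
        visited.add(node)
        for s in successors.get(node, []):
            if s not in visited:
                tree_edges.append((node, s))
                dfs(s)
        postorder.append(node)

    dfs(start)
    return tree_edges, list(reversed(postorder))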

Exercise 4 asks the readers to compute dominators by visiting the nodes in a somewhat random order and then to compute dominators by visiting the nodes in a depth-first order. For some graphs, the algorithm terminates faster when the nodes are visited in a depth-first order than when they are visited in the order Start, Node A, Node B, Node C, Node D.

8.6.6 Finding Natural Loops Efficiently

We introduce one more definition: a backwards edge.

These steps find the natural loops of a program in an efficient way.