4.0 Introduction

A parser generator, also called a syntax analyzer generator, follows the form shown in Chapter 1:

Although syntax analysis is more powerful than lexical analysis (a parser can recognize tokens), it is not necessarily more complicated to generate one. In fact, generation of a top-down parser is relatively simple.

 

Generation

The metalanguage for a parser generator is a context-free grammar. This context-free grammar is usually expressed in Backus-Naur form, BNF, or extended Backus-Naur form, EBNF, themselves metalanguages for context-free grammars. EBNF differs from BNF, allowing shortcuts resulting in fewer but more complicated productions. In this text, we often say the metalanguage for a parser generator is BNF.

The BNF in Example 1 is more complicated than that shown in Chapter 1 because top-down parser generation places more restrictions on the grammar.

Example 1 shows the parser generator converting this BNF into tables (or a Case statement). The form of the tables depends upon whether the generated parser is a top-down parser or bottom-up parser. Top-down parsers are easy to generate; bottom-up parsers are more difficult to generate.

4.1.1 Metalanguage Restriction

The grammar from Modules 1 and 2 actually has two problems which a top-down parser generator must eliminate. We discuss these as preprocessor steps to the parser generator.

Left-Recursion

The grammar for assignment statements from Modules 1 and 2 is left-recursive; that is, in the process of expanding the first nonterminal, that nonterminal is generated again:

The most reasonable top-down parsing method is one where we look only at the next token in the input. Looking at more than the next token has proved too time-consuming in practice.

Parsing, here initiated by the driver program, proceeds by expanding productions until a terminal (token) is finally generated. Then that terminal is read from the input and the procedure continues. If the first nonterminal is the same as that being expanded, as in the production for an expression above, it will be continually and indefinitely expanded without ever generating a terminal that can be matched in the grammar.

Generation of top-down parsers may require a preprocessing step that eliminates left-recursion.

Common Prefix

A second problem is caused by two productions for the same nonterminal whose right-hand side begins with the same symbol:

S --> a

S --> a

These two productions contain a common prefix, and once again the parser, as initiated by the driver looking at only the next symbol in the input, does not know which of these productions to choose. Fortunately, grammars can be changed to eliminate these problems. We will show algorithms for changing the grammars to eliminate both left-recursion and common prefixes.

4.1.2 Elimination of Left-Recursion

There is a formal technique for eliminating left-recursion from productions

Step One: Direct-Recursion

For each rule which contains a left-recursive option,

introduce a new nonterminal A' and rewrite the rule as

Thus the production:

is left-recursive with "E" playing the role of "A","+ T" playing the role of , and "T" playing the role of A'. Introducing the new nonterminal E', the production can be replaced by:

Of course, there may be more than one left-recursive part on the right-hand side. The general rule is to replace: