3.0 Introduction

3.1 Metalanguage

3.2 The Driver

3.3 The Generator

3.4 Error Handling

3.5 Generating vs. Writing

3.6 LEX, A Lexical Analyzer Generator

3.7 Summary

Web References

Exercises

3.3.1 Lexical Analyzer Generator Step 1:
Converting a Regular Expression to a Finite
Automaton

There are six steps (a) - (f):

The diagram above shows that the regular expression is recognized by a nondeterministic finite automaton that has no way to get to a final state.

The diagram above shows that the regular expression is recognized by the nondeterministic finite automaton which goes from its initial state to its final state by reading no input. Such a move is called an - transition

In table form this would be denoted:

The reader may be wondering about an input of and how it would be recognized by the driver. In fact, it will not be. This is just a first step in table creation, and we will allow the creation of a nondeterministic finite automation (NFA). Section 3.3.2 outlines how to turn this NFA into a deterministic finite automation (DFA)

The diagram above shows that the language {a} denoted by the regular expression a is recognized by the finite automatation that goes from its initial to final state by reading an a. In table from this would be:

In the diagram above, we presume that the languages denoted by the regular expressions R and R, respectively, are recognized by the finite automata denoted by M and M.

Adding -transitions to the beginning and end creates a finite automation which will recognize either the language represented by R or the language represented by R, i.e., the union of these two languages, LR U LR.

We could eliminate the new final state and the -transitions to it by letting the final states of M and M remain final states.

In table form, this is:

Here, the table entry for f is shown in a different row because the process of recognizing strings in either LR or LR may lead to other intermediate states.

Our example, which recognizes

Letters = A | B | C | . . . | a | b | . . . |z

and Literal = 0 | 1 | 2 | . . . | 9

will use this step.

In the diagram above, M is the machine that accepts LR and M is the language that accepts LR. Thus, after this combined automation has recognized a legal string in LR, it will start looking for legal strings LR. This is exactly the set of strings denoted by R R. The related table is:

Here the table leads from an initial state, i, to the final state of M upon recognition of strings in LR, then from the final state in M, fM to the initial state in M on an -transition, that is, without reading any input, and then from the initial state in M, iM to the final state in M, fM upon recognizing the strings in LR. The is often omitted. It is understood that two regular expressions appearing next to one another represents the concatenation. We will use this step for the example, when we define an identifier as a string of letters and digits (conatenated). Another example will recognize a literal as the concatenation of one or more digits, that is,

Identifier = Letter (Letter | Digit) *

means

Identifier = Letter (letter | Digit) *

Here, M is a finite automation recognizing some LR. A new initial state is added leading to both M and to a new final state. The - transition recognizes the empty string, (if R did not contain already). The transition to M causes the language LR to be recognized. The transition from all final states of R to its initial state will allow the recognition of LR , i = 1, 2,... The table is:

We will use this step as well as the preceding two, for the regular expressions

Identifier = Letter (Letter | Digit)*

and

Literal = Digit* = Digit Digit*

Using these diagrams as guidelines, we can "build" a nondeterministic finite automation from any regular expression (See Exercise 6

Send questions and comments to: Karen Lemone