The Lex Metalanguage

The LEX metalanguage input consists of three parts: (1) definitions, which give names to sequences of characters; (2) translation rules, which define tokens and can associate with each token a number representing its class; and (3) user-written procedures in C, which are invoked by the actions attached to the regular expressions. The sections are separated from one another by a line containing a double percent sign, "%%". The input has the following form:

      Definitions               (Gives names to sequences of characters)
      %%
      Rules                     (Defines tokens. Class may be returned as a number)
      %%
      User-Written Procedures   (In C)

The Definitions section consists of a sequence of names, each followed by the expression to be given that name.

The Rules section consists of a sequence of regular expressions, each followed by an optional action. The action typically returns the number representing the token class for that regular expression and may do other work if desired, such as installing the characters of the token in a name table.

The User-Written Procedures are the code for the C functions invoked as actions in the rules. Thus, the form for each section is:


      Name1            Expression1

      Name2            Expression2

      ...

      %%

      RegExp1            {Action1}

      RegExp2            {Action2}

      ...

      %%
      C function1

      C function2

      ...

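A section may be empty, but the "%%" that precedes the rules is still needed. Most LEX implementations allow the second "%%" to be omitted when no user-written procedures follow.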

Like UNIX itself, the metalanguage is case-sensitive; that is, "A" and "a" are read as different characters.
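As an illustration of the kind of user-written procedure an action might call, here is a minimal sketch in C of a routine that installs the characters of a token in a name table. The function name install, the fixed-size table, and the class numbers ID and LIT are assumptions made for this sketch; they are not part of LEX itself. The comparison is case-sensitive, so "A" and "a" receive separate entries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token class numbers; real values are whatever the parser expects. */
enum { ID = 1, LIT = 2 };

#define TABLE_MAX 256            /* assumed fixed capacity, for illustration only */

static char *name_table[TABLE_MAX];
static int   name_count = 0;

/*
 * install: look up the token text in the name table, adding it if absent.
 * Returns the index of the entry, or -1 if the table is full or memory
 * cannot be allocated. The lookup is a simple linear, case-sensitive search.
 */
int install(const char *text, size_t len)
{
    assert(text != NULL);

    for (int i = 0; i < name_count; i++) {
        if (strlen(name_table[i]) == len &&
            strncmp(name_table[i], text, len) == 0)
            return i;            /* already in the table */
    }

    if (name_count == TABLE_MAX)
        return -1;               /* table full */

    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;               /* out of memory */

    memcpy(copy, text, len);     /* copy exactly len characters */
    copy[len] = '\0';
    name_table[name_count] = copy;
    return name_count++;
}
```

In a rule, an action could then take a form such as { install(yytext, yyleng); return ID; }, where yytext and yyleng are the variables in which LEX makes the matched characters and their length available to the action. A linear search keeps the sketch short; a hash table would be the usual choice for a larger name table.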

Example 4 shows a LEX description for our language consisting of assignment statements. There are other ways of expressing the tokens described here.

Send questions and comments to: Karen Lemone