The Lex Metalanguage

The LEX metalanguage input consists of three parts: (1) definitions, which give names to sequences of characters; (2) translation rules, which define tokens and may associate with each token a number representing its class; and (3) user-written procedures in C, which are associated with the regular expressions in the rules. The sections are separated from one another by a double percent sign, "%%". The input has the following form:

    Definitions (Gives names to sequences of characters)
    %%
    Rules (Defines tokens. Class may be returned as a #)
    %%
    User-Written Procedures (In C)

The Definition section consists of a sequence of Names, each followed by an Expression to be given that name.

The Rules consist of a sequence of regular expressions, each followed by an (optional) action which returns a number for that regular expression and performs other actions if desired, such as installing the characters in the token into a name table.

The User-Written Procedures are the code for the C functions invoked as actions in the rules. Thus, the form for each section is:

    Name1            Expression1
    Name2            Expression2
    ...
    %%
    RegExp1            {Action1}
    RegExp2            {Action2}
    ...
    %%
    C function1
    C function2
    ...

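A section may be empty, but the first "%%" is still needed to mark the start of the Rules; the second "%%" may be omitted when there are no user-written procedures.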
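As an illustration of what might go in the User-Written Procedures section, here is a minimal sketch of a name-table routine of the kind an action could call. The names install, name_table, MAX_NAMES and MAX_LEN are hypothetical and chosen only for this sketch; they are not part of LEX. The sketch assumes a small fixed-size table and a linear search, which is adequate for illustration but not for a production scanner.

```c
#include <string.h>

#define MAX_NAMES 64   /* hypothetical capacity of the table */
#define MAX_LEN   32   /* hypothetical longest name, including '\0' */

static char name_table[MAX_NAMES][MAX_LEN];
static int  name_count = 0;

/* Return the index of text in the name table, adding it first if it
   is not already there. Returns -1 if the table is full or the name
   is too long to store. */
int install(const char *text)
{
    int i;

    for (i = 0; i < name_count; i++)
        if (strcmp(name_table[i], text) == 0)   /* case-sensitive match */
            return i;

    if (name_count == MAX_NAMES || strlen(text) >= MAX_LEN)
        return -1;

    strcpy(name_table[name_count], text);
    return name_count++;
}
```

In a LEX description, the characters matched by a regular expression are available in yytext, so an action might call install(yytext) before returning the class number for the token. Because the comparison uses strcmp, "rate" and "Rate" would occupy separate entries.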

Like UNIX itself, the metalanguage is case-sensitive; that is, "A" and "a" are read as different characters.

Example 4 shows a LEX description for our language consisting of assignment statements. There are other ways of expressing the tokens described here.

Send questions and comments to: Karen Lemone