Although the picture above implies that an entire compiler can be created by a compiler generator, in fact, compiler generators cannot yet generate entire compilers automatically.
Like compilers themselves, compiler generators are frequently separated into phases. For the front end of a compiler, the phases are often termed lexical analyzer generator, syntax analyzer generator (or parser generator) and semantic analyzer generator.
As the names imply, a lexical analyzer (or scanner) generator generates a lexical analyzer (or scanner), and a syntax analyzer (or parser) generator generates a syntax analyzer (or parser).
Generator phases for the back end of compilers are still very much a research topic although work has been done on code generator generators. Figure 1 shows how a compiler generator can be parametrized. The rectangular boxes show the standard compiler phases while the ovals show the corresponding generators. The inputs to the ovals are termed metalanguages, which are languages that describe other languages. The front-end parts are shown grouped because often there is a single input file containing all the information needed to generate the front end of a compiler. In Figure 1, IR stands for Intermediate Representation.
Scanner generators are introduced in Section 1.3 and parser generators in Section 1.4. Attribute grammars and evaluator generators are introduced in Section 1.5. For the back end, optimization and code generation will be discussed more from the point of view of automation and retargeting; they are introduced in Section 1.6 and Section 1.7. This chapter is an overview of compiler tools. Algorithms and data structures are discussed within each chapter.
The first translator-like compilers were written in the late 1950's. Credit is given to FORTRAN as the first successfully compiled language. It took 18 person-years to develop the FORTRAN compiler: the language was being designed at the same time that it was being implemented, and, since this was a first in every respect, the translation process was not well understood.
We can now design and implement compilers faster and better because (1) languages are better understood, (2) tools now exist for some of the phases of a compiler, and (3) data structures and algorithms are well-known for various compiler tasks.
By the late 1950's, computers and computer languages were proliferating. To create a compiler for N languages on M machines would require N * M programs if each were to be created separately. However, if the front end of a compiler were to translate the source to a common intermediate language, and the back end of a compiler were to translate this intermediate language to the language of the machine, then (ideally) only N + M programs would be needed.
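To make the counting argument concrete, the sketch below (in C, with hypothetical function names not taken from the text) pairs three front ends with two back ends through a shared IR, so that N + M = 5 translator halves stand in for N * M = 6 complete compilers; with N = 5 and M = 4 the savings would grow to 9 versus 20.

/* A minimal sketch of the N + M idea (all names hypothetical): each
   front end maps one source language to a shared intermediate
   representation (IR), each back end maps that IR to one machine, and
   any front end can be paired with any back end. */
#include <stdio.h>

typedef const char *IR;                       /* stand-in for a real IR  */
typedef IR   (*FrontEnd)(const char *source); /* one per source language */
typedef void (*BackEnd)(IR ir);               /* one per target machine  */

static IR fortran_front(const char *s) { (void)s; return "IR from FORTRAN"; }
static IR cobol_front(const char *s)   { (void)s; return "IR from COBOL";   }
static IR algol_front(const char *s)   { (void)s; return "IR from ALGOL";   }

static void ibm704_back(IR ir) { printf("IBM 704 code <- %s\n", ir); }
static void univac_back(IR ir) { printf("UNIVAC code  <- %s\n", ir); }

int main(void)
{
    FrontEnd fronts[] = { fortran_front, cobol_front, algol_front };  /* N = 3 */
    BackEnd  backs[]  = { ibm704_back, univac_back };                 /* M = 2 */

    /* N + M = 5 translator halves combine into N * M = 6 complete compilers. */
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 2; j++)
            backs[j](fronts[i]("source program"));

    return 0;
}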
This was the philosophy behind UNCOL (Sammet, 1969), one of the first attempts at automating the process of compiler creation. UNCOL stood for Universal Computer-Oriented Language and was actually an intermediate language, not a compiler generator.
It proved difficult to design an intermediate language which was able to "capture" the variable aspects of different programming languages and different machines. This difficulty is still an issue in today's compiler tools. Front-end compiler generators began to appear in the early 1960's, and new generators continue to be created today.
Perhaps the best-known compiler tool is YACC, Yet Another Compiler-Compiler, written by Steve Johnson (1975). YACC was first created for the UNIX operating system and is associated with another tool called LEX (Lesk and Schmidt, 1975), which generates a scanner, although it is not necessary to use a scanner generated by LEX in order to use YACC; users can write their own scanners. LEX and YACC have been ported to other machines and operating systems.
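As a concrete illustration of supplying one's own scanner to a YACC-generated parser, the following is a minimal hand-written yylex() in C. It is a sketch only: it assumes a grammar file that declares a NUMBER token and uses the default integer semantic value, so that the header y.tab.h (traditionally produced by yacc -d) supplies the NUMBER token code; the rest of the details are illustrative rather than taken from the text.

/* A hand-written scanner for a YACC-generated parser (sketch only).
   Assumes the grammar declares a NUMBER token via %token NUMBER and
   uses the default integer semantic value. */
#include <ctype.h>
#include <stdio.h>
#include "y.tab.h"          /* token codes produced by yacc -d */

extern int yylval;          /* semantic value shared with the parser */

int yylex(void)
{
    int c = getchar();

    while (c == ' ' || c == '\t')       /* skip blanks and tabs */
        c = getchar();

    if (c == EOF)                       /* 0 tells yyparse() "end of input" */
        return 0;

    if (isdigit(c)) {                   /* collect an integer literal */
        int value = 0;
        do {
            value = 10 * value + (c - '0');
            c = getchar();
        } while (isdigit(c));
        ungetc(c, stdin);
        yylval = value;
        return NUMBER;                  /* token code from y.tab.h */
    }

    return c;                           /* operators, parentheses, newline, ... */
}

The parser produced by YACC simply calls yylex() whenever it needs the next token, so this hand-written routine is a drop-in replacement for one generated by LEX.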
Compiler tools whose metalanguages include attribute grammars began to appear in the late 1970's and 1980's. Tools such as GAG (Kastens et al., 1982) are based upon attribute grammar descriptions.
The PQCC (Wulf et al., 1975) project and others have undertaken the difficult task of generating the back end of a compiler.