OCR Computing A-Level Revision

How Compilers Work

Compilers are complex, but they broadly work through three stages: lexical analysis, syntax analysis, and code generation.

Please note that although I am using Python-style code for simplicity in my examples, Python is usually interpreted rather than compiled to machine code.

Lexical Analysis (3.2.f)

Lexical analysis takes the source program, strips out comments and unnecessary whitespace, and replaces the reserved words and symbols used in the program with tokens (fixed-length strings of binary digits). Variable names are stored in a lookup table for later use, and error messages are output if necessary (for example, when an illegal character is found).

IF FOO > BAR THEN:  
  FOOBAR = TRUE  
ELSE:  
  FOOBAR = FALSE

The above code might be converted to the following sequence of tokens (expressed in denary for easy reading, but in reality they would be expressed in binary):

10 50 20 50 11 90 99 99 50 22 41 12 90 99 99 50 22 42

Note that "50" simply means "there's a variable here, look it up in the variable lookup table".
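To make the idea concrete, here is a minimal sketch of a lexical analyser for the example language. The token codes, the names `TOKEN_CODES`, `VARIABLE`, `INDENT` and `tokenise`, and the choice to keep leading spaces as tokens are all my own assumptions for illustration, not part of any real compiler.

```python
import re

# Hypothetical token codes, chosen to match the denary example in these notes.
TOKEN_CODES = {
    "IF": 10, "THEN": 11, "ELSE": 12,
    ">": 20, "=": 22,
    "TRUE": 41, "FALSE": 42,
    ":": 90,
}
VARIABLE = 50  # "there's a variable here, look it up in the lookup table"
INDENT = 99    # one leading space (assumed to be kept, as in the example stream)


def tokenise(source):
    """Return (tokens, variables) for a small Python-style program."""
    tokens = []
    variables = []  # variable names, in the order they appear
    for line in source.splitlines():
        # Strip comments and trailing whitespace.
        line = line.split("#", 1)[0].rstrip()
        if not line:
            continue
        # Leading spaces become one INDENT token each.
        indent = len(line) - len(line.lstrip(" "))
        tokens.extend([INDENT] * indent)
        # A lexeme is either a word or a single non-space character.
        for lexeme in re.findall(r"[A-Za-z_]\w*|\S", line):
            if lexeme in TOKEN_CODES:
                tokens.append(TOKEN_CODES[lexeme])
            elif lexeme[0].isalpha() or lexeme[0] == "_":
                variables.append(lexeme)
                tokens.append(VARIABLE)
            else:
                # Error messages are output if necessary.
                raise SyntaxError(f"unexpected character {lexeme!r}")
    return tokens, variables


SOURCE = "IF FOO > BAR THEN:  # compare\n  FOOBAR = TRUE\nELSE:\n  FOOBAR = FALSE"
tokens, variables = tokenise(SOURCE)
print(" ".join(str(token) for token in tokens))
print(variables)
```

Each colon becomes its own token (90), including the one after ELSE, and the nth "50" in the stream corresponds to the nth name in `variables`.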

Syntax Analysis (3.2.g)

You do not need to have detailed knowledge of the syntax of programming languages.

Syntax analysis is where the token stream is verified against the syntax rules of the programming language. Consider the token stream above. It reads:

IF VARIABLE GREATER-THAN VARIABLE THEN : (space) (space) VARIABLE SET-TO TRUE ELSE : (space) (space) VARIABLE SET-TO FALSE

This is valid syntax for the Python-style language used in this example. However, in a different language (such as C or PHP), the syntax rules are different, so the same token stream could be rejected with a syntax error.
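As a rough illustration, a syntax analyser for this one statement could walk along the token stream and check that each token is one the rules allow at that point. This is only a sketch: the token codes and the function name `find_syntax_error` are assumptions of mine, and a real compiler would use a full grammar rather than a single hard-coded rule.

```python
# Hypothetical token codes (assumed, matching the denary example in these notes).
IF, THEN, ELSE = 10, 11, 12
GREATER_THAN, SET_TO = 20, 22
TRUE, FALSE = 41, 42
VARIABLE, COLON, SPACE = 50, 90, 99


def find_syntax_error(tokens):
    """Check tokens against the rule
         IF var > var THEN : assignment ELSE : assignment
    where assignment is: space space var SET-TO (TRUE or FALSE).
    Return None if the stream is valid, otherwise an error message.
    """
    pos = 0

    def expect(*allowed):
        nonlocal pos
        found = tokens[pos] if pos < len(tokens) else None
        if found not in allowed:
            raise SyntaxError(f"position {pos}: expected {allowed}, found {found}")
        pos += 1

    def assignment():
        expect(SPACE)
        expect(SPACE)
        expect(VARIABLE)
        expect(SET_TO)
        expect(TRUE, FALSE)

    try:
        expect(IF)
        expect(VARIABLE)
        expect(GREATER_THAN)
        expect(VARIABLE)
        expect(THEN)
        expect(COLON)
        assignment()
        expect(ELSE)
        expect(COLON)
        assignment()
        if pos != len(tokens):
            raise SyntaxError(f"position {pos}: unexpected extra token {tokens[pos]}")
    except SyntaxError as error:
        return str(error)
    return None


valid = [10, 50, 20, 50, 11, 90, 99, 99, 50, 22, 41, 12, 90, 99, 99, 50, 22, 42]
no_colon_after_else = [10, 50, 20, 50, 11, 90, 99, 99, 50, 22, 41, 12, 99, 99, 50, 22, 42]
print(find_syntax_error(valid))
print(find_syntax_error(no_colon_after_else))
```

The second stream leaves out the colon after ELSE, so the check fails at position 12, where a colon (90) was expected but a space (99) was found.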

Code Generation (3.2.h)

Code generation is where the checked program is actually converted into machine code (which can be executed). Each high-level statement typically produces several low-level instructions. During code generation, the code can also be optimised, for example for smallest code size or for fastest performance.
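The sketch below shows how one high-level IF statement might expand into several low-level instructions. The instruction set is an assumption on my part: assembly-style mnemonics loosely modelled on the Little Man Computer (LDA, SUB, BRP, BRA, STA, HLT), shown as text rather than real binary machine code. The function name `generate_if_greater` and the labels `one`, `zero`, `else_branch` and `end_if` are also hypothetical.

```python
def generate_if_greater(left, right, target):
    """Emit assembly-style instructions for:
         IF left > right THEN: target = TRUE ELSE: target = FALSE
    Assumes memory locations labelled 'one' and 'zero' hold 1 and 0.
    """
    return [
        f"LDA {right}",          # load the right-hand value into the accumulator
        f"SUB {left}",           # accumulator = right - left
        "BRP else_branch",       # zero or positive means left <= right: go to ELSE
        "LDA one",               # THEN branch: load TRUE (1)
        f"STA {target}",         # store it in the target variable
        "BRA end_if",            # skip over the ELSE branch
        "else_branch LDA zero",  # ELSE branch: load FALSE (0)
        f"STA {target}",         # store it in the target variable
        "end_if HLT",            # end of this small program
    ]


for instruction in generate_if_greater("FOO", "BAR", "FOOBAR"):
    print(instruction)
```

Here a single four-line IF statement becomes nine instructions. An optimiser could then look for ways to shorten or speed up a sequence like this.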