## Compiler Structure

• Lexical Analyzer
• Parsing
• Semantic Analysis
• Optimization
• Code Generation

## Lexical Analyzer

### Functionality

Lexical analysis is performed by a DFA or NFA derived from regular expressions; the automaton groups input characters into substrings (lexemes) and emits the corresponding tokens.

### Steps

Step 1 – Define the finite set of token types (integers, keywords, relational operators, identifiers, whitespace, brackets, etc.).

Step 2 – Implement a procedure that converts the input string into token/lexeme pairs, discarding meaningless tokens such as whitespace and comments.
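The two steps above can be sketched in Python using the standard `re` module. The token names and patterns below are illustrative assumptions, not a fixed specification:

```python
import re

# Step 1: a finite set of token types, each defined by a regular
# expression (names and patterns are assumptions for this sketch).
TOKEN_SPEC = [
    ("INTEGER",    r"\d+"),
    ("KEYWORD",    r"\b(?:if|else|while)\b"),
    ("RELATION",   r"<=|>=|==|<|>"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("BRACKET",    r"[(){}\[\]]"),
    ("WHITESPACE", r"\s+"),
]

# One master regex; each token type becomes a named group.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Step 2: convert a string into (token, lexeme) pairs,
    discarding meaningless tokens (here, whitespace)."""
    pairs = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup          # name of the token type that matched
        if kind != "WHITESPACE":
            pairs.append((kind, match.group()))
    return pairs
```

Because `KEYWORD` appears before `IDENTIFIER` in the alternation, `if` is tokenized as a keyword rather than an identifier.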

### Implementation of a lexical analyzer

Define regular expressions that capture the lexical structures, then implement the regular-expression matching with a DFA or NFA.

Procedure – partition the input string into lexemes, then match each lexeme against the automaton to obtain its token.

Problems that may emerge:

• Ambiguity 1: multiple prefixes of the input can be matched. Solution: the maximal munch policy (always take the longest match).
• Ambiguity 2: a lexeme can be interpreted as multiple types of tokens (e.g. `if` as a keyword or an identifier). Solution: a priority list over token types.
• Error handling: write a regular expression that matches all erroneous strings and give it the lowest priority.
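Both disambiguation rules can be demonstrated with a small matcher that, at a given position, tries every token type and keeps the longest match, breaking ties by priority order. The token names and patterns are assumptions for this sketch:

```python
import re

# Token types in priority order: on a tie in match length,
# the earlier entry wins (names/patterns are illustrative assumptions).
TOKEN_TYPES = [
    ("KEYWORD",    re.compile(r"if|else|while")),
    ("IDENTIFIER", re.compile(r"[A-Za-z_]\w*")),
    ("RELATION",   re.compile(r"<=|>=|==|<|>")),
    ("INTEGER",    re.compile(r"\d+")),
]

def next_token(source, pos):
    """Return the (token, lexeme) pair for the longest match at pos
    (maximal munch); ties are broken by the priority list above."""
    best = None
    for name, pattern in TOKEN_TYPES:
        m = pattern.match(source, pos)
        # Strict '>' keeps the earlier (higher-priority) type on ties.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best
```

For `"ifx"` maximal munch prefers the identifier `ifx` over the keyword prefix `if`; for `"if+1"` both matches have length 2, so the priority list selects `KEYWORD`.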

### Finite Automata

#### DFA vs NFA

DFA Features:

• Each (state, input symbol) pair leads to exactly one transition.
• No epsilon-moves

NFA Features:

• Multiple transitions on the same input symbol are allowed, as are epsilon-moves.

An NFA can be converted to a DFA at the cost of an exponential blow-up in size: if the NFA has $$n$$ states, the generated DFA can have up to $$2^n - 1$$ states (one per nonempty subset of NFA states).
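The conversion is the subset construction: each DFA state is a set of NFA states, and epsilon-closures are folded in. A minimal sketch, assuming the NFA is given as a dict from (state, symbol) to a set of successor states, with symbol `None` standing for an epsilon-move:

```python
def nfa_to_dfa(nfa, start, alphabet):
    """Subset construction. `nfa` maps (state, symbol) -> set of states;
    symbol None denotes an epsilon-move. Returns the DFA start state
    (a frozenset of NFA states) and the DFA transition dict."""
    def eps_closure(states):
        # All NFA states reachable via epsilon-moves alone.
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for t in nfa.get((s, None), set()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return frozenset(seen)

    start_set = eps_closure({start})
    dfa, seen, todo = {}, {start_set}, [start_set]
    while todo:
        current = todo.pop()
        for a in alphabet:
            target = eps_closure(
                {t for s in current for t in nfa.get((s, a), set())})
            dfa[(current, a)] = target   # frozenset() acts as the dead state
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return start_set, dfa
```

Each reachable subset of NFA states becomes one DFA state, which is exactly why the DFA can grow to $$2^n - 1$$ nonempty subsets in the worst case.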

A DFA can be represented as a jump table (a transition table indexed by state and input symbol), so its implementation is structurally trivial.
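As an illustration, here is a toy DFA (an assumption for this sketch) encoded as a jump table; it accepts strings over `{a, b}` that end in `"ab"`, with state 2 accepting:

```python
# Jump table: rows are states, columns are input symbols.
JUMP = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
ACCEPTING = {2}

def accepts(s):
    state = 0
    for ch in s:
        state = JUMP[state][ch]   # exactly one transition per (state, symbol)
    return state in ACCEPTING
```

Running the DFA is a single table lookup per input character, which is what makes the jump-table representation attractive in generated lexers.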