- Lexical Analysis
- Semantic Analysis
- Code Generation
Lexical analysis is performed by a DFA/NFA derived from regular expressions, which consumes substrings of the input to produce tokens.
Step 1 – Define the finite set of token types (integers, keywords, relational operators, identifiers, whitespace, brackets, etc.).
Step 2 – Implement a procedure that converts the input string into token/lexeme pairs, discarding meaningless tokens such as whitespace.
Implementation of a lexical analyzer
Define regular expressions that capture the lexical structures, then implement their matching with a DFA or NFA.
Procedure – partition the input into tokens: feed the string to the automaton and read off the matching result.
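The procedure above can be sketched in Python with the `re` module standing in for the compiled automaton. The token names and patterns here are illustrative assumptions, not a specification from the source:

```python
import re

# Illustrative token specification: (token type, regular expression).
TOKEN_SPEC = [
    ("INT",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("REL",    r"<=|>=|==|<|>"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("WS",     r"\s+"),   # whitespace: matched, then discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Partition `text` into (token type, lexeme) pairs, discarding whitespace."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, `tokenize("x >= 42")` yields `[("ID", "x"), ("REL", ">="), ("INT", "42")]` — the input is partitioned into lexemes and each is labeled with its token type.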
Problems That May Emerge:
- Ambiguity 1: multiple prefixes of the input can be matched. Solution: maximal munch policy (take the longest match).
- Ambiguity 2: the same lexeme can be interpreted as multiple token types. Solution: a priority list over token types.
- Error Handling: write a regular expression that matches erroneous strings, so that errors surface as error tokens.
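Both disambiguation rules above can be sketched together: try every token pattern at the current position, keep the longest match (maximal munch), and break length ties by list order (priority). The patterns below are illustrative assumptions:

```python
import re

# Priority order: earlier entries win when two matches have equal length,
# so the keyword "if" beats the identifier pattern that also matches it.
TOKEN_TYPES = [
    ("KEYWORD", re.compile(r"if|else|while")),
    ("ID",      re.compile(r"[A-Za-z_]\w*")),
    ("REL",     re.compile(r"<=|>=|==|<|>")),
]

def next_token(text, pos):
    """Maximal munch: longest match wins; priority order breaks ties."""
    best = None  # (match length, token type, lexeme)
    for name, pat in TOKEN_TYPES:
        m = pat.match(text, pos)
        if m and (best is None or len(m.group()) > best[0]):
            best = (len(m.group()), name, m.group())
    if best is None:
        raise SyntaxError(f"no token matches at position {pos}")
    return best[1], best[2]
```

On `"ifx"` the identifier match is longer than the keyword match, so maximal munch returns `("ID", "ifx")`; on `"if "` both match two characters, so the priority list makes `KEYWORD` win.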
DFA vs NFA
- DFA: each input symbol leads to exactly one transition; no epsilon-moves.
- NFA: multiple transitions on the same symbol are allowed, as are epsilon-moves.
An NFA can be converted to a DFA at the cost of an exponentially increasing size in space: for an NFA with \(n\) states, up to \(2^n - 1\) states (the nonempty subsets of the NFA's states) are possible in the generated DFA.
A DFA can be represented as a jump table (a transition table indexed by state and input symbol), so its structure is trivial to implement.
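The conversion is the subset construction: each DFA state is a set of NFA states. A minimal sketch, assuming a tiny hand-written NFA for (a|b)*ab with no epsilon-moves (so epsilon-closure is omitted):

```python
from itertools import chain

# Illustrative NFA for (a|b)*ab: transitions[state][symbol] -> set of states.
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {},
}
NFA_START = 0

def subset_construction(nfa, start, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = frozenset({start})
    dfa = {}
    worklist = [start_set]
    while worklist:
        S = worklist.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:
            # The DFA successor is the union of all NFA successors.
            T = frozenset(chain.from_iterable(nfa[q].get(a, ()) for q in S))
            dfa[S][a] = T
            if T not in dfa:
                worklist.append(T)
    return dfa

dfa = subset_construction(NFA, NFA_START, "ab")
```

Here the blowup does not occur — only the reachable subsets {0}, {0,1}, and {0,2} are materialized — but in the worst case the worklist can visit every nonempty subset.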
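As a sketch of the jump-table representation (the states and character classes here are an illustrative assumption), a DFA recognizing identifiers of the form [a-z][a-z0-9]*:

```python
# Character classes index the columns of the jump table.
LETTER, DIGIT, OTHER = 0, 1, 2

def char_class(c):
    if c.isalpha() and c.islower():
        return LETTER
    if c.isdigit():
        return DIGIT
    return OTHER

# JUMP[state][char_class] -> next state (None = reject).
JUMP = [
    [1, None, None],  # state 0: start, must see a letter first
    [1, 1,    None],  # state 1: accepting, letters/digits may follow
]
ACCEPTING = {1}

def accepts(s):
    """Run the DFA: pure table lookups, one per input character."""
    state = 0
    for c in s:
        state = JUMP[state][char_class(c)]
        if state is None:
            return False
    return state in ACCEPTING
```

Running the DFA is just one table lookup per character, which is why this representation is so simple and fast: `accepts("x1")` is true, while `accepts("1x")` is false.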