### 6.1 Lexical Structure of Programming Languages

- Token categories
- reserved words (keywords)
- literals or constants
- special symbols
- identifiers

- principle of longest substring takes the longest possible substring of nonblank characters as a single token.
- Regular expression
- + mean one or more
- ? mean one or none
- * mean none or more

### 6.2 Context-Free Grammars and BNF’s

- Elements of CFG
- start symbol is nonterminal that stands for the entire, top-level phrase being defined.
- nonterminals are broken down into further phrase structures
- terminals cannot be broken down into further phrase structures
- productions are rules use for the derivation.

- CFG defines a language.

### 6.3 Parse Trees and Abstract Syntax Trees

- A Parse Tree is labeled by nonterminal at interior nodes and terminals at leaves. The structure of a parse tree is completely specified by the grammar rules of the language and a derivation of a particular sequence of terminals.
- Abstract syntax tree abstract the essential structure of the parse tree.

Example: arithmetic expression grammar expr → expr + expr | expr * expr | (expr) | number number → number digit | digit digit → 0 | 1 | 2 | 3 | 4| 5 | 6 | 7 | 8 | 9 Example: parse tree for 234 number / \ number digit / \ | number digit 4 | | digit 3 | 2 Example: abstract syntax tree for 234 4 / 3 / 2

### 6.4 Ambiguity, Associativity, and Precedence

- A grammar which two distinct parse trees are possible for the same string is ambiguous.
- left-recursive rule for an operation causes it to left-associate, right-recursive causes right-associate.

3 + 4 + 5 left-recursive/associate expr → expr + term expr / | \ expr + term / | \ | expr + term 5 | | 3 4 right-recursive/associate expr → term + expr expr / | \ term + expr | / | \ 3 term + expr | | 4 5

### 6.5 EBNFs and Syntax Diagrams

- EBNF
- {} stands for “zero or more repetition of”
- [] indicate optional part of the structure

### 6.6 Parsing Techniques and Tools

- Bottom-up parser match an input with the right-hand sides of the grammar rules, when match occurs, the right-hand side is replaced by, or reduced to, the nonterminal on the left.
- Top-down parsers expand the nonterminal to match incoming tokens and directly construct a derivation.
- Recursive-decent parser turns the nonterminals into a group of mutually recursive procedures whose actions are based on the right-hand sides of the BNF’s.
- predictive parser requirement
- able to distinguish between choices in a grammar rule in term of First sets or the First sets of any two choices must not have any token in common (where First() to be the set of tokens that can begin the string ).
- for any optional part, no token beginning the optional part an also come after the optional part.