From 810dcd183e589e1a6763f79f4a430c3e0d3c7130 Mon Sep 17 00:00:00 2001 From: Melody Horn Date: Thu, 1 Oct 2020 04:38:53 -0600 Subject: finish scanning --- syntax.md | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) diff --git a/syntax.md b/syntax.md index 91d2bac..0ad1402 100644 --- a/syntax.md +++ b/syntax.md @@ -17,6 +17,8 @@ A *token* is one of the following kinds of token: - a *string literal*, - or a *punctuator*. +Tokens are separated by either *whitespace* or a *comment*. + ## Keywords A *keyword* is one of the following 28 literal words: @@ -68,3 +70,90 @@ Subsequent characters may have any of the above-listed Unicode categories, or on - Decimal Digit Number (e.g. `0`) - Letter Number (e.g. `Ⅳ`, U+2163 Roman Numeral Four) - Other Number (e.g. `¼`, U+00BC Vulgar Fraction One Quarter) + +## Constants + +A *constant* can have one of five types: +- a *decimal constant*, a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `_`}; +- a *binary constant*, a prefix (either `0b` or `0B`) followed by a sequence of characters drawn from the set {`0`, `1`, `_`}; +- a *hexadecimal constant*, a prefix (either `0x` or `0X`) followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`, `_`}; +- a *floating-point constant*, a decimal constant followed by one of + - `.` followed by a decimal constant, + - either `e` or `E` followed by a decimal constant, + - or a `.` followed by a decimal constant followed by either an `e` or `E` followed by a decimal constant; +- or a *character constant*, a `'` followed by either a single character or an *escape sequence* followed by another `'`. + +### Escape Sequences + +The following sequences of characters are *escape sequences*: +- `\'` +- `\"` +- `\\` +- `\r` +- `\n` +- `\t` +- `\x` followed by two characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`} +- `\u` followed by four characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`} +- `\U` followed by eight characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`} + +## String Literals + +A *string literal* begins with a `"`. +It then contains a sequence where each element is either an escape sequence or a character that is neither `"` nor `\`. +It then ends with a `"`. + +## Punctuators + +The following sequences of characters form *punctuators*: +- `[` +- `]` +- `(` +- `)` +- `{` +- `}` +- `.` +- `+` +- `-` +- `*` +- `/` +- `%` +- `;` +- `!` +- `&` +- `|` +- `^` +- `~` +- `>` +- `<` +- `=` +- `->` +- `++` +- `--` +- `>>` +- `<<` +- `<=` +- `>=` +- `==` +- `!=` +- `&&` +- `||` +- `+=` +- `-=` +- `*=` +- `/=` +- `%=` +- `&=` +- `|=` +- `^=` + +## Whitespace + +A nonempty sequence of characters is considered to be *whitespace* if each character in it has a Unicode class of either Space Separator or Control Other. + +## Comment + +A *comment* can be either a *line comment* or a *block comment*. + +A *line comment* begins with the characters `//` if they occur outside of a string literal and ends with a newline character U+000A. + +A *block comment* begins with the characters `/*` if they occur outside of a string literal and ends with the characters `*/`. -- cgit v1.2.3