author Melody Horn <melody@boringcactus.com> 2020-10-01 04:38:53 -0600
committer Melody Horn <melody@boringcactus.com> 2020-10-01 04:38:53 -0600
commit 810dcd183e589e1a6763f79f4a430c3e0d3c7130
tree e98a2b0fab0da08ee381cee47f2e43540796b6f9
parent 6f187130dcfb283fc0b0622cf1af62827d8443de
finish scanning
-rw-r--r-- syntax.md 89
1 files changed, 89 insertions, 0 deletions
diff --git a/syntax.md b/syntax.md
index 91d2bac..0ad1402 100644
--- a/syntax.md
+++ b/syntax.md
@@ -17,6 +17,8 @@ A *token* is one of the following kinds of token:
- a *string literal*,
- or a *punctuator*.
+Tokens are separated by either *whitespace* or a *comment*.
+
## Keywords
A *keyword* is one of the following 28 literal words:
@@ -68,3 +70,90 @@ Subsequent characters may have any of the above-listed Unicode categories, or on
- Decimal Digit Number (e.g. `0`)
- Letter Number (e.g. `Ⅳ`, U+2163 Roman Numeral Four)
- Other Number (e.g. `¼`, U+00BC Vulgar Fraction One Quarter)
+
+## Constants
+
+A *constant* is one of the following five kinds:
+- a *decimal constant*, a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `_`};
+- a *binary constant*, a prefix (either `0b` or `0B`) followed by a sequence of characters drawn from the set {`0`, `1`, `_`};
+- a *hexadecimal constant*, a prefix (either `0x` or `0X`) followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`, `_`};
+- a *floating-point constant*, a decimal constant followed by one of
+ - `.` followed by a decimal constant,
+ - either `e` or `E` followed by a decimal constant,
+ - or a `.` followed by a decimal constant followed by either an `e` or `E` followed by a decimal constant;
+- or a *character constant*, an opening `'`, then either a single character or an *escape sequence*, then a closing `'`.
+
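The five kinds of constant listed above can be sketched as regular expressions. This is only an illustration, not part of the commit: the names `classify_constant` and `CONSTANT_KINDS` are hypothetical, every digit sequence is assumed to be nonempty, and the "single character" of a character constant is assumed to exclude `'` and `\`, none of which the text states explicitly.

```python
import re

# Assumption: a "sequence of characters" means one or more characters.
DEC = r"[0-9_]+"

# The escape sequences listed under "Escape Sequences".
ESCAPE = r"\\(?:['\"\\rnt]|x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})"

# Order matters only for readability here, since fullmatch() is used below.
CONSTANT_KINDS = [
    ("binary", re.compile(r"0[bB][01_]+")),
    ("hexadecimal", re.compile(r"0[xX][0-9A-Fa-f_]+")),
    # decimal, then ".dec", "e dec", or ".dec e dec"
    ("floating-point", re.compile(rf"{DEC}(?:\.{DEC}(?:[eE]{DEC})?|[eE]{DEC})")),
    ("decimal", re.compile(DEC)),
    # Assumption: the single character may not be ' or \ itself.
    ("character", re.compile(rf"'(?:{ESCAPE}|[^\\'])'")),
]


def classify_constant(text):
    """Return the kind of constant that `text` is, or None if it is not one."""
    for kind, pattern in CONSTANT_KINDS:
        if pattern.fullmatch(text):
            return kind
    return None
```

Under these assumptions, `classify_constant("0xFF_FF")` gives `"hexadecimal"` and `classify_constant("1.")` gives `None`, because a `.` must be followed by a decimal constant. As written, the decimal rule also accepts a lone `_`; whether that should scan as an identifier instead depends on rule ordering the text does not specify.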
+### Escape Sequences
+
+The following sequences of characters are *escape sequences*:
+- `\'`
+- `\"`
+- `\\`
+- `\r`
+- `\n`
+- `\t`
+- `\x` followed by two characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
+- `\u` followed by four characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
+- `\U` followed by eight characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
+
+## String Literals
+
+A *string literal* begins with a `"`.
+It then contains a sequence where each element is either an escape sequence or a character that is neither `"` nor `\`.
+It then ends with a `"`.
+
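The string literal rule, together with the escape sequences it refers to, can be sketched as a single regular expression. This is an illustrative sketch rather than the spec's reference scanner; `scan_string_literal` is a hypothetical helper name. It only recognizes literals and does not decode escapes, since the text above does not define their values.

```python
import re

# The escape sequences listed under "Escape Sequences".
ESCAPE = r"\\(?:['\"\\rnt]|x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})"

# Opening quote, then any number of escapes or characters other than " and \,
# then a closing quote.
STRING_LITERAL = re.compile('"(?:' + ESCAPE + r'|[^"\\])*"')


def scan_string_literal(source, start=0):
    """Return the index just past a string literal beginning at `start`,
    or None if no well-formed literal begins there."""
    match = STRING_LITERAL.match(source, start)
    return match.end() if match else None
```

Read literally, the rule admits a raw newline inside a string literal, because a newline is neither `"` nor `\`; the sketch follows that reading.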
+## Punctuators
+
+The following sequences of characters form *punctuators*:
+- `[`
+- `]`
+- `(`
+- `)`
+- `{`
+- `}`
+- `.`
+- `+`
+- `-`
+- `*`
+- `/`
+- `%`
+- `;`
+- `!`
+- `&`
+- `|`
+- `^`
+- `~`
+- `>`
+- `<`
+- `=`
+- `->`
+- `++`
+- `--`
+- `>>`
+- `<<`
+- `<=`
+- `>=`
+- `==`
+- `!=`
+- `&&`
+- `||`
+- `+=`
+- `-=`
+- `*=`
+- `/=`
+- `%=`
+- `&=`
+- `|=`
+- `^=`
+
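Several punctuators are prefixes of others (`-` and `->`, `>` and `>>`), and the list above does not say how a scanner should choose between them. A common convention, assumed here and not stated in the spec, is to take the longest punctuator that matches. The name `match_punctuator` is hypothetical.

```python
# The 40 punctuators from the list above.
PUNCTUATORS = [
    "[", "]", "(", ")", "{", "}", ".", "+", "-", "*", "/", "%", ";",
    "!", "&", "|", "^", "~", ">", "<", "=",
    "->", "++", "--", ">>", "<<", "<=", ">=", "==", "!=", "&&", "||",
    "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=",
]

# Assumption: longest match wins, so try two-character punctuators first.
_BY_LENGTH = sorted(PUNCTUATORS, key=len, reverse=True)


def match_punctuator(source, pos=0):
    """Return the longest punctuator starting at `pos`, or None."""
    for punctuator in _BY_LENGTH:
        if source.startswith(punctuator, pos):
            return punctuator
    return None
```

With this convention, `->x` scans as `->` followed by `x`, and `>>=` scans as `>>` followed by `=`, since `>>=` is not in the list.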
+## Whitespace
+
+A nonempty sequence of characters is considered to be *whitespace* if each character in it has a Unicode general category of either Space Separator (Zs) or Control (Cc).
+
+## Comments
+
+A *comment* can be either a *line comment* or a *block comment*.
+
+A *line comment* begins with the characters `//`, provided they occur outside of a string literal, and ends with a newline character (U+000A).
+
+A *block comment* begins with the characters `/*`, provided they occur outside of a string literal, and ends with the characters `*/`.
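
The whitespace and comment rules can be sketched as a helper that skips everything separating two tokens. This is a minimal sketch under several assumptions the text leaves open: block comments do not nest, a line comment that reaches the end of input simply ends there, and an unterminated block comment is an error. The names `is_whitespace` and `skip_trivia` are hypothetical. The "outside of a string literal" condition is handled by only calling the helper between tokens.

```python
import unicodedata


def is_whitespace(ch):
    """True if `ch` is in general category Space Separator (Zs) or Control (Cc)."""
    return unicodedata.category(ch) in ("Zs", "Cc")


def skip_trivia(source, pos=0):
    """Return the index of the first character at or after `pos` that is
    neither whitespace nor part of a comment."""
    n = len(source)
    while pos < n:
        if is_whitespace(source[pos]):
            pos += 1
        elif source.startswith("//", pos):
            # A line comment runs through the next U+000A.
            newline = source.find("\n", pos)
            pos = n if newline == -1 else newline + 1
        elif source.startswith("/*", pos):
            # Assumption: no nesting; the first "*/" after the opener closes it.
            end = source.find("*/", pos + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            pos = end + 2
        else:
            break
    return pos
```

Searching for `*/` from `pos + 2` means the opener's `*` cannot double as the closer's, so `/*/` is treated as unterminated; the spec does not say which reading is intended.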