author Melody Horn <melody@boringcactus.com> 2020-10-31 21:59:00 -0600
committer Melody Horn <melody@boringcactus.com> 2020-10-31 21:59:00 -0600
commit 5af481d62df80d8be3f5835042d30372ef9cbe04
tree 6d995adefedd90fd3db269f898a527313a37af10 /syntax.md
parent c916253f17b329550250549ea0aef4b67ced026f
download: spec-5af481d62df80d8be3f5835042d30372ef9cbe04.tar.gz, spec-5af481d62df80d8be3f5835042d30372ef9cbe04.zip
define and annotate some language elements
Diffstat (limited to 'syntax.md')
-rw-r--r-- syntax.md 196
1 files changed, 0 insertions, 196 deletions
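The scanning rules that this commit deletes from `syntax.md` (keywords, identifiers, constants, escape sequences, and string literals) are precise enough to check mechanically. What follows is a minimal Python sketch, not part of the Crowbar spec or any Crowbar tooling, that tests a single, already-isolated token against those rules. The function names `is_identifier`, `classify_constant`, and `is_string_literal` are hypothetical. Two readings are assumptions, not stated in the spec text: keywords are treated as invalid identifiers, and the "single character" of a character constant may not itself be `'` or `\`.

```python
import re
import unicodedata
from typing import Optional

# Reserved words from the removed "Keywords" list.
KEYWORDS = frozenset({
    "bool", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "fragile", "function",
    "if", "include", "int", "long", "return", "short", "signed", "sizeof",
    "struct", "switch", "unsigned", "void", "while",
})

# Unicode general categories permitted for the first identifier character.
START_CATEGORIES = frozenset({"Pc", "Ll", "Lm", "Lo", "Lt", "Lu", "Mn", "Sk"})
# Later characters may also be decimal digits, letter numbers, or other numbers.
CONTINUE_CATEGORIES = START_CATEGORIES | {"Nd", "Nl", "No"}


def is_identifier(text: str) -> bool:
    """True if `text` satisfies the identifier rule.

    Assumption (hypothetical): keywords are not valid identifiers.
    """
    if not text or text in KEYWORDS:
        return False
    if unicodedata.category(text[0]) not in START_CATEGORIES:
        return False
    return all(unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in text[1:])


# One escape sequence: \' \" \\ \r \n \t \0 \xHH \uHHHH \UHHHHHHHH
_ESCAPE = r"""\\(?:['"\\rnt0]|x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})"""

# Each pattern must match the whole token (fullmatch is used below).
_CONSTANT_PATTERNS = [
    ("binary", re.compile(r"0[bB][01_]+")),
    ("octal", re.compile(r"0o[0-7_]+")),  # only lowercase `0o` is listed
    ("hexadecimal", re.compile(r"0[xX][0-9A-Fa-f_]+")),
    # decimal, then `.decimal`, `e decimal`, or `.decimal e decimal`
    ("floating-point", re.compile(r"[0-9_]+(?:\.[0-9_]+(?:[eE][0-9_]+)?|[eE][0-9_]+)")),
    ("decimal", re.compile(r"[0-9_]+")),
    # Assumption: the single character may not itself be ' or \
    ("character", re.compile(r"'(?:[^'\\]|" + _ESCAPE + r")'")),
]

# A string literal: escape sequences or any character other than " and \.
_STRING_LITERAL = re.compile(r'"(?:[^"\\]|' + _ESCAPE + r')*"')


def classify_constant(token: str) -> Optional[str]:
    """Return the kind of constant `token` is, or None if it is not one."""
    for kind, pattern in _CONSTANT_PATTERNS:
        if pattern.fullmatch(token):
            return kind
    return None


def is_string_literal(token: str) -> bool:
    """True if `token` is one complete string literal."""
    return _STRING_LITERAL.fullmatch(token) is not None
```

For example, `classify_constant("0xFF")` returns `"hexadecimal"`, while `classify_constant("0O17")` returns `None` because the rules list only a lowercase `0o` octal prefix. Because the sketch checks isolated tokens, it does not decide where one token ends and the next begins: `^` is both a punctuator and, as an `Sk` character, a legal identifier start, so a full scanner would need an additional rule to choose between them.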
diff --git a/syntax.md b/syntax.md
index 80fa54b..96f0b88 100644
--- a/syntax.md
+++ b/syntax.md
@@ -1,204 +1,8 @@
# Syntax (old)
-The syntax of Crowbar mostly matches the syntax of C, with fewer obscure/advanced/edge case features.
-
-## Source Files
-
-A Crowbar source file is UTF-8.
-Crowbar source files can come in two varieties, an *implementation file* and a *header file*.
-An implementation file conventionally has a `.cro` extension, and a header file conventionally has a `.hro` extension.
-
-A Crowbar source file is read into memory in two phases: *scanning* (which converts text into an unstructured sequence of tokens) and *parsing* (which converts an unstructured sequence of tokens into a parse tree).
-
-## Scanning
-
-A *token* is one of the following kinds of token:
-
-- a *keyword*,
-- an *identifier*,
-- a *constant*,
-- a *string literal*,
-- or a *punctuator*.
-
-Tokens are separated by either *whitespace* or a *comment*.
-
-### Keywords
-
-A *keyword* is one of the following literal words:
-
-- `bool`
-- `break`
-- `case`
-- `char`
-- `const`
-- `continue`
-- `default`
-- `do`
-- `double`
-- `else`
-- `enum`
-- `extern`
-- `float`
-- `for`
-- `fragile`
-- `function`
-- `if`
-- `include`
-- `int`
-- `long`
-- `return`
-- `short`
-- `signed`
-- `sizeof`
-- `struct`
-- `switch`
-- `unsigned`
-- `void`
-- `while`
-
-### Identifiers
-
-An *identifier* is a sequence of one or more characters having Unicode categories within a legal set.
-
-The first character in an identifier must have one of the following Unicode categories:
-
-- `Pc` Connector Punctuation (e.g. `_`)
-- `Ll` Lowercase Letter (e.g. `h`)
-- `Lm` Modifier Letter (e.g. `ʹ`, U+02B9 Modifier Letter Prime)
-- `Lo` Other Letter (e.g. `א`, U+05D0 Hebrew Letter Alef)
-- `Lt` Titlecase Letter (e.g. `Dž`, U+01C5 Latin Capital Letter D With Small Letter Z With Caron)
-- `Lu` Uppercase Letter (e.g. `B`)
-- `Mn` Nonspacing Mark (e.g. ` ̂`, U+0302 Combining Circumflex Accent)
-- `Sk` Modifier Symbol (e.g. `^`, U+005E Circumflex Accent)
-
-Subsequent characters may have any of the above-listed Unicode categories, or one of the following:
-
-- `Nd` Decimal Digit Number (e.g. `0`)
-- `Nl` Letter Number (e.g. `Ⅳ`, U+2163 Roman Numeral Four)
-- `No` Other Number (e.g. `¼`, U+00BC Vulgar Fraction One Quarter)
-
-### Constants
-
-A *constant* can have one of six types:
-
-- a *decimal constant*, a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `_`};
-- a *binary constant*, a prefix (either `0b` or `0B`) followed by a sequence of characters drawn from the set {`0`, `1`, `_`};
-- an *octal constant*, the prefix `0o` followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `_`};
-- a *hexadecimal constant*, a prefix (either `0x` or `0X`) followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`, `_`};
-- a *floating-point constant*, a decimal constant followed by one of
- - `.` followed by a decimal constant,
- - either `e` or `E` followed by a decimal constant,
- - or a `.` followed by a decimal constant followed by either an `e` or `E` followed by a decimal constant;
-- or a *character constant*, a `'` followed by either a single character or an *escape sequence* followed by another `'`.
-
-#### Escape Sequences
-
-The following sequences of characters are *escape sequences*:
-
-- `\'`
-- `\"`
-- `\\`
-- `\r`
-- `\n`
-- `\t`
-- `\0`
-- `\x` followed by two characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
-- `\u` followed by four characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
-- `\U` followed by eight characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
-
-### String Literals
-
-A *string literal* begins with a `"`.
-It then contains a sequence where each element is either an escape sequence or a character that is neither `"` nor `\`.
-It then ends with a `"`.
-
-### Punctuators
-
-The following sequences of characters form *punctuators*:
-
-- `[`
-- `]`
-- `(`
-- `)`
-- `{`
-- `}`
-- `.`
-- `,`
-- `+`
-- `-`
-- `*`
-- `/`
-- `%`
-- `;`
-- `!`
-- `&`
-- `|`
-- `^`
-- `~`
-- `>`
-- `<`
-- `=`
-- `->`
-- `++`
-- `--`
-- `>>`
-- `<<`
-- `<=`
-- `>=`
-- `==`
-- `!=`
-- `&&`
-- `||`
-- `+=`
-- `-=`
-- `*=`
-- `/=`
-- `%=`
-- `&=`
-- `|=`
-- `^=`
-
-### Whitespace
-
-A nonempty sequence of characters is considered to be *whitespace* if each character in it has a Unicode class of either Space Separator or Control Other.
-
-### Comments
-
-A *comment* can be either a *line comment* or a *block comment*.
-
-A *line comment* begins with the characters `//` if they occur outside of a string literal or comment, and ends with a newline character U+000A.
-
-A *block comment* begins with the characters `/*` if they occur outside of a string literal or comment, and ends with the characters `*/`.
-
-## Parsing
-
-The syntax of Crowbar is given as a [parsing expression grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar):
-
-### Entry points
-
-```PEG
-HeaderFile ← HeaderFileElement+
-HeaderFileElement ← IncludeStatement /
- TypeDeclaration /
- FunctionDeclaration
-
-ImplementationFile ← ImplementationFileElement+
-ImplementationFileElement ← HeaderFileElement /
- FunctionDefinition
-```
-
### Top-level elements
```PEG
-IncludeStatement ← 'include' string-literal ';'
-
-TypeDeclaration ← StructDeclaration /
- EnumDeclaration
-StructDeclaration ← 'struct' identifier '{' VariableDeclaration+ '}' ';'
-EnumDeclaration ← 'enum' identifier '{' EnumBody '}' ';'
-EnumBody ← identifier ('=' Expression)? ',' EnumBody /
- identifier ('=' Expression)? ','?
-
FunctionDeclaration ← FunctionSignature ';'
FunctionDefinition ← FunctionSignature Block
FunctionSignature ← Type identifier '(' SignatureArguments? ')'