-rw-r--r-- .build.yml         30
-rw-r--r-- errors.md           2
-rw-r--r-- index.md           74
-rw-r--r-- safety.md         184
-rw-r--r-- syntax.md         696
-rw-r--r-- tagged-unions.md    2
-rw-r--r-- types.md            2
-rw-r--r-- vs-c.md           140
8 files changed, 565 insertions, 565 deletions
diff --git a/.build.yml b/.build.yml
index e40edc3..73562ea 100644
--- a/.build.yml
+++ b/.build.yml
@@ -1,15 +1,15 @@
-image: debian/stable
-packages:
- - pandoc
- - wkhtmltopdf
- - poppler-utils
-sources:
- - https://git.sr.ht/~boringcactus/crowbar-spec
-tasks:
- - page-count: |
- cd crowbar-spec
- pandoc -s -o ../spec.pdf -t html -M "title=Crowbar Specification" *.md
- cd ..
- pdfinfo spec.pdf | grep Pages
-artifacts:
- - spec.pdf
+image: debian/stable
+packages:
+ - pandoc
+ - wkhtmltopdf
+ - poppler-utils
+sources:
+ - https://git.sr.ht/~boringcactus/crowbar-spec
+tasks:
+ - page-count: |
+ cd crowbar-spec
+ pandoc -s -o ../spec.pdf -t html -M "title=Crowbar Specification" *.md
+ cd ..
+ pdfinfo spec.pdf | grep Pages
+artifacts:
+ - spec.pdf
diff --git a/errors.md b/errors.md
index 1ea1912..1333ed7 100644
--- a/errors.md
+++ b/errors.md
@@ -1 +1 @@
-TODO
+TODO
diff --git a/index.md b/index.md
index b63a778..6692800 100644
--- a/index.md
+++ b/index.md
@@ -1,37 +1,37 @@
-Crowbar: the good parts of C, with a little bit extra.
-
-**This is entirely a work-in-progress, and should not be relied upon to be stable (or even true) in any way.**
-
-Crowbar is a language that is derived from (and, wherever possible, interoperable with) C, and aims to remove as many [footgun](https://en.wiktionary.org/wiki/footgun)s and as much needless complexity from C as possible while still being familiar to C developers.
-
-Ideally, a typical C codebase should be straightforward to rewrite in Crowbar, and any atypical C constructions not supported by Crowbar can be left as C.
-
-# Context
-
-- [Rust is not a good C replacement](https://drewdevault.com/2019/03/25/Rust-is-not-a-good-C-replacement.html)
-
-# cactus's Blog Posts
-
-- [Crowbar: Defining a good C replacement](https://www.boringcactus.com/2020/09/28/crowbar-1-defining-a-c-replacement.html)
-- [Crowbar: Simplifying C's type names](https://www.boringcactus.com/2020/10/13/crowbar-2-simplifying-c-type-names.html)
-- [Crowbar: Turns out, language development is hard](https://www.boringcactus.com/2020/10/19/crowbar-3-this-is-tough.html)
-
-# Comparison with C
-
-The [comparison with C](vs-c.md) is an informal overview of the places where Crowbar and C diverge.
-
-# Syntax
-
-[Read the Syntax chapter of the spec.](syntax.md)
-
-# Semantics
-
-TODO
-
-# Discuss
-
-- [announcement mailing list](https://lists.sr.ht/~boringcactus/crowbar-lang-announce)
-- [permanent discussion mailing list](https://lists.sr.ht/~boringcactus/crowbar-lang-devel)
-- ephemeral discussions via IRC: #crowbar-lang on freenode ([join via irc](ircs://chat.freenode.net/#crowbar-lang), [join via web](https://webchat.freenode.net/#crowbar-lang))
-
-[![Creative Commons BY-SA License](https://i.creativecommons.org/l/by-sa/4.0/80x15.png)](http://creativecommons.org/licenses/by-sa/4.0/)
+Crowbar: the good parts of C, with a little bit extra.
+
+**This is entirely a work-in-progress, and should not be relied upon to be stable (or even true) in any way.**
+
+Crowbar is a language that is derived from (and, wherever possible, interoperable with) C, and aims to remove as many [footgun](https://en.wiktionary.org/wiki/footgun)s and as much needless complexity from C as possible while still being familiar to C developers.
+
+Ideally, a typical C codebase should be straightforward to rewrite in Crowbar, and any atypical C constructions not supported by Crowbar can be left as C.
+
+# Motivation
+
+- [Rust is not a good C replacement](https://drewdevault.com/2019/03/25/Rust-is-not-a-good-C-replacement.html)
+
+# Journal
+
+- [Crowbar: Defining a good C replacement](https://www.boringcactus.com/2020/09/28/crowbar-1-defining-a-c-replacement.html)
+- [Crowbar: Simplifying C's type names](https://www.boringcactus.com/2020/10/13/crowbar-2-simplifying-c-type-names.html)
+- [Crowbar: Turns out, language development is hard](https://www.boringcactus.com/2020/10/19/crowbar-3-this-is-tough.html)
+
+# Comparison with C
+
+The [comparison with C](vs-c.md) is an informal overview of the places where Crowbar and C diverge.
+
+# Syntax
+
+[Read the Syntax chapter of the spec.](syntax.md)
+
+# Semantics
+
+TODO
+
+# Discuss
+
+- [announcement mailing list](https://lists.sr.ht/~boringcactus/crowbar-lang-announce)
+- [permanent discussion mailing list](https://lists.sr.ht/~boringcactus/crowbar-lang-devel)
+- ephemeral discussions via IRC: #crowbar-lang on freenode ([join via irc](ircs://chat.freenode.net/#crowbar-lang), [join via web](https://webchat.freenode.net/#crowbar-lang))
+
+[![Creative Commons BY-SA License](https://i.creativecommons.org/l/by-sa/4.0/80x15.png)](http://creativecommons.org/licenses/by-sa/4.0/)
diff --git a/safety.md b/safety.md
index b8a2303..845c45b 100644
--- a/safety.md
+++ b/safety.md
@@ -1,92 +1,92 @@
-Each item in Wikipedia's [list of types of memory errors](https://en.wikipedia.org/wiki/Memory_safety#Types_of_memory_errors) and what Crowbar does to prevent them.
-
-In general, Crowbar does its best to ensure that code will not exhibit any of the following memory errors.
-However, sometimes the compiler knows less than the programmer, and so code that looks dangerous is actually fine.
-Crowbar allows programmers to suspend the memory safety checks with the `fragile` keyword.
-
-# Access errors
-
-## Buffer overflow
-
-Crowbar addresses buffer overflow with bounds checking.
-In C, the type `char *` can point to a single character, a null-terminated string of unknown length, a buffer of fixed size, or nothing at all.
-In Crowbar, the type `char *` can only point to either a single character or nothing at all.
-If a buffer is declared as `char[50] name;` then it has type `char[50]`, and can be implicitly converted to `(char[50])*`, a pointer-to-50-chars.
-If memory is dynamically allocated, it works as follows:
-
-```crowbar
-void process(size_t bufferSize, char[bufferSize] buffer) {
- // do some work with buffer, given that we know its size
-}
-
-int main(int argc, (char[1024?])[argc] argv) {
- size_t bufferSize = getBufferSize();
- (char[bufferSize])* buffer = malloc(bufferSize);
- process(bufferSize, buffer);
- free(buffer);
-}
-```
-
-Note that `malloc` as part of the Crowbar standard library has signature `(void[size])* malloc(size_t size);` and so no cast is needed above.
-In C, `buffer` in `main` would have type pointer-to-VLA-of-char, but `buffer` in `process` would have type VLA-of-char, and this conversion would emit a compiler warning.
-However, in Crowbar, a `(T[N])*` is always implicitly convertible to `T[N]`, so no warning exists.
-(This is translated into C by dereferencing `buffer` in `main`.)
-
-Note as well that the type of `argv` is complicated.
-This is because the elements of `argv` have unconstrained size.
-TODO figure out if that's the right way to handle that
-
-## Buffer over-read
-
-bounds checking again
-
-## Race condition
-
-uhhhhh 🤷‍♀️
-
-## Page fault
-
-bounds checking, dubious-pointer checking
-
-## Use after free
-
-`free(x);` not followed by `x = NULL;` is a compiler error.
-`owned` and `borrowed` keywords
-
-# Uninitialized variables
-
-forbid them in syntax
-
-## Null pointer dereference
-
-dubious-pointer checking
-
-## Wild pointers
-
-dubious-pointer checking
-
-# Memory leak
-
-## Stack exhaustion
-
-uhhhhhh 🤷‍♀️
-
-## Heap exhaustion
-
-that counts as error handling, just the `malloc`-shaped kind
-
-## Double free
-
-this is just use-after-free but the use is calling free on it
-
-## Invalid free
-
-don't do that
-
-## Mismatched free
-
-how does that even happen
-
-## Unwanted aliasing
-
-uhhh don't do that?
+Each item in Wikipedia's [list of types of memory errors](https://en.wikipedia.org/wiki/Memory_safety#Types_of_memory_errors) and what Crowbar does to prevent them.
+
+In general, Crowbar does its best to ensure that code will not exhibit any of the following memory errors.
+However, sometimes the compiler knows less than the programmer, and so code that looks dangerous is actually fine.
+Crowbar allows programmers to suspend the memory safety checks with the `fragile` keyword.
+
+# Access errors
+
+## Buffer overflow
+
+Crowbar addresses buffer overflow with bounds checking.
+In C, the type `char *` can point to a single character, a null-terminated string of unknown length, a buffer of fixed size, or nothing at all.
+In Crowbar, the type `char *` can only point to either a single character or nothing at all.
+If a buffer is declared as `char[50] name;` then it has type `char[50]`, and can be implicitly converted to `(char[50])*`, a pointer-to-50-chars.
+If memory is dynamically allocated, it works as follows:
+
+```crowbar
+void process(size_t bufferSize, char[bufferSize] buffer) {
+ // do some work with buffer, given that we know its size
+}
+
+int main(int argc, (char[1024?])[argc] argv) {
+ size_t bufferSize = getBufferSize();
+ (char[bufferSize])* buffer = malloc(bufferSize);
+ process(bufferSize, buffer);
+ free(buffer);
+}
+```
+
+Note that `malloc` as part of the Crowbar standard library has signature `(void[size])* malloc(size_t size);` and so no cast is needed above.
+In C, `buffer` in `main` would have type pointer-to-VLA-of-char, but `buffer` in `process` would have type VLA-of-char, and this conversion would emit a compiler warning.
+However, in Crowbar, a `(T[N])*` is always implicitly convertible to `T[N]`, so no warning exists.
+(This is translated into C by dereferencing `buffer` in `main`.)
+
+Note as well that the type of `argv` is complicated.
+This is because the elements of `argv` have unconstrained size.
+TODO figure out if that's the right way to handle that
+
+## Buffer over-read
+
+bounds checking again
+
+## Race condition
+
+uhhhhh 🤷‍♀️
+
+## Page fault
+
+bounds checking, dubious-pointer checking
+
+## Use after free
+
+`free(x);` not followed by `x = NULL;` is a compiler error.
+`owned` and `borrowed` keywords
+
+# Uninitialized variables
+
+forbid them in syntax
+
+## Null pointer dereference
+
+dubious-pointer checking
+
+## Wild pointers
+
+dubious-pointer checking
+
+# Memory leak
+
+## Stack exhaustion
+
+uhhhhhh 🤷‍♀️
+
+## Heap exhaustion
+
+that counts as error handling, just the `malloc`-shaped kind
+
+## Double free
+
+this is just use-after-free but the use is calling free on it
+
+## Invalid free
+
+don't do that
+
+## Mismatched free
+
+how does that even happen
+
+## Unwanted aliasing
+
+uhhh don't do that?
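safety.md says the Crowbar `main`/`process` example is "translated into C by dereferencing `buffer` in `main`". The following is a minimal C99 sketch of what that translation could look like, not actual compiler output. `run` and the body of `process` are hypothetical, and `bufferSize` is passed in as a parameter instead of coming from the example's `getBufferSize()`. The sketch also includes the NULL check that heap exhaustion calls for and the `x = NULL;` after `free` that the use-after-free rule requires.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Crowbar: void process(size_t bufferSize, char[bufferSize] buffer)
   C99 lets the parameter name its size with a VLA declarator.
   Hypothetical body: fill the buffer and report how many bytes were written. */
static size_t process(size_t bufferSize, char buffer[bufferSize]) {
    memset(buffer, 'x', bufferSize);
    return bufferSize;
}

/* Hypothetical stand-in for the body of main in the Crowbar example. */
static size_t run(size_t bufferSize) {
    assert(bufferSize > 0); /* a C VLA type must have a nonzero size */

    /* Crowbar: (char[bufferSize])* buffer = malloc(bufferSize); */
    char (*buffer)[bufferSize] = malloc(bufferSize);
    if (buffer == NULL) {
        return 0; /* heap exhaustion: the allocation failed */
    }

    /* Crowbar passes buffer directly. In C it is dereferenced, so the
       char[bufferSize] array decays to char * and no warning is emitted. */
    size_t written = process(bufferSize, *buffer);

    free(buffer);
    buffer = NULL; /* safety.md: free(x) not followed by x = NULL is a compile error */
    return written;
}
```

Passing `buffer` itself rather than `*buffer` would hand a `char (*)[bufferSize]` to a `char *` parameter, which is the incompatible-pointer warning the text mentions.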
diff --git a/syntax.md b/syntax.md
index c195811..e0ecea5 100644
--- a/syntax.md
+++ b/syntax.md
@@ -1,348 +1,348 @@
-The syntax of Crowbar mostly matches the syntax of C, with fewer obscure/advanced/edge case features.
-
-# Source Files
-
-A Crowbar source file is UTF-8.
-Crowbar source files can come in two varieties, an *implementation file* and a *header file*.
-An implementation file conventionally has a `.cro` extension, and a header file conventionally has a `.hro` extension.
-
-A Crowbar source file is read into memory in two phases: *scanning* (which converts text into an unstructured sequence of tokens) and *parsing* (which converts an unstructured sequence of tokens into a parse tree).
-
-# Scanning
-
-A *token* is one of the following kinds of token:
-- a *keyword*,
-- an *identifier*,
-- a *constant*,
-- a *string literal*,
-- or a *punctuator*.
-
-Tokens are separated by either *whitespace* or a *comment*.
-
-## Keywords
-
-A *keyword* is one of the following literal words:
-- `bool`
-- `break`
-- `case`
-- `char`
-- `const`
-- `continue`
-- `default`
-- `do`
-- `double`
-- `else`
-- `enum`
-- `extern`
-- `float`
-- `for`
-- `fragile`
-- `function`
-- `if`
-- `include`
-- `int`
-- `long`
-- `return`
-- `short`
-- `signed`
-- `sizeof`
-- `struct`
-- `switch`
-- `typedef`
-- `unsigned`
-- `void`
-- `while`
-
-## Identifiers
-
-An *identifier* is a sequence of one or more characters having Unicode categories within a legal set.
-
-The first character in an identifier must have one of the following Unicode categories:
-- `Pc` Connector Punctuation (e.g. `_`)
-- `Ll` Lowercase Letter (e.g. `h`)
-- `Lm` Modifier Letter (e.g. `ʹ`, U+02B9 Modifier Letter Prime)
-- `Lo` Other Letter (e.g. `א`, U+05D0 Hebrew Letter Alef)
-- `Lt` Titlecase Letter (e.g. `ǅ`, U+01C5 Latin Capital Letter D With Small Letter Z With Caron)
-- `Lu` Uppercase Letter (e.g. `B`)
-- `Mn` Nonspacing Mark (e.g. ` ̂`, U+0302 Combining Circumflex Accent)
-- `Sk` Modifier Symbol (e.g. `^`, U+005E Circumflex Accent)
-
-Subsequent characters may have any of the above-listed Unicode categories, or one of the following:
-- `Nd` Decimal Digit Number (e.g. `0`)
-- `Nl` Letter Number (e.g. `Ⅳ`, U+2163 Roman Numeral Four)
-- `No` Other Number (e.g. `¼`, U+00BC Vulgar Fraction One Quarter)
-
-## Constants
-
-A *constant* can have one of six types:
-- a *decimal constant*, a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `_`};
-- a *binary constant*, a prefix (either `0b` or `0B`) followed by a sequence of characters drawn from the set {`0`, `1`, `_`};
-- an *octal constant*, the prefix `0o` followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `_`};
-- a *hexadecimal constant*, a prefix (either `0x` or `0X`) followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`, `_`};
-- a *floating-point constant*, a decimal constant followed by one of
- - `.` followed by a decimal constant,
- - either `e` or `E` followed by a decimal constant,
- - or a `.` followed by a decimal constant followed by either an `e` or `E` followed by a decimal constant;
-- or a *character constant*, a `'` followed by either a single character or an *escape sequence* followed by another `'`.
-
-### Escape Sequences
-
-The following sequences of characters are *escape sequences*:
-- `\'`
-- `\"`
-- `\\`
-- `\r`
-- `\n`
-- `\t`
-- `\0`
-- `\x` followed by two characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
-- `\u` followed by four characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
-- `\U` followed by eight characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
-
-## String Literals
-
-A *string literal* begins with a `"`.
-It then contains a sequence where each element is either an escape sequence or a character that is neither `"` nor `\`.
-It then ends with a `"`.
-
-## Punctuators
-
-The following sequences of characters form *punctuators*:
-- `[`
-- `]`
-- `(`
-- `)`
-- `{`
-- `}`
-- `.`
-- `,`
-- `+`
-- `-`
-- `*`
-- `/`
-- `%`
-- `;`
-- `!`
-- `&`
-- `|`
-- `^`
-- the tilde, `~` (given special treatment on this line due to [a bug in the Markdown renderer that sr.ht uses](https://github.com/miyuchina/mistletoe/issues/91))
-- `>`
-- `<`
-- `=`
-- `->`
-- `++`
-- `--`
-- `>>`
-- `<<`
-- `<=`
-- `>=`
-- `==`
-- `!=`
-- `&&`
-- `||`
-- `+=`
-- `-=`
-- `*=`
-- `/=`
-- `%=`
-- `&=`
-- `|=`
-- `^=`
-
-## Whitespace
-
-A nonempty sequence of characters is considered to be *whitespace* if each character in it has a Unicode class of either Space Separator or Control Other.
-
-## Comments
-
-A *comment* can be either a *line comment* or a *block comment*.
-
-A *line comment* begins with the characters `//` if they occur outside of a string literal or comment, and ends with a newline character U+000A.
-
-A *block comment* begins with the characters `/*` if they occur outside of a string literal or comment, and ends with the characters `*/`.
-
-# Parsing
-
-The syntax of Crowbar is given as a [parsing expression grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar):
-
-## Entry points
-
-```
-HeaderFile ← HeaderFileElement+
-HeaderFileElement ← IncludeStatement /
- TypeDeclaration /
- FunctionDeclaration
-
-ImplementationFile ← ImplementationFileElement+
-ImplementationFileElement ← HeaderFileElement /
- FunctionDefinition
-```
-
-## Top-level elements
-
-```
-IncludeStatement ← 'include' string-literal ';'
-
-TypeDeclaration ← StructDeclaration /
- EnumDeclaration /
- TypedefDeclaration
-StructDeclaration ← 'struct' identifier '{' VariableDeclaration+ '}' ';'
-EnumDeclaration ← 'enum' identifier '{' EnumBody '}' ';'
-EnumBody ← identifier ('=' Expression)? ',' EnumBody /
- identifier ('=' Expression)? ','?
-TypedefDeclaration ← 'typedef' identifier '=' Type ';'
-
-FunctionDeclaration ← FunctionSignature ';'
-FunctionDefinition ← FunctionSignature Block
-FunctionSignature ← Type identifier '(' SignatureArguments? ')'
-SignatureArguments ← Type identifier ',' SignatureArguments /
- Type identifier ','?
-```
-
-## Statements
-
-```
-Block ← '{' Statement* '}'
-
-Statement ← VariableDefinition /
- VariableDeclaration /
- IfStatement /
- SwitchStatement /
- WhileStatement /
- DoWhileStatement /
- ForStatement /
- FlowControlStatement /
- AssignmentStatement /
- ExpressionStatement
-
-VariableDefinition ← Type identifier '=' Expression ';'
-VariableDeclaration ← Type identifier ';'
-
-IfStatement ← 'if' Expression Block 'else' Block /
- 'if' Expression Block
-
-SwitchStatement ← 'switch' Expression '{' SwitchCase+ '}'
-SwitchCase ← CaseSpecifier Block /
- 'default' Block
-CaseSpecifier ← 'case' Expression ',' CaseSpecifier /
- 'case' Expression ','?
-
-WhileStatement ← 'while' Expression Block
-DoWhileStatement ← 'do' Block 'while' Expression ';'
-ForStatement ← 'for' VariableDefinition? ';' Expression ';' AssignmentStatementBody? Block
-
-FlowControlStatement ← 'continue' ';' /
- 'break' ';' /
- 'return' Expression? ';'
-
-AssignmentStatement ← AssignmentStatementBody ';'
-AssignmentStatementBody ← AssignmentTargetExpression '=' Expression /
- AssignmentTargetExpression '+=' Expression /
- AssignmentTargetExpression '-=' Expression /
- AssignmentTargetExpression '*=' Expression /
- AssignmentTargetExpression '/=' Expression /
- AssignmentTargetExpression '%=' Expression /
- AssignmentTargetExpression '&=' Expression /
- AssignmentTargetExpression '^=' Expression /
- AssignmentTargetExpression '|=' Expression /
- AssignmentTargetExpression '++' /
- AssignmentTargetExpression '--'
-
-ExpressionStatement ← Expression ';'
-```
-
-## Types
-
-```
-Type ← 'const' BasicType /
- BasicType '*' /
- BasicType '[' Expression ']' /
- BasicType 'function' '(' (BasicType ',')* ')' /
- BasicType
-BasicType ← 'void' /
- IntegerType /
- 'signed' IntegerType /
- 'unsigned' IntegerType /
- 'float' /
- 'double' /
- 'bool' /
- 'struct' identifier /
- 'enum' identifier /
- 'typedef' identifier /
- '(' Type ')'
-IntegerType ← 'char' /
- 'short' /
- 'int' /
- 'long'
-```
-
-## Expressions
-
-```
-AssignmentTargetExpression ← identifier ATEElementSuffix*
-ATEElementSuffix ← '[' Expression ']' /
- '.' identifier /
- '->' identifier
-
-AtomicExpression ← identifier /
- constant /
- string-literal /
- '(' Expression ')'
-
-ObjectExpression ← AtomicExpression ObjectSuffix* /
- ArrayLiteralExpression /
- StructLiteralExpression
-ObjectSuffix ← '[' Expression ']' /
- '(' CommasExpressionList? ')' /
- '.' identifier /
- '->' identifier
-CommasExpressionList ← Expression ',' CommasExpressionList? /
- Expression ','?
-ArrayLiteralExpression ← '{' CommasExpressionList '}'
-StructLiteralExpression ← '{' StructLiteralBody '}'
-StructLiteralBody ← StructLiteralElement ',' StructLiteralBody? /
- StructLiteralElement ','?
-StructLiteralElement ← '.' identifier '=' Expression
-
-FactorExpression ← '(' Type ')' FactorExpression /
- '&' FactorExpression /
- '*' FactorExpression /
- '+' FactorExpression /
- '-' FactorExpression /
- '~' FactorExpression /
- '!' FactorExpression /
- 'sizeof' FactorExpression /
- 'sizeof' Type /
- ObjectExpression
-
-TermExpression ← FactorExpression TermSuffix*
-TermSuffix ← '*' FactorExpression /
- '/' FactorExpression /
- '%' FactorExpression
-
-ArithmeticExpression ← TermExpression ArithmeticSuffix*
-ArithmeticSuffix ← '+' TermExpression /
- '-' TermExpression
-
-BitwiseOpExpression ← ArithmeticExpression '<<' ArithmeticExpression /
- ArithmeticExpression '>>' ArithmeticExpression /
- ArithmeticExpression '^' ArithmeticExpression /
- ArithmeticExpression ('&' ArithmeticExpression)+ /
- ArithmeticExpression ('|' ArithmeticExpression)+ /
- ArithmeticExpression
-
-ComparisonExpression ← BitwiseOpExpression '==' BitwiseOpExpression /
- BitwiseOpExpression '!=' BitwiseOpExpression /
- BitwiseOpExpression '<=' BitwiseOpExpression /
- BitwiseOpExpression '>=' BitwiseOpExpression /
- BitwiseOpExpression '<' BitwiseOpExpression /
- BitwiseOpExpression '>' BitwiseOpExpression /
- BitwiseOpExpression
-
-Expression ← ComparisonExpression ('&&' ComparisonExpression)+ /
- ComparisonExpression ('||' ComparisonExpression)+ /
- ComparisonExpression
-```
-
-[![Creative Commons BY-SA License](https://i.creativecommons.org/l/by-sa/4.0/80x15.png)](http://creativecommons.org/licenses/by-sa/4.0/)
+The syntax of Crowbar mostly matches the syntax of C, with fewer obscure/advanced/edge case features.
+
+# Source Files
+
+A Crowbar source file is UTF-8.
+Crowbar source files can come in two varieties, an *implementation file* and a *header file*.
+An implementation file conventionally has a `.cro` extension, and a header file conventionally has a `.hro` extension.
+
+A Crowbar source file is read into memory in two phases: *scanning* (which converts text into an unstructured sequence of tokens) and *parsing* (which converts an unstructured sequence of tokens into a parse tree).
+
+# Scanning
+
+A *token* is one of the following kinds of token:
+- a *keyword*,
+- an *identifier*,
+- a *constant*,
+- a *string literal*,
+- or a *punctuator*.
+
+Tokens are separated by either *whitespace* or a *comment*.
+
+## Keywords
+
+A *keyword* is one of the following literal words:
+- `bool`
+- `break`
+- `case`
+- `char`
+- `const`
+- `continue`
+- `default`
+- `do`
+- `double`
+- `else`
+- `enum`
+- `extern`
+- `float`
+- `for`
+- `fragile`
+- `function`
+- `if`
+- `include`
+- `int`
+- `long`
+- `return`
+- `short`
+- `signed`
+- `sizeof`
+- `struct`
+- `switch`
+- `typedef`
+- `unsigned`
+- `void`
+- `while`
+
+## Identifiers
+
+An *identifier* is a sequence of one or more characters having Unicode categories within a legal set.
+
+The first character in an identifier must have one of the following Unicode categories:
+- `Pc` Connector Punctuation (e.g. `_`)
+- `Ll` Lowercase Letter (e.g. `h`)
+- `Lm` Modifier Letter (e.g. `ʹ`, U+02B9 Modifier Letter Prime)
+- `Lo` Other Letter (e.g. `א`, U+05D0 Hebrew Letter Alef)
+- `Lt` Titlecase Letter (e.g. `ǅ`, U+01C5 Latin Capital Letter D With Small Letter Z With Caron)
+- `Lu` Uppercase Letter (e.g. `B`)
+- `Mn` Nonspacing Mark (e.g. ` ̂`, U+0302 Combining Circumflex Accent)
+- `Sk` Modifier Symbol (e.g. `^`, U+005E Circumflex Accent)
+
+Subsequent characters may have any of the above-listed Unicode categories, or one of the following:
+- `Nd` Decimal Digit Number (e.g. `0`)
+- `Nl` Letter Number (e.g. `Ⅳ`, U+2163 Roman Numeral Four)
+- `No` Other Number (e.g. `¼`, U+00BC Vulgar Fraction One Quarter)
+
+## Constants
+
+A *constant* can have one of six types:
+- a *decimal constant*, a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `_`};
+- a *binary constant*, a prefix (either `0b` or `0B`) followed by a sequence of characters drawn from the set {`0`, `1`, `_`};
+- an *octal constant*, the prefix `0o` followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `_`};
+- a *hexadecimal constant*, a prefix (either `0x` or `0X`) followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`, `_`};
+- a *floating-point constant*, a decimal constant followed by one of
+ - `.` followed by a decimal constant,
+ - either `e` or `E` followed by a decimal constant,
+ - or a `.` followed by a decimal constant followed by either an `e` or `E` followed by a decimal constant;
+- or a *character constant*, a `'` followed by either a single character or an *escape sequence* followed by another `'`.
+
+### Escape Sequences
+
+The following sequences of characters are *escape sequences*:
+- `\'`
+- `\"`
+- `\\`
+- `\r`
+- `\n`
+- `\t`
+- `\0`
+- `\x` followed by two characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
+- `\u` followed by four characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
+- `\U` followed by eight characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
+
+## String Literals
+
+A *string literal* begins with a `"`.
+It then contains a sequence where each element is either an escape sequence or a character that is neither `"` nor `\`.
+It then ends with a `"`.
+
+## Punctuators
+
+The following sequences of characters form *punctuators*:
+- `[`
+- `]`
+- `(`
+- `)`
+- `{`
+- `}`
+- `.`
+- `,`
+- `+`
+- `-`
+- `*`
+- `/`
+- `%`
+- `;`
+- `!`
+- `&`
+- `|`
+- `^`
+- the tilde, `~` (given special treatment on this line due to [a bug in the Markdown renderer that sr.ht uses](https://github.com/miyuchina/mistletoe/issues/91))
+- `>`
+- `<`
+- `=`
+- `->`
+- `++`
+- `--`
+- `>>`
+- `<<`
+- `<=`
+- `>=`
+- `==`
+- `!=`
+- `&&`
+- `||`
+- `+=`
+- `-=`
+- `*=`
+- `/=`
+- `%=`
+- `&=`
+- `|=`
+- `^=`
+
+## Whitespace
+
+A nonempty sequence of characters is considered to be *whitespace* if each character in it has a Unicode class of either Space Separator or Control Other.
+
+## Comments
+
+A *comment* can be either a *line comment* or a *block comment*.
+
+A *line comment* begins with the characters `//` if they occur outside of a string literal or comment, and ends with a newline character U+000A.
+
+A *block comment* begins with the characters `/*` if they occur outside of a string literal or comment, and ends with the characters `*/`.
+
+# Parsing
+
+The syntax of Crowbar is given as a [parsing expression grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar):
+
+## Entry points
+
+```
+HeaderFile ← HeaderFileElement+
+HeaderFileElement ← IncludeStatement /
+ TypeDeclaration /
+ FunctionDeclaration
+
+ImplementationFile ← ImplementationFileElement+
+ImplementationFileElement ← HeaderFileElement /
+ FunctionDefinition
+```
+
+## Top-level elements
+
+```
+IncludeStatement ← 'include' string-literal ';'
+
+TypeDeclaration ← StructDeclaration /
+ EnumDeclaration /
+ TypedefDeclaration
+StructDeclaration ← 'struct' identifier '{' VariableDeclaration+ '}' ';'
+EnumDeclaration ← 'enum' identifier '{' EnumBody '}' ';'
+EnumBody ← identifier ('=' Expression)? ',' EnumBody /
+ identifier ('=' Expression)? ','?
+TypedefDeclaration ← 'typedef' identifier '=' Type ';'
+
+FunctionDeclaration ← FunctionSignature ';'
+FunctionDefinition ← FunctionSignature Block
+FunctionSignature ← Type identifier '(' SignatureArguments? ')'
+SignatureArguments ← Type identifier ',' SignatureArguments /
+ Type identifier ','?
+```
+
+## Statements
+
+```
+Block ← '{' Statement* '}'
+
+Statement ← VariableDefinition /
+ VariableDeclaration /
+ IfStatement /
+ SwitchStatement /
+ WhileStatement /
+ DoWhileStatement /
+ ForStatement /
+ FlowControlStatement /
+ AssignmentStatement /
+ ExpressionStatement
+
+VariableDefinition ← Type identifier '=' Expression ';'
+VariableDeclaration ← Type identifier ';'
+
+IfStatement ← 'if' Expression Block 'else' Block /
+ 'if' Expression Block
+
+SwitchStatement ← 'switch' Expression '{' SwitchCase+ '}'
+SwitchCase ← CaseSpecifier Block /
+ 'default' Block
+CaseSpecifier ← 'case' Expression ',' CaseSpecifier /
+ 'case' Expression ','?
+
+WhileStatement ← 'while' Expression Block
+DoWhileStatement ← 'do' Block 'while' Expression ';'
+ForStatement ← 'for' VariableDefinition? ';' Expression ';' AssignmentStatementBody? Block
+
+FlowControlStatement ← 'continue' ';' /
+ 'break' ';' /
+ 'return' Expression? ';'
+
+AssignmentStatement ← AssignmentStatementBody ';'
+AssignmentStatementBody ← AssignmentTargetExpression '=' Expression /
+ AssignmentTargetExpression '+=' Expression /
+ AssignmentTargetExpression '-=' Expression /
+ AssignmentTargetExpression '*=' Expression /
+ AssignmentTargetExpression '/=' Expression /
+ AssignmentTargetExpression '%=' Expression /
+ AssignmentTargetExpression '&=' Expression /
+ AssignmentTargetExpression '^=' Expression /
+ AssignmentTargetExpression '|=' Expression /
+ AssignmentTargetExpression '++' /
+ AssignmentTargetExpression '--'
+
+ExpressionStatement ← Expression ';'
+```
+
+## Types
+
+```
+Type ← 'const' BasicType /
+ BasicType '*' /
+ BasicType '[' Expression ']' /
+ BasicType 'function' '(' (BasicType ',')* ')' /
+ BasicType
+BasicType ← 'void' /
+ IntegerType /
+ 'signed' IntegerType /
+ 'unsigned' IntegerType /
+ 'float' /
+ 'double' /
+ 'bool' /
+ 'struct' identifier /
+ 'enum' identifier /
+ 'typedef' identifier /
+ '(' Type ')'
+IntegerType ← 'char' /
+ 'short' /
+ 'int' /
+ 'long'
+```
+
+## Expressions
+
+```
+AssignmentTargetExpression ← identifier ATEElementSuffix*
+ATEElementSuffix ← '[' Expression ']' /
+ '.' identifier /
+ '->' identifier
+
+AtomicExpression ← identifier /
+ constant /
+ string-literal /
+ '(' Expression ')'
+
+ObjectExpression ← AtomicExpression ObjectSuffix* /
+ ArrayLiteralExpression /
+ StructLiteralExpression
+ObjectSuffix ← '[' Expression ']' /
+ '(' CommasExpressionList? ')' /
+ '.' identifier /
+ '->' identifier
+CommasExpressionList ← Expression ',' CommasExpressionList? /
+ Expression ','?
+ArrayLiteralExpression ← '{' CommasExpressionList '}'
+StructLiteralExpression ← '{' StructLiteralBody '}'
+StructLiteralBody ← StructLiteralElement ',' StructLiteralBody? /
+ StructLiteralElement ','?
+StructLiteralElement ← '.' identifier '=' Expression
+
+FactorExpression ← '(' Type ')' FactorExpression /
+ '&' FactorExpression /
+ '*' FactorExpression /
+ '+' FactorExpression /
+ '-' FactorExpression /
+ '~' FactorExpression /
+ '!' FactorExpression /
+ 'sizeof' FactorExpression /
+ 'sizeof' Type /
+ ObjectExpression
+
+TermExpression ← FactorExpression TermSuffix*
+TermSuffix ← '*' FactorExpression /
+ '/' FactorExpression /
+ '%' FactorExpression
+
+ArithmeticExpression ← TermExpression ArithmeticSuffix*
+ArithmeticSuffix ← '+' TermExpression /
+ '-' TermExpression
+
+BitwiseOpExpression ← ArithmeticExpression '<<' ArithmeticExpression /
+ ArithmeticExpression '>>' ArithmeticExpression /
+ ArithmeticExpression '^' ArithmeticExpression /
+ ArithmeticExpression ('&' ArithmeticExpression)+ /
+ ArithmeticExpression ('|' ArithmeticExpression)+ /
+ ArithmeticExpression
+
+ComparisonExpression ← BitwiseOpExpression '==' BitwiseOpExpression /
+ BitwiseOpExpression '!=' BitwiseOpExpression /
+ BitwiseOpExpression '<=' BitwiseOpExpression /
+ BitwiseOpExpression '>=' BitwiseOpExpression /
+ BitwiseOpExpression '<' BitwiseOpExpression /
+ BitwiseOpExpression '>' BitwiseOpExpression /
+ BitwiseOpExpression
+
+Expression ← ComparisonExpression ('&&' ComparisonExpression)+ /
+ ComparisonExpression ('||' ComparisonExpression)+ /
+ ComparisonExpression
+```
+
+[![Creative Commons BY-SA License](https://i.creativecommons.org/l/by-sa/4.0/80x15.png)](http://creativecommons.org/licenses/by-sa/4.0/)
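The integer-constant rules in syntax.md above can be exercised with a small C scanner. This is a minimal sketch, not part of the spec. `scan_int_constant`, `enum const_kind` and the `CONST_*` names are hypothetical. It covers only the four integer forms (not floating-point or character constants). It assumes two things the spec leaves open: a decimal constant starts with a digit, so `_1` would scan as an identifier, and a prefix must be followed by at least one digit, so `0x` alone is rejected. Rejecting values that overflow `unsigned long long` is also this sketch's own choice.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical names: the spec defines the lexical rules, not this API. */
enum const_kind { CONST_INVALID, CONST_DECIMAL, CONST_BINARY, CONST_OCTAL, CONST_HEX };

/* Value of digit character c in the given base, or -1 if c is not a digit of that base. */
static int digit_value(char c, unsigned base) {
    int v;
    if (c >= '0' && c <= '9') v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    else return -1;
    return (unsigned)v < base ? v : -1;
}

/* Classify the whole string `text` as one integer constant and store its value in *out.
   Returns CONST_INVALID if text is not exactly one constant or the value overflows. */
enum const_kind scan_int_constant(const char *text, unsigned long long *out) {
    assert(text != NULL && out != NULL);
    enum const_kind kind = CONST_DECIMAL;
    unsigned base = 10;
    const char *p = text;

    /* Prefixes as listed in the spec: 0b/0B, 0o (lowercase only), 0x/0X. */
    if (p[0] == '0' && (p[1] == 'b' || p[1] == 'B')) { kind = CONST_BINARY; base = 2; p += 2; }
    else if (p[0] == '0' && p[1] == 'o') { kind = CONST_OCTAL; base = 8; p += 2; }
    else if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) { kind = CONST_HEX; base = 16; p += 2; }
    else if (!(p[0] >= '0' && p[0] <= '9')) return CONST_INVALID; /* assumption: decimal starts with a digit */

    unsigned long long value = 0;
    int digits = 0;
    for (; *p != '\0'; p++) {
        if (*p == '_') continue; /* underscores are allowed anywhere after the prefix */
        int d = digit_value(*p, base);
        if (d < 0) return CONST_INVALID;
        if (value > (ULLONG_MAX - (unsigned)d) / base) return CONST_INVALID; /* would overflow */
        value = value * base + (unsigned)d;
        digits++;
    }
    if (digits == 0) return CONST_INVALID; /* assumption: a bare prefix such as "0x" is not a constant */
    *out = value;
    return kind;
}
```

For example, `0xFF_ff` scans as hexadecimal 65535, while `0O17` is rejected because the spec lists only a lowercase `0o` octal prefix, unlike `0b`/`0B` and `0x`/`0X`.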
diff --git a/tagged-unions.md b/tagged-unions.md
index 1ea1912..1333ed7 100644
--- a/tagged-unions.md
+++ b/tagged-unions.md
@@ -1 +1 @@
-TODO
+TODO
diff --git a/types.md b/types.md
index 1ea1912..1333ed7 100644
--- a/types.md
+++ b/types.md
@@ -1 +1 @@
-TODO
+TODO
diff --git a/vs-c.md b/vs-c.md
index fe4ed3e..e086b4c 100644
--- a/vs-c.md
+++ b/vs-c.md
@@ -1,70 +1,70 @@
-What differentiates Crowbar from C?
-
-# Removals
-
-Some of the footguns and complexity in C come from misfeatures that can simply not be used.
-
-## Footguns
-
-Some constructs in C are almost always the wrong thing.
-
-- `goto`
-- Hexadecimal float literals
-- Wide characters
-- Digraphs
-- Prefix `++` and `--`
-- Chaining mixed left and right shifts (e.g. `x << 3 >> 2`)
-- Chaining relational/equality operators (e.g. `3 < x == 2`)
-- Mixed chains of bitwise or logical operators (e.g. `2 & x && 4 ^ y`)
-- The comma operator `,`
-
-Some constructs in C exhibit implicit behavior that should instead be made explicit.
-
-- `typedef`
-- Octal escape sequences
-- Using an assignment operator (`=`, `+=`, etc) or (postfix) `++` and `--` as components in a larger expression
-- The conditional operator `?:`
-- Preprocessor macros (but constants are fine)
-
-## Needless Complexity
-
-Some type modifiers in C exist solely for the purpose of enabling optimizations which most compilers can do already.
-
-- `inline`
-- `register`
-
-Some type modifiers in C only apply in very specific circumstances and so aren't important.
-
-- `restrict`
-- `volatile`
-- `_Imaginary`
-
-# Adjustments
-
-Some C features are footguns by default, so Crowbar ensures that they are only used correctly.
-
-- Unions are not robust by default.
- Crowbar only supports unions when they are [tagged unions](tagged-unions.md) (or declared and used with the `fragile` keyword).
-
-C's syntax isn't perfect, but it's usually pretty good.
-However, sometimes it just sucks, and in those cases Crowbar makes changes.
-
-- C's variable declaration syntax is far from intuitive in nontrivial cases (function pointers, pointer-to-`const` vs `const`-pointer, etc).
- Crowbar uses [simplified type syntax](types.md) to keep types and variable names distinct.
-- `_Bool` is just `bool`, `_Complex` is just `complex` (why drag the preprocessor into it?)
-- Adding a `_` to numeric literals as a separator
-- All string literals, char literals, etc are UTF-8
-- Octal literals have a `0o` prefix (never `0O` because that looks nasty)
-
-# Additions
-
-## Anti-Footguns
-
-- C is generous with memory in ways that are unreliable by default.
- Crowbar adds [memory safety conventions](safety.md) to make correctness the default behavior.
-- C's conventions for error handling are unreliable by default.
- Crowbar adds [error propagation](errors.md) to make correctness the default behavior.
-
-## Trivial Room For Improvement
-
-- Binary literals, prefixed with `0b`/`0B`
+What differentiates Crowbar from C?
+
+# Removals
+
+Some of the footguns and complexity in C come from misfeatures that can simply be left out.
+
+## Footguns
+
+Some constructs in C are almost always the wrong thing.
+
+- `goto`
+- Hexadecimal float literals
+- Wide characters
+- Digraphs
+- Prefix `++` and `--`
+- Chaining mixed left and right shifts (e.g. `x << 3 >> 2`)
+- Chaining relational/equality operators (e.g. `3 < x == 2`)
+- Mixed chains of bitwise or logical operators (e.g. `2 & x && 4 ^ y`)
+- The comma operator `,`
+
+Some constructs in C exhibit implicit behavior that should instead be made explicit.
+
+- `typedef`
+- Octal escape sequences
+- Using an assignment operator (`=`, `+=`, etc.) or postfix `++` and `--` as part of a larger expression
+- The conditional operator `?:`
+- Preprocessor macros (but constants are fine)
+
+## Needless Complexity
+
+Some keywords in C exist solely to enable optimizations that most compilers already perform on their own.
+
+- `inline`
+- `register`
+
+Some type qualifiers and specifiers in C only apply in very specific circumstances, and so aren't important.
+
+- `restrict`
+- `volatile`
+- `_Imaginary`
+
+# Adjustments
+
+Some C features are footguns by default, so Crowbar ensures that they are only used correctly.
+
+- Unions are not robust by default.
+ Crowbar only supports unions when they are [tagged unions](tagged-unions.md) (or declared and used with the `fragile` keyword).
+
+C's syntax isn't perfect, but it's usually pretty good.
+However, sometimes it just sucks, and in those cases Crowbar makes changes.
+
+- C's variable declaration syntax is far from intuitive in nontrivial cases (function pointers, pointer-to-`const` vs `const`-pointer, etc.).
+ Crowbar uses [simplified type syntax](types.md) to keep types and variable names distinct.
+- `_Bool` is just `bool`, `_Complex` is just `complex` (why drag the preprocessor into it?)
+- `_` can be used as a digit separator in numeric literals
+- All string literals, character literals, etc. are UTF-8
+- Octal literals have a `0o` prefix (never `0O` because that looks nasty)
+
+# Additions
+
+## Anti-Footguns
+
+- C is generous with memory in ways that are unreliable by default.
+ Crowbar adds [memory safety conventions](safety.md) to make correctness the default behavior.
+- C's conventions for error handling are unreliable by default.
+ Crowbar adds [error propagation](errors.md) to make correctness the default behavior.
+
+## Trivial Room For Improvement
+
+- Binary literals, prefixed with `0b`/`0B`