author | Melody Horn <melody@boringcactus.com> | 2020-10-25 11:40:21 -0600 |
---|---|---|
committer | Melody Horn <melody@boringcactus.com> | 2020-10-25 11:40:21 -0600 |
commit | 1f20ab0d5fe29276a6e55e8bd9aa3e1d967aafdf (patch) | |
tree | 54bd767ad428708385e896e44110f2db4036c77e | |
parent | e434036cffd2a5fa1d7ca9195b45cf1c61a71537 (diff) | |
download | spec-1f20ab0d5fe29276a6e55e8bd9aa3e1d967aafdf.tar.gz spec-1f20ab0d5fe29276a6e55e8bd9aa3e1d967aafdf.zip |
fucking windows line endings smdh
-rw-r--r-- | .build.yml | 30 | ||||
-rw-r--r-- | errors.md | 2 | ||||
-rw-r--r-- | index.md | 74 | ||||
-rw-r--r-- | safety.md | 184 | ||||
-rw-r--r-- | syntax.md | 696 | ||||
-rw-r--r-- | tagged-unions.md | 2 | ||||
-rw-r--r-- | types.md | 2 | ||||
-rw-r--r-- | vs-c.md | 140 |
8 files changed, 565 insertions, 565 deletions
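Every file in the diffstat above has matching insertion and deletion counts, which is what a change of line endings alone would produce. The commit does not say how the conversion was done, so the following is only a minimal Python sketch of such a normalization; `normalize_line_endings` is a hypothetical helper name, not something from the repository.

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF (and any stray lone CR) line endings to LF."""
    # Replace CRLF first so that the CR of a CRLF pair is not converted twice.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Running the function over already-normalized content leaves it unchanged, so it would be safe to apply to a whole tree.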
@@ -1,15 +1,15 @@
-image: debian/stable
-packages:
-  - pandoc
-  - wkhtmltopdf
-  - poppler-utils
-sources:
-  - https://git.sr.ht/~boringcactus/crowbar-spec
-tasks:
-  - page-count: |
-      cd crowbar-spec
-      pandoc -s -o ../spec.pdf -t html -M "title=Crowbar Specification" *.md
-      cd ..
-      pdfinfo spec.pdf | grep Pages
-artifacts:
-  - spec.pdf
+image: debian/stable
+packages:
+  - pandoc
+  - wkhtmltopdf
+  - poppler-utils
+sources:
+  - https://git.sr.ht/~boringcactus/crowbar-spec
+tasks:
+  - page-count: |
+      cd crowbar-spec
+      pandoc -s -o ../spec.pdf -t html -M "title=Crowbar Specification" *.md
+      cd ..
+      pdfinfo spec.pdf | grep Pages
+artifacts:
+  - spec.pdf
@@ -1 +1 @@
-TODO
+TODO
@@ -1,37 +1,37 @@
-Crowbar: the good parts of C, with a little bit extra.
-
-**This is entirely a work-in-progress, and should not be relied upon to be stable (or even true) in any way.**
-
-Crowbar is a language that is derived from (and, wherever possible, interoperable with) C, and aims to remove as many [footgun](https://en.wiktionary.org/wiki/footgun)s and as much needless complexity from C as possible while still being familiar to C developers.
-
-Ideally, a typical C codebase should be straightforward to rewrite in Crowbar, and any atypical C constructions not supported by Crowbar can be left as C.
-
-# Context
-
-- [Rust is not a good C replacement](https://drewdevault.com/2019/03/25/Rust-is-not-a-good-C-replacement.html)
-
-# cactus's Blog Posts
-
-- [Crowbar: Defining a good C replacement](https://www.boringcactus.com/2020/09/28/crowbar-1-defining-a-c-replacement.html)
-- [Crowbar: Simplifying C's type names](https://www.boringcactus.com/2020/10/13/crowbar-2-simplifying-c-type-names.html)
-- [Crowbar: Turns out, language development is hard](https://www.boringcactus.com/2020/10/19/crowbar-3-this-is-tough.html)
-
-# Comparison with C
-
-The [comparison with C](vs-c.md) is an informal overview of the places where Crowbar and C diverge.
-
-# Syntax
-
-[Read the Syntax chapter of the spec.](syntax.md)
-
-# Semantics
-
-TODO
-
-# Discuss
-
-- [announcement mailing list](https://lists.sr.ht/~boringcactus/crowbar-lang-announce)
-- [permanent discussion mailing list](https://lists.sr.ht/~boringcactus/crowbar-lang-devel)
-- ephemeral discussions via IRC: #crowbar-lang on freenode ([join via irc](ircs://chat.freenode.net/#crowbar-lang), [join via web](https://webchat.freenode.net/#crowbar-lang))
-
-[![Creative Commons BY-SA License](https://i.creativecommons.org/l/by-sa/4.0/80x15.png)](http://creativecommons.org/licenses/by-sa/4.0/)
+Crowbar: the good parts of C, with a little bit extra.
+
+**This is entirely a work-in-progress, and should not be relied upon to be stable (or even true) in any way.**
+
+Crowbar is a language that is derived from (and, wherever possible, interoperable with) C, and aims to remove as many [footgun](https://en.wiktionary.org/wiki/footgun)s and as much needless complexity from C as possible while still being familiar to C developers.
+
+Ideally, a typical C codebase should be straightforward to rewrite in Crowbar, and any atypical C constructions not supported by Crowbar can be left as C.
+
+# Motivation
+
+- [Rust is not a good C replacement](https://drewdevault.com/2019/03/25/Rust-is-not-a-good-C-replacement.html)
+
+# Journal
+
+- [Crowbar: Defining a good C replacement](https://www.boringcactus.com/2020/09/28/crowbar-1-defining-a-c-replacement.html)
+- [Crowbar: Simplifying C's type names](https://www.boringcactus.com/2020/10/13/crowbar-2-simplifying-c-type-names.html)
+- [Crowbar: Turns out, language development is hard](https://www.boringcactus.com/2020/10/19/crowbar-3-this-is-tough.html)
+
+# Comparison with C
+
+The [comparison with C](vs-c.md) is an informal overview of the places where Crowbar and C diverge.
+
+# Syntax
+
+[Read the Syntax chapter of the spec.](syntax.md)
+
+# Semantics
+
+TODO
+
+# Discuss
+
+- [announcement mailing list](https://lists.sr.ht/~boringcactus/crowbar-lang-announce)
+- [permanent discussion mailing list](https://lists.sr.ht/~boringcactus/crowbar-lang-devel)
+- ephemeral discussions via IRC: #crowbar-lang on freenode ([join via irc](ircs://chat.freenode.net/#crowbar-lang), [join via web](https://webchat.freenode.net/#crowbar-lang))
+
+[![Creative Commons BY-SA License](https://i.creativecommons.org/l/by-sa/4.0/80x15.png)](http://creativecommons.org/licenses/by-sa/4.0/)
@@ -1,92 +1,92 @@
-Each item in Wikipedia's [list of types of memory errors](https://en.wikipedia.org/wiki/Memory_safety#Types_of_memory_errors) and what Crowbar does to prevent them.
-
-In general, Crowbar does its best to ensure that code will not exhibit any of the following memory errors.
-However, sometimes the compiler knows less than the programmer, and so code that looks dangerous is actually fine.
-Crowbar allows programmers to suspend the memory safety checks with the `fragile` keyword.
-
-# Access errors
-
-## Buffer overflow
-
-Crowbar addresses buffer overflow with bounds checking.
-In C, the type `char *` can point to a single character, a null-terminated string of unknown length, a buffer of fixed size, or nothing at all.
-In Crowbar, the type `char *` can only point to either a single character or nothing at all.
-If a buffer is declared as `char[50] name;` then it has type `char[50]`, and can be implicitly converted to `(char[50])*`, a pointer-to-50-chars.
-If memory is dynamically allocated, it works as follows:
-
-```crowbar
-void process(size_t bufferSize, char[bufferSize] buffer) {
- // do some work with buffer, given that we know its size
-}
-
-int main(int argc, (char[1024?])[argc] argv) {
- size_t bufferSize = getBufferSize();
- (char[bufferSize])* buffer = malloc(bufferSize);
- process(bufferSize, buffer);
- free(buffer);
-}
-```
-
-Note that `malloc` as part of the Crowbar standard library has signature `(void[size])* malloc(size_t size);` and so no cast is needed above.
-In C, `buffer` in `main` would have type pointer-to-VLA-of-char, but `buffer` in `process` would have type VLA-of-char, and this conversion would emit a compiler warning.
-However, in Crowbar, a `(T[N])*` is always implicitly convertible to `T[N]`, so no warning exists.
-(This is translated into C by dereferencing `buffer` in `main`.)
-
-Note as well that the type of `argv` is complicated.
-This is because the elements of `argv` have unconstrained size.
-TODO figure out if that's the right way to handle that
-
-## Buffer over-read
-
-bounds checking again
-
-## Race condition
-
-uhhhhh 🤷‍♀️
-
-## Page fault
-
-bounds checking, dubious-pointer checking
-
-## Use after free
-
-`free(x);` not followed by `x = NULL;` is a compiler error.
-`owned` and `borrowed` keywords
-
-# Uninitialized variables
-
-forbid them in syntax
-
-## Null pointer dereference
-
-dubious-pointer checking
-
-## Wild pointers
-
-dubious-pointer checking
-
-# Memory leak
-
-## Stack exhaustion
-
-uhhhhhh 🤷‍♀️
-
-## Heap exhaustion
-
-that counts as error handling, just the `malloc`-shaped kind
-
-## Double free
-
-this is just use-after-free but the use is calling free on it
-
-## Invalid free
-
-don't do that
-
-## Mismatched free
-
-how does that even happen
-
-## Unwanted aliasing
-
-uhhh don't do that?
+Each item in Wikipedia's [list of types of memory errors](https://en.wikipedia.org/wiki/Memory_safety#Types_of_memory_errors) and what Crowbar does to prevent them. + +In general, Crowbar does its best to ensure that code will not exhibit any of the following memory errors. +However, sometimes the compiler knows less than the programmer, and so code that looks dangerous is actually fine. +Crowbar allows programmers to suspend the memory safety checks with the `fragile` keyword. + +# Access errors + +## Buffer overflow + +Crowbar addresses buffer overflow with bounds checking. +In C, the type `char *` can point to a single character, a null-terminated string of unknown length, a buffer of fixed size, or nothing at all. +In Crowbar, the type `char *` can only point to either a single character or nothing at all. +If a buffer is declared as `char[50] name;` then it has type `char[50]`, and can be implicitly converted to `(char[50])*`, a pointer-to-50-chars. +If memory is dynamically allocated, it works as follows: + +```crowbar +void process(size_t bufferSize, char[bufferSize] buffer) { + // do some work with buffer, given that we know its size +} + +int main(int argc, (char[1024?])[argc] argv) { + size_t bufferSize = getBufferSize(); + (char[bufferSize])* buffer = malloc(bufferSize); + process(bufferSize, buffer); + free(buffer); +} +``` + +Note that `malloc` as part of the Crowbar standard library has signature `(void[size])* malloc(size_t size);` and so no cast is needed above. +In C, `buffer` in `main` would have type pointer-to-VLA-of-char, but `buffer` in `process` would have type VLA-of-char, and this conversion would emit a compiler warning. +However, in Crowbar, a `(T[N])*` is always implicitly convertible to `T[N]`, so no warning exists. +(This is translated into C by dereferencing `buffer` in `main`.) + +Note as well that the type of `argv` is complicated. +This is because the elements of `argv` have unconstrained size. 
+TODO figure out if that's the right way to handle that + +## Buffer over-read + +bounds checking again + +## Race condition + +uhhhhh π€·ββοΈ + +## Page fault + +bounds checking, dubious-pointer checking + +## Use after free + +`free(x);` not followed by `x = NULL;` is a compiler error. +`owned` and `borrowed` keywords + +# Uninitialized variables + +forbid them in syntax + +## Null pointer dereference + +dubious-pointer checking + +## Wild pointers + +dubious-pointer checking + +# Memory leak + +## Stack exhaustion + +uhhhhhh π€·ββοΈ + +## Heap exhaustion + +that counts as error handling, just the `malloc`-shaped kind + +## Double free + +this is just use-after-free but the use is calling free on it + +## Invalid free + +don't do that + +## Mismatched free + +how does that even happen + +## Unwanted aliasing + +uhhh don't do that? @@ -1,348 +1,348 @@ -The syntax of Crowbar mostly matches the syntax of C, with fewer obscure/advanced/edge case features.
-
-# Source Files
-
-A Crowbar source file is UTF-8.
-Crowbar source files can come in two varieties, an *implementation file* and a *header file*.
-An implementation file conventionally has a `.cro` extension, and a header file conventionally has a `.hro` extension.
-
-A Crowbar source file is read into memory in two phases: *scanning* (which converts text into an unstructured sequence of tokens) and *parsing* (which converts an unstructured sequence of tokens into a parse tree).
-
-# Scanning
-
-A *token* is one of the following kinds of token:
-- a *keyword*,
-- an *identifier*,
-- a *constant*,
-- a *string literal*,
-- or a *punctuator*.
-
-Tokens are separated by either *whitespace* or a *comment*.
-
-## Keywords
-
-A *keyword* is one of the following literal words:
-- `bool`
-- `break`
-- `case`
-- `char`
-- `const`
-- `continue`
-- `default`
-- `do`
-- `double`
-- `else`
-- `enum`
-- `extern`
-- `float`
-- `for`
-- `fragile`
-- `function`
-- `if`
-- `include`
-- `int`
-- `long`
-- `return`
-- `short`
-- `signed`
-- `sizeof`
-- `struct`
-- `switch`
-- `typedef`
-- `unsigned`
-- `void`
-- `while`
-
-## Identifiers
-
-An *identifier* is a sequence of one or more characters having Unicode categories within a legal set.
-
-The first character in an identifier must have one of the following Unicode categories:
-- `Pc` Connector Punctuation (e.g. `_`)
-- `Ll` Lowercase Letter (e.g. `h`)
-- `Lm` Modifier Letter (e.g. `ʹ`, U+02B9 Modifier Letter Prime)
-- `Lo` Other Letter (e.g. `א`, U+05D0 Hebrew Letter Alef)
-- `Lt` Titlecase Letter (e.g. `ǅ`, U+01C5 Latin Capital Letter D With Small Letter Z With Caron)
-- `Lu` Uppercase Letter (e.g. `B`)
-- `Mn` Nonspacing Mark (e.g. ` ̂`, U+0302 Combining Circumflex Accent)
-- `Sk` Modifier Symbol (e.g. `^`, U+005E Circumflex Accent)
-
-Subsequent characters may have any of the above-listed Unicode categories, or one of the following:
-- `Nd` Decimal Digit Number (e.g. `0`)
-- `Nl` Letter Number (e.g. `Ⅳ`, U+2163 Roman Numeral Four)
-- `No` Other Number (e.g. `¼`, U+00BC Vulgar Fraction One Quarter)
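
The identifier rule above can be checked mechanically with a Unicode category lookup. The following is a minimal Python sketch, not part of the spec: `is_identifier` and the two set names are hypothetical, and the sketch assumes that keywords are rejected elsewhere in the scanner.

```python
import unicodedata

# Categories allowed for the first character of an identifier.
START_CATEGORIES = {"Pc", "Ll", "Lm", "Lo", "Lt", "Lu", "Mn", "Sk"}
# Subsequent characters may additionally be any of the number categories.
CONTINUE_CATEGORIES = START_CATEGORIES | {"Nd", "Nl", "No"}


def is_identifier(text: str) -> bool:
    """Return True if text satisfies the identifier rule as written above."""
    if not text:
        return False  # an identifier needs at least one character
    if unicodedata.category(text[0]) not in START_CATEGORIES:
        return False
    return all(unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in text[1:])
```

Because `Sk` is in the starting set, a string such as `^x` passes this check even though `^` is also a punctuator; the sketch only mirrors the category lists and does not try to resolve that overlap.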
-
-## Constants
-
-A *constant* can have one of six types:
-- a *decimal constant*, a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `_`};
-- a *binary constant*, a prefix (either `0b` or `0B`) followed by a sequence of characters drawn from the set {`0`, `1`, `_`};
-- an *octal constant*, the prefix `0o` followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `_`};
-- a *hexadecimal constant*, a prefix (either `0x` or `0X`) followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`, `_`};
-- a *floating-point constant*, a decimal constant followed by one of
- - `.` followed by a decimal constant,
- - either `e` or `E` followed by a decimal constant,
- - or a `.` followed by a decimal constant followed by either an `e` or `E` followed by a decimal constant;
-- or a *character constant*, a `'` followed by either a single character or an *escape sequence* followed by another `'`.
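
The numeric forms above translate fairly directly into regular expressions. The following is a rough Python sketch under two assumptions that the text does not state: every "sequence of characters" is nonempty, and a constant must match in full. `classify_numeric_constant` is a hypothetical name.

```python
import re

DEC = r"[0-9_]+"  # a decimal constant: digits and underscores

# Checked in order; each pattern must match the whole text.
PATTERNS = [
    ("floating-point", re.compile(rf"{DEC}(?:\.{DEC}(?:[eE]{DEC})?|[eE]{DEC})")),
    ("binary", re.compile(r"0[bB][01_]+")),
    ("octal", re.compile(r"0o[0-7_]+")),  # only the lowercase prefix is listed
    ("hexadecimal", re.compile(r"0[xX][0-9A-Fa-f_]+")),
    ("decimal", re.compile(DEC)),
]


def classify_numeric_constant(text: str):
    """Return the kind of numeric constant, or None if text is not one."""
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return kind
    return None
```

Two consequences of reading the rules literally show up here: `017` is a decimal constant rather than a C-style octal one, and a lone `_` matches the decimal pattern, which overlaps with the identifier rule.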
-
-### Escape Sequences
-
-The following sequences of characters are *escape sequences*:
-- `\'`
-- `\"`
-- `\\`
-- `\r`
-- `\n`
-- `\t`
-- `\0`
-- `\x` followed by two characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
-- `\u` followed by four characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
-- `\U` followed by eight characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
-
-## String Literals
-
-A *string literal* begins with a `"`.
-It then contains a sequence where each element is either an escape sequence or a character that is neither `"` nor `\`.
-It then ends with a `"`.
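
As an illustration, the escape sequence list and the string literal rule can be combined into one pattern. This is a Python sketch rather than the reference scanner; the names `ESCAPE`, `STRING_LITERAL` and `CHAR_CONSTANT` are made up here, and the character constant pattern assumes that the "single character" may be neither `'` nor `\`.

```python
import re

HEX = "[0-9A-Fa-f]"
# One backslash followed by a simple escape, or by x/u/U with 2/4/8 hex digits.
ESCAPE = rf"""\\(?:['"\\rnt0]|x{HEX}{{2}}|u{HEX}{{4}}|U{HEX}{{8}})"""

# A quote, then escapes or any character other than a quote or backslash, then a quote.
STRING_LITERAL = re.compile(rf'"(?:{ESCAPE}|[^"\\])*"')
# Assumption: exactly one escape or one ordinary character between the quotes.
CHAR_CONSTANT = re.compile(rf"'(?:{ESCAPE}|[^'\\])'")
```

Any backslash that does not start one of the listed escapes, such as `\q` or a C-style octal escape like `\101`, makes the whole literal fail to match.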
-
-## Punctuators
-
-The following sequences of characters form *punctuators*:
-- `[`
-- `]`
-- `(`
-- `)`
-- `{`
-- `}`
-- `.`
-- `,`
-- `+`
-- `-`
-- `*`
-- `/`
-- `%`
-- `;`
-- `!`
-- `&`
-- `|`
-- `^`
-- the tilde, `~` (given special treatment on this line due to [a bug in the Markdown renderer that sr.ht uses](https://github.com/miyuchina/mistletoe/issues/91))
-- `>`
-- `<`
-- `=`
-- `->`
-- `++`
-- `--`
-- `>>`
-- `<<`
-- `<=`
-- `>=`
-- `==`
-- `!=`
-- `&&`
-- `||`
-- `+=`
-- `-=`
-- `*=`
-- `/=`
-- `%=`
-- `&=`
-- `|=`
-- `^=`
-
-## Whitespace
-
-A nonempty sequence of characters is considered to be *whitespace* if each character in it has a Unicode class of either Space Separator or Control Other.
-
-## Comments
-
-A *comment* can be either a *line comment* or a *block comment*.
-
-A *line comment* begins with the characters `//` if they occur outside of a string literal or comment, and ends with a newline character U+000A.
-
-A *block comment* begins with the characters `/*` if they occur outside of a string literal or comment, and ends with the characters `*/`.
-
-# Parsing
-
-The syntax of Crowbar is given as a [parsing expression grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar):
-
-## Entry points
-
-```
-HeaderFile ← HeaderFileElement+
-HeaderFileElement ← IncludeStatement /
- TypeDeclaration /
- FunctionDeclaration
-
-ImplementationFile ← ImplementationFileElement+
-ImplementationFileElement ← HeaderFileElement /
- FunctionDefinition
-```
-
-## Top-level elements
-
-```
-IncludeStatement ← 'include' string-literal ';'
-
-TypeDeclaration ← StructDeclaration /
- EnumDeclaration /
- TypedefDeclaration
-StructDeclaration ← 'struct' identifier '{' VariableDeclaration+ '}' ';'
-EnumDeclaration ← 'enum' identifier '{' EnumBody '}' ';'
-EnumBody ← identifier ('=' Expression)? ',' EnumBody /
- identifier ('=' Expression)? ','?
-TypedefDeclaration ← 'typedef' identifier '=' Type ';'
-
-FunctionDeclaration ← FunctionSignature ';'
-FunctionDefinition ← FunctionSignature Block
-FunctionSignature ← Type identifier '(' SignatureArguments? ')'
-SignatureArguments ← Type identifier ',' SignatureArguments /
- Type identifier ','?
-```
-
-## Statements
-
-```
-Block ← '{' Statement* '}'
-
-Statement ← VariableDefinition /
- VariableDeclaration /
- IfStatement /
- SwitchStatement /
- WhileStatement /
- DoWhileStatement /
- ForStatement /
- FlowControlStatement /
- AssignmentStatement /
- ExpressionStatement
-
-VariableDefinition ← Type identifier '=' Expression ';'
-VariableDeclaration ← Type identifier ';'
-
-IfStatement ← 'if' Expression Block 'else' Block /
- 'if' Expression Block
-
-SwitchStatement ← 'switch' Expression '{' SwitchCase+ '}'
-SwitchCase ← CaseSpecifier Block /
- 'default' Block
-CaseSpecifier ← 'case' Expression ',' CaseSpecifier /
- 'case' Expression ','?
-
-WhileStatement ← 'while' Expression Block
-DoWhileStatement ← 'do' Block 'while' Expression ';'
-ForStatement ← 'for' VariableDefinition? ';' Expression ';' AssignmentStatementBody? Block
-
-FlowControlStatement ← 'continue' ';' /
- 'break' ';' /
- 'return' Expression? ';'
-
-AssignmentStatement ← AssignmentStatementBody ';'
-AssignmentStatementBody ← AssignmentTargetExpression '=' Expression /
- AssignmentTargetExpression '+=' Expression /
- AssignmentTargetExpression '-=' Expression /
- AssignmentTargetExpression '*=' Expression /
- AssignmentTargetExpression '/=' Expression /
- AssignmentTargetExpression '%=' Expression /
- AssignmentTargetExpression '&=' Expression /
- AssignmentTargetExpression '^=' Expression /
- AssignmentTargetExpression '|=' Expression /
- AssignmentTargetExpression '++' /
- AssignmentTargetExpression '--'
-
-ExpressionStatement ← Expression ';'
-```
-
-## Types
-
-```
-Type ← 'const' BasicType /
- BasicType '*' /
- BasicType '[' Expression ']' /
- BasicType 'function' '(' (BasicType ',')* ')' /
- BasicType
-BasicType ← 'void' /
- IntegerType /
- 'signed' IntegerType /
- 'unsigned' IntegerType /
- 'float' /
- 'double' /
- 'bool' /
- 'struct' identifier /
- 'enum' identifier /
- 'typedef' identifier /
- '(' Type ')'
-IntegerType ← 'char' /
- 'short' /
- 'int' /
- 'long'
-```
-
-## Expressions
-
-```
-AssignmentTargetExpression ← identifier ATEElementSuffix*
-ATEElementSuffix ← '[' Expression ']' /
- '.' identifier /
- '->' identifier
-
-AtomicExpression ← identifier /
- constant /
- string-literal /
- '(' Expression ')'
-
-ObjectExpression ← AtomicExpression ObjectSuffix* /
- ArrayLiteralExpression /
- StructLiteralExpression
-ObjectSuffix ← '[' Expression ']' /
- '(' CommasExpressionList? ')' /
- '.' identifier /
- '->' identifier
-CommasExpressionList ← Expression ',' CommasExpressionList? /
- Expression ','?
-ArrayLiteralExpression ← '{' CommasExpressionList '}'
-StructLiteralExpression ← '{' StructLiteralBody '}'
-StructLiteralBody ← StructLiteralElement ',' StructLiteralBody? /
- StructLiteralElement ','?
-StructLiteralElement ← '.' identifier '=' Expression
-
-FactorExpression ← '(' Type ')' FactorExpression /
- '&' FactorExpression /
- '*' FactorExpression /
- '+' FactorExpression /
- '-' FactorExpression /
- '~' FactorExpression /
- '!' FactorExpression /
- 'sizeof' FactorExpression /
- 'sizeof' Type /
- ObjectExpression
-
-TermExpression ← FactorExpression TermSuffix*
-TermSuffix ← '*' FactorExpression /
- '/' FactorExpression /
- '%' FactorExpression
-
-ArithmeticExpression ← TermExpression ArithmeticSuffix*
-ArithmeticSuffix ← '+' TermExpression /
- '-' TermExpression
-
-BitwiseOpExpression ← ArithmeticExpression '<<' ArithmeticExpression /
- ArithmeticExpression '>>' ArithmeticExpression /
- ArithmeticExpression '^' ArithmeticExpression /
- ArithmeticExpression ('&' ArithmeticExpression)+ /
- ArithmeticExpression ('|' ArithmeticExpression)+ /
- ArithmeticExpression
-
-ComparisonExpression ← BitwiseOpExpression '==' BitwiseOpExpression /
- BitwiseOpExpression '!=' BitwiseOpExpression /
- BitwiseOpExpression '<=' BitwiseOpExpression /
- BitwiseOpExpression '>=' BitwiseOpExpression /
- BitwiseOpExpression '<' BitwiseOpExpression /
- BitwiseOpExpression '>' BitwiseOpExpression /
- BitwiseOpExpression
-
-Expression ← ComparisonExpression ('&&' ComparisonExpression)+ /
- ComparisonExpression ('||' ComparisonExpression)+ /
- ComparisonExpression
-```
-
-[![Creative Commons BY-SA License](https://i.creativecommons.org/l/by-sa/4.0/80x15.png)](http://creativecommons.org/licenses/by-sa/4.0/)
+The syntax of Crowbar mostly matches the syntax of C, with fewer obscure/advanced/edge case features. + +# Source Files + +A Crowbar source file is UTF-8. +Crowbar source files can come in two varieties, an *implementation file* and a *header file*. +An implementation file conventionally has a `.cro` extension, and a header file conventionally has a `.hro` extension. + +A Crowbar source file is read into memory in two phases: *scanning* (which converts text into an unstructured sequence of tokens) and *parsing* (which converts an unstructured sequence of tokens into a parse tree). + +# Scanning + +A *token* is one of the following kinds of token: +- a *keyword*, +- an *identifier*, +- a *constant*, +- a *string literal*, +- or a *punctuator*. + +Tokens are separated by either *whitespace* or a *comment*. + +## Keywords + +A *keyword* is one of the following literal words: +- `bool` +- `break` +- `case` +- `char` +- `const` +- `continue` +- `default` +- `do` +- `double` +- `else` +- `enum` +- `extern` +- `float` +- `for` +- `fragile` +- `function` +- `if` +- `include` +- `int` +- `long` +- `return` +- `short` +- `signed` +- `sizeof` +- `struct` +- `switch` +- `typedef` +- `unsigned` +- `void` +- `while` + +## Identifiers + +An *identifier* is a sequence of one or more characters having Unicode categories within a legal set. + +The first character in an identifier must have one of the following Unicode categories: +- `Pc` Connector Punctuation (e.g. `_`) +- `Ll` Lowercase Letter (e.g. `h`) +- `Lm` Modifier Letter (e.g. `ΚΉ`, U+02B9 Modifier Letter Prime) +- `Lo` Other Letter (e.g. `Χ`, U+05D0 Hebrew Letter Alef) +- `Lt` Titlecase Letter (e.g. `Η
`, U+01C5 Latin Capital Letter D With Small Letter Z With Caron) +- `Lu` Uppercase Letter (e.g. `B`) +- `Mn` Nonspacing Mark (e.g. ` Μ`, U+0302 Combining Circumflex Accent) +- `Sk` Modifier Symbol (e.g. `^`, U+005E Circumflex Accent) + +Subsequent characters may have any of the above-listed Unicode categories, or one of the following: +- `Nd` Decimal Digit Number (e.g. `0`) +- `Nl` Letter Number (e.g. `β
£`, U+2163 Roman Numeral Four) +- `No` Other Number (e.g. `ΒΌ`, U+00BC Vulgar Fraction One Quarter) + +## Constants + +A *constant* can have one of six types: +- a *decimal constant*, a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `_`}; +- a *binary constant*, a prefix (either `0b` or `0B`) followed by a sequence of characters drawn from the set {`0`, `1`, `_`}; +- an *octal constant*, the prefix `0o` followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `_`}; +- a *hexadecimal constant*, a prefix (either `0x` or `0X`) followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`, `_`}; +- a *floating-point constant*, a decimal constant followed by one of + - `.` followed by a decimal constant, + - either `e` or `E` followed by a decimal constant, + - or a `.` followed by a decimal constant followed by either an `e` or `E` followed by a decimal constant; +- or a *character constant*, a `'` followed by either a single character or an *escape sequence* followed by another `'`. + +### Escape Sequences + +The following sequences of characters are *escape sequences*: +- `\'` +- `\"` +- `\\` +- `\r` +- `\n` +- `\t` +- `\0` +- `\x` followed by two characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`} +- `\u` followed by four characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`} +- `\U` followed by eight characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`} + +## String Literals + +A *string literal* begins with a `"`. 
+It then contains a sequence where each element is either an escape sequence or a character that is neither `"` nor `\`. +It then ends with a `"`. + +## Punctuators + +The following sequences of characters form *punctuators*: +- `[` +- `]` +- `(` +- `)` +- `{` +- `}` +- `.` +- `,` +- `+` +- `-` +- `*` +- `/` +- `%` +- `;` +- `!` +- `&` +- `|` +- `^` +- the tilde, `~` (given special treatment on this line due to [a bug in the Markdown renderer that sr.ht uses](https://github.com/miyuchina/mistletoe/issues/91)) +- `>` +- `<` +- `=` +- `->` +- `++` +- `--` +- `>>` +- `<<` +- `<=` +- `>=` +- `==` +- `!=` +- `&&` +- `||` +- `+=` +- `-=` +- `*=` +- `/=` +- `%=` +- `&=` +- `|=` +- `^=` + +## Whitespace + +A nonempty sequence of characters is considered to be *whitespace* if each character in it has a Unicode class of either Space Separator or Control Other. + +## Comments + +A *comment* can be either a *line comment* or a *block comment*. + +A *line comment* begins with the characters `//` if they occur outside of a string literal or comment, and ends with a newline character U+000A. + +A *block comment* begins with the characters `/*` if they occur outside of a string literal or comment, and ends with the characters `*/`. + +# Parsing + +The syntax of Crowbar is given as a [parsing expression grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar): + +## Entry points + +``` +HeaderFile β HeaderFileElement+ +HeaderFileElement β IncludeStatement / + TypeDeclaration / + FunctionDeclaration + +ImplementationFile β ImplementationFileElement+ +ImplementationFileElement β HeaderFileElement / + FunctionDefinition +``` + +## Top-level elements + +``` +IncludeStatement β 'include' string-literal ';' + +TypeDeclaration β StructDeclaration / + EnumDeclaration / + TypedefDeclaration +StructDeclaration β 'struct' identifier '{' VariableDeclaration+ '}' ';' +EnumDeclaration β 'enum' identifier '{' EnumBody '}' ';' +EnumBody β identifier ('=' Expression)? 
',' EnumBody / + identifier ('=' Expression)? ','? +TypedefDeclaration β 'typedef' identifier '=' Type ';' + +FunctionDeclaration β FunctionSignature ';' +FunctionDefinition β FunctionSignature Block +FunctionSignature β Type identifier '(' SignatureArguments? ')' +SignatureArguments β Type identifier ',' SignatureArguments / + Type identifier ','? +``` + +## Statements + +``` +Block β '{' Statement* '}' + +Statement β VariableDefinition / + VariableDeclaration / + IfStatement / + SwitchStatement / + WhileStatement / + DoWhileStatement / + ForStatement / + FlowControlStatement / + AssignmentStatement / + ExpressionStatement + +VariableDefinition β Type identifier '=' Expression ';' +VariableDeclaration β Type identifier ';' + +IfStatement β 'if' Expression Block 'else' Block / + 'if' Expression Block + +SwitchStatement β 'switch' Expression '{' SwitchCase+ '}' +SwitchCase β CaseSpecifier Block / + 'default' Block +CaseSpecifier β 'case' Expression ',' CaseSpecifier / + 'case' Expression ','? + +WhileStatement β 'while' Expression Block +DoWhileStatement β 'do' Block 'while' Expression ';' +ForStatement β 'for' VariableDefinition? ';' Expression ';' AssignmentStatementBody? Block + +FlowControlStatement β 'continue' ';' / + 'break' ';' / + 'return' Expression? 
';' + +AssignmentStatement β AssignmentStatementBody ';' +AssignmentStatementBody β AssignmentTargetExpression '=' Expression / + AssignmentTargetExpression '+=' Expression / + AssignmentTargetExpression '-=' Expression / + AssignmentTargetExpression '*=' Expression / + AssignmentTargetExpression '/=' Expression / + AssignmentTargetExpression '%=' Expression / + AssignmentTargetExpression '&=' Expression / + AssignmentTargetExpression '^=' Expression / + AssignmentTargetExpression '|=' Expression / + AssignmentTargetExpression '++' / + AssignmentTargetExpression '--' + +ExpressionStatement β Expression ';' +``` + +## Types + +``` +Type β 'const' BasicType / + BasicType '*' / + BasicType '[' Expression ']' / + BasicType 'function' '(' (BasicType ',')* ')' / + BasicType +BasicType β 'void' / + IntegerType / + 'signed' IntegerType / + 'unsigned' IntegerType / + 'float' / + 'double' / + 'bool' / + 'struct' identifier / + 'enum' identifier / + 'typedef' identifier / + '(' Type ')' +IntegerType β 'char' / + 'short' / + 'int' / + 'long' +``` + +## Expressions + +``` +AssignmentTargetExpression β identifier ATEElementSuffix* +ATEElementSuffix β '[' Expression ']' / + '.' identifier / + '->' identifier + +AtomicExpression β identifier / + constant / + string-literal / + '(' Expression ')' + +ObjectExpression β AtomicExpression ObjectSuffix* / + ArrayLiteralExpression / + StructLiteralExpression +ObjectSuffix β '[' Expression ']' / + '(' CommasExpressionList? ')' / + '.' identifier / + '->' identifier +CommasExpressionList β Expression ',' CommasExpressionList? / + Expression ','? +ArrayLiteralExpression β '{' CommasExpressionList '}' +StructLiteralExpression β '{' StructLiteralBody '}' +StructLiteralBody β StructLiteralElement ',' StructLiteralBody? / + StructLiteralElement ','? +StructLiteralElement β '.' 
identifier '=' Expression + +FactorExpression β '(' Type ')' FactorExpression / + '&' FactorExpression / + '*' FactorExpression / + '+' FactorExpression / + '-' FactorExpression / + '~' FactorExpression / + '!' FactorExpression / + 'sizeof' FactorExpression / + 'sizeof' Type / + ObjectExpression + +TermExpression β FactorExpression TermSuffix* +TermSuffix β '*' FactorExpression / + '/' FactorExpression / + '%' FactorExpression + +ArithmeticExpression β TermExpression ArithmeticSuffix* +ArithmeticSuffix β '+' TermExpression / + '-' TermExpression + +BitwiseOpExpression β ArithmeticExpression '<<' ArithmeticExpression / + ArithmeticExpression '>>' ArithmeticExpression / + ArithmeticExpression '^' ArithmeticExpression / + ArithmeticExpression ('&' ArithmeticExpression)+ / + ArithmeticExpression ('|' ArithmeticExpression)+ / + ArithmeticExpression + +ComparisonExpression β BitwiseOpExpression '==' BitwiseOpExpression / + BitwiseOpExpression '!=' BitwiseOpExpression / + BitwiseOpExpression '<=' BitwiseOpExpression / + BitwiseOpExpression '>=' BitwiseOpExpression / + BitwiseOpExpression '<' BitwiseOpExpression / + BitwiseOpExpression '>' BitwiseOpExpression / + BitwiseOpExpression + +Expression β ComparisonExpression ('&&' ComparisonExpression)+ / + ComparisonExpression ('||' ComparisonExpression)+ / + ComparisonExpression +``` + +[![Creative Commons BY-SA License](https://i.creativecommons.org/l/by-sa/4.0/80x15.png)](http://creativecommons.org/licenses/by-sa/4.0/) diff --git a/tagged-unions.md b/tagged-unions.md index 1ea1912..1333ed7 100644 --- a/tagged-unions.md +++ b/tagged-unions.md @@ -1 +1 @@ -TODO
+TODO
@@ -1 +1 @@
-TODO
+TODO
@@ -1,70 +1,70 @@
-What differentiates Crowbar from C?
-
-# Removals
-
-Some of the footguns and complexity in C come from misfeatures that can simply not be used.
-
-## Footguns
-
-Some constructs in C are almost always the wrong thing.
-
-- `goto`
-- Hexadecimal float literals
-- Wide characters
-- Digraphs
-- Prefix `++` and `--`
-- Chaining mixed left and right shifts (e.g. `x << 3 >> 2`)
-- Chaining relational/equality operators (e.g. `3 < x == 2`)
-- Mixed chains of bitwise or logical operators (e.g. `2 & x && 4 ^ y`)
-- The comma operator `,`
-
-Some constructs in C exhibit implicit behavior that should instead be made explicit.
-
-- `typedef`
-- Octal escape sequences
-- Using an assignment operator (`=`, `+=`, etc) or (postfix) `++` and `--` as components in a larger expression
-- The conditional operator `?:`
-- Preprocessor macros (but constants are fine)
-
-## Needless Complexity
-
-Some type modifiers in C exist solely for the purpose of enabling optimizations which most compilers can do already.
-
-- `inline`
-- `register`
-
-Some type modifiers in C only apply in very specific circumstances and so aren't important.
-
-- `restrict`
-- `volatile`
-- `_Imaginary`
-
-# Adjustments
-
-Some C features are footguns by default, so Crowbar ensures that they are only used correctly.
-
-- Unions are not robust by default.
- Crowbar only supports unions when they are [tagged unions](tagged-unions.md) (or declared and used with the `fragile` keyword).
-
-C's syntax isn't perfect, but it's usually pretty good.
-However, sometimes it just sucks, and in those cases Crowbar makes changes.
-
-- C's variable declaration syntax is far from intuitive in nontrivial cases (function pointers, pointer-to-`const` vs `const`-pointer, etc).
- Crowbar uses [simplified type syntax](types.md) to keep types and variable names distinct.
-- `_Bool` is just `bool`, `_Complex` is just `complex` (why drag the preprocessor into it?)
-- Adding a `_` to numeric literals as a separator
-- All string literals, char literals, etc are UTF-8
-- Octal literals have a `0o` prefix (never `0O` because that looks nasty)
-
-# Additions
-
-## Anti-Footguns
-
-- C is generous with memory in ways that are unreliable by default.
- Crowbar adds [memory safety conventions](safety.md) to make correctness the default behavior.
-- C's conventions for error handling are unreliable by default.
- Crowbar adds [error propagation](errors.md) to make correctness the default behavior.
-
-## Trivial Room For Improvement
-
-- Binary literals, prefixed with `0b`/`0B`
+What differentiates Crowbar from C?
+
+# Removals
+
+Some of the footguns and complexity in C come from misfeatures that can simply not be used.
+
+## Footguns
+
+Some constructs in C are almost always the wrong thing.
+
+- `goto`
+- Hexadecimal float literals
+- Wide characters
+- Digraphs
+- Prefix `++` and `--`
+- Chaining mixed left and right shifts (e.g. `x << 3 >> 2`)
+- Chaining relational/equality operators (e.g. `3 < x == 2`)
+- Mixed chains of bitwise or logical operators (e.g. `2 & x && 4 ^ y`)
+- The comma operator `,`
+
+Some constructs in C exhibit implicit behavior that should instead be made explicit.
+
+- `typedef`
+- Octal escape sequences
+- Using an assignment operator (`=`, `+=`, etc) or (postfix) `++` and `--` as components in a larger expression
+- The conditional operator `?:`
+- Preprocessor macros (but constants are fine)
+
+## Needless Complexity
+
+Some type modifiers in C exist solely for the purpose of enabling optimizations which most compilers can do already.
+
+- `inline`
+- `register`
+
+Some type modifiers in C only apply in very specific circumstances and so aren't important.
+
+- `restrict`
+- `volatile`
+- `_Imaginary`
+
+# Adjustments
+
+Some C features are footguns by default, so Crowbar ensures that they are only used correctly.
+
+- Unions are not robust by default.
+ Crowbar only supports unions when they are [tagged unions](tagged-unions.md) (or declared and used with the `fragile` keyword).
+
+C's syntax isn't perfect, but it's usually pretty good.
+However, sometimes it just sucks, and in those cases Crowbar makes changes.
+
+- C's variable declaration syntax is far from intuitive in nontrivial cases (function pointers, pointer-to-`const` vs `const`-pointer, etc).
+ Crowbar uses [simplified type syntax](types.md) to keep types and variable names distinct.
+- `_Bool` is just `bool`, `_Complex` is just `complex` (why drag the preprocessor into it?)
+- Adding a `_` to numeric literals as a separator
+- All string literals, char literals, etc are UTF-8
+- Octal literals have a `0o` prefix (never `0O` because that looks nasty)
+
+# Additions
+
+## Anti-Footguns
+
+- C is generous with memory in ways that are unreliable by default.
+ Crowbar adds [memory safety conventions](safety.md) to make correctness the default behavior.
+- C's conventions for error handling are unreliable by default.
+ Crowbar adds [error propagation](errors.md) to make correctness the default behavior.
+
+## Trivial Room For Improvement
+
+- Binary literals, prefixed with `0b`/`0B`
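
For readers skimming the `vs-c.md` hunk above, here is a minimal sketch in plain C (not Crowbar, and not part of this commit) of how a C compiler groups three of the expressions the footgun list quotes. Only the quoted expressions come from the document. The function names and the test values are hypothetical, chosen for illustration, and each grouping is spelled out with a temporary variable so the intent stays visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper names; only the quoted expressions come from vs-c.md. */

/* `3 < x == 2` groups as `(3 < x) == 2`. A comparison yields 0 or 1,
 * so the result is false for every x. */
bool chained_comparison(int x) {
    int less_than = 3 < x;  /* always 0 or 1 */
    return less_than == 2;  /* therefore never true */
}

/* `x << 3 >> 2` groups as `(x << 3) >> 2`. On a 32-bit unsigned value the
 * left shift discards the top three bits first, so this is not `x << 1`. */
uint32_t mixed_shifts(uint32_t x) {
    uint32_t shifted = x << 3;
    return shifted >> 2;
}

/* What a reader might wrongly assume `x << 3 >> 2` simplifies to. */
uint32_t single_shift(uint32_t x) {
    return x << 1;
}

/* `2 & x && 4 ^ y` groups as `(2 & x) && (4 ^ y)`: both bitwise operators
 * bind tighter than `&&`. */
bool mixed_bitwise_logical(int x, int y) {
    int left = 2 & x;
    int right = 4 ^ y;
    return left && right;
}

/* Usage example: each assertion documents one of the surprises above. */
void check_examples(void) {
    assert(!chained_comparison(5));   /* (3 < 5) is 1, and 1 != 2 */
    assert(!chained_comparison(2));   /* (3 < 2) is 0, and 0 != 2 */
    assert(mixed_shifts(0xF0000000u) == 0x20000000u);
    assert(single_shift(0xF0000000u) == 0xE0000000u);
    assert(!mixed_bitwise_logical(2, 4)); /* 4 ^ 4 is 0 */
    assert(mixed_bitwise_logical(2, 0));  /* 2 && 4 is true */
}
```

Calling `check_examples()` from a `main` function exercises all three cases. The grammar in `syntax.md` appears to rule these chains out structurally, since each comparison or shift alternative takes exactly two operands, but that reading is an inference from the grammar rather than something this commit states.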