From 1f20ab0d5fe29276a6e55e8bd9aa3e1d967aafdf Mon Sep 17 00:00:00 2001 From: Melody Horn Date: Sun, 25 Oct 2020 11:40:21 -0600 Subject: fucking windows line endings smdh --- syntax.md | 696 +++++++++++++++++++++++++++++++------------------------------- 1 file changed, 348 insertions(+), 348 deletions(-) (limited to 'syntax.md') diff --git a/syntax.md b/syntax.md index c195811..e0ecea5 100644 --- a/syntax.md +++ b/syntax.md @@ -1,348 +1,348 @@ -The syntax of Crowbar mostly matches the syntax of C, with fewer obscure/advanced/edge case features. - -# Source Files - -A Crowbar source file is UTF-8. -Crowbar source files can come in two varieties, an *implementation file* and a *header file*. -An implementation file conventionally has a `.cro` extension, and a header file conventionally has a `.hro` extension. - -A Crowbar source file is read into memory in two phases: *scanning* (which converts text into an unstructured sequence of tokens) and *parsing* (which converts an unstructured sequence of tokens into a parse tree). - -# Scanning - -A *token* is one of the following kinds of token: -- a *keyword*, -- an *identifier*, -- a *constant*, -- a *string literal*, -- or a *punctuator*. - -Tokens are separated by either *whitespace* or a *comment*. - -## Keywords - -A *keyword* is one of the following literal words: -- `bool` -- `break` -- `case` -- `char` -- `const` -- `continue` -- `default` -- `do` -- `double` -- `else` -- `enum` -- `extern` -- `float` -- `for` -- `fragile` -- `function` -- `if` -- `include` -- `int` -- `long` -- `return` -- `short` -- `signed` -- `sizeof` -- `struct` -- `switch` -- `typedef` -- `unsigned` -- `void` -- `while` - -## Identifiers - -An *identifier* is a sequence of one or more characters having Unicode categories within a legal set. - -The first character in an identifier must have one of the following Unicode categories: -- `Pc` Connector Punctuation (e.g. `_`) -- `Ll` Lowercase Letter (e.g. `h`) -- `Lm` Modifier Letter (e.g. 
`ʹ`, U+02B9 Modifier Letter Prime) -- `Lo` Other Letter (e.g. `א`, U+05D0 Hebrew Letter Alef) -- `Lt` Titlecase Letter (e.g. `Dž`, U+01C5 Latin Capital Letter D With Small Letter Z With Caron) -- `Lu` Uppercase Letter (e.g. `B`) -- `Mn` Nonspacing Mark (e.g. ` ̂`, U+0302 Combining Circumflex Accent) -- `Sk` Modifier Symbol (e.g. `^`, U+005E Circumflex Accent) - -Subsequent characters may have any of the above-listed Unicode categories, or one of the following: -- `Nd` Decimal Digit Number (e.g. `0`) -- `Nl` Letter Number (e.g. `Ⅳ`, U+2163 Roman Numeral Four) -- `No` Other Number (e.g. `¼`, U+00BC Vulgar Fraction One Quarter) - -## Constants - -A *constant* can have one of six types: -- a *decimal constant*, a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `_`}; -- a *binary constant*, a prefix (either `0b` or `0B`) followed by a sequence of characters drawn from the set {`0`, `1`, `_`}; -- an *octal constant*, the prefix `0o` followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `_`}; -- a *hexadecimal constant*, a prefix (either `0x` or `0X`) followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`, `_`}; -- a *floating-point constant*, a decimal constant followed by one of - - `.` followed by a decimal constant, - - either `e` or `E` followed by a decimal constant, - - or a `.` followed by a decimal constant followed by either an `e` or `E` followed by a decimal constant; -- or a *character constant*, a `'` followed by either a single character or an *escape sequence* followed by another `'`. 
- -### Escape Sequences - -The following sequences of characters are *escape sequences*: -- `\'` -- `\"` -- `\\` -- `\r` -- `\n` -- `\t` -- `\0` -- `\x` followed by two characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`} -- `\u` followed by four characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`} -- `\U` followed by eight characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`} - -## String Literals - -A *string literal* begins with a `"`. -It then contains a sequence where each element is either an escape sequence or a character that is neither `"` nor `\`. -It then ends with a `"`. - -## Punctuators - -The following sequences of characters form *punctuators*: -- `[` -- `]` -- `(` -- `)` -- `{` -- `}` -- `.` -- `,` -- `+` -- `-` -- `*` -- `/` -- `%` -- `;` -- `!` -- `&` -- `|` -- `^` -- the tilde, `~` (given special treatment on this line due to [a bug in the Markdown renderer that sr.ht uses](https://github.com/miyuchina/mistletoe/issues/91)) -- `>` -- `<` -- `=` -- `->` -- `++` -- `--` -- `>>` -- `<<` -- `<=` -- `>=` -- `==` -- `!=` -- `&&` -- `||` -- `+=` -- `-=` -- `*=` -- `/=` -- `%=` -- `&=` -- `|=` -- `^=` - -## Whitespace - -A nonempty sequence of characters is considered to be *whitespace* if each character in it has a Unicode class of either Space Separator or Control Other. - -## Comments - -A *comment* can be either a *line comment* or a *block comment*. - -A *line comment* begins with the characters `//` if they occur outside of a string literal or comment, and ends with a newline character U+000A. - -A *block comment* begins with the characters `/*` if they occur outside of a string literal or comment, and ends with the characters `*/`. 
- -# Parsing - -The syntax of Crowbar is given as a [parsing expression grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar): - -## Entry points - -``` -HeaderFile ← HeaderFileElement+ -HeaderFileElement ← IncludeStatement / - TypeDeclaration / - FunctionDeclaration - -ImplementationFile ← ImplementationFileElement+ -ImplementationFileElement ← HeaderFileElement / - FunctionDefinition -``` - -## Top-level elements - -``` -IncludeStatement ← 'include' string-literal ';' - -TypeDeclaration ← StructDeclaration / - EnumDeclaration / - TypedefDeclaration -StructDeclaration ← 'struct' identifier '{' VariableDeclaration+ '}' ';' -EnumDeclaration ← 'enum' identifier '{' EnumBody '}' ';' -EnumBody ← identifier ('=' Expression)? ',' EnumBody / - identifier ('=' Expression)? ','? -TypedefDeclaration ← 'typedef' identifier '=' Type ';' - -FunctionDeclaration ← FunctionSignature ';' -FunctionDefinition ← FunctionSignature Block -FunctionSignature ← Type identifier '(' SignatureArguments? ')' -SignatureArguments ← Type identifier ',' SignatureArguments / - Type identifier ','? -``` - -## Statements - -``` -Block ← '{' Statement* '}' - -Statement ← VariableDefinition / - VariableDeclaration / - IfStatement / - SwitchStatement / - WhileStatement / - DoWhileStatement / - ForStatement / - FlowControlStatement / - AssignmentStatement / - ExpressionStatement - -VariableDefinition ← Type identifier '=' Expression ';' -VariableDeclaration ← Type identifier ';' - -IfStatement ← 'if' Expression Block 'else' Block / - 'if' Expression Block - -SwitchStatement ← 'switch' Expression '{' SwitchCase+ '}' -SwitchCase ← CaseSpecifier Block / - 'default' Block -CaseSpecifier ← 'case' Expression ',' CaseSpecifier / - 'case' Expression ','? - -WhileStatement ← 'while' Expression Block -DoWhileStatement ← 'do' Block 'while' Expression ';' -ForStatement ← 'for' VariableDefinition? ';' Expression ';' AssignmentStatementBody? 
Block - -FlowControlStatement ← 'continue' ';' / - 'break' ';' / - 'return' Expression? ';' - -AssignmentStatement ← AssignmentStatementBody ';' -AssignmentStatementBody ← AssignmentTargetExpression '=' Expression / - AssignmentTargetExpression '+=' Expression / - AssignmentTargetExpression '-=' Expression / - AssignmentTargetExpression '*=' Expression / - AssignmentTargetExpression '/=' Expression / - AssignmentTargetExpression '%=' Expression / - AssignmentTargetExpression '&=' Expression / - AssignmentTargetExpression '^=' Expression / - AssignmentTargetExpression '|=' Expression / - AssignmentTargetExpression '++' / - AssignmentTargetExpression '--' - -ExpressionStatement ← Expression ';' -``` - -## Types - -``` -Type ← 'const' BasicType / - BasicType '*' / - BasicType '[' Expression ']' / - BasicType 'function' '(' (BasicType ',')* ')' / - BasicType -BasicType ← 'void' / - IntegerType / - 'signed' IntegerType / - 'unsigned' IntegerType / - 'float' / - 'double' / - 'bool' / - 'struct' identifier / - 'enum' identifier / - 'typedef' identifier / - '(' Type ')' -IntegerType ← 'char' / - 'short' / - 'int' / - 'long' -``` - -## Expressions - -``` -AssignmentTargetExpression ← identifier ATEElementSuffix* -ATEElementSuffix ← '[' Expression ']' / - '.' identifier / - '->' identifier - -AtomicExpression ← identifier / - constant / - string-literal / - '(' Expression ')' - -ObjectExpression ← AtomicExpression ObjectSuffix* / - ArrayLiteralExpression / - StructLiteralExpression -ObjectSuffix ← '[' Expression ']' / - '(' CommasExpressionList? ')' / - '.' identifier / - '->' identifier -CommasExpressionList ← Expression ',' CommasExpressionList? / - Expression ','? -ArrayLiteralExpression ← '{' CommasExpressionList '}' -StructLiteralExpression ← '{' StructLiteralBody '}' -StructLiteralBody ← StructLiteralElement ',' StructLiteralBody? / - StructLiteralElement ','? -StructLiteralElement ← '.' 
identifier '=' Expression - -FactorExpression ← '(' Type ')' FactorExpression / - '&' FactorExpression / - '*' FactorExpression / - '+' FactorExpression / - '-' FactorExpression / - '~' FactorExpression / - '!' FactorExpression / - 'sizeof' FactorExpression / - 'sizeof' Type / - ObjectExpression - -TermExpression ← FactorExpression TermSuffix* -TermSuffix ← '*' FactorExpression / - '/' FactorExpression / - '%' FactorExpression - -ArithmeticExpression ← TermExpression ArithmeticSuffix* -ArithmeticSuffix ← '+' TermExpression / - '-' TermExpression - -BitwiseOpExpression ← ArithmeticExpression '<<' ArithmeticExpression / - ArithmeticExpression '>>' ArithmeticExpression / - ArithmeticExpression '^' ArithmeticExpression / - ArithmeticExpression ('&' ArithmeticExpression)+ / - ArithmeticExpression ('|' ArithmeticExpression)+ / - ArithmeticExpression - -ComparisonExpression ← BitwiseOpExpression '==' BitwiseOpExpression / - BitwiseOpExpression '!=' BitwiseOpExpression / - BitwiseOpExpression '<=' BitwiseOpExpression / - BitwiseOpExpression '>=' BitwiseOpExpression / - BitwiseOpExpression '<' BitwiseOpExpression / - BitwiseOpExpression '>' BitwiseOpExpression / - BitwiseOpExpression - -Expression ← ComparisonExpression ('&&' ComparisonExpression)+ / - ComparisonExpression ('||' ComparisonExpression)+ / - ComparisonExpression -``` - -[![Creative Commons BY-SA License](https://i.creativecommons.org/l/by-sa/4.0/80x15.png)](http://creativecommons.org/licenses/by-sa/4.0/) +The syntax of Crowbar mostly matches the syntax of C, with fewer obscure/advanced/edge case features. + +# Source Files + +A Crowbar source file is UTF-8. +Crowbar source files can come in two varieties, an *implementation file* and a *header file*. +An implementation file conventionally has a `.cro` extension, and a header file conventionally has a `.hro` extension. 
+ +A Crowbar source file is read into memory in two phases: *scanning* (which converts text into an unstructured sequence of tokens) and *parsing* (which converts an unstructured sequence of tokens into a parse tree). + +# Scanning + +A *token* is one of the following kinds of token: +- a *keyword*, +- an *identifier*, +- a *constant*, +- a *string literal*, +- or a *punctuator*. + +Tokens are separated by either *whitespace* or a *comment*. + +## Keywords + +A *keyword* is one of the following literal words: +- `bool` +- `break` +- `case` +- `char` +- `const` +- `continue` +- `default` +- `do` +- `double` +- `else` +- `enum` +- `extern` +- `float` +- `for` +- `fragile` +- `function` +- `if` +- `include` +- `int` +- `long` +- `return` +- `short` +- `signed` +- `sizeof` +- `struct` +- `switch` +- `typedef` +- `unsigned` +- `void` +- `while` + +## Identifiers + +An *identifier* is a sequence of one or more characters having Unicode categories within a legal set. + +The first character in an identifier must have one of the following Unicode categories: +- `Pc` Connector Punctuation (e.g. `_`) +- `Ll` Lowercase Letter (e.g. `h`) +- `Lm` Modifier Letter (e.g. `ʹ`, U+02B9 Modifier Letter Prime) +- `Lo` Other Letter (e.g. `א`, U+05D0 Hebrew Letter Alef) +- `Lt` Titlecase Letter (e.g. `Dž`, U+01C5 Latin Capital Letter D With Small Letter Z With Caron) +- `Lu` Uppercase Letter (e.g. `B`) +- `Mn` Nonspacing Mark (e.g. ` ̂`, U+0302 Combining Circumflex Accent) +- `Sk` Modifier Symbol (e.g. `^`, U+005E Circumflex Accent) + +Subsequent characters may have any of the above-listed Unicode categories, or one of the following: +- `Nd` Decimal Digit Number (e.g. `0`) +- `Nl` Letter Number (e.g. `Ⅳ`, U+2163 Roman Numeral Four) +- `No` Other Number (e.g. 
`¼`, U+00BC Vulgar Fraction One Quarter) + +## Constants + +A *constant* can have one of six types: +- a *decimal constant*, a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `_`}; +- a *binary constant*, a prefix (either `0b` or `0B`) followed by a sequence of characters drawn from the set {`0`, `1`, `_`}; +- an *octal constant*, the prefix `0o` followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `_`}; +- a *hexadecimal constant*, a prefix (either `0x` or `0X`) followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`, `_`}; +- a *floating-point constant*, a decimal constant followed by one of + - `.` followed by a decimal constant, + - either `e` or `E` followed by a decimal constant, + - or a `.` followed by a decimal constant followed by either an `e` or `E` followed by a decimal constant; +- or a *character constant*, a `'` followed by either a single character or an *escape sequence* followed by another `'`. + +### Escape Sequences + +The following sequences of characters are *escape sequences*: +- `\'` +- `\"` +- `\\` +- `\r` +- `\n` +- `\t` +- `\0` +- `\x` followed by two characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`} +- `\u` followed by four characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`} +- `\U` followed by eight characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`} + +## String Literals + +A *string literal* begins with a `"`. +It then contains a sequence where each element is either an escape sequence or a character that is neither `"` nor `\`. +It then ends with a `"`. 
+ +## Punctuators + +The following sequences of characters form *punctuators*: +- `[` +- `]` +- `(` +- `)` +- `{` +- `}` +- `.` +- `,` +- `+` +- `-` +- `*` +- `/` +- `%` +- `;` +- `!` +- `&` +- `|` +- `^` +- the tilde, `~` (given special treatment on this line due to [a bug in the Markdown renderer that sr.ht uses](https://github.com/miyuchina/mistletoe/issues/91)) +- `>` +- `<` +- `=` +- `->` +- `++` +- `--` +- `>>` +- `<<` +- `<=` +- `>=` +- `==` +- `!=` +- `&&` +- `||` +- `+=` +- `-=` +- `*=` +- `/=` +- `%=` +- `&=` +- `|=` +- `^=` + +## Whitespace + +A nonempty sequence of characters is considered to be *whitespace* if each character in it has a Unicode class of either Space Separator or Control Other. + +## Comments + +A *comment* can be either a *line comment* or a *block comment*. + +A *line comment* begins with the characters `//` if they occur outside of a string literal or comment, and ends with a newline character U+000A. + +A *block comment* begins with the characters `/*` if they occur outside of a string literal or comment, and ends with the characters `*/`. + +# Parsing + +The syntax of Crowbar is given as a [parsing expression grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar): + +## Entry points + +``` +HeaderFile ← HeaderFileElement+ +HeaderFileElement ← IncludeStatement / + TypeDeclaration / + FunctionDeclaration + +ImplementationFile ← ImplementationFileElement+ +ImplementationFileElement ← HeaderFileElement / + FunctionDefinition +``` + +## Top-level elements + +``` +IncludeStatement ← 'include' string-literal ';' + +TypeDeclaration ← StructDeclaration / + EnumDeclaration / + TypedefDeclaration +StructDeclaration ← 'struct' identifier '{' VariableDeclaration+ '}' ';' +EnumDeclaration ← 'enum' identifier '{' EnumBody '}' ';' +EnumBody ← identifier ('=' Expression)? ',' EnumBody / + identifier ('=' Expression)? ','? 
+TypedefDeclaration ← 'typedef' identifier '=' Type ';' + +FunctionDeclaration ← FunctionSignature ';' +FunctionDefinition ← FunctionSignature Block +FunctionSignature ← Type identifier '(' SignatureArguments? ')' +SignatureArguments ← Type identifier ',' SignatureArguments / + Type identifier ','? +``` + +## Statements + +``` +Block ← '{' Statement* '}' + +Statement ← VariableDefinition / + VariableDeclaration / + IfStatement / + SwitchStatement / + WhileStatement / + DoWhileStatement / + ForStatement / + FlowControlStatement / + AssignmentStatement / + ExpressionStatement + +VariableDefinition ← Type identifier '=' Expression ';' +VariableDeclaration ← Type identifier ';' + +IfStatement ← 'if' Expression Block 'else' Block / + 'if' Expression Block + +SwitchStatement ← 'switch' Expression '{' SwitchCase+ '}' +SwitchCase ← CaseSpecifier Block / + 'default' Block +CaseSpecifier ← 'case' Expression ',' CaseSpecifier / + 'case' Expression ','? + +WhileStatement ← 'while' Expression Block +DoWhileStatement ← 'do' Block 'while' Expression ';' +ForStatement ← 'for' VariableDefinition? ';' Expression ';' AssignmentStatementBody? Block + +FlowControlStatement ← 'continue' ';' / + 'break' ';' / + 'return' Expression? 
';' + +AssignmentStatement ← AssignmentStatementBody ';' +AssignmentStatementBody ← AssignmentTargetExpression '=' Expression / + AssignmentTargetExpression '+=' Expression / + AssignmentTargetExpression '-=' Expression / + AssignmentTargetExpression '*=' Expression / + AssignmentTargetExpression '/=' Expression / + AssignmentTargetExpression '%=' Expression / + AssignmentTargetExpression '&=' Expression / + AssignmentTargetExpression '^=' Expression / + AssignmentTargetExpression '|=' Expression / + AssignmentTargetExpression '++' / + AssignmentTargetExpression '--' + +ExpressionStatement ← Expression ';' +``` + +## Types + +``` +Type ← 'const' BasicType / + BasicType '*' / + BasicType '[' Expression ']' / + BasicType 'function' '(' (BasicType ',')* ')' / + BasicType +BasicType ← 'void' / + IntegerType / + 'signed' IntegerType / + 'unsigned' IntegerType / + 'float' / + 'double' / + 'bool' / + 'struct' identifier / + 'enum' identifier / + 'typedef' identifier / + '(' Type ')' +IntegerType ← 'char' / + 'short' / + 'int' / + 'long' +``` + +## Expressions + +``` +AssignmentTargetExpression ← identifier ATEElementSuffix* +ATEElementSuffix ← '[' Expression ']' / + '.' identifier / + '->' identifier + +AtomicExpression ← identifier / + constant / + string-literal / + '(' Expression ')' + +ObjectExpression ← AtomicExpression ObjectSuffix* / + ArrayLiteralExpression / + StructLiteralExpression +ObjectSuffix ← '[' Expression ']' / + '(' CommasExpressionList? ')' / + '.' identifier / + '->' identifier +CommasExpressionList ← Expression ',' CommasExpressionList? / + Expression ','? +ArrayLiteralExpression ← '{' CommasExpressionList '}' +StructLiteralExpression ← '{' StructLiteralBody '}' +StructLiteralBody ← StructLiteralElement ',' StructLiteralBody? / + StructLiteralElement ','? +StructLiteralElement ← '.' 
identifier '=' Expression + +FactorExpression ← '(' Type ')' FactorExpression / + '&' FactorExpression / + '*' FactorExpression / + '+' FactorExpression / + '-' FactorExpression / + '~' FactorExpression / + '!' FactorExpression / + 'sizeof' FactorExpression / + 'sizeof' Type / + ObjectExpression + +TermExpression ← FactorExpression TermSuffix* +TermSuffix ← '*' FactorExpression / + '/' FactorExpression / + '%' FactorExpression + +ArithmeticExpression ← TermExpression ArithmeticSuffix* +ArithmeticSuffix ← '+' TermExpression / + '-' TermExpression + +BitwiseOpExpression ← ArithmeticExpression '<<' ArithmeticExpression / + ArithmeticExpression '>>' ArithmeticExpression / + ArithmeticExpression '^' ArithmeticExpression / + ArithmeticExpression ('&' ArithmeticExpression)+ / + ArithmeticExpression ('|' ArithmeticExpression)+ / + ArithmeticExpression + +ComparisonExpression ← BitwiseOpExpression '==' BitwiseOpExpression / + BitwiseOpExpression '!=' BitwiseOpExpression / + BitwiseOpExpression '<=' BitwiseOpExpression / + BitwiseOpExpression '>=' BitwiseOpExpression / + BitwiseOpExpression '<' BitwiseOpExpression / + BitwiseOpExpression '>' BitwiseOpExpression / + BitwiseOpExpression + +Expression ← ComparisonExpression ('&&' ComparisonExpression)+ / + ComparisonExpression ('||' ComparisonExpression)+ / + ComparisonExpression +``` + +[![Creative Commons BY-SA License](https://i.creativecommons.org/l/by-sa/4.0/80x15.png)](http://creativecommons.org/licenses/by-sa/4.0/) -- cgit v1.2.3
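Note on the `## Identifiers` section carried in this patch: the rule is defined purely in terms of Unicode general categories, so it can be checked mechanically. The following is a minimal sketch in Python, not part of the Crowbar sources. The names `START_CATEGORIES`, `CONTINUE_CATEGORIES`, and `is_identifier` are hypothetical, and the sketch assumes that the category lookup in Python's standard `unicodedata` module matches the Unicode version the spec intends. It does not address how a scanner should separate keywords from identifiers.

```python
import unicodedata

# Categories the spec allows for the first character of an identifier.
START_CATEGORIES = {"Pc", "Ll", "Lm", "Lo", "Lt", "Lu", "Mn", "Sk"}

# Subsequent characters may also be drawn from the number categories.
CONTINUE_CATEGORIES = START_CATEGORIES | {"Nd", "Nl", "No"}


def is_identifier(text: str) -> bool:
    """Return True if `text` satisfies the category rule for identifiers.

    Hypothetical helper: it checks only the Unicode category rule and
    does not reject keywords such as `while` or `struct`.
    """
    if not text:
        # An identifier is a sequence of one or more characters.
        return False
    if unicodedata.category(text[0]) not in START_CATEGORIES:
        return False
    return all(
        unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in text[1:]
    )
```

Under this reading, `_x1` and `x¼` are accepted, `1x` is rejected because `Nd` is not a legal first-character category, and a lone `^` is accepted because `Sk` is listed as a legal first-character category.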