The syntax of Crowbar will eventually match most of the syntax of C, with fewer obscure, advanced, or edge-case features.

# Source Files

A Crowbar source file is UTF-8.
Crowbar source files come in two varieties: an *implementation file* and a *header file*.
An implementation file conventionally has a `.cro` extension, and a header file conventionally has a `.hro` extension.

A Crowbar source file is read into memory in two phases: *scanning* (which converts text into an unstructured sequence of tokens) and *parsing* (which converts an unstructured sequence of tokens into a parse tree).

# Scanning

A *token* is one of the following:

- a *keyword*,
- an *identifier*,
- a *constant*,
- a *string literal*,
- or a *punctuator*.

Tokens are separated by either *whitespace* or a *comment*.

## Keywords

A *keyword* is one of the following literal words:

- `bool`
- `break`
- `case`
- `char`
- `const`
- `continue`
- `default`
- `do`
- `double`
- `else`
- `enum`
- `extern`
- `float`
- `for`
- `function`
- `if`
- `include`
- `int`
- `long`
- `return`
- `short`
- `signed`
- `sizeof`
- `struct`
- `switch`
- `typedef`
- `unsigned`
- `void`
- `while`

## Identifiers

An *identifier* is a sequence of one or more characters having Unicode categories within a legal set.

The first character in an identifier must have one of the following Unicode categories:

- Connector Punctuation (e.g. `_`)
- Format Other (e.g. Zero-Width Joiner)
- Lowercase Letter (e.g. `h`)
- Modifier Letter (e.g. `ʹ`, U+02B9 Modifier Letter Prime)
- Modifier Symbol (e.g. `^`, U+005E Circumflex Accent)
- Nonspacing Mark (e.g. ` ̂`, U+0302 Combining Circumflex Accent)
- Other Letter (e.g. `א`, U+05D0 Hebrew Letter Alef)
- Titlecase Letter (e.g. `Dž`, U+01C5 Latin Capital Letter D With Small Letter Z With Caron)
- Uppercase Letter (e.g. `B`)

Subsequent characters may have any of the above-listed Unicode categories, or one of the following:

- Decimal Digit Number (e.g. `0`)
- Letter Number (e.g.
  `Ⅳ`, U+2163 Roman Numeral Four)
- Other Number (e.g. `¼`, U+00BC Vulgar Fraction One Quarter)

## Constants

A *constant* can have one of five types:

- a *decimal constant*, a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `_`};
- a *binary constant*, a prefix (either `0b` or `0B`) followed by a sequence of characters drawn from the set {`0`, `1`, `_`};
- a *hexadecimal constant*, a prefix (either `0x` or `0X`) followed by a sequence of characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`, `_`};
- a *floating-point constant*, a decimal constant followed by one of
  - `.` followed by a decimal constant,
  - either `e` or `E` followed by a decimal constant,
  - or a `.` followed by a decimal constant followed by either an `e` or `E` followed by a decimal constant;
- or a *character constant*, a `'` followed by either a single character or an *escape sequence* followed by another `'`.

### Escape Sequences

The following sequences of characters are *escape sequences*:

- `\'`
- `\"`
- `\\`
- `\r`
- `\n`
- `\t`
- `\x` followed by two characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
- `\u` followed by four characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}
- `\U` followed by eight characters drawn from the set {`0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `A`, `a`, `B`, `b`, `C`, `c`, `D`, `d`, `E`, `e`, `F`, `f`}

## String Literals

A *string literal* begins with a `"`.
It then contains a sequence where each element is either an escape sequence or a character that is neither `"` nor `\`.
It then ends with a `"`.
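
The identifier, constant, and string-literal rules above can be checked mechanically. The following Python sketch is one possible reading of them, not part of Crowbar itself. The names `is_identifier`, `classify`, `IDENT_START`, `IDENT_CONTINUE`, and `PATTERNS` are hypothetical, and the two-letter codes are the standard Unicode abbreviations for the category names used in the text. It also assumes that the "single character" in a character constant is neither `'` nor `\`, which the text does not state.

```python
import re
import unicodedata

# Unicode general-category codes for the categories named in the text.
IDENT_START = {"Pc", "Cf", "Ll", "Lm", "Sk", "Mn", "Lo", "Lt", "Lu"}
IDENT_CONTINUE = IDENT_START | {"Nd", "Nl", "No"}


def is_identifier(text):
    """Return True if `text` satisfies the identifier rule as written.

    Keywords such as `if` also satisfy the rule; filtering them out is
    left to the caller (an assumption, the text does not cover it).
    """
    if not text:
        return False
    if unicodedata.category(text[0]) not in IDENT_START:
        return False
    return all(unicodedata.category(ch) in IDENT_CONTINUE for ch in text[1:])


HEX = "[0-9A-Fa-f]"
DECIMAL = "[0-9_]+"
# Backslash, then one of ' " \ r n t, or x/u/U with 2/4/8 hex digits.
ESCAPE = r"\\(?:['\"\\rnt]|x" + HEX + r"{2}|u" + HEX + r"{4}|U" + HEX + r"{8})"

PATTERNS = [
    ("binary", r"0[bB][01_]+"),
    ("hexadecimal", r"0[xX][0-9A-Fa-f_]+"),
    ("floating-point",
     DECIMAL + r"(?:\." + DECIMAL + r"(?:[eE]" + DECIMAL + r")?|[eE]" + DECIMAL + ")"),
    ("decimal", DECIMAL),
    # Assumption: the unescaped character may not be ' or \.
    ("character", "'(?:" + ESCAPE + r"|[^'\\])'"),
    ("string literal", '"(?:' + ESCAPE + r'|[^"\\])*"'),
]


def classify(text):
    """Return the kind of constant or string literal that `text` is, or None."""
    for kind, pattern in PATTERNS:
        if re.fullmatch(pattern, text):
            return kind
    return None
```

Under this reading, `classify("1e-9")` returns `None`, because the floating-point rule as written has no place for a sign in the exponent.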

## Punctuators

The following sequences of characters form *punctuators*:

- `[`
- `]`
- `(`
- `)`
- `{`
- `}`
- `.`
- `+`
- `-`
- `*`
- `/`
- `%`
- `;`
- `,`
- `!`
- `&`
- `|`
- `^`
- `~`
- `>`
- `<`
- `=`
- `->`
- `++`
- `--`
- `>>`
- `<<`
- `<=`
- `>=`
- `==`
- `!=`
- `&&`
- `||`
- `+=`
- `-=`
- `*=`
- `/=`
- `%=`
- `&=`
- `|=`
- `^=`
- `<<=`
- `>>=`

## Whitespace

A nonempty sequence of characters is considered to be *whitespace* if each character in it has a Unicode category of either Space Separator or Control Other.

## Comments

A *comment* can be either a *line comment* or a *block comment*.

A *line comment* begins with the characters `//` if they occur outside of a string literal or comment, and ends with a newline character U+000A.

A *block comment* begins with the characters `/*` if they occur outside of a string literal or comment, and ends with the characters `*/`.

# Parsing

The syntax of Crowbar is given as a [parsing expression grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar):

## Entry points

```
HeaderFile ← HeaderFileElement+
HeaderFileElement ← IncludeStatement /
                    TypeDeclaration /
                    FunctionDeclaration

ImplementationFile ← ImplementationFileElement+
ImplementationFileElement ← HeaderFileElement /
                            FunctionDefinition
```

## Top-level elements

```
IncludeStatement ← 'include' string-literal ';'

TypeDeclaration ← StructDeclaration /
                  EnumDeclaration /
                  TypedefDeclaration
StructDeclaration ← 'struct' identifier '{' VariableDeclaration+ '}' ';'
EnumDeclaration ← 'enum' identifier '{' EnumBody '}' ';'
EnumBody ← identifier ',' EnumBody /
           identifier ','?
TypedefDeclaration ← 'typedef' identifier '=' Type ';'

FunctionDeclaration ← FunctionSignature ';'
FunctionDefinition ← FunctionSignature Block
FunctionSignature ← Type identifier '(' SignatureArguments ')'
SignatureArguments ← Type identifier ',' SignatureArguments /
                     Type identifier ','?
```

## Statements

```
Block ← '{' Statement* '}'

Statement ← VariableDefinition /
            VariableDeclaration /
            IfStatement /
            SwitchStatement /
            WhileStatement /
            DoWhileStatement /
            ForStatement /
            FlowControlStatement /
            AssignmentStatement /
            ExpressionStatement

VariableDefinition ← Type identifier '=' Expression ';'
VariableDeclaration ← Type identifier ';'

IfStatement ← 'if' Expression Block 'else' Block /
              'if' Expression Block

SwitchStatement ← 'switch' Expression '{' SwitchCase+ '}'
SwitchCase ← CaseSpecifier Block /
             'default' Block
CaseSpecifier ← 'case' Expression ',' CaseSpecifier /
                'case' Expression ','?

WhileStatement ← 'while' Expression Block
DoWhileStatement ← 'do' Block 'while' Expression ';'
ForStatement ← 'for' VariableDefinition? ';' Expression ';' AssignmentStatementBody? Block

FlowControlStatement ← 'continue' ';' /
                       'break' ';' /
                       'return' Expression? ';'

AssignmentStatement ← AssignmentStatementBody ';'
AssignmentStatementBody ← AssignmentTargetExpression '=' Expression /
                          AssignmentTargetExpression '+=' Expression /
                          AssignmentTargetExpression '-=' Expression /
                          AssignmentTargetExpression '*=' Expression /
                          AssignmentTargetExpression '/=' Expression /
                          AssignmentTargetExpression '%=' Expression /
                          AssignmentTargetExpression '<<=' Expression /
                          AssignmentTargetExpression '>>=' Expression /
                          AssignmentTargetExpression '&=' Expression /
                          AssignmentTargetExpression '^=' Expression /
                          AssignmentTargetExpression '|=' Expression /
                          AssignmentTargetExpression '++' /
                          AssignmentTargetExpression '--'

ExpressionStatement ← Expression ';'
```

## Types

```
Type ← 'const' BasicType /
       BasicType '*' /
       BasicType '[' Expression ']' /
       BasicType 'function' '(' (BasicType ',')* ')' /
       BasicType
BasicType ← 'void' /
            IntegerType /
            'signed' IntegerType /
            'unsigned' IntegerType /
            'float' /
            'double' /
            'bool' /
            'struct' identifier /
            'enum' identifier /
            'typedef' identifier /
            '(' Type ')'
IntegerType ← 'char' / 'short' / 'int' / 'long'
```

## Expressions

```
AssignmentTargetExpression ← identifier
    ATEElementSuffix*
ATEElementSuffix ← '[' Expression ']' /
                   '.' identifier /
                   '->' identifier

AtomicExpression ← identifier /
                   constant /
                   string-literal /
                   '(' Expression ')'

ObjectExpression ← AtomicExpression ObjectSuffix* /
                   ArrayLiteralExpression /
                   StructLiteralExpression
ObjectSuffix ← '[' Expression ']' /
               '(' CommasExpressionList? ')' /
               '.' identifier /
               '->' identifier
CommasExpressionList ← Expression ',' CommasExpressionList? /
                       Expression ','?
ArrayLiteralExpression ← '{' CommasExpressionList '}'
StructLiteralExpression ← '{' StructLiteralBody '}'
StructLiteralBody ← StructLiteralElement ',' StructLiteralBody? /
                    StructLiteralElement ','?
StructLiteralElement ← '.' identifier '=' Expression

FactorExpression ← '(' Type ')' FactorExpression /
                   '&' FactorExpression /
                   '*' FactorExpression /
                   '+' FactorExpression /
                   '-' FactorExpression /
                   '~' FactorExpression /
                   '!' FactorExpression /
                   'sizeof' FactorExpression /
                   'sizeof' Type /
                   ObjectExpression

TermExpression ← FactorExpression TermSuffix*
TermSuffix ← '*' FactorExpression /
             '/' FactorExpression /
             '%' FactorExpression

ArithmeticExpression ← TermExpression ArithmeticSuffix*
ArithmeticSuffix ← '+' TermExpression /
                   '-' TermExpression

BitwiseOpExpression ← ArithmeticExpression '<<' ArithmeticExpression /
                      ArithmeticExpression '>>' ArithmeticExpression /
                      ArithmeticExpression '^' ArithmeticExpression /
                      ArithmeticExpression ('&' ArithmeticExpression)+ /
                      ArithmeticExpression ('|' ArithmeticExpression)+ /
                      ArithmeticExpression

ComparisonExpression ← BitwiseOpExpression '==' BitwiseOpExpression /
                       BitwiseOpExpression '!=' BitwiseOpExpression /
                       BitwiseOpExpression '<=' BitwiseOpExpression /
                       BitwiseOpExpression '>=' BitwiseOpExpression /
                       BitwiseOpExpression '<' BitwiseOpExpression /
                       BitwiseOpExpression '>' BitwiseOpExpression /
                       BitwiseOpExpression

Expression ← ComparisonExpression ('&&' ComparisonExpression)+ /
             ComparisonExpression ('||' ComparisonExpression)+ /
             ComparisonExpression
```

[![Creative Commons BY-SA
License](https://i.creativecommons.org/l/by-sa/4.0/80x15.png)](http://creativecommons.org/licenses/by-sa/4.0/)