The syntax of Crowbar will eventually match the syntax of C in most respects, minus many obscure, advanced, and edge-case features.

# Source Files

A Crowbar source file is encoded in UTF-8.

Crowbar source files come in two varieties: an *implementation file* and a *header file*. An implementation file conventionally has a `.cro` extension, and a header file conventionally has a `.hro` extension.

A Crowbar source file is read into memory in two phases: *scanning* (which converts text into an unstructured sequence of tokens) and *parsing* (which converts an unstructured sequence of tokens into an Abstract Syntax Tree, or AST).

# Scanning

A *token* is one of the following:

- a *keyword*,
- an *identifier*,
- a *constant*,
- a *string literal*,
- or a *punctuator*.

## Keywords

A *keyword* is one of the following 28 literal words:

- `bool`
- `break`
- `case`
- `char`
- `const`
- `continue`
- `default`
- `do`
- `double`
- `else`
- `enum`
- `extern`
- `float`
- `for`
- `if`
- `include`
- `int`
- `long`
- `return`
- `short`
- `signed`
- `sizeof`
- `struct`
- `switch`
- `typedef`
- `unsigned`
- `void`
- `while`

## Identifiers

An *identifier* is a sequence of one or more characters whose Unicode categories fall within a legal set.

The first character in an identifier must have one of the following Unicode categories:

- Connector Punctuation (e.g. `_`)
- Format Other (e.g. Zero-Width Joiner)
- Lowercase Letter (e.g. `h`)
- Modifier Letter (e.g. `ʹ`, U+02B9 Modifier Letter Prime)
- Modifier Symbol (e.g. `^`, U+005E Circumflex Accent)
- Nonspacing Mark (e.g. ` ̂`, U+0302 Combining Circumflex Accent)
- Other Letter (e.g. `א`, U+05D0 Hebrew Letter Alef)
- Titlecase Letter (e.g. `Dž`, U+01C5 Latin Capital Letter D With Small Letter Z With Caron)
- Uppercase Letter (e.g. `B`)

Subsequent characters may have any of the above-listed Unicode categories, or one of the following:

- Decimal Digit Number (e.g. `0`)
- Letter Number (e.g. `Ⅳ`, U+2163 Roman Numeral Four)
- Other Number (e.g. `¼`, U+00BC Vulgar Fraction One Quarter)
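The Python sketch below shows one way the keyword and identifier rules could be checked during scanning. It is written in Python only because the standard `unicodedata` module reports each character's two-letter general-category code (e.g. `Pc`, `Lu`, `Nd`). It is not part of any Crowbar implementation, and the names `KEYWORDS`, `is_identifier`, and `classify_word` are hypothetical.

The sketch assumes two things the text above does not state. First, a word that satisfies the identifier rules but is spelled like a keyword is classified as a keyword, as in C. Second, Python's bundled Unicode database matches whatever Unicode version Crowbar targets.

```python
import unicodedata

# The 28 keywords from the "Keywords" section.
KEYWORDS = frozenset({
    "bool", "break", "case", "char", "const", "continue", "default",
    "do", "double", "else", "enum", "extern", "float", "for", "if",
    "include", "int", "long", "return", "short", "signed", "sizeof",
    "struct", "switch", "typedef", "unsigned", "void", "while",
})

# General-category codes allowed for the first character of an identifier:
# Connector Punctuation (Pc), Format Other (Cf), Lowercase Letter (Ll),
# Modifier Letter (Lm), Modifier Symbol (Sk), Nonspacing Mark (Mn),
# Other Letter (Lo), Titlecase Letter (Lt), Uppercase Letter (Lu).
START_CATEGORIES = frozenset({"Pc", "Cf", "Ll", "Lm", "Sk", "Mn", "Lo", "Lt", "Lu"})

# Later characters may additionally be Decimal Digit Number (Nd),
# Letter Number (Nl), or Other Number (No).
CONTINUE_CATEGORIES = START_CATEGORIES | {"Nd", "Nl", "No"}


def is_identifier(text: str) -> bool:
    """Return True if every character of text satisfies the identifier category rules."""
    if not text:
        return False  # an identifier has at least one character
    if unicodedata.category(text[0]) not in START_CATEGORIES:
        return False
    return all(unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in text[1:])


def classify_word(text: str) -> str:
    """Return 'keyword', 'identifier', or 'invalid' for a single word.

    Assumption (hypothetical, not stated in the spec): keywords take
    priority over words that would otherwise be valid identifiers.
    """
    if text in KEYWORDS:
        return "keyword"
    if is_identifier(text):
        return "identifier"
    return "invalid"


if __name__ == "__main__":
    # "x\u2163" is x followed by U+2163 ROMAN NUMERAL FOUR (a Letter Number).
    for word in ["while", "_count", "x\u2163", "h\u00bc", "2fast", "\u05d0_1"]:
        print(repr(word), classify_word(word))
```

Because these rules admit Modifier Symbols such as `^` as identifier characters, a check like `is_identifier` accepts `a^b` as one identifier. Any scanner built on these rules would need to keep that in mind when deciding where one token ends and the next begins.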