Scanning
--------
.. glossary::

   token
      A single atomic unit in a Crowbar source file.
      May be a :term:`keyword`, an :term:`identifier`, a :term:`constant`,
      a :term:`string literal`, or a :term:`punctuator`.
      Adjacent keywords, identifiers, and constants (other than :term:`character constant`\ s) must be separated from one another by :term:`whitespace` or a :term:`comment`.
      Punctuators, string literals, and character constants need no explicit separation from adjacent tokens.

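
      To see why keywords, identifiers, and constants need separating, the sketch below assumes a scanner that reads the longest possible run of word characters.
      The character set ``[A-Za-z0-9_]`` and the name ``split_words`` are hypothetical stand-ins, since the identifier rules are not yet settled.

      .. code-block:: python

         import re

         # Hypothetical approximation of a "word" token: a maximal run of ASCII
         # letters, digits, and underscores. The real identifier rules are still open.
         WORD = re.compile(r"[A-Za-z0-9_]+")


         def split_words(source: str) -> list[str]:
             """Return every maximal word run in `source`, skipping everything else."""
             return WORD.findall(source)


         # Without separation, a keyword and a constant fuse into one word.
         print(split_words("return 1;"))  # ['return', '1']
         print(split_words("return1;"))   # ['return1']

      Punctuators never fuse this way because they cannot be part of a word: ``x+1`` still yields ``x`` and ``1``.
      (This approximation ignores floating-point constants, which it would split at the ``.``.)
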
   keyword
      One of the literal words ``bool``, ``break``,
      ``case``, ``const``, ``continue``,
      ``default``, ``do``,
      ``else``, ``enum``,
      ``false``, ``float32``, ``float64``, ``for``, ``fragile``, ``function``,
      ``if``, :crowbar:ref:`include <IncludeStatement>`, ``int8``, ``int16``, ``int32``, ``int64``, ``intaddr``, ``intmax``, ``intsize``,
      ``opaque``,
      ``return``,
      ``sizeof``, ``struct``, ``switch``,
      ``true``,
      ``uint8``, ``uint16``, ``uint32``, ``uint64``, ``uintaddr``, ``uintmax``, ``uintsize``, ``union``,
      ``void``,
      or ``while``.

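
      A scanner typically reads a whole word first and only then checks it against the keyword list.
      The sketch below transcribes the list above into a Python set; ``classify_word`` is a hypothetical helper, and treating every non-keyword word as an identifier is an assumption pending the identifier rules.

      .. code-block:: python

         # The keywords listed above, transcribed into a set for constant-time lookup.
         KEYWORDS = frozenset(
             """
             bool break case const continue default do else enum
             false float32 float64 for fragile function
             if include int8 int16 int32 int64 intaddr intmax intsize
             opaque return sizeof struct switch true
             uint8 uint16 uint32 uint64 uintaddr uintmax uintsize union
             void while
             """.split()
         )


         def classify_word(word: str) -> str:
             """Label an already-scanned word as a keyword or (by assumption) an identifier."""
             return "keyword" if word in KEYWORDS else "identifier"


         print(classify_word("while"))   # keyword
         print(classify_word("while2"))  # identifier
         print(len(KEYWORDS))            # 40

      Because the whole word is read first, ``int32x`` is classified as a single identifier rather than ``int32`` followed by ``x``.
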
   identifier
      A nonempty sequence of characters; which characters are permitted is not yet specified.

      .. todo::

         Work out identifier rules based on https://www.unicode.org/reports/tr31/tr31-33.html

   constant
      A numeric (or numeric-equivalent) value specified directly in the code.
      May be a :term:`decimal constant`, a :term:`binary constant`, an :term:`octal constant`,
      a :term:`hexadecimal constant`, a :term:`floating-point constant`, a :term:`hexadecimal floating-point constant`,
      or a :term:`character constant`.
      Every kind except the character constant may contain underscores; the compiler ignores them, and they serve only to make the code easier for humans to read.

   decimal constant
      A sequence of characters matching the regular expression ``[0-9_]+``.
      Denotes the numeric value of the given sequence of decimal digits.

   binary constant
      A sequence of characters matching the regular expression ``0[bB][01_]+``.
      Denotes the numeric value of the given sequence of binary digits (after the ``0[bB]`` prefix has been removed).

   octal constant
      A sequence of characters matching the regular expression ``0o[0-7_]+``.
      Denotes the numeric value of the given sequence of octal digits (after the ``0o`` prefix has been removed).

   hexadecimal constant
      A sequence of characters matching the regular expression ``0[xX][0-9a-fA-F_]+``.
      Denotes the numeric value of the given sequence of hexadecimal digits (after the ``0[xX]`` prefix has been removed).

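
      The four integer forms can be evaluated by matching the regular expressions above, stripping the prefix and the ignored underscores, and converting the remaining digits in the matching base.
      This is a minimal sketch: ``integer_value`` is a hypothetical name, the hexadecimal pattern accepts ``_`` on the strength of the rule that every constant except a character constant may contain underscores, and no check is made that the value fits any particular integer type.

      .. code-block:: python

         import re

         # (pattern, base, prefix length) for each integer form defined above.
         # fullmatch() is used, so a pattern must match the entire token.
         INTEGER_FORMS = [
             (re.compile(r"0[bB][01_]+"), 2, 2),
             (re.compile(r"0o[0-7_]+"), 8, 2),
             (re.compile(r"0[xX][0-9a-fA-F_]+"), 16, 2),
             (re.compile(r"[0-9_]+"), 10, 0),
         ]


         def integer_value(token: str) -> int:
             """Return the value of a decimal, binary, octal, or hexadecimal constant."""
             for pattern, base, prefix_len in INTEGER_FORMS:
                 if pattern.fullmatch(token):
                     digits = token[prefix_len:].replace("_", "")  # underscores are ignored
                     # int() raises ValueError if only underscores were present.
                     return int(digits, base)
             raise ValueError(f"not an integer constant: {token!r}")


         print(integer_value("1_000_000"))  # 1000000
         print(integer_value("0b1010"))     # 10
         print(integer_value("0o755"))      # 493
         print(integer_value("0xFF_FF"))    # 65535

      Note that a leading zero has no special meaning here: ``0755`` is simply the decimal constant 755.
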
   floating-point constant
      A sequence of characters matching the regular expression ``[0-9_]+\.[0-9_]+([eE][+-]?[0-9_]+)?``.
      Denotes the numeric value of the given decimal number, optionally expressed in scientific notation.
      That is, ``XeY`` denotes :math:`X \times 10^Y`.

      .. note::

         Unlike in C and many other languages, ``6e3`` is not a valid floating-point constant in Crowbar.
         The Crowbar-compatible spelling is ``6.0e3``.

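
      Because the pattern requires digits (or underscores) on both sides of the decimal point, removing the ignored underscores normally leaves text that Python's ``float()`` also accepts.
      This sketch evaluates constants as Python floats (IEEE 754 doubles); how a compiler should round a constant destined for ``float32`` is not covered, and ``float_value`` is a hypothetical name.

      .. code-block:: python

         import re

         # The floating-point constant pattern from the definition above.
         FLOAT_CONSTANT = re.compile(r"[0-9_]+\.[0-9_]+([eE][+-]?[0-9_]+)?")


         def float_value(token: str) -> float:
             """Evaluate a floating-point constant as a Python float (an IEEE 754 double)."""
             if not FLOAT_CONSTANT.fullmatch(token):
                 raise ValueError(f"not a floating-point constant: {token!r}")
             # Drop the ignored underscores; float() still raises ValueError in corner
             # cases the pattern admits, such as "_._", where no digits remain.
             return float(token.replace("_", ""))


         print(float_value("6.0e3"))     # 6000.0
         print(float_value("1_000.25"))  # 1000.25

      As the note above says, ``float_value("6e3")`` is rejected.
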
   hexadecimal floating-point constant
      A sequence of characters matching the regular expression ``0(fx|FX)[0-9a-fA-F_]+\.[0-9a-fA-F_]+[pP][+-]?[0-9_]+``.
      Denotes the numeric value of the given hexadecimal number expressed in binary scientific notation.
      That is, ``XpY`` denotes :math:`X \times 2^Y`.

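
      Python's ``float.fromhex()`` uses the same binary scientific notation (a hexadecimal mantissa with a decimal power-of-two exponent), so a sketch only needs to swap Crowbar's ``0fx`` prefix for ``0x`` and drop the underscores.
      The helper name ``hex_float_value`` is hypothetical.

      .. code-block:: python

         import re

         # The hexadecimal floating-point pattern from the definition above.
         HEX_FLOAT_CONSTANT = re.compile(
             r"0(fx|FX)[0-9a-fA-F_]+\.[0-9a-fA-F_]+[pP][+-]?[0-9_]+"
         )


         def hex_float_value(token: str) -> float:
             """Evaluate a hexadecimal floating-point constant as a Python float."""
             if not HEX_FLOAT_CONSTANT.fullmatch(token):
                 raise ValueError(f"not a hexadecimal floating-point constant: {token!r}")
             # float.fromhex() expects "0x"; Crowbar spells the prefix "0fx" or "0FX".
             return float.fromhex("0x" + token[3:].replace("_", ""))


         print(hex_float_value("0fx1.8p3"))   # 12.0  (1.5 * 2**3)
         print(hex_float_value("0FX1.0p-1"))  # 0.5
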
   character constant
      A pair of single quotes ``'`` surrounding either a single character or an :term:`escape sequence`.
      The single character must not be a single quote or a backslash ``\``.
      Denotes the Unicode scalar value of the enclosed character, or of the character denoted by the escape sequence.

   escape sequence
      One of the following pairs of characters:

      * ``\'``, denoting the single quote ``'``
      * ``\"``, denoting the double quote ``"``
      * ``\\``, denoting the backslash ``\``
      * ``\r``, denoting the carriage return (U+000D)
      * ``\n``, denoting the line feed, or newline (U+000A)
      * ``\t``, denoting the (horizontal) tab (U+0009)
      * ``\0``, denoting a null character (U+0000)

      or a sequence of characters matching one of the following regular expressions:

      * ``\\x[0-9a-fA-F]{2}``, denoting the numeric value of the given two hexadecimal digits
      * ``\\u[0-9a-fA-F]{4}``, denoting the numeric value of the given four hexadecimal digits
      * ``\\U[0-9a-fA-F]{8}``, denoting the numeric value of the given eight hexadecimal digits

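
      The escape rules above, together with the :term:`character constant` rule, can be sketched as follows.
      ``decode_escape`` and ``char_constant_value`` are hypothetical names; rejecting surrogates (U+D800 to U+DFFF) and values above U+10FFFF follows from a character constant denoting a Unicode scalar value, but how a compiler reports that error is not specified.

      .. code-block:: python

         import string

         # Two-character escapes from the list above, mapped to their code points.
         SIMPLE_ESCAPES = {
             "'": 0x27, '"': 0x22, "\\": 0x5C,
             "r": 0x0D, "n": 0x0A, "t": 0x09, "0": 0x00,
         }
         # Numeric escapes: letter after the backslash -> required number of hex digits.
         HEX_ESCAPE_WIDTHS = {"x": 2, "u": 4, "U": 8}


         def decode_escape(text: str, start: int) -> tuple[int, int]:
             """Decode the escape sequence at text[start]; return (code point, end index)."""
             if not text.startswith("\\", start) or start + 1 >= len(text):
                 raise ValueError("expected an escape sequence")
             kind = text[start + 1]
             if kind in SIMPLE_ESCAPES:
                 return SIMPLE_ESCAPES[kind], start + 2
             if kind in HEX_ESCAPE_WIDTHS:
                 end = start + 2 + HEX_ESCAPE_WIDTHS[kind]
                 digits = text[start + 2:end]
                 if len(digits) != HEX_ESCAPE_WIDTHS[kind] or not all(
                     c in string.hexdigits for c in digits
                 ):
                     raise ValueError(f"malformed \\{kind} escape")
                 return int(digits, 16), end
             raise ValueError(f"unknown escape sequence \\{kind}")


         def char_constant_value(token: str) -> int:
             """Return the Unicode scalar value denoted by a character constant."""
             if len(token) < 3 or token[0] != "'" or token[-1] != "'":
                 raise ValueError("not a character constant")
             body = token[1:-1]
             if body.startswith("\\"):
                 value, end = decode_escape(body, 0)
                 if end != len(body):
                     raise ValueError("extra characters after the escape sequence")
             elif len(body) == 1 and body != "'":
                 value = ord(body)
             else:
                 raise ValueError("a character constant holds exactly one character")
             if 0xD800 <= value <= 0xDFFF or value > 0x10FFFF:
                 raise ValueError("not a Unicode scalar value")
             return value


         print(char_constant_value("'A'"))        # 65
         print(char_constant_value("'\\n'"))      # 10
         print(char_constant_value("'\\u00e9'"))  # 233
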
   string literal
      A pair of double quotes ``"`` surrounding a sequence whose elements are either single characters or :term:`escape sequence`\ s.
      No single-character element may be a double quote or a backslash.
      Denotes the UTF-8 encoding of the characters between the quotes, each written either directly or as an escape sequence.

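
      A string literal can be turned into its bytes by decoding each escape to a code point and UTF-8-encoding the result.
      This sketch assumes that every escape, ``\xNN`` included, stands for a code point rather than a raw byte, which the definitions do not settle; ``string_literal_bytes`` is a hypothetical name.

      .. code-block:: python

         import re

         # Escapes from the "escape sequence" definition: a two-character escape
         # (group 1) or a fixed-width hexadecimal escape (groups 2, 3, or 4).
         ESCAPE = re.compile(
             r"""\\(?:(['"\\rnt0])|x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|U([0-9a-fA-F]{8}))"""
         )
         SIMPLE = {"'": "'", '"': '"', "\\": "\\", "r": "\r", "n": "\n", "t": "\t", "0": "\0"}


         def string_literal_bytes(literal: str) -> bytes:
             """Return the UTF-8 bytes denoted by a string literal (quotes included)."""
             if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
                 raise ValueError("not a string literal")
             body, chars, i = literal[1:-1], [], 0
             while i < len(body):
                 ch = body[i]
                 if ch == '"':
                     raise ValueError("unescaped double quote inside string literal")
                 if ch != "\\":
                     chars.append(ch)
                     i += 1
                     continue
                 match = ESCAPE.match(body, i)
                 if match is None:
                     raise ValueError(f"invalid escape sequence at offset {i}")
                 simple, *hex_groups = match.groups()
                 if simple is not None:
                     chars.append(SIMPLE[simple])
                 else:
                     code = int(next(g for g in hex_groups if g is not None), 16)
                     # Assumption: numeric escapes must name Unicode scalar values.
                     if 0xD800 <= code <= 0xDFFF or code > 0x10FFFF:
                         raise ValueError(f"escape at offset {i} is not a Unicode scalar value")
                     chars.append(chr(code))
                 i = match.end()
             return "".join(chars).encode("utf-8")


         print(string_literal_bytes('"caf\\u00e9\\n"'))  # b'caf\xc3\xa9\n'
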
   punctuator
      One of the literal sequences of characters ``[``, ``]``, ``(``, ``)``,
      ``{``, ``}``, ``.``, ``,``, ``+``, ``-``, ``*``, ``/``, ``%``, ``;``, ``:``,
      ``!``, ``&``, ``|``, ``^``, ``~``, ``>``, ``<``, ``=``, ``->``, ``++``,
      ``--``, ``>>``, ``<<``, ``<=``, ``>=``, ``==``, ``!=``, ``&&``, ``||``,
      ``+=``, ``-=``, ``*=``, ``/=``, ``%=``, ``&=``, ``|=``, or ``^=``.

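
      The list does not say how overlapping punctuators such as ``-`` and ``->`` are resolved.
      Assuming C-style longest match ("maximal munch"), a scanner can simply try longer punctuators first, as in this sketch; ``split_punctuators`` is a hypothetical helper.

      .. code-block:: python

         # The punctuators listed above, sorted longest first so that a greedy scan
         # prefers "->" to "-" (assumed C-style maximal munch).
         PUNCTUATORS = sorted(
             (
                 "[ ] ( ) { } . , + - * / % ; : ! & | ^ ~ > < = "
                 "-> ++ -- >> << <= >= == != && || "
                 "+= -= *= /= %= &= |= ^="
             ).split(),
             key=len,
             reverse=True,
         )


         def split_punctuators(text: str) -> list[str]:
             """Split a run of punctuation characters into punctuators, longest match first."""
             result, i = [], 0
             while i < len(text):
                 for punct in PUNCTUATORS:
                     if text.startswith(punct, i):
                         result.append(punct)
                         i += len(punct)
                         break
                 else:
                     raise ValueError(f"no punctuator starts at offset {i}")
             return result


         print(split_punctuators("->"))   # ['->']
         print(split_punctuators("---"))  # ['--', '-']
         print(split_punctuators("<<="))  # ['<<', '=']

      Under this assumption ``<<=`` scans as ``<<`` followed by ``=``, since ``<<=`` is not itself in the list.
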
   whitespace
      A nonempty sequence of characters, each of which has a Unicode general category of either Control (``Cc``) or Separator (``Z``).
      Separates tokens.

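
      Python's ``unicodedata.category()`` reports the general category directly, so the definition translates almost word for word.
      One consequence: ``Cc`` includes characters such as NUL (U+0000) and DEL (U+007F), while the zero-width space U+200B is ``Cf`` (Format) and so is not whitespace.
      The helper names below are hypothetical.

      .. code-block:: python

         import unicodedata


         def is_whitespace_char(ch: str) -> bool:
             """True if `ch` is in general category Cc (Control) or Z* (Zs, Zl, Zp)."""
             category = unicodedata.category(ch)
             return category == "Cc" or category.startswith("Z")


         def skip_whitespace(text: str, start: int) -> int:
             """Return the index of the first non-whitespace character at or after `start`."""
             i = start
             while i < len(text) and is_whitespace_char(text[i]):
                 i += 1
             return i


         print(skip_whitespace(" \t\u00a0\nx", 0))  # 4
         print(is_whitespace_char("\u200b"))        # False (category Cf)
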
   comment
      Text that the compiler ignores.
      May be a :term:`line comment` or a :term:`block comment`.

   line comment
      A sequence of characters beginning with the characters ``//`` (outside of a :term:`string literal` or :term:`comment`) and ending with a newline character (U+000A).

   block comment
      A sequence of characters beginning with the characters ``/*`` (outside of a :term:`string literal` or :term:`comment`) and ending with the characters ``*/``.

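
      A scanner would look for comments only where a new token could begin, which keeps ``//`` and ``/*`` inside string literals from starting a comment.
      In this sketch, ``skip_comment`` is a hypothetical helper; letting a line comment also end at end of file, and treating the first ``*/`` as closing a block comment (no nesting), are readings of the definitions above rather than stated rules.

      .. code-block:: python

         def skip_comment(text: str, start: int) -> int:
             """If a comment starts at text[start], return the index just past it;
             otherwise return `start` unchanged."""
             if text.startswith("//", start):
                 end = text.find("\n", start + 2)
                 # The newline belongs to the comment; assume end of file also ends it.
                 return len(text) if end == -1 else end + 1
             if text.startswith("/*", start):
                 # Search after the opening "/*" so that "/*/" is not self-closing.
                 end = text.find("*/", start + 2)
                 if end == -1:
                     raise ValueError("unterminated block comment")
                 return end + 2
             return start


         print(skip_comment("// hi\nx", 0))    # 6
         print(skip_comment("/* a */x", 0))    # 7
         print(skip_comment("/* /* */ */", 0))  # 8
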