Scanning
--------

.. glossary::

    token
        A single atomic unit in a Crowbar source file.
        May be a :term:`keyword`, an :term:`identifier`, a :term:`constant`,
        a :term:`string literal`, or a :term:`punctuator`.
        Keywords, identifiers, and constants (except for :term:`character constant`\ s) must be separated from one another by whitespace or a comment.
        Punctuators, string literals, and character constants do not require explicit separation from adjacent tokens.

    keyword
        One of the literal words ``bool``, ``break``,
        ``case``, ``const``, ``continue``,
        ``default``, ``do``,
        ``else``, ``enum``,
        ``false``, ``float32``, ``float64``, ``for``, ``fragile``, ``function``,
        ``if``, :crowbar:ref:`include <IncludeStatement>`, ``int8``, ``int16``, ``int32``, ``int64``, ``intaddr``, ``intmax``, ``intsize``,
        ``long``,
        ``opaque``,
        ``return``,
        ``short``, ``sizeof``, ``struct``, ``switch``,
        ``true``,
        ``uint8``, ``uint16``, ``uint32``, ``uint64``, ``uintaddr``, ``uintmax``, ``uintsize``, ``union``,
        ``void``,
        or ``while``.

    identifier
        A nonempty sequence of characters blah blah blah

        .. todo::

            figure out https://www.unicode.org/reports/tr31/tr31-33.html

    constant
        A numeric (or numeric-equivalent) value specified directly within the code.
        May be a :term:`decimal constant`, a :term:`binary constant`, an :term:`octal constant`,
        a :term:`hexadecimal constant`, a :term:`floating-point constant`, a :term:`hexadecimal floating-point constant`,
        or a :term:`character constant`.
        Any of these except for the character constant may contain underscores; these are ignored by the compiler and only meaningful to humans reading the code.

    decimal constant
        A sequence of characters matching the regular expression ``[0-9_]+``.
        Denotes the numeric value of the given sequence of decimal digits.

    binary constant
        A sequence of characters matching the regular expression ``0[bB][01_]+``.
        Denotes the numeric value of the given sequence of binary digits (after the ``0[bB]`` prefix has been removed).

    octal constant
        A sequence of characters matching the regular expression ``0o[0-7_]+``.
        Denotes the numeric value of the given sequence of octal digits (after the ``0o`` prefix has been removed).

    hexadecimal constant
        A sequence of characters matching the regular expression ``0[xX][0-9a-fA-F_]+``.
        Denotes the numeric value of the given sequence of hexadecimal digits (after the ``0[xX]`` prefix has been removed).
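
The four integer forms above can be illustrated with a small evaluator.
This is a minimal sketch in Python, not part of Crowbar itself; the name ``integer_constant_value`` is hypothetical.
It assumes that underscores are also allowed in hexadecimal constants (as the :term:`constant` entry states for every non-character constant) and that a match with no digits at all, such as ``_``, has no value.

```python
import re

# Patterns follow the glossary regular expressions. Admitting "_" in the
# hexadecimal pattern is an assumption taken from the "constant" entry.
INTEGER_PATTERNS = [
    (re.compile(r"0[bB][01_]+"), 2),
    (re.compile(r"0o[0-7_]+"), 8),
    (re.compile(r"0[xX][0-9a-fA-F_]+"), 16),
    (re.compile(r"[0-9_]+"), 10),
]


def integer_constant_value(text):
    """Return the value of an integer constant, or None if nothing matches."""
    for pattern, base in INTEGER_PATTERNS:
        if pattern.fullmatch(text):
            # Drop the two-character prefix for every base except decimal.
            digits = text if base == 10 else text[2:]
            # Underscores carry no meaning, so remove them before evaluating.
            digits = digits.replace("_", "")
            # Assumption: a match made only of underscores has no value.
            return int(digits, base) if digits else None
    return None
```

Under these patterns ``017`` is a decimal constant with the value seventeen, since only the ``0o`` prefix introduces an octal constant.

.. glossary::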

    floating-point constant
        A sequence of characters matching the regular expression ``[0-9_]+\.[0-9_]+([eE][+-]?[0-9_]+)?``.

        .. note::

            Unlike in C and many other languages, ``6e3`` in Crowbar is not a valid floating-point constant.
            The Crowbar-compatible spelling is ``6.0e3``.

        Denotes the numeric value of the given decimal number, optionally expressed in scientific notation.
        That is, ``XeY`` denotes :math:`X \times 10^Y`.

    hexadecimal floating-point constant
        A sequence of characters matching the regular expression ``0(fx|FX)[0-9a-fA-F_]+\.[0-9a-fA-F_]+[pP][+-]?[0-9_]+``.
        Denotes the numeric value of the given hexadecimal number expressed in binary scientific notation.
        That is, ``XpY`` denotes :math:`X \times 2^Y`.
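
The two floating-point forms above can be checked and evaluated along the same lines.
The following is a rough Python sketch, not a reference implementation; ``float_constant_value`` is a hypothetical name.
It relies on Python's built-in ``float`` and ``float.fromhex``, and rewriting the ``0fx`` prefix to the ``0x`` that ``float.fromhex`` expects is purely a convenience of this sketch.

```python
import re

# Both patterns are copied from the glossary entries.
FLOAT = re.compile(r"[0-9_]+\.[0-9_]+([eE][+-]?[0-9_]+)?")
HEX_FLOAT = re.compile(r"0(fx|FX)[0-9a-fA-F_]+\.[0-9a-fA-F_]+[pP][+-]?[0-9_]+")


def float_constant_value(text):
    """Return the value of a (hexadecimal) floating-point constant, else None."""
    try:
        if FLOAT.fullmatch(text):
            return float(text.replace("_", ""))
        if HEX_FLOAT.fullmatch(text):
            # Swap the three-character prefix for the "0x" float.fromhex reads.
            return float.fromhex("0x" + text[3:].replace("_", ""))
    except ValueError:
        # Assumption: a match with no digits left after removing
        # underscores (for example "_._") has no value.
        return None
    return None
```

For instance, ``0fx1.8p3`` is hexadecimal ``1.8`` (decimal 1.5) scaled by :math:`2^3`, which gives 12, while ``6e3`` is rejected because it has no fractional part.

.. glossary::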

    character constant
        A pair of single quotes ``'`` surrounding either a single character or an :term:`escape sequence`.
        The single character may not be a single quote or a backslash ``\``.
        Denotes the Unicode scalar value for either the single surrounded character or the character denoted by the escape sequence.

    escape sequence
        One of the following pairs of characters:

        * ``\'``, denoting the single quote ``'``
        * ``\"``, denoting the double quote ``"``
        * ``\\``, denoting the backslash ``\``
        * ``\r``, denoting the carriage return (U+000D)
        * ``\n``, denoting the line feed, or newline (U+000A)
        * ``\t``, denoting the (horizontal) tab (U+0009)
        * ``\0``, denoting a null character (U+0000)

        Or a sequence of characters matching one of the following regular expressions:

        * ``\\x[0-9a-fA-F]{2}``, denoting the numeric value of the given two hexadecimal digits
        * ``\\x[0-9a-fA-F]{4}``, denoting the numeric value of the given four hexadecimal digits
        * ``\\x[0-9a-fA-F]{8}``, denoting the numeric value of the given eight hexadecimal digits
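
As an illustration of how these escape sequences combine with the :term:`character constant` rules, here is a small Python sketch.
The helper name ``character_constant_value`` is hypothetical.
The sketch tells the three ``\x`` forms apart by length alone, which works because the closing quote bounds the constant, and it does not check that the result is a valid Unicode scalar value.

```python
import re

# The seven two-character escape sequences and the characters they denote.
SIMPLE_ESCAPES = {
    "\\'": "'", '\\"': '"', "\\\\": "\\",
    "\\r": "\r", "\\n": "\n", "\\t": "\t", "\\0": "\0",
}
# "\x" followed by exactly two, four, or eight hexadecimal digits.
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2}|[0-9a-fA-F]{4}|[0-9a-fA-F]{8})")


def character_constant_value(text):
    """Return the value denoted by a character constant, else None."""
    if len(text) < 3 or text[0] != "'" or text[-1] != "'":
        return None
    body = text[1:-1]
    if len(body) == 1:
        # A lone character may be anything except a single quote or backslash.
        return None if body in "'\\" else ord(body)
    if body in SIMPLE_ESCAPES:
        return ord(SIMPLE_ESCAPES[body])
    match = HEX_ESCAPE.fullmatch(body)
    return int(match.group(1), 16) if match else None
```

So ``'a'`` yields 97, ``'\n'`` yields 10, and ``'\x41'`` yields 65, while ``'''`` and ``'ab'`` are rejected.

.. glossary::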

    string literal
        A pair of double quotes ``"`` surrounding a sequence whose elements are either single characters or escape sequences.
        No single-character element may be the double quote or the backslash.
        Denotes the UTF-8 encoding of the sequence of characters specified between the quotes, whether directly or via escape sequences.

    punctuator
        One of the literal sequences of characters ``[``, ``]``, ``(``, ``)``,
        ``{``, ``}``, ``.``, ``,``, ``+``, ``-``, ``*``, ``/``, ``%``, ``;``,
        ``!``, ``&``, ``|``, ``^``, ``~``, ``>``, ``<``, ``=``, ``->``, ``++``,
        ``--``, ``>>``, ``<<``, ``<=``, ``>=``, ``==``, ``!=``, ``&&``, ``||``,
        ``+=``, ``-=``, ``*=``, ``/=``, ``%=``, ``&=``, ``|=``, or ``^=``.

    whitespace
        A nonempty sequence of characters, each of which has a Unicode general category of either Control (``Cc``) or Separator (``Z``).
        Separates tokens.
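
The whitespace rule above can be tested directly against the Unicode character database.
The sketch below uses Python's standard ``unicodedata`` module; the function names are hypothetical, and the result depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def is_whitespace_char(ch):
    """True if ch is in category Cc or in a Separator category (Zs, Zl, Zp)."""
    category = unicodedata.category(ch)
    return category == "Cc" or category.startswith("Z")


def is_whitespace(text):
    """True if text is a nonempty run of whitespace characters."""
    return len(text) > 0 and all(is_whitespace_char(ch) for ch in text)
```

Under this rule every control character counts as whitespace, including U+0000, while a format character such as the zero width space (U+200B, category ``Cf``) does not.

.. glossary::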

    comment
        Text that the compiler should ignore.
        May be a :term:`line comment` or a :term:`block comment`.

    line comment
        A sequence of characters beginning with the characters ``//`` (outside of a :term:`string literal` or :term:`comment`) and ending with a newline character (U+000A).

    block comment
        A sequence of characters beginning with the characters ``/*`` (outside of a :term:`string literal` or :term:`comment`) and ending with the characters ``*/``.
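
The two comment forms can be illustrated with a small pass that removes comments before tokens are scanned.
This is a minimal Python sketch under several assumptions that the text above does not state: ``strip_comments`` is a hypothetical name, character constants are skipped in the same way as string literals, a block comment is replaced by one space, a line comment by its newline, and an unterminated comment runs to the end of the input.

```python
def strip_comments(source):
    """Remove comments, leaving string literals and character constants intact."""
    out = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch in "\"'":
            # Copy a quoted token verbatim, stepping over backslash escapes.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(source[i:j])
            i = j
        elif source.startswith("//", i):
            # A line comment runs through the next U+000A.
            end = source.find("\n", i)
            if end == -1:
                i = n
            else:
                out.append("\n")
                i = end + 1
        elif source.startswith("/*", i):
            # A block comment ends at the first "*/"; it does not nest.
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
            out.append(" ")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Because ``//`` and ``/*`` only open a comment outside of a string literal or another comment, ``"// text"`` is left untouched, and a ``/*`` that appears inside a line comment does not start a block comment.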