Scanning
--------

.. glossary::

    token
        A single atomic unit in a Crowbar source file.
        May be a :term:`keyword`, an :term:`identifier`, a :term:`constant`,
        a :term:`string literal`, or a :term:`punctuator`.
        Adjacent keywords, identifiers, and constants (except for :term:`character constant`\ s) must be separated from one another by whitespace or a comment.
        Punctuators, string literals, and character constants do not require explicit separation from adjacent tokens.

    keyword
        One of the literal words ``bool``, :crowbar:ref:`break`, ``case``,
        ``char``, ``const``, ``continue``, ``default``, ``do``, ``double``,
        ``else``, ``enum``, ``extern``, ``float``, ``for``, ``fragile``,
        ``function``, ``if``, ``include``, ``int``, ``long``, ``return``,
        ``short``, ``signed``, ``sizeof``, ``struct``, ``switch``,
        ``unsigned``, ``void``, or ``while``.
    
    identifier
        A nonempty sequence of characters.
        The exact set of permitted characters has not yet been specified.

        .. todo::

            figure out https://www.unicode.org/reports/tr31/tr31-33.html
    
    constant
        A numeric (or numeric-equivalent) value specified directly within the code.
        May be a :term:`decimal constant`, a :term:`binary constant`, an :term:`octal constant`,
        a :term:`hexadecimal constant`, a :term:`floating-point constant`, a :term:`hexadecimal floating-point constant`,
        or a :term:`character constant`.
        Any of these except for the character constant may contain underscores; the compiler ignores them, and they serve only as visual separators for humans reading the code.

    decimal constant
        A sequence of characters matching the regular expression ``[0-9_]+``.
        Denotes the numeric value of the given sequence of decimal digits.

    binary constant
        A sequence of characters matching the regular expression ``0[bB][01_]+``.
        Denotes the numeric value of the given sequence of binary digits (after the ``0[bB]`` prefix has been removed).

    octal constant
        A sequence of characters matching the regular expression ``0o[0-7_]+``.
        Denotes the numeric value of the given sequence of octal digits (after the ``0o`` prefix has been removed).

    hexadecimal constant
        A sequence of characters matching the regular expression ``0[xX][0-9a-fA-F_]+``.
        Denotes the numeric value of the given sequence of hexadecimal digits (after the ``0[xX]`` prefix has been removed).
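
        The four integer forms (decimal, binary, octal, and hexadecimal) could be evaluated along the following lines.
        This is a minimal Python sketch rather than part of Crowbar: ``INTEGER_FORMS`` and ``integer_constant_value`` are hypothetical names, and the hexadecimal pattern accepts underscores on the assumption that the general rule for constants applies to it as well.

        .. code-block:: python

            import re

            # (pattern, base, prefix length). Hypothetical table for illustration.
            # Prefixed forms are tried first so that "0b1" is never read as decimal.
            INTEGER_FORMS = [
                (re.compile(r"0[bB][01_]+"), 2, 2),
                (re.compile(r"0o[0-7_]+"), 8, 2),
                # Assumption: underscores are allowed in hexadecimal constants too.
                (re.compile(r"0[xX][0-9a-fA-F_]+"), 16, 2),
                (re.compile(r"[0-9_]+"), 10, 0),
            ]


            def integer_constant_value(text):
                """Return the value of an integer constant, or None if text is not one."""
                for pattern, base, prefix_len in INTEGER_FORMS:
                    if pattern.fullmatch(text):
                        # Drop the prefix, then the underscores, before converting.
                        digits = text[prefix_len:].replace("_", "")
                        # A spelling made only of underscores has no digits to
                        # evaluate; rejecting it is an assumption of this sketch.
                        return int(digits, base) if digits else None
                return None

        For example, ``integer_constant_value("1_000")`` and ``integer_constant_value("0x3_E8")`` both yield ``1000``, while ``integer_constant_value("0O17")`` yields ``None`` because the octal prefix is lowercase only.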

    floating-point constant
        A sequence of characters matching the regular expression ``[0-9_]+\.[0-9_]+([eE][+-]?[0-9_]+)?``.
        
        .. note::

            Unlike in C and many other languages, ``6e3`` in Crowbar is not a valid floating-point constant.
            The Crowbar-compatible spelling is ``6.0e3``.
        
        Denotes the numeric value of the given decimal number, optionally expressed in scientific notation.
        That is, ``XeY`` denotes :math:`X \times 10^Y`.

    hexadecimal floating-point constant
        A sequence of characters matching the regular expression ``0(fx|FX)[0-9a-fA-F_]+\.[0-9a-fA-F_]+[pP][+-]?[0-9_]+``.
        Denotes the numeric value of the given hexadecimal number expressed in binary scientific notation.
        That is, ``XpY`` denotes :math:`X \times 2^Y`.
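
        As an illustration, both floating-point forms could be evaluated as sketched below.
        This is a Python sketch under stated assumptions, not a reference implementation: ``float_constant_value`` and its helper are hypothetical names, and a digit group consisting only of underscores is treated as invalid, which the text above does not settle.

        .. code-block:: python

            import re

            DECIMAL_FLOAT = re.compile(r"([0-9_]+)\.([0-9_]+)(?:[eE]([+-]?[0-9_]+))?")
            HEX_FLOAT = re.compile(
                r"0(?:fx|FX)([0-9a-fA-F_]+)\.([0-9a-fA-F_]+)[pP]([+-]?[0-9_]+)"
            )


            def _digits(group):
                """Remove underscores; return None if no digits remain (assumption)."""
                stripped = group.replace("_", "")
                return stripped if stripped.strip("+-") else None


            def float_constant_value(text):
                """Return the value of a floating-point constant, or None."""
                m = DECIMAL_FLOAT.fullmatch(text)
                if m:
                    # A missing exponent is equivalent to an exponent of zero.
                    parts = [_digits(g) for g in (m.group(1), m.group(2), m.group(3) or "0")]
                    if None in parts:
                        return None
                    return float("{}.{}e{}".format(*parts))
                m = HEX_FLOAT.fullmatch(text)
                if m:
                    parts = [_digits(g) for g in m.groups()]
                    if None in parts:
                        return None
                    # Python spells the prefix "0x"; the digits and exponent carry over.
                    return float.fromhex("0x{}.{}p{}".format(*parts))
                return None

        Here ``float_constant_value("6.0e3")`` gives ``6000.0`` and ``float_constant_value("0fx1.8p1")`` gives ``3.0``, whereas ``float_constant_value("6e3")`` gives ``None`` because the decimal point is mandatory.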
    
    character constant
        A pair of single quotes ``'`` surrounding either a single character or an :term:`escape sequence`.
        The single character may not be a single quote or a backslash ``\``.
        Denotes the Unicode code point number for either the single surrounded character or the character denoted by the escape sequence.
    
    escape sequence
        One of the following pairs of characters:

        * ``\'``, denoting the single quote ``'``
        * ``\"``, denoting the double quote ``"``
        * ``\\``, denoting the backslash ``\``
        * ``\r``, denoting the carriage return (U+000D)
        * ``\n``, denoting the line feed, or newline (U+000A)
        * ``\t``, denoting the (horizontal) tab (U+0009)
        * ``\0``, denoting a null character (U+0000)
        
        Or a sequence of characters matching one of the following regular expressions:

        * ``\\x[0-9a-fA-F]{2}``, denoting the numeric value of the given two hexadecimal digits
        * ``\\x[0-9a-fA-F]{4}``, denoting the numeric value of the given four hexadecimal digits
        * ``\\x[0-9a-fA-F]{8}``, denoting the numeric value of the given eight hexadecimal digits
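
        Combined with the rules for a :term:`character constant`, the escape sequences above could be decoded roughly as follows.
        This is a Python sketch only; ``SIMPLE_ESCAPES``, ``HEX_ESCAPE``, and ``character_constant_value`` are hypothetical names.
        It relies on the closing quote to decide how many hexadecimal digits an escape has, so it covers character constants and not string literals.

        .. code-block:: python

            import re

            # The character after the backslash, mapped to the character it denotes.
            SIMPLE_ESCAPES = {
                "'": "'", '"': '"', "\\": "\\",
                "r": "\r", "n": "\n", "t": "\t", "0": "\0",
            }
            # Exactly two, four, or eight hexadecimal digits after "\x".
            HEX_ESCAPE = re.compile(
                r"\\x([0-9a-fA-F]{2}|[0-9a-fA-F]{4}|[0-9a-fA-F]{8})"
            )


            def character_constant_value(text):
                """Return the code point denoted by a character constant, or None."""
                if len(text) < 3 or text[0] != "'" or text[-1] != "'":
                    return None
                body = text[1:-1]
                if len(body) == 1:
                    # A bare single quote or backslash is not allowed.
                    return None if body in "'\\" else ord(body)
                if len(body) == 2 and body[0] == "\\" and body[1] in SIMPLE_ESCAPES:
                    return ord(SIMPLE_ESCAPES[body[1]])
                m = HEX_ESCAPE.fullmatch(body)
                return int(m.group(1), 16) if m else None

        For instance, the constants ``'a'``, ``'\n'``, and ``'\x41'`` evaluate to 97, 10, and 65 respectively.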

    string literal
        A pair of double quotes ``"`` surrounding a sequence whose elements are either single characters or escape sequences.
        No single-character element may be the double quote or the backslash.
        Denotes the UTF-8-encoded sequence of bytes representing the sequence of characters which, either directly or via an escape sequence, are specified between the quotes.

    punctuator
        One of the literal sequences of characters ``[``, ``]``, ``(``, ``)``,
        ``{``, ``}``, ``.``, ``,``, ``+``, ``-``, ``*``, ``/``, ``%``, ``;``,
        ``!``, ``&``, ``|``, ``^``, ``~``, ``>``, ``<``, ``=``, ``->``, ``++``,
        ``--``, ``>>``, ``<<``, ``<=``, ``>=``, ``==``, ``!=``, ``&&``, ``||``,
        ``+=``, ``-=``, ``*=``, ``/=``, ``%=``, ``&=``, ``|=``, or ``^=``.

    whitespace
        A nonempty sequence of characters that each has a Unicode general category of either Control (``Cc``) or Separator (``Z``).
        Separates tokens.
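
        One way to test this definition is with Python's ``unicodedata`` module, as in the sketch below.
        The function name ``is_crowbar_whitespace`` is hypothetical, and the result depends on the Unicode version bundled with the Python interpreter.

        .. code-block:: python

            import unicodedata


            def is_crowbar_whitespace(text):
                """True if text is nonempty and every character is Cc or a Z* category."""
                return bool(text) and all(
                    # Separator (Z) covers the subcategories Zs, Zl, and Zp.
                    unicodedata.category(ch) == "Cc"
                    or unicodedata.category(ch).startswith("Z")
                    for ch in text
                )

        Under this reading, ``" \t\n"`` and the no-break space U+00A0 are whitespace, while the zero-width space U+200B (category ``Cf``) is not.
        Every control character, including U+0000, also qualifies.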
    
    comment
        Text that the compiler should ignore.
        May be a :term:`line comment` or a :term:`block comment`.
    
    line comment
        A sequence of characters beginning with the characters ``//`` (outside of a :term:`string literal` or :term:`comment`) and ending with a newline character U+000A.
    
    block comment
        A sequence of characters beginning with the characters ``/*`` (outside of a :term:`string literal` or :term:`comment`) and ending with the characters ``*/``.
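
        Both comment forms could be removed ahead of tokenizing roughly as sketched below.
        This is a hypothetical Python helper (``strip_comments``), not a description of any Crowbar compiler.
        It also skips over character constants so that ``'"'`` is not mistaken for the start of a string literal, and it lets an unterminated comment run to the end of the input; neither choice is spelled out above.

        .. code-block:: python

            def strip_comments(source):
                """Replace each comment with whitespace, leaving literals untouched."""
                out = []
                i, n = 0, len(source)
                while i < n:
                    ch = source[i]
                    if ch in "\"'":
                        # Copy a string literal or character constant verbatim,
                        # stepping over backslash escapes so that an escaped
                        # quote does not end the literal early.
                        j = i + 1
                        while j < n and source[j] != ch:
                            j += 2 if source[j] == "\\" else 1
                        j = min(j + 1, n)
                        out.append(source[i:j])
                        i = j
                    elif source.startswith("//", i):
                        # A line comment ends at the next U+000A; the newline is
                        # kept so that line numbers stay the same.
                        end = source.find("\n", i)
                        if end == -1:
                            i = n
                        else:
                            out.append("\n")
                            i = end + 1
                    elif source.startswith("/*", i):
                        # Block comments do not nest: the first "*/" closes it.
                        end = source.find("*/", i + 2)
                        i = n if end == -1 else end + 2
                        out.append(" ")
                    else:
                        out.append(ch)
                        i += 1
                return "".join(out)

        Replacing a block comment with a single space keeps ``int/**/x`` as two tokens, ``int x``, consistent with a comment being able to separate tokens.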