| author | Melody Horn <melody@boringcactus.com> | 2020-10-29 19:32:38 -0600 | 
|---|---|---|
| committer | Melody Horn <melody@boringcactus.com> | 2020-10-29 19:32:38 -0600 | 
| commit | 673ac6bd5696bce4c9f18d39b0cecd5db1aa8f22 (patch) | |
| tree | 1f661e57f1b8c9b0219be7a30d08fc60f8d9fe13 | |
| parent | 158c4ae02f902012e09e20434a0b26790a804b35 (diff) | |
| download | spec-673ac6bd5696bce4c9f18d39b0cecd5db1aa8f22.tar.gz spec-673ac6bd5696bce4c9f18d39b0cecd5db1aa8f22.zip | |
finish moving tokens to new format
| -rw-r--r-- | language/scanning.rst | 91 |
| -rw-r--r-- | syntax.md | 2 | 
2 files changed, 82 insertions, 11 deletions
diff --git a/language/scanning.rst b/language/scanning.rst
index 7a7b7d3..86177ac 100644
--- a/language/scanning.rst
+++ b/language/scanning.rst
@@ -3,6 +3,13 @@ Scanning
 
 .. glossary::
 
+    token
+        A single atomic unit in a Crowbar source file.
+        May be a :term:`keyword`, an :term:`identifier`, a :term:`constant`,
+        a :term:`string literal`, or a :term:`punctuator`.
+        Keywords, identifiers, and constants (except for :term:`character constant`\ s) must have either whitespace or a comment separating them.
+        Punctuators, string literals, and character constants do not require explicit separation from adjacent tokens.
+
     keyword
         One of the literal words ``bool``, :crowbar:ref:`break`, ``case``,
         ``char``, ``const``, ``continue``, ``default``, ``do``, ``double``,
@@ -17,26 +24,90 @@ Scanning
         .. todo::
 
             figure out https://www.unicode.org/reports/tr31/tr31-33.html
+    
+    constant
+        A numeric (or numeric-equivalent) value specified directly within the code.
+        May be a :term:`decimal constant`, a :term:`binary constant` , an :term:`octal constant`,
+        a :term:`hexadecimal constant`, a :term:`floating-point constant`, a :term:`hexadecimal floating-point constant`,
+        or a :term:`character constant`.
+        Any of these except for the character constant may contain underscores; these are ignored by the compiler and only meaningful to humans reading the code.
 
     decimal constant
         A sequence of characters matching the regular expression ``[0-9_]+``.
         Denotes the numeric value of the given sequence of decimal digits.
-        Underscores are ignored by the compiler, but may be useful separators for other readers.
-    
+
     binary constant
         A sequence of characters matching the regular expression ``0[bB][01_]+``.
         Denotes the numeric value of the given sequence of binary digits (after the ``0[bB]`` prefix has been removed).
-        Underscores are ignored by the compiler, but may be useful separators for other readers.
-    
+
     octal constant
         A sequence of characters matching the regular expression ``0o[0-7_]+``.
         Denotes the numeric value of the given sequence of octal digits (after the ``0o`` prefix has been removed).
-        Underscores are ignored by the compiler, but may be useful separators for other readers.
 
-    token
-        A single atomic unit in a Crowbar source file.
-        Has one (and exactly one) of the following types.
+    hexadecimal constant
+        A sequence of characters matching the regular expression ``0[xX][0-9a-fA-F]+``.
+        Denotes the numeric value of the given sequence of hexadecimal digits (after the ``0[xX]`` prefix has been removed).
+
+    floating-point constant
+        A sequence of characters matching the regular expression ``[0-9_]+\.[0-9_]+([eE][+-]?[0-9_]+)?``.
+        
+        .. note::
 
-.. todo::
+            Unlike in C and many other languages, ``6e3`` in Crowbar is not a valid floating-point constant.
+            The Crowbar-compatible spelling is ``6.0e3``.
+        
+        Denotes the numeric value of the given decimal number, optionally expressed in scientific notation.
+        That is, ``XeY`` denotes :math:`X * 10^Y`.
 
-    finish transcribing token definitions
+    hexadecimal floating-point constant
+        A sequence of characters matching the regular expression ``0(fx|FX)[0-9a-fA-F_]+\.[0-9a-fA-F_]+[pP][+-]?[0-9_]+``.
+        Denotes the numeric value of the given hexadecimal number expressed in binary scientific notation.
+        That is, ``XpY`` denotes :math:`X * 2^Y`.
+    
+    character constant
+        A pair of single quotes ``'`` surrounding either a single character or an :term:`escape sequence`.
+        The single character may not be a single quote or a backslash ``\``.
+        Denotes the Unicode code point number for either the single surrounded character or the character denoted by the escape sequence.
+    
+    escape sequence
+        One of the following pairs of characters:
+
+        * ``\'``, denoting the single quote ``'``
+        * ``\"``, denoting the double quote ``"``
+        * ``\\``, denoting the backslash ``\``
+        * ``\r``, denoting the carriage return (U+000D)
+        * ``\n``, denoting the line feed, or newline (U+000A)
+        * ``\t``, denoting the (horizontal) tab (U+0009)
+        * ``\0``, denoting a null character (U+0000)
+        
+        Or a sequence of characters matching one of the following regular expressions:
+
+        * ``\\x[0-9a-fA-F]{2}``, denoting the numeric value of the given two hexadecimal digits
+        * ``\\x[0-9a-fA-F]{4}``, denoting the numeric value of the given four hexadecimal digits
+        * ``\\x[0-9a-fA-F]{8}``, denoting the numeric value of the given eight hexadecimal digits
+
+    string literal
+        A pair of double quotes ``"`` surrounding a sequence whose elements are either single characters or escape sequences.
+        No single-character element may be the double quote or the backslash.
+        Denotes the UTF-8-encoded sequence of bytes representing the sequence of characters which, either directly or via an escape sequence, are specified between the quotes.
+
+    punctuator
+        One of the literal sequences of characters ``[``, ``]``, ``(``, ``)``,
+        ``{``, ``}``, ``.``, ``,``, ``+``, ``-``, ``*``, ``/``, ``%``, ``;``,
+        ``!``, ``&``, ``|``, ``^``, ``~``, ``>``, ``<``, ``=``, ``->``, ``++``,
+        ``--``, ``>>``, ``<<``, ``<=``, ``>=``, ``==``, ``!=``, ``&&``, ``||``,
+        ``+=``, ``-=``, ``*=``, ``/=``, ``%=``, ``&=``, ``|=``, or ``^=``.
+
+    whitespace
+        A nonempty sequence of characters that each has a Unicode general category of either Control (``Cc``) or Separator (``Z``).
+        Separates tokens.
+    
+    comment
+        Text that the compiler should ignore.
+        May be a :term:`line comment` or a :term:`block comment`.
+    
+    line comment
+        A sequence of characters beginning with the characters ``//`` (outside of a :term:`string literal` or :term:`comment`) and ending with a newline character U+000A.
+    
+    block comment
+        A sequence of characters beginning with the characters ``/*`` (outside of a :term:`string literal` or :term:`comment`) and ending with the characters ``*/``.
@@ -1,4 +1,4 @@
-# Syntax
+# Syntax (old)
 
 The syntax of Crowbar mostly matches the syntax of C, with fewer obscure/advanced/edge case features.
 
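As a rough way to sanity-check the numeric constant definitions added in this commit, the regular expressions can be tried against sample inputs. The Python sketch below copies the patterns verbatim from the glossary entries; the names `CONSTANT_PATTERNS`, `classify_constant`, and `integer_value` are hypothetical helpers invented for this illustration and are not part of the Crowbar spec or any Crowbar tooling.

```python
import re

# Regular expressions copied from the glossary entries in this diff.
# The dictionary layout and the helper names are assumptions for this sketch.
CONSTANT_PATTERNS = {
    "decimal constant": r"[0-9_]+",
    "binary constant": r"0[bB][01_]+",
    "octal constant": r"0o[0-7_]+",
    "hexadecimal constant": r"0[xX][0-9a-fA-F]+",
    "floating-point constant": r"[0-9_]+\.[0-9_]+([eE][+-]?[0-9_]+)?",
    "hexadecimal floating-point constant":
        r"0(fx|FX)[0-9a-fA-F_]+\.[0-9a-fA-F_]+[pP][+-]?[0-9_]+",
}

# Base and prefix length for each integer-valued constant kind.
INTEGER_BASES = {
    "decimal constant": (10, 0),
    "binary constant": (2, 2),
    "octal constant": (8, 2),
    "hexadecimal constant": (16, 2),
}


def classify_constant(text):
    """Return the glossary term whose pattern matches all of text, or None."""
    for name, pattern in CONSTANT_PATTERNS.items():
        if re.fullmatch(pattern, text):
            return name
    return None


def integer_value(text):
    """Numeric value of an integer constant: drop the prefix, ignore underscores."""
    name = classify_constant(text)
    if name not in INTEGER_BASES:
        raise ValueError(f"not an integer constant: {text!r}")
    base, prefix_len = INTEGER_BASES[name]
    digits = text[prefix_len:].replace("_", "")
    return int(digits, base)


if __name__ == "__main__":
    for sample in ["1_000", "0b1010_1010", "0o17", "0xFF", "6e3", "6.0e3", "0xFF_FF"]:
        print(sample, "->", classify_constant(sample))
```

With the patterns exactly as written in the diff, `6e3` matches nothing while `6.0e3` is a floating-point constant, which agrees with the note in the floating-point entry. The sketch also shows that `0xFF_FF` matches no pattern, because the hexadecimal constant expression as committed does not include `_`, even though the `constant` entry says every kind except the character constant may contain underscores.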