Diffstat (limited to 'doc/manual/lexing.txt')
-rw-r--r-- doc/manual/lexing.txt 417
1 files changed, 0 insertions, 417 deletions
diff --git a/doc/manual/lexing.txt b/doc/manual/lexing.txt
deleted file mode 100644
index 7ffd5eb1c..000000000
--- a/doc/manual/lexing.txt
+++ /dev/null
@@ -1,417 +0,0 @@
-Lexical Analysis
-================
-
-Encoding
---------
-
-All Nim source files are in the UTF-8 encoding (or its ASCII subset). Other
-encodings are not supported. Any of the standard platform line termination
-sequences can be used - the Unix form using ASCII LF (linefeed), the Windows
-form using the ASCII sequence CR LF (return followed by linefeed), or the old
-Macintosh form using the ASCII CR (return) character. All of these forms can be
-used equally, regardless of platform.
-
-
-Indentation
------------
-
-Nim's standard grammar describes an `indentation sensitive`:idx: language.
-This means that all the control structures are recognized by indentation.
-Indentation consists only of spaces; tabulators are not allowed.
-
-The indentation handling is implemented as follows: The lexer annotates the
-following token with the preceding number of spaces; indentation is not
-a separate token. This trick allows parsing of Nim with only 1 token of
-lookahead.
-
-The parser uses a stack of indentation levels: the stack consists of integers
-counting the spaces. The indentation information is queried at strategic
-places in the parser but ignored otherwise: The pseudo terminal ``IND{>}``
-denotes an indentation that consists of more spaces than the entry at the top
-of the stack; ``IND{=}`` an indentation that has the same number of spaces. ``DED``
-is another pseudo terminal that describes the *action* of popping a value
-from the stack; ``IND{>}`` correspondingly implies pushing a value onto the stack.
-
-With this notation we can now easily define the core of the grammar: A block of
-statements (simplified example)::
-
-  ifStmt = 'if' expr ':' stmt
-           (IND{=} 'elif' expr ':' stmt)*
-           (IND{=} 'else' ':' stmt)?
-
-  simpleStmt = ifStmt / ...
-
-  stmt = IND{>} stmt ^+ IND{=} DED  # list of statements
-       / simpleStmt                 # or a simple statement
-
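-The stack handling described above can be sketched in Nim itself. This is only
-an illustration, not the compiler's implementation: the names ``IndentEvent``
-and ``indentEvents`` are made up for this example, the input is assumed to be
-the number of leading spaces of each line's first token, and a Nim version
-that supports the ``^1`` backwards index is assumed.
-
-.. code-block:: nim
-  type
-    IndentEvent = enum
-      indGreater, indEqual, dedent   # stand-ins for IND{>}, IND{=} and DED
-
-  proc indentEvents(indents: openArray[int]): seq[IndentEvent] =
-    result = @[]
-    var stack = @[0]                 # open indentation levels, innermost last
-    for n in indents:
-      if n > stack[^1]:
-        stack.add n                  # IND{>}: push the deeper level
-        result.add indGreater
-      else:
-        while n < stack[^1]:
-          discard stack.pop()        # DED: pop one level
-          result.add dedent
-        if n != stack[^1]:
-          raise newException(ValueError, "inconsistent dedent")
-        result.add indEqual          # IND{=}: same level as the stack top
-
-  doAssert(indentEvents([0, 2, 2, 0]) ==
-    @[indEqual, indGreater, indEqual, dedent, indEqual])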
-
-
-Comments
---------
-
-Comments start anywhere outside a string or character literal with the
-hash character ``#``.
-Comments consist of a concatenation of `comment pieces`:idx:. A comment piece
-starts with ``#`` and runs until the end of the line. The end of line characters
-belong to the piece. If the next line only consists of a comment piece with
-no other tokens between it and the preceding one, it does not start a new
-comment:
-
-
-.. code-block:: nim
-  i = 0     # This is a single comment over multiple lines.
-    # The scanner merges these two pieces.
-    # The comment continues here.
-
-
-`Documentation comments`:idx: are comments that start with two hash characters
-(``##``). Documentation comments are tokens; they are only allowed at certain
-places in the input file because they belong to the syntax tree.
-
-
-Multiline comments
-------------------
-
-Starting with version 0.13.0 of the language, Nim supports multiline comments.
-They look like this:
-
-.. code-block:: nim
-  #[Comment here.
-  Multiple lines
-  are not a problem.]#
-
-Multiline comments support nesting:
-
-.. code-block:: nim
-  #[  #[ Multiline comment in already
-     commented out code. ]#
-  proc p[T](x: T) = discard
-  ]#
-
-Multiline documentation comments also exist and support nesting too:
-
-.. code-block:: nim
-  proc foo =
-    ##[Long documentation comment
-    here.
-    ]##
-
-
-Identifiers & Keywords
-----------------------
-
-Identifiers in Nim can be any string of letters, digits
-and underscores, beginning with a letter. Two immediately consecutive
-underscores ``__`` are not allowed::
-
-  letter ::= 'A'..'Z' | 'a'..'z' | '\x80'..'\xff'
-  digit ::= '0'..'9'
-  IDENTIFIER ::= letter ( ['_'] (letter | digit) )*
-
-Currently any Unicode character with an ordinal value > 127 (non ASCII) is
-classified as a ``letter`` and may thus be part of an identifier but later
-versions of the language may assign some Unicode characters to belong to the
-operator characters instead.
-
-The following keywords are reserved and cannot be used as identifiers:
-
-.. code-block:: nim
-   :file: ../keywords.txt
-
-Some keywords are unused; they are reserved for future developments of the
-language.
-
-
-Identifier equality
--------------------
-
-Two identifiers are considered equal if the following algorithm returns true:
-
-.. code-block:: nim
-  import re, strutils  # provides re, replace and toLower
-
-  proc sameIdentifier(a, b: string): bool =
-    a[0] == b[0] and
-      a.replace(re"_|–", "").toLower == b.replace(re"_|–", "").toLower
-
-That means only the first letters are compared in a case-sensitive manner. Other
-letters are compared case-insensitively, and underscores and en-dashes (Unicode
-code point U+2013) are ignored.
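-
-The comparison can also be tried out without regular expressions. The
-following is a minimal sketch under some assumptions: it uses
-``toLowerAscii`` from a newer ``strutils`` (so only ASCII letters are
-lower-cased), it expects both arguments to be non-empty, and the helper
-name ``normalized`` is made up for this example.
-
-.. code-block:: nim
-  import strutils
-
-  proc normalized(s: string): string =
-    # Drop underscores and en-dashes (U+2013), then lower-case ASCII letters.
-    s.replace("_", "").replace("–", "").toLowerAscii
-
-  proc sameIdentifier(a, b: string): bool =
-    # The first character is compared exactly, the rest after normalization.
-    a[0] == b[0] and normalized(a) == normalized(b)
-
-  doAssert sameIdentifier("fooBar", "foo_bar")
-  doAssert sameIdentifier("fooBar", "foobar")
-  doAssert(not sameIdentifier("Foo", "foo"))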
-
-This rather unorthodox way to do identifier comparisons is called
-`partial case insensitivity`:idx: and has some advantages over the conventional
-case sensitivity:
-
-It allows programmers to mostly use their own preferred
-spelling style, be it humpStyle, snake_style or dash–style and libraries written
-by different programmers cannot use incompatible conventions.
-A Nim-aware editor or IDE can show the identifiers as preferred.
-Another advantage is that it frees the programmer from remembering
-the exact spelling of an identifier. The exception with respect to the first
-letter allows common code like ``var foo: Foo`` to be parsed unambiguously.
-
-Historically, Nim was a fully `style-insensitive`:idx: language. This meant that
-it was not case-sensitive, underscores were ignored, and there was not even a
-distinction between ``foo`` and ``Foo``.
-
-
-String literals
----------------
-
-Terminal symbol in the grammar: ``STR_LIT``.
-
-String literals can be delimited by matching double quotes, and can
-contain the following `escape sequences`:idx:\ :
-
-==================         ===================================================
-  Escape sequence          Meaning
-==================         ===================================================
-  ``\n``                   `newline`:idx:
-  ``\r``, ``\c``           `carriage return`:idx:
-  ``\l``                   `line feed`:idx:
-  ``\f``                   `form feed`:idx:
-  ``\t``                   `tabulator`:idx:
-  ``\v``                   `vertical tabulator`:idx:
-  ``\\``                   `backslash`:idx:
-  ``\"``                   `quotation mark`:idx:
-  ``\'``                   `apostrophe`:idx:
-  ``\`` '0'..'9'+          `character with decimal value d`:idx:;
-                           all decimal digits directly
-                           following are used for the character
-  ``\a``                   `alert`:idx:
-  ``\b``                   `backspace`:idx:
-  ``\e``                   `escape`:idx: `[ESC]`:idx:
-  ``\x`` HH                `character with hex value HH`:idx:;
-                           exactly two hex digits are allowed
-==================         ===================================================
-
-
-Strings in Nim may contain any 8-bit value, even embedded zeros. However
-some operations may interpret the first binary zero as a terminator.
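-
-A few of the escape sequences from the table can be checked with plain
-assertions. This is only a sketch of the rules above; nothing beyond
-``doAssert`` and ``len`` from the ``system`` module is assumed.
-
-.. code-block:: nim
-  doAssert "\65\x42" == "AB"     # decimal 65 is 'A', hex 42 is 'B'
-  doAssert "\r\l" == "\c\l"      # \r and \c both denote carriage return
-  doAssert "\e" == "\x1B"        # escape is byte 27
-  doAssert "\t".len == 1         # an escape sequence yields a single byte
-  doAssert "a\0b".len == 3       # an embedded zero is kept in the string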
-
-
-Triple quoted string literals
------------------------------
-
-Terminal symbol in the grammar: ``TRIPLESTR_LIT``.
-
-String literals can also be delimited by three double quotes
-``"""`` ... ``"""``.
-Literals in this form may run for several lines, may contain ``"`` and do not
-interpret any escape sequences.
-For convenience, when the opening ``"""`` is followed by a newline (there may
-be whitespace between the opening ``"""`` and the newline),
-the newline (and the preceding whitespace) is not included in the string. The
-ending of the string literal is defined by the pattern ``"""[^"]``, so this:
-
-.. code-block:: nim
-  """"long string within quotes""""
-
-Produces::
-
-  "long string within quotes"
-
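-The rules for triple quoted literals can be exercised with a few assertions.
-This is an illustrative sketch; the variable names are arbitrary.
-
-.. code-block:: nim
-  let a = """
-  one line"""
-  doAssert a == "one line"            # the newline after the opening quotes is dropped
-
-  let b = """no \n escape"""
-  doAssert b == r"no \n escape"       # the backslash and the 'n' are kept as two characters
-
-  let c = """"quoted""""
-  doAssert c == "\"quoted\""          # the closing delimiter is the last three quotes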
-
-Raw string literals
--------------------
-
-Terminal symbol in the grammar: ``RSTR_LIT``.
-
-There are also raw string literals that are preceded with the
-letter ``r`` (or ``R``) and are delimited by matching double quotes (just
-like ordinary string literals) and do not interpret the escape sequences.
-This is especially convenient for regular expressions or Windows paths:
-
-.. code-block:: nim
-
-  var f = openFile(r"C:\texts\text.txt") # a raw string, so ``\t`` is no tab
-
-To produce a single ``"`` within a raw string literal, it has to be doubled:
-
-.. code-block:: nim
-
-  r"a""b"
-
-Produces::
-
-  a"b
-
-``r""""`` is not possible with this notation, because the three leading
-quotes introduce a triple quoted string literal. ``r"""`` is the same
-as ``"""`` since triple quoted string literals do not interpret escape
-sequences either.
-
-
-Generalized raw string literals
--------------------------------
-
-Terminal symbols in the grammar: ``GENERALIZED_STR_LIT``,
-``GENERALIZED_TRIPLESTR_LIT``.
-
-The construct ``identifier"string literal"`` (without whitespace between the
-identifier and the opening quotation mark) is a
-generalized raw string literal. It is a shortcut for the construct
-``identifier(r"string literal")``, so it denotes a procedure call with a
-raw string literal as its only argument. Generalized raw string literals
-are especially convenient for embedding mini languages directly into Nim
-(for example regular expressions).
-
-The construct ``identifier"""string literal"""`` exists too. It is a shortcut
-for ``identifier("""string literal""")``.
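-
-The equivalence can be checked with a small procedure. The procedure
-``path`` below is a hypothetical helper that simply returns its argument;
-it is not part of the standard library.
-
-.. code-block:: nim
-  proc path(s: string): string = s    # hypothetical helper for illustration
-
-  # The generalized form is the same call as the explicit raw string form.
-  doAssert(path"C:\temp\new" == path(r"C:\temp\new"))
-  # No escape sequence is interpreted: the backslashes stay in the string.
-  doAssert(path"C:\temp\new" == "C:\\temp\\new")
-  # The triple quoted variant may contain double quotes.
-  doAssert(path"""a "quoted" word""" == "a \"quoted\" word")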
-
-
-Character literals
-------------------
-
-Character literals are enclosed in single quotes ``''`` and can contain the
-same escape sequences as strings - with one exception: `newline`:idx: (``\n``)
-is not allowed as it may be wider than one character (often it is the pair
-CR/LF for example).  Here are the valid `escape sequences`:idx: for character
-literals:
-
-==================         ===================================================
-  Escape sequence          Meaning
-==================         ===================================================
-  ``\r``, ``\c``           `carriage return`:idx:
-  ``\l``                   `line feed`:idx:
-  ``\f``                   `form feed`:idx:
-  ``\t``                   `tabulator`:idx:
-  ``\v``                   `vertical tabulator`:idx:
-  ``\\``                   `backslash`:idx:
-  ``\"``                   `quotation mark`:idx:
-  ``\'``                   `apostrophe`:idx:
-  ``\`` '0'..'9'+          `character with decimal value d`:idx:;
-                           all decimal digits directly
-                           following are used for the character
-  ``\a``                   `alert`:idx:
-  ``\b``                   `backspace`:idx:
-  ``\e``                   `escape`:idx: `[ESC]`:idx:
-  ``\x`` HH                `character with hex value HH`:idx:;
-                           exactly two hex digits are allowed
-==================         ===================================================
-
-A character is not a Unicode character but a single byte. The reason for this
-is efficiency: for the overwhelming majority of use-cases, the resulting
-programs will still handle UTF-8 properly as UTF-8 was specially designed for
-this. Another reason is that Nim can thus support ``array[char, int]`` or
-``set[char]`` efficiently as many algorithms rely on this feature. The ``Rune``
-type is used for Unicode characters; it can represent any Unicode character.
-``Rune`` is declared in the `unicode module <unicode.html>`_.
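-
-The difference between bytes and Unicode characters can be seen in a short
-sketch. It assumes that the source file is saved as UTF-8 with a precomposed
-``é`` (U+00E9), and it uses ``runeLen`` from the ``unicode`` module.
-
-.. code-block:: nim
-  import unicode
-
-  let s = "é"                    # one Unicode character, two UTF-8 bytes
-  doAssert s.len == 2            # len counts bytes (chars)
-  doAssert s.runeLen == 1        # runeLen counts Unicode characters
-  doAssert s[0] == '\xC3'        # first byte of the UTF-8 encoding
-
-  var seen: set[char] = {}       # a set over all 256 byte values
-  for c in "hello":
-    seen.incl c
-  doAssert 'l' in seen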
-
-
-Numerical constants
--------------------
-
-Numerical constants are of a single type and have the form::
-
-  hexdigit = digit | 'A'..'F' | 'a'..'f'
-  octdigit = '0'..'7'
-  bindigit = '0'..'1'
-  HEX_LIT = '0' ('x' | 'X' ) hexdigit ( ['_'] hexdigit )*
-  DEC_LIT = digit ( ['_'] digit )*
-  OCT_LIT = '0' ('o' | 'c' | 'C') octdigit ( ['_'] octdigit )*
-  BIN_LIT = '0' ('b' | 'B' ) bindigit ( ['_'] bindigit )*
-
-  INT_LIT = HEX_LIT
-          | DEC_LIT
-          | OCT_LIT
-          | BIN_LIT
-
-  INT8_LIT = INT_LIT ['\''] ('i' | 'I') '8'
-  INT16_LIT = INT_LIT ['\''] ('i' | 'I') '16'
-  INT32_LIT = INT_LIT ['\''] ('i' | 'I') '32'
-  INT64_LIT = INT_LIT ['\''] ('i' | 'I') '64'
-
-  UINT_LIT = INT_LIT ['\''] ('u' | 'U')
-  UINT8_LIT = INT_LIT ['\''] ('u' | 'U') '8'
-  UINT16_LIT = INT_LIT ['\''] ('u' | 'U') '16'
-  UINT32_LIT = INT_LIT ['\''] ('u' | 'U') '32'
-  UINT64_LIT = INT_LIT ['\''] ('u' | 'U') '64'
-
-  exponent = ('e' | 'E' ) ['+' | '-'] digit ( ['_'] digit )*
-  FLOAT_LIT = digit (['_'] digit)* (('.' (['_'] digit)* [exponent]) |exponent)
-  FLOAT32_SUFFIX = ('f' | 'F') ['32']
-  FLOAT32_LIT = HEX_LIT '\'' FLOAT32_SUFFIX
-              | (FLOAT_LIT | DEC_LIT | OCT_LIT | BIN_LIT) ['\''] FLOAT32_SUFFIX
-  FLOAT64_SUFFIX = ( ('f' | 'F') '64' ) | 'd' | 'D'
-  FLOAT64_LIT = HEX_LIT '\'' FLOAT64_SUFFIX
-              | (FLOAT_LIT | DEC_LIT | OCT_LIT | BIN_LIT) ['\''] FLOAT64_SUFFIX
-
-
-As can be seen in the productions, numerical constants can contain underscores
-for readability. Integer and floating point literals may be given in decimal (no
-prefix), binary (prefix ``0b``), octal (prefix ``0o`` or ``0c``) and hexadecimal
-(prefix ``0x``) notation.
-
-There exists a literal for each numerical type that is
-defined. The suffix starting with an apostrophe (``'``) is called a
-`type suffix`:idx:. Literals without a type suffix are of the type ``int``,
-unless the literal contains a dot or ``E|e`` in which case it is of
-type ``float``. For notational convenience the apostrophe of a type suffix
-is optional if it is not ambiguous (only hexadecimal floating point literals
-with a type suffix can be ambiguous).
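-
-The notations and the default types can be checked with assertions. This is
-a small sketch that relies only on the ``is`` operator and ``doAssert``.
-
-.. code-block:: nim
-  doAssert 1_000_000 == 1000000    # underscores are only for readability
-  doAssert 0xFF == 255             # hexadecimal
-  doAssert 0o17 == 15              # octal
-  doAssert 0b1010 == 10            # binary
-
-  doAssert(42 is int)              # no suffix, no dot or exponent
-  doAssert(4.2 is float)           # a dot makes it a float
-  doAssert(1e3 is float)           # so does an exponent
-  doAssert(42'i8 is int8)          # explicit type suffix
-  doAssert(42u16 is uint16)        # the apostrophe may be omitted here
-  doAssert(1'f32 is float32)       # a decimal literal with a float suffix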
-
-
-The type suffixes are:
-
-=================    =========================
-  Type Suffix        Resulting type of literal
-=================    =========================
-  ``'i8``            int8
-  ``'i16``           int16
-  ``'i32``           int32
-  ``'i64``           int64
-  ``'u``             uint
-  ``'u8``            uint8
-  ``'u16``           uint16
-  ``'u32``           uint32
-  ``'u64``           uint64
-  ``'f``             float32
-  ``'d``             float64
-  ``'f32``           float32
-  ``'f64``           float64
-  ``'f128``          float128
-=================    =========================
-
-Floating point literals may also be in binary, octal or hexadecimal
-notation:
-``0B0_10001110100_0000101001000111101011101111111011000101001101001001'f64``
-is approximately 1.72826e35 according to the IEEE floating point standard.
-
-Literals are bounds checked so that they fit the datatype. Non base-10
-literals are used mainly for flags and bit pattern representations, therefore
-bounds checking is done on bit width, not value range. If the literal fits in
-the bit width of the datatype, it is accepted.
-Hence: ``0b10000000'u8 == 0x80'u8 == 128``, but ``0b10000000'i8 == 0x80'i8 == -128``
-instead of causing an overflow error.
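-
-The bit width rule can be illustrated as follows. This is a sketch that
-assumes the usual two's complement representation of ``int8``; ``low(int8)``
-is used for the smallest ``int8`` value.
-
-.. code-block:: nim
-  doAssert 0b10000000'u8 == 0x80'u8
-  doAssert int(0x80'u8) == 128
-  doAssert 0x80'i8 == low(int8)    # bit pattern 1000_0000 read as int8
-  doAssert int(0x80'i8) == -128
-  doAssert int(0xFF'i8) == -1      # all eight bits set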
-
-Operators
----------
-
-Nim allows user defined operators. An operator is any combination of the
-following characters::
-
-       =     +     -     *     /     <     >
-       @     $     ~     &     %     |
-       !     ?     ^     .     :     \
-
-These keywords are also operators:
-``and or not xor shl shr div mod in notin is isnot of``.
-
-`=`:tok:, `:`:tok:, `::`:tok: are not available as general operators; they
-are used for other notational purposes.
-
-As a special case, ``*:`` is treated as the two tokens `*`:tok: and `:`:tok:
-(to support ``var v*: T``).
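-
-A user defined operator is declared like a procedure whose name is written
-in backticks. The operator ``**`` below is a made-up example (it is assumed
-not to be defined by any imported module) and only handles non-negative
-exponents.
-
-.. code-block:: nim
-  proc `**`(base, exponent: int): int =
-    # Integer power by repeated multiplication; assumes exponent >= 0.
-    result = 1
-    for i in 1 .. exponent:
-      result *= base
-
-  doAssert(2 ** 10 == 1024)
-  doAssert(2 ** 3 + 1 == 9)        # binds like `*`, so tighter than `+`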
-
-
-Other tokens
-------------
-
-The following strings denote other tokens::
-
-    `   (     )     {     }     [     ]     ,  ;   [.    .]  {.   .}  (.  .)
-
-
-The `slice`:idx: operator `..`:tok: takes precedence over other tokens that
-contain a dot: `{..}`:tok: are the three tokens `{`:tok:, `..`:tok:, `}`:tok:
-and not the two tokens `{.`:tok:, `.}`:tok:.
-