summary refs log tree commit diff stats
diff options
context:
space:
mode:
authorGrzegorz Adam Hankiewicz <gradha@imap.cc>2013-11-15 23:04:21 +0100
committerGrzegorz Adam Hankiewicz <gradha@imap.cc>2013-11-16 20:45:57 +0100
commit38eb67de835db986105a7340def055cad704f697 (patch)
tree7586babecfb3bd9b25ae3ff9d293d3e5bec32a6b
parent9061b8961eb5cefd31eba9314a5932f35f524eac (diff)
downloadNim-38eb67de835db986105a7340def055cad704f697.tar.gz
Expands tutorial macro section with step by step guide.
-rw-r--r--doc/tut2.txt269
1 files changed, 260 insertions, 9 deletions
diff --git a/doc/tut2.txt b/doc/tut2.txt
index e1e36bfc4..4d8d0be15 100644
--- a/doc/tut2.txt
+++ b/doc/tut2.txt
@@ -699,15 +699,22 @@ once.
 Macros
 ======
 
-Macros enable advanced compile-time code transformations, but they
-cannot change Nimrod's syntax. However, this is no real restriction because
-Nimrod's syntax is flexible enough anyway.
-
-To write a macro, one needs to know how the Nimrod concrete syntax is converted
-to an abstract syntax tree (AST). The AST is documented in the
-`macros <macros.html>`_ module.
-
-There are two ways to invoke a macro:
+Macros enable advanced compile-time code transformations, but they cannot
+change Nimrod's syntax. However, this is no real restriction because Nimrod's
+syntax is flexible enough anyway. Macros have to be implemented in pure Nimrod
+code if `foreign function interface (FFI)
+<manual.html#foreign-function-interface>`_ is not enabled in the compiler, but
+other than that restriction (which at some point in the future will go away)
+you can write any kind of Nimrod code and the compiler will run it at compile
+time.
+
+There are two ways to write a macro, either *generating* Nimrod source code and
+letting the compiler parse it, or creating manually an abstract syntax tree
+(AST) which you feed to the compiler. In order to build the AST one needs to
+know how the Nimrod concrete syntax is converted to an abstract syntax tree
+(AST). The AST is documented in the `macros <macros.html>`_ module.
+
+Once your macro is finished, there are two ways to invoke it:
 (1) invoking a macro like a procedure call (`expression macros`:idx:)
 (2) invoking a macro with the special ``macrostmt``
     syntax (`statement macros`:idx:)
@@ -796,3 +803,247 @@ Term rewriting macros
 Term rewriting macros can be used to enhance the compilation process
 with user defined optimizations; see this `document <trmacros.html>`_ for 
 further information.
+
+
+Building your first macro
+-------------------------
+
+To give a footstart to writing macros we will show now how to turn your typical
+dynamic code into something that compiles statically. For the exercise we will
+use the following snippet of code as the starting point:
+
+.. code-block:: nimrod
+
+  import strutils, tables
+
+  proc readCfgAtRuntime(cfgFilename: string): TTable[string, string] =
+    let
+      inputString = readFile(cfgFilename)
+    var
+      rawLines = split(inputString, {char(0x0a), char(0x0d)})
+      source = ""
+
+    result = initTable[string, string]()
+    for line in rawLines:
+      var chunks = split(line, ',')
+      if chunks.len != 2:
+        quit("Input needs comma split values, got: " & line)
+      result[chunks[0]] = chunks[1]
+
+    if result.len < 1: quit("Input file empty!")
+
+  let info = readCfgAtRuntime("data.cfg")
+
+  when isMainModule:
+    echo info["licenseOwner"]
+    echo info["licenseKey"]
+    echo info["version"]
+
+Presumably this snippet of code could be used in a commercial software, reading
+a configuration file to display information about the person who bought the
+software. This external file would be generated by an online web shopping cart
+to be included along the program containing the license information::
+
+  version,1.1
+  licenseOwner,Hyori Lee
+  licenseKey,M1Tl3PjBWO2CC48m
+
+The ``readCfgAtRuntime`` proc will open the given filename and return a
+``TTable`` from the `tables module <tables.html>`_. The parsing of the file is
+done (without much care for handling invalid data or corner cases) using the
+``split`` proc from the `strutils module <strutils.html>`_. There are many
+things which can fail; mind the purpose is explaining how to make this run at
+compile time, not how to properly implement a DRM scheme.
+
+The reimplementation of this code as a compile time proc will allow us to get
+rid of the ``data.cfg`` file we would need to distribute along the binary, plus
+if the information is really constant, it doesn't make from a logical point of
+view to have it *mutable* in a global variable, it would be better if it was a
+constant. Finally, and likely the most valuable feature, we can implement some
+verification at compile time. You could think if this as a better *unit
+testing*, since it is impossible to obtain a binary unless everything is
+correct, preventing you to ship to users a broken program which won't start
+because a small critical file is missing or its contents changed by mistake to
+something invalid.
+
+
+Generating source code
+++++++++++++++++++++++
+
+Our first attempt will start by modifying the program to generate a compile
+time string with the *generated source code*, which we then pass to the
+``parseStmt`` proc from the `macros module <macros.html>`_. Here is the
+modified source code implementing the macro:
+
+.. code-block:: nimrod
+  import macros, strutils
+
+  macro readCfgAndBuildSource(cfgFilename: string): stmt =
+    let
+      inputString = slurp(cfgFilename.strVal)
+    var
+      rawLines = split(inputString, {char(0x0a), char(0x0d)})
+      source = ""
+
+    for line in rawLines:
+      var chunks = split(line, ',')
+      if chunks.len != 2:
+        error("Input needs comma split values, got: " & line)
+      source &= "const cfg" & chunks[0] & "= \"" & chunks[1] & "\"\n"
+
+    if source.len < 1: error("Input file empty!")
+    result = parseStmt(source)
+
+  readCfgAndBuildSource("data.cfg")
+
+  when isMainModule:
+    echo cfglicenseOwner
+    echo cfglicenseKey
+    echo cfgversion
+
+The good news is not much has changed! First, we need to change the handling of
+the input parameter. In the dynamic version the ``readCfgAtRuntime`` proc
+receives a string parameter. However, in the macro version it is also declared
+as string, but this is the *outside* interface of the macro.  When the macro is
+run, it actually gets a ``PNimrodNode`` object instead of a string, and we have
+to call the ``strVal`` proc from the `macros module <macros.html>`_ to obtain
+the string being passed in to the macro.
+
+Second, we cannot use the ``readFile`` proc from the `system module
+<system.html>`_ due to FFI restriction at compile time. If we try to use this
+proc, or any other which depends on FFI, the compiler will error with the
+message ``cannot evaluate`` and a dump of the macro's source code, along with a
+stack trace where the compiler reached before bailing out. We can get around
+this limitation by using the ``slurp`` proc from the `system module
+<system.html>`_, which was precisely made for compilation time (just like
+``gorge`` which executes an external program and captures its output).
+
+The interesting thing is that our macro does not return a runtime ``TTable``
+object. Instead, it builds up Nimrod source code into the ``source`` variable.
+For each line of the configuration file a ``const`` variable will be generated.
+To avoid conflicts we prefix these variables with ``cfg``. In essence, what the
+compiler is doing is replacing the line calling the macro with the following
+snippet of code:
+
+.. code-block:: nimrod
+  const cfgversion= "1.1"
+  const cfglicenseOwner= "Hyori Lee"
+  const cfglicenseKey= "M1Tl3PjBWO2CC48m"
+
+You can verify this yourself adding the line ``echo source`` somewhere at the
+end of the macro and compiling the program. Another difference is that instead
+of calling the usual ``quit`` proc to abort (which we could still call) this
+version calls the ``error`` proc. The ``error`` proc has the same behavior as
+``quit`` but will dump also the source and file line information where the
+error happened, making it easier for the programmer to find where compilation
+failed. In this situation it would point to the line invoking the macro, but
+**not** the line of ``data.cfg`` we are processing, that's something the macro
+itself would need to control.
+
+
+Generating AST by hand
+++++++++++++++++++++++
+
+To generate an AST we would need to intimately know the structures used by the
+Nimrod compiler exposed in the `macros module <macros.html>`_, which at first
+look seems a daunting task. But we can use a helper shortcut the ``dumpTree``
+macro, which is used as a statement macro instead of an expression macro.
+Since we know that we want to generate a bunch of ``const`` symbols we can
+create the following source file and compile it to see what the compiler
+*expects* from us:
+
+.. code-block:: nimrod
+  import macros
+
+  dumpTree:
+    const cfgversion: string = "1.1"
+    const cfglicenseOwner= "Hyori Lee"
+    const cfglicenseKey= "M1Tl3PjBWO2CC48m"
+
+During compilation of the source code we should see the following lines in the
+output (again, since this is a macro, compilation is enough, you don't have to
+run any binary)::
+
+  StmtList
+    ConstSection
+      ConstDef
+        Ident !"cfgversion"
+        Ident !"string"
+        StrLit 1.1
+    ConstSection
+      ConstDef
+        Ident !"cfglicenseOwner"
+        Empty
+        StrLit Hyori Lee
+    ConstSection
+      ConstDef
+        Ident !"cfglicenseKey"
+        Empty
+        StrLit M1Tl3PjBWO2CC48m
+
+With this output we have a better idea of what kind of input the compiler
+expects. We need to generate a list of statements. For each constant the source
+code generates a ``ConstSection`` and a ``ConstDef``. If we were to move all
+the constants to a single ``const`` block we would see only a single
+``ConstSection`` with three children.
+
+Maybe you didn't notice, but in the ``dumpTree`` example the first constant
+explicitly specifies the type of the constant.  That's why in the tree output
+the two last constants have their second child ``Empty`` but the first has a
+string identifier. So basically a ``const`` definition is made up from an
+identifier, optionally a type (can be an *empty* node) and the value. Armed
+with this knowledge, let's look at the finished version of the AST building
+macro:
+
+.. code-block:: nimrod
+  import macros, strutils
+
+  macro readCfgAndBuildAST(cfgFilename: string): stmt =
+    let
+      inputString = slurp(cfgFilename.strVal)
+    var
+      rawLines = split(inputString, {char(0x0a), char(0x0d)})
+
+    result = newNimNode(nnkStmtList)
+    for line in rawLines:
+      var chunks = split(line, ',')
+      if chunks.len != 2:
+        error("Input needs comma split values, got: " & line)
+      var
+        section = newNimNode(nnkConstSection)
+        constDef = newNimNode(nnkConstDef)
+      constDef.add(newIdentNode("cfg" & chunks[0]))
+      constDef.add(newEmptyNode())
+      constDef.add(newStrLitNode(chunks[1]))
+      section.add(constDef)
+      result.add(section)
+
+    if result.len < 1: error("Input file empty!")
+
+  readCfgAndBuildAST("data.cfg")
+
+  when isMainModule:
+    echo cfglicenseOwner
+    echo cfglicenseKey
+    echo cfgversion
+
+Since we are building on the previous example generating source code, we will
+only mention the differences to it. Instead of creating a temporary ``string``
+variable and writing into it source code as if it were written *by hand*, we
+use the ``result`` variable directly and create a statement list node
+(``nnkStmtList``) which will hold our children.
+
+For each input line we have to create a constant definition (``nnkConstDef``)
+and wrap it inside a constant section (``nnkConstSection``). Once these
+variables are created, we fill them hierarchichally like the previous AST dump
+tree showed: the constant definition is a child of the section definition, and
+the constant definition has an identifier node, an empty node (we let the
+compiler figure out the type), and a string literal with the value.
+
+A last tip when writing a macro: if you are not sure the AST you are building
+looks ok, you may be tempted to use the ``dumpTree`` macro. But you can't use
+it *inside* the macro you are writting/debugging. Instead ``echo`` the string
+generated by ``treeRepr``. If at the end of the this example you add ``echo
+treeRepr(result)`` you should get the same output as using the ``dumpTree``
+macro, but of course you can call that at any point of the macro where you
+might be having troubles.