about summary refs log tree commit diff stats
path: root/js/games/nluqo.github.io/~bh/v3ch5/langi.html
diff options
context:
space:
mode:
authorelioat <elioat@tilde.institute>2023-08-23 07:52:19 -0400
committerelioat <elioat@tilde.institute>2023-08-23 07:52:19 -0400
commit562a9a52d599d9a05f871404050968a5fd282640 (patch)
tree7d3305c1252c043bfe246ccc7deff0056aa6b5ab /js/games/nluqo.github.io/~bh/v3ch5/langi.html
parent5d012c6c011a9dedf7d0a098e456206244eb5a0f (diff)
downloadtour-562a9a52d599d9a05f871404050968a5fd282640.tar.gz
*
Diffstat (limited to 'js/games/nluqo.github.io/~bh/v3ch5/langi.html')
-rw-r--r--js/games/nluqo.github.io/~bh/v3ch5/langi.html3297
1 files changed, 3297 insertions, 0 deletions
diff --git a/js/games/nluqo.github.io/~bh/v3ch5/langi.html b/js/games/nluqo.github.io/~bh/v3ch5/langi.html
new file mode 100644
index 0000000..927e3cf
--- /dev/null
+++ b/js/games/nluqo.github.io/~bh/v3ch5/langi.html
@@ -0,0 +1,3297 @@
+<HTML>
+<HEAD>
+<TITLE>Computer Science Logo Style vol 3 ch 5: Programming Language Implementation</TITLE>
+</HEAD>
+<BODY>
+<CITE>Computer Science Logo Style</CITE> volume 3:
+<CITE>Beyond Programming</CITE> 2/e Copyright (C) 1997 MIT
+<H1>Programming Language Implementation</H1>
+
+<TABLE width="100%"><TR><TD>
+<IMG SRC="../csls3.jpg" ALT="cover photo">
+<TD><TABLE>
+<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian
+Harvey</A><BR>University of California, Berkeley</CITE>
+<TR><TD align="right"><BR>
+<TR><TD align="right"><A HREF="../pdf/v3ch05.pdf">Download PDF version</A>
+<TR><TD align="right"><A HREF="../v3-toc2.html">Back to Table of Contents</A>
+<TR><TD align="right"><A HREF="https://people.eecs.berkeley.edu/~bh/v3ch4/v3ch4.html"><STRONG>BACK</STRONG></A>
+chapter thread <A HREF="../v3ch6/v3ch6.html"><STRONG>NEXT</STRONG></A>
+<TR><TD align="right"><A HREF="https://mitpress.mit.edu/books/computer-science-logo-style-second-edition-volume-3">MIT
+Press web page for <CITE>Computer Science Logo Style</CITE></A>
+</TABLE></TABLE>
+
+<HR><P>Program file for this chapter: <A HREF="pascal.lg"><CODE>pascal</CODE></A>
+
+<P>We are now ready to turn from the questions of language design to those of
+compiler implementation.  A Pascal compiler is a much larger programming
+project than most of the ones we've explored so far.  You might well ask,
+&quot;where do we <EM>begin</EM> in writing a compiler?&quot; My goal in this chapter
+is to show some of the parts that go into a compiler design.
+
+<P>A compiler translates programs from a language like Pascal into the
+machine language of some particular computer model.  My compiler
+translates into a simplified, simulated machine language; the compiled
+programs are actually carried out by another Logo program, the simulator,
+rather than directly by the computer hardware.  The advantage of using this
+simulated machine language is that this compiler will work no matter what
+kind of computer you have; also, the simplifications in this simulated
+machine allow me to leave out many confusing details of a practical compiler.
+Our machine language is, however, realistic enough to give you a good sense
+of what compiling into a real machine language would be like; it's loosely
+based on the MIPS microprocessor design.  You'll see in a moment that most
+of the structure of the compiler is independent of the target language,
+anyway.
+
+<P>
+
+<P>Here is a short, uninteresting Pascal program:
+
+<P><PRE>program test;
+
+procedure doit(n:integer);
+   begin
+      writeln(n,n*n)
+   end;
+
+begin
+   doit(3)
+end.
+</PRE>
+
+<P>If you type this program into a disk file and then compile it
+using <CODE>compile</CODE> as described in Chapter 4, the compiler will translate
+the program into this sequence of instructions, contained in a list in the
+variable named <CODE>%test</CODE>:
+
+<P>
+
+<P><PRE>[       [add 3 0 0]
+        [add 4 0 0]
+        [addi 2 0 36]
+        [jump &quot;g1]
+%doit   [store 1 0(4)]
+        [jump &quot;g2]
+g2      [rload 7 36(4)]
+        [putint 10 7]
+        [rload 7 36(4)]
+        [rload 8 36(4)]
+        [mul 7 7 8]
+        [putint 10 7]
+        [newline]
+        [rload 1 0(4)]
+        [add 2 4 0]
+        [rload 4 3(2)]
+        [jr 1]
+g1      [store 5 1(2)]
+        [add 5 2 0]
+        [addi 2 2 37]
+        [store 4 3(5)]
+        [store 4 2(5)]
+        [addi 7 0 3]
+        [store 7 36(5)]
+        [add 4 5 0]
+        [rload 5 1(4)]
+        [jal 1 &quot;%doit]
+        [exit]
+]
+</PRE>
+
+<P>I've displayed this list of instructions with some extra spacing
+thrown in to make it look somewhat like a typical <EM>assembler</EM> listing.
+(An assembler is a program that translates a notation like <CODE>add 3 0 0</CODE>
+into a binary number, the form in which the machine hardware actually
+recognizes these instructions.)  A real assembler listing wouldn't have the
+square brackets that Logo uses to mark each sublist, but would instead
+depend on the convention that each instruction occupies one line.
+
+<P>The first three instructions carry out initialization that would be the same
+for any compiled Pascal program; the fourth is a <CODE>jump</CODE> instruction that
+tells the (simulated) computer to skip to the instruction following the
+<EM>label</EM> <CODE>g1</CODE> that appears later in the program.  (A word that
+isn't part of a sublist is a label.)  In Pascal, the body of the main
+program comes after the declarations of procedures; this <CODE>jump</CODE>
+instruction allows the compiler to translate the parts of the program in the
+order in which they appear.
+
+<P>(Two instructions later, you'll notice a <CODE>jump</CODE> to a label that comes
+right after the jump instruction!  The compiler issues this useless
+instruction just in case some internal procedures were declared within
+the procedure <CODE>doit</CODE>.  A better compiler would include an <EM>
+optimizer</EM> that would go through the compiled program looking for
+ways to eliminate unnecessary instructions such as this one.  The optimizer
+is the most important thing that I've left out of my compiler.)
+
+<P>We're not ready yet to talk in detail about how the compiled instructions
+represent the Pascal program, but you might be able to guess certain things.
+For example, the variable <CODE>n</CODE> in procedure <CODE>doit</CODE> seems to be
+represented as <CODE>36(4)</CODE> in the compiled program; you can see where <CODE>
+36(4)</CODE> is printed and then multiplied by itself, although it may not yet be
+clear to you what the numbers <CODE>7</CODE> and <CODE>8</CODE> have to do with anything.
+Before we get into those details, I want to give a broader overview of
+the organization of the compiler.
+
+<P>The compilation process is divided into three main pieces.  First and
+simplest is <EM>tokenization.</EM> The compiler initially sees the
+source program as a string of characters: <CODE>p</CODE>, then <CODE>r</CODE>, and so on,
+including spaces and line separators.  The first step in compilation is to
+turn these characters into symbols, so that the later stages of compilation
+can deal with the word <CODE>program</CODE> as a unit.  The second piece of the
+compiler is the <EM>parser,</EM> the part that recognizes certain
+patterns of symbols as representing meaningful units.  &quot;Oh,&quot; says the
+parser, &quot;I've just seen the word <CODE>procedure</CODE> so what comes next must be
+a procedure header and then a <CODE>begin</CODE>-<CODE>end</CODE> block for the body of
+the procedure.&quot; Finally, there is the process of <EM>
+code generation,</EM> in which each unit that was recognized by the
+parser is actually translated into the equivalent machine language
+instructions.
+
+<P>(I don't mean that parsing and code generation happen separately, one after
+the other, in the compiler's algorithm.  In fact each meaningful unit is
+translated as it's encountered, and the translation of a large unit like a
+procedure includes, recursively, the translation of smaller units like
+statements.  But parsing and code generation are conceptually two different
+tasks, and we'll talk about them separately.)
+
+<P><H2>Formal Syntax Definition</H2>
+
+<P>One common starting place is to develop a formal definition for the language
+we're trying to compile.  The regular expressions of Chapter 1 are an
+example of what I mean by a formal definition.  A
+regular expression tells
+us unambiguously that certain strings of characters are accepted as members
+of the category defined by the expression, while other strings aren't.  A
+language like Pascal is too complicated to be described by a regular
+expression, but other kinds of formal definition can be used.
+
+<P>The formal systems of Chapter 1 just gave a yes-or-no decision for any input
+string:  Is it, or is it not, accepted in the language under discussion?
+That's not quite good enough for a compiler.  We don't just want to know
+whether a Pascal program is syntactically correct; we want a translation of
+the program into some executable form.  Nevertheless, it turns out to be
+worthwhile to begin by designing a formal acceptor for Pascal.  That part of
+the compiler--the part that determines the syntactic structure of the
+source program--is called the <EM>parser.</EM>  Later we'll add provisions for
+<EM>code generation:</EM> the translation of each syntactic unit of the
+source program into a piece of <EM>object</EM> (executable) program that
+carries out the meaning (the <EM>semantics</EM>) of that unit.
+
+<P>One common form in which programming languages are described is the
+<EM>production rule</EM> notation mentioned briefly in Chapter 1.  For
+example, here is part of a specification for Pascal:
+
+<P><PRE>program:  program <EM>identifier</EM> <EM>filenames</EM> ; <EM>block</EM> .
+
+filenames:  | ( <EM>idlist</EM> )
+
+idlist:  <EM>identifier</EM> | <EM>idlist</EM> , <EM>identifier</EM>
+
+block:  <EM>varpart</EM> <EM>procpart</EM> <EM>compound</EM>
+
+varpart:  | var <EM>varlist</EM>
+
+procpart:  | <EM>procpart</EM> <EM>procedure</EM> | <EM>procpart</EM> <EM>function</EM>
+
+compound:  begin <EM>statements</EM> end
+
+statements:  <EM>statement</EM> | <EM>statements</EM> ; <EM>statement</EM>
+
+procedure:  procedure <EM>identifier</EM> <EM>args</EM> ; <EM>block</EM> ;
+
+function:  function <EM>identifier</EM> <EM>args</EM> : <EM>type</EM> ; <EM>block</EM> ;
+</PRE>
+
+<P>A program consists of six components.  Some of these components
+are particular words (like <CODE>program</CODE>) or punctuation marks; other
+components are defined in terms of even smaller units by other
+rules.<SUP>*</SUP>
+
+<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>The <EM>filenames</EM> component is an optional list of
+names of files, part of Pascal's input/output capability; my compiler doesn't
+handle file input or output, so it ignores this list if there is one.</SMALL></BLOCKQUOTE></SMALL><P>A vertical bar (<CODE>|</CODE>) in a rule separates alternatives; an idlist
+(identifier list) is either a single identifier or a smaller idlist followed
+by a comma and another identifier.  Sometimes one of the alternatives in a
+rule is empty; for example, a varpart can be empty because a block need not
+declare any local variables.
+
+<P>The goal in designing a formal specification is to capture the
+syntactic hierarchy
+of the language you're describing.  For example, you could define
+a Pascal type as
+
+<P><PRE>type:  integer | real | char | boolean | array <EM>range</EM> of integer |
+               packed array <EM>range</EM> of integer | array <EM>range</EM> of real |
+               ...
+</PRE>
+
+<P>but it's better to say
+
+<P><PRE>type:  <EM>scalar</EM> | array <EM>range</EM> of <EM>scalar</EM> |
+               packed array <EM>range</EM> of <EM>scalar</EM>
+
+scalar:  integer | real | char | boolean
+</PRE>
+
+<P>Try completing the syntactic description of my subset of Pascal along these
+lines.  You might also try a similar syntactic description of Logo.  Which
+is easier?
+
+<P>Another kind of formal description is
+
+the <EM>recursive transition network</EM> (RTN).
+An RTN is like a finite-state machine except that instead
+of each arrow representing a single symbol in the machine's alphabet, an
+arrow can be labeled with the name of another RTN; such an arrow represents
+any string of symbols accepted by that RTN.
+
+<P>Below I show two RTNs, one for a program and one for a
+sequence of statements (the body of a compound statement).
+In the former, the transition from state 5 to state 6 is followed if what
+comes next in the Pascal program is a string of symbols accepted by the RTN
+named &quot;block.&quot; In these diagrams, a word in <CODE>typewriter</CODE> style like
+<CODE>program</CODE> represents a single symbol, as in a finite-state machine diagram,
+while a word in <EM>italics</EM> like <EM>block</EM> represents any
+string accepted by the RTN of that name.  The <EM>statements</EM> RTN
+is recursive; one path through the network involves a transition that
+requires parsing a smaller <EM>statements</EM> unit.
+
+<P><CENTER><IMG SRC="rtn1.gif" ALT="figure: rtn1"></CENTER>
+<P><CENTER><IMG SRC="rtn2.gif" ALT="figure: rtn2"></CENTER>
+
+<P><H2>Tokenization</H2>
+
+
+<P>In both the production rules and the RTNs I've treated words like <CODE>
+program</CODE> as a single symbol of the &quot;alphabet&quot; of the language.  It would
+be possible, of course, to use single characters as the alphabetic symbols
+and describe the language in this form:
+
+<P><CENTER><IMG SRC="https://people.eecs.berkeley.edu/~bh/v3ch5/tokenrtn.gif" ALT="figure: tokenrtn"></CENTER>
+
+<P>Extending the formal description down to that level, though, makes
+it hard to see the forest for the trees; the important structural patterns
+get lost in details about, for instance, where spaces are required between
+words (as in <CODE>program tower</CODE>), where they're optional (as in <CODE>
+2 + 3</CODE>), and where they're not allowed at all (<CODE>prog
+ram</CODE>).  A similar complication is that a comment in braces can
+be inserted anywhere in the program; it would be enormously complicated if
+every state of every RTN had to have a transition for a left brace beginning
+a comment.
+
+<P>Most language processors therefore group the characters of the source
+program into <EM>tokens</EM> used as the alphabet for the formal
+grammar.  A token may be a single character, such as a punctuation mark, or
+a group of characters, such as a word or a number.  Spaces do not ordinarily
+form part of tokens, although in the Pascal compiler one kind of token is a
+quoted character string that can include spaces.  Comments are also removed
+during tokenization.  Here's what the <CODE>tower</CODE> program from Chapter 4
+looks like in token form:
+
+<P><CENTER><IMG SRC="tokenized.gif"></CENTER>
+
+<P>Tokenization is what the Logo <CODE>readlist</CODE> operation does when
+it uses spaces and brackets to turn the string of characters you type into
+a sequence of words and lists.
+
+<P>Tokenization is also called <EM>lexical analysis.</EM> This term has
+nothing to do with lexical scope; the word &quot;lexical&quot; is used not to remind
+us of a dictionary but because the root &quot;lex&quot; means <EM>word</EM> and
+lexical analysis divides the source program into words.
+
+<P>I've been talking as if the Pascal compiler first went through the entire
+source file tokenizing it and then went back and parsed the result.  That's
+not actually how it works; instead, the parser just calls a procedure named
+<CODE>token</CODE> whenever it wants to see the next token in the source file.
+I've already mentioned that Pascal was designed to allow the compiler to
+read straight through the source program without jumping around and
+re-reading parts of it.
+
+<P><H2>Lookahead</H2>
+
+
+<P>Consider the situation when the parser has recognized the first token
+(<CODE>program</CODE>) as the beginning of a program and it invokes <CODE>token</CODE> to
+read the second token, the program name.  In the <CODE>tower</CODE> program, the
+desired token is <CODE>tower</CODE>.  <CODE>Token</CODE> reads the letter <CODE>t</CODE>; since
+it's a letter, it must be the beginning of an identifier.  Any number of
+letters or digits following the <CODE>t</CODE> will be part of the identifier, but
+the first non-alphanumeric character ends the token.  (In this case, the
+character that ends the token will be a semicolon.)
+
+<P>What this means is that <CODE>token</CODE> has to read one character too many in
+order to find the end of the word <CODE>tower</CODE>.  The semicolon isn't
+part of that token; it's part of the <EM>following</EM> token.  (In fact it's
+the entire following token, but in other situations that need not be true.)
+Ordinarily <CODE>token</CODE> begins its work by reading a character from the
+source file, but the next time we call <CODE>token</CODE> it has to deal with the
+character it's already read.  It would simplify things enormously if <CODE>
+token</CODE> could &quot;un-read&quot; the semicolon that ends the token <CODE>
+tower</CODE>.  It's possible to allow something like un-reading by using a
+technique called <EM>lookahead.</EM>
+
+<P><PRE>to getchar
+local &quot;char
+if namep &quot;peekchar
+    [make &quot;char :peekchar
+     ern &quot;peekchar
+     output :char]
+output readchar
+end
+</PRE>
+
+<P><CODE>Getchar</CODE> is the procedure that <CODE>token</CODE> calls to read the next
+character from the source file.  Ordinarily <CODE>getchar</CODE> just invokes the
+primitive <CODE>readchar</CODE> to read a character from the file.<SUP>*</SUP>  But if there is a variable named <CODE>peekchar</CODE>, then <CODE>
+getchar</CODE> just outputs whatever is in that variable without looking at the
+file.  <CODE>Token</CODE> can now un-read a character by saying
+
+<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>I'm lying.
+The real <CODE>getchar</CODE> is slightly more complicated because it checks for an
+unexpected end of file and because it prints the characters that it reads
+onto the screen.  The program listing at the end of the chapter tells the
+whole story.</SMALL></BLOCKQUOTE></SMALL><P><PRE>make &quot;peekchar :char
+</PRE>
+
+<P>This technique only allows <CODE>token</CODE> to un-read a single
+character at a time.  It would be possible to replace <CODE>peekchar</CODE> with a
+<EM>list</EM> of pre-read characters to be recycled.  But in fact one is
+enough.  When a program &quot;peeks at&quot; characters before they're read &quot;for
+real,&quot; the technique is called <EM>lookahead.</EM>  <CODE>Getchar</CODE> uses
+<EM>one-character lookahead</EM> because <CODE>peekchar</CODE> only stores a single
+character.
+
+<P>It turns out that, for similar reasons, the Pascal parser will occasionally
+find it convenient to peek at a <EM>token</EM> and re-read it later.  <CODE>
+Token</CODE> therefore provides for one-<EM>token</EM> lookahead using a similar
+mechanism:
+
+<P><PRE>to token
+local [token char]
+if namep &quot;peektoken [make &quot;token :peektoken
+                     ern &quot;peektoken   output :token]
+make &quot;char getchar
+if equalp :char &quot;|{| [skipcomment output token]
+if equalp :char char 32 [output token]
+if equalp :char char 13 [output token]
+if equalp :char char 10 [output token]
+if equalp :char &quot;' [output string &quot;']
+if memberp :char [+ - * / = ( , ) |[| |]| |;|] [output :char]
+if equalp :char &quot;|&lt;| [output twochar &quot;|&lt;| [= &gt;]]
+if equalp :char &quot;|&gt;| [output twochar &quot;|&gt;| [=]]
+if equalp :char &quot;. [output twochar &quot;. [.]]
+if equalp :char &quot;: [output twochar &quot;: [=]]
+if numberp :char [output number :char]
+if letterp ascii :char [output token1 lowercase :char]
+(throw &quot;error sentence [unrecognized character:] :char)
+end
+
+to twochar :old :ok
+localmake &quot;char getchar
+if memberp :char :ok [output word :old :char]
+make &quot;peekchar :char
+output :old
+end
+</PRE>
+
+<P>As you can see, <CODE>token</CODE> is mainly a selection of special cases.
+<CODE>Char 32</CODE> is a space; <CODE>char 13</CODE> or <CODE>char 10</CODE> is the end-of-line
+character.  <CODE>Skipcomment</CODE> skips over characters until it sees a right
+brace.  <CODE>String</CODE> accumulates characters up to and including a single
+quote (apostrophe), except that two single quotes in a row become one single
+quote inside the string and don't end the string.  <CODE>Number</CODE> is a little
+tricky because of decimal points (the string of characters <CODE>1.10</CODE> is a
+single token, but the string <CODE>1..10</CODE> is three tokens!) and exponent
+notation.  I'm not showing you all the details because the compiler is a
+very large program and we'll never get through it if I annotate every
+procedure.  But I did want to show you <CODE>twochar</CODE> because it's a good,
+simple example of character lookahead at work.  If the character <CODE>&lt;</CODE> is
+seen in the source program, it may be a token by itself or it may be part of
+the two-character tokens <CODE>&lt;=</CODE> or <CODE>&lt;&gt;</CODE>.  <CODE>Twochar</CODE> takes a peek
+at the next character in the file to decide.
+
+<P>If the character that <CODE>token</CODE> reads isn't part of any recognizable
+token, the procedure generates an error.  (The error is caught by the
+toplevel procedure <CODE>compile</CODE> so that it can close the source file.)
+This extremely primitive error handling is one of the most serious
+deficiencies in my compiler; it would be better if the compilation process
+continued, despite the error, so that any other errors in the program could
+also be discovered.  In a real compiler, more than half of the parsing
+effort goes into error handling; it's relatively trivial to parse a correct
+source program.
+
+<P><H2>Parsing</H2>
+
+
+<P>There are general techniques for turning a formal language specification,
+such as a set of production rules, into an algorithm for parsing the
+language so specified.  These techniques are analogous to the program in
+Chapter 1 that translates a regular expression into a finite-state machine.
+A program that turns a formal specification into a parser is called a
+<EM>parser generator.</EM>
+
+<P>The trouble is that the techniques that work for <EM>any</EM> set of rules
+are quite slow.  The time required to parse a sequence of length <EM>n</EM> is
+O(<EM>n</EM><SUP><SMALL>2</SMALL></SUP>) if the grammar is unambiguous or O(<EM>n</EM><SUP><SMALL>3</SMALL></SUP>) if it's
+ambiguous.  A grammar is <EM>ambiguous</EM> if the same input sequence
+can be parsed correctly in more than one way.  For example, if the
+production rule
+
+<P><PRE>idlist:  <EM>identifier</EM> | <EM>idlist</EM> , <EM>identifier</EM>
+</PRE>
+
+<P>is applied to the string
+
+<P><PRE>Beatles,Who,Zombies,Kinks
+</PRE>
+
+<P>then the only possible application of the rule to accept the
+string produces this left-to-right grouping:
+
+<P><CENTER><IMG SRC="unambiguous.gif" ALT="figure: unambiguous"></CENTER>
+
+<P>However, if the rule were
+
+<P><PRE>idlist:  <EM>identifier</EM> | <EM>idlist</EM> , <EM>idlist</EM>
+</PRE>
+
+<P>this new rule would accept the same strings, but would allow
+alternative groupings like
+
+<P><CENTER><IMG SRC="ambiguous.gif" ALT="figure: ambiguous"></CENTER>
+
+<P>The former rule could be part of an unambiguous grammar; the new
+rule makes the grammar that contains it ambiguous.
+
+<P>
+
+<P>It's usually not hard to devise an unambiguous grammar for any practical
+programming language, but even a quadratic algorithm is too slow.  Luckily,
+most programming languages have <EM>deterministic grammars,</EM>
+which is a condition even stricter than being unambiguous.  It means that a
+parser can read a program from left to right, and can figure out what to do
+with the next token using only a fixed amount of lookahead.  A parser for a
+deterministic grammar can run in linear time, which is a lot better than
+quadratic.
+
+<P>When I said &quot;figure out what to do with the next token,&quot; I was being
+deliberately vague.  A deterministic parser doesn't necessarily know exactly
+how a token will fit into the complete program--which production rules will
+be branch nodes in a parse tree having this token as a leaf node--as soon
+as it reads the token.  As a somewhat silly example, pretend that the word
+<CODE>if</CODE> is not a &quot;reserved word&quot; in Pascal; suppose it could be the
+name of a variable.  Then, when the parser is expecting the beginning of a
+new statement and the next token is the word <CODE>if</CODE>, the parser doesn't
+know whether it is seeing the beginning of a conditional statement such as
+<CODE>if x &gt; 0 then writeln('positive')</CODE> or the beginning of an assignment
+statement such as <CODE>if := 87</CODE>.  But the parser could still be deterministic.
+Upon seeing the word <CODE>if</CODE>, it would enter a state (as in a finite state
+machine) from which there are two exits.  If the next token turned out to be
+the <CODE>:=</CODE> assignment operator, the parser would follow one transition; if
+the next token was a variable or constant value, the parser would choose a
+different next state.
+
+<P>The real Pascal, though, contains no such syntactic cliffhangers.  A Pascal
+compiler can always tell which production rule the next token requires.
+That's why the language includes keywords like <CODE>var</CODE>, <CODE>
+procedure</CODE>, and <CODE>function</CODE>.  For the most part, you could figure out
+which kind of declaration you're reading without those keywords by looking
+for clues like whether or not there are parentheses after the identifier
+being declared.  (If so, it's a procedure or a function.)  But the keywords
+let you know from the beginning what to expect next.  That means we can
+write what's called a <EM>predictive grammar</EM> for Pascal,
+even simpler to implement than a deterministic one.
+
+<P>There are general algorithms for parsing deterministic languages, and there
+are parser generators using these algorithms.  One widely used example is
+the YACC (Yet Another Compiler Compiler) program that translates
+production rules into a parser in the C programming language.<SUP>*</SUP> But because Pascal's
+grammar is so simple I found it just as easy to do the translation by hand.
+For each production rule in a formal description of Pascal, the compiler
+includes a Logo procedure that parses each component part of the production
+rule.  A parser written in this way is called a <EM>
+recursive descent parser.</EM> Here's a sample:
+
+<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>A parser
+generator is also called a <EM>compiler compiler</EM> because it treats
+the formal specification as a kind of source program and produces a compiler
+as the object program.  But the name isn't quite accurate because, as you
+know, there's more to a compiler than the parser.</SMALL></BLOCKQUOTE></SMALL><P><PRE>to statement
+local [token type]
+ifbe &quot;begin [compound stop]
+ifbe &quot;for [pfor stop]
+ifbe &quot;if [pif stop]
+ifbe &quot;while [pwhile stop]
+ifbe &quot;repeat [prepeat stop]
+ifbe &quot;write [pwrite stop]
+ifbe &quot;writeln [pwriteln stop]
+make &quot;token token
+make &quot;peektoken :token
+if memberp :token [|;| end until] [stop]
+make &quot;type gettype :token
+if emptyp :type [(throw &quot;error sentence :token [can't begin statement])]
+if equalp :type &quot;procedure [pproccall stop]
+if equalp :type &quot;function [pfunset stop]
+passign
+end
+
+to pif
+local [cond elsetag endtag]
+make &quot;cond pboolean pexpr
+make &quot;elsetag gensym
+make &quot;endtag gensym
+mustbe &quot;then
+code (list &quot;jumpf :cond (word &quot;&quot; :elsetag))
+regfree :cond
+statement
+code (list &quot;jump (word &quot;&quot; :endtag))
+code :elsetag
+ifbe &quot;else [statement]
+code :endtag
+end
+</PRE>
+
+<P>Many of the details of <CODE>pif</CODE> have to do with code
+generation, but never mind those parts now.  For the
+moment, my concern is with the parsing aspect of these procedures: how they
+decide what to accept.
+
+<P><CODE>Statement</CODE> is an important part of the parser; it is invoked whenever a
+Pascal statement is expected.  It begins by checking the next token from the
+source file.  If that token is <CODE>begin</CODE>, <CODE>for</CODE>, <CODE>if</CODE>, <CODE>while</CODE>,
+or <CODE>repeat</CODE> then we're finished with the token and <CODE>statement</CODE> turns
+to a subprocedure to handle the syntax of whatever structured statement type
+we've found.  If the token isn't one of those, then the statement has to be
+a simple statement and the token has to be an identifier, i.e., the name of
+a procedure, a function, or a variable.  (One other trivial possibility is
+that this is an <EM>empty</EM> statement, if we're already up to the
+semicolon, <CODE>end</CODE>, or <CODE>until</CODE> that marks the end of a statement.)
+In any of these cases, the token we've just read is important to the parsing
+procedure that will handle the simple statement, so <CODE>statement</CODE> un-reads
+it before deciding what to do next.  <CODE>Gettype</CODE> outputs the type of the
+identifier, either a variable type like <CODE>real</CODE> or else <CODE>procedure</CODE>
+or <CODE>function</CODE>.  (The compiler data structures that underlie the work of
+<CODE>gettype</CODE> will be discussed later.)  If the token is a
+procedure name, then this is a procedure call statement.  If the token is a
+function name, then this is the special kind of assignment inside a function
+definition that provides the return value for the function.  Otherwise, the
+token must be a variable name and this is an ordinary assignment statement.
+
+<P>The procedure <CODE>pif</CODE> parses <CODE>if</CODE> statements.  (The letter <CODE>p</CODE> in
+its name stands for &quot;Pascal&quot;; many procedures in the compiler have such
+names to avoid conflicts with Logo procedures with similar purposes.)  The
+syntax of Pascal <CODE>if</CODE> is
+
+<P><PRE>ifstatement:  if <EM>boolean</EM> then <EM>statement</EM> |
+                 if <EM>boolean</EM> then <EM>statement</EM> else <EM>statement</EM>
+</PRE>
+
+<P>When <CODE>pif</CODE> begins, the token <CODE>if</CODE> has just been read by
+<CODE>statement</CODE>.  So the first thing that's required is a boolean expression.
+<CODE>Pexpr</CODE> parses an expression; that task is relatively complicated and
+will be discussed in more detail later.  <CODE>Pboolean</CODE> ensures that the
+expression just parsed does indeed produce a value of type <CODE>boolean</CODE>.
+
+<P>The next token in the source file <EM>must</EM> be the word <CODE>then</CODE>.  The
+instruction
+
+<P><PRE>mustbe &quot;then
+</PRE>
+
+<P>in <CODE>pif</CODE> ensures that.  Here's <CODE>mustbe</CODE>:
+
+<P><PRE>to mustbe :wanted
+localmake &quot;token token
+if equalp :token :wanted [stop]
+(throw &quot;error (sentence &quot;expected :wanted &quot;got :token))
+end
+</PRE>
+
+<P>If <CODE>mustbe</CODE> returns successfully, <CODE>pif</CODE> then invokes
+<CODE>statement</CODE> recursively to parse the true branch of the conditional.
+The production rule tells us that there is then an <EM>optional</EM> false
+branch, signaled by the reserved word <CODE>else</CODE>.  The instruction
+
+<P><PRE>ifbe &quot;else [statement]
+</PRE>
+
+<P>handles that possibility.  If the next token matches the first
+input to <CODE>ifbe</CODE> then the second input, an instruction list, is carried
+out.  Otherwise the token is un-read.  There is also an <CODE>ifbeelse</CODE> that
+takes a third input, an instruction list to be carried out if the next token
+isn't equal to the first input.  (<CODE>Ifbeelse</CODE> still un-reads the token in
+that case, before it runs the third input.)  These must be macros so that
+the instruction list inputs can include <CODE>output</CODE> or <CODE>stop</CODE>
+instructions (as discussed in Volume 2), as in the invocations of <CODE>ifbe</CODE>
+in <CODE>statement</CODE> seen a moment ago.
+
+<P><PRE>.macro ifbe :wanted :action
+localmake &quot;token token
+if equalp :token :wanted [output :action]
+make &quot;peektoken :token
+output []
+end
+
+.macro ifbeelse :wanted :action :else
+localmake &quot;token token
+if equalp :token :wanted [output :action]
+make &quot;peektoken :token
+output :else
+end
+</PRE>
+
+<P>If there were no code generation involved, <CODE>pif</CODE> would be written this
+way:
+
+<P><PRE>to pif
+pboolean pexpr
+mustbe &quot;then
+statement
+ifbe &quot;else [statement]
+end
+</PRE>
+
+<P>This simplified procedure is a straightforward translation of the
+RTN
+
+<P><CENTER><IMG SRC="pif.gif" ALT="figure: pif"></CENTER>
+
+<P>The need to generate object code complicates the parser.  But
+don't let that distract you; in general you can see the formal structure of
+Pascal syntax reflected in the sequence of instructions used to parse that
+syntax.
+
+<P>The procedures that handle other structured statements, such as <CODE>pfor</CODE>
+and <CODE>pwhile</CODE>, are a lot like <CODE>pif</CODE>.  Procedure and function
+declarations (procedures <CODE>procedure</CODE>, <CODE>function</CODE>, and <CODE>proc1</CODE> in
+the compiler) also use the same straightforward parsing technique, but are a
+little more complicated because of the need to keep track of type
+declarations for each procedure's parameters and local variables.
+Ironically, the hardest thing to compile is the &quot;simple&quot; assignment
+statement, partly because of operator precedence (multiplication before
+addition) in expressions (procedure <CODE>pexpr</CODE> in the compiler) and partly
+because of the need to deal with the complexity of variables, including
+special cases such as assignments to <CODE>var</CODE> parameters and array elements.
+
+<P>I haven't yet showed you <CODE>pboolean</CODE> because you have to understand how
+the compiler handles expressions first.  But it's worth noticing that Pascal
+can check <EM>at compile time</EM> whether or not an expression is going to
+produce a <CODE>boolean</CODE> value even though the program hasn't been run yet
+and the variables in the expression don't have values yet.  It's the strict
+variable typing of Pascal that makes this compile-time checking possible.
+If we were writing a Logo compiler, the checking would have to be postponed
+until run time because you can't, in general, know what type of datum will
+be computed by a Logo expression until it's actually evaluated.
+
+<P><H2>Expressions and Precedence</H2>
+
+<P>
+
+<P>Arithmetic or boolean expressions appear not only on the right side of
+assignment statements but also as actual parameters, array index values, and
+as &quot;phrases&quot; in structured statements.  One of the classic problems in
+compiler construction is the translation of these expressions to executable
+form.  The interesting difficulty concerns <EM>
+operator precedence</EM>--the rule that in a string of alternating
+operators and operands, multiplications are done before additions, so
+
+<P><PRE>a + b * c + d
+</PRE>
+
+<P>means
+
+<P><PRE>a + (b * c) + d
+</PRE>
+
+<P>Pascal has four levels of operator precedence.  The highest level, number 4,
+
+is the <EM>unary</EM> operators <CODE>+</CODE>, <CODE>-</CODE>, and <CODE>not</CODE>.  (The first
+
+two can be used as unary operators (<CODE>-3</CODE>) or <EM>binary</EM> ones (<CODE>
+6-3</CODE>); it's only in the unary case that they have this
+precedence.)<SUP>*</SUP> Then comes multiplication, division, and
+logical <CODE>and</CODE> at level 3.  Level 2 has binary addition, subtraction, and
+<CODE>or</CODE>.  And level 1 includes the relational operators like
+<CODE>=</CODE>.
+
+<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>It's unfortunate that the word &quot;binary&quot; is used in
+computer science both for base-2 numbers and for two-input operations.
+ Kenneth Iverson, in his documentation for the
+language APL, used the words <EM>monadic</EM> and <EM>
+dyadic</EM> instead of unary and binary to avoid that ambiguity.  But
+those terms haven't caught on.</SMALL></BLOCKQUOTE></SMALL><P>
+
+<P>The formalization of precedence could be done using the mechanisms we've
+already seen.  For example, here is a production rule grammar for
+expressions using only the four basic arithmetic operations.
+
+<P><PRE>expression:  <EM>term</EM> | <EM>expression</EM> + <EM>term</EM> | <EM>expression</EM> - <EM>term</EM>
+term:  <EM>factor</EM> | <EM>term</EM> * <EM>factor</EM> | <EM>term</EM> / <EM>factor</EM>
+factor:  <EM>variable</EM> | <EM>number</EM> | ( <EM>expression</EM> )
+</PRE>
+
+<P>This grammar also introduces into the discussion the fact that the
+precedence of operations can be changed by using parentheses.
+
+<P>This grammar, although formally correct, is not so easy to use in a
+recursive descent parser.
+One subtle but important problem is that it's <EM>left recursive:</EM>  Some
+of the alternative forms for an <CODE>expression</CODE> start with an <CODE>
+expression</CODE>.  If we tried to translate this into a Logo procedure it would
+naturally start out
+
+<P><PRE>to expression
+local [left op right]
+make &quot;left expression
+ifbe &quot;+
+  [make &quot;op &quot;+
+   make &quot;right term]
+  [ifbe &quot;-
+     [make &quot;op &quot;-
+      make &quot;right term]
+     [make &quot;op []] ]
+...
+</PRE>
+
+<P>But this procedure will never get past the first <CODE>make</CODE>; it's
+an infinite loop.  It will never actually read a token from the source file;
+instead it keeps invoking itself recursively.
+
+<P>Left association is a problem for automatic compiler compilers, too.  There
+are ways to solve the problem but I don't want to get into that because in
+fact arithmetic expressions are generally handled by an entirely different
+scheme, which I'll show you in a moment.  The problem wouldn't come up if
+the order of the operands were reversed, so the rules said
+
+<P><PRE>expression:  <EM>term</EM> | <EM>term</EM> + <EM>expression</EM> | <EM>term</EM> - <EM>expression</EM>
+</PRE>
+
+<P>and so on.  Unfortunately this changes the meaning, and the rules
+of Pascal say that equal-precedence operations are performed left to right.
+
+<P>In any case, the formalization of precedence with production rules gets more
+complicated as the number of levels of precedence increases.  I showed you a
+grammar with two levels.  Pascal, with four levels, might reasonably be done
+in a similar way, but think about the C programming language, which has 15
+levels of precedence!
+
+<P><H2>The Two-Stack Algorithm for Expressions</H2>
+
+
+<P>What we're after is an algorithm that will allow the compiler to read an
+expression once, left to right, and group operators and operands correctly.
+The algorithm involves the use of two stacks, one for operations and one for
+data.  For each operation we need to know whether it's unary or binary and
+what its precedence level is.  I'll use the notation &quot;<CODE>*</CODE><SUB><SMALL>2,3</SMALL></SUB>&quot; to
+represent binary <CODE>*</CODE> at precedence level 3.  So the expression
+
+<P><PRE>a + b * - c - d
+</PRE>
+
+<P>will be represented in this algorithm as
+
+<P>
+<IMG SRC="vdash.gif"><SUB><SMALL>0</SMALL></SUB> a +<SUB><SMALL>2,2</SMALL></SUB> b *<SUB><SMALL>2,3</SMALL></SUB> -<SUB><SMALL>1,4</SMALL></SUB> c -<SUB><SMALL>2,2</SMALL></SUB> d <IMG SRC="https://people.eecs.berkeley.edu/~bh/v3ch5/dashv.gif"><SUB><SMALL>0</SMALL></SUB>
+
+
+<P>The symbols <IMG SRC="vdash.gif"> and <IMG SRC="https://people.eecs.berkeley.edu/~bh/v3ch5/dashv.gif"> aren't really part of the source
+expression; they're imaginary markers for the beginning and end of the
+expression.  When we read a token that doesn't make sense as part of an
+expression, we can un-read that token and pretend we read a <IMG SRC="https://people.eecs.berkeley.edu/~bh/v3ch5/dashv.gif"> instead.
+These markers are given precedence level zero because they form a boundary
+for <EM>any</EM> operators inside them, just as a low-precedence operator like
+<CODE>+</CODE> is a boundary for the operands of a higher-precedence operator like
+<CODE>*</CODE>.  (For the same reason, you'll see that parentheses are considered
+precedence zero.)
+
+<P>The two minus signs in this expression have two different meanings.  As you
+read the following algorithm description, you'll see how the algorithm knows
+whether an operation symbol is unary or binary.
+
+<P><STRONG>Step 1.</STRONG>  We initialize the two stacks this way:
+
+<P><PRE>operation:  [ <IMG SRC="vdash.gif"><SUB><SMALL>0</SMALL></SUB> ]
+
+data:       [ ]
+</PRE>
+
+<P><STRONG>Step 2.</STRONG>  We are now expecting a datum, such as a variable.  Read a
+token.  If it's an operation, it must be unary; subscript it accordingly and
+go to step 4.  If it's a datum, push it onto the data stack.  (If it's
+neither an operation nor a datum, something's wrong.)
+
+<P><STRONG>Step 3.</STRONG>  We are now expecting a binary operation.  Read a token.  If
+it's an operation, subscript it as binary and go to step 4.  If not, we've
+reached the end of the expression.  Un-read the token, and go to step 4 with
+the token <IMG SRC="https://people.eecs.berkeley.edu/~bh/v3ch5/dashv.gif"><SUB><SMALL>0</SMALL></SUB>.
+
+<P><STRONG>Step 4.</STRONG>  We have an operation in hand at this point and we know its
+precedence level and how many arguments it needs.  Compare its precedence
+level with that of the topmost (most recently pushed) operation on the stack.
+If the precedence of the new operation is less than or equal to that of the
+one on the stack, go to step 5.  If it's greater, go to step 6.
+
+<P><STRONG>Step 5.</STRONG>  The topmost operation on the stack has higher precedence than
+the one we just read, so we should do it right away.  (For example, we've
+just read the <CODE>+</CODE> in <CODE>a*b+c</CODE>; the multiplication operation and both
+of its operands are ready on the stacks.)  Pop the operation off
+the stack, pop either one or two items off the data stack depending on the
+first subscript of the popped operation, then compile machine instructions to
+perform the indicated computation.  Push the result on the data stack
+as a single quantity.  However, if the operation we popped is <IMG SRC="vdash.gif">, then
+we're finished.  There should be only one thing on the data stack, and it's
+the completely compiled expression.  Otherwise, we still have the new
+operation waiting to be processed, so return to step 4.
+
+<P><STRONG>Step 6.</STRONG>  The topmost operation on the stack has lower precedence than
+the one we just read, so we can't do it yet because we're still reading its
+right operand.  (For example, we've just read the <CODE>*</CODE> in <CODE>a+b*c</CODE>;
+we're not ready to do either operation until we read the <CODE>c</CODE> later.)
+Push the new operation onto the operation stack, then return to step 2.
+
+<P>Here's how this algorithm works out with the sample expression above.  In
+the data stack, a boxed entry like <IMG SRC="a+b.gif"> means the result from
+translating that subexpression into the object language.
+
+<P><IMG SRC="stack1.gif">
+<BR><IMG SRC="stack2.gif">
+
+<P>The final value on the data stack is the translation of the
+entire expression.
+
+<P>The algorithm so far does not deal with parentheses.  They're handled
+somewhat like operations, but with slightly different rules.  A left
+parenthesis is stored on the operation stack as <CODE>(</CODE><SUB><SMALL>0</SMALL></SUB>, like the
+special marker at the beginning of the expression, but it does not invoke
+step 5 of the algorithm before being pushed on the stack.  A right
+parenthesis <EM>does</EM> invoke step 5, but only as far down the stack as
+the first matching left parenthesis; if it were an ordinary operation of
+precedence zero it would pop everything off the stack.  You might try to
+express precisely how to modify the algorithm to allow for parentheses.
+
+<P>Here are the procedures that embody this algorithm in the compiler.
+<CODE>Pgetunary</CODE> and <CODE>pgetbinary</CODE> output a list like
+
+<P><PRE>[sub 2 2]
+</PRE>
+
+<P>for binary <CODE>-</CODE> or
+
+<P><PRE>[minus 1 4]
+</PRE>
+
+<P>for unary minus.  (I'm leaving out some complications
+having to do with type checking.)  They work by looking for a <CODE>unary</CODE> or
+<CODE>binary</CODE> property on the property list of the operation symbol.
+Procedures with names like <CODE>op.prec</CODE> are selectors for the members
+of these lists.
+
+<P>In this algorithm, only step 5 actually generates any instructions in the
+object program.  This is the step in which an operation is removed from
+the operation stack and actually performed.  Step 5 is carried out by the
+procedure <CODE>ppopop</CODE> (Pascal pop operation); most of that procedure deals
+with code generation, but I've omitted that part of the procedure in the
+following listing because right now we're concerned with the parsing
+algorithm.  We'll return to code generation shortly.
+
+<P><CODE>Pexpr1</CODE> invokes <CODE>pdata</CODE> when it expects to read an operand, which
+could be a number, a variable, or a function call.  <CODE>Pdata</CODE>, which I'm
+not showing here, generates code to make the operand available and
+outputs the location of the result in the simulated computer, in a form
+that can be used by <CODE>pexpr</CODE>.
+
+<P><PRE>to pexpr
+local [opstack datastack parenlevel]
+make &quot;opstack [[popen 1 0]]                       ; step 1
+make &quot;datastack []
+make &quot;parenlevel 0
+output pexpr1
+end
+
+to pexpr1 
+local [token op]
+make &quot;token token                                 ; step 2
+while [equalp :token &quot;|(|] [popen  make &quot;token token]
+make &quot;op pgetunary :token
+if not emptyp :op [output pexprop :op]
+push &quot;datastack pdata :token
+make &quot;token token                                 ; step 3
+while [and (:parenlevel &gt; 0) (equalp :token &quot;|)| )]
+   [pclose  make &quot;token token]
+make &quot;op pgetbinary :token
+if not emptyp :op [output pexprop :op]
+make &quot;peektoken :token
+pclose
+if not emptyp :opstack [(throw &quot;error [too many operators])]
+if not emptyp butfirst :datastack [(throw &quot;error [too many operands])]
+output pop &quot;datastack
+end
+
+to pexprop :op                                    ; step 4
+while [(op.prec :op) &lt; (1 + op.prec first :opstack)] [ppopop]
+push &quot;opstack :op                                 ; step 6
+output pexpr1
+end
+
+to ppopop                                         ; step 5
+local [op function args left right type reg]
+make &quot;op pop &quot;opstack
+make &quot;function op.instr :op
+if equalp :function &quot;plus [stop]
+make &quot;args op.nargs :op
+make &quot;right pop &quot;datastack
+make &quot;left (ifelse equalp :args 2 [pop &quot;datastack] [[[] []]])
+make &quot;type pnewtype :op exp.type :left exp.type :right
+... code generation omitted ...
+push &quot;datastack (list :type &quot;register :reg)
+end
+
+to popen
+push &quot;opstack [popen 1 0]
+make &quot;parenlevel :parenlevel+1
+end
+
+to pclose
+while [(op.prec first :opstack) &gt; 0] [ppopop]
+ignore pop &quot;opstack
+make &quot;parenlevel :parenlevel - 1
+end
+</PRE>
+
+<P><H2>The Simulated Machine</H2>
+
+<P>We're ready to move from parsing to code generation, but first you must
+understand what a computer's native language is like.  Most computer models
+in use today have a very similar structure, although there are differences
+in details.  My simulated computer design makes these detail choices in
+favor of simplicity rather than efficiency.  (It wouldn't be very efficient
+no matter what, compared to real computers.  This &quot;computer&quot; is actually an
+interpreter, written in Logo, which is itself an interpreter.  So we have
+two levels of interpretation involved in each simulated instruction, whereas
+on a real computer, each instruction is carried out directly by the
+hardware.  Our compiled Pascal programs, as you've probably already noticed,
+run very slowly.  That's not Pascal's fault, and it's not even primarily my
+compiler's fault, even though the compiler doesn't include optimization
+techniques.  The main slowdown is in the interpretation of the machine
+instructions.)
+
+<P>Every computer includes a <EM>processor,</EM> which decodes
+instructions and carries out the indicated arithmetic operations, and a
+<EM>memory,</EM> in which information (such as the values of
+variables) is stored.  In modern computers, the processor is generally
+a single <EM>integrated circuit,</EM> nicknamed a <EM>chip,</EM>
+which is a rectangular black plastic housing one or two inches on a side
+that contains thousands or even millions of tiny components made of silicon.
+The memory is usually a <EM>circuit board</EM> containing several memory
+chips.  Computers also include circuitry to connect with input and output
+devices, but we're not going to have to think about those.  What makes one
+computer model different from another is mainly the processor.  If you
+have a PC, its processor is probably an Intel design with a name like
+80486 or Pentium; if you have a Macintosh, the processor might be a Motorola
+68040 or a Power PC chip.
+
+<P>It turns out that the wiring connecting the processor to the memory is
+often the main limiting factor on the speed of a computer.  Things happen at
+great speed within the processor, and within the memory, but only one value
+at a time can travel from one to the other.  Computer designers have invented
+several ways to get around this problem, but the important one for our
+purposes is that every modern processor includes a little bit of memory
+within the processor chip itself.  By &quot;a little bit&quot; I mean that a typical
+processor has enough memory in it to hold 32 values, compared to several
+million values that can be stored in the computer's main memory.  The 32
+memory slots within the processor are called <EM>
+registers.</EM><SUP>*</SUP>
+
+<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>One current topic in computer architecture research
+is the development of <EM>parallel</EM> computers with many processors working
+together.  In some of these designs, each processor includes its own
+medium-size memory within the processor chip.</SMALL></BLOCKQUOTE></SMALL><P>Whenever you want to perform an arithmetic operation, the operands must
+already be within the processor, in registers.  So, for example, the Pascal
+instruction
+
+<P><PRE>c := a + b
+</PRE>
+
+<P>isn't compiled into a single machine instruction.  First we must
+<EM>load</EM> the values of <CODE>a</CODE> and <CODE>b</CODE> from memory into registers,
+then add the two registers, then <EM>store</EM> the result back into memory:
+
+<P><PRE>rload 8 a
+rload 9 b
+add 10 8 9
+store 10 c
+</PRE>
+
+<P>The first <CODE>rload</CODE> instruction loads the value from memory
+location <CODE>a</CODE> into register 8.<SUP>*</SUP> The <CODE>add</CODE> instruction adds
+the numbers in registers 8 and 9, putting the result into register 10.  (In
+practice, you'll see that the compiler would be more likely to conserve
+registers by reusing one of the operand registers for the result, but for
+this first example I wanted to keep things simple.)  Finally we store the
+result into the variable <CODE>c</CODE> in memory.
+
+<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>Really I should have called this
+instruction <CODE>load</CODE>, but my machine simulator uses Logo procedures to
+carry out the machine instructions, and I had to pick a name that wouldn't
+conflict with the Logo <CODE>load</CODE> primitive.</SMALL></BLOCKQUOTE></SMALL><P>The instructions above are actually not machine language instructions, but
+rather <EM>assembly language</EM> instructions, a kind of shorthand.
+A program called an <EM>assembler</EM> translates assembly language into
+machine language, in which each instruction is represented as a number.
+For example, if the instruction code for <CODE>add</CODE> is <CODE>0023</CODE>, then
+the <CODE>add</CODE> instruction above might be translated into <CODE>0023100809</CODE>,
+with four digits for the instruction code and two digits for each of the three
+register numbers.  (In reality the encoding would use binary numbers rather
+than the decimal numbers I've shown in this example.)  Since a machine
+language instruction is just a number, the instructions that make up a
+computer program are stored in memory along with the program's data values.
+But one of the simplifications I've made in my simulated computer is that
+the simulator deals directly with assembly language instructions, and those
+instructions are stored in a Logo list, separate from the program's data memory.
+
+<P>The simulated computer has 32 processor registers plus 3000 locations of
+main memory; it's a very small computer, but big enough for my sample
+Pascal programs.  (You can change these sizes by editing procedure <CODE>
+opsetup</CODE> in the compiler.)  The registers are numbered from 0 to 31, and
+the memory locations are numbered from 0 to 2999.  The number of a memory
+location is called its <EM>address.</EM>  Each memory location can hold
+one numeric value.<SUP>*</SUP>  A Pascal array will be represented by a contiguous block of
+memory locations, one for each member of the array.  Each register, too, can
+hold one numeric value.  In this machine, as in some real computers, register
+number 0 is special; it always contains the value zero.
+
+<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>This, too, is a simplification.  In real computers,
+different data types require different amounts of memory.  A character
+value, for example, fits into eight <EM>bits</EM> (binary digits) of memory,
+whereas an integer requires 32 bits in most current computers.  Instead of a
+single <CODE>load</CODE> instruction, a real computer has a separate one for each
+datum size.</SMALL></BLOCKQUOTE></SMALL><P>The simulated computer understands 50 instruction codes, fewer than most
+real computers.  The first group we'll consider are the 14 binary arithmetic
+instructions: <CODE>add</CODE>, <CODE>sub</CODE>, <CODE>mul</CODE>, <CODE>div</CODE> (real quotient),
+<CODE>quo</CODE> (integer quotient), <CODE>rem</CODE> (remainder), <CODE>land</CODE> (logical
+and), <CODE>lor</CODE> (logical or), <CODE>eql</CODE> (compare two operands for equality),
+<CODE>neq</CODE> (not equal), <CODE>less</CODE>, <CODE>gtr</CODE> (greater than), <CODE>leq</CODE> (less
+than or equal), and <CODE>geq</CODE> (greater than or equal).  The result of each
+of the six comparison operators is <CODE>0</CODE> for false or <CODE>1</CODE> for true.
+The machine also has four unary arithmetic instructions: <CODE>lnot</CODE>
+(logical not), <CODE>sint</CODE> (truncate to integer), <CODE>sround</CODE> (round to
+integer), and <CODE>srandom</CODE>.  Each of these 18 arithmetic instructions takes
+its operands from registers and puts its result into a register.
+
+<P>All but the last three of these are typical instructions of real
+computers.<SUP>*</SUP>
+(Not every computer has all of them; for example, if a computer has <CODE>eql</CODE>
+and <CODE>lnot</CODE>, then it doesn't really need a <CODE>neq</CODE> instruction because
+the same value can be computed by a sequence of two instructions.)  The
+operations <CODE>sint</CODE>, <CODE>sround</CODE>, and <CODE>srandom</CODE> are less likely to be
+machine instructions on actual computers.  On the other hand, most real
+computers have a <EM>system call</EM> mechanism, which is a machine
+instruction that switches the computer from the user's program to a part of
+the operating system that performs some task on behalf of the user.  System
+calls are used mainly for input and output, but we can pretend that there are
+system calls to compute these Pascal library functions.  (The letter <CODE>s</CODE>
+in the instruction names stands for &quot;system call&quot; to remind us.)
+
+<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>One important simplification is that in the simulated
+computer, the same instructions are used for all kinds of numbers.  A typical
+computer has three <CODE>add</CODE> instructions: one for integers, one for short
+reals (32 bits), and one for long reals (64 bits).</SMALL></BLOCKQUOTE></SMALL><P>The simulated computer also has another set of 18 <EM>immediate</EM>
+instructions, with the letter <CODE>i</CODE> added to the instruction name: <CODE>
+addi</CODE>, <CODE>subi</CODE>, and so on.  In these instructions, the rightmost operand
+in the instruction is the actual value desired, rather than the number of
+a register containing the operand.  For example,
+
+<P><PRE>add 10 8 9
+</PRE>
+
+<P>means, &quot;add the number in register 8 and the number in register 9,
+putting the result into register 10.&quot;  But
+
+<P><PRE>addi 10 8 9
+</PRE>
+
+<P>means, &quot;add the number in register 8 to the value 9, putting
+the result in register 10.&quot;
+
+<P>It's only the right operand that can be made immediate.  So, for example,
+the Pascal assignment
+
+<P><PRE>y := x - 5
+</PRE>
+
+<P>can be translated into
+
+<P><PRE>rload 8 x
+subi 8 8 5
+store 8 y
+</PRE>
+
+<P>but the Pascal assignment
+
+<P><PRE>y := 5 - x
+</PRE>
+
+<P>must be translated as
+
+<P><PRE>addi 8 0 5
+rload 9 x
+sub 8 8 9
+store 8 y
+</PRE>
+
+<P>This example illustrates one situation in which it's useful to
+have register 0 guaranteed to contain the value 0.
+
+<P>Our simulated machine has six more system call instructions having to do
+with printing results.  One of them, <CODE>newline</CODE>, uses no operands and
+simply prints a newline character, moving to the beginning of a new line
+on the screen.  Four more are for printing the value in a register; the
+instruction used depends on the data type of the value in the register.
+The instructions are <CODE>putch</CODE> for a character, <CODE>puttf</CODE> for a boolean
+(true or false) value, <CODE>putint</CODE> for an integer, and <CODE>putreal</CODE> for a
+real number.  Each takes two operands; the first, an immediate value, gives
+the minimum width in which to print the value, and the second is a register
+number.  So the instruction
+
+<P><PRE>putint 10 8
+</PRE>
+
+<P>means, &quot;print the integer value in register 8, using at least
+10 character positions on the line.&quot;  The sixth printing instruction,
+<CODE>putstr</CODE>, is used only for constant character strings in the Pascal
+program; its first operand is a width, as for the others, but its second
+is a Logo list containing the string to print:
+
+<P><PRE>putstr 1 [The shuffled deck:]
+</PRE>
+
+<P>This is, of course, unrealistic; in a real computer the second
+operand would have to be the memory address of the beginning of the
+array of characters to print.  But the way I handle printing isn't very
+realistic in any case; I wanted to do the simplest possible thing, because
+worrying about printing really doesn't add anything to your understanding
+of the process of compilation, which is the point of this chapter.
+
+<P>The next group of instructions has to do with the flow of control in the
+computer program.  Ordinarily the computer carries out its instructions
+in sequence, that is, in the order in which they appear in the program.
+But in order to implement conditionals (such as <CODE>if</CODE>), loops (such as
+<CODE>while</CODE>), and procedure calls, we must be able to jump out of sequence.
+The <CODE>jump</CODE> instruction takes a single operand, a <EM>label</EM>
+that appears somewhere in the program.  When the computer carries out a jump
+instruction, it looks for the specified label and starts reading instructions
+just after where that label appears in the program.  (We saw an example of
+labels at the beginning of this chapter.)
+
+<P>The <CODE>jump</CODE> instruction is used for unconditional jumps.  In order to
+implement conditionals and loops, we need a way to jump if some condition
+is true.  The instruction <CODE>jumpt</CODE> (jump if true) has two operands,
+a register number and a label.  It jumps to the specified label if and only
+if the given register contains a true value.  (Since registers hold only
+numbers, we use the value 1 to represent true, and 0 to represent false.)
+Similarly, <CODE>jumpf</CODE> jumps if the value in the given register is false.
+
+<P>For procedure and function calls, we need a different mechanism.  The jump
+is unconditional, but the computer must remember where it came from, so that
+it can continue where it left off once the called procedure or function
+returns.  The instruction <CODE>jal</CODE> (jump and link) takes two operands, a
+register and a label.  It puts into the register the address of the
+instruction following the <CODE>jal</CODE> instruction.<SUP>*</SUP> Then it jumps to the specified label.
+To return from the called procedure, we use the <CODE>jr</CODE> (jump register)
+instruction.  It has one operand, a register number; it jumps to the
+instruction whose address is in the register.
+
+<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>In a real computer,
+each instruction is stored in a particular memory location, so the address
+of an instruction is the address of the memory location in which it's stored.
+In this simulated computer, I keep the program in the form of a Logo list,
+and so I cheat and put the sublist starting at the next instruction into
+the register.  This isn't quite as much of a cheat as it may seem, though,
+since you know from Chapter 3 that Logo represents a list with the memory
+address of the first pair of the list.</SMALL></BLOCKQUOTE></SMALL><P>One final instruction that affects the flow of control is the <CODE>exit</CODE>
+system call.  It requires no operands; it terminates the running of the
+program.  In this simulated computer, it returns to a Logo prompt; in a
+real computer, the operating system would start running another user program.
+
+<P>The only remaining instructions are <CODE>rload</CODE> and <CODE>store</CODE>.  You already
+know what these do, but I've been showing them in oversimplified form so far.
+The second operand can't just be a variable name, because that variable might
+not be in the same place in memory every time the procedure is called. Think,
+for example, about a recursive procedure.  Several invocations may be in
+progress at once, all of them carrying out the same compiled instructions,
+but each referring to a separate set of local variables.  The solution to
+this problem is that the compiler arranges to load into a register
+the address of a block of memory containing all the local variables for a
+given procedure call.  If the variable <CODE>c</CODE>, for example, is in the sixth
+memory location of that block, an instruction to load or store that variable
+must be able to say &quot;the memory location whose address is the contents of
+register 4 (let's say) plus five.&quot; So each load and store instruction
+contains an <EM>index register</EM> in parentheses following an
+<EM>offset</EM> to be added to the contents of that register.  We'd say
+
+<P><PRE>store 8 5(4)
+</PRE>
+
+<P>to store the contents of register 8 into the variable <CODE>c</CODE>,
+provided that register 4 points to the correct procedure invocation's
+local variables and that <CODE>c</CODE> is in the sixth position in the block.
+(The first position in the block would have offset 0, and so on.)
+
+<P><H2>Stack Frames</H2>
+
+<P>The first step in invoking a procedure or function is to set aside, or
+<EM>allocate,</EM> a block of memory locations for use by that invocation.
+This block will include the procedure's local variables, its arguments, and
+room to save the values of registers as needed.  The compiler's data
+structures include, for each procedure, how much memory that procedure needs
+when it's invoked.  That block of memory is called a <EM>frame.</EM>
+
+<P>In most programming languages, including Pascal and Logo (but not, as it
+turns out, Lisp), the frame allocated when a procedure invocation begins can
+be released, or <EM>deallocated,</EM> when that invocation returns to its
+caller.  In other words, the procedure's local variables no longer exist
+once the invocation is finished.  In these languages, the frames for all the
+active procedure invocations can be viewed as a <EM>stack,</EM> a data structure
+to which new elements are added by a Push operation, and elements are removed
+using a Pop operation that removes the most recently pushed element.  (In
+this case, the elements are the frames.)  That is, suppose that procedure A
+invokes B, which invokes C, which invokes D.  For each of these invocations
+a new frame is pushed onto the stack.  Which procedure finishes first?  It
+has to be D, the last one invoked.  When D returns, its frame can be popped
+off the stack.  Procedure C returns next, and its frame is popped, and so on.
+The phrase <EM>stack frame</EM> is used to refer to frames that
+behave like elements of a stack.
+
+<P>My Pascal compiler allocates memory starting at location 0 and working
+upward.  At the beginning of the program, a <EM>global frame</EM> is allocated
+to hold the program's global variables.  Register 3, the
+<EM>global pointer,</EM> always contains the
+address of the beginning of the global frame, so that every procedure can
+easily make use of global variables.  (Since the global frame is the first
+thing in memory, its address is always zero, so the value in register 3 is
+always 0.  But in a more realistic implementation the program itself would
+appear in memory before the global frame, so its address would be greater
+than zero.)  
+
+<P>At any point in the program, register 4, the <EM>frame pointer,</EM>
+contains the address of the beginning of the current frame, that is, the
+frame that was created for the current procedure invocation.  Register 2,
+the <EM>stack pointer,</EM> contains the address of the first
+currently unused location in memory.
+
+<P>My compiler is a little unusual in that when a procedure is called, the
+stack frame for the new invocation is allocated by the caller, not by the
+called procedure.  This simplifies things because the procedure's arguments
+can be stored in its own frame; if each procedure allocates its own frame,
+then the caller must store argument values in its (the caller's) frame,
+because the callee's frame doesn't exist yet.  So, in my compiler, the
+first step in a procedure call is to set register 5, the <EM>new frame
+pointer,</EM> to point to the first free memory location, and change the
+stack pointer to allocate the needed space.  If <CODE>N</CODE> memory locations
+are needed for the new frame, the calling procedure will contain the
+following instructions:
+
+<P><PRE>add 5 2 0
+addi 2 2 N
+</PRE>
+
+<P>The first instruction copies the value from register 2 (the
+first free memory location) into register 5; the second adds <CODE>N</CODE> to
+register 2.  (I've left out a complication, which is that the old value
+in register 5 must be saved somewhere before putting this new value
+into it.  You can read the code generation instructions at the beginning
+of <CODE>pproccall1</CODE>, in the program listing at the end of the chapter,
+for all the details.)  The current frame pointer is also saved in
+location 3 of the new frame:
+
+<P><PRE>store 4 3(5)
+</PRE>
+
+<P>The compiler uses data abstraction to refer to these register
+numbers and frame slots; for example, the procedure <CODE>reg.frameptr</CODE> takes
+no arguments and always outputs 4, while <CODE>frame.prevframe</CODE> outputs 3.
+
+<P>The next step is to put the argument values into the new frame.  During
+this process, the calling procedure must use register 4 to refer to its own
+variables, and register 5 to refer to the callee's variables.  The final
+step, just before calling the procedure, is to make the
+frame pointer (register 4) point to the new frame:
+
+<P><PRE>add 4 5 0
+</PRE>
+
+<P>Once the caller has set up the new frame and saved the necessary registers,
+it can call the desired procedure, putting the return address in register 1:
+
+<P><PRE>jal 1 &quot;proclabel
+</PRE>
+
+<P>The first step in the called procedure is to save the return
+address in location zero of its frame:
+
+<P><PRE>store 1 0(4)
+</PRE>
+
+<P>The procedure then carries out the instructions in its body.  When it's
+ready to return, it must load the saved return address back into register
+1, then restore the old stack pointer and frame pointer to deallocate
+its frame, and finally return to the caller:
+
+<P><PRE>rload 1 0(4)
+add 2 4 0
+rload 4 3(2)
+jr 1
+</PRE>
+
+<P>(Procedure <CODE>proc1</CODE> in the compiler generates these
+instructions for each procedure.)
+
+<P>One final complication about stack frames comes from Pascal's block
+structure.  Suppose we have a program with internal procedures arranged
+in this structure:
+
+<P><CENTER><IMG SRC="blocks.gif" ALT="figure: blocks"></CENTER>
+
+<P>Then suppose that the main program calls procedure A, which
+calls B, which calls C, which calls itself recursively.  The current (inner)
+invocation of C has access to its own variables, those of procedure A, and
+the global variables, but not to procedure B's variables.  How does
+procedure C know where procedure A's stack frame is located?  The answer is
+that every frame, in addition to saving a pointer to the previous frame,
+must include a pointer to the <EM>lexically enclosing</EM> frame.  The
+calling procedure sets this up; it can do this because it knows its own
+lexical depth and that of the called procedure.  For example, when procedure
+B calls procedure C, C's lexically enclosing frame will be the same as B's
+(namely, the frame for the invocation of A), because B and C are at the
+same lexical depth.  (They are both declared inside A.)  But when procedure
+A calls procedure B, which is declared within itself, A must store its own
+frame pointer as B's lexically enclosing frame.  Here is a picture of what's
+where in memory:
+
+<P><CENTER><IMG SRC="memory.gif" ALT="figure: memory"></CENTER>
+
+<P>If all these pointers between frames confuse you, it might help to keep in
+mind that the two kinds of pointers have very different purposes.  The
+pointer to the previous frame is used only when a procedure returns, to
+help in putting everything back the way it was before the procedure was
+called (in particular, restoring the old value of register 4).  The pointer
+to the lexically enclosing frame is used while the procedure is running,
+whenever the procedure makes reference to a variable that belongs to some
+outer procedure (for example, a reference in procedure B or C to a variable
+that belongs to procedure A).<SUP>*</SUP>
+
+<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>If procedures used the previous-frame
+pointers to make variable references, we would be compiling a dynamically
+scoped language!  In this example, because Pascal is lexically scoped,
+procedure C can't refer to procedure B's variables, even though B called C.</SMALL></BLOCKQUOTE></SMALL><P><H2>Data Structures</H2>
+
+<P>In this section I'll describe the main data structures used during
+compilation (abstract data types for identifiers and for expressions)
+and during the running of the program (registers and frames).
+
+<P>The main body of information that the compiler must maintain is the list of
+Pascal identifiers (variable, procedure, and function names).  Since Pascal
+is lexically scoped, some attention is necessary to ensure that each compiled
+Pascal procedure has access to precisely the variables that it should.
+At any point during the compilation, the value of <CODE>:idlist</CODE> is a list of
+just those identifiers that may be used in the part of the program being
+compiled.  We'll see in a moment how that's accomplished.
+
+<P>There are two main categories of identifier: procedure names (including the
+main program and functions in this category) and variable names.  The
+information maintained for a procedure name looks like this example:
+
+<P><PRE>[myproc procedure %myproc [2 46]]
+</PRE>
+
+<P>The first member of this list is the Pascal name of the program,
+procedure, or function.  The second member is the type indicator, which
+will be one of the words <CODE>program</CODE>, <CODE>procedure</CODE>, or <CODE>function</CODE>.
+The third member is the procedure's &quot;Logo name,&quot; the unique name used within
+the compiler to represent this program or procedure.  The program's Logo name
+is used as the variable name whose value will be the compiled program; the
+Logo names for procedures and functions are used as the labels in the
+compiled program at which each procedure or function begins.  The fourth
+member of the list contains the frame information for the procedure; it's
+a list of two numbers, the lexical depth and the frame size.  The lexical
+depth is 0 for the main program, 1 for a procedure declared inside the
+main program, 2 for a procedure declared inside a depth-1 procedure, and
+so on.  The frame size indicates how many memory locations must be allocated
+for each invocation of the procedure.  (For the main program, the frame size
+indicates the size of the global frame.)
+
+<P>Because of the Pascal scope rules, there can be two procedures with the same
+name, each declared within a different region of the program.  But there is
+no scoping of labels in the compiled program; each label must be unique.
+The simplest solution would be to use a distinct program-generated name for
+every Pascal procedure; the Pascal <CODE>doit</CODE> would become the
+Logo <CODE>g14</CODE>.  In fact I chose to modify this approach somewhat.  When an
+identifier <CODE>symbol</CODE> is declared in the source program, the compiler
+looks to see whether another identifier with the same name has appeared
+anywhere in the program.  If not, the Logo name <CODE>%symbol</CODE> is used; if
+so, a generated symbol is used.  This rule makes the compiled
+program a little easier to read, while preserving the rule that all Logo
+names must be unique.  The percent sign in <CODE>%symbol</CODE> ensures that this
+Logo name doesn't conflict with any names used in the
+compiler itself.  Procedure <CODE>newlname</CODE> in the compiler takes a Pascal
+identifier as input and generates a new Logo name to correspond.
+
+<P>The selectors <CODE>id.type</CODE>, <CODE>id.lname</CODE>, and <CODE>id.frame</CODE> are used
+for the second through fourth members of these lists.  There's no selector
+for the first member, the Pascal name, because the compiler never
+extracts this information explicitly.  Instead, the Pascal name is used
+by procedure <CODE>getid</CODE>, which takes a Pascal name as its input and
+returns the corresponding identifier list.
+
+<P>For variable names, the identifier information looks a little different:
+
+<P><PRE>[i integer [1 41] false]
+</PRE>
+
+<P>The first two members of this list are the Pascal name and the
+type, the same as for a procedure.  The third member is the <EM>pointer</EM>
+information for the variable: its lexical depth and the offset within
+a frame where it should be kept.  The compiler will use this information
+to issue instructions to load or store the value of the variable.  The
+fourth member of the list is <CODE>true</CODE> if this variable is a <CODE>var</CODE>
+(call by reference) parameter, <CODE>false</CODE> otherwise.
+
+<P>The variable <CODE>i</CODE> above has a scalar type, so its type indicator is
+a word.  Had it been an array, the type indicator would be a list such as
+
+<P><PRE>[integer [0 6] [5 3]]
+</PRE>
+
+<P>for a variable declared as <CODE>array [0..5, 5..7] of integer</CODE>.
+
+<P>For each dimension of the array, the first number in the list
+is the smallest possible index, while the second number is the number of
+possible index values in this dimension.  That is, the range <CODE>[3..7]</CODE>
+is represented by the list <CODE>[3 5]</CODE> because there are five possible
+values starting from 3.  Notice that there is no &quot;Logo name&quot; for a
+variable; in the compiled program, a variable is represented as an
+offset and an index register, such as <CODE>41(4)</CODE>.
+
+<P>For variables, the selectors used are <CODE>id.type</CODE>, <CODE>id.pointer</CODE>,
+and <CODE>id.varp</CODE>.
+
+<P>The information about currently accessible identifiers is kept in the list
+<CODE>idlist</CODE>.  This variable holds a list of lists; each Pascal identifier
+is represented by a list as indicated above.  <CODE>Idlist</CODE> is a local
+variable in the compiler procedures <CODE>program</CODE>, <CODE>procedure</CODE>, and <CODE>
+function</CODE>.  That is, there is a separate version for each block of the
+Pascal source program.  Each local version starts out with the same value as
+the higher-level version; identifiers declared within a block are added to
+the local version but not to the outer one.  When the compiler finishes a
+block, the (Logo) procedure in charge of that block stops and the outer <CODE>
+idlist</CODE> becomes current again.
+
+<P>This arrangement may or may not seem strange to you.  Recall that we had to
+invent this <CODE>idlist</CODE> mechanism because Pascal's
+lexical scope is different from Logo's dynamic scope.  The reason we have
+these different versions of <CODE>idlist</CODE> is to keep track of which
+identifiers are lexically available to which blocks.  And yet we are using
+Logo's dynamic scope to determine which <CODE>idlist</CODE> is available at any
+point in the compilation.  The reason this works is that <EM>the
+dynamic environment at compile time reflects the
+lexical environment at run time.</EM> For example, in the <CODE>tower</CODE>
+program, the fact that <CODE>tower</CODE> <EM>contains</EM> <CODE>hanoi</CODE>, which in
+turn contains <CODE>movedisk</CODE>, is reflected in the fact that <CODE>program</CODE>
+(compiling <CODE>tower</CODE>) <EM>invokes</EM> <CODE>procedure</CODE> (compiling <CODE>
+hanoi</CODE>), which in turn <EM>invokes</EM> <CODE>procedure</CODE> recursively
+(compiling <CODE>movedisk</CODE>).  Earlier I said that lexical scope is easier for
+a compiler than dynamic scope; this paragraph may help you see why that's
+true.  Even dynamically scoped Logo naturally falls into providing lexical
+scope for a Pascal compiler.
+
+<P>Here is how procedure and function declarations are compiled:
+
+<P><PRE>to procedure
+proc1 &quot;procedure framesize.proc
+end
+
+to function
+proc1 &quot;function framesize.fun
+end
+
+to proc1 :proctype :framesize
+localmake &quot;procname token
+localmake &quot;lexical.depth :lexical.depth+1
+localmake &quot;frame (list :lexical.depth 0)
+push &quot;idlist (list :procname :proctype (newlname :procname) :frame)
+localmake &quot;idlist :idlist
+ ...
+end
+</PRE>
+
+<P>(I'm leaving out the code generation part for now.)  What I want
+to be sure you understand is that the <CODE>push</CODE> instruction adds the new
+procedure name to the <EM>outer</EM> <CODE>idlist</CODE>; after that, it creates a
+new <CODE>idlist</CODE> whose initial value is the same as the old one.  It's very
+important that the instruction
+
+<P><PRE>localmake &quot;idlist :idlist
+</PRE>
+
+<P>comes where it does and not at the beginning of the procedure.
+<CODE>Proc1</CODE> needs access to the outer <CODE>idlist</CODE> when it starts, and
+then later it &quot;shadows&quot; that variable with its own local version.  This
+example shows that Logo's <CODE>local</CODE> command really is an executable
+command and not a declaration like Pascal's <CODE>var</CODE> declaration.  In
+Pascal it would be unthinkable to declare a new local variable in the middle
+of a block.
+
+<P><CODE>Getid</CODE> depends on Logo's dynamic scope to give it access to the right
+version of <CODE>idlist</CODE>.  Think about writing a Pascal compiler in Pascal.
+There would be a large block for <CODE>program</CODE> with many other procedures
+inside it.  Two of those inner procedures would be the ones for <CODE>
+procedure</CODE> and <CODE>function</CODE>.  (Of course they couldn't have those names,
+because they're Pascal reserved words.  They'd be called <CODE>
+compileprocedure</CODE> or some such thing.  But I think this will be easier to
+follow if I stick with the names used in the Logo version of the compiler.)
+Those two procedures should be at the same level of block structure; neither
+should be lexically within the other.  That's because a Pascal procedure
+block can include a function definition or vice versa.  Now, where in the
+lexical structure does <CODE>getid</CODE> belong?  It needs access to the local
+<CODE>idlist</CODE> of either <CODE>procedure</CODE> or <CODE>function</CODE>, whichever is
+currently active.  Similarly, things like <CODE>statement</CODE> need to be
+lexically within both <CODE>procedure</CODE> and <CODE>function</CODE>, and actually also
+within <CODE>program</CODE> because the outermost program block has statements too.
+It would theoretically be possible to solve the problem by writing three
+identical versions of each of these subprocedures, but that solution is too
+horrible to contemplate.  Instead a more common technique is to have only
+one <CODE>idlist</CODE> variable, a global one, and write the compiler so that it
+explicitly maintains a stack of old values of that variable.  The Pascal
+programmer has to do the work that the programming language should be doing
+automatically.  This is an example in which dynamic scope, while not
+absolutely essential, makes the program much easier to write and more
+straightforward to understand.
+
+<P>For every procedure or function in the Pascal source program, the compiler
+creates a global Logo variable with the same name as the corresponding
+label--that is, either a percent-prefix name or a generated symbol.  The
+value of this variable is a list of types, one for each argument to the
+procedure or function.  (For a function, the first member of the list is the
+type of the function itself; the butfirst is the list of types of its
+arguments.)  The compiler examines this &quot;type signature&quot; variable when a
+procedure or function is invoked, to make sure that the types of the actual
+arguments match the types of the formal parameters.
+
+<P>The other important compile-time data structure is the one that represents
+a compiled expression.  When the compiler calls <CODE>pexpr</CODE>, its job is to
+parse an expression from the Pascal source program and generate code to
+compute (when the compiled program runs!) the value of the expression.
+
+The generated code leaves the computed value in some register.  What <CODE>
+pexpr</CODE> returns to its caller is a data structure indicating which register
+and what type the expression has, like this:
+
+<P>
+
+<P><PRE>[real register 8]
+</PRE>
+
+<P>The first member of this list is the type of the expression.
+Most of the time, the second member is the word <CODE>register</CODE> and the
+third member is the register number in which the expression's value
+can be found.  The only exception is for a constant expression; if
+the expression is, for example, <CODE>15</CODE> then <CODE>pexpr</CODE> will output
+
+<P><PRE>[integer immediate 15]
+</PRE>
+
+<P>For the most part, these immediate expressions are useful
+only within recursive calls to <CODE>pexpr</CODE>.  In compiling the Pascal
+assignment
+
+<P><PRE>x := 15
+</PRE>
+
+<P>we're going to have to get the value 15 into a register anyway
+in order to be able to store it into <CODE>x</CODE>; the generated code will be
+something like
+
+<P><PRE>addi 7 0 15
+store 7 48(4)
+</PRE>
+
+<P>An immediate expression is most useful in compiling something like
+
+<P><PRE>x := a+15
+</PRE>
+
+<P>in which we can avoid loading the value 15 into a register, but
+can directly add it to the register containing <CODE>a</CODE>:
+
+<P><PRE>rload 7 53(4)
+addi 7 7 15
+store 7 48(4)
+</PRE>
+
+<P>The members of an expression list are examined using the selectors <CODE>
+exp.type</CODE>, <CODE>exp.mode</CODE> (the word <CODE>register</CODE> or <CODE>immediate</CODE>),
+and <CODE>exp.value</CODE> (the register number or immediate value).
+
+<P>In this compiler an &quot;expression&quot; is always a
+<EM>scalar</EM> type; although the formal definition of Pascal allows for
+array expressions, there are no operations that act on arrays the way
+operations like <CODE>+</CODE> act on scalars, and so an array expression can only
+be the name of an array variable.  (<EM>Members</EM> of arrays can, of
+course, be part of a scalar expression.)  <CODE>Passign</CODE>, the compiler
+procedure that handles assignment statements, first checks for the special
+case of an array assignment and then, only if the left side of the
+assignment is a scalar, invokes <CODE>pexpr</CODE> to parse a scalar expression.
+
+<P>In order to understand the code generated by the compiler, you should also
+know about the <EM>runtime</EM> data structures used by compiled programs.
+First, certain registers are reserved for special purposes:
+
+<P><TABLE>
+<TR><TH align="right">number<TH>&nbsp;&nbsp;&nbsp;&nbsp;<TH align="left">name<TH>&nbsp;&nbsp;&nbsp;&nbsp;<TH align="left">purpose
+<TR><TD>&nbsp;
+<TR><TD align="right">0<TD><TD><CODE>reg.zero</CODE><TD><TD>always contains zero
+<TR><TD align="right">1<TD><TD><CODE>reg.retaddr</CODE><TD><TD>return address from procedure call
+<TR><TD align="right">2<TD><TD><CODE>reg.stackptr</CODE><TD><TD>first free memory address
+<TR><TD align="right">3<TD><TD><CODE>reg.globalptr</CODE><TD><TD>address of global frame
+<TR><TD align="right">4<TD><TD><CODE>reg.frameptr</CODE><TD><TD>address of current frame
+<TR><TD align="right">5<TD><TD><CODE>reg.newfp</CODE><TD><TD>address of frame being made for procedure call
+<TR><TD align="right">6<TD><TD><CODE>reg.retval</CODE><TD><TD>return value from function
+<TR><TD align="right">7<TD><TD><CODE>reg.firstfree</CODE><TD><TD>first register available for expressions
+</TABLE>
+
+<P>We've already seen most of these while discussing stack frames.
+A Pascal function returns its result in register 6; the caller immediately
+copies the return value into some other register so that it won't be
+lost if the program calls another function, for a case like
+
+<P><PRE>x := f(3)+f(4)
+</PRE>
+
+<P>Whenever a register is needed to hold some computed value, the compiler
+calls the Logo procedure <CODE>newregister</CODE>, which finds the first register
+number starting from 7 that isn't currently in use.  When the value in
+a register is no longer needed, the compiler calls <CODE>regfree</CODE> to indicate
+that that register can be reassigned by <CODE>newregister</CODE>.
+
+<P>The other noteworthy runtime data structure is the use of slots within each
+frame for special purposes:
+
+<P><TABLE>
+<TR><TH align="right">number<TH>&nbsp;&nbsp;&nbsp;&nbsp;<TH align="left">name<TH>&nbsp;&nbsp;&nbsp;&nbsp;<TH align="left">purpose
+<TR><TD>&nbsp;
+<TR><TD align="right">0<TD><TD><CODE>frame.retaddr</CODE><TD><TD>address from which this procedure was called
+<TR><TD align="right">1<TD><TD><CODE>frame.save.newfp</CODE><TD><TD>saved register 3 while filling this new frame
+<TR><TD align="right">2<TD><TD><CODE>frame.outerframe</CODE><TD><TD>the frame lexically enclosing this one
+<TR><TD align="right">3<TD><TD><CODE>frame.prevframe</CODE><TD><TD>the frame from which this one was called
+<TR><TD align="right">4-35<TD><TD><CODE>frame.regsave</CODE><TD><TD>space for saving registers
+<TR><TD align="right">36<TD><TD><CODE>frame.retval</CODE><TD><TD>function return value
+</TABLE>
+
+<P>Why is there both a register and a frame slot for a function's return
+value?  Remember that the way you indicate the return value in a Pascal
+function is by assigning to the function's name as if it were a variable.
+Such an assignment is not necessarily the last instruction in the function;
+it may do more work after computing the return value.  The compiler notices
+as assignment to the function name and generates code to save the computed
+value in slot 36 of the current frame.  Then, when the function actually
+returns, the compiler generates the instruction
+
+<P><PRE>rload 6 36(4)
+</PRE>
+
+<P>to copy the return value into register 6.  The function's frame
+is about to be freed, so the caller can't look there
+for the return value; that's why a register is used.
+
+<P>Each frame includes a block of space for saving registers when another
+procedure is called.  That's because each procedure allocates register
+numbers independently; each starts with register 7 as the first free one.
+So if the registers weren't saved before a procedure call and restored
+after the call, the values in the registers would be lost.  (Although
+the frame has enough room to save all 32 registers, to make things simple,
+not all 32 are actually saved.  The compiler knows which registers contain
+active expression values at the moment of the procedure call, and it
+generates code to save and restore only the necessary ones.)
+
+<P>You might think it would be easier to have each procedure use a separate
+set of registers, so saving wouldn't be necessary.  But this
+doesn't work for two reasons.  First, there are only a few registers, and
+in a large program we'd run out.  Even more important, the compiled
+code for a <EM>recursive</EM> procedure is going to use the same registers
+in each invocation, so we certainly can't avoid saving registers in that
+situation.
+
+<P><H2>Code Generation</H2>
+
+
+<P>Let's look again at how the compiler handles a Pascal <CODE>if</CODE> statement:
+
+<P><PRE>to pif
+local [cond elsetag endtag]
+make &quot;cond pboolean pexpr
+make &quot;elsetag gensym
+make &quot;endtag gensym
+mustbe &quot;then
+code (list &quot;jumpf :cond (word &quot;&quot; :elsetag))
+regfree :cond
+statement
+code (list &quot;jump (word &quot;&quot; :endtag))
+code :elsetag
+ifbe &quot;else [statement]
+code :endtag
+end
+</PRE>
+
+<P>I showed you this procedure while talking about parsing, asking you to
+ignore the parts about code generation.  Now we'll come back to that part of
+the process.
+
+<P>The format of the <CODE>if</CODE> statement is either of these:
+
+<P><PRE>if <EM>condition</EM> then <EM>statement</EM>
+if <EM>condition</EM> then <EM>statement</EM> else <EM>statement</EM>
+</PRE>
+
+<P>(There is probably a semicolon after the statement, but it's
+not officially part of the <CODE>if</CODE>; it's part of the compound statement
+that contains the <CODE>if</CODE>.)  When we get to <CODE>pif</CODE>, the compiler has
+already read the token <CODE>if</CODE>; the next thing to read is an expression,
+which must be of type <CODE>boolean</CODE>, providing the condition part of the
+statement.
+
+<P>In the instruction
+
+<P><PRE>make &quot;cond pboolean pexpr
+</PRE>
+
+<P>the call to <CODE>pexpr</CODE> generates code for the expression and
+returns an expression list, in the format shown earlier.  The procedure <CODE>
+pboolean</CODE> does three things:  First, it checks the mode of the expression;
+if it's immediate, the value is loaded into a register.  Second, it checks
+the type of the expression to ensure that it really is boolean.  Third,
+<CODE>pboolean</CODE> returns just the register number, which will be used in code
+generated by <CODE>pif</CODE>.
+
+<P><PRE>to pboolean :expr [:pval noimmediate :expr]
+if equalp exp.type :pval &quot;boolean [output exp.value :pval]
+(throw &quot;error sentence exp.type :pval [not true or false])
+end
+
+to noimmediate :value
+if equalp exp.mode :value &quot;immediate ~
+   [localmake &quot;reg newregister
+    code (list &quot;addi :reg reg.zero exp.value :value)
+    output (list exp.type :value &quot;register :reg)]
+output :value
+end
+</PRE>
+
+<P>Overall, the code compiled for the <CODE>if</CODE> statement will look like this:
+
+<P><PRE>    ... get condition into register <EM>cond</EM> ...
+    jumpf <EM>cond</EM> &quot;g5
+    ... code for <CODE>then</CODE> statement ...
+    jump &quot;g6
+g5
+    ... code for <CODE>else</CODE> statement ...
+g6
+</PRE>
+
+<P>The labels <CODE>g5</CODE> and <CODE>g6</CODE> in this example are generated
+symbols; they'll be different each time.  The labels are generated by the
+instructions
+
+<P><PRE>make &quot;elsetag gensym
+make &quot;endtag gensym
+</PRE>
+
+<P>in <CODE>pif</CODE>.  After we call <CODE>pexpr</CODE> to generate the code
+for the conditional expression, we explicitly generate the <CODE>jumpf</CODE>
+instruction:
+
+<P><PRE>code (list &quot;jumpf :cond (word &quot;&quot; :elsetag))
+regfree :cond
+</PRE>
+
+<P>Notice that once we've generated the <CODE>jumpf</CODE> instruction, we
+no longer need the value in register <CODE>:cond</CODE>, and we call <CODE>regfree</CODE>
+to say so.  The rest of this code generation process should be easy to work
+out.  All of the structured statements (<CODE>for</CODE>, <CODE>while</CODE>, and <CODE>
+repeat</CODE>) are similarly simple.
+
+<P>The code generation for expressions is all in <CODE>ppopop</CODE>.  Most of the
+complexity of dealing with expressions is in the parsing, not in the code
+generation; by the time we get to <CODE>ppopop</CODE>, we know that we want to
+carry out a single operation on two values, both of which are either in
+registers or immediate values.  The simple case is that both are in
+registers; suppose, for example, that we are given the subtraction operation
+and the two operands are in registers 8 and 9.  Then we just generate the
+instruction
+
+<P><PRE>sub 8 8 9
+</PRE>
+
+<P>and declare register 9 free.  <CODE>Ppopop</CODE> is a little long,
+because it has to check for special cases such as immediate operands.
+Also, a unary minus is turned into a subtraction from register zero,
+since there is no unary <CODE>minus</CODE> operation in our simulated machine.
+
+<P>Ironically, it's the &quot;simple&quot; statements that are hardest to
+compile: assignment and procedure calling.  For procedure (or function)
+calling, the difficulty is in matching actual argument expressions with
+formal parameters.  Procedure <CODE>pproccall1</CODE> generates the instructions to
+manipulate frame pointers, as described earlier, and procedure <CODE>
+procargs</CODE> fills the newly-created frame with the actual argument values.
+(If an argument is an array passed by value, each member of the array must
+be copied into the new frame.)  Assignment, handled by procedure <CODE>
+passign</CODE> in the compiler, is similar to argument passing; a value must be
+computed and then stored into a frame.  I wouldn't be too upset if you
+decide to stop here and take code generation for memory references on faith.
+
+<P>Suppose we are compiling the assignment
+
+<P><PRE>x := <EM>expression</EM>
+</PRE>
+
+<P><CODE>Passign</CODE> reads the name <CODE>x</CODE> and uses <CODE>getid</CODE> to find
+the information associated with that name.  If the assignment is to an array
+member, then <CODE>passign</CODE> must also read the array indices, but let's say
+that we are assigning to a scalar variable, to keep it simple.
+
+<P><PRE>to passign
+local [name id type index value pointer target]
+<U>make &quot;name token</U>
+make &quot;index []
+ifbe &quot;|[| [make &quot;index commalist [pexpr] mustbe &quot;|]|]
+mustbe &quot;|:=|
+<U>make &quot;id getid :name</U>
+<U>make &quot;pointer id.pointer :id</U>
+<U>make &quot;type id.type :id</U>
+passign1
+end
+</PRE>
+
+<P>Procedure <CODE>passign1</CODE> contains the steps that are in common between
+ordinary assignment (handled by <CODE>passign</CODE>) and assignment to the name of
+the current function, to set the return value (handled by <CODE>pfunset</CODE>,
+which you can read in the complete listing at the end of the chapter).
+
+<P><PRE>to passign1
+if and (listp :type) (emptyp :index) [parrayassign :id  stop]
+setindex &quot;false
+<U>make &quot;value check.type :type pexpr</U>
+<U>codestore :value (id.pointer :id) (id.varp :id) :index</U>
+regfree :value
+end
+</PRE>
+
+<P>We call <CODE>pexpr</CODE> to generate the code to compute the
+expression. <CODE>Check.type</CODE> is like <CODE>pboolean</CODE>, which you saw earlier,
+except that it takes the desired type as an argument.  It returns the
+number of the register that contains the expression value.
+
+<P>The real work is done by <CODE>codestore</CODE>, which takes four inputs.
+The first is the register number whose value should be stored; the other
+three inputs indicate where in memory the value should go.  First comes the
+pointer from the identifier list; this, you'll recall, tells us the lexical
+depth at which the variable was declared and the offset within its frame
+where the variable is kept.  Next is a true or false value indicating
+whether or not this variable is a <CODE>var</CODE> parameter; if so, then its value
+is a pointer to the variable whose value we <EM>really</EM> want to change.
+Finally, the <EM>index</EM> input will be zero for a scalar variable, or the
+number of a register containing the
+array index for an array member.  (Procedure <CODE>lindex</CODE>, whose name stands
+for &quot;linear index,&quot; has been called to generate code to convert the possible
+multi-dimensional indices, with possibly varying starting values, into a
+single number indicating the position within the array, starting from zero
+for the first member.)
+
+<P><PRE>to codestore :reg :pointer :varflag :index
+localmake &quot;target memsetup :pointer :varflag :index
+code (list &quot;store :reg targetaddr)
+regfree last :target
+end
+</PRE>
+
+<P>(There is a similar procedure <CODE>codeload</CODE> used to generate
+the code to load a variable's value into a register.)  <CODE>Codestore</CODE>
+invokes a subprocedure <CODE>memsetup</CODE> whose job is to work out an
+appropriate operand for an <CODE>rload</CODE> or <CODE>store</CODE> machine instruction.
+That operand must be an offset and an index register, such as <CODE>41(4)</CODE>.
+What <CODE>memsetup</CODE> returns is a list of the two numbers, in this case
+<CODE>[41 4]</CODE>.  Procedure <CODE>targetaddr</CODE> turns that into the right notation
+for use in the instruction.
+
+<P><CODE>Memsetup</CODE> is the most complicated procedure in the compiler, because
+there are so many special cases.  I'll describe the easy cases here.  Suppose
+that we are dealing with a scalar variable that isn't a <CODE>var</CODE> parameter.
+Then there are three cases.  If the lexical depth of that variable is equal
+to the current lexical depth, then this variable is declared in the same
+block that we're compiling.  In that case, we use register 4 (the current
+frame pointer) as the index register, and the variable's frame slot as the
+offset.  If the variable's lexical depth is zero, then it's a global
+variable.  In that case, we use register 3 (the global frame pointer) as
+the index register, and the variable's frame slot as the offset.  If the
+variable's depth is something other than zero or the current depth, then
+we have to find a pointer to the variable's own frame by looking in the
+current frame's <CODE>frame.outerframe</CODE> slot, and perhaps in <EM>that</EM>
+frame's <CODE>frame.outerframe</CODE> slot, as many times as the difference between
+the current depth and the variable's depth.
+
+<P>If the variable is a <CODE>var</CODE> parameter, then we go through the same cases
+just described, and then load the value of that variable (which is a pointer
+to the variable we really want) into a register.  We use that new register
+as the index register, and zero as the offset.
+
+<P>If the variable is an array member, then we must add the linear index
+(which is already in a register) to the offset as computed so far.
+
+<P>Perhaps an example will help sort this out.  Here is the compiled version
+of the <CODE>tower</CODE> program, with annotations:
+
+<P><PRE>[           [add 3 0 0]                   set up initial pointers
+            [add 4 0 0]
+            [addi 2 0 36]
+            [jump &quot;g1]                    jump to main program
+
+%hanoi      [store 1 0(4)]                save return value
+            [jump &quot;g2]                    jump to body of hanoi
+
+%movedisk   [store 1 0(4)]
+            [jump &quot;g3]
+
+g3          [putstr 1 [Move disk ]]       body of movedisk
+            [rload 7 36(4)]
+            [putint 1 7]                  write(number:1)
+            [putstr 1 [ from ]]
+            [rload 7 37(4)]
+            [putch 1 7]                   write(from:1)
+            [putstr 1 [ to ]]
+            [rload 7 38(4)]
+            [putch 1 7]                   write(to:1)
+            [newline]
+            [rload 1 0(4)]                reload return address
+            [add 2 4 0]                   free stack frame
+            [rload 4 3(2)]
+            [jr 1]                        return to caller
+
+g2          [rload 7 36(4)]               body of hanoi
+            [neqi 7 7 0]                  if number &lt;&gt; 0
+            [jumpf 7 &quot;g4]
+            [store 5 1(2)]                allocate new frame
+            [add 5 2 0]
+            [addi 2 2 40]
+            [store 4 3(5)]                set previous frame
+            [rload 7 2(4)]
+            [store 7 2(5)]                set enclosing frame
+            [rload 7 36(4)]
+            [subi 7 7 1]
+            [store 7 36(5)]               first arg is number-1
+            [rload 7 37(4)]
+            [store 7 37(5)]               next arg is from
+            [rload 7 39(4)]
+            [store 7 38(5)]               next arg is other
+            [rload 7 38(4)]
+            [store 7 39(5)]               next arg is onto
+            [add 4 5 0]                   switch to new frame
+            [rload 5 1(4)]
+            [jal 1 &quot;%hanoi]               recursive call
+
+            [store 5 1(2)]                set up for movedisk
+            [add 5 2 0]
+            [addi 2 2 39]
+            [store 4 3(5)]
+            [store 4 2(5)]                note different enclosing frame
+            [rload 7 36(4)]
+            [store 7 36(5)]               copy args
+            [rload 7 37(4)]
+            [store 7 37(5)]
+            [rload 7 38(4)]
+            [store 7 38(5)]
+            [add 4 5 0]
+            [rload 5 1(4)]
+            [jal 1 &quot;%movedisk]            call movedisk
+
+            [store 5 1(2)]                second recursive call
+            [add 5 2 0]
+            [addi 2 2 40]
+            [store 4 3(5)]
+            [rload 7 2(4)]
+            [store 7 2(5)]
+            [rload 7 36(4)]
+            [subi 7 7 1]
+            [store 7 36(5)]
+            [rload 7 39(4)]
+            [store 7 37(5)]
+            [rload 7 38(4)]
+            [store 7 38(5)]
+            [rload 7 37(4)]
+            [store 7 39(5)]
+            [add 4 5 0]
+            [rload 5 1(4)]
+            [jal 1 &quot;%hanoi]
+            [jump &quot;g5]                    end of if...then
+
+g4
+g5          [rload 1 0(4)]                return to caller
+            [add 2 4 0]
+            [rload 4 3(2)]
+            [jr 1]
+
+g1          [store 5 1(2)]                body of main program
+            [add 5 2 0]                   prepare to call hanoi
+            [addi 2 2 40]
+            [store 4 3(5)]
+            [store 4 2(5)]
+            [addi 7 0 5]                  constant argument 5
+            [store 7 36(5)]
+            [addi 7 0 97]                 ASCII code for 'a'
+            [store 7 37(5)]
+            [addi 7 0 98]                 ASCII code for 'b'
+            [store 7 38(5)]
+            [addi 7 0 99]                 ASCII code for 'c'
+            [store 7 39(5)]
+            [add 4 5 0]
+            [rload 5 1(4)]
+            [jal 1 &quot;%hanoi]               call hanoi
+            [exit]
+]
+</PRE>
+
+<P>
+<TABLE width="100%"><TR><TD><A HREF="../v3-toc2.html">(back to Table of Contents)</A>
+<TD align="right"><A HREF="https://people.eecs.berkeley.edu/~bh/v3ch4/v3ch4.html"><STRONG>BACK</STRONG></A>
+chapter thread <A HREF="../v3ch6/v3ch6.html"><STRONG>NEXT</STRONG></A>
+</TABLE>
+
+<P><H2>Program Listing</H2>
+
+<P><PRE>
+to compile :file
+if namep "peekchar [ern "peekchar]
+if namep "peektoken [ern "peektoken]
+if not namep "idlist [opsetup]
+if not emptyp :file [openread :file]
+setread :file
+ignore error
+catch "error [program]
+localmake "error error
+if not emptyp :error [print first butfirst :error]
+setread []
+if not emptyp :file [close :file]
+end
+
+;; Global setup
+
+to opsetup
+make "numregs 32
+make "memsize 3000
+pprop "|=| "binary [eql 2 [boolean []] 1]
+pprop "|<>| "binary [neq 2 [boolean []] 1]
+pprop "|<| "binary [less 2 [boolean []] 1]
+pprop "|>| "binary [gtr 2 [boolean []] 1]
+pprop "|<=| "binary [leq 2 [boolean []] 1]
+pprop "|>=| "binary [geq 2 [boolean []] 1]
+pprop "|+| "binary [add 2 [[] []] 2]
+pprop "|-| "binary [sub 2 [[] []] 2]
+pprop "or "binary [lor 2 [boolean boolean] 2]
+pprop "|*| "binary [mul 2 [[] []] 3]
+pprop "|/| "binary [quo 2 [real []] 3]
+pprop "div "binary [div 2 [integer integer] 3]
+pprop "mod "binary [rem 2 [integer integer] 3]
+pprop "and "binary [land 2 [boolean boolean] 3]
+pprop "|+| "unary [plus 1 [[] []] 4]
+pprop "|-| "unary [minus 1 [[] []] 4]
+pprop "not "unary [lnot 1 [boolean boolean] 4]
+make "idlist `[[trunc function int [1 ,[framesize.fun+1]]]
+               [round function round [1 ,[framesize.fun+1]]]
+               [random function random [1 ,[framesize.fun+1]]]]
+make "int [integer real]
+make "round [integer real]
+make "random [integer integer]
+end
+
+;; Block structure
+
+to program
+mustbe "program
+localmake "progname token
+ifbe "|(| [ignore commalist [id]  mustbe "|)|]
+mustbe "|;|
+localmake "lexical.depth 0
+localmake "namesused []
+localmake "needint "false
+localmake "needround "false
+localmake "needrandom "false
+localmake "idlist :idlist
+localmake "frame [0 0]
+localmake "id (list :progname "program (newlname :progname) :frame)
+push "idlist :id
+localmake "codeinto word "% :progname
+make :codeinto []
+localmake "framesize framesize.proc
+program1
+mustbe ".
+code [exit]
+foreach [int round random] "plibrary
+make :codeinto reverse thing :codeinto
+end
+
+to program1
+localmake "regsused (array :numregs 0)
+for [i reg.firstfree :numregs-1] [setitem :i :regsused "false]
+ifbe "var [varpart]
+.setfirst butfirst :frame :framesize
+if :lexical.depth = 0 [code (list "add reg.globalptr reg.zero reg.zero)
+                       code (list "add reg.frameptr reg.zero reg.zero)
+                       code (list "addi reg.stackptr reg.zero :framesize)]
+localmake "bodytag gensym
+code (list "jump (word "" :bodytag))
+tryprocpart
+code :bodytag
+mustbe "begin
+blockbody "end
+end
+
+to plibrary :func
+if not thing (word "need :func) [stop]
+code :func
+code (list "rload reg.firstfree (memaddr framesize.fun reg.frameptr))
+code (list (word "s :func) reg.retval reg.firstfree)
+code (list "add reg.stackptr reg.frameptr reg.zero)
+code (list "rload reg.frameptr (memaddr frame.prevframe reg.stackptr))
+code (list "jr reg.retaddr)
+end
+
+;; Variable declarations
+
+to varpart
+local [token namelist type]
+make "token token
+make "peektoken :token
+if reservedp :token [stop]
+vargroup
+foreach :namelist [newvar ? :type]
+mustbe "|;|
+varpart
+end
+
+to vargroup
+make "namelist commalist [id]
+mustbe ":
+ifbe "packed []
+make "type token
+ifelse equalp :type "array [make "type arraytype] [typecheck :type]
+end
+
+to id
+local "token
+make "token token
+if letterp ascii first :token [output :token]
+make "peektoken :token
+output []
+end
+
+to arraytype
+local [ranges type]
+mustbe "|[|
+make "ranges commalist [range]
+mustbe "|]|
+mustbe "of
+make "type token
+typecheck :type
+output list :type :ranges
+end
+
+to range
+local [first last]
+make "first range1
+mustbe "..
+make "last range1
+if :first > :last ~  
+   [(throw "error (sentence [array bounds not increasing:]
+                            :first ".. :last))]
+output list :first (1 + :last - :first)
+end
+
+to range1
+local "bound
+make "bound token
+if equalp first :bound "' [output ascii first butfirst :bound]
+if equalp :bound "|-| [make "bound minus token]
+if equalp :bound int :bound [output :bound]
+(throw "error sentence [array bound not ordinal:] :bound)
+end
+
+to typecheck :type
+if memberp :type [real integer char boolean] [stop]
+(throw "error sentence [undefined type] :type)
+end
+
+to newvar :pname :type
+if reservedp :pname [(throw "error sentence :pname [reserved word])]
+push "idlist (list :pname :type (list :lexical.depth :framesize) "false)
+make "framesize :framesize + ifelse listp :type [arraysize :type] [1]
+end
+
+to arraysize :type
+output reduce "product map [last ?] last :type
+end
+
+;; Procedure and function declarations
+
+to tryprocpart
+ifbeelse "procedure ~
+         [procedure tryprocpart] ~
+         [ifbe "function [function tryprocpart]]
+end
+
+to procedure
+proc1 "procedure framesize.proc
+end
+
+to function
+proc1 "function framesize.fun
+end
+
+to proc1 :proctype :framesize
+localmake "procname token
+localmake "lexical.depth :lexical.depth+1
+localmake "frame (list :lexical.depth 0)
+push "idlist (list :procname :proctype (newlname :procname) :frame)
+localmake "idlist :idlist
+make lname :procname []
+ifbe "|(| [arglist]
+if equalp :proctype "function ~
+   [mustbe ":
+    localmake "type token
+    typecheck :type
+    make lname :procname fput :type thing lname :procname]
+mustbe "|;|
+code lname :procname
+code (list "store reg.retaddr (memaddr frame.retaddr reg.frameptr))
+program1
+if equalp :proctype "function ~
+   [code (list "rload reg.retval (memaddr frame.retval reg.frameptr))]
+code (list "rload reg.retaddr (memaddr frame.retaddr reg.frameptr))
+code (list "add reg.stackptr reg.frameptr reg.zero)
+code (list "rload reg.frameptr (memaddr frame.prevframe reg.stackptr))
+code (list "jr reg.retaddr)
+mustbe "|;|
+end
+
+to arglist
+local [token namelist type varflag]
+make "varflag "false
+ifbe "var [make "varflag "true]
+vargroup
+foreach :namelist [newarg ? :type :varflag]
+ifbeelse "|;| [arglist] [mustbe "|)|]
+end
+
+to newarg :pname :type :varflag
+if reservedp :pname [(throw "error sentence :pname [reserved word])]
+localmake "pointer (list :lexical.depth :framesize)
+push "idlist (list :pname :type :pointer :varflag)
+make "framesize :framesize + ifelse (and listp :type not :varflag) ~
+                                    [arraysize :type] [1]
+queue lname :procname ifelse :varflag [list "var :type] [:type]
+end
+
+;; Statement part
+
+to blockbody :endword
+statement
+ifbeelse "|;| [blockbody :endword] [mustbe :endword]
+end
+
+to statement
+local [token type]
+ifbe "begin [compound stop]
+ifbe "for [pfor stop]
+ifbe "if [pif stop]
+ifbe "while [pwhile stop]
+ifbe "repeat [prepeat stop]
+ifbe "write [pwrite stop]
+ifbe "writeln [pwriteln stop]
+make "token token
+make "peektoken :token
+if memberp :token [|;| end until] [stop]
+make "type gettype :token
+if emptyp :type [(throw "error sentence :token [can't begin statement])]
+if equalp :type "procedure [pproccall stop]
+if equalp :type "function [pfunset stop]
+passign
+end
+
+;; Compound statement
+
+to compound
+blockbody "end
+end
+
+;; Structured statements
+
+to pfor
+local [var init step final looptag endtag testreg]
+make "var token
+mustbe "|:=|
+make "init pinteger pexpr
+make "step 1
+ifbeelse "downto [make "step -1] [mustbe "to]
+make "final pinteger pexpr
+mustbe "do
+make "looptag gensym
+make "endtag gensym
+code :looptag
+localmake "id getid :var
+codestore :init (id.pointer :id) (id.varp :id) 0
+make "testreg newregister
+code (list (ifelse :step<0 ["less] ["gtr]) :testreg :init :final)
+code (list "jumpt :testreg (word "" :endtag))
+regfree :testreg
+statement
+code (list "addi :init :init :step)
+code (list "jump (word "" :looptag))
+code :endtag
+regfree :init
+regfree :final
+end
+
+to prepeat
+local [cond looptag]
+make "looptag gensym
+code :looptag
+blockbody "until
+make "cond pboolean pexpr
+code (list "jumpf :cond (word "" :looptag))
+regfree :cond
+end
+
+to pif
+local [cond elsetag endtag]
+make "cond pboolean pexpr
+make "elsetag gensym
+make "endtag gensym
+mustbe "then
+code (list "jumpf :cond (word "" :elsetag))
+regfree :cond
+statement
+code (list "jump (word "" :endtag))
+code :elsetag
+ifbe "else [statement]
+code :endtag
+end
+
+to pwhile
+local [cond looptag endtag]
+make "looptag gensym
+make "endtag gensym
+code :looptag
+make "cond pboolean pexpr
+code (list "jumpf :cond (word "" :endtag))
+regfree :cond
+mustbe "do
+statement
+code (list "jump (word "" :looptag))
+code :endtag
+end
+
+;; Simple statements: write and writeln
+
+to pwrite
+mustbe "|(|
+pwrite1
+end
+
+to pwrite1
+pwrite2
+ifbe "|)| [stop]
+ifbeelse ", [pwrite1] [(throw "error [missing comma])]
+end
+
+to pwrite2
+localmake "result pwrite3
+ifbe ": [.setfirst (butfirst :result) token]
+code :result
+if not equalp first :result "putstr [regfree last :result]
+end
+
+to pwrite3
+localmake "token token
+if equalp first :token "' ~
+   [output (list "putstr 1 (list butlast butfirst :token))]
+make "peektoken :token
+localmake "result pexpr
+if equalp first :result "char [output (list "putch 1 pchar :result)]
+if equalp first :result "boolean [output (list "puttf 1 pboolean :result)]
+if equalp first :result "integer [output (list "putint 10 pinteger :result)]
+output (list "putreal 20 preal :result)
+end
+
+to pwriteln
+ifbe "|(| [pwrite1]
+code [newline]
+end
+
+;; Simple statements: procedure call
+
+to pproccall
+localmake "pname token
+localmake "id getid :pname
+localmake "lname id.lname :id
+localmake "vartypes thing :lname
+pproccall1 framesize.proc
+end
+
+to pproccall1 :offset
+code (list "store reg.newfp (memaddr frame.save.newfp reg.stackptr))
+code (list "add reg.newfp reg.stackptr reg.zero)
+code (list "addi reg.stackptr reg.stackptr (last id.frame :id))
+code (list "store reg.frameptr (memaddr frame.prevframe reg.newfp))
+localmake "newdepth first id.frame :id
+ifelse :newdepth > :lexical.depth ~
+       [code (list "store reg.frameptr
+                   (memaddr frame.outerframe reg.newfp))] ~
+       [localmake "tempreg newregister
+        code (list "rload :tempreg (memaddr frame.outerframe reg.frameptr))
+        repeat (:lexical.depth - :newdepth)
+               [code (list "rload :tempreg 
+                           (memaddr frame.outerframe :tempreg))]
+        code (list "store :tempreg (memaddr frame.outerframe reg.newfp))
+        regfree :tempreg]
+if not emptyp :vartypes [mustbe "|(|  procargs :vartypes :offset]
+for [i reg.firstfree :numregs-1] ~
+    [if item :i :regsused
+        [code (list "store :i (memaddr frame.regsave+:i reg.frameptr))]]
+code (list "add reg.frameptr reg.newfp reg.zero)
+code (list "rload reg.newfp (memaddr frame.save.newfp reg.frameptr))
+code (list "jal reg.retaddr (word "" :lname))
+for [i reg.firstfree :numregs-1] ~
+    [if item :i :regsused
+        [code (list "rload :i (memaddr frame.regsave+:i reg.frameptr))]]
+end
+
+to procargs :types :offset
+if emptyp :types [mustbe "|)|  stop]
+localmake "next procarg first :types :offset
+if not emptyp butfirst :types [mustbe ",]
+procargs butfirst :types :offset+:next
+end
+
+to procarg :type :offset
+local "result
+if equalp first :type "var [output procvararg last :type]
+if listp :type [output procarrayarg :type]
+make "result check.type :type pexpr
+code (list "store :result (memaddr :offset reg.newfp))
+regfree :result
+output 1
+end
+
+to procvararg :ftype
+local [pname id type index]
+make "pname token
+make "id getid :pname
+make "type id.type :id
+ifelse wordp :ftype ~
+       [setindex "true] ~
+       [make "index 0]
+if not equalp :type :ftype ~
+   [(throw "error sentence :pname [arg wrong type])]
+localmake "target memsetup (id.pointer :id) (id.varp :id) :index
+localmake "tempreg newregister
+code (list "addi :tempreg (last :target) (first :target))
+code (list "store :tempreg (memaddr :offset reg.newfp))
+regfree last :target
+regfree :tempreg
+output 1
+end
+
+to procarrayarg :type
+localmake "pname token
+localmake "id getid :pname
+if not equalp :type (id.type :id) ~
+   [(throw "error (sentence "array :pname [wrong type for arg]))]
+localmake "size arraysize :type
+localmake "rtarget memsetup (id.pointer :id) (id.varp :id) 0
+localmake "pointreg newregister
+code (list "addi :pointreg reg.newfp :offset)
+localmake "ltarget (list 0 :pointreg)
+copyarray
+output :size
+end
+
+;; Simple statements: assignment statement (including function value)
+
+to passign
+local [name id type index value pointer target]
+make "name token
+make "index []
+ifbe "|[| [make "index commalist [pexpr] mustbe "|]|]
+mustbe "|:=|
+make "id getid :name
+make "pointer id.pointer :id
+make "type id.type :id
+passign1
+end
+
+to pfunset
+local [name id type index value pointer target]
+make "name token
+make "index []
+if not equalp :name :procname ~
+   [(throw "error sentence [assign to wrong function] :name)]
+mustbe "|:=|
+make "pointer (list :lexical.depth frame.retval)
+make "type first thing lname :name
+make "id (list :name :type :pointer "false)
+passign1
+end
+
+to passign1
+if and (listp :type) (emptyp :index) [parrayassign :id  stop]
+setindex "false
+make "value check.type :type pexpr
+codestore :value (id.pointer :id) (id.varp :id) :index
+regfree :value
+end
+
+to noimmediate :value
+if equalp exp.mode :value "immediate ~
+   [localmake "reg newregister
+    code (list "addi :reg reg.zero exp.value :value)
+    output (list exp.type :value "register :reg)]
+output :value
+end
+
+to check.type :type :result
+if equalp :type "real [output preal :result]
+if equalp :type "integer [output pinteger :result]
+if equalp :type "char [output pchar :result]
+if equalp :type "boolean [output pboolean :result]
+end
+
+to preal :expr [:pval noimmediate :expr]
+if equalp exp.type :pval "real [output exp.value :pval]
+output pinteger :pval
+end
+
+to pinteger :expr [:pval noimmediate :expr]
+local "type
+make "type exp.type :pval
+if memberp :type [integer boolean char] [output exp.value :pval]
+(throw "error sentence exp.type :pval [isn't ordinal])
+end
+
+to pchar :expr [:pval noimmediate :expr]
+if equalp exp.type :pval "char [output exp.value :pval]
+(throw "error sentence exp.type :pval [not character value])
+end
+
+to pboolean :expr [:pval noimmediate :expr]
+if equalp exp.type :pval "boolean [output exp.value :pval]
+(throw "error sentence exp.type :pval [not true or false])
+end
+
+to parrayassign :id
+localmake "right token
+if equalp first :right "' ~
+   [pstringassign :type (butlast butfirst :right)  stop]
+localmake "rid getid :right
+if not equalp (id.type :id) (id.type :rid) ~
+   [(throw "error (sentence "arrays :name "and :right [unequal types]))]
+localmake "size arraysize id.type :id
+localmake "ltarget memsetup (id.pointer :id) (id.varp :id) 0
+localmake "rtarget memsetup (id.pointer :rid) (id.varp :rid) 0
+copyarray
+end
+
+to pstringassign :type :string
+if not equalp first :type "char [stringlose]
+if not emptyp butfirst last :type [stringlose]
+if not equalp (last first last :type) (count :string) [stringlose]
+localmake "ltarget memsetup (id.pointer :id) (id.varp :id) 0
+pstringassign1 newregister (first :ltarget) (last :ltarget) :string
+regfree last :ltarget
+end
+
+to pstringassign1 :tempreg :offset :reg :string
+if emptyp :string [regfree :tempreg  stop]
+code (list "addi :tempreg reg.zero ascii first :string)
+code (list "store :tempreg (memaddr :offset :reg))
+pstringassign1 :tempreg :offset+1 :reg (butfirst :string)
+end
+
+to stringlose
+(throw "error sentence :name [not string array or wrong size])
+end
+
+;; Multiple array indices to linear index computation
+
+to setindex :parseflag
+ifelse listp :type ~
+       [if :parseflag
+           [mustbe "|[|  make "index commalist [pexpr]  mustbe "|]| ]
+        make "index lindex last :type :index
+        make "type first :type] ~
+       [make "index 0]
+end
+
+to lindex :bounds :index
+output lindex1 (offset pinteger noimmediate first :index
+                       first first :bounds) ~
+               butfirst :bounds butfirst :index
+end
+
+to lindex1 :sofar :bounds :index
+if emptyp :bounds [output :sofar]
+output lindex1 (nextindex :sofar
+                          last first :bounds
+                          pinteger noimmediate first :index
+                          first first :bounds) ~
+               butfirst :bounds butfirst :index
+end
+
+to nextindex :old :factor :new :offset
+code (list "muli :old :old :factor)
+localmake "newreg offset :new :offset
+code (list "add :old :old :newreg)
+regfree :newreg
+output :old
+end
+
+to offset :indexreg :lowbound
+if not equalp :lowbound 0 [code (list "subi :indexreg :indexreg :lowbound)]
+output :indexreg
+end
+
+;; Memory interface: load and store instructions
+
+to codeload :reg :pointer :varflag :index
+localmake "target memsetup :pointer :varflag :index
+code (list "rload :reg targetaddr)
+regfree last :target
+end
+
+to codestore :reg :pointer :varflag :index
+localmake "target memsetup :pointer :varflag :index
+code (list "store :reg targetaddr)
+regfree last :target
+end
+
+to targetaddr
+output memaddr (first :target) (last :target)
+end
+
+to memaddr :offset :index
+output (word :offset "\( :index "\))
+end
+
+to memsetup :pointer :varflag :index
+localmake "depth first :pointer
+localmake "offset last :pointer
+local "newreg
+ifelse equalp :depth 0 ~
+       [make "newreg reg.globalptr] ~
+       [ifelse equalp :depth :lexical.depth
+               [make "newreg reg.frameptr]
+               [make "newreg newregister
+                code (list "rload :newreg
+                           (memaddr frame.outerframe reg.frameptr))
+                repeat (:lexical.depth - :depth) - 1
+                       [code (list "rload :newreg
+                                   (memaddr frame.outerframe :newreg))]]]
+if :varflag ~
+   [ifelse :newreg = reg.frameptr
+           [make "newreg newregister
+            code (list "rload :newreg (memaddr :offset reg.frameptr))]
+           [code (list "rload :newreg (memaddr :offset :newreg))]
+    make "offset 0]
+if not equalp :index 0 ~
+   [code (list "add :index :index :newreg)
+    regfree :newreg
+    make "newreg :index]
+output list :offset :newreg
+end
+
+to copyarray
+localmake "looptag gensym
+localmake "sizereg newregister
+code (list "addi :sizereg reg.zero :size)
+code :looptag
+localmake "tempreg newregister
+code (list "rload :tempreg (memaddr (first :rtarget) (last :rtarget)))
+code (list "store :tempreg (memaddr (first :ltarget) (last :ltarget)))
+code (list "addi (last :rtarget) (last :rtarget) 1)
+code (list "addi (last :ltarget) (last :ltarget) 1)
+code (list "subi :sizereg :sizereg 1)
+code (list "gtr :tempreg :sizereg reg.zero)
+code (list "jumpt :tempreg (word "" :looptag))
+regfree :sizereg
+regfree :tempreg
+regfree last :ltarget
+regfree last :rtarget
+end
+
+;; Expressions
+
+to pexpr
+local [opstack datastack parenlevel]
+make "opstack [[popen 1 0]]
+make "datastack []
+make "parenlevel 0
+output pexpr1
+end
+
+to pexpr1
+local [token op]
+make "token token
+while [equalp :token "|(|] [popen  make "token token]
+make "op pgetunary :token
+if not emptyp :op [output pexprop :op]
+push "datastack pdata :token
+make "token token
+while [and (:parenlevel > 0) (equalp :token "|)| )] ~
+      [pclose  make "token token]
+make "op pgetbinary :token
+if not emptyp :op [output pexprop :op]
+make "peektoken :token
+pclose
+if not emptyp :opstack [(throw "error [too many operators])]
+if not emptyp butfirst :datastack [(throw "error [too many operands])]
+output pop "datastack
+end
+
+to pexprop :op
+while [(op.prec :op) < (1 + op.prec first :opstack)] [ppopop]
+push "opstack :op
+output pexpr1
+end
+
+to popen
+push "opstack [popen 1 0]
+make "parenlevel :parenlevel + 1
+end
+
+to pclose
+while [(op.prec first :opstack) > 0] [ppopop]
+ignore pop "opstack
+make "parenlevel :parenlevel - 1
+end
+
+to pgetunary :token
+output gprop :token "unary
+end
+
+to pgetbinary :token
+output gprop :token "binary
+end
+
+to ppopop
+local [op function args left right type reg]
+make "op pop "opstack
+make "function op.instr :op
+if equalp :function "plus [stop]
+make "args op.nargs :op
+make "right pop "datastack
+make "left (ifelse equalp :args 2 [pop "datastack] [[[] []]])
+make "type pnewtype :op exp.type :left exp.type :right
+if equalp exp.mode :left "immediate ~
+   [localmake "leftreg newregister
+    code (list "addi :leftreg reg.zero exp.value :left)
+    make "left (list exp.type :left "register :leftreg)]
+ifelse equalp exp.mode :left "register ~
+       [make "reg exp.value :left] ~
+       [ifelse equalp exp.mode :right "register
+               [make "reg exp.value :right]
+               [make "reg newregister]]
+if equalp :function "minus ~
+   [make "left (list exp.type :right "register reg.zero)
+    make "function "sub
+    make "args 2]
+if equalp exp.mode :right "immediate ~
+   [make "function word :function "i]
+ifelse equalp :args 2 ~
+       [code (list :function :reg exp.value :left exp.value :right)] ~
+       [code (list :function :reg exp.value :right)]
+if not equalp :reg exp.value :left [regfree exp.value :left]
+if (and (equalp exp.mode :right "register)
+        (not equalp :reg exp.value :right)) ~
+   [regfree exp.value :right]
+push "datastack (list :type "register :reg)
+end
+
+to pnewtype :op :ltype :rtype
+local "type
+make "type op.types :op
+if emptyp :ltype [make "ltype :rtype]
+if not emptyp last :type [pchecktype last :type :ltype :rtype]
+if and (equalp :ltype "real) (equalp :rtype "integer) [make "rtype "real]
+if and (equalp :ltype "integer) (equalp :rtype "real) [make "ltype "real]
+if not equalp :ltype :rtype [(throw "error [type clash])]
+if emptyp last :type ~
+   [if not memberp :rtype [integer real]
+       [(throw "error [nonarithmetic type])]]
+if emptyp first :type [output :rtype]
+output first :type
+end
+
+to pchecktype :want :left :right
+if not equalp :want :left [(throw "error (sentence :left "isn't :want))]
+if not equalp :want :right [(throw "error (sentence :right "isn't :want))]
+end
+
+;; Expression elements
+
+to pdata :token
+if equalp :token "true [output [boolean immediate 1]]
+if equalp :token "false [output [boolean immediate 0]]
+if equalp first :token "' [output pchardata :token]
+if numberp :token [output (list numtype :token "immediate :token)]
+localmake "id getid :token
+if emptyp :id [(throw "error sentence [undefined symbol] :token)]
+localmake "type id.type :id
+if equalp :type "function [output pfuncall :token]
+local "index
+setindex "true
+localmake "reg newregister
+codeload :reg (id.pointer :id) (id.varp :id) :index
+output (list :type "register :reg)
+end
+
+to pchardata :token
+if not equalp count :token 3 ~
+   [(throw "error sentence :token [not single character])]
+output (list "char "immediate ascii first butfirst :token)
+end
+
+to numtype :number
+if memberp ". :number [output "real]
+if memberp "e :number [output "real]
+output "integer
+end
+
+to pfuncall :pname
+localmake "id getid :pname
+localmake "lname id.lname :id
+if namep (word "need :lname) [make (word "need :lname) "true]
+localmake "vartypes thing :lname
+localmake "returntype first :vartypes
+make "vartypes butfirst :vartypes
+pproccall1 framesize.fun
+localmake "reg newregister
+code (list "add :reg reg.retval reg.zero)
+output (list :returntype "register :reg)
+end
+
+;; Parsing assistance
+
+to code :stuff
+if emptyp :stuff [stop]
+push :codeinto :stuff
+end
+
+to commalist :test [:sofar []]
+local [result token]
+make "result run :test
+if emptyp :result [output :sofar]
+ifbe ", [output (commalist :test (lput :result :sofar))]
+output lput :result :sofar
+end
+
+.macro ifbe :wanted :action
+localmake "token token
+if equalp :token :wanted [output :action]
+make "peektoken :token
+output []
+end
+
+.macro ifbeelse :wanted :action :else
+localmake "token token
+if equalp :token :wanted [output :action]
+make "peektoken :token
+output :else
+end
+
+to mustbe :wanted
+localmake "token token
+if equalp :token :wanted [stop]
+(throw "error (sentence "expected :wanted "got :token))
+end
+
+to newregister
+for [i reg.firstfree :numregs-1] ~
+    [if not item :i :regsused [setitem :i :regsused "true  output :i]]
+(throw "error [not enough registers available])
+end
+
+to regfree :reg
+setitem :reg :regsused "false
+end
+
+to reservedp :word
+output memberp :word [and array begin case const div do downto else end ~
+                      file for forward function goto if in label mod nil ~
+                      not of packed procedure program record repeat set ~
+                      then to type until var while with]
+end
+
+;; Lexical analysis
+
+to token
+local [token char]
+if namep "peektoken [make "token :peektoken
+                     ern "peektoken   output :token]
+make "char getchar
+if equalp :char "|{| [skipcomment output token]
+if equalp :char char 32 [output token]
+if equalp :char char 13 [output token]
+if equalp :char char 10 [output token]
+if equalp :char "' [output string "']
+if memberp :char [+ - * / = ( , ) |[| |]| |;|] [output :char]
+if equalp :char "|<| [output twochar "|<| [= >]]
+if equalp :char "|>| [output twochar "|>| [=]]
+if equalp :char ". [output twochar ". [.]]
+if equalp :char ": [output twochar ": [=]]
+if numberp :char [output number :char]
+if letterp ascii :char [output token1 lowercase :char]
+(throw "error sentence [unrecognized character:] :char)
+end
+
+to skipcomment
+if equalp getchar "|}| [stop]
+skipcomment
+end
+
+to string :string
+local "char
+make "char getchar
+if not equalp :char "' [output string word :string :char]
+make "char getchar
+if equalp :char "' [output string word :string :char]
+make "peekchar :char
+output word :string "'
+end
+
+to twochar :old :ok
+localmake "char getchar
+if memberp :char :ok [output word :old :char]
+make "peekchar :char
+output :old
+end
+
+to number :num
+local "char
+make "char getchar
+if equalp :char ". ~
+   [make "char getchar ~
+    ifelse equalp :char ". ~
+           [make "peektoken ".. output :num] ~
+           [make "peekchar :char output number word :num ".]]
+if equalp :char "e [output number word :num twochar "e [+ -]]
+if numberp :char [output number word :num :char]
+make "peekchar :char
+output :num
+end
+
+to token1 :token
+local "char
+make "char getchar
+if or letterp ascii :char numberp :char ~
+   [output token1 word :token lowercase :char]
+make "peekchar :char
+output :token
+end
+
+to letterp :code
+if and (:code > 64) (:code < 91) [output "true]
+output and (:code > 96) (:code < 123)
+end
+
+to getchar
+local "char
+if namep "peekchar [make "char :peekchar  ern "peekchar  output :char]
+if eofp [output char 1]
+output rc1
+end
+
+to rc1
+local "result
+make "result readchar
+type :result
+output :result
+end
+
+;; Data abstraction: ID List
+
+to newlname :word
+if memberp :word :namesused [output gensym]
+if namep word "% :word [output gensym]
+push "namesused :word
+output word "% :word
+end
+
+to lname :word
+local "result
+make "result getid :word
+if not emptyp :result [output item 3 :result]
+(throw "error sentence [unrecognized identifier] :word)
+end
+
+to gettype :word
+local "result
+make "result getid :word
+if not emptyp :result [output item 2 :result]
+(throw "error sentence [unrecognized identifier] :word)
+end
+
+to getid :word [:list :idlist]
+if emptyp :list [output []]
+if equalp :word first first :list [output first :list]
+output (getid :word butfirst :list)
+end
+
+to id.type :id
+output item 2 :id
+end
+
+to id.pointer :id
+output item 3 :id
+end
+
+to id.lname :id
+output item 3 :id
+end
+
+to id.varp :id
+output item 4 :id
+end
+
+to id.frame :id
+output item 4 :id
+end
+
+;; Data abstraction: Frame slots
+
+to frame.retaddr
+output 0
+end
+
+to frame.save.newfp
+output 1
+end
+
+to frame.outerframe
+output 2
+end
+
+to frame.prevframe
+output 3
+end
+
+to frame.regsave
+output 4
+end
+
+to framesize.proc
+output 4+:numregs
+end
+
+to frame.retval
+output 4+:numregs
+end
+
+to framesize.fun
+output 5+:numregs
+end
+
+;; Data abstraction: Operators
+
+to op.instr :op
+output first :op
+end
+
+to op.nargs :op
+output first bf :op
+end
+
+to op.types :op
+output item 3 :op
+end
+
+to op.prec :op
+output last :op
+end
+
+;; Data abstraction: Expressions
+
+to exp.type :exp
+output first :exp
+end
+
+to exp.mode :exp
+output first butfirst :exp
+end
+
+to exp.value :exp
+output last :exp
+end
+
+;; Data abstraction: Registers
+
+to reg.zero
+output 0
+end
+
+to reg.retaddr
+output 1
+end
+
+to reg.stackptr
+output 2
+end
+
+to reg.globalptr
+output 3
+end
+
+to reg.frameptr
+output 4
+end
+
+to reg.newfp
+output 5
+end
+
+to reg.retval
+output 6
+end
+
+to reg.firstfree
+output 7
+end
+
+;; Runtime (machine simulation)
+
+to prun :progname
+localmake "prog thing word "% :progname
+localmake "regs (array :numregs 0)
+local filter "wordp :prog
+foreach :prog [if wordp ? [make ? ?rest]]
+localmake "memory (array :memsize 0)
+setitem 0 :regs 0
+if not procedurep "add [runsetup]
+prun1 :prog
+end
+
+to prun1 :pc
+if emptyp :pc [stop]
+if listp first :pc [run first :pc]
+prun1 butfirst :pc
+end
+
+to rload :reg :offset :index
+setitem :reg :regs (item (item :index :regs)+:offset :memory)
+end
+
+to store :reg :offset :index
+setitem (item :index :regs)+:offset :memory (item :reg :regs)
+end
+
+to runsetup
+foreach [[add sum] [sub difference] [mul product] [quo quotient]
+         [div [int quotient]] [rem remainder] [land product]
+         [lor [tobool lessp 0 sum]] [eql [tobool equalp]]
+         [neq [tobool not equalp]] [less [tobool lessp]]
+         [gtr [tobool greaterp]] [leq [tobool not greaterp]]
+         [geq [tobool not lessp]]] ~
+        [define first ?
+                `[[dest src1 src2]
+                  [setitem :dest :regs ,@[last ?] (item :src1 :regs)
+                                                  (item :src2 :regs)]]
+         define word first ? "i
+                `[[dest src1 immed]
+                  [setitem :dest :regs ,@[last ?] (item :src1 :regs)
+                                                  :immed]]]
+foreach [[lnot [difference 1]] [sint int] [sround round] [srandom random]] ~
+        [define first ?
+                `[[dest src]
+                  [setitem :dest :regs ,@[last ?] (item :src :regs)]]
+         define word first ? "i
+                `[[dest immed]
+                  [setitem :dest :regs ,@[last ?] :immed]]]
+end
+
+to tobool :tf
+output ifelse :tf [1] [0]
+end
+
+to jump :label
+make "pc fput :label thing :label
+end
+
+to jumpt :reg :label
+if (item :reg :regs)=1 [jump :label]
+end
+
+to jumpf :reg :label
+if (item :reg :regs)=0 [jump :label]
+end
+
+to jr :reg
+make "pc item :reg :regs
+end
+
+to jal :reg :label
+setitem :reg :regs :pc
+jump :label
+end
+
+to putch :width :reg
+spaces :width 1
+type char (item :reg :regs)
+end
+
+to putstr :width :string
+spaces :width (count first :string)
+type :string
+end
+
+to puttf :width :bool
+spaces :width 1
+type ifelse (item :bool :regs)=0 ["F] ["T]
+end
+
+to putint :width :reg
+localmake "num (item :reg :regs)
+spaces :width count :num
+type :num
+end
+
+to putreal :width :reg
+putint :width :reg
+end
+
+to spaces :width :count
+if :width > :count [repeat :width - :count [type "| |]]
+end
+
+to newline
+print []
+end
+
+to exit
+make "pc [exit]
+end
+</PRE>
+
+<P><A HREF="../v3-toc2.html">(back to Table of Contents)</A>
+<P><A HREF="https://people.eecs.berkeley.edu/~bh/v3ch4/v3ch4.html"><STRONG>BACK</STRONG></A>
+chapter thread <A HREF="../v3ch6/v3ch6.html"><STRONG>NEXT</STRONG></A>
+
+<P>
+<ADDRESS>
+<A HREF="../index.html">Brian Harvey</A>, 
+<CODE>bh@cs.berkeley.edu</CODE>
+</ADDRESS>
+</BODY>
+</HTML>