diff options
author | elioat <elioat@tilde.institute> | 2023-08-23 07:52:19 -0400 |
---|---|---|
committer | elioat <elioat@tilde.institute> | 2023-08-23 07:52:19 -0400 |
commit | 562a9a52d599d9a05f871404050968a5fd282640 (patch) | |
tree | 7d3305c1252c043bfe246ccc7deff0056aa6b5ab /js/games/nluqo.github.io/~bh/v3ch4/langd.html | |
parent | 5d012c6c011a9dedf7d0a098e456206244eb5a0f (diff) | |
download | tour-562a9a52d599d9a05f871404050968a5fd282640.tar.gz |
*
Diffstat (limited to 'js/games/nluqo.github.io/~bh/v3ch4/langd.html')
-rw-r--r-- | js/games/nluqo.github.io/~bh/v3ch4/langd.html | 2310 |
1 files changed, 2310 insertions, 0 deletions
diff --git a/js/games/nluqo.github.io/~bh/v3ch4/langd.html b/js/games/nluqo.github.io/~bh/v3ch4/langd.html new file mode 100644 index 0000000..f3f01cc --- /dev/null +++ b/js/games/nluqo.github.io/~bh/v3ch4/langd.html @@ -0,0 +1,2310 @@ +<HTML> +<HEAD> +<TITLE>Computer Science Logo Style vol 3 ch 4: Programming Language Design</TITLE> +</HEAD> +<BODY> +<CITE>Computer Science Logo Style</CITE> volume 3: +<CITE>Beyond Programming</CITE> 2/e Copyright (C) 1997 MIT +<H1>Programming Language Design</H1> + +<TABLE width="100%"><TR><TD> +<IMG SRC="../csls3.jpg" ALT="cover photo"> +<TD><TABLE> +<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian +Harvey</A><BR>University of California, Berkeley</CITE> +<TR><TD align="right"><BR> +<TR><TD align="right"><A HREF="../pdf/v3ch04.pdf">Download PDF version</A> +<TR><TD align="right"><A HREF="../v3-toc2.html">Back to Table of Contents</A> +<TR><TD align="right"><A HREF="../v3ch3/v3ch3.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../v3ch5/v3ch5.html"><STRONG>NEXT</STRONG></A> +<TR><TD align="right"><A HREF="https://mitpress.mit.edu/books/computer-science-logo-style-second-edition-volume-3">MIT +Press web page for <CITE>Computer Science Logo Style</CITE></A> +</TABLE></TABLE> + +<HR><P>Program file for this chapter: <A HREF="pascal.lg"><CODE>pascal</CODE></A> + +<P>This chapter and the next are about two related things: why different +programming languages are different and how a programming language is +implemented. To make the discussion concrete, I've chosen a specific +language as an example: Pascal. That choice seems appropriate partly +because Pascal is very different from Logo and partly because it is widely +used as a vehicle for teaching introductory computer science, the same task +I'm attempting in this book using Logo.<SUP>*</SUP> + +<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>The recent trend in computer +science education has been a shift from Pascal to C or C++. I haven't +followed that trend in this book because from my perspective C illuminates +no new issues, it has a more complicated syntax, and it leaves out one +interesting Pascal feature: nested procedure definitions (block structure). +C++ does introduce the issue of object-oriented programming, but, I think, +not in a way that clarifies the issues; if you want to understand OOP you'd +do better to learn Object Logo.</SMALL></BLOCKQUOTE></SMALL><P>For the purposes of this book I've written a program that translates a <EM> +small subset</EM> of Pascal into a simulated machine language. You can get a +real Pascal compiler for your computer that accepts the full language, and +that's what you should do if you want to learn how to program in Pascal. I +had two reasons for writing this subset compiler. One is that some readers +may not otherwise have access to a Pascal compiler, and mine, despite its +limitations, will enable you to explore the parts of the language I'm going +to be talking about. The other is that the next chapter is about how a +compiler works, and this compiler is accessible to examination because it's +written in Logo. + +<P>When you're comparing two programming languages an obvious question to ask +is "which is better?" Please don't use my partial Pascal compiler as the +basis for an answer to that question; it wouldn't be fair. You already know +my opinion, but my purpose in this chapter is not to convince you of it. +Instead I want you to understand <EM>why</EM> each language is designed the +way it is. For each of the language differences we'll examine, there are +good reasons for either choice; the reasons that influence a language +designer will depend on the overall goals he or she has for this language. + +<P><H2>Programming paradigms</H2> + +<P>Perhaps the most important aspect of the design of a programming language +is the <EM>programming paradigm</EM> that it encourages. A paradigm +(it's pronounced "para" as in "parakeet" followed by "dime" as in ten +cents) is an approach to organizing a complex program: How do you combine +the primitives of a language to accomplish harder tasks? It's an aspect of +programming style, but when people say "style" they're usually thinking of +smaller details, such as the use of comments in procedure definitions or +choosing sensible variable names. Perhaps an example of different +paradigms will help. + +<P>Here's how the factorial function is usually computed in Logo, using a +recursive operation: + +<P><PRE>to fact :n +if :n=0 [output 1] +output :n * fact :n-1 +end +</PRE> + +<P>The goal is to multiply several numbers together, the integers +from 1 to <CODE>:n</CODE>. We do this by carrying out one multiplication in each +recursive invocation. This procedure is written in the +<EM>functional programming</EM> paradigm; the main tool for building +up complexity is composition of functions. In this example, the +result of the recursive <CODE>fact</CODE> invocation is composed with the primitive +<CODE>*</CODE> (multiplication) function. + +<P>Now consider this alternate approach: + +<P><PRE>to fact.seq :n +localmake "product 1 +for [i 1 :n] [make "product (:product * :i)] +output :product +end +</PRE> + +<P>This is an example of the <EM>sequential programming</EM> paradigm, +so called because the <CODE>for</CODE> instruction carries out a sequence of steps: + +<P><P> + +<TABLE><TR><TH align="right" valign="top">•<TD> <TD valign="top">Multiply the accumulated product by 1. +<TR><TH align="right" valign="top">•<TD> <TD valign="top">Multiply the product by 2. +<TR><TH align="right" valign="top">•<TD> <TD valign="top">Multiply it by 3. + +<P></TABLE> + +<P>... and so on. Instead of a composition of functions, we have +a partial result stored in a box, the variable <CODE>product</CODE>. At each +step, the old value is replaced with an updated value. + +<P>Although <CODE>fact.seq</CODE> can be written in Logo, it's not the most natural +Logo style. Most versions of Logo don't even provide <CODE>for</CODE> as a +primitive command, although (as we saw in Volume 2) it can be written in +Logo.<SUP>*</SUP> As we've seen, Logo encourages the +functional programming paradigm, in which complicated +computations are achieved by means of function composition and recursion. +Logo encourages functional programming partly through +its emphasis on recursion rather than on iterative control structures, +and partly because lists are used as the main data aggregation mechanism. +As we saw in Chapter 3, lists encourage an aggregate to be built up one +member at a time (as recursive functions do), and discourage mutation +(which is crucial to the sequential approach). + +<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>Even in Berkeley Logo, <CODE>for</CODE> is a library procedure rather +than a true primitive.</SMALL></BLOCKQUOTE></SMALL><P>In Pascal, the opposite is true. It's possible to write a recursive +factorial function in Pascal: + +<P><PRE>function fact(n:integer): integer; + begin + if n=0 then + fact := 1 + else + fact := n * fact(n-1) + end; +</PRE> + +<P>but a habitual Pascal programmer would be much more likely +to write this function in sequential style: + +<P><PRE>function fact(n:integer): integer; + var product, i: integer; + + begin + product := 1; + for i := 1 to n do + product := product * i; + fact := product + end; +</PRE> + +<P>(Don't worry, I know the details of the notation are a mystery to +you, but you should still be able to see the relationship between each Pascal +version and the corresponding Logo version. The only crucial point about +notation right now is that <CODE>:=</CODE> is the Pascal assignment operator, like +<CODE>make</CODE> in Logo. We'll go into the details of Pascal syntax later.) + +<P>Here's a more complicated example, showing how data aggregates are used in +the two paradigms. In Chapter 2 we explored the Simplex lock problem by +computing the function + +<P> + +<P><P><CENTER><IMG SRC="math1.gif" ALT="math display"></CENTER><P> + +<P> + +<P>using these procedures: + +<P><PRE>to simplex :buttons +output 2 * f :buttons +end + +to f :n +if equalp :n 0 [output 1] +output cascade :n [? + ((choose :n (#-1)) * f (#-1))] 0 +end +</PRE> + +<P>Here, the mathematical definition of <EM>f</EM> in terms of itself is +reflected in the recursive nature of the operation <CODE>f</CODE>. In Chapter 3, we +improved the efficiency of the procedure by remembering smaller values of +<EM>f</EM> to avoid recomputing them; similarly, instead of computing the <CODE> +choose</CODE> function separately each time, we used old values to compute +new ones: + +<P><PRE>to simplex :buttons +output 2 * first (cascade :buttons + [fput (sumprods butfirst ?2 ?1) ?1] [1] + [fput 1 nextrow ?2] [1 1]) +end + +to sumprods :a :b +output reduce "sum (map "product :a :b) +end + +to nextrow :combs +if emptyp butfirst :combs [output :combs] +output fput (sum first :combs first butfirst :combs) ~ + nextrow butfirst :combs +end +</PRE> + +<P>The recursive nature of <EM>f</EM> is less obvious in the second +implementation, but the overall technique is still composition of +functions. (Recall that the job of <CODE>cascade</CODE> is to invoke a +function repeatedly, in the pattern <EM>f</EM>(<EM>f</EM>(<EM>f</EM>(··· <EM>f</EM>(<EM>x</EM>)))). In this +case, <CODE>cascade</CODE> is computing two functions in parallel; one is a list of +values of the Simplex function <EM>f</EM> and the other is a row of Pascal's +triangle.) The availability of higher order functions (in this program +I've used <CODE>cascade</CODE>, <CODE>map</CODE>, and <CODE>reduce</CODE>) is another way +in which Logo encourages the functional paradigm. + +<P>In sequential style, the composition of functions is replaced by a sequence +of steps in which values are stored in boxes (members of arrays) and +repeatedly replaced with different values: + +<P><PRE>to simplex.seq :buttons +localmake "f (array :buttons+1 0) +localmake "combs (array :buttons+1 0) +local [left right] +setitem 0 :f 1 +setitem 0 :combs 1 +for [i 1 :buttons] [ + setitem :i :f 0 + make "right 0 + for [j 0 :i-1] [ + make "left :right + make "right item :j :combs + setitem :j :combs :left+:right + setitem :i :f (item :i :f) + (item :j :f)*(item :j :combs) + ] + setitem :i :combs 1 +] +output 2 * item :buttons :f +end +</PRE> + +<P>It may take some effort to convince yourself that this procedure +really computes the same results as the other versions! Within the +procedure, the array <CODE>f</CODE> contains the values <EM>f</EM>(0), <EM>f</EM>(1), ... as +they are computed one by one; the array <CODE>combs</CODE> contains one row (at a +time) of Pascal's triangle. + +<P>The procedure first puts <EM>f</EM>(0) into the zeroth position of the <CODE>f</CODE> array +and the first row of Pascal's triangle (containing just one 1) in the <CODE> +combs</CODE> array. Then comes a <CODE>for</CODE> loop that computes <EM>f</EM>(1), then <EM>f</EM>(2), +and so on, until the desired value is reached. An inner <CODE>for</CODE> loop +fills the same purpose as the <CODE>sumprods</CODE> function in the previous +version of <CODE>simplex</CODE>: It computes the sum of several terms, not by +function composition but by adding each term into the sum separately. +The instruction + +<P><PRE>setitem :i :f (item :i :f) + (item :j :f)*(item :j :combs) +</PRE> + +<P>adds one additional term to the sum each time it's carried out. + +<P>The sequential Simplex calculation looks bizarre in Logo, but it's much +more natural in Pascal: + +<P><PRE>function simplex(buttons:integer): integer; +var left, right, i, j: integer; + f, combs: array [0..30] of integer; + + begin + f[0] := 1; + combs[0] := 1; + for i := 1 to buttons do + begin + f[i] := 0; + right := 0; + for j := 0 to i-1 do + begin + left := right; + right := combs[j]; + combs[j] := left+right; + f[i] := f[i] + (f[j] * combs[j]) + end; + combs[i] := 1 + end; + simplex := 2 * f[buttons] + end; +</PRE> + +<P>Pascal is well suited to this style of programming for several +reasons. One is that the <CODE>f[i]</CODE> notation for a member of an array is +more compact and more readable than Logo's use of procedure invocations +(calling <CODE>item</CODE> to examine an array member and <CODE>setitem</CODE> to modify +its value). Another, already mentioned, is that <CODE>for</CODE> is built into +Pascal. Perhaps most important is Pascal's <EM>block structure:</EM> +the keywords <CODE>begin</CODE> and <CODE>end</CODE> can be used to group what would +otherwise be separate instructions into one larger instruction. In Logo, +the instructions that are repeated in a <CODE>for</CODE> loop must be part of a +list, one of the inputs to the <CODE>for</CODE> procedure; in principle, the entire +<CODE>for</CODE> invocation is one Logo instruction line. + +<P>Both Logo and Pascal are compromises between the functional paradigm and +the sequential paradigm. (In Logo, turtle graphics programs are most often +done sequentially, whereas the manipulation of words and sentences is +generally done functionally.) But Logo is much more of a functional +language than Pascal, partly because it supports list processing (you can +create lists in Pascal, but it's painful), and even more importantly because +in Logo it's easy to invent higher order functions such as <CODE>map</CODE> and +<CODE>cascade</CODE>. Pascal programmers can't readily invent their own control +structures because there's nothing like <CODE>run</CODE> or <CODE>apply</CODE> in Pascal, +and the built-in control structures are all sequential ones. (In addition +to <CODE>for</CODE>, Pascal has equivalents to the <CODE>while</CODE> and <CODE>do.until</CODE> +commands in the Berkeley Logo library.) As another example, Logo's <CODE> +ifelse</CODE> primitive can be used either as a command or as an operation, but +the Pascal equivalent works only as a command. + +<P>Not all programming languages compromise between paradigms. It's rare these +days to see a purely sequential language, but it used to be common; both the +original Fortran language and the early microcomputer versions of BASIC +lacked the ability to handle recursive procedures. Purely functional +languages are not widely used in industry, but are of interest to many +computer science researchers; the best known example is called ML. In a +purely functional language, there is no assignment operator (like <CODE>make</CODE> +in Logo) and no mutators (like <CODE>setitem</CODE> or <CODE>.setfirst</CODE>). + +<P>There are other programming paradigms besides sequential and functional, +although those are the oldest. The sequential paradigm came first because +the actual digital computer hardware works sequentially; Fortran, which was +the first higher level programming language, was initially viewed as merely +an abbreviation for the computer's hardware instruction +sequences.<SUP>*</SUP> The +functional paradigm was introduced with Lisp, the second-oldest programming +language still in use. Although Lisp is not a pure functional language (it +does have assignment and mutation), its design is firmly based on the idea of +functions as the way to express a computation. + +<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>Today, we think of programming languages primarily as ways +to express problems, rather than as ways to model how computer hardware +works. This shift in attitude has allowed the development of non-sequential +paradigms. We design languages that are well matched to the problems we want +to solve, rather than well matched to the hardware we're using.</SMALL></BLOCKQUOTE></SMALL><P>Another paradigm that's very popular today is +<EM>object-oriented programming.</EM> In this paradigm, we imagine +that instead of having a single computer carrying out a single program, the +computational world includes many independent "objects," each of which can +carry out programs on its own. Each object includes <EM>methods,</EM> which +are like local procedures, and <EM>variables,</EM> just like the variables +we've used all along except that each belongs to a particular object. If +one object wants to know the value of another object's variable, the first +object must send a <EM>message</EM> to the second. A message is a request +to carry out a method, so the messages that each object accepts depends on +the methods that it knows. + +<P>Logo has had a sort of glimmer of the object paradigm for many years, because +many dialects of Logo include multiple turtles. To move a turtle, you send +it a message, using a notation something like + +<P><PRE>ask 7 [forward 100] +</PRE> + +<P>to send a message to turtle number 7. But this notation, even +though it conveys some of the flavor of object-oriented programming, is not +truly representative of the paradigm. In a real object system, it would +be possible for specific turtles to have their own, specialized <CODE>forward</CODE> +methods. Turtle 7, for example, might be a special "dotted turtle" that +draws dotted lines instead of solid lines when it moves forward. One Logo +dialect, called Object Logo, does provide a genuine object capability. + +<P>Object-oriented programming fits naturally with the sort of problem in which +the computer is modeling or simulating a bunch of real-world objects; in +fact, the paradigm was invented for <EM>simulation</EM> programs, used to try +to answer questions such as "Will it eliminate the traffic jams if we add +another lane to this highway, or should we spend the money on more frequent +bus service instead?" The objects in the simulation program are people, +cars, buses, and lanes. Another example of a problem that lends itself to +the object paradigm is a window system, such as the one in Mac OS or in +Microsoft Windows. Each window is an object; when a window is displayed so +as to hide part of another window, the new window must send a message to the +hidden one telling it not to display the hidden region. + +<P>Some people argue that object-oriented programming should be used for <EM> +every</EM> programming problem, not because the independent object metaphor is +always appropriate but because using objects helps with <EM>information +hiding;</EM> if every variable belongs to a specific object, the program is +less likely to have the kind of bug in which one part of a program messes up +a data structure that's used in another part. This can be particularly +important, they say, in a large programming problem in which several +programmers work on different pieces of the program. When the different +programmers' procedures are put together, conflicts can arise, but the +object paradigm helps isolate each programmer's work from the others. +Although this argument has some merit, I'm cautious about any claim that one +particular paradigm is best for all problems. I think programmers should be +familiar with all of the major paradigms and be able to pick one according +to the needs of each task. + +<P>Another important programming paradigm is called <EM> +logic programming</EM> or <EM>declarative programming.</EM> In +this approach, the programmer doesn't provide an algorithm at all, but +instead lists known facts about the problem and poses questions. It's up +to the language implementation to search out all the possible solutions to +a question. We saw a very simplified version of this paradigm in the +discussion of logic problems in Chapter 2. Logic programming is especially +well suited to <EM>database</EM> problems, in which we pose questions such as +"Who are all the employees of this company who work in the data processing +division and have a salary above 40,000?" But, like all the paradigms +I've mentioned, logic programming is <EM>universal;</EM> any problem that can +be solved by a computer at all can be expressed as a logic program. Logic +programming is quite popular in Japan and in Europe, but not so much in the +United States, perhaps just because it wasn't invented here. + +<P><H2>Interactive and Non-interactive Languages</H2> + + + +<P>You use Logo by interacting with the language processor. You issue an +instruction, Logo does what you've told it and perhaps prints a result, +and then you issue another instruction. You can preserve a sequence of +instructions by defining a procedure, in which case that procedure can be +invoked in later instructions. But you don't have to define procedures; +young children generally start using Logo by issuing primitive turtle motion +commands one at a time. Defining procedures can be thought of as extending +the repertoire of things Logo knows how to do for future interaction. + +<P>By contrast, you write a Pascal program as a complete entity, "feed" the +program to the Pascal language processor all at once, and then wait for the +results. Often, when you type your program into the computer you aren't +dealing with the Pascal processor at all; you use another program, a +text editor, to let you enter your Pascal program into the computer. <EM> +Then</EM> you start up Pascal itself. (Most microcomputer versions of Pascal +include a simple text editor for program entry, just as Logo includes a +procedure editor.) Typically you store your Pascal program as a file on a +disk and you give the file name as input to the Pascal language processor. + +<P>Keep in mind that it's the process of writing and entering a program that's +non-interactive in Pascal. It's perfectly possible to write a Pascal program +that interacts with the user once it's running, alternating <CODE>read</CODE> and +<CODE>write</CODE> statements. (However, user input is one of the things I've left +out of my Pascal subset, as you'll see shortly.) + +<P>If you want to write your own Pascal programs for use with my compiler, +you'll need a way to create a disk file containing your new program, using +either Logo's procedure editor or some separate editing program. The sample +Pascal programs in this chapter are included along with the Logo program +files that accompany this book. + +<P>Our first example of a complete Pascal program is a version of the +Tower of Hanoi puzzle. I described this problem in +the first volume of this series. The Logo solution consists of two +procedures: + +<P><PRE>to hanoi :number :from :to :other +if equalp :number 0 [stop] +hanoi :number-1 :from :other :to +movedisk :number :from :to +hanoi :number-1 :other :to :from +end + +to movedisk :number :from :to +print (sentence [Move disk] :number "from :from "to :to) +end +</PRE> + +<P>To use these procedures you issue an instruction like + +<P><PRE>? <U>hanoi 5 "a "b "c</U> +Move disk 1 from a to b +Move disk 2 from a to c +Move disk 1 from b to c +Move disk 3 from a to b +Move disk 1 from a to c +... and so on. +</PRE> + +<P>Here is the corresponding Pascal program. This program is in the +file <CODE>tower</CODE>. (As you can see, Pascal programs begin with a +<EM>program name;</EM> in all of the examples in this chapter the +file name is the same as the program name, although that isn't a requirement +of Pascal.) Never mind the program details for the moment; right now +the point is to make sure you know how to get the Pascal compiler to +translate it into Logo. + +<P><PRE>program tower; + {This program solves the 5-disk tower of hanoi problem.} + +procedure hanoi(number:integer;from,onto,other:char); + {Recursive procedure that solves a subproblem of the original problem, + moving some number of disks, not necessarily 5. To move n disks, it + must get the topmost n-1 out of the way, move the nth to the target + stack, then move the n-1 to the target stack.} + + procedure movedisk(number:integer;from,onto:char); + {This procedure moves one single disk. It assumes that the move is + legal, i.e., the disk is at the top of its stack and the target stack + has no smaller disks already. Procedure hanoi is responsible for + making sure that's all true.} + + begin {movedisk} + writeln('Move disk ',number:1,' from ',from,' to ',onto) + end; {movedisk} + + begin {hanoi} + if number <> 0 then + begin + hanoi(number-1,from,other,onto); + movedisk(number,from,onto); + hanoi(number-1,other,onto,from) + end + end; {hanoi} + +begin {main program} + hanoi(5,'a','b','c') +end. +</PRE> + +<P>Once you have a Pascal program in a disk file, you compile it using the +<CODE>compile</CODE> command with the file name as input: + +<P><PRE>? <U>compile "tower</U> +</PRE> + +<P>The compiler types out the program as it compiles it, partly to +keep you from falling asleep while it's working but also so that if the +compiler detects an error in the program you'll see where the error was +found. + +<P>When the compiler has finished processing the <EM>source file</EM> (the file +containing the Pascal program) it stops and you see a Logo prompt. At this +point the program has been translated into a simulated machine +language. To run the program, say + +<P><PRE>? <U>prun "tower</U> +Move disk 1 from a to b +Move disk 2 from a to c +Move disk 1 from b to c +Move disk 3 from a to b +Move disk 1 from a to c +... and so on. +</PRE> + +<P>The input to the <CODE>prun</CODE> (Pascal run) command is the program +name--the word that comes after <CODE>program</CODE> at the beginning of the +source file. + +<P>The difference between an interactive and a non-interactive language is not +just an arbitrary choice of "user interface" style. This choice reflects +a deeper distinction between two different ways of thinking about what +a computer program is. +In Logo there is really no such thing as a "program." Each procedure is an +entity on its own. You may think of one particular procedure as the +top-level one, but Logo doesn't know that; you could invoke any procedure +directly by using an interactive instruction naming that procedure. Logo +does have the idea of a <EM>workspace,</EM> a collection of procedures stored +together in a file because they are related. But a workspace +need not be a tree-structured hierarchy with one top-level procedure and the +others as subprocedures. It can be a collection of utility procedures with +no top-level umbrella, like the Berkeley Logo library. +It can be a group of projects that are conceptually related but with several +independent top-level procedures, like the two memoization examples, the two +sorting algorithms, the tree data base, and other projects in the <CODE>algs</CODE> +workspace of Chapter 3. + +<P>By contrast, a Pascal program is considered a single entity. It always +begins with the word <CODE>program</CODE> and ends with a period, by analogy with +an English sentence. (The subprocedures and the individual statements +within the program are separated with semicolons because they are analogous +to English clauses.<SUP>*</SUP>) It makes no sense to give the +Pascal compiler a source file containing just procedures without a main +program. + +<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>I say "English" because I am writing for an +English-speaking audience, but in fact Pascal was designed by a largely +European committee including native speakers of several languages; principal +designer Niklaus Wirth is Swiss. Their languages all +have periods and semicolons, though.</SMALL></BLOCKQUOTE></SMALL><P><EM>Why</EM> did Logo's designers choose an interactive program development +approach, while Pascal's designers chose a whole-program paradigm? Like all +worthwhile questions, this one has more than one answer. And like many +questions in language design, this one has two broad <EM>kinds</EM> of answer: +the answers based on the implementation strategy for the language and the +ones based on underlying programming goals. + +<P>The most obvious answer is that Pascal is a <EM>compiled</EM> language and +Logo is an <EM>interpreted</EM> one. That is, most Pascal language +processors are <EM>compilers:</EM> programs that translate a program from one +language into another, like translating a book from English into Chinese. +Most compilers translate from a source language like Pascal into +the native <EM>machine language</EM> of whatever computer you're using. +(My Pascal compiler translates into a <EM>simulated</EM> machine language +that's actually processed by Logo procedures.) By +contrast, most Logo versions are <EM>interpreters:</EM> programs +that directly carry out the instructions in your source program, without +translating it to a different ("object") language.<SUP>*</SUP> + +<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>This is another +case in which the same word has two unrelated technical meanings. The use of +"object" in describing the result of a compilation (object program, object +language) has nothing to do with object-oriented programming.</SMALL></BLOCKQUOTE></SMALL><P>To understand why interpreters tend to go with interactive languages, while +compilers usually imply "batch mode" program development, think about the +"little person" metaphor that's often used in teaching Logo. If you think +of the computer as being full of little people who know how to carry out the +various procedures you've written, the one who's really in charge is not the +one who carries out your top-level procedure, but rather the one representing +the Logo interpreter itself. If the procedure specialists are like circus +performers, the Logo interpreter is the ringmaster. The circus metaphor is +actually a pretty good one, because on the one hand each performer is an +autonomous person, but at the same time the performers have to cooperate in +order to put on a show. The relevance of the metaphor to this discussion is +that in a compiled language there is no "ringmaster." The compiler is more +closely analogous to a bird that hatches an egg (your program) and then +pushes the new bird out of the nest to fend for itself. In a compiled +language there is no provision for an interactive interface to which you can +give commands that control the running of your program, unless your program +itself includes such an interface. + +<P>Saying the same thing in a different way, the Logo interpreter is part of +the environment in which any Logo program operates. (That's why Logo can +provide a facility like the <CODE>run</CODE> command to allow your program to +construct new Logo instructions as it progresses.) But a Pascal compiler +does its job, translating your program into another form, and then +disappears. Whatever mechanisms are available to control your program have +to be built into the program. For example, my Pascal version of the Tower +of Hanoi program includes the top-level instruction that starts up the +solution for five disks. In the Logo version, that instruction isn't +considered part of the program; instead, you direct Logo interactively +to invoke the <CODE>hanoi</CODE> procedure. + +<P> + +<P>The distinction between compiled and interpreted languages is not as +absolute as it once was. There are versions of Logo in which each procedure +is compiled as you define it, but it's still possible to give instructions +interactively. (Some such versions include both a compiler and an +interpreter; in others, the "interpreter" just arranges to compile each +instruction you type as if it were a one-line program.) And many current +Pascal compilers don't compile into the machine language of the host +computer, but rather into an <EM>intermediate</EM> language called +"P-code" that is then interpreted by another program, a +P-code interpreter. P-code is called an intermediate language +because the level of detail in a P-code program is in between that of a +language you'd want to use and that of the underlying machine language. Its +primitives are simple and quick, not complex control structures like <CODE> +if</CODE> or <CODE>for</CODE>. The advantage of a Pascal language processor based on +P-code is that the compiler is <EM>portable</EM>--it can work on +any computer. All that's needed to start using Pascal on a new computer is a +P-code interpreter, which is a relatively easy programming project. + +<P> + +<P>So far I've been explaining a language design decision (interactive or +non-interactive development) in terms of an implementation constraint +(interpreted or compiled). But it's possible to look beyond that +explanation to ask <EM>why</EM> someone would choose to design a compiler +rather than an interpreter or vice versa. + +<P>The main advantage of a compiler is that the finished object program runs +fast, since it is directly executed in the native language of the host +computer. (An interpreter, too, ultimately carries out the program in the +computer's native language. But the interpreter must decide which native +language instructions to execute for a given source language instruction +each time that instruction is evaluated. In a compiled language that +translation process happens only once, producing an object program +that requires no further translation while it's running.) +The tradeoff is that the compilation process itself is slow. +If you're writing a program that will be used every day forever, the +compiled language has the advantage because the development process only +happens once and then the program need not be recompiled. On the other +hand, during program development the compiled language may be at a +disadvantage, because any little change in one instruction requires that the + +entire program be recompiled. (For some languages there are <EM> +incremental</EM> compilers that can keep track of what part of the program +you've changed and only recompile that part.) + +<P>A compiled language like Pascal (or Fortran or C), then, makes sense in a +business setting where a program is written for practical use, generally +using well-understood algorithms so that the development process should be +straightforward. An interpreted language like Logo (or Lisp or BASIC) makes +more sense in a research facility where new algorithms are being explored +and the development process may be quite lengthy, but the program may never +be used in routine production. (In fact nobody uses BASIC for research +purposes, because of other weaknesses, but its interactive nature is a plus.) +Another environment in which interaction is important is education; a +computer science student writes programs that may <EM>never</EM> actually be +run except for testing. The program is of interest only as long as it +doesn't work yet. For such programs the speed advantage of a compiled +program is irrelevant. + +<P>There are also reasons that have nothing to do with implementation issues. +I've spoken earlier of two conflicting views of computer science, which I've +called the software engineering view and the artificial intelligence view. +In the former, the program development process is seen as beginning with a +clear, well-defined idea of what the program should do. This idea is +written down as a <EM>program specification</EM> that forms the basis for the +actual programming. From that starting point, the program is developed +top-down; first the main program is written in terms of subprocedures that +are planned but not written yet; then the lower-level procedures are written +to fill in the details. No procedure is written until it's clear how that +procedure fits into a specific overall program. Since Pascal's developers +are part of the software engineering camp, it's not surprising that a Pascal +program takes the form of an integrated whole in which each procedure must +be <EM>inside</EM> a larger one, rather than a collection of more autonomous +procedures. By contrast, Logo is a product of the artificial intelligence +camp, for whom program development is a more complicated process involving +bottom-up as well as top-down design. AI researchers recognize that they +may begin a project with only a vague idea of what the finished program will +do or how it will be organized. It's appropriate, then, to start by writing +program fragments that deal with whatever subtasks you <EM>do</EM> understand, +then see how those pieces can fit together to complete the overall project. +Development isn't a straight line from the abstract specification to the +concrete subprocedures; it's a zigzag path in which the programmer gets an +idea, tries it out, then uses the results as the basis for more thinking +about ideas. + +<P>Traditionally, an interpreter has been the primary tool to facilitate +interactive program development. Recently, though, software developers +have brought a more interactive flavor to compiled languages by inventing +the idea of an <EM>integrated development environment</EM> (IDE), in which a +compiler is one piece of a package that also includes a language-specific +editor (one that knows about the syntax of the language and automatically +provides, for example, the keyword <CODE>do</CODE> that must follow a <CODE>for</CODE> in +Pascal), online documentation, and a <EM>debugger,</EM> which is a program +that permits you to follow the execution of your program one step at a time, +like the <CODE>step</CODE> command in Berkeley Logo. The idea is to have your cake +and eat it too: You use the IDE tools during program development, but once +your program is debugged, you're left with a fast compiled version that can +be run without the IDE. + +<P><H2>Block Structure</H2> + +<P>So far we've been looking at how each language thinks about a program as a +whole. We turn now to the arrangement of pieces within a program or a +procedure. + +<P>A Logo procedure starts with a <EM>title line,</EM> followed by the +instructions in the procedure <EM>body</EM> and then the <CODE>end</CODE> line. +The purpose of the title line is to give names to the procedure itself and +to its inputs. + +<P>The structure of a Pascal program is similar in some ways, but with some +complications. The program starts with a <EM>header</EM> line, very much +analogous to the title line in Logo. The word <CODE>tower</CODE> in the +header line of our sample program is the name of the program. Skipping +over the middle part of the program for the moment, the part between <CODE> +begin</CODE> and <CODE>end</CODE> in the last few lines is +the <EM>statement part</EM> of +the program, just as in Logo. The word <CODE>end</CODE> in the Pascal program is +not exactly analogous to the <CODE>end</CODE> line in a Logo procedure; it's a kind +of closing bracket, matching the <CODE>begin</CODE> before it. The period right +after the final <CODE>end</CODE> is what corresponds to the Logo <CODE>end</CODE> line. + +<P> + +<P>What makes Pascal's structure different from Logo's is the part I've skipped +over, the declaration of procedures. In Logo, every procedure is a separate +entity. In Pascal, the declaration of the procedure <CODE>hanoi</CODE>, for +example, is <EM>part of</EM> the program <CODE>tower</CODE>. This particular +program uses no global variables, but if it did, those variables would also +have to be declared within the program. If the program used global +variables <CODE>a</CODE>, <CODE>b</CODE>, <CODE>i</CODE>, and <CODE>j</CODE> then it might begin + +<P><PRE>program tower; +var a,b:real; + i,j:integer; +procedure hanoi(number:integer;from,onto,other:char); +</PRE> + +<P>In summary, a Pascal program consists of +<P> +<TABLE><TR><TH align="right" valign="top">1.<TD> <TD valign="top"> the header line +<TR><TH align="right" valign="top">2.<TD> <TD valign="top"> the declaration part (variables and procedures) +<TR><TH align="right" valign="top">3.<TD> <TD valign="top"> the statement part +<TR><TH align="right" valign="top">4.<TD> <TD valign="top"> the final punctuation (period) + +<P></TABLE><P> + +<P>But notice that the procedure <CODE>hanoi</CODE>, declared inside <CODE> +tower</CODE>, has the <EM>same</EM> structure as the entire program. It +begins with a header line; its declaration part includes the declaration of +procedure <CODE>movedisk</CODE>; it has a statement part between <CODE>begin</CODE> and +<CODE>end</CODE>; its final punctuation is a semicolon instead of a period. + +<P>What does it <EM>mean</EM> for one procedure to be declared inside another? +You already know what a <EM>local variable</EM> means; if a variable <CODE>v</CODE> +belongs to a procedure <CODE>p</CODE> then that variable exists only while the +procedure is running; at another point in the program there might be a +different variable with the same name. In Pascal, the same is true for +local procedures. In our example program, the procedure <CODE>movedisk</CODE> +exists only while procedure <CODE>hanoi</CODE> is running. It would be an error to +try to invoke <CODE>movedisk</CODE> directly from the main program. + +<P>The header line for a procedure can include names for its inputs, just as +the title line of a Logo procedure names its inputs. A useful bit of +terminology is that the variable names in the procedure header are called +<EM>formal parameters</EM> to distinguish them from the expressions +that provide particular input values when the procedure is actually invoked; +the latter are called <EM>actual arguments.</EM> The words +"parameter" and "argument" are both used for what we call an +"input" in Logo.<SUP>*</SUP> + +<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>I'm misleading you a little bit by calling it a +"header line." Like any part of a Pascal program, the header can extend +over more than one line, or can be on the same line with other things. The +end of the header is marked with a semicolon. In Pascal a line break is +just like a space between words. However, there are conventions for +properly formatting a Pascal program. Even though the Pascal compiler +doesn't care about spacing and line breaks, people always do it as I've +shown you here, with subordinate parts of the program indented and each +statement on a separate line.</SMALL></BLOCKQUOTE></SMALL><P>The sequence of header, declarations, statements, and punctuation is called +a <EM>block.</EM> Pascal is called a <EM>block structured</EM> language +because of the way blocks can include smaller blocks. Another aspect of +block structure is +Pascal's use of <EM>compound statements.</EM> A sequence of the form + +<P><PRE>begin <EM>statement</EM> ; <EM>statement</EM> ; <EM>statement</EM> end +</PRE> + +<P>is called a compound statement. An example from the <CODE>tower</CODE> +program is + +<P><PRE>begin + hanoi(number-1,from,other,onto); + movedisk(number,from,onto); + hanoi(number-1,other,onto,from) +end +</PRE> + +<P>(Notice that semicolons go <EM> +between</EM> statements within this sequence; none is needed after the last +statement of the group. This syntactic rule is based on the analogy between +Pascal statements and English clauses that I mentioned earlier.) For +example, Pascal includes a conditional statement whose form is + +<P><PRE>if <EM>condition</EM> then <EM>statement</EM> +</PRE> + +<P> +The "statement" part can be a single <EM>simple</EM> statement, +like a procedure call, or it can be a compound statement delimited by <CODE> +begin</CODE> and <CODE>end</CODE>. Because the general term "block structured language" +refers to any syntactic grouping of smaller units into a larger one, +including compound statements, you may hear the word "block" used to refer +to a compound statement even though that's not the official Pascal meaning. + +<P> + +<P> +<H2>Statement Types</H2> + +<P> + +<P>In Logo we don't talk about different kinds of statements like compound, +simple, and so on. <EM>Every</EM> Logo instruction (well, all but <CODE>to</CODE>) +is a procedure invocation. <CODE>If</CODE>, for example, is a procedure whose +first input is <CODE>true</CODE> or <CODE>false</CODE> and whose second input is a list +containing instructions to be carried out if the first input is <CODE>true</CODE>. +In Pascal there are several different kinds of statements, each with its own +syntax. + +<P>You know about compound statements. You've seen one example of <CODE>if</CODE>, + +which is one of several <EM>structured</EM> statements in Pascal. Other +examples include + +<P><PRE>while <EM>condition</EM> do <EM>statement</EM>; +repeat <EM>statements</EM> until <EM>condition</EM> + +while x < 0 do + begin + increase(x); + writeln(x) + end; + +repeat + increase(x); + writeln(x) +until x >= 0; +</PRE> + +<P>These are like the <CODE>while</CODE> and <CODE>do.until</CODE> tools in +Berkeley Logo. <CODE>While</CODE>, like <CODE>if</CODE>, requires a single statement +(which can be a compound statement between <CODE>begin</CODE> and <CODE>end</CODE>) after +the <CODE>do</CODE>. However, the words <CODE>repeat</CODE> and <CODE>until</CODE> implicitly +delimit a compound statement, so you can put more than one statement between +them without using <CODE>begin</CODE> and <CODE>end</CODE>. Another example is <CODE> +for</CODE>, which you'll see in use in a moment. Continuing the analogy +with English grammar, a compound statement is like a compound sentence with +several independent (or coordinate) clauses; a structured statement is like +a complex sentence, with a dependent (or subordinate) clause. (If you +always hated grammar, you can just ignore this analogy.) + +<P>There are basically only two kinds of simple statement: the +procedure call, + +which you've already seen, and the <EM>assignment</EM> statement used to give +a variable a new value. This latter is Pascal's version of <CODE>make</CODE> in +Logo; it takes the form + +<P><PRE><EM>variable</EM> := <EM>expression</EM> + +slope := ychange/xchange +</PRE> + +<P>As I've already mentioned, the variable must have been declared +either in a procedure heading or in a <CODE>var</CODE> declaration. +(Assignment is represented with the two-character symbol <CODE>:=</CODE> because +<CODE>=</CODE> by itself means <CODE>equalp</CODE> rather than <CODE>make</CODE>.) + +<P>I say there are "basically" only two kinds because each of these has some +special cases that look similar but follow different rules. For example, +printing to the computer screen is done by what looks like an invocation of +<CODE>write</CODE> (analogous to <CODE>type</CODE> in Logo) or <CODE> +writeln</CODE> ("write line," analogous to <CODE>print</CODE>). But these are +not ordinary procedures. Not only do they take a variable number of +arguments, but the arguments can take a special form not ordinarily +allowed. In the <CODE>movedisk</CODE> procedure in the <CODE>tower</CODE> program, one of +the arguments to <CODE>writeln</CODE> is + +<P><PRE>number:1 +</PRE> + +<P>The "<CODE>:1</CODE>" here means "using one print position unless more +are needed to fit the number." Pascal print formatting is designed to +emphasize the printing of numbers in columns, so the default is to print +each number with a fairly large number of characters, with spaces at the +left if the number doesn't have enough digits. The exact number of +characters depends on the type of number and the dialect of Pascal, but 10 +is a typical number for integers. So + +<P><PRE>writeln(1,2,3,4); +writeln(1000,2000,3000,4000); +</PRE> + +<P>will give a result looking something like this: + +<P><PRE> 1 2 3 4 + 1000 2000 3000 4000 +</PRE> + +<P>In <CODE>movedisk</CODE> I had to say "<CODE>:1</CODE>" to avoid all that +extra space. + +<P>What are the pros and cons of using a variety of syntax rules for different +kinds of statements? One reason for the Pascal approach is that differences +in meaning can be implicit in the definitions of different statement types +instead of having to be made explicit in a program. Don't worry; you're not +expected to understand what that sentence meant, but you will as soon as you +see an example. In Logo we say + +<P><PRE>if :x < 10 [increment "x] +while [:x < 10] [increment "x] +</PRE> + +<P>Why is the predicate expression <CODE>:x < 10</CODE> in a quoted list in +the case of <CODE>while</CODE> but not for <CODE>if</CODE>? <CODE>If</CODE> wants the +expression to be evaluated once, <EM>before</EM> <CODE>if</CODE> is invoked. The +actual input to <CODE>if</CODE> is not that expression but the value of the +expression, either <CODE>true</CODE> or <CODE>false</CODE>. <CODE>While</CODE>, on the other +hand, wants to evaluate that expression repeatedly. If Logo evaluated the +expression ahead of time and gave <CODE>while</CODE> an input of <CODE>true</CODE> or <CODE> +false</CODE> it wouldn't be able to know when to stop repeating. + +<P>The fact that <CODE>if</CODE> wants the condition evaluated once but <CODE>while</CODE> +wants to evaluate it repeatedly has nothing to do with the syntax of Logo; +the same is true in Pascal. But in Pascal you say + +<P> + +<P><PRE>if x<10 then increment(x); +while x<10 do increment(x); +</PRE> + +<P>In Logo the fact that <CODE>if</CODE>'s condition is evaluated in advance +but <CODE>while</CODE>'s isn't is made explicit by the use of square brackets. In +Pascal it's just part of the <EM>semantic</EM> definitions of the <CODE>if</CODE> +and <CODE>while</CODE> statements. (<EM>Syntax</EM> is the form in which something +is represented; <EM>semantics</EM> is the meaning of that something.) + +<P>One more example: Beginning students of Logo often have trouble +understanding why you say + +<P><PRE>make "new :old +</PRE> + +<P>to assign the value of one variable to another variable. Why is +the first quoted and the second dotted? Of course you understand that it's +because the first input to <CODE>make</CODE> is the <EM>name</EM> of the variable +you want to set, while the second is the <EM>value</EM> that you want to give +it. But in Pascal this distinction is implicit in the semantic definition +of the assignment statement; you just say + +<P><PRE>new := old +</PRE> + +<P>Since beginning Logo students have trouble with quotes and dots in <CODE> +make</CODE>, you might think that the Pascal approach is better. But beginning +Pascal students have a trouble of their own; they tend to get thrown by +statements like + +<P><PRE>x := x+1 +</PRE> + +<P>This doesn't look quite as bad as the BASIC or Fortran version in +which the symbol for assignment is just an equal sign, but it's still easy +to get confused because the symbol "<CODE>x</CODE>" means two very different +things in its two appearances here. In the Logo version + +<P><PRE>make "x :x+1 +</PRE> + +<P>the explicit difference in appearance between <CODE>"x</CODE> and <CODE>:x</CODE> +works to our advantage. + +<P>Which way do you find it easier to learn something: Do you want to start +with a simple, perhaps partly inaccurate understanding and learn about the +difficult special cases later, or do you prefer to be told the whole truth +from the beginning? I've posed the question in a way that shows my own +preference, I'm afraid, but there are many people with the opposite view. + +<P>The issue about whether or not to make the semantics of some action implicit +in the syntax of the language is the most profound reason for the difference +between Logo's single instruction syntax and Pascal's collection of +statement types, but there are other implications as well. One virtue of +the Pascal compound statement is that it makes for short, manageable +instruction lines. You've seen Logo procedures in these books in which one +"line" goes on for three or four printed lines on the page, e.g., when the +instruction list input to <CODE>if</CODE> contains several instructions. It's a +particularly painful problem in the versions of Logo that don't allow +continuation lines. + +<P>On the other hand, Logo's syntactic uniformity contributes to its <EM> +extensibility.</EM> In the example above, <CODE>if</CODE> is a Logo primitive, +whereas <CODE>while</CODE> is a library procedure written in Logo. But the +difference isn't obvious; the two are used in syntactically similar ways. +You can't write control structures like <CODE>while</CODE> in Pascal because +there's nothing analogous to <CODE>run</CODE> to allow a list of instructions to be +an input to a procedure, but even if you could, it would have to take the +form + +<P><PRE>while(<EM>condition</EM>,<EM>statement</EM>) +</PRE> + +<P>because that's what a procedure call looks like. But it's not +what a built-in Pascal control structure looks like. + +<P><H2>Shuffling a Deck Using Arrays</H2> + + +<P>It's time for another sample Pascal program. Program <CODE>cards</CODE> is a +Pascal version of the problem of shuffling a deck of cards that we discussed +in Chapter 3. It includes local variables, assignment statements, and the +<CODE>for</CODE> structured statement. It also will lead us into some +additional language design issues. + +<P><PRE>program cards; + {Shuffle a deck of cards} + +var ranks:array [0..51] of integer; + suits:array [0..51] of char; + i:integer; + +procedure showdeck; + {Print the deck arrays} + + begin {showdeck} + for i := 0 to 51 do + begin + if i mod 13 = 0 then writeln; + write(ranks[i]:3,suits[i]); + end; + writeln; + writeln + end; {showdeck} + +procedure deck; + {Create the deck in order} + + var i,j:integer; + suitnames:packed array [0..3] of char; + + begin {deck} + suitnames := 'HSDC'; + for i := 0 to 12 do + for j := 0 to 3 do + begin + ranks[13*j+i] := i+1; + suits[13*j+i] := suitnames[j] + end; + writeln('The initial deck:'); + showdeck + end; {deck} + +procedure shuffle; + {Shuffle the deck randomly} + + var rank,i,j:integer; + suit:char; + + begin {shuffle} + for i := 51 downto 1 do {For each card in the deck} + begin + j := random(i+1); {Pick a random card before it} + rank := ranks[i]; {Interchange ranks} + ranks[i] := ranks[j]; + ranks[j] := rank; + suit := suits[i]; {Interchange suits} + suits[i] := suits[j]; + suits[j] := suit + end; + writeln('The shuffled deck:'); + showdeck + end; {shuffle} + +begin {main program} + deck; + shuffle +end. +</PRE> + +<P>Experienced Pascal programmers will notice that this program isn't written +in the most elegant possible Pascal style. This is partly because of issues +in Pascal that I don't want to talk about (records) and partly because of +issues that I <EM>do</EM> want to talk about in the next section (scope). + +<P>Here's what happens when you run the program: + +<P><PRE>? <U>prun "cards</U> +The initial deck: + + 1H 2H 3H 4H 5H 6H 7H 8H 9H 10H 11H 12H 13H + 1S 2S 3S 4S 5S 6S 7S 8S 9S 10S 11S 12S 13S + 1D 2D 3D 4D 5D 6D 7D 8D 9D 10D 11D 12D 13D + 1C 2C 3C 4C 5C 6C 7C 8C 9C 10C 11C 12C 13C + +The shuffled deck: + + 2D 11D 6S 9D 6C 10H 8D 11C 3D 4C 5H 4S 1D + 5C 5D 6D 9S 4D 8C 13S 13D 10C 9H 10D 5S 12D + 13H 9C 3C 1S 10S 4H 12S 11S 12H 11H 2H 3H 1H + 13C 8H 7C 2C 1C 7S 6H 2S 7D 8S 12C 3S 7H +</PRE> + +<P>The Pascal <CODE>for</CODE> is somewhat like the Berkeley Logo <CODE>for</CODE> +in its semantics, although of course the syntax is quite different. The +step value must be either 1 (indicated by the keyword <CODE>to</CODE>) or −1 +(<CODE>downto</CODE>). By the way, if you've been wondering why I changed one of +the variable names in the Tower of Hanoi program from <CODE>to</CODE> in the Logo +version to <CODE>onto</CODE> in the Pascal version, it's because <CODE>to</CODE> is a <EM> +reserved word</EM> in Pascal and can't be used as the name of +anything. + +<P>The Pascal standard does not include a <CODE>random</CODE> function. Most +practical versions of Pascal do provide a random number generator of some +sort; since there's no standard, I've chosen to implement the kind that's +most useful for the kind of programming I'm interested in, namely the Logo +<CODE>random</CODE> that takes an integer argument and returns an integer between +zero and one less than the argument. + +<P><H2>Lexical Scope</H2> + + +<P>Program <CODE>cards</CODE> has three procedures: <CODE>showdeck</CODE>, <CODE>deck</CODE>, +and <CODE>shuffle</CODE>. Each of these is declared directly in the top-level +program. However, <CODE>showdeck</CODE> is not <EM>invoked</EM> directly at top +level; it's used by the other two procedures. (This is one of the +questionable bits of programming style I've put in for the sake of the +following discussion; ordinarily I think I'd have put the statements that +invoke <CODE>showdeck</CODE> in the main program block.) + +<P>If you read the program carefully you'll see that <CODE>showdeck</CODE> uses a +variable <CODE>i</CODE> but does <EM>not</EM> declare that +variable.<SUP>*</SUP> (When a +variable is used but not declared within a certain procedure, that use of +the variable is called a <EM>free reference.</EM> A use of a +variable that + +<EM>is</EM> declared in the same block is called a <EM>bound</EM> reference.) +There are three variables named <CODE>i</CODE> in the program: one in the outer +block, one in <CODE>deck</CODE>, and one in <CODE>shuffle</CODE>. When, for example, the +main program calls <CODE>deck</CODE> and <CODE>deck</CODE> calls <CODE>showdeck</CODE>, which +variable <CODE>i</CODE> does <CODE>showdeck</CODE> use? + +<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>Actually, the Pascal language requires that the variable +used in a <CODE>for</CODE> statement <EM>must</EM> be declared in the same procedure +in which the <CODE>for</CODE> appears; program <CODE>cards</CODE> is not legal Pascal for +that reason. What's the purpose of that restriction? Suppose that this +procedure was invoked from within another <CODE>for</CODE> loop in another +procedure, and both use the same variable; then both procedures would be +trying to assign conflicting values to that variable. Berkeley Logo's <CODE> +for</CODE> automatically makes its variable local to the <CODE>for</CODE> procedure +itself, for the same reason. But my Pascal compiler lets us get away with +breaking this rule, and I've done it deliberately to make a point.</SMALL></BLOCKQUOTE></SMALL><P>In Logo the answer would be that <CODE>showdeck</CODE> uses the <CODE>i</CODE> belonging +to <CODE>deck</CODE>, the procedure that invoked it. That's because Logo follows +the rules of <EM>dynamic scope:</EM> A free reference to a variable is +directed to the variables of the procedure that invoked the current one, +then if necessary to the variables of the procedure that invoked that one, +and so on up to the global variables. (Dynamic scope is discussed more +fully in the first volume of this series.) + +<P>In Pascal, <CODE>showdeck</CODE> uses the <CODE>i</CODE> belonging to the main program. +That's because a free reference to a variable is directed to <EM>the block +within which the current procedure was declared,</EM> then to the block +surrounding that one, and so on up to the outermost program block. This +rule is called <EM>lexical</EM> scope. The set of blocks surrounding a +given block, smaller to larger, is its <EM>lexical environment.</EM> The +lexical environment of <CODE>showdeck</CODE> is + +<P><PRE>{showdeck, cards} +</PRE> + +<P>The lexical environment of <CODE>movedisk</CODE> in the <CODE>tower</CODE> +program is + +<P><PRE>{movedisk, hanoi, tower} +</PRE> + +<P>The set of procedure invocations leading to a given procedure is +its <EM>dynamic environment.</EM> A procedure's dynamic environment +isn't always the same; for example, the dynamic environment of <CODE> +showdeck</CODE> is sometimes + +<P><PRE>{showdeck, deck, cards} +</PRE> + +<P>and sometimes + +<P><PRE>{showdeck, shuffle, cards} +</PRE> + +<P>The word "lexical" is the adjective form of <EM>lexicon,</EM> which +means "dictionary." It's used in this computer science context because the +lexical context of a procedure has to do with where it's defined, just as +words are defined in a dictionary. The word "dynamic" means <EM>in +motion;</EM> it's used because the dynamic context of a procedure keeps +changing as the program runs. + +<P>What are the reasons behind the choice of lexical or dynamic scope? This is +another choice that was originally made for implementation reasons. It +turns out to be easy for an interpreter to use dynamic scope, but for a +compiler it's much easier to use lexical scope. That's because the +interpreter makes the decision about which variable to use while the program +is running and the dynamic environment is known, but the compiler has to +translate the program <EM>before</EM> it is run. At "compile time" there +isn't one fixed dynamic environment, but there is a single, already-known +lexical environment. Originally, interpreted languages like Logo, Lisp, and +APL all used dynamic scope, while compiled ones like Fortran and Pascal used +lexical scope. (There was even a period of time when Lisp systems offered +both an interpreter and a compiler, and the behavior of the same program was +different depending on whether you compiled it or interpreted it because of +different scope rules.) + +<P>More recent dialects of Lisp, such as Common Lisp and Scheme, have been +designed to use lexical scope even when interpreted. Their designers +think lexical scope is better for reasons that don't depend on the +implementation technique. One reason is that dynamic scope allows for +programming errors that don't arise when lexical scope is used. In Logo, +suppose you write a procedure that makes a free reference to some variable +<CODE>v</CODE>. What you intended, let's say, was to use a global variable by that +name. But you've forgotten that your procedure is sometimes invoked by +another procedure that you wrote two weeks ago that happens to have an input +named <CODE>v</CODE>. It can be very hard to figure out why your procedure is +suddenly getting the wrong variable. With lexical scope, it's much easier +to keep track of the context in which your procedure is defined, to make +sure there are no local variables <CODE>v</CODE> in the lexical environment. + +<P>It's possible to argue in favor of dynamic scope also. One +argument is that in a lexically scoped language certain kinds of tool +procedures can't be written at all: the ones like <CODE>while</CODE> or <CODE>for</CODE> +that take an instruction list as input and <CODE>run</CODE> the list repeatedly. +Suppose you write a procedure in Logo that contains an instruction like + +<P><PRE>while [:degrees < 0] [make "degrees :degrees+360] +</PRE> + +<P>What variable <CODE>degrees</CODE> do you want this instruction to use? +Presumably you mean the same variable <CODE>degrees</CODE> that is used by other +instructions in the same procedure. But if Logo used lexical scope, +then <CODE>while</CODE> wouldn't have access to the local variables of your +procedure. (It's possible to design other features into a lexically scoped +language to get around this limitation, but the necessary techniques are +more complicated than the straightforward way you can write <CODE>while</CODE> in +Logo.) + +<P>Another argument for dynamic scope, with particular relevance to Logo, is +that dynamic scope fits better with the expectations of an unsophisticated +programmer who hasn't thought about scope at all. One of the design goals +of Logo is to be easy for such beginners. Until now we've been talking +about scope in terms of naming conflicts: what happens if two variables have +the same name. But suppose you write a program with a bunch of procedures, +with a bunch of <EM>distinct</EM> variable names used as inputs. It makes +life very simple if all those variables are available whenever you want them, +so you don't have to think in such detail about how to get a certain piece +of information down the procedure invocation chain to a subprocedure. If +some variables are accessible to subprocedures but others aren't, that's one +more mystery to make programming seem difficult. In particular, dynamic +scope can simplify the debugging of a Logo program. If you arrange for your +program to <CODE>pause</CODE> at the moment when an error happens, then you can +enter Logo instructions, with all of the local variables of <EM>all</EM> +pending procedure invocations available, to help you figure out the reason +for the error. Debuggers for lexically scoped languages require a more +complicated debugging mechanism in which the programmer must explicitly +shift focus from one procedure to another. + +<P>In the situations we've seen, lexical scope always acts as a <EM> +restriction</EM> on the availability of variables to subprocedures. That is, +a procedure's lexical environment is always a subset of its dynamic +environment. (Suppose procedure <CODE>a</CODE> includes the definition of +procedure <CODE>b</CODE>, which in turn includes the definition of <CODE>c</CODE>. So the +lexical environment of <CODE>c</CODE> is <CODE>{c,b,a}</CODE>. You might imagine that +<CODE>c</CODE>'s dynamic environment could be <CODE>{c,a}</CODE> if procedure <CODE>a</CODE> +invoked <CODE>c</CODE> directly, but in fact that's illegal. Just as <CODE>a</CODE> can't +use <CODE>b</CODE>'s local variables, it can't use <CODE>b</CODE>'s local procedures +either. The reason the dynamic environment can be different from the +lexical one at all is that two procedures can be part of the same block, +like <CODE>showdeck</CODE> and <CODE>deck</CODE> in the <CODE>cards</CODE> program.) In Lisp, +it's possible for a procedure to return <EM>another procedure</EM> as its +output--not just the name of the procedure or the text of the procedure, as +we could do in Logo, but the procedure itself, lexical environment and all. +When such a procedure is later invoked from some other part of the +program, the procedure's lexical environment may not be a subset +of its dynamic environment, and so lexical scope gives it access to +variables that it couldn't use under dynamic scope rules. That's a powerful +argument in favor of lexical scope for Lisp, but it doesn't apply to Pascal. + +<P>One special scope rule in Pascal applies to procedures declared in the same +block: The one declared later can invoke the one declared earlier, but not +the other way around. In the <CODE>cards</CODE> program, <CODE>deck</CODE> can call <CODE> +showdeck</CODE> but <CODE>showdeck</CODE> can't call <CODE>deck</CODE>. There is no deep reason +for this restriction; it's entirely for the benefit of the compiler. One of +the design goals of Pascal was that it should be easy to write a compiler +that goes through the source program once, from beginning to end, without +having to back up and read part of the program twice. In particular, when +the compiler sees a procedure invocation, it must already know what inputs +that procedure requires; therefore, it must have already read the header of +the subprocedure. Usually you can get around this restriction by rearranging +the procedures in your program, but for the times when that doesn't work +Pascal provides a kludge that lets you put the header in one place in the +source file and defer the rest of the procedure until later. + +<P> + +<P><H2>Typed Variables</H2> + + +<P>Berkeley Logo has three data types: <EM>word, list,</EM> and <EM> +array.</EM> (Numbers are just words that happen to be full of digits.) But a +<EM>variable</EM> in Logo does not have a type associated with it; any datum +can be the value of any variable. + +<P>Pascal has lots of types, and every variable belongs to exactly one of them. +In the sample programs so far we've used five types: <EM>char,</EM> <EM> +integer,</EM> <EM>array of char,</EM> <EM>array of integer,</EM> and <EM>packed array +of char.</EM> When a variable is declared, the declaration says what type it is. + +<P>The selection of data types is the area in which my Pascal compiler is most +lacking; I've implemented only a few of the possible types. I'll describe +the ones available in my compiler in detail and give hints about the others. + +<P> +The fundamental types out of which all others are built are the <EM> +scalar</EM> types that represent a single value, as opposed to an aggregate +like an array or a list. Pascal has four: + +<P><P> +<TABLE><TR><TH align="right" valign="top"><CODE>integer</CODE><TD> <TD valign="top">a positive or negative whole number (e.g., <CODE> +23</CODE>) +<TR><TH align="right" valign="top"><CODE>real</CODE><TD> <TD valign="top">a number including decimal fraction part (e.g., <CODE> +-5.0</CODE>) +<TR><TH align="right" valign="top"><CODE>char</CODE><TD> <TD valign="top">a single character (e.g., <CODE>'Q'</CODE>) +<TR><TH align="right" valign="top"><CODE>Boolean</CODE><TD> <TD valign="top"><CODE>true</CODE> or <CODE>false</CODE> + +<P></TABLE><P> + +<P>Pascal also has several kinds of aggregate types. The only one +that I've implemented is the array, which is a fixed number of uniform +elements. By "uniform" I mean that all members of the array must be of +the same type. Full Pascal allows the members to be of any type, including +an aggregate type, as long as they're all the same, so you could say + +<P><PRE>var a : array [1..10] of array [1..4] of integer; +</PRE> + +<P>to get an array of arrays. But in my subset Pascal the members of +an array must be scalars. This restriction is not too severe because Pascal +arrays can have <EM>multiple indices;</EM> instead of the above you can use +the equivalent + +<P><PRE>var a : array [1..10, 1..4] of integer; +</PRE> + +<P>This declaration creates a two-dimensional array whose members +have names like <CODE>a[3,2]</CODE>. + +<P>The notation <CODE>1..10</CODE> is called a <EM>range;</EM> it indicates the extent +of the array. Berkeley Logo arrays ordinarily start with index one, so +a Logo instruction like + +<P><PRE>make "ten array 10 +</PRE> + +<P>is equivalent to the Pascal declaration + +<P><PRE>var ten:array [1..10] of something +</PRE> + +<P>except that the Logo array need not have uniform +members.<SUP>*</SUP> (Also, +a subtle difference is that the Logo array is an independent datum that can +be the value of a variable just as a number can be the value of a variable. +The Pascal array <EM>is</EM> the variable; you can change the contents of +individual members of the array but it's meaningless to speak of changing +the value of that variable to be something other than that array.) + +<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>The Berkeley Logo <CODE>array</CODE> primitive can take an optional +second input to specify a different starting index.</SMALL></BLOCKQUOTE></SMALL><P>In Pascal an index range doesn't have to be numbers; you can use any scalar +type except real: + +<P><PRE>var frequency : array ['a'..'z'] of integer; +</PRE> + +<P>might be used in a program that counts the frequency of use of +letters, such as a cryptography program. The members of this array would be +used by referring to things like <CODE>frequency['w']</CODE>. (In Pascal +documentation there is a word for "scalar type other than real": It's + +called an <EM>ordinal</EM> type.) + +<P>A <EM>packed array</EM> is one that's represented in the computer in a way +that takes up as little memory as possible. Ordinary arrays are stored so +as to make it as fast as possible to examine or change an individual element. +The distinction may or may not be important for a given type on a given +computer. For example, most current home computers have their memory +organized in <EM>bytes</EM> that are just the right size for a single +character. On such a computer, an array of char and a packed array of char +will probably be represented identically. But one of my favorite larger +computers, the Digital Equipment Corporation PDP-10, had its memory +organized in <EM>words</EM> of 36 bits, enough for five 7-bit characters with +one bit left over. A packed array of char, on the PDP-10, would be +represented with five characters per word; an ordinary array of char might +store only one character per word, wasting some space in order to simplify +the task of finding the <EM>n</EM>th character of the array. + +<P>My compiler, which is meant to be simple rather than efficient, ignores the +<CODE>packed</CODE> declaration. The reason I used it in the <CODE>cards</CODE> program +is to illustrate a rule of Pascal: The statement + +<P><PRE>suitnames := 'HSDC' +</PRE> + +<P>assigns a <EM>constant string</EM> to the array variable <CODE> +suitnames</CODE>, and such assignments are allowed only to packed arrays of char. +Also, the size of the array must equal the length of the string. If +<CODE>suitnames</CODE> were an array of length 10, for example, I'd have had to say + +<P><PRE>suitnames := 'HSDC ' +</PRE> + +<P>filling up the unused part of the array explicitly with spaces. + +<P>In an assignment statement, the type of the variable on the left must be the +same as the type of the expression on the right. An assignment can copy one +array into another if it involves two variables of exactly the same type: + +<P><PRE>var a,b:array [3..17] of real; + +a := b +</PRE> + +<P>but except for the case of packed arrays of char mentioned above +there is no way to represent a constant array in a Pascal program. If you +want an array of all the prime numbers less than 10 you have to initialize it +one member at a time: + +<P><PRE>var primes : array [1..4] of integer; + +primes[1] := 2; +primes[2] := 3; +primes[3] := 5; +primes[4] := 7 +</PRE> + +<P>In scalar assignments, a slight relaxation of the rules is allowed in that +you may assign an integer value to a real variable. The value is converted +to a real number (e.g., <CODE>17</CODE> to <CODE>17.0</CODE>). The opposite is <EM> +not</EM> allowed, but there are two built-in functions <CODE>trunc</CODE> (for +"truncate") and <CODE>round</CODE> that can be used to convert a real value to an +integer. <CODE>Trunc</CODE> cuts off the fraction part, so <CODE>trunc(4.99)</CODE> is +<CODE>4</CODE>. <CODE>Round</CODE> rounds to the nearest integer. + +<P>Pascal provides the usual infix arithmetic operations <CODE>+</CODE>, <CODE>-</CODE>, +<CODE>*</CODE>, and <CODE>/</CODE>, following the usual precedence rules, just as in Logo. +The result of any of these is integer if both operands are integer, +otherwise real, except that the result of <CODE>/</CODE> (division) is always real. +There are integer division operations <CODE>div</CODE> (integer quotient) and <CODE> +mod</CODE> (integer remainder); both operands to these must also be integers. +The relational operators <CODE>=</CODE> (like <CODE>equalp</CODE> in Logo), <CODE><</CODE> (less +than), <CODE>></CODE> (greater than), <CODE><=</CODE> (less than or equal to), <CODE>>=</CODE> +(greater than or equal to), and <CODE><></CODE> (not equal to) take two real or +integer operands and yield a Boolean result. There are also Boolean +operators <CODE>and</CODE>, <CODE>or</CODE>, and <CODE>not</CODE>, just like the Logo ones except +that they use infix syntax: + +<P><PRE>(x < 0) and (y <= 28) +</PRE> + +<P><H2>Additional Types in Standard Pascal</H2> + +<P>Standard Pascal, but not my version, includes other aggregate types besides +the array. One such type is the <EM>record;</EM> a record is a non-uniform +aggregate, but the "shape" of the aggregate must be declared in advance. +For example, you can declare a record containing three integers and an array +of 10 characters. In the <CODE>cards</CODE> program, instead of two separate +arrays for ranks and suits I could have said + +<P><PRE>var carddeck: array [0..51] of record + rank:integer; + suit:char + end; +</PRE> + +<P>Then to refer to the rank of card number 4 I'd say + +<P><PRE>carddeck[4].rank +</PRE> + +<P>and that would be an integer. A <EM>pointer</EM> is a variable +whose value is the memory address of a record; pointer variables can be used +to implement dynamic data structures like Logo lists by building explicitly +the collections of boxes and arrows illustrated in some of the pictures in +Chapter 3. But it's hard to build anything in Pascal that's <EM>quite</EM> +like Logo lists, even using pointers, because what's in each box has to +belong to some particular predeclared type. + +<P>Real Pascal also includes user-defined types. There is a <CODE> +type</CODE> +declaration that goes before the <CODE>var</CODE> declaration in a block: + +<P><PRE>type string = packed array [1..10] of char; +</PRE> + +<P>Variable declarations can then use <CODE>string</CODE> as the type of the +variable being declared. In fact, standard Pascal <EM>requires</EM> the use +of named types in certain situations. For example, in a procedure header +the formal parameters must be given a named type (either a built-in scalar +type like <CODE>integer</CODE> or a defined type like <CODE>string</CODE>); since I haven't +implemented <CODE>type</CODE> my compiler allows + +<P><PRE>procedure paul(words:packed array [1..10] of char); +</PRE> + +<P>although standard Pascal doesn't allow such a header. + +<P> +You can also define <EM>subrange</EM> types: + +<P><PRE>type dieface = 1..6; +</PRE> + +<P>This type is really an integer, but it's constrained to have only +values in the given range. This particular one might be used for a variable +whose value is to represent the result of rolling a six-sided die. Finally, + +there are <EM>enumerated</EM> types: + +<P><PRE>type Beatle = (John, Paul, George, Ringo); +</PRE> + +<P>This type is used for a variable that represents one of a small +number of possible things. In reality it's also a kind of integer; the word +<CODE>John</CODE> represents 0, <CODE>Paul</CODE> is 1, and so on. In fact, it's only +during the compilation of the program that Pascal remembers the names of the +possible values; you can't read or print these variables during the running +of the program. + +<P><H2>Critique of Typed Variables</H2> + +<P>Why would anyone want to use a subrange or other restricted type? If the +program works using a variable of type <CODE>dieface</CODE>, it would work just as +well if the variable were of type <CODE>integer</CODE>. The only difference is +that using a subrange type is slower because the program has to check (at +run time) to make sure that any value you try to assign to that variable is +in the approved range. + +<P>According to Pascal enthusiasts, the virtue of restricted types, like the +virtue of typed variables in the first place, is that their use helps catch +program bugs. For example, in the <CODE>cards</CODE> program, procedure <CODE> +shuffle</CODE> has variables <CODE>i</CODE> and <CODE>j</CODE> that are used to index the arrays +<CODE>ranks</CODE> and <CODE>suits</CODE>. How do we know there isn't an error in the +program so that one of those variables is given a value that isn't a valid +index for those arrays? Such an error would be caught when we <EM>use</EM> +the index variable to try to refer to an array element that doesn't exist, +but it's easier to debug the program if we get the error message at the +moment when the variable is assigned the incorrect value. So I should have +declared + +<P><PRE>var i,j : 0..51; +</PRE> + +<P>instead of using <CODE>integer</CODE>. (Of course one reason I didn't +use a subrange type is that I didn't implement them in my compiler!) + +<P>The trouble is that strict typing of variables is an unnecessary pain in the +neck.<SUP>*</SUP> Take this business of array index +bounds. Here is a possible piece of Pascal program: + +<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>I know I said I wasn't going to try to convince you which +language is better, but on this particular point I really think the Pascal +people don't have a leg to stand on.</SMALL></BLOCKQUOTE></SMALL><P><PRE>i := 0; +while i <= 51 do + begin + writeln(ranks[i]:3,suits[i]); + i := i+1 + end +</PRE> + +<P>There's nothing wrong with this. It will print the value of each +member of the two arrays, starting from index 0 and continuing through 51. +However, at the end of the <CODE>while</CODE> statement, the variable <CODE>i</CODE> has +the value 52. This is <EM>not</EM> an error; the program does <EM>not</EM> try +to refer to member 52 of the arrays. But if we declared <CODE>i</CODE> as a +subrange type the way we "should," the program will give an error message +and die. This particular example could be rewritten using <CODE>for</CODE> instead +of <CODE>while</CODE> to avoid the problem, but it turns out that there are many +algorithms that jump around in an array in which the index variable +sometimes takes on a value just outside the bounds of the array. Some sort +algorithms, for example, are like that. + +<P> + +<P>Typed variables work against program robustness. (A <EM>robust</EM> program +is one that keeps calm in the face of bad data or other user error, rather +than dying abruptly.) For example, suppose we want to find the sum of a +bunch of numbers. Some human being is going to type these numbers into the +computer, one at a time. We don't know in advance how many there are, so we +let the person type <CODE>done</CODE> when done. Since the typist is only human, +rather than a computer, we want to make sure the program doesn't blow up if +he accidentally types something other than a number or <CODE>done</CODE>. Here's +one way to do it in Logo: + +<P><PRE>to addnumbers +print [Enter the numbers, one per line.] +print [Type the word 'done' when done.] +print se [The sum is] addnumbers1 0 +end + +to addnumbers1 :sum +localmake "next readnumber +if emptyp :next [output :sum] +output addnumbers1 :sum+:next +end + +to readnumber +localmake "input readlist +if emptyp :input [output readnumber] +if equalp :input [done] [output []] +if numberp first :input [output first :input] +print [Please type numbers only.] +output readnumber +end +</PRE> + +<P>If the user makes a typing mistake, the program says so, ignores +the bad input, and keeps going. Now, how shall we write this in Pascal? +Into what type of variable should we read each number from the keyboard? If +we pick <CODE>integer</CODE>, then any entry of a non-number will incite Pascal to +print an error message and terminate the program. Instead we can read the +keyboard entry into an <CODE>array of char</CODE>, one character at a time. Then +anything the user types is okay, but we can't do arithmetic on the +result--we can't add it into the accumulated sum. We can read it as <CODE> +char</CODE>s and then write a procedure that knows how to look at a string of +digits and compute the number that those digits represent. But this is not +the sort of thing that we should need to do ourselves in a high-level +programming language. + +<P>Why should a programmer have to decide in advance whether or not the numbers +that a program will manipulate are integers? In Logo I can write a general +numeric procedure like this one: + +<P><PRE>to square :x +output :x * :x +end +</PRE> + +<P>but in Pascal I need one for each kind of number: + +<P><PRE>function RealSquare(x:real): real; +begin + RealSquare := x * x +end; + +function IntSquare (x:integer): integer; +begin + IntSquare := x * x +end; +</PRE> + +<P>Why pick on the distinction between integer and non-integer values? +Why not positive and negative values, or odd and even values? The historical +answer is that computer hardware uses two different representations for +integer and real numbers, but so what? That doesn't mean the distinction +is relevant to the particular program I'm writing. + +<P>The language ML, which I mentioned earlier as an example of a pure functional +language, tries to provide the best of both worlds on this issue. It does +require that variables have types, as in Pascal, to help catch programming +errors. But two features make ML's type system more usable than Pascal's. +One is the provision of <EM>union types.</EM> It's possible to say that +a particular variable must contain either a number or the word <CODE>done</CODE>, +for example. (Pascal has something like this, called <EM>variants,</EM> but +they're less flexible.) Second, the ML compiler uses <EM>type inference,</EM> +a technique by which the compiler can often figure out the appropriate type +for a variable without an explicit declaration. + +<P> + +<P> +<H2>Procedures and Functions</H2> + +<P> + +<P>In Logo a distinction is made between those procedures that output a value +(operations) and those that don't (commands). Pascal has the same +categories, but they're called <EM>functions</EM> and <EM> +procedures</EM> respectively. + +<P>A function is a block just like a procedure block except for the minor +changes needed to accommodate the fact that the function produces a value. +First of all, the function header has to say what the <EM>type</EM> of the +result will be: + +<P><PRE>function whatever (<EM>arguments</EM>) : integer; +</PRE> + +<P>The function's type must be a scalar, not an aggregate type. This +restriction is in the standard only to make life easier for the compiler, +and some versions of Pascal do allow array-valued (or record-valued, +etc.) functions. + +<P>The other difference is that in the statement part of a function block we +have to tell Pascal what the value will be. That is, we need something +equivalent to <CODE>output</CODE> in Logo. Pascal's convention is that somewhere +in the block an assignment statement must be executed that has the +name of the function as its left hand side. That is, the function name is +used in an assignment statement as though it were a variable name. +(However, the name must <EM>not</EM> be declared as a variable.) This +notation may be a little confusing, because if the same name appears on the +<EM>right</EM> side of the assignment statement, it signals a recursive +invocation of the function. Perhaps an example will make this clear. + +<P>Program <CODE>multi</CODE> is a Pascal version of the memoized multinomial function +from Chapter 3. In the Logo version, <EM>t</EM>(<EM>n</EM>,<EM>k</EM>) was memoized using the +property name <EM>k</EM> on the property list named <EM>n</EM>. In the Pascal version, +since we have multi-dimensional arrays, it is straightforward to use a +two-dimensional array and store <EM>t</EM>(<EM>n</EM>,<EM>k</EM>) in <CODE>memo[n,k]</CODE>. + +<P><PRE>program multi; + {Multinomial expansion problem} + +var memo: array [0..4, 0..7] of integer; + i,j: integer; + +function t(n,k:integer) : integer; + + function realt(n,k:integer) : integer; + {without memoization} + + begin {realt} + if k = 0 then + realt := 1 + else + if n = 0 then + realt := 0 + else + realt := t(n,k-1)+t(n-1,k) + end; {realt} +</PRE> +<PRE> begin {t} + if memo[n,k] < 0 then + memo[n,k] := realt(n,k); + t := memo[n,k] + end; {t} + +begin {main program} + {initialization} + for i := 0 to 4 do + for j := 0 to 7 do + memo[i,j] := -1; + + {How many terms in (a+b+c+d)7?} + writeln(t(4,7)); +end. +</PRE> + +<P>The assignment statements like + +<P><PRE>realt := 0 +</PRE> + +<P>are the ones that control the values returned by the functions. +These assignment statements are not exactly like <CODE>output</CODE> in Logo +because they do not cause the function to return immediately. They act just +like ordinary assignments, as if there were actually a variable named <CODE> +realt</CODE> or <CODE>t</CODE>; when the statement part of the function is finished, +whatever value was most recently assigned to the function name is the one +that's used as the return value. (In fact the functions in program <CODE> +multi</CODE> are written so that only one such assignment is carried out, and +there are no more statements to execute after that assignment. That's a +pretty common programming style; it's rare to change the assignment to the +function name once it's been made.) + +<P>Apart from arbitrary syntactic details, Pascal's design with respect to +procedures and functions is similar to Logo's, so I can't ask why the two +languages made different choices. It's probably just as well, since you +shouldn't get the impression that Pascal is the exact opposite of Logo in +every way. Instead we could compare these two languages with Lisp, in +which there are only operations, or most versions of BASIC, in which +there are only commands. But I don't have the space to teach you enough +about those languages to make such a comparison meaningful. + +<P><H2>Call by Value and Call by Reference</H2> + +<P>Consider this Logo procedure: + +<P><PRE>to increment :var +make :var (thing :var)+1 +end + +? <U>make "baz 23</U> +? <U>increment "baz</U> +? <U>print :baz</U> +24 +</PRE> + +<P>The input to <CODE>increment</CODE> is the <EM>name</EM> of a variable +that you want to increment. A similar technique is used, for example, in +the <CODE>push</CODE> and <CODE>pop</CODE> library procedures, which take the name of a stack +variable as input. The reason this technique is necessary is that we want +the procedure to be able to modify the variable--to assign it a new value. + +<P>The same technique won't work in Pascal. For one thing, the association of +a name with a variable only exists at compile time. Once the compiled +program is running, the variables have no names, only addresses in computer +memory. Also, <CODE>increment</CODE> takes advantage of dynamic scope because the +variable it wants to modify isn't its own, but rather a variable accessible +to the calling procedure. + +<P>Here's how you do it in Pascal: + +<P><PRE>procedure increment(var v:integer); + begin + v := v+1; + end; +</PRE> + +<P>What's new here is the reserved word <CODE>var</CODE> in the argument +list. This word indicates that <CODE>v</CODE> is a <EM>variable parameter;</EM> +ordinary ones are called <EM>value parameters.</EM> <CODE>Increment</CODE> would be +used in a program like this: + +<P><PRE>program whatzit; + +var gub:integer; + begin + ... + gub := 5; + increment(gub); + ... + end. +</PRE> + +<P>Suppose <CODE>increment</CODE> had been written without the word <CODE>var</CODE> in its +header. In that case, when the statement <CODE>increment(gub)</CODE> was executed +here's what would happen. First, the actual argument to <CODE>increment</CODE> +(namely <CODE>gub</CODE>) would be evaluated. The value would be 5, since that's +the value of the variable <CODE>gub</CODE>. Then that value would be assigned to +the local variable <CODE>v</CODE> in <CODE>increment</CODE>. Then the instruction part of +<CODE>increment</CODE> would be run. The assignment statement there would change +the value of <CODE>v</CODE> from 5 to 6. Then <CODE>increment</CODE> would be finished, +and its local variable <CODE>v</CODE> would disappear. All of this would have no +effect on the variable <CODE>gub</CODE>. This ordinary interpretation of <CODE>v</CODE> +is called <EM>call by value</EM> because what gets associated with +the name <CODE>v</CODE> is the <EM>value</EM> of the actual argument, 5 in this +example, regardless of how that 5 was derived. For example, the instruction +in the main program could have been + +<P><PRE>increment(2+3); +</PRE> + +<P>and it wouldn't have mattered. + +<P> +Making <CODE>v</CODE> a variable parameter instead of a value parameter changes the +operation of the program substantially. When <CODE>increment</CODE> is invoked, +the actual argument <EM>must</EM> be a variable name, not an arbitrary +expression. Pascal does not find the value of the actual argument and pass +that value along to <CODE>increment</CODE>; instead, the formal parameter +<CODE>v</CODE> becomes <EM>another name for the same variable</EM> named in the +actual argument. In this example, <CODE>v</CODE> becomes another name for <CODE> +gub</CODE>. So the assignment statement + +<P><PRE>v := v+1 +</PRE> + +<P>isn't really about the local variable <CODE>v</CODE> at all; it's another +way to say + +<P><PRE>gub := gub+1 +</PRE> + +<P>and so it <EM>does</EM> affect the variable in the calling block. +This use of variable parameters is called <EM>call by reference</EM> +because the formal parameter (<CODE>v</CODE>) <EM>refers</EM> to another variable. + +<P>One way to think about call by reference is that it provides, in effect, a +sort of limited dynamic scope. It's a way for a superprocedure to +allow a subprocedure access to one selected variable from the +superprocedure's lexical environment. Because this permission is +given to the subprocedure explicitly, call by reference doesn't give rise to +the possible naming bugs that argue against dynamic scope in general. Also, +dynamic scope as used in Logo has the problem that you have to be careful +not to allow a formal parameter name to be the same as the name of a +variable you want to use from the superprocedure's environment. For example, +in the Logo version of <CODE>increment</CODE>, what if you wanted to use <CODE> +increment</CODE> to increment a variable named <CODE>var</CODE>? If you try to say + +<P><PRE>increment "var +</PRE> + +<P>it won't work, because <CODE>increment</CODE> will end up trying to +increment its own formal parameter. (This is why the inputs to some of the +Berkeley Logo library procedures have such long, obscure names.) But the +Pascal <CODE>increment</CODE> would have no trouble with a variable named <CODE>v</CODE> +in the calling procedure. + +<P>On the other hand, call by reference is a little mysterious. If you've +understood all of the above, and you know exactly when you should say <CODE>var</CODE> +in a formal parameter list and when you shouldn't, you're doing better than +most beginning Pascal students. In Logo there is only one rule about +passing inputs to procedures; to make something like <CODE>increment</CODE> work, +you <EM>explicitly</EM> pass the <EM>name</EM> of a variable as input. + +<P>Call by reference is generally used when a subprocedure needs to change the +value of a variable in a superprocedure. But there is also another +situation in which some people use it. Suppose you want to write a +procedure that takes a large array as an argument. If you make the array a +value parameter, Pascal will allocate space for the array in the +subprocedure and will copy each member of the array from the +superprocedure's variable into the subprocedure's variable as the first step +in invoking the subprocedure. This time-consuming array copying can be +avoided by declaring the array as a variable parameter, thereby giving the +subprocedure direct access to the superprocedure's array. Pascal enthusiasts +consider this use of call by reference cheating, though, because it creates +the possibility that the subprocedure could accidentally change something in +the superprocedure's array. Call by value is safer, from this point of view. + +<P><H2>Parameters in Logo: Call by Binding</H2> + + +<P>Does Logo use call by value or call by reference for passing arguments to +procedures? The official textbook answer is "call by value," but I find +that misleading, because those two categories really make sense +only in a language with a particular idea of what a variable is. A +Logo variable is different from a Pascal variable in a +subtle way. If you can understand this difference between the two +languages, you'll have learned something very valuable. + +<P>In Logo, the world is full of data (words, lists, and arrays). +These data may or may not be associated with variables. For example, when +you enter an instruction like + +<P><PRE>print butlast butfirst [Yes, this is a list evidently.] +</PRE> + +<P>three different lists are involved: the one you typed explicitly +in the instruction and two smaller lists. None of these three lists is +the value of any variable. A <EM>variable</EM> is an +association (called a <EM>binding</EM>) between a name and a datum. If +you enter the instruction + +<P><PRE>make "langs [Logo Lisp APL] +</PRE> + +<P>we say that the name <CODE>langs</CODE> <EM>is bound to</EM> the indicated +list. If you then do + +<P><PRE>make "mylangs :langs +</PRE> + +<P>we say that the name <CODE>mylangs</CODE> is bound to the <EM>same</EM> +datum as <CODE>langs</CODE>. We're dealing with one list that has two names. + +<P> +<CENTER><IMG SRC="https://people.eecs.berkeley.edu/~bh/v3ch4/bindings.gif" ALT="figure: bindings"></CENTER> + +<P> + +<P>In Pascal a variable is not a binding in this sense. A Pascal variable <EM> +is</EM> the datum it contains. If you have two array variables + +<P><PRE>var this,that: array [1..10] of integer; +</PRE> + +<P>and you do an assignment like + +<P><PRE>this := that; +</PRE> + +<P>then there are two <EM>separate</EM> arrays that happen to have +equal values stored in them. The same thing is true, although it's less +obviously meaningful, about scalars. If integer variables <CODE>i</CODE> and <CODE> +j</CODE> both have the value 10, then there are <EM>two different integers</EM> +that happen to have the same value. That's not the way a mathematician uses +the word "integer"; to a mathematician, there is only one 10. But to a +Pascal programmer, an integer isn't something like 10; an integer is a box, +and in the box there might be something like 10. + +<P>In Logo, a variable assignment (that is, an invocation of <CODE>make</CODE>) +changes the <EM>binding</EM> of the given variable name so that that name is +now associated with a different datum. In Pascal, a variable assignment +changes the value of <EM>the datum</EM> that is unalterably associated with +the named variable. + +<P>The official textbook story is this: A Logo variable is a box, just as +a Pascal variable is a box. The difference is that what goes in the box, +in Logo, is a <EM>pointer</EM> to the datum of interest. (A binding, +officially, is the connection between the variable's name and its box. +So there are two levels of indirection in finding what we ordinarily +think of as the value of a variable: First the binding gets us from the +name to a box--a location in the computer's memory--and then in that box we +find a pointer, which gets us to <EM>another</EM> location in memory, which +holds the actual information we want. In this model, call by reference +can easily be described by saying that two different names are bound to +the same box.) From this point of view, it makes +sense to say that Logo uses call by value, because the "value" in question +is the pointer, which is indeed copied when a procedure is called. + +<P>But ordinarily we don't think of the value of a Logo variable as being a +pointer; we think that the value of the variable is a word, a list, or an +array. From that point of view, parameter passing in Logo acts like call by +reference in some ways but like call by value in other ways. For example, +call by value makes a copy of the datum being passed. Logo does not copy +the actual datum, so in that respect it's like call by reference. On the +other hand, assigning a new value to a Logo formal parameter does not change +the value of any variables in the calling procedure; in that way, Logo works +like call by value. On the third hand, if the datum to which the formal +parameter is bound is a <EM>mutable</EM> data structure, such as an array, a +Logo subprocedure <EM>can</EM> change the value of a variable in the calling +procedure, not by assigning a new value to the formal parameter name +(changing the binding), but by invoking <CODE>setitem</CODE> on the shared datum +(altering the bound datum itself). + +<P>The following chart is a graphic representation of the ideas in the +last paragraph. The three columns represent Pascal call by value, Pascal +call by reference, and Logo. In each case the main program has two arrays +named <CODE>actual</CODE> and <CODE>other</CODE>; it then invokes a procedure <CODE>proc</CODE> +using <CODE>actual</CODE> as the actual argument providing the value for <CODE> +proc</CODE>'s formal parameter <CODE>formal</CODE>. Here are the programs: + +<P><TABLE> +<TR><TH>Pascal call by value<TH> <TH>Pascal call by reference +<TR><TD> +<TR><TD valign="top"><PRE>program pgm; +type pair = array [0..1] + of integer; +var actual,other: pair; + +procedure proc(formal:pair); + begin + <EM>body</EM> + end + +begin + actual[0] := 3; + actual[1] := 4; + other[0] := 5; + other[1] := 6; + proc(actual) +end. +</PRE><TD><TD valign="top"><PRE>program pgm; +type pair = array [0..1] + of integer; +var actual,other: pair; + +procedure proc(var formal:pair); + begin + <EM>body</EM> + end + +begin + actual[0] := 3; + actual[1] := 4; + other[0] := 5; + other[1] := 6; + proc(actual) +end. +</PRE></TABLE> + +<P><TABLE><TR><TH>Logo +<TR><TD> +<TR><TD><PRE>make "actual {3 4}@0 +make "other {5 6}@0 +proc :actual + +to proc :formal +<EM>body</EM> +end +</PRE></TABLE> + +<P> +<CENTER><IMG SRC="callby.gif" ALT="figure: callby"></CENTER> + +<P>The first row of the figure shows the situation when <CODE>proc</CODE> is +entered, before its body is executed. The second row shows what happens if +<CODE>proc</CODE> contains an assignment of <CODE>other</CODE> to <CODE>formal</CODE>, i.e., + +<P><PRE>formal := other +</PRE> + +<P>in either Pascal version or + +<P><PRE>make "formal :other +</PRE> + +<P>in the Logo version. The third row shows what happens if, instead, +<CODE>proc</CODE> contains an assignment of just one member of the array, i.e., + +<P><PRE>formal[1] := other[1] +</PRE> + +<P>in either Pascal version or + +<P><PRE>setitem 1 :formal (item 1 :other) +</PRE> + +<P>in the Logo version. Your goal is to see what happens to <CODE> +actual</CODE> in each case when <CODE>proc</CODE> is finished. + +<P>Our final Pascal program example, showing the use of call by reference, +is a version of the partition sort from Chapter 3 that uses the +technique of exchanging two array members when appropriate to divide the +array into two partitions "in place" (without requiring the allocation of +extra arrays to hold the partitions). This program is adapted from Jon +Bentley's version in <EM>Programming Pearls</EM> (Addison-Wesley, 1986). +It's much closer in style to the real quicksort than my list-based version. + +<P>In the partition sort program of Chapter 3, I had to put a lot of effort +into preventing a situation in which every member of the list being sorted +ended up on the same side of the partition value. The quicksort solution +starts by choosing some member of the array as the partition value and +excluding that member from the partitioning process. As a result, the worst +possible case is that the <EM>n</EM> members of the array are divided into the +partitioning member, a partition of size <EM>n</EM>−1, and a partition of size zero. +If we're unlucky enough to hit that case every time, we'll have an O(<EM>n</EM><SUP><SMALL>2</SMALL></SUP>) +running time, but not an infinite loop. + +<P>How do we choose the partitioning member? It turns out that just picking +one at random is surprisingly successful; sometimes you get a very bad +choice, but usually not. But in this program I'm using a popular method +that tends to work a little better (that is, to give more balanced partition +sizes): The program finds the first member of the unsorted array, the last +member, and the one halfway in between, and chooses the <EM>median</EM> of +these three values--the one that's neither the largest nor the smallest. + +<P>Once the partitioning member has been chosen, the goal is to rearrange the +array members into an order like this: + +<P><CENTER><IMG SRC="partition.gif" ALT="figure: partition"></CENTER> + +<P>If other members of the array have the same value as the one we've +chosen as the partitioning member, it doesn't really matter in which partition +they end up. What does matter is that before doing the partitioning, we +don't know where in the array the partitioning member will belong, so how can +we keep from bumping into it as we rearrange the other members? The +solution is that the partitioning member is temporarily kept in the leftmost +possible position; the other members are partitioned, and then the +partitioning member is swapped back into its proper position. + +<P>The partition works using two <EM>index</EM> variables <CODE>i</CODE> and <CODE>j</CODE>, +which start at the leftmost and rightmost ends of the part of the array that +we're sorting. (Remember that this algorithm uses recursive calls to sort +each partition, so that might not be all 100 members of the full array.) We +move <CODE>i</CODE> toward the right, and <CODE>j</CODE> toward the left, until we find +two members out of place. That is, we look for a situation in which <CODE> +data[i]</CODE> is greater than the partitioning member and <CODE>data[j]</CODE> is +smaller than the partitioning member. We then interchange those two members +of the array and continue until <CODE>i</CODE> and <CODE>j</CODE> meet in the middle. +Procedure <CODE>exch</CODE> has two variable parameters and exchanges their values. + +<P>Program <CODE>psort</CODE> illustrates a fairly common but not obvious technique: +the array <CODE>data</CODE> contains 100 "real" members in positions 0 to 99 but +also has a "fence" or "sentinel" member (with index 100) +just so that the program doesn't have to make a special case check for the +index variable <CODE>i</CODE> reaching the end of the array. The +value of <CODE>data[100]</CODE> is guaranteed to be greater than all the numbers that +are actually being sorted. Having this extra member in the array avoids the +need for an extra comparison of the form + +<P><PRE>if i > upper then ... +</PRE> + +<P>and thereby helps make the program a little faster. + +<P><PRE>program psort; + {partition sort demo} + +var data: array [0..100] of integer; + i: integer; + +procedure showdata; + {print the array} + + var i: integer; + + begin {showdata} + for i := 0 to 99 do + begin + if i mod 20 = 0 then writeln; + write(data[i]:3) + end; + writeln; + writeln + end; {showdata} + +function median(lower,upper:integer):integer; + {find the median of three values from the data array} + var mid: integer; + + begin + mid := (lower+upper) div 2; + if (data[lower] <= data[mid]) and (data[mid] <= data[upper]) then + median := mid + else if (data[lower] >= data[mid]) and + (data[mid] >= data[upper]) then + median := mid + else if (data[mid] <= data[lower]) and + (data[lower] <= data[upper]) then + median := lower + else if (data[mid] >= data[lower]) and + (data[lower] >= data[upper]) then + median := lower + else median := upper + end; + +procedure sort(lower,upper:integer); + {sort part of the array} + + var key,i,j:integer; + + procedure exch(var a,b:integer); + {exchange two integers} + + var temp:integer; + + begin {exch} + temp := a; + a := b; + b := temp + end; {exch} + + begin {sort} + if upper > lower then + begin + exch (data[lower],data[median(lower,upper)]); + key := data[lower]; + i := lower; + j := upper+1; + repeat + i := i+1 + until data[i] >= key; + repeat + j := j-1 + until data[j] <= key; + while (i <= j) do + begin + exch(data[i], data[j]); + repeat + i := i+1 + until data[i] >= key; + repeat + j := j-1 + until data[j] <= key + end; + exch(data[lower], data[j]); + sort(lower,j-1); + sort(i,upper) + end + end; {sort} + +begin {main program} + data[100] := 200; + for i := 0 to 99 do + data[i] := random(100); + writeln('Data before sorting:'); + showdata; + + sort(0,99); + writeln('Data after sorting:'); + showdata +end. +</PRE> + +<P> + + +<P><A HREF="../v3-toc2.html">(back to Table of Contents)</A> +<P><A HREF="../v3ch3/v3ch3.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../v3ch5/v3ch5.html"><STRONG>NEXT</STRONG></A> + +<P> +<ADDRESS> +<A HREF="../index.html">Brian Harvey</A>, +<CODE>bh@cs.berkeley.edu</CODE> +</ADDRESS> +</BODY> +</HTML> |