<HTML>
<HEAD>
<TITLE>Computer Science Logo Style vol 3 ch 4: Programming Language Design</TITLE>
</HEAD>
<BODY>
<CITE>Computer Science Logo Style</CITE> volume 3:
<CITE>Beyond Programming</CITE> 2/e Copyright (C) 1997 MIT
<H1>Programming Language Design</H1>
<TABLE width="100%"><TR><TD>
<IMG SRC="../csls3.jpg" ALT="cover photo">
<TD><TABLE>
<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian
Harvey</A><BR>University of California, Berkeley</CITE>
<TR><TD align="right"><BR>
<TR><TD align="right"><A HREF="../pdf/v3ch04.pdf">Download PDF version</A>
<TR><TD align="right"><A HREF="../v3-toc2.html">Back to Table of Contents</A>
<TR><TD align="right"><A HREF="../v3ch3/v3ch3.html"><STRONG>BACK</STRONG></A>
chapter thread <A HREF="../v3ch5/v3ch5.html"><STRONG>NEXT</STRONG></A>
<TR><TD align="right"><A HREF="https://mitpress.mit.edu/books/computer-science-logo-style-second-edition-volume-3">MIT
Press web page for <CITE>Computer Science Logo Style</CITE></A>
</TABLE></TABLE>
<HR><P>Program file for this chapter: <A HREF="pascal.lg"><CODE>pascal</CODE></A>
<P>This chapter and the next are about two related things: why different
programming languages are different and how a programming language is
implemented. To make the discussion concrete, I've chosen a specific
language as an example: Pascal. That choice seems appropriate partly
because Pascal is very different from Logo and partly because it is widely
used as a vehicle for teaching introductory computer science, the same task
I'm attempting in this book using Logo.<SUP>*</SUP>
<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>The recent trend in computer
science education has been a shift from Pascal to C or C++. I haven't
followed that trend in this book because from my perspective C illuminates
no new issues, it has a more complicated syntax, and it leaves out one
interesting Pascal feature: nested procedure definitions (block structure).
C++ does introduce the issue of object-oriented programming, but, I think,
not in a way that clarifies the issues; if you want to understand OOP you'd
do better to learn Object Logo.</SMALL></BLOCKQUOTE></SMALL><P>For the purposes of this book I've written a program that translates a <EM>
small subset</EM> of Pascal into a simulated machine language. You can get a
real Pascal compiler for your computer that accepts the full language, and
that's what you should do if you want to learn how to program in Pascal. I
had two reasons for writing this subset compiler. One is that some readers
may not otherwise have access to a Pascal compiler, and mine, despite its
limitations, will enable you to explore the parts of the language I'm going
to be talking about. The other is that the next chapter is about how a
compiler works, and this compiler is accessible to examination because it's
written in Logo.
<P>When you're comparing two programming languages an obvious question to ask
is "which is better?" Please don't use my partial Pascal compiler as the
basis for an answer to that question; it wouldn't be fair. You already know
my opinion, but my purpose in this chapter is not to convince you of it.
Instead I want you to understand <EM>why</EM> each language is designed the
way it is. For each of the language differences we'll examine, there are
good reasons for either choice; the reasons that influence a language
designer will depend on the overall goals he or she has for this language.
<P><H2>Programming paradigms</H2>
<P>Perhaps the most important aspect of the design of a programming language
is the <EM>programming paradigm</EM> that it encourages. A paradigm
(it's pronounced "para" as in "parakeet" followed by "dime" as in ten
cents) is an approach to organizing a complex program: How do you combine
the primitives of a language to accomplish harder tasks? It's an aspect of
programming style, but when people say "style" they're usually thinking of
smaller details, such as the use of comments in procedure definitions or
choosing sensible variable names. Perhaps an example of different
paradigms will help.
<P>Here's how the factorial function is usually computed in Logo, using a
recursive operation:
<P><PRE>to fact :n
if :n=0 [output 1]
output :n * fact :n-1
end
</PRE>
<P>The goal is to multiply several numbers together, the integers
from 1 to <CODE>:n</CODE>. We do this by carrying out one multiplication in each
recursive invocation. This procedure is written in the
<EM>functional programming</EM> paradigm; the main tool for building
up complexity is composition of functions. In this example, the
result of the recursive <CODE>fact</CODE> invocation is composed with the primitive
<CODE>*</CODE> (multiplication) function.
<P>Now consider this alternate approach:
<P><PRE>to fact.seq :n
localmake "product 1
for [i 1 :n] [make "product (:product * :i)]
output :product
end
</PRE>
<P>This is an example of the <EM>sequential programming</EM> paradigm,
so called because the <CODE>for</CODE> instruction carries out a sequence of steps:
<P><P>
<TABLE><TR><TH align="right" valign="top">•<TD> <TD valign="top">Multiply the accumulated product by 1.
<TR><TH align="right" valign="top">•<TD> <TD valign="top">Multiply the product by 2.
<TR><TH align="right" valign="top">•<TD> <TD valign="top">Multiply it by 3.
<P></TABLE>
<P>... and so on. Instead of a composition of functions, we have
a partial result stored in a box, the variable <CODE>product</CODE>. At each
step, the old value is replaced with an updated value.
<P>Although <CODE>fact.seq</CODE> can be written in Logo, it's not the most natural
Logo style. Most versions of Logo don't even provide <CODE>for</CODE> as a
primitive command, although (as we saw in Volume 2) it can be written in
Logo.<SUP>*</SUP> As we've seen, Logo encourages the
functional programming paradigm, in which complicated
computations are achieved by means of function composition and recursion.
Logo encourages functional programming partly through
its emphasis on recursion rather than on iterative control structures,
and partly because lists are used as the main data aggregation mechanism.
As we saw in Chapter 3, lists encourage an aggregate to be built up one
member at a time (as recursive functions do), and discourage mutation
(which is crucial to the sequential approach).
<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>Even in Berkeley Logo, <CODE>for</CODE> is a library procedure rather
than a true primitive.</SMALL></BLOCKQUOTE></SMALL><P>In Pascal, the opposite is true. It's possible to write a recursive
factorial function in Pascal:
<P><PRE>function fact(n:integer): integer;
begin
if n=0 then
fact := 1
else
fact := n * fact(n-1)
end;
</PRE>
<P>but a habitual Pascal programmer would be much more likely
to write this function in sequential style:
<P><PRE>function fact(n:integer): integer;
var product, i: integer;
begin
product := 1;
for i := 1 to n do
product := product * i;
fact := product
end;
</PRE>
<P>(Don't worry, I know the details of the notation are a mystery to
you, but you should still be able to see the relationship between each Pascal
version and the corresponding Logo version. The only crucial point about
notation right now is that <CODE>:=</CODE> is the Pascal assignment operator, like
<CODE>make</CODE> in Logo. We'll go into the details of Pascal syntax later.)
<P>Here's a more complicated example, showing how data aggregates are used in
the two paradigms. In Chapter 2 we explored the Simplex lock problem by
computing the function
<P>
<P><P><CENTER><IMG SRC="math1.gif" ALT="math display"></CENTER><P>
<P>
<P>using these procedures:
<P><PRE>to simplex :buttons
output 2 * f :buttons
end
to f :n
if equalp :n 0 [output 1]
output cascade :n [? + ((choose :n (#-1)) * f (#-1))] 0
end
</PRE>
<P>Here, the mathematical definition of <EM>f</EM> in terms of itself is
reflected in the recursive nature of the operation <CODE>f</CODE>. In Chapter 3, we
improved the efficiency of the procedure by remembering smaller values of
<EM>f</EM> to avoid recomputing them; similarly, instead of computing the <CODE>
choose</CODE> function separately each time, we used old values to compute
new ones:
<P><PRE>to simplex :buttons
output 2 * first (cascade :buttons
[fput (sumprods butfirst ?2 ?1) ?1] [1]
[fput 1 nextrow ?2] [1 1])
end
to sumprods :a :b
output reduce "sum (map "product :a :b)
end
to nextrow :combs
if emptyp butfirst :combs [output :combs]
output fput (sum first :combs first butfirst :combs) ~
nextrow butfirst :combs
end
</PRE>
<P>The recursive nature of <EM>f</EM> is less obvious in the second
implementation, but the overall technique is still composition of
functions. (Recall that the job of <CODE>cascade</CODE> is to invoke a
function repeatedly, in the pattern <EM>f</EM>(<EM>f</EM>(<EM>f</EM>(··· <EM>f</EM>(<EM>x</EM>)))). In this
case, <CODE>cascade</CODE> is computing two functions in parallel; one is a list of
values of the Simplex function <EM>f</EM> and the other is a row of Pascal's
triangle.) The availability of higher order functions (in this program
I've used <CODE>cascade</CODE>, <CODE>map</CODE>, and <CODE>reduce</CODE>) is another way
in which Logo encourages the functional paradigm.
<P>In sequential style, the composition of functions is replaced by a sequence
of steps in which values are stored in boxes (members of arrays) and
repeatedly replaced with different values:
<P><PRE>to simplex.seq :buttons
localmake "f (array :buttons+1 0)
localmake "combs (array :buttons+1 0)
local [left right]
setitem 0 :f 1
setitem 0 :combs 1
for [i 1 :buttons] [
setitem :i :f 0
make "right 0
for [j 0 :i-1] [
make "left :right
make "right item :j :combs
setitem :j :combs :left+:right
setitem :i :f (item :i :f) + (item :j :f)*(item :j :combs)
]
setitem :i :combs 1
]
output 2 * item :buttons :f
end
</PRE>
<P>It may take some effort to convince yourself that this procedure
really computes the same results as the other versions! Within the
procedure, the array <CODE>f</CODE> contains the values <EM>f</EM>(0), <EM>f</EM>(1), ... as
they are computed one by one; the array <CODE>combs</CODE> contains one row (at a
time) of Pascal's triangle.
<P>The procedure first puts <EM>f</EM>(0) into the zeroth position of the <CODE>f</CODE> array
and the first row of Pascal's triangle (containing just one 1) in the <CODE>
combs</CODE> array. Then comes a <CODE>for</CODE> loop that computes <EM>f</EM>(1), then <EM>f</EM>(2),
and so on, until the desired value is reached. An inner <CODE>for</CODE> loop
fills the same purpose as the <CODE>sumprods</CODE> function in the previous
version of <CODE>simplex</CODE>: It computes the sum of several terms, not by
function composition but by adding each term into the sum separately.
The instruction
<P><PRE>setitem :i :f (item :i :f) + (item :j :f)*(item :j :combs)
</PRE>
<P>adds one additional term to the sum each time it's carried out.
<P>The sequential Simplex calculation looks bizarre in Logo, but it's much
more natural in Pascal:
<P><PRE>function simplex(buttons:integer): integer;
var left, right, i, j: integer;
f, combs: array [0..30] of integer;
begin
f[0] := 1;
combs[0] := 1;
for i := 1 to buttons do
begin
f[i] := 0;
right := 0;
for j := 0 to i-1 do
begin
left := right;
right := combs[j];
combs[j] := left+right;
f[i] := f[i] + (f[j] * combs[j])
end;
combs[i] := 1
end;
simplex := 2 * f[buttons]
end;
</PRE>
<P>Pascal is well suited to this style of programming for several
reasons. One is that the <CODE>f[i]</CODE> notation for a member of an array is
more compact and more readable than Logo's use of procedure invocations
(calling <CODE>item</CODE> to examine an array member and <CODE>setitem</CODE> to modify
its value). Another, already mentioned, is that <CODE>for</CODE> is built into
Pascal. Perhaps most important is Pascal's <EM>block structure:</EM>
the keywords <CODE>begin</CODE> and <CODE>end</CODE> can be used to group what would
otherwise be separate instructions into one larger instruction. In Logo,
the instructions that are repeated in a <CODE>for</CODE> loop must be part of a
list, one of the inputs to the <CODE>for</CODE> procedure; in principle, the entire
<CODE>for</CODE> invocation is one Logo instruction line.
<P>Both Logo and Pascal are compromises between the functional paradigm and
the sequential paradigm. (In Logo, turtle graphics programs are most often
done sequentially, whereas the manipulation of words and sentences is
generally done functionally.) But Logo is much more of a functional
language than Pascal, partly because it supports list processing (you can
create lists in Pascal, but it's painful), and even more importantly because
in Logo it's easy to invent higher order functions such as <CODE>map</CODE> and
<CODE>cascade</CODE>. Pascal programmers can't readily invent their own control
structures because there's nothing like <CODE>run</CODE> or <CODE>apply</CODE> in Pascal,
and the built-in control structures are all sequential ones. (In addition
to <CODE>for</CODE>, Pascal has equivalents to the <CODE>while</CODE> and <CODE>do.until</CODE>
commands in the Berkeley Logo library.) As another example, Logo's <CODE>
ifelse</CODE> primitive can be used either as a command or as an operation, but
the Pascal equivalent works only as a command.
<P>Not all programming languages compromise between paradigms. It's rare these
days to see a purely sequential language, but it used to be common; both the
original Fortran language and the early microcomputer versions of BASIC
lacked the ability to handle recursive procedures. Purely functional
languages are not widely used in industry, but are of interest to many
computer science researchers; the best known example is called ML. In a
purely functional language, there is no assignment operator (like <CODE>make</CODE>
in Logo) and no mutators (like <CODE>setitem</CODE> or <CODE>.setfirst</CODE>).
<P>There are other programming paradigms besides sequential and functional,
although those are the oldest. The sequential paradigm came first because
the actual digital computer hardware works sequentially; Fortran, which was
the first higher level programming language, was initially viewed as merely
an abbreviation for the computer's hardware instruction
sequences.<SUP>*</SUP> The
functional paradigm was introduced with Lisp, the second-oldest programming
language still in use. Although Lisp is not a pure functional language (it
does have assignment and mutation), its design is firmly based on the idea of
functions as the way to express a computation.
<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>Today, we think of programming languages primarily as ways
to express problems, rather than as ways to model how computer hardware
works. This shift in attitude has allowed the development of non-sequential
paradigms. We design languages that are well matched to the problems we want
to solve, rather than well matched to the hardware we're using.</SMALL></BLOCKQUOTE></SMALL><P>Another paradigm that's very popular today is
<EM>object-oriented programming.</EM> In this paradigm, we imagine
that instead of having a single computer carrying out a single program, the
computational world includes many independent "objects," each of which can
carry out programs on its own. Each object includes <EM>methods,</EM> which
are like local procedures, and <EM>variables,</EM> just like the variables
we've used all along except that each belongs to a particular object. If
one object wants to know the value of another object's variable, the first
object must send a <EM>message</EM> to the second. A message is a request
to carry out a method, so the messages that each object accepts depends on
the methods that it knows.
<P>Logo has had a sort of glimmer of the object paradigm for many years, because
many dialects of Logo include multiple turtles. To move a turtle, you send
it a message, using a notation something like
<P><PRE>ask 7 [forward 100]
</PRE>
<P>to send a message to turtle number 7. But this notation, even
though it conveys some of the flavor of object-oriented programming, is not
truly representative of the paradigm. In a real object system, it would
be possible for specific turtles to have their own, specialized <CODE>forward</CODE>
methods. Turtle 7, for example, might be a special "dotted turtle" that
draws dotted lines instead of solid lines when it moves forward. One Logo
dialect, called Object Logo, does provide a genuine object capability.
<P>Object-oriented programming fits naturally with the sort of problem in which
the computer is modeling or simulating a bunch of real-world objects; in
fact, the paradigm was invented for <EM>simulation</EM> programs, used to try
to answer questions such as "Will it eliminate the traffic jams if we add
another lane to this highway, or should we spend the money on more frequent
bus service instead?" The objects in the simulation program are people,
cars, buses, and lanes. Another example of a problem that lends itself to
the object paradigm is a window system, such as the one in Mac OS or in
Microsoft Windows. Each window is an object; when a window is displayed so
as to hide part of another window, the new window must send a message to the
hidden one telling it not to display the hidden region.
<P>Some people argue that object-oriented programming should be used for <EM>
every</EM> programming problem, not because the independent object metaphor is
always appropriate but because using objects helps with <EM>information
hiding;</EM> if every variable belongs to a specific object, the program is
less likely to have the kind of bug in which one part of a program messes up
a data structure that's used in another part. This can be particularly
important, they say, in a large programming problem in which several
programmers work on different pieces of the program. When the different
programmers' procedures are put together, conflicts can arise, but the
object paradigm helps isolate each programmer's work from the others.
Although this argument has some merit, I'm cautious about any claim that one
particular paradigm is best for all problems. I think programmers should be
familiar with all of the major paradigms and be able to pick one according
to the needs of each task.
<P>Another important programming paradigm is called <EM>
logic programming</EM> or <EM>declarative programming.</EM> In
this approach, the programmer doesn't provide an algorithm at all, but
instead lists known facts about the problem and poses questions. It's up
to the language implementation to search out all the possible solutions to
a question. We saw a very simplified version of this paradigm in the
discussion of logic problems in Chapter 2. Logic programming is especially
well suited to <EM>database</EM> problems, in which we pose questions such as
"Who are all the employees of this company who work in the data processing
division and have a salary above 40,000?" But, like all the paradigms
I've mentioned, logic programming is <EM>universal;</EM> any problem that can
be solved by a computer at all can be expressed as a logic program. Logic
programming is quite popular in Japan and in Europe, but not so much in the
United States, perhaps just because it wasn't invented here.
<P><H2>Interactive and Non-interactive Languages</H2>
<P>You use Logo by interacting with the language processor. You issue an
instruction, Logo does what you've told it and perhaps prints a result,
and then you issue another instruction. You can preserve a sequence of
instructions by defining a procedure, in which case that procedure can be
invoked in later instructions. But you don't have to define procedures;
young children generally start using Logo by issuing primitive turtle motion
commands one at a time. Defining procedures can be thought of as extending
the repertoire of things Logo knows how to do for future interaction.
<P>By contrast, you write a Pascal program as a complete entity, "feed" the
program to the Pascal language processor all at once, and then wait for the
results. Often, when you type your program into the computer you aren't
dealing with the Pascal processor at all; you use another program, a
text editor, to let you enter your Pascal program into the computer. <EM>
Then</EM> you start up Pascal itself. (Most microcomputer versions of Pascal
include a simple text editor for program entry, just as Logo includes a
procedure editor.) Typically you store your Pascal program as a file on a
disk and you give the file name as input to the Pascal language processor.
<P>Keep in mind that it's the process of writing and entering a program that's
non-interactive in Pascal. It's perfectly possible to write a Pascal program
that interacts with the user once it's running, alternating <CODE>read</CODE> and
<CODE>write</CODE> statements. (However, user input is one of the things I've left
out of my Pascal subset, as you'll see shortly.)
<P>If you want to write your own Pascal programs for use with my compiler,
you'll need a way to create a disk file containing your new program, using
either Logo's procedure editor or some separate editing program. The sample
Pascal programs in this chapter are included along with the Logo program
files that accompany this book.
<P>Our first example of a complete Pascal program is a version of the
Tower of Hanoi puzzle. I described this problem in
the first volume of this series. The Logo solution consists of two
procedures:
<P><PRE>to hanoi :number :from :to :other
if equalp :number 0 [stop]
hanoi :number-1 :from :other :to
movedisk :number :from :to
hanoi :number-1 :other :to :from
end
to movedisk :number :from :to
print (sentence [Move disk] :number "from :from "to :to)
end
</PRE>
<P>To use these procedures you issue an instruction like
<P><PRE>? <U>hanoi 5 "a "b "c</U>
Move disk 1 from a to b
Move disk 2 from a to c
Move disk 1 from b to c
Move disk 3 from a to b
Move disk 1 from a to c
... and so on.
</PRE>
<P>Here is the corresponding Pascal program. This program is in the
file <CODE>tower</CODE>. (As you can see, Pascal programs begin with a
<EM>program name;</EM> in all of the examples in this chapter the
file name is the same as the program name, although that isn't a requirement
of Pascal.) Never mind the program details for the moment; right now
the point is to make sure you know how to get the Pascal compiler to
translate it into Logo.
<P><PRE>program tower;
{This program solves the 5-disk tower of hanoi problem.}
procedure hanoi(number:integer;from,onto,other:char);
{Recursive procedure that solves a subproblem of the original problem,
moving some number of disks, not necessarily 5. To move n disks, it
must get the topmost n-1 out of the way, move the nth to the target
stack, then move the n-1 to the target stack.}
procedure movedisk(number:integer;from,onto:char);
{This procedure moves one single disk. It assumes that the move is
legal, i.e., the disk is at the top of its stack and the target stack
has no smaller disks already. Procedure hanoi is responsible for
making sure that's all true.}
begin {movedisk}
writeln('Move disk ',number:1,' from ',from,' to ',onto)
end; {movedisk}
begin {hanoi}
if number <> 0 then
begin
hanoi(number-1,from,other,onto);
movedisk(number,from,onto);
hanoi(number-1,other,onto,from)
end
end; {hanoi}
begin {main program}
hanoi(5,'a','b','c')
end.
</PRE>
<P>Once you have a Pascal program in a disk file, you compile it using the
<CODE>compile</CODE> command with the file name as input:
<P><PRE>? <U>compile "tower</U>
</PRE>
<P>The compiler types out the program as it compiles it, partly to
keep you from falling asleep while it's working but also so that if the
compiler detects an error in the program you'll see where the error was
found.
<P>When the compiler has finished processing the <EM>source file</EM> (the file
containing the Pascal program) it stops and you see a Logo prompt. At this
point the program has been translated into a simulated machine
language. To run the program, say
<P><PRE>? <U>prun "tower</U>
Move disk 1 from a to b
Move disk 2 from a to c
Move disk 1 from b to c
Move disk 3 from a to b
Move disk 1 from a to c
... and so on.
</PRE>
<P>The input to the <CODE>prun</CODE> (Pascal run) command is the program
name--the word that comes after <CODE>program</CODE> at the beginning of the
source file.
<P>The difference between an interactive and a non-interactive language is not
just an arbitrary choice of "user interface" style. This choice reflects
a deeper distinction between two different ways of thinking about what
a computer program is.
In Logo there is really no such thing as a "program." Each procedure is an
entity on its own. You may think of one particular procedure as the
top-level one, but Logo doesn't know that; you could invoke any procedure
directly by using an interactive instruction naming that procedure. Logo
does have the idea of a <EM>workspace,</EM> a collection of procedures stored
together in a file because they are related. But a workspace
need not be a tree-structured hierarchy with one top-level procedure and the
others as subprocedures. It can be a collection of utility procedures with
no top-level umbrella, like the Berkeley Logo library.
It can be a group of projects that are conceptually related but with several
independent top-level procedures, like the two memoization examples, the two
sorting algorithms, the tree data base, and other projects in the <CODE>algs</CODE>
workspace of Chapter 3.
<P>By contrast, a Pascal program is considered a single entity. It always
begins with the word <CODE>program</CODE> and ends with a period, by analogy with
an English sentence. (The subprocedures and the individual statements
within the program are separated with semicolons because they are analogous
to English clauses.<SUP>*</SUP>) It makes no sense to give the
Pascal compiler a source file containing just procedures without a main
program.
<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>I say "English" because I am writing for an
English-speaking audience, but in fact Pascal was designed by a largely
European committee including native speakers of several languages; principal
designer Niklaus Wirth is Swiss. Their languages all
have periods and semicolons, though.</SMALL></BLOCKQUOTE></SMALL><P><EM>Why</EM> did Logo's designers choose an interactive program development
approach, while Pascal's designers chose a whole-program paradigm? Like all
worthwhile questions, this one has more than one answer. And like many
questions in language design, this one has two broad <EM>kinds</EM> of answer:
the answers based on the implementation strategy for the language and the
ones based on underlying programming goals.
<P>The most obvious answer is that Pascal is a <EM>compiled</EM> language and
Logo is an <EM>interpreted</EM> one. That is, most Pascal language
processors are <EM>compilers:</EM> programs that translate a program from one
language into another, like translating a book from English into Chinese.
Most compilers translate from a source language like Pascal into
the native <EM>machine language</EM> of whatever computer you're using.
(My Pascal compiler translates into a <EM>simulated</EM> machine language
that's actually processed by Logo procedures.) By
contrast, most Logo versions are <EM>interpreters:</EM> programs
that directly carry out the instructions in your source program, without
translating it to a different ("object") language.<SUP>*</SUP>
<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>This is another
case in which the same word has two unrelated technical meanings. The use of
"object" in describing the result of a compilation (object program, object
language) has nothing to do with object-oriented programming.</SMALL></BLOCKQUOTE></SMALL><P>To understand why interpreters tend to go with interactive languages, while
compilers usually imply "batch mode" program development, think about the
"little person" metaphor that's often used in teaching Logo. If you think
of the computer as being full of little people who know how to carry out the
various procedures you've written, the one who's really in charge is not the
one who carries out your top-level procedure, but rather the one representing
the Logo interpreter itself. If the procedure specialists are like circus
performers, the Logo interpreter is the ringmaster. The circus metaphor is
actually a pretty good one, because on the one hand each performer is an
autonomous person, but at the same time the performers have to cooperate in
order to put on a show. The relevance of the metaphor to this discussion is
that in a compiled language there is no "ringmaster." The compiler is more
closely analogous to a bird that hatches an egg (your program) and then
pushes the new bird out of the nest to fend for itself. In a compiled
language there is no provision for an interactive interface to which you can
give commands that control the running of your program, unless your program
itself includes such an interface.
<P>Saying the same thing in a different way, the Logo interpreter is part of
the environment in which any Logo program operates. (That's why Logo can
provide a facility like the <CODE>run</CODE> command to allow your program to
construct new Logo instructions as it progresses.) But a Pascal compiler
does its job, translating your program into another form, and then
disappears. Whatever mechanisms are available to control your program have
to be built into the program. For example, my Pascal version of the Tower
of Hanoi program includes the top-level instruction that starts up the
solution for five disks. In the Logo version, that instruction isn't
considered part of the program; instead, you direct Logo interactively
to invoke the <CODE>hanoi</CODE> procedure.
<P>
<P>The distinction between compiled and interpreted languages is not as
absolute as it once was. There are versions of Logo in which each procedure
is compiled as you define it, but it's still possible to give instructions
interactively. (Some such versions include both a compiler and an
interpreter; in others, the "interpreter" just arranges to compile each
instruction you type as if it were a one-line program.) And many current
Pascal compilers don't compile into the machine language of the host
computer, but rather into an <EM>intermediate</EM> language called
"P-code" that is then interpreted by another program, a
P-code interpreter. P-code is called an intermediate language
because the level of detail in a P-code program is in between that of a
language you'd want to use and that of the underlying machine language. Its
primitives are simple and quick, not complex control structures like <CODE>
if</CODE> or <CODE>for</CODE>. The advantage of a Pascal language processor based on
P-code is that the compiler is <EM>portable</EM>--it can work on
any computer. All that's needed to start using Pascal on a new computer is a
P-code interpreter, which is a relatively easy programming project.
<P>
<P>So far I've been explaining a language design decision (interactive or
non-interactive development) in terms of an implementation constraint
(interpreted or compiled). But it's possible to look beyond that
explanation to ask <EM>why</EM> someone would choose to design a compiler
rather than an interpreter or vice versa.
<P>The main advantage of a compiler is that the finished object program runs
fast, since it is directly executed in the native language of the host
computer. (An interpreter, too, ultimately carries out the program in the
computer's native language. But the interpreter must decide which native
language instructions to execute for a given source language instruction
each time that instruction is evaluated. In a compiled language that
translation process happens only once, producing an object program
that requires no further translation while it's running.)
The tradeoff is that the compilation process itself is slow.
If you're writing a program that will be used every day forever, the
compiled language has the advantage because the development process only
happens once and then the program need not be recompiled. On the other
hand, during program development the compiled language may be at a
disadvantage, because any little change in one instruction requires that the
entire program be recompiled. (For some languages there are <EM>
incremental</EM> compilers that can keep track of what part of the program
you've changed and only recompile that part.)
<P>A compiled language like Pascal (or Fortran or C), then, makes sense in a
business setting where a program is written for practical use, generally
using well-understood algorithms so that the development process should be
straightforward. An interpreted language like Logo (or Lisp or BASIC) makes
more sense in a research facility where new algorithms are being explored
and the development process may be quite lengthy, but the program may never
be used in routine production. (In fact nobody uses BASIC for research
purposes, because of other weaknesses, but its interactive nature is a plus.)
Another environment in which interaction is important is education; a
computer science student writes programs that may <EM>never</EM> actually be
run except for testing. The program is of interest only as long as it
doesn't work yet. For such programs the speed advantage of a compiled
program is irrelevant.
<P>There are also reasons that have nothing to do with implementation issues.
I've spoken earlier of two conflicting views of computer science, which I've
called the software engineering view and the artificial intelligence view.
In the former, the program development process is seen as beginning with a
clear, well-defined idea of what the program should do. This idea is
written down as a <EM>program specification</EM> that forms the basis for the
actual programming. From that starting point, the program is developed
top-down; first the main program is written in terms of subprocedures that
are planned but not written yet; then the lower-level procedures are written
to fill in the details. No procedure is written until it's clear how that
procedure fits into a specific overall program. Since Pascal's developers
are part of the software engineering camp, it's not surprising that a Pascal
program takes the form of an integrated whole in which each procedure must
be <EM>inside</EM> a larger one, rather than a collection of more autonomous
procedures. By contrast, Logo is a product of the artificial intelligence
camp, for whom program development is a more complicated process involving
bottom-up as well as top-down design. AI researchers recognize that they
may begin a project with only a vague idea of what the finished program will
do or how it will be organized. It's appropriate, then, to start by writing
program fragments that deal with whatever subtasks you <EM>do</EM> understand,
then see how those pieces can fit together to complete the overall project.
Development isn't a straight line from the abstract specification to the
concrete subprocedures; it's a zigzag path in which the programmer gets an
idea, tries it out, then uses the results as the basis for more thinking
about ideas.
<P>Traditionally, an interpreter has been the primary tool to facilitate
interactive program development. Recently, though, software developers
have brought a more interactive flavor to compiled languages by inventing
the idea of an <EM>integrated development environment</EM> (IDE), in which a
compiler is one piece of a package that also includes a language-specific
editor (one that knows about the syntax of the language and automatically
provides, for example, the keyword <CODE>do</CODE> that must follow a <CODE>for</CODE> in
Pascal), online documentation, and a <EM>debugger,</EM> which is a program
that permits you to follow the execution of your program one step at a time,
like the <CODE>step</CODE> command in Berkeley Logo. The idea is to have your cake
and eat it too: You use the IDE tools during program development, but once
your program is debugged, you're left with a fast compiled version that can
be run without the IDE.
<P><H2>Block Structure</H2>
<P>So far we've been looking at how each language thinks about a program as a
whole. We turn now to the arrangement of pieces within a program or a
procedure.
<P>A Logo procedure starts with a <EM>title line,</EM> followed by the
instructions in the procedure <EM>body</EM> and then the <CODE>end</CODE> line.
The purpose of the title line is to give names to the procedure itself and
to its inputs.
<P>The structure of a Pascal program is similar in some ways, but with some
complications. The program starts with a <EM>header</EM> line, very much
analogous to the title line in Logo. The word <CODE>tower</CODE> in the
header line of our sample program is the name of the program. Skipping
over the middle part of the program for the moment, the part between <CODE>
begin</CODE> and <CODE>end</CODE> in the last few lines is
the <EM>statement part</EM> of
the program, just as in Logo. The word <CODE>end</CODE> in the Pascal program is
not exactly analogous to the <CODE>end</CODE> line in a Logo procedure; it's a kind
of closing bracket, matching the <CODE>begin</CODE> before it. The period right
after the final <CODE>end</CODE> is what corresponds to the Logo <CODE>end</CODE> line.
<P>
<P>What makes Pascal's structure different from Logo's is the part I've skipped
over, the declaration of procedures. In Logo, every procedure is a separate
entity. In Pascal, the declaration of the procedure <CODE>hanoi</CODE>, for
example, is <EM>part of</EM> the program <CODE>tower</CODE>. This particular
program uses no global variables, but if it did, those variables would also
have to be declared within the program. If the program used global
variables <CODE>a</CODE>, <CODE>b</CODE>, <CODE>i</CODE>, and <CODE>j</CODE> then it might begin
<P><PRE>program tower;
var a,b:real;
i,j:integer;
procedure hanoi(number:integer;from,onto,other:char);
</PRE>
<P>In summary, a Pascal program consists of
<P>
<TABLE><TR><TH align="right" valign="top">1.<TD> <TD valign="top"> the header line
<TR><TH align="right" valign="top">2.<TD> <TD valign="top"> the declaration part (variables and procedures)
<TR><TH align="right" valign="top">3.<TD> <TD valign="top"> the statement part
<TR><TH align="right" valign="top">4.<TD> <TD valign="top"> the final punctuation (period)
<P></TABLE><P>
<P>But notice that the procedure <CODE>hanoi</CODE>, declared inside <CODE>
tower</CODE>, has the <EM>same</EM> structure as the entire program. It
begins with a header line; its declaration part includes the declaration of
procedure <CODE>movedisk</CODE>; it has a statement part between <CODE>begin</CODE> and
<CODE>end</CODE>; its final punctuation is a semicolon instead of a period.
<P>What does it <EM>mean</EM> for one procedure to be declared inside another?
You already know what a <EM>local variable</EM> means; if a variable <CODE>v</CODE>
belongs to a procedure <CODE>p</CODE> then that variable exists only while the
procedure is running; at another point in the program there might be a
different variable with the same name. In Pascal, the same is true for
local procedures. In our example program, the procedure <CODE>movedisk</CODE>
exists only while procedure <CODE>hanoi</CODE> is running. It would be an error to
try to invoke <CODE>movedisk</CODE> directly from the main program.
<P>The header line for a procedure can include names for its inputs, just as
the title line of a Logo procedure names its inputs. A useful bit of
terminology is that the variable names in the procedure header are called
<EM>formal parameters</EM> to distinguish them from the expressions
that provide particular input values when the procedure is actually invoked;
the latter are called <EM>actual arguments.</EM> The words
"parameter" and "argument" are both used for what we call an
"input" in Logo.<SUP>*</SUP>
<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>I'm misleading you a little bit by calling it a
"header line." Like any part of a Pascal program, the header can extend
over more than one line, or can be on the same line with other things. The
end of the header is marked with a semicolon. In Pascal a line break is
just like a space between words. However, there are conventions for
properly formatting a Pascal program. Even though the Pascal compiler
doesn't care about spacing and line breaks, people always do it as I've
shown you here, with subordinate parts of the program indented and each
statement on a separate line.</SMALL></BLOCKQUOTE></SMALL><P>The sequence of header, declarations, statements, and punctuation is called
a <EM>block.</EM> Pascal is called a <EM>block structured</EM> language
because of the way blocks can include smaller blocks. Another aspect of
block structure is
Pascal's use of <EM>compound statements.</EM> A sequence of the form
<P><PRE>begin <EM>statement</EM> ; <EM>statement</EM> ; <EM>statement</EM> end
</PRE>
<P>is called a compound statement. An example from the <CODE>tower</CODE>
program is
<P><PRE>begin
hanoi(number-1,from,other,onto);
movedisk(number,from,onto);
hanoi(number-1,other,onto,from)
end
</PRE>
<P>(Notice that semicolons go <EM>
between</EM> statements within this sequence; none is needed after the last
statement of the group. This syntactic rule is based on the analogy between
Pascal statements and English clauses that I mentioned earlier.) For
example, Pascal includes a conditional statement whose form is
<P><PRE>if <EM>condition</EM> then <EM>statement</EM>
</PRE>
<P>
The "statement" part can be a single <EM>simple</EM> statement,
like a procedure call, or it can be a compound statement delimited by <CODE>
begin</CODE> and <CODE>end</CODE>. Because the general term "block structured language"
refers to any syntactic grouping of smaller units into a larger one,
including compound statements, you may hear the word "block" used to refer
to a compound statement even though that's not the official Pascal meaning.
<P>
<P>
<H2>Statement Types</H2>
<P>
<P>In Logo we don't talk about different kinds of statements like compound,
simple, and so on. <EM>Every</EM> Logo instruction (well, all but <CODE>to</CODE>)
is a procedure invocation. <CODE>If</CODE>, for example, is a procedure whose
first input is <CODE>true</CODE> or <CODE>false</CODE> and whose second input is a list
containing instructions to be carried out if the first input is <CODE>true</CODE>.
In Pascal there are several different kinds of statements, each with its own
syntax.
<P>You know about compound statements. You've seen one example of <CODE>if</CODE>,
which is one of several <EM>structured</EM> statements in Pascal. Other
examples include
<P><PRE>while <EM>condition</EM> do <EM>statement</EM>;
repeat <EM>statements</EM> until <EM>condition</EM>
while x < 0 do
begin
increase(x);
writeln(x)
end;
repeat
increase(x);
writeln(x)
until x >= 0;
</PRE>
<P>These are like the <CODE>while</CODE> and <CODE>do.until</CODE> tools in
Berkeley Logo. <CODE>While</CODE>, like <CODE>if</CODE>, requires a single statement
(which can be a compound statement between <CODE>begin</CODE> and <CODE>end</CODE>) after
the <CODE>do</CODE>. However, the words <CODE>repeat</CODE> and <CODE>until</CODE> implicitly
delimit a compound statement, so you can put more than one statement between
them without using <CODE>begin</CODE> and <CODE>end</CODE>. Another example is <CODE>
for</CODE>, which you'll see in use in a moment. Continuing the analogy
with English grammar, a compound statement is like a compound sentence with
several independent (or coordinate) clauses; a structured statement is like
a complex sentence, with a dependent (or subordinate) clause. (If you
always hated grammar, you can just ignore this analogy.)
<P>There are basically only two kinds of simple statement: the
procedure call,
which you've already seen, and the <EM>assignment</EM> statement used to give
a variable a new value. This latter is Pascal's version of <CODE>make</CODE> in
Logo; it takes the form
<P><PRE><EM>variable</EM> := <EM>expression</EM>
slope := ychange/xchange
</PRE>
<P>As I've already mentioned, the variable must have been declared
either in a procedure heading or in a <CODE>var</CODE> declaration.
(Assignment is represented with the two-character symbol <CODE>:=</CODE> because
<CODE>=</CODE> by itself means <CODE>equalp</CODE> rather than <CODE>make</CODE>.)
<P>I say there are "basically" only two kinds because each of these has some
special cases that look similar but follow different rules. For example,
printing to the computer screen is done by what looks like an invocation of
<CODE>write</CODE> (analogous to <CODE>type</CODE> in Logo) or <CODE>
writeln</CODE> ("write line," analogous to <CODE>print</CODE>). But these are
not ordinary procedures. Not only do they take a variable number of
arguments, but the arguments can take a special form not ordinarily
allowed. In the <CODE>movedisk</CODE> procedure in the <CODE>tower</CODE> program, one of
the arguments to <CODE>writeln</CODE> is
<P><PRE>number:1
</PRE>
<P>The "<CODE>:1</CODE>" here means "using one print position unless more
are needed to fit the number." Pascal print formatting is designed to
emphasize the printing of numbers in columns, so the default is to print
each number with a fairly large number of characters, with spaces at the
left if the number doesn't have enough digits. The exact number of
characters depends on the type of number and the dialect of Pascal, but 10
is a typical number for integers. So
<P><PRE>writeln(1,2,3,4);
writeln(1000,2000,3000,4000);
</PRE>
<P>will give a result looking something like this:
<P><PRE> 1 2 3 4
1000 2000 3000 4000
</PRE>
<P>In <CODE>movedisk</CODE> I had to say "<CODE>:1</CODE>" to avoid all that
extra space.
<P>What are the pros and cons of using a variety of syntax rules for different
kinds of statements? One reason for the Pascal approach is that differences
in meaning can be implicit in the definitions of different statement types
instead of having to be made explicit in a program. Don't worry; you're not
expected to understand what that sentence meant, but you will as soon as you
see an example. In Logo we say
<P><PRE>if :x < 10 [increment "x]
while [:x < 10] [increment "x]
</PRE>
<P>Why is the predicate expression <CODE>:x < 10</CODE> in a quoted list in
the case of <CODE>while</CODE> but not for <CODE>if</CODE>? <CODE>If</CODE> wants the
expression to be evaluated once, <EM>before</EM> <CODE>if</CODE> is invoked. The
actual input to <CODE>if</CODE> is not that expression but the value of the
expression, either <CODE>true</CODE> or <CODE>false</CODE>. <CODE>While</CODE>, on the other
hand, wants to evaluate that expression repeatedly. If Logo evaluated the
expression ahead of time and gave <CODE>while</CODE> an input of <CODE>true</CODE> or <CODE>
false</CODE> it wouldn't be able to know when to stop repeating.
<P>The fact that <CODE>if</CODE> wants the condition evaluated once but <CODE>while</CODE>
wants to evaluate it repeatedly has nothing to do with the syntax of Logo;
the same is true in Pascal. But in Pascal you say
<P>
<P><PRE>if x<10 then increment(x);
while x<10 do increment(x);
</PRE>
<P>In Logo the fact that <CODE>if</CODE>'s condition is evaluated in advance
but <CODE>while</CODE>'s isn't is made explicit by the use of square brackets. In
Pascal it's just part of the <EM>semantic</EM> definitions of the <CODE>if</CODE>
and <CODE>while</CODE> statements. (<EM>Syntax</EM> is the form in which something
is represented; <EM>semantics</EM> is the meaning of that something.)
<P>One more example: Beginning students of Logo often have trouble
understanding why you say
<P><PRE>make "new :old
</PRE>
<P>to assign the value of one variable to another variable. Why is
the first quoted and the second dotted? Of course you understand that it's
because the first input to <CODE>make</CODE> is the <EM>name</EM> of the variable
you want to set, while the second is the <EM>value</EM> that you want to give
it. But in Pascal this distinction is implicit in the semantic definition
of the assignment statement; you just say
<P><PRE>new := old
</PRE>
<P>Since beginning Logo students have trouble with quotes and dots in <CODE>
make</CODE>, you might think that the Pascal approach is better. But beginning
Pascal students have a trouble of their own; they tend to get thrown by
statements like
<P><PRE>x := x+1
</PRE>
<P>This doesn't look quite as bad as the BASIC or Fortran version in
which the symbol for assignment is just an equal sign, but it's still easy
to get confused because the symbol "<CODE>x</CODE>" means two very different
things in its two appearances here. In the Logo version
<P><PRE>make "x :x+1
</PRE>
<P>the explicit difference in appearance between <CODE>"x</CODE> and <CODE>:x</CODE>
works to our advantage.
<P>Which way do you find it easier to learn something: Do you want to start
with a simple, perhaps partly inaccurate understanding and learn about the
difficult special cases later, or do you prefer to be told the whole truth
from the beginning? I've posed the question in a way that shows my own
preference, I'm afraid, but there are many people with the opposite view.
<P>The issue about whether or not to make the semantics of some action implicit
in the syntax of the language is the most profound reason for the difference
between Logo's single instruction syntax and Pascal's collection of
statement types, but there are other implications as well. One virtue of
the Pascal compound statement is that it makes for short, manageable
instruction lines. You've seen Logo procedures in these books in which one
"line" goes on for three or four printed lines on the page, e.g., when the
instruction list input to <CODE>if</CODE> contains several instructions. It's a
particularly painful problem in the versions of Logo that don't allow
continuation lines.
<P>On the other hand, Logo's syntactic uniformity contributes to its <EM>
extensibility.</EM> In the example above, <CODE>if</CODE> is a Logo primitive,
whereas <CODE>while</CODE> is a library procedure written in Logo. But the
difference isn't obvious; the two are used in syntactically similar ways.
You can't write control structures like <CODE>while</CODE> in Pascal because
there's nothing analogous to <CODE>run</CODE> to allow a list of instructions to be
an input to a procedure, but even if you could, it would have to take the
form
<P><PRE>while(<EM>condition</EM>,<EM>statement</EM>)
</PRE>
<P>because that's what a procedure call looks like. But it's not
what a built-in Pascal control structure looks like.
<P><H2>Shuffling a Deck Using Arrays</H2>
<P>It's time for another sample Pascal program. Program <CODE>cards</CODE> is a
Pascal version of the problem of shuffling a deck of cards that we discussed
in Chapter 3. It includes local variables, assignment statements, and the
<CODE>for</CODE> structured statement. It also will lead us into some
additional language design issues.
<P><PRE>program cards;
{Shuffle a deck of cards}
var ranks:array [0..51] of integer;
suits:array [0..51] of char;
i:integer;
procedure showdeck;
{Print the deck arrays}
begin {showdeck}
for i := 0 to 51 do
begin
if i mod 13 = 0 then writeln;
write(ranks[i]:3,suits[i]);
end;
writeln;
writeln
end; {showdeck}
procedure deck;
{Create the deck in order}
var i,j:integer;
suitnames:packed array [0..3] of char;
begin {deck}
suitnames := 'HSDC';
for i := 0 to 12 do
for j := 0 to 3 do
begin
ranks[13*j+i] := i+1;
suits[13*j+i] := suitnames[j]
end;
writeln('The initial deck:');
showdeck
end; {deck}
procedure shuffle;
{Shuffle the deck randomly}
var rank,i,j:integer;
suit:char;
begin {shuffle}
for i := 51 downto 1 do {For each card in the deck}
begin
j := random(i+1); {Pick a random card before it}
rank := ranks[i]; {Interchange ranks}
ranks[i] := ranks[j];
ranks[j] := rank;
suit := suits[i]; {Interchange suits}
suits[i] := suits[j];
suits[j] := suit
end;
writeln('The shuffled deck:');
showdeck
end; {shuffle}
begin {main program}
deck;
shuffle
end.
</PRE>
<P>Experienced Pascal programmers will notice that this program isn't written
in the most elegant possible Pascal style. This is partly because of issues
in Pascal that I don't want to talk about (records) and partly because of
issues that I <EM>do</EM> want to talk about in the next section (scope).
<P>Here's what happens when you run the program:
<P><PRE>? <U>prun "cards</U>
The initial deck:
1H 2H 3H 4H 5H 6H 7H 8H 9H 10H 11H 12H 13H
1S 2S 3S 4S 5S 6S 7S 8S 9S 10S 11S 12S 13S
1D 2D 3D 4D 5D 6D 7D 8D 9D 10D 11D 12D 13D
1C 2C 3C 4C 5C 6C 7C 8C 9C 10C 11C 12C 13C
The shuffled deck:
2D 11D 6S 9D 6C 10H 8D 11C 3D 4C 5H 4S 1D
5C 5D 6D 9S 4D 8C 13S 13D 10C 9H 10D 5S 12D
13H 9C 3C 1S 10S 4H 12S 11S 12H 11H 2H 3H 1H
13C 8H 7C 2C 1C 7S 6H 2S 7D 8S 12C 3S 7H
</PRE>
<P>The Pascal <CODE>for</CODE> is somewhat like the Berkeley Logo <CODE>for</CODE>
in its semantics, although of course the syntax is quite different. The
step value must be either 1 (indicated by the keyword <CODE>to</CODE>) or −1
(<CODE>downto</CODE>). By the way, if you've been wondering why I changed one of
the variable names in the Tower of Hanoi program from <CODE>to</CODE> in the Logo
version to <CODE>onto</CODE> in the Pascal version, it's because <CODE>to</CODE> is a <EM>
reserved word</EM> in Pascal and can't be used as the name of
anything.
<P>The Pascal standard does not include a <CODE>random</CODE> function. Most
practical versions of Pascal do provide a random number generator of some
sort; since there's no standard, I've chosen to implement the kind that's
most useful for the kind of programming I'm interested in, namely the Logo
<CODE>random</CODE> that takes an integer argument and returns an integer between
zero and one less than the argument.
<P><H2>Lexical Scope</H2>
<P>Program <CODE>cards</CODE> has three procedures: <CODE>showdeck</CODE>, <CODE>deck</CODE>,
and <CODE>shuffle</CODE>. Each of these is declared directly in the top-level
program. However, <CODE>showdeck</CODE> is not <EM>invoked</EM> directly at top
level; it's used by the other two procedures. (This is one of the
questionable bits of programming style I've put in for the sake of the
following discussion; ordinarily I think I'd have put the statements that
invoke <CODE>showdeck</CODE> in the main program block.)
<P>If you read the program carefully you'll see that <CODE>showdeck</CODE> uses a
variable <CODE>i</CODE> but does <EM>not</EM> declare that
variable.<SUP>*</SUP> (When a
variable is used but not declared within a certain procedure, that use of
the variable is called a <EM>free reference.</EM> A use of a
variable that
<EM>is</EM> declared in the same block is called a <EM>bound</EM> reference.)
There are three variables named <CODE>i</CODE> in the program: one in the outer
block, one in <CODE>deck</CODE>, and one in <CODE>shuffle</CODE>. When, for example, the
main program calls <CODE>deck</CODE> and <CODE>deck</CODE> calls <CODE>showdeck</CODE>, which
variable <CODE>i</CODE> does <CODE>showdeck</CODE> use?
<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>Actually, the Pascal language requires that the variable
used in a <CODE>for</CODE> statement <EM>must</EM> be declared in the same procedure
in which the <CODE>for</CODE> appears; program <CODE>cards</CODE> is not legal Pascal for
that reason. What's the purpose of that restriction? Suppose that this
procedure was invoked from within another <CODE>for</CODE> loop in another
procedure, and both use the same variable; then both procedures would be
trying to assign conflicting values to that variable. Berkeley Logo's <CODE>
for</CODE> automatically makes its variable local to the <CODE>for</CODE> procedure
itself, for the same reason. But my Pascal compiler lets us get away with
breaking this rule, and I've done it deliberately to make a point.</SMALL></BLOCKQUOTE></SMALL><P>In Logo the answer would be that <CODE>showdeck</CODE> uses the <CODE>i</CODE> belonging
to <CODE>deck</CODE>, the procedure that invoked it. That's because Logo follows
the rules of <EM>dynamic scope:</EM> A free reference to a variable is
directed to the variables of the procedure that invoked the current one,
then if necessary to the variables of the procedure that invoked that one,
and so on up to the global variables. (Dynamic scope is discussed more
fully in the first volume of this series.)
<P>In Pascal, <CODE>showdeck</CODE> uses the <CODE>i</CODE> belonging to the main program.
That's because a free reference to a variable is directed to <EM>the block
within which the current procedure was declared,</EM> then to the block
surrounding that one, and so on up to the outermost program block. This
rule is called <EM>lexical</EM> scope. The set of blocks surrounding a
given block, smaller to larger, is its <EM>lexical environment.</EM> The
lexical environment of <CODE>showdeck</CODE> is
<P><PRE>{showdeck, cards}
</PRE>
<P>The lexical environment of <CODE>movedisk</CODE> in the <CODE>tower</CODE>
program is
<P><PRE>{movedisk, hanoi, tower}
</PRE>
<P>The set of procedure invocations leading to a given procedure is
its <EM>dynamic environment.</EM> A procedure's dynamic environment
isn't always the same; for example, the dynamic environment of <CODE>
showdeck</CODE> is sometimes
<P><PRE>{showdeck, deck, cards}
</PRE>
<P>and sometimes
<P><PRE>{showdeck, shuffle, cards}
</PRE>
<P>The word "lexical" is the adjective form of <EM>lexicon,</EM> which
means "dictionary." It's used in this computer science context because the
lexical context of a procedure has to do with where it's defined, just as
words are defined in a dictionary. The word "dynamic" means <EM>in
motion;</EM> it's used because the dynamic context of a procedure keeps
changing as the program runs.
<P>What are the reasons behind the choice of lexical or dynamic scope? This is
another choice that was originally made for implementation reasons. It
turns out to be easy for an interpreter to use dynamic scope, but for a
compiler it's much easier to use lexical scope. That's because the
interpreter makes the decision about which variable to use while the program
is running and the dynamic environment is known, but the compiler has to
translate the program <EM>before</EM> it is run. At "compile time" there
isn't one fixed dynamic environment, but there is a single, already-known
lexical environment. Originally, interpreted languages like Logo, Lisp, and
APL all used dynamic scope, while compiled ones like Fortran and Pascal used
lexical scope. (There was even a period of time when Lisp systems offered
both an interpreter and a compiler, and the behavior of the same program was
different depending on whether you compiled it or interpreted it because of
different scope rules.)
<P>More recent dialects of Lisp, such as Common Lisp and Scheme, have been
designed to use lexical scope even when interpreted. Their designers
think lexical scope is better for reasons that don't depend on the
implementation technique. One reason is that dynamic scope allows for
programming errors that don't arise when lexical scope is used. In Logo,
suppose you write a procedure that makes a free reference to some variable
<CODE>v</CODE>. What you intended, let's say, was to use a global variable by that
name. But you've forgotten that your procedure is sometimes invoked by
another procedure that you wrote two weeks ago that happens to have an input
named <CODE>v</CODE>. It can be very hard to figure out why your procedure is
suddenly getting the wrong variable. With lexical scope, it's much easier
to keep track of the context in which your procedure is defined, to make
sure there are no local variables <CODE>v</CODE> in the lexical environment.
<P>It's possible to argue in favor of dynamic scope also. One
argument is that in a lexically scoped language certain kinds of tool
procedures can't be written at all: the ones like <CODE>while</CODE> or <CODE>for</CODE>
that take an instruction list as input and <CODE>run</CODE> the list repeatedly.
Suppose you write a procedure in Logo that contains an instruction like
<P><PRE>while [:degrees < 0] [make "degrees :degrees+360]
</PRE>
<P>What variable <CODE>degrees</CODE> do you want this instruction to use?
Presumably you mean the same variable <CODE>degrees</CODE> that is used by other
instructions in the same procedure. But if Logo used lexical scope,
then <CODE>while</CODE> wouldn't have access to the local variables of your
procedure. (It's possible to design other features into a lexically scoped
language to get around this limitation, but the necessary techniques are
more complicated than the straightforward way you can write <CODE>while</CODE> in
Logo.)
<P>Another argument for dynamic scope, with particular relevance to Logo, is
that dynamic scope fits better with the expectations of an unsophisticated
programmer who hasn't thought about scope at all. One of the design goals
of Logo is to be easy for such beginners. Until now we've been talking
about scope in terms of naming conflicts: what happens if two variables have
the same name. But suppose you write a program with a bunch of procedures,
with a bunch of <EM>distinct</EM> variable names used as inputs. It makes
life very simple if all those variables are available whenever you want them,
so you don't have to think in such detail about how to get a certain piece
of information down the procedure invocation chain to a subprocedure. If
some variables are accessible to subprocedures but others aren't, that's one
more mystery to make programming seem difficult. In particular, dynamic
scope can simplify the debugging of a Logo program. If you arrange for your
program to <CODE>pause</CODE> at the moment when an error happens, then you can
enter Logo instructions, with all of the local variables of <EM>all</EM>
pending procedure invocations available, to help you figure out the reason
for the error. Debuggers for lexically scoped languages require a more
complicated debugging mechanism in which the programmer must explicitly
shift focus from one procedure to another.
<P>In the situations we've seen, lexical scope always acts as a <EM>
restriction</EM> on the availability of variables to subprocedures. That is,
a procedure's lexical environment is always a subset of its dynamic
environment. (Suppose procedure <CODE>a</CODE> includes the definition of
procedure <CODE>b</CODE>, which in turn includes the definition of <CODE>c</CODE>. So the
lexical environment of <CODE>c</CODE> is <CODE>{c,b,a}</CODE>. You might imagine that
<CODE>c</CODE>'s dynamic environment could be <CODE>{c,a}</CODE> if procedure <CODE>a</CODE>
invoked <CODE>c</CODE> directly, but in fact that's illegal. Just as <CODE>a</CODE> can't
use <CODE>b</CODE>'s local variables, it can't use <CODE>b</CODE>'s local procedures
either. The reason the dynamic environment can be different from the
lexical one at all is that two procedures can be part of the same block,
like <CODE>showdeck</CODE> and <CODE>deck</CODE> in the <CODE>cards</CODE> program.) In Lisp,
it's possible for a procedure to return <EM>another procedure</EM> as its
output--not just the name of the procedure or the text of the procedure, as
we could do in Logo, but the procedure itself, lexical environment and all.
When such a procedure is later invoked from some other part of the
program, the procedure's lexical environment may not be a subset
of its dynamic environment, and so lexical scope gives it access to
variables that it couldn't use under dynamic scope rules. That's a powerful
argument in favor of lexical scope for Lisp, but it doesn't apply to Pascal.
<P>One special scope rule in Pascal applies to procedures declared in the same
block: The one declared later can invoke the one declared earlier, but not
the other way around. In the <CODE>cards</CODE> program, <CODE>deck</CODE> can call <CODE>
showdeck</CODE> but <CODE>showdeck</CODE> can't call <CODE>deck</CODE>. There is no deep reason
for this restriction; it's entirely for the benefit of the compiler. One of
the design goals of Pascal was that it should be easy to write a compiler
that goes through the source program once, from beginning to end, without
having to back up and read part of the program twice. In particular, when
the compiler sees a procedure invocation, it must already know what inputs
that procedure requires; therefore, it must have already read the header of
the subprocedure. Usually you can get around this restriction by rearranging
the procedures in your program, but for the times when that doesn't work
Pascal provides a kludge that lets you put the header in one place in the
source file and defer the rest of the procedure until later.
<P>
<P><H2>Typed Variables</H2>
<P>Berkeley Logo has three data types: <EM>word, list,</EM> and <EM>
array.</EM> (Numbers are just words that happen to be full of digits.) But a
<EM>variable</EM> in Logo does not have a type associated with it; any datum
can be the value of any variable.
<P>Pascal has lots of types, and every variable belongs to exactly one of them.
In the sample programs so far we've used five types: <EM>char,</EM> <EM>
integer,</EM> <EM>array of char,</EM> <EM>array of integer,</EM> and <EM>packed array
of char.</EM> When a variable is declared, the declaration says what type it is.
<P>The selection of data types is the area in which my Pascal compiler is most
lacking; I've implemented only a few of the possible types. I'll describe
the ones available in my compiler in detail and give hints about the others.
<P>
The fundamental types out of which all others are built are the <EM>
scalar</EM> types that represent a single value, as opposed to an aggregate
like an array or a list. Pascal has four:
<P><P>
<TABLE><TR><TH align="right" valign="top"><CODE>integer</CODE><TD> <TD valign="top">a positive or negative whole number (e.g., <CODE>
23</CODE>)
<TR><TH align="right" valign="top"><CODE>real</CODE><TD> <TD valign="top">a number including decimal fraction part (e.g., <CODE>
-5.0</CODE>)
<TR><TH align="right" valign="top"><CODE>char</CODE><TD> <TD valign="top">a single character (e.g., <CODE>'Q'</CODE>)
<TR><TH align="right" valign="top"><CODE>Boolean</CODE><TD> <TD valign="top"><CODE>true</CODE> or <CODE>false</CODE>
<P></TABLE><P>
<P>Pascal also has several kinds of aggregate types. The only one
that I've implemented is the array, which is a fixed number of uniform
elements. By "uniform" I mean that all members of the array must be of
the same type. Full Pascal allows the members to be of any type, including
an aggregate type, as long as they're all the same, so you could say
<P><PRE>var a : array [1..10] of array [1..4] of integer;
</PRE>
<P>to get an array of arrays. But in my subset Pascal the members of
an array must be scalars. This restriction is not too severe because Pascal
arrays can have <EM>multiple indices;</EM> instead of the above you can use
the equivalent
<P><PRE>var a : array [1..10, 1..4] of integer;
</PRE>
<P>This declaration creates a two-dimensional array whose members
have names like <CODE>a[3,2]</CODE>.
<P>The notation <CODE>1..10</CODE> is called a <EM>range;</EM> it indicates the extent
of the array. Berkeley Logo arrays ordinarily start with index one, so
a Logo instruction like
<P><PRE>make "ten array 10
</PRE>
<P>is equivalent to the Pascal declaration
<P><PRE>var ten:array [1..10] of something
</PRE>
<P>except that the Logo array need not have uniform
members.<SUP>*</SUP> (Also,
a subtle difference is that the Logo array is an independent datum that can
be the value of a variable just as a number can be the value of a variable.
The Pascal array <EM>is</EM> the variable; you can change the contents of
individual members of the array but it's meaningless to speak of changing
the value of that variable to be something other than that array.)
<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>The Berkeley Logo <CODE>array</CODE> primitive can take an optional
second input to specify a different starting index.</SMALL></BLOCKQUOTE></SMALL><P>In Pascal an index range doesn't have to be numbers; you can use any scalar
type except real:
<P><PRE>var frequency : array ['a'..'z'] of integer;
</PRE>
<P>might be used in a program that counts the frequency of use of
letters, such as a cryptography program. The members of this array would be
used by referring to things like <CODE>frequency['w']</CODE>. (In Pascal
documentation there is a word for "scalar type other than real": It's
called an <EM>ordinal</EM> type.)
<P>A <EM>packed array</EM> is one that's represented in the computer in a way
that takes up as little memory as possible. Ordinary arrays are stored so
as to make it as fast as possible to examine or change an individual element.
The distinction may or may not be important for a given type on a given
computer. For example, most current home computers have their memory
organized in <EM>bytes</EM> that are just the right size for a single
character. On such a computer, an array of char and a packed array of char
will probably be represented identically. But one of my favorite larger
computers, the Digital Equipment Corporation PDP-10, had its memory
organized in <EM>words</EM> of 36 bits, enough for five 7-bit characters with
one bit left over. A packed array of char, on the PDP-10, would be
represented with five characters per word; an ordinary array of char might
store only one character per word, wasting some space in order to simplify
the task of finding the <EM>n</EM>th character of the array.
<P>My compiler, which is meant to be simple rather than efficient, ignores the
<CODE>packed</CODE> declaration. The reason I used it in the <CODE>cards</CODE> program
is to illustrate a rule of Pascal: The statement
<P><PRE>suitnames := 'HSDC'
</PRE>
<P>assigns a <EM>constant string</EM> to the array variable <CODE>
suitnames</CODE>, and such assignments are allowed only to packed arrays of char.
Also, the size of the array must equal the length of the string. If
<CODE>suitnames</CODE> were an array of length 10, for example, I'd have had to say
<P><PRE>suitnames := 'HSDC '
</PRE>
<P>filling up the unused part of the array explicitly with spaces.
<P>In an assignment statement, the type of the variable on the left must be the
same as the type of the expression on the right. An assignment can copy one
array into another if it involves two variables of exactly the same type:
<P><PRE>var a,b:array [3..17] of real;
a := b
</PRE>
<P>but except for the case of packed arrays of char mentioned above
there is no way to represent a constant array in a Pascal program. If you
want an array of all the prime numbers less than 10 you have to initialize it
one member at a time:
<P><PRE>var primes : array [1..4] of integer;
primes[1] := 2;
primes[2] := 3;
primes[3] := 5;
primes[4] := 7
</PRE>
<P>In scalar assignments, a slight relaxation of the rules is allowed in that
you may assign an integer value to a real variable. The value is converted
to a real number (e.g., <CODE>17</CODE> to <CODE>17.0</CODE>). The opposite is <EM>
not</EM> allowed, but there are two built-in functions <CODE>trunc</CODE> (for
"truncate") and <CODE>round</CODE> that can be used to convert a real value to an
integer. <CODE>Trunc</CODE> cuts off the fraction part, so <CODE>trunc(4.99)</CODE> is
<CODE>4</CODE>. <CODE>Round</CODE> rounds to the nearest integer.
<P>Pascal provides the usual infix arithmetic operations <CODE>+</CODE>, <CODE>-</CODE>,
<CODE>*</CODE>, and <CODE>/</CODE>, following the usual precedence rules, just as in Logo.
The result of any of these is integer if both operands are integer,
otherwise real, except that the result of <CODE>/</CODE> (division) is always real.
There are integer division operations <CODE>div</CODE> (integer quotient) and <CODE>
mod</CODE> (integer remainder); both operands to these must also be integers.
The relational operators <CODE>=</CODE> (like <CODE>equalp</CODE> in Logo), <CODE><</CODE> (less
than), <CODE>></CODE> (greater than), <CODE><=</CODE> (less than or equal to), <CODE>>=</CODE>
(greater than or equal to), and <CODE><></CODE> (not equal to) take two real or
integer operands and yield a Boolean result. There are also Boolean
operators <CODE>and</CODE>, <CODE>or</CODE>, and <CODE>not</CODE>, just like the Logo ones except
that they use infix syntax:
<P><PRE>(x < 0) and (y <= 28)
</PRE>
<P><H2>Additional Types in Standard Pascal</H2>
<P>Standard Pascal, but not my version, includes other aggregate types besides
the array. One such type is the <EM>record;</EM> a record is a non-uniform
aggregate, but the "shape" of the aggregate must be declared in advance.
For example, you can declare a record containing three integers and an array
of 10 characters. In the <CODE>cards</CODE> program, instead of two separate
arrays for ranks and suits I could have said
<P><PRE>var carddeck: array [0..51] of record
rank:integer;
suit:char
end;
</PRE>
<P>Then to refer to the rank of card number 4 I'd say
<P><PRE>carddeck[4].rank
</PRE>
<P>and that would be an integer. A <EM>pointer</EM> is a variable
whose value is the memory address of a record; pointer variables can be used
to implement dynamic data structures like Logo lists by building explicitly
the collections of boxes and arrows illustrated in some of the pictures in
Chapter 3. But it's hard to build anything in Pascal that's <EM>quite</EM>
like Logo lists, even using pointers, because what's in each box has to
belong to some particular predeclared type.
<P>Real Pascal also includes user-defined types. There is a <CODE>
type</CODE>
declaration that goes before the <CODE>var</CODE> declaration in a block:
<P><PRE>type string = packed array [1..10] of char;
</PRE>
<P>Variable declarations can then use <CODE>string</CODE> as the type of the
variable being declared. In fact, standard Pascal <EM>requires</EM> the use
of named types in certain situations. For example, in a procedure header
the formal parameters must be given a named type (either a built-in scalar
type like <CODE>integer</CODE> or a defined type like <CODE>string</CODE>); since I haven't
implemented <CODE>type</CODE> my compiler allows
<P><PRE>procedure paul(words:packed array [1..10] of char);
</PRE>
<P>although standard Pascal doesn't allow such a header.
<P>
You can also define <EM>subrange</EM> types:
<P><PRE>type dieface = 1..6;
</PRE>
<P>This type is really an integer, but it's constrained to have only
values in the given range. This particular one might be used for a variable
whose value is to represent the result of rolling a six-sided die. Finally,
there are <EM>enumerated</EM> types:
<P><PRE>type Beatle = (John, Paul, George, Ringo);
</PRE>
<P>This type is used for a variable that represents one of a small
number of possible things. In reality it's also a kind of integer; the word
<CODE>John</CODE> represents 0, <CODE>Paul</CODE> is 1, and so on. In fact, it's only
during the compilation of the program that Pascal remembers the names of the
possible values; you can't read or print these variables during the running
of the program.
<P><H2>Critique of Typed Variables</H2>
<P>Why would anyone want to use a subrange or other restricted type? If the
program works using a variable of type <CODE>dieface</CODE>, it would work just as
well if the variable were of type <CODE>integer</CODE>. The only difference is
that using a subrange type is slower because the program has to check (at
run time) to make sure that any value you try to assign to that variable is
in the approved range.
<P>According to Pascal enthusiasts, the virtue of restricted types, like the
virtue of typed variables in the first place, is that their use helps catch
program bugs. For example, in the <CODE>cards</CODE> program, procedure <CODE>
shuffle</CODE> has variables <CODE>i</CODE> and <CODE>j</CODE> that are used to index the arrays
<CODE>ranks</CODE> and <CODE>suits</CODE>. How do we know there isn't an error in the
program so that one of those variables is given a value that isn't a valid
index for those arrays? Such an error would be caught when we <EM>use</EM>
the index variable to try to refer to an array element that doesn't exist,
but it's easier to debug the program if we get the error message at the
moment when the variable is assigned the incorrect value. So I should have
declared
<P><PRE>var i,j : 0..51;
</PRE>
<P>instead of using <CODE>integer</CODE>. (Of course one reason I didn't
use a subrange type is that I didn't implement them in my compiler!)
<P>The trouble is that strict typing of variables is an unnecessary pain in the
neck.<SUP>*</SUP> Take this business of array index
bounds. Here is a possible piece of Pascal program:
<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>I know I said I wasn't going to try to convince you which
language is better, but on this particular point I really think the Pascal
people don't have a leg to stand on.</SMALL></BLOCKQUOTE></SMALL><P><PRE>i := 0;
while i <= 51 do
begin
writeln(ranks[i]:3,suits[i]);
i := i+1
end
</PRE>
<P>There's nothing wrong with this. It will print the value of each
member of the two arrays, starting from index 0 and continuing through 51.
However, at the end of the <CODE>while</CODE> statement, the variable <CODE>i</CODE> has
the value 52. This is <EM>not</EM> an error; the program does <EM>not</EM> try
to refer to member 52 of the arrays. But if we declared <CODE>i</CODE> as a
subrange type the way we "should," the program will give an error message
and die. This particular example could be rewritten using <CODE>for</CODE> instead
of <CODE>while</CODE> to avoid the problem, but it turns out that there are many
algorithms that jump around in an array in which the index variable
sometimes takes on a value just outside the bounds of the array. Some sort
algorithms, for example, are like that.
<P>
<P>Typed variables work against program robustness. (A <EM>robust</EM> program
is one that keeps calm in the face of bad data or other user error, rather
than dying abruptly.) For example, suppose we want to find the sum of a
bunch of numbers. Some human being is going to type these numbers into the
computer, one at a time. We don't know in advance how many there are, so we
let the person type <CODE>done</CODE> when done. Since the typist is only human,
rather than a computer, we want to make sure the program doesn't blow up if
he accidentally types something other than a number or <CODE>done</CODE>. Here's
one way to do it in Logo:
<P><PRE>to addnumbers
print [Enter the numbers, one per line.]
print [Type the word 'done' when done.]
print se [The sum is] addnumbers1 0
end
to addnumbers1 :sum
localmake "next readnumber
if emptyp :next [output :sum]
output addnumbers1 :sum+:next
end
to readnumber
localmake "input readlist
if emptyp :input [output readnumber]
if equalp :input [done] [output []]
if numberp first :input [output first :input]
print [Please type numbers only.]
output readnumber
end
</PRE>
<P>If the user makes a typing mistake, the program says so, ignores
the bad input, and keeps going. Now, how shall we write this in Pascal?
Into what type of variable should we read each number from the keyboard? If
we pick <CODE>integer</CODE>, then any entry of a non-number will incite Pascal to
print an error message and terminate the program. Instead we can read the
keyboard entry into an <CODE>array of char</CODE>, one character at a time. Then
anything the user types is okay, but we can't do arithmetic on the
result--we can't add it into the accumulated sum. We can read it as <CODE>
char</CODE>s and then write a procedure that knows how to look at a string of
digits and compute the number that those digits represent. But this is not
the sort of thing that we should need to do ourselves in a high-level
programming language.
<P>Why should a programmer have to decide in advance whether or not the numbers
that a program will manipulate are integers? In Logo I can write a general
numeric procedure like this one:
<P><PRE>to square :x
output :x * :x
end
</PRE>
<P>but in Pascal I need one for each kind of number:
<P><PRE>function RealSquare(x:real): real;
begin
RealSquare := x * x
end;
function IntSquare (x:integer): integer;
begin
IntSquare := x * x
end;
</PRE>
<P>Why pick on the distinction between integer and non-integer values?
Why not positive and negative values, or odd and even values? The historical
answer is that computer hardware uses two different representations for
integer and real numbers, but so what? That doesn't mean the distinction
is relevant to the particular program I'm writing.
<P>The language ML, which I mentioned earlier as an example of a pure functional
language, tries to provide the best of both worlds on this issue. It does
require that variables have types, as in Pascal, to help catch programming
errors. But two features make ML's type system more usable than Pascal's.
One is the provision of <EM>union types.</EM> It's possible to say that
a particular variable must contain either a number or the word <CODE>done</CODE>,
for example. (Pascal has something like this, called <EM>variants,</EM> but
they're less flexible.) Second, the ML compiler uses <EM>type inference,</EM>
a technique by which the compiler can often figure out the appropriate type
for a variable without an explicit declaration.
<P>
<P>
<H2>Procedures and Functions</H2>
<P>
<P>In Logo a distinction is made between those procedures that output a value
(operations) and those that don't (commands). Pascal has the same
categories, but they're called <EM>functions</EM> and <EM>
procedures</EM> respectively.
<P>A function is a block just like a procedure block except for the minor
changes needed to accommodate the fact that the function produces a value.
First of all, the function header has to say what the <EM>type</EM> of the
result will be:
<P><PRE>function whatever (<EM>arguments</EM>) : integer;
</PRE>
<P>The function's type must be a scalar, not an aggregate type. This
restriction is in the standard only to make life easier for the compiler,
and some versions of Pascal do allow array-valued (or record-valued,
etc.) functions.
<P>The other difference is that in the statement part of a function block we
have to tell Pascal what the value will be. That is, we need something
equivalent to <CODE>output</CODE> in Logo. Pascal's convention is that somewhere
in the block an assignment statement must be executed that has the
name of the function as its left hand side. That is, the function name is
used in an assignment statement as though it were a variable name.
(However, the name must <EM>not</EM> be declared as a variable.) This
notation may be a little confusing, because if the same name appears on the
<EM>right</EM> side of the assignment statement, it signals a recursive
invocation of the function. Perhaps an example will make this clear.
<P>Program <CODE>multi</CODE> is a Pascal version of the memoized multinomial function
from Chapter 3. In the Logo version, <EM>t</EM>(<EM>n</EM>,<EM>k</EM>) was memoized using the
property name <EM>k</EM> on the property list named <EM>n</EM>. In the Pascal version,
since we have multi-dimensional arrays, it is straightforward to use a
two-dimensional array and store <EM>t</EM>(<EM>n</EM>,<EM>k</EM>) in <CODE>memo[n,k]</CODE>.
<P><PRE>program multi;
{Multinomial expansion problem}
var memo: array [0..4, 0..7] of integer;
i,j: integer;
function t(n,k:integer) : integer;
function realt(n,k:integer) : integer;
{without memoization}
begin {realt}
if k = 0 then
realt := 1
else
if n = 0 then
realt := 0
else
realt := t(n,k-1)+t(n-1,k)
end; {realt}
</PRE>
<PRE> begin {t}
if memo[n,k] < 0 then
memo[n,k] := realt(n,k);
t := memo[n,k]
end; {t}
begin {main program}
{initialization}
for i := 0 to 4 do
for j := 0 to 7 do
memo[i,j] := -1;
{How many terms in (a+b+c+d)7?}
writeln(t(4,7));
end.
</PRE>
<P>The assignment statements like
<P><PRE>realt := 0
</PRE>
<P>are the ones that control the values returned by the functions.
These assignment statements are not exactly like <CODE>output</CODE> in Logo
because they do not cause the function to return immediately. They act just
like ordinary assignments, as if there were actually a variable named <CODE>
realt</CODE> or <CODE>t</CODE>; when the statement part of the function is finished,
whatever value was most recently assigned to the function name is the one
that's used as the return value. (In fact the functions in program <CODE>
multi</CODE> are written so that only one such assignment is carried out, and
there are no more statements to execute after that assignment. That's a
pretty common programming style; it's rare to change the assignment to the
function name once it's been made.)
<P>Apart from arbitrary syntactic details, Pascal's design with respect to
procedures and functions is similar to Logo's, so I can't ask why the two
languages made different choices. It's probably just as well, since you
shouldn't get the impression that Pascal is the exact opposite of Logo in
every way. Instead we could compare these two languages with Lisp, in
which there are only operations, or most versions of BASIC, in which
there are only commands. But I don't have the space to teach you enough
about those languages to make such a comparison meaningful.
<P><H2>Call by Value and Call by Reference</H2>
<P>Consider this Logo procedure:
<P><PRE>to increment :var
make :var (thing :var)+1
end
? <U>make "baz 23</U>
? <U>increment "baz</U>
? <U>print :baz</U>
24
</PRE>
<P>The input to <CODE>increment</CODE> is the <EM>name</EM> of a variable
that you want to increment. A similar technique is used, for example, in
the <CODE>push</CODE> and <CODE>pop</CODE> library procedures, which take the name of a stack
variable as input. The reason this technique is necessary is that we want
the procedure to be able to modify the variable--to assign it a new value.
<P>The same technique won't work in Pascal. For one thing, the association of
a name with a variable only exists at compile time. Once the compiled
program is running, the variables have no names, only addresses in computer
memory. Also, <CODE>increment</CODE> takes advantage of dynamic scope because the
variable it wants to modify isn't its own, but rather a variable accessible
to the calling procedure.
<P>Here's how you do it in Pascal:
<P><PRE>procedure increment(var v:integer);
begin
v := v+1;
end;
</PRE>
<P>What's new here is the reserved word <CODE>var</CODE> in the argument
list. This word indicates that <CODE>v</CODE> is a <EM>variable parameter;</EM>
ordinary ones are called <EM>value parameters.</EM> <CODE>Increment</CODE> would be
used in a program like this:
<P><PRE>program whatzit;
var gub:integer;
begin
...
gub := 5;
increment(gub);
...
end.
</PRE>
<P>Suppose <CODE>increment</CODE> had been written without the word <CODE>var</CODE> in its
header. In that case, when the statement <CODE>increment(gub)</CODE> was executed
here's what would happen. First, the actual argument to <CODE>increment</CODE>
(namely <CODE>gub</CODE>) would be evaluated. The value would be 5, since that's
the value of the variable <CODE>gub</CODE>. Then that value would be assigned to
the local variable <CODE>v</CODE> in <CODE>increment</CODE>. Then the instruction part of
<CODE>increment</CODE> would be run. The assignment statement there would change
the value of <CODE>v</CODE> from 5 to 6. Then <CODE>increment</CODE> would be finished,
and its local variable <CODE>v</CODE> would disappear. All of this would have no
effect on the variable <CODE>gub</CODE>. This ordinary interpretation of <CODE>v</CODE>
is called <EM>call by value</EM> because what gets associated with
the name <CODE>v</CODE> is the <EM>value</EM> of the actual argument, 5 in this
example, regardless of how that 5 was derived. For example, the instruction
in the main program could have been
<P><PRE>increment(2+3);
</PRE>
<P>and it wouldn't have mattered.
<P>
Making <CODE>v</CODE> a variable parameter instead of a value parameter changes the
operation of the program substantially. When <CODE>increment</CODE> is invoked,
the actual argument <EM>must</EM> be a variable name, not an arbitrary
expression. Pascal does not find the value of the actual argument and pass
that value along to <CODE>increment</CODE>; instead, the formal parameter
<CODE>v</CODE> becomes <EM>another name for the same variable</EM> named in the
actual argument. In this example, <CODE>v</CODE> becomes another name for <CODE>
gub</CODE>. So the assignment statement
<P><PRE>v := v+1
</PRE>
<P>isn't really about the local variable <CODE>v</CODE> at all; it's another
way to say
<P><PRE>gub := gub+1
</PRE>
<P>and so it <EM>does</EM> affect the variable in the calling block.
This use of variable parameters is called <EM>call by reference</EM>
because the formal parameter (<CODE>v</CODE>) <EM>refers</EM> to another variable.
<P>One way to think about call by reference is that it provides, in effect, a
sort of limited dynamic scope. It's a way for a superprocedure to
allow a subprocedure access to one selected variable from the
superprocedure's lexical environment. Because this permission is
given to the subprocedure explicitly, call by reference doesn't give rise to
the possible naming bugs that argue against dynamic scope in general. Also,
dynamic scope as used in Logo has the problem that you have to be careful
not to allow a formal parameter name to be the same as the name of a
variable you want to use from the superprocedure's environment. For example,
in the Logo version of <CODE>increment</CODE>, what if you wanted to use <CODE>
increment</CODE> to increment a variable named <CODE>var</CODE>? If you try to say
<P><PRE>increment "var
</PRE>
<P>it won't work, because <CODE>increment</CODE> will end up trying to
increment its own formal parameter. (This is why the inputs to some of the
Berkeley Logo library procedures have such long, obscure names.) But the
Pascal <CODE>increment</CODE> would have no trouble with a variable named <CODE>v</CODE>
in the calling procedure.
<P>On the other hand, call by reference is a little mysterious. If you've
understood all of the above, and you know exactly when you should say <CODE>var</CODE>
in a formal parameter list and when you shouldn't, you're doing better than
most beginning Pascal students. In Logo there is only one rule about
passing inputs to procedures; to make something like <CODE>increment</CODE> work,
you <EM>explicitly</EM> pass the <EM>name</EM> of a variable as input.
<P>Call by reference is generally used when a subprocedure needs to change the
value of a variable in a superprocedure. But there is also another
situation in which some people use it. Suppose you want to write a
procedure that takes a large array as an argument. If you make the array a
value parameter, Pascal will allocate space for the array in the
subprocedure and will copy each member of the array from the
superprocedure's variable into the subprocedure's variable as the first step
in invoking the subprocedure. This time-consuming array copying can be
avoided by declaring the array as a variable parameter, thereby giving the
subprocedure direct access to the superprocedure's array. Pascal enthusiasts
consider this use of call by reference cheating, though, because it creates
the possibility that the subprocedure could accidentally change something in
the superprocedure's array. Call by value is safer, from this point of view.
<P><H2>Parameters in Logo: Call by Binding</H2>
<P>Does Logo use call by value or call by reference for passing arguments to
procedures? The official textbook answer is "call by value," but I find
that misleading, because those two categories really make sense
only in a language with a particular idea of what a variable is. A
Logo variable is different from a Pascal variable in a
subtle way. If you can understand this difference between the two
languages, you'll have learned something very valuable.
<P>In Logo, the world is full of data (words, lists, and arrays).
These data may or may not be associated with variables. For example, when
you enter an instruction like
<P><PRE>print butlast butfirst [Yes, this is a list evidently.]
</PRE>
<P>three different lists are involved: the one you typed explicitly
in the instruction and two smaller lists. None of these three lists is
the value of any variable. A <EM>variable</EM> is an
association (called a <EM>binding</EM>) between a name and a datum. If
you enter the instruction
<P><PRE>make "langs [Logo Lisp APL]
</PRE>
<P>we say that the name <CODE>langs</CODE> <EM>is bound to</EM> the indicated
list. If you then do
<P><PRE>make "mylangs :langs
</PRE>
<P>we say that the name <CODE>mylangs</CODE> is bound to the <EM>same</EM>
datum as <CODE>langs</CODE>. We're dealing with one list that has two names.
<P>
<CENTER><IMG SRC="https://people.eecs.berkeley.edu/~bh/v3ch4/bindings.gif" ALT="figure: bindings"></CENTER>
<P>
<P>In Pascal a variable is not a binding in this sense. A Pascal variable <EM>
is</EM> the datum it contains. If you have two array variables
<P><PRE>var this,that: array [1..10] of integer;
</PRE>
<P>and you do an assignment like
<P><PRE>this := that;
</PRE>
<P>then there are two <EM>separate</EM> arrays that happen to have
equal values stored in them. The same thing is true, although it's less
obviously meaningful, about scalars. If integer variables <CODE>i</CODE> and <CODE>
j</CODE> both have the value 10, then there are <EM>two different integers</EM>
that happen to have the same value. That's not the way a mathematician uses
the word "integer"; to a mathematician, there is only one 10. But to a
Pascal programmer, an integer isn't something like 10; an integer is a box,
and in the box there might be something like 10.
<P>In Logo, a variable assignment (that is, an invocation of <CODE>make</CODE>)
changes the <EM>binding</EM> of the given variable name so that that name is
now associated with a different datum. In Pascal, a variable assignment
changes the value of <EM>the datum</EM> that is unalterably associated with
the named variable.
<P>The official textbook story is this: A Logo variable is a box, just as
a Pascal variable is a box. The difference is that what goes in the box,
in Logo, is a <EM>pointer</EM> to the datum of interest. (A binding,
officially, is the connection between the variable's name and its box.
So there are two levels of indirection in finding what we ordinarily
think of as the value of a variable: First the binding gets us from the
name to a box--a location in the computer's memory--and then in that box we
find a pointer, which gets us to <EM>another</EM> location in memory, which
holds the actual information we want. In this model, call by reference
can easily be described by saying that two different names are bound to
the same box.) From this point of view, it makes
sense to say that Logo uses call by value, because the "value" in question
is the pointer, which is indeed copied when a procedure is called.
<P>But ordinarily we don't think of the value of a Logo variable as being a
pointer; we think that the value of the variable is a word, a list, or an
array. From that point of view, parameter passing in Logo acts like call by
reference in some ways but like call by value in other ways. For example,
call by value makes a copy of the datum being passed. Logo does not copy
the actual datum, so in that respect it's like call by reference. On the
other hand, assigning a new value to a Logo formal parameter does not change
the value of any variables in the calling procedure; in that way, Logo works
like call by value. On the third hand, if the datum to which the formal
parameter is bound is a <EM>mutable</EM> data structure, such as an array, a
Logo subprocedure <EM>can</EM> change the value of a variable in the calling
procedure, not by assigning a new value to the formal parameter name
(changing the binding), but by invoking <CODE>setitem</CODE> on the shared datum
(altering the bound datum itself).
<P>The following chart is a graphic representation of the ideas in the
last paragraph. The three columns represent Pascal call by value, Pascal
call by reference, and Logo. In each case the main program has two arrays
named <CODE>actual</CODE> and <CODE>other</CODE>; it then invokes a procedure <CODE>proc</CODE>
using <CODE>actual</CODE> as the actual argument providing the value for <CODE>
proc</CODE>'s formal parameter <CODE>formal</CODE>. Here are the programs:
<P><TABLE>
<TR><TH>Pascal call by value<TH> <TH>Pascal call by reference
<TR><TD>
<TR><TD valign="top"><PRE>program pgm;
type pair = array [0..1]
of integer;
var actual,other: pair;
procedure proc(formal:pair);
begin
<EM>body</EM>
end
begin
actual[0] := 3;
actual[1] := 4;
other[0] := 5;
other[1] := 6;
proc(actual)
end.
</PRE><TD><TD valign="top"><PRE>program pgm;
type pair = array [0..1]
of integer;
var actual,other: pair;
procedure proc(var formal:pair);
begin
<EM>body</EM>
end
begin
actual[0] := 3;
actual[1] := 4;
other[0] := 5;
other[1] := 6;
proc(actual)
end.
</PRE></TABLE>
<P><TABLE><TR><TH>Logo
<TR><TD>
<TR><TD><PRE>make "actual {3 4}@0
make "other {5 6}@0
proc :actual
to proc :formal
<EM>body</EM>
end
</PRE></TABLE>
<P>
<CENTER><IMG SRC="callby.gif" ALT="figure: callby"></CENTER>
<P>The first row of the figure shows the situation when <CODE>proc</CODE> is
entered, before its body is executed. The second row shows what happens if
<CODE>proc</CODE> contains an assignment of <CODE>other</CODE> to <CODE>formal</CODE>, i.e.,
<P><PRE>formal := other
</PRE>
<P>in either Pascal version or
<P><PRE>make "formal :other
</PRE>
<P>in the Logo version. The third row shows what happens if, instead,
<CODE>proc</CODE> contains an assignment of just one member of the array, i.e.,
<P><PRE>formal[1] := other[1]
</PRE>
<P>in either Pascal version or
<P><PRE>setitem 1 :formal (item 1 :other)
</PRE>
<P>in the Logo version. Your goal is to see what happens to <CODE>
actual</CODE> in each case when <CODE>proc</CODE> is finished.
<P>Our final Pascal program example, showing the use of call by reference,
is a version of the partition sort from Chapter 3 that uses the
technique of exchanging two array members when appropriate to divide the
array into two partitions "in place" (without requiring the allocation of
extra arrays to hold the partitions). This program is adapted from Jon
Bentley's version in <EM>Programming Pearls</EM> (Addison-Wesley, 1986).
It's much closer in style to the real quicksort than my list-based version.
<P>In the partition sort program of Chapter 3, I had to put a lot of effort
into preventing a situation in which every member of the list being sorted
ended up on the same side of the partition value. The quicksort solution
starts by choosing some member of the array as the partition value and
excluding that member from the partitioning process. As a result, the worst
possible case is that the <EM>n</EM> members of the array are divided into the
partitioning member, a partition of size <EM>n</EM>−1, and a partition of size zero.
If we're unlucky enough to hit that case every time, we'll have an O(<EM>n</EM><SUP><SMALL>2</SMALL></SUP>)
running time, but not an infinite loop.
<P>How do we choose the partitioning member? It turns out that just picking
one at random is surprisingly successful; sometimes you get a very bad
choice, but usually not. But in this program I'm using a popular method
that tends to work a little better (that is, to give more balanced partition
sizes): The program finds the first member of the unsorted array, the last
member, and the one halfway in between, and chooses the <EM>median</EM> of
these three values--the one that's neither the largest nor the smallest.
<P>Once the partitioning member has been chosen, the goal is to rearrange the
array members into an order like this:
<P><CENTER><IMG SRC="partition.gif" ALT="figure: partition"></CENTER>
<P>If other members of the array have the same value as the one we've
chosen as the partitioning member, it doesn't really matter in which partition
they end up. What does matter is that before doing the partitioning, we
don't know where in the array the partitioning member will belong, so how can
we keep from bumping into it as we rearrange the other members? The
solution is that the partitioning member is temporarily kept in the leftmost
possible position; the other members are partitioned, and then the
partitioning member is swapped back into its proper position.
<P>The partition works using two <EM>index</EM> variables <CODE>i</CODE> and <CODE>j</CODE>,
which start at the leftmost and rightmost ends of the part of the array that
we're sorting. (Remember that this algorithm uses recursive calls to sort
each partition, so that might not be all 100 members of the full array.) We
move <CODE>i</CODE> toward the right, and <CODE>j</CODE> toward the left, until we find
two members out of place. That is, we look for a situation in which <CODE>
data[i]</CODE> is greater than the partitioning member and <CODE>data[j]</CODE> is
smaller than the partitioning member. We then interchange those two members
of the array and continue until <CODE>i</CODE> and <CODE>j</CODE> meet in the middle.
Procedure <CODE>exch</CODE> has two variable parameters and exchanges their values.
<P>Program <CODE>psort</CODE> illustrates a fairly common but not obvious technique:
the array <CODE>data</CODE> contains 100 "real" members in positions 0 to 99 but
also has a "fence" or "sentinel" member (with index 100)
just so that the program doesn't have to make a special case check for the
index variable <CODE>i</CODE> reaching the end of the array. The
value of <CODE>data[100]</CODE> is guaranteed to be greater than all the numbers that
are actually being sorted. Having this extra member in the array avoids the
need for an extra comparison of the form
<P><PRE>if i > upper then ...
</PRE>
<P>and thereby helps make the program a little faster.
<P><PRE>program psort;
{partition sort demo}
var data: array [0..100] of integer;
i: integer;
procedure showdata;
{print the array}
var i: integer;
begin {showdata}
for i := 0 to 99 do
begin
if i mod 20 = 0 then writeln;
write(data[i]:3)
end;
writeln;
writeln
end; {showdata}
function median(lower,upper:integer):integer;
{find the median of three values from the data array}
var mid: integer;
begin
mid := (lower+upper) div 2;
if (data[lower] <= data[mid]) and (data[mid] <= data[upper]) then
median := mid
else if (data[lower] >= data[mid]) and
(data[mid] >= data[upper]) then
median := mid
else if (data[mid] <= data[lower]) and
(data[lower] <= data[upper]) then
median := lower
else if (data[mid] >= data[lower]) and
(data[lower] >= data[upper]) then
median := lower
else median := upper
end;
procedure sort(lower,upper:integer);
{sort part of the array}
var key,i,j:integer;
procedure exch(var a,b:integer);
{exchange two integers}
var temp:integer;
begin {exch}
temp := a;
a := b;
b := temp
end; {exch}
begin {sort}
if upper > lower then
begin
exch (data[lower],data[median(lower,upper)]);
key := data[lower];
i := lower;
j := upper+1;
repeat
i := i+1
until data[i] >= key;
repeat
j := j-1
until data[j] <= key;
while (i <= j) do
begin
exch(data[i], data[j]);
repeat
i := i+1
until data[i] >= key;
repeat
j := j-1
until data[j] <= key
end;
exch(data[lower], data[j]);
sort(lower,j-1);
sort(i,upper)
end
end; {sort}
begin {main program}
data[100] := 200;
for i := 0 to 99 do
data[i] := random(100);
writeln('Data before sorting:');
showdata;
sort(0,99);
writeln('Data after sorting:');
showdata
end.
</PRE>
<P>
<P><A HREF="../v3-toc2.html">(back to Table of Contents)</A>
<P><A HREF="../v3ch3/v3ch3.html"><STRONG>BACK</STRONG></A>
chapter thread <A HREF="../v3ch5/v3ch5.html"><STRONG>NEXT</STRONG></A>
<P>
<ADDRESS>
<A HREF="../index.html">Brian Harvey</A>,
<CODE>bh@cs.berkeley.edu</CODE>
</ADDRESS>
</BODY>
</HTML>