|
Program file for this chapter: basic
The BASIC programming language was designed by John Kemeny and Thomas Kurtz in the late 1960s. (The name is an acronym for Beginner's All-purpose Symbolic Instruction Code.) It was first implemented on a large, central computer facility at Dartmouth; the designers' goal was to have a language that all students could use for simple problems, in contrast to the arcane programming languages used by most experts at that time.
A decade later, when the microcomputer was invented, BASIC took on a new importance. Kemeny and Kurtz designed a simple language for the sake of the users, but that simplicity also made the language easy for the computer! Every programming language requires a computer program to translate it into instructions that the computer can carry out. For example, the Logo programs you write are translated by a Logo interpreter. But Logo is a relatively complex language, and a Logo interpreter is a pretty big program. The first microcomputers had only a few thousand bytes of memory. (Today's home computers, by contrast, have several million bytes.) Those early personal computers couldn't handle Logo, but it was possible to write a BASIC interpreter that would fit them. As a result, BASIC became the near-universal language for amateur computer enthusiasts in the late 1970s and early 1980s.
Today's personal computers come with translators for a wide variety of programming languages, and also with software packages that enable many people to accomplish their computing tasks without writing programs of their own at all. BASIC is much less widely used today, although it has served as the core for Microsoft's "Visual Basic" language.
In this chapter, I want to show how Logo's define
command can be
used in a program-writing program. My program will translate BASIC
programs into Logo programs. I chose BASIC for the same reason the
early microcomputers used it: It's a small language and the translator
is relatively easy to write. (Kemeny and Kurtz, the designers of BASIC,
have criticized the microcomputer implementations as too simple and
as unfaithful to their original goals. My implementation will share that
defect, to make the project easier. Don't use this version as a basis on
which to judge the language! For that you should investigate True Basic,
the version that Kemeny and Kurtz wrote themselves for personal computers.)
Here's a typical short BASIC program:
10 print "Table of Squares" 20 print 30 print "How many values would you like?" 40 input num 50 for i=1 to num 60 print i, i*i 70 next i 80 end
And here's what happens when we run it:
Table of Squares How many values would you like? 5 1 1 2 4 3 9 4 16 5 25
Each line in the sample BASIC program begins with a line number. These numbers are used for program editing. Instead of the modern screen editors with which you're familiar, the early versions of BASIC had a very primitive editing facility; you could replace a line by typing a new line with the same number. There was no way to replace less than an entire line. To delete a line completely, you'd enter a line containing just the number. The reason the line numbers in this program are multiples of ten is to leave room for inserting new lines. For example, I could say
75 print "Have a nice day."
to insert a new line between lines 70 and 80. (By the way, the earliest versions of Logo used a similar line numbering system, except that each Logo procedure was separately numbered. The editing technique isn't really part of the language design; early systems used "line editors" because they had typewriter-like paper terminals instead of today's display screens. I'm using a line editor in this project because it's easy to implement!)
The BASIC language consists of one or two dozen commands, depending on the version used. My BASIC dialect understands only these ten commands:
LET variable = value PRINT values INPUT variables FOR variable = value TO value NEXT variable IF value THEN command GOTO linenumber GOSUB linenumber RETURN END
Unlike Logo procedure calls, which consist of the procedure
name followed by inputs in a uniform format, each BASIC command has its
own format, sometimes including internal separators such as the equal sign
and the word to
in the for
command format, or the word then
in the if
command format.
In some versions of BASIC, including this one, a single line can contain more than one command, if the commands are separated with colons. Thus the same program shown earlier could also be written this way:
10 print "Table of Squares":print 30 print "How many values would you like?":input num 50 for i=1 to num : print i, i*i : next i 80 end
The let
command assigns a value to a variable, like Logo's
make
procedure. Unlike Logo, BASIC does not have the rule that
all inputs are evaluated before applying the command. In particular, the
word after let
must be the name of the variable, not an
expression whose value is the name. Therefore the name is not quoted.
Also, a variable can't have the same name as a procedure, so there is no
need for anything like Logo's use of the colon to indicate a variable
value. (This restricted version of BASIC doesn't have named procedures at
all, like some early microcomputer versions.)
make "x :y + 3 (Logo) let x = y + 3 (BASIC)
In my subset of BASIC, the value of a variable must be a number. More complete BASIC dialects include string variables (like words in Logo) and arrays (like Logo's arrays).
The value to be assigned to a variable can be computed using an arithmetic
expression made up of variables, numbers, the arithmetic operators
+
, -
, *
, and /
, and
parentheses for grouping.
The print
command is similar to Logo's print procedure in that it
prints a line on the screen. That line can include any number of values.
Here is an example print
command:
print "x = "; x, "y = "; y, "sum = "; x+y
In this example two kinds of values are printed: arithmetic
values (as in the let
command) and strings. A string is any
sequence of characters surrounded by quotation marks.
Notice that the values in this example are separated by punctuation marks, either commas or semicolons. When a semicolon is used, the two values are printed right next to each other, with no space between them. (That's why each of the strings in this example ends with a space.) When a comma is used, BASIC prints a tab character between the two values, so that values on different lines will line up to form columns. (Look again at the table of squares example at the beginning of this chapter.)
The input
command is the opposite of print
; it reads values from
the keyboard and assigns them to variables. There is nothing in Logo exactly
like input
. Instead, Logo has operations readword
and
readlist
that output the contents of a line; those values can be
assigned to variables using make
or can be used in some other way.
The Logo approach is more flexible, but the early versions of BASIC didn't
have anything like Logo's operations. The input
command will also
accept a string in quotation marks before its list of variables; that string
is printed as a prompt before BASIC reads from the keyboard. (BASIC does not
start a new line after printing the prompt, so the effect is like Logo's
type
command rather than like print
.) Here's an example:
input "Please enter x and y: " x,y
The user can type the values for x and y on the same line,
separated by spaces, or on separate lines. BASIC keeps reading lines until
it has collected enough numbers for the listed variables. Notice that the
variable names in the input
command must be separated by commas, not
by semicolons.
The for
and next
commands work together to provide
a numeric iteration capability like Berkeley Logo's for
procedure. The for
command format includes a variable name, a
starting value, and an ending value. (The step value is always 1.) The
named variable is given the specified starting value. If that value is less
than the ending value, then all of the commands between the for
command and the matching next
command (the one with the same
named variable) are carried out. Then the variable is increased by 1, and
the process continues until the ending value is reached. For
and next
pairs with different variables can be nested:
10 input "Input size: " num 20 for i = 1 to num 30 for j = i to num 40 print i;" ";j 50 next j:next i 60 end Input size: 4 1 1 1 2 1 3 1 4 2 2 2 3 2 4 3 3 3 4 4 4
Notice that the next j
must come before the next i
so
that the for
/next
pairs are properly nested.
The if
command allows conditional execution, much like Logo's
if
command, but with a different notation. Instead of taking
an instruction list as an input, BASIC's if
uses the keyword
then
to introduce a single conditional command. (If you want
to make more than one command conditional, you must combine if
with goto
, described next.) The value that controls the
if
must be computed using one of the operators =
,
<
, or >
for numeric comparison.*
*Notice that the equal sign has two
meanings in BASIC. In the
let
command, it's like Logo's
make
; in the if
command, it's like Logo's
equalp
. In the early 1980s, Logo enthusiasts had fierce
arguments with BASIC fans, and this sort of notational inconsistency was one
of the things that drove us crazy! (More serious concerns were the lack of
operations and of recursion in the microcomputer versions of
BASIC.)
The goto
command transfers control to the beginning of a command
line specified by its line number. It can be used with if
to make a
sequence of commands conditional:
10 input x 20 if x > 0 then goto 100 30 print "x is negative." 40 print "x = "; x 50 goto 200 100 print "x is positive." 200 end
The gosub
and return
commands provide a rudimentary procedure
calling mechanism. I call it "rudimentary" because the procedures have no
inputs, and can only be commands, not operations. Also, the command lines
that make up the procedure are also part of the main program, so you
generally need a goto
in the main program to skip over them:
10 let x=7 20 gosub 100 30 let x=9 40 gosub 100 50 goto 200 100 print x, x*x 110 return 200 end
Finally, the end
command ends the program. There must be an end
at the end of a BASIC program, and there should not be one anywhere else.
(In this implementation of BASIC, an end
stops the BASIC program even
if there are more lines after it. It's roughly equivalent to a throw
to toplevel
in Logo.)
To start the translator, run the Logo procedure basic
with no inputs.
You will then see the BASIC prompt, which is the word READY
on a line
by itself.
At the prompt you can do either of two things. If you type a line starting with a line number, that line will be entered into your BASIC program. It is inserted in order by line number. Any previous line with the same number will be deleted. If the line you type contains only a line number, then the line in the program with that number will be deleted.
If your line does not start with a number, then it is taken as an
immediate command, not as part of the program. This version of
BASIC recognizes only three immediate commands: The word run
means to run your program, starting from the smallest line number. The word
list
means to print out a listing of the program's lines, in
numeric order. The word exit
returns to the Logo prompt.
There are two kinds of translators for programming languages: compilers and interpreters. The difference is that a compiler translates one language (the source language) into another (the target language), leaving the result around so that it can be run repeatedly without being translated again. An interpreter translates each little piece of source language into one action in the target language and runs the result, but does not preserve a complete translated program in the target language.
Ordinarily, the target language for both compilers and interpreters is the "native" language of the particular computer you're using, the language that is wired into the computer hardware. This machine language is the only form in which a program can actually be run. The BASIC compiler in this chapter is quite unrealistic in that it uses Logo as the target language, which means that the program must go through another translation, from Logo to machine language, before it can actually be run. For our purposes, there are three advantages to using Logo as the target language. First, every kind of computer has its own machine language, so I'd have to write several versions of the compiler to satisfy everyone if I compiled BASIC into machine language. Second, I know you know Logo, so you can understand the resulting program, whereas you might not be familiar with any machine language. Third, this approach allows me to cheat by leaving out a lot of the complexity of a real compiler. Logo is a "high level" language, which means that it takes care of many details for us, such as the allocation of specific locations in the computer's memory to hold each piece of information used by the program. In order to compile into machine language, I'd have to pay attention to those details.
Why would anyone want an interpreter, if the compiler translates the program once and for all, while the interpreter requires retranslation every time a command is carried out? One reason is that an interpreter is easier to write, because (just as in the case of a compiler with Logo as the target language) many of the details can be left out. Another reason is that traditional compilers work using a batch method, which means that you must first write the entire program with a text editor, then run the compiler to translate the program into machine language, and finally run the program. This is okay for a working program that is used often, but not recompiled often. But when you're creating a program in the first place, there is a debugging process that requires frequent modifications to the source language program. If each modification requires a complete recompilation, the debugging is slow and frustrating. That's why interpreted languages are often used for teaching--when you're learning to program, you spend much more time debugging a program than running the final version.
The best of both worlds is an incremental compiler, a compiler that can recompile only the changed part when a small change is made to a large program. For example, Object Logo is a commercial version of Logo for the Macintosh in which each procedure is compiled when it is defined. Modifying a procedure requires recompiling that procedure, but not recompiling the others. Object Logo behaves like an interpreter, because the user doesn't have to ask explicitly for a procedure to be compiled, but programs run faster in Object Logo than in most other versions because each procedure is translated only once, rather than on every invocation.
The BASIC translator in this chapter is an incremental compiler. Each
numbered line is compiled into a Logo procedure as soon as it is typed in.
If the line number is 40
then the resulting procedure will be
named basic%40
. The last step in each of these procedures is
to invoke the procedure for the next line. The compiler maintains a list of
all the currently existing line numbers, in order, so the run
command is implemented by saying
run (list (word "basic% first :linenumbers))
Actually, what I just said about each procedure ending with an invocation of the next one is slightly simplified. Suppose the BASIC program starts
10 let x=3 20 let y=9 30 ...
and we translate that into
to basic%10 make "%x 3 basic%20 end to basic%20 make "%y 9 basic%30 end
Then what happens if the user adds a new line numbered 15? We
would have to recompile line 10 to invoke basic%15
instead of
basic%20
. To avoid that, each line is compiled in a way that
defers the choice of the next line until the program is actually run:
to basic%10 make "%x 3 nextline 10 end to basic%20 make "%y 9 nextline 20 end
This solution depends on a procedure nextline
that finds
the next available line number after its argument:
to nextline :num make "target member :num :linenumbers if not emptyp :target [make "target butfirst :target] if not emptyp :target [run (list (word "basic% first :target))] end
Nextline
uses the Berkeley Logo primitive member
,
which is like the predicate memberp
except that if the first
input is found as a member of the second, instead of giving
true
as its output, it gives the portion of the second input
starting with the first input:
? show member "the [when in the course of human events] [the course of human events]
If the first input is not a member of the second, member
outputs an empty word or list, depending on the type of the second input.
The two separate emptyp
tests are used instead of a single if
because the desired line number might not be in the list at all, or it might
be the last one in the list, in which case the butfirst
invocation
will output an empty list. (Neither of these cases should arise. The first
means that we're running a line that doesn't exist, and the second means
that the BASIC program doesn't end with an end
line. But the procedure
tries to avoid disaster even in these cases.)
Look again at the definition of basic%10
. You'll see that the
variable named x
in the BASIC program is named %x
in the
Logo translation. The compiler uses this renaming technique to ensure that
the names of variables and procedures in the compiled program don't conflict
with names used in the compiler itself. For example, the compiler uses a
variable named linenumbers
whose value is the list of line numbers.
What if someone writes a BASIC program that says
10 let linenumbers = 100
This won't be a problem because in the Logo translation, that
variable will be named %linenumbers
.
The compiler can be divided conceptually into four parts:
let
is a single
word, but x+1
should be understood as three separate words.
if
, the parser expects an expression followed by the word
then
and another command.
nextline
procedure discussed earlier is an example.
Real compilers have the same structure, except of course that the code generator produces machine language instructions rather than Logo instructions. Also, a professional compiler will include an optimizer that looks for ways to make the compiled program as efficient as possible.
A reader is a program that reads a bunch of characters
(typically one line, although not in every language) and divides those
characters into meaningful units. For example, every Logo implementation
includes a reader that interprets square brackets as indications of
list grouping. But some of the rules followed by the Logo reader differ
among implementations. For example, can the hyphen character (-
) be
part of a larger word, or is it always a word by itself? In a context in
which it means subtraction, we'd like it to be a word by itself. For
example, when you say
print :x-3
as a Logo instruction, you mean to print three less than the
value of the variable named x
, not to print the value of a variable
whose name is the three-letter word x-3
! On the other hand, if you
have a list of telephone numbers like this:
make "phones [555-2368 555-9827 555-8311]
you'd like the first
of that list to be an entire
phone number, the word 555-2368
, not just 555
. Some Logo
implementations treat every hyphen as a word by itself; some treat
every hyphen just like a letter, and require that you put spaces around
a minus sign if you mean subtraction. Other implementations, including
Berkeley Logo, use a more complicated rule in which the status of the
hyphen depends on the context in which it appears, so that both of the
examples in this paragraph work as desired.
In any case, Logo's reader follows rules that are not appropriate for BASIC.
For example, the colon (:
) is a delimiter in BASIC, so it should be
treated as a word by itself; in Logo, the colon is paired with the variable
name that follows it. In both languages, the quotation mark ("
) is
used to mark quoted text, but in Logo it comes only at the beginning of
a word, and the quoted text ends at the next space character, whereas in
BASIC the quoted text continues until a second, matching quotation mark.
For these and other reasons, it's desirable to have
a BASIC-specific reader for use in this project.
The rules of the BASIC reader are pretty simple. Each invocation of
basicread
reads one line from the keyboard, ending with the Return
or Enter character. Within that line, space characters separate words
but are not part of any word. A quotation mark begins a quoted word that
includes everything up to and including the next matching quotation mark.
Certain characters form words by themselves:
+ - * / = < > ( ) , ; :
All other characters are treated like letters; that is, they can be part of multi-character words.
? show basicread 30 print x;y;"foo,baz",z:print hello+4 [30 print x ; y ; "foo,baz" , z : print hello + 4]
Notice that the comma inside the quotation marks is not made
into a separate word by basicread
. The other punctuation characters,
however, appear in the output sentence as one-character words.
Basicread
uses the Logo primitive readword
to read a line.
Readword
can be thought of as a reader with one trivial rule: The
only special character is the one that ends a line. Everything else is
considered as part of a single long word. Basicread
examines that
long word character by character, looking for delimiters, and accumulating
a sentence of words separated according to the BASIC rules. The
implementation of basicread
is straightforward; you can read the
procedures at the end of this chapter if you're interested. For now,
I'll just take it for granted and go on to discuss the more interesting
parts of the BASIC compiler.
The parser is the part of a compiler that figures out the structure of each piece of the source program. For example, if the BASIC compiler sees the command
let x = ( 3 * y ) + 7
it must recognize that this is a let
command, which
must follow the pattern
LET variable = value
and therefore x
must be the name of a variable, while
( 3 * y ) + 7
must be an expression representing a value. The
expression must be further parsed into its component pieces. Both the
variable name and the expression must be translated into the form they
will take in the compiled (Logo) program, but that's the job of the
code generator.
In practice, the parser and the code generator are combined into one
step; as each piece of the source program is recognized, it is translated
into a corresponding piece of the object program. So we'll see that most
of the procedures in the BASIC compiler include parsing instructions and
code generation instructions. For example, here is the procedure that
compiles a let
command:
to compile.let :command make "command butfirst :command make "var pop "command make "delimiter pop "command if not equalp :delimiter "= [(throw "error [Need = in let.])] make "exp expression queue "definition (sentence "make (word ""% :var) :exp) end
In this procedure, all but the last instruction (the line
starting with queue
) are parsing the source command. The last
line, which we'll come back to later, is generating a Logo make
instruction, the translation of the BASIC let
in the object
program.
BASIC was designed to be very easy to parse. The parser can read a command
from left to right, one word at a time; at every moment, it knows exactly
what to expect. The command must begin with one of the small number of
command names that make up the BASIC language. What comes next depends on
that command name; in the case of let
, what comes next is one word
(the variable name), then an equal sign, then an expression. Each
instruction in the compile.let
procedure handles one of these pieces.
First we skip over the word let
by removing it from the front of the
command:
make "command butfirst :command
Then we read and remember one word, the variable name:
make "var pop "command
(Remember that the pop
operation removes one member
from the beginning of a list, returning that member. In this case
we are removing the variable name from the entire let
command.)
Then we make sure there's an equal sign:
make "delimiter pop "command if not equalp :delimiter "= [(throw "error [Need = in let.])]
And finally we call a subprocedure to read the expression; as we'll see later, that procedure also translates the expression to the form it will take in the object program:
make "exp expression
The parsers for other BASIC commands have essentially the same
structure as this example. They repeatedly invoke pop
to
read one word from the command or expression
to read and
translate an expression. (The if
command is a little more
complicated because it contains another command as a component,
but that inner command is just compiled as if it occurred by itself.
We'll look at that process in more detail when we get to the
code generation part of the compiler.)
Each compilation procedure expects a single BASIC command as its
input. Remember that a line in a BASIC program can include more
than one command. The compiler uses a procedure named split
to break up each line into a list of commands:
? show split [30 print x ; y ; "foo,baz" , z : print hello + 4] [30 [print x ; y ; "foo,baz" , z] [print hello + 4]]
Split
outputs a list whose first member is a line
number; the remaining members are lists, each containing one BASIC
command. Split
works by looking for colons within the
command line.
Here is the overall structure of the compiler, but with only the instructions related to parsing included:
to basic forever [basicprompt] end to basicprompt print "READY make "line basicread if emptyp :line [stop] ifelse numberp first :line [compile split :line] [immediate :line] end to compile :commands make "number first :commands ifelse emptyp butfirst :commands ~ [eraseline :number] ~ [makedef (word "basic% :number) butfirst :commands] end to makedef :name :commands ... foreach :commands [run list (word "compile. first ?) ?] ... end
Basic
does some initialization (not shown) and then
invokes basicprompt
repeatedly. Basicprompt
calls the
BASIC reader to read a line; if that line starts with a number, then
split
is used to transform the line into a list of commands,
and compile
is invoked with that list as input. Compile
remembers the line number for later use, and then invokes makedef
with the list of commands as an input. I've left out most of the
instructions in makedef
because they're concerned with code
generation, but the important part right now is that for each command
in the list, it invokes a procedure named compile.
something
based on the first word of the command, which must be one of the
command names in the BASIC language.
Each line of the BASIC source program is going to be compiled into one
Logo procedure. (We'll see shortly that the BASIC if
and for
commands are exceptions.) For example, the line
10 let x = 3 : let y = 4 : print x,y+6
will be compiled into the Logo procedure
to basic%10 make "%x 3 make "%y 4 type :%x type char 9 type :%y + 6 print [] nextline 10 end
Each of the three BASIC commands within the source line
contributes one or more instructions to the object procedure. Each
let
command is translated into a make
instruction; the
print
command is translated into three type
instructions
and a print
instruction. (The last instruction line in the
procedure, the invocation of nextline
, does not come from
any of the BASIC commands, but is automatically part of the translation
of every BASIC command line.)
To generate this object procedure, the BASIC compiler is going to have
to invoke Logo's define
primitive, this way:
define "basic%10 [[] [make "%x 3] [make "%y 4] ... [nextline 10]]
Of course, these actual inputs do not appear explicitly in
the compiler! Rather, the inputs to define
are variables that
have the desired values:
define :name :definition
The variable name
is an input to makedef
, as we've
seen earlier. The variable definition
is created within
makedef
. It starts out as a list containing just the empty
list, because the first sublist of the input to define
is the
list of the names of the desired inputs to basic%10
, but it has
no inputs. The procedures within the compiler that parse each of the
commands on the source line will also generate object code (that is, Logo
instructions) by appending those instructions to the value of
definition
using Logo's queue
command.
Queue
takes two inputs: the name of a variable whose value is
a list, and a new member to be added at the end of the list. Its effect is
to change the value of the variable to be the extended list.
Look back at the definition of compile.let
above. Earlier we
considered the parsing instructions within that procedure, but deferred
discussion of the last instruction:
queue "definition (sentence "make (word ""% :var) :exp)
Now we can understand what this does: It generates a Logo
make
instruction and appends that instruction to the object
procedure definition in progress.
We can now also think about the output from the expression
procedure.
Its job is to parse a BASIC expression and to translate it into the
corresponding Logo expression. This part of the compiler is one of the
least realistic. A real compiler would have to think about such issues as
the precedence of arithmetic operations; for example, an expression like
3+x*4
must be translated into two machine language instructions, first
one that multiplies x
by 4, and then one that adds the result of that
multiplication to 3. But the Logo interpreter already handles that aspect
of arithmetic for us, so all expression
has to do is to translate
variable references like x
into the Logo form :%x
.
? show expression [3 + x * 4] [3 + :%x * 4]
(We'll take a closer look at translating arithmetic expressions in the Pascal compiler found in the third volume of this series, Beyond Programming.)
We are now ready to look at the complete version of makedef
:
to makedef :name :commands make "definition [[]] foreach :commands [run list (word "compile. first ?) ?] queue "definition (list "nextline :number) define :name :definition make "linenumbers insert :number :linenumbers end
I hope you'll find this straightforward. First we create
an empty definition. Then, for each BASIC command on the line, we
append to that definition whatever instructions are generated by
the code generating instructions for that command. After all the
BASIC commands have been compiled, we add an invocation of nextline
to the definition. Now we can actually define the Logo procedure whose
text we've been accumulating. The last instruction updates the list
of line numbers that nextline
uses to find the next BASIC command
line when the compiled program is running.
In a sense, this is the end of the story. My purpose in this chapter
was to illustrate how define
can be used in a significant project,
and I've done that. But there are a few more points I should explain
about the code generation for some specific BASIC commands, to complete
your understanding of the compiler.
One such point is about the difference between goto
and
gosub
. Logo doesn't have anything like a goto
mechanism; both goto
and gosub
must be implemented
by invoking the procedure corresponding to the given line number. The
difference is that in the case of goto
, we want to invoke that
procedure and not come back! The solution is to compile the BASIC command
goto 40
into the Logo instructions
basic%40 stop
In effect, we are calling line 40 as a subprocedure, but
when it returns, we're finished. Any additional Logo instructions
generated for the same line after the goto
(including the
invocation of nextline
that's generated automatically for
every source line) will be ignored because of the stop
.*
*In fact, the Berkeley Logo interpreter
is clever enough to notice that there is a
stop
instruction
after the invocation of basic%40
, and it arranges things so
that there is no "return" from that procedure. This makes things a little
more efficient, but doesn't change the meaning of the
program.
The next tricky part of the compiler has to do with the for
and
next
commands. Think first about next
. It must
increment the value of the given variable, test that value against a
remembered limit, and, if the limit has not been reached, go to... where?
The for
loop continues with the BASIC command just after the
for
command itself. That might be in the middle of a line, so
next
can't just remember a line number and invoke
basic%N
for line number N. To solve this problem, the
line containing the for
command is split into two Logo
procedures, one containing everything up to and including the
for
, and one for the rest of the line. For example, the line
30 let x = 3 : for i = 1 to 5 : print i,x : next i
is translated into
to basic%30 make "%x 3 make "%i 1 make "let%i 5 make "next%i [%g1] %g1 end to %g1 type :%i type char 9 type :%x print [] make "%i :%i + 1 if not greaterp :%i :let%i [run :next%i stop] nextline 30 end
The first make
instruction in basic%30
is the
translation of the let
command. The remaining four lines are
the translation of the for
command; it must give an initial
value to the variable i
, remember the limit value 5, and
remember the Logo procedure to be used for looping. That latter procedure
is named %g1
in this example. The percent sign is used for the
usual reason, to ensure that the names created by the compiler don't
conflict with names in the compiler itself. The g1
part is a
generated symbol, created by invoking the Berkeley Logo primitive
operation gensym
. Each invocation of gensym
outputs a new symbol, first g1
, then g2
, and so on.
The first four instructions in procedure %g1
(three
type
s and a print
) are the translation of the
BASIC print
command. The next two instructions are the
translation of the next
command; the make
instruction increments i
, and the if
instruction
tests whether the limit has been passed, and if not, invokes the looping
procedure %g1
again. (Why does this say
run :next%i
instead of just %g1
? Remember that
the name %g1
was created during the compilation of the
for
command. When we get around to compiling the
next
command, the code generator has no way to remember which
generated symbol was used by the corresponding for
. Instead it
makes reference to a variable next%i
, named after the variable
given in the next
command itself, whose value is the name of
the procedure to run. Why not just call that procedure itself
next%i
instead of using a generated symbol? The trouble is
that there might be more than one pair of for
and
next
commands in the same BASIC program using the same
variable, and each of them must have its own looping procedure name.)
There is a slight complication in the print
and
input
commands to deal with quoted character strings. The
trouble is that Logo's idea of a word ends with a space, so it's not easy to
translate
20 print "hi there"
into a Logo instruction in which the string is explicitly present in the instruction. Instead, the BASIC compiler creates a Logo global variable with a generated name, and uses that variable in the compiled Logo instructions.
The trickiest compilation problem comes from the if
command, because
it includes another command as part of itself. That included command might
be translated into several Logo instructions, all of which should be
made to depend on the condition that the if
is testing. The solution
is to put the translation of the inner command into a separate procedure,
so that the BASIC command line
50 if x<6 then print x, x*x
is translated into the two Logo procedures
to basic%50 if :%x < 6 [%g2] nextline 50 end to %g2 type :%x type char 9 type :%x * :%x print [] end
Unfortunately, this doesn't quite work if the inner command is
a goto
. If we were to translate
60 if :foo < 10 then goto 200
into
to basic%60 if :%foo < 10 [%g3] nextline 60 end to %g3 basic%200 stop end
then the stop
inside %g3
would stop only %g3
itself, not basic%60
as desired. So the code generator for if
checks to see whether the result of compiling the inner command is a single
Logo instruction line; if so, that line is used directly in the compiled
Logo if
rather than diverted into a subprocedure:
to basic%60 if :%foo < 10 [basic%200 stop] nextline 60 end
How does the code generator for if
divert the result of compiling
the inner command away from the definition of the overall BASIC command
line? Here is the relevant part of the compiler:
to compile.if :command make "command butfirst :command make "exp expression make "delimiter pop "command if not equalp :delimiter "then [(throw "error [Need then after if.])] queue "definition (sentence "if :exp (list c.if1)) end to c.if1 local "definition make "definition [[]] run list (word "compile. first :command) :command ifelse (count :definition) = 2 ~ [output last :definition] ~ [make "newname word "% gensym define :newname :definition output (list :newname)] end
The first few lines of this are straightforwardly parsing the
part of the BASIC if
command up to the word then
. What
happens next is a little tricky; a subprocedure c.if1
is invoked
to parse and translate the inner command. It has to be a subprocedure
because it creates a local variable named definition
; when the
inner command is compiled, this local variable "steals" the generated
code. If there is only one line of generated code, then c.if1
outputs
that line; if more than one, then c.if1
creates a subprocedure
and outputs an instruction to invoke that subprocedure. This technique
depends on Logo's dynamic scope, so that references to the variable
named definition
in other parts of the compiler (such as, for
example, compile.print
or compile.goto
) will refer to this
local version.
We've already seen the most important part of the runtime library: the
procedure nextline
that gets the compiled program from one line
to the next.
There is only one more procedure needed as runtime support; it's called
readvalue
and it's used by the BASIC input
command. In BASIC
, data input is independent of lines. If a
single input
command includes two variables, the user can type
the two desired values on separate lines or on a single line. Furthermore,
two separate input
commands can read values from a
single line, if there are still values left on the line after the first
input
has been satisfied. Readvalue
uses a global
variable readline
whose value is whatever's still available
from the last data input line, if any. If there is nothing available, it
reads a new line of input.
A more realistic BASIC implementation would include runtime library procedures to compute built-in functions (the equivalent to Logo's primitive operations) such as absolute value or the trigonometric functions.
This BASIC compiler leaves out many features of a complete implementation. In a real BASIC, a string can be the value of a variable, and there are string operations such as concatenation and substring extraction analogous to the arithmetic operations for numbers. The BASIC programmer can create an array of numbers, or an array of strings. In some versions of BASIC, the programmer can define named subprocedures, just as in Logo. For the purposes of this chapter, I wanted to make the compiler as simple as possible and still have a usable language. If you want to extend the compiler, get a BASIC textbook and start implementing features.
It's also possible to expand the immediate command capabilities of the
compiler. In most BASIC implementations, for example, you can say
list 100-200
to list only a specified range of lines within the
source program.
A much harder project would be to replace the code generator in this
compiler with one that generates machine language for your computer.
Instead of using define
to create Logo procedures, your compiler
would then write machine language instructions into a data file.
To do this, you must learn quite a lot about how machine language
programs are run on your computer!
(back to Table of Contents) | BACK chapter thread NEXT |
I haven't discussed every detail of the program. For example, you may want to trace through what happens when you ask to delete a line from the BASIC source program. Here is the complete compiler.
to basic make "linenumbers [] make "readline [] forever [basicprompt] end to basicprompt print [] print "READY print [] make "line basicread if emptyp :line [stop] ifelse numberp first :line [compile split :line] [immediate :line] end to compile :commands make "number first :commands make :number :line ifelse emptyp butfirst :commands ~ [eraseline :number] ~ [makedef (word "basic% :number) butfirst :commands] end to makedef :name :commands make "definition [[]] foreach :commands [run list (word "compile. first ?) ?] queue "definition (list "nextline :number) define :name :definition make "linenumbers insert :number :linenumbers end to insert :num :list if emptyp :list [output (list :num)] if :num = first :list [output :list] if :num < first :list [output fput :num :list] output fput first :list (insert :num butfirst :list) end to eraseline :num make "linenumbers remove :num :linenumbers end to immediate :line if equalp :line [list] [foreach :linenumbers [print thing ?] stop] if equalp :line [run] [run (list (word "basic% first :linenumbers)) stop] if equalp :line [exit] [throw "toplevel] print sentence [Invalid command:] :line end ;; Compiling each BASIC command to compile.end :command queue "definition [stop] end to compile.goto :command queue "definition (list (word "basic% last :command) "stop) end to compile.gosub :command queue "definition (list (word "basic% last :command)) end to compile.return :command queue "definition [stop] end to compile.print :command make "command butfirst :command while [not emptyp :command] [c.print1] queue "definition [print []] end to c.print1 make "exp expression ifelse equalp first first :exp "" ~ [make "sym gensym make word "%% :sym butfirst butlast first :exp queue "definition list "type word ":%% :sym] ~ [queue "definition fput "type :exp] if emptyp :command [stop] make "delimiter pop "command if equalp :delimiter ", [queue "definition [type char 9] stop] if equalp :delimiter "\; [stop] (throw "error [Comma or semicolon needed in print.]) end to compile.input :command make "command butfirst :command if equalp first first :command "" ~ [make "sym gensym make "prompt pop "command make word "%% :sym butfirst butlast :prompt queue "definition list "type word ":%% :sym] while [not emptyp :command] [c.input1] end to c.input1 make "var pop "command queue "definition (list "make (word ""% :var) "readvalue) if emptyp :command [stop] make "delimiter pop "command if not equalp :delimiter ", (throw "error [Comma needed in input.]) end to compile.let :command make "command butfirst :command make "var pop "command make "delimiter pop "command if not equalp :delimiter "= [(throw "error [Need = in let.])] make "exp expression queue "definition (sentence "make (word ""% :var) :exp) end to compile.for :command make "command butfirst :command make "var pop "command make "delimiter pop "command if not equalp :delimiter "= [(throw "error [Need = after for.])] make "start expression make "delimiter pop "command if not equalp :delimiter "to [(throw "error [Need to after for.])] make "end expression queue "definition (sentence "make (word ""% :var) :start) queue "definition (sentence "make (word ""let% :var) :end) make "newname word "% gensym queue "definition (sentence "make (word ""next% :var) (list (list :newname))) queue "definition (list :newname) define :name :definition make "name :newname make "definition [[]] end to compile.next :command make "command butfirst :command make "var pop "command queue "definition (sentence "make (word ""% :var) (word ":% :var) [+ 1]) queue "definition (sentence [if not greaterp] (word ":% :var) (word ":let% :var) (list (list "run (word ":next% :var) "stop))) end to compile.if :command make "command butfirst :command make "exp expression make "delimiter pop "command if not equalp :delimiter "then [(throw "error [Need then after if.])] queue "definition (sentence "if :exp (list c.if1)) end to c.if1 local "definition make "definition [[]] run list (word "compile. first :command) :command ifelse (count :definition) = 2 ~ [output last :definition] ~ [make "newname word "% gensym define :newname :definition output (list :newname)] end ;; Compile an expression for LET, IF, PRINT, or FOR to expression make "expr [] make "token expr1 while [not emptyp :token] [queue "expr :token make "token expr1] output :expr end to expr1 if emptyp :command [output []] make "token pop "command if memberp :token [+ - * / = < > ( )] [output :token] if memberp :token [, \; : then to] [push "command :token output []] if numberp :token [output :token] if equalp first :token "" [output :token] output word ":% :token end ;; reading input to basicread output basicread1 readword [] " end to basicread1 :input :output :token if emptyp :input [if not emptyp :token [push "output :token] output reverse :output] if equalp first :input "| | [if not emptyp :token [push "output :token] output basicread1 (butfirst :input) :output "] if equalp first :input "" [if not emptyp :token [push "output :token] output breadstring butfirst :input :output "] if memberp first :input [+ - * / = < > ( ) , \; :] ~ [if not emptyp :token [push "output :token] output basicread1 (butfirst :input) (fput first :input :output) "] output basicread1 (butfirst :input) :output (word :token first :input) end to breadstring :input :output :string if emptyp :input [(throw "error [String needs ending quote.])] if equalp first :input "" ~ [output basicread1 (butfirst :input) (fput (word "" :string "") :output) "] output breadstring (butfirst :input) :output (word :string first :input) end to split :line output fput first :line split1 (butfirst :line) [] [] end to split1 :input :output :command if emptyp :input [if not emptyp :command [push "output reverse :command] output reverse :output] if equalp first :input ": [if not emptyp :command [push "output reverse :command] output split1 (butfirst :input) :output []] output split1 (butfirst :input) :output (fput first :input :command) end ;; Runtime library to nextline :num make "target member :num :linenumbers if not emptyp :target [make "target butfirst :target] if not emptyp :target [run (list (word "basic% first :target))] end to readvalue while [emptyp :readline] [make "readline basicread] output pop "readline end
Brian Harvey,
bh@cs.berkeley.edu