diff options
author | elioat <elioat@tilde.institute> | 2023-08-23 07:52:19 -0400 |
---|---|---|
committer | elioat <elioat@tilde.institute> | 2023-08-23 07:52:19 -0400 |
commit | 562a9a52d599d9a05f871404050968a5fd282640 (patch) | |
tree | 7d3305c1252c043bfe246ccc7deff0056aa6b5ab /js/games/nluqo.github.io/~bh/ssch18/trees.html | |
parent | 5d012c6c011a9dedf7d0a098e456206244eb5a0f (diff) | |
download | tour-562a9a52d599d9a05f871404050968a5fd282640.tar.gz |
*
Diffstat (limited to 'js/games/nluqo.github.io/~bh/ssch18/trees.html')
-rw-r--r-- | js/games/nluqo.github.io/~bh/ssch18/trees.html | 848 |
1 files changed, 848 insertions, 0 deletions
diff --git a/js/games/nluqo.github.io/~bh/ssch18/trees.html b/js/games/nluqo.github.io/~bh/ssch18/trees.html new file mode 100644 index 0000000..3282cb2 --- /dev/null +++ b/js/games/nluqo.github.io/~bh/ssch18/trees.html @@ -0,0 +1,848 @@ +<P> + +<P><A NAME="mondrian"></A><CENTER><IMG SRC="../ss-pics/mondrian.jpg" ALT="figure: mondrian"></CENTER><P><CENTER><EM>Apple Tree in Blossom,</EM> Piet Mondrian +(1912) +</CENTER><P> + +<HTML> +<HEAD> +<TITLE>Simply Scheme: Introducing Computer Science ch 18: Trees</TITLE> +</HEAD> +<BODY> +<HR> +<CITE>Simply Scheme:</CITE> +<CITE>Introducing Computer Science</CITE> 2/e Copyright (C) 1999 MIT +<H2>Chapter 18</H2> +<H1>Trees</H1> + +<TABLE width="100%"><TR><TD> +<IMG SRC="../simply.jpg" ALT="cover photo"> +<TD><TABLE> +<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian +Harvey</A><BR>University of California, Berkeley</CITE> +<TR><TD align="right"><CITE><A HREF="http://ccrma.stanford.edu/~matt">Matthew +Wright</A><BR>University of California, Santa Barbara</CITE> +<TR><TD align="right"><BR> +<TR><TD align="right"><A HREF="../pdf/ssch18.pdf">Download PDF version</A> +<TR><TD align="right"><A HREF="../ss-toc2.html">Back to Table of Contents</A> +<TR><TD align="right"><A HREF="../ssch17/lists.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../ssch19/implement-hof.html"><STRONG>NEXT</STRONG></A> +<TR><TD align="right"><A HREF="http://mitpress.mit.edu/0262082810">MIT +Press web page for <CITE>Simply Scheme</CITE></A> +</TABLE></TABLE> + +<HR> + + +<P>The big advantage of full-featured lists over sentences is their ability to +represent <EM>structure</EM> in our data by means of sublists. +In this chapter we'll look at examples in which we use lists and sublists to +represent two-dimensional information structures. The kinds of structures +we'll consider are called <EM>trees</EM> because they resemble trees in +nature: + +<P><P><CENTER><IMG SRC="../ss-pics/firsttree.jpg" ALT="figure: firsttree"> + <IMG SRC="../ss-pics/tree.jpg" ALT="photo: tree"> +</CENTER> + + +<P> +<P>The components of a tree are called <EM>nodes.</EM> At the +top is the <EM><A NAME="g1"></A><A NAME="g2"></A>root node</EM> of the tree; in the interior of the +diagram there are <EM><A NAME="g3"></A><A NAME="g4"></A>branch nodes;</EM> at the bottom are the <EM><A NAME="g5"></A><A NAME="g6"></A>leaf nodes,</EM> from which no further branches extend. + +<P>We're going to begin by considering a tree as an abstract data type, without +thinking about how lists are used to represent trees. For example, we'll +construct trees using a procedure named <CODE>make-node</CODE>, as if that were a +Scheme primitive. About halfway through the chapter, we'll explore the +relationship between trees and lists. + +<P><H2>Example: The World</H2> + +<P>Here is a tree that represents the world: + +<P><CENTER><IMG SRC="../ss-pics/world.jpg" ALT="figure: world"></CENTER> + +<P>Each node in the tree represents some region of the world. Consider the +node labeled "Great Britain." There are two parts to this node: The +obvious part is the label itself, the name "Great Britain." But +the regions of the world that are included within Great Britain—that is, +the nodes that are attached beneath Great Britain in the figure—are also part +of this node. + +<P>We say that every node has a <EM>datum</EM> and zero or more <EM>children.</EM> For the moment, let's just say that the datum can be +either a word or a sentence. The children, if any, are themselves trees. +Notice that this definition is recursive—a tree is made up of trees. +(What's the base case?) + +<P>This family metaphor is also part of the terminology of +trees.<A NAME="text1" HREF="trees.html#ft1">[1]</A> We say that a node is the <EM>parent</EM> of +another node, or that two nodes are <EM>siblings.</EM> In more advanced +treatments, you even hear things like "grandparent" and "cousin," but we +won't get into that. + +<P>What happens when you prune an actual tree by cutting off a branch? +The cut-off part is essentially a tree in itself, with a smaller trunk +and fewer branches. The metaphor isn't perfect because the cut-off part +doesn't have roots, but still, we can stick the end in the ground and +hope that the cut-off end will take root as a new tree. + +<P>It's the same with a country in our example; each country is a branch node of +the entire world tree, but also a tree in itself. Depending on how you +think about it, Great Britain can be either a component of the entire world +or a collection of smaller locations. So the branch node that represents +Great Britain is the root node of a <EM>subtree</EM> of the entire tree. + +<P><CENTER><IMG SRC="../ss-pics/britain.jpg" ALT="figure: britain"></CENTER> + +<P>What is a node? It might seem natural to think of a node as being just the +information in one of the circles in the diagram—that is, to think of a +node as including only its datum. In that way of thinking, each node would +be separate from every other node, just as the words in a sentence are all +separate elements. However, it will be more useful to think of a node as a +structure that includes everything below that circle also: the datum and the +children. So when we think of the node for Great Britain, we're thinking +not only of the name "Great Britain," but also of everything <EM>in</EM> Great Britain. From this perspective, the root node of a tree +includes the entire tree. We might as well say that the node <EM>is</EM> the +tree. + +<P>The constructor for a tree is actually the constructor for one node, +its root node. Our constructor for trees is therefore called +<CODE><A NAME="g7"></A><CODE>make-node</CODE></CODE>. It takes two arguments: the datum and a (possibly +empty) list of children. As the following example shows, constructing what +we think of as one tree requires the construction of many such nodes. + +<P><PRE>(define world-tree ;; painful-to-type version + (make-node + 'world + (list (make-node + 'italy + (list (make-node 'venezia '()) + (make-node 'riomaggiore '()) + (make-node 'firenze '()) + (make-node 'roma '()))) + (make-node + '(united states) + (list (make-node 'california + (list (make-node 'berkeley '()) + (make-node '(san francisco) '()) + (make-node 'gilroy '()))) + (make-node 'massachusetts + (list (make-node 'cambridge '()) + (make-node 'amherst '()) + (make-node 'sudbury '())))))))) +</PRE> + +<P>You'll notice that we haven't defined all of the places shown +in the figure. That's because we got tired of doing all this typing; +we're going to invent some abbreviations later. For now, we'll take time +out to show you the selectors for trees. +<A NAME="g8"></A> +<A NAME="g9"></A> + +<P><PRE>> (datum world-tree) +WORLD + +> (datum (car (children world-tree))) +ITALY + +> (datum (car (children (cadr (children world-tree))))) +CALIFORNIA + +> (datum (car (children (car (children + (cadr (children world-tree))))))) +BERKELEY +</PRE> + +<P><CODE>Datum</CODE> of a tree node returns the datum of that node. +<CODE>Children</CODE> of a node returns a list of the children of the node. +(A list of trees is called a <EM>forest.</EM>) + +<P>Here are some abbreviations to help us construct the world tree with less +typing. Unlike <CODE>make-node</CODE>, <CODE>datum</CODE>, and <CODE>children</CODE>, which are +intended to work on trees in general, these abbreviations were designed +with the world tree specifically in mind: + +<P><PRE>(define (<A NAME="g10"></A>leaf datum) + (make-node datum '())) + +(define (<A NAME="g11"></A>cities name-list) + (map leaf name-list)) +</PRE> + +<P>With these abbreviations the world tree is somewhat easier to +define: + +<P><PRE>(define world-tree + (make-node + 'world + (list (make-node + 'italy + (cities '(venezia riomaggiore firenze roma))) + (make-node + '(united states) + (list (make-node + 'california + (cities '(berkeley (san francisco) gilroy))) + (make-node + 'massachusetts + (cities '(cambridge amherst sudbury))) + (make-node 'ohio (cities '(kent))))) + (make-node 'zimbabwe (cities '(harare hwange))) + (make-node 'china + (cities '(beijing shanghai guangzhou suzhou))) + (make-node + '(great britain) + (list + (make-node 'england (cities '(liverpool))) + (make-node 'scotland + (cities '(edinburgh glasgow (gretna green)))) + (make-node 'wales (cities '(abergavenny))))) + (make-node + 'australia + (list + (make-node 'victoria (cities '(melbourne))) + (make-node '(new south wales) (cities '(sydney))) + (make-node 'queensland + (cities '(cairns (port douglas)))))) + (make-node 'honduras (cities '(tegucigalpa)))))) +</PRE> + +<P> +<P><H2>How Big Is My Tree?</H2> + +<P>Now that we have the tree, how many cities are there in our world? + +<P><PRE>(define (<A NAME="g12"></A>count-leaves tree) + (if (leaf? tree) + 1 + (reduce + (map count-leaves (children tree))))) + +(define (<A NAME="g13"></A>leaf? node) + (null? (children node))) + +> (count-leaves world-tree) +27 +</PRE> + +<P>At first glance, this may seem like a simple case of recursion, with +<CODE>count-leaves</CODE> calling <CODE>count-leaves</CODE>. But since what looks like a +single recursive call is really a call to <CODE>map</CODE>, it is equivalent to +<EM>several</EM> recursive calls, one for each child of the given tree node. + +<P><H2>Mutual Recursion</H2> + +In Chapter 14 we wrote recursive +procedures that were equivalent to using higher-order functions. Let's do +the same for <CODE>count-leaves</CODE>. + +<P><PRE>(define (<A NAME="g14"></A>count-leaves tree) + (if (leaf? tree) + 1 + (count-leaves-in-forest (children tree)))) + +(define (<A NAME="g15"></A>count-leaves-in-forest forest) + (if (null? forest) + 0 + (+ (count-leaves (car forest)) + (count-leaves-in-forest (cdr forest))))) +</PRE> + +<P>Note that <CODE>count-leaves</CODE> calls <CODE>count-leaves-in-forest</CODE>, +and <CODE>count-leaves-</CODE><CODE>in-</CODE><CODE>forest</CODE> calls +<CODE>count-leaves</CODE>. This pattern is called <EM><A NAME="g16"></A><A NAME="g17"></A>mutual recursion.</EM> + +<P>Mutual recursion is often a useful technique for dealing with trees. In the +typical recursion we've seen before this chapter, we've moved sequentially +through a list or sentence, with each recursive call taking us one step to +the right. In the following paragraphs we present three different models to +help you think about how the shape of a tree gives rise to a mutual recursion. + +<P>In the first model, we're going to think of <CODE>count-leaves</CODE> as an +initialization procedure, and <CODE>count-leaves-in-forest</CODE> as its helper +procedure. Suppose we want to count the leaves of a tree. Unless the +argument is a very shallow<A NAME="text2" HREF="trees.html#ft2">[2]</A> tree, this will involve counting the leaves of all +of the children of that tree. What we want is a straightforward sequential +recursion over the list of children. But we're given the wrong argument: +the tree itself, not its list of children. So we need an initialization +procedure, <CODE>count-leaves</CODE>, whose job is to extract the list of children +and invoke a helper procedure, <CODE>count-leaves-in-forest</CODE>, with that list +as argument. + +<P>The helper procedure follows the usual sequential list pattern: Do +something to the <CODE>car</CODE> of the list, and recursively handle the <CODE>cdr</CODE> +of the list. Now, what do we have to do to the <CODE>car</CODE>? In the usual +sequential recursion, the <CODE>car</CODE> of the list is something simple, such as +a word. What's special about trees is that here the <CODE>car</CODE> is itself a +tree, just like the entire data structure we started with. Therefore, we +must invoke a procedure whose domain is trees: <CODE>count-leaves</CODE>. + +<P>This model is built on two ideas. One is the idea of the domain of a +function; the reason we need two procedures is that we need one that takes a +tree as its argument and one that takes a list of trees as its argument. +The other idea is the leap of faith; we assume that the invocation of <CODE>count-leaves</CODE> within <CODE>count-leaves-in-forest</CODE> will correctly handle each +child without tracing the exact sequence of events. + +<P>The second model is easier to state but less rigorous. Because of the +two-dimensional nature of trees, in order to visit every node we have to be +able to move in two different directions. From a given node we have to be +able to move <EM>down</EM> to its children, but from each child we must be +able to move <EM>across</EM> to its next sibling. + +<P>The job of <CODE>count-leaves-in-forest</CODE> is to move from left to right +through a list of children. (It does this using the more familiar kind of +recursion, in which it invokes itself directly.) The downward motion +happens in <CODE>count-leaves</CODE>, which moves down one level by invoking <CODE>children</CODE>. How does the program move down more than one level? At each +level, <CODE>count-leaves</CODE> is invoked recursively from <CODE>count-leaves-in-forest</CODE>. + +<P>The third model is also based on the two-dimensional nature of trees. +Imagine for a moment that each node in the tree has at most one child. +In that case, <CODE>count-leaves</CODE> could move from the root down to the single +leaf with a structure very similar to the actual procedure, but carrying out +a sequential recursion: + +<P><PRE>(define (count-leaf tree) + (if (leaf? tree) + 1 + (count-leaf (child tree)))) +</PRE> + +<P>The trouble with this, of course, is that at each downward +step there isn't a single "next" node. Instead of a single path from the +root to the leaf, there are multiple paths from the root to many leaves. +To make our idea of downward motion through sequential recursion work in a +real tree, at each level we must "clone" <CODE>count-leaves</CODE> as many times +as there are children. <CODE>Count-leaves-in-forest</CODE> is the factory that +manufactures the clones. It hires one <CODE>count-leaves</CODE> little person for +each child and accumulates their results. + +<P>The key point in recursion on trees is that each child of a tree is itself a +perfectly good tree. This recursiveness in the nature of trees gives rise +to a very recursive structure for programs that use trees. The reason we +say "very" recursive is that each invocation of <CODE>count-leaves</CODE> causes +not just one but several recursive invocations, one for each child, by way +of <CODE>count-leaves-in-forest</CODE>. + +<P>In fact, we use the name <EM><A NAME="g18"></A><A NAME="g19"></A>tree recursion</EM> for any situation +in which a procedure invocation results in more than one recursive call, +even if there isn't an argument that's a tree. The computation of Fibonacci +numbers from Chapter 13 is an example of a tree recursion with no +tree. The <CODE>car</CODE>-<CODE>cdr</CODE> recursions in Chapter 17 are also +tree recursions; any structured list-of-lists has a somewhat tree-like, +two-dimensional character even though it doesn't use the formal mechanisms +we're exploring in this chapter. +The <CODE>cdr</CODE> recursion is a "horizontal" one, moving from one element +to another within the same list; the <CODE>car</CODE> recursion is a "vertical" +one, exploring a sublist of the given list. + +<P> +<P><H2>Searching for a Datum in the Tree</H2> + +<P>Procedures that explore trees aren't always as simple as <CODE>count-leaves</CODE>. +We started with that example because we could write it using higher-order +functions, so that you'd understand the structure of the problem before we +had to take on the complexity of mutual recursion. But many tree problems +don't quite fit our higher-order functions. + +<P>For example, let's write a predicate <CODE>in-tree?</CODE> that takes the name of +a place and a tree as arguments and tells whether or not that place is in the +tree. It <EM>is</EM> possible to make it work with <CODE>filter</CODE>: + +<P><PRE>(define (<A NAME="g20"></A>in-tree? place tree) + (or (equal? place (datum tree)) + (not (null? (filter (lambda (subtree) (in-tree? place subtree)) + (children tree)))))) +</PRE> + +<P>This awkward construction, however, also performs +unnecessary computation. If the place we're looking for happens to be in the +first child of a node, <CODE>filter</CODE> will nevertheless look in all the other +children as well. We can do better by replacing the use of <CODE>filter</CODE> +with a mutual recursion: + +<P><PRE>(define (<A NAME="g21"></A>in-tree? place tree) + (or (equal? place (datum tree)) + (in-forest? place (children tree)))) + +(define (<A NAME="g22"></A>in-forest? place forest) + (if (null? forest) + #f + (or (in-tree? place (car forest)) + (in-forest? place (cdr forest))))) + +> (in-tree? 'abergavenny world-tree) +#T + +> (in-tree? 'abbenay world-tree) +#F + +> (in-tree? 'venezia (cadr (children world-tree))) +#F +</PRE> + +<P>Although any mutual recursion is a little tricky to read, the structure +of this program does fit the way we'd describe the algorithm in +English. A place is in a tree if one of two conditions holds: the place is +the datum at the root of the tree, or the place is (recursively) in one of +the child trees of this tree. That's what <CODE>in-tree?</CODE> says. As for +<CODE>in-forest?</CODE>, it says that a place is in one of a group of trees if +the place is in the first tree, or if it's in one of the remaining trees. + +<P><H2>Locating a Datum in the Tree</H2> + +<P>Our next project is similar to the previous one, but a little more +intricate. We'd like to be able to locate a city and find out all of +the larger regions that enclose the city. For example, we want to say + +<P><PRE>> (locate 'berkeley world-tree) +(WORLD (UNITED STATES) CALIFORNIA BERKELEY) +</PRE> + +<P>Instead of just getting a yes-or-no answer about whether a city is +in the tree, we now want to find out <EM>where</EM> it is. + +<P>The algorithm is recursive: To look for Berkeley within the world, we need +to be able to look for Berkeley within any subtree. The <CODE>world</CODE> node +has several children (countries). <CODE>Locate</CODE> recursively asks each of +those children to find a path to Berkeley. All but one of the children +return <CODE>#f</CODE>, because they can't find Berkeley within their territory. +But the <CODE>(united states)</CODE> node returns + +<P><PRE>((UNITED STATES) CALIFORNIA BERKELEY) +</PRE> + +<P>To make a complete path, we just prepend the name of the current +node, <CODE>world</CODE>, to this path. What happens when <CODE>locate</CODE> tries to +look for Berkeley in Australia? Since all of Australia's children return +<CODE>#f</CODE>, there is no path to Berkeley from Australia, so <CODE>locate</CODE> +returns <CODE>#f</CODE>. + +<P><PRE>(define (<A NAME="g23"></A>locate city tree) + (if (equal? city (datum tree)) + (list city) + (let ((subpath (locate-in-forest city (children tree)))) + (if subpath + (cons (datum tree) subpath) + #f)))) + +(define (<A NAME="g24"></A>locate-in-forest city forest) + (if (null? forest) + #f + (or (locate city (car forest)) + (locate-in-forest city (cdr forest))))) +</PRE> + +<P>Compare the structure of <CODE>locate</CODE> with that of <CODE>in-tree?</CODE>. The +helper procedures <CODE>in-forest?</CODE> and <CODE>locate-in-forest</CODE> are almost +identical. The main procedures look different, because <CODE>locate</CODE> has a +harder job, but both of them check for two possibilities: The city might be +the datum of the argument node, or it might belong to one of the child trees. + +<P><H2>Representing Trees as Lists</H2> + +<P>We've done a lot with trees, but we haven't yet talked about the way Scheme +stores trees internally. How do <CODE>make-node</CODE>, <CODE>datum</CODE>, and <CODE>children</CODE> work? It turns out to be very convenient to represent trees in +terms of lists. + +<P><PRE>(define (<A NAME="g25"></A>make-node datum children) + (cons datum children)) + +(define (<A NAME="g26"></A>datum node) + (car node)) + +(define (<A NAME="g27"></A>children node) + (cdr node)) +</PRE> + +<P>In other words, a tree is a list whose first element is the datum +and whose remaining elements are subtrees. + +<P> +<PRE>> world-tree +(WORLD + (ITALY (VENEZIA) (RIOMAGGIORE) (FIRENZE) (ROMA)) + ((UNITED STATES) + (CALIFORNIA (BERKELEY) ((SAN FRANCISCO)) (GILROY)) + (MASSACHUSETTS (CAMBRIDGE) (AMHERST) (SUDBURY)) + (OHIO (KENT))) + (ZIMBABWE (HARARE) (HWANGE)) + (CHINA (BEIJING) (SHANGHAI) (GUANGSZHOU) (SUZHOW)) + ((GREAT BRITAIN) + (ENGLAND (LIVERPOOL)) + (SCOTLAND (EDINBURGH) (GLASGOW) ((GRETNA GREEN))) + (WALES (ABERGAVENNY))) + (AUSTRALIA + (VICTORIA (MELBOURNE)) + ((NEW SOUTH WALES) (SYDNEY)) + (QUEENSLAND (CAIRNS) ((PORT DOUGLAS)))) + (HONDURAS (TEGUCIGALPA))) +</PRE> + +<P><PRE>> (car (children world-tree)) +(ITALY (VENEZIA) (RIOMAGGIORE) (FIRENZE) (ROMA)) +</PRE> + +<P>Ordinarily, however, we're not going to print out trees in their entirety. +As in the <CODE>locate</CODE> example, we'll extract just some subset of the +information and put it in a more readable form. + +<P><H2>Abstract Data Types</H2> + +<P>The procedures <CODE>make-node</CODE>, <CODE>datum</CODE>, and <CODE>children</CODE> define an +<A NAME="g28"></A><A NAME="g29"></A>abstract data type for trees. Using this ADT, +<A NAME="g30"></A> +<A NAME="g31"></A> +we were able to write several useful procedures to manipulate trees before +pinning down exactly how a tree is represented as a Scheme list. + +<P>Although it would be possible to refer to the parts of a node by using <CODE>car</CODE> and <CODE>cdr</CODE> directly, your programs will be more readable if you use +the ADT-specific selectors and constructors. Consider this example: + +<P><PRE>(in-tree? 'venezia (caddr world-tree)) +</PRE> + +<P>What does <CODE>caddr</CODE> mean in this context? Is the <CODE>caddr</CODE> of +a tree a datum? A child? A forest? Of course you could work it out by +careful reasoning, but the form in which we presented this example +originally was much clearer: + +<P><PRE>(in-tree? 'venezia (cadr (children world-tree))) +</PRE> + +<P>Even better would be + +<P><PRE>(in-tree? 'venezia (list-ref (children world-tree) 1)) +</PRE> + +<P>Using the appropriate selectors and constructors is called <EM>respecting</EM> the data abstraction. Failing to use the appropriate +selectors and constructors is called a <EM>data abstraction +violation.</EM> + +<P>Since we wrote the selectors and constructor for trees ourselves, we +could have defined them to use some different representation: + +<P><PRE>(define (make-node datum children) + (list 'the 'node 'with 'datum datum 'and 'children children)) + +(define (datum node) (list-ref node 4)) + +(define (children node) (list-ref node 7)) +</PRE> + +<P><PRE>> (make-node 'italy (cities '(venezia riomaggiore firenze roma))) +(THE NODE WITH DATUM ITALY AND CHILDREN + ((THE NODE WITH DATUM VENEZIA AND CHILDREN ()) + (THE NODE WITH DATUM RIOMAGGIORE AND CHILDREN ()) + (THE NODE WITH DATUM FIRENZE AND CHILDREN ()) + (THE NODE WITH DATUM ROMA AND CHILDREN ()))) +</PRE> + +<P>You might expect that this change in the representation would +require changes to all the procedures we wrote earlier, such as <CODE>count-leaves</CODE>. But in fact, those procedures would continue to work +perfectly because they don't see the representation. (They respect the data +abstraction.) As long as <CODE>datum</CODE> and <CODE>children</CODE> find the right +information, it doesn't matter how the trees are stored. All that matters +is that the constructors and selectors have to be compatible with each other. + +<P>On the other hand, the example in this section in which we violated the data +abstraction by using <CODE>caddr</CODE> to find the second child of <CODE>world-tree</CODE> would fail if we changed the representation. Many cases like +this one, in which formerly working programs failed after a change in +representation, led programmers to use such moralistic terms as +"respecting" and "violating" data abstractions.<A NAME="text3" HREF="trees.html#ft3">[3]</A> + +<P> +<H2>An Advanced Example: Parsing Arithmetic Expressions</H2> + +<P>Consider the notation for arithmetic expressions. Scheme uses <EM><A NAME="g32"></A> +<A NAME="g33"></A> +prefix</EM> notation: <CODE>(+ 3 4)</CODE>. By contrast, people who aren't Scheme +programmers generally represent arithmetic computations using an <EM>infix</EM> notation, in which the function symbol goes between two +arguments: 3+4. + +<P>Our goal in this section is to translate an infix arithmetic expression into +a tree representing the computation. This translation process is called +<EM>parsing</EM> the expression. For example, we'll turn the expression + +<P><P><CENTER>4+3×7-5/(3+4)+6</CENTER><P> + +into the tree + +<P><CENTER><IMG SRC="../ss-pics/parse0.jpg" ALT="figure: parse0"></CENTER> + +<P>The point of using a tree is that it's going to be very easy to perform the +computation once we have it in tree form. In the original infix form, it's +hard to know what to do first, because there are <EM>precedence</EM> rules +that determine an implicit grouping: Multiplication and division come +before addition and subtraction; operations with the same precedence are +done from left to right. Our sample expression is equivalent to + +<P><P><CENTER>(((4 + (3 × 7)) - (5 / (3+4))) + 6)</CENTER><P> + +In the tree representation, it's easy to see that the operations +nearer the leaves are done first; the root node is the last operation, +because it depends on the results of lower-level operations. + +<P>Our program will take as its argument an infix arithmetic expression in the +form of a list: + +<P><PRE>> (parse '(4 + 3 * 7 - 5 / (3 + 4) + 6)) +</PRE> + +<P>Each element of the list must be one of three things: a number; +one of the four symbols <CODE>+</CODE>, <CODE>-</CODE>, <CODE>*</CODE>, or <CODE>/</CODE>; or a sublist +(such as the three-element list <CODE>(3 + 4)</CODE> in this example) satisfying +the same rule. (You can imagine that we're implementing a pocket +calculator. If we were implementing a computer programming language, then +we'd also accept variable names as operands. But we're not bothering with +that complication because it doesn't really affect the part of the problem +about turning the expression into a tree.) + +<P>What makes this problem tricky is that we can't put the list elements into +the tree as soon as we see them. For example, the first three elements of +our sample list are <CODE>4</CODE>, <CODE>+</CODE>, and <CODE>3</CODE>. It's tempting to build a +subtree of those three elements: + +<P><CENTER><IMG SRC="../ss-pics/fourplus.jpg" ALT="figure: fourplus"></CENTER> + +<P>But if you compare this picture with the earlier picture of the +correct tree, you'll see that the second argument to this <CODE>+</CODE> invocation +isn't the number <CODE>3</CODE>, but rather the subexpression <CODE>3 * 7</CODE>. + +<P>By this reasoning you might think that we have to examine the entire +expression before we can start building the tree. But in fact we can +sometimes build a subtree with confidence. For example, when we see the +minus sign in our sample expression, we can tell that the subexpression +<CODE>3 * 7</CODE> that comes before it is complete, because <CODE>*</CODE> has higher +precedence than <CODE>-</CODE> does. + +<P>Here's the plan. The program will examine its argument from left to right. +Since the program can't finish processing each list element right away, it +has to maintain information about the elements that have been examined but +not entirely processed. It's going to be easier to maintain that +information in two parts: one list for still-pending operations and another +for still-pending operands. Here are the first steps in parsing our sample +expression; the program examines the elements of the argument, putting +numbers onto the operand list and operation symbols onto the operation +list:<A NAME="text4" HREF="trees.html#ft4">[4]</A> + +<P> +<CENTER><IMG SRC="../ss-pics/parse1.jpg" ALT="figure: parse1"></CENTER> + +<P>At this point, the program is looking at the <CODE>*</CODE> operator in the infix +expression. If this newly seen operator had lower precedence than the <CODE>+</CODE> that's already at the head of the list of operations, then it would be +time to carry out the <CODE>+</CODE> operation by creating a tree with <CODE>+</CODE> at +the root and the first two operands in the list as its children. Instead, +since <CODE>*</CODE> has higher precedence than <CODE>+</CODE>, the program isn't ready to +create a subtree but must instead add the <CODE>*</CODE> to its operation list. + +<P><CENTER><IMG SRC="../ss-pics/parse2.jpg" ALT="figure: parse2"></CENTER> + +<P>This time, the newly seen <CODE>-</CODE> operation has lower precedence than the +<CODE>*</CODE> at the head of the operation list. Therefore, it's time for the +program to <EM>handle</EM> the <CODE>*</CODE> operator, by making a subtree +containing that operator and the first two elements of the operand list. +This new subtree becomes the new first element of the operand list. + +<P><CENTER><IMG SRC="../ss-pics/parse3.jpg" ALT="figure: parse3"></CENTER> + +<P>Because the program decided to handle the waiting <CODE>*</CODE> operator, it still +hasn't moved the <CODE>-</CODE> operator from the infix expression to the operator +list. Now the program must compare <CODE>-</CODE> with the <CODE>+</CODE> at the head of +the list. These two operators have the same precedence. Since we want to +carry out same-precedence operators from left to right, it's time to handle +the <CODE>+</CODE> operator. + +<P><CENTER><IMG SRC="../ss-pics/parse4.jpg" ALT="figure: parse4"></CENTER> + +<P>Finally the program can move the <CODE>-</CODE> operator onto the operator list. +The next several steps are similar to ones we've already seen. + +<P><CENTER><IMG SRC="../ss-pics/parse5.jpg" ALT="figure: parse5"></CENTER> + + +<P>This is a new situation: The first unseen element of the infix expression is +neither a number nor an operator, but a sublist. We recursively <CODE>parse</CODE> +this subexpression, adding the resulting tree to the operand list. + +<P><CENTER><IMG SRC="../ss-pics/parse6.jpg" ALT="figure: parse6"></CENTER> + +<P>Then we proceed as before, processing the <CODE>/</CODE> because it has higher +precedence than the <CODE>+</CODE>, then the <CODE>-</CODE> because it has the same +priority as the <CODE>+</CODE>, and so on. + +<P><CENTER><IMG SRC="../ss-pics/parse7.jpg" ALT="figure: parse7"></CENTER> + +<P>Once the program has examined every element of the infix expression, the +operators remaining on the operator list must be handled. In this case +there is only one such operator. Once the operators have all been handled, +there should be one element remaining on the operand list; that element is +the desired tree for the entire original expression. + +<P><CENTER><IMG SRC="../ss-pics/parse8.jpg" ALT="figure: parse8"></CENTER> + +<P>The following program implements this algorithm. It works only for correctly +formed infix expressions; if given an argument like <CODE>(3 + *)</CODE>, it'll give +an incorrect result or a Scheme error. + +<P><PRE>(define (<A NAME="g34"></A>parse expr) + (parse-helper expr '() '())) + +(define (parse-helper expr operators operands) + (cond ((null? expr) + (if (null? operators) + (car operands) + (handle-op '() operators operands))) + ((number? (car expr)) + (parse-helper (cdr expr) + operators + (cons (make-node (car expr) '()) operands))) + ((list? (car expr)) + (parse-helper (cdr expr) + operators + (cons (parse (car expr)) operands))) + (else (if (or (null? operators) + (> (precedence (car expr)) + (precedence (car operators)))) + (parse-helper (cdr expr) + (cons (car expr) operators) + operands) + (handle-op expr operators operands))))) + +(define (handle-op expr operators operands) + (parse-helper expr + (cdr operators) + (cons (make-node (car operators) + (list (cadr operands) (car operands))) + (cddr operands)))) + +(define (precedence oper) + (if (member? oper '(+ -)) 1 2)) +</PRE> + +<P>We promised that after building the tree it would be easy to compute the +value of the expression. Here is the program to do that: + +<P><PRE>(define (<A NAME="g35"></A>compute tree) + (if (number? (datum tree)) + (datum tree) + ((function-named-by (datum tree)) + (compute (car (children tree))) + (compute (cadr (children tree)))))) + +(define (function-named-by oper) + (cond ((equal? oper '+) +) + ((equal? oper '-) -) + ((equal? oper '*) *) + ((equal? oper '/) /) + (else (error "no such operator as" oper)))) + +> (compute (parse '(4 + 3 * 7 - 5 / (3 + 4) + 6))) +30.285714285714 +</PRE> + +<P><H2>Pitfalls</H2> + +<P>A leaf node is a perfectly good actual argument to a tree procedure, +even though the picture of a leaf node doesn't look treeish because there +aren't any branches. A common mistake is to make the base case of the +recursion be a node whose children are leaves, instead of a node that's a +leaf itself. + +<P> +<P>The value returned by <CODE>children</CODE> is not a tree, but a forest. It's +therefore not a suitable actual argument to a procedure that expects a tree. + +<P><H2>Exercises</H2> + +<P><B>18.1</B> What does + +<P><PRE>((SAN FRANCISCO)) +</PRE> + +<P>mean in the printout of <CODE>world-tree</CODE>? Why two sets of +parentheses? + + +<P> +<P> +<B>18.2</B> Suppose we change the definition of the tree constructor so that it +uses <CODE>list</CODE> instead of <CODE>cons</CODE>: + +<P><PRE>(define (make-node datum children) + (list datum children)) +</PRE> + +<P>How do we have to change the selectors so that everything still works? + + +<P> +<B>18.3</B> Write <CODE><A NAME="g36"></A>depth,</CODE> a procedure that takes a tree as argument and +returns the largest number of nodes connected through parent-child links. +That is, a leaf node has depth 1; a tree in which all the children of the +root node are leaves has depth 2. Our world tree has depth 4 (because the +longest path from the root to a leaf is, for example, world, country, state, +city). + + +<P> +<B>18.4</B> Write <CODE><A NAME="g37"></A>count-nodes</CODE>, a procedure that takes a tree as argument +and returns the total number of nodes in the tree. (Earlier we counted the +number of <EM>leaf</EM> nodes.) + + +<P> +<B>18.5</B> Write <CODE><A NAME="g38"></A>prune</CODE>, a procedure that takes a tree as argument and +returns a copy of the tree, but with all the leaf nodes of the original tree +removed. (If the argument to <CODE>prune</CODE> is a one-node tree, in which the +root node has no children, then <CODE>prune</CODE> should return <CODE>#f</CODE> because +the result of removing the root node wouldn't be a tree.) + + +<P> + +<B>18.6</B> Write a program <CODE><A NAME="g39"></A>parse-scheme</CODE> that parses a Scheme arithmetic +expression into the same kind of tree that <CODE>parse</CODE> produces for infix +expressions. Assume that all procedure invocations in the Scheme expression +have two arguments. + +<P>The resulting tree should be a valid argument to <CODE>compute</CODE>: + +<P><PRE>> (compute (parse-scheme '(* (+ 4 3) 2))) +14 +</PRE> + +<P>(You can solve this problem without the restriction to +two-argument invocations if you rewrite <CODE>compute</CODE> so that it doesn't +assume every branch node has two children.) + + +<P> + +<HR> +<A NAME="ft1" HREF="trees.html#text1">[1]</A> Contrariwise, the tree metaphor is also part of the +terminology of families.<P> +<A NAME="ft2" HREF="trees.html#text2">[2]</A> You probably think of trees as being short +or tall. But since our trees are upside-down, the convention is to call +them shallow or deep.<P> +<A NAME="ft3" HREF="trees.html#text3">[3]</A> Another example +of a data abstraction violation is in Chapter 16. When <CODE>match</CODE> creates +an empty known-values database, we didn't use a constructor. Instead, we +merely used a quoted empty sentence: + +<P><PRE>(define (match pattern sent) + (match-using-known-values pattern sent '())) +</PRE><P> +<A NAME="ft4" HREF="trees.html#text4">[4]</A> Actually, as we'll see shortly, the elements of the operand +list are trees, so what we put in the operand list is a one-node tree whose +datum is the number.<P> +<P><A HREF="../ss-toc2.html">(back to Table of Contents)</A><P> +<A HREF="../ssch17/lists.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../ssch19/implement-hof.html"><STRONG>NEXT</STRONG></A> + +<P> +<ADDRESS> +<A HREF="../index.html">Brian Harvey</A>, +<CODE>bh@cs.berkeley.edu</CODE> +</ADDRESS> +</BODY> +</HTML> |