diff options
author | elioat <elioat@tilde.institute> | 2023-08-23 07:52:19 -0400 |
---|---|---|
committer | elioat <elioat@tilde.institute> | 2023-08-23 07:52:19 -0400 |
commit | 562a9a52d599d9a05f871404050968a5fd282640 (patch) | |
tree | 7d3305c1252c043bfe246ccc7deff0056aa6b5ab /js/games/nluqo.github.io/~bh/v2ch7 | |
parent | 5d012c6c011a9dedf7d0a098e456206244eb5a0f (diff) | |
download | tour-562a9a52d599d9a05f871404050968a5fd282640.tar.gz |
*
Diffstat (limited to 'js/games/nluqo.github.io/~bh/v2ch7')
-rw-r--r-- | js/games/nluqo.github.io/~bh/v2ch7/match.html | 1437 | ||||
-rw-r--r-- | js/games/nluqo.github.io/~bh/v2ch7/match.lg | 166 | ||||
-rw-r--r-- | js/games/nluqo.github.io/~bh/v2ch7/v2ch7.html | 1437 |
3 files changed, 3040 insertions, 0 deletions
diff --git a/js/games/nluqo.github.io/~bh/v2ch7/match.html b/js/games/nluqo.github.io/~bh/v2ch7/match.html new file mode 100644 index 0000000..d160258 --- /dev/null +++ b/js/games/nluqo.github.io/~bh/v2ch7/match.html @@ -0,0 +1,1437 @@ +<HTML> +<HEAD> +<TITLE>Computer Science Logo Style vol 2 ch 7: Pattern Matcher</TITLE> +</HEAD> +<BODY> +<CITE>Computer Science Logo Style</CITE> volume 2: +<CITE>Advanced Techniques</CITE> 2/e Copyright (C) 1997 MIT +<H1>Pattern Matcher</H1> + +<TABLE width="100%"><TR><TD> +<IMG SRC="../csls2.jpg" ALT="cover photo"> +<TD><TABLE> +<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian +Harvey</A><BR>University of California, Berkeley</CITE> +<TR><TD align="right"><BR> +<TR><TD align="right"><A HREF="../pdf/v2ch07.pdf">Download PDF version</A> +<TR><TD align="right"><A HREF="../v2-toc2.html">Back to Table of Contents</A> +<TR><TD align="right"><A HREF="../v2ch6/v2ch6.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../v2ch8/v2ch8.html"><STRONG>NEXT</STRONG></A> +<TR><TD align="right"><A HREF="https://mitpress.mit.edu/books/computer-science-logo-style-second-edition-volume-2">MIT +Press web page for <CITE>Computer Science Logo Style</CITE></A> +</TABLE></TABLE> + +<HR><P>Program file for this chapter: <A HREF="match.lg"><CODE>match</CODE></A> + +<P> +In a <EM>conversational</EM> program, one that carries on a conversation with +the user, you may often have occasion to compare what the user types with +some expected response. For example, a quiz program will compare the user's +response with the correct answer; if you've just asked "how are you," you +might look for words like "fine" or "lousy" in the reply. The main +tools that Logo provides for such comparisons are <CODE>equalp</CODE>, which +compares two values for exact equality, and <CODE>memberp</CODE>, which compares +one datum with a list of alternatives. This project provides a more +advanced comparison tool. + +<P>Most of the projects in this book are fairly complicated in their inner +workings, but relatively simple in the external appearance of what they do. +This project is the reverse; the actual program is not so complex, but it +does quite a lot, and it will take a while to explain all of it. Pattern +matching is a powerful programming tool, and I hope you won't be put off by +the effort required to learn how to use it. + +<P>A <EM>pattern</EM> is a list in which some members are not made explicit. +This definition is best understood by considering an example. Consider the +pattern + +<P><PRE>[Every # is a #] +</PRE> + +<P>The words <CODE>every</CODE>, <CODE>is</CODE>, and <CODE>a</CODE> represent themselves +explicitly. The two number signs, however, are symbols representing "zero +or more arbitrary data." Here are some lists that would match the +pattern: + +<P><PRE>[Every man is a mortal] +[Every computer programmer is a genius] +[Every is a word] +[Every datum is a word or a list] +</PRE> + +<P>Here are some lists that would <EM>not</EM> match the pattern: + +<P><PRE>[Socrates is a man] +[Every man is an animal] +[Everyone I know is a friend] +[I think every list is a match] +</PRE> + +<P>The first of these examples doesn't match the pattern because the +word <CODE>every</CODE> is missing. The second has <CODE>an</CODE> instead of <CODE>a</CODE>, +while the third has <CODE>everyone</CODE> instead of <CODE>every</CODE>. The fourth has +the extra words <CODE>I think</CODE> before the word <CODE>every</CODE>. This last +example <EM>would</EM> match the pattern + +<P><PRE>[# every # is a #] +</PRE> + +<P>because this new pattern allows for extra words at the beginning. + +<P><CODE>Match</CODE> is a predicate that takes two inputs. The first input is a +pattern and the second input is a sentence. The output is <CODE>true</CODE> if the +sentence matches the pattern, otherwise <CODE>false</CODE>. + +<P><PRE>? <U>print match [Every # is a #] [Every book is a joy to read]</U> +true +? <U>print match [Every # is a #] [Every adolescent is obnoxious]</U> +false +</PRE> + +<P>Patterns can be more complicated than the ones I've shown so far. In the +following paragraphs I'll introduce the many ways that you can modify +patterns to control which sentences they'll match. As you read, you should +make up sample patterns of your own and try them out with <CODE>match</CODE>. + +<P>Often, in a conversational program, it's not good enough just to know +whether or not a sentence matches a pattern. You also want to know the +pieces of the sentence that match the variable parts of the pattern. <CODE> +Match</CODE> meets this requirement by allowing you to tell it the names of +variables that you want to receive the matching words from the sentence. +Here is an example: + +<P><PRE>? <U>print match [#food is for #animal] [Hay is for horses]</U> +true +? <U>show :food</U> +[Hay] +? <U>show :animal</U> +[horses] + +? <U>print match [#food is for #animal] [C++ is for the birds]</U> +true +? <U>show :food</U> +[C++] +? <U>show :animal</U> +[the birds] +</PRE> + +<P>Here is a short <A NAME="converse"> conversational program using the parts of the +pattern matcher we've discussed so far. + +<P><PRE>to converse +local [response name like] +print [Hi, my name is Toby and I like ice cream] +print [Tell me about yourself] +make "response readlist +if match [# my name is #name] :response [do.name strip.and :name] +if match [# i like #like] :response [do.like strip.and :like] +print [Nice meeting you!] +end + +to do.name :name +print sentence "Hello, :name +end + +to do.like :like +print sentence [I'm glad you like] :like +end + +to strip.and :text +local "short +if match [#short and #] :text [output :short] +output :text +end + +? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>My name is Brian and I like Chinese food</U> +Hello, Brian +I'm glad you like Chinese food +Nice meeting you! + +? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>I like spaghetti and meat balls</U> +I'm glad you like spaghetti +Nice meeting you! +</PRE> + +<P>If <CODE>match</CODE> outputs <CODE>false</CODE>, there is no guarantee of what +values will end up in the variables mentioned in the pattern. <CODE> +Converse</CODE> uses the result of the match only if <CODE>match</CODE> outputs <CODE> +true</CODE>. + +<P><CODE>Converse</CODE> looks for each part of the sentence (the name and the thing +the person likes) in two steps: first it finds the keywords <CODE>my name is</CODE> +or <CODE>I like</CODE> and extracts everything following those phrases, then it +looks within what it extracted for the word <CODE>and</CODE> and removes anything +following it. For example, when I typed + +<P><PRE>My name is Brian and I like Chinese food +</PRE> + +<P>the result of matching the name pattern was to give the variable +<CODE>name</CODE> the value + +<P><PRE>Brian and I like Chinese food +</PRE> + +<P>Then <CODE>strip.and</CODE> used a second pattern to eliminate everything +after the <CODE>and</CODE>. You might be tempted to extract the name in one step +by using a pattern like + +<P><PRE>[# my name is #name and #] +</PRE> + +<P>but I wanted to avoid that pattern because it won't match a +sentence that only contains + +<P><PRE>my name is Mary +</PRE> + +<P>without expressing any likes or dislikes. The program as I've +written it does accept these shorter sentences also. Later we'll see a more +complicated pattern that accepts sentences with or without <CODE>and</CODE> +using a single pattern. + +<P>The special symbol <CODE>#</CODE> in a pattern represents zero or more words. <CODE> +Match</CODE> recognizes other symbols with different meanings: + +<P><TABLE> +<TR><TD><CODE>#</CODE><TD> zero or more +<TR><TD><CODE>&</CODE><TD> one or more +<TR><TD><CODE>?</CODE><TD> zero or one +<TR><TD><CODE>!</CODE><TD> exactly one +</TABLE> + +<P>For example, if you'd like the <CODE>converse</CODE> program to recognize +only the first name of the person using it, you could change the relevant +pattern to + +<P><PRE>[# my name is !name #] +</PRE> + +<P>Then a conversation with the program might look like this: + +<P><PRE>? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>My name is Brian Harvey</U> +Hello, Brian +Nice meeting you! +? +</PRE> + +<P>The word <CODE>!name</CODE> in the pattern matched just the single word +<CODE>Brian</CODE>, not the multiple words <CODE>Brian Harvey</CODE> that the original +pattern would have selected. (If you modify <CODE>converse</CODE> in this way, it +should be possible to remove the invocation of <CODE>strip.and</CODE> in computing +the input to <CODE>do.name</CODE>. The single word stored in the variable <CODE>name</CODE> won't +contain any other clauses.) + +<P>So far, the patterns we've seen allow two extremes: the pattern can include +a single word that must be matched exactly, or it can allow <EM>any</EM> word +at all to be matched. It is also possible to write a pattern that calls for +words in some specified category--that is, words that satisfy some +predicate. Here is an example: + +<P><PRE>to ask.age +local "age +print [How old are you?] +if match [# !age:numberp #] readlist ~ + [print (sentence [You are] :age [years old.]] +end + +? <U>ask.age</U> +How old are you? +<U>I will be 36 next month</U> +You are 36 years old. +</PRE> + +<P>This is a slightly silly example, but it does illustrate the use +of a predicate to restrict which words can match a variable part of a +pattern. The pattern used in <CODE>ask.age</CODE> looks for a single word for +which <CODE>numberp</CODE> is <CODE>true</CODE>, that is, for a number. Any number of +words surrounding the number are allowed. + +<P>Of course, a predicate used in a pattern need not be a primitive one like +<CODE>numberp</CODE>. You may find it useful to write your own predicates that +select categories of words. Such a predicate might have a list built in: + +<P><PRE>to colorp :word +output memberp :word [red orange yellow blue green violet white black] +end +</PRE> + +<P>Or you could check some inherent property of a word: + +<P><PRE>to ends.y :word +output equalp last :word "y +end +</PRE> + +<P>In either case, what is essential is that your predicate must take +a word as its single input, and must output <CODE>true</CODE> if you want <CODE> +match</CODE> to accept the word to fill a slot in the pattern. + +<P>It is most common to want a predicate like <CODE>colorp</CODE> +above--one that tests its input word for membership in a certain list. A +special notation makes it possible to include such a list in the pattern +itself, instead of writing a predicate procedure. For example, suppose you +are writing a quiz program, and you want to ask the question, "What is the +quickest route from Boston to Framingham?" You'd like to accept answers +like these: + +<P><PRE>Mass Pike +the Massachusetts Turnpike +the Pike +</PRE> + +<P>but not <CODE>the Ohio Turnpike</CODE>! Here is a pattern you could use. + +<P><PRE>[?:in [the] ?:in [Mass Massachusetts] !:in [Pike Turnpike]] +</PRE> + +<P>The special predicate <CODE>in</CODE> is a version of <CODE>memberp</CODE> that +knows to look in the pattern, right after the element that invokes <CODE>in</CODE>, +for the list of acceptable words. This pattern accepts zero or one <CODE> +the</CODE>, zero or one of <CODE>Mass</CODE> or <CODE>Massachusetts</CODE>, and one of <CODE> +Pike</CODE> or <CODE>Turnpike</CODE>. That is, the first two words are optional and the +third is required. + +<P>Earlier I rejected the use of a pattern + +<P><PRE>[# my name is #name and #] +</PRE> + +<P>because I wanted also to be able to accept sentences without <CODE> +and</CODE> following the name. I promised to exhibit a pattern that would accept +both sentence forms. Here it is: + +<P><PRE>[# my name is #name:notand #] +</PRE> + +<P>This pattern uses a predicate <CODE>notand</CODE> that allows any word +except <CODE>and</CODE>. It's easy to write this predicate: + +<P><PRE>to notand :word +output not equalp :word "and +end +</PRE> + +<P>(By the way, the symbols indicating the number of words to match are meant +to be mnemonic. The question mark indicates that it's questionable whether +or not the word will be in the sentence. The exclamation point looks a +little like a digit 1, and also shouts emphatically that the word is +present. The number sign means that any number of words (including zero) is +okay, and the ampersand indicates that more words are required, namely at +least one instead of at least zero.) + +<P>We've seen various combinations of quantifiers (that's what I'll call +the characters like <CODE>#</CODE> that control how many words are matched), +variable names, and predicates: + +<P><TABLE> +<TR><TD><CODE>#</CODE><TD> no variable, no predicate (accept any word) +<TR><TD><CODE>#name</CODE><TD> set variable, no predicate +<TR><TD><CODE>?:in</CODE><TD> no variable, test predicate +<TR><TD><CODE>!age:numberp</CODE><TD> set variable, test predicate +</TABLE> + +<P>We are now about to discuss some of the more esoteric features of +the <CODE>match</CODE> program. So far, we have always compared a pattern against +a <EM>sentence,</EM> a list of words. It is also possible to match a pattern +against a structured list, with smaller lists among its members. <CODE>Match</CODE> +treats a sublist just like a word, if you don't want to examine the inner +structure of the sublist. Here are some examples. + +<P><PRE>? <U>print match [hello #middle goodbye] [hello is [very much] like goodbye]</U> +true +? <U>show :middle</U> +[is [very much] like] +? <U>print match [hi #middle:wordp bye] [hi and then bye]</U> +true +? <U>show :middle</U> +[and then] +? <U>print match [hi #middle:wordp bye] [hi and [then] bye]</U> +false + +? <U>print match [hi #mid:wordp #dle:listp bye] [hi and [then] bye]</U> +true +? <U>show :mid show :dle</U> +[and] +[[then]] +</PRE> + +<P>A more interesting possibility is to ask <CODE>match</CODE> to apply a +sub-pattern to a sublist. This is done by using the pattern (that is, a +list) in place of the name of a predicate. Here is an example: + +<P><PRE>? <U>print match [a #:[x # y] b] [a [x 111 y] [x 222 y] b]</U> +true +? <U>print match [a #:[x # y] b] [a [x 333 zzz] b]</U> +false +</PRE> + +<P>It is possible to include variable names in the subpattern, but +this makes sense only if the quantifier outside the pattern is <CODE>!</CODE> or +<CODE>?</CODE> because otherwise you may be trying to assign more than one value to +the same variable. Here's what I mean: + +<P><PRE>? <U>print match [a #all:[x #some y] b] [a [x 111 y] [x 222 y] b]</U> +true +? <U>show :all show :some</U> +[[x 111 y] [x 222 y]] +[222] +</PRE> + +<P>The variable <CODE>all</CODE> is properly set to contain both of the +lists that matched the subpattern, but the variable <CODE>some</CODE> only contains +the result of the second match. + +<P>If a list appears in a pattern without a quantifier before it, <CODE>match</CODE> +treats it as if it were preceded by "<CODE>!:</CODE>"; in other words, it tries +to match the subpattern exactly once. + +<P>A pattern element like <CODE>#:predicate</CODE> can match several members of the +target sentence; the predicate is applied to each candidate member +separately. For example: + +<P><PRE>? <U>print match [#nums:numberp #rest] [3 2 1 blastoff!]</U> +true +? <U>show :nums show :rest</U> +[3 2 1] +[blastoff!] +</PRE> + +<P>Sometimes you may want to match several members of a sentence, but +apply the predicate to all of the candidates <EM>together</EM> in one list. +To do this, use the quantifier <CODE>@</CODE>: + +<P><PRE>to threep :list +output equalp count :list 3 +end + +? <U>print match [@begin:threep #rest] [a b c d e]</U> +true +? <U>show :begin show :rest</U> +[a b c] +[d e] +</PRE> + +<P>In this example, I haven't used the predicate to examine the <EM> +nature</EM> of the matching words, but rather to control the <EM>number</EM> of +words that are matched. Here is another example that looks "inside" the +matching words. + +<P><PRE>to headtailp :list +if (count :list) < 2 [output "false] +output equalp first :list last :list +end + +? <U>print match [#front @good:headtailp #back] [a b c x d e f g x h i]</U> +true +? <U>show :front show :good show :back</U> +[a b c] +[x d e f g x] +[h i] +</PRE> + +<P>Think about all the different tests that <CODE>match</CODE> has to make +to find this match! Also, do you see why the first instruction of <CODE> +headtailp</CODE> is needed? + +<P>Some patterns are <EM>ambiguous;</EM> that is, there might be more than one +way to associate words from the matched sentence with quantifiers in the +pattern. For example, what should the following do? + +<P><PRE>match [#front xx #back] [a b c d xx e f g xx h i] +</PRE> + +<P>The word <CODE>xx</CODE> appears twice in the matched sentence. The +program could choose to use everything up to the first <CODE>xx</CODE> as <CODE> +front</CODE>, leaving six words for <CODE>back</CODE>, or it could choose to use +everything up to the second <CODE>xx</CODE> as <CODE>front</CODE>, leaving only two words +for <CODE>back</CODE>. In fact, each quantifier, starting from the left, matches +as many words as it can: + +<P><PRE>? <U>print match [#front xx #back] [a b c d xx e f g xx h i]</U> +true +? <U>show :front show :back</U> +[a b c d xx e f g] +[h i] +</PRE> + +<P>If that's not what you want, the quantifier <CODE>^</CODE> behaves +like <CODE>#</CODE> except that it matches as <EM>few</EM> words as possible. + +<P><PRE>? <U>print match [^front xx #back] [a b c d xx e f g xx h i]</U> +true +? <U>show :front show :back</U> +[a b c d] +[e f g xx h i] +</PRE> + +<P>We can use the <CODE>^</CODE> quantifier to fix a bug +in <A HREF="match.html#converse">the <CODE>converse</CODE> +program</A>: + +<P><PRE>? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>My name is Brian and I like bacon and eggs</U> +Hello, Brian and I like bacon +I'm glad you like bacon +Nice meeting you! +</PRE> + +<P>The problem here is that the pattern used by <CODE>strip.and</CODE> +divided the sentence at the second <CODE>and</CODE>, just as the earlier example +chose the second <CODE>xx</CODE> when I used <CODE>#</CODE> as the quantifier. We +can fix it this way: + +<P><PRE>to strip.and :text +local "short +if match [^short and #] :text [output :short] +output :text +end + +? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>My name is Brian and I like bacon and eggs</U> +Hello, Brian +I'm glad you like bacon +Nice meeting you! +</PRE> + +<P>There is just one more special feature of <CODE>match</CODE> left to describe. It +is another special predicate, like <CODE>in</CODE>, but this one is called <CODE> +anyof</CODE>. When you use <CODE>anyof</CODE>, the next member of the pattern should be +a <EM>list of patterns</EM> to test. <CODE>Match</CODE> tries each pattern in turn, +applied to list members as determined by the quantifier used. In practice, +though, <CODE>anyof</CODE> only makes sense when applied to several members as a +group, so the quantifier <CODE>@</CODE> should always be used. An example may make +this clear. I'm going to rewrite the <CODE>converse</CODE> program to check for +names and likes all at once. + +<P><PRE>to converse +local [response name like rest] +print [Hi, my name is Toby and I like ice cream] +print [Tell me about yourself] +make "response readlist +while match [@:anyof [[My name is #name:notand] + [I like #like:notand] + [&:notand]] + ?:in [and] #rest] ~ + :line ~ + [make "response :rest] +if not emptyp :name [print sentence "Hello, :name] +if not emptyp :like [print sentence [I'm glad you like] :like] +print [Nice meeting you!] +end +</PRE> + +<P>This program uses the <CODE>notand</CODE> predicate I wrote earlier. It +checks for clauses separated by the word <CODE>and</CODE>. Each clause can match +any of three patterns, one for the name, one for the liking, and a general +pattern that matches any other clause. The clauses can appear in any order. + +<P><PRE>? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>My name is Brian and I hate cheese</U> +Hello, Brian +Nice meeting you! + +? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>I like wings and my name is Jonathan</U> +Hello, Jonathan +I'm glad you like wings +Nice meeting you! +</PRE> + +<P><H2>Reinventing <CODE>Equalp</CODE> for Lists</H2> + +<P> + +<P><CODE>Match</CODE> is a kind of fancy <CODE>equalp</CODE> with a complicated understanding +of what equality means. One way to approach an understanding of <CODE>match</CODE> +is to begin with this question: Suppose Logo's primitive <CODE>equalp</CODE> only +worked for comparing two <EM>words</EM> for equality. (For the remainder of +this section, I won't use the word <CODE>equalp</CODE> at all; I'll call this +imaginary primitive <CODE>wordequalp</CODE> instead.) How would you write a +<CODE>listequalp</CODE> to compare two lists? This is basically a <CODE> +butfirst</CODE>-style recursive operation, but you have to be a little careful +about the fact that either input might be smaller than the other. + +<P><PRE>to listequalp :a :b +if emptyp :a [output emptyp :b] +if emptyp :b [output "false] +if wordequalp first :a first :b ~ + [output listequalp butfirst :a butfirst :b] +output "false +end +</PRE> + +<P>(This procedure contains the instruction <CODE>output "false</CODE> +twice, but it never says <CODE>output "true</CODE>. How can it ever say that two +lists are equal?) + +<P>There is one deficiency in the procedure as I've defined +it. The problem is that it only works for <EM>sentences</EM>--lists whose +members are words. If either list contains a sublist, <CODE>listequalp</CODE> will +try to apply <CODE>wordequalp</CODE> to that sublist. If you enjoy the exercise of +reinventing Logo primitives, you may want to fix that. But for my purposes, +the version here is good enough as a basis for further development of the +pattern matcher. + +<P><H2>A Simple Pattern Matcher</H2> + +<P>We can extend the idea of <CODE>listequalp</CODE> slightly to make a pattern +matcher that only recognizes the special word <CODE>#</CODE> to mean "match zero +or more words." We won't do any of the fancy things like storing the +matching words in a variable. + +<P><PRE>to match :pat :sen +if emptyp :pat [output emptyp :sen] +if emptyp :sen [if equalp first :pat "# + [output match butfirst :pat :sen] + [output "false]] +if equalp first :pat "# [output or match butfirst :pat :sen + match :pat butfirst :sen] +if equalp first :pat first :sen ~ + [output match butfirst :pat butfirst :sen] +output "false +end +</PRE> + +<P>The end test is more complicated in this program than in <CODE>listequalp</CODE> +because the combination of an empty sentence and a nonempty pattern can +still be a match, if the pattern is something like <CODE>[#]</CODE> that matches +zero or more words. + +<P> + +<P>The really interesting part of this procedure is what happens if a <CODE>#</CODE> +is found in the pattern. The match succeeds (outputs <CODE>true</CODE>) if one of +two smaller matches succeeds. The two smaller matches correspond to two +possible conditions: the <CODE>#</CODE> can match zero words, or more than zero. +The first case is detected by the expression + +<P><PRE>match butfirst :pat :sen +</PRE> + +<P>For example, suppose you want to evaluate + +<P><PRE>match [# cream] [cream] +</PRE> + +<P>This expression should yield the value <CODE>true</CODE>, with the <CODE> +#</CODE> matching no words in the sentence. In this example the expression + +<P><PRE>match butfirst :pat :sen +</PRE> + +<P>is equivalent to + +<P><PRE>match [cream] [cream] +</PRE> + +<P>which straightforwardly outputs <CODE>true</CODE>. + +<P>On the other hand, the expression + +<P><PRE>match :pat butfirst :sen +</PRE> + +<P>comes into play when the <CODE>#</CODE> has to match at least one word. +For example, consider the expression + +<P><PRE>match [# cream] [ice cream] +</PRE> + +<P>Here the <CODE>#</CODE> should match the word <CODE>ice</CODE>. The expression + +<P><PRE>match :pat butfirst :sen +</PRE> + +<P>is here equivalent to + +<P><PRE>match [# cream] [cream] +</PRE> + +<P>But this is the example that was <CODE>true</CODE> just above. + +<P>If the <CODE>#</CODE> has to match more than one word, several recursive +invocations of <CODE>match</CODE> are required, each one taking the <CODE>butfirst</CODE> +of the sentence once. For example, suppose we start with + +<P><PRE>match [# cream] [vanilla ice cream] +</PRE> + +<P>Here is the sequence of recursive invocations leading to a <CODE> +true</CODE> match: + +<P><PRE>match :pat butfirst :sen match [# cream] [ice cream] + match :pat butfirst :sen match [# cream] [cream] + match butfirst :pat :sen match [cream] [cream] +</PRE> + +<P>I have been talking as if Logo only evaluated whichever of the two +expressions + +<P><PRE>match butfirst :pat :sen +</PRE> + +<P>and + +<P><PRE>match :pat butfirst :sen +</PRE> + +<P>is appropriate for the particular inputs used. Actually, <EM> +both</EM> expressions are evaluated each time, so there are many recursive +invocations of <CODE>match</CODE> that come out <CODE>false</CODE>. However, the purpose +of the primitive operation <CODE>or</CODE> is to output <CODE>true</CODE> if <EM> +either</EM> of its inputs is <CODE>true</CODE>. To understand fully how <CODE>match</CODE> +works, you'll almost certainly have to trace a few examples carefully by +hand. + +<P><H2>Efficiency and Elegance</H2> + + +<P>Pattern matching is a complicated task, and even the best-written programs +are not blindingly fast. But what is the "best-written" program? In the +simple pattern matcher of the last section, the instruction + +<P><PRE>if equalp first :pat "# [output or match butfirst :pat :sen + match :pat butfirst :sen] +</PRE> + +<P>is extremely compact and elegant. It packs a lot of power into a +single instruction, by combining the results of two recursive invocations +with <CODE>or</CODE>. The similarity of the inputs to the two invocations is also +appealing. + +<P>The trouble with this instruction is that it is much slower than necessary, +because it always tries both recursive invocations even if the first one +succeeds. A more efficient way to program the same general idea would be +this: + +<P><PRE>if equalp first :pat "# ~ + [if match butfirst :pat :sen + [output "true] + [output match :pat butfirst :sen]] +</PRE> + +<P>This new version is much less pleasing to the eye, but it's much +faster. The reason is that if the expression + +<P><PRE>match butfirst :pat :sen +</PRE> + +<P>outputs <CODE>true</CODE>, then the other recursive invocation is avoided. + +<P>It's a mistake to make efficiency your only criterion for program style. +Sometimes it's worth a small slowdown of your program to achieve a large +gain in clarity. But this is a case in which the saving is quite +substantial. Here is a partial trace of the evaluation of + +<P><PRE>match [cat # bat] [cat rat bat] +</PRE> + +<P>using the original version of the procedure: + +<P><PRE>match [cat # bat] [cat rat bat] [cat # bat] [cat rat bat] + match butfirst :pat butfirst :sen [# bat] [rat bat] + match butfirst :pat :sen [bat] [rat bat] + match :pat butfirst :sen [# bat] [bat] + match butfirst :pat :sen [bat] [bat] + match butfirst :pat butfirst :sen [] [] +* match :pat butfirst :sen [# bat] [] +* match butfirst :pat :sen [bat] [] +</PRE> + +<P>The two invocations marked with asterisks are avoided by using the +revised version. These represent 25% of the invocations of <CODE>match</CODE>, a +significant saving. (Don't think that the program necessarily runs 25% +faster. Not all invocations take the same amount of time. This is just a +rough measure.) If there were more words after the <CODE>#</CODE> in the pattern, +the saving would be even greater. + +<P>In this situation we achieve a large saving of time by reorganizing the flow +of control in the program. This is quite different from a more common sort +of concern for efficiency, the kind that leads people to use shorter +variable names so that the program will be a little smaller, or to worry +about whether to use <CODE>fput</CODE> or <CODE>sentence</CODE> in a case where either +would do. These small "bumming" kinds of optimization are rarely worth +the trouble they cause. Figuring out how many times <CODE>match</CODE> is invoked +using each version is a simple example of the branch of computer science +called <EM>analysis of algorithms;</EM> a more profound analysis might use +mathematical techniques to compare the two versions in general, rather than +for a single example. + +<P>In the full version of the pattern matcher, listed at the end of this +project description, I've taken some care to avoid unnecessary matching. +On the other hand, the full version has less flexibility than the simple +version because of its ability to assign matching words to variables. +Consider a case like + +<P><PRE>match [# #] [any old list of words] +</PRE> + +<P>Which <CODE>#</CODE> matches how many words? It doesn't matter if you +don't store the result of the match in variables. But if the pattern is +<CODE>[#a #b]</CODE> instead, there has to be a uniform rule about which part of +the pattern matches what. (In my pattern matcher, all of the words would be +assigned to <CODE>a</CODE>, and <CODE>:b</CODE> would be empty. In general, pattern +elements toward the left match as many words as possible when there is any +ambiguity.) The simple pattern matcher doesn't have this problem, and can +be written to match the ambiguous pattern whichever way gives a <CODE>true</CODE> +result most quickly. + +<P>By the way, what if the two expressions that invoke <CODE>match</CODE> recursively +were reversed in the revised instruction? That is, what if the instruction +were changed again, to read + +<P><PRE>if equalp first :pat "# ~ + [if match :pat butfirst :sen + [output "true] + [output match butfirst :pat :sen]] +</PRE> + +<P>Would this be more or less efficient than the previous version? + +<P><H2>Logo's Evaluation of Inputs</H2> + + + +<P>The discussion about efficiency started because Logo <EM>evaluates</EM> the +inputs to the primitive operation <CODE>or</CODE> before invoking the procedure. +That is, in the example in question, Logo invokes <CODE>match</CODE> twice before +using <CODE>or</CODE> to check whether either invocation output <CODE>true</CODE>. +This is consistent with the way Logo does things in general: To evaluate an +expression that uses some procedure, Logo first evaluates all the inputs for +that procedure, and then invokes the procedure with the evaluated inputs. +Logo's rule is extremely consistent (except for the <CODE>to</CODE> command), but +it isn't the only possible way. In Lisp, a language that's like Logo in +many ways, each procedure can choose whether or not its inputs should be +evaluated in advance. + +<P> + +<P>An example may make it clearer what I mean by this. Lisp has a +procedure called <CODE>set</CODE> that's +equivalent to the Logo <CODE>make</CODE>. You say + +<P><PRE>(set 'var 27) +</PRE> + +<P>as the equivalent of + +<P><PRE>make "var 27 +</PRE> + +<P>But Lisp also has a version called <CODE>setq</CODE> whose first input is +<EM>not</EM> evaluated before <CODE>setq</CODE> is invoked. It's as if there were +an automatic quote mark before the first input, so you just say + +<P><PRE>(setq var 27) +</PRE> + +<P>with the same effect as the other examples. + +<P>Except for the special format of the <CODE>to</CODE> command that forms the title +line of a procedure, Berkeley Logo and many other Logo dialects +do not have any form of automatically-quoted inputs. The design principle +was that consistency of evaluation would make the rules easier to +understand. Some other versions of Logo do use auto-quoting for certain +procedures. For example, in Berkeley Logo, to edit the definition of a +procedure named <CODE>doit</CODE> you type the instruction + +<P><PRE>edit "doit +</PRE> + +<P>But in some other versions of Logo you instead say + +<P><PRE>edit doit +</PRE> + +<P>because in those versions, the <CODE>edit</CODE> command auto-quotes its +input. One possible reason for this design decision is that teachers of +young children like to present Logo without an explicit discussion of the +evaluation rules. They teach the <CODE>edit</CODE> command as a special case, +rather than as just the invocation of a procedure like everything else. +Using this approach, auto-quoting the input avoids having to explain what +that quotation mark means. + +<P>The advantage of the non-auto-quoting version of <CODE>edit</CODE> isn't just in +some abstract idea of consistency. It allows us to take advantage of +composition of functions. Suppose you are working on a very large project, +a video game, with hundreds of procedures. You want to edit all the procedures +having to do with the speed of the spaceships, or whatever moves around the +screen in this game. Luckily, all the procedures you want have the word +<CODE>speed</CODE> as part of their names; they are called <CODE>shipspeed</CODE> or +<CODE>asteroidspeed</CODE> or <CODE>speedcontrol</CODE>. You can say + +<P><PRE>edit filter [substringp "speed ?] procedures +</PRE> + +<P>(<CODE>Procedures</CODE> is a Berkeley Logo primitive operation that +outputs a list of all procedures defined in the workspace; <CODE>substringp</CODE> +is a predicate that checks whether one word appears as part of a longer +word.) An auto-quoting <CODE>edit</CODE> command wouldn't have this flexibility. + +<P>The reason all this discussion is relevant to the pattern matcher is that +the Lisp versions of <CODE>or</CODE> and <CODE>and</CODE> have auto-quoted inputs, which +get evaluated one by one. As soon as one of the inputs to <CODE>or</CODE> turns +out to be <CODE>true</CODE> (or one of the inputs to <CODE>and</CODE> is <CODE>false</CODE>), the +evaluation stops. This is very useful not only for efficiency reasons, as +in the discussion earlier, but to prevent certain kinds of errors. For +example, consider this Logo instruction: + +<P><PRE>if not emptyp :list [if equalp first :list 1 [print "one]] +</PRE> + +<P>It would be pleasant to be able to rewrite that instruction this +way: + +<P><PRE>if and (not emptyp :list) (equalp first :list 1) [print "one] +</PRE> + +<P>The use of <CODE>and</CODE>, I think, makes the program structure clearer +than the nested <CODE>if</CODE>s. That is, it's apparent in the second version +that something (the <CODE>print</CODE>) is to be done if two conditions are met, and +that that's all that happens in the instruction. In the first version, +there might have been another instruction inside the range of the first +(outer) <CODE>if</CODE>; you have to read carefully to see that that isn't so. + +<P>Unfortunately, the second version won't work in Logo. If <CODE>:list</CODE> is in +fact empty, the expression + +<P><PRE>(equalp first :list 1) +</PRE> + +<P>is evaluated before <CODE>and</CODE> is invoked; this expression causes +an error message because <CODE>first</CODE> doesn't accept an empty input. In +Lisp, the corresponding instruction <EM>would</EM> work, because the two +predicate expressions would be evaluated serially and the second wouldn't +be evaluated if the first turned out to be false. + +<P>The serial evaluation of inputs to <CODE>and</CODE> and <CODE>or</CODE> is so +often useful that some people have proposed it for Logo, even at the cost of +destroying the uniform evaluate-first rule. But if you want a serial <CODE> +and</CODE> or <CODE>or</CODE>, it's easy enough to write them, if you explicitly quote +the predicate expressions that are its inputs: + +<P><PRE>to serial.and :pred1 :pred2 +if not run :pred1 [output "false] +output run :pred2 +end + +to serial.or :pred1 :pred2 +if run :pred1 [output "true] +output run :pred2 +end +</PRE> + +<P>Here's how you would use <CODE>serial.and</CODE> to solve the problem +with the nested <CODE>if</CODE>s: + +<P><PRE>if (serial.and [not emptyp :list] [equalp first :list 1]) [print "one] +</PRE> + +<P>Similarly, you could use <CODE>serial.or</CODE> instead of <CODE>or</CODE> to +solve the efficiency problem in the first version of the pattern matcher: + +<P><PRE>output serial.or [match butfirst :pat :sen] [match :pat butfirst :sen] +</PRE> + +<P>These procedures depend on the fact that the predicate expressions +that are used as their inputs are presented inside square brackets; that's +why they are not evaluated before <CODE>serial.and</CODE> or <CODE>serial.or</CODE> is +invoked. + +<P><H2>Indirect Assignment</H2> + + + +<P>From now on, I'll be talking about the big pattern matcher, not the simple +one I introduced to illustrate the structure of the problem. Here is the +top-level procedure <CODE>match</CODE>: + +<P><PRE>to match :pat :sen +local [special.var special.pred special.buffer in.list] +if or wordp :pat wordp :sen [output "false] +if emptyp :pat [output emptyp :sen] +if listp first :pat [output special fput "!: :pat :sen] +if memberp first first :pat [? # ! & @ ^] [output special :pat :sen] +if emptyp :sen [output "false] +if equalp first :pat first :sen ~ + [output match butfirst :pat butfirst :sen] +output "false +end +</PRE> + +<P>As you'd expect, there are more cases to consider in this more +featureful version, but the basic structure is similar to the simple +matcher. The instructions starting <CODE>if emptyp</CODE>, <CODE>if memberp</CODE>, +<CODE>if emptyp</CODE>, and <CODE>if equalp</CODE> play the same roles as similar +instructions in the other version. (The <CODE>memberp</CODE> test replaces the +comparison against the word <CODE>#</CODE> with a wider range of choices.) + +<P>The first <CODE>if</CODE> instruction tests for errors in the format of the pattern +or the sentence to be matched, in which a word is found where a list was +expected. It's not important if you use well-formed inputs to <CODE>match</CODE>. +The <CODE>listp</CODE> test essentially converts a pattern like + +<P><PRE>[foo [some # pattern] baz] +</PRE> + +<P>to the equivalent form + +<P><PRE>[foo !:[some # pattern] baz] +</PRE> + +<P>The interesting new case comes when <CODE>match</CODE> sees a word in the +pattern that starts with one of the six special quantifier characters. In +this case, <CODE>match</CODE> invokes <CODE>special</CODE> to check for a match. + +<P>One of the interesting properties of <CODE>special</CODE> is that it has to be able +to assign a value to a variable whose name is not built into the program, +but instead is part of the <EM>data</EM> used as input to the program. +That is, if the word + +<P><PRE>?howmany:numberp +</PRE> + +<P>appears in the pattern, <CODE>special</CODE> (or one of its +subprocedures) must assign a value to the variable named <CODE>howmany</CODE>, but +there is no instruction of the form + +<P><PRE>make "howmany ... +</PRE> + +<P>anywhere in the program. Instead, <CODE>match</CODE> has <EM>another</EM> +variable, whose name is <CODE>special.var</CODE>, whose <EM>value</EM> is the <EM> +name</EM> <CODE>howmany</CODE>. The assignment of the matching words to the +pattern-specified variable is done with an instruction like + +<P><PRE>make :special.var ... +</PRE> + +<P>Here the first input to <CODE>make</CODE> is not a quoted word, as usual, +but an expression that must be evaluated to figure out which variable to use. + +<P><CODE>Special</CODE>, then, has two tasks. First it must divide a word like + +<P><PRE>?howmany:numberp +</PRE> + +<P>into its component parts; then it must carry out the matching +tasks that are the <EM>meaning</EM> of those parts. These two tasks are like +a smaller version of what a programming language interpreter like Logo +does. Finding the meaningful parts of an instruction is called the <EM> +syntax</EM> of a language, and understanding what the parts mean is called the +<EM>semantics</EM> of the language. <CODE>Special</CODE> has two instructions, one +for the syntax and one for the semantics: + +<P><PRE>to special :pat :sen +set.special parse.special butfirst first :pat " +output run word "match first first :pat +end +</PRE> + +<P>To <EM>parse</EM> something is to divide it into its pieces. <CODE> +Parse.special</CODE> outputs a list of the form + +<P><PRE>[howmany numberp] +</PRE> + +<P>for the example we're considering. Then <CODE>set.special</CODE> assigns +the two members of this list as the values of two variables. The variable +named <CODE>special.var</CODE> is given the value <CODE>howmany</CODE>, and the variable +named <CODE>special.pred</CODE> is given the value <CODE>numberp</CODE>. This preliminary +work is what makes possible the indirect assignment described earlier. + +<P><H2>Defaults</H2> + +<P>What happens if the pattern has a word like + +<P><PRE>?:numberp +</PRE> + +<P>without a variable name? What happens when the program tries to +assign a value to the variable named in the pattern? <CODE>Set.special</CODE> +contains the instruction + +<P><PRE>if emptyp :special.var [make "special.var "special.buffer] +</PRE> + +<P>The effect of this instruction is that if you do not mention a +variable in the pattern, the variable named <CODE>special.buffer</CODE> will be +used to hold the results of the match. This variable is the <EM> +default</EM> variable, the one used if no other is specified. + +<P>It's important, by the way, that the variable <CODE>special.buffer</CODE> is +declared to be local in procedure <CODE>match</CODE>. What makes it important is +that <CODE>match</CODE> is recursive; if you use a pattern like + +<P><PRE>[a # b # c] +</PRE> + +<P>then the matching of the second <CODE>#</CODE> is a subproblem of the +matching of the first one. <CODE>Match</CODE> invokes <CODE>special</CODE>, which invokes +<CODE>match#</CODE>, which invokes <CODE>#test</CODE>, which invokes <CODE> +match</CODE> on the <CODE>butfirst</CODE> of the pattern. That <CODE>butfirst</CODE> contains +another <CODE>#</CODE>. Each of these uses the variable <CODE>special.buffer</CODE> to +remember the words it is trying as a match; since the variable is declared +local, the two don't get confused. (This means, by the way, that you can +really confuse <CODE>match</CODE> by using the same variable name twice in a +pattern. It requires a fairly complicated pattern to confuse <CODE>match</CODE>, +but here is an example. The first result is correct, the second incorrect. + +<P><PRE>? <U>print match [a #x b &y ! c] [a i b j c b k c]</U> +true +? <U>show :x show :y</U> +[i] +[j c b] +? <U>print match [a #x b &x ! c] [a i b j c b k c]</U> +false +</PRE> + +<P>The only change is that the variable name <CODE>x</CODE> is used twice in +the second pattern, and as a result, <CODE>match</CODE> doesn't find the correct +match. You'll know that you really understand how <CODE>match</CODE> works if you +can explain why it <EM>won't</EM> fail if the <CODE>!</CODE> is removed from the +pattern.) + +<P>When writing a tool to be used in other projects, especially if the tool +will be used by other people, it's important to think about defaults. What +should the program do if some piece of information is missing? If you don't +provide for a default explicitly, the most likely result is a Logo error +message; your program will end up trying to take <CODE>first</CODE> of an empty +list, or something like that. + +<P>Another default in the pattern matcher is for the predicate used to test +matches. For example, what happens when the word + +<P><PRE>?howmany +</PRE> + +<P>appears in the pattern, without a predicate? This case is +recognized by <CODE>parse.special</CODE>, in the instruction + +<P><PRE>if emptyp :word [output list :var "always] +</PRE> + +<P>The special predicate <CODE>always</CODE> is used if no other is given in +the pattern. <CODE>Always</CODE> has a very simple definition: + +<P><PRE>to always :x +output "true +end +</PRE> + +<P> + +<P><H2>Program as Data</H2> + + +<P>The instruction in <CODE>special</CODE> that carries out the semantics of a special +pattern-matching instruction word is + +<P><PRE>output run word "match first first :pat +</PRE> + +<P>If the pattern contains the word + +<P><PRE>?howmany:numberp +</PRE> + +<P>then this instruction extracts the quantifier character +<CODE>?</CODE> (the first character of the first word of the pattern) and makes +from it a procedure name <CODE>match?</CODE>. That name is then <CODE>run</CODE> as a Logo +expression; that is, <CODE>special</CODE> invokes a procedure whose name is <CODE> +match?</CODE>. + +<P>Most programming languages do not allow the invocation of a procedure based +on finding the name of the procedure in the program's data. Generally there +is a very strict separation between program and data. Being able to +manipulate data to create a Logo instruction, and then run it, is a very +powerful part of Logo. It is also used to deal with the names of predicates +included in the pattern; to see if a word in the sentence input to <CODE> +match</CODE> is a match for a piece of the pattern, the predicate <CODE>try.pred</CODE> +contains the instruction + +<P><PRE>output run list :special.pred quoted first :sen +</PRE> + +<P>This instruction generates a list whose first member is the name +of the predicate found in the pattern and whose second and last member is a +word from the sentence. Then this list is run as a Logo expression, which +should yield either <CODE>true</CODE> or <CODE>false</CODE> as output, indicating whether +or not the word is acceptable. + +<P><H2>Parsing Rules</H2> + +<P>When you are reading the program, remember that the kind of pattern that +I've written as + +<P><PRE>[begin !:[smaller # pattern] end] +</PRE> + +<P>is read by Logo as if I'd written + +<P><PRE>[begin !: [smaller # pattern] end] +</PRE> + +<P>That is to say, this pattern is a list of four members. I think +of the middle two as a unit, representing a single thing to match. The +sublist takes the place of a predicate name after the <CODE>!</CODE> quantifier. +But for Logo, there is no predicate name in the word starting with the +exclamation point; the pattern is a separate member of the large list. +That's why <CODE>set.special</CODE> uses the expression + +<P><PRE>emptyp :special.pred +</PRE> + +<P>to test for this situation, rather than <CODE>listp</CODE>. After <CODE> +parse.special</CODE> does its work, all it has found is a colon with nothing +following it. <CODE>Set.special</CODE> has to look at the next member of the +pattern list in order to find the subpattern. + +<P><H2>Further Explorations</H2> + +<P>Chapter 9 is a large program that uses <CODE>match</CODE>. It +may give you ideas for the ways in which this tool can be used in your own +programs. Here, instead of talking about applications of <CODE>match</CODE>, I'll +discuss some possible extensions or revisions of the pattern matcher itself. + +<P>There are many obvious small extensions. For example, to complement the +special <CODE>in</CODE> primitive, you could write <CODE>notin</CODE>, which would accept +all <EM>but</EM> the members of the following list. You could allow the use +of a number as the predicate, meaning that exactly that many matching words +are required. That is, in the example for which I invented the predicate +<CODE>threep</CODE>, I would instead be able to use + +<P><PRE>[@begin:3 #rest] +</PRE> + +<P>as the pattern. + +<P>There is no convenient way to say in a pattern that some subpattern can be +repeated several times, if the subpattern is more than a single word. That +is, in the second version of <CODE>converse</CODE>, instead of having to use <CODE> +while</CODE> to chop off pieces of the matched +sentence into a variable <CODE>rest</CODE>, I'd like to be able to say in the +pattern something like + +<P><PRE>[@@: [@:anyof [[my name is #name:notand] + [i like #like:notand] + [&:notand]] + ?:in [and]]] +</PRE> + +<P>Here the doubled atsign (<CODE>@@</CODE>) means that the entire pattern +that follows should be matched repeatedly instead of only once. + +<P>For other approaches to pattern matching, you might want to read about the +programming languages Snobol and Icon, each of which includes pattern +matching as one of its main features. + +<P> +<TABLE width="100%"><TR><TD><A HREF="../v2-toc2.html">(back to Table of Contents)</A> +<TD align="right"><A HREF="../v2ch6/v2ch6.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../v2ch8/v2ch8.html"><STRONG>NEXT</STRONG></A> +</TABLE> + +<P><H2>Program Listing</H2> + +<P><P> +<P><PRE> +to match :pat :sen +local [special.var special.pred special.buffer in.list] +if or wordp :pat wordp :sen [output "false] +if emptyp :pat [output emptyp :sen] +if listp first :pat [output special fput "!: :pat :sen] +if memberp first first :pat [? # ! & @ ^] [output special :pat :sen] +if emptyp :sen [output "false] +if equalp first :pat first :sen ~ + [output match butfirst :pat butfirst :sen] +output "false +end + +;; Parsing quantifiers + +to special :pat :sen +set.special parse.special butfirst first :pat " +output run word "match first first :pat +end + +to parse.special :word :var +if emptyp :word [output list :var "always] +if equalp first :word ": [output list :var butfirst :word] +output parse.special butfirst :word word :var first :word +end + +to set.special :list +make "special.var first :list +make "special.pred last :list +if emptyp :special.var [make "special.var "special.buffer] +if memberp :special.pred [in anyof] [set.in] +if not emptyp :special.pred [stop] +make "special.pred first butfirst :pat +make "pat fput first :pat butfirst butfirst :pat +end + +to set.in +make "in.list first butfirst :pat +make "pat fput first :pat butfirst butfirst :pat +end + +;; Exactly one match + +to match! +if emptyp :sen [output "false] +if not try.pred [output "false] +make :special.var first :sen +output match butfirst :pat butfirst :sen +end + +;; Zero or one match + +to match? +make :special.var [] +if emptyp :sen [output match butfirst :pat :sen] +if not try.pred [output match butfirst :pat :sen] +make :special.var first :sen +if match butfirst :pat butfirst :sen [output "true] +make :special.var [] +output match butfirst :pat :sen +end + +;; Zero or more matches + +to match# +make :special.var [] +output #test #gather :sen +end + +to #gather :sen +if emptyp :sen [output :sen] +if not try.pred [output :sen] +make :special.var lput first :sen thing :special.var +output #gather butfirst :sen +end + +to #test :sen +if match butfirst :pat :sen [output "true] +if emptyp thing :special.var [output "false] +output #test2 fput last thing :special.var :sen +end + +to #test2 :sen +make :special.var butlast thing :special.var +output #test :sen +end + +;; One or more matches + +to match& +output &test match# +end + +to &test :tf +if emptyp thing :special.var [output "false] +output :tf +end + +;; Zero or more matches (as few as possible) + +to match^ +make :special.var [] +output ^test :sen +end + +to ^test :sen +if match butfirst :pat :sen [output "true] +if emptyp :sen [output "false] +if not try.pred [output "false] +make :special.var lput first :sen thing :special.var +output ^test butfirst :sen +end + +;; Match words in a group + +to match@ +make :special.var :sen +output @test [] +end + +to @test :sen +if @try.pred [if match butfirst :pat :sen [output "true]] +if emptyp thing :special.var [output "false] +output @test2 fput last thing :special.var :sen +end + +to @test2 :sen +make :special.var butlast thing :special.var +output @test :sen +end + +;; Applying the predicates + +to try.pred +if listp :special.pred [output match :special.pred first :sen] +output run list :special.pred quoted first :sen +end + +to quoted :thing +if listp :thing [output :thing] +output word "" :thing +end + +to @try.pred +if listp :special.pred [output match :special.pred thing :special.var] +output run list :special.pred thing :special.var +end + +;; Special predicates + +to always :x +output "true +end + +to in :word +output memberp :word :in.list +end + +to anyof :sen +output anyof1 :sen :in.list +end + +to anyof1 :sen :pats +if emptyp :pats [output "false] +if match first :pats :sen [output "true] +output anyof1 :sen butfirst :pats +end +</PRE><P> + + + +<P><A HREF="../v2-toc2.html">(back to Table of Contents)</A> +<P><A HREF="../v2ch6/v2ch6.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../v2ch8/v2ch8.html"><STRONG>NEXT</STRONG></A> + +<P> +<ADDRESS> +<A HREF="../index.html">Brian Harvey</A>, +<CODE>bh@cs.berkeley.edu</CODE> +</ADDRESS> +</BODY> +</HTML> diff --git a/js/games/nluqo.github.io/~bh/v2ch7/match.lg b/js/games/nluqo.github.io/~bh/v2ch7/match.lg new file mode 100644 index 0000000..71b2c09 --- /dev/null +++ b/js/games/nluqo.github.io/~bh/v2ch7/match.lg @@ -0,0 +1,166 @@ +to match :pat :sen +local [special.var special.pred special.buffer in.list] +if or wordp :pat wordp :sen [output "false] +if emptyp :pat [output emptyp :sen] +if listp first :pat [output special fput "!: :pat :sen] +if memberp first first :pat [? # ! & @ ^] [output special :pat :sen] +if emptyp :sen [output "false] +if equalp first :pat first :sen ~ + [output match butfirst :pat butfirst :sen] +output "false +end + +;; Parsing quantifiers + +to special :pat :sen +set.special parse.special butfirst first :pat " +output run word "match first first :pat +end + +to parse.special :word :var +if emptyp :word [output list :var "always] +if equalp first :word ": [output list :var butfirst :word] +output parse.special butfirst :word word :var first :word +end + +to set.special :list +make "special.var first :list +make "special.pred last :list +if emptyp :special.var [make "special.var "special.buffer] +if memberp :special.pred [in anyof] [set.in] +if not emptyp :special.pred [stop] +make "special.pred first butfirst :pat +make "pat fput first :pat butfirst butfirst :pat +end + +to set.in +make "in.list first butfirst :pat +make "pat fput first :pat butfirst butfirst :pat +end + +;; Exactly one match + +to match! +if emptyp :sen [output "false] +if not try.pred [output "false] +make :special.var first :sen +output match butfirst :pat butfirst :sen +end + +;; Zero or one match + +to match? +make :special.var [] +if emptyp :sen [output match butfirst :pat :sen] +if not try.pred [output match butfirst :pat :sen] +make :special.var first :sen +if match butfirst :pat butfirst :sen [output "true] +make :special.var [] +output match butfirst :pat :sen +end + +;; Zero or more matches + +to match# +make :special.var [] +output #test #gather :sen +end + +to #gather :sen +if emptyp :sen [output :sen] +if not try.pred [output :sen] +make :special.var lput first :sen thing :special.var +output #gather butfirst :sen +end + +to #test :sen +if match butfirst :pat :sen [output "true] +if emptyp thing :special.var [output "false] +output #test2 fput last thing :special.var :sen +end + +to #test2 :sen +make :special.var butlast thing :special.var +output #test :sen +end + +;; One or more matches + +to match& +output &test match# +end + +to &test :tf +if emptyp thing :special.var [output "false] +output :tf +end + +;; Zero or more matches (as few as possible) + +to match^ +make :special.var [] +output ^test :sen +end + +to ^test :sen +if match butfirst :pat :sen [output "true] +if emptyp :sen [output "false] +if not try.pred [output "false] +make :special.var lput first :sen thing :special.var +output ^test butfirst :sen +end + +;; Match words in a group + +to match@ +make :special.var :sen +output @test [] +end + +to @test :sen +if @try.pred [if match butfirst :pat :sen [output "true]] +if emptyp thing :special.var [output "false] +output @test2 fput last thing :special.var :sen +end + +to @test2 :sen +make :special.var butlast thing :special.var +output @test :sen +end + +;; Applying the predicates + +to try.pred +if listp :special.pred [output match :special.pred first :sen] +output run list :special.pred quoted first :sen +end + +to quoted :thing +if listp :thing [output :thing] +output word "" :thing +end + +to @try.pred +if listp :special.pred [output match :special.pred thing :special.var] +output run list :special.pred thing :special.var +end + +;; Special predicates + +to always :x +output "true +end + +to in :word +output memberp :word :in.list +end + +to anyof :sen +output anyof1 :sen :in.list +end + +to anyof1 :sen :pats +if emptyp :pats [output "false] +if match first :pats :sen [output "true] +output anyof1 :sen butfirst :pats +end diff --git a/js/games/nluqo.github.io/~bh/v2ch7/v2ch7.html b/js/games/nluqo.github.io/~bh/v2ch7/v2ch7.html new file mode 100644 index 0000000..00b39b3 --- /dev/null +++ b/js/games/nluqo.github.io/~bh/v2ch7/v2ch7.html @@ -0,0 +1,1437 @@ +<HTML> +<HEAD> +<TITLE>Computer Science Logo Style vol 2 ch 7: Pattern Matcher</TITLE> +</HEAD> +<BODY> +<CITE>Computer Science Logo Style</CITE> volume 2: +<CITE>Advanced Techniques</CITE> 2/e Copyright (C) 1997 MIT +<H1>Pattern Matcher</H1> + +<TABLE width="100%"><TR><TD> +<IMG SRC="../csls2.jpg" ALT="cover photo"> +<TD><TABLE> +<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian +Harvey</A><BR>University of California, Berkeley</CITE> +<TR><TD align="right"><BR> +<TR><TD align="right"><A HREF="../pdf/v2ch07.pdf">Download PDF version</A> +<TR><TD align="right"><A HREF="../v2-toc2.html">Back to Table of Contents</A> +<TR><TD align="right"><A HREF="../v2ch6/v2ch6.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../v2ch8/v2ch8.html"><STRONG>NEXT</STRONG></A> +<TR><TD align="right"><A HREF="https://mitpress.mit.edu/books/computer-science-logo-style-second-edition-volume-2">MIT +Press web page for <CITE>Computer Science Logo Style</CITE></A> +</TABLE></TABLE> + +<HR><P>Program file for this chapter: <A HREF="match.lg"><CODE>match</CODE></A> + +<P> +In a <EM>conversational</EM> program, one that carries on a conversation with +the user, you may often have occasion to compare what the user types with +some expected response. For example, a quiz program will compare the user's +response with the correct answer; if you've just asked "how are you," you +might look for words like "fine" or "lousy" in the reply. The main +tools that Logo provides for such comparisons are <CODE>equalp</CODE>, which +compares two values for exact equality, and <CODE>memberp</CODE>, which compares +one datum with a list of alternatives. This project provides a more +advanced comparison tool. + +<P>Most of the projects in this book are fairly complicated in their inner +workings, but relatively simple in the external appearance of what they do. +This project is the reverse; the actual program is not so complex, but it +does quite a lot, and it will take a while to explain all of it. Pattern +matching is a powerful programming tool, and I hope you won't be put off by +the effort required to learn how to use it. + +<P>A <EM>pattern</EM> is a list in which some members are not made explicit. +This definition is best understood by considering an example. Consider the +pattern + +<P><PRE>[Every # is a #] +</PRE> + +<P>The words <CODE>every</CODE>, <CODE>is</CODE>, and <CODE>a</CODE> represent themselves +explicitly. The two number signs, however, are symbols representing "zero +or more arbitrary data." Here are some lists that would match the +pattern: + +<P><PRE>[Every man is a mortal] +[Every computer programmer is a genius] +[Every is a word] +[Every datum is a word or a list] +</PRE> + +<P>Here are some lists that would <EM>not</EM> match the pattern: + +<P><PRE>[Socrates is a man] +[Every man is an animal] +[Everyone I know is a friend] +[I think every list is a match] +</PRE> + +<P>The first of these examples doesn't match the pattern because the +word <CODE>every</CODE> is missing. The second has <CODE>an</CODE> instead of <CODE>a</CODE>, +while the third has <CODE>everyone</CODE> instead of <CODE>every</CODE>. The fourth has +the extra words <CODE>I think</CODE> before the word <CODE>every</CODE>. This last +example <EM>would</EM> match the pattern + +<P><PRE>[# every # is a #] +</PRE> + +<P>because this new pattern allows for extra words at the beginning. + +<P><CODE>Match</CODE> is a predicate that takes two inputs. The first input is a +pattern and the second input is a sentence. The output is <CODE>true</CODE> if the +sentence matches the pattern, otherwise <CODE>false</CODE>. + +<P><PRE>? <U>print match [Every # is a #] [Every book is a joy to read]</U> +true +? <U>print match [Every # is a #] [Every adolescent is obnoxious]</U> +false +</PRE> + +<P>Patterns can be more complicated than the ones I've shown so far. In the +following paragraphs I'll introduce the many ways that you can modify +patterns to control which sentences they'll match. As you read, you should +make up sample patterns of your own and try them out with <CODE>match</CODE>. + +<P>Often, in a conversational program, it's not good enough just to know +whether or not a sentence matches a pattern. You also want to know the +pieces of the sentence that match the variable parts of the pattern. <CODE> +Match</CODE> meets this requirement by allowing you to tell it the names of +variables that you want to receive the matching words from the sentence. +Here is an example: + +<P><PRE>? <U>print match [#food is for #animal] [Hay is for horses]</U> +true +? <U>show :food</U> +[Hay] +? <U>show :animal</U> +[horses] + +? <U>print match [#food is for #animal] [C++ is for the birds]</U> +true +? <U>show :food</U> +[C++] +? <U>show :animal</U> +[the birds] +</PRE> + +<P>Here is a short <A NAME="converse"> conversational program using the parts of the +pattern matcher we've discussed so far. + +<P><PRE>to converse +local [response name like] +print [Hi, my name is Toby and I like ice cream] +print [Tell me about yourself] +make "response readlist +if match [# my name is #name] :response [do.name strip.and :name] +if match [# i like #like] :response [do.like strip.and :like] +print [Nice meeting you!] +end + +to do.name :name +print sentence "Hello, :name +end + +to do.like :like +print sentence [I'm glad you like] :like +end + +to strip.and :text +local "short +if match [#short and #] :text [output :short] +output :text +end + +? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>My name is Brian and I like Chinese food</U> +Hello, Brian +I'm glad you like Chinese food +Nice meeting you! + +? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>I like spaghetti and meat balls</U> +I'm glad you like spaghetti +Nice meeting you! +</PRE> + +<P>If <CODE>match</CODE> outputs <CODE>false</CODE>, there is no guarantee of what +values will end up in the variables mentioned in the pattern. <CODE> +Converse</CODE> uses the result of the match only if <CODE>match</CODE> outputs <CODE> +true</CODE>. + +<P><CODE>Converse</CODE> looks for each part of the sentence (the name and the thing +the person likes) in two steps: first it finds the keywords <CODE>my name is</CODE> +or <CODE>I like</CODE> and extracts everything following those phrases, then it +looks within what it extracted for the word <CODE>and</CODE> and removes anything +following it. For example, when I typed + +<P><PRE>My name is Brian and I like Chinese food +</PRE> + +<P>the result of matching the name pattern was to give the variable +<CODE>name</CODE> the value + +<P><PRE>Brian and I like Chinese food +</PRE> + +<P>Then <CODE>strip.and</CODE> used a second pattern to eliminate everything +after the <CODE>and</CODE>. You might be tempted to extract the name in one step +by using a pattern like + +<P><PRE>[# my name is #name and #] +</PRE> + +<P>but I wanted to avoid that pattern because it won't match a +sentence that only contains + +<P><PRE>my name is Mary +</PRE> + +<P>without expressing any likes or dislikes. The program as I've +written it does accept these shorter sentences also. Later we'll see a more +complicated pattern that accepts sentences with or without <CODE>and</CODE> +using a single pattern. + +<P>The special symbol <CODE>#</CODE> in a pattern represents zero or more words. <CODE> +Match</CODE> recognizes other symbols with different meanings: + +<P><TABLE> +<TR><TD><CODE>#</CODE><TD> zero or more +<TR><TD><CODE>&</CODE><TD> one or more +<TR><TD><CODE>?</CODE><TD> zero or one +<TR><TD><CODE>!</CODE><TD> exactly one +</TABLE> + +<P>For example, if you'd like the <CODE>converse</CODE> program to recognize +only the first name of the person using it, you could change the relevant +pattern to + +<P><PRE>[# my name is !name #] +</PRE> + +<P>Then a conversation with the program might look like this: + +<P><PRE>? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>My name is Brian Harvey</U> +Hello, Brian +Nice meeting you! +? +</PRE> + +<P>The word <CODE>!name</CODE> in the pattern matched just the single word +<CODE>Brian</CODE>, not the multiple words <CODE>Brian Harvey</CODE> that the original +pattern would have selected. (If you modify <CODE>converse</CODE> in this way, it +should be possible to remove the invocation of <CODE>strip.and</CODE> in computing +the input to <CODE>do.name</CODE>. The single word stored in the variable <CODE>name</CODE> won't +contain any other clauses.) + +<P>So far, the patterns we've seen allow two extremes: the pattern can include +a single word that must be matched exactly, or it can allow <EM>any</EM> word +at all to be matched. It is also possible to write a pattern that calls for +words in some specified category--that is, words that satisfy some +predicate. Here is an example: + +<P><PRE>to ask.age +local "age +print [How old are you?] +if match [# !age:numberp #] readlist ~ + [print (sentence [You are] :age [years old.]] +end + +? <U>ask.age</U> +How old are you? +<U>I will be 36 next month</U> +You are 36 years old. +</PRE> + +<P>This is a slightly silly example, but it does illustrate the use +of a predicate to restrict which words can match a variable part of a +pattern. The pattern used in <CODE>ask.age</CODE> looks for a single word for +which <CODE>numberp</CODE> is <CODE>true</CODE>, that is, for a number. Any number of +words surrounding the number are allowed. + +<P>Of course, a predicate used in a pattern need not be a primitive one like +<CODE>numberp</CODE>. You may find it useful to write your own predicates that +select categories of words. Such a predicate might have a list built in: + +<P><PRE>to colorp :word +output memberp :word [red orange yellow blue green violet white black] +end +</PRE> + +<P>Or you could check some inherent property of a word: + +<P><PRE>to ends.y :word +output equalp last :word "y +end +</PRE> + +<P>In either case, what is essential is that your predicate must take +a word as its single input, and must output <CODE>true</CODE> if you want <CODE> +match</CODE> to accept the word to fill a slot in the pattern. + +<P>It is most common to want a predicate like <CODE>colorp</CODE> +above--one that tests its input word for membership in a certain list. A +special notation makes it possible to include such a list in the pattern +itself, instead of writing a predicate procedure. For example, suppose you +are writing a quiz program, and you want to ask the question, "What is the +quickest route from Boston to Framingham?" You'd like to accept answers +like these: + +<P><PRE>Mass Pike +the Massachusetts Turnpike +the Pike +</PRE> + +<P>but not <CODE>the Ohio Turnpike</CODE>! Here is a pattern you could use. + +<P><PRE>[?:in [the] ?:in [Mass Massachusetts] !:in [Pike Turnpike]] +</PRE> + +<P>The special predicate <CODE>in</CODE> is a version of <CODE>memberp</CODE> that +knows to look in the pattern, right after the element that invokes <CODE>in</CODE>, +for the list of acceptable words. This pattern accepts zero or one <CODE> +the</CODE>, zero or one of <CODE>Mass</CODE> or <CODE>Massachusetts</CODE>, and one of <CODE> +Pike</CODE> or <CODE>Turnpike</CODE>. That is, the first two words are optional and the +third is required. + +<P>Earlier I rejected the use of a pattern + +<P><PRE>[# my name is #name and #] +</PRE> + +<P>because I wanted also to be able to accept sentences without <CODE> +and</CODE> following the name. I promised to exhibit a pattern that would accept +both sentence forms. Here it is: + +<P><PRE>[# my name is #name:notand #] +</PRE> + +<P>This pattern uses a predicate <CODE>notand</CODE> that allows any word +except <CODE>and</CODE>. It's easy to write this predicate: + +<P><PRE>to notand :word +output not equalp :word "and +end +</PRE> + +<P>(By the way, the symbols indicating the number of words to match are meant +to be mnemonic. The question mark indicates that it's questionable whether +or not the word will be in the sentence. The exclamation point looks a +little like a digit 1, and also shouts emphatically that the word is +present. The number sign means that any number of words (including zero) is +okay, and the ampersand indicates that more words are required, namely at +least one instead of at least zero.) + +<P>We've seen various combinations of quantifiers (that's what I'll call +the characters like <CODE>#</CODE> that control how many words are matched), +variable names, and predicates: + +<P><TABLE> +<TR><TD><CODE>#</CODE><TD> no variable, no predicate (accept any word) +<TR><TD><CODE>#name</CODE><TD> set variable, no predicate +<TR><TD><CODE>?:in</CODE><TD> no variable, test predicate +<TR><TD><CODE>!age:numberp</CODE><TD> set variable, test predicate +</TABLE> + +<P>We are now about to discuss some of the more esoteric features of +the <CODE>match</CODE> program. So far, we have always compared a pattern against +a <EM>sentence,</EM> a list of words. It is also possible to match a pattern +against a structured list, with smaller lists among its members. <CODE>Match</CODE> +treats a sublist just like a word, if you don't want to examine the inner +structure of the sublist. Here are some examples. + +<P><PRE>? <U>print match [hello #middle goodbye] [hello is [very much] like goodbye]</U> +true +? <U>show :middle</U> +[is [very much] like] +? <U>print match [hi #middle:wordp bye] [hi and then bye]</U> +true +? <U>show :middle</U> +[and then] +? <U>print match [hi #middle:wordp bye] [hi and [then] bye]</U> +false + +? <U>print match [hi #mid:wordp #dle:listp bye] [hi and [then] bye]</U> +true +? <U>show :mid show :dle</U> +[and] +[[then]] +</PRE> + +<P>A more interesting possibility is to ask <CODE>match</CODE> to apply a +sub-pattern to a sublist. This is done by using the pattern (that is, a +list) in place of the name of a predicate. Here is an example: + +<P><PRE>? <U>print match [a #:[x # y] b] [a [x 111 y] [x 222 y] b]</U> +true +? <U>print match [a #:[x # y] b] [a [x 333 zzz] b]</U> +false +</PRE> + +<P>It is possible to include variable names in the subpattern, but +this makes sense only if the quantifier outside the pattern is <CODE>!</CODE> or +<CODE>?</CODE> because otherwise you may be trying to assign more than one value to +the same variable. Here's what I mean: + +<P><PRE>? <U>print match [a #all:[x #some y] b] [a [x 111 y] [x 222 y] b]</U> +true +? <U>show :all show :some</U> +[[x 111 y] [x 222 y]] +[222] +</PRE> + +<P>The variable <CODE>all</CODE> is properly set to contain both of the +lists that matched the subpattern, but the variable <CODE>some</CODE> only contains +the result of the second match. + +<P>If a list appears in a pattern without a quantifier before it, <CODE>match</CODE> +treats it as if it were preceded by "<CODE>!:</CODE>"; in other words, it tries +to match the subpattern exactly once. + +<P>A pattern element like <CODE>#:predicate</CODE> can match several members of the +target sentence; the predicate is applied to each candidate member +separately. For example: + +<P><PRE>? <U>print match [#nums:numberp #rest] [3 2 1 blastoff!]</U> +true +? <U>show :nums show :rest</U> +[3 2 1] +[blastoff!] +</PRE> + +<P>Sometimes you may want to match several members of a sentence, but +apply the predicate to all of the candidates <EM>together</EM> in one list. +To do this, use the quantifier <CODE>@</CODE>: + +<P><PRE>to threep :list +output equalp count :list 3 +end + +? <U>print match [@begin:threep #rest] [a b c d e]</U> +true +? <U>show :begin show :rest</U> +[a b c] +[d e] +</PRE> + +<P>In this example, I haven't used the predicate to examine the <EM> +nature</EM> of the matching words, but rather to control the <EM>number</EM> of +words that are matched. Here is another example that looks "inside" the +matching words. + +<P><PRE>to headtailp :list +if (count :list) < 2 [output "false] +output equalp first :list last :list +end + +? <U>print match [#front @good:headtailp #back] [a b c x d e f g x h i]</U> +true +? <U>show :front show :good show :back</U> +[a b c] +[x d e f g x] +[h i] +</PRE> + +<P>Think about all the different tests that <CODE>match</CODE> has to make +to find this match! Also, do you see why the first instruction of <CODE> +headtailp</CODE> is needed? + +<P>Some patterns are <EM>ambiguous;</EM> that is, there might be more than one +way to associate words from the matched sentence with quantifiers in the +pattern. For example, what should the following do? + +<P><PRE>match [#front xx #back] [a b c d xx e f g xx h i] +</PRE> + +<P>The word <CODE>xx</CODE> appears twice in the matched sentence. The +program could choose to use everything up to the first <CODE>xx</CODE> as <CODE> +front</CODE>, leaving six words for <CODE>back</CODE>, or it could choose to use +everything up to the second <CODE>xx</CODE> as <CODE>front</CODE>, leaving only two words +for <CODE>back</CODE>. In fact, each quantifier, starting from the left, matches +as many words as it can: + +<P><PRE>? <U>print match [#front xx #back] [a b c d xx e f g xx h i]</U> +true +? <U>show :front show :back</U> +[a b c d xx e f g] +[h i] +</PRE> + +<P>If that's not what you want, the quantifier <CODE>^</CODE> behaves +like <CODE>#</CODE> except that it matches as <EM>few</EM> words as possible. + +<P><PRE>? <U>print match [^front xx #back] [a b c d xx e f g xx h i]</U> +true +? <U>show :front show :back</U> +[a b c d] +[e f g xx h i] +</PRE> + +<P>We can use the <CODE>^</CODE> quantifier to fix a bug +in <A HREF="v2ch7.html#converse">the <CODE>converse</CODE> +program</A>: + +<P><PRE>? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>My name is Brian and I like bacon and eggs</U> +Hello, Brian and I like bacon +I'm glad you like bacon +Nice meeting you! +</PRE> + +<P>The problem here is that the pattern used by <CODE>strip.and</CODE> +divided the sentence at the second <CODE>and</CODE>, just as the earlier example +chose the second <CODE>xx</CODE> when I used <CODE>#</CODE> as the quantifier. We +can fix it this way: + +<P><PRE>to strip.and :text +local "short +if match [^short and #] :text [output :short] +output :text +end + +? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>My name is Brian and I like bacon and eggs</U> +Hello, Brian +I'm glad you like bacon +Nice meeting you! +</PRE> + +<P>There is just one more special feature of <CODE>match</CODE> left to describe. It +is another special predicate, like <CODE>in</CODE>, but this one is called <CODE> +anyof</CODE>. When you use <CODE>anyof</CODE>, the next member of the pattern should be +a <EM>list of patterns</EM> to test. <CODE>Match</CODE> tries each pattern in turn, +applied to list members as determined by the quantifier used. In practice, +though, <CODE>anyof</CODE> only makes sense when applied to several members as a +group, so the quantifier <CODE>@</CODE> should always be used. An example may make +this clear. I'm going to rewrite the <CODE>converse</CODE> program to check for +names and likes all at once. + +<P><PRE>to converse +local [response name like rest] +print [Hi, my name is Toby and I like ice cream] +print [Tell me about yourself] +make "response readlist +while match [@:anyof [[My name is #name:notand] + [I like #like:notand] + [&:notand]] + ?:in [and] #rest] ~ + :line ~ + [make "response :rest] +if not emptyp :name [print sentence "Hello, :name] +if not emptyp :like [print sentence [I'm glad you like] :like] +print [Nice meeting you!] +end +</PRE> + +<P>This program uses the <CODE>notand</CODE> predicate I wrote earlier. It +checks for clauses separated by the word <CODE>and</CODE>. Each clause can match +any of three patterns, one for the name, one for the liking, and a general +pattern that matches any other clause. The clauses can appear in any order. + +<P><PRE>? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>My name is Brian and I hate cheese</U> +Hello, Brian +Nice meeting you! + +? <U>converse</U> +Hi, my name is Toby and I like ice cream +Tell me about yourself +<U>I like wings and my name is Jonathan</U> +Hello, Jonathan +I'm glad you like wings +Nice meeting you! +</PRE> + +<P><H2>Reinventing <CODE>Equalp</CODE> for Lists</H2> + +<P> + +<P><CODE>Match</CODE> is a kind of fancy <CODE>equalp</CODE> with a complicated understanding +of what equality means. One way to approach an understanding of <CODE>match</CODE> +is to begin with this question: Suppose Logo's primitive <CODE>equalp</CODE> only +worked for comparing two <EM>words</EM> for equality. (For the remainder of +this section, I won't use the word <CODE>equalp</CODE> at all; I'll call this +imaginary primitive <CODE>wordequalp</CODE> instead.) How would you write a +<CODE>listequalp</CODE> to compare two lists? This is basically a <CODE> +butfirst</CODE>-style recursive operation, but you have to be a little careful +about the fact that either input might be smaller than the other. + +<P><PRE>to listequalp :a :b +if emptyp :a [output emptyp :b] +if emptyp :b [output "false] +if wordequalp first :a first :b ~ + [output listequalp butfirst :a butfirst :b] +output "false +end +</PRE> + +<P>(This procedure contains the instruction <CODE>output "false</CODE> +twice, but it never says <CODE>output "true</CODE>. How can it ever say that two +lists are equal?) + +<P>There is one deficiency in the procedure as I've defined +it. The problem is that it only works for <EM>sentences</EM>--lists whose +members are words. If either list contains a sublist, <CODE>listequalp</CODE> will +try to apply <CODE>wordequalp</CODE> to that sublist. If you enjoy the exercise of +reinventing Logo primitives, you may want to fix that. But for my purposes, +the version here is good enough as a basis for further development of the +pattern matcher. + +<P><H2>A Simple Pattern Matcher</H2> + +<P>We can extend the idea of <CODE>listequalp</CODE> slightly to make a pattern +matcher that only recognizes the special word <CODE>#</CODE> to mean "match zero +or more words." We won't do any of the fancy things like storing the +matching words in a variable. + +<P><PRE>to match :pat :sen +if emptyp :pat [output emptyp :sen] +if emptyp :sen [if equalp first :pat "# + [output match butfirst :pat :sen] + [output "false]] +if equalp first :pat "# [output or match butfirst :pat :sen + match :pat butfirst :sen] +if equalp first :pat first :sen ~ + [output match butfirst :pat butfirst :sen] +output "false +end +</PRE> + +<P>The end test is more complicated in this program than in <CODE>listequalp</CODE> +because the combination of an empty sentence and a nonempty pattern can +still be a match, if the pattern is something like <CODE>[#]</CODE> that matches +zero or more words. + +<P> + +<P>The really interesting part of this procedure is what happens if a <CODE>#</CODE> +is found in the pattern. The match succeeds (outputs <CODE>true</CODE>) if one of +two smaller matches succeeds. The two smaller matches correspond to two +possible conditions: the <CODE>#</CODE> can match zero words, or more than zero. +The first case is detected by the expression + +<P><PRE>match butfirst :pat :sen +</PRE> + +<P>For example, suppose you want to evaluate + +<P><PRE>match [# cream] [cream] +</PRE> + +<P>This expression should yield the value <CODE>true</CODE>, with the <CODE> +#</CODE> matching no words in the sentence. In this example the expression + +<P><PRE>match butfirst :pat :sen +</PRE> + +<P>is equivalent to + +<P><PRE>match [cream] [cream] +</PRE> + +<P>which straightforwardly outputs <CODE>true</CODE>. + +<P>On the other hand, the expression + +<P><PRE>match :pat butfirst :sen +</PRE> + +<P>comes into play when the <CODE>#</CODE> has to match at least one word. +For example, consider the expression + +<P><PRE>match [# cream] [ice cream] +</PRE> + +<P>Here the <CODE>#</CODE> should match the word <CODE>ice</CODE>. The expression + +<P><PRE>match :pat butfirst :sen +</PRE> + +<P>is here equivalent to + +<P><PRE>match [# cream] [cream] +</PRE> + +<P>But this is the example that was <CODE>true</CODE> just above. + +<P>If the <CODE>#</CODE> has to match more than one word, several recursive +invocations of <CODE>match</CODE> are required, each one taking the <CODE>butfirst</CODE> +of the sentence once. For example, suppose we start with + +<P><PRE>match [# cream] [vanilla ice cream] +</PRE> + +<P>Here is the sequence of recursive invocations leading to a <CODE> +true</CODE> match: + +<P><PRE>match :pat butfirst :sen match [# cream] [ice cream] + match :pat butfirst :sen match [# cream] [cream] + match butfirst :pat :sen match [cream] [cream] +</PRE> + +<P>I have been talking as if Logo only evaluated whichever of the two +expressions + +<P><PRE>match butfirst :pat :sen +</PRE> + +<P>and + +<P><PRE>match :pat butfirst :sen +</PRE> + +<P>is appropriate for the particular inputs used. Actually, <EM> +both</EM> expressions are evaluated each time, so there are many recursive +invocations of <CODE>match</CODE> that come out <CODE>false</CODE>. However, the purpose +of the primitive operation <CODE>or</CODE> is to output <CODE>true</CODE> if <EM> +either</EM> of its inputs is <CODE>true</CODE>. To understand fully how <CODE>match</CODE> +works, you'll almost certainly have to trace a few examples carefully by +hand. + +<P><H2>Efficiency and Elegance</H2> + + +<P>Pattern matching is a complicated task, and even the best-written programs +are not blindingly fast. But what is the "best-written" program? In the +simple pattern matcher of the last section, the instruction + +<P><PRE>if equalp first :pat "# [output or match butfirst :pat :sen + match :pat butfirst :sen] +</PRE> + +<P>is extremely compact and elegant. It packs a lot of power into a +single instruction, by combining the results of two recursive invocations +with <CODE>or</CODE>. The similarity of the inputs to the two invocations is also +appealing. + +<P>The trouble with this instruction is that it is much slower than necessary, +because it always tries both recursive invocations even if the first one +succeeds. A more efficient way to program the same general idea would be +this: + +<P><PRE>if equalp first :pat "# ~ + [if match butfirst :pat :sen + [output "true] + [output match :pat butfirst :sen]] +</PRE> + +<P>This new version is much less pleasing to the eye, but it's much +faster. The reason is that if the expression + +<P><PRE>match butfirst :pat :sen +</PRE> + +<P>outputs <CODE>true</CODE>, then the other recursive invocation is avoided. + +<P>It's a mistake to make efficiency your only criterion for program style. +Sometimes it's worth a small slowdown of your program to achieve a large +gain in clarity. But this is a case in which the saving is quite +substantial. Here is a partial trace of the evaluation of + +<P><PRE>match [cat # bat] [cat rat bat] +</PRE> + +<P>using the original version of the procedure: + +<P><PRE>match [cat # bat] [cat rat bat] [cat # bat] [cat rat bat] + match butfirst :pat butfirst :sen [# bat] [rat bat] + match butfirst :pat :sen [bat] [rat bat] + match :pat butfirst :sen [# bat] [bat] + match butfirst :pat :sen [bat] [bat] + match butfirst :pat butfirst :sen [] [] +* match :pat butfirst :sen [# bat] [] +* match butfirst :pat :sen [bat] [] +</PRE> + +<P>The two invocations marked with asterisks are avoided by using the +revised version. These represent 25% of the invocations of <CODE>match</CODE>, a +significant saving. (Don't think that the program necessarily runs 25% +faster. Not all invocations take the same amount of time. This is just a +rough measure.) If there were more words after the <CODE>#</CODE> in the pattern, +the saving would be even greater. + +<P>In this situation we achieve a large saving of time by reorganizing the flow +of control in the program. This is quite different from a more common sort +of concern for efficiency, the kind that leads people to use shorter +variable names so that the program will be a little smaller, or to worry +about whether to use <CODE>fput</CODE> or <CODE>sentence</CODE> in a case where either +would do. These small "bumming" kinds of optimization are rarely worth +the trouble they cause. Figuring out how many times <CODE>match</CODE> is invoked +using each version is a simple example of the branch of computer science +called <EM>analysis of algorithms;</EM> a more profound analysis might use +mathematical techniques to compare the two versions in general, rather than +for a single example. + +<P>In the full version of the pattern matcher, listed at the end of this +project description, I've taken some care to avoid unnecessary matching. +On the other hand, the full version has less flexibility than the simple +version because of its ability to assign matching words to variables. +Consider a case like + +<P><PRE>match [# #] [any old list of words] +</PRE> + +<P>Which <CODE>#</CODE> matches how many words? It doesn't matter if you +don't store the result of the match in variables. But if the pattern is +<CODE>[#a #b]</CODE> instead, there has to be a uniform rule about which part of +the pattern matches what. (In my pattern matcher, all of the words would be +assigned to <CODE>a</CODE>, and <CODE>:b</CODE> would be empty. In general, pattern +elements toward the left match as many words as possible when there is any +ambiguity.) The simple pattern matcher doesn't have this problem, and can +be written to match the ambiguous pattern whichever way gives a <CODE>true</CODE> +result most quickly. + +<P>By the way, what if the two expressions that invoke <CODE>match</CODE> recursively +were reversed in the revised instruction? That is, what if the instruction +were changed again, to read + +<P><PRE>if equalp first :pat "# ~ + [if match :pat butfirst :sen + [output "true] + [output match butfirst :pat :sen]] +</PRE> + +<P>Would this be more or less efficient than the previous version? + +<P><H2>Logo's Evaluation of Inputs</H2> + + + +<P>The discussion about efficiency started because Logo <EM>evaluates</EM> the +inputs to the primitive operation <CODE>or</CODE> before invoking the procedure. +That is, in the example in question, Logo invokes <CODE>match</CODE> twice before +using <CODE>or</CODE> to check whether either invocation output <CODE>true</CODE>. +This is consistent with the way Logo does things in general: To evaluate an +expression that uses some procedure, Logo first evaluates all the inputs for +that procedure, and then invokes the procedure with the evaluated inputs. +Logo's rule is extremely consistent (except for the <CODE>to</CODE> command), but +it isn't the only possible way. In Lisp, a language that's like Logo in +many ways, each procedure can choose whether or not its inputs should be +evaluated in advance. + +<P> + +<P>An example may make it clearer what I mean by this. Lisp has a +procedure called <CODE>set</CODE> that's +equivalent to the Logo <CODE>make</CODE>. You say + +<P><PRE>(set 'var 27) +</PRE> + +<P>as the equivalent of + +<P><PRE>make "var 27 +</PRE> + +<P>But Lisp also has a version called <CODE>setq</CODE> whose first input is +<EM>not</EM> evaluated before <CODE>setq</CODE> is invoked. It's as if there were +an automatic quote mark before the first input, so you just say + +<P><PRE>(setq var 27) +</PRE> + +<P>with the same effect as the other examples. + +<P>Except for the special format of the <CODE>to</CODE> command that forms the title +line of a procedure, Berkeley Logo and many other Logo dialects +do not have any form of automatically-quoted inputs. The design principle +was that consistency of evaluation would make the rules easier to +understand. Some other versions of Logo do use auto-quoting for certain +procedures. For example, in Berkeley Logo, to edit the definition of a +procedure named <CODE>doit</CODE> you type the instruction + +<P><PRE>edit "doit +</PRE> + +<P>But in some other versions of Logo you instead say + +<P><PRE>edit doit +</PRE> + +<P>because in those versions, the <CODE>edit</CODE> command auto-quotes its +input. One possible reason for this design decision is that teachers of +young children like to present Logo without an explicit discussion of the +evaluation rules. They teach the <CODE>edit</CODE> command as a special case, +rather than as just the invocation of a procedure like everything else. +Using this approach, auto-quoting the input avoids having to explain what +that quotation mark means. + +<P>The advantage of the non-auto-quoting version of <CODE>edit</CODE> isn't just in +some abstract idea of consistency. It allows us to take advantage of +composition of functions. Suppose you are working on a very large project, +a video game, with hundreds of procedures. You want to edit all the procedures +having to do with the speed of the spaceships, or whatever moves around the +screen in this game. Luckily, all the procedures you want have the word +<CODE>speed</CODE> as part of their names; they are called <CODE>shipspeed</CODE> or +<CODE>asteroidspeed</CODE> or <CODE>speedcontrol</CODE>. You can say + +<P><PRE>edit filter [substringp "speed ?] procedures +</PRE> + +<P>(<CODE>Procedures</CODE> is a Berkeley Logo primitive operation that +outputs a list of all procedures defined in the workspace; <CODE>substringp</CODE> +is a predicate that checks whether one word appears as part of a longer +word.) An auto-quoting <CODE>edit</CODE> command wouldn't have this flexibility. + +<P>The reason all this discussion is relevant to the pattern matcher is that +the Lisp versions of <CODE>or</CODE> and <CODE>and</CODE> have auto-quoted inputs, which +get evaluated one by one. As soon as one of the inputs to <CODE>or</CODE> turns +out to be <CODE>true</CODE> (or one of the inputs to <CODE>and</CODE> is <CODE>false</CODE>), the +evaluation stops. This is very useful not only for efficiency reasons, as +in the discussion earlier, but to prevent certain kinds of errors. For +example, consider this Logo instruction: + +<P><PRE>if not emptyp :list [if equalp first :list 1 [print "one]] +</PRE> + +<P>It would be pleasant to be able to rewrite that instruction this +way: + +<P><PRE>if and (not emptyp :list) (equalp first :list 1) [print "one] +</PRE> + +<P>The use of <CODE>and</CODE>, I think, makes the program structure clearer +than the nested <CODE>if</CODE>s. That is, it's apparent in the second version +that something (the <CODE>print</CODE>) is to be done if two conditions are met, and +that that's all that happens in the instruction. In the first version, +there might have been another instruction inside the range of the first +(outer) <CODE>if</CODE>; you have to read carefully to see that that isn't so. + +<P>Unfortunately, the second version won't work in Logo. If <CODE>:list</CODE> is in +fact empty, the expression + +<P><PRE>(equalp first :list 1) +</PRE> + +<P>is evaluated before <CODE>and</CODE> is invoked; this expression causes +an error message because <CODE>first</CODE> doesn't accept an empty input. In +Lisp, the corresponding instruction <EM>would</EM> work, because the two +predicate expressions would be evaluated serially and the second wouldn't +be evaluated if the first turned out to be false. + +<P>The serial evaluation of inputs to <CODE>and</CODE> and <CODE>or</CODE> is so +often useful that some people have proposed it for Logo, even at the cost of +destroying the uniform evaluate-first rule. But if you want a serial <CODE> +and</CODE> or <CODE>or</CODE>, it's easy enough to write them, if you explicitly quote +the predicate expressions that are its inputs: + +<P><PRE>to serial.and :pred1 :pred2 +if not run :pred1 [output "false] +output run :pred2 +end + +to serial.or :pred1 :pred2 +if run :pred1 [output "true] +output run :pred2 +end +</PRE> + +<P>Here's how you would use <CODE>serial.and</CODE> to solve the problem +with the nested <CODE>if</CODE>s: + +<P><PRE>if (serial.and [not emptyp :list] [equalp first :list 1]) [print "one] +</PRE> + +<P>Similarly, you could use <CODE>serial.or</CODE> instead of <CODE>or</CODE> to +solve the efficiency problem in the first version of the pattern matcher: + +<P><PRE>output serial.or [match butfirst :pat :sen] [match :pat butfirst :sen] +</PRE> + +<P>These procedures depend on the fact that the predicate expressions +that are used as their inputs are presented inside square brackets; that's +why they are not evaluated before <CODE>serial.and</CODE> or <CODE>serial.or</CODE> is +invoked. + +<P><H2>Indirect Assignment</H2> + + + +<P>From now on, I'll be talking about the big pattern matcher, not the simple +one I introduced to illustrate the structure of the problem. Here is the +top-level procedure <CODE>match</CODE>: + +<P><PRE>to match :pat :sen +local [special.var special.pred special.buffer in.list] +if or wordp :pat wordp :sen [output "false] +if emptyp :pat [output emptyp :sen] +if listp first :pat [output special fput "!: :pat :sen] +if memberp first first :pat [? # ! & @ ^] [output special :pat :sen] +if emptyp :sen [output "false] +if equalp first :pat first :sen ~ + [output match butfirst :pat butfirst :sen] +output "false +end +</PRE> + +<P>As you'd expect, there are more cases to consider in this more +featureful version, but the basic structure is similar to the simple +matcher. The instructions starting <CODE>if emptyp</CODE>, <CODE>if memberp</CODE>, +<CODE>if emptyp</CODE>, and <CODE>if equalp</CODE> play the same roles as similar +instructions in the other version. (The <CODE>memberp</CODE> test replaces the +comparison against the word <CODE>#</CODE> with a wider range of choices.) + +<P>The first <CODE>if</CODE> instruction tests for errors in the format of the pattern +or the sentence to be matched, in which a word is found where a list was +expected. It's not important if you use well-formed inputs to <CODE>match</CODE>. +The <CODE>listp</CODE> test essentially converts a pattern like + +<P><PRE>[foo [some # pattern] baz] +</PRE> + +<P>to the equivalent form + +<P><PRE>[foo !:[some # pattern] baz] +</PRE> + +<P>The interesting new case comes when <CODE>match</CODE> sees a word in the +pattern that starts with one of the six special quantifier characters. In +this case, <CODE>match</CODE> invokes <CODE>special</CODE> to check for a match. + +<P>One of the interesting properties of <CODE>special</CODE> is that it has to be able +to assign a value to a variable whose name is not built into the program, +but instead is part of the <EM>data</EM> used as input to the program. +That is, if the word + +<P><PRE>?howmany:numberp +</PRE> + +<P>appears in the pattern, <CODE>special</CODE> (or one of its +subprocedures) must assign a value to the variable named <CODE>howmany</CODE>, but +there is no instruction of the form + +<P><PRE>make "howmany ... +</PRE> + +<P>anywhere in the program. Instead, <CODE>match</CODE> has <EM>another</EM> +variable, whose name is <CODE>special.var</CODE>, whose <EM>value</EM> is the <EM> +name</EM> <CODE>howmany</CODE>. The assignment of the matching words to the +pattern-specified variable is done with an instruction like + +<P><PRE>make :special.var ... +</PRE> + +<P>Here the first input to <CODE>make</CODE> is not a quoted word, as usual, +but an expression that must be evaluated to figure out which variable to use. + +<P><CODE>Special</CODE>, then, has two tasks. First it must divide a word like + +<P><PRE>?howmany:numberp +</PRE> + +<P>into its component parts; then it must carry out the matching +tasks that are the <EM>meaning</EM> of those parts. These two tasks are like +a smaller version of what a programming language interpreter like Logo +does. Finding the meaningful parts of an instruction is called the <EM> +syntax</EM> of a language, and understanding what the parts mean is called the +<EM>semantics</EM> of the language. <CODE>Special</CODE> has two instructions, one +for the syntax and one for the semantics: + +<P><PRE>to special :pat :sen +set.special parse.special butfirst first :pat " +output run word "match first first :pat +end +</PRE> + +<P>To <EM>parse</EM> something is to divide it into its pieces. <CODE> +Parse.special</CODE> outputs a list of the form + +<P><PRE>[howmany numberp] +</PRE> + +<P>for the example we're considering. Then <CODE>set.special</CODE> assigns +the two members of this list as the values of two variables. The variable +named <CODE>special.var</CODE> is given the value <CODE>howmany</CODE>, and the variable +named <CODE>special.pred</CODE> is given the value <CODE>numberp</CODE>. This preliminary +work is what makes possible the indirect assignment described earlier. + +<P><H2>Defaults</H2> + +<P>What happens if the pattern has a word like + +<P><PRE>?:numberp +</PRE> + +<P>without a variable name? What happens when the program tries to +assign a value to the variable named in the pattern? <CODE>Set.special</CODE> +contains the instruction + +<P><PRE>if emptyp :special.var [make "special.var "special.buffer] +</PRE> + +<P>The effect of this instruction is that if you do not mention a +variable in the pattern, the variable named <CODE>special.buffer</CODE> will be +used to hold the results of the match. This variable is the <EM> +default</EM> variable, the one used if no other is specified. + +<P>It's important, by the way, that the variable <CODE>special.buffer</CODE> is +declared to be local in procedure <CODE>match</CODE>. What makes it important is +that <CODE>match</CODE> is recursive; if you use a pattern like + +<P><PRE>[a # b # c] +</PRE> + +<P>then the matching of the second <CODE>#</CODE> is a subproblem of the +matching of the first one. <CODE>Match</CODE> invokes <CODE>special</CODE>, which invokes +<CODE>match#</CODE>, which invokes <CODE>#test</CODE>, which invokes <CODE> +match</CODE> on the <CODE>butfirst</CODE> of the pattern. That <CODE>butfirst</CODE> contains +another <CODE>#</CODE>. Each of these uses the variable <CODE>special.buffer</CODE> to +remember the words it is trying as a match; since the variable is declared +local, the two don't get confused. (This means, by the way, that you can +really confuse <CODE>match</CODE> by using the same variable name twice in a +pattern. It requires a fairly complicated pattern to confuse <CODE>match</CODE>, +but here is an example. The first result is correct, the second incorrect. + +<P><PRE>? <U>print match [a #x b &y ! c] [a i b j c b k c]</U> +true +? <U>show :x show :y</U> +[i] +[j c b] +? <U>print match [a #x b &x ! c] [a i b j c b k c]</U> +false +</PRE> + +<P>The only change is that the variable name <CODE>x</CODE> is used twice in +the second pattern, and as a result, <CODE>match</CODE> doesn't find the correct +match. You'll know that you really understand how <CODE>match</CODE> works if you +can explain why it <EM>won't</EM> fail if the <CODE>!</CODE> is removed from the +pattern.) + +<P>When writing a tool to be used in other projects, especially if the tool +will be used by other people, it's important to think about defaults. What +should the program do if some piece of information is missing? If you don't +provide for a default explicitly, the most likely result is a Logo error +message; your program will end up trying to take <CODE>first</CODE> of an empty +list, or something like that. + +<P>Another default in the pattern matcher is for the predicate used to test +matches. For example, what happens when the word + +<P><PRE>?howmany +</PRE> + +<P>appears in the pattern, without a predicate? This case is +recognized by <CODE>parse.special</CODE>, in the instruction + +<P><PRE>if emptyp :word [output list :var "always] +</PRE> + +<P>The special predicate <CODE>always</CODE> is used if no other is given in +the pattern. <CODE>Always</CODE> has a very simple definition: + +<P><PRE>to always :x +output "true +end +</PRE> + +<P> + +<P><H2>Program as Data</H2> + + +<P>The instruction in <CODE>special</CODE> that carries out the semantics of a special +pattern-matching instruction word is + +<P><PRE>output run word "match first first :pat +</PRE> + +<P>If the pattern contains the word + +<P><PRE>?howmany:numberp +</PRE> + +<P>then this instruction extracts the quantifier character +<CODE>?</CODE> (the first character of the first word of the pattern) and makes +from it a procedure name <CODE>match?</CODE>. That name is then <CODE>run</CODE> as a Logo +expression; that is, <CODE>special</CODE> invokes a procedure whose name is <CODE> +match?</CODE>. + +<P>Most programming languages do not allow the invocation of a procedure based +on finding the name of the procedure in the program's data. Generally there +is a very strict separation between program and data. Being able to +manipulate data to create a Logo instruction, and then run it, is a very +powerful part of Logo. It is also used to deal with the names of predicates +included in the pattern; to see if a word in the sentence input to <CODE> +match</CODE> is a match for a piece of the pattern, the predicate <CODE>try.pred</CODE> +contains the instruction + +<P><PRE>output run list :special.pred quoted first :sen +</PRE> + +<P>This instruction generates a list whose first member is the name +of the predicate found in the pattern and whose second and last member is a +word from the sentence. Then this list is run as a Logo expression, which +should yield either <CODE>true</CODE> or <CODE>false</CODE> as output, indicating whether +or not the word is acceptable. + +<P><H2>Parsing Rules</H2> + +<P>When you are reading the program, remember that the kind of pattern that +I've written as + +<P><PRE>[begin !:[smaller # pattern] end] +</PRE> + +<P>is read by Logo as if I'd written + +<P><PRE>[begin !: [smaller # pattern] end] +</PRE> + +<P>That is to say, this pattern is a list of four members. I think +of the middle two as a unit, representing a single thing to match. The +sublist takes the place of a predicate name after the <CODE>!</CODE> quantifier. +But for Logo, there is no predicate name in the word starting with the +exclamation point; the pattern is a separate member of the large list. +That's why <CODE>set.special</CODE> uses the expression + +<P><PRE>emptyp :special.pred +</PRE> + +<P>to test for this situation, rather than <CODE>listp</CODE>. After <CODE> +parse.special</CODE> does its work, all it has found is a colon with nothing +following it. <CODE>Set.special</CODE> has to look at the next member of the +pattern list in order to find the subpattern. + +<P><H2>Further Explorations</H2> + +<P>Chapter 9 is a large program that uses <CODE>match</CODE>. It +may give you ideas for the ways in which this tool can be used in your own +programs. Here, instead of talking about applications of <CODE>match</CODE>, I'll +discuss some possible extensions or revisions of the pattern matcher itself. + +<P>There are many obvious small extensions. For example, to complement the +special <CODE>in</CODE> primitive, you could write <CODE>notin</CODE>, which would accept +all <EM>but</EM> the members of the following list. You could allow the use +of a number as the predicate, meaning that exactly that many matching words +are required. That is, in the example for which I invented the predicate +<CODE>threep</CODE>, I would instead be able to use + +<P><PRE>[@begin:3 #rest] +</PRE> + +<P>as the pattern. + +<P>There is no convenient way to say in a pattern that some subpattern can be +repeated several times, if the subpattern is more than a single word. That +is, in the second version of <CODE>converse</CODE>, instead of having to use <CODE> +while</CODE> to chop off pieces of the matched +sentence into a variable <CODE>rest</CODE>, I'd like to be able to say in the +pattern something like + +<P><PRE>[@@: [@:anyof [[my name is #name:notand] + [i like #like:notand] + [&:notand]] + ?:in [and]]] +</PRE> + +<P>Here the doubled atsign (<CODE>@@</CODE>) means that the entire pattern +that follows should be matched repeatedly instead of only once. + +<P>For other approaches to pattern matching, you might want to read about the +programming languages Snobol and Icon, each of which includes pattern +matching as one of its main features. + +<P> +<TABLE width="100%"><TR><TD><A HREF="../v2-toc2.html">(back to Table of Contents)</A> +<TD align="right"><A HREF="../v2ch6/v2ch6.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../v2ch8/v2ch8.html"><STRONG>NEXT</STRONG></A> +</TABLE> + +<P><H2>Program Listing</H2> + +<P><P> +<P><PRE> +to match :pat :sen +local [special.var special.pred special.buffer in.list] +if or wordp :pat wordp :sen [output "false] +if emptyp :pat [output emptyp :sen] +if listp first :pat [output special fput "!: :pat :sen] +if memberp first first :pat [? # ! & @ ^] [output special :pat :sen] +if emptyp :sen [output "false] +if equalp first :pat first :sen ~ + [output match butfirst :pat butfirst :sen] +output "false +end + +;; Parsing quantifiers + +to special :pat :sen +set.special parse.special butfirst first :pat " +output run word "match first first :pat +end + +to parse.special :word :var +if emptyp :word [output list :var "always] +if equalp first :word ": [output list :var butfirst :word] +output parse.special butfirst :word word :var first :word +end + +to set.special :list +make "special.var first :list +make "special.pred last :list +if emptyp :special.var [make "special.var "special.buffer] +if memberp :special.pred [in anyof] [set.in] +if not emptyp :special.pred [stop] +make "special.pred first butfirst :pat +make "pat fput first :pat butfirst butfirst :pat +end + +to set.in +make "in.list first butfirst :pat +make "pat fput first :pat butfirst butfirst :pat +end + +;; Exactly one match + +to match! +if emptyp :sen [output "false] +if not try.pred [output "false] +make :special.var first :sen +output match butfirst :pat butfirst :sen +end + +;; Zero or one match + +to match? +make :special.var [] +if emptyp :sen [output match butfirst :pat :sen] +if not try.pred [output match butfirst :pat :sen] +make :special.var first :sen +if match butfirst :pat butfirst :sen [output "true] +make :special.var [] +output match butfirst :pat :sen +end + +;; Zero or more matches + +to match# +make :special.var [] +output #test #gather :sen +end + +to #gather :sen +if emptyp :sen [output :sen] +if not try.pred [output :sen] +make :special.var lput first :sen thing :special.var +output #gather butfirst :sen +end + +to #test :sen +if match butfirst :pat :sen [output "true] +if emptyp thing :special.var [output "false] +output #test2 fput last thing :special.var :sen +end + +to #test2 :sen +make :special.var butlast thing :special.var +output #test :sen +end + +;; One or more matches + +to match& +output &test match# +end + +to &test :tf +if emptyp thing :special.var [output "false] +output :tf +end + +;; Zero or more matches (as few as possible) + +to match^ +make :special.var [] +output ^test :sen +end + +to ^test :sen +if match butfirst :pat :sen [output "true] +if emptyp :sen [output "false] +if not try.pred [output "false] +make :special.var lput first :sen thing :special.var +output ^test butfirst :sen +end + +;; Match words in a group + +to match@ +make :special.var :sen +output @test [] +end + +to @test :sen +if @try.pred [if match butfirst :pat :sen [output "true]] +if emptyp thing :special.var [output "false] +output @test2 fput last thing :special.var :sen +end + +to @test2 :sen +make :special.var butlast thing :special.var +output @test :sen +end + +;; Applying the predicates + +to try.pred +if listp :special.pred [output match :special.pred first :sen] +output run list :special.pred quoted first :sen +end + +to quoted :thing +if listp :thing [output :thing] +output word "" :thing +end + +to @try.pred +if listp :special.pred [output match :special.pred thing :special.var] +output run list :special.pred thing :special.var +end + +;; Special predicates + +to always :x +output "true +end + +to in :word +output memberp :word :in.list +end + +to anyof :sen +output anyof1 :sen :in.list +end + +to anyof1 :sen :pats +if emptyp :pats [output "false] +if match first :pats :sen [output "true] +output anyof1 :sen butfirst :pats +end +</PRE><P> + + + +<P><A HREF="../v2-toc2.html">(back to Table of Contents)</A> +<P><A HREF="../v2ch6/v2ch6.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../v2ch8/v2ch8.html"><STRONG>NEXT</STRONG></A> + +<P> +<ADDRESS> +<A HREF="../index.html">Brian Harvey</A>, +<CODE>bh@cs.berkeley.edu</CODE> +</ADDRESS> +</BODY> +</HTML> |