*

author: elioat <elioat@tilde.institute> 2023-08-23 07:52:19 -0400
committer: elioat <elioat@tilde.institute> 2023-08-23 07:52:19 -0400
commit: 562a9a52d599d9a05f871404050968a5fd282640 (patch)
tree: 7d3305c1252c043bfe246ccc7deff0056aa6b5ab /js/games/nluqo.github.io/~bh/v2ch7/match.html
parent: 5d012c6c011a9dedf7d0a098e456206244eb5a0f (diff)
download: tour-562a9a52d599d9a05f871404050968a5fd282640.tar.gz
1 files changed, 1437 insertions, 0 deletions
diff --git a/js/games/nluqo.github.io/~bh/v2ch7/match.html b/js/games/nluqo.github.io/~bh/v2ch7/match.html
new file mode 100644
index 0000000..d160258
--- /dev/null
+++ b/js/games/nluqo.github.io/~bh/v2ch7/match.html
@@ -0,0 +1,1437 @@
+<HTML>
+<HEAD>
+<TITLE>Computer Science Logo Style vol 2 ch 7: Pattern Matcher</TITLE>
+</HEAD>
+<BODY>
+<CITE>Computer Science Logo Style</CITE> volume 2:
+<CITE>Advanced Techniques</CITE> 2/e Copyright (C) 1997 MIT
+<H1>Pattern Matcher</H1>
+
+<TABLE width="100%"><TR><TD>
+<IMG SRC="../csls2.jpg" ALT="cover photo">
+<TD><TABLE>
+<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian
+Harvey</A><BR>University of California, Berkeley</CITE>
+<TR><TD align="right"><BR>
+<TR><TD align="right"><A HREF="../pdf/v2ch07.pdf">Download PDF version</A>
+<TR><TD align="right"><A HREF="../v2-toc2.html">Back to Table of Contents</A>
+<TR><TD align="right"><A HREF="../v2ch6/v2ch6.html"><STRONG>BACK</STRONG></A>
+chapter thread <A HREF="../v2ch8/v2ch8.html"><STRONG>NEXT</STRONG></A>
+<TR><TD align="right"><A HREF="https://mitpress.mit.edu/books/computer-science-logo-style-second-edition-volume-2">MIT
+Press web page for <CITE>Computer Science Logo Style</CITE></A>
+</TABLE></TABLE>
+
+<HR><P>Program file for this chapter: <A HREF="match.lg"><CODE>match</CODE></A>
+
+<P>
+In a <EM>conversational</EM> program, one that carries on a conversation with
+the user, you may often have occasion to compare what the user types with
+some expected response.  For example, a quiz program will compare the user's
+response with the correct answer; if you've just asked &quot;how are you,&quot; you
+might look for words like &quot;fine&quot; or &quot;lousy&quot; in the reply.  The main
+tools that Logo provides for such comparisons are <CODE>equalp</CODE>, which
+compares two values for exact equality, and <CODE>memberp</CODE>, which compares
+one datum with a list of alternatives.  This project provides a more
+advanced comparison tool.
+
+<P>Most of the projects in this book are fairly complicated in their inner
+workings, but relatively simple in the external appearance of what they do.
+This project is the reverse; the actual program is not so complex, but it
+does quite a lot, and it will take a while to explain all of it.  Pattern
+matching is a powerful programming tool, and I hope you won't be put off by
+the effort required to learn how to use it.
+
+<P>A <EM>pattern</EM> is a list in which some members are not made explicit.
+This definition is best understood by considering an example.  Consider the
+pattern
+
+<P><PRE>[Every # is a #]
+</PRE>
+
+<P>The words <CODE>every</CODE>, <CODE>is</CODE>, and <CODE>a</CODE> represent themselves
+explicitly.  The two number signs, however, are symbols representing &quot;zero
+or more arbitrary data.&quot; Here are some lists that would match the
+pattern:
+
+<P><PRE>[Every man is a mortal]
+[Every computer programmer is a genius]
+[Every is a word]
+[Every datum is a word or a list]
+</PRE>
+
+<P>Here are some lists that would <EM>not</EM> match the pattern:
+
+<P><PRE>[Socrates is a man]
+[Every man is an animal]
+[Everyone I know is a friend]
+[I think every list is a match]
+</PRE>
+
+<P>The first of these examples doesn't match the pattern because the
+word <CODE>every</CODE> is missing.  The second has <CODE>an</CODE> instead of <CODE>a</CODE>,
+while the third has <CODE>everyone</CODE> instead of <CODE>every</CODE>.  The fourth has
+the extra words <CODE>I think</CODE> before the word <CODE>every</CODE>.  This last
+example <EM>would</EM> match the pattern
+
+<P><PRE>[# every # is a #]
+</PRE>
+
+<P>because this new pattern allows for extra words at the beginning.
+
+<P><CODE>Match</CODE> is a predicate that takes two inputs.  The first input is a
+pattern and the second input is a sentence.  The output is <CODE>true</CODE> if the
+sentence matches the pattern, otherwise <CODE>false</CODE>.
+
+<P><PRE>? <U>print match [Every # is a #] [Every book is a joy to read]</U>
+true
+? <U>print match [Every # is a #] [Every adolescent is obnoxious]</U>
+false
+</PRE>
+
+<P>Patterns can be more complicated than the ones I've shown so far.  In the
+following paragraphs I'll introduce the many ways that you can modify
+patterns to control which sentences they'll match.  As you read, you should
+make up sample patterns of your own and try them out with <CODE>match</CODE>.
+
+<P>Often, in a conversational program, it's not good enough just to know
+whether or not a sentence matches a pattern.  You also want to know the
+pieces of the sentence that match the variable parts of the pattern.  <CODE>
+Match</CODE> meets this requirement by allowing you to tell it the names of
+variables that you want to receive the matching words from the sentence.
+Here is an example:
+
+<P><PRE>? <U>print match [#food is for #animal] [Hay is for horses]</U>
+true
+? <U>show :food</U>
+[Hay]
+? <U>show :animal</U>
+[horses]
+
+? <U>print match [#food is for #animal] [C++ is for the birds]</U>
+true
+? <U>show :food</U>
+[C++]
+? <U>show :animal</U>
+[the birds]
+</PRE>
+
+<P>Here is a short <A NAME="converse"> conversational program using the parts of the
+pattern matcher we've discussed so far.
+
+<P><PRE>to converse
+local [response name like]
+print [Hi, my name is Toby and I like ice cream]
+print [Tell me about yourself]
+make &quot;response readlist
+if match [# my name is #name] :response [do.name strip.and :name]
+if match [# i like #like] :response [do.like strip.and :like]
+print [Nice meeting you!]
+end
+
+to do.name :name
+print sentence &quot;Hello, :name
+end
+
+to do.like :like
+print sentence [I'm glad you like] :like
+end
+
+to strip.and :text
+local &quot;short
+if match [#short and #] :text [output :short]
+output :text
+end
+
+? <U>converse</U>
+Hi, my name is Toby and I like ice cream
+Tell me about yourself
+<U>My name is Brian and I like Chinese food</U>
+Hello, Brian
+I'm glad you like Chinese food
+Nice meeting you!
+
+? <U>converse</U>
+Hi, my name is Toby and I like ice cream
+Tell me about yourself
+<U>I like spaghetti and meat balls</U>
+I'm glad you like spaghetti
+Nice meeting you!
+</PRE>
+
+<P>If <CODE>match</CODE> outputs <CODE>false</CODE>, there is no guarantee of what
+values will end up in the variables mentioned in the pattern.  <CODE>
+Converse</CODE> uses the result of the match only if <CODE>match</CODE> outputs <CODE>
+true</CODE>.
+
+<P><CODE>Converse</CODE> looks for each part of the sentence (the name and the thing
+the person likes) in two steps: first it finds the keywords <CODE>my name is</CODE>
+or <CODE>I like</CODE> and extracts everything following those phrases, then it
+looks within what it extracted for the word <CODE>and</CODE> and removes anything
+following it.  For example, when I typed
+
+<P><PRE>My name is Brian and I like Chinese food
+</PRE>
+
+<P>the result of matching the name pattern was to give the variable
+<CODE>name</CODE> the value
+
+<P><PRE>Brian and I like Chinese food
+</PRE>
+
+<P>Then <CODE>strip.and</CODE> used a second pattern to eliminate everything
+after the <CODE>and</CODE>.  You might be tempted to extract the name in one step
+by using a pattern like
+
+<P><PRE>[# my name is #name and #]
+</PRE>
+
+<P>but I wanted to avoid that pattern because it won't match a
+sentence that only contains
+
+<P><PRE>my name is Mary
+</PRE>
+
+<P>without expressing any likes or dislikes.  The program as I've
+written it does accept these shorter sentences also.  Later we'll see a more
+complicated pattern that accepts sentences with or without <CODE>and</CODE>
+using a single pattern.
+
+<P>The special symbol <CODE>#</CODE> in a pattern represents zero or more words.  <CODE>
+Match</CODE> recognizes other symbols with different meanings:
+
+<P><TABLE>
+<TR><TD><CODE>#</CODE><TD>&nbsp;&nbsp;&nbsp;zero or more
+<TR><TD><CODE>&amp;</CODE><TD>&nbsp;&nbsp;&nbsp;one or more
+<TR><TD><CODE>?</CODE><TD>&nbsp;&nbsp;&nbsp;zero or one
+<TR><TD><CODE>!</CODE><TD>&nbsp;&nbsp;&nbsp;exactly one
+</TABLE>
+
+<P>For example, if you'd like the <CODE>converse</CODE> program to recognize
+only the first name of the person using it, you could change the relevant
+pattern to
+
+<P><PRE>[# my name is !name #]
+</PRE>
+
+<P>Then a conversation with the program might look like this:
+
+<P><PRE>? <U>converse</U>
+Hi, my name is Toby and I like ice cream
+Tell me about yourself
+<U>My name is Brian Harvey</U>
+Hello, Brian
+Nice meeting you!
+?
+</PRE>
+
+<P>The word <CODE>!name</CODE> in the pattern matched just the single word
+<CODE>Brian</CODE>, not the multiple words <CODE>Brian Harvey</CODE> that the original
+pattern would have selected.  (If you modify <CODE>converse</CODE> in this way, it
+should be possible to remove the invocation of <CODE>strip.and</CODE> in computing
+the input to <CODE>do.name</CODE>.  The single word stored in the variable <CODE>name</CODE> won't
+contain any other clauses.)
+
+<P>So far, the patterns we've seen allow two extremes: the pattern can include
+a single word that must be matched exactly, or it can allow <EM>any</EM> word
+at all to be matched.  It is also possible to write a pattern that calls for
+words in some specified category--that is, words that satisfy some
+predicate.  Here is an example:
+
+<P><PRE>to ask.age
+local &quot;age
+print [How old are you?]
+if match [# !age:numberp #] readlist ~
+   [print (sentence [You are] :age [years old.]]
+end
+
+? <U>ask.age</U>
+How old are you?
+<U>I will be 36 next month</U>
+You are 36 years old.
+</PRE>
+
+<P>This is a slightly silly example, but it does illustrate the use
+of a predicate to restrict which words can match a variable part of a
+pattern.  The pattern used in <CODE>ask.age</CODE> looks for a single word for
+which <CODE>numberp</CODE> is <CODE>true</CODE>, that is, for a number.  Any number of
+words surrounding the number are allowed.
+
+<P>Of course, a predicate used in a pattern need not be a primitive one like
+<CODE>numberp</CODE>.  You may find it useful to write your own predicates that
+select categories of words.  Such a predicate might have a list built in:
+
+<P><PRE>to colorp :word
+output memberp :word [red orange yellow blue green violet white black]
+end
+</PRE>
+
+<P>Or you could check some inherent property of a word:
+
+<P><PRE>to ends.y :word
+output equalp last :word &quot;y
+end
+</PRE>
+
+<P>In either case, what is essential is that your predicate must take
+a word as its single input, and must output <CODE>true</CODE> if you want <CODE>
+match</CODE> to accept the word to fill a slot in the pattern.
+
+<P>It is most common to want a predicate like <CODE>colorp</CODE>
+above--one that tests its input word for membership in a certain list.  A
+special notation makes it possible to include such a list in the pattern
+itself, instead of writing a predicate procedure.  For example, suppose you
+are writing a quiz program, and you want to ask the question, &quot;What is the
+quickest route from Boston to Framingham?&quot; You'd like to accept answers
+like these:
+
+<P><PRE>Mass Pike
+the Massachusetts Turnpike
+the Pike
+</PRE>
+
+<P>but not <CODE>the Ohio Turnpike</CODE>!  Here is a pattern you could use.
+
+<P><PRE>[?:in [the] ?:in [Mass Massachusetts] !:in [Pike Turnpike]]
+</PRE>
+
+<P>The special predicate <CODE>in</CODE> is a version of <CODE>memberp</CODE> that
+knows to look in the pattern, right after the element that invokes <CODE>in</CODE>,
+for the list of acceptable words.  This pattern accepts zero or one <CODE>
+the</CODE>, zero or one of <CODE>Mass</CODE> or <CODE>Massachusetts</CODE>, and one of <CODE>
+Pike</CODE> or <CODE>Turnpike</CODE>.  That is, the first two words are optional and the
+third is required.
+
+<P>Earlier I rejected the use of a pattern
+
+<P><PRE>[# my name is #name and #]
+</PRE>
+
+<P>because I wanted also to be able to accept sentences without <CODE>
+and</CODE> following the name.  I promised to exhibit a pattern that would accept
+both sentence forms.  Here it is:
+
+<P><PRE>[# my name is #name:notand #]
+</PRE>
+
+<P>This pattern uses a predicate <CODE>notand</CODE> that allows any word
+except <CODE>and</CODE>.  It's easy to write this predicate:
+
+<P><PRE>to notand :word
+output not equalp :word &quot;and
+end
+</PRE>
+
+<P>(By the way, the symbols indicating the number of words to match are meant
+to be mnemonic.  The question mark indicates that it's questionable whether
+or not the word will be in the sentence.  The exclamation point looks a
+little like a digit 1, and also shouts emphatically that the word is
+present.  The number sign means that any number of words (including zero) is
+okay, and the ampersand indicates that more words are required, namely at
+least one instead of at least zero.)
+
+<P>We've seen various combinations of quantifiers (that's what I'll call
+the characters like <CODE>#</CODE> that control how many words are matched),
+variable names, and predicates:
+
+<P><TABLE>
+<TR><TD><CODE>#</CODE><TD>&nbsp;&nbsp;&nbsp;no variable, no predicate (accept any word)
+<TR><TD><CODE>#name</CODE><TD>&nbsp;&nbsp;&nbsp;set variable, no predicate
+<TR><TD><CODE>?:in</CODE><TD>&nbsp;&nbsp;&nbsp;no variable, test predicate
+<TR><TD><CODE>!age:numberp</CODE><TD>&nbsp;&nbsp;&nbsp;set variable, test predicate
+</TABLE>
+
+<P>We are now about to discuss some of the more esoteric features of
+the <CODE>match</CODE> program.  So far, we have always compared a pattern against
+a <EM>sentence,</EM> a list of words.  It is also possible to match a pattern
+against a structured list, with smaller lists among its members.  <CODE>Match</CODE>
+treats a sublist just like a word, if you don't want to examine the inner
+structure of the sublist.  Here are some examples.
+
+<P><PRE>? <U>print match [hello #middle goodbye] [hello is [very much] like goodbye]</U>
+true
+? <U>show :middle</U>
+[is [very much] like]
+? <U>print match [hi #middle:wordp bye] [hi and then bye]</U>
+true
+? <U>show :middle</U>
+[and then]
+? <U>print match [hi #middle:wordp bye] [hi and [then] bye]</U>
+false
+
+? <U>print match [hi #mid:wordp #dle:listp bye] [hi and [then] bye]</U>
+true
+? <U>show :mid show :dle</U>
+[and]
+[[then]]
+</PRE>
+
+<P>A more interesting possibility is to ask <CODE>match</CODE> to apply a
+sub-pattern to a sublist.  This is done by using the pattern (that is, a
+list) in place of the name of a predicate.  Here is an example:
+
+<P><PRE>? <U>print match [a #:[x # y] b] [a [x 111 y] [x 222 y] b]</U>
+true
+? <U>print match [a #:[x # y] b] [a [x 333 zzz] b]</U>
+false
+</PRE>
+
+<P>It is possible to include variable names in the subpattern, but
+this makes sense only if the quantifier outside the pattern is <CODE>!</CODE> or
+<CODE>?</CODE> because otherwise you may be trying to assign more than one value to
+the same variable.  Here's what I mean:
+
+<P><PRE>? <U>print match [a #all:[x #some y] b] [a [x 111 y] [x 222 y] b]</U>
+true
+? <U>show :all show :some</U>
+[[x 111 y] [x 222 y]]
+[222]
+</PRE>
+
+<P>The variable <CODE>all</CODE> is properly set to contain both of the
+lists that matched the subpattern, but the variable <CODE>some</CODE> only contains
+the result of the second match.
+
+<P>If a list appears in a pattern without a quantifier before it, <CODE>match</CODE>
+treats it as if it were preceded by &quot;<CODE>!:</CODE>&quot;; in other words, it tries
+to match the subpattern exactly once.
+
+<P>A pattern element like <CODE>#:predicate</CODE> can match several members of the
+target sentence; the predicate is applied to each candidate member
+separately.  For example:
+
+<P><PRE>? <U>print match [#nums:numberp #rest] [3 2 1 blastoff!]</U>
+true
+? <U>show :nums show :rest</U>
+[3 2 1]
+[blastoff!]
+</PRE>
+
+<P>Sometimes you may want to match several members of a sentence, but
+apply the predicate to all of the candidates <EM>together</EM> in one list.
+To do this, use the quantifier <CODE>@</CODE>:
+
+<P><PRE>to threep :list
+output equalp count :list 3
+end
+
+? <U>print match [@begin:threep #rest] [a b c d e]</U>
+true
+? <U>show :begin show :rest</U>
+[a b c]
+[d e]
+</PRE>
+
+<P>In this example, I haven't used the predicate to examine the <EM>
+nature</EM> of the matching words, but rather to control the <EM>number</EM> of
+words that are matched.  Here is another example that looks &quot;inside&quot; the
+matching words.
+
+<P><PRE>to headtailp :list
+if (count :list) &lt; 2 [output &quot;false]
+output equalp first :list last :list
+end
+
+? <U>print match [#front @good:headtailp #back] [a b c x d e f g x h i]</U>
+true
+? <U>show :front show :good show :back</U>
+[a b c]
+[x d e f g x]
+[h i]
+</PRE>
+
+<P>Think about all the different tests that <CODE>match</CODE> has to make
+to find this match!  Also, do you see why the first instruction of <CODE>
+headtailp</CODE> is needed?
+
+<P>Some patterns are <EM>ambiguous;</EM> that is, there might be more than one
+way to associate words from the matched sentence with quantifiers in the
+pattern.  For example, what should the following do?
+
+<P><PRE>match [#front xx #back] [a b c d xx e f g xx h i]
+</PRE>
+
+<P>The word <CODE>xx</CODE> appears twice in the matched sentence.  The
+program could choose to use everything up to the first <CODE>xx</CODE> as <CODE>
+front</CODE>, leaving six words for <CODE>back</CODE>, or it could choose to use
+everything up to the second <CODE>xx</CODE> as <CODE>front</CODE>, leaving only two words
+for <CODE>back</CODE>.  In fact, each quantifier, starting from the left, matches
+as many words as it can:
+
+<P><PRE>? <U>print match [#front xx #back] [a b c d xx e f g xx h i]</U>
+true
+? <U>show :front show :back</U>
+[a b c d xx e f g]
+[h i]
+</PRE>
+
+<P>If that's not what you want, the quantifier <CODE>^</CODE> behaves
+like <CODE>#</CODE> except that it matches as <EM>few</EM> words as possible.
+
+<P><PRE>? <U>print match [^front xx #back] [a b c d xx e f g xx h i]</U>
+true
+? <U>show :front show :back</U>
+[a b c d]
+[e f g xx h i]
+</PRE>
+
+<P>We can use the <CODE>^</CODE> quantifier to fix a bug
+in <A HREF="match.html#converse">the <CODE>converse</CODE>
+program</A>:
+
+<P><PRE>? <U>converse</U>
+Hi, my name is Toby and I like ice cream
+Tell me about yourself
+<U>My name is Brian and I like bacon and eggs</U>
+Hello, Brian and I like bacon
+I'm glad you like bacon
+Nice meeting you!
+</PRE>
+
+<P>The problem here is that the pattern used by <CODE>strip.and</CODE>
+divided the sentence at the second <CODE>and</CODE>, just as the earlier example
+chose the second <CODE>xx</CODE> when I used <CODE>#</CODE> as the quantifier.  We
+can fix it this way:
+
+<P><PRE>to strip.and :text
+local &quot;short
+if match [^short and #] :text [output :short]
+output :text
+end
+
+? <U>converse</U>
+Hi, my name is Toby and I like ice cream
+Tell me about yourself
+<U>My name is Brian and I like bacon and eggs</U>
+Hello, Brian
+I'm glad you like bacon
+Nice meeting you!
+</PRE>
+
+<P>There is just one more special feature of <CODE>match</CODE> left to describe.  It
+is another special predicate, like <CODE>in</CODE>, but this one is called <CODE>
+anyof</CODE>.  When you use <CODE>anyof</CODE>, the next member of the pattern should be
+a <EM>list of patterns</EM> to test.  <CODE>Match</CODE> tries each pattern in turn,
+applied to list members as determined by the quantifier used.  In practice,
+though, <CODE>anyof</CODE> only makes sense when applied to several members as a
+group, so the quantifier <CODE>@</CODE> should always be used.  An example may make
+this clear.  I'm going to rewrite the <CODE>converse</CODE> program to check for
+names and likes all at once.
+
+<P><PRE>to converse
+local [response name like rest]
+print [Hi, my name is Toby and I like ice cream]
+print [Tell me about yourself]
+make &quot;response readlist
+while match [@:anyof [[My name is #name:notand]
+                      [I like #like:notand]
+                      [&amp;:notand]]
+                     ?:in [and] #rest] ~
+            :line ~
+      [make &quot;response :rest]
+if not emptyp :name [print sentence &quot;Hello, :name]
+if not emptyp :like [print sentence [I'm glad you like] :like]
+print [Nice meeting you!]
+end
+</PRE>
+
+<P>This program uses the <CODE>notand</CODE> predicate I wrote earlier.  It
+checks for clauses separated by the word <CODE>and</CODE>.  Each clause can match
+any of three patterns, one for the name, one for the liking, and a general
+pattern that matches any other clause.  The clauses can appear in any order.
+
+<P><PRE>? <U>converse</U>
+Hi, my name is Toby and I like ice cream
+Tell me about yourself
+<U>My name is Brian and I hate cheese</U>
+Hello, Brian
+Nice meeting you!
+
+? <U>converse</U>
+Hi, my name is Toby and I like ice cream
+Tell me about yourself
+<U>I like wings and my name is Jonathan</U>
+Hello, Jonathan
+I'm glad you like wings
+Nice meeting you!
+</PRE>
+
+<P><H2>Reinventing <CODE>Equalp</CODE> for Lists</H2>
+
+<P>
+
+<P><CODE>Match</CODE> is a kind of fancy <CODE>equalp</CODE> with a complicated understanding
+of what equality means.  One way to approach an understanding of <CODE>match</CODE>
+is to begin with this question:  Suppose Logo's primitive <CODE>equalp</CODE> only
+worked for comparing two <EM>words</EM> for equality.  (For the remainder of
+this section, I won't use the word <CODE>equalp</CODE> at all; I'll call this
+imaginary primitive <CODE>wordequalp</CODE> instead.)  How would you write a
+<CODE>listequalp</CODE> to compare two lists?  This is basically a <CODE>
+butfirst</CODE>-style recursive operation, but you have to be a little careful
+about the fact that either input might be smaller than the other.
+
+<P><PRE>to listequalp :a :b
+if emptyp :a [output emptyp :b]
+if emptyp :b [output &quot;false]
+if wordequalp first :a first :b ~
+   [output listequalp butfirst :a butfirst :b]
+output &quot;false
+end
+</PRE>
+
+<P>(This procedure contains the instruction <CODE>output &quot;false</CODE>
+twice, but it never says <CODE>output &quot;true</CODE>.  How can it ever say that two
+lists are equal?)
+
+<P>There is one deficiency in the procedure as I've defined
+it.  The problem is that it only works for <EM>sentences</EM>--lists whose
+members are words.  If either list contains a sublist, <CODE>listequalp</CODE> will
+try to apply <CODE>wordequalp</CODE> to that sublist.  If you enjoy the exercise of
+reinventing Logo primitives, you may want to fix that.  But for my purposes,
+the version here is good enough as a basis for further development of the
+pattern matcher.
+
+<P><H2>A Simple Pattern Matcher</H2>
+
+<P>We can extend the idea of <CODE>listequalp</CODE> slightly to make a pattern
+matcher that only recognizes the special word <CODE>#</CODE> to mean &quot;match zero
+or more words.&quot; We won't do any of the fancy things like storing the
+matching words in a variable.
+
+<P><PRE>to match :pat :sen
+if emptyp :pat [output emptyp :sen]
+if emptyp :sen [if equalp first :pat &quot;#
+                   [output match butfirst :pat :sen]
+                   [output &quot;false]]
+if equalp first :pat &quot;# [output or match butfirst :pat :sen
+                                   match :pat butfirst :sen]
+if equalp first :pat first :sen ~
+   [output match butfirst :pat butfirst :sen]
+output &quot;false
+end
+</PRE>
+
+<P>The end test is more complicated in this program than in <CODE>listequalp</CODE>
+because the combination of an empty sentence and a nonempty pattern can
+still be a match, if the pattern is something like <CODE>[#]</CODE> that matches
+zero or more words.
+
+<P>
+
+<P>The really interesting part of this procedure is what happens if a <CODE>#</CODE>
+is found in the pattern.  The match succeeds (outputs <CODE>true</CODE>) if one of
+two smaller matches succeeds.  The two smaller matches correspond to two
+possible conditions: the <CODE>#</CODE> can match zero words, or more than zero.
+The first case is detected by the expression
+
+<P><PRE>match butfirst :pat :sen
+</PRE>
+
+<P>For example, suppose you want to evaluate
+
+<P><PRE>match [# cream] [cream]
+</PRE>
+
+<P>This expression should yield the value <CODE>true</CODE>, with the <CODE>
+#</CODE> matching no words in the sentence.  In this example the expression
+
+<P><PRE>match butfirst :pat :sen
+</PRE>
+
+<P>is equivalent to
+
+<P><PRE>match [cream] [cream]
+</PRE>
+
+<P>which straightforwardly outputs <CODE>true</CODE>.
+
+<P>On the other hand, the expression
+
+<P><PRE>match :pat butfirst :sen
+</PRE>
+
+<P>comes into play when the <CODE>#</CODE> has to match at least one word.
+For example, consider the expression
+
+<P><PRE>match [# cream] [ice cream]
+</PRE>
+
+<P>Here the <CODE>#</CODE> should match the word <CODE>ice</CODE>.  The expression
+
+<P><PRE>match :pat butfirst :sen
+</PRE>
+
+<P>is here equivalent to
+
+<P><PRE>match [# cream] [cream]
+</PRE>
+
+<P>But this is the example that was <CODE>true</CODE> just above.
+
+<P>If the <CODE>#</CODE> has to match more than one word, several recursive
+invocations of <CODE>match</CODE> are required, each one taking the <CODE>butfirst</CODE>
+of the sentence once.  For example, suppose we start with
+
+<P><PRE>match [# cream] [vanilla ice cream]
+</PRE>
+
+<P>Here is the sequence of recursive invocations leading to a <CODE>
+true</CODE> match:
+
+<P><PRE>match :pat butfirst :sen      match [# cream] [ice cream]
+  match :pat butfirst :sen      match [# cream] [cream]
+    match butfirst :pat :sen      match [cream] [cream]
+</PRE>
+
+<P>I have been talking as if Logo only evaluated whichever of the two
+expressions
+
+<P><PRE>match butfirst :pat :sen
+</PRE>
+
+<P>and
+
+<P><PRE>match :pat butfirst :sen
+</PRE>
+
+<P>is appropriate for the particular inputs used.  Actually, <EM>
+both</EM> expressions are evaluated each time, so there are many recursive
+invocations of <CODE>match</CODE> that come out <CODE>false</CODE>.  However, the purpose
+of the primitive operation <CODE>or</CODE> is to output <CODE>true</CODE> if <EM>
+either</EM> of its inputs is <CODE>true</CODE>.  To understand fully how <CODE>match</CODE>
+works, you'll almost certainly have to trace a few examples carefully by
+hand.
+
+<P><H2>Efficiency and Elegance</H2>
+
+
+<P>Pattern matching is a complicated task, and even the best-written programs
+are not blindingly fast.  But what is the &quot;best-written&quot; program?  In the
+simple pattern matcher of the last section, the instruction
+
+<P><PRE>if equalp first :pat &quot;# [output or match butfirst :pat :sen
+                                   match :pat butfirst :sen]
+</PRE>
+
+<P>is extremely compact and elegant.  It packs a lot of power into a
+single instruction, by combining the results of two recursive invocations
+with <CODE>or</CODE>.  The similarity of the inputs to the two invocations is also
+appealing.
+
+<P>The trouble with this instruction is that it is much slower than necessary,
+because it always tries both recursive invocations even if the first one
+succeeds.  A more efficient way to program the same general idea would be
+this:
+
+<P><PRE>if equalp first :pat &quot;# ~
+   [if match butfirst :pat :sen
+       [output &quot;true]
+       [output match :pat butfirst :sen]]
+</PRE>
+
+<P>This new version is much less pleasing to the eye, but it's much
+faster.  The reason is that if the expression
+
+<P><PRE>match butfirst :pat :sen
+</PRE>
+
+<P>outputs <CODE>true</CODE>, then the other recursive invocation is avoided.
+
+<P>It's a mistake to make efficiency your only criterion for program style.
+Sometimes it's worth a small slowdown of your program to achieve a large
+gain in clarity.  But this is a case in which the saving is quite
+substantial.  Here is a partial trace of the evaluation of
+
+<P><PRE>match [cat # bat] [cat rat bat]
+</PRE>
+
+<P>using the original version of the procedure:
+
+<P><PRE>match [cat # bat] [cat rat bat]              [cat # bat]   [cat rat bat]
+  match butfirst :pat butfirst :sen          [# bat]       [rat bat]
+    match butfirst :pat :sen                 [bat]         [rat bat]
+    match :pat butfirst :sen                 [# bat]       [bat]
+      match butfirst :pat :sen               [bat]         [bat]
+        match butfirst :pat butfirst :sen    []            []
+*     match :pat butfirst :sen               [# bat]       []
+*       match butfirst :pat :sen             [bat]         []
+</PRE>
+
+<P>The two invocations marked with asterisks are avoided by using the
+revised version.  These represent 25% of the invocations of <CODE>match</CODE>, a
+significant saving.  (Don't think that the program necessarily runs 25%
+faster.  Not all invocations take the same amount of time.  This is just a
+rough measure.)  If there were more words after the <CODE>#</CODE> in the pattern,
+the saving would be even greater.
+
+<P>In this situation we achieve a large saving of time by reorganizing the flow
+of control in the program.  This is quite different from a more common sort
+of concern for efficiency, the kind that leads people to use shorter
+variable names so that the program will be a little smaller, or to worry
+about whether to use <CODE>fput</CODE> or <CODE>sentence</CODE> in a case where either
+would do.  These small &quot;bumming&quot; kinds of optimization are rarely worth
+the trouble they cause.  Figuring out how many times <CODE>match</CODE> is invoked
+using each version is a simple example of the branch of computer science
+called <EM>analysis of algorithms;</EM> a more profound analysis might use
+mathematical techniques to compare the two versions in general, rather than
+for a single example.
+
+<P>In the full version of the pattern matcher, listed at the end of this
+project description, I've taken some care to avoid unnecessary matching.
+On the other hand, the full version has less flexibility than the simple
+version because of its ability to assign matching words to variables.
+Consider a case like
+
+<P><PRE>match [# #] [any old list of words]
+</PRE>
+
+<P>Which <CODE>#</CODE> matches how many words?  It doesn't matter if you
+don't store the result of the match in variables.  But if the pattern is
+<CODE>[#a #b]</CODE> instead, there has to be a uniform rule about which part of
+the pattern matches what.  (In my pattern matcher, all of the words would be
+assigned to <CODE>a</CODE>, and <CODE>:b</CODE> would be empty.  In general, pattern
+elements toward the left match as many words as possible when there is any
+ambiguity.)  The simple pattern matcher doesn't have this problem, and can
+be written to match the ambiguous pattern whichever way gives a <CODE>true</CODE>
+result most quickly.
+
+<P>By the way, what if the two expressions that invoke <CODE>match</CODE> recursively
+were reversed in the revised instruction?  That is, what if the instruction
+were changed again, to read
+
+<P><PRE>if equalp first :pat &quot;# ~
+   [if match :pat butfirst :sen
+       [output &quot;true]
+       [output match butfirst :pat :sen]]
+</PRE>
+
+<P>Would this be more or less efficient than the previous version?
+
+<P><H2>Logo's Evaluation of Inputs</H2>
+
+
+
+<P>The discussion about efficiency started because Logo <EM>evaluates</EM> the
+inputs to the primitive operation <CODE>or</CODE> before invoking the procedure.
+That is, in the example in question, Logo invokes <CODE>match</CODE> twice before
+using <CODE>or</CODE> to check whether either invocation output <CODE>true</CODE>.
+This is consistent with the way Logo does things in general: To evaluate an
+expression that uses some procedure, Logo first evaluates all the inputs for
+that procedure, and then invokes the procedure with the evaluated inputs.
+Logo's rule is extremely consistent (except for the <CODE>to</CODE> command), but
+it isn't the only possible way.  In Lisp, a language that's like Logo in
+many ways, each procedure can choose whether or not its inputs should be
+evaluated in advance.
+
+<P>
+
+<P>An example may make it clearer what I mean by this.  Lisp has a
+procedure called <CODE>set</CODE> that's
+equivalent to the Logo <CODE>make</CODE>.  You say
+
+<P><PRE>(set 'var 27)
+</PRE>
+
+<P>as the equivalent of
+
+<P><PRE>make &quot;var 27
+</PRE>
+
+<P>But Lisp also has a version called <CODE>setq</CODE> whose first input is
+<EM>not</EM> evaluated before <CODE>setq</CODE> is invoked.  It's as if there were
+an automatic quote mark before the first input, so you just say
+
+<P><PRE>(setq var 27)
+</PRE>
+
+<P>with the same effect as the other examples.
+
+<P>Except for the special format of the <CODE>to</CODE> command that forms the title
+line of a procedure, Berkeley Logo and many other Logo dialects
+do not have any form of automatically-quoted inputs.  The design principle
+was that consistency of evaluation would make the rules easier to
+understand.  Some other versions of Logo do use auto-quoting for certain
+procedures.  For example, in Berkeley Logo, to edit the definition of a
+procedure named <CODE>doit</CODE> you type the instruction
+
+<P><PRE>edit &quot;doit
+</PRE>
+
+<P>But in some other versions of Logo you instead say
+
+<P><PRE>edit doit
+</PRE>
+
+<P>because in those versions, the <CODE>edit</CODE> command auto-quotes its
+input.  One possible reason for this design decision is that teachers of
+young children like to present Logo without an explicit discussion of the
+evaluation rules.  They teach the <CODE>edit</CODE> command as a special case,
+rather than as just the invocation of a procedure like everything else.
+Using this approach, auto-quoting the input avoids having to explain what
+that quotation mark means.
+
+<P>The advantage of the non-auto-quoting version of <CODE>edit</CODE> isn't just in
+some abstract idea of consistency.  It allows us to take advantage of
+composition of functions.  Suppose you are working on a very large project,
+a video game, with hundreds of procedures.  You want to edit all the procedures
+having to do with the speed of the spaceships, or whatever moves around the
+screen in this game.  Luckily, all the procedures you want have the word
+<CODE>speed</CODE> as part of their names; they are called <CODE>shipspeed</CODE> or
+<CODE>asteroidspeed</CODE> or <CODE>speedcontrol</CODE>.  You can say
+
+<P><PRE>edit filter [substringp &quot;speed ?] procedures
+</PRE>
+
+<P>(<CODE>Procedures</CODE> is a Berkeley Logo primitive operation that
+outputs a list of all procedures defined in the workspace; <CODE>substringp</CODE>
+is a predicate that checks whether one word appears as part of a longer
+word.)  An auto-quoting <CODE>edit</CODE> command wouldn't have this flexibility.
+
+<P>The reason all this discussion is relevant to the pattern matcher is that
+the Lisp versions of <CODE>or</CODE> and <CODE>and</CODE> have auto-quoted inputs, which
+get evaluated one by one.  As soon as one of the inputs to <CODE>or</CODE> turns
+out to be <CODE>true</CODE> (or one of the inputs to <CODE>and</CODE> is <CODE>false</CODE>), the
+evaluation stops.  This is very useful not only for efficiency reasons, as
+in the discussion earlier, but to prevent certain kinds of errors.  For
+example, consider this Logo instruction:
+
+<P><PRE>if not emptyp :list [if equalp first :list 1 [print &quot;one]]
+</PRE>
+
+<P>It would be pleasant to be able to rewrite that instruction this
+way:
+
+<P><PRE>if and (not emptyp :list) (equalp first :list 1) [print &quot;one]
+</PRE>
+
+<P>The use of <CODE>and</CODE>, I think, makes the program structure clearer
+than the nested <CODE>if</CODE>s.  That is, it's apparent in the second version
+that something (the <CODE>print</CODE>) is to be done if two conditions are met, and
+that that's all that happens in the instruction.  In the first version,
+there might have been another instruction inside the range of the first
+(outer) <CODE>if</CODE>; you have to read carefully to see that that isn't so.
+
+<P>Unfortunately, the second version won't work in Logo.  If <CODE>:list</CODE> is in
+fact empty, the expression
+
+<P><PRE>(equalp first :list 1)
+</PRE>
+
+<P>is evaluated before <CODE>and</CODE> is invoked; this expression causes
+an error message because <CODE>first</CODE> doesn't accept an empty input.  In
+Lisp, the corresponding instruction <EM>would</EM> work, because the two
+predicate expressions would be evaluated serially and the second wouldn't
+be evaluated if the first turned out to be false.
+
+<P>The serial evaluation of inputs to <CODE>and</CODE> and <CODE>or</CODE> is so
+often useful that some people have proposed it for Logo, even at the cost of
+destroying the uniform evaluate-first rule.  But if you want a serial <CODE>
+and</CODE> or <CODE>or</CODE>, it's easy enough to write them, if you explicitly quote
+the predicate expressions that are its inputs:
+
+<P><PRE>to serial.and :pred1 :pred2
+if not run :pred1 [output &quot;false]
+output run :pred2
+end
+
+to serial.or :pred1 :pred2
+if run :pred1 [output &quot;true]
+output run :pred2
+end
+</PRE>
+
+<P>Here's how you would use <CODE>serial.and</CODE> to solve the problem
+with the nested <CODE>if</CODE>s:
+
+<P><PRE>if (serial.and [not emptyp :list] [equalp first :list 1]) [print &quot;one]
+</PRE>
+
+<P>Similarly, you could use <CODE>serial.or</CODE> instead of <CODE>or</CODE> to
+solve the efficiency problem in the first version of the pattern matcher:
+
+<P><PRE>output serial.or [match butfirst :pat :sen] [match :pat butfirst :sen]
+</PRE>
+
+<P>These procedures depend on the fact that the predicate expressions
+that are used as their inputs are presented inside square brackets; that's
+why they are not evaluated before <CODE>serial.and</CODE> or <CODE>serial.or</CODE> is
+invoked.
+
+<P><H2>Indirect Assignment</H2>
+
+
+
+<P>From now on, I'll be talking about the big pattern matcher, not the simple
+one I introduced to illustrate the structure of the problem.  Here is the
+top-level procedure <CODE>match</CODE>:
+
+<P><PRE>to match :pat :sen
+local [special.var special.pred special.buffer in.list]
+if or wordp :pat wordp :sen [output &quot;false]
+if emptyp :pat [output emptyp :sen]
+if listp first :pat [output special fput &quot;!: :pat :sen]
+if memberp first first :pat [? # ! &amp; @ ^] [output special :pat :sen]
+if emptyp :sen [output &quot;false]
+if equalp first :pat first :sen ~
+   [output match butfirst :pat butfirst :sen]
+output &quot;false
+end
+</PRE>
+
+<P>As you'd expect, there are more cases to consider in this more
+featureful version, but the basic structure is similar to the simple
+matcher.  The instructions starting <CODE>if emptyp</CODE>, <CODE>if memberp</CODE>,
+<CODE>if emptyp</CODE>, and <CODE>if equalp</CODE> play the same roles as similar
+instructions in the other version.  (The <CODE>memberp</CODE> test replaces the
+comparison against the word <CODE>#</CODE> with a wider range of choices.)
+
+<P>The first <CODE>if</CODE> instruction tests for errors in the format of the pattern
+or the sentence to be matched, in which a word is found where a list was
+expected.  It's not important if you use well-formed inputs to <CODE>match</CODE>.
+The <CODE>listp</CODE> test essentially converts a pattern like
+
+<P><PRE>[foo [some # pattern] baz]
+</PRE>
+
+<P>to the equivalent form
+
+<P><PRE>[foo !:[some # pattern] baz]
+</PRE>
+
+<P>The interesting new case comes when <CODE>match</CODE> sees a word in the
+pattern that starts with one of the six special quantifier characters.  In
+this case, <CODE>match</CODE> invokes <CODE>special</CODE> to check for a match.
+
+<P>One of the interesting properties of <CODE>special</CODE> is that it has to be able
+to assign a value to a variable whose name is not built into the program,
+but instead is part of the <EM>data</EM> used as input to the program.
+That is, if the word
+
+<P><PRE>?howmany:numberp
+</PRE>
+
+<P>appears in the pattern, <CODE>special</CODE> (or one of its
+subprocedures) must assign a value to the variable named <CODE>howmany</CODE>, but
+there is no instruction of the form
+
+<P><PRE>make &quot;howmany ...
+</PRE>
+
+<P>anywhere in the program.  Instead, <CODE>match</CODE> has <EM>another</EM>
+variable, whose name is <CODE>special.var</CODE>, whose <EM>value</EM> is the <EM>
+name</EM> <CODE>howmany</CODE>.  The assignment of the matching words to the
+pattern-specified variable is done with an instruction like
+
+<P><PRE>make :special.var ...
+</PRE>
+
+<P>Here the first input to <CODE>make</CODE> is not a quoted word, as usual,
+but an expression that must be evaluated to figure out which variable to use.
+
+<P><CODE>Special</CODE>, then, has two tasks.  First it must divide a word like
+
+<P><PRE>?howmany:numberp
+</PRE>
+
+<P>into its component parts; then it must carry out the matching
+tasks that are the <EM>meaning</EM> of those parts.  These two tasks are like
+a smaller version of what a programming language interpreter like Logo
+does.  Finding the meaningful parts of an instruction is called the <EM>
+syntax</EM> of a language, and understanding what the parts mean is called the
+<EM>semantics</EM> of the language.  <CODE>Special</CODE> has two instructions, one
+for the syntax and one for the semantics:
+
+<P><PRE>to special :pat :sen
+set.special parse.special butfirst first :pat "
+output run word &quot;match first first :pat
+end
+</PRE>
+
+<P>To <EM>parse</EM> something is to divide it into its pieces.  <CODE>
+Parse.special</CODE> outputs a list of the form
+
+<P><PRE>[howmany numberp]
+</PRE>
+
+<P>for the example we're considering.  Then <CODE>set.special</CODE> assigns
+the two members of this list as the values of two variables.  The variable
+named <CODE>special.var</CODE> is given the value <CODE>howmany</CODE>, and the variable
+named <CODE>special.pred</CODE> is given the value <CODE>numberp</CODE>.  This preliminary
+work is what makes possible the indirect assignment described earlier.
+
+<P><H2>Defaults</H2>
+
+<P>What happens if the pattern has a word like
+
+<P><PRE>?:numberp
+</PRE>
+
+<P>without a variable name?  What happens when the program tries to
+assign a value to the variable named in the pattern?  <CODE>Set.special</CODE>
+contains the instruction
+
+<P><PRE>if emptyp :special.var [make &quot;special.var &quot;special.buffer]
+</PRE>
+
+<P>The effect of this instruction is that if you do not mention a
+variable in the pattern, the variable named <CODE>special.buffer</CODE> will be
+used to hold the results of the match.  This variable is the <EM>
+default</EM> variable, the one used if no other is specified.
+
+<P>It's important, by the way, that the variable <CODE>special.buffer</CODE> is
+declared to be local in procedure <CODE>match</CODE>.  What makes it important is
+that <CODE>match</CODE> is recursive; if you use a pattern like
+
+<P><PRE>[a # b # c]
+</PRE>
+
+<P>then the matching of the second <CODE>#</CODE> is a subproblem of the
+matching of the first one.  <CODE>Match</CODE> invokes <CODE>special</CODE>, which invokes
+<CODE>match#</CODE>, which invokes <CODE>#test</CODE>, which invokes <CODE>
+match</CODE> on the <CODE>butfirst</CODE> of the pattern.  That <CODE>butfirst</CODE> contains
+another <CODE>#</CODE>.  Each of these uses the variable <CODE>special.buffer</CODE> to
+remember the words it is trying as a match; since the variable is declared
+local, the two don't get confused.  (This means, by the way, that you can
+really confuse <CODE>match</CODE> by using the same variable name twice in a
+pattern.  It requires a fairly complicated pattern to confuse <CODE>match</CODE>,
+but here is an example.  The first result is correct, the second incorrect.
+
+<P><PRE>? <U>print match [a #x b &amp;y ! c] [a i b j c b k c]</U>
+true
+? <U>show :x show :y</U>
+[i]
+[j c b]
+? <U>print match [a #x b &amp;x ! c] [a i b j c b k c]</U>
+false
+</PRE>
+
+<P>The only change is that the variable name <CODE>x</CODE> is used twice in
+the second pattern, and as a result, <CODE>match</CODE> doesn't find the correct
+match.  You'll know that you really understand how <CODE>match</CODE> works if you
+can explain why it <EM>won't</EM> fail if the <CODE>!</CODE> is removed from the
+pattern.)
+
+<P>When writing a tool to be used in other projects, especially if the tool
+will be used by other people, it's important to think about defaults.  What
+should the program do if some piece of information is missing?  If you don't
+provide for a default explicitly, the most likely result is a Logo error
+message; your program will end up trying to take <CODE>first</CODE> of an empty
+list, or something like that.
+
+<P>Another default in the pattern matcher is for the predicate used to test
+matches.  For example, what happens when the word
+
+<P><PRE>?howmany
+</PRE>
+
+<P>appears in the pattern, without a predicate?  This case is
+recognized by <CODE>parse.special</CODE>, in the instruction
+
+<P><PRE>if emptyp :word [output list :var &quot;always]
+</PRE>
+
+<P>The special predicate <CODE>always</CODE> is used if no other is given in
+the pattern.  <CODE>Always</CODE> has a very simple definition:
+
+<P><PRE>to always :x
+output &quot;true
+end
+</PRE>
+
+<P>
+
+<P><H2>Program as Data</H2>
+
+
+<P>The instruction in <CODE>special</CODE> that carries out the semantics of a special
+pattern-matching instruction word is
+
+<P><PRE>output run word &quot;match first first :pat
+</PRE>
+
+<P>If the pattern contains the word
+
+<P><PRE>?howmany:numberp
+</PRE>
+
+<P>then this instruction extracts the quantifier character
+<CODE>?</CODE> (the first character of the first word of the pattern) and makes
+from it a procedure name <CODE>match?</CODE>.  That name is then <CODE>run</CODE> as a Logo
+expression; that is, <CODE>special</CODE> invokes a procedure whose name is <CODE>
+match?</CODE>.
+
+<P>Most programming languages do not allow the invocation of a procedure based
+on finding the name of the procedure in the program's data.  Generally there
+is a very strict separation between program and data.  Being able to
+manipulate data to create a Logo instruction, and then run it, is a very
+powerful part of Logo.  It is also used to deal with the names of predicates
+included in the pattern; to see if a word in the sentence input to <CODE>
+match</CODE> is a match for a piece of the pattern, the predicate <CODE>try.pred</CODE>
+contains the instruction
+
+<P><PRE>output run list :special.pred quoted first :sen
+</PRE>
+
+<P>This instruction generates a list whose first member is the name
+of the predicate found in the pattern and whose second and last member is a
+word from the sentence.  Then this list is run as a Logo expression, which
+should yield either <CODE>true</CODE> or <CODE>false</CODE> as output, indicating whether
+or not the word is acceptable.
+
+<P><H2>Parsing Rules</H2>
+
+<P>When you are reading the program, remember that the kind of pattern that
+I've written as
+
+<P><PRE>[begin !:[smaller # pattern] end]
+</PRE>
+
+<P>is read by Logo as if I'd written
+
+<P><PRE>[begin !: [smaller # pattern] end]
+</PRE>
+
+<P>That is to say, this pattern is a list of four members.  I think
+of the middle two as a unit, representing a single thing to match.  The
+sublist takes the place of a predicate name after the <CODE>!</CODE> quantifier.
+But for Logo, there is no predicate name in the word starting with the
+exclamation point; the pattern is a separate member of the large list.
+That's why <CODE>set.special</CODE> uses the expression
+
+<P><PRE>emptyp :special.pred
+</PRE>
+
+<P>to test for this situation, rather than <CODE>listp</CODE>.  After <CODE>
+parse.special</CODE> does its work, all it has found is a colon with nothing
+following it.  <CODE>Set.special</CODE> has to look at the next member of the
+pattern list in order to find the subpattern.
+
+<P><H2>Further Explorations</H2>
+
+<P>Chapter 9 is a large program that uses <CODE>match</CODE>.  It
+may give you ideas for the ways in which this tool can be used in your own
+programs.  Here, instead of talking about applications of <CODE>match</CODE>, I'll
+discuss some possible extensions or revisions of the pattern matcher itself.
+
+<P>There are many obvious small extensions.  For example, to complement the
+special <CODE>in</CODE> primitive, you could write <CODE>notin</CODE>, which would accept
+all <EM>but</EM> the members of the following list.  You could allow the use
+of a number as the predicate, meaning that exactly that many matching words
+are required.  That is, in the example for which I invented the predicate
+<CODE>threep</CODE>, I would instead be able to use
+
+<P><PRE>[@begin:3 #rest]
+</PRE>
+
+<P>as the pattern.
+
+<P>There is no convenient way to say in a pattern that some subpattern can be
+repeated several times, if the subpattern is more than a single word.  That
+is, in the second version of <CODE>converse</CODE>, instead of having to use <CODE>
+while</CODE> to chop off  pieces of the matched
+sentence into a variable <CODE>rest</CODE>, I'd like to be able to say in the
+pattern something like
+
+<P><PRE>[@@: [@:anyof [[my name is #name:notand]
+               [i like #like:notand]
+               [&amp;:notand]]
+      ?:in [and]]]
+</PRE>
+
+<P>Here the doubled atsign (<CODE>@@</CODE>) means that the entire pattern
+that follows should be matched repeatedly instead of only once.
+
+<P>For other approaches to pattern matching, you might want to read about the
+programming languages Snobol and Icon, each of which includes pattern
+matching as one of its main features.
+
+<P>
+<TABLE width="100%"><TR><TD><A HREF="../v2-toc2.html">(back to Table of Contents)</A>
+<TD align="right"><A HREF="../v2ch6/v2ch6.html"><STRONG>BACK</STRONG></A>
+chapter thread <A HREF="../v2ch8/v2ch8.html"><STRONG>NEXT</STRONG></A>
+</TABLE>
+
+<P><H2>Program Listing</H2>
+
+<P><P>
+<P><PRE>
+to match :pat :sen
+local [special.var special.pred special.buffer in.list]
+if or wordp :pat wordp :sen [output "false]
+if emptyp :pat [output emptyp :sen]
+if listp first :pat [output special fput "!: :pat :sen]
+if memberp first first :pat [? # ! & @ ^] [output special :pat :sen]
+if emptyp :sen [output "false]
+if equalp first :pat first :sen ~
+   [output match butfirst :pat butfirst :sen]
+output "false
+end
+
+;; Parsing quantifiers
+
+to special :pat :sen
+set.special parse.special butfirst first :pat "
+output run word "match first first :pat
+end
+
+to parse.special :word :var
+if emptyp :word [output list :var "always]
+if equalp first :word ": [output list :var butfirst :word]
+output parse.special butfirst :word word :var first :word
+end
+
+to set.special :list
+make "special.var first :list
+make "special.pred last :list
+if emptyp :special.var [make "special.var "special.buffer]
+if memberp :special.pred [in anyof] [set.in]
+if not emptyp :special.pred [stop]
+make "special.pred first butfirst :pat
+make "pat fput first :pat butfirst butfirst :pat
+end
+
+to set.in
+make "in.list first butfirst :pat
+make "pat fput first :pat butfirst butfirst :pat
+end
+
+;; Exactly one match
+
+to match!
+if emptyp :sen [output "false]
+if not try.pred [output "false]
+make :special.var first :sen
+output match butfirst :pat butfirst :sen
+end
+
+;; Zero or one match
+
+to match?
+make :special.var []
+if emptyp :sen [output match butfirst :pat :sen]
+if not try.pred [output match butfirst :pat :sen]
+make :special.var first :sen
+if match butfirst :pat butfirst :sen [output "true]
+make :special.var []
+output match butfirst :pat :sen
+end
+
+;; Zero or more matches
+
+to match#
+make :special.var []
+output #test #gather :sen
+end
+
+to #gather :sen
+if emptyp :sen [output :sen]
+if not try.pred [output :sen]
+make :special.var lput first :sen thing :special.var
+output #gather butfirst :sen
+end
+
+to #test :sen
+if match butfirst :pat :sen [output "true]
+if emptyp thing :special.var [output "false]
+output #test2 fput last thing :special.var :sen
+end
+
+to #test2 :sen
+make :special.var butlast thing :special.var
+output #test :sen
+end
+
+;; One or more matches
+
+to match&
+output &test match#
+end
+
+to &test :tf
+if emptyp thing :special.var [output "false]
+output :tf
+end
+
+;; Zero or more matches (as few as possible)
+
+to match^
+make :special.var []
+output ^test :sen
+end
+
+to ^test :sen
+if match butfirst :pat :sen [output "true]
+if emptyp :sen [output "false]
+if not try.pred [output "false]
+make :special.var lput first :sen thing :special.var
+output ^test butfirst :sen
+end
+
+;; Match words in a group
+
+to match@
+make :special.var :sen
+output @test []
+end
+
+to @test :sen
+if @try.pred [if match butfirst :pat :sen [output "true]]
+if emptyp thing :special.var [output "false]
+output @test2 fput last thing :special.var :sen
+end
+
+to @test2 :sen
+make :special.var butlast thing :special.var
+output @test :sen
+end
+
+;; Applying the predicates
+
+to try.pred
+if listp :special.pred [output match :special.pred first :sen]
+output run list :special.pred quoted first :sen
+end
+
+to quoted :thing
+if listp :thing [output :thing]
+output word "" :thing
+end
+
+to @try.pred
+if listp :special.pred [output match :special.pred thing :special.var]
+output run list :special.pred thing :special.var
+end
+
+;; Special predicates
+
+to always :x
+output "true
+end
+
+to in :word
+output memberp :word :in.list
+end
+
+to anyof :sen
+output anyof1 :sen :in.list
+end
+
+to anyof1 :sen :pats
+if emptyp :pats [output "false]
+if match first :pats :sen [output "true]
+output anyof1 :sen butfirst :pats
+end
+</PRE><P>
+
+
+
+<P><A HREF="../v2-toc2.html">(back to Table of Contents)</A>
+<P><A HREF="../v2ch6/v2ch6.html"><STRONG>BACK</STRONG></A>
+chapter thread <A HREF="../v2ch8/v2ch8.html"><STRONG>NEXT</STRONG></A>
+
+<P>
+<ADDRESS>
+<A HREF="../index.html">Brian Harvey</A>, 
+<CODE>bh@cs.berkeley.edu</CODE>
+</ADDRESS>
+</BODY>
+</HTML>
author	elioat <elioat@tilde.institute>	2023-08-23 07:52:19 -0400
committer	elioat <elioat@tilde.institute>	2023-08-23 07:52:19 -0400
commit	562a9a52d599d9a05f871404050968a5fd282640 (patch)
tree	7d3305c1252c043bfe246ccc7deff0056aa6b5ab /js/games/nluqo.github.io/~bh/v2ch7/match.html
parent	5d012c6c011a9dedf7d0a098e456206244eb5a0f (diff)
download	tour-562a9a52d599d9a05f871404050968a5fd282640.tar.gz