*

author: elioat <elioat@tilde.institute> 2023-08-23 07:52:19 -0400
committer: elioat <elioat@tilde.institute> 2023-08-23 07:52:19 -0400
commit: 562a9a52d599d9a05f871404050968a5fd282640 (patch)
tree: 7d3305c1252c043bfe246ccc7deff0056aa6b5ab /js/games/nluqo.github.io/~bh/ssch16/match.html
parent: 5d012c6c011a9dedf7d0a098e456206244eb5a0f (diff)
download: tour-562a9a52d599d9a05f871404050968a5fd282640.tar.gz
1 files changed, 1399 insertions, 0 deletions
diff --git a/js/games/nluqo.github.io/~bh/ssch16/match.html b/js/games/nluqo.github.io/~bh/ssch16/match.html
new file mode 100644
index 0000000..922bee4
--- /dev/null
+++ b/js/games/nluqo.github.io/~bh/ssch16/match.html
@@ -0,0 +1,1399 @@
+<P>
+
+<P><A NAME="bongard"></A><CENTER><IMG SRC="../ss-pics/patterns.jpg" ALT="figure: patterns"></CENTER><P><CENTER>In each set, how do the ones on the left differ from the ones on the
+right?
+</CENTER><P>
+
+
+<P><HTML>
+<HEAD>
+<TITLE>Simply Scheme: Introducing Computer Science ch 16: Example: Pattern Matcher</TITLE>
+</HEAD>
+<BODY>
+<HR>
+<CITE>Simply Scheme:</CITE>
+<CITE>Introducing Computer Science</CITE> 2/e Copyright (C) 1999 MIT
+<H2>Chapter 16</H2>
+<H1>Example: Pattern Matcher</H1>
+
+<TABLE width="100%"><TR><TD>
+<IMG SRC="../simply.jpg" ALT="cover photo">
+<TD><TABLE>
+<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian
+Harvey</A><BR>University of California, Berkeley</CITE>
+<TR><TD align="right"><CITE><A HREF="http://ccrma.stanford.edu/~matt">Matthew
+Wright</A><BR>University of California, Santa Barbara</CITE>
+<TR><TD align="right"><BR>
+<TR><TD align="right"><A HREF="../pdf/ssch16.pdf">Download PDF version</A>
+<TR><TD align="right"><A HREF="../ss-toc2.html">Back to Table of Contents</A>
+<TR><TD align="right"><A HREF="../ssch15/poker.html"><STRONG>BACK</STRONG></A>
+chapter thread <A HREF="../ssch17/part5.html"><STRONG>NEXT</STRONG></A>
+<TR><TD align="right"><A HREF="http://mitpress.mit.edu/0262082810">MIT
+Press web page for <CITE>Simply Scheme</CITE></A>
+</TABLE></TABLE>
+
+<HR>
+
+
+<P>It's time for another extended example in which we use the Scheme tools
+we've been learning to accomplish something practical.  We'll start by
+describing how the program will work before we talk about how to implement
+it.
+
+<P>You can load our program into Scheme by typing
+
+<P><PRE>(load &quot;match.scm&quot;)
+</PRE>
+
+<P><H2>Problem Description</H2>
+
+<P>A <EM><A NAME="g1"></A><A NAME="g2"></A>pattern matcher</EM> is a commonly used procedure whose job is
+to compare a sentence to a range of possibilities.  An example may make this
+clear:
+
+<P><PRE>&gt; (match '(* me *) '(love me do))
+#T
+
+&gt; (match '(* me *) '(please please me))
+#T
+
+&gt; (match '(* me *) '(in my life))
+#F
+</PRE>
+
+<P>The first argument, <CODE>(* me *)</CODE>, is a <EM>pattern.</EM>  In the
+pattern, each asterisk (<CODE>*</CODE>) means &quot;any number of words, including no
+words at all.&quot; So the entire pattern matches any sentence that contains the
+word &quot;me&quot; anywhere within it.  You can think of <CODE>match</CODE> as a more
+general form of <CODE>equal?</CODE> in the sense that it compares two sentences
+and tells us whether they're the same, but with a broader meaning of &quot;the
+same.&quot;
+
+<P>Our pattern matcher will accept patterns more complicated than this first
+example.  There are four <EM>special characters</EM> that indicate
+unspecified parts of a pattern, depending on the number of words that should
+be allowed:
+
+<P><P><TABLE><TR><TH align="right" valign="top"><CODE>?</CODE><TD>&nbsp;&nbsp;&nbsp;&nbsp;<TD valign="top">At most one word.
+<TR><TH align="right" valign="top"><CODE>!</CODE><TD>&nbsp;&nbsp;&nbsp;&nbsp;<TD valign="top">Exactly one word.
+<TR><TH align="right" valign="top"><CODE>&amp;</CODE><TD>&nbsp;&nbsp;&nbsp;&nbsp;<TD valign="top">At least one word.
+<TR><TH align="right" valign="top"><CODE>*</CODE><TD>&nbsp;&nbsp;&nbsp;&nbsp;<TD valign="top">Any number of words.
+</TABLE>
+
+<P><P>These characters are meant to be somewhat mnemonic.  The question
+mark means &quot;maybe there's a word.&quot; The exclamation point means &quot;precisely
+one word!&quot; (And it's vertical, just like the digit 1, sort of.)  The
+ampersand, which ordinarily means &quot;and,&quot; indicates that we're matching a
+word and maybe more.  The asterisk doesn't have any mnemonic value, but it's
+what everyone uses for a general matching indicator anyway.
+
+<P>We can give a <EM>name</EM> to the collection of words that match an
+unspecified part of a pattern by including in the pattern a word that starts
+with one of the four special characters and continues with the name.  If the
+match succeeds, <CODE>match</CODE> will return a sentence containing these names
+and the corresponding values from the sentence:
+
+<P><PRE>&gt; (match '(*start me *end) '(love me do))
+(START LOVE ! END DO !)
+
+&gt; (match '(*start me *end) '(please please me))
+(START PLEASE PLEASE ! END !)
+
+&gt; (match '(mean mr mustard) '(mean mr mustard))
+()
+
+&gt; (match '(*start me *end) '(in my life))
+FAILED
+</PRE>
+
+<P>In these examples, you see that <CODE>match</CODE> doesn't really return
+<CODE>#t</CODE> or <CODE>#f</CODE>; the earlier set of examples showed a simplified
+picture.  In the first of the new examples, the special pattern word
+<CODE>*start</CODE> is allowed to match any number of words, as indicated by the
+asterisk.  In this case it turned out to match the single word &quot;love.&quot;
+<CODE>Match</CODE> returns a result that tells us which words of the sentence match
+the named special words in the pattern.  (We'll call one of these special
+pattern words a <EM>placeholder.</EM>) The exclamation points in the
+returned value are needed to separate one match from another.  (In the second
+example, the name <CODE>end</CODE> was matched by an empty set of words.)  In the
+third example, the match was successful, but since there were no
+placeholders the returned sentence was empty.  If the match is unsuccessful,
+<CODE>match</CODE> returns the word <CODE>failed</CODE>.<A NAME="text1" HREF="match.html#ft1">[1]</A>
+
+<P>If the same placeholder name appears more than once in the pattern, then it
+must be matched by the same word(s) in the sentence each time:
+
+<P><PRE>&gt; (match '(!twice !other !twice) '(cry baby cry))
+(TWICE CRY ! OTHER BABY !)
+
+&gt; (match '(!twice !other !twice) '(please please me))
+FAILED
+</PRE>
+
+<P>Some patterns might be matchable in more than one way.  For example, the
+invocation
+
+<P><PRE>&gt; (match '(*front *back) '(your mother should know))
+</PRE>
+
+<P>might return any of five different correct answers:
+
+<P><PRE>(FRONT YOUR MOTHER SHOULD KNOW ! BACK !)
+(FRONT YOUR MOTHER SHOULD ! BACK KNOW !)
+(FRONT YOUR MOTHER ! BACK SHOULD KNOW !)
+(FRONT YOUR ! BACK MOTHER SHOULD KNOW !)
+(FRONT ! BACK YOUR MOTHER SHOULD KNOW !)
+</PRE>
+
+<P>We arbitrarily decide that in such cases the first placeholder
+should match as many words as possible, so in this case <CODE>match</CODE> will
+actually return the first of these answers.
+
+<P>Before continuing, you might want to look at the first batch of exercises at
+the end of this chapter, which are about using the pattern matcher.  (The
+rest of the exercises are about the implementation, which we'll discuss
+next.)
+
+<P><H2>Implementation: When Are Two Sentences Equal?</H2>
+
+<P>Our approach to implementation will be to start with something we already
+know how to write: a predicate that tests whether two sentences are exactly
+equal.  We will add capabilities one at a time until we reach our goal.
+
+<P>Suppose that Scheme's primitive <CODE>equal?</CODE> function worked only for words
+and not for sentences.  We could write an equality tester for sentences,
+like this:
+
+<P><PRE>(define (<A NAME="g3"></A>sent-equal? sent1 sent2)
+  (cond ((empty? sent1)
+	 (empty? sent2))
+	((empty? sent2) #f)
+	((equal? (first sent1) (first sent2))
+	 (sent-equal? (bf sent1) (bf sent2)))
+	(else #f)))
+</PRE>
+
+<P>Two sentences are equal if each word in the first sentence is
+equal to the corresponding word in the second.  They're unequal if one
+sentence runs out of words before the other.
+
+<P>Why are we choosing to accept Scheme's primitive word comparison but rewrite
+the sentence comparison?  In our pattern matcher, a placeholder in the
+pattern corresponds to a group of words in the sentence.  There is no kind
+of placeholder that matches only part of a word.  (It would be possible to
+implement such placeholders, but we've chosen not to.)  Therefore, we will
+never need to ask whether a word is &quot;almost equal&quot; to another word.
+
+<P><H2>When Are Two Sentences Nearly Equal?</H2>
+
+<P>Pattern matching is just a more general form of this <CODE>sent-equal?</CODE> 
+procedure.  Let's write a very simple pattern matcher that knows only about
+the &quot;!&quot; special character and doesn't let us name the words
+that match the exclamation points in the pattern.  We'll call this one <CODE>match?</CODE> with a question mark because it returns just true or false.
+
+<P><PRE>(define (match? pattern sent)                ;; first version: ! only
+  (cond ((empty? pattern)
+	 (empty? sent))
+	((empty? sent) #f)
+	<U>((equal? (first pattern) '!)</U>
+	 <U>(match? (bf pattern) (bf sent)))</U>
+	((equal? (first pattern) (first sent))
+	 (match? (bf pattern) (bf sent)))
+	(else #f)))
+</PRE>
+
+<P>This program is exactly the same as <CODE>sent-equal?</CODE>, except for
+the highlighted <CODE>cond</CODE> clause.  We are still comparing each word of the
+pattern with the corresponding word of the sentence, but now an exclamation
+mark in the pattern matches <EM>any</EM> word in the sentence.  (If <CODE>first</CODE> of <CODE>pattern</CODE> is an exclamation mark, we don't even look at <CODE>first</CODE> of <CODE>sent</CODE>.)
+
+<P>Our strategy in the next several sections will be to expand the pattern
+matcher by implementing the remaining special characters, then finally
+adding the ability to name the placeholders.  For now, when we say something
+like &quot;the <CODE>*</CODE> placeholder,&quot; we mean the placeholder consisting of the
+asterisk alone.  Later, after we add named placeholders, the same procedures
+will implement any placeholder that begins with an asterisk.
+
+<P><H2>Matching with Alternatives</H2>
+
+<P>The <CODE>!</CODE> matching is not much harder than <CODE>sent-equal?</CODE>, because
+it's still the case that one word of the pattern must match one word of the
+sentence.  When we introduce the <CODE>?</CODE> option, the structure of the
+program must be more complicated, because a question mark in the pattern
+might or might not be paired up with a word in the sentence.  In other
+words, the pattern and the sentence might match without being the same
+length.
+
+<P><PRE>(define (match? pattern sent)                ;; second version: ! and ?
+  (cond ((empty? pattern)
+	 (empty? sent))
+	<U>((equal? (first pattern) '?)</U>
+	 <U>(if (empty? sent)</U>
+	     <U>(match? (bf pattern) '())</U>
+	     <U>(or (match? (bf pattern) (bf sent))</U>
+		 <U>(match? (bf pattern) sent))))</U>
+	((empty? sent) #f)
+	((equal? (first pattern) '!)
+	 (match? (bf pattern) (bf sent)))
+	((equal? (first pattern) (first sent))
+	 (match? (bf pattern) (bf sent)))
+	(else #f)))
+</PRE>
+
+<P>Note that the new <CODE>cond</CODE> clause comes <EM>before</EM> the check
+to see if <CODE>sent</CODE> is empty.  That's because <CODE>sent</CODE> might be empty and
+a pattern of <CODE>(?)</CODE> would still match it.  But if the sentence is empty,
+we know that the question mark doesn't match a word, so we just have to make
+sure that the <CODE>butfirst</CODE> of the pattern contains nothing but question
+marks.  (We don't have a predicate named <CODE>all-question-marks?</CODE>; instead,
+we use <CODE>match?</CODE> recursively to make this test.)
+
+<P>In general, a question mark in the pattern has to match either one word or
+zero words in the sentence.  How do we decide?  Our rule is that each
+placeholder should match as many words as possible, so we prefer to match
+one word if we can.  But allowing the question mark to match a word might
+prevent the rest of the pattern from matching the rest of the sentence.
+
+<P>Compare these two examples:
+
+<P><PRE>&gt; (match? '(? please me) '(please please me))
+#T
+
+&gt; (match? '(? please me) '(please me))
+#T
+</PRE>
+
+<P>In the first case, the first thing in the pattern is a question mark and the
+first thing in the sentence is &quot;please,&quot; and they match.  That leaves
+&quot;please me&quot; in the pattern to match &quot;please me&quot; in the sentence.
+
+<P>In the second case, we again have a question mark as the first thing in the
+pattern and &quot;please&quot; as the first thing in the sentence.  But this time,
+we had better not use up the &quot;please&quot; in the sentence, because that will
+only leave &quot;me&quot; to match &quot;please me.&quot; In this case the question mark has
+to match no words.
+
+<P>To you, these examples probably look obvious.  That's because you're a human
+being, and you can take in the entire pattern and the entire sentence all at
+once.  Scheme isn't as smart as you are; it has to compare words one pair at
+a time.  To Scheme, the processing of both examples begins with question
+mark as the first word of the pattern and &quot;please&quot; as the first word of
+the sentence.  The pattern matcher has to consider both cases.
+
+<P>How does the procedure consider both cases?  Look at the invocation of <CODE>or</CODE> by the <CODE>match?</CODE> procedure.  There are two alternatives; if either
+turns out true, the match succeeds.  One is that we try to match the
+question mark with the first word of the sentence just as we matched <CODE>!</CODE> in our earlier example&mdash;by making a recursive call on the <CODE>butfirst</CODE>s of the pattern and sentence.  If that returns true, then the
+question mark matches the first word.
+
+<P>The second alternative that can make the match succeed is a recursive call
+to <CODE>match?</CODE> on the <CODE>butfirst</CODE> of the pattern and the <EM>entire</EM>
+sentence; this corresponds to matching the <CODE>?</CODE> against
+nothing.<A NAME="text2" HREF="match.html#ft2">[2]</A>
+
+<P>Let's trace <CODE>match?</CODE> so that you can see how these two cases are
+handled differently by the program.
+
+<P><PRE>&gt; (trace match?)
+&gt; (match? '(? please me) '(please please me))
+<SMALL><CODE>(match? (? please me) (please please me))
+| (match? (please me) (please me))                Try matching <CODE>?</CODE> with <CODE>please</CODE>.
+|  |  (match? (me) (me))
+|  |  |  (match? () ())
+|  |  |  #t                                       It works!
+|  |  #t
+|  #t
+#t
+</CODE></SMALL>#T
+</PRE>
+
+<P><PRE>&gt; (match? '(? please me) '(please me))
+<SMALL><CODE>(match? (? please me) (please me))
+| (match? (please me) (me))                        Try matching <CODE>?</CODE> with <CODE>please</CODE>.
+|  #f                                              It doesn't work.
+| (match? (please me) (please me))                 This time, match <CODE>?</CODE> with nothing.
+|  |  (match? (me) (me))
+|  |  |  (match? () ())
+|  |  |  #t
+|  |  #t
+|  #t
+#t
+</CODE></SMALL>#T
+</PRE>
+
+<P><H2>Backtracking</H2>
+
+<P>The program structure that allows for two alternative routes to success has
+more profound implications than you may think at first.
+
+<P>When <CODE>match?</CODE> sees a question mark in the pattern, it has to decide
+whether or not to &quot;use up&quot; a word of the sentence by matching it with the
+question mark.  You might wonder, &quot;How does the question mark decide
+whether to take a word?&quot; The answer is that the decision isn't made &quot;by
+the question mark&quot;; there's nothing about the particular word that the
+question mark might match that helps with the decision!  Instead, the
+decision depends on matching what comes to the right of the question mark.
+
+<P>Compare this situation with the <CODE>keep</CODE> recursive pattern.  There, too,
+the procedure makes a decision about the first word of a sentence, and each
+alternative leads to a recursive call for the <CODE>butfirst</CODE>:
+
+<P><PRE>(cond ((empty? sent) '())
+      ((some-test? (first sent))
+       (se (first sent) <U>(recursive-call (bf sent))</U>))
+      (else <U>(recursive-call (bf sent))</U>))
+</PRE>
+
+<P>The difference is that in the <CODE>keep</CODE> pattern the choice
+between alternatives <EM>can</EM> be made just by looking at the immediate
+situation&mdash;the single word that might or might not be chosen; the decision
+doesn't depend on anything in the rest of the problem.  As a result, the
+choice has already been made before any recursive call happens.  Therefore,
+only one of the recursive calls is actually made, to make choices about the
+remaining words in the sentence.
+
+<P>In <CODE>match?</CODE>, by contrast, any particular invocation can't make its
+choice until it knows the result of a recursive invocation.  The result
+from the recursive call determines the choice made by the caller.
+
+<P>Here's a model that might help you think about this kind of recursion.  <CODE>Match?</CODE> sees a question mark in the pattern.  It makes a <EM>tentative</EM>
+decision that this question mark should match the first word of the
+sentence, and it uses a recursive invocation to see whether that decision
+allows the rest of the problem to be solved.  If so, the tentative choice
+was correct.  If not, <CODE>match?</CODE> tries an alternative decision that the
+question mark doesn't match a word.  This alternative is still tentative;
+another recursive call is needed to see if the rest of the pattern can
+succeed.  If not, the overall match fails.
+
+<P>This structure is called <EM>backtracking.</EM>
+
+<P>What if there are two question marks in the pattern?  Then there are <EM>four</EM> ways to match the overall pattern.  Both question marks can match a
+word, or only the first question mark, or only the second, or neither.  A
+pattern with several placeholders leads to even more alternatives.  A
+pattern with three question marks will have eight alternatives.  (All three
+match words, the first two do but the third doesn't, and so on.)  A pattern
+with 10 question marks will have 1024 alternatives.  How can <CODE>match?</CODE>
+try all these alternatives?  The procedure seems to make only one two-way
+choice; how can it accomplish a four-way or many-way decision?
+
+<P>The secret is the same as the usual secret of recursion:  Most of the work
+is done in recursive calls.  We take a leap of faith that recursive
+invocations will take care of the decisions concerning question marks later
+in the pattern.  Think about it using the backtracking model.  Let's suppose
+there are 10 question marks in the pattern.  When <CODE>match?</CODE> encounters
+the leftmost question mark, it makes a tentative decision to match the
+question mark with a word of the sentence.  To test whether this choice can
+work, <CODE>match?</CODE> invokes itself recursively on a pattern with nine
+question marks.  By the leap of faith, the recursive invocation will examine
+512 ways to match question marks with words&mdash;half of the total number.  If
+one of these 512 works, we're finished.  If not, the original <CODE>match?</CODE>
+invocation changes its tentative choice, deciding instead <EM>not</EM> to
+match its question mark (the leftmost one) with a word of the sentence.
+Another recursive call is made based on that decision, and that recursive
+call checks out the remaining 512 possibilities.
+
+<P>By the way, the program doesn't always have to try all of the different
+combinations of question marks matching or not matching words separately.
+For example, if the problem is
+
+<P><PRE>(match? '(a b ? ? ? ?) '(x y z w p q))
+</PRE>
+
+<P>then the very first comparison discovers that <CODE>a</CODE> is different
+from <CODE>x</CODE>, so none of the 16 possible arrangements about question marks
+matching or not matching words will make a difference.
+
+<P>Here are some traced examples involving patterns with two question marks, to
+show how the result of backtracking depends on the individual problem.
+
+<P><PRE>&gt; (match? '(? ? foo) '(bar foo))
+<SMALL><CODE>(match? (? ? foo) (bar foo))
+|  (match? (? foo) (foo))
+|  |  (match? (foo) ())
+|  |  #f
+|  |  (match? (foo) (foo))
+|  |  |  (match? () ())
+|  |  |  #t
+|  |  #t
+|  #t 
+#t
+</CODE></SMALL>#T
+</PRE>
+
+<P>In this first example, the first question mark tries to match the
+word <CODE>bar</CODE>, but it can't tell whether or not that match will succeed
+until the recursive call returns.  In the recursive call, the second
+question mark tries to match the word <CODE>foo</CODE>, and fails.  Then the
+second question mark tries again, this time matching nothing, and succeeds.
+Therefore, the first question mark can report success; it never has to try a
+recursive call in which it doesn't match a word.
+
+<P>In our second example, each question mark will have to try both alternatives,
+matching and then not matching a word, before the overall match succeeds.
+
+<P><PRE>&gt; (match? '(? ? foo bar) '(foo bar))
+<SMALL><CODE>(match? (? ? foo bar) (foo bar))
+|  (match? (? foo bar) (bar))
+|  |  (match? (foo bar) ())
+|  |  #f
+|  |  (match? (foo bar) (bar))
+|  |  #f
+|  #f
+|  (match? (? foo bar) (foo bar))
+|  |  (match? (foo bar) (bar))
+|  |  #f
+|  |  (match? (foo bar) (foo bar))
+|  |  |  (match? (bar) (bar))
+|  |  |  |  (match? () ())
+|  |  |  |  #t
+|  |  |  #t
+|  |  #t
+|  #t
+#t
+</CODE></SMALL>#T
+</PRE>
+
+<P>The first question mark tries to match the word <CODE>foo</CODE> in the
+sentence, leaving the pattern <CODE>(? foo bar)</CODE> to match <CODE>(bar)</CODE>.  The
+second question mark will try both matching and not matching a word, but
+neither succeeds.  Therefore, the first question mark tries again, this time
+not matching a word.  The second question mark first tries matching <CODE>foo</CODE>, and when that fails, tries not matching anything.  This last attempt is
+successful.
+
+<P>In the previous example, every question mark's first attempt failed.  The
+following example illustrates the opposite case, in which every question
+mark's first attempt succeeds.
+
+<P><PRE>&gt; (match? '(? ? baz) '(foo bar baz))
+<SMALL><CODE>(match? (? ? baz) (foo bar baz))
+|  (match? (? baz) (bar baz))
+|  |  (match? (baz) (baz))
+|  |  |  (match? () ())
+|  |  |  #t
+|  |  #t
+|  #t
+#t
+</CODE></SMALL>#t
+</PRE>
+
+<P>The first question mark matches <CODE>foo</CODE>; the second matches <CODE>bar</CODE>.
+
+<P>If the sentence is shorter than the pattern, we may end up trying to match a
+pattern against an empty sentence.  This is much easier than the general
+problem, because there aren't two alternatives; a question mark has no word
+in the sentence to match.
+
+<P><PRE>&gt; (match? '(? ? foo) '())
+<SMALL><CODE>(match? (? ? foo) ())
+|  (match? (? foo) ())
+|  |  (match? (foo) ())
+|  |  #f
+|  #f
+#f
+</CODE></SMALL>#f
+</PRE>
+
+<P>Each question mark knows right away that it had better not try to
+match a word, so we never have to backtrack.
+
+<P><H2>Matching Several Words</H2>
+
+<P>The next placeholder we'll implement is <CODE>*</CODE>.  The order in which we're
+implementing these placeholders was chosen so that each new version
+increases the variability in the number of words a placeholder can match.
+The <CODE>!</CODE> placeholder was very easy because it always matches exactly one
+word; it's hardly different at all from a non-placeholder in the pattern.
+Implementing <CODE>?</CODE> was more complicated because there were two
+alternatives to consider.  But for <CODE>*</CODE>, we might match any number of
+words, up to the entire rest of the sentence.
+
+<P>Our strategy will be a generalization of the <CODE>?</CODE> strategy:  Start with
+a &quot;greedy&quot; match, and then, if a recursive call tells us that the
+remaining part of the sentence can't match the rest of the pattern, try a
+less greedy match.
+
+<P>The difference between <CODE>?</CODE> and <CODE>*</CODE> is that <CODE>?</CODE> allows only two
+possible match lengths, zero and one.  Therefore, these two cases can be
+checked with two explicit subexpressions of an <CODE>or</CODE> expression.  In the
+more general case of <CODE>*</CODE>, any length is possible, so we can't check every
+possibility separately.  Instead, as in any problem of unknown size, we use
+recursion.  First we try the longest possible match; if that fails because
+the rest of the pattern can't be matched, a recursive call tries the
+next-longest match.  If we get all the way down to an empty match for the
+<CODE>*</CODE> and still can't match the rest of the pattern, then we return <CODE>#f</CODE>.
+
+<P><PRE>(define (match? pattern sent)              ;; third version: !, ?, and *
+  (cond ((empty? pattern)
+	 (empty? sent))
+	((equal? (first pattern) '?)
+	 (if (empty? sent)
+	     (match? (bf pattern) '())
+	     (or (match? (bf pattern) (bf sent))
+		 (match? (bf pattern) sent))))
+	<U>((equal? (first pattern) '*)</U>
+	 <U>(*-longest-match (bf pattern) sent))</U>
+	((empty? sent) #f)
+	((equal? (first pattern) '!)
+	 (match? (bf pattern) (bf sent)))
+	((equal? (first pattern) (first sent))
+	 (match? (bf pattern) (bf sent)))
+	(else #f)))
+
+(define (*-longest-match pattern-rest sent)
+  (*-lm-helper pattern-rest sent '()))
+
+(define (*-lm-helper pattern-rest sent-matched sent-unmatched)
+  (cond ((match? pattern-rest sent-unmatched) #t)
+	((empty? sent-matched) #f)
+	(else (*-lm-helper pattern-rest
+			   (bl sent-matched)
+			   (se (last sent-matched) sent-unmatched)))))
+</PRE>
+
+<P>If an asterisk is found in the pattern, <CODE>match?</CODE> invokes <CODE>*-longest-match</CODE>, which carries out this backtracking approach.
+
+<P>The real work is done by <CODE>*-lm-helper</CODE>, which has three arguments.
+The first argument is the still-to-be-matched part of the pattern, following
+the <CODE>*</CODE> placeholder that we're trying to match now.  <CODE>Sent-matched</CODE>
+is the part of the sentence that we're considering as a candidate to match
+the <CODE>*</CODE> placeholder.  <CODE>Sent-unmatched</CODE> is the remainder of the
+sentence, following the words in <CODE>sent-matched</CODE>; it must match
+<CODE>pattern-rest</CODE>.
+
+<P>Since we're trying to find the longest possible match, <CODE>*-longest-match</CODE>
+chooses the entire sentence as the first attempt for <CODE>sent-matched</CODE>.
+Since <CODE>sent-matched</CODE> is using up the entire sentence, the initial value
+of <CODE>sent-unmatched</CODE> is empty.  The only job of <CODE>*-longest-match</CODE> is
+to invoke <CODE>*-lm-helper</CODE> with these initial arguments.  On each
+recursive invocation, <CODE>*-lm-helper</CODE> shortens <CODE>sent-matched</CODE> by one
+word and accordingly lengthens <CODE>sent-unmatched</CODE>.
+
+<P>Here's an example in which the <CODE>*</CODE> placeholder tries to match four
+words, then three words, and finally succeeds with two words:
+
+<P><PRE>&gt; (trace match? *-longest-match *-lm-helper)
+
+&gt; (match? '(* days night) '(a hard days night))
+<SMALL><CODE>(match? (* days night) (a hard days night))
+|  (*-longest-match (days night) (a hard days night))
+|  |  (*-lm-helper (days night) (a hard days night) ())
+|  |  |  (match? (days night) ())
+|  |  |  #f
+|  |  |  (*-lm-helper (days night) (a hard days) (night))
+|  |  |  |  (match? (days night) (night))
+|  |  |  |  #f
+|  |  |  |  (*-lm-helper (days night) (a hard) (days night))
+|  |  |  |  |  (match? (days night) (days night))
+|  |  |  |  |  |  (match? (night) (night))
+|  |  |  |  |  |  |  (match? () ())
+|  |  |  |  |  |  |  #t
+|  |  |  |  |  |  #t
+|  |  |  |  |  #t
+|  |  |  |  #t
+|  |  |  #t
+|  |  #t
+|  #t
+#t
+</CODE></SMALL>#t
+</PRE>
+
+<P><H2>Combining the Placeholders</H2>
+
+<P>We have one remaining placeholder, <CODE>&amp;</CODE>, which is much like <CODE>*</CODE>
+except that it fails unless it can match at least one word.  We could,
+therefore, write a <CODE>&amp;-longest-match</CODE> that would be identical to <CODE>*-longest-match</CODE> except for the base case of its helper procedure.  If <CODE>sent-matched</CODE> is empty, the result is <CODE>#f</CODE> even if it would be possible
+to match the rest of the pattern against the rest of the sentence.  (All we
+have to do is exchange the first two clauses of the <CODE>cond</CODE>.)
+
+<P><PRE>(define (&amp;-longest-match pattern-rest sent)
+  (&amp;-lm-helper pattern-rest sent '()))
+
+(define (&amp;-lm-helper pattern-rest sent-matched sent-unmatched)
+  (cond ((empty? sent-matched) #f)
+	((match? pattern-rest sent-unmatched) #t)
+	(else (&amp;-lm-helper pattern-rest
+			   (bl sent-matched)
+			   (se (last sent-matched) sent-unmatched)))))
+</PRE>
+
+<P>When two procedures are so similar, that's a clue that perhaps
+they could be combined into one.  We could look at the bodies of these two
+procedures to find a way to combine them textually.  But instead, let's step
+back and think about the meanings of the placeholders.
+
+<P>The reason that the procedures <CODE>*-longest-match</CODE> and <CODE>&amp;-longest-match</CODE> are so similar is that the two placeholders have almost
+identical meanings.  <CODE>*</CODE> means &quot;match as many words as possible&quot;;
+<CODE>&amp;</CODE> means &quot;match as many words as possible, but at least one.&quot; Once
+we're thinking in these terms, it's plausible to think of <CODE>?</CODE> as
+meaning &quot;match as many words as possible, but at most one.&quot; In fact,
+although this is a stretch, we can also describe <CODE>!</CODE> similarly: &quot;Match
+as many words as possible, but at least one, and at most one.&quot;
+
+<P><TABLE><TR><TH>Placeholder&nbsp;&nbsp;<TH>Minimum size&nbsp;&nbsp;
+<TH>Maximum size&nbsp;&nbsp;
+<TR><TD><CODE>*</CODE><TD>0<TD>no limit
+<TR><TD><CODE>&</CODE><TD>1<TD>no limit
+<TR><TD><CODE>?</CODE><TD>0<TD>1
+<TR><TD><CODE>!</CODE><TD>1<TD>1
+</TABLE>
+
+
+<P>We'll take advantage of this newly understood similarity to
+simplify the program by using a single algorithm for all placeholders.
+
+<P>How do we generalize <CODE>*-longest-match</CODE> and <CODE>&amp;-longest-match</CODE> to
+handle all four cases?  There are two kinds of generalization involved.
+We'll write a procedure <CODE>longest-match</CODE> that will have the same
+arguments as <CODE>*-longest-match</CODE>, plus two others, one for for the minimum
+size of the matched text and one for the maximum.
+
+<P>We'll specify the minimum size with a formal parameter <CODE>min</CODE>.  (The
+corresponding argument will always be 0 or 1.)  <CODE>Longest-match</CODE> will
+pass the value of <CODE>min</CODE> down to <CODE>lm-helper</CODE>, which will use it to
+reject potential matches that are too short.
+
+<P>Unfortunately, we can't use a number to specify the maximum size, because
+for <CODE>*</CODE> and <CODE>&amp;</CODE> there is no maximum.  Instead, <CODE>longest-match</CODE>
+has a formal parameter <CODE>max-one?</CODE> whose value is <CODE>#t</CODE> only for <CODE>?</CODE> and <CODE>!</CODE>.
+
+<P>Our earlier, special-case versions of <CODE>longest-match</CODE> were written for
+<CODE>*</CODE> and <CODE>&amp;</CODE>, the placeholders for which <CODE>max-one?</CODE> will be
+false.  For those placeholders, the new <CODE>longest-match</CODE> will be just
+like the earlier versions.  Our next task is to generalize <CODE>longest-match</CODE> so that it can handle the <CODE>#t</CODE> cases.
+
+<P>Think about the meaning of the <CODE>sent-matched</CODE> and <CODE>sent-unmatched</CODE>
+parameters in the <CODE>lm-helper</CODE> procedures.  <CODE>Sent-matched</CODE> means
+&quot;the longest part of the sentence that this placeholder is still allowed to
+match,&quot; while <CODE>sent-unmatched</CODE> contains whatever portion of the
+sentence has already been disqualified from being matched by the placeholder.
+
+<P>Consider the behavior of <CODE>*-longest-match</CODE> when an asterisk is at the
+beginning of a pattern that we're trying to match against a seven-word
+sentence.  Initially, <CODE>sent-matched</CODE> is the entire seven-word sentence,
+and <CODE>sent-unmatched</CODE> is empty.  Then, supposing that doesn't work, <CODE>sent-matched</CODE> is a six-word sentence, while <CODE>sent-unmatched</CODE> contains
+the remaining word.  This continues as long as no match succeeds until, near
+the end of <CODE>longest-match</CODE>'s job, <CODE>sent-matched</CODE> is a one-word
+sentence and <CODE>sent-unmatched</CODE> contains six words.  At this point, the
+longest possible match for the asterisk is a single word.
+
+<P>This situation is where we want to <EM>start</EM> in the case of the <CODE>?</CODE>
+and <CODE>!</CODE> placeholders.  So when we're trying to match one of these
+placeholders, our initialization procedure won't use the entire sentence as
+the initial value of <CODE>sent-matched</CODE>; rather, the initial value of <CODE>sent-matched</CODE> will be a one-word sentence, and <CODE>sent-unmatched</CODE> will
+contain the rest of the sentence.
+
+<P><PRE>(define (longest-match pattern-rest sent min max-one?)  ;; first version
+  (cond ((empty? sent)
+	 (and (= min 0) (match? pattern-rest sent)))
+	(max-one?
+	 (lm-helper pattern-rest (se (first sent)) (bf sent) min))
+	(else (lm-helper pattern-rest sent '() min))))
+
+(define (lm-helper pattern-rest sent-matched sent-unmatched min)
+  (cond ((&lt; (length sent-matched) min) #f)
+	((match? pattern-rest sent-unmatched) #t)
+	((empty? sent-matched) #f)
+	(else (lm-helper pattern-rest
+			 (bl sent-matched)
+			 (se (last sent-matched) sent-unmatched)
+			 min))))
+</PRE>
+
+<P>Now we can rewrite <CODE>match?</CODE> to use <CODE>longest-match</CODE>.  <CODE>Match?</CODE>
+will delegate the handling of all placeholders to a subprocedure <CODE>match-special</CODE> that will invoke <CODE>longest-match</CODE> with the correct values
+for <CODE>min</CODE> and <CODE>max-one?</CODE> according to the table.
+
+<P><PRE>(define (match? pattern sent)                          ;; fourth version
+  (cond ((empty? pattern)
+	 (empty? sent))
+	<U>((special? (first pattern))</U>
+	 <U>(match-special (first pattern) (bf pattern) sent))</U>
+	((empty? sent) #f)
+	((equal? (first pattern) (first sent))
+	 (match? (bf pattern) (bf sent)))
+	(else #f)))
+
+(define (special? wd)                                  ;; first version
+  (member? wd '(* &amp; ? !)))
+
+(define (match-special placeholder pattern-rest sent)  ;; first version
+  (cond ((equal? placeholder '?)
+	 (longest-match pattern-rest sent 0 #t))
+	((equal? placeholder '!)
+	 (longest-match pattern-rest sent 1 #t))
+	((equal? placeholder '*)
+	 (longest-match pattern-rest sent 0 #f))
+	((equal? placeholder '&amp;)
+	 (longest-match pattern-rest sent 1 #f))))
+</PRE>
+
+<P><H2>Naming the Matched Text</H2>
+
+<P>So far we've worked out how to match the four kinds of placeholders and
+return a true or false value indicating whether a match is possible.  Our
+program is almost finished; all we need to make it useful is the
+facility that will let us find out <EM>which</EM> words in the sentence
+matched each placeholder in the pattern.
+
+<P>We don't have to change the overall structure of the program in order to
+make this work.  But most of the procedures in the pattern matcher will have
+to be given an additional argument, the database of placeholder names
+and values that have been matched so far.<A NAME="text3" HREF="match.html#ft3">[3]</A> The formal parameter <CODE>known-values</CODE> will hold
+this database.  Its value will be a sentence containing placeholder names
+followed by the corresponding words and an exclamation point to separate the
+entries, as in the examples earlier in the chapter.  When we begin the
+search for a match, we use an empty sentence as the initial <CODE>known-values</CODE>:
+
+<P><PRE>(define (match pattern sent)
+  (match-using-known-values pattern sent '()))
+
+(define (match-using-known-values pattern sent known-values)
+  &hellip;)
+</PRE>
+
+<P>As <CODE>match-using-known-values</CODE> matches the beginning of a
+pattern with the beginning of a sentence, it invokes itself recursively with
+an expanded <CODE>known-values</CODE> containing each newly matched placeholder.
+For example, in evaluating
+
+<P><PRE>(match '(!twice !other !twice) '(cry baby cry))
+</PRE>
+
+<P>the program will call <CODE>match-using-known-values</CODE> four times:
+
+<P><TABLE><TR><TH><CODE>pattern</CODE><TH><CODE>sent</CODE>
+<TH><CODE>known-values</CODE>
+<TR><TD><CODE>(!twice !other !twice)&nbsp;&nbsp;</CODE>
+<TD><CODE>(cry baby cry)&nbsp;&nbsp;</CODE>
+<TD><CODE>()</CODE>
+<TR><TD><CODE>(!other !twice)</CODE>
+<TD><CODE>(baby cry)</CODE>
+<TD><CODE>(twice cry !)</CODE>
+<TR><TD><CODE>(!twice)</CODE>
+<TD><CODE>(cry)</CODE>
+<TD><CODE>(twice cry ! other baby !)</CODE>
+<TR><TD><CODE>()</CODE>
+<TD><CODE>()</CODE>
+<TD><CODE>(twice cry ! other baby !)</CODE>
+</TABLE>
+
+
+<P>In the first invocation, we try to match <CODE>!twice</CODE> against some part of
+the sentence.  Since <CODE>!</CODE> matches exactly one word, the only possibility
+is to match the word <CODE>cry</CODE>.  The recursive invocation, therefore, is made
+with the first words of the pattern and sentence removed, but with the match
+between <CODE>twice</CODE> and <CODE>cry</CODE> added to the database.
+
+<P>Similarly, the second invocation matches <CODE>!other</CODE> with <CODE>baby</CODE> and
+causes a third invocation with shortened pattern and sentence but a longer
+database.
+
+<P>The third invocation is a little different because the pattern contains the
+placeholder <CODE>!twice</CODE>, but the name <CODE>twice</CODE> is already in the
+database.  Therefore, this placeholder can't match whatever word happens to
+be available; it must match the same word that it matched before.  (Our
+program will have to check for this situation.)  Luckily, the sentence does
+indeed contain the word <CODE>cry</CODE> at this position.
+
+<P>The final invocation reaches the base case of the recursion, because the
+pattern is empty.  The value that <CODE>match-using-known-values</CODE> returns is
+the database in this invocation.
+
+<P><H2>The Final Version</H2>
+
+<P>We're now ready to show you the final version of the program.  The program
+structure is much like what you've seen before; the main difference is the
+database of placeholder names and values.  The program must add entries to
+this database and must look for database entries that were added earlier.
+Here are the three most important procedures and how they are changed
+from the earlier version to implement this capability:
+
+<P><P><TABLE><TR><TH align="right" valign="top">&bull;<TD>&nbsp;&nbsp;&nbsp;&nbsp;<TD valign="top"><CODE>match-using-known-values</CODE>, essentially the same as what was
+formerly named <CODE>match?</CODE> except for bookkeeping details.
+</TABLE><TABLE><TR><TH align="right" valign="top">&bull;<TD>&nbsp;&nbsp;&nbsp;&nbsp;<TD valign="top"><CODE>match-special</CODE>, similar to the old version, except that it must
+recognize the case of a placeholder whose name has already been seen.  In
+this case, the placeholder can match only the same words that it matched
+before.
+</TABLE><TABLE><TR><TH align="right" valign="top">&bull;<TD>&nbsp;&nbsp;&nbsp;&nbsp;<TD valign="top"><CODE>longest-match</CODE> and <CODE>lm-helper</CODE>, also similar to the old
+versions, except that they have the additional job of adding to the database the
+name and value of any placeholder that they match.
+
+</TABLE>
+
+<P>Here are the modified procedures.  Compare them to the previous
+versions.
+
+<P><PRE>(define (<A NAME="g4"></A>match pattern sent)
+  (match-using-known-values pattern sent '()))
+
+(define (<A NAME="g5"></A>match-using-known-values pattern sent known-values)
+  (cond ((empty? pattern)
+	 (if (empty? sent) known-values 'failed))
+	((special? (first pattern))
+	 (let ((placeholder (first pattern)))
+	   (match-special (first placeholder)
+			  (bf placeholder)
+			  (bf pattern)
+			  sent
+			  known-values)))
+	((empty? sent) 'failed)
+	((equal? (first pattern) (first sent))
+	 (match-using-known-values (bf pattern) (bf sent) known-values))
+	(else 'failed)))
+
+(define (<A NAME="g6"></A>match-special howmany name pattern-rest sent known-values)
+  (let ((old-value (lookup name known-values)))
+    (cond ((not (equal? old-value 'no-value))
+	   (if (length-ok? old-value howmany)
+	       (already-known-match
+		  old-value pattern-rest sent known-values)
+	       'failed))
+	  ((equal? howmany '?)
+	   (longest-match name pattern-rest sent 0 #t known-values))
+	  ((equal? howmany '!)
+	   (longest-match name pattern-rest sent 1 #t known-values))
+	  ((equal? howmany '*)
+	   (longest-match name pattern-rest sent 0 #f known-values))
+	  ((equal? howmany '&amp;)
+	   (longest-match name pattern-rest sent 1 #f known-values)))))
+
+(define (<A NAME="g7"></A>longest-match name pattern-rest sent min max-one? known-values)
+  (cond ((empty? sent)
+	 (if (= min 0)
+	     (match-using-known-values pattern-rest
+				       sent
+				       (add name '() known-values))
+	     'failed))
+	(max-one?
+	 (lm-helper name pattern-rest (se (first sent))
+		    (bf sent) min known-values))
+	(else (lm-helper name pattern-rest
+			 sent '() min known-values))))
+
+(define (<A NAME="g8"></A>lm-helper name pattern-rest
+		   sent-matched sent-unmatched min known-values)
+  (if (&lt; (length sent-matched) min)
+      'failed
+      (let ((tentative-result (match-using-known-values
+			       pattern-rest
+			       sent-unmatched
+			       (add name sent-matched known-values))))
+	(cond ((not (equal? tentative-result 'failed)) tentative-result)
+	      ((empty? sent-matched) 'failed)
+	      (else (lm-helper name
+			       pattern-rest
+			       (bl sent-matched)
+			       (se (last sent-matched) sent-unmatched)
+			       min
+			       known-values))))))
+</PRE>
+
+<P>We haven't listed all of the minor procedures that these procedures invoke.
+A complete listing is at the end of the chapter, but we hope that you have
+enough confidence about the overall program structure to be able to assume
+these small details will work.  In the next few paragraphs we discuss some
+of the ways in which the procedures shown here differ from the earlier
+versions.
+
+<P>In the invocation of <CODE>match-special</CODE> we found it convenient to split the
+placeholder into its first character, the one that tells how many words can
+be matched, and the butfirst, which is the name of the placeholder.
+
+<P>What happens if <CODE>match-special</CODE> finds that the name is already in the
+database?  In this situation, we don't have to try multiple possibilities
+for the number of words to match (the usual job of <CODE>longest-match</CODE>); the
+placeholder must match exactly the words that it matched before.  In this
+situation, three things must be true in order for the match to succeed:
+<A NAME="threematchconditions"></A>
+(1) The first words of the <CODE>sent</CODE> argument must match the old value
+stored in the database.  (2) The partial <CODE>pattern</CODE> that remains after
+this placeholder must match the rest of the <CODE>sent</CODE>.  (3) The old value
+must be consistent with the number of words permitted by the <CODE>howmany</CODE>
+part of the placeholder.  For example, if the pattern is
+
+<P><PRE>(*stuff and !stuff)
+</PRE>
+
+<P>and the database says that the placeholder <CODE>*stuff</CODE> was
+matched by three words from the sentence, then the second placeholder <CODE>!stuff</CODE> can't possibly be matched because it accepts only one word.  This
+third condition is actually checked first, by <CODE>length-ok?</CODE>, and if we
+pass that hurdle, the other two conditions are checked by <CODE>already-known-match</CODE>.
+
+<P>The only significant change to <CODE>longest-match</CODE> is that it invokes
+<CODE>add</CODE> to compute an expanded database with the newly found match added,
+and it uses the resulting database as an argument to <CODE>match-using-known-values</CODE>.
+
+<P><H2>Abstract Data Types</H2>
+
+<P>As you know, a database of known values is represented in this program as a
+sentence in which the entries are separated by exclamation points.  Where is
+this representation accomplished in the program you've seen?  There's
+nothing like
+
+<P><PRE>&hellip;(sentence old-known-values name value '!) &hellip;</PRE>
+
+<P>anywhere in the procedures we've shown.  Instead, the program makes
+reference to the database of known values through two procedure calls:
+
+<P><PRE>(lookup name known-values)                   ; in match-special
+(add name matched known-values)              ; in longest-match
+</PRE>
+
+<P>Only the procedures <CODE>lookup</CODE> and <CODE>add</CODE> manipulate the
+database of known values:
+
+<P><PRE>(define (<A NAME="g9"></A>lookup name known-values)
+  (cond ((empty? known-values) 'no-value)
+	((equal? (first known-values) name)
+	 (get-value (bf known-values)))
+	(else (lookup name (skip-value known-values)))))
+
+(define (<A NAME="g10"></A>get-value stuff)
+  (if (equal? (first stuff) '!)
+      '()
+      (se (first stuff) (get-value (bf stuff)))))
+
+(define (<A NAME="g11"></A>skip-value stuff)
+  (if (equal? (first stuff) '!)
+      (bf stuff)
+      (skip-value (bf stuff))))
+
+(define (<A NAME="g12"></A>add name value known-values)
+  (if (empty? name)
+      known-values
+      (se known-values name value '!)))
+</PRE>
+
+<P>These procedures are full of small details.  For example, it's a little
+tricky to extract the part of a sentence from a name to the next exclamation
+point.  It's convenient that we could write the more important procedures,
+such as <CODE>longest-match</CODE>, without filling them with these details.
+As far as <CODE>longest-match</CODE> knows, <CODE>lookup</CODE> and <CODE>add</CODE> could be
+Scheme primitive procedures.  In effect we've created a new data type, with
+<CODE>add</CODE> as its constructor and <CODE>lookup</CODE> as its selector.
+
+<P>Types such as these, that are invented by a programmer and aren't part of the
+Scheme language itself, are called <EM><A NAME="g13"></A><A NAME="g14"></A>abstract data types.</EM>
+<A NAME="g15"></A>
+<A NAME="g16"></A>
+Creating an abstract data type means drawing a barrier between an idea about
+some kind of information we want to model in a program and the particular
+mechanism that we use to represent the information.  In this case, the
+information is a collection of name-value associations, and the particular
+mechanism is a sentence with exclamation points and so on.  The pattern
+matcher doesn't think of the database as a sentence.  For example, it would
+be silly to translate the database into Pig Latin or find its acronym.
+
+<P>Just as we distinguish the <EM>primitive</EM> procedures that Scheme knows
+all along from the <EM>compound</EM> procedures that the Scheme programmer
+defines, we could use the names &quot;primitive data type&quot; for types such as
+numbers and Booleans that are built into Scheme and &quot;compound data type&quot;
+for ones that the programmer invents by defining selectors and constructors.
+But &quot;compound data type&quot; is a bit of a pun, because it also suggests a data
+type built out of smaller pieces, just as a compound expression is built of
+smaller expressions.  Perhaps that's why the name &quot;abstract data type&quot; has
+become generally accepted.  It's connected to the idea of abstraction
+that we introduced earlier, because in order to create an abstract data type,
+we must specify the selectors and constructors and give names to those
+patterns of computation.
+
+<P><H2>Backtracking and <CODE>Known-Values</CODE></H2>
+
+<P>What happens to the database in cases that require backtracking, where
+a particular recursive call might be &quot;on the wrong track&quot;?  Let's trace
+<CODE>match-using-known-values</CODE> and see what happens.  (We'll use the
+little-people model to discuss this example, and so we're annotating each
+invocation in the trace with the name of its little person.)
+
+<P><PRE>&gt; (trace match-using-known-values)
+&gt; (match '(*start me *end) '(love me do))<SMALL><CODE>
+(match-using-known-values (*start me *end) (love me do) ())          Martha
+|  (match-using-known-values (me *end) () (start love me do !))      Mercutio
+|  failed
+|  (match-using-known-values (me *end) (do) (start love me !))       Masayuki
+|  failed
+|  (match-using-known-values (me *end) (me do) (start love !))       Mohammad
+|  |  (match-using-known-values (*end) (do) (start love !))          Mae
+|  |  |  (match-using-known-values () () (start love ! end do !))    Merlin
+|  |  |  (start love ! end do !)
+|  |  (start love ! end do !)
+|  (start love ! end do !)
+(start love ! end do !)
+</CODE></SMALL>(START LOVE ! END DO !)
+</PRE>
+
+<P>Martha, the first little person shown, has an empty <CODE>known-values</CODE>.  She
+makes three attempts to match <CODE>*start</CODE> with parts of the sentence.  In
+each case, a little person is hired with the provisional match in his or her
+<CODE>known-values</CODE>.  (Actually, Martha does not directly hire Mercutio and
+the others.  Martha hires a <CODE>match-special</CODE> little person, who in turn
+hires a <CODE>longest-match</CODE> specialist, who hires an <CODE>lm-helper</CODE>
+specialist, who hires Mercutio.  But that added complexity isn't important
+for the point we're focusing on right now, namely, how backtracking can
+work.  Pretend Martha hires Mercutio.)
+
+<P>If you don't use the little-people model, but instead think about the
+program as if there were just one <CODE>known-values</CODE> variable, then the
+backtracking can indeed be very mysterious.  Once a provisional match is
+added to the database, how is it ever removed?  The answer is that it doesn't
+work that way.  There isn't a &quot;the&quot; database.  Instead, each little person
+has a separate database.  If an attempted match fails, the little person who
+reports the failure just stops working.  For example, Martha hires Mercutio
+to attempt a match in which the name <CODE>start</CODE> has the value <CODE>love me
+do</CODE>.  Mercutio is unable to complete the match, and reports failure.  It is
+Martha, not Mercutio, who then hires Masayuki to try another value for <CODE>start</CODE>.  Martha's database hasn't changed, so Martha gives Masayuki a
+database that reflects the new trial value but not the old one.
+
+<P>Not every hiring of a little person starts from an empty database.  When a
+match is partially successful, the continuation of the same attempt must
+benefit from the work that's already been done.  So, for example, when
+Mohammad hires Mae, and when Mae hires Merlin, each of them passes on an
+extended database, not an empty one.  Specifically, Mae gives Merlin the
+new match of the name <CODE>end</CODE> with the value <CODE>do</CODE>, but also the match
+of <CODE>start</CODE> with <CODE>love</CODE> that she was given by Mohammad.
+
+<P>So as you can see, we don't have to do anything special to keep track of our
+database when we backtrack; the structure of the recursion takes care of
+everything for free.
+
+<P><H2>How We Wrote It</H2>
+
+<P>For explanatory purposes we've chosen to present the pieces of this program
+in a different order from the one in which we actually wrote them.  We <EM>did</EM> implement the easy placeholders (<CODE>!</CODE> and <CODE>?</CODE>) before the
+harder ones.  But our program had provision for a database of names from the
+beginning.
+
+<P>There is no &quot;right&quot; way to approach a programming problem.  Our particular
+approach was determined partly by our past experience.  Each of us had
+written similar programs before, and we had preconceived ideas about the
+easy and hard parts.  You might well start at a different point.  For
+example, here is an elegant small program we'd both been shown by friends:
+
+<P><PRE>(define (match? pattern sent)
+  (cond ((empty? pattern) (empty? sent))
+	((empty? sent)
+	 (and (equal? (first pattern) '*) (match? (bf pattern) sent)))
+	((equal? (first pattern) '*)
+	 (or (match? pattern (bf sent))
+	     (match? (bf pattern) sent)))
+	(else (and (equal? (first pattern) (first sent))
+		   (match? (bf pattern) (bf sent))))))
+</PRE>
+
+<P>What's appealing about this is the funny symmetry of taking the
+<CODE>butfirst</CODE> of the pattern <EM>or</EM> of the sentence.  That's not
+something you'd naturally think of, probably, but once you've worked out how
+it can work, it affects your preconceptions when you set out to write a
+pattern matcher yourself.
+
+<P>Based on that inspiration, we might well have started with the hard cases
+(such as <CODE>*</CODE>), with the idea that once they're in place, the easy cases
+won't change the program structure much.
+
+<P><H2>Complete Program Listing</H2>
+
+<P><P><P><PRE>
+(define (match pattern sent)
+  (match-using-known-values pattern sent '()))
+
+(define (match-using-known-values pattern sent known-values)
+  (cond ((empty? pattern)
+	 (if (empty? sent) known-values 'failed))
+	((special? (first pattern))
+	 (let ((placeholder (first pattern)))
+	   (match-special (first placeholder)
+			  (bf placeholder)
+			  (bf pattern)
+			  sent
+			  known-values)))
+	((empty? sent) 'failed)
+	((equal? (first pattern) (first sent))
+	 (match-using-known-values (bf pattern) (bf sent) known-values))
+	(else 'failed)))
+
+(define (special? wd)
+  (member? (first wd) '(* & ? !)))
+
+(define (match-special howmany name pattern-rest sent known-values)
+  (let ((old-value (lookup name known-values)))
+    (cond ((not (equal? old-value 'no-value))
+	   (if (length-ok? old-value howmany)
+	       (already-known-match
+		  old-value pattern-rest sent known-values)
+	       'failed))
+	  ((equal? howmany '?)
+	   (longest-match name pattern-rest sent 0 #t known-values))
+	  ((equal? howmany '!)
+	   (longest-match name pattern-rest sent 1 #t known-values))
+	  ((equal? howmany '*)
+	   (longest-match name pattern-rest sent 0 #f known-values))
+	  ((equal? howmany '&)
+	   (longest-match name pattern-rest sent 1 #f known-values)))))
+
+(define (length-ok? value howmany)
+  (cond ((empty? value) (member? howmany '(? *)))
+	((not (empty? (bf value))) (member? howmany '(* &)))
+	(else #t)))
+
+(define (already-known-match value pattern-rest sent known-values)
+  (let ((unmatched (chop-leading-substring value sent)))
+    (if (not (equal? unmatched 'failed))
+	(match-using-known-values pattern-rest unmatched known-values)
+	'failed)))
+
+(define (chop-leading-substring value sent)
+  (cond ((empty? value) sent)
+	((empty? sent) 'failed)
+	((equal? (first value) (first sent))
+	 (chop-leading-substring (bf value) (bf sent)))
+	(else 'failed)))
+
+(define (longest-match name pattern-rest sent min max-one? known-values)
+  (cond ((empty? sent)
+	 (if (= min 0)
+	     (match-using-known-values pattern-rest
+				       sent
+				       (add name '() known-values))
+	     'failed))
+	(max-one?
+	 (lm-helper name pattern-rest (se (first sent))
+		    (bf sent) min known-values))
+	(else (lm-helper name pattern-rest
+			 sent '() min known-values))))
+
+(define (lm-helper name pattern-rest
+		   sent-matched sent-unmatched min known-values)
+  (if (< (length sent-matched) min)
+      'failed
+      (let ((tentative-result (match-using-known-values
+			       pattern-rest
+			       sent-unmatched
+			       (add name sent-matched known-values))))
+	(cond ((not (equal? tentative-result 'failed)) tentative-result)
+	      ((empty? sent-matched) 'failed)
+	      (else (lm-helper name
+			       pattern-rest
+			       (bl sent-matched)
+			       (se (last sent-matched) sent-unmatched)
+			       min
+			       known-values))))))
+
+;;; Known values database abstract data type
+
+(define (lookup name known-values)
+  (cond ((empty? known-values) 'no-value)
+	((equal? (first known-values) name)
+	 (get-value (bf known-values)))
+	(else (lookup name (skip-value known-values)))))
+
+(define (get-value stuff)
+  (if (equal? (first stuff) '!)
+      '()
+      (se (first stuff) (get-value (bf stuff)))))
+
+(define (skip-value stuff)
+  (if (equal? (first stuff) '!)
+      (bf stuff)
+      (skip-value (bf stuff))))
+
+(define (add name value known-values)
+  (if (empty? name)
+      known-values
+      (se known-values name value '!)))
+</PRE><P>
+
+
+<P><H2>Exercises about Using the Pattern Matcher</H2>
+
+<P><B>16.1</B>&nbsp;&nbsp;Design and test a pattern that matches any sentence containing the word <CODE>C</CODE> three times (not necessarily next to each other).
+
+
+<P>
+<B>16.2</B>&nbsp;&nbsp;Design and test a pattern that matches a sentence consisting of two copies
+of a smaller sentence, such as <CODE>(a b a b)</CODE>.
+
+
+<P>
+<B>16.3</B>&nbsp;&nbsp;Design and test a pattern that matches any sentence of no more than three
+words.
+
+
+<P>
+<B>16.4</B>&nbsp;&nbsp;Design and test a pattern that matches any sentence of at least three words.
+
+
+<P>
+<B>16.5</B>&nbsp;&nbsp;Show sentences of length 2, 3, and 4 that match the pattern
+
+<P><PRE>(*x *y *y *x)
+</PRE>
+
+<P>For each length, if no sentence can match the pattern, explain why
+not.
+
+
+<P>
+<B>16.6</B>&nbsp;&nbsp;Show sentences of length 2, 3, and 4 that match the pattern
+
+<P><PRE>(*x *y &amp;y &amp;x)
+</PRE>
+
+<P>For each length, if no sentence can match the pattern, explain why
+not.
+
+
+<P>
+<B>16.7</B>&nbsp;&nbsp;List <EM>all</EM> the sentences of length 6 or less, starting with <CODE>a b a</CODE>, that match the pattern
+
+<P><PRE>(*x *y *y *x)
+</PRE>
+
+
+<P>
+<H2>Exercises about Implementation</H2>
+
+<P><B>16.8</B>&nbsp;&nbsp;Explain how <CODE>longest-match</CODE> handles an empty sentence.
+
+
+<P>
+<B>16.9</B>&nbsp;&nbsp;Suppose the first <CODE>cond</CODE> clause in <CODE>match-using-known-values</CODE> were
+
+<P><PRE>((empty? pattern) known-values)
+</PRE>
+
+<P>Give an example of a pattern and sentence for which the modified
+program would give a different result from the original.
+
+
+<P>
+<B>16.10</B>&nbsp;&nbsp;What happens if the sentence argument&mdash;not the pattern&mdash;contains
+the word <CODE>*</CODE> somewhere?
+
+
+<P>
+<B>16.11</B>&nbsp;&nbsp;For each of the following examples, how many <CODE>match-using-known-values</CODE>
+little people are required?
+
+<P><PRE>(match '(from me to you) '(from me to you))
+(match '(*x *y *x) '(a b c a b))
+(match '(*x *y *z) '(a b c a b))
+(match '(*x hey *y bulldog *z) '(a hey b bulldog c))
+(match '(*x a b c d e f) '(a b c d e f))
+(match '(a b c d e f *x) '(a b c d e f))
+</PRE>
+
+<P>In general, what can you say about the characteristics that make a
+pattern easy or hard to match?
+
+
+<P>
+<B>16.12</B>&nbsp;&nbsp;Show a pattern with the following two properties:  (1) It has at least two
+placeholders.  (2) When you match it against any sentence, every invocation
+of <CODE>lookup</CODE> returns <CODE>no-value</CODE>.
+
+
+<P>
+<B>16.13</B>&nbsp;&nbsp;Show a pattern and a sentence that can be used as arguments to <CODE>match</CODE>
+so that <CODE>lookup</CODE> returns <CODE>(the beatles)</CODE> at some point during the
+match.
+
+
+<P>
+<B>16.14</B>&nbsp;&nbsp;Our program can still match patterns with unnamed placeholders.
+How would it affect the operation of the program if these unnamed
+placeholders were added to the database?  What part of the program keeps
+them from being added?
+
+
+<P>
+<B>16.15</B>&nbsp;&nbsp;Why don't <CODE>get-value</CODE> and <CODE>skip-value</CODE> check for an empty argument
+as the base case?
+
+
+<P>
+<B>16.16</B>&nbsp;&nbsp;Why didn't we write the first <CODE>cond</CODE> clause in <CODE>length-ok?</CODE> as
+the following?
+
+<P><PRE>((and (empty? value) (member? howmany '(? *))) #t)
+</PRE>
+
+<P>
+<B>16.17</B>&nbsp;&nbsp;Where in the program is the initial empty database of known values
+established?
+
+
+<P>
+<B>16.18</B>&nbsp;&nbsp;For the case of matching a placeholder name that's already been matched in
+this pattern, we said on page <A HREF="match.html#threematchconditions">there</A> that three conditions
+must be checked.  For each of the three, give a pattern and sentence that
+the program would incorrectly match if the condition were not checked.
+
+
+<P>
+<B>16.19</B>&nbsp;&nbsp;What will the following example do?
+
+<P><PRE>(match '(?x is *y !x) '(! is an exclamation point !))
+</PRE>
+
+<P>Can you suggest a way to fix this problem?
+
+
+<P>
+<B>16.20</B>&nbsp;&nbsp;Modify the pattern matcher so that a placeholder of the form <CODE>*15x</CODE> is like <CODE>*x</CODE> except that it can be matched only by exactly 15
+words.
+
+<P><PRE>&gt; (match '(*3front *back) '(your mother should know))
+(FRONT YOUR MOTHER SHOULD ! BACK KNOW !)
+</PRE>
+
+<P>
+<B>16.21</B>&nbsp;&nbsp;Modify the pattern matcher so that a <CODE>+</CODE> placeholder (with or without a
+name attached) matches only a number:
+
+<P><PRE>&gt; (match '(*front +middle *back) '(four score and 7 years ago))
+(FRONT FOUR SCORE AND ! MIDDLE 7 ! BACK YEARS AGO !)
+</PRE>
+
+<P>The <CODE>+</CODE> placeholder is otherwise like <CODE>!</CODE>&mdash;it must match
+exactly one word.
+
+
+<P>
+<B>16.22</B>&nbsp;&nbsp;Does your favorite text editor or word processor have a search command that
+allows you to search for patterns rather than only specific strings of
+characters?  Look into this and compare your editor's capabilities with
+that of our pattern matcher.
+
+
+<P>
+
+<P>
+<HR>
+<A NAME="ft1" HREF="match.html#text1">[1]</A> Why not return the
+sentence if successful or <CODE>#f</CODE> otherwise?  That would be fine in most
+versions of Scheme, but as we mentioned earlier, the empty sentence <CODE>()</CODE>
+is the same as the false value <CODE>#f</CODE> in some dialects.  In those
+Schemes, a successfully matched pattern with no named placeholders, for
+which the program should return an empty sentence, would be
+indistinguishable from an unmatched pattern.<P>
+<A NAME="ft2" HREF="match.html#text2">[2]</A> Actually, since <CODE>or</CODE> is a special form, Scheme avoids
+the need to try the second alternative if the first one succeeds.<P>
+<A NAME="ft3" HREF="match.html#text3">[3]</A> The word <EM>database</EM>
+has two possible meanings in computer science, a broad meaning and a narrow
+one.  The broad meaning, which we're using here, is a repository of
+information to which the program periodically adds new items for later
+retrieval.  The narrow meaning is a collection of information that's
+manipulated by a <EM>database program,</EM> which provides facilities for
+adding new information, modifying existing entries, selecting entries that
+match some specified criterion, and so on.  We'll see a database program
+near the end of the book.<P>
+<P><A HREF="../ss-toc2.html">(back to Table of Contents)</A><P>
+<A HREF="../ssch15/poker.html"><STRONG>BACK</STRONG></A>
+chapter thread <A HREF="../ssch17/part5.html"><STRONG>NEXT</STRONG></A>
+
+<P>
+<ADDRESS>
+<A HREF="../index.html">Brian Harvey</A>, 
+<CODE>bh@cs.berkeley.edu</CODE>
+</ADDRESS>
+</BODY>
+</HTML>
author	elioat <elioat@tilde.institute>	2023-08-23 07:52:19 -0400
committer	elioat <elioat@tilde.institute>	2023-08-23 07:52:19 -0400
commit	562a9a52d599d9a05f871404050968a5fd282640 (patch)
tree	7d3305c1252c043bfe246ccc7deff0056aa6b5ab /js/games/nluqo.github.io/~bh/ssch16/match.html
parent	5d012c6c011a9dedf7d0a098e456206244eb5a0f (diff)
download	tour-562a9a52d599d9a05f871404050968a5fd282640.tar.gz