diff options
author | elioat <elioat@tilde.institute> | 2023-08-23 07:52:19 -0400 |
---|---|---|
committer | elioat <elioat@tilde.institute> | 2023-08-23 07:52:19 -0400 |
commit | 562a9a52d599d9a05f871404050968a5fd282640 (patch) | |
tree | 7d3305c1252c043bfe246ccc7deff0056aa6b5ab /js/games/nluqo.github.io/~bh/ssch16/match | |
parent | 5d012c6c011a9dedf7d0a098e456206244eb5a0f (diff) | |
download | tour-562a9a52d599d9a05f871404050968a5fd282640.tar.gz |
*
Diffstat (limited to 'js/games/nluqo.github.io/~bh/ssch16/match')
-rw-r--r-- | js/games/nluqo.github.io/~bh/ssch16/match | 1399 |
1 files changed, 1399 insertions, 0 deletions
diff --git a/js/games/nluqo.github.io/~bh/ssch16/match b/js/games/nluqo.github.io/~bh/ssch16/match new file mode 100644 index 0000000..5ae4f6d --- /dev/null +++ b/js/games/nluqo.github.io/~bh/ssch16/match @@ -0,0 +1,1399 @@ +<P> + +<P><A NAME="bongard"></A><CENTER><IMG SRC="../ss-pics/patterns.jpg" ALT="figure: patterns"></CENTER><P><CENTER>In each set, how do the ones on the left differ from the ones on the +right? +</CENTER><P> + + +<P><HTML> +<HEAD> +<TITLE>Simply Scheme: Introducing Computer Science ch 16: Example: Pattern Matcher</TITLE> +</HEAD> +<BODY> +<HR> +<CITE>Simply Scheme:</CITE> +<CITE>Introducing Computer Science</CITE> 2/e Copyright (C) 1999 MIT +<H2>Chapter 16</H2> +<H1>Example: Pattern Matcher</H1> + +<TABLE width="100%"><TR><TD> +<IMG SRC="../simply.jpg" ALT="cover photo"> +<TD><TABLE> +<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian +Harvey</A><BR>University of California, Berkeley</CITE> +<TR><TD align="right"><CITE><A HREF="http://ccrma.stanford.edu/~matt">Matthew +Wright</A><BR>University of California, Santa Barbara</CITE> +<TR><TD align="right"><BR> +<TR><TD align="right"><A HREF="../pdf/ssch16.pdf">Download PDF version</A> +<TR><TD align="right"><A HREF="../ss-toc2.html">Back to Table of Contents</A> +<TR><TD align="right"><A HREF="../ssch15/poker.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../ssch17/part5.html"><STRONG>NEXT</STRONG></A> +<TR><TD align="right"><A HREF="http://mitpress.mit.edu/0262082810">MIT +Press web page for <CITE>Simply Scheme</CITE></A> +</TABLE></TABLE> + +<HR> + + +<P>It's time for another extended example in which we use the Scheme tools +we've been learning to accomplish something practical. We'll start by +describing how the program will work before we talk about how to implement +it. + +<P>You can load our program into Scheme by typing + +<P><PRE>(load "match.scm") +</PRE> + +<P><H2>Problem Description</H2> + +<P>A <EM><A NAME="g1"></A><A NAME="g2"></A>pattern matcher</EM> is a commonly used procedure whose job is +to compare a sentence to a range of possibilities. An example may make this +clear: + +<P><PRE>> (match '(* me *) '(love me do)) +#T + +> (match '(* me *) '(please please me)) +#T + +> (match '(* me *) '(in my life)) +#F +</PRE> + +<P>The first argument, <CODE>(* me *)</CODE>, is a <EM>pattern.</EM> In the +pattern, each asterisk (<CODE>*</CODE>) means "any number of words, including no +words at all." So the entire pattern matches any sentence that contains the +word "me" anywhere within it. You can think of <CODE>match</CODE> as a more +general form of <CODE>equal?</CODE> in the sense that it compares two sentences +and tells us whether they're the same, but with a broader meaning of "the +same." + +<P>Our pattern matcher will accept patterns more complicated than this first +example. There are four <EM>special characters</EM> that indicate +unspecified parts of a pattern, depending on the number of words that should +be allowed: + +<P><P><TABLE><TR><TH align="right" valign="top"><CODE>?</CODE><TD> <TD valign="top">At most one word. +<TR><TH align="right" valign="top"><CODE>!</CODE><TD> <TD valign="top">Exactly one word. +<TR><TH align="right" valign="top"><CODE>&</CODE><TD> <TD valign="top">At least one word. +<TR><TH align="right" valign="top"><CODE>*</CODE><TD> <TD valign="top">Any number of words. +</TABLE> + +<P><P>These characters are meant to be somewhat mnemonic. The question +mark means "maybe there's a word." The exclamation point means "precisely +one word!" (And it's vertical, just like the digit 1, sort of.) The +ampersand, which ordinarily means "and," indicates that we're matching a +word and maybe more. The asterisk doesn't have any mnemonic value, but it's +what everyone uses for a general matching indicator anyway. + +<P>We can give a <EM>name</EM> to the collection of words that match an +unspecified part of a pattern by including in the pattern a word that starts +with one of the four special characters and continues with the name. If the +match succeeds, <CODE>match</CODE> will return a sentence containing these names +and the corresponding values from the sentence: + +<P><PRE>> (match '(*start me *end) '(love me do)) +(START LOVE ! END DO !) + +> (match '(*start me *end) '(please please me)) +(START PLEASE PLEASE ! END !) + +> (match '(mean mr mustard) '(mean mr mustard)) +() + +> (match '(*start me *end) '(in my life)) +FAILED +</PRE> + +<P>In these examples, you see that <CODE>match</CODE> doesn't really return +<CODE>#t</CODE> or <CODE>#f</CODE>; the earlier set of examples showed a simplified +picture. In the first of the new examples, the special pattern word +<CODE>*start</CODE> is allowed to match any number of words, as indicated by the +asterisk. In this case it turned out to match the single word "love." +<CODE>Match</CODE> returns a result that tells us which words of the sentence match +the named special words in the pattern. (We'll call one of these special +pattern words a <EM>placeholder.</EM>) The exclamation points in the +returned value are needed to separate one match from another. (In the second +example, the name <CODE>end</CODE> was matched by an empty set of words.) In the +third example, the match was successful, but since there were no +placeholders the returned sentence was empty. If the match is unsuccessful, +<CODE>match</CODE> returns the word <CODE>failed</CODE>.<A NAME="text1" HREF="match#ft1">[1]</A> + +<P>If the same placeholder name appears more than once in the pattern, then it +must be matched by the same word(s) in the sentence each time: + +<P><PRE>> (match '(!twice !other !twice) '(cry baby cry)) +(TWICE CRY ! OTHER BABY !) + +> (match '(!twice !other !twice) '(please please me)) +FAILED +</PRE> + +<P>Some patterns might be matchable in more than one way. For example, the +invocation + +<P><PRE>> (match '(*front *back) '(your mother should know)) +</PRE> + +<P>might return any of five different correct answers: + +<P><PRE>(FRONT YOUR MOTHER SHOULD KNOW ! BACK !) +(FRONT YOUR MOTHER SHOULD ! BACK KNOW !) +(FRONT YOUR MOTHER ! BACK SHOULD KNOW !) +(FRONT YOUR ! BACK MOTHER SHOULD KNOW !) +(FRONT ! BACK YOUR MOTHER SHOULD KNOW !) +</PRE> + +<P>We arbitrarily decide that in such cases the first placeholder +should match as many words as possible, so in this case <CODE>match</CODE> will +actually return the first of these answers. + +<P>Before continuing, you might want to look at the first batch of exercises at +the end of this chapter, which are about using the pattern matcher. (The +rest of the exercises are about the implementation, which we'll discuss +next.) + +<P><H2>Implementation: When Are Two Sentences Equal?</H2> + +<P>Our approach to implementation will be to start with something we already +know how to write: a predicate that tests whether two sentences are exactly +equal. We will add capabilities one at a time until we reach our goal. + +<P>Suppose that Scheme's primitive <CODE>equal?</CODE> function worked only for words +and not for sentences. We could write an equality tester for sentences, +like this: + +<P><PRE>(define (<A NAME="g3"></A>sent-equal? sent1 sent2) + (cond ((empty? sent1) + (empty? sent2)) + ((empty? sent2) #f) + ((equal? (first sent1) (first sent2)) + (sent-equal? (bf sent1) (bf sent2))) + (else #f))) +</PRE> + +<P>Two sentences are equal if each word in the first sentence is +equal to the corresponding word in the second. They're unequal if one +sentence runs out of words before the other. + +<P>Why are we choosing to accept Scheme's primitive word comparison but rewrite +the sentence comparison? In our pattern matcher, a placeholder in the +pattern corresponds to a group of words in the sentence. There is no kind +of placeholder that matches only part of a word. (It would be possible to +implement such placeholders, but we've chosen not to.) Therefore, we will +never need to ask whether a word is "almost equal" to another word. + +<P><H2>When Are Two Sentences Nearly Equal?</H2> + +<P>Pattern matching is just a more general form of this <CODE>sent-equal?</CODE> +procedure. Let's write a very simple pattern matcher that knows only about +the "!" special character and doesn't let us name the words +that match the exclamation points in the pattern. We'll call this one <CODE>match?</CODE> with a question mark because it returns just true or false. + +<P><PRE>(define (match? pattern sent) ;; first version: ! only + (cond ((empty? pattern) + (empty? sent)) + ((empty? sent) #f) + <U>((equal? (first pattern) '!)</U> + <U>(match? (bf pattern) (bf sent)))</U> + ((equal? (first pattern) (first sent)) + (match? (bf pattern) (bf sent))) + (else #f))) +</PRE> + +<P>This program is exactly the same as <CODE>sent-equal?</CODE>, except for +the highlighted <CODE>cond</CODE> clause. We are still comparing each word of the +pattern with the corresponding word of the sentence, but now an exclamation +mark in the pattern matches <EM>any</EM> word in the sentence. (If <CODE>first</CODE> of <CODE>pattern</CODE> is an exclamation mark, we don't even look at <CODE>first</CODE> of <CODE>sent</CODE>.) + +<P>Our strategy in the next several sections will be to expand the pattern +matcher by implementing the remaining special characters, then finally +adding the ability to name the placeholders. For now, when we say something +like "the <CODE>*</CODE> placeholder," we mean the placeholder consisting of the +asterisk alone. Later, after we add named placeholders, the same procedures +will implement any placeholder that begins with an asterisk. + +<P><H2>Matching with Alternatives</H2> + +<P>The <CODE>!</CODE> matching is not much harder than <CODE>sent-equal?</CODE>, because +it's still the case that one word of the pattern must match one word of the +sentence. When we introduce the <CODE>?</CODE> option, the structure of the +program must be more complicated, because a question mark in the pattern +might or might not be paired up with a word in the sentence. In other +words, the pattern and the sentence might match without being the same +length. + +<P><PRE>(define (match? pattern sent) ;; second version: ! and ? + (cond ((empty? pattern) + (empty? sent)) + <U>((equal? (first pattern) '?)</U> + <U>(if (empty? sent)</U> + <U>(match? (bf pattern) '())</U> + <U>(or (match? (bf pattern) (bf sent))</U> + <U>(match? (bf pattern) sent))))</U> + ((empty? sent) #f) + ((equal? (first pattern) '!) + (match? (bf pattern) (bf sent))) + ((equal? (first pattern) (first sent)) + (match? (bf pattern) (bf sent))) + (else #f))) +</PRE> + +<P>Note that the new <CODE>cond</CODE> clause comes <EM>before</EM> the check +to see if <CODE>sent</CODE> is empty. That's because <CODE>sent</CODE> might be empty and +a pattern of <CODE>(?)</CODE> would still match it. But if the sentence is empty, +we know that the question mark doesn't match a word, so we just have to make +sure that the <CODE>butfirst</CODE> of the pattern contains nothing but question +marks. (We don't have a predicate named <CODE>all-question-marks?</CODE>; instead, +we use <CODE>match?</CODE> recursively to make this test.) + +<P>In general, a question mark in the pattern has to match either one word or +zero words in the sentence. How do we decide? Our rule is that each +placeholder should match as many words as possible, so we prefer to match +one word if we can. But allowing the question mark to match a word might +prevent the rest of the pattern from matching the rest of the sentence. + +<P>Compare these two examples: + +<P><PRE>> (match? '(? please me) '(please please me)) +#T + +> (match? '(? please me) '(please me)) +#T +</PRE> + +<P>In the first case, the first thing in the pattern is a question mark and the +first thing in the sentence is "please," and they match. That leaves +"please me" in the pattern to match "please me" in the sentence. + +<P>In the second case, we again have a question mark as the first thing in the +pattern and "please" as the first thing in the sentence. But this time, +we had better not use up the "please" in the sentence, because that will +only leave "me" to match "please me." In this case the question mark has +to match no words. + +<P>To you, these examples probably look obvious. That's because you're a human +being, and you can take in the entire pattern and the entire sentence all at +once. Scheme isn't as smart as you are; it has to compare words one pair at +a time. To Scheme, the processing of both examples begins with question +mark as the first word of the pattern and "please" as the first word of +the sentence. The pattern matcher has to consider both cases. + +<P>How does the procedure consider both cases? Look at the invocation of <CODE>or</CODE> by the <CODE>match?</CODE> procedure. There are two alternatives; if either +turns out true, the match succeeds. One is that we try to match the +question mark with the first word of the sentence just as we matched <CODE>!</CODE> in our earlier example—by making a recursive call on the <CODE>butfirst</CODE>s of the pattern and sentence. If that returns true, then the +question mark matches the first word. + +<P>The second alternative that can make the match succeed is a recursive call +to <CODE>match?</CODE> on the <CODE>butfirst</CODE> of the pattern and the <EM>entire</EM> +sentence; this corresponds to matching the <CODE>?</CODE> against +nothing.<A NAME="text2" HREF="match#ft2">[2]</A> + +<P>Let's trace <CODE>match?</CODE> so that you can see how these two cases are +handled differently by the program. + +<P><PRE>> (trace match?) +> (match? '(? please me) '(please please me)) +<SMALL><CODE>(match? (? please me) (please please me)) +| (match? (please me) (please me)) Try matching <CODE>?</CODE> with <CODE>please</CODE>. +| | (match? (me) (me)) +| | | (match? () ()) +| | | #t It works! +| | #t +| #t +#t +</CODE></SMALL>#T +</PRE> + +<P><PRE>> (match? '(? please me) '(please me)) +<SMALL><CODE>(match? (? please me) (please me)) +| (match? (please me) (me)) Try matching <CODE>?</CODE> with <CODE>please</CODE>. +| #f It doesn't work. +| (match? (please me) (please me)) This time, match <CODE>?</CODE> with nothing. +| | (match? (me) (me)) +| | | (match? () ()) +| | | #t +| | #t +| #t +#t +</CODE></SMALL>#T +</PRE> + +<P><H2>Backtracking</H2> + +<P>The program structure that allows for two alternative routes to success has +more profound implications than you may think at first. + +<P>When <CODE>match?</CODE> sees a question mark in the pattern, it has to decide +whether or not to "use up" a word of the sentence by matching it with the +question mark. You might wonder, "How does the question mark decide +whether to take a word?" The answer is that the decision isn't made "by +the question mark"; there's nothing about the particular word that the +question mark might match that helps with the decision! Instead, the +decision depends on matching what comes to the right of the question mark. + +<P>Compare this situation with the <CODE>keep</CODE> recursive pattern. There, too, +the procedure makes a decision about the first word of a sentence, and each +alternative leads to a recursive call for the <CODE>butfirst</CODE>: + +<P><PRE>(cond ((empty? sent) '()) + ((some-test? (first sent)) + (se (first sent) <U>(recursive-call (bf sent))</U>)) + (else <U>(recursive-call (bf sent))</U>)) +</PRE> + +<P>The difference is that in the <CODE>keep</CODE> pattern the choice +between alternatives <EM>can</EM> be made just by looking at the immediate +situation—the single word that might or might not be chosen; the decision +doesn't depend on anything in the rest of the problem. As a result, the +choice has already been made before any recursive call happens. Therefore, +only one of the recursive calls is actually made, to make choices about the +remaining words in the sentence. + +<P>In <CODE>match?</CODE>, by contrast, any particular invocation can't make its +choice until it knows the result of a recursive invocation. The result +from the recursive call determines the choice made by the caller. + +<P>Here's a model that might help you think about this kind of recursion. <CODE>Match?</CODE> sees a question mark in the pattern. It makes a <EM>tentative</EM> +decision that this question mark should match the first word of the +sentence, and it uses a recursive invocation to see whether that decision +allows the rest of the problem to be solved. If so, the tentative choice +was correct. If not, <CODE>match?</CODE> tries an alternative decision that the +question mark doesn't match a word. This alternative is still tentative; +another recursive call is needed to see if the rest of the pattern can +succeed. If not, the overall match fails. + +<P>This structure is called <EM>backtracking.</EM> + +<P>What if there are two question marks in the pattern? Then there are <EM>four</EM> ways to match the overall pattern. Both question marks can match a +word, or only the first question mark, or only the second, or neither. A +pattern with several placeholders leads to even more alternatives. A +pattern with three question marks will have eight alternatives. (All three +match words, the first two do but the third doesn't, and so on.) A pattern +with 10 question marks will have 1024 alternatives. How can <CODE>match?</CODE> +try all these alternatives? The procedure seems to make only one two-way +choice; how can it accomplish a four-way or many-way decision? + +<P>The secret is the same as the usual secret of recursion: Most of the work +is done in recursive calls. We take a leap of faith that recursive +invocations will take care of the decisions concerning question marks later +in the pattern. Think about it using the backtracking model. Let's suppose +there are 10 question marks in the pattern. When <CODE>match?</CODE> encounters +the leftmost question mark, it makes a tentative decision to match the +question mark with a word of the sentence. To test whether this choice can +work, <CODE>match?</CODE> invokes itself recursively on a pattern with nine +question marks. By the leap of faith, the recursive invocation will examine +512 ways to match question marks with words—half of the total number. If +one of these 512 works, we're finished. If not, the original <CODE>match?</CODE> +invocation changes its tentative choice, deciding instead <EM>not</EM> to +match its question mark (the leftmost one) with a word of the sentence. +Another recursive call is made based on that decision, and that recursive +call checks out the remaining 512 possibilities. + +<P>By the way, the program doesn't always have to try all of the different +combinations of question marks matching or not matching words separately. +For example, if the problem is + +<P><PRE>(match? '(a b ? ? ? ?) '(x y z w p q)) +</PRE> + +<P>then the very first comparison discovers that <CODE>a</CODE> is different +from <CODE>x</CODE>, so none of the 16 possible arrangements about question marks +matching or not matching words will make a difference. + +<P>Here are some traced examples involving patterns with two question marks, to +show how the result of backtracking depends on the individual problem. + +<P><PRE>> (match? '(? ? foo) '(bar foo)) +<SMALL><CODE>(match? (? ? foo) (bar foo)) +| (match? (? foo) (foo)) +| | (match? (foo) ()) +| | #f +| | (match? (foo) (foo)) +| | | (match? () ()) +| | | #t +| | #t +| #t +#t +</CODE></SMALL>#T +</PRE> + +<P>In this first example, the first question mark tries to match the +word <CODE>bar</CODE>, but it can't tell whether or not that match will succeed +until the recursive call returns. In the recursive call, the second +question mark tries to match the word <CODE>foo</CODE>, and fails. Then the +second question mark tries again, this time matching nothing, and succeeds. +Therefore, the first question mark can report success; it never has to try a +recursive call in which it doesn't match a word. + +<P>In our second example, each question mark will have to try both alternatives, +matching and then not matching a word, before the overall match succeeds. + +<P><PRE>> (match? '(? ? foo bar) '(foo bar)) +<SMALL><CODE>(match? (? ? foo bar) (foo bar)) +| (match? (? foo bar) (bar)) +| | (match? (foo bar) ()) +| | #f +| | (match? (foo bar) (bar)) +| | #f +| #f +| (match? (? foo bar) (foo bar)) +| | (match? (foo bar) (bar)) +| | #f +| | (match? (foo bar) (foo bar)) +| | | (match? (bar) (bar)) +| | | | (match? () ()) +| | | | #t +| | | #t +| | #t +| #t +#t +</CODE></SMALL>#T +</PRE> + +<P>The first question mark tries to match the word <CODE>foo</CODE> in the +sentence, leaving the pattern <CODE>(? foo bar)</CODE> to match <CODE>(bar)</CODE>. The +second question mark will try both matching and not matching a word, but +neither succeeds. Therefore, the first question mark tries again, this time +not matching a word. The second question mark first tries matching <CODE>foo</CODE>, and when that fails, tries not matching anything. This last attempt is +successful. + +<P>In the previous example, every question mark's first attempt failed. The +following example illustrates the opposite case, in which every question +mark's first attempt succeeds. + +<P><PRE>> (match? '(? ? baz) '(foo bar baz)) +<SMALL><CODE>(match? (? ? baz) (foo bar baz)) +| (match? (? baz) (bar baz)) +| | (match? (baz) (baz)) +| | | (match? () ()) +| | | #t +| | #t +| #t +#t +</CODE></SMALL>#t +</PRE> + +<P>The first question mark matches <CODE>foo</CODE>; the second matches <CODE>bar</CODE>. + +<P>If the sentence is shorter than the pattern, we may end up trying to match a +pattern against an empty sentence. This is much easier than the general +problem, because there aren't two alternatives; a question mark has no word +in the sentence to match. + +<P><PRE>> (match? '(? ? foo) '()) +<SMALL><CODE>(match? (? ? foo) ()) +| (match? (? foo) ()) +| | (match? (foo) ()) +| | #f +| #f +#f +</CODE></SMALL>#f +</PRE> + +<P>Each question mark knows right away that it had better not try to +match a word, so we never have to backtrack. + +<P><H2>Matching Several Words</H2> + +<P>The next placeholder we'll implement is <CODE>*</CODE>. The order in which we're +implementing these placeholders was chosen so that each new version +increases the variability in the number of words a placeholder can match. +The <CODE>!</CODE> placeholder was very easy because it always matches exactly one +word; it's hardly different at all from a non-placeholder in the pattern. +Implementing <CODE>?</CODE> was more complicated because there were two +alternatives to consider. But for <CODE>*</CODE>, we might match any number of +words, up to the entire rest of the sentence. + +<P>Our strategy will be a generalization of the <CODE>?</CODE> strategy: Start with +a "greedy" match, and then, if a recursive call tells us that the +remaining part of the sentence can't match the rest of the pattern, try a +less greedy match. + +<P>The difference between <CODE>?</CODE> and <CODE>*</CODE> is that <CODE>?</CODE> allows only two +possible match lengths, zero and one. Therefore, these two cases can be +checked with two explicit subexpressions of an <CODE>or</CODE> expression. In the +more general case of <CODE>*</CODE>, any length is possible, so we can't check every +possibility separately. Instead, as in any problem of unknown size, we use +recursion. First we try the longest possible match; if that fails because +the rest of the pattern can't be matched, a recursive call tries the +next-longest match. If we get all the way down to an empty match for the +<CODE>*</CODE> and still can't match the rest of the pattern, then we return <CODE>#f</CODE>. + +<P><PRE>(define (match? pattern sent) ;; third version: !, ?, and * + (cond ((empty? pattern) + (empty? sent)) + ((equal? (first pattern) '?) + (if (empty? sent) + (match? (bf pattern) '()) + (or (match? (bf pattern) (bf sent)) + (match? (bf pattern) sent)))) + <U>((equal? (first pattern) '*)</U> + <U>(*-longest-match (bf pattern) sent))</U> + ((empty? sent) #f) + ((equal? (first pattern) '!) + (match? (bf pattern) (bf sent))) + ((equal? (first pattern) (first sent)) + (match? (bf pattern) (bf sent))) + (else #f))) + +(define (*-longest-match pattern-rest sent) + (*-lm-helper pattern-rest sent '())) + +(define (*-lm-helper pattern-rest sent-matched sent-unmatched) + (cond ((match? pattern-rest sent-unmatched) #t) + ((empty? sent-matched) #f) + (else (*-lm-helper pattern-rest + (bl sent-matched) + (se (last sent-matched) sent-unmatched))))) +</PRE> + +<P>If an asterisk is found in the pattern, <CODE>match?</CODE> invokes <CODE>*-longest-match</CODE>, which carries out this backtracking approach. + +<P>The real work is done by <CODE>*-lm-helper</CODE>, which has three arguments. +The first argument is the still-to-be-matched part of the pattern, following +the <CODE>*</CODE> placeholder that we're trying to match now. <CODE>Sent-matched</CODE> +is the part of the sentence that we're considering as a candidate to match +the <CODE>*</CODE> placeholder. <CODE>Sent-unmatched</CODE> is the remainder of the +sentence, following the words in <CODE>sent-matched</CODE>; it must match +<CODE>pattern-rest</CODE>. + +<P>Since we're trying to find the longest possible match, <CODE>*-longest-match</CODE> +chooses the entire sentence as the first attempt for <CODE>sent-matched</CODE>. +Since <CODE>sent-matched</CODE> is using up the entire sentence, the initial value +of <CODE>sent-unmatched</CODE> is empty. The only job of <CODE>*-longest-match</CODE> is +to invoke <CODE>*-lm-helper</CODE> with these initial arguments. On each +recursive invocation, <CODE>*-lm-helper</CODE> shortens <CODE>sent-matched</CODE> by one +word and accordingly lengthens <CODE>sent-unmatched</CODE>. + +<P>Here's an example in which the <CODE>*</CODE> placeholder tries to match four +words, then three words, and finally succeeds with two words: + +<P><PRE>> (trace match? *-longest-match *-lm-helper) + +> (match? '(* days night) '(a hard days night)) +<SMALL><CODE>(match? (* days night) (a hard days night)) +| (*-longest-match (days night) (a hard days night)) +| | (*-lm-helper (days night) (a hard days night) ()) +| | | (match? (days night) ()) +| | | #f +| | | (*-lm-helper (days night) (a hard days) (night)) +| | | | (match? (days night) (night)) +| | | | #f +| | | | (*-lm-helper (days night) (a hard) (days night)) +| | | | | (match? (days night) (days night)) +| | | | | | (match? (night) (night)) +| | | | | | | (match? () ()) +| | | | | | | #t +| | | | | | #t +| | | | | #t +| | | | #t +| | | #t +| | #t +| #t +#t +</CODE></SMALL>#t +</PRE> + +<P><H2>Combining the Placeholders</H2> + +<P>We have one remaining placeholder, <CODE>&</CODE>, which is much like <CODE>*</CODE> +except that it fails unless it can match at least one word. We could, +therefore, write a <CODE>&-longest-match</CODE> that would be identical to <CODE>*-longest-match</CODE> except for the base case of its helper procedure. If <CODE>sent-matched</CODE> is empty, the result is <CODE>#f</CODE> even if it would be possible +to match the rest of the pattern against the rest of the sentence. (All we +have to do is exchange the first two clauses of the <CODE>cond</CODE>.) + +<P><PRE>(define (&-longest-match pattern-rest sent) + (&-lm-helper pattern-rest sent '())) + +(define (&-lm-helper pattern-rest sent-matched sent-unmatched) + (cond ((empty? sent-matched) #f) + ((match? pattern-rest sent-unmatched) #t) + (else (&-lm-helper pattern-rest + (bl sent-matched) + (se (last sent-matched) sent-unmatched))))) +</PRE> + +<P>When two procedures are so similar, that's a clue that perhaps +they could be combined into one. We could look at the bodies of these two +procedures to find a way to combine them textually. But instead, let's step +back and think about the meanings of the placeholders. + +<P>The reason that the procedures <CODE>*-longest-match</CODE> and <CODE>&-longest-match</CODE> are so similar is that the two placeholders have almost +identical meanings. <CODE>*</CODE> means "match as many words as possible"; +<CODE>&</CODE> means "match as many words as possible, but at least one." Once +we're thinking in these terms, it's plausible to think of <CODE>?</CODE> as +meaning "match as many words as possible, but at most one." In fact, +although this is a stretch, we can also describe <CODE>!</CODE> similarly: "Match +as many words as possible, but at least one, and at most one." + +<P><TABLE><TR><TH>Placeholder <TH>Minimum size +<TH>Maximum size +<TR><TD><CODE>*</CODE><TD>0<TD>no limit +<TR><TD><CODE>&</CODE><TD>1<TD>no limit +<TR><TD><CODE>?</CODE><TD>0<TD>1 +<TR><TD><CODE>!</CODE><TD>1<TD>1 +</TABLE> + + +<P>We'll take advantage of this newly understood similarity to +simplify the program by using a single algorithm for all placeholders. + +<P>How do we generalize <CODE>*-longest-match</CODE> and <CODE>&-longest-match</CODE> to +handle all four cases? There are two kinds of generalization involved. +We'll write a procedure <CODE>longest-match</CODE> that will have the same +arguments as <CODE>*-longest-match</CODE>, plus two others, one for for the minimum +size of the matched text and one for the maximum. + +<P>We'll specify the minimum size with a formal parameter <CODE>min</CODE>. (The +corresponding argument will always be 0 or 1.) <CODE>Longest-match</CODE> will +pass the value of <CODE>min</CODE> down to <CODE>lm-helper</CODE>, which will use it to +reject potential matches that are too short. + +<P>Unfortunately, we can't use a number to specify the maximum size, because +for <CODE>*</CODE> and <CODE>&</CODE> there is no maximum. Instead, <CODE>longest-match</CODE> +has a formal parameter <CODE>max-one?</CODE> whose value is <CODE>#t</CODE> only for <CODE>?</CODE> and <CODE>!</CODE>. + +<P>Our earlier, special-case versions of <CODE>longest-match</CODE> were written for +<CODE>*</CODE> and <CODE>&</CODE>, the placeholders for which <CODE>max-one?</CODE> will be +false. For those placeholders, the new <CODE>longest-match</CODE> will be just +like the earlier versions. Our next task is to generalize <CODE>longest-match</CODE> so that it can handle the <CODE>#t</CODE> cases. + +<P>Think about the meaning of the <CODE>sent-matched</CODE> and <CODE>sent-unmatched</CODE> +parameters in the <CODE>lm-helper</CODE> procedures. <CODE>Sent-matched</CODE> means +"the longest part of the sentence that this placeholder is still allowed to +match," while <CODE>sent-unmatched</CODE> contains whatever portion of the +sentence has already been disqualified from being matched by the placeholder. + +<P>Consider the behavior of <CODE>*-longest-match</CODE> when an asterisk is at the +beginning of a pattern that we're trying to match against a seven-word +sentence. Initially, <CODE>sent-matched</CODE> is the entire seven-word sentence, +and <CODE>sent-unmatched</CODE> is empty. Then, supposing that doesn't work, <CODE>sent-matched</CODE> is a six-word sentence, while <CODE>sent-unmatched</CODE> contains +the remaining word. This continues as long as no match succeeds until, near +the end of <CODE>longest-match</CODE>'s job, <CODE>sent-matched</CODE> is a one-word +sentence and <CODE>sent-unmatched</CODE> contains six words. At this point, the +longest possible match for the asterisk is a single word. + +<P>This situation is where we want to <EM>start</EM> in the case of the <CODE>?</CODE> +and <CODE>!</CODE> placeholders. So when we're trying to match one of these +placeholders, our initialization procedure won't use the entire sentence as +the initial value of <CODE>sent-matched</CODE>; rather, the initial value of <CODE>sent-matched</CODE> will be a one-word sentence, and <CODE>sent-unmatched</CODE> will +contain the rest of the sentence. + +<P><PRE>(define (longest-match pattern-rest sent min max-one?) ;; first version + (cond ((empty? sent) + (and (= min 0) (match? pattern-rest sent))) + (max-one? + (lm-helper pattern-rest (se (first sent)) (bf sent) min)) + (else (lm-helper pattern-rest sent '() min)))) + +(define (lm-helper pattern-rest sent-matched sent-unmatched min) + (cond ((< (length sent-matched) min) #f) + ((match? pattern-rest sent-unmatched) #t) + ((empty? sent-matched) #f) + (else (lm-helper pattern-rest + (bl sent-matched) + (se (last sent-matched) sent-unmatched) + min)))) +</PRE> + +<P>Now we can rewrite <CODE>match?</CODE> to use <CODE>longest-match</CODE>. <CODE>Match?</CODE> +will delegate the handling of all placeholders to a subprocedure <CODE>match-special</CODE> that will invoke <CODE>longest-match</CODE> with the correct values +for <CODE>min</CODE> and <CODE>max-one?</CODE> according to the table. + +<P><PRE>(define (match? pattern sent) ;; fourth version + (cond ((empty? pattern) + (empty? sent)) + <U>((special? (first pattern))</U> + <U>(match-special (first pattern) (bf pattern) sent))</U> + ((empty? sent) #f) + ((equal? (first pattern) (first sent)) + (match? (bf pattern) (bf sent))) + (else #f))) + +(define (special? wd) ;; first version + (member? wd '(* & ? !))) + +(define (match-special placeholder pattern-rest sent) ;; first version + (cond ((equal? placeholder '?) + (longest-match pattern-rest sent 0 #t)) + ((equal? placeholder '!) + (longest-match pattern-rest sent 1 #t)) + ((equal? placeholder '*) + (longest-match pattern-rest sent 0 #f)) + ((equal? placeholder '&) + (longest-match pattern-rest sent 1 #f)))) +</PRE> + +<P><H2>Naming the Matched Text</H2> + +<P>So far we've worked out how to match the four kinds of placeholders and +return a true or false value indicating whether a match is possible. Our +program is almost finished; all we need to make it useful is the +facility that will let us find out <EM>which</EM> words in the sentence +matched each placeholder in the pattern. + +<P>We don't have to change the overall structure of the program in order to +make this work. But most of the procedures in the pattern matcher will have +to be given an additional argument, the database of placeholder names +and values that have been matched so far.<A NAME="text3" HREF="match#ft3">[3]</A> The formal parameter <CODE>known-values</CODE> will hold +this database. Its value will be a sentence containing placeholder names +followed by the corresponding words and an exclamation point to separate the +entries, as in the examples earlier in the chapter. When we begin the +search for a match, we use an empty sentence as the initial <CODE>known-values</CODE>: + +<P><PRE>(define (match pattern sent) + (match-using-known-values pattern sent '())) + +(define (match-using-known-values pattern sent known-values) + …) +</PRE> + +<P>As <CODE>match-using-known-values</CODE> matches the beginning of a +pattern with the beginning of a sentence, it invokes itself recursively with +an expanded <CODE>known-values</CODE> containing each newly matched placeholder. +For example, in evaluating + +<P><PRE>(match '(!twice !other !twice) '(cry baby cry)) +</PRE> + +<P>the program will call <CODE>match-using-known-values</CODE> four times: + +<P><TABLE><TR><TH><CODE>pattern</CODE><TH><CODE>sent</CODE> +<TH><CODE>known-values</CODE> +<TR><TD><CODE>(!twice !other !twice) </CODE> +<TD><CODE>(cry baby cry) </CODE> +<TD><CODE>()</CODE> +<TR><TD><CODE>(!other !twice)</CODE> +<TD><CODE>(baby cry)</CODE> +<TD><CODE>(twice cry !)</CODE> +<TR><TD><CODE>(!twice)</CODE> +<TD><CODE>(cry)</CODE> +<TD><CODE>(twice cry ! other baby !)</CODE> +<TR><TD><CODE>()</CODE> +<TD><CODE>()</CODE> +<TD><CODE>(twice cry ! other baby !)</CODE> +</TABLE> + + +<P>In the first invocation, we try to match <CODE>!twice</CODE> against some part of +the sentence. Since <CODE>!</CODE> matches exactly one word, the only possibility +is to match the word <CODE>cry</CODE>. The recursive invocation, therefore, is made +with the first words of the pattern and sentence removed, but with the match +between <CODE>twice</CODE> and <CODE>cry</CODE> added to the database. + +<P>Similarly, the second invocation matches <CODE>!other</CODE> with <CODE>baby</CODE> and +causes a third invocation with shortened pattern and sentence but a longer +database. + +<P>The third invocation is a little different because the pattern contains the +placeholder <CODE>!twice</CODE>, but the name <CODE>twice</CODE> is already in the +database. Therefore, this placeholder can't match whatever word happens to +be available; it must match the same word that it matched before. (Our +program will have to check for this situation.) Luckily, the sentence does +indeed contain the word <CODE>cry</CODE> at this position. + +<P>The final invocation reaches the base case of the recursion, because the +pattern is empty. The value that <CODE>match-using-known-values</CODE> returns is +the database in this invocation. + +<P><H2>The Final Version</H2> + +<P>We're now ready to show you the final version of the program. The program +structure is much like what you've seen before; the main difference is the +database of placeholder names and values. The program must add entries to +this database and must look for database entries that were added earlier. +Here are the three most important procedures and how they are changed +from the earlier version to implement this capability: + +<P><P><TABLE><TR><TH align="right" valign="top">•<TD> <TD valign="top"><CODE>match-using-known-values</CODE>, essentially the same as what was +formerly named <CODE>match?</CODE> except for bookkeeping details. +</TABLE><TABLE><TR><TH align="right" valign="top">•<TD> <TD valign="top"><CODE>match-special</CODE>, similar to the old version, except that it must +recognize the case of a placeholder whose name has already been seen. In +this case, the placeholder can match only the same words that it matched +before. +</TABLE><TABLE><TR><TH align="right" valign="top">•<TD> <TD valign="top"><CODE>longest-match</CODE> and <CODE>lm-helper</CODE>, also similar to the old +versions, except that they have the additional job of adding to the database the +name and value of any placeholder that they match. + +</TABLE> + +<P>Here are the modified procedures. Compare them to the previous +versions. + +<P><PRE>(define (<A NAME="g4"></A>match pattern sent) + (match-using-known-values pattern sent '())) + +(define (<A NAME="g5"></A>match-using-known-values pattern sent known-values) + (cond ((empty? pattern) + (if (empty? sent) known-values 'failed)) + ((special? (first pattern)) + (let ((placeholder (first pattern))) + (match-special (first placeholder) + (bf placeholder) + (bf pattern) + sent + known-values))) + ((empty? sent) 'failed) + ((equal? (first pattern) (first sent)) + (match-using-known-values (bf pattern) (bf sent) known-values)) + (else 'failed))) + +(define (<A NAME="g6"></A>match-special howmany name pattern-rest sent known-values) + (let ((old-value (lookup name known-values))) + (cond ((not (equal? old-value 'no-value)) + (if (length-ok? old-value howmany) + (already-known-match + old-value pattern-rest sent known-values) + 'failed)) + ((equal? howmany '?) + (longest-match name pattern-rest sent 0 #t known-values)) + ((equal? howmany '!) + (longest-match name pattern-rest sent 1 #t known-values)) + ((equal? howmany '*) + (longest-match name pattern-rest sent 0 #f known-values)) + ((equal? howmany '&) + (longest-match name pattern-rest sent 1 #f known-values))))) + +(define (<A NAME="g7"></A>longest-match name pattern-rest sent min max-one? known-values) + (cond ((empty? sent) + (if (= min 0) + (match-using-known-values pattern-rest + sent + (add name '() known-values)) + 'failed)) + (max-one? + (lm-helper name pattern-rest (se (first sent)) + (bf sent) min known-values)) + (else (lm-helper name pattern-rest + sent '() min known-values)))) + +(define (<A NAME="g8"></A>lm-helper name pattern-rest + sent-matched sent-unmatched min known-values) + (if (< (length sent-matched) min) + 'failed + (let ((tentative-result (match-using-known-values + pattern-rest + sent-unmatched + (add name sent-matched known-values)))) + (cond ((not (equal? tentative-result 'failed)) tentative-result) + ((empty? sent-matched) 'failed) + (else (lm-helper name + pattern-rest + (bl sent-matched) + (se (last sent-matched) sent-unmatched) + min + known-values)))))) +</PRE> + +<P>We haven't listed all of the minor procedures that these procedures invoke. +A complete listing is at the end of the chapter, but we hope that you have +enough confidence about the overall program structure to be able to assume +these small details will work. In the next few paragraphs we discuss some +of the ways in which the procedures shown here differ from the earlier +versions. + +<P>In the invocation of <CODE>match-special</CODE> we found it convenient to split the +placeholder into its first character, the one that tells how many words can +be matched, and the butfirst, which is the name of the placeholder. + +<P>What happens if <CODE>match-special</CODE> finds that the name is already in the +database? In this situation, we don't have to try multiple possibilities +for the number of words to match (the usual job of <CODE>longest-match</CODE>); the +placeholder must match exactly the words that it matched before. In this +situation, three things must be true in order for the match to succeed: +<A NAME="threematchconditions"></A> +(1) The first words of the <CODE>sent</CODE> argument must match the old value +stored in the database. (2) The partial <CODE>pattern</CODE> that remains after +this placeholder must match the rest of the <CODE>sent</CODE>. (3) The old value +must be consistent with the number of words permitted by the <CODE>howmany</CODE> +part of the placeholder. For example, if the pattern is + +<P><PRE>(*stuff and !stuff) +</PRE> + +<P>and the database says that the placeholder <CODE>*stuff</CODE> was +matched by three words from the sentence, then the second placeholder <CODE>!stuff</CODE> can't possibly be matched because it accepts only one word. This +third condition is actually checked first, by <CODE>length-ok?</CODE>, and if we +pass that hurdle, the other two conditions are checked by <CODE>already-known-match</CODE>. + +<P>The only significant change to <CODE>longest-match</CODE> is that it invokes +<CODE>add</CODE> to compute an expanded database with the newly found match added, +and it uses the resulting database as an argument to <CODE>match-using-known-values</CODE>. + +<P><H2>Abstract Data Types</H2> + +<P>As you know, a database of known values is represented in this program as a +sentence in which the entries are separated by exclamation points. Where is +this representation accomplished in the program you've seen? There's +nothing like + +<P><PRE>…(sentence old-known-values name value '!) …</PRE> + +<P>anywhere in the procedures we've shown. Instead, the program makes +reference to the database of known values through two procedure calls: + +<P><PRE>(lookup name known-values) ; in match-special +(add name matched known-values) ; in longest-match +</PRE> + +<P>Only the procedures <CODE>lookup</CODE> and <CODE>add</CODE> manipulate the +database of known values: + +<P><PRE>(define (<A NAME="g9"></A>lookup name known-values) + (cond ((empty? known-values) 'no-value) + ((equal? (first known-values) name) + (get-value (bf known-values))) + (else (lookup name (skip-value known-values))))) + +(define (<A NAME="g10"></A>get-value stuff) + (if (equal? (first stuff) '!) + '() + (se (first stuff) (get-value (bf stuff))))) + +(define (<A NAME="g11"></A>skip-value stuff) + (if (equal? (first stuff) '!) + (bf stuff) + (skip-value (bf stuff)))) + +(define (<A NAME="g12"></A>add name value known-values) + (if (empty? name) + known-values + (se known-values name value '!))) +</PRE> + +<P>These procedures are full of small details. For example, it's a little +tricky to extract the part of a sentence from a name to the next exclamation +point. It's convenient that we could write the more important procedures, +such as <CODE>longest-match</CODE>, without filling them with these details. +As far as <CODE>longest-match</CODE> knows, <CODE>lookup</CODE> and <CODE>add</CODE> could be +Scheme primitive procedures. In effect we've created a new data type, with +<CODE>add</CODE> as its constructor and <CODE>lookup</CODE> as its selector. + +<P>Types such as these, that are invented by a programmer and aren't part of the +Scheme language itself, are called <EM><A NAME="g13"></A><A NAME="g14"></A>abstract data types.</EM> +<A NAME="g15"></A> +<A NAME="g16"></A> +Creating an abstract data type means drawing a barrier between an idea about +some kind of information we want to model in a program and the particular +mechanism that we use to represent the information. In this case, the +information is a collection of name-value associations, and the particular +mechanism is a sentence with exclamation points and so on. The pattern +matcher doesn't think of the database as a sentence. For example, it would +be silly to translate the database into Pig Latin or find its acronym. + +<P>Just as we distinguish the <EM>primitive</EM> procedures that Scheme knows +all along from the <EM>compound</EM> procedures that the Scheme programmer +defines, we could use the names "primitive data type" for types such as +numbers and Booleans that are built into Scheme and "compound data type" +for ones that the programmer invents by defining selectors and constructors. +But "compound data type" is a bit of a pun, because it also suggests a data +type built out of smaller pieces, just as a compound expression is built of +smaller expressions. Perhaps that's why the name "abstract data type" has +become generally accepted. It's connected to the idea of abstraction +that we introduced earlier, because in order to create an abstract data type, +we must specify the selectors and constructors and give names to those +patterns of computation. + +<P><H2>Backtracking and <CODE>Known-Values</CODE></H2> + +<P>What happens to the database in cases that require backtracking, where +a particular recursive call might be "on the wrong track"? Let's trace +<CODE>match-using-known-values</CODE> and see what happens. (We'll use the +little-people model to discuss this example, and so we're annotating each +invocation in the trace with the name of its little person.) + +<P><PRE>> (trace match-using-known-values) +> (match '(*start me *end) '(love me do))<SMALL><CODE> +(match-using-known-values (*start me *end) (love me do) ()) Martha +| (match-using-known-values (me *end) () (start love me do !)) Mercutio +| failed +| (match-using-known-values (me *end) (do) (start love me !)) Masayuki +| failed +| (match-using-known-values (me *end) (me do) (start love !)) Mohammad +| | (match-using-known-values (*end) (do) (start love !)) Mae +| | | (match-using-known-values () () (start love ! end do !)) Merlin +| | | (start love ! end do !) +| | (start love ! end do !) +| (start love ! end do !) +(start love ! end do !) +</CODE></SMALL>(START LOVE ! END DO !) +</PRE> + +<P>Martha, the first little person shown, has an empty <CODE>known-values</CODE>. She +makes three attempts to match <CODE>*start</CODE> with parts of the sentence. In +each case, a little person is hired with the provisional match in his or her +<CODE>known-values</CODE>. (Actually, Martha does not directly hire Mercutio and +the others. Martha hires a <CODE>match-special</CODE> little person, who in turn +hires a <CODE>longest-match</CODE> specialist, who hires an <CODE>lm-helper</CODE> +specialist, who hires Mercutio. But that added complexity isn't important +for the point we're focusing on right now, namely, how backtracking can +work. Pretend Martha hires Mercutio.) + +<P>If you don't use the little-people model, but instead think about the +program as if there were just one <CODE>known-values</CODE> variable, then the +backtracking can indeed be very mysterious. Once a provisional match is +added to the database, how is it ever removed? The answer is that it doesn't +work that way. There isn't a "the" database. Instead, each little person +has a separate database. If an attempted match fails, the little person who +reports the failure just stops working. For example, Martha hires Mercutio +to attempt a match in which the name <CODE>start</CODE> has the value <CODE>love me +do</CODE>. Mercutio is unable to complete the match, and reports failure. It is +Martha, not Mercutio, who then hires Masayuki to try another value for <CODE>start</CODE>. Martha's database hasn't changed, so Martha gives Masayuki a +database that reflects the new trial value but not the old one. + +<P>Not every hiring of a little person starts from an empty database. When a +match is partially successful, the continuation of the same attempt must +benefit from the work that's already been done. So, for example, when +Mohammad hires Mae, and when Mae hires Merlin, each of them passes on an +extended database, not an empty one. Specifically, Mae gives Merlin the +new match of the name <CODE>end</CODE> with the value <CODE>do</CODE>, but also the match +of <CODE>start</CODE> with <CODE>love</CODE> that she was given by Mohammad. + +<P>So as you can see, we don't have to do anything special to keep track of our +database when we backtrack; the structure of the recursion takes care of +everything for free. + +<P><H2>How We Wrote It</H2> + +<P>For explanatory purposes we've chosen to present the pieces of this program +in a different order from the one in which we actually wrote them. We <EM>did</EM> implement the easy placeholders (<CODE>!</CODE> and <CODE>?</CODE>) before the +harder ones. But our program had provision for a database of names from the +beginning. + +<P>There is no "right" way to approach a programming problem. Our particular +approach was determined partly by our past experience. Each of us had +written similar programs before, and we had preconceived ideas about the +easy and hard parts. You might well start at a different point. For +example, here is an elegant small program we'd both been shown by friends: + +<P><PRE>(define (match? pattern sent) + (cond ((empty? pattern) (empty? sent)) + ((empty? sent) + (and (equal? (first pattern) '*) (match? (bf pattern) sent))) + ((equal? (first pattern) '*) + (or (match? pattern (bf sent)) + (match? (bf pattern) sent))) + (else (and (equal? (first pattern) (first sent)) + (match? (bf pattern) (bf sent)))))) +</PRE> + +<P>What's appealing about this is the funny symmetry of taking the +<CODE>butfirst</CODE> of the pattern <EM>or</EM> of the sentence. That's not +something you'd naturally think of, probably, but once you've worked out how +it can work, it affects your preconceptions when you set out to write a +pattern matcher yourself. + +<P>Based on that inspiration, we might well have started with the hard cases +(such as <CODE>*</CODE>), with the idea that once they're in place, the easy cases +won't change the program structure much. + +<P><H2>Complete Program Listing</H2> + +<P><P><P><PRE> +(define (match pattern sent) + (match-using-known-values pattern sent '())) + +(define (match-using-known-values pattern sent known-values) + (cond ((empty? pattern) + (if (empty? sent) known-values 'failed)) + ((special? (first pattern)) + (let ((placeholder (first pattern))) + (match-special (first placeholder) + (bf placeholder) + (bf pattern) + sent + known-values))) + ((empty? sent) 'failed) + ((equal? (first pattern) (first sent)) + (match-using-known-values (bf pattern) (bf sent) known-values)) + (else 'failed))) + +(define (special? wd) + (member? (first wd) '(* & ? !))) + +(define (match-special howmany name pattern-rest sent known-values) + (let ((old-value (lookup name known-values))) + (cond ((not (equal? old-value 'no-value)) + (if (length-ok? old-value howmany) + (already-known-match + old-value pattern-rest sent known-values) + 'failed)) + ((equal? howmany '?) + (longest-match name pattern-rest sent 0 #t known-values)) + ((equal? howmany '!) + (longest-match name pattern-rest sent 1 #t known-values)) + ((equal? howmany '*) + (longest-match name pattern-rest sent 0 #f known-values)) + ((equal? howmany '&) + (longest-match name pattern-rest sent 1 #f known-values))))) + +(define (length-ok? value howmany) + (cond ((empty? value) (member? howmany '(? *))) + ((not (empty? (bf value))) (member? howmany '(* &))) + (else #t))) + +(define (already-known-match value pattern-rest sent known-values) + (let ((unmatched (chop-leading-substring value sent))) + (if (not (equal? unmatched 'failed)) + (match-using-known-values pattern-rest unmatched known-values) + 'failed))) + +(define (chop-leading-substring value sent) + (cond ((empty? value) sent) + ((empty? sent) 'failed) + ((equal? (first value) (first sent)) + (chop-leading-substring (bf value) (bf sent))) + (else 'failed))) + +(define (longest-match name pattern-rest sent min max-one? known-values) + (cond ((empty? sent) + (if (= min 0) + (match-using-known-values pattern-rest + sent + (add name '() known-values)) + 'failed)) + (max-one? + (lm-helper name pattern-rest (se (first sent)) + (bf sent) min known-values)) + (else (lm-helper name pattern-rest + sent '() min known-values)))) + +(define (lm-helper name pattern-rest + sent-matched sent-unmatched min known-values) + (if (< (length sent-matched) min) + 'failed + (let ((tentative-result (match-using-known-values + pattern-rest + sent-unmatched + (add name sent-matched known-values)))) + (cond ((not (equal? tentative-result 'failed)) tentative-result) + ((empty? sent-matched) 'failed) + (else (lm-helper name + pattern-rest + (bl sent-matched) + (se (last sent-matched) sent-unmatched) + min + known-values)))))) + +;;; Known values database abstract data type + +(define (lookup name known-values) + (cond ((empty? known-values) 'no-value) + ((equal? (first known-values) name) + (get-value (bf known-values))) + (else (lookup name (skip-value known-values))))) + +(define (get-value stuff) + (if (equal? (first stuff) '!) + '() + (se (first stuff) (get-value (bf stuff))))) + +(define (skip-value stuff) + (if (equal? (first stuff) '!) + (bf stuff) + (skip-value (bf stuff)))) + +(define (add name value known-values) + (if (empty? name) + known-values + (se known-values name value '!))) +</PRE><P> + + +<P><H2>Exercises about Using the Pattern Matcher</H2> + +<P><B>16.1</B> Design and test a pattern that matches any sentence containing the word <CODE>C</CODE> three times (not necessarily next to each other). + + +<P> +<B>16.2</B> Design and test a pattern that matches a sentence consisting of two copies +of a smaller sentence, such as <CODE>(a b a b)</CODE>. + + +<P> +<B>16.3</B> Design and test a pattern that matches any sentence of no more than three +words. + + +<P> +<B>16.4</B> Design and test a pattern that matches any sentence of at least three words. + + +<P> +<B>16.5</B> Show sentences of length 2, 3, and 4 that match the pattern + +<P><PRE>(*x *y *y *x) +</PRE> + +<P>For each length, if no sentence can match the pattern, explain why +not. + + +<P> +<B>16.6</B> Show sentences of length 2, 3, and 4 that match the pattern + +<P><PRE>(*x *y &y &x) +</PRE> + +<P>For each length, if no sentence can match the pattern, explain why +not. + + +<P> +<B>16.7</B> List <EM>all</EM> the sentences of length 6 or less, starting with <CODE>a b a</CODE>, that match the pattern + +<P><PRE>(*x *y *y *x) +</PRE> + + +<P> +<H2>Exercises about Implementation</H2> + +<P><B>16.8</B> Explain how <CODE>longest-match</CODE> handles an empty sentence. + + +<P> +<B>16.9</B> Suppose the first <CODE>cond</CODE> clause in <CODE>match-using-known-values</CODE> were + +<P><PRE>((empty? pattern) known-values) +</PRE> + +<P>Give an example of a pattern and sentence for which the modified +program would give a different result from the original. + + +<P> +<B>16.10</B> What happens if the sentence argument—not the pattern—contains +the word <CODE>*</CODE> somewhere? + + +<P> +<B>16.11</B> For each of the following examples, how many <CODE>match-using-known-values</CODE> +little people are required? + +<P><PRE>(match '(from me to you) '(from me to you)) +(match '(*x *y *x) '(a b c a b)) +(match '(*x *y *z) '(a b c a b)) +(match '(*x hey *y bulldog *z) '(a hey b bulldog c)) +(match '(*x a b c d e f) '(a b c d e f)) +(match '(a b c d e f *x) '(a b c d e f)) +</PRE> + +<P>In general, what can you say about the characteristics that make a +pattern easy or hard to match? + + +<P> +<B>16.12</B> Show a pattern with the following two properties: (1) It has at least two +placeholders. (2) When you match it against any sentence, every invocation +of <CODE>lookup</CODE> returns <CODE>no-value</CODE>. + + +<P> +<B>16.13</B> Show a pattern and a sentence that can be used as arguments to <CODE>match</CODE> +so that <CODE>lookup</CODE> returns <CODE>(the beatles)</CODE> at some point during the +match. + + +<P> +<B>16.14</B> Our program can still match patterns with unnamed placeholders. +How would it affect the operation of the program if these unnamed +placeholders were added to the database? What part of the program keeps +them from being added? + + +<P> +<B>16.15</B> Why don't <CODE>get-value</CODE> and <CODE>skip-value</CODE> check for an empty argument +as the base case? + + +<P> +<B>16.16</B> Why didn't we write the first <CODE>cond</CODE> clause in <CODE>length-ok?</CODE> as +the following? + +<P><PRE>((and (empty? value) (member? howmany '(? *))) #t) +</PRE> + +<P> +<B>16.17</B> Where in the program is the initial empty database of known values +established? + + +<P> +<B>16.18</B> For the case of matching a placeholder name that's already been matched in +this pattern, we said on page <A HREF="match.html#threematchconditions">there</A> that three conditions +must be checked. For each of the three, give a pattern and sentence that +the program would incorrectly match if the condition were not checked. + + +<P> +<B>16.19</B> What will the following example do? + +<P><PRE>(match '(?x is *y !x) '(! is an exclamation point !)) +</PRE> + +<P>Can you suggest a way to fix this problem? + + +<P> +<B>16.20</B> Modify the pattern matcher so that a placeholder of the form <CODE>*15x</CODE> is like <CODE>*x</CODE> except that it can be matched only by exactly 15 +words. + +<P><PRE>> (match '(*3front *back) '(your mother should know)) +(FRONT YOUR MOTHER SHOULD ! BACK KNOW !) +</PRE> + +<P> +<B>16.21</B> Modify the pattern matcher so that a <CODE>+</CODE> placeholder (with or without a +name attached) matches only a number: + +<P><PRE>> (match '(*front +middle *back) '(four score and 7 years ago)) +(FRONT FOUR SCORE AND ! MIDDLE 7 ! BACK YEARS AGO !) +</PRE> + +<P>The <CODE>+</CODE> placeholder is otherwise like <CODE>!</CODE>—it must match +exactly one word. + + +<P> +<B>16.22</B> Does your favorite text editor or word processor have a search command that +allows you to search for patterns rather than only specific strings of +characters? Look into this and compare your editor's capabilities with +that of our pattern matcher. + + +<P> + +<P> +<HR> +<A NAME="ft1" HREF="match#text1">[1]</A> Why not return the +sentence if successful or <CODE>#f</CODE> otherwise? That would be fine in most +versions of Scheme, but as we mentioned earlier, the empty sentence <CODE>()</CODE> +is the same as the false value <CODE>#f</CODE> in some dialects. In those +Schemes, a successfully matched pattern with no named placeholders, for +which the program should return an empty sentence, would be +indistinguishable from an unmatched pattern.<P> +<A NAME="ft2" HREF="match#text2">[2]</A> Actually, since <CODE>or</CODE> is a special form, Scheme avoids +the need to try the second alternative if the first one succeeds.<P> +<A NAME="ft3" HREF="match#text3">[3]</A> The word <EM>database</EM> +has two possible meanings in computer science, a broad meaning and a narrow +one. The broad meaning, which we're using here, is a repository of +information to which the program periodically adds new items for later +retrieval. The narrow meaning is a collection of information that's +manipulated by a <EM>database program,</EM> which provides facilities for +adding new information, modifying existing entries, selecting entries that +match some specified criterion, and so on. We'll see a database program +near the end of the book.<P> +<P><A HREF="../ss-toc2.html">(back to Table of Contents)</A><P> +<A HREF="../ssch15/poker.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../ssch17/part5.html"><STRONG>NEXT</STRONG></A> + +<P> +<ADDRESS> +<A HREF="../index.html">Brian Harvey</A>, +<CODE>bh@cs.berkeley.edu</CODE> +</ADDRESS> +</BODY> +</HTML> |