about summary refs log tree commit diff stats
path: root/js/games/nluqo.github.io/~bh/ssch22/files.html
diff options
context:
space:
mode:
authorelioat <elioat@tilde.institute>2023-08-23 07:52:19 -0400
committerelioat <elioat@tilde.institute>2023-08-23 07:52:19 -0400
commit562a9a52d599d9a05f871404050968a5fd282640 (patch)
tree7d3305c1252c043bfe246ccc7deff0056aa6b5ab /js/games/nluqo.github.io/~bh/ssch22/files.html
parent5d012c6c011a9dedf7d0a098e456206244eb5a0f (diff)
downloadtour-562a9a52d599d9a05f871404050968a5fd282640.tar.gz
*
Diffstat (limited to 'js/games/nluqo.github.io/~bh/ssch22/files.html')
-rw-r--r--js/games/nluqo.github.io/~bh/ssch22/files.html875
1 files changed, 875 insertions, 0 deletions
diff --git a/js/games/nluqo.github.io/~bh/ssch22/files.html b/js/games/nluqo.github.io/~bh/ssch22/files.html
new file mode 100644
index 0000000..8e85ab1
--- /dev/null
+++ b/js/games/nluqo.github.io/~bh/ssch22/files.html
@@ -0,0 +1,875 @@
+<P>
+
+<P><A NAME="cabinets"></A>
+<P><CENTER><IMG SRC="../ss-pics/files.jpg" ALT="figure: files"></CENTER>
+<HTML>
+<HEAD>
+<TITLE>Simply Scheme: Introducing Computer Science ch 22: Files</TITLE>
+</HEAD>
+<BODY>
+<HR>
+<CITE>Simply Scheme:</CITE>
+<CITE>Introducing Computer Science</CITE> 2/e Copyright (C) 1999 MIT
+<H2>Chapter 22</H2>
+<H1>Files</H1>
+
+<TABLE width="100%"><TR><TD>
+<IMG SRC="../simply.jpg" ALT="cover photo">
+<TD><TABLE>
+<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian
+Harvey</A><BR>University of California, Berkeley</CITE>
+<TR><TD align="right"><CITE><A HREF="http://ccrma.stanford.edu/~matt">Matthew
+Wright</A><BR>University of California, Santa Barbara</CITE>
+<TR><TD align="right"><BR>
+<TR><TD align="right"><A HREF="../pdf/ssch22.pdf">Download PDF version</A>
+<TR><TD align="right"><A HREF="../ss-toc2.html">Back to Table of Contents</A>
+<TR><TD align="right"><A HREF="../ssch21/functions-implement.html"><STRONG>BACK</STRONG></A>
+chapter thread <A HREF="../ssch23/vectors.html"><STRONG>NEXT</STRONG></A>
+<TR><TD align="right"><A HREF="http://mitpress.mit.edu/0262082810">MIT
+Press web page for <CITE>Simply Scheme</CITE></A>
+</TABLE></TABLE>
+
+<HR>
+
+
+<P>We learned in Chapter 20 how to read from the keyboard and write to the
+screen.  The same procedures (<A NAME="g1"></A><CODE>read</CODE>, <CODE><A NAME="g2"></A><CODE>read-line</CODE></CODE>,
+<A NAME="g3"></A><CODE>display</CODE>, <A NAME="g4"></A><CODE>show</CODE>, <CODE><A NAME="g5"></A><CODE>show-line</CODE></CODE>, and
+<A NAME="g6"></A><CODE>newline</CODE>) can also be used to read and write <A NAME="g7"></A><A NAME="g8"></A>data files on
+the disk.
+
+<P><H2>Ports</H2>
+
+<P>Imagine a complicated program that reads a little bit of data at a time from
+a lot of different files.  For example, soon we will write a program to
+merge two files the way we merged two sentences in <CODE>mergesort</CODE> in Chapter
+15.  In order to make this work, each invocation of <CODE>read</CODE> must
+specify which file to read from this time.  Similarly, we might want to
+direct output among several files.
+
+<P>Each of the input/output procedures can take an extra argument to specify a
+file:
+
+<P><PRE>(show '(across the universe) file1)
+(show-line '(penny lane) file2)
+(read file3)
+</PRE>
+
+<P>What are <CODE>file1</CODE> and so on?  You might think that the natural
+thing would be for them to be words, that is, the names of files.
+
+<P>It happens not to work that way.  Instead, before you can use a file, you
+have to <EM>open</EM> it.  If you want to read a file, the system has to
+check that the file exists.  If you want to write a file, the system has to
+create a new, empty file.  The Scheme procedures that open a file return a
+<EM>port,</EM> which is what Scheme uses to remember the file you opened.
+Ports are useful only as arguments to the input/output procedures.  Here's
+an example:
+
+<P><PRE>&gt; (let ((port (<A NAME="g9"></A>open-output-file &quot;songs&quot;)))
+    (<A NAME="g10"></A>show '(all my loving) port)
+    (show '(ticket to ride) port)
+    (show '(martha my dear) port)
+    (<A NAME="g11"></A>close-output-port port))
+</PRE>
+
+<P>(<CODE>Close-output-port</CODE>, like <CODE>define</CODE>, has an unspecified
+return value that we're not going to include in the examples.)
+
+<P>We've created a file named <CODE>songs</CODE> and put three expressions, each on
+its own line, in that file.  Notice that nothing appeared on the screen when
+we called <CODE>show</CODE>.  Because we used a port argument to <CODE>show</CODE>, the
+output went into the file.  Here's what's in the file:
+
+<P><PRE>(ALL MY LOVING)
+(TICKET TO RIDE)
+(MARTHA MY DEAR)
+</PRE>
+
+<P>The example illustrates two more details about using files that we haven't
+mentioned before: First, the name of a file must be given in double-quote
+marks.  Second, when you're finished using a file, you have to <EM>close</EM>
+the port associated with it.  (This is very important.  On some systems,
+if you forget to close the port, the file disappears.)
+
+<P>The file is now permanent.  If we were to exit from Scheme, we could read
+the file in a word processor or any other program.  But let's read it using
+Scheme:
+
+<P><PRE>(define in (<A NAME="g12"></A>open-input-file &quot;songs&quot;))
+
+&gt; (read in)
+(ALL MY LOVING)
+
+&gt; (read in)
+(TICKET TO RIDE)
+
+&gt; (read in)
+(MARTHA MY DEAR)
+
+&gt; (<A NAME="g13"></A>close-input-port in)
+</PRE>
+
+<P>(In this illustration, we've used a global variable to hold the
+port because we wanted to show the results of reading the file step by
+step.  In a real program, we'd generally use a <CODE>let</CODE> structure like the
+one we used to write the file.  Now that we've closed the port, the variable
+<CODE>in</CODE> contains a port that can no longer be used.)
+
+<P><H2>Writing Files for People to Read</H2>
+
+<P>
+A file full of sentences in parentheses is a natural representation for
+information that will be used by a Scheme program, but it may seem awkward
+if the file will be read by human beings.  We could use <CODE>show-line</CODE>
+instead of <CODE>show</CODE> to create a file, still with one song title per line,
+but without the parentheses:
+
+<P><PRE>&gt; (let ((port (open-output-file &quot;songs2&quot;)))
+    (show-line '(all my loving) port)
+    (show-line '(ticket to ride) port)
+    (show-line '(martha my dear) port)
+    (close-output-port port))
+</PRE>
+
+<P>The file <CODE>songs2</CODE> will contain
+
+<P><PRE>ALL MY LOVING
+TICKET TO RIDE
+MARTHA MY DEAR
+</PRE>
+
+<P>What should we do if we want to read this file back into Scheme?  We must
+read the file a line at a time, with <CODE>read-line</CODE>.  In effect, <CODE>read-line</CODE> treats the breaks between lines as if they were parentheses:
+
+<P><PRE>(define in (open-input-file &quot;songs2&quot;))
+
+&gt; (read-line in)
+(ALL MY LOVING)
+
+&gt; (close-input-port in)
+</PRE>
+
+<P>(Notice that we don't have to read the entire file before closing
+the port.  If we open the file again later, we start over again from the
+beginning.)
+
+<P> 
+As far as Scheme is concerned, the result of writing the file with <CODE>show-line</CODE> and reading it with <CODE>read-line</CODE> was the same as that of
+writing it with <CODE>show</CODE> and reading it with <CODE>read</CODE>.  The difference
+is that without parentheses the file itself is more &quot;user-friendly&quot; for
+someone who reads it outside of Scheme.<A NAME="text1" HREF="files.html#ft1">[1]</A>
+
+<P><H2>Using a File as a Database</H2>
+
+<P>It's not very interesting merely to read the file line by line.  Instead,
+let's use it as a very small database in which we can look up songs by
+number.  (For only three songs, it would be more realistic and more
+convenient to keep them in a list and look them up with <CODE>list-ref</CODE>.
+Pretend that this file has 3000 songs, too many for you to want to keep
+them all at once in your computer's memory.)
+
+<P><PRE>(define (<A NAME="g14"></A>get-song n)
+  (let ((port (open-input-file &quot;songs2&quot;)))
+    (skip-songs (- n 1) port)
+    (let ((answer (read-line port)))
+      (close-input-port port)
+      answer)))
+
+(define (<A NAME="g15"></A>skip-songs n port)
+  (if (= n 0)
+      'done
+      (begin (read-line port)
+	     (skip-songs (- n 1) port))))
+
+&gt; (get-song 2)      
+(TICKET TO RIDE)
+</PRE>
+
+<P>When we invoke <CODE>read-line</CODE> in <CODE>skip-songs</CODE>, we pay no
+attention to the value it returns.  Remember that the values of all but the
+last expression in a sequence are discarded.  <CODE>Read</CODE> and <CODE>read-line</CODE>
+are the first procedures we've seen that have both a useful return value and
+a useful side effect&mdash;moving forward in the file.
+
+<P><CODE>Skip-songs</CODE> returns the word <CODE>done</CODE> when it's finished.  We don't
+do anything with that return value, and there's no particular reason why we
+chose that word.  But every Scheme procedure has to return <EM>something,</EM>
+and this was as good as anything.
+
+<P>What if we asked for a song number greater than three?  In other words, what
+if we read beyond the end of the file?  In that case, <CODE>read</CODE> will return
+a special value called an <EM><A NAME="g16"></A><A NAME="g17"></A>end-of-file object.</EM> The only
+useful thing to do with that value is to test for it.  Our next sample
+program reads an entire file and prints it to the screen:
+
+<P><PRE>(define (<A NAME="g18"></A>print-file name)
+  (let ((port (open-input-file name)))
+    (print-file-helper port)
+    (close-input-port port)
+    'done))
+
+(define (print-file-helper port)             ;; first version
+  (let ((stuff (read-line port)))
+    (if (<A NAME="g19"></A>eof-object? stuff)
+	'done
+	(begin (show-line stuff)
+	       (print-file-helper port)))))
+
+&gt; (print-file &quot;songs&quot;)
+ALL MY LOVING
+TICKET TO RIDE
+MARTHA MY DEAR
+DONE
+</PRE>
+
+<P>Did you notice that each recursive call in <CODE>print-file-helper</CODE> has
+exactly the same argument as the one before?  How does the problem get
+smaller?  (Up to now, recursive calls have involved something like the <CODE>butfirst</CODE> of an old argument, or one less than an old number.)  When we're
+reading a file, the sense in which the problem gets smaller at each
+invocation is that we're getting closer to the end of the file.  You don't
+<CODE>butfirst</CODE> the port; reading it makes the unread portion of the file
+smaller as a side effect.
+
+<P><H2>Transforming the Lines of a File</H2>
+
+<P>Often we want to transform a file one line at a time.  That is, we want to
+copy lines from an input file to an output file, but instead of copying the
+lines exactly, we want each output line to be a <EM>function</EM> of the
+corresponding input line.  Here are some examples: We have a file full of
+text and we want to <EM>justify</EM> the output so that every line is exactly
+the same length, as in a book.  We have a file of students' names and
+grades, and we want a summary with the students' total and average scores.
+We have a file with people's first and last names, and we want to rearrange
+them to be last-name-first.
+
+<P>We'll write a procedure <A NAME="g20"></A><CODE>file-map</CODE>, analogous to <CODE>map</CODE> but for files.
+It will take three arguments:  The first will be a procedure whose domain and
+range are sentences; the second will be the name of the input file; the
+third will be the name of the output file.
+
+<P>Of course, this isn't exactly like the way <CODE>map</CODE> works&mdash;if it were
+exactly analogous, it would take only two arguments, the procedure and the
+<EM>contents</EM> of a file.  But one of the important features of files is
+that they let us handle amounts of information that are too big to fit all
+at once in the computer's memory.  Another feature is that once we write a
+file, it's there permanently, until we erase it.  So instead of having a
+<CODE>file-map</CODE> <EM>function</EM> that returns the contents of the new file,
+we have a procedure that writes its result to the disk.
+
+<P>
+<PRE>(define (<A NAME="g21"></A>file-map fn inname outname)
+  (let ((inport (open-input-file inname))
+	(outport (open-output-file outname)))
+    (file-map-helper fn inport outport)
+    (close-input-port inport)
+    (close-output-port outport)
+    'done))
+
+(define (<A NAME="g22"></A>file-map-helper fn inport outport)
+  (let ((line (read-line inport)))
+    (if (eof-object? line)
+	'done
+	(begin (show-line (fn line) outport)
+	       (file-map-helper fn inport outport)))))
+</PRE>
+
+<P>Compare this program with the earlier <CODE>print-file</CODE> example.  The two are
+almost identical.  One difference is that now the output goes to a file instead
+of to the screen; the other is that we apply the function <CODE>fn</CODE> to each
+line before doing the output.  But that small change vastly increases the
+generality of our program.  We've performed our usual trick of generalizing a
+<A NAME="g23"></A>
+pattern by adding a procedure argument, and instead of a program that
+carries out one specific task (printing the contents of a file), we have a
+tool that can be used to create many programs.
+
+<P>We'll start with an easy example: putting the last name first in a file
+full of names.  That is, if we start with an input file named <CODE>dddbmt</CODE>
+that contains
+
+<P><PRE>David Harmon
+Trevor Davies
+John Dymond
+Michael Wilson
+Ian Amey
+</PRE>
+
+<P>we want the output file to contain
+
+<P><PRE>Harmon, David
+Davies, Trevor
+Dymond, John
+Wilson, Michael
+Amey, Ian
+</PRE>
+
+<P>Since we are using <CODE>file-map</CODE> to handle our progress through the file,
+all we have to write is a procedure that takes a sentence (one name) as its
+argument and returns the same name but with the last word moved to the front
+and with a comma added:
+<PRE>(define (<A NAME="g24"></A>lastfirst name)
+  (se (word (last name) &quot;,&quot;) (bl name)))
+</PRE>
+
+<P>We use <CODE>butlast</CODE> rather than <CODE>first</CODE> in case someone in
+the file has a middle name.
+
+<P>To use this procedure we call <CODE>file-map</CODE> like this:
+
+<P><PRE>&gt; (file-map lastfirst &quot;dddbmt&quot; &quot;dddbmt-reversed&quot;)
+DONE
+</PRE>
+
+<P> Although you don't see the results on the screen, you can 
+
+<P><PRE>&gt; (print-file &quot;dddbmt-reversed&quot;)
+</PRE>
+
+<P>to see that we got the results we wanted.
+
+<P>Our next example is averaging grades.  Suppose the file <CODE>grades</CODE>
+contains this text:
+
+<P><PRE>John 88 92 100 75 95
+Paul 90 91 85 80 91
+George 85 87 90 72 96
+Ringo 95 84 88 87 87
+</PRE>
+
+<P>The output we want is:
+
+<P><PRE>John total: 450 average: 90
+Paul total: 437 average: 87.4
+George total: 430 average: 86
+Ringo total: 441 average: 88.2
+</PRE>
+
+<P>Here's the program:
+
+<P><PRE>(define (<A NAME="g25"></A>process-grades line)
+  (se (first line)
+      &quot;total:"
+      (accumulate + (bf line))
+      &quot;average:"
+      (/ (accumulate + (bf line))
+	 (count (bf line)))))
+
+&gt; (file-map process-grades &quot;grades&quot; &quot;results&quot;)
+</PRE>
+
+<P>As before, you can
+
+<P><PRE>&gt; (print-file &quot;results&quot;)
+</PRE>
+
+<P>to see that we got the results we wanted.
+
+<P> 
+<H2>Justifying Text</H2>
+
+<P>Many word-processing programs <EM>justify</EM> text; that is, they insert
+extra space between words so that every line reaches exactly to the right
+margin.  We can do that using <CODE>file-map</CODE>.
+
+<P>Let's suppose we have a file <CODE>r5rs</CODE>, written in some text editor, that
+looks like this:
+
+<P><PRE>Programming languages should be designed not by
+piling feature on top of feature, but by
+removing the weaknesses and restrictions that
+make additional features appear necessary.
+Scheme demonstrates that a very small number of
+rules for forming expressions, with no
+restrictions on how they are composed, suffice
+to form a practical and efficient programming
+language that is flexible enough to support most
+of the major programming paradigms in use today.
+</PRE>
+
+<P>(This is the first paragraph of the <EM>Revised<SUP><SMALL>5</SMALL></SUP> Report on
+the Algorithmic Language Scheme,</EM> edited by <A NAME="g26"></A>William Clinger
+and <A NAME="g27"></A>Jonathan Rees.)
+
+<P>Here is what the result should be if we justify our <CODE>r5rs</CODE> text:
+
+<P><PRE>Programming languages should be  designed  not  by
+piling  feature  on  top  of   feature,   but   by
+removing  the  weaknesses  and  restrictions  that
+make   additional   features   appear   necessary.
+Scheme demonstrates that a very  small  number  of
+rules   for   forming   expressions,    with    no
+restrictions on how  they  are  composed,  suffice
+to form  a  practical  and  efficient  programming
+language that is flexible enough to  support  most
+of the major programming paradigms in  use  today.
+</PRE>
+
+<P>The tricky part is that ordinarily we don't control the spaces that appear
+when a sentence is printed.  We just make sure the words are right, and we
+get one space between words automatically.  The solution used in this
+program is that each line of the output file is constructed as a single long
+word, including space characters that we place explicitly within it.  (Since
+<CODE>show-line</CODE> requires a sentence as its argument, our procedure will
+actually return a one-word sentence.  In the following program, <CODE>pad</CODE>
+constructs the word, and <CODE>justify</CODE> makes a one-word sentence containing
+it.)
+
+<P>This program, although short, is much harder to understand than most of
+our short examples.  There is no big new idea involved; instead, there are a
+number of unexciting but necessary details.  How many spaces between words?
+Do some words get more space than others?  The program structure is messy
+because the problem itself is messy.  Although it will be hard to read and
+understand, this program is a more realistic example of input/output
+programming than the cleanly structured examples we've shown until now.
+
+<P><CODE>Justify</CODE> takes two arguments, the line of text (a sentence) and a
+number indicating the desired width (how many characters).  Here's the
+algorithm:  First the program computes the total number of characters the
+sentence would take up without adding extras.  That's the job of <CODE>char-count</CODE>, which adds up the lengths of all the words, and adds to that
+the <EM>n</EM>&minus;1 spaces between words.  <CODE>Extra-spaces</CODE> subtracts that length
+from the desired line width to get the number of extra spaces we need.
+
+<P>The hard part of the job is done by <CODE>pad</CODE>.  It's invoked with three
+arguments: the part of the line not yet processed, the number of
+opportunities there are to insert extra spaces in that part of the line
+(that is, the number of words minus one), and the number of extra spaces that
+the program still needs to insert.  The number of extra spaces to insert
+<EM>this time</EM> is the integer quotient of the number <CODE>pad</CODE> wants to
+insert and the number of chances it'll have.  That is, if there are five
+words on the line, there are four places where <CODE>pad</CODE> can insert extra
+space.  If it needs to insert nine spaces altogether, then it should insert
+9/4 or two spaces at the first opportunity.  (Are you worried about the
+remainder?  It will turn out that <CODE>pad</CODE> doesn't lose any spaces because
+it takes the quotient over again for each word break.  The base case is that
+the number of remaining word breaks (the divisor) is one, so there will be
+no remainder, and all the leftover extra spaces will be inserted at the last
+word break.)
+
+<P><PRE>(define (<A NAME="g28"></A>justify line width)
+  (if (&lt; (count line) 2)
+      line
+      (se (pad line
+	       (- (count line) 1)
+	       (extra-spaces width (char-count line))))))
+
+(define (<A NAME="g29"></A>char-count line)
+  (+ (accumulate + (every count line))      ; letters within words
+     (- (count line) 1)))                   ; plus spaces between words
+
+(define (<A NAME="g30"></A>extra-spaces width chars)
+  (if (&gt; chars width)
+      0                                     ; none if already too wide
+      (- width chars)))
+
+(define (<A NAME="g31"></A>pad line chances needed)
+  (if (= chances 0)                         ; only one word in line
+      (first line)
+      (let ((extra (quotient needed chances)))
+	(word (first line)
+	      (spaces (+ extra 1))
+	      (pad (bf line) (- chances 1) (- needed extra))))))
+
+(define (<A NAME="g32"></A>spaces n)
+  (if (= n 0)
+      &quot;"
+      (word &quot; &quot; (spaces (- n 1)))))
+</PRE>
+
+<P>Because <CODE>justify</CODE> takes two arguments, we have to decide what line width
+we want to give it.  Here's how to make each line take 50 characters:
+
+<P><PRE>&gt; (file-map (lambda (sent) (justify sent 50)) &quot;r5rs&quot; &quot;r5rs-just&quot;)
+</PRE>
+
+<P><H2>Preserving Spacing of Text from Files</H2>
+
+<P>If we try to print the file <CODE>r5rs-just</CODE> from the previous section using
+<CODE>print-file</CODE>, it'll look exactly like <CODE>r5rs</CODE>.  That's because <CODE>read-line</CODE> doesn't preserve consecutive spaces in the lines that it reads.
+<CODE>Read-line</CODE> cares only where each word (consisting of non-space
+characters) begins and ends; it pays no attention to how many spaces come
+between any two words.  The lines
+
+<P><PRE>All     My          Loving
+</PRE>
+
+<P>and
+
+<P><PRE>All My Loving
+</PRE>
+
+<P>are the same, as far as <CODE>read-line</CODE> tells you.
+
+<P>For situations in which we do care about spacing, we have another way to
+read a line from a file.  The procedure <A NAME="g33"></A><CODE>read-string</CODE> reads all of the
+characters on a line, returning a single word that contains all of them,
+spaces included:<A NAME="text2" HREF="files.html#ft2">[2]</A>
+
+<P>
+<PRE>&gt; (define inport (open-input-file &quot;r5rs-just&quot;))
+
+&gt; (read-string inport)
+&quot;Programming languages should be  designed  not  by"
+
+&gt; (read-string inport)
+&quot;piling  feature  on  top  of   feature,   but   by"
+
+&gt; (close-input-port inport)
+</PRE>
+
+<P>
+<P>We can use <CODE>read-string</CODE> to rewrite <CODE>print-file</CODE> so that it makes
+an exact copy of the input file:
+
+<P><PRE>(define (<A NAME="g34"></A>print-file-helper port)
+  (let ((stuff (read-string port)))
+    (if (eof-object? stuff)
+	'done
+	(begin (show stuff)
+	       (print-file-helper port)))))
+</PRE>
+
+<P>(We only had to change the helper procedure.)
+
+<P> 
+<H2>Merging Two Files</H2>
+
+<P>Suppose you have two files of people's names.  Each file has been sorted
+in alphabetical order.  You want to combine them to form a single file,
+still in order.  (If this sounds unrealistic, it isn't.  Programs that sort
+very large amounts of information can't always fit it all in memory at
+once, so they read in as much as fits, sort it, and write a file.  Then
+they read and sort another chunk.  At the end of this process, the program
+is left with several sorted partial files, and it has to merge those
+files to get the overall result.)
+
+<P>The algorithm for merging files is exactly the same as the one we used for
+merging sentences in the <CODE>mergesort</CODE> program of Chapter 15.  The
+only difference is that the items to be sorted come from reading ports
+instead of from <CODE>first</CODE>ing a sentence.
+
+<P><PRE>(define (<A NAME="g35"></A>filemerge file1 file2 outfile)
+  (let ((p1 (open-input-file file1))
+	(p2 (open-input-file file2))
+	(outp (open-output-file outfile)))
+    (filemerge-helper p1 p2 outp (read-string p1) (read-string p2))
+    (close-output-port outp)
+    (close-input-port p1)
+    (close-input-port p2)
+    'done))
+
+(define (<A NAME="g36"></A>filemerge-helper p1 p2 outp line1 line2)
+  (cond ((eof-object? line1) (merge-copy line2 p2 outp))
+	((eof-object? line2) (merge-copy line1 p1 outp))
+	((before? line1 line2)
+	 (show line1 outp)
+	 (filemerge-helper p1 p2 outp (read-string p1) line2))
+	(else (show line2 outp)
+	      (filemerge-helper p1 p2 outp line1 (read-string p2)))))
+
+(define (<A NAME="g37"></A>merge-copy line inp outp)
+  (if (eof-object? line)
+      #f
+      (begin (show line outp)
+	     (merge-copy (read-string inp) inp outp))))
+</PRE>
+
+<P>You might think, comparing <CODE>filemerge-helper</CODE> with such earlier examples
+as <CODE>print-file-helper</CODE> and <CODE>file-map-helper</CODE>, that it would make
+more sense for <CODE>filemerge-helper</CODE> to take just the three ports as
+arguments and work like this:
+
+<P><PRE>(define (filemerge-helper p1 p2 outp)        ;; wrong
+  (let ((line1 (read-string p1))
+	(line2 (read-string p2)))
+    (cond ((eof-object? line1) (merge-copy p2 outp))
+	  ((eof-object? line2) (merge-copy p1 outp))
+	  ((before? line1 line2)
+	   (show line1 outp)
+	   (filemerge-helper p1 p2 outp))
+	  (else (show line2 outp)
+		(filemerge-helper p1 p2 outp)))))
+</PRE>
+
+<P>Unfortunately, this won't work.  Suppose that the first line of <CODE>file2</CODE>
+comes before the first line of <CODE>file1</CODE>.  This program correctly writes
+the first line of <CODE>file2</CODE> to the output file, as we expect.  But what
+about the first line of <CODE>file1</CODE>?  Since we called <CODE>read-string</CODE> on
+<CODE>file1</CODE>, we've &quot;gobbled&quot;<A NAME="text3" HREF="files.html#ft3">[3]</A> that line, but we're not yet ready to write it to the output.
+
+<P>In each invocation of <CODE>filemerge-helper</CODE>, only one line is written to
+the output file, so unless we want to lose information, we'd better read
+only one line.  This means that we can't call <CODE>read-string</CODE> twice on each
+recursive call.  One of the lines has to be handed down from one invocation
+to the next.  (That is, it has to be an argument to the recursive call.)
+Since we don't know in advance <EM>which</EM> line to keep, the easiest
+solution is to hand down both lines.
+
+<P>Therefore, <CODE>filemerge-helper</CODE> also takes as arguments the first line of
+each file that hasn't yet been written to the output.  When we first call
+<CODE>filemerge-helper</CODE> from <CODE>filemerge</CODE>, we read the first line of each
+file to provide the initial values of these arguments.  Then, on each
+recursive call, <CODE>filemerge-helper</CODE> calls <CODE>read-string</CODE> only once.
+
+<P><H2>Writing Files for Scheme to Read</H2>
+
+<P>You may be thinking that the three file-reading procedures we've shown, <CODE>read</CODE>, <CODE>read-line</CODE>, and <CODE>read-string</CODE>, have been getting better and
+better.  <CODE>Read</CODE> ignores case and forces you to have parentheses in your
+file.  <CODE>Read-line</CODE> fixes those problems, but it loses spacing information.
+<CODE>Read-string</CODE> can read anything and always gets it right.
+
+<P>But there's a cost to the generality of <CODE>read-string</CODE>; it can read any
+file, but it loses <EM>structure</EM> information.  For example, when we
+processed a file of people's names with <CODE>file-map</CODE>, we used this
+function:
+
+<P><PRE>(define (lastfirst name)
+  (se (word (last name) &quot;,&quot;) (bl name)))
+</PRE>
+
+<P>It's easy to break a name into its components if you have the name
+in the form of a sentence, with the words separated already.  But if we had
+read each line with <CODE>read-string</CODE>, <CODE>last</CODE> of a line would have been
+the last letter, not the last name.
+
+<P>The <CODE>lastfirst</CODE> example illustrates why you might want to use <CODE>read-line</CODE> rather than <CODE>read-string</CODE>: <CODE>Read-line</CODE> &quot;understands&quot;
+spaces.  Here's an example in which the even more structured <CODE>read</CODE> is
+appropriate.  We have a file of Beatles songs and the albums on which they
+appear:
+
+<P><PRE>((love me do) (please please me))
+((do you want to know a secret?) (please please me))
+((think for yourself) (rubber soul))
+((your mother should know) (magical mystery tour))
+</PRE>
+
+<P>Each line of this file contains two pieces of information: a song
+title and an album title.  If each line contained only the words of the two
+titles, as in
+
+<P><PRE>love me do please please me
+</PRE>
+
+<P>how would we know where the song title stops and the album title
+starts?  The natural way to represent this grouping information is to use
+the mechanism Scheme provides for grouping, namely, list structure.
+
+<P>If we use <CODE>read-line</CODE> to read the file, we'll lose the list structure;
+it will return a sentence containing words like <CODE>&quot;((love&quot;</CODE>.  <CODE>Read</CODE>,
+however, will do what we want.
+
+<P>How did we create this file in the first place?  We just used one <CODE>show</CODE>
+per line of the file, like this:
+
+<P><PRE>&gt; (show '((love me do) (please please me)) port)
+</PRE>
+
+<P>But what about the movie soundtracks?  We're going to have to come to terms
+with the apostrophe in &quot;A Hard Day's Night.&quot;
+
+<P>The straightforward solution is to put <CODE>day's</CODE> in a string:
+
+<P><PRE>(show '((and i love her) (a hard &quot;day's&quot; night)) port)
+</PRE>
+
+<P>The corresponding line in the file will look like this:
+
+<P><PRE>((AND I LOVE HER) (A HARD day's NIGHT))
+</PRE>
+
+<P>This result is actually even worse than it looks, because when we
+try to <CODE>read</CODE> the line back, the <CODE>'s</CODE> will be expanded into <CODE>(quote s)</CODE> in most versions of Scheme.  Using a string made it possible for
+us to get an apostrophe into Scheme.  If the word <CODE>day's</CODE> were inside
+quotation marks in the file, then <CODE>read</CODE> would understand our intentions.
+
+<P>Why aren't there double quotes in the file?  All of the printing
+procedures we've seen so far assume that whatever you're printing is
+intended to be read by people.  Therefore, they try to minimize
+distracting notation such as double-quote marks.  But, as we've
+discovered, if you're writing a file to be read by Scheme, then you
+do want enough notation so that Scheme can tell what the original
+object was.
+
+<P><CODE>Write</CODE> is a printing procedure just like <CODE>display</CODE>, except that it
+<A NAME="spwrite"></A>
+includes quote marks around strings:<A NAME="text4" HREF="files.html#ft4">[4]</A>
+
+<P><PRE>&gt; (write '(a hard &quot;day's&quot; night))
+(A HARD &quot;day's&quot; NIGHT)
+</PRE>
+
+<P>Once we're using strings, and since we're not extracting individual words
+from the titles, we might as well represent each title as one string:
+
+<P><PRE>&gt; (write '(&quot;And I Love Her&quot; &quot;A Hard Day's Night&quot;) port)
+</PRE>
+
+<P><H2>Pitfalls</H2>
+
+<P>One pitfall crucial to avoid when using files is that if there is an
+error in your program, it might blow up and return you to the Scheme prompt
+without closing the open files.  If you fix the program and try to run it
+again, you may get a message like &quot;file busy&quot; because the operating system
+of your computer may not allow you to open the same file on two ports at
+once.  Even worse, if you exit from Scheme without closing all your ports,
+on some computers you may find that you have unreadable files thereafter.
+
+<P>To help cope with this problem, we've provided a procedure <CODE><A NAME="g39"></A><CODE>close-all-ports</CODE></CODE> that can be invoked to close every port that you've
+opened since starting Scheme.  This procedure works only in our modified
+Scheme, but it can help you out of trouble while you're learning.
+
+<P>Be sure you don't open or close a file within a recursive procedure,
+if you intend to do it only once.  That's why most of the programs in
+this chapter have the structure of a procedure that opens files, calls a
+recursive helper, and then closes the files.
+
+<P>As we explained in the <CODE>filemerge</CODE> example, you can't read the same
+line twice.  Be sure your program remembers each line in a variable as long
+as it's needed.
+
+<P><H2>Exercises</H2>
+
+<P><B>22.1</B>&nbsp;&nbsp;Write a <CODE><A NAME="g40"></A>concatenate</CODE> procedure that takes two arguments: a list
+of names of input files, and one name for an output file.  The procedure
+should copy all of the input files, in order, into the output file.
+
+
+<P>
+<B>22.2</B>&nbsp;&nbsp;Write a procedure to count the number of lines in a file.  It should
+take the filename as argument and return the number.
+
+
+<P>
+<B>22.3</B>&nbsp;&nbsp;Write a procedure to count the number of words in a file.  It should
+take the filename as argument and return the number.
+
+
+<P>
+<B>22.4</B>&nbsp;&nbsp;Write a procedure to count the number of characters in a file, including
+space characters.  It should take the filename as argument and return the
+number.
+
+
+<P>
+<B>22.5</B>&nbsp;&nbsp;Write a procedure that copies an input file to an output file but
+eliminates multiple consecutive copies of the same line.  That is, if
+the input file contains the lines
+
+<P><PRE>John Lennon
+Paul McCartney
+Paul McCartney
+George Harrison
+
+
+Paul McCartney
+Ringo Starr
+</PRE>
+
+<P>then the output file should contain
+
+<P><PRE>John Lennon
+Paul McCartney
+George Harrison
+
+Paul McCartney
+Ringo Starr
+</PRE>
+
+<P>
+<B>22.6</B>&nbsp;&nbsp;Write a <CODE><A NAME="g41"></A>lookup</CODE> procedure that takes as arguments a filename and
+a word.  The procedure should print (on the screen, not into another file)
+only those lines from the input file that include the chosen word.
+
+
+<P>
+<B>22.7</B>&nbsp;&nbsp;Write a <CODE><A NAME="g42"></A>page</CODE> procedure that takes a filename as argument and
+prints the file a screenful at a time.  Assume that a screen can fit 24
+lines; your procedure should print 23 lines of the file and then a prompt
+message, and then wait for the user to enter a (probably empty) line.  It
+should then print the most recent line from the file again (so that the user
+will see some overlap between screenfuls) and 22 more lines, and so on until
+the file ends.
+
+
+<P>
+<B>22.8</B>&nbsp;&nbsp;A common operation in a database program is to <EM>join</EM> two
+databases, that is, to create a new database combining the information from the
+two given ones.  There has to be some piece of information in common between
+the two databases.  For example, suppose we have a class roster database in
+which each record includes a student's name, student ID number, and computer
+account name, like this:
+<A NAME="join"></A>
+
+<P><PRE>((john alec entwistle) 04397 john)
+((keith moon) 09382 kmoon)
+((peter townshend) 10428 pete)
+((roger daltrey) 01025 roger)
+</PRE>
+
+<P>We also have a grade database in which each student's grades
+are stored according to computer account name:
+
+<P><PRE>(john 87 90 76 68 95)
+(kmoon 80 88 95 77 89)
+(pete 100 92 80 65 72)
+(roger 85 96 83 62 74)
+</PRE>
+
+<P>We want to create a combined database like this:
+
+<P><PRE>((john alec entwistle) 04397 john 87 90 76 68 95)
+((keith moon) 09382 kmoon 80 88 95 77 89)
+((peter townshend) 10428 pete 100 92 80 65 72)
+((roger daltrey) 01025 roger 85 96 83 62 74)
+</PRE>
+
+<P>in which the information from the roster and grade databases has
+been combined for each account name.
+
+<P>Write a program <CODE><A NAME="g43"></A>join</CODE> that takes five arguments: two input
+filenames, two numbers indicating the position of the item within each
+record that should overlap between the files, and an output filename.  For
+our example, we'd say
+
+<P><PRE>&gt; (join &quot;class-roster&quot; &quot;grades&quot; 3 1 &quot;combined-file&quot;)
+</PRE>
+
+<P>In our example, both files are in alphabetical order of computer account
+name, the account name is a word, and the same account name never appears
+more than once in each file.  In general, you may assume that these
+conditions hold for the item that the two files have in common.  Your
+program should <EM>not</EM> assume that every item in one file also appears
+in the other.  A line should be written in the output file only for the
+items that do appear in both files.  
+
+<P>
+
+<HR>
+<A NAME="ft1" HREF="files.html#text1">[1]</A> Another difference, not
+apparent in this example, is that <CODE>show</CODE> and <CODE>read</CODE> can handle
+structured lists.  <CODE>Show-line</CODE> can print a structured list, leaving off
+only the outermost parentheses, but <CODE>read-line</CODE> will treat any
+parentheses in the file as ordinary characters; it always returns a
+sentence.<P>
+<A NAME="ft2" HREF="files.html#text2">[2]</A> Like all the input and output primitives, <CODE>read-string</CODE> can be invoked with or without a port argument.<P>
+<A NAME="ft3" HREF="files.html#text3">[3]</A> Computer programmers really talk
+this way.<P>
+<A NAME="ft4" HREF="files.html#text4">[4]</A> There are other kinds of data
+that <A NAME="g38"></A><CODE>write</CODE> prints differently from <CODE>display</CODE>, but we don't
+use them in
+this book.  The general rule is that <CODE>display</CODE> formats the output for
+human readers, while <CODE>write</CODE> ensures that Scheme can reread the
+information unambiguously.  <CODE>Show</CODE> and <CODE>show-line</CODE> are extensions
+that we wrote using <CODE>display</CODE>.  We could have written <CODE>show-in-write-format</CODE>,
+for example, but happened not to need it.<P>
+<P><A HREF="../ss-toc2.html">(back to Table of Contents)</A><P>
+<A HREF="../ssch21/functions-implement.html"><STRONG>BACK</STRONG></A>
+chapter thread <A HREF="../ssch23/vectors.html"><STRONG>NEXT</STRONG></A>
+
+<P>
+<ADDRESS>
+<A HREF="../index.html">Brian Harvey</A>, 
+<CODE>bh@cs.berkeley.edu</CODE>
+</ADDRESS>
+</BODY>
+</HTML>