diff options
author | elioat <elioat@tilde.institute> | 2023-08-23 07:52:19 -0400 |
---|---|---|
committer | elioat <elioat@tilde.institute> | 2023-08-23 07:52:19 -0400 |
commit | 562a9a52d599d9a05f871404050968a5fd282640 (patch) | |
tree | 7d3305c1252c043bfe246ccc7deff0056aa6b5ab /js/games/nluqo.github.io/~bh/ssch22 | |
parent | 5d012c6c011a9dedf7d0a098e456206244eb5a0f (diff) | |
download | tour-562a9a52d599d9a05f871404050968a5fd282640.tar.gz |
*
Diffstat (limited to 'js/games/nluqo.github.io/~bh/ssch22')
-rw-r--r-- | js/games/nluqo.github.io/~bh/ssch22/files | 875 | ||||
-rw-r--r-- | js/games/nluqo.github.io/~bh/ssch22/files.html | 875 |
2 files changed, 1750 insertions, 0 deletions
diff --git a/js/games/nluqo.github.io/~bh/ssch22/files b/js/games/nluqo.github.io/~bh/ssch22/files new file mode 100644 index 0000000..ed2c840 --- /dev/null +++ b/js/games/nluqo.github.io/~bh/ssch22/files @@ -0,0 +1,875 @@ +<P> + +<P><A NAME="cabinets"></A> +<P><CENTER><IMG SRC="../ss-pics/files.jpg" ALT="figure: files"></CENTER> +<HTML> +<HEAD> +<TITLE>Simply Scheme: Introducing Computer Science ch 22: Files</TITLE> +</HEAD> +<BODY> +<HR> +<CITE>Simply Scheme:</CITE> +<CITE>Introducing Computer Science</CITE> 2/e Copyright (C) 1999 MIT +<H2>Chapter 22</H2> +<H1>Files</H1> + +<TABLE width="100%"><TR><TD> +<IMG SRC="../simply.jpg" ALT="cover photo"> +<TD><TABLE> +<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian +Harvey</A><BR>University of California, Berkeley</CITE> +<TR><TD align="right"><CITE><A HREF="http://ccrma.stanford.edu/~matt">Matthew +Wright</A><BR>University of California, Santa Barbara</CITE> +<TR><TD align="right"><BR> +<TR><TD align="right"><A HREF="../pdf/ssch22.pdf">Download PDF version</A> +<TR><TD align="right"><A HREF="../ss-toc2.html">Back to Table of Contents</A> +<TR><TD align="right"><A HREF="../ssch21/functions-implement.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../ssch23/vectors.html"><STRONG>NEXT</STRONG></A> +<TR><TD align="right"><A HREF="http://mitpress.mit.edu/0262082810">MIT +Press web page for <CITE>Simply Scheme</CITE></A> +</TABLE></TABLE> + +<HR> + + +<P>We learned in Chapter 20 how to read from the keyboard and write to the +screen. The same procedures (<A NAME="g1"></A><CODE>read</CODE>, <CODE><A NAME="g2"></A><CODE>read-line</CODE></CODE>, +<A NAME="g3"></A><CODE>display</CODE>, <A NAME="g4"></A><CODE>show</CODE>, <CODE><A NAME="g5"></A><CODE>show-line</CODE></CODE>, and +<A NAME="g6"></A><CODE>newline</CODE>) can also be used to read and write <A NAME="g7"></A><A NAME="g8"></A>data files on +the disk. + +<P><H2>Ports</H2> + +<P>Imagine a complicated program that reads a little bit of data at a time from +a lot of different files. For example, soon we will write a program to +merge two files the way we merged two sentences in <CODE>mergesort</CODE> in Chapter +15. In order to make this work, each invocation of <CODE>read</CODE> must +specify which file to read from this time. Similarly, we might want to +direct output among several files. + +<P>Each of the input/output procedures can take an extra argument to specify a +file: + +<P><PRE>(show '(across the universe) file1) +(show-line '(penny lane) file2) +(read file3) +</PRE> + +<P>What are <CODE>file1</CODE> and so on? You might think that the natural +thing would be for them to be words, that is, the names of files. + +<P>It happens not to work that way. Instead, before you can use a file, you +have to <EM>open</EM> it. If you want to read a file, the system has to +check that the file exists. If you want to write a file, the system has to +create a new, empty file. The Scheme procedures that open a file return a +<EM>port,</EM> which is what Scheme uses to remember the file you opened. +Ports are useful only as arguments to the input/output procedures. Here's +an example: + +<P><PRE>> (let ((port (<A NAME="g9"></A>open-output-file "songs"))) + (<A NAME="g10"></A>show '(all my loving) port) + (show '(ticket to ride) port) + (show '(martha my dear) port) + (<A NAME="g11"></A>close-output-port port)) +</PRE> + +<P>(<CODE>Close-output-port</CODE>, like <CODE>define</CODE>, has an unspecified +return value that we're not going to include in the examples.) + +<P>We've created a file named <CODE>songs</CODE> and put three expressions, each on +its own line, in that file. Notice that nothing appeared on the screen when +we called <CODE>show</CODE>. Because we used a port argument to <CODE>show</CODE>, the +output went into the file. Here's what's in the file: + +<P><PRE>(ALL MY LOVING) +(TICKET TO RIDE) +(MARTHA MY DEAR) +</PRE> + +<P>The example illustrates two more details about using files that we haven't +mentioned before: First, the name of a file must be given in double-quote +marks. Second, when you're finished using a file, you have to <EM>close</EM> +the port associated with it. (This is very important. On some systems, +if you forget to close the port, the file disappears.) + +<P>The file is now permanent. If we were to exit from Scheme, we could read +the file in a word processor or any other program. But let's read it using +Scheme: + +<P><PRE>(define in (<A NAME="g12"></A>open-input-file "songs")) + +> (read in) +(ALL MY LOVING) + +> (read in) +(TICKET TO RIDE) + +> (read in) +(MARTHA MY DEAR) + +> (<A NAME="g13"></A>close-input-port in) +</PRE> + +<P>(In this illustration, we've used a global variable to hold the +port because we wanted to show the results of reading the file step by +step. In a real program, we'd generally use a <CODE>let</CODE> structure like the +one we used to write the file. Now that we've closed the port, the variable +<CODE>in</CODE> contains a port that can no longer be used.) + +<P><H2>Writing Files for People to Read</H2> + +<P> +A file full of sentences in parentheses is a natural representation for +information that will be used by a Scheme program, but it may seem awkward +if the file will be read by human beings. We could use <CODE>show-line</CODE> +instead of <CODE>show</CODE> to create a file, still with one song title per line, +but without the parentheses: + +<P><PRE>> (let ((port (open-output-file "songs2"))) + (show-line '(all my loving) port) + (show-line '(ticket to ride) port) + (show-line '(martha my dear) port) + (close-output-port port)) +</PRE> + +<P>The file <CODE>songs2</CODE> will contain + +<P><PRE>ALL MY LOVING +TICKET TO RIDE +MARTHA MY DEAR +</PRE> + +<P>What should we do if we want to read this file back into Scheme? We must +read the file a line at a time, with <CODE>read-line</CODE>. In effect, <CODE>read-line</CODE> treats the breaks between lines as if they were parentheses: + +<P><PRE>(define in (open-input-file "songs2")) + +> (read-line in) +(ALL MY LOVING) + +> (close-input-port in) +</PRE> + +<P>(Notice that we don't have to read the entire file before closing +the port. If we open the file again later, we start over again from the +beginning.) + +<P> +As far as Scheme is concerned, the result of writing the file with <CODE>show-line</CODE> and reading it with <CODE>read-line</CODE> was the same as that of +writing it with <CODE>show</CODE> and reading it with <CODE>read</CODE>. The difference +is that without parentheses the file itself is more "user-friendly" for +someone who reads it outside of Scheme.<A NAME="text1" HREF="files#ft1">[1]</A> + +<P><H2>Using a File as a Database</H2> + +<P>It's not very interesting merely to read the file line by line. Instead, +let's use it as a very small database in which we can look up songs by +number. (For only three songs, it would be more realistic and more +convenient to keep them in a list and look them up with <CODE>list-ref</CODE>. +Pretend that this file has 3000 songs, too many for you to want to keep +them all at once in your computer's memory.) + +<P><PRE>(define (<A NAME="g14"></A>get-song n) + (let ((port (open-input-file "songs2"))) + (skip-songs (- n 1) port) + (let ((answer (read-line port))) + (close-input-port port) + answer))) + +(define (<A NAME="g15"></A>skip-songs n port) + (if (= n 0) + 'done + (begin (read-line port) + (skip-songs (- n 1) port)))) + +> (get-song 2) +(TICKET TO RIDE) +</PRE> + +<P>When we invoke <CODE>read-line</CODE> in <CODE>skip-songs</CODE>, we pay no +attention to the value it returns. Remember that the values of all but the +last expression in a sequence are discarded. <CODE>Read</CODE> and <CODE>read-line</CODE> +are the first procedures we've seen that have both a useful return value and +a useful side effect—moving forward in the file. + +<P><CODE>Skip-songs</CODE> returns the word <CODE>done</CODE> when it's finished. We don't +do anything with that return value, and there's no particular reason why we +chose that word. But every Scheme procedure has to return <EM>something,</EM> +and this was as good as anything. + +<P>What if we asked for a song number greater than three? In other words, what +if we read beyond the end of the file? In that case, <CODE>read</CODE> will return +a special value called an <EM><A NAME="g16"></A><A NAME="g17"></A>end-of-file object.</EM> The only +useful thing to do with that value is to test for it. Our next sample +program reads an entire file and prints it to the screen: + +<P><PRE>(define (<A NAME="g18"></A>print-file name) + (let ((port (open-input-file name))) + (print-file-helper port) + (close-input-port port) + 'done)) + +(define (print-file-helper port) ;; first version + (let ((stuff (read-line port))) + (if (<A NAME="g19"></A>eof-object? stuff) + 'done + (begin (show-line stuff) + (print-file-helper port))))) + +> (print-file "songs") +ALL MY LOVING +TICKET TO RIDE +MARTHA MY DEAR +DONE +</PRE> + +<P>Did you notice that each recursive call in <CODE>print-file-helper</CODE> has +exactly the same argument as the one before? How does the problem get +smaller? (Up to now, recursive calls have involved something like the <CODE>butfirst</CODE> of an old argument, or one less than an old number.) When we're +reading a file, the sense in which the problem gets smaller at each +invocation is that we're getting closer to the end of the file. You don't +<CODE>butfirst</CODE> the port; reading it makes the unread portion of the file +smaller as a side effect. + +<P><H2>Transforming the Lines of a File</H2> + +<P>Often we want to transform a file one line at a time. That is, we want to +copy lines from an input file to an output file, but instead of copying the +lines exactly, we want each output line to be a <EM>function</EM> of the +corresponding input line. Here are some examples: We have a file full of +text and we want to <EM>justify</EM> the output so that every line is exactly +the same length, as in a book. We have a file of students' names and +grades, and we want a summary with the students' total and average scores. +We have a file with people's first and last names, and we want to rearrange +them to be last-name-first. + +<P>We'll write a procedure <A NAME="g20"></A><CODE>file-map</CODE>, analogous to <CODE>map</CODE> but for files. +It will take three arguments: The first will be a procedure whose domain and +range are sentences; the second will be the name of the input file; the +third will be the name of the output file. + +<P>Of course, this isn't exactly like the way <CODE>map</CODE> works—if it were +exactly analogous, it would take only two arguments, the procedure and the +<EM>contents</EM> of a file. But one of the important features of files is +that they let us handle amounts of information that are too big to fit all +at once in the computer's memory. Another feature is that once we write a +file, it's there permanently, until we erase it. So instead of having a +<CODE>file-map</CODE> <EM>function</EM> that returns the contents of the new file, +we have a procedure that writes its result to the disk. + +<P> +<PRE>(define (<A NAME="g21"></A>file-map fn inname outname) + (let ((inport (open-input-file inname)) + (outport (open-output-file outname))) + (file-map-helper fn inport outport) + (close-input-port inport) + (close-output-port outport) + 'done)) + +(define (<A NAME="g22"></A>file-map-helper fn inport outport) + (let ((line (read-line inport))) + (if (eof-object? line) + 'done + (begin (show-line (fn line) outport) + (file-map-helper fn inport outport))))) +</PRE> + +<P>Compare this program with the earlier <CODE>print-file</CODE> example. The two are +almost identical. One difference is that now the output goes to a file instead +of to the screen; the other is that we apply the function <CODE>fn</CODE> to each +line before doing the output. But that small change vastly increases the +generality of our program. We've performed our usual trick of generalizing a +<A NAME="g23"></A> +pattern by adding a procedure argument, and instead of a program that +carries out one specific task (printing the contents of a file), we have a +tool that can be used to create many programs. + +<P>We'll start with an easy example: putting the last name first in a file +full of names. That is, if we start with an input file named <CODE>dddbmt</CODE> +that contains + +<P><PRE>David Harmon +Trevor Davies +John Dymond +Michael Wilson +Ian Amey +</PRE> + +<P>we want the output file to contain + +<P><PRE>Harmon, David +Davies, Trevor +Dymond, John +Wilson, Michael +Amey, Ian +</PRE> + +<P>Since we are using <CODE>file-map</CODE> to handle our progress through the file, +all we have to write is a procedure that takes a sentence (one name) as its +argument and returns the same name but with the last word moved to the front +and with a comma added: +<PRE>(define (<A NAME="g24"></A>lastfirst name) + (se (word (last name) ",") (bl name))) +</PRE> + +<P>We use <CODE>butlast</CODE> rather than <CODE>first</CODE> in case someone in +the file has a middle name. + +<P>To use this procedure we call <CODE>file-map</CODE> like this: + +<P><PRE>> (file-map lastfirst "dddbmt" "dddbmt-reversed") +DONE +</PRE> + +<P> Although you don't see the results on the screen, you can + +<P><PRE>> (print-file "dddbmt-reversed") +</PRE> + +<P>to see that we got the results we wanted. + +<P>Our next example is averaging grades. Suppose the file <CODE>grades</CODE> +contains this text: + +<P><PRE>John 88 92 100 75 95 +Paul 90 91 85 80 91 +George 85 87 90 72 96 +Ringo 95 84 88 87 87 +</PRE> + +<P>The output we want is: + +<P><PRE>John total: 450 average: 90 +Paul total: 437 average: 87.4 +George total: 430 average: 86 +Ringo total: 441 average: 88.2 +</PRE> + +<P>Here's the program: + +<P><PRE>(define (<A NAME="g25"></A>process-grades line) + (se (first line) + "total:" + (accumulate + (bf line)) + "average:" + (/ (accumulate + (bf line)) + (count (bf line))))) + +> (file-map process-grades "grades" "results") +</PRE> + +<P>As before, you can + +<P><PRE>> (print-file "results") +</PRE> + +<P>to see that we got the results we wanted. + +<P> +<H2>Justifying Text</H2> + +<P>Many word-processing programs <EM>justify</EM> text; that is, they insert +extra space between words so that every line reaches exactly to the right +margin. We can do that using <CODE>file-map</CODE>. + +<P>Let's suppose we have a file <CODE>r5rs</CODE>, written in some text editor, that +looks like this: + +<P><PRE>Programming languages should be designed not by +piling feature on top of feature, but by +removing the weaknesses and restrictions that +make additional features appear necessary. +Scheme demonstrates that a very small number of +rules for forming expressions, with no +restrictions on how they are composed, suffice +to form a practical and efficient programming +language that is flexible enough to support most +of the major programming paradigms in use today. +</PRE> + +<P>(This is the first paragraph of the <EM>Revised<SUP><SMALL>5</SMALL></SUP> Report on +the Algorithmic Language Scheme,</EM> edited by <A NAME="g26"></A>William Clinger +and <A NAME="g27"></A>Jonathan Rees.) + +<P>Here is what the result should be if we justify our <CODE>r5rs</CODE> text: + +<P><PRE>Programming languages should be designed not by +piling feature on top of feature, but by +removing the weaknesses and restrictions that +make additional features appear necessary. +Scheme demonstrates that a very small number of +rules for forming expressions, with no +restrictions on how they are composed, suffice +to form a practical and efficient programming +language that is flexible enough to support most +of the major programming paradigms in use today. +</PRE> + +<P>The tricky part is that ordinarily we don't control the spaces that appear +when a sentence is printed. We just make sure the words are right, and we +get one space between words automatically. The solution used in this +program is that each line of the output file is constructed as a single long +word, including space characters that we place explicitly within it. (Since +<CODE>show-line</CODE> requires a sentence as its argument, our procedure will +actually return a one-word sentence. In the following program, <CODE>pad</CODE> +constructs the word, and <CODE>justify</CODE> makes a one-word sentence containing +it.) + +<P>This program, although short, is much harder to understand than most of +our short examples. There is no big new idea involved; instead, there are a +number of unexciting but necessary details. How many spaces between words? +Do some words get more space than others? The program structure is messy +because the problem itself is messy. Although it will be hard to read and +understand, this program is a more realistic example of input/output +programming than the cleanly structured examples we've shown until now. + +<P><CODE>Justify</CODE> takes two arguments, the line of text (a sentence) and a +number indicating the desired width (how many characters). Here's the +algorithm: First the program computes the total number of characters the +sentence would take up without adding extras. That's the job of <CODE>char-count</CODE>, which adds up the lengths of all the words, and adds to that +the <EM>n</EM>−1 spaces between words. <CODE>Extra-spaces</CODE> subtracts that length +from the desired line width to get the number of extra spaces we need. + +<P>The hard part of the job is done by <CODE>pad</CODE>. It's invoked with three +arguments: the part of the line not yet processed, the number of +opportunities there are to insert extra spaces in that part of the line +(that is, the number of words minus one), and the number of extra spaces that +the program still needs to insert. The number of extra spaces to insert +<EM>this time</EM> is the integer quotient of the number <CODE>pad</CODE> wants to +insert and the number of chances it'll have. That is, if there are five +words on the line, there are four places where <CODE>pad</CODE> can insert extra +space. If it needs to insert nine spaces altogether, then it should insert +9/4 or two spaces at the first opportunity. (Are you worried about the +remainder? It will turn out that <CODE>pad</CODE> doesn't lose any spaces because +it takes the quotient over again for each word break. The base case is that +the number of remaining word breaks (the divisor) is one, so there will be +no remainder, and all the leftover extra spaces will be inserted at the last +word break.) + +<P><PRE>(define (<A NAME="g28"></A>justify line width) + (if (< (count line) 2) + line + (se (pad line + (- (count line) 1) + (extra-spaces width (char-count line)))))) + +(define (<A NAME="g29"></A>char-count line) + (+ (accumulate + (every count line)) ; letters within words + (- (count line) 1))) ; plus spaces between words + +(define (<A NAME="g30"></A>extra-spaces width chars) + (if (> chars width) + 0 ; none if already too wide + (- width chars))) + +(define (<A NAME="g31"></A>pad line chances needed) + (if (= chances 0) ; only one word in line + (first line) + (let ((extra (quotient needed chances))) + (word (first line) + (spaces (+ extra 1)) + (pad (bf line) (- chances 1) (- needed extra)))))) + +(define (<A NAME="g32"></A>spaces n) + (if (= n 0) + "" + (word " " (spaces (- n 1))))) +</PRE> + +<P>Because <CODE>justify</CODE> takes two arguments, we have to decide what line width +we want to give it. Here's how to make each line take 50 characters: + +<P><PRE>> (file-map (lambda (sent) (justify sent 50)) "r5rs" "r5rs-just") +</PRE> + +<P><H2>Preserving Spacing of Text from Files</H2> + +<P>If we try to print the file <CODE>r5rs-just</CODE> from the previous section using +<CODE>print-file</CODE>, it'll look exactly like <CODE>r5rs</CODE>. That's because <CODE>read-line</CODE> doesn't preserve consecutive spaces in the lines that it reads. +<CODE>Read-line</CODE> cares only where each word (consisting of non-space +characters) begins and ends; it pays no attention to how many spaces come +between any two words. The lines + +<P><PRE>All My Loving +</PRE> + +<P>and + +<P><PRE>All My Loving +</PRE> + +<P>are the same, as far as <CODE>read-line</CODE> tells you. + +<P>For situations in which we do care about spacing, we have another way to +read a line from a file. The procedure <A NAME="g33"></A><CODE>read-string</CODE> reads all of the +characters on a line, returning a single word that contains all of them, +spaces included:<A NAME="text2" HREF="files#ft2">[2]</A> + +<P> +<PRE>> (define inport (open-input-file "r5rs-just")) + +> (read-string inport) +"Programming languages should be designed not by" + +> (read-string inport) +"piling feature on top of feature, but by" + +> (close-input-port inport) +</PRE> + +<P> +<P>We can use <CODE>read-string</CODE> to rewrite <CODE>print-file</CODE> so that it makes +an exact copy of the input file: + +<P><PRE>(define (<A NAME="g34"></A>print-file-helper port) + (let ((stuff (read-string port))) + (if (eof-object? stuff) + 'done + (begin (show stuff) + (print-file-helper port))))) +</PRE> + +<P>(We only had to change the helper procedure.) + +<P> +<H2>Merging Two Files</H2> + +<P>Suppose you have two files of people's names. Each file has been sorted +in alphabetical order. You want to combine them to form a single file, +still in order. (If this sounds unrealistic, it isn't. Programs that sort +very large amounts of information can't always fit it all in memory at +once, so they read in as much as fits, sort it, and write a file. Then +they read and sort another chunk. At the end of this process, the program +is left with several sorted partial files, and it has to merge those +files to get the overall result.) + +<P>The algorithm for merging files is exactly the same as the one we used for +merging sentences in the <CODE>mergesort</CODE> program of Chapter 15. The +only difference is that the items to be sorted come from reading ports +instead of from <CODE>first</CODE>ing a sentence. + +<P><PRE>(define (<A NAME="g35"></A>filemerge file1 file2 outfile) + (let ((p1 (open-input-file file1)) + (p2 (open-input-file file2)) + (outp (open-output-file outfile))) + (filemerge-helper p1 p2 outp (read-string p1) (read-string p2)) + (close-output-port outp) + (close-input-port p1) + (close-input-port p2) + 'done)) + +(define (<A NAME="g36"></A>filemerge-helper p1 p2 outp line1 line2) + (cond ((eof-object? line1) (merge-copy line2 p2 outp)) + ((eof-object? line2) (merge-copy line1 p1 outp)) + ((before? line1 line2) + (show line1 outp) + (filemerge-helper p1 p2 outp (read-string p1) line2)) + (else (show line2 outp) + (filemerge-helper p1 p2 outp line1 (read-string p2))))) + +(define (<A NAME="g37"></A>merge-copy line inp outp) + (if (eof-object? line) + #f + (begin (show line outp) + (merge-copy (read-string inp) inp outp)))) +</PRE> + +<P>You might think, comparing <CODE>filemerge-helper</CODE> with such earlier examples +as <CODE>print-file-helper</CODE> and <CODE>file-map-helper</CODE>, that it would make +more sense for <CODE>filemerge-helper</CODE> to take just the three ports as +arguments and work like this: + +<P><PRE>(define (filemerge-helper p1 p2 outp) ;; wrong + (let ((line1 (read-string p1)) + (line2 (read-string p2))) + (cond ((eof-object? line1) (merge-copy p2 outp)) + ((eof-object? line2) (merge-copy p1 outp)) + ((before? line1 line2) + (show line1 outp) + (filemerge-helper p1 p2 outp)) + (else (show line2 outp) + (filemerge-helper p1 p2 outp))))) +</PRE> + +<P>Unfortunately, this won't work. Suppose that the first line of <CODE>file2</CODE> +comes before the first line of <CODE>file1</CODE>. This program correctly writes +the first line of <CODE>file2</CODE> to the output file, as we expect. But what +about the first line of <CODE>file1</CODE>? Since we called <CODE>read-string</CODE> on +<CODE>file1</CODE>, we've "gobbled"<A NAME="text3" HREF="files#ft3">[3]</A> that line, but we're not yet ready to write it to the output. + +<P>In each invocation of <CODE>filemerge-helper</CODE>, only one line is written to +the output file, so unless we want to lose information, we'd better read +only one line. This means that we can't call <CODE>read-string</CODE> twice on each +recursive call. One of the lines has to be handed down from one invocation +to the next. (That is, it has to be an argument to the recursive call.) +Since we don't know in advance <EM>which</EM> line to keep, the easiest +solution is to hand down both lines. + +<P>Therefore, <CODE>filemerge-helper</CODE> also takes as arguments the first line of +each file that hasn't yet been written to the output. When we first call +<CODE>filemerge-helper</CODE> from <CODE>filemerge</CODE>, we read the first line of each +file to provide the initial values of these arguments. Then, on each +recursive call, <CODE>filemerge-helper</CODE> calls <CODE>read-string</CODE> only once. + +<P><H2>Writing Files for Scheme to Read</H2> + +<P>You may be thinking that the three file-reading procedures we've shown, <CODE>read</CODE>, <CODE>read-line</CODE>, and <CODE>read-string</CODE>, have been getting better and +better. <CODE>Read</CODE> ignores case and forces you to have parentheses in your +file. <CODE>Read-line</CODE> fixes those problems, but it loses spacing information. +<CODE>Read-string</CODE> can read anything and always gets it right. + +<P>But there's a cost to the generality of <CODE>read-string</CODE>; it can read any +file, but it loses <EM>structure</EM> information. For example, when we +processed a file of people's names with <CODE>file-map</CODE>, we used this +function: + +<P><PRE>(define (lastfirst name) + (se (word (last name) ",") (bl name))) +</PRE> + +<P>It's easy to break a name into its components if you have the name +in the form of a sentence, with the words separated already. But if we had +read each line with <CODE>read-string</CODE>, <CODE>last</CODE> of a line would have been +the last letter, not the last name. + +<P>The <CODE>lastfirst</CODE> example illustrates why you might want to use <CODE>read-line</CODE> rather than <CODE>read-string</CODE>: <CODE>Read-line</CODE> "understands" +spaces. Here's an example in which the even more structured <CODE>read</CODE> is +appropriate. We have a file of Beatles songs and the albums on which they +appear: + +<P><PRE>((love me do) (please please me)) +((do you want to know a secret?) (please please me)) +((think for yourself) (rubber soul)) +((your mother should know) (magical mystery tour)) +</PRE> + +<P>Each line of this file contains two pieces of information: a song +title and an album title. If each line contained only the words of the two +titles, as in + +<P><PRE>love me do please please me +</PRE> + +<P>how would we know where the song title stops and the album title +starts? The natural way to represent this grouping information is to use +the mechanism Scheme provides for grouping, namely, list structure. + +<P>If we use <CODE>read-line</CODE> to read the file, we'll lose the list structure; +it will return a sentence containing words like <CODE>"((love"</CODE>. <CODE>Read</CODE>, +however, will do what we want. + +<P>How did we create this file in the first place? We just used one <CODE>show</CODE> +per line of the file, like this: + +<P><PRE>> (show '((love me do) (please please me)) port) +</PRE> + +<P>But what about the movie soundtracks? We're going to have to come to terms +with the apostrophe in "A Hard Day's Night." + +<P>The straightforward solution is to put <CODE>day's</CODE> in a string: + +<P><PRE>(show '((and i love her) (a hard "day's" night)) port) +</PRE> + +<P>The corresponding line in the file will look like this: + +<P><PRE>((AND I LOVE HER) (A HARD day's NIGHT)) +</PRE> + +<P>This result is actually even worse than it looks, because when we +try to <CODE>read</CODE> the line back, the <CODE>'s</CODE> will be expanded into <CODE>(quote s)</CODE> in most versions of Scheme. Using a string made it possible for +us to get an apostrophe into Scheme. If the word <CODE>day's</CODE> were inside +quotation marks in the file, then <CODE>read</CODE> would understand our intentions. + +<P>Why aren't there double quotes in the file? All of the printing +procedures we've seen so far assume that whatever you're printing is +intended to be read by people. Therefore, they try to minimize +distracting notation such as double-quote marks. But, as we've +discovered, if you're writing a file to be read by Scheme, then you +do want enough notation so that Scheme can tell what the original +object was. + +<P><CODE>Write</CODE> is a printing procedure just like <CODE>display</CODE>, except that it +<A NAME="spwrite"></A> +includes quote marks around strings:<A NAME="text4" HREF="files#ft4">[4]</A> + +<P><PRE>> (write '(a hard "day's" night)) +(A HARD "day's" NIGHT) +</PRE> + +<P>Once we're using strings, and since we're not extracting individual words +from the titles, we might as well represent each title as one string: + +<P><PRE>> (write '("And I Love Her" "A Hard Day's Night") port) +</PRE> + +<P><H2>Pitfalls</H2> + +<P>One pitfall crucial to avoid when using files is that if there is an +error in your program, it might blow up and return you to the Scheme prompt +without closing the open files. If you fix the program and try to run it +again, you may get a message like "file busy" because the operating system +of your computer may not allow you to open the same file on two ports at +once. Even worse, if you exit from Scheme without closing all your ports, +on some computers you may find that you have unreadable files thereafter. + +<P>To help cope with this problem, we've provided a procedure <CODE><A NAME="g39"></A><CODE>close-all-ports</CODE></CODE> that can be invoked to close every port that you've +opened since starting Scheme. This procedure works only in our modified +Scheme, but it can help you out of trouble while you're learning. + +<P>Be sure you don't open or close a file within a recursive procedure, +if you intend to do it only once. That's why most of the programs in +this chapter have the structure of a procedure that opens files, calls a +recursive helper, and then closes the files. + +<P>As we explained in the <CODE>filemerge</CODE> example, you can't read the same +line twice. Be sure your program remembers each line in a variable as long +as it's needed. + +<P><H2>Exercises</H2> + +<P><B>22.1</B> Write a <CODE><A NAME="g40"></A>concatenate</CODE> procedure that takes two arguments: a list +of names of input files, and one name for an output file. The procedure +should copy all of the input files, in order, into the output file. + + +<P> +<B>22.2</B> Write a procedure to count the number of lines in a file. It should +take the filename as argument and return the number. + + +<P> +<B>22.3</B> Write a procedure to count the number of words in a file. It should +take the filename as argument and return the number. + + +<P> +<B>22.4</B> Write a procedure to count the number of characters in a file, including +space characters. It should take the filename as argument and return the +number. + + +<P> +<B>22.5</B> Write a procedure that copies an input file to an output file but +eliminates multiple consecutive copies of the same line. That is, if +the input file contains the lines + +<P><PRE>John Lennon +Paul McCartney +Paul McCartney +George Harrison + + +Paul McCartney +Ringo Starr +</PRE> + +<P>then the output file should contain + +<P><PRE>John Lennon +Paul McCartney +George Harrison + +Paul McCartney +Ringo Starr +</PRE> + +<P> +<B>22.6</B> Write a <CODE><A NAME="g41"></A>lookup</CODE> procedure that takes as arguments a filename and +a word. The procedure should print (on the screen, not into another file) +only those lines from the input file that include the chosen word. + + +<P> +<B>22.7</B> Write a <CODE><A NAME="g42"></A>page</CODE> procedure that takes a filename as argument and +prints the file a screenful at a time. Assume that a screen can fit 24 +lines; your procedure should print 23 lines of the file and then a prompt +message, and then wait for the user to enter a (probably empty) line. It +should then print the most recent line from the file again (so that the user +will see some overlap between screenfuls) and 22 more lines, and so on until +the file ends. + + +<P> +<B>22.8</B> A common operation in a database program is to <EM>join</EM> two +databases, that is, to create a new database combining the information from the +two given ones. There has to be some piece of information in common between +the two databases. For example, suppose we have a class roster database in +which each record includes a student's name, student ID number, and computer +account name, like this: +<A NAME="join"></A> + +<P><PRE>((john alec entwistle) 04397 john) +((keith moon) 09382 kmoon) +((peter townshend) 10428 pete) +((roger daltrey) 01025 roger) +</PRE> + +<P>We also have a grade database in which each student's grades +are stored according to computer account name: + +<P><PRE>(john 87 90 76 68 95) +(kmoon 80 88 95 77 89) +(pete 100 92 80 65 72) +(roger 85 96 83 62 74) +</PRE> + +<P>We want to create a combined database like this: + +<P><PRE>((john alec entwistle) 04397 john 87 90 76 68 95) +((keith moon) 09382 kmoon 80 88 95 77 89) +((peter townshend) 10428 pete 100 92 80 65 72) +((roger daltrey) 01025 roger 85 96 83 62 74) +</PRE> + +<P>in which the information from the roster and grade databases has +been combined for each account name. + +<P>Write a program <CODE><A NAME="g43"></A>join</CODE> that takes five arguments: two input +filenames, two numbers indicating the position of the item within each +record that should overlap between the files, and an output filename. For +our example, we'd say + +<P><PRE>> (join "class-roster" "grades" 3 1 "combined-file") +</PRE> + +<P>In our example, both files are in alphabetical order of computer account +name, the account name is a word, and the same account name never appears +more than once in each file. In general, you may assume that these +conditions hold for the item that the two files have in common. Your +program should <EM>not</EM> assume that every item in one file also appears +in the other. A line should be written in the output file only for the +items that do appear in both files. + +<P> + +<HR> +<A NAME="ft1" HREF="files#text1">[1]</A> Another difference, not +apparent in this example, is that <CODE>show</CODE> and <CODE>read</CODE> can handle +structured lists. <CODE>Show-line</CODE> can print a structured list, leaving off +only the outermost parentheses, but <CODE>read-line</CODE> will treat any +parentheses in the file as ordinary characters; it always returns a +sentence.<P> +<A NAME="ft2" HREF="files#text2">[2]</A> Like all the input and output primitives, <CODE>read-string</CODE> can be invoked with or without a port argument.<P> +<A NAME="ft3" HREF="files#text3">[3]</A> Computer programmers really talk +this way.<P> +<A NAME="ft4" HREF="files#text4">[4]</A> There are other kinds of data +that <A NAME="g38"></A><CODE>write</CODE> prints differently from <CODE>display</CODE>, but we don't +use them in +this book. The general rule is that <CODE>display</CODE> formats the output for +human readers, while <CODE>write</CODE> ensures that Scheme can reread the +information unambiguously. <CODE>Show</CODE> and <CODE>show-line</CODE> are extensions +that we wrote using <CODE>display</CODE>. We could have written <CODE>show-in-write-format</CODE>, +for example, but happened not to need it.<P> +<P><A HREF="../ss-toc2.html">(back to Table of Contents)</A><P> +<A HREF="../ssch21/functions-implement.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../ssch23/vectors.html"><STRONG>NEXT</STRONG></A> + +<P> +<ADDRESS> +<A HREF="../index.html">Brian Harvey</A>, +<CODE>bh@cs.berkeley.edu</CODE> +</ADDRESS> +</BODY> +</HTML> diff --git a/js/games/nluqo.github.io/~bh/ssch22/files.html b/js/games/nluqo.github.io/~bh/ssch22/files.html new file mode 100644 index 0000000..8e85ab1 --- /dev/null +++ b/js/games/nluqo.github.io/~bh/ssch22/files.html @@ -0,0 +1,875 @@ +<P> + +<P><A NAME="cabinets"></A> +<P><CENTER><IMG SRC="../ss-pics/files.jpg" ALT="figure: files"></CENTER> +<HTML> +<HEAD> +<TITLE>Simply Scheme: Introducing Computer Science ch 22: Files</TITLE> +</HEAD> +<BODY> +<HR> +<CITE>Simply Scheme:</CITE> +<CITE>Introducing Computer Science</CITE> 2/e Copyright (C) 1999 MIT +<H2>Chapter 22</H2> +<H1>Files</H1> + +<TABLE width="100%"><TR><TD> +<IMG SRC="../simply.jpg" ALT="cover photo"> +<TD><TABLE> +<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian +Harvey</A><BR>University of California, Berkeley</CITE> +<TR><TD align="right"><CITE><A HREF="http://ccrma.stanford.edu/~matt">Matthew +Wright</A><BR>University of California, Santa Barbara</CITE> +<TR><TD align="right"><BR> +<TR><TD align="right"><A HREF="../pdf/ssch22.pdf">Download PDF version</A> +<TR><TD align="right"><A HREF="../ss-toc2.html">Back to Table of Contents</A> +<TR><TD align="right"><A HREF="../ssch21/functions-implement.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../ssch23/vectors.html"><STRONG>NEXT</STRONG></A> +<TR><TD align="right"><A HREF="http://mitpress.mit.edu/0262082810">MIT +Press web page for <CITE>Simply Scheme</CITE></A> +</TABLE></TABLE> + +<HR> + + +<P>We learned in Chapter 20 how to read from the keyboard and write to the +screen. The same procedures (<A NAME="g1"></A><CODE>read</CODE>, <CODE><A NAME="g2"></A><CODE>read-line</CODE></CODE>, +<A NAME="g3"></A><CODE>display</CODE>, <A NAME="g4"></A><CODE>show</CODE>, <CODE><A NAME="g5"></A><CODE>show-line</CODE></CODE>, and +<A NAME="g6"></A><CODE>newline</CODE>) can also be used to read and write <A NAME="g7"></A><A NAME="g8"></A>data files on +the disk. + +<P><H2>Ports</H2> + +<P>Imagine a complicated program that reads a little bit of data at a time from +a lot of different files. For example, soon we will write a program to +merge two files the way we merged two sentences in <CODE>mergesort</CODE> in Chapter +15. In order to make this work, each invocation of <CODE>read</CODE> must +specify which file to read from this time. Similarly, we might want to +direct output among several files. + +<P>Each of the input/output procedures can take an extra argument to specify a +file: + +<P><PRE>(show '(across the universe) file1) +(show-line '(penny lane) file2) +(read file3) +</PRE> + +<P>What are <CODE>file1</CODE> and so on? You might think that the natural +thing would be for them to be words, that is, the names of files. + +<P>It happens not to work that way. Instead, before you can use a file, you +have to <EM>open</EM> it. If you want to read a file, the system has to +check that the file exists. If you want to write a file, the system has to +create a new, empty file. The Scheme procedures that open a file return a +<EM>port,</EM> which is what Scheme uses to remember the file you opened. +Ports are useful only as arguments to the input/output procedures. Here's +an example: + +<P><PRE>> (let ((port (<A NAME="g9"></A>open-output-file "songs"))) + (<A NAME="g10"></A>show '(all my loving) port) + (show '(ticket to ride) port) + (show '(martha my dear) port) + (<A NAME="g11"></A>close-output-port port)) +</PRE> + +<P>(<CODE>Close-output-port</CODE>, like <CODE>define</CODE>, has an unspecified +return value that we're not going to include in the examples.) + +<P>We've created a file named <CODE>songs</CODE> and put three expressions, each on +its own line, in that file. Notice that nothing appeared on the screen when +we called <CODE>show</CODE>. Because we used a port argument to <CODE>show</CODE>, the +output went into the file. Here's what's in the file: + +<P><PRE>(ALL MY LOVING) +(TICKET TO RIDE) +(MARTHA MY DEAR) +</PRE> + +<P>The example illustrates two more details about using files that we haven't +mentioned before: First, the name of a file must be given in double-quote +marks. Second, when you're finished using a file, you have to <EM>close</EM> +the port associated with it. (This is very important. On some systems, +if you forget to close the port, the file disappears.) + +<P>The file is now permanent. If we were to exit from Scheme, we could read +the file in a word processor or any other program. But let's read it using +Scheme: + +<P><PRE>(define in (<A NAME="g12"></A>open-input-file "songs")) + +> (read in) +(ALL MY LOVING) + +> (read in) +(TICKET TO RIDE) + +> (read in) +(MARTHA MY DEAR) + +> (<A NAME="g13"></A>close-input-port in) +</PRE> + +<P>(In this illustration, we've used a global variable to hold the +port because we wanted to show the results of reading the file step by +step. In a real program, we'd generally use a <CODE>let</CODE> structure like the +one we used to write the file. Now that we've closed the port, the variable +<CODE>in</CODE> contains a port that can no longer be used.) + +<P><H2>Writing Files for People to Read</H2> + +<P> +A file full of sentences in parentheses is a natural representation for +information that will be used by a Scheme program, but it may seem awkward +if the file will be read by human beings. We could use <CODE>show-line</CODE> +instead of <CODE>show</CODE> to create a file, still with one song title per line, +but without the parentheses: + +<P><PRE>> (let ((port (open-output-file "songs2"))) + (show-line '(all my loving) port) + (show-line '(ticket to ride) port) + (show-line '(martha my dear) port) + (close-output-port port)) +</PRE> + +<P>The file <CODE>songs2</CODE> will contain + +<P><PRE>ALL MY LOVING +TICKET TO RIDE +MARTHA MY DEAR +</PRE> + +<P>What should we do if we want to read this file back into Scheme? We must +read the file a line at a time, with <CODE>read-line</CODE>. In effect, <CODE>read-line</CODE> treats the breaks between lines as if they were parentheses: + +<P><PRE>(define in (open-input-file "songs2")) + +> (read-line in) +(ALL MY LOVING) + +> (close-input-port in) +</PRE> + +<P>(Notice that we don't have to read the entire file before closing +the port. If we open the file again later, we start over again from the +beginning.) + +<P> +As far as Scheme is concerned, the result of writing the file with <CODE>show-line</CODE> and reading it with <CODE>read-line</CODE> was the same as that of +writing it with <CODE>show</CODE> and reading it with <CODE>read</CODE>. The difference +is that without parentheses the file itself is more "user-friendly" for +someone who reads it outside of Scheme.<A NAME="text1" HREF="files.html#ft1">[1]</A> + +<P><H2>Using a File as a Database</H2> + +<P>It's not very interesting merely to read the file line by line. Instead, +let's use it as a very small database in which we can look up songs by +number. (For only three songs, it would be more realistic and more +convenient to keep them in a list and look them up with <CODE>list-ref</CODE>. +Pretend that this file has 3000 songs, too many for you to want to keep +them all at once in your computer's memory.) + +<P><PRE>(define (<A NAME="g14"></A>get-song n) + (let ((port (open-input-file "songs2"))) + (skip-songs (- n 1) port) + (let ((answer (read-line port))) + (close-input-port port) + answer))) + +(define (<A NAME="g15"></A>skip-songs n port) + (if (= n 0) + 'done + (begin (read-line port) + (skip-songs (- n 1) port)))) + +> (get-song 2) +(TICKET TO RIDE) +</PRE> + +<P>When we invoke <CODE>read-line</CODE> in <CODE>skip-songs</CODE>, we pay no +attention to the value it returns. Remember that the values of all but the +last expression in a sequence are discarded. <CODE>Read</CODE> and <CODE>read-line</CODE> +are the first procedures we've seen that have both a useful return value and +a useful side effect—moving forward in the file. + +<P><CODE>Skip-songs</CODE> returns the word <CODE>done</CODE> when it's finished. We don't +do anything with that return value, and there's no particular reason why we +chose that word. But every Scheme procedure has to return <EM>something,</EM> +and this was as good as anything. + +<P>What if we asked for a song number greater than three? In other words, what +if we read beyond the end of the file? In that case, <CODE>read</CODE> will return +a special value called an <EM><A NAME="g16"></A><A NAME="g17"></A>end-of-file object.</EM> The only +useful thing to do with that value is to test for it. Our next sample +program reads an entire file and prints it to the screen: + +<P><PRE>(define (<A NAME="g18"></A>print-file name) + (let ((port (open-input-file name))) + (print-file-helper port) + (close-input-port port) + 'done)) + +(define (print-file-helper port) ;; first version + (let ((stuff (read-line port))) + (if (<A NAME="g19"></A>eof-object? stuff) + 'done + (begin (show-line stuff) + (print-file-helper port))))) + +> (print-file "songs") +ALL MY LOVING +TICKET TO RIDE +MARTHA MY DEAR +DONE +</PRE> + +<P>Did you notice that each recursive call in <CODE>print-file-helper</CODE> has +exactly the same argument as the one before? How does the problem get +smaller? (Up to now, recursive calls have involved something like the <CODE>butfirst</CODE> of an old argument, or one less than an old number.) When we're +reading a file, the sense in which the problem gets smaller at each +invocation is that we're getting closer to the end of the file. You don't +<CODE>butfirst</CODE> the port; reading it makes the unread portion of the file +smaller as a side effect. + +<P><H2>Transforming the Lines of a File</H2> + +<P>Often we want to transform a file one line at a time. That is, we want to +copy lines from an input file to an output file, but instead of copying the +lines exactly, we want each output line to be a <EM>function</EM> of the +corresponding input line. Here are some examples: We have a file full of +text and we want to <EM>justify</EM> the output so that every line is exactly +the same length, as in a book. We have a file of students' names and +grades, and we want a summary with the students' total and average scores. +We have a file with people's first and last names, and we want to rearrange +them to be last-name-first. + +<P>We'll write a procedure <A NAME="g20"></A><CODE>file-map</CODE>, analogous to <CODE>map</CODE> but for files. +It will take three arguments: The first will be a procedure whose domain and +range are sentences; the second will be the name of the input file; the +third will be the name of the output file. + +<P>Of course, this isn't exactly like the way <CODE>map</CODE> works—if it were +exactly analogous, it would take only two arguments, the procedure and the +<EM>contents</EM> of a file. But one of the important features of files is +that they let us handle amounts of information that are too big to fit all +at once in the computer's memory. Another feature is that once we write a +file, it's there permanently, until we erase it. So instead of having a +<CODE>file-map</CODE> <EM>function</EM> that returns the contents of the new file, +we have a procedure that writes its result to the disk. + +<P> +<PRE>(define (<A NAME="g21"></A>file-map fn inname outname) + (let ((inport (open-input-file inname)) + (outport (open-output-file outname))) + (file-map-helper fn inport outport) + (close-input-port inport) + (close-output-port outport) + 'done)) + +(define (<A NAME="g22"></A>file-map-helper fn inport outport) + (let ((line (read-line inport))) + (if (eof-object? line) + 'done + (begin (show-line (fn line) outport) + (file-map-helper fn inport outport))))) +</PRE> + +<P>Compare this program with the earlier <CODE>print-file</CODE> example. The two are +almost identical. One difference is that now the output goes to a file instead +of to the screen; the other is that we apply the function <CODE>fn</CODE> to each +line before doing the output. But that small change vastly increases the +generality of our program. We've performed our usual trick of generalizing a +<A NAME="g23"></A> +pattern by adding a procedure argument, and instead of a program that +carries out one specific task (printing the contents of a file), we have a +tool that can be used to create many programs. + +<P>We'll start with an easy example: putting the last name first in a file +full of names. That is, if we start with an input file named <CODE>dddbmt</CODE> +that contains + +<P><PRE>David Harmon +Trevor Davies +John Dymond +Michael Wilson +Ian Amey +</PRE> + +<P>we want the output file to contain + +<P><PRE>Harmon, David +Davies, Trevor +Dymond, John +Wilson, Michael +Amey, Ian +</PRE> + +<P>Since we are using <CODE>file-map</CODE> to handle our progress through the file, +all we have to write is a procedure that takes a sentence (one name) as its +argument and returns the same name but with the last word moved to the front +and with a comma added: +<PRE>(define (<A NAME="g24"></A>lastfirst name) + (se (word (last name) ",") (bl name))) +</PRE> + +<P>We use <CODE>butlast</CODE> rather than <CODE>first</CODE> in case someone in +the file has a middle name. + +<P>To use this procedure we call <CODE>file-map</CODE> like this: + +<P><PRE>> (file-map lastfirst "dddbmt" "dddbmt-reversed") +DONE +</PRE> + +<P> Although you don't see the results on the screen, you can + +<P><PRE>> (print-file "dddbmt-reversed") +</PRE> + +<P>to see that we got the results we wanted. + +<P>Our next example is averaging grades. Suppose the file <CODE>grades</CODE> +contains this text: + +<P><PRE>John 88 92 100 75 95 +Paul 90 91 85 80 91 +George 85 87 90 72 96 +Ringo 95 84 88 87 87 +</PRE> + +<P>The output we want is: + +<P><PRE>John total: 450 average: 90 +Paul total: 437 average: 87.4 +George total: 430 average: 86 +Ringo total: 441 average: 88.2 +</PRE> + +<P>Here's the program: + +<P><PRE>(define (<A NAME="g25"></A>process-grades line) + (se (first line) + "total:" + (accumulate + (bf line)) + "average:" + (/ (accumulate + (bf line)) + (count (bf line))))) + +> (file-map process-grades "grades" "results") +</PRE> + +<P>As before, you can + +<P><PRE>> (print-file "results") +</PRE> + +<P>to see that we got the results we wanted. + +<P> +<H2>Justifying Text</H2> + +<P>Many word-processing programs <EM>justify</EM> text; that is, they insert +extra space between words so that every line reaches exactly to the right +margin. We can do that using <CODE>file-map</CODE>. + +<P>Let's suppose we have a file <CODE>r5rs</CODE>, written in some text editor, that +looks like this: + +<P><PRE>Programming languages should be designed not by +piling feature on top of feature, but by +removing the weaknesses and restrictions that +make additional features appear necessary. +Scheme demonstrates that a very small number of +rules for forming expressions, with no +restrictions on how they are composed, suffice +to form a practical and efficient programming +language that is flexible enough to support most +of the major programming paradigms in use today. +</PRE> + +<P>(This is the first paragraph of the <EM>Revised<SUP><SMALL>5</SMALL></SUP> Report on +the Algorithmic Language Scheme,</EM> edited by <A NAME="g26"></A>William Clinger +and <A NAME="g27"></A>Jonathan Rees.) + +<P>Here is what the result should be if we justify our <CODE>r5rs</CODE> text: + +<P><PRE>Programming languages should be designed not by +piling feature on top of feature, but by +removing the weaknesses and restrictions that +make additional features appear necessary. +Scheme demonstrates that a very small number of +rules for forming expressions, with no +restrictions on how they are composed, suffice +to form a practical and efficient programming +language that is flexible enough to support most +of the major programming paradigms in use today. +</PRE> + +<P>The tricky part is that ordinarily we don't control the spaces that appear +when a sentence is printed. We just make sure the words are right, and we +get one space between words automatically. The solution used in this +program is that each line of the output file is constructed as a single long +word, including space characters that we place explicitly within it. (Since +<CODE>show-line</CODE> requires a sentence as its argument, our procedure will +actually return a one-word sentence. In the following program, <CODE>pad</CODE> +constructs the word, and <CODE>justify</CODE> makes a one-word sentence containing +it.) + +<P>This program, although short, is much harder to understand than most of +our short examples. There is no big new idea involved; instead, there are a +number of unexciting but necessary details. How many spaces between words? +Do some words get more space than others? The program structure is messy +because the problem itself is messy. Although it will be hard to read and +understand, this program is a more realistic example of input/output +programming than the cleanly structured examples we've shown until now. + +<P><CODE>Justify</CODE> takes two arguments, the line of text (a sentence) and a +number indicating the desired width (how many characters). Here's the +algorithm: First the program computes the total number of characters the +sentence would take up without adding extras. That's the job of <CODE>char-count</CODE>, which adds up the lengths of all the words, and adds to that +the <EM>n</EM>−1 spaces between words. <CODE>Extra-spaces</CODE> subtracts that length +from the desired line width to get the number of extra spaces we need. + +<P>The hard part of the job is done by <CODE>pad</CODE>. It's invoked with three +arguments: the part of the line not yet processed, the number of +opportunities there are to insert extra spaces in that part of the line +(that is, the number of words minus one), and the number of extra spaces that +the program still needs to insert. The number of extra spaces to insert +<EM>this time</EM> is the integer quotient of the number <CODE>pad</CODE> wants to +insert and the number of chances it'll have. That is, if there are five +words on the line, there are four places where <CODE>pad</CODE> can insert extra +space. If it needs to insert nine spaces altogether, then it should insert +9/4 or two spaces at the first opportunity. (Are you worried about the +remainder? It will turn out that <CODE>pad</CODE> doesn't lose any spaces because +it takes the quotient over again for each word break. The base case is that +the number of remaining word breaks (the divisor) is one, so there will be +no remainder, and all the leftover extra spaces will be inserted at the last +word break.) + +<P><PRE>(define (<A NAME="g28"></A>justify line width) + (if (< (count line) 2) + line + (se (pad line + (- (count line) 1) + (extra-spaces width (char-count line)))))) + +(define (<A NAME="g29"></A>char-count line) + (+ (accumulate + (every count line)) ; letters within words + (- (count line) 1))) ; plus spaces between words + +(define (<A NAME="g30"></A>extra-spaces width chars) + (if (> chars width) + 0 ; none if already too wide + (- width chars))) + +(define (<A NAME="g31"></A>pad line chances needed) + (if (= chances 0) ; only one word in line + (first line) + (let ((extra (quotient needed chances))) + (word (first line) + (spaces (+ extra 1)) + (pad (bf line) (- chances 1) (- needed extra)))))) + +(define (<A NAME="g32"></A>spaces n) + (if (= n 0) + "" + (word " " (spaces (- n 1))))) +</PRE> + +<P>Because <CODE>justify</CODE> takes two arguments, we have to decide what line width +we want to give it. Here's how to make each line take 50 characters: + +<P><PRE>> (file-map (lambda (sent) (justify sent 50)) "r5rs" "r5rs-just") +</PRE> + +<P><H2>Preserving Spacing of Text from Files</H2> + +<P>If we try to print the file <CODE>r5rs-just</CODE> from the previous section using +<CODE>print-file</CODE>, it'll look exactly like <CODE>r5rs</CODE>. That's because <CODE>read-line</CODE> doesn't preserve consecutive spaces in the lines that it reads. +<CODE>Read-line</CODE> cares only where each word (consisting of non-space +characters) begins and ends; it pays no attention to how many spaces come +between any two words. The lines + +<P><PRE>All My Loving +</PRE> + +<P>and + +<P><PRE>All My Loving +</PRE> + +<P>are the same, as far as <CODE>read-line</CODE> tells you. + +<P>For situations in which we do care about spacing, we have another way to +read a line from a file. The procedure <A NAME="g33"></A><CODE>read-string</CODE> reads all of the +characters on a line, returning a single word that contains all of them, +spaces included:<A NAME="text2" HREF="files.html#ft2">[2]</A> + +<P> +<PRE>> (define inport (open-input-file "r5rs-just")) + +> (read-string inport) +"Programming languages should be designed not by" + +> (read-string inport) +"piling feature on top of feature, but by" + +> (close-input-port inport) +</PRE> + +<P> +<P>We can use <CODE>read-string</CODE> to rewrite <CODE>print-file</CODE> so that it makes +an exact copy of the input file: + +<P><PRE>(define (<A NAME="g34"></A>print-file-helper port) + (let ((stuff (read-string port))) + (if (eof-object? stuff) + 'done + (begin (show stuff) + (print-file-helper port))))) +</PRE> + +<P>(We only had to change the helper procedure.) + +<P> +<H2>Merging Two Files</H2> + +<P>Suppose you have two files of people's names. Each file has been sorted +in alphabetical order. You want to combine them to form a single file, +still in order. (If this sounds unrealistic, it isn't. Programs that sort +very large amounts of information can't always fit it all in memory at +once, so they read in as much as fits, sort it, and write a file. Then +they read and sort another chunk. At the end of this process, the program +is left with several sorted partial files, and it has to merge those +files to get the overall result.) + +<P>The algorithm for merging files is exactly the same as the one we used for +merging sentences in the <CODE>mergesort</CODE> program of Chapter 15. The +only difference is that the items to be sorted come from reading ports +instead of from <CODE>first</CODE>ing a sentence. + +<P><PRE>(define (<A NAME="g35"></A>filemerge file1 file2 outfile) + (let ((p1 (open-input-file file1)) + (p2 (open-input-file file2)) + (outp (open-output-file outfile))) + (filemerge-helper p1 p2 outp (read-string p1) (read-string p2)) + (close-output-port outp) + (close-input-port p1) + (close-input-port p2) + 'done)) + +(define (<A NAME="g36"></A>filemerge-helper p1 p2 outp line1 line2) + (cond ((eof-object? line1) (merge-copy line2 p2 outp)) + ((eof-object? line2) (merge-copy line1 p1 outp)) + ((before? line1 line2) + (show line1 outp) + (filemerge-helper p1 p2 outp (read-string p1) line2)) + (else (show line2 outp) + (filemerge-helper p1 p2 outp line1 (read-string p2))))) + +(define (<A NAME="g37"></A>merge-copy line inp outp) + (if (eof-object? line) + #f + (begin (show line outp) + (merge-copy (read-string inp) inp outp)))) +</PRE> + +<P>You might think, comparing <CODE>filemerge-helper</CODE> with such earlier examples +as <CODE>print-file-helper</CODE> and <CODE>file-map-helper</CODE>, that it would make +more sense for <CODE>filemerge-helper</CODE> to take just the three ports as +arguments and work like this: + +<P><PRE>(define (filemerge-helper p1 p2 outp) ;; wrong + (let ((line1 (read-string p1)) + (line2 (read-string p2))) + (cond ((eof-object? line1) (merge-copy p2 outp)) + ((eof-object? line2) (merge-copy p1 outp)) + ((before? line1 line2) + (show line1 outp) + (filemerge-helper p1 p2 outp)) + (else (show line2 outp) + (filemerge-helper p1 p2 outp))))) +</PRE> + +<P>Unfortunately, this won't work. Suppose that the first line of <CODE>file2</CODE> +comes before the first line of <CODE>file1</CODE>. This program correctly writes +the first line of <CODE>file2</CODE> to the output file, as we expect. But what +about the first line of <CODE>file1</CODE>? Since we called <CODE>read-string</CODE> on +<CODE>file1</CODE>, we've "gobbled"<A NAME="text3" HREF="files.html#ft3">[3]</A> that line, but we're not yet ready to write it to the output. + +<P>In each invocation of <CODE>filemerge-helper</CODE>, only one line is written to +the output file, so unless we want to lose information, we'd better read +only one line. This means that we can't call <CODE>read-string</CODE> twice on each +recursive call. One of the lines has to be handed down from one invocation +to the next. (That is, it has to be an argument to the recursive call.) +Since we don't know in advance <EM>which</EM> line to keep, the easiest +solution is to hand down both lines. + +<P>Therefore, <CODE>filemerge-helper</CODE> also takes as arguments the first line of +each file that hasn't yet been written to the output. When we first call +<CODE>filemerge-helper</CODE> from <CODE>filemerge</CODE>, we read the first line of each +file to provide the initial values of these arguments. Then, on each +recursive call, <CODE>filemerge-helper</CODE> calls <CODE>read-string</CODE> only once. + +<P><H2>Writing Files for Scheme to Read</H2> + +<P>You may be thinking that the three file-reading procedures we've shown, <CODE>read</CODE>, <CODE>read-line</CODE>, and <CODE>read-string</CODE>, have been getting better and +better. <CODE>Read</CODE> ignores case and forces you to have parentheses in your +file. <CODE>Read-line</CODE> fixes those problems, but it loses spacing information. +<CODE>Read-string</CODE> can read anything and always gets it right. + +<P>But there's a cost to the generality of <CODE>read-string</CODE>; it can read any +file, but it loses <EM>structure</EM> information. For example, when we +processed a file of people's names with <CODE>file-map</CODE>, we used this +function: + +<P><PRE>(define (lastfirst name) + (se (word (last name) ",") (bl name))) +</PRE> + +<P>It's easy to break a name into its components if you have the name +in the form of a sentence, with the words separated already. But if we had +read each line with <CODE>read-string</CODE>, <CODE>last</CODE> of a line would have been +the last letter, not the last name. + +<P>The <CODE>lastfirst</CODE> example illustrates why you might want to use <CODE>read-line</CODE> rather than <CODE>read-string</CODE>: <CODE>Read-line</CODE> "understands" +spaces. Here's an example in which the even more structured <CODE>read</CODE> is +appropriate. We have a file of Beatles songs and the albums on which they +appear: + +<P><PRE>((love me do) (please please me)) +((do you want to know a secret?) (please please me)) +((think for yourself) (rubber soul)) +((your mother should know) (magical mystery tour)) +</PRE> + +<P>Each line of this file contains two pieces of information: a song +title and an album title. If each line contained only the words of the two +titles, as in + +<P><PRE>love me do please please me +</PRE> + +<P>how would we know where the song title stops and the album title +starts? The natural way to represent this grouping information is to use +the mechanism Scheme provides for grouping, namely, list structure. + +<P>If we use <CODE>read-line</CODE> to read the file, we'll lose the list structure; +it will return a sentence containing words like <CODE>"((love"</CODE>. <CODE>Read</CODE>, +however, will do what we want. + +<P>How did we create this file in the first place? We just used one <CODE>show</CODE> +per line of the file, like this: + +<P><PRE>> (show '((love me do) (please please me)) port) +</PRE> + +<P>But what about the movie soundtracks? We're going to have to come to terms +with the apostrophe in "A Hard Day's Night." + +<P>The straightforward solution is to put <CODE>day's</CODE> in a string: + +<P><PRE>(show '((and i love her) (a hard "day's" night)) port) +</PRE> + +<P>The corresponding line in the file will look like this: + +<P><PRE>((AND I LOVE HER) (A HARD day's NIGHT)) +</PRE> + +<P>This result is actually even worse than it looks, because when we +try to <CODE>read</CODE> the line back, the <CODE>'s</CODE> will be expanded into <CODE>(quote s)</CODE> in most versions of Scheme. Using a string made it possible for +us to get an apostrophe into Scheme. If the word <CODE>day's</CODE> were inside +quotation marks in the file, then <CODE>read</CODE> would understand our intentions. + +<P>Why aren't there double quotes in the file? All of the printing +procedures we've seen so far assume that whatever you're printing is +intended to be read by people. Therefore, they try to minimize +distracting notation such as double-quote marks. But, as we've +discovered, if you're writing a file to be read by Scheme, then you +do want enough notation so that Scheme can tell what the original +object was. + +<P><CODE>Write</CODE> is a printing procedure just like <CODE>display</CODE>, except that it +<A NAME="spwrite"></A> +includes quote marks around strings:<A NAME="text4" HREF="files.html#ft4">[4]</A> + +<P><PRE>> (write '(a hard "day's" night)) +(A HARD "day's" NIGHT) +</PRE> + +<P>Once we're using strings, and since we're not extracting individual words +from the titles, we might as well represent each title as one string: + +<P><PRE>> (write '("And I Love Her" "A Hard Day's Night") port) +</PRE> + +<P><H2>Pitfalls</H2> + +<P>One pitfall crucial to avoid when using files is that if there is an +error in your program, it might blow up and return you to the Scheme prompt +without closing the open files. If you fix the program and try to run it +again, you may get a message like "file busy" because the operating system +of your computer may not allow you to open the same file on two ports at +once. Even worse, if you exit from Scheme without closing all your ports, +on some computers you may find that you have unreadable files thereafter. + +<P>To help cope with this problem, we've provided a procedure <CODE><A NAME="g39"></A><CODE>close-all-ports</CODE></CODE> that can be invoked to close every port that you've +opened since starting Scheme. This procedure works only in our modified +Scheme, but it can help you out of trouble while you're learning. + +<P>Be sure you don't open or close a file within a recursive procedure, +if you intend to do it only once. That's why most of the programs in +this chapter have the structure of a procedure that opens files, calls a +recursive helper, and then closes the files. + +<P>As we explained in the <CODE>filemerge</CODE> example, you can't read the same +line twice. Be sure your program remembers each line in a variable as long +as it's needed. + +<P><H2>Exercises</H2> + +<P><B>22.1</B> Write a <CODE><A NAME="g40"></A>concatenate</CODE> procedure that takes two arguments: a list +of names of input files, and one name for an output file. The procedure +should copy all of the input files, in order, into the output file. + + +<P> +<B>22.2</B> Write a procedure to count the number of lines in a file. It should +take the filename as argument and return the number. + + +<P> +<B>22.3</B> Write a procedure to count the number of words in a file. It should +take the filename as argument and return the number. + + +<P> +<B>22.4</B> Write a procedure to count the number of characters in a file, including +space characters. It should take the filename as argument and return the +number. + + +<P> +<B>22.5</B> Write a procedure that copies an input file to an output file but +eliminates multiple consecutive copies of the same line. That is, if +the input file contains the lines + +<P><PRE>John Lennon +Paul McCartney +Paul McCartney +George Harrison + + +Paul McCartney +Ringo Starr +</PRE> + +<P>then the output file should contain + +<P><PRE>John Lennon +Paul McCartney +George Harrison + +Paul McCartney +Ringo Starr +</PRE> + +<P> +<B>22.6</B> Write a <CODE><A NAME="g41"></A>lookup</CODE> procedure that takes as arguments a filename and +a word. The procedure should print (on the screen, not into another file) +only those lines from the input file that include the chosen word. + + +<P> +<B>22.7</B> Write a <CODE><A NAME="g42"></A>page</CODE> procedure that takes a filename as argument and +prints the file a screenful at a time. Assume that a screen can fit 24 +lines; your procedure should print 23 lines of the file and then a prompt +message, and then wait for the user to enter a (probably empty) line. It +should then print the most recent line from the file again (so that the user +will see some overlap between screenfuls) and 22 more lines, and so on until +the file ends. + + +<P> +<B>22.8</B> A common operation in a database program is to <EM>join</EM> two +databases, that is, to create a new database combining the information from the +two given ones. There has to be some piece of information in common between +the two databases. For example, suppose we have a class roster database in +which each record includes a student's name, student ID number, and computer +account name, like this: +<A NAME="join"></A> + +<P><PRE>((john alec entwistle) 04397 john) +((keith moon) 09382 kmoon) +((peter townshend) 10428 pete) +((roger daltrey) 01025 roger) +</PRE> + +<P>We also have a grade database in which each student's grades +are stored according to computer account name: + +<P><PRE>(john 87 90 76 68 95) +(kmoon 80 88 95 77 89) +(pete 100 92 80 65 72) +(roger 85 96 83 62 74) +</PRE> + +<P>We want to create a combined database like this: + +<P><PRE>((john alec entwistle) 04397 john 87 90 76 68 95) +((keith moon) 09382 kmoon 80 88 95 77 89) +((peter townshend) 10428 pete 100 92 80 65 72) +((roger daltrey) 01025 roger 85 96 83 62 74) +</PRE> + +<P>in which the information from the roster and grade databases has +been combined for each account name. + +<P>Write a program <CODE><A NAME="g43"></A>join</CODE> that takes five arguments: two input +filenames, two numbers indicating the position of the item within each +record that should overlap between the files, and an output filename. For +our example, we'd say + +<P><PRE>> (join "class-roster" "grades" 3 1 "combined-file") +</PRE> + +<P>In our example, both files are in alphabetical order of computer account +name, the account name is a word, and the same account name never appears +more than once in each file. In general, you may assume that these +conditions hold for the item that the two files have in common. Your +program should <EM>not</EM> assume that every item in one file also appears +in the other. A line should be written in the output file only for the +items that do appear in both files. + +<P> + +<HR> +<A NAME="ft1" HREF="files.html#text1">[1]</A> Another difference, not +apparent in this example, is that <CODE>show</CODE> and <CODE>read</CODE> can handle +structured lists. <CODE>Show-line</CODE> can print a structured list, leaving off +only the outermost parentheses, but <CODE>read-line</CODE> will treat any +parentheses in the file as ordinary characters; it always returns a +sentence.<P> +<A NAME="ft2" HREF="files.html#text2">[2]</A> Like all the input and output primitives, <CODE>read-string</CODE> can be invoked with or without a port argument.<P> +<A NAME="ft3" HREF="files.html#text3">[3]</A> Computer programmers really talk +this way.<P> +<A NAME="ft4" HREF="files.html#text4">[4]</A> There are other kinds of data +that <A NAME="g38"></A><CODE>write</CODE> prints differently from <CODE>display</CODE>, but we don't +use them in +this book. The general rule is that <CODE>display</CODE> formats the output for +human readers, while <CODE>write</CODE> ensures that Scheme can reread the +information unambiguously. <CODE>Show</CODE> and <CODE>show-line</CODE> are extensions +that we wrote using <CODE>display</CODE>. We could have written <CODE>show-in-write-format</CODE>, +for example, but happened not to need it.<P> +<P><A HREF="../ss-toc2.html">(back to Table of Contents)</A><P> +<A HREF="../ssch21/functions-implement.html"><STRONG>BACK</STRONG></A> +chapter thread <A HREF="../ssch23/vectors.html"><STRONG>NEXT</STRONG></A> + +<P> +<ADDRESS> +<A HREF="../index.html">Brian Harvey</A>, +<CODE>bh@cs.berkeley.edu</CODE> +</ADDRESS> +</BODY> +</HTML> |