《Structure and Interpretation of Computer Programs》lecture 13 webhan.pdf_大学文库

6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology 6.001 Notes: Section 31.1 Slide 3l.ll Trees, graphs and Search In previous lectures we have seen a number of important themes, which relate to designing code for complex syste One was the idea of proof by induction, meaning that we could Induction for proving code correctness reason formally about whether and why our code would work Data structures can gather information together into abstract units correctly based on an inductive proof. This specifically said that Code to manipulate data abstractions tends to reflect if we could prove(using basic axioms) that the code correctly structure of abstraction handled a base case, and if we could prove that assuming the code ran correctly for all cases whose input was less than some given size, then it ran correctly for input of that given size, then we could conclude that it ran correctly on all correct inputs 922003 60015e A second was the idea that we can gather related information ogether into higher-level data units which we could abstract into treating as simple elements. Lists were a good example. And we saw that so long as the constructor and selectors of the abstraction obeyed a contract, we could suppress the details of the abstraction from its The third was that code written to manipulate data abstractions frequently had a structure that reflected the underlying structure of the abstraction: often we would use selectors to extract out subparts, manipulate those parts, and then use the constructor to create a new version of the abstraction Trees, Graphs and Search Slide 31.1.2 oday we are going to explore these themes in more detail. We Key themes from previous lectures are going to use them to build more complex data structures, Induction for proving code correctness Data structures can gather information together into and are particularly going to see how we can use inductive reasoning to design procedures to manipulate such structures Code to manipulate data abstractions tends to reflect Specifically, we are going to look at two very useful data abstractions trees and graphs. And we are going to see how Exploring the combination of these themes procedures that search for information in large collections can be nicely designed to interact with such structures Graph Algorithms for trees and graph

6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology Slide 31.1.3 Let's start by revisiting lists. Remember that we said a list was Revisiting lists imply a sequence of cons cells, connected by cdr's, ending in an empty list(also known as nil) Remember that we said lists had the property of closure. If we List: sequence of cons pairs, ending in nil were to cons anything onto the beginning of the list, we would Closed under the action of cons get a new sequence of cons cells, ending in the empty list, Closed (except for the empty list) under the action of hence a list. And with the exception of the empty list, taking the| -Induction shows that all lists satisfy this property cdr of a list results in a sequence of cons cells, ending in the empty list, ng really has an inductive flavor to it The base case Note that this is an empty list. Given that every collection of cons cells of size 4 marms 6001S less than n, ending in an empty list, is a list; then clearly consing a new element onto such a list yields a list, and thus by induction, every collection of cons cells ending in the empty list is a list. We should be able to use this idea to reason about procedures that manipulate lists Code that manipulates lists Slide 31.1. 4 Here is one of our standard procedures for manipulating lists (define (map proc 1st) kif (null? lst) our old friend map. Intuitively, we know how this code should work, but can we establish formally that it does what we cans(proc《car1st) (map proc (cdr lst))) Inductive expect? Sure. We just use our notion of induction here, both on the data (ma 1234)) structure and on the code itself. for the base case we have the base case data structure for a list, namely an empty list. Thus, the code clearly returns the right structure. Now, assume that map correctly returns a list in which each element of the input I list has had proc applied to it, and that the order of the elements is preserved, for any list of size smaller than the current list. Then we know, given Ist, that by induction on the data structure (cdr lst) is a list. By induction on the procedure map we know this willl return a list of the same size, with each element replaced by the application of proc to that element. We can then process the first element of the list, and by induction on the data structure, cons will return a new list of the appropriate size that also satisfies the conditions of map. Thus by induction, we know that map will correctly perform as expected Slide 31.1.5 Code that manipulates lists Now, is there anything explicit in the code that says this applies only to lists of numbers? p proc Ist) 主f(nu11?1st) Of course not. It could be lists of symbols, or lists of strings. Or con(Px。《cax1at) Base case ists of anything. That can be seen by looking at the type ( map proc (cdr lst)))) Inductive definition for a list (1234) As with pairs, we will represent a list by the symbol list Va1ue:(14916) followed by angle brackets containing a type definition for the Is there anyting sedal about lists o numben? elements of the list. Since lists are created out of pairs, we can Type Definition Listanunber and whose second element is a List of the same type, or(which Liban b an i mor dta) be more explicit about this definition, by noting that a list of some type is either a Pair whose first element is of that type pafr I noll 600SC is indicated by the vertical bar) the list is the special empty list

6.001 Structure and Interpretation of Computer Programs. Copyright © 2004 by Massachusetts Institute of Technology. Slide 31.1.3 Let's start by revisiting lists. Remember that we said a list was simply a sequence of cons cells, connected by cdr's, ending in an empty list (also known as nil). Remember that we said lists had the property of closure. If we were to cons anything onto the beginning of the list, we would get a new sequence of cons cells, ending in the empty list, hence a list. And with the exception of the empty list, taking the cdr of a list results in a sequence of cons cells, ending in the empty list, hence a list. Note that this really has an inductive flavor to it. The base case is an empty list. Given that every collection of cons cells of size less than n, ending in an empty list, is a list; then clearly consing a new element onto such a list yields a list, and thus by induction, every collection of cons cells ending in the empty list is a list. We should be able to use this idea to reason about procedures that manipulate lists. Slide 31.1.4 Here is one of our standard procedures for manipulating lists: our old friend map. Intuitively, we know how this code should work, but can we establish formally that it does what we expect? Sure. We just use our notion of induction here, both on the data structure and on the code itself. For the base case, we have the base case data structure for a list, namely an empty list. Thus, the code clearly returns the right structure. Now, assume that map correctly returns a list in which each element of the input list has had proc applied to it, and that the order of the elements is preserved, for any list of size smaller than the current list. Then we know, given lst, that by induction on the data structure (cdr lst) is a list. By induction on the procedure map we know this willl return a list of the same size, with each element replaced by the application of proc to that element. We can then process the first element of the list, and by induction on the data structure, cons will return a new list of the appropriate size that also satisfies the conditions of map. Thus, by induction, we know that map will correctly perform as expected. Slide 31.1.5 Now, is there anything explicit in the code that says this applies only to lists of numbers? Of course not. It could be lists of symbols, or lists of strings. Or lists of anything. That can be seen by looking at the type definition for a list. As with pairs, we will represent a list by the symbol List followed by angle brackets containing a type definition for the elements of the list. Since lists are created out of pairs, we can be more explicit about this definition, by noting that a List of some type is either a Pair whose first element is of that type, and whose second element is a List of the same type, or (which is indicated by the vertical bar) the list is the special empty list

6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology Slide 31.2.3 So what can we say about a tree? First, we can define its type Tr rees By analogy to a list, we have a tree of type C defined as either a list of trees of type C, reflecting the fact that there could be arbitrary branches, or that a tree is just a leaf of type C. Note the recursive definition of the tree structure. and a leaf of some Type Definition: type is just an element of that type Tree List>I Leaf In fact, as defined here, it would appear that a tree of type c ald al ways have leaves of the same type, e.g. numbers or strings. Of course, we could generalize this to allow variations of types of elements in the leaves of a tree Examples: countleaves caletree Associated with a tree, we expect to see some standard 92 operations. For example, we should have a predicate, leaf? that tests where an element is a leaf or if it is a more complex sub-tree. In addition, we would like to have procedures that generalize operations on lists, such as counting the number of leaves(or basic elements )of a tree We expect that these procedures should be more general than the lists versions, to reflect the fact that elements of a tree may themselves be complex things, like trees Tree Procedures Slide 31.2. 4 Given trees, built as lists of lists, what kinds of procedures can a tree is a list, we can use list procedures on them we create to manipulate them? First, because a tree is (define my-tree (list 4 (list 5 7)2)) implemented as a list, in principle we could use list operations m-tree on the tree. although that starts to infringe on the data abstraction barrier that separates the use of an abstraction from 国网网 its implementation Here is a simple example. I have defined a test tree, and given it a name. I have also shown the box-and-pointer diagram that represents the actual implementation of this tree 6001 SICP Slide 31.2.5 Tree Procedures Suppose I ask for the length of this structure. Ideally, length should be applied to lists, but because I have chosen. A tree is a list, we can use list procedures on them to represent a tree as a list of list, this procedure can be applied(define my-tree (list 4 (list 5 7)2)) here. For this example, it returns the value 3, which probably isn't what you expected. After all, we would like to think of a tree in terms of the number of leaves, not the number of top level branches Why did this happen? Well, recall that length applies to a list, and it simple counts the number of cons pairs in that list, without ever looking at what is hanging off those pairs. In this (arms 1 001 SICP case. this data structure is a list of three elements some of which happen to be lists as well So if i want the number of top-level branches, this does the right thing, but suppose I really want to know how many leaves are in the tree?

6.001 Structure and Interpretation of Computer Programs. Copyright © 2004 by Massachusetts Institute of Technology. Slide 31.2.3 So what can we say about a tree? First, we can define its type. By analogy to a list, we have a tree of type C defined as either a list of trees of type C, reflecting the fact that there could be arbitrary branches, or that a tree is just a leaf of type C. Note the recursive definition of the tree structure. And a leaf of some type is just an element of that type. In fact, as defined here, it would appear that a tree of type C would always have leaves of the same type, e.g. numbers or strings. Of course, we could generalize this to allow variations of types of elements in the leaves of a tree. Associated with a tree, we expect to see some standard operations. For example, we should have a predicate, leaf? that tests where an element is a leaf or if it is a more complex sub-tree. In addition, we would like to have procedures that generalize operations on lists, such as counting the number of leaves (or basic elements) of a tree. We expect that these procedures should be more general than the lists versions, to reflect the fact that elements of a tree may themselves be complex things, like trees. Slide 31.2.4 Given trees, built as lists of lists, what kinds of procedures can we create to manipulate them? First, because a tree is implemented as a list, in principle we could use list operations on the tree, although that starts to infringe on the data abstraction barrier that separates the use of an abstraction from its implementation. Here is a simple example. I have defined a test tree, and given it a name. I have also shown the box-and-pointer diagram that represents the actual implementation of this tree. Slide 31.2.5 Suppose I ask for the length of this structure. Ideally, length should be applied to lists, but because I have chosen to represent a tree as a list of list, this procedure can be applied here. For this example, it returns the value 3, which probably isn't what you expected. After all, we would like to think of a tree in terms of the number of leaves, not the number of toplevel branches. Why did this happen? Well, recall that length applies to a list, and it simple counts the number of cons pairs in that list, without ever looking at what is hanging off those pairs. In this case, this data structure is a list of three elements, some of which happen to be lists as well. So if I want the number of top-level branches, this does the right thing, but suppose I really want to know how many leaves are in the tree?

6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology Slide 31.2.6 Tree Procedures What I would like is another procedure, count-leaves a tree is a list; we can use list procedures on therm that would in this case return the value 4 to indicate there are (define my-tree (list 4 (list 5 7)2)) takes advantage of the data structure of the tree to accomplis four basic elements in this tree. I need a procedure that direc countleaves my-tree) Slide 31.2.7 countleaves So let's think about this, using the general idea that pro should reflect the structure of the data abstraction to which they the cound of a tree is the sun of the countleaves of eac To count the number of leaves in a tree. we can devise a nice recursive strategy. The base case says if the tree is empty, then there are no leaves, obviously. However, we also have a second base case here. which is different than what we have seen before. This second base case says that if the tree is just a single leave. then the value to return is 1 For the recursive strategy we need to be careful. When we were4 dealing with lists, the recursive strategy said"do something to the car of the list" and then add that into whatever we are doing tree, so we have to count the leaves in each child, then add all of those lp er tree as each branch, or child, of the applied again to the rest of the list. Remember that a tree could have anoth countleaves Slide 31.2.8 So let's implement that idea. My definition for count leaves has one argument, a tree It starts with the two base he tatey. the court of tec s the sum of the coumleave f ach I cases. If the tree is empty, using the fact that the tree is implemented as a list, then return 0. If the tree is a leaf, which detine (countleaves tree cond ((null? tree) 0) base case we can find by testing that it is not a pair(since that wo bRse mean it was a list, and hence a subtree), then return o cs (countleaves (cdr tree))))) Otherwise, I add up the number of leaves in the first branch of this tree. and combine that with the number of leaves in the tree formed by pruning that first branch off. 9203 601 SICP I Aha! This is a different form than what we saw earlier Remember for lists, we would have just done something to the car of the tree and then combined that with a recursive call to count-leayes on the cdr of the tree But since the car of a tree could itself be a tree, and not just an element, we have to recursively apply the procedure to both the car and the cdr of the tree So what kind of order of growth should we expect for such a procedure? Notice that there are two calls to the procedure in the recursive case, and that should remind you of a type of procedure we saw before, namely exponential Step back for a second. Notice how again the definition of this procedure reflects the structure of the associated data abstraction to which it will be applied. Compare this to what we saw with lists

6.001 Structure and Interpretation of Computer Programs. Copyright © 2004 by Massachusetts Institute of Technology. Slide 31.2.6 What I would like is another procedure, count-leaves that would in this case return the value 4 to indicate there are four basic elements in this tree. I need a procedure that directly takes advantage of the data structure of the tree to accomplish this. Slide 31.2.7 So let's think about this, using the general idea that procedures should reflect the structure of the data abstraction to which they apply. To count the number of leaves in a tree, we can devise a nice recursive strategy. The base case says if the tree is empty, then there are no leaves, obviously. However, we also have a second base case here, which is different than what we have seen before. This second base case says that if the tree is just a single leave, then the value to return is 1. For the recursive strategy we need to be careful. When we were dealing with lists, the recursive strategy said "do something to the car of the list" and then add that into whatever we are doing applied again to the rest of the list. Remember that a tree could have another tree as each branch, or child, of the tree, so we have to count the leaves in each child, then add all of those up. Slide 31.2.8 So let's implement that idea. My definition for countleaves has one argument, a tree. It starts with the two base cases. If the tree is empty, using the fact that the tree is implemented as a list, then return 0. If the tree is a leaf, which we can find by testing that it is not a pair (since that would mean it was a list, and hence a subtree), then return 1. Otherwise, I add up the number of leaves in the first branch of this tree, and combine that with the number of leaves in the tree formed by pruning that first branch off. Aha! This is a different form than what we saw earlier. Remember for lists, we would have just done something to the car of the tree, and then combined that with a recursive call to count-leaves on the cdr of the tree. But since the car of a tree could itself be a tree, and not just an element, we have to recursively apply the procedure to both the car and the cdr of the tree. So what kind of order of growth should we expect for such a procedure? Notice that there are two calls to the procedure in the recursive case, and that should remind you of a type of procedure we saw before, namely exponential. Step back for a second. Notice how again the definition of this procedure reflects the structure of the associated data abstraction to which it will be applied. Compare this to what we saw with lists

6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology This message we will repeat many times during the term. The structure of the procedure tends to go hand-in-hand with the structure of the data abstraction it is designed to manipulate Slide 31.2.9 countleaves and substitution model Because this is a new kind of data structure. and because we are creating new kinds of procedures to manipulate them, let's look (define (coun reel 1)/base cas at this a bit more carefully. Here again is the code for count 【(1eaf els leaves. with the two base cases. and the double recursion (countleaves (edr tree)) down the first branch and the rest of the branches of the tree ine (leaf not《pair?x))) 91200 600SC countleaves and substitution model Slide 31.2.10 Let's use our substitution model to see how this procedure evolves, and we will do this in stages. Here is the simplest case recursive case veg (car tr Let's apply count-leaves to a tree of one leaf. To be very careful, I am going to replace (list 2)with the define(e△E?x) actual box-and-pointer structure that is its value Count-leaves first checks the base cases but neither applies. Thus it reduces to adding count-leaves of the (count leaves (2)) 9 ((oount leaves 2)(oountleaves nil)) first subtree, or the car of the box-and-pointer structure to 6001 SKP soa count-leaves of the rest of the branches, or the cdr or the box-and-pointer structure Note that the first recursive call will catch the second base case while the second recursive call will catch the first base case, and eventually this reduces to the correct value for the number of leaves in this tree, as seen in the original box and pointer diagram, which had exactly one element in it Slide 31.2.11 countleaves and substitution model- pg 2 Now let's try a bit more complicated tree, as shown. Notice that this tree has the property that its first branch is just a leaf, but its define (countleaves tree) ((null? tree) 0),base case ((leaf? tree) 1) second branch, defined as everything but the first branch, is recursive case itself a tree (countleaves (odr tree))))) Count-leaves thus applies to the box-and-pointer ( define(Ie△f?x) (not (pair? x))) structure shown. In this case. the recursive call will add the number of leaves in the first branch, which is just the leaf, 5 since that is what the car returns to the number of leaves in es5) oount⊥ayes the tree, (7)since that is what the cdr of the original structure returns hus the first recursive call will trigger the second base case returning 1 for the single leaf there The second recursive call will reduce to the case we saw on the last slide recognizing this as another tree. Thus it unwinds the tree one more level then returning i for the leaf in the first branch, and o for the empty tree in the second branch

6.001 Structure and Interpretation of Computer Programs. Copyright © 2004 by Massachusetts Institute of Technology. This message we will repeat many times during the term. The structure of the procedure tends to go hand-in-hand with the structure of the data abstraction it is designed to manipulate. Slide 31.2.9 Because this is a new kind of data structure, and because we are creating new kinds of procedures to manipulate them, let's look at this a bit more carefully. Here again is the code for countleaves, with the two base cases, and the double recursion down the first branch and the rest of the branches of the tree. Slide 31.2.10 Let's use our substitution model to see how this procedure evolves, and we will do this in stages. Here is the simplest case. Let's apply count-leaves to a tree of one leaf. To be very careful, I am going to replace (list 2) with the actual box-and-pointer structure that is its value. Count-leaves first checks the base cases, but neither applies. Thus it reduces to adding count-leaves of the first subtree, or the car of the box-and-pointer structure to count-leaves of the rest of the branches, or the cdr or the box-and-pointer structure. Note that the first recursive call will catch the second base case, while the second recursive call will catch the first base case, and eventually this reduces to the correct value for the number of leaves in this tree, as seen in the original box and pointer diagram, which had exactly one element in it. Slide 31.2.11 Now let's try a bit more complicated tree, as shown. Notice that this tree has the property that its first branch is just a leaf, but its second branch, defined as everything but the first branch, is itself a tree. Count-leaves thus applies to the box-and-pointer structure shown. In this case, the recursive call will add the number of leaves in the first branch, which is just the leaf, 5, since that is what the car returns, to the number of leaves in the tree, (7) since that is what the cdr of the original structure returns. Thus, the first recursive call will trigger the second base case, returning 1 for the single leaf there. The second recursive call will reduce to the case we saw on the last slide, recognizing this as another tree. Thus it unwinds the tree one more level, then returning 1 for the leaf in the first branch, and 0 for the empty tree in the second branch

6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology Slide 31.2.16 countleaves- bigger example And of course the recursive call down the rest of three does not trigger a base case, since the argument is a tree. Instead addcount-leaves of the first element(notice this is just the first element of that list, even though it is itself a tree) (count leaves ny-tree) tocount-leaves of the rest of the tree( i. e thecar of ( (count leaves a)(count leaves ((57)2 d(4(572 the argument) Slide 31.2.17 And now you get the idea. At each level of the tree, we are countleaves- bigger example callingcount-leaves on the first element of tree, and adding that to whatever we get by callcount-leaves on calls keep unwinding until they either reach a leaf (with a return s fount leaves /) foom R2 the remaining branches of the tree which is itself a tree. These ney-tree) l(4(572) value of 1)or an empty tree(with a return value of o), and all these values then get added up to return the final value 9122003 General operations on trees Slide 31.2.18 Of course, not only can we write procedures that directly (define( tree-map proc tree) manipulate trees, we can capture general patterns. Just as we (if(null? tree) had the notion ofmap for lists, we have a similar idea for trees (f (leaf? Tree) shown here (proc tree) Here we need to separate two different base cases: the empty (cons(tree- map proc(car tree) tree, and a leaf (or isolated element)of a tree. For the empty (tree-map proc (cdr tree)) tree, we just return an empty tree. For a leaf, we simply apply the procedure the general case, we have to be careful about our data 492a3 structure. a tree is a list each of whose elements might itself be a tree. So we can split a tree into iscar and its cdr, each of which is a tree. We must then map our procedure down each of these subpieces, and then glue them back together This is different than mapping down a list, where we could just directly cons the processed first element onto the remainder of the list

6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology Slide 31.2.19 Note that induction holds in this case as well. for each of the General operations on trees two base cases, we get back an appropriate tree, either an empty(define(tree-map proc tree tree or a single leaf. In the inductive case we know that both (if(null? tree) thecar and the cdr of the tree are trees of smaller size. so (if(leaf? Tree we can assume thattree-map correctly returns a processed tree. Then we know that cons will glue each of these pieces ons( tree-map proc (tree-map proc (cdr tree))))) back into the larger tree, and hence by induction this code will correctly process trees of all sizes Induction still holds for this kind of procedure 920 6001S Even more general operations on trees Slide 31. 2.20 (define (tree-manip leaf-op init merge tree] We can capture higher order operations on trees, beyond just mapping. For example, this code allows us to specify what (if (leaf operation we want to apply to each leaf of the tree, and how we (merge(tree manip leaf-op init merge(car tree) want to glue pieces together Using this we can map procedures onto each element of the tre manip lambda o e x x) 0 cons (1 2(3 4)5)6)+1(4916)25)36) tree; or we can count the number of leaves in a tree; or we can (tree-manip(lambda (x]1)0+(1(2(34)5)6))26; number of leaves reverse the elements of the tree both at the top level and at each oda o x)"o lambada (a b)(apnd b (ist am"1 3 4) 5)6) subtree 9203 601 SICP 6.001 Notes: Section 31.3 Slide 31.3.1 Why search? Now. let's turn to a different issue. given that we can have fairly complex data structures: trees, each of whose subtree is a complex tree, the issue of how to manipulate that information Finding information distributed throughout complex data structures is often equally important becomes more complex. In the examples we just completed, we were mostly considering procedures that manipulated trees to return trees. But in many cases, we may want to decide if information is contained in a tree or to decide how to add data to a tree in a structured way. That is, trees can be used to collect related data together in a structured way, and we need to think about how to handle that interaction Thus, the issue of how to collect data into complex structures is really only half of the problem; we also want to be able to retrieve information from that structure

6.001 Structure and Interpretation of Computer Programs. Copyright o 2004 by Massachusetts Institute of Technology. Slide 31.3. 2 Examples of Search So we may want to find information in a complex structure by Look up a fiiend's email address searching through that structure; or we may want to know if the information is simply missing from the structure. This means Look up a business we want to search through data structures in efficient ways What does this mean: to search for information in a data structure? Well, if the problem is small in size, we can imagine just storing the data in a linear structure like a list. For example, ooking up an email address of a friend might only require business makes sense. Here each element of the list might be a smal list looking at a few tens of addresses, so just searching down a data structure that contains names addresses and other information; and search would involve walking down the list hecking each entry against a name, and then returning the full entry when we find the right one But suppose I want to find the address for some business, perhaps based on the name of the business. Given that there are hundreds of thousands of businesses, I don' t want to have to look through a linear list. Better would be first to restrict myself to a particular state, and then to a town, and then to a kind of business, and then look through a much smaller list. In this case, one can conceptually see why using a tree would be a much better way of structuring the information. I would only need to search at the top level through a set of branches to find the right state, thereby excluding huge portions of the data set. I could then search through a set of towns, and so on Slide 31.3.3 Examples of search Let's broaden this even further. Suppose I want to find a document on the World Wide Web. How do I structure that vast. Fund a document on the Web amount of data in a manner that makes retrieval efficient? 92 Web search and Web spiders Slide 31.3, 4 Well, at a conceptual level, what is the Web? Web=(rapidly growing) collection of documents Basically, it is a very rapidly growing collection of documents, Document-text(& other material)& links to other that are totally unordered( that is, there is no coherent organization collecting them together). A document consists of some text (or other information) that may be of interest to us ing Http protocl MPEnR information(ASCIL, HTML. GIF, JPG. plus links to other documents Program can extract links from document. hence To search through the Web, one starts with a particular document, which is identified by its URL (or Uniform Resource Locator). Using some common protocol for decoding the URL es some work, e.g index of documents(keywords) q12∞0 one can retrieve the file containing the do information about how the document or file is encoded(is this in basic characters, is it a web page, is it an image, or a movi and so on) Of particular interest are those links contained within the document that may point to other documents. So typically, a search through the Web involves invoking a spider, that starts with some initial set of documents (identified by their URLS), retrieves the actual documents, processes them to find links within those documents