Abstract data types This opened my mind,I started to grasp what it means to use the tool known as algebra.I'll be damned if anyone had ever told me before:over and again Mr.Dupuy [the mathematics teacher]was making pompous sentences on the subject,but not once would he say this simple word:it is a division of labor,which like any division of labor produces miracles, and allows the mind to concentrate all of its forces on just one side ofobjects,on just one of their qualities. What a difference it would have made for us if Mr.Dupuy had told us:This cheese is sof or it is hard,it is white,it is blue;it is old,it is young,it is yours,it is mine,it is light or it is heavy.Of so many qualities let us consider only the weight.Whatever that weight may be, let us call it A.Now,without thinking of the weight any more,let us apply to A everything that we know of quantities. Such a simple thing,yet no one was saying it to us in that faraway province... Stendhal,The Life of Henry Brulard,1836 For abstraction consists only in separating the perceptible qualities of bodies,either from other qualities,or from the bodies to which they apply.Errors arise when this separation is poorly done or wrongly applied:poorly done in philosophical questions,and wronghy applied in physical and mathematical questions.An almost sure way to err in philosophy is to fail to simplify enough the objects under study,and an infallible way to obtain defective results in physics and mathematics is to view the objects as less composite than they are. Denis Diderot,A Letter on the Blind for the Benefit of Those Who Can See,1749 ettingobets play the edo iusofare architectures requre that we describe them adequately.This chapter shows how. You are perhaps impatient to dive into the depths of object technology and explore the details of multiple inheritance,dynamic binding and other joys;then you may at first look at this chapter as an undue delay since it is mostly devoted to the study of some mathematical concepts(although all the mathematics involved is elementary). But in the same way that even the most gifted musician will benefit from learning a little music theory,knowing about abstract data types will help you understand and enjoy the practice of object-oriented analysis,design and programming,however attractive the concepts might already appear without the help of the theory.Since abstract data types
6 Abstract data types This opened my mind, I started to grasp what it means to use the tool known as algebra. I’ll be damned if anyone had ever told me before: over and again Mr. Dupuy [the mathematics teacher] was making pompous sentences on the subject, but not once would he say this simple word: it is a division of labor, which like any division of labor produces miracles, and allows the mind to concentrate all of its forces on just one side of objects, on just one of their qualities. What a difference it would have made for us if Mr. Dupuy had told us: This cheese is soft or it is hard; it is white, it is blue; it is old, it is young; it is yours, it is mine, it is light or it is heavy. Of so many qualities let us consider only the weight. Whatever that weight may be, let us call it A. Now, without thinking of the weight any more, let us apply to A everything that we know of quantities. Such a simple thing; yet no one was saying it to us in that faraway province… Stendhal, The Life of Henry Brulard, 1836. For abstraction consists only in separating the perceptible qualities of bodies, either from other qualities, or from the bodies to which they apply. Errors arise when this separation is poorly done or wrongly applied: poorly done in philosophical questions, and wrongly applied in physical and mathematical questions. An almost sure way to err in philosophy is to fail to simplify enough the objects under study; and an infallible way to obtain defective results in physics and mathematics is to view the objects as less composite than they are. Denis Diderot, A Letter on the Blind for the Benefit of Those Who Can See, 1749. Letting objects play the lead role in our software architectures requires that we describe them adequately. This chapter shows how. You are perhaps impatient to dive into the depths of object technology and explore the details of multiple inheritance, dynamic binding and other joys; then you may at first look at this chapter as an undue delay since it is mostly devoted to the study of some mathematical concepts (although all the mathematics involved is elementary). But in the same way that even the most gifted musician will benefit from learning a little music theory, knowing about abstract data types will help you understand and enjoy the practice of object-oriented analysis, design and programming, however attractive the concepts might already appear without the help of the theory. Since abstract data types
122 ABSTRACT DATA TYPES $6.1 establish the theoretical basis for the entire method,the consequences of the ideas introduced in this chapter will be felt throughout the rest of this book. There is more.As we will see at chapter end,these consequences actually extend beyond the study of software proper,yielding a few principles of intellectual investigation which one may perhaps apply to other disciplines. 6.1 CRITERIA To obtain proper descriptions of objects,we need a method satisfying three conditions: The descriptions should be precise and unambiguous. They should be complete-or at least as complete as we want them in each case (we may decide to leave some details out). They should not be overspecifying. The last point is what makes the answer non-trivial.It is after all easy to be precise, unambiguous and complete if we "spill the beans"by giving out all the details of the objects'representation.But this is usually too much information for the authors of software elements that need to access the objects. This observation is close to the comments that led to the notion of information "Information Hid- hiding.The concern there was that by providing a module's source code (or,more ing".page 51. generally,implementation-related elements)as the primary source of information for the authors of software elements that rely on that module,we may drown them in a flood of details,prevent them from concentrating on their own job,and hamper prospects of smooth evolution.Here the danger is the same if we let modules use a certain data structure on the basis of information that pertains to the structure's representation rather than to its essential properties. 6.2 IMPLEMENTATION VARIATIONS To understand better why the need for abstract data descriptions is so crucial,let us explore further the potential consequences of using physical representation as the basis for describing objects. A well-known and convenient example is the description of stack objects.A stack object serves to pile up and retrieve other objects in a last-in,first-out ("LIFO")manner, the latest inserted element being the first one to be retrieved.The stack is a ubiquitous structure in computing science and in many software systems;the typical compiler or interpreter,for example,is peppered with stacks of many kinds. Stacks,it must be said,are also ubiquitous in didactic presentations of abstract data types, so much so that Edsger Dijkstra is said to have once quipped that"abstract data types are a remarkable theory,whose purpose is to describe stacks".Fair enough.But the notion of abstract data type applies to so many more advanced cases in the rest of this book that I do not feel ashamed of starting with this staple example.It is the simplest I know which includes about every important idea about abstract data types
122 ABSTRACT DATA TYPES §6.1 establish the theoretical basis for the entire method, the consequences of the ideas introduced in this chapter will be felt throughout the rest of this book. There is more. As we will see at chapter end, these consequences actually extend beyond the study of software proper, yielding a few principles of intellectual investigation which one may perhaps apply to other disciplines. 6.1 CRITERIA To obtain proper descriptions of objects, we need a method satisfying three conditions: • The descriptions should be precise and unambiguous. • They should be complete — or at least as complete as we want them in each case (we may decide to leave some details out). • They should not be overspecifying. The last point is what makes the answer non-trivial. It is after all easy to be precise, unambiguous and complete if we “spill the beans” by giving out all the details of the objects’ representation. But this is usually too much information for the authors of software elements that need to access the objects. This observation is close to the comments that led to the notion of information hiding. The concern there was that by providing a module’s source code (or, more generally, implementation-related elements) as the primary source of information for the authors of software elements that rely on that module, we may drown them in a flood of details, prevent them from concentrating on their own job, and hamper prospects of smooth evolution. Here the danger is the same if we let modules use a certain data structure on the basis of information that pertains to the structure’s representation rather than to its essential properties. 6.2 IMPLEMENTATION VARIATIONS To understand better why the need for abstract data descriptions is so crucial, let us explore further the potential consequences of using physical representation as the basis for describing objects. A well-known and convenient example is the description of stack objects. A stack object serves to pile up and retrieve other objects in a last-in, first-out (“LIFO”) manner, the latest inserted element being the first one to be retrieved. The stack is a ubiquitous structure in computing science and in many software systems; the typical compiler or interpreter, for example, is peppered with stacks of many kinds. Stacks, it must be said, are also ubiquitous in didactic presentations of abstract data types, so much so that Edsger Dijkstra is said to have once quipped that “abstract data types are a remarkable theory, whose purpose is to describe stacks”. Fair enough. But the notion of abstract data type applies to so many more advanced cases in the rest of this book that I do not feel ashamed of starting with this staple example. It is the simplest I know which includes about every important idea about abstract data types. “Information Hiding”, page 51
$6.2 IMPLEMENTATION VARIATIONS 123 Stack representations Several possible physical representations exist for stacks: Three possible capacity “Push”operation: representations count :count for a stack count representation [count:=x (ARRAY_UP) representation capacity “Push”operation: representation [free]:=x (ARRAY_DOWN) free:=free-1 free representation “Push”operation: item previous new(n) previous item n.item=x n.previous:=last previous item last n (LINKED) previous ifem The figure illustrates three of the most common representations.Each has been given a name for ease of reference: ARRAY UP:represent a stack through an array representation and an integer count whose value ranges from 0(for an empty stack)to capacity,the size of the array representation;stack elements are stored in the array at indices 1 up to count ARRAY DOWN:like ARRAY UP,but with elements stored from the end of the array rather than from the beginning.Here the integer is called free (it is the index of the highest free array position,or 0 if all positions are occupied)and ranges from capacity for an empty stack down to 0.The stack elements are stored in the array at indices capacity down to free +I. LINKED:a linked representation which stores each stack element in a cell with two fields:item representing the element,and previous containing a pointer to the cell containing the previously pushed element.The representation also needs last,a pointer to the cell representing the top
§6.2 IMPLEMENTATION VARIATIONS 123 Stack representations Several possible physical representations exist for stacks: The figure illustrates three of the most common representations. Each has been given a name for ease of reference: • ARRAY_UP: represent a stack through an array representation and an integer count whose value ranges from 0 (for an empty stack) to capacity, the size of the array representation; stack elements are stored in the array at indices 1 up to count. • ARRAY_DOWN: like ARRAY_UP, but with elements stored from the end of the array rather than from the beginning. Here the integer is called free (it is the index of the highest free array position, or 0 if all positions are occupied) and ranges from capacity for an empty stack down to 0. The stack elements are stored in the array at indices capacity down to free + 1. • LINKED: a linked representation which stores each stack element in a cell with two fields: item representing the element, and previous containing a pointer to the cell containing the previously pushed element. The representation also needs last, a pointer to the cell representing the top. Three possible representations for a stack representation (ARRAY_UP) “Push” operation: count := count + 1 count representation [count] := x capacity 1 representation (ARRAY_DOWN) “Push” operation: representation [free] := x free := free – 1 free capacity 1 (LINKED) “Push” operation: new (n) n ● item := x n ● previous := last last := n item item item item previous previous previous previous last
124 ABSTRACT DATA TYPES $6.2 Next to each representation,the figure shows a program extract (in Pascal-like notation)giving the corresponding implementation for a basic stack operation:pushing an element x onto the top. For the array representations,ARRAY_UP and ARRAY_DOWN,the instructions increase or decrease the top indicator (count or free)and assign x to the corresponding array element.Since these representations support stacks of at most capacity elements, robust implementations should include guards of the respective forms if count capacity then .. if free >0 then... which the figure omits for simplicity. For LINKED,the linked representation,pushing an element requires four operations: create a new cell n(done here with Pascal's new procedure,which allocates space for a new object);assign x to the new cell's ifem field;chain the new cell to the earlier stack top by assigning to its previous field the current value of last;and update last so that it will now be attached to the newly created cell. Although these are the most frequently used stack representations,many others exist. For example if you need two stacks of elements of the same type,and have only limited space available,you may rely on a single array with two integer top markers,count as in ARRAY_UP and free as in ARRAY_DOWN;one of the stacks will grow up and the other will grow down.The representation is full if and only if count=free. capacity Head-to-head Stack 2 representation for two stacks free CO111 representation Stack 1 The advantage,of course,is to lessen the risk of running out of space:with two arrays of capacity n representing stacks under ARRAY UP or ARRAY DOWN,you exhaust the available space whenever either stack reaches n elements;with a single array of size 2n holding two head-to-head stacks,you run out when the combined size reaches 2n,a less likely occurrence if the two stacks grow independently.(For any variable values p and g, max (p+q)<max (p)+max (g).) Each of these and other possible representations is useful in some cases.Choosing one of them as"the"definition of stacks would be a typical case ofoverspecification.Why should we consider ARRAY UP,for example,more representative than LINKED?The most visible properties of ARRAY_UP-the array,the integer count,the upper bound-are irrelevant to an understanding of the underlying structure
124 ABSTRACT DATA TYPES §6.2 Next to each representation, the figure shows a program extract (in Pascal-like notation) giving the corresponding implementation for a basic stack operation: pushing an element x onto the top. For the array representations, ARRAY_UP and ARRAY_DOWN, the instructions increase or decrease the top indicator (count or free) and assign x to the corresponding array element. Since these representations support stacks of at most capacity elements, robust implementations should include guards of the respective forms if count 0 then … which the figure omits for simplicity. For LINKED, the linked representation, pushing an element requires four operations: create a new cell n (done here with Pascal’s new procedure, which allocates space for a new object); assign x to the new cell’s item field; chain the new cell to the earlier stack top by assigning to its previous field the current value of last; and update last so that it will now be attached to the newly created cell. Although these are the most frequently used stack representations, many others exist. For example if you need two stacks of elements of the same type, and have only limited space available, you may rely on a single array with two integer top markers, count as in ARRAY_UP and free as in ARRAY_DOWN; one of the stacks will grow up and the other will grow down. The representation is full if and only if count = free. The advantage, of course, is to lessen the risk of running out of space: with two arrays of capacity n representing stacks under ARRAY_UP or ARRAY_DOWN, you exhaust the available space whenever either stack reaches n elements; with a single array of size 2n holding two head-to-head stacks, you run out when the combined size reaches 2n, a less likely occurrence if the two stacks grow independently. (For any variable values p and q, max (p+q) ≤ max (p) + max (q).) Each of these and other possible representations is useful in some cases. Choosing one of them as “the” definition of stacks would be a typical case of overspecification. Why should we consider ARRAY_UP, for example, more representative than LINKED? The most visible properties of ARRAY_UP — the array, the integer count, the upper bound — are irrelevant to an understanding of the underlying structure. Head-to-head representation for two stacks representation count 1 capacity free Stack 2 Stack 1
$6.2 IMPLEMENTATION VARIATIONS 125 The danger of overspecification Why is it so bad to use a particular representation as specification? “ABOUT SOFT. The results of the Lientz and Swanson maintenance study,which you may recall, WARE MAINTE- give a hint.More than 17%of software costs was found to come from the need to take into NANCE”,I.3.page account changes of data formats.As was noted in the discussion,too many programs are 17. closely tied to the physical structure of the data they manipulate.A method relying on the physical representation of data structures to guide analysis and design would not be likely to yield flexible software. So if we are to use objects or object types as the basis of our system architectures, we should find a better description criterion than the physical representation. How long is a middle initial? Lest stacks make us forget that,beyond the examples favored by computer scientists,data structures are ultimately connected with real-life objects,here is an amusing example, taken from a posting on the Risks forum(comp.risks Usenet newsgroup)of the dangers of a view of data that is too closely dependent on concrete properties: Risks forum,10.74, My dear mother blessed (or perhaps cursed)all of her children with two middle initials, 3Jan.1993.Post- in my case“D”and“E".This has cau.sed me a good deal of trouble. ing by Darrell D.E. Long:Dehuman- It seems that TRW sells certain parts of your credit information,such as your name and ization by old a demographic profile.I recently got a new credit card from Gottchalks and found to my Cobol programs". chagrin that my name had been truncated to "Darrell D.Long".I went to the credit Abbreviated. manager and was assured that things would be fixed.Well,two things happened:I got a new credit card,this time as"Darrell E.Long",and TRW now has an annotation in my See exercise E6.5. file to the effect"File variation:middle initial is E".Soon after this I start getting mail page 161. for "Darrell E.Long"(along with the usual "Darrell Long"and "Darrell D.Long"and the occasional "Darrell D.E.Long"). I called up the credit bureau and it seems that the programmer who coded up the TRW database decided that all good Americans are entitled to only one middle initial.As the woman on the phone patiently told me "They only allocated enough megabytes (sic)in the system for one middle initial,and it would probably be awfully hard to change". Aside from the typical example of technobabble justification ("megabytes"),the lesson here is the need to avoid tying software to the exact physical properties of data. TRW's system seems similar to those programs,mentioned in an earlier discussion,which "knew"that postal codes consist of exactly five digits. See page 18. The author of the message reproduced above was mainly concerned about junk mail, an unpleasant but not life-threatening event;the archives of the Risks forum are full of computer-originated name confusions with more serious consequences.The "millenium problem",mentioned in the discussion of software maintenance,is another example ofthe dangers of accessing data based on physical representation,this one with hundreds of millions of dollars'worth of consequences
§6.2 IMPLEMENTATION VARIATIONS 125 The danger of overspecification Why is it so bad to use a particular representation as specification? The results of the Lientz and Swanson maintenance study, which you may recall, give a hint. More than 17% of software costs was found to come from the need to take into account changes of data formats. As was noted in the discussion, too many programs are closely tied to the physical structure of the data they manipulate. A method relying on the physical representation of data structures to guide analysis and design would not be likely to yield flexible software. So if we are to use objects or object types as the basis of our system architectures, we should find a better description criterion than the physical representation. How long is a middle initial? Lest stacks make us forget that, beyond the examples favored by computer scientists, data structures are ultimately connected with real-life objects, here is an amusing example, taken from a posting on the Risks forum (comp.risks Usenet newsgroup) of the dangers of a view of data that is too closely dependent on concrete properties: My dear mother blessed (or perhaps cursed) all of her children with two middle initials, in my case “D” and “E”. This has caused me a good deal of trouble. It seems that TRW sells certain parts of your credit information, such as your name and a demographic profile. I recently got a new credit card from Gottchalks and found to my chagrin that my name had been truncated to “Darrell D. Long”. I went to the credit manager and was assured that things would be fixed. Well, two things happened: I got a new credit card, this time as “Darrell E. Long”, and TRW now has an annotation in my file to the effect “File variation: middle initial is E”. Soon after this I start getting mail for “Darrell E. Long” (along with the usual “Darrell Long” and “Darrell D. Long” and the occasional “Darrell D. E. Long”). I called up the credit bureau and it seems that the programmer who coded up the TRW database decided that all good Americans are entitled to only one middle initial. As the woman on the phone patiently told me “They only allocated enough megabytes (sic) in the system for one middle initial, and it would probably be awfully hard to change”. Aside from the typical example of technobabble justification (“megabytes”), the lesson here is the need to avoid tying software to the exact physical properties of data. TRW’s system seems similar to those programs, mentioned in an earlier discussion, which “knew” that postal codes consist of exactly five digits. The author of the message reproduced above was mainly concerned about junk mail, an unpleasant but not life-threatening event; the archives of the Risks forum are full of computer-originated name confusions with more serious consequences. The “millenium problem”, mentioned in the discussion of software maintenance, is another example of the dangers of accessing data based on physical representation, this one with hundreds of millions of dollars’ worth of consequences. “ABOUT SOFTWARE MAINTENANCE”, 1.3, page 17. Risks forum, 10.74, 3 Jan. 1993. Posting by Darrell D.E. Long: ``Dehumanization by old Cobol programs''. Abbreviated. See exercise E6.5, page 161. See page 18
126 ABSTRACT DATA TYPES $6.3 6.3 TOWARDS AN ABSTRACT VIEW OF OBJECTS How do we retain completeness,precision and non-ambiguity without paying the price of overspecification? Using the operations In the stack example,what unites the various representations in spite of all their differences is that they describe a"container"structure (a structure used to contain other objects),where certain operations are applicable and enjoy certain properties.By focusing not on a particular choice of representation but on these operations and properties,we may be able to obtain an abstract yet useful characterization of the notion of stack. The operations typically available on a stack are the following: A command to push an element on top of a stack.Let us call that operation put. A command to remove the stack's top element,if the stack is not empty.Let us call it remove. A query to find out what the topelement is,ifthe stack is not empty.Let us call it item. .A query to determine whether the stack is empty.(This will enable clients to determine beforehand if they can use remove and item. In addition we may need a creator operation giving us a stack,initially empty.Let us call it make. Two points may have caught your attention and will deserve more explanation later in this chapter.First,the operation names may seem surprising;for the moment,just think of put as meaning push,remove as meaning pop,and item as meaning top.Details shortly (on the facing page,actually).Second,the operations have been divided into three categories:creators,which yield objects;queries,which return information about objects;and commands,which can modify objects.This classification will also require some more comments. In a traditional view of data structures,we would consider that the notion of stack is given by some data declaration corresponding to one of the above representations,for example (representation ARRAY_UP,Pascal-like syntax): count:INTEGER representation:array [1..capacity]of STACK ELEMENT TYPE where capacity,a constant integer,is the maximum number ofelements on the stack.Then put,remove,item,empty and make would be routines (subprograms)that work on the object structures defined by these declarations. The key step towards data abstraction is to reverse the viewpoint:forget for the moment about the representation;take the operations themselves as defining the data structure.In other words,a stack is any structure to which clients may apply the operations listed above
126 ABSTRACT DATA TYPES §6.3 6.3 TOWARDS AN ABSTRACT VIEW OF OBJECTS How do we retain completeness, precision and non-ambiguity without paying the price of overspecification? Using the operations In the stack example, what unites the various representations in spite of all their differences is that they describe a “container” structure (a structure used to contain other objects), where certain operations are applicable and enjoy certain properties. By focusing not on a particular choice of representation but on these operations and properties, we may be able to obtain an abstract yet useful characterization of the notion of stack. The operations typically available on a stack are the following: • A command to push an element on top of a stack. Let us call that operation put. • A command to remove the stack’s top element, if the stack is not empty. Let us call it remove. • A query to find out what the top element is, if the stack is not empty. Let us call it item. • A query to determine whether the stack is empty. (This will enable clients to determine beforehand if they can use remove and item.) In addition we may need a creator operation giving us a stack, initially empty. Let us call it make. Two points may have caught your attention and will deserve more explanation later in this chapter. First, the operation names may seem surprising; for the moment, just think of put as meaning push, remove as meaning pop, and item as meaning top. Details shortly (on the facing page, actually). Second, the operations have been divided into three categories: creators, which yield objects; queries, which return information about objects; and commands, which can modify objects. This classification will also require some more comments. In a traditional view of data structures, we would consider that the notion of stack is given by some data declaration corresponding to one of the above representations, for example (representation ARRAY_UP, Pascal-like syntax): count: INTEGER representation: array [1 ● ● capacity] of STACK_ELEMENT_TYPE where capacity, a constant integer, is the maximum number of elements on the stack. Then put, remove, item, empty and make would be routines (subprograms) that work on the object structures defined by these declarations. The key step towards data abstraction is to reverse the viewpoint: forget for the moment about the representation; take the operations themselves as defining the data structure. In other words, a stack is any structure to which clients may apply the operations listed above
$6.3 TOWARDS AN ABSTRACT VIEW OF OBJECTS 127 A laissez-faire policy for the society of modules The method just outlined for describing data structures shows a rather selfish approach to the world of data structures:like an economist of the most passionate supply-side, invisible-hand,let-the-free-market-decide school,we are interested in individual agents not so much for what they are internally as for what they have to offer to each other.The world ofobjects (and hence of software architecture)will be a world of interacting agents, communicating on the basis of precisely defined protocols. The economic analogy will indeed accompany us throughout this presentation;the agents-the software modules-are called suppliers and clients;the protocols will be called contracts,and much of object-oriented design is indeed Design by Contract,the title of a later chapter. See "BEYOND As always with analogies,we should not get too carried away:this work is not a SOFTWARE”,6.6. textbook on economics,and contains no hint of its author's views in that field.It will page 147. suffice for the moment to note the remarkable analogies of the abstract data type approach to some theories of how human agents should work together.Later in this chapter we will again explore what abstract data types can tell us beyond their original area of application. Name consistency For the moment,let us get back to more immediate concerns,and make sure you are comfortable with the above example specification in all its details.If you have encountered stacks before,the operation names chosen for the discussion of stacks may have surprised or even shocked you.Any self-respecting computer scientist will know stack operations under other names: Common stack operation name Name used here push put pop remove top item new make Why use anything else than the traditional terminology?The reason is a desire to take a high-level view of data structures-especially "containers",those data structures used to keep objects. Stacks are just one brand of container;more precisely,they belong to a category of containers which we may call dispensers.A dispenser provides its clients with a mechanism for storing (pur),retrieving (item)and removing (remove)objects,but without giving them any control over the choice of object to be stored,retrieved or removed.For example,the LIFO policy of stacks implies that you may only retrieve or remove the element that was stored last.Another brand of dispenser is the queue,which has a first-in, first-out(FIFO)policy:you store at one end,retrieve and remove at the other;the element
§6.3 TOWARDS AN ABSTRACT VIEW OF OBJECTS 127 A laissez-faire policy for the society of modules The method just outlined for describing data structures shows a rather selfish approach to the world of data structures: like an economist of the most passionate supply-side, invisible-hand, let-the-free-market-decide school, we are interested in individual agents not so much for what they are internally as for what they have to offer to each other. The world of objects (and hence of software architecture) will be a world of interacting agents, communicating on the basis of precisely defined protocols. The economic analogy will indeed accompany us throughout this presentation; the agents — the software modules — are called suppliers and clients; the protocols will be called contracts, and much of object-oriented design is indeed Design by Contract, the title of a later chapter. As always with analogies, we should not get too carried away: this work is not a textbook on economics, and contains no hint of its author’s views in that field. It will suffice for the moment to note the remarkable analogies of the abstract data type approach to some theories of how human agents should work together. Later in this chapter we will again explore what abstract data types can tell us beyond their original area of application. Name consistency For the moment, let us get back to more immediate concerns, and make sure you are comfortable with the above example specification in all its details. If you have encountered stacks before, the operation names chosen for the discussion of stacks may have surprised or even shocked you. Any self-respecting computer scientist will know stack operations under other names: Why use anything else than the traditional terminology? The reason is a desire to take a high-level view of data structures — especially “containers”, those data structures used to keep objects. Stacks are just one brand of container; more precisely, they belong to a category of containers which we may call dispensers. A dispenser provides its clients with a mechanism for storing (put), retrieving (item) and removing (remove) objects, but without giving them any control over the choice of object to be stored, retrieved or removed. For example, the LIFO policy of stacks implies that you may only retrieve or remove the element that was stored last. Another brand of dispenser is the queue, which has a first-in, first-out (FIFO) policy: you store at one end, retrieve and remove at the other; the element Common stack operation name Name used here push put pop remove top item new make See “BEYOND SOFTWARE”, 6.6, page 147
128 ABSTRACT DATA TYPES $6.3 that you retrieve or remove is the oldest one stored but not yet removed.An example of a container which is not a dispenser is an array,where you choose,through integer indices, the positions where you store and retrieve objects. Because the similarities between various kinds of container(dispensers,arrays and others)are more important than the differences between their individual storage,retrieval and removal properties,this book constantly adheres to a standardized terminology which downplays the differences between data structure variants and instead emphasizes the commonality.So the basic operation to retrieve an element will always be called item,the basic operation to remove an element will always be called remove and so on. These naming issues may appear superficial at first-"cosmetic",as programmers sometimes say.But do not forget that one of our eventual aims is to provide the basis for powerful,professional libraries of reusable software components.Such libraries will contain tens of thousands of available operations.Without a systematic and clear nomenclature,both the developers and the users of these libraries would quickly be swamped in a flood of specific and incompatible names,providing a strong (and unjustifiable)obstacle to large-scale reuse. Naming,then,is not cosmetic.Good reusable software is software that provides the right functionality and provides it under the right names. The names used here for stack operations are part of a systematic set of naming Chapter 26,in par- conventions used throughout this book.A later chapter will introduce them in more detail. ticular“CHOOS- ING THE RIGHT How not to handle abstractions NAMES”,26.2. page 879. In software engineering as in other scientific and technical disciplines,a seminal idea may seem obvious once you have been exposed to it,even though it may have taken a long time to emerge.The bad ideas and the complicated ones(they are often the same)often appear first;it takes time for the simple and the elegant to take over. This observation is true of abstract data types.Although good software developers have always(as a result ofeducation or mere instinct)made good use of abstraction,many of the systems in existence today were designed without much consideration of this goal. I once did a little involuntary experiment which provided a good illustration of this state of affairs.While setting up the project part of a course which I was teaching,I decided to provide students with a sort of anonymous marketplace,where they could place mock"for sale"announcements of software modules,without saying who was the source of the advertisement.(The idea,which may or may not have been a good one,was to favor a selection process based only on a precise specification of the modules'advertized facilities.)The mail facility of a famous operating system commonly favored by universities seemed to provide the right base mechanism (why write a new mail system just for a course project?);but naturally that mail facility shows the sender's name when it delivers a message to its recipients.I had access to the source of the corresponding code -a huge C program-and decided,perhaps foolishly,to take that code,remove all references to the sender's name in delivered messages,and recompile
128 ABSTRACT DATA TYPES §6.3 that you retrieve or remove is the oldest one stored but not yet removed. An example of a container which is not a dispenser is an array, where you choose, through integer indices, the positions where you store and retrieve objects. Because the similarities between various kinds of container (dispensers, arrays and others) are more important than the differences between their individual storage, retrieval and removal properties, this book constantly adheres to a standardized terminology which downplays the differences between data structure variants and instead emphasizes the commonality. So the basic operation to retrieve an element will always be called item, the basic operation to remove an element will always be called remove and so on. These naming issues may appear superficial at first — “cosmetic”, as programmers sometimes say. But do not forget that one of our eventual aims is to provide the basis for powerful, professional libraries of reusable software components. Such libraries will contain tens of thousands of available operations. Without a systematic and clear nomenclature, both the developers and the users of these libraries would quickly be swamped in a flood of specific and incompatible names, providing a strong (and unjustifiable) obstacle to large-scale reuse. Naming, then, is not cosmetic. Good reusable software is software that provides the right functionality and provides it under the right names. The names used here for stack operations are part of a systematic set of naming conventions used throughout this book. A later chapter will introduce them in more detail. How not to handle abstractions In software engineering as in other scientific and technical disciplines, a seminal idea may seem obvious once you have been exposed to it, even though it may have taken a long time to emerge. The bad ideas and the complicated ones (they are often the same) often appear first; it takes time for the simple and the elegant to take over. This observation is true of abstract data types. Although good software developers have always (as a result of education or mere instinct) made good use of abstraction, many of the systems in existence today were designed without much consideration of this goal. I once did a little involuntary experiment which provided a good illustration of this state of affairs. While setting up the project part of a course which I was teaching, I decided to provide students with a sort of anonymous marketplace, where they could place mock “for sale” announcements of software modules, without saying who was the source of the advertisement. (The idea, which may or may not have been a good one, was to favor a selection process based only on a precise specification of the modules’ advertized facilities.) The mail facility of a famous operating system commonly favored by universities seemed to provide the right base mechanism (why write a new mail system just for a course project?); but naturally that mail facility shows the sender’s name when it delivers a message to its recipients. I had access to the source of the corresponding code — a huge C program — and decided, perhaps foolishly, to take that code, remove all references to the sender’s name in delivered messages, and recompile. Chapter 26, in particular “CHOOSING THE RIGHT NAMES”, 26.2, page 879
$6.4 FORMALIZING THE SPECIFICATION 129 Aided by a teaching assistant,I thus embarked on a task which seemed obvious enough although not commonly taught in software engineering courses:systematic program deconstruction.Sure enough,we quickly found the first place where the program accessed the sender's name,and we removed the corresponding code.This,we naively thought,would have done the job,so we recompiled and sent a test mail message;but the sender's name was still there!Thus began a long and surreal process:time and again, believing we had finally found the last reference to the sender's name,we would remove it,recompile,and mail a test message,only to find the name duly recorded once again in its habitual field.Like the Hydra in its famous fight,the mailer kept growing a new head every time we thought we had cut the last neck. Finally,repeating for the modern era the earlier feat of Hercules,we slew the beast for good;by then we had removed more than twenty code extracts which all accessed,in some way or other,information about the message sender. Writing MAIL Although the previous sections have only got us barely started on our road to abstract MESSAGE is the data types,it should be clear by now that any program written in accordance with even the topic of exercise E6.4,page161. most elementary concepts of data abstraction would treat MAIL MESSAGE as a carefully defined abstract notion,supporting a query operation,perhaps called sender,which returns information about the message sender.Any portion of the mail program that needs this information would obtain it solely through the sender query.Had the mail program been designed according to this seemingly obvious principle,it would have been sufficient,for the purpose of my little exercise,to modify the code of the sender query. Most likely,the software would also then have provided an associated command operation set sender to update sender information,making the job even easier. What is the real moral of that little story (besides lowering the reader's guard in preparation for the surprise mathematical offensive of the next section)?After all,the mail program in question is successful,at least judging by its widespread use.But it typifies the current quality standard in the industry.Until we move significantly beyond that standard, the phrase "software engineering"will remain a case of wishful thinking Oh yes,one more note.Some time after my brief encounter with the mail program, I read that certain network hackers had intruded into the computer systems of highly guarded government laboratories,using a security hole of that very mail program-a hole which was familiar,so the press reported,to all those in the know.I was not in the know; but,when I learned the news,I was not surprised. 6.4 FORMALIZING THE SPECIFICATION The glimpse of data abstraction presented so far is too informal to be of durable use. Consider again our staple example:a stack,as we now understand it,is defined in terms of the applicable operations;but then we need to define these operations! Informal descriptions as above(put pushes an element"on top of"the stack,remove pops the element"last pushed"and so on)do not suffice.We need to know precisely how these operations can be used by clients,and what they will do for them
§6.4 FORMALIZING THE SPECIFICATION 129 Aided by a teaching assistant, I thus embarked on a task which seemed obvious enough although not commonly taught in software engineering courses: systematic program deconstruction. Sure enough, we quickly found the first place where the program accessed the sender’s name, and we removed the corresponding code. This, we naïvely thought, would have done the job, so we recompiled and sent a test mail message; but the sender’s name was still there! Thus began a long and surreal process: time and again, believing we had finally found the last reference to the sender’s name, we would remove it, recompile, and mail a test message, only to find the name duly recorded once again in its habitual field. Like the Hydra in its famous fight, the mailer kept growing a new head every time we thought we had cut the last neck. Finally, repeating for the modern era the earlier feat of Hercules, we slew the beast for good; by then we had removed more than twenty code extracts which all accessed, in some way or other, information about the message sender. Although the previous sections have only got us barely started on our road to abstract data types, it should be clear by now that any program written in accordance with even the most elementary concepts of data abstraction would treat MAIL_MESSAGE as a carefully defined abstract notion, supporting a query operation, perhaps called sender, which returns information about the message sender. Any portion of the mail program that needs this information would obtain it solely through the sender query. Had the mail program been designed according to this seemingly obvious principle, it would have been sufficient, for the purpose of my little exercise, to modify the code of the sender query. Most likely, the software would also then have provided an associated command operation set_sender to update sender information, making the job even easier. What is the real moral of that little story (besides lowering the reader’s guard in preparation for the surprise mathematical offensive of the next section)? After all, the mail program in question is successful, at least judging by its widespread use. But it typifies the current quality standard in the industry. Until we move significantly beyond that standard, the phrase “software engineering” will remain a case of wishful thinking. Oh yes, one more note. Some time after my brief encounter with the mail program, I read that certain network hackers had intruded into the computer systems of highly guarded government laboratories, using a security hole of that very mail program — a hole which was familiar, so the press reported, to all those in the know. I was not in the know; but, when I learned the news, I was not surprised. 6.4 FORMALIZING THE SPECIFICATION The glimpse of data abstraction presented so far is too informal to be of durable use. Consider again our staple example: a stack, as we now understand it, is defined in terms of the applicable operations; but then we need to define these operations! Informal descriptions as above (put pushes an element “on top of” the stack, remove pops the element “last pushed” and so on) do not suffice. We need to know precisely how these operations can be used by clients, and what they will do for them. Writing MAIL_ MESSAGE is the topic of exercise E6.4, page 161
130 ABSTRACT DATA TYPES $6.4 An abstract data type specification will provide this information.It consists of four paragraphs,explained in the next sections: ·TYPES ·FUNCTIONS. ·AXIOMS ·PRECONDITIONS. These paragraphs will rely on a simple mathematical notation for specifying the properties of an abstract data type (ADT for short). The notation-a mathematical formalism,not to be confused with the software notation of the rest of this book even though for consistency it uses a similar syntactic style-has no name and is not a programming language;it could serve as the starting point for a formal specification language,but we shall not pursue this avenue here, being content enough to use self-explanatory conventions for the unambiguous specification of abstract data types. Specifying types The TYPES paragraph indicates the types being specified.In general,it may be convenient to specify several ADTs together,although our example has only one,STACK. By the way,what is a type?The answer to this question will combine all the ideas developed in the rest of this chapter;a type is a collection of objects characterized by functions,axioms and preconditions.If for the moment you just view a type as a set of objects,in the mathematical sense of the word "set"-type STACK as the set of all possible stacks,type INTEGER as the set of all possible integer values and so on-you are not guilty of any terrible misunderstanding.As you read this discussion you will be able to refine this view.In the meantime the discussion will not be too fussy about using “set”for"type”and conversely. On one point,however,you should make sure to avoid any confusion:an abstract data type such as STACK is not an object(one particular stack)but a collection of objects (the set of all stacks).Remember what our real goal is:finding a good basis for the modules of our software systems.As was noted in the previous chapter,basing a module on one particular object-one stack,one airplane,one bank account-would not make sense.O-O design will enable us to build modules covering the properties of all stacks,all airplanes,all bank accounts-or at least of some stacks,airplanes or accounts. An object belonging to the set of objects described by an ADT specification is called an instance of the ADT.For example,a specific stack which satisfies the properties of the STACK abstract data type will be an instance of STACK.The notion of instance will carry over to object-oriented design and programming,where it will play an important role in explaining the run-time behavior of programs
130 ABSTRACT DATA TYPES §6.4 An abstract data type specification will provide this information. It consists of four paragraphs, explained in the next sections: • TYPES. • FUNCTIONS. • AXIOMS. • PRECONDITIONS. These paragraphs will rely on a simple mathematical notation for specifying the properties of an abstract data type (ADT for short). The notation — a mathematical formalism, not to be confused with the software notation of the rest of this book even though for consistency it uses a similar syntactic style — has no name and is not a programming language; it could serve as the starting point for a formal specification language, but we shall not pursue this avenue here, being content enough to use self-explanatory conventions for the unambiguous specification of abstract data types. Specifying types The TYPES paragraph indicates the types being specified. In general, it may be convenient to specify several ADTs together, although our example has only one, STACK. By the way, what is a type? The answer to this question will combine all the ideas developed in the rest of this chapter; a type is a collection of objects characterized by functions, axioms and preconditions. If for the moment you just view a type as a set of objects, in the mathematical sense of the word “set” — type STACK as the set of all possible stacks, type INTEGER as the set of all possible integer values and so on — you are not guilty of any terrible misunderstanding. As you read this discussion you will be able to refine this view. In the meantime the discussion will not be too fussy about using “set” for “type” and conversely. On one point, however, you should make sure to avoid any confusion: an abstract data type such as STACK is not an object (one particular stack) but a collection of objects (the set of all stacks). Remember what our real goal is: finding a good basis for the modules of our software systems. As was noted in the previous chapter, basing a module on one particular object — one stack, one airplane, one bank account — would not make sense. O-O design will enable us to build modules covering the properties of all stacks, all airplanes, all bank accounts — or at least of some stacks, airplanes or accounts. An object belonging to the set of objects described by an ADT specification is called an instance of the ADT. For example, a specific stack which satisfies the properties of the STACK abstract data type will be an instance of STACK. The notion of instance will carry over to object-oriented design and programming, where it will play an important role in explaining the run-time behavior of programs