23 Principles of class design erinere develoers knw that fewisathan theroe design of module interfaces.In a multi-person,or just multi-week software project,many of the decisions,discussions,disputes and confusions tend to revolve around matters of module interface specification:"Who takes care of making sure that...?","But I thought you only passed me normalized input...","Why are you processing this since I already took care of it??” If there were just one advantage to expect from object technology,this would have to be it.From the outset of this presentation,object-oriented development has been described as an architectural technique for producing systems made of coherent,properly interfaced modules.We have now accumulated enough technical background to review the design principles through which you can take advantage of the best O-O mechanisms to develop modules with attractive interfaces. In the following pages we will explore a set of class design principles which extensive practice has shown to yield quality and durability.Because what determines the success of a class is how it will look to its clients,the emphasis here is not on the internal implementation of a class but on how to make its interface simple,easy to learn,easy to remember,and able to withstand the test of time and change. We will successively examine:whether functions should be permitted to have side effects;how many arguments a feature should reasonably have,and the associated notions of operand and option;whether you should be concerned about the size of your classes; making abstract structures active;the role of selective exports;how to document a class; how to deal with abnormal cases. From this discussion will emerge an image of the class designer as a patient craftsman who chisels out and polishes each class to make it as attractive as possible to clients.This spirit of treating classes as carefully engineered products,aiming at perfection from the start and yet always perfectible,is a pervasive quality of well-applied object technology.For obvious reasons it is particularly visible in the construction of library classes,and indeed many of the design principles reviewed in this chapter originated in library design;in the same way that successful ideas first tried in Formula 1 racing eventually trickle down to the engineering of cars for the rest of us,a technique that has shown its value by surviving the toughest possible test-being applied to the development of a successful library of reusable components-will eventually benefit all object-oriented software,whether or not initially intended for reuse
23 Principles of class design Experienced software developers know that few issues are more critical than the proper design of module interfaces. In a multi-person, or just multi-week software project, many of the decisions, discussions, disputes and confusions tend to revolve around matters of module interface specification: “Who takes care of making sure that…?”, “But I thought you only passed me normalized input…”, “Why are you processing this since I already took care of it?”. If there were just one advantage to expect from object technology, this would have to be it. From the outset of this presentation, object-oriented development has been described as an architectural technique for producing systems made of coherent, properly interfaced modules. We have now accumulated enough technical background to review the design principles through which you can take advantage of the best O-O mechanisms to develop modules with attractive interfaces. In the following pages we will explore a set of class design principles which extensive practice has shown to yield quality and durability. Because what determines the success of a class is how it will look to its clients, the emphasis here is not on the internal implementation of a class but on how to make its interface simple, easy to learn, easy to remember, and able to withstand the test of time and change. We will successively examine: whether functions should be permitted to have side effects; how many arguments a feature should reasonably have, and the associated notions of operand and option; whether you should be concerned about the size of your classes; making abstract structures active; the role of selective exports; how to document a class; how to deal with abnormal cases. From this discussion will emerge an image of the class designer as a patient craftsman who chisels out and polishes each class to make it as attractive as possible to clients. This spirit of treating classes as carefully engineered products, aiming at perfection from the start and yet always perfectible, is a pervasive quality of well-applied object technology. For obvious reasons it is particularly visible in the construction of library classes, and indeed many of the design principles reviewed in this chapter originated in library design; in the same way that successful ideas first tried in Formula 1 racing eventually trickle down to the engineering of cars for the rest of us, a technique that has shown its value by surviving the toughest possible test — being applied to the development of a successful library of reusable components — will eventually benefit all object-oriented software, whether or not initially intended for reuse
748 DESIGNING CLASS INTERFACES $23.1 23.1 SIDE EFFECTS IN FUNCTIONS The first question that we must address will have a deep effect on the style of our designs. Is it legitimate for functions-routines that return a result-also to produce a side effect, that is to say,to change something in their environment? The gist of the answer is no,but we must first understand the role of side effects,and distinguish between good and potentially bad side effects.We must also discuss the question in light of all we now know about classes:their filiation from abstract data types, the notion of abstraction function,and the role of class invariants. Commands and queries A few reminders on terminology will be useful.The features that characterize a class are "Atributes and ro divided into commands and queries.A command serves to modify objects,a query to tines".page 173. return information about objects.A command is implemented as a procedure.A query may be implemented either as an attribute,that is to say by reserving a field in each run- time instance of the class to hold the corresponding value,or as a function,that is to say through an algorithm that computes the value when needed.Procedures(which also have an associated algorithm)and functions are together called routines. The definition of queries does not specify whether in the course of producing its result a query may change objects.For commands,the answer is obviously yes,since it is the role of commands(procedures)to change things.Among queries,the question only makes sense for functions,since accessing an attribute cannot change anything.A change performed by a function is known as a side effect to indicate that it is ancillary to the function's official purpose of answering a query.Should we permit side effects? Forms of side effect Let us define precisely what constructs may cause side effects.The basic operation that changes an object is an assignment a:=b(or an assignment attempt a ?b,or a creation instruction !a)where the target a is an attribute;execution of this operation will assign a new value to the field of the corresponding object(the target of the current routine call). We only care about such assignments when a is an attribute:if a is a local entity,its value is only used during an execution of the routine and assignments to it have no permanent effect,if a is the entity Result denoting the result of the routine,assignments to it help compute that result but have no effect on objects. Also note that as a result of information hiding principles we have been careful,in "The client's privi- the design of the object-oriented notation,to avoid any indirect form of object leges on an modification.In particular,the syntax excludes assignments of the form obj.attr:=b, attribute”page206. whose aim has to be achieved through a call obj.set attr(b),where the procedure set attr(x:...)performs the attribute assignment attr:=x. The attribute assignment that causes a function to produce a side effect may be in the function itself,or in another routine that the function calls.Hence the full definition:
748 DESIGNING CLASS INTERFACES §23.1 23.1 SIDE EFFECTS IN FUNCTIONS The first question that we must address will have a deep effect on the style of our designs. Is it legitimate for functions — routines that return a result — also to produce a side effect, that is to say, to change something in their environment? The gist of the answer is no, but we must first understand the role of side effects, and distinguish between good and potentially bad side effects. We must also discuss the question in light of all we now know about classes: their filiation from abstract data types, the notion of abstraction function, and the role of class invariants. Commands and queries A few reminders on terminology will be useful. The features that characterize a class are divided into commands and queries. A command serves to modify objects, a query to return information about objects. A command is implemented as a procedure. A query may be implemented either as an attribute, that is to say by reserving a field in each runtime instance of the class to hold the corresponding value, or as a function, that is to say through an algorithm that computes the value when needed. Procedures (which also have an associated algorithm) and functions are together called routines. The definition of queries does not specify whether in the course of producing its result a query may change objects. For commands, the answer is obviously yes, since it is the role of commands (procedures) to change things. Among queries, the question only makes sense for functions, since accessing an attribute cannot change anything. A change performed by a function is known as a side effect to indicate that it is ancillary to the function’s official purpose of answering a query. Should we permit side effects? Forms of side effect Let us define precisely what constructs may cause side effects. The basic operation that changes an object is an assignment a := b (or an assignment attempt a ?= b, or a creation instruction !! a) where the target a is an attribute; execution of this operation will assign a new value to the field of the corresponding object (the target of the current routine call). We only care about such assignments when a is an attribute: if a is a local entity, its value is only used during an execution of the routine and assignments to it have no permanent effect; if a is the entity Result denoting the result of the routine, assignments to it help compute that result but have no effect on objects. Also note that as a result of information hiding principles we have been careful, in the design of the object-oriented notation, to avoid any indirect form of object modification. In particular, the syntax excludes assignments of the form obj ● attr := b, whose aim has to be achieved through a call obj ● set_attr (b), where the procedure set_attr (x:…) performs the attribute assignment attr := x. The attribute assignment that causes a function to produce a side effect may be in the function itself, or in another routine that the function calls. Hence the full definition: “Attributes and routines”, page 173. “The client’s privileges on an attribute”, page 206
$23.1 SIDE EFFECTS IN FUNCTIONS 749 Definition:concrete side effect A function produces a concrete side effect if its body contains any of the following: An assignment,assignment attempt or creation instruction whose target is an attribute. ·A procedure call. (The term "concrete"will be explained below.)In a more fine-tuned definition we would replace the second clause by "A call to a routine that (recursively)produces a concrete side effect",the definition of side effects being extended to arbitrary routines rather than just functions.But the above form is preferable in practice even though it may be considered both too strong and too weak: The definition seems too strong because any procedure call is considered to produce a side effect whereas it is possible to write a procedure that changes nothing.Such procedures,however,are rarely useful-except if their role is to change something in the software's environment,for example printing a page,sending a message to the network or moving a robot arm;but then we do want to consider this a side effect even if it does not directly affect an object of the software itself. The definition seems too weak because it ignores the case of a function that calls a side-effect-producing function g.The convention will simply be that can still be considered side-effect-free.This is acceptable because the rule at which we will arrive in this discussion will prohibit all side effects of a certain kind,so we will need to certify each function separately. The advantage of these conventions is that to determine the side-effect status of a function you only need to look at the body of the function itself.It is in fact trivial,if you have a parser for the language,to write a simple tool that will analyze a function and tell you whether it produces a concrete side effect according to the definition. Referential transparency Why should we be concerned about side effects in functions?After all it is in the nature of software execution to change things. “ntroducing a more The problem is that if we allow functions to change things as well as commands,we imperative view”, lose many of the simple mathematical properties that enable us to reason about our page 145. software.As noted in the discussion of abstract data types,when we first encountered the distinction between the applicative and the imperative,mathematics is change-free:it talks about abstract objects and defines operations on these objects,but the operations do not change the objects.(Computing 2 does not change the number two.)This immutability is the principal difference between the worlds of mathematics and computer software
§23.1 SIDE EFFECTS IN FUNCTIONS 749 (The term “concrete” will be explained below.) In a more fine-tuned definition we would replace the second clause by “A call to a routine that (recursively) produces a concrete side effect”, the definition of side effects being extended to arbitrary routines rather than just functions. But the above form is preferable in practice even though it may be considered both too strong and too weak: • The definition seems too strong because any procedure call is considered to produce a side effect whereas it is possible to write a procedure that changes nothing. Such procedures, however, are rarely useful — except if their role is to change something in the software’s environment, for example printing a page, sending a message to the network or moving a robot arm; but then we do want to consider this a side effect even if it does not directly affect an object of the software itself. • The definition seems too weak because it ignores the case of a function f that calls a side-effect-producing function g. The convention will simply be that f can still be considered side-effect-free. This is acceptable because the rule at which we will arrive in this discussion will prohibit all side effects of a certain kind, so we will need to certify each function separately. The advantage of these conventions is that to determine the side-effect status of a function you only need to look at the body of the function itself. It is in fact trivial, if you have a parser for the language, to write a simple tool that will analyze a function and tell you whether it produces a concrete side effect according to the definition. Referential transparency Why should we be concerned about side effects in functions? After all it is in the nature of software execution to change things. The problem is that if we allow functions to change things as well as commands, we lose many of the simple mathematical properties that enable us to reason about our software. As noted in the discussion of abstract data types, when we first encountered the distinction between the applicative and the imperative, mathematics is change-free: it talks about abstract objects and defines operations on these objects, but the operations do not change the objects. (Computing does not change the number two.) This immutability is the principal difference between the worlds of mathematics and computer software. Definition: concrete side effect A function produces a concrete side effect if its body contains any of the following: • An assignment, assignment attempt or creation instruction whose target is an attribute. • A procedure call. “Introducing a more imperative view”, page 145. 2
750 DESIGNING CLASS INTERFACES $23.1 Some approaches to programming seek to retain the immutability of mathematics:Lisp in its so-called"pure"form,"Functional Programming"languages such as Backus's FP, and other applicative languages shun change.But they have not caught on for practical software development,suggesting that change is a fundamental property of software. The object immutability of mathematics has an important practical consequence known as referential transparency,a property defined as follows: Definition:referential transparency Definition from “The Free On-Lime An expression e is referentially transparent if it is possible to exchange any Dictionary of Com- subexpression with its value without changing the value of e. puting"',htp∥ wombat. Ifx has value three,we can use x instead of 3,or conversely,in any part of a The Swifi quotation referentially transparent expression.(Only Swift's Laputa academicians were willing to was on page 672. pay the true price of renouncing referential transparency:always carrying around all the things you will ever want to talk about.)As a consequence of the definition,if we know that x and y have the same value,we can use one interchangeably with the other.For that reason referential transparency is also called "substitutivity of equals for equals". With side-effect-producing functions,referential transparency disappears.Assume a class contains the attribute and the function attr:INTEGER sneaky:INTEGER is do attr:=attr 1 end Remember that Result in an integer Then the value of sneaky (meaning:of a call to that function)is always 0;but you function is initial- cannot use 0 and sneaky interchangeably,since an extract of the form ized to zero. attr =0;if attr /=0 then print ("Something bizarre!")end will print nothing,but would print Something bizarre!if you replaced 0 by sneaky. Maintaining referential transparency in expressions is important to enable us to See [Dijkstra 1968] reason about our software.One of the central issues of software construction,analyzed clearly by Dijkstra many years ago,is the difficulty of getting a clear picture of the dynamic behavior(the myriad possible executions of even a simple software element) from its static description(the text of the element).In this effort it is essential to be able to rely on the proven form of reasoning,provided by mathematics.With the demise of referential transparency,however,we lose basic properties of mathematics,so deeply rooted in our practice that we may not even be aware of them.For example,it is no longer true that n+n is the same thing as 2 n if n is the sneaky-like function n:INTEGER is do attr :attr 1:Result:=attr end since,with attr initially zero,2 n will return 2 whereas n+n will return 3. By limiting ourselves to functions that do not produce side effects,we will ensure that talking about "functions"in software ceases to betray the meaning of this term in ordinary mathematics.We will maintain a clear distinction between commands,which
750 DESIGNING CLASS INTERFACES §23.1 Some approaches to programming seek to retain the immutability of mathematics: Lisp in its so-called “pure” form, “Functional Programming” languages such as Backus’s FP, and other applicative languages shun change. But they have not caught on for practical software development, suggesting that change is a fundamental property of software. The object immutability of mathematics has an important practical consequence known as referential transparency, a property defined as follows: If x has value three, we can use x instead of 3, or conversely, in any part of a referentially transparent expression. (Only Swift’s Laputa academicians were willing to pay the true price of renouncing referential transparency: always carrying around all the things you will ever want to talk about.) As a consequence of the definition, if we know that x and y have the same value, we can use one interchangeably with the other. For that reason referential transparency is also called “substitutivity of equals for equals”. With side-effect-producing functions, referential transparency disappears. Assume a class contains the attribute and the function attr: INTEGER sneaky: INTEGER is do attr := attr + 1 end Then the value of sneaky (meaning: of a call to that function) is always 0; but you cannot use 0 and sneaky interchangeably, since an extract of the form attr := 0; if attr /= 0 then print ("Something bizarre!") end will print nothing, but would print Something bizarre! if you replaced 0 by sneaky. Maintaining referential transparency in expressions is important to enable us to reason about our software. One of the central issues of software construction, analyzed clearly by Dijkstra many years ago, is the difficulty of getting a clear picture of the dynamic behavior (the myriad possible executions of even a simple software element) from its static description (the text of the element). In this effort it is essential to be able to rely on the proven form of reasoning, provided by mathematics. With the demise of referential transparency, however, we lose basic properties of mathematics, so deeply rooted in our practice that we may not even be aware of them. For example, it is no longer true that n + n is the same thing as 2 ∗ n if n is the sneaky-like function n: INTEGER is do attr := attr + 1; Result := attr end since, with attr initially zero, 2 ∗ n will return 2 whereas n + n will return 3. By limiting ourselves to functions that do not produce side effects, we will ensure that talking about “functions” in software ceases to betray the meaning of this term in ordinary mathematics. We will maintain a clear distinction between commands, which Definition: referential transparency An expression e is referentially transparent if it is possible to exchange any subexpression with its value without changing the value of e. Definition from “The Free On-Line Dictionary of Computing”, http:// wombat. The Swift quotation was on page 672. Remember that Result in an integer function is initialized to zero. See [Dijkstra 1968]
$23.1 SIDE EFFECTS IN FUNCTIONS 751 change objects but do not directly return results,and queries,which provide information about objects but do not change them. Another way to express this rule informally is to state that asking a question should not change the answer. Objects as machines The following principle expresses the prohibition in more precise terms: Command-Query Separation principle Functions should not produce abstract side effects. The definition of Note that we have only defined concrete side effects so far;for the moment you can abstract side effects ignore the difference. appears on page 757. As a result of the principle,only commands(procedures)will be permitted to produce side effects.(In fact,as noted,we not only permit but expect them to change objects- unlike in applicative,completely side-effect-free approaches.) A list object as list machine start forth 80 Dut search item before after index count The view of objects that emerges from this discussion (a metaphor,to be treated with care as usual)is that of a machine,with an internal state that is not directly observable,and two kinds ofbutton:command buttons,rectangular on the picture,and query buttons,round. Object lifecycle pic- Pressing a command button is a way to make the machine change state:it starts moving ture:page 366. and clicking,then comes back to a new stable state (one of the states shown in the earlier picture of object lifecycle).You cannot directly see the state-open the machine-but you can press a query button.This does not change the state(remember:asking a question does not change the answer)but yields a response in the form of a message appearing in the display panel at the top;for boolean queries one of the two indicators in the display panel
§23.1 SIDE EFFECTS IN FUNCTIONS 751 change objects but do not directly return results, and queries, which provide information about objects but do not change them. Another way to express this rule informally is to state that asking a question should not change the answer. Objects as machines The following principle expresses the prohibition in more precise terms: Note that we have only defined concrete side effects so far; for the moment you can ignore the difference. As a result of the principle, only commands (procedures) will be permitted to produce side effects. (In fact, as noted, we not only permit but expect them to change objects — unlike in applicative, completely side-effect-free approaches.) The view of objects that emerges from this discussion (a metaphor, to be treated with care as usual) is that of a machine, with an internal state that is not directly observable, and two kinds of button: command buttons, rectangular on the picture, and query buttons, round. Pressing a command button is a way to make the machine change state: it starts moving and clicking, then comes back to a new stable state (one of the states shown in the earlier picture of object lifecycle). You cannot directly see the state — open the machine — but you can press a query button. This does not change the state (remember: asking a question does not change the answer) but yields a response in the form of a message appearing in the display panel at the top; for boolean queries one of the two indicators in the display panel, Command-Query Separation principle Functions should not produce abstract side effects. The definition of abstract side effects appears on page 757. A list object as list machine start forth go put search item before after index count Object lifecycle picture: page 366
752 DESIGNING CLASS INTERFACES $23.1 representing true and false,will light up.If you press the button several times in a row, without touching the command buttons,you will get the same result each time.If,on the other hand,you push a command button and then a query button,the answer that you get will usually be different from what you would have obtained before the command. Commands as well as queries may take arguments;these are figuratively entered in the slot at the top left. The figure is based on the example of a list object with the kind of interface hinted at in earlier chapters and studied in more detail later in the present one.Commands include start(move the cursor to the first element),forth(advance the cursor one position),search (move the cursor to the next occurrence of the element entered into the top-left slot); queries include item(show in the display panel the value of the element at cursor position) and index(show the current cursor position).Note the difference between a notion such as "cursor",relative to the internal state and hence not directly visible,and item or index which provide more abstract,officially exported information about the state. Functions that create objects A technical point needs to be clarified before we examine further consequences of the Command-Query Separation principle:should we treat object creation as a side effect? The answer is yes,as we have seen,if the target of the creation is an attribute a:in this case,the instruction !a changes the value of an object's field.The answer is no if the target is a local entity of the routine.But what if the target is the result of the function itself, as in !Result or the more general form !Result.make (...) Such a creation instruction need not be considered a side effect.It does not change any existing object and so does not endanger referential transparency (at least if we assume that there is enough memory to allocate all the objects we need).From a mathematical perspective we may pretend that all of the objects of interest,for all times past,present and future,are already inscribed in the Great Book of Objects;a creation instruction is just a way to obtain one of them,but it does not by itself change anything in the environment.It is common,and legitimate,for a function to create,initialize and return such an object. These observations assume that in the second form the creation procedure make does not produce side effects on any object other than the one being created. A clean style for class interfaces From the Command-Query Separation principle followsa style of design that yields simple and readable software,and tremendously helps reliability,reusability and extendibility. As you may have realized,this style is very different from the dominant practices of today,as fostered in particular by the C programming language.The predilection of C for side effects-for ignoring the difference between an action and a value-is not just a feature of the common C style (it sometimes seems just psychologically impossible for a C programmer to resist the temptation,when accessing a value,also to modify it a little in
752 DESIGNING CLASS INTERFACES §23.1 representing true and false, will light up. If you press the button several times in a row, without touching the command buttons, you will get the same result each time. If, on the other hand, you push a command button and then a query button, the answer that you get will usually be different from what you would have obtained before the command. Commands as well as queries may take arguments; these are figuratively entered in the slot at the top left. The figure is based on the example of a list object with the kind of interface hinted at in earlier chapters and studied in more detail later in the present one. Commands include start (move the cursor to the first element), forth (advance the cursor one position), search (move the cursor to the next occurrence of the element entered into the top-left slot); queries include item (show in the display panel the value of the element at cursor position) and index (show the current cursor position). Note the difference between a notion such as “cursor”, relative to the internal state and hence not directly visible, and item or index which provide more abstract, officially exported information about the state. Functions that create objects A technical point needs to be clarified before we examine further consequences of the Command-Query Separation principle: should we treat object creation as a side effect? The answer is yes, as we have seen, if the target of the creation is an attribute a: in this case, the instruction !! a changes the value of an object’s field. The answer is no if the target is a local entity of the routine. But what if the target is the result of the function itself, as in !! Result or the more general form !! Result ● make (…)? Such a creation instruction need not be considered a side effect. It does not change any existing object and so does not endanger referential transparency (at least if we assume that there is enough memory to allocate all the objects we need). From a mathematical perspective we may pretend that all of the objects of interest, for all times past, present and future, are already inscribed in the Great Book of Objects; a creation instruction is just a way to obtain one of them, but it does not by itself change anything in the environment. It is common, and legitimate, for a function to create, initialize and return such an object. These observations assume that in the second form the creation procedure make does not produce side effects on any object other than the one being created. A clean style for class interfaces From the Command-Query Separation principle follows a style of design that yields simple and readable software, and tremendously helps reliability, reusability and extendibility. As you may have realized, this style is very different from the dominant practices of today, as fostered in particular by the C programming language. The predilection of C for side effects — for ignoring the difference between an action and a value — is not just a feature of the common C style (it sometimes seems just psychologically impossible for a C programmer to resist the temptation, when accessing a value, also to modify it a little in
$23.1 SIDE EFFECTS IN FUNCTIONS 753 passing);it is embedded deeply into the language,with such constructs asx++,meaning: return the value of x,then increase it by one-saving a few keystrokes iny=x++ compared to y=x;x:=x+1,and not to be confused with ++x which increments before computing the value.A whole civilization,in fact,is built on side effects It would be foolish to dismiss this side-effect-full style as thoughtless;its widespread use shows that many people have found it convenient,and it may even be part of the reason for the amazing success of C and its derivatives.But what was attractive in the nineteen-seventies and eighties -when the software development population was growing by an order of magnitude every few years,and the emphasis was on getting some kind of job done rather than on long-term quality-may not be appropriate for the software technology of the twenty-first century.There we want software that will grow with us,software that we can understand,explain,maintain,reuse and trust.The Command-Query Separation principle is one of the required conditions for these goals. Applying a strict separation between commands and queries by prohibiting abstract side effects in functions is particularly appropriate for the development of large systems, where the key to success is to exert full control on every inter-module interaction. If you have been used to the converse style,you may at first,like many people,find the new one too extreme.But after starting to practice it I think you will quickly realize its benefits. Quietly,the preceding chapters have already applied Command-Query Separation to its full extent.You may remember for example that the interface for all our stack classes included a procedure remove describing the operation of popping a stack(removing the top element),and a function or attribute item which yields the top element.The first is a command,the second a query.In other approaches you might have seen a routine pop which both removes the element and returns it-a side-effect-producing function.This example has,I hope,been studied in enough depth to show the gains of clarity and simplicity that we achieve by keeping the two aspects cleanly separated. Other consequences of the principles may seem more alarming at first.For reading input,many people are used to the style of using functions such as getint-the C name, but its equivalent exists in many other languages-whose effect is to read a new input element and return its value.This is a side-effect-producing function in all its splendor:a call to the function,written getint ()-with the empty parentheses so unmistakably characteristic of the C look-and-feel-does not just return a value but affects the context ("asking a question changes the answer"),as typical consequences,excluding the chance case in which the input has two identical consecutive values: If you call getint ()twice you will get different answers. getint ()+getint (and 2 getint()will not yield the same value.(If an overzealous "optimizing"compiler treats the first expression like the second,you will report a bug to the compiler vendor,and you will be right.) In other words,we lose the benefits of referential transparency-of reasoning about software functions as if they were mathematical functions,with a crystal-clear view of how we can build expressions from them and what values these expressions will denote
§23.1 SIDE EFFECTS IN FUNCTIONS 753 passing); it is embedded deeply into the language, with such constructs as x++, meaning: return the value of x, then increase it by one — saving a few keystrokes in y = x++ compared to y = x; x := x+1, and not to be confused with ++x which increments before computing the value. A whole civilization, in fact, is built on side effects. It would be foolish to dismiss this side-effect-full style as thoughtless; its widespread use shows that many people have found it convenient, and it may even be part of the reason for the amazing success of C and its derivatives. But what was attractive in the nineteen-seventies and eighties — when the software development population was growing by an order of magnitude every few years, and the emphasis was on getting some kind of job done rather than on long-term quality — may not be appropriate for the software technology of the twenty-first century. There we want software that will grow with us, software that we can understand, explain, maintain, reuse and trust. The Command-Query Separation principle is one of the required conditions for these goals. Applying a strict separation between commands and queries by prohibiting abstract side effects in functions is particularly appropriate for the development of large systems, where the key to success is to exert full control on every inter-module interaction. If you have been used to the converse style, you may at first, like many people, find the new one too extreme. But after starting to practice it I think you will quickly realize its benefits. Quietly, the preceding chapters have already applied Command-Query Separation to its full extent. You may remember for example that the interface for all our stack classes included a procedure remove describing the operation of popping a stack (removing the top element), and a function or attribute item which yields the top element. The first is a command, the second a query. In other approaches you might have seen a routine pop which both removes the element and returns it — a side-effect-producing function. This example has, I hope, been studied in enough depth to show the gains of clarity and simplicity that we achieve by keeping the two aspects cleanly separated. Other consequences of the principles may seem more alarming at first. For reading input, many people are used to the style of using functions such as getint — the C name, but its equivalent exists in many other languages — whose effect is to read a new input element and return its value. This is a side-effect-producing function in all its splendor: a call to the function, written getint () — with the empty parentheses so unmistakably characteristic of the C look-and-feel — does not just return a value but affects the context (“asking a question changes the answer”); as typical consequences, excluding the chance case in which the input has two identical consecutive values: • If you call getint () twice you will get different answers. • getint () + getint () and 2 ∗ getint () will not yield the same value. (If an overzealous “optimizing” compiler treats the first expression like the second, you will report a bug to the compiler vendor, and you will be right.) In other words, we lose the benefits of referential transparency — of reasoning about software functions as if they were mathematical functions, with a crystal-clear view of how we can build expressions from them and what values these expressions will denote
754 DESIGNING CLASS INTERFACES $23.1 The Command-Query Separation principle brings referential transparency back.Here this means that we will distinguish between the procedure that advances the input cursor to the next item and the function or attribute that yields the item last read.Assume input is of type FILE;the instructions to read the next integer from file input will be something like input.advance n:=input.last integer If you call last integer ten times in a row you will,unlike with getint,get ten times the same result.If you are new to this style,it may take some getting used to;but the resulting simplicity and clarity will soon remove any temptation to go back to side effects. In this example as in the x++case seen earlier,the traditional form beats the object- oriented one if the goal of the game is to minimize keystrokes.This illustrates a general observation:the productivity gains of object technology will not derive from trying to be as terse as possible on a microscopic scale(a game at which APL or modern"scripting languages"such as Perl will always win against a good O-O language).The achievements are on the global structure of a system:through reuse,through such mechanisms as genericity and garbage collection,through the use of assertions,you can decrease the size of your software by amounts far higher than anything you can achieve by skimping by a character here or a line there.Keystroke-wise is often system-foolish. Pseudo-random number generators:a design exercise An example sometimes quoted in favor of functions with side effects is that of pseudo- random number generators,which return successive values from a sequence enjoying adequate statistical properties.The sequence is initialized by a call of the form random seed (seed) where seed is a seed value provided by the client.A common way to get the successive pseudo-random values is by calling a function: xx:=next random ( But here too there is no reason to make an exception to the command/query dichotomy.Before looking at the solution let us just forget that we have seen the above and restart from scratch by asking the question:how should we handle random generation in an object-oriented context?This will provide the opportunity of a little design exercise, and will enable us,if the need arises,to explain the results to someone whose view has not been unduly influenced by pre-O-O approaches. As always in object technology,the relevant question-often the only one-is: What are the data abstractions? The relevant abstraction here is not "random number generation"or "random number generator",both of them quite functional in nature,focusing on what the system does rather than what it does it to. Probing further,we might think "random number",but that is not the right answer yet.Remember,a data abstraction is characterized by features-commands and queries; it is hard to think of features applicable to "random number
754 DESIGNING CLASS INTERFACES §23.1 The Command-Query Separation principle brings referential transparency back. Here this means that we will distinguish between the procedure that advances the input cursor to the next item and the function or attribute that yields the item last read. Assume input is of type FILE; the instructions to read the next integer from file input will be something like input ● advance n := input ● last_integer If you call last_integer ten times in a row you will, unlike with getint, get ten times the same result. If you are new to this style, it may take some getting used to; but the resulting simplicity and clarity will soon remove any temptation to go back to side effects. In this example as in the x++ case seen earlier, the traditional form beats the objectoriented one if the goal of the game is to minimize keystrokes. This illustrates a general observation: the productivity gains of object technology will not derive from trying to be as terse as possible on a microscopic scale (a game at which APL or modern “scripting languages” such as Perl will always win against a good O-O language). The achievements are on the global structure of a system: through reuse, through such mechanisms as genericity and garbage collection, through the use of assertions, you can decrease the size of your software by amounts far higher than anything you can achieve by skimping by a character here or a line there. Keystroke-wise is often system-foolish. Pseudo-random number generators: a design exercise An example sometimes quoted in favor of functions with side effects is that of pseudorandom number generators, which return successive values from a sequence enjoying adequate statistical properties. The sequence is initialized by a call of the form random_seed (seed) where seed is a seed value provided by the client. A common way to get the successive pseudo-random values is by calling a function: xx := next_random () But here too there is no reason to make an exception to the command/query dichotomy. Before looking at the solution let us just forget that we have seen the above and restart from scratch by asking the question: how should we handle random generation in an object-oriented context? This will provide the opportunity of a little design exercise, and will enable us, if the need arises, to explain the results to someone whose view has not been unduly influenced by pre-O-O approaches. As always in object technology, the relevant question — often the only one — is: What are the data abstractions? The relevant abstraction here is not “random number generation” or “random number generator”, both of them quite functional in nature, focusing on what the system does rather than what it does it to. Probing further, we might think “random number”, but that is not the right answer yet. Remember, a data abstraction is characterized by features — commands and queries; it is hard to think of features applicable to “random number
$23.1 SIDE EFFECTS IN FUNCTIONS 755 “Discovery and That "random number"leads to a dead end illustrates the Class Elicitation principle rejection",page 725. encountered when we studied the general rules for finding the classes:a key step may be to reject inappropriate candidates.And once again we see that not all promising nouns yield classes:were a"requirements document"written for this problem,the noun random number would certainly figure prominently in it. A random number does not mean much by itself;it must be understood in relation to its predecessors and successors in the sequence. Wait a minute-here we have it:sequence,more precisely pseudo-random number sequence.This is the abstraction we have been looking for,a perfectly legitimate data abstraction,similar to the cursor lists we have seen on a number of occasions,only infinite (do not look for an after boolean query!).Features will include: Commands:make-initialize with a certain seed;forth-advance to next element. Queries:item-return the element at cursor position. An infinite list item as a machine start To get a new random number sequence rand,clients will use !rand.make (seed),to advance to the next value,they will call rand.forth;and they will obtain the current value by xx:=rand.item. There is really nothing specific to random number sequences in the interface,except for the seed argument to the creation procedure.Adding a start procedure which brings the cursor to the first item (and which make may call for random number sequences),what we have is the framework for a deferred class COUNTABLE SEOUENCE describing arbitrary infinite sequences.Think for example of how to model prime numbers in an object-oriented way;the answer is the same:define a class PRIMES,an heir to COUNTABLE SEOUENCE,whose successive elements are the prime numbers.Other sequences-Fibonacci numbers and the like-will be modeled in the same way. These examples illustrate in passing that contrary to popular belief it is quite possible,and even trivial,to represent infinite structures on a computer.Abstract data types provide the key:a structure is entirely defined by the applicable operations,of which there is of course a finite number,three in this case-starl,forth,item-plus any auxil iary features we may want to add.The trick,of course,is that any execution will only try to evaluate a finite number of elements of an infinite structure. M1994a COUNTABLE SEOUENCE and its heirs such as PRIMES are part of the universal computing science hierarchy described in the companion guide to reusable components
§23.1 SIDE EFFECTS IN FUNCTIONS 755 That “random number” leads to a dead end illustrates the Class Elicitation principle encountered when we studied the general rules for finding the classes: a key step may be to reject inappropriate candidates. And once again we see that not all promising nouns yield classes: were a “requirements document” written for this problem, the noun random number would certainly figure prominently in it. A random number does not mean much by itself; it must be understood in relation to its predecessors and successors in the sequence. Wait a minute — here we have it: sequence, more precisely pseudo-random number sequence. This is the abstraction we have been looking for; a perfectly legitimate data abstraction, similar to the cursor lists we have seen on a number of occasions, only infinite (do not look for an after boolean query!). Features will include: • Commands: make — initialize with a certain seed; forth — advance to next element. • Queries: item — return the element at cursor position. To get a new random number sequence rand, clients will use !! rand ● make (seed); to advance to the next value, they will call rand ● forth; and they will obtain the current value by xx := rand ● item. There is really nothing specific to random number sequences in the interface, except for the seed argument to the creation procedure. Adding a start procedure which brings the cursor to the first item (and which make may call for random number sequences), what we have is the framework for a deferred class COUNTABLE_SEQUENCE describing arbitrary infinite sequences. Think for example of how to model prime numbers in an object-oriented way; the answer is the same: define a class PRIMES, an heir to COUNTABLE_SEQUENCE, whose successive elements are the prime numbers. Other sequences — Fibonacci numbers and the like — will be modeled in the same way. These examples illustrate in passing that contrary to popular belief it is quite possible, and even trivial, to represent infinite structures on a computer. Abstract data types provide the key: a structure is entirely defined by the applicable operations, of which there is of course a finite number, three in this case — start, forth, item — plus any auxiliary features we may want to add. The trick, of course, is that any execution will only try to evaluate a finite number of elements of an infinite structure. COUNTABLE_SEQUENCE and its heirs such as PRIMES are part of the universal computing science hierarchy described in the companion guide to reusable components. “Discovery and rejection”, page 725. An infinite list as a machine forth item start [M 1994a]
756 DESIGNING CLASS INTERFACES $23.1 Abstract state,concrete state From the discussion of referential transparency it would seem desirable to bar all concrete "If it is barogue.fix side effects from functions.Such a rule would have the advantage that-in line with one it".page 670. of our methodology precepts-we could build it into the language,since a compiler can easily detect concrete side effects(as we saw just after the definition of this notion). Unfortunately,this would be unacceptably restrictive,explaining why the Command-Query Separation principle only prohibits abstract side effects,a notion that will now be defined.The problem is that some concrete side effects are not only harmless but necessary.They are of two kinds. The first category includes functions which,in the course of their execution,modify the state,sometimes drastically,and affecting very visible features;but then they restore the original state.Consider for example a class describing integer lists with cursor,and the following function for computing the maximum of a list: max is --The highest value of items in the list require not empty local original index:INTEGER do original index index from start;Result item until is last loop forth.Result :Result.max (item) end go (original index) end To traverse the list,the algorithm needs to move the cursor over all elements.The function,calling such procedures as start,forth and go,is indeed full of concrete side effects on the cursor position;but it begins by noting the cursor position into original index and ends by returning the cursor to that position through a call to go.All is well that ends well:the function leaves the list in exactly the state in which it found it.But no compiler in the world is going to detect that the side effect is only apparent. Side effects of the second acceptable category may change the state of the object,but Pages 376 to 378. only affecting properties that are not visible to clients.To understand the concepts in depth,it will be useful to make sure that you are familiar with the discussion of “abstraction function”and“implementation invariants'”in the presentation of Design by Contract.(In particular,take a look at the accompanying figures to refresh your memory.)
756 DESIGNING CLASS INTERFACES §23.1 Abstract state, concrete state From the discussion of referential transparency it would seem desirable to bar all concrete side effects from functions. Such a rule would have the advantage that — in line with one of our methodology precepts — we could build it into the language, since a compiler can easily detect concrete side effects (as we saw just after the definition of this notion). Unfortunately, this would be unacceptably restrictive, explaining why the Command-Query Separation principle only prohibits abstract side effects, a notion that will now be defined. The problem is that some concrete side effects are not only harmless but necessary. They are of two kinds. The first category includes functions which, in the course of their execution, modify the state, sometimes drastically, and affecting very visible features; but then they restore the original state. Consider for example a class describing integer lists with cursor, and the following function for computing the maximum of a list: max is -- The highest value of items in the list require not empty local original_index: INTEGER do original_index := index from start; Result := item until is_last loop forth; Result := Result ● max (item) end go (original_index) end To traverse the list, the algorithm needs to move the cursor over all elements. The function, calling such procedures as start, forth and go, is indeed full of concrete side effects on the cursor position; but it begins by noting the cursor position into original_index and ends by returning the cursor to that position through a call to go. All is well that ends well: the function leaves the list in exactly the state in which it found it. But no compiler in the world is going to detect that the side effect is only apparent. Side effects of the second acceptable category may change the state of the object, but only affecting properties that are not visible to clients. To understand the concepts in depth, it will be useful to make sure that you are familiar with the discussion of “abstraction function” and “implementation invariants” in the presentation of Design by Contract. (In particular, take a look at the accompanying figures to refresh your memory.) “If it is baroque, fix it”, page 670. Pages 376 to 378