System and Software Safety

Nancy G. Leveson
MIT Aero/Astro Dept.
Safeware Engineering Corp.

Copyright by the author, June 2001. All rights reserved. Copying without fee is permitted provided that the copies are not made or distributed for direct commercial advantage and provided that credit to the source is given. Abstracting with credit is permitted.

The Problem

The first step in solving any problem is to understand it. We often propose solutions to problems that we do not understand and then are surprised when the solutions fail to have the anticipated effect.
Accident with No Component Failures

[Figure: computer-controlled batch reactor. Labels: COMPUTER, LA, LC, GEARBOX, CATALYST, VAPOR, REFLUX, CONDENSER, COOLING WATER, VENT, REACTOR]

Types of Accidents

Component Failure Accidents
  Single or multiple component failures
  Usually assume random failure

System Accidents
  Arise in interactions among components
  No components may have "failed"
  Caused by interactive complexity and tight coupling
  Exacerbated by the introduction of computers
Interactive Complexity

Complexity is a moving target
The underlying factor is intellectual manageability

1. A "simple" system has a small number of unknowns in its interactions within the system and with its environment.
2. A system is intellectually unmanageable when the level of interactions reaches the point where they cannot be thoroughly
     planned
     understood
     anticipated
     guarded against
3. Introducing new technology introduces unknowns and even "unk-unks."

Computers and Risk

We seem not to trust one another as much as would be desirable. In lieu of trusting each other, are we putting too much trust in our technology? . . . Perhaps we are not educating our children sufficiently well to understand the reasonable uses and limits of technology.
  Thomas B. Sheridan
The Computer Revolution

General Purpose Machine + Software = Special Purpose Machine

Software is simply the design of a machine abstracted from its physical realization.
  Machines that were physically impossible or impractical to build become feasible.
  Design can be changed without retooling or manufacturing.
  Can concentrate on steps to be achieved without worrying about how steps will be realized physically.

Advantages = Disadvantages

Computer so powerful and so useful because it has eliminated many of the physical constraints of previous machines.
Both its blessing and its curse:
  + No longer have to worry about physical realization of our designs.
  - No longer have physical laws that limit the complexity of our designs.
The Curse of Flexibility

Software is the resting place of afterthoughts

No physical constraints
  To enforce discipline on design, construction and modification
  To control complexity

So flexible that we start working with it before fully understanding what we need to do

"And they looked upon the software and saw that it was good, but they just had to add one other feature ..."

Software Myths

1. Good software engineering is the same for all types of software.
2. Software is easy to change.
3. Software errors are simply "teething" problems.
4. Reusing software will increase safety.
5. Testing or "proving" software correct will remove all the errors.
Abstraction from Physical Design

Software engineers are doing system design

[Figure labels: Autopilot Expert, System Requirements, Software Engineer, Design of Autopilot]

Most errors in operational software related to requirements
  Completeness a particular problem

Software "failure modes" are different
  Usually does exactly what you tell it to do
  Problems occur from operation, not lack of operation
  Usually doing exactly what software engineers wanted

Typical Fault Trees

[Figure: fault tree fragments in which a hazard is traced through an OR gate to a box labeled "Software fails" or "Software (error)"]

Hazard   Cause            Probability   Mitigation
...      Software Error   0             Test software
Black Box Testing

Test data derived solely from specification (i.e., without knowledge of internal structure of program).

Need to test every possible input
  x := y * 2
  (since black box, only way to be sure to detect this is to try every input condition)
  Valid inputs up to max size of machine (not astronomical)
  Also all invalid input (e.g., testing Ada compiler requires all valid and invalid programs)
  If program has "memory", need to test all possible unique valid and invalid sequences.

So for most programs, exhaustive input testing is impractical.
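A minimal sketch of why a black-box tester would need every input to be sure of catching a fault like the one on the Black Box Testing slide. It assumes, purely for illustration, that the specification called for squaring y while the program contains x := y * 2; the slide does not say what the intended statement was. The function names and the 16-bit input range are likewise hypothetical.

```python
def spec(y: int) -> int:
    """Assumed specification (hypothetical): return y squared."""
    return y ** 2


def program(y: int) -> int:
    """Program under test, containing the slide's statement x := y * 2."""
    x = y * 2
    return x


def failing_inputs(inputs):
    """Black-box check: compare program output with the spec, input by input."""
    return [y for y in inputs if program(y) != spec(y)]


# A hand-picked test set can pass even though the program is wrong,
# because y * 2 and y ** 2 agree at y = 0 and y = 2.
spot_check = failing_inputs([0, 2])

# Trying every input in an assumed unsigned 16-bit range exposes the fault.
exhaustive = failing_inputs(range(2 ** 16))

print(spot_check)       # []
print(len(exhaustive))  # 65534
```

Without knowledge of the program's internals, nothing tells the tester that 0 and 2 are the uninformative inputs, which is why only the full input space gives certainty.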
White Box Testing

Derive test data by examining program's logic.

Exhaustive path testing: Two flaws

1) Number of unique paths through program is astronomical.

   [Figure: control-flow graph with a loop executed up to 20 times]

   5^20 + 5^19 + 5^18 + ... + 5 ≈ 10^14 = 100 trillion

   If could develop/execute/verify one test case every five minutes = 1 billion years
   If had magic test processor that could develop/execute/evaluate one test per msec = 3170 years
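The path-count arithmetic on the White Box Testing slide can be checked with a short sketch. It assumes, based on the 5^k terms in the slide's sum, that the loop body has 5 distinct paths and that the loop runs between 1 and 20 times; the constant and function names are illustrative only.

```python
PATHS_PER_ITERATION = 5  # assumed from the 5^k terms in the slide's sum
MAX_ITERATIONS = 20      # loop executes up to 20 times

# Total unique paths: 5^1 + 5^2 + ... + 5^20
total_paths = sum(PATHS_PER_ITERATION ** k for k in range(1, MAX_ITERATIONS + 1))

MINUTES_PER_YEAR = 365 * 24 * 60
MS_PER_YEAR = MINUTES_PER_YEAR * 60 * 1000


def years_at_five_minutes_per_test(paths: int) -> float:
    """Years needed if each test case takes five minutes."""
    return paths * 5 / MINUTES_PER_YEAR


def years_at_one_ms_per_test(paths: int) -> float:
    """Years needed if each test case takes one millisecond."""
    return paths / MS_PER_YEAR


print(total_paths)  # 119209289550780, i.e. roughly 10^14

# The slide's figures follow from the rounded value 10^14.
rounded = 10 ** 14
print(f"{years_at_five_minutes_per_test(rounded):.3g} years at 5 min/test")
print(f"{years_at_one_ms_per_test(rounded):.0f} years at 1 ms/test")
```

The exact sum is about 1.19 x 10^14, so the slide's 10^14 is a rounding; using the exact count makes both durations somewhat longer, which only strengthens the point.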
White Box Testing (con't)

2) Could test every path and program may still have errors!
   Does not guarantee program matches specification, i.e., wrong program.
   Missing paths: would not detect absence of necessary paths
   Could still have data-sensitivity errors.
     e.g., program has to compare two numbers for convergence
       if (A - B) < epsilon ...
     is wrong because should compare to abs(A - B)
     Detection of this error dependent on values used for A and B and would not necessarily be found by executing every path through program.
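The data-sensitivity error in the convergence check can be sketched as follows. The function names, the tolerance, and the test values are hypothetical; only the comparison itself comes from the slide. Each function has a single path, so any one test achieves full path coverage of it.

```python
EPSILON = 0.001  # illustrative tolerance


def converged_buggy(a: float, b: float, epsilon: float = EPSILON) -> bool:
    """The slide's faulty check: compares the signed difference."""
    return (a - b) < epsilon


def converged_fixed(a: float, b: float, epsilon: float = EPSILON) -> bool:
    """Corrected check: compares the magnitude of the difference."""
    return abs(a - b) < epsilon


# These inputs execute the only path and the two versions agree,
# so the fault goes unnoticed.
print(converged_buggy(1.0005, 1.0), converged_fixed(1.0005, 1.0))  # True True
print(converged_buggy(2.0, 1.0), converged_fixed(2.0, 1.0))        # False False

# The fault shows only when A is well below B: the signed difference is
# negative, so the buggy check reports convergence that has not happened.
print(converged_buggy(1.0, 2.0), converged_fixed(1.0, 2.0))        # True False
```

Whether a test suite exposes the fault depends on the values chosen for A and B, not on which paths are executed.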
Mathematical Modeling Difficulties

Large number of states and lack of regularity

Lack of physical continuity: requires discrete rather than continuous math

Specifications and proofs using logic:
  May be same size or larger than code
  More difficult to construct than code
  Harder to understand than code
  Therefore, as difficult and error-prone as code itself

Have not found good ways to measure software quality
A Possible Solution

Enforce discipline and control complexity
  Limits have changed from structural integrity and physical constraints of materials to intellectual limits

Improve communication among engineers

Build safety in by enforcing constraints on behavior

  Example (batch reactor)
    System safety constraint:
      Water must be flowing into reflux condenser whenever catalyst is added to reactor.
    Software safety constraint:
      Software must always open water valve before catalyst valve
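One way the batch reactor software safety constraint might be enforced in code is with an interlock that refuses unsafe valve commands. This is a minimal sketch, not the design from the slides: the class, method, and exception names are hypothetical, and the valve state here is only the commanded state. Because the system constraint is about water actually flowing, a real controller would presumably also need flow feedback from the plant.

```python
class InterlockError(Exception):
    """Raised when a command would violate the safety constraint."""


class BatchReactorController:
    """Hypothetical controller that enforces 'water valve before catalyst valve'."""

    def __init__(self) -> None:
        self.water_valve_open = False
        self.catalyst_valve_open = False

    def open_water_valve(self) -> None:
        self.water_valve_open = True

    def open_catalyst_valve(self) -> None:
        # Constraint: the water valve must already be open.
        if not self.water_valve_open:
            raise InterlockError("open water valve before catalyst valve")
        self.catalyst_valve_open = True

    def close_catalyst_valve(self) -> None:
        self.catalyst_valve_open = False

    def close_water_valve(self) -> None:
        # Keep cooling water on for as long as catalyst is being added.
        if self.catalyst_valve_open:
            raise InterlockError("close catalyst valve before water valve")
        self.water_valve_open = False


controller = BatchReactorController()
try:
    controller.open_catalyst_valve()  # rejected: water valve is still closed
except InterlockError as err:
    print(f"rejected: {err}")

controller.open_water_valve()
controller.open_catalyst_valve()      # accepted: ordering constraint holds
```

Placing the check inside the valve commands means the constraint holds regardless of the order in which higher-level logic issues them.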
Stages in Process Control System Evolution

1. Mechanical systems
   Direct sensory perception of process
   Displays are directly connected to process and thus are physical extensions of it.
   Design decisions highly constrained by:
     Available space
     Physics of underlying process
     Limited possibility of action at a distance
Stages in Process Control System Evolution (2)

2. Electromechanical systems
   Capability for action at a distance
   Need to provide an image of process to operators
   Need to provide feedback on actions taken.
   Relaxed constraints on designers but created new possibilities for designer and operator error.

Stages in Process Control System Evolution (3)

3. Computer-based systems
   Allow multiplexing of controls and displays.
   Relaxes even more constraints and introduces more possibility for error.
   But constraints shaped environment in ways that efficiently transmitted valuable process information and supported cognitive processes of operators.
   Finding it hard to capture and present these qualities in new systems