
EMB424 Implementing Fault Tolerant Systems in Windows CE 5.0 Nat Frampton President Real Time Development nat@realtimeonline.com MEDC DevCon 2005

Microsoft MEDC (Mobile & Embedded DevCon) 2005, May 9-12, 2005, Las Vegas

[Figure: Microsoft embedded and mobile platform map. Recoverable labels by row:]
Management Tools: Device Update Agent, Image Update, Software Update Services, Systems Management Server, Microsoft Operations Manager
Communications & Messaging: Internet Security and Acceleration Server, Exchange Server, Live Communications Server, Speech Server
Location Services: MapPoint
Multimedia: Windows Media, DirectX
Development Tools: Visual Studio 2005
Programming Model, Native: Win32, MFC 8.0, ATL 8.0
Programming Model, Managed: .NET Compact Framework, .NET Framework
Programming Model, Server Side: ASP.NET Mobile Controls, ASP.NET
Data, Lightweight: EDB, SQL Server 2005 Express Edition
Data, Relational: SQL Server 2005 Mobile Edition, SQL Server 2005
Device Building Tools: Platform Builder, Windows Embedded Studio, Windows XP DDK
Hardware/Drivers: OEM/IHV Supplied BSP (ARM, SH4, MIPS), OEM Hardware and Standard Drivers, Standard PC Hardware and Drivers

Overview
- Background
  - History
  - Definitions
  - Ground Rules
  - OS Properties
- Fault Tolerant Techniques
  - Partitioning into Threads and Processes
  - Watchdogs
  - Exception Handling
  - Interrupt level Fault Tolerance
- Conclusions

Background – History (1)
- Hardware has improved
- Software has become the primary cause of faults!
- Building complex systems from unreliable parts has been addressed for years
- Space applications served as the catalyst for fault tolerant system design
- Fault tolerant system design includes
  - Carefully designed hardware
  - Redundant software

Background – History (2)
- We have to accept that systems ship with failures
- Reliability Engineering
  - Leveraged the concept of MTBF (mean time between failures) from hardware
  - End user describes failures and tolerances
  - Statistical models give the probability of a failure
  - Systems can ship with failures the user can tolerate
- Tradeoff
  - Lower the probability of failure vs. cost
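The tradeoff on this slide can be made concrete with a small sketch. It is not from the talk: it assumes that each redundant copy fails independently with the same probability on a given demand (often optimistic for software, where copies tend to share design faults) and that every copy costs the same. The function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: probability that ALL `replicas` copies fail on the
 * same demand, given that each copy fails with probability p_single.
 * ASSUMPTION: failures are independent. With zero replicas the product is
 * empty, so the result is 1.0 (no copy is available to deliver service). */
double system_failure_probability(double p_single, int replicas)
{
    double p = 1.0;
    for (int i = 0; i < replicas; ++i) {
        p *= p_single;
    }
    return p;
}

/* Hypothetical cost model: every replica costs the same amount. */
double system_cost(double unit_cost, int replicas)
{
    return unit_cost * (double)replicas;
}

/* Print failure probability against cost for 1..max_replicas copies. */
void print_tradeoff(double p_single, double unit_cost, int max_replicas)
{
    for (int n = 1; n <= max_replicas; ++n) {
        printf("replicas=%d  P(failure)=%g  cost=%g\n",
               n,
               system_failure_probability(p_single, n),
               system_cost(unit_cost, n));
    }
}
```

Calling print_tradeoff(0.01, 1.0, 3) from a main function lists how each added copy lowers the modeled failure probability while the cost grows linearly. The end user's stated tolerance decides where on that curve to stop.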

Background – Definitions (1)
- Dependability of a computing system is the ability to deliver service that can justifiably be trusted
- Service delivered by a system is its behavior as perceived by another system (physical, human) that interacts with the former at the service interface
- Function of a system is what the system is intended to do, as described by the functional specification
- A system failure occurs when the service delivered does not comply with the specification
- An error is a system state which may lead to failure. An error is detected if an error message or signal is produced within the system, or latent if not detected
- A fault is the cause of an error. It is active when it results in an error, otherwise it is dormant

Background – Definitions (2)
- Fault tolerance is the ability of a system to deliver correct service in the presence of faults
- Applications may emphasize different attributes of dependability, including
  - Availability: readiness for correct service
  - Reliability: the continuity of that service
  - Safety: the avoidance of catastrophic consequences on the environment
  - Security: the prevention of unauthorized access

Background – OS Properties
- Processes and Threads
- Synchronization Objects
- Priorities
- Interrupt Architecture

Windows CE 5.0 Priority Map

Priority   Component
0-19       Open – Real Time Above Drivers
20         Permedia Vertical Retrace
21-89      Open – Real Time Above Drivers
99         Power management Resume Thread
100-108    USB OHCI UHCI, Serial
109-129    Irsir1, NDIS, Touch
130        KITL
131        VMini
132        CxPort
133-144    Open – Device Drivers
145        PS2 Keyboard
146-147    Open – Device Drivers
148        IRComm
149        Open – Device Drivers
150        TAPI
151-152    Open – Device Drivers
153-247    Open – Real Time Below Drivers
248        Power Management
249        WaveDev, TVIA5000, Mouse, PnP, Power
250        WaveAPI
251        Power Manager Battery Thread
252-255    Open
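When picking a priority for an application thread, it can help to check the candidate level against the map above. The following is a minimal, portable sketch and not part of the talk: the struct, table, and function names are hypothetical, the rows are copied from the slide, and the top of the range is taken as 255 on the assumption that CE priority levels run from 0 to 255. Levels the slide does not list (90-98) are reported as unknown rather than guessed.

```c
#include <stddef.h>

/* One row of the priority map (hypothetical type for illustration). */
struct prio_row {
    int lo;             /* first priority level in the band */
    int hi;             /* last priority level in the band */
    const char *owner;  /* component named on the slide */
    int is_open;        /* 1 if the slide marks the band as "Open" */
};

/* Rows copied from the slide; 90-98 are not listed there. */
static const struct prio_row PRIO_MAP[] = {
    {   0,  19, "Open - Real Time Above Drivers", 1 },
    {  20,  20, "Permedia Vertical Retrace", 0 },
    {  21,  89, "Open - Real Time Above Drivers", 1 },
    {  99,  99, "Power management Resume Thread", 0 },
    { 100, 108, "USB OHCI UHCI, Serial", 0 },
    { 109, 129, "Irsir1, NDIS, Touch", 0 },
    { 130, 130, "KITL", 0 },
    { 131, 131, "VMini", 0 },
    { 132, 132, "CxPort", 0 },
    { 133, 144, "Open - Device Drivers", 1 },
    { 145, 145, "PS2 Keyboard", 0 },
    { 146, 147, "Open - Device Drivers", 1 },
    { 148, 148, "IRComm", 0 },
    { 149, 149, "Open - Device Drivers", 1 },
    { 150, 150, "TAPI", 0 },
    { 151, 152, "Open - Device Drivers", 1 },
    { 153, 247, "Open - Real Time Below Drivers", 1 },
    { 248, 248, "Power Management", 0 },
    { 249, 249, "WaveDev, TVIA5000, Mouse, PnP, Power", 0 },
    { 250, 250, "WaveAPI", 0 },
    { 251, 251, "Power Manager Battery Thread", 0 },
    { 252, 255, "Open", 1 },  /* ASSUMPTION: levels end at 255 */
};

/* Return the row covering `prio`, or NULL if the slide does not list it. */
const struct prio_row *ce_priority_lookup(int prio)
{
    size_t count = sizeof(PRIO_MAP) / sizeof(PRIO_MAP[0]);
    for (size_t i = 0; i < count; ++i) {
        if (prio >= PRIO_MAP[i].lo && prio <= PRIO_MAP[i].hi) {
            return &PRIO_MAP[i];
        }
    }
    return NULL;
}

/* Return 1 if the slide marks `prio` as open, 0 if taken or unlisted. */
int ce_priority_is_open(int prio)
{
    const struct prio_row *row = ce_priority_lookup(prio);
    return row != NULL && row->is_open;
}
```

On a device, a level that passes this check would then be handed to CeSetThreadPriority for the thread in question. The defaults in the table depend on the BSP and drivers actually built into the image, so the map should be verified against the target platform.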