System Hang
    Infinite loop
        Interrupt disabled: F1
        Interrupt enabled
            Preemption enabled (loop in kernel): F2
            Preemption disabled: F3
    Indefinite wait
        Resources not released
            Deadlock (except spinlock): F4
        Resources released slowly
            Sleeping while holding locks: F5
            Abnormal resource consumption: F6
            Holding resources too long during correct operations

Figure 1. Categories of system hang causes (F: abbreviation of Fault)

core computer, the stall of only one core can cause the freeze of the whole system for certain reasons, e.g., the synchronization mechanism between different cores. Indeed, this phenomenon occurs frequently in our experiments.

2.2.2. Indefinite Wait

Awaiting resources (e.g., signals, semaphores, I/O, interrupts, or memory space) indefinitely means waiting for the requested resources either forever or for a long time (how long depends on the patience of users). The deadlock described in F4 excludes deadlocks triggered by spinlocks: acquiring a spinlock twice is also a kind of deadlock, but it belongs to F1. If tasks or a piece of kernel code that interact with several other tasks are trapped in a deadlock, system hang may occur because key internal services are suddenly lost. In general, the sudden disappearance of resources (e.g., peripheral devices, pipes) also belongs to F4.

The OS provides no mechanism to ensure that a task holding a spinlock does not fall into a sleep state. As a result, F5 may cause system hang because the tasks waiting for the spinlock to be released have to busy-wait on the CPU, leaving no chance to schedule other tasks.

F6 is usually related to anomalous memory consumption, since not enough memory can be provided immediately to newly forked tasks or to tasks that are swapped in again. The classical malicious program "fork bomb" (forking infinitely) also belongs to F6.

Holding resources for a long time during correct operations, e.g., copying many files simultaneously to peripheral devices, may cause a temporary system hang. However, this situation is not considered a cause of system hang, since it is a correct operation and varies with the system configuration. Note that although F5 and F6 may release resources after a while (e.g., the task holding the spinlock is woken up and executes an unlock operation), they are still considered causes of system hang because they result from inappropriate operations.

3. Empirical Detection Metrics

The difficulty in handling system hang lies in detecting it, since the OS offers no mechanism to inform itself when it enters a hang state.
Whereas most studies (as described in Section 1) detect system hang with additional assistance (e.g., hardware modules or kernel modification), this section investigates whether exploiting the services provided by the OS itself can help detect system hang. In Section 3.1, we first introduce a hypothesis about empirical metrics used for system hang detection. According to this hypothesis, the research questions about detection metrics are proposed in Section 3.2. In Section 3.3, we conduct experiments to determine which metrics should be selected to detect system hang. Finally, we discuss how to use the selected performance metrics to detect system hang.

3.1. Hypothesis of Detection Metrics

We choose system performance metrics (e.g., context switches per second and number of runnable tasks) as the targets of detection because most OSes provide them and they reflect the overall performance of the system when it slows down. Our detection metrics are hypothesized as follows:

Hypothesis: Combined with a theoretical analysis, a subset of system performance metrics can be regarded as a sufficient basis to determine whether system hang occurs.

3.2. Research Questions

Since system performance metrics are uncontrollable, it is impossible to build a mapping from performance metrics to a hang state. The other direction can be attempted instead: observing the values of performance metrics when the system enters a hang state helps us understand which metrics may indicate system hang. Note that the metrics found to be influenced in this way are necessary rather than sufficient for detecting system hang. Whether the selected metrics are also sufficient therefore needs to be validated (empirically, in Section 5). According to the hypothesis (Section 3.1) and the analysis above, we seek to answer the following research questions:

RQ1 Among hundreds of system performance metrics provided by the OS, which ones should be selected?
RQ2 How can system hang be determined with the system performance metrics?

Sections 3.3 and 3.4 answer the two research