…reference standard and the verification pattern when ruling out the risk of clinically relevant differential verification bias.

Accuracy of the Inferior Reference Standard

As a general rule, the risk of clinically relevant differential verification bias decreases as the accuracy of the inferior reference standard increases. To estimate the accuracy of the inferior reference standard, its performance needs to be compared with that of the preferred reference standard. Although this may not be possible or ethical to do within the study, accuracy estimates may be available in the literature.
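To make this general rule concrete, here is a minimal simulation sketch. It is not taken from the article: the prevalence and accuracy figures are assumptions, and the errors of the index test and the reference standard are assumed independent given true disease status. It estimates the index test's apparent sensitivity and specificity when disease status is defined by reference standards of different accuracy; the further the reference standard is from perfect, the further the apparent estimates drift from the index test's true values.

```python
import random

random.seed(0)

def apparent_accuracy(n=200_000, prevalence=0.2,
                      index_sens=0.90, index_spec=0.80,
                      ref_sens=0.95, ref_spec=0.95):
    """Apparent sensitivity and specificity of an index test when disease
    status is defined by an imperfect reference standard (hypothetical
    numbers; index-test and reference errors assumed independent)."""
    tp = fp = fn = tn = 0
    for _ in range(n):
        diseased = random.random() < prevalence
        index_pos = random.random() < (index_sens if diseased else 1 - index_spec)
        ref_pos = random.random() < (ref_sens if diseased else 1 - ref_spec)
        if index_pos and ref_pos:
            tp += 1
        elif index_pos:
            fp += 1
        elif ref_pos:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# The index test's true sensitivity/specificity in this simulation is 0.90/0.80.
print(apparent_accuracy(ref_sens=0.99, ref_spec=0.99))  # close to the true values
print(apparent_accuracy(ref_sens=0.80, ref_spec=0.80))  # noticeably distorted
```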
A commonly used alternative reference standard for making a final disease classification is follow-up (following patients over time to see whether symptoms worsen or improve). The follow-up information is used to decide retrospectively whether patients did indeed have the disease at the time the index test was done. Assessing the accuracy of follow-up is difficult, even when the preferred reference standard and follow-up are both done in a random subset of patients, because the condition of the patients can improve or worsen between when the preferred reference standard is performed and the end of follow-up.

Because estimating the accuracy of follow-up is difficult, it is particularly important to consider the quality and length of follow-up from a biological perspective, taking into account the natural course of existing cases as well as the incidence of new cases. We provide a cursory example of how this can be done using a study that investigated the accuracy of blood oxygen concentration measured at birth in detecting congenital heart defects (see Table 1) (23). In this study, newborns with low oxygen levels were treated by a cardiologist, whereas the rest were followed up after 1 year through a congenital anomalies register.

Follow-up should be long enough to allow as many hidden cases of disease to progress to a detectable stage as possible. However, if follow-up is too long, new cases developing after the index test was performed will also be detected. In the example, follow-up might be considered too short because some types of congenital heart defects are detected later in life. On the other hand, it was not too long because congenital heart disease is, by definition, already present at birth. The second point to consider is whether follow-up allows detection of the same type and severity of disease as the preferred reference standard. Researchers should focus on whether the test being studied detects cases that will benefit from clinical intervention rather than simply the presence of any disease (24). In the example, more serious types of defects are probably detected at birth, whereas less serious ones are detected during follow-up. Although this may not be a problem, if follow-up instead detects the more pronounced cases, the estimated sensitivity of the index test will be an overestimate of its sensitivity in detecting serious cases.

When the inferior reference standard is believed to have high accuracy, clinically relevant differential verification bias is unlikely and there is no need to look into the other factors influencing bias. This was believed to be the case in an example of differential verification from a study involving a clinical prediction rule for screening for depression in primary care (see Table 1) (9). In this study, in-person or telephonic questionnaires were used as the reference standard, but because the authors had reason to assume that these methods for assessing depression had similar accuracy, they argued that clinically relevant differential verification bias was unlikely. When the inferior reference standard's accuracy is questionable, however, the next step is to consider the verification pattern.

Verification Pattern

The pattern of verification plays an important role in whether bias is introduced. The most straightforward verification pattern is when the choice of reference standard is fully dependent on the index test results (Figure 3, pattern A). Studies with this pattern are likely to have biased estimates of sensitivity, specificity, diagnostic odds ratios, and likelihood ratios because these estimates rely partly on disease status classification by the inferior reference standard.

An exception is positive predictive value estimates (the probability that a person has the disease given that the index test results are abnormal). The positive predictive value estimate is not affected by differential verification bias in pattern A because all patients with positive index test results receive the preferred reference standard (3). Negative predictive value estimates can also still be interpreted in a meaningful way in the sense that they provide information on the proportion of missed cases, as defined by the inferior reference standard (see Figure 4 for an example).
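To illustrate pattern A numerically, the short sketch below uses hypothetical counts; they are assumptions for illustration, not data from the article or its examples. Index-positive patients are classified by the preferred reference standard and index-negative patients by the inferior one, so the positive predictive value is computed from the index-positive row alone and is untouched by the inferior standard, whereas sensitivity, specificity, and the negative predictive value combine disease labels from both standards.

```python
# Pattern A (hypothetical counts): index-positive patients verified by the
# preferred reference standard, index-negative patients by the inferior one
# (for example, follow-up).

tp, fp = 80, 20    # index-positive row, disease status by the preferred standard
fn, tn = 15, 885   # index-negative row, disease status by the inferior standard

ppv = tp / (tp + fp)           # preferred standard only: unaffected in pattern A
npv = tn / (tn + fn)           # 1 minus the proportion of missed cases, as defined by the inferior standard
sensitivity = tp / (tp + fn)   # mixes both standards: may be biased
specificity = tn / (tn + fp)   # mixes both standards: may be biased

print(f"PPV={ppv:.2f}  NPV={npv:.2f}  "
      f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
```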
When the choice of reference standard is not fully dependent on the index test results (Figure 3, pattern B), differential verification is likely to bias all accuracy estimates because they rely to some degree on disease classification by the inferior reference standard. In the rare case …

Table 2. Questions to Ask When Assessing Risk of Differential Verification Bias

Was the choice of reference standard completely dependent on the results of the index test? (If so, the predictive values are clinically interpretable.)

If the answer to the first question is no, how accurate is the inferior reference standard? (The higher the accuracy of the inferior reference standard, the lower the risk of bias.)

What percentage of the participants were diagnosed by use of the inferior reference standard? (If a negligible percentage of participants received an inferior standard, the risk of bias is low. Several factors must be taken into account when determining whether the percentage is negligible.)

If follow-up is used as the inferior reference standard, does it identify almost all hidden cases present at the time of the index test but very few new cases that develop afterward? Does follow-up detect the same type of cases as the preferred reference standard? (If the answer to both questions is yes, the risk of bias is low.)
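For readers who want to work through Table 2 in a structured way, the sketch below encodes its questions as a small helper that simply collects the considerations that apply. It is an informal aid, not an instrument proposed by the authors; the field names are assumptions, no score is implied, and whether the share verified by the inferior standard is "negligible" is left as a judgment, exactly as in the table.

```python
from dataclasses import dataclass

@dataclass
class VerificationAssessment:
    """Study-specific answers to the Table 2 questions (illustrative only)."""
    choice_fully_dependent_on_index: bool
    inferior_standard_highly_accurate: bool
    negligible_share_verified_by_inferior: bool
    followup_used: bool = False
    followup_finds_hidden_cases_but_few_new_ones: bool = False
    followup_detects_same_case_type: bool = False

def table2_notes(a: VerificationAssessment) -> list[str]:
    """Collect the Table 2 considerations that apply to a study."""
    notes = []
    if a.choice_fully_dependent_on_index:
        notes.append("Pattern A: predictive values remain clinically interpretable.")
    if a.inferior_standard_highly_accurate:
        notes.append("Accurate inferior standard: clinically relevant bias is unlikely.")
    if a.negligible_share_verified_by_inferior:
        notes.append("Negligible share verified by the inferior standard: risk of bias is low.")
    if (a.followup_used and a.followup_finds_hidden_cases_but_few_new_ones
            and a.followup_detects_same_case_type):
        notes.append("Follow-up captures hidden cases of the same type: risk of bias is low.")
    return notes

# Hypothetical usage with made-up answers.
print(table2_notes(VerificationAssessment(
    choice_fully_dependent_on_index=True,
    inferior_standard_highly_accurate=False,
    negligible_share_verified_by_inferior=False,
    followup_used=True,
    followup_finds_hidden_cases_but_few_new_ones=True,
    followup_detects_same_case_type=False,
)))
```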