Test Diagnostics

Purpose

The purpose of this page is to provide users of Physiopedia with a quick reference to commonly used diagnostic statistics in physical therapy practice, and to the issues around evaluating these statistics in clinical research. Diagnostic accuracy statistics are often used to describe how effective special tests are at identifying specific disorders. Knowing the diagnostic accuracy of special tests is important in obtaining an accurate diagnosis and, in turn, maximizing treatment outcomes.[1]

Diagnosis in Physical Therapy Practice

Physical therapists use the diagnosis of specific conditions to guide their treatment options. Through the physiotherapy assessment, clinicians gather data to evaluate and form clinical judgements.[1] The diagnostic process begins with acquiring relevant data from the history and physical examination. Some data may be used to focus the examination on a specific part of the body, some to identify a specific pathology and some to select an appropriate intervention.[1]

Diagnostic Accuracy

Determining the diagnostic accuracy through the estimation of sensitivity and specificity of a test is the first step in the evaluation of a diagnostic test.[2] This is accomplished by comparing the performance of the test in question with a reference or "gold" standard in a 2x2 contingency table.[2]

2x2 Table [1]

                           Reference test positive   Reference test negative
 Diagnostic test positive  True positive results     False positive results
                           (A)                       (B)
 Diagnostic test negative  False negative results    True negative results
                           (C)                       (D)

Study Design Considerations

The optimal study design for diagnostic studies is suggested to be the prospective cohort study; in this design, the prospective, blind comparison of the test(s) and the reference standard in a sample of patients relevant to clinical practice can reduce possible study bias.[1]

Study bias refers to the susceptibility of study results to deviate from the truth in a consistent manner.[3] Other factors may also contribute to study bias, such as the study population, the diagnostic test and the reference standard; these should be carefully considered when evaluating the results of a study.[1] Fritz and Wainner[1] have summarised these issues in a tabular form, which is presented below.

Study design factors to consider [1]

  • Population: should be representative of the patients on whom the test is used.
  • Diagnostic test: the intended purpose of the test should be clearly defined; the test description, in terms of procedures, performance and interpretation of results, should be clear; the results of the reference standard should be unknown to examiners.
  • Reference standard: relevant to the intended diagnostic purpose; condition of interest clearly defined; applied consistently to all study participants; independent of the diagnostic test; the results of the diagnostic test should be unknown to examiners.

Overall Accuracy

The overall accuracy of a test is defined as the number of correct results divided by the total number of tests conducted, i.e. (A+D)/(A+B+C+D).[4] It reflects the proportion of tests that are correct; however, because it does not distinguish between types of false test result, it is considered of limited value.[5]
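As a minimal sketch, with hypothetical counts labelled A-D as in the 2x2 table above, overall accuracy can be computed as:

```python
# Hypothetical 2x2 table counts (A = true positives, B = false positives,
# C = false negatives, D = true negatives)
A, B, C, D = 40, 10, 5, 45

# Overall accuracy: proportion of all test results that are correct
accuracy = (A + D) / (A + B + C + D)
print(round(accuracy, 2))  # 0.85
```

Note that the same accuracy of 0.85 could arise from very different mixes of false positives and false negatives, which is exactly why the statistic is of limited value on its own.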

Sensitivity

Sensitivity is defined as the ability of a test to identify patients with a particular disorder.[6] In other words, it represents the proportion of a population with the target disorder that has a positive result with the diagnostic test, i.e. A/(A+C).[7] Tests that are highly sensitive are most useful for ruling out a disorder, as people who test negative are more likely not to have the target disorder. "SnNout" is an acronym that can be used to remember that a highly Sensitive test and a Negative result is good for ruling out the disorder in question.[8]

For example, the Neer test has been reported to have a sensitivity of 0.93 for detecting subacromial impingement. So, if the test is negative, the examiner can be reasonably confident that the patient does not have impingement.
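As an illustrative sketch, sensitivity is calculated vertically from the first column of the 2x2 table; the counts below are hypothetical, not taken from the Neer test literature:

```python
# Hypothetical counts: A = true positives, C = false negatives
A, C = 37, 3

# Sensitivity = A / (A + C): the proportion of patients WITH the
# target disorder who test positive
sensitivity = A / (A + C)
print(round(sensitivity, 3))  # 0.925
```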

Specificity

Specificity is the ability of a test to identify patients that do not have the disorder in question.[9] In other words, specificity is the proportion of the population without the target disorder who test negative for the disorder, i.e. D/(B+D).[7] Therefore, tests that are highly specific are useful for ruling in a disorder. The acronym "SpPin" is commonly used to remember that a test with high Specificity and a Positive result is good for ruling in a disorder.[8]

For example, the Hawkins-Kennedy test for subacromial impingement has been reported by some to have a specificity of 1.00, or 100%. A positive test result therefore very likely identifies a patient who has impingement.
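A minimal sketch of the calculation, with hypothetical counts chosen so that there are no false positives (mirroring the reported specificity of 1.00):

```python
# Hypothetical counts: B = false positives, D = true negatives
B, D = 0, 50

# Specificity = D / (B + D): the proportion of patients WITHOUT the
# target disorder who test negative
specificity = D / (B + D)
print(specificity)  # 1.0
```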

Predictive Values

Predictive values reflect the proportion of patients with a positive or negative result for whom the result is correct.[1] These statistics are calculated horizontally from the 2x2 table. The positive predictive value represents the proportion of patients with a positive test result who actually have the condition, i.e. A/(A+B), whereas the negative predictive value refers to the proportion of patients with a negative test result who do not have the condition, i.e. D/(C+D).[1]
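The row-wise (horizontal) calculation can be sketched with hypothetical counts:

```python
# Hypothetical 2x2 counts: A = TP, B = FP, C = FN, D = TN
A, B, C, D = 40, 10, 5, 45

# Positive predictive value: computed from the row of positive test results
ppv = A / (A + B)
# Negative predictive value: computed from the row of negative test results
npv = D / (C + D)
print(round(ppv, 2), round(npv, 2))  # 0.8 0.9
```

Unlike sensitivity and specificity, predictive values depend on how common the disorder is in the tested sample, so they do not transfer directly between populations with different prevalence.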

Watch this video[10] for a detailed discussion on the above statistics.

Likelihood Ratios

Likelihood ratios are an index measurement that combines the sensitivity and specificity values of a specific test. Likelihood ratios can be used to gauge the performance of a diagnostic test, as they indicate how much a given diagnostic test result will lower or raise the pretest probability of the target disorder.[7]

  • The positive likelihood ratio (+LR) indicates how much more likely a positive test result is in people who have the disorder than in people who do not. In other words, +LR indicates the shift in probability that favors the existence of a disorder.[11] +LR is calculated by: +LR = Sensitivity / (1 - Specificity)
  • The negative likelihood ratio (-LR) indicates how much less likely a negative test result is in people who have the disorder than in people who do not. A small -LR indicates a shift in probability that favors the absence of the disorder.[12] -LR is calculated by: -LR = (1 - Sensitivity) / Specificity
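The two formulas above can be sketched in code; the sensitivity and specificity values are hypothetical:

```python
# Hypothetical test characteristics
sensitivity, specificity = 0.90, 0.80

# Positive and negative likelihood ratios from the formulas above
pos_lr = sensitivity / (1 - specificity)   # 0.90 / 0.20
neg_lr = (1 - sensitivity) / specificity   # 0.10 / 0.80
print(round(pos_lr, 1), round(neg_lr, 3))  # 4.5 0.125
```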


Interpretation of Likelihood Ratios [11]
     +LR          -LR        Interpretation
  > 10.0        < 0.1        Generate large and often conclusive shifts in probability
  5.0 - 10.0    0.1 - 0.2    Generate moderate shifts in probability
  2.0 - 5.0     0.2 - 0.5    Generate small, but sometimes important shifts in probability
  1.0 - 2.0     0.5 - 1.0    Alter probability to a small and rarely important degree
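The "shift in probability" in the table above works through Bayes' theorem: the pretest odds multiplied by the likelihood ratio give the post-test odds. A small worked sketch with hypothetical numbers:

```python
# Hypothetical pretest probability and positive likelihood ratio
pretest_p = 0.30
lr = 4.5

# Convert probability -> odds, apply the likelihood ratio, convert back
pretest_odds = pretest_p / (1 - pretest_p)
posttest_odds = pretest_odds * lr
posttest_p = posttest_odds / (1 + posttest_odds)
print(round(posttest_p, 2))  # 0.66
```

Here a +LR of 4.5 raises a 30% pretest probability to roughly 66%, a moderate shift consistent with the interpretation table.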

Statistical Significance and Confidence Intervals

Results of studies of diagnostic tests are commonly analysed with the chi-square statistic and its significance level.[1] This tests the hypothesis that the test results and the reference standard have no association, and it should be interpreted in combination with the diagnostic accuracy estimates and their confidence intervals.

Confidence intervals (CIs) reflect the precision of the diagnostic accuracy estimates.[1] 95% CIs are the most common, and indicate the range of values within which the population value would lie with 95% certainty. Wide CIs indicate imprecise estimates; thus, a diagnostic accuracy value that is not precise may be questionable.[1]
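As an illustration, a simple (Wald) approximation of a 95% CI for a sensitivity estimate can be computed from hypothetical counts; note that Wilson or exact intervals are preferred for proportions near 0 or 1:

```python
import math

# Hypothetical counts: A = true positives, C = false negatives
A, C = 45, 5
n = A + C
p = A / n                     # point estimate of sensitivity (0.90)

# Wald 95% CI: p +/- 1.96 * standard error of a proportion
se = math.sqrt(p * (1 - p) / n)
lo, hi = p - 1.96 * se, p + 1.96 * se
print(round(lo, 2), round(hi, 2))  # 0.82 0.98
```

A larger sample would shrink the standard error and narrow the interval, which is why precision is largely a function of study size.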

Reliability of Tests

The evaluation of diagnostic tests does not stop at the determination of diagnostic accuracy. A test should also be reliable in order to provide consistent and useful information for clinicians. Reliability refers to the ability of a test to produce the same results on different occasions, provided that the patient's status has not changed.[13] Reliability is considered a precursor to other examinations of the performance of diagnostic tests, but is best evaluated when embedded in the study design of the diagnostic testing.[1]

Resources

STARD Statement for Reporting Diagnostic Accuracy Studies

Diagnostic Testing Accuracy by Cochrane Austria

DiTA Diagnostic Test Accuracy database by PEDro

References

  1. Fritz J, Wainner R. Examining diagnostic tests: an evidence-based perspective. Phys Ther 2001; 81(9):1546-1564.
  2. Fardy J, Barrett B. Evaluation of diagnostic tests. Methods Mol Biol 2015; 1281:289-300.
  3. Geddes J, Harrison P. Closing the gap between research and practice. Br J Psychiatry 1997; 171:220-225.
  4. Greenhalgh T. How to read a paper: papers that report diagnostic or screening tests. BMJ 1997; 315(7107):540-543.
  5. Bernstein J. Decision analysis. J Bone Joint Surg Am 1997; 79:1404-1414.
  6. Sackett D, Straus S, Richardson W, Rosenberg W, Haynes B. Evidence-based medicine: How to practice and teach EBM. 2nd ed. London: Harcourt Publishers Limited, 2000.
  7. Dutton M. Orthopaedic: Examination, evaluation, and intervention. 2nd ed. New York: The McGraw-Hill Companies, Inc, 2008.
  8. Flynn T, Cleland J, Whitman J. User's guide to the musculoskeletal examination: Fundamentals for the evidence-based clinician. Buckner, Kentucky: Evidence in Motion, 2008.
  9. Sackett D, Straus S, Richardson W. Evidence-based medicine: How to practice and teach EBM. 2nd ed. London: Harcourt Publishers Limited, 2000.
  10. Cochrane Austria. Diagnostic Testing Accuracy. Available from: https://youtu.be/9a-d4d4UHD4 (accessed 28-5-2022)
  11. Jaeschke R, Guyatt G, Sackett D. Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? JAMA 1994; 271:703-707.
  12. Cleland J. Orthopaedic clinical examination: An evidence-based approach for physical therapists. Carlstadt, NJ: Icon Learning Systems, LLC, 2005.
  13. Batterham A, George K. Reliability in evidence-based clinical practice: a primer for allied health professionals. Phys Ther Sport 2000; 1(2):54-62.