Psychometric Properties

Top Contributors - Angeliki Chorti, Amanda Ager, Joseph Ayotunde Aderonmu, Vidya Acharya and Abbey Wright  

What are Psychometric Properties

Whether you identify as a student, clinician and/or researcher, having confidence in your clinical tools is important.

Clinicians and researchers use clinical tools on a daily basis to evaluate patients, measure change over time and establish a prognosis. Our clinical reasoning can only be as strong as our tools.

Having confidence in clinical tools means that they measure what they are intended to measure (validity), they are stable over time (reliability) and they can detect changes in conditions (responsiveness). Collectively, these are called the psychometric properties (or methodological qualities) of a tool or outcome measure.

Psychometrics is the field of mathematics that is concerned with the statistical description of instrumental data as variables and with the inferential statistical description of the relationships between variables.[1] 

In rehabilitation medicine, psychometrics is usually concerned with measuring an individual's physical characteristics, ability, perception of change, pain, and types of functional ability.

Psychometric properties can be applied to questionnaires, outcome measures, clinical tools, scales or special tests. For the remainder of the page, the term "tool" will be used to describe all of these categories.

Figure 1: COSMIN: Definitions of Domains, Psychometric Properties, and Aspects of Psychometric Properties for Health-Related Patient-Reported Outcomes. Retrieved from: Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. Journal of Clinical Epidemiology. 2010; 63(7):737–45. doi: 10.1016/j.jclinepi.2010.02.006 PMID: 20494804

Level of Measurement

Measurement instruments play an important role in research, clinical practice and health assessment.[2]

Researchers and clinicians use measurement as a way of quantifying, understanding, evaluating and differentiating physical characteristics of the human body.[3] This is achieved through the use of clinical tools with patients. Measurement means quantifying bodily characteristics, for example level of pain, range of motion, strength, or functional outcomes.

Measurement is useful in clinical research and practice because it supports decision making and tracks progress during rehabilitation.

Validity

Validity refers to the tool's ability to measure what it is supposed to measure. Is the tool measuring the construct it is intended to? For example, does the goniometer truly measure range of motion? 

Validity implies that a tool is relatively free from error. A tool that is not consistent cannot produce a meaningful measurement. 

If a measurement is valid, it is always reliable. However, a measurement or tool can be reliable without being valid (consistent over time, but not measuring the construct of interest). To be classified as a tool with strong psychometric properties, it needs to be both valid and reliable.[4]

There are many different types of validity. They include:[3]

Face Validity - The tool appears to measure what it is supposed to (the weakest form of validity).

Content Validity - The subsections (or items) of the tool adequately sample the universe of content of the variable of interest (used with questionnaires).

Criterion-related Validity - The measurement of one tool can be used as a substitute measurement for an established reference standard (Gold Standard).

Concurrent Validity - Establishes validity by comparing two measurements taken at the same time (for example, when one tool is considered more efficient than the Gold Standard).

Predictive Validity - The measurement of one tool can be used to predict a future score of another tool.

Construct Validity - The ability of a tool to measure an abstract concept (does it measure the theoretical component of the construct or variable). 
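As a rough illustration of criterion-related (concurrent) validity, the sketch below correlates readings from a tool under evaluation with readings from a reference standard taken at the same session. The data and the names `pearson_r`, `gold_standard` and `new_tool` are hypothetical and not from the cited sources. A high correlation only shows that the two tools rank patients similarly; it does not rule out a systematic offset between them.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical knee flexion readings (degrees) for six patients,
# measured with both tools at the same session.
gold_standard = [110, 95, 130, 120, 100, 125]  # reference standard
new_tool = [112, 93, 128, 123, 101, 124]       # tool under evaluation

r = pearson_r(new_tool, gold_standard)
print(f"Correlation with the reference standard: r = {r:.3f}")
```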

Figure 3: Validity measurements of instruments / tools. Retrieved from Souza et al. (2017).
Figure 2: Possible combinations of validity and reliability. Retrieved from Souza et al. (2017) and adapted from: Babbie (1986).

Reliability

Reliability refers to the extent to which a measurement is consistent and free from error.[3] Reliability is often associated with reproducibility or dependability of a measurement.

Reliability is absolutely key to a strong clinical tool, because without it, we cannot have confidence in our tools or measurements, nor can we have strong clinical reasoning. 

It is important to understand that measurements are rarely perfectly reliable, as humans do respond with some degree of inconsistency. For example, if you measure someone's knee flexion range of motion three times, will the measurements be identical all three times? Most likely not, as there will be inconsistencies with the precision of the evaluator and the state of the patient. 

Reliability refers mainly to stability, internal consistency and equivalence of a measure.[5] It is important to highlight that reliability is not a fixed property of a questionnaire. On the contrary, reliability is a function of the instrument, of the population in which it is used, of the circumstances and of the context; that is, the same instrument may not be considered reliable under different conditions.[6]

Reliability estimates are affected by several aspects of the assessment environment (raters, sample characteristics, type of instrument, administration method) and by the statistical method used.[7] Therefore, the results of research using measurement instruments can only be interpreted when the assessment conditions and the statistical approach are clearly presented.[8]

Figure 4: Reliability measurement of instruments / tools. Retrieved from Souza et al. (2017).

Types of reliability

1. Test-retest reliability: The test-retest reliability of a test describes the stability of scores obtained by a patient when he/she is evaluated on two separate occasions. This appears similar to intra-rater reliability, but in this case the patient self-evaluates (for example with a pain-rating scale).[9]

  • Quantitative measure:

- Intraclass correlation coefficients (ICC)

- Bland and Altman method (agreement between the two test occasions)

  • Qualitative measure:

- Kappa or weighted Kappa coefficient

2. Intra-rater: The same evaluator over time. The intra-rater reliability of a test relates to the stability of the scores obtained by a rater when he/she carries out the test on two separate occasions. A single rater tests each patient twice (or more) with several days in between each test. The patient's state must remain unchanged during this time.[9]

  • Quantitative measure:

- Intraclass correlation coefficients (ICC)

- Bland and Altman method (agreement between the two sessions of the same rater)

  • Qualitative measure:

- Kappa or weighted Kappa coefficient

3. Inter-rater: Different evaluators, usually within the same time period. The inter-rater reliability of a test describes the stability of the scores obtained when two different raters carry out the same test. Each patient is tested independently at the same moment in time by two (or more) raters.[9]

  • Quantitative measure:

- Intraclass correlation coefficients (ICC)

- Bland and Altman method (fidelity between two raters)

  • Qualitative measure:

- Kappa or weighted Kappa coefficient
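The two quantitative measures listed for each type of reliability can be sketched as follows. This is a minimal illustration, not the procedure from the cited sources: it assumes the ICC(2,1) form (two-way random effects, absolute agreement, single measurement), which is only one of several ICC variants, and it uses the usual 1.96 × SD limits of agreement for the Bland and Altman method. The function names and the knee flexion data are hypothetical.

```python
from statistics import mean, stdev


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` holds one row per patient; each row has one score per rater
    (or per occasion, for test-retest and intra-rater designs).
    """
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) of agreement between a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical knee flexion readings (degrees) from two raters.
rater_a = [110, 95, 130, 120, 100, 125]
rater_b = [112, 93, 128, 123, 101, 124]

icc = icc_2_1([list(pair) for pair in zip(rater_a, rater_b)])
bias, lower, upper = bland_altman(rater_a, rater_b)
print(f"ICC(2,1) = {icc:.3f}")
print(f"Bias = {bias:.2f} deg, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

The same functions apply to test-retest or intra-rater data by passing the scores from the two occasions instead of the scores from two raters.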

Other statistics associated with reliability:

  1. Pearson product-moment coefficient of correlation;
  2. Spearman's rho (ordinal data);
  3. Intraclass correlation coefficient (ICC) (correlations and level of agreement);
  4. Kappa statistics (agreement corrected for chance).
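For categorical ratings, raw percent agreement and Cohen's kappa can differ noticeably, because kappa subtracts the agreement expected by chance. The sketch below shows both for two raters; the ratings and function names are hypothetical, and only the unweighted kappa for two raters is covered.

```python
from collections import Counter


def percent_agreement(r1, r2):
    """Proportion of patients on whom the two raters gave the same rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa for two raters.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both raters use a single identical category throughout.
    """
    n = len(r1)
    p_observed = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of each rater's marginal proportions.
    p_chance = sum(c1[c] * c2[c] for c in set(r1) | set(r2)) / n**2
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical results of a special test scored by two raters on ten patients.
rater_1 = ["pos", "pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos", "neg"]
rater_2 = ["pos", "pos", "neg", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohen_kappa(rater_1, rater_2)
print(f"Percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

With these ratings the raters agree on 8 of 10 patients, while kappa is lower because half of that agreement would be expected by chance.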

If there is a question about the stability of the measurement over time, the standard error of measurement (SEM) can also be calculated.[3]
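One common way to compute the SEM, assumed here rather than taken from the cited text, is SEM = SD × √(1 − r), where SD is the standard deviation of the scores and r is a reliability coefficient such as the ICC. The scores and the ICC value below are hypothetical.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_measurement(scores, reliability):
    """SEM = SD * sqrt(1 - reliability), using the sample SD of the scores."""
    return stdev(scores) * sqrt(1 - reliability)


# Hypothetical baseline scores on a 0-100 questionnaire and an assumed ICC.
baseline_scores = [42, 55, 61, 48, 70, 66, 53, 59]
assumed_icc = 0.90

sem = standard_error_of_measurement(baseline_scores, assumed_icc)
print(f"SEM = {sem:.2f} points")
```

The SEM is expressed in the same units as the tool itself, which makes it easier to interpret at the bedside than a unitless coefficient.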

Minimal Detectable Difference (MDD), also called the Minimal Detectable Change (MDC): The smallest amount of change in a variable (a measurement) that must be achieved to reflect a true difference rather than measurement error.

It is important to understand that the MDC is not the same as the Minimal Clinically Important Difference (MCID). The MCID reflects the amount of change that needs to occur to be clinically meaningful. In general, the MDC will be smaller than the MCID values.[3]
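The distinction can be made concrete with a small sketch. It assumes the widely used formula MDC95 = 1.96 × √2 × SEM, which is not stated in the cited text, and every number below (SEM, MCID, observed change) is hypothetical.

```python
from math import sqrt


def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 corresponds to 95%)."""
    # sqrt(2) accounts for measurement error in both the first and second test.
    return z * sqrt(2) * sem


# Hypothetical values for a 0-100 functional questionnaire.
sem = 3.0              # assumed standard error of measurement, in points
mcid = 10.0            # assumed MCID from a separate, hypothetical study
observed_change = 9.0  # change seen in one patient between two visits

mdc95 = minimal_detectable_change(sem)
print(f"MDC95 = {mdc95:.1f} points")
print("Change exceeds measurement error:", observed_change > mdc95)
print("Change is clinically meaningful:", observed_change >= mcid)
```

In this hypothetical case the patient's change is larger than the MDC, so it is unlikely to be measurement error alone, yet it falls short of the MCID.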

Responsiveness

Responsiveness, also known as sensitivity to change, is the ability of an instrument to measure small but clinically important changes that occur when participants or patients respond to effective therapeutic interventions. It is considered an important part of the assessment of longitudinal constructs.[10]

A tool is said to be sensitive to change if it can precisely measure increases and decreases in the construct measured. This is important for tools which are used to evaluate changes following a therapeutic action. The aim is to measure the capacity of the scale to detect small but clinically significant changes. When an outcome measure is sensitive to change, the score increases as the patient improves, decreases as the patient worsens and does not change if the patient's state remains stable.[9]
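Responsiveness is often summarised with an effect-size type index. The sketch below uses the standardized response mean (SRM, the mean change divided by the standard deviation of the change), which is one of several such indices and is an assumption here, not a method named in the cited sources. The scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(before, after):
    """SRM = mean of the change scores / standard deviation of the change scores."""
    changes = [a - b for a, b in zip(after, before)]
    return mean(changes) / stdev(changes)


# Hypothetical function scores (higher is better) before and after treatment.
before = [10, 12, 14, 16]
after = [12, 15, 15, 20]

srm = standardized_response_mean(before, after)
print(f"SRM = {srm:.2f}")
```

A larger SRM suggests that the tool picks up the change consistently across patients; the index is undefined when every patient changes by exactly the same amount.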

Clinical Bottom Line

When you are using a tool, a questionnaire or a functional outcome measure with your patient, you want to have confidence in your tool. Is it measuring what it is supposed to? Is it reliable over time? Can it detect the difference between a healthy joint and a pathological joint?

To be confident with your tool, you need to have strong validity, reliability and responsiveness. Otherwise, you cannot be certain as a clinician that your treatments are truly helping your patient.

Be supportive and encouraging to researchers who are conducting methodological studies on psychometric properties; they are raising the quality of rehabilitation medicine.

References

  1. Russell, EW. Chapter 2: The Nature of Science. The Scientific Foundation of Neuropsychological Assessment: With Applications to Forensic Evaluation. Copyright © 2012 Elsevier Inc.  DOI https://doi.org/10.1016/C2011-0-04279-5.
  2. de Souza AC, Costa Alexandre NM, de Brito Guirardello E. Psychometric properties in instruments evaluation of reliability and validity. Applications of Epidemiology. Brasília, 26(3), Jul-Sep 2017. doi: 10.5123/S1679-49742017000300022.
  3. Portney LG, Watkins MP. Chapter 4: Principles of Measurement, within Foundations of Clinical Research: Applications to Practice, 3rd Edition. F.A. Davis Company, Pennsylvania, United States 2015. ISBN10 0803646577.
  4. Gellman MD, Turner, JR. Psychometric properties in Encyclopedia of Behavioral Medicine. 2013 Edition. Springer, New York, NY doi: https://doi.org/10.1007/978-1-4419-1005-9_480.
  5. Martins GA. Sobre confiabilidade e validade. RBGN. 2006 jan-abr;8(20):1-12.
  6. Keszei AP, Novak M, Streiner DL. Introduction to health measurement scales. J Psychosom Res. 2010 Apr;68(4):319-23.
  7. Roach KE. Measurement of health outcomes: reliability, validity and responsiveness. J Prosthet Orthot. 2006 Jan;18(1S):8-12.
  8. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol. 2011 Jan;64(1):96-106. 
  9. Fermanian J. [Validation of assessment scales in physical medicine and rehabilitation: how are psychometric properties determined?]. Ann Readapt Med Phys. 2005 Jul;48(6):281-7. Epub 2005 Apr 25.
  10. Aaronson N; Alonso J; Burnam A; Lohr KN; Patrick DL; Perrin E; Stein RE, Quality Of Life Research: An International Journal Of Quality Of Life Aspects Of Treatment, Care And Rehabilitation [Qual Life Res], ISSN: 0962-9343, 2002 May; Vol. 11 (3), pp. 193-205; Publisher: Springer Netherlands; PMID: 12074258.