Why Readers Vary in Interpreting Clinical Trial Images, and When to Worry About It


Inter-reader discordance can be a sign of trial integrity, yet it is a concern for sponsors of nearly every clinical trial that involves imaging, and it is something the FDA reviews. Recently, the Pharma Imaging Network for Therapeutics and Diagnostics (PINTAD), a consortium of international experts in medical imaging for use in drug development research, took up the topic. Aiming to “summarize common reasons for reader discordance” and “identify what factors can be controlled and which actions are likely to be effective in reducing discordance,” the PINTAD group authored two review articles, both published by the Drug Information Association.

The first article [1] focuses on the nature of reader disagreements. The second article [2] offers practical statistical methods for better understanding reader performance. Bracken’s own Colin G. Miller was among the team of authors. This post aims to summarize the key points the articles put forward.

A range of factors affect variability.

Minimizing reader discordance in a clinical trial setting begins with understanding which sources of variability are involved. The factors that affect reader performance can be classified as controllable, less controllable, and not controllable at all. Those that are controllable include expertise, training, the reading environment and setting, and fatigue. The most significant source of inter-reader variability is a radiologist’s own expertise in the subjective aspects of the response criteria.

The authors note, however, that two independent experts will always disagree to some extent. Discordant evaluations are both expected and necessary when readers independently review complex anatomy or evaluate a patient’s response to a therapeutic intervention. In fact, disagreement rates of 20% to 40% on the interpretation of an image are reasonable. That level of variability does not necessarily indicate inadequate performance. Instead, it often merely reflects expected differences. What’s more, it might show where multiple interpretations are well-founded. Still, unexpected levels of disagreement do indicate a need for further investigation.
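To make the 20% to 40% figure concrete, here is a minimal Python sketch of how a disagreement rate between two readers might be computed. The response labels and the ten reads are invented for illustration and are not trial data. Cohen's kappa is included only as one commonly used chance-corrected agreement statistic; it is an assumption here, not necessarily a method the PINTAD papers recommend.

```python
from collections import Counter


def discordance_rate(reads_a, reads_b):
    """Fraction of cases on which two readers gave different assessments."""
    if len(reads_a) != len(reads_b) or len(reads_a) == 0:
        raise ValueError("both readers must assess the same non-empty set of cases")
    disagreements = sum(a != b for a, b in zip(reads_a, reads_b))
    return disagreements / len(reads_a)


def cohens_kappa(reads_a, reads_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(reads_a)
    observed = 1.0 - discordance_rate(reads_a, reads_b)
    counts_a, counts_b = Counter(reads_a), Counter(reads_b)
    # Chance agreement: probability both readers pick the same label at random,
    # given how often each reader uses each label.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical label for every case
    return (observed - expected) / (1.0 - expected)


# Hypothetical reads for ten cases (labels are illustrative only).
reader_1 = ["PR", "PR", "SD", "PD", "SD", "PR", "PD", "SD", "SD", "PR"]
reader_2 = ["PR", "SD", "SD", "PD", "SD", "PR", "PD", "PD", "SD", "PR"]

rate = discordance_rate(reader_1, reader_2)
kappa = cohens_kappa(reader_1, reader_2)
print(f"discordance rate: {rate:.0%}")  # 2 of 10 cases differ -> 20%
print(f"Cohen's kappa: {kappa:.2f}")
```

In this toy data set the readers differ on two of ten cases, so the disagreement rate sits at the low end of the range the authors call reasonable.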

Reader independence, meaning that one reader’s assessment does not affect another’s, is the prerequisite for convergence toward the truth. High reader performance also depends on these practices:

  • All readers are well trained in radiological assessments for the disease indication and remain so for the entire trial

  • All readers are experts in the use of the imaging modality and the evaluation criteria and are trained in protocol-specific modifications

  • The adjudicator is trained to the same level as the other readers

  • The readers all assess images in the same manner without any relative biases between them

The result is that the combined assessment of multiple readers is a more reliable estimate of the truth than is that of a single reader.
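A back-of-the-envelope calculation shows why this holds. The sketch below assumes a deliberately simplified setup that is not taken from the papers: each case has a binary right-or-wrong answer, two readers and an adjudicator each get it right with the same probability p, and all three read independently. The function names are hypothetical.

```python
def adjudicated_accuracy(p):
    """Probability the final call is correct with two readers plus an adjudicator.

    Simplifying assumptions: binary outcome, equal accuracy p for all three,
    fully independent reads. The adjudicator is consulted only on a split.
    """
    both_correct = p * p              # readers agree on the right answer
    split = 2 * p * (1 - p)           # exactly one reader is right
    return both_correct + split * p   # adjudicator breaks the tie correctly


def expected_adjudication_rate(p):
    """Probability the two primary readers disagree under the same assumptions."""
    return 2 * p * (1 - p)


p = 0.80  # illustrative single-reader accuracy
print(f"single reader:     {p:.3f}")
print(f"with adjudication: {adjudicated_accuracy(p):.3f}")
print(f"adjudication rate: {expected_adjudication_rate(p):.3f}")
```

With p = 0.80, the adjudicated result is correct about 89.6% of the time, and the two readers would be expected to disagree on roughly 32% of cases. Under these assumptions, even capable, independent readers produce a disagreement rate inside the 20% to 40% range. The gain disappears if the readers are not independent, which is why independence is treated as a prerequisite.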

Evaluating reader variability is challenging.

Although they seem like logical yardsticks, measures of intra-reader variability (IRV) and adjudication rate (AR) might not, by themselves, determine whether reader performance is acceptable. For example, low values for adjudication rate variation could indicate that reader independence is compromised. High AR values should be expected for indications like glioblastoma because the reads are difficult. It’s a mistake to hold such reads to the same standards as those of less difficult indications.

The papers’ authors prescribe standard, rigorous statistical methods for clinical trial managers. Applying them helps minimize misinterpretation of what the methods are meant to show and limits confusion about the results of the analyses. With prospectively designed reader performance monitoring, overall confidence in data integrity and reliability can be achieved using easily accessible statistical tools.
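As one illustration of an easily accessible tool, the sketch below puts a confidence interval around an observed adjudication rate and compares it with a range set prospectively for the indication. The Wilson score interval is our choice for the example, not necessarily what the papers prescribe, and the expected range, counts, and function names are hypothetical.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def check_adjudication_rate(adjudicated, total, expected_low, expected_high):
    """Flag an AR whose whole interval falls outside the prospectively set range."""
    low, high = wilson_interval(adjudicated, total)
    if high < expected_low:
        return "below"       # unusually few disagreements: check reader independence
    if low > expected_high:
        return "above"       # unusually many: check training, criteria, difficulty
    return "consistent"


# Hypothetical monitoring snapshot: 10 of 100 cases went to adjudication,
# against an illustrative expected range of 20% to 40%.
low, high = wilson_interval(10, 100)
status = check_adjudication_rate(10, 100, expected_low=0.20, expected_high=0.40)
print(f"AR = 10%, interval {low:.3f} to {high:.3f}, status: {status}")
```

Note that a low rate is flagged just like a high one. The expected range should be set per indication, so a difficult read such as glioblastoma is judged against a higher range than an easier one.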

Papers and guidance worth keeping in mind.

These two papers by PINTAD offer a robust framework for sponsors and core labs to use in monitoring inter-reader agreement and discordance and provide the techniques that can be used. In the core lab industry, they should be considered landmark papers.

Since Colin was one of the authors, if you have questions about either of these publications, or medical imaging in general, you can turn to Bracken with confidence. Medical imaging is a core competency for Bracken, and the team is always ready to help.


1. Schmid AM, Raunig DL, Miller CG, Walovitch RC, Ford RW, O'Connor M, Brueggenwerth G, Breuer J, Kuney L, Ford RR. Radiologists and Clinical Trials: Part 1: The Truth About Reader Disagreements. Ther Innov Regul Sci. 2021 Jul 6:1–11. doi: 10.1007/s43441-021-00316-6. Epub ahead of print. PMID: 34228319; PMCID: PMC8259547.

2. Raunig DL, Schmid AM, Miller CG, et al. Radiologists and Clinical Trials: Part 2: Practical Statistical Methods for Understanding and Monitoring Independent Reader Performance. Ther Innov Regul Sci (2021). https://doi.org/10.1007/s43441-021-00317-5



