Fractals:

Measuring What Matters: Biomarkers, Imaging, and Statistical Rigor

With Guest David Raunig [TRANSCRIPT]

Click Here for podcast episode details and listening links.

[Colin Miller]

Hello, I'm Colin Miller, CEO at the Bracken Group, and this is Fractals, Life Science Conversations. Bracken is the professional services firm for life sciences and digital health organizations. Our intelligence ecosystem fulfils consulting, regulatory, marketing and analytics needs with an integrated and strategic approach.

Today's episode will cover biomarkers and their key role in medical imaging and clinical trials, which is why I'm absolutely delighted to be joined by my colleague, Bracken Senior Partner and subject matter expert, Dr. David Raunig. David is a biostatistician and a biomedical imaging engineer who has been deeply involved in medical imaging biomarkers for over two decades, specifically in the quantification, development, and validation of imaging biomarkers and their use in clinical trials. His experience includes lead or supporting roles on five FDA biomarker qualification review teams and on five FDA validations of imaging drug development tools as endpoints in industry-sponsored clinical trials.

With over 28 years of experience guiding clinical trials from Phase 0 through Phase 2b, David has been instrumental in shaping statistical and biomarker analysis strategies for regulatory submissions. Key highlights of his career include validating the enhanced MRI score for hemophilia, co-inventing the pixel compounding method for super-resolution in ultrasound imaging, and co-chairing the Quantitative Imaging Biomarker Alliance (QIBA) guidelines for quantitative and multi-component biomarker development and validation. He holds executive leadership roles in strategic planning, business development and scientific governance, and leads collaborations with top institutions such as C-Path, Worcester, Wake Forest, Columbia University and Johns Hopkins, helping to shape and evolve the role of biomarkers in drug development.

I've had the delight of knowing David for many years and being involved in a number of committees. David, it is such a pleasure to see you and welcome to Fractals.

[David Raunig]

Great. Thanks, Colin. And thanks for the introduction.

It's great to be here. It's great to talk about this. This has been a passion of mine for 20-something years.

And I think that biomarkers have come a long way since the Critical Path Initiative started in 2004. And I think that we're going to go even further, so I can see nothing but biomarkers in the future for drug development. I hope.

[Colin Miller]

You hope. Fantastic. I think you're probably right.

And maybe what we should do, before we start today's discussion, is set the stage and think about what a biomarker is: the terminology and its usage. So, I think most of us agree that a biomarker is a measurable characteristic that indicates a biological process, disease state or response to treatment. It must be reliably and accurately measured to have clinical research value.

Biomarkers can be found in blood, urine or tissue samples and fall into four main categories: molecular, physiological, histologic and radiographic. David, before I go on, any other comments on that?

[David Raunig]

Yes. There's been some debate on whether clinical outcome assessments, or Likert-like scales or scores, for instance, are biomarkers. And they really aren't.

They're not defined as biomarkers per se by either the ICH E9 guidelines or the FDA. But they are included in the same guidance that the FDA has put out for biomarkers and endpoints, and they are endpoints in clinical trials.

So, importantly, a biomarker is not an assessment of how a patient feels, functions or survives. How a patient functions, say, in a neuromuscular study where you measure how far they can walk, is not a biomarker. A biomarker is a biological marker, not a measure of how a patient feels, functions or survives.

However, having said that, it's important to know that many of these outcome assessments, the PROs (patient-reported outcomes), the scales, or other measurements of how a patient responds to treatment can be treated as an endpoint in the same way as a biomarker, that is, as a quantitative endpoint. So they would fall under the same guidance, and they actually do fall under the same kind of statistical guidance on how you would validate or qualify them in a clinical trial. For instance, how far somebody can walk in a muscular dystrophy trial can be treated in the same context, with the same statistical properties, as a biomarker, and would have to go through the same basic validation plus a little bit more.

[Colin Miller]

To that end, I think one of the challenges with biomarkers, when we're thinking about using them specifically in drug development, is ensuring that they sit on the same pathway through which the drug affects the patient's physiology. I don't know if you want to speak to that, but I think we've both seen cases where a biomarker might not truly reflect what's going on because of the therapeutic input.

[David Raunig]

Oh, yes. There are a lot of interfering pathways. The biomarker may not actually be on the causal pathway.

The biomarker may only be a measurement that can be taken at the end. Because of that, Fleming and DeMets pointed out four different types of pathways that a biomarker might be situated on, and only one of them really helps us out completely. The other ones may partially help us out or have confounding factors.

So, a biomarker must be validated technically, meaning analytically: it has to go through a technical validation, and then it has to be qualified for its use. What that basically means, in a single sentence, is that it has to change as expected over time due to an intervention or due to biological processes.

And that's why we define biomarkers as a measurement of a biological process. In the case where it is not in the direct causal pathway of the mechanism of action, it may still be affected in the same way it would be had it been in the direct causal pathway. So, it may not be complete, so you may not have a surrogate, but it may be partially complete, or complete enough to use for what's called the context of use.

In other words, is it useful for what you need to use it for, and does it measure what it's supposed to be measuring? When going to the FDA, the FDA wants to know: are you measuring what you're supposed to measure? You can go through all the other guidances and guidance language, but basically, are you measuring what you're supposed to measure?

And if a change does exist, can you see it? Those two things really drive both the technical and the clinical validation of biomarkers.

[Colin Miller]

It's funny you bring up Fleming and DeMets, because I've presented their graphs multiple times, and I think it's actually a very clear visual. With that, you've just brought up another term, and that was surrogate endpoint.

Perhaps you'd like to define it for the listeners.

[David Raunig]

Well, the surrogate endpoint is only one of the 10 different types of endpoints that the FDA and the NIH have defined. A surrogate endpoint gives you all the information that you would get from a clinical endpoint, say, overall survival in a cancer trial.

A surrogate endpoint is not a predictive endpoint, like RECIST might be, or progression-free survival; a surrogate endpoint would tell you everything that you would need to know about overall survival. There are other surrogates. And that follows from the Prentice criteria, published 20 or 30 years ago now.

The Prentice criteria basically say that if you have a model for the endpoint and you put in the surrogate endpoint as an explanatory variable, the surrogate explains all of the information about the treatment effect on the endpoint you're trying to be a surrogate for. All of it. So, in statistical terms, the treatment P value that you got that was less than .0001 now becomes 1 or .5. In other words, the surrogate explains everything you need to see. So, we don't really have true surrogate endpoints, but there are endpoints that have been proven to be mostly surrogate.

They are partial surrogates or likely surrogates: they give you the information that you would need, but not all of it. In terms of what we have now for surrogacy, I think you could probably call progression-free survival a surrogate endpoint.

But what that means is that it doesn't tell you everything. It doesn't completely predict overall survival. So, when going through the process of qualifying or validating a biomarker, it falls into one of 10 different categories.

And the categories are diagnostic, monitoring, multiple component, predictive, prognostic, surrogate, response, and safety or response measures. So, we do have response measures that act as surrogates. And these definitions are fuzzy, because a diagnostic biomarker could end up serving as a monitoring biomarker, or a multiple-component biomarker could act as a predictive or prognostic indicator.

A safety, response or monitoring biomarker might actually be used for efficacy, as a response biomarker in a clinical trial. These biomarker definitions are really just there to set the stage, so that the language spoken between clinical trial sponsors and the FDA is consistent and everybody knows what everybody's talking about. The validation for them is about the same.

There's not a strong difference in validation techniques or qualification techniques between these biomarkers.

[Colin Miller]

I'm going to come back to validation in a moment or two, but if I can just jump back a little to that part of the conversation. In imaging, I've always used the concept that there are four uses for medical imaging: diagnosis, prognosis, monitoring the impact of therapy, and monitoring natural history.

And I think they are quite well defined. And yet, in the context of biomarkers, you've suggested they're fuzzy. And I wonder if you want to elaborate because I would have thought they were fairly distinct.

And I wonder if that's because of where we're coming from here. I'm talking as a pure scientist. You've had the beauty of understanding the stats and the underlying components of all of that and how that follows in.

So, your insights are appreciated.

[David Raunig]

The FDA and the NIH got together in 2015 and released their BEST (Biomarkers, EndpointS, and other Tools) categories. What they tried to do in that intervening year was to define biomarkers based on the conversations that had been going around certainly since before the Critical Path Initiative in 2004, but mostly to set the stage for the biomarker qualification program at the FDA. That program was intended to qualify a biomarker for use by everybody.

So, everybody would be able to use this biomarker under the prescribed circumstances, and you would need no other evidence to show that it would work. So, the BEST categories were in response to the biomarker qualification efforts that had been going on in the previous 10 or 11 years.

Even in the guidance document, they note that one category of biomarker might also qualify as another category. For instance, a predictive biomarker might actually be used as a response biomarker in a clinical trial. A response biomarker demonstrates a biological response to a therapeutic intervention.

So, that's tumor size.

[Colin Miller]

Is it tumor size or is it tumor burden?

[David Raunig]

Actually, in this case, this is very interesting. RECIST is not a measurement of tumor burden. The only measurement of tumor burden was the original WHO criteria, which said you had to measure the cross-product volumes of all the tumors in the entire body. If you're familiar with the term, it's an NP-complete problem, which means that to get to the answer, you have to measure all the tumors.

And the change in tumor burden is the response variable. It would be used to define either the criteria for progressive disease or a response, or some other measure in a statistical model. There is a lot of discussion on what's predictive and what's prognostic, and this is the way the FDA and the NIH defined it in their BEST categories.

A predictive biomarker identifies individuals who are more likely to experience an event, such as a response to a treatment, and a prognostic biomarker identifies the likelihood of a clinical event.
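David describes the change in tumor burden as the response variable used to define progressive disease or response. As a rough illustration of how RECIST 1.1 turns the target-lesion sum of longest diameters into a response category, here is a minimal Python sketch. It assumes a list of sums in millimetres, baseline first and current assessment last, and applies only the target-lesion thresholds: complete response when the sum reaches zero, progressive disease at an increase of at least 20% over the smallest prior sum (the nadir, baseline included) together with an absolute increase of at least 5 mm, partial response at a decrease of at least 30% from baseline, and stable disease otherwise. The function name is hypothetical, and the sketch deliberately ignores non-target lesions, new lesions, lymph-node short-axis rules and confirmation of response, so it is not a complete implementation of the criteria.

```python
from typing import Sequence


def classify_target_response(sums_mm: Sequence[float]) -> str:
    """Classify the latest visit from target-lesion sums (simplified RECIST 1.1 sketch).

    sums_mm[0] is the baseline sum of longest diameters; sums_mm[-1] is the current one.
    Only target-lesion thresholds are applied; new and non-target lesions are ignored.
    """
    if len(sums_mm) < 2:
        raise ValueError("need a baseline and at least one follow-up sum")
    baseline, current = sums_mm[0], sums_mm[-1]
    nadir = min(sums_mm[:-1])  # smallest sum before the current visit, baseline included

    if current == 0:
        return "CR"  # all target lesions have disappeared
    increase = current - nadir
    # PD is checked before PR so that regrowth from the nadir takes precedence.
    if increase >= 0.20 * nadir and increase >= 5.0:
        return "PD"  # >= 20% over nadir AND >= 5 mm absolute increase
    if baseline - current >= 0.30 * baseline:
        return "PR"  # >= 30% decrease from baseline
    return "SD"


print(classify_target_response([100.0, 60.0]))        # PR: 40% below baseline
print(classify_target_response([100.0, 50.0, 62.0]))  # PD: 24% and 12 mm above the nadir
print(classify_target_response([10.0, 12.0]))         # SD: 20% increase but only 2 mm
```

The third call shows why the absolute 5 mm floor matters: with small baseline sums, a 20% relative increase alone can fall within ordinary measurement noise.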

[Colin Miller]

Taking a step back, we've discussed the WHO criteria and the measurement of the whole lesion load, or lesion burden, vis-à-vis the RECIST criteria. Do you think we should be going back to a more complete evaluation, now that AI makes it very easy to mark up lesions and track them over time once we've got the baseline done? In your opinion, is this a better approach, at least from a statistical viewpoint?

I won't ask you about the clinical side, though feel free to add to that, but from a statistical approach, should we be taking another look at this area?

[David Raunig]

Well, we can. We went from the original WHO criteria to 10 lesions to five lesions, and there were efforts to move that down to a single lesion. Presumably, the only reason we did that was the burden.

It was reader burden and data storage. We don't really have that problem anymore: data storage is nearly infinite, and reader burden will fall with the use of AI.
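The move from whole-body measurement to a handful of lesions is easy to see in code. RECIST 1.1 caps target lesions at five in total and two per organ. The sketch below picks the largest lesions under those caps and compares their sum of longest diameters with a whole-body sum of the same measurements. The lesion data and the helper names are hypothetical, and real target selection also considers whether a lesion can be measured reproducibly and applies separate rules to lymph nodes, which this sketch ignores.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Lesion(NamedTuple):
    organ: str                  # organ the lesion sits in
    longest_diameter_mm: float  # longest diameter, in millimetres


def select_targets(lesions: Iterable[Lesion], max_total: int = 5,
                   max_per_organ: int = 2) -> List[Lesion]:
    """Greedily take the largest lesions, respecting total and per-organ caps (size only)."""
    per_organ = defaultdict(int)
    targets: List[Lesion] = []
    for les in sorted(lesions, key=lambda x: x.longest_diameter_mm, reverse=True):
        if len(targets) == max_total:
            break
        if per_organ[les.organ] < max_per_organ:
            targets.append(les)
            per_organ[les.organ] += 1
    return targets


def sum_of_diameters(lesions: Iterable[Lesion]) -> float:
    """Sum of longest diameters over the given lesions."""
    return sum(les.longest_diameter_mm for les in lesions)


# Hypothetical patient with seven measurable lesions.
lesions = [
    Lesion("liver", 42.0), Lesion("liver", 35.0), Lesion("liver", 30.0),
    Lesion("lung", 25.0), Lesion("lung", 18.0), Lesion("lung", 12.0),
    Lesion("adrenal", 15.0),
]
targets = select_targets(lesions)
print(sum_of_diameters(targets))  # 135.0: RECIST-style target sum
print(sum_of_diameters(lesions))  # 177.0: whole-body sum
```

In this made-up example the third liver lesion and the smallest lung lesion drop out of the target sum; that is the information a whole-body read would retain.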

[Colin Miller]

But do you think that with the advent of AI, and our being able to analyze data so much more rapidly, we should be going back to the total tumor burden, in your view, from a statistical viewpoint?

[David Raunig]

We can, and I think we probably should. Presumably, the only reason we went from the whole body to 10 to five, and there are even efforts to go down to one, was to reduce reader burden. With AI-assisted radiology reads, there should be no reason why we couldn't look for everything.

Now, we're not going to find everything, but again, RECIST was meant to be a subset sample of all the lesions in the body, with some blocking, by organ, for instance. And that made perfect sense when there were maybe 600 patients, a number of different time points and a number of different reads, and you were limited in the number of readers you could use. But today, we have the ability to look at all the lesions. It might not be perfect, and we might have some false lesions in there, but we should probably look at whether the number of false lesions affects the outcome, and whether we can use these AI-assisted reads, with some monitoring and supervision, to look at the change in whole-body tumor burden, for some cancers if not for all, to see if that provides better insight into the changing health of the patient. So we can, and I think we should.

You often see RECIST used in Phase 1 and Phase 2. Some of the reasons are that, for small biotechs, the venture capitalists know what RECIST is, and they want to see RECIST. Another reason is fear of doing something that hasn't been written down in a publication.

Statistically, it makes a lot of sense to go from 10 to 5; that is probably about the largest jump you want to make in numbers, all other things being equal. What it gets down to is that below that, you have too much noise; it's basically the standard error of the mean. You're not adding enough to your sum of longest diameters to get a statistically robust value that doesn't have too much noise in it.

So you start gaining too much noise and losing information, and that information loss leads to higher p-values, and your significance changes below 5. So the work that Jan Bogaerts did, along with some other great statisticians, will stand the test of time as far as the lowest number goes. For the highest number, the same group looked at 10 and said we don't really need to go to more than 10.

And then they redid that work to get down to 5 for RECIST 1.1. So do we need more numbers than that? The answer is not clear. Maybe that's why RECIST is failing in some cases.

Maybe it's because we're looking at the wrong thing. Maybe we should look at organ-specific lesions. I don't know, but the continuity between the WHO criteria and the RECIST 1.1 criteria over the decades is going to be an important factor in whether you can use that in a Phase 3 pivotal or confirmatory trial.
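David's point that too few lesions leave too much noise is the familiar behaviour of the standard error of the mean, which shrinks with the square root of the number of measurements. The sketch below assumes, purely for illustration, that each lesion measurement carries independent error with the same standard deviation; under that assumption the same 1/√n factor also governs the relative noise in a sum of n similar-sized lesions. The per-lesion SD is a hypothetical value, not one taken from the RECIST analyses.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of the mean of n independent measurements sharing a common SD."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sd / math.sqrt(n)


PER_LESION_SD_MM = 4.0  # hypothetical per-lesion measurement SD, for illustration only

for n in (1, 2, 5, 10):
    se = standard_error_of_mean(PER_LESION_SD_MM, n)
    print(f"{n:>2} lesions: SE = {se:.2f} mm")
```

Under these assumptions, halving the number of lesions from 10 to 5 inflates the standard error by a factor of √2 (about 41%), whereas going from 5 to 1 inflates it by √5 (about 124%), so the cost of dropping lesions grows quickly at the low end.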

[Colin Miller]

Right. And I think it also potentially pertains to training as well. So if you were talking to other statisticians in the industry about how best to approach the FDA on statistical issues in the clinical trials they're developing and working on, what would your recommendation be?

[David Raunig]

My first recommendation is that you're not testing hypotheses. You're getting data; you're evaluating the parameters of whatever it is that you're looking at.

So whether it's means and standard deviations or whatever it is that you're doing, you're measuring those parameters within a certain amount of precision. You're not sample sizing for a test against a hypothesis, because you really don't have a hypothesis. And if you do, it's probably not appropriate, at least for the first steps; maybe later.

Probably my biggest recommendation, for imaging, is to go to the QIBA profiles. QIBA is the Quantitative Imaging Biomarker Alliance, and we have defined how to validate a biomarker as an assay, using standard metrological concepts. Metrology is the study of how to measure things.

So, my first recommendation is to go look at the QIBA profiles. Don't be scared of them. Sometimes they're very long, but go to the QIBA profiles and find out what was done.
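David's recommendation is to size an early validation study for precision rather than for a hypothesis test. One common way to express that, sketched below as a generic illustration rather than a QIBA-prescribed method, is to choose n so that a confidence interval for a mean has a target half-width E: n = (z·σ/E)², rounded up, where σ is an anticipated standard deviation and z is the standard normal quantile for the chosen confidence level. The function name and the σ and E values in the example are hypothetical planning inputs.

```python
import math
from statistics import NormalDist


def n_for_precision(sd: float, half_width: float, confidence: float = 0.95) -> int:
    """Smallest n so that a normal-approximation CI for a mean has the target half-width.

    Uses n = (z * sd / half_width) ** 2, rounded up; treats sd as known in advance.
    """
    if sd <= 0 or half_width <= 0:
        raise ValueError("sd and half_width must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided normal quantile
    return math.ceil((z * sd / half_width) ** 2)


# Hypothetical planning values: anticipated SD of 10 units, desired precision of +/- 2 units.
print(n_for_precision(sd=10.0, half_width=2.0))  # 97
```

Because n scales with 1/E², halving the target half-width roughly quadruples the sample size. With small samples and an estimated SD, a t-based iteration gives a slightly larger n than this normal approximation.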

[Colin Miller]

Very good. Thank you. Thank you.

So, finally, while this has been, in many respects, a highly specialized discussion around biomarkers and endpoints in clinical trials, it wouldn't be Fractals without asking our favorite final question. If you could speak to yourself at the age of 25 or thereabouts, what advice would you give yourself?

[David Raunig]

I have thought about this a lot, especially now that I'm up there in age, but my advice would be: don't change a thing. All of the birthdays, holidays and friends' weddings that I missed by staying out at sea (and luckily nobody died during the time I was away), I would still tell that person to do the same thing. As we get older, we look back at things that we did, and sometimes we smile, but often we cringe.

We think, I wish I'd done that differently. But if we had done it differently, we might not be in the same place we are now to say, I wish I'd done it differently. So, everything that I did in the Navy, the people that I worked for: some things were good, some things weren't.

The good things far outweigh the things that weren't. And I'm thankful for all the incredible people I got to work with in the Navy, in the defense industry, and then coming up through Pfizer, almost starting over again there, from 1996 until now. And here I am with you after all these years.

[Colin Miller]

What a privilege. It has been fascinating. And David, thank you for your time today.

This has been a great sojourn through the development process, with nuggets of information along the way. Your experience in the industry is really second to none, and I appreciate all that you've done to help biomarker development, and now for our clients at Bracken.

So really appreciate it. And thank you again for your time today and your experience.

[David Raunig]

It's been a complete pleasure to be able to do this. I really enjoy every waking moment doing this. It's great. It's been great. Thanks.

[Colin Miller]

Fantastic. Thanks, David. Fractals is brought to you by Bracken, the professional services firm for life science and digital health organizations.

Subscribe to Fractals via your preferred podcast platform. Visit us at thebrackengroup.com or reach out directly on LinkedIn. We'll be delighted to speak with you.

I'm Colin Miller, wishing you sound business and good health. Thanks for listening.
