Lab Talk

Lavanya Umapathy on Disambiguating Prostate MRI and Advancing Medicine Through Engineering

Lavanya Umapathy, postdoctoral fellow who develops representation learning models for medical imaging, talks about improving prostate-cancer screening and using artificial intelligence to approach “the person behind the images.”

Lavanya Umapathy, PhD, is a postdoctoral fellow at NYU Grossman School of Medicine and machine learning scientist with the Center for Advanced Imaging Innovation and Research at NYU Langone Health. Dr. Umapathy develops AI algorithms for the interpretation of MR images, with particular emphasis on representation learning and contrastive learning techniques, which require less annotated data than does conventional supervised deep learning. Dr. Umapathy holds a PhD in electrical and computer engineering from the University of Arizona, where she proposed novel representation learning approaches for MR image segmentation. At NYU Langone, she has recently led the development of an artificial intelligence model to clarify an ambiguous category in PI-RADS, a standard used by radiologists to evaluate prostate MRIs for the likelihood and severity of cancer. Our conversation has been edited for clarity and length.

In a recent study, you and colleagues developed representation learning models to disambiguate a category in the radiology prostate cancer evaluation system called PI-RADS. What is PI-RADS and why does PI-RADS 3 need disambiguating?

PI-RADS is a standard for reporting the findings of prostate MRI. PI-RADS 1 is no risk, 2 is low risk, 4 is high risk, and 5 is very high risk. And then there’s 3, where you don’t want to call it cancer but you also cannot say that the risk of cancer is low, so it’s an ambiguous category. The current clinical recommendation is that people with PI-RADS 3 or higher are referred for a biopsy, which tells you the actual ground truth: you take a tissue sample from the prostate and see whether cancer is present.

With Hersh Chandarana and Daniel Sodickson, we had been curating a big dataset of prostate MR images taken at NYU Langone for known or suspected prostate cancer over a decade—around 40,000 MRI exams—and we noticed that among the people who were assigned PI-RADS 3 and then biopsied, 64 percent were found negative for cancer.

So, that’s a lot of people who undergo biopsies that turn out to be negative. Biopsy is painful and expensive, and when you have repeated biopsies that come back negative, you can kind of lose faith in longitudinal monitoring.

In this project, our research question was: can we avoid unnecessary biopsies in PI-RADS 3? How can we learn to summarize information in medical data, in this case an image, to get the gist of the image or succinctly describe its content?

That’s the representation part in “representation learning.”

Yes, when a radiologist calls two images from two different patients PI-RADS 1, there is something similar between them. We wanted to learn that, and we used a subset of representation learning called contrastive learning to kind of model how radiologists make decisions when they look at prostate MR images.

Does the term contrastive refer to juxtaposing information from, say, PI-RADS 2 and PI-RADS 4 images?

That’s exactly what it is. Our deep learning model takes in prostate MR images and distills that information into a very low-dimensional representation, a 256-dimensional vector for each image. We want all PI-RADS 1 representations to be similar to each other, all PI-RADS 2 representations to be similar, and so on. We decided to train a model to generate representations using MR images from the cases radiologists are confident in: PI-RADS 1 and 2, and PI-RADS 4 and 5.
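
To make that concrete, here is a minimal sketch, in PyTorch, of a supervised contrastive objective over 256-dimensional representations grouped by PI-RADS label. The loss formulation, batch construction, and labels are illustrative assumptions, not the study’s actual implementation.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pull same-PI-RADS representations together, push different ones apart."""
    z = F.normalize(embeddings, dim=1)                 # (batch, 256) unit vectors
    sim = z @ z.T / temperature                        # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    logits = sim.masked_fill(self_mask, float("-inf"))  # ignore self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # average log-probability assigned to same-label pairs, per anchor
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()

# Toy batch: 256-d representations with confident PI-RADS labels (1, 2, 4, 5)
embeddings = torch.randn(8, 256, requires_grad=True)
labels = torch.tensor([1, 1, 2, 2, 4, 4, 5, 5])
supervised_contrastive_loss(embeddings, labels).backward()
```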

In order to learn how to better classify information in a category that radiologists themselves find ambiguous, you went first to the information they don’t find ambiguous at all?

Exactly. We trained our model, which we call a representation learner, to distill information from high-confidence cases. The model learns a representational space where the low-risk cases are clustered together and the high-risk cases are clustered together. Then we thought: why don’t we project all the ambiguous cases onto this space and see where they fall?
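
A rough sketch of that projection step follows, with a placeholder encoder standing in for the trained representation learner and random vectors standing in for the learned cluster centroids; the real pipeline is not reproduced here.

```python
import torch
import torch.nn as nn

# Placeholder encoder standing in for the trained representation learner
encoder = nn.Sequential(nn.Flatten(), nn.Linear(16 * 32 * 32, 256))

@torch.no_grad()
def project(volumes):
    """Map MR volumes into the learned 256-dimensional representation space."""
    encoder.eval()
    return encoder(volumes)

# Random stand-ins for the centroids of the confident low- and high-risk clusters
low_risk_centroid, high_risk_centroid = torch.randn(256), torch.randn(256)

ambiguous = project(torch.randn(1, 16, 32, 32))[0]     # one PI-RADS 3 volume
closer_to_low = torch.dist(ambiguous, low_risk_centroid) < torch.dist(ambiguous, high_risk_centroid)
print("falls in the low-risk region" if closer_to_low else "falls in the high-risk region")
```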

And what did you find?

Based on these learned representations, we trained biopsy decision models to see whether we could predict clinically significant prostate cancer. This second set of training was based on biopsy outcomes, and what we found was that specifically in PI-RADS 3, we could avoid 41 percent of biopsies. And a very interesting side effect was that when we used this system on images across all PI-RADS categories, not just the 3s, the model could avoid almost 62 percent of biopsies.
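
A downstream biopsy decision model of the kind described here could, in its simplest form, be a classifier over the learned representations trained on biopsy outcomes. The sketch below uses logistic regression on synthetic data only to show the shape of that step and of a sensitivity-preserving threshold; the study’s actual models and operating points may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 256))       # learned representations of biopsied cases
y = rng.integers(0, 2, size=500)      # 1 = clinically significant cancer on biopsy

clf = LogisticRegression(max_iter=1000).fit(X, y)
risk = clf.predict_proba(X)[:, 1]     # estimated cancer risk per case

# Choose a threshold that keeps most cancers above it, then count avoidable biopsies
threshold = np.quantile(risk[y == 1], 0.05)    # retain roughly 95 percent sensitivity
avoided = np.mean(risk[y == 0] < threshold)    # negatives the model would not send to biopsy
print(f"fraction of negative biopsies potentially avoided: {avoided:.2f}")
```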

Does that mean that the model was also disambiguating other categories or that the test set happened to have a lot of PI-RADS 3 cases?

There are a couple of things here. One is that PI-RADS is like a Likert scale, and research has shown that there’s a lot of inter-reader variability. One reader’s PI-RADS 3 may be another reader’s 2 or 4. And second, we observed in the all-PI-RADS group that some PI-RADS 4s—which are findings of high risk—were placed by the model in the low-risk space. At first, we were puzzled, but when we looked at the biopsy outcomes for these patients, they were negative, which is very exciting.

In the paper, you and coauthors use the phrase “hidden representation.” You’ve just brought up something that the model has learned—accurately, as confirmed by biopsy—but radiologists did not see. How do you think about what the model is learning? What kind of insight do you and the research team have into what the model focuses on?

With machine learning as a whole, we’ve gone from manual feature extraction—where you give a set of custom-defined features to a classifier model, which then uses these to decide A or B—to representation learning, with the premise that there is information in the data that might not be obvious to us. And you can only do this when you have a large database. For example, an expert radiologist would have seen over the course of their career thousands and thousands of images. Our goal is to mimic that. And we were very lucky that we happened to have some 40,000 exams to train our model.

As to what the model is looking at, there are explainability approaches. We looked at Grad-CAM, which creates a heat map that shows what areas in an image contributed the most to the model’s decision. It’s a reality check, because you want the model to be looking at relevant regions. Mind you, we never instructed the model where the prostate is or where the lesions are, so it has to look at the whole volume and decide what accounts for a certain PI-RADS score. And we saw that the model focused on lesions. In some examples where the model is making a negative decision—no cancer—we saw that it’s looking at the entire prostate. So, it’s looking at the anatomy it’s supposed to look at and at the lesions it’s supposed to look at.
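
For readers unfamiliar with Grad-CAM, the sketch below shows its basic mechanics on a toy CNN: gradients of a class score with respect to a convolutional layer’s activations are averaged into channel weights, and the weighted activations form a heat map over the input. The model and input are placeholders, not the prostate model discussed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy 2D CNN; the layer at index 2 plays the role of "the last convolutional layer"
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
)

activations, gradients = {}, {}
target = model[2]
target.register_forward_hook(lambda m, i, o: activations.update(a=o))
target.register_full_backward_hook(lambda m, gi, go: gradients.update(g=go[0]))

x = torch.randn(1, 1, 64, 64)          # stand-in for an image slice
model(x)[0, 1].backward()              # score for the "suspicious" class

weights = gradients["g"].mean(dim=(2, 3), keepdim=True)   # per-channel importance
cam = F.relu((weights * activations["a"]).sum(dim=1))     # weighted activation map
cam = F.interpolate(cam.unsqueeze(1), size=x.shape[-2:], mode="bilinear", align_corners=False)
print(cam.shape)                        # heat map aligned with the input image
```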

You mentioned a dataset of about 40,000 exams, but in the paper it’s a set of some 28,000. Can you talk about the kind of work that goes into creating a research dataset like this and what determines how much of it is useful for a specific research project?

The credit for curating this dataset goes to Patricia Johnson and Hersh Chandarana, who were key to putting it together. The data curation is part of a larger endeavor. For this project, there were a lot of criteria that the data had to satisfy. First, we wanted to look at everybody who was imaged between 2015 and 2023 at any NYU Langone clinic. And from there we had to find out how many had a complete exam. Also, we were focused on biparametric MRI, not multiparametric MRI, which uses dynamic contrast-enhanced [DCE] imaging.

And the addition of contrast imaging of course influences the PI-RADS evaluation.

That is correct. When a radiologist reads a multiparametric exam, they have access to the DCE alongside the T2, diffusion, and other images. But one of our goals here has been to explore how we can get the relevant information without the images that require a contrast injection. We also had other criteria, which, once applied, trimmed the data cohort to about 28,000 exams.

How do you know how much data is enough?

For a foundation model, you would need a lot of data to learn what things are relevant. For very specific segmentation and classification tasks, people have used maybe a thousand, two thousand images, depending on availability. And there are approaches for working around a lack of annotations and labels—there’s a whole field behind it, and representation learning helps with that as well. But I don’t think I’ve seen such high volumes of prostate MRIs elsewhere, so we were confident that we’d be able to train our model effectively.

With the understanding that the creation of a dataset like this is not just for the purposes of any one study, how big a part of an investigation goes into cleaning the data, categorizing the data, and so on?

It’s important to get the most out of your data, and we do have automated tools for that. The radiologists had already done the groundwork by telling us the diagnosis for the image, the prostate volume, whether the patient had had prior surgeries or biopsies, whether they have implants—that’s all in the radiology report.

The study on disambiguating PI-RADS 3 is retrospective, and the advance has the potential to make a difference for a lot of patients. On the road from a research investigation to a clinical tool, what happens next?

Especially for PI-RADS 3, we were thinking, could it be like a second reader to the radiologist? Can a radiologist see what the model says about estimated risk and use that information to guide their decision on the biopsy recommendation?

In our study, in the all-PI-RADS cohort, the radiologists’ miss rate was 7.7 percent, so, when there was a cancer, they found it 92.3 percent of the time. The model had the same sensitivity and the same error rate. But in the group of people who were cancer-negative, radiologists could only avoid biopsies in 33 percent of cases, whereas the model avoided 62 percent without losing sensitivity. It missed the same number of cancers as the radiologists but avoided twice as many unnecessary biopsies.
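
As a quick check that these figures hang together (all numbers quoted above, nothing new): sensitivity is one minus the miss rate, and the model’s avoidance rate is roughly twice the radiologists’.

```python
miss_rate = 0.077
sensitivity = 1 - miss_rate                        # 0.923, i.e. 92.3 percent
avoided_by_radiologists, avoided_by_model = 0.33, 0.62
print(f"sensitivity: {sensitivity:.1%}")                                      # 92.3%
print(f"avoidance ratio: {avoided_by_model / avoided_by_radiologists:.1f}x")  # ~1.9, roughly twice
```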

I think the safest prospective analysis would be one where everyone who would otherwise get a biopsy still gets one but we also record the model’s recommendations and go back to check them against biopsy results. That would give us more confidence in getting to a clinical trial.

Representation learning is an area you have worked in for several years. It was the subject of your doctoral thesis at the University of Arizona and continues to be your focus during your postdoctoral fellowship here at NYU Langone. How did your interest in representation learning come about?

I have been working in machine learning since 2016. At that point, it was simple classifiers, decision trees, clustering… I was in a master’s program, looking at medical image segmentation, and for all of these deep learning models you used to need a lot of labeled data. But that meant someone—usually a radiologist—had to do the labeling. That takes a lot of time and resources, and it’s a known fact that no two experts’ segmentations are the same.

So, I was looking into the problem of not having enough data to train deep learning models and found literature on representation learning approaches—including some really nice articles by Yoshua Bengio and Geoffrey Hinton—and then went into contrastive learning, a subset that learns how similar similar things should be.

One of the papers I read around then was led by Yann LeCun, and Sumit Chopra was one of the coauthors. Years later, I was very excited to interview at NYU when I learned that Sumit was going to be one of the people I’d be talking to. It was a very interesting paper on using contrastive learning to cluster representations, and I was thinking that if we could learn representations from the data without specifying a particular task, then we could do downstream tasks with fewer labels.

Being in a master’s focused on machine learning and representation learning for medical imaging is already so specific to the field. How did you find your way to that in the first place?

Engineering was not my first choice; medicine was. I wanted to go to medical school after high school. And, for reasons, it didn’t happen. Instead, I picked up engineering, and after a year in undergrad I realized that I actually love it: there’s logic, there’s critical thinking, you learn the science behind how everything around you works. In the master’s I had the chance to work with people who were in the field of medical imaging, and I thought, fantastic, even if I can’t be a doctor, I can be a scientist working with medical images.

What was it like for you to pursue engineering after having set sights on medicine?

I think I was always fascinated by medicine partly because I feel that it helps you improve people’s lives. I feel medicine is the field where every decision you make guides and shapes the world around you. You make a very positive impact on people. I felt that if I were part of that community, I could contribute in a way that makes me happy. And then I had to switch to engineering, where you’re in a lab.

Where you’re in a lab with the things and not with the people.

Yes, you’re with the things and away from people, and you have to learn how the things work. It was a big transition, because that was not the future I had envisioned for myself. But I realized that I do like learning the science behind things, and it taught me to think critically and solve problems by breaking them down.

When was it that you realized that engineering could bring you back to medicine?

During my master’s I found a really nice lab and my eventual PhD mentors, Maria Altbach and Ali Bilgin, who worked on medical imaging, trying to solve medical problems. I remember sitting with them and looking at liver images showing chronic liver disease, and training a model based on those images. One time, we were looking at an image of stage-4 fibrosis. I felt bad for the patient. It made me very sad. And my mentor told me, your job is to develop frameworks that can help people find these diseases not at stage 4 but at stage 1.

At that point, I wasn’t sure I could do that. It felt like a very far-fetched goal. But I think during the transition from master’s to PhD I took steps toward that. Steps toward the person behind the images.

Engineering became less academic, so to speak.

Exactly. And I felt the passion that I used to feel when I thought about medicine. It’s not how I had originally intended to contribute, but it’s one way I can help someone’s future: help early diagnosis, help avoid unnecessary biopsies. If I can do that, I’ll take that as a win.