Artificial Intelligence and Machine Learning to Better Inform Patient Outcomes

SESSION 5: Artificial Intelligence and Machine Learning to Better Inform Patient Outcomes
Moderator: Tamara Lotan (Johns Hopkins University)

Pathology Deep Learning Tools for Localized Prostate Cancer
Tamara Lotan (Johns Hopkins University)

Navigating the Prostate Cancer Journey with Genetic Testing, RNA, AI, and MRD
Hosein Kouros-Mehr (Myriad Genetics)

Digital Pathology for Patient Management and Treatment Decisions in Prostate Cancer and Urologic Oncology
Andre Esteva (Artera)

Computational Approaches for PSA Trajectories to Guide Therapy
Julian Hong (University of California, San Francisco)

View the Transcript Below:

Artificial Intelligence and Machine Learning to Better Inform Patient Outcomes

Tamara Lotan, MD [00:00:11] Okay. So, we've saved the best session for last today. Welcome to the AI and machine learning session for prostate cancer. I think we have a really great lineup of both academic and industry folks working in this space. So, I'm gonna actually kick off the talks if we bring up my slides, talking about pathology-based deep learning tools in prostate cancer. And what's going to distinguish my talk, I think, from some of the others is that I'm focusing on deep learning tools that really use as their sole input the histopathology images. Oops, I guess I've got it here. Okay. So, this has been a long-standing collaboration, and I presented here before on some of this work between collaborators at Johns Hopkins and an Indian AI company called AIRA Matrix. And these are the folks that have been involved in the studies. And we're very proud that we now have some trainees who've gone on to be PCF YIs and have research programs of their own. We do disclose research funding from AIRA Matrix. So, the talk will be really in two parts. The first is really focusing on tools that are essentially almost currently available to assist pathologists in their daily workflow right now. And in the second part of the talk, I'll talk about what I kind of think about as the black box algorithms, maybe a little bit less understandable from a histologic perspective, but you know, equally important in terms of using the histology to predict risk of metastasis directly from the tissue. So, AI grading in prostate cancer needle biopsies has been around for a while. Prostate has really been a test case for a lot of histopathology-based AI algorithms. And you can see that these grading algorithms on the left, you know, very nicely identify the cancer. They annotate the cancer glands that are pattern three versus pattern four. They give very specific and quantitative estimates of the percentages of these patterns and the amount of tumor and area of tumor, much more than the pathologists can do visually. And the vast majority of these algorithms have really been validated in mostly fairly highly curated data sets, often benchmarked against a gold standard, which is the pathologist's grade from a panel of pathologists, because there's sort of this baseline interobserver variability for Gleason grading among pathologists. But you could see in a recent contest in a set of these kinds of curated images, you know, two of the commercial algorithms, one from AIRA Matrix and one from Paige, had the highest agreement with pathologists' grading by this majority vote standard. So, we're looking at the quadratic weighted kappa here. But a lot of the publicly developed algorithms, which are all the rest of the ones on the right side, did fairly equally well in many cases. So, there are many options in this space. So, we were interested in validating some of these AI algorithms in more real-world data sets, prospective population-based cohorts as opposed to these sorts of highly curated pathology cohorts. And so, for this we leveraged samples from RESPOND, which is a large study of prostate cancer in African American men. The PI is Christopher Haiman at USC, and our lab functions as the pathology core for this study. And so, we have been accumulating large numbers of prostate biopsies and prostatectomies from this study. And we do centralized grading, so we have a pathologist take a look at all of these and regrade them when they arrive.
And we ran a number of these biopsies, more than 800, through this AI algorithm for grading, and you could see the confusion matrix here and had pretty good agreement given that we're looking at a single pathologist as the benchmark standard here with a quadratic weighted kappa of 0.81. You could see in the red boxes that these algorithms are more sensitive than they are specific in terms of tumor detection. We had some slides that happened to have no tumor on the deeper cuts, but this is exactly what we want from tools that essentially assist the pathologist in their workflow. And you could see on the right side you know what these annotated slides look like when the algorithm is run on them. Again, annotating in green and blue various Gleason patterns. And if we look at the area of tumor quantified by either the algorithm or the pathologist, there's very high correlation. So, they're very good at identifying tumor. But that being said, we still need pathologists. So, what does it look like when these algorithms have a miss? So, on the left-hand side, you can see an over call. The AI thought this was a grade group five prostate cancer, but all the pathologists in the room, I think, would agree that this is very obviously just a super-inflamed biopsy with lots of wall-to-wall lymphocytic inflammation. On the right side, you could see the algorithm missed the cancer in this very fragmented core. There were small foci that have been circled by the pathologist in blue of grade group four. So really thinking about these as tools that assist the pathologists, we're not yet ready to be replaced. So, we were interested in testing these algorithms in maybe more clinically relevant contexts, and I've talked about some similar studies we did when I talked here previously. This is looking at these AI grading algorithms in an active surveillance cohort where of course the grade is really critical for patient management. Here we did the study in an MRI-screened, more contemporary cohort of almost 170 men from Hopkins; all had an MRI, and all were graded as grade group one by contemporary uropathologists. We scanned in all of the cancer-containing slides from their biopsies, including both systematic and MRI-targeted cores. We then had these slides re-read or regraded by either a panel of pathologists or the AI algorithm. And the outcome that we assessed was subsequent reclassification to grade group two. So, you could see at the top left that when the uropathologist upgraded the case, it was not associated with any risk of future grade reclassification. In contrast to that, when the AI algorithm upgraded the case to grade group two or higher, there was about a five-fold increased risk of future reclassification in regression models. And this maintained its significance even after we corrected for other variables, PSA density, et cetera, that we know are predictive for grade reclassification. So, we can also look at these AI grading algorithms in prostatectomy samples. You can see on the right that they beautifully annotate tumors. This patient has a posterolateral tumor and also an anterior tumor, and the algorithm defines the pattern three and pattern four areas. Here we took a single representative slide from almost 800 prostatectomies at Hopkins.
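For readers less familiar with the agreement statistic quoted above, here is a minimal sketch of how a quadratic weighted kappa between a pathologist and an AI grader can be computed; the grade calls and the use of scikit-learn are illustrative assumptions, not the RESPOND analysis itself.

```python
# Minimal sketch: quadratic weighted kappa between a pathologist and an AI grader.
# The grade lists below are made-up illustrations, not study data.
from sklearn.metrics import cohen_kappa_score

# Grade group calls (0 = benign, 1-5 = Gleason grade groups) for the same set of biopsies.
pathologist = [0, 1, 1, 2, 3, 5, 4, 2, 1, 0]
ai_grader   = [0, 1, 2, 2, 3, 5, 3, 2, 1, 1]

# "quadratic" weighting penalizes disagreements by the squared distance between grades,
# so a 1-vs-2 disagreement costs far less than a 1-vs-5 disagreement.
kappa = cohen_kappa_score(pathologist, ai_grader, weights="quadratic")
print(f"Quadratic weighted kappa: {kappa:.2f}")
```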
We had either the pathologist regrade the slide or the AI algorithm regrade the slide, and you're looking at the Kaplan-Meier curves here for metastasis-free survival, and also the Harrell C-indices, which are a concordance measure, analogous to an area under the curve, for the Cox models. And you could see that the performance of all of these different grades is very similar in terms of predicting future metastasis-free survival. So, AI and human grading perform very comparably in the prostatectomy setting. We looked at this in another cohort where we had actually paired biopsy and prostatectomy samples, highly enriched for subsequent metastatic events. Here we scanned all of the tumor-containing slides, so a little bit more like a real-world pathology review. And we asked, you know, we looked at the pathologist grade or the AI grade. And you could see again, fairly comparable in terms of the C index for metastasis. And in both cases, looking at sort of the slide with the maximum AI grade was really the most predictive for future metastasis as opposed to taking all of the tumor tiles and looking at the composite grade. And of course, this aligns with what we do clinically, right? We look at the highest-grade biopsy and use that to dictate management generally. So, another task we were interested in creating a model for was pelvic lymph node metastasis screening. This is probably a more trivial AI task. But for the pathologists in the room, a very tedious pathology task where we're looking at large numbers of pelvic lymph nodes and screening them for small metastatic foci at prostatectomy. Here we trained on relatively few images and created an AI algorithm that annotates in red all the tumor foci within the lymph nodes. And then we tested it on two different cohorts of pN1 patients, including one that had had preoperative PSMA PET imaging. So, these were very small metastases that had basically been missed on PET imaging. And the sensitivity of the algorithm for detecting lymph nodes with metastases was very high across the board. Lower specificity, which is exactly what we would like in an algorithm that's going to just screen and annotate areas that the pathologist then needs to double check. And again, very nice alignment between the pathologist annotated total tumor area and the AI total tumor area. Here's an example of one of these lymph nodes that has been screened by the AI algorithm with two regions annotated. In the top right, you can see a very small area of tumor, just a few glands sitting under the capsule of the lymph node that have been correctly identified by the AI algorithm. And in the lower right, you can see what turns out to be actually a collection of histiocytes that have been identified as potential metastatic cancer by the algorithm. And this is a very common mistake, actually, we see pathology trainees make when they're in their first few years of training. So finally, we were interested also in some diagnostic classifiers, and this is very early work, but, you know, one of the sorts of holy grails in prostate pathology is to try to nail down our diagnosis of neuroendocrine prostate cancer, which as everyone knows can be very challenging, particularly in the treated setting. And here we trained a relatively simple classifier, just a dichotomous classifier, on untreated patients who either had very high-grade adenocarcinomas or full-blown small-cell neuroendocrine prostate cancers.
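As a point of reference, a Harrell C-index for a continuous risk score against metastasis-free survival can be computed along these lines; the follow-up times, events, and scores below are placeholders, and the use of the lifelines package is an assumption rather than the group's actual tooling.

```python
# Minimal sketch: Harrell's C-index for a continuous risk score against
# metastasis-free survival. All inputs are placeholders, not cohort data.
import numpy as np
from lifelines.utils import concordance_index

follow_up_years = np.array([12.0, 3.5, 8.2, 1.1, 15.0, 6.4])     # time to metastasis or censoring
metastasis      = np.array([0,    1,   0,   1,   0,    1])        # 1 = metastasis observed
ai_risk_score   = np.array([0.05, 0.72, 0.18, 0.90, 0.02, 0.44])  # higher = higher predicted risk

# concordance_index expects higher values to mean longer survival,
# so a risk score (higher = worse) is negated before scoring.
c_index = concordance_index(follow_up_years, -ai_risk_score, metastasis)
print(f"Harrell C-index: {c_index:.2f}")
```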
And then we tested them initially in some TMAs from similar types of cohorts. These were largely untreated patients with very obvious either neuroendocrine or adenocarcinomas. And the AI algorithm had a reasonable specificity of 96% for identifying the neuroendocrine prostate cancer spots in the TMA that we tested. So, then we fine-tuned it in a treated cohort to try to produce a more continuous probability score of neuroendocrine prostate cancer and also tested it in a cohort that had more treated cases. And you can see that we still have a lot of work to do with this algorithm. But there is some discrimination between the treated NEPC cases and the adenocarcinoma cases, although maybe less so with the cases the pathologists called poorly-differentiated carcinoma. And you know, in this setting, I think, as I think we've discussed before, we really have to think about what is the gold standard we're gonna benchmark these algorithms against. Probably we wanna use treatment response or transcriptomics or something that maybe is a little bit of a less wobbly gold standard. Okay, so in the last two minutes or so, I'll just talk a little bit about the more black box algorithms to just predict metastasis directly from the histopathology. I discussed this the last time I presented here, but these algorithms, you know, in our projects first use tumor identification, with patches generated from the tumor-bearing areas, and feature extraction followed by classification. So, we initially created an algorithm, and actually, this paper was literally just published online today, to use prostatectomy samples to predict the risk of metastasis. This is a continuous risk score from zero to one, you know, that's indicative of the probability of metastasis, using a single representative prostatectomy slide from the case, trained in fairly small cohorts. But you could see, even with these fairly small training cohorts, when we test in validation cohorts, we get better C-indices for the AI score than for genomic classifiers, including the Decipher and also the Prolaris classifier in these cases. So even with relatively small cohorts, we're doing fairly well in terms of predicting metastasis. We don't beat grade group and stage from the whole prostatectomy in every case, but we add to that when we add it into the models. So those were all Hopkins cohorts, hospital-based cohorts. Again, really important to test in prospective population-based cohorts. So, we teamed up with a wonderful group at Harvard School of Public Health to look at the Health Professionals Follow-up Study and Physicians' Health Study cohorts. And here we used a TMA classifier, so we sort of tweaked our prostatectomy classifier to now take very small tumor tissue input, about a thirtieth of the total tissue input that we would get from the prostatectomy sample overall. And this is because we only had two tissue microarrays available for this population-based cohort. And you could see again a metastasis classifier that outputs probability of metastasis. Here we're looking at the Kaplan-Meier curves for cases that had a predicted probability of metastasis of less than 1%, 1 to 20%, or greater than 20%. And it corresponds fairly well to their actual risk of lethal disease over the next 25 years. And I'm also showing you the hazard ratios for some of these comparisons, and they remain significant even after we correct for grade group.
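To make the pipeline just described more concrete, here is a schematic sketch of a tile-based slide classifier: tumor tiles are embedded, the embeddings are pooled with attention weights, and a slide-level risk score between zero and one is produced. The architecture, dimensions, and attention pooling are illustrative assumptions, not the published model.

```python
# Schematic sketch of a tile-based metastasis classifier: embed tumor tiles,
# pool the embeddings, and output a 0-1 slide-level risk score.
# Shapes and modules are illustrative placeholders, not the published model.
import torch
import torch.nn as nn

class SlideRiskModel(nn.Module):
    def __init__(self, feature_dim: int = 512):
        super().__init__()
        # Stand-in tile encoder; in practice this would be a pretrained CNN or ViT.
        self.tile_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feature_dim),
        )
        # Attention weights decide how much each tumor tile contributes.
        self.attention = nn.Linear(feature_dim, 1)
        self.classifier = nn.Linear(feature_dim, 1)

    def forward(self, tumor_tiles: torch.Tensor) -> torch.Tensor:
        # tumor_tiles: (num_tiles, 3, H, W), tiles already flagged as tumor
        features = self.tile_encoder(tumor_tiles)              # (num_tiles, feature_dim)
        weights = torch.softmax(self.attention(features), 0)   # (num_tiles, 1)
        slide_embedding = (weights * features).sum(dim=0)      # (feature_dim,)
        return torch.sigmoid(self.classifier(slide_embedding))  # risk in [0, 1]

# Usage: 200 tumor tiles of 128x128 pixels from one slide.
model = SlideRiskModel()
tiles = torch.rand(200, 3, 128, 128)
print(f"Predicted metastasis risk: {model(tiles).item():.2f}")
```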
Of course, the Harvard Health Professionals Follow-up Study is mostly men of European ancestry, so very important to test in other populations. So, we have taken some initial looks at the RESPOND cohort. Here we don't have mature follow-up data, so we have to, you know, we definitely have to wait until we have that. But initially, we can look at the AI metastasis risk predicted from a single slide from prostatectomies in RESPOND and how it correlates with the CAPRA-S score, you can see that on the left, and we get a nice stepwise increase as we would expect, because we know CAPRA-S is a reasonable predictor of metastasis overall. And then finally, we were interested in moving our TMA classifier into needle biopsies to predict risk of metastasis. Needle biopsies and tissue microarray cores have similar orders of magnitude of tumor tissue, so they're fairly similar from that standpoint. And so, we used a cohort from Hopkins, again, all surgically treated, very enriched for metastatic disease. Here we scanned all of the tumor-containing slides again. So again, trying to be a little bit more real world in terms of practice, and we fed them all into the algorithm. We selected the maximum AI risk score that the patient had over all of those slides. And you could see the C-index here for the AI score versus CAPRA or NCCN in this cohort. Again, we're not beating CAPRA or NCCN, which is not super surprising given that both of those include more clinicopathologic parameters, PSA, clinical stage, et cetera, than just the histopathology, but still proof of principle that we can start to predict risk directly from needle biopsies. And we do add to the CAPRA-S, or to the CAPRA score rather. Okay, so with that, I will wrap up just to emphasize that, you know, diagnosis and grading of prostate cancer are poised to become, I would say are almost already becoming, pathologist-supervised rather than pathologist-driven. You know, we have a new cohort of sort of AI residents who can screen our cases for us. AI algorithms that surpass genomic classifiers are already feasible with relatively small cohorts, and I think we're gonna see a huge proliferation of these over the next few years, likely available as sort of apps that the pathologist can plug and play into their digital image management software to provide a metastasis classifier score in addition to grade and all the traditional parameters at the time of diagnosis. And then finally, you know, AI for molecular classification and therapy prediction obviously is the next frontier. And not to be too much of a spoiler, but that I think will be the topic of some of the other speakers that you'll see here. So, with that, I'll thank other collaborators who contribute to these cohorts who weren't named on the prior slides, and of course our funders, and take any questions.

Adam Dicker, MD, PhD, FASTRO, FASCO [00:18:24] So thank you, Tamara. So why was there a degradation of the C-index in the needle biopsies for metastasis? Is it a sampling error? 

Tamara Lotan, MD [00:18:35] Compared to prostatectomy? Yeah, I think absolutely, right? Yeah. So, we don’t have… the difference is the prostatectomies we’ve identified pathologically the dominant nodule, and we are only sampling that. Only feeding in tiles from that nodule. Whereas in the needle biopsy, and these are fairly old because we had to wait long enough to get metastatic outcomes. You know, we’re just dealing with whatever happened to be sampled, no MRI at that time and so forth. Good question. 

Daniel Spratt, MD [00:19:07] Fantastic, fantastic talk. One kind of provocative question, one sort of comment. The question is, you know, why do we grade cancer? We, you know, it was never designed, as you well know, to actually be prognostic. It happens to be modestly prognostic. So why are we using this immense power of AI to predict something as crappy as a Gleason score? Why don’t we simply remake using outcomes an actual prognostic quote unquote grade that can be informative to actually how we treat patients? 

Tamara Lotan, MD [00:19:41] Yeah, I mean that's why the talk was in two parts. So, the second part was to directly train these. I mean, I think partly, you know, because of regulatory issues, we are gonna just see these incremental steps. And so, as pathologists become, you know, routinely digital in their workflow, I think part of that change management is just getting them used to using algorithms that are helping them with the tasks that they're familiar with and feel comfortable with. And then obviously I think, I imagine we'll introduce these other classifiers sort of on top of that for some period, and then maybe when people are comfortable that we're outperforming them, we'll throw out grade. But I think it's just a change management issue.

Daniel Spratt, MD [00:20:21] And I think a comment just 'cause a lot of people now are doing this is, obviously often if you send a genomic classifier test out, it may not exactly be the same sample that you're running the AI algorithm on. And so, we just did a hundred thousand patients of AI and genomics, and they actually look pretty dang close. So, the question is ultimately what's gonna be best, and I have no idea. But it's very impressive stuff you're doing.

Tamara Lotan, MD [00:20:46] Yeah, I think it's also something we're trying to work a little bit on, and maybe this is more of a pathology-oriented question, but how do we choose what slide to do any of these things on, right? Even for the classifiers, it hasn't been well defined. Yeah. Yeah.

Unknown [00:21:01] Yeah, very nice work. My question is to what extent do the algorithms inform you where they're focusing their attention? And if you look at those areas, if they allow you to, does it tell you anything that is a known feature like IDC, perineural invasion, other things that are explainable, or is it in some high dimensional vector space that we don't know yet?

Tamara Lotan, MD [00:21:23] Yeah, so I think I had it on the last slide, but I didn't have time to talk about the image itself, but it's in the manuscript, I think in the supplementary data. We can take each tile or some aggregate of tiles and show the risk score by tile, and so we can get a heat map across the slide for you know some indication of explainability. And it does look, you know, we haven't done anything quantitative with that, but just you know, the gestalt is that you do see, you know, clearly grade group one in the lowest tier, and then we see cribriform in kind of the intermediate tier and necrosis, you know, in the top or middle. So, it does seem to, as we would expect, sort of align with Gleason, but we haven't really looked at that in a quantitative way. One more question. Okay, Max, you're in.

Unknown [00:22:10] Yeah, great work, Tamara. Related to the two questions, when you break down in tiles, what’s the contribution of the tumor microenvironment and the stroma to determine metastasis? Because there’s a lot of data that supports. 

Tamara Lotan, MD [00:22:25] Yeah, that's a great question. So that's maybe a little bit of a limitation of some of the work I'm showing because we use that tumor identification algorithm first. So that's not to say that there's no stroma in those tiles that we're analyzing, but it's definitely depleted for stroma as opposed to just inputting the entire slide, kind of agnostic to tumor. So great question, and there may be very important information encoded in stroma. Great. Okay, super. So, moving on, our next talk is by Hosein Kouros-Mehr, who's gonna talk to us about navigating the prostate cancer journey with genetic testing, RNA, AI, and MRD from Myriad Genetics.

Hosein Kouros-Mehr, MD, PhD [00:23:11] Thank you. It's really great to see everybody. I was on this stage six years ago, actually chairing a session on bispecifics. And one of those drugs we talked about was AMG 509, and I just wanted to acknowledge Dr. Englert and the Xalu investigators on really terrific work. So, I'm switching gears here on the diagnostic side now, and we'll tell you about tools to help you navigate the prostate cancer journey, and that includes AI, but certainly these other tools as well. Some disclosures. So, at Myriad, we have products on the market and in development, spanning really the entire prostate cancer journey from unaffected men to patients with metastatic prostate cancer. So today I'm going to focus on the Prolaris test and how we are updating this test with digital AI pathology capability through a partnership with PATHOMIQ. We'll also tell you a bit about our Precise MRD test, which Dr. Sartor alluded to. This is our ultra-sensitive MRD test to detect circulating tumor DNA in indications like prostate cancer. Before that, I just did want to mention that we have MyRisk on the market. This is our germline testing product. For women, MyRisk comes with a risk score to predict lifetime risk of breast cancer. And while we don't have a risk score for men, I did want to acknowledge PCF investigators like Tyler Seibert, Jason Vassy, Christopher Haiman, who are validating a polygenic risk score for men, and maybe someday soon we'll have a PRS tool to estimate a lifetime prostate cancer risk. Did want to mention as well we have Precise Tumor, that's our cancer genome profiling test on the market. And we also have MyChoice, it's an HRD test approved for ovarian cancer, and we're validating it for prostate, and that could significantly expand the pool of HRD positive patients, and I'll tell you a little bit about that. So Prolaris has been on the market for over a decade now. It's really an ideal test for the active surveillance decision because it's been trained and validated in untreated prostate cancer patients. So, Prolaris is a qPCR test of 31 cell cycle-related genes, and there's what's called the CCR or Prolaris test score that can predict for a newly diagnosed prostate cancer patient whether to go on active surveillance or to get single-modality treatment or multimodal treatment. And so, the CCR score in Prolaris is made up of CCP, that's the qPCR component, as well as CAPRA, that's the set of clinical variables like age, PSA, Gleason, and others. And those coefficients you see are based on training and validation in a number of studies, spanning NCCN risk groups and spanning Gleason scores, looking at a range of treatments and also validating on different endpoints, including metastasis and disease-specific mortality and biochemical recurrence. So, to give you a sense of the clinical performance of Prolaris, on the left there is a study from Dr. Lin on active surveillance. The patients in orange are above the Prolaris active surveillance threshold; you can see they have prostate cancer mortality, whereas those patients below the active surveillance threshold did not show any prostate cancer mortality in this study. On the right is that multimodal threshold, where above a CCR score of about 2.1, patients are recommended to get multimodal treatment with radiation plus ADT.
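As a rough illustration of the kind of score composition described above, a combined clinical-molecular score can be expressed as a weighted sum of a molecular component and a clinical component. The weights and inputs below are hypothetical placeholders; the validated CCR coefficients are not reproduced here.

```python
# Sketch of a combined clinical-molecular risk score: a weighted sum of a
# molecular cell-cycle progression (CCP) score and a clinical CAPRA score.
# The weights 0.6 / 0.4 and the patient values are illustrative assumptions only.
def combined_risk_score(ccp: float, capra: float,
                        w_ccp: float = 0.6, w_capra: float = 0.4) -> float:
    """Return a combined score from a CCP score and a CAPRA score."""
    return w_ccp * ccp + w_capra * capra

# Example patient with a CCP of 1.2 and a CAPRA of 3 (illustrative values).
score = combined_risk_score(ccp=1.2, capra=3.0)
print(f"Combined score: {score:.2f}")  # compared against treatment thresholds in practice
```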
And so Prolaris can tell you about the absolute risk reduction of adding ADT to RT, and patients below that multimodal threshold are less likely to benefit from a treatment intensification. This is a meta-analysis that we will be submitting soon. It’s over 7,000 patients, looking at over a dozen studies, spanning NCCN risk groups for patients with newly diagnosed disease, and getting a range of treatments, and you can see in predicting metastasis and prostate cancer mortality very highly significant hazard ratio here at the bottom. And so, look forward to sharing these data soon. And so, what we’re doing now is we’re updating Prolaris with new AI capability. This is through a partnership with a company called PATHOMIQ. And so, they have a digital pathology AI tool that from a single H&E image can predict prognosis and treatment response and provide a lot of information about the tumor architecture, which I’ll show you about. And so, by combining Prolaris with this AI test, we think we have potentially a best-in-class approach to unlock predictive signals to look at you know biology and have greater confidence in clinical decision making. And so, the PATHOMIQ PRAD tool from that H&E slide, it can predict outcomes, including metastasis and biochemical recurrence. But it’s more than a black box, it can really tell you a lot about the biology of that tumor through the AI algorithm. It can tell you about Gleason patterns within the tumor, architectural components, cribriform patterns within the stroma, it can even predict gene expression, and so there’s a lot of research power here. It’s a very strong hypothesis generating tool, which we really love. Here’s how the PATHOMIQ PRAD AI tool performs across the prostate cancer journey. You can see in that active surveillance setting, it can predict the risk of Gleason upgrade. At the time of surgery, it can predict outcomes like metastasis-free survival. And in the salvage setting, it can also predict prognosis and has some predictive capability that we’re leveraging. And so, this is unpublished data. We combined Prolaris here with the AI tool, looking at these two studies, a conservatively managed cohort on the left there, RP cohort on the right, looking at these endpoints, disease-specific mortality, metastasis, biochemical recurrence. And what we’ve seen is that by combining the Prolaris components with the AI components, we see by far the best prognostic performance. And so, we’re very excited now to use this combined tool really to not just unlock better prognostic performance, but to really confer predictive capability. And so, we’re starting in that active surveillance setting. We’re gonna update Prolaris with new capability to predict Gleason upgrade and better performance. But we are looking across the prostate cancer journey and looking at adding new predictive capability, whether it’s intensifying treatment with IO or ARPi, and this is very much in collaboration with all of you, and we’re very open to collaborating. So, let’s chat if there’s any interest here. Switching gears, this is our Precise MRD test. Now, MRD has been on the market. It’s an established clinical endpoint in myeloma, as you know. But in solid tumors, there’s a need for an ultra-sensitive test because those first-generation MRD tests out there aren’t really sensitive enough for indications like prostate cancer. And so, this test is tumor-informed. 
We start with a tumor and normal, and we just need a couple of slides of tumor, and you know a tube of blood, and from that, we identify tumor variants. We track about a thousand variants per patient. And so, when we collect plasma to look at circulating DNA, we enrich for those variant regions of interest. And then we do ultra-deep NGS to look for that circulating DNA. And so, we can get down to single-digit parts per million, about a hundred times lower than the first-gen MRD tests out there. And as you can see for prostate cancer, this is really critically important to understand which patients are responding to treatment and who’s having an early recurrence. We have a couple dozen clinical validation studies ongoing. This is one we presented at ASCO from National Cancer Center East. They’re the group that ran the circulate and the galaxy studies with Signatera. And so, they chose us for our ultrasensitive test, and this is called the MONSTAR study. They’re enrolling over a thousand newly diagnosed solid tumor patients and really tracking ctDNA across the journey. So, this is the baseline ctDNA signal. You can see that 100% of these patients, this is the first hundred patients enrolled in the trial, hundred percent of them had a baseline ctDNA. And about 11% of them were in the ultrasensitive range. And in particular, for indications like breast, urothelial, renal, and prostate, as I’ll show you, that’s critically important to have an ultrasensitive test. On treatment after that surgery, you can see the ctDNA levels drop at one month in blue there and three months in green there. And so, at one month after surgery, about a quarter of these patients were ctDNA positive, and most of those patients in that ultrasensitive range, again, that’s below 100 parts per million. At three months, about 22% of the patients were ctDNA positive, about a half of those in the ultrasensitive range. And so those patients who are ctDNA positive are much more likely to have a recurrence, and you can see that association between one-month MRD and disease-free survival on the right. And we’re very excited by the data as it’s maturing. Look forward to presenting an update very soon. Now, what about prostate cancer? We looked at a panel of 38 prostate cancer samples. This is mostly localized. You can see they’re non-metastatic, mostly node negative. And we detected ctDNA in 84% of the baseline patient samples, most of them in that ultrasensitive range. So that would have been missed by some of the tests out there on the market. And the average signal was at about 13 parts per million, so quite low here. For those patients where we had Gleason information, we found ctDNA even in patients with Gleason 7 disease. And there was some you know mild association with PSA levels, but you can see there we had patients with a PSA of 10, some of whom had very high levels of ctDNA, and others with quite low. So, we think this MRD will be quite important for the types of studies we’ve been talking about all day, who should get treatment intensification or deintensification. And we are launching this test in breast cancer, but we really need new partnerships, collaborations in prostate. So, we look forward to collaborating with all of you in the PCF, our new prospective trials to show the clinical utility here. Lastly, I wanted to mention HRD. So, we all know there are PARP inhibitors approved for HRD positive patients, but we’re running into some problems. 
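To illustrate the parts-per-million arithmetic behind an ultra-sensitive, tumor-informed assay of the kind described above, here is a toy calculation that pools reads across roughly a thousand tracked variants; the read counts, depths, and detection cutoff are illustrative assumptions, not the assay's actual error model.

```python
# Toy sketch: estimating a ctDNA tumor fraction in parts per million (ppm) by
# pooling reads across ~1,000 patient-specific tracked variant sites.
def ctdna_ppm(variant_reads_per_site: list, total_reads_per_site: list) -> float:
    variant_reads = sum(variant_reads_per_site)
    total_reads = sum(total_reads_per_site)
    return 1e6 * variant_reads / total_reads

# 1,000 tracked sites at ~10,000x depth; a handful of sites carry tumor-derived reads.
total_reads = [10_000] * 1_000
variant_reads = [0] * 1_000
for site in (17, 204, 635, 912):       # illustrative positive sites
    variant_reads[site] = 3

ppm = ctdna_ppm(variant_reads, total_reads)
print(f"Estimated tumor fraction: {ppm:.1f} ppm")   # ~1.2 ppm, single-digit range

# A real pipeline would test this signal against a background error model before
# calling MRD positive; a fixed cutoff is used here purely for illustration.
DETECTION_CUTOFF_PPM = 1.0
print("ctDNA detected" if ppm >= DETECTION_CUTOFF_PPM else "ctDNA not detected")
```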
One is pretty low rates of genetic testing, particularly among urologists, but also pretty low percentage of prostate cancer patients who have an HRR mutation. So, to that first problem, I mentioned we’ve got MyRisk, we’ve got Precise Tumor, we have basically tools, you know, turnkey solutions that we can give to a urologist to do that, you know, really help them with genetic testing. But also, we’ve got real world evidence you know, capability, we have a registry, and so we can generate data to support things like universal germline testing for prostate cancer, which I know is you know, a huge effort. And so yeah, we would again look forward to collaborating with all of you on that, generating that real world evidence. In terms of the second issue there, so it turns out that there are patients who are HRD positive but don’t have HRR mutations. And so, there is a signature of HRD that you can find with genomic rearrangements. And so, things like loss of heterozygosity, telomeric allelic imbalance and large-scale state transitions can predict HRD. So, we have a test, it’s called MyChoice, HRD test approved for ovarian cancer, and we are validating this HRD test for prostate and looks quite promising and this could you know potentially expand the pool of HRD patients by as much as 30 to 35% beyond HRR mutations alone. So, we’re very much open to also collaborating with you on this test as well. So, in closing I think this is a new chapter for Myriad. We’re very much committed to R&D and really assembling these you know products that span the cancer patient journey. I think there’s going to be strong synergies when we combine let’s say our MRD test with Prolaris-PATHOMIQ capability and I think we’re gonna be able to answer some new questions and really deliver on that promise of personalized medicine. So, I’ve got my email there. Please reach out if you have any ideas, you know, collaborations with retrospective you know, types of trials or prospective. We’re open to both. And yeah, with that I thank you for your attention. 
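For context on the genomic instability scoring mentioned above, the publicly described approach in the ovarian setting sums three genomic scar counts (LOH, TAI, LST) and compares the total against a cutoff; the unweighted sum and the cutoff of 42 below reflect that ovarian description, and the prostate-specific score mentioned in the talk is not reproduced here.

```python
# Sketch: a genomic instability score (GIS) as a sum of three genomic "scar" counts,
# compared against an indication-specific cutoff. Counts are illustrative; the cutoff
# of 42 is the commonly cited ovarian threshold, not a prostate-specific value.
def genomic_instability_score(loh: int, tai: int, lst: int) -> int:
    """Sum of loss-of-heterozygosity, telomeric allelic imbalance, and
    large-scale state transition event counts."""
    return loh + tai + lst

def hrd_positive(gis: int, cutoff: int = 42) -> bool:
    return gis >= cutoff

gis = genomic_instability_score(loh=18, tai=15, lst=14)   # illustrative counts
print(f"GIS = {gis}, HRD positive: {hrd_positive(gis)}")
```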

Unknown [00:37:08] Hey, nice talk. Thanks for sharing all these ideas. So probably the best way to measure HRD at the moment is to look for the mutational signatures of microhomology at breakpoints and so forth. So, is the signature that you're proposing expanding upon that? So, do you think there are patients that don't have the traditional HRD signature that will be positive by a different measure?

Hosein Kouros-Mehr, MD, PhD [00:37:38] So in ovarian cancer, so MyChoice is approved in ovarian cancer, and MyChoice provides what's called a genomic instability score, a GIS. So above a certain threshold, patients are HRD positive, and we've shown that they benefit from PARP inhibitors. And so, you know, that ovarian signature doesn't work in prostate. So, we've developed a prostate-specific genomic instability score. So obviously we're gonna have to validate it, but at least you know the early data look encouraging. I look forward to sharing it at some point. I wasn't able to share it today.

Daniel Spratt, MD [00:38:12] So I have to ask a provocative question. So, we know now the updated ProtecT trial results in low-risk prostate cancer mainly, with now almost 20 years of follow-up, with no MRIs, no serial biopsies, no genomics. Treatment was not as good back then. They do phenomenally. So this isn't just to you, this is to all the companies in the space, you know, Veracyte, Myriad, Artera, or whoever makes GPS now, how do you guys really feel when you see this data, when you think in low-risk prostate cancer you give this test that, for any of the companies, is not validated to say you should be on surveillance or not, right? None of you have prospective data. How do you feel? Because for all the tests, it is pulling some men away from surveillance when, for these men, treatment itself doesn't help. So, how do you guys talk about this internally?

Hosein Kouros-Mehr, MD, PhD [00:39:11] Yeah, no, that's a great question. So Prolaris was trained and validated on a cohort from Dr. Cuzick of patients who were untreated, and it is more likely to recommend active surveillance for those low-risk patients compared to, you know, Decipher and Artera. I would agree this space would benefit from prospective trials as breast cancer has already done. So yeah, I think there is value in prospective data, but at least Prolaris has been trained and validated in that patient data set, and that's why it is more likely to recommend AS. But great, great question. Yeah.

Speaker 9 [00:39:51] Sorry. Yeah. I guess just to point out about the ProtecT trial, you know, more than half the patients eventually did get treatment. And I’m not sure how validated the decision was to actually get treatment. So that’s maybe a space.  

Hosein Kouros-Mehr, MD, PhD [00:40:10] Agreed. Yeah, agreed. I think there's definitely room to develop new predictive markers, even in active surveillance, through really robust prospective trials, and I think we need to really think and invest in those trials, and that's a great point. All right, thank you.

Tamara Lotan, MD [00:40:38] Great, so our next speaker is Andre Esteva, who's coming to us from Artera to talk about digital pathology for patient management and treatment decisions in prostate cancer and urologic oncology.

Andre Esteva, PhD [00:41:03] Okay, good afternoon, folks. It’s a pleasure to be here. Pleasure to see all of you. Thank you to the organizers and Mike and Gina and everybody else for the invitation. It’s a real privilege. My name is Andre Esteva. I’m the co-founder and CEO of Artera AI. I’m an AI scientist by training, and I’ll be talking about AI enabled therapy personalization. Full disclosure, I’m the founder and CEO of Artera. Before I begin, just a big hats-off and a huge debt of gratitude to Felix Feng, who I think most, if not all of you, knew. Felix was a brilliant clinician, a great human, and my co-founder at Artera. We were introduced by Marc Benioff, the Salesforce CEO back in 2019. I was working at Salesforce, and Mark had contributed to Felix’s research at UCSF. And after a couple of years, we ended up incorporating and forming Artera near the end of 2021. Those of you who have ever founded a company know just how many hours and how much intensity goes into it. So even though I only knew Felix for a total of six years, it felt like much longer. He was like my brother by the end. And his legacy lives on through the work that we do. If you don’t know the story, he passed away from a very rare and hyper aggressive stomach cancer. Artera strives to be a pan cancer company, and one day we’ll also have tests for stomach cancer patients. So, this is kind of how digital pathology and AI typically work. Many of you are familiar with it. You’ll take your specimen, you’ll digitize it, you’ll get a very high-dimensional image that runs through an AI algorithm that sometimes combines other forms of data. And then typically you’ll either diagnose, prognosticate outcomes, or predict therapeutic response. What we do at Artera is strictly the latter two. We prognosticate outcomes and we predict therapeutic response, we do not diagnose. And this method is very cost-effective, it’s non-destructive to tissue, and it’s very fast. It can run instantaneously. So, when you think about Artera in the context of research, we take a very collaborative approach and one that follows very rigorous publication. For instance, we worked very closely with the RTOG, and at this point, over 50 different groups in various cancers with a focus on prostate cancer. We took eight phase three trials from the RTOG and worked with a number of you, and we’ve published a number of papers in this space focusing on prognostic and predictive algorithms. We’ve now published in prostate cancer with over 15,000 patients. This number is actually out of date. We’ve published over 55 abstracts and manuscripts and similar over the last four years. And we’ve covered the spectrum of localized to metastatic, low-risk to very high-risk cancer. So, we’re very, very focused on research. Clinically, we’re also available as a test. Strictly speaking, we are an AI platform that runs in clinic, but you experience it as a test. We’re reimbursed by Medicare and have about 100 million lives of coverage, so one in three patients will be covered. We are in the NCCN guidelines with level one evidence as predictive and prognostic. And we just received FDA clearance a couple months ago as a class two de novo with breakthrough status, which is particularly exciting because that means you’ll now be able to run Artera in your own clinic or in your own pathology lab, be compensated for your work, it’s a billable test, and have the pathologist report and the Artera test report with you at the same time. 
So, you can have a conversation with a patient about them being diagnosed with cancer, but also how you’re going to treat them. So, let’s begin by talking about the clinical validation that we’ve done. And this is work that we’ve done with many of you. We have hundreds of collaborators that we’ve published with at Artera. It’s really thanks to the warmth and welcoming of this community that we exist. So, let’s talk about how well the AI works. This is the flow of how data gets processed through our cloud. You’ll take a prostate cancer patient, you’ll take some clinical data, which is optional, age, PSA levels, and T stage, along with digitized pathology, specifically the H&E histopathology. You’ll run that through the AI, and you’ll get back prognostic risk scores, typically in the form of distant mets, PCSM and similar biochemical recurrence. And we’ll also predict therapeutic benefit depending on the risk group of that patient. This table shows the clinical validation across the risk groups in the NCCN from very low to very high. So, the prognostic form of Artera has been validated in all these groups for 10-year distant mets as well as for 10-year PCSM. We have a signature, which we call active surveillance insights that helps predict adverse pathology at radical prostatectomy, excuse me, for very low to favorable. We have a signature for abiraterone that helps you identify patients with benefit from abiraterone for high and very high-risk patients, and for intermediate risk patients, we can predict therapeutic benefit for hormone therapy, short-term ADT. Let’s dive into the science, but just to give you an overview, these are some of the works that we’ve published in the last four years, from localized to metastatic across all risk groups, low, intermediate, and high. We focus on very top-tier publications: Nature, The New England Journal of Medicine, The Lancet, a number of different conferences and so forth. So here you’re looking at the tool outperforming standard NCCN risk groups, which we’ve shown time and time again in a number of studies. These bar charts represent the area under the curve of sensitivity and specificity, it’s tDAUC specifically, of three different outcomes. Distant mets at five years. Distant mets at 10 years and prostate cancer specific mortality also at 10 years. And we’ve validated it time and time again, including in different racial subgroups, African Americans, Asians, and similar. And it’s very robust. Every time it’s consistently shown performance above NCCN risk groups. Here we’re looking at a study in intermediate risk localized cancer. So, we showed using RTOG 9408, long-term trial that probably many of you know, in which intermediate risk patients were put on radiation, and some were randomized to have a short-term hormone therapy as an adjuvant. And what we found was that AI could identify that roughly two-thirds of patients that did not benefit from the addition of ST-ADT. You see that in the left-hand plot, and what we call the biomarker positive group. Excuse me, in this left-hand plot, you see patients that did benefit. This was a trial with about 1,600 patients, and the AI found roughly the 500 patients that benefited very meaningfully from ST-ADT. You see in the distant mets rate about a 15% improvement if you give those patients hormone therapy. On the right-hand side, however, you see patients that did not benefit from hormone therapy. In the long term, the outcomes are roughly the same as measured by the distant mets rate. 
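As a reference for the time-dependent AUC metric cited above, here is a minimal sketch of how a tdAUC at fixed horizons can be computed for a continuous risk score using scikit-survival; the simulated cohort and the choice of library are assumptions, not the published analysis.

```python
# Sketch: time-dependent AUC (tdAUC) of a risk score for distant metastasis at
# 5- and 10-year horizons, using scikit-survival. Data are simulated placeholders.
import numpy as np
from sksurv.util import Surv
from sksurv.metrics import cumulative_dynamic_auc

rng = np.random.default_rng(0)
n = 300
time_years = rng.uniform(0.5, 15, n)              # time to distant mets or censoring
event = rng.random(n) < 0.3                       # True = distant metastasis observed
ai_risk = 0.5 * event + rng.normal(0, 0.3, n)     # toy score loosely tied to outcome

y = Surv.from_arrays(event=event, time=time_years)
horizons = [5, 10]
auc_at_horizons, mean_auc = cumulative_dynamic_auc(y, y, ai_risk, horizons)
for t, auc in zip(horizons, auc_at_horizons):
    print(f"tdAUC at {t} years: {auc:.2f}")
```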
So, these are patients that could comfortably skip hormone therapy and avoid all the toxic side effects. In this study, we leveraged STAMPEDE, RTOG 9202, 9902, and 0521, validating on a number of different patients. This was a prognostic study where we took a look at the 10-year risk of distant mets in, excuse me, in over a thousand patients. This was focused on NCCN high and very high-risk disease. As you can see in the separation of the curves, the MMAI high group had a much higher risk of distant mets versus the MMAI low and intermediate risk population. In this particular study, now we're looking at high-risk patients. This was to predict the benefit of abiraterone as an adjuvant and as recommended in the guidelines. So here we were working with data from STAMPEDE. It was about 1,300 patients pooled from high-risk and non-metastatic patients. What we found was that the AI could identify about you know the 25% of patients that benefited very meaningfully from the addition of abiraterone. In the middle plot, you see about a thousand patients that, in the eight-year time horizon from beginning to end, had roughly the same metastasis-free survival rate. And on the right-hand side, you see a very pronounced difference, about a 40% increase in the metastasis-free survival rate from the addition of abiraterone. And it's the same story that we just saw for intermediate-risk. What the AI identifies as high-risk versus low allows you to bin patients into these two groups, omit excess therapy for the lower risk group, and give very beneficial therapy to the higher risk group. We've also been able to show that the MMAI is predictive of hormone therapy post-surgery. This is our post-RP biomarker. And it can help you identify patients with benefit from hormone therapy in addition to salvage radiation. This was a study using RTOG 9601 and 0534. The plot on the left shows the MMAI high-risk group in about 200 patients, and about 300 patients on the right. Again, you see this very nice separation of curves as shown in the distant metastasis-free rate. Here you see about a 20% increase in the distant mets-free rate for the higher risk group when they add the hormone therapy as an adjuvant to radiation. But in the group on the right, there's no difference in outcomes when giving them more therapy. So, same story. You can skip excess therapy in the lower risk group, and you can get very beneficial and needed therapy in the higher risk group. So that was a bunch of studies that we've published in the last several years, a sampling of the broader set that talk about clinical validation. Does the AI work? So now let's talk about analytical validation. Is the AI robust in a real-world setting? Does it deploy as you would expect? And what we found is that the AI, if trained properly, with the right set of guardrails and similar, is consistent across scanners, operators, and variations in tissue cores. These are two different studies shown on the slide. So, in the top, this was an analytical accuracy study in which we analyzed about 60 tissue samples that were digitized with two different scanners, the Leica AT2 and the 3DHistech P1000. And what we found was that the MMAI model was incredibly consistent across the two. Hardly any difference in inter-scanner concordance. And on the bottom, you're looking at what we call a reliability study, where we took 30 cases and looked at the same scanner operator versus multiple operators. We took a look at cases where you would input one core versus multiple cores.
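The "predictive versus prognostic" distinction running through these studies is usually formalized as a treatment-by-biomarker interaction in a survival model. Below is a minimal simulated sketch using lifelines; the data generation, effect sizes, and variable names are illustrative assumptions, not the trial analyses described above.

```python
# Sketch: testing whether a biomarker is *predictive* of treatment benefit (not just
# prognostic) via a treatment-by-biomarker interaction term in a Cox model.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 2000
treatment = rng.integers(0, 2, n)          # 1 = intensified therapy (e.g., added hormone therapy)
biomarker = rng.integers(0, 2, n)          # 1 = biomarker-positive by the AI
# Simulate benefit that is concentrated in the biomarker-positive arm.
hazard = 0.10 * np.exp(-1.0 * treatment * biomarker)
time_to_event = rng.exponential(1.0 / hazard)
observed = time_to_event < 15.0            # administrative censoring at 15 years
time_obs = np.minimum(time_to_event, 15.0)

df = pd.DataFrame({
    "time": time_obs, "event": observed.astype(int),
    "treatment": treatment, "biomarker": biomarker,
    "treatment_x_biomarker": treatment * biomarker,
})
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.summary[["coef", "p"]])   # a significant interaction term suggests predictive utility
```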
And we saw that the models were very, very consistent. As long as you give the AI the slide with the highest grade of Gleason, the output is very consistent, independent of operator, day, and so forth. This was a particularly interesting study that we published recently where we took our foundation model and compared it to all of the open-source foundation models that exist for pathology. And if this is the first time you're hearing the term foundation model, we've all played around with ChatGPT. That is a foundation model for text. Foundation model means you've trained an AI on so much data that you expect it to be able to generalize to tasks it hasn't seen before. And so, there's been a lot of push in recent years, with the rise of GPUs and similar, to train very, very large-scale models. We call those foundation models. And there's a number of different foundation models in the open source that have been trained on open-source pathology data. And what we found is they all fail in a real-world setting. Every single one fails in a real-world setting. We used two forms of perturbation in the study: slide fading and slide noise. Slides fade over time, especially with the bleaching that occurs from a scanner. And there's plenty of noise that can be injected into a slide from all sorts of things related to operator use, issues with the scanner, and so forth. So, what you're looking at here is a comparison of Artera's model to all of the other foundation models. Artera's is the blue line at the top, which you cannot see with my pointer, but it's the very top blue line on the right and on the left. And what you see is that the scores don't change with respect to noise and fading. On the other hand, all of the open-source foundation models do. The reason for that is very, very intuitive. Most of these models that you see in the open source are actually trained using highly curated data sets that are not representative of real-world messiness that you would get from actual pathology samples. Okay, so we've talked about clinical validation, analytical validation. Can you train the model? Does it work in a real-world setting? And finally, let's ask the question: can AI change clinical practice? This is a particularly challenging one to study, but you can do it in the form of clinical utility studies. So here we have a shared decision-making before and after trial we call ASTuTE that we've been running with our friends from Genesis Care down in Australia. It's with radiation oncologists, it's with intermediate-risk patients that are getting radiation. And you need to address the question of would they benefit or not from hormone therapy. So, in this study, what we do is we ask clinicians, how are you going to treat this patient before they see Artera and after? And what we have found is that there is actually, after seeing the results, there is a 70% reduction in the use of hormone therapy in these patients. And about 30% of patients change their regimen from getting hormone therapy to not as a result of seeing Artera. And there was over an 85% final agreement between what the clinician chose and what the AI recommended. Finally, you see similar discriminatory performance and prognosis relative to genomic testing. This is a study that was just published at ASTRO comparing Artera's AI to Decipher's genomic classifier in patients managed with surgery on the left and patients managed with radiation in the middle.
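As a simple illustration of the robustness testing described above, one can perturb an input image with simulated fading and noise and check how much the model's score moves. Everything here, including the placeholder risk_model, is a hypothetical sketch rather than the published analytical validation protocol.

```python
# Sketch: checking a model's score stability under simulated slide fading and noise.
import numpy as np

def simulate_fading(image: np.ndarray, strength: float = 0.3) -> np.ndarray:
    """Wash the stain toward white, mimicking a faded or bleached slide."""
    return np.clip(image * (1 - strength) + 255 * strength, 0, 255)

def simulate_noise(image: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Add Gaussian scanner/handling noise."""
    rng = np.random.default_rng(0)
    return np.clip(image + rng.normal(0, sigma, image.shape), 0, 255)

def risk_model(image: np.ndarray) -> float:
    """Placeholder stand-in for any image-to-risk-score model."""
    return float(image.mean() / 255.0)

slide = np.random.default_rng(1).uniform(0, 255, size=(512, 512, 3))
baseline = risk_model(slide)
faded = risk_model(simulate_fading(slide))
noisy = risk_model(simulate_noise(slide))
print(f"baseline={baseline:.3f}, faded={faded:.3f}, noisy={noisy:.3f}")
# A robust model should show only small score shifts under these perturbations.
```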
In both cases, the AI was found to be slightly better than genomic classifiers, an AUC of 0.97 versus 0.92 in a post-RP setting, and 0.88 versus 0.83 for genomics in a radiation setting. And what's interesting is that there are potentially very distinct biological signals here that each of the two classifiers is tapping into. You can see that in the correlation plots on the right, where you take a look at how the AI or GC correlates to various biological pathways. And the signature expressions are completely different between what the AI correlates with and what the genomic classifier correlates with, pointing at a possible additive value of these two models. As a final thought, I'm going to show you guys one slide on preliminary results in bladder cancer. It's really the same story that we've seen time and time again with prostate. Our foundation model has now been trained on over 100,000 patients from about 10 different cancers. And so, it's unsurprising that with a very small amount of data, you can get really outstanding results in a new cancer site. Here you're looking at data in bladder for non-muscle invasive cancer, and you see again a very nice separation of curves, like we've been seeing, between high-risk patients and low-risk patients. This was a very small study relative to what we typically publish, with about 570 patients, 143 in validation, but still a huge jump in progression-free survival from a low-risk to a high-risk population. I'm a little bit over time, so with that I will open it up for questions. Thank you for your time.

Unknown [00:57:31] Hi, that was a great talk. When you showed the STAMPEDE trial with abiraterone, when I looked at the slide it showed that most patients did not benefit from abiraterone. I’m trying to reconcile the amount of benefit that we know we get from abiraterone versus your prediction that most patients did not benefit. 

Andre Esteva, PhD [00:57:51] So it’s very intuitive what happens here. Typically, yeah, if you have a trial and that trial is successful, you’ll get some marginal benefit from therapy intensification. We’ve seen it time and time again, we’ve seen it with abiraterone, we’ve seen it with hormone therapy, and so forth. Unsurprisingly, in every single one of those trials, you have your responders and your not responders. And what we have found is that almost every single time when you apply the AI into these populations, it will find that about two-thirds of patients don’t respond, two-thirds, 60 to 80 percent, let’s say, and about 20 to 40 percent respond very, very strongly. So that’s exactly what we saw in the study with abiraterone. 

Speaker 9 [00:58:31] Abiraterone tripled the progression-free survival. So, I guess, just statistically, how could three quarters of patients not benefit when, if you average it out, it tripled the progression-free survival from, like, you know, 16 months to 40 months? I'm trying to just reconcile those two observations.

Andre Esteva, PhD [00:58:47] I’m sorry, could you come to the closer to the mic? I’m having a hard time. 

Speaker 9 [00:58:50] Oh, so is this the STAMPEDE trial looking at newly diagnosed metastatic prostate cancer ADT plus or minus abiraterone? Is that the one you analyzed? 

Andre Esteva, PhD [00:59:01] That’s exactly right. 

Speaker 9 [00:59:02] Right. So, we know the PFS, the hazard ratio, is like less than point five. So, I'm just trying to reconcile how you can have such a dramatic benefit in the whole population, but then you predict that three quarters of the population did not benefit.

Andre Esteva, PhD [00:59:22] It's the averaging effect. Look, let's talk offline and we can walk you through some of the evidence.

Speaker 9 [00:59:27] And just one more question. Sure. So, when you have these predictive, so looking at adding ADT from one trial, do you need a kind of a validation cohort, or do you think that’s enough? 

Andre Esteva, PhD [00:59:39] You absolutely need a validation cohort. Every single one of the studies that I just showed you was a validation cohort. We were taking a look at a trial that the AI had never seen before in its training set. 

Unknown [01:00:00] Great talk, so I have a question about the foundation model. Was it trained on specifically prostate cancer images? And what cohorts did you use? 

Andre Esteva, PhD [01:00:07] That’s a great question. So, some of our earlier studies, for instance, on the first slide I showed you some of the prognostic results that we published in Nature Digital Medicine some years ago, those were trained purely on prostate cancer patients. AI innovates really quickly. So, some of the studies we published, let’s say this year, our AI foundation model has actually been trained on multiple different cancer types. 

Unknown [01:00:34] And so also another question. I'd love to hear your thoughts about the expected performance, you know, of the cohorts, the foundation models that you train, for example, specifically in a clinical trials cohort, let's say, right? So, what could be, like, do you expect any bias, for example, when deploying and using those models in a real-world setting?

Andre Esteva, PhD [01:00:59] Could you repeat that? I’m sorry. Do you expect any what? 

Unknown [01:01:02] Bias. 

Andre Esteva, PhD [01:01:02] Any bias? Yes. You have to be very careful about bias: you have to make sure you study different populations, and you have to make sure that you protect the AI against the bias that’s going to exist in a real-world setting. That bias, in part, can come from variations that you will get in data sets, meaning different racial groups or different ages or whatever the case might be. So, you need to make sure that your validation sets are very large and very diverse. There’s also a more subtle and much more challenging kind of bias to detect within the systems themselves. That’s why I was showing analytical studies, where you make sure that there aren’t biases introduced at the scanner, or due to operators, or due to other issues that can happen with real-world deployment. 

Unknown [01:01:47] Right. Thank you. 
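
The kind of stratified check described here, evaluating the same metric separately by patient subgroup and by scanner so that population or instrument effects stay visible rather than being averaged into one headline number, can be sketched minimally as follows. This is an illustrative sketch, not Artera's code; the validation table and its column names ("label", "score", "ancestry_group", "scanner_model") are hypothetical.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def stratified_auc(df: pd.DataFrame, by: str) -> pd.Series:
    """Compute AUC within each level of the stratification column `by`.

    `df` is a hypothetical validation table with a binary 'label' column and a
    model 'score' column; each stratum must contain both outcome classes.
    """
    return df.groupby(by).apply(lambda g: roc_auc_score(g["label"], g["score"]))

# Usage (hypothetical columns): a large gap between strata flags a bias to investigate.
# stratified_auc(validation_df, "ancestry_group")
# stratified_auc(validation_df, "scanner_model")
```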

Unknown [01:01:50] Hi, hopefully a quick question. On your last slide about the bladder cancer model: did you benchmark against histologic grade (high-grade versus low-grade) and Ta versus T1 in your cohort? I know it’s very small, but it looks so great. I’m just curious, does the model add value on top of the pathology? Thank you. 

Andre Esteva, PhD [01:02:09] We did compare against standard risk groups, and it does outperform them. Happy to show you some data offline. 

Unknown [01:02:15] Thank you. 

Unknown [01:02:17] Quick question. Very interesting talk. Prostate cancer in general is a very heterogeneous disease. So, could you comment on how your model can predict heterogeneity in the future, or can it do that now? 

Andre Esteva, PhD [01:02:32] That is a great question. It’s actually one of the foundational strengths of using AI in this kind of cancer. When you look at a core of tissue and the distribution of Gleason grades across it, you can have a huge distribution. You can have a situation in which you have three to five hyperaggressive cancer cells and the rest of the tissue is mostly benign. That is something that can be very, very challenging for most techniques, but when it comes to using computer vision on this kind of data, you get access to every last cell. And so, you can identify that heterogeneity from an image and account for it. You might also be referring to the heterogeneity that exists across patient populations, from low-risk to high-risk to metastatic and so forth. When it comes to that kind of heterogeneity, you have to go and validate in all of those patient subpopulations and make sure that the tool works. 
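
As a rough illustration of the tile-level point above, and not Artera's actual pipeline, the sketch below scores every tile of a whole-slide image with a stand-in classifier and keeps upper-tail summaries alongside the mean, so a small focus of aggressive-appearing tissue is not averaged away. The `score_tile` callable is a hypothetical placeholder for any trained tile-level model.

```python
import numpy as np

def slide_risk_summary(tiles, score_tile):
    """Summarize per-tile risk scores for one whole-slide image.

    tiles: iterable of image patches; score_tile: hypothetical callable that
    returns a risk score in [0, 1] for a single patch.
    """
    scores = np.array([score_tile(t) for t in tiles])
    top_k = max(1, len(scores) // 10)  # top decile of tiles
    return {
        "mean": float(scores.mean()),                        # can hide a small focus
        "max": float(scores.max()),                          # sensitive to one hot tile
        "top_decile_mean": float(np.sort(scores)[-top_k:].mean()),
    }
```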

Speaker 11 [01:03:22] Thanks. And a quick technical question. Is your foundation model trained on TMA core punch biopsies or full-format FFPE sections? 

Andre Esteva, PhD [01:03:33] We train on whole slide images. 

Speaker 11 [01:03:35] Thanks. 

Andre Esteva, PhD [01:03:36] Yeah. 

Tamara Lotan, MD [01:03:43] Okay, so next up we have Julian Hong from University of California, San Francisco, who’s here to talk about computational approaches for PSA trajectories to guide therapy. Welcome. 

Julian Hong, MD, MS [01:03:59] Alright, thank you for having me. Really excited to be discussing some of our work on behalf of our team at UCSF and in partnership with Johnson and Johnson. So, we’ll just jump right into it. I’ll start off with some of the obvious points, which is that prostate cancer clinical trials take time, and there are important ramifications of the time it takes to bring products to fruition. Certainly, it slows the drug approval process and thereby extends the period of time that’s required for patients to receive drugs. It also keeps patients on potentially suboptimal therapies during the course of the trial. And so, the natural question is: can we move this faster? Of course, there’s been a lot of interest in trying to accelerate clinical trials, particularly through the mechanism of surrogate endpoints. But we asked the question: can we leverage longitudinal PSA responses? PSA has been an area of a lot of interest over the years, and it hasn’t panned out as a true surrogate for reporting clinical trials. So, we were wondering, well, over the course of clinical trials, we do collect longitudinal PSAs over time. Maybe we can leverage some of these complex trajectories and relationships to try to predict the outcome of clinical trials. The overarching goal of this study is essentially to develop a computational process to provide some early readout of potential clinical trial outcomes. And really the work that I’ll be discussing today is intended to be applied at the trial level, as opposed to individual-level prediction models. So, as you can see from the timeline, we start off on a clinical trial, and many trials, even in advanced prostate cancer, require years to complete. And so, the question is: can we predict the trial outcome and shorten the timeline that’s required down to the level of months, and thereby save time? We were fortunate enough to collect data from several clinical trials. Many of these are recognizable to those of you in the room, and together they encompass over 7,000 patients with PSAs collected essentially monthly. And it’s a diverse set of trials, as you can see. There are a number of different agents, different disease states, different degrees of castration sensitivity. And beyond that, there’s also real diversity in outcomes. All of these trials have been reported previously, but this slide is just to demonstrate the spectrum of timelines that occurs across different studies. So, what we did was take this collection of studies and separate them into two groups, as many of you are familiar with: a training group and a test group, essentially to serve as validation. These sets of trials are entirely isolated from each other in the analysis. The idea was then to build out a model and a consequent workflow for essentially simulating out trials and trying to predict the outcome. So, I thought I’d share a little bit about the huge number of PSAs that we’re managing and how they’re fundamentally collected on these trials. Essentially, putting together this analysis requires a balance between trying to shorten the reporting time and gathering enough information to actually make a substantive prediction. 
You’ll also notice, which probably doesn’t come as any surprise to the folks in the room, that over time you start to have a decrease in the number of people who have PSA measurements, both because of events and because of censoring. So, you’re trying to balance all of these things together. And what we landed on, through a couple of different analyses trying to identify the information content behind the PSAs, was a timeline of about four months. The idea being that we would have a good number of patients with some trajectory information over three consecutive PSAs, while still being able to shorten the timeline appropriately and minimize the consequences of missing data and informative missingness. We then took that PSA data and sliced it into more complex variables, the idea being that we could create a high-dimensional representation of how PSAs change over time. PSAs on these trials were all collected at roughly a 30-day interval with some degree of irregularity. So, we anchored ourselves to those time points, looking at absolute values, relative changes, trajectory, a number of characteristics that have been evaluated and found to be correlated with outcomes, but not necessarily to be surrogate endpoints. We then assembled that data to create a high-dimensional space and progressed to putting together a simulation process. The idea is: how can we model out how a trial is going to go over time? We took the data that I just described, using just the training trials, and then used an approach called an adaptive lasso to select the variables. An adaptive lasso essentially takes high-dimensional data and penalizes you for overfitting, to narrow things down into a concise model. That outcome model is similar to my little image here, the New York Times election ticker that you guys probably remember from election nights. We took that model and carried it into our next phase. So, in our validation cohort, we took those same characteristics, ran them through the outcome model, and essentially simulated the remainder of each of these trials a thousand times. Essentially, you’re taking four months’ worth of trial data and trying to extrapolate out from those four months to the end of the trial. And those are the trials that we ran this validation in. I’ll give you a little bit of a peek under the hood of the prediction model. A number of these clinical variables have been described in prior publications as being prognostic, and then there are a handful of PSA features at specific times that were also relevant in the design of the model. So here are some of the results for our validation trials: LATITUDE, COU-AA-302, and MAGNITUDE. To explain these figures, the blue density plot is the distribution of all one thousand simulated outcomes from our model, and the red line is the actual reported result from the publications, with the point estimate of the hazard ratio for survival as the red dot. The name of the game is to get those two things to overlap as much as possible. As you can see, there are different elements that we’re trying to figure out. The first is whether or not a trial is positive: can we confirm that the model predicts that appropriately? 
And you can see that there is agreement between the modeled-out outcomes and the final outcome. I’ll highlight that the MAGNITUDE result is specifically in the unselected patients of MAGNITUDE. The other key thing is the hazard ratio, to assess the effect size. Here you can see essentially how far you are from a hazard ratio of one, and you can see that there’s a decent amount of overlap. There’s a little bit less overlap in COU-AA-302, but for the remainder of the trials, there’s pretty good fundamental overlap. And I will say, in general, trying to interpret the consistency between the simulations and the actual trial results can be somewhat challenging, because in reality the trials are only run once. We know from scenarios where the same scientific question has been asked multiple times that sometimes trials will disagree. So, it’s really hard to assess that kind of prediction against one specific trial outcome, but we do the best we can based on the confidence intervals. And of course, as I mentioned, we’re trying to validate this across different types of scenarios, particularly MAGNITUDE. We were then fortunate enough to partner with the CHAARTED team to add them to our validation set. This was added entirely subsequent to the design of the study: we had already built the model, and it was already locked in, when we partnered with their team. We’ve broken this one down across the unselected patients, high volume, and low volume, to provide some simulations. And as you can see, the first thing I’d point out is that there’s a little bit less confidence in the predictions from the simulations, so the peaks are spread out a little bit more over the probability range. But the key finding here is, as you all recall, that the predominance of the effect in CHAARTED was in the high-volume disease group, and you can see that the hazard ratio finding is predominantly in that group, as opposed to the simulations in the low-volume disease. We have a number of efforts ongoing to build on this model, working on collecting data from other trials with different modalities of therapy. We’re also trying to diversify the disease scenarios. And then we’re working on various assessments in the real-world data space, particularly, and I think saliently, on emulated clinical trials. I should also shout out David Quigley, who I know is hanging out in the audience somewhere; he and Li Zhang have been working on integrating biomarkers into some of this work as well. And then Andre got me excited because he was talking about foundation models, so I thought I’d talk a little bit about the broader-term vision. A lot of this work is one use case for trying to understand the complex longitudinal data that we encounter every day in our clinical practices. We’ve been fortunate enough to partner with Stanford’s group in the new Weill Cancer Hub West; Franklin Huang, who I think is closing out today, is part of this team as well. And the real vision here is to take the complex longitudinal data that we routinely collect in clinical practice and try to assemble it to build essentially a cancer-centric electronic health record foundation model. 
And so, patients come in with different types of cancers and over time experience different events throughout their cancer journey. This, of course, results in changes in therapy, changes in outcomes, different types of toxicities. The idea is to generate a model that can help us learn these relationships, which are potentially generalizable across cancers and then hopefully applicable to different use cases, similar to the PSA model, to improve how we deliver care to patients. So, I’ll wrap up just to say that we were able to develop an externally validated approach, essentially to simulate clinical trials and give us an early readout with four months’ worth of data. The hope, of course, is that this will allow us to expedite clinical trials. There are also some other trial applications, which I didn’t talk about as much, but it does have potential implications for providing go or no-go type decisions. One thing that I think is interesting is that, as I mentioned earlier, we run clinical trials once, and we do accept a certain amount of false negatives. I’m sure many companies have negative trials in their so-called drug graveyard, and potentially this might be an opportunity to reassess some of those scenarios. Then, of course, we’re doing quite a bit of work to add additional validation. With that, a lot of people to thank. I’ll give a special shout-out to Ali, who’s sitting over there, who led a big amount of this work. I certainly want to thank our patients, of course, and the patients who participated in these trials, PCF for funding this with a Challenge Award, my esteemed group of investigators at UCSF, and my co-PIs on the PCF grant. And then this day has been filled with shout-outs for Felix, and I absolutely thought I’d take a moment to also acknowledge him. He recruited me to UCSF, and none of this work would have been possible without him. 
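
The feature-construction, adaptive-lasso, and simulation steps described in the talk can be sketched roughly as below. This is a simplified illustration rather than the study's code: the column names are hypothetical, the adaptive lasso is shown on a generic continuous outcome instead of the survival model actually used, and each simulated trial is summarized with a crude exponential incidence-rate ratio in place of a full Cox hazard ratio.

```python
# Illustrative sketch only; not the UCSF / Johnson and Johnson study code.
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, Ridge

def psa_features(psa: pd.DataFrame) -> pd.DataFrame:
    """psa: one row per draw with hypothetical columns patient_id, day, psa.
    Keeps draws from the first ~4 months and anchors them to ~30-day visits."""
    early = psa[psa["day"] <= 120].copy()
    early["visit"] = (early["day"] / 30).round().astype(int)
    wide = early.pivot_table(index="patient_id", columns="visit", values="psa")
    feats = pd.DataFrame(index=wide.index)
    for v in wide.columns:
        feats[f"psa_m{v}"] = np.log1p(wide[v])                     # absolute levels
    feats["rel_change"] = wide.iloc[:, -1] / wide.iloc[:, 0]       # relative change
    feats["slope"] = (wide.iloc[:, -1] - wide.iloc[:, 0]) / 120.0  # crude trajectory
    return feats

def adaptive_lasso(X: np.ndarray, y: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Two-step adaptive lasso via feature rescaling: an initial ridge fit sets
    per-variable weights, so stronger predictors are penalized less."""
    init = Ridge(alpha=1.0).fit(X, y).coef_
    w = np.abs(init) ** gamma + 1e-8
    lasso = Lasso(alpha=0.05).fit(X * w, y)
    return lasso.coef_ * w  # nonzero entries are the selected variables

def simulate_trial(rates_treat, rates_ctrl, horizon_days, n_sims=1000, seed=0):
    """Crude stand-in for the trial simulation: draw exponential event times from
    model-predicted per-patient event rates (events per day), censor at the
    horizon, and summarize each simulated trial with an incidence-rate ratio."""
    rng = np.random.default_rng(seed)
    hrs = []
    for _ in range(n_sims):
        t_treat = np.minimum(rng.exponential(1.0 / rates_treat), horizon_days)
        t_ctrl = np.minimum(rng.exponential(1.0 / rates_ctrl), horizon_days)
        e_treat = (t_treat < horizon_days).sum()
        e_ctrl = (t_ctrl < horizon_days).sum()
        hrs.append((e_treat / t_treat.sum()) / (e_ctrl / t_ctrl.sum()))
    return np.array(hrs)  # compare this distribution to the reported hazard ratio
```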

Unknown [01:17:45] Julian, fantastic work. A couple of quick questions. So, there is a lot of missing data on the PSA trajectory over time, right? 

Julian Hong, MD, MS [01:18:05] Yes. 

Unknown [01:18:05] Instead of taking a cut point at four months or around there, did you think of doing a Bayesian joint model analysis, or running a Markov model to see what the missingness pattern is and then doing Bayesian joint modeling to predict? And in the simulations that you did, are those confidence intervals or credible intervals? 

Julian Hong, MD, MS [01:18:30] So the simulations are purely the distribution of the outcomes. Yeah, that one was quicker. For your first question about the PSAs, I think it’s a great question, and we’ve actually thought a lot about it. It’s hard because PSA data is often irregularly spaced, especially in real-world practice, but even within the context of a trial it’s mildly irregularly spaced and, as you said, sometimes missing. So, there’s definitely a long list of potential approaches you could take. I think it’s often hard to say whether we’re over-engineering the problem sometimes, and whether simple things like interpolation are going to be effective enough. The way we handled the problem in general was to use the absolute cutoff as our time boundary to keep things systematic; from a practical perspective of how long you follow a patient before simulating out the trial, it’s easier to be consistent about that. And then, to handle the data within the four months, the idea was basically to slice it at different time intervals. So, I do think there are a lot of different ways you can approach it, though. 
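
A minimal sketch of the simpler interpolation option mentioned here, assuming a hypothetical per-patient array of (day, PSA) draws: snap mildly irregular draws onto a fixed ~30-day grid inside the four-month window by linear interpolation, and leave grid points that fall outside the observed draws missing rather than extrapolating.

```python
import numpy as np
import pandas as pd

def to_monthly_grid(days, psa, grid=(0, 30, 60, 90, 120)):
    """Linearly interpolate one patient's irregular PSA draws onto a ~30-day grid;
    grid points before the first or after the last draw stay NaN (no extrapolation)."""
    grid = np.asarray(grid, dtype=float)
    order = np.argsort(days)
    days = np.asarray(days, dtype=float)[order]
    psa = np.asarray(psa, dtype=float)[order]
    vals = np.interp(grid, days, psa, left=np.nan, right=np.nan)
    return pd.Series(vals, index=[f"day_{int(g)}" for g in grid])

# Example: draws at days 2, 33, 58, and 95 fill the 30/60/90-day points,
# while day 0 and day 120 stay NaN because they fall outside the observed draws.
print(to_monthly_grid([2, 33, 58, 95], [12.0, 9.1, 7.4, 6.0]))
```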

Unknown [01:19:49] Yeah, absolutely. Fantastic work. 

Julian Hong, MD, MS [01:19:52] Yeah, thanks. 

Adam Dicker, MD, PhD, FASTRO, FASCO [01:19:54] So maybe you answered this, Julian, and thanks for sharing this. In your four-month snapshot, were you doing like a sliding window across and simulating that? 

Julian Hong, MD, MS [01:20:04] Oh no, the way to think about the simulation is that it’s not a sliding window; it’s a consistent four-month time frame for every patient. One could imagine, and this gets into more of the complexities of how trials operationally enroll and run over time with rolling enrollment, because not everybody gets enrolled at the same time. And I didn’t go into detail about this, but one of the things we do in the validation process is that if a patient has an event earlier than four months, for instance, then they don’t get simulated, because they’ve already had an event. And I think, to your question, Adam, depending on the timing of when a patient gets enrolled, if you were to use this in real life, you would change the time frame that you’re simulating depending on what’s going on with that patient at that time. 

Adam Dicker, MD, PhD, FASTRO, FASCO [01:21:10] Thank you. 

Julian Hong, MD, MS [01:21:16] Cool, thanks. 

Tamara Lotan, MD [01:40:18] So that’s a wrap on the AI session. I just wanna thank you guys for sticking it out with us and thank our wonderful speaker panel again. 
